User:Headbomb/unreliable#What it does

{{notice|text={{anchor|Disclaimer}}This is the Unreliable/Predatory Source Detector (UPSD), a user script that identifies various unreliable and potentially unreliable sources. This is not a tool to be mindlessly used.

{{Shortcut|WP:UPSD|WP:PREDSCRIPT}}

  • It does not cover every unreliable source out there. It is actually fairly bad at dealing with offline sources, like books and magazines.
  • It does not answer whether a source should be used or not.
  • It does not understand context. See common cleanup and non-problematic cases below for details.
  • It is not perfect. See limitations below for details. See also the first 8 points of the CiteWatch disclaimer.

For example, Twitter is generally unreliable. If Twitter is used in an article, the script will only tell you that a generally unreliable source was used. It does not say that Twitter was used inappropriately, or that it shouldn't be used for that information. The script cannot tell the difference between a tweet by a random person and one by NASA.

Questions, comments and requests can be made on the talk page.

}}

{{Reduced pull quote|right|

;User comments

  • Totally and utterly awesome. – SandyGeorgia
  • I'm using your script and it looks really great! – Newslinger
  • Wow, that's excellent! – JzG

}}

How to install

{{Infobox user script

| name = unreliable.js

| desc = Easily detects unreliable and potentially unreliable sourcing

| author = creffett, Headbomb, Jorm, SD0001

| maintainer = Headbomb

| source = User:Headbomb/unreliable.js

| status = WP:TOPSCRIPTS #11

}}

;Method 1 – Automatic

  1. Go to the 'Gadgets' tab of your preferences and select the 'Install scripts without having to manually edit JavaScript files' option at the bottom of the 'Advanced' section. Refresh this page after enabling it.
  2. Click on the 'Install' button in the infobox on the right, or at the top of the source page.

;Method 2 – Manual

  1. Go to Special:MyPage/common.js. (Alternatively, you can go to Special:MyPage/skin.js to make the script apply only to your current skin.)
  2. Add importScript( 'User:Headbomb/unreliable.js' ); // Backlink: User:Headbomb/unreliable.js to the page (you may need to create it), like [https://en.wikipedia.org/w/index.php?title=User:Headbomb/monobook.js&diff=next&oldid=939455109 this].
  3. Save the page and bypass your cache to make sure the changes take effect.

Once installed, you can go to User:Headbomb/unreliable/testcases to see if it works.

What it does

The script sorts external links (including DOIs) to various sources into different 'severities' of unreliability. In general, the script is kept in sync with

{{columns-list|colwidth=15em|

}}

and common-sense "duh" cases I come across (like a parody website), with some minor differences.

class="wikitable"

! Severity

! Appearance

! Explanation

Blacklisted

| [https://example.com Sadequain Naqqash"), but is potentially indicative of AI-generated slop ("AI, write me a biography of Pakistani painter Sadequain Naqqash"). The script is looking for the string utm_source=<AI source>, like utm_source=chatgpt.com. If the source usage is valid, simply remove utm_source=<AI source> from the URL. If not, you may need to go as far as deleting the entire passage containing these sources. See WP:LLM and WP:AICLEAN for further advice.

Note: Detects AskPandi, Claude, ChatGPT, Copilot, Gemini, Grok, Groq, Jasper, and Perplexity. If you know of others, please let me know!
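The check described above can be sketched in a few lines. This is only an illustration, not the script's actual code: the names `AI_UTM_SOURCES` and `hasAiUtmSource` are hypothetical, only the value `chatgpt.com` is taken from the text above, and the sketch parses the URL rather than searching for a raw string.

```javascript
// Hypothetical list of utm_source values; only 'chatgpt.com' comes from the text above.
const AI_UTM_SOURCES = ['chatgpt.com'];

// Returns true if the URL carries a utm_source parameter naming a listed AI source.
function hasAiUtmSource(url) {
  let parsed;
  try {
    parsed = new URL(url);
  } catch (e) {
    return false; // not a parseable absolute URL
  }
  const source = parsed.searchParams.get('utm_source');
  return source !== null && AI_UTM_SOURCES.includes(source.toLowerCase());
}

console.log(hasAiUtmSource('https://example.com/bio?utm_source=chatgpt.com')); // true
console.log(hasAiUtmSource('https://example.com/bio')); // false
```

A hit only says that the link was probably copied from an AI tool. Whether the passage itself is AI-generated still needs a human check.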

If you see a source that should be highlighted but isn't (or shouldn't be highlighted but is), first let me know on the talk page, along with the relevant website or DOI. But since I do not want my opinion to be king, I maintain a general policy that everything is appealable at WP:RSN, in case of mistakes, accidental misclassifications, etc.

Common cleanup and non-problematic cases

{{See also|Wikipedia:Wikipedia Signpost/2022-08-01/Tips and tricks}}

:The first question to answer about whether or not a source is reliable is reliable for what? For example, Gawker is considered generally unreliable, but that does not mean specific Gawker articles cannot be cited. For example, if Gawker had a scoop, which was subsequently picked up in other reliable sources, it could be entirely acceptable to write {{xt|Gawker first reported the story on 12 September 2019,Gawker... followed by The New York Times on 15 September and several other outlets afterwards.}} However, it could also constitute a violation of WP:DUE, of WP:PRIMARY, of WP:BLPSOURCE, and many other policies and guidelines. Compare these two situations, using the same hypothetical source

style="width:90%; margin-left:auto; margin-right:auto;"
Secret CIA experiments like MKUltra successfully indoctrinated the French and Italian prime ministers in the 1970s. Evidence for this is secretly held in Area 51. It is suspected that the current heads of state of Lesotho, Canada, and Finland are currently under influence, but experts' opinions vary on whether or not the CIA truly controls them, or if the Reptilians or perhaps even Zeta Reticulans are using the CIA as a proxy.Smith, John (16 September 2006). {{highlight|1=[https://www.example.com "MKUltra: IRREFUTABLE Proof of Reptilians Infiltration"].|2=#ffdddd;}} Unreliable.com.
{{reflist}}
Conspiracy theorists like John Smith often claim that CIA "mind control" experiments have indoctrinated heads of states.Smith, John (16 September 2006). {{highlight|1=[https://www.example.com "MKUltra: IRREFUTABLE Proof of Reptilians Infiltration"].|2=#ffdddd;}} Unreliable.com.
{{blockquote|{{reflist}}

:The script cannot distinguish between these nuances, so use it as a scalpel, not a hammer. If you are worried about the drive-by removal of a source which is used acceptably, you can always put a comment next to the source, e.g. {{xt|Gawker... }}.

  • Book sellers (e.g. Amazon) and copyvio farms (e.g. Scribd) {{anchor|Book sellers}}

:Oftentimes, the 'problem' highlighted by the script is really a citation in need of cleanup more than an actual sourcing problem. For instance, Amazon.com is not considered a reliable source. However, people will often link to Amazon.com as a way to refer to a book sold on Amazon. In those cases, the Amazon citation should simply be converted to a proper {{tl|cite book}}

:*[https://www.amazon.com/Modern-Physics-Raymond-Serway/dp/0534493394 Amazon.com] → {{cite book |last=Serway |first=Raymond |last2=Moses |first2=Clement |last3=Moyer |first3=Curt |year=2004 |title=Modern Physics |publisher=Brooks Cole |edition=3rd |isbn=978-0534493394}}

:Likewise, if an {{tl|ASIN}} is present, the solution is simply to replace the ASIN with an ISBN and/or OCLC number, if available

:*{{ASIN|0060555661}} → {{ISBN|978-0060555665}} / {{OCLC|907441603}}

:If there's an ASIN as well as an ISBN/OCLC, the ASIN should be removed; otherwise, leave it there as an identifier of 'last resort'. Likewise, if you find a link to a good document hosted in violation of copyright, simply update the citation to refer to the proper document, and remove the copyright-violating link.
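In both examples above, the ten-digit code in the Amazon link or ASIN is also a valid ISBN-10, and the ISBN-13 shown is its standard conversion. The sketch below assumes that the ASIN really is the book's ISBN-10 (often true for printed books, but something to verify against the book itself); the function name `isbn10ToIsbn13` is hypothetical.

```javascript
// Convert an ISBN-10 to ISBN-13. Returns null if the input is not a valid ISBN-10
// (for example, a non-book ASIN containing letters).
function isbn10ToIsbn13(isbn10) {
  const s = isbn10.replace(/[-\s]/g, '').toUpperCase();
  if (!/^\d{9}[\dX]$/.test(s)) return null;

  // ISBN-10 check: the sum of digits weighted 10 down to 1 must be divisible by 11.
  let sum10 = 0;
  for (let i = 0; i < 10; i++) {
    const value = s[i] === 'X' ? 10 : Number(s[i]);
    sum10 += value * (10 - i);
  }
  if (sum10 % 11 !== 0) return null;

  // ISBN-13: prefix 978, the first nine digits, then a check digit (weights 1,3,1,3,...).
  const body = '978' + s.slice(0, 9);
  let sum13 = 0;
  for (let i = 0; i < 12; i++) {
    sum13 += Number(body[i]) * (i % 2 === 0 ? 1 : 3);
  }
  const check = (10 - (sum13 % 10)) % 10;
  return body + check;
}

console.log(isbn10ToIsbn13('0060555661')); // 9780060555665
console.log(isbn10ToIsbn13('0534493394')); // 9780534493394
```

If the function returns null, the ASIN is not an ISBN-10 and the ISBN or OCLC number has to be looked up by other means.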

  • External links {{anchor|External links}}

:Many sources are not acceptable as sources, but will be acceptable as external links, e.g. IMDb, Discogs, etc. For more information on the subject of which external links are acceptable, see WP:ELYES, WP:ELMAYBE, and WP:ELNO.

  • General repositories {{anchor|General repositories}}

:Many citations will look like

::{{Aye}} Lewoniewski, W.; Węcel, K.; Abramowicz, W. (2017). {{highlight|[https://www.researchgate.net/publication/320041870 "Analysis of References Across Wikipedia Languages"]|#fffdd0;}}. In Damaševičius, R.; Mikašytė, V. (eds.). Information and Software Technologies. Communications in Computer and Information Science. 756. Cham: Springer. pp. 561–573. {{doi|10.1007/978-3-319-67642-5_47}}. {{ISBN|978-3-319-67641-8}}.

:When you have a yellow "article" link, but a plain DOI link, that usually means the article links to a general repository like Academia.edu, HAL, ResearchGate, Semantic Scholar, Zenodo, and many others. This is generally not problematic, though it could also be that the registrant hasn't been evaluated yet (especially if the DOI prefix is over 10.20000). Note that if there is no freely-available PDF at those repositories, but other identifiers (like the DOI or ISBN) are present, you can remove the repository link as pointless.

:When no plain DOI link is present, you may wish to verify that the document being cited is from a proper journal, as general repositories usually do not filter against preprints and papers from predatory journals being uploaded. For example, this publication, like the one above, is also hosted on ResearchGate

::{{Nay}} Feng, Youjun; Wu, Zuowei (2011). {{highlight|[https://www.researchgate.net/publication/308755209 "Streptococcus suis in Omics-Era: Where Do We Stand?"]|#fffdd0;}}. Journal of Bacteriology & Parasitology. S2: 001.

:Inspection of this source reveals that the Journal of Bacteriology & Parasitology is published by the OMICS Publishing Group, one of the most infamous predatory publishers out there. Thus this second source is problematic, while the first one (published by Springer Science+Business Media), is not.

:Google Books and OCLC host a variety of content, including self-published books and books from blacklisted publishers. As such, both Google Books and OCLC links will be highlighted in yellow. These links will often not be problematic, but you may wish to verify that the book being cited is from a reputable publisher.

::{{Aye}} Wong, S. S. M. (1998). {{highlight|[https://books.google.com/books?id{{=}}YgkfZgFdui8C Introductory Nuclear Physics]|#fffdd0;}} (2nd ed.). New York: Wiley Interscience. {{ISBN|978-0-471-23973-4}}. OCLC {{highlight|[https://www.worldcat.org/oclc/1023294425 1023294425]}}.

::{{Nay}} Pratt, J. (2011). {{highlight|[https://books.google.ca/books?id{{=}}8Y5MAgAAQBAJ Stewardship Economy: Private Property Without Private Ownership]|#fffdd0;}}. San Francisco: Lulu.com. {{ISBN|978-1-4467-0151-5}}. OCLC {{highlight|[https://www.worldcat.org/oclc/813296703 813296703]}}.

:Note that if you have a 'main' OCLC link, it's usually a good idea to convert it into an OCLC identifier (or remove the link if the OCLC identifier is already there), as a 'main' link suggests that a freely available version of the book exists.

  • Preprints {{anchor|Preprints}}

: The links to arXiv/bioRxiv/CiteSeerX/SSRN preprints and documents generated via the {{para|arxiv}}/{{para|biorxiv}}/{{para|citeseerx}}/{{para|ssrn}} parameters of the various {{cite xxx}} templates are obviously not problematic for reliability, so long as the citation itself isn't problematic. Citations to preprints will often be acceptable for routine claims as self-published expert sources, but they will invariably fail WP:MEDRS and will not meet other higher standards of sourcing, as preprints are not peer-reviewed (or will reflect a state prior to peer-review). Keep in mind that several papers hosted on preprint repositories will have been published in peer-reviewed venues (and some of those papers are even technically postprints), so you should always investigate rather than assume that something is unreliable simply because it's on a preprint server. You may simply need to update the citation to a proper {{tl|cite journal}}, rather than a {{tl|cite arxiv}} or {{tl|cite biorxiv}}.

  • Social media and video-sharing platforms {{anchor|Social media|video-sharing platforms}}

:The sources are as reliable as the account owner is. For example, a tweet by NASA, or a YouTube video from the BBC News is as reliable as those organizations are. Videos of random people giving their opinions, [http://www.quotationspage.com/quote/37615.html not so much]. Beware of WP:COPYVIOEL, where Randy in Boise uploaded a video from BBC News or some other valid source.

::{{Aye}} Valentine, Andre. (15 July 2023). {{highlight|1=[https://www.youtube.com/watch?v=olPdP9zMegE "Celebrating the Webb Space Telescope's First Year of Science on This Week @NASA – July 14, 2023"]|2=#ffdddd;}}. NASA's YouTube Channel – via YouTube.com.

::{{Nay}} CoasterFan2105. (6 May 2016). {{highlight|1=[https://www.youtube.com/watch?v=xVMsAgHy_IY "California Trains! 1 Hour, 150+ Trains!"]|2=#ffdddd;}}. CoasterFan2105's YouTube Channel – via YouTube.com.

Limitations

{{nutshell|title=This section|1=

  • The script will only look for matches in a) URLs and DOIs b) references (...) and c) proper lists (e.g. Further reading/External links sections).
  • False positives are possible, e.g. https://www.alexa.com/siteinfo/deprecated.com
  • It can produce some weird things in discussions concerning bad sources, like highlighting an entire comment which mentions a bad source.

}}

=What the script looks for=

The script only operates on

  • URL links, including those generated through templates like {{tl|citation}}, {{tl|cite journal}}, {{tl|cite web}}, {{tl|doi}}, and others.
  • If no URL link is present, the text inside ... tags
  • If no URL link is present, the text inside list items

That is, it will detect links to Deprecated.com, as well as references and list items that mention Deprecated.com, but it won't recognize other mentions of Deprecated.com in the text. In practice, this means that all URLs are checked (regardless of where they are), as well as all lists of publications/bibliographies/references that follow a regular format (including those in the Further reading/External links sections).

  • John Smith (2014). "[https://www.deprecated.com/article Article of things]". Deprecated.com. Accessed 2020-02-14.

:: → John Smith (2014). "[https://www.deprecated.com/article Article of things]". Deprecated.com. Accessed 2020-02-14.

  • John Smith (2014). "Article of things". Available on Deprecated.com. Accessed 2020-02-14.

:: → John Smith (2014). "Article of things". Available on Deprecated.com. Accessed 2020-02-14.

The script can easily classify DOIs by their DOI prefixes, which correspond to various registrants (for instance 10.4172/... belongs to the OMICS Publishing Group). Most registrants are publishers, some are individual journals.

The script can also classify DOIs through "starting patterns", but this is trickier. For example, Chaos, Solitons, & Fractals has DOIs like {{doi|10.1016/j.chaos.2018.11.014}} or {{doi|10.1016/S0960-0779(09)00060-5}}. These have starting patterns of 10.1016/j.chaos. and 10.1016/S0960-0779, which will not match other journals. However, this is very tricky to determine, as those patterns can vary over time, and can also be hard to recognize as meaningful patterns (here S0960-0779 is related to the ISSN of that specific journal, and isn't just a random string like {{doi|10.1023/a:1022444005336}}). They could also be so closely related to the patterns of other journals as to cause a collision.
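The two approaches can be sketched as follows. This is a minimal illustration under stated assumptions, not the script's own data or logic: the table names and `classifyDoi` are hypothetical, the tables contain only the prefix and starting patterns quoted above, and the DOI suffix `example` is made up.

```javascript
// Illustrative tables; entries are limited to those named in the text above.
const DOI_PREFIXES = {
  '10.4172': 'OMICS Publishing Group',
};
const DOI_STARTING_PATTERNS = [
  { pattern: '10.1016/j.chaos.', journal: 'Chaos, Solitons, & Fractals' },
  { pattern: '10.1016/S0960-0779', journal: 'Chaos, Solitons, & Fractals' },
];

// Returns a description of the match, or null if the DOI is in neither table.
function classifyDoi(doi) {
  // The prefix is everything before the first slash and identifies the registrant.
  const prefix = doi.split('/')[0];
  if (Object.prototype.hasOwnProperty.call(DOI_PREFIXES, prefix)) {
    return { kind: 'prefix', match: DOI_PREFIXES[prefix] };
  }
  // DOIs are case-insensitive, so compare starting patterns in lower case.
  const lower = doi.toLowerCase();
  for (const entry of DOI_STARTING_PATTERNS) {
    if (lower.startsWith(entry.pattern.toLowerCase())) {
      return { kind: 'starting pattern', match: entry.journal };
    }
  }
  return null;
}

console.log(classifyDoi('10.4172/example'));                // matched by prefix
console.log(classifyDoi('10.1016/j.chaos.2018.11.014'));    // matched by starting pattern
console.log(classifyDoi('10.1023/a:1022444005336'));        // null
```

The prefix lookup is exact, while the starting-pattern lookup depends on someone having spotted a stable pattern, which is why the second table is harder to maintain.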

=What if an unreliable source comes from possibly AI-generated text?=

In the case of an unreliable source that comes from possibly AI-generated text (e.g. https://www.deprecated.com/?utm_source=), the warning against AI will apply first. If you deem that the source usage is acceptable and that the passage is not AI-generated slop, then you should remove utm_source= from the url (e.g. https://www.deprecated.com/). It will then be flagged as unreliable as normal.

This means you may need to do two rounds of review. First for AI-generated slop, then for proper reliable source usage.
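Removing the parameter by hand is usually easiest, but the cleanup step can also be sketched in code. The function name `stripUtmSource` is hypothetical, and `chatgpt.com` stands in for whichever AI source appears in the URL.

```javascript
// Remove the utm_source parameter from a URL, leaving any other parameters intact.
function stripUtmSource(url) {
  const parsed = new URL(url);
  parsed.searchParams.delete('utm_source');
  return parsed.toString();
}

console.log(stripUtmSource('https://www.deprecated.com/?utm_source=chatgpt.com'));
// https://www.deprecated.com/
console.log(stripUtmSource('https://www.deprecated.com/article?id=5&utm_source=chatgpt.com'));
// https://www.deprecated.com/article?id=5
```

After the cleanup, the link no longer triggers the AI warning, so the second round of review (reliability of the source itself) can proceed on the plain URL.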

=False positives=

Because the script is looking for strings that correspond to URL domains anywhere in the url, it could match the urls of other websites. For example, the script cannot distinguish between

  • [https://www.deprecated.com/MOON-CHEESE-CURES-CANCER.html Moon cheese cures cancer!][https://www.deprecated.com/MOON-CHEESE-CURES-CANCER.html [https://www.alexa.com/siteinfo/deprecated.com New York Times article might also failed higher sourcing requirements like WP:BLP or WP:MEDRS. The script will also not flag those.

    Likewise, if no URL/DOI is provided, the source will not get flagged. For example, the {{doi-inline|10.21767/2577-0594.100001|following paper}}

    • {{cite journal |first=J. R. |last1=Lawal |first2=A. D. |last2=El Yuguda |first3=U. I. |last3=Ibrahim |title=Survey on Prevalence of Newcastle Disease Antibodies in Village Poultry at Live Birds Markets in Gombe, Nigeria |journal=Journal of Animal Sciences and Livestock Production |date=2017 |volume=1 |issue=1 |pages=1-9}}

    is from an iMedPub journal, a subsidiary of the (in)famous OMICS Publishing Group predatory publisher, and is not getting flagged because of a lack of recognizable URL/DOI.
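The false positive described at the start of this section can be reproduced in a few lines. The regex below is a hypothetical rule written in the same style as the custom rules on this page, not one copied from the script, and `hostMatches` is an illustrative helper only.

```javascript
// Hypothetical rule: matches the domain string anywhere in the URL.
const deprecatedRule = /\bdeprecated\.com/i;

const direct = 'https://www.deprecated.com/MOON-CHEESE-CURES-CANCER.html';
const mention = 'https://www.alexa.com/siteinfo/deprecated.com';

console.log(deprecatedRule.test(direct));  // true
console.log(deprecatedRule.test(mention)); // true (the false positive)

// For comparison: a check restricted to the hostname tells the two apart.
function hostMatches(url, domain) {
  const host = new URL(url).hostname.toLowerCase();
  return host === domain || host.endsWith('.' + domain);
}

console.log(hostMatches(direct, 'deprecated.com'));  // true
console.log(hostMatches(mention, 'deprecated.com')); // false
```

A hostname-only check would avoid this collision, but it could not match plain-text mentions in references and list items, which the script also looks at. The comparison is shown only to make the trade-off concrete.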

=Comments=

For technical reasons, it will also sometimes highlight entire comments made in ordered and unordered lists (i.e. comments that start with * or #):

  • Keep When searching for sources, I found something on Deprecated.com that would indicate that the Foobarological Remedies are responsible for over 25% of remissions. This should count for meeting WP:N. User:Example (talk) 17:29, 19 August 2020 (UTC)
  • Actually, that site is not a reliable source, and does not establish notability, much less efficacy per WP:MEDRS. User:Example2 (talk) 18:29, 19 August 2020 (UTC)

This can be avoided by giving the actual link, in which case only the link will be highlighted

  • Keep When searching for sources, I found something on [https://www.deprecated.com/article

    },

    ];

    and the background colour #fffdd0 will no longer be applied. If you instead want to change Google Books links to a different color with a red border, like #7cfc00, then use

    :css: { "background-color": "#7cfc00", "border":"2px solid red"}

    instead of

    :css: { "background-color": "" }

    in the above example.

=Example 2: Add a source=

If you have a specific source that needs to be added, you should generally ask for it to be added on the talk page of the script (if obvious) or at WP:RSN (if consensus is needed), so that everyone using the script can benefit from its detection. However, if the source doesn't warrant being flagged by the script for everyone, but you'd like it to be flagged for you (for example, Biodiversity Heritage Library and ChemSpider links), you can create your own rules by adding the following to Special:MyPage/unreliable-rules.js

    unreliableCustomRules = [
        {
            comment: 'Biodiversity Heritage Library',
            regex: /\b(biodiversitylibrary\.com)/i,
            css: { "background-color": "#40e0d0" }
        },
        {
            comment: 'ChemSpider',
            regex: /\b(chemspider\.com)/i,
            css: { "background-color": "#d8bfd8" }
        },
    ];

and these links will be highlighted in {{highlight|#40e0d0|#40e0d0}} and {{highlight|#d8bfd8|#d8bfd8}} respectively.
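Before saving custom rules, it can help to try them against a sample link. The sketch below reuses the two rules from the example; the evaluation logic (`firstMatchingRule`, first match wins) and the sample URL path are assumptions for illustration and do not describe how the script itself applies rules.

```javascript
// Same rule shape as in the example above.
const exampleRules = [
  {
    comment: 'Biodiversity Heritage Library',
    regex: /\b(biodiversitylibrary\.com)/i,
    css: { 'background-color': '#40e0d0' },
  },
  {
    comment: 'ChemSpider',
    regex: /\b(chemspider\.com)/i,
    css: { 'background-color': '#d8bfd8' },
  },
];

// Return the first rule whose regex matches the link, or null if none does.
function firstMatchingRule(rules, href) {
  for (const rule of rules) {
    if (rule.regex.test(href)) return rule;
  }
  return null;
}

// Hypothetical sample link.
const hit = firstMatchingRule(exampleRules, 'https://www.chemspider.com/example');
console.log(hit ? hit.comment + ' -> ' + hit.css['background-color'] : 'no match');
// ChemSpider -> #d8bfd8
```

Testing a rule this way is a quick check that the regex matches the domain exactly as it appears in real links, for instance whether a site uses .com or .org.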

Source code

{{Main|User:Headbomb/unreliable.js}}

While I (Headbomb) came up with the idea for the script and am the person maintaining it, the basic script was designed by SD0001 with refinements by Jorm and creffett. Anything clever in the code is from them. I'm mostly just maintaining the list of sources to be covered.

See also

{{col begin}}

{{Col-break}}

;Reliability-related

{{col break}}

;General resources

  • User:JL-Bot/DOI – A list of Crossref DOI registrants. Useful for finding which registrant a DOI prefix is associated with. The list is not comprehensive (Crossref isn't the only issuer of DOIs), and the output is not clean.

;Other scripts

{{col end}}