MediaWiki talk:Captcha-addurl-whitelist#From the Wikipedia Library

{{archive box|auto=long|search=yes}}

{{shortcut|WT:CWL}}

__TOC__

From the Wikipedia Library

{{sudo|answered=yes}}

Hi,

Sam Walton [https://en.wikipedia.org/w/index.php?title=Wikipedia_talk:The_Wikipedia_Library&diff=833038065&oldid=833009090 provided] this list of websites from the Wikipedia Library partners. Clayoquot (talk | contribs) 23:13, 29 March 2018 (UTC)

{{collapse top}}

{| class="wikitable"
|+
!Publisher
!URL
|-
|HighBeam
|*.highbeam.com
|-
|Questia
|*.questia.com
|-
|Credo
|*.credoreference.com
|-
|JSTOR
|*.jstor.org
|-
|The Royal Society
|*.royalsocietypublishing.org
|-
|British Newspaper Archive
|*.britishnewspaperarchive.co.uk
|-
|Keesings
|*.keesings.com
|-
|Oxford University Press
|*.oxfordartonline.com
|-
|Oxford University Press
|*.oxfordmusiconline.com
|-
|Oxford University Press
|*.anb.org
|-
|Oxford University Press
|*.oxforddnb.com
|-
|Oxford University Press
|*.oxfordbibliographies.com
|-
|Oxford University Press
|*.oxfordjournals.org
|-
|Oxford University Press
|*.academic.oup.com
|-
|Oxford University Press
|*.oxfordhandbooks.com
|-
|Oxford University Press
|*.oxfordscholarship.com
|-
|Oxford University Press
|*.oxfordreference.com
|-
|Oxford University Press
|*.ouplaw.com
|-
|BMJ
|*.bmj.com
|-
|Newspapers.com
|*.newspapers.com
|-
|Past Masters
|*.nlx.com
|-
|FindMyPast
|*.findmypast.co.uk
|-
|Fold3
|*.fold3.com
|-
|Scotlands People
|*.scotlandspeople.gov.uk
|-
|Project MUSE
|*.muse.jhu.edu
|-
|De Gruyter
|*.degruyter.com
|-
|Sage
|*.sagepub.com
|-
|Elsevier
|*.sciencedirect.com
|-
|Adam Matthew
|*.amdigital.co.uk
|-
|RIPM
|*.ripmfulltext.org
|-
|McFarland
|*.mcfarlandbooks.com
|-
|Royal Pharmaceutical Society
|*.pharmaceutical-journal.com
|-
|Newspaperarchive.com
|*.newspaperarchive.com
|-
|World Bank
|*.worldbank.org
|-
|Women Writers Online
|*.wwp.northeastern.edu
|-
|Royal Society of Chemistry
|*.rsc.org
|-
|American Psychological Association
|*.apa.org
|-
|Brill
|*.brillonline.com
|-
|Brill
|*.brill.com
|-
|L'Harmattan
|*.editions-harmattan.fr
|-
|Cairn
|*.cairn.info
|-
|HeinOnline
|*.heinonline.org
|-
|Taylor and Francis
|*.tandfonline.com
|-
|AAAS
|*.sciencemag.org
|-
|Loeb Classical Library
|*.loebclassics.com
|-
|MIT Press Journals
|*.mitpressjournals.org
|-
|Erudit
|*.erudit.org
|-
|International Monetary Fund
|*.imf.org
|-
|Sabinet
|*.sabinet.co.za
|-
|Sabinet
|*.journals.co.za
|-
|Al Manhal
|*.almanhal.com
|-
|OpenEdition
|*.openedition.org
|-
|Numerique Premium
|*.numeriquepremium.com
|-
|Annual Reviews
|*.annualreviews.org
|-
|EBSCO
|*.ebscohost.com
|-
|Gale
|*.galegroup.com
|-
|Miramar Ship Index
|*.miramarshipindex.org.nz, *.miramarshipindex.nz
|-
|Future Science
|*.future-science.com
|-
|Future Science
|*.futuremedicine.com
|-
|Cambridge University Press
|*.cambridge.org
|-
|Baylor
|*.baylorpress.com
|-
|Alexander Street Press
|*.alexanderstreet.com
|-
|EDP Sciences
|*.edpsciences.org
|-
|Nomos
|*.nomos.de
|-
|Nomos
|*.nomos-elibrary.de
|-
|Edinburgh University Press
|*.euppublishing.com
|-
|World Scientific
|*.worldscientific.com
|-
|Foreign Affairs
|*.foreignaffairs.com
|-
|ASHA
|*.asha.org
|-
|Emerald
|*.emeraldinsight.com
|-
|Bloomsbury
|*.dramaonlinelibrary.com
|-
|Bloomsbury
|*.ukwhoswho.com
|-
|Bloomsbury
|*.bloomsburyfashioncentral.com
|-
|Bloomsbury
|*.whitakersalmanack.com
|-
|De Gruyter
|*.pschyrembel.de
|-
|American Psychiatric Association
|*.psychiatryonline.org
|-
|EDP Sciences
|*.edpsciences.org
|-
|Oxford University Press
|*.e-enlightenment.com
|-
|ProQuest
|*.proquest.com
|-
|SpringerNature
|*.nature.com
|-
|SpringerNature
|*.springer.com
|-
|Wiley
|*.wiley.com
|-
|JAMA
|*.jamanetwork.com
|-
|ACM
|*.acm.org
|-
|ACS
|*.acs.org
|-
|BioOne
|*.bioone.org
|-
|IOP
|*.iop.org
|}

{{collapse bottom}}

:{{re|Clayoquot}} I posted at Wikipedia:Reliable_sources/Noticeboard#White_listing_sites_from_WP:TWL for a review; if there are no issues in a week, please activate the edit request tag at the top of this section. Thanks, — xaosflux Talk 01:51, 30 March 2018 (UTC)

:{{t1|on hold}} pending RSN or time. — xaosflux Talk 14:51, 30 March 2018 (UTC)

:: Thanks. The relevant discussion is now archived and there were no objections. Cheers, Clayoquot (talk | contribs) 22:14, 18 April 2018 (UTC)

:{{doing}} — xaosflux Talk 23:09, 18 April 2018 (UTC)

::{{done}} {{re|Clayoquot}} these have been added, let me know if you see any trouble. — xaosflux Talk 23:14, 18 April 2018 (UTC)

:::Excellent! I'm glad I [https://www.mediawiki.org/w/index.php?title=Topic:U9qc1qoaa0h1m7vp&topic_showPostId=u9s8nr067vq1e18p#flow-post-u9s8nr067vq1e18p mentioned it], which I think is what led to all this activity. Thanks for getting some sensible updates through, all. :) Quiddity (WMF) (talk) 23:41, 18 April 2018 (UTC)

::::Thanks to {{re|Samwalton9 (WMF)}} as well. — xaosflux Talk 00:27, 19 April 2018 (UTC)

:::::No problem! There are definitely many more sites that could be added here, but that's a good start :) Samwalton9 (WMF) (talk) 09:50, 19 April 2018 (UTC)

Proposal to add major newspapers etc.

{{sudo|answered=yes}}

A short RSN discussion showed some support for the principle of adding major newspapers to this list, and I think we can extend that to some other media such as the BBC. Should we produce a full list for approval?

{| class="wikitable"
!Publisher
!URL
|-
|BBC
|https://www.bbc.co.uk/
|-
|The Guardian
|https://www.theguardian.com/
|-
|The Independent
|https://www.independent.co.uk/
|-
|The Times
|https://www.thetimes.co.uk/
|-
|Financial Times
|https://www.ft.com/
|-
|The Daily Telegraph
|https://www.telegraph.co.uk/
|-
|The Scotsman
|https://www.scotsman.com/
|-
|The Herald (Glasgow)
|http://www.heraldscotland.com/
|}

Please can non-UK editors add respected journals from their own countries? The Washington Post, The Globe and Mail and The Hindu have been suggested. I've left off tabloids such as The Sun (United Kingdom) and the Daily Mirror to maximise the chance of approval. I hope we can leave the initial www off the URL pattern, to allow variants such as news.bbc.co.uk. The Times has a paywall; is it worth including such sources?

Someone recently posted a link to a useful article with a Venn diagram classifying news sources by political bias and level of detail, but I've lost it. Please can someone point us at that again? Thanks, Certes (talk) 10:44, 19 April 2018 (UTC)

:{{on hold}} activated an edit request to see if any patrolling admins want to comment before processing. — xaosflux Talk 12:03, 19 April 2018 (UTC)

::Would it be better to start this discussion somewhere else, returning if and when it has enough detail and support to qualify as an edit request? If so, is WP:RSN the right forum? I don't think anyone doubts that these are reliable sources; the question is whether they should be added to this whitelist. Certes (talk) 12:19, 19 April 2018 (UTC)

:::{{re|Certes}} RSN is the best forum I can think of for these, you can move it there, or just link in to this from there with a summary. Basically if domains are representative of reliable sources, are useful for new users, and not being abused (such as for spam, advertising, selling subscriptions, etc) they are OK to be on this list as far as I'm concerned. — xaosflux Talk 12:27, 19 April 2018 (UTC)

::::A [https://en.wikipedia.org/wiki/Wikipedia:Reliable_sources/Noticeboard#Captcha_whitelist_for_major_newspapers_etc. notice was posted at WP:RSN] on 19 April asking that people come here to comment. EdJohnston (talk) 14:39, 22 April 2018 (UTC)

:FWIW, I fully support this. Ed [talk] [majestic titan] 19:57, 22 April 2018 (UTC)

:{{doing}} — xaosflux Talk 20:08, 22 April 2018 (UTC)

::{{done}} — xaosflux Talk 20:12, 22 April 2018 (UTC)

:::Thank you! I still hope editors from beyond the UK will contribute similar lists for their countries. Certes (talk) 22:48, 22 April 2018 (UTC)

What exactly is this?

What exactly is this? Is it just a list of URLs that don't require a CAPTCHA for unregistered users? If so, should we add all low-risk but popular URLs? --Emir of Wikipedia (talk) 20:49, 22 April 2018 (UTC) {{pme}}

:{{re|Emir of Wikipedia}} yes, normally unregistered and new editors have to solve a captcha to add links; these specific domains are exempt from that. There is some performance to consider, so keeping this to "popular" as in links that are actually being appropriately added to pages is a factor. In general this means the links should be for "reliable sources". It is important that the exemptions are not useful for disruptive use as well. We have only recently begun using this and this page is not well watched - I suggest discussing additions at WP:RSN first. — xaosflux Talk 21:39, 22 April 2018 (UTC)

::Thanks for the information. I have seen the discussions at RSN and came here for clarification. --Emir of Wikipedia (talk) 20:01, 23 April 2018 (UTC)

Please add IPCC and National Academies domains

{{sudo|answered=yes}}

Could you please add:

  • ipcc.ch (Intergovernmental Panel on Climate Change)
  • nap.edu (National Academies of Sciences, Engineering, and Medicine)

? Clayoquot (talk | contribs) 22:52, 22 February 2020 (UTC)

:{{not done}} (not yet) following the directions, please link to where this was {{tq|discuss additions publicly such as at the Wikipedia:Reliable sources/Noticeboard}}. — xaosflux Talk 14:02, 23 February 2020 (UTC)

::{{u|Xaosflux}}, it's pretty inconceivable that a discussion at RSN would yield a result other than "yes, those are reliable sources". Would you consider pulling an IAR to add these two without going through a community process? Best, Clayoquot (talk | contribs) 17:57, 23 February 2020 (UTC)

:::{{re|Clayoquot}} I'll leave this open for at least a day in case anyone else wants to skip the discussion (which on these is usually more of the 'no objections, go ahead' type). I've never heard of ipcc.ch (it appears to only have 5 article usages). nap.edu only appears to have 4 article usages as well, so at the very least these don't seem to be popular sources. — xaosflux Talk 19:00, 23 February 2020 (UTC)

::::{{u|Xaosflux}}, For www.nap.edu, I'm seeing usage in 957 pages,[https://en.wikipedia.org/w/index.php?title=Special:LinkSearch&limit=500&offset=500&target=https%3A%2F%2Fwww.nap.edu%2F] and www.ipcc.ch appears to be referenced in 736 pages.[https://en.wikipedia.org/w/index.php?title=Special:LinkSearch&limit=500&offset=500&target=https%3A%2F%2Fwww.nap.edu%2F] Clayoquot (talk | contribs) 17:42, 24 February 2020 (UTC)

:::::Looks like I had my wildcard wrong, more popular than my first count indeed :) — xaosflux Talk 18:14, 24 February 2020 (UTC)

::::::{{u|Xaosflux}}, We've all done that :) Clayoquot (talk | contribs) 02:54, 25 February 2020 (UTC)

:::::::{{ping|Clayoquot}} please post at WP:RSN; if you are ignored for a week, reactivate the request and I'll add them here. — xaosflux Talk 15:26, 27 February 2020 (UTC)

:::::::: Posted there. Thanks. Clayoquot (talk | contribs) 18:20, 27 February 2020 (UTC)

:::::::::Done. There were no objections: https://en.wikipedia.org/wiki/Wikipedia:Reliable_sources/Noticeboard/Archive_286#CAPTCHA_exemption_for_reliable_domains Clayoquot (talk | contribs) 22:00, 7 March 2020 (UTC)

:::::::::: Could someone make this change please? {{ping|Xaosflux}}? Clayoquot (talk | contribs) 17:30, 11 March 2020 (UTC)

:{{done}} {{ping|Clayoquot}} as there were no objections, I've added. — xaosflux Talk 17:38, 11 March 2020 (UTC)

RfC on adding [[WP:GREL|generally reliable]] sources to the [[MediaWiki:Captcha-addurl-whitelist|CAPTCHA whitelist]]

{{edit fully-protected|answered=yes}}

There is a request for comment on adding generally reliable sources from the perennial sources list to the CAPTCHA whitelist, which allows new and anonymous users to cite them in articles without needing to solve a CAPTCHA. If you are interested, please participate at {{slink|WP:RSN|Adding generally reliable sources to the CAPTCHA whitelist}}. — Newslinger talk 19:42, 7 March 2020 (UTC)

:The discussion has passed with "near-unanimous" consensus in favour of the proposal and should be implemented. For future reference, it is now archived at Wikipedia:Reliable_sources/Noticeboard/Archive_291#Adding_generally_reliable_sources_to_the_CAPTCHA_whitelist. 107.190.33.254 (talk) 17:01, 7 May 2020 (UTC)

:Would someone please regex this up in to a ready to go addition, then activate the edit request here? — xaosflux Talk 00:57, 8 May 2020 (UTC)

{{re|Newslinger|Xaosflux}} Not sure why this discussion died out, but on WP:RSNP, this did the trick:

console.log([...$('.perennial-sources .s-gr a[href*="Linksearch&target=https://"]')].map(a => '\\b' + a.href.match(/\*\.(.*)/)[1].replaceAll(".", "\\.")).join("\n"))
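For anyone who would rather not read the console one-liner, here is a rough Python sketch of what it appears to do to each link. The sample hrefs are hypothetical (shaped like the Special:Linksearch links on WP:RSNP, not copied from it), and <code>to_fragment</code> is a made-up helper name.

```python
import re


def to_fragment(href: str) -> str:
    """Turn a link-search href containing '*.domain' into a whitelist line."""
    # Same idea as the JS: keep everything after the first "*." ...
    target = re.search(r"\*\.(.*)", href).group(1)
    # ... escape the dots and put a word boundary in front.
    return r"\b" + target.replace(".", r"\.")


# Hypothetical hrefs for illustration only.
BASE = "https://en.wikipedia.org/w/index.php?title=Special:Linksearch&target=https://"
sample_hrefs = [
    BASE + "*.abcnews.com",
    BASE + "*.bbc.co.uk",
    BASE + "*.dw.com/en",
]

fragments = [to_fragment(href) for href in sample_hrefs]
print("\n".join(fragments))
```

Only the dots are escaped, and anything after the host is carried along unchanged, which would explain how a line like \bdw\.com/en ended up in the generated list.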

{{collapse top|RSNP list}}

\babcnews\.com

\babcnews\.go\.com

\btheage\.com\.au

\bafp\.com

\baljazeera\.com

\baljazeera\.net

\bamnesty\.org

\badl\.org

\baon\.com

\barstechnica\.com

\barstechnica\.co\.uk

\bap\.org

\bapnews\.com

\btheatlantic\.com

\btheaustralian\.com\.au

\bavclub\.com

\bavn\.com

\baxios\.com

\bbbc\.co\.uk

\bbbc\.com

\bbehindthevoiceactors\.com

\bbellingcat\.com

\bbloomberg\.com

\bbusinessweek\.com

\bburkespeerage\.com

\bbuzzfeednews\.com

\bbuzzfeed\.com

\bcsmonitor\.com

\bclimatefeedback\.org

\bcnet\.com

\bcnn\.com

\bcodastory\.com

\bcommonsensemedia\.org

\btheconversation\.com

\btelegraph\.co\.uk

\bdeadline\.com

\bdeadlinehollywooddaily\.com

\bdebretts\.com

\bdeseretnews\.com

\bdw\.com/en

\bdigitalspy\.co\.uk

\bdigitalspy\.com

\bthediplomat\.com

\beconomist\.com

\biranicaonline\.org

\bengadget\.com

\bew\.com

\bft\.com

\bforbes\.com

\bfoxnews\.com

\bfoxbusiness\.com

\bgamedeveloper\.com

\bgamasutra\.com

\bgameinformer\.com

\bwyborcza\.pl

\bgeonames\.usgs\.gov

\bgizmodo\.com

\btheglobeandmail\.com

\btheguardian\.com

\bguardian\.co\.uk

\btheguardian\.co\.uk

\bhaaretz\.com

\bhaaretz\.co\.il

\bthehill\.com

\bthehindu\.com

\bhollywoodreporter\.com

\bhuffpost\.com

\bhuffingtonpost\.com

\bhuffingtonpost\.co\.uk

\bhuffingtonpost\.ca

\bhuffingtonpost\.com\.au

\bhuffpostbrasil\.com

\bhuffingtonpost\.de

\bhuffingtonpost\.es

\bhuffingtonpost\.fr

\bhuffingtonpost\.gr

\bhuffingtonpost\.in

\bhuffingtonpost\.it

\bhuffingtonpost\.jp

\bhuffingtonpost\.kr

\bhuffpostmaghreb\.com

\bhuffingtonpost\.com\.mx

\bidolator\.com

\bign\.com

\bindependent\.co\.uk

\bindianexpress\.com

\binsider\.com

\bthisisinsider\.com

\bipsnews\.net

\bipsnoticias\.net

\bipscuba\.net

\btheintercept\.com

\bifcncodeofprinciples\.poynter\.org

\bjacobinmag\.com

\bcatalyst-journal\.com

\bjamanetwork\.com

\bthejc\.com

\bkirkusreviews\.com

\bkommersant\.ru

\bkommersant\.com

\bkommersant\.uk

\blatimes\.com

\bmg\.co\.za

\bthemarysue\.com

\bmetacritic\.com

\bgamerankings\.com

\bmonde-diplomatique\.fr

\bmondediplo\.com

\bmotherjones\.com

\bmsnbc\.com

\bthenation\.com

\bnationalgeographic\.com

\bnbcnews\.com

\bnewrepublic\.com

\bnymag\.com

\bvulture\.com

\bthecut\.com

\bgrubstreet\.com

\bnydailynews\.com

\bnytimes\.com

\bnewyorker\.com

\bnzherald\.co\.nz

\bnewslaundry\.com

\bnewsweek\.com

\bnpr\.org

\bpeople\.com

\bpewresearch\.org

\bpeople-press\.org

\bjournalism\.org

\bpewsocialtrends\.org

\bpewforum\.org

\bpewinternet\.org

\bpewhispanic\.org

\bpewglobal\.org

\bpinknews\.co\.uk

\bplayboy\.com

\bpolitico\.com

\bpolitifact\.com

\bpolygon\.com

\bpropublica\.org

\bqz\.com

\brfa\.org

\brappler\.com

\breason\.com

\btheregister\.co\.uk

\breligionnews\.com

\breuters\.com

\brollingstone\.com

\brottentomatoes\.com

\bsciencebasedmedicine\.org

\bscientificamerican\.com

\bscotusblog\.com

\bnews\.sky\.com

\bsnopes\.com

\bscmp\.com

\bsplcenter\.org

\bspace\.com

\bspiegel\.de

\bsmh\.com\.au

\bthewrap\.com

\btime\.com

\bthetimes\.co\.uk

\bthesundaytimes\.co\.uk

\btimesonline\.co\.uk

\btorrentfreak\.com

\btvguide\.com

\btvguidemagazine\.com

\busnews\.com

\busatoday\.com

\bvanityfair\.com

\bvariety\.com

\bventurebeat\.com

\btheverge\.com

\bvogue\.com

\bvoanews\.com

\bvox\.com

\bwsj\.com

\bwashingtonpost\.com

\bweeklystandard\.com

\bthewire\.in

\bthewirehindi\.com

\bthewireurdu\.com

\bwired\.com

\bwired\.co\.uk

\bnews\.yahoo\.com

\bzdnet\.com

{{collapse bottom}}

{{collapse top|Duplicates to remove from the old list}}

\bbbc\.com

\bbbc\.co\.uk

\bft\.com

\bindependent\.co\.uk

\bjamanetwork\.com

\btelegraph\.co\.uk

\btheguardian\.com

\bthetimes\.co\.uk

{{collapse bottom}}

I participated in that discussion, but see no reason to think the consensus isn't still valid. Suffusion of Yellow (talk) 19:19, 20 May 2023 (UTC)

:@Suffusion of Yellow I was only here as an edit request patrolling admin, the ER wasn't ready - if it's ready now, please reactivate the request to enqueue this again. — xaosflux Talk 19:50, 20 May 2023 (UTC)

::Well, I don't see any problems, but can't hurt to ask {{u|Headbomb}} who probably has RSNP memorized. Does it look like I generated that list properly? Suffusion of Yellow (talk) 23:21, 20 May 2023 (UTC)

:Minor quibble: does the /en after \bdw\.com actually work? I'm not sure exactly how the check works with the whitelist, but I imagine it looks only at the domain (not the path within the host), to prevent citations such as wikipedia.org.spamsite.tld/spamspamspam.html. Certes (talk) 11:10, 21 May 2023 (UTC)

::Oops, it doesn't: see #Protected edit request on 11 April 2021 (updated today) below. Certes (talk) 22:34, 21 May 2023 (UTC)

: I've reactivated the request, per lack of objection. Please:

:* Add all lines from the "RSNP" list above

:* Remove all lines from the "Duplicates" list

: Thanks. Suffusion of Yellow (talk) 20:54, 23 May 2023 (UTC)

:{{done}} Izno (talk) 23:09, 24 May 2023 (UTC)

Adding NCBI to the list

{{resolved}}

  • [https://www.ncbi.nlm.nih.gov www.ncbi.nlm.nih.gov]

Is undeniably a source of reliable peer-reviewed journal articles and is often used in citations (eg. WP:PUBMED) - i.e. same as jstor.org, which is already on the list. 107.190.33.254 (talk) 17:08, 7 May 2020 (UTC)

:The entire nih.gov domain is already on the list - is it not working? — xaosflux Talk 17:48, 7 May 2020 (UTC)

::My bad, then; I only searched for "ncbi" using ctrl+f and couldn't find it. Though I could have sworn it didn't always work; maybe it was some other website as a result of citation templates, or maybe I was adding multiple sources. Anyway, now it works without a doubt, case closed. Thanks, 107.190.33.254 (talk) 18:19, 7 May 2020 (UTC)

Protected edit request on 14 May 2020

{{edit fully-protected|MediaWiki:Captcha-addurl-whitelist|answered=yes}}

Remove "such as those used in {{tl|cite doi}}." from the header and "and in Template:Cite doi" from the comment after doi.org, since Template:Cite doi was deprecated. * Pppery * it has begun... 19:35, 14 May 2020 (UTC)

:{{Bcc|Pppery}}{{done}}. Thanks for submitting this! — Newslinger talk 21:46, 14 May 2020 (UTC)

Protected edit request on 11 April 2021

{{edit fully-protected|MediaWiki:Captcha-addurl-whitelist|answered=yes}}

  • Change every single regex entry to have $ at the end. Two example lines:
  • - \bwikipedia\.org # All language versions of Wikipedia
  • + \bwikipedia\.org$ # All language versions of Wikipedia
  • (...)
  • - \bbbc\.com
  • + \bbbc\.com$

I've indicated with <del> and <ins> what the respective changes for these lines should be, but I think the changes should be self-explanatory.

This change is necessary because the whitelist currently also whitelists URLs such as http://wikipedia.org.phishing.site.example.org/my_virus_url, to give a blatant example of a bad URL. Please do test this yourself, but in my testing on another wiki, such URLs were accepted whenever the regular expression did not end with a $. As the page states: "Every non-blank line is a regex fragment which will only match hosts inside URLs". This means that each fragment can safely end with a $ marker, since the text it is matched against will never contain anything after the last character of the domain name.

I'm not sure if this should be communicated to other international versions of wikipedia, but it seems relevant for you guys to change this since you are the first hit on Google when I search for the system message name ("MediaWiki:Captcha-addurl-whitelist"). Joeytje50 (talk) 17:43, 11 April 2021 (UTC)

: I'm pretty sure this would break it to only allow https://wikipedia.org, and not say https://wikipedia.org/any/page.php. If I'm right, what you actually want is to add a / to the end. Anomie 01:03, 12 April 2021 (UTC)

::If the trailing slash is optional then we need something like \bwikipedia\.org(/.*)?$, though I think this still allows not-wikipedia.org. Certes (talk) 10:14, 12 April 2021 (UTC)

:::The \b boundaries aren't stopping that? — xaosflux Talk 17:58, 16 April 2021 (UTC)

:{{not done}} this needs more review and testing before bulk changes are made. — xaosflux Talk 17:58, 16 April 2021 (UTC)

{{re|Joeytje50|Anomie|Xaosflux|Certes}} Some tests at test2wiki (testwiki's link handling is broken). Each whitelist line is followed by the URLs tried against it; anything not marked (captcha) didn't get a captcha:

  • \bacm\.org
      • https://acm.org
      • https://acm.org.spam.site
      • https://acm.orgg.spam.site
      • https://aacm.org (captcha)
      • https://spam.site/acm.org (captcha)
      • https://acm.org/index.html
  • \bacs\.org$
      • https://acs.org
      • https://acs.org/ (captcha)
      • https://acs.org.spam.site (captcha)
  • \banb\.org/
      • https://anb.org (captcha)
  • \bapa\.org(?:/|$)
      • https://apa.org
      • https://apa.org/
      • https://apa.org/index.html
      • https://apa.org.spam.site (captcha)
      • https://foo.apa.org/
      • https://foo-apa.org/
  • (?<=[./])bbc\.com(?:/|$)
      • https://bbc.com/
      • https://foo.bbc.com
      • https://foo-bbc.com/ (captcha)
      • https://bbc.com.spam.site (captcha)
  • \bdw\.com/en
      • https://dw.com/en
      • https://dw.com/spam (captcha)
      • https://dw.com (captcha)

So yes, the problem is real. It looks like the right format is (?<=[./])some\.good\.site(?:/|$) Not sure what to do here. Adding all those (?:/|$) seems cheap enough. But what about all those (?<=[./]) lookbehinds? Could that cause a performance hit? Suffusion of Yellow (talk) 21:54, 21 May 2023 (UTC)

:Even that will match https://malicious.domain/pretending.to.be.some.good.site/virus.exe, though not https://some.good.site:80/innocent.html. Is the whole URL matched against the pattern? If so, we may need to parse the whole URL, starting the regexp with ^. There's at least one [https://urlregex.com/ whole website] devoted to how to do that properly, or see page 50 of https://www.ietf.org/rfc/rfc3986.txt. Certes (talk) 23:03, 21 May 2023 (UTC)

::No, see the https://spam.site/acm.org example above. Assuming [https://gerrit.wikimedia.org/g/mediawiki/extensions/ConfirmEdit/+/1a440848fa1bc989be606692e90dd770f3459786/SimpleCaptcha/SimpleCaptcha.php#777 this] is the right place, the regexes are bundled together, then prefixed with ^(?:https?:)?\/\/+[a-z0-9_\-.]*. We could use the <noprotocol> option and supply the prefixes ourselves, but would that be even slower? Or we could do the bundling ourselves, but that would make this page as unreadable as some edit filters. Suffusion of Yellow (talk) 23:47, 21 May 2023 (UTC)
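A minimal Python sketch that replays the test2wiki results against the prefix quoted above. It assumes each fragment is simply wrapped in a non-capturing group after that prefix and matched case-insensitively; that assembly is a guess, not something checked against the ConfirmEdit source, and Python's <code>re</code> is only standing in for PCRE here.

```python
import re

# Prefix as quoted in the comment above. The "(?:...)" wrapping and the
# case-insensitive flag are assumptions about how the regex is assembled.
PREFIX = r"^(?:https?:)?//+[a-z0-9_\-.]*"


def is_exempt(fragment, url):
    """Return True if `url` would skip the captcha under this one fragment."""
    pattern = re.compile(PREFIX + "(?:" + fragment + ")", re.IGNORECASE)
    return pattern.search(url) is not None


# fragment -> [(url, exempt on test2wiki?)], copied from the results above.
OBSERVED = {
    r"\bacm\.org": [
        ("https://acm.org", True),
        ("https://acm.org.spam.site", True),
        ("https://acm.orgg.spam.site", True),
        ("https://aacm.org", False),
        ("https://spam.site/acm.org", False),
        ("https://acm.org/index.html", True),
    ],
    r"\bacs\.org$": [
        ("https://acs.org", True),
        ("https://acs.org/", False),
        ("https://acs.org.spam.site", False),
    ],
    r"\banb\.org/": [("https://anb.org", False)],
    r"\bapa\.org(?:/|$)": [
        ("https://apa.org", True),
        ("https://apa.org/", True),
        ("https://apa.org/index.html", True),
        ("https://apa.org.spam.site", False),
        ("https://foo.apa.org/", True),
        ("https://foo-apa.org/", True),
    ],
    r"(?<=[./])bbc\.com(?:/|$)": [
        ("https://bbc.com/", True),
        ("https://foo.bbc.com", True),
        ("https://foo-bbc.com/", False),
        ("https://bbc.com.spam.site", False),
    ],
    r"\bdw\.com/en": [
        ("https://dw.com/en", True),
        ("https://dw.com/spam", False),
        ("https://dw.com", False),
    ],
}

# Any (fragment, url) pair where the sketch disagrees with what was observed.
mismatches = [
    (fragment, url)
    for fragment, cases in OBSERVED.items()
    for url, exempt in cases
    if is_exempt(fragment, url) != exempt
]
print("cases that disagree with test2wiki:", mismatches)

# The two URLs raised in the previous comment, under the hardened format.
hardened = r"(?<=[./])some\.good\.site(?:/|$)"
path_trick = is_exempt(
    hardened, "https://malicious.domain/pretending.to.be.some.good.site/virus.exe"
)
with_port = is_exempt(hardened, "https://some.good.site:80/innocent.html")
print("path trick exempt:", path_trick, "| explicit port exempt:", with_port)
```

If the assumed assembly is right, the prefix's host-only character class is what stops a whitelisted name from matching inside a path, while a URL with an explicit port still gets a captcha under the (?:/|$) ending.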

:::{{re|Suffusion of Yellow|Certes}} If the prefix ^(?:https?:)?\/\/+[a-z0-9_\-.]* is added, then that would be an issue in MediaWiki itself, right? You would expect the prefix to require a period at the end, if there is any subdomain preceding the whitelisted domain. Otherwise I'm pretty sure almost every single wiki that has a whitelist is vulnerable to adding a link to http://fake-wikipedia.org ([https://regex101.com/r/4VKgnv/1 demo]). A simple \b is not sufficient, due to the existence of the dash in domain names.

:::So regardless of this protected edit request, I'd say MediaWiki should change the prefix to ^(?:https?:)?\/\/+([a-z0-9_\-.]*\.)* to enforce the period at the end. Let me know what you guys think about that.

:::Regarding this edit request, I'd say the testing done by Suffusion of Yellow is pretty conclusive that some changes are needed. The lookbehind is required because of the aforementioned issue with hyphens (simple \b is insufficient), and the lookahead for the trailing slash or string terminator is required because otherwise wikipedia.org.spam.site would be whitelisted as well. I haven't re-enabled the edit request template at the top, but if anyone knows what the impact would be on performance, I think this request can be re-enabled. If performance is impacted significantly, I think the aforementioned change to MediaWiki software is even more important, and if lookbehinds are impacting performance, I'd assume changing the lookbehind to (/|$) as a regular capturing group would work as well.

:::The updated edit request is now:

::::At the start of every line: \b(?<=[./])

::::At the end of every line: (?:/|$)

:::Joeytje50 (talk) 11:49, 29 January 2024 (UTC)
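A rough Python sketch of the two changes described above, side by side: the per-line rewrite from the updated request and the suggested MediaWiki prefix. <code>harden_line</code> is a hypothetical helper, and dropping an existing leading \b before adding the new start is one reading of "at the start of every line". It also assumes no fragment contains a literal #, and that fragments are wrapped in a group after the prefix and matched case-insensitively.

```python
import re

CURRENT_PREFIX = r"^(?:https?:)?//+[a-z0-9_\-.]*"        # quoted in this thread
PROPOSED_PREFIX = r"^(?:https?:)?//+([a-z0-9_\-.]*\.)*"  # suggested software change


def harden_line(line):
    """Apply the updated request to one whitelist line (hypothetical helper)."""
    fragment, hash_mark, comment = line.partition("#")
    fragment = fragment.strip()
    if fragment.startswith(r"\b"):
        fragment = fragment[2:]  # assumption: avoid ending up with a doubled \b
    hardened = r"\b(?<=[./])" + fragment + r"(?:/|$)"
    # Re-attach the trailing comment, if the line had one.
    return hardened + (" " + hash_mark + comment if hash_mark else "")


def is_exempt(prefix, fragment, url):
    """True if `url` would skip the captcha (assumed regex assembly)."""
    return re.search(prefix + "(?:" + fragment + ")", url, re.IGNORECASE) is not None


old = r"\bwikipedia\.org"
new = harden_line(old)
print(harden_line(r"\bwikipedia\.org # All language versions of Wikipedia"))

urls = [
    "https://en.wikipedia.org/wiki/Foo",
    "http://wikipedia.org.phishing.site.example.org/my_virus_url",
    "http://fake-wikipedia.org",
]
# Columns: old line + current prefix, old line + proposed prefix,
# hardened line + current prefix.
table = {
    url: (
        is_exempt(CURRENT_PREFIX, old, url),
        is_exempt(PROPOSED_PREFIX, old, url),
        is_exempt(CURRENT_PREFIX, new, url),
    )
    for url in urls
}
for url, row in table.items():
    print(url, row)
```

Under these assumptions the prefix change alone only addresses the hyphen case, and the trailing (?:/|$) is still needed for the wikipedia.org.phishing.site case. By construction that ending also rejects a host followed directly by a port, ? or #.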

::::Thanks, that looks good to me. It's hard to be sure without analysing the code which will apply the regexp, but I am hopeful that it will work without side effects. Certes (talk) 13:48, 29 January 2024 (UTC)

Protected edit request on 20 May 2023

{{edit fully-protected|MediaWiki:Captcha-addurl-whitelist|answered=yes}}

Please add:

\btoolforge\.org

I assume this will be uncontroversial; wmflabs is already there. Suffusion of Yellow (talk) 00:22, 20 May 2023 (UTC)

:{{done}} — xaosflux Talk 01:07, 20 May 2023 (UTC)

Protected edit request on 1 June 2023

{{edit fully-protected|MediaWiki:Captcha-addurl-whitelist|answered=yes}}

Please add the following URLs. Except for books.google.com and cnbc.com, these are auto-generated by various CS1 templates when the required IDs are passed to them; see Template:Citation Style documentation/id2:

\bapi\.semanticscholar\.org

\barxiv\.org

\bbiorxiv\.org

\bbooks\.google\.com

\bciteseerx\.ist\.psu\.edu

\bcnbc\.com

\bhdl\.handle\.net

\blccn\.loc\.gov

\bmathscinet\.ams\.org

\bopenlibrary\.org

\bosti\.gov

\bpapers\.ssrn\.com

\btools\.ietf\.org

\bui\.adsabs\.harvard\.edu

\bzbmath\.org

93.72.49.123 (talk) 14:50, 1 June 2023 (UTC)

:{{done}} — Martin (MSGJ · talk) 12:28, 13 June 2023 (UTC)

Protected edit request on 8 June 2024

{{edit fully-protected|MediaWiki:Captcha-addurl-whitelist|answered=yes}}

Please add Google and Bing to the list:

\bgoogle\.com

\bbing\.com

Since the {{t|AfC submission/pending}} template includes links to the search engines through the {{t|find sources}} invocation, unconfirmed users are forced to enter captchas when submitting drafts. Unconfirmed users have a rate limit of 8 edit attempts per minute, which is not much. The counter is incremented every time an edit is interrupted due to a captcha requirement, and also every time a captcha entered is incorrect. According to the [https://grafana.wikimedia.org/d/IlK0cZbSk/gadget-stats?orgId=1&var-prefix=gadget_afcsw&var-metric=All&from=1716163200000&to=1717372799000 metrics] collected from the submission wizard, 10% of all submits fail with a rate limit error. The issue has also been reported by users: Wikipedia talk:WikiProject Articles for creation#Rate limit issue, Wikipedia talk:WikiProject Articles for creation/Submission wizard#Need help draft.

Links to search results don't help with SEO or otherwise have much spam potential. – SD0001 (talk) 06:39, 8 June 2024 (UTC)

:This seems like a bad idea. General purpose commercial search engines like Google and Bing are certainly not reliable sources and shouldn't be getting linked to; change the template to fix problem with this one use case. — xaosflux Talk 09:26, 8 June 2024 (UTC)

::Are you saying that {{t|find sources}} should not link to Google or Bing? – SD0001 (talk) 11:05, 8 June 2024 (UTC)

:::Or {{tl|AfC submission/pending}} could not transclude {{tl|find sources}}… jlwoodwa (talk) 04:26, 9 June 2024 (UTC)

::::Yup, more along that. Improving that workflow seems like a better idea. — xaosflux Talk 12:39, 9 June 2024 (UTC)

::So the idea is to use the captcha system to generate friction for editors trying to add search engines? Sounds like it is too broad if it is also generating friction when using official templates such as {{t|AfC submission/pending}}. Not sure what the performance cost would be, but an edit filter could potentially warn against this with a better warning message and fewer false positives. The regex would be something like <code><nowiki><ref[^>]*>[^<]+google\.com</nowiki></code>. Although I suppose this would only catch refs and not external links. Hmm. –Novem Linguae (talk) 08:48, 9 June 2024 (UTC)
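A quick Python sketch of what such a filter pattern would and would not catch. It assumes the intended regex is <code><nowiki><ref[^>]*>[^<]+google\.com</nowiki></code>; the opening <code><nowiki><ref[^></nowiki></code> part is a reconstruction. The sample wikitext strings are invented, and Python's <code>re</code> is only approximating the edit filter's regex engine.

```python
import re

# Assumed reconstruction of the pattern suggested in the comment above.
REF_SEARCH_ENGINE = re.compile(r"<ref[^>]*>[^<]+google\.com", re.IGNORECASE)

# Invented wikitext samples for illustration.
samples = {
    "ref with bare URL": "<ref>https://www.google.com/search?q=example</ref>",
    "named ref": '<ref name="g">[https://www.google.com/search?q=example results]</ref>',
    "plain external link": "[https://www.google.com/search?q=example results]",
    "ref citing another site": "<ref>https://example.org/page</ref> See google.com.",
}

# True means the filter pattern would fire on that sample.
results = {
    label: REF_SEARCH_ENGINE.search(wikitext) is not None
    for label, wikitext in samples.items()
}
for label, caught in results.items():
    print(f"{label}: {caught}")
```

This is consistent with the caveat above: the pattern only fires when the search engine domain appears inside a ref tag, so a bare external link is not caught.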

content model

{{ping|Pppery}} have you verified that this is still working after your content model change? Not sure why that was even necessary? — xaosflux Talk 19:23, 30 January 2025 (UTC)

: Yes. Or at least I verified that my additions for TWA still work. The reason I changed the content model was that it felt silly to have a "wikitext" page that just added an unsightly "#" at the beginning to make it not wikitext. * Pppery * it has begun... 19:24, 30 January 2025 (UTC)

::If it still works guess no big deal. In general, I think these are expected to be in default wikitext (e.g. MediaWiki:Spam-whitelist) - doubt this content model will clash with upstream software improvements, but if it is just because you don't like the way it showed on the page don't think that is needed. — xaosflux Talk 15:53, 31 January 2025 (UTC)

::: MediaWiki:Spam-whitelist should be plaintext contentmodel too. All of them should be. * Pppery * it has begun... 16:42, 31 January 2025 (UTC)

::::The workflow of having to make a special content model change to allow that software to work wouldn't be a good thing (for why they should be in that content model). — xaosflux Talk 19:43, 1 February 2025 (UTC)

::::: The software "works" regardless of the content model. It's to make it look less ugly. * Pppery * it has begun... 17:41, 2 February 2025 (UTC)

:::::: T377334. * Pppery * it has begun... 18:29, 19 February 2025 (UTC)