Wikipedia:Request a query/Archive 1

{{Automatic archive navigator}}

Redlinked categories with a history of deletion

I'd love a report on currently populated, redlinked categories with a history of deletion - similar to Wikipedia:Database reports/Deleted red-linked categories. This allows for easier spotting of stuff that was never migrated after a merge/rename, as well as many categories that were deleted per C1 and are now eligible for restoration, among other things. Unfortunately that report had an error and is only showing very few entries, and it doesn't appear that it will be fixed anytime soon after I brought up the issue, so maybe someone here can re-run the report. (I'm not sure where reports like this usually end up, but I would suggest simply replacing that page with the final report so others will be able to find it easily, instead of using a userspace page or whatnot.) Thanks, VegaDark (talk) 19:03, 14 January 2017 (UTC)

:Hah, I think this page was specifically set up to distinguish itself from Wikipedia:Database reports or one of its many, many talk pages. I can't remember what went wrong with {{dbr|Deleted red-linked categories}}. The page history says it was last updated in November 2016? --MZMcBride (talk) 20:06, 14 January 2017 (UTC)

::Yeah, the November report only shows 50-something categories. It's blatantly wrong. Not sure what happened to the report but it's only showing a fraction of the categories the report is intended to show. If you want to fix that, I'm all for it :D VegaDark (talk) 04:09, 15 January 2017 (UTC)

::: I think the issue we were/are hitting is phabricator:T131266. I tweaked the report to work around this issue, but the consequence is that we're now hitting MediaWiki's api.php instead. Doing this many times, without batching the requests, makes the script take a lot longer. We'll see what happens in a few hours. --MZMcBride (talk) 07:19, 16 January 2017 (UTC)

::: Hmm, or not. The report finished in about 49 minutes with about 1,100 results. I guess the api.php queries were faster than I thought. I was assuming one per second, but I think we went faster than that. There are about 31,000 red-linked categories, if you include categories that have not been previously deleted. --MZMcBride (talk) 07:29, 16 January 2017 (UTC)

::::Thanks! I think the red-linked categories report is also useful, but with 31k it's not particularly manageable - no end in sight for that cleanup task. VegaDark (talk) 03:23, 18 January 2017 (UTC)
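For anyone revisiting this thread from the archive: the actual report ended up routed through api.php to dodge phabricator:T131266, but a pure-SQL version of what was asked for would look roughly like the sketch below - untested, with column names assumed from the 2017-era enwiki_p replica schema.

```sql
-- Sketch (not the report's actual code): category names that are
-- currently populated, have no category page (i.e. are redlinked),
-- and appear in the deletion log.
SELECT cl.cl_to AS category, COUNT(*) AS members
FROM categorylinks cl
LEFT JOIN page cat
  ON cat.page_namespace = 14        -- Category:
 AND cat.page_title = cl.cl_to
WHERE cat.page_id IS NULL           -- no category page exists
  AND EXISTS (SELECT 1 FROM logging
              WHERE log_type = 'delete' AND log_action = 'delete'
                AND log_namespace = 14 AND log_title = cl.cl_to)
GROUP BY cl.cl_to
ORDER BY members DESC;
```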

OhioLINK Finding Aid Repository

Is it possible to generate a list (with wikilinks) of the subjects of all the biographies (like [http://ead.ohiolink.edu/xtf-ead/view?docId=ead/ODaWU0122.xml;chunk.id=bioghist_1;brand=default this one]) at this website? Note that they are all noindexed. The list could be useful for the Missing articles project to work on. See talk thread (which hasn't got any replies yet, though). 103.6.159.75 (talk) 13:25, 17 January 2017 (UTC)

:I would suggest contacting the website owners to see if they can provide you with a list. There are other things that you could try like writing a spider bot or downloading the entire website, but these can hog the website's bandwidth and might get your IP blocked etc., so it would be much better to get in touch first. The site's owners can probably run a simple SQL query on their database to find the information, which would be a lot less work. Once we have a list of entries it can easily be formatted to make wikilinks or compared against the Wikipedia database. — Mr. Stradivarius ♪ talk ♪ 04:29, 18 January 2017 (UTC)

BSicons

{{Archive top|reason=Done by {{u|JJMC89}} on Commons. Jc86035 (talk) Use {{re|Jc86035}} to reply to me 10:24, 25 January 2017 (UTC)}}

Are there ways to (a) make a list of Commons files with page name beginning with File:BSicon_ which were uploaded, reuploaded, edited or created as redirects in the past {{math|x}} (or 7) days, (b) make a list of Commons redirects with page name beginning with File:BSicon_ or (c) make a list of Commons files with page name matching the regex ^File:BSicon .+(l|r)(f|g).*\.svg$? (See also this bot request. It would help very much if these were automated, but I have no experience with making bots or using Tool Labs.) Thanks, Jc86035 (talk) Use {{re|Jc86035}} to reply to me 05:15, 15 January 2017 (UTC)

(b) and (c) can probably be done with PetScan and a text editor, actually. Jc86035 (talk) Use {{re|Jc86035}} to reply to me 14:08, 15 January 2017 (UTC)

Pinging {{u|AlgaeGraphix}}. Jc86035 (talk) Use {{re|Jc86035}} to reply to me 14:09, 15 January 2017 (UTC)

{{Archive bottom}}

Any talk pages of monthly maintenance categories?

Can anyone compile a list of all talk pages of categories in :Category:Monthly clean up category counter, if any exist? 103.6.159.67 (talk) 14:47, 24 January 2017 (UTC)

: I have posted this report at User:Topbanana/Monthly_cleanup_talk_pages for you. 143 of the 8879 relevant category pages have talk pages. - TB (talk) 10:01, 21 February 2017 (UTC)

hi, any help is appreciated (or suggestion)

I'm a member of WikiProject Medicine. Basically, this happened to us [https://en.wikipedia.org/w/index.php?title=User_talk:West.andrew.g&diff=765479254&oldid=765478719], so we have the source code but need someone's help to produce the 2016 version (of the 2015 one [https://en.wikipedia.org/wiki/Wikipedia:WikiProject_Medicine/Stats/Top_medical_editors_2015_(all)]); I can assist in whatever is needed. ...thank you--Ozzie10aaaa (talk) 17:56, 17 February 2017 (UTC)

:{{Re|Ozzie10aaaa}} To track down this kind of expert, try posting at Wikipedia:Request a query. -- John of Reading (talk) 19:49, 17 February 2017 (UTC)

::will do, and thanks--Ozzie10aaaa (talk) 20:59, 17 February 2017 (UTC)


  • as indicated above I'm posting here and would appreciate any help, thank you--Ozzie10aaaa (talk) 21:02, 17 February 2017 (UTC)
  • Happy to take a look at the source {{u|Ozzie10aaaa}} and make a decision as to if it's something I can do -- Samtar talk · contribs 13:02, 23 February 2017 (UTC)

:::thank you--Ozzie10aaaa (talk) 13:13, 23 February 2017 (UTC)

Stats Table required

I'm looking for ways to get the huge and still increasing backlog at Special:NewPagesFeed reduced. In November I pioneered the rollout of the New Page Reviewer right. After three months I now need to know how it is performing and what the next steps will be. To do this I need the following data, in sortable wikitable form, for New Page Reviewers:

  1. user name
  2. date of 'promotion'
  3. number of Curations
  4. date of last curation
  5. date of last edit.
  6. Admin yes/no

There are currently around 300 or so non-admin Reviewers in this user group; all the rest are admins (who, besides myself, are unlikely to be patrolling new pages with the Curation tool). Is this something someone could quarry for me fairly quickly? I'm sorry I have to ask, but I don't have a clue about regex, SQL, or data mining - but once it's been done once and I know how, I could repeat the operation when I need it again. Thanks, Kudpung กุดผึ้ง (talk) 12:23, 3 February 2017 (UTC)

:{{ping|Kudpung}} found it... I'll see what I can do -- Samtar talk · contribs 20:57, 9 February 2017 (UTC)

::I'm watching it. I'll see what I can do, I'll get started when I get home (circa 2 hours from now). Iazyges Consermonor Opus meum 19:17, 16 February 2017 (UTC)

::{{ctop|useful information for interested editors (for request "Stats Table required")}}

= Tables needed =

  • enwiki_p.logging - to find the rights change (log_type = 'rights')
  • enwiki_p.user - general user information
  • enwiki_p.user_groups - to find users who currently have the 'patroller' right

{{cbot}}
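A rough way to combine the three tables listed in the collapsed box above - untested, using the pre-actor-migration column names (log_user, log_title pointing at the User: page of the affected account) that applied in 2017:

```sql
-- Sketch: current holders of the 'patroller' group and their most
-- recent rights-log entry. Does not yet distinguish grants from
-- removals; see the discussion below for the log_params caveats.
SELECT u.user_name, MAX(l.log_timestamp) AS last_rights_change
FROM user_groups ug
JOIN user u ON u.user_id = ug.ug_user
LEFT JOIN logging l
  ON l.log_type = 'rights'
 AND l.log_namespace = 2                          -- User:
 AND l.log_title = REPLACE(u.user_name, ' ', '_')
WHERE ug.ug_group = 'patroller'
GROUP BY u.user_name;
```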

: Howdy. I've popped a draft version of this report up at User:Kudpung/NPR activity for you. I suspect a more rigorous definition of what constitutes a 'curation' is needed. Good luck. - TB (talk) 18:05, 22 February 2017 (UTC)

::{{u|kudpung}} ping. --Izno (talk) 20:18, 23 February 2017 (UTC)

:::Hi {{U|Topbanana}}. A 'Curation' is a use of the Curation Tool to tag an article for deletion or maintenance, or to pass it as 'patrolled' and fit for retention. Curations are logged in a user's Page Curation log, as at [https://en.wikipedia.org/w/index.php?title=Special%3ALog&type=pagetriage-curation&user=Kudpung&page=&year=&month=-1&tagfilter=&hide_thanks_log=1&hide_patrol_log=1&hide_tag_log=1&hide_review_log=1]. Kudpung กุดผึ้ง (talk) 21:14, 23 February 2017 (UTC)

::::{{ping|Topbanana}} I had a look into this last week, but my SQL skills weren't good enough to get any useful results. However, I found out a couple of tricks that could be useful. Page curations are done using the PageTriage extension, which adds (among other things) the pagetriage-curation and pagetriage-deletion log_type values to the logging table. See [https://quarry.wmflabs.org/query/16456 this query] for an example where I've found my own curations on enwiki. Also, [https://quarry.wmflabs.org/query/16451 this query] may or may not be useful as a hackish way to find when users were given the patroller user right. (If a user was given the right, had it removed, and then was given it again, though, they will show up twice.) Best — Mr. Stradivarius ♪ talk ♪ 04:29, 24 February 2017 (UTC)

:::::In my experience, there's always a matching pagetriage-curation row for each pagetriage-deletion one (try sorting your results on timestamp), so the latter can be disregarded. Query 16451 doesn't just find promotions that add patroller, it finds all rights changes where the user has patroller afterward. quarry:query/16640 finds just ones where it was actually added, but still gives multiple entries for - as you say - users who had the right, had it removed, and then got it back (as actually happened with {{noping|Nordic Nightfury}} and {{noping|Zppix}}).{{pb}}{{ping|Topbanana}} Your query finds not just patrols, but autopatrols too - they all got action='patrol', though I seem to recall that was fixed very recently. (Some discussion here.) {{noping|Kaldari}}'s results are illustrative - [https://en.wikipedia.org/w/index.php?title=Special%3ALog&type=patrol&user=Kaldari&page=&year=&month=-1&tagfilter=&subtype=patrol this log] exactly matches his 1204 results in your query. You're also only showing results from users who currently explicitly have the patroller right, which no admin does.{{pb}}Also not specifically asked for here, but implicit in the request, is that he was only looking for results since page curation was restricted to users with the patroller right, which happened sometime in mid-November.{{pb}}Results from pagetriage-curation are here. Results from the patrol log would probably be useful too, but there's no efficient way to filter out the autopatrols without butting up against Quarry's half-hour timeout cap and lack of temp tables. I'll leave that to someone with toolserver access. —Cryptic 06:43, 24 February 2017 (UTC)

::::::Small nitpick - query 16451 does work, as the NOT LIKE '%"patroller"%"patroller"%' check excludes entries that had the patroller right both before and after whatever rights change is being logged. The regex in query 16640 is more robust, though (e.g. it won't fail if the patroller right somehow gets listed in the newgroups array twice), so using that one is probably a better idea. — Mr. Stradivarius ♪ talk ♪ 08:24, 24 February 2017 (UTC)

:::::::I'd've sworn there was only the first LIKE clause when I looked at it, my mistake. —Cryptic 02:55, 2 March 2017 (UTC)
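Distilling the thread above into a single sketch: per-reviewer curation counts from the PageTriage log entries, skipping the redundant pagetriage-deletion rows and restricting to after curation was limited to the patroller group in mid-November 2016. Untested, and log_user_text is the pre-actor-migration column:

```sql
-- Sketch of the curation-count half of the requested table.
SELECT log_user_text AS reviewer,
       COUNT(*)       AS curations,
       MAX(log_timestamp) AS last_curation
FROM logging
WHERE log_type = 'pagetriage-curation'
  AND log_timestamp > '20161115000000'  -- approximate restriction date
GROUP BY log_user_text
ORDER BY curations DESC;
```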

Redirects to Special:

{{Archive top|reason=A list of all pages in any namespace that hard or soft redirect to the special namespace as of March 2, 2017 is available at User:Godsy/R to special for all those who may be interested. — Godsy (TALKCONT) 13:12, 2 March 2017 (UTC)}}

An alphabetical list of all pages in any namespace that redirect (including both hard and soft redirects; separate lists if possible, though together would be okay) to the special namespace would be useful if possible. This discussion sparked my interest in this type of redirect. — Godsy (TALKCONT) 02:24, 2 March 2017 (UTC)

:Hard redirects: quarry:query/16642. Soft redirects are indistinguishable from any other internal links. (Unless maybe you just want ones named as parameters to a specific template? Though that's still difficult to query for.) —Cryptic 02:51, 2 March 2017 (UTC)
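The linked quarry query isn't reproduced here, but a hard-redirect lookup of this kind plausibly reduces to the following untested sketch (the Special: namespace is number -1 in the redirect table):

```sql
-- Sketch: all pages, any namespace, whose hard-redirect target is
-- in the Special: namespace, alphabetical within namespace.
SELECT p.page_namespace, p.page_title, rd.rd_title AS target
FROM redirect rd
JOIN page p ON p.page_id = rd.rd_from
WHERE rd.rd_namespace = -1
ORDER BY p.page_namespace, p.page_title;
```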

::{{reply|Cryptic}} How about all pages that contain a transclusion of Template:Soft redirect and the word "Special:"? If not, there are only about 4600 transclusions, so it wouldn't take too long for me to sort through them manually. Thanks for quarry:query/16642. — Godsy (TALKCONT) 03:08, 2 March 2017 (UTC)

:::The labs databases don't have wikitext, but I can find pages that both transclude Template:Soft redirect (or [https://en.wikipedia.org/w/index.php?title=Special:WhatLinksHere/Template:Soft_redirect&hidetrans=1&hidelinks=1 one of its redirects]) and have one or more direct links to the Special: namespace. I've done so at quarry:query/16643. There aren't any results at all, which is perplexing; I'd expect dozens, at least. (It works fine for other namespaces - the link from User talk:Xbspiro to Wikipedia:RS shows up in the first ten results for links to Wikipedia:, for example - so I think the query is correct.) If you do find any soft redirs to Special: using that template, I'd be very interested to hear which. —Cryptic 03:46, 2 March 2017 (UTC)

::::{{reply|Cryptic}} I've begun compiling a list of such redirects at User:Godsy/R to special#Soft redirects (7 so far; haven't gotten through the first 500 yet). So far, they all use the redirect Template:Softredirect; perhaps that is why. — Godsy (TALKCONT) 03:55, 2 March 2017 (UTC)

:::::(There's no need to ping me.) That's not the problem. The only link from Wikipedia:Blocked IPs on your list that's showing up is to Wikipedia:Soft redirect (quarry:query/16644). Which I guess makes sense, because Special:Whatlinkshere/Special:Ipblocklist doesn't show anything, either. Bleah. If the data's there, I don't know how to get at it, sorry. —Cryptic 04:05, 2 March 2017 (UTC)

::::::No worries and thanks again. — Godsy (TALKCONT) 04:23, 2 March 2017 (UTC)

{{Archive bottom}}

Large redirects

A list of redirects over a kilobyte in size, ordered by namespace then size, in all namespaces except User and User Talk. (there is a DBR for this, but it is stale). If possible, exclude pages in > 20 categories to avoid things like {{noredirect|Constitution of the World Health Organization}}. Thanks! — Train2104 (t • c) 21:22, 5 March 2017 (UTC)

:quarry:query/16645. This doesn't exclude by category. Download Data > Wikitable, then paste onto a wiki page, to make the namespaces legible. —Cryptic 23:57, 5 March 2017 (UTC)

::Thanks! I should've said ordered by namespace then size descending, but I can work with this. Created at User:Train2104/Large redirects. — Train2104 (t • c) 01:12, 6 March 2017 (UTC)
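Again, the quarry query itself isn't shown above, but the request as finally stated (namespace, then size descending, User and User talk excluded, no category filter) comes down to something like this untested sketch:

```sql
-- Sketch: redirects over one kilobyte outside namespaces 2 (User)
-- and 3 (User talk).
SELECT page_namespace, page_title, page_len
FROM page
WHERE page_is_redirect = 1
  AND page_len > 1024
  AND page_namespace NOT IN (2, 3)
ORDER BY page_namespace, page_len DESC;
```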

Bad project space moves

I'm not sure if the move log is available in SQL, but I would like to request a list of moves into project space from User, Draft, or article space where the target still exists, and whether or not the target is now a redirect. A lot of new users are choosing "Wikipedia:" in the move dropdown thinking that it's article space, and I'm finding more every day. I've put in a request at WP:EFR to try to stop new ones, but there's a lot of cleanup to do. – Train2104 (t • c) 06:15, 6 April 2017 (UTC)

:It's there, but data about the move target isn't easily accessible from sql because it's not stored directly - that is, I can get you a list of moves from those namespaces to Wikipedia:, but not filter by whether the destination page still exists or is a redirect. The five most recent such moves are at quarry:query/16646, with the move target stuffed as plaintext into the middle of that horrid log_params column. Full results eventually at quarry:query/16647. (There's a good chance that'll timeout; if so, I'll try again and limit it by date.) Someone with Labs access and permissions to create temp tables could do better. —Cryptic 07:10, 6 April 2017 (UTC)

:{{ping|Train2104}} Postprocessed results at User:Cryptic/query/16647. That'll at least show you if the destination still exists, and I believe there's user scripts floating around to restyle links that are redirects. Feel free to edit or move that page as convenient.{{pb}}This isn't exhaustive - the way data for either move logging or just the destination page is stored was apparently changed in March 2012, which is the earliest result I get; and I may well have gotten false negatives from after that, too, from the way I parsed log_params in the initial query. —Cryptic 07:53, 6 April 2017 (UTC)
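The first step Cryptic describes - pulling candidate moves before postprocessing - would look roughly like the untested sketch below. As noted above, the destination is buried as plain text in the serialized log_params blob, so the LIKE filter is crude and existence/redirect checks on the target must happen afterwards:

```sql
-- Sketch: move-log entries out of article (0), User (2), or
-- Draft (118) space whose parameters mention a Wikipedia: target.
SELECT log_timestamp, log_namespace, log_title, log_params
FROM logging
WHERE log_type = 'move'
  AND log_namespace IN (0, 2, 118)
  AND log_params LIKE '%Wikipedia:%'
ORDER BY log_timestamp DESC;
```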

::Thanks! Unfortunately AWB's "Links on page (blue links only)" stops after 5075 links, which excludes WP space since they're done in alphabetical order. But this is good enough for now. Pinging {{ping|Steel1943}}, here are more things to clean up! – Train2104 (t • c) 12:54, 6 April 2017 (UTC)

:::{{Ping|Train2104}} I seriously wish there was some way to restrict page creations or moves into the "Wikipedia:" namespace. Even something as simple as a prompt ... for the editor to confirm this is really what they want to do ... would be adequate for a temporary (maybe permanent) resolution. Steel1943 (talk) 13:02, 6 April 2017 (UTC)

::::I asked for it at WP:EFR, but no response yet. May also want to post at WP:ENB since a disproportionate number of these (at least when I patrol the move log, not so much in this list) seem to be students. – Train2104 (t • c) 13:04, 6 April 2017 (UTC)

Request for title query

I am requesting for a list of all pages containing any of the following phrases:

  1. version 1
  2. version 2
  3. version 3
  4. version 4

...with any capitalization of the word "version". Also, if possible (not necessary, but helpful), please sort the query into the following 4 categories:

  1. Redirects in the "(article)" namespace
  2. Non-redirects in the "(article)" namespace
  3. Redirects in any non-article namespace
  4. Non-redirects in any non-article namespace

This page can be created at User:Steel1943/VersionPageList. Thanks! Steel1943 (talk) 16:05, 24 April 2017 (UTC)
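This request never got a reply in the archive, but it maps onto a fairly direct query - untested, and note that titles are stored with underscores and a case-sensitive binary collation, so RLIKE with a character class covers the usual capitalizations (it won't catch oddities like "VeRsIoN"):

```sql
-- Sketch: titles containing "version 1" through "version 4",
-- ordered into the four requested groups: mainspace redirects,
-- mainspace non-redirects, then the same for other namespaces.
SELECT page_namespace, page_is_redirect, page_title
FROM page
WHERE page_title RLIKE '[Vv]ersion_[1-4]'
ORDER BY (page_namespace != 0),   -- mainspace first
         page_is_redirect DESC,   -- redirects before non-redirects
         page_namespace, page_title;
```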

Non-redirect subpages of redirect pages

This situation occurs when someone moves a page that has subpages but without checking the "Move subpages" box (un-checked is default). The result leaves the subpages stranded at the old name, which can break links from pages at the new name. See Talk:Wireless power transfer#Archives not listed for one visible effect of this situation when it happens for pages in the Talk: namespace, where subpages become lost archives. Presumably same problem for files becoming unattached in User: and User talk: if a user is renamed. DMacks (talk) 18:15, 7 May 2017 (UTC)

:Well, I tried at quarry:query/18428, which predictably timed out. It's probably feasible with a temporary table - select non-redirects containing '/' in the title, add a column with the base page name, select rows where the base page is a redirect - but I don't have permissions for that. —Cryptic 21:37, 7 May 2017 (UTC)

:{{reply|DMacks}} Here is my query at quarry:query/18429. It excludes the Main (article) namespace. It uses a temporary table that I created. --Bamyers99 (talk) 20:32, 8 May 2017 (UTC)

::And I've refined that to quarry:query/18457, since we want to find e.g. Talk:.45/38 Auto Pistol, not Talk:.45. {{ping|DMacks}} To make those {{ns:1}}: namespaces legible, Download data > Wikitable and paste it into a sandbox on-wiki. —Cryptic 23:40, 8 May 2017 (UTC)

:::Belated thanks! Real Life should settle down soon and I'll get back to working on this situation. DMacks (talk) 05:39, 23 May 2017 (UTC)
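The temp-table approach Bamyers99 and Cryptic converged on isn't reproduced above, but a single-query version of the refined logic (match the base page before the last slash, so Talk:.45/38 Auto Pistol/Archive 1 checks Talk:.45/38 Auto Pistol rather than Talk:.45) would look like this untested sketch; without a temporary table it will likely time out on Quarry, as the thread notes:

```sql
-- Sketch: non-redirect subpages whose immediate parent page is a
-- redirect, excluding mainspace (where slashes are just titles).
SELECT sub.page_namespace, sub.page_title
FROM page sub
JOIN page base
  ON base.page_namespace = sub.page_namespace
 AND base.page_title = LEFT(sub.page_title,
       CHAR_LENGTH(sub.page_title)
       - CHAR_LENGTH(SUBSTRING_INDEX(sub.page_title, '/', -1)) - 1)
WHERE sub.page_title LIKE '%/%'
  AND sub.page_is_redirect = 0
  AND base.page_is_redirect = 1
  AND sub.page_namespace != 0;
```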

New search results feature

In the ongoing effort to improve the discoverability of all the world's knowledge, the [https://www.mediawiki.org/wiki/Wikimedia_Discovery#Search Discovery] team is proposing an enhancement to the search results page on Wikipedia ([https://www.mediawiki.org/wiki/Special:Search Special:Search]). The goal of this feature is to display related content for each individual search result returned from query, so that even when the search result itself isn't maybe entirely useful, perhaps its related content would be. We would like your early feedback on this new feature—more information can be found on [https://www.mediawiki.org/wiki/Cross-wiki_Search_Result_Improvements/explore_similar MediaWiki] about the explore similar functionality and testing can be done in your own browser, using step by step [https://www.mediawiki.org/wiki/Cross-wiki_Search_Result_Improvements/self-guided_testing self-guided testing] instructions. Cheers, DTankersley (WMF) (talk) 21:01, 31 May 2017 (UTC)

Newly created Presbyterian articles

Is there a way to search for articles created within the past week that contain "Presbyterian" in the title? Many thanks. Anna Frodesiak (talk) 19:17, 17 July 2017 (UTC)

:{{ping|Anna Frodesiak}} Nothing in mainspace. quarry:query/20320. —Cryptic 20:30, 17 July 2017 (UTC)

::Thank you, Cryptic. Now, if I click that once a week, will it continue to work? Best, Anna Frodesiak (talk) 20:33, 17 July 2017 (UTC)

:::You'd have to rerun the query. To do so:

:::*Go to https://quarry.wmflabs.org/

:::*Click the big blue "Login with Wikimedia" button

:::*Click "Allow" when OAuth on metawiki prompts you for permission

:::*Go back to quarry:query/20320

:::*Click the blue "Fork" button near the upper-right; this will create a copy of the query owned by you, which you'll be able to find again later from your profile page from the dropdown menu in the upper right

:::*Then click "Submit query" on your copy of the query.

:::Subsequent runs will only need the last step. —Cryptic 20:41, 17 July 2017 (UTC)

:::{{ping|Anna Frodesiak}} And quarry:query/20321 is a bit more suited for reuse. (It produces readable output for more namespaces, and shows results out to the 30-day limit for recentchanges.) —Cryptic 20:55, 17 July 2017 (UTC)

::::Thank you again, Cryptic. I clicked "publish" somewhere. Have I crashed the server now? :) Please undo that if it was a bad thing. Many thanks again. This will really help find recent creations of a persistent sock. I'm grateful! :) Anna Frodesiak (talk) 21:07, 17 July 2017 (UTC)

:::::It's harmless - all "publish" does is to put the query into a different section on your user profile on Quarry. —Cryptic 21:11, 17 July 2017 (UTC)

::::::Ah, okay. Thank you again, Cryptic. :) Anna Frodesiak (talk) 21:20, 17 July 2017 (UTC)

Request for argument list

Unsure if this is the best place to ask; if not, could you point me in the right direction? I wonder if it is possible to obtain a list of the used (non-blank and not just a comment) arguments to {{tl|Geobox}} when used for type River. The list of articles to process can be found in :Category:Geobox usage tracking for river type. I do not need the values, just the argument names. Keith D (talk) 19:34, 11 October 2017 (UTC)

Duplicate disambiguation pages

This isn't a query request, more of a plea for help with a query I'm trying to write. I have some SQL experience but I'm new to Quarry. If there's a better place to ask, just let me know.

I'm trying to identify when Foo and Foo (disambiguation) are distinct dab pages (not redirects to each other). I've written [https://quarry.wmflabs.org/query/22276 DabPairs], but it's not completing (which I guess from the previous reply means it is horrendously inefficient and times out after 30 minutes). It's clear from Explain that I'm accessing C1 and P1 efficiently but failing to use an index on P2 and scanning the entire table for each matching row from P1. (Joining to C2 should also be efficient, but I doubt we ever get there.)

One specific question:

  • How do I bludgeon MySQL into using the name_title index on the page table? I've specified a constant page_namespace and an expression for page_title. Initially I used JOIN to let the optimiser work its magic, but it didn't. I've tried using STRAIGHT_JOIN (which successfully enforces the join order I'm aiming for) and FORCE INDEX (which fails, telling me that name_title isn't a key: a clue?) but nothing seems to work.

More general questions:

  1. Is there any way to see the query plan without actually running the SQL? (The Explain button only appears once the SQL is running, and the EXPLAIN statement fails because we don't have the necessary table permissions.)
  2. If not, is there any way to stop a query once it's running and I can see from Explain that it's going to hog resources without producing useful results?

Thanks, Certes (talk) 16:37, 21 October 2017 (UTC)

:{{reply|Certes}}

: Use [https://tools.wmflabs.org/tools-info/optimizer.py SQL Optimizer] to get Explain results.

: Instead of (P1.page_title + "_(disambiguation)"), try concat(P1.page_title, "_(disambiguation)")

: FORCE INDEX doesn't work because queries use views on the underlying tables and views don't propagate the indexes on the underlying tables. --Bamyers99 (talk) 18:48, 21 October 2017 (UTC)

::Thank you {{u|Bamyers99}}! concat brought the query down to about a minute (which is fine, as it won't run regularly) and Optimizer showed that it's now using the name_title index without any prompting from me. Certes (talk) 20:10, 21 October 2017 (UTC)
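Putting the fix into context: the core of the repaired join looks like the untested sketch below. concat() lets the optimizer use the name_title index on the page view, where string "+" (which is numeric addition in MySQL) could not; checking that both pages are actually in the disambiguation category would add categorylinks joins as in the linked DabPairs query.

```sql
-- Sketch: base titles where both Foo and Foo (disambiguation)
-- exist as non-redirect mainspace pages.
SELECT p1.page_title
FROM page p1
JOIN page p2
  ON p2.page_namespace = 0
 AND p2.page_title = CONCAT(p1.page_title, '_(disambiguation)')
WHERE p1.page_namespace = 0
  AND p1.page_is_redirect = 0
  AND p2.page_is_redirect = 0;
```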

encyclopedic text we have versus the amount of discussion text

Has anyone calculated the amount of encyclopedic text we have versus the amount of discussion text? I want to calculate the ratio of discussion that goes into making the article text for the English Wikipedia as a whole. This would include all archived discussions. This is for a planned Wikimania presentation. --RAN (talk) 02:25, 8 November 2017 (UTC)

:Here are the page size totals for the namespaces with more than 1GB of text. Templates just miss out with 972,668,061 but I think I've already counted their transclusions.

{| class="wikitable"
! Namespace !! Bytes
|-
| (Main/Article) || 37,632,427,800
|-
| Talk || 13,025,336,470
|-
| User || 6,862,780,512
|-
| User talk || 39,804,738,147
|-
| Wikipedia || 11,460,273,425
|-
| Wikipedia talk || 2,115,102,631
|}
:Hope that helps, Certes (talk) 12:13, 8 November 2017 (UTC)

::If that was a wikitext query rather than an HTML query, no, the transclusions would not have been counted. I don't think it affects anything too significantly though, since we don't drop oodles of prose into templates. --Izno (talk) 13:40, 8 November 2017 (UTC)

:::It's the "Uncompressed length in bytes of the page's current source text" from SUM(enwiki_p.page.page_len), so you're right: templates not included. Certes (talk) 16:13, 8 November 2017 (UTC)
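Based on Certes's description, the totals above were presumably produced by something like this (untested) aggregation - SUM(page_len) is current-revision source text only, so templates and page history are excluded:

```sql
-- Sketch: total uncompressed source-text bytes per namespace,
-- keeping only namespaces over 1 GB as in the table above.
SELECT page_namespace, SUM(page_len) AS total_bytes
FROM page
GROUP BY page_namespace
HAVING total_bytes > 1073741824
ORDER BY page_namespace;
```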

SVGs without XML declaration

Is it possible to search within SVGs' source code? I would like to find and reupload SVGs on Commons (and maybe other wikis) which do not have an XML declaration. (SVGs without declarations stopped being rendered about 7 years ago but some files still haven't been purged since then.) Jc86035 (talk) 13:25, 29 November 2017 (UTC)

Revisions with broken parent ids

I would like someone to print out a list of revisions on Wikipedia whose rev_parent_id value is a deleted revision id in the archive table. Those revisions are the ones that need to be fixed for T186280, and are caused by T183375. GeoffreyT2000 (talk) 03:44, 25 February 2018 (UTC)

Toolforge SQL Optimizer

I'm here to announce the release of the Toolforge SQL Optimizer (which supersedes toolforge:tools-info/optimizer.py). This tool runs EXPLAIN on your query so you can see how efficient it is. Namely, you'd be looking at the number of "rows" scanned, and whether an index is being used, etc. I noticed our own Wikipedia:Request a query/Schemas failed to indicate how the Toolforge replicas differ from production (I've since updated that page). The Toolforge SQL Optimizer will attempt to point out such mistakes. I hope others find this tool useful MusikAnimal talk 20:06, 1 June 2018 (UTC)

Redirects to [[:fr:]]

I am seeking an alphabetical list of all pages in the mainspace that redirect to a target at :fr: if possible. Cryptic helped me with a similar request; I tried to adjust that code and run it myself {{plainlink|https://quarry.wmflabs.org/query/28936|to no avail}}. — Godsy (TALKCONT) 04:13, 13 August 2018 (UTC)

:Adding fr: to the SELECT clause at the top just changes what gets displayed in the output; you're still asking for redirect targets in the Special: namespace (WHERE rd_namespace = -1). You'd want something like WHERE rd_title LIKE 'fr:%' instead, if they got stored normally, but they apparently don't - there's a rd_interwiki column, from many many years ago before following hard redirects to other wikis was disabled. Querying for WHERE rd_interwiki != '' finds only one such page, User:Thryduulf/R to other wiki.{{pb}}Soft redirects, as before, can't be easily queried for. The same sort of workaround I tried before is at quarry:query/28941. —Cryptic 05:14, 13 August 2018 (UTC)
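For the archive, the rd_interwiki lookup Cryptic describes would be roughly the untested sketch below; as he notes, it finds almost nothing because hard interwiki redirects have long been disabled:

```sql
-- Sketch: mainspace hard redirects carrying any interwiki prefix,
-- with fr: targets sorted first.
SELECT p.page_title, rd.rd_interwiki, rd.rd_title
FROM redirect rd
JOIN page p ON p.page_id = rd.rd_from
WHERE p.page_namespace = 0
  AND rd.rd_interwiki != ''
ORDER BY (rd.rd_interwiki != 'fr'), p.page_title;
```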

Articles without WikiProject on talk page

Per a request at Wikipedia:Help desk#articles without talk pages I created a [https://quarry.wmflabs.org/query/31864 quarry query]. The results can be pasted into a sandbox page and previewed to have wikilinks to the articles. --Bamyers99 (talk) 22:02, 6 December 2018 (UTC)

All articles in two categories (including subcategories)

Hi! I'd much appreciate any help running a query that finds all articles that are in both :Category:European novels (including subcategories) and :Category:Young adult novels (including subcategories). Thank you. /212.112.188.253 (talk) 16:42, 8 February 2019 (UTC)

:[https://petscan.wmflabs.org/?language=en&project=wikipedia&depth=1&categories=European%20novels%0D%0AYoung%20adult%20novels&ns%5B0%5D=1&search_max_results=500&interface_language=en&active_tab= This sounds like a job for Petscan]. —Cryptic 05:40, 9 February 2019 (UTC)

::I agree. You may have to increase the "depth" so that you get all "real" subcategories without false positives such as The Boy Sherlock Holmes. Certes (talk) 11:02, 9 February 2019 (UTC)

Populated places in Austria

I'd like to request a list of the complete contents of the {{cl|Populated places in Austria}} tree (inclusive of subcategories) as of [https://archive.org/download/enwiki-20190220 February 20]. It can be saved to my sandbox, or wherever else such dump reports get saved if there's a standard process for that.

There's a discussion at Wikipedia:Village_pump_(technical)#Populated_places_in_Austria if you need the background context on what happened, but since that discussion exists I don't think it's necessary to repeat the whole story.

Thanks. Bearcat (talk) 16:57, 5 May 2019 (UTC)

:That sounds like an easy job for PetScan, but it gets harder if you want to exclude subcategories which are not about settlements, such as {{cl|Buildings and structures in Eisenstadt}} and {{cl|People from Güssing‎}}. Certes (talk) 00:00, 6 May 2019 (UTC)

::I am pretty sure PetScan does not work off anything but the most recent dumps, if in fact it even looks at the dumps at all (I am pretty sure it works from the DB replicas set up on Toolforge). --Izno (talk) 00:19, 6 May 2019 (UTC)

:Generating a full category tree is problematic in all but the simplest cases. Would a list of the pages that (directly) transcluded {{tl|Infobox Town AT}} and the categories those were in be sufficient? —Cryptic 02:41, 6 May 2019 (UTC)

::That would be fine, if it's possible to do. Bearcat (talk) 15:05, 6 May 2019 (UTC)

:::User:Bearcat/Pages transcluding Infobox Town AT as of 20190220 dump. —Cryptic 17:23, 6 May 2019 (UTC)

::::Thanks. Much appreciated. Bearcat (talk) 15:37, 7 May 2019 (UTC)

:Another alternative is Wikidata, which you can query with SPARQL (tutorial). [https://query.wikidata.org/#SELECT%20%3Fplace%20%3Fpage_titleEN%20WHERE%20%7B%0A%20%20%3Fplace%20wdt%3AP17%20wd%3AQ40.%0A%20%20%3Fplace%20wdt%3AP31%2Fwdt%3AP279%2a%20wd%3AQ486972.%0A%20%20%3Farticle%20schema%3Aabout%20%3Fplace.%0A%20%20%3Farticle%20schema%3AisPartOf%20%3Chttps%3A%2F%2Fen.wikipedia.org%2F%3E.%0A%20%20%3Farticle%20schema%3Aname%20%3Fpage_titleEN.%0A%7D%0ALIMIT%2010000 This WDQS query] may be what you need. (Click the white triangle on blue, bottom left, to run it). Of course, this tells you what Wikidata thinks is an Austrian settlement with a Wikipedia article, not which articles Wikipedia has categorised as Austrian settlements. Certes (talk) 11:42, 6 May 2019 (UTC)

Rcat templates and their redirects

I'd like a list of all pages in :Category:Redirect templates and, for each template in that category, a list of all pages in the Template namespace which redirect to that template. I tried to do this using pywikibot but it was not in any way efficient and ran into frequent errors. I'm trying to make the list used by the Capricorn user script more comprehensive, and such a list would be invaluable. Wug·a·po·des​ 23:07, 14 August 2019 (UTC)

:quarry:query/38397. —Cryptic 00:41, 15 August 2019 (UTC)

:Also, quarry:query/38398. There's gotta be a better way to do it. —Cryptic 01:20, 15 August 2019 (UTC)

::{{re|Cryptic}} Thanks so much! I should have specified that I wasn't too concerned with the subcats, but it's great that you were able to include them. Your way is already far better than my pywikibot script! Wug·a·po·des​ 03:58, 15 August 2019 (UTC)

[[WP:MOSTARTICLES]] query

The creator of this list has retired from maintaining it, and I am taking over. The task is to generate the top 10,000 users by how many articles they have created, i.e. by looking at the first revision of every mainspace page, seeing who authored it, adding up the totals for every user, and taking the top 10,000.

I made a working solution based solely on API calls and data caching, but I suspect it might be faster to run a SQL query (on Toolforge). However, I am not well versed in SQL. Is this a query someone can help me with? Preferably the query returns the top 10,000 results, not 5.7 million. The inclusion of the redirects column is optional; the primary goal is the main article space. Thanks! -- GreenC 18:41, 23 September 2019 (UTC)

:{{re|GreenC}} A Quarry query such as this might do the job. I've artificially limited it to creations today, the top ten users and 60 seconds of run time to get some fast test results, but it should work with the whole table unless it hits some runtime limit. Certes (talk) 19:38, 23 September 2019 (UTC)

::That's not going to work - the creation log only goes back to [https://en.wikipedia.org/w/index.php?title=Special:Log/create&dir=prev&type=create 27 June 2018]. I don't think there's any other way to do it besides querying the whole revision table. —Cryptic 20:19, 23 September 2019 (UTC)

FWIW I ran the [https://quarry.wmflabs.org/query/39153 query for 10,000 rows] (not the top 10,000 users, but I wanted to time it), and extrapolating, it would take about 30 hours to run over the full 5.7 million articles. That is very close to what the API+cache method takes, so there may not be a great benefit with SQL. Not sure why I didn't think of it earlier, but {{ping|MZMcBride}} [original list creator] — do you have any thoughts? -- GreenC 20:51, 23 September 2019 (UTC)

:That is the top ten thousand users. The reason the numbers are so much lower than the list at WP:MOSTARTICLES is that the creation log doesn't include any pages created before mid-2018. —Cryptic 20:58, 23 September 2019 (UTC)

::Ah! -- GreenC 21:02, 23 September 2019 (UTC)

::: I guess you're discovering that this is not a particularly enjoyable report to generate. :-) I believe the method I used previously was maintaining a separate SQL database table and joining against that. However, Toolserver (Toolforge, Wikimedia Labs, whatever) rules changed and user-maintained database tables were no longer allowed to live on the same hosts as replicated wiki database tables. This meant that I could no longer do a standard SQL join to generate this report. I'm pretty sure this is why the report no longer updates. At this point, I imagine any (accurate) updates of the report would be welcome, even if the report takes over a day to generate and only gets regenerated once every few months. --MZMcBride (talk) 08:25, 24 September 2019 (UTC)

:I'm sure MZMcBride and others have gone down this route before but one approach is:

:#Do a one-off run from the revision table listing everyone who created more than (arbitrary limit) 30 pages before 27 June 2018, and download somewhere

:#Regularly run the actor_logging query as amended by GreenC (takes 3 minutes) and download somewhere

:#Regularly run a script off-wiki to combine the two and show the 10,000 highest combined totals

:There will be minor errors: editors who became prolific recently but created a few pages years ago may have their counts understated by up to 30, and editors who used to be prolific and created a handful of pages recently may have that handful disregarded. However, bearing in mind that we are already ignoring contributions to deleted pages and mislabelling pages converted to or from a redirect, it may be close enough. Certes (talk) 11:12, 24 September 2019 (UTC)

::Yes, that could work. It would have the advantage of speed, with the disadvantage of some inaccurate counts. The API method I'm using has precise counts. It is also robust, saving state information so it can crash and restart as Grid nodes come and go due to resource allocation (it runs in the "continuous" pool). The drawback is speed, which is about 30–35 hours once the cache is done. Since it is not mission-critical, running once a week or month is not a problem. -- GreenC 23:52, 24 September 2019 (UTC)

::: Some of the other annoyances I encountered when generating this report, from memory:

:::* dealing with bots

:::* dealing with users who have opted out of being listed

:::* including non-registered users

:::* counting redirects and non-redirects

:::* choosing a sort order (by count of all pages created in the main namespace or by count of non-redirects)

:::* accounting for user renames

:::* accounting for page status changes (deletion, expansion, redirectifying)

::: And with millions of pages created, it's a decent amount of data that continues to grow. With changes to MediaWiki, such as the addition of the "actor" database table, implementing parts of this report may now be easier. --MZMcBride (talk) 04:26, 25 September 2019 (UTC)

::::Thanks for this. Is there a prior list of opted-out users available? As for the others, off the top of my head: the API only retrieves non-redirects, so if a page is created as an article and later redirected or deleted, it won't be counted. I planned on sorting primarily on mainspace. User renames should be accounted for in the API since it works from the live database. I assume non-registered users (and blocked users) can be included in the top 10,000. I assume bot-created articles can be included also, like Dr. Blofield's. -- GreenC 15:06, 25 September 2019 (UTC)

::::: No problem. The scripts I used are at Wikipedia:List of Wikipedians by article count/Configuration. For the opt-out list, I re-used Wikipedia:List of Wikipedians by number of edits/Anonymous. --MZMcBride (talk) 06:23, 27 September 2019 (UTC)

: I tried several queries and found that the only one that finishes within a reasonable time is this: SELECT rev_page, rev_actor, MIN(rev_id) FROM revision GROUP BY rev_page;. For testing I am using SELECT rev_page, rev_actor, MIN(rev_id) FROM revision GROUP BY rev_page LIMIT 400;. A little idea, just for your reference. Thank you for your hard work. --Kanashimi (talk) 08:58, 27 September 2019 (UTC)
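Building on that idea, a second aggregation step could turn the first-revision-per-page list into per-user creation counts (a hypothetical sketch, not run here, and likely to be slow over the full revision table; it also does not filter by namespace):

```sql
-- For each page, find its first revision, then count how many
-- first revisions belong to each actor.
SELECT r.rev_actor, COUNT(*) AS created
FROM (
    SELECT rev_page, MIN(rev_id) AS first_rev
    FROM revision
    GROUP BY rev_page
) AS firsts
JOIN revision AS r ON r.rev_id = firsts.first_rev
GROUP BY r.rev_actor
ORDER BY created DESC
LIMIT 10000;
```

Joining the result against the actor table would give the usernames.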

What is the distribution of the number of categories for biographical and non-biographical articles?

A request, please. As per discussion at Wikipedia_talk:Categorization#Is_there_a_general_problem_with_overcategorisation%3F, I would like to know how many categories an average article has, for biographical and non-biographical articles. Take a random sample of biographical and non-biographical articles, count the number of categories per article, show that data. We can then fit a statistical distribution to it. Cheers. Bondegezou (talk) 07:24, 27 October 2019 (UTC)

:How large a sample? —Cryptic 08:19, 27 October 2019 (UTC)

::I'll take whatever can be done. 50 biography non-stubs, 50 biography stubs, 50 non-biography non-stubs, 50 non-biography stubs...? Bondegezou (talk) 10:02, 27 October 2019 (UTC)

:::That's a lot lower than I was afraid of. quarry:query/39761 has about 16000 probable biographies and 43000 probable non-biographies. (I can cut that down if it's too large to be easily worked with, but not easily change the distributions of biography-vs-nonbiography.){{pb}}The easiest reasonable way to differentiate between stub and non-stub is page length (the raw character count). Where do you want the cutoff? —Cryptic 10:18, 27 October 2019 (UTC)

::::I'll take it all. Can you do a csv file with 1 row per article, with columns for raw character count and for # categories? Bondegezou (talk) 10:34, 27 October 2019 (UTC)

:::::Cyan "Download data" dropdown button on the quarry page I linked. —Cryptic 10:43, 27 October 2019 (UTC)

::::::Great! Many, many thanks. Bondegezou (talk) 13:25, 27 October 2019 (UTC)

Attempt to obtain revision data for specific dates&times

I am trying to devise a query to obtain a small number of revisions starting at a specific timestamp. I am hoping to be able to replicate User:Opabinia regalis/Article statistics with a larger dataset and a more spread-out range of times. What I really need is the revision ID, aka old_id, or something else that can be converted into a diff. If I can get the page name and username as well, that would save some effort (given a diff, these are available through the interface, of course). I do know SQL, but not the MW db structure. I got as far as the following, but it runs too long and gets killed:

SET max_statement_time = 300;
USE enwiki_p;
SELECT page_title, rev_timestamp
FROM page
JOIN revision_userindex
ON rev_page = page_id
WHERE SUBSTRING(rev_timestamp,1,10) = '2018020112'
AND page_namespace = 0
ORDER BY rev_timestamp DESC
LIMIT 10

Any suggestions would be welcome. Please ping me if there is a response. DES (talk)DESiegel Contribs 01:14, 28 November 2019 (UTC)

:{{ping|DESiegel}} A few points:

:* Why use revision_userindex when you don't need the user index?

:* If all you want is the revision id, why are you not selecting it?

:* Am I right that you want, e.g., the first 10 edits made in mainspace after a certain time? DannyS712 (talk) 01:49, 28 November 2019 (UTC)

revision_userindex and SUBSTRING() are both pessimizations here. This query is instant:

SELECT page_title, rev_timestamp
FROM page
JOIN revision
ON rev_page = page_id
WHERE rev_timestamp BETWEEN '2018020112' AND '2018020113'
AND page_namespace = 0
ORDER BY rev_timestamp DESC
LIMIT 10;

revision.rev_id can be plugged into e.g. Special:Diff/823468207 for the diff; join actor_revision on actor_id=rev_actor for the username/ip (in actor_revision.actor_name). —Cryptic 02:03, 28 November 2019 (UTC)

::Thanks, I'll try that. A few answers for you:

::*I used revision_userindex because I read Help:Toolforge/Database to say that if I wanted any user-specific info I needed that. I may have misread it.

::*This was a first step; I was going to add the revision ID once this worked.

::* Yes, exactly.

:Thanks again. DES (talk)DESiegel Contribs 02:19, 28 November 2019 (UTC)

Zero transclusion cross namespace redirects to template space

Hi, there has been quite a lot of discussion about redirects in template namespace recently, and I'm quite curious how many zero-transclusion cross-namespace redirects to template space there are, and potentially whether there is any merit to a batch RfD. I would usually do this using the search function, but an insource search on redirects isn't possible as far as I know. Could this be done using a query? ‑‑Trialpears (talk) 21:42, 7 February 2020 (UTC)

::quarry:query/42002. —Cryptic 22:32, 7 February 2020 (UTC)

:Good question, {{u|Trialpears}}! I look forward to the result of this query. Doug Mehus T·C 21:50, 7 February 2020 (UTC)

:I also like the idea of doing a batch deletion if there's a straightforward way to do this and subject to the templates not being "never transclude" templates. Doug Mehus T·C 21:52, 7 February 2020 (UTC)

::You're going to have to get consensus for that at WT:CSD first - good luck with that - or list them individually at RFD. —Cryptic 22:32, 7 February 2020 (UTC)
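For reference, one possible shape for such a query is sketched below (the linked quarry:query/42002 is the authoritative version; this sketch is untested and assumes the pre-2022 templatelinks schema with tl_namespace/tl_title columns):

```sql
-- Redirects from outside the Template namespace into it,
-- where the redirect itself has no transclusions.
SELECT p.page_namespace, p.page_title
FROM page AS p
JOIN redirect ON rd_from = p.page_id
WHERE rd_namespace = 10           -- target is in Template:
  AND p.page_namespace <> 10      -- source page is not
  AND NOT EXISTS (
    SELECT 1 FROM templatelinks
    WHERE tl_namespace = p.page_namespace
      AND tl_title = p.page_title
  );
```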

Making a list of inactive subscribers

I asked about this at VPT, and I got [https://en.wikipedia.org/w/index.php?oldid=950795068#Finding_out_who_is_still_around a "good enough" answer], but User:Izno and User:Cryptic thought I should ask here.

My problem was that WP:VisualEditor/Newsletter is an old page, which means that there's the possibility that people signed up and have since stopped editing. But it's also true that the page was the first – we didn't create the multi-lingual, multi-wiki version until later – so there are editors on that list who don't edit here, but who are active on other wikis and still want this newsletter. Cryptic gave me a list, and I've used that for now. I'm here in search of a more sustainable solution.

The goal is that every now and again (possibly less often than once a year), I could figure out who's stopped editing, and remove those names from the list. I have this list here, plus the one at m:VisualEditor/Newsletter. The Growth team has a newsletter, and, looking at User talk:EdSaperia (one of the names on my list of inactive editors), I think that the WP:SIGNPOST probably needs to consider purging its subscription list as well. So this seems like it might be a general problem, but not a frequent one. What do you recommend? (Please ping me.) Whatamidoing (WMF) (talk) 23:27, 13 April 2020 (UTC)

:{{ping|Whatamidoing (WMF)}} {{doing}} to clarify, something like:

:* Extract all links to a user talk page

:* Check when the user last edited

:* Remove link if user's last edit was more than (how long?) DannyS712 (talk) 07:42, 28 April 2020 (UTC)
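The first two steps could be sketched in SQL roughly as follows (untested; the page id is a placeholder, the join on a computed expression will be slow, and scanning revision per actor may need the user-index view in practice):

```sql
-- Last edit timestamp for each user whose user talk page is linked
-- from the newsletter subscription page.
SELECT actor_name, MAX(rev_timestamp) AS last_edit
FROM pagelinks
JOIN actor    ON actor_name = REPLACE(pl_title, '_', ' ')
JOIN revision ON rev_actor = actor_id
WHERE pl_from = 12345       -- hypothetical page_id of the subscription page
  AND pl_namespace = 3      -- User talk:
GROUP BY actor_name
ORDER BY last_edit;
```

Filtering on the chosen cutoff (step 3) is then just a HAVING clause on last_edit.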

For now, until I know what users should be removed from the list (in terms of time since last edit), here is a list of all users subscribed and the time they last edited:

{{collapse top}}

+----------------------------+----------------+

| user_name | last_edit |

+----------------------------+----------------+

| Xa | 20050804001923 |

| Bibliothecary | 20070106161327 |

| Loggats | 20130705194812 |

| Bjmendelson | 20130705200209 |

| Ewiz88 | 20130717143620 |

| Lacolchadetu senscape | 20130725191113 |

| Sarah Gallienne | 20130801114244 |

| Davidzicoman | 20130802142211 |

| Ad mous12 | 20130805100651 |

| Kateecherry | 20130805220649 |

| Redschuart | 20130806144939 |

| NunoDocksvira | 20130807135229 |

| Lineagekeeper | 20130814222046 |

| Mohammad Mahdi Malayeri | 20130816133933 |

| 666frz | 20130819163252 |

| Blok Glo | 20130907110601 |

| Chaleyer61 | 20131011102607 |

| Goodenough4wiki | 20131021165245 |

| JimStage | 20131028131243 |

| Hal.MacGregor | 20131121002234 |

| Divansantana | 20131121203152 |

| SideLincoln99 | 20131126224042 |

| Ke42sou | 20131128202559 |

| Camelnuaa | 20140212003003 |

| Quicknss | 20140212012753 |

| Asfak786 | 20140212162330 |

| BikashKumar11 | 20140301171342 |

| Einottaja | 20140316145921 |

| Davidhrobertson | 20140319111127 |

| Susanne Ramharter | 20140413123257 |

| Gschmirgal | 20140602083708 |

| Dudel250 | 20140607235338 |

| Adrienne1982 | 20140901121053 |

| BioFilip | 20140919123418 |

| Jamesx12345 | 20141001214823 |

| Fisubar | 20141029133714 |

| Juanktroid96 | 20141104211128 |

| BenevolentUncle | 20141117112454 |

| Hyperspace0 | 20141125192617 |

| Fauban | 20150221162701 |

| OrangesRyellow | 20150302183623 |

| LewisMCYoutube | 20150420173350 |

| Paul.j.richardson | 20150602235543 |

| Abinaya roshini | 20150610042517 |

| Bundledape | 20150619190553 |

| No qwach macken | 20150622214847 |

| Nicoya1967 | 20150713090504 |

| Richard vlamynck | 20150724224105 |

| Buffbills7701 | 20150802220600 |

| Mbcap | 20150808201518 |

| Purplesally17 | 20150817203935 |

| EastDimeBoxFrank | 20150901072437 |

| Junflr | 20151025004301 |

| Taradle | 20151026205216 |

| TheEternalFlame | 20151029150850 |

| Hawraalmana | 20151208140051 |

| Ngoc Trang Nguyen | 20151213200004 |

| Tkmlz | 20151221225947 |

| IAmKing18 | 20151228133508 |

| Lion trainer | 20151229123734 |

| Looking4ufo | 20160126204652 |

| Raptros155 | 20160126214930 |

| Jadair10 | 20160208232403 |

| Upotrebi klikere | 20160209135212 |

| Lukaslt13 | 20160212122848 |

| Rockbrarian | 20160216083213 |

| Mpconnick | 20160222212324 |

| Raftab | 20160310143445 |

| Magboned | 20160412124800 |

| Gordon4752 | 20160427081651 |

| Pandg2 | 20160506053916 |

| Sourov0000 | 20160512210536 |

| WikiClive | 20160514114731 |

| Patel.bhupesh | 20160520145113 |

| Emekaborisama | 20160522152348 |

| Writeswift | 20160621141825 |

| BIG RizZ | 20160701092238 |

| Deubug | 20160804195605 |

| Coolgama | 20160808063708 |

| Natalie.Desautels | 20161024015727 |

| SuperHero2111 | 20161027224415 |

| Joerasmussen | 20161101211916 |

| Gcjasoliya | 20161114060640 |

| Woodstop45 | 20161114202249 |

| Mikebilz | 20161119173942 |

| Chiefmartinez | 20161130081226 |

| Kristephanie | 20161210004515 |

| Superangulon210 | 20170106154608 |

| Penguin9541 | 20170118160836 |

| JordanKyser22 | 20170124223449 |

| Mohit1294 | 20170126135454 |

| Ghimire111 | 20170208153111 |

| Ttrbwki413 | 20170217085859 |

| Afrocreole | 20170225035825 |

| Thiago200002 | 20170325184546 |

| Hsbandrgb | 20170330031124 |

| Bouowmx | 20170406150050 |

| Wc5bk6 | 20170412143743 |

| NepaliHelper | 20170415170043 |

| HasanMasud PUST | 20170419214851 |

| WendigoUK | 20170424212800 |

| Thegreatrituraj | 20170503032958 |

| ShyEager | 20170530194740 |

| TheMillionRabbit | 20170531201244 |

| Patrickbeardmore | 20170620192201 |

| Interitus malorum | 20170629180340 |

| Macowell | 20170707104314 |

| Aozz101x | 20170728155555 |

| Jlmarco | 20170831041017 |

| Farshid7 | 20170912203634 |

| Da Glitchers | 20171116062523 |

| TheBeastdot | 20171126144829 |

| Freedman1 | 20171130234839 |

| The Phase Master | 20180111065316 |

| Spektre1 | 20180112005448 |

| Oreotoast | 20180116195818 |

| Arnold.kevin729 | 20180117133632 |

| Supercell121 | 20180129024407 |

| Paulxxxxxx54 | 20180129042558 |

| Hobbes Novakoff | 20180209043810 |

| Laughtermaster | 20180222235738 |

| Icqa | 20180228233514 |

| Drali1954 | 20180310024435 |

| Tayear9220 | 20180320194826 |

| Aryan hindustan | 20180330031150 |

| SaimS | 20180419065608 |

| A ri gi bod | 20180424030049 |

| FDLeyda | 20180430001021 |

| Chitreshraj | 20180507133211 |

| Cj1340 | 20180509161701 |

| Kingsho | 20180510143818 |

| J.Dong820 | 20180605015148 |

| Ruud Koot | 20180608232349 |

| Jacopo Werther | 20180626165514 |

| Hazeldee0512 | 20180808172510 |

| Mohau | 20180811115456 |

| Katcheez | 20180822163156 |

| BlueScreen | 20180829220340 |

| Otto Knell | 20180831151226 |

| Cryptic C62 | 20180901191945 |

| Ken987654156 | 20180903064514 |

| MirandaStreeter | 20180908220944 |

| Elitre | 20180930060552 |

| Minzy4u | 20181003033150 |

| ChirayuAkotiya | 20181005064849 |

| Tyjayanth | 20181008151814 |

| Im5yrsold | 20181029113158 |

| Drichter1 | 20181107172222 |

| Justin86789 | 20181110082147 |

| MoarSmtp | 20181124175153 |

| Jduranboger | 20181126152929 |

| TrishaNewell | 20181129172238 |

| Yololobia2 | 20190111071642 |

| Wiki3310 | 20190119170626 |

| Nha4601 | 20190123163626 |

| Reb1981 | 20190125001631 |

| Waliy sherpa | 20190129005027 |

| 3oWh pF0wHz | 20190203115618 |

| Rororolalala | 20190204205310 |

| Martijn Hoekstra | 20190205111930 |

| 6033CloudyRainbowTrail | 20190406023904 |

| Cky2250 | 20190417122044 |

| Double Plus Ungood | 20190501220050 |

| Cekli829 | 20190504161005 |

| Suriddha Munshi | 20190517092400 |

| Danlev | 20190522212555 |

| De la Marck | 20190613045703 |

| Tyt0791 | 20190615102943 |

| Gaia Rinaldelli | 20190622074245 |

| Sanyamkamat | 20190627041928 |

| Davisonio | 20190628101218 |

| Khamar | 20190628213947 |

| Sreejiththulaseedharan | 20190630151625 |

| Zhaofeng Li | 20190701215013 |

| Wikiuser13 | 20190706172622 |

| Defan24 | 20190709074649 |

| JJBers | 20190730001237 |

| Wshaggy | 20190731191319 |

| Lalit82in | 20190802013324 |

| Bouktin | 20190811113429 |

| Sam'kelo Mpungoseh | 20190824170101 |

| Spineas | 20190828074120 |

| MrTrains227 | 20190904150950 |

| MatthewBurton | 20190922020959 |

| Simoncaulton | 20190924112442 |

| Orsd | 20190925225628 |

| Acather96 | 20190930104152 |

| LogX | 20191005155640 |

| GreatSculptorIthas | 20191008151919 |

| Charles Edwin Shipp | 20191009123632 |

| Dlwilson88 | 20191015074437 |

| Flekkie | 20191018045429 |

| VeniVidiVicipedia | 20191020092349 |

| Vpilato | 20191024180204 |

| Immu 01 | 20191030004916 |

| Nick Wilson | 20191105172803 |

| FuriouslySerene | 20191107021322 |

| DLynch (WMF) | 20191119150011 |

| Bill52270 | 20191126031726 |

| Three97 | 20191201120535 |

| IkselOwk | 20191204091155 |

| Shanata | 20191206061439 |

| Jamesjpk | 20191213071918 |

| Aschmidt | 20191219195138 |

| Annibale covini gerolamo | 20191222165817 |

| Nicereddy | 20191223035826 |

| There'sNoTime | 20191231182120 |

| Wouterstomp | 20200112175951 |

| Bishwa 777 | 20200113104744 |

| PhotographerTom | 20200119205333 |

| Evangeline | 20200120061203 |

| Captain Assassin! | 20200124035034 |

| MrWonka | 20200126170910 |

| Barzamin | 20200129000902 |

| Dfcfozz | 20200131215138 |

| Jrf | 20200202222926 |

| Mainline421 | 20200205224406 |

| Innovative Username | 20200206154721 |

| AWwikipedia | 20200211083742 |

| Anonymoustofu | 20200212002455 |

| Pointillist | 20200212095532 |

| PieThrowerChamp | 20200214043752 |

| Fresternoch | 20200217182613 |

| Symphonic Spenguin | 20200220021247 |

| Neo12345292 | 20200220061750 |

| Nicco18 | 20200223042713 |

| Esb5415 | 20200223050811 |

| Aleksa Milicevic | 20200223194658 |

| Retinarecorder | 20200224020311 |

| JonsterMonster | 20200302185742 |

| Beauty School Dropout | 20200303022119 |

| LikeLifer | 20200303110523 |

| Roosegarden | 20200303234857 |

| Sue Gardner | 20200304012916 |

| Frederico1234 | 20200304125900 |

| Crh23 | 20200307131909 |

| Fench | 20200309043725 |

| Henrikdv | 20200309133314 |

| C-Brennan-Poole | 20200309214548 |

| Adam Cuerden | 20200310002538 |

| Damenleeturks | 20200310195827 |

| Coal town guy | 20200311140240 |

| Hydriz | 20200312071233 |

| Sachi bbsr | 20200314013931 |

| Gerryyabes | 20200314220339 |

| TPFNoob | 20200316200005 |

| Nigeleezdr | 20200317035341 |

| HJKeats | 20200317125630 |

| Hibsch | 20200323024353 |

| Daxx wp | 20200323081300 |

| LeonardoIannelliCOMPUTE | 20200326210729 |

| Arthur Rubin | 20200327010347 |

| TehPlaneFreak | 20200327021427 |

| Davidbuddy9 | 20200328014951 |

| Mindi Crayon | 20200328223019 |

| Atchison clark | 20200329175836 |

| Shaded0 | 20200331181321 |

| Magol | 20200401163055 |

| Ckoerner | 20200401164412 |

| ZLEA | 20200401235824 |

| Plingsby | 20200402084547 |

| Ugog Nizdast | 20200402171634 |

| Quenhitran | 20200403095509 |

| Cscott | 20200403164401 |

| Ekhaya2000 | 20200403164816 |

| Arodb | 20200404154052 |

| Robert M. Hunt | 20200405173214 |

| Pratyya Ghosh | 20200406034049 |

| Nkansahrexford | 20200406212045 |

| Zziccardi | 20200407031008 |

| Sasuke Sarutobi | 20200407095020 |

| Darklanlan | 20200408093959 |

| Flexdream | 20200409200015 |

| The 19th One | 20200409210731 |

| DarkestElephant | 20200410105939 |

| CWBoast | 20200411151913 |

| Hamham31 | 20200412134306 |

| MadGuy7023 | 20200413124744 |

| Joe Decker | 20200413160809 |

| Quiddity | 20200413161153 |

| Medo9 | 20200413212350 |

| Pseudonymous Rex | 20200414092226 |

| Blahma | 20200414203346 |

| SmartK | 20200415072540 |

| Jtamad | 20200415121648 |

| ??????? | 20200415214416 |

| TenType | 20200415225639 |

| Peterdownunder | 20200416002112 |

| Luke1337 | 20200416125154 |

| Miya | 20200416143403 |

| Mgiganteus1 | 20200416152411 |

| EdSaperia | 20200416173734 |

| Ahm masum | 20200416191649 |

| Ashorocetus | 20200416201231 |

| Howicus | 20200416230033 |

| Fluffernutter | 20200417222848 |

| Toran107 | 20200418024339 |

| ToxiBoi | 20200418042738 |

| James Allison | 20200418235110 |

| Bcorr | 20200419012632 |

| Tom29739 | 20200419121609 |

| JB82 | 20200420001238 |

| Jonathan Lane Studeman | 20200420075252 |

| Cup o' Java | 20200420154511 |

| Zulujive | 20200420183316 |

| VQuakr | 20200420231909 |

| Imagine Wizard | 20200421152401 |

| The Land | 20200421213734 |

| StudiesWorld | 20200421225827 |

| Harej | 20200421231114 |

| Tom Morris | 20200422113409 |

| ??????? | 20200422132300 |

| Saehrimnir | 20200422204954 |

| Lfstevens | 20200423002624 |

| SshibumXZ | 20200423133542 |

| Checkingfax | 20200423144715 |

| Neonorange | 20200423171149 |

| Ocaasi | 20200424002716 |

| Hmich176 | 20200424015854 |

| Oiyarbepsy | 20200424035837 |

| ???? | 20200424074527 |

| Jimbo Wales | 20200424090151 |

| PearlSt82 | 20200424123052 |

| Richard Nevell (WMUK) | 20200424160103 |

| Bevo | 20200424183821 |

| Perumalism | 20200424190532 |

| SMcCandlish | 20200424211542 |

| Slazenger | 20200424225740 |

| Lucky102 | 20200425003809 |

| LORENZ SARMIENTO | 20200425035737 |

| Pizza1016 | 20200425061907 |

| Punetor i Rregullt5 | 20200425070447 |

| Prototime | 20200425081841 |

| Masssly | 20200425140335 |

| Kasyap | 20200425142722 |

| Alsee | 20200425150536 |

| Mvolz | 20200425161415 |

| K6ka | 20200425214544 |

| ScottyWZ | 20200426020814 |

| Spurb | 20200426053722 |

| Goldenshimmer | 20200426055901 |

| Meow | 20200426090334 |

| Llew Mawr | 20200426092455 |

| Evolution and evolvability | 20200426094922 |

| Quadtripplea | 20200426124608 |

| Jhertel | 20200426132957 |

| Andrewa | 20200426172345 |

| Redalert2fan | 20200426174138 |

| Ceyockey | 20200426201512 |

| Judgesurreal777 | 20200426212018 |

| Bilby | 20200426224131 |

| ConradKilroy | 20200426224512 |

| Rhododendrites | 20200427004508 |

| Nate 2169 | 20200427011450 |

| Dtwedt | 20200427022637 |

| Peter Chastain | 20200427030826 |

| TerraCodes | 20200427053710 |

| MusikAnimal | 20200427054846 |

| Muffizainu | 20200427065116 |

| Barte | 20200427074549 |

| Timeshifter | 20200427092823 |

| TheFrog001 | 20200427115656 |

| Andrew Davidson | 20200427132035 |

| Alessandro57 | 20200427142915 |

| GermanJoe | 20200427145729 |

| QWER11296 | 20200427151055 |

| Wittylama | 20200427151441 |

| Babymissfortune | 20200427155329 |

| K4rolB | 20200427160827 |

| Anthony Staunton | 20200427164743 |

| Diego Moya | 20200427165909 |

| Corn cheese | 20200427170106 |

| Waddie96 | 20200427170849 |

| TheSandDoctor | 20200427172221 |

| Alaney2k | 20200427174415 |

| Christian75 | 20200427175559 |

| ClemRutter | 20200427181057 |

| Moyogo | 20200427190827 |

| The Anome | 20200427190854 |

| ProtoDrake | 20200427192353 |

| Charles01 | 20200427193817 |

| Thryduulf | 20200427204156 |

| David Gerard | 20200427214812 |

| Presidentman | 20200427215546 |

| Pelagic | 20200427215732 |

| RoySmith | 20200427215851 |

| Froztbyte | 20200427223608 |

| Wolbo | 20200427232625 |

| Nick Moyes | 20200427235954 |

| Xbony2 | 20200428004518 |

| Whiteghost.ink | 20200428004703 |

| Johnny Au | 20200428005706 |

| BoldLuis | 20200428010107 |

| Davey2010 | 20200428010500 |

| Ched | 20200428011216 |

| Some1 | 20200428014439 |

| Ergo Sum | 20200428014827 |

| Epicgenius | 20200428015039 |

| WhatamIdoing | 20200428015352 |

| Jarble | 20200428021851 |

| CAPTAIN RAJU | 20200428030614 |

| Patchallel | 20200428031809 |

| 3family6 | 20200428032851 |

| Kerry Raymond | 20200428035903 |

| Paine Ellsworth | 20200428041113 |

| Another Believer | 20200428044049 |

| Chewings72 | 20200428044352 |

| John Broughton | 20200428045710 |

| Ammarpad | 20200428051325 |

| PamD | 20200428051914 |

| DGG | 20200428053957 |

| MarkZusab | 20200428055829 |

| Robert McClenon | 20200428060025 |

| Anasskoko | 20200428060100 |

| The ed17 | 20200428061321 |

| Nihiltres | 20200428061459 |

| Ivan Humphrey | 20200428062000 |

| Lotje | 20200428064801 |

| CRS-20 | 20200428065358 |

| JetBlast | 20200428070404 |

| Discott | 20200428071914 |

| JMHamo | 20200428072401 |

| Tenryuu | 20200428072520 |

| Piotrus | 20200428072904 |

| Ged UK | 20200428073039 |

| The wub | 20200428073607 |

| Patriccck | 20200428074602 |

| Donald Trung | 20200428075128 |

| Philoserf | 20200428075132 |

+----------------------------+----------------+

{{collapse bottom}}

Hope this helps --DannyS712 (talk) 07:55, 28 April 2020 (UTC)

:Thank you, DannyS712. I was thinking about a two-year cutoff. The distribution of the numbers above suggest that's about right (~25 per year until 2019 and 2020, with 2018 having 39). If we take today's date, then Cky2250 goes off the list and Double Plus Ungood stays on. Does that sound reasonable to you? (Please ping me.) Whatamidoing (WMF) (talk) 18:52, 28 April 2020 (UTC)

All free files with deleted or hidden previous versions

Hello!

I would like a query for all local Wikipedia files that

  1. Are not in :Category:All non-free media
  2. Have deleted (as in :File:MidAmericanConference 100.png) or hidden (as in :File:Superettan logo.svg) previous versions

The examples would not be in the query since they are non-free. It's hard to find free examples, hence this request. Jonteemil (talk) 18:20, 5 May 2020 (UTC)

: {{Reply to|Jonteemil}} See :quarry:query/44649 and :quarry:query/44650. — JJMC89(T·C) 19:56, 5 May 2020 (UTC)

::{{re|JJMC89}} Thanks!Jonteemil (talk) 20:16, 5 May 2020 (UTC)

Redirects to Meta:

Similar to Wikipedia:Request a query/Archive 1#Redirects to Special:, I would like a list of all pages that hard redirect to meta:. — Godsy (TALKCONT) 04:30, 6 May 2020 (UTC)

:Actually, I think I figured it out: quarry:query/44662. — Godsy (TALKCONT) 04:37, 6 May 2020 (UTC)

::Nope, did not work. — Godsy (TALKCONT) 04:41, 6 May 2020 (UTC)

:If my variant is right then we have just the one redirect: {{-r|User:Interstellarity}} → Meta:User:Interstellarity. rd_namespace=0 because it says so here. Certes (talk) 09:54, 6 May 2020 (UTC)
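The working variant was presumably something along these lines (a sketch, not run here; as noted above, rd_namespace is 0 for interwiki redirects):

```sql
-- Hard redirects whose target uses the "meta" interwiki prefix.
SELECT page_namespace, page_title, rd_title
FROM page
JOIN redirect ON rd_from = page_id
WHERE rd_interwiki = 'meta';
```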

::Thanks! — Godsy (TALKCONT) 21:32, 7 May 2020 (UTC)

Before and after at Wikivoyage

Is it possible/feasible to generate a list of articles for voy:Wikivoyage:Lede Paragraphs Expedition whose leads are shorter than a certain length (e.g., 100 bytes)? User:SelfieCity and the others there would like to be able to measure the progress of a project to expand the leads, so this query would ideally be run twice (before and after the project).

Wikivoyage articles are very standardized in format, so you would be counting all content that appears after the page banner template (=variable length) and before the ==First section== heading. If it's easier to do readable prose before the first level 2 section heading, that's fine, too. WhatamIdoing (talk) 20:18, 20 May 2020 (UTC)

: This is impossible because queries don't have access to the content of pages. * Pppery * it has begun... 20:56, 20 May 2020 (UTC)

::So maybe it could be done some other way, but not through a query/this process (right?). WhatamIdoing (talk) 03:26, 21 May 2020 (UTC)

:::Well, there's nothing about this "process", per se, that restricts it to queries against the toolserver replicas. (Being related to enwiki, on the other hand....) You could find this data, for example, by downloading a full database dump, installing it into a local sql server, and querying that. That's a considerable investment of storage space and time, though, and this isn't really a straightforward query to begin with. —Cryptic 04:36, 21 May 2020 (UTC)

Adding date parameter

Please check [https://quarry.wmflabs.org/query/45730 this] query. I would like to have this modified in such a way that it displays the date on which the template was created and the date on which last edit was done. Adithyak1997 (talk) 11:22, 10 June 2020 (UTC)

:{{re|Adithyak1997}} I've created a variant query/45740 showing min and max timestamps. As the title is "mlwiki least edited templates", you may want to add something like "HAVING edits < 3" after the GROUP BY. It looks as if many templates have been edited only once (at creation). Certes (talk) 11:34, 10 June 2020 (UTC)
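The variant with the suggested filter would presumably look something like this (a sketch with hypothetical column aliases, not run here; note the original request was against mlwiki, not enwiki):

```sql
-- Templates with their creation and last-edit timestamps,
-- restricted to those edited fewer than three times.
SELECT page_title,
       COUNT(*)           AS edits,
       MIN(rev_timestamp) AS created,
       MAX(rev_timestamp) AS last_edited
FROM page
JOIN revision ON rev_page = page_id
WHERE page_namespace = 10
GROUP BY page_id, page_title
HAVING edits < 3
ORDER BY edits, last_edited;
```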

Can I get queries for the following list? I have written queries for some but am not sure they are correct.

What is the total number of orphan articles on Wikipedia?

What is the total number of dead-end articles on Wikipedia?

What is the total number of articles on Wikipedia that have at least one media element (picture or video)?

What is the total number of articles on Wikipedia that have more than one media element (picture or video)?

What is the total number of articles on Wikipedia that have infoboxes?

What is the total number of articles on Wikipedia that link to (or are powered/controlled by) Wikidata?

What is the total number of Wikidata items connected to articles on Wikipedia?

What is the total number of articles on Wikipedia that are marked as stubs?

What is the average number of edits per page?

What is the total number of articles on Telugu Wikipedia that are marked as featured articles? — Preceding unsigned comment added by Ramu ummadishetty (talkcontribs) 16:55, 10 June 2020 (UTC)

:That's a lot of disparate requests, many of them very poorly-defined, and some of them needing complex queries. If you say why you want them besides idle curiosity (which is what it looks like from here), and show what you've done already, you may be able to motivate someone to help out. —Cryptic 21:00, 17 June 2020 (UTC)

Users matching specific criteria

I need a query that counts all users (no IPs) that:

  • have made 200 or more edits in the article namespace, of which
  • the first edit is at least 60 days old and
  • 50 of those edits have been made within the last 365 days
  • and that are not bots

I can understand and write SQL, but I'm too unfamiliar with the schema, so any kind of help is appreciated, even a simple list of the fields and tables I need to perform my query correctly. --Paraselenae (talk) 03:23, 22 June 2020 (UTC)

:The short version is that you're looking for mw:Manual:Database layout now, and you'll be looking for [https://sql-optimizer.toolforge.org the Toolforge SQL Optimizer] later when your perfectly-reasonable-seeming queries time out.{{pb}}The long version:

:*revision is where edits (in non-deleted pages) live.

:**revision.rev_page is a foreign key to page.page_id; page.page_namespace should be 0 for the article namespace.

:**revision.rev_timestamp is the time of the edit.

:**revision.rev_actor is a foreign key to actor.actor_id.

:***actor.actor_name is the username for a logged-in editor or the IP for an IP edit.

:***actor.actor_user is NULL for an IP, and a foreign key to user.user_id for a logged-in editor.

:****user_groups is a many-to-many table mapping user_groups.ug_user (a foreign key to user.user_id again) to user_groups.ug_group, which for bots will be 'bot'.

:You'll want to use the actor_revision view of actor instead of actor directly. Both, for technical reasons related to deleted and oversighted edits, hide complex queries into the real actor table - try running 'SHOW CREATE TABLE actor;' and try not to cringe - but actor_revision is less complex for cases where you're coming through the revision table.{{pb}}Similarly, you'll... probably... want to use revision_userindex instead of revision; the view of revision we have access to lacks the indexes on rev_actor, again to omit deleted/oversighted edits. On the other hand, revision sometimes works better than revision_userindex when you're filtering by namespace. You'll probably have to try with both. —Cryptic 04:18, 22 June 2020 (UTC)
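Putting those pointers together, the criteria above might translate into something like the following. This is a sketch only, untested, and (given the volume of the revision table) likely to hit the query time limits without further optimization:

```sql
USE enwiki_p;
SELECT COUNT(*) AS qualifying_users
FROM (
    SELECT actor_user
    FROM revision_userindex
    JOIN actor_revision ON actor_id = rev_actor
    JOIN page ON page_id = rev_page
    WHERE page_namespace = 0          -- article namespace only
      AND actor_user IS NOT NULL      -- excludes IPs
      AND actor_user NOT IN
          (SELECT ug_user FROM user_groups WHERE ug_group = 'bot')
    GROUP BY actor_user
    HAVING COUNT(*) >= 200            -- 200+ article edits
       -- first edit at least 60 days old
       AND MIN(rev_timestamp) <= DATE_FORMAT(NOW() - INTERVAL 60 DAY, '%Y%m%d%H%i%s')
       -- 50+ of those edits within the last 365 days
       AND SUM(rev_timestamp >= DATE_FORMAT(NOW() - INTERVAL 365 DAY, '%Y%m%d%H%i%s')) >= 50
) AS qualifying;
```

The 60-day condition here uses the first article-namespace edit as a proxy for "first edit", which may or may not match the intent.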

::I'll also note that edit count-based queries have been, in my experience, the hardest to write without hitting the query limits. --AntiCompositeNumber (talk) 05:08, 22 June 2020 (UTC)

Pages edited by a single user

I need help to create a query that allows me to see which pages are modified only by a specific user in a specific category. Thanks --Samuele2002 (talk) 16:51, 7 July 2020 (UTC)

:Which user and category? —Cryptic 18:01, 7 July 2020 (UTC)

::{{ping|Cryptic}} I need it for hu.wikibooks the category is: :hu:b:Kategória:Azonnali törlésre váró lapok the user is: KeFe. Thanks. --Samuele2002 (talk) 21:59, 7 July 2020 (UTC)

:::quarry:query/46413. —Cryptic 23:24, 7 July 2020 (UTC)

::::Thank you very much. --Samuele2002 (talk) 00:17, 8 July 2020 (UTC)
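For anyone adapting this to another wiki later, the general shape of such a query (a sketch of the idea; the linked Quarry query is the authoritative version) is to list category members that have no revision by anyone other than the named user:

```sql
USE huwikibooks_p;
SELECT page_namespace, page_title
FROM page
JOIN categorylinks ON cl_from = page_id
WHERE cl_to = 'Azonnali_törlésre_váró_lapok'
  AND NOT EXISTS (
      SELECT 1
      FROM revision
      JOIN actor_revision ON actor_id = rev_actor
      WHERE rev_page = page_id
        AND actor_name <> 'KeFe'  -- any edit by someone else disqualifies the page
  );
```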

Sort keys of members of [[:Category:Pages using Infobox broadcasting network with unknown parameters]]

More than 10 percent of the transclusions of :Category:Pages using Infobox broadcasting network with unknown parameters are in this category, which suggests unmet needs. (And I removed another 35!) Can I get a list/count of the sort keys of the members of this category to determine what unknown fields are being used most without appearing? Raymie (tc) 03:38, 9 July 2020 (UTC)

:{{ping|Raymie}} A count isn't helpful here, as the sortkey for a given {{para|key|value}} pair is keyvalue. That means the key can't be easily separated from the value, so I got 181 sortkeys with a count of 1. https://quarry.wmflabs.org/query/46463 is the sorted list of sortkeys though. --AntiCompositeNumber (talk) 04:06, 9 July 2020 (UTC)

::{{ping|AntiCompositeNumber}} Thanks! There will likely be an RfC to improve this template soon. Raymie (tc) 04:09, 9 July 2020 (UTC)
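For reference, the linked Quarry query amounts to something along these lines (a sketch of the idea, not necessarily the exact query behind the link):

```sql
USE enwiki_p;
SELECT cl_sortkey, COUNT(*) AS pages
FROM categorylinks
WHERE cl_to = 'Pages_using_Infobox_broadcasting_network_with_unknown_parameters'
GROUP BY cl_sortkey
ORDER BY pages DESC, cl_sortkey;
```

As noted above, the sortkey runs the key and value together, so grouping mostly yields counts of 1.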

Userspace AfC drafts that haven't been edited in 6 months

My attempt:

SELECT DISTINCT page_title, rev_timestamp, page_latest
FROM page
JOIN revision ON rev_id = page_latest
JOIN templatelinks ON tl_from = page_id
WHERE page_namespace = 2
AND tl_title = "AfC_submission"
AND page_is_redirect = 0
AND rev_timestamp < "20200227044648"

result: "This query took longer than 30 minutes to execute and was killed."

I can't figure out why it would take that long. There are only [https://linkscount.toolforge.org/?namespace=10&p=AFC_submission&fromNamespace=2&invertFromNamespace=false&dbname=enwiki 3000 pages] transcluding {{t|AfC submission}} in userspace. Am I doing something wrong? SD0001 (talk) 07:35, 27 August 2020 (UTC)

:templatelinks follows redirects. The template you want is actually at 'AFC_submission'. More importantly, your query can't use the fact that there are only ~3000 userspaces transcluding it (and only 42637 total), since you're asking for pages that transclude any page in any namespace with that title - you need to put in an AND tl_namespace = 10 clause. —Cryptic 08:19, 27 August 2020 (UTC)

::Oooof, adding the tl_namespace clause (and fixing the template name) gives the [https://quarry.wmflabs.org/query/47714 result] in <1 second. Mind blown. I had assumed specifying tl_namespace would be superfluous since :Category:AFC submission, WP:AfC submission, Portal:AFC submission, etc anyway don't exist. SD0001 (talk) 08:40, 27 August 2020 (UTC)
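For later readers, the query with both fixes applied (the actual template name, plus the tl_namespace restriction) is:

```sql
USE enwiki_p;
SELECT DISTINCT page_title, rev_timestamp, page_latest
FROM page
JOIN revision ON rev_id = page_latest
JOIN templatelinks ON tl_from = page_id
WHERE page_namespace = 2
  AND tl_namespace = 10            -- restrict to the Template namespace
  AND tl_title = 'AFC_submission'  -- 'AfC submission' is only a redirect
  AND page_is_redirect = 0
  AND rev_timestamp < '20200227044648';
```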

Extremely long HTML comments / hidden material

(Originally posted at the WP:VPM.) It was [//en.wikipedia.org/w/index.php?title=Fascism&type=revision&diff=966418210&oldid=966023759 recently discovered] that a 10,000+ byte section of the aforelinked article had been hidden in an HTML comment, unnoticed, since [//en.wikipedia.org/w/index.php?title=Fascism&diff=517648656&oldid=517647680 2012](!). I would like a list of other (en.WP) pages with extremely long HTML comments, perhaps ones over 1,000 bytes (or some higher cutoff if that returns way too many [>500] pages). If the list could be sorted by and/or include the size of the HTML comment so the biggest ones could be looked at first, all the better. The rationale is that extremely long HTML comments probably represent material that should either not be hidden or not be in the article. -sche (talk) 20:13, 15 July 2020 (UTC)

:This would be a valuable report. I've also seen cases where good content was commented out, presumably by accident. I'm pretty sure that neither Quarry nor PetScan can do this; it would need a grep or simple custom program to scan a dump of the wikitext of every page, which is not stored in the databases accessible to Quarry. Certes (talk) 20:30, 15 July 2020 (UTC)

:Error 05 at WP:WikiProject Check Wikipedia/List of errors finds HTML comments with no end. It might be possible to modify this to check for long comments too. Certes (talk) 20:38, 15 July 2020 (UTC)

::{{ping|-sche|Certes}} I gave this a go, see [https://public.paws.wmcloud.org/User:HaeB/comments/longcomments_enwiki_20200901.html here] for a first test run based on a partial dump.

::Let me know whether the format and cutoff look useful, and I can then try to produce the same for the entire set of articles.

::Regards, HaeB (talk) 17:23, 8 September 2020 (UTC)

:::That looks perfect (and slightly concerning) but I'd better leave the final say to the editor requesting it. Can we distinguish deliberate long comments from accidental ones? One way might be to look at whether the comment ends with --> or is accidentally terminated by a later, shorter, deliberate HTML comment, i.e. which of <!-- and --> occurs next after the start of the long comment. Does that sound useful? To clarify, we'd be distinguishing between

:::# <!-- Long deliberate comment which rambles on for several lines but is as the editor intended and shouldn't be shown to readers -->

:::# <!-- Accidentally unterminated short comment. [--> omitted here] Long wikitext which should appear but doesn't. <!-- Deliberate short second comment -->

:::Does that sound useful? Certes (talk) 18:28, 8 September 2020 (UTC)

::::That list looks useful, thank you! I notice it catches a lot of Iowa towns with 1001-byte "NOTICE"s not to add non-notable people, which we could either sidestep by slightly increasing the cutoff to ~1005 bytes (LOL), or by, you know, "fixing" all those comments in some way (they don't need to be so long, and unless the articles are known vandalism targets, may not need to be there at all, and would probably be better as "edit notices" rather than hidden article content). As for distinguishing the two kinds of HTML comments, I can see the usefulness if someone wants to (if it's not too difficult to code), since accidentally unterminated comments are clearly always problems that need to be cleaned up. But I would still want to find excessively large deliberate comments, like the one which prompted this thread, because they still tend to represent material that either shouldn't be in the article or shouldn't be hidden, AFAICT. For example, on List of state leaders in 1759, the bulk of the article is commented-out as unsourced, but I imagine no-one is going to notice to reference it under that condition, so it would be better to either un-hide and {{tl|cn}}-tag the lines or {{tl|refimprove}}-tag the article, or move the unsourced lines to a "holding pen" on the article talk page. (I.e., it seems to me that we would still want to find large comments like that and do something about them, even if they were deliberate.) -sche (talk) 21:25, 8 September 2020 (UTC)

:::::(So, to be clear, I think there's no need to split the list or distinguish those two types of HTML comment if doing so would be difficult.) -sche (talk) 07:40, 14 September 2020 (UTC)

:The effect is not always as dramatic as feared. The largest obviously accidental comment is in Ali (introduced {{diff|Ali|prev|975998600|here}}) but it only obscures a few lines of text. After the second </ref>, the parser somehow seems to decide that the comment has gone on quite enough and resumes rendering the wikitext. Certes (talk) 22:31, 8 September 2020 (UTC)

Files with redundant FURs

Hello!

I would like a query which lists all file pages that use any of the templates at User:Jonteemil/sandbox (including all the redirects of the respective templates), and where the {article} parameter equals a page which the file is NOT used on. For example, :File:Atletico Madrid 2017 logo.svg would be listed because it's not used on Atlético Madrid (youth), for which it nevertheless has a FUR.

Thanks!Jonteemil (talk) 21:16, 15 September 2020 (UTC)

:This can't reliably be done with a database query; we can see which pages a template is used on, which pages a file is used on, which templates are used on a file page, and what pages are linked to from that file page, and link them all together; but we can't see template arguments (or the wikitext of the page). Any other links from the file description pages would show up as false positives.{{pb}}There's a bot (or at least there used to be one) that goes around and removes images from pages they don't have an NFUR for. Of necessity, it would have to have this data. If the botop is active, I'd suggest asking them to make this data available. —Cryptic 02:49, 16 September 2020 (UTC)

:: You're talking about {{u|JJMC89 bot}}, operated by {{u|JJMC89}} (who is still active). * Pppery * it has begun... 02:58, 16 September 2020 (UTC)

Short sections with specific headings in biographies

I'm not sure if this query is possible: I'm looking for short sections in biographies that carry the words "Controversies", "Legal", "Arrest", or "Conviction" in their subheading. (I'm not sure how best to define short; looking for sections with only a sentence or few sentences) –xenotalk 12:46, 26 August 2020 (UTC)

:Well, it's possible, and not even terribly hard if you define "short" as "less than (some specific number of characters)", but it's very, very inconvenient - you'd need to download a database dump and either install it locally or parse it manually, since the toolserver replicas don't have page contents. —Cryptic 12:53, 26 August 2020 (UTC)

::Thanks for the fast response Cryptic! –xenotalk 13:10, 26 August 2020 (UTC)

:I think it's impossible with Cirrus search too because it has no repetition count syntax like /[^=]{1,500}/. Beware that a subsection header doesn't end a section and make it "short"! Certes (talk) 15:37, 26 August 2020 (UTC)

:{{ping|xeno}} I gave this a first try, see [https://public.paws.wmcloud.org/User:HaeB/short%20sections/shortsections_enwiki_20200901.html here] for an initial result based on a partial dump. Let me know whether the format looks useful, and I can then try to produce the same for the entire set of articles (which will take a while to run).

:Note that the first entry (Ayaan Hirsi Ali#Controversies) seems to be an artifact - the section is empty due to mismatched section levels in the article.

:I have to say that I hesitated a little at first to volunteer for this request, having seen a few too many incidents where editors had gone a little overboard and removed sections with relevant controversies entirely (often citing and over-interpreting a certain essay). That said, I totally agree that there are a lot of BLPs that over-emphasize minor incidents - say a DUI arrest without conviction - with separate section headings. And I see that in the discussion thread that apparently gave rise to this request, you already offered [https://en.wikipedia.org/w/index.php?title=Special:Contributions&offset=20200815233822&limit=50&target=Xeno&namespace=0 a sample of edits] that by and large look great to me too, and engaged in a thoughtful discussion about such questions with Flyer22 Frozen, Schazjmd and Tenebrae. (Pinging them in case they too have thoughts about the initial results list, the cutoff value, or the search terms used.) Still, I have included a note on top of the results page to protect other editors from misunderstanding it as an invitation to remove all section headings on the list without individual evaluation.

:Regards, HaeB (talk) 22:52, 9 September 2020 (UTC)

::{{u|HaeB}}: Thank you for your diligence, balanced response, and sample results. One thing I notice initially is false positives like "Legal career" and the like. Meanwhile, it does find many of the small negative sections that run afoul of WP:UNDUE (like was fixed in Special:Diff/977618374). –xenotalk 23:10, 9 September 2020 (UTC)

::{{u|HaeB}}, impressive! I peeked at a random set of the results and in each case felt the headings would be better removed (and the content in one or two as well). As these are all BLPs, I think this is a worthwhile effort. {{u|xeno}}, if you take this on and would like any help, I would glad to contribute. Schazjmd (talk) 23:16, 9 September 2020 (UTC)

:::I'm happy to have help with it Schazjmd. Thank you for the offer! –xenotalk 23:17, 9 September 2020 (UTC)

::::I had to clean up :Ahmed Taleb Ibrahimi, one of my random checks, because it was an 11-year-old placeholder heading and I couldn't just leave it... Schazjmd (talk) 23:24, 9 September 2020 (UTC)

:::::No worries-have at it! I'm not sure when I'll have time to get to the sampler. And it would be good to get a different perspective on how to handle the sections. –xenotalk 23:29, 9 September 2020 (UTC)

::::::I've gone through all of the samples. You got to a few before me, some I fixed, some I left unchanged as the headings didn't seem inappropriate or undue. That was fun! Schazjmd (talk) 19:50, 10 September 2020 (UTC)

::{{u|HaeB}} This [https://en.wikipedia.org/w/index.php?title=Special:Contributions&dir=prev&offset=20200910155721&limit=33&target=Schazjmd&namespace=0 excellent series of edits] by {{u|Schazjmd}} really highlights the scope and depth of the BLP issues highlighted by the sample results you provided. –xenotalk 20:00, 10 September 2020 (UTC)

:::Thank you both for the feedback, glad this is useful!

:::It looks like there is no other input at the moment, so I will launch the full analysis shortly with the same search parameters. As mentioned earlier, this may take a while to complete.

:::By the way, I should be able to post the full results as a wiki page instead of off-wiki, assuming that this makes it easier to coordinate.

:::Regards, HaeB (talk) 04:03, 11 September 2020 (UTC)

::::{{Ping|xeno|Schazjmd}} I have posted the full result here. Feel free to move it to a different location, and also to edit the table if it makes processing easier. (Alternatively, it's also still available [https://public.paws.wmcloud.org/User:HaeB/short%20sections/shortcontrosections_enwiki_20200901.html here] as before.) Regards, HaeB (talk) 08:29, 15 September 2020 (UTC)

:::::Thank you, {{u|HaeB}}! I'm relieved that it didn't return thousands. {{u|Xeno}}, I'm going to start at the bottom and work my way up. Schazjmd (talk) 13:50, 15 September 2020 (UTC)

:::::Also less than I imagined - {{u|HaeB}}, thanks for your hard work! {{u|Schazjmd}} sounds good, happy editing! –xenotalk 14:03, 15 September 2020 (UTC)

::::::{{u|Xeno}}, and, I'm done. There were a surprising number of empty sections; apparently some editors think it's a standard heading that should always be included(?!?). I didn't even look at the "legal career" ones, as those are obviously not the issue. On a few, I could neither read nor translate the refs to make an informed decision so left those untouched. And a couple, I simply couldn't come up with a better way to handle the content, so didn't change those either. But I think I improved most. {{u|HaeB}}, thanks again for creating that page to work from. Schazjmd (talk) 16:58, 17 September 2020 (UTC)

Most-viewed non-vital pages

I'm interested in identifying potential candidates for WikiProject Vital Articles. Would it be possible to run a query that identifies the pages with the most views over the past year that do not have their talk pages categorized in :Category:Wikipedia vital articles or any of its subcategories? {{u|Sdkb}}talk 23:47, 24 September 2020 (UTC)

:{{ping|Sdkb}} {{done}} https://public.paws.wmcloud.org/User:AntiCompositeBot/NonVitalViews.ipynb Out of the top 500 most-viewed articles in 2019, 236 are not vital articles at any level. --AntiCompositeNumber (talk) 00:39, 25 September 2020 (UTC)

::Thanks! Whew, that's a lot of pop culture pages haha. {{u|Sdkb}}talk 02:07, 25 September 2020 (UTC)

Articles by number of <nowiki><ref></ref></nowiki> tags

Hello, and this may not be the best place for this question, but I'm looking for a way to get a list of all pages with 9 or fewer ref tags. Dawnseeker2000 15:13, 5 October 2020 (UTC)

Missing "the First" redirects

I would like a list of all pages which:

  • Have the word I in their titles
  • This word is either at the end of the page title, or is followed by a comma, a parenthesis, or one of the words "and" or "of"
  • The word I isn't the first word in the title; the previous word starts with a capital letter
  • There is no page with the title replacing the word I with "the First".

217.132.248.209 (talk) 00:46, 26 October 2020 (UTC)
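One possible shape for such a query, sketched here and entirely untested (the regular expressions would need tuning, and an unanchored REGEXP over every article title is a full table scan that may well time out):

```sql
USE enwiki_p;
SELECT p.page_title
FROM page AS p
WHERE p.page_namespace = 0
  -- "I" preceded by a capitalized word and followed by end-of-title,
  -- a comma, a parenthesis, or "and"/"of"
  AND p.page_title REGEXP '(^|_)[A-Z][^_]*_I($|,|_\\(|_and_|_of_)'
  AND NOT EXISTS (
      SELECT 1
      FROM page AS t
      WHERE t.page_namespace = 0
        AND t.page_title = REGEXP_REPLACE(
            p.page_title, '_I($|,|_\\(|_and_|_of_)', '_the_First\\1')
  );
```

Note that titles store spaces as underscores, which is why the patterns match on "_".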

Query to find all cite book templates

Looking for a query to extract the approx 2 million cite book templates, e.g. {{Cite book|title=A title|author=Author}}, from every article, similar to what you would get with insource:/[{]{2}cite book[^}]{2}/ (insource caps out at 10,000 results). Could regex from dumps, but they are unwieldy in size and update only once a month. For use in a script that will run through various language sites. Possible to solve with queries? -- GreenC 22:01, 31 October 2020 (UTC)

:{{re|GreenC}} There are 1,174,934 according to Special:Search/hastemplate:"Cite book". PetScan should give you the titles in downloadable form; do you also need the wikitext of the template usage? Certes (talk) 22:13, 31 October 2020 (UTC)

1,174,934 is the number of articles; there are 1.5 to 2 times as many actual instances of the template. I need the wikitext and article names per the example above. This will be used in an automated script; I prefer not to use a third-party tool like PetScan but to query the SQL replica database on Toolforge, i.e. a SQL query. For example, this is what a SQL query looks like for retrieving external links:

USE enwiki_p;
SELECT el_to, page_namespace, page_title
FROM externallinks JOIN page ON el_from = page_id
WHERE el_index LIKE "%//com.sports-reference.www.%"

Something like that, but for retrieving the wikitext and article names of book citations. FYI, I know exactly how to retrieve this information by other means, at least 3 different methods. But I do not know how to retrieve it via SQL, which is what I am interested in, and the purpose of this post. -- GreenC 23:27, 31 October 2020 (UTC)

:I was answering from the viewpoint of Quarry, which can't access wikitext and thus can't do what you need. Perhaps someone who knows Toolforge well can be more helpful. Certes (talk) 00:10, 1 November 2020 (UTC)

:We can get you the page names transcluding the template, but that's about it. Not even a count of transclusions per page. —Cryptic 01:32, 1 November 2020 (UTC)

::Cryptic, thanks, that's what I need to know. -- GreenC 02:01, 1 November 2020 (UTC)

:::For direct transclusions of {{tl|Cite book}}:
SELECT page_namespace, page_title
FROM templatelinks JOIN page ON page_id = tl_from
WHERE tl_namespace = 10 AND tl_title = 'Cite_book';
:::That won't show the ~4350 pages transcluding redirects to Template:Cite book; for those:
SELECT page_namespace, page_title, tl_namespace, tl_title
FROM templatelinks JOIN page ON page_id = tl_from
WHERE (tl_namespace, tl_title) IN
(SELECT page_namespace, page_title
 FROM redirect JOIN page ON page_id = rd_from
 WHERE rd_namespace = 10 AND rd_title = 'Cite_book');
:::—Cryptic 02:26, 1 November 2020 (UTC)

Medicine pages by protection level

Hi folks, I'm looking for a list of all medicine-related articles (i.e. their talk pages belong to :Category:All WikiProject Medicine articles) that also have some level of protection. If I could have that list separated by level of protection (or just the protections applied to each article marked on the list), that would be ideal. We had a discussion amongst medicine editors wondering how many med-articles have long-standing page protection that could be lowered, but without a list we're mostly just guessing in the dark. If there's an easy way I could do this myself, just point me in the right direction and I'll be out of your hair. Thanks a million! Ajpolino (talk) 17:09, 20 November 2020 (UTC)

:{{re|Ajpolino}} Could you use PetScan to intersect {{c|All WikiProject Medicine articles}} with appropriate subcategories of {{c|Wikipedia protected pages}}? A search on incategory: could also work, but PetScan lets you include subcategories more easily. Certes (talk) 17:16, 20 November 2020 (UTC)

::I can certainly try, but (and pardon my total technical ignorance) since one category is on articles and the other is on their talk pages, can someone cue me in on how to intersect those two in Petscan? Much appreciated. Ajpolino (talk) 17:25, 20 November 2020 (UTC)

::Looks like it's in the PetScan manual. Will give it a try and come back crying if I need help. Thanks! Ajpolino (talk) 17:30, 20 November 2020 (UTC)

:::For the information on the talk page, you'll need to use a template rather than a category, but that should be no problem here. Search wouldn't cope with that, so please ignore my second suggestion above. Beware of categories such as {{c|Biden family}}, which is in {{c|Wikipedia indefinitely semi-protected biographies of living people}} but contains unprotected pages. Having said that, my [https://petscan.wmflabs.org/?sitelinks_any=&templates_use_talk_yes=on&language=en&wikidata_prop_item_use=%C2%AEexp_filter&smaller=%C2%AEexp_filter&ns%5B0%5D=1&cb_labels_yes_l=1&edits%5Banons%5D=both&cb_labels_no_l=1&edits%5Bflagged%5D=both&search_query=%C2%AEexp_filter=&cb_labels_any_l=1&project=wikipedia&depth=6&links_to_all=%C2%AEexp_filter&search_max_results=500&categories=Wikipedia%20protected%20pages&templates_yes=WikiProject%20Medicine&sortby=ns_title&edits%5Bbots%5D=both&interface_language=en attempt] finds no pages. I vaguely recall a bug concerning templates on talk pages which could be remedied by using an older version of PetScan. Can anyone else shed light on this? Certes (talk) 17:29, 21 November 2020 (UTC)

::::{{ping|Certes}} You had some weird unregistered trademark text (I guess copy-pasted from somewhere; it's %C2%AEexp_filter ) in that query. However, when I run it I'm just about to a timeout here so... not sure what's going on. I do know that labs is in the middle of a database migration which might be affecting the query (unless you've been running others without issue). --Izno (talk) 20:51, 21 November 2020 (UTC)

:::::"®exp_filter" wasn't in my query, but "Link to a pre-filled form" somehow added it in three places. The real link (hand-crafted) is [https://petscan.wmflabs.org/?sortby=ns_title&cb_labels_any_l=1&interface_language=en&edits%5Banons%5D=both&templates_use_talk_yes=on&cb_labels_yes_l=1&templates_yes=WikiProject%20Medicine&search_max_results=500&depth=6&edits%5Bbots%5D=both&edits%5Bflagged%5D=both&project=wikipedia&cb_labels_no_l=1&categories=Wikipedia%20protected%20pages&ns%5B0%5D=1&language=en this]. Something is still wrong: Depth=5 instantly gives "No result for source categories" and Depth=6, as you say, takes too long. Certes (talk) 21:05, 21 November 2020 (UTC)

:I can't speak to your PetScan woes, but here's a direct query. —Cryptic 16:54, 23 November 2020 (UTC)

::Thank you {{U|Cryptic}}!! And thank you all for the PetScan help. Those queries are still coming up empty for me today, not sure if something is wrong on PetScan's end or my end. Anyway, thanks all for your time! Ajpolino (talk) 18:32, 23 November 2020 (UTC)
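For anyone reproducing this later, a direct query of the kind Cryptic mentions might look roughly like this (a sketch, not necessarily the query actually used; it matches articles whose talk pages carry the category and that have a row in page_restrictions):

```sql
USE enwiki_p;
SELECT article.page_title, pr_type, pr_level, pr_expiry
FROM page AS talk
JOIN categorylinks ON cl_from = talk.page_id
JOIN page AS article
  ON article.page_namespace = 0        -- the article matching this talk page
 AND article.page_title = talk.page_title
JOIN page_restrictions ON pr_page = article.page_id
WHERE talk.page_namespace = 1          -- Talk:
  AND cl_to = 'All_WikiProject_Medicine_articles'
ORDER BY pr_type, pr_level, article.page_title;
```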

WPCleaner large edits

WPCleaner for some time has been making messy edits expanding modules (example [https://en.wikipedia.org/w/index.php?title=2019_World_Women%27s_Handball_Championship&diff=986669111&oldid=973623184]). Is there a way to query for large revisions made with the WPCleaner tag for some interesting timeline? Looking for large additions I think, as far as back as we can go. --Izno (talk) 22:37, 28 November 2020 (UTC)

:{{re|Izno}} I've had a go in quarry:query/49440, guessing that 10,000 bytes increase is a suitable threshold. 23 edits found, including the example above. Certes (talk) 01:26, 29 November 2020 (UTC)

::Certes, thanks. quarry:query/50137 at 2k change manages not to time out at just about 10 minutes runtime and grabs another couple dozen more (though seemingly some false positives even in the more conservative set). I'll take my 10 minute query and go from there. --Izno (talk) 04:53, 29 November 2020 (UTC)
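For the record, the shape of these queries (a sketch; the linked Quarry pages are the actual versions) is to join each tagged revision against its parent and compare lengths:

```sql
USE enwiki_p;
SELECT r.rev_id, r.rev_timestamp, page_namespace, page_title,
       r.rev_len - parent.rev_len AS bytes_added
FROM revision AS r
JOIN revision AS parent ON parent.rev_id = r.rev_parent_id
JOIN page ON page_id = r.rev_page
JOIN change_tag ON ct_rev_id = r.rev_id
JOIN change_tag_def ON ctd_id = ct_tag_id
WHERE ctd_name = 'WPCleaner'               -- edits tagged as made with the tool
  AND r.rev_len - parent.rev_len > 10000   -- large additions only
ORDER BY bytes_added DESC;
```

Lowering the threshold widens the net but, as noted above, pushes the runtime toward the limit.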

Looking for broken templates containing URLs

{{resolved}}

I am requesting a query that finds all articles (article space only) containing the following regex:

\{\{[^\}<\|]*[Hh]ttps?:

This query finds a subset of articles with broken templates that do not show up in Special:WantedTemplates. I have been using insource searching, but it always times out with about 20 results. When I fix those 20 articles and repeat the search, 20 more new articles turn up. I think there may be hundreds of these articles, since they do not turn up in any categories or reports.

For bonus points, if the matching string from the article could be included in the report, that would be extra helpful.

Feel free to place the article list in User:Jonesey95/sandbox. Thanks in advance! – Jonesey95 (talk) 03:04, 2 December 2020 (UTC)

:Is the regex correct? When it's simplified a bit, and filtered only to articles containing http so it doesn't time out, it returns a quarter million results:

:* insource:http insource:/\{\{[^\|]*https?:/ [https://en.wikipedia.org/w/index.php?title=Special%3ASearch&search=insource%3Ahttp+insource%3A%2F%5C%7B%5C%7B%5B%5E%5C%7C%5D*https%3F%3A%2F&go=Go&ns0=1]

:— Preceding unsigned comment added by GreenC (talkcontribs)

::Thanks for the question. Removing the "\}" and the "<" from the regex gives thousands of false positives, articles that do not have errors of this type. The regex I pasted above is the one I want. With the exception of erroneous template code inside of comments, it gives me the articles I am looking for, but it times out. – Jonesey95 (talk) 05:20, 2 December 2020 (UTC)

Here are some sample errors that this regex finds (one error on each line):

{{url=https://www.haileybury.com/about-haileybury/our-story/history-of-haileybury/}}

{{url=https://bittman.blogs.nytimes.com/2012/07/11/fao-yields-to- ...

{{cite web]url=https://isearch.asu.edu/profile/932251 |title=...

{{Forschungszentrum Karlsruhe: [http://www-proxy2.fzk.de/fzk/groups/s/documents/internetdokument/id_053468 ...

{{[https://www.plantextractssr.com/product/fruit-vegetable-powder/seabuckthorn-powder/ ...

{{url=https://twitter.com/jeremiahfraites/status ...

{{cite web url=http://www.longislandschools.com/schools/long- ...

{{Website:https://inhomecareassistance.com/

None of these are valid templates. – Jonesey95 (talk) 05:24, 2 December 2020 (UTC)

:Repeating a search which times out risks repeatedly checking the same subset of pages, so once you've done enough fixes you may see no more errors even though they exist. You could limit your search to each initial letter in turn, e.g. insource:/\{\{[^\}<\|]*[Hh]ttps?:/ prefix:A. prefix:1 and 2 are worth checking for year articles, and you'd miss a few accented initials and weird titles, but it's probably an improvement. I'd expect a few hundred cases in total. Certes (talk) 10:29, 2 December 2020 (UTC)

::Oh, prefixing is a cute way to limit search results (though it should come before, not after, of course). --Izno (talk) 19:09, 2 December 2020 (UTC)

:::The prefix has to come after, because it has no delimiter. A search for prefix:Open sesame only returns titles beginning with "Open sesame". To find the Open-air Water Museum article, which mentions sesame, requires sesame prefix:Open instead. Certes (talk) 19:43, 2 December 2020 (UTC)

::::Hey, that prefix thing works great. I set up a group of 36 searches in one of my user pages so that I can click on any one of them to look for these errors in a subset of WP articles. Thanks {{U|Certes}}! – Jonesey95 (talk) 20:09, 2 December 2020 (UTC)

::::I would guess prefix, like the other parameters, supports quotation marks to delimit its text. Does it not? --Izno (talk) 21:01, 2 December 2020 (UTC)

:::::It does, but support for quotes is undocumented, so I wouldn't rely on it. Certes (talk) 21:37, 2 December 2020 (UTC)

:::::Experiment suggests that quotes are supported only around the entire prefix. prefix:"Open" finds titles beginning with Open, but prefix:"C" Is only finds titles which begin with a quotation mark followed by C, etc. Certes (talk) 23:31, 2 December 2020 (UTC)

[edit conflict] Following Certes's excellent idea, this is what I ran using [https://github.com/greencardamom/Wikiget wikiget] with [https://github.com/greencardamom/BotWikiAwk/blob/master/lib/library.awk library.awk]:

:awk -ilibrary 'BEGIN{c = split("abcdefghijklmnopqrstuvwxyz1234567890#$%^&*()-+=~`'"'"'?:<>[]{}\"", A, ""); for(i=1;i<=c;i++) {print sys2var("wikiget -a \"insource:/\\{\\{[^\\} < \\|]*[Hh]ttps?:/ prefix:" shquote(A[i]) "\"") } }' > out.txt

It produced 80 articles. If you want I'll drop them into User:Jonesey95/sandbox. -- GreenC 20:11, 2 December 2020 (UTC)

:Actually it's producing fewer results than it should, probably due to maxlag being set too low. Anyway, I'll leave this for now unless you want to try again, but it looks like your method is working fine. -- GreenC 20:19, 2 December 2020 (UTC)

::Thanks for going above and beyond, and for the sandbox dump. I got 163 articles (after I fixed 100+ yesterday) from 26 letter-based prefix searches (and, suspiciously, zero from "prefix:1" and other numbers, which makes me think that they do not work). – Jonesey95 (talk) 21:01, 2 December 2020 (UTC)

:::I found one hit from prefix:1 but it's no longer there, so you may have fixed it. I forget the exact title. Certes (talk) 21:37, 2 December 2020 (UTC)

The reason you're getting more searches (like you already had) instead of queries (like you asked for) is that they really aren't feasible, since queries don't have access to wikitext. What they can do is look for well-formed template invocations, even if they're invocations of redlinked "templates" like Template:url=https://youtu.be/bunchofletters, though that excludes some of your examples (the ones with square brackets, and I suppose ones that don't ever end in }}). It's also ruinously slow without reasonably narrow prefixes for the template name. Queries similar to quarry:query/50255 are about the best that can be done. —Cryptic 20:45, 2 December 2020 (UTC)
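
A toy reconstruction of the approach Cryptic describes, using an in-memory SQLite stand-in for the two relevant replica tables (the schemas are simplified and the sample titles are made up; the real query, like quarry:query/50255, runs against the enwiki replica):

```python
# Find "templates" that are transcluded somewhere but have no page in the
# Template namespace (10) -- i.e. well-formed invocations of redlinks such
# as {{url=https://...}}. Simplified sqlite mock of the replica tables.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE templatelinks (tl_from INTEGER, tl_namespace INTEGER, tl_title TEXT);
    CREATE TABLE page (page_id INTEGER, page_namespace INTEGER, page_title TEXT);
""")
con.executemany("INSERT INTO templatelinks VALUES (?, ?, ?)", [
    (1, 10, "Cite_web"),                             # real template
    (2, 10, "Url=https://youtu.be/bunchofletters"),  # malformed invocation
])
con.execute("INSERT INTO page VALUES (100, 10, 'Cite_web')")

redlinked = con.execute("""
    SELECT DISTINCT tl_title
    FROM templatelinks
    LEFT JOIN page ON page_namespace = tl_namespace AND page_title = tl_title
    WHERE tl_namespace = 10
      AND page_id IS NULL           -- no such template page: a redlink
      AND tl_title LIKE '%https%'   -- invocations that swallowed a URL
""").fetchall()
```

As noted above, this only catches invocations that parse as templates; the examples with square brackets, or ones never closed by }}, never reach templatelinks at all.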

Most active pending changes reviewers (debug)

I think I'm pretty close with this one, but it's currently hanging. I need help debugging it. I have two different versions, [https://quarry.wmflabs.org/query/52310 a subquery approach] and [https://quarry.wmflabs.org/query/52311 a group by approach]. 1) Where's the mistake in my SQL? 2) Any tips for debugging Quarry when it gets stuck on "running" like in this situation? Normally I could debug something like this, but Quarry isn't throwing any human-readable errors that I can tell. I added LIMIT 10 temporarily to make sure the sample size isn't causing the timeout. Thanks. –Novem Linguae (talk) 10:55, 9 February 2021 (UTC)

:The second query link above finished after 25 minutes, and gave me enough hints to fix it. Here's the [https://quarry.wmflabs.org/query/52372 new, working query]. Looks like there's an actor table in between users and logging, gluing it together. And there's also a log_action field that is similar to log_type. Good learning experience. –Novem Linguae (talk) 02:33, 11 February 2021 (UTC)
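
The fix described above (joining through actor, and filtering on log_type) can be illustrated with a small in-memory SQLite mock; the table shapes are trimmed down and the sample data is invented:

```python
# Count pending-changes reviews per user: logging rows no longer carry a
# user id directly, so the join goes logging -> actor -> user.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE logging (log_type TEXT, log_action TEXT, log_actor INTEGER);
    CREATE TABLE actor (actor_id INTEGER, actor_user INTEGER, actor_name TEXT);
    CREATE TABLE user (user_id INTEGER, user_name TEXT);
""")
con.executemany("INSERT INTO actor VALUES (?, ?, ?)",
                [(1, 11, "Alice"), (2, 12, "Bob")])
con.executemany("INSERT INTO user VALUES (?, ?)", [(11, "Alice"), (12, "Bob")])
con.executemany("INSERT INTO logging VALUES (?, ?, ?)", [
    ("review", "approve", 1), ("review", "approve", 1),
    ("review", "unapprove", 2), ("delete", "delete", 2),  # not a review
])

top = con.execute("""
    SELECT user_name, COUNT(*) AS reviews
    FROM logging
    JOIN actor ON actor_id = log_actor   -- actor glues logging to user
    JOIN user ON user_id = actor_user
    WHERE log_type = 'review'
    GROUP BY user_name
    ORDER BY reviews DESC
    LIMIT 10
""").fetchall()
```

The same shape (with the replica's real column names and indexes) is what the working quarry:query/52372 linked above does at scale.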

Missing editnotices

I'd like to add {{tl|People list editnotice}} to pages that need it and are currently missing it. Would it be possible to run a query that returns the redlinked editnotices for pages that meet all of the following conditions?

  1. Member of :Category:Featured lists
  2. Member of :Category:Lists of people or a subcategory
  3. Has {{tl|Dynamic list}}
  4. Does not have an editnotice or groupnotice

Regards, {{u|Sdkb}}talk 22:31, 13 March 2021 (UTC)

:{{u|Sdkb}}, I started writing this one. If I wrote it right, there appear to be no pages that are in both Category:Featured lists and Category:Lists of people. [https://en.wikipedia.org/w/index.php?search=incategory%3A%22Featured+lists%22+incategory%3A%22Lists+of+people%22&title=Special:Search&profile=advanced&fulltext=1&advancedSearch-current=%7B%7D&ns0=1 Search for conditions 1 AND 2 (no results)]. [https://en.wikipedia.org/w/index.php?search=incategory%3A%22Featured+lists%22+incategory%3A%22Lists+of+people%22+hastemplate%3A%22Dynamic+list%22&title=Special:Search&profile=advanced&fulltext=1&advancedSearch-current=%7B%7D&ns0=1 Search for conditions 1 AND 2 AND 3 (no results)]. Let me know if I got something wrong or if you have additional feedback. –Novem Linguae (talk) 05:52, 14 March 2021 (UTC)

::{{u|Novem Linguae}}, pretty much everything in :Category:Lists of people is subcategorized. Is that what's causing the issue? {{u|Sdkb}}talk 05:55, 14 March 2021 (UTC)

:::{{u|Sdkb}}, well spotted. I changed "incategory:" to "deepcategory:", but am now running into a [https://en.wikipedia.org/w/index.php?search=deepcategory%3A%22Featured+lists%22+deepcategory%3A%22Lists+of+people%22+hastemplate%3A%22Dynamic+list%22&title=Special:Search&profile=advanced&fulltext=1&advancedSearch-current=%7B%7D&ns0=1 "too many categories" error]. When I'm not on a laptop (probably tomorrow), and if somebody doesn't do it before me, I'll go download the 17GB offline database and do an SQL query on the `text` table, which should get around this limitation. By the way, can you elaborate a bit on "does not have an editnotice or groupnotice"? Am I checking for certain wikicode, a certain subpage, etc? –Novem Linguae (talk) 06:08, 14 March 2021 (UTC)

::::{{u|Novem Linguae}}, much thanks! For the editnotice, that's the tricky part. You're checking e.g. for List of polio survivors whether Template:Editnotices/Page/List of polio survivors exists. I doubt there's a way to check for groupnotices, but those should be rare. {{u|Sdkb}}talk 06:22, 14 March 2021 (UTC)

:::::Petscan is a good tool for this sort of search. [https://petscan.wmflabs.org/?minlinks=&templates_yes=Dynamic%20list&sortby=title&depth=3&edits%5Bflagged%5D=both&cb_labels_no_l=1&categories=Featured%20lists%0ALists%20of%20people&cb_labels_any_l=1&active_tab=tab_templates_n_links&edits%5Bbots%5D=both&since_rev0=&project=wikipedia&ns%5B0%5D=1&cb_labels_yes_l=1&search_max_results=500&language=en&edits%5Banons%5D=both&interface_language=en&&doit= Here's a result that is pretty close to what you asked for.] You might need to fiddle with the category depth (with the usual caveats about en.WP's category system being somewhat confused). I didn't include the fourth criterion, but if edit notices always include a template, they should be easy to include in the search criteria. – Jonesey95 (talk) 15:52, 14 March 2021 (UTC)

::::::{{u|Jonesey95}}, thanks for mentioning Petscan, that's exactly what was needed for conditions 1-3. For condition 4, I used Quarry and PHP array_diff. {{u|Sdkb}}, here's the results. array ( 0 => 'Editnotices/Page/List_of_Alpha_Phi_Alpha_brothers', 1 => 'Editnotices/Page/List_of_Athabasca_University_people', 2 => 'Editnotices/Page/List_of_Benet_Academy_alumni', 3 => 'Editnotices/Page/List_of_Boston_Latin_School_alumni', 5 => 'Editnotices/Page/List_of_De_La_Salle_University_people', 6 => 'Editnotices/Page/List_of_Georgia_Institute_of_Technology_alumni', 7 => 'Editnotices/Page/List_of_Georgia_Institute_of_Technology_athletes', 9 => 'Editnotices/Page/List_of_Major_League_Baseball_players_with_unidentified_given_names', 10 => 'Editnotices/Page/List_of_Missouri_University_of_Science_and_Technology_alumni', 12 => 'Editnotices/Page/List_of_Phi_Kappa_Psi_brothers', 14 => 'Editnotices/Page/List_of_Tau_Kappa_Epsilon_brothers', 15 => 'Editnotices/Page/List_of_United_States_Air_Force_Academy_alumni', 16 => 'Editnotices/Page/List_of_United_States_Military_Academy_non-graduate_alumni', 17 => 'Editnotices/Page/List_of_United_States_Naval_Academy_alumni', 19 => 'Editnotices/Page/List_of_University_of_Waterloo_people', 20 => 'Editnotices/Page/List_of_Washington_&_Jefferson_College_alumni', 21 => 'Editnotices/Page/List_of_Washington_College_alumni', 22 => 'Editnotices/Page/List_of_Wilfrid_Laurier_University_people', 23 => 'Editnotices/Page/List_of_academicians_educated_at_the_United_States_Military_Academy', 25 => 'Editnotices/Page/List_of_deaths_from_drug_overdose_and_intoxication', 26 => 'Editnotices/Page/List_of_engineers_educated_at_the_United_States_Military_Academy', 34 => 'Editnotices/Page/List_of_human_Sesame_Street_characters', 35 => 'Editnotices/Page/List_of_mathematicians,_physicians,_and_scientists_educated_at_Jesus_College,_Oxford', 40 => 'Editnotices/Page/List_of_politicians,_lawyers,_and_civil_servants_educated_at_Jesus_College,_Oxford', 41 => 
'Editnotices/Page/List_of_racing_cyclists_and_pacemakers_with_a_cycling-related_death', 42 => 'Editnotices/Page/List_of_sportspeople_educated_at_Texas_Tech_University', 43 => 'Editnotices/Page/List_of_sportspeople_educated_at_the_United_States_Military_Academy')Novem Linguae (talk) 19:28, 14 March 2021 (UTC)
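
The array_diff step above amounts to a set difference between the editnotice titles the lists should have and those that already exist. A Python sketch with sample titles from this thread (on the real wiki, existing would come from a Quarry query over the page table in the Template namespace):

```python
# Per-page editnotices live at Template:Editnotices/Page/<full page title>,
# so condition 4 reduces to "expected minus existing". Sample data only.
def editnotice_title(article: str) -> str:
    """Database-style title of the per-page editnotice for an article."""
    return "Editnotices/Page/" + article.replace(" ", "_")

candidates = [  # stand-ins for the Petscan output for conditions 1-3
    "List of polio survivors",
    "List of Washington College alumni",
]
expected = {editnotice_title(t) for t in candidates}
existing = {"Editnotices/Page/List_of_polio_survivors"}  # mock Quarry result
missing = sorted(expected - existing)
```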

:::::::Much thanks; did the run! {{u|Sdkb}}talk 18:40, 15 March 2021 (UTC)

Categories with most orphaned articles

In the past, I used https://www.toolserver.org/~lvova/cgi-bin/go.sh, but now it's dead. In Quarry, I found https://quarry.wmflabs.org/query/14172, but I was unable to make it work now, due to changes in the db schema. Could anyone help? Thanks! Bennylin (talk) 16:38, 16 March 2021 (UTC)

:There's a number of deeply suspicious-looking things about that query, such that I'm surprised that it ever produced useful data. quarry:query/53312, or quarry:query/53315 if you only care about counts in visible categories. —Cryptic 08:28, 17 March 2021 (UTC)
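
For readers who can't open the linked queries: the general shape of a "categories with most orphaned articles" count is a self-join on categorylinks, since orphans are tagged into a tracking category (Category:All orphaned articles on en.WP). A simplified in-memory SQLite mock, with invented data; the real queries run on the enwiki replica and differ in details:

```python
# For each category, count how many of its members are also tagged as
# orphans. cl_from = member page id, cl_to = category title (toy schema).
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE categorylinks (cl_from INTEGER, cl_to TEXT)")
con.executemany("INSERT INTO categorylinks VALUES (?, ?)", [
    (1, "All_orphaned_articles"), (1, "1990s_singles"),
    (2, "All_orphaned_articles"), (2, "1990s_singles"),
    (3, "1990s_singles"),                      # not an orphan
])

counts = con.execute("""
    SELECT cats.cl_to AS category, COUNT(*) AS orphans
    FROM categorylinks AS orphan_tag
    JOIN categorylinks AS cats ON cats.cl_from = orphan_tag.cl_from
    WHERE orphan_tag.cl_to = 'All_orphaned_articles'
      AND cats.cl_to <> 'All_orphaned_articles'
    GROUP BY cats.cl_to
    ORDER BY orphans DESC
""").fetchall()
```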