Wikipedia:Bot requests/Archive 55

{{Automatic archive navigator}}

[[WP:Centijimbos]]

Currently this list must be manually updated, which is a tedious time-waster; it seems like the perfect task for a purpose-built little bot. I can't see it being very difficult to create one that measures the data and updates it. Can we have one for the page? — Preceding unsigned comment added by Doc9871 (talkcontribs)

:If you ask me, this is quite useless and an unnecessary strain on servers as it would have to scan all users, from what I know. So I'm going to say {{botreq|possible}} but {{botreq|advertise}}.—cyberpower ChatOnline 17:37, 28 May 2013 (UTC)

:*I'm all for humour pages, but I'm not sure why we should have a bot for a humorous page, to be honest. Lukeno94 (tell Luke off here) 18:16, 28 May 2013 (UTC)

:By "all users" do you mean all the users on the list? There's really not that many; but I know zero about programming bots. Doc talk 01:49, 29 May 2013 (UTC)

::That would make the list even more useless. He means listing all top users, which would require a scan of every user.—cyberpower ChatOffline 03:36, 29 May 2013 (UTC)

:I'm asking you, Cyberpower, specifically. What "he" means is... you? If you're speaking in the third person, I understand. The uselessness of the list itself is not what I'm looking to discuss. Is it feasible to set up a small bot designed only for this page or not? If not, we'll just have to keep updating it manually. Doc talk 03:44, 29 May 2013 (UTC)

::I have no qualms with this list. I have no qualms with a bot to update it, either. It's one query, for goodness' sakes. Theopolisme (talk) 03:52, 29 May 2013 (UTC)

:::{{replyto|Theopolisme}} What queries would you use to generate this list?—cyberpower ChatOffline 04:18, 29 May 2013 (UTC)

::::From [https://svn.wikimedia.org/viewvc/mediawiki/trunk/extensions/Watchers/Watchers_body.php?view=markup&pathrev=62468 the watchers extension], it looks like

/* FROM */ 'watchlist',

/* SELECT */ 'count(wl_user) AS num',

::::is supposed to do the trick, although I may be mistaken. Theopolisme (talk) 04:34, 29 May 2013 (UTC)
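As an illustration only, here is a minimal sketch of what the assembled query might look like when run against a database replica; the connection details and the user page title are placeholders, while the table and column names come from MediaWiki's watchlist schema:

<syntaxhighlight lang="python">
# Hedged sketch: count how many users watch a given user page on a replica database.
import os
import MySQLdb  # any MySQL DB-API driver would do

conn = MySQLdb.connect(db="enwiki_p",
                       read_default_file=os.path.expanduser("~/.my.cnf"))
cursor = conn.cursor()
cursor.execute(
    "SELECT COUNT(wl_user) AS num FROM watchlist "
    "WHERE wl_namespace = %s AND wl_title = %s",
    (2, "Jimbo_Wales"),  # namespace 2 = User:, title with underscores and no prefix
)
print(cursor.fetchone()[0])
</syntaxhighlight>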

:::::Ah, a database query. But you are still making a lot of queries to the database, one for every user.—cyberpower ChatOnline 12:30, 29 May 2013 (UTC)

::::::Yeah, although if you hurry you can get to Labs and they'll all be lightning fast ...don't get me started on one of my Labs/Toolserver rants... Theopolisme (talk) 14:28, 29 May 2013 (UTC)

:::::::True. I've got adminstats ready to run over there but I need access to a restricted table to continue.—cyberpower ChatOffline 14:33, 29 May 2013 (UTC)

:There is also no mechanism in place that I'm aware of that automatically includes editors on this list. It's voluntary. Couldn't a bot be programmed to check only the users on the list? Again: this is not my area, which is why I'm here on this page. Doc talk 03:56, 29 May 2013 (UTC)

::Like I said. It's {{botreq|possible}} to create.—cyberpower ChatOffline 04:18, 29 May 2013 (UTC)

:If it's not going to drag down the servers, and if it's not going to interfere with any user not listed on the page: why not? It's a bot. I was doing the same thing manually. It cannot be that hard to make a bot for this. Call it "Centijimbo1.0" or whatever you want. Doc talk

  • I'd certainly support a bot here, one that checks a list of users who have put their name down, runs the database query for those users and then updates the list. cyberpower, you say this needs wider discussion, how wide exactly? What would you need to see? WormTT(talk) 10:18, 3 June 2013 (UTC)
  • It doesn't need to be all that wide. Just enough to see that my views on using server resources to update a humorous page are not redundant and are in fact supported by a good chunk of the community.—cyberpower ChatLimited Access 13:09, 3 June 2013 (UTC)
  • :You keep using that word. I do not think it means what you think it means. Writ Keeper  14:03, 3 June 2013 (UTC)
  • ::I keep using that word? I only see myself using it once here. Anyways, redundant was the wrong word. I meant useless.—cyberpower ChatOffline 14:19, 3 June 2013 (UTC)
  • :::Oh, come on! You've never seen [http://www.youtube.com/watch?v=YIP6EwqMEoE The Princess Bride]? :) Writ Keeper  14:29, 3 June 2013 (UTC)
  • ::::Why is this RfC-worthy? I am seeing the specter of such a thing in Cyberpower's above remark. If it's going to drain the servers on pointless bureaucracy, I'm going to regret this very simple request for sure. Doc talk 05:23, 6 June 2013 (UTC)
  • {{BOTREQ|coding}} now--I agree that an RfC is excessive, so I'm just writing the script to do this now. Theopolisme (talk) 05:32, 6 June 2013 (UTC)
  • {{done}}, check out User:Theo's Little Bot/centijimbo. If people add themselves there now (as I've clarified at WP:CJ), the bot will resort/recalculate the page every week. Enjoy! If you're interested in the source code, see [https://github.com/theopolisme/theobot/blob/master/centijimbo.py]. Theopolisme (talk) 07:33, 6 June 2013 (UTC)
  • :My faith in "the process" has been reinforced greatly! Thank you so much! Doc talk 07:44, 6 June 2013 (UTC)

Adding a template to the top of a list of talk pages

See this discussion. In a nutshell, we at WikiProject Medicine want to add a simple template to the top of all the talk pages that transclude certain infoboxes. It should run periodically to check for new articles; so, obviously, it should only add the template if it doesn't already exist. (Alternatively, it could keep a list of pages that it's already updated in a static store somewhere.)

I haven't written a bot before, but looking at Perl's [http://search.cpan.org/~lifeguard/MediaWiki-Bot-5.005006/lib/MediaWiki/Bot.pm MediaWiki::Bot] modules, I think I could probably write this in a couple of hours. But it strikes me that this must be a very common pattern, and I'm wondering if there isn't a generic bot that can be used, or if there is an example someone could point me to that I could just copy-and-hack.

Thanks! Klortho (talk) 01:23, 6 June 2013 (UTC)

:Hey Klortho, check out AnomieBOT. There are instructions at the top ("Before requesting a WikiProjectTagger run, please read the following"). Cheers, Theopolisme (talk) 01:36, 6 June 2013 (UTC)

Update article class

Many project pages list articles with their "class" ratings – such as Vital articles. It would be nice to have a bot keep these up-to-date.

Some points:

  • Many pages might not want their lists updated. Maybe a parameter should be added to {{tl|icon}} with {{para|bot|yes}}.
  • Some articles have two symbols, e.g. {{icon|FFA}}{{icon|GA}}. This might need a special {{tl|bot-icon}} template of its own.
  • Different WikiProjects might assign different ratings, so a method of determining which one to use is needed.

Actually, I'd be willing to create this myself, but I don't know the first thing about making bots... :-( Ypnypn (talk) 03:41, 5 June 2013 (UTC)

:{{ping|Ypnypn}} Sounds like a great idea. Do you mean you're requesting a bot to update WP:Vital articles? What other lists are you talking about? Theopolisme (talk) 01:44, 6 June 2013 (UTC)

::There are other lists of articles that might want this; see Featured topic questions, WikiProject Jewish history/Articles by quality, etc. Ypnypn (talk) 15:43, 6 June 2013 (UTC)

:::Ah, okay. Actually, Wikipedia:Bots/Requests for approval/TAP Bot 3 is currently open for this very task! You might want to ask there if it could be extended to the other pages you mention. Theopolisme (talk) 16:51, 6 June 2013 (UTC)

GFDL relicensing

User:B left the following message at WP:VPT, but nobody's responded to it.

Can we get a bot to go through and substitute the "migration" flag of images with the {{tl|GFDL}} template? Right now, today, if someone uploads a {{tl|GFDL}} image, it shows the {{tl|License migration}} template. Now that it is four years after the license migration, I think it makes sense to change the default to be not eligible for migration. But in order to change the default, we would need a bot to do a one-time addition of something like migration=not-reviewed to all existing uses of the GFDL template. So if you have {{tl|GFDL}} or {{tlx|self|GFDL}}, the template would add "migration=not-reviewed" to the template. After that is done, we can make the default "not-eligible" instead of the default being pending review. --B (talk) 00:16, 2 June 2013 (UTC)
I thoroughly agree with this idea; the template needs to be modified so that it defaults to migration=not-eligible, and we need to mark existing examples as needing review. This wouldn't be the first bot written to modify tons of license templates; someone's addition of "subject to disclaimers" to {{tl|GFDL}} (several years ago) forced Commons to maintain two different GFDL templates, and they had to replace many thousands of {{tl|GFDL}} transclusions. Nyttend (talk) 03:55, 4 June 2013 (UTC)

:I'm willing to do so, but I'm not sure if this needs wider discussion, per WP:NOCONSENSUS. Maybe you should leave a note at WP:VPR? Also, for clarification, it would be adding {{para|migration}}not-reviewed to every transclusion of {{tl|GFDL}}, provided that there was no value set for {{para|migration}}?  Hazard-SJ  ✈  05:01, 5 June 2013 (UTC)

::Requested at WP:VPR. I quoted your "for clarification" sentence and said basically "Here's what's being requested and the proposed solution; are both okay?" Thanks! Nyttend (talk) 02:14, 7 June 2013 (UTC)
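To illustrate the edit being discussed, here is a rough sketch of the per-page change using mwparserfromhell (just one way to do it, not necessarily how any approved bot would be written); it only touches {{tl|GFDL}} and {{tlx|self|GFDL}} calls that have no {{para|migration}} value set:

<syntaxhighlight lang="python">
# Hedged sketch: add |migration=not-reviewed to GFDL license templates lacking it.
import mwparserfromhell

def add_migration_flag(wikitext):
    code = mwparserfromhell.parse(wikitext)
    for tpl in code.filter_templates():
        name = tpl.name.strip().lower()
        if name == "gfdl" and not tpl.has("migration"):
            tpl.add("migration", "not-reviewed")
        elif name == "self" and not tpl.has("migration"):
            # {{self|GFDL|...}} passes the licenses as positional parameters
            licenses = [str(p.value).strip().lower() for p in tpl.params if not p.showkey]
            if "gfdl" in licenses:
                tpl.add("migration", "not-reviewed")
    return str(code)
</syntaxhighlight>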

FUR addition bot

The request is for an automated bot that scans through :Category:Non-free images for NFUR review and attempts to automatically add a preformatted NFUR rationale when one is not present.

This bot would not apply to all non-free content and would be limited initially to {{tl|Non-free album cover}}, {{tl|Non-free book cover}}, {{tl|Non-free video cover}}, and {{tl|Non-free logo}} tagged media where the image is used in an infobox. Essentially, this bot would do automatically what I've been doing extensively in a manual fashion with FURME.

In adding the NFUR, the bot would also (having added a rationale) add the |image has rationale=yes param, as well as leaving an appropriate note that the rationale was autogenerated.

By utilising a bot to add the types of rationale concerned automatically, contributor and admin time can be freed to deal with more complex FUR claims, which do not have easy pre-formatted rationales or which require a more complex explanation.

Sfan00 IMG (talk) 14:28, 4 June 2013 (UTC)

:BAD IDEA. Rationales should have human review. Otherwise you get articles with 10 different covers, bot approved. Werieth (talk) 15:09, 4 June 2013 (UTC)

::I'm usually against templated FURs, but the narrow conditions being discussed here (e.g. cover art or logos in infoboxes, adding a FUR where one is absolutely not present) seem reasonable. I would ask that some parameter be added to the page that makes it clear a bot added the rationale and that a human review has thus not affirmed it, as well as language on the page that this action has been performed by the bot and that, if the editor can improve on it, they should. I agree with the general idea that this moves a number of trivially-fixed cases out of the human chain of NFC review to allow focus on more complex/non-standard cases, though per Werieth's concern this shouldn't be seen as "okay, the image passes NFCC" rubber stamping that could be implied from this. --MASEM (t) 15:18, 4 June 2013 (UTC)

::: I've got no objections to the bot-added tags categorising auto-generated rationales, so that they can still be reviewed by a human. Limiting this to infobox use only would be appropriate, as the 'appropriateness' of use elsewhere cannot be determined automatically. Masem, did you have a specific wording in mind? Sfan00 IMG (talk) 15:31, 4 June 2013 (UTC)

: Would something like {{tl|Non-free_autogen}} be acceptable? Sfan00 IMG (talk) 16:31, 4 June 2013 (UTC)

:: Something like that, but I would expand it more to say the bot's name (and task if necessary), that the image is believed to be tagged as it meets NFCI#1 or #2 (covers vs logos), but that this does not assure that NFCC is met (not a free pass), and encourage editors to expand the rationale. It should also place the image in a maintenance category related to the bot-tagging - that won't be a cleanup category, though users would be free to go through, review rationales, and strip out the template if they can confirm the templated rationale is fine. --MASEM (t) 16:47, 4 June 2013 (UTC)

:: Feel free to expand it then, this is a wiki :) Sfan00 IMG (talk) 16:51, 4 June 2013 (UTC)

  • I would also stipulate that the file be the only non-free file on the page for a bot to create the rationale. Werieth (talk) 17:19, 4 June 2013 (UTC)

: That seems a reasonable initial limitation, given this is intended to be for non-free content in infoboxes. Sfan00 IMG (talk) 17:24, 4 June 2013 (UTC)

:Also, the media concerned must be used in no more than one article, mainly because generating auto rationales for multi-page uses gets more complex. Sfan00 IMG (talk) 17:31, 4 June 2013 (UTC)

  • {{ping|Masem|Werieth|Sfan00 IMG}} Shall I begin coding this? Theopolisme (talk) 06:35, 7 June 2013 (UTC)

Here's my general outline of what it looks like the bot will need to do:

 For all files in Category:Non-free images for NFUR review:
   If the image meets the following constraints:
     - tagged with {{Non-free album cover}}, {{Non-free book cover}}, {{Non-free video cover}}, or {{Non-free logo}}
     - only used in one article
     - the file is the only non-free file in the article
   Then, on the image page:
     - add some fair-use rationales to {{Non-free use rationale}} or {{Non-free use rationale 2}}
       *** I will need rationales to insert ***
     - add "|image has rationale=yes" to {{Non-free album cover}}/{{Non-free book cover}}/{{Non-free video cover}}/{{Non-free logo}}
     - add a new parameter "bot=Theo's Little Bot" to {{Non-free album cover}}/{{Non-free book cover}}/{{Non-free video cover}}/{{Non-free logo}}
       (this might need additional discussion as far as implementation/categorization)

As you can see, there are still some questions -- #1, are there rationales prewritten that I can use (Wikipedia:Use_rationale_examples, possibly...)? Secondly, as far as clarifying that it was reviewed by a bot, I think adding {{para|bot}} parameters to {{tl|non-free album cover}} and such would be easy enough, although if you have other suggestions I'm all ears. Theopolisme (talk) 06:53, 7 June 2013 (UTC)
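For reference, a minimal sketch of the candidate-selection step outlined above, written with pywikibot; this is not the bot's actual code (which was linked separately once written), and the tagging/saving part is deliberately left out:

<syntaxhighlight lang="python">
# Hedged sketch of the selection logic only; category and template names come from the outline.
import pywikibot

site = pywikibot.Site("en", "wikipedia")
cat = pywikibot.Category(site, "Category:Non-free images for NFUR review")
COVER_TAGS = {"Non-free album cover", "Non-free book cover",
              "Non-free video cover", "Non-free logo"}

for page in cat.articles():
    if page.namespace() != 6:                      # File: namespace only
        continue
    image = pywikibot.FilePage(page)
    tags = {t.title(with_ns=False) for t in image.templates()}
    if not tags & COVER_TAGS:
        continue
    usages = [p for p in image.usingPages() if p.namespace() == 0]
    if len(usages) != 1:
        continue                                   # must be used in exactly one article
    # ...then confirm it is the article's only non-free file (and/or sits in the
    # infobox), build the matching {{Non-free use rationale ...}}, and save both pages.
</syntaxhighlight>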

: Per the above thread,

  1. There is an additional criterion that the file should be used in the infobox. This is because the code for adding this is a straight translation (and partial extension) of what FURME does, substituting the {{tl|Non-free use rationale}} types for the {{tl|fur}} types it uses currently. FURME itself needs an overhaul and re-integration into TWINKLE, but that would be outside the scope of a bot request.
  2. The pre-written rationales are {{tl|Non-free use rationale album cover}}, {{tl|Non-free use rationale book cover}}, {{tl|Non-free use rationale video cover}}, and {{tl|Non-free use rationale logo}}, which are the standard templated forms.
  3. I'd been using {{tl|Non-free autogen}} as a means of marking semi-automated additions of rationales I'd made. The wording still needs to be tweaked, but in essence it uses the {{para|bot}} and {{para|reviewed}} style as opposed to modification of the license template. Note this also means it's easier to remove the tag once it's been human-reviewed. Sfan00 IMG (talk) 11:10, 7 June 2013 (UTC)

::Great, thanks for the speedy reply (and the clarification). I'll get on to coding this now. 15:11, 7 June 2013 (UTC)

:::More details might come out as you start running limited tests on it in terms of process, but I think the general approach is fine. One point: I don't think the "file must be the only non-free file in the article" requirement is necessary unless you cannot determine that the image is in the infobox (all standard infobox templates). I agree that if you can't tell via program that the image is used in the infobox, the single use is a good starting point, but if you can, then you can "broaden" the requirement to being an image strictly used in the infobox. This might catch false positives in cases where editors use infoboxes later in the article (often movie soundtracks for movies), but that at least puts a rationale there, and human intervention is still needed to judge if those are right or wrong. Another point is that, as templated FURs are not required, there may be prose-based FURs, which at minimum need to name the article (or a redirect to the article) that the image is used in (and this doesn't have to be linked). So you may need, when checking for the absence of a FUR, to see if this case works too. --MASEM (t) 15:44, 7 June 2013 (UTC)

::::Okay, I'll just add a check to make sure the image is used in an infobox. That makes things a lot easier on my end, trust me! :) Theopolisme (talk) 15:57, 7 June 2013 (UTC)

  • {{ping|Masem|Werieth|Sfan00 IMG}} Okay, guys, I've coded the bot and filed a request for approval here. If you think of anything else, please add it there, not here. Cheers, Theopolisme (talk) 21:58, 8 June 2013 (UTC)

Bot to update Adopt-a-user list

Hi! I am looking for a bot that could update Wikipedia:Adopt-a-user/Adoptee's Area/Adopters's "currently accepting adoptees" entries to "not currently available" if the adopters haven't made any edits after a certain period of time. This is because of edits like this, where a new user asks for an adopter but, because the adopter has gone from Wikipedia, they never get a response and just leave Wikipedia. Thanks, and if you need more clarification just ask! jcc (tea and biscuits) 17:14, 6 June 2013 (UTC)

:I would take up this task, but my hands are somewhat tied with other tasks at the moment.—cyberpower ChatOnline 22:06, 6 June 2013 (UTC)

::{{botreq|coding}} -- jcc, after what period of time should users be marked as "not currently available"? 30 days? 60 days? Theopolisme (talk) 02:20, 7 June 2013 (UTC)

:::Erm, up to you really, but maybe 1 month(?), seeing as the bot will probably recheck every so often, so there's no need to do something like 3+ months in fear of the problem that some adopters might come and go. Up to you really, jcc (tea and biscuits) 16:14, 7 June 2013 (UTC)

::::Sorry for forgetting to let you know here, but this bot has been approved and will run weekly. Again, sorry for not keeping you in the loop! Cheers, Theopolisme (talk) 16:59, 9 June 2013 (UTC)

Archive bots

Heh, it's me again, with more "archive" bot requests. Here are another two simple ones:

1: If a |url= part of a reference has the web.archive.org string in it, remove that prefix and strip it back to the proper URL link.

2: If a reference has an |archiveurl tag but is lacking an |archivedate tag, grab the archiving date from the relevant part of the archive URL; e.g. http://web.archive.org/web/20071031094153/http://www.chaptersofdublin.com/books/Wright/wright10.htm would have "20071031" grabbed and formatted to "2007/10/31".

[http://en.wikipedia.org/w/index.php?title=1708_in_Ireland&diff=558659609&oldid=558547525] shows where I did this sort of thing manually. Lukeno94 (tell Luke off here) 20:49, 6 June 2013 (UTC)
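As an illustration, here is a quick sketch of both fixes on a single citation's parameters; this is a hypothetical helper, not the code in any filed BRFA, and it emits ISO-style dates rather than the slashed form shown above:

<syntaxhighlight lang="python">
# Hedged sketch: strip a Wayback Machine prefix from |url= and derive a missing |archivedate=.
import re

ARCHIVE_RE = re.compile(r"https?://web\.archive\.org/web/(\d{14})/(https?://\S+)")

def fix_citation(url, archiveurl=None, archivedate=None):
    """Return (url, archiveurl, archivedate) with the archive prefix removed from
    |url= and a missing |archivedate= filled in from the 14-digit timestamp."""
    m = ARCHIVE_RE.match(url or "")
    if m:                                    # case 1: |url= points at web.archive.org
        archiveurl = archiveurl or url
        url = m.group(2)
    if archiveurl and not archivedate:
        m = ARCHIVE_RE.match(archiveurl)
        if m:                                # case 2: derive the date from the timestamp
            ts = m.group(1)
            archivedate = "%s-%s-%s" % (ts[0:4], ts[4:6], ts[6:8])
    return url, archiveurl, archivedate
</syntaxhighlight>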

:Hello, no need. That is already coded in for Wikipedia:Bots/Requests for approval/Hazard-Bot 21, (well ... I used the dash format, though), and I'm going to leave an update there. Also, another bot is doing this as well.  Hazard-SJ  ✈  00:20, 7 June 2013 (UTC)

::{{ping|Hazard-SJ}} I don't see this (at least #1) in your script's source code...Luke is talking about using regex to take a url parameter that is a link to web.archive.org and get the original url from that, then move the archive url to archiveurl, and replace url with the actual (dead, presumably) url. Theopolisme (talk) 00:28, 7 June 2013 (UTC)

:::{{ping|Theopolisme}} I haven't added it to GitHub as yet, but I intend to soon, along with requesting another trial.  Hazard-SJ  ✈  00:45, 7 June 2013 (UTC)

::::Oh, fabulous. My apologies. Theopolisme (talk) 00:46, 7 June 2013 (UTC)

:::::No problem ... I'm currently fine-tuning the code to ensure it works before I do so :)  Hazard-SJ  ✈  00:50, 7 June 2013 (UTC)

:::::*Damn, I guess I've been beaten to the punch on an idea :p Ah well, good to know this is already being dealt with! Lukeno94 (tell Luke off here) 11:45, 7 June 2013 (UTC)

Renewed request – Most missed articles

The Wikipedia:Most missed articles list -- often-searched-for, nonexistent articles -- has not been updated since a batch run in 2008. The German Wikipedia person, Melancholie (de:Benutzer:Melancholie), who did the batch run, has not been active since 2009. Where would be a good place to ask for someone with expertise to do another run? It does not seem to fit the requirements of Wikipedia:Village pump (technical) since it is not a technical issue about Wikipedia. It is not a new proposal, and not a new idea. It is not about help using Wikipedia, and it is not a factual WP:Reference Desk question. I didn't find a WikiProject that looked promising. So I am asking for direction here. --Bejnar (talk) 19:52, 11 June 2013 (UTC)

:It's harder than it looks. :) Have you tried emailing Melancholie? Theopolisme (talk) 20:36, 11 June 2013 (UTC)

::Melancholie is long gone. No response. How is it done? Can I learn? --Bejnar (talk) 03:43, 12 June 2013 (UTC)

:::Looks like the [http://stats.wikimedia.org/wikimedia/squids/ squid stats] are in a very poor way. The more I think about it, the harder it seems. Stuartyeates (talk) 04:02, 12 June 2013 (UTC)

:The best reliable method that I know of just counts red links. Werieth (talk) 12:14, 12 June 2013 (UTC)

::As I understand it, this had nothing to do with redlinks. Over a period of several months, the process captured the searched-for text from the Wikipedia search box, dropped out those which had hits, and stored each unsuccessful search term/phrase alphabetically with a counter that incremented for each use of that term/phrase. At the end of the time period, the resultant database was sorted by frequency, the low-volume terms were dropped, it was then run through a scat-remover to delete common obscenities and the like, and put out for editorial consumption at Wikipedia:Most missed articles. I generated a number of articles suggested by that database that have reasonable hit rates. In some ways the programming may/might resemble that of [http://stats.grok.se/ Wikipedia article traffic statistics]. --Bejnar (talk) 05:09, 13 June 2013 (UTC)

Template:Hampton, Virginia

Add {{tl|Hampton, Virginia}} to every page in :Category:Neighborhoods in Hampton, Virginia. Emmette Hernandez Coleman (talk) 23:09, 17 June 2013 (UTC)

:Oooh, an easy one. :-)—cyberpower ChatOnline 00:06, 18 June 2013 (UTC)

::No bot needed, this is only 11 pages.  Hazard-SJ  ✈  05:09, 18 June 2013 (UTC)

Unreliable source bot

This bot's job would be to stick {{tl|unreliable source}} next to references that are links to websites of sketchy reliability as third-party sources (e.g., blog hosting sites, tabloids, and extremely biased news sites). The list of these sites would be a page in the MediaWiki namespace, like MediaWiki:Spam-blacklist. To allow for false positives (for instance, when the site is being cited as a primary source), the bot would add a hidden category (maybe :Category:Unreliable source tags added by a bot) after the template to identify it as being added by the bot, enabling editors to check the tags. The hidden category would be removed by the editor if it was an accurate tagging. If not, the editor would comment out the {{tl|unreliable source}} tag, which would mean that it would be skipped over by the bot in the future. ❤ Yutsi Talk/ Contributions ( 偉特 ) 14:16, 13 June 2013 (UTC)

:This task might be {{botreq|impossible}} to do. I haven't checked the databases yet though, so I'll get back to this later today.—cyberpower ChatOffline 16:17, 13 June 2013 (UTC)

::Are you proposing a new MedaWiki page? Theopolisme (talk) 16:34, 13 June 2013 (UTC)

:This actually isn't a bad idea, it just needs some more thought. It's trivial for a bot to run around tagging all "domain.com" links as {{tlx|unreliable source|2=bot=AwesomeBot}}; however, how you construct your URL blacklist is probably most important. For example, examiner.com is on the blacklist because it's just overall not a reliable source. However, there are many links to it, and those have all probably been individually whitelisted{{cn}}. So tagging those wouldn't be useful. Did you have a few example domains in mind? That would help in evaluating your request. Legoktm (talk) 16:54, 13 June 2013 (UTC)

::I think it's a good idea for a bot too. The problem is, the blacklist entries are regex fragments and, AFAIK, you can't use regex to search for external links. I could be wrong though. I'm pretty sure the database on Labs can help me out with this, but before I know for sure, I'm just going to assume the worst answer.—cyberpower ChatOffline 17:07, 13 June 2013 (UTC)

:::mw:Manual:externallinks table.... Legoktm (talk) 17:29, 13 June 2013 (UTC)

::::I figured as much. I was talking about the API. I just didn't want to generate false hope before I knew for certain that it was doable. Anyways, I wouldn't mind taking up this task.—cyberpower ChatOffline 17:32, 13 June 2013 (UTC)
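For illustration, a sketch of how the externallinks table could be used to find candidate pages for one blacklisted domain; this assumes an open cursor on a database replica, and example.com plus the surrounding variable names are placeholders, not part of any existing bot:

<syntaxhighlight lang="python">
# Hedged sketch: list pages linking to a given domain via the externallinks table.
query = """
    SELECT page_namespace, page_title
    FROM externallinks
    JOIN page ON page_id = el_from
    WHERE el_index LIKE 'http://com.example.%'   -- el_index stores the host reversed
       OR el_index LIKE 'https://com.example.%'
"""
cursor.execute(query)        # `cursor` is assumed to be an open DB-API cursor
candidates = cursor.fetchall()
</syntaxhighlight>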

::Is the bot going to be able to work around the spam filter? Or maybe should the bot just remove citations that were added before the spam filter entry was created? Thanks! GoingBatty (talk) 03:31, 15 June 2013 (UTC)

:::We'll see. :-)—cyberpower ChatOnline 18:01, 16 June 2013 (UTC)

  • {{botreq|stilldoing}}—cyberpower ChatOnline 13:56, 20 June 2013 (UTC)

Null edits to update categories

Wikipedia:Categories for discussion/Working/Manual#Templates removed or updated - deletion pending automatic emptying of category has a very large backlog of hidden categories that have been renamed. Due to the way some or all of these categories are generated by the template, the job queue alone doesn't seem able to process them, and individual articles require null edits to get the updated category. Is it possible for a bot to have a crack at these? Timrollpickering (talk) 16:10, 21 June 2013 (UTC)

: Wikipedia:Bots/Requests for approval/Hazard-Bot 23 was already filed and included this.  Hazard-SJ  ✈  20:28, 22 June 2013 (UTC)

Request to move WikiProject

: Wikipedia:Categories_for_discussion/Log/2013_June_22#WikiProject_Skepticism

This is a request for assistance in moving the assessment categories for WikiProject Rational Skepticism. Greg Bard (talk) 20:02, 23 June 2013 (UTC)

:WP:CFD/W will take care of that. There are bots that use that page. Werieth (talk) 20:03, 23 June 2013 (UTC)

::WP:CFD/W is a protected page that I cannot edit. So I'm getting the run-around at this point. Greg Bard (talk) 20:59, 23 June 2013 (UTC)

:::Won't happen. Just get an admin to list the cats. Werieth (talk) 21:15, 23 June 2013 (UTC)

:::I can do it. -- Magioladitis (talk) 21:20, 23 June 2013 (UTC)

Fixed all categories manually, updating all pages with my bot. -- Magioladitis (talk) 21:45, 23 June 2013 (UTC)

I deleted all old categories, I fixed/normalised all banners and user wikiproject tags. -- Magioladitis (talk) 23:25, 23 June 2013 (UTC)

Unreferenced

Is there any way that a bot can go through and find instances where an article has the {{tl|unreferenced}} template and {{tl|references}}/any other coding pertaining to references? I've seen a lot of instances of {{tl|unreferenced}} being used on articles that do have references. This seems like it should be an easy fix. Ten Pound Hammer(What did I screw up now?) 03:35, 20 June 2013 (UTC)

: {{Ping|TenPoundHammer}} Yes, that's possible, and in that case, {{BOTREQ|coding}}  Hazard-SJ  ✈  03:37, 20 June 2013 (UTC)

::I was doing this for a while with an AWB bot which would change {{tl|unreferenced}} to {{tl|refimprove}}. However, I found too many instances of {{tl|unreferenced}} being incorrectly used under a section header (instead of {{tl|unreferenced section}}), so I stopped running the bot. I haven't dedicated the mindspace to figure out how to fix this. GoingBatty (talk) 03:59, 21 June 2013 (UTC)

:Also note that often articles will have a {{tl|reflist}} and zero references; those should still be tagged as unreferenced. My thought would be to convert unreferenced => more refs if a {{tag|ref}} is contained in the non-commented-out wikicode. Werieth (talk) 16:16, 21 June 2013 (UTC)

::Since {{tag|ref}} tags can contain notes instead of references, you might want to limit your search to citation templates, such as {{tl|cite web}} and {{tl|cite news}}. GoingBatty (talk) 02:52, 22 June 2013 (UTC)

:::Example of an unreferenced article using {{tag|ref}} tags to contain a note: Battle of Breitenfeld (1642). GoingBatty (talk) 12:22, 22 June 2013 (UTC)
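To make the detection logic being discussed concrete, here is a rough sketch limited to a few citation templates, per GoingBatty's suggestion; it is a hypothetical helper, not anyone's actual bot code:

<syntaxhighlight lang="python">
# Hedged sketch: flag pages tagged {{unreferenced}} that nonetheless contain citation templates.
import re

CITE_RE = re.compile(r"\{\{\s*cite (web|news|book|journal)\b", re.IGNORECASE)
UNREF_RE = re.compile(r"\{\{\s*unreferenced\s*[|}]", re.IGNORECASE)

def strip_comments(text):
    return re.sub(r"<!--.*?-->", "", text, flags=re.DOTALL)

def should_retag(wikitext):
    """True if the page carries {{unreferenced}} (not the section variant) but also has
    at least one non-commented-out citation template, i.e. a {{refimprove}} candidate."""
    text = strip_comments(wikitext)
    return bool(UNREF_RE.search(text)) and bool(CITE_RE.search(text))
</syntaxhighlight>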

:::: {{on hold}}, unless someone else wants to do this.  Hazard-SJ  ✈  02:34, 25 June 2013 (UTC)

:::::Using my bot to preparse the list and find examples of {{tl|unreferenced}} being incorrectly used under a section header - stay tuned. GoingBatty (talk) 04:19, 25 June 2013 (UTC)

Change "can refer to" to "may refer to" in DABpages

Do like [https://en.wikipedia.org/w/index.php?title=Supportability&diff=prev&oldid=561219470 I did here] in disambiguation pages per WP:DABSTYLE. -- Magioladitis (talk) 15:39, 23 June 2013 (UTC)

:Should be pretty easy to do with AWB, I think. MOS:DABINT probably backs up the change more. It is maybe too small a change to bother doing unless the page is already being edited for some other reason, though. --Jamesmcmahon0 (talk) 15:15, 27 June 2013 (UTC)

Bot to add articles to the Sorani Kurdish Wikipedia (CKB) about Iraqi cities using census data

Hi! Is anyone interested in writing a bot that adds articles to the Sorani Kurdish Wikipedia (CKB) about Iraqi cities using census data?

I found Iraqi census data at http://cosit.gov.iq/pdf/2011/pop_no_2008.pdf ([http://www.webcitation.org/6HefMqa5Q Archive]) and the idea is something like User:Rambot creating U.S. cities with census data

Thanks

WhisperToMe (talk) 00:11, 26 June 2013 (UTC)

:We could import these data into WP:WD directly. --Ricordisamoa 23:47, 26 June 2013 (UTC)

::Cool! How do we do that? What do I need to do on my end? WhisperToMe (talk) 00:52, 28 June 2013 (UTC)

Exoplanets table

I was wondering if anyone would be able to create a bot that could copy the information about planets detected by the Kepler spacecraft from the Extrasolar Planets Encyclopaedia ([http://exoplanet.eu link]) to our list of planets discovered using the Kepler spacecraft. Rather than merely going to the list, it would be ideal if the bot could follow the link for each Kepler planet and get the full information from there, rather than merely looking at the catalog. The information in the EPE about Kepler planets is in turn copied from [http://kepler.nasa.gov/Mission/discoveries/ the Kepler discoveries catalog], which is in the public domain but is unfortunately offline at the moment (requiring us to use the copyrighted EPE). In addition to the basic information, I would like it if our bot were able to calculate surface gravity where possible based on mass/(r^2). Thanks, Wer900talk 18:08, 27 June 2013 (UTC)
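As a small aside on that last point, the surface-gravity figure is g = GM/r² (the gravitational constant is needed, not just mass over r²); a tiny worked sketch with placeholder values, not data to be inserted into the list:

<syntaxhighlight lang="python">
# Hedged sketch: surface gravity from mass and radius given in Jovian units.
G = 6.674e-11        # gravitational constant, m^3 kg^-1 s^-2
M_JUP = 1.898e27     # Jupiter mass, kg
R_JUP = 7.1492e7     # Jupiter equatorial radius, m

def surface_gravity(mass_mj, radius_rj):
    """Surface gravity in m/s^2 for a planet of mass_mj Jupiter masses and radius_rj Jupiter radii."""
    return G * (mass_mj * M_JUP) / (radius_rj * R_JUP) ** 2

print(round(surface_gravity(1.0, 1.0), 1))   # ~24.8 m/s^2, roughly Jupiter's own value
</syntaxhighlight>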

:We could import these data into WP:WD directly: please refer to the Space task force. --Ricordisamoa 18:34, 27 June 2013 (UTC)

::I commented at Astronomy task force because it seems most appropriate. Wer900talk 20:56, 27 June 2013 (UTC)

Bot to assist in identifying articles most in need of cleanup.

Hi. I haven't historically been a big editor on Wikipedia, though I use it from time to time. I realize that there are probably a number of bots at work using various methods to target entries for improvement. However, I just wanted to add my two cents on a method which may or may not be in use.

First, however, some quick background. I am currently taking a Data Science class and for one of my assignments I developed a script which selects a random Wikipedia article and does the following:

1) Counts total words and total sources (the word count does not include 'organizational' sections such as References, External Links, etc.)

2) Uses all words contributing to the word count to assess the overall sentiment of the text. For this, I used the AFINN dictionary and the word count to get an average sentiment score per word.

3) For each section and sub-section (h2 and h3) in the page which is not organizational in nature (see above definition), counts the number of words and citations and, as with item 2, gets a sentiment score for the section/sub-section

So my thought on using this script is as follows:

If it was used to score a large number of Wikipedia pages, we could come up with some parameters on which a page and its sections and subsections could be scored.

1) For all articles, word count, source count and sentiment score.

2) For all sections and sub-sections, word count, citation count and sentiment score.

3) For pages with sources, a sources per word score

4) For sections with citations, a words per citation score

For all of these parameters, the scores from the sample could be used to determine what sort of statistical distribution they follow. A bot could then scan the full set of wikipedia articles and flag those which are beyond some sort of tolerance limit.

Additionally, data could be collected for sections which commonly occur (Early Life, Private Life, Criticisms etc.) to establish expected distributions for those specific section types. For example, we might expect the sections labeled Criticisms would, on average, have a more negative sentiment than other sections.

I hope this all makes sense and perhaps some or all of it is being done. I look forward to hearing from some more experienced Wikipedians on the idea.

Additionally, for sections which com — Preceding unsigned comment added by 64.134.190.157 (talk) 18:46, 21 June 2013 (UTC)
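For illustration, a minimal sketch of the per-word AFINN scoring described above; the file name and tab-separated format refer to the common AFINN-111 word list, and this is an assumption about the approach rather than the poster's actual script:

<syntaxhighlight lang="python">
# Hedged sketch: average AFINN sentiment per word for a list of tokens.
def load_afinn(path="AFINN-111.txt"):
    scores = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            word, score = line.rstrip("\n").rsplit("\t", 1)
            scores[word] = int(score)
    return scores

def average_sentiment(words, afinn):
    """Average AFINN score per word over all words counted for the article or section."""
    if not words:
        return 0.0
    return sum(afinn.get(w.lower(), 0) for w in words) / len(words)
</syntaxhighlight>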

:It's an interesting thought, but we have already tagged over a million articles with unresolved issues - see Wikipedia:Backlog. GoingBatty (talk) 02:57, 22 June 2013 (UTC)

::Thanks for the reply. Let me ask it this way: given that there already exists a backlog of that size, would a bot such as I've described be useful in terms of prioritizing them for clean-up? For example, is an unsourced entry about a living person where a bot detects indications of bias more important to clean up than a seemingly neutral unsourced entry giving soccer results from the 1970s? I'm not asking this to be sarcastic. I honestly don't know if there is any sense that clean-up of one should take priority over the other. — Preceding unsigned comment added by 72.244.42.10 (talk) 14:07, 28 June 2013 (UTC)

:::I think we already do this - the former has {{tl|BLP unsourced}}, while the latter has {{tl|unreferenced}}. GoingBatty (talk) 16:47, 29 June 2013 (UTC)

Template:South Alexandria

Put {{tl|South Alexandria}} on every article listed on it, and put the listed articles in a category called :Category:South Alexandria, Virginia. Emmette Hernandez Coleman (talk) 09:20, 29 June 2013 (UTC)

:{{doing}} manually. Also created the category, and added it to the template. GoingBatty (talk) 16:34, 29 June 2013 (UTC)

::{{done}}. GoingBatty (talk) 16:42, 29 June 2013 (UTC)

NFUR images with NFUR but no license

Any chance someone could write a bot to license tag these quickly?

[http://tools.wmflabs.org/catscan2/catscan2.php?categories=All+non-free+media&negcats=Non-free+images+for+NFUR+review%0D%0ANon-free+images+with+NFUR+stated%0D%0AWikipedia+files+with+unknown+source%0D%0AAll+free+media&ns[6]=1&templates_no=db-i3&sortby=uploaddate&ext_image_data=1&file_usage_data=1]

Most seem to be reasonably straightforward. Sfan00 IMG (talk) 16:20, 18 June 2013 (UTC)

: {{Ping|Sfan00 IMG}} Just to be clear, what exactly should be added?  Hazard-SJ  ✈  03:12, 20 June 2013 (UTC)

: If the first template listed below is found on an image but the second is NOT found, add the second

  • {{tl|Non-free use rationale logo}}, {{tl|Non-free logo}}
  • {{tl|Non-free use rationale album cover}}, {{tl|Non-free album cover}}
  • {{tl|Non-free use rationale book cover}}, {{tl|Non-free book cover}}
  • {{tl|Non-free use rationale video cover}}, {{tl|Non-free video cover}}
  • {{tl|Non-free use rationale poster}}, {{tl|Non-free poster}}
  • {{tl|Non-free use rationale video game cover}}, {{tl|Non-free video game cover}}
  • {{tl|logo fur}}, {{tl|Non-free logo}}
  • {{tl|Album cover fur}}, {{tl|Non-free album cover}}
  • {{tl|Book cover fur}}, {{tl|Non-free book cover}}

etc...

There may be some others, but this will start to clear out the 5000 or so I've found on the catscan query notes over on WP:AWB. Sfan00 IMG (talk) 10:14, 23 June 2013 (UTC)

:: {{BOTREQ|brfa|Hazard-Bot 25}}  Hazard-SJ  ✈  21:32, 26 June 2013 (UTC)

:: It seems Theopolisme already plans to do this, so I withdrew the BRFA.  Hazard-SJ  ✈  01:48, 28 June 2013 (UTC)

:That's generating the NFURs themselves; it's not adding the license tags for 'existing' media with rationales. Sfan00 IMG (talk) 07:01, 28 June 2013 (UTC)

:: I "unwithdrew".  Hazard-SJ  ✈  21:31, 1 July 2013 (UTC)

COI Template

Back in March [http://en.wikipedia.org/wiki/Wikipedia:Village_pump_%28proposals%29/Archive_100#COI_Template there was consensus] in Proposals to test out Template:COI editnotice on the Talk page of articles about extant organizations, to see if it increases the use of {{t|Request edit}} and reduces COI editing. A bot was approved [http://en.wikipedia.org/wiki/Wikipedia:Bots/Requests_for_approval/RileyBot_11 in April] to apply the template to :Category:Companies based in Idaho. The bot request said it would affect 1,000+ articles, which would be enough of a sample to test, but it looks like it was only applied to about 40 articles. I am unsure if the bot was never run on the entire category, or if we need a larger category. The original bot-runner is now retired. Any help would be appreciated. CorporateM (Talk) 14:12, 30 June 2013 (UTC)

:Hey CorporateM, I'm taking over the tasks of the original (now retired) bot operator, and this is included. Sorry for the delay, Theopolisme (talk) 20:47, 30 June 2013 (UTC)

:::No problem. CorporateM (Talk) 21:33, 30 June 2013 (UTC)

A bot to tag articles without images

I know that previously PhotoCatBot has tagged articles fitting these criteria, but is there any way that we could get a new bot to help finish up where that bot left off almost three years ago? Thanks! Kevin Rutherford (talk) 02:16, 1 July 2013 (UTC)

:For those interested, the Python source code can be found here. --Ricordisamoa 02:42, 1 July 2013 (UTC)

:This request is similar to this one at itwiki (discussion). I created [//github.com/ricordisamoa/wiki/blob/master/fp_notice.py this script] ([//github.com/ricordisamoa/wiki/blob/master/README.md#fp_noticepy guide]). --Ricordisamoa 02:42, 1 July 2013 (UTC)

:By the looks of the documentation of the task and code, it seems that bot only changed such tags (adding parameters), and did not add them. Is that what you want?  Hazard-SJ  ✈  21:30, 1 July 2013 (UTC)

::Possibly, although it would be good to have a bot that tags ones without images as well, so that we can create more up-to-date maps for articles without images, and make it easier to gather photos at the end of the day. Kevin Rutherford (talk) 17:05, 8 July 2013 (UTC)

Template:Arlington County, Virginia

Make sure all articles listed on {{tl|Arlington County, Virginia}} have the template, and are in :Category:Neighborhoods in Arlington County, Virginia.

Create redirects to these articles in the format "X, Virginia", "X, Arlington, Virginia", and "X". For example, all of the following should redirect to Columbia Forest Historic District: Columbia Forest, Virginia; Columbia Forest, Arlington, Virginia; Columbia Forest; Columbia Forest Historic District, Virginia; and Columbia Forest Historic District, Arlington, Virginia. Emmette Hernandez Coleman (talk) 10:41, 3 July 2013 (UTC)

:{{BOTREQ|coding}} Theopolisme (talk) 14:35, 3 July 2013 (UTC)

:I've done the second part of your task on my own account, since there were only 35 edits in total (of course requiring human intervention before saving each edit). Going to tackle the first part now. Theopolisme (talk) 23:39, 3 July 2013 (UTC)

::Status update: harder than it looks. I'm actually going all out and working on a full Python script for inserting and removing text in specific places (for example, navigational templates or categories). So, in progress -- [https://github.com/theopolisme/theotools/blob/master/advanced_addtext.py source so far] Theopolisme (talk) 04:11, 7 July 2013 (UTC)

Invitation Bot?

Howdy, I haven't had the need for a bot before, but I'm organizing a meetup and would like help in posting invites to the folks on this list. I can come up with a short message; is that all I need to provide? Or is there any more info needed? Thanks, Olegkagan (talk) 00:44, 27 June 2013 (UTC)

:I am willing to do this, but since I haven't been approved for this task before, I'll file a BRFA now.  Hazard-SJ  ✈  00:50, 27 June 2013 (UTC)

:{{BOTREQ|brfa|Hazard-Bot 26}}  Hazard-SJ  ✈  00:55, 27 June 2013 (UTC)

:Umm, User:EdwardsBot/Instructions... Legoktm (talk) 01:51, 27 June 2013 (UTC)

::He doesn't have access.  Hazard-SJ  ✈  02:01, 27 June 2013 (UTC)

:::He can ask ;) Legoktm (talk) 02:02, 27 June 2013 (UTC)

::::Is it my turn to do something now? Olegkagan (talk) 17:56, 27 June 2013 (UTC)

:::::Well, you could use the link that Legoktm provided to use a bot that is already approved for such tasks (you would either have to ask an admin to add you to the access list, or to add himself and submit your request, or ask someone who is already on it to submit the job for you), or I could continue getting approval for my bot to run this task and do it for you. I'm leaving that decision with you.  Hazard-SJ  ✈  23:57, 2 July 2013 (UTC)

:::::::Seems to me that since you volunteered and are ready and willing to help, it makes sense for you to continue getting approval for your bot to carry out the task. Olegkagan (talk) 01:39, 3 July 2013 (UTC)

{{od}} Okay, and it would be nice to have the message that is to be added, in case a trial is requested of me, thanks.  Hazard-SJ  ✈  02:19, 3 July 2013 (UTC)

: {{Ping|Olegkagan}} A 50-edit trial was requested of me 4 days ago, could I please get the message? Thanks.  Hazard-SJ  ✈  03:56, 7 July 2013 (UTC)

:: Pardon the delay. Here is the message: "You are invited to "Come Edit Wikipedia!" at the West Hollywood Library on Saturday, July 27th, 2013. There will be coffee, cookies, and good times! -- Olegkagan (talk)"

::: Thanks.  Hazard-SJ  ✈  04:29, 11 July 2013 (UTC)

Infobox Unternehmen

Please could somebody "Subst:" all 84 article-space transclusions of the German-language {{tl|Infobox Unternehmen}}, which is now a wrapper for the English-language {{tl|Infobox company}}, as in {{Diff|Salzkammergut|563644319|558847577|this edit}}? That may be a job for AWB, which unfortunately I can't use on my small-screen netbook. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 10:08, 10 July 2013 (UTC)

:{{doing}} Theopolisme (talk) 16:50, 10 July 2013 (UTC)

::Thank you. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 20:16, 10 July 2013 (UTC)

:::{{done}}, and I wrote a little Python script in case anyone needs to do something like this in the future [https://github.com/theopolisme/theobot/blob/master/subster.py] Theopolisme (talk) 21:28, 10 July 2013 (UTC)

Template:Peconic County, New York

Remove all articles in {{tl|Peconic County, New York}} that are NOT in the following categories: :Category:Sag Harbor, New York, :Category:Riverhead (town), New York, :Category:Shelter Island (town), New York, :Category:Southampton (town), New York, :Category:Southold, New York. Emmette Hernandez Coleman (talk) 04:13, 12 July 2013 (UTC)

Or, alternatively, remove all articles that ARE in the following categories: :Category:Babylon (town), New York, :Category:Brookhaven, New York, :Category:Huntington, New York, :Category:Islip (town), New York, :Category:Smithtown, New York. Emmette Hernandez Coleman (talk) 04:21, 12 July 2013 (UTC)

Never mind. The template might be deleted, so there's no point in putting that effort into it until we know it will be kept. Emmette Hernandez Coleman (talk) 09:08, 13 July 2013 (UTC)

Book report bot

Given {{ul|NoomBot}} has been down since April, maybe another bot could be made to make/update the ever-useful Book Reports. igordebraga 01:31, 13 July 2013 (UTC)

Template:Orleans Parish, Louisiana

Add {{tl|Orleans Parish, Louisiana}} to every page it lists. Emmette Hernandez Coleman (talk) 22:33, 13 July 2013 (UTC)

:{{Ping|Theopolisme}} would you be interested in doing this after #Template:Arlington County, Virginia?  Hazard-SJ  ✈  03:08, 14 July 2013 (UTC)

::Yep! Basically, the script I'm writing will just need to scan every template on the page and define each one based on its contents/its documentation's contents -- i.e., "navbox", "persondata", "stub tag", etc., then assemble a "map" of the page's templates based on type...then the bot can insert elements at the end of the closest section (i.e., if inserting authority control, and the article has no geographical coordinates, insert after the navigation templates), per WP:ORDER. I'll hack on this tomorrow. Theopolisme (talk) 03:21, 14 July 2013 (UTC)

Also {{tl|Neighborhoods of Denver}}. At the rate I'm going I'll probably create a few more navboxes in the next few days, and it would be easier to do a bunch of navboxes together, so don't bother with either of these yet. Emmette Hernandez Coleman (talk) 08:20, 14 July 2013 (UTC)

Poem dates

I've just added an hCalendar microformat to {{tl|Infobox poem}}, so a bot is now required to apply {{tl|Start date}} to the {{para|publication_date}} parameter.

The logic should be:

  1. If the value is empty, do nothing
  2. Else if the value is four digits, YYYY, change to {{Start date|YYYY}} ({{Diff|The Spider and the Fly (poem)|564208161|563230415|example 1}})
  3. Else if the value is a month and year, change to {{Start date|YYYY|MM}}, where "MM" is the number of the month in the year (07 for July, etc).
  4. Else if the value is a day, month and year, DD Month YYYY, change to {{Start date|YYYY|MM|DD|df=y}}
  5. Else if the value is a month, day and year, Month DD, YYYY, change to {{Start date|YYYY|MM|DD}}
  6. Else add a hidden tracking category, for human investigation.

This is related to a larger request with clear community consensus, which has not yet been actioned; I'm hoping that this smaller, more manageable task, will attract a response, which can later be applied elsewhere. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 10:12, 14 July 2013 (UTC)
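For reference, a sketch of the logic above using plain regexes; this is a hypothetical helper, not the bot's eventual source, which is linked in the replies below:

<syntaxhighlight lang="python">
# Hedged sketch of the publication_date conversion rules.
import re

MONTHS = {m: i for i, m in enumerate(
    ["January", "February", "March", "April", "May", "June", "July",
     "August", "September", "October", "November", "December"], 1)}

def convert(value):
    """Return a {{Start date}} replacement, the untouched value (rule 1),
    or None when the date needs human investigation (rule 6)."""
    v = value.strip()
    if not v:
        return value                                            # rule 1: empty, do nothing
    if re.fullmatch(r"\d{4}", v):                               # rule 2: YYYY
        return "{{Start date|%s}}" % v
    m = re.fullmatch(r"(\w+) (\d{4})", v)                       # rule 3: Month YYYY
    if m and m.group(1) in MONTHS:
        return "{{Start date|%s|%02d}}" % (m.group(2), MONTHS[m.group(1)])
    m = re.fullmatch(r"(\d{1,2}) (\w+) (\d{4})", v)             # rule 4: DD Month YYYY
    if m and m.group(2) in MONTHS:
        return "{{Start date|%s|%02d|%02d|df=y}}" % (m.group(3), MONTHS[m.group(2)], int(m.group(1)))
    m = re.fullmatch(r"(\w+) (\d{1,2}), (\d{4})", v)            # rule 5: Month DD, YYYY
    if m and m.group(1) in MONTHS:
        return "{{Start date|%s|%02d|%02d}}" % (m.group(3), MONTHS[m.group(1)], int(m.group(2)))
    return None                                                 # rule 6: tag for review
</syntaxhighlight>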

:Gotcha. I'll implement this using my favorite python date parsin' module, dateutil ({{ping|Hazard-SJ}} as an aside, it's really [http://stackoverflow.com/a/3276459/1934901 quite powerful], if you haven't used it before). Theopolisme (talk) 14:30, 14 July 2013 (UTC)

::Source code is [https://github.com/theopolisme/theobot/blob/master/poems.py written] (not exactly following your logic, but accomplishes basically the same thing); Andy, is the df=y especially important? At the moment, I haven't implemented it, since according to my tests it would slow down the script a fair bit. Theopolisme (talk) 17:09, 14 July 2013 (UTC)

:::Yes. People get very upset if you change 14 July 2013 to July 14, 2013, or vice versa. And thank you. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 17:30, 14 July 2013 (UTC)

::::Very true. I thought the template would be magical enough to detect that on its own, but I guess that's too much to hope for ;) Coding that functionality now. Theopolisme (talk) 18:14, 14 July 2013 (UTC)

::::[https://github.com/theopolisme/theobot/commit/45d558f8c4ba0d39a36d35dc26cd8ea661ccd6ec Implemented]; BRFA to follow. Theopolisme (talk) 18:50, 14 July 2013 (UTC)

:::::Wikipedia:Bots/Requests_for_approval/Theo%27s_Little_Bot_24 Theopolisme (talk) 19:23, 14 July 2013 (UTC)

New adminbot

I would like to suggest a bot that disables autoblock on softblocks - if an admin accidentally enables autoblock on a softblock, my suggested bot will automatically DISable autoblock and enable account creation, e-mail, and talk page access if any, some or all of them are disabled. The bot will then lift the autoblock on the IP address affected by the soft-blocked user. 76.226.117.87 (talk) 03:09, 15 July 2013 (UTC)

:How is the bot supposed to know when it is softblocked?—cyberpower ChatOnline 12:04, 15 July 2013 (UTC)

: {{ec}} As written, I can't make sense of your request. According to WP:SOFTBLOCK, a "softblock" is a block on an IP address, and the "autoblock" option is not available when blocking IP addresses. Anomie 12:10, 15 July 2013 (UTC)

::I think he means accounts that are blocked where the autoblocks should be disabled. I'm going to go ahead and mark this as {{botreq|impossible}}.—cyberpower ChatOnline 12:19, 15 July 2013 (UTC)

Fix-up of new bot

I already said that I wanted a bot to change block settings (see the declined request made by 76.226.117.87), but now I have fixed it:

The process goes like this:

1. An administrator ENables autoblock when blocking an account that should have autoblock DISabled.

2. The bot I'm suggesting will change the block settings for the affected account so that account creation, e-mail, and talk page access are all allowed, if any of them are disabled. The bot will also disable autoblock on the affected account, and lift the autoblock on the IP address recently used by the blocked account.

3. The resulting log entry should look something like this (timestamps will vary, and some extra comments are added that are not normally present in block log entries):

  • 13:13, 13 April 2011 User 1 (talk | contribs) blocked User 2 (talk | contribs) (account creation blocked, email disabled) (sample settings) with an expiry time of indefinite ({{uw-softerblock}}, and {{softerblock}}, although may be {{uw-causeblock}}, {{causeblock}}, etc. )
  • 13:14, 13 April 2011 (New bot's username) (talk | contribs) changed block settings for User 2 (talk | contribs) with an expiry time of indefinite (autoblock disabled) (The reason for the block)
  • 13:14, 13 April 2011 (New bot's username) (talk | contribs) unblocked #xxxxxxx (autoblock number will vary) (Blocks like these should not have autoblock enabled. ) 76.226.76.230 (talk) 20:54, 15 July 2013 (UTC)

:Still {{botreq|impossible}}—cyberpower ChatOnline 22:04, 15 July 2013 (UTC)

:: No it's not completely impossible. But it requires an assumption that blocks with reasons referencing certain templates such as "{{uw-softerblock}}" must never be hardblocks, consensus for which should probably be established on WP:AN (and advertised to WT:BLOCK and those templates' talk pages) first.

:: On the other hand, the part of the request to automatically unblock IPs blocked by the mistaken hardblock is impossible for privacy reasons, as watching the bot's block/unblock log would effectively reveal that the originally-blocked user had used the unblocked IP address. Anomie 22:16, 15 July 2013 (UTC)

:::I just see too much room for error. And the second half of your statement is exactly why I considered it impossible to do.—cyberpower ChatOnline 23:20, 15 July 2013 (UTC)

SpellingBot?

I thought of a bot that corrected simple spelling errors, such as beacuse-because and teh-the. buffbills7701 22:17, 15 July 2013 (UTC)

: See WP:Bots/Frequently denied bots#Fully automatic spell-checking bots. Anomie 22:20, 15 July 2013 (UTC)

::{{BOTREQ|notdone}}—cyberpower ChatOnline 23:22, 15 July 2013 (UTC)

:::However, you could use tools such as AutoWikiBrowser to search for and fix spelling errors such as these, as long as you check each edit for incorrect fixes before saving. GoingBatty (talk) 03:18, 16 July 2013 (UTC)

Bot tagging medical articles that lack a PMID or DOI

Context: Citing references is important for medical articles (under WP:MED or related WikiProjects), though it is important for other articles as well. As books have ISBNs, almost all the renowned medical journals have a Pubmed listing and their articles bear a PMID. Pubmed serves as a global quality-regulatory and enlisting body for medical articles, and if a medical article does not have a PMID, chances are that the journal is not a popular one and therefore there is a possibility that it does not maintain the quality standards that are to be adhered to. Other medical articles have a Digital object identifier or DOI (with or without having a PMID), which serves to provide a permanent link redirecting to its present URL. Some Pubmed articles are freely accessible and have a PMC (alongside the PMID), which is therefore an optional parameter. Thus, if a reference has neither a PMID nor a DOI, chances are that 1. the article has a PMID (most cases) but the citation lacks its mention, or 2. the article lacks a PMID or DOI and it's not a problem with the placed reference.

I feel the requirement for two different bots.

  1. A bot automatically crawling the pages tagged under WP:MED, WP:Anatomy, WP:Neuroscience, WP:MCB, WP:Pharmacology, WP:Psychology and certain other related WikiProjects, for articles that have references with neither a PMID nor a DOI, and adding a tag within the ref tag such that it adds the Wikipedia article to a browsable list. An easier option for the bot would be to check for the journal (name may be full or abbreviated) in the Pubmed directory; if it is in the list but the reference has neither a PMID nor a DOI, the tag is to be placed to denote the possibility that the article has a PMID that has not been added to the citation. If the bot cannot locate the journal in the Pubmed database, it would place a tag something like 'journal not found in Pubmed db'. There should be a modifiable parameter which can be manually checked on or off by some person (user) to affirm or negate the bot.
  2. A bot automatically tagging pages (criteria above) with references that do not use a {{cite journal}} template for the ref tag.

Utility: These bots would enable the editors of medical articles to make the mentioned references more reliable and verifiable and would encourage users to use this template while placing references. DiptanshuTalk 16:15, 10 July 2013 (UTC)
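To make the idea concrete, here is a rough sketch of the kind of check the first bot would perform, restricted to {{tl|cite journal}} transclusions for simplicity; the function name and the idea of yielding the offending templates are assumptions for the example, not an agreed design:

<syntaxhighlight lang="python">
# Hedged sketch: find {{cite journal}} templates with neither |pmid= nor |doi= filled in.
import mwparserfromhell

def journal_refs_missing_ids(wikitext):
    code = mwparserfromhell.parse(wikitext)
    for tpl in code.filter_templates():
        if tpl.name.strip().lower() != "cite journal":
            continue
        has_pmid = tpl.has("pmid") and str(tpl.get("pmid").value).strip()
        has_doi = tpl.has("doi") and str(tpl.get("doi").value).strip()
        if not (has_pmid or has_doi):
            yield tpl
</syntaxhighlight>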

:You can proceed with the discussion about utility of such bots at Wikipedia talk:WikiProject Medicine#Bot tagging medical articles that lack a PMID or DOI DiptanshuTalk 16:25, 10 July 2013 (UTC)

  • WPMED tags a lot of articles that shouldn't have references to professional literature, including biographies and articles about businesses. It would be more appropriate to limit this to articles containing infoboxes like {{tl|Infobox disease}} or articles in the category tree under :Category:Diseases and disorders, :Category:Medical treatments, or similar headings. WhatamIdoing (talk) 15:18, 11 July 2013 (UTC)
  • There is no requirement on Wikipedia to use {{t1|cite journal}}, so ... no. There are also occasions when no PMID is used/needed, but the source is still valid (for example, citing the DSM in psych conditions), so ... no. SandyGeorgia (Talk) 01:17, 19 June 2013 (UTC)

Small request for bot/script to replace "language=ru" with "language=Russian"

The folks at the Village Pump suggested that I post this request here.

I have been correcting errors in citations, and I have noticed that pretty much every Russia-related article I edit contains an incorrectly formatted parameter, "language=ru", in its citation parameters. (The "language" parameter takes the full language name, not the two-letter code.)

You can see an example of one of these citations [http://en.wikipedia.org/w/index.php?title=North_Pole-37&oldid=560043220 here]. Note that the rendered reference says "(in ru)" instead of the correct "(in Russian)".

It occurred to me that someone clever with a script or bot or similar creature might be able to do a semi-automated find and replace of "language=ru" in citations with "language=Russian".

Is this a good place to request such a thing? I do not have the time or skills to take on such a project myself. Thanks. Jonesey95 (talk) 14:06, 16 July 2013 (UTC)
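For illustration, a minimal sketch of the find-and-replace; the handful of codes in the lookup table are placeholders (a real run would load the full ISO 639-1 list and would restrict itself to citation templates rather than raw page text):

<syntaxhighlight lang="python">
# Hedged sketch: expand two-letter |language= codes inside citation parameters.
import re

LANG_NAMES = {"ru": "Russian", "de": "German", "fr": "French", "es": "Spanish"}
PARAM_RE = re.compile(r"(\|\s*language\s*=\s*)([A-Za-z]{2})(\s*[|}])")

def expand_language_codes(wikitext):
    def repl(m):
        name = LANG_NAMES.get(m.group(2).lower())
        return m.group(0) if name is None else m.group(1) + name + m.group(3)
    return PARAM_RE.sub(repl, wikitext)

print(expand_language_codes("{{cite web |url=http://example.ru |language=ru }}"))
</syntaxhighlight>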

:I spotted this with some other languages when doing some capitalisation of language names, so it may be worth expanding the scope to other two-character language identifiers. Keith D (talk) 14:16, 16 July 2013 (UTC)

In the pump thread, using Lua to automatically parse the language codes was suggested -- I think that would definitely be preferable if possible, rather than having a bot make a boatload of fairly trivial edits. Theopolisme (talk) 14:27, 16 July 2013 (UTC)

::Yes, a change to Lua would be preferable, but it would probably not fix the problem described above. I think a human or bot would still need to go through and make this boatload of changes. Until Lua changes, I'm fixing these errors by hand; a bot would do a much quicker job and leave me to make changes that require a human brain. We already have bots and AWB users fixing things that are not visible to readers; this change would actually improve the reader's article-reading experience, making it considerably less trivial than those AWB changes. Jonesey95 (talk) 19:41, 16 July 2013 (UTC)
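For what it's worth, a rough sketch of the find-and-replace being asked for. The tiny code-to-name table is illustrative only; a real pass would load the full ISO 639-1 list and, per the concerns above, still wants human review.

 # Sketch: expand bare two-letter codes in |language= to the full language name.
 import re
 
 LANG_NAMES = {"ru": "Russian", "de": "German", "fr": "French", "uk": "Ukrainian"}
 
 PATTERN = re.compile(r"(\|\s*language\s*=\s*)([a-z]{2})(?=\s*[|}])")
 
 def expand_language_codes(wikitext):
     def repl(match):
         name = LANG_NAMES.get(match.group(2))
         return match.group(1) + name if name else match.group(0)
     return PATTERN.sub(repl, wikitext)
 
 print(expand_language_codes("{{cite web |url=http://example.ru/1 |language=ru}}"))
 # {{cite web |url=http://example.ru/1 |language=Russian}}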

:Original discussion is at WP:VPT#Small request for bot/script to replace "language=ru" with "language=Russian". Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 15:16, 16 July 2013 (UTC)

:I may be a bit slow, but is there a reason the template couldn't be changed so that lang=ru produces (in Russian)? Or is that what the bit above re: Lua is about? Ignatzmicetalk|

::Yes; that's partly what's meant by "using Lua to automatically parse the language codes", above. Such codes could also be used to put (non visible) language attributes into the emitted HTML, to improve compliance with web standards and accessibility guidelines, and improve searchability. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 06:41, 19 July 2013 (UTC)

Bot to tag references behind a paywall

Wikipedia is too important and too useful a resource to have citations behind paywalls if there is another possible reference. In order to draw attention to references that need improving or substitution, it would be nice if there were a bot that would tag articles whose references are behind a paywall. I realize that some newspapers slowly roll articles behind a paywall as time passes, but other newspapers have all their content behind a paywall. A good example is The Sunday Times: you can click on any link on http://www.thesundaytimes.co.uk and you will be presented with "your preview of the sunday times." For Wikipedians like myself who enjoy contributing by verifying citations, it is frustrating that we can't verify Sunday Times citations for free. When I see a paywall-tagged citation I often try to find another citation and substitute it. A bot would be helpful for this. DouglasCalvert (talk) 23:51, 16 July 2013 (UTC)

:I'm willing to do this, but having a list of other such links would be useful. Could you come up with them?  Hazard-SJ  ✈  00:07, 17 July 2013 (UTC)

::I think this would send the wrong message. First, we want high quality sources, even if they are not free. Only if a free source is of equal or better quality to a non-free source should the free source be preferred. By having a bot going around tagging thousands of paywall sources, it will reinforce the misconception that paywall sources are not acceptable.

::Also, why is it fair to tag paywall sources but not paper books, magazines, and newspapers? Jc3s5h (talk) 00:17, 17 July 2013 (UTC)

::: How is this sending the wrong message? If an article is behind a paywall, that is a fact. How can that send the wrong message? I am not saying I think they should be removed or that lower quality citations should be substituted. That's a nice strawman, but I am not going to engage. I think the reference should indicate that the link is behind a paywall for two reasons. Most importantly, it lets other editors know that the article could be improved if an equivalent reference were found from a site that is not hiding behind a paywall. Secondly, there is no point sending readers off to another site if they will not be able to read the reference.

::: As far as "fairness" goes, I do not even know what it means to be fair to a book or to be fair to a magazine. If you have a problem with the paywall tag, there must be another avenue to voice your concerns. I can go to a library and verify the citation for free. I cannot go to the library and verify the Sunday Times citation. DouglasCalvert (talk) 00:26, 17 July 2013 (UTC)

:::: It would make the FUTON bias problem worse. I think this {{BOTREQ|advertise}}; post about it at WP:VPR and see if you can get consensus first. Anomie 01:22, 17 July 2013 (UTC)

::::: The {{Tl|subscription}} template is a useful and neutral tag to apply after citations that contain subscription-only URLs. Jonesey95 (talk) 03:01, 17 July 2013 (UTC)

:::::: Yes, the {{tl|Subscription required}} template seems well suited to being added by a bot. - Jamesmcmahon0 (talk) 00:21, 22 July 2013 (UTC)
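Purely to make the mechanics concrete (the domain list is illustrative, mwparserfromhell is an assumption, and per Anomie any actual run would need consensus first), a sketch of appending the marker template after matching citations:

 # Sketch: append {{Subscription required}} after citation templates whose
 # |url= points at a domain known to sit behind a hard paywall.
 from urllib.parse import urlparse
 import mwparserfromhell  # assumed wikitext parser
 
 PAYWALLED = {"thesundaytimes.co.uk", "thetimes.co.uk"}  # illustrative only
 
 def tag_paywalled_citations(wikitext):
     code = mwparserfromhell.parse(wikitext)
     for tpl in code.filter_templates():
         if not tpl.name.strip().lower().startswith("cite") or not tpl.has("url"):
             continue
         host = urlparse(str(tpl.get("url").value).strip()).netloc.lower()
         if host.startswith("www."):
             host = host[4:]
         if host in PAYWALLED and "{{subscription" not in str(code).lower():
             # A real bot would check only the text adjacent to this citation,
             # not the whole page, before deciding it is already tagged.
             code.insert_after(tpl, " {{Subscription required}}")
     return str(code)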

VisualEditor FixBot?

WMF has turned Visual Editor on for IP accounts now, and the results are as expected: [http://en.wikipedia.org/w/index.php?title=Special:AbuseLog&wpSearchFilter=550 Filter 550] shows that a significant volume of articles are getting mutilated with stray wikitext. It has been proposed to set the filter to block the edits that include nowikis that indicate that the combination of VE and an unknowing editor has caused a problem.

I'd personally rather see this sort of mess cleaned up by a bot, and a message left on a user talk page that asks the editor either to stop using wiki markup or to stop using VE. I think it's possible to detect strings generated by VE (basically, it surrounds wikitext with nowiki tags), and figure out what the fix is (in some [most?] cases, just remove the nowiki tags), similar to how User:BracketBot figures out that an edit has broken syntax. Given that the problem is on the order of magnitude of 300 erroneous edits per day, is it possible to move with all deliberate pace to field such a bot?

(Background: see WP:VPR#Filter 550 should disallow.) -- John Broughton (♫♫) 03:31, 16 July 2013 (UTC)

:Sounds interesting. I'll look into solutions.—cyberpower ChatOffline 03:36, 16 July 2013 (UTC)

::{{edit conflict}} Hmm, I like this idea too...looking into... Theopolisme (talk) 03:39, 16 July 2013 (UTC)

:::I started a script, but I won't be able to finish it up now as I have to go offline.  Hazard-SJ  ✈  07:51, 16 July 2013 (UTC)

:::: For a bot rather than script-assisted human editing, watch out for the issues raised in WP:CONTEXTBOT. Anomie 10:55, 16 July 2013 (UTC)

::::: Good point; I'll also post at a couple of automated tool pages (AWB, TW). But I really think that false positives on articlespace pages are very unlikely for cases like this:

::::::whatever

::::: And simply removing nowiki tags is a change that seems to me to have so little potential for damage that I think a bot - with the standard human occasional review - can be trusted to do that. -- John Broughton (♫♫) 15:24, 16 July 2013 (UTC)

::::::I can't see how any false positives would result - I can't think of any occasions where nowiki should appear in articles..... Mdann52 (talk) 12:54, 17 July 2013 (UTC)

:::::::From the filter results, I've seen some experienced users use nowiki as a method of escaping wikitext characters in places where it is intended to be used as plain text. For example, someone might write:

::::::::{{ template | title = Pipe Land <nowiki> | </nowiki> Pipes }}

:::::::That's not the typical way of dealing with things like that, but it does work. Dragons flight (talk) 13:00, 17 July 2013 (UTC)

:::::: For people designing scripts, be aware that when VE adds nowiki it tends to be maximally expansive rather than minimally so. For example, if you type:

:::::::I like [[butterflies]] and bright red trains.

:::::: In VE the result is:

:::::::<nowiki>I like [[butterflies]] and bright red trains.</nowiki>

:::::: Rather than just escaping the link, it will escape all of the added plain text on either side as well. Dragons flight (talk) 13:05, 17 July 2013 (UTC)
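Given both caveats above (deliberate escaping, and VE wrapping whole sentences), one conservative rule a script could apply is: only strip a nowiki pair when the wrapped text contains no wiki markup at all, so removal cannot change the rendering. A sketch, for illustration only:

 # Sketch: remove <nowiki>...</nowiki> pairs only when the wrapped text is plain
 # prose; anything containing markup characters is left for human review.
 import re
 
 NOWIKI = re.compile(r"<nowiki>(.*?)</nowiki>", re.DOTALL | re.IGNORECASE)
 MARKUP = re.compile(r"[\[\]{}|<>]|''|^[ *#:;=]", re.MULTILINE)
 
 def strip_harmless_nowiki(wikitext):
     """Return (new_text, number_of_pairs_kept_for_review)."""
     kept = 0
     def repl(match):
         nonlocal kept
         inner = match.group(1)
         if MARKUP.search(inner):
             kept += 1
             return match.group(0)  # markup inside: keep it for a human
         return inner               # plain text: the tags do nothing, drop them
     return NOWIKI.sub(repl, wikitext), kept
 
 text, kept = strip_harmless_nowiki("I enjoy <nowiki>plain text with no markup.</nowiki>")
 print(text)  # I enjoy plain text with no markup.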

:I will try to do something in WPCleaner about this, as requested by John on my talk page, but I'm not sure I will have the time today or tomorrow. I see the following (won't be automatic) features:

:* Retrieving the list of articles which triggered filter 550

:* Detect {{tag|nowiki}} tags in main namespace articles, and suggest a fix for each of them. Basically suggesting to just remove the tags, except for specific cases. I've only one in mind for now: a nowiki at the beginning of a line with whitespace characters after it; the whitespace characters should be removed too.

:--NicoV (Talk on frwiki) 14:26, 17 July 2013 (UTC)

::WPCleaner can now detect {{tag|nowiki}} in main namespace and suggest a fix. To activate this detection, edit Special:MyPage/WikiCleanerConfiguration and add the following contents (with the {{tag|source}} tags):

 # Configuration for error 518: nowiki tags
 error_518_bot_enwiki=true END

::After that, in WPCleaner the Abuse filters button lets you choose which Abuse filter you are interested in (choose 550) and gives you the list of pages having triggered that filter. When you analyze a page, {{tag|nowiki}} tags are found and suggestions are given to fix them. It's quite basic, so if you think of any enhancement, tell me. --NicoV (Talk on frwiki) 22:52, 17 July 2013 (UTC)

:::Great; thanks Nico! Theopolisme (talk) 00:36, 18 July 2013 (UTC)

::::Yes, that should help out. Doing it automatically, there's not much that can be done to avoid false positives, so it'd need great consensus, per Anomie's link above.  Hazard-SJ  ✈  00:59, 18 July 2013 (UTC)

:If I have time tonight, I will sort the list of pages where an edit triggered Filter 550 from newest to oldest instead of alphabetically. --NicoV (Talk on frwiki) 07:14, 18 July 2013 (UTC)

I came here from the WP:Village Pump (proposals) page, and I suggest instead of an autofix bot, maybe a bot much like User:DPL bot? It could notify everyone who accidentally triggered the filter, and each person could go back and fix it. Unless that would create lots of spam? Just a thought. kikichugirl inquire 22:21, 20 July 2013 (UTC)

::{{ping|Kikichugirl}} Presuming that VE is aimed at new editors, I'm not sure they would understand a talk page message that tried to explain why VE messed up their edit. Plus, IP editors who get a new IP address every time they edit may never see the messages. GoingBatty (talk) 00:47, 22 July 2013 (UTC)

:::{{ping|GoingBatty}} That's actually a good point. Besides, I've heard User:DPL bot is meant for at least slightly more experienced editors anyway. Most of the nowiki problems I've seen are coming from users with redlinked userpages (likely to be newer editors; more experienced users not choosing to have a userpage would just redirect their user page to their talk page). kikichugirl inquire 06:26, 22 July 2013 (UTC)

  • Perhaps I am coming into this a bit late (also please note I haven't had the time to read everything above), but if we can code a bot for this... why shouldn't we just code a fix for VE? ·addshore· talk to me! 11:49, 21 July 2013 (UTC)

:*[http://en.wikipedia.org/w/index.php?title=Wikipedia:VisualEditor/Feedback&diff=prev&oldid=563853529 Because WMF won't].—Kww(talk) 06:31, 22 July 2013 (UTC)

:::I see..... Now, just because they won't probably doesn't mean we can't. I imagine there would be a way to hook into the save of VisualEditor and run a collection of JavaScript rules / tidy-up scripts over the page before actually saving? :> Or is my mind beginning to wander? ·addshore· talk to me! 07:37, 22 July 2013 (UTC)

::::Hmm, that's not a bad idea :) Theopolisme (talk) 16:39, 22 July 2013 (UTC)

Flag unreferenced BLPs for WikiProjects

{{User-multi|User=DASHBot|separator=dot|1=t|2=c|doc=yes}} used to create, and/or periodically update, a list of unreferenced biographies of living persons for a given WikiProject (see :User:DASHBot/Wikiprojects). However, that bot has been blocked since March. I'm wondering if another one can accomplish this same task. I'm asking on behalf of WP:JAZZ (whose list is at Wikipedia:WikiProject Jazz/Unreferenced BLPs), but there were a lot of other WikiProjects on that list as well (I'd already removed WP:JAZZ, though). -- Gyrofrog (talk) 21:55, 23 July 2013 (UTC)

:I'll be happy to do this, but first I emailed DASHBot's operator, on the off chance that he'd be able to send me the source code for the task in question. If I don't receive a response in the next week or two, I'll re-code it myself. Theopolisme (talk) 12:26, 26 July 2013 (UTC)
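Not DASHBot's actual code (which is what the email above is about), just a sketch of one way such a per-project list could be regenerated via the API; the category names, especially the project banner category, are assumptions:

 # Sketch: intersect Category:All unreferenced BLPs with a WikiProject's tagged
 # talk pages. Category names (especially the banner category) are assumptions.
 import requests  # assumed HTTP library
 
 API = "https://en.wikipedia.org/w/api.php"
 
 def category_members(category, namespace):
     """All page titles in a category, restricted to one namespace."""
     titles, cont = set(), {}
     while True:
         params = {"action": "query", "list": "categorymembers",
                   "cmtitle": category, "cmnamespace": namespace,
                   "cmlimit": "500", "format": "json", **cont}
         data = requests.get(API, params=params).json()
         titles |= {m["title"] for m in data["query"]["categorymembers"]}
         if "continue" not in data:
             return titles
         cont = data["continue"]
 
 unref = category_members("Category:All unreferenced BLPs", 0)          # articles
 jazz_talk = category_members("Category:WikiProject Jazz articles", 1)  # talk pages
 unref_jazz = sorted(t for t in unref if "Talk:" + t in jazz_talk)
 print(len(unref_jazz), "unreferenced BLPs tagged by the project")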

Vandalism Tracker Robot

It seems like it would be of great use to create a bot that finds and lists the most vandalised pages on the wiki, maintaining a regularly updated list page or essay, in order to alert all Wikipedians to which pages require the most monitoring and reverting. Superwikiwalrus (talk) 14:12, 27 July 2013 (UTC)

:That job should go to User:Cobi.—cyberpower ChatOnline 14:51, 27 July 2013 (UTC)

GA Assessment BOT

Hi! This is my first bot request, so bear with me. The bot I am requesting will perform the following functions:

  1. When a nomination is added to WP:GAN, it will download/scan/read/whatever the source code, checking for spelling errors
  2. It will check the source code for any maintenance tags, {{citation needed}} {{refimprove}} and so on
  3. It will check all images in the article and check their fair use status
  4. It will then report all this somewhere on wiki, preferably with a page per nom, something like User:Retrolord/(Nomination name here report)

Does anyone here have any thoughts on the feasibility of this? KING RETROLORD 09:10, 22 July 2013 (UTC)

One more request: could the bot check that there is a minimum of 1 citation per paragraph? Thanks, KING RETROLORD 09:59, 22 July 2013 (UTC)

:{{xt|Checking for spelling errors}} is generally not a good idea. However, the rest of this could be feasible... Theopolisme (talk) 16:38, 22 July 2013 (UTC)

:{{BOTREQ|coding}} a bot to create a report for each new nomination (which will be posted at User:Theo's Little Bot/GAN/**article**). Theopolisme (talk) 16:43, 22 July 2013 (UTC)

::{{ping|Retrolord}} Please take a look at User:Theo's Little Bot/GAN for what I've got so far ([https://github.com/theopolisme/theobot/blob/master/gan.py source code]). Thoughts? Modifications? What other information would be useful to have? Theopolisme (talk) 02:07, 23 July 2013 (UTC)

:::That seems to have covered it. Would you be able to provide the list of tags it checks for?KING RETROLORD 02:20, 23 July 2013 (UTC)

::::Right now it checks for all tags in :Category:Cleanup templates, although this can be changed if you'd like. Really, no other information you're dying to see? ;) Theopolisme (talk) 02:30, 23 July 2013 (UTC)

:::: {{ec}} It uses all templates from :Category:Cleanup templates.  Hazard-SJ  ✈  02:30, 23 July 2013 (UTC)

:::::Sir, hath thou forgotten thy pipe trick? Theopolisme (talk) 02:32, 23 July 2013 (UTC)

::::::Nothing else I'm dying to see in the bot, but when it's finished, will all the reports be on separate sub-pages? KING RETROLORD 02:35, 23 July 2013 (UTC)

::{{ping|Theopolisme}} - I agree that automatically fixing spelling errors is not a good idea. However, I'm curious what your concerns would be when it comes to making a list of potential spelling errors in an article and posting that list elsewhere for human review. GoingBatty (talk) 02:44, 23 July 2013 (UTC)

:::Ah, yes, that's a different story. I'm not opposed to that, theoretically... perhaps just a section, "Alerts", saying that it detected "x misspelled words, including x,y,z", and advising the reviewer to run the page through a proper spellchecker? I wouldn't see the point in just blatantly listing all the misspelled words, though. :/ Theopolisme (talk) 02:55, 23 July 2013 (UTC)

:::::::A few more things actually. Could we have the bot check for 1 citation per paragraph? And secondly, on the report pages, could all non-free images be written in red font, so they stand out more? Thanks, KING RETROLORD 03:07, 23 July 2013 (UTC)

::::::::Yes and yes to those two, implementing now. Theopolisme (talk) 03:13, 23 July 2013 (UTC)

:::::::::{{done}}, I think. See User:Theo's Little Bot/GAN. For the alerts, I a) ignore the lead (since it doesn't have to be referenced) and b) ignore paragraphs that match the following regular expression: (==|Category:|\[\[File:|\[\[Image:|{{Authority control|{{.*}}|^<.*?>|^;|^\|). It's still not completely foolproof, though, and still gets some false positives. Theopolisme (talk) 04:10, 23 July 2013 (UTC)
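For anyone following along, the actual logic lives in the bot's gan.py linked above; the following is only a rough sketch of the kind of check being described, reusing the skip regular expression quoted in the previous comment:

 # Sketch of the "at least one citation per paragraph" alert: ignore the lead,
 # skip paragraphs matching the regex quoted above, flag those without a <ref>.
 import re
 
 SKIP = re.compile(r"(==|Category:|\[\[File:|\[\[Image:|\{\{Authority control|\{\{.*\}\}|^<.*?>|^;|^\|)",
                   re.MULTILINE)
 REF = re.compile(r"<ref[ >]", re.IGNORECASE)
 
 def uncited_paragraphs(wikitext):
     """Return snippets of body paragraphs that contain no <ref> tag."""
     parts = wikitext.split("\n==", 1)
     body = parts[1] if len(parts) > 1 else ""  # everything after the lead
     flagged = []
     for para in body.split("\n\n"):
         para = para.strip()
         if not para or SKIP.search(para) or REF.search(para):
             continue
         flagged.append(para[:60] + "...")  # false positives expected; advisory only
     return flagged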

::::::::::This is coming along quite nicely. A few final questions: is the bot going to do a report on all current noms? And where are the reports going to end up? (At the moment I can only see the 10 or so on that example page.) Thanks, KING RETROLORD 04:57, 23 July 2013 (UTC)

{{od}}

Yes, when running "for real" the bot will report on all current nominations. As far as where the reports end up... I would like to just use sections on User:Theo's Little Bot/GAN, since that prevents having to create a ton of new pages—unless you have a reason why multiple pages would be beneficial. Theopolisme (talk) 05:09, 23 July 2013 (UTC)

:There are two reasons why I thought separate pages would be good. Firstly, will reports on 400+ noms make the page unwieldy to load? And secondly, if there are separate pages for each nom, then they could be linked at the WP:GAN page next to each nom. Thoughts? KING RETROLORD 05:34, 23 July 2013 (UTC)

::Links could be done using User:Theo's Little Bot/GAN#articlename, so that's not a big deal. Load-wise: perhaps User:Theo's Little Bot/GAN/A, User:Theo's Little Bot/GAN/B, etc? Theopolisme (talk) 05:42, 23 July 2013 (UTC)

:::Sounds good to me. I don't really mind, as long as you don't think there will be problems loading the page it shouldn't matter. Thanks, KING RETROLORD 05:44, 23 July 2013 (UTC)

::::I've implemented the /A, /B, etc system. {{User:Theo's Little Bot/GAN/link}} can be used to automatically link to a specific article's listing. Theopolisme (talk) 06:15, 23 July 2013 (UTC)

{{ping|GoingBatty}} I've implemented basic spell checking using Wikipedia:Lists of common misspellings/For machines ([https://github.com/theopolisme/theobot/commit/74a4537368b8a053446e7f42cc7e7a20b8bafba7 commit]). I initially tried using a larger corpus (courtesy of NLTK+Project Gutenberg), but it was taking way too long to process each article (5-8 minutes), so I settled for Wikipedia:Lists of common misspellings/For machines instead. It's not as complete, but should still catch "common misspellings." ;) Your thoughts? Is this adequate? Theopolisme (talk) 19:17, 23 July 2013 (UTC)

:{{ping|Theopolisme}} - Would you be able to leverage WP:AWB/T ? GoingBatty (talk) 23:01, 23 July 2013 (UTC)

::Ah, good idea. Will look into it tonight. Theopolisme (talk) 23:55, 23 July 2013 (UTC)

:::Once again, great idea! Code for integration with WP:AWB/T is written; running some tests now. Theopolisme (talk) 04:40, 24 July 2013 (UTC)

::::{{ping|GoingBatty}} Take a look at User:Theo's Little Bot/GAN -- right now, I'm just printing the typo + line #, but would it make sense to also print the suggested correction? Theopolisme (talk) 04:46, 24 July 2013 (UTC)

:::::{{ping|Theopolisme}} - Thanks for adding the typo info! I suggest you change the description from "Common" to "Possible", and agree that the suggested correction be added. How would you suggest the editor use the line # info? Thanks! GoingBatty (talk) 02:02, 25 July 2013 (UTC)

::::::2 AM and 2 hours of frustration later, I think I've got the "suggested correction" stuff working ([http://bugs.python.org/issue1519638 note to self] *growls*). Take a look at User:Theo's Little Bot/GAN. Good point about the line #s...you're right, they are fairly useless. ;) Perhaps printing a snippet of the surrounding text? Theopolisme (talk) 06:46, 25 July 2013 (UTC)

::::::::I've made it print the surrounding 40 characters instead. Thoughts? Theopolisme (talk) 07:05, 25 July 2013 (UTC)
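For the curious, a toy version of what is being described here (the "wrong->right" line format is how Wikipedia:Lists of common misspellings/For machines is laid out; everything else in this sketch is illustrative):

 # Sketch of the misspelling alert: load the "For machines" list (lines shaped
 # like "abandonned->abandoned"), then report each hit with its suggested
 # correction and ~40 characters of surrounding context for the reviewer.
 import re
 
 def load_typo_map(raw):
     """Parse 'wrong->right' lines into a dict; other lines are ignored."""
     typos = {}
     for line in raw.splitlines():
         if "->" in line:
             wrong, right = line.split("->", 1)
             typos[wrong.strip().lower()] = right.strip()
     return typos
 
 def spelling_alerts(text, typos, context=40):
     alerts = []
     for match in re.finditer(r"[A-Za-z']+", text):
         word = match.group(0).lower()
         if word in typos:
             start = max(0, match.start() - context)
             snippet = text[start:match.end() + context]
             alerts.append((match.group(0), typos[word], "..." + snippet + "..."))
     return alerts
 
 typo_map = load_typo_map("abandonned->abandoned\naquisition->acquisition")
 for word, fix, ctx in spelling_alerts("The deal was an aquisition of the label.", typo_map):
     print(f"Possible misspelling: {word} -> {fix} | {ctx}")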

:Request: Could the sub headings for each article at User:Theo's Little Bot/GAN link to the articles? So the section headers would become bluelinks? Thanks, King∽~Retrolord 05:30, 24 July 2013 (UTC)

::Done in source code [https://github.com/theopolisme/theobot/commit/79a0adcc7cc974d46e8f476f67e1fdac2edc265e here]. Theopolisme (talk) 06:19, 24 July 2013 (UTC)

{{replyto|Theopolisme}} Can this bot be used to scan current Good Articles? The bot might be able to select some articles that might no longer meet the GA criteria for examination by human users. If the user decides it no longer meets the criteria, he can open a GAR.--FutureTrillionaire (talk) 01:35, 25 July 2013 (UTC)

:{{xt|The bot might be able to select some articles that might no longer meet the GA criteria for examination by human users.}} Hmm, what do you think should trigger the bot to select a particular article for reexamination? Theopolisme (talk) 01:56, 25 July 2013 (UTC)

:Regarding the above idea, wouldn't running it through the same procedure that nominations get work? It might not pick up every article needing re-assessment, but it will fairly easily spot all the ones with maintenance tags, non-free images and a lack of citations? King•Retrolord 06:58, 25 July 2013 (UTC)

::Well, yes, but it is a machine, so exact constraints would need to be specified (for example, how many maintenance tags == re-review?). Also, keep in mind that the bot's reports include a fair number of false positives. I think it's just a matter of determining numbers that wouldn't overload the system (i.e., we don't want 700 articles appearing for checking), while still providing a benefit. Theopolisme (talk) 07:08, 25 July 2013 (UTC)

::: I don't know what's getting unrealistic, so stop me if this is, but would it be possible for the bot to just scan all GAs, then perhaps list the 10% of "worst offenders" for re-assessment? Looking at some of the GA re-assessment drives, it seems to me that at least 10% of articles get delisted after being checked, though I may be wrong on that. King•Retrolord 07:36, 25 July 2013 (UTC)

::: Since there are over 18,000 Good Articles, the criteria for selection should be strict. Otherwise, the bot will select too many articles for review, many of which probably don't need it. Any GA with serious issues should be selected. The criteria for selection might be at least one orange tag, or maybe at least 3 citation needed tags, etc.--FutureTrillionaire (talk) 13:31, 25 July 2013 (UTC)

::::Yes, exactly what FutureTrillionaire says. Sure, I could definitely do it using maintenance tags as a metric (and maybe just make a list, User:Theo's Little Bot/GAs with maintenance tags, sorted by number of tags)? Would that be sufficient (or at least a good start)? Theopolisme (talk) 16:14, 25 July 2013 (UTC)

:::::Sounds like a good idea to me. We can test this out. I'm willing to volunteer examining the articles the bot selects.--FutureTrillionaire (talk) 22:24, 25 July 2013 (UTC)

::::::Alright, I'm generating that report now (might take a while). Theopolisme (talk) 03:51, 26 July 2013 (UTC)

:::::::Update: More like a day or so, given the sheer magnitude of articles to parse. Theopolisme (talk) 12:17, 26 July 2013 (UTC)

::::::::Cool. Do you know how often will the bot be able to update the list (if some of the GAs listed were to be delisted or if new orange tags are added to GAs)?--FutureTrillionaire (talk) 14:03, 26 July 2013 (UTC)

:::::::::Does a weekly update sound good? Theopolisme (talk) 20:33, 26 July 2013 (UTC)

::::::::::Sure. I am thinking about transcluding that list to a new section at WP:GAR, so that users can take a look at the selected articles.--FutureTrillionaire (talk) 21:11, 26 July 2013 (UTC)

Looks like the bot is done. However, there are issues. I checked about 20 of the articles listed, and it appears that the vast majority of these articles were selected because they have at least one dead link tag. However, this is not very useful because dead links do not violate any of the GA criteria. I saw only one article that contained an orange tag, and a few articles containing only citation needed tags or disambiguation needed tags. Is it possible for the bot to ignore dead link tags and other less serious tags? I was hoping to just see articles with orange tags displayed at the top, or something like that.--FutureTrillionaire (talk) 01:50, 27 July 2013 (UTC)

:It's very difficult (if not impossible) for the bot to determine the seriousness of a certain tag. However, we could create a blacklist for tags that should be ignored. We could also just display templates that transclude {{tlx|Ambox}} (that's "the orange" you were talking about). Thoughts? Thanks for bearing with me on this. (Another note: for some reason, the bot listed articles from least->most tags...fixed.) Theopolisme (talk) 02:33, 27 July 2013 (UTC)

::I think we should try out the Ambox option. However, some non-serious tags that use the Ambox template should be blacklisted. Examples that I can think of are {{tlx|Current}} and {{tlx|Split}}.--FutureTrillionaire (talk) 02:59, 27 July 2013 (UTC)

:::{{tlx|Current}} and {{tlx|Split}} wouldn't be included, since they aren't also in :Category:Cleanup templates. Here's a page to enter templates for the whitelist, though, should you stumble upon anything. Theopolisme (talk) 03:39, 27 July 2013 (UTC)
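As a sketch of the ranking step only (the template names below are stand-ins; the real sets would come from :Category:Cleanup templates, the Ambox check, and the whitelist page just mentioned):

 # Sketch: rank Good Articles by how many ambox-based cleanup templates they
 # transclude, skipping anything on a whitelist of harmless tags.
 import mwparserfromhell  # assumed wikitext parser, as in the other sketches
 
 # In a real run these sets would be built from Category:Cleanup templates,
 # filtered to templates that transclude {{Ambox}}, plus the whitelist page.
 ORANGE_CLEANUP = {"refimprove", "original research", "unreferenced", "pov"}
 WHITELIST = {"current", "split"}
 
 def cleanup_tag_count(wikitext):
     count = 0
     for tpl in mwparserfromhell.parse(wikitext).filter_templates():
         name = tpl.name.strip().lower()
         if name in ORANGE_CLEANUP and name not in WHITELIST:
             count += 1
     return count
 
 def rank_good_articles(pages):
     """pages: {title: wikitext}. Return titles sorted most-tagged first."""
     scored = {t: cleanup_tag_count(w) for t, w in pages.items()}
     return sorted((t for t, n in scored.items() if n), key=scored.get, reverse=True)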

::::Can you run the bot again, limiting the search to only Ambox and cleanup templates? --FutureTrillionaire (talk) 13:09, 28 July 2013 (UTC)

:::::To clarify: you mean "only ambox-based orange cleanup templates", correct? Theopolisme (talk) 16:29, 28 July 2013 (UTC)

::::::Yes, you're right. I think this should reduce the list significantly.--FutureTrillionaire (talk) 16:54, 28 July 2013 (UTC)

:::::::Running now... Theopolisme (talk) 17:53, 28 July 2013 (UTC)