Wikipedia:Bots/Requests for approval/EarwigBot 5


EarwigBot 05

[[User:EarwigBot I|EarwigBot I]] 2

{{Newbot|EarwigBot I|2}}

Operator: The Earwig

Automatic or Manually Assisted: Automatic, unsupervised

Programming Language(s): Python, Pywikipedia

Function Overview: The bot puts the correct timestamp on all pages in :Category:Undated AfC submissions, which is a list of Articles for creation pages that are missing them.

Edit period(s): One time run

Already has a bot flag (Y/N): Y

Function Details: This bot was conceived after a [http://en.wikipedia.org/w/index.php?title=User_talk:The_Earwig&oldid=289962268#Bot discussion] between myself and User:MSGJ (Martin) concerning a bot to follow up on User:EarwigBot II (BRFA). One of the problems with Articles for Creation submissions is that, because so many new users are involved, submissions are often not filed correctly. This is often because a submission has no timestamp (formatted like this: {{REVISIONTIMESTAMP}}), which results in it being miscategorized. A total of {{PAGESINCAT:Undated AfC submissions}} submissions currently lack a timestamp and are contained within this category: :Category:Undated AfC submissions. The bot aims to fix this problem by taking each page in that category and running this regex replacement:

python afc_timestamper.py -file:afc_timestamper.txt -regex -nocase "{{AFC submission" "{{AFC submission|ts={{subst:REVISIONTIMESTAMP}}"

to add in a timestamp. The bot's source code is available here, while the file containing the category members is here. The only problem is that I don't see any way to get the submission's creation date, rather than the date of the last revision (which is often very close). As Martin said, "Of course, the ideal solution would be a magic word like CREATEDATE but this doesn't exist yet." I think that {{REVISIONTIMESTAMP}} should be adequate for the bot's purpose, but I am looking into it in more detail to see if that could be changed. For now, that is the best there is. Thanks.
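As a rough illustration of the replacement the command above performs, here is a minimal sketch in plain Python. The function name add_timestamp and the pattern constant are hypothetical and are not taken from afc_timestamper.py. Unlike the literal replacement string in the command, this version keeps whatever capitalisation the page already used.

```python
import re

# Hypothetical pattern: the opening of the template, matched case-insensitively
# to mirror the -regex -nocase options of the command line.
TEMPLATE_OPEN = re.compile(r"\{\{AFC submission", re.IGNORECASE)


def add_timestamp(wikitext, ts_value="{{subst:REVISIONTIMESTAMP}}"):
    """Append a ts= parameter directly after each template opening."""
    # A function replacement preserves the matched text, so "{{afc submission"
    # stays lowercase instead of being overwritten.
    return TEMPLATE_OPEN.sub(lambda m: m.group(0) + "|ts=" + ts_value, wikitext)
```

Passing an explicit 14-digit value for ts_value, rather than the default magic word, is what a bot would do once it knows the real creation date.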

=Discussion=

{{BotTrial|edits=30}} Let's see how it does. – Quadell (talk) 00:00, 17 May 2009 (UTC)

{{BotTrialComplete}} All thirty edits made:


Unfortunately, the bot screwed up big time. It seems that no one noticed that a substituted {{REVISIONTIMESTAMP}} yields the timestamp of the edit being saved (the bot's own edit), not that of the previous edit. See, the bot did its job correctly, except that it placed all of the entries in :Category:AfC submissions by date/17 May 2009 instead of where they were supposed to go. I'm investigating a way to get around this. This is why we have trials! The Earwig (Talk | Contributions) 03:11, 17 May 2009 (UTC)
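To see why every trial page ended up in the same category, it may help to sketch how a timestamp maps to a dated category. This is only an illustration: the function dated_category is hypothetical, the 14-digit timestamp format is taken from the code later in this discussion, and zero-padding of the day number is an assumption.

```python
from datetime import datetime

# Month names spelled out so the result does not depend on the system locale.
MONTHS = ("January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December")


def dated_category(ts):
    """Map a 14-digit timestamp (YYYYMMDDHHMMSS) to a dated category name."""
    dt = datetime.strptime(ts, "%Y%m%d%H%M%S")
    # Assumption: the day is zero-padded in the real category names.
    return "Category:AfC submissions by date/%02d %s %d" % (
        dt.day, MONTHS[dt.month - 1], dt.year)
```

Any timestamp produced at save time on 17 May 2009 maps to the 17 May 2009 category, whatever the submission's actual creation date was.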

:Well, [http://www.mediawiki.org/wiki/Special:Code/MediaWiki/49575 at some point] you'll be able to get this on a separate page as a parser function. Until then I think you'll actually have to query the page history through the API. – Quadell (talk) 03:32, 17 May 2009 (UTC)

Sorry, I meant to come and comment here earlier, as I could have told you this would happen :) Actually the template Template:AFC submission/declined was using REVISIONTIMESTAMP to roughly categorise them by date in the absence of the proper timestamp. Anyway, no harm done. Is it feasible to query the creation date with the software you're using? — Martin (MSGJ · talk) 07:57, 17 May 2009 (UTC)

:Well, here's what I've found: pywikipedia's wikipedia.py module has built-in features (getVersionHistory() or getVersionHistoryTable()) enabling it to retrieve a page's complete version history in a format including the revision ID, the user who made the change, and the edit summary. It can also restrict this list to an arbitrary number of recent revisions. However, I'm not sure how I can get the bot to look at this list and determine which is the oldest revision ID. What the module can do is get the revision ID of the most recent revision (previousRevision()), and it can supposedly retrieve the timestamp of said revision (editTime()). The last one, editTime(), is probably what we want (it will function similarly to how I thought {{REVISIONTIMESTAMP}} would function), and some testing I conducted, although rather buggy, confirmed this. I'm a little worried that the afc_timestamper.py module that I was using for the trial might not accept this modified input method (i.e., a variable in the regex). Opinions? The Earwig (Talk | Contributions) 16:12, 17 May 2009 (UTC)
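Picking the oldest entry out of a retrieved history is straightforward once the entries are in a list. The sketch below assumes each entry is a tuple of (revision ID, ISO 8601 timestamp, user, edit summary). That layout is an assumption made for illustration and may not match what getVersionHistory() actually returns.

```python
def oldest_revision(history):
    """Return the history entry with the earliest timestamp.

    Each entry is assumed to be (revid, timestamp, user, summary), with the
    timestamp as an ISO 8601 string such as "2009-05-02T10:15:00Z".
    """
    if not history:
        raise ValueError("page history is empty")
    # ISO 8601 strings in the same format sort chronologically as plain text.
    return min(history, key=lambda rev: rev[1])
```

This only gives the true creation revision if the complete history was fetched, not a list limited to the most recent revisions.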

::I did a survey of the first 25 submissions in the category: eight of them were last edited one or two days after the submission was first submitted, and five of them were last edited three days or more after it was first submitted. This would indicate that if the above method is used, most (1,600) of the submissions will have the correct date, or a date very close to the correct one, and the remaining 400 will have a date that is considerably further from the correct one. I'll see if I can get pywikipedia to understand the API... it isn't built to do queries like that, but the query to get the revision ID is simple. The Earwig (Talk | Editor review) 19:48, 23 May 2009 (UTC)
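The estimate above is a simple proportion scaled up from the sample. As a sketch, assuming a category size of roughly 2,000 pages (the sum of the 1,600 and 400 figures; the exact count is not stated), and using a hypothetical helper name:

```python
def extrapolate(sample_hits, sample_size, population):
    """Scale a count observed in a sample up to the whole population."""
    return round(population * sample_hits / sample_size)


# 5 of 25 sampled pages were edited three or more days after submission.
far_off = extrapolate(5, 25, 2000)
close_enough = 2000 - far_off
```

With a sample of only 25 pages the figures are rough and should be read as an order-of-magnitude guide.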

Have you found a way to retrieve the creation date yet? There must be a way, surely ... — Martin (MSGJ · talk) 14:39, 27 May 2009 (UTC)

:::If you want the creation date of the first revision (and don't want to rewrite the whole script to use a different Python framework), you can do something like:

<syntaxhighlight lang="python">
import urllib
import simplejson as json  # if you have Python >= 2.6, use: import json
import datetime

params = {'action':'query', 'prop':'revisions', 'rvdir':'newer', 'rvlimit':1, 'rvprop':'timestamp', 'format':'json'}
params['titles'] = pagetitle  # Whatever variable has the page title
data = urllib.urlencode(params)
raw = urllib.urlopen("http://en.wikipedia.org/w/api.php", data)
res = json.loads(raw.read())
pageid = res['query']['pages'].keys()[0]
ts = datetime.datetime.strptime(res['query']['pages'][pageid]['revisions'][0]['timestamp'], "%Y-%m-%dT%H:%M:%SZ")
tsstring = ts.strftime("%Y%m%d%H%M%S")  # The timestamp as a string
</syntaxhighlight>

:::Though looking at the source, I have no idea where you would need to integrate this; it seems awfully complex for such a simple task. Mr.Z-man 22:09, 27 May 2009 (UTC)
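The snippet above targets Python 2. For readers on Python 3, the same query can be sketched as follows, using only the standard library. The function names and the User-Agent string are hypothetical, and the parsing is split from the network call so that it can be checked without contacting the API.

```python
import json
import urllib.parse
import urllib.request
from datetime import datetime

API_URL = "https://en.wikipedia.org/w/api.php"


def build_query(title):
    """URL-encode the parameters asking for a page's oldest revision."""
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvdir": "newer",       # list revisions oldest first
        "rvlimit": 1,           # only the first (creation) revision
        "rvprop": "timestamp",
        "format": "json",
    }
    return urllib.parse.urlencode(params)


def parse_creation_timestamp(response):
    """Extract the first revision's timestamp as a YYYYMMDDHHMMSS string."""
    # The "pages" object is keyed by page ID; only one page was requested.
    page = next(iter(response["query"]["pages"].values()))
    iso = page["revisions"][0]["timestamp"]
    ts = datetime.strptime(iso, "%Y-%m-%dT%H:%M:%SZ")
    return ts.strftime("%Y%m%d%H%M%S")


def fetch_creation_timestamp(title, user_agent="ExampleBot/0.1 (hypothetical)"):
    """Query the live API for the creation timestamp (needs network access)."""
    request = urllib.request.Request(
        API_URL + "?" + build_query(title),
        headers={"User-Agent": user_agent},
    )
    with urllib.request.urlopen(request) as raw:
        return parse_creation_timestamp(json.load(raw))
```

A missing page has no "revisions" key in the response, so a real bot would need to handle that case before indexing into it.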

::::Hm? The reason my source is so complicated is that it's a copy of pywikipedia's replace.py module, which has a lot of extra and unnecessary features such as parsing XML dumps and whatnot. I know how to query the API (thanks for the code anyway); my problem was integrating this with the module. I think a simple rewrite is in order. {{BOTREQ|coding}} Hopefully I'll have this done tomorrow (I've been quite busy lately). The Earwig (Talk | Editor review) 01:03, 28 May 2009 (UTC)

{{Fixed|Fixed.}} How on earth did this task get so confusing? You see, I thought that this would be a really simple bot at first, so I tried to use the replace.py pywikipedia module verbatim to do it. I didn't realize that I would need to query the API, so when I attempted to add the API functionality to the module, it wouldn't work. The replace.py module contained too much garbage, extraneous commands, and unnecessary functions, making the change impossible to implement. When I wrote my own, extremely simple version of the code today (including part of Mr.Z-man's code above), the bot worked fine. Moral of the story: don't try to rewrite pywikipedia modules, write your own! The Earwig (Talk | Editor review) 23:48, 28 May 2009 (UTC)

:{{tld|BAGAssistanceNeeded}} The bot's code is all fixed up now, so I think that it's ready for another quick trial to test it out. The Earwig (Talk | Editor review) 23:48, 28 May 2009 (UTC)

{{BotTrial|edits=50}} Sure thing. – Quadell (talk) 17:48, 29 May 2009 (UTC)

:Thanks. Just give me a few moments to set everything up. The Earwig (Talk | Editor review) 20:07, 29 May 2009 (UTC)

{{BotTrialComplete}} All fifty edits made. I looked through them quickly and didn't notice any problems. All of the timestamps appear to be correct. The Earwig (Talk | Editor review) 20:55, 29 May 2009 (UTC)


{{BotApproved}} Marvelous. – Quadell (talk) 02:33, 30 May 2009 (UTC)

:The above discussion is preserved as an archive of the debate. Please do not modify it. Subsequent comments should be made in a new section.