Wikipedia:Bots/Requests for approval/Tigraan-testbot

[[User:Tigraan-testbot|Tigraan-testbot]]

{{Newbot|Tigraan-testbot|}}

Operator: {{botop|Tigraan}}

Time filed: 17:57, Sunday, June 25, 2017 (UTC)

Automatic, Supervised, or Manual: Supervised or Manual

Programming language(s): Python 3

Source code available: [https://github.com/Tigraan/Teahouse-bot Github]

Function overview: Notifies the original poster (OP) of a Wikipedia:Teahouse thread that was archived by lowercase sigmabot III.

Links to relevant discussions (where appropriate): Wikipedia_talk:Teahouse/Archive_14#Special_archival_bot_that_notifies_new_users (which itself followed a suggestion from earlier in Wikipedia_talk:Teahouse/Archive_14#Teahouse_archiving) establishes the basic consensus for the bot. In a follow-up we came to a consensus to notify every non-blocked user, even IP editors and editors with lots of edits / advanced userrights.

Edit period(s): Daily

Estimated number of pages affected: ~20/day (depends on the traffic at Wikipedia:Teahouse)

Namespace(s): User talk pages exclusively, except maybe a dedicated logging page in the bot's userspace.

Exclusion compliant (Yes/No): Yes (posting is done by Pywikibot, which [https://www.mediawiki.org/wiki/Special:Code/pywikipedia/r4096 obeys it]).

Function details: The bot parses the page history of Wikipedia:Teahouse to find the last archival edit (e.g. this) and the corresponding threads ("image upload wikimedia commons issue of copyright", "im about to loose my article, please help", etc.). It attempts to match each thread to its creation by an edit with edit summary "(Foo): new section", and identifies the thread's OP as the author of the creating edit. The edit summary of the archival edit is parsed to determine the relevant archive(s) (there are usually one or two archive links); each archive is then parsed to build a complete link (correct archive + anchor) for each archived thread. Once all this is gathered, the bot posts a notification through this template for each of the archived threads that could be matched, provided the user exists, is not blocked, and has not opted out (the latter is taken care of by Pywikibot).
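The two parsing steps described above can be sketched as follows. This is a minimal illustration, not the bot's actual code: the function names, the regular expressions, and the assumption that a raw new-section edit summary has the form {{code|/* Title */ new section}} are all hypothetical, and the archival edit summary is only assumed to contain ordinary wikilinks to the archive pages.

```python
import re

# Assumption: the raw edit summary of a section creation looks like
# "/* Title */ new section". This pattern is illustrative only.
NEW_SECTION_RE = re.compile(r"^/\*(?P<title>.*)\*/\s*new section\s*$")

# Assumption: archive pages appear as plain wikilinks in the archival summary.
WIKILINK_RE = re.compile(r"\[\[([^\]\|#]+)")


def find_thread_creator(thread_title, revisions):
    """Return the author of the edit that created the thread, or None.

    `revisions` is a list of dicts with 'user' and 'comment' keys,
    for example as decoded from an API revisions query.
    """
    wanted = thread_title.strip()
    for rev in revisions:
        match = NEW_SECTION_RE.match(rev.get("comment", ""))
        if match and match.group("title").strip() == wanted:
            return rev.get("user")
    return None  # unmatched threads get no notification


def archive_titles_from_summary(summary):
    """Return the page titles wikilinked in an archival edit summary."""
    return [title.strip() for title in WIKILINK_RE.findall(summary)]
```

A thread whose title was changed after creation simply yields {{code|None}} here, so no notification is attempted for it.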

You may take a look at User_talk:Tigraan-testbot/THA_log which is where the notifications went in my last testing phase before coming here (I used test.wikipedia before but I could not substitute the template there). add_text's {{code|always}} flag is currently off which makes it ask for confirmation before each edit, but I would prefer to run it supervised (flag on).

The endgame plan is to merge the functionality into {{U|HostBot}}'s codebase (maintainer {{U|Jtmorgan}}) and run it as a cron job. (EDIT 18:07, 25 June 2017 (UTC): just to be clear, the idea is to run the task supervised or manual under Tigraan-testbot for a trial period, and if all is well to convert it to a HostBot task.)

=Discussion=

*Regarding [https://github.com/Tigraan/Teahouse-bot/blob/master/scripts/find_and_notify.py#L37-L39 this config file-related concern], have you tried putting the configuration file in your home directory (specifically, $HOME/.pywikibot on Mac/Linux and %HOME%\.pywikibot on Windows)? Enterprisey (talk!) 03:36, 27 June 2017 (UTC)
*:I tried a lot of things, but not that particular one - I will give it a shot before running the trial. Thanks for the tip. (I am not a fantastic Python coder, and I did not find PWB's manual pages very clear, to be frank). TigraanClick here to contact me 11:34, 27 June 2017 (UTC)
*::It worked, yay! Thanks. TigraanClick here to contact me 19:58, 27 June 2017 (UTC)

:{{BotTrial|edits=50|days=5}} — xaosflux Talk 03:59, 27 June 2017 (UTC)

{{cot|Probably too much detail of how the test run went. Will post a summary once over. TigraanClick here to contact me 22:48, 30 June 2017 (UTC)}}

:*[https://en.wikipedia.org/w/index.php?title=User_talk:Kirschnik&diff=prev&oldid=787821705 First] [https://en.wikipedia.org/w/index.php?title=User_talk:Jd02022092&diff=prev&oldid=787821692 five] [https://en.wikipedia.org/w/index.php?title=User_talk:Naawada2016&diff=prev&oldid=787821667 test] [https://en.wikipedia.org/w/index.php?title=User_talk:NS4545678&diff=prev&oldid=787821646 edits] [https://en.wikipedia.org/w/index.php?title=User_talk:ShibaSan&diff=prev&oldid=787821624 done]. Log available on Github.

::So far, no screwups frontstage (no false positives, every notification contains the correct info). The only real problem is this false negative (no notif sent). It was logged as impossible to match the title to a new section creation in the page history; however, the original post is not older than 10 days nor outside the 500 revisions limit, so I am not sure what happened here.

::Also, the edit summary needs some easily-done tweaking for logging purposes, and maybe I should look carefully at PWB's login process because I clearly have no clue how it works at the moment. TigraanClick here to contact me 19:55, 27 June 2017 (UTC)

:::I wonder if the slightly unusual fact that the section heading had TWO spaces between the text and the closing "==" could explain the false negative? Could it be that in one case multiple adjacent spaces were combined, and in the other they were not, leading to a comparison failure? DES (talk)DESiegel Contribs 05:38, 28 June 2017 (UTC)

::::Nice catch, {{U|DESiegel}}. The logs say the match string (thread name) was {{code|"I am looking for some feedback on my first article written around the North Highland Way."}}, which does not include a trailing space, and looking back at the [https://en.wikipedia.org/w/api.php?action=query&prop=revisions&titles=Wikipedia:Teahouse&rvstartid=786783465&rvendid=786783465&rvprop=comment API call for edit summary] it indeed returns an additional space at the end (in addition to the leading space of the {{code|" */ new section"}} fragment). I guess I could fiddle with the regexp to catch leading or trailing spaces, but this looks like an edge case, and the simpler the regexps are, the better I sleep. TigraanClick here to contact me 19:21, 29 June 2017 (UTC)

:::::I am a software developer myself, {{U|Tigraan}}, and matching issues come up often. In my code I generally strip all leading/trailing spaces, and condense multiple adjacent spaces to a single space on both sides before doing a comparison. But I don't use regexes, and maybe that is a non-trivial issue for your code. But you do need to handle 0 spaces (==Help me please==) and exactly 1 space (== I need help ==) as both are common cases. If two spaces will cause a problem, we should probably alert hosts to normalize headings on new sections once the bot is running. DES (talk)DESiegel Contribs 19:53, 29 June 2017 (UTC)

::::::Apparently MediaWiki does some space-trimming: after all, the problem is that the section name just before archival and the edit summary at creation have inconsistent spaces, not that they have zero or one or two (unless I missed an edit, there was no human refactoring the thread title here). Maybe at some point we will run into trouble with esoteric UTF-8 characters or whatnot. Nonetheless, I probably overdramatized the risk in tweaking the regexp; I will give it a shot.

::::::Asking hosts to refactor thread titles is not really an option. By design, the bot compares a section name at archival and an edit summary at creation, and having hosts manually correct those will more likely than not break the matching, unless those hosts have a good understanding of the matching rules and of the string modifications caused by MediaWiki. Even then, it means we have humans doing more work to help a bot, which somehow doesn't sound right.

::::::Keep also in mind that a false negative is not a huge trouble for this bot. By design, any thread that gets its title changed will not be matched; and that will happen fairly often (e.g. hosts wikilinking the article name, or changing "help!!" to something descriptive). What matters in my view is to have those cases documented, and to keep a log of how many threads failed to match. TigraanClick here to contact me 15:36, 30 June 2017 (UTC)

:::::::Well, I didn't want to fiddle with the regexp, but it turns out that in Python one can just {{code|.strip()}} the leading/trailing spaces, so that was easy and safe enough to do. I also upgraded the page-history pulling routine so that it can overcome the 500-revision limit (I kept a cap of 10 API calls = 5000 revs so as to not overload the server if something goes wrong and a very large page history is requested). TigraanClick here to contact me 22:45, 30 June 2017 (UTC)
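For illustration, the two normalization strategies discussed in this sub-thread can be compared in a short sketch. The helper names are hypothetical and this is not the code in the bot's repository: {{code|strip_only}} mirrors the {{code|.strip()}} fix described above, while {{code|normalize_title}} additionally collapses runs of inner whitespace, as DESiegel suggested.

```python
import re


def strip_only(title):
    """Remove leading and trailing whitespace, as in the .strip() fix."""
    return title.strip()


def normalize_title(title):
    """Also collapse every run of whitespace into a single space.

    This is the stricter variant suggested in the discussion; it is an
    illustrative alternative, not what the bot is stated to do.
    """
    return re.sub(r"\s+", " ", title).strip()


def titles_match(section_title, summary_title, normalizer=strip_only):
    """Compare a section name and an edit-summary title after normalizing."""
    return normalizer(section_title) == normalizer(summary_title)
```

With the default {{code|strip_only}}, a trailing space in the edit summary no longer breaks the match, but a doubled space inside the title still would. Passing {{code|1=normalizer=normalize_title}} covers that case as well, at the cost of treating titles that differ only in inner spacing as identical.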

:*[https://en.wikipedia.org/w/index.php?limit=50&title=Special%3AContributions&contribs=user&target=Tigraan-testbot&namespace=&tagfilter=&start=2017-06-29&end=2017-06-29 Second run with 10 edits] including one talk page creation for an IP editor (well, not really an IP editor judging by the thread, but the script doesn't care). Still no screwups; the archival edit modified two archives, but the correct links were found.

::Yet, two more false negatives... I am going to investigate. TigraanClick here to contact me 18:42, 29 June 2017 (UTC)

:::Well, the last two negatives are explained: one thread title was changed, and the other is beyond the 500 revisions history limit. I should probably fix the 500rev limit via the continue parameter, but the other one is a case we decided to abandon during the design phase. TigraanClick here to contact me 19:21, 29 June 2017 (UTC)

:*[https://en.wikipedia.org/w/index.php?limit=50&title=Special%3AContributions&contribs=user&target=Tigraan-testbot&namespace=&tagfilter=&start=2017-06-30&end=2017-06-30 Third run], 9 edits. First encounter with a blocked user ({{noping|Badri K Vishal 2006}}) went smoothly (no notification sent). [https://en.wikipedia.org/w/index.php?title=User_talk%3ANazim_Hussain_Pak&type=revision&diff=788342122&oldid=788301762 This] could be seen as a bit spammy - it only mirrors how spammy the user was at the Teahouse, but I understand others might have a concern about it. A few false negatives lessened the spam, as the threads were refactored as Wikipedia:Teahouse/Questions/Archive_631#Inquiries_from_Nazim_Hussain_Pak. TigraanClick here to contact me 22:42, 30 June 2017 (UTC)

:*[https://en.wikipedia.org/w/index.php?limit=50&title=Special%3AContributions&contribs=user&target=Tigraan-testbot&namespace=&tagfilter=&start=2017-07-01&end=2017-07-01 Fourth run, 12 edits]. All edits are ok, but still some "false negatives".

:#This one and that one are due to thread name changes.

:#This section and that section failed to match during archive search because they could not be matched by strings containing a leading or trailing space ({{tq|"1st Wiki article- need some help! "}} and {{tq|" i lost my mobile phone and it said emergency EE. Any advances?"}}). This should be fixed by another {{code|.strip()}}.

:#That edit generated an interesting "failed to match section creation" warning. It is not exactly a false negative: it is a good thing that no notification was sent because the "link of that page" header was a subsection of the main. It could warrant checking the hierarchy level of the threads to match, but OTOH only new section creations with the appropriate automatic edit summary will be matched, so it is probably not worth it. TigraanClick here to contact me 17:56, 1 July 2017 (UTC)

:::"1st wiki article" and "mobile phone" things solved, see the test diffs Special:Diff/788486055 and Special:Diff/788486175. TigraanClick here to contact me 18:20, 1 July 2017 (UTC)
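The hierarchy check mentioned in the third item above was judged probably not worth implementing. Purely as an illustration, a check of that kind could look like the sketch below. The regular expression, the function names, and the assumption that Teahouse threads are level-2 headings are hypothetical and are not taken from the bot's code.

```python
import re

# Matches a wikitext heading line such as "== Title ==" or "===Sub===".
# The backreference requires the same number of "=" on both sides.
HEADING_RE = re.compile(r"^(?P<eq>={1,6})(?P<title>.+?)(?P=eq)\s*$")


def parse_heading(line):
    """Return (level, title) for a heading line, or None for other lines."""
    match = HEADING_RE.match(line)
    if match is None:
        return None
    return len(match.group("eq")), match.group("title").strip()


def is_top_level_thread(line, thread_level=2):
    """True if the line is a heading at the assumed thread level."""
    parsed = parse_heading(line)
    return parsed is not None and parsed[0] == thread_level
```

Under these assumptions a subsection such as {{code|1====link of that page===}} is parsed as level 3 and would be skipped rather than treated as a thread of its own.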

:*[https://en.wikipedia.org/w/index.php?limit=50&title=Special%3AContributions&contribs=user&target=Tigraan-testbot&namespace=&tagfilter=&start=2017-07-02&end=2017-07-02 Fifth run, 15 edits]. One false negative due to a thread name change but the editor was blocked anyways. TigraanClick here to contact me 14:07, 2 July 2017 (UTC)

{{cob}}

{{BotTrialComplete}} I failed at counting before running the last test and the bot made a total of 51 edits - I apologize for that.

The bot did not make any erroneous notifications or exaggerated resource requests during the trial that I am aware of. I changed the codebase slightly during testing, in order to:

#remove leading and trailing spaces when matching thread names
#query the full revisions of the last 10 days at Wikipedia:Teahouse via :mw:API:Revisions' {{code|rvcontinue}} parameter
#change the edit summary left by the bot
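The second change, following {{code|rvcontinue}} with a cap on the number of requests, can be sketched as below. This is an illustration under stated assumptions rather than the bot's actual routine: {{code|api_get}} is a hypothetical callable that takes a dict of query parameters and returns the decoded JSON response, and {{code|oldest_timestamp}} is a hypothetical parameter standing in for the 10-day cutoff.

```python
def fetch_revisions(api_get, title, oldest_timestamp=None, max_calls=10):
    """Collect page revisions over several API calls.

    `api_get` is a hypothetical callable (params dict -> decoded JSON dict).
    Continuation follows the `continue` block returned by the MediaWiki
    API, which carries the `rvcontinue` value for the next request.
    """
    params = {
        "action": "query",
        "format": "json",
        "prop": "revisions",
        "titles": title,
        "rvprop": "ids|timestamp|user|comment",
        "rvlimit": "max",
    }
    if oldest_timestamp is not None:
        params["rvend"] = oldest_timestamp  # stop enumerating at this point

    revisions = []
    for _ in range(max_calls):  # hard cap so a huge history cannot run away
        data = api_get(params)
        for page in data.get("query", {}).get("pages", {}).values():
            revisions.extend(page.get("revisions", []))
        if "continue" not in data:
            break  # nothing left to fetch
        params = {**params, **data["continue"]}
    return revisions
```

Injecting {{code|api_get}} keeps the sketch independent of any particular HTTP or Pywikibot call, and lets the loop be exercised with canned responses before it is pointed at the live API.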

I believe the bot is good to go technically. TigraanClick here to contact me 14:23, 2 July 2017 (UTC)

:{{BotApproved}} — xaosflux Talk 15:41, 6 July 2017 (UTC)

:The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA.