Wikipedia:Bots/Requests for approval/Bender the Bot 2
{{Newbot|Bender the Bot|2}}
Operator: {{botop|Bender235}}
Time filed: 19:48, Saturday, August 20, 2016 (UTC)
Automatic, Supervised, or Manual: Automatic
Programming language(s): AutoWikiBrowser
Source code available:
Function overview: HTTP → HTTPS conversion for Google News and Google Books links
Links to relevant discussions (where appropriate): Wikipedia:Village pump (proposals)/Archive 127#RfC: Should we convert existing Google and Internet Archive links to HTTPS?
Edit period(s): one time run
Estimated number of pages affected: a conservative guess of 100k (but possibly 300k or more)
Exclusion compliant (Yes/No): Yes
Already has a bot flag (Yes/No): Yes
Function details: Since the transition of Internet Archive links to HTTPS is finished and WaybackMedic will take care of the Wayback Machine, I now want to fix links to Google services, starting with Google News and Google Books. The bot should find links beginning with
: http://news.google.com/
and
: http://books.google.com/
and replace them with
: https://news.google.com/
and
: https://books.google.com/
respectively.
The reasons for the change to HTTPS in general have already been elaborated in the RfC. In this particular case, note that http://books.google.com/ automatically redirects to HTTPS (ever since 2012 or so). That means links from Wikipedia (which is HTTPS by default) go HTTPS→HTTP→HTTPS, which is not only slower than HTTPS→HTTPS, but also breaks the HTTP Referer header (per [https://www.w3.org/Protocols/rfc2616/rfc2616-sec15.html#sec15.1.3 RFC 2616 §15.1.3]).
Furthermore, I want to combine the HTTPS move with a change of the TLD to .com, especially for those international TLDs considered "sensitive" in certain regions (like .co.il in Arab countries, or .com.tw in China).
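In regex terms, the intended conversion is roughly the following (an illustrative Python sketch; the pattern is my own approximation, not the exact AWB find-and-replace rule):
<syntaxhighlight lang="python">
import re

# Illustrative sketch only -- an approximation of the find-and-replace,
# not the exact AWB rule. Matches plain-HTTP links to Google News and
# Google Books on any country-code TLD and rewrites them to HTTPS on .com.
GOOGLE_LINK = re.compile(r"http://(news|books)\.google\.[a-z]{2,3}(?:\.[a-z]{2})?/")

def fix_google_link(wikitext):
    """Rewrite http:// Google News/Books links to https://...google.com/."""
    return GOOGLE_LINK.sub(r"https://\1.google.com/", wikitext)

# Example:
# fix_google_link("[http://books.google.co.il/books?id=XYZ Title]")
#   -> "[https://books.google.com/books?id=XYZ Title]"
</syntaxhighlight>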
=Discussion=
Isn't ([https://regex101.com/r/wD6cB4/1 editor]) the regex that should get replaced with ? --Joel Amos (talk) 18:34, 22 August 2016 (UTC)
:Yes it is. Sorry, I had that wrong. Fixed above. Thanks. --bender235 (talk) 19:01, 22 August 2016 (UTC)
::That's fine. Also, the brackets aren't needed around the "s", and a backslash should precede the first "." (my bad). Also, you'll want to remove the trailing slash from the replacement string so that it doesn't change news.google.com/hello to news.google.com//hello. edit: beat me to it :D --Joel Amos (talk) 19:39, 22 August 2016 (UTC)
:::Fixed the backslash (although it worked fine when I tested it). --bender235 (talk) 19:53, 22 August 2016 (UTC)
::::An un-escaped dot means "any character," so the old regex would've matched false positives (e.g. news@google.com).--Joel Amos (talk) 02:09, 23 August 2016 (UTC)
:::::Fair enough. --bender235 (talk) 14:35, 23 August 2016 (UTC)
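A quick illustration of the escaped vs. un-escaped dot discussed above (a Python sketch with example patterns, not the bot's actual AWB rule):
<syntaxhighlight lang="python">
import re

# Example patterns only, not the bot's actual rule.
unescaped = re.compile(r"http://news.google\.com/")   # "." matches any character
escaped   = re.compile(r"http://news\.google\.com/")  # "\." matches a literal dot only

text = "http://news@google.com/ and http://news.google.com/"

print(unescaped.findall(text))  # ['http://news@google.com/', 'http://news.google.com/']
print(escaped.findall(text))    # ['http://news.google.com/']
</syntaxhighlight>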
:What now? Should I have a trial run of 100 articles like with the previous Internet Archive conversion? --bender235 (talk) 23:39, 26 August 2016 (UTC)
::This may require multiple rounds of trials (hopefully increasing in size). Please run a short trial and post the initial results below. Please include in all edit summaries either a link to this BRFA trial or another way for concerned editors to easily see what is going on and reply. — xaosflux Talk 02:51, 27 August 2016 (UTC)
:{{BotTrial|edits=50}} — xaosflux Talk 02:51, 27 August 2016 (UTC)
::{{BotTrialComplete}} Results are in {{user|Bender the Bot}}'s edit history. Found one issue, on E. R. Cowell: the regex not only caught the URL, but also the pseudo-URL in the |publisher= parameter, and crippled the rest of the citation template (ran manually, didn't save). The best solution would be to have things like |publisher=Books.google.ca replaced with |via= (obviously Google Books is not the publisher of the books). Or, and that is the easier option for now, make the http:// prefix in the regex non-optional, so that it only replaces true URLs. Actually, I suggest the latter to keep this bot as simple as possible. --bender235 (talk) 22:53, 27 August 2016 (UTC)
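For illustration, the difference between an optional and a required scheme could look like this (a Python sketch with example patterns; the actual AWB rule may differ):
<syntaxhighlight lang="python">
import re

# Example patterns only, not the actual AWB rule.
# With an optional scheme, a bare "Books.google.ca" in |publisher= is caught too.
optional_scheme = re.compile(r"(?:http://)?books\.google\.[a-z.]+/?", re.IGNORECASE)
# Requiring the scheme limits matches to true URLs.
required_scheme = re.compile(r"http://books\.google\.[a-z.]+/", re.IGNORECASE)

citation = "{{cite book |url=http://books.google.ca/books?id=XYZ |publisher=Books.google.ca}}"

print(optional_scheme.findall(citation))  # ['http://books.google.ca/', 'Books.google.ca']
print(required_scheme.findall(citation))  # ['http://books.google.ca/']
</syntaxhighlight>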
:::{{t1|BAG assistance needed}} So, any further requests or can this bot go live? --bender235 (talk) 20:56, 6 September 2016 (UTC)
:{{u|Bender235}} Due to the huge size of your bot run, I'd like you to run a longer trial to give more opportunity for any odd issues to come up and get caught by other editors. — xaosflux Talk 04:43, 15 September 2016 (UTC)
:{{BotExtendedTrial|edits=600}} — xaosflux Talk 04:43, 15 September 2016 (UTC)
::Fair enough. --bender235 (talk) 14:16, 15 September 2016 (UTC)
::{{BotTrialComplete}}. Didn't spot any unusual behavior. --bender235 (talk) 15:49, 15 September 2016 (UTC)
:{{BotApproved}} Due to your large run size, please ramp up in stages as follows; this will allow brief periods for unknown issues to be brought to your attention.
:#3000 edits, 24 hour pause
:#4000 edits, 24 hour pause
:#5000 edits, 24 hour pause
:#10000 edits, 24 hour pause
:#50000 edits, 24 hour pause
:#Rest of run. — xaosflux Talk 01:19, 19 September 2016 (UTC)
:The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA.