Wikipedia:Bots/Requests for approval/PrimeBOT 17

[[User:PrimeBOT|PrimeBOT 17]]

{{Newbot|PrimeBOT|17}}

Operator: {{botop|Primefac}}

Time filed: 14:24, Saturday, May 27, 2017 (UTC)

Automatic, Supervised, or Manual: automatic

Programming language(s): AWB

Source code available: AWB

Function overview: Remove UTM parameters (Google Analytics tracking) from external links and references (i.e. resurrect Theo's Little Bot task #23)

Links to relevant discussions (where appropriate): Wikipedia:Bot requests/Archive 55#Remove Google Analytics tracking from external links

Edit period(s): Once a month

Estimated number of pages affected: [https://en.wikipedia.org/w/index.php?search=+insource%3A%2Futm_%28content%7Cterm%7Ccampaign%7Cmedium%7Csource%29%2F&title=Special:Search&profile=advanced&fulltext=1&ns0=1&ns11=1&searchToken=8k7v0gsc334fli2bn1ycw6hy8 16000] in the initial run, and maybe 200 a month after that? Theo's task ran in batches of 500, which also works, but I couldn't then give a timeframe.

Exclusion compliant (Yes/No): Yes

Already has a bot flag (Yes/No): Yes

Function details: Straightforward find-and-remove. Regex:

  • {{red|\??(?:&?utm_[^=]*?=[^&\s\]\|]*)+(?=]|\s|\|)}}|{{blue|(?<=\?)(?:&?utm_[^=]*?=[^&\s\]\|]*)+&}} ([http://rubular.com/r/GGVIfdcBx8 test cases])
  • {{red|\??(?:&?utm_[^=\s]*?=[^&\s\]\|]*?)+(?=<|}|]|\s|\|)}}|{{blue|(?<=\?)(?:&?utm_[^=\s]*?=[^&\s\]\|]*)+&}}|{{green|(?<=&)(?:&?utm_[^=\s]*?=[^&\s\]\|]*)+&}} ([http://rubular.com/r/eoCp4RIDbQ tests])

As near as I can tell, I've managed to cover all of the edge cases which were of concern in the original BRFA. The blue section covers the case where ?utm_ is followed by an & that is not followed by another utm_ (e.g. ?utm_example=1234&para=value). The red hits everything else (i.e. where the utm_ term(s) are only at the end of the URL). Green covers the case where a utm_ term falls between two other parameters.
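The behaviour of the three coloured alternatives can be sketched outside AWB. The following is a minimal Python illustration, not the bot's actual configuration: it assumes the {{red|…}}, {{blue|…}} and {{green|…}} wrappers are only colouring and not part of the pattern, it transcribes the second regex above with the } and ] in the lookahead escaped for readability, and the names RED, BLUE, GREEN, TRACKING and strip_utm are hypothetical. AWB uses the .NET regex engine, so behaviour there may differ in corner cases.

```python
import re

# Alternatives transcribed from the second regex above (colour wrappers dropped).
# RED: utm_ term(s) at the end of the URL, including a leading "?" or "&".
RED = r"\??(?:&?utm_[^=\s]*?=[^&\s\]\|]*?)+(?=<|\}|\]|\s|\|)"
# BLUE: utm_ term(s) directly after "?", followed by some other parameter.
BLUE = r"(?<=\?)(?:&?utm_[^=\s]*?=[^&\s\]\|]*)+&"
# GREEN: utm_ term(s) sitting between two other parameters.
GREEN = r"(?<=&)(?:&?utm_[^=\s]*?=[^&\s\]\|]*)+&"

TRACKING = re.compile("|".join((RED, BLUE, GREEN)))


def strip_utm(wikitext: str) -> str:
    """Delete every match of the combined pattern (find-and-remove)."""
    return TRACKING.sub("", wikitext)


if __name__ == "__main__":
    samples = [
        "[http://example.com/a?utm_source=x&utm_medium=y Title]",  # red
        "[http://example.com/a?utm_source=x&id=5 Title]",          # blue
        "[http://example.com/a?id=5&utm_source=x&page=2 Title]",   # green
        "[http://example.com/a?id=5&utm_source=x Title]",          # red
    ]
    for sample in samples:
        print(sample, "->", strip_utm(sample))
```

Note that the red lookahead lists only <, }, ], whitespace and | as terminators, so in this sketch a URL that runs to the very end of the input, with nothing after it, is left unchanged by the red branch.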

=Discussion=

  • As a note, unlike the original bot run, this will not be checking to see if the URLs are still valid. AWB doesn't do that. Primefac (talk) 14:24, 27 May 2017 (UTC)

:{{BotTrial|edits=50}} Please post results here when done. — xaosflux Talk 14:27, 27 May 2017 (UTC)

::{{BotTrialComplete}}. [https://en.wikipedia.org/w/index.php?title=Special:Contributions/PrimeBOT&offset=20170527150500&limit=51&target=PrimeBOT Edits]. Note that there were three errors (1, 2, and 3), which I undid and corrected (1, 2, and 3) with new regex, which I've amended above to reflect the changes. Primefac (talk) 15:30, 27 May 2017 (UTC)

In addition to the UTM parameters, there's also "?cmpid", and probably others. DS (talk) 16:14, 1 June 2017 (UTC)

:An easy addition, just replace utm_ with cmpid in the regex. Primefac (talk) 18:37, 1 June 2017 (UTC)

:{{BotApproved}} Task approved. — xaosflux Talk 03:44, 6 June 2017 (UTC)

  • Amended (00:29, 7 August 2017 (UTC)) to include ?mbid parameter cleanup as well; "speedily approved" in lieu of another task, as this is low volume. — xaosflux Talk 00:29, 7 August 2017 (UTC)

:The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA.

  • Amended to include removing tracking from New York Times URLs; see talk. — The Earwig (talk) 15:36, 25 March 2024 (UTC)