Wikipedia:Bots/Requests for approval/PrimeBOT 17
[[User:PrimeBOT|PrimeBOT 17]]
{{Newbot|PrimeBOT|17}}
Operator: {{botop|Primefac}}
Time filed: 14:24, Saturday, May 27, 2017 (UTC)
Automatic, Supervised, or Manual: automatic
Programming language(s): AWB
Source code available: AWB
Function overview: Remove UTM parameters (Google Analytics) from external links and references (i.e. resurrect Theo's Little Bot task #23)
Links to relevant discussions (where appropriate): Wikipedia:Bot requests/Archive 55#Remove Google Analytics tracking from external links
Edit period(s): Once a month
Estimated number of pages affected: [https://en.wikipedia.org/w/index.php?search=+insource%3A%2Futm_%28content%7Cterm%7Ccampaign%7Cmedium%7Csource%29%2F&title=Special:Search&profile=advanced&fulltext=1&ns0=1&ns11=1&searchToken=8k7v0gsc334fli2bn1ycw6hy8 16000] in the initial run, and maybe 200 a month after that? Theo's task ran in batches of 500, which also works, but I couldn't then give a timeframe.
Exclusion compliant (Yes/No): Yes
Already has a bot flag (Yes/No): Yes
Function details: Straightforward find-and-remove. Regex:
{{red|
([http://rubular.com/r/GGVIfdcBx8 test cases])\??(?:&?utm_[^=]*?=[^&\s\]\|]*)+(?=]|\s|\|) }}|{{blue|(?<=\?)(?:&?utm_[^=]*?=[^&\s\]\|]*)+& }}{{red|
([http://rubular.com/r/eoCp4RIDbQ tests])\??(?:&?utm_[^=\s]*?=[^&\s\]\|]*?)+(?=<|}|]|\s|\|) }}|{{blue|(?<=\?)(?:&?utm_[^=\s]*?=[^&\s\]\|]*)+& }}|{{green|(?<=&)(?:&?utm_[^=\s]*?=[^&\s\]\|]*)+& }}
As near as I can tell, I've managed to cover all of the edge cases which were of concern in the original BRFA. The blue section covers the case where ?utm_ is followed by an & not followed by another utm_ (e.g. ?utm_example=1234&para=value). The red hits everything else (i.e. where the utm_ term(s) are only at the end of the URL). Green is when utm_ falls in between two other parameters.
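As a rough illustration of how the three colour-coded alternatives behave, the sketch below ports the amended regex to Python's <code>re</code> module. AWB uses .NET regular expressions, so this is an approximation and not the bot's actual configuration. The names <code>RED</code>, <code>BLUE</code>, <code>GREEN</code>, <code>UTM_PATTERN</code> and <code>strip_utm</code>, the example.com URLs, and the explicitly escaped <code>\}</code> and <code>\]</code> in the lookahead are assumptions made for the example.

```python
import re

# Assumed Python port of the amended regex; AWB itself runs .NET regex.
# red: utm_ parameters that run to the end of the URL
RED = r"\??(?:&?utm_[^=\s]*?=[^&\s\]\|]*?)+(?=<|\}|\]|\s|\|)"
# blue: utm_ parameters right after "?" and followed by a non-utm_ parameter
BLUE = r"(?<=\?)(?:&?utm_[^=\s]*?=[^&\s\]\|]*)+&"
# green: utm_ parameters sitting between two other parameters
GREEN = r"(?<=&)(?:&?utm_[^=\s]*?=[^&\s\]\|]*)+&"

UTM_PATTERN = re.compile("|".join([RED, BLUE, GREEN]))


def strip_utm(wikitext: str) -> str:
    """Remove utm_ tracking parameters by replacing each match with nothing."""
    return UTM_PATTERN.sub("", wikitext)


if __name__ == "__main__":
    samples = [
        "[http://example.com/page?utm_source=x&utm_medium=y Title]",  # red
        "[http://example.com/page?utm_source=x&id=5 Title]",          # blue
        "[http://example.com/page?id=5&utm_source=x&ref=abc Title]",  # green
        "{{cite web|url=http://example.com/a?utm_source=x|title=T}}", # red, before a pipe
    ]
    for sample in samples:
        print(strip_utm(sample))
```

In this sketch the red branch takes the leading ? or & along with the parameter, while the blue and green branches take the trailing &, so the remaining query string stays well-formed in each of the sample cases.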
=Discussion=
- As a note, unlike the original bot run this will not be checking to see if the URLs are still valid. AWB doesn't do that. Primefac (talk) 14:24, 27 May 2017 (UTC)
:{{BotTrial|edits=50}} Please post results here when done. — xaosflux Talk 14:27, 27 May 2017 (UTC)
::{{BotTrialComplete}}. [https://en.wikipedia.org/w/index.php?title=Special:Contributions/PrimeBOT&offset=20170527150500&limit=51&target=PrimeBOT Edits]. Note that there were three errors (1, 2, and 3), which I undid and corrected (1, 2, and 3) with new regex, which I've amended above to reflect the changes. Primefac (talk) 15:30, 27 May 2017 (UTC)
In addition to the UTM parameters, there's also "?cmpid", and probably others. DS (talk) 16:14, 1 June 2017 (UTC)
:An easy addition, just replace utm_ with cmpid in the regex. Primefac (talk) 18:37, 1 June 2017 (UTC)
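:A minimal sketch of that substitution, assuming a Python port of the amended regex (AWB itself uses .NET regex, so this is only an approximation). The names <code>UTM_REGEX</code> and <code>pattern_for</code> and the example.com URLs are hypothetical. Because the substitution is purely textual, any parameter whose name merely starts with the given prefix would also match.

```python
import re

# Assumed Python port of the amended three-branch regex (red|blue|green).
UTM_REGEX = (
    r"\??(?:&?utm_[^=\s]*?=[^&\s\]\|]*?)+(?=<|\}|\]|\s|\|)"
    r"|(?<=\?)(?:&?utm_[^=\s]*?=[^&\s\]\|]*)+&"
    r"|(?<=&)(?:&?utm_[^=\s]*?=[^&\s\]\|]*)+&"
)


def pattern_for(prefix: str) -> re.Pattern:
    """Swap the literal 'utm_' for another parameter prefix, e.g. 'cmpid'."""
    return re.compile(UTM_REGEX.replace("utm_", re.escape(prefix)))


CMPID_PATTERN = pattern_for("cmpid")

if __name__ == "__main__":
    print(CMPID_PATTERN.sub("", "[http://example.com/story?cmpid=rss Title]"))
    print(CMPID_PATTERN.sub("", "[http://example.com/story?cmpid=rss&id=7 Title]"))
```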
:{{BotApproved}} Task approved. — xaosflux Talk 03:44, 6 June 2017 (UTC)
- Amended (00:29, 7 August 2017 (UTC)) to include ?mbid parameter cleanup as well; "speedily approved" in lieu of another task as this is low volume. — xaosflux Talk 00:29, 7 August 2017 (UTC)
:The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA.
- Amended to include removing tracking from New York Times URLs; see talk. — The Earwig (talk) 15:36, 25 March 2024 (UTC)