Wikipedia:Bots/Requests for approval/WikiCleanerBot 5


[[User:WikiCleanerBot|WikiCleanerBot 5]]

{{Newbot|WikiCleanerBot|5}}

Operator: {{botop|NicoV}}

Time filed: 08:49, Saturday, June 15, 2019 (UTC)

Function overview: Fix some WP:WCW errors using WPCleaner

Automatic, Supervised, or Manual: Automatic

Programming language(s): Java (WPCleaner)

Source code available: [https://github.com/WPCleaner/wpcleaner On GitHub]

Links to relevant discussions (where appropriate): Wikipedia:Bots/Requests for approval/PkbwcgsBot

Edit period(s): Twice a month, with the dump analysis that I already perform, see Wikipedia:Bots/Requests for approval/WikiCleanerBot.

Estimated number of pages affected: A few thousand articles for the initial runs spread over a few sessions, then normally only a few dozen or hundreds each time.

Namespace(s): Main

Exclusion compliant (Yes/No): Yes

Function details: As PkbwcgsBot hasn't been run for several months, I'd like to take over some of the tasks that Pkbwcgs was performing with WPCleaner. This request covers part of Wikipedia:Bots/Requests for approval/PkbwcgsBot. It consists of automatically fixing the simple cases of the following WP:WCW errors:

  • {{CW|2}}: tags with incorrect syntax. The list of articles that the bot will check comes from [https://tools.wmflabs.org/checkwiki/cgi-bin/checkwiki.cgi?project=enwiki&view=only&id=2 CheckWiki list #2] (currently 617 articles) and from Wikipedia:CHECKWIKI/WPC 002 dump (currently 725 articles): only some articles will be fixed, only the simple ones (like false </br> tags).
  • {{CW|16}}: Unicode control characters. The list of articles that the bot will check comes from [https://tools.wmflabs.org/checkwiki/cgi-bin/checkwiki.cgi?project=enwiki&view=only&id=16 CheckWiki list #16] (currently 2508 articles): only some articles will be fixed, only the simple ones.
  • {{CW|17}}: category duplication. The list of articles that the bot will check comes from [https://tools.wmflabs.org/checkwiki/cgi-bin/checkwiki.cgi?project=enwiki&view=only&id=17 CheckWiki list #17] (currently 6328 articles) and from Wikipedia:CHECKWIKI/WPC 017 dump (currently 8449 articles): only some articles will be fixed, only the simple ones (like exact category duplication with same sort key). For example, on the first 100 articles in Wikipedia:CHECKWIKI/WPC 017 dump, 70 are modified.
  • {{CW|85}}: tags without content. The list of articles that the bot will check comes from [https://tools.wmflabs.org/checkwiki/cgi-bin/checkwiki.cgi?project=enwiki&view=only&id=85 CheckWiki list #85] (currently 831 articles): only some articles will be fixed, only the simple ones.
  • {{CW|88}}: DEFAULTSORT with a blank at first position. The list of articles that the bot will check comes from [https://tools.wmflabs.org/checkwiki/cgi-bin/checkwiki.cgi?project=enwiki&view=only&id=88 CheckWiki list #88] (currently 349 articles): only some articles will be fixed, only the simple ones.
  • {{CW|90}}: internal link written as an external link. The list of articles that the bot will check comes from [https://tools.wmflabs.org/checkwiki/cgi-bin/checkwiki.cgi?project=enwiki&view=only&id=90 CheckWiki list #90] (currently 5715 articles): only some articles will be fixed, only the simple ones.
  • {{CW|91}}: interwiki link written as an external link. The list of articles that the bot will check comes from [https://tools.wmflabs.org/checkwiki/cgi-bin/checkwiki.cgi?project=enwiki&view=only&id=91 CheckWiki list #91] (currently 2100 articles): only some articles will be fixed, only the simple ones.
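As an illustration of what a "simple" case means for {{CW|17}}, here is a minimal Java sketch that drops a category link only when an identical one (same name, same sort key) already appeared earlier on the page. It is not taken from the WPCleaner source: the class name, method name and regular expression are hypothetical, and it assumes one category link per line with the exact prefix "Category:".

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not WPCleaner code.
final class CategoryDedupSketch {
    // A whole line holding one category link, e.g. [[Category:Foo|Sort key]].
    // Assumption: the prefix is written exactly as "Category:".
    private static final Pattern CATEGORY_LINE =
        Pattern.compile("\\[\\[Category:([^\\]|]+)(?:\\|([^\\]]*))?\\]\\]\\s*");

    static String removeExactDuplicates(String wikitext) {
        Set<String> seen = new HashSet<>();
        List<String> kept = new ArrayList<>();
        for (String line : wikitext.split("\n", -1)) {
            Matcher m = CATEGORY_LINE.matcher(line);
            if (m.matches()) {
                String sortKey = m.group(2);
                // "Foo" (no sort key) and "Foo|" (empty sort key) stay distinct.
                String key = m.group(1).trim() + (sortKey == null ? "" : "|" + sortKey);
                if (!seen.add(key)) {
                    continue; // exact repeat: drop the line
                }
            }
            kept.add(line);
        }
        return String.join("\n", kept);
    }

    public static void main(String[] args) {
        String page = "Text\n[[Category:A]]\n[[Category:B|x]]\n[[Category:A]]\n[[Category:B|y]]\n";
        System.out.print(removeExactDuplicates(page));
    }
}
```

In this sketch the second [[Category:A]] is removed, while the two [[Category:B]] links are both kept because their sort keys differ, which is the kind of case left for a human editor.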

=Discussion=

{{BotTrial|edits=140}} Please run 20 edits for each proposed task. Primefac (talk) 12:31, 15 June 2019 (UTC)

:Thanks! Here are the results:

:* {{CW|2}} (tags with incorrect syntax): [https://en.wikipedia.org/w/index.php?title=Special:Contributions/WikiCleanerBot&offset=20190615141300&target=WikiCleanerBot&limit=20 20 edits], no problems detected.

:* {{CW|16}} (unicode control characters): [https://en.wikipedia.org/w/index.php?title=Special:Contributions/WikiCleanerBot&offset=20190616112600&target=WikiCleanerBot&limit=20 20 edits], no problems detected.

:* {{CW|17}} (category duplication): [https://en.wikipedia.org/w/index.php?title=Special:Contributions/WikiCleanerBot&offset=20190615143500&target=WikiCleanerBot&limit=20 20 edits], no problems detected.

:* {{CW|85}} (tags without content): [https://en.wikipedia.org/w/index.php?title=Special:Contributions/WikiCleanerBot&offset=20190616113900&target=WikiCleanerBot&limit=20 20 edits]. I am wondering what to do when the otherwise empty tag contains comments (gallery tags: [https://en.wikipedia.org/w/index.php?title=Ana_Vidjen&diff=prev&oldid=902077216 Ana Vidjen], [https://en.wikipedia.org/w/index.php?title=Andrews_County_Veterans_Memorial&diff=prev&oldid=902077242 Andrews County Veterans Memorial], [https://en.wikipedia.org/w/index.php?title=Battle_of_Naseby&diff=prev&oldid=902077331 Battle of Naseby], [https://en.wikipedia.org/w/index.php?title=Catherine_Marks&diff=prev&oldid=902077486 Catherine Marks]; noinclude tags: [https://en.wikipedia.org/w/index.php?title=Barnet_Copthall&diff=prev&oldid=902077302 Barnet Copthall]): keep the automatic fix as it is now, comment out the tag itself, or do nothing. The answer may differ depending on the tag.

:*: With respect to commented-out markup I'd leave them alone in case the comment markup is ever removed (e.g. if the file(s) is/are restored). Jo-Jo Eumerus (talk, contributions) 14:22, 16 June 2019 (UTC)

:*:: Jo-Jo Eumerus. To be on the safe side, I've modified WPC not to automatically remove tags without content when they contain comments. --NicoV (Talk on frwiki) 17:22, 17 June 2019 (UTC)

:* {{CW|88}} (DEFAULTSORT with a blank at first position): [https://en.wikipedia.org/w/index.php?title=Special:Contributions/WikiCleanerBot&offset=20190616115000&target=WikiCleanerBot&limit=20 20 edits], no problems detected.

:* {{CW|90}} (internal link written as an external link): [https://en.wikipedia.org/w/index.php?title=Special:Contributions/WikiCleanerBot&offset=20190616115900&target=WikiCleanerBot&limit=20 20 edits], no problems detected.

:* {{CW|91}} (interwiki link written as an external link):

:** [https://en.wikipedia.org/w/index.php?title=Special:Contributions/WikiCleanerBot&offset=20190616122800&target=WikiCleanerBot&limit=4 4 edits], a problem detected on the 4th edit on [https://en.wikipedia.org/w/index.php?title=Azerbaijan_State_Philharmonic_Hall&diff=prev&oldid=902082340 Azerbaijan State Philharmonic Hall]. I've modified WPC not to automatically replace the external link when it's not surrounded by square brackets.

:** [https://en.wikipedia.org/w/index.php?title=Special:Contributions/WikiCleanerBot&offset=20190616145100&target=WikiCleanerBot&limit=6 6 edits], a problem detected on the 6th edit on [https://en.wikipedia.org/w/index.php?title=Counties_of_Norway&diff=prev&oldid=902096730 Counties of Norway]. I've modified WPC not to automatically replace the external link when there's no text provided.

:** [https://en.wikipedia.org/w/index.php?title=Special:Contributions/WikiCleanerBot&offset=20190616151500&target=WikiCleanerBot&limit=10 10 edits], no problems detected.
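The two restrictions added for {{CW|91}} above (replace only when the external link is surrounded by square brackets, and only when it has link text) can be sketched in Java as follows. This is an illustration under assumptions, not the WPCleaner implementation: the class name, method name and regular expression are hypothetical, only language subdomains of wikipedia.org are handled, and percent-encoded titles are skipped rather than decoded.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not WPCleaner code.
final class InterwikiLinkSketch {
    // [https://xx.wikipedia.org/wiki/Title some text]
    // - must start with "[" and end with "]" (bare URLs never match);
    // - must have non-blank text after the URL (text-less links never match);
    // - "www" and "en" are excluded: "en" would be an internal link on enwiki (CW 90).
    private static final Pattern BRACKETED_WITH_TEXT = Pattern.compile(
        "\\[https?://(?!www\\.|en\\.)([a-z]{2,3})\\.wikipedia\\.org/wiki/"
        + "([^\\s\\]\\[?#%]+) +([^\\]\\[\\n]*[^\\]\\[\\s])\\]");

    static String fixInterwikiLinks(String wikitext) {
        Matcher matcher = BRACKETED_WITH_TEXT.matcher(wikitext);
        return matcher.replaceAll(m -> Matcher.quoteReplacement(
            "[[:" + m.group(1) + ":" + m.group(2).replace('_', ' ')
            + "|" + m.group(3) + "]]"));
    }

    public static void main(String[] args) {
        System.out.println(fixInterwikiLinks(
            "See [https://fr.wikipedia.org/wiki/Tour_Eiffel the tower]."));
        // Left alone: no square brackets.
        System.out.println(fixInterwikiLinks("https://fr.wikipedia.org/wiki/Paris"));
        // Left alone: no link text.
        System.out.println(fixInterwikiLinks("[https://fr.wikipedia.org/wiki/Paris]"));
    }
}
```

Building both guards into the pattern itself means that anything outside the simple case is never touched, rather than being matched first and filtered afterwards.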

:{{BotTrialComplete}}. --NicoV (Talk on frwiki) 14:17, 15 June 2019 (UTC)

::{{tl|BAG assistance needed}} --NicoV (Talk on frwiki) 13:57, 25 July 2019 (UTC)

:::{{ping|NicoV}} {{BotExtendedTrial}} 20 edits each for {{CW|85}} and {{CW|91}}. Headbomb {t · c · p · b} 04:07, 6 August 2019 (UTC)

::::Thanks Headbomb. Here are the results:

::::* {{CW|85}} (tag without content): [https://en.wikipedia.org/w/index.php?title=Special:Contributions/WikiCleanerBot&offset=20190806202900&target=WikiCleanerBot&limit=20 20 more edits], no problems detected.

::::* {{CW|91}} (interwiki link written as an external link): [https://en.wikipedia.org/w/index.php?title=Special:Contributions/WikiCleanerBot&offset=20190806204700&target=WikiCleanerBot&limit=20 20 more edits], no problems detected.

::::{{BotTrialComplete}} --NicoV (Talk on frwiki) 20:48, 6 August 2019 (UTC)

:::::{{Re|NicoV}} [https://en.wikipedia.org/w/index.php?title=Die_Achse_des_Guten&diff=909668612&oldid=908824248 This] would be much better than [https://en.wikipedia.org/w/index.php?title=Die_Achse_des_Guten&diff=909663839&oldid=908824248 this]. Headbomb {t · c · p · b} 21:01, 6 August 2019 (UTC)

::::::{{Re|Headbomb}} I can also remove the carriage return if the empty tag is on the first line and alone on that line, if you want. For other cases (not on the first line), removing the carriage return may have side effects. What do you say? --NicoV (Talk on frwiki) 21:22, 6 August 2019 (UTC)

:::::::Should be for otherwise empty lines only. Headbomb {t · c · p · b} 21:34, 6 August 2019 (UTC)

::::::::{{Re|Headbomb}} The problem is that it will change the display in some situations, see below. --NicoV (Talk on frwiki) 22:53, 6 August 2019 (UTC)

:::::::::{{Re|Headbomb}} I've modified WPC to remove extra blank lines (if there are 2 or more, or if they are at the beginning or the end of the article). [https://en.wikipedia.org/w/index.php?title=User:NicoV/Test&diff=next&oldid=909814471 Result on the same article that you reported]. --NicoV (Talk on frwiki) 19:32, 7 August 2019 (UTC)

Example:

Line 1 before noinclude tag

<noinclude></noinclude>

Line 2 after noinclude tag

Before removal of the empty tag:

Line 1 before noinclude tag

Line 2 after noinclude tag

After removal of the empty tag (keeping the empty line): same display

Line 1 before noinclude tag

Line 2 after noinclude tag

After removal of the empty tag (removing the empty line): modified display

Line 1 before noinclude tag Line 2 after noinclude tag
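The behaviour discussed for {{CW|85}} (skip tags that contain comments, then tidy up the blank lines left behind) can be sketched in Java like this. It is a rough illustration, not the WPCleaner code: the class and method names are hypothetical, the tag list is only the set of tags mentioned in this request, and collapsing runs of blank lines down to a single blank line is an assumption about what "remove extra blank lines" means.

```java
import java.util.regex.Pattern;

// Hypothetical sketch, not WPCleaner code.
final class EmptyTagSketch {
    // Opening tag, then only whitespace, then the matching closing tag.
    // A comment between the tags starts with "<", so such tags never match
    // and are left for a human editor.
    private static final Pattern EMPTY_TAG = Pattern.compile(
        "<(gallery|noinclude|includeonly|span|div|center)\\b[^<>]*>\\s*</\\1\\s*>",
        Pattern.CASE_INSENSITIVE);

    static String removeEmptyTags(String wikitext) {
        String text = EMPTY_TAG.matcher(wikitext).replaceAll("");
        if (text.equals(wikitext)) {
            return wikitext; // nothing removed: do not touch the page
        }
        // Assumption: 2 or more consecutive blank lines collapse to one.
        text = text.replaceAll("\n{3,}", "\n\n");
        // Blank lines at the very beginning or end of the article are dropped.
        text = text.replaceAll("\\A\n+", "");
        text = text.replaceAll("\n+\\z", "");
        return text;
    }

    public static void main(String[] args) {
        String page = "Line 1 before noinclude tag\n\n<noinclude></noinclude>\n\n"
            + "Line 2 after noinclude tag";
        System.out.println(removeEmptyTags(page));
    }
}
```

On the example above, the sketch keeps exactly one blank line between the two text lines, so they remain separate paragraphs and the display is unchanged.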

:{{BotExtendedTrial}} 20 edits to see whether that case is handled correctly and whether it results in oddities otherwise. Have a mix of that case and others in the trial if possible. Headbomb {t · c · p · b} 20:10, 7 August 2019 (UTC)

::{{Re|Headbomb}} Here are the new edits:

::# [https://en.wikipedia.org/w/index.php?title=Feminism_in_Sweden&diff=prev&oldid=909824463 Feminism in Sweden]: span tags in the middle of a sentence

::# [https://en.wikipedia.org/w/index.php?title=FK_Dubnica&diff=prev&oldid=909826071 FK Dubnica]: gallery tags in their own lines

::# [https://en.wikipedia.org/w/index.php?title=FC_Epfendorf_1929&diff=prev&oldid=909826164 FC Epfendorf 1929]: center tags in table cells

::# [https://en.wikipedia.org/w/index.php?title=Eretz_Yisrael_Shelanu&diff=prev&oldid=909826196 Eretz Yisrael Shelanu]: div tags at the end of table

::# [https://en.wikipedia.org/w/index.php?title=Elsa_Cladera_de_Bravo&diff=prev&oldid=909826256 Elsa Cladera de Bravo]: includeonly tags in the middle of a sentence

::# [https://en.wikipedia.org/w/index.php?title=Dominic_Fotia&diff=prev&oldid=909826361 Dominic Fotia]: gallery tags at the beginning of the article

::# [https://en.wikipedia.org/w/index.php?title=Domadugu&diff=prev&oldid=909826407 Domadugu]: div tags on their own lines

::# [https://en.wikipedia.org/w/index.php?title=District_of_Columbia_and_United_States_Territories_Quarters&diff=prev&oldid=909826440 District of Columbia and United States Territories Quarters]: noinclude tags at the beginning of the article

::# [https://en.wikipedia.org/w/index.php?title=History_of_agriculture&diff=prev&oldid=909827287 History of agriculture]: div tags spanning on 2 lines

::# [https://en.wikipedia.org/w/index.php?title=Hidden_message&diff=prev&oldid=909827322 Hidden message]: includeonly tags in the middle of a sentence

::# [https://en.wikipedia.org/w/index.php?title=Heritage_Day_(South_Africa)&diff=prev&oldid=909827358 Heritage Day (South Africa)]: includeonly tags in the middle of a sentence

::# [https://en.wikipedia.org/w/index.php?title=Heidi_Quante&diff=prev&oldid=909827375 Heidi Quante]: gallery tags spanning on 2 lines

::# [https://en.wikipedia.org/w/index.php?title=H._M._Khoja&diff=prev&oldid=909827392 H. M. Khoja]: gallery tags spanning on 2 lines

::# [https://en.wikipedia.org/w/index.php?title=God%27s_Favorite_Customer&diff=prev&oldid=909827414 God's Favorite Customer]: includeonly tags at the beginning of the article

::# [https://en.wikipedia.org/w/index.php?title=Geneva_fusillade_of_9_November_1932&diff=prev&oldid=909827462 Geneva fusillade of 9 November 1932]: span tags in the middle of a sentence

::# [https://en.wikipedia.org/w/index.php?title=Fredericton_shooting&diff=prev&oldid=909827484 Fredericton shooting]: includeonly tags at the beginning of a sentence

::# [https://en.wikipedia.org/w/index.php?title=Frank_Dorsa&diff=prev&oldid=909827507 Frank Dorsa]: gallery tags at the beginning of the article

::# [https://en.wikipedia.org/w/index.php?title=Jacques_Delors&diff=prev&oldid=909828453 Jacques Delors]: includeonly tags at the beginning of the article

::# [https://en.wikipedia.org/w/index.php?title=Jakobstad_Museum&diff=prev&oldid=909829908 Jakobstad Museum]: gallery tags at the end of the article

::# [https://en.wikipedia.org/w/index.php?title=Isleworth_Mona_Lisa&diff=prev&oldid=909829938 Isleworth Mona Lisa]: includeonly tags at the beginning of the article

::{{BotTrialComplete}} --NicoV (Talk on frwiki) 21:29, 7 August 2019 (UTC)

{{BotApproved}} {{CW|85}} is technically cosmetic in many cases, but I feel it's editor-hostile enough to deal with it through a bot. Ping me if there's pushback on that task. Headbomb {t · c · p · b} 22:12, 7 August 2019 (UTC)

:The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA.