Wikipedia:Bots/Requests for approval/WikiCleanerBot 18
[[User:WikiCleanerBot|WikiCleanerBot 18]]
{{BRFA help}}
{{Newbot|WikiCleanerBot|18}}
Operator: {{botop|NicoV}}
Time filed: 13:40, Friday, June 12, 2020 (UTC)
Function overview: Fix some nowiki tags after internal links (cf. Wikipedia:CHECKWIKI/WPC 553 dump).
Automatic, Supervised, or Manual: Automatic
Programming language(s): Java (WPCleaner)
Source code available: [https://github.com/WPCleaner/wpcleaner On GitHub] (especially [https://github.com/WPCleaner/wpcleaner/blob/master/WikipediaCleaner/src/org/wikipediacleaner/api/check/algorithm/CheckErrorAlgorithm553.java algorithm 553])
Links to relevant discussions (where appropriate):
Edit period(s): Twice a month
Estimated number of pages affected: About 10k pages were found during the dump analysis; not all can be fixed automatically, so expect a few thousand edits.
Namespace(s): Main
Exclusion compliant (Yes/No): Yes
Function details: Tools like VE or CX tend to create internal links with incorrect formatting (the hyperlink does not cover all the letters), because the user doesn't always select exactly the text the link should apply to. Some of these errors can be fixed automatically (see for example what my bot did on [https://fr.wikipedia.org/w/index.php?title=Sp%C3%A9cial:Contributions/WikiCleanerBot&offset=20200611234000&target=WikiCleanerBot frwiki] for several thousand articles). Examples of situations where the bot can automatically fix the internal link:
- [https://fr.wikipedia.org/w/index.php?title=%E2%80%99Ori_tahiti&diff=prev&oldid=171923715 ’Ori tahiti], Eugène Caillo<nowiki/>t replaced by Eugène Caillot: displayed text is the same as the target of the link
- [https://fr.wikipedia.org/w/index.php?title=%C5%9Eabran_(raion)&diff=prev&oldid=171923714 Şabran (raion)], forêt<nowiki/>s replaced by forêts: "s" is configured on frwiki as a possible extension (plural). Configuration for enwiki will also include "s", I will see with what is left after a first pass if other extensions can be added.
- [https://fr.wikipedia.org/w/index.php?title=%C5%92dipe_et_le_Sphinx&diff=prev&oldid=171923710 Œdipe et le Sphinx], Ingres<nowiki/> replaced by Ingres: whitespace after the nowiki makes it useless.
- [https://fr.wikipedia.org/w/index.php?title=%C4%B0brahim_Tatl%C4%B1ses&diff=prev&oldid=171923708 İbrahim Tatlıses], Divorcé<nowiki/>s replaced by Divorcés: "s" is configured on frwiki as a possible extension (plural).
After the first run on frwiki, I'm adding some [https://fr.wikipedia.org/w/index.php?title=Sp%C3%A9cial:Contributions/WikiCleanerBot&offset=20200614143500&target=WikiCleanerBot&limit=157 other automatic fixing abilities to the bot]:
- [https://fr.wikipedia.org/w/index.php?title=Albert_Rhys_Williams&diff=prev&oldid=171997066 Albert Rhys Williams], Mariett<nowiki/>a replaced by Marietta: displayed text is the same as the target of the link minus the text after the opening parenthesis
- [https://fr.wikipedia.org/w/index.php?title=Amarok_(mythologie)&diff=prev&oldid=171997076 Amarok (mythologie)], Black Meta<nowiki/>l replaced by Black Metal: displayed text is the same as the target of the link, regardless of uppercase/lowercase
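The fix rules listed above can be sketched roughly as follows. This is a minimal illustration written for this page, not the actual code of WPCleaner's algorithm 553: the class name, the `EXTENSIONS` set, the regular expression and the exact output forms (for example, keeping the link and simply dropping the nowiki tag for a configured extension) are all assumptions, and the link target used for the parenthesis case is hypothetical.

```java
import java.util.Set;
import java.util.regex.MatchResult;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NowikiAfterLinkSketch {

    // Hypothetical per-wiki configuration: suffixes accepted as extensions (e.g. plural "s").
    static final Set<String> EXTENSIONS = Set.of("s");

    // [[target]] or [[target|text]], then <nowiki/>, then any letters glued to the tag.
    static final Pattern LINK_NOWIKI = Pattern.compile(
        "\\[\\[([^\\[\\]|]+)(?:\\|([^\\[\\]|]+))?\\]\\]<nowiki\\s*/>(\\p{L}*)");

    /** Applies the sketch rules to every link followed by a nowiki tag. */
    static String fixAll(String wikitext) {
        return LINK_NOWIKI.matcher(wikitext)
            .replaceAll(r -> Matcher.quoteReplacement(fixOne(r, wikitext)));
    }

    private static String fixOne(MatchResult r, String wikitext) {
        String target = r.group(1);
        String text = (r.group(2) != null) ? r.group(2) : target;
        String suffix = r.group(3);
        String link = (r.group(2) != null)
            ? "[[" + target + "|" + text + "]]"
            : "[[" + target + "]]";

        if (suffix.isEmpty()) {
            // The tag is useless when followed by whitespace or the end of the text.
            boolean useless = r.end() == wikitext.length()
                || Character.isWhitespace(wikitext.charAt(r.end()));
            return useless ? link : r.group();
        }

        String merged = text + suffix;
        if (merged.equals(target)) {
            return "[[" + target + "]]";                  // same as the target
        }
        if (merged.equalsIgnoreCase(target)) {
            return "[[" + target + "|" + merged + "]]";   // same, ignoring case
        }
        int paren = target.indexOf('(');
        if (paren > 0 && merged.equalsIgnoreCase(target.substring(0, paren).trim())) {
            return "[[" + target + "|" + merged + "]]";   // same, minus the parenthesis
        }
        if (EXTENSIONS.contains(suffix)) {
            return link + suffix;                         // configured extension
        }
        return r.group();                                 // not fixable automatically
    }

    public static void main(String[] args) {
        System.out.println(fixAll("[[Eugène Caillot|Eugène Caillo]]<nowiki/>t"));
        System.out.println(fixAll("[[forêt]]<nowiki/>s"));
        // "Marietta (Géorgie)" is a made-up target used only to exercise the parenthesis rule.
        System.out.println(fixAll("[[Marietta (Géorgie)|Mariett]]<nowiki/>a"));
    }
}
```

Anything that matches none of the rules is returned unchanged, which mirrors the statement that not every page found in the dump can be fixed automatically.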
=Discussion=
- Comment: Thanks for taking this on. It looks uncontroversial. Do you know if there is a phabricator bug report so that this can get fixed in VE? – Jonesey95 (talk) 17:35, 12 June 2020 (UTC)
- : Hi Jonesey95. I don't know if there's a specific phabricator bug report for this, but I know the subject of incorrect links created by VE has been a long-standing issue... For example, you also have many links that are to an unrelated article (see for example, the list I'm generating on each dump analysis for frwiki for internal links like 2000). --NicoV (Talk on frwiki) 14:24, 14 June 2020 (UTC)
- ::I haven't seen a bug report for that issue. I will be happy to file one. Do you have links to diffs? We don't link to years on en.WP, but I imagine that there are incorrect links being generated somewhere, given all of the other link-related bugs with VE. – Jonesey95 (talk) 15:27, 14 June 2020 (UTC)
- ::: Hi Jonesey95. A few examples gathered from [https://en.wikipedia.org/w/index.php?hidebots=1&hidecategorization=1&tagfilter=nowiki+added&limit=500&days=30&enhanced=1&damaging__likelybad_color=c4&damaging__verylikelybad_color=c5&title=Special:RecentChanges&urlversion=2 Recent changes with nowiki tag], just by looking at the last 20 edits:
- :::* [https://en.wikipedia.org/w/index.php?title=Mesta&curid=97305&diff=962631681&oldid=940586313 Mesta]: grazing lands replaced by grazing land<nowiki/>s
- :::* [https://en.wikipedia.org/w/index.php?title=Chilean_Army&curid=5227710&diff=962625255&oldid=956137928 Chilean Army]: mounted band<nowiki/>s added.
- :::* [https://en.wikipedia.org/w/index.php?title=Manuel_Romero_Rubio&curid=58675681&diff=962612861&oldid=962605456 Manuel Romero Rubio]: replaced by Mexico City2 5, Mexico Cit<nowiki/>y<ref name=":1" /><ref name=":3" />.
- :::* [https://en.wikipedia.org/w/index.php?title=Fox_Networks_Group&curid=13387471&diff=962608657&oldid=960015274 Fox Networks Group]: replaced by Fox Corporation and Walt Disney Television while Fox Corporation<nowiki/>while
- :::* [https://en.wikipedia.org/w/index.php?title=Attica_Scott&curid=57870066&diff=962597254&oldid=962435488 Attica Scott]: shooting death of Breonna Taylo<nowiki/>r added
- :::* [https://en.wikipedia.org/w/index.php?title=New_Democratic_Party&curid=19283982&diff=962592082&oldid=962588486 New Democratic Party]: Indo-Canadian<nowiki/>to and fourth-largest party<nowiki/>in added.
- :::As you can see, it's quite frequent (and most of the other nowiki tags are just different problems...). I gave up on reporting this kind of thing to the VE team; I reported them years ago... --NicoV (Talk on frwiki) 06:07, 15 June 2020 (UTC)
- :::Jonesey95. If you were speaking about examples of links with an incorrect target, I don't have diffs, but I noticed articles with such problems when doing some trial edits, but I didn't try to find where they are coming from:
- :::* [https://en.wikipedia.org/w/index.php?title=2015_New_South_Wales_Cup&diff=prev&oldid=950680015 2015 New South Wales Cup]: Newcastle Sports Ground
- :::* [https://en.wikipedia.org/w/index.php?title=Akaoni_Studio&diff=prev&oldid=950680370 Akaoni Studio]: iPhone
- :::They are hard to track by a bot (except for the dates, that's why I added #526 for frwiki). --NicoV (Talk on frwiki) 06:21, 15 June 2020 (UTC)
- :::Jonesey95. Even if you're not supposed to link to years on en.WP, I just started a dump analysis for #526, and it quickly found articles with such problems... Maybe some are false positives.
- :::* Australian Labor Party: 1946
- :::* Clement Attlee: 1955
- :::* Ducati Motor Holding S.p.A.: 1999
- :::* European Free Trade Association: 1973 and 1995
- :::* Spenser (character): 2007
- :::* William Ewart Gladstone: 1845
- :::* 549: 624
- :::Wikipedia:CHECKWIKI/WPC 526 dump should be generated in a few hours. --NicoV (Talk on frwiki) 06:53, 15 June 2020 (UTC)
- :::Jonesey95. More than 6k pages listed in Wikipedia:CHECKWIKI/WPC 526 dump. --NicoV (Talk on frwiki) 18:06, 15 June 2020 (UTC)
- ::::<s>Most of those links in the WPC 526 dump look OK to me, per WP:YEARLINK.</s> (sorry, I misread the first few links; I see that most of them appear to link to the wrong year.) It is links like 1999 that are typically (but not always) discouraged, per WP:YEARLINK. The real problem links are the ones like 2007. Over 6,000! Wow. – Jonesey95 (talk) 18:57, 15 June 2020 (UTC)
- ::::One note: I believe that many of the links related to sports seasons are intentional, like 1997, because, as the article says, "The 1998 Pro Bowl was the NFL's all-star game for the 1997 season." In the US, American football seasons take place almost entirely in the second half of a given year, with the post-season games at the beginning of the following year but designated as part of the previous year's "season". If that makes sense. If there is any way to avoid changing links where the link text is one number higher than the target year, please do so pending further discussion. – Jonesey95 (talk) 03:33, 16 June 2020 (UTC)
- :::::Hi Jonesey95. I can try to ignore yyyy when xxxx=yyyy+1. Do you think it's the same reason for the election links (2 in the above examples), or will those be problems that are missed? Or do I need to configure the list of "..." for which xxxx=yyyy+1 should be ignored? The incorrect links problem for years is just the tip of the iceberg for incorrect links, but I don't know how I can find all the other ones... --NicoV (Talk on frwiki) 09:22, 16 June 2020 (UTC)
- ::::::The election links generally take the form xxxx=yyyy-1, like 1836 United States presidential election, where the election took place in one year (in November), but the dispute over it took place while votes were being counted in the following months. I think the bot might need to ignore all cases where the years are different by one (higher or lower), since it will run into context problems. The links that differ by more than one look like they are mostly typos and copy/paste errors. – Jonesey95 (talk) 14:10, 16 June 2020 (UTC)
- :::::::Hi Jonesey95. I've modified the detection to [https://en.wikipedia.org/w/index.php?title=Wikipedia:WikiProject_Check_Wikipedia/Translation&curid=22006740&diff=962995884&oldid=962912582 allow configuring the minimum difference], so next time the list is generated, it will be trimmed down a bit. I think we should continue the discussion elsewhere, like Wikipedia talk:WPCleaner. I don't think it's possible to fix this error automatically (sometimes the link is correct, sometimes the displayed year is correct): on frwiki, I'm just adding a template after the link to request help from editors to fix the link. --NicoV (Talk on frwiki) 06:00, 17 June 2020 (UTC)
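As an illustration of the "minimum difference" idea discussed above, here is a rough sketch of how year links whose target and displayed years disagree could be listed. It is not the actual #526 detection: the class name, the `minDifference` parameter and the regular expression are assumptions made for this example, and the sample link `[[2005|2007]]` is made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class YearLinkSketch {

    // A link whose target starts with a year and whose displayed text is only a year,
    // for example [[1998 Pro Bowl|1997]].
    static final Pattern YEAR_LINK = Pattern.compile(
        "\\[\\[(\\d{1,4})\\b[^\\[\\]|]*\\|(\\d{1,4})\\]\\]");

    /** Lists links where the two years differ by at least minDifference (assumed setting). */
    static List<String> suspicious(String wikitext, int minDifference) {
        List<String> found = new ArrayList<>();
        Matcher m = YEAR_LINK.matcher(wikitext);
        while (m.find()) {
            int targetYear = Integer.parseInt(m.group(1));
            int shownYear = Integer.parseInt(m.group(2));
            int difference = Math.abs(targetYear - shownYear);
            // Identical years are never reported; small gaps can be ignored by configuration.
            if (difference > 0 && difference >= minDifference) {
                found.add(m.group());
            }
        }
        return found;
    }

    public static void main(String[] args) {
        String sample = "[[1998 Pro Bowl|1997]] and [[2005|2007]]";
        System.out.println(suspicious(sample, 1));
        System.out.println(suspicious(sample, 2));
    }
}
```

With a minimum difference of 2, links that are off by one year in either direction (sports seasons, elections) are skipped. The sketch only lists candidates and changes nothing, in line with the point that this error cannot be fixed automatically.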
{{BotTrial|edits=50}} Primefac (talk) 23:55, 15 June 2020 (UTC)
:Thanks Primefac. {{BotTrialComplete}} I've done the [https://en.wikipedia.org/w/index.php?title=Special:Contributions/WikiCleanerBot&offset=20200616184400&target=WikiCleanerBot&limit=50 50 edits], and bot behaved as expected. --NicoV (Talk on frwiki) 18:46, 16 June 2020 (UTC)
::I looked through all 50 test edits, and they all looked fine to me. In [https://en.wikipedia.org/w/index.php?title=1918_VPI_Gobblers_football_team&diff=prev&oldid=962915398 diff 1], I would have changed the link to "Wake Forest's" (I think this is the expected format on en.WP, although I can't find the guideline at the moment; I don't think you'll get any complaints), but the bot's "Wake Forest's" is acceptable. — Preceding unsigned comment added by NicoV (talk • contribs) 06:00, 17 June 2020 (UTC)
:::{{BotApproved}} Primefac (talk) 17:12, 19 June 2020 (UTC)
:The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA.