WP:Bots/Requests for approval/William Avery Bot 3

William Avery Bot 3

[[User:William Avery Bot|William Avery Bot 3]]

{{BRFA help}}

{{Newbot|William Avery Bot|3}}

Operator: {{botop|William Avery}}

Time filed: 08:11, Wednesday, June 2, 2021 (UTC)

Function overview: Remove dead URLs and the associated {{tl|Dead link}} tags from CS1 templates when a free alternative is available via an identifier.

Automatic, Supervised, or Manual: Automatic

Programming language(s): Python (pywikibot and mwparserfromhell)

Source code available: [https://bitbucket.org/WilliamAvery/wikipythonics/src/master/unneededDeadlinksBot.py unneededDeadlinksBot.py] [https://bitbucket.org/WilliamAvery/wikipythonics/src/master/unneededDeadlinks_medicine.sh unneededDeadlinks_medicine.sh] [https://bitbucket.org/WilliamAvery/wikipythonics/src/master/unneededDeadlinks.sh unneededDeadlinks.sh] [https://bitbucket.org/WilliamAvery/wikipythonics/src/master/wikipythonics_util.py wikipythonics_util.py]

Links to relevant discussions (where appropriate): WP:BOTREQ#Remove dead links from book and journal citation templates with identifiers - It was agreed to proceed only where there is free access indicated. There may be further discussions to be had regarding other access levels and situations. These would have to be the subject of a further BRFA.

Edit period(s): one time run, with possible ad hoc repeats

Estimated number of pages affected: 111 pages for WikiProject Medicine, c. 1000 pages overall

Namespace(s): Mainspace/Articles

Exclusion compliant (Yes/No): Yes

Function details:

  1. Query the database for pages in :Category:Articles with permanently dead external links (See SQL query in [https://bitbucket.org/WilliamAvery/wikipythonics/src/master/unneededDeadlinks.sh unneededDeadlinks.sh]). I expect initial runs to be confined to WikiProject Medicine articles. (See [https://bitbucket.org/WilliamAvery/wikipythonics/src/master/unneededDeadlinks_medicine.sh unneededDeadlinks_medicine.sh])
  2. Each {{tl|Deadlink}} present is examined; processing only proceeds if its fix-attempted parameter has the value 'yes'.
  3. Using mwparserfromhell, the deadlink template's ancestor elements are examined to find a tag or other element likely to contain the affected citation.
  4. The affected citation is the sibling template element that precedes the deadlink tag within the identified ancestor. This and the preceding step have details that depend closely on the mwparserfromhell parse tree, and have been refined by scanning large samples of pages. For example, if there is a plain external link after the preceding template, then the dead link being tagged is that external link, not a link in the preceding template, so no fix is possible.
  5. The candidate template is checked for a value in the url parameter. Processing only proceeds if there is a url. Editors sometimes mark a broken DOI or similar identifier with {{tl|Deadlink}} rather than using the doi-broken-date parameter; in that case there is no url to remove.
  6. The candidate template is checked for a value in the archive-url parameter. Processing only proceeds if there is not an archive-url. Presence of an archive-url should indicate that the link is fixed.
  7. A check is made to see if there are identifier parameters present that indicate free access. For details see WP:CS1#Access indicator for named identifiers. Under the scope of this request, processing will only proceed if free access is indicated and the access is unaffected by presence of a doi-broken-date or a pmc-embargo-date.
  8. If all the conditions are fulfilled, the url parameter is removed from the template along with any access-date parameter, and the {{tl|Deadlink}} tagging is removed.
  9. Apply a general fix to remove template redirects using the rules at Wikipedia:AutoWikiBrowser/Template redirects. (I thought I could apply this fix prior to the above processing to simplify it, but some of the fixed templates were then removed, making the edit summary misleading.)
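
The checks in steps 2 and 5 to 8 can be illustrated with a minimal sketch. It is not the bot's actual code (see the linked source for that): templates are modelled as plain dicts rather than mwparserfromhell nodes, the function names are hypothetical, the identifier lists are an assumed partial reading of WP:CS1, and any doi-broken-date or pmc-embargo-date is conservatively treated as disqualifying that identifier.

```python
# Hypothetical sketch: each template is modelled as a dict of its parameters.

# Assumed, partial identifier lists based on a reading of WP:CS1.
ALWAYS_FREE = ("arxiv", "biorxiv", "citeseerx", "pmc", "rfc")
FREE_WHEN_FLAGGED = ("bibcode", "doi", "hdl", "jstor", "ol", "osti", "s2cid")

# Assumption: any value in these parameters disqualifies the identifier.
BLOCKERS = {"doi": "doi-broken-date", "pmc": "pmc-embargo-date"}


def _blocked(name, params):
    """True if the identifier has a broken/embargo date recorded against it."""
    blocker = BLOCKERS.get(name)
    return bool(blocker and params.get(blocker))


def has_free_identifier(params):
    """Step 7: is there at least one identifier indicating free access?"""
    for name in ALWAYS_FREE:
        if params.get(name) and not _blocked(name, params):
            return True
    for name in FREE_WHEN_FLAGGED:
        flagged = params.get(name + "-access") == "free"
        if params.get(name) and flagged and not _blocked(name, params):
            return True
    return False


def is_fixable(dead_link, citation):
    """Steps 2, 5, 6 and 7 applied to a dead link tag and its citation."""
    if dead_link.get("fix-attempted") != "yes":
        return False  # step 2: nobody has tried to fix the link yet
    if not citation.get("url"):
        return False  # step 5: the tag probably refers to an identifier
    if citation.get("archive-url"):
        return False  # step 6: an archive copy is already in place
    return has_free_identifier(citation)


def apply_fix(citation):
    """Step 8: return a copy of the citation without url and access-date."""
    return {k: v for k, v in citation.items() if k not in ("url", "access-date")}


if __name__ == "__main__":
    tag = {"date": "June 2021", "fix-attempted": "yes"}
    cite = {
        "title": "Example",
        "url": "http://example.org/dead",
        "access-date": "1 May 2020",
        "doi": "10.1000/example",
        "doi-access": "free",
    }
    if is_fixable(tag, cite):
        print(apply_fix(cite))
```

Returning a new dict from apply_fix, rather than editing in place, keeps the original parameters available for comparison; in the real bot the removal of the {{tl|Deadlink}} tag itself would happen on the parse tree.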

Test outputs:

  • User:William Avery Bot/testsample medicine - list of all 111 citations under WikiProject Medicine identified as fixable.
  • User:William Avery Bot/testsample all e - Sample list of 29 fixable citations from general articles beginning with letter 'E'
  • https://en.wikipedia.org/w/index.php?title=User%3AWilliam_Avery_Bot%2Fdeadlinkstest&type=revision&diff=1026339187&oldid=1026333752 - Test edit in userspace. Once a suitable case has been identified, the actual fix is rather simple.

=Discussion=

{{BotTrial|edits=50}} Primefac (talk) 23:13, 29 June 2021 (UTC)

:{{BotTrialComplete}} I have checked the edits made, and the free sources indicated by the reference template parameters are indeed available in all fifty cases, which was my main worry. Edits [https://en.wikipedia.org/w/index.php?title=Special:Contributions/William_Avery_Bot&offset=202107221200&limit=50 here].

:{{Reply_to|Velayinosu|Ajpolino|GreenC}} Courtesy ping to original requesters and GreenC, who gave helpful advice. William Avery (talk) 11:34, 22 July 2021 (UTC)

  • This bot is very helpful since tracking down these articles and making these edits manually would be quite time consuming. It's more limited in scope than I personally prefer but that's understandable since it's new. Maybe its scope can be broadened over time. In any case, thank you for making this bot. Velayinosu (talk) 01:17, 24 July 2021 (UTC)

{{BotApproved}} Primefac (talk) 21:05, 22 August 2021 (UTC)

:The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at Wikipedia:Bots/Noticeboard.