User talk:GreenC bot

{{ombox

| type = content

| style = border:2px solid #B22222

| image = File:Crystal Clear action exit.svg

| text = You can stop the bot by pushing the stop button. The bot sees it and immediately stops running. Unless it is an emergency, please consider reporting problems first on my talk page.

}}

{{Archive box|search=yes|

}}

Bot updating Webarchive template is adding "url" same as existing "url2"

This bot made a group of WaybackMedic 2.5 edits in June where it "rescued" an archive link in the {{para|url}} parameter of {{tl|Webarchive}}, replacing it with a [https://web.archive.org/web/20100105013709/http://canoeicf.com/site/canoeint/if/downloads/result/Pages%201-41%20from%20Medal%20Winners%20ICF%20updated%202007-2.pdf?MenuID=Results%2F1107%2F0%2CMedal%5Fwinners%5Fsince%5F1936%2F1510%2F0 this link] which was already in the {{para|url2}} parameter. Two examples of this are {{diff|Grant Bramwell|1093238567|957849833|Grant Bramwell: revised 1 June 2022}} and {{diff|List of ICF Canoe Sprint World Championships medalists in men's kayak|1095093520|1093813352|List of ICF Canoe Sprint World Championships medalists in men's kayak: revised 26 June 2022}}. Can the bot remove the duplicate url2/date2/title2 parameters and renumber any subsequent url3/date3/title3, etc.? I've fixed over 500 of these edits myself, but there are still [https://en.wikipedia.org/w/index.php?search=insource%3A%2F%5C%7B%5C%7BWebarchive+*%5C%7Curl%3Dhttps%3A%5C%2F%5C%2Fweb.archive.org%5C%2Fweb%5C%2F20100105013709%5C%2Fhttp%3A%5C%2F%5C%2Fcanoeicf.com%5C%2Fsite%5C%2Fcanoeint%5C%2Fif%5C%2Fdownloads%5C%2Fresult%5C%2FPages%25201-41%2520from%2520Medal%2520Winners%2520ICF%2520updated%25202007-2.pdf%5C%3FMenuID%3DResults%252F1107%252F0%252CMedal%255Fwinners%255Fsince%255F1936%252F1510%252F0%2F+insource%3A%2F%5C%7Curl2%3Dhttps%3A%5C%2F%5C%2Fweb.archive.org%5C%2Fweb%5C%2F20100105013709%5C%2Fhttp%3A%5C%2F%5C%2Fcanoeicf.com%5C%2Fsite%5C%2Fcanoeint%5C%2Fif%5C%2Fdownloads%5C%2Fresult%5C%2FPages%25201-41%2520from%2520Medal%2520Winners%2520ICF%2520updated%25202007-2.pdf%5C%3FMenuID%3DResults%252F1107%252F0%252CMedal%255Fwinners%255Fsince%255F1936%252F1510%252F0%2F&title=Special%3ASearch&go=Go&ns0=1 over 700 remaining to be fixed]. Thanks. -- Zyxw (talk) 03:54, 9 August 2022 (UTC)

:That was part of the deprecation of WebCite, which is a dead archive provider. It didn't account for dups. It's complicated here because even though {{para|url}} and {{para|url2}} are the same, {{para|title}} and {{para|title2}} are different - which do you choose? I think the best course is to keep the {{para|url}} set and remove the {{para|url2}} set, at least based on the two examples. In terms of renumbering, that is not required: the webarchive template is designed to allow any numbers up to 10, so long as there is a {{para|url}} (aka {{para|url1}}) - that is the only requirement. I'll start looking at this today. -- GreenC 15:35, 9 August 2022 (UTC)

:: {{Reply|GreenC}} I agree with keeping the {{para|url}} set and removing the {{para|url2}} set when there is a duplicate URL and that is what I did for the 500+ already fixed. I also thought {{tl|Webarchive}} might automatically handle the missing {{para|url2}} set and display the {{para|url3}} set, but as per these tests that is not the case:

:: archive with url/date/title, url2/date2/title2, and url3/date3/title3

::* {{Webarchive |url=https://web.archive.org/web/20100105013709/http://canoeicf.com/site/canoeint/if/downloads/result/Pages%201-41%20from%20Medal%20Winners%20ICF%20updated%202007-2.pdf?MenuID=Results%2F1107%2F0%2CMedal%5Fwinners%5Fsince%5F1936%2F1510%2F0 |date=5 January 2010 |title=Medal Winners – Olympic Games and World Championships (1936–2007) – Part 1: flatwater (now sprint). CanoeICF.com. International Canoe Federation. |url2=https://web.archive.org/web/20100105013709/http://canoeicf.com/site/canoeint/if/downloads/result/Pages%201-41%20from%20Medal%20Winners%20ICF%20updated%202007-2.pdf?MenuID=Results%2F1107%2F0%2CMedal%5Fwinners%5Fsince%5F1936%2F1510%2F0 |date2=5 January 2010 |title2=Wayback Machine |url3=https://web.archive.org/web/20160113142416/http://www.bcu.org.uk/files/Pages%201-41%20from%20Medal%20Winners%20ICF%20updated%202007-2.pdf |date3=13 January 2016 |title3=BCU.org.uk}}

:: url2/date2/title2 removed with url3/date3/title3 remaining

::* {{Webarchive |url=https://web.archive.org/web/20100105013709/http://canoeicf.com/site/canoeint/if/downloads/result/Pages%201-41%20from%20Medal%20Winners%20ICF%20updated%202007-2.pdf?MenuID=Results%2F1107%2F0%2CMedal%5Fwinners%5Fsince%5F1936%2F1510%2F0 |date=5 January 2010 |title=Medal Winners – Olympic Games and World Championships (1936–2007) – Part 1: flatwater (now sprint). CanoeICF.com. International Canoe Federation. |url3=https://web.archive.org/web/20160113142416/http://www.bcu.org.uk/files/Pages%201-41%20from%20Medal%20Winners%20ICF%20updated%202007-2.pdf |date3=13 January 2016 |title3=BCU.org.uk}}

:: url2/date2/title2 removed and url3/date3/title3 renumbered

::* {{Webarchive |url=https://web.archive.org/web/20100105013709/http://canoeicf.com/site/canoeint/if/downloads/result/Pages%201-41%20from%20Medal%20Winners%20ICF%20updated%202007-2.pdf?MenuID=Results%2F1107%2F0%2CMedal%5Fwinners%5Fsince%5F1936%2F1510%2F0 |date=5 January 2010 |title=Medal Winners – Olympic Games and World Championships (1936–2007) – Part 1: flatwater (now sprint). CanoeICF.com. International Canoe Federation. |url2=https://web.archive.org/web/20160113142416/http://www.bcu.org.uk/files/Pages%201-41%20from%20Medal%20Winners%20ICF%20updated%202007-2.pdf |date2=13 January 2016 |title2=BCU.org.uk}}

:: -- Zyxw (talk) 16:15, 9 August 2022 (UTC)

:::Reported at Template_talk:Webarchive#Gaps_in_argument_sequence. I wrote the template originally but Trappist did a major rewrite so I'm not sure if that is my bug or his. I processed the first 500 articles and there are only 3 with a {{para|url3}} suggesting 40 or 50 at most in the whole bunch. Anyway it won't be difficult to renumber them. -- GreenC 16:26, 9 August 2022 (UTC)

::::Ah, miscalculated - it's 733 not 7,330 :) It's done; if you see anything more let me know. -- GreenC 17:08, 9 August 2022 (UTC)

::::Fixed the webarchive bug. -- GreenC 18:06, 9 August 2022 (UTC)

Avoid editing inside HTML comments

GreenC bot now edits inside HTML comments, e.g. Special:Diff/1107954452, but I suggest it not do so. Although the edit in this example happened to be harmless (even useful), in general, comments could be used for a wide range of reasons, so there is a higher risk that automatic edits could break their intentions. Wotheina (talk) 03:49, 2 September 2022 (UTC)

:That's true, but there is a positive trade-off, so for a couple of reasons I am OK fixing certain (not all) link rot in comments, as I have been doing for 7 years. If someone wants to preserve a block of immutable wikitext they should use the talk page, user page or offline storage - otherwise anyone can edit the comment or delete it entirely. Comments can be strangely formatted, so I take measures, auto and manual, to check commented text before posting a live diff. -- GreenC 05:39, 2 September 2022 (UTC)

Please Update the monthly list of Top 10000 wikipedia users by Article Count

Please update the list of the top 10000 Wikipedia users by article count, which is refreshed on the 1st and 15th of each month. Abbasulu (talk) 07:52, 3 October 2022 (UTC)

:It's still running, for some reason very slowly - in 3 days it only completed 19%. -- GreenC 12:51, 3 October 2022 (UTC)

Exactly what purpose did this edit serve? Edit summary is misleading at best

https://en.wikipedia.org/w/index.php?title=Rodney_Marks&diff=1095741886&oldid=1091111369 108.246.204.20 (talk) 20:17, 3 October 2022 (UTC)

:Don't use {{tlx|dead link}} if the citation has a working {{para|archive-url}}. -- GreenC 20:46, 3 October 2022 (UTC)

::it doesn't. "this page is not available". 108.246.204.20 (talk) 04:15, 14 October 2022 (UTC)

:::Ah, soft-404. Removed. I also updated the IABot database. -- GreenC 04:24, 14 October 2022 (UTC)

RSSSF

Why is this bot changing "website=rsssf.com" to "website=RSSSF" where there is already a "publisher=RSSSF" parameter, so that on many pages you get a stupid outcome like [https://i.ibb.co/26Ztmps/rsssf.png this] with double RSSSF linking? Snowflake91 (talk) 10:27, 7 February 2023 (UTC)

:Yeah, it's not ideal - a work in progress. In any case the problem is there should not be both {{para|work}} and {{para|publisher}}; use one or the other, not both. And it should not use a domain name; using the name of the site is best practice on Wikipedia. There are so many RSSSF citations, and so many problems with them - I've done a lot of work to fix them but there are still things that need more work. -- GreenC 15:22, 7 February 2023 (UTC)

::Prefer {{para|website}} over {{para|publisher}}. {{tlx|cite web}} does not include {{para|publisher}} in the citation's metadata.

::—Trappist the monk (talk) 16:18, 7 February 2023 (UTC)

:::Special:Diff/1038698982/1138241646 -- GreenC 21:44, 8 February 2023 (UTC)

I think all the doubles are cleared, if you see any more or other problems let me know. -- GreenC 21:45, 8 February 2023 (UTC)

WaybackMedic

{{ping|GreenC}} It seems that WaybackMedic 2.5 is being run by GreenC bot 2. However, I can't find the source code of version 2.5 in the [https://github.com/greencardamom/WaybackMedic Github repo]. I need to read the latest code to learn its current behavior. Have you published it yet? -- NmWTfs85lXusaybq (talk) 14:04, 24 March 2023 (UTC)

:I can send snippets or functions if you want, for anything you are interested in. The entire codebase is not currently available to the public because it contains some proprietary information. It's written in Nim, plus some awk utils. -- GreenC 14:44, 24 March 2023 (UTC)

::The bot detection of businessweek.com you mentioned in Wikipedia:Village_pump_(technical)/Archive_203#businessweek.com_links may be bypassed by simply assigning a web browser's user agent in the header of HTTP requests, such as Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.50 Safari/537.36. As far as I know from version 2.1, WaybackMedic may execute external commands (via execCmdEx) to determine page status, and the assignment of a user agent should be easy to implement via some available parameters. By the way, as of version 2.1, I can see the validate_robots function is implemented in medicapi.nim. -- NmWTfs85lXusaybq (talk) 16:55, 24 March 2023 (UTC)

:::Thank you for the suggestion to use a browser agent. I tried it; they appear to limit based on query rate, and it's pretty sensitive. I was able to trigger it by manually requesting 8 headers rapidly, after which it stopped working, sending a header with "HTTP/1.1 307 s2s_high_score" and redirecting to a JavaScript challenge ("press and hold button"). Maybe I could slow the bot down enough between queries, but it would be difficult and extremely slow - perhaps a month or longer for 10k articles - and it would need to verify every header is not 307, otherwise abort and manually clear the challenge. GreenC 21:36, 24 March 2023 (UTC)

::::If they limit the query rate based on ip, you can find some web proxies to accelerate this procedure as your bot may behave like a web crawler. After you collect and validate some free proxies, you can just apply them alternately to your bot, although their stability is not guaranteed. -- NmWTfs85lXusaybq (talk) 03:47, 25 March 2023 (UTC)

:::::I have access to a web proxy that uses home-based IPs and it still didn't work. Maybe the solution is to pull every URL into a file and process them outside the bot with a simple script that waits x seconds between each header query, then feed the results back to the bot, telling it which URLs are dead. It can run for however long; it wouldn't matter. Trying to do it inside the bot is too error-prone, too complicated, and ties up the bot too long. -- GreenC 04:11, 25 March 2023 (UTC)

::::::It's a good idea to run this job outside the bot. However, I'm not sure what you mean by {{talk quote inline|a web proxy that uses home based IPs}}. Have you tried high-anonymity proxies? Did you change proxy IP every time you made a new request? NmWTfs85lXusaybq (talk) 04:45, 25 March 2023 (UTC)

:::::::The IPs change with every request, and the IPs are sourced to home broadband users globally, so they are not detectable by CIDR block. I don't know how they got blocked, maybe Cloudflare is on this service and recorded all of the IPs. -- GreenC 14:46, 25 March 2023 (UTC)

::::::::Then I suppose your proxy strategy is OK. Please make sure your web proxy has high anonymity if all of your configuration works fine. -- NmWTfs85lXusaybq (talk) 15:20, 25 March 2023 (UTC)

:::::::::I ran this bot-block avoidance script and it took forever. What I discovered is just about every link should be archived: either 404, soft-404 or better-off-dead. The latter because the links went to content that was behind a paywall or otherwise messed up in some way - so the archived version is better in nearly every case. -- GreenC 14:17, 3 April 2023 (UTC)

::::::::::I see you mentioned some awk scripts as a workaround at Wikipedia:Link_rot/URL_change_requests#businessweek.com. However, I can't find the meta directory businessweek.00000-10000 you referred to in the Github repo of InternetArchiveBot and WaybackMedic. NmWTfs85lXusaybq (talk) 07:15, 24 April 2023 (UTC)

:::::::::::Oh, that's a note to myself. If you want the awk script let me know; it's nothing more than going through a list of URLs, pausing between each to avoid rate limiting, getting the headers and recording the results, and if it's a bot-block header, notifying and aborting the script. It also shuffles the agent string. The site seemed to learn agent strings and block based on those, which could be avoided by retiring an agent and adding a new one. -- GreenC 13:47, 24 April 2023 (UTC)
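For illustration, a minimal sketch of that kind of standalone checker (written in Python rather than awk; the URL list file, delay value, agent strings and the 307 block signature are assumptions, not the actual script):

<syntaxhighlight lang="python">
# Sketch of the standalone header checker described above - not the real awk utility.
# Assumptions: urls.txt holds one URL per line; a 307 response marks a bot block.
import random
import time
import requests

AGENTS = [  # rotated ("shuffled") between requests; example strings only
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.50 Safari/537.36",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/115.0",
]
DELAY = 30  # seconds between requests, to stay under the rate limit (a guess)

with open("urls.txt") as urls, open("results.txt", "w") as out:
    for url in (line.strip() for line in urls if line.strip()):
        resp = requests.head(url, headers={"User-Agent": random.choice(AGENTS)},
                             allow_redirects=False, timeout=30)
        if resp.status_code == 307:  # bot-block header seen: notify and abort
            print(f"Bot block on {url} - aborting; clear the challenge manually")
            break
        out.write(f"{url}\t{resp.status_code}\n")  # record status for the bot to read later
        time.sleep(DELAY)
</syntaxhighlight>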

Archiving chapter urls

This is a bit of an edge case with GreenC bot's archive repair task, so I wanted to get your opinion. In several articles where I'm citing an archived book that has separate PDFs for each chapter, I use the |archive-url= parameter for the chapter url (since that's the most important one) and have a Wayback url for the book url in the |url= field. It's not ideal, but I'm not sure how else to handle it. My brief search also found this thread where you indicated that |archive-url= was okay to use for the chapter url. However, GreenC bot switches the |archive-url= field to be the archive of the |url= field (example here).

Is there a better way to format these citations? I'm not able to find any. Otherwise, is there any way I can mark the citations to be ignored by the bot? This seems like a relatively rare case; I imagine it's not worth modifying the bot to handle. Thanks, Pi.1415926535 (talk) 22:14, 14 August 2023 (UTC)

:Special:Diff/1170358971/1170410520. Another option:

::{{Cite book |last=Vanasse Hangen Brustlin, Inc |url=http://greenlineextension.eot.state.ma.us/docs_beyondLechmere.html |title=Beyond Lechmere Northwest Corridor Study: Major Investment Study/Alternatives Analysis |date=August 2005 |publisher=Massachusetts Bay Transportation Authority |archiveurl=https://web.archive.org/web/20160705134151/http://greenlineextension.eot.state.ma.us/docs_beyondLechmere.html |archivedate=July 5, 2016}} {{webarchive |url=https://web.archive.org/web/20160705151132/http://greenlineextension.eot.state.ma.us/documents/beyondLechmere/MIS8-05-Chapter4.pdf |date=2016-07-05 |title=Chapter 4: Identification and Evaluation of Alternatives – Tier 1}}

:I like this better because it doesn't hack the cite book template arguments. The downside is the display is a little messier. Another way with some duplication:

::{{Cite book |last=Vanasse Hangen Brustlin, Inc |chapter-url=http://greenlineextension.eot.state.ma.us/documents/beyondLechmere/MIS8-05-Chapter4.pdf |chapter=Chapter 4: Identification and Evaluation of Alternatives – Tier 1 |title=Beyond Lechmere Northwest Corridor Study: Major Investment Study/Alternatives Analysis |date=August 2005 |publisher=Massachusetts Bay Transportation Authority |archiveurl=https://web.archive.org/web/20160705134151/http://greenlineextension.eot.state.ma.us/docs_beyondLechmere.html |archivedate=July 5, 2016}} From {{webarchive |url=https://web.archive.org/web/20160705134151/http://greenlineextension.eot.state.ma.us/docs_beyondLechmere.html |date=2016-07-05 |title=Beyond Lechmere Northwest Corridor Study: Major Investment Study/Alternatives Analysis}}

:To keep the bot off the citation, add the {{tlx|cbignore}} template after the end of the cite book but inside the ref tags. -- GreenC 02:17, 15 August 2023 (UTC)

::Thanks, much appreciated. Pi.1415926535 (talk) 17:15, 15 August 2023 (UTC)

:::{{ping|GreenC}} Please take a look at Special:Diff/1171111146, where the bot edited several citations already tagged with {{tl|cbignore}}. Thanks, Pi.1415926535 (talk) 06:35, 21 August 2023 (UTC)

::::I found two problems. 1) The {{tld|cbignore}} should follow directly after the template it targets: Special:Diff/1171510462/1171514730 - I think the cbignore docs mention this. 2) My bot has a known limitation. Within any block of text between new lines (i.e. a paragraph of text), if there is more than one cbignore, the citations the cbignores follow all need to be unique. In this case the two citations are mirror copies. The bot ignored the cbignore for that reason (it has to do with disambiguation: it needs to know which citation to target). So I modified one of the citations, and they are now unique: Special:Diff/1171514730/1171514803 (changed the semi-colon to a colon in the publisher field of the first citation) - a bit quirky, but tested and it works now. I do recommend using the alt suggestions above though, because while my bot honors cbignore, most other bots do not, and eventually some other tool will probably try to "fix" what it detects as an error (an archive URL in the url field). -- GreenC 15:45, 21 August 2023 (UTC)
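For illustration only (this is not the bot's actual code), a sketch of the documented behaviour - skip any citation that is immediately followed by {{tlx|cbignore}}. The regex is a simplification that ignores nested templates:

<syntaxhighlight lang="python">
import re

# A citation template optionally followed by {{cbignore}}; nested templates inside
# the citation are not handled here (a real bot needs a proper wikitext parser).
CITE_RE = re.compile(r"(\{\{\s*[Cc]ite[^{}]*\}\})(\s*\{\{\s*[Cc]bignore[^{}]*\}\})?")

def editable_citations(wikitext):
    """Yield citations that are NOT protected by a trailing {{cbignore}}."""
    for match in CITE_RE.finditer(wikitext):
        if match.group(2) is None:  # no cbignore directly after the template
            yield match.group(1)
</syntaxhighlight>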

Incorrect dead flags and archive.today

Hello {{u|GreenC}}! Your bot recently made [https://en.wikipedia.org/w/index.php?title=Pok%C3%A9mon&diff=1171107668 this strange edit] to Pokémon. In it, the bot changed "archive.is" and "archive.ph" to "archive.today". I'm not sure what purpose this has. The task is not explained on User:GreenC bot.

Furthermore, the bot flagged these three sources as dead:

  • https://www.theguardian.com/technology/gamesblog/2013/oct/11/pokemon-blockbuster-game-technology
  • https://order.mandarake.co.jp/order/detailPage/item?itemCode=1052117728
  • https://www.nytimes.com/1997/12/20/news/big-firms-failure-rattles-japan-asian-tremors-spread.html

But as you can see, the above links are not dead. So something must've gone wrong there. I've [https://en.wikipedia.org/w/index.php?title=Pok%C3%A9mon&diff=1171158544 remarked] these refs as live. Cheers, Manifestation (talk) 11:04, 19 August 2023 (UTC)

:Archive.today is what the owner of archive.today wants us to use; it's a redirector that sends traffic to the other domains as they are available. The reason those three got marked dead is that there was an archive URL in the {{para|url}} field and the bot moved it to the {{para|archive-url}} field; the bot assumes that if someone put an archive URL in the main {{para|url}} field, the original URL was probably dead. -- GreenC 14:47, 19 August 2023 (UTC)

::{{re|GreenC}} Aaah! So that's why. I wrote the text, so I take full responsibility for the url= / archive-url= mixup. As for archive.today: I looked at our article, and it cites [https://twitter.com/archiveis/status/1081276424781287427 this tweet] from 4 January '19 in which the owner states that the .is domain might stop working soon. However, the domain is still active. In fact, the '@' handle used by the account to this day is still "@archiveis". I've used archive.today many times, including this year. It always gave me either a .is or a .ph link. Cheers, Manifestation (talk) 15:07, 19 August 2023 (UTC)

:::Yeah, it redirects to one of the 6 domains like .is or .ph .. but if one of those domains gets shut down by the registrar, he can easily switch where it redirects to, without having to change every link on Wikipedia. -- GreenC 15:24, 19 August 2023 (UTC)

::::Hmm ok. Well I guess we should honor his/her request then. For the sake of clarity, maybe the description of Job #2 / WaybackMedic 2.5 on User:GreenC bot could be expanded a little to include a mention of archive.today? archive.today is not part of the Internet Archive, so the term "WaybackMedic" is a bit misleading. - Manifestation (talk) 16:03, 19 August 2023 (UTC)

:::::Alright I updated fix #21 which also now links to Help:Using_archive.today#Archive.today_compared_to_.is,_.li,_.fo,_.ph,_.vn_and_.md. It started out as Wayback-specific then expanded to all archive providers but I kept the original name anyway. -- GreenC 16:41, 19 August 2023 (UTC)

::::::@GreenC Hi! I know that .today is the domain to be used, but every time I try to open a link with .today it returns a "This site cannot be reached" type of error, and the same goes for .ph links. The only links that work for me are the ones with .is Astubudustu (talk) 10:55, 2 April 2024 (UTC)

:::::::This is because the DNS resolver you are using is hosted on Cloudflare, and that won't work (well) with archive.today domains; see Archive.today#Cloudflare_DNS_availability -- GreenC 15:38, 2 April 2024 (UTC)

WaybackMedic 2.5 adding unnecessary URLs

I saw the bot's task run on Guardians of the Galaxy (film) [https://en.wikipedia.org/w/index.php?title=Guardians_of_the_Galaxy_%28film%29&diff=1172020348&oldid=1171941348 here] and it made edits to three references that used {{tl|Cite Metacritic}}, {{tl|Cite Box Office Mojo}}, and {{tl|Cite The Numbers}}, adding in unnecessary URLs and marking the links as dead. The citation templates construct the URLs from the given parameters (as most follow a common format on those sites) and the links were not dead. Didn't know if this was a bot issue, or the templates themselves doing something that is flagging the citations to make the bot adjust them. I can look into the templates to see what the issues may be if that is ultimately the case (and to know what to look for for the error). - Favre1fan93 (talk) 14:16, 24 August 2023 (UTC)

:That is a bot error. It is in 9 articles. I rolled them back (you got 2). Thanks for the report. -- GreenC 15:00, 24 August 2023 (UTC)

::No problem, thank you! - Favre1fan93 (talk) 15:26, 24 August 2023 (UTC)

Timestamp mismatch

This bot is changing the archive-url as [https://en.wikipedia.org/w/index.php?title=Aahvaanam&diff=1173401818&oldid=1167493279 seen here], but it is not changing the archive-date as required, creating a timestamp mismatch error, as [https://en.wikipedia.org/w/index.php?title=Aahvaanam&oldid=1173401818#cite_note-4 seen here]. I just recently emptied this category and now it has over 80 articles (when I wrote this) in it again. Your help would be appreciated. Thanks. Isaidnoway (talk) 05:57, 2 September 2023 (UTC)

:I am aware; I did it in two steps because of the way this particular job was programmed - it was easier this way. You saw it in that 30-minute gap between runs. -- GreenC 16:11, 2 September 2023 (UTC)

My bot can empty that category easily. It was 40,000 a week ago. Got it down to a few hundred edge cases, which I assume you fixed manually - thank you. I'd like to fully automate it, but right now it's all integrated into WP:WAYBACKMEDIC, which can't be fully automated, so I run it on request. -- GreenC 16:16, 2 September 2023 (UTC)

User:Isaidnoway, I'm running a bot job to convert archive.today URLs from short-form to long-form. Example. It is exposing old problems with date mismatches that are showing up in :Category:CS1 errors: archive-url -- after this bot job completes, I'll run another bot to fix the date mismatches, it will clear the tracking cat. No need to do anything manually. -- GreenC 04:57, 8 September 2023 (UTC)

:Hi GreenC! My bot is following yours today. There were several instances where your bot reformatted archive URLs (like this edit) and mine then fixed the archive dates (like my bot did in the following edit). My bot is running on :Category:CS1 errors: dates, and pulling the archive date from the archive URL. Any chance your bot could do it all in one edit? Thanks! GoingBatty (talk) 18:25, 8 September 2023 (UTC)

::I used to be able to fix archive.today problems and date mismatches in the same process, but it was semi-automated. Fixing archive.today problems can and should be full-auto, so I separated that out to its own process that uses EventStream to monitor in real time when a new short-form link shows up, log the article name, and once a month or so fix them - all full-auto. [https://uk.wikipedia.org/w/index.php?title=Judas_Priest&diff=prev&oldid=40360299 Across 100s of wikis]. The downside is this program can't fix date mismatch problems. I want to fix date mismatches automatically, and hope to do that eventually with its own process. Once I have that developed I can see about including it in the archive.today program, so it saves the extra edit when the source of the date mismatch is an archive.today short-to-long conversion.

::The tracking category will be cleared in the next few hours; it's currently generating diffs. This is a one-off event clearing out the backlog of archive.today problems, which exposed a lot of issues. Going forward there will be much smaller numbers. We both currently have bots that can clear that category on request; do you know how to update the docs for the category page? -- GreenC 23:41, 8 September 2023 (UTC)
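A rough sketch of that kind of EventStream monitor, for illustration only (the short-form pattern, the log format, and fetching the wikitext of every edited page are assumptions - the real process is more selective):

<syntaxhighlight lang="python">
import json
import re
import requests

# Wikimedia EventStreams: server-sent events for recent changes across all wikis.
STREAM = "https://stream.wikimedia.org/v2/stream/recentchange"
# Approximation of a short-form archive.today link: archive.xx/<short code>, no timestamp.
SHORT = re.compile(r"https?://archive\.(?:today|is|ph|li|fo|md|vn)/\w{4,6}(?![\w/])")

def watch(logfile="shortlinks.log"):
    with requests.get(STREAM, stream=True) as resp, open(logfile, "a") as log:
        for line in resp.iter_lines(decode_unicode=True):
            if not line or not line.startswith("data: "):
                continue
            change = json.loads(line[len("data: "):])
            if change.get("type") != "edit":
                continue
            # Fetch the current wikitext and log the page if it now has short-form links.
            api = f"https://{change['server_name']}/w/api.php"
            text = requests.get(api, params={
                "action": "parse", "page": change["title"],
                "prop": "wikitext", "formatversion": 2, "format": "json",
            }).json().get("parse", {}).get("wikitext", "")
            if SHORT.search(text):
                log.write(f"{change['wiki']}\t{change['title']}\n")
</syntaxhighlight>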

:::Not sure which category page you're referring to, but most of the text on these category pages comes from Help:CS1 errors, so if you updated the help page, it would also appear on the appropriate category page. GoingBatty (talk) 03:15, 9 September 2023 (UTC)

:::::Category:CS1 errors: archive-url. Do you want me to include your bot in the doc as available to clear the cat on-request? I'm going to mention WaybackMedic is available, but only if there are more than 500 entries. -- GreenC 14:25, 9 September 2023 (UTC)

:::::I don't have a bot to clear :Category:CS1 errors: archive-url. GoingBatty (talk) 18:21, 9 September 2023 (UTC)

::::::Oh I see - I misinterpreted what you said above; I thought it was fixing mismatched dates but it was actually fixing an incomplete date. -- GreenC 19:12, 9 September 2023 (UTC)

Economy of Zimbabwe

I need some help Mindthem (talk) 21:13, 25 September 2023 (UTC)

:@Mindthem: How would you like the bot to help with the Economy of Zimbabwe article? GoingBatty (talk) 19:20, 29 September 2023 (UTC) {{tps}}

Bot put italics in strange places

I don't know what [https://en.wikipedia.org/w/index.php?title=Counsel%27s_Opinion&diff=1180484599&oldid=1108222641 happened here], but the bot appears to have put italics in places where they didn't belong, and then missed putting them in [https://en.wikipedia.org/w/index.php?title=Counsel%27s_Opinion&diff=1180927666&oldid=1180484599 where they did belong]. Given that the bot had to edit three times, I imagine this bot run was stressful for you. If this code is still active, it might need yet another debugging. – Jonesey95 (talk) 18:26, 19 October 2023 (UTC)

:Yeah, this was a pain; every time I thought it was done, some new issue came up. And getting those ticks right, in the right place, after the fact, wasn't easy. Anyway this task is done for me (deletion of {{tlx|BFI}} from 1,200 articles). If you see any problems they need manual adjustment. I don't think the number of problems is very large, from spot checking. -- GreenC 18:35, 19 October 2023 (UTC)

::I think you are correct, based on my perusal of the list of Linter errors. – Jonesey95 (talk) 18:54, 19 October 2023 (UTC)

Buck Goldstein

Hi there! In this edit, your bot changed an incorrect {{para|url}} parameter, which added the article to :Category:CS1 errors: URL. Should the bot have done something different, or should it ignore the {{para|url}} parameter and only update the {{para|archiveurl}}/{{para|archive-url}} parameter? Thanks! GoingBatty (talk) 06:02, 18 December 2023 (UTC)

:You mean Special:Diff/1187499427/1190066019. The bot that runs this process is a global bot; it is not programmed to handle templates in different languages, and it only operates on the URL itself, not with template knowledge. The bot didn't do anything wrong that wasn't already there; its only purpose is to normalize archive.today URLs wherever they happen to be. If that caused the pre-existing error to be exposed in the tracking cat, it's a step forward. -- GreenC 06:32, 18 December 2023 (UTC)

bug report

At this edit, GreenC bot copied a malformed wayback machine url from {{para|url}} into {{para|archive-url}}. It ought not to have done it like that.

The wayback machine url is malformed because its timestamp is not an acceptable length (14 digits preferred, 4 or 6 tolerated). cs1|2 emits an error message for single-digit timestamps and another error message when the values assigned to {{para|url}} and {{para|archive-url}} are the same.

Trappist the monk (talk) 01:46, 30 January 2024 (UTC)

:Also, not clear where {{para|archive-date|2007-06-15}} came from.

:—Trappist the monk (talk) 01:49, 30 January 2024 (UTC)

Bug report: Incorrect archive-date

Hi there! In this edit, the bot added {{para|archive-date|18990101080101}}. Is there something you could add to the bot to prevent the addition of incorrect dates such as this? Thanks! GoingBatty (talk) 18:22, 30 January 2024 (UTC)

:I do have warnings but apparently was lazy and forgot to check the logs. -- GreenC 20:08, 30 January 2024 (UTC)

bug report (2)

{{cl|CS1 errors: archive-url}} recently bloomed. I have just fixed these four articles broken by Wayback Medic 2.5:

Every error was a {{para|archive-date}} mismatch with the {{para|archive-url}} timestamp. {{para|archive-date}} was always off by one day; always earlier than the time stamp except for this one from 2024 Noto earthquake.

Trappist the monk (talk) 18:57, 1 February 2024 (UTC)

:And then there is this one that is off by a couple of weeks, this one off by a year. So it looks like what I wrote above may not hold much water...

:—Trappist the monk (talk) 19:08, 1 February 2024 (UTC) 19:37, 1 February 2024 (UTC)

The date mismatch error preexisted. The bot only made it more obvious, so that CS1|2 error-checking is now able to see it. I would prefer to fix the archive-date at the same time as expanding archive.today URLs from short to long form (per RfC requirement). However this task is universal: it operates on many wiki language sites and does not have knowledge of template names or arguments in other languages. It only expands a URL wherever it may be; it doesn't look at templates. That would require another universal bot, I guess, that can operate on CS1|2 templates in multiple languages. If you want to write one, I have the approval to run it. The reason the dates are frequently offset by 1 day: users add an archive.today link they just created and set {{para|archive-date}} based on their local time zone, but archive.today uses UTC, which has already passed into a new day. The ones offset by a week or year are user entry errors. -- GreenC 21:49, 1 February 2024 (UTC)

:User:Trappist the monk: I have written a separate bot that fixes the date mismatch error populating {{cl|CS1 errors: archive-url}}. Example: Special:Diff/1248926553/1248972462. It retrieves the date from the "suggested" date generated by CS1|2 in the HTML warning message. This way it can run on other language wikis without needing to deal with language differences. It falls back to ISO mode if it can't get a suggestion. Do you think it is OK to rely on the "suggested" date generated by CS1|2? -- GreenC 14:25, 2 October 2024 (UTC)

::The suggested date is simply the date portion of the archive-url timestamp formatted according to the format specified by {{para|df}} → the global {{tld|use xxx dates}} → format of the date in {{para|archive-date}} → YYYY-MM-DD. Getting the date from the html seems a reasonable thing to do; the grunt work has already been done.

::—Trappist the monk (talk) 15:05, 2 October 2024 (UTC)
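For illustration, a sketch of the fallback described above - deriving the date from the 14-digit archive-URL timestamp and emitting it in ISO form ({{para|df}} and {{tld|use xxx dates}} handling omitted):

<syntaxhighlight lang="python">
import re
from datetime import datetime

# Wayback and archive.today long-form URLs carry a 14-digit UTC timestamp, e.g.
# https://web.archive.org/web/20160705134151/http://example.com/
TS = re.compile(r"/(\d{14})[a-z_*]*/")

def suggested_archive_date(archive_url):
    """Return the snapshot date as YYYY-MM-DD, or None if no timestamp is found."""
    m = TS.search(archive_url)
    if not m:
        return None
    return datetime.strptime(m.group(1), "%Y%m%d%H%M%S").strftime("%Y-%m-%d")

# suggested_archive_date("https://archive.today/20121120012223/http://example.com/")
# -> '2012-11-20'
</syntaxhighlight>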

bug report (3) Bot ignores cbignore

Here https://en.wikipedia.org/w/index.php?title=Scott_Boman&diff=1226780032&oldid=1226698153 I noticed that the bot edited an external link with cbignore after it. I compared the links before and after the edit to see why the cbignore template was there. The long and short links are from different dates and display different content. The altered link no longer contained the relevant content. This would not matter if the bot observed the cbignore.--198.111.57.100 (talk) 17:05, 4 June 2024 (UTC)

:OK this problem is complicated. There are multiple things going on.

:* All short-form archive.today links need to be expanded to long form. This is required as Wikipedia does not allow URL shortening which has security problems.

:* Archive.today has a bug. When resolving short links for pages saved from WebCite, it gives an incorrect long form.

:::Incorrect: http://archive.today/UfV6G --> https://archive.today/20121120012223/http://romeoareateaparty.org/wordpress/2012-candidates-2/races/u-s-senate/

:::Correct: http://archive.today/UfV6G --> https://archive.today/20121120012223/https://www.webcitation.org/6CIutMLaZ?url=http://romeoareateaparty.org/wordpress/2012-candidates-2/races/u-s-senate/

:Notice the "Correct" version includes the original WebCite URL. The "Incorrect" version excludes the WebCite URL.

:* GreenC bot has a bug in that it can't see cbignore when making these changes.

:* GreenC bot has a bug insofar as it doesn't detect the Archive.today bug.

:So I need to make some adjustments to work around the Archive.today bug. I also need to report the bug to Archive.today though there is no guarantee they will fix it. -- GreenC 17:28, 4 June 2024 (UTC)

::*Update: the bug is reported to Archive.today -- GreenC 18:14, 4 June 2024 (UTC)

::*:Archive.today fixed it. -- GreenC 21:01, 4 June 2024 (UTC)

:::Thank you!--198.111.57.100 (talk) 16:27, 6 June 2024 (UTC)

Job 18 showing up in WPCleaner

I'm running WPCleaner and noticed that Error 95 (Editor's signature or link to user space) has flagged the bot, specifically Job 18, on a ton of pages (Arundhathi Subramaniam is one example). It looks like the bot signature is in the "reason" field of the template

{{verify source |date=September 2019 |reason=This ref was deleted Special:Diff/893567847 by a bug in VisualEditor and later restored by a bot from the original cite located at Special:Permalink/893405019 cite #4 - verify the cite is accurate and delete this template. User:GreenC bot/Job 18

I don't have a count of the pages, but it's not an insignificant amount from what I can see. Lindsey40186 (talk) 02:16, 11 June 2024 (UTC)

:I don't know about WPCleaner, or what the error message means. It was an old bot job, that no longer runs. It was a peculiar and difficult situation. -- GreenC 03:56, 11 June 2024 (UTC)

Typo

After Wikipedia:Link_rot/URL_change_requests#deccanchronicle.com, the bot is adding links to Deccan Chronical instead of Deccan Chronicle. See [https://en.wikipedia.org/w/index.php?title=Karthik_Subbaraj&diff=1227476768&oldid=1227217248] and [https://en.wikipedia.org/wiki/Special:WhatLinksHere/Deccan_Chronical]. DareshMohan (talk) 18:59, 14 June 2024 (UTC)

:Oh sheesh, thanks. Fixed Special:Diff/1228320785/1229089609 in 829 pages. -- GreenC 20:17, 14 June 2024 (UTC)

Thanks

Hey, I just want to say thank you for using the Wayback Machine for MTV News for my citations. Can you do that for Drag-On's album Hell and Back? I'll post the original link. JuanBoss105 (talk) 13:30, 2 July 2024 (UTC)

:Hey, I found a link to a MTV.com source that can be used for Rocafella. Can you add it using the wayback machine?

:https://www.mtv.com/news/c1psz3/state-property-members-stress-independence-dont-take-orders&ved=2ahUKEwiS1cGYwIiHAxUdD1kFHf0oCVYQFnoECCIQAQ&usg=AOvVaw1m9yMSZqvcQC7xuV2PKS9D JuanBoss105 (talk) 13:53, 2 July 2024 (UTC)

::User:JuanBoss105: I found an archive URL with a different source URL: https://web.archive.org/web/20150122173241/http://www.mtv.com/news/1498885/state-property-members-stress-independence-dont-take-orders/

::I found it using the archive's search feature: [https://web.archive.org/mtv.com/search/%22state%20property%20members%20stress%20independence%22 Search: "State Property Members Stress Independence"].

::You can find other archive URLs at MTV.com this way.

::For example in Special:Diff/1231668617/1232196891 you added https://www.mtv.com/news/v0uzg8/norah-jones-tops-a-mil-at-1-kanye-west-settles-for-2 you can find the archive URL by going to this search page: [https://web.archive.org/mtv.com/search/%22norah%20jones%20tops%20a%20mil%22 Search: "Norah Jones tops a mil"]. -- GreenC 16:07, 2 July 2024 (UTC)

Tampabay.com

Stop running this right now on tampabay.com links. Every one I've checked is wrong. It is adding archive links (okay) to currently live articles, and tagging them as dead (wrong). It is also overriding an explicit |url-status=dead to |url-status=live when it encounters redirects to the main page of tampabay.com. Tired of fixing these because GreenC bot is on a roll.   ▶ I am Grorp ◀ 00:21, 12 July 2024 (UTC)

: Clarification: Not every single instance, but too many, for sure.   ▶ I am Grorp ◀ 00:31, 12 July 2024 (UTC)

Oh shoot, looks like they used an exotic redirect mechanism and it fooled the bot. I have a way around it, but this is the first I became aware of it. I'll have to reprocess. Anyway, thanks for the info. BTW you should post error reports in the section linked in the edit summary; that is the discussion for this job. -- GreenC 00:38, 12 July 2024 (UTC)

: {{reply|GreenC}} That was gibberish to me so I found this talk page. I just now put a link from there to here. You're welcome to copy this over there, and delete this thread, if that makes more sense. I'll watchlist both.   ▶ I am Grorp ◀ 00:42, 12 July 2024 (UTC)

:: Not all of the edits were incorrect or needed correcting. If you want a list of which ones I corrected, then they're in my contributions list from 22:10, 11 July 2024 to 00:37, 12 July 2024 (UTC). All but the first of my corrections have "GreenC bot" in the edit summary. (I edit in a topic area that relies heavily on tampabay.com, many of which are on my watchlist.)   ▶ I am Grorp ◀ 00:53, 12 July 2024 (UTC)

Grorp,

  1. Special:Diff/1233941553/1233989259 - this appears to be a one-off, maybe a network transient. When I run the page again (locally) the problem does not happen. I'd be surprised if there are more like this. It can happen but I don't think it's systematic or common. If you see more, let me know.
  2. Special:Diff/1233948702/1233989465 - exotic redirect problem noted above
  3. Special:Diff/1233957098/1233990527 - ditto
  4. Special:Diff/1233959661/1233991011 - archive.today I manually verify beforehand. This one is a manual verification error, which is rare, but not impossible. I can provide a list of the archive.today URLs that were added (193).

I can redress the exotic redirect, which looks to be limited to URLs ending in .ece -- GreenC 01:29, 12 July 2024 (UTC)

:Update: I found 29 instances of the exotic redirect among the set of 6,846 pages, or less than 1/2 of one percent. Of the archive.today error, there was one in 193, or about the same 1/2 of one percent. Thanks for the report; if you find any other problems let me know. -- GreenC 02:42, 12 July 2024 (UTC)

:: Thanks. Will do.   ▶ I am Grorp ◀ 05:45, 12 July 2024 (UTC)

{{od}}I have no idea how to decipher/restore/resurrect these old pqarchiver links (like in your fourth example above). If there's a writeup, or some tips, please point me in the right direction. I do come across these [https://en.wikipedia.org/w/index.php?search=insource%3A%22pqarchiver%22+scientology&title=Special:Search&profile=advanced&fulltext=1&ns0=1 fairly regularly] in this topic area I edit; many point to old sptimes.com news articles (St Petersburg Times was bought out by Tampa Bay Times). If there is any way I can resurrect an actual copy of some of these old articles, I'd like to try to fix some of them.   ▶ I am Grorp ◀ 05:45, 12 July 2024 (UTC)

:I found 63 pqarchiver links (out of the 193 archive.today links added) and they all worked, except this one. If it doesn't exist at archive.org or archive.today it's probably gone forever; you'd probably need to find an alternate source. -- GreenC 06:09, 12 July 2024 (UTC)

Other wikis

Do you ever deploy the bot to other wikis to assist with link maintenance and updates? Imzadi 1979  18:20, 28 July 2024 (UTC)

:It's a very big job to internationalize the bot for templates, dates etc - I'd like to eventually. But it does update links in the IABot database (iabot.org), and IABot then updates 300+ wikis based on the contents of the database. Thus when my bot discovers a dead link on enwiki, it updates enwiki adding an archive URL, then also updates the IABot database changing the status to "dead" and adding the archive URL into the database. Then IABot scans the 300+ other wikis and when it finds that link, it adds the archive URL, taken from the database. -- GreenC 18:55, 28 July 2024 (UTC)

::I was curious if it would work on the [https://wiki.aaroads.com AARoads Wiki], which uses the same templates as the English Wikipedia, so no internationalization needed. Imzadi 1979  19:12, 28 July 2024 (UTC)

:::IABot would be better since it continuously scans pages and fully automatically replaces dead links. WaybackMedic does more specialized work on a per-domain basis for many types of issues, with manual oversight. A good place to post a request is https://meta.wikimedia.org/wiki/User_talk:InternetArchiveBot -- GreenC 20:49, 28 July 2024 (UTC)

bot destructive

I just had to do a manual purge on Eyjafjallajökull after the bot had visited, as the page was, from the top, displaying "The time allocated for running scripts has expired." over and over. This is a complex page calling in a couple of data-rich templates, usually rendered well within the normal parsing allowance of 10 seconds, but if the Wikipedia infrastructure is under load it can fail on an edit. The bot accordingly presently needs a (? manual) check of page output after every use. Often the failure is towards the end of such a page, with the references, so it is only obvious on a full-page manual skim. Please ensure you do this, as many high-quality pages have reference lists running into the 100's with processing times around the 5-second mark. ChaseKiwi (talk) 21:16, 3 August 2024 (UTC)

Bug report - templates in images in infoboxes

Just wanted to flag Special:Diff/1239809626, doesn't seem to recognise there's a template in that URL. Primefac (talk) 12:03, 12 August 2024 (UTC)

:Oops, my regex was stopping at "}" instead of "{" - I had it reversed. Thanks. -- GreenC 18:23, 12 August 2024 (UTC)

Job 15 GA mismatches stoppage

User:GreenC bot/Job 15 (GA mismatches) has stopped after Wikipedia:Good articles/all was [https://en.wikipedia.org/w/index.php?title=Wikipedia%3AGood_articles%2Fall&diff=1237436963&oldid=1229147724 edited]. Adabow (talk) 10:07, 13 August 2024 (UTC)

:User:Adabow, because of Special:Diff/1229147724/1237436963 by User:Beland. The bot is not aware of Wikipedia:Good articles/all2. It aborted because the number of entries in Wikipedia:Good articles/all is below a magic number ie. it looks suspicious. Everything worked, except I neglected to add an email reminder (only logs) so I didn't notice. Thanks for the ping. -- GreenC 16:17, 13 August 2024 (UTC)

:::User:Beland could you verify the lists are correct? There appears to be duplication at the top, with two tables of contents - for example, two entries for "Agriculture, food, and drink". There is also a line that says "View the entire list of all good articles or" which points to Wikipedia:Good articles/all .. is that still accurate? -- GreenC 16:22, 13 August 2024 (UTC)

:::The duplicate TOCs were being transcluded from the per-topic pages. I suppressed them with "noinclude" tags. The link from subpages still points to /all, but once readers get there they will see "all" is split between /all and /all2. I think that's probably fine for now, unless we want to just stop altogether with combining multiple per-topic pages into one or two massive scrollable lists. -- Beland (talk) 20:33, 13 August 2024 (UTC)

::I think this change could break three bots: FACBot, LivingBot, and GreenC bot. There is a message in the page that says changes to the page layout will break the bots (GreenC bot is not mentioned; I will add it later). Bots should be notified and given time to adjust. (Looks like the two bots were notified, ty.) There might be other tools and bots as well. -- GreenC 16:34, 13 August 2024 (UTC)

::Actually it looks like the creation of "all2" was in February: Special:Diff/1066123344/1229147724 .. so my bot has not been running properly since. Trying this to better communicate: Special:Diff/1237436963/1240124928 -- GreenC 16:46, 13 August 2024 (UTC)

Archive.today isn't accessible from Italy

Hi, I saw your bot replaced archive.is links with the respective archive.today ones in some pages on Italian Wikipedia ([https://it.wikipedia.org/w/index.php?title=Capitoli_de_Le_bizzarre_avventure_di_JoJo&diff=141129393&oldid=141126904 here is an example]). However, archive.today redirects to archive.ph, which has apparently been [https://www.commissariatodips.it/profilo/centro-nazionale-contrasto-pedopornografia-on-line/index.html blocked] by Italian Internet providers after being reported by police for hosting illegal content. [https://imgur.com/a/archive-ph-status-italy-as-of-09-17-24-Dg9GfZY This] is a screenshot I took and [https://feddit.it/post/6613497 here] are other people talking about it. I wanted to warn you about this because now archived URLs aren't accessible and can't be checked without using proxies. Hope you can fix this. Un mondo a stelle e strisce (talk) 15:55, 17 September 2024 (UTC)

:User:Un mondo a stelle e strisce, thank you for this information. Archive.today has problems sometimes. They created multiple domains: archive.is, .fo, .li, .today, .vn, .md, .ph .. do you know if all are blocked in Italy? I read the [https://feddit.it/post/6613497 discussion] (6 months old) and this appears to be something done by the postal police? You could also try using a different DNS resolver that isn't going through Cloudflare; this is the problem for most people, due to a policy disagreement between Archive.today and Cloudflare. -- GreenC 16:36, 17 September 2024 (UTC)

::archive.ph is the only one blocked; the others are all fine and working except for .today, which redirects to it and therefore isn't accessible either. According to the warning displayed when trying to reach the address, the postal police took this measure because they found child pornography content on the website. I don't think the problem has anything to do with Cloudflare, as the page is still accessible via proxy. Un mondo a stelle e strisce (talk) 21:12, 17 September 2024 (UTC)

:::If you want, we can change everything to .is or whichever. In the mean time, I have disabled the twice-monthly process that converts everything to .today -- GreenC 21:24, 17 September 2024 (UTC)

::::Yes, replacing things with .is would be great, thanks for your help. Un mondo a stelle e strisce (talk) 08:26, 18 September 2024 (UTC)

:::::User:Un mondo a stelle e strisce, changed the first 3,000 pages, which is about 10%; now there is a wait time before continuing ([https://it.wikipedia.org/w/index.php?title=Levitating&diff=prev&oldid=141266255 example]). -- GreenC 01:32, 26 September 2024 (UTC)

User:Un mondo a stelle e strisce, this job is complete. Keep in mind, archive.today will continue to be added in many ways, by editors and bots. If you want to clear them out again, drop me a note. Or if this ban is ever lifted, drop me a note. Cheers. -- GreenC 15:18, 17 October 2024 (UTC)

:Yes, I'll let you know about any further developments. Thanks very much for your help. Un mondo a stelle e strisce (talk) 16:14, 17 October 2024 (UTC)

"url-status=usurped" causes a CS1 message

Hi GreenC!

I just noticed that the GreenC bot has flagged many refs as part of an effort to combat the passive spamming of the Judi gambling syndicate.

However, | url-status=usurped is currently causing a CS1 maintenance message. I am seeing these messages because I opted to make them visible through my common.css. Normally, they can not be seen.

When I preview a page with a usurped ref, it shows this warning at the top:

:"Script warning: One or more (...) templates have maintenance messages; messages may be hidden (help)."

Also, with me, the altered refs have this bit tagged at the end:

:"CS1 maint: unfit URL (link)"

See :Category:CS1 maint: unfit URL, which currently has 48,594 entries.

Again, the maintenance message is normally not visible, not even to logged-in users. So this isn't an acute problem.

I believe the maintenance message is shown incorrectly. If the URL has been usurped, but the original page was properly archived, then the ref as used on Wikipedia is probably not "unfit", right? What can be done about this?

Cheers, Manifestation (talk) 19:09, 24 September 2024 (UTC)

:It looks like we are tracking all usages of unfit/usurped, even legitimate uses, and this automatically creates a maintenance message. I don't know what the rationale is. Maybe someone wants to know where the usurped URLs are? -- GreenC 19:35, 24 September 2024 (UTC)

::I have [https://en.wikipedia.org/w/index.php?title=Help_talk:Citation_Style_1&diff=1247745526 started] a thread about this at Help talk:Citation Style 1. This has to be a bug. Cheers, Manifestation (talk) 19:41, 25 September 2024 (UTC)

Oil for your bot

style="background-color: #fdffe7; border: 1px solid #fceb92;"

|rowspan="2" style="vertical-align: middle; padding: 5px;" | 100px

|style="font-size: x-large; padding: 3px 3px 0 3px; height: 1.5em;" | Oil for your bot

style="vertical-align: middle; padding: 3px;" | A hard working bot deserves a refreshing glass of motor oil! Big Blue Cray(fish) Twins (talk) 09:26, 18 November 2024 (UTC)

Question about the Wikipedia:Good_articles/all page

Please see my question at Wikipedia_talk:Good_articles/all#Question to the bots. (I wrote it there because I am asking the same question to 3 bots.) Thank you. Prhartcom (talk) 20:03, 9 December 2024 (UTC)

bot "reformatting" valid dates to URL strings

I noticed a few edits such as [https://en.wikipedia.org/w/index.php?title=Beautiful_(Mariah_Carey_song)&diff=prev&oldid=1263681013 this one] where the bot replaced the date with a portion of the URL. Looks like it's specifically happening with webcitation.org URLs that don't have actual dates within the URL itself. = paul2520 💬 19:57, 18 December 2024 (UTC)

:User:Paul2520, problem in the bot fixed. I see you corrected the pages - about 27. -- GreenC 21:34, 18 December 2024 (UTC)

:(BTW that string is actually a date encoded in base62 - this [https://github.com/greencardamom/WebCiteBase62Decoder repo] will decode it) -- GreenC 21:37, 18 December 2024 (UTC)
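A minimal sketch of that decoding, assuming the 0-9A-Za-z digit order and a microseconds-since-the-Unix-epoch value (the linked repo is the authoritative reference):

<syntaxhighlight lang="python">
from datetime import datetime, timezone

ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"  # assumed order

def webcite_id_to_date(snapshot_id):
    """Decode a WebCite snapshot ID (e.g. '6CIutMLaZ') into a UTC datetime."""
    value = 0
    for ch in snapshot_id:
        value = value * 62 + ALPHABET.index(ch)
    # The decoded integer is taken to be microseconds since the Unix epoch.
    return datetime.fromtimestamp(value / 1_000_000, tz=timezone.utc)

# webcite_id_to_date("6CIutMLaZ") -> roughly 2012-11-20 01:22 UTC, consistent with the
# archive.today timestamp 20121120012223 quoted in "bug report (3)" above.
</syntaxhighlight>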

::TIL! Thanks for clarifying (and fixing). = paul2520 💬 22:19, 18 December 2024 (UTC)

Bot makes error when seeing non-CS1 already-archived URLs

The domain xenu-directory.net was usurped, and a few days ago I manually checked/fixed all occurrences in mainspace. In the case where a citation uses the square bracket method like {{code|[https://web.archive.org/web/...etc. title-text]}} and the url is already a wayback machine URL, your bot is incorrectly adding {{tl|usurped}} which renders for readers as little "usurped" tags. (The bot works just fine when the citation is CS1 template style.)

The following recent edits by GreenC bot include an incorrect tag (and it's still running and adding more):

  1. https://en.wikipedia.org/w/index.php?title=Aaron_Saxton&curid=26660450&diff=1264164606&oldid=1262817586
  2. https://en.wikipedia.org/w/index.php?title=APA_Task_Force_on_Deceptive_and_Indirect_Methods_of_Persuasion_and_Control&curid=7589571&diff=1264169964&oldid=1262815584
  3. https://en.wikipedia.org/w/index.php?title=Brain-Washing_(book)&curid=7881283&diff=1264176579&oldid=1262818821
  4. https://en.wikipedia.org/w/index.php?title=Citizens_Commission_on_Human_Rights&curid=20949376&diff=1264181897&oldid=1262810467
  5. https://en.wikipedia.org/w/index.php?title=Hubbard_v_Vosper&curid=37647003&diff=1264202456&oldid=1262815517
  6. https://en.wikipedia.org/w/index.php?title=Inside_Scientology:_How_I_Joined_Scientology_and_Became_Superhuman&curid=9497749&diff=1264203752&oldid=1262819362
  7. https://en.wikipedia.org/w/index.php?title=List_of_Masonic_buildings_in_the_United_States&curid=31726117&diff=1264221816&oldid=1262815969

I had to check about 2 dozen edits from your recent run (shown on my watchlist) and the above errors will need to be fixed by changing their citations into CS1 cite style. I really don't enjoy doing double work! Fix your bot to recognize when a citation is from archive.org instead of a usurped domain.   ▶ I am Grorp ◀ 03:48, 21 December 2024 (UTC)

:It's correct. See documentation for {{tlx|usurped}}. -- GreenC 03:51, 21 December 2024 (UTC)

Bot edit ate a bunch of unrelated text

See [https://en.wikipedia.org/w/index.php?diff=1268917035&oldid=1260881346&title=Josep_Maria_Subirachs]. The existing wikitext was obviously messed up (template inside an unclosed template inside of a reference), but the bot still shouldn't screw it up that badly. Jay8g [VTE] 05:41, 12 January 2025 (UTC)

:The nature of GIGO (Garbage In / Garbage Out). Not that I designed it that way. Probably it tried to find a closing }} and ate everything in between. Sometimes it might be a small amount, other times a lot, depending on the article and location. I'd be happy to hear suggestions on how to avoid this particular problem, or even better, for someone to write a bot that detects and fixes these - AFAIK no one has ever been able to do it. I did make an attempt once and had some success, but it was not fully automated. -- GreenC 06:13, 12 January 2025 (UTC)
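One possible guard, sketched here for illustration (not the bot's actual code): track nesting depth while scanning for the closing braces, and skip the template entirely when they never balance:

<syntaxhighlight lang="python">
def find_template_end(text, start):
    """Return the index just past the '}}' matching the '{{' at `start`, or None
    if the braces never balance (the GIGO case) and the template should be skipped."""
    assert text.startswith("{{", start)
    depth = 0
    i = start
    while i < len(text) - 1:
        pair = text[i:i + 2]
        if pair == "{{":
            depth += 1
            i += 2
        elif pair == "}}":
            depth -= 1
            i += 2
            if depth == 0:
                return i
        else:
            i += 1
    return None  # unbalanced: leave this template alone rather than eat the rest of the page
</syntaxhighlight>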

Issues with comment inside reference

Happy New Year! I'm working through :Category:CS1 errors: dates and ran across a couple edits by your bot like [https://en.wikipedia.org/w/index.php?title=Juneau_Monument&diff=1268917726&oldid=1263467142 this one] where the citation template has the {{para|access-date}} commented out for some reason, and your bot doesn't seem to be expecting that. Happy editing! GoingBatty (talk) 21:32, 14 January 2025 (UTC)

:GoingBatty, sorry, did you find many? I spent time looking at this, and have concluded the bot can't support this without significant work. For now I detect and skip the template. It can edit templates with wikicomments, just not edit fields within templates where a wikicomment co-exists. Also, I'm in the middle of a large batch that was started before this came up. Do you mind if I complete this batch? Hopefully it won't be too many. -- GreenC 03:02, 15 January 2025 (UTC)

::I found two: the one I mentioned earlier and [https://en.wikipedia.org/w/index.php?title=Peter_Hessler_bibliography&diff=prev&oldid=1268927993 this one]. If I find a lot more, I'll let you know. Happy editing! GoingBatty (talk) 04:15, 15 January 2025 (UTC)

Bot contributing to CS1 errors URL backlog

On this edit, the bot populated the url field with wrong input that causes a CS1 URL error. ––kemel49(connect)(contri) 16:48, 17 January 2025 (UTC)

:User:KEmel49: It's GIGO (Garbage In / Garbage Out). Notice https://web.archive.org/web/20190621113030/https:/filmography.bfi.org.uk/person/642186 - it has "https:/filmograph" .. there is only one slash where there should be two. BTW the point of error tracking categories is to catch errors; if the bot contributed to the category, it did the right thing - now we see the problem where before it was obscure. In the 9 years I have been running this bot, it's the first time I have seen this error. I could try to check for and fix it - I'll take a look - but I think this is an extremely rare error.

:A better question is who made the error. The citation was originally fine, then deleted from the article in Special:Diff/1261399458/1261401626, then readded a few days later in its broken form in Special:Diff/1261446328/1261487369. Even more weirdly, the citation has the fingerprints of reFill (the "website=web.archive.org"), so it looks like it was copied in from a different article. Overall I think the edits made by User:Vialeoncino (who is a WP:SPA, probably a COI - the article has a history of COI problems) look poor quality, and someone might consider reviewing or reverting the whole thing back to August 2024. -- GreenC 17:37, 17 January 2025 (UTC)

::I reverted the article back to August 2024, with a talk page section. -- GreenC 17:49, 17 January 2025 (UTC)

::I fixed all articles on Enwiki that have this problem (about 1,300) per Wikipedia:Link_rot/URL_change_requests#missing_slash (Example Special:Diff/1264570681/1270059998) -- GreenC 18:46, 17 January 2025 (UTC)

::I added a new feature to the bot to detect and fix, if encountered. -- GreenC 19:09, 17 January 2025 (UTC)
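::For illustration, a check of the kind just described might look like this (a hedged Python sketch, not the bot's actual implementation; only the archive.org URL above is real):
<syntaxhighlight lang="python">
import re

# Hedged sketch: repair a scheme written with a single slash, e.g.
# "https:/example.org" -> "https://example.org", while leaving schemes
# that already have two slashes alone.
def fix_missing_slash(url):
    return re.sub(r'\b(https?):/(?!/)', r'\1://', url)

broken = ("https://web.archive.org/web/20190621113030/"
          "https:/filmography.bfi.org.uk/person/642186")
print(fix_missing_slash(broken))
# https://web.archive.org/web/20190621113030/https://filmography.bfi.org.uk/person/642186
</syntaxhighlight>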

:::@GreenC, Thanks for taking time on that. I appreciate your work.––kemel49(connect)(contri) 00:50, 18 January 2025 (UTC)

Bot not escaping single-quote pairs in URLs

This is from July, so it may already be fixed, but FYI just in case. Gamapamani (talk) 08:21, 25 January 2025 (UTC)

:OK. This may be a one-off situation I've never encountered before, probably because it bypassed some functions during the Google URL conversion process. I think your solution Special:Diff/1236608894/1271703664 works so long as the page is being displayed on a Wikimedia website, but it won't work anywhere else, because it mixes two encoding schemes in the URL: percent-encoding and wiki-encoding. It violates one of the core principles of URL encoding - that there be only one encoding scheme (percent-encoding) - discussed in [https://datatracker.ietf.org/doc/html/rfc3986 RFC 3986]. This is an unfortunate situation on Wikipedia broadly, because many tools, reports, processes, etc. use URLs from Wikipedia and often have trouble telling which encoding scheme is in play when one character might be percent-encoded and the next wiki-encoded. It's ambiguous what "{{" means outside the context of Wikimedia, and even there it can create conflicting information, since some URLs actually contain those characters. Anyway, I changed it here: Special:Diff/1271703664/1271793897. Thanks for the report. -- GreenC 19:18, 25 January 2025 (UTC)
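:To illustrate the single-scheme point, here is a hedged Python example (the URL in it is invented): percent-encode the problem characters directly instead of routing around them with wiki markup.
<syntaxhighlight lang="python">
from urllib.parse import quote

# Hedged illustration: keep the URL in one encoding scheme (percent-encoding
# per RFC 3986) instead of mixing in wiki markup. The URL below is made up.
raw = "https://example.org/page/'title'/section"
print(quote(raw, safe=":/"))
# https://example.org/page/%27title%27/section
</syntaxhighlight>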

::Gave you thanks earlier for fixing the "fix" :). I absentmindedly used {{tl|''}} and of course didn't intend to insert a double prime in there... Google seems to be pretty resilient, though (I did try the link out prior to submitting, but didn't notice the visual difference in the URL). I somewhat disagree with (or am missing something in) your point about encoding, though. {{}} is part of the syntax for generating HTML; it's not going to be in the URL (unless by mistake), and if anyone is going to use raw wikitext instead, they'd better be able to parse it as well. It's just that the template didn't actually generate what I thought it would. Ironically (in terms of mixing encodings), the way to generate exactly the URL the goo.gl redirect gives out would seem to be ''. %27%27 is of course technically sound, but not textually identical. Gamapamani (talk) 09:16, 27 January 2025 (UTC)

:::You should always use percent-encoding in URLs; RFC 3986 is clear on this. Nobody has the resources to correctly determine whether every instance of {{ in a URL is actually a wiki template or part of the URL. Wikimedia rendering (usually) can, but bot and tool writers will and do have a hard time with that. Don't assume anything: percent-encoding when creating URLs is the correct thing. -- GreenC 15:15, 27 January 2025 (UTC)

::::The RFC has no bearing on how URLs are generated; otherwise we'd end up disallowing string concatenation in any programming language, etc. I can sympathize with the pain of having to parse wikitext without the official parser, and it might be considerate toward bot writers not to use template syntax inside wikilinks, but I don't think it's a requirement, as long as the meaning is obvious to a human editor and it's valid wikitext resulting in a compliant URL after parsing. Single quotes are not %-encoded by encodeURIComponent(), for example, because it's not generally needed (and the RFC has a provision for that). The bot didn't encode them either, and it was only a problem because of wikitext parsing, not for any other reason. Gamapamani (talk) 13:49, 30 January 2025 (UTC)

:::::I disagree that URLs in wikitext are wikitext. It would be like saying URLs in markup are markup, or even that URLs in HTML are HTML. The entire reason for the IETF RFC is precisely this problem, which arises with multiple encoding schemes inside URLs. -- GreenC 14:38, 30 January 2025 (UTC)

::::::https://example.org/aaa/{{((}}bbb{{))}}/ccc in wikitext may look like a URL, but it's actually the equivalent of "https://example.org/aaa/" + bbb() + "/ccc" in some other language. The RFC would apply if wikitext were a "final form" directly interpreted by user agents (as HTML is), but it's not (at least as used on Wikipedia). Gamapamani (talk) 18:33, 30 January 2025 (UTC)

:::::::I asked Perplexity.ai - It confirms what you say:

:::::::----

:::::::Key points:

:::::::* The wikitext source file, including templates like {{tld|!}}, is a separate layer of markup that is processed before the final HTML output is generated. This processing occurs within the Wikipedia/MediaWiki system and is not part of the URL itself.

:::::::* The wikitext source file is not the final form of the content and is not directly interpreted by web browsers or other URI consumers. It's an intermediate representation used by the wiki software.

:::::::* The RFC is concerned with the format and encoding of URIs as they are used in web protocols and documents, not with the internal representations or templates used in content management systems.

:::::::In conclusion, the use of wikitext templates like {{tld|!}} in Wikipedia's source files does not violate RFC 3986 because:

:::::::* It's part of an intermediate markup language, not the final URL.

:::::::* The final rendered URL complies with RFC 3986.

:::::::* The wikitext source is outside the scope of what RFC 3986 aims to standardize.

:::::::The mixing of percent-encoding and wikitext-encoding in this context is not a violation, as they operate at different layers of the content creation and rendering process.

:::::::----

:::::::Looks like I've been wrong for 10 years, in multiple discussions. Nobody had an answer before. -- GreenC 00:45, 1 February 2025 (UTC)

::::::::I think this is the first time (that I've been told of, anyway) that an AI has been used to adjudicate the validity of my points; I'm glad it agreed. :) Interesting times! But you've been quite right that all "plain" URL representations in wikitext – the overwhelming majority – should comply with the RFC, since they'll be included as-is in HTML; those containing intentional markup to alter them are, after all, pretty rare – but they can be needed sometimes. Gamapamani (talk) 08:19, 1 February 2025 (UTC)

Bot messing up archive url field

The way the bot did this edit, I fear it could mess things up with full confidence. Kindly fix that. ––kemel49(connect)(contri) 12:54, 26 January 2025 (UTC)

:User:KEmel49, another GIGO: [http://www.imir-bg.org/imir/books/malcinstvena%2520politika.pdf%257Curl-status=dead%257Carchive-url=https://web.archive.org/web/20070926235751/http://www.imir-bg.org/imir/books/malcinstvena%2520politika.pdf%257Carchive-date=26 the url] has embedded %257Curl-status=dead%257Carchive-url=https://web.archive.org/web/20070926235751/http://www.imir-bg.org/imir/books/malcinstvena%2520politika.pdf%257Carchive-date=26, which confused the bot. The solution is to fix the URL, which you did: Special:Diff/1271888544/1271948540. This is part of a bigger problem I posted about here. -- GreenC 17:19, 26 January 2025 (UTC)
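:For the record, a quick decode shows what happened there (Python, purely as an illustration): "%257C" is a double-encoded pipe, so the citation's own parameters were baked into the URL.
<syntaxhighlight lang="python">
from urllib.parse import unquote

# Hedged illustration: "%25" decodes to "%", so "%257C" decodes to "%7C",
# which in turn decodes to "|" - the citation's parameters had been folded
# into the URL itself.
fragment = "%257Curl-status=dead%257Carchive-url="
print(unquote(fragment))           # %7Curl-status=dead%7Carchive-url=
print(unquote(unquote(fragment)))  # |url-status=dead|archive-url=
</syntaxhighlight>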

BOT caused cite error

Hello, in [https://en.wikipedia.org/w/index.php?title=Roundhay_School&diff=prev&oldid=1269978928 this edit] the BOT caused a "Cite error: The named reference ":0" was defined multiple times with different content". The BOT added {{para|archive-url}} and {{para|archive-date}} to the named reference ":0", but there was a second instance of this ref, with all of the fields completed, which it did not add the extra fields to. It needs to add the fields to both instances or, better still, remove the second instance, leaving just the named ref tag with a slash closer. Keith D (talk) 23:38, 4 February 2025 (UTC)

Odd revert acting as anti-vandalism

The bot made [https://en.wikipedia.org/w/index.php?title=List_of_supertall_skyscrapers&diff=prev&oldid=1276470044 this edit] reverting vandalism. Not complaining, but it doesn't look like this was meant to be an anti-vandalism bot. Is there a reason it reverted here? TornadoLGS (talk) 02:31, 19 February 2025 (UTC)

:Honestly, I'd prefer not to speak on the specifics in the open, but I hope you agree the bot did the correct thing. -- GreenC 02:59, 19 February 2025 (UTC)

probably gigo but ...

Special:Diff/1271516870. Thought you should know.

Trappist the monk (talk) 17:03, 12 March 2025 (UTC)

:I fixed this manually a few days before, when it was flagged in the logs (Special:Diff/1266394266/1271222183), but overlooked the second, identical one. -- GreenC 19:13, 12 March 2025 (UTC)

Alone (i-Ten song)

Not sure what happened at Alone (i-Ten song), but this was a weird edit by the bot I had to revert. plicit 05:39, 16 March 2025 (UTC)

:Thanks for fixing. It's a one-off. I know what happened: Ashok Bhadra and Alone (i-Ten song) were assigned the same internal identifier by the bot, and thus shared the same data directory with overlapping content. I prevent this by using very large numbers based on nanosecond time strings and random numbers; it looks to be about a 1 in 999 trillion possibility that this would occur. Possibly there are conditions related to the computer's clock that make it not as random as I think. In 10 years and millions of edits, I think this is only the second time it has been reported. -- GreenC 06:53, 16 March 2025 (UTC)

::Did more research: a collision would actually be expected about once in 1 million pages. This is too frequent. I added two digits to the identifier, making it about once every 100 million pages on average. I could make it every billion pages, but that would increase the size of everything using the identifier; it's a tradeoff. -- GreenC 03:03, 17 March 2025 (UTC)
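::For reference, the back-of-the-envelope math here is the birthday problem; a hedged sketch follows (the identifier-space sizes are illustrative, not the bot's actual values).
<syntaxhighlight lang="python">
import math

# Hedged sketch of the birthday problem: with an identifier space of size N,
# a collision becomes more likely than not after roughly sqrt(2 * N * ln 2)
# pages - far sooner than N itself. The N values here are illustrative only.
def pages_until_collision_likely(id_space):
    return math.sqrt(2 * id_space * math.log(2))

for n in (1e12, 1e14, 1e16):
    print(f"N = {n:.0e}: ~{pages_until_collision_likely(n):,.0f} pages")
# roughly 1.2 million, 12 million, and 118 million pages respectively
</syntaxhighlight>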

Archive.today short URLs

Just a quick question: why is there a need to convert the short URL format to long form? I read WP:WEBARCHIVES#Archive.Today, but the reason for doing so doesn't seem to jump out at me. Could someone explain this in more detail? Thanks! --GoneIn60 (talk) 18:24, 17 March 2025 (UTC)

:It's URL shortening, which hides the actual URL, allowing bad actors to insert blacklisted URLs since the filters can't see what the URL really points to. The long form also helps real people see the actual URL. And if/when archive.today ever closes down, we wouldn't otherwise know what the original URL was (for square and bare links) and thus wouldn't be able to move to another archive provider. The WP:WEBARCHIVES link is a technical document meant for bot operators. Better information is at Help:Archiving a source and Help:Using archive.today, specifically Help:Using_archive.today#Use_within_Wikipedia. -- GreenC 18:52, 17 March 2025 (UTC)
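:As a rough illustration (the URL shapes below are invented examples, not taken from any article): the long form carries the snapshot date and the original URL, so the original can be recovered even if archive.today disappears, while the short form cannot.
<syntaxhighlight lang="python">
import re

# Hedged illustration with invented example URLs: the long form embeds a
# 14-digit timestamp and the original URL, while the short form is an
# opaque code that reveals nothing about the original page.
long_form = "https://archive.today/20250101000000/https://example.org/page"
short_form = "https://archive.today/abc12"

pattern = r'https?://archive\.\w+/\d{14}/(.+)'
for url in (long_form, short_form):
    m = re.match(pattern, url)
    print(m.group(1) if m else "original URL not recoverable")
# https://example.org/page
# original URL not recoverable
</syntaxhighlight>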

::That helps greatly, thank you! -- GoneIn60 (talk) 19:05, 17 March 2025 (UTC)

Issue with reference template:!

It seems the bot had an issue at this {{diff|Samuel Alito|1279362071|1277507056}} diff, changing "|" to the {{!}} template, which didn't format correctly. I've fixed it by hand, and you may have solved this over the last month, but I thought I'd flag it just in case. meamemg (talk) 18:52, 1 April 2025 (UTC)

:Thanks for the report. The problem was a missing "]" (Special:Diff/1283472039/1283518020). Normally it should not matter since it's inside a wikicomment, but in this case it confused the bot, which parsed up to the next available "]", two cites further down. -- GreenC 00:44, 2 April 2025 (UTC)
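:For what it's worth, a sketch of one workaround (Python; assumptions mine, not the bot's parser): blank out wikicomments before scanning for link brackets, so a "[" or "]" inside a comment can never drag the parse into later cites.
<syntaxhighlight lang="python">
import re

# Hedged sketch: replace each HTML comment with spaces of the same length so
# offsets are preserved, but brackets inside comments can no longer influence
# the search for an external link's opening "[" or closing "]".
def mask_comments(wikitext):
    return re.sub(r'<!--.*?-->', lambda m: ' ' * len(m.group(0)),
                  wikitext, flags=re.DOTALL)

sample = ('<!-- [https://example.org draft link missing its closing bracket -->\n'
          '{{cite web |url=https://a.example |title=A}} [https://b.example B]')
print(sample.find('[https://'))                 # 5 - inside the comment (the trap)
print(mask_comments(sample).find('[https://'))  # the first real link, after the cite
</syntaxhighlight>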

::My kingdom for a "]". Thanks for looking into it. meamemg (talk) 00:57, 2 April 2025 (UTC)

Stop linking newspapers

Can this bot be told to stop linking newspapers in references? It’s unnecessary and unwanted. - SchroCat (talk) 05:12, 2 April 2025 (UTC)

:MOS:REFLINK Οἶδα (talk) 21:36, 2 April 2025 (UTC)

::That isn’t an answer; let’s try and pretend we can write in sentences, shall we? Bots should not be doing tasks that are not beneficial, which this isn’t. I’ve reverted a few of the changes this made recently and put in a bots deny template, but it shouldn’t have been doing this in the first place. - SchroCat (talk) 04:44, 3 April 2025 (UTC)

:::Apologies, I was merely refuting your assertion that reflinks are "unnecessary and unwanted", which is not borne out by any direct evidence you have thus far provided. But more importantly, you seem to be implying that the bot is going around performing edits in which the only revision is the addition of reflinks. That is simply not true in my experience, and not even true for the pages you reverted edits on. Οἶδα (talk) 09:03, 3 April 2025 (UTC)

::::Then you misunderstand. In addition to changing the URL (which is beneficial), it is also wikilinking the newspaper, which it should not be doing. - SchroCat (talk) 09:26, 3 April 2025 (UTC)

:::::No, you misunderstood my comment. You implied it is performing this for all newspapers and that it is the ONLY revision being performed, when in reality the addition of reflinks was alongside the migration of thetimes.co.uk URLs. That was not correct. You were wrong. Οἶδα (talk) 21:44, 3 April 2025 (UTC)

::::::I misunderstood nothing. You were wrong. - SchroCat (talk) 01:11, 4 April 2025 (UTC)

::User:GreenC, I have stopped the bot running as it is undertaking a task that is down to editorial discretion, and not something that should be done en masse by a bot. Just because wikilinking newspapers can be done doesn't mean it should be done: that is not for an individual running a bot to decide, but for the editors on the individual articles. - SchroCat (talk) 07:03, 3 April 2025 (UTC)

:::Pardon me if I am wrong, but I believe this only applies to the url changes for The Times, which migrated from thetimes.co.uk to thetimes.com. Is it adding reflinks for other newspapers? Οἶδα (talk) 09:14, 3 April 2025 (UTC)

::::As above, you've misunderstood the comment. It's not about changing the URL, it's about adding in an unnecessary wikilink, which it should not be doing. - SchroCat (talk) 09:26, 3 April 2025 (UTC)

:::::You didn't respond to what I wrote (about The Times). But it doesn't matter anymore given the RfC. Οἶδα (talk) 21:45, 3 April 2025 (UTC)

Feature removed. -- GreenC 13:57, 3 April 2025 (UTC)

:RfC: Wikipedia:Village_pump_(proposals)#RfC:_work_field_and_reflinks -- GreenC 20:07, 3 April 2025 (UTC)

bot tripped-up by html comment markup

this edit; line 108.

Trappist the monk (talk) 19:24, 4 April 2025 (UTC)

:Thanks. I believe this is fixed in the core function. Sometimes wikicomments are within the bounds of a single argument (|url=, |date=), and sometimes they cross over into another |url=. -- GreenC 22:54, 4 April 2025 (UTC)
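:To spell out the distinction with a hedged Python sketch (the citations in it are invented): a comment whose text stays inside one parameter is straightforward to preserve, while a comment containing a "|" effectively crosses into the next parameter.
<syntaxhighlight lang="python">
import re

# Hedged sketch with invented citations: flag comments that contain a "|",
# since those spill across parameter boundaries and are much harder to edit
# around safely than a comment confined to a single field.
def comment_crosses_parameter(wikitext):
    return any('|' in c for c in re.findall(r'<!--(.*?)-->', wikitext, re.DOTALL))

contained = "{{cite web |url=https://example.org <!-- checked 2020 --> |title=A}}"
spanning = "{{cite web |url=https://example.org <!-- |access-date=2020-01-01 --> |title=A}}"
print(comment_crosses_parameter(contained))  # False
print(comment_crosses_parameter(spanning))   # True
</syntaxhighlight>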

tcm.com

The bot changed url-status from dead to live, even though the link is still dead: [https://en.wikipedia.org/w/index.php?title=List_of_Brian_Blessed_performances&diff=prev&oldid=1285133983]. Mika1h (talk) 16:39, 14 April 2025 (UTC)

:The new link https://www.tcm.com/tcmdb/person/17537%7C23930/brian-blessed#filmography is indeed live. You may be outside the USA, in which case there may be a regional block and there is nothing we can do about it; TCM is difficult that way. In this case, navigate as best you can. I recommend clicking on the Wayback Machine link. -- GreenC 16:49, 14 April 2025 (UTC)

Bot error in [[Special:Diff/1287280421]]

The bot erroneously edited the archive link fields out of an empty ref template inside an invisible comment which was placed there for the convenience of editors. silviaASH (inquire within) 05:45, 25 April 2025 (UTC)