Wikipedia:Bots/Requests for approval/Bot1058 10

[[User:Bot1058|Bot1058 10]]

{{Newbot|Bot1058|10}}

Operator: {{botop|Wbm1058}}

Time filed: 17:49, Friday, March 7, 2025 (UTC)

Automatic, Supervised, or Manual: automatic

Programming language(s): PHP

Source code available: User:Bot1058/mishyphenation.php

Function overview: Bypass mishyphenated links, to remove pages from User:Wbm1058/Reports/Linked mishyphenations

Links to relevant discussions (where appropriate): User talk:wbm1058#R from incorrect hyphenation, Wikipedia talk:WikiProject Redirect#Wiped/reinstated template?

Edit period(s): Daily

Estimated number of pages affected: ~2,100 on the initial run; varies on subsequent runs

Exclusion compliant (Yes/No): No

Already has a bot flag (Yes/No): Yes

Function details: This is the second in a series of tasks for this bot, which will run on Toolforge and use a database report as the basis for its edits to correct errors in links on mainspace pages. Task 9 bypasses bad piped links so that they link directly to the title displayed to readers; this task will bypass mishyphenated links. I view edits to add or remove a horizontal line, or adjust the length of a horizontal line, as sufficiently cosmetic to be made safely in automated fashion by a bot.

I created :Category:Redirects from incorrect hyphenation on 5 November 2023, to separate incorrect hyphenations from misspellings, as a lower priority for gnomes to fix than actual a–z misspellings. Misspellings need more scrutiny, as vandals can replace correctly-spelled words with a different, misspelled word. We need to avoid endorsing vandalism by correcting the spelling of the incorrect word rather than reverting back to the correct word. We continue to have an imbalance between "executive editors" declaring words to be misspelled or mishyphenated, and gnomes following their directives to correct these errors; my bot tasks are an effort to restore more balance between the executives and the gnomes.

I've built in a safeguard to ensure that this task's edits have community approval. The bot won't make edits when the redirect page triggering the edit has been edited within the past seven days. This prevents short-term disputes over what the "correct" form of hyphenation should be from causing the bot to edit-war with itself. Editors may watchlist User:Bot1058/mishyphenation pending fixes if they want to monitor these pending edits before they're made.
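
As a rough illustration of the seven-day safeguard (this is not the bot's PHP source), the check could be sketched in Python as follows. The function name `redirect_is_settled`, the timestamp format, and the choice to treat an edit exactly seven days old as settled are all assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone

# Quiet period taken from the task description: seven days.
QUIET_PERIOD = timedelta(days=7)


def redirect_is_settled(last_edit_iso: str, now: datetime) -> bool:
    """Return True when the redirect's latest edit is at least seven days old.

    `last_edit_iso` is assumed to be a UTC timestamp in the form
    "2025-03-07T17:49:00Z" (hypothetical input for this sketch).
    """
    last_edit = datetime.strptime(last_edit_iso, "%Y-%m-%dT%H:%M:%SZ")
    last_edit = last_edit.replace(tzinfo=timezone.utc)
    # Assumption: an edit exactly seven days old counts as settled.
    return now - last_edit >= QUIET_PERIOD
```

A redirect that fails this check would simply stay on the pending-fixes list until a later run.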

Examples of edits that this task will make:

For consistency, the bot will make similar changes outside of wikilinks when it determines it's safe to do so:

  • the term is in plain text
  • surrounded by spaces
  • leading space and ending period, comma, or semicolon
  • in (parentheses)
  • in "quotes"
  • leading space, followed by an "s" (plural form)
  • led by a pipe (|), assumed to be a table element
  • led by an equal sign (=), assumed to be a parameter
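
The contexts listed above could be expressed as a set of regular-expression lookarounds. The following Python sketch is only an illustration, not the bot's PHP code: the function names are hypothetical, and the rule that a term led by a pipe or equal sign must be followed by whitespace or the end of a line is an assumption, since the list does not say what may follow in those cases.

```python
import re


def build_safe_pattern(term: str) -> re.Pattern:
    """Compile a pattern matching `term` only in the listed plain-text contexts."""
    t = re.escape(term)
    alternatives = [
        rf"(?<= ){t}(?= )",       # surrounded by spaces
        rf"(?<= ){t}(?=[.,;])",   # leading space, then period, comma, or semicolon
        rf"(?<=\(){t}(?=\))",     # in (parentheses)
        rf'(?<="){t}(?=")',       # in "quotes"
        rf"(?<= ){t}(?=s\b)",     # leading space, followed by a plural "s"
        rf"(?<=\|){t}(?=\s|$)",   # led by a pipe (assumed table element)
        rf"(?<==){t}(?=\s|$)",    # led by an equal sign (assumed parameter)
    ]
    return re.compile("|".join(alternatives), re.MULTILINE)


def replace_in_safe_contexts(text: str, wrong: str, right: str) -> str:
    """Replace `wrong` with `right` wherever it appears in a safe context."""
    # A function is used as the replacement so `right` is inserted literally.
    return build_safe_pattern(wrong).sub(lambda match: right, text)
```

Anything that matches none of the alternatives is left untouched, which mirrors the stated approach of leaving unclear cases for human review.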

The bot will explicitly avoid changing filenames, to avoid breaking image links.

It will also avoid changing links when the link is part of a longer linked title. This will avoid the bot creating red links; these will be left for human review.
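
A minimal Python sketch of these two exclusions, offered as an illustration rather than the bot's actual logic: a link is rewritten only when its whole target equals the mishyphenated title, and file links are never touched. The regular expression, the `File:`/`Image:` prefix check, and the function name are assumptions; the sketch also ignores MediaWiki's first-letter case insensitivity.

```python
import re

# Matches simple wikilinks: [[target]] or [[target|label]].
LINK = re.compile(r"\[\[([^\[\]|]+)(\|[^\[\]]*)?\]\]")
FILE_PREFIXES = ("file:", "image:")


def bypass_link_targets(wikitext: str, wrong: str, right: str) -> str:
    """Retarget links whose entire target is `wrong`; leave everything else alone."""

    def fix(match: re.Match) -> str:
        target = match.group(1)
        label = match.group(2) or ""
        if target.strip().lower().startswith(FILE_PREFIXES):
            return match.group(0)  # never alter filenames
        if target.strip() != wrong:
            return match.group(0)  # part of a longer title: leave for human review
        return f"[[{right}{label}]]"

    return LINK.sub(fix, wikitext)
```

Requiring an exact match on the target is what keeps the sketch from turning a link such as a longer disambiguated title into a red link.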

The bot will leave anything not explicitly determined to be "safe" for human review. The initial run of this task is expected to leave about 120 pages for human review.

The bot will not make changes when more than two characters in a link are changed, leaving these for human review as well. One of the changes will be to a hyphen, dash, or space. A second accepted change may be to uppercase a character or put a diacritic on a character.
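
The two-character limit can be illustrated with a small Python sketch (again, not the bot's PHP code). It makes several assumptions: only same-length titles are compared, so inserting or removing a character is left for human review; exactly one change must swap one separator for another; and the set of separator characters is a guess at what "hyphen, dash, or space" covers.

```python
import unicodedata

# Assumed separator set: hyphen-minus, U+2010 hyphen, en dash, em dash, space.
SEPARATORS = {"-", "\u2010", "\u2013", "\u2014", " "}


def strip_diacritics(ch: str) -> str:
    """Return `ch` with combining marks removed (e.g. an accented E becomes E)."""
    decomposed = unicodedata.normalize("NFD", ch)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def is_safe_title_change(old: str, new: str) -> bool:
    """True if `new` differs from `old` by one separator swap, plus at most
    one uppercasing or added diacritic."""
    old = unicodedata.normalize("NFC", old)
    new = unicodedata.normalize("NFC", new)
    if len(old) != len(new):
        return False  # insertions and deletions are out of scope for this sketch
    diffs = [(a, b) for a, b in zip(old, new) if a != b]
    if not 1 <= len(diffs) <= 2:
        return False
    separator_swaps = 0
    for a, b in diffs:
        if a in SEPARATORS and b in SEPARATORS:
            separator_swaps += 1
        elif b == a.upper() or strip_diacritics(b) == a:
            continue  # accepted second change: uppercase or diacritic
        else:
            return False
    return separator_swaps == 1
```

For example, a change from "Aaron Taylor Johnson" to "Aaron Taylor-Johnson" passes, while a hypothetical retarget that alters three or more characters is rejected and left for a human.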

=Discussion=

I'm concerned that the "changes outside of wikilinks" would get into WP:CONTEXTBOT territory. You seem to be explicitly stating that you're going to alter direct quotes, which we usually take pains not to modify, and the bot may not be able to correctly identify things like hyphenated compound modifiers. Anomie 13:48, 8 March 2025 (UTC)

:Yes, regarding this:

{{box|
:Generally, a compound modifier is hyphenated if the hyphen helps the reader differentiate a compound modifier from two adjacent modifiers that modify the noun independently. Compare the following examples:
:* "small appliance industry": a small industry producing appliances
:* "small-appliance industry": an industry producing small appliances
}}

:Redirects should only be tagged as "incorrect" if they are always incorrect in all contexts. If there are contexts where they are correct, then they should be tagged as valid alternatives. My bot only edits to correct things that are incorrect in all contexts. Some editors have been over-prescriptive, tagging things as incorrect when there are contexts where they are correct. When I find these mislabeled redirects, I correct them, e.g. age-of-consent laws, where I corrected {{U|TARDIS Builder}}. This is why I created User:Bot1058/mishyphenation pending fixes – to allow time for reversions of mislabeled redirects. – wbm1058 (talk) 16:40, 8 March 2025 (UTC)

:: You're relying on every editor to use your definition of "incorrect", or at least some human reacting to every other definition quickly. That does not seem like a very reliable assumption to me. Anomie 00:12, 9 March 2025 (UTC)

::Speaking only about that example, I tagged it as incorrect because someone searching "age of consent" is almost certainly looking for it as a term (noun) rather than as an adjective. As well, the article it points to talks about the concept as a concept, not as a descriptor.

::I disagree with your assertion that redirects need to be incorrect under all contexts to use that template; I could probably comb through my own history and find examples where that's not true and you would still agree that the choice of template was appropriate.

::I think nuance & judgment matter. Correct category depends on the redirect and the target article.

::Thanks for creating it, I'm glad it's there.   — TARDIS builder           06:33, 9 March 2025 (UTC)

:::This is all about linked misspellings, not searched terms. From the article Adolescent sexuality,

::::"Sexual interest among adolescents, as among adults, can vary greatly, and is influenced by cultural norms and mores, sex education, as well as comprehensive sexuality education provided, sexual orientation, and social controls such as age-of-consent laws."

:::{{reply to|TARDIS Builder}} You were requiring the link to be piped [[age of consent|age-of-consent]] laws to avoid the "mishyphenation". This is bad, especially in the future, if someone were to write a separate article about the adjective, as a distinctly different concept from the noun. You can also tag {{no redirect|age-of-consent}} with {{tl|R from adjective}}, which I just did. – wbm1058 (talk) 12:30, 9 March 2025 (UTC)

::::Okay, I understand what you're getting at with this example. I can see that it is an alternative hyphenation in some cases.   — TARDIS builder           01:33, 10 March 2025 (UTC)

:::Noting that you did correctly tag {{no redirect|Age of consent reform}}, {{no redirect|Age of consent reform in Canada}}, and {{no redirect|Age of consent reform in the United Kingdom}}. – wbm1058 (talk) 16:24, 9 March 2025 (UTC)

:MOS:SIC states that "insignificant spelling and typographic errors should simply be silently corrected." I'm assuming that shortening or lengthening a hyphen/dash would be an acceptable silent correction. – wbm1058 (talk) 17:02, 8 March 2025 (UTC)

Anomie, I created User:Wbm1058/Reports/Linked mishyphenations/by changes, manually organizing the pages listed on User:Wbm1058/Reports/Linked mishyphenations based on my bot's console output.

I believe the context issues are limited to those that call for replacing a hyphen with a space – specifically where the redirect consists of one word (no embedded spaces) which is not a proper noun (the letter following the hyphen is lower case).

I've highlighted the four that meet these criteria – showing examples here where the hyphen is appropriate in context:

Not sure about {{no redirect|African-Americans}}, which has over 500 links. {{no redirect|African-American}} was tagged as an adjective, and then as alternative hyphenation, but it's hard to imagine the plural form being used as an adjective, so it is probably OK to go ahead with the bypass-corrections for the plural.

I can modify my algorithm to skip the pages that meet these criteria and report them on the console as likely-valid alternatives in some contexts.
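
The skip criteria described above (the fix replaces a hyphen with a space, the redirect is a single word with no embedded spaces, and the letter after each hyphen is lower case) might be sketched in Python as follows. This is an illustration only, not the bot's PHP code; the function name is hypothetical, and the requirement that every hyphen in the redirect be followed by a lowercase letter is an assumption about how multi-hyphen titles would be handled.

```python
def is_likely_valid_in_context(redirect_title: str, target_title: str) -> bool:
    """True when a hyphenated redirect looks like a compound modifier that
    may be correct in some contexts, so automated bypassing should be skipped."""
    # Must be one "word": hyphens present, no embedded spaces.
    if " " in redirect_title or "-" not in redirect_title:
        return False
    # The proposed fix must be exactly "replace hyphens with spaces".
    if redirect_title.replace("-", " ") != target_title:
        return False
    # Not a proper noun: each part after a hyphen starts with a lowercase letter.
    parts_after_hyphen = redirect_title.split("-")[1:]
    return all(part[:1].islower() for part in parts_after_hyphen)
```

Titles for which this returns True would be reported on the console instead of being edited.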

I'm OK with running this on an as-needed or on demand basis rather than daily, in a hybrid between automatic and supervised, where I make a dry run with the $objwiki->edit commented out, and review User:Wbm1058/Reports/Linked mishyphenations and the console report for any issues needing to be addressed before running it in automated mode. – wbm1058 (talk) 17:56, 12 March 2025 (UTC)

{{ping|Hyphenation Expert}} any help you might contribute with determining the algorithm for deciding whether a hyphenated form is correct in some contexts (and thus automated correction should be avoided) versus always incorrect (and thus automated correction is safe) would be appreciated. – wbm1058 (talk) 13:58, 14 March 2025 (UTC)

:For compound modifiers, "hyphenation correctness" is determined very much on a case-by-case basis; per Hyphenated compound modifier: {{tq|Major style guides advise consulting a dictionary to determine whether a compound modifier should be hyphenated ... Not normally hyphenated: Compound modifiers that are not hyphenated in the relevant dictionary or that are unambiguous without a hyphen.}} So for example, Fatty-acid could be appropriately {{tl|R from incorrect hyphenation}}: Fatty acid synthesis, Fatty acid metabolism are unambiguous.

:Only with proper names do I see straightforward "100% incorrect" scenarios: {{-r|Aaron Taylor Johnson}}, {{-r|Hewlett-Packard Enterprise}}. Hyphenation Expert (talk) 18:21, 14 March 2025 (UTC)

::Right, "case-by-case" is precisely what {{tl|R from alternative hyphenation}} is for, and I do not intend for my bot to make any case-by-case determinations.

::Mrakia frigida §Membrane lipid composition: There is a positive correlation between the growth temperature and the degree of fatty-acid unsaturation of the cell lipids of Mrakia frigida.

:::The need for a hyphen, or not, is a judgement determination individual editors may make – not this bot. – wbm1058 (talk) 18:33, 15 March 2025 (UTC)

:::In the rare event that this bot might get the context wrong, the solution is easy. Change the offending redirect's Rcat from {{tlx|R from incorrect hyphenation}} to {{tlx|R from alternative hyphenation}}, and revert the bot. We don't expect perfection from human editors, who all make occasional mistakes, nor should we expect 100% perfection from bots, as long as their mistakes are not too frequent and do not cause intolerable harm. – wbm1058 (talk) 15:40, 27 March 2025 (UTC)

A previously approved bot, Wikipedia:Bots/Requests for approval/BattyBot 54, handled the related task of correcting links to {{no redirect|Full rigged ship}}. I just tagged that as an {{tl|R from incorrect hyphenation}} so that this bot will handle new cases of this usage. – wbm1058 (talk) 15:40, 27 March 2025 (UTC)