User:Gnomingstuff/Archived talk pages with undetected bad edits/faq

This page addresses questions I have repeatedly answered.


General questions and explanations

= Why? =

Vandalism is bad. That is one of Wikipedia's core policies, and the policy explicitly tells users to revert vandalism when they discover it. Some people believe that this policy doesn't apply to archived talk pages. I don't agree.

This page is intended to demonstrate just how bad and widespread the problem is, and to collate bad edits in one place in case reverting vandalism on archived talk pages, as the core policy calls for, becomes allowed again.

= Why talk pages? =

Talk pages are still part of the Wikipedia project, still referenced by users, and still scraped and referenced by external sources. They are also records of what people actually said. Personally, I don't want someone reading a talk page to think I said a racial slur that I did not.

= Why archives? =

Archives are still part of the Wikipedia project, still referenced by users, and still scraped and referenced by external sources. They are also intended as records of what the discussions actually were at the time, not of what vandals later changed them to. They are easier to use when they are not clogged with disruptive edits and are not missing large swaths of discussion that a vandal removed.

= What do the categories mean? =

  • Slurs: The n-word, the f-word, etc. This is a very conservative interpretation, so some words that others might consider slurs are not included here.
  • Crude vandalism: The stereotypical kind of scatological/sexual vandalism.
  • Blanking/meaning-changing: Vandalism that either removes constructive comments, or edits them so that their meaning changes. This category is probably underrepresented, as it's easier to find things that were added to a page than things that were removed.
  • Nonsense: Keysmashes like gskjghjshgdk, etc.
  • Self-insert vandalism: People who vandalize with their names or their friends' names.
  • WP:NOTFORUM: Comments that express a random opinion on the article's subject, or that are otherwise off-topic.
  • Other vandalism: Clear bad-faith edits that do not fall into the above categories.
  • Likely text-to-speech/LLM prompt/search/Siri/homework test edits (standalone): The junk that started showing up around 2022. These edits often show clear signs that the talk page was being used as a prompt to an LLM, a search engine query, or a prompt to a text-to-speech/Siri assistant, or as a way to solicit drive-by homework help through those means. These are easier to demonstrate in aggregate than to explain -- you know it when you see it. I don't know why these started proliferating when they did.
  • Other test edits: Bad edits that might not be in bad faith, that do not fall into the above categories. This errs on the side of assuming good faith, but some of the edits here probably were intended as vandalism.

= What is not included and why? =

  • BLP violations and revdellable/oversightable edits, per above. I report those instead, as these are exceptions to the rules, and recent consensus (as of 2025) is that these edits are bad enough to be removed no matter what.
  • Heated arguments and personal attacks that take place during a legitimate discussion. Calling someone an asshole might violate WP:CIVIL, but such things happen.
  • Discussions about vandalism, e.g., people posting examples of vandalism to an article and asking whether it should be reverted. It would have been better if those people just reverted it at the time without broadcasting it to the world, but these are unfortunately actual discussions.
  • Similarly, any bad edits that have been responded to, struck, or otherwise acknowledged, as they are unfortunately part of the discussion.
  • Vandalism on user talk pages.
  • Huge masses of WP:NOTFORUM-type comments, because I do not have the patience for that and it would be better if WP:TNT were applied.
  • Vandalism to the archive pages themselves (which includes vandalism to talk pages where the whole page itself was moved, a confusing but sometimes seen practice). I have a separate list of these edits but it isn't on Wikipedia.
  • Sockpuppet edits -- these are almost always active parts of a discussion, albeit unwanted ones, and rooting them out would require a whole investigation that I do not have the patience for.
  • Copyvios -- as above these would require a whole investigation to root out, which may not even be possible for old edits due to linkrot.
  • On-topic comments generated by LLMs (as opposed to comments that are meant as prompts for LLMs, or off-topic AI-generated spam) -- these are usually part of an actual discussion, albeit unwanted by some.

= Why is X listed? =

Every category here is something that could be uncontroversially removed per WP:TPO. And editors have indeed reverted these types of edits from talk pages since the earliest days of the project, for over 20 years. The edits here are simply the ones no one caught at the time.

A general rule of thumb: If an edit would have been OK to use rollback on at the time, it is included here. Otherwise, it is not.

= What differentiates "standalone" from "to other people's comments"? =

The distinction here is how the vandalism appears on the archive page, as the purpose of archive pages is to understand what people actually said.

  • "To other people's comments" means that a vandal either changed the text of what someone said, or that their vandalism appears on the page to be that person's words (for instance, if a comment was unsigned, or there is a spacing issue).
  • "Standalone" means that the vandalism is clearly and unambiguously attributed to the vandal, and is not mistakable for someone else's words.

= Why are pages listed twice? =

Some pages have been vandalized multiple times, sometimes by the same user and sometimes by multiple users, such that the vandalism falls into multiple categories.

This duplication might make the count slightly off, although I suspect that this is canceled out by the fact that some pages have multiple diffs listed in the same category.

= How much of this is there? =

It is impossible to know. Given that there are 8.1 million talk pages as of 2025, with 2-3 million edits to talk pages per year, there is probably a substantial amount.

= How are you finding these? =

I search for common words/phrases used in vandalizing edits and for signs that an edit is an unconstructive test edit (e.g., "Bold text"), and then go through the talk page's history to find the diff. I also scan the archived talk pages to check for other obvious instances of vandalism on the page. Sometimes I uncover other bad edits when going through the history -- this is how most of the blanking and other subtle vandalism is found.

These searches usually turn up vandalism on un-archived talk pages too, and I just revert those. If you think this distinction is arbitrary, I agree with you.

= Why don't you do X instead? =

Because I am choosing to do this. I have already reverted a great deal of this in articlespace, several years ago, and so the proverbial backlog there is much smaller.

= What about people with archive pages on their watchlists? =

Your watchlist is your own responsibility and is optional.

Furthermore, watchlists are not the only way of patrolling archive pages for new vandalism. The search query intitle:"archive" "4 April 2024", filtered by talk pages, will return the talk page archives containing that date, which for a single day is usually a manageable amount.

Policies, guidelines, and essay-based questions

= [[WP:ARCHIVE]] / [[Template:Aan]] =

I believe that this guideline conflicts with several Wikipedia policies, including WP:VANDAL and the policies listed below. I also believe that preserving vandalism -- particularly vandalism to other people's original words, and undetected blanking of other people's constructive comments -- goes against the spirit of that guideline, as it means that what people are seeing on an archive page is not actually the original discussion, and in some cases omits large swaths of it.

Notably, other people have raised concerns in the past that this guideline conflicts with Wikipedia policies, and in both cases editors have stressed that this guideline is not policy. Previous discussions on archived talk pages have also consistently held that editing archives is not actually disallowed, though it is discouraged as a way of reviving discussions. I do not believe that reverting vandalism constitutes reviving a discussion.

= [[WP:NOCON]] =

The previous discussion on edits to talk page archives (as of 2025) ended in no consensus.

However, the current text of WP:NOCON does not really account for edits to pages outside articlespace (other than deletion), but does state that BLP violations/defamatory material, copyvios, and questionable external links are commonly removed. I believe vandalism falls into the spirit of this category.

More importantly, this policy does not mandate anything (besides removing copyvios); it only points out "common results." As WP:STONEWALLING states: "When a good faith discussion about a proposal results in 'no consensus' (rather than 'consensus opposes change'), the status quo is usually favored (WP:NOCONSENSUS). Once this occurs it's common though incorrect to later argue that consensus favors the status quo."

= [[WP:BOLD]]/[[WP:SOFIXIT]] =

Yes, I know. I agree with you. I did this for several years and then people yelled at me.

= [[WP:5P3]] =

Yes, I know. I agree with you. I find it mind-boggling that one of the five fundamental Wikipedia policies is that "any contributions can and may be mercilessly edited and redistributed," and yet there is this one exception.

= [[WP:5P5]] / [[WP:IAR]] =

Yes, I know. I agree with you. Unfortunately invoking this policy is likely to cause a shitstorm and I would prefer not to do that.

= [[WP:DENY]] =

Yes, I know. I agree with you. I believe that allowing vandalism to exist is glorifying vandals, but it could be argued that keeping a record of vandalizing edits is also doing that. To mitigate this, I have tried to describe the vandalizing edits as clinically and vaguely as possible, have not quoted the vandalism, and have not named the vandals.

= [[WP:PRESERVE]] =

The text of this policy is more about articles than talk pages. Nevertheless, I think vandalism is more serious than mere "imperfect content," and that the only way to fix it is to remove it.