Wikipedia:Village pump (WMF)#WMF plan to push LLM AIs for Wikipedia content

{{Short description|Discussion page for matters concerning the Wikimedia Foundation}}

{{village pump page header|WMF|The {{abbr|WMF|Wikimedia Foundation}} section of the village pump is a community-managed page. Editors or Wikimedia Foundation staff may post and discuss information, proposals, feedback requests, or other matters of significance to both the community and the Foundation. It is intended to aid communication, understanding, and coordination between the community and the foundation, though Wikimedia Foundation currently does not consider this page to be a communication venue.

Threads may be automatically archived after {{Th/abp|age|{{{root|{{FULLPAGENAME}}}}}|cfg={{{cfg|1}}}|r=y}} {{Th/abp|units|{{{root|{{FULLPAGENAME}}}}}|cfg={{{cfg|1}}}|r=y}} of inactivity.

Behaviour on this page: This page is for engaging with and discussing the Wikimedia Foundation. Editors commenting here are required to act with appropriate decorum. While grievances, complaints, or criticism of the foundation are frequently posted here, you are expected to present them without being rude or hostile. Comments that are uncivil may be removed without warning. Personal attacks against other users, including employees of the Wikimedia Foundation, will be met with sanctions.|WP:VPW|WP:VPWMF}}__NEWSECTIONLINK__{{User:ClueBot III/ArchiveThis

|header={{Wikipedia:Village pump/Archive header}}

|archiveprefix=Wikipedia:Village pump (WMF)/Archive

|format= %%i

|age=336

|minkeepthreads= 6

|maxarchsize= 300000

}}{{centralized discussion|compact=yes}}__TOC__

[[Category:Wikipedia village pump]]

[[Category:Non-talk pages that are automatically signed]]

[[Category:Pages automatically checked for incorrect links]]

{{toclimit|3}}

== The WMF should not be developing an AI tool that helps spammers be more subtle ==

I've never been the type to make a "sky is falling" post about a new feature. And I'll state at the outset that I don't think anyone's acted with any less than the best of intentions here, and that I like the idea of the mw:Edit check feature overall. But someone just mentioned mw:Edit check/Tone Check to me, and I have to say, this is the first new feature I've seen that doesn't just seem like a bad idea, but actually seems like it could pose a fundamental threat to Wikipedia.

If that sounds like an overreaction, let me explain. The point of this feature is that it would warn people when they're about to make a non-neutral edit. That sounds like a great idea in theory. But if you look closer, the main kind of non-neutrality they're talking about is peacock words. Which makes sense: An AI can't tell whether "X was a war crime" is NPOV with respect to the consensus of sources. But it can tell whether "Y is the most outstanding author in her field" sounds promotional. So in practice, that is most of what this feature is going to catch: spammy edits.

To that, something that I think will be obvious to anyone who's ever done anti-spam work, but perhaps not to others: The only reliable way we have to catch spammers is that they suck at pretending to not be spammers. That's it. The handful of spammers who actually figure out how to pose as good-faith Wikipedians have yearslong careers and do untold damage.

{{cot|align=left|On a deep level, the survival of Wikipedia relies on the fact that spammers tend to write like this:}}

Chompsky's All-American Potato Chips are an iconic, beloved brand of potato chips, founded in 1991 by two college kids with a dream. Renowned for using only the finest, highest-quality ingredients, they are a consumer favorite across the country.

{{cob}}

{{cot|align=left|And not like this:}}

Chompsky's All-American Potato Chips are a potato chip brand founded in 1991. Described by the ''Semi-Reliable Times'' as "iconic and beloved",<ref>A paid post</ref> they have received positive media attention for their use of high-quality ingredients.<ref>An article in a highly reliable source that mentions the brand briefly, but not these claims</ref><ref>A listicle with 50 words on the brand, which repeats their claims without endorsing them</ref><ref>Another paid post</ref>

{{talk reflist}}

{{cob}}

There's been a lot of hand-wringing about whether improvements in LLMs will eventually cross over to the point of making it easier for spammers to pose as constructive editors. And here it turns out, the WMF is building that capability in-house. People won't even need to enable it. If I, a spammer completely clueless about how obvious I am, submit that first example above, I'm going to be coached in a direction of being less obvious. But my motives won't change. I won't learn how to find reliable sources, and won't suddenly gain a desire to be honest about what the sources say. All I will learn is how to be subtler about my promotion of a company.

If we could magically make Tone Check only show up for good-faith editors, then sure I'd support it, but we can't, and it's not like we don't already have ways to teach good-faith editors to use an encyclopedic tone. I've talked to Sohom Datta about my concerns, and I appreciate that he showed a lot of interest in finding solutions, but I don't think there is any solution other than making sure this feature is never developed. It wouldn't even be enough just to disable it here. If the code exists somewhere out there, created by the WMF and fully available to anyone under an open-source license, it will be used against us. There are plenty of smart spammers, and all you need is one who figures out how to get that code running locally so they can use it as their own personal wiki-writing tutor, and soon enough everyone's doing that. It could even be integrated with existing LLM output, of which the most obvious tell currently is tone, allowing for slop-spam that costs UPEs almost nothing to produce but is much harder for us to detect.

I want to be overreacting here, but I don't think I am. I'm reminded of an article I GA'd, where we talk about how efforts to increase awareness of gang tattoos have just led to a lot of gangsters getting cover-ups while continuing to be gangsters. Tone Check should be scrapped, and any dataset already created should be destroyed. -- Tamzin[cetacean needed] (they|xe|🤷) 19:51, 23 May 2025 (UTC)

:Concur with Tamzin. Unfortunately, there's a progression here over the last many years of helping new editors learn the ropes and produce a draft...or even an article...that seems reasonable. I've seen quite a number of drafts that get accepted on cursory examination (it's got sources? check; it's got an infobox? check; it's got categories? check; it's neutrally written? check; it's got a picture? check; ok must be good!). As we make it easier for new editors to develop content that seems reasonable at first pass, we increasingly enable bad actors to introduce things that would otherwise be caught. Spammers are heavily motivated by money. We're motivated by volunteer effort to do good in the world. Sadly, the spammers ultimately are going to win this as the tools to deceive become stronger. It's an arms race we are badly losing. The WMF needs to be developing tools to protect the project, not developing tools to aid bad actors (even if unintentionally). --Hammersoft (talk) 20:06, 23 May 2025 (UTC)

:Also concur with Tamzin. Like so many of the WMF's 'good ideas', it seems to have been conceived without the least thought over what side effects might result. AndyTheGrump (talk) 20:59, 23 May 2025 (UTC)

:To copy over some of the counterpoints here. One of the points I raised during the initial prototyping phase was to make sure that experienced users are able to track if a user is shown this alert in the first place (regardless of whether they went through with the edit). Also, similarly, at a technical level, there are mechanisms that could be put in place to make it significantly harder for users to run this check through WMF servers without already being in the process of saving their edits (and thus being logged by the system).

:Regarding the concerns of "if we build this model we lose this war", there is nothing stopping a savvy enough spammer from using the thousands of datasets of Wikipedia article spam/LLM text floating around on the internet (or building their own dataset) and training their own classifiers on top of it, provided they have the budget to purchase a few (two? three?) GPUs. That would be cheaper than having an engineer on payroll with the expertise to reverse engineer and replicate ORES locally. If we want complete secrecy, we shouldn't be sending folks AFC declines or telling people why we deleted their text in the first place, and that is not really possible. Sohom (talk) 21:49, 23 May 2025 (UTC)
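
To make concrete how low that barrier already is: a toy version of such a home-grown "promotional tone" classifier can be assembled with off-the-shelf tools and no GPUs at all. The sketch below is illustrative only; the example sentences and labels are invented, and a real attempt would need a much larger labelled corpus than this.

<syntaxhighlight lang="python">
# Illustrative only: a toy "promotional tone" classifier of the kind described above,
# trained on a tiny hand-labelled corpus. The sentences and labels are invented.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

examples = [
    ("an iconic, beloved brand renowned for the finest ingredients", 1),  # promotional
    ("a potato chip brand founded in 1991", 0),                           # neutral
    ("the most outstanding author in her field", 1),
    ("an author whose work has been reviewed in several journals", 0),
]
texts, labels = zip(*examples)

# TF-IDF features plus logistic regression: nothing exotic, and it runs on a laptop.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(texts, labels)

# Probability that a new phrase belongs to the "promotional" class.
print(model.predict_proba(["a consumer favorite across the country"])[0][1])
</syntaxhighlight>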

:Thank you for starting this discussion, @Tamzin and everyone here for thinking deeply and critically about Tone Check (T365301).

:With regard to the risk being talked about so far, we [i] are aligned with you all in thinking that we need to take seriously the possibility that Tone Check could nudge people away from more obvious peacock words (e.g. "iconic"; thank you for the example, @Berchanhimez) and towards subtler forms of biased writing that are more difficult for the model and people to detect.

:In terms of how we work together on the above to introduce functionality that benefits patrollers, newcomers, and the wikis at-large, a few initial thoughts come to mind:

:# I need to share with y'all what we're currently planning to mitigate the specific risk @Tamzin is raising. This way, y'all can help us spot gaps in these initial plans and together, identify how they might need to be bolstered and/or reconsidered.

:# I need to publish the broader set of risks we've identified with Tone Check through a pre-mortem we conducted earlier this year so that we can work together to ensure this set is sufficiently exhaustive and the strategies in place "robust" enough to manage them.

:# Further, members of the Editing and Machine Learning Teams will be available next week in Discord (we'll also publish a summary on-wiki) to share details and answer questions about the technical underpinnings of the system. This way, we can engage with the topics above, and others that come up, with a shared understanding of how the system is working.

:Next week, you can expect me to post again here with updates about all of the above. Of course, if there are things we ought to be doing/thinking about beyond the above, I hope you will raise them.

:Oh, and my name is Peter. I work as the product manager who is helping to lead the development of Tone Check and the broader Edit Check project it is a part of.

:---

:i. "We" being the Editing and Machine Learning Teams who are responsible for Tone Check. PPelberg (WMF) (talk) 02:37, 24 May 2025 (UTC)

::An update as this work week comes to a close for me...

::We've [https://www.mediawiki.org/w/index.php?title=Edit_check%2FTone_Check&diff=7662395&oldid=7660920 expanded mw:Edit check/Tone Check] to include more information about the model powering Tone Check, how we're planning to evaluate the holistic impact of the feature, and the conversations and existing initiatives the project is grounded in.

::Next week, we'll:

::# Publish the broader set of risks we've identified with Tone Check, the initial mitigation strategies we've planned, and invite y'all to help us improve it

::# Port the contents of mw:Edit check/Tone Check to a page here (at en.wiki)

::# Schedule time to be in Discord; we're thinking we'll set up a time for a synchronous voice/video chat there

::PPelberg (WMF) (talk) 02:30, 31 May 2025 (UTC)

:::A couple updates:

:::# Discord: tomorrow – 10 June 2025, from 16:00 - 17:00 UTC – members of the Wikimedia Foundation's Editing, Machine Learning, and Research Teams will be hosting a voice call in Discord to talk about Tone Check. You will be able to join the conversation here: https://discord.gg/wikipedia?event=1380664800656363671.

:::# Documentation: we're [https://www.mediawiki.org/w/index.php?title=Edit_check%2FTone_Check&diff=7678920&oldid=7662362 continuing to make updates] to the Tone Check project page, including the questions we're needing volunteer help to answer. Soon, we'll be creating an en.wiki-specific version of the Tone Check project page. We'll post here when that page is published.

:::PPelberg (WMF) (talk) 20:58, 9 June 2025 (UTC)

While I support the idea Tamzin and everyone is getting at, the cat's already out of the bag. For example, I went to ChatGPT and plugged in the following prompt, with the promotional text being copied directly from Tamzin's post:

{{cot|prompt}}

I'm trying to add the following text to Wikipedia but it got removed for being too "promotional" can you make it less promotional? Please reply with only the updated text. The text is: "Chompsky's All-American Potato Chips are an iconic, beloved brand of potato chips, founded in 1991 by two college kids with a dream. Renowned for using only the finest, highest-quality ingredients, they are a consumer favorite across the country."

{{cob}}

And its response:

{{cot|response}}

Chompsky's All-American Potato Chips is a brand of potato chips founded in 1991 by two college students. The company is known for using select ingredients and has developed a presence in markets across the United States.

{{cob}}

While that response is still slightly promotional/weasel-y, it's really limited to, from what I see, "select ingredients". Prompting it to make it less promotional still, it just removed the part about the ingredients entirely (leaving the rest basically the same). Out of curiosity, I went to [[:Category:AfC submissions declined as an advertisement]] to find a longer example to test - I tried to find a draft that was long enough to maybe pose a challenge for ChatGPT. I copied the text from [[Draft:Aarti_Gupta_Bhadauria]] with virtually the same prompt as the first one, only changing "text" to "article" in the prompt, specifying it was declined for "being too promotional and an advertisement", and removing the citations from it. I didn't consider whether the subject was notable or not or whether the sources provided clearly show notability/support the text/are reliable - I'll explain below why I didn't. This is the response I got:

{{cot|full article test response}}

Aarti Gupta Bhadauria (born 1983) is an Indian sculptor based in Bangalore. She works primarily with terracotta, creating abstract sculptures that explore themes related to human emotions. Bhadauria does not use a potter's wheel or armature in her process, instead carving her sculptures by hand from large blocks of clay.

Her work has been exhibited in India and the United States, and she has participated in several international art events. In 2022, she was selected by the Garrison Art Center in New York for its international biennial small works exhibition. Bhadauria earned a Bachelor of Fine Arts in sculpture from the Government Institute of Fine Arts, Gwalior, in 2004. She moved to Bangalore in 2009, where she continues to work with terracotta as a medium in contemporary art.

{{cob}}

Sure, there's no guarantee that it will work on all articles/text. And like the first example, it may leave some slightly promotional words/tone in it. But my point is that people can already use LLMs to avoid being called spammers even when they are. The difference is that if it's on-wiki it's much easier to track who is using the tool, whereas if people are pushed to use other LLMs off-wiki, there would be zero record of it other than an editor's guess. And as I showed, if someone brand new submitted something promotional, it would take maybe 2-3 promptings of an LLM off-wiki to get it to a level that would be very difficult to detect. That would appear no different to an editor reviewing the draft again than if the article was just edited by the user themselves to remove promotional tone/weasel words/etc.

I'm wondering if the better option is to have it be on-wiki, so that people using it are logged at least. And ultimately, if the subject is notable, the sources are reliable, and the only issue with the content is it's in a promotional tone, why shouldn't we want it to be added to Wikipedia after the issues with tone are fixed? To be very clear, I am not saying I support this activity by the WMF necessarily. But I do want to point out that it's already trivial to get LLMs to update text for people to resubmit... so maybe we should try to get out ahead of it and have it on-wiki so it's tracked, logged, and results in content that otherwise we may never have if the subject is notable but nobody cares enough to write about them. -bɜ:ʳkənhɪmez | me | talk to me! 21:37, 23 May 2025 (UTC)

:I don't have a strong opinion either way on this, but it did make me think of [https://xkcd.com/810/ this], which would be the optimist's view. I accept that those who fight spam have a good intuitive sense for what might happen if this tool is deployed, but it does seem possible to me that this will result in the spam being less spammy. Maybe harder to detect, but maybe lower priority to detect as a result. Mike Christie (talk - contribs - library) 21:41, 23 May 2025 (UTC)

::Yeah, that's my thoughts too. If the spam is less spammy, potentially even to the point that an editor reviewing the draft is inclined to accept it (notable topic, good reliable sources inline, etc. with the only problem being the tone), then that's a net win for the encyclopedia. Where no article existed on the notable topic before, we now have one that, while it may still have some problems with tone, or be incomplete, is at least in existence. And if people can already do this off-wiki trivially (that whole comment, including finding a draft to test, took me well under 10 minutes to formulate), why shouldn't we get them to do it on-wiki instead where we can track it? -bɜ:ʳkənhɪmez | me | talk to me! 21:44, 23 May 2025 (UTC)

:::There are two aspects to spam: style and substance. Some articles have non-spam substance but a spammy style. This happens a lot when an editor is really passionate about a TV show, for instance. For someone like that, a tool like this would be great. In some cases, when paid editing happens, the company's goal is to just get their name out there. In those cases, too, this tool would probably work fine: If they're notable, then there's no problem at all, and if they're not, there's a nice clean article for AfD to review. However, in my experience dealing with spam articles, much more often it's the case that the article is designed to inflate and aggrandize, not just through puffery but also materially false claims. That's where the comparison to Mike's xkcd link fails (btw, the xkcd: prefix exists now!). This tool won't teach those spammers to be constructive. It will just teach them to look constructive. But the lies and exaggerations will still be there. The advertorials presented as articles will still be there. The claims that fail verification will still be there. -- Tamzin[cetacean needed] (they|xe|🤷) 21:50, 23 May 2025 (UTC)

::::I don't really think this addresses my main point - which is mostly my fault because I could've organized my thoughts better. On-wiki, it can be tracked and logged - for example, if it warns an editor about "select ingredients" and they choose to leave it in anyway, it could log that, similar to an edit filter, such as "User:Example overrode tone check regarding "select ingredients"". Perhaps that log could be visible only to admins or a lower group of permission-holders, so that the users themselves aren't shown they're being tracked. But that tracking - of what someone chooses to override or not - will help track spammers in my view. Because someone overriding "select ingredients" is highly likely to be trying to promote the company - that's just not a term/phrase used in normal conversation. In summary of this section, if they can do this already, using LLMs off-wiki (as I said, it took me less than 10 minutes to make that whole comment with two completely different prompts), then are we just ignoring that they can do it already and missing out on at least being able to track them doing it?{{pb}}And ultimately, if they look constructive and it results in a notable topic having an article and/or information being added that we didn't have before, I see that as a good thing. Even if that editor never returns to add any other information or any other article - we still have improved from it. -bɜ:ʳkənhɪmez | me | talk to me! 22:01, 23 May 2025 (UTC)
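
For concreteness, here is one hypothetical shape such an override log entry could take. None of these field names come from Edit Check or any existing log; they are invented purely to illustrate what "overrode tone check regarding X" tracking might capture for patrollers.

<syntaxhighlight lang="python">
# Hypothetical sketch only: a possible record for the proposed "tone check override" log.
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class ToneCheckLogEntry:
    user: str                  # editor who triggered the check
    page: str                  # page being edited
    flagged_phrase: str        # e.g. "select ingredients"
    action: str                # "revised" if they changed the text, "overrode" if they kept it
    timestamp: datetime
    visible_to: str = "sysop"  # restrict who can read the log, as suggested above

entry = ToneCheckLogEntry(
    user="Example",
    page="Chompsky's All-American Potato Chips",
    flagged_phrase="select ingredients",
    action="overrode",
    timestamp=datetime.now(timezone.utc),
)
print(entry)
</syntaxhighlight>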

:::::This already happens when the Guild of Copy Editors improves the grammar of an article in good faith, which can hide or gloss over more fundamental problems with the article. I agree with Mike Christie that if the tone improves, then it's still a problem, just less urgent. The most urgent and alarming concern for me is the difficulty of verifying the integrity of sources and, in general, the quality of sources themselves.

:::::I use local scripts, such as User:Novem Linguae/Scripts/CiteHighlighter.js, to help me assess potentially low-quality sources, but most editors are completely on their own. Leveraging AI and centralizing user ratings in a central citation project, as envisioned in [[:Meta:WikiCite/Shared Citations]], could help identify non-notable or poorly sourced articles, even if the syntax and reference count are well structured/hallucinated. ~ 🦝 Shushugah (he/him • talk) 23:12, 23 May 2025 (UTC)

::::::Is it a problem though? A draft shouldn't be approved, or an article marked as reviewed, unless the person approving/reviewing has at least made a good faith attempt to verify that the article doesn't contain any falsehoods and uses at least facially reliable sources. But regardless, those problems are completely separate from the tone, and as you say, they're independent of the tone. I fully support working on tools to help editors review sources - I don't recall its name off the top of my head but I have permanently enabled the plugin that turns questionable source links yellow, and unreliable/bad sources red. But improving the tools available is separate, again.{{pb}}The question here is whether we should oppose the WMF releasing a tool that just makes it possible wiki-side for editors to do what they can already do - which is use an LLM to improve the tone of their editing. We shouldn't oppose this tool just because it doesn't fix every single problem with new/spammy editors. And as I've been thinking about it, I think we should perhaps consider supporting it so that these editors, who can already (as I showed above) use an off-wiki LLM to make their text less promotional, will have their activities logged on wiki. Then we can use those logs to further investigate them. If they're an SPA who's only here to promote one company, then they'll probably ignore recommendations to change things like "select ingredients" (again, from the example above). If they're just a new editor who's not sure what to do, then they'll probably take most of the recommendations. They can already use LLMs in bad ways - why not let them do it on-wiki so we can log them and utilize the data from that to help us stop spamming? -bɜ:ʳkənhɪmez | me | talk to me! 01:18, 24 May 2025 (UTC)

  • I know Tamzin has narrowly tailored their feedback to the code red spammer problem, but mark me down as unimpressed with the broader idea. There continues to be a surprising disconnect between what the community wants/needs and what the Foundation is shoveling resources towards. It's true that the community cares about neutrality. But using Google's AI model to prompt users to not make an edit in the first place? It feels like the product team doesn't even know us.{{pb}}The community has had conflicted stances on AI and we've been hashing that out in thoughtful extended dialogues. But for the product team to say "cool we're going to use AI to tell if something is neutral" is very tone deaf and unconsidered. Did any community members get asked about this before the project was started? Machine learning is not the secret to making Wikipedia NPOV. It takes humans to figure out what is neutral, and even we get it wrong sometimes. A computer isn't going to magically fix contentious topics. It sure as heck isn't going to improve our reputation. As I argued in the recent anti-AI images discussion, use of AI will quickly burn the trust we took so long to earn with our readers. I agree that whatever data has been created from this research is highly dangerous and must be destroyed. Cast it into Mount Doom lest its evil be unleashed! CaptainEek Edits Ho Cap'n! 03:05, 24 May 2025 (UTC)
  • :{{ping|CaptainEek}}, I know we've had at least one disagreement before, but your comment here gets to the reason I refuse to blindly support this and am not happy with the way it's come to light. However, I would appreciate if you review my "data" above - LLMs (AI) can already be trivially used by people to alter their contributions. I'm sure you've seen the various posts people make with AI - sure, those are easily figured out by people reading them. But how will we know if a user does something like the example I give above, and just copies their declined article or text into ChatGPT and then puts the response back here (with perhaps readding citations or formatting)? I didn't even have to try more than Tamzin's example and the first article I picked that met my criteria from the category of declined submissions - and ChatGPT quickly, within the first or second prompting, made them "innocent" - at least from my reading, neither of those can be seen as AI generated/edited.{{pb}}I'm against using AI to contribute too, and I am well aware of the problems with hallucinations as they would potentially relate to fabricating sources, or not attaching sources to information they support. But the cat's out of the bag. It exists and people will be (and likely already are) taking advantage of it, especially if/when they realize that they can just go put their declined article into ChatGPT with the instructions "fix (whatever issue it was declined for) in this article" and then republish it here. In my view, the only way we can stay ahead of it is by at least trying to keep it on-wiki so it can be tracked and the "evidence" from that tracking used for our benefit.{{pb}}To be clear, I firmly agree with everyone who is either directly or indirectly opining that the community should've been involved from day 0 - not after it's already been in development. But to me, trying to push back against something like this without even trying to make it work for us, when it's already trivial for people to do it off-wiki in undetectable ways is... no different from grandparents being annoyed with being asked if they want to sign up for an app/email/texting at checkout.{{pb}}I chose to reply to you directly because I'm really trying to see and understand any actual problems with this - because I don't like supporting this. However, the more I think about it and consider the "experiment" I did earlier, the more I'm thinking that we need to get this on-wiki and logged/tracked so we can actually get some useful information/evidence/tracking from what people will do anyway regardless of what we do or don't do. -bɜ:ʳkənhɪmez | me | talk to me! 03:45, 24 May 2025 (UTC)
  • ::@Berchanhimez your comment and this thread got me talking to an AI researcher friend who has done work on Wikipedia before, and she pointed out a perhaps much more useful solution that has been researched partly before: flagging non-neutral revisions for manual review. We already do that with ORES of course for vandalism, so why couldn't we flag edits for neutrality? That seems a smarter idea than giving instant feedback to spammers about how to better evade our systems. Perhaps I was too harsh--this model could be useful, but I think the implementation needs a rethink. CaptainEek Edits Ho Cap'n! 04:29, 24 May 2025 (UTC)
  • :::{{ping|CaptainEek}} I would tend to agree that we don't know enough yet. But I would posit that simply flagging them wouldn't be anything new. As has been pointed out, we already have ClueBot that flags vandalism edits based on a lot more than keywords... so if all we're looking for is a keyword filter (maybe with a bit of extra training) someone much better than me at coding bots should be able to whip a half-decent one up very quickly.{{pb}}I think what I can see being beneficial here is getting data on people who intentionally bypass the flags, versus those who appear to be listening to the flags/suggestions. That, to me, is a much better datapoint than "flagged by a bot and automatically reverted" - because it shows specific intent to be promotional/advert-y in nature, rather than just a new editor not being fully aware of our policies. In other words, it's not just what they're doing, but how they're doing it, and whether they're receptive to an on-wiki tool trying to guide them in the right direction. I can't see that sort of "nudging" happening with a bot that simply reverts. And maybe I'm being pessimistic, but even if edits were flagged so users could provide manual, or even templated, guidance in response... I highly doubt such a group of people monitoring and trying to guide would be large enough to have any impact whatsoever. -bɜ:ʳkənhɪmez | me | talk to me! 04:40, 24 May 2025 (UTC)
  • :@CaptainEek I want to answer some of the questions that you posed and point out some technical points that I think you are conflating in your post. First off, yes this was passed through some community members before being proposed. This feedback is literally coming on the first prototype that the team has released to editors. I'm not sure how else the team could have acted without anticipating a concern that was never mentioned to them?
  • :With respect to your second point, I think it is very important to draw a distinction between AI and large language models. The community has had many long drawn-out conversations about large language models and their use in a generative environment. The community has, however, for years encouraged the WMF to build AI tooling, whether that be in the form of building custom models (using our ORES, now called LiftWing, infrastructure) to preemptively check if an edit should be reverted on Special:RecentChanges or should be labelled as a stub, start, or other article class. Even in recent memory, many of the mentorship suggested-edits are powered by in-house models. Even the enwiki community has built community-owned tooling like ClueBot NG (which AFAIK is just an AI random-forest classifier running on a dataset of labelled edits) that is autonomously reverting edits as we speak. None of these use large language models (the source of much discussion in the community in recent times) and they have overall been positively received.
  • :Also, your comment about this being just "using Google's AI model to prompt users to not make an edit in the first place" is just plain wrong (and verges dangerously on misinformation/assume good faith violation). Google's AI models are not planned to be used here; the models are being built in-house from scratch. In fact, looking at the relevant phabricator tasks, it seems that the model they ended up going with is not even using the typical large-language-model architecture (i.e. transformers) and is instead a classifier based on BERT, which (while originally developed by Google) is a 2018-era method that is generically used by almost anything that takes text as input, and the model is not capable of generating its own text (i.e. it is explicitly not generative in nature). (please ping on reply) Sohom (talk) 03:56, 24 May 2025 (UTC)
  • ::This is very helpful information/clarification. I won't say I thought it was going to be an LLM to begin with, but I figured that was the best comparison to a currently available off-wiki tool. I think the most important thing is that this isn't generative; in other words, there should be no risk of "hallucinations". Please correct me if I'm wrong on that point. -bɜ:ʳkənhɪmez | me | talk to me! 04:10, 24 May 2025 (UTC)
  • ::Thanks @Sohom Datta, that is a helpful clarification. Unfortunately, LLMs have somewhat poisoned the term AI, and the use of BERT (a language model) muddied the distinction for me. As for BERT and Google, I guess I don't understand how we could be using a model invented by Google but it not be Google's model...? (even if it is being used under an Apache free license). What am I not understanding about AI development? I also don't quite understand your claim that their model isn't using transformers, when BERT stands for "Bidirectional encoder representations from transformers"? CaptainEek Edits Ho Cap'n! 04:18, 24 May 2025 (UTC)
  • :::@CaptainEek So at a high level, BERT is a small model that takes your text and converts it into an array of floating-point numbers that a machine learning model can then use to understand the text. The original model proposed by Google borrows from the transformer architecture and was meant to be used alongside it; however, it has since been used in a lot of other applications that just require the model to understand text. To my understanding, the team is primarily using BERT to translate the text itself into a set of numbers the system can understand, and then using that as input to a different model that outputs a number between 0 and 1 depending on how promotional the text is.
  • :::Wrt it not being a Google model, whenever you add/integrate significant new parts to a model, you lose a lot of the work that Google did to train the model back in 2018 and you will need to retrain the model almost from scratch. (Basically, imagine if Wikimedia published a paper explaining their software stack and somebody decided to copy everything but throw away the content and start from scratch; would you consider that a Wikimedia project?) Even if you ended up just using BERT as part of your model without any modifications, it still does not automatically make it a Google model, since it's only a component of your model and you will need to do your own training on top of it for the output to start making sense at all (for another analogy, just using Gerrit to develop software does not make Wikimedia a Google company, since Gerrit is not why Wikimedia is Wikimedia). Sohom (talk) 04:47, 24 May 2025 (UTC)
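
To make the architecture described above concrete, here is a minimal, illustrative sketch of a BERT-as-encoder setup with a small classification head that outputs a score between 0 and 1. It assumes the publicly available bert-base-uncased checkpoint from Hugging Face; it is not the WMF's actual model or training data, and the head here is untrained, so the printed score is meaningless until it has been trained on labelled edits.

<syntaxhighlight lang="python">
import torch
from transformers import AutoModel, AutoTokenizer

# Assumption: the public bert-base-uncased checkpoint, not the WMF's in-house model.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")

# A tiny classification head: one linear layer mapping the sentence embedding to a single score.
# In a real system this head (and usually the encoder too) would be trained on labelled
# "promotional vs. neutral" edits; here it is randomly initialised, so the output is illustrative only.
head = torch.nn.Linear(encoder.config.hidden_size, 1)

def promotional_score(text: str) -> float:
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        # Use the [CLS] token's embedding as a fixed-length summary of the whole input.
        embedding = encoder(**inputs).last_hidden_state[:, 0, :]
        return torch.sigmoid(head(embedding)).item()  # squashed to a 0..1 "how promotional" score

print(promotional_score("An iconic, beloved brand renowned for the finest ingredients."))
</syntaxhighlight>
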
  • :{{tq|There continues to be a surprising disconnect between what the community wants/needs and what the Foundation is shoveling resources towards.}}
  • :I'll be bold and mention two examples:
  • :* The Chart extension is only somewhat useful on Wikimedia wikis and leaves out everyone else unless they're willing to misuse Commons with their data for everyone to alter as they please. By looking at the latest developments, this won't change any time soon.
  • :* It's been far too long since we've heard news regarding the transition from CirrusSearch and Elastica to OpenSearch. Wikimedia wikis appear to be already using some OpenSearch code but again, third party MW users are left in the dark and either have to depend on Elastica or stick to the barely usable search feature.
  • :Tactica (talk) 13:35, 27 May 2025 (UTC)
  • ::@Tactica (and other folks in this thread who have made similar assertions), I decided to take a look at this exact thing over the course of the weekend, since this assertion has been made by multiple members in the thread. In the context of the Edit Check features, it seems rather untrue to say {{tq|There continues to be a surprising disconnect between what the community wants/needs and what the Foundation is shoveling resources towards.}} wrt this feature. To me it feels like this is what the volunteer community asked for. Different iterations of this feature have been asked for in VPM posts and Community Wishlist posts in 2021 and 2022. Even after development started, there were multiple rounds of feedback for this feature, starting with demos at Wikimania 2023 and 2024, partial roll-out of parts of this feature across all wikis (other than enwiki) in the form of the Link Reliability Check tool, as well as a brief round of feedback from PTAC earlier this year. To my understanding, this feature was met with a positive reception at each of these events and no significant concerns were raised about Edit Check's usage.
  • ::Regarding the two other examples, I struggle to see these as wider "community" issues; they seem more like specific technical/philosophical disagreements that you have with how certain features are being implemented. (A "disconnect" would, for example, look like the WMF turning a blind eye to the Graph extension vulnerabilities and instead building a feature to allow users to integrate generative AI text and images into their articles, rather than these implementation niggles.) Sohom (talk) 16:32, 27 May 2025 (UTC)
  • :::Thanks @Sohom Datta – That context and history are highly useful. I might suggest the WMF team put some of that into the (now empty) history section of the project at mw:Edit_check/Tone_Check#History so that we have a better shared understanding of past discussions and community interactions. - Fuzheado | Talk 17:07, 27 May 2025 (UTC)
  • :::I wasn't referring to the Edit Check feature in particular, but to the WMF priorities in the broadest sense. I mean, it makes sense that the WMF supports any development that benefits first and foremost Wikimedia wikis because, technically speaking, that's their main asset, but as I see it, the MediaWiki ecosystem comes immediately after that, and for the engine to be useful beyond WMF wikis, development should also benefit third-party wikis. Currently this is not the case when it comes to key extensions such as Chart, which will keep depending on Commons for the foreseeable future, or AdvancedSearch, which still depends on Elastica and forces third parties to install a lot of undesirable dependencies if they want a usable search feature. On the AI front, however, for an organization that claims [https://wikimediafoundation.org/news/2025/04/30/our-new-ai-strategy-puts-wikipedias-humans-first/ not to intend to have editors replaced by drones], it sure seems to be investing significant resources into the opposite.
  • :::I don't have plenty of time on my hands to keep up with Wikimania articles and the endless bureaucracy going on, but I take notice when actually useful resources such as WikiApiary are left to rot while theoretically a whole usergroup is supposed to look after their maintenance. It's also quite revealing when you check a recent changes list and see developers spending time fighting spambots because the WMF brass decided everyone should be able to edit (and vandalize) their wikis, implying developers' time is cheap, if not free. It shouldn't surprise anyone that development of MW and its extensions sometimes happens at such a snail's pace and a number of bugs remain unfixed for years. Tactica (talk) 16:15, 2 June 2025 (UTC)
  • ::::{{re|Tactica}} Just a note that the principle that "anyone can edit Wikipedia" predates the creation of the Foundation. It is also my impression that the enWiki community as a whole still supports that principle. Many years ago I supported the idea of requiring registration, but I now believe that would not necessarily improve much of anything on enWiki. Donald Albury 16:51, 2 June 2025 (UTC)
  • :::::Thank you for the clarification. And no, just forcing users to register of course wouldn't stop spambots, but having the first edit automatically moderated would IMO stop most of them. But I know this won't happen because again, the powers that be have other priorities. Tactica (talk) 01:21, 3 June 2025 (UTC)
  • I think I agree with Berchan here in that I'd have to see some evidence that this is notably worse than what can already be done with ChatGPT (which almost certainly was heavily trained on Wikipedia) for me to worry too heavily about the WMF doing this. Unfortunately for us, this horse may have bolted long before we started trying to close this particular stable door.{{pb}}What I don't really think is a solid part of Berchan's argument is "might as well have it on-wiki so we can log it". What I suspect will happen there is that if we use the logs to police this use, malicious editors will get wise to us and stop using the on-wiki tool. Loki (talk) 03:38, 24 May 2025 (UTC)

  • :This is one instance where I actually agree with what Tamzin said: {{tq|The only reliable way we have to catch spammers is that they suck at pretending to not be spammers}}. We're never going to catch the most sophisticated people - at least not easily and quickly. But the "run of the mill spammer", so to speak, is going to get wise to the fact they can use off-wiki tools for it... if they haven't already. Providing the tool on-wiki with tracking/logging/etc. will at least catch more than if everyone is persuaded to go off-wiki to do it. Furthermore, there's no need to publicize the fact that it's logged in an easy-to-find manner, or explicitly tell someone that they're being blocked because of evidence that was logged. Hence why I suggested the logging only be available to administrators, or perhaps to people with lower advanced permissions (such as autopatrolled and/or even rollback potentially). -bɜ:ʳkənhɪmez | me | talk to me! 03:49, 24 May 2025 (UTC)
  • ::A fundamental difference between people using off-wiki tools and there being an on-wiki one built in is that people have to know to do the former, and know how to do it. Most spammers write something like my first "Chompsky's" example above, look at it, and say "Yup, looks good to me." Often, even when that article has been deleted multiple times, even when they've been told it's unambiguously promotional, even when they've gone to the Teahouse and asked what they did wrong, they still aren't able to figure out how to make it sound like an encyclopedia article. Given how much spammers do use ChatGPT, we can only infer from this that, for whatever reason, it doesn't occur to them to give the prompt you gave; or perhaps the output isn't as reliably de-promo'd as in your one testcase.{{pb}}Secondly, at least at the moment, LLMs tend to strip wiki formatting, which makes their usage obvious. A tool that doesn't generate text but instead prompts the user to rewrite it better, while retaining wiki format, will be much subtler.{{pb}}And thirdly, the existing off-wiki tools are not trained specifically to look for what Wikipedia editors consider a non-neutral tone. The dataset that the WMF is building will be. That is a much greater danger than LLMs pose. The most effective way to regulate a weapon is to not invent it. We're in a rare position where we're the only ones who can invent this weapon, because it's based on our norms. So let's... not. -- Tamzin[cetacean needed] (they|xe|🤷) 04:27, 24 May 2025 (UTC)
  • :::But we aren't the only ones who can invent it. As I showed, both using your example and a random (ok, I just picked one on the first page and it happened to work) declined draft, ChatGPT is already capable of "fixing" poor articles/content. And I think you're vastly underestimating people - sure, they may not know of ChatGPT, but if they have Facebook, or Twitter/X, they're having LLMs shoved in their face every time they search or click virtually anything. So to claim that people won't know they can use LLMs is naive in my opinion.{{pb}}Secondly, most new people don't use proper wiki formatting to begin with. Sure, the one draft I happened to click on used proper citation formatting. But most people don't - at least at first. And formatting has never been a reason on its own to decline a draft or revert an edit. If the draft/article/edit is otherwise good, the solution is to fix the formatting, not revert it just because it wasn't formatted yet.{{pb}}And as the one employee who's responded has said, this isn't going to be an LLM that generates text - it's going to be more similar to ClueBot, just allowing users to select to fix it instead of just reverting it. -bɜ:ʳkənhɪmez | me | talk to me! 05:01, 24 May 2025 (UTC)
  • ::::I was never under the impression that this was an LLM that generates text. That would be bad, but much less bad. Instead, this is something that will encourage spammers to refine the spam they've written until it stops looking like spam, without fixing the underlying issues. And obviously people know they can use LLMs. My point is that, despite knowing that, they're still not (usually) successfully using them to make their spam less obvious. There is an absolutely massive difference between it being theoretically possible to use LLMs to turn a spammy article into a less spammy one, and literally baking a technology into the edit interface that will give spammers advance warning that their spam looks like spam. And again, if we do not create this technology, it does not exist. A lesser version of it might exist, but not a model literally built around data of which edits Wikipedians say were non-neutral. -- Tamzin[cetacean needed] (they|xe|🤷) 05:14, 24 May 2025 (UTC)
  • :::::If the underlying issue (assuming you mean things like sourcing, due weight, etc) isn't fixed, then that will be handled through our normal processes. In other words, it seems like you're letting perfect be the enemy of good here. If we can remove, lessen, or even just better track, the spam issue through a tool like this, why should we not be doing so just because it doesn't fix every single issue? -bɜ:ʳkənhɪmez | me | talk to me! 06:14, 24 May 2025 (UTC)
  • ::::::It's not that it doesn't fix every single issue. It's that it creates a massive issue that does not currently exist, and, once created, will be impossible to fix. My understanding is that this is generally considered poor practice in software engineering. -- Tamzin[cetacean needed] (they|xe|🤷) 06:48, 24 May 2025 (UTC)
  • :::::::One of the ways that software engineers mitigate the risk of brute-force password attacks is to deliberately slow down the login process to reduce the efficiency of the attack. I don't think using the tone check feature on the website is going to be responsive enough to be sufficiently useful for training a program to improve the quality of its writing. (If the underlying model is made publicly available, though, then there is a potential for misuse.) That being said, I think Wikipedia's existing processes have already pushed spammers to using low-cost contractors. More quality controls, whether they are manual or automated, will just provide incentive for spammers to implement their own quality controls. So while I agree any deployment of such a feature needs to be carefully considered, I don't think it's an existential threat beyond the current threat of spammers potentially swamping the time of volunteers able to combat bad edits. I think there are more than enough adequate writers in the potential labour pool that tools to help people write better aren't the limiting factor. isaacl (talk) 15:29, 24 May 2025 (UTC)
  • Agree broadly with the concerns. One solution that I don't see mentioned in the discussion would be to restrict Tone Check to folks we can reasonably consider as good-faith users – those who have gone beyond a certain combination of account age and number of edits. It's a pity that unregistered and new users won't be able to use it but I think it's for the best. I don't buy into the doomsdaying that the feature should not be developed at all – anyone who's savvy enough to bypass the restrictions would also be savvy enough to use some external plugin that offers the same functionality. After all, as others mention above, detecting promotional language and rewriting it into Wikipedia-esque language is already possible via ChatGPT and friends. – SD0001 (talk) 17:23, 24 May 2025 (UTC)
  • :Or even more low-tech: assign a copy editor for your contractors to review their edits and to train them to write non-promotionally. It's not hard to learn, particularly if your continued employment depends on it. isaacl (talk) 18:31, 24 May 2025 (UTC)
  • I agree with Tamzin, we don't need a tool that disguises spam. Even if we train a spammer to create stuff that looks like a Wikipedia article, they will inevitably cherry-pick their client's story and leave out negatives even if they are easily sourced. But my assumption is that spammers, unlike fans, very rarely become good wikipedians. We do have the occasional former vandal in the community, rarer than some might think, but you do come across them. Spammers, however: does anyone ever remember a former spammer becoming a member of the community? We are more likely to have members of the community become spammers than vice versa (happy to be proved wrong if someone has a way to measure this, and I'm not counting isolated examples as a way to measure this). What I think would be useful would be a way to flag probable spammers at newpage patrol and recent changes. Maybe an AI that looks at likely spam and highlights it in those tools, or maybe a feed into huggle or whatever the trendy recent changes patrol tool is these days. ϢereSpielChequers 19:50, 24 May 2025 (UTC)
  • :And lo, the WMF has developed such an AI - they're just using it to help the wrong people. Tone Check shouldn't alert the spammer; it should quietly flag the edit as particularly needing patrolling. NebY (talk) 20:22, 24 May 2025 (UTC)
  • :@WereSpielChequers I think a good side question to your response would be to figure out what percentage of new editors are spammers? Another interesting metric to pull out would be how many users of the non-spammer bunch have been warned about WP:NPOV? Sohom (talk) 20:33, 24 May 2025 (UTC)
  • ::I'm pretty sure it is a significant minority, especially of new page creators. I doubt that WP:NPOV warnings would be a good measure, as that gets us into issues of Arab/Israeli and other political disputes. Yes, I've no doubt some of our political propagandists are paid, but I'm assuming most aren't; so they are volunteers, just not necessarily our volunteers when they start. Whilst spammers are, I'm assuming, paid, not volunteers. My assumption is that it is much easier to recruit someone who volunteers elsewhere to volunteer for Wikipedia than to recruit volunteers from among people who don't give time to charity. Hence my assumption that spammers are unlikely to become Wikipedians. ϢereSpielChequers 20:51, 24 May 2025 (UTC)
  • :{{Tq|What I think would be useful would be a way to flag probable spammers at newpage patrol and recent changes. Maybe an AI that looks at likely spam and highlights it in those tools}}. Fyi this already exists. "Spam" is one of the filters in Special:NewPagesFeed. It is powered by mw:Extension:ORES, which uses machine learning. –Novem Linguae (talk) 22:59, 25 May 2025 (UTC)
  • :To add to {{u|Novem Linguae}}'s comments here, the WMF has invested significant resources into upgrading the backend infrastructure running these models over the last year or so. There have also been efforts from WMF teams to invest in building newer language-agnostic models that calculate how likely an edit is to be reverted, something that is being used to try and build a WMF-maintained equivalent of ClueBot NG on other wikis. What y'all are proposing is already kinda happening at the moment. Sohom (talk) 16:46, 27 May 2025 (UTC)
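
For anyone who wants to see what those scores look like in practice, the language-agnostic revert-risk model can be queried through the public Lift Wing inference API. The endpoint, model name, and payload shape below reflect my reading of the public documentation and may have changed; check the Lift Wing documentation on Wikitech before relying on them, and replace the revision ID with a real one.

<syntaxhighlight lang="python">
# Sketch: querying the Lift Wing revert-risk model. Endpoint and payload are assumptions
# based on the public docs; verify against the current Lift Wing documentation.
import requests

LIFTWING = "https://api.wikimedia.org/service/lw/inference/v1/models/revertrisk-language-agnostic:predict"

def revert_risk(rev_id: int, lang: str = "en") -> dict:
    """Ask the language-agnostic revert-risk model how likely a revision is to be reverted."""
    resp = requests.post(
        LIFTWING,
        json={"rev_id": rev_id, "lang": lang},
        headers={"User-Agent": "tone-check-discussion-example/0.1"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()  # response shape varies by model; inspect it rather than assuming fields

# Example: score an arbitrary enwiki revision ID (placeholder; substitute a real one).
print(revert_risk(1234567890))
</syntaxhighlight>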

Just want to register concern for the notion that we should cultivate and preserve a protective layer of stylistic complexity for fear that removing a common barrier to participation would benefit not just good faith new users but also bad faith new users.
I get adversarial [technical] asymmetry and the Red Queen effect in the context of a digital arms race of sorts -- I'm not saying I can't fathom why anyone would oppose such a tool. But tone is such a frequent problem for good faith new users. I cannot tell you how many hundreds of newbies I've interacted with who struggled to understand the proper way to write. Students used to writing class papers, professors used to academic writing, artists used to flowery writing, etc. They're not spammers, just members of the public who aren't used to our very particular style. IMO the conversation should be about brainstorming how to deploy such a tool not whether it's worth doing. For example, yes, it could be part of the editing process, catching tone problems before they're saved, but it could also come afterwards. It could be similar to the other bots we have that stop by a user talk page and say something like "hey, I noticed you just made this edit. Thanks! It looks like there are some possible tone problems you may want to address". It could be logged, as others note above, which seems like a potential boon to recent changes patrollers. Access could be granted through a user right we grant people if they seem to be acting in good faith. I don't know what the right answers are, but it seems like there are a lot of possibilities here and I don't think a flat "no" is the right call. — Rhododendrites talk \\ 22:55, 24 May 2025 (UTC)
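
As a rough illustration of the "come afterwards" variant described above (a bot that leaves a note on the editor's talk page rather than intercepting the edit), here is a minimal Pywikibot sketch. It assumes a configured Pywikibot environment with a bot account; the user name, diff ID, and wording are placeholders, not part of any existing bot.

<syntaxhighlight lang="python">
# Illustrative sketch only: post a gentle "possible tone issues" note to a user's talk page.
# Assumes Pywikibot is installed and configured (user-config.py) for a bot account.
import pywikibot

site = pywikibot.Site("en", "wikipedia")
talk = pywikibot.Page(site, "User talk:ExampleNewEditor")  # placeholder user name

notice = (
    "\n\n== Possible tone issues in your recent edit ==\n"
    "Hey, I noticed you just made [[Special:Diff/1234567890|this edit]]. Thanks! "  # placeholder diff ID
    "It looks like there are some possible tone problems you may want to address. ~~~~"
)

talk.text = talk.text + notice
talk.save(summary="Bot: notifying editor about possible tone issues (sketch)")
</syntaxhighlight>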

:+1 to your entire comment. We need to figure out how to help new users who may not understand that saying "uses select ingredients" (continuing off Tamzin's example as edited by ChatGPT in my "experiment" above) is not acceptable on Wikipedia... while also not enabling spammers. I agree that a "no" isn't the right call, since the cat is already out of the bag on people being able to use AI/LLMs to help them "de-spammify" their text. -bɜ:ʳkənhɪmez | me | talk to me! 23:31, 24 May 2025 (UTC)

As someone who does a lot of anti-spam work (primarily on Commons) I actually disagree with the backlash to this tool pretty strongly for many reasons (none of which have anything to do with AI). Namely:

  • The whole premise assumes that spam like #1 gets regularly caught because it looks like spam. No doubt some of it is caught but by no means is it all caught and some of it sticks around for years. The assumption, I guess, is that people are going to be regularly searching for these promotional phrases to nuke the remaining spam from the site. But either people aren't doing that or aren't doing that enough, because I find it just all the time, and likewise it shows up on AfD all the time. (The problem is even worse on Commons.)
  • Given that, #2 is an objective improvement over #1. The point of SEO copy is to be promotional and get their product associated with shit like "leading," "best," etc. The point of Wikipedia is to present as neutral a point of view as possible. (This is not a binary. Less promotional > more promotional, always.) So removing verbiage like #1 is a win for us and a loss for them: it mutilates their keyword-optimized copy, and it makes Wikipedia look less obviously embarrassing.
  • The examples given actually make no difference whatsoever to the discoverability of spam. If someone is searching for the keywords "iconic" and "beloved" to spot spam then it doesn't make any difference where they are in the post or who they're attributed to. "Select ingredients" is maybe an improvement, but mostly because the phrase isn't used enough by anyone (spammers or not) for it to be worth a search. That "improved" article also removes any claims whatsoever of notability and arguably puts it into obvious speedy/prod territory.
  • One of the most reliable ways to discourage any kind of malicious activity -- as Isaacl mentioned -- is to increase friction. This is why ChatGPT is so useful to spammers: it removes the friction associated with having to actually get someone to write the spam. Some people will forge on anyway, especially if they are paid to do so, but some people will give up, especially if they are automating some parts of the process.
  • Some articles may be written in a promotional fashion, but the company involved might still be notable. A common case of this: when a company has become notable for negative press, and the company hires some kind of reputation management firm to write a glowing article that doesn't mention any of that negative press. Deleting the article is a better outcome for them than having it turned into an actual encyclopedic article about how they became mired in notable scandal.

Not that I think this is going to be some massive improvement. I tend to agree with the people arguing that people will just keep using ChatGPT instead of learning a wiki tool. But it's nowhere near the end of the world, and all the time spent discussing this would be better spent tracking down extant spam. Gnomingstuff (talk) 20:29, 25 May 2025 (UTC)

@Sohom thank you for sharing this clarifying context about the origins of Edit Check. And @Fuzheado, I think you are spot-on in naming the value in us sharing details on mw:Edit check/Tone Check (and the soon-to-be created en.wiki page) about how Tone Check has come to be.

While we prepare documentation about Tone Check to ground us all in how the feature is currently implemented and what we (collectively) still need to figure out, I wanted to build on what @Sohom Datta shared above by offering more information about the broader Edit Check project, of which Tone Check is one part.

As you consider the below, there are a few things we'd like to learn from y'all about Tone Check and the Edit Check project:

  1. As @Novem Linguae and @CaptainEek noted [1] [2], many signals exist to detect spam/destructive edits. What signals do you notice yourself using most? How/where do you monitor those signals? E.g. Special:RecentChanges, particular project pages/noticeboards, etc.
  2. Abuse Filter, like Edit Check, offers people automated feedback about the edits they're attempting. What AbuseFilter features/controls do you value? Further, which of these kinds of features/controls have you not yet seen implemented and/or planned for Tone Check?
  3. How – if at all – have you noticed the editing behaviors of people you assume to be acting in bad faith evolving in response to AbuseFilters?

{{collapse top|title=Background: Edit Check}}

Edit Check is a ~2.5 year old initiative meant to simultaneously:

  1. Reduce the moderation workload experienced volunteers carry by enabling them to more effectively prevent and discover damaging edits
  2. Support new(er) volunteers (≤100 cumulative edits) acting in good faith to publish constructive edits by meeting them with actionable feedback about Wikipedia policies while they are editing.

At present, 3 Edit Checks are deployed and 2 are under active development. All 5 Checks have been, and need to continue to be, shaped through conversations like the one we're having here.[3][4][5][6][7][8][9][10]

Further, all Checks are implemented in ways that enable volunteers, on a per-project basis, to explicitly configure how they behave and who these Checks are made available to. For each Check, we also implement corresponding edit tags so that we can all evaluate the impact of each Check and how they are behaving on a per-edit basis. Note: defining what aspects of Tone Check are configurable on-wiki (T393820) is something we need y'all's help with, as noted above.

Deployed

  • Reference Check: prompts people to consider referencing the new content they are adding when they do not first do so themselves.
  • An A/B test of Reference Check concluded the feature caused an increase in the quality of edits newcomers publish without causing any significant disruption.
  • People shown the Reference Check are 2.2 times more likely to publish a new content edit that includes a reference and is not reverted within 48 hours.
  • The highest observed increase was on mobile where contributors are 4.2 times more likely to publish a new content edit with a reference that is not reverted within 48 hours.
  • As @Tamzin noted above, we cannot definitively say all of these edits are "constructive." Although, we are following AbuseFilter's lead and building Edit Check in such a way that as editing behaviors shift, you all can modify existing Checks, create new ones, and combine Checks (e.g. Paste Check + Tone Check + Reference Check), to make the system more effective and robust over time.
  • Reference Check is available at all Wikipedias except en.wiki.
  • Note: @Robertsky raised the idea of enabling Referencing Check at en.wiki and @Femke raised this idea more recently as well.
  • The Editing Team is now working on a deployment proposal. Please let me know if you'd like us to ping you when that's ready for review.
  • Link Check:  evaluates all external domains people attempt to link to against global (meta:Spam_blacklist) and local volunteer-maintained lists (MediaWiki:BlockedExternalDomains.json, MediaWiki:Spam-blacklist)
  • Reference Reliability: evaluates all external domains people attempt to insert against the same global and local volunteer-maintained lists that Link Check does.

Active development

  • Tone Check: uses a language model to prompt people adding promotional, derogatory, or otherwise subjective language to consider "neutralizing" the tone of what they are writing.
  • Note: the interface will make no suggestion of how to revise what someone has written. Rather, it makes people aware that something they've done might benefit from reconsideration and why. (A purely illustrative sketch of this flag-and-explain interaction follows this list.)
  • Paste Check: detects when people paste content from an external source and prompts them to consider whether this action could risk a copyright violation.
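For intuition only: Tone Check itself uses a trained language model, but the interaction it describes (flag the wording and explain why, without suggesting a rewrite) can be illustrated with a trivial keyword-based sketch. The word list below is hypothetical, not the feature's actual vocabulary.

<syntaxhighlight lang="python">
# Illustrative only: a toy keyword-based "tone" flagger. The production
# feature uses a language model; this only mimics the flag-and-explain
# interaction (no suggested rewrite is produced).
import re

# Hypothetical list of peacock terms, not the model's actual vocabulary.
PEACOCK_TERMS = ["iconic", "beloved", "renowned", "world-class", "legendary",
                 "finest", "leading", "best-in-class"]

def flag_promotional(text: str) -> list[str]:
    """Return the peacock terms found in the text, if any."""
    found = []
    for term in PEACOCK_TERMS:
        if re.search(rf"\b{re.escape(term)}\b", text, flags=re.IGNORECASE):
            found.append(term)
    return found

edit = "Chompsky's is an iconic, beloved brand renowned for the finest ingredients."
hits = flag_promotional(edit)
if hits:
    # Like Tone Check, we only point at the problem and explain why;
    # the writer decides how (or whether) to revise.
    print(f"This wording may sound promotional ({', '.join(hits)}). "
          "Consider whether a more neutral tone would fit Wikipedia better.")
</syntaxhighlight>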

We developed Edit Check because of the:

  1. Input we received from volunteers in the form of Wishes (2021, 2023, 2023, 2023, 2024)  and new feature requests
  2. Effort we've seen volunteers putting into creating maintenance templates, Abuse Filters (e.g. Edit Filter #686), scripts (e.g. CiteHighlighter.js by @Novem Linguae, HighlightUnreferencedPassages by @Phlsph7, copyvio-check.js by @DannyS712), bots (e.g. XLinkBot by @Versageek and @Beetstra, Valcio/BOTutor by @ValeJappo), gadgets (e.g. Unreliable/Predatory Source Detector by @Headbomb), proposals (e.g. Text reactions by @SD0001), edit notices, and documentation pages, etc. See more at mw:Edit_check#Background.

Taking a step back, when we think about Edit Check and its future, we think about it like a language – or an open-ended way for communities to encode the policies and moderation processes they converge on into editing interfaces – in ways that are effective at achieving two deeply interdependent and important outcomes:

  1. Reducing the moderation workload experienced volunteers carry
  2. Increasing the rate at which people who are new contribute constructively

Speaking personally, I think it's important to acknowledge that Edit Check is trying to do something difficult: to bring two outcomes ("1." and "2." above) into harmony that have historically been in opposition (to an extent). To do this effectively, I think we need more conversations of exactly this sort that help us align on a set of needs and help drive us towards solutions that are viable for new and experienced volunteers alike.

{{collapse bottom}}

Next steps

Now, in terms of next steps: the plan I shared on Friday is still in effect. Right now, we're working on updating mw:Edit check/Tone Check and creating en:WP:Edit Check/Tone Check so that, with a shared understanding, we can work together to figure out answers to the important questions you are raising here, like:

  • What aspects of Tone Check need to be configurable on-wiki?
  • What data are we logging about when and how Tone Check is presented and how people interact with it? Further, who has access to this information and where is this information accessible? @Berchanhimez helpfully raised this question here.
  • What risks are we tracking as they relate to Tone Check? What additional risks do we need to consider? How might we effectively mitigate and monitor these risks?
  • How might we experiment with Tone Check in ways that enable us to safely and meaningfully evaluate its impact on experienced and new(er) volunteers?
  • How exactly was this model trained and how will it learn/become more effective over time?

PPelberg (WMF) (talk) 21:53, 29 May 2025 (UTC)

:@PPelberg (WMF)

:1. I search for common words and phrases used in promotional or otherwise unconstructive edits -- which are not necessarily promotional words in isolation -- and keep a list of removed spam to find more patterns in. (Not all that dissimilar to machine learning, really.) My main focus is undetected edits, so I don't use Recent Changes. When searching for spam files on Commons, I also look at the uploader's edit history across projects. (A toy sketch of this kind of phrase search follows this comment.)

:The main takeaway I have noticed is that most undetected spam on enwiki is on userpages or userpage sandboxes, sometimes for years. I don't mess with searching/editing other people's userpages at all, but I suspect you could find a lot of spam by running searches for obvious promotional copy in that namespace.

:2. The abuse filters we have are fine -- there could be more filters, but the fundamental idea of introducing friction is good. The number one thing that will improve anti-spam, by an enormous margin, is not any kind of feature or plugin or tool but more people doing the work. Flagging for manual review is all well and good, but someone has to do the review.

:3. I realize that any remaining spam I find is the result of survivorship bias (i.e. if the spam is more subtle I'm probably not finding it), but when I have found, say, undetected spam uploads on Commons, their enwiki history is usually them attempting to create lasting spam, failing, then giving up and going away. Gnomingstuff (talk) 17:26, 31 May 2025 (UTC)
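A toy sketch of the phrase-search workflow described in point 1 above, under stated assumptions: the phrase list here is hypothetical, standing in for whatever patterns a patroller has built up from past cleanups, and the "pages" are dummy text rather than live wiki content.

<syntaxhighlight lang="python">
# Toy sketch: scan page text for phrases commonly seen in removed spam and
# rank pages by how many hits they contain. The phrase list is hypothetical.
from collections import Counter

SPAM_PHRASES = ["industry-leading", "one-stop solution", "passionate team",
                "cutting-edge", "trusted by customers worldwide"]

def spam_signals(pages: dict[str, str]) -> Counter:
    """Count how many known spam phrases each page contains."""
    scores = Counter()
    for title, text in pages.items():
        lowered = text.lower()
        scores[title] = sum(1 for phrase in SPAM_PHRASES if phrase in lowered)
    return scores

pages = {
    "Example Corp": "A passionate team offering a one-stop solution with cutting-edge tools.",
    "Monitor lizard": "Monitor lizards are large reptiles in the genus Varanus.",
}
for title, score in spam_signals(pages).most_common():
    if score:
        print(f"{title}: {score} spam-pattern hit(s) -- worth a manual look")
</syntaxhighlight>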

RfC: Adopting a community position on WMF AI development

{{User:ClueBot III/DoNotArchiveUntil|1751526069}}

{{RFC|prop|rfcid=679849B}}

Should the English Wikipedia community adopt a position on AI development by the WMF and affiliates?

This is a statement-and-agreement-style RfC. 05:05, 29 May 2025 (UTC)

= General =

== Discussion of whether to adopt any position ==

  • We have <s>two threads on this page</s> three open village pump threads about the WMF considering or actively working on deploying AI technologies on this wiki without community consultation: {{slink||WMF plan to push LLM AIs for Wikipedia content}}, {{slink||The WMF should not be developing an AI tool that helps spammers be more subtle}}, and {{slink|WP:VPT#Simple summaries: editor survey and 2-week mobile study}}. Varying opinions have been given in <s>both</s> all three, but what is clear is that the WMF's attitude toward AI usage is out of touch with this community's. I closed the RfC that led to WP:AITALK, and a third of what became WP:AIIMAGES, and what was clear to me in both discussions is that the community is not entirely opposed to the use of AI, but is deeply skeptical. The WMF's attitude appears to be the mirror image: not evangelical, but generally enthusiastic. This mismatch is a problem. While we don't decide how the WMF spends its money, we should have a say in what it uses our wiki's content and editors to develop, and what AI tools it enables here. As discussed in the second thread I linked, there are credible concerns that mw:Edit check/Tone Check could cause irreversible damage even without being enabled locally. Some others disagree, and that's fine, but it should be the community's decision whether to take that risk.{{pb}}Therefore I believe we need to clearly establish our position as a community. I've proposed one statement below, but I care much more that we establish a position than what that position is. This RfC's closer can count me as favoring any outcome, even one diametrically opposed to my proposed statement, over none at all. -- Tamzin[cetacean needed] (they|xe|🤷) 05:05, 29 May 2025 (UTC), ed. 14:35, 3 June 2025 (UTC)
  • :{{tqq|what is clear is that the WMF's attitude toward AI usage is out of touch with this community's}} ... with some in the community, while it's in touch with others in the community. That much should be clear by now.
  • :{{tqq|we need to clearly establish our position as a community}} ... we don't clearly establish a position as a community on anything, not even on basics like what articles Wikipedia should have, or what edit warring is. There are hundreds of thousands of people who edit this website, and this "community" is not going to agree on a clear position about AI, or anything else. Groupthink--a single, clearly established position as a community--is neither possible nor desirable. Levivich (talk) 16:59, 30 May 2025 (UTC)
  • ::PS: these sorts of things work better organically. If you want to get everybody on board on a website with hundreds of thousands of users, history has shown the best way to do that is from the bottom up, not the top down. Posting a statement on a user page and seeing if others copy it, writing an essay and seeing if it's promoted to a guideline... those kinds of approaches work much better than trying to write a statement and having people formally vote on it. Levivich (talk) 17:10, 30 May 2025 (UTC)
  • Hi everyone, I’m the Director of ML at the Foundation. Thank you for this thoughtful discussion. While PPelberg (WMF) has responded in a separate thread to address questions that are specific to the Tone Check project, I wanted to chime in here with some technical perspective about how we use AI. In particular, I want to highlight our commitment to:
  • Prioritize features based on what we believe will be most helpful to editors and readers. We aren't looking for places to use AI; we are looking for ways to help readers and editors, and sometimes they use AI.
  • Include the community in any product initiative we pursue, and ensure that our development practices adhere to the principles we’ve aligned on through conversations with the community.
  • Our technical decisions aim to minimize risk. We select models that are open source or open weight, host models on our own servers to maximize privacy and control, use smaller language models that are more controllable and less resource-intensive, and ensure that the features that use these models are made configurable to each community that sees them ([https://uk.wikipedia.org/wiki/%D0%A1%D0%BF%D0%B5%D1%86%D1%96%D0%B0%D0%BB%D1%8C%D0%BD%D0%B0:CommunityConfiguration/AutoModerator example]).
  • We also follow processes that make these decisions, and the broader direction of our work, as transparent as possible. We share prototypes of our ideas long before they're finalized, evaluate the performance of our models using feedback from community volunteers, publish [https://meta.wikimedia.org/wiki/Machine_learning_models model cards] that explain how our models work and include talk pages for community members to react, and have conducted a third-party human rights impact assessment on our use of AI (which will be published as soon as it's finalized). Model cards will start including a human rights evaluation for each new model in production, and we're now creating retraining pipelines that will allow each model's predictions to adapt over time based on community-provided feedback.
  • As we continue to refine and test new features like the [https://www.mediawiki.org/wiki/Edit%20check/Tone%20Check Tone Check] or [https://www.mediawiki.org/wiki/Reading/Web/Content_Discovery_Experiments/Simple_Article_Summaries Simple Article Summaries], our product team will share updates via project pages - please feel free to follow along there. CAlbon (WMF) (talk) 15:07, 30 May 2025 (UTC)
  • :@CAlbon (WMF), I took a look at the Simple Article Summaries feature (which I was unaware of). Based on the image on the top, as it currently stands the idea appears to be appending LLM-generated summaries to the top of articles. This feels at odds with WMF's AI strategy of prioritizing helping editor workflows over using generative content. I would expect a fair amount of push-back from the English Wikipedia community (including myself) if this feature were to be deployed in its current form. Sohom (talk) 16:02, 30 May 2025 (UTC)
  • ::Hi @Sohom Datta, this is Olga, the product manager working on the Simple Article Summaries project. Thank you for flagging this and checking out the project page. You’re noticing and calling out an interesting part of our work right now. While we have built up an AI strategy for contributors, we have yet to build one for readers. We think these early summary experiments are potentially the first step into our thinking for how these two strategic pieces will work together. To clarify, we’re so far only experimenting with this feature in order to see whether readers find it useful and do not have any plans on deploying it in this current form, or in any form that doesn’t include a community moderation piece. Not sure if you saw the moderation consultation section of the page where we describe this, and we’ll also be posting more details soon. One of the two next steps for the experiment is a series of surveys for communities (planned to begin next week) where we will show and discuss different options for how editors will be involved in generating, moderating, and editing these types of summaries. Curious if you have any suggestions on this. If these summaries were available - what do you think might be effective ways for editors to moderate them? Also happy to answer more questions here or on the project talk page. OVasileva (WMF) (talk) 17:24, 30 May 2025 (UTC)
  • :::I do believe that an AI strategy for readers is essential going forward – getting feedback from what readers expect from Wikipedia (separately from the expectation of editors) is difficult but extremely important. However, a reader-facing AI will also impact editors, as they will have to write articles while taking into account the existence of these summary tools and how they might present the content these editors are writing. That way, it could be interesting to give editors (and the community at large) some level of input over these summaries.{{pb}}A basic possibility could be to have an AI-generated first draft of a summary, that is then editable by editors. The main issue would be that this draft couldn't be updated with each new edit to the main article without resetting the process. To solve that, we could envision a model that takes a unified diff as input and updates the summary accordingly, working in sync with editors themselves. I would be very happy to help in this process, if any more input is needed! Chaotic Enby (talk · contribs) 17:37, 30 May 2025 (UTC)
  • :::@OVasileva (WMF), I think my major concern is that the screenshot shows the AI-generated text in the prime position, highlighted over and above the volunteer-written text, which is the core of the encyclopedia and should be the thing we are drawing attention to. With regard to the rest, I would refer to Chaotic Enby's comment above. I think we should first define an AI strategy, get community feedback, and then design the feature around that.
  • :::When it comes to the moderation of such secondary content, I think a good model to take inspiration from is the enwiki short description model, which is typically set using an enwiki template that triggers a magic word to set the values in the backend. Sohom (talk) 18:06, 30 May 2025 (UTC)
  • ::::Regarding {{tq|the screenshot shows the AI generated text in the prime position, highlighted over and above beyond volunteer-written text}}, one of my favorite essays is WP:Reader. I love it so much, I quote it on my user page: {{Quote|text=A reader is someone who simply visits Wikipedia to read articles, not to edit or create them. They are the sole reason for which Wikipedia exists.|author=Wikipedia:Reader|title=|source=|style=background-color: BlanchedAlmond}}
  • ::::When evaluating what goes where, all that matters is what's best for the readers. So we should be evaluating what goes where based on which text is better for them, not who wrote it. RoySmith (talk) 18:33, 30 May 2025 (UTC)
  • :::::I agree, but I feel like prioritizing LLM-generated text could rub some readers the wrong way, whereas a "show me a simplified LLM-generated summary" button would have the same effect without potentially alienating the portion of the userbase looking for an AI-generated summary of the article contents. Sohom (talk) 19:16, 30 May 2025 (UTC)
  • ::::::What I wonder here is, why does a reader come to Wikipedia? Active searchers will have clicked past their Google default summary, which already generally simply draws from Wikipedia. They will have chosen not to ask their chosen LLM app or site about the subject. Presumably they are less likely to want an LLM summary. Readers coming from links may not have made such choices, but I wonder if the differences in expectation are that different. They could also, if they want, place the URL in their favourite LLM and ask for a summary. Does natively integrating the function that readers can access dilute Wikipedia's USP? That said, we can often have problems with technical language. Previous attempts I've seen to fix this with LLMs have been quite poor, but as it improves there is something to a tool which editors can use to evaluate their work and perhaps identify the more complexly written parts. CMD (talk) 17:29, 31 May 2025 (UTC)
  • ::::::I think this is a good take, stacking with the idea that trust is built slowly and lost quickly. For readers who are ideologically opposed to AI, making LLM content the default anywhere important to them on the site is likely to violate their trust. More open-minded readers have the option of seeing the LLM summary. Die-hard LLM users will probably find their information elsewhere, and that is OK too. Czarking0 (talk) 05:39, 20 June 2025 (UTC)
  • :::::I object to this premise. Wikipedia is a human-curated encyclopedia that anyone can edit. All readers are editors, they're simply allowed to choose whether they edit or not, just as we are. Thebiguglyalien (talk) 🛸 21:57, 3 June 2025 (UTC)
  • :::::If our presumption is that LLM-generated text may be better for readers than human-written, we should just shutter the project and replace it with an AI-written encyclopedia. Zanahary 21:30, 6 June 2025 (UTC)
  • ::::::Note that this has been extensively discussed at WP:VPT and the project has been paused, with folks at the WMF planning on taking stock of the situation and returning later next week. Sohom (talk) 21:42, 6 June 2025 (UTC)
  • ::::Hey everyone! Thank you for engaging with this - this is exactly the kind of feedback we're hoping to get at this stage of the project. I'll be back after the weekend to speak a bit more on the strategy aspect. Before that though, @Sohom Datta - you helped me realize the screenshot we'd put on the page was pretty misleading. In that screenshot you can see the design for the browser extension experiment that we did. In general, we expect this design to be iterated on as we keep working on this. Most importantly though, it didn't show that the default state for the browser extension was for the summary to be closed by default. Basically, you only see the summary if you click on the dropdown to open it. We tested it this way for the exact reason you mentioned - we wanted viewing the summary to be the choice of the reader, rather than something we force on readers. In terms of the positioning we thought that having it close to the top of the page would help it feel more clearly separated from the article content (more like navigation), but we also explored a few other places to put the dropdown, such as below infoboxes (open to other ideas for placement as well! Like I mentioned above, we expect these designs to change a number of times as we explore this more). I've just added a design section to the documentation that I hope makes this a bit clearer, thanks again for flagging it! OVasileva (WMF) (talk) 08:43, 31 May 2025 (UTC)
  • :::::@OVasileva (WMF) Ooh, the mock-ups look more promising. Is the feature expected to be released as an opt-in browser extension? Or do we expect this to be part of the default experience of Wikipedia? (If it is, maybe a button to collapse the bar/opt out (like those present on the Page Previews feature) would be useful? Also, in its current state "View simplified summary" and "Unverified" are the most visible elements on the page, which seems to distract from the content itself.) Sohom (talk) 17:16, 31 May 2025 (UTC)
  • ::::::I second the points Sohom makes, although I think it can be a good thing to clearly state that the summary is unverified. On the other hand, having an "Unverified" warning sign on all articles could be seen as an indicator of lower encyclopedic quality, as readers might not immediately realize that it only applies to the summary.{{pb}}The precise date and author are a bit of clutter, however, and a simple "View machine-generated summary" could be better, maybe with a hoverable information sign alerting that it has not yet been verified, as well as an "X" button to allow users to remove the bar. Chaotic Enby (talk · contribs) 17:21, 31 May 2025 (UTC)
  • :::::::Thanks for flagging this. I see your point around "Unverified". I wonder if maybe we could show the "unverified" tag only once a summary is open and that way make the connection a bit clearer? We wanted to make it really visually obvious but I agree that it might be a bit distracting from the article content itself. I'll bring this to the team to discuss more. Like I mentioned above, the design is in no way final so this type of feedback is really useful right now! OVasileva (WMF) (talk) 12:44, 2 June 2025 (UTC)
  • ::::::::@OVasileva (WMF), thanks a lot! That would indeed make it much clearer. I would be very happy to give more feedback or help if needed, please keep me in the loop! Chaotic Enby (talk · contribs) 20:24, 3 June 2025 (UTC)
  • ::::::These are all good questions, thanks! The browser extension itself was just to allow us to have a lightweight way to experiment and get some initial feedback. We have a series of these small experiments coming up - we started with the browser extension, this week we'll be launching the surveys for communities that I mentioned above, where we'll be asking their thoughts on moderation. Next week we'll also be doing a two-week opt-in only experiment for mobile readers so we can see how the idea fares on mobile. From there, we'll see! We don't have concrete plans yet on what a final version of the feature would be, but I feel like we would start as opt-in only (or potentially a beta feature first for logged-in users), and on-wiki. Right now though we still need to discuss and build out the moderation piece, so any more permanent experiments or beta features are still blocked on that. OVasileva (WMF) (talk) 12:40, 2 June 2025 (UTC)
  • :::::::Agree that the final product should definitely be opt-in only. From what I understand, the surveys are mostly aimed at experienced users regarding moderation-related questions, right? Are other experiments planned for the wider userbase (including users without accounts) once a first moderation workflow is set up? Chaotic Enby (talk · contribs) 20:28, 3 June 2025 (UTC)
  • ::::::::Another thing I noticed: I just took the survey, but the "Agree" and "Disagree" columns get flipped in the fourth page, should that be fixed? Thanks a lot! Chaotic Enby (talk · contribs) 20:57, 3 June 2025 (UTC)
  • :::::::::In my case, the last page flipped from having "Very poor" on the right to putting "Strongly agree" there. I doubt the overall results on that page reliably reflect respondents' views. (Also, no back button? And no way to indicate that one idea is very poor, but marginally better than the others.) NebY (talk) 09:43, 4 June 2025 (UTC)
  • ::::::::::As is common with these surveys, it fails to provide an option for when you just have no idea how good or bad something will be, but insists on an answer. · · · Peter Southwood (talk): 12:31, 13 June 2025 (UTC)
  • :I feel like Simple Article Summaries (SAS) are contrary to a lot of things readers want in an encyclopedia. Readers come to the site trusting that we can give them all the information they want, while (crucially!) substantiating everything we say with sourcing and adhering to NPOV. Other readers may feel differently than I did when I decided to join this community, but without these two things, Wikipedia would be just another site.
  • :I've experimented with using AI on an encyclopedia. I've had it review my writing. I've asked it to write, with the intention to find shortcomings in my own ideas (if I forgot to say something). Just today, I dealt with a user who has made over a thousand edits who cited sources that have never existed, at what appears to be the direction of an LLM. There is absolutely no evidence I've seen, either lived or in my line of work at an AI company, which would lead me to believe that an LLM can stick to the facts. Even the output in your survey is fraught with hallucinations.
  • :Likewise, using LLMs in my line of work, I've noticed the personality fluctuate in dramatic ways with model updates. I've tried my very hardest to correct it with a custom prompt, instructing it to use prose and maintain a neutral, skeptical perspective, but even this has not worked. There is absolutely no evidence I've seen, either lived or in my line of work at an AI company, which would lead me to believe an LLM can write neutrally. The most obvious example is WP:NOTCENSORED, whereas LLMs very much are.
  • :Yes, human editors can introduce reliability and NPOV issues. But as a collective mass, it evens out into a beautiful corpus. With Simple Article Summaries, you propose giving one singular editor with known reliability and NPOV issues a platform at the very top of any given article, whilst giving zero editorial control to others. It reinforces the idea that Wikipedia cannot be relied on, destroying a decade of policy work. It reinforces the belief that unsourced, charged content can be added, because this platforms it. I don't think I would feel comfortable contributing to an encyclopedia like this. No other community has mastered collaboration to such a wondrous extent, and this would throw that away. Scaledish! Talkish? [https://xtools.wmflabs.org/ec/en.wikipedia.org/ScaledishStatish]. 01:41, 4 June 2025 (UTC)
  • ::I feel Scaledish's post strikes the issue square between the eyes. The developers of the SAS are missing the forest for the trees: the threshold questions should not presuppose that the tool is appropriate and beneficial to the work of this project and move directly to inquiries targeted at optimization and risk management. There's a real and concerning replication here of the primary reasoning error and cognitive bias that is driving the rapidly mounting societal harms of AI: that is to say, leaping directly from "this is now technically possible" to "therefore, let's do it!" SnowRise let's rap 21:26, 4 June 2025 (UTC)
  • :@CAlbon (WMF)
  • :The main disconnect here is that a lot of the feedback-gathering seems to be from a UI/UX perspective, where the actual problems involve content. The issue is not that the extension is buggy or hard to use; the issue is that the actual summaries, such as the ones in [https://gitlab.wikimedia.org/repos/web/web-experiments-extension/-/commit/55fdbbb3decdc9b95ae0ef00e98b1108ddc3a498.diff this list], are bad. They're not just flawed, they are so unfit for purpose and demonstrate such a mismatch between task and result that the whole project should have been scrapped or radically rethought the minute someone looked at them. Any serious moderation would throw most of them out for many reasons: claims that don't appear in the original text, inappropriate tone, outright falsehoods, politically controversial and/or legally problematic statements, and Western-centrism (in a feature intended for "global readers," no less). None of this should be surprising: they're the same problems that LLM-generated text has across the board.
  • :That list of summaries, while public, was not easy to find, nor was the example given to us representative of it. The community was told that these summaries "take existing Wikipedia text, and simplify it for interested readers," when what they actually seem to do is generate whole-cloth blurbs targeted at 7th graders ([https://gitlab.wikimedia.org/repos/research/simple-summaries/-/commits/main/utils_prompts.py?ref_type=heads#L71 per these prompts]) with titles like {{tq|Monitor Lizards: Big, Strong, and Wide-Ranging}} (and no, just filtering out the titles doesn't fix the underlying issue). We have had to piece the details together ourselves, and it took people about 15 minutes to find problems that apparently took the team several months to only partially notice. I'm sure there are some misconceptions in our interpretation of the various scattered documents and diffs -- which is to be expected, since we were told very little about the project. How is this being "as transparent as possible"?
  • :Actual transparency would provide, at bare minimum: the exact articles chosen and why, exact prompts used, the full list of output (as well as any intermediate stages or rejected output), the methodology used to evaluate that output, and so on. It would need to also happen much earlier in the process -- at least as early as September 2024, when [https://phabricator.wikimedia.org/T374635 sample summaries were available] and when there would have been time for people to tell you exactly what they have told you now. Gnomingstuff (talk) 00:47, 8 June 2025 (UTC)
  • ::The average American reads and writes at a 7th grade level.[https://www.snopes.com/news/2022/08/02/us-literacy-rate/] That's a major demographic we aren't thinking about. A person who is reading our article on zero because they don't understand the number will likely understand:
  • ::* {{xt|Zero is a number that represents nothing. It's special because adding zero to any number keeps that number the same. In math, it's called the "additive identity." Multiplying a number by zero always gives you zero, and you can't divide by zero.}}
  • ::much better than:
  • ::* {{!xt|0 (zero) is a number representing an empty quantity. Adding 0 to any number leaves that number unchanged. In mathematical terminology, 0 is the additive identity of the integers, rational numbers, real numbers, and complex numbers, as well as other algebraic structures.}}
  • ::So what if the first tone is more casual and treats the reader like a child? As a child I enjoyed reading Wikipedia, and if I could've opted in to a feature like this I likely would have. AI-generated summaries are a big part of how people consume information because they can be tuned to the individual reader's preferences. This is a clear step forward in democratizing access to information. (A rough readability-formula sketch follows this comment.) Chess (talk) (please mention me on reply) 05:33, 8 June 2025 (UTC)
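Reading-level targets like the "7th grade level" mentioned above are conventionally estimated with a readability formula. A rough sketch using the Flesch-Kincaid grade-level formula follows; the syllable counter is deliberately naive and a real pipeline would use a proper readability library.

<syntaxhighlight lang="python">
# Rough sketch: estimate whether a summary lands near a ~7th-grade reading
# level using the Flesch-Kincaid grade formula. The syllable counter is a
# crude approximation (counts vowel groups).
import re

def count_syllables(word: str) -> int:
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def fk_grade(text: str) -> float:
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    # Flesch-Kincaid grade level formula.
    return 0.39 * (len(words) / sentences) + 11.8 * (syllables / len(words)) - 15.59

simple = ("Zero is a number that represents nothing. Adding zero to any "
          "number keeps that number the same.")
print(f"Estimated grade level: {fk_grade(simple):.1f}")
</syntaxhighlight>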
  • :::Pre-generated AI summaries cannot be tuned to individual reader preferences. They are as static for the reader as our existing lead is. CMD (talk) 05:45, 8 June 2025 (UTC)
  • ::::Except that one could pre-generate a bunch of different summaries targeted to different reading levels and present the best one for the reader. Kind of like how we cache multiple resolution versions of images now. RoySmith (talk) 06:06, 8 June 2025 (UTC)
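As a sketch of the analogy in the preceding comment (pre-generating summaries at several reading levels and caching them the way thumbnails are cached at several resolutions): the names and the `generate_summary` callback below are hypothetical stand-ins, not any actual WMF pipeline.

<syntaxhighlight lang="python">
# Sketch: pre-generate one summary per reading level for a given article
# revision and cache them keyed by (article, revision, level), analogous to
# caching an image at multiple resolutions. generate_summary is a stand-in
# for whatever model or pipeline would actually produce the text.
from typing import Callable

READING_LEVELS = ["grade-5", "grade-8", "grade-12"]

def build_cache(article: str, revision: int,
                generate_summary: Callable[[str, str], str]) -> dict:
    """Pre-generate one summary per reading level for a given revision."""
    return {(article, revision, level): generate_summary(article, level)
            for level in READING_LEVELS}

def serve_summary(cache: dict, article: str, revision: int, level: str) -> str:
    # Fall back to the most advanced level if the requested one is missing.
    return cache.get((article, revision, level),
                     cache[(article, revision, READING_LEVELS[-1])])

# Usage with a dummy generator standing in for the real pipeline.
cache = build_cache("0 (number)", 1234,
                    lambda title, level: f"[{level} summary of {title}]")
print(serve_summary(cache, "0 (number)", 1234, "grade-8"))
</syntaxhighlight>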
  • :::::I mean, there's probably lots you can do if you want to approach the utility of having an llm browser extension without being quite as helpful as an llm browser extension. It still wouldn't be tuned to individual reader preferences. CMD (talk) 06:19, 8 June 2025 (UTC)
  • :::If the WMF wants to create summaries targeted at children, that's one thing, but this project was not described as something for children but for "interested readers," nor do the topics chosen suggest a child audience. The issues also go well beyond "the tone is more casual." Gnomingstuff (talk) 06:37, 8 June 2025 (UTC)
  • ::::I'm surprised they got that level of quality out of <1B parameter models (Flan-T5 and mt0).[https://phabricator.wikimedia.org/T374143] I wonder how many of these issues are caused by the resource constraints. Chess (talk) (please mention me on reply) 06:57, 8 June 2025 (UTC)
  • :::Just re-read this. The summary has other problems:
  • :::* 0 is not simply "the additive identity"; it's the additive identity of the integers, rational numbers, real numbers, complex numbers, etc. This is an important distinction in math that gets lost in the summary. If the concept of an "additive identity" is too complex for seventh graders -- which it may well be, this is college-level math -- then it shouldn't be in the 7th-grade-level summary. But an LLM doesn't care.
  • :::* "Represents nothing" is ambiguous. Someone who isn't fluent in English might see this and think "wait, it doesn't represent anything? Then what is it?"
  • :::* The part of the summary you didn't quote is even worse. It mentions 0's use in the place value system without ever actually saying it's talking about place value, and then moves on to "this system." What system?
  • :::Gnomingstuff (talk) 17:04, 8 June 2025 (UTC)
  • {{tq|Should the English Wikipedia community adopt a position on AI development by the WMF and affiliates?}} This doesn't seem like the right thing to RFC. Telling the WMF and the 193 affiliates what to work on is outside our jurisdiction, the same way that the WMF telling us what content to write or who should become administrator is outside their jurisdiction. –Novem Linguae (talk) 15:33, 30 May 2025 (UTC)
  • :This is kind of why I'm sitting at either "no opinion" or maybe something that comes out of the first draft I put below. Basically saying what our opinions are, requesting updates be provided directly to us (instead of us having to go search through Meta Wiki or MediaWiki Wiki or elsewhere for them), and that's that. -bɜ:ʳkənhɪmez | me | talk to me! 19:04, 30 May 2025 (UTC)
  • First, I appreciate having some WMF input here. If any WMFers are reading this comment, could you maybe opine on whether providing a relatively short statement to enwp directly (as I proposed below) would be feasible? I can't imagine it's not feasible, but I think that's a lot of the problem - people here don't want to have to go to multiple different websites (Meta, MediaWiki, WMF, etc) and watch different pages on all of them to know that a project is happening or there's an update to it. -bɜ:ʳkənhɪmez | me | talk to me! 19:07, 30 May 2025 (UTC)
  • Here's a statement I'm thinking of proposing:
  • :Wikipedia's greatest strength is the contributors that have dedicated their time, energy and enthusiasm to build "the sum of all human knowledge". Automation, including AI, has played a significant role in assisting contributors, with the best results coming when it is developed in a bottom-up manner. It is important that we continue developing new features and advances to help humans as technology improves, with the understanding that getting it wrong risks corrupting Wikipedia's soul.
  • This is more of a statement of principles than a specific demand/ask, but basically: bots, gadgets, and MediaWiki itself have been crucial in helping humans build Wikipedia. The best ideas were organically started by editors and made their way up through the tech stack rather than top-down. Getting the automation/human balance right is not an easy task, and the consequences of getting it wrong are massive. Thoughts? Legoktm (talk) 18:22, 31 May 2025 (UTC)
  • :@Legoktm I was with you up until "the consequences of getting it wrong are massive". On the content side of the house, we have WP:BOLD, which basically says "the consequences of getting it wrong are trivial". In the software development world, this is embodied by philosophies like Minimum viable product and Fail fast. Facebook famously stated this as Move fast and break things.
  • :The problem is that, as with so many software shops, projects out of WMF seem to take on a life of their own. I don't have any visibility inside WMF, but I'm basing that on what I see as an interested observer, and a veteran of many dev projects IRL. This is understandable. Once somebody (be it an individual dev, a product manager, a VP, whatever) has sunk a bunch of resources into a project, it can be difficult to say, "Hey guys, you know that $X I convinced you to invest in this? It turns out it was a bad idea and we should just chuck it and move onto something else". It really sucks to have to put on your annual performance report "Spent the last year working on something that never shipped and never will" if you're not working in an organization which rewards that sort of thing.
  • :So where I'm going with this is I'd like to see more of a culture where the consequences of getting something wrong aren't so massive. That would encourage more experimentation, which ultimately is a healthy thing. RoySmith (talk) 18:59, 31 May 2025 (UTC)
  • ::@RoySmith: thanks, and I agree with what you would like to see (and working bottom-up is the easiest way to do that IMO). The point I want to communicate about risks is that Wikipedia is ultimately a human project, built and shaped by humans. I support the use of automation when appropriate, but if you automate too much, then what you end up with isn't really a Wikipedia any more. The best case study being when a bot was allowed to take over multiple projects. I think we're too early in the Gen AI development cycle to understand what it fully means, but since folks are making pretty wide statements, I think we need to be honest about what the consequences could be if there isn't enough humanity in Wikipedia. Maybe there's a better way to express it? Legoktm (talk) 19:11, 31 May 2025 (UTC)
  • :::I think you put it quite well. Cheers, · · · Peter Southwood (talk): 12:50, 13 June 2025 (UTC)
  • I think this RfC is quickly turning into the sprawling kind, and that's not good. It would be quite unreasonable to expect a new participant to step in and parse through what we have discussed till now. Maybe the idea here should be to coagulate aligned positions into more succinct categories so editors can yay or nay. --qedk (t c) 13:36, 10 June 2025 (UTC)
  • :This. RoySmith (talk) 13:45, 10 June 2025 (UTC)
  • :I would be okay with my position being merged with that of Sodium, for instance, as his statement addresses all the points I consider important. Chaotic Enby (talk · contribs) 14:00, 10 June 2025 (UTC)
  • :I'm beginning to doubt the point of even having a statement. What's the point? Everyone who has pointed out the poor quality, factual inaccuracy, and legal risks (even after "manual review") has been completely ignored. I guess we're just "internet pundits" and the feature isn't "for us." Gnomingstuff (talk) 14:13, 10 June 2025 (UTC)
  • ::I know. The WMF will just do what they want to anyway. There's pretty much 100% opposition to this but I doubt it will stop them. It didn't stop them with <s>Visual Editor 2022</s> Vector 2022. There was consensus against that but they did it anyway. ~WikiOriginal-9~ (talk) 14:17, 10 June 2025 (UTC)
  • :::I will ask both of y'all to pump the brakes on assuming the worst here. To my understanding, the WMF folks have scheduled a call to discuss this issue (among other things) with the PTAC on the 25th of June; I would at least wait until then before coming to conclusions. Also, while you don't see any explicit official comments from folks, you can be pretty sure they are following these discussions. Issuing a statement is definitely the correct way to go, both from the POV of establishing boundaries and helping with identifying process deficiencies. Sohom (talk) 14:28, 10 June 2025 (UTC)
  • :::I don't think that's a good comparison because time has shown the new Visual Editor to be not such a bad idea. And I don't actually think the community opposition is the problem with this -- even if the community loved this feature it would still be a terrible idea. Instead, the quality of the summaries should speak for itself: informing adult readers that {{tq|Logic is like a superpower that helps us think and argue smartly. It's all about understanding how to make good decisions and draw the right conclusions.}} (For some reason this output is obsessed with calling things "superpowers.")
  • :::[edit conflict] I don't think I'm assuming the worst here. Not sure how I'm expected to know about a call on June 25th that to my knowledge was not publicized to anyone until now. Gnomingstuff (talk) 14:33, 10 June 2025 (UTC)
  • ::::Sorry, I meant to say Vector 2022. ~WikiOriginal-9~ (talk) 14:38, 10 June 2025 (UTC)
  • ::::(replying to wikioriginal9) Vector 2022 is also a bad example, just because in hindsight it wasn't necessarily a bad idea. There was a reason V22 did not have unanimous consensus to be disabled despite multiple RFCs on enwiki.
  • ::::(replying to gnomingstuff), my point was to assume that folks are listening (as opposed to not). I did not expect prior knowledge of the call. Sohom (talk) 14:47, 10 June 2025 (UTC)
  • :I would be fine with collating a few proposals. {{u|berchanhimez}}, thoughts on combining your proposal with mine as well? (I think it calls for effectively the same thing in asking for periodic updates.) Sohom (talk) 14:52, 10 June 2025 (UTC)
  • ::{{ping|Sohom Datta}} You have my permission to take any part of my proposal that you feel would help. I honestly don't know if I'd support such a strong statement as your proposal is, but at least yours gives an out, and if it was combined with my idea of "early and often" communication and collaboration with the community (for an example), I may be able to support it. Feel free to take any/all/none of my statement and I don't even need credit for it :) - after all, you're the one doing the work to try and get something workable put together. If there's anything I figured out from this discussion, I think having any sort of single statement (even a multi part one) get consensus is going to be a miracle, due to a large spread between views in more than one direction. -bɜ:ʳkənhɪmez | me | talk to me! 20:37, 10 June 2025 (UTC)

== Users who oppose adopting any position ==

  • I firmly oppose any sort of universal statement. The WMF is not here to support just the English Wikipedia. They are there to support all WMF wikis. And if they come up with a reliable, reasonable AI model that works on other wikis, we should not be speaking out against it before we see it. There seems to be a widespread opposition to "AI" in the world nowadays, without considering what types of "AI" it affects or what benefits it can provide. I would support only a statement asking the WMF to comment on the English Wikipedia to keep us updated on their efforts - but that should be a given anyway, so I do not consider that a "universal statement" like this. -bɜ:ʳkənhɪmez | me | talk to me! 05:37, 29 May 2025 (UTC)
  • :Noting here that, while I still believe no blanket/universal statement is necessary, I posted a "request to keep us better informed" style statement below for people to wordsmith and/or consider. I don't even know if I would support making such a statement yet, mainly because I don't know how feasible it is to expect the WMF to make announcements like that here however frequently it may end up being. But maybe such a statement would help assuage the concerns of some people that we aren't being kept in the loop enough or given enough opportunity to provide feedback during early stages of projects, for example. -bɜ:ʳkənhɪmez | me | talk to me! 00:24, 30 May 2025 (UTC)
  • Agree with Berchan here that I am skeptical of this idea as a whole. Loki (talk) 06:05, 29 May 2025 (UTC)
  • I agree with Berchanhimez: it is premature to start determining our positions on tools that have not yet even been properly developed. I think it's important to remember that the entire Wikimedia Foundation does not revolve around the English Wikipedia, and whilst I too am sceptical about such usage of AI, I don't think this is going to be the way to address it (assuming it would ever have any actual impact). – Isochrone (talk) 08:25, 29 May 2025 (UTC)
  • Strongly oppose EnWiki adopting any position; it needs to be a global RfC first before any other action can be taken, as the English wiki should not have veto power over all the other wikis just because of its popularity. Stockhausenfan (talk) 12:37, 29 May 2025 (UTC)
  • We can't say it's clear that WMF's views are out of touch with the community when we haven't heard from the community yet; it could be that there's a strong majority in support of WMF's position outside of EnWiki. (Not that I'm saying this is the most likely scenario of course.) Stockhausenfan (talk) 12:45, 29 May 2025 (UTC)
  • Cluebot is one of the earliest examples of the successful use of AI technology. While fear of new technology is human nature, we shouldn't give into it. I'd rather encourage the WMF to spend its resources on new editing technology (including AI-assisted) rather than some of the other stuff it's spent money on historically, so with regards to enwiki-WMF relations, this would be a step in the wrong direction. Levivich (talk) 15:45, 29 May 2025 (UTC)
  • :@Levivich: that sounds like a reasonable statement to propose? Legoktm (talk) 19:15, 31 May 2025 (UTC)
  • Oppose adopting any position at this time. Short of a collapse of industrial civilization, AI is not going away, and adopting policies and resolutions is not going to protect us from the harmful aspects of it. In my opinion, the Foundation and the community must remain open to exploring how we can use AI to benefit the project. - Donald Albury 18:23, 29 May 2025 (UTC)
  • AI is just a tool. What matters is what you do with the tool. In 10 years, even your washing machine and tea kettle will probably be running AI models. As AI slowly permeates all kinds of software, people will stop talking about it as if it were something special, rather than just another paradigm of building software. I find it exciting that WMF is embracing the future. {{tq|WMF's attitude toward AI usage is out of touch with this community's}} Indeed, but it's not the WMF's attitude that needs to change. Perhaps we as a community could try being less orthodox and conservative. – SD0001 (talk) 18:48, 29 May 2025 (UTC)
  • :{{+1}}. WP:AITALK and WP:AIIMAGES are, of course, reasonable policies. The adoption of those doesn't mean AI is bad, or that any kind of general statement to the WMF about AI is needed (whatever meaning that would possibly have).
:The below statement could have the effect of discouraging the WMF from exploring AI technologies and the possible productivity improvements they may bring, which of course would be detrimental. ProcrastinatingReader (talk) 23:15, 29 May 2025 (UTC)
  • :@SD0001: I think what you wrote would be a useful statement to propose. Legoktm (talk) 19:18, 31 May 2025 (UTC)
  • The use of AI is growing at a rapid pace and (for better or worse) I don't think it'll slow down anytime soon. Any statement or position adopted now may make us feel good in the short term, but won't be future-proof. Some1 (talk) 00:12, 31 May 2025 (UTC)
  • Oppose any statement. Really, guys, oppose tools that have not even been designed yet, and that you have no idea how they work or what actual advantages or disadvantages they may have? And make big announcements that you oppose them just because? And all just because of a provincial and superstitious fear of AI? You'll just embarrass yourselves with such nonsense, and turn Wikipedia into a laughing stock. Cambalachero (talk) 19:13, 31 May 2025 (UTC)
  • Oppose AI is a broad and general concept like algorithms, bots, programs and software which we already have in abundance. The WMF should obviously consult the community when introducing new features and usually does so. It's the applications and features that matter rather than the computing technology. Andrew🐉(talk) 08:30, 3 June 2025 (UTC)
  • Oppose, per several comments above, including Andrew Davidson, Cambalachero, and SD0001. Mike Christie (talk - contribs - library) 14:10, 3 June 2025 (UTC)
  • As I touched upon in another section, I don't think wordsmithing a proclamation specifically regarding one category of technology is the best approach. I appreciate that WMF developers, in general, haven't always engaged the community to a sufficient degree to understand its concerns regarding feature development, and I feel the WMF needs to collaborate more with the community. To help overcome a natural resistance to change, though, I think the community needs to be understanding of exploratory work and rapid prototyping of different concepts. The spirit of a wiki is to quickly try things and revise. Of course, how this works on a highly visible web site is much different than less visible sites, and the effect of reader-visible changes (even for experiments) must be carefully considered. isaacl (talk) 15:58, 4 June 2025 (UTC)
  • :"Quickly try things and then revise" is in an incredibly dangerous and ill-advised philosophy for this particular type of information technology. Our public-facing content gets automatically replicated in a variety of ways by a variety of actors, pushing it out into online ecosystems that we have no control over. The current state of art for generative AI as a nascent technology is absolutely riddled to the core with technical issues making it prone to producing deceptive, misleading information, and (very frequently) just outright hallucinated hogwash. Literally no LLM yet produced is incapable of producing such artifacts with alarming frequency. Integrating this technology with our systems at this moment in time (at all, let alone in the slap-dash, band-wagon-jumping-upon fashion that we are seeing from the WMF's devs), is utterly incompatible with this project's core aims and ethos. It is not a question of whether we will pollute the global corpus of factual information online if we fail to slow down the deployment of these tools: it is merely a matter of just how large the disaster will end up being because we failed to act with diligence. Thinking about the BLP implications alone sends a chilling spike into my core. {{pb}}Nor is this just a matter of our duty of care resulting from how we all assisted in placing this project at the heart of the dissemination of general human knowledge in the contemporary world. There are extra issues that are particular to this moment in time, because the project is in its most delicate moment in its entire history when it comes to potential external forces which would seek to control or suppress our coverage of many socially and politically pregnant topics. All it would take is for these summaries, or other LLM-generated content, to include a small handful of hallucinations about the "wrong" subjects in order to provide immense amounts of ammunition to people who leverage it to the earth for advantage in framing this project in a negative light. And again those hallucinations will happen--there's not the slightest question about it. {{pb}} The WMF should be presently conserving its warchest and focusing it's energies on the legal, social, and public image fight that will define the future of this movement that is going to take place in the next couple of years, not greenlighting technologies that are only likely to add kerosene to the fire. And meanign no disrespect, but there's a lot of people opposing a statement here that are clearly completely missing the point, accusing others of being a part of some sort of anti-AI moral panic from lack of understanding of the technologies while demonstrating their own misconception of the technical issues here. I for one just got done strenuously objecting to the blanket ban on AI images, and couldn't be more disappointed by the lack of nuance in the community's "solution" there. This proposal is clearly not about reactionary, irrational responses to the concept of AI proliferation. That's a larger issue and a bell that can't be unwrung by this community.{{pb}} What this proposal is about is creating a mechanism for this community to monitor, regulate, and control a very specific variety of such tools that we are uniquely positioned to control, by virtue of the community's inherent placement within the relevant systems and our understanding of the implications that will result if we do not exercise that restraint. 
It is especially necessary in light of the Foundation's eminently apparent laissez-faire attitude to these same concerns and "light speed ahead" attitude towards deployment of these tools, despite the fact that they have not gone through even the smallest fraction of the testing or safety analysis that they should have before we were even contemplating such a move. And I say "we", but part of the issue here is that the devs have set an unrealistic timetable for all of this with virtually no consultation with the community. {{pb}}I'm sorry, but a lot of people here seem so preoccupied with being part of the crowd that "gets it", and with not being "chicken littles" over an inevitable technological sea change, that they have fully Dunning-Krugered themselves into the conviction that the dangers here are being exaggerated. And that's mind-bogglingly short-sighted. The risks here are profound, and there will be no effective way to reverse the damage if we don't show the prudence we should have here, at the outset. SnowRise let's rap 01:19, 7 June 2025 (UTC)
  • ::I don't advocate for quick deployments of features in general, and I agree that there are plenty of areas where caution is key. I do think, though, that the community should keep in mind that it's very, very hard to get consensus in a large group, so requiring consensus to approve every step is a considerable bottleneck. I know some people think there are benefits to slowing down any exploration. All I'm saying is that the community should keep in mind the tradeoffs and where it wants to strike the balance between them. isaacl (talk) 02:05, 7 June 2025 (UTC)
  • :::As to balance, that's a reasonable position, but more of an argument against particularly worded statements than a solid basis for avoiding establishing any restraints. I don't see why, for example, Tamzin's statement of baseline principles would mandate such an extensive series of checks as you are imagining. What it comes down to is that we need to say something which communicates the following: {{pb}}"Hey, as we say around here, these are some pretty WP:BOLD and future-of-the-project-defining ideas that you are trying to push here, and we have some concerns. In fact, we really would have appreciated the community being formally consulted about this from the very beginning of blue sky planning, so that our comfort with this avenue could be explored before we suddenly became aware you had a half-finished alpha to launch. You all seem to have begged off the question of whether this was even an in-principle good idea for this project and skipped straight to putting us under the gun to launch a tool that could have deep consequences for this project and how it is perceived. So, meaning no disrespect to the good intentions of the Foundation and its developers, we're going to need to talk about some guardrails here."{{pb}}In short, I see nothing in the proposals so far that would preclude a reasonable amount of oversight that would still allow for the prospect of development. And if some projects did not get greenlit, or got held up for extended periods of time while their rough edges were knocked off...well, that's precisely the point. There are considerable risks attached to the tools being considered here: in terms of our responsibility for our content, for the reputation of this project, and for the neutrality we count on as a legitimate justification for the strength of our processes even in the most uneventful of times, let alone in the shadow of the legal/social/political shitstorm that we all know is coming just over the horizon for Wikipedia. We should be assured that both the value of such technical developments and the risk management are sufficiently where they need to be before we assume those risks. And apparently, given the recent evidence, we need to state that explicitly for the Foundation and its developers. SnowRise let's rap 08:55, 7 June 2025 (UTC)
  • ::::@Snow Rise I think I've said this somewhere below, but the checks and balances proposed in Barkeep's statement are already, for the most part, standard operating procedure at the Wikimedia Foundation. The recent Simple Article Summaries issue cropped up because of new technology being introduced, not the obvious shiny elephant in the room (generative AI) but the much less shiny and smaller elephant of A/B testing.
  • ::::Typically, most features at the WMF are planned and developed iteratively with multiple rounds of feedback from different parts of the community, before a progressive rollout where the feature is deployed in a staggered manner to wikis with more and more activity. Every single wiki where a rollout occurs receives a community notification from a WMF staffer; this is turned into a community consensus discussion if the wiki is a bigger wiki like enwiki. If a community reacts negatively to a rollout, the rollout is typically paused and either the wiki is skipped or significant changes are made to accommodate the wiki's demands. There are already significant checks and balances in the process. Tamzin's proposal requires that the Wikimedia Foundation obtain community approval (mind you, not feedback) before this process is started, i.e. when a project is first planned and developed, which, while it sounds good in theory, effectively means multiple consensus "approval" discussions at every step of something that is supposed to be an iterative process to begin with. (Imagine if WP:RFCBEFORE required a community-wide RFC to approve every single change to the "idea" already going to be proposed to the community as an RFC)
  • ::::In the case of the Simple Article Summaries project, the Reading/Web team decided to follow a rather idiosyncratic workflow. Instead of progressively testing their features across multiple wikis, and asking for feedback and community approval, the Reading/Web team decided to deploy their first iteration directly to the most populous wiki as an A/B test without any major community feedback cycles. The way the Reading/Web team expected to deploy the project was through a Central Notice that introduced a small amount of code to trigger a dialog that then opted the user into the experiment (see T387771). This is typically not how software development is done for most features on wiki. I think folks on the team failed to understand that even though the deployment was an "A/B test" in their eyes, the community and the readers would see it as deployment of a new feature (potentially without the community's approval). A/B tests are not something that is typically done on-wiki, since we prefer to use feedback cycles instead (I think A/B tests have been used for a total of two or three projects). It is a new technology and I assume the folks using it made a good-faith misjudgement and were not aware of how it would be perceived by users. I've already pointed it out internally, but we really shouldn't be doing non-trivial A/B tests for features without prior community approval, and I'm hoping the WMF will take that to heart after the negative reaction to this rollout and will modify its internal processes to accommodate that.
  • ::::TLDR: the existing process does have checks and balances and is for the most part sufficient to convert bad ideas into useful ideas that the community might use (I'm pretty sure that if Simple Article Summaries had gone through the typical feedback cycles it would have emerged as a different product once the community opposition to genAI summaries became obvious over those cycles). I see Simple Article Summaries as a good-faith accident by folks who did not understand the ramifications of how their test would be perceived. I see Tamzin's and others' "rejection" proposals as an introduction of significant bureaucracy into the software development process that will gut the Wikimedia Foundation's AI team and significantly reduce (and potentially stop forever) any future development of AI features by the team. Barkeep's proposal is the most amenable at the moment; however, in its current form, it is describing a process that is already in place. Sohom (talk) 14:21, 7 June 2025 (UTC)
  • :::::A couple of years ago, there was a big kerfuffle about some live testing the WMF was doing on enwiki. I was part of the small group that ended up meeting with the WMF to discuss this. As I recall, @Barkeep49 was also there; I don't remember who else. The end result was :foundation:Policy:Wikimedia Foundation Staff Test Account Policy. It's not a perfect analogy to this situation since that specifically dealt with the use of undeclared WMF accounts and that's not what's happening here (well, not unless you consider ChatGPT and Claude to be socks). But I think it would be useful to take a step back and read the broader message in that policy, which is twofold:
  • :::::# Testing is an essential part of development. A lot of testing can and should be done internally, but at some point, you need to get exposure to real users to fully understand the impact of a new feature.
  • :::::# Wikipedia (and most other WMF projects) are production systems. Doing testing on a production system is risky and thus to be avoided until you've exhausted the alternatives, and then only after appropriate discussion with the community.
  • :::::As I mentioned earlier, it is critical that we stay abreast of new technologies. That will inevitably involve making some mistakes and learning from them. Those who fail to adapt to a changing environment will inevitably discover that the environment doesn't care if they adapt or not. So the community just needs to get over that. On the other hand, I think the WMF didn't do a great job on the "only after appropriate discussion with the community" aspect. RoySmith (talk) 15:04, 7 June 2025 (UTC)
  • :::::I completely agree, and will add that the specifics of when A/B testing starts to need community consensus are not yet clear. We don't really want community consensus for tiny aesthetic changes, and massive features like generative AI definitely need such consensus. However, where we draw the line is something we have yet to establish.{{pb}}The best example that comes to mind is mw:Edit check/Tone Check, another MediaWiki feature for which an A/B test is currently planned (phab:T387918). It is a lot less flashy than this "in-your-face" generative AI, and, at first glance, we wouldn't expect consensus to be needed for a small "quality-of-life" feature. However, Tamzin raised major objections about the feature's consequences – even if a test wouldn't cause irreversible harm to Wikipedia's image, it might be something that the community would want to reach a consensus on beforehand.{{pb}}I'm not saying that Tone Check in particular is the issue here, but that we should have a meta-discussion on which level of changes should require a community consensus before deployment on the English Wikipedia – and, possibly, even a global consensus in those cases where implementation on one wiki might have consequences for others. Chaotic Enby (talk · contribs) 15:09, 7 June 2025 (UTC)
  • :::::Rushing to A/B testing was certainly part of the problem. There were others. The existence of the project indicated to the community that the WMF team doesn't think our leads are good and thinks it can make an AI that summarises complex subjects better than this massive and outstandingly successful community of editing volunteers. A survey focused on implementation suggested that rejection was inconceivable to the team. Sure, there's a question of why the team chose to and why they were permitted to carry out such testing, but framing this only in terms of testing enhances the perception of a gulf between community and WMF and the ever-present sense among the community that the WMF developers too often just don't get it. NebY (talk) 15:52, 7 June 2025 (UTC)
  • ::::::More communication is certainly needed, and probably from both sides. It could be great to have more avenues for interaction, as we often run into these situations where the community doesn't know the inner workings of the WMF (and at which stages they can give feedback!), and the WMF doesn't know the needs of the community. Chaotic Enby (talk · contribs) 16:10, 7 June 2025 (UTC)
  • :::::::In the first draft of my comment above, I also used the phrase "both sides", but then I thought better of it. The problem is that saying "both sides" reinforces the mindset that there's a competition here, and I would prefer that we not think like that. WMF and the editing community exist in a symbiotic relationship. RoySmith (talk) 16:20, 7 June 2025 (UTC)
  • ::::::::Yes, I completely agree with that – I was thinking of "both sides" in a non-competitive way (two communities working together), but it's true that the notion of "sides" might be a little too "us vs them". Maybe "both communities" could be a better wording? Chaotic Enby (talk · contribs) 16:23, 7 June 2025 (UTC)
  • ::::::@NebY For context, the original proposal comes from a specific subsection of WMF planning where they are trying to {{tq| Increase retention of logged-out readers by 5% on apps and 3% on web}} by creating "new experiences". (Something that has been previously requested by community folks as well) They had decided to tackle the fact that our leads are not good. (The WMF is not strictly speaking wrong here) The project was to see if {{tq|[the WMF] can make an AI that summarises complex subjects better}}. (And I don't think it's an inherently problematic thing to investigate, to be honest) That was the hypothesis, and it was mentioned during the planning process (search for the text "machine-generated summaries" in the planning document), which underwent community feedback until 31st May. (The document has an awful amount of corporate speak and I don't fault the community for not finding the problematic parts) The way they went about implementing their test for the hypothesis was flawed. (For all of the reasons outlined above) If the team had gone through the proper steps of iteration and community feedback, they would probably have figured out that AI summaries were not the correct place to go, and would have potentially landed on community-led summaries once their hypothesis was proved wrong. Jumping the gun with a direct "test on 10% of users" was the reason we ended up with a community confrontation. Sohom (talk) 16:55, 7 June 2025 (UTC)
  • :::::::I've worked in big places that had a multi-layer approach to rolling out experimental services. We used to start with "teamfood", where only members of your dev team would get the new feature, then proceed to "dogfood" (as in Eating your own dog food) where it was shown to all company employees.
  • :::::::Then we would move on to what I guess in other places might have been called a "public beta", where we rolled out the new feature to some percentage of external users in an A/B comparison test. The selection of users could be configured in all sorts of ways, ranging from those who met some specific requirements ("Don't show to users who are subject to GDPR") to totally random. Typically we'd slowly ramp up the percentage as we got more confidence in the feature. This also had the nice side effect of letting us better judge the performance impact on our systems. It also gave us a built-in Kill switch. If we found some drastic problem, we could immediately stop all testing without having to roll out a new deployment.
  • :::::::I wonder if we could do something like that here (in general for new feature rollouts, not specifically just this one). Have some way to identify "internal users", perhaps those with more than some threshold of account age and number of edits (perhaps with opt-in/opt-out layered on top). Let them get the feature first and solicit comments from them. WMF typically does rollouts on small projects first, and enwiki last; the problem is that by the time enwiki gets the feature, it's already a fait accompli. RoySmith (talk) 17:21, 7 June 2025 (UTC)
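:::::::For illustration, here is a minimal sketch of the kind of staged gating described above (kill switch, gradual percentage ramp-up, and an "internal users" threshold based on account age and edit count). The names and thresholds are hypothetical and don't correspond to any existing MediaWiki or WMF API; this only shows the shape of the idea, not an implementation.
<syntaxhighlight lang="python">
import hashlib

# Hypothetical illustration of staged-rollout gating; not an existing WMF API.
KILL_SWITCH_ON = False      # flip to True to stop the experiment everywhere, instantly
ROLLOUT_PERCENT = 5         # share of eligible users who currently see the new feature
MIN_ACCOUNT_AGE_DAYS = 90   # "internal users" threshold: account age
MIN_EDIT_COUNT = 500        # "internal users" threshold: edit count

def bucket(user_id: int, experiment: str) -> int:
    """Deterministically map a user to a 0-99 bucket so assignment is stable across visits."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100

def sees_new_feature(user_id: int, account_age_days: int, edit_count: int,
                     opted_out: bool, experiment: str = "example-feature") -> bool:
    if KILL_SWITCH_ON or opted_out:
        return False
    # Stage 1: limit exposure to experienced ("internal") users first.
    if account_age_days < MIN_ACCOUNT_AGE_DAYS or edit_count < MIN_EDIT_COUNT:
        return False
    # Stage 2: ramp up gradually by enabling only a percentage of buckets.
    return bucket(user_id, experiment) < ROLLOUT_PERCENT
</syntaxhighlight>
:::::::Ramping up would just mean raising ROLLOUT_PERCENT, and the kill switch disables exposure everywhere without rolling out a new deployment.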
  • :::::::::An opt-in A/B test group could be helpful in getting data from a broad set of users, while not affecting the vast majority of Wikipedia readers. It wouldn't be as good as a random selection (and, probably, to avoid affecting performance for non-logged-in readers, it would only be available to logged-in users), of course. isaacl (talk) 17:45, 7 June 2025 (UTC)
  • :::::::::I think the WMF is kinda-sorta building out capabilities to do that this year with their Edge Uniques projects. Sohom (talk) 17:49, 7 June 2025 (UTC)
  • ::::::::::Yes, I am aware of this work in progress. As far as I understand it, the capability enables A/B testing but doesn't allow for opt-in. An opt-in level may be helpful for some types of A/B testing that may be unduly disruptive for the entire readership. isaacl (talk) 17:57, 7 June 2025 (UTC)
  • :::::::::::Since it is based on browser cookies, making it opt-in is technically feasible (only set the cookie for people subscribed to the testing). It will of course be a skewed sample (mostly experienced editors), but that can definitely be a good in-between step before full testing, and allow for editor feedback.{{pb}}However, for big additions like Simple Summaries, I'm not necessarily sure going straight to A/B testing is the best option, even on a small sample. A/B is really effective when you are comparing two versions of the same feature – not when you are adding a whole new layer to your product, with consequences that you can't really measure with engagement metrics alone. Chaotic Enby (talk · contribs) 20:17, 7 June 2025 (UTC)
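:::::::::::Along the same lines, a rough sketch of an opt-in, cookie-based variant assignment might look something like the following. The names are hypothetical and this is not how Edge Uniques is actually implemented; it only illustrates the "only set the cookie for people who subscribed" idea.
<syntaxhighlight lang="python">
import random

EXPERIMENT = "example-optin-test"  # hypothetical experiment name

def assign_variant(cookies: dict, opted_in: bool):
    """Return (variant, cookies); the experiment cookie is only set for opted-in users."""
    if not opted_in:
        return None, cookies                   # no cookie, no experiment exposure
    cookie_name = f"ab-{EXPERIMENT}"
    if cookie_name not in cookies:             # assign once, then reuse on later visits
        cookies[cookie_name] = random.choice(["control", "treatment"])
    return cookies[cookie_name], cookies
</syntaxhighlight>
:::::::::::Because only subscribers carry the cookie, the sample skews toward experienced editors, which is exactly the trade-off mentioned above.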
  • ::::::::::::Sure, with development, it's possible. As far as I can tell, it's not currently within scope. My understanding of the current implementation plan is that it's handled on the edge caching servers, so it doesn't know if you're logged in or not, and has no access to any personal configuration. isaacl (talk) 22:16, 7 June 2025 (UTC)
  • ::::::::I think the problem is not that there isn't a feedback process, but that the feedback process is not working. [https://www.mediawiki.org/wiki/Reading/Web/Content_Discovery_Experiments/Simple_Article_Summaries/Usability_study This] is the feedback received from the study, which seems to have had 8 participants, some of whom are non-native English speakers, with the conclusion {{tq|We might have a hit on our hands with this feature.}}
  • ::::::::The most troubling part of this is the following: "The only participant who said they would not use it was a seemingly native English speaker who used university-level diction in their responses. They expressed familiarity with deep reading for research. For them, Simple Summaries weren't useful because they would just read the article. This may suggest an inverse proportion of educational attainment and acceptance of the feature. This also suggests that people who are technically and linguistically hyper-literate like most of our editors, internet pundits, and WMF staff will like the feature the least. The feature isn't really "for" them."
  • ::::::::This feels, frankly, like an invitation to disregard all feedback from us because "it's not 'for' us." It also feels patronizing to lower-literacy and non-native English speakers to decide that the factually incorrect and unencyclopedic AI "summary" content generated is good enough for them. Gnomingstuff (talk) 08:27, 8 June 2025 (UTC)
  • :::::::::It could be interesting to see how representative that sample is of the Wikipedia audience. Assuming that editors must be "hyper-literate", and that there is a rift between them and readers, feels at odds with Wikipedia's mission, and I am curious to see if there are statistics on reading levels in readers and editors. Chaotic Enby (talk · contribs) 13:01, 8 June 2025 (UTC)
  • ::::::::::@Gnomingstuff, To push back a bit: a) I wouldn't consider this to be a feedback stage, it is an initial survey and nothing else; b) I don't see that as an invitation to do anything here, but as a personal observation/editorializing on the part of the research team conducting the survey (and probably an accurate one, to be honest). Multiple countries where I would not expect folks to not know English that well show up among the top 10 countries that visited Wikipedia last month, including the likes of India, Brazil, Germany and the Philippines ([https://stats.wikimedia.org/#/en.wikipedia.org/reading/page-views-by-country/normal|table|last-month|(access)~desktop*mobile-app*mobile-web|monthly see this table]). I would like to encourage folks to still assume good faith toward WMF staffers and not assume that they had pre-emptively decided to ignore consensus.
  • ::::::::::To respond to @Chaotic Enby, I think that study would be a hard thing to conduct, primarily because we prefer to preserve the anonymity of our readers. However, if you look at the graph above and compare it against this graph of editor representation by continent (from a study conducted in 2022), you can see that there is a discrepancy in the ratio of contributors from Asia, Latin America and Africa vs the rest. Additionally, the latest community report states that over 81% of editors have at least completed high school, with 42% of folks having a Master's or a Doctorate. I think there is definitely a point to be made about a literacy discrepancy with the readers we are targeting. The problem is real; whether we use AI to fix the issue is a different question altogether. Sohom (talk) 13:48, 8 June 2025 (UTC)
  • :::::::::::@Sohom Datta, the other thing I have in mind is the question of whether editors are consciously writing for a lower reading level than their own. While that would be ideal, editors might unconsciously use their own level of understanding as a reference point, and having statistics on the reading level of our articles could be good.{{pb}}I know that reading levels of the original articles were evaluated in phab:T395246 using the Flesch-Kincaid Grade Level, but, as far as I know, that was only used as a quality metric for the summaries. I'm wondering if we should look at it from a more statistical point of view to evaluate whether there is a discrepancy to begin with, and how much, between our content and our readers. The JSON files should be available, so I could look into that if a similar thing hasn't been done already. Chaotic Enby (talk · contribs) 14:34, 8 June 2025 (UTC)
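:::::::::::If it helps, here's a rough sketch of how that comparison could be scripted, assuming the lead/summary text has been extracted from the JSON files as plain strings. The syllable counter is a crude vowel-group heuristic rather than the exact method used in phab:T395246, so treat the numbers as approximate (libraries like textstat do this more carefully).
<syntaxhighlight lang="python">
import re

def count_syllables(word: str) -> int:
    # Crude heuristic: count groups of consecutive vowels; at least one syllable per word.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def fk_grade_level(text: str) -> float:
    """Flesch-Kincaid Grade Level: 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * (len(words) / len(sentences)) + 11.8 * (syllables / len(words)) - 15.59

# Example: compare an article lead against its proposed simple summary
# print(fk_grade_level(lead_text), fk_grade_level(summary_text))
</syntaxhighlight>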
  • :::::::::::We are not meant to be targeting places where one "would not expect folks to not know English". It is not a problem to be solved that people who do not know English do not understand our articles well. To be sure, it would be nice if they did, but that isn't a target for en.wiki and anything aimed at that is going to be wildly misplaced. CMD (talk) 14:44, 8 June 2025 (UTC)
  • ::::::::::::@Chipmunkdavis, I don't quite understand your take here; we should always try to make our articles accessible to folks who might not have the same command of complex English that you or I have. Yes, obviously certain folks who do not understand English as well will be underserved, but that does not mean we shouldn't try, especially when some of those demographics make up a large portion of our readers. I know we are not there yet, but dismissing it as "English Wikipedia is not targeted at them" doesn't seem in line with our mission of being an encyclopedia in the first place. Sohom (talk) 15:12, 8 June 2025 (UTC)
  • :::::::::::::"folks who might not have the same command of complex English" is not exactly the same goalpost, and that will result in different considerations. English Wikipedia is targeted at English language speakers, being one of 342 currently supported language encyclopaedias, each intended to reflect speakers of that language (plus those that have been shuttered, because they did not reflect speakers of their language). CMD (talk) 15:25, 8 June 2025 (UTC)
  • ::::::::::::::@Chipmunkdavis I see where the disconnect comes from: when I said "would not expect folks to not know English" I meant "would expect to have a reduced competency in English/not have a very good command of complex English", sorry for the mixup :( Sohom (talk) 15:32, 8 June 2025 (UTC)
  • :::::::::::I'd like to and am trying to assume good faith, but the research team has already characterized readers who might be opposed to the feature as, among other things, "internet pundits." That's not the kind of phrasing one uses when talking about someone whose opinions they value. The whole thing also mischaracterizes why people might not like the feature. Most people here aren't opposed to a simple summary feature (as you can see in the discussions), but to the use of generative AI anywhere, and to the poor quality and poor accuracy of the AI-generated blurbs. Gnomingstuff (talk) 16:40, 8 June 2025 (UTC)
  • ::::::::::::@Gnomingstuff The person who wrote up the report is somebody from UX design who (I assume) was editorializing a fair bit because they were enthusiastic about their work and relatively new to the WMF (I think they joined in 2023), and thus did not expect this level of scrutiny of what was a pretty small report (I don't even know if they expected public scrutiny in the first place). Folks are human, and even stodgy academic research papers take victory laps at times that are ill-judged (there is a paper from 2008 claiming to have solved web security). Yes, in hindsight it was a poor choice of wording, but you still have the burden of showing that this sentiment was shared by the whole team (or, for that matter, by the rest of the leadership). Sohom (talk) 16:54, 8 June 2025 (UTC)
  • :::::::::::::You're right, I don't know what the whole team thought deep down in their heart of hearts about the report, but the team was happy enough with it to directly link to it on the [https://www.mediawiki.org/wiki/Reading/Web/Content_Discovery_Experiments/Simple_Article_Summaries Simple Article Summaries] overview page that they showed to us on the original village pump thread, and to quote from its findings and judgment of the "main issue to be addressed before release." They also seemed to have enough trust in the results to go ahead with the next stage of the project (the browser extension experiment), and to make that feedback-gathering less about whether the summaries were a good idea, and more about whether people clicked them. From what I understand, the survey that we got also didn't include any place to say we didn't want this feature, but I didn't see it so I don't know exactly what was asked. Gnomingstuff (talk) 17:14, 8 June 2025 (UTC)
  • ::::::::::::::That is a good point, I agree that there was a lack of a "we don't like this" option presented to us in the original message. I will bring that up internally/at PTAC as another potential area for improvement. I don't think the idea was to shut out community feedback, but with hindsight it does look bad that no such option was provided. Sohom (talk) 17:36, 8 June 2025 (UTC)
  • :::::::::::::::Thank you for doing that. I don't think the idea was to shut out community feedback or mislead the community, although in practice the community has been misled, and I haven't been impressed with the PR-speak surrounding the whole thing. I do think the issue is not so much the content of the summaries as the perception that they are good or trustworthy.
  • :::::::::::::::A lot of that could have been solved even without community involvement. We wouldn't be here had the WMF hired copy editors/fact checkers to go through each "summary" and flag anything with inappropriate tone, false statements, or statements that weren't in the original article. Basically, anything that violates the prompt. (I used to do this at my last job for alt text.) This would at least have caught a lot of the crap like how Tinder is "a fun and easy way to meet new people online" or how the GDP is "like a report card for a country's economy." If they really wanted to do it right they would also hire subject matter experts to fact-check them, since a lot of the errors are subtle. Gnomingstuff (talk) 17:52, 8 June 2025 (UTC)
  • :::::::@Sohom Datta, Yes, the team tackled a hard problem. Our leads are like democracy, not good but better than the alternatives, perpetually in tension between being readable and being right – checks and balances, if you like. Now we find that some initial summaries were said to be readable so a batch were sent to legal, who threw some out but didn't point out that others just weren't right. Was it no-one's place to say so, would no-one dare to say the emperor had no clothes? And in seeking to maximise reader retention percentages, are developers of AI summaries, tonechecks etc. and their management too deprioritising Wikipedia's USPs, such as the high standard expressed in Wikipedia:Five pillars, "All articles must strive for verifiable accuracy"? NebY (talk) 14:18, 8 June 2025 (UTC)
  • ::::::::I'm confused. Why is legal reviewing summaries? RoySmith (talk) 15:31, 8 June 2025 (UTC)
  • :::::::::In the original Simple Summaries experiment [https://gitlab.wikimedia.org/repos/web/web-experiments-extension/-/commit/8db793ba76e371d583c819a472c5fbb36605500b legal was asked to review the summaries that were to be shown to the user] (and threw away some of them). Sohom (talk) 15:41, 8 June 2025 (UTC)
  • ::::::::::That doesn't answer my question. Why are lawyers reviewing encyclopedia content? RoySmith (talk) 16:24, 8 June 2025 (UTC)
  • :::::::::::@RoySmith, Legal review is fairly standard (read: mandatory) for new deployments of features on any production wiki. There have been cases (the Commons Structured Data deployment comes to mind) where legal review required changing the publishing flow to include text that mentioned the license they were being published under. Given that the summaries here were statically hardcoded by the extension, I assume the legal team found them in the source code and decided to review them as well; I don't think the idea was to have legal review every summary going forward, or for legal to have any say in what the summaries would say once the actual extension was deployed. (At least, they don't mention any plans about it on their MediaWiki page. In fact, the extension did not even have ways for the enwiki community to moderate the summaries themselves, something that was mentioned on the MediaWiki page and is imo a critical oversight.)
  • :::::::::::A more community-forward/better plan would have been to allow the community to review before this freaking thing happened, but I think we've already established that Simple Article Summaries was a poorly thought-out experiment that should never have been deployed on a live wiki to start with. Sohom (talk) 16:41, 8 June 2025 (UTC)
  • :::::::::::I assume for the usual reasons lawyers do pre-publication reviews: check for copyright vios, defamation, incitement to violence, etc. Remember: these summaries aren't written by the website's users, they're written by the WMF (by software written/maintained/used by WMF employees) so none of the internet safe harbor stuff would apply. Not surprised it would get legal review. Levivich (talk) 16:41, 8 June 2025 (UTC)
  • ::::::::::::OK thanks, that makes sense. RoySmith (talk) 16:59, 8 June 2025 (UTC)
  • :::::::::::BLP concerns, libel, etc. [https://phabricator.wikimedia.org/T386493 Here's one discussion] from phabricator, and [https://phabricator.wikimedia.org/T389845 this link] lists a couple of subjects that are problems (note: this filter doesn't seem to have been implemented yet, and the sample summaries certainly haven't been filtered on it, otherwise stuff like Project 2025 and Jeffrey Epstein and antisemitism and suicide would not even have made it into the test summary set).
  • :::::::::::* Murders (crimes generally in the last 100 years)
  • :::::::::::* Terrorist acts (e.g. hotel mumbai shooting, Las Vegas sniper)
  • :::::::::::* Political parties that still exist
  • :::::::::::* Terrorist groups
  • :::::::::::* Mental health (suicide, depression)
  • :::::::::::* Controversial subjects (bomb making; chemical weapons)
  • :::::::::::Gnomingstuff (talk) 16:44, 8 June 2025 (UTC)
  • ::::::::@NebY, I see the Simple Summaries experiment as a tech demo to gauge sentiment rather than an opportunity to evaluate the summaries themselves. I assume the summaries were thrown away because the assumption was that the English Wikipedia would have the power to do the same when it came to the actual deployment of the feature. Also, legal review is a standard part of any feature being deployed onwiki; it was not something specifically added to bolster this feature's chances. I see no evidence for us to assume malicious/bad faith against the product managers who are leading these initiatives. The metrics used to evaluate these models (from an ML POV) are unfortunately not public, but I assume both that and the number of summaries that were thrown away would have factored into the final report as a potential downside when evaluating the experiment as a whole. To my understanding, this is not a case of willfully hiding evidence; rather, we are looking at the wrong thing in the wrong light and assuming intentions that just weren't there to start with.
  • ::::::::I particularly dislike your framing of Tone Check's product manager as a person who {{tq|deprioritises Wikipedia's USPs}}. The product manager has been extremely responsive onwiki, has taken Tamzin's concerns to heart, has opened phabricator tickets tracking work on the feedback they have received (see T395166 and T327563 and T327959), and has committed to interacting with the community, onboarding feedback, and answering folks' questions about the product through a community call hosted on Discord. Sohom (talk) 16:11, 8 June 2025 (UTC)
  • :::::::::I'm not suggesting malice or bad faith. I am perturbed that the reviewing of the summaries was limited. I have a lot of respect for the intelligence and general knowledge of staff in legal departments, so I would guess they would have noticed that there were also factual errors without legal implications but might not have felt it was within their remit to point them out. In such ways, alas, the emperor goes down the road without anyone calling out.
  • :::::::::I have no intention of accusing anyone of wilfully hiding evidence, I don't think they have and don't know how you've read me that way. This is an entirely different kind of process failure, or rather multiple failures, and they're disheartening because they suggest a cultural disconnect. I'm very glad that the Tone Check project manager is now putting such effort into engaging with the community; I remain concerned about the organisational culture and processes that got them into this situation. NebY (talk) 16:39, 8 June 2025 (UTC)
  • ::::::::::@NebY The review of the summaries was limited, agreed. That being said, I don't think legal is a team that is expected to be very engaged with the product and the community. Typically, feedback and community insights (at least in the context of WMF features) come from the engineers, other product managers and community liaisons (read: movement communications folks). Many of the engineers (and some product managers as well) at the WMF are English Wikipedia volunteers who have spent a significant amount of time volunteering onwiki (some of whom are admins on this wiki and have served as stewards). Also, yes, the culture inside the WMF is obviously not as open as that of Wikipedia, where anybody can object to anything, but in my experience it is a lot more open than at most other tech companies working in the web space. (I can speak from personal experience: I've met folks from the Growth team, the Community Tech team, the Moderator Tools team and even folks who are currently in charge, and every one of them was interested in hearing and onboarding criticism/feedback of the products their teams were developing.) Sohom (talk) 17:31, 8 June 2025 (UTC)
  • ::::::{{tq|There are already significant checks and balances in the process.}}
  • :::::Yes, I am aware. But very clearly more is needed here, when 1) at least one development team has demonstrated just how laissez-faire they can be about existing custom in this respect, and 2) the software in question here presents an unprecedented danger and a challenge to safe deployment that is very arguably not feasible to meet at this time.
  • ::::::{{tq|Tamzin's proposal requires that the Wikimedia Foundation obtain community approval (mind you, not feedback) before this process is started, i.e. when a project is first planned and developed}}
  • :::::Right, which is, beyond the merest shadow of a doubt, exactly how that should work for anything that would involve textual content creation by any generative model. We should never be hearing about anything that uses generative AI for the first time when the development team is already preparing to test on our production pipeline/public-facing content. There should be no question, in the minds of anyone working at any level at or for the WMF, of that happening again with anything involving a generative model producing article-space or adjacent content.
  • ::::::{{tq|which, while it sounds good in theory, effectively means multiple consensus "approval" discussions at every step of something that is supposed to be an iterative process to begin with.}}
  • :::::Again, I just don't see that mandate in Tamzin's statement of interests. Can you be more specific about what language makes you fear an endless series of checks? What I see is a clear requirement that the community be informed early of the concept and be given the opportunity to scrutinize the idea during the blue sky phase (and yes, if the community finds the concept too problematic in even the broad strokes, the opportunity to exercise its discretion over whether to allow development of the feature on-wiki to proceed).
  • ::::::{{tq|(Imagine if WP:RFCBEFORE required a community-wide RFC to approve every single change to the "idea" already going to be proposed to the community as an RFC)}}
  • :::::The problem with that analogy is that we are not talking about garden-variety RfCs, or anything of the sort, here. It is very difficult to overstate just how much damage could be done--and just how irreversible much of it could be--by not exercising caution in this area.
  • ::::::{{tq|In the case of the Simple Article Summaries project, the Reading/Web team decided to follow a rather idiosyncratic workflow. . . . It is a new technology and I assume the folks using it made a good-faith misjudgement and were not aware of how it would be perceived by users.}}
  • :::::I agree with all of this, but with a critical caveat: the issue is not merely how the approach of the team was bound to be perceived by the community; more important is the huge potential for damage with this tool, and the apparent blindness of the development staff to that fact as well.
  • ::::::{{tq|I see Tamzin's and others' "rejection" proposals as an introduction of significant bureaucracy into the software development process that will gut the Wikimedia Foundation's AI team and significantly reduce (and potentially stop forever) any future development of AI features by the team.}}
  • :::::The thing is, to my mind, that is by leaps and bounds the lesser of two evils here. The possible stalling of AI development generally has far less potential for catastrophic harm than unchecked use of generative AI in content drafting at this moment in time. Here's the very simple truth of the matter: every LLM trained for text production hallucinates--or, to put it in more apt terms for our purposes here, makes shit up. Left, right, and center. Constantly. {{pb}} Now this might be a feature that can be to some extent forgiven in certain use cases--or at least has less severe consequences in some. But when your business is very specifically providing factually reliable information, that is a hell of an unavoidable implication of a tool. LLMs simply do not have robust self-corrective mechanisms in this respect, and it could be some time yet before they do, as this is a consequence of the fact that LLMs do not employ logical syntax like traditional software but rather produce their output through weighted associations--as I'm sure you know. Point being, this is not a quality of such models that is going away overnight, nor one which the WMF software development teams are going to be capable of mitigating, by and large. {{pb}} So honestly, if our "bureaucracy" leaves us two years behind where industry leaders and other actors are racing forward (and not altogether without negative consequences, mind you), that might very well be the best possible thing that could happen in these circumstances. We have to recognize that these tools, as they exist today, are not fit for purpose for the work of this project. In fact, we might just be the single worst place to try to deploy generative AI text in the entirety of the contemporary information ecosystem online. {{pb}} And yes, I recognize that part of your concern is that all AI software may get lumped in with LLMs. But A) I don't see why that outcome can't be avoided with further discussion about what the ultimate oversight guidelines look like, and B) even if that were a result of community action, I would still judge it a small price to pay, relative to the potential harm of swinging too far towards a too-permissive attitude from the community on these issues. SnowRise let's rap 22:20, 7 June 2025 (UTC)
  • ::::::@Snow Rise {{tq|But very clearly more is needed here, when 1) at least one development team has demonstrated just how laissez-faire they can be about existing custom in this respect,}} - I don't know if I've made this clear, but the web team was nowhere close to deploying the feature; they decided to do a tech demo (for lack of a better word) on a live wiki, and whatever you would expect to happen in that case happened. Better policies are needed but it is a broader discussion and a problem for every feature rather than just AI. WMF internal processes need to change here.
  • ::::::{{tq|Right, which is, beyond the merest shadow of a doubt, exactly how that should work}} - Consensus is a very different thing from feedback. Consensus means a 30-day wait with a mandated binary outcome that is often final, even if it is based on an early version of a feature. That kind of rigid checkpoint can actually undermine the iterative nature of development. For example, if early concerns (say, lack of moderation) get fixed midway through development, but a significant portion of the community had already opposed it based on the first iteration, the project might be dead on arrival despite having meaningful improvements with no downsides. Feedback, on the other hand, is a two-way iterative street.
  • ::::::{{tq|We should never be hearing about anything that uses generative AI }} - Generative AI is not a one-dimensional technology; there are use cases in areas such as translation, classifying and highlighting text that might be too technical or gendered, and so on, that would not harm the encyclopedia.
  • ::::::{{tq|Again, I just don't see that mandate in Tamzin's statement of interests.}} - Ideas change iteratively throughout development. Let me take the case of a fictional Simple Article Summaries which (say) made it through its first round of review by the community, and it was agreed to use a specific kind of AI model that was (say) free from hallucinations. Now, imagine the product manager found that the agreed-on model architecture was not able to scale to being used across so many pages, requiring a rethink of the architecture and the use of a different model; previously that would have been a quick-ish switcharoo, but now it requires an RFC. Now, imagine that after that the product manager finds out that the model really doesn't like a particular set of Japanese or Chinese characters that are present in a bunch of ledes and needs to change the model architecture again; what would have been a day's worth of work now needs a 30-day wait. This is in stark contrast to Barkeep's proposal, which says "hey, you need to get feedback before deploying on enwiki", and which would have also caught the problems without potentially having 3 RFCs dragging a week of iterative development out to 3 months (this is a conservative estimate assuming only the English Wikipedia was targeted).
  • ::::::{{tq|more important is the huge potential for damage with this tool, and the apparent blindness of the development staff to that fact as well.}} - I agree that they shouldn't have done it; looking at the extension source code, it was nowhere near production-ready, and I'm not sure why they decided to experiment with it on production. I'm as interested as you are in figuring out what went wrong so that we can apply the bandage correctly and in a way that avoids such an outcome going forward for any product, not just AI ones.
  • ::::::{{tq|So honestly, if our "bureaucracy" leaves us two years behind where industry leaders and other actors are racing forward }} - The AI team doesn't only work on new features. They also help maintain the LiftWing infrastructure, which is used by almost every antivandalism tool to filter for more severe vandalism edits, and many of the growth features used in Special:HomePage. Gutting that team will mean that a large portion of this critical infrastructure will be left without a good maintainer or a steward. I'm not sure that's a good (or even desirable) outcome? Sohom (talk) 00:15, 8 June 2025 (UTC)
  • :::::::{{tq|Consensus is a very different thing from feedback. Consensus means a 30-day wait with a mandated binary outcome that is often final, even if it is based on an early version of a feature.}} I don't think this rigid RfC-style consensus is necessary in most cases. I see it more as semi-binding feedback: if there is a clear consensus in the community's responses (maybe just after a few hours or days), WMF researchers should take it into account to some extent. But it doesn't mean they have to wait for someone to formally close the discussion or be forced into a binary choice: in most cases, the community's opinion might be more nuanced, and an open discussion (rather than a rigid binary) can capture this better without forcing the researchers' hand in non-obvious cases. And, at the same time, researchers can keep iteratively working on the feature and gathering feedback on their updates from the community, making this a continuous back-and-forth discussion rather than a series of rigid RfCs.{{pb}}{{tq|Generative AI is not a one-dimensional technology; there are use cases in areas such as translation, classifying and highlighting text that might be too technical or gendered, and so on, that would not harm the encyclopedia.}} From what I understand, those latter two would be classification rather than generation? Granted, the same models are often trained on both tasks, but I don't think every use of language models necessarily counts as generative. Chaotic Enby (talk · contribs) 00:25, 8 June 2025 (UTC)
  • ::::::::{{tq|I don't think this rigid RfC-style consensus is necessary in most cases. }} I agree, but I fear that, given that AI is a controversial topic, discussions spiraling out and requiring multiple days is going to be the norm without folks who are confident in acting as discussion moderators/stewards (not to mention that detractors of semi-controversial features would be more likely to vote in subsequent RFCs than the folks who aren't invested in the feature but supported it, making it harder to get a "yep, you're good" outcome early on). I would potentially be advocating for a very different outcome if Tamzin's statement read {{tq|without first obtaining substantial feedback from potentially affected wikis}}
  • ::::::::{{tq|From what I understand, those latter two would be classification rather than generation? }} - You are right, classification would definitely not be generative; highlighting could be generative depending on the context. I was more equating generative AI with the transformer architecture. Sohom (talk) 01:06, 8 June 2025 (UTC)
  • ::::::::{{tq|Better policies are needed but it is a broader discussion and a problem for every feature rather than just AI. WMF internal processes need to change here.}}
  • :::::::That's a cogent point, but I for one have no problem with starting by creating a particular bulwark against overly credulous, incautious, devil-may-care approaches to this particular variety of issue, given its unique challenges and particularly pronounced and self-evident risks. If the community wishes to contemplate a more extensive reorientation of its oversight of WMF experimentation on the project, I'm sure we'll all have opinions. But one thing at a time, in an order of operations matching the scope of and cause for concern, is my take.
  • ::::::::{{tq|Generative AI is not a one-dimensional technology; there are use cases in areas such as translation, classifying and highlighting text that might be too technical or gendered, and so on, that would not harm the encyclopedia.}}
  • :::::::As someone with multiple dimensions of expertise on this subject, I actually think the issues with AI-generated translation are much more significant than have been recognized on this project to date. But I'm also capable of recognizing the ship has to some extent already sailed there. Again (and with sincere respect, I feel like we are going around and around in circles on this point), the kind of AI that I (and I think most others here with heavy concerns) am saying needs to be either outright proscribed for the immediate future, or at least considered with the heaviest of scrutiny and testing from the earliest planning stages, is this: LLMs and other ML-based models which generate natural language outputs for any public-facing content space on the encyclopedia. I think that's very specific and tailored, and entirely reasonable, given the objectives of our editorial work here and the known, common, and serious flaws of LLM-generated text, vis-a-vis constantly generating non-factual statements and fake sources (to name just the most serious manner in which such content is typically an issue, but hardly the only one).
  • ::::::::{{tq|Ideas change iteratively throughout development . . . potentially having 3 RFCs dragging a week of iterative development out to 3 months (this is a conservative estimate assuming only the English Wikipedia was targeted).}}
  • :::::::Again, I feel we're going around in circles to some extent here too, because I don't see why you think (for example) Tamzin's proposed statement would lead to such a cumbersome system, and I don't think I'm going to understand that belief until you explain what specific verbiage in their statement gives you that concern. But honestly, at the risk of sounding like a broken record, even if I thought that was a possible outcome, I would still favor the proposal over the current status quo and its lack of a reasonably spelled-out statement of what the community does not want to see in terms of generative AI without (and forgive the stolid bureaucratic speech) an epic shit ton of discussion and community consultation, with full transparency from the devs from the earliest planning stages. And honestly, such rules stand to save the devs much time and wasted effort on something that is never going to fly here, in addition to serving the project's needs.
  • ::::::::{{tq|The AI team doesn't only work on new features. They also help maintain the LiftWing infrastructure, which is used by almost every antivandalism tool to filter for more severe vandalism edits, and many of the growth features used in Special:HomePage. Gutting that team will mean that a large portion of this critical infrastructure will be left without a good maintainer or a steward. I'm not sure that's a good (or even desirable) outcome?}}
  • :::::::No, indeed not, I agree. But that also feels like a false choice to me. The community, I think, is more than capable of making the distinction between the former category of technical product and the latter. The general editorial community and our technical specialists have been striking this balance more or less capably for a long time. I guess I'd agree with the statement that this is getting to be a more difficult balancing act all of the time, but I don't think the proper solution to that issue is to start writing the WMF's dev teams blank checks. Especially when this episode has emphasized just how out of touch they can be about what is a useful feature vs. something that is terrifyingly ill-judged for an encyclopedia. SnowRise let's rap 05:13, 9 June 2025 (UTC)
  • ::::::::@Snow Rise, Tamzin's current proposal is ambiguous in its current state, due to the fact that it does not adequately define what happens when an "idea" changes. What happens if, within a single project, multiple iterations with different novel avenues of using AI are proposed during its lifecycle? Your reading appears to be the more positive view that the community will auto-approve the new novel avenues since the previous ones were also approved by community consensus. I, however, am taking the more pessimistic (and imo realistic) view that community members who are not for controversial features will try to wikilawyer the definition of "novel avenues" and will call for the team to participate in multiple long RFCs every time changes are made to the idea that substantially alter how the AI will be deployed. I find your assurance that the community will somehow turn into an oracle on AI software projects to be very unlikely, based on historical experience during my four-plus years working on Wikimedia software code as a volunteer developer. Sohom (talk) 06:31, 9 June 2025 (UTC)
  • :::::::::Well, I guess my perspective is that Tamzin's statement (or another similar one) should be the start of our regulatory efforts here, not a final guideline as to the particulars; it is, after all, heavy on aspirational statements of concern and the relative remit of the WMF development teams and the editorial community, and features next to nothing in terms of specific proposed processes. {{pb}} I fully agree that we'd need something that sets clearer standards and thresholds that would allow developers to be flexible (and indeed, transparent with the community) concerning their approach. We gain nothing by making them feel so nervous about making proposals that they hedge their bets and communicate in non-specifics for fear of triggering a backlash they can't recover from. {{pb}}But to my mind, none of that obviates the need to set out some broad-strokes expectations to start. Considering your response, and taking a look at Tamzin's wording with those thoughts in mind, it seems to me that the most operative/controversial phrasing is that concerning "novel avenues". I have been interpreting this as meaning we should have a handful of major checks for any new proposed software feature. I can see now that you (not unreasonably) believe that this might be interpreted as a call for consensus being triggered by any change in approach, even minor pivots between major benchmarks. But I think this can all get ironed out with further WP:PROPOSALS. {{pb}}I just think the danger of doing nothing is more pressing and significant than the potential stagnating effects of an initially somewhat broad statement. As others have noted here, I don't think we should let the perfect be the enemy of the good in this particular moment. That said, I don't want to minimize the issues you raise, and I see how you come to view them as major stumbling points from your particular experience and perspective as a developer. I just think there's a happy medium that can be reached, with the statement in question as the first stepping stone rather than the final word. SnowRise let's rap 06:55, 9 June 2025 (UTC)
  • ::::::::::@User:Snow Rise I spent a bit of time on this, and I don't think I can bring myself to support Tamzin's statement, since I don't find the assurances of "we will figure it out at some point" to be anywhere near sufficient. However, maybe we can split the difference and land on something like this?
  • ::::::::::{{tbq|At present, AI is integrated into the English Wikipedia in the contexts of antivandalism and content translation, with varying degrees of success. The use of AI for translation has been controversial and the WMF's use of generative AI as a proxy for content in Simple Article Summaries was unanimously rejected by the community. As a result, the English Wikipedia community rejects any attempts by the Wikimedia Foundation to deploy novel avenues of AI technology on the English Wikipedia without first obtaining an affirmative consensus from the community.
  • ::::::::::* Deployment here refers to the feature being enabled in any form onwiki, either through A/B testing, through the standard deployment process, or through integration into Community Configuration. Modifications made to existing extensions and services like the ORES extension or the LiftWing infrastructure must be at least behind disabled-by-default feature flags until affirmative consensus is achieved.
  • ::::::::::* Wikimedia Foundation teams are heavily encouraged to keep the community notified of the progress of ongoing initiatives through venues like WP:VPWMF, and to hold multiple consultations with affected community members throughout the development of the features.
  • ::::::::::* Wikimedia Foundation teams should also keep transparency in mind as they work on AI, both in communication with projects and by enabling auditing of its uses, especially on projects (e.g., use of a tool having a tag applied automatically, and open-sourcing and documenting onwiki the output, methodology, metrics and data used to train the AI models).}}

:TLDR of what this means: we ask for only a single hard requirement (consensus before deployment), and we heavily encourage folks to follow a set of general transparency guidelines when developing AI features. Sohom (talk) 09:14, 9 June 2025 (UTC)

  • :::::::::::Speaking for myself, that would satisfy all the major signposts I'd like to see in an initial statement of principles, and even adds some extra weight to important points, in addition to the carve-outs you made for the additional specifics on bottlenecks. I'm still in support of Tamzin's proposal in principle, but if you wanted to post this as an alternative/refined statement, I would give it my formal endorsement, and I think it stands a chance of getting robust support. SnowRise let's rap 09:26, 9 June 2025 (UTC)

::I would also agree, and it looks similar to my proposed statement below (which also focused on implementation rather than development, although with some nuance). Chaotic Enby (talk · contribs) 12:52, 9 June 2025 (UTC)

:::I've gone ahead and added this proposal as an option to the RFC. Sohom (talk) 02:15, 10 June 2025 (UTC)

• ::::As I stated, I don't think a statement about one single category of technology is the best approach. I think the community is broadly concerned about feature deployment in general. I think there need to be better feedback loops for all development. It's often useful to be able to discuss some ideas, and then develop some test concepts or prototypes to help focus more discussion. Personally I feel that it would be too constraining to require community consensus to be established for every early-stage idea. isaacl (talk) 17:22, 7 June 2025 (UTC)
  • :::::I agree with your point that this discussion shouldn't be focused exclusively on AI, especially since that is a topic that can easily bring up more heated emotions. Giving more opportunities for communication, and making editors aware of the already existing ones, should be a more general trend. It could be helpful to have more updates (newsletters maybe?) about current WMF research projects, written in a digestible, non-corporate-speak way. Chaotic Enby (talk · contribs) 20:23, 7 June 2025 (UTC)
  • ::::::There's a bulletin regularly posted to this page, with links to various other newsletters and bulletins, and a technical newsletter is regularly posted to the technical village pump. It's challenging because there's a lot of news and everyone has their own specific set of interests. The crowd-sourcing way would be for interested people to aggregate the items related to different domains, but this requires substantial sustained effort. Delegating to a group of representatives is one typical way to enable a crowd to have influence while managing the demands on people's time, but so far those in the English Wikipedia community who like to comment on these types of matters generally prefer not to cede their representation to others. isaacl (talk) 22:31, 7 June 2025 (UTC)
• ::My thoughts align with Isaacl here. I am not an advocate of the "move fast, break things" mantra (at least not onwiki), but this proposal is the equivalent of requiring an RFC-style super-majority approval for every single major edit in a contentious topic. That is simply not something I can get behind, having spent the last four years working on the software development side of Wikipedia, especially when it is applied to a field as broad as AI (which you appear to be confusing in your comment with the more narrowly defined and more controversial subset of technologies centered around generative AI). Sohom (talk) 02:59, 7 June 2025 (UTC)
• :::If I was unclear as to what I meant, let me address that immediately: I meant any generative AI software which autonomously creates textual content, including that generated in an effort to summarize our existing content. Any project or development that seeks to put such content before the eyes of the general reader in any capacity should be seriously questioned, rigorously studied for flaws, and in most cases presented to the community through a formal process well before the actual development begins in earnest. {{pb}} And after all, why should it be any other way? Last time I checked, this community is still responsible for the content on this project. The fact that the Foundation now, for the first time, has potential tools to generate content in substantial amounts without the need for the community does not mean that the classical remits of each arm of the project have now evaporated into thin air. Unless this community has decided to cede that privilege/responsibility. But for crying out loud (not directed at you Sohom, more a general appeal), surely that prerogative derives from a lot more in our movement's history and foundational organization than just the fact that the Foundation didn't have Wiki chatbots until now? {{pb}} Those observations aside, my response to your concerns about unwieldy bottlenecks is substantially the same as my reply to Isaacl above, so I'll direct you there, and summarize here: we don't need a million little checks, but we do need transparency from early in the planning stages and some degree of vetting. SnowRise let's rap 09:13, 7 June 2025 (UTC)
• Oppose per several of the above comments, particularly Andrew Davidson's and Some1's. Thryduulf (talk) 18:07, 6 June 2025 (UTC)
  • After reading the statements, I am here.--Ymblanter (talk) 10:57, 7 June 2025 (UTC)
  • No, I don't think we as a single community should be telling the WMF what to do and what not to do. If they develop such tools we'll get to decide whether or not to adopt them at that time, and the WMF has other ways for the overall community of Wikimedia project members to contribute to these sorts of discussions. en-wiki is the biggest of those projects and we shouldn't be throwing our weight around about every issue. Our purpose is to build an encyclopaedia not tell charities how to run their affairs. WaggersTALK 12:31, 9 June 2025 (UTC)
  • :"If they develop such tools we'll get to decide whether or not to adopt them at that time" probably should be how it works, but we know it's not accurate, given there was no such option presented to the community when the llm-generated article summaries feature was scheduled to go live. CMD (talk) 01:38, 10 June 2025 (UTC)
• Oppose for a variety of reasons. As others have stated, enWP is just one project. In addition, I do not think dividing WMF and enWP in advance of a problem is a good relational strategy. I also want to underscore that AI is really too vacuous and evolving a term for a resolution to be advisable. The meaning of AI is evolving, and it is ultimately a means to an end. We should take stances on specific ends that are desirable and undesirable and then work harmoniously with the WMF to achieve those stances. Stances on the means taken to get there should be specific to an actual ongoing means rather than blanket stances on what presumed means might be. The same applies to the positive case: AI is not an end in and of itself, and enWP should not make resolutions to support the development of AI per se. Czarking0 (talk) 16:05, 20 June 2025 (UTC)

= Statement proposed by Tamzin =

At present, AI is integrated into the English Wikipedia in the contexts of antivandalism and content translation, with varying degrees of success. There has never been community consensus for other uses, and even use for translation has been controversial. The English Wikipedia community rejects the use of Wikimedia Foundation or affiliate resources to develop novel avenues of AI technology without first obtaining an affirmative consensus from potentially affected wikis, and asserts the right to control what AI tools are deployed on this wiki.

  • A "novel avenue" is defined as a use case in which AI is not already used on WMF servers by some stable MediaWiki feature. Affirmative consensus for a novel avenue should be obtained through individual consensuses on each potentially affected wiki, or a global request for comment advertised on all of them.
  • All wikis should have the option to opt out of being used to develop an AI tool; to disable or decline to enable an AI tool; or, based on credible concerns of facilitating abuse, to compel the destruction of machine-learning data that has been gathered without local consensus.
  • Any person on the English Wikipedia seeking help in creating a dataset for machine learning should gain local consensus at the village pump for proposals before sending out any mass message or otherwise soliciting data. Those who do not do so may be found in violation of WP:NOTLAB.
  • The WMF is encouraged to devote more resources to areas that the community has requested support in.

-- Tamzin[cetacean needed] (they|xe|🤷) 05:05, 29 May 2025 (UTC)

== Discussion of Tamzin's proposed statement ==

  • Just to emphasize, the first bullet point is about what gets developed at all; the second is about what we enable. So for instance, the first bullet signals no objection to continued development of AI content translation tools, but that does not mean we are conceding that we must enable any new tools of that nature that get developed. -- Tamzin[cetacean needed] (they|xe|🤷) 05:05, 29 May 2025 (UTC)
• The bolded text is not going to work. The WMF simply cannot reach out for affirmative consensus to every wiki when it wants something, for practical issues as much as anything else. There are advantages and disadvantages to development strategies, but we should be careful not to mix the questions of development and deployment (the second part of your bolded statement). Many tools are available subject to community consensus; very few things are pushed onto the community (so few that the only recent one that comes to mind is VECTOR2022), and it is to mutual benefit that this distinction is maintained. (I only half-facetiously want to propose some bargain, like the community would approve of investing resources into LLMs when Visual Editor can use named references and handle more than one personal name convention.) CMD (talk) 06:03, 29 May 2025 (UTC)
• :That's why I left the option for a global RfC. Which I'd be fine with conducting on a timeframe closer to enwiki RfCs (usually one month) than many global RfCs (months to years). I don't think it's unreasonable to ask that, before the WMF decides to sink six or seven figures into some new kind of AI tool that may well run against the community's interests, they ask the community first, "Hey, is this a good idea?" The WMF are quite familiar with how to quickly alert tens to hundreds of wikis to the existence of a global discussion. Furthermore, it's not a new consensus for each tool, just for each area of expansion. -- Tamzin[cetacean needed] (they|xe|🤷) 06:29, 29 May 2025 (UTC)
  • ::I disagree with speeding things up. I imagine part of the reason those take longer is the need for translation; demanding that the process is sped up seems to be assuming that the result is a foregone conclusion. Stockhausenfan (talk) 12:59, 29 May 2025 (UTC)
  • I disagree with a blanket opposition to new AI uses. I also disagree with asserting a right to create needless bureaucracy. If the WMF does something silly, we can complain about that specific something. Toadspike [Talk] 07:38, 29 May 2025 (UTC)
  • I agree with Toadspike and CMD; I don't think a blanket statement such as this is appropriate, and I think enwiki is only one (albeit the largest) of the communities the WMF serves, and shouldn't try to dictate overall development. There's no reason we shouldn't provide input to the WMF, as threads such as these are already doing, but as Toadspike says, if the WMF does something silly we can deal with it then. Mike Christie (talk - contribs - library) 11:19, 29 May 2025 (UTC)
• A few months ago I obtained an AI-generated list of typos on Wikipedia. I went through much of it manually, fixed a bunch of typos, made some suggestions for additional searches for AWB typo fixing, but ignored a whole bunch of AI errors that were either wrong or Americanisations. I don't consider that what I did was contentious, but it obviously stops me from signing Tamzin's statement unless it is amended to accept AI-prompted editing where an individual takes responsibility for any actual edits made to Wikipedia. I'm also tempted to point out the difference between large language models, or the artificial unintelligence used to generate my possible typos (which is what the WMF seems to be talking about), and actual intelligence. Fifteen years ago at the very start of April 2010, I started a discussion as to how we should respond when artificial intelligence gets intelligent. But clearly the current discussion is about artificial unintelligence rather than artificial intelligence. ϢereSpielChequers 13:21, 29 May 2025 (UTC)
  • I already said above that I strongly oppose any statement at all until a global RfC is done, but if that doesn't gain consensus, I'll also add that I oppose this specific statement as well. The first part of the statement seems weird to me. Why would we oppose the development of novel avenues of AI technology? They are novel, so by definition we don't know what they do or how they work. The statement should at the very least be amended to replace AI with LLM, and get rid of the "novel avenues" comment. Something like "The English Wikipedia community rejects the use of Wikimedia Foundation or affiliate resources to develop large language models or tools that use them". I'm currently neutral on whether I'd support such an amended statement (if it were discussed in a global RfC), but the statement as it currently stands is a non-starter. Stockhausenfan (talk) 13:34, 29 May 2025 (UTC)
  • :Someone who knows more about the technology may be able to formulate a better statement that clarifies that it's not limited to text but also e.g. image models. But AI is such a broad, poorly-defined term that the way the statement is phrased currently makes it seem unnecessarily Luddite ("English Wikipedia opposes the development of novel forms of technology that may automate tasks that previously needed human input"). For example, a tool that checks whether chess game transcripts on Wikipedia contain errors could be interpreted as a "novel avenue of AI" that WMF cannot develop, even when it does not use any kind of LLM. Stockhausenfan (talk) 13:43, 29 May 2025 (UTC)
  • ::I think the point is that there is enough stuff that has been requested for a long time that isn't yet done, so spending resources on novel uses for AI isn't what those supporting this statement would like to see. ScottishFinnishRadish (talk) 13:58, 29 May 2025 (UTC)
  • :::The issue I have is just that I think we need to be specific about what "AI" is before we oppose its development. A program that can play perfect tic-tac-toe is popularly referred to as an "AI", despite being something that people would create in an introduction to programming class. So presumably a lot of tools that already exist on Wikipedia are "even more AI" than a tic-tac-toe bot. Stockhausenfan (talk) 14:07, 29 May 2025 (UTC)
• Most of the controversial uses of AI have been generative - which for me includes translation because it's generating new text - and the less controversial uses have been pretty much everything else. So that's the first distinction I think such a statement should draw. Secondly, I agree that consultations on every project aren't practical and that a global consultation won't be representative. So I would suggest the ask be something about enabling projects to opt out, and that tools shouldn't be developed that don't allow that opt-out. So, for instance, the language tool discussed above would have to be done in a way that a user inputs a page from a project and if that project has opted out the tool says "sorry I can't help you". Best, Barkeep49 (talk) 14:35, 29 May 2025 (UTC)
  • :I'm toying with similar ideas in my head, about what guidelines we could request. I would add ensuring that projects remain add-ons to the core software, that developers should be aware of existing community decisions on different uses of novel AI tools, and perhaps a step further to ensure that individual projects/communities need to opt-in. Wikipedia:Content translation tool may serve as a useful learning experience, I know that there has already been one AI tool developed to improve translations in a way that also translates appropriate wikicode. CMD (talk) 15:18, 29 May 2025 (UTC)
  • :Agree that the existing approach of projects opting out of WMF-built tools works better than having the WMF seek consensus from each wiki or run an enwiki-biased global RFC. Telling the WMF to destroy training sets created without local consensus, such as the [https://www.kaggle.com/datasets/wikimedia-foundation/wikipedia-structured-contents Wikipedia Kaggle Dataset], seems wrong because our concern should be whether a given feature is beneficial, not the mode of its creation. ViridianPenguin🐧 (💬) 21:13, 29 May 2025 (UTC)
  • In replacing the annual WP:Community Wishlist Survey with the constant meta:Community Wishlist, we were told that wish popularity would no longer be gauged because of the WMF's misunderstanding of WP:NOTVOTE, only for this month's update to tell us that it is working to bring back a mechanism to support individual wishes. This incompetent overhaul has left us without a dedicated time for brainstorming change, allowing the WMF to substitute its ideas for our own. Contrary to {{diff|Wikipedia:Village pump (WMF)|1292557928|prev|Sohom's reply}} implying that Tone Check was sought by the community, the VPR and Community Wishlist posts that prompted Edit Check were about warning against wikilinks to disambiguation pages and grammar errors, and the 2023/'24 Wikimania presentations were about warnings to include references when adding prose. Based on mounting frustration with the new Community Wishlist, the way forward in realigning the WMF's priorities seems to be reviving annual Community Wishlist Surveys, rather than this poorly attended replacement that replicates Phabricator's always-open ticket log. ViridianPenguin🐧 (💬) 21:13, 29 May 2025 (UTC)
• :To correct the record, my reply was about EditCheck, of which ToneCheck is a part. Sohom (talk) 21:29, 29 May 2025 (UTC)
  • ::Appreciate the clarification because that reply appeared in a chain of CaptainEek and Tactica criticizing Tone Check as out of touch, not Edit Check in general. Thanks for your technical insight across a multitude of replies here! ViridianPenguin🐧 (💬) 21:38, 29 May 2025 (UTC)
  • I'm not sure I understand the structure of this RFC, so I'll just put my comments here and hope that's OK. There's a few different things intertwined here, which I'll talk about in turn.
  • AI is just a tool/technology and it is not going away (see for example [https://www.nytimes.com/2025/05/29/business/media/new-york-times-amazon-ai-licensing.html?unlocked_article_code=1.K08.6Wba.251R3Wq2zdkm&smid=url-share this in today's NY Times]; 30-day time-limited link). We can bury our heads in the sand, or we can learn all we can about the technology. Personally, I think the latter makes more sense, and the best way to learn about it is to use it, make mistakes, and learn from those mistakes. So of course WMF should be investing in AI.
  • As others have mentioned, WMF is more than just enwiki. If anything, this conversation should be happening on meta.
  • Generative AI is clearly not good enough yet for use on enwiki. If we wanted to say "enwiki bans the use of generative AI text on this project", we could do that (and I'd happily endorse it). But other projects may feel differently, for reasons that make a lot of sense to them, so WMF should be supporting their needs.
  • I'm not sure why affiliates are mentioned here. The idea that the enwiki community could or should have any influence on how WP:WMNYC or any of the other affiliates spends their money is absurd.
  • RoySmith (talk) 21:30, 29 May 2025 (UTC)
  • :Yes this is an important point that I'd overlooked when reading the statement - why are we trying to influence how affiliates spend their money? @Tamzin would you be willing to remove the statement about affiliates from the RfC statement? Stockhausenfan (talk) 23:26, 29 May 2025 (UTC)
  • ::I would appreciate clarity on this as well. Obviously affiliates like WMNYC have never had the ability, or indeed the aspiration, to deploy or impose anything technically on English Wikipedia. Thanks for your thoughts, @Tamzin. Pharos (talk) 20:50, 11 June 2025 (UTC)
  • :::Affiliates have roles in deploying code on WMF servers, most notably WMDE on Wikidata, but also various affiliates on .wikimedia.org and .wikimania.org wikis. More broadly, I don't think that anyone affiliated with the Wikimedia movement—most of whom get money from the WMF to some degree or another—should be using their money to create AIs that will interact with Wikimedia wikis, without consent from the wikis. -- Tamzin[cetacean needed] (they|xe|🤷) 23:16, 11 June 2025 (UTC)
• :::::I will say that as enwiki we reallllly should not regulate what happens on other projects like Wikidata, and definitely not what happens on affiliates' internal wikis. (I know for a fact that there are affiliates experimenting with using Gemini AI to help build better first-draft OCR technology for Wikisource.) We as the enwiki community should not get to dictate what technologies they should use. Sohom (talk) 23:34, 11 June 2025 (UTC)
  • :::::I think you're misreading my proposal, Sohom. I never said that enwiki should dictate what happens on other projects. I said that affected wikis should. Enwiki should have a say in a hypothetical WMDE project that would deploy AI on Wikidata in a way that affects enwiki, but shouldn't have a say in one that wouldn't affect us. -- Tamzin[cetacean needed] (they|xe|🤷) 23:38, 11 June 2025 (UTC)
  • ::::::Oh right, yep I misread your statement above. Sohom (talk) 23:40, 11 June 2025 (UTC)
  • ::::::I would agree on this point, especially since wikis are interconnected to some extent. Say hypothetically that a feature was deployed on Wikidata to automatically generate item descriptions where it is missing. Since English Wikipedia retrieves many of its short descriptions from Wikidata, we (and other indirectly affected wikis) should have a say in this to some extent. For a more concrete example, there is ToneCheck potentially being used on one wiki to refine prose being written for another. Chaotic Enby (talk · contribs) 23:42, 11 June 2025 (UTC)
  • ::::What you're describing is I think appropriate for a limitation of WMDE's action with regard to developing (and deploying) features for Wikidata, on a platform it basically controls. But for the theoretical case of WMNYC using its own resources to develop an AI-adjacent feature for English Wikipedia, we would just be in the same position as if Internet Archive or Mozilla were doing the same - our next step would just be to propose adoption to the English Wikipedia community on this very Village Pump. Pharos (talk) 20:44, 12 June 2025 (UTC)
• AI is a poorly defined concept—now more than ever—but even so, using it to describe the anti-vandalism and translation tools we have now is a major stretch. They both rely on rather simple machine learning models, qualitatively different from generative AI, which is what most people think of nowadays. – Joe (talk) 07:52, 30 May 2025 (UTC)
  • :Not just poorly defined, but continually evolving (see Expert system for what the state of the art looked like 50 years ago). To make a blanket statement that we should "reject AI" seems reactionary. RoySmith (talk) 10:52, 30 May 2025 (UTC)
  • Someone who wishes to use Wikipedia articles to create a dataset to train an AI is free to do so, and does not require any special authorization. That's what it means to be published under a free license. Cambalachero (talk) 04:46, 1 June 2025 (UTC)
• {{ping|Tamzin}} your list of usages of AI seems pretty incomplete; we also use it for recommendation systems (e.g. User:SuggestBot, mw:Help:Growth/Tools/Add a link), analytics systems, Wikipedia:Content assessment, and copyright violation detection (User:EranBot). That's what I could think of off the top of my head; I'm sure there's more. (Not to mention people who use AI tools like Grammarly to assist them while editing, which have sometimes been recommended in documentation.) Legoktm (talk) 16:28, 1 June 2025 (UTC)
  • @Tamzin, I'm a little concerned about {{tq|1=The English Wikipedia community rejects the use of Wikimedia Foundation or affiliate resources to develop novel avenues of AI technology}}. I think it was Roy who first mentioned it, but I don't think we should be preventing affiliates, like local WMF chapters, from pursuing the study of AI if they want to. Though I do sympathize with the sentiment that we generally shouldn't be using generative AI on Wikipedia, there are some cases where AI for other purposes can be useful (as Legoktm mentions above). IMO, we shouldn't be restricting or discouraging affiliates from studying the usage of AI, particularly non-generative AI, if they want to. – Epicgenius (talk) 20:58, 5 June 2025 (UTC)
  • Even if I agreed with the statement, the wording itself is terrible.
  • As others have said, there's no definition of "AI technology".
  • The "preclearance" idea for community consent prior to development will kill innovation.
  • You're telling the WMF they can't even put together a prototype or a proof of concept without a lengthy community consultation on a half-baked idea.
  • According to [https://slate.com/technology/2022/06/wikipedia-administrator-election-tamzin.html Slate], you're a programmer. You should know the waterfall model is terrible. I would literally quit my job if I needed to do a 30-day RfC any time I wanted to start the development process on a new use case.
  • Your idea that enwiki can opt out of letting our data train AI models goes against the free content pillar.
  • All our contributions are licensed under the Wikipedia:Text of the Creative Commons Attribution-ShareAlike 4.0 International License.
  • That allows anyone to create a dataset from Wikipedia articles, content, talk page comments, etc. You agreed to this when you started contributing. This isn't a legal technicality; it's free culture and the idea there should be a community norm of asking for permission before using Wikipedia content is antithetical to our founding principles.
  • I also disagree because I believe the WMF should keep developing new technology to improve the encyclopedia. Our editor pipeline keeps drying up while the enwiki community dumps on any ideas the WMF has to modernize the website. These two things are correlated, despite common misconceptions. Chess (talk) (please mention me on reply) 04:37, 8 June 2025 (UTC)

== Users who agree with Tamzin's proposed statement ==

• I agree wholeheartedly with the statement. This is an interesting and novel RFC format; I like how it is structured. JuxtaposedJacob (talk) | :) | he/him | 11:58, 29 May 2025 (UTC)
  • :Request for comment discussions where only supporting views for proposed statements are gathered used to be more common (for example, the arbitration committee election RfC used to follow this format). They've gone out of favour at least in part because generally people find it easier to weigh consensus support when there are explicit "disagree" statements. isaacl (talk) 03:12, 30 May 2025 (UTC)
• Agree. I'm not watching this that closely but support this or similar statements. North8000 (talk) 13:33, 29 May 2025 (UTC)
  • agree wholeheartedly. AI integration should be done with consent of community. Bluethricecreamman (talk) 16:23, 29 May 2025 (UTC)
• I agree with this. Don't want the WMF wasting resources on this year's equivalent to the NFT craze. Remember when everything would be utopian because of blockchain? Simonm223 (talk) 18:37, 29 May 2025 (UTC)
  • Andre🚐 21:23, 29 May 2025 (UTC)
  • Yes, although I expect it to be ignored. Stifle (talk) 13:40, 30 May 2025 (UTC)
  • I would tend to agree, although my motivation for it isn't "AI bad". I see AI developments as new technologies that have the potential for disruption – positively as well as negatively. Rolling them out on a project as big as Wikipedia without the support of the community will likely exacerbate the negative effects, especially if we are not given time to prepare or adjust to it. I might write a separate statement (or an addendum) that emphasizes that it is not a reactionary "anti-AI movement", but one based on safety and alignment with our ideals as an encyclopedia. Chaotic Enby (talk · contribs) 17:07, 30 May 2025 (UTC)
• :I agree with you on the AI alignment, but as written, {{u|Tamzin}}'s proposal prohibits the WMF (and its affiliates) from even trying to develop (as opposed to deploy or train) any kind of AI technology. Adopting this proposal effectively means that any WMF manager or engineer (or affiliate) planning to use AI for anything (and let me remind you that AI in this context can end up literally being a dumb random forest classifier) will need to first ask for consensus from multiple communities before being able to implement their solution. This kind of bureaucracy will effectively gut any ability for the WMF to build any kind of AI technology, good or bad, making safety and alignment a moot discussion to have. Sohom (talk) 15:57, 31 May 2025 (UTC)
  • I generally agree, although the references to AI are unnecessary. This should apply to any new technology. MER-C 10:27, 31 May 2025 (UTC)
  • :I'm not sure what you mean by "any new technology"; that's a very broad net. RoySmith (talk) 12:19, 31 May 2025 (UTC)
• ::We've both been around long enough to recall the fiasco of (say) the Visual Editor rollout, Flow, and other (formerly, in some cases) unfit-for-purpose WMF software. AI is only another app. What I am seeing is just another manifestation of the same old problems - some product manager gets something built thinking they know the community's problems better than the community does, when they don't. MER-C 17:46, 31 May 2025 (UTC)
• I have noted my issues with this statement above, but Wikipedia:Village_pump (technical)#Simple summaries: editor survey and 2-week mobile study makes it very clear that a strong statement is needed. It is hard not to be blunt, but :mediawikiwiki:Reading/Web/Content Discovery Experiments/Simple Article Summaries should not be anywhere near the phase it is at. The summary they did all their testing with is quite bad, and it shouldn't have even reached the testing phase. The pushing ahead, including planning a two-week live trial for 10% of mobile readers on the basis of what is shown so far, is cause for alarm. Therefore, I am not going to let the perfect statement be the enemy of the firm and clear one here. I encourage others who were initially unsure or opposed to reconsider in light of the new developments. CMD (talk) 02:03, 4 June 2025 (UTC)
  • :Any closer may consider this to also serve as a support for any derivatives of Tamzin's statement generated below. It would not be productive to figure out which (including this one) I personally have the fewest nitpicks with. I remain in support of this proposal as well. CMD (talk) 10:45, 9 June 2025 (UTC)
  • Support Best option. Polygnotus (talk) 02:10, 4 June 2025 (UTC)
  • Support. Chipmunkdavis said it best: a strong statement is needed. LilianaUwU (talk / contributions) 04:54, 4 June 2025 (UTC)
  • Support I'm disappointed but not at all surprised that this is needed. The Squirrel Conspiracy (talk) 06:34, 4 June 2025 (UTC)
  • Agree I think CMD has it right. I don't want to quibble about minutiae, and I think this conveys my concerns well. DJ-Aomand (talk) 12:22, 4 June 2025 (UTC)
• Strong Support. There is a profound need for the community to set some limits and develop a mechanism for review of AI features before (indeed, well before) they begin development and deployment. This is a genie that we will find extremely difficult to put back in the bottle if we do not act with restraint and careful consideration from the outset. Our content has a vast reach, and is replicated throughout the internet in ways we typically cannot claw back after it enters that flow of information. We should not take lightly the primacy of our position in the online ecosystem for general information, built over decades upon the good name of our volunteers' work. Nor should we underestimate the degree of harm from misinformation that may arise from hastily developed AI "enhancements" to our processes and technical infrastructure. Tamzin's proposed statements of general interest and concern are well-considered and reasonable, and a fair roadmap around which to construct our broader policies in this area, which will by necessity need to evolve substantially and quickly from here. SnowRise let's rap 21:37, 4 June 2025 (UTC)
• Support the statement. AI and LLM models need to be handled with care on enwiki (on all wikis, really, but that's beyond our scope). --JackFromWisconsin (talk | contribs) 04:05, 5 June 2025 (UTC)
  • Support wholeheartedly. Compassionate727 (T·C) 17:35, 6 June 2025 (UTC)
  • Support per Chipmunkdavis. Bishonen | tålk 21:17, 6 June 2025 (UTC).
  • * Pppery * it has begun... 21:47, 7 June 2025 (UTC)
  • Support, emphatically. fifteen thousand two hundred twenty four (talk) 23:20, 7 June 2025 (UTC)
  • Support. Support even more the more I look into how this has actually been done and how sloppy (no pun intended) the execution has been. Gnomingstuff (talk) 23:39, 7 June 2025 (UTC)
  • Support. It is abundantly clear that the WMF is hopelessly out of touch with the needs of editing communities, and in particular has entirely failed to take note of serious concerns raised in multiple places regarding the destructive effects of the use (well-intentioned or otherwise) of AI/LLM technology in the Wikipedia context. The last thing we need is more of the same, from the WMF. AndyTheGrump (talk) 08:50, 8 June 2025 (UTC)
  • Support. This statement provides for the very minimum we should hold WMF to account for, essentially preserving community autonomy. ~Gwennie🐈💬 📋⦆ 01:09, 10 June 2025 (UTC)
  • Support - We need to hit the brakes hard here. I’d be in favor of pausing the research and development of all AI-related work at the WMF, much less deployment. My thanks to Tamzin for the work on this issue. Jusdafax (talk) 01:56, 10 June 2025 (UTC)
• :@Jusdafax, Stopping all AI-related work risks leaving our anti-vandalism infrastructure unmaintained (it has historically relied on old ORES models and is in the process of being modernized through the introduction of new Revert-risk models). I would be vehemently against us shooting ourselves in the foot here. Sohom (talk) 02:14, 10 June 2025 (UTC)

:::I have a bit of experience with reverting vandals over the past 15+ years. You want to get a handle on that particular problem, you change the rules regarding IP editing, which is where in my experience we Wikipedians are “shooting ourselves in the foot.” I’m of the opinion that the WMF has lost the trust of many rank-and-file editors over the years, and I speak as someone who was a volunteer in the San Francisco WMF offices in the early days and was a witness to what I will charitably term “bureaucratic bloat.” AI is a huge unknown… in my view. Jusdafax (talk) 02:59, 10 June 2025 (UTC)

  • Support per above - the WMF should defer to the community on such issues. 123957a (talk) 20:13, 12 June 2025 (UTC)

= Statement proposed by Stockhausenfan =

The English Wikipedia community rejects the use of Wikimedia Foundation resources to develop novel avenues of generative AI technology without first obtaining an affirmative consensus from potentially affected wikis, and asserts the right to control what generative AI tools are deployed on this wiki.

== Discussion of Stockhausenfan's proposed statement ==

  • I've already made it clear that I oppose making any statement at this stage, but I've made two changes to the original statement to fix what I found to be the two most concerning aspects - I clarified that it's specifically generative AI that is under discussion, and removed the reference to affiliates. Stockhausenfan (talk) 23:39, 29 May 2025 (UTC)
• :I'm not sure a statement is warranted here, but even if we must, this version is not it. As it currently reads, the statement explicitly forbids Wikimedia Enterprise from working with AI companies without explicit consensus on enwiki (companies that would otherwise just start scraping Wikipedia, increasing the load on our servers and causing more outages), and forbids the existence of initiatives like the Wikimedia Kaggle dataset (which was also created to lessen the load from AI scrapers). If we do need to make a statement, it should be something more direct like: The English Wikipedia asks the Wikimedia Foundation (and its affiliates) to seek community consensus before developing (or deploying) editor- or reader-facing features that make use of generative AI technology. Sohom (talk) 01:56, 30 May 2025 (UTC)
  • See my comment on berchanhimez's proposed statement regarding my views on the WMF investing in research. isaacl (talk) 03:07, 30 May 2025 (UTC)

== Users who agree with Stockhausenfan's proposed statement ==

  • Cremastra (uc) 20:26, 3 June 2025 (UTC)
  • I supported the Tamzin version above but I think any statement to pump the brakes on generative AI summaries or the like is better than no statement. Andre🚐 03:56, 4 June 2025 (UTC)
  • Support, noting I also support Tamzin’s statement above. Time for a dead stop. Jusdafax (talk) 03:04, 10 June 2025 (UTC)

= Statement proposed by berchanhimez =

The English Wikipedia understands there are both potential benefits and harms that can come from the use of AI, especially generative AI, on or for the encyclopedia. We also understand that the implementation of any form of AI on any WMF project should be supported by the local community, which requires they be informed about the proposed use and have an opportunity to provide feedback during all stages of development.

Therefore, we request the WMF immediately provide information on any project relating to AI that they are currently undertaking, considering, or planning. For clarity, "project" includes any study, investigation, development process, trial, model training, or any other similar activity that relates to AI and the WMF wikis, even if not explicitly related to or proposed to impact the English Wikipedia. Following this initial disclosure, we request the WMF to make a similar disclosure as soon as reasonably possible after any new project is initiated, approved, or otherwise begun, or any time there is any significant change in the status of a project, including but not limited to if it is cancelled, being deployed on any WMF project, being tested on any WMF project, or similar.

We request that the notification to us be provided on the WMF Village Pump on the English Wikipedia - and we would encourage the WMF consider providing such notifications to other projects as well, as feasible. The information that we request to be included in the notification is a clear, short description of the project, as well as the reasons for the project, goals of the project, current status of the project, and proposed timeline for the project. A link to more information (such as on Meta Wiki or another place) is appreciated but we request the information above (and any other information relevant) be provided directly in the notification itself.

These notifications will ensure that the English Wikipedia's users are kept informed of all updates to any project relating to AI, and will give us a way to provide feedback in a central place without having to monitor other websites (such as Meta Wiki) to try and find out about projects and provide feedback. We encourage the WMF to monitor the responses to any notification requested above and to treat it as no different than feedback provided through any other means on any such project.

TLDR: Pretty pretty please inform us directly (not just on Meta Wiki or somewhere) of any ongoing/new projects and any significant developments on them, and allow us to discuss them and provide feedback here, so we don't have to go hunting for them or discover them elsewhere.

== Discussion of berchanhimez's proposed statement ==

  • I don't even know myself if I can support this, but I'm posting it here so it can be wordsmithed. I am still of the mind that no blanket statement is necessary/warranted, but if one is to be adopted, I would prefer it to be nothing more than this sort of a collaboration. Anyone can feel free to edit this statement to make corrections to wording, flow, etc. or add to it if they feel it will make it better.{{pb}}I'm putting this out there because I've been kind of thinking about this all day, and I feel that it may be better to have this sort of a request out there as supported by a large portion of the community... rather than just making no statements at all. Obviously we can't enforce this sort of a request on the WMF, but it would send a strong statement that at least some in the community are not happy with having to hunt down projects/grants/etc. to even find out that they exist. I'm not yet directly supporting this statement as I'd like to see how it evolves before I decide whether I support making any sort of statement at all. -bɜ:ʳkənhɪmez | me | talk to me! 00:22, 30 May 2025 (UTC)
• :This is already the status quo (kinda-sorta). The concerns regarding Tone Check were raised when the first prototype of the feature was proposed for feedback. Typically, whenever the WMF rolls out a new feature, they start off by announcing prototypes and asking for community feedback on them, before announcing the feature in tech news, rolling out the feature for beta testing on smaller wikis, and scaling up before starting a discussion on enwiki to enable said feature. This has been the standard operating procedure for any big feature since I've been around.
• :I will also note that specifically for this year, the WMF did ask for feedback on both its AI strategy as well as some AI-enabled features (which included Tone Check) from the Product and Technology Advisory Council during its first retreat. There is also a separate conversation to be had about the fact that on enwiki there isn't a good WMF noticeboard outside of this page, which does not have the best history in terms of civility towards WMF staff (see the edit notice), which leads to WMF folks posting in other places (like on WT:NPR or similarly more focused venues) over here.
• :Also, it does need a bit of knowledge of navigating Wikimedia's technical spaces, but all development at the WMF (minus WMF's wordpress instance and Wikimedia Enterprise) happens on either Gerrit/Gitlab or Phabricator, which are publicly accessible to every user (although I do concede/agree that they are not the most navigable for the average user). Sohom (talk) 01:19, 30 May 2025 (UTC)
• ::I tend to agree, but I will note that, as one change, this requests that they inform us before developing AI prototypes in the future. Perhaps a new page could be made as a forum to use rather than this page, if the concern is civility towards WMF staffers. But I think perhaps much earlier and ongoing interaction directly with the community could stop some of the concerns others have about their approach. -bɜ:ʳkənhɪmez | me | talk to me! 01:28, 30 May 2025 (UTC)
  • :::I would definitely support the creation of such a forum where WMF staffers can ask for feedback on ideas from English Wikipedians (if there is community appetite). For a start, maybe we could re-purpose WP:IANB ? (which will typically have more technically minded folks who are also familiar with community norms and expectations). Sohom (talk) 01:38, 30 May 2025 (UTC)
• ::::I guess my goal with this sort of a statement is to get them to not only engage with technically minded folks. It's clear from this discussion and the prior one about Tone Check that many users who aren't technically minded have strong opinions on this sort of thing. So the goal is to get the WMF to, for lack of a better way to say it, "dumb it down" to a level that the community as a whole can understand and engage with - without having to hunt information down or try to decipher it. I debated whether to include something about the level of detail/terms used/etc., but I ended up deciding not to - maybe adding something like "the notifications should be in a manner in which a general English Wikipedia user can understand and engage with, even one without technical knowledge" or similar? -bɜ:ʳkənhɪmez | me | talk to me! 01:43, 30 May 2025 (UTC)
• :::::I see where you are coming from, but there is also a bit of nuance here. Projects like (say) the Wikimedia Kaggle dataset or the newer revert-risk models, while AI-adjacent, do not (and should not) require community consensus to go forward (Kaggle does not affect the community, and the revert-risk models are just a technical change migrating to new infrastructure in the context of English Wikipedia). In my head, the way this would work would be for interface administrators to act as a filter for things to escalate to the community (for example, on hearing the idea for the Wikimedia Kaggle dataset, interface administrators could either not respond at all or affirm that it looks good, whereas for the ToneCheck idea, an interface administrator might say "hey, you might want to post on VPWMF or VPP about this?"). Sohom (talk) 02:58, 30 May 2025 (UTC)
• ::::::I don't think that everything should necessarily require community consensus. But involving the community more clearly in what they're doing early in the process would enable people to ask questions and try to understand why it is a good idea. It's not necessarily that they are asking for approval - but just explaining it to the community before they learn about it in another way.{{pb}}The reason I don't think a group of people should "gatekeep" whether the community learns about things is that it's really no different than it is now - tech-savvy people who know where to look learn about things and get to comment on them, and others feel like they aren't being involved early. There's still two whole threads on this page that, to sum it up in how I see it, were basically "why didn't we know about this, we need to know about this, etc". And that's what I'm trying to maybe help prevent with this idea. -bɜ:ʳkənhɪmez | me | talk to me! 03:07, 30 May 2025 (UTC)
• :::::::I don't have an intention of introducing gatekeeping, but from my experience working on features alongside WMF (and other volunteer folks), involving exactly the right people is a very hard problem that can't be solved by asking the WMF to throw every single new feature development at the community. If we do end up doing that, we will end up with a case of banner fatigue and start ignoring the actually important messages. I've personally had cases where, despite multiple community consultation rounds, I ended up receiving feedback on the eve of deployment. There are also other cases where, despite early negative community feedback, we decided to go forward with certain technical changes since they helped significantly reduce technical debt in other areas (the NewPagesFeed codex migration, for example).
  • :::::::TLDR, I'm not sure what the answer is here, but I'm pretty certain that "just tell us on a designated page" isn't going to be a good one. Sohom (talk) 04:13, 30 May 2025 (UTC)
  • ::::::::Yeah, I don't think it's a full answer either, but it would at least stop claims of "omg the WMF is doing this AI development and trying to hide it from us". -bɜ:ʳkənhɪmez | me | talk to me! 05:10, 30 May 2025 (UTC)
  • ::::{{tq|I would definitely support the creation of such a forum where WMF staffers can ask for feedback on ideas from English Wikipedians (if there is community appetite).}} This is the spot for that, in my opinion. Creating a second VPWMF, or picking another board besides VPWMF and VPM, doesn't seem like the ideal way to organize things. –Novem Linguae (talk) 15:20, 30 May 2025 (UTC)
• :::::Fair, and agreed. However, that assumes that we as a community do a better job of moderating this page. In its current state, it is nowhere near a lightweight feedback forum (if that was the original intention). Sohom (talk) 15:53, 30 May 2025 (UTC)
  • I agree with Barkeep49 that I don't think it's practical to ask the WMF to engage in consultations with all Wikimedia communities, on each community web site, for every project and initiative. In my opinion, the WMF is best situated to invest in research, whether on its own or in partnership with universities, on science and technology that can affect the goals of the Wikimedia web sites. I think it's good for it to be knowledgeable about AI research, so it can develop guidance on the advantages, disadvantages, and associated risks and uncertainties. I don't know if I would personally find any blanket statement suitable at the moment. isaacl (talk) 03:05, 30 May 2025 (UTC)
  • :Is there a way to make this sound less like a "consultation" than just a "please keep us informed of things as they happen rather than letting people find out on their own"? Perhaps removing the part about encouraging them to monitor responses? My goal with this sort of a statement is for it to be the "bare minimum" that would prevent the two threads on this page right now from happening again where there were at least significant minorities mad that they found out through this page rather than from the WMF themselves. -bɜ:ʳkənhɪmez | me | talk to me! 03:10, 30 May 2025 (UTC)
  • ::In an ideal world, there could be community liaisons for each community to publicize the WMF's work and help interested editors to participate in the right forums. A key challenge is that it's a hard task to do well, with so many WMF initiatives and projects that would need to be covered, and so many communities speaking different languages. So a lot of staffers would be needed, and the end efficacy is a big unknown: we know from experience that posting messages in all the usual targeted venues still fails to reach editors who express their discontent later. The crowd-sourcing approach is for each community to have interested editors stay in touch with what the WMF is doing and relay that info to the community. I appreciate this requires enough interested editors, which is particularly a problem with smaller communities, and it requires significant volunteer time.
  • ::Of course, any projects affecting the editor experience will benefit from regular editor feedback, and I agree that the WMF should be allocating enough time and resources for this in its project plans. Most recently, WMF developers seem to be aware of this need and engaging the communities. isaacl (talk) 04:52, 30 May 2025 (UTC)
• :::I'm not saying this to be "enwp elitist" or anything like that, but given that a majority of the WMF employees who would be involved in potentially sending these notifications to us work in English, and given that enwp is one of the most active projects, I don't think it's really too much to ask. That was my intent in including "other projects as well, as feasible". For example, if the person making the announcement speaks another language fluently, then they may consider giving a notification to any projects in that language too. I think, like you say, the WMF has been trying to engage more - this just formalizes our request that we be engaged "early and often", or at least kept updated even if it's not a full back-and-forth style of engagement. -bɜ:ʳkənhɪmez | me | talk to me! 05:13, 30 May 2025 (UTC)
  • ::::To take an example, the WMF did not commit to posting notifications on the WMF village pump, because there is typically another page that is a better fit for a targeted subset of the community who is likely to be interested, and it didn't want to fork the discussion across multiple pages. I agree with Sohom Datta: it's not clear to me that letting loose a firehose of information on this village pump page will be helpful. isaacl (talk) 05:38, 30 May 2025 (UTC)
  • :::::Maybe a specific page for WMF notifications of AI developments then? People interested can go to that page/watchlist it, and then those people could start a discussion here? I guess my goal is to just prevent the "ooh look the WMF is doing AI in secret and not telling us" that was at least a portion of the two discussions that are still above on this page. -bɜ:ʳkənhɪmez | me | talk to me! 05:46, 30 May 2025 (UTC)
  • This is a prime example of what my statement is intended to counter. A tool being developed with, from what I can see, only one prior notification to enwp (that got zero replies there), followed by a scheduled test that we're being informed about barely 2 weeks in advance (at most) and without any opportunity for the editing community to test it in advance and provide our feedback. Thinking about it more, perhaps a new noticeboard (such as WP:Village pump (AI)) would be best for the WMF to provide updates more regularly - but it's clear to me more frequent updates/engagement would be something a lot of users would like. For full disclosure, I did link to my comment from that thread so others may be able to see this and help with it. -bɜ:ʳkənhɪmez | me | talk to me! 22:58, 3 June 2025 (UTC)
  • :I don't think a new noticeboard is needed; we already have Wikipedia:Village pump (WMF) which says {{green|... Wikimedia Foundation staff may post and discuss information, proposals, feedback requests, or other matters of significance to both the community and the Foundation. It is intended to aid communication, understanding, and coordination between the community and the foundation}}. Some1 (talk) 23:54, 3 June 2025 (UTC)
  • :@Berchanhimez See the top of the RFC, we were going back and forth on the design of the exact tool. Also, nothing has been implemented yet, this is just them asking for feedback on their designs. Sohom (talk) 00:02, 4 June 2025 (UTC)
  • ::{{tq|in the next two weeks we will be launching: ... A two-week experiment on the mobile website.}} It's being actively implemented to test, according to the notification there. And to User:Some1, this page would work but there were some concerns above regarding whether this is a good place for it if these notifications would be frequent. Having it on a separate page would keep it separate from "clutter" (such as the whole ANI v WMF debacle) so people can watch that specific page if they want to, and also could have more "strict" moderation of comments to ensure they're on topic and constructive and not flaming the WMF. -bɜ:ʳkənhɪmez | me | talk to me! 00:08, 4 June 2025 (UTC)
• ::::@Berchanhimez As @OVasileva (WMF) mentioned above, this is not the final product. They are not deploying this in its current state after the experiment. The primary reason to do this is to test and gather feedback on the reader-facing components. I agree with you that deploying an experiment on the live website was hasty and maybe having editor feedback on the mockups would have been a better approach, but "testing" is not "we will deploy this tomorrow". Sohom (talk) 00:34, 4 June 2025 (UTC)
• ::::There's a problem with testing things without prior consultation too. That's what my proposal here is intended to prevent: a whole product being developed with only people who know where to look, and look frequently, knowing it's been developed, and even going into testing without any input from the vast majority of users on enwp. -bɜ:ʳkənhɪmez | me | talk to me! 00:48, 4 June 2025 (UTC)
  • ::::@Sohom Datta, "testing" may as well be "deployment" from the perspective of the readers who see the test. Once "Wikipedia is using AI to summarize articles" gets mentioned on social media, we're not ever, ever going to be able to get that stain out. -- asilvering (talk) 03:47, 4 June 2025 (UTC)

== Users who agree with berchanhimez's proposed statement ==

= Statement proposed by Chaotic Enby =

At present, AI is integrated into the English Wikipedia in the contexts of antivandalism and content translation, with varying degrees of success. There has never been community consensus for other uses, and even use for translation has been controversial. The English Wikipedia community rejects the use of Wikimedia Foundation or affiliate resources to implement novel avenues of AI technology, or use user-generated data to develop novel avenues, without first obtaining an affirmative consensus from potentially affected wikis, and asserts the right to control what AI tools are deployed on this wiki.

  • A "novel avenue" is defined as a use case in which AI is not, as of this statement, used on WMF servers by some stable MediaWiki feature. Affirmative consensus for a novel avenue should be obtained through individual consensuses on each potentially affected wiki.
  • All wikis should have the option to enable an AI tool, or to provide their data to develop an AI tool, and both of these processes should be opt-in rather than opt-out.
  • Any wiki providing their data for AI tool development should, based on credible concerns of facilitating abuse, have the option to compel the destruction of machine-learning data that has been gathered without local consensus.
  • Any person on the English Wikipedia seeking help in creating a dataset for machine learning should gain local consensus at the village pump for proposals before sending out any mass message or otherwise soliciting data. Those who do not do so may be found in violation of WP:NOTLAB.
  • The WMF is encouraged to devote more resources to areas that the community has requested support in.
  • The rejection of novel avenues being implemented without community consensus should not be interpreted as a rejection of AI as a technology. Instead, it stems from a safety and AI alignment issue, and the community asserts its right to decide whether new technologies are aligned with our goals as an encyclopedia.
  • Besides the aforementioned encouragement, this is also not a limitation on the WMF's ability to work on developing novel avenues. However, the community has the final say on whether these avenues are implemented, and on any testing that should take place beforehand.

-- Chaotic Enby (talk · contribs) 18:16, 30 May 2025 (UTC)

== Discussion of Chaotic Enby's proposed statement ==

This is a variation of Tamzin's statement, asserting the need for consensus on affected wikis to implement novel avenues or aid in their development (making the latter opt-in rather than opt-out), but not requiring a global consensus to begin the development of these novel avenues. It also clarifies the position of the problem as an AI alignment question rather than a pro/anti-AI debate. Chaotic Enby (talk · contribs) 18:16, 30 May 2025 (UTC)

I think some additional refinement is needed if you're trying to distinguish between "[not limiting] the WMF's ability to work on developing novel avenues" and "[rejecting] the use of Wikimedia Foundation or affiliate resources to implement novel avenues of AI technology, or use user-generated data to develop novel avenues, without first obtaining an affirmative consensus from potentially affected wikis..." Development is part of the process of implementing new things, whether they're proofs-of-concept, prototypes, deployable features, or other project outcomes. isaacl (talk) 22:21, 30 May 2025 (UTC)

:Good point. What I mean to say is that they should be able to work on the earlier parts of the development that do not necessitate direct testing on wikis, but not do the latter without affirmative consent. Chaotic Enby (talk · contribs) 22:39, 30 May 2025 (UTC)

::This would also reject the experiment the foundation did with the ChatGPT plug-in, of which I'm not aware of any onwiki criticism. Beyond which, my concerns above would also apply here. Best, Barkeep49 (talk) 23:01, 30 May 2025 (UTC)

== Users who agree with Chaotic Enby's proposed statement ==

= Statement proposed by Barkeep49 =

The English Wikipedia community is monitoring news about Artificial Intelligence and knows that the Wikimedia Foundation has been researching its use on Wikimedia projects. Our community would like to remind the WMF about how AI is used and seen on the project. At present, AI is integrated into the English Wikipedia in the contexts of antivandalism and content translation with varying degrees of success. There has never been community consensus for other uses, and even use for translation has been controversial. As such, we request that when the foundation develops tools intended to help with core project activities, they should be developed in a way that enables projects to opt in to their use, perhaps through Community Configuration, and where that is not feasible, that it be possible for a project to opt out of tool deployment on that project. The Foundation should also keep transparency in mind as it works on AI, both in communication with projects and by enabling auditing of its uses, especially on projects (e.g., use of a tool having a tag applied automatically).

== Discussion of Barkeep49's proposed statement ==

  • I'm really not precious about this and so would likely be open to tweaking most of this. It also seems, given the very real concerns about issuing any message at all (concerns I'm rather sympathetic to), that all of these specific proposals will be more for ourselves than for the WMF. Best, Barkeep49 (talk) 23:14, 30 May 2025 (UTC)
  • :I'd add that any AI edits should be easy to identify, say with a tag in the edit summary. RoySmith (talk) 23:24, 30 May 2025 (UTC)
  • ::Good add. I added a general sentence about transparency and communication as well as the tag idea. Best, Barkeep49 (talk) 23:39, 30 May 2025 (UTC)
  • :I agree with Sohom Datta and you that I'm not sure a blanket statement is helpful on what the WMF already aspires to do generally for new features. I appreciate that the WMF has not always been successful. I feel, though, that any issues are best addressed by continuing to provide ongoing feedback to improve the collaborative process, rather than wordsmithing a proclamation of some sort. isaacl (talk) 17:31, 31 May 2025 (UTC)
  • Maybe we can explicitly recommend integrating new features with the Community Configuration system, so that the opt-out can be enforced onwiki rather than requiring a MediaWiki deploy? -- Sohom (talk) 23:49, 30 May 2025 (UTC)
  • :I had that in mind when writing that section and think the WMF would go towards it naturally. I also didn't want us to be prescriptive on process. But adding it in a similar way to the tags suggested by Roy makes sense. Best, Barkeep49 (talk) 02:56, 31 May 2025 (UTC)
  • If we aim to "remind the WMF about how AI is used and seen on the project", we should include recent positions that relate to the recent developments on the llm front, such as the WP:AIIB RfC. CMD (talk) 13:24, 31 May 2025 (UTC)
  • I think this comes closest to covering - or could be tweaked to include - something that's permeated these discussions but isn't quite explicit in the various statements, roughly: this community is highly averse to / rejects any use of AI for content generation (including generating summaries and guiding users to spam more effectively). Our attitude to e.g. admin or RPP tools and filtering seems more mixed, but we largely see content as editors' remit, not developers'. NebY (talk) 18:42, 10 June 2025 (UTC)

== Users who agree with Barkeep49's proposed statement ==

  • This wording seems to strike an appropriate balance between experimentation and stability. I made minor grammar fixes. ViridianPenguin🐧 (💬) 00:53, 31 May 2025 (UTC)
  • I think the wording today (31 May) strikes a good enough balance in terms of what we want to actually say v/s stifling the ability of WMF to build new AI tooling. I would ideally not see a statement at all, since in my opinion, this is just restating what is already the recommended standard operating procedure at the WMF (from my understanding), but if we must, this is what we should be saying. Sohom (talk) 16:16, 31 May 2025 (UTC)
  • :I agree that I don't like restating existing best practice, particularly in the context of one specific domain. It leads to the impression that the community is less concerned about following best practice in other domains. isaacl (talk) 17:35, 31 May 2025 (UTC)
  • While I'm still skeptical of any statement, this seems to me to be clearly the least bad one if we have to make one. Loki (talk) 06:32, 1 June 2025 (UTC)
  • This seems to me to be the best of the proposed statements so far. As above, I like that it maintains a balanced perspective, not pre-emptively ruling out such developments in tooling while also centring community consent and consensus in the implementation. --Grnrchst (talk) 17:45, 2 June 2025 (UTC)
  • This would be an appropriate statement for the community to make. Other proposals seem to set hard lines against development of AI products. I am not sure that is desirable, as development and management teams need to have some freedom to develop further software products. Other Wikimedia communities may have greater need or appetite for AI products, and any limit we impose needs to apply when products are being deployed and where deployment affects our project. Total bans at this stage would just be a step too far. The danger here is thinking that something must be done; banning AI at the WMF is something; therefore we must do that. A softer, open-minded approach is better here, and would not even require conceding to the deployment of more AI products. Arcticocean ■ 10:21, 14 June 2025 (UTC)

= Statement proposed by Curbon7 =

[Prior paragraphs of whichever variation go here]

The English Wikipedia community is also concerned about the environmental impacts generative AI tools would cause. For instance, xAI (Grok) has recently [https://www.theguardian.com/technology/2025/apr/24/elon-musk-xai-memphis been accused] of emitting large quantities of "toxic and carcinogenic pollution" in the city of Memphis, Tennessee, while [https://arxiv.org/pdf/2304.03271 this 2025 paper] provides data supporting the claim that LLMs consume a huge amount of water for cooling. In keeping with the resolution passed on 24 February 2017 – :WMF:Resolution:Environmental Impact – the English Wikipedia community demands assurances that the WMF's development of AI tools will not significantly impact the environment, and requests annual update reports about this.

== Discussion of Curbon7's proposed statement ==

This is not meant as a standalone proposal, but as an addendum to whichever proposal (if any) achieves consensus. The WMF passed an environmental resolution – :WMF:Resolution:Environmental Impact – on 24 February 2017, but with the environmental impacts of AI use being well known, these two goals seem to be at odds. Curbon7 (talk) 00:46, 31 May 2025 (UTC)

:@Curbon7 The total number of GPUs or TPUs on WMF servers is (to my understanding) less than the number of people who have served as English Wikipedia arbitration committee members in the last two years. For comparison, xAI's Memphis cluster uses [https://www.supermicro.com/CaseStudies/Success_Story_xAI_Colossus_Cluster.pdf at least 100,000 GPUs, according to Supermicro]. Sohom (talk) 01:52, 31 May 2025 (UTC)

::I stand corrected; the number appears to be 17 at the moment. However, my point still stands. Sohom (talk) 02:03, 31 May 2025 (UTC)

:::Thank you, I had not seen :wikitech:Machine Learning/AMD GPU prior to this. The output of just 17 GPUs is indeed practically nothing. However, how far is this number expected to grow, given that some of the plans the WMF has laid out for AI seem pretty aspirational? Obviously 100,000 is not going to happen, but could it go into the high hundreds? Beyond a thousand? And from there, where do we start seeing effects beyond rounding errors? I am not sure, as I do not purport to be an expert in this area, but an affirmation from the Foundation that they intend to adhere to their prior environmental resolution in this regard would be decent. Curbon7 (talk) 19:36, 31 May 2025 (UTC)

::::@Curbon7, @CAlbon (WMF) would be best positioned to answer your questions regarding the projected growth of WMF GPU usage.

:::: However, my understanding is that, even with the WMF's AI plans, a majority of the models will feature simpler and older model designs that do not require anywhere close to the processing power of the frontier models that have sparked criticism about environmental concerns.

::::Additionally, another key reason the WMF can keep running AI inference on such limited hardware is that most of the features where AI is used on Wikimedia wikis don't require immediate feedback (unlike, say, ChatGPT), allowing for slower hardware and more efficient inference logic (where one inference is generated and then cached for long periods of time). So while usage may grow modestly as a result of the new AI strategy, it's unlikely (imo) to scale to levels where the environmental impact becomes comparable to large-scale AI operations. Sohom (talk) 20:07, 31 May 2025 (UTC)
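::::To illustrate the caching pattern described above, here is a minimal sketch with hypothetical function names (it is not the actual LiftWing code): the expensive model call is keyed on the page and revision, so repeat requests are served from a cache and the GPUs are only touched when the content actually changes.
<syntaxhighlight lang="python">
import functools

# Hypothetical stand-ins; neither function reflects the real LiftWing API.
def fetch_wikitext(page_id: int, revision_id: int) -> str:
    return f"wikitext of page {page_id} at revision {revision_id}"

def run_model(page_text: str) -> str:
    # Placeholder for a slow, GPU-bound inference call.
    return "summary of: " + page_text

@functools.lru_cache(maxsize=100_000)
def cached_inference(page_id: int, revision_id: int) -> str:
    # Keyed on (page, revision): repeat requests for the same revision are
    # served from the cache; only a new revision triggers fresh inference.
    return run_model(fetch_wikitext(page_id, revision_id))

if __name__ == "__main__":
    cached_inference(12345, 1)  # first call runs the model
    cached_inference(12345, 1)  # second call is a cache hit, no model call
</syntaxhighlight>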

:::::I can! We are planning on purchasing 16 AMD GPUs per year for the next three years. We just ordered 32 GPUs, which is the budget for two fiscal years (this fiscal year and next fiscal year).

:::::To Sohom's point, we aren't and will never be a computational powerhouse. Instead, what we are actually really good at is being super efficient with limited resources (e.g. pre-caching predictions [like Sohom mentioned], using really small models, using CPUs instead of GPUs). CAlbon (WMF) (talk) 18:54, 1 June 2025 (UTC)

::::::Just for the sake of those non-nerds trying to follow along, a GPU is a Graphics Processing Unit. This is a specialized type of processor which was originally designed for quickly rendering the high-resolution graphics needed by video games. They are very good at doing the very specific but highly repetitive types of calculations needed, but not so good at more general problems. Kind of like how some people are capable of doing amazing math calculations in their heads but struggle with everyday tasks. It turns out that other real-world problems like Bitcoin mining and AI model generation do the same kind of repetitive computations. At one point, people were buying off-the-shelf gaming consoles to build supercomputers because they were being sold at a loss to spur game sales and were the cheapest way to get that kind of processing power. In a remarkable example of symbiotic evolution, the hardware manufacturers started packaging these types of chips into systems that could be installed in data centers, and the software folks have been hard at work developing algorithms and application frameworks to better take advantage of this kind of hardware. RoySmith (talk) 19:24, 1 June 2025 (UTC)

That's just the usual anti-AI conspiracy theories. Does AI need water cooling? Yes, of course... just like streaming a film from Netflix, an album from Spotify, playing an online game, or even working on Wikipedia if we get to it. Is there an environmental impact? Of course not. The amount of water used for cooling is just trivia, which is then taken out of context. The water used to cool a computer is inside a closed system. See Computer cooling for details. --Cambalachero (talk) 19:30, 31 May 2025 (UTC)

:Server cooling is not what I'm an expert in, but I will note that the statement {{tq|The water used to cool a computer is inside a closed system.}} is not necessarily always accurate. While most individual computer cooling systems are closed loop, many server farms (for example Microsoft's Azure data centers where ChatGPT's AI workloads are run) do make use of [https://datacenters.microsoft.com/wp-content/uploads/2023/05/Azure_Modern-Datacenter-Cooling_Infographic.pdf#page=1.00&gsr=0 direct evaporative cooling] which consumes water by design. In these systems, water is intentionally evaporated to carry away heat and must be replenished from external sources, so the system is by definition not closed. Sohom (talk) 20:49, 31 May 2025 (UTC)

::Even if that were the case, it still doesn't answer the other part of the argument: how is that any different from any other use of the internet? Cambalachero (talk) 23:12, 31 May 2025 (UTC)

:I don't know why you'd frame AI's environmental impact only in terms of water cooling. AI uses a lot of energy, more than simply delivering content. Generating and delivering that energy has environmental impacts and the heat of employing it does too. NebY (talk) 23:48, 31 May 2025 (UTC)

:[https://thereader.mitpress.mit.edu/the-staggering-ecological-impacts-of-computation-and-the-cloud/ Here] and [https://www.parkplacetechnologies.com/blog/environmental-impact-data-centers/ here] are a couple of pieces about the environmental cost of server farms. Donald Albury 01:49, 1 June 2025 (UTC)

:::and, as said, they apply to the whole internet, and do not explain this weird finger-pointing at AI as if it were the only thing to blame. It's like saying that books are evil, because to reach the libraries and bookstores they are distributed by cars fueled by oil, which causes pollution. Cambalachero (talk) 04:09, 1 June 2025 (UTC)

:::But, LLMs require particularly large server farms, [https://www.forbes.com/sites/emilsayegh/2024/09/30/the-billion-dollar-ai-gamble-data-centers-as-the-new-high-stakes-game/ from Forbes] Donald Albury 15:04, 1 June 2025 (UTC)

::::Jumping in to say that Forbes contributor articles are generally unreliable. LightNightLights (talkcontribs) 15:46, 1 June 2025 (UTC)

:::::Was also about to say this, and I disagree with the author's simplistic categorization of data centers: data centers are rarely used for a single category of workload, but rather for a variety of workloads including AI inference, web services, data processing, etc.

:::::That being said, {{tq|they apply to the whole internet}} is not entirely correct either, since most internet services do not require significant computing resources to develop (unlike, say, frontier models, which require significant extra compute time to be trained before they can be deployed). Sohom (talk) 16:01, 1 June 2025 (UTC)

::I've found that "AI is harmful to the environment" is an argument used by those who are already disposed to anti-AI sentiment and are looking for more reasons to oppose it. Otherwise it wouldn't be anywhere near the top of the list of environmental concerns. Thebiguglyalien (talk) 🛸 21:53, 3 June 2025 (UTC)

  • I am pretty sure this is false. jp×g🗯️ 13:14, 1 June 2025 (UTC)

:{{tq|The amount of water used for cooling is just trivia, which is then taken out of context}} - I keep seeing these claims, but never from an independent source. I trust you have one you can share, {{u|Cambalachero}}? Guettarda (talk) 17:48, 6 June 2025 (UTC)

::Sure, check [https://prospect.org/environment/2024-09-27-water-not-the-problem-artificial-intelligence/ Water Is Not the Problem With Artificial Intelligence] Cambalachero (talk) 23:51, 6 June 2025 (UTC)

The relevant question here is WMF's {{tq|commit[ment] to seeking ways to reduce the impact of our activities on the environment}}. Articles are supposed to have lead sections. An LLM summary can never be more useful than a good lead. But even generating that lead once is going to have hundreds or thousands of times the carbon emissions of a human editor.

Using LLMs to summarise articles that lack leads is another issue entirely, but even then there would need to be cost-benefit considerations. A "commit[ment] to seeking ways to reduce the impact of our activities on the environment" doesn't mean "ignore the environmental impact of our actions if we find they have any benefit whatsoever". And how often would these summaries be regenerated? If I make 100 edits to a page, does that mean the summary will be regenerated every time? If so, the impacts would be staggering. If not, then these summaries will be about as useful as the old spoken word articles. Guettarda (talk) 18:06, 6 June 2025 (UTC)

:@Guettarda, With a total capacity of 17 GPUs (see above) I don't think the conversation is "ignore the environmental impact of our actions if we find they have any benefit whatsoever" but rather "even if we do this, our environmental impact is negligible compared to the environmental impact of running all of the other non-AI servers". Sohom (talk) 17:05, 7 June 2025 (UTC)

::{{u|Sohom Datta}}: saying {{tq|even if we do this, our environmental impact is negligible compared to the environmental impact of running all of the other non-AI servers}} is precisely a case of {{tq|ignore the environmental impact of our actions}}. And even if it isn't, it's most decidedly not consistent with the idea of {{tq|seeking ways to reduce the impact of our activities on the environment}}.

::And focusing on the current capacity is misleading. Either they're doing this with absolutely no intention of implementing it (in which case, it's a total waste of money and incredibly poor stewardship of donations) or they're doing it with an eye to implementing it. And if you add AI summaries to the top of every article on Wikipedia, you're no longer talking about negligible impact - you're talking about something that will increase the environmental impact (and the cost) of running the servers significantly, possibly hundreds of times what it is now. (While still running the risk of being as big a "success" as spoken-word Wikipedia.) Guettarda (talk) 17:32, 7 June 2025 (UTC)

:::I'm not sure the folks implementing it had reached that stage of thought here. This was one of the first prototypes of the project. I don't think it was a given that future iterations of the project would use AI. Also, based on the fact that the summaries had a date on them, I would assume that the idea would have been to update the summaries less frequently than every edit on the page. While I agree that there should have been some conversation about environmental impact, I don't think we had reached that stage yet, and even with the expected growth over the next few years I don't think there is an expectation of WMF GPU usage coming anywhere close to the impact of running non-AI workloads on our servers (let alone the servers used to train the frontier models that have drawn scrutiny over their environmental effects). Sohom (talk) 17:44, 7 June 2025 (UTC)

== Users who agree with Curbon7's proposed statement ==

  • Support: this is an important principle and worth appending to any statement, even though the WMF currently uses negligible amounts of computing resources and intends to keep doing so. It isn't enough that other (hypercapitalistic and ecocidal) technological actors are much worse. They are hardly the best basis for an ethical comparison. Arcticocean ■ 10:27, 14 June 2025 (UTC)

= Statement proposed by Chess =

Keep up the good work!

== Discussion of Chess's proposed statement ==

  • I wrote this statement because nobody has unambiguously supported the WMF's attempts at integrating AI. I like the idea of autogenerated simple article summaries. Our math and science articles are famously difficult to comprehend. Allowing readers to opt-in to an automatically generated article summary is a great idea. I also like the idea of having community moderation, where we can verify that a given summary is accurate. I want the WMF to keep coming up with interesting ideas and use cases for AI without being restricted by an onerous approvals process, or overly legalistic guidance from the community. Chess (talk) (please mention me on reply) 04:51, 8 June 2025 (UTC)

== Users who agree with Chess's proposed statement ==

= Statement proposed by Sohom =

At present, AI is integrated into the English Wikipedia in the contexts of antivandalism and content translation, with varying degrees of success. The use of AI for translation has been controversial and the WMF's use of generative AI as a proxy for content in Simple Article Summaries was unanimously rejected by the community. As a result, the English Wikipedia community rejects any attempts by the Wikimedia Foundation to deploy novel avenues of AI technology on the English Wikipedia without first obtaining an affirmative consensus from the community.

  • Deployment here refers to the feature being enabled in any form onwiki, either through A/B testing, through the normal deployment process, or through integration into Community Configuration. Modifications made to existing extensions and services like the ORES extension or the LiftWing infrastructure as part of new features must be behind disabled-by-default feature flags until affirmative consensus is achieved. Furthermore, irreversible actions, such as database migrations on the live English Wikipedia databases or the public release of production models for these new features, should not proceed until affirmative consensus for the feature has been achieved.
  • Wikimedia Foundation teams are encouraged to keep the community notified of the progress of features and initiatives through venues like WP:VPWMF and to hold multiple rounds of consultations with affected community members throughout the process of developing the features.
  • Wikimedia Foundation teams should also keep transparency in mind as they work on AI, both in communication with projects and by enabling auditing of its uses, especially on projects (e.g., use of a tool having a tag applied automatically, and open-sourcing and documenting onwiki the output, methodology, metrics, and data used to train the AI models).

== Discussion of Sohom's proposed statement ==

== Users who agree with Sohom's proposed statement ==

  • Given the recent Simple Article Summaries thread, I am supportive of this statement as well. Sohom (talk) 10:03, 9 June 2025 (UTC)
  • In line with my own proposal as it puts a limit on deployment rather than development while still keeping transparency throughout the development process. I wonder if it could be possible to add something about Wikimedia Foundation teams having to clearly inform editors in plain language of what is being worked on (as it can otherwise be "transparent" but hidden at the bottom of a page of corporate jargon). The "multiple rounds of consultations" is a good idea, although I'm afraid it might become a bit too rigid of a system, and continuous feedback on pages like WP:VPWMF could also be considered. Chaotic Enby (talk · contribs) 02:33, 10 June 2025 (UTC)
  • Support. This is a good statement of community reservations and expectations, balanced against an effort to make sure that the ultimate guidelines that devolve from this starting point are not overly onerous and do not create undue barriers for useful and less problematic software development and technical backbone support. In other words, it makes no bones about demanding transparency and insisting that nothing goes live in our technical and editorial ecosystems without a serious opportunity to vet (and if necessary, veto) AI software which is at odds with the community's perspective on safe, ethical, and pragmatic practices, while also keeping a path open for much less controversial implementation of AI that does not draft content or otherwise create problematic operational quagmires. SnowRise let's rap 11:59, 10 June 2025 (UTC)
  • :Incidentally, it's worth noting that this proposal resulted from a substantial back-and-forth between a few parties in the Users who oppose any position section above. It's worth reading that dialog to understand the needle that this version of the statement attempts to thread. SnowRise let's rap 12:04, 10 June 2025 (UTC)

:This is still a comparatively forceful and straightforward statement. I believe that is appropriate following the unpleasant surprise that precipitated this issue, and the raised eyebrows re apparent level of WMF clue. I would support this. --Elmidae (talk · contribs) 15:34, 10 June 2025 (UTC)

= Collaborative statement workshopping =

Following the discussion above, there have been proposals of merging multiple statements together, and this approach would fit nicely into the wiki spirit of building our community statement together. We can take Sohom's statement as a starting point, feel free to edit it as you wish!

Pinging User:Sohom Datta, User:Berchanhimez, User:QEDK and User:RoySmith who have mentioned being interested in that approach. To clarify, this section is not for support/oppose voting, but an idea lab-style discussion to have the community work on a common statement that we can then put up to consensus. Chaotic Enby (talk · contribs) 03:15, 11 June 2025 (UTC)

== Current statement ==

At present, AI is integrated into the English Wikipedia in the contexts of antivandalism and content translation, with varying degrees of success. The use of AI for translation has been controversial and the WMF's use of generative AI as a proxy for content in Simple Article Summaries was unanimously rejected by the community. As a result, the English Wikipedia community rejects any attempts by the Wikimedia Foundation to deploy new use cases of AI technology on the English Wikipedia without first obtaining an affirmative consensus from the community.

  • Deployment here refers to the feature being enabled in any form onwiki, either through A/B testing, through the normal deployment process, or through integration into Community Configuration.
  • A "new use case" is defined as a use case in which AI is not already used on WMF servers by some stable MediaWiki feature. Modifications made to existing extensions and services like the ORES extension or the LiftWing infrastructure as part of new features must be behind disabled-by-default feature flags until affirmative consensus is achieved. Furthermore, irreversible actions, such as database migrations on the live English Wikipedia databases or the public release of production models for these new features, should not proceed until affirmative consensus for the feature has been achieved.
  • Wikimedia Foundation teams are encouraged to keep the community notified of the progress of features and initiatives through venues like WP:VPWMF and to hold multiple rounds of consultations with affected community members throughout the process of developing the features.
  • Wikimedia Foundation teams should also keep transparency in mind as they work on AI, both in communication with projects and by enabling auditing of its uses, especially on projects (e.g., use of a tool having a tag applied automatically, and open-sourcing and documenting onwiki the output, methodology, metrics, and data used to train the AI models).

== Discussion ==

A wording question: does anyone see an advantage to using "novel avenues" instead of "new uses"? isaacl (talk) 03:26, 11 June 2025 (UTC)

:The only big difference I see is that "new uses" might apply to implementing already existing tools in new situations, which might be too wide-ranging for our community proposal. However, "new use cases" could also work while still using plain language. In either case, we could maybe consider borrowing a sentence or two from Tamzin's statement to define more clearly what we mean by that. Chaotic Enby (talk · contribs) 03:40, 11 June 2025 (UTC)

::I would be open to using the term "new use cases", but you are right, we should define what we mean by either phrasing. Sohom (talk) 03:44, 11 June 2025 (UTC)

Slight nitpick - database migrations very much are reversible. Not sure what you had in mind there instead. Gnomingstuff (talk) 03:48, 11 June 2025 (UTC)

:Database migrations (or rather schema migrations) aren't very reversible at Wikimedia's scale (or at least would require significant work to undo in certain cases). I think it makes sense to caution folks against doing a lot of hard-to-undo work before obtaining consensus. Sohom (talk) 18:26, 11 June 2025 (UTC)

:For this proposal to be workable, we need a distinction between "deployment" and "development", possibly with different proposals. The proposal says it only restricts deployment, but there are several lines in it about development. Telling the WMF they cannot even develop an AI feature is counterproductive, because Google/Meta/Microsoft/Apple/etc. can develop AI features based on Wikipedia content without any extra permission (due to our licence) and continue to take our readers. We are kneecapping the only organization with a legal mandate to help us.

:I will strongly oppose any requirements for ongoing consultations in the development process, because that isn't how software works. I usually build a prototype, demo it to users, get feedback, and iterate from there. Doing lengthy requirements analysis before anything concrete exists has been discredited for decades. If I were a WMF developer faced with restrictions on database migrations or training, I would either a) ignore enwiki's statement or b) not develop anything new for enwiki, because we are painful to work with.

:I can understand the desire for a community consultation process before widely deploying a feature, though. If something is going to change my workflow I would appreciate a heads up. But that should be a 7-day RfC for a simple A/B test or opt-in feature. Not multiple rounds of consensus. Chess (talk) (please mention me on reply) 03:04, 15 June 2025 (UTC)

::{{tq|I usually build a prototype, demo it to users, get feedback, and iterate from there.}} Wouldn't that count as the "multiple rounds of consultations" in the proposal? I don't see anywhere that these must start before the first prototype, or include requirement analysis rather than simply sharing updates with the community and asking if they have feedback. Chaotic Enby (talk · contribs) 03:17, 15 June 2025 (UTC)

:::The whole point of the proposal is to formalize and provide guidelines for what is already common practice; the language around {{tq|multiple rounds of consultation}} is deliberately loose so that it does not require consensus, but rather encourages feedback and iteration through demos and "test this out on betawiki and give your thoughts" (i.e. the agile model). The proposed recommendations around database migrations (associated with new features) on production wikis are (to my understanding) already established procedures, since we consider those operations hard to undo. The new recommendation here is that production model training (note, not training models in general, since training test models is fine) should be considered an irreversible action, since one of the conversations surrounding ToneCheck is that once such a model is trained and released to the public, engineers cannot "untrain" it. Sohom (talk) 03:53, 15 June 2025 (UTC)

::::I think it could be good to clarify that last guideline to explicitly refer to models made available somewhere. It is perfectly possible that engineers may have a production branch they are working on, but not release the model to the public and only show its results on select test cases. In that situation, assuming no model leaks (which could also happen on a test branch), there wouldn't be a ToneCheck-like issue to my understanding. This could be more reassuring as it gives freedom for developers until the actual deployment (on either test or production wikis) while making the actual issue more specific (as test models could also lead to similar issues once released to the public). Chaotic Enby (talk · contribs) 04:06, 15 June 2025 (UTC)

:::::Maybe {{tq|publication of production models}} instead of training? Sohom (talk) 04:07, 15 June 2025 (UTC)

::::::Yes, that could work! "Public release" could also be a possibility, as publication is usually related to copyright law, although it depends on how much we value precise language vs plain language. Chaotic Enby (talk · contribs) 04:17, 15 June 2025 (UTC)

::::The community seems to care a lot more about deploying features than developing features, which is why Simple Article Summaries started getting flak once it was proposed as an opt-in. Both WP:VISUALEDITOR and WP:Media viewer garnered controversy when they were deployed, not when they were developed.

::::If we're going to go with mandatory "rounds of consultation" (i.e. points where the community can express outrage about an idea), it shouldn't be loose. We should have actual milestones so managers can add community consultations to the project timeline and account for its risk. I think it's unclear to the WMF why their first notification of Simple Article Summaries was uncontroversial[https://en.wikipedia.org/wiki/Wikipedia:Village_pump_(technical)/Archive_218#c-SGrabarczuk_(WMF)-20250210151100-Simple_Article_Summaries:_research_so_far_and_next_steps] and their second notification resulted in unanimous opposition.[https://en.wikipedia.org/wiki/Wikipedia:Village_pump_(technical)#c-EBlackorby-WMF-20250602182000-Simple_summaries:_editor_survey_and_2-week_mobile_study]

::::One way is a 7-day mini-RfC for key deployment milestones like A/B testing. The project manager can say "we're doing the RfC for the A/B on June 2nd. We need x, y, and z from the team by that date. If we pass that, we can do the RfC for the full deployment in September". That gives the WMF time to plan budgets/hiring/etc for negative community responses that could kill a project. Chess (talk) (please mention me on reply) 04:55, 15 June 2025 (UTC)

::::::The major pain point, that consensus is required before deployment (even for A/B testing), is already codified in the proposal. The headline is that, if this proposal is accepted, enwiki will not accept deployment of new use cases of AI without community consensus. This would already provide the structure you are talking about. The "multiple rounds of community consultation" cannot really be structured, since it differs for every team (and to be honest I don't think it should be enwiki's place to decide what product development lifecycle is followed), and so is left as a "you should probably do this" suggestion of how teams can modify their workflows to avoid failure at the A/B testing phase (by fitting in multiple consultation stages). Sohom (talk) 05:21, 15 June 2025 (UTC)

::::::{{re|Sohom Datta}} There are real people employed to help us onwiki, is my point. They need parental leave, medical leave, and vacation. They have professional reputations and can't be anonymous.

::::::When we publicly[https://www.404media.co/wikipedia-pauses-ai-generated-summaries-after-editor-backlash/][https://www.engadget.com/ai/wikipedia-cancels-plan-to-test-ai-summaries-after-editors-skewer-the-idea-200029899.html?guccounter=1][https://techcrunch.com/2025/06/11/wikipedia-pauses-ai-generated-summaries-pilot-after-editors-protest/] kill projects at random points in the cycle after the WMF tries to consult us with no response, that hurts. It's going to make them more reluctant to innovate in the future. That's a failure on us that we didn't have a better way to approach the problem.

::::::If we want to mandate that the WMF get affirmative community consensus, we have to come up with a workable consensus process for them to follow. Right now, that's going to default to a 30-day RfC posted here. A 30-day RfC is fine for volunteers, but for actual professionals working full time, it's 1/3rd of a quarter. We need something faster than that. We also need to make it clear to them when they need to run those RfCs.

::::::This is all our job, because we are the customer and we pick the acceptance criteria. We can't punt that off to the WMF. Chess (talk) (please mention me on reply) 07:53, 15 June 2025 (UTC)

:::::::Again, I genuinely don't think a series of 30-day RfCs on a fixed schedule is the best way to go at it. That's the whole point of agile development, and why the waterfall model, as you mentioned above, isn't in use anymore. The key is to have a live communication between the "customer" (us) and the developers, and to adapt accordingly. Chaotic Enby (talk · contribs) 14:14, 15 June 2025 (UTC)

::::::::I know I've said this before, but it's worth repeating. The enwiki community is not the customer. Our readers are the customer. RoySmith (talk) 14:34, 15 June 2025 (UTC)

:::::::::Also a good point. I was putting it in quotes, but yes, readers should also be involved (to the extent that it is possible) in the process of deploying new features. Chaotic Enby (talk · contribs) 15:45, 15 June 2025 (UTC)

:::::::::There are different types of customers, depending on the feature. With automatically generated summaries, readers are the external customer, and editors are the internal customer with respect to changes to their workflow, as well as being a collaborator due to their vested interest in how Wikipedia presents content. With tone check, editors are the internal customer, though the potential effects of having tone check available affect readers. isaacl (talk) 16:46, 15 June 2025 (UTC)

::::::::::Also, editors are the Content Generation and Management Team, by far the largest part of the workforce with immense individual and collective expertise, entrusted with massive responsibilities and altogether Wikimedia's greatest resource. Plus we're cheap. Time and money spent liaising with this team is an acceptable cost that should be factored into project budgets from the start. NebY (talk) 17:21, 15 June 2025 (UTC)

:::::::@Chess, Again, much of what is codified here is already standard operating procedure across most teams (see the IP Masking rollout, Moderator Tools' Automoderator, and Growth Team experiments, many of which went through multiple feedback loops before coming close to being deployed). The deployment of a single feature is typically supposed to take longer than a quarter as folks fix bugs and respond to community feedback over a staged rollout. The fact that Simple Summaries got struck down is not only a failure on the community's part but also a failure of the WMF to effectively communicate and ask for feedback in the right way. What this proposal is aiming to do is to give guidance on engaging with the community, and to put up bright lines and guardrails at the point of deployment without providing a rigid structure that incapacitates them. Technically, a team can ignore our recommendation for "multiple consultations" if they want, but what we are saying here is simply that having multiple consultations (which are not 30-day RFCs on their own) increases your chances of having an easier time with the 30-day RFC approving deployment/A/B testing towards the end of the quarter. Sohom (talk) 16:52, 15 June 2025 (UTC)

:::{{re|Chaotic Enby}} My reading of {{!xt|irreversible actions, such as database migrations or training production models for these new features, should not proceed until affirmative consensus has been achieved}} is that significant development work on a prototype requires affirmative community consensus. For example, before I can even download a data dump of Wikipedia and put it into a database on my laptop to begin working on the prototype, I need to hold an RfC by the plain language of the proposed wording. I also doubt any editors would comment on an RfC that's "should the WMF add a new field to the global database schema?", given Simple Article Summaries got no comments at WP:VPT.[https://en.wikipedia.org/wiki/Wikipedia:Village_pump_(technical)/Archive_218#c-SGrabarczuk_(WMF)-20250210151100-Simple_Article_Summaries:_research_so_far_and_next_steps] Chess (talk) (please mention me on reply) 04:03, 15 June 2025 (UTC)

::::Maybe {{tq|database migration on live enwiki databases}} would work better? Sohom (talk) 04:09, 15 June 2025 (UTC)

:::::To give an example of how I'd write an RfC statement: {{xt|The WMF needs a vector database of English Wikipedia articles to enable RAG pipeline research. This will require creating a secondary Postgres+pgvector cluster replicating the primary MariaDB instance. We will also do a staged rollout of Postgres for a subset of our readers to evaluate its performance characteristics.}} I can't imagine many of us would comment on that, and if I were a developer, I'd rather not wait 30 days to get a community response that is either "sure whatever I don't care" or "I read a blog post saying RAG is dead so the WMF shouldn't invest in it". Chess (talk) (please mention me on reply) 05:17, 15 June 2025 (UTC)

::::::You are misreading the statement here. This kind of change should not require consensus at all under the current wording, since it would not result in the immediate deployment of an AI feature on the English Wikipedia. If a particular RAG-based feature is proposed that requires a database migration, the migration should wait until the deployment of the feature (say, Simple Summaries) has affirmative consensus. Sohom (talk) 05:31, 15 June 2025 (UTC)

::::I don't think that is the case, as building the prototype of a new database would not be an "irreversible action", but only implementing the actual database migration would be. However, if there is an ambiguity over this, you are of course welcome to suggest a clarification to the language.{{pb}}Also, the WP:VPT post was very vague in its wording, mostly focusing on the background and metrics while not mentioning at all the key fact that a generative AI model was used, with the exception of a mention of {{tq|the text simplification model at the top of the page}} (not actually anywhere on the page). This lack of clarity over what was actually being done might have been the reason for the lack of engagement, and communicating in plain language about ongoing projects would help. Also noting that WP:RFCs are widely advertised and required to be in the form of a brief, neutral question, while the VPT post was in a completely different form, so this issue might not be present. Chaotic Enby (talk · contribs) 04:14, 15 June 2025 (UTC)

:::::Indeed, the VPT post didn't even ask for responses, ending instead {{tq|We will come back to you over the next couple of weeks with specific questions and would appreciate your participation and help. In the meantime, for anyone who is interested, we encourage you to check out our current documentation.}} That was 10 February; did they come back with questions in two weeks somewhere?

:::::The post wasn't really structured to engage editors either. An encyclopedic lead would have opened by describing Simple Summaries, e.g. as stated later in the post, a summary {{tq|that takes the text of the Wikipedia article and converts it to use simpler language}}. Instead it began with a leisurely scene-setting in terms of WMF strategy {{tq|The Web team at the Wikimedia Foundation has been working to make the wikis easy to engage and learn from so that readers will continue coming back to our wikis frequently.}} NebY (talk) 15:01, 15 June 2025 (UTC)

= Statement proposed by [user] =

== Discussion of [user]'s proposed statement ==

== Users who agree with [user]'s proposed statement ==

WMF survey: do we want AI generated summaries on every article?

{{atop

| result = Discussion continues over at Wikipedia:Village_pump_(technical)#Simple_summaries:_editor_survey_and_2-week_mobile_study. Polygnotus (talk) 18:21, 10 June 2025 (UTC)

}}

:Update: the survey has been closed and the project is paused. Polygnotus (talk) 03:34, 6 June 2025 (UTC)

The WMF has started a survey to ask if we want to put an AI summary in every article's lead section.

https://wikimedia.qualtrics.com/jfe/form/SV_1XiNLmcNJxPeMqq

Unsurprisingly, even the example they gave in their screenshot [https://en.wikipedia.org/w/index.php?title=Wikipedia:Village_pump_(technical)&oldid=1293819617#The_Dopamine_summary contains hallucinated AI nonsense].

Please voice your opinions [https://wikimedia.qualtrics.com/jfe/form/SV_1XiNLmcNJxPeMqq here]! Polygnotus (talk) 00:35, 4 June 2025 (UTC)

:The more lively discussion is taking place at {{slink|Wikipedia:Village_pump_(technical)|Simple_summaries:_editor_survey_and_2-week_mobile_study}}. Some1 (talk) — Preceding undated comment added 01:31, 4 June 2025 (UTC)

::How out-of-touch can you get? Phil Bridger (talk) 07:05, 4 June 2025 (UTC)

:::On this website there is no limit. Levivich (talk) 15:37, 4 June 2025 (UTC)

::::Well, for once I can agree with Levivich. Phil Bridger (talk) 19:50, 4 June 2025 (UTC)

:Note that the inputs are reversed on the last question. I almost submitted incorrectly. Sock-the-guy (talk) 19:12, 4 June 2025 (UTC)

:{{tqq|Sorry, this survey is not currently active.}} For me, at least. Skynxnex (talk) 15:00, 5 June 2025 (UTC)

{{abot}}

ToneCheck community call/discussion

Hi hi, the team behind Tone Check, a feature that will use AI to prompt people adding promotional, derogatory, or otherwise subjective language to consider "neutralizing" the tone of what they are writing while they are in the editor, will be hosting a community consultation tomorrow on the Wikimedia Discord voice channels from 16:00 UTC to 17:00 UTC. Folks interested in listening in, joining in, or asking questions should join the Wikimedia Discord server and subscribe to [https://discord.gg/wikipedia?event=1380664800656363671 this event]. Sohom (talk) 20:44, 9 June 2025 (UTC)

:@Sohom Datta A notification one day in advance on a page with relatively low traffic compared to other similar pages may not be the best idea. I would've liked to be able to attend. Was {{ping|Tamzin}} invited, who explained why this is a bad idea [https://en.wikipedia.org/wiki/Wikipedia:Village_pump_(WMF)#c-Tamzin-20250523195100-The_WMF_should_not_be_developing_an_AI_tool_that_helps_spammers_be_more_subtle here]? {{tq|a feature that will use AI}} So that decision has already been made? Sentiment analysis does not require AI at all, or at least not what most people would consider to be AI. Where can we find the recording? Why was the Discord platform chosen instead of something more appropriate? Why was the notification only a day in advance? Polygnotus (talk) 17:59, 10 June 2025 (UTC)

::Rapid-fire round: I notified WT:NPR, WT:AIC, here, and WP:VPT, and the tag has been up on Discord over the weekend. Discord was suggested by me since a lot of folks are already on it, making it easier to get folks to show up and provide feedback (compared to something like GMeet). Tamzin was invited; I explicitly mentioned it to them last week. I used the term AI because that's the way folks have described it in RFCs; I agree sentiment analysis is a better description, but it wouldn't be as accessible to folks. The meeting wasn't recorded; however, notes were taken at [https://etherpad.wikimedia.org/p/DiscordToneCheck] (even I couldn't make it, since I got stuck in a last-minute meeting IRL). The notification part is on me - I realized last minute that I should have put the notifications out earlier. Sohom (talk) 18:23, 10 June 2025 (UTC)

:::@Sohom Datta Where can I find the download link? I would recommend using WP:CENT. Polygnotus (talk) 18:54, 10 June 2025 (UTC)

::::@Polygnotus Download link for ? (Also imo WP:CENT doesn't make that much sense here, but I can put it up there next time) Sohom (talk) 19:01, 10 June 2025 (UTC)

:::::@Sohom Datta For Bert (and perhaps Ernie). Reading {{Phab|T368274}} it looks like a BERT model was trained and that is what will be used, right? Polygnotus (talk) 19:02, 10 June 2025 (UTC)

:::::Also why tell the writer instead of the reviewer? It would be far better to not inform the potential spammer, but inform the AfC reviewer: "x promotional phrases detected, sentiment 95% positive" or whatever, right? Is there a reason we need to tell this information to the person writing the article instead of the one reviewing it? Polygnotus (talk) 19:06, 10 June 2025 (UTC)

::::::I don't think the BERT model has been trained properly yet (the last I checked, at least; @PPelberg (WMF) will be able to give better specifics). To my understanding, the whole point of the feature is to potentially reduce the amount of time a user spends reviewing another user's edits. I think a big part of the conversation at this point is how to mitigate Tamzin's concerns and surface to the admins/others that the user did see the prompt. Sohom (talk) 19:15, 10 June 2025 (UTC)
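::::::For anyone curious what a per-edit tone/sentiment score looks like in practice, here is a rough sketch using the Hugging Face transformers library with a generic off-the-shelf sentiment model; the model choice, example text, and output handling are illustrative assumptions on my part, not the Tone Check model discussed in T368274. The same kind of score could, in principle, be surfaced to a reviewer ("sentiment 99% positive") rather than, or in addition to, the editor.
<syntaxhighlight lang="python">
from transformers import pipeline

# Illustrative only: a generic sentiment model, not the Tone Check BERT model.
classifier = pipeline(
    "text-classification",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

# A made-up promotional edit; the classifier scores the single edit's text.
edit_text = "Acme Widgets is a world-renowned, award-winning industry leader."
result = classifier(edit_text)[0]

# e.g. "POSITIVE (99%)" - the sort of summary a reviewer dashboard could show
# next to the diff.
print(f"{result['label']} ({result['score']:.0%})")
</syntaxhighlight>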


:::::::@Sohom Datta We mitigate Tamzin's concerns by providing the information to the AfC reviewer/recent changes patroller/vandalfighter and not the person making the edit. Otherwise you get a bizarre version of Clippy (as [https://en.wikipedia.org/wiki/Wikipedia:Village_pump_(WMF)#c-NebY-20250610192100-Polygnotus-20250610190600 NebY explains below]).

:::::::We don't tell LTA's "if you do this you will be blocked as a sock of X, would you like to continue".

:::::::Is my understanding correct that this model takes only a single edit into account? If so, how will it be able to detect actual UPEs like Hajer-12? I compiled a mountain of evidence, available in the three collapsible boxes [https://en.wikipedia.org/wiki/Wikipedia:Administrators%27_noticeboard/IncidentArchive1161#c-Snow_Rise-20240717223100-Isaidnoway-20240717172500 here]. This is how marketing companies operate. Polygnotus (talk) 19:29, 10 June 2025 (UTC)

::::::::@Polygnotus Except that we do, with Extension:AbuseFilter and with our vandalism warnings. If we want total secrecy we should just slap every user with a block and never warn them. We don't do that. The point is to tell the admin, "hey this guy was editing promotionally" AND tell the user "hey your tone is promotional". That way we have editor retention as well as better vandalism fighting.

::::::::That aside, yes the model will only take a single edit into account. Sohom (talk) 19:37, 10 June 2025 (UTC)

:::::::::@Sohom Datta Except that we don't; that is not how abuse filters work. There is no abuse filter or vandalism warning that says "if you post this it will be considered behavioral evidence that you are this LTA, would you still like to post it?". And edit filters created in response to LTAs are only visible to admins and edit filter managers.

:::::::::And I am not proposing total secrecy, or any, so that comparison doesn't hold up. I am proposing providing the information to the person who can use it (AfC reviewer, vandalfighter) instead of showing it to those who can abuse it. They would still be able to see it, e.g. on the AfC dashboard, but only after posting the draft, and without clear instructions how to score better.

:::::::::{{tq|The point is to tell the admin, "hey this guy was editing promotionally" AND tell the user "hey your tone is promotional".}} Why is that the point? That just seems like a bad idea. {{tq|That way we have editor retention as well as better vandalism fighting.}} I very much doubt that we want to increase editor retention among people making promotional edits.

:::::::::{{tq|yes the model will only take a single edit into account}} Has an alternative approach of taking more than one edit into account been compared? And other factors like editcount, amount of pages created, account age et cetera? Polygnotus (talk) 19:53, 10 June 2025 (UTC)

::::::::::@Polygnotus That is literally how some abuse filters work. It shows you a popup and then it allows folks to try and resubmit, which then allows the edit to go through. {{tq| I am proposing providing the information to the person who can use it (AfC reviewer, vandalfighter) instead of showing it to those who can abuse it.}} The point here is to allow NPP folks and admins access to the information while also giving the user some indication that they have messed up. Think about it from the new user's POV for a sec: you create a page about yourself, because you think you are cool (idk, but lots of folks do autobiographies), and you are enthusiastic about the page. Two minutes later someone leaves a warning linking to WP:PROMO; you refresh the page and there is another one for speedy deletion and a message asking you to disclose your employer. Before you have finished typing a reply, your article is deleted. If we just had one side of the pipeline, as you propose, you would just get the warning faster. No change. Alternatively, with Edit Check, you get a warning while you are editing, you read through the policies, and you realize you should be writing about something else. You do that. On the administrator/NPP side, folks who are monitoring users could look at a page's log and see that the user received a warning, and once they do, they are still able to take the same actions. Potentially, if the folks working on it make a robust system, you will also be able to see the text that users had before they revised it and spot subtle differences that let you link them to specific spam rings (and/or block them/warn them for it). I don't see a downside here to be honest.

::::::::::{{tq|yes the model will only take a single edit into account Has an alternative approach of taking more than one edit into account been compared?}}, I think using a set of edits and computing similarity with another account/using ML to detect UPE rings is out of scope for this project AFAIK, but it would be an interesting thing to raise with the Trust and Safety Product team (who I think are currently focused on CU tooling and IPMasking) -- Maybe {{u|KHarlan (WMF)}} (an engineer on that team) would be interested/be able to point you to a better place? Sohom (talk) 20:25, 10 June 2025 (UTC)

:::::::::::@Sohom Datta {{tq|That is literally how some AbuseFilters work.}} No, it is not.

:::::::::::{{tq|It shows you a popup and then it allows folks to try and resubmit, which then allows the edit to go through.}} Yes, but that is not what I said. I said {{tq|We don't tell LTA's "if you do this you will be blocked as a sock of X, would you like to continue".}} so the fact that there are editfilters that warn you before you can submit an edit (e.g. if it contains a bad link) is not relevant. And as an [https://en.wikipedia.org/wiki/Special:Log?type=rights&user=&page=User%3ASohom_Datta&wpdate=&tagfilter=&wpfilters%5B%5D=newusers&wpFormIdentifier=logeventslist edit filter manager] you know that AbuseFilters do not work that way. I am talking about the fact that we don't always tell LTAs how we detect them, because if we do they will hide themselves better next time. That is different from showing someone a message that a particular link may be undesirable. For example Special:AbuseFilter/213 is hidden from view and only edit filter managers and admins can see it. There are a bunch of other edit filters that are also hidden, usually for a very similar reason.

:::::::::::{{tq|the point here is to allow NPP folks, admins access to the information while also giving the user some indication that they have messed up.}} But that is simply a bad idea. Providing that information to NPP/AfC/admin is all good, but providing that information to the person making the edit while writing is a bad idea.

:::::::::::{{tq|Think about it from the new user's POV for a sec, you create a page about yourself, cause you think you are cool (idk, but lot of folks do autobiographies), you are enthusiastic about the page, two minutes later leaves a warning linking to WP:PROMO, you refresh the page and there is another one for speedy deletion and a message asking you to disclose your employer. Before you have finished typing the message, your article is deleted. If we just had one side of the pipeline as you propose, you would just get the warning faster. No change. Alternatively, with Edit Check, you get a warning while you are editing and you read through the policies and realize you should be writing about something else. You do that.}} This would basically never happen. The idea that people who write autobiographies would suddenly be converted into goodfaith editors with a simple popup is very very very very optimistic. In reality, the best case scenario is that they stop and move on, which a 24-hour block is more likely to achieve than this Edit Check.

:::::::::::There is a group of promotional editors; let's for the sake of argument say 100% of their edits are promotional. Of the promotional edits they make, only a tiny subset is salvageable if you rewrite them, maybe 1%.

:::::::::::Of that group of promotional editors, maybe 1% or 2% may be interested in becoming goodfaith non-promotional editors. I think that at most 1% or 2% of the general population of a rich European country would want to be a goodfaith Wikipedian, and among promotional editors that percentage is probably smaller, not larger. We do not want to increase editor retention of promotional editors, so the entire idea is misguided at best and actively damaging the encyclopedia at worst.

:::::::::::So a far more likely scenario is: UPE shows up, Edit Check helpfully assists them whitewashing their spam, and it gets deleted anyway but may take longer to detect it/may fool inexperienced NPP/AfC folk.

:::::::::::{{tq|you get a warning while you are editing and you read through the policies}} The people I meet on the street or the internet don't work like that.

:::::::::::We want trolls to insult everyone, see WP:ROPE. We want promotional editors to be as promotional as possible so that it is easy to detect and flag. We want vandals to use bad words that make Cluebot's job easy.

:::::::::::{{tq| I think using a set of edits and computing similarity with another account/using ML to detect UPE rings is out of scope for this project AFAIK}} Maybe, but that is not what I said. What I said was: {{tq|Has an alternative approach of taking more than one edit into account been compared?}} So let's say an account makes 10 edits and we have a reasonable suspicion that 1 edit is promotional. It would make sense to check the other edits and if they are promotional too then you can be almost certain this account is up to no good. But when you only use 1 data point (1 edit) the reliability is far lower.

:::::::::::I also mentioned other stuff you can take into account when determining a score, like {{tq|editcount, amount of pages created, account age}}. For example, it would be interesting to see people who make 10 edits, wait until 4 days have passed, and then suddenly start editing in a way that sentiment analysis determines is highly positive or negative. Or 500 edits and 30 days. Especially in a CTOP area. Am I making sense?
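:::::::::::To make the "more than one data point" idea concrete, here is a toy sketch of combining several signals into one score. Everything in it (the function, the weights, the thresholds) is made up for illustration; it is not how Tone Check or any existing WMF tooling works.
<syntaxhighlight lang="python">
# Purely illustrative: the weights and thresholds below are invented, not tuned on real data.

def account_suspicion_score(per_edit_promo_scores, edit_count, pages_created, account_age_days):
    """Combine per-edit tone scores with account-level metadata into one number in [0, 1]."""
    if not per_edit_promo_scores:
        return 0.0

    # Several independently flagged edits are much stronger evidence than a single one.
    flagged = sum(1 for score in per_edit_promo_scores if score > 0.8)
    repeat_signal = flagged / len(per_edit_promo_scores)

    # Brand-new accounts that immediately create pages are a classic UPE pattern.
    new_account = 1.0 if (account_age_days < 30 or edit_count < 500) else 0.0
    page_creator = min(pages_created / 3.0, 1.0)

    return 0.5 * repeat_signal + 0.3 * new_account + 0.2 * page_creator


# Example: 10 edits, 3 of which look promotional, from a 5-day-old account with 2 new pages.
print(account_suspicion_score([0.9, 0.1, 0.85, 0.2, 0.95, 0.1, 0.3, 0.2, 0.1, 0.4],
                              edit_count=12, pages_created=2, account_age_days=5))
</syntaxhighlight>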

:::::::::::See WP:BEANS, we don't want to give our enemies information on how to better evade our scrutiny. Although I am a big fan of oracle attacks.

:::::::::::{{tq|I think using a set of edits and computing similarity with another account/using ML to detect UPE rings is out of scope for this project AFAIK, but it would be a interesting thing to raise with the Trust and Safety Product team}} Oh yeah I made something like that once. I have a tool that gets all diffs of edits by a user, and you can filter out the context, so if you do that with 2 users it is pretty easy to compare. I hadn't really figured out a way to determine how rare each string was before my attention was drawn to something else. Polygnotus (talk) 21:10, 10 June 2025 (UTC)

::::::::::::I don't want to go around in circles here, but the feature is broadly aimed at increasing editor retention amongst new users. Yes, a UPE user will be warned about their impending doom, but the idea of the call was to figure out what tooling the team needs to work on/make robust so as to mitigate and counteract the effect of showing the new editor a prompt asking them to improve their text. (For example, if we as AFC/NPP folks can see the text before an edit was revised, I see no reason why we should not try to help good-faith editors write about their favorite content creator in an NPOV manner or write about their research.) The point of the feature is to encourage editor retention (by letting folks know when they have violated policy) before they save an edit, not to serve as an anti-vandalism toolkit (even though that might be a by-product). Part of the feature (not specifically ToneCheck) has even already been deployed to some wikis.

::::::::::::{{tq|So a far more likely scenario is: UPE shows up, Edit Check helpfully assists them whitewashing their spam, and it gets deleted anyway but may take longer to detect it/may fool inexperienced NPP/AfC folk.}} - Except that, if we design this correctly, NPP/AFC folks would know to check the EditCheck logs, see the previous versions of the edits, and alert an admin to block the user. We could even surface the fact that an EditCheck event was triggered inside the AFC script or the PageTriage UI (think of it like an AbuseFilter for bad links). That's what this call was for!

::::::::::::I do come from a background in computer security, so I understand your propensity to come at it from the point of view of a threat model. However, it's important to note that, unlike most traditional security models, if we accidentally err on the side of too much enforcement we go the way of StackOverflow's questions graph. Sohom (talk) 22:17, 10 June 2025 (UTC)

:::::::::::::{{ping|Sohom_Datta}} {{tq|That's what this call was for!}} Neither of us was there during this call, so maybe you can invite me to the next one? In my experience people who always agree with me are very boring.

:::::::::::::As a user of both: Wikipedia can [https://meta.stackoverflow.com/questions/412978/how-often-should-these-are-you-paying-attention-review-tasks-appear learn a lot] from StackExchange, and StackExchange can [https://i.programmerhumor.io/2023/09/programmerhumor-io-stackoverflow-memes-programming-memes-4d109f904610157.jpg learn a lot] from Wikipedia.

:::::::::::::Please respond to the part that starts with "What I said was:" to the end of [https://en.wikipedia.org/wiki/Wikipedia:Village_pump_(WMF)#c-Polygnotus-20250610211000-Sohom_Datta-20250610202500 this] comment because it is a pretty good idea and something I have been contemplating making for a while (although my idea was to use a different approach). It looks like the team has tunnel vision on their proposed solution, which is very common in software development (especially when coders have managers). They should step back and consider what other ways of tackling this problem exist, and how else we can use this tech, and make a list of reasons why this is a bad idea/the downsides/flaws/imperfections. I also list my assumptions and why they are wrong. That usually helps me. What percentage of promotional edits do you think is salvageable if rewritten? What percentage of promotional editors do you think can be converted to goodfaith editors? Polygnotus (talk) 23:22, 10 June 2025 (UTC)

::::::::::::::I will keep it in mind to notify you whenever the next one happens.

::::::::::::::I think the CTOP area idea is interesting and a good idea, I think a variation of this would be useful at SPI to find sleeper socks as well.

::::::::::::::Regarding the rest, the reason the team is working on it is because members of the community have suggested that EditCheck in general is a good idea. Another thing to keep in mind is that we are not necessarily talking about promotional edits (even though that is a major portion), but about non-neutral text in general (which might be either overly negative or overly positive). I don't have numbers for that, but I agree that it would be interesting for the team to pull them up at some point. Sohom (talk) 23:58, 10 June 2025 (UTC)

:::::::::::::::@Sohom Datta Thanks! Can you imagine having to live with such a brain? Oof. My wife is a saint. Polygnotus (talk) 00:03, 11 June 2025 (UTC)

::::::"It looks like you are trying to promote a product. Would you like to be less obvious about it?" NebY (talk) 19:21, 10 June 2025 (UTC)

:::::::@NebY, Folks (read: newbies) are unintentionally promotional as well ("Google Chrome is a renowned product" etc.), in which case they get slapped with 4 warnings and a block and leave the wiki. It's nice to tell them, hey, "your tone sounds off here, please write about it more neutrally" while they are writing about it. Sohom (talk) 19:29, 10 June 2025 (UTC)

::::::::@Sohom Datta I would like to see statistics. Pretty sure most promotional edits are made by people who are actually promoting something and not well-meaning newbies with a surprising love for spyware. And I have yet to encounter the scenario of a goodfaith user without promotional intent getting blocked for promo. Most admins are sane. Polygnotus (talk) 19:33, 10 June 2025 (UTC)

:::::::::We are regurgitating the thread above at this point. I don't have numbers for it, but even if the fraction is low it is still worth it to improve editor retention. It is bad for editor retention for us to even be giving warnings and declining pages, since warnings and declined pages are often demoralizing to well-meaning editors, who are the folks we need more of. But we do need to do that anyway, so why not do it when the editor is in the edit page? Sohom (talk) 19:44, 10 June 2025 (UTC)

::::::::::I'd hope that any organisation's spending decisions went beyond {{tq|"even if the fraction is low it is still worth it"}}. New editor retention is not an absolute good to be pursued regardless of cost, whether cost to the encyclopedia in degraded quality and the attendant reputational harm, or cost to the WMF in time, money and community relations of pursuing a minimal improvement in that metric. NebY (talk) 19:57, 10 June 2025 (UTC)

:::::::::::@NebY The point of the community consultations is so that the team can address concerns and look at methods to minimize the costs that you mentioned. Sohom (talk) 20:27, 10 June 2025 (UTC)

::::::::::::Can minimising the costs include re-evaluating the project and agreeing that it is not worth it? Your phrasing suggests not and that a single metric will be pursued regardless of other considerations. NebY (talk) 21:07, 10 June 2025 (UTC)

:::::::::::::@NebY I do not think the overall EditCheck project is going to be reconsidered (especially since it has been partially deployed on other wikis), though the specific ToneCheck component and its applicability to enwiki definitely can be (and that decision will not be up to the team but to the community).

:::::::::::::To answer your question below, to my understanding, the team intends to understand the problems raised, balance the trade-offs there, and build good tooling for folks doing anti-vandalism/detecting spammers. I don't know what you took away from the statement I made above, but the point is not to keep promotional editors in and the long-term editors out, but rather to notify the good-faith people making the mistake so that they can course-correct and avoid getting demoralized, while also preserving the status quo in our ability to moderate content. Sohom (talk) 23:47, 10 June 2025 (UTC)

::::::::::::::If I understand you correctly, you believe there is a significant number of overly enthusiastic goodfaith people who make promotional edits ("%celebrity% is the best evar!!1!") because they don't understand the rules, and who can be converted to goodfaith editors, while I (as a jaded person) believe that that is a small minority of the people who make promotional edits. I believe that a large majority of people who make promotional edits are doing that to promote something. Polygnotus (talk) 23:53, 10 June 2025 (UTC)

:::::::::::::::Yep! You've summarized my position better than I could :) Sohom (talk) 00:01, 11 June 2025 (UTC)

::::::::::Is the retention of editors who defend the encyclopedia against promotion also a consideration, and does the same {{tq|"even if the fraction is low"}} apply to the risk of them giving up? NebY (talk) 21:33, 10 June 2025 (UTC)

::::::::Is the AI capable of distinguishing unintentional promotion from the wholly intentional that we see so often? NebY (talk) 19:38, 10 June 2025 (UTC)

:::::::::To my understanding, no. Sohom (talk) 19:45, 10 June 2025 (UTC)

:Also, wtf are you doing [https://en.wikipedia.org/w/index.php?title=Wikipedia%3AVillage_pump_%28WMF%29&diff=1294431072&oldid=1294430927 here]? That is impersonation. If I thought it was a good idea to close I would've done that. If you wanted to close it you could. But please do not edit my comments. People get very annoyed if you do that. Thank you, Polygnotus (talk) 18:03, 10 June 2025 (UTC)

::I don't think it's impersonation, but I understand your concern about misconstrued intentions; you can just revert and move on though, I'm pretty sure they didn't mean anything by it. --qedk (t c) 18:09, 10 June 2025 (UTC)

:::@QEDK Yes, I am very much assuming good faith, and I love Sohom, but I get very annoyed when people edit my comments. Hence my warning that {{tq|People get very annoyed if you do that}}. Polygnotus (talk) 18:13, 10 June 2025 (UTC)

:::Regarding that, I just assumed, based on the way things were laid out, that you had intended it in that particular format; feel free to revert. It just makes more sense to centralize discussions about this topic at this point. Sohom (talk) 18:14, 10 June 2025 (UTC)

::::@Sohom Datta I agree that it makes more sense to centralize discussions, and I don't even disagree with closing, but you get very annoyed if I edit your comments to insert interesting facts about ducks and I get very annoyed if you edit mine. It is something we both share. So we only edit our own comments. And for a bit there I lived in an alternate reality where I had closed a section with absolutely no recollection of ever having done that, while clearly remembering having left that comment, and I had to dig through the history to confirm that my memory was still trustworthy. Polygnotus (talk) 18:16, 10 June 2025 (UTC)

:@Sohom Datta Thanks to you and the Tone Check team for setting up this consultation. As I'd warned, I'm recovering from COVID and couldn't predict when I'd be awake; and, just my luck, this turned out to be the first day in a week that I wasn't awake that time. But I hope some kind of good discussion was able to happen with others who did make it. -- Tamzin[cetacean needed] (they|xe|🤷) 21:25, 10 June 2025 (UTC)

::@Tamzin Unsurprisingly [https://etherpad.wikimedia.org/p/DiscordToneCheck the notes] taken by the WMF say: {{tq|Volunteers feel confident WMF Staff developing Tone check really understand the concerns/risks being raised in the WP:VPWMF discussion.}} so I doubt it was a very productive discussion of the pitfalls, downsides, drawbacks, limitations, risks, flaws and challenges. Polygnotus (talk) 21:29, 10 June 2025 (UTC)

:::@Polygnotus Those were the expected outcomes of the meeting, potentially used to tell peeps who were joining what the agenda was. That's how notes are taken for these kinds of discussions. Everything after that is a summary of what happened, which it appears wasn't written by staff (as evidenced by the difference in color). Sohom (talk) 21:34, 10 June 2025 (UTC)

::::Maybe it should be "Intended outcomes" not "Outcomes"? Polygnotus (talk) 21:43, 10 June 2025 (UTC)

::::I'm glad to see some progress was made. From the minutes, it doesn't sound like the WMF is close to having an answer to the fundamental concerns about making a tool for spamming better and then enabling it by default for all users, necessarily including spammers. As I cautioned when you first suggested I talk to some people on the team, they're not going to have a good answer to that, because there is only one good answer, and it's to not make the damn thing. In a collaborative community, that is what we do when someone has a fundamentally bad idea: We tell them to stop pursuing it. -- Tamzin[cetacean needed] (they|xe|🤷) 21:44, 10 June 2025 (UTC)

:::::@Tamzin How do you feel about this same idea but without telling the (potential) spammer and only in retrospect after the edit was made/draft was posted? So only disclosing the sentiment analysis to NPP/AfC/admins et cetera? Polygnotus (talk) 21:50, 10 June 2025 (UTC)

::::::I am less opposed, but still concerned about creating a large language model that could be used outside of our own servers to create more presentable slop. The obvious solution remains simply not doing this at all. The marginal benefit the WMF seeks here is pretty minor. -- Tamzin[cetacean needed] (they|xe|🤷) 21:57, 10 June 2025 (UTC)

:::::::That is probably true, but it is easier to convince the WMF to pivot than it is to tell them to drop it. Polygnotus (talk) 22:00, 10 June 2025 (UTC)

:::::@Tamzin I'm not 100% convinced that it's a completely bad idea, and while I agree that it does not appear that we have a definitive answer today, I think there are technical improvements that could be made (for example, T395166) which mitigate a large portion of the risk. To my understanding the call was primarily so that the team was aware of and understood what we are concerned about, and not necessarily to pull the proverbial mitigating bunny out of the hat. I think we should give the team some time. If the mitigation(s) are not sufficient at the time of deployment I'm pretty sure we can ask them to shut/undeploy this particular component from enwiki, and I see that folk have already raised the point of Community Configuration not being sufficient in this case. Sohom (talk) 22:30, 10 June 2025 (UTC)

::::::@Sohom Datta So they are going to publish the model, make a test page where anyone can enter some text to get a score, and then get proper community consensus first (with a link on WP:CENT to a discussion on WP:VPT and then an RfC a week or two later, and not with one of their weird surveys on Qualtrics or LimeSurvey) before potential deployment, right? If that isn't their plan, can you please explain to them that they must do that? {{tq|which mitigate a large portion of the risk}} The linked Phab ticket does not mitigate a large portion of the risk.

::::::{{tq|If the mitigation(s) are not sufficient at the time of deployment I'm pretty sure we can ask them to shut/undeploy}} We won't need to because they will allow the community to test the feature and are then going to get proper community consensus right? Polygnotus (talk) 22:37, 10 June 2025 (UTC)

:::::::To my understanding the answer to that is yes. I do not have access to a time machine (last I checked) so I cannot predict the future (obviously). The team has already done a fair bit of the right things (by consulting the community and releasing early prototypes, which led to this concern being surfaced in the first place) and I expect them to continue doing the right thing and follow community norms in general. Sohom (talk) 22:52, 10 June 2025 (UTC)

::::::::@Sohom Datta Please make sure. Thanks. I am happy to test the model and provide further feedback. Downside is that I am a jaded nitpicker. Upside is that I am usually correct. Please let me know when and where I can download the model. Polygnotus (talk) 22:56, 10 June 2025 (UTC)

::::::::Oh, almost forgot. Will there be an API endpoint (public/OAuth required)? Polygnotus (talk) 23:01, 10 June 2025 (UTC)

:::::::::Almost all LiftWing models do, so pretty sure that will be a yes. Sohom (talk) 23:23, 10 June 2025 (UTC)
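:::::::::For illustration, existing LiftWing models are called roughly like this. The revert-risk example reflects a model that exists today; the tone-check model name and payload in the commented-out part are pure guesses, since nothing for Tone Check has been published:
<syntaxhighlight lang="python">
import requests

# Publicly documented LiftWing inference pattern:
#   https://api.wikimedia.org/service/lw/inference/v1/models/{model-name}:predict
resp = requests.post(
    "https://api.wikimedia.org/service/lw/inference/v1/models/revertrisk-language-agnostic:predict",
    headers={"User-Agent": "tonecheck-discussion-example/0.1"},
    json={"rev_id": 12345, "lang": "en"},  # score an existing revision by its revision ID
)
print(resp.json())

# If a Tone Check model is ever exposed the same way, a call might look similar,
# but this model name and request body are assumptions, not a published API:
# requests.post(
#     "https://api.wikimedia.org/service/lw/inference/v1/models/tone-check:predict",
#     json={"text": "Y is the most outstanding author in her field", "lang": "en"},
# )
</syntaxhighlight>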

:::::::I appreciate the desire for anyone to be able to test the model, but if it is published for anyone to run, then it can be used by malicious people to train their programs or staff. I am concerned about the risk this would pose. There will be no logs on the server to consult if the iterations are being done off-wiki. isaacl (talk) 01:05, 11 June 2025 (UTC)

::::::::@Isaacl Even if they keep everything a secret, people with a tiny bit of scripting knowledge will be able to use oracle attacks to figure out everything they need to know. Security through obscurity never works. Polygnotus (talk) 05:39, 11 June 2025 (UTC)

:::::::::I've already discussed how malicious people can implement their own quality controls and develop their own programs, so there's no need to tell me about what they can do. Nonetheless, that doesn't mean we should allow them to train on the same Wikipedia quality controls being applied in production. isaacl (talk) 06:09, 11 June 2025 (UTC)

::My headline would be that, despite all the thoughtful commentary here, the team didn't understand the community's concerns. I think some marginal progress was made during the meeting. I will hope that more progress, real progress, is made afterwards, when the team has time to reflect on the discussion. I was only able to stay for 45 minutes so perhaps things changed after I left. Best, Barkeep49 (talk) 01:41, 11 June 2025 (UTC)

*In contrast to Barkeep I was only there at the tail end of the meeting and my interpretation was very different. The team accepted/agreed to look into several proposed improvements and alterations that, in my view, would make me fully support the implementation of ToneCheck on enwiki. These potential changes include integration with edit filters, logging edits flagged by ToneCheck, and capping the number of times the same edit/user is flagged, to prevent "oracle attacks" (or just good-faith editors circumventing a warning they do not understand by repeatedly slightly altering their edit).

: I share Sohom's optimistic interpretation that most editors adding non-neutral wording are not looking to promote stuff, but simply do not understand how Wikipedia works. A wise editor once pointed out that schools spend the first two decades of children's lives drilling them in argumentative essay writing, so we cannot blame those people when they then come to Wikipedia and continue writing in that style. Toadspike [Talk] 18:02, 12 June 2025 (UTC)

::Capping the amount of times the same edit/user is flagged would not prevent oracle attacks.

::Make a list of 1000 editors who made a promotional edit. What percentage of those edits can be salvaged if rewritten? What percent of those editors can be turned into goodfaith net positive Wikipedians? 2%? Polygnotus (talk) 18:07, 12 June 2025 (UTC)

:I think ToneCheck is a great idea. The idea that we should make it harder to edit Wikipedia so it's easier to "trap" disruptive individuals is textbook WP:BITING. The main problem with promotional editing is a lack of neutrality. It's not a game where we try to play "gotcha" and get editors banned. Banning editors is a last resort to protect the encyclopedia, and we should always prefer improving editors to excluding them. This proposal prevents common promotional wording from even entering the encyclopedia in the first place. Chess (talk) (please mention me on reply) 02:39, 15 June 2025 (UTC)

::@Chess How do you propose we improve promotional editors who show up to promote a product/brand/company? {{tq|The idea that we should make it harder to edit Wikipedia so it's easier to "trap" disruptive individuals is textbook WP:BITING.}} No, it isn't, and no one proposed that. {{tq|This proposal prevents common promotional wording from even entering the encyclopedia in the first place.}} No, it doesn't. Have you read the above? Polygnotus (talk) 02:42, 15 June 2025 (UTC)

:::{{re|Polygnotus}} I did read the above. To give an example, you said: {{tq|New editor retention is not an absolute good to be pursued regardless of cost, whether cost to the encyclopedia in degraded quality and the attendant reputational harm,}} It's obvious you want to filter out promotional editors based on motives, since you don't view them as improvable.

:::I'm coming at this with my experience in WP:CTOPS like Israel-Palestine, where approximately 100% of editors have some kind of agenda. Generally, that agenda is fixing Wikipedia's bias against their ethnicity or cultural group. Their remedy is adding biased statements in the other direction.

:::Your proposal to use this to retrospectively identify editors that become biased after reaching 500/30 isn't useful. We've had 5 ARBCOM cases and hundreds of WP:AE threads, and it's a game of Whac-A-Mole that isn't working.

:::What does work is guiding editors towards WP:NPOV and reassuring them that our policies are being evenly applied. ToneCheck can be helpful, because it's a machine tool, not an editor who likely has an interest in promoting or advancing the goals of their specific side. Chess (talk) (please mention me on reply) 03:21, 15 June 2025 (UTC)

::::@Chess {{tq|To give an example, you said}} No, that was {{noping|NebY}}, who is far more eloquent than I am. {{tq|It's obvious you want to filter out promotional editors based on motives, since you don't view them as improvable.}} Nah, I said there was like 1-2% who might be improvable. But that is pretty optimistic. Do you have any evidence of people who come here to promote a brand/product/company and then get turned into productive editors?

::::{{tq|What does work is guiding editors towards WP:NPOV and reassuring them that our policies are being evenly applied.}} Why would lying to them work? No one believes that this planet, or any of the systems on it, is fair. Wikipedia certainly isn't fair. In the CTOP area some try to be fair (but their perception is flawed, like all humans), most do not.

::::{{tq|What does work is guiding editors towards WP:NPOV}} New editors are immune to PaGs. And WP:NPOV is 5502 words. I am pretty sure that in the history of Wikipedia no one has ever turned a PIA POV editor into a productive editor with gentle guidance.

::::{{tq|Your proposal to use this to retrospectively identify editors that become biased after reaching 500/30 isn't useful.}} Why not? How do you know? Why should we trust you?

::::{{tq|ToneCheck can be helpful, because it's a machine tool, not an editor who likely has an interest in promoting or advancing the goals of their specific side.}} So you think that a small language model cannot be biased? There are roughly a quarter million news articles published recently describing AI bias. If the training data (the internet/Wikipedia) is biased then the AI output will be biased. Computers running AI are glorified calculators on steroids mixed with crack; not impartial arbiters of truth. Markov chains do not lead to spiritual enlightenment. Polygnotus (talk) 10:14, 15 June 2025 (UTC)

:::::I'm constantly involved in arguments in ARBPIA, and the one thing that can get editors to make concessions is the perception that a neutral standard is being applied.

:::::I'm currently working on this with the term "massacre", which is unevenly applied across the topic area. Editors are willing to !vote to remove it when the term is used for the killing of people on "their side", so long as they see the same standard being applied to killings of people on the "other side". Arguments about minor style issues consume less time.

:::::I have less experience with COI/promotional editors outside of AfC. It'd be nice to avoid forcing people through a 4 month queue only to get rejected for blatantly promotional wording. Chess (talk) (please mention me on reply) 19:25, 19 June 2025 (UTC)

::::::{{ping|Chess}} {{tq|I'm constantly involved in arguments in ARBPIA}} Oh man that sucks. Try to escape!

::::::{{tq|the one thing that can get editors to make concessions is the perception that a neutral standard is being applied.}} Have we tried the honesty technique? "We are all fallible humans and most of us are trying to do the right thing but this stuff is very difficult and it is easy to fall into the trap of tribalism."

::::::{{tq|I'm currently working on this with the term "massacre", which is unevenly applied across the topic area.}} Even what we consider to be reliable sources use terms unevenly. There are no secret balanced sources we can use so it would be good if the general public develops a bit of media literacy. But that costs billions in education.

::::::{{tq|It'd be nice to avoid forcing people through a 4 month queue only to get rejected for blatantly promotional wording}} Yeah it would be pretty easy to write some code to give AfC reviewers a list of drafts that should most likely be rejected. I proposed something like that once. Polygnotus (talk) 21:55, 19 June 2025 (UTC)

:::::::{{tq|Have we tried the honesty technique?}} Everyone honestly believes that Wikipedia is irredeemably biased against "their side". This perception is fed by individual cases of POV-language introduced by drive-by editors without discussion. Since humans tend to notice unequal treatment when it benefits "their side", any unequal treatment turns into accusations of bias.

:::::::Any effort that establishes consistency is beneficial. It doesn't actually matter what the standard is so long as it's outside of the direct control of participants in the current dispute and perceived as being somewhat neutral. A literal dice roll hosted by the WMF could benefit the area.

:::::::{{tq|Oh man that sucks. Try to escape!}} It's too fun to leave at this point. I like the dance of negotiation and bargaining on article content. I am learning a lot.

:::::::Just write the code for AfC instead of proposing it. Chess (talk) (please mention me on reply) 22:16, 19 June 2025 (UTC)

The WMF would like to buy you books

There's a new pilot program open at Wikipedia:Resource support pilot, where editors can submit requests for the WMF to buy sources for them. I encourage folks to check it out, and notify any WikiProjects and editors that may be interested. Toadspike [Talk] 18:46, 10 June 2025 (UTC)

: Now that's just cool. – SJ + 00:40, 11 June 2025 (UTC)

:This is an excellent idea. Polygnotus (talk) 07:53, 11 June 2025 (UTC)

:That would be a great complement to the Wikipedia library! —Ganesha811 (talk) 01:16, 12 June 2025 (UTC)

:More of this. -- LCU ActivelyDisinterested «@» °∆t° 19:43, 12 June 2025 (UTC)

:I wonder with whom this idea originated? They deserve some plaudits. SnowRise let's rap 15:33, 13 June 2025 (UTC)

::It's an old idea with a few previous iterations, but plaudits for this specific initiative (or at the least for being the public face and focal point for this initiative) go to RAdimer-WMF. CMD (talk) 16:33, 13 June 2025 (UTC)

:::Well, tally one for the global communications team. :) SnowRise let's rap 12:05, 15 June 2025 (UTC)

:Notified Women in Red and WikiProject LGBTQ. Polygnotus (talk) 00:53, 20 June 2025 (UTC)

Proposal: abolishing fundraising banners from the English Wikipedia

{{atop

| status = closed

| result = There is little support for this and the conversation is unlikely to lead to anything productive. Mike Christie (talk - contribs - library) 14:57, 13 June 2025 (UTC)

}}

Wikipedia does not allow advertising, so why do we allow fundraising banners all the time on these pages? This is essentially advertising for the Wikimedia Foundation.

In the past, this was justified as a necessary evil since Wikipedia needed funds to survive. However, the WMF had over $271 million of assets by the end of the 2024 fiscal year.[https://upload.wikimedia.org/wikipedia/foundation/f/f6/Wikimedia_Foundation_2024_Audited_Financial_Statements.pdf Financial Statements, June 30, 2024 and 2023] Wikimedia Foundation

Even with a very conservative average 5% interest rate, and reinvesting half of it in the fund to compensate for inflation, this would mean almost $7 million per year. This is more than double the ~$3 million that the WMF spent on internet hosting in fiscal 2024 (although that does not include salaries, the remaining ~$4 million should be more than enough to pay the salaries of technicians and other essential workers). This is before counting the donations that would still come in even without the banners, which would likely amount to much more than the $7 million from the endowment.
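(Worked out: 5% of $271 million is about $13.6 million per year; reinvesting half of that to offset inflation leaves roughly $6.8 million per year, hence "almost $7 million".)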

What about the $178 million that the WMF spent in the last year? That spending would necessarily need to be cut drastically to focus back on the core responsibility of the WMF: running the servers. The WMF has demonstrated many times by now that not only are they not willing to invest this extra money in projects that are requested by the community (see for example the Graph extension being down for 2 years, replaced only now by a barely functioning alternative, or the community wishlist, which is consistently ignored), but they spend money and resources in ways that are explicitly opposed to the wishes of the community (and with barely any community consultation), sometimes even risking the whole project in the process (see the incredibly misguided AI proposal under discussion here).

{{talkref}} Ita140188 (talk) 21:22, 12 June 2025 (UTC)

:Valid complaints. They forget that Wikipedia (and its images on Commons) is beyond being the flagship of the WMF; it is THE ship that it rides on. But that means that Wikipedia needs to help fundraise to some extent. Sincerely, North8000 (talk) 21:42, 12 June 2025 (UTC)

::My point is that proceeds from the endowment (which the WMF has already established) should be enough to keep the basic infrastructure working for the foreseeable future. Any more funds coming in are welcome (people will keep donating anyway without the banners), but they are not vital: fundraising banners trick people into believing that a donation is essential to the survival of the project, when this is not true at all. If anything, it seems that more money is a threat to the project rather than a help. Ita140188 (talk) 21:52, 12 June 2025 (UTC)

:Yes. Wikipedia existed before the WMF, but that seems to not be realised by many WMF staff, who seem to think that the WMF owns Wikipedia. The WMF has become much more bloated than is needed to provide logistical support to Wikipedia. Phil Bridger (talk) 22:00, 12 June 2025 (UTC)

:I am glad the foundation can do more than zero work developing MediaWiki software. I find it essential that we have a legal department - seemingly not contemplated in the "we can run the WMF on a $7 million budget" plan - who can offer legal assistance to editors facing lawsuits and can afford to hire top-notch representation to fight back legal challenges, as they have done and continue to do with ANI's baseless lawsuit. Best, Barkeep49 (talk) 22:14, 12 June 2025 (UTC)

::Donations will still come even without the banners, so the budget would be substantially higher than $7 million (probably by a factor of 10, considering previous estimates of how much the English Wikipedia banners contribute to the total donations). There will be enough funding for a legal department. As for the development of the MediaWiki software, there are plenty of examples of successful open source software developed entirely by volunteers (see Linux and its ecosystem for example), so I don't see why it should be different for MediaWiki (which, by the way, is already very mature and does not necessarily need large amounts of work in any case). Ita140188 (talk) 22:22, 12 June 2025 (UTC)

:::It is not developed just by volunteers. A ton of the development also comes from WMF employees. And the task tracker is, of course, administered by the WMF; the maintainers of MediaWiki also belong to the WMF. Linux also has people paid by the Linux Foundation, and many "volunteers" are paid by big companies like Red Hat to contribute to Linux. Aaron Liu (talk) 01:40, 13 June 2025 (UTC)

:::You are comparing critical/foundational infrastructure software against MediaWiki? Linux depends on paid "volunteers" for its continual development. A good chunk of the code in Linux is written by [https://www.theregister.com/2023/02/24/who_writes_open_source/#:~:text=So%2C%20yes%2C%20open%20source%20certainly,;%20Meta;%20and%20Red%20Hat. programmers who have vested interests], e.g. Intel engineers pushing their firmware updates. How many large companies are there that rely solely on MediaWiki? A lot of successful open source movements have a corporate sponsor or two. WordPress relies on Automattic; Redis just turned corporate; its fork, Valkey, is driven by volunteers employed by large corporations/entities like AWS; Chromium by Google, and increasingly Microsoft as well; MySQL and Java are with Oracle. In a way, MediaWiki benefits from having the Foundation as its sponsor, as it distances the influence of corporations from its development. – robertsky (talk) 01:40, 13 June 2025 (UTC)

::Agreed. We are shortly going to have cause to be immensely grateful for the foundation's massive legal and public outreach warchest; you can be well assured of that. The greatest existential fight of this project's entire history is on the immediate horizon, make no mistake. {{pb}}Now personally, I believe that the WMF horrifically failed in its ethical duties to volunteers and to the community in several aspects of the ANI debacle. People have recently celebrated (with good cause) that the Supreme Court of India "permitted us" to reinstate our article on ANI, while looking past the questionable decisions of the WMF in using an office action to overrule the community's prerogative without consultation in the first instance.{{pb}} To say nothing of the fact that in order to preserve that appeal before the high court, the WMF (after disingenuously hand-waving away community concerns that they would do this very thing) decided to throw community volunteers thoroughly under the bus by disclosing PII to the court of appeals, knowing it would end up in the hands of ANI and other third parties in a dangerously sectarian context--thereby betraying a decades-long convention for how we protect our volunteers and vitiating the trust that accrued from those assurances. To say nothing of how the Foundation's management of those issues seriously damaged the faith of the community that the old standards for shared leadership in moments of crisis would be respected.{{pb}} All of which is to say: I get why trust in the foundation is at a low ebb. In the course of one short year, I for one went from someone who would easily, vocally, and consistently support the WMF in discussions like this (as a consequence of a history with non-profit administration and an appreciation of the organization's formal duties and special remit) to someone who could not really be any more concerned about the org's leadership and its drift away from community values and towards an increasing propensity to try to unilaterally define the movement's priorities and steer its course. {{pb}} But those issues are at most tangentially related to fundraising, "bloat", or expanding operational costs. None of these things present a serious risk to this project's autonomy or functioning. The real issue is that the WMF Board and senior operations staff have been allowed to become increasingly isolated from the direction and influence of the project communities, becoming more and more untethered from taking their cues on movement priorities from (or having any true accountability to) the communities of the projects and affiliates. The ship is going to have to be righted in that respect in the very near future, because the only way we will be prepared to meet the challenge that is coming is if the communities and the WMF prepare a united and well-formed front against that storm. {{pb}}That is part of why the ANI situation has left me so ill-at-ease (though I would have been opposed in principle to selling out those editors regardless): I recognized that it should be seen as a trial run for the bigger contest that is coming on turf where comity concerns will provide the Foundation and en.Wikipedia even less room for evasion, and I didn't like what I was seeing regarding the Foundation's response under the much lower threshold of pressure it was facing in that rehearsal fight. From its a priori choice of priorities, to its questionable approach to the legal issues, to its utterly confused and at times outright disrespectful approach to communication with the community, I feel that they demonstrated that they do not have the right people in charge to meet this defining moment for the movement. {{pb}}That's why I think the recent announcement of a CEO transition could not possibly be more consequential. I can't help but feel that there is an opportunity here to re-align the Foundation with broader movement priorities and to rehabilitate the Foundation's responsiveness to the community. But whether the vestigial organs of communication between the two heads of the behemoth are still operational enough to allow for any serious improvement in that respect remains to be seen. {{pb}} But one thing I know is certain: we gain very little from attempting to reduce the Foundation's financial resources, and any effort to block fundraising banners may in fact embolden the more autocratic elements at the WMF to just technically circumvent the community's will in the unlikely event it supported this proposal, further fracturing our trust and unity at a moment when we should be in full damage control mode with regard to repairing our means of rowing together and demonstrating mutual respect. SnowRise let's rap 00:38, 13 June 2025 (UTC)

:I agree with Barkeep, but would be curious to see the data on how much banners impact the total donations. However, given the current political climate in the United States (where the servers are hosted), I believe that a fight between the community and the WMF will not be helpful for the encyclopedia's future. Chaotic Enby (talk · contribs) 22:55, 12 June 2025 (UTC)

::That's a good point, and I think it suggests another line of discussion: the need for decentralization of Wikipedia's infrastructure and governance. The authoritarian drift and the decline of the rule of law in the United States is a huge risk to the project right now, even without a fight between the community and the WMF. Ita140188 (talk) 23:04, 12 June 2025 (UTC)

:::There are two problems with that: 1) to say that such a move would be divisive among the various communities and within the WMF itself is about the understatement of the century. And 2) such a move would be so technically, operationally, administratively and legally complex as to be next to impossible at this juncture and, if feasible, would take nothing less than a good number of years. We don't have that time right now. The situations which so demand uniformity of planning and good faith between the community and the foundation are essentially right upon us. Now is not the time to be further damaging the sense of compatriotism between the community and the Foundation. And I'm not saying the community doesn't have reasons to be concerned (read my exhaustive post immediately above to see just how much I'm not saying that). But right now we need a firm mood of detente, not more squabbles over comparatively inconsequential issues that can be re-visited in a few years if we manage to shield the critical projects against the efforts at repression and censorship they are about to face. Not that I think it is likely to ever make sense to forbid on-site fundraising efforts. But if a time like that might someday exist, this is certainly not it. SnowRise let's rap 00:48, 13 June 2025 (UTC)

:::The infrastructure is already decentralising, with data centers located across the world, and regular switching of the two main data centers every six months. Can more be done? Maybe, but definitely not at the current costs. One would probably have to increase the amount of servers, storage and data transfer to upgrade the caching data centers to full-fledged ones, and employ dedicated people on site to manage everything. – robertsky (talk) 01:12, 13 June 2025 (UTC)

::Chaotic Enby: the :meta:Fundraising/2022-23 Report claims that the adjustment of banner wording to sound less needy following discussions here led to a "$10 million decrease compared to the 2021 campaign." CMD (talk) 02:01, 13 June 2025 (UTC)

:::Thanks, that is the kind of hard data I wanted to see! Definitely seems to have a major impact then. Chaotic Enby (talk · contribs) 02:16, 13 June 2025 (UTC)

::::If I recall (not that I was directly involved) there were noticeable changes to funding streams as a result. CMD (talk) 02:30, 13 June 2025 (UTC)

:Can we stop with this bullshit please? If you have suggestions for how the WMF should spend its money, get involved at meta-wiki or run for the board. If you think people should donate to other non-profits, go fundraise for them. The WMF exists and it has significant assets which it spends on maintaining Wikipedia, advancing the open knowledge movement, and protecting the community. None of that is going to change. voorts (talk/contributions) 23:25, 12 June 2025 (UTC)

::+1 CX Zoom[he/him] (let's talk • {CX}) 00:33, 13 June 2025 (UTC)

::+1 Sohom (talk) 00:45, 13 June 2025 (UTC)

::Well put, voorts. I think that a lot of editors who are not involved in the broader open knowledge movement just aren't familiar with how much the WMF does. ThadeusOfNazereth(he/him)Talk to Me! 02:38, 13 June 2025 (UTC)

:I think this proposal is a bit too radical and doesn't show a good understanding of the services that the WMF provides. If the WMF were stripped down to just servers and site reliability engineers, then there would be no conferences (Wikimania, Hackathon) and conference scholarships, no new software features and extensions (I assume you are proposing getting rid of all the "product" teams such as the Moderator Tools Team, Editing Team, Trust & Safety Product Team, etc. that make new software and maintain existing software), no legal department, no Trust & Safety Team, no affiliates, no rapid grants, etc.

:In general, English Wikipedia has a certain amount of political capital that we can use to lobby for changes in other parts of the movement, and I think we should "spend" this political capital wisely. We should spend it on very important issues and in a way that doesn't make other parts of the movement resent us. Or if we do cause tension with other parts of the movement, it needs to be for an issue that is worth it. Let's pick our battles wisely. –Novem Linguae (talk) 00:44, 13 June 2025 (UTC)

::I agree with Novem. An issue as trivial as banners is not what the community should be focusing its political capital on, and sacrificing most of what the WMF does in the name of removing donation banners is not reasonable.{{pb}}One of the main reasons why Wikipedia does not run ads is to stay financially independent of any backers. The WMF, thanks to donations, is exactly what guarantees that financial independence. Chaotic Enby (talk · contribs) 00:49, 13 June 2025 (UTC)

:If you're wondering how this might play out in practice, I'd suggest a quick visit to our coverage on Cutting off one's nose to spite one's face or Throwing the baby out with the bathwater. On a more serious note, no, just no, other folks above me have made excellent points about why starving the parent organization of funds is not a good idea in the slightest. Sohom (talk) 01:04, 13 June 2025 (UTC)

:Cutting off the WMF's main revenue source isn't going to make things like the graph extension get done faster. -- asilvering (talk) 01:40, 13 June 2025 (UTC)

:The WMF continues to hugely support our work despite its shortcomings. See Wikipedia:Resource support pilot for a newly launched example. Building on asilvering, jumping to the nuclear option after the WMF retracted its AI article summaries sends a message that we are incorrigible. ViridianPenguin🐧 (💬) 04:11, 13 June 2025 (UTC)

::I don't support this proposal but I do want to point out that the WMF didn't "retract" the AI project. It's on hold but not canceled. Gnomingstuff (talk) 08:17, 13 June 2025 (UTC)

:Can someone please close this thread? It is very unlikely to lead to any productive discussion. voorts (talk/contributions) 14:37, 13 June 2025 (UTC)

{{abot}}

Biodiversity Heritage Library

WMF-related section [https://en.wikipedia.org/wiki/Wikipedia:Village_pump_(idea_lab)#BHL here]. Cremastra (Go Oilers!) 23:59, 12 June 2025 (UTC)

:BHL = Biodiversity Heritage Library. The linked section is asking for the WMF's help to host them because they lost their funding, it sounds like. –Novem Linguae (talk) 00:54, 13 June 2025 (UTC)

::Actually makes a very good point about the section immediately above this one. Chaotic Enby (talk · contribs) 01:08, 13 June 2025 (UTC)

:::Agreed, and that observation underscores a broader point: with the floor falling out from underneath numerous beneficial projects in the broader free knowledge movement, the Wikimedia movement may be positioned as a possible source of fall-back support for many of them. But none of that is going to come cheap. It's one thing to have reservations about how the Foundation allocates its resources in any given funding/operational cycle. It's another to decide the solution to those misgivings is to undermine its ability to recharge its resources. SnowRise let's rap 01:30, 13 June 2025 (UTC)

Design feedback on category-based template discovery

The CommTech team is soliciting feedback on a set of designs to use categories to improve the quality of life for folks discovering templates through the Visual Editor template selection interface. This work comes from a focus area identified as part of the new Community Wishlist survey. The designs and the survey can be found here. Sohom (talk) 13:23, 16 June 2025 (UTC)

:Folks on WP:DISCORD can leave free-style feedback at [https://discord.com/channels/221049808784326656/1383137736541601932 this thread]. Sohom (talk) 13:25, 16 June 2025 (UTC)

Official Wikipedia Roblox game and Generative AI use

I considered whether to add this as a subsection to the above RFC on WMF AI development, but decided not to as I didn't want to further bloat that discussion. Regardless, just earlier today I came across a [https://www.instagram.com/p/DKfaBAYOvB2/ post on instagram] from the official Wikipedia instagram account ([https://www.facebook.com/share/r/1Ac28e8iQ7/ facebook link] for boomers who don't have instagram) showcasing a new Wikipedia Roblox game. The post was made almost two weeks ago so I'm not sure whether it has already been discussed before, but this is a continuation of the use of generative AI (the cover image for the game page, which is also included in the instagram and facebook posts, is almost certainly AI) which has quite openly been discussed and decried by many users in the community. I also think that this is a different issue, though: rather than this use of AI being even remotely justifiable as trying to improve the quality of the 'pedia, the use of generative AI images in what is basically marketing material really only serves to cut costs while providing a worse product. I also echo users' concerns about the WMF's environmentalism when they say things like {{tq|The Wikimedia Foundation believes that a long-term commitment to sustainability is an essential component of our work towards the Wikimedia mission and vision}} [https://diff.wikimedia.org/2023/04/20/wikimedia-foundation-environmental-sustainability-report-for-2022/ here], but then use generative AI to create images for their Roblox game.

I'm aware that most folks on here are certainly not the demographic targeted by this sort of post, but in the end it still reflects on us, so I wonder what folks think. Weirdguyz (talk) 00:45, 17 June 2025 (UTC)

:I would have added a link to the Roblox game as well, but roblox.com is on the blacklist, so ¯\_(ツ)_/¯ Weirdguyz (talk) 00:47, 17 June 2025 (UTC)

:: https://www.roblox.com/games/99320538920886/Wikispeedia-the-Wikipedia-Speedrunning-Game * Pppery * it has begun... 01:06, 17 June 2025 (UTC)

:the WMF, last week: {{tq|Bringing generative AI into the Wikipedia reading experience is a serious set of decisions, with important implications, and we intend to treat it as such.}}

:I guess the skibidi brainrot market technically is not the "Wikipedia reading experience" Gnomingstuff (talk) 01:45, 17 June 2025 (UTC)

::{{tq|I guess the skibidi brainrot market technically is not the "Wikipedia reading experience"}}, exactly! {{tq|I'm aware that most folks on here are certainly not the demographic targeted by this sort of post,}} I think is the most important part. We don't know what folks who are actually in that segment want/use. The Future Audiences team is creating short-lived experiments to understand what kind of content the younger generation wants. It obviously will be considered borderline by folks who are not the target demographic (which will be a large portion of the community base). I don't support Roblox's exploitative marketplace, nor am I a supporter of AI image generation, but I do recognize that these explorations are necessary to understand and figure out what kind of media for consuming Wikipedia is popular among the younger crowd (damn, that makes me sound old). Whether or not the WMF invests significantly more resources in that direction and decides to rewrite MediaWiki in Roblox-lang (I believe it is a flavour of Lua?) is up for debate and something that we should (and rightfully do) have a say on. Sohom (talk) 06:04, 17 June 2025 (UTC)

:::Do my eyes deceive me, are you saying Roblox may be incubating a generation of Wikipedia coders? I might change my mind on that game. CMD (talk) 06:13, 17 June 2025 (UTC)

::::The games on Roblox are written using an abridged version of Lua called Luau, so maybe yes :) Sohom (talk) 06:25, 17 June 2025 (UTC)

:::Oh, my gripe is certainly not with the fact that they've made a Roblox game; bringing in the younger generations is paramount to the continuation of our goal (I say this as one of the younger (relatively...) generations). My issue is solely with the generative AI used in said pursuit, because the only argument in favour of it is that it is cheaper than paying an actual artist. The quality of the work is worse than if you got an actual artist to make something, the environmental impact is a genuine measurable concern, and the number of people who will see the use of generative AI and be turned off the WMF and Wikipedia is not insubstantial. Weirdguyz (talk) 06:23, 17 June 2025 (UTC)

::::If only we had a repository of free images they could have used instead, or a cohort of editors who might be willing to create and donate actual human work for this. Fram (talk) 07:16, 17 June 2025 (UTC)

:::::We don't really have any Roblox characters on commons (for better or for worse) that could have been used. Sohom (talk) 08:06, 17 June 2025 (UTC)

::::This is my stance as well. That, and the fact that it's terrible optics -- Wikimedia has already gotten [https://www.404media.co/wikipedia-pauses-ai-generated-summaries-after-editor-backlash/ a significant amount of negative PR] for using generative AI in the "paused" summary feature. Gnomingstuff (talk) 15:47, 17 June 2025 (UTC)

:If there is a desire to productively engage on questions regarding the use of generative AI/LLMs/similar, it is probably not worth it, in terms of both time and effective collaboration, to respond to each individual use of gen AI. What is likely more effective is generating engagement with the processes behind them. In this case, the relevant initiative is :meta:Future Audiences. You can see their stance on gen AI at :meta:Future Audiences/FAQ: "The Wikimedia Foundation view of conversational/generative AI specifically is that we (Wikimedians, Mediawiki software developers, and WMF staff) have developed and used machine-assisted tools and processes on our projects for many years, and it is important to keep learning about how recent advances in AI technology might help our movement; however, it is equally important not to ignore the challenges and risks that commercial AI assistants may bring not just to our model of human-led knowledge creation and sharing, but to the entire ecosystem of digital knowledge." I stated somewhere during the discussion of :meta:Future Audiences/Generated Video that there have been some flawed risk considerations; for example, that "Experiment" (quoting to indicate this is the terminology they use, not a scare quote) page has a subsection on the risks of associating Wikipedia with TikTok, but nothing on associating Wikipedia with generative AI. (I might add that the first two bullet points at :meta:Future Audiences seem to pose contradictory lessons, possibly worth digging into.) Now, what I haven't figured out, and what perhaps we haven't worked out as a community, is how to effectively channel feedback about broader themes rather than individual activities, and then, perhaps more importantly, how we remain continually engaged on that end. Say that the RfC on a statement on AI comes to a consensus: what happens next? It's quite a hard question as to how something as amorphous as en.wiki can be represented in these processes. The Future Audiences team has meetings every month; is an attendee there from en.wiki going to be representative? Should we be proactively trying to figure out statements here for such meetings in advance? How would that be most collegial/effective? A further complication is that the WMF is also not a monolith; the :meta:Reading/Web team, for example, which is looking into the gen AI Simple Article Summaries, is a different team with its own projects. Should we use this noticeboard to figure out statements that can be transferred to meta, or does that fall down as meta threads are also a discussion? We sometimes contribute to community wishlists, we have individual members who engage, but do we as a community have an overall approach? I'm rambling slightly, and I know some would prefer we did not have to engage, but we do have to, and given the historical difficulties in communication maybe we could think of some ideas to create something a little more sustained. CMD (talk) 07:57, 17 June 2025 (UTC)

::I think engaging is the only way forward for folks on the teams to know what the community's take on this matter is. Not engaging never was (and still is not) the answer, especially if the expectation is for the WMF to reflect the views of the community.

::I can/will try to be around during the next call for Future Audiences, whenever that is, but I don't think "proactively trying to figure out statements here for such meetings in advance" is the way to go in these kinds of situations; rather, the idea would be for the enwiki representative to act as a steward/helpful member who is able to vouch for and provide context for the team's decisions while also guiding the team away from major policy missteps and providing stewardship on where and when to ask for feedback.

::(Unrelatedly, is mw:Future Audiences/Generated Video about AI-generated videos or just about using generative text-to-speech software (which has been around for a while)? My understanding was the latter; the former would be concerning.) Sohom (talk) 08:29, 17 June 2025 (UTC)

:::My understanding is that the short videos were mostly AI generated, in that the AI did the writing and the voicing (so to speak). I don't recall if the AI chose the images, or whether the final cut was done manually. CMD (talk) 08:37, 17 June 2025 (UTC)

::::@Sohom Datta & @Chipmunkdavis: to create these videos, we use AI to do an initial cut of selecting some images and text from a target article + "hook" (which either comes from DYK or we write ourselves) and summarize the text into a 30-secondish-length video. Members of our social media team then review and make changes to this first draft (ensuring that the summarization of facts from the article is correct and has the appropriate tone, selecting different images from the article or Commons if needed, etc.) before posting. The narration is indeed generative text-to-speech, though we've also gotten some of our staff to supply narration for a few of these. This use of AI helps us greatly reduce the time/cost to make these videos. We're also very happy to feature community-created content on these channels and have published several ([https://www.tiktok.com/@wikipedia/video/7502064132623961350 example from the folks at Wikimedia Armenia]). These take more time & effort, but in the longer term we'd love to get a bigger ratio of community faces to "fun fact" explainers on these channels, so if you or anyone you know is interested in creating some short video content, please get in touch! Maryana Pinchuk (WMF) (talk) 14:34, 17 June 2025 (UTC)

:Creating an AI generated image for social media doesn't bother me. As I said in another WMF related thread, enwiki only has so much political capital, and we should use it wisely, i.e. making a stink only about issues that are truly worth it. –Novem Linguae (talk) 10:59, 17 June 2025 (UTC)

::This is definitely true, and we shouldn't be getting pissy every time the WMF does anything outside of "make enwiki better". Is "AI" (read: ChatGPT and LLMs) bad? 100% without a doubt. But if it's used on a platform like Roblox, then I really don't care. Roblox is a cesspool anyway. Trying to connect with Gen Alpha and introduce them to Wikipedia (preferably as editors) is a good goal and is something that the WMF should be working on. JackFromWisconsin (talk | contribs) 04:02, 20 June 2025 (UTC)

:Hi @Weirdguyz, member of the Future Audiences team here! TBC, the cover image for the Roblox game was created by the lovely humans in our Brand Studio team, not AI. The game itself also doesn't involve any generative AI imagery. I can understand the confusion, though, given the (for lack of a better word) "robo-blocky" nature of the Roblox aesthetic. Maryana Pinchuk (WMF) (talk) 14:15, 17 June 2025 (UTC)

::@MPinchuk (WMF) any secrets you can let us in on, is the cover character one of the team? CMD (talk) 14:25, 17 June 2025 (UTC)

:::@Chipmunkdavis Ha, I don't think it's meant to look like any specific person... just a cool Roblox guy {{smiley}} Maryana Pinchuk (WMF) (talk) 14:37, 17 June 2025 (UTC)

::{{ping|MPinchuk (WMF)}} Forgive me for being cynical, but I have both seen too many AI-generated images and played too much Roblox myself (I am quite familiar with the visual style of Roblox, going back over a decade...) to truly believe, without any evidence, that generative AI didn't play even a small part in the creation of the cover image. Just to illustrate what concerns me most, the design on the bottom of the shoe that can be seen exhibits many of the hallmarks of generative AI images, where it knows vaguely what it is meant to look like but can't quite get the details correct, so it ends up with lines and structures that don't really go anywhere or don't match correctly. If any insight into the design process for the image could be shown that would be wonderful, but I completely understand that there are limitations to what can be made public. Weirdguyz (talk) 15:05, 17 June 2025 (UTC)

:::@Weirdguyz My apologies, I misunderstood your original question (I thought your concern was about whether we used AI in the design of the game itself, which we didn't) and I didn't address what the process looked like for making the Roblox marketing image specifically. For us, the team responsible for making the Roblox game, the process was: we needed a cover image to use in Roblox and in the social media posts about it that would convey the feel of the game and match the Roblox aesthetic, so we asked our Brand team (who are professional designers who make other marketing materials for our social channels) to help us. They provided a few different ideas, we workshopped which ones we liked and then chose the final design concept together, which Brand then refined and finalized. Honestly, I don't have insight into exactly what tools were used to create or refine the image, and the designer is currently out of office, but it met our needs of conveying gameplay, looking Roblox-y, and being the right size & resolution for social channels.

:::

:::(Also: cool to hear that you're an avid Roblox player! Have you had a chance to play our game? Any thoughts/feedback? We're currently working on some refinements to help with stickiness and learning, i.e., adding some knowledge quizzes to the gameplay – would love to also get your feedback on those changes once those are out in a few weeks.) Maryana Pinchuk (WMF) (talk) 18:22, 18 June 2025 (UTC)

::::@MPinchuk (WMF) Very confusing. Why does the WMF think the community wants it to develop Roblox stuff? If that isn't the case, why does the WMF think Roblox players, who are between 7 and 13 years old, are a good demographic to target? Why in this way? How much money and time did this cost? How many billable hours? How will the return on investment be calculated? This seems like a massive waste of time for unclear (no) benefit. And Roblox is truly evil. https://www.youtube.com/watch?v=_gXlauRB1EQ Polygnotus (talk) 16:09, 19 June 2025 (UTC)

:::::7-13 year old kids today will one day become 16-17+ year olds who might edit Wikipedia (or at least have a positive association with Wikipedia from an early age). Even if the community did not explicitly ask for a Roblox game, there is implicit consensus on allowing the WMF to experiment and try to attract contributors to the project. I assume this is being thought of as a gateway drug instead of a thing unto itself. Sohom (talk) 19:11, 19 June 2025 (UTC)

::::::Also, this is an especially important thing to do since more and more companies keep summarizing our info and conveniently forget to link to us, decreasing the ability to convert folks into editors. Sohom (talk) 19:14, 19 June 2025 (UTC)

:::::::{{ping|Sohom_Datta}} {{tq|7-13 year old kids today will one day become 16-17+ year olds who might edit Wikipedia}} Agreed. But then it would possibly be more efficient (and cheaper) to reach out to them when they are 16-17+? {{tq|Even if the community did not explicitly ask for a Roblox game, there is implicit consensus on allowing the WMF to experiment}} Maybe. But when I experiment I don't just randomly smash rocks together to see what happens; I have a hypothesis that I want to prove or disprove to build on underlying knowledge I have acquired over the years. And since I don't start every experiment at zero, it is reasonable to ask things like: "What were your assumptions? Why? How will you determine if this was a success?". {{tq|I assume this is being thought of as a gateway drug}} A debunked theory is perhaps not the greatest comparison, but I get what you mean.

:::::::{{tq|Also, this is an especially important thing to do since more and more companies keep summarizing our info and conveniently forget to link to us, decreasing the ability to convert folks into editors.}} That genie is out of the bottle. It would be weird to suddenly start demanding attribution. And using an LLM effectively "whitewashes" the use of licensed and copyrighted material. Polygnotus (talk) 21:38, 19 June 2025 (UTC)

::::::::If you know of an effective way to reach 16-17yos, please suggest it as I'm pretty sure anything slightly likely to work will have a good chance of being tried out. I believe the team tracked retention after the first play and stickiness of repeat players as metrics for the initial deployment, although I can't find the report. CMD (talk) 02:48, 20 June 2025 (UTC)

:::::::::@Chipmunkdavis I think that the entire assumption that the kind of people we want are unaware of Wikipedia's existence by the time they have reached 18 is flawed (in the western world). Kinda difficult to keep a "compendium of all human knowledge" a secret from nerds, especially when Wikipedia is usually the top result for any search query on Google.

:::::::::{{tq|If you know of an effective way to reach 16-17yos, please suggest it}} Wikipedia contributors are a very specific kind of people. Marketing companies exist who specialize in this kinda thing.

:::::::::I think the main problem is not brand recognition, but the fact that Wikipedia is shit at converting readers to editors and our tendency to bite even good-faith newbies. The whole set of uw- templates has depersonalized communication and has made human connection even more infrequent. Another problem is that we encourage children who are new to Wikipedia to do vandal fighting, which results in them reverting a lot of good-faith contributions. Polygnotus (talk) 03:16, 20 June 2025 (UTC)

::::::::::I would guess the assumption is more that finding a way to better show the backend (in this case, the web between articles) might make people more interested. This is not a new discussion, and no-one has really figured out a 'solution'. New ideas are much more helpful than saying a current one might not be maximally effective. CMD (talk) 03:20, 20 June 2025 (UTC)

:::::::::::@Chipmunkdavis {{tq|New ideas are much more helpful than saying a current one might not be maximally effective.}} That makes little sense. There are many situations in which an old well-known solution to a problem is superior to whatever new stuff you can come up with. Dismissing all ideas that aren't "new" is unhelpful at best.

:::::::::::Saying that a new bad idea is a bad idea is helpful because people can stop wasting time and money and ideally it would prevent us from making the same or similar mistakes over and over again. And if you read carefully you'll see I also explained why the idea is bad and provided both superior alternatives and advice that could be used to ensure that future plans would be better. Polygnotus (talk) 03:37, 20 June 2025 (UTC)

::::::::::::I did not find your explanations convincing, especially as part of it seemed to rely on there not being any hypothesis. The advice going forward was also quite generic. We don't have an "old well-known solution" here. Nobody has dismissed all ideas that aren't "new". If I was to start somewhere my thinking is that a good part of the issue may be "known", and that the WMF should be doing way more regarding monitoring and evaluating affiliate actions to figure out what is "known". CMD (talk) 03:44, 20 June 2025 (UTC)

:::::::::::::@Chipmunkdavis {{tq|I did not find your explanations convincing}} I can explain stuff, but I can't understand it for you. {{tq|We don't have an "old well-known solution" here.}} Yes we do, and I mentioned it already. {{tq|Nobody has dismissed all ideas that aren't "new".}} See straw man. Polygnotus (talk) 03:48, 20 June 2025 (UTC)

::::::::::::::It's not a strawman, it's a direct reply to your statement immediately above. CMD (talk) 03:50, 20 June 2025 (UTC)

:::::::::::::::@Chipmunkdavis Compare {{tq|Nobody has dismissed all ideas that aren't "new"}} with my comment. Polygnotus (talk) 03:52, 20 June 2025 (UTC)

::::::::::::::::Is the underlying assumption here that I did not do that when actually writing the reply? "Dismissing all ideas that aren't "new" is unhelpful"->"Nobody has dismissed all ideas that aren't "new"" is almost as close as can be. If the discussion is going to be claims that a direct reply is a strawman coupled with swipes about understanding, then it is not going to lead to any productive outcome. CMD (talk) 03:58, 20 June 2025 (UTC)

:::::::::::::::::@Chipmunkdavis I do not know what you do or don't do. I do not work at one of those 3 letter agencies and therefore all I know about you is what you have written on your userpage, which is not much. Perhaps we both like chipmunks? You seem to interpret the sentence {{Tq|Dismissing all ideas that aren't "new" is unhelpful at best.}} as "You are dismissing all ideas that aren't "new" which is unhelpful at best." but that was not the intended meaning. If it was I would've written that. In my experience most goodfaith people who disagree with me either misunderstand me or do not have (access to) the same information. Especially in cases like this, where it is unlikely that goodfaith people have wildly diverging opinions. Polygnotus (talk) 04:04, 20 June 2025 (UTC)

::::::::::::::::::I interpreted "Dismissing all ideas that aren't "new" is unhelpful at best" as being related to something written prior in the conversation, but not necessarily by me ("You"). My reply "Nobody" was a general reference to all participants of the conversation, not just my comments. I don't think the Roblox experiment will be successful either, but it is relatively small, and does not impede editing or the direct experience of Wikipedia. If I had a better idea that fits the mandate of the Future Audiences team, I would raise it with them. Alas, I do not and right now only have my critical comments about the inherent conflict in their core findings and my related former comment about how their risk assessments have a substantial gap. I don't think either of these would impact the Roblox experiment anyway, and am quite happy for WMF to run relatively safe experiments even if they fail. (My shameful secret is that I have no unique affinity for chipmunks, as inherently valuable as they are, I'm simply stuck in decades of path dependency.) CMD (talk) 04:13, 20 June 2025 (UTC)

:::::::::::::::::::@Chipmunkdavis Are you familiar with Minecraft's [https://minecraft-archive.fandom.com/wiki/Redstone?file=Logic_Gates.png redstone]? The kinda kids who built computers out of them are the kind we want. But they'll probably already know of Wikipedia. I strongly believe that focusing on user retention makes more sense than focusing on user acquisition at this point.

:::::::::::::::::::Cheek pouch says: {{tq|The cheek pouches of chipmunks can reach the size of their body when full.}} Polygnotus (talk) 04:19, 20 June 2025 (UTC)

::::::::::::::::::::I hope we can establish the casual redstoners who just built a door as well as the ones who run Pokemon in Minecraft. I find that cheek pouch statement hard to believe. CMD (talk) 05:23, 20 June 2025 (UTC)

:::::::::::::::::::::@Chipmunkdavis Same. Cheek_pouch#Chipmunks lists 3 refs. Polygnotus (talk) 05:55, 20 June 2025 (UTC)

::::::::In marketing speak, there are brand awareness campaigns and remarketing campaigns. This one's primary utility is to maintain brand awareness, which to many people would seem inefficient, as it is typically more spray (for awareness) than pray (for returns). As a brand awareness campaign, it is a long shot, but if, a few years down the road, some new editors go 'yeah, Roblox! There was that Wikipedia game. I played that.', we know it has done its work. For the efficiency that you sought, it would usually be remarketing campaigns, where the marketers know what audience to tap and what marketing message to design for (i.e. remember the Wikipedia game in Roblox? Here's how you can contribute to Wikipedia.). There is no guarantee that the older kids would know Wikipedia in the same homogeneous manner as they would after brand awareness campaigns. – robertsky (talk) 06:38, 20 June 2025 (UTC)

:It's so sad to see the reputation of Wikipedia, built over so many years by volunteers working every day, squandered by the WMF's bad decisions without even consulting the community. Ita140188 (talk) 12:27, 18 June 2025 (UTC)

::{{color|blue|citation needed}} Donald Albury 13:22, 18 June 2025 (UTC)

:::yeah, it's not like Wikipedia has a great reputation. Polygnotus (talk) 16:10, 19 June 2025 (UTC)

::Would love to see proof of our reputation being tarnished in any way by this. This Roblox game has literally nothing to do with the editing process over here, yet people are treating it like a thermonuclear bomb. It's a silly kids' game. That's it. It's not that deep. JackFromWisconsin (talk | contribs) 04:07, 20 June 2025 (UTC)

:{{re|MPinchuk (WMF)}} Great job! Any chance the game will be open-source?

:Roblox has a lot of young people who also enjoy learning to code. Since the WMF isn't making the game for profit, you might end up with a competitive advantage by allowing the same people who like the game to contribute to it.

:For the record, I do not care if generative AI is used to create cover art for the game. Chess (talk) (please mention me on reply) 22:26, 19 June 2025 (UTC)

::Chess: Thanks for asking! Everything we produce is open source. Please see [https://gitlab.wikimedia.org/repos/future-audiences/roblox this GitLab repo]. Johan (WMF) (talk) 12:05, 23 June 2025 (UTC)

Wikimedia Foundation Bulletin 2025 Issue 11

MediaWiki message delivery 19:39, 17 June 2025 (UTC)

RfC on new temporary account IP viewer (TAIV) user right

There is an RfC on the new temporary account IP viewer (TAIV) user right at Wikipedia:Requests for comment/Temporary account IP-viewer. voorts (talk/contributions) 17:03, 21 June 2025 (UTC)