User talk:Newyorkbrad#Word usage
{{User:MiszaBot/config
|algo = old(7d)
|archive = User talk:Newyorkbrad/Archive/%(year)d/%(monthnameshort)s
}}
{{archive box|Index of archives|age=7|bot=MiszaBot III}}
Templates on my articles
- Hello {{ping|Newyorkbrad}} Thank you for removing the template on Paul-Gilbert Langevin. Isn't it the same issue with the following pages: Eliane Montel, Luce Langevin, Roger Dajoz? I'll let you think about it. Regards. Paul-Eric Langevin (talk) 20:45, 18 February 2025 (UTC)
- {{ping|Paul-Eric Langevin}} Thanks for your post. I am on vacation this week, and trying not to spend too much time online, but will take a look at this later in the month when I get home. Regards, Newyorkbrad (talk) 07:23, 19 February 2025 (UTC)
- Hello {{ping|Newyorkbrad}}, as far as I am concerned the same problem applies to the article Langevin family. Thanks in advance. Paul-Eric Langevin (talk) 22:25, 28 February 2025 (UTC)
- {{ping|Paul-Eric Langevin}} I've removed the tags on the three articles about the individuals (although the third could use more sources). I will take a look next time at the family article. Regards, Newyorkbrad (talk) 05:09, 13 March 2025 (UTC)
- Many thanks! Paul-Eric Langevin (talk) 10:27, 13 March 2025 (UTC)
- Hello {{ping|Newyorkbrad}}, what do you think about Langevin family? Have a nice evening. Paul-Eric Langevin (talk) 21:00, 12 April 2025 (UTC)
Thu March 27: Editing the Open Data Garden @ Prime Produce
File:Editing the Open Date Garden Wikimedia NYC Farming Concrete.jpg You are invited to join the Wikimedia NYC community for an urban gardening-themed edit-a-thon at Prime Produce in Hell's Kitchen, Manhattan. This event will be hosted in collaboration with Farming Concrete in celebration of Open Data Week in New York City. All are welcome, new and experienced! All attendees are subject to Wikimedia NYC's Code of Conduct and Photography Policy. Meeting info:
(You can subscribe/unsubscribe from future notifications for NYC-area events by adding or removing your name from this list.)
--Wikimedia New York City Team via MediaWiki message delivery (talk) 03:15, 22 March 2025 (UTC)
Question for discussion
Many pages on Wikipedia, including drafts, deletion discussions, and the like, are not supposed to be indexed by search engines. My impression is that we frequently justify maintaining borderline material on the site, such as drafts and other discussions about living persons that would not be fit for mainspace, by pointing out that "it won't cause any harm because it won't show up in Google searches and the like." Indeed, in the past I have advocated increased usage of "NOINDEX" designations for this specific reason.
However, I have read in several places (a Wikipediocracy thread is the most recent) that AI bots that now routinely scrape Wikipedia often disregard robots.txt and similar designations and scrape every page. The contents of those pages, including those regarded as "not ready for prime time," are then impounded into AI databases and are at risk of being regurgitated as fact in later queries to the AI programs.
If this is the case, what implications does this have for our policies and operations?
I am interested in a broad discussion of this issue, but given my non-technical background, am asking here first to test whether my basic understandings/assumptions are correct, as well as whether this issue has been (or is being) discussed already. I would appreciate hearing from anyone with information on this. Thank you, Newyorkbrad (talk) 15:15, 2 April 2025 (UTC)
:I am a professional in the AI space, read books on this, and go to conferences. I am on the social side of AI and not the technical side.
:I have a general recommendation for comprehending AI trends, which I think has made accurate and actionable predictions to this point. When predicting trends in a well-funded tech direction, do not worry about the AI having bias, making errors, misunderstanding anything, or having shortcomings in the future which are visible as problems today. Instead, your worry should be about a near future when the AI does everything with transcendent perfection. Applied to this case, the concern should be of the AI understanding the exact extent of how NOINDEX pages are different from other Wikimedia content, as well as understanding all other available data. Bluerasberry (talk) 15:40, 2 April 2025 (UTC)
::Thanks for the reply. I think the concern is that some AI programs and programmers may not care about how or why NOINDEX or robots.txt pages are different from other Wikipedia content, but may just be treating pages as data indiscriminately. Newyorkbrad (talk) 15:52, 2 April 2025 (UTC)
:::Hi! Someone alerted me to this because I was puzzling over why we have, essentially, two things that do the same thing (draftspace and userspace), but one seems to waste more editor-hours on busywork. They suggested I put my thoughts down, which I sketched out sorta here User:Tduk/Draftspace. Anyway in a nutshell, if this might be a motivating factor to reform draftspace and actually make it useful to new editors, I’d support it! In response to this direct question, I’d wonder what the impact is on AI of making user/draftspace articles vs just making public web pages. Tduk (talk) 19:49, 2 April 2025 (UTC)
::{{tq|When predicting trends in a well-funded tech direction, do not worry about the AI having bias, making errors, misunderstanding anything, or having shortcomings in the future which are visible as problems today.}} Whether or not we are concerned that LLMs will continue to have issues with bias/hallucination in the future, given that we know that current iterations of the technology {{em|do}} have those issues (and that many people seem to give text generated by LLMs significantly more trust than I think the evidence suggests that it currently deserves) we should be concerned about the effects that LLM use of draftspace etc. is having {{em|now}}. When Google was putting AI-generated results at the top of all their searches I regularly found that it gave information which I knew for a fact was wrong.
::Personally I think that current evidence does not suggest that anything approaching an AI which does {{tq|everything with transcendent perfection}} is coming in the near future, but even if it is, we still have an indeterminate amount of time now where the nearest thing we have to general AI is LLMs, which are not close to transcendent perfection. Caeciliusinhorto-public (talk) 08:32, 3 April 2025 (UTC)
:"[W]hat implications does this have for our policies and operations?" I'd have to suggest that perhaps one of the most obvious ones is that, both from an ethical perspective and possibly a legal one, Wikipedia needs to consider enforcing the 'other pages, including talk pages' provisions within existing WP:BLP policy more strictly. If the bullshit-bots are scraping such content and potentially regurgitating it, and we are aware of the fact, we can't just pretend it isn't happening. AndyTheGrump (talk) 21:26, 2 April 2025 (UTC)
::This makes sense - I'd worry also if people started to become aware of when AI bots did their scraping - they could sneak in vandalism right before the scrape, so that even if it was reverted pretty quickly, it may get in. It seems like in all the internet wars so far involving scraping vs fake content, the fake content has been winning. Tduk (talk) 21:34, 2 April 2025 (UTC)
:It's a shame that [https://diff.wikimedia.org/2025/04/01/how-crawlers-impact-the-operations-of-the-wikimedia-projects/ the original report] came in a Diff post and had a focus on Infrastructure, as the implications probably need supplementary information on crawler workload, to draw out implications on, for example:
:* Draft-space content being consumed for regurgitation;
:* Talk-page and User-Talk-page content being consumed, whether to better simulate human language or to regurgitate discussion points/claims as facts.
:Maybe requests for non-mainspace content could be throttled to one per 10s - wouldn't impact a real user but would hinder a crawler?
:And outside en.wiki, this changing workload may increase the case for stricter validation of content being added to small-language wikis lacking many-eyes oversight, such as the cases of the Scots wiki and the Greenlandic wiki, to avoid unidentified poor translation-tool text being consumed by crawlers and then becoming part of the language. AllyD (talk) 08:37, 5 April 2025 (UTC)
::I'd strongly challenge {{tq|Maybe requests for non-mainspace content could be throttled to one per 10s - wouldn't impact a real user}}—skimming between 100+ near-identical shots on Commons of the same building to try to decide which one best illustrates a particular architectural element (for example) isn't at all an unusual situation.
::(Personally, if pressed I'd guess this is an issue which will resolve itself fairly soon. The AI bubble is almost certainly reaching its bursting point, and whichever two or three systems survive the bust will presumably soon have completed their bulk downloads and will just periodically check for new additions from then on. We went through the same thing 20 years ago when every AltaVista, HotBot and AskJeeves was constantly downloading text dumps, and we survived without any obvious problems. For reasons NYB knows well, I have a high degree of scepticism whenever the WMF comes out with any claimed problem to which the answer is "we need more money".)
::My broader thoughts on the AI scraping issue are here, to avoid cluttering NYB's talkpage with what's essentially a lengthy rambling aside. ‑ Iridescent 16:35, 6 April 2025 (UTC)
Help report an IP for vandalism on a NYC page
Hello, can you please review an IP that I reported that keeps vandalizing the Bensonhurst Wikipedia page with racist vandalism? I reported the IP for vandalizing, and the person keeps changing their IP every few months. The IP is 182.0.205.13
Thank you. Bklynculture (talk) 01:08, 5 April 2025 (UTC)
WikiNYC April 11: Foundation and Friends' Free Culture Friday
File:Wikimedia New York City logo.svg You are invited to Foundation and Friends' Free Culture Friday at Prime Produce on Friday, April 11. This event will feature a reception with Wikimedia Foundation staff in the afternoon, followed by a more informal salon, hackathon, and game night, utilizing Prime Produce's vast collection of board games. This replaces WikiWednesday Salon this month. No experience of anything at all is required. All are welcome!
All attendees at Wikimedia NYC events are subject to the Wikimedia NYC Code of Conduct and Photography Policy.
(You can subscribe/unsubscribe from future notifications for NYC-area events by adding or removing your name from this list.)
--Wikimedia New York City Team via MediaWiki message delivery (talk) 13:30, 8 April 2025 (UTC)