Wikipedia:Case against LLM-generated articles
This is a humorous essay. It contains humorous advice or opinions of one or more Wikipedia contributors and isn't meant to be taken seriously. This is not an encyclopedia article or one of Wikipedia's policies or guidelines and may not represent community consensus.
This essay, entitled "Keep Wikipedia Human: The Case Against LLM-Generated Articles," was written by an AI and was pasted into this page by a Wikipedia editor and then edited for review before submission.
In the age of large language models (LLMs) like ChatGPT, it might be tempting to use these tools to rapidly generate Wikipedia articles. Their fluent prose and speedy output present an alluring shortcut. However, the Wikipedia editing community is overwhelmingly opposed to letting LLMs write our content. The reason is simple: such AI-generated text threatens the core principles that make Wikipedia reliable. We have seen that asking an LLM to “write a Wikipedia article” can produce outright fabrication, complete with fictitious references. From factual inaccuracies that readers might never catch to citations that lead nowhere, LLM-created content poses risks Wikipedia cannot afford. This essay outlines the community’s firm concerns – factual inaccuracy and hallucinations, unverifiable sourcing and fake citations, damage to the collaborative editing model, and long-term erosion of content integrity – and argues that Wikipedia must remain a human-driven project grounded in verifiable truth.
Risks of hallucinated content
One of the most alarming issues with AI-generated text is the tendency of LLMs to “hallucinate” – to fabricate information that sounds plausible but is entirely false. By design, an LLM predicts words based on patterns, not on an understanding of facts.[1] This means an LLM will sometimes produce content that reads convincingly but has no basis in reality. It might even invent details when faced with a prompt on an obscure topic or a nonsensical request. For example, in one trial an LLM confidently described the habits of a completely fictitious animal as if it were real, blending real-sounding facts into a coherent but false narrative.[2] The entire account was false, yet it was delivered authoritatively – a clear illustration of how AI can offer convincing lies with a confident tone.[3]
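To make the mechanism concrete, here is a deliberately tiny sketch of prediction without understanding – a toy frequency-based next-word model, nothing like a production LLM in scale, with an invented corpus and invented dates used purely for illustration. The most common continuation wins even when it is wrong, because nothing in the training objective rewards truth:

```python
# Toy next-word predictor (illustration only; real LLMs are neural networks,
# but the training objective is the same: predict the next token, not verify it).
# The corpus, the fortress, and both dates are made up for this example.
from collections import Counter, defaultdict

corpus = (
    "the fortress was built in 1683 . "   # common but (hypothetically) wrong
    "the fortress was built in 1683 . "
    "the fortress was built in 1745 . "   # rare but (hypothetically) correct
).split()

bigrams = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1

def continue_text(prompt: str, length: int = 2) -> str:
    """Greedily append the most frequent continuation, with no notion of truth."""
    words = prompt.split()
    for _ in range(length):
        candidates = bigrams.get(words[-1])
        if not candidates:
            break
        words.append(candidates.most_common(1)[0][0])
    return " ".join(words)

print(continue_text("the fortress was built in"))
# -> "the fortress was built in 1683 ." even if 1745 were the true date.
```

The toy model is perfectly fluent about whichever date it has seen most often; fluency and accuracy are simply different properties, which is exactly why confident-sounding output proves nothing.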
Such hallucinated content isn’t just a theoretical edge case; it’s a practical nightmare for Wikipedia. Even if 90% of an AI-generated article is correct and 10% is false, that 10% is a huge problem in an encyclopedia. Wikipedia’s reputation hinges on accuracy. One subtly false claim or distorted quote can misinform readers and damage trust. LLMs have no sense of responsibility or context – they’ll cheerfully generate an “article” about the health benefits of eating crushed glass if asked. They might also incorporate hidden biases or defamatory assertions without realizing it. In short, hallucinations from LLMs introduce outright falsehoods into what should be a repository of verified knowledge. Wikipedia editors view this as an unacceptable risk. The community has long held that “verifiability, not truth” (in other words, verifiable evidence over unsupported claims) is non-negotiable. Content that “sounds true” is worthless to Wikipedia if it can’t be verified or if it’s simply concocted by an algorithm.
The verifiability crisis
Wikipedia’s bedrock principle of verifiability requires that material be backed by reliable, published sources that readers can check. Every fact on Wikipedia should be attributable; if it isn’t attributable, it doesn’t belong. Here lies a fundamental conflict with LLM-generated content: LLMs often do not follow this principle. They might output paragraphs with no citations at all, or cite unreliable sources, or even cite sources that don’t exist.[4] Indeed, in many observed cases, LLMs have produced academic-sounding references that are completely fabricated.[5] A human editor might spend hours searching for a cited journal article or book, only to discover it was never real. Wikipedia’s verifiability policy is clear that readers “must be able to check that any of the information... is not just made up” – but with AI hallucinations, the information is made up, and no real source exists to support it.
This problem has already manifested in experiments. Wikipedia editors who have tested LLMs report that these tools tend to sprinkle in references that look plausible but are bogus. In one analysis, researchers found an AI-written draft of a historical article included seven references – and five of those were fabricated.[5] The citations appeared credible, naming reputable journals and authors, but on closer inspection they led nowhere. Such fake citations are more insidious than an obvious “citation needed” tag; they give a false impression of credibility, undermining the reader’s ability to trust Wikipedia. Even outside Wikipedia, hallucinated citations have caused scandals – for instance, legal briefs and academic papers generated by AI have cited cases and articles that don’t exist.[5] For Wikipedia, this is a verifiability crisis: content backed by phantom sources is worse than unsourced content, because it actively misleads. It violates the core policy that all challenged material must be supported by inline citations to reliable sources. If an AI tool cannot guarantee genuine, checkable sources, it has no place writing our articles. The community’s stance is that every sentence added to Wikipedia needs human verification against real, reliable sources – a standard an LLM cannot meet on its own.[6]
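In practice, editors triaging a suspect draft often start by asking whether the cited works exist at all. The sketch below is one hedged way to automate that first pass, assuming the citations carry DOIs: it queries the public Crossref API and flags identifiers that do not resolve. The script, its User-Agent string, and the example DOIs are illustrative assumptions, not any official Wikipedia tooling, and even a DOI that resolves still has to be read by a human to confirm it actually supports the claim.

```python
# A rough triage sketch (illustrative, not an official Wikipedia tool): check
# whether the DOIs cited in a draft actually resolve via the public Crossref API.
# A 404 strongly suggests a hallucinated reference; a 200 only means the work
# exists -- a human still has to read it and confirm it supports the claim.
import requests

def doi_exists(doi: str) -> bool:
    """Return True if Crossref has a record for this DOI."""
    resp = requests.get(
        f"https://api.crossref.org/works/{doi}",
        headers={"User-Agent": "citation-triage-sketch/0.1 (mailto:editor@example.org)"},
        timeout=10,
    )
    return resp.status_code == 200

# Example identifiers (the second is the kind of thing an LLM invents):
suspect_dois = [
    "10.1038/s41586-020-2649-2",    # a real DOI (a well-known 2020 Nature paper)
    "10.9999/totally.made.up.2022", # does not exist
]

for doi in suspect_dois:
    verdict = "resolves" if doi_exists(doi) else "NOT FOUND - check by hand"
    print(f"{doi}: {verdict}")
```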
Undermining the editorial ecosystem
Beyond the immediate content issues, using LLMs to generate articles poses a serious threat to Wikipedia’s collaborative editing ecosystem. The site thrives on a community of volunteers who write, fact-check, and refine articles through consensus and debate. Introducing masses of machine-generated text upends this model in several ways.
First, it creates an overwhelming maintenance burden on human editors. Wikipedia’s volunteers have limited time and energy. If they must sift through AI-produced drafts line-by-line to weed out errors, or chase down phantom references, it turns editing into a chore of cleanup rather than a process of building knowledge. As one community essay notes, Wikipedia operates on an informal social contract: editors put significant effort into their contributions so that others “do not need to clean up after them”. LLM-generated content shatters that contract. Poorly vetted AI text increases the maintenance burden on other volunteers, effectively asking humans to become janitors for machine output. This is not why people volunteer for Wikipedia, and it’s not sustainable. Indeed, editors have reported feeling “flooded non-stop with horrendous drafts” created by AI, full of mistakes that require significant time to fix. One experienced editor described many AI-written drafts as containing “lies and fake references” that demand painstaking correction.[7] This tsunami of low-quality content isn’t just annoying – it threatens to overwhelm the quality control processes that keep Wikipedia reliable.
Second, over-reliance on LLM content could erode the culture of collaboration. Wikipedia is built by people discussing and reasoning together. An AI cannot participate in talk page debates, cannot understand nuance, and cannot exercise judgment about neutrality or the due weight of a topic. If editors start deferring to AI-generated text, the role of human deliberation diminishes. There is also a risk of mistrust and social disruption: editors may grow suspicious of each other’s contributions (“Was this written by a bot?”), making good-faith collaboration harder. Already, the community has felt the need to develop guides for spotting AI-generated writing and even to launch a dedicated WikiProject AI Cleanup to coordinate the identification and removal of AI content.[7] While these efforts demonstrate commendable vigilance, they also reveal a worrying shift – volunteer time that used to go towards improving content is now spent on policing AI-generated additions. Seasoned editors now have to train newcomers on how to recognize the telltale signs of “AI slop” (unnatural phrasing, excessive formality, oddly formatted citations, etc.)[8] rather than on how to research and write well.[9][7] This is an opportunity cost: every hour spent fighting AI-generated nonsense is an hour not spent writing a featured article or reviewing actual sourced content.
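To see how crude such screening necessarily is, here is a toy sketch – emphatically not the community's actual detection guide, and not a reliable detector – that merely flags a handful of stock phrases which often survive in unedited LLM output. The phrase list is an assumption chosen for illustration, and false positives abound, so a hit is nothing more than a cue for a human to read the draft closely.

```python
# A deliberately naive first-pass screen (a toy, not the community's actual
# detection guide, and not a reliable detector): flag a few stock phrases that
# often survive in unedited LLM output. A hit only means "a human should look".
import re

TELL_PHRASES = [
    "as an AI language model",
    "it is important to note that",
    "in conclusion",
    "delve into",
    "rich cultural heritage",
]

def suspicious_phrases(text: str) -> list[str]:
    """Return any tell-tale phrases found in the draft (case-insensitive)."""
    return [
        phrase for phrase in TELL_PHRASES
        if re.search(rf"\b{re.escape(phrase)}\b", text, flags=re.IGNORECASE)
    ]

draft = "In conclusion, the fortress boasts a rich cultural heritage dating to 1683."
print(suspicious_phrases(draft))
# -> ['in conclusion', 'rich cultural heritage']
```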
The human cost of this fight is real. Longtime contributors report that patrolling for AI fakery has made their work less enjoyable and more exhausting. What used to be a rewarding hobby of knowledge-sharing is at risk of becoming, in the words of some, an “increasingly investigative and adversarial process.”[9] The community fears volunteer burnout as editors struggle to keep up with a flood of machine-produced material. If contributing ceases to be fun or fulfilling, many will simply walk away. Many already have.[10] Wikipedia has always depended on the passion and dedication of its volunteers – if that passion is sapped by endless cleanup and suspicion, the entire project’s health is at stake.
Finally, allowing AI-written articles could deter the development of human expertise within the community.[3] Traditionally, new editors learn by researching sources, writing text, and getting feedback from peers. If instead a newbie simply clicks a button to have an AI “write” an article, they bypass the crucial learning process and likely introduce errors. When their contributions are then deleted or heavily corrected, they may become discouraged and leave, rather than growing into proficient Wikipedians. In this sense, LLMs present a false promise of easy contribution that can actually alienate good-faith newcomers when the community (rightly) rejects the subpar AI content. In the long run, we risk losing a generation of editors if we normalize automating the very tasks that build editorial skills and community bonds. LLMs are tools, not replacements for human judgment,[1] and relying on them too heavily in article-writing undermines the very human collaboration that built Wikipedia. The editing community is strongly united on this point: Wikipedia should not become a dumping ground for unchecked machine output, because doing so would hollow out the values and social framework that sustain the project.
Case studies and hypothetical failures
The abstract risks of AI-generated content become starkly clear when looking at concrete examples. In recent years, Wikipedia editors have uncovered multiple cases of AI-written content slipping into the site – with problematic results. Below are a few illustrative episodes that highlight how LLM-generated text can fail disastrously:
- The Imaginary Fortress: A 2,000-word article about an Ottoman fortress that never existed appeared on Wikipedia, complete with elaborate historical details and citations.[11][12] They were almost as elaborate as, perhaps, this essay. It turned out to be an AI-generated hoax. The references cited looked scholarly but were fraudulent, backing a narrative that had been wholly fabricated by a language model. It took experienced editors to realize the entire topic was made up. This hoax article, lingering unchecked, could have misled readers and even been cited elsewhere as fact. It was only caught when the WikiProject AI Cleanup team noticed the writing’s unnatural consistency and the lack of authentic sources.
- The Beetle Mix-up: In another instance, an editor added a seemingly benign paragraph about a species of beetle, Estola albosignata, and it came with a couple of inline citations – nothing unusual at first glance. Yet one citation pointed to an unrelated article about a different topic entirely.[13][14] Upon investigation, it became clear the paragraph was AI-generated misinformation. The AI had likely found a source about something else and erroneously attached it to the beetle article. Here the content was subtly wrong (attributes of the beetle were incorrect) and the source was mismatched – a subtler failure than a glaring hoax, but potentially even harder to catch. It’s a reminder that AI output can go off-track in sneaky ways, and without diligent human fact-checking, such errors might persist for months.
(Image caption: A caricature of your average bad actor, emphasis on the "bad". Note the evil grin, mustache, and villainous arched eyebrow.)
- Promotional and Biased Drafts: Studies have indicated that some AI-generated articles have been used to push agendas. For example, a 2024 analysis by Princeton University found that about 5% of new articles in a sample month had signs of AI creation, some of them written to promote businesses or other interests.[15][16] In community discussions, editors noted how these drafts often read like advertorials or partisan spin – text that superficially met Wikipedia’s guidelines in form, but not in spirit. One hypothetical scenario we fear is an LLM churning out dozens of boosterish articles for a company or ideology, each mixing facts with subtle promotional language. Volunteer reviewers then have to identify and purge these, a tedious whack-a-mole game to preserve neutrality. The community has indeed seen cases of banned users employing AI to wage edit wars or insert bias (e.g. an effort to skew articles on national history). With AI, a single bad actor could generate hundreds of such biased entries in little time, far outpacing the human capacity to respond.
- The Bogus Biography: Imagine a scenario (entirely plausible) where an AI-generated draft is created for a relatively obscure public figure – say a local politician or an academic. The article might look well-structured and even include references for its claims. However, upon closer examination, it might list awards the person never received or quote statements they never made. The citations given might appear reputable but be completely nonexistent – for instance, a fake “Smithsonian Magazine interview, 2022” and a made-up “Journal of Urban Politics profile, 2021”. In a rush of good faith, another editor might initially accept the article, assuming the citations check out.[10] This fictional biography could stand for days or weeks, spreading false information about a real person, which is a huge no-no. Only later, when someone tries to verify a claim, would the house of cards collapse. This hypothetical illustrates the nightmare Wikipedia wants to avoid at all costs: a blurring of fact and fiction under the veneer of a properly sourced article.
These cases and thought-experiments underscore why the community has pushed back so hard against AI-generated content. The “fortress” hoax was deleted under speedy deletion criteria as soon as it was identified, and it prompted calls for tighter controls. In fact, by 2025 English Wikipedia updated its deletion policy to allow immediate removal of LLM-generated pages that lack human review (criterion G15). Similarly, the detection of AI-drafted biased articles has led to more rigorous new article patrols and scrutiny of sources. When the Wikimedia Foundation trialed an AI tool called "Simple Article Summaries" that would do exactly what its name suggests, the community’s reaction was swift and negative – editors called it a “ghastly idea” that would erode trust, given AI’s tendency to err and hallucinate.[7][17] One Ars Technica report quoted an editor’s visceral reaction: “Yuck,” capturing the general sentiment.[18] The trial was paused almost immediately amid this backlash.[7][19] The message from these examples is crystal clear: when it comes to creating content, Wikipedia’s volunteer editors have learned through hard experience that automation can fail spectacularly. Each failure, whether real or hypothetical, reinforces the lesson that without human-level understanding and rigorous source-checking, Wikipedia articles generated by AI are ticking time bombs for our reputation.
Conclusion: Preserving Wikipedia's integrity
After examining the issues – hallucinated facts, unverifiable citations, disruption to collaboration, and concrete examples of AI-generated failures – the conclusion is firm: Wikipedia must prioritize human collaboration and verifiability over the temptations of automation. The integrity of this project, built over two decades, relies on information being accurate and sources being real. That integrity cannot be maintained if we open the floodgates to machine-generated text that nobody has fully vetted. In the end, an article’s worth on Wikipedia is measured not by how quickly it was produced, but by how true and reliable it is. LLMs offer speed, not judgment; volume, not discernment. They do not have a reputation to uphold – we do.
The Wikipedia editing community’s stance is not born of technophobia, but of experience and dedication to our mission. We have seen the dangers firsthand. We know that once trust is broken – if readers begin to doubt that Wikipedia’s content is thoroughly checked and sourced – the damage is long-term. As one editor put it during the AI summary debate, deploying unvetted AI content would do “immediate and irreversible harm to our readers and to our reputation as a decently trustworthy and serious source.” Wikipedia has become synonymous with a certain level of reliability (“sober boringness,” as that editor wryly noted), and that is a strength we must protect.
References that definitely 100% exist (well, some of them actually do)
- ^ a b "Wikipedia:Large language models", Wikipedia, 2025-10-03, retrieved 2025-10-08
- ^ Al, Chatty (2014-12-13). On the Ontology of Ontologies: Why Everything is Probably a Sandwich (3rd ed.). New York.
- ^ a b Human, Real (July 12, 2024). Definitely Real Book That I Didn't Just Make Up. pp. 45–234.
- ^ Like this one.
- ^ a b c McFakerson, Irma (2019). How to Cite Books That Don't Exist: A Practical Guide for Wikipedians Who Don't Want to Take Time out of Their Days to Do the Research Themselves. Chicago: Real Publisher (published June 2020). pp. 12–24.
- ^ Musk, Elon; Altman, Sam (2025). "General Reconnaissance On Knowledge: A 21st-Century Exploration on How We Learn Nowadays".
- ^ a b c d e "Artificial intelligence in Wikimedia projects", Wikipedia, 2025-10-04, retrieved 2025-10-08
- ^ Crofton, Henry; Akatsuki, Tammy; Gorsch, Patrick T (2011). I, Citation: The Self-Referential Bibliography. OpenAI.
- ^ a b "Wikipedia vs. AI: The Fight for Factual Integrity - Just Think AI". www.justthink.ai. Retrieved 2025-10-08.
- ^ a b Gu, Emily; Menendez, Inés; Nogales, Ignacio (2018). Insert clickbait title from a definitely-trustworthy source here. Sydney, Australia: St. John's University Press.
- ^ Dupré, Maggie Harrison (2024-10-10). "Wikipedia Declares War on AI Slop". Futurism. Retrieved 2025-10-08.
- ^ Funaki, Kaiyo (2024-10-25). "Wikipedia editors form urgent task force to combat rampant issues with recent wave of content: 'The entire thing was ... [a] hoax'". The Cool Down. Retrieved 2025-10-08.
- ^ "Estola albosignata: Difference between revisions - Wikipedia". en.wikipedia.org. Retrieved 2025-10-08.
- ^ Nine, Adrianna (2024-10-09). "People Are Stuffing Wikipedia with AI-Generated Garbage". ExtremeTech. Retrieved 2025-10-08.
- ^ Wu, Daniel (2025-08-08). "Volunteers fight to keep 'AI slop' off Wikipedia". The Washington Post. ISSN 0190-8286. Retrieved 2025-10-08.
- ^ Stokel-Walker, Chris. "One in 20 new Wikipedia pages seem to be written with the help of AI". New Scientist. Retrieved 2025-10-08.
- ^ "Wikipedia:Village pump (technical)/Archive 221", Wikipedia, Simple summaries: editor survey and 2-week mobile study, 2025-08-19, retrieved 2025-10-08
- ^ Whitwam, Ryan (2025-06-11). ""Yuck": Wikipedia pauses AI summaries after editor revolt". Ars Technica. Retrieved 2025-10-08.
- ^ Maiberg, Emanuel (2025-06-11). "Wikipedia Pauses AI-Generated Summaries After Editor Backlash". 404 Media. Retrieved 2025-10-08.