Artificial intelligence in Wikimedia projects

Artificial intelligence and machine learning have long been used in Wikimedia projects, primarily to help edit existing articles.[1]

Some applications of artificial intelligence, such as using large language models to write new articles from scratch, have proven more controversial than others within the Wikipedia community. In August 2025, Wikipedia adopted a policy allowing editors to nominate suspected AI-generated articles for speedy deletion.

Wikipedia has also been a significant source of training data for some of the earliest artificial intelligence projects. This has drawn mixed reactions, including concern that companies do not cite Wikipedia when relying on it to answer questions, and that data scraping by AI companies has increased Wikipedia's costs.

Using artificial intelligence for Wikipedia

Earliest use of automated tools, machine learning and AI

Since 2002, bots have been allowed to run on Wikipedia, but they must be approved and supervised by a human.[2] One bot created in 2002 transformed census data into short new articles about towns in the US.[3] Fighting vandalism has been a major focus of machine-learning and AI bots and tools.[4][5][6][7] For example, the 2007 ClueBot relied on simple heuristics to identify likely vandalism, while its 2010 successor, ClueBot NG, uses machine learning through an artificial neural network.[8] Machine translation software has also been used by Wikimedia contributors for a number of years.[9][10][11]
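As an illustration of the general neural-network approach, and not ClueBot NG's actual implementation, the following minimal sketch scores a single edit with a small neural network; the features, training examples, and use of scikit-learn are assumptions made for the example.

```python
# Toy sketch of neural-network vandalism scoring in the spirit of ClueBot NG.
# The features, training examples and library choice are illustrative
# assumptions, not the bot's real feature set, corpus or architecture.
from sklearn.neural_network import MLPClassifier

def edit_features(old_text: str, new_text: str) -> list:
    """Turn one edit (old revision -> new revision) into numeric features."""
    added_chars = max(len(new_text) - len(old_text), 0)
    upper_ratio = sum(c.isupper() for c in new_text) / max(len(new_text), 1)
    exclamations = new_text.count("!")
    blanked = 1.0 if len(new_text) < 0.1 * max(len(old_text), 1) else 0.0
    return [added_chars, upper_ratio, exclamations, blanked]

# Tiny hand-made training set: 1 = vandalism, 0 = constructive edit.
X = [
    edit_features("A city in Ohio.", "A city in Ohio. ASDF LOL!!!"),
    edit_features("A city in Ohio.", ""),  # page blanking
    edit_features("A city in Ohio.", "A city in Ohio, United States."),
    edit_features("A bot created in 2002.", "A bot created and approved in 2002."),
]
y = [1, 1, 0, 0]

model = MLPClassifier(hidden_layer_sizes=(8,), max_iter=5000, random_state=0)
model.fit(X, y)

# An edit whose predicted probability exceeds some threshold would be flagged
# for reverting or for human review.
suspect = edit_features("A city in Ohio.", "THIS TOWN IS STUPID!!!")
print(f"vandalism probability: {model.predict_proba([suspect])[0][1]:.2f}")
```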

Beginnings of generative AI

In 2022, the public release of ChatGPT inspired more experimentation with AI for writing Wikipedia articles. A debate was sparked about whether, and to what extent, such large language models are suitable for this purpose, given their tendency to generate plausible-sounding misinformation, including fake references; to produce prose that is not encyclopedic in tone; and to reproduce biases.[3][12] In an early experiment on December 6, 2022, a Wikipedia contributor named Pharos created the article "Artwork title" using ChatGPT for the initial draft. Another editor who experimented with this early version of ChatGPT said that its overview of the topic was decent, but that the citations were fabricated.[3]

Since 2023, work has been done to draft Wikipedia policy on ChatGPT and similar large language models (LLMs), at times recommending that users unfamiliar with LLMs avoid using them because of the risks described above, and noting the potential for libel or copyright infringement.[12] In early 2023, the Wiki Education Foundation reported that some experienced editors found AI useful for starting drafts or creating new articles. It said that ChatGPT "knows" what Wikipedia articles look like and can easily generate one written in the style of Wikipedia, but warned that ChatGPT had a tendency to use promotional language, among other issues.[13][non-primary source needed] In 2023, the community deemed an outright ban on AI "too harsh" given the productivity benefits it offered editors.[11][2] The same year, members of the Wikipedia community created a WikiProject named AI Cleanup to assist in the removal of poor-quality AI content from Wikipedia.[14]

Miguel García, a former Wikimedia member from Spain, said in 2024 that the number of AI-generated articles on the site peaked when ChatGPT was originally launched. He added that the rate of AI articles had since stabilized thanks to the community's efforts to combat them, and that the majority of articles with no sources are deleted instantly or nominated for deletion.[15] In October 2024, a study by Princeton University found that about 5% of 3,000 newly created English Wikipedia articles (created in August 2024) had been written with the help of AI. The study said that some of the AI articles were on innocuous topics and that AI had likely been used only to assist in writing; in other articles, AI had been used to promote businesses or political interests.[4][16] In October 2024, Ilyas Lebleu, founder of WikiProject AI Cleanup, said that they and their fellow editors had noticed a pattern of unnatural writing that could be connected to ChatGPT. They added that AI can mass-produce content that sounds real while being completely fake, leading to the creation of hoax articles on Wikipedia that the project was tasked with deleting.[17][18]

In June 2025, the Wikimedia Foundation started testing a "Simple Article Summaries" feature that would provide AI-generated summaries of Wikipedia articles, similar to Google Search's AI Overviews. The decision was met with immediate and harsh criticism from some Wikipedia editors, who called the feature a "ghastly idea" and a "PR hype stunt." They argued that the feature could erode trust in the site because of AI's tendency to hallucinate, and questioned whether it was necessary.[19] The criticism led the Wikimedia Foundation to halt the rollout of Simple Article Summaries that same month, while still expressing interest in integrating generative AI further into Wikipedia.[20] The episode highlighted tensions within the community, and between the community and the Foundation, over when to use AI.[19]

An AI-generated draft article nominated for speedy deletion under the G15 criterion

In August 2025, the Wikipedia community adopted a policy allowing users to nominate suspected AI-generated articles for speedy deletion. Editors usually recognize AI-generated articles because they cite sources unrelated to the subject of the article or fabricate citations entirely. The wording of an article is also used to recognize AI writing: if an article uses language that reads like an LLM's response to a user, such as "Here is your Wikipedia article on" or "Up to my last training update", it is typically tagged for speedy deletion.[4][21] Other signs of AI use include excessive use of em dashes, overuse of the word "moreover", promotional language that describes something as "breathtaking", and formatting issues such as curly quotation marks instead of straight ones. During the discussion on implementing the speedy deletion policy, one user, an article reviewer, said that he was "flooded non-stop with horrendous drafts" created using AI. Other users said that AI articles contain a large amount of "lies and fake references" and that fixing the issues takes a significant amount of time.[22][23] In August 2025, Wikipedia also published a guide to spotting signs of AI-generated writing, titled "Signs of AI writing".[24][14]
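Purely as an illustration, the sketch below turns the signs listed above into simple automated checks; the phrase list and thresholds are assumptions made for the example and do not represent Wikipedia's actual detection tooling, which relies on human judgment.

```python
# Toy heuristic checker based on the signs described above. The phrase list and
# thresholds are assumptions for this example; Wikipedia's "Signs of AI writing"
# guide and human review remain the actual basis for tagging articles.
import re

LLM_BOILERPLATE = (
    "here is your wikipedia article on",
    "up to my last training update",
)

def ai_writing_signals(text: str) -> dict:
    """Return a dictionary of boolean flags, one per sign of AI writing."""
    lowered = text.lower()
    words = max(len(lowered.split()), 1)
    return {
        "llm_boilerplate": any(phrase in lowered for phrase in LLM_BOILERPLATE),
        "many_em_dashes": text.count("\u2014") / words > 0.01,  # arbitrary cut-off
        "moreover_overuse": lowered.count("moreover") >= 3,
        "promotional_tone": "breathtaking" in lowered,
        "curly_quotes": bool(re.search("[\u201c\u201d\u2018\u2019]", text)),
    }

sample = "Here is your Wikipedia article on the \u201cbreathtaking\u201d Example Inc."
print(ai_writing_signals(sample))
```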

Using Wikipedia for artificial intelligence

A 2017 paper described Wikipedia as the "mother lode" of human-generated text available for machine learning.[25] In the development of Google's Perspective API, which identifies toxic comments in online forums, a dataset containing hundreds of thousands of Wikipedia talk page comments with human-labelled toxicity levels was used.[26]
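As an illustration of how such a labelled dataset can be used, and not a description of the Perspective API's actual pipeline, the following sketch trains a toy toxicity classifier on a handful of made-up comments standing in for the human-labelled corpus.

```python
# Minimal sketch of training a toxicity classifier on labelled talk-page
# comments. The four inline comments and the TF-IDF plus logistic-regression
# pipeline are illustrative assumptions, not the Perspective API's real data
# or model.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

comments = [
    "Thanks for adding the source, that really helps.",
    "This paragraph needs a citation before it can stay in the article.",
    "You are an idiot and your edits are garbage.",
    "Get lost, nobody wants your edits here.",
]
labels = [0, 0, 1, 1]  # 0 = acceptable, 1 = toxic (human-labelled in the real dataset)

classifier = make_pipeline(TfidfVectorizer(), LogisticRegression())
classifier.fit(comments, labels)

# Score an unseen comment; higher values indicate more likely toxicity.
print(classifier.predict_proba(["Your edits are garbage."])[0][1])
```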

As of 2023, subsets of the Wikipedia corpus were considered among the largest well-curated data sets available for AI training,[1] and, according to Stephen Harrison, had been used to train every LLM to date.[2] This use of Wikipedia was already divisive in 2023.[11]

The Wikimedia Foundation and many supporters of its projects worry that attribution to Wikipedia articles is missing in many large language models such as ChatGPT,[1][27] as well as in AI assistants such as Siri and Alexa.[2] While Wikipedia's licensing policy lets anyone use its texts, including in modified form, it requires that credit be given, implying that using its contents in AI-generated answers without clarifying the sourcing may violate its terms of use.[1] The Foundation expressed concern that without attribution, people who do not know when they are benefiting from Wikipedia will visit the site less often and be less motivated to donate to support the project.[28] It also reported an 8% decrease in visitors to Wikipedia in 2025, which it attributed to the increased popularity of both generative AI and social media.[29]

In 2025, the Wikimedia Foundation said it was absorbing increased costs associated with scraping by artificial intelligence companies and was looking to provide training data to those companies more efficiently[30] and to make its data more useful,[31] including through special licensing deals and paid APIs intended to limit scraping and recover some of the costs of providing information to AI companies.[32][33][34]

Reactions

In 2023, Stephen Harrison cited longtime Wikipedians such as Richard Knipel and Andrew Lih, who worried about Wikipedia "losing its original bold spirit and developing a knee-jerk resistance to change."[3] Harrison argued that AI should be embraced with guidelines such as transparency and the need for human supervision.[3]

In July 2025, Jimmy Wales proposed using LLMs to provide customized default feedback when drafts are rejected.[35] In October 2025, Wales encouraged AI use for editing tasks such as spotting inconsistencies and pointing out missing information, but not yet for writing entire articles.[36] In his 2025 book, The Seven Rules of Trust, Wales also advocated using AI to summarize discussions between editors so that editors new to a discussion could quickly catch up on the debate.[37]

In an August 2025 interview, Ilyas Lebleu described speedy deletion as a "band-aid" for the more serious instances of AI use, and said that the broader problem of AI use would continue. They also noted that some AI articles are discussed for a week before being deleted.[38]


References

  1. ^ a b c d Gertner, Jon (18 July 2023). "Wikipedia's Moment of Truth". New York Times. Retrieved 29 November 2024.
  2. ^ a b c d Harrison, Stephen (August 24, 2023). "Wikipedia Will Survive A.I." Slate Magazine (Column).
  3. ^ a b c d e Harrison, Stephen (2023-01-12). "Should ChatGPT Be Used to Write Wikipedia Articles?". Slate Magazine. Retrieved 2023-01-13.
  4. ^ a b c Wu, Daniel (August 8, 2025). "Volunteers fight to keep 'AI slop' off Wikipedia". The Washington Post. ISSN 0190-8286.
  5. ^ Nasaw, Daniel (25 July 2012). "Meet the 'bots' that edit Wikipedia". BBC News. Archived from the original on 16 September 2018. Retrieved 21 July 2018.
  6. ^ Simonite, Tom (1 December 2015). "Software That Can Spot Rookie Mistakes Could Make Wikipedia More Welcoming". MIT Technology Review.
  7. ^ Metz, Cade (1 December 2015). "Wikipedia Deploys AI to Expand Its Ranks of Human Editors". Wired. Archived from the original on 2 April 2024.
  8. ^ Hicks, Jesse (2014-02-18). "This machine kills trolls". The Verge. Retrieved 2025-12-13.
  9. ^ Costa-jussà, Marta R.; Cross, James; Çelebi, Onur; Elbayad, Maha; Heafield, Kenneth; Heffernan, Kevin; Kalbassi, Elahe; Lam, Janice; Licht, Daniel; Maillard, Jean; Sun, Anna; Wang, Skyler; Wenzek, Guillaume; Youngblood, Al; Akula, Bapi; Barrault, Loic; Gonzalez, Gabriel Mejia; Hansanti, Prangthip; Hoffman, John; Jarrett, Semarley; Sadagopan, Kaushik Ram; Rowe, Dirk; Spruit, Shannon; Tran, Chau; Andrews, Pierre; Ayan, Necip Fazil; Bhosale, Shruti; Edunov, Sergey; Fan, Angela; Gao, Cynthia; Goswami, Vedanuj; Guzmán, Francisco; Koehn, Philipp; Mourachko, Alexandre; Ropers, Christophe; Saleem, Safiyyah; Schwenk, Holger; Wang, Jeff (June 2024). "Scaling neural machine translation to 200 languages". Nature. 630 (8018): 841–846. Bibcode:2024Natur.630..841N. doi:10.1038/s41586-024-07335-x. ISSN 1476-4687. PMC 11208141. PMID 38839963.
  10. ^ Mamadouh, Virginie (2020). "Wikipedia: Mirror, Microcosm, and Motor of Global Linguistic Diversity". Handbook of the Changing World Language Map. Springer International Publishing. pp. 3773–3799. doi:10.1007/978-3-030-02438-3_200. ISBN 978-3-030-02438-3. Some versions have expanded dramatically using machine translation through the work of bots or web robots generating articles by translating them automatically from the other Wikipedias, often the English Wikipedia. […] In any event, the English Wikipedia is different from the others because it clearly serves a global audience, while other versions serve more localized audience, even if the Portuguese, Spanish, and French Wikipedias also serves a public spread across different continents
  11. ^ a b c Woodcock, Claire (May 2, 2023). "AI Is Tearing Wikipedia Apart". Vice. Archived from the original on October 4, 2024.
  12. ^ a b Woodcock, Claire (2 May 2023). "AI Is Tearing Wikipedia Apart". Vice.
  13. ^ Ross, Sage (February 21, 2023). "ChatGPT, Wikipedia, and student writing assignments". Wiki Education Foundation.
  14. ^ a b Brandom, Russell (2025-11-20). "The best guide to spotting AI writing comes from Wikipedia". TechCrunch. Retrieved 2025-12-14.
  15. ^ Bejerano, Pablo G. (August 10, 2024). "How Wikipedia is surviving in the age of ChatGPT". El País.
  16. ^ Stokel-Walker, Chris (November 1, 2024). "One in 20 new Wikipedia pages seem to be written with the help of AI". New Scientist.
  17. ^ Maiberg, Emanuel (October 9, 2024). "The Editors Protecting Wikipedia from AI Hoaxes". 404 Media.
  18. ^ Lomas, Natasha (October 11, 2024). "How AI-generated content is upping the workload for Wikipedia editors". TechCrunch.
  19. ^ a b Whitwam, Ryan (June 11, 2025). ""Yuck": Wikipedia pauses AI summaries after editor revolt". Ars Technica.
  20. ^ Wiggers, Kyle (June 11, 2025). "Wikipedia pauses AI-generated summaries pilot after editors protest". TechCrunch.
  21. ^ Maiberg, Emanuel (August 5, 2025). "Wikipedia Editors Adopt 'Speedy Deletion' Policy for AI Slop Articles". 404 Media.
  22. ^ Roth, Emma (August 8, 2025). "How Wikipedia is fighting AI slop content". The Verge. Archived from the original on August 10, 2025.
  23. ^ Gills, Drew (August 8, 2025). "Read this: How Wikipedia identifies and removes AI slop". AV Club.
  24. ^ Clair, Grant (August 20, 2025). "Wikipedia publishes list of AI writing tells". Boing Boing.
  25. ^ Mehdi, Mohamad; Okoli, Chitu; Mesgari, Mostafa; Nielsen, Finn Årup; Lanamäki, Arto (March 2017). "Excavating the mother lode of human-generated text: A systematic review of research that uses the wikipedia corpus". Information Processing & Management. 53 (2): 505–529. doi:10.1016/j.ipm.2016.07.003. S2CID 217265814.
  26. ^ Blue, Violet (2017-09-01). "Google's comment-ranking system will be a hit with the alt-right". Engadget.
  27. ^ Tremayne-Pengelly, Alexandra (28 March 2025). "Wikipedia Built the Internet's Brain. Now Its Leaders Want Credit". Observer. Retrieved 2 April 2025. Attributions, however, remain a sticking point. Citations not only give credit but also help Wikipedia attract new editors and donors. "If our content is getting sucked into an LLM without attribution or links, that's a real problem for us in the short term."
  28. ^ Herrman, John (2025-10-18). "Wikipedia Is Getting Pretty Worried About AI". Intelligencer. Retrieved 2025-12-11.
  29. ^ Ha, Anthony (2025-10-18). "Wikipedia says traffic is falling due to AI search summaries and social video". TechCrunch. Retrieved 2025-12-14.
  30. ^ Holt, Kris (2025-04-17). "Wikipedia offers AI developers a training dataset to maybe get scraper bots off its back". Engadget. Retrieved 2025-12-11.
  31. ^ Brandom, Russell (2025-10-01). "New project makes Wikipedia data more accessible to AI". TechCrunch. Retrieved 2025-12-14.
  32. ^ Sophia, Deborah Mary; Hu, Krystal (December 3, 2025). "Wikipedia seeks more AI licensing deals similar to Google tie-up, co-founder Wales says". Reuters.
  33. ^ Maruccia, Alfonso (2025-11-12). "Wikipedia helped train your favorite AI, now the Wiki foundation wants a cut". TechSpot. Retrieved 2025-12-14.
  34. ^ Perez, Sarah (2025-11-10). "Wikipedia urges AI companies to use its paid API, and stop scraping". TechCrunch. Retrieved 2025-12-14.
  35. ^ Maiberg, Emanuel (August 21, 2025). "Jimmy Wales Says Wikipedia Could Use AI. Editors Call It the 'Antithesis of Wikipedia'". 404 Media.
  36. ^ Howarth, Tom (2025-10-28). "How AI could soon be used by Wikipedia, according to its founder". BBC Science Focus Magazine. Retrieved 2025-12-11.
  37. ^ Wales, Jimmy (2025). The Seven Rules of Trust: A Blueprint for Building Things That Last. Dan Gardner (1st ed.). New York: The Crown Publishing Group. p. 190. ISBN 978-0-593-72747-8.
  38. ^ Crider, Michael (August 6, 2025). "Wikipedia goes to war against AI slop articles with new deletion policy". PC World.