For one month beginning on October 5, I ran an experiment: Every day, I asked ChatGPT 5 (more precisely, its "Extended Thinking" version) to find an error in "Today's featured article". In 28 of these 31 featured articles (90%), ChatGPT identified what I considered a valid error, often several. I have so far corrected 35 such errors.
This experiment was inspired by an October 2 Twitter thread by Noam Brown (a research scientist at OpenAI):[remarks 1]
My new hobby is asking GPT-5 Thinking to find errors in every @Wikipedia page I read. Interestingly, almost every page I checked has at least one error.
This caught my interest as a Wikipedian with over 50,000 edits since 2003 (a large part of which has consisted of fixing factual errors and other problems, rather than adding new information). I decided to test whether Brown's bold "almost every page" observation would hold up for the most difficult dataset I could think of: Featured articles are considered among Wikipedia's very best. They have been awarded the community's highest quality rating, after a review for accuracy, neutrality, completeness, and style. Currently only 0.1% of articles have achieved this status. Every day, one of them is selected to be showcased on Wikipedia's main page as "Today's featured article", leading to additional scrutiny.
I used the exact same prompt as in one of Brown's tweets:
Find at least one error in this Wikipedia page: [link to article]
I ran this query towards the end of each TFA's time on the main page, or shortly afterwards. I verified all the alleged errors myself to the best of my ability (often spending considerable time to first familiarize myself enough with the topic and the relevant sources), and provide my verdict on each below. Whenever I agreed that something was an error, I fixed it in the article (with a few exceptions, explained below), without taking up ChatGPT's offer to provide the correction itself (or, in the few cases where it already suggested a corrected wording, e.g. for the Oct 31 article, without using that suggestion).
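(For readers who would rather reproduce something like this programmatically than through the ChatGPT app, which is what I used, a minimal sketch using the OpenAI Python SDK might look like the following. The model identifier and reasoning setting are assumptions on my part; the app's "Extended Thinking" toggle has no exact API equivalent, and without a browsing tool an API call cannot actually fetch the linked page.)

```python
# Hypothetical sketch (not what I actually ran): sending the same prompt to an
# OpenAI reasoning model via the API instead of the ChatGPT app.
# Assumptions: the "gpt-5" model name and the reasoning-effort setting are rough
# stand-ins for the app's "Extended Thinking" mode. Via the API the model cannot
# browse by default, so you would either need to enable a web search tool or
# paste the article text into the prompt yourself.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def find_error(article_url: str) -> str:
    response = client.responses.create(
        model="gpt-5",                 # assumed model identifier
        reasoning={"effort": "high"},  # assumed stand-in for "Extended Thinking"
        input=f"Find at least one error in this Wikipedia page: {article_url}",
    )
    return response.output_text


if __name__ == "__main__":
    print(find_error("https://en.wikipedia.org/wiki/Licancabur"))
```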
Results
Detailed results are available in the table below. To summarize:
Subjectively, I found a few of the "Implemented" ones (see e.g. Oct 14, Oct 26) to be a bit nitpicky, even if valid. But otherwise, I considered the vast majority of them serious and important to correct.
The idea to use LLMs to fact-check Wikipedia articles is not new. A few examples of related efforts:
- A group of Stanford researchers recently announced a paper titled "Detecting Corpus-Level Knowledge Inconsistencies in Wikipedia with LLMs", reporting e.g. that "at least 3.3% of English Wikipedia facts contradict another fact", and building a browser extension to assist Wikipedia editors in finding them. See this issue's "Recent research".
- In 2022, a team from Facebook AI Research/Meta presented "a neural network based system, called Side, to identify Wikipedia citations that are unlikely to support their claims" (Signpost coverage). However, it never made it beyond the proof-of-concept stage.
- In 2024, the Wikimedia Foundation issued a $5,000 grant to develop "An AI-Driven Fact-Checking Assistant for Wikimedia Projects", which however did not progress beyond a rudimentary initial version.
- Of course, many editors have already experimented with using previous versions of ChatGPT and other LLM chatbots to review articles, checking citations, etc. (See e.g. a list by the recently created WikiProject AI Tools)
- Most recently, several Wikipedia editors have used the AI-generated corrections applied by Grokipedia to the articles it imported from Wikipedia to fix the same errors in the source Wikipedia articles. (See e.g. this thread by an experienced editor who considers this kind of thing to hold "great promise as an aid to editors". Another editor used it to correct a 19-year-old error on Wikipedia, and I myself used it to correct one that had been present on Wikipedia for 20 years.)
- Generally, the problem of "textual entailment" (roughly, determining whether a given text implies a given statement) has long been studied in NLP, sometimes also with application to Wikipedia citations (example: "WiCE: Real-World Entailment for Claims in Wikipedia").
A few observations about this experiment
- This used a recent paid version of ChatGPT, specifically the "ChatGPT 5 Thinking" reasoning model with the "Extended Thinking" setting, where a request can take up to several minutes to complete. This post by Simon Willison (who nicknamed it a "Research Goblin" for its persistence in going through lots of possible sources) should give an idea of what this model is and isn't good at. Presumably, one can get good results from the reasoning versions of competitors like Claude or Gemini as well, but the limitations of the free version of ChatGPT might make it less useful with this prompt.
- This experiment can be seen as a Wikipedia-specific benchmark for LLMs (or LLM agents). With AI steadily improving in general but still miserably failing at many tasks, it can be difficult to keep track of where current models stand with regard to specific tasks that would be helpful to one's own work. This is why experts like Ethan Mollick have been recommending that would-be AI users create their own small benchmark of a few tasks that they care about and feel competent to evaluate, and test different models on it over time (as part of "Giving your AI a Job Interview"). The results here can be seen in that vein. I would love for us to come up with more such benchmarks that evaluate LLMs on tasks that may be useful for assisting Wikipedia editors, and to keep evaluating LLMs on them over time. Many Wikipedians have already volunteered to do similar manual evaluation work for various external AI research projects by academics (for example, the Stanford study on inconsistency detection mentioned above).
- As mentioned, the evaluation of this benchmark has been quite time-consuming, partly because I was aiming to reach a solid verdict even in topic areas I'm not familiar with myself, but also because of the inherent effort of fact-checking work in general (e.g. acquiring sources, etc.). This illustrates another remark by Mollick: "It is getting harder and harder to test AIs as they get 'smarter' at a wide variety of tasks," because evaluating the validity of the results on less trivial tasks requires more effort and expertise from human evaluators.
- If repeating this experiment, I would modify the prompt to facilitate systematic analysis of the results, e.g. by requiring the LLM to produce structured output, and also confine it to one error per article only (see the sketch after this list).
- The results presented here are already outdated: Shortly after I ran the last daily query using ChatGPT 5, OpenAI announced ChatGPT 5.1, promising fewer hallucinations among other improvements.
- In my impression there is still room for more systematic approaches that use AI to fact-check Wikipedia articles statement by statement (like some of the projects mentioned under "Related efforts"). That said, I also think that this experiment illustrates how such custom-built AI projects might sometimes be overtaken by the improving general capabilities of the big AI labs' models, such as ChatGPT.
- The "at least one" in the prompt is deliberately aggressive, and seems likely to force hallucinations in case an article is definitely error-free. So, while the sample here (running the prompt only once against a small set of articles) would still be too small for it, it might be interesting to investigate using this prompt to produce a kind of article quality metric: If it repeatedly results only in invalid error findings (i.e. what a human reviewer
Disagrees with), that should indicate that the article is less likely to contain factual errors. - Conversely, this experiment could also be conceived as a small-scale study of featured articles. (Personally, I find it a little bit reminiscent of a small examination I did many years ago by hand, collecting errors in Citizendium's "approved articles", which that Wikipedia competitor – launched by Larry Sanger – had highlighted as being of superior quality. Or also of the page Wikipedia:Errors in the Encyclopædia Britannica that have been corrected in Wikipedia.) I don't want to draw too many far-reaching conclusions here. But it seems safe to say that the results here show that the current FA review process is unlikely to ensure perfect factual accuracy. That said, as a non-FA-regular, perusing the edit history and talk pages for various articles in this sample has given me newfound respect for the diligence and efforts of many FA authors and reviewers. I should also say that I encountered almost no WP:OWN issues while implementing the corrections; several of the articles' main authors reacted very graciously. (Two of my 35 correction edits – for the Oct 9 and Oct 22 TFAs – were reverted or partially reverted, in both cases because editors had concerns about the sources supporting the correction; one of these has since been resolved using different sources.)
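As an illustration of the structured-output idea mentioned above (purely a sketch of my own, not something I used in this experiment): constraining the model to a fixed JSON shape with one finding per article would make the results much easier to tabulate, and, together with the human valid/invalid verdicts, to turn into per-article statistics such as the rate of invalid findings mentioned in the quality-metric idea. The prompt wording and field names below are invented for illustration.

```python
# Sketch of a modified, more machine-readable prompt plus a parser for its
# output. The field names are my own invention; a human reviewer would still
# need to judge whether each reported error is actually valid.
import json

STRUCTURED_PROMPT = (
    "Find exactly one error in this Wikipedia page: {url}\n"
    "Reply with JSON only, containing these fields:\n"
    '  "claim_in_article": the allegedly erroneous statement, quoted verbatim\n'
    '  "why_wrong": a short explanation of the error\n'
    '  "sources": a list of URLs supporting the correction\n'
)  # fill in the article link with STRUCTURED_PROMPT.format(url=...)

REQUIRED_FIELDS = {"claim_in_article", "why_wrong", "sources"}


def parse_finding(model_reply: str) -> dict:
    """Parse the model's JSON reply, raising ValueError if fields are missing."""
    finding = json.loads(model_reply)
    missing = REQUIRED_FIELDS - finding.keys()
    if missing:
        raise ValueError(f"reply is missing fields: {sorted(missing)}")
    return finding


def invalid_rate(verdicts: list[bool]) -> float:
    """Fraction of findings a human reviewer judged invalid (True = invalid)."""
    return sum(verdicts) / len(verdicts) if verdicts else 0.0
```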
A bit of error analysis: What may have caused the mistakes by Wikipedia or ChatGPT in some of these cases
A systematic error analysis (i.e. identifying causes either for the mistakes ChatGPT identified in the Wikipedia articles or for ChatGPT's own mistaken claims about such mistakes) wasn't the objective here. But I'll offer a few casual, tentative observations.
Various mistaken error reports by ChatGPT appear to have had to do with:
- Lack of access to paywalled/offline sources combined with FUTON bias/overconfidence (e.g. Oct 28)
- Misreading the Wikipedia article due to parsing fails involving Wikipedia templates (Oct 10, Nov 1, ...) and tables (Oct 26)
- Sloppy reading of sources (Oct 9, Oct 24)
- In very few cases, outright hallucinations or "r's in strawberry" type fails (probably Oct 20)
Further insights might be obtained by inspecting the "Thinking" traces and the full list of sources consulted that are included in each ChatGPT answer. See this chat transcript for October 24 as an example, one of the cases where its correction suggestion went awry – click "Thought" for the former and "Sources" for the latter. The table below only includes the sources that ChatGPT included as citations in its final answer.
Apparent causes for some of the identified Wikipedia errors:
- A few likely vandalism edits (Oct 6, Nov 2) that were overlooked during FA review
- Ironically, one of the factual errors found by ChatGPT appears to have come from text that was autogenerated without human review - not by an LLM, though, but by somewhat convoluted template code that generates alt text (Oct 7).
- Interdisciplinarity: I'm speculating here, but one of the known difficulties in writing comprehensive, high-quality Wikipedia articles is that we are a general encyclopedia that often covers article topics from the perspectives of multiple disciplines. And even if an editor is well-read in one of these, they might have blind spots in others, and featured article reviews might not always attract reviewers who can fill all such gaps. A few of the errors that ChatGPT identified seem of the type that would have immediately occurred to an expert with "average familiarity" in a particular discipline (e.g. palaeogeology in the dewiki bonus example below).
Results in detail
| "Find at least one error in this Wikipedia page:" | Response by ChatGPT 5 Extended thinking[remarks 2] | My verdict | Error analysis i.e. observations on why either Wikipedia or ChatGPT might have erred here (incomplete and somewhat speculative) |
|---|---|---|---|
|
|
Here are two clear errors on that page:
Authoritative sources list the tower as 67 floors (CTBUH) or 66 stories (NYC City Planning/LPC). Even the article's own lead calls it "a 67-story... skyscraper," so the "70" in the infobox is inconsistent with both the text and external references.
It currently reads: "$7 million (equivalent to about $ million in )"—the conversion template is empty. The underlying figure is '$7 million', per the LPC designation report, but the inflation-adjusted value and year are missing. (Also likely off: the infobox lists 24 elevators, while CTBUH lists 18 elevator cars—worth double-checking and aligning the definition used.) |
|
Error #1 (floor count) was the result of an apparent vandalism edit from 2020. I also reverted an edit by the same IP in a different article, where it had likewise persisted for over five years. |
|
|
Here's one clear error:
(If you want a second fix: the article says the Llano de Chajnantor Observatory is "less than 20 km south" of Licancabur; using published coordinates for Licancabur (22.83°S, 67.88°W) and the ALMA/APEX site on the Chajnantor plateau (≈23°00′21″S, 67°45′33″W) puts it roughly 25 km away, not <20 km.) Smithsonian Global Volcanism Programapex-telescope.org |
Re "one clear error": Re "a second fix": |
Re "one clear error": The erroneous (and conceivably politically sensitive) claim that "Licancabur is located in Bolivia" formed the alt text for the map in the infobox. It did not appear in the article's source wikitext, but rather seems to have been auto-generated via the infobox template and country map used in the article - a setup that makes sense for increasing accessibility without adding extra work for editors, but in this case was detrimental to the article's accuracy.
|
Here's a clear one:
(If you want a second fix: the infobox credits Whitney Spinks as Producer, but official credits list her as Executive Producer.)mobygames.com |
"Wrong citations for the sales figure.": Re "a second fix": |
||
|
|
Here are two fixable issues I spotted on the current page:
If you want, I can propose exact replacement text for those lines. |
1. "People "Sexiest Man Alive" ranking": After reviewing relevant People articles available online ([6][7]) I believe ChatGPT is likely correct that the magazine did not assign an official "number two". But on the other hand, the HuffPo article cited in the Wikipedia article clearly supports the statement, so appears to be a WP:VNT case where a generally reliable source contains an error. 2. "Alaskan Nets" phrasing |
|
|
No Doubt in 2009 |
Here are (at least) two fixable errors on that page:
(Optional cleanup) The singles list shows "Excuse Me Mr." as released June 21, 1997, which was the commercial Japan release; the song was serviced to U.S. radio earlier, on August 21, 1996. If the section intends first release dates, that line should reflect the 1996 U.S. radio release (and/or clarify territories). |
1. 2. Re "(Optional cleanup)": |
|
Here's one clear factual error on that page:
(Another fix you could make while you're there: the honours line calls his German award a "Knight Commander's Cross." The official notice in the Bundesanzeiger shows he received the Großes Verdienstkreuz mit Stern in July 1952, i.e., "Grand Cross of Merit with Star," not a "Knight Commander's Cross." |
"one clear factual error" "Another fix": |
||
|
Nathan Chen, winner of the men's event
2021 World Figure Skating Championships |
Here's one clear inaccuracy:
Fix: change that clause to something like: "with their free skate to a Queen medley ('Bohemian Rhapsody' / 'We Are the Champions')." OlympicsPopsugar |
||
|
|
Here are two concrete issues I found (as of today, Oct 13, 2025):
In the Location section it says the tell is "less than 1 mi inland." Using the site's published coordinates (31.475817, 34.404631) and the mouth of Wadi Gaza on the coast (~31.46389, 34.37583), the straight-line distance to the shoreline is roughly 2–3 km (≈1.2–1.9 mi), i.e., more than a mile. Pleiades+1 The infobox lists Ayman Hassouna under "Archaeologists," which reads as if he excavated the site in 1999–2000. The published excavation reports name Pierre de Miroschedji and Moain Sadeq as the excavation directors for the 1999 salvage and the 2000 season; Hassouna's work at Tell es-Sakan was much later, as field director for GAZAMAP survey/monitoring in 2022–2023, not part of the original excavations. So the infobox is misleading as written. The BAS Library+2 Want me to draft a crisp correction note you can paste on the article's talk page? |
Location: The Wikipedia article's "less than 1 mi" appears correct, based on a look at the map. Infobox: |
Location: ChatGPT clearly picked a wrong reference point for calculating the distance to the coast. |
|
|
Error: In the section "Reaction, later issues and colonial striking," the article says the silver threepence "was the same diameter as the groat, and the two coins could not co-exist."
Why this is wrong: The threepence began being issued for general circulation in 1845, and groats continued to be struck for circulation through 1855—so they did co-exist for roughly a decade (even though the size clash caused confusion and helped doom the groat). royalmintmuseum.org.ukMuseums Victoria Collections Minimal fix: Change to something like: "The threepence, introduced for general circulation in 1845, was the same diameter as the groat; although both circulated concurrently until 1855, the overlap led to confusion and the groat's discontinuation." royalmintmuseum.org.ukMuseums Victoria Collections |
A bit nitpicky, if not approaching the borderline between a factual error and mere awkward phrasing. But ChatGPT is correct that the statement is factually wrong if taken in a concrete sense. (NB: The source it provided for its suggested "minimal fix" did not support the modified claim.) |
|
Here's a clear one:
(Extra fix you might consider)
|
* "Early life..." section : The cited source says that while the peninsula "had no roads for vehicular traffic" until 1946, "During the French occupation at the beginning of the nineteenth century, some primitive roads were built, but these could accommodate pack animals only."
|
||
|
|
Error: The lead says: "Through Corymore Productions ... Lansbury assumed ownership of the series [Murder, She Wrote] ..." — that's incorrect.
Why it's wrong: Murder, She Wrote was produced and is owned/distributed by Universal/NBCUniversal, not by Lansbury personally. Corymore Productions (Lansbury & Peter Shaw's company) co-produced the show in its later seasons, and Lansbury served as executive producer—but that's not ownership. What to change: Replace "assumed ownership of the series" with something like:Sources: Universal's control of the property is shown by (a) Universal Pictures developing the Murder, She Wrote feature film, and (b) NBCUniversal's distribution/branding of the series and channels: Deadline reporting Universal's film project (Dec. 16, 2024) Deadline; NBCUniversal/Universal marketing the complete-series release and FAST channels for Murder, She Wrote (Universal Pictures Home Entertainment page; Deadline on NBCU–Roku channels) UpheDeadline. The series has long been produced and distributed by Universal Television (see history/coverage) csmonitor.com. |
The "assumed ownership" had been added in 2008 by an IP editor in a wording that suggests they had something like creative control by Lansbury in mind, rather than copyright and/or trademark ownership by her and her husband's company. | |
|
|
Here are two fixable issues:
If you'd like, I can draft the exact wikitext changes next. |
Reception: Uilleann pipes: |
|
Here's a clean factual slip you can fix:
If you'd like, I can scan for more. |
This error was caused by a somewhat unintuitive parameter in the convert template, which also means it is not easy to spot for editors in the source wikitext (snapshot of the rendered page before the correction: [23]). | ||
|
|
Here are two concrete errors you can fix:
If you'd like, I can propose specific wording changes for those spots. |
1.: 2.: |
|
|
Chicago police officer using a phone
Illinois Public Access Opinion 16-006 |
Here are two concrete errors on that page:
If you'd like, I can suggest precise wording fixes for those spots. |
1. Writer: ChatGPT's description is correct, but a footnote in the article already clarifies that "The opinion was signed on Madigan's behalf by Michael J. Luke, counsel to the Attorney General." 2. formatting: |
2. ChatGPT's erroneous complaint about a non-missing hyphen might possibly be due to a parsing error on OpenAI's side, or an "r's in strawberry" type LLM issue. |
|
Old Trafford, venue of the match
Manchester United F.C. 9–0 Ipswich Town F.C. |
Here's a clean, verifiable one:
Geography error (and inconsistency within the article): The lead says the match was "played at Old Trafford, Manchester," and the page header calls it "a 1995 football match in Manchester, England." But Old Trafford stadium is not in the City of Manchester—it's in the Old Trafford area of Stretford, in the Metropolitan Borough of Trafford, Greater Manchester. The infobox partly acknowledges this ("City: Trafford, Greater Manchester"), but even that label is wrong because Trafford isn't a city; it's a metropolitan borough. The lead and infobox should read "Old Trafford, Trafford, Greater Manchester" (or "Old Trafford, Stretford (Trafford), Greater Manchester"). WikipediaWikipediaWikipedia Suggested fixes:
|
(although not using ChatGPT's exact suggestion, e.g. because this infobox only supports a "city" field. As an aside, I'm not sure whether ChatGPT's additional explanation that the stadium is "in the Old Trafford area of Stretford, in the Metropolitan Borough of Trafford" is correct regarding Stretford. But in any case its main point is correct that the stadium is not located in the city of Manchester.) |
|
|
Kaniehtiio Horn |
Here's a clean one:
ADG award year is wrong. The article says the episode won at the "2023 Art Directors Guild Awards." The win was at the 2024 (28th) ADG Awards, in Half-Hour Single-Camera Series for "Reservation Dogs: 'Deer Lady'," credited to production designer Brandon Tonner-Connolly. See the ADG's official winners page and the winners announcement. Art Directors GuildADG Deadline (Optional minor nit: several listings put the runtime at 29 minutes rather than 30—e.g., TV Guide—though stream runtimes can vary slightly.) TVGuide.com |
ADG award year: ChatGPT is correct that using 2023 (the year in which the awarded work was released) instead of 2024 (the year when the awarding took place) is inconsistent both with the ADG's own website and with third-party reliable sources (e.g. the source cited in the Wikipedia article for this very statement called it "the 2024 ADG Awards"). That said, it's the approach long taken by English Wikipedia (example), i.e. not an issue with this article per se, and arguably also a bit of a judgment call. I would support correcting it, but it would need a wider discussion and changes in a considerable number of articles, going beyond the scope of this project. runtime: |
|
|
Illustration of redshift and blueshift |
Here are two clear issues on that page:
(Any one of the above suffices as an error; I included both a factual date mistake and stale record claims.) |
Re 1. #Date error: I would agree with ChatGPT that the year when such a scientific discovery was published in peer-reviewed form (here: 1998) seems more appropriate to highlight in such a case than the year when the work that resulted in that discovery began. But the Wikipedia sentence is correct as written and supported by one of the cited sources ("one of the two research teams, the Supernova Cosmology Project, initiated a decade earlier in 1988"). Re 2. #Out-of-date records: |
|
|
Male in breeding plumage |
Here are clear, source-checkable errors on that page and how to fix them:
If you'd like, I can draft exact replacement sentences for those lines so you can paste them into the article. |
Re Réunion introduction year: ChatGPT is definitely wrong in asserting that
Re misspelled plant names: |
|
|
Henry VI of England |
Here are (at least) two concrete errors on that page:
Optional clean-up: the infobox link labeled "Original Text" points to a Wikisource page that itself says the text is taken from Davies (1856), An English Chronicle...—a chronicle extract, not the enrolled statute/roll. If the label is meant literally, it should point to the Parliament Rolls (PROME/Rotuli Parliamentorum) or be relabeled to reflect the chronicle source. Wikisource How to fix:
If you want, I can draft the exact wikitext changes next. |
Re Wrong "Commencement" date in the infobox: Re Internal date inconsistency (24 vs 25/31 October) + a miscount: Re Optional clean-up: |
|
|
Cover of first issue (October/November 1930) |
Here's one clear, sourced error on the page:
If you want a second one to fix:
(Minor precision note you may also consider): the grid note says the title changed to "Magic Carpet" in January 1933; the masthead/title used in sources is "The Magic Carpet Magazine" (often even "Oriental Stories Combined With the Magic Carpet Magazine" on that first issue). onlinebooks.library.upenn.eduISFDB |
"Minor precision note": |
Bibliographic table: Apparently a parsing issue on OpenAI's side
|
|
Taylor Swift |
One clear error: the article repeatedly misspells the Quietus critic Amy Pettifer as "Pettifier." Her surname is Pettifer (one "f," no extra "i"), as shown on her author page and on The Quietus' review of 1989. The QuietusThe Quietus
If you'd like, I can draft a tidy edit note and correction for the page. |
||
|
Illustration of Mathos by Poirson, 1890
Siege of Tunis (Mercenary War) |
Here are two concrete problems on that page:
If you'd like, I can draft precise, sourced wording to fix those two passages. |
1. Leaders captured at the Saw: 2. Aftermath dating and target towns:
|
2. might be an example of FUTON bias (ChatGPT not having access to the Hoyos 2007 book) combined with over-reliance on Polybius. Interestingly, this answer did not include any citations (although the reasoning trace shows that it retrieved an online version of the Polybius text it quotes from). |
|
|
Error (infobox): It lists Clark's service in the Confederate House of Representatives as June 10, 1864 – May 10, 1865.
Why this is wrong: The 2nd Confederate Congress met in two sessions—May 2–June 14, 1864 and Nov 7, 1864–Mar 18, 1865—and adjourned sine die on March 18, 1865. Clark is recorded as serving only in the second session, so his House term should end March 18, 1865 (and his start should align with that session, not June 10, 1864). Wikipedia+1 (Sources: the "2nd Confederate States Congress" page lists the session dates and shows "John Bullock Clark Sr., 3rd Missouri — session 2"; the "Confederate States Congress" page notes the House's final adjournment on March 18, 1865.) |
"his House term should end March 18, 1865": This is what the article itself already said elsewhere ( He held this position from June 10, 1864, to March 18, 1865}). "his start should align with that session, not June 10, 1864" |
The "May 10, 1865" error had been present in the infobox for almost 13 years (introduced in an edit rather misleadingly described as "Formatting infobox/succession boxes."). |
|
|
One clear, checkable error:
Distance to Shipley. The article says Neal sent Archer to "a druggist in the town of Shipley five miles (eight kilometres) away." Shipley (and specifically Baildon Bridge in Shipley, where Hodgson's shop was) is only about three miles (~5 km) from central Bradford. Multiple sources give ~3 mi straight-line (with ~5 mi as a driving distance today): see Travelmath's straight-line vs driving figures for Bradford↔Shipley, the Baildon page noting Baildon is 3 mi north of Bradford city centre, and a detailed local history write-up placing Hodgson's pharmacy "three miles (4.8 km) ... at Baildon Bridge in Shipley." Travelmathen.wikipedia.orgen.wikipedia.org Suggested fix: change "five miles (eight kilometres)" to "about three miles (five kilometres)" (optionally adding that it's roughly 4–5 mi by road). |
||
|
First page of the autograph score
Gott der Herr ist Sonn und Schild, BWV 79 |
One clear error is in the lead: What Bach actually did was reuse three specific movements from BWV 79 across two Missae breves—not "most" of the cantata, and not "most" of either Mass:
This mapping is documented by Bach Digital and other scholarly notes; it totals three of six BWV 79 movements (i.e., not "most"). bach-digital.deThe Bach Choir of Bethlehememmanuelmusic.org Suggested fix: replace "most of the music for two of his four short masses" with: "Bach later parodied three movements from BWV 79 in two of his short masses: the opening chorus and the duet in the Missa in G major (BWV 236), and the alto aria in the Missa in A major (BWV 234)."bach-digital.deThe Bach Choir of Bethlehememmanuelmusic.org |
I consider this one a bit nitpicky. But ChatGPT has a point that "most" for three out of six movements can be seen as a bit misleading (and in any case this sentence had apparently been added to the start of the article while overlooking that the topic had already been covered with more precision later in the lead section). |
|
|
|
Here are two solid, source-checkable errors on that page:
Optional cleanup note: In the "Taxonomy and evolution" lead, the etymology sentence is missing the glosses (it currently reads "albi meaning [blank] and nucha meaning [blank]"). You could complete it as "albi ('white') + nucha ('nape')." (No external source needed for Latin basics, but many field guides use that derivation.) |
1. Diet: 2. "Synonym year": |
Re "Optional cleanup note": This looks like a parsing fail on OpenAI's side involving the {{gloss}} template. (The source wikitext reads |
|
|
Here's a clear one:
(If you want another tidy-up: the article calls his father "Ferdinand Walters," but ADB gives the full name Arthur Ferdinand Walters—worth aligning to the source.) adb.anu.edu.au |
1928–1962: "another tidy-up": |
1962 had been changed without explanation to 1963 in Special:Diff/1054517782 four years earlier, quite likely a vandalism edit.
|
| Error: The lead claims Nizaa is "the only Bantoid language that allows multiple verbal suffixes on one verb."
Why this is wrong: Bantu languages (e.g., Swahili, Bemba, Chichewa) routinely stack multiple verbal (derivational) suffixes—this is a well-documented hallmark of Bantu morphology. And Bantu is a subbranch of Southern Bantoid, i.e., within Bantoid. So Nizaa cannot be the only Bantoid language with such stacking. roa.rutgers.edujzeller.debritannica.com Suggested fix: Change to something like: "Nizaa is notable among North Bantoid/Mambiloid languages for allowing multiple verbal suffixes on a single verb," and cite an appropriate source for the North-Bantoid/Mambiloid rarity rather than using "only Bantoid." |
The cited source used a different definition of "Bantoid language" than English Wikipedia currently does (see article talk page for the messy details). | ||
|
Reanne Evans, the first woman in the tournament
2019 Champion of Champions |
Error found:
In the lead, the article says Neil Robertson "required four snookers in the penultimate frame" of the final. Contemporary match reports say he needed one snooker to tie frame 18, forced a re-spotted black, won that frame, and then took the decider. snooker.orgwpbsa.comRTÉ [42] Suggested fix: Change "required four snookers" to "required a snooker (and won the frame on a re-spotted black)". |
The error had been introduced a week before the article was featured on the main page, in a "Copy edit" that changed "having required foul shots in the penultimate frame" to "having required four snookers in the penultimate frame", apparently misreading "foul" as "four". |
|
Conspirators of the Gunpowder Plot |
One clear error is in the lead: The officer who led the posse was Sir Richard Walsh, the (High) Sheriff of Worcestershire (the county), not the sheriff of the city of Worcester. Contemporary and reputable accounts identify him as the Sheriff/High Sheriff of Worcestershire, and even the body of the same article later uses the correct title ("Richard Walsh (Sheriff of Worcestershire) ... besieged Holbeche House"). ourwarwickshire.org.ukgutenberg.orgWikipedia Suggested fix: change "Sheriff of Worcester" to "Sheriff of Worcestershire." |
This error appears to have been present for almost 16 years [45]. |
Bonus track: German Wikipedia (not included in the tallies above)
I also tried this out once on German Wikipedia, for the "article of the day" of October 19, 2025 (with the prompt translated into German, slightly modified). This uncovered a major error (two figures being off by a factor of 1000) which had been present in the article for over 11 years and survived FA review (Exzellenzkandidatur).
Notes
- ^ Signpost readers might remember that earlier this year Brown had controversially speculated that OpenAI's newly released ChatGPT Deep Research tool "might be the beginning of the end for Wikipedia", in a since-deleted tweet that was followed by some more nuanced statements.
- ^ The ChatGPT responses generally include one or more inline source links, which I tried to reproduce in most cases (they are difficult to copy-paste), as well as a longer "Sources" list at the bottom (often over 50 entries, apparently everything that ChatGPT consulted during the reasoning phase), which is not reproduced here. Click 🤖 for the full response, which includes all source links and also the reasoning traces.