awesome-vintage-llms

A curated list of vintage large language models — also called historical LLMs, time-capsule LLMs, or time-locked LLMs — together with the papers, datasets, demos, and discussions surrounding them.

A vintage LLM is a language model trained (typically from scratch) on text from a bounded historical period, with no information from after a given knowledge-cutoff date in its training corpus. The goal is to produce a model that does not merely roleplay a historical era when prompted, but whose vocabulary, worldview, and "mental furniture" are genuinely shaped by the textual culture of that period. The term vintage LLM was popularised by Owain Evans in his "Vintage Large Language Models" talk; for a humanist's perspective on the emerging field, see Benjamin Breen's essay "Are 'Vintage LLMs' the start of a new humanistic field?".
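
By way of illustration, here is a minimal sketch of that corpus-side constraint in Python, assuming each document record carries a publication-date field (the schema and field names below are hypothetical):

```python
from datetime import date

# Hypothetical corpus records; any schema works so long as each document
# carries a reliable publication date.
corpus = [
    {"text": "A Treatise on Natural Philosophy ...", "published": date(1867, 1, 1)},
    {"text": "On the Electrodynamics of Moving Bodies ...", "published": date(1905, 6, 30)},
]

def apply_knowledge_cutoff(documents, cutoff):
    """Keep only documents published strictly before the cutoff date.

    This is the defining constraint of a vintage LLM: nothing written
    after the cutoff may enter the pre-training corpus.
    """
    return [doc for doc in documents if doc["published"] < cutoff]

pre_1900 = apply_knowledge_cutoff(corpus, date(1900, 1, 1))  # drops the 1905 paper
```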

These models are interesting because they enable, among other things:

  • Counterfactual experiments (e.g. can a pre-1915 model independently arrive at general relativity?)
  • Contamination-free benchmarks for temporal generalisation and forecasting
  • Research in the digital humanities, social sciences, and history of ideas
  • Aggregate "windows into the past" for studying period vocabulary, rhetoric, and assumptions

Contents

  • Models
  • Related reading
  • Contributing
  • License

Models

Listed in chronological order of first public release / announcement.

MonadGPT (2023)

"What would have happened if ChatGPT was invented in the 17th century?"

A 7B-parameter chatbot fine-tuned by Pierre-Carl Langlais (Pclanglais) from Mistral-Hermes 2 on roughly 11,000 early-modern texts in English, French, and Latin (≈1400–1700 CE), drawn primarily from EEBO and Gallica. MonadGPT replies in archaic language and frequently invokes pre-modern references; it is in many ways the spiritual ancestor of the more recent train-from-scratch projects below, although it is technically a fine-tune rather than a clean-room vintage model. Released November 2023, it predates the term "vintage LLM" but is widely cited as an early example of the idea.
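
For the curious, a minimal usage sketch with Hugging Face transformers, assuming the checkpoint is published under the Pclanglais/MonadGPT repo id (check the model card for the exact id and chat template):

```python
from transformers import pipeline

# Assumed repo id; verify against the project's model card before use.
chat = pipeline("text-generation", model="Pclanglais/MonadGPT", device_map="auto")

# The model answers in an early-modern register without special prompting.
result = chat("What are the causes of comets?", max_new_tokens=200,
              do_sample=True, temperature=0.8)
print(result[0]["generated_text"])
```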

TimeCapsuleLLM (2025–2026)

A series of models trained from scratch on London texts published between 1800 and 1875 by Hayk Grigorian (with academic supervision from Dr. Hamed Yaghoobian at Muhlenberg College). Versions range from a tiny 16M-parameter v0 built on nanoGPT to a 1.2B-parameter v2 trained on a 90 GB corpus of 136,344 documents. The v1 model famously connected the year 1834 with Lord Palmerston and a real London protest without being prompted to do so, demonstrating that even a small model trained purely on period sources can surface genuine historical patterns.
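
To give a sense of scale, here is what a nanoGPT-style training config for a model around v0's size looks like; the hyperparameter values are illustrative guesses for orientation, not the project's actual settings:

```python
# Illustrative nanoGPT-style config for a model near v0's ~16M-parameter
# scale; these values are guesses, not the project's own.
out_dir = "out-timecapsule-v0"
dataset = "london_1800_1875"   # hypothetical prepared-dataset name

n_layer = 8
n_head = 8
n_embd = 384       # non-embedding parameter count lands in the low tens of millions
block_size = 512   # context length in tokens

batch_size = 32
learning_rate = 6e-4
max_iters = 20000
```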

Ranke-4B / History LLMs (2025–2026)

A family of 4B-parameter models based on the Qwen3 architecture, trained from scratch on 80B tokens of carefully time-stamped historical text drawn from a curated 600B-token dataset. Each model in the family has a fixed knowledge cutoff: 1913, 1929, 1933, 1939, or 1946. The project is led by Daniel Göttlich, Dominik Loibner, Guohui Jiang, and Hans-Joachim Voth at the University of Zurich and the University of Cologne. Ranke-4B-1913 famously does not know who Adolf Hitler is (its closest match is a 19th-century Hessian philosopher). The project explicitly aims to preserve, rather than scrub away, the normative judgements acquired during pre-training, treating the resulting prejudices and worldviews as a feature for historical research rather than a bug to be sanitised.

Machina Mirabilis / GPT-1900 (2026)

An experiment by Michael Hla asking whether a 3.3B-parameter LLM trained from scratch on pre-1900 text can independently arrive at quantum mechanics and relativity, directly testing Demis Hassabis's well-known proposal. The pre-training corpus (~22B tokens) was assembled from Institutional Books, the British Library Books dataset, and American Stories newspapers, and aggressively filtered to remove any document referencing Einstein, quantum mechanics, or other post-1900 physics. After mid-training on ~290M tokens of pre-1900 physics texts and a careful instruction-tuning + RL post-training pipeline, the model shows "glimpses of intuition" — for example, occasionally asserting that light "is made up of definite quantities of energy" or that gravity and acceleration are locally equivalent — but the author is openly cautious about claiming genuine out-of-distribution reasoning.
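
A toy sketch of that kind of anachronism filter follows; the blocklist is illustrative, and the write-up describes a far more aggressive pipeline:

```python
import re

# Illustrative blocklist; the actual pipeline filtered far more aggressively,
# removing any document that references Einstein, quantum mechanics, or
# other post-1900 physics.
ANACHRONISMS = re.compile(r"\b(einstein|quantum|relativity|photon)\b", re.IGNORECASE)

def is_period_clean(text: str) -> bool:
    """Reject documents containing post-1900 physics vocabulary."""
    return ANACHRONISMS.search(text) is None

docs = ["Mr. Maxwell's treatise on electricity ...", "Einstein's 1905 paper ..."]
clean = [d for d in docs if is_period_clean(d)]  # keeps only the Maxwell text
```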

Mr. Chatterbox (2026)

A ~340M-parameter chatbot (roughly the size of GPT-2-Medium) trained from scratch by Trip Venturella on 28,035 Victorian-era British texts published between 1837 and 1899, drawn from the British Library's blbooks dataset. Built with Andrej Karpathy's nanochat, the model was trained on approximately 2.93 billion tokens after filtering. It is unapologetically weak compared to frontier systems, illustrating just how data-hungry "public-domain-only" LLMs are, but it is one of the cleanest demonstrations that an entirely out-of-copyright LLM is feasible on modest hardware, and it does answer in a recognisably 19th-century voice.
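
The data bottleneck is easy to quantify: against the widely cited Chinchilla heuristic of roughly 20 training tokens per parameter, the filtered Victorian corpus is undersized by more than half:

```python
params = 340e6         # ~340M parameters (about GPT-2-Medium scale)
tokens = 2.93e9        # training tokens after filtering
chinchilla = 20        # widely cited compute-optimal tokens-per-parameter rule

print(f"tokens per parameter: {tokens / params:.1f}")                         # ~8.6
print(f"Chinchilla-optimal corpus: {params * chinchilla / 1e9:.1f}B tokens")  # ~6.8B
```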

Talkie-1930 (2026)

The largest publicly released vintage LLM to date: a 13B-parameter model trained on 260B tokens of pre-1931 English-language text (books, newspapers, periodicals, scientific journals, patents, and case law), built by Alec Radford (co-creator of GPT), Nick Levine, and David Duvenaud. The 1930 cutoff was chosen because U.S. copyright law places texts from that year into the public domain. The team also released talkie-web-13b-base, a "modern twin" with identical architecture and training FLOPs but trained on FineWeb, to enable controlled comparisons. The instruction-tuned talkie-1930-13b-it was post-trained without any modern chat data — instead using QA pairs extracted from pre-1931 etiquette manuals, encyclopaedias, letter-writing guides, and poetry collections, plus online DPO with an LLM-as-judge.
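
A skeletal sketch of that online-DPO data flow, with the judge abstracted to a callable; every name here is hypothetical, and the real pipeline is certainly more involved:

```python
import random
from typing import Callable

def build_preference_pair(
    prompt: str,
    generate: Callable[[str], str],
    judge: Callable[[str, str, str], int],
):
    """Sample two candidate answers and let an LLM judge pick the winner.

    `judge(prompt, a, b)` returns 0 if answer `a` is preferred, else 1.
    The resulting (prompt, chosen, rejected) triple is standard DPO input.
    """
    a, b = generate(prompt), generate(prompt)
    if judge(prompt, a, b) == 0:
        chosen, rejected = a, b
    else:
        chosen, rejected = b, a
    return {"prompt": prompt, "chosen": chosen, "rejected": rejected}

# Toy stand-ins so the sketch runs end to end.
fake_generate = lambda p: p + " " + random.choice(["Indeed.", "Quite so."])
fake_judge = lambda p, a, b: 0 if len(a) >= len(b) else 1
pair = build_preference_pair("How ought one address a duchess?", fake_generate, fake_judge)
```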

Related reading

A small selection of foundational and contextual writing on the field; contributions of further high-quality references are welcome.

  • Owain Evans, "Vintage Large Language Models" (the talk that popularised the term)
  • Benjamin Breen, "Are 'Vintage LLMs' the start of a new humanistic field?" (a humanist's perspective on the emerging field)

Contributing

Contributions are very welcome! This is a young, fast-moving field, and the list above is certainly incomplete. To suggest an addition or correction, please open a pull request that:

  1. Adds the project in the appropriate chronological position.
  2. Uses the same field structure as existing entries (short description; then GitHub, website / demo, paper, source HuggingFace repo, quantised / fine-tuned HuggingFace artefacts, reputable online publications, Hacker News discussions — as applicable).
  3. Cites only reputable sources (academic venues, established publications, official author write-ups, well-known technical blogs). Please avoid SEO blogspam and AI-generated summaries.
  4. Prefers trained-from-scratch vintage models over plain fine-tunes or "system-prompt cosplay" of modern LLMs, although carefully scoped fine-tunes (such as MonadGPT) are in scope.

Please also feel free to open issues to discuss whether a borderline project belongs on the list.

License

CC0

To the extent possible under law, the contributors of this list have waived all copyright and related or neighbouring rights to this work.