GitHub - joeduffy/foundations-of-llms

The Foundations of Large Language Models, 1943-2026

⬇️ PDF (Archive.org) | ⬇️ PDF (Google Drive)

A comprehensive collection of the foundational papers in the development of large language models, spanning from McCulloch-Pitts neurons (1943) to modern reasoning systems (2026).

Overview

This book assembles the essential papers that shaped the development of large language models. Each paper represents a genuine breakthrough moment—introducing transformative concepts rather than incremental improvements. The collection is organized chronologically to trace the intellectual progression from basic neural networks through modern transformer architectures.

Structure

Core Foundations (56 papers)

Part I: Neural Beginnings & Learning Mechanisms (1943–1990) - 5 papers
Part II: Sequence Models & Word Embeddings (1997–2013) - 6 papers
Part III: Deep Learning & Attention (2012–2015) - 10 papers
Part IV: The Transformer Era and Pretraining Revolution (2016–2019) - 9 papers
Part V: Emergence and Scale (2019–2020) - 8 papers
Part VI: Efficiency, Alignment, and Reasoning (2021–2022) - 10 papers
Part VII: Open LLMs and Modern Frontier (2023–2024) - 8 papers

Appendices (15 papers)

Appendix A: Emerging Results (2023–2025) - 5 papers
Appendix B: Foundations of Agents (2022–2024) - 4 papers
Appendix C: System Reports & Production Breakthroughs (2023–2025) - 6 papers

Building the Book

Prerequisites

LaTeX distribution (TeX Live, MiKTeX, or MacTeX)
Make (for build automation)

Quick Start

# Build the complete book
make book

# Quick development build
make quick

# Clean build artifacts
make clean

Directory Structure

├── main.tex              # Main LaTeX document
├── content/              # LaTeX source files for each section
├── pdfs/                 # Paper PDFs
├── out/                  # Generated output files
├── Makefile              # Build automation
└── README.md             # This file

Paper Selection Criteria

A paper belongs in the book if it meets at least one of the following:

It pioneered a technique or architecture that is a direct ancestor of modern LLMs.
It is a necessary stepping stone — later foundational work could not have happened without it.
It introduced a component, method, or insight now standard in frontier models (evidenced by citations in system cards for GPT-4, Gemini, Claude, DeepSeek, Llama, etc.).
For system reports: it documents a qualitatively new capability or paradigm (not just another strong model).

A paper does not belong if:

It is incremental over an already-included paper with no distinct lasting technique.
It is a survey or review (not a primary contribution).
It is overly vendor-specific without advancing the broader state of the art.
It is too recent to have proven its significance and is not already referenced by frontier systems.

Contributing

This is a curated academic collection. For suggestions or corrections, please open an issue.

License

The LaTeX source code and organizational content are provided under MIT License. Individual papers retain their original copyright and are included for academic purposes under fair use.

Acknowledgments

This collection builds upon decades of research by hundreds of contributors to the field of artificial intelligence. Special recognition goes to the pioneering researchers whose work made modern large language models possible.