Launching Alpha Research: Karpathy Knowledge Bases for All of Humanity's Knowledge


April 9, 2026

Available now

Alpha Book is live, and there are three clear ways to try it.

Alpha Research is the company. Alpha Book is the first live research dataset. The dashboard is for browsing datasets, runs, and artifacts in the browser. The RESEARCH CLI is for running the same workflow against your own data or from your coding agent.

  • Alpha Book: ask questions across 75,000 public domain books right away.
  • Dashboard: inspect datasets, runs, transcripts, and research artifacts in the browser.
  • RESEARCH CLI: point the workflow at your own dataset, then open the resulting runs in the dashboard.

"Something I'm finding very useful recently: using LLMs to build personal knowledge bases for various topics of research interest." Andrej Karpathy

Alpha Research builds open source research products that let coding agents search large datasets, collect evidence, and return inspectable outputs. Our mission is to help knowledge workers work 100x faster.

Today we're launching and open sourcing our first platform, Alpha Book, built on a mirror of the Project Gutenberg dataset with 75,000 books. An Ivy League author tried it and, within 10 minutes, found crucial sources he had not known about for a book he is writing.

Start with the live example

Alpha Book

See the product on a real public corpus and understand the quality of answers before you install anything.

Open Alpha Book

Inspect the web app

Dashboard

Browse datasets, inspect runs, read transcripts, and review the artifacts a completed research job returns.

Open dashboard datasets and runs

Bring your own data

RESEARCH CLI

Install the CLI if you want to ingest a private or proprietary dataset and hand runs off to the dashboard.

Open CLI setup

The Bottleneck in Knowledge Work

Knowledge work has two key bottlenecks: finding evidence and testing hypotheses. We want to make both dramatically faster.

Alpha Research extends classic RAG with general coding agents that can use filesystem tools like ripgrep, reason iteratively, and search across broad corpora for evidence instead of relying on a single retrieval pass. Those same agents can write ad hoc scripts to label datasets, compare slices, and test research hypotheses directly against the underlying corpus.
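To make the idea concrete, here is a minimal sketch of iterative, multi-pass search over a folder of text files, followed by a small ad hoc comparison of hits per book. It is an illustration only, not Alpha Research's implementation: the function names (`grep_corpus`, `iterative_search`, `hits_per_file`, `build_demo_corpus`), the `enough` threshold, and the two-book demo corpus are all hypothetical, and Python's `re` module stands in for ripgrep so the example runs without external tools.

```python
import re
import tempfile
from collections import Counter
from pathlib import Path


def grep_corpus(root: Path, pattern: str) -> list[tuple[str, int, str]]:
    """Return (file name, line number, line text) for every matching line.

    A pure-Python stand-in for a ripgrep call (assumption for portability).
    """
    regex = re.compile(pattern, re.IGNORECASE)
    hits = []
    for path in sorted(root.rglob("*.txt")):
        with path.open(encoding="utf-8", errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if regex.search(line):
                    hits.append((path.name, number, line.strip()))
    return hits


def iterative_search(root: Path, patterns: list[str], enough: int = 3):
    """Try patterns in order, accumulating evidence until `enough` hits exist.

    This mimics an agent that widens its query when the first pass is thin,
    instead of stopping after a single retrieval pass.
    """
    evidence = []
    seen = set()
    for pattern in patterns:
        for hit in grep_corpus(root, pattern):
            key = hit[:2]  # (file name, line number) identifies a passage
            if key not in seen:
                seen.add(key)
                evidence.append(hit)
        if len(evidence) >= enough:
            break
    return evidence


def hits_per_file(evidence) -> Counter:
    """Ad hoc slice comparison: how many evidence lines each book contributed."""
    return Counter(name for name, _, _ in evidence)


def build_demo_corpus(root: Path) -> None:
    """Write a tiny hypothetical corpus so the sketch is self-contained."""
    (root / "book_a.txt").write_text(
        "The whale surfaced at dawn.\nThe crew slept.\n", encoding="utf-8"
    )
    (root / "book_b.txt").write_text(
        "A leviathan of the deep.\nNo whale was seen.\nThe Whale returned.\n",
        encoding="utf-8",
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        build_demo_corpus(root)
        # First pass finds three "whale" lines; the second pass adds a synonym.
        evidence = iterative_search(root, ["whale", "leviathan"], enough=4)
        for name, number, text in evidence:
            print(f"{name}:{number}: {text}")
        print(dict(hits_per_file(evidence)))
```

Every returned item carries a file name and line number, so each claim can be traced back to its passage. A real agent would choose the follow-up patterns itself based on what the earlier passes returned; here they are fixed in advance to keep the sketch short.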

The Roadmap

Our roadmap is simple: repeat Alpha Book across every public dataset that matters. Next we'll launch Alpha Econ with US Census, BLS, NBER, and related datasets. Then Alpha Justice with US Supreme Court cases. Then Alpha NYC with New York City's open data. And so on.

The long-term goal is straightforward: build knowledge bases for the entirety of humanity's knowledge.

How You Can Help

  1. Interested in quantitative humanities research? Try Alpha Book now, then suggest datasets in our Discord.
  2. Know a historian, author, economist, or other humanities researcher? Introduce them to me at ryan@dolphinmade.com. I'm especially excited to support young researchers who are eager to use technology to power their research.
  3. If you have introductions to large research libraries, archives, museums, presidential libraries, universities, historical societies, the National Archives, or Google Books, I'd love to meet the researchers and librarians who want to attach the world's best librarian to their collections.
  4. For enterprise teams with large proprietary datasets in areas like AdTech, real estate, or social media trends: Alpha Research also supports private, high-value corpora. Agentic search produces better results, drives more searches, and increases user satisfaction and retention.

We want to set the standard for agentic knowledge work by building the best and easiest-to-use research platform. The mission is to usher in a new era of quantitative research by building knowledge bases for the entirety of humanity's knowledge.

Use Alpha Book with your coding agent

If you already work in Codex, Claude Code, or another coding agent, the Alpha Book skill is the fastest way to get started. It teaches the agent how to query the corpus and return evidence-backed answers.

Expected result: your agent opens Alpha Book, follows the skill instructions, and returns research output you can inspect or continue in the dashboard.

Open Alpha Book skill instructions

Go to https://alpha-book.org/skill.md and follow the instructions there.