SQLite WAL in 2026: an architecture note on checkpoint starvation, writer serialization, and the real concurrency boundary


5 sources (3 primary) · March 26, 2026

D. Richard Hipp presenting SQLite in October 2017. WAL mode expands SQLite's room to breathe, but the design still keeps coordination deliberately narrow: append first, checkpoint later, and serialize writers.

SQLite's WAL mode is often introduced as the switch that "fixes concurrency." That description hides the real engineering trade. WAL changes who blocks whom, but it does not abolish SQLite's narrow coordination model. You still get one writer at a time, checkpoint progress becomes part of steady-state performance, and every participating process still has to live on the same host.[1][2]

For teams deciding whether SQLite can stay in the architecture, the high-signal question is not how many reads per second a benchmark can print. It is whether your workload can keep writes short, readers bounded, and checkpoint progress regular. That is the real contract.

1) WAL changes the write path, not the topology

In rollback-journal mode, SQLite writes changes into the main database file and preserves older pages in a rollback journal. In WAL mode, changes are appended to a separate WAL file first and later copied back into the main database during checkpointing.[1]

That is what buys the headline behavior. Readers can continue reading the main database while a writer appends new frames to the WAL, so readers and writers no longer block each other the way they do under a rollback journal.[1] The gain is real, but the topology stays tight:

only one write transaction can be active at a time,[2]
and WAL depends on a shared-memory wal-index, so all processes must be on the same machine; WAL does not work over a network filesystem.[1]

That second point matters more than it sounds. WAL is a concurrency upgrade inside SQLite's local-file model. It does not turn SQLite into a cross-host coordination service.
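To make the lock difference concrete, here is a minimal sketch using Python's standard-library `sqlite3` module (an assumption: the note itself does not prescribe a language). One connection keeps a read transaction open while a second connection tries to commit a write, first in WAL mode and then in the default rollback-journal (`delete`) mode. The helper name `writer_can_commit_during_open_read` is hypothetical, and the database lives in a temporary directory on a local disk.

```python
import os
import sqlite3
import tempfile


def writer_can_commit_during_open_read(journal_mode: str) -> bool:
    """Return True if a writer can COMMIT while another connection holds a read transaction open."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")
        setup = sqlite3.connect(path, isolation_level=None)
        setup.execute(f"PRAGMA journal_mode={journal_mode}")  # WAL mode persists in the file once set
        setup.execute("CREATE TABLE t(x INTEGER)")
        setup.execute("INSERT INTO t VALUES (1)")
        setup.close()

        # timeout=0 removes the busy handler, so a lock conflict raises immediately.
        reader = sqlite3.connect(path, isolation_level=None, timeout=0)
        writer = sqlite3.connect(path, isolation_level=None, timeout=0)
        try:
            reader.execute("BEGIN")
            reader.execute("SELECT count(*) FROM t").fetchall()  # read transaction stays open
            writer.execute("BEGIN IMMEDIATE")
            writer.execute("INSERT INTO t VALUES (2)")
            try:
                writer.execute("COMMIT")
                return True
            except sqlite3.OperationalError:  # "database is locked"
                writer.execute("ROLLBACK")
                return False
        finally:
            reader.close()
            writer.close()


print("wal:", writer_can_commit_during_open_read("wal"))
print("delete:", writer_can_commit_during_open_read("delete"))
```

On a local filesystem, the WAL run should report `True` and the `delete` run `False`: under a rollback journal the commit needs an exclusive lock, which the reader's shared lock blocks. `timeout=0` is used only so the conflict surfaces immediately instead of waiting.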

2) Reader end marks make checkpointing the real pressure point

SQLite's WAL documentation explains the core mechanism with reader end marks. When a read transaction begins, it remembers the point in the WAL that defines its snapshot. Newer frames appended after that point stay invisible to that reader.[1]
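The end-mark rule can be observed from the outside. The sketch below, again assuming Python's `sqlite3` module and a throwaway file, opens a read transaction, lets another connection commit, and counts rows inside and after that read transaction. `count_rows` and `snapshot_counts` are illustrative names.

```python
import os
import sqlite3
import tempfile


def count_rows(conn: sqlite3.Connection) -> int:
    return conn.execute("SELECT count(*) FROM t").fetchall()[0][0]


def snapshot_counts() -> tuple:
    """Row counts seen by one reader: at BEGIN, after another commit, and after its own COMMIT."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "snap.db")
        writer = sqlite3.connect(path, isolation_level=None, timeout=0)
        reader = sqlite3.connect(path, isolation_level=None, timeout=0)
        try:
            writer.execute("PRAGMA journal_mode=WAL")
            writer.execute("CREATE TABLE t(x INTEGER)")
            writer.execute("INSERT INTO t VALUES (1)")

            reader.execute("BEGIN")
            at_start = count_rows(reader)  # first read fixes this transaction's end mark

            writer.execute("INSERT INTO t VALUES (2)")  # autocommit: frames appended past that mark

            still_old = count_rows(reader)  # same snapshot, so the new frames stay invisible
            reader.execute("COMMIT")
            fresh = count_rows(reader)  # a new read transaction picks up a later end mark
            return at_start, still_old, fresh
        finally:
            reader.close()
            writer.close()


print(snapshot_counts())
```

The expected counts are `(1, 1, 2)`: the committed row already sits in the WAL, but it stays invisible to the reader until that reader starts a new read transaction with a later end mark.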

Checkpointing is where the pleasant story turns operational. A checkpoint moves WAL content back into the main database file, but it must stop when it reaches a frame that lies past the end mark of any current reader: copying that frame could overwrite part of the database file that the reader's older snapshot is still using.[1] SQLite also notes that, by default, an automatic checkpoint runs once the WAL reaches 1000 pages.[1]

This is why long readers can become the first real performance problem:

a checkpoint cannot finish cleanly while an older reader still pins an earlier end mark,
the WAL keeps growing as fresh writes continue,
read performance can degrade as readers have to consult a larger WAL, even with the wal-index in place.[1]

The useful mental model is that WAL moves contention away from the read/write lock edge and into checkpoint debt. The independent engineering write-up from Ten Thousand Meters describes the same failure mode in plainer operator language: long-lived readers can produce checkpoint starvation, and the system feels slower even when raw write volume is not extreme.[5]
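A sketch of checkpoint starvation in miniature, under the same assumptions (Python `sqlite3`, a temporary file, an illustrative helper name). `PRAGMA wal_checkpoint(PASSIVE)` returns three values: a busy flag, the number of frames in the WAL, and the number of frames that have been checkpointed into the database file.

```python
import os
import sqlite3
import tempfile


def checkpoint_progress() -> tuple:
    """(wal_frames, checkpointed_frames) with an old reader open, then after it finishes."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "ckpt.db")
        writer = sqlite3.connect(path, isolation_level=None, timeout=0)
        reader = sqlite3.connect(path, isolation_level=None, timeout=0)
        try:
            writer.execute("PRAGMA journal_mode=WAL")
            writer.execute("CREATE TABLE t(x INTEGER)")
            writer.execute("INSERT INTO t VALUES (1)")

            reader.execute("BEGIN")
            reader.execute("SELECT count(*) FROM t").fetchall()  # long reader pins this end mark

            for i in range(2, 50):  # 48 small commits, each appending frames past that mark
                writer.execute("INSERT INTO t VALUES (?)", (i,))

            _, log1, done1 = writer.execute("PRAGMA wal_checkpoint(PASSIVE)").fetchall()[0]
            reader.execute("COMMIT")  # release the old snapshot
            _, log2, done2 = writer.execute("PRAGMA wal_checkpoint(PASSIVE)").fetchall()[0]
            return (log1, done1), (log2, done2)
        finally:
            reader.close()
            writer.close()


(log1, done1), (log2, done2) = checkpoint_progress()
print(f"reader active:   {done1} of {log1} frames checkpointed")
print(f"reader finished: {done2} of {log2} frames checkpointed")
```

While the old reader is active, only frames up to its end mark can be copied, so the checkpointed count stays below the WAL frame count; once the reader finishes, a second passive checkpoint can copy everything. If some reader is always active in a real service, that first state simply persists while the WAL keeps growing.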

3) Write duration is the real concurrency budget

The transaction documentation keeps the hard boundary simple: SQLite supports multiple simultaneous read transactions, but only one simultaneous write transaction.[2] In WAL mode, BEGIN IMMEDIATE starts the write transaction right away, while BEGIN DEFERRED waits until the first write statement upgrades the transaction; BEGIN EXCLUSIVE behaves the same as IMMEDIATE under WAL.[2]

That makes write duration more important than writer count.
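Both halves of the transaction rule can be reproduced in a few lines. This sketch assumes Python's `sqlite3` module with `isolation_level=None`, so the `BEGIN` statements are issued explicitly; `try_execute` is a hypothetical helper. It shows a second `BEGIN IMMEDIATE` failing while another connection holds the write lock, and a `DEFERRED` transaction that has already read failing to upgrade after a competing commit. In WAL mode that second failure is reported as `SQLITE_BUSY` (extended code `SQLITE_BUSY_SNAPSHOT`), and retrying the statement cannot help because the transaction's snapshot is stale; the transaction has to be restarted.

```python
import os
import sqlite3
import tempfile


def try_execute(conn: sqlite3.Connection, sql: str) -> str:
    """Run one statement; return "ok" or the error text instead of raising."""
    try:
        conn.execute(sql)
        return "ok"
    except sqlite3.OperationalError as exc:
        return str(exc)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "writers.db")
    a = sqlite3.connect(path, isolation_level=None, timeout=0)  # timeout=0: fail fast, no waiting
    b = sqlite3.connect(path, isolation_level=None, timeout=0)
    a.execute("PRAGMA journal_mode=WAL")
    a.execute("CREATE TABLE t(x INTEGER)")

    # 1) One writer at a time: B cannot begin while A holds the write lock.
    a.execute("BEGIN IMMEDIATE")
    second_writer = try_execute(b, "BEGIN IMMEDIATE")
    a.execute("COMMIT")

    # 2) DEFERRED starts as a read. If another connection commits first,
    #    the later upgrade to a write fails because A's snapshot is stale.
    a.execute("BEGIN")  # DEFERRED is the default
    a.execute("SELECT count(*) FROM t").fetchall()
    b.execute("INSERT INTO t VALUES (1)")  # B commits in autocommit mode
    upgrade = try_execute(a, "INSERT INTO t VALUES (2)")
    if a.in_transaction:
        a.execute("ROLLBACK")

    a.close()
    b.close()

print("second BEGIN IMMEDIATE:", second_writer)
print("DEFERRED upgrade after a competing commit:", upgrade)
```

Starting write transactions with `BEGIN IMMEDIATE` moves the lock conflict to the `BEGIN` statement, where a busy timeout can wait it out, instead of surfacing midway through a transaction that has already done work.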

If application code opens a write transaction and then performs HTTP calls, retries business logic, or waits on some slow upstream before COMMIT, every other writer waits behind that lock window. The Ten Thousand Meters analysis is useful here because it translates the formal SQLite rule into the failure pattern teams actually see: "database is locked" usually means a write path lasted too long, not that SQLite somehow forgot how to scale.[5]

For design review, the working question is straightforward: how much non-SQL work happens after the transaction begins and before the writer lets go? That is often a more predictive metric than average query latency.
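One way to drive that number toward zero is to finish the slow work before the write lock is taken. A minimal sketch with Python's `sqlite3` module; `record_price`, the `prices` table, and `slow_upstream` (a stand-in for an HTTP call or another slow dependency) are all hypothetical.

```python
import os
import sqlite3
import tempfile
from typing import Callable


def record_price(conn: sqlite3.Connection, sku: str, fetch_price: Callable[[str], float]) -> None:
    """Do slow non-SQL work first, then hold the write lock only for the SQL itself."""
    price = fetch_price(sku)  # hypothetical slow call; no transaction is open yet
    conn.execute("BEGIN IMMEDIATE")  # write lock taken here...
    try:
        conn.execute("INSERT INTO prices(sku, price) VALUES (?, ?)", (sku, price))
        conn.execute("COMMIT")  # ...and released here
    except BaseException:
        conn.execute("ROLLBACK")
        raise


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "prices.db")
    conn = sqlite3.connect(path, isolation_level=None, timeout=0)
    other = sqlite3.connect(path, isolation_level=None, timeout=0)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute("CREATE TABLE prices(sku TEXT, price REAL)")

    def slow_upstream(sku: str) -> float:
        # Another writer commits during the "slow" call; with timeout=0 this would
        # raise "database is locked" if record_price already held the write lock.
        other.execute("INSERT INTO prices VALUES ('other', 1.0)")
        return 9.5

    record_price(conn, "abc", slow_upstream)
    rows = conn.execute("SELECT sku, price FROM prices ORDER BY sku").fetchall()
    other.close()
    conn.close()

print(rows)
```

The stub commits from a second connection during the "slow" call. With `timeout=0` that commit would fail with "database is locked" if `record_price` had already begun its write transaction, so a clean run printing `[('abc', 9.5), ('other', 1.0)]` indicates the lock window covers only the SQL.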

4) PRAGMAs shape pressure; they do not erase it

SQLite gives you a few direct controls over WAL behavior. PRAGMA journal_mode=WAL enables WAL mode. PRAGMA wal_autocheckpoint sets the page threshold for automatic checkpoints. PRAGMA busy_timeout tells SQLite how long, in milliseconds, to wait on a locked database before returning SQLITE_BUSY.[3]
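A sketch of setting those three PRAGMAs on a fresh connection with Python's `sqlite3` module. The helper name `open_wal_connection` and the specific values (1000 pages, 2000 ms) are illustrative, not recommendations. `wal_autocheckpoint` and `busy_timeout` apply per connection, while WAL mode persists in the database file.

```python
import os
import sqlite3
import tempfile


def open_wal_connection(path: str, autocheckpoint_pages: int = 1000, busy_ms: int = 2000) -> sqlite3.Connection:
    """Open a connection with its WAL-related PRAGMAs set explicitly (values are illustrative)."""
    conn = sqlite3.connect(path, isolation_level=None, timeout=0)  # let the PRAGMA own the busy timeout
    mode = conn.execute("PRAGMA journal_mode=WAL").fetchall()[0][0]
    if mode != "wal":  # e.g. an in-memory database reports "memory" instead
        raise RuntimeError(f"WAL not enabled; journal_mode is {mode!r}")
    conn.execute(f"PRAGMA wal_autocheckpoint={int(autocheckpoint_pages)}")  # threshold in pages
    conn.execute(f"PRAGMA busy_timeout={int(busy_ms)}")  # wait in milliseconds before SQLITE_BUSY
    return conn


with tempfile.TemporaryDirectory() as tmp:
    conn = open_wal_connection(os.path.join(tmp, "app.db"))
    settings = {
        name: conn.execute(f"PRAGMA {name}").fetchall()[0][0]
        for name in ("journal_mode", "wal_autocheckpoint", "busy_timeout")
    }
    conn.close()

print(settings)
```

This should print `{'journal_mode': 'wal', 'wal_autocheckpoint': 1000, 'busy_timeout': 2000}`. Because the two per-connection settings are not stored in the file, every connection that needs them has to set them again.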

These settings are useful because they change failure shape:

a lower wal_autocheckpoint threshold keeps the WAL shorter, but increases checkpoint frequency,
a higher threshold reduces checkpoint churn, but raises worst-case WAL size and read drag,
busy_timeout can smooth short lock races, but it does not cure structurally long write transactions.[3]

Used well, these PRAGMAs act like pressure-management tools. Used poorly, they become a way to postpone confronting an application path that holds the write lock too long.
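The size side of the wal_autocheckpoint trade-off can be measured. This sketch (Python `sqlite3`, temporary files, a hypothetical helper `wal_bytes_after_writes`) makes many small commits and reports the `-wal` file size with a low threshold and with auto-checkpointing disabled (`0`). By default SQLite rewinds and reuses the WAL file after a completed checkpoint rather than shrinking it, so the reported size reflects the peak between checkpoints.

```python
import os
import sqlite3
import tempfile


def wal_bytes_after_writes(autocheckpoint_pages: int, commits: int = 200) -> int:
    """Return the -wal file size after many small commits (0 disables auto-checkpoint)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "size.db")
        conn = sqlite3.connect(path, isolation_level=None, timeout=0)
        try:
            conn.execute("PRAGMA journal_mode=WAL")
            conn.execute(f"PRAGMA wal_autocheckpoint={autocheckpoint_pages}")
            conn.execute("CREATE TABLE blobs(b BLOB)")
            for _ in range(commits):
                # Each autocommit INSERT is its own transaction and appends new frames.
                conn.execute("INSERT INTO blobs VALUES (?)", (os.urandom(3000),))
            # Measure while the connection is open; closing the last connection
            # normally checkpoints and removes the -wal file.
            return os.path.getsize(path + "-wal")
        finally:
            conn.close()


small = wal_bytes_after_writes(16)
large = wal_bytes_after_writes(0)
print(f"wal_autocheckpoint=16: {small} bytes; auto-checkpoint off: {large} bytes")
```

The low-threshold run should leave a much smaller `-wal` file. The flip side, not measured here, is that it runs a checkpoint every few commits, and each one adds work to the commit that triggers it.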

5) The deployment boundary is still easy to state

SQLite's "Appropriate Uses" page is unusually clear about fit. SQLite is for local data, embedded systems, application files, caches, and other cases where one application or one device owns the durable state. Client/server engines fit better when many separate machines need concurrent access to the same data or when centralized administration and higher write concurrency dominate the requirement.[4]

Put into architecture-review language, WAL is strongest when:

the database file belongs to one app, one service, or one host boundary,
write transactions stay short and easy to reason about,
long analytical readers are rare or explicitly managed.[1][2][4]

The warning zone arrives when:

several hosts need to write the same database,
a hot write path coexists with long snapshot readers,
the easiest fix people reach for is "raise the timeout" instead of shortening transactions.[1][3][4][5]

An inference from SQLite's official guidance: WAL is usually an excellent fit when one team owns the whole write path and can reason about every transaction from call site to commit. It weakens quickly once the database stops being local state and starts acting as a shared coordination surface.

Bottom line

SQLite WAL is best read as a precise trade, not a general concurrency miracle. It buys reader/writer overlap by turning writes into append-first operations and by making checkpointing explicit. In exchange, it keeps the same narrow system shape: one writer at a time, same-host coordination, and performance that depends heavily on transaction length and checkpoint discipline.[1][2][3]