SQLite: Wal2 Mode Notes
SQLite has different “modes” to facilitate recovery; this new WAL2 mode addresses a problem in WAL (version 1) where the recovery log can potentially grow very large.
It’s solving a real problem, but not considered stable for production yet.
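For anyone who wants to poke at it: wal2 is opt-in per database and is set the same way as the other journal modes. A minimal sketch using Python's sqlite3 module, assuming it is linked against a build of SQLite's wal2 branch (stock releases don't recognize the mode and simply report the current one); the database file name is made up:

    import sqlite3

    # Assumes the underlying SQLite library was built from the wal2 branch;
    # on a standard release the PRAGMA is ignored and the current journal
    # mode is returned instead of "wal2".
    conn = sqlite3.connect("app.db")  # hypothetical database file
    mode = conn.execute("PRAGMA journal_mode=wal2").fetchone()[0]
    print(mode)  # "wal2" on a wal2-branch build, e.g. "delete" or "wal" otherwise
    conn.close()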
I am curious what the situation is where the recovery log grows to a large size, and what the actual consequence of this would be.
We have been using SQLite in WAL mode for over half a decade and never witnessed this. Several of our databases can see concurrent access from hundreds of users with transactions in the 1-10 megabyte range, so I find it a bit odd this never came up.
> I am curious what the situation is where the recovery log grows to large size
From the link, "if a writer writes to the database while a checkpoint is ongoing [...] it does mean that the wal file may grow indefinitely if the checkpointer never gets a chance to finish without a writer appending to the wal file. There are also circumstances in which long-running readers may prevent a checkpointer from checkpointing the entire wal file - also causing the wal file to grow indefinitely in a busy system."
> what the actual consequence of this would be.
SQLite runs in a surprising number of places. In an embedded environment disk space may be limited.
I've seen the WAL grow when there's a constant write load and a long background read transaction keeps an old version alive. If you also never reach a point where there are 0 connections to the DB, you can keep this large WAL file around (most of it is unused) for a long time.
I've had to restart services when WAL files had grown to multiple GBs and wouldn't shrink.
I ran into this when I was importing about 1TB into a DB while simultaneously reading from it and performing tasks. This was all in a dev environment, but it did come up.
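The long-reader case described above is easy to reproduce. A rough sketch with Python's built-in sqlite3 module (file name, blob sizes and timings are arbitrary): a background connection holds an open read transaction while another connection keeps writing, and the -wal file grows well past the auto-checkpoint threshold because checkpoints can't advance past the reader's snapshot:

    import os
    import sqlite3
    import threading
    import time

    DB = "walgrowth.db"  # throwaway demo database

    conn = sqlite3.connect(DB, isolation_level=None)  # autocommit, one txn per INSERT
    conn.execute("PRAGMA journal_mode=wal")
    conn.execute("CREATE TABLE IF NOT EXISTS t(id INTEGER PRIMARY KEY, b BLOB)")

    def long_reader():
        # A read transaction only pins its snapshot once it actually reads,
        # hence the SELECT right after BEGIN. While it stays open, the
        # checkpointer cannot copy frames written after that snapshot.
        r = sqlite3.connect(DB, isolation_level=None)
        r.execute("BEGIN")
        r.execute("SELECT count(*) FROM t").fetchone()
        time.sleep(20)
        r.execute("ROLLBACK")
        r.close()

    threading.Thread(target=long_reader, daemon=True).start()
    time.sleep(1)

    # Constant write load while the reader is open: watch the -wal file grow
    # past the default auto-checkpoint threshold (~4 MB with 4 KB pages).
    for i in range(200):
        conn.execute("INSERT INTO t(b) VALUES (randomblob(100000))")
        if i % 50 == 0:
            print("wal size:", os.path.getsize(DB + "-wal"))
    conn.close()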
Citation needed.
For which claim?
The recovery log growing large is explained in [1] - "it does mean that the wal file may grow indefinitely [...] There are also circumstances [...] causing the wal file to grow indefinitely in a busy system."
Not ready for production is [2] - although that is from 2 years ago, the WAL2 branch still isn't merged to trunk [3], which you'd expect if it were ready to go, I think?
[1] https://www.sqlite.org/cgi/src/doc/wal2/doc/wal2.md
[2] https://sqlite.org/forum/forumpost/17249fb83a?t=c&unf
[3] https://www.sqlite.org/cgi/src/timeline?r=wal2
I wonder why they limit it to 2 WAL files instead of creating a new WAL file once the latest one reaches a certain size and a checkpoint is done, then deleting the WAL files that have been successfully flushed to the main DB. That could minimize the spikiness of WAL file garbage collection under less-than-optimal circumstances.
I think the idea is you only need 2 to achieve this new property of “avoiding unbounded growth”. You need the “one that is being written to” and the “other one which can be checkpointed into the main DB”.
Reading between the lines a bit, I suspect unbounded growth is still possible with WAL2 - If you have an infinitely long running read transaction, data that modifies the page that’s being read from can never be checkpointed into the main DB.
So, setting aside the case of readers that never progress: with WAL1, even if your read transactions were finishing and fast-forwarding through commits, it was possible to be in situations where, for a very long time (potentially forever), at least one reader was behind the most recent write transaction, which meant the WAL could never be truncated.
Now with this WAL2 mode I think it should be guaranteed that so long as your read transactions are eventually ending and progressing towards more recent commits, you’ll always eventually find a time to checkpoint and truncate your WAL2 files.
I should say though, I think your proposal makes sense as a way to minimize the impact of any individual checkpoint!
Though you still have to occasionally support arbitrarily large WAL files to an extent because there’s no limit to how big any one write transaction is.
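For what it's worth, in legacy WAL mode you can see that "reader holding back the checkpointer" situation from the application side. A small sketch with Python's sqlite3 against a hypothetical database path: PRAGMA wal_checkpoint(TRUNCATE) returns a (busy, wal_frames, checkpointed_frames) row, and busy=1 or checkpointed < wal_frames means something kept the checkpoint from completing, so the -wal file couldn't be reset:

    import sqlite3

    conn = sqlite3.connect("app.db")  # hypothetical path
    # Request a full checkpoint that truncates the -wal file if it succeeds.
    busy, wal_frames, checkpointed = conn.execute(
        "PRAGMA wal_checkpoint(TRUNCATE)"
    ).fetchone()
    print(busy, wal_frames, checkpointed)
    conn.close()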
You correctly identified that with arbitrarily long transactions you can't GC the WAL they refer to, so you will always have potentially arbitrarily large WAL files. My suggestion was indeed not to prevent that (as you can't) but to minimize the impact of each individual WAL GC, since you can't even GC data in a WAL file that is still referenced but no longer relevant.
I see it as a kind of reference counting for WAL data. A WAL file can only be deleted once the sum of all refcounts of transactions in it is zero (synced to main DB or aborted). So minimal GC impact would be if every transaction had its own WAL file, but that brings overhead itself, so a sensible tradeoff might be to find a middle ground where you split WAL files once they reach a certain (configurable) size.
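A toy model of that refcount idea, just to make the bookkeeping concrete (purely illustrative Python, not anything SQLite actually does; it only models when a segment file could be removed and ignores checkpoint ordering and reader-visibility rules):

    from collections import defaultdict

    class WalSegments:
        # WAL split into fixed-size segments; a segment can be deleted once
        # it is fully checkpointed and no open reader still references it.

        def __init__(self, max_bytes):
            self.max_bytes = max_bytes
            self.segments = [0]            # segment ids, last one is current
            self.sizes = defaultdict(int)  # bytes appended per segment
            self.refs = defaultdict(int)   # open readers pinned to a segment
            self.checkpointed = set()      # segments fully copied to the main DB

        def write(self, nbytes):
            cur = self.segments[-1]
            self.sizes[cur] += nbytes
            if self.sizes[cur] >= self.max_bytes:
                self.segments.append(cur + 1)   # roll over to a new segment

        def begin_read(self):
            seg = self.segments[-1]
            self.refs[seg] += 1            # reader's snapshot ends in this segment
            return seg

        def end_read(self, seg):
            self.refs[seg] -= 1

        def checkpoint(self, seg):
            self.checkpointed.add(seg)

        def collect(self):
            # Keep the current segment, any segment a reader still references,
            # and anything not yet checkpointed; everything else is deletable.
            current = self.segments[-1]
            keep = [s for s in self.segments
                    if s == current or self.refs[s] > 0 or s not in self.checkpointed]
            self.segments = keep
            return keep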
As an informed guess: backward compatibility.
This way an SQLite library which has never heard of a WAL2 file will just use WAL mode, instead of potentially being confused by the existence of multiple .wal files it doesn't know about.
I don't think that would work; there is also a -wal2 file which might be the active WAL file at the time a DB was closed. If you took an older library and tried to open that same DB, you would lose the data in the -wal2 file since that older library would ignore it.
You won't have a wal2 unless one of these connections generated it. The question is, can sqlite+wal2 write to a database while sqlite+wal is writing to it, without terrible things happening.
The answer kinda has to be 'yes' right? This is SQLite we're speculating about here.