Settings

Theme

SQLite: Wal2 Mode Notes

sqlite.org

63 points by umanghere 3 years ago · 15 comments

Reader

ikawe 3 years ago

SQLite has different “modes” to facilitate recovery, this new WAL2 mode addresses the problem in WAL (version 1) where the recovery log could potentially grow very large.

It’s solving a real problem, but not considered stable for production yet.

  • bob1029 3 years ago

    I am curious what the situation is where the recovery log grows to large size and what the actual consequence of this would be.

    We have been using SQLite in WAL mode for over half a decade and never witnessed this. Several of our databases can see concurrent access from hundreds of users with transactions in the 1-10 megabyte range, so I find it a bit odd this never came up.

    • zimpenfish 3 years ago

      > I am curious what the situation is where the recovery log grows to large size

      From the link, "if a writer writes to the database while a checkpoint is ongoing [...] it does mean that the wal file may grow indefinitely if the checkpointer never gets a chance to finish without a writer appending to the wal file. There are also circumstances in which long-running readers may prevent a checkpointer from checkpointing the entire wal file - also causing the wal file to grow indefinitely in a busy system."

    • Seattle3503 3 years ago

      > what the actual consequence of this would be.

      SQLite runs in a surprising number of places. In an embedded environment disk space may be limited.

    • pkhuong 3 years ago

      I've seen the WAL grow when there's a constant write load and a long background read transaction keeps an old version alive. If you also never reach a point where there are 0 connections to the DB, you can keep this large WAL file around (most of it is unused) for a long time.

      I've had to restart services when WAL files had grown to multiple GBs and wouldn't shrink.

    • muttled 3 years ago

      I ran into this when I was importing about 1TB into a DB while simultaneously reading from it and performing tasks. This was all in a dev environment, but it did come up.

  • jessermeyer 3 years ago

    Citation needed.

eis 3 years ago

I wonder why they limit it to 2 WAL files instead of creating a new WAL file once the latest one reaches a certain size and a checkpoint is done. Then delete the WAL files that have been successfully flushed to the main DB. That could minimize the spikyness of WAL file garbage collection under less than optimal circumstances.

  • ikawe 3 years ago

    I think the idea is you only need 2 to achieve this new property of “avoiding unbounded growth”. You need the “one that is being written to” and the “other one which can be checkpointed into the main Db”.

    Reading between the lines a bit, I suspect unbounded growth is still possible with WAL2 - If you have an infinitely long running read transaction, data that modifies the page that’s being read from can never be checkpointed into the main DB.

    So setting aside the case of readers that never progress, with WAL1, even if your read transactions were finishing and fast forwarding through commits, it was possible to be in situations where, for a very long time (even potentially forever), at least one reader was behind the most recent write transaction, which meant the wal could never be truncated.

    Now with this WAL2 mode I think it should be guaranteed that so long as your read transactions are eventually ending and progressing towards more recent commits, you’ll always eventually find a time to checkpoint and truncate your WAL2 files.

    • ikawe 3 years ago

      I should say though, I think your proposal makes sense as a way to minimize the impact of any individual checkpoint!

      Though you still have to occasionally support arbitrarily large WAL files to an extent because there’s no limit to how big any one write transaction is.

      • eis 3 years ago

        You correctly identified that with arbitrarily long transactions you can't GC the WAL that it is referring to so you will always have potentially arbitarily large WAL files. My suggestion was indeed not to prevent that (as you can't) but to minimize the impact of each individual WAL GC as you can't even GC data that is in the WAL that is being referenced but not relevant anymore.

        I see it as a kind of reference counting for WAL data. A WAL file can only be deleted once the sum of all refcounts of transactions in it are zero (synced to main DB or aborted). So minimal GC impact would be if every transaction had its own WAL file but that brings overhead itself so a sensible tradeoff might be to find a middle ground where you split WAL files once they reach a certain size (configurable).

  • samatman 3 years ago

    As an informed guess: backward compatibility.

    This way an SQLite library which has never heard of a WAL2 file will just use WAL mode instead, instead of potentially being confused by the unheard-of existence of multiple .wal files.

    • ok_dad 3 years ago

      I don't think that would work; there is also a -wal2 file which might be the active WAL file at the time a DB was closed. If you took an older library and tried to open that same DB, you would lose the data in the -wal2 file since that older library would ignore it.

      • samatman 3 years ago

        You won't have a wal2 unless one of these connections generated it. The question is, can sqlite+wal2 write to a database while sqlite+wal is writing to it, without terrible things happening.

        The answer kinda has to be 'yes' right? This is SQLite we're speculating about here.

Keyboard Shortcuts

j
Next item
k
Previous item
o / Enter
Open selected item
?
Show this help
Esc
Close modal / clear selection