Transaction Isolation in Postgres

160 points by jerrinot 2 years ago · 26 comments

Reader

One caveat to serializable transactions in Postgres is that ALL concurrent transactions must be running with the SERIALIZABLE isolation level to protect against serialization anomalies.

This is a bit jarring if you come from MSSQL, which implements the SERIALIZABLE isolation level using locks. In MSSQL, you can rest assured that a serializable transaction will not be affected by changes from other concurrent transactions, regardless of their isolation level.

In Postgres, you may have a set of transactions all participating in SERIALIZABLE isolation today, but tomorrow someone adds another script without the SERIALIZABLE isolation level, and now your protected paths are no longer isolated.

magicalhippo 2 years ago

We use Sybase SQLAnywhere at work, which also implements SERIALIZABLE using locks. Naive me thought that meant a lock on the table, but no, it locks all the rows... Not great for a table with many rows!
We were essentially trying to avoid inserting the same value twice, so we ditched SERIALIZABLE and instead added a unique index along with a retrying loop on the client side.
brasetvik 2 years ago

Or from the other perspective of the trade-off: One caveat with MSSQL is that ALL concurrent transactions must pay the overhead if _some_ transactions need serializable guarantees?
- vore 2 years ago
  
  Only if they touch the same data. If they are touching disjoint sets of data then there is no overhead to be paid by non-SERIALIZABLE transactions.
- bob1029 2 years ago
  
  There has been some recent improvement to locking behavior:
  https://learn.microsoft.com/en-us/sql/relational-databases/p...
Fire-Dragon-DoL 2 years ago

Oh that's super nasty, is it mentioned somewhere in the doc?
Is it the same for repeatable read?
- nuttingd 2 years ago
  
  I have read the docs plenty of times, but it never stuck for me until I read the (free!) PostgreSQL 14 Internals ebook: https://postgrespro.com/community/books/internals
  Quoted from Page 70:
  If you use the Serializable level, it must be observed by all transactions of the application. When combined with other levels, Serializable behaves as Repeatable Read without any notice. So if you decide to use the Serializable level, it makes read sense to modify the default_transaction_isolation parameter value accordingly -- even though someone can still overwrite it by explicitly setting a different level.
  I had a real "WTF?" moment when I read this the first time.
  - Fire-Dragon-DoL 2 years ago
    
    Interesting, this book looks really cool!
- gwen-shapira 2 years ago
  
  It is mentioned in the doc, but can be easy to mis-understand:
  "If a pattern of reads and writes among concurrent serializable transactions would create a situation which could not have occurred for any serial (one-at-a-time) execution of those transactions, one of them will be rolled back with a serialization_failure error."
  Note that it says nothing about the non-serializable transactions.
  https://www.postgresql.org/docs/current/sql-set-transaction....
  - franckpachot 2 years ago
    
    It is in the Wiki: https://wiki.postgresql.org/wiki/Serializable Any transaction which is run at a transaction isolation level other than SERIALIZABLE will not be affected by SSI. If you want to enforce business rules through SSI, all transactions should be run at the SERIALIZABLE transaction isolation level, and that should probably be set as the default.
    I guess it is the same for all MVCC databases. They don't want to acquire a read lock just in case another transaction is in Serializable
    
    gwen-shapira 2 years ago
    
    The OP mentioned that SQLServer does lock (but then, it doesn't use SSI - just SI).
    The recommendation in PG docs to use a combination of SERIALIZABLE and READ ONLY transactions seems like a good one for read-heavy systems.
TomaszZielinski 2 years ago

This is pretty intuitive when you think about predicate locks that Postgres uses to detect conflicts.
If you have one SERIALIZABLE transaction that sets some locks, and one non-SERIALIZABLE that doesn't, then they can't "see" each other "by definition".
But your point stands--there could be some kind of "warning flag" somewhere, that would alert if SERIALIZABLE transactions overlap with non-SERIALIZABLE ones. Or maybe there _already_ is something like that??

ikhare 2 years ago

When I first learned about isolation levels in databases I was shocked that databases could “lie” to me. I think like most devs focused on the product end I just expected databases to be a magical black box that worked perfectly. Which I assumed was just the strictest definition of serializability without really thinking about it.

After watching some of Andy Pavlo’s lectures[1] it all just dawned on me: Databases are just like any other piece of code you write and have to think about all the tradeoffs with algorithms and book keeping to keep things efficient and providing the guarantees you want.

I highly recommend that lecture series.

Shameless plug: the reason I watched those lectures was to understand the internals of DBs better because I started working at Convex. Where we try to make sure things like this is something an app developer doesn’t have to worry about. Though we do mention it in our docs[2] for the curious.

[1] https://www.youtube.com/watch?v=LWS8LEQAUVc&list=PLSE8ODhjZX... [2] https://docs.convex.dev/database/advanced/occ

watters 2 years ago

> For reasons that should be obvious to anyone with a bank account, you really really want both updates to happen, or neither. This is what atomicity guarantees - that the entire transaction will either succeed or fail as a single unit.

So, I understand why this example feels particularly illustrative of the value of transactions, many-if-not-most financial "transactions" can't practically rely on this kind of atomicity for the kind of financial operation depicted.

While it may seem like a small thing, I think authors would do everyone a favor to stop using the "banking transactions, obvs" example.

TheNewsIsHere 2 years ago

I suppose this is a good example if your reader knows how banking systems work.
A better direct example in the same line of reasoning would be double-entry accounting where you would want both the credit and debit entry to either fail or succeed.
Most people probably don’t know that their bank account _is_ a double-entry account to their banking institution.
I can’t noodle a way to make the banking example more intuitive for an audience absent explaining how double-entry accounting works and that banks mostly obscure that from the customer. That’s not really knowledge you can assume from a software developer or sysadmin.
lmm 2 years ago

In my experience most things you want to do turn out to be impossible to achieve with RDBMS-level transactions, and you end up having to implement the behaviour that you need "by hand" with the database's transaction support mostly getting in your way. So in a subtle way banking transactions are actually a pretty good example.
banq 2 years ago

transaction is belong to business logic, please use DDD

PeterCorless 2 years ago

Good articles on the difference between linearizability and serializability. They are not the same thing.

https://accelazh.github.io/storage/Linearizability-Vs-Serial...

https://ajaygupta-spark.medium.com/linearizability-and-vs-se...

Exuma 2 years ago

This is literally one of those topics I have to come back and read time and time and time again every time I need it like 2 times a year. Maybe this article will finally make it stick.

gwen-shapira 2 years ago

Possibly. I tried hard to have some memorable examples.
But I'll be honest: Every time I got review comments (or re-reviewed myself), it took a bit of time for my brain to warm up again into "transaction mode".
The big lesson may that concurrent transactions are pretty hard to reason about without external assistant like diagrams or test scenarios. I really like the system Postgres uses for transaction testing (AKA - deterministic simulation testing). Create scenario that match your business logic and then run them serially but with different ordering of statements and make sure the results are as you expect.
rabee3 2 years ago

this! never sticks for long, and I like the way this article flows in explaining isolation levels. definitely bookmarking it to get back to it later when needed.
TomaszZielinski 2 years ago

I've been dealing with transactions regularly in the past few years, but not constantly, as things work correctly most of the time. And I still need to refresh my memory pretty much every time I go back to revisit some transactional code, or to answer any non-trivial questions about it.
So I guess it's just the way it is :)

masfuerte 2 years ago

What is "the pre"?

itunpredictable 2 years ago

I came here to ask this
- gwen-shapira 2 years ago
  
  It should read "code" instead of "pre".
  I'm pushing a fix right now.

Settings

Transaction Isolation in Postgres

Keyboard Shortcuts