Read Committed and Pessimistic Locking in Distributed SQL Databases
originally posted on LinkedIn
One of the most difficult concepts to understand in SQL databases is probably the transaction Isolation Level (the I in ACID). If that's your case, don't panic: the fault is not yours. Unlike most other relational and SQL concepts, isolation levels were never correctly defined, because the definition, in the seventies, was based on one possible implementation, which is not the one used later. Isolation Level is the concept; locking, pessimistic or optimistic, is the implementation. I'll try to explain both.
A transaction can do many reads and writes, bringing the database from one state to another. If a concurrent transaction is doing its reads and writes at the same time, it cannot work on the other's intermediate state, or the result would be inconsistent. Think of it as two people booking a hotel room at the same time. Both start by reading a state where room 42 is free. Then both decide to book this room, which is a write to the database, and commit it, which is the final state. Without special coordination, the final state, depending on the data model, may be that both end up in the same bed. Or that only one booking was recorded, without the other user knowing about the lost one. Avoiding this situation requires some coordination between the transactions so that they read a consistent state, independent of concurrent operations. There are two acceptable outcomes for a consistent result:
- one of the users, when reading the initial state, will have to wait, with a message like "someone else is booking this room, please wait to see if he commits or cancels". This guarantees that his initial state is the other's final state. With pessimistic locking, both transactions work, for their whole duration, on the state they will commit, by blocking concurrent changes.
- one of the users, when writing her new state, will get an error message like "sorry, someone booked the same room in the meantime, it was not actually free, bad luck". Here, with optimistic locking, the transactions work on their initial state and can commit only if there were no concurrent changes to it.
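The second outcome above can be sketched in a few lines of Python. This is a hypothetical model, not any database's actual implementation (real engines use transaction timestamps internally): each row carries a version, and a write succeeds only if the version has not moved since the read.

```python
# Minimal sketch of optimistic concurrency control (hypothetical model):
# a write succeeds only if the row version is unchanged since the read,
# otherwise the transaction gets a serialization error.

class SerializationError(Exception):
    pass

rooms = {42: {"status": "free", "version": 1}}

def read_room(number):
    row = rooms[number]
    return row["status"], row["version"]

def book_room(number, version_read):
    row = rooms[number]
    if row["version"] != version_read:   # someone wrote since our read
        raise SerializationError("could not serialize access")
    row["status"] = "booked"
    row["version"] += 1

# Two users read the same initial state...
status1, v1 = read_room(42)
status2, v2 = read_room(42)

book_room(42, v1)                        # first booking succeeds
try:
    book_room(42, v2)                    # second one gets the error
except SerializationError as e:
    print("second user:", e)
```

The first outcome (waiting) would instead block the second read until the first transaction commits or cancels.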
This is where you have the choice between pessimistic and optimistic locking. The second one, where you don't wait but may receive an error, is optimistic. It is used when the probability of conflict on the same data is low. The first one, where you wait, is pessimistic. When the probability of conflict is high, it is better to wait than to receive an error. But the choice is not only about the probability, because both have important drawbacks:
- pessimistic locking is achieved with lock enqueues. Exclusive locks are acquired on everything you write, to prevent concurrent reads and writes on an intermediate state. And shared locks are acquired on everything you read, to prevent concurrent writes from changing the state you have read. This can quickly lock the whole database, serializing all transactions to run one after the other. That's the reason the isolation level that achieves what I described above (not reading or writing another session's intermediate state) is called the Serializable isolation level: its definers were thinking about locks and waits to serialize the transactions. But this doesn't scale, defeating the multi-user nature of databases, so other solutions had to be considered, with lower isolation levels and shorter locks.
- optimistic locking is achieved with lock intents, similar to breakable locks, cancelling one transaction in case of conflict rather than waiting. When the probability of conflict stays low, this is scalable, because many transactions can run at the same time. Serializable is the highest isolation level, guarantees no intermediate-state anomalies, and is scalable with optimistic locking. This is perfect, but only when your application has adapted its exception handling to receive this kind of serialization error and retry the transaction. This is not easy to do, because you cannot just retry indefinitely. Your code should catch the exception, cancel whatever it did that is not transactional (like telling the user that a room is free and finally telling her that it was not), wait a bit (like Ethernet's exponential backoff on collisions), and give up after a few attempts.
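The retry logic described above can be sketched as a small Python wrapper. The names are hypothetical; in a real driver the exception would carry SQLState 40001, and the backoff parameters would be tuned to the workload:

```python
import random
import time

class SerializationError(Exception):
    """Stands in for SQLState 40001 raised by the database driver."""

def run_with_retry(transaction, max_attempts=5):
    # Retry on serialization errors with exponential backoff plus
    # jitter (like Ethernet collision handling), and give up after
    # a few attempts instead of retrying indefinitely.
    for attempt in range(1, max_attempts + 1):
        try:
            return transaction()
        except SerializationError:
            if attempt == max_attempts:
                raise
            time.sleep((2 ** attempt) * 0.01 * random.random())

# A fake transaction that conflicts twice before committing.
attempts = {"count": 0}

def book_room():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise SerializationError("40001: could not serialize access")
    return "committed"

print(run_with_retry(book_room))  # succeeds on the third attempt
```

Anything non-transactional done before the failure (messages shown to the user, external calls) still has to be compensated by the application before each retry, which the wrapper cannot do for you.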
In NoSQL databases, the first ones to be distributed, transactions are very simple: a single read or write operation on a single object or document. There, it is easy to limit the API to idempotent calls (like the UPSERT behavior of Cassandra's INSERT), and the database can even retry the transaction itself. But a SQL database is more complex than that. A transaction reads and writes multiple tables or indexes, and may already have returned a partial result to the application when it encounters the serialization error. The database cannot always restart the transaction itself and may need to notify the application with an error.
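Why idempotence makes automatic retries safe can be shown in two lines. This is a toy model, not Cassandra's implementation: an UPSERT-style write reaches the same final state however many times it is applied, so the database can replay it blindly after a timeout.

```python
# Toy model of an idempotent UPSERT-like write: retrying it after a
# timeout is harmless because the final state is the same either way.

table = {}

def upsert(key, value):
    table[key] = value     # applying this twice changes nothing

upsert("room:42", "booked")
upsert("room:42", "booked")   # automatic retry: same final state
print(table)
```

A read-then-write SQL transaction has no such property: replaying it may compute a different result from the new state, which is why the application must be involved.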
If you look at CockroachDB, which supports only the Serializable isolation level, the documentation explains that "Your application should include client-side retry handling". And you have no choice, because only Serializable is supported: if you set another isolation level, it is internally mapped to Serializable. This is apparently sufficient, as it is the highest level, preventing all anomalies, but it also comes with the need to code your application for it. CockroachDB is protocol-compatible with PostgreSQL, sending the same SQLState 40001 for serialization errors and accepting the isolation level syntax, but it is not behavior-compatible, because the default PostgreSQL isolation level, Read Committed, is not supported.
Because it is PostgreSQL-compatible, YugabyteDB implements, in addition to Serializable, the Read Committed isolation level. Because this is the default in PostgreSQL, it is the one used by the majority of applications. As a consequence, most applications written for PostgreSQL don't implement retry logic. Read Committed and pessimistic locking go together here: pessimistic locking, to wait instead of getting serialization errors, and Read Committed because, at this level, the read snapshot can be an intermediate state. Let's explain that.
The Read Committed isolation level may read intermediate states. It reads a snapshot of the database that is taken not at the start of the transaction, as in Serializable, but at the start of each SQL statement. A statement can even be automatically rolled back and restarted on a later state, in case of conflict or clock skew. This allows some anomalies, as in my hotel-room booking case, where reading at different times within a transaction may show different states of the database, where the free or booked status has changed. Those need special care with explicit locking, like SELECT … FOR UPDATE when reading the status of the room. And some anomalies can still occur (see an example here). But, for many transactions, Read Committed is acceptable. Because fewer locks are acquired, and for a shorter time, it is still scalable with pessimistic locking, without implementing complex retry logic in the application. And anyway, being the default, and therefore widely used, any PostgreSQL-compatible database must support it. This is why YugabyteDB implements it.
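What explicit locking buys you at Read Committed can be simulated with two threads and a lock. This is a sketch of the behavior, not of any engine's internals: the second transaction waits on the row lock (as it would on SELECT … FOR UPDATE) instead of receiving an error, then sees the committed state once the lock is released.

```python
import threading

# Sketch of pessimistic row locking, the behavior SELECT ... FOR UPDATE
# gives at Read Committed: the second transaction waits on the row lock
# instead of getting a serialization error, then re-reads the new state.

room = {"status": "free", "lock": threading.Lock()}
log = []

def book(user):
    with room["lock"]:             # SELECT ... FOR UPDATE: wait here
        if room["status"] == "free":
            room["status"] = f"booked by {user}"
            log.append(f"{user}: booked")
        else:                      # re-read after the other commit
            log.append(f"{user}: saw {room['status']}")

t1 = threading.Thread(target=book, args=("alice",))
t2 = threading.Thread(target=book, args=("bob",))
t1.start(); t2.start()
t1.join(); t2.join()

# Exactly one user books; the other waits and sees the committed state.
print(sorted(log))
```

Neither user gets an error, which is why this pattern works for applications that have no retry logic.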
There is another isolation level between Read Committed and Serializable. It is called Repeatable Read by databases using pessimistic locking for it, and Snapshot Isolation by those using optimistic locking. It ensures that rows are read from a state taken at the start of the transaction, but still allows some other intermediate states for new rows. In my hotel example, if there is one row per room with a free/booked flag, Repeatable Read allows only one of the two transactions to commit. However, if this is modeled with the presence or absence of a booking row, the absence of a booking is not part of the initial state, and the isolation level doesn't prevent the conflict. A unique constraint may still prevent it, though.
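The last point can be illustrated with a toy model: Snapshot Isolation cannot detect a conflict on a row that was never read (it was absent), but a unique index on (room, night) still rejects the second insert. The Python set below stands in for that unique index; all names are hypothetical.

```python
# Toy model: a unique constraint on (room, night) catches the
# double-booking that Repeatable Read / Snapshot Isolation misses
# when a booking is the insertion of a new row.

class UniqueViolation(Exception):
    pass

bookings = []
unique_index = set()     # unique constraint on (room, night)

def insert_booking(room, night, guest):
    key = (room, night)
    if key in unique_index:
        raise UniqueViolation(f"room {room} already booked for {night}")
    unique_index.add(key)
    bookings.append({"room": room, "night": night, "guest": guest})

insert_booking(42, "2024-06-01", "alice")
try:
    insert_booking(42, "2024-06-01", "bob")   # second insert rejected
except UniqueViolation as e:
    print(e)
```

The application still has to handle the unique-constraint violation, but that is a much more familiar error than a serialization failure.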
In summary, Serializable prevents all race-condition inconsistencies but requires the application to implement retry logic. Read Committed doesn't need this, but requires special handling of some transactions with SELECT FOR UPDATE, or handling unique constraint violations. As most traditional SQL databases use Read Committed by default, migrating to a distributed SQL database requires support for all isolation levels. And even if Serializable is encouraged for new applications, Read Committed and pessimistic locking are a must for legacy ones. Don't forget that it is not only about the additional code for retry logic, but also about integration tests: do you test all race conditions, or do you rely on database features to guarantee them?
I am a Developer Advocate for Yugabyte. I blog on dev.to and chat on Twitter.
A very good presentation, clear and fun, on isolation levels, by Martin Kleppmann: