Some NoSQL vendors have introduced transactional behavior and now claim ACID properties in their marketing slides. I’m pleased to see them using SQL terms after proclaiming the end of relational databases. However, it can be misleading if you don’t examine the distinction between a NoSQL transaction and a SQL transaction. NoSQL databases vendors decided to impose many limitations to scale at a time when scaling out SQL was challenging (before Distributed SQL databases like Spanner, CockroachDB, TiDB, and YugabyteDB were available), and this limitation still exists today, even if a few NoSQL databases support some transactional operations.
‘ACID’ encompasses more than that. The acronym is used in the context of complex transactions and was introduced in the Principles of Transaction-Oriented Database Recovery with reference to DBMS with many more capabilities. The italicized text is from this paper. Let’s explore the ACID properties one by one to clarify the differences.
In a SQL transaction, it begins with ‘start transaction’ or its equivalent and concludes with ‘commit’ or ‘rollback.’ During this transaction, numerous SQL statements are executed to read and write to the database. Typically, a transaction is a short sequence of interactions with the database. The interaction aspect is crucial: the database remains unaware of the complete transaction intent until it receives the ‘commit’ statement. Achieving atomicity would be straightforward if all transaction intents were known from the outset, as in a NoSQL GET/PUT operation or an SQL auto-commit execution of a stored procedure. However, business transactions involve extensive application logic, entailing multiple database calls to read and write data. If an issue arises in the middle, the database never displays this interim state to other threads. The reads and writes remain concealed until they are all simultaneously visible.
A NoSQL transaction can update, read, and write multiple objects, but the ‘A’ in ACID signifies more than that: SQL transactions transition the entire database from one state to another.
In a SQL database, the assurance goes beyond transitioning the database from one state to another upon committing a transaction; it also ensures that only a state preserving the overall database consistency can be committed. In other words, each successful transaction by definition commits only legal results. In the realm of SQL databases, this entails that integrity constraints such as key uniqueness, foreign key references, and numerous check constraints remain perpetually valid.
Conversely, a NoSQL transaction can modify multiple objects, sometimes residing in different partitions. However, global secondary indexes in NoSQL databases are typically updated asynchronously and cannot detect duplicate values to enforce uniqueness. Consequently, there’s no guarantee that you will retrieve the same state when querying the table or the index, or that a duplicate insert retry after a failure will be detected. The same applies when reading two different tables, breaking the integrity of references. NoSQL databases do not enforce such integrity constraints, which constitute the ‘C’ in ACID.
In an SQL database, transactions are synchronized to ensure that each one can transition the database from one state to another without overlapping incompatible changes, which could lead to anomalies. Events within a transaction must be hidden from other transactions running concurrently. To strike a balance between performance and code complexity, several isolation levels are available. With Serializable transactions, you can write code without worrying about race conditions, but you must handle retry logic in case of conflict exceptions. In the case of Read Committed transactions, such exceptions are less likely, but you may need to consider certain race conditions that necessitate explicit locking.
NoSQL transactions are constrained to mitigate complex race conditions by limiting them to single objects, making it easier to sequence operations. Due to the absence of joins between tables, consistent maintenance of secondary indexes, and foreign key checks, achieving isolation is straightforward. However, it’s essential to note that the ‘I’ in ACID primarily concerns complex transactions. And even a single insert into a table with secondary indexes starts to be a complex transaction when wrong results are not allowed.
Achieving durability, represented by the ‘D’ in ACID, seems straightforward with persistent storage and replication across multiple zones. In the best-case scenario, resilience ensures that a failure has no impact due to the absence of a single point of failure. In the worst case, recovery entails downtime, but the database can failover to a replica that contains all changes. The results survive any subsequent malfunctions. However, in SQL transactions, when writing to multiple objects across the database, what is written to disk and synchronized to the network may include changes that have not yet been committed. In the event of a failure, these changes must stay invisible and be undone. Typically, recovery involves rolling forward changes that may have been made in memory but not yet flushed to disk, followed by rolling back ongoing transactions that were not committed.
NoSQL simplifies this by restricting the number of objects updated by a transaction. Achieving persistent changes, such as writing to a mirrored disk, is straightforward. However, the ‘D’ in ACID entails much greater complexity. It’s not surprising that the paper introducing ACID properties delves into recovery in detail, covering UNDO and REDO logs.
Classifying NoSQL transactions as ACID overlooks the true significance and complexity of what SQL databases have achieved. While transactions involving a small set of pre-planned reads and writes can be labeled as ‘transactional,’ branding them as ACID hides the absence of support for the comprehensive set of features for which ACID properties were originally defined.
Today, there’s no need to compromise SQL features for the sake of horizontal scalability. YugabyteDB offers a PostgreSQL-compatible endpoint across multiple nodes while seeing the same logical database with all SQL capabilities and ACID properties intact. As it operates on the same transactional and distributed storage infrastructure, YugabyteDB also offers a Cassandra-like endpoint that supports transactions when updating multiple tables, including their secondary indexes. For scenarios where ACID transactions are unnecessary, such as data migration with bulk loads, the option to release ACID properties at the SQL statement level is available, ensuring flexibility without sacrificing essential features. With this, a dedicated NoSQL database is not needed anymore and the distributed SQL database can handle all OLTP workloads on simple documents or complex relations.