When a new technology emerges, we need a name for the previous one, which had none when it was the only option available. SQL databases, or RDBMS (Relational Database Management Systems, because they were designed to store data primarily in relational structures), traditionally run on a single server to which applications connect to query and modify the data stored on disk.
Some derivative architectures allow running the databases on multiple servers to ensure resiliency if one server goes down. I’ll cover them later, but let’s list them:
- One technique is sharding: partitioning data across multiple databases. However, the SQL properties do not apply globally to queries that involve multiple shards. Teradata was famous for this technique. Today, Oracle Sharding and Citus for PostgreSQL use similar architectures.
- Another technique involves a cluster of servers sharing the same database files and exchanging data via a private network interconnect and shared disk. Oracle RAC is known for this and is used within an Exadata machine.
- A third solution appeared with the cloud, which still runs on a single server but distributes the database files to multiple storage servers across multiple data centers. AWS started this with Amazon Aurora.
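The sharding approach from the first bullet can be sketched as a hash-based router that sends each row to one of several independent databases. This is a minimal illustration, not any vendor's API; the shard names and `shard_for` function are assumptions for the example.

```python
import hashlib

# Hypothetical shard map: each entry stands for an independent SQL database.
SHARDS = ["shard_0", "shard_1", "shard_2", "shard_3"]

def shard_for(key: str) -> str:
    """Route a row to a shard by hashing its partition key."""
    h = int(hashlib.sha256(key.encode()).hexdigest(), 16)
    return SHARDS[h % len(SHARDS)]

# Rows with the same key always land on the same shard, but a query
# touching several keys may span shards, where the global SQL
# properties (ACID, unique constraints) no longer apply.
```

The routing is deterministic, so single-key operations stay local to one shard; the cost appears only for multi-shard queries and transactions.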
Let’s start with traditional SQL databases that work on a single server in a read-and-write mode without considering those extended architectures.
These databases may have additional read replicas, including synchronous ones, but those replicas do not provide all SQL features, such as consistent reads, read-write transactions, or ACID properties. Therefore, even if some parts are distributed, they cannot be categorized as “Distributed SQL”. Their scalability may also be limited by synchronization coupling with the primary.
To distinguish them from “NewSQL” and the related architectures mentioned above, I usually refer to these databases as “Traditional SQL Databases.” However, these terms carry no technical meaning in opposition to “Distributed SQL”. Therefore, I call them “Monolithic SQL Databases”, but let me explain what that means.
Application software is called “monolithic” when it has a monolithic architecture, in which functionally distinguishable aspects (for example, data input and output, data processing, error handling, and the user interface) are interwoven rather than separated into architecturally distinct components.
I will examine the common aspects of SQL databases, including query processing, shared memory, write-ahead logging, and disk storage, to explain what is monolithic in the architecture of those traditional databases.
Oracle and PostgreSQL use one process per connection to communicate with your application. This backend, server, or shadow process is responsible for most of your database activity, from parsing queries to executing them and updating the database blocks directly.
YugabyteDB also uses a backend process for the stateless part of SQL processing, which can run on any node. For transaction control and persistence, it buffers read and write operations to send to another layer. YugabyteDB has multiple query layers, YSQL (the PostgreSQL-compatible API) and YCQL (the Cassandra-like API), which can communicate with the storage layer, DocDB, locally or remotely. You can enable one or the other for physical separation. Those are two separate layers communicating through an internal API that geeks can observe in a lab by enabling yb_debug_log_docdb_requests.
SQL databases write data to buffers in memory, not directly to disk. Oracle and PostgreSQL use a shared buffer pool to improve performance and manage concurrent access. Consistency is maintained without corrupting data because all processes read and write a single shared-memory point of truth attached to them.
Sharing memory between processes is possible only within a single machine with multiple cores, processors, or NUMA nodes. In some cases, memory can be shared between clustered servers by using Remote Direct Memory Access (RDMA) over specialized interconnect networks. However, sharing memory between machines over a common network is not feasible.
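The shared buffer pool described above can be sketched as an LRU cache of blocks with a dirty flag. This is a toy model under stated assumptions (the class and method names are illustrative, not PostgreSQL's or Oracle's internals); in a real database this structure lives in shared memory attached to every backend process, which is exactly why it cannot span machines.

```python
from collections import OrderedDict

class BufferPool:
    """Toy shared buffer pool: one in-memory point of truth for blocks."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.blocks = OrderedDict()  # block_id -> (data, dirty)

    def read(self, block_id, fetch_from_disk):
        if block_id not in self.blocks:
            if len(self.blocks) >= self.capacity:
                self.evict()
            self.blocks[block_id] = (fetch_from_disk(block_id), False)
        self.blocks.move_to_end(block_id)  # LRU bump
        return self.blocks[block_id][0]

    def write(self, block_id, data):
        # Writes go to the buffer, not directly to disk.
        self.blocks[block_id] = (data, True)
        self.blocks.move_to_end(block_id)

    def evict(self):
        block_id, (data, dirty) = self.blocks.popitem(last=False)
        if dirty:
            pass  # a real pool would flush the dirty block to disk first
```

Every backend sees the same `blocks` structure, so a write by one process is immediately the truth for all others on that machine.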
For YugabyteDB, the stateful layer is the network. The Distributed SQL database stores all data in LSM-Trees, append-only structures that eliminate the need for reading, updating, and splitting blocks. Changes are versioned, sharded, and distributed to the right tablet, where they are stored in a memory table periodically flushed to SST files. Those files are compacted asynchronously to reduce storage and read amplification. The server’s memory is not a scalability limitation because sharding happens upfront.
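The LSM-Tree write path can be sketched as a memtable that is flushed to immutable sorted files when full, with reads checking the memtable first and then the files from newest to oldest. This is a minimal sketch of the general technique, not DocDB's actual implementation; the class and limit are assumptions for the example.

```python
class ToyLSM:
    """Minimal LSM-tree sketch: writes are appended to a memtable,
    which is flushed to an immutable sorted 'SST' when it fills up."""

    def __init__(self, memtable_limit=3):
        self.memtable = {}
        self.ssts = []              # flushed sorted files, newest first
        self.limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value  # no in-place block update, ever
        if len(self.memtable) >= self.limit:
            self.flush()

    def flush(self):
        # Freeze the memtable into a sorted, immutable SST.
        self.ssts.insert(0, dict(sorted(self.memtable.items())))
        self.memtable = {}

    def get(self, key):
        if key in self.memtable:    # newest data wins
            return self.memtable[key]
        for sst in self.ssts:
            if key in sst:
                return sst[key]
        return None
```

Compaction (merging SSTs to drop overwritten versions) would run asynchronously over `self.ssts`, which is what reduces the read amplification mentioned above.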
Monolithic and distributed architectures use Write-Ahead Logging (WAL, also called the transaction log or redo log) to protect data written to memory. Oracle and PostgreSQL use a single WAL/Redo stream. For example, a single missing or corrupted redo record stops the whole redo thread. YugabyteDB distributes key-value operations to the shards, stored in the LSM-Tree memory and WAL. Each YugabyteDB shard is its own Raft group and has its own WAL stream.
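The write-ahead principle itself is simple: append the change to the durable log before applying it to volatile memory, so the memory state can always be rebuilt by replay. A minimal sketch (illustrative names, not any engine's record format):

```python
class ToyWAL:
    """Write-ahead logging sketch: every change is appended to the log
    before it touches the in-memory state."""

    def __init__(self):
        self.log = []    # durable and append-only in a real system
        self.state = {}  # volatile memory

    def set(self, key, value):
        self.log.append(("set", key, value))  # 1. log first
        self.state[key] = value               # 2. then apply to memory

    def recover(self):
        """Rebuild the memory state by replaying the log after a crash."""
        self.state = {}
        for op, key, value in self.log:
            if op == "set":
                self.state[key] = value
        return self.state
```

The ordering in `set` is the whole point: because the record hits the log before memory, losing `state` loses nothing. It also shows why a single corrupted record in a single-stream WAL halts replay for the entire database, while per-shard WALs confine the damage to one shard.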
Oracle and PostgreSQL store data in files that can be distributed across devices and replicated to multiple data centers. However, they’re consistent only as a whole, and each replica must store the entire database. On the other hand, YugabyteDB tablets are small key-value databases, each with its own storage and replication. They’re rebalanced in small chunks and typically range from 10 GB to 100 GB.
Monolithic at all levels
Technically speaking, it makes sense to refer to traditional RDBMS as “Monolithic SQL databases”. Their architecture is monolithic, both vertically and horizontally.
- Vertically, the foreground process is tightly coupled with internal storage: a single process parses the user requests and writes the final bytes to shared buffers. There are logical layers in the code, but they are tightly coupled.
- Horizontally, multiple processes share the same memory segments, limiting scalability to one machine. They write data to one shared buffer pool and log changes to a single WAL stream. Read replicas maintain a full database copy, serving only read-only connections.
Let’s review the attempts to scale out monolithic SQL databases. Even if they give the impression of scaling out, they are additional layers on top of the same monolithic architecture.
Shared-Storage Parallel Servers
Oracle RAC is a high-availability solution that allows running multiple instances on multiple servers to scale out within certain limits. The shared buffer pools in Oracle RAC form a global cache across the servers, but the vertical monolithic architecture remains unchanged: writes are not distributed and happen in local memory. The current data buffers are moved through the network or disk to make them local to the processing node, where they are updated by the same monolithic code. In Oracle RAC, each instance has its own redo log thread (WAL), but they must synchronize at some point to a single System Change Number across the cluster. Oracle RAC must share storage and typically uses ASM deployed on an Exadata machine for distributed storage, but this is limited to a single data center, or two within a short distance.
Shared-Nothing Sharded Databases
Partitioning a database across multiple SQL databases allows it to scale out to multiple servers on a wide area network. This qualifies as “Distributed” but not “Distributed SQL”, because at the global level it loses certain SQL properties such as ACID, referential integrity, and unique indexes. For example, Citus is eventually consistent but not ACID. Distributed SQL refers to a SQL layer over distributed storage and transactions, whereas sharding over SQL databases retains the monolithic architecture of each shard.
Distributed Block Storage
Cloud databases like Amazon Aurora, Google AlloyDB, and Neon have a separate compute and storage architecture that distributes storage across multiple Availability Zones. From an application perspective, they remain a cloud-native monolithic architecture in terms of a single writer, memory, and WAL. The WAL is sent to the storage servers and to all read replicas so they can maintain cache consistency.
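The log-shipping idea behind this architecture can be sketched as a single writer streaming WAL records, tagged with a growing log sequence number, to its storage nodes and read replicas. This is a conceptual sketch only; the class names and synchronous fan-out are assumptions, not Aurora's, AlloyDB's, or Neon's protocol.

```python
class Subscriber:
    """A storage node or read replica that applies shipped WAL records."""
    def __init__(self):
        self.pages = {}
        self.last_lsn = 0

    def apply(self, record):
        lsn, key, value = record
        self.pages[key] = value
        self.last_lsn = lsn

class LogShippingPrimary:
    """Single writer: ships WAL records, not full pages, to subscribers."""
    def __init__(self, subscribers):
        self.lsn = 0                    # log sequence number
        self.subscribers = subscribers  # storage nodes + read replicas

    def write(self, key, value):
        self.lsn += 1
        record = (self.lsn, key, value)
        for sub in self.subscribers:
            sub.apply(record)           # replicas stay cache-consistent
```

Note what stays monolithic: there is exactly one `LogShippingPrimary` producing the LSN sequence, which is the single-writer property the paragraph above describes.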
It is crucial to understand the architectural difference between Monolithic SQL databases and Distributed SQL databases. Monolithic databases have a single backend process per connection that serves all database functions, from parsing the query to writing the final data blocks. Those blocks are updated in shared memory, which makes the architecture unsuitable for scaling out. Even if some components can scale within limits, like read replicas, a global cache, multi-threaded redo, or distributed block storage, the architecture of monolithic SQL databases remains and doesn’t qualify as Distributed SQL.
A distributed SQL database is a single relational database that replicates data across multiple servers. Distributed SQL databases are strongly consistent, and most support consistency across racks, data centers, and wide-area networks, including cloud availability zones and geographic regions. Distributed SQL databases typically use the Paxos or Raft algorithm to achieve consensus across multiple nodes.
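The property that Raft and Paxos provide can be reduced to one rule: a write commits once a majority of replicas have acknowledged it. A minimal sketch of that quorum rule (the function name is illustrative):

```python
def majority_ack(acks: int, replicas: int) -> bool:
    """A write commits once a majority of replicas acknowledge it,
    the core quorum rule behind Raft/Paxos-based replication."""
    return acks >= replicas // 2 + 1

# With a replication factor of 3, losing one node does not block writes,
# and any two majorities overlap in at least one node, which is what
# makes reads strongly consistent.
```

With replication factor 3, `majority_ack(2, 3)` is true and `majority_ack(1, 3)` is false: the cluster survives one failure, and any committed value is guaranteed to be seen by the next leader election.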
YugabyteDB’s architecture comprises two layers, the query layer and the storage layer, and both scale out linearly. The query layer is stateless, which makes it scalable: you can connect to any node, see the entire SQL database, and run your SQL queries or stored procedures. The storage layer is distributed into small shards, called tablets, with no coupling between them. Each shard has its own Raft group for replication and its own LSM-Tree, with a memory table protected by its own WAL and SST files storing the table rows and index entries belonging to that specific tablet. SQL transaction coordination relies on distributed intents and transaction status records, visible from all nodes and kept consistent by Raft consensus.