Which šPostgreSQL problems are solved with šYugabyteDB
A famous article, ā10 Things I Hate About PostgreSQLā, by Rick Branson summarizes the main pain points people encounter with PostgreSQL. It bring an answer to: āNo software is perfect, so exactly what are PostgreSQLās imperfections?ā
Those problems are known for a while, work has been started on them, but they have their root in the design of the storage and transactions. This is not easy to change in a database that is used for decades, for critical applications, with, most of the time, no direct contact between the users and the community, because it is free and open source.
The PostgreSQL community brings many features and solutions into every release in the upper layer, the SQL language, the relational features, the advanced semi-structured data types like JSONB. But the lower layer, storage and transaction, is harder to evolve and still uses the Heap tables, B*Tree indexes, full-page WAL per checkpoint, no in-place updates, Vacuum, 32-bit xidā¦ designed from the old times. It works, there are contributions to workaround the problems, but it can be annoying as mentioned in the ā10 Thingsā article.
The primary goal of YugabyteDB is to provide a distributed SQL database compatible with all the features of PostgreSQL, with a cloud-native architecture for the distributed storage: all nodes active, resilient to failures and elastic to scale. But, with the modern storage layer design, it also solves most of the problems mentioned. This post will list the ā10 Thingsā from the article and how they are solved in YugabyteDB. The titles are from Rick Branson article, they are not my words.
#1: Disastrous XID Wraparound
šPostgreSQL uses a 32-bit transaction ID and the database freezes when it is exhausted. Table *vacuum * must run frequently to avoid this.
šYugabyteDB uses Hybrid Logical Time (HLC) for cluster synchronization (see A Matter of Time: Evolving Clock Sync for Distributed Databases) and, as thereās a physical clock component in it, it will never exhaust.
#2: Failover Will Probably Lose Data
šPostgreSQL doesnāt provide ACID across multiple servers. An async replica can be setup to offload reads, with an acceptable lag, and to provide Disaster Recovery with a Recovery Time Objective in minutes, and a Recovery Point Objective with some data loss. A quorum of sync replicas can reduce the risk of data loss at the price of higher latency for write transactions. This must be carefully managed to avoid data corruption in case of failover.
šYugabyteDB is built for efficient sync replication, sharding the tables and indexes into tablets, forming a Raft group, and replicating them to their peers with the Raft protocol. Thereās the guarantee to always have one and only one āleaderā to take the reads and writes for a set of rows or index entries, without any risk of split-brain. And a āfollowerā can be elected as a new āleaderā within 3 seconds.
#3: Inefficient Replication That Spreads Corruption
šPostgreSQL streaming replication is based on physical replication though the WAL, at page level. I do not agree with the āSpread Corruptionā part. The author of the ā10 thingsā mentions āhardware-induced data corruptionā which would need more explanations, because WAL streaming replication is a lot safer, in this area, than storage-based replication. I guess that what the author experienced was a non-full-page WAL record corrupting an existing block in the standby.
šYugabyteDB replication is a logical replication, at key-value level (the rows, index entries, transaction intents,.. are all key-value records appended to the Raft group). This is more efficient in the volume transferred (and this is also important for the cost of cross-cloud-region clusters). But it also lets each tablet peer apply the changes to their own files. YugabyteDB writes into Sorted Sequence Table (SST) files, always written sequentially, with no risk that a new change corrupts a previous block. They are compacted on each node independently, and compaction will detect a block corruption, thanks to checksum checking, and repair it.
#4: MVCC Garbage Frequently Painful
šPostgreSQL accumulates the states by writing entire new rows. Even updating one byte of a single column will copy the whole row. This is known as table bloat, has also some performances on index maintenance especially if you use the default 100% fillfactor, and requires frequent vacuum.
šYugabyteDB stores updates on a per column basis. The accumulation of the garbage collection of the oldest versions, kept a short time for MVCC purpose, is managed by the storage with the compaction of SST files in background, per table/index shard. This activity doesnāt collide with writes because the SST Files are immutable.
#5: Process-Per-Connection = Pain at Scale
šPostgreSQL is not multi-threaded, and forks a new process per connection. Each connection takes memory, and memory is limited on a server. Especially with double buffering, less available memory means more I/O. Connection pools can help to limit this, but, with microservices, this is still a lot of connections. Pgbouncer can help, but thatās another component to install and manage.
šYugabyteDB does the same, one process per connection, because the protocol and query layer uses PostgreSQL. However, this is not a scalability issue because connections can be spread to multiple nodes: all nodes are equal, accepting connections, with read and write transactions.
#6: Primary Key Index is a Space Hog
šPostgreSQL stores rows in heap tables like Oracle, and the primary key is an additional secondary index. This takes more space and more reads (even the Index Only Access) has to read the table.
šYugabyteDB stores tables in their primary index structure, a log-structured merge-tree (LSM) with the first level in memory. This ensures fast access by primary key, and real Index Only Access by secondary indexes.
#7: Major Version Upgrades Can Require Downtime
šPostgreSQL upgrades are done with a planned outage. The application must be stopped, not only for the duration of the upgrade but also for the statistic gathering that must be done after it (ANALYZE). This makes the downtime window depending on the size of the database.
šYugabyteDB, being distributed, allows online upgrade by upgrading the nodes one by one in a rolling fashion. In a Replication Factor RF=3 deployment, one node can be down without impacting the application availability. The restart is fast, but you can also add a new node and remove the old one when data has been rebalanced to it.
#8: Somewhat Cumbersome Replication Setup
šPostgreSQL replication is not easy to setup. From a recent discussion with a friend concerned by it, here is a recent example showing how troubleshooting can be complex. Note that it also shows the amazing help you can have from an open-source community.
šYugabyteDB is designed for replication and all, sync, async, and change data capture (CDC), are at the same level: the per-shard Raft group writes. This is built to be fully automated. Enabling sync replication is as simple as setting --replication-factor=3
when starting the cluster.
#9: Ridiculous No-Planner-Hints Dogma
šPostgreSQL core team has always been against the usage of query planner hints. Even when it is accepted that giving directives to the optimizer is a good thing, like materializing a CTE or not, it has been introduced in the SQL syntax, breaking the compatibility with previous versions.
šYugabyteDB enables the pg_hint_plan
by default. That's not a reason to put hints everywhere, but it is already there when needed. For testing or quickly workaround a problem. This is simple and can be done with PostgreSQL of course. But, in IT, so many deployments are made without specific configuration, the defaults matter.
#10: No Block Compression
šPostgreSQL doesnāt provide compression and, generally speaking, counts on the operating system to handle anything file-related. Iāve seen people using ZFS for the many features it provides, but getting acceptable and predictable performance for a database on ZFS is not easy.
šYugabyteDB provides compression and encryption, at rest (storage) and in transit (network). This is a must for a cloud-native database given the cost of storage, and egress traffic between AZ or regions.
All That Saidā¦
The author, in conclusion, still recommends: āIn general Iād recommend starting with PostgreSQL and then trying to figure out why it wonāt work for your use caseā. He mentions that some of those issues can be reduced by using a managed service. Because, solutions, or workarounds are well known. You just need some DBAs to managed them pro-actively, in-house or as from a managed service. PostgreSQL is the best open-source database I know for a wide area of workloads.
Today, for an OLTP workload where you may need to scale at some point, and where high availability is a thing, I would recommend looking at YugabyteDB. Fully open-source, and compatible with PostgreSQL, you can still change your mind and move to PostgreSQL, managed or not, later. By doing this, you release the burden of dealing with vacuum and replication problems.
Iām not saying that you will not have other problems to deal with, thereās no magic. A distributed database requires a bit more upfront design, like on the primary key sharding, and may show lower performance on single-node operations because thereās no shared buffer cache. YugabyteDB re-use of PostgreSQL is stateless to be able to scale-out. However, in the decision to start with one or the other, the time you spend on the issues mentioned in this article are wasted when you move to another storage engine. On the other way, when starting with a distributed SQL database, the time spent on data model design for scalability will never be a waste of time. You will still benefit from scalable design in a monolithic database.
In some way, this last paragraph is an answer to:
My answer is my subjective opinion, and you should make your own in your context, of course. I love PostgreSQL for its community of users, the ecosystem around it, its open-source model, and the strict SQL standard compatibility. And YugabyteDB reuses all that. Donāt forget that Iām a Developer Advocate at Yugabyte. The more people start with YugabyteDB, the more work I have to help, learn and share. And I enjoy it enough to ask for more š