Which šŸ˜PostgreSQL problems are solved with šŸš€YugabyteDB

Franck Pachot
7 min readFeb 25, 2022

--

A famous article, ā€œ10 Things I Hate About PostgreSQLā€, by Rick Branson summarizes the main pain points people encounter with PostgreSQL. It bring an answer to: ā€œNo software is perfect, so exactly what are PostgreSQLā€™s imperfections?ā€

Those problems are known for a while, work has been started on them, but they have their root in the design of the storage and transactions. This is not easy to change in a database that is used for decades, for critical applications, with, most of the time, no direct contact between the users and the community, because it is free and open source.
The PostgreSQL community brings many features and solutions into every release in the upper layer, the SQL language, the relational features, the advanced semi-structured data types like JSONB. But the lower layer, storage and transaction, is harder to evolve and still uses the Heap tables, B*Tree indexes, full-page WAL per checkpoint, no in-place updates, Vacuum, 32-bit xidā€¦ designed from the old times. It works, there are contributions to workaround the problems, but it can be annoying as mentioned in the ā€œ10 Thingsā€ article.

The primary goal of YugabyteDB is to provide a distributed SQL database compatible with all the features of PostgreSQL, with a cloud-native architecture for the distributed storage: all nodes active, resilient to failures and elastic to scale. But, with the modern storage layer design, it also solves most of the problems mentioned. This post will list the ā€œ10 Thingsā€ from the article and how they are solved in YugabyteDB. The titles are from Rick Branson article, they are not my words.

#1: Disastrous XID Wraparound

šŸ˜PostgreSQL uses a 32-bit transaction ID and the database freezes when it is exhausted. Table *vacuum * must run frequently to avoid this.

šŸš€YugabyteDB uses Hybrid Logical Time (HLC) for cluster synchronization (see A Matter of Time: Evolving Clock Sync for Distributed Databases) and, as thereā€™s a physical clock component in it, it will never exhaust.

#2: Failover Will Probably Lose Data

šŸ˜PostgreSQL doesnā€™t provide ACID across multiple servers. An async replica can be setup to offload reads, with an acceptable lag, and to provide Disaster Recovery with a Recovery Time Objective in minutes, and a Recovery Point Objective with some data loss. A quorum of sync replicas can reduce the risk of data loss at the price of higher latency for write transactions. This must be carefully managed to avoid data corruption in case of failover.

šŸš€YugabyteDB is built for efficient sync replication, sharding the tables and indexes into tablets, forming a Raft group, and replicating them to their peers with the Raft protocol. Thereā€™s the guarantee to always have one and only one ā€œleaderā€ to take the reads and writes for a set of rows or index entries, without any risk of split-brain. And a ā€œfollowerā€ can be elected as a new ā€œleaderā€ within 3 seconds.

#3: Inefficient Replication That Spreads Corruption

šŸ˜PostgreSQL streaming replication is based on physical replication though the WAL, at page level. I do not agree with the ā€œSpread Corruptionā€ part. The author of the ā€œ10 thingsā€ mentions ā€œhardware-induced data corruptionā€ which would need more explanations, because WAL streaming replication is a lot safer, in this area, than storage-based replication. I guess that what the author experienced was a non-full-page WAL record corrupting an existing block in the standby.

šŸš€YugabyteDB replication is a logical replication, at key-value level (the rows, index entries, transaction intents,.. are all key-value records appended to the Raft group). This is more efficient in the volume transferred (and this is also important for the cost of cross-cloud-region clusters). But it also lets each tablet peer apply the changes to their own files. YugabyteDB writes into Sorted Sequence Table (SST) files, always written sequentially, with no risk that a new change corrupts a previous block. They are compacted on each node independently, and compaction will detect a block corruption, thanks to checksum checking, and repair it.

#4: MVCC Garbage Frequently Painful

šŸ˜PostgreSQL accumulates the states by writing entire new rows. Even updating one byte of a single column will copy the whole row. This is known as table bloat, has also some performances on index maintenance especially if you use the default 100% fillfactor, and requires frequent vacuum.

šŸš€YugabyteDB stores updates on a per column basis. The accumulation of the garbage collection of the oldest versions, kept a short time for MVCC purpose, is managed by the storage with the compaction of SST files in background, per table/index shard. This activity doesnā€™t collide with writes because the SST Files are immutable.

#5: Process-Per-Connection = Pain at Scale

šŸ˜PostgreSQL is not multi-threaded, and forks a new process per connection. Each connection takes memory, and memory is limited on a server. Especially with double buffering, less available memory means more I/O. Connection pools can help to limit this, but, with microservices, this is still a lot of connections. Pgbouncer can help, but thatā€™s another component to install and manage.

šŸš€YugabyteDB does the same, one process per connection, because the protocol and query layer uses PostgreSQL. However, this is not a scalability issue because connections can be spread to multiple nodes: all nodes are equal, accepting connections, with read and write transactions.

#6: Primary Key Index is a Space Hog

šŸ˜PostgreSQL stores rows in heap tables like Oracle, and the primary key is an additional secondary index. This takes more space and more reads (even the Index Only Access) has to read the table.

šŸš€YugabyteDB stores tables in their primary index structure, a log-structured merge-tree (LSM) with the first level in memory. This ensures fast access by primary key, and real Index Only Access by secondary indexes.

#7: Major Version Upgrades Can Require Downtime

šŸ˜PostgreSQL upgrades are done with a planned outage. The application must be stopped, not only for the duration of the upgrade but also for the statistic gathering that must be done after it (ANALYZE). This makes the downtime window depending on the size of the database.

šŸš€YugabyteDB, being distributed, allows online upgrade by upgrading the nodes one by one in a rolling fashion. In a Replication Factor RF=3 deployment, one node can be down without impacting the application availability. The restart is fast, but you can also add a new node and remove the old one when data has been rebalanced to it.

#8: Somewhat Cumbersome Replication Setup

šŸ˜PostgreSQL replication is not easy to setup. From a recent discussion with a friend concerned by it, here is a recent example showing how troubleshooting can be complex. Note that it also shows the amazing help you can have from an open-source community.

šŸš€YugabyteDB is designed for replication and all, sync, async, and change data capture (CDC), are at the same level: the per-shard Raft group writes. This is built to be fully automated. Enabling sync replication is as simple as setting --replication-factor=3 when starting the cluster.

#9: Ridiculous No-Planner-Hints Dogma

šŸ˜PostgreSQL core team has always been against the usage of query planner hints. Even when it is accepted that giving directives to the optimizer is a good thing, like materializing a CTE or not, it has been introduced in the SQL syntax, breaking the compatibility with previous versions.

šŸš€YugabyteDB enables the pg_hint_plan by default. That's not a reason to put hints everywhere, but it is already there when needed. For testing or quickly workaround a problem. This is simple and can be done with PostgreSQL of course. But, in IT, so many deployments are made without specific configuration, the defaults matter.

#10: No Block Compression

šŸ˜PostgreSQL doesnā€™t provide compression and, generally speaking, counts on the operating system to handle anything file-related. Iā€™ve seen people using ZFS for the many features it provides, but getting acceptable and predictable performance for a database on ZFS is not easy.

šŸš€YugabyteDB provides compression and encryption, at rest (storage) and in transit (network). This is a must for a cloud-native database given the cost of storage, and egress traffic between AZ or regions.

All That Saidā€¦

The author, in conclusion, still recommends: ā€œIn general Iā€™d recommend starting with PostgreSQL and then trying to figure out why it wonā€™t work for your use caseā€. He mentions that some of those issues can be reduced by using a managed service. Because, solutions, or workarounds are well known. You just need some DBAs to managed them pro-actively, in-house or as from a managed service. PostgreSQL is the best open-source database I know for a wide area of workloads.

Today, for an OLTP workload where you may need to scale at some point, and where high availability is a thing, I would recommend looking at YugabyteDB. Fully open-source, and compatible with PostgreSQL, you can still change your mind and move to PostgreSQL, managed or not, later. By doing this, you release the burden of dealing with vacuum and replication problems.

Iā€™m not saying that you will not have other problems to deal with, thereā€™s no magic. A distributed database requires a bit more upfront design, like on the primary key sharding, and may show lower performance on single-node operations because thereā€™s no shared buffer cache. YugabyteDB re-use of PostgreSQL is stateless to be able to scale-out. However, in the decision to start with one or the other, the time you spend on the issues mentioned in this article are wasted when you move to another storage engine. On the other way, when starting with a distributed SQL database, the time spent on data model design for scalability will never be a waste of time. You will still benefit from scalable design in a monolithic database.

In some way, this last paragraph is an answer to:

My answer is my subjective opinion, and you should make your own in your context, of course. I love PostgreSQL for its community of users, the ecosystem around it, its open-source model, and the strict SQL standard compatibility. And YugabyteDB reuses all that. Donā€™t forget that Iā€™m a Developer Advocate at Yugabyte. The more people start with YugabyteDB, the more work I have to help, learn and share. And I enjoy it enough to ask for more šŸ˜Ž

--

--

Franck Pachot
Franck Pachot

Written by Franck Pachot

Developer Advocate for YugabyteDB (Open-Source, PostgreSQL-compatible Distributed SQL Database. Oracle Certified Master and AWS Data Hero.

No responses yet