Distributed SQL Essentials: Sharding and Partitioning in YugabyteDB

  • At the storage level (DocDB), rows are distributed into multiple tablets using the primary key and index column definitions. We call this sharding.
  • At the file level, the rows, or index entries, are stored sorted on their key within each Sorted Sequence Table (SST) file.

Sharded vs. Distributed

Partitioning is a general term, and sharding commonly refers to horizontal partitioning used to scale out a database in a shared-nothing architecture. However, these terms describe different architectural concepts, and since YugabyteDB provides both, it is important to use the right terminology.

Application sharding

If you split your data into multiple databases, this can be called sharding. For example, say you have one database for your EMEA customers, one for the US, and one for Asia. The application connects to one or the other depending on the country code of the user. This is called application sharding, but it is not a distributed database. You don't need any special database for that, only some automation to manage it, and you can use multiple instances of a monolithic database. But this requires a lot of code in the application, and a lot of operations effort, to manage it.

Sharded databases

If you add middleware to handle part of this additional code, you have a sharded database. For example, Citus runs multiple PostgreSQL databases and adds a master (coordinator) node. You have to connect to this master, or aggregator, which redirects the query, or part of it, to the worker nodes. Oracle Sharding is similar, with a Shard Director and connection pools to coordinate the work that is offloaded to the many databases. This technique is heavily used in data warehouses, where queries can be split and processed in parallel. But OLTP transactions on relational tables have to maintain indexes and referential integrity across multiple shards.

Distributed SQL Databases

In a distributed SQL database, there is no master or director: all nodes are equal. You can connect to any node and query your entire database from there. Each node knows the topology of the cluster, the location of data and metadata, the transaction status, and the next sequence value (with a cache), because it is a single database. You get elasticity without giving up the SQL features of the most advanced relational databases.
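As a small illustration, assuming the yb_servers() function that YSQL exposes for cluster-aware drivers, any node can report the whole cluster topology:

    -- Run from any node: list all nodes of the cluster, with their
    -- cloud, region, and zone, as seen by the local YSQL endpoint.
    SELECT * FROM yb_servers();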

Yugabyte Distributed SQL

Now that we’ve defined those techniques, let’s go deeper into all levels of data organization in YugabyteDB, starting from the storage layer.

SST files

At the lowest level, rows and index entries are organized within Sorted Sequence Table (SST) files on a Linux filesystem. This is done automatically, during writes and compactions, so that reads can find a row easily. Each SST file holds rows ordered by their key and version, with additional indexes and bloom filters to locate them. In terms of co-location, rows or index entries whose keys share the same hash value, and fall within a narrow range of values, are stored together in each SST file. The number of SST files is kept bounded by compaction. At this level, everything is automated in YugabyteDB; the user only provides the filesystem paths.

Tablet sharding

The SST files store the key-value pairs for tables and indexes. Sharding is the right term here because each tablet is a database (based on RocksDB) with its own protection through replication. This looks like the sharded databases described above, except that the shards are not SQL databases but key-value document stores. They have all the required features of a reliable datastore, with transactions and strong consistency. However, they don't carry the burden of being managed as multiple databases, because the SQL layer sits above. Joins and secondary indexes are not processed at this level: they belong to the SQL layer, which coordinates cross-shard transactions.

  • The primary key or index column definition defines the sharding method: HASH or range (ASC or DESC) sharding. This is where we choose to distribute data on a hash of the column to load-balance and avoid hot spots. It's also where we keep rows together on ascending or descending columns for range scans (with '<', '>', or 'BETWEEN' predicates). A DDL sketch follows this list.
  • Optionally, the number of tablets can be set with the --ysql_num_tablets flag at system level, or with SPLIT INTO / SPLIT AT clauses at table or index level.
  • Shard large and growing tables into multiple tablets, one per node (or more), so they can be rebalanced and scaled out when adding more nodes.
  • For sharded tables, think about which columns will be accessed by a range scan, so that rows queried together with a range predicate stay co-located.
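As a minimal sketch, here is what these choices can look like in YSQL. The table and column names are hypothetical; HASH, ASC/DESC, SPLIT INTO, and SPLIT AT are the documented YSQL clauses.

    -- Hash sharding (hypothetical table): rows are distributed on the hash
    -- of device_id to load-balance across tablets, while DESC keeps each
    -- device's rows together, ordered by time, for range scans.
    CREATE TABLE events (
        device_id  bigint,
        created_at timestamptz,
        payload    jsonb,
        PRIMARY KEY (device_id HASH, created_at DESC)
    ) SPLIT INTO 8 TABLETS;

    -- Range sharding (hypothetical table): rows are stored in id order,
    -- and SPLIT AT pre-splits the range into three tablets.
    CREATE TABLE readings (
        id    bigint,
        value double precision,
        PRIMARY KEY (id ASC)
    ) SPLIT AT VALUES ((1000000), (2000000));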

Table partitioning

The previously discussed approach to sharding considers all nodes to be equal, whether they are located in the same data center or across multiple availability zones (AZs). You want to load-balance across all compute and storage instances, and scale out as needed. The organization of data is decided per table and index. However, you may want more control over where rows are placed, based on their values. This is where you partition a table at the SQL level, on column values, with a HASH function, a RANGE, or a LIST of values. This is a PostgreSQL feature, known as declarative partitioning, which can be used with YugabyteDB because it is fully code compatible with PostgreSQL. The typical motivations are listed below, followed by a DDL sketch:

  • To ensure predictable response time, with data close to the users.
  • For cost reasons, to lower cross-AZ, cross-region, or cross-cloud provider network traffic.
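Here is a minimal sketch of such a geo-partitioned table. The table, the region values, and the tablespace name are hypothetical, the cloud/region/zone values in the placement are placeholders for a real cluster topology, and keys and indexes are omitted for brevity.

    -- Hypothetical tablespace pinned to one region of the cluster.
    CREATE TABLESPACE eu_ts WITH (
        replica_placement='{"num_replicas": 3, "placement_blocks": [
            {"cloud":"aws","region":"eu-west-1","zone":"eu-west-1a","min_num_replicas":3}]}'
    );

    -- Parent table, partitioned at the SQL level on the region column.
    CREATE TABLE customers (
        id     bigint,
        region text,
        name   text
    ) PARTITION BY LIST (region);

    -- Each partition is placed in its own tablespace, close to its users.
    CREATE TABLE customers_eu PARTITION OF customers
        FOR VALUES IN ('EMEA') TABLESPACE eu_ts;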

Global indexes

After describing the levels of data distribution, let's explain why we need all of them. SQL, especially for OLTP, is complex. You have multiple keys: usually one surrogate key and one or many natural keys. They all need to be enforced by a unique index. And because a table can be partitioned on only one key, the others require a global index. A global index can itself be partitioned, but not on the same key as the table.
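As a minimal sketch with a hypothetical table: the surrogate key is the primary key that drives tablet sharding, and the natural key is enforced by a unique secondary index. In YugabyteDB this index is itself a distributed structure, sharded on its own key, which is what makes it a global index.

    -- Hypothetical table: the surrogate key shards the table on hash(user_id).
    CREATE TABLE users (
        user_id bigint GENERATED ALWAYS AS IDENTITY,
        email   text NOT NULL,
        PRIMARY KEY (user_id HASH)
    );

    -- The natural key is enforced by a global unique index, a distributed
    -- structure of its own, sharded on hash(email) rather than on user_id.
    CREATE UNIQUE INDEX users_email_uk ON users (email HASH);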

Summary

With a distributed SQL database like YugabyteDB, the purpose of sharding into tablets is to scale out at the row level. On top of this, you can create unique constraints, secondary indexes, foreign key constraints, and transactions. Partitions and tablespaces give more control over physical location. A distributed SQL database needs to combine both, and then be exposed as one database to which the application can connect through any node, transparently. Sharding operates on tablets for data distribution, applying a hash or range function to rows and global index entries. Partitioning operates on table partitions for data placement, applying a range or list of values defined on the table, with local indexes. Tablet sharding applies to both YCQL and YSQL, but partitioning is a YSQL feature. More details can be found in the YugabyteDB documentation.
