Distributed SQL Essentials: Sharding and Partitioning in YugabyteDB

  • At the query level (YSQL), using the PostgreSQL syntax, the user partitions a logical table into multiple ones, based on column values. This is called table partitioning.
  • At the storage level (DocDB), using the primary key and index column definition, we distribute rows into multiple tablets. We call this sharding.
  • At the file level, the rows — or index entries — are stored on the key, within each Sorted Sequence Table (SST).

Sharded vs. Distributed

Application sharding

Sharded databases

Distributed SQL Databases

Yugabyte Distributed SQL

“SST files”

Tablet sharding

  • The colocated property, or tablegroup, defines which SQL tables (i.e., indexes or partitions, as we will see later) are sharded, with multiple dedicated tablets. And which ones are colocated, replicated but not sharded.
  • The primary key or index column definition defines the sharding method, with the HASH or range (ASC or DESC) sharding. This is where we choose to distribute data on the hash column to load balance and avoid hot points. It’s also where we keep rows together on the ascending or descending columns for range scans (like ‘<‘, >’, ‘BETWEEN’ predicates)
  • Optionally, the additional definition on the number of tablets with the --ysql_num_tablets at system level or SPLIT AT/INTO a table at index level.
  • Co-locate the small tables together to reduce the memory footprint.
  • Shard the large and growing tables into multiple tablets, one per node (or more), to be able to rebalance and scale them out when adding more nodes.
  • For sharded tables, think about which columns will be accessed by a range scan, when rows are queried together using a range predicate.

Table partitioning

  • For legal reasons, when you are required to store data from some countries in a specific region.
  • To ensure predictable response time, with data close to the users.
  • For cost reasons, to lower cross-AZ, cross-region, or cross-cloud provider network traffic.

Global indexes

Summary

Developer Advocate at Yugabyte, Open Source distributed SQL database 🚀 Also Oracle ACE Director, Oracle Certified Master, AWS Data Hero, OakTable member

Love podcasts or audiobooks? Learn on the go with our new app.

Recommended from Medium

How to manage development projects effectively

What Makes Good Code? (2/7)

How I developed a laravel package that will generate all countries and states around the world…

Data safety and security on Notebook.ai just (quietly) got a whole lot better

Adapting Flink Slot Sharing Group For Better Performance in Data Processing Pipeline

Using Vega with Python and R: Problems met, solution and a demo

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
Franck Pachot

Franck Pachot

Developer Advocate at Yugabyte, Open Source distributed SQL database 🚀 Also Oracle ACE Director, Oracle Certified Master, AWS Data Hero, OakTable member

More from Medium

How can Apache Spark help your big data?

How to succeed and twice fail for AWS Glue Developer Endpoint and Apache Zeppelin

Is Apache Impala 65 to 200 times faster than Apache Hive on Tez.

Hadoop — Could not find or load main class <UserName> (Error Fix)