PITR snapshot: an easy flashback / backtrack for application releases

initially published on dev.to

During an application release, you may run some DDL and DML to bring the database from one version to another. If anything fails, for whatever reason, you don’t want to leave the database in an intermediate state: you want to restore the initial state.

  • The first solution is an in-place Point In Time Recovery (PITR): restore a backup and recover the transactions up to the initial point in time. But doing it this way can take long, and the fallback procedure is part of the maintenance window, however unlikely the decision to roll back.
  • Another solution is transactional DDL, where you run everything in a transaction that can be rolled back. PostgreSQL is quite nice there, supporting transactional DDL and rolling back quickly. But long-running transactions may not be the best idea, and you don’t really need them because the application is stopped.
  • The third solution is what Oracle calls Flashback Database, and Amazon Aurora calls Backtrack. It is similar to PITR but, rather than a restore + recover, it keeps a snapshot from the start of the maintenance window that is easy to revert to.
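As a quick illustration of the second option, here is a minimal sketch of PostgreSQL transactional DDL; mytable stands for any application table:

```sql
begin;
alter table mytable add column c int;   -- DDL runs inside the transaction
update mytable set c = 0;               -- DML in the same transaction
rollback;                               -- both the new column and the update are undone
```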

The snapshot solution depends on the storage. YugabyteDB stores tables and indexes in LSM-Trees, which are append-only: on the filesystem, they materialize as SST files, written once and never updated. With this technology, snapshots are easy: nothing to write, just keep the files. It is a bit more complicated to synchronize the metadata (the PostgreSQL catalog), which is stored on the yb-master. To automate this, we can create a snapshot schedule for the duration of the maintenance window.

Example

I’m starting an RF=3 cluster on my laptop from my ybdemo/docker/yb-lab/ and I set two aliases: ysqlsh for SQL commands, and yb-admin for snapshot commands:

alias ysqlsh="\
docker exec -it yb-tserver-0 ysqlsh -h yb-tserver-0 \
"

alias yb-admin="\
docker exec -it yb-master-0 /home/yugabyte/bin/yb-admin \
--master_addresses $(echo yb-master-{0..2}:7100|tr ' ' ,)\
"

sh gen-yb-docker-compose.sh rf3

I create a database with a demo table:

ysqlsh
create database myapp;
\c myapp
create table mytable (a uuid primary key, b bigint unique);
create extension pgcrypto;
insert into mytable select gen_random_uuid(), generate_series(1,1000);
\q

Here are my tables and indexes: mytable, with OID 16426, holds the table rows in its primary key structure, indexed on "a", and mytable_b_key is the unique index on "b":

http://localhost:7000/tables

Take a snapshot

When the maintenance window starts, after stopping the application and before running the changes in the database, I take a snapshot by starting a snapshot schedule with yb-admin create_snapshot_schedule, giving a frequency, a retention, and the name of the database prefixed with ysql.:

Franck@YB:/home/ybdemo/docker/yb-lab $ 

yb_sched=$(
yb-admin create_snapshot_schedule 60 2880 ysql.myapp |
tee /dev/stderr |
jq -r .schedule_id ;
) ; echo "${yb_sched}"

{
"schedule_id": "46178a8f-507f-40cf-ab4d-42e677d4de5d"
}
46178a8f-507f-40cf-ab4d-42e677d4de5d

This will take an hourly snapshot (every 60 minutes) that is kept for two days (2880 minutes). This allows recovery within 24 hours without increasing the MVCC retention too much (one hour here). Here, the goal is to cover the maintenance window and be able to recover quickly by applying, at most, one hour of WAL. I need only one snapshot for that, and I’ll delete the snapshot schedule at the end.
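Both arguments to create_snapshot_schedule are in minutes; a quick sanity check of these two numbers:

```shell
# create_snapshot_schedule arguments are minutes: interval=60, retention=2880
interval=60 retention=2880
echo "one snapshot every $(( interval / 60 )) hour, kept $(( retention / 60 / 24 )) days"
# → one snapshot every 1 hour, kept 2 days
```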

The first snapshot is taken in the background. I want to be sure that the snapshot is complete:

Franck@YB:/home/ybdemo/docker/yb-lab $

while yb-admin list_snapshots | grep CREATING
do sleep 1 ; done

Franck@YB:/home/ybdemo/docker/yb-lab $

yb-admin list_snapshots

Snapshot UUID State Creation Time
4aeaba29-7ecc-4974-9cc3-5a9b6a6f5ab4 COMPLETE 2022-06-16 11:51:10.543104
No snapshot restorations
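The loop above polls every second and would spin forever if the snapshot never completes. For release scripts, a small timeout wrapper helps; wait_gone and its arguments are my own naming, not a yb-admin feature:

```shell
# wait until the output of <cmd...> no longer contains <pattern>,
# or fail after <timeout_s> seconds
wait_gone() {                 # usage: wait_gone <pattern> <timeout_s> <cmd...>
  local pattern=$1 timeout=$2 t=0 ; shift 2
  while "$@" | grep -q "$pattern" ; do
    sleep 1 ; t=$(( t + 1 ))
    [ "$t" -ge "$timeout" ] && return 1
  done
}
# example: wait_gone CREATING 600 yb-admin list_snapshots
```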

This enables Point In Time Recovery (PITR), allowing recovery to any point in time after this first snapshot. As the goal is to revert to the start of the maintenance window, I take note of this timestamp:

Franck@YB:/home/ybdemo/docker/yb-lab $ 

yb_pit0=$(
ysqlsh -qAt -c "
select to_char(now() at time zone 'utc','YYYY-MM-DD HH24:MI:SS')
" myapp
) ; echo "${yb_pit0}"

2022-06-16 11:53:14

I get it from the database. You can also get it from Linux with date +%s, but I find it convenient to log it at the start of the DDL + DML that will be run.
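For reference, the same timestamp can be produced from the shell; this is a sketch assuming the client machine's clock matches the cluster's:

```shell
# current time in UTC, in the same format as captured from the database
date -u +'%Y-%m-%d %H:%M:%S'
# or as Unix epoch seconds, the date +%s form mentioned above
date +%s
```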

Some DDL and DML

In this example I’ll do two things:

  • remove half of the rows, as if purging old data
  • change the primary key from column a to column b, which typically requires DDL + DML through an intermediate table
ysqlsh myapp
begin transaction;
delete from mytable where b<=500;
commit;
alter table mytable rename to tmp_mytable;
create table mytable (a uuid unique, b bigint primary key);
insert into mytable (a,b) select a,b from tmp_mytable;
drop table tmp_mytable;
\q

Now that PITR is enabled, I still see the dropped table and index, because their metadata is required in case a previous state of the data is restored. They are still there, but marked as “Hidden”:

http://localhost:7000/tables

The new table, with “b” as the primary key, is the visible one:

Franck@YB:/home/ybdemo/docker/yb-lab $ 

ysqlsh -c '\d mytable' myapp

Table "public.mytable"
Column | Type | Collation | Nullable | Default
--------+--------+-----------+----------+---------
a | uuid | | |
b | bigint | | not null |
Indexes:
"mytable_pkey" PRIMARY KEY, lsm (b HASH)
"mytable_a_key" UNIQUE CONSTRAINT, lsm (a HASH)

Franck@YB:/home/ybdemo/docker/yb-lab $

ysqlsh -c 'select count(*) from mytable' myapp

count
---------
500
(1 row)
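When such verifications run in a scripted release, it helps to fail fast rather than eyeball the output. A minimal helper; expect is a hypothetical name of mine, not a ysqlsh feature:

```shell
# compare an expected value with an actual one; non-zero exit on mismatch
expect() {                    # usage: expect <expected> <actual>
  [ "$2" = "$1" ] || { echo "expected $1, got $2" >&2 ; return 1 ; }
}
# example: expect 500 "$(ysqlsh -qAt -c 'select count(*) from mytable' myapp)"
```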

Flashback

Let’s say that this change doesn’t work as expected: the application test shows that something fails with the new schema. We want to revert quickly and open the application again.

Franck@YB:/home/ybdemo/docker/yb-lab $

yb-admin restore_snapshot_schedule ${yb_sched} "${yb_pit0}"

{
"snapshot_id": "9fa70057-1404-4487-ae05-3f023972b2dd",
"restoration_id": "121b87bf-81c7-4950-8093-6f801c9433c7"
}

The restore happens in the background. I want to be sure that the restore is complete:

Franck@YB:/home/ybdemo/docker/yb-lab $

while yb-admin list_snapshots | grep RESTORING
do sleep 1 ; done

yb-admin list_snapshots

Snapshot UUID State Creation Time
4aeaba29-7ecc-4974-9cc3-5a9b6a6f5ab4 COMPLETE 2022-06-16 11:51:10.543104
9fa70057-1404-4487-ae05-3f023972b2dd COMPLETE 2022-06-16 11:56:28.584306
Restoration UUID State
121b87bf-81c7-4950-8093-6f801c9433c7 RESTORED

Verification

Now, at this point in time, the new tables are hidden and the old ones are visible:

http://localhost:7000/tables

Both metadata and data have been restored as of the initial point in time:

Franck@YB:/home/ybdemo/docker/yb-lab $ 

ysqlsh -c '\d mytable' myapp

Table "public.mytable"
Column | Type | Collation | Nullable | Default
--------+--------+-----------+----------+---------
a | uuid | | not null |
b | bigint | | |
Indexes:
"mytable_pkey" PRIMARY KEY, lsm (a HASH)
"mytable_b_key" UNIQUE CONSTRAINT, lsm (b HASH)

Franck@YB:/home/ybdemo/docker/yb-lab $

ysqlsh -c 'select count(*) from mytable' myapp

count
---------
1000
(1 row)

All is back as of 11:53:14, both data and metadata.

Forward and backward

In the list_snapshots output above, we have seen two snapshots because one was taken before the restore. This means that, in case of doubt, we can roll forward and backward within this timeline:

Franck@YB:/home/ybdemo/docker/yb-lab $

yb-admin restore_snapshot_schedule ${yb_sched} "2022-06-14 16:37:31"

{
"snapshot_id": "6c223f6c-b033-41dc-a1d9-404607b79252",
"restoration_id": "ab81a8f8-eee1-410a-9e06-65edb6248160"
}

Franck@YB:/home/ybdemo/docker/yb-lab $

ysqlsh -c 'select count(*) from mytable' myapp
count
-------
500
(1 row)


Franck@YB:/home/ybdemo/docker/yb-lab $

yb-admin restore_snapshot_schedule ${yb_sched} "2022-06-14 16:37:00"

{
"snapshot_id": "6c223f6c-b033-41dc-a1d9-404607b79252",
"restoration_id": "edc3981d-fc77-4564-b77a-05d2e05d5483"
}

Franck@YB:/home/ybdemo/docker/yb-lab $

ysqlsh -c 'select count(*) from mytable' myapp
count
-------
500
(1 row)


Franck@YB:/home/ybdemo/docker/yb-lab $

yb-admin restore_snapshot_schedule ${yb_sched} "2022-06-14 16:36:30"

{
"snapshot_id": "6c223f6c-b033-41dc-a1d9-404607b79252",
"restoration_id": "3c51073f-2d43-418c-b468-29b3c2fee160"
}

Franck@YB:/home/ybdemo/docker/yb-lab $

ysqlsh -c 'select count(*) from mytable' myapp

count
-------
1000
(1 row)


Franck@YB:/home/ybdemo/docker/yb-lab $

yb-admin restore_snapshot_schedule ${yb_sched} "2022-06-14 16:36:45"

{
"snapshot_id": "6c223f6c-b033-41dc-a1d9-404607b79252",
"restoration_id": "d2e137f8-86fa-41bf-ace8-864bf1572fc3"
}

Franck@YB:/home/ybdemo/docker/yb-lab $

ysqlsh -c 'select count(*) from mytable' myapp

ERROR: relation "mytable" does not exist
LINE 1: select count(*) from mytable
^

Franck@YB:/home/ybdemo/docker/yb-lab $

yb-admin restore_snapshot_schedule ${yb_sched} "${yb_pit0}"

{
"snapshot_id": "6c223f6c-b033-41dc-a1d9-404607b79252",
"restoration_id": "4ef52933-a40d-4764-8dfd-f01f6298b966"
}

Franck@YB:/home/ybdemo/docker/yb-lab $

ysqlsh -c 'select count(*) from mytable' myapp

count
-------
1000
(1 row)

(the output here is copied from the dev.to article, so the timestamps don’t match the run above, which I re-ran)

You can see that I was able to go back to the future, just before the restore, then back in time again: to when the table still had 1000 rows, before my delete, even to the moment when the old table was renamed but the new one not yet created, and finally back to the initial state, which was my goal.

Delete schedule

When all is validated, there’s no need to keep the snapshots. Of course, you can keep them if you want to be able to do point-in-time recovery later, but for this example I need them only to cover the maintenance window.

Franck@YB:/home/ybdemo/docker/yb-lab $

yb-admin delete_snapshot_schedule ${yb_sched}

{
"schedule_id": "46178a8f-507f-40cf-ab4d-42e677d4de5d"
}

Franck@YB:/home/ybdemo/docker/yb-lab $

yb-admin list_snapshots

No snapshots
Restoration UUID State
727a168b-6666-42a1-b67f-4f5148180f88 RESTORED
ab81a8f8-eee1-410a-9e06-65edb6248160 RESTORED
4ef52933-a40d-4764-8dfd-f01f6298b966 RESTORED
edc3981d-fc77-4564-b77a-05d2e05d5483 RESTORED
3c51073f-2d43-418c-b468-29b3c2fee160 RESTORED
d2e137f8-86fa-41bf-ace8-864bf1572fc3 RESTORED

I have no snapshots anymore, just some traces of my restores.

And all is clean, no hidden table:

http://localhost:7000/tables

Summary

YugabyteDB doesn’t support transactional DDL yet, but the need to protect a maintenance window full of DDL and DML is covered by fast Point In Time Recovery. Doing the same with PostgreSQL is not easy and requires a copy-on-write filesystem, which impacts performance. YugabyteDB storage, in SST files, allows fast snapshots on each node. This, combined with YugabyteDB’s implementation of MVCC (Multi-Version Concurrency Control) with Hybrid Logical Clock timestamps, makes it possible to travel in time to a state that is consistent across all nodes. The MVCC retention, usually set by timestamp_history_retention_interval_sec for transaction snapshots, is internally increased to cover down to the previous PITR snapshot.

Note: PITR is a work in progress; there are currently (2.13) some limitations. They are documented, and the roadmap is in PITR: Tracking issue #7120.

Franck Pachot

Developer Advocate at Yugabyte, Open Source distributed SQL database 🚀 Also Oracle ACE Director, Oracle Certified Master, AWS Data Hero, OakTable member