pgbench retry for repeatable read transactions — first (re)tries

Trying a no-patch solution for pgbench running on repeatable read transactions, using a custom script with PL/pgSQL

Franck Pachot
7 min read · Aug 19, 2019

In a previous post I was running pgBench on YugaByteDB at the serializable isolation level. Serializable is optimistic: transactions must be retried when they fail with a serialization error, but pgBench has no retry mode. A patch for this was proposed in several commit fests, but patch acceptance is a long journey in PostgreSQL.

Rather than patching pgbench.c, I'm trying to implement the retry logic in procedural code: PL/pgSQL. This is not easy because there are many limitations on transaction management in procedures. So let's start simple, with the Repeatable Read isolation level, and try to get most of the threads running to the end without failing.

pgBench simple-update builtin

I create a database for my demo with Repeatable Read as the default isolation level, and initialize the pgBench schema:

## restart the server
pg_ctl -l $PGDATA/logfile restart
## create the database and the procedure
psql -e <<'PSQL'
-- re-create the demo database and set repeatable read as default
drop database if exists franck;
create database franck;
alter database franck set default_transaction_isolation='repeatable read';
PSQL
pgbench --initialize --init-steps=dtgvpf franck
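
A quick sanity check that new sessions will inherit the intended level (the output shown is what I expect to see, not captured from this run):

psql franck -c 'show default_transaction_isolation;'
 default_transaction_isolation
-------------------------------
 repeatable read
(1 row)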

Then I run the built-in simple update, from 10 threads:

pgbench --builtin=simple-update --time 30 --jobs=10 --client=10 franck

Here is the disappointing result: my clients die as soon as they encounter a serialization error, and the run finishes with only a few threads still running. Those errors are normal in an MVCC database: maximum concurrency is achieved with optimistic locking, and it is up to the application to retry when encountering a serialization error.
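
To reproduce this error outside of pgbench, here is a minimal two-session scenario in the franck database, where repeatable read is the default (the session labels are just comments):

-- session 1
begin;
update pgbench_accounts set abalance = abalance + 1 where aid = 1;

-- session 2 (blocks, waiting on the row lock)
begin;
update pgbench_accounts set abalance = abalance + 1 where aid = 1;

-- session 1
commit;

-- session 2 now fails:
-- ERROR: could not serialize access due to concurrent update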

But pgBench has no option for that. To work around it, I create a procedure that does the same job as the simple-update builtin, so that I have a procedural language (PL/pgSQL) available to do those retries.

pgBench simple-update script

Here is what is actually hardcoded for the simple-update builtin:

cat > /tmp/simple-update.sql <<'CAT'
-- simple-update <builtin: simple update>
\set aid random(1, 100000 * :scale)
\set bid random(1, 1 * :scale)
\set tid random(1, 10 * :scale)
\set delta random(-5000, 5000)
BEGIN;
UPDATE pgbench_accounts SET abalance = abalance + :delta
WHERE aid = :aid;
SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
INSERT INTO pgbench_history (tid, bid, aid, delta, mtime)
VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
END;
CAT

I can run it with:

pgbench --file=/tmp/simple-update.sql --time 30 --jobs=10 --client=10 franck

Of course, the result is the same as with the built-in: threads die as soon as they encounter a serialization error.

pgBench simple-update anonymous block

My first idea was to run an anonymous block in order to add some procedural code for retries:

cat > /tmp/simple-update.sql <<'CAT'
-- simple-update <builtin: simple update>
\set aid random(1, 100000 * :scale)
\set bid random(1, 1 * :scale)
\set tid random(1, 10 * :scale)
\set delta random(-5000, 5000)
DO
$$
BEGIN;
UPDATE pgbench_accounts SET abalance = abalance + :delta
WHERE aid = :aid;
SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
INSERT INTO pgbench_history (tid, bid, aid, delta, mtime)
VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
END;
$$
CAT

but this does not work, because bind variables cannot be used in anonymous blocks:

ERROR: bind message supplies 7 parameters, but prepared statement "" requires 0
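
A possible workaround, which I did not pursue here, is to pass the values through custom session settings instead of bind variables, with set_config() on the pgbench side and current_setting() inside the block. A sketch (my.aid and the other parameter names are arbitrary):

select set_config('my.aid',   cast(:aid   as text), false),
       set_config('my.delta', cast(:delta as text), false);
DO $$
declare
  v_aid   int := cast(current_setting('my.aid')   as int);
  v_delta int := cast(current_setting('my.delta') as int);
begin
  update pgbench_accounts set abalance = abalance + v_delta
   where aid = v_aid;
end $$;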

pgBench simple-update procedure

Then I create a stored procedure to hold that procedural code.

Here is the call:

cat > /tmp/simple-update.sql <<'CAT'
-- simple-update <builtin: simple update>
\set aid random(1, 100000 * :scale)
\set bid random(1, 1 * :scale)
\set tid random(1, 10 * :scale)
\set delta random(-5000, 5000)
call SIMPLE_UPDATE_RETRY(:aid, :bid, :tid, :delta);
SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
CAT

I keep the SELECT in the client, as I cannot return a result set from a procedure.
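
An alternative would be an INOUT parameter, which CALL returns as a one-row result. Here is a sketch (the SIMPLE_UPDATE_RETURN name is hypothetical, and I did not take this route):

create or replace procedure SIMPLE_UPDATE_RETURN
(p_aid integer, p_delta integer, inout p_abalance integer) AS $$
BEGIN
  UPDATE pgbench_accounts SET abalance = abalance + p_delta
   WHERE aid = p_aid
  RETURNING abalance INTO p_abalance;
END;
$$ language plpgsql;
-- call SIMPLE_UPDATE_RETURN(:aid, :delta, null); returns the new abalance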

The UPDATE and INSERT are run inside the procedure:

psql -e <<'PSQL'
\connect franck
create or replace procedure SIMPLE_UPDATE_RETRY
(p_aid integer, p_bid integer, p_tid integer, p_delta integer) AS $$
BEGIN
  UPDATE pgbench_accounts SET abalance = abalance + p_delta
   WHERE aid = p_aid;
  INSERT INTO pgbench_history (tid, bid, aid, delta, mtime)
   VALUES (p_tid, p_bid, p_aid, p_delta, CURRENT_TIMESTAMP);
END;
$$ language plpgsql;
PSQL

Of course, I’ll get the same error as there’s no retry logic yet:

pgbench --file=/tmp/simple-update.sql --time 30 --jobs=10 --client=10 franck
ERROR: could not serialize access due to concurrent update

pgBench simple-update procedure with retry

Now that I have a procedure I can add the retry logic:

  • a loop to be able to retry
  • in the loop, a block with an exception handler
  • when no error is encountered, exit the loop
  • when an error is encountered, roll back and continue the loop
  • when there are too many retries, finally raise the error

psql -e <<'PSQL'
\connect franck
create or replace procedure SIMPLE_UPDATE_RETRY
(p_aid integer, p_bid integer, p_tid integer, p_delta integer) AS $$
declare
  retries integer := 0;
BEGIN
  loop
    begin
      UPDATE pgbench_accounts SET abalance = abalance + p_delta
       WHERE aid = p_aid;
      INSERT INTO pgbench_history (tid, bid, aid, delta, mtime)
       VALUES (p_tid, p_bid, p_aid, p_delta, CURRENT_TIMESTAMP);
      -- exit the loop as we got no error after all DML
      exit;
    exception
      when serialization_failure then
        -- need to rollback to run in a new snapshot
        rollback;
        -- give up after 10 retries
        if retries > 10 then
          raise notice 'Give Up after % retries. tid=%', retries, p_tid;
          raise;
        end if;
        -- continue the retry loop
    end;
    retries := retries + 1;
  end loop;
  commit;
  if retries > 2 then
    raise notice 'Required % retries (tid=%)', retries, p_tid;
  end if;
END;
$$ language plpgsql;
PSQL
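
A quick standalone smoke test before running pgbench again (the values are arbitrary):

psql franck -c 'call SIMPLE_UPDATE_RETRY(1, 1, 1, 100);'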

Note that the ROLLBACK is required. The exception block does an implicit rollback to a savepoint, but then it continues to run in the same snapshot. Without starting another transaction, the same serialization error would be encountered again.
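
A rough illustration of why (a sketch, assuming a concurrent session updates and commits the same row just after our snapshot is taken):

begin; -- repeatable read is the database default here
select 1; -- the snapshot is taken at the first statement
-- ... meanwhile another session updates aid=1 and commits ...
savepoint s;
update pgbench_accounts set abalance = abalance + 1 where aid = 1;
-- ERROR: could not serialize access due to concurrent update
rollback to savepoint s;
update pgbench_accounts set abalance = abalance + 1 where aid = 1;
-- fails again: rollback to savepoint kept the same snapshot
rollback;
-- a real ROLLBACK ends the transaction; the next one gets a new snapshot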

Here is the good news: a run can now complete without any serialization error. However, with less luck, I can still encounter failures: the serialization errors now occur on COMMIT.

That's another big surprise when coming from Oracle. Oracle has weaker isolation levels, but the goal is to avoid errors at commit time, which is critical for OLTP, and even more so with distributed transactions. With PostgreSQL, the COMMIT itself can raise a serialization error. And unfortunately, my PL/pgSQL procedure cannot prevent that, because I cannot commit within an exception block. That's another limitation of transaction control in procedures.
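
For reference, here is the limitation in its simplest form: COMMIT is rejected anywhere inside a block that has an EXCEPTION clause, because such a block runs in a subtransaction (a minimal sketch, not from the pgbench run):

do $$
begin
  begin
    -- ... DML that may fail at commit time ...
    commit;
  exception
    when serialization_failure then
      rollback; -- the retry we would like to do
  end;
end $$;
-- ERROR: cannot commit while a subtransaction is active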

protocol=prepared

I used the default "simple" protocol here. Obviously, I want to use "prepared", because I don't want to re-parse the statements for each execution: my benchmark goal is to measure the execution rate, not the parsing. But running what I have here (a call to a procedure with a ROLLBACK in the exception handler) crashes the server process when using prepared statements.
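
For reference, that run is the same command with the protocol option:

pgbench --protocol=prepared --file=/tmp/simple-update.sql --time 30 --jobs=10 --client=10 franck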

And with PostgreSQL, when one server process crashes, all the others are terminated as a precaution, because shared memory may have been corrupted. I even got some cases where instance recovery was required. That's annoying and not easy to troubleshoot. The core dump shows a segmentation fault at pquery.c:1241, when reading pstmt->utilityStmt.

In summary… small solutions and big questions

The retry logic is mandatory when benchmarking at the repeatable read or serializable isolation level. Pgbench is there to do benchmarks for PostgreSQL flavors, and I want to also use it to benchmark other databases with PostgreSQL-compatible APIs, like CockroachDB or YugaByteDB. Those databases require the serializable isolation level, at least for some features. But I faced many limitations:

  • I cannot use anonymous blocks with bind variables ➛ stored procedure
  • I cannot return the SELECT result from a stored procedure (and cannot do transaction management in functions) ➛ the SELECT is done outside of the transaction
  • I cannot commit in exception blocks ➛ the possibility of a non-retried error remains

If you have any remarks about this, please tell me (Twitter: @FranckPachot). Of course, I'll investigate the core dump and other possibilities further. In my opinion, those retries must be done in the server and not in a client loop: it makes no sense to add extra round-trips and context switches for this. Pgbench is there to run TPC-B-like benchmarks, where each transaction should be at most one call to the database.
