pgbench retry for repeatable read transactions — first (re)tries

Trying a no-patch solution for pgbench running on repeatable read transactions, using a custom script with PL/pgSQL

Franck Pachot
7 min read · Aug 19, 2019

In a previous post I was running pgBench on YugaByteDB at the serializable isolation level. Serializable is optimistic: transactions must be retried when they fail with a serialization error, but pgBench has no retry mode. A patch for this was proposed in several commit fests, but patch acceptance is a long journey in PostgreSQL.

Rather than patching pgbench.c, I'm trying to implement the retry logic in procedural code: PL/pgSQL. This is not easy because there are many limitations on transaction management in procedures. So let's start simple, with the Repeatable Read isolation level, and try to get most of the threads running to the end without failing.

pgBench simple-update builtin

I create a database for my demo with Repeatable Read as the default isolation level, and initialize the pgBench schema:

## restart the server
pg_ctl -l $PGDATA/logfile restart
## create the database and the procedure
psql -e <<'PSQL'
-- re-create the demo database and set repeatable read as default
drop database if exists franck;
create database franck;
alter database franck set default_transaction_isolation='repeatable read';
PSQL
pgbench --initialize --init-steps=dtgvpf franck
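
A quick sanity check that new sessions will inherit the intended level (the output shown is what I expect to see, not captured from this run):

psql franck -c 'show default_transaction_isolation;'
 default_transaction_isolation
-------------------------------
 repeatable read
(1 row)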

Then I run the built-in simple update, from 10 threads:

pgbench --builtin=simple-update --time 30 --jobs=10 --client=10 franck

Here is the disappointing result: my clients die as soon as they encounter a serialization error, and the run finishes with only a few threads still running. Those errors are normal in an MVCC database: maximum concurrency is achieved with optimistic locking, and it is up to the application to retry when encountering a serialization error.
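
To reproduce this error outside of pgbench, here is a minimal two-session scenario in the franck database, where repeatable read is the default (the session labels are just comments):

-- session 1
begin;
update pgbench_accounts set abalance = abalance + 1 where aid = 1;

-- session 2 (blocks, waiting on the row lock)
begin;
update pgbench_accounts set abalance = abalance + 1 where aid = 1;

-- session 1
commit;

-- session 2 now fails:
-- ERROR: could not serialize access due to concurrent update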

But pgBench has no option for that. To work around it, I create a procedure that does the same job as the simple-update builtin, so that I have a procedural language (PL/pgSQL) available to do those retries.

pgBench simple-update script

Here is what is actually hardcoded for the simple-update builtin:

cat > /tmp/simple-update.sql <<'CAT'
-- simple-update <builtin: simple update>
\set aid random(1, 100000 * :scale)
\set bid random(1, 1 * :scale)
\set tid random(1, 10 * :scale)
\set delta random(-5000, 5000)
BEGIN;
UPDATE pgbench_accounts SET abalance = abalance + :delta
WHERE aid = :aid;
SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
INSERT INTO pgbench_history (tid, bid, aid, delta, mtime)
VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
END;
CAT

I can run it with:

pgbench --file=/tmp/simple-update.sql --time 30 --jobs=10 --client=10 franck

Of course, the result is the same as with the built-in: threads die as soon as they encounter a serialization error.

pgBench simple-update anonymous block

My first idea was to run an anonymous block in order to add some procedural code for retries:

cat > /tmp/simple-update.sql <<'CAT'
-- simple-update <builtin: simple update>
\set aid random(1, 100000 * :scale)
\set bid random(1, 1 * :scale)
\set tid random(1, 10 * :scale)
\set delta random(-5000, 5000)
DO
$$
BEGIN;
UPDATE pgbench_accounts SET abalance = abalance + :delta
WHERE aid = :aid;
SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
INSERT INTO pgbench_history (tid, bid, aid, delta, mtime)
VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
END;
$$
CAT

but this does not work, because bind variables cannot be used in anonymous blocks:

ERROR: bind message supplies 7 parameters, but prepared statement "" requires 0
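
A possible workaround, which I did not pursue here, is to pass the values through custom session settings instead of bind variables, with set_config() on the pgbench side and current_setting() inside the block. A sketch (my.aid and the other parameter names are arbitrary):

select set_config('my.aid',   cast(:aid   as text), false),
       set_config('my.delta', cast(:delta as text), false);
DO $$
declare
  v_aid   int := cast(current_setting('my.aid')   as int);
  v_delta int := cast(current_setting('my.delta') as int);
begin
  update pgbench_accounts set abalance = abalance + v_delta
   where aid = v_aid;
end $$;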

pgBench simple-update procedure

Then I create a stored procedure to hold that procedural code.

Here is the call:

cat > /tmp/simple-update.sql <<'CAT'
-- simple-update <builtin: simple update>
\set aid random(1, 100000 * :scale)
\set bid random(1, 1 * :scale)
\set tid random(1, 10 * :scale)
\set delta random(-5000, 5000)
call SIMPLE_UPDATE_RETRY(:aid, :bid, :tid, :delta);
SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
CAT

I keep the SELECT in the client, as I cannot return a result set from a procedure.
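
An alternative would be an INOUT parameter, which CALL returns as a one-row result. Here is a sketch (the SIMPLE_UPDATE_RETURN name is hypothetical, and I did not take this route):

create or replace procedure SIMPLE_UPDATE_RETURN
(p_aid integer, p_delta integer, inout p_abalance integer) AS $$
BEGIN
  UPDATE pgbench_accounts SET abalance = abalance + p_delta
   WHERE aid = p_aid
  RETURNING abalance INTO p_abalance;
END;
$$ language plpgsql;
-- call SIMPLE_UPDATE_RETURN(:aid, :delta, null); returns the new abalance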

The UPDATE and INSERT are run inside the procedure:

psql -e <<'PSQL'
\connect franck
create or replace procedure SIMPLE_UPDATE_RETRY
(p_aid integer, p_bid integer, p_tid integer, p_delta integer) AS $$
BEGIN
  UPDATE pgbench_accounts SET abalance = abalance + p_delta
   WHERE aid = p_aid;
  INSERT INTO pgbench_history (tid, bid, aid, delta, mtime)
   VALUES (p_tid, p_bid, p_aid, p_delta, CURRENT_TIMESTAMP);
END;
$$ language plpgsql;
PSQL

Of course, I’ll get the same error as there’s no retry logic yet:

pgbench --file=/tmp/simple-update.sql --time 30 --jobs=10 --client=10 franck
ERROR: could not serialize access due to concurrent update

pgBench simple-update procedure with retry

Now that I have a procedure I can add the retry logic:

  • a loop to be able to retry
  • in the loop, a block with an exception handler
  • when no error is encountered, exit the loop
  • when an error is encountered, roll back and continue the loop
  • when there are too many retries, finally raise the error

psql -e <<'PSQL'
\connect franck
create or replace procedure SIMPLE_UPDATE_RETRY
(p_aid integer, p_bid integer, p_tid integer, p_delta integer) AS $$
declare
  retries integer := 0;
BEGIN
  loop
    begin
      UPDATE pgbench_accounts SET abalance = abalance + p_delta
       WHERE aid = p_aid;
      INSERT INTO pgbench_history (tid, bid, aid, delta, mtime)
       VALUES (p_tid, p_bid, p_aid, p_delta, CURRENT_TIMESTAMP);
      -- exit the loop as we got no error after all DML
      exit;
    exception
      when serialization_failure then
        -- need to rollback to run in a new snapshot
        rollback;
        -- give up after 10 retries
        if retries > 10 then
          raise notice 'Give Up after % retries. tid=%', retries, p_tid;
          raise;
        end if;
        -- continue the retry loop
    end;
    retries := retries + 1;
  end loop;
  commit;
  if retries > 2 then
    raise notice 'Required % retries (tid=%)', retries, p_tid;
  end if;
END;
$$ language plpgsql;
PSQL
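
A quick standalone smoke test before running pgbench again (the values are arbitrary):

psql franck -c 'call SIMPLE_UPDATE_RETRY(1, 1, 1, 100);'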

Note that the ROLLBACK is required. The exception block does an implicit rollback to a savepoint, but then it continues to run in the same snapshot. Without starting another transaction, the same serialization error would be encountered again.
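
A rough illustration of why (a sketch, assuming a concurrent session updates and commits the same row just after our snapshot is taken):

begin; -- repeatable read is the database default here
select 1; -- the snapshot is taken at the first statement
-- ... meanwhile another session updates aid=1 and commits ...
savepoint s;
update pgbench_accounts set abalance = abalance + 1 where aid = 1;
-- ERROR: could not serialize access due to concurrent update
rollback to savepoint s;
update pgbench_accounts set abalance = abalance + 1 where aid = 1;
-- fails again: rollback to savepoint kept the same snapshot
rollback;
-- a real ROLLBACK ends the transaction; the next one gets a new snapshot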

Here is the good news: a run can now complete without any serialization error. However, with less luck, I can still encounter failures: the serialization errors now occur on COMMIT.

That's another big surprise when coming from Oracle. Oracle has weaker isolation levels, but the goal is to avoid errors at commit time, which is critical for OLTP, and even more so with distributed transactions. With PostgreSQL, the COMMIT itself can raise a serialization error. And unfortunately, my PL/pgSQL procedure cannot prevent that, because I cannot commit within an exception block. That's another limitation of transaction control in procedures.
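
For reference, here is the limitation in its simplest form: COMMIT is rejected anywhere inside a block that has an EXCEPTION clause, because such a block runs in a subtransaction (a minimal sketch, not from the pgbench run):

do $$
begin
  begin
    -- ... DML that may fail at commit time ...
    commit;
  exception
    when serialization_failure then
      rollback; -- the retry we would like to do
  end;
end $$;
-- ERROR: cannot commit while a subtransaction is active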

protocol=prepared

I used the default "simple" protocol here. Obviously, I want to use "prepared", because I don't want to re-parse the statements for each execution: my benchmark goal is to measure the execution rate, not the parsing. But running what I have here (a call to a procedure with a ROLLBACK in the exception handler) crashes the server process when using prepared statements.
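
For reference, that run is the same command with the protocol option:

pgbench --protocol=prepared --file=/tmp/simple-update.sql --time 30 --jobs=10 --client=10 franck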

And with PostgreSQL, when one server process crashes, all the others are terminated as a precaution, because shared memory may have been corrupted. I even got some cases where instance recovery was required. That's annoying and not easy to troubleshoot. The core dump shows a segmentation fault at pquery.c:1241, when reading pstmt->utilityStmt.

In summary… small solutions and big questions

The retry logic is mandatory when benchmarking at the repeatable read or serializable isolation level. Pgbench is there to do benchmarks for PostgreSQL flavors, and I want to also use it to benchmark other databases with PostgreSQL-compatible APIs, like CockroachDB or YugaByteDB. Those databases require the serializable isolation level, at least for some features. But I faced many limitations:

  • I cannot use anonymous blocks with bind variables ➛ stored procedure
  • I cannot return the SELECT result from a stored procedure (and cannot do transaction management in functions) ➛ the SELECT is done outside of the transaction
  • I cannot commit in exception blocks ➛ the possibility of a non-retried error remains

If you have any remarks about this, please tell me (Twitter: @FranckPachot). Of course, I'll investigate the core dump and other possibilities further. In my opinion, those retries must be done in the server and not in a client loop: it makes no sense to add extra round-trips and context switches for this. Pgbench is there to run TPC-B-like benchmarks, where each transaction should be at most one call to the database.
