Do you know what you are measuring with pgbench?
pgbench is a benchmark application for PostgreSQL. You define some parameters for the workload (read-only or read-write, volume of data, number of threads, cursor sharing, …) and measure the number of transactions per second. pgbench is often used to compare two alternative environments: different PostgreSQL versions, different platforms, different table designs, …
However, a scientific approach should go beyond the simple correlation between the observed performance (transactions per second) and the configuration. Without a clear analysis and explanation of cause and effect, we cannot extrapolate from a single set of observations to a general recommendation. The goal of this post is to show what is behind this ‘transactions per second’ measure.
I’ll run another benchmark tool, focused on the platform: Kevin Closson’s pgio, which is designed exactly for this kind of analysis. Rather than trying to simulate all layers of an application (as pgbench does), we can focus on a specific component: the PostgreSQL shared buffer cache, the OS filesystem cache, the storage access, …
I’m using Brendan Gregg’s FlameGraph here to visualize the full stack sampled by perf record with the following flags:
perf record --call-graph dwarf -F99 -e cpu-cycles -a
I’ve compiled PostgreSQL server with the following flags:
./configure CFLAGS=" -fno-omit-frame-pointer" --enable-debug
pgbench
I’ve initialized the pgbench database with a small scale (about 100MB), which is the only way to keep the pgbench activity focused: with such a small size, there will be no physical reads:
pgbench --initialize --scale=8 pgio
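As a quick sanity check on that size (a rough sketch: each pgbench scale unit adds 100,000 rows to pgbench_accounts, at roughly 16MB on disk per unit — the MB figure is an approximation, not an exact value):

```python
# Rough estimate of the pgbench dataset size for a given --scale.
# Each scale unit adds 100,000 rows to pgbench_accounts; the ~16 MB
# per unit figure is approximate and varies with version and fillfactor.
ROWS_PER_SCALE = 100_000
MB_PER_SCALE = 16  # approximate on-disk size per scale unit

def pgbench_estimate(scale):
    """Return (rows in pgbench_accounts, approximate size in MB)."""
    return scale * ROWS_PER_SCALE, scale * MB_PER_SCALE

rows, size_mb = pgbench_estimate(8)
print(rows, size_mb)  # 800000 rows, ~128 MB: small enough to stay in cache
```

With scale=8 the whole dataset fits in the shared buffers, which is exactly what keeps physical reads out of the picture.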
With the same idea in mind, I run a read-only workload with 12 threads:
pgbench --no-vacuum --select-only --protocol=prepared --client=12 --jobs=12 --time=120 pgio &
Then, after waiting a few minutes for the warm-up, I record perf events:
sudo perf record --call-graph dwarf -F99 -e cpu-cycles -a \
-o /tmp/perf.data sleep 60
The result is parsed to produce a flamegraph of stack samples:
sudo perf script -i /tmp/perf.data | ./stackcollapse-perf.pl | ./flamegraph.pl --width=1200 --hash --cp > /tmp/perf.svg
Here is the result (.svg)
This is what happened in the system during the pgbench test. pgbench, the client, spends its time in PQsendQueryPrepared and PQconsumeInput, which is the minimum that can be done by a well-tuned OLTP-like application. I ran with ‘--protocol=prepared’ to avoid the parsing overhead, which is not what I want to measure.
The postgres process is running the backend. And this is where we realize that the real database work (running DML and committing) is not where pgbench spends its time. Less than 15% of the samples are in the backend executor (ExecScan) and 6% in CommitTransaction (even in a select-only workload, there is a commit). What remains is ReadyForQuery and pq_getbyte, which are about frontend-backend communication.
If you run a benchmark to measure something other than the network roundtrips and context switches involved in client/server communication, then this pgbench workload is not the right tool.
If you benchmark to compare CPU and RAM activity, for example because you want to choose the best compute shape from your cloud provider, then you need to run something focused on that activity, in a sustainable way.
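A back-of-envelope model shows why the roundtrip dominates here (the microsecond figures below are hypothetical, for illustration only): when each query pays a fixed client/server roundtrip cost on top of its execution time, the achievable rate per client is capped at 1/(roundtrip + execution).

```python
# Throughput ceiling per client when each query pays a fixed
# roundtrip + context-switch cost on top of its execution time.
# All timings here are illustrative assumptions, not measurements.
def tps_per_client(roundtrip_us, exec_us):
    total_seconds = (roundtrip_us + exec_us) / 1_000_000
    return 1.0 / total_seconds

# A 10 µs query behind a 50 µs roundtrip is roundtrip-bound:
with_roundtrip = tps_per_client(roundtrip_us=50, exec_us=10)
without_roundtrip = tps_per_client(roundtrip_us=0, exec_us=10)
print(round(with_roundtrip), round(without_roundtrip))  # 16667 100000
```

In other words, most of the ‘transactions per second’ in such a run measures the communication overhead, not the database engine.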
pgio
I’ll use Kevin Closson’s ‘pgio’, which takes the same approach as his ‘SLOB’ for Oracle:
The settings in pgio.conf are similar in size and number of threads (I don’t want physical I/O, so everything stays in cache):
UPDATE_PCT=0
RUN_TIME=60
NUM_SCHEMAS=1
NUM_THREADS=12
WORK_UNIT=255
UPDATE_WORK_UNIT=8
SCALE=100M
DBNAME=pgio
CONNECT_STRING=pgio
CREATE_BASE_TABLE=TRUE
The setup and run are easy, and again I record perf events after a little warm-up:
sh ./setup.sh
sh ./runit.sh &
sudo perf record --call-graph dwarf -F99 -e cpu-cycles -a \
-o /tmp/perf.data sleep 60
The same flamegraph command (using the same colors):
sudo perf script -i /tmp/perf.data | ./stackcollapse-perf.pl | ./flamegraph.pl --width=1200 --hash --cp > /tmp/perf.svg
And here is the .svg result:
There’s no frontend work here because everything runs from a PL/pgSQL loop, so no roundtrips, network calls, or context switches influence my measurements. Most of the activity is in the query executor, accessing the shared buffers. This is what you want when comparing platform configurations such as:
- cloud compute shapes
- NUMA
- large pages
- memory settings
- filesystem cache
- compression / encryption
- mitigation patches for the various Intel security bugs
- …
And instead of ‘transactions per second’, pgio measures the number of buffers read per second and the cache hits.
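The cache-hit side of that measure can also be cross-checked against PostgreSQL’s own counters: pg_stat_database exposes blks_hit and blks_read, and the ratio is simple arithmetic (the counter values in this sketch are made up for illustration):

```python
# Buffer cache hit ratio from pg_stat_database-style counters:
# blks_hit  = reads satisfied from the shared buffer cache,
# blks_read = blocks that had to come from disk or the OS cache.
# The values below are invented for the example.
def hit_ratio(blks_hit, blks_read):
    total = blks_hit + blks_read
    return blks_hit / total if total else 0.0

print(hit_ratio(blks_hit=999_000, blks_read=1_000))  # 0.999
```

For a workload sized to stay in cache, like the one above, the ratio should stay very close to 1.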
In summary…
pgbench is not the right tool if you want to measure specific platform components, or the PostgreSQL components interfacing with the system (buffer cache, WAL, writer, …). pgbench can be used to test the database for the application. But in any case, a single number like ‘transactions per second’ is not sufficient. FlameGraph can help visualize what is behind this measure.
Update 05-JUL-2019
Following some discussions at PostgresLondon2019 and after, here are some screenshots showing how difficult it is to get metrics with pgbench TPC-B because of the frequent context switches:
The Top Activity graphs are from PASH-Viewer:
Update 21-SEP-2019
PGIO is available on: