Franck Pachot
3 min read · Aug 11, 2021


Here, you are building a general architecture mantra ("stored procedures are bad") from context-specific experiences (SO had bad performance, you had a monolithic DB that didn't scale out, your test/deployment procedures work better with application code, some databases do not accept common languages for writing procedures, "Top Architects" say ...).

Because of rumors from others' bad experiences, you immediately discredit the performance point, which is the most important one when deciding where to run the code. But think about the physics of it: how can you get better performance by adding network roundtrips, system calls, and context switches between each statement execution? If you really look at what consumes CPU cycles on a database receiving short calls from the application, you will realize that you actually lower the database server's CPU usage by colocating some procedural code close to the data, because the remote calls are more expensive than the processing itself.

Note that if you don't want to deploy stored procedures, you can also execute anonymous blocks, but you cover all of this under the umbrella of "stored procedures".
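For this discussion, the anonymous block used below and a deployed procedure are equivalent. A minimal sketch of the deployed form (the procedure name insert_demo is just an illustration, and the demo table is the one created in the full example at the end) could look like this:

import psycopg2
conn = psycopg2.connect()  # connection settings from the PG* environment variables, as in the full example
cur = conn.cursor()
# deploy the procedure once (PostgreSQL 11+), then call it from the application
cur.execute("""
create or replace procedure insert_demo(n bigint) language plpgsql as $$
begin
  for i in 1..n loop
    insert into demo (my_method,my_timestamp) values ('stored procedure',clock_timestamp());
  end loop;
end; $$;
""")
cur.execute("call insert_demo(1000000)")
conn.commit()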

Please compare the time and CPU usage of:

%%timeit
cur = conn.cursor()
cur.execute("""
prepare my_single_select(text,timestamp) as
insert into demo (my_method,my_timestamp) values ($1,$2)
""")
for i in range(1000000):
    cur.execute("""
    execute my_single_select ('single selects',clock_timestamp())
    """)
cur.execute("""
deallocate my_single_select
""");
conn.commit();

with

%%timeit
cur = conn.cursor()
cur.execute("""
do $$ begin
for i in 1..1000000 loop
insert into demo (my_method,my_timestamp) values ('DB procedure',clock_timestamp());
end loop;
end; $$;
""")
conn.commit();

This is in PostgreSQL with Python as the client language and PL/pgSQL as the server language. But you can reproduce it with anything.
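Note that %%timeit is an IPython/Jupyter magic. Outside a notebook, a minimal sketch of the same measurement with time.perf_counter(), assuming the demo table from the full example below already exists, could be:

import time
import psycopg2

conn = psycopg2.connect()  # uses the PG* environment variables
cur = conn.cursor()
start = time.perf_counter()
cur.execute("""
do $$ begin
  for i in 1..1000000 loop
    insert into demo (my_method,my_timestamp) values ('DB procedure',clock_timestamp());
  end loop;
end; $$;
""")
conn.commit()
# elapsed wall-clock time for one million inserts executed in a single server-side block
print("anonymous block: %.2f s" % (time.perf_counter() - start))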

Here is the full example mentioned above, as a self-contained script:

export PGDATABASE=postgres PGUSER=postgres PGPASSWORD=postgres PGHOST=localhost PGPORT=5432
python3 <<'PY'
import psycopg2
conn = psycopg2.connect()
conn.cursor().execute("""
drop table if exists demo;
create table if not exists demo (
my_id bigint generated always as identity primary key,
my_method text,
my_timestamp timestamp
);
""");
conn.commit();
cur = conn.cursor()
cur.execute("""
prepare my_single_select(text,timestamp) as
insert into demo (my_method,my_timestamp) values ($1,$2)
""")
for i in range(1000000):
    cur.execute("""
    execute my_single_select ('single selects',clock_timestamp())
    """)
cur.execute("""
deallocate my_single_select
""");
conn.commit();
cur = conn.cursor()
cur.execute("""
do $$ begin
for i in 1..1000000 loop
insert into demo (my_method,my_timestamp) values ('DB procedure',clock_timestamp());
end loop;
end; $$;
""")
conn.commit();
PY
psql <<< "select my_method,count(*),max(my_timestamp)-min(my_timestamp) duration from demo group by my_method;"

And this is on the same VM. I'll let you test between containers, or VMs, where the network is involved. Also look at the CPU consumed on the database server, to see that "less processing on the DB server" actually means a lot of CPU spent in context switches, network calls, and query/result message handling:

sudo perf top -p $(pgrep -f "postgres: postgres postgres")
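To involve the network, the only change needed is to connect from another container or VM; for example with psycopg2 (the hostname below is just a placeholder for your own setup):

import psycopg2
# same test, but over TCP from a different machine or container
conn = psycopg2.connect(host="db.example.internal", port=5432,
                        dbname="postgres", user="postgres", password="postgres")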

