Oracle & Postgres JDBC Fetch Size
TL;DR — By default, the PostgreSQL JDBC ResultSet is not a server cursor but a client buffer: the whole result set is fetched before the first next() call returns.
It is usually a good idea to start with default settings rather than trying to ‘tune’ every possible parameter. Defaults are often suited to the most common cases. However, there’s a big exception to this when using JDBC drivers: the default fetch size is probably not the right setting. Here is an example, using the same Java code, showing the fetching behavior in Oracle and PostgreSQL.
Java example
Here is my Java code, which is exactly the same for Oracle and PostgreSQL:
import java.io.*;
import java.sql.*;
import java.util.*;
import java.time.*;
import oracle.jdbc.*;

public class JDBC {

    public static void println(String text) {
        System.out.println(Instant.now().toString() + ":" + text);
    }

    public static void main(String[] args)
            throws SQLException, InterruptedException {
        try (Connection c = DriverManager.getConnection(
                args[2], args[0], args[1]) // url, user, password
        ) {
            c.setAutoCommit(false); // also required for PostgreSQL cursor fetching
            try (PreparedStatement s = c.prepareStatement(
                    "select n,mytime(n) t from demo"
            )) {
                //s.setFetchSize(5);
                Thread.sleep(1000);
                println(" PRS " + s);
                try (ResultSet rs = s.executeQuery()) {
                    Thread.sleep(1000);
                    println(" EXE " + rs);
                    while (rs.next()) {
                        Thread.sleep(1000); // simulate a slow consumer
                        println(" FCH " + rs.getLong("N") + " " + rs.getTimestamp("T"));
                    }
                    System.out.println("fetch size: " + rs.getFetchSize());
                }
            }
        }
    }
}
This is very simple. The user, password and JDBC URL are passed as arguments. The DriverManager finds the right driver for the URL (ojdbc8.jar and postgresql-jdbc.jar are both in my CLASSPATH).
I prepare a simple select, which I execute, and then fetch the rows. The query calls a “mytime()” stored function that waits one second and returns the current timestamp for each row. The goal is to simulate a long-running query and show when the rows are actually read on the server and when they are fetched by the client.
Oracle JDBC Fetch Size
In order to run it on Oracle I create the following table and function:
alter session set current_schema=demo;
create or replace function mytime(n number) return timestamp as
begin
dbms_session.sleep(1);
return current_timestamp;
end;
/
show errors
create table demo (n int);
insert into demo
select to_number(column_value) N from xmltable('1000 to 1020');
commit;
The function is non-deterministic, and is therefore executed once for each row.
$ORACLE_HOME/jdk/bin/java JDBC "demo" "demo" "jdbc:oracle:thin:@(DESCRIPTION=(CONNECT_DATA=(SERVICE_NAME=PDB1))(ADDRESS=(PROTOCOL=tcp)(HOST=127.0.0.1)(PORT=1521)))"
Here is the result. Each fetch line shows the display timestamp first, and then the timestamp from the MYTIME() function, which is the read time.
The Java program waits 1 second after each call: prepared at 18:33:10, executed at 18:33:21, which is a 10 second execution time. Then I fetched one row every second from 18:33:22 to 18:33:53.
With Oracle, the default JDBC fetch size is 10 rows, and the EXECUTE call also does the first fetch. Here is the timing:
- at 18:33:10 the prepareStatement is a very fast PARSE call as the actual optimization is deferred to the first execution.
- at 18:33:22 the EXECUTE and first FETCH of the cursor occurs. Here it takes 10 seconds to get the first 10 rows. Those rows were read between 18:33:12 and 18:33:21, then fetched, then displayed between 18:33:22 and 18:33:31.
- Then the second FETCH call read 10 more rows, from 18:33:32 to 18:33:40, and displayed them from 18:33:42 to 18:33:51.
- Then one row remains: read, fetched and displayed.
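If 10 rows per roundtrip is too small, the fetch size can be raised per statement with setFetchSize(), or for the whole connection. Here is a minimal sketch, assuming the Oracle thin driver’s documented defaultRowPrefetch connection property, the same imports as the listing above, and an EZConnect URL which is just a shorter form of the descriptor used earlier:
// Sketch: raising the Oracle prefetch for every statement on the connection.
// A statement-level setFetchSize() would override this default.
Properties props = new Properties();
props.setProperty("user", "demo");
props.setProperty("password", "demo");
props.setProperty("defaultRowPrefetch", "100"); // the driver default is 10
try (Connection c = DriverManager.getConnection(
        "jdbc:oracle:thin:@//127.0.0.1:1521/PDB1", props)) {
    // each FETCH roundtrip now returns up to 100 rows
}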
PostgreSQL default JDBC Fetch Size
Now I’ll run the same on a PostgreSQL database. In PostgreSQL, current_timestamp is frozen at the start of the transaction, so I use clock_timestamp() to get the actual read time.
\connect demo postgres
create table demo(n int);
insert into demo
select generate_series from generate_series(1000,1020);
create function mytime(n int) returns timestamp as
$$
begin
perform pg_sleep(1);
return clock_timestamp();
end;
$$
language plpgsql;
Here is the run with the same Java code:
\! $ORACLE_HOME/jdk/bin/java JDBC "postgres" "demo" "jdbc:postgresql://localhost:5432/demo"
We don’t see any read-fetch pattern here, and we have a clue with the fetch size, which is reported as zero, the default. Rows were read from 18:36:07 to 18:36:27, which matches the executeQuery() time. All rows were read, and buffered by the driver, before the first one was displayed. That’s the default behavior: the JDBC ResultSet is not a cursor but a client-side buffer.
Setting the PostgreSQL JDBC Fetch Size
In the Java code above, the setFetchSize() call is commented out.
Here, I uncomment it to explicitly set the fetch size to 5 (for the demo; in real life you would rather set 100 or 1000 to avoid too many roundtrips and context switches).
try (PreparedStatement s = c.prepareStatement(
        "select n,mytime(n) t from demo"
)) {
    s.setFetchSize(5);
Here is the execution:
$ORACLE_HOME/jdk/bin/java JDBC "postgres" "demo" "jdbc:postgresql://localhost:5432/demo"
There, we can see 5 rows read from 19:13:56 to 19:14:00 during the first FETCH done by executeQuery(), displayed from 19:14:02 to 19:14:06, and then the same pattern again with the next 5 rows…
In summary
Depending on the driver’s default fetch size, the behavior of the application can be really different.
The Oracle default has several advantages:
- we have a real SQL Forward-Only cursor rather than a buffer
- we can read a few rows from a large result, like when de-queueing (https://medium.com/@FranckPachot/oracle-hibernate-de-queuing-7454432a7738) or for Top-N results (a sketch follows this list)
- we can start to process the result without waiting for the completion
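As a quick illustration of the second advantage, here is a minimal sketch (same imports and connection as the listing above) that reads only the first ten rows of a result and then closes the cursor, so the rest of the rows never have to be sent:
// Sketch: de-queueing pattern, reading a few rows and discarding the rest.
try (PreparedStatement s = c.prepareStatement("select n from demo")) {
    s.setFetchSize(10); // one roundtrip is enough for the rows we want
    try (ResultSet rs = s.executeQuery()) {
        for (int i = 0; i < 10 && rs.next(); i++) {
            println(" ROW " + rs.getLong("N"));
        }
    } // closing the ResultSet discards the remaining rows
}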
However, it also has some drawbacks:
- we can get an ORA-1555 “snapshot too old” error if we fetch slowly and the undo (MVCC) retention has expired
- we waste time in roundtrip and context switch latencies if we forget to increase the fetch size
- we can get an execution error after having already processed some rows
Note that if there is a blocking operation, like an ORDER BY that has to read all the rows before returning the first one, the whole result is read and buffered on the server anyway.
The PostgreSQL default seems more intuitive for today’s developers, who are more used to lists than to cursors. But this may change with the trend towards streaming. The good thing is that you can choose. And anyway, in both cases, the default (all rows for PostgreSQL, 10 rows for Oracle) is probably something to change in many cases.
Note that I’ve set AutoCommit to false, and this is required for the PostgreSQL cursor behavior when setting a non-zero fetch size; otherwise it still buffers all the rows. Anyway, that’s probably the worst JDBC default: never leave Auto Commit set to true.
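Putting it all together, here is a minimal sketch of what the PostgreSQL JDBC documentation requires for cursor-based fetching: autocommit off, a forward-only ResultSet, and a non-zero fetch size (same imports as the listing above):
// Sketch: the conditions for PgJDBC to use a cursor instead of buffering.
try (Connection c = DriverManager.getConnection(
        "jdbc:postgresql://localhost:5432/demo", "postgres", "demo")) {
    c.setAutoCommit(false); // the cursor must live inside a transaction
    try (Statement s = c.createStatement(
            ResultSet.TYPE_FORWARD_ONLY,  // scrollable ResultSets are buffered
            ResultSet.CONCUR_READ_ONLY)) {
        s.setFetchSize(1000); // rows per roundtrip instead of the whole result
        try (ResultSet rs = s.executeQuery("select n from demo")) {
            while (rs.next()) {
                // process rows as they arrive, batch by batch
            }
        }
    }
    c.commit();
}
Alternatively, the PgJDBC connection property defaultRowFetchSize sets the same thing for every statement on the connection.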