Oracle global vs. partition level statistics CBO usage
Global statistics are complex to gather. Gathering on the whole table can be very long and doesn’t ‘scale’ because the duration will increase with the volume. Incremental gathering can be a solution but has its side effects (such as the size of the synopsis). But having stale global statistics can be dangerous. Do you know when the optimizer bases its estimation on global or on partition level statistics? The partition level statistics are used only:
- when partition pruning iterates on only one single partition
- when this partition is known at optimization time during the parse phase
This is clearly explained in Jonathan Lewis ‘Cost-Based Oracle Fundamentals’:
I also tested on the current release (18c) to verify this. Here is the test-case.
I create a table partitioned by year, fill 100 rows in one partition and 5000 rows in another. Then I gather the statistics:
SQL> create table DEMO (day date) partition by range(day) (
partition P2018 values less than (date '2019-01-01'),
partition P2019 values less than (date '2020-01-01'),
partition P2020 values less than (date '2021-01-01'),
partition P2021 values less than (date '2022-01-01')
);Table DEMO created.SQL> insert into DEMO
select date '2019-01-01'+rownum from xmltable('1 to 100');100 rows inserted.SQL> insert into DEMO
select date '2018-01-01'-rownum/24 from xmltable('1 to 5000');5,000 rows inserted.SQL> exec dbms_stats.gather_table_stats(user,'DEMO');PL/SQL procedure successfully completed.
My goal is to show when the partition or the global statistics are used.
Here are the statistics:
SQL> select partition_name,object_type,num_rows,global_stats
,to_date((regexp_replace(
extract(dbms_xmlgen.getxmltype(
'select high_value from dba_tab_partitions
where table_owner='''||owner||'''
and table_name='''||table_name||'''
and partition_name='''||partition_name||''''
),'/ROWSET/ROW/HIGH_VALUE/text()').getstringval()
,'[^;]*apos; *([^;]*) *[^;]apos;.*','\1'))
,'yyyy-mm-dd hh24:mi:ss') high_value
from dba_tab_statistics
where owner=user and table_name='DEMO' order by 1 nulls first;PARTITION_ OBJECT_TYPE NUM_ROWS GLO HIGH_VALUE
---------- ------------ ---------- --- -----------
TABLE 5100 YES
P2018 PARTITION 5000 YES 01-JAN-2019
P2019 PARTITION 100 YES 01-JAN-2020
P2020 PARTITION 0 YES 01-JAN-2021
P2021 PARTITION 0 YES 01-JAN-2022
I’ll query partitions above 2019 where I have 100 rows. But the global statistics count 5100 rows. Let’s see which one is used.
Pstart = Pstop (PARTITION RANGE SINGLE) -> partition
This first query reads the rows from only one partition, and the optimizer knows it at parse time because I use literals:
SQL> select count(*) from DEMO where day between to_date( '2019-01-08','yyyy-mm-dd' ) and to_date( '2019-02-08','yyyy-mm-dd' ) ;COUNT(*)
----------
32SQL> select * from dbms_xplan.display_cursor(
format=>'basic +rows +outline +peeked_binds +partition');PLAN_TABLE_OUTPUT
--------------------------------------------------------------------
EXPLAINED SQL STATEMENT:
------------------------
select count(*) from DEMO
where day between to_date( '2019-01-08','yyyy-mm-dd' )
and to_date( '2019-02-08','yyyy-mm-dd' )
Plan hash value: 849908795
----------------------------------------------------------------
| Id | Operation | Name | Rows | Pstart| Pstop |
----------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | |
| 1 | SORT AGGREGATE | | 1 | | |
| 2 | PARTITION RANGE SINGLE| | 33 | 2 | 2 |
| 3 | TABLE ACCESS FULL | DEMO | 33 | 2 | 2 |
----------------------------------------------------------------
- Partition-level statistics are used when only one partition is accessed, known at the time of parsing. You see that with Pstart=Pstop = number.
Pstart <> Pstop (PARTITION RANGE ITERATOR/FULL) -> global
The second query still knows the partitions are parse time, but multiple partitions are concerned:
SQL> select count(*) from DEMO where day between to_date( '2019-01-08','yyyy-mm-dd' ) and to_date( '2020-02-08','yyyy-mm-dd' );COUNT(*)
----------
94SQL> select * from dbms_xplan.display_cursor(
format=>'basic +rows +outline +peeked_binds +partition');PLAN_TABLE_OUTPUT
--------------------------------------------------------------------
EXPLAINED SQL STATEMENT:
------------------------
select count(*) from DEMO
where day between to_date( '2019-01-08','yyyy-mm-dd' )
and to_date( '2020-02-08','yyyy-mm-dd' )
Plan hash value: 1988821877
------------------------------------------------------------------
| Id | Operation | Name | Rows | Pstart| Pstop |
------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | |
| 1 | SORT AGGREGATE | | 1 | | |
| 2 | PARTITION RANGE ITERATOR| | 705 | 2 | 3 |
| 3 | TABLE ACCESS FULL | DEMO | 705 | 2 | 3 |
------------------------------------------------------------------
This high estimation comes from the high number known at the table level.
- Global table level statistics are used when more than one partition is accessed, even when they are known at the time of parsing. You see that with Pstart <> Pstop.
KEY — KEY (PARTITION RANGE ITERATOR/FULL) -> global
For the third query, I use bind variables and set them to access only one partition:
SQL> variable d1 varchar2(10)
SQL> variable d2 varchar2(10)
SQL> exec :d1:='2019-01-08' ; :d2:='2019-02-08'PL/SQL procedure successfully completed.SQL> select count(*) from DEMO where day between to_date( :d1,'yyyy-mm-dd' ) and to_date( :d2,'yyyy-mm-dd' );COUNT(*)
----------
32SQL> select * from dbms_xplan.display_cursor(
format=>'basic +rows +outline +peeked_binds +partition');PLAN_TABLE_OUTPUT
--------------------------------------------------------------------
EXPLAINED SQL STATEMENT:
------------------------
select count(*) from DEMO
where day between to_date( :d1,'yyyy-mm-dd' )
and to_date( :d2,'yyyy-mm-dd' )
Plan hash value: 203823535
-------------------------------------------------------------------
| Id | Operation | Name | Rows | Pstart| Pstop |
-------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | |
| 1 | SORT AGGREGATE | | 1 | | |
| 2 | FILTER | | | | |
| 3 | PARTITION RANGE ITERATOR| | 33 | KEY | KEY |
| 4 | TABLE ACCESS FULL | DEMO | 33 | KEY | KEY |
-------------------------------------------------------------------
Peeked Binds (identified by position):
--------------------------------------
1 - :D1 (VARCHAR2(30), CSID=873): '2019-01-08'
2 - :D2 (VARCHAR2(30), CSID=873): '2019-02-08'
The cardinality is estimated correctly: the partition statistics were used.
- Partition-level statistics are used when only one partition is accessed, known at the time of parsing, even the value is known with bind peeking. You see that with KEY/KEY for Pstart/Pstop and with bind variables listed by +peeked_binds format
Of course, this accurate estimation has a drawback: executing the query again for another range of dates may re-use the same execution plan. You should code a different statement when you know that you cover something different.
Without bind peeking -> global
It is not a recommendation, but to avoid the risk mentioned above, some sessions may prefer to disable bind peeking. When bind variable peeking is disabled, the optimization is done before any value is known, so there is no possibility to know which partition is concerned.
SQL> alter session set "_optim_peek_user_binds"=false;Session altered.SQL> select count(*) from DEMO where day between to_date( :d1,'yyyy-mm-dd' ) and to_date( :d2,'yyyy-mm-dd' );COUNT(*)
----------
32SQL> select * from dbms_xplan.display_cursor(
format=>'basic +rows +outline +peeked_binds +partition');PLAN_TABLE_OUTPUT
--------------------------------------------------------------------
EXPLAINED SQL STATEMENT:
------------------------
select count(*) from DEMO
where day between to_date( :d1,'yyyy-mm-dd' )
and to_date( :d2,'yyyy-mm-dd' )
Plan hash value: 203823535
-------------------------------------------------------------------
| Id | Operation | Name | Rows | Pstart| Pstop |
-------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | |
| 1 | SORT AGGREGATE | | 1 | | |
| 2 | FILTER | | | | |
| 3 | PARTITION RANGE ITERATOR| | 13 | KEY | KEY |
| 4 | TABLE ACCESS FULL | DEMO | 13 | KEY | KEY |
-------------------------------------------------------------------
Outline Data
-------------
/*+
BEGIN_OUTLINE_DATA
IGNORE_OPTIM_EMBEDDED_HINTS
OPTIMIZER_FEATURES_ENABLE('19.1.0')
DB_VERSION('19.1.0')
OPT_PARAM('_optim_peek_user_binds' 'false')
ALL_ROWS
OUTLINE_LEAF(@"SEL$1")
FULL(@"SEL$1" "DEMO"@"SEL$1")
END_OUTLINE_DATA
*/
Here the optimizer estimates 13/5100≈0.25% of global rows.
In conclusion, there is only one case where the partition level statistics can be used: partition pruning to one single partition, known at parse time by the literal value or the peeked bind. Then, you cannot ignore global statistics.
By the way, why did I use a range between 2019–01–08 and 2019–02–08? Probably to remind you that there’s only one month left for the #KaaS19 Belgium Tech Days …