Oracle global vs. partition level statistics CBO usage

5 min readJan 8, 2019

Global statistics are complex to gather. Gathering on the whole table can be very long and doesn’t ‘scale’ because the duration will increase with the volume. Incremental gathering can be a solution but has its side effects (such as the size of the synopsis). But having stale global statistics can be dangerous. Do you know when the optimizer bases its estimation on global or on partition level statistics? The partition level statistics are used only:

when partition pruning iterates on only one single partition
when this partition is known at optimization time during the parse phase

This is clearly explained in Jonathan Lewis ‘Cost-Based Oracle Fundamentals’:

Cost-Based Oracle Fundamentals

Arthur C. Clarke once wrote that "any sufficiently advanced technology is indistinguishable from magic. " I believe…

books.google.ch

I also tested on the current release (18c) to verify this. Here is the test-case.

I create a table partitioned by year, fill 100 rows in one partition and 5000 rows in another. Then I gather the statistics:

SQL> create table DEMO (day date) partition by range(day) (
     partition P2018 values less than (date '2019-01-01'),
     partition P2019 values less than (date '2020-01-01'),
     partition P2020 values less than (date '2021-01-01'),
     partition P2021 values less than (date '2022-01-01')
    );Table DEMO created.SQL> insert into DEMO 
     select date '2019-01-01'+rownum from xmltable('1 to 100');100 rows inserted.SQL> insert into DEMO 
     select date '2018-01-01'-rownum/24 from xmltable('1 to 5000');5,000 rows inserted.SQL> exec dbms_stats.gather_table_stats(user,'DEMO');PL/SQL procedure successfully completed.

My goal is to show when the partition or the global statistics are used.

Here are the statistics:

SQL> select partition_name,object_type,num_rows,global_stats 
   ,to_date((regexp_replace(
     extract(dbms_xmlgen.getxmltype(
     'select high_value from dba_tab_partitions 
      where table_owner='''||owner||''' 
      and table_name='''||table_name||''' 
      and partition_name='''||partition_name||''''
     ),'/ROWSET/ROW/HIGH_VALUE/text()').getstringval()
     ,'[^;]*apos; *([^;]*) *[^;]apos;.*','\1'))
    ,'yyyy-mm-dd hh24:mi:ss') high_value
   from dba_tab_statistics 
   where owner=user and table_name='DEMO' order by 1 nulls first;PARTITION_ OBJECT_TYPE    NUM_ROWS GLO HIGH_VALUE 
---------- ------------ ---------- --- -----------
           TABLE              5100 YES            
P2018      PARTITION          5000 YES 01-JAN-2019
P2019      PARTITION           100 YES 01-JAN-2020
P2020      PARTITION             0 YES 01-JAN-2021
P2021      PARTITION             0 YES 01-JAN-2022

I’ll query partitions above 2019 where I have 100 rows. But the global statistics count 5100 rows. Let’s see which one is used.

Pstart = Pstop (PARTITION RANGE SINGLE) -> partition

This first query reads the rows from only one partition, and the optimizer knows it at parse time because I use literals:

SQL> select count(*) from DEMO where day between to_date( '2019-01-08','yyyy-mm-dd' ) and to_date( '2019-02-08','yyyy-mm-dd' ) ;COUNT(*)
----------
        32SQL> select * from dbms_xplan.display_cursor(
     format=>'basic +rows +outline +peeked_binds +partition');PLAN_TABLE_OUTPUT                                                   
--------------------------------------------------------------------
EXPLAINED SQL STATEMENT:
------------------------
select count(*) from DEMO 
 where day between to_date( '2019-01-08','yyyy-mm-dd' ) 
 and to_date( '2019-02-08','yyyy-mm-dd' )
 
Plan hash value: 849908795
 
----------------------------------------------------------------
| Id  | Operation               | Name | Rows  | Pstart| Pstop |
----------------------------------------------------------------
|   0 | SELECT STATEMENT        |      |       |       |       |
|   1 |  SORT AGGREGATE         |      |     1 |       |       |
|   2 |   PARTITION RANGE SINGLE|      |    33 |     2 |     2 |
|   3 |    TABLE ACCESS FULL    | DEMO |    33 |     2 |     2 |
----------------------------------------------------------------

Partition-level statistics are used when only one partition is accessed, known at the time of parsing. You see that with Pstart=Pstop = number.

Pstart <> Pstop (PARTITION RANGE ITERATOR/FULL) -> global

The second query still knows the partitions are parse time, but multiple partitions are concerned:

SQL> select count(*) from DEMO where day between to_date( '2019-01-08','yyyy-mm-dd' ) and to_date( '2020-02-08','yyyy-mm-dd' );COUNT(*)
----------
        94SQL> select * from dbms_xplan.display_cursor(
     format=>'basic +rows +outline +peeked_binds +partition');PLAN_TABLE_OUTPUT                                                   
--------------------------------------------------------------------
EXPLAINED SQL STATEMENT:
------------------------
select count(*) from DEMO 
 where day between to_date( '2019-01-08','yyyy-mm-dd' ) 
 and to_date( '2020-02-08','yyyy-mm-dd' )
 
Plan hash value: 1988821877
 
------------------------------------------------------------------
| Id  | Operation                 | Name | Rows  | Pstart| Pstop |
------------------------------------------------------------------
|   0 | SELECT STATEMENT          |      |       |       |       |
|   1 |  SORT AGGREGATE           |      |     1 |       |       |
|   2 |   PARTITION RANGE ITERATOR|      |   705 |     2 |     3 |
|   3 |    TABLE ACCESS FULL      | DEMO |   705 |     2 |     3 |
------------------------------------------------------------------

This high estimation comes from the high number known at the table level.

Global table level statistics are used when more than one partition is accessed, even when they are known at the time of parsing. You see that with Pstart <> Pstop.

KEY — KEY (PARTITION RANGE ITERATOR/FULL) -> global

For the third query, I use bind variables and set them to access only one partition:

SQL> variable d1 varchar2(10)
SQL> variable d2 varchar2(10)
SQL> exec :d1:='2019-01-08' ; :d2:='2019-02-08'PL/SQL procedure successfully completed.SQL> select count(*) from DEMO where day between to_date( :d1,'yyyy-mm-dd' ) and to_date( :d2,'yyyy-mm-dd' );COUNT(*)
----------
        32SQL> select * from dbms_xplan.display_cursor(
     format=>'basic +rows +outline +peeked_binds +partition');PLAN_TABLE_OUTPUT                                                   
--------------------------------------------------------------------
EXPLAINED SQL STATEMENT:
------------------------
select count(*) from DEMO 
 where day between to_date( :d1,'yyyy-mm-dd' ) 
 and to_date( :d2,'yyyy-mm-dd' )
 
Plan hash value: 203823535
 
-------------------------------------------------------------------
| Id  | Operation                  | Name | Rows  | Pstart| Pstop |
-------------------------------------------------------------------
|   0 | SELECT STATEMENT           |      |       |       |       |
|   1 |  SORT AGGREGATE            |      |     1 |       |       |
|   2 |   FILTER                   |      |       |       |       |
|   3 |    PARTITION RANGE ITERATOR|      |    33 |   KEY |   KEY |
|   4 |     TABLE ACCESS FULL      | DEMO |    33 |   KEY |   KEY |
-------------------------------------------------------------------
 

Peeked Binds (identified by position):
--------------------------------------
 
   1 - :D1 (VARCHAR2(30), CSID=873): '2019-01-08'
   2 - :D2 (VARCHAR2(30), CSID=873): '2019-02-08'

The cardinality is estimated correctly: the partition statistics were used.

Partition-level statistics are used when only one partition is accessed, known at the time of parsing, even the value is known with bind peeking. You see that with KEY/KEY for Pstart/Pstop and with bind variables listed by +peeked_binds format

Of course, this accurate estimation has a drawback: executing the query again for another range of dates may re-use the same execution plan. You should code a different statement when you know that you cover something different.

Without bind peeking -> global

It is not a recommendation, but to avoid the risk mentioned above, some sessions may prefer to disable bind peeking. When bind variable peeking is disabled, the optimization is done before any value is known, so there is no possibility to know which partition is concerned.

SQL> alter session set "_optim_peek_user_binds"=false;Session altered.SQL> select count(*) from DEMO where day between to_date( :d1,'yyyy-mm-dd' ) and to_date( :d2,'yyyy-mm-dd' );COUNT(*)
----------
        32SQL> select * from dbms_xplan.display_cursor(
     format=>'basic +rows +outline +peeked_binds +partition');PLAN_TABLE_OUTPUT                                                   
--------------------------------------------------------------------
EXPLAINED SQL STATEMENT:
------------------------
select count(*) from DEMO 
 where day between to_date( :d1,'yyyy-mm-dd' )
 and to_date( :d2,'yyyy-mm-dd' )
 
Plan hash value: 203823535
 
-------------------------------------------------------------------
| Id  | Operation                  | Name | Rows  | Pstart| Pstop |
-------------------------------------------------------------------
|   0 | SELECT STATEMENT           |      |       |       |       |
|   1 |  SORT AGGREGATE            |      |     1 |       |       |
|   2 |   FILTER                   |      |       |       |       |
|   3 |    PARTITION RANGE ITERATOR|      |    13 |   KEY |   KEY |
|   4 |     TABLE ACCESS FULL      | DEMO |    13 |   KEY |   KEY |
-------------------------------------------------------------------
 
Outline Data
-------------
 
  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('19.1.0')
      DB_VERSION('19.1.0')
      OPT_PARAM('_optim_peek_user_binds' 'false')
      ALL_ROWS
      OUTLINE_LEAF(@"SEL$1")
      FULL(@"SEL$1" "DEMO"@"SEL$1")
      END_OUTLINE_DATA
  */

Here the optimizer estimates 13/5100≈0.25% of global rows.

In conclusion, there is only one case where the partition level statistics can be used: partition pruning to one single partition, known at parse time by the literal value or the peeked bind. Then, you cannot ignore global statistics.

By the way, why did I use a range between 2019–01–08 and 2019–02–08? Probably to remind you that there’s only one month left for the #KaaS19 Belgium Tech Days …

Oracle global vs. partition level statistics CBO usage

Cost-Based Oracle Fundamentals

Arthur C. Clarke once wrote that "any sufficiently advanced technology is indistinguishable from magic. " I believe…

Pstart = Pstop (PARTITION RANGE SINGLE) -> partition

Pstart <> Pstop (PARTITION RANGE ITERATOR/FULL) -> global

KEY — KEY (PARTITION RANGE ITERATOR/FULL) -> global

Without bind peeking -> global

Order Tickets " Techdays Belgium

Edit description

Written by Franck Pachot

Responses (1)