Optimizer statistics – Gathering Statistics and Histograms

It’s such a mess with those histograms and statistics that I had to rewrite this post at least ten times. It’s such a mess that I still cannot believe how poorly maintained the global statistics are and how badly created the histograms are. Because of this mess and the depth of the topic this post is a bit longer than usual, but I find it really useful and I think it can give answers to many questions.

First we shall review how Oracle is gathering statistics and histograms on partition level. Basically most of the things described for partitions are true for non partitioned tables as well.

LOCAL STATISTICS

Focus mainly on the highlighted lines, which I describe later:

SQL> begin
DBMS_STATS.GATHER_TABLE_STATS(ownname => '', tabname => 'IPT', partname => 'SYS_P101',
estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE, granularity => 'PARTITION',
method_opt => 'FOR ALL COLUMNS SIZE SKEWONLY');
  5  end;
  6  /
DBMS_STATS: Start gather table stats -- tabname: IPT
DBMS_STATS: Started table X.IPT.SYS_P101 at 10-MAR-15 03.30.18.277642000 PM +02:00. Execution phase: 1 stattype: DATA
DBMS_STATS: Specified granularity = PARTITION, Fixed granularity = PARTITION
DBMS_STATS: parameters ** pfix:  granularity: PARTITION gIdxGranularity:  executionPhase: 1 pname: SYS_P101 stime: 03-10-2015 15:30:18 method_opt: FOR ALL COLUMNS SIZE SKEWONLY
DBMS_STATS: Start construct analyze using SQL .. Execution Phase: 1 granularity: PARTITION global_requested:  pfix:
DBMS_STATS: Start gather part -- conctx.global_requested:  gran: PARTITION execution phase: 1
DBMS_STATS: gather stats on partition SYS_P101: unable to incrementally maintain;
DBMS_STATS: Start gather_stats.. pfix:  ownname: Xtabname: IPT pname: SYS_P101 spname:  execution phase: 1
DBMS_STATS: Using approximate NDV pct=0
DBMS_STATS:  NNV  NDV  AVG  MMX  HST  EP   RP   NNNP IND  CNDV HSTN HSTR  COLNAME
DBMS_STATS:   Y    Y         Y    Y    Y    Y                             PART_TIME
DBMS_STATS:   Y    Y    Y    Y    Y    Y    Y                             ID_A
DBMS_STATS:   Y    Y    Y    Y    Y    Y    Y                             ID_B
DBMS_STATS:   Y    Y    Y    Y    Y    Y    Y                             ID_C
DBMS_STATS:   Y    Y         Y    Y    Y    Y                             INSERT_DATE
DBMS_STATS: Approximate NDV Options
DBMS_STATS: NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL
DBMS_STATS: Starting query at 10-MAR-15 03.30.18.286983000 PM +02:00
DBMS_STATS: select /*+  full(t)    no_parallel(t) no_parallel_index(t) dbms_stats cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring no_substrb_pad
*/to_char(count("PART_TIME")),to_char(substrb(dump(min("PART_TIME"),16,0,32),1,120)),to_char(substrb(dump(max("PART_TIME"),16,0,32),1,120)),to_char(count("ID_A")),to_char(substrb(dump(min("ID_A"),16,0,32),1,120)),to_char(substrb(dump(max("ID_A"),16,0,32),1,120)),to_char
(count("ID_B")),to_char(substrb(dump(min("ID_B"),16,0,32),1,120)),to_char(substrb(dump(max("ID_B"),16,0,32),1,120)),to_char(count("ID_C")),to_char(substrb(dump(min("ID_C"),16,0,32),1,120)),to_char(substrb(dump(max("ID_C"),16,0,32),1,120)),to_char(count("INSERT_DATE")),t
o_char(substrb(dump(min("INSERT_DATE"),16,0,32),1,120)),to_char(substrb(dump(max("INSERT_DATE"),16,0,32),1,120)) from "X"."IPT" t  where TBL$OR$IDX$PART$NUM("X"."IPT",0,4,0,"ROWID") = :objn /* NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL*/
DBMS_STATS: Ending query at 10-MAR-15 03.30.18.992624000 PM +02:00
DBMS_STATS:  no histogram: setting density to 1/ndv (1)
DBMS_STATS:  no histogram: setting density to 1/ndv (.1111111111111111111111111111111111111111)
DBMS_STATS:  no histogram: setting density to 1/ndv (.001176470588235294117647058823529411764706)
DBMS_STATS:  no histogram: setting density to 1/ndv (.00055005500550055005500550055005500550055)
DBMS_STATS:  no histogram: setting density to 1/ndv (1)
DBMS_STATS: Iteration 1, percentage .1375 nblks: 17797
DBMS_STATS:  NNV  NDV  AVG  MMX  HST  EP   RP   NNNP IND  CNDV HSTN HSTR  COLNAME
DBMS_STATS:                       Y    Y    Y                             PART_TIME
DBMS_STATS:                       Y    Y    Y                             ID_A
DBMS_STATS:                       Y    Y    Y                             ID_B
DBMS_STATS:                       Y    Y    Y                             ID_C
DBMS_STATS:                       Y    Y    Y                             INSERT_DATE
DBMS_STATS: Starting query at 10-MAR-15 03.30.18.995350000 PM +02:00
DBMS_STATS: create global temporary table sys.ora_temp_1_ds_420004 on commit preserve rows cache noparallel as select /*+  no_parallel(t) no_parallel_index(t) dbms_stats cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring no_substrb_pad
*/"PART_TIME","ID_A","ID_B","ID_C","INSERT_DATE"  from "X"."IPT" sample (   .1375000000)  t  WHERE  1 = 2
DBMS_STATS: Ending query at 10-MAR-15 03.30.18.999328000 PM +02:00
DBMS_STATS: Starting query at 10-MAR-15 03.30.18.999358000 PM +02:00
DBMS_STATS: grant select,insert on sys.ora_temp_1_ds_420004 to "X"
DBMS_STATS: Starting query at 10-MAR-15 03.30.19.000419000 PM +02:00
DBMS_STATS: alter table sys.ora_temp_1_ds_420004 noparallel
DBMS_STATS: Starting query at 10-MAR-15 03.30.19.002096000 PM +02:00
DBMS_STATS: insert /*+ append */ into sys.ora_temp_1_ds_420004 SELECT /*+  no_parallel(t) no_parallel_index(t) dbms_stats cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring no_substrb_pad  */"PART_TIME","ID_A","ID_B","ID_C","INSERT_DATE"  from
"X"."IPT" sample (   .1375000000)  t  WHERE TBL$OR$IDX$PART$NUM("X"."IPT",0,4,0,"ROWID") = :objn UNION ALL SELECT  "PART_TIME", "ID_A", "ID_B", "ID_C", "INSERT_DATE" FROM sys.ora_temp_1_ds_420004 WHERE 1 = 0  (objn=88842)
DBMS_STATS: Ending query at 10-MAR-15 03.30.19.049667000 PM +02:00
DBMS_STATS: Building Histogram for PART_TIME
DBMS_STATS:  bktnum=-2, nnv=4000000, snnv=5500, sndv=1, est_ndv=1, mnb=254
DBMS_STATS:  Trying frequency histogram
DBMS_STATS: Starting query at 10-MAR-15 03.30.19.050146000 PM +02:00
DBMS_STATS: select substrb(dump(val,16,0,32),1,120) ep, cnt from (select /*+ no_expand_table(t) index_rs(t)   no_parallel(t) no_parallel_index(t) dbms_stats cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring no_substrb_pad  */"PART_TIME"
val,count(*) cnt  from sys.ora_temp_1_ds_420004 t where "PART_TIME" is not null  group by "PART_TIME") order by val
DBMS_STATS: Building Histogram for ID_A
DBMS_STATS:  bktnum=-2, nnv=4000000, snnv=5500, sndv=9, est_ndv=9, mnb=254
DBMS_STATS:  Trying frequency histogram
DBMS_STATS: Starting query at 10-MAR-15 03.30.19.053808000 PM +02:00
DBMS_STATS: select substrb(dump(val,16,0,32),1,120) ep, cnt from (select /*+ no_expand_table(t) index_rs(t)   no_parallel(t) no_parallel_index(t) dbms_stats cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring no_substrb_pad  */"ID_A" val,count(*)
cnt  from sys.ora_temp_1_ds_420004 t where "ID_A" is not null  group by "ID_A") order by val
DBMS_STATS: Building Histogram for ID_B
DBMS_STATS:  bktnum=-2, nnv=4000000, snnv=5500, sndv=850, est_ndv=850, mnb=254
DBMS_STATS:  Trying height balanced histogram
DBMS_STATS: Starting query at 10-MAR-15 03.30.19.057279000 PM +02:00
DBMS_STATS: select min(minbkt),maxbkt,substrb(dump(min(val),16,0,32),1,120) minval,substrb(dump(max(val),16,0,32),1,120) maxval,sum(rep) sumrep, sum(repsq) sumrepsq, max(rep) maxrep, count(*) bktndv, sum(case when rep=1 then 1 else 0 end) unqrep from (select
val,min(bkt) minbkt, max(bkt) maxbkt, count(val) rep, count(val)*count(val) repsq from (select /*+ no_expand_table(t) index_rs(t)   no_parallel(t) no_parallel_index(t) dbms_stats cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring no_substrb_pad
*/"ID_B" val, ntile(254) over (order by "ID_B") bkt  from sys.ora_temp_1_ds_420004 t where "ID_B" is not null) group by val) group by maxbkt order by maxbkt
DBMS_STATS: Ending query at 10-MAR-15 03.30.19.072861000 PM +02:00
DBMS_STATS:  scaling density: pct=.1375,ssize=5539,popcnt=34,ssizesq=52873,popcntsq=1156,slsv=0,invden=589.597134404547827600208828818376936017
DBMS_STATS:  kkesdv_p: sndv=589.597134404547827600208828818376936017,snnv=5539,nnv=4028363.63636363636363636363636363636364,hind=0,slsv=0
DBMS_STATS:  totcnt=1,popndv=0,lsv=0
DBMS_STATS:  npndv=590
DBMS_STATS:  scaled density=.001694915254237288135593220338983050847458,invden=590
DBMS_STATS:   Histogram gathering flags: 7
DBMS_STATS: Building Histogram for ID_C
DBMS_STATS:  bktnum=-2, nnv=4000000, snnv=5500, sndv=1818, est_ndv=1818, mnb=254
DBMS_STATS:  Trying height balanced histogram
DBMS_STATS: Starting query at 10-MAR-15 03.30.19.076613000 PM +02:00
DBMS_STATS: select min(minbkt),maxbkt,substrb(dump(min(val),16,0,32),1,120) minval,substrb(dump(max(val),16,0,32),1,120) maxval,sum(rep) sumrep, sum(repsq) sumrepsq, max(rep) maxrep, count(*) bktndv, sum(case when rep=1 then 1 else 0 end) unqrep from (select
val,min(bkt) minbkt, max(bkt) maxbkt, count(val) rep, count(val)*count(val) repsq from (select /*+ no_expand_table(t) index_rs(t)   no_parallel(t) no_parallel_index(t) dbms_stats cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring no_substrb_pad
*/"ID_C" val, ntile(254) over (order by "ID_C") bkt  from sys.ora_temp_1_ds_420004 t where "ID_C" is not null) group by val) group by maxbkt order by maxbkt
DBMS_STATS: Ending query at 10-MAR-15 03.30.19.094222000 PM +02:00
DBMS_STATS:  Skewness check: FALSE
DBMS_STATS:  Discarding Histogram (not necessary), ssize = 5539
DBMS_STATS:   Histogram gathering flags: 7
DBMS_STATS: Building Histogram for INSERT_DATE
DBMS_STATS:  bktnum=-2, nnv=4000000, snnv=5500, sndv=1, est_ndv=1, mnb=254
DBMS_STATS:  Trying frequency histogram
DBMS_STATS: Starting query at 10-MAR-15 03.30.19.097484000 PM +02:00
DBMS_STATS: select substrb(dump(val,16,0,32),1,120) ep, cnt from (select /*+ no_expand_table(t) index_rs(t)   no_parallel(t) no_parallel_index(t) dbms_stats cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring no_substrb_pad  */"INSERT_DATE"
val,count(*) cnt  from sys.ora_temp_1_ds_420004 t where "INSERT_DATE" is not null  group by "INSERT_DATE") order by val
.
.
.

PL/SQL procedure successfully completed.

Elapsed: 00:00:01.05
SQL>

From line 25-28 we see how oracle is fully scanning our partition to create basic statistics as number of distinct values, min/max value, number of rows and so on. From line 43-52 we can see how Oracle is creating temporary table and inserts less than 1% sample data in it. Then for each column it queries the sample data in order to create histogram. The type of histogram – frequency or height is determined from the number of distinct values. After the histogram is created, Oracle checks the data skewness and if it’s skewed, then the histogram is preserved, otherwise it will be ignored – line 88, 89.

“So far so good”. That was simple example with METHOD_OPT=>’FOR ALL COLUMNS SIZE SKEWONLY’, but this is not the default behaviour.

By default Oracle have all kind of automagical things in behind, so in here it does it too. The default value for METHOD_OPT is ‘FOR ALL COLUMNS SIZE AUTO’ which means that before gathering histogram Oracle checks the usage statistics for this column. If our column is used as filter predicate, join, etc in some query, oracle automatically puts a record in sys.col_usage$ to flag this. Consequently, the next time when we gather statistics, Oracle will first checks sys.col_usage$ table to determine the column usage and if it’s used, it will try to create an histogram for it. Of course if the data is non-skew, the histogram is ignored.

Let’s see it in action:

SQL> begin
DBMS_STATS.GATHER_TABLE_STATS(ownname => '', tabname => 'IPT', partname => 'SYS_P101',
estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE, granularity => 'PARTITION',
method_opt => 'FOR ALL COLUMNS SIZE AUTO');
end;
  6  /
DBMS_STATS: Start gather table stats -- tabname: IPT
.
.
.
DBMS_STATS:  NNV  NDV  AVG  MMX  HST  EP   RP   NNNP IND  CNDV HSTN HSTR  COLNAME
DBMS_STATS:   Y    Y         Y                                            PART_TIME
DBMS_STATS:   Y    Y    Y    Y                                            ID_A
DBMS_STATS:   Y    Y    Y    Y                                            ID_B
DBMS_STATS:   Y    Y    Y    Y                                            ID_C
DBMS_STATS:   Y    Y         Y                                            INSERT_DATE
DBMS_STATS: Approximate NDV Options
.
.
.
DBMS_STATS: Finished table X.IPT.SYS_P101 at 10-MAR-15 04.31.32.266807000 PM +02:00

PL/SQL procedure successfully completed.

Elapsed: 00:00:01.03
SQL>

From line 11-17 we can see that Oracle is not even trying to create histograms for our columns. This is normal because we have never used that table in queries. So now we start 2 queries and try again:

SQL> select count(*) from ipt where id_a=5;

  COUNT(*)
----------
    406000

Elapsed: 00:00:00.56
SQL> select count(*) from ipt where id_b=99;

  COUNT(*)
----------
         0

Elapsed: 00:00:00.17
SQL> exec DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO;

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.19
SQL> begin
DBMS_STATS.GATHER_TABLE_STATS(ownname => '', tabname => 'IPT', partname => 'SYS_P101',
estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE, granularity => 'PARTITION',
method_opt => 'FOR ALL COLUMNS SIZE AUTO');
end;
  6  /
DBMS_STATS: Start gather table stats -- tabname: IPT
.
.
.
DBMS_STATS:  NNV  NDV  AVG  MMX  HST  EP   RP   NNNP IND  CNDV HSTN HSTR  COLNAME
DBMS_STATS:   Y    Y         Y                                            PART_TIME
DBMS_STATS:   Y    Y    Y    Y    Y    Y         Y                        ID_A
DBMS_STATS:   Y    Y    Y    Y    Y    Y         Y                        ID_B
DBMS_STATS:   Y    Y    Y    Y                                            ID_C
DBMS_STATS:   Y    Y         Y                                            INSERT_DATE
.                                                          INSERT_DATE
.
.
DBMS_STATS: Building Histogram for ID_A
DBMS_STATS:  bktnum=-1, nnv=4000000, snnv=5500, sndv=9, est_ndv=9, mnb=254
DBMS_STATS:  Trying frequency histogram
DBMS_STATS: Starting query at 10-MAR-15 04.48.39.392345000 PM +02:00
DBMS_STATS: select substrb(dump(val,16,0,32),1,120) ep, cnt from (select /*+ no_expand_table(t) index_rs(t)   no_parallel(t) no_parallel_index(t) dbms_stats cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring no_substrb_pad  */"ID_A" val,count(*)
cnt  from sys.ora_temp_1_ds_420006 t where "ID_A" is not null  group by "ID_A") order by val
DBMS_STATS: Building Histogram for ID_B
DBMS_STATS:  bktnum=-1, nnv=4000000, snnv=5500, sndv=850, est_ndv=850, mnb=254
DBMS_STATS:  Trying height balanced histogram
DBMS_STATS: Starting query at 10-MAR-15 04.48.39.396334000 PM +02:00
DBMS_STATS: select min(minbkt),maxbkt,substrb(dump(min(val),16,0,32),1,120) minval,substrb(dump(max(val),16,0,32),1,120) maxval,sum(rep) sumrep, sum(repsq) sumrepsq, max(rep) maxrep, count(*) bktndv, sum(case when rep=1 then 1 else 0 end) unqrep from (select
val,min(bkt) minbkt, max(bkt) maxbkt, count(val) rep, count(val)*count(val) repsq from (select /*+ no_expand_table(t) index_rs(t)   no_parallel(t) no_parallel_index(t) dbms_stats cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring no_substrb_pad
*/"ID_B" val, ntile(254) over (order by "ID_B") bkt  from sys.ora_temp_1_ds_420006 t where "ID_B" is not null) group by val) group by maxbkt order by maxbkt
DBMS_STATS: Ending query at 10-MAR-15 04.48.39.404927000 PM +02:00
DBMS_STATS:  Discarding Histogram (not necessary), ssize = 5535
.
.
.
DBMS_STATS: Finished table X.IPT.SYS_P101 at 10-MAR-15 04.48.39.549546000 PM +02:00

PL/SQL procedure successfully completed.

Elapsed: 00:00:01.03
SQL>

Again, everything is as expected. Histograms are gathered only for those columns which are used and even the histogram for id_b is disregarded. So far so good, in a dust free environment as this one.

Now let’s imagine dynamic system as BI or other kind of system which lets our users to create costume reports. That means that our column usage will change every single day. In the beginning of the life of this database all column histograms will be empty and then little by little Oracle will add histograms when system usage is accumulated. For local statistics this is not a big trouble. But when we gather global statistics this can cause us a lot of troubles, which we will review a bit later.
The other issue in this approach is the sample size which is less than 1%. That means that we shall hope to “get lucky” and persist our histogram. The data that we sample can be good or bad – skew or non-skew. It depends on the sample. I think Oracle shall work a lot more in here and they shall optimize everything to happen in single pass. Single full tables scan with a bit more logic on top of it, in order to generate all statistics and histograms at once. This will eliminate the sample issue and will save a lot of time. It’s not impossible and it will be way better than first scanning the whole object /as they do/, then creating a sample, which mostly means writing /keep in mind that 1% can be gigabytes/ and then scanning this sample many times.

Now let’s look at global level.

GLOBAL STATISTICS

Now for global statistics we can choose to scan the whole object, sampling it or to aggregate our statistics from the partition level statistics. The first two options are unthinkable because scanning all partitions can take few decades and sampling it another one. So the only acceptable and intelligent option is to aggregate the statistics. And actually that’s the idea behind partitioning, to maintain every partition separately, while having one big object made from all of them.

If we read the documentation and some MOS notes about global statistics maintenance/aggregation we will notice that everything is shiny and pinky, but actually is brown and stinky.

According to the documentation, in order to incrementally maintain global statistics we have to set INCREMENTAL preference on true for our table. After that, when we gather statistics Oracle will gather stats only for those partitions that have stale or empty statistics, while incrementally maintains our global statistics.

In our test case we have only one partition without statistics and few more with fresh ones.

SQL>
begin
DBMS_STATS.GATHER_TABLE_STATS(ownname => '', tabname => 'IPT',
estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE, granularity => 'AUTO',
method_opt => 'FOR ALL COLUMNS SIZE AUTO');
end;
  6  /
DBMS_STATS: Start gather table stats -- tabname: IPT
DBMS_STATS: Started table X.IPT. at 10-MAR-15 07.28.56.298658000 PM +02:00. Execution phase: 1 stattype: DATA
DBMS_STATS: Specified granularity = AUTO, Fixed granularity = GLOBAL AND PARTITION
DBMS_STATS: parameters ** pfix:  granularity: AUTO gIdxGranularity:  executionPhase: 1 pname:  stime: 03-10-2015 19:28:56 method_opt: FOR ALL COLUMNS SIZE AUTO
DBMS_STATS: Start construct analyze using SQL .. Execution Phase: 1 granularity: GLOBAL AND PARTITION global_requested:  pfix:
DBMS_STATS: Start gather part -- conctx.global_requested:  gran: GLOBAL AND PARTITION execution phase: 1
DBMS_STATS: gather stats on partition SYS_P157: synopsis not gathered yet; not analyzed yet;
DBMS_STATS: Start gather_stats.. pfix:  ownname: Xtabname: IPT pname: SYS_P157 spname:  execution phase: 1
DBMS_STATS: Using approximate NDV pct=0
DBMS_STATS:  NNV  NDV  AVG  MMX  HST  EP   RP   NNNP IND  CNDV HSTN HSTR  COLNAME
DBMS_STATS:   Y    Y         Y                                            PART_TIME
DBMS_STATS:   Y    Y    Y    Y    Y    Y         Y                        ID_A
DBMS_STATS:   Y    Y    Y    Y    Y    Y         Y                        ID_B
DBMS_STATS:   Y    Y    Y    Y                                            ID_C
DBMS_STATS:   Y    Y         Y                                            INSERT_DATE
.
.
.
DBMS_STATS: ----------------------------------------------------------------------------------------------------
DBMS_STATS: Start gather_stats.. pfix:  ownname: Xtabname: IPT pname:  spname:  execution phase: 1
DBMS_STATS: Synopsis Aggregation Degree: 1
DBMS_STATS: Can not derive global histogram since missing histogram at partition level, colsize = 8000000nnvtemp = 20000000
DBMS_STATS: Derive global stats from partition synopses/stats for table IPT
DBMS_STATS:  no histogram: setting density to 1/ndv (.2)
DBMS_STATS:  no histogram: setting density to 1/ndv (.1111111111111111111111111111111111111111)
DBMS_STATS:  no histogram: setting density to 1/ndv (.001001001001001001001001001001001001001001)
DBMS_STATS:  no histogram: setting density to 1/ndv (.00015745551881593449850417257124862226421)
DBMS_STATS:  no histogram: setting density to 1/ndv (.2)
DBMS_STATS: Iteration 1, percentage .0275 nblks: 88974
DBMS_STATS:  NNV  NDV  AVG  MMX  HST  EP   RP   NNNP IND  CNDV HSTN HSTR  COLNAME
DBMS_STATS:                                                               PART_TIME
DBMS_STATS:                       Y    Y         Y                        ID_A
DBMS_STATS:                       Y    Y         Y                        ID_B
DBMS_STATS:                                                               ID_C
DBMS_STATS:                                                               INSERT_DATE
DBMS_STATS: Starting query at 10-MAR-15 07.29.04.851068000 PM +02:00
DBMS_STATS: truncate table sys.ora_temp_1_ds_420011
DBMS_STATS: Starting query at 10-MAR-15 07.29.04.854979000 PM +02:00
DBMS_STATS: insert /*+ append */ into sys.ora_temp_1_ds_420011 SELECT /*+  no_parallel(t) no_parallel_index(t) dbms_stats cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring no_substrb_pad  */"ID_A","ID_B", rowid SYS_DS_ALIAS_0  from "X"."IPT"
sample (   .0275000000)  t  UNION ALL SELECT  "ID_A", "ID_B", SYS_DS_ALIAS_0 FROM sys.ora_temp_1_ds_420011 WHERE 1 = 0
DBMS_STATS: Ending query at 10-MAR-15 07.29.20.128812000 PM +02:00
DBMS_STATS: Building Histogram for ID_A
DBMS_STATS:  bktnum=-1, nnv=20000000, snnv=5500, sndv=9, est_ndv=9, mnb=254
DBMS_STATS:  Trying frequency histogram
DBMS_STATS: Starting query at 10-MAR-15 07.29.20.129099000 PM +02:00
DBMS_STATS: select substrb(dump(val,16,0,32),1,120) ep, cnt from (select /*+ no_expand_table(t) index_rs(t)   no_parallel(t) no_parallel_index(t) dbms_stats cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring no_substrb_pad  */"ID_A" val,count(*)
cnt  from sys.ora_temp_1_ds_420011 t where "ID_A" is not null  group by "ID_A") order by val
DBMS_STATS: Building Histogram for ID_B
DBMS_STATS:  bktnum=-1, nnv=20000000, snnv=5500, sndv=999, est_ndv=999, mnb=254
DBMS_STATS:  Trying height balanced histogram
DBMS_STATS: Starting query at 10-MAR-15 07.29.20.133049000 PM +02:00
DBMS_STATS: select min(minbkt),maxbkt,substrb(dump(min(val),16,0,32),1,120) minval,substrb(dump(max(val),16,0,32),1,120) maxval,sum(rep) sumrep, sum(repsq) sumrepsq, max(rep) maxrep, count(*) bktndv, sum(case when rep=1 then 1 else 0 end) unqrep from (select
val,min(bkt) minbkt, max(bkt) maxbkt, count(val) rep, count(val)*count(val) repsq from (select /*+ no_expand_table(t) index_rs(t)   no_parallel(t) no_parallel_index(t) dbms_stats cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring no_substrb_pad
*/"ID_B" val, ntile(254) over (order by "ID_B") bkt  from sys.ora_temp_1_ds_420011 t where "ID_B" is not null) group by val) group by maxbkt order by maxbkt
DBMS_STATS: Ending query at 10-MAR-15 07.29.20.142934000 PM +02:00
DBMS_STATS:  Discarding Histogram (not necessary), ssize = 5433
DBMS_STATS:   Histogram gathering flags: 11
DBMS_STATS: ====================================================================================================
DBMS_STATS: Statistics from clist:
DBMS_STATS: ====================================================================================================
DBMS_STATS: Number of rows in the table = 20000000
DBMS_STATS: COL_NAME                       ACL       NV        SNNVDV    NDV       SNDV      DENS      CCNT
DBMS_STATS: ----------------------------------------------------------------------------------------------------
DBMS_STATS: PART_TIME                      8         0         20000000  5         5         .2        0
DBMS_STATS:          min:
DBMS_STATS:          max:
DBMS_STATS: ----------------------------------------------------------------------------------------------------
DBMS_STATS: COL_NAME                       ACL       NV        SNNVDV    NDV       SNDV      DENS      CCNT
DBMS_STATS: ----------------------------------------------------------------------------------------------------
DBMS_STATS: ID_A                           3         0         5433      9         9         .00000002 1
DBMS_STATS:          min:
DBMS_STATS:          max:
DBMS_STATS:          Histograms:
DBMS_STATS:          -------------------------------------------------------------------------------------------
DBMS_STATS:          BVAL EAVAL                            EDVAL
DBMS_STATS:          -------------------------------------------------------------------------------------------
DBMS_STATS:          589                                   Typ=2 Len=2: c1,3
DBMS_STATS:          1200                                  Typ=2 Len=2: c1,4
DBMS_STATS:          1801                                  Typ=2 Len=2: c1,5
DBMS_STATS:          2432                                  Typ=2 Len=2: c1,6
DBMS_STATS:          2989                                  Typ=2 Len=2: c1,7
DBMS_STATS:          3620                                  Typ=2 Len=2: c1,8
DBMS_STATS:          4205                                  Typ=2 Len=2: c1,9
DBMS_STATS:          4834                                  Typ=2 Len=2: c1,a
DBMS_STATS:          5433                                  Typ=2 Len=2: c1,b
DBMS_STATS:          Data Type: 2
DBMS_STATS:          Histogram Flags: 0 Histogram Gathering Flags: 10
DBMS_STATS:          Incremental: TRUE Fix Control 13583722: 1
DBMS_STATS: ----------------------------------------------------------------------------------------------------
DBMS_STATS: COL_NAME                       ACL       NV        SNNVDV    NDV       SNDV      DENS      CCNT
DBMS_STATS: ----------------------------------------------------------------------------------------------------
DBMS_STATS: ID_B                           4         0         20000000  999       999       .00100100 0
DBMS_STATS:          min:
DBMS_STATS:          max:
DBMS_STATS: ----------------------------------------------------------------------------------------------------
DBMS_STATS: COL_NAME                       ACL       NV        SNNVDV    NDV       SNDV      DENS      CCNT
DBMS_STATS: ----------------------------------------------------------------------------------------------------
DBMS_STATS: ID_C                           4         0         20000000  6351      6351      .00015745 0
DBMS_STATS:          min:
DBMS_STATS:          max:
DBMS_STATS: ----------------------------------------------------------------------------------------------------
DBMS_STATS: COL_NAME                       ACL       NV        SNNVDV    NDV       SNDV      DENS      CCNT
DBMS_STATS: ----------------------------------------------------------------------------------------------------
DBMS_STATS: INSERT_DATE                    8         0         20000000  5         5         .2        0
DBMS_STATS:          min:
DBMS_STATS:          max:
DBMS_STATS: ----------------------------------------------------------------------------------------------------
DBMS_STATS: Starting query at 10-MAR-15 07.29.20.144267000 PM +02:00
DBMS_STATS: truncate table sys.ora_temp_1_ds_420011
DBMS_STATS: Starting query at 10-MAR-15 07.29.20.145880000 PM +02:00
DBMS_STATS: drop table sys.ora_temp_1_ds_420011
DBMS_STATS: End of construct analyze using sql.
DBMS_STATS: Finished table X.IPT. at 10-MAR-15 07.29.20.154516000 PM +02:00

PL/SQL procedure successfully completed.

SQL>

We can see again, “perfect” behavior. On line 14 it detects that our new partition doesn’t have statistics and it starts gathering the statistics in the well known way. Then after we have good statistics on all partitions, on line 30 Oracle is determining the global statistics and then is trying to do the same with the histograms. And what do we see? On line 29 it says it cannot aggregate our histograms on global level, because some local histograms are missing. Well, that was expected, we are checking the skewness for every partition separately, without looking at the whole picture and discarding whatever we want.
After that, because it cannot determine our histogram from the local one, it falls backs to the old way and creates sample with a size of ’APPROX_GLOBAL AND PARTITION’. Its doing generally the same, and it has the same issues regarding gathering statistics and histograms, so I don’t think to test it in here at all.

Actually one of the biggest issues in my opinion is the column usage tracking and this rubbish way of scanning all partitions. So the best is to skip it and create histograms on all columns with skew data METHOD_OPT=>’FOR ALL COLUMNS SIZE SKEWONLY’. Sounds promising.

Let’s see how well that idea is implemented.

begin
DBMS_STATS.GATHER_TABLE_STATS(ownname => '', tabname => 'IPT',
estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE, granularity => 'AUTO',
method_opt => 'FOR ALL COLUMNS SIZE SKEWONLY');
end;
  6  /
DBMS_STATS: Start gather table stats -- tabname: IPT
DBMS_STATS: Started table X.IPT. at 11-MAR-15 05.13.14.828278000 PM +02:00. Execution phase: 1 stattype: DATA
DBMS_STATS: Specified granularity = AUTO, Fixed granularity = GLOBAL AND PARTITION
DBMS_STATS: parameters ** pfix:  granularity: AUTO gIdxGranularity:  executionPhase: 1 pname:  stime: 03-11-2015 17:13:14 method_opt: FOR ALL COLUMNS SIZE SKEWONLY
DBMS_STATS: Start construct analyze using SQL .. Execution Phase: 1 granularity: GLOBAL AND PARTITION global_requested:  pfix:
DBMS_STATS: Start gather part -- conctx.global_requested:  gran: GLOBAL AND PARTITION execution phase: 1
DBMS_STATS: gather stats on partition SYS_P182: synopsis not gathered yet; not analyzed yet;
DBMS_STATS: Start gather_stats.. pfix:  ownname: Xtabname: IPT pname: SYS_P182 spname:  execution phase: 1
DBMS_STATS: Using approximate NDV pct=0
DBMS_STATS:  NNV  NDV  AVG  MMX  HST  EP   RP   NNNP IND  CNDV HSTN HSTR  COLNAME
DBMS_STATS:   Y    Y         Y    Y    Y    Y                             PART_TIME
DBMS_STATS:   Y    Y    Y    Y    Y    Y    Y    Y                        ID_A
DBMS_STATS:   Y    Y    Y    Y    Y    Y    Y    Y                        ID_B
DBMS_STATS:   Y    Y    Y    Y    Y    Y    Y    Y                        ID_C
DBMS_STATS:   Y    Y         Y    Y    Y    Y                             INSERT_DATE
.
.
.
DBMS_STATS: Building Histogram for ID_C
DBMS_STATS:  bktnum=-2, nnv=4000000, snnv=5500, sndv=1824, est_ndv=1824, mnb=254
DBMS_STATS:  Trying height balanced histogram
DBMS_STATS: Starting query at 11-MAR-15 05.13.21.150695000 PM +02:00
DBMS_STATS: select min(minbkt),maxbkt,substrb(dump(min(val),16,0,32),1,120) minval,substrb(dump(max(val),16,0,32),1,120) maxval,sum(rep) sumrep, sum(repsq) sumrepsq, max(rep) maxrep, count(*) bktndv, sum(case when rep=1 then 1 else 0 end) unqrep from (select
val,min(bkt) minbkt, max(bkt) maxbkt, count(val) rep, count(val)*count(val) repsq from (select /*+ no_expand_table(t) index_rs(t)   no_parallel(t) no_parallel_index(t) dbms_stats cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring no_substrb_pad
*/"ID_C" val, ntile(254) over (order by "ID_C") bkt  from sys.ora_temp_1_ds_440005 t where "ID_C" is not null) group by val) group by maxbkt order by maxbkt
DBMS_STATS: Ending query at 11-MAR-15 05.13.21.168371000 PM +02:00
DBMS_STATS:  Skewness check: FALSE
DBMS_STATS:  Not Discarding histogram (Incremental mode)
DBMS_STATS:  scaling density: pct=.1375,ssize=5616,popcnt=0,ssizesq=24520,popcntsq=0,slsv=0,invden=1286.274714518760195758564437194127243067
DBMS_STATS:  kkesdv_p: sndv=1286.274714518760195758564437194127243067,snnv=5616,nnv=4084363.63636363636363636363636363636364,hind=0,slsv=0
DBMS_STATS:  totcnt=1,popndv=0,lsv=0
DBMS_STATS:  npndv=1303
DBMS_STATS:  scaled density=.00076745970836531082118188795088257866462,invden=1303
DBMS_STATS:   Histogram gathering flags: 4111
DBMS_STATS: Building Histogram for INSERT_DATE
.
.
.
DBMS_STATS: ----------------------------------------------------------------------------------------------------
DBMS_STATS: Start gather_stats.. pfix:  ownname: Xtabname: IPT pname:  spname:  execution phase: 1
DBMS_STATS: Synopsis Aggregation Degree: 1
DBMS_STATS: Deriving Global Histogram for PART_TIME
DBMS_STATS: Deriving Global Histogram for ID_A
DBMS_STATS: Deriving Global Histogram for ID_B
DBMS_STATS:  Skewness check: FALSE
DBMS_STATS:  Not Discarding histogram (Incremental mode)
DBMS_STATS:  scaling density: pct=100,ssize=40000000,popcnt=0,ssizesq=254,popcntsq=0,slsv=0,invden=6299212598425.19685039370078740157480315
DBMS_STATS:  scaled density=.00000000000015875,invden=6299212598425.19685039370078740157480315
DBMS_STATS:   Histogram gathering flags: 4111
DBMS_STATS: Deriving Global Histogram for ID_C
DBMS_STATS:  Skewness check: FALSE
DBMS_STATS:  Not Discarding histogram (Incremental mode)
DBMS_STATS:  scaling density: pct=100,ssize=40000000,popcnt=0,ssizesq=254,popcntsq=0,slsv=0,invden=6299212598425.19685039370078740157480315
DBMS_STATS:  scaled density=.00000000000015875,invden=6299212598425.19685039370078740157480315
DBMS_STATS:   Histogram gathering flags: 4111
DBMS_STATS: Deriving Global Histogram for INSERT_DATE
DBMS_STATS: Derive global stats from partition synopses/stats for table IPT
.
.
.
DBMS_STATS: Finished table X.IPT. at 11-MAR-15 05.13.22.207335000 PM +02:00

PL/SQL procedure successfully completed.

SQL>

What do we see now, the well-known stupid behavior made stupider. We just said gather histograms only for our skew columns and Oracle decided by itself /line 33,34/ to persist the histograms for non-skew columns anyway, because it will need it to maintain the global histograms. How stupid.

Why not just using SIZE 254 then, it looks the same. But there is one small difference:

****************
QUERY BLOCK TEXT
****************
select count(*) from ipt where id_c=5995
.
.
.
***************************************
BASE STATISTICAL INFORMATION
***********************
Table Stats::
  Table: IPT  Alias: IPT  (Using composite stats)
    #Rows: 4000000  #Blks:  17791  AvgRowLen:  27.00  ChainCnt:  0.00
Access path analysis for IPT
***************************************
SINGLE TABLE ACCESS PATH
  Single Table Cardinality Estimation for IPT[IPT]
NOTE: ignoring histogram of column (IPT.ID_C)
      used only for incremental stats maintenance
    #Rows: 4000000  #Blks:  17791  AvgRowLen:  27.00  ChainCnt:  0.00
  Column (#4): ID_C(  Part#: 0
    AvgLen: 4 NDV: 1815 Nulls: 0 Density: 0.000000 Min: 0 Max: 0
    Histogram: HtBal  #Bkts: 254  UncompBkts: 254  EndPtVals: 0
.
.
.
  Best:: AccessPath: TableScan
         Cost: 4855.35  Degree: 1  Resp: 4855.35  Card: 2203.86  Bytes: 0

The optimizer is not using those histograms which were made persistent, just because we are in incremental mode. And when I say not using them I mean on global level as well. What a rubbish.

Let’s check the situation with SIZE 254:


****************
QUERY BLOCK TEXT
****************
select count(*) from ipt where id_c=5995
.
.
.
***************************************
BASE STATISTICAL INFORMATION
***********************
Table Stats::
  Table: IPT  Alias: IPT  (Using composite stats)
    #Rows: 4000000  #Blks:  17791  AvgRowLen:  27.00  ChainCnt:  0.00
Access path analysis for IPT
***************************************
SINGLE TABLE ACCESS PATH
  Single Table Cardinality Estimation for IPT[IPT]
  Column (#4):
    NewDensity:0.000547, OldDensity:0.000000 BktCnt:254, PopBktCnt:2, PopValCnt:1, NDV:1815
  Column (#4): ID_C(
    AvgLen: 4 NDV: 1815 Nulls: 0 Density: 0.000547 Min: 6 Max: 9996
    Histogram: HtBal  #Bkts: 254  UncompBkts: 254  EndPtVals: 254
.
.
.
  Best:: AccessPath: TableScan
         Cost: 4855.35  Degree: 1  Resp: 4855.35  Card: 2187.71  Bytes: 0

At least we have useful histograms.

CONCLUSIONS

What a mess.
Generally Oracle database is a piece of software with great evolution. But in this part of it, I think we have something more like mutation than evolution.
I think Oracle shall rewrite everything around the way they gather statistics. In my opinion they shall use single table/partition scan to gather stats and histograms and maybe gather histograms on all columns. Or track the trending on partition level and then decide do we need histogram on that column or not. And then when they take a decision to be single for all levels – global and partition. In this way they will fix a lot of issues and save a lot of time and resources. But for now we have the rubbish that I just explained and we shall know it well in order to use it the best possible way.

What about 12c?
Same shit, different version.

Happy gathering,
Kovachev

Advertisements

One thought on “Optimizer statistics – Gathering Statistics and Histograms

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s