Oracle database internals by Riyaj

Discussions about Oracle performance tuning, RAC, Oracle internals & E-Business Suite.

Archive for March, 2008

Correlation between column predicates

Posted by Riyaj Shamsudeen on March 21, 2008

Multi column correlation

We have all encountered this situation many times before: until 11g, the cost based optimizer assumed no correlation between two columns, which erroneously reduces the estimated cardinality of a row source. Incorrect cardinality estimates
are one of the most common root causes of SQL performance issues.

Consider the following example.

The following code creates a table with four columns and populates it. The first two columns, n1 and n2,
are populated with identical values (both mod(n, 100)), so they are perfectly correlated.

create table t_vc as
select mod(n, 100) n1, mod(n, 100) n2 ,
mod(n, 50) n3 , mod(n, 20) n4
from (select level n from dual connect by level <= 10001);

There is a strong correlation between n1 and n2 above: n1 always equals n2.
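
A quick sanity check confirms this; by construction, the count of mismatching rows must be zero:

select count(*) from t_vc where n1 <> n2;

  COUNT(*)
----------
         0
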
Let’s collect statistics with histograms on all columns.
begin
dbms_stats.gather_table_stats( user, 'T_VC', estimate_percent => null, method_opt => 'for all columns size 254');
end;
/

Let’s explain the plan for a query specifying a single predicate. Exactly 100 rows have the value 10 in column n1.

explain plan for select count(*) from t_vc where n1=10;


---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     3 |     9   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE    |      |     1 |     3 |            |          |
|*  2 |   TABLE ACCESS FULL| T_VC |   100 |   300 |     9   (0)| 00:00:01 |
---------------------------------------------------------------------------

Yes, the optimizer correctly estimates the cardinality of row source T_VC at step 2 as 100.

Now, let’s add the predicate n2=10. Since there is a strong correlation between n1 and n2,
adding this predicate should not alter the row source cardinality. Right?

explain plan for select count(*) from t_vc where n1=10 and n2=10;

select * from table(dbms_xplan.display);


---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     6 |     9   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE    |      |     1 |     6 |            |          |
|*  2 |   TABLE ACCESS FULL| T_VC |     1 |     6 |     9   (0)| 00:00:01 |
---------------------------------------------------------------------------

Nope. The optimizer’s cardinality estimate of 1 is off by a factor of 100.

Notice the Rows column at step 2 above: it is just 1. Essentially, the CBO estimate boils down to a
simplified formula, in which the selectivity of each predicate is (1/NDV) in the simple case, without considering histograms:

# of rows ~= total # of rows * (1/NDV for n1) * (1/NDV for n2)
           = 10000 * (1/100) * (1/100) = 1 row

Far from the truth! This SQL actually returns exactly 100 rows.
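
Running the query itself bears this out:

select count(*) from t_vc where n1=10 and n2=10;

  COUNT(*)
----------
       100
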

Extended stats

Oracle 11g introduces extended statistics to relieve some of this pain. In 11g, extended statistics can be created on a group of columns, enabling the CBO to consider the correlation between those column values.


SELECT dbms_stats.create_extended_stats(
         ownname => user, tabname => 'T_VC',
         extension => '(n1, n2)' ) AS n1_n2_correlation
FROM dual;

N1_N2_CORRELATION
------------------------------------------------
SYS_STUBZH0IHA7K$KEBJVXO5LOHAS
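
The new extension is also visible in the user_stat_extensions dictionary view, introduced in 11g:

select extension_name, extension
from user_stat_extensions
where table_name = 'T_VC';

EXTENSION_NAME                 EXTENSION
------------------------------ ------------
SYS_STUBZH0IHA7K$KEBJVXO5LOHAS ("N1","N2")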

Let’s collect stats again on this table and check the SQL plan.
exec dbms_stats.gather_table_stats( user, 'T_VC', estimate_percent => null, method_opt => 'for all columns size 254');

explain plan for select count(*) from t_vc where n1=10 and n2=10;


---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |   100 |  1200 |     9   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL | T_VC |   100 |  1200 |     9   (0)| 00:00:01 |
---------------------------------------------------------------------------

At last, the 11g optimizer truly understands the correlation.

Under the wraps

1. Creating extended statistics adds a new virtual column to the table. Here is the line from the SQL trace:

alter table "CBQT"."T_VC" add (SYS_STUBZH0IHA7K$KEBJVXO5LOHAS
       as (sys_op_combined_hash(n1, n2)) virtual BY USER for statistics);

The virtual column name is cryptic and appears to be derived from the table name and column name combination.
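
This hidden virtual column can be seen in user_tab_cols (user_tab_columns filters hidden columns out):

select column_name, virtual_column, hidden_column
from user_tab_cols
where table_name = 'T_VC'
order by internal_column_id;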

2. A deterministic hash function, sys_op_combined_hash, is called by the optimizer to populate
the values of this virtual column. This function returns the same value for the same combination of inputs, and distinct combinations map to distinct hash values (barring collisions).

col h1 format 99999999999999999999999999

select sys_op_combined_hash (1,1) h1 from dual;

                         H1
---------------------------
        7026129190895635777

select sys_op_combined_hash (1,2) h1 from dual;

                         H1
---------------------------
         298332787864732998
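
To tie the pieces together, we can compare the virtual column (referenced by the generated name from the trace above) against a direct call to the hash function; the two values should match row for row:

select distinct n1, n2,
       sys_op_combined_hash(n1, n2) direct_hash,
       SYS_STUBZH0IHA7K$KEBJVXO5LOHAS virtual_col
from t_vc
where n1 = 10;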

Collecting histograms on all columns collects a histogram on this virtual column as well.
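
This can be verified in user_tab_col_statistics, where the virtual column appears with its own histogram:

select column_name, num_distinct, histogram
from user_tab_col_statistics
where table_name = 'T_VC';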

Trace lines

Using the histogram on this virtual column, the CBO is able to detect the strong correlation between the two columns.

This is visible in the 10053 output.


SINGLE TABLE ACCESS PATH
  Single Table Cardinality Estimation for T_VC[T_VC]
  Column (#1):
    NewDensity:0.005000, OldDensity:0.000050 BktCnt:10001, PopBktCnt:10001, PopValCnt:100, NDV:100
  Column (#2):
    NewDensity:0.005000, OldDensity:0.000050 BktCnt:10001, PopBktCnt:10001, PopValCnt:100, NDV:100
  Column (#5):
    NewDensity:0.005000, OldDensity:0.000050 BktCnt:10001, PopBktCnt:10001, PopValCnt:100, NDV:100
  ColGroup (#1, VC) SYS_STUBZH0IHA7K$KEBJVXO5LOHAS
    Col#: 1 2     CorStregth: 100.00
  ColGroup Usage:: PredCnt: 2  Matches Full: #0  Partial:  Sel: 0.0100
  Table: T_VC  Alias: T_VC
    Card: Original: 10001.000000  Rounded: 100  Computed: 100.00  Non Adjusted: 100.00
  Access Path: TableScan
    Cost:  9.11  Resp: 9.11  Degree: 0
      Cost_io: 9.00  Cost_cpu: 2404620
      Resp_io: 9.00  Resp_cpu: 2404620
  Best:: AccessPath: TableScan
         Cost: 9.11  Degree: 1  Resp: 9.11  Card: 100.00  Bytes: 0

Notice the ColGroup line and the CorStregth field above (as spelled in the trace); it is set to 100. CorStrength is calculated
using the histogram on the virtual column, and the final cardinality estimate is multiplied by CorStrength:

# of rows ~= total # of rows * (1/NDV for n1) * (1/NDV for n2) * CorStrength
           = 10000 * (1/100) * (1/100) * 100 = 100 rows
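
The exact derivation of CorStrength is not documented, but it appears to be the product of the individual NDVs divided by the NDV of the column group, which can be checked directly:

select count(distinct n1) ndv_n1,
       count(distinct n2) ndv_n2,
       count(distinct sys_op_combined_hash(n1, n2)) ndv_group
from t_vc;

With all three NDVs equal to 100 here, 100 * 100 / 100 = 100, matching the CorStregth value in the trace.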

Cardinality estimates match reality, at last.

In the next section, we will discuss this further ;-)

Well, okay, it is time to introduce myself. I specialize in Oracle performance tuning,
Oracle internals and E-Business Suite. I have over 15 years of experience as an Oracle DBA, and I am
currently employed with Pythian [thanks, Paul]. I am a regular presenter at major conferences such as
Hotsos, UKOUG and RMOUG. I am also an OakTable member.

Some of my papers and presentations can be found on my personal blog.

Posted in CBO | 2 Comments »

Hotsos 2008 is over

Posted by Riyaj Shamsudeen on March 8, 2008

I have been a regular presenter at the Hotsos performance-intensive conference, three years in a row. This year my presentation was about cost based subquery transformation. The optimizer has become far smarter in 11g: subqueries are transformed and costed using the physical optimizer. Enough about that.

As usual, I met many of my friends, Mark Bobak, Alex Gorbachev, Robyn Sands, Joze Sengachnik, Kyle Hailey, Jared Still, Wolfgang B, Carol Decko, Stephen Knecht, Christian Antognini and many others. We had a wonderful time on Monday at Bob’s Steak House and a good time on Wednesday in downtown Dallas.

Now, I can’t wait for next HOTSOS!

Posted in Uncategorized | 2 Comments »

 