Resolving HW enqueue contention

Recently, I had few email exchanges about HW enqueue contention in oracle-l list and offline. There were few interesting observations emerged from test cases created during that discussion.

HW enqueue

When a session needs access to a resource, it requests a lock on that resource in a specific mode. Internally, lock and resource structures are used to control access to a resource. Enqueues, as name suggests, have First In First Out queueing mechanism. You can find more information about internals of locks in my paper [printed in 2001 ]Internals of locks

Segments have High Water Mark (HWM) indicating that blocks below that HWM have been formatted. New tables or truncated tables [ that is truncated without reuse storage clause ], have HWM value set to segment header block. Meaning, there are zero blocks below HWM. As new rows inserted or existing rows updated (increasing row length), more blocks are added to the free lists and HWM bumped up to reflect these new blocks. HW enqueues are acquired in Exclusive mode before updating HWM and essentially HW enqueues operate as a serializing mechanism for HWM updates.

In non-ASSM tablespaces, HWM is bumped up by 5 blocks at a time ( Actually, undocumented parameter _bump_highwater_mark_count controls this behavior and defaults to 5). Heavy inserts in to a table can result in increased HWM activity leading to HW enqueue contention. This issue is prevalent if the table has LOB columns or if the row length is big.

Measuring HW enqueue contention

We will use few test cases to see how underlying extent size and table structures are affecting HW enqueue contention. But, before we need to find a way to measure total number of gets on HW enqueue. If total number of gets on HW enqueue is reduced, enqueue contention can be relieved.

Fixed table x$ksqst stores statistics about total number of enqueue gets, success and failures of those gets at instance level. For example, to see total number of gets on HW enqueue, following query can be used. Column ksqstreq indicates total # of gets and ksqstwat shows total # of waits.

SQL> select ksqstreq, ksqstwat from x$ksqst where ksqsttyp='HW';

  KSQSTREQ   KSQSTWAT
---------- ----------
    546953         50

From Oracle version 10g and above, x$ksqst is externalized as v$enqueue_statistics.

However, this statistics shows activity at instance level. While we can use this statistics to measure HW enqueue activity , we need to make sure that there is no other session acquiring HW enqueue. Event 10704 can be used to trace enqueues and every call to get an enqueue prints few lines in the trace file. SQL statement to dump this information to trace file is :

alter session set events ‘10704 trace name context forever, level 15′;

Event 10704 is documented as below:

10704, 00000, "Print out information about what enqueues are being obtained"
// *Cause:  When enabled, prints out arguments to calls to ksqcmi and
//          ksqlrl and the return values.
// *Action: Level indicates details:
//   Level: 1-4: print out basic info for ksqlrl, ksqcmi
//          5-9: also print out stuff in callbacks:  ksqlac, ksqlop
//          10+: also print out time for each line

Few lines from the trace files printed below. ksq is internal Oracle module names for enqueues and ksqgtl is to get locks on a resource. From the lines below, we can see that HW enqueue is acquired in mode 6, exclusive mode. Timestamp is also printed since we enabled this event at level 15. If we need count number of HW enqueue gets, we need to count occurrences of HW- string in the trace file.


*** 2008-05-04 10:08:35.734
ksqgtl *** HW-00000007-01800014 mode=6 flags=0x11 timeout=21474836 ***
ksqgtl: xcb=0x1E283158, ktcdix=2147483647, topxcb=0x1E283158
	ktcipt(topxcb)=0x0

*** 2008-05-04 10:08:35.734
ksucti: init session DID from txn DID:
ksqgtl:
	ksqlkdid: 0001-0014-00000016

*** 2008-05-04 10:08:35.734
*** ksudidTrace: ksqgtl
	ktcmydid(): 0001-0014-00000016
	ksusesdi:   0000-0000-00000000
	ksusetxn:   0001-0014-00000016
ksqgtl: RETURNS 0

*** 2008-05-04 10:08:35.750
ksqrcl: HW,7,1800014
ksqrcl: returns 0

Now, we can grep for HW- in the trace file, count it and match that against v$enqueue_statistics. Following test case illustrates this method using an example table below:


SQL> select ksqstreq, ksqstwat from x$ksqst where ksqsttyp='HW';

  KSQSTREQ   KSQSTWAT
---------- ----------
    546198         50

SQL> insert into test_hw
     select n, lpad(n, 4000,'a') v1 from
     (select level n from dual connect by level  SQL> SQL> commit;

Commit complete.

SQL> select ksqstreq, ksqstwat from x$ksqst where ksqsttyp='HW';

  KSQSTREQ   KSQSTWAT
---------- ----------
    546953         50

A difference of 755 (546953-54619 8) total gets to HW enqueue.

Searching for HW- enqueues in the trace file also prints 755.

/oracle/app/oracle/admin/TEST1/udump> grep ‘HW-’ test1_ora_26668.trc |wc -l
755

So, in my test database, my session is the only session and we could query v$enqueue_statistics for our test cases.

Test case

Following test case will be used to see the impact of extent size, segment management on HW enqueue gets. If total number of HW enqueue gets can be reduced, contention can be relieved.

Following script creates a tablespace, then creates a table with lob column in that tablespace, inserts 9999 rows in to that table and prints difference in total enqueue gets before and after insert statement.


-- Script enq_hw1.sql -----------
variable v_begin_cnt number
variable v_end_cnt number

prompt
prompt  Tablespace: Locally managed with \&1
prompt
drop tablespace TS_LMT_HW including contents and datafiles;

create tablespace TS_LMT_HW datafile 'D:\ORACLE\ORADATA\ORCL11G\TS_LMT_AUTO_1M_01.DBF'
 size 200M
 extent management local
\&1;

create table test_hw (n1 number ,  c1 clob )     tablespace TS_LMT_HW;

begin
  select total_req# into :v_begin_cnt from v$enqueue_statistics where 	eq_type ='HW';
end;
/
insert into test_hw
 select n, lpad(n, 4000,'a') v1 from
   (select level n from dual connect by level <10000);
commit;
select total_req#, succ_req#, failed_req# from v$enqueue_statistics where 	eq_type ='HW';

begin
  select total_req# into :v_end_cnt from v$enqueue_statistics where 	eq_type ='HW';
end;
/

select :v_end_cnt - :v_begin_cnt diff from dual;

--- script end enq_hw1.sql -----------

Above script is called by following script passing various tablespace attributes, essentially running same test for different types of tablespaces:

spool call_eng_hw1.lst
REM Test cases #1 through #6
@enq_hw1 "uniform size 5M segment space management manual"
@enq_hw1 "uniform size 5M segment space management auto"
@enq_hw1 "uniform size 40K segment space management manual"
@enq_hw1 "uniform size 40K segment space management auto"
@enq_hw1 "autoallocate segment space management manual"
@enq_hw1 "autoallocate segment space management auto"
spool off

Test Results

Tested above script for tablespace in versions 10.2.0.4 and 11.1.0.6. Here are the test results:

test Extent management Segment space management 10.2.0.4 11.1.0.6
#1 uniform size 5M manual 2049 64
#2 uniform size 5M auto 50 48
#3 uniform size 40K manual 6604 6045
#4 uniform size 40K auto 7552 7554
#5 autoallocate manual 2231 279
#6 autoallocate auto 269 291

There are few key points here.
1. In test cases #1 and #5, there is a dramatic decrease in enqueue gets between 10g and 11g for tablespace with segment space management set to manual. Looks like, a new feature faster lob is kicking in. Tested above script for a table without lob column. Virtually there is no difference between 10g and 11g, if the table has no lob column. row length is kept around 3000 bytes for this test.

  create table test_hw (n1 number ,  v1 varchar2(1000), v2 varchar2(1000), v3 varchar2(1000) )     tablespace TS_LMT_HW;
...
declare
   i number;
begin
  for i in 1 .. 10000
   loop
    insert into test_hw  values ( i, lpad(i, 1000, 'a'), lpad(i, 1000, 'a'), lpad(i, 1000, 'a')  );
	commit;
   end loop;
end;
/
...
test Extent management Segment space management 10.2.0.4 11.1.0.6
#1 uniform size 5M manual 1020 1019
#2 uniform size 5M auto 33 31
#3 uniform size 40K manual 3004 3004
#4 uniform size 40K auto 3055 3055
#5 autoallocate manual 1132 1132
#6 autoallocate auto 163 164

2. If tablespace uniform size is too low, then # of HW enqueue gets increases sharply. Compare test cases #1 & #3 for 11g. Enqueue gets decreased 10 times for tablespace with appropriate extent size.

3. In all test cases, automatic segment space management tablespaces performed fewer enqueue gets. In ASSM tablespaces HWM is bumped up by much higher number. I am not suggesting to use ASSM, since space wastage *can* increase in ASSM tablespaces. However, if resolving HW contention is primary issue, then consider ASSM.

4. Allocating additional extent with instance keyword seems to help in non-ASSM tablespace [ Behavior is different for ASSM tablespace and needs more research to understand that ]. In test case #1 above, HWM for the table was set at file_id 6, block_id 80 and that extent starting at file 6, block_id 9. That is, 70 blocks were below HWM initially. After allocating an extent with instance 1 keyword, HWM was bumped up by 1209 blocks.
[ Yong has a test case for this using dbms_space package: http://yong321.freeshell.org/oranotes/Dbms_Space.txt ]

From segment header block, before allocating extent, HWM: 0×01800050 (dba for file 6,block 80)
alter table test_hw allocate extent (instance 1);
From segment header block, after allocating extent, HWM: 0×01802f89 (dba for file 6,block 12169)

Extent Map
—————————————————————–
0×0180000a length: 639
0×01802d09 length: 640

5. High HW enqueue contention is prevalent during Oracle applications upgrade also. During one of our recent upgrade from 11.5.8 to 11.5.0, there was heavy HW enqueue contention on sys.source$ table due to many packages compiled/added during the upgrade. Only option was to increase _bump_highwater_mark_count during upgrade, to relieve excessive waits on HW enqueue contention. Of course, Oracle support must be contacted before adding any underscore parameters in production environment. Side effect of setting this parameter is that, for smaller tables, more blocks might be added below HWM leading to full table scan reading more blocks then necessary.

Add comment May 16, 2008

how to tune SQL statements scientifically

I wrote a blog for pythian group blog and this is a place holder entry for the pdf file.

You can find original blog entry at
how to tune sql statements scientifically

how to tune sql statements scientifically - pdf

Add comment April 15, 2008

Correlation between column predicates

Multi column correlation

We all have encountered this situation many times before: Cost based optimizer assumes no correlation between two columns (until 11g) and this has the effect of reducing cardinality of a row source erraneously. Incorrect cardinaltiy estimates
are one of the many root causes for SQL performance issues.

Consider following example.

Following code creates a table and populates data with four columns. First two columns n1 and n2
has

create table t_vc as
select mod(n, 100) n1, mod(n, 100) n2 ,
mod(n, 50) n3 , mod(n, 20) n4
from (select level n from dual connect by level <= 10001);

There is a strong correlation between n1 and n2 above. N1 is always equals to N2.
Let’s collect statistics with histograms on all columns.
begin
dbms_stats.gather_Table_stats( user, ‘T_VC’, estimate_percent => null, method_opt => ‘for all columns size 254′);
end;
/

Let’s explain plan for query specifying one predicate. Exactly 100 rows with a value of 10.

explain plan for select count(*) from t_vc where n1=10;


—————————————————————————
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
—————————————————————————
|   0 | SELECT STATEMENT   |      |     1 |     3 |     9   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE    |      |     1 |     3 |            |          |
|*  2 |   TABLE ACCESS FULL| T_VC |   100 |   300 |     9   (0)| 00:00:01 |
—————————————————————————

Yes, optimizer is able to estimate cardinality to be 100 from row source T_VC at step 2.

Now, let’s add another predicate n2=10. Since there is a strong correlation between n1 and n2,
adding a predicate such as n2=10 should not alter row source cardinality. Right?

explain plan for select count(*) from t_vc where n1=10 and n2=10;

select * from table(dbms_xplan.display);


—————————————————————————
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
—————————————————————————
|   0 | SELECT STATEMENT   |      |     1 |     6 |     9   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE    |      |     1 |     6 |            |          |
|*  2 |   TABLE ACCESS FULL| T_VC |     1 |     6 |     9   (0)| 00:00:01 |
————————————————————————–

Nope, Optimizer cardinality is far off and 100 times off the correct cardinality.

Notice rows column at step 2 above and it is just 1. Essentially, CBO estimate boils down to
simplified formula:
# of rows~= total # of rows * (1/NDV for n1) * (1/NDV for n2)
= 10000 * (1/100) * (1/100) =1 row.
where selectivity is (1/NDV) in simple case, without considering histograms.

Far from the truth! There are at least 100 rows returned for this SQL.

Extended stats

Oracle 11g introduces extended stats to relieve some pain. In 11g, an extended stats can be added between columns, enabling CBO to consider correlation between these column values.


SELECT dbms_stats.create_extended_stats(
ownname=>user, tabname => ‘T_VC’,
extension => ‘(n1, n2)’ ) AS n1_n2_correlation
FROM dual;
N1_n2_correlation
—————————————————
SYS_STUBZH0IHA7K$KEBJVXO5LOHAS

Let’s collect stats again on this table and check the SQL plan.
exec dbms_stats.gather_Table_stats( user, ‘T_VC’, estimate_percent => null, method_opt => ‘for all columns size 254′);

explain plan for select count(*) from t_vc where n1=10 and n2=10;


————————————————————————–
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time               |
————————————————————————–
| 0  | SELECT STATEMENT |        | 100 | 1200 | 9 (0)| 00:00:01          |
|* 1 | TABLE ACCESS FULL| T_VC   | 100 | 1200 | 9 (0)| 00:00:01
————————————————————————–

At last, optimizer truly understood the correlation in 11g.

Under the wrap

1. Adding an extended stats adds a new virtual column to the table. Here is the line from sqltrace.

alter table "CBQT"."T_VC" add (SYS_STUBZH0IHA7K$KEBJVXO5LOHAS
       as (sys_op_combined_hash(n1, n2)) virtual BY USER for statistics);

Virtual column name is cryptic and seems to have been derived from table_name, column name combinations.

2. A new deterministic hash function sys_op_combined_hash called by optimizer to populate
this virtual column values. This function returns unique value for unique combination of values passed.

col h1 format 99999999999999999999999999
select sys_op_combined_hash (1,1) h1 from dual;
H1
—————————-
7026129190895635777

select sys_op_combined_hash (1,2) h1 from dual;
H1
—————————-
298332787864732998

Collecting histograms on all columns collects histograms on this virtual column also.

Trace lines

Using the above histogram, CBO is able to find that there is a strong correlation between these two columns.

This is visible in the 10053 output.


SINGLE TABLE ACCESS PATH
  Single Table Cardinality Estimation for T_VC[T_VC]
  Column (#1):
    NewDensity:0.005000, OldDensity:0.000050 BktCnt:10001, PopBktCnt:10001, PopValCnt:100, NDV:100
  Column (#2):
    NewDensity:0.005000, OldDensity:0.000050 BktCnt:10001, PopBktCnt:10001, PopValCnt:100, NDV:100
  Column (#5):
    NewDensity:0.005000, OldDensity:0.000050 BktCnt:10001, PopBktCnt:10001, PopValCnt:100, NDV:100
  ColGroup (#1, VC) SYS_STUBZH0IHA7K$KEBJVXO5LOHAS
    Col#: 1 2     CorStregth: 100.00
  ColGroup Usage:: PredCnt: 2  Matches Full: #0  Partial:  Sel: 0.0100
  Table: T_VC  Alias: T_VC
    Card: Original: 10001.000000  Rounded: 100  Computed: 100.00  Non Adjusted: 100.00
  Access Path: TableScan
    Cost:  9.11  Resp: 9.11  Degree: 0
      Cost_io: 9.00  Cost_cpu: 2404620
      Resp_io: 9.00  Resp_cpu: 2404620
  Best:: AccessPath: TableScan
         Cost: 9.11  Degree: 1  Resp: 9.11  Card: 100.00  Bytes: 0

Notice the colgroup line and corStrength field above. It is set to 100. corStrength is calcualted
using histograms for virtual column and Final cardinality estimates are multipled by corStrength.

# of rows~= total # of rows * (1/NDV for n1) * (1/NDV for n2)*corStrength
= 10000 * (1/100) * (1/100)*100 =100 rows.

Cardinality estimates are matching with reality, at last.

In the next section, we will discuss this further ;-)

Well, okay, it is time to introduce myself. I specialize in Oracle performance tuning,
oracle internals and E-business suite. I have over 15 years of experience as an Oracle DBA, currently,
I am employed with pythian [ Thanks Paul]. I am regular presentor in major conferences such as
hotsos, ukoug, rmoug etc. I am also an Oaktable member.

Some of my papers can be found in my personal blog
my papers and presentations

Add comment March 21, 2008

Hotsos 2008 is over

I have been a regular presenter in hotsos performance intensive conference, three years in a row. This year my presentation was about cost based subquery transformation. Optimizer has become far smarter in 11g, and subqueries are transformed and cost calculated using physical optimizer. Enough about that.

As usual, I met many of my friends, Mark Bobak, Alex Gorbachev, Robyn Sands, Joze Sengachnik, Kyle Hailey, Jared Still, Wolfgang B, Carol Decko, Stephen Knecht, Christian Antognini and so on. We had a wonderful time on Monday in Bob’s steak house, good time on Wednesday in downtown dallas.

Now, I can’t wait for next HOTSOS!

Add comment March 8, 2008

Transforming from list to a set of rows

A developer asked me how to transform a column with a list of values to a set of rows ?
Column value is of this format:
1, ‘100,101,102,103,104,105′
Requirement was to convert above value to the format below:
1, 100
1, 101
1, 102

Solution

:

It is easy to do this using a pipelined function. Rows returned from a function can be used as a row source, as if it is a table row source. Metalink document 136909.1 describes this pipelined functions excellently.

Here is the test case, for the solution. Some part of code is borrowed from Tom Kyte’s original asktom posting.

– Dropping, recreate table tlist and populate with two rows.
drop table tlist;
create table tlist (c1 number, c2 varchar2(30) );
insert into tlist values (1,’100,101,102,103,104,105′);
insert into tlist values (2,’200,201,202,203,204,205′);
commit;

– Pipelined functions returns collections. Declaring a type TestListObj.
– TestList is an array of TestListObj
drop type TestList;
drop type TestListObj;
create or replace type TestListObj as object
(c1 number, c2 number);
create or replace type TestList as table of TestListObj;

— Creating a pipelined function.
REM This function :
REM for each row from tlist table
REM Read column values c1 and c2.
REM c2 has list of values separated by comma
REM For each value from the list
REM Construct TestListObj and store that as a member of TestList table object
REM Return TestList

create or replace function in_list_Test
return
TestList
as

l_string
long;
l_data
TestList := TestList();
n number;
begin
for l_csr in (select c1,c2 from tlist )
loop
l_string := l_csr.c2 || ‘,’;
loop
exit when l_string is null;
n := instr( l_string, ‘,’ );
l_data.extend;
l_data(l_data.count) := TestListObj( l_csr.c1, ltrim( rtrim( substr( l_string, 1, n-1 ) ) ) );
l_string := substr( l_string, n+1 );
end loop;
end loop;
return l_data;
end;

pause
show errors
select * from table (cast (in_list_Test as TestList))
/

C1 C2
1 100
1 101
1 102
….
Hope, you can make use of this.

Add comment January 11, 2007

Log group switch sequence

Q: How does Oracle determines which log group to use next,
at the onset of log switch ?
Q: How does adding a log group alters the sequence ?

Log group to switch, is determined from controlfile.
Let’s review the controlfile for pertinent information
Dumping the control file using following commands reveals more information:

oradebug setmypid
oradebug dump controlf 12

Above commands generated a trace file in
user_dump_dest directory and following is the excerpt from the file.
Only few lines shown to improve readability.

**********************************************************
LOG FILE RECORDS
**********************************************************

(blkno = 0×9, size = 72, max = 16, in-use = 8, last-recid= 15)
LOG FILE #1:
siz: 0xc8000 seq: 0×0000022b hws: 0×2 bsz: 512 nab: 0×9 flg: 0×1 dup: 1
LOG FILE #2:
siz: 0xc8000 seq: 0×0000022e hws: 0×2 bsz: 512 nab: 0×39 flg: 0×1 dup: 1
LOG FILE #3:
siz: 0xc8000 seq: 0×00000230 hws: 0×1 bsz: 512 nab: 0xffffffff flg: 0×8 dup: 1
LOG FILE #4:
siz: 0×5000 seq: 0×00000228 hws: 0×2 bsz: 512 nab: 0xc flg: 0×1 dup: 1
LOG FILE #5:
siz: 0×5000 seq: 0×0000022a hws: 0×2 bsz: 512 nab: 0×4 flg: 0×1 dup: 1
LOG FILE #6:
siz: 0×5000 seq: 0×0000022c hws: 0×2 bsz: 512 nab: 0×98 flg: 0×1 dup: 1
LOG FILE #7:
siz: 0×5000 seq: 0×0000022f hws: 0×2 bsz: 512 nab: 0xe9 flg: 0×1 dup: 1
LOG FILE #8:
siz: 0×5000 seq: 0×0000022d hws: 0×2 bsz: 512 nab: 0×5 flg: 0×1 dup: 1

Sequence# above is log sequence # in hexadecimal.

Oracle determines the next log group to switch:
i) searching for the log group with lowest sequence# that is archived.
ii) If a new log group is added, then that new log group will have log sequence# as zero and so that group will be selected
iii) If there are two new log groups, then first one encountered in the above array is selected.

Third point is quite important, since if we drop a log group, then that log group is simply marked as deleted, not completely removed from the array. So, if the same log group is readded, then deleted entry is reused.

Luckily, above controlfile array is externalized as x$kccle table. Instead of dumping the controlfile, we could use x$kccle to test our understanding:

Following SQL, can predict which log group will be selected by Oracle, during next log switch:

SELECT indx, lenum group#, leseq sequence#, archived,
DECODE (leflg, 8, ‘CURRENT’) curstatus,
CASE WHEN leseq = minseq AND indx = minind THEN ‘NEXT’
END NEXT
FROM (SELECT indx, lenum, leseq, leflg,
DECODE (BITAND (leflg, 1), 0, ‘NO’, ‘YES’) archived,
FIRST_VALUE (leseq) OVER (ORDER BY leseq, indx,BITAND (leflg, 1) DESC) minseq,
FIRST_VALUE (indx) OVER (ORDER BY leseq, indx, BITAND (leflg, 1) DESC) minind
FROM x$kccle
WHERE lesiz != 0 AND inst_id = USERENV (’instance’))
ORDER BY indx

For RAC:

SELECT lethr, indx, lenum group#, leseq sequence#, archived,
DECODE (leflg, 8, ‘CURRENT’) curstatus,
CASE WHEN leseq = minseq AND indx = minind THEN ‘NEXT’
END NEXT
FROM (SELECT lethr, indx, lenum, leseq, leflg,
DECODE (BITAND (leflg, 1), 0, ‘NO’, ‘YES’) archived,
FIRST_VALUE (leseq) OVER (partition by lethr ORDER BY leseq, indx,BITAND (leflg, 1) DESC) minseq,
FIRST_VALUE (indx) OVER ( partition by lethr ORDER BY leseq, indx, BITAND (leflg, 1) DESC) minind
FROM x$kccle
WHERE lesiz != 0 AND inst_id = USERENV (’instance’))
ORDER BY indx

See Oracle log sequence for complete test case.

Add comment January 11, 2007

At last, my blog is up.

This has been one of my long pending wishes to have a blog and update it regularly.

I am an Oracle Database Administrator and I specialize in Oracle database internals and performance tuning. My emphasis currently in Real Application Clusters and E-Business Suite.

Hopefully, I will keep this blog updated and love to hear your comments.

I have also written couple of articles, presented few and will upload them here soon.

3 comments January 9, 2007


Categories

Links

Feeds