Oracle database internals by Riyaj

Discussions about Oracle performance tuning, RAC, Oracle internal & E-business suite.

Resolving HW enqueue contention

Posted by Riyaj Shamsudeen on May 16, 2008

Recently, I had few email exchanges about HW enqueue contention in oracle-l list and offline. There were few interesting observations emerged from test cases created during that discussion.

HW enqueue

When a session needs access to a resource, it requests a lock on that resource in a specific mode. Internally, lock and resource structures are used to control access to a resource. Enqueues, as name suggests, have First In First Out queueing mechanism. You can find more information about internals of locks in my paper [printed in 2001 ]Internals of locks

Segments have High Water Mark (HWM) indicating that blocks below that HWM have been formatted. New tables or truncated tables [ that is truncated without reuse storage clause ], have HWM value set to segment header block. Meaning, there are zero blocks below HWM. As new rows inserted or existing rows updated (increasing row length), more blocks are added to the free lists and HWM bumped up to reflect these new blocks. HW enqueues are acquired in Exclusive mode before updating HWM and essentially HW enqueues operate as a serializing mechanism for HWM updates.

In non-ASSM tablespaces, HWM is bumped up by 5 blocks at a time ( Actually, undocumented parameter _bump_highwater_mark_count controls this behavior and defaults to 5). Heavy inserts in to a table can result in increased HWM activity leading to HW enqueue contention. This issue is prevalent if the table has LOB columns or if the row length is big.

Measuring HW enqueue contention

We will use few test cases to see how underlying extent size and table structures are affecting HW enqueue contention. But, before we need to find a way to measure total number of gets on HW enqueue. If total number of gets on HW enqueue is reduced, enqueue contention can be relieved.

Fixed table x$ksqst stores statistics about total number of enqueue gets, success and failures of those gets at instance level. For example, to see total number of gets on HW enqueue, following query can be used. Column ksqstreq indicates total # of gets and ksqstwat shows total # of waits.

SQL> select ksqstreq, ksqstwat from x$ksqst where ksqsttyp='HW';

  KSQSTREQ   KSQSTWAT
---------- ----------
    546953         50

From Oracle version 10g and above, x$ksqst is externalized as v$enqueue_statistics.

However, this statistics shows activity at instance level. While we can use this statistics to measure HW enqueue activity , we need to make sure that there is no other session acquiring HW enqueue. Event 10704 can be used to trace enqueues and every call to get an enqueue prints few lines in the trace file. SQL statement to dump this information to trace file is :

alter session set events ‘10704 trace name context forever, level 15′;

Event 10704 is documented as below:

10704, 00000, "Print out information about what enqueues are being obtained"
// *Cause:  When enabled, prints out arguments to calls to ksqcmi and
//          ksqlrl and the return values.
// *Action: Level indicates details:
//   Level: 1-4: print out basic info for ksqlrl, ksqcmi
//          5-9: also print out stuff in callbacks:  ksqlac, ksqlop
//          10+: also print out time for each line

Few lines from the trace files printed below. ksq is internal Oracle module names for enqueues and ksqgtl is to get locks on a resource. From the lines below, we can see that HW enqueue is acquired in mode 6, exclusive mode. Timestamp is also printed since we enabled this event at level 15. If we need count number of HW enqueue gets, we need to count occurrences of HW- string in the trace file.


*** 2008-05-04 10:08:35.734
ksqgtl *** HW-00000007-01800014 mode=6 flags=0x11 timeout=21474836 ***
ksqgtl: xcb=0x1E283158, ktcdix=2147483647, topxcb=0x1E283158
	ktcipt(topxcb)=0x0

*** 2008-05-04 10:08:35.734
ksucti: init session DID from txn DID:
ksqgtl:
	ksqlkdid: 0001-0014-00000016

*** 2008-05-04 10:08:35.734
*** ksudidTrace: ksqgtl
	ktcmydid(): 0001-0014-00000016
	ksusesdi:   0000-0000-00000000
	ksusetxn:   0001-0014-00000016
ksqgtl: RETURNS 0

*** 2008-05-04 10:08:35.750
ksqrcl: HW,7,1800014
ksqrcl: returns 0

Now, we can grep for HW- in the trace file, count it and match that against v$enqueue_statistics. Following test case illustrates this method using an example table below:


SQL> select ksqstreq, ksqstwat from x$ksqst where ksqsttyp='HW';

  KSQSTREQ   KSQSTWAT
---------- ----------
    546198         50

SQL> insert into test_hw
     select n, lpad(n, 4000,'a') v1 from
     (select level n from dual connect by level  SQL> SQL> commit;

Commit complete.

SQL> select ksqstreq, ksqstwat from x$ksqst where ksqsttyp='HW';

  KSQSTREQ   KSQSTWAT
---------- ----------
    546953         50

A difference of 755 (546953-546198) total gets to HW enqueue.

Searching for HW- enqueues in the trace file also prints 755.

/oracle/app/oracle/admin/TEST1/udump> grep ‘HW-‘ test1_ora_26668.trc |wc -l
755

So, in my test database, my session is the only session and we could query v$enqueue_statistics for our test cases.

Test case

Following test case will be used to see the impact of extent size, segment management on HW enqueue gets. If total number of HW enqueue gets can be reduced, contention can be relieved.

Following script creates a tablespace, then creates a table with lob column in that tablespace, inserts 9999 rows in to that table and prints difference in total enqueue gets before and after insert statement.


-- Script enq_hw1.sql -----------
variable v_begin_cnt number
variable v_end_cnt number

prompt
prompt  Tablespace: Locally managed with \&1
prompt
drop tablespace TS_LMT_HW including contents and datafiles;

create tablespace TS_LMT_HW datafile 'D:\ORACLE\ORADATA\ORCL11G\TS_LMT_AUTO_1M_01.DBF'
 size 200M
 extent management local
\&1;

create table test_hw (n1 number ,  c1 clob )     tablespace TS_LMT_HW;

begin
  select total_req# into :v_begin_cnt from v$enqueue_statistics where 	eq_type ='HW';
end;
/
insert into test_hw
 select n, lpad(n, 4000,'a') v1 from
   (select level n from dual connect by level <10000);
commit;
select total_req#, succ_req#, failed_req# from v$enqueue_statistics where 	eq_type ='HW';

begin
  select total_req# into :v_end_cnt from v$enqueue_statistics where 	eq_type ='HW';
end;
/

select :v_end_cnt - :v_begin_cnt diff from dual;

--- script end enq_hw1.sql -----------

Above script is called by following script passing various tablespace attributes, essentially running same test for different types of tablespaces:

spool call_eng_hw1.lst
REM Test cases #1 through #6
@enq_hw1 "uniform size 5M segment space management manual"
@enq_hw1 "uniform size 5M segment space management auto"
@enq_hw1 "uniform size 40K segment space management manual"
@enq_hw1 "uniform size 40K segment space management auto"
@enq_hw1 "autoallocate segment space management manual"
@enq_hw1 "autoallocate segment space management auto"
spool off

Test Results

Tested above script for tablespace in versions 10.2.0.4 and 11.1.0.6. Here are the test results:

test Extent management Segment space management 10.2.0.4 11.1.0.6
#1 uniform size 5M manual 2049 64
#2 uniform size 5M auto 50 48
#3 uniform size 40K manual 6604 6045
#4 uniform size 40K auto 7552 7554
#5 autoallocate manual 2231 279
#6 autoallocate auto 269 291

There are few key points here.
1. In test cases #1 and #5, there is a dramatic decrease in enqueue gets between 10g and 11g for tablespace with segment space management set to manual. Looks like, a new feature faster lob is kicking in. Tested above script for a table without lob column. Virtually there is no difference between 10g and 11g, if the table has no lob column. row length is kept around 3000 bytes for this test.

  create table test_hw (n1 number ,  v1 varchar2(1000), v2 varchar2(1000), v3 varchar2(1000) )     tablespace TS_LMT_HW;
...
declare
   i number;
begin
  for i in 1 .. 10000
   loop
    insert into test_hw  values ( i, lpad(i, 1000, 'a'), lpad(i, 1000, 'a'), lpad(i, 1000, 'a')  );
	commit;
   end loop;
end;
/
...
test Extent management Segment space management 10.2.0.4 11.1.0.6
#1 uniform size 5M manual 1020 1019
#2 uniform size 5M auto 33 31
#3 uniform size 40K manual 3004 3004
#4 uniform size 40K auto 3055 3055
#5 autoallocate manual 1132 1132
#6 autoallocate auto 163 164

2. If tablespace uniform size is too low, then # of HW enqueue gets increases sharply. Compare test cases #1 & #3 for 11g. Enqueue gets decreased 10 times for tablespace with appropriate extent size.

3. In all test cases, automatic segment space management tablespaces performed fewer enqueue gets. In ASSM tablespaces HWM is bumped up by much higher number. I am not suggesting to use ASSM, since space wastage *can* increase in ASSM tablespaces. However, if resolving HW contention is primary issue, then consider ASSM.

4. Allocating additional extent with instance keyword seems to help in non-ASSM tablespace [ Behavior is different for ASSM tablespace and needs more research to understand that ]. In test case #1 above, HWM for the table was set at file_id 6, block_id 80 and that extent starting at file 6, block_id 9. That is, 70 blocks were below HWM initially. After allocating an extent with instance 1 keyword, HWM was bumped up by 1209 blocks.
[ Yong has a test case for this using dbms_space package: http://yong321.freeshell.org/oranotes/Dbms_Space.txt ]

From segment header block, before allocating extent, HWM: 0x01800050 (dba for file 6,block 80)
alter table test_hw allocate extent (instance 1);
From segment header block, after allocating extent, HWM: 0x01802f89 (dba for file 6,block 12169)

Extent Map
—————————————————————–
0x0180000a length: 639
0x01802d09 length: 640

5. High HW enqueue contention is prevalent during Oracle applications upgrade also. During one of our recent upgrade from 11.5.8 to 11.5.0, there was heavy HW enqueue contention on sys.source$ table due to many packages compiled/added during the upgrade. Only option was to increase _bump_highwater_mark_count during upgrade, to relieve excessive waits on HW enqueue contention. Of course, Oracle support must be contacted before adding any underscore parameters in production environment. Side effect of setting this parameter is that, for smaller tables, more blocks might be added below HWM leading to full table scan reading more blocks then necessary.

12 Responses to “Resolving HW enqueue contention”

  1. Besides growing Segments, there is a much more problematic ASSM reclaim Problem with Blobs and HW-enqueue. This is supposed to be fixed in 10.2.0.4, but only(?) if you set the Event 44951.

    This will batch-reclaim inactive (undo) LOB chunks and reduce the number of enqueues needed for tht operation. Do you have experiences with that, as well? This typically happens on LOB segments with concurrent insert/delete/update activity.

  2. Hi Bernd
    I guess, you are referring to bug 6376915. No, I have not personally encountered it. Looks very intrusive though. Posting a summary here for others:

    The bug states that in effect space reclamation can cause HW enqueu’s – and what we are seeing is that if we truncate the table – we don’t see the HW:enqueue’s until we fill up the pre-allocated extents – then the HW enqueue’s start en masse.

    Fix:

    This fix causes ASSM LOB space to batch up reclaim instead of just reclaiming the requested/required number of LOB chunks. To enable this fix, set event 44951 to the maximum number of chunks that you would like to have reclaimed per attempt. The maximum allowed is 1024. Anything larger becomes 1024. However, if the requested amount of space is larger than the event’s value, the reclaim will be for the requested amount of space.

    Thanks for pointing out this bug, Bernd

    Cheers
    Riyaj

    • I have the similar issue in my zOracle 10.2.0.5 on Windows. In my top waits AWR, I see enq: HW – contention waits but when i trace out my session I see I/O slaves (something related to 1300021.1 which I think I can ignore). But in trace file I see many HW numbers.
      Does the above mentioned Bug applied on 10.2.0.5? Should I set this event for 10.2.0.5?

  3. Kostas Hairopoulos said

    I noticed in our 10.2.0.3 database with ASSM in AIX 5.3 box that some staging table (with insert and delete operations) are using a huge number of extents and is continuously growing. It seems that for some reason even if there are deletions, oracle cannot reclaim back the space and lower the HHWM. Is any way or any event we can set at instance level to trace the time of the extent is added in the staging table.
    I noticed some notes from Steve Adams email that this can be shown dumping the redo log with layer 14.4 but the problem is that we don’t when it happens, thus we cannot check every log

    Thank you in advance,
    khair

  4. Hello Khair
    I don’t think, there is an event to print when an extent is added, at least, not that aware of. My recommendation is to write a small piece of code to get timestamp and # of extents every hour or so for that tablespace/object. With that output, you can pin down the hour at which extent is added and dump the log file during that time period.

    Correct approach here is to understand root cause by dumping the log files. This might also mean that you may have to dump the ASSM segments with dbms_Space_admin.segment_dump to see block status of various blocks. With these two trace files, you may be able to understand, why extents are allocated even when not needed.

    There are many bugs in 10.2.0.3 in ASSM for extent management. You may be hitting one of that. So, you might want to consider upgrading to version 10.2.0.4 and see if this problem reproduces.

    Cheers
    Riyaj

  5. Md. Zafar Ahsan said

    Hello

    It is very pleasing to see your practical approach and demonstration for various issues.
    Here may be i am just deviating some from the article but it is all related to the same. I tried to do the test and the result was very satisfactory and was in accordance to your observation. As we have to face different types of client in many cases they even donot agree with these statistics i mean HW resuest .so itried to aggrevate it by using 40K extent size and inserting data through 3 different sessions in test environment so that it could comeup with the HW enqueue wait event but unfortunately it didnot show any HW wait event . Could you please provide the test scenerio in which we can also create this wait event and then we can check if the wait event has gone by increasing the extent size. I have used manual allocate enough space to avoid this wait event( using alter table modify lob… allocate extent command).
    other observation is:
    I am finding lots of “direct path write” on the datafile “TS_LMT_AUTO_1M_01.DBF ” contention for each session that is doing the insert .
    So could you please provide some help to reduce this “direct path write” on data file.
    I have disk_asynch_io = TRUE. As this wait is not on temp file i hope it has nothing to take with pga and sort area and worksize_area_policy even then if it is required i have the following setting for them:

    pga_aggregate_target =200000000

    workarea_size_policy =AUTO

    increasing the pga didnot help as it is coming on datafile?

    Your early suggestions and solutions for both the points are very much appriciated.
    Regards,
    Md. zafar Ahsan Oracle DBA

    • Zafar

      Thanks for reading my blog and your kind words.

      For HW enqueue test, what version of software and what tablespace management type (ASSM or non-ASSM) are you using? You might need to increase concurrency, but keep the transaction smaller, typically, few blocks. That might trigger HW enqueue contention.

      If your SQL inserts are suffering from direct path write wait events, then are you using conventional insert or direct mode insert? Do you have lob columns in that table? What software version please. Better thing is to understand why writes are slower? What kind of file system? Instead of setting disk_asynch_io parameter, you need to have filesystem_options to setall, Oracle RDBMS code will use both asynchronous I/O and direct I/O if the file system supports it.

      Cheers
      Riyaj

  6. Md. Zafar Ahsan said

    Hi Riyaj,

    Sorry for missing some important parameters . Ok it is windows XP 32 bits, filesystem- ntfs, the database version is 10.1.0.2
    Insert operation done through 3 simultaneous session. Yes it has clob as in your script.In fact i am using the same scripts, but even having 3 sessions running in parallel for insert opertaion still i am not able to produce the Hw-enq wait. Statistics are fine but I need to be inline with wait and produce HW-enq wait and then after increasing the segment size to 5m check how the waits are reduced with larger extent size.
    Test case useed:
    @enq_hw1 “uniform size 40K segment space management manual”
    @enq_hw1 “uniform size 40K segment space management auto”.

    So if possible please let me know how to go for it what changes to make to get the above goal? We canot reduce the extent size further as at least 5 block size(8k) is required while creating the tablespace. Other details i have provided in previous post if anyother information is required then please let me know.
    Regarding filesystem_options to setall, . i set this but still didnot find any change for “direct path write” wait event on TS_LMT_AUTO_1M_01.DBF’ file .

    Regards,
    Zafar

  7. I am having a problem with lock , the developers are saying the application logic is correct , i have an oracle trace level 12, and the processes have 2 tx enqueue:
    enq: TX – row lock contention 32 2.93 28.47
    enq: TX – index contention 14 0.05 0.18

    the wait process is in a update ,also there is a important table , always inserting ( millions of rows a day ) , with a CLOB column.

    Is there a way to check if the problem is for this bug? looking the oracle trace.

    thanks.

    • Hello Marcel

      You haven’t provided much information, what version of RDBMS?

      Mode of the request and hold is very important to diagnose the locking performance issue.

      Can you send the output of the script below when you have locking contention? If you don’t want to post here, you can send to rshamsud@orainternals.com.

      col inst format A10
      col module format A30
      col lmode format 999
      col request format 999
      SELECT DECODE(request,0,’Holder: ‘,’ Waiter: ‘)||a.sid sess,
      decode(request,0,to_char(a.inst_id),’ ‘||to_char(a.inst_id)) inst,
      id1, id2, lmode, request, a.type, a.sid,
      b.sql_hash_value hash,module,round(last_call_et/60,0) min
      FROM gV$LOCK a, gv$session b
      WHERE (id1, id2, a.type) IN
      (SELECT id1, id2, type FROM gV$LOCK WHERE request>0)
      and a.sid = b.sid
      and a.inst_id = b.inst_id
      ORDER BY id1, request
      ;

  8. saurabh mishra said

    Hi,

    I have a typical after migration issue, we had i/o issues visible in prod after migration to 11gr2. Oracle recommended to set the parameter “_partition_large_extents to false” and after doing that we are observing lots of HW contentions and buffer busy waits in database, mostly the truncate is getting slowed down.

    Please suggest.

    Thank
    Saurabh

  9. […] http://orainternals.wordpress.com/2008/05/16/resolving-hw-enqueue-contention/ […]

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

 
Follow

Get every new post delivered to your Inbox.

Join 219 other followers

%d bloggers like this: