Tuning latch contention: Cache buffers chain latches
Posted by Riyaj Shamsudeen on July 30, 2008
Recently, I had an opportunity to tune latch contention for cache buffers chains (CBC) latches. The problem statement: high CPU usage combined with poor application performance. A quick review of a 15-minute statspack report showed 'latch free' as the top wait event, consuming approximately 3,600 seconds on an 8-CPU server. CPU usage was also quite high, a typical symptom of latch contention due to the spinning involved. v$session_wait showed that hundreds of sessions were waiting for the 'latch free' event.
SQL> @waits10g

   SID    PID EVENT                        P1_P2_P3_TEXT
------ ------ ---------------------------- ----------------------------------------
   294  17189 latch free                   address 15873156640-number 127-tries 0
   628  17187 latch free                   address 15873156640-number 127-tries 0
  ....
   343  17191 latch free                   address 15873156640-number 127-tries 0
   599  17199 latch: cache buffers chains  address 17748373096-number 122-tries 0
   337  17214 latch: cache buffers chains  address 17748373096-number 122-tries 0
  .....
   695  17228 latch: cache buffers chains  address 17748373096-number 122-tries 0
  ....
   276  15153 latch: cache buffers chains  address 19878655176-number 122-tries 1
We will use a two-pronged approach to find the root cause scientifically. First, we will find the SQL suffering from latch contention and the objects in that SQL's access plan. Next, we will find the buffers involved in latch contention and map them back to objects. Finally, we will match the results of these two techniques to pinpoint the root cause.
Before we go any further, let's do a quick summary of the internals of latch operations.
Brief Introduction to CBC latches and not-so-brief reason why this is a complicated topic to discuss briefly
Latches are internal memory structures that coordinate access to shared resources. Locks, aka enqueues, are different from latches. The key difference is that enqueues, as the name suggests, provide a FIFO queueing mechanism, while latches do not. On the other hand, latches are held very briefly, whereas locks are usually held longer.
In the Oracle SGA, the buffer cache is the memory area that data blocks are read into. [If ASMM (Automatic Shared Memory Management) is in use, then part of the shared pool can be tagged as KGH: NO ALLOC and remapped to the buffer cache area too.]
Each buffer in the buffer cache has an associated element in the buffer header array, externalized as x$bh. Buffer headers keep track of various attributes and the state of buffers in the buffer cache. This buffer header array is allocated in the shared pool. Buffer headers are chained together in a doubly linked list and linked to a hash bucket. There are many hash buckets (the number of buckets is derived and governed by the _db_block_hash_buckets parameter). Access to these hash chains (both inspecting and changing them) is protected by cache buffers chains latches.
Further, buffer headers can be linked to and delinked from hash buckets dynamically.
A simplified algorithm to access a buffer follows. (I have deliberately cut it down so as not to deviate too much from our primary discussion.)
1. Hash the data block address (DBA: combination of tablespace, file_id and block_id) to find the hash bucket.
2. Get the latch protecting that hash bucket.
3. If successful, walk the hash chain, reading buffer headers to see if a specific version of the block is already on the chain.
4. If not successful, spin for spin_count times and go to step 2.
5. If the latch still was not acquired by spinning, sleep (with exponentially increasing back-off sleep times) and go to step 2.
If the block is found, access the buffer in the buffer cache, protected by buffer pin/unpin actions.
If it is not found, then find a free buffer in the buffer cache, unlink that buffer's header from its current chain, link the header to this hash chain, release the latch, and read the block into that free buffer with the buffer header pinned.
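The latch-get loop in steps 2-5 above can be sketched conceptually. This is only an illustrative model, not Oracle's actual implementation: the spin count, sleep times, and hash function are all made-up values, and `try_get` stands in for an immediate-mode latch-get attempt.

```python
import time

SPIN_COUNT = 2000      # illustrative; not Oracle's actual spin handling
N_BUCKETS = 1024       # illustrative stand-in for _db_block_hash_buckets

def hash_bucket(file_id, block_id, n_buckets=N_BUCKETS):
    """Step 1: hash the data block address (file#, block#) to a hash bucket."""
    return (file_id * 4194304 + block_id) % n_buckets   # toy hash, not Oracle's

def get_latch(try_get, max_sleeps=10):
    """Steps 2-5: immediate get, spin, then sleep with exponential back-off.
    try_get() models a single immediate-mode latch-get attempt."""
    sleep_time = 0.001
    for _ in range(max_sleeps):
        for _ in range(SPIN_COUNT):   # spinning burns CPU -- hence the high
            if try_get():             # CPU usage seen during latch contention
                return True
        time.sleep(sleep_time)        # could not get the latch: sleep ...
        sleep_time *= 2               # ... with exponential back-off
    return False

# Demo: a latch that becomes free on the 6th attempt is acquired while
# still spinning, before any sleep is needed.
attempts = {'count': 0}
def try_get():
    attempts['count'] += 1
    return attempts['count'] >= 6

print(get_latch(try_get))   # True
```

The spin loop is the key point: under contention, many processes spin concurrently waiting for the same latch, which is why CBC latch contention shows up as high CPU usage rather than idle waiting.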
Obviously, latches play a crucial role in controlling access to critical resources such as the hash chains. My point is that repeated access to a few buffers can increase latch activity.
There are many CBC latch children. The parameter _db_block_hash_latches controls the number of latches and is derived from the buffer cache size. Further, in Oracle 10g, sharable latches are used, and inspecting a hash chain requires acquiring the latch in shared mode, which is compatible with other shared-mode operations. Note that the default values of these undocumented parameters are usually sufficient, and any changes to them must be approved by Oracle Support.
Back to our problem…
Let's revisit the problem at hand. The wait output printed above shows that this latch contention involves two types of latches. Latch #127 is the 'simulator lru' latch and #122 is the 'cache buffers chains' latch.
select latch#, name from v$latch where latch# in (127, 122);
The problem with the 'simulator lru' latch is simple: there is a bug (5918642) with db_cache_advice. If db_cache_advice is set to ON, latch contention on simulator lru latches can be observed for large buffer caches. This issue was fixed quickly by setting db_cache_advice to OFF.
After resolving the 'simulator lru' latch contention, we had some relief in performance, but not much.
Next, we query v$session to see which SQL(s) are causing the latch contention. The state column below indicates that these processes are not currently waiting for latches, but waited in the recent past. 24 sessions are executing the same SQL statement, their last wait was for the 'latch free' event, and yes, these are active sessions. If the latch contention is prevalent, querying v$session as below will provide the SQL statements to focus on.
select event, sql_hash_value, state, count(*)
  from v$session w
 where event = 'latch free' and status = 'ACTIVE'
 group by sql_hash_value, state, event
SQL> /

EVENT         SQL_HASH_VALUE STATE                 COUNT(*)
------------- -------------- ------------------- ----------
latch free        3629331128 WAITED KNOWN TIME           24
latch free         673277007 WAITED KNOWN TIME            1
latch free        1378683334 WAITED SHORT TIME            1
latch free        3629331128 WAITED SHORT TIME            5
latch free        2920275581 WAITED SHORT TIME            3

We can find the SQL statement by querying v$sql_text with the hash value 3629331128 above; the SQL suffering from latch contention is printed below. Of course, care has been taken to change the actual object names for security reasons.

select * from v1
 WHERE ( col1 IN (3, 20, 21, 44, 45, 47, 48, 49, 50, 51, 57, 58, 59, 67, 68, 69,
                  76, 78, 79, 80, 81, 82, 84, 85, 106, 450, 451, 452, 453, 454,
                  455, 456, 457, 458, 459, 460, 461, 462, 463, 464, 465, 466,
                  467, 468, 469, 470, 471, 472, 473, 474, 476, 478, 500, 501, 502)
      OR col2 IN (3, 20, 21, 44, 45, 47, 48, 49, 50, 51, 57, 58, 59, 67, 68, 69,
                  76, 78, 79, 80, 81, 82, 84, 85, 106, 450, 451, 452, 453, 454,
                  455, 456, 457, 458, 459, 460, 461, 462, 463, 464, 465, 466,
                  467, 468, 469, 470, 471, 472, 473, 474, 476, 478, 500, 501, 502))
   AND UPPER(col3) LIKE :1
   and rownum < 200
The explain plan for the above shows that multiple tables are accessed in this view. But at this point, we don't know which step in the plan is causing the latch contention. If you had to guess, which of the following tables would you pick as the culprit? Check your guess against the correct answer later (and become a BAAG member immediately).
[ Few columns removed from the plan output to improve readability ]
---------------------------------------------------------------------
| Id  | Operation                        | Name     | Rows  | Cost |
---------------------------------------------------------------------
|   0 | SELECT STATEMENT                 |          | 17778 |  223 |
|*  1 |  COUNT STOPKEY                   |          |       |      |
|   2 |   CONCATENATION                  |          |       |      |
|*  3 |    FILTER                        |          |       |      |
|*  4 |     HASH JOIN                    |          | 17777 |  216 |
|*  5 |      TABLE ACCESS FULL           | ORG      |     4 |    2 |
|   6 |      NESTED LOOPS                |          |   197 |  213 |
|   7 |       NESTED LOOPS               |          |   195 |   18 |
|   8 |        TABLE ACCESS FULL         | ORG      |     1 |    2 |
|*  9 |        TABLE ACCESS FULL         | ORDER    |   195 |   16 |
|* 10 |       TABLE ACCESS BY INDEX ROWID| TRADE    |     1 |    1 |
|* 11 |        INDEX UNIQUE SCAN         | TRADE_PK |     1 |    0 |
|* 12 |    FILTER                        |          |       |      |
|  13 |     NESTED LOOPS                 |          |     1 |    7 |
|  14 |      NESTED LOOPS                |          |     1 |    6 |
|  15 |       NESTED LOOPS               |          |     1 |    4 |
|* 16 |        TABLE ACCESS FULL         | ORG      |     1 |    2 |
|* 17 |        TABLE ACCESS FULL         | ORG      |     1 |    2 |
|* 18 |       TABLE ACCESS FULL          | ORDER    |     3 |    2 |
|* 19 |      TABLE ACCESS BY INDEX ROWID | TRADE    |     1 |    1 |
|* 20 |       INDEX UNIQUE SCAN          | TRADE_PK |     1 |    0 |
---------------------------------------------------------------------
Researching further
Re-querying v$session_wait, we see that a couple of latches are hot. We will take one latch child, with latch address 19875043200, as an example and drill down further.
   SID    PID EVENT       P1_P2_P3_TEXT
------ ------ ----------- ----------------------------------------
   578  17220 latch: CBC  address 19875043200-number 122-tries 0
   664  17226 latch: CBC  address 19875043200-number 122-tries 0
   695  17228 latch: CBC  address 19875043200-number 122-tries 0
   701  23987 latch: CBC  address 19875043200-number 122-tries 0
   ...
Converting the latch address 19875043200 from decimal to hex yields 4A0A51780. But the addr column in v$latch_children is 16 hex characters wide, so we prefix the value with zeros and query v$latch_children to see the activity against that latch child.
select addr, latch#, child#, level#, gets
  from v$latch_children
 where addr = '00000004A0A51780'
SQL> /

ADDR                 LATCH#     CHILD#     LEVEL#       GETS
---------------- ---------- ---------- ---------- ----------
00000004A0A51780        122      10437          1   23672075

SQL> /

ADDR                 LATCH#     CHILD#     LEVEL#       GETS
---------------- ---------- ---------- ---------- ----------
00000004A0A51780        122      10437          1   23672209
We repeated the execution of the above SQL almost immediately: an increase of 134 gets in under a second. This step also validates the latch address; comparing with the latch type, we see that this address does indeed belong to a cache buffers chains latch child.
Hang those buffers!
Next, we need to find the buffers protected by this latch child, and then the buffers causing the contention. Many hash buckets (and so, numerous buffers) are protected by each latch child. Fortunately, the tch (touch count) column can be used effectively to identify hot blocks: almost every access to a buffer increments the tch value in its buffer header. The idea is to find the buffers protected by this latch and identify those with higher touch counts. Those buffers are the probable candidates for further analysis.
The x$bh table and v$latch_children can be joined to find those buffer attributes. (By the way, the following SQL can easily be rewritten to print the buffers protected by the top latch, ordered by, say, sleeps.)
select hladdr, file#, dbablk,
       decode(state, 1, 'cur ', 3, 'CR', state) ST,
       tch
  from x$bh
 where hladdr in (
        select addr
          from (select addr
                  from v$latch_children
                 where addr = '00000004A0A51780'
                 order by sleeps, misses, immediate_misses desc)
         where rownum < 2);

HLADDR                FILE#     DBABLK ST      TCH
---------------- ---------- ---------- ---- ------
00000004A0A51780          1      52351 cur       3
00000004A0A51780         16     701009 cur      24
00000004A0A51780         16      23959 cur     182
00000004A0A51780         15      16215 cur    2855 <--
00000004A0A51780         26        693 cur       9
00000004A0A51780          9      52872 cur    2935 <--
00000004A0A51780          8      45128 cur    1831 <--
00000004A0A51780         16     635473 cur     560
00000004A0A51780         25     233403 cur      51
00000004A0A51780         25      97993 cur     110
00000004A0A51780          4      97273 cur      43
00000004A0A51780         25     268340 cur       0

12 rows selected.
There are at least three blocks with much higher activity, since their tch values are much higher. A note of caution: buffer cache activity is quite dynamic, and this analysis needs to be performed during the period of latch contention. Performing this analysis a few hours after the latch contention will lead to an incorrect diagnosis.
We need to associate these three hot blocks with object names. File# and dbablk can be used to find the object using the following script.
accept h_file_id  prompt ' Enter file_id ==>'
accept h_block_id prompt ' Enter block_id==>'
set verify off
column owner format A10
column segment_name format A20
column segment_type format A10
select owner, segment_name, partition_name, segment_type, file_id, block_id
  from dba_extents
 where file_id = &&h_file_id
   and &&h_block_id between block_id and block_id + blocks - 1;
set verify on
For example, supply 15 for the file_id and 16215 for the block_id to find the object name for the buffer with a tch of 2855.
I don't prefer the above script, though, since its performance is not optimal. It is much easier to dump the blocks and convert the seg/obj values to object ids. Let's dump these three blocks.
alter system dump datafile 15 block min 16215 block max 16215;
alter system dump datafile 9 block min 52872 block max 52872;
alter system dump datafile 8 block min 45128 block max 45128;
Reading the trace files, we see three different segments; one line per block from the trace files is printed below.
seg/obj: 0xd756  csc: 0x00.17b5fe4f itc: 2   flg: E typ: 1 - DATA
seg/obj: 0x1801a csc: 0x00.1cb9f0ab itc: 2   flg: E typ: 1 - DATA
seg/obj: 0x181c5 csc: 0x00.1bef7a59 itc: 169 flg: E typ: 2 - INDEX
The seg/obj field is the object id printed in hex. Converting the hex numbers d756, 1801a and 181c5 to their decimal equivalents yields 55126, 98330 and 98757.
Now we can query dba_objects to find the object names.
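The hex-to-decimal conversion of the seg/obj values is a one-liner:

```python
def segobj_to_object_id(seg_obj_hex):
    """Convert the seg/obj value from a block dump (hex) to a decimal object id."""
    return int(seg_obj_hex, 16)

for h in ('d756', '1801a', '181c5'):
    print(h, '->', segobj_to_object_id(h))
# d756 -> 55126, 1801a -> 98330, 181c5 -> 98757
```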
select owner, object_id, object_name, data_object_id
  from dba_objects
 where object_id in (55126, 98330, 98757)
    or data_object_id in (55126, 98330, 98757)
SQL> /

OWNER         OBJECT_ID OBJECT_NAME       DATA_OBJECT_ID
------------ ---------- ----------------- --------------
SOME_USER         55126 ORDER                      55126
SOME_USER         98330 GBLOCK                     98330
SOME_USER         98757 ALLOCATION_OID             98757
Comparing the explain plan with the object names printed above, we can see that the ORDER table is common to both techniques.
|   7 |       NESTED LOOPS       |       |   195 | 14040 |   18 |
|   8 |        TABLE ACCESS FULL | ORG   |     1 |    34 |    2 |
|*  9 |        TABLE ACCESS FULL | ORDER |   195 |  7410 |   16 |
...
|  14 |      NESTED LOOPS        |       |     1 |   106 |    6 |
|  15 |       NESTED LOOPS       |       |     1 |    68 |    4 |
|* 16 |        TABLE ACCESS FULL | ORG   |     1 |    34 |    2 |
|* 17 |        TABLE ACCESS FULL | ORG   |     1 |    34 |    2 |
|* 18 |       TABLE ACCESS FULL  | ORDER |     3 |   114 |    2 |

   9 - filter(UPPER(UPPER("GO"."ORD_ID")) LIKE :1 AND "GO"."IS_MOD"='F')
  18 - filter(UPPER(UPPER("GO"."ORD_ID")) LIKE :1 AND "GO"."IS_MOD"='F')
Quick summary
Let’s summarize what we have done so far.
1. We found one latch child's address, located all buffers protected by that latch, found the buffers with high tch values, and queried for the object names of those buffers.
2. Through v$session_wait, we found the SQL hash value, found the SQL suffering from latch contention, and generated its explain plan.
Objects common to both steps 1 and 2 are the probable candidates to focus on. We see that the ORDER table is common to both techniques. From the plan above, the ORDER table is accessed in a tight nested loops join. This increases buffer access to the ORDER table, in turn resulting in higher latching activity.
SQL tuning: That was easy
From here onwards the solution is straightforward: we need to avoid the tight nested loops join. Specifically, if the inner table of a nested loops join is accessed with a full table scan, that can cause greatly increased latching activity, and a hash join may be the preferred method. In a nested loops join, the inner row source is queried once for every row from the outer row source; in a hash join, each table is scanned just once and hashed, which reduces latching activity.
In this specific case, ORDER is a small table. Further analysis revealed that the CBO chose the nested loops join because the rownum predicate triggers first_rows-style optimization. As a test, let's remove the rownum clause to see what plan we get.
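A back-of-the-envelope model shows why the join method matters so much for latch activity: every buffer visit is (at least) one latch get on that buffer's hash chain. The block and row counts below are purely illustrative, not taken from this system:

```python
def nl_join_block_visits(outer_rows, inner_blocks):
    """Nested loops with an inner full table scan: the inner table's
    blocks are revisited (and their hash chains latched) once per outer row."""
    return outer_rows * inner_blocks

def hash_join_block_visits(outer_blocks, inner_blocks):
    """Hash join: each table is scanned exactly once, then joined in memory."""
    return outer_blocks + inner_blocks

# Illustrative sizes: 10,000 outer rows, outer table in 100 blocks,
# small inner table (think ORDER) in 16 blocks.
print(nl_join_block_visits(10_000, 16))     # 160000 buffer visits
print(hash_join_block_visits(100, 16))      # 116 buffer visits
```

The three orders of magnitude between the two numbers is the point: with many sessions running the same nested loops plan concurrently, the same few hash chains are hammered, and latch contention follows.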
explain plan for
select * from v1
 WHERE ( col1 IN (3, 20, 21, 44, 45, 47, 48, 49, 50, 51, 57, 58, 59, 67, 68, 69,
                  76, 78, 79, 80, 81, 82, 84, 85, 106, 450, 451, 452, 453, 454,
                  455, 456, 457, 458, 459, 460, 461, 462, 463, 464, 465, 466,
                  467, 468, 469, 470, 471, 472, 473, 474, 476, 478, 500, 501, 502)
      OR col2 IN (3, 20, 21, 44, 45, 47, 48, 49, 50, 51, 57, 58, 59, 67, 68, 69,
                  76, 78, 79, 80, 81, 82, 84, 85, 106, 450, 451, 452, 453, 454,
                  455, 456, 457, 458, 459, 460, 461, 462, 463, 464, 465, 466,
                  467, 468, 469, 470, 471, 472, 473, 474, 476, 478, 500, 501, 502))
   AND UPPER(col3) LIKE :1
   --and rownum < 200
;

select * from table(dbms_xplan.display);

----------------------------------------------------------------
| Id  | Operation             | Name  | Rows  | Bytes | Cost |
----------------------------------------------------------------
|   0 | SELECT STATEMENT      |       |  4902 |  818K | 4014 |
|*  1 |  HASH JOIN            |       |  4902 |  818K | 4014 |
|   2 |   TABLE ACCESS FULL   | ORG   |   370 | 12950 |    5 |
|*  3 |   HASH JOIN           |       | 17513 | 2325K | 4008 |
|   4 |    TABLE ACCESS FULL  | ORG   |   370 | 12950 |    5 |
|*  5 |    HASH JOIN          |       | 17479 | 1724K | 4002 |
|*  6 |     TABLE ACCESS FULL | ORDER | 17331 |  643K | 1320 |
|   7 |     TABLE ACCESS FULL | TRADE |  353K |   21M | 2680 |
----------------------------------------------------------------
After commenting out the rownum clause, the CBO chose a hash join. Comparing execution times between the two versions (running two of these SQLs in parallel), the original query consumed almost 70 CPU seconds per execution, while the query with the rownum clause commented out consumed just 0.5 CPU seconds. Essentially, an optimizer_mode of all_rows should be used for this SQL, even with the rownum predicate. Fortunately, this query accesses a view (and only similar queries access that view), so adding an all_rows hint to the view resolved the latch contention.
In summary, we used a two-pronged approach: find the objects causing latch contention, and match those objects against the execution plan of the SQL suffering from the contention. We then resolved the issue with a minor change to the view. This post can also be read as a PDF: resolving-latch-contention-cbc.
[ Environment: Oracle version 10.2.0.3 in Linux server].
Amit Bhube said
Hi
Went through your article. I had certain doubts. In the query that you have given in the article, when the query goes for the NL join the cost is 233, but when it goes for the hash join the cost increases to 4014. What about the execution time, consistent gets and physical reads the query did in both cases? Did the execution time and physical reads reduce when the query went for the hash join?
Amit Bhube
orainternals said
Hi Amit
Thanks for reading my blog.
That's a classic example of why reducing cost is not always the goal. The goal is to improve performance and resolve the bottleneck. In this specific case, concurrency combined with excessive access to a few buffers caused the latch contention. We needed to reduce that access by eliminating the nested loops join.
Consistent gets also went down, typical of hash joins. Physical reads were never a problem since all buffers are already in the cache, in fact, there were negligible amount of physical reads in both cases.
Execution time also went down from 70 CPU seconds to 0.5 CPU seconds.
Charles Schultz said
Riyaj, thanks for sharing your insight. Curious, have you come up with a script to reduce the P1 address to objects in the buffer chain? Seems like the manual process would be a bit drawn out in a Production scenario when the boss is breathing down your neck. =)
Charles Schultz said
For instance, here is a prototype I am working on:
column segment_name format a35
/* From Metalink note 163424.1 */
/* and https://orainternals.wordpress.com/2008/07/30/tuning-latch-contention-cache-buffers-chain-latches/#comment-116 */
/* Modified by charles schultz */
with hot_latches as
(select * from
(select CHILD#
, ADDR
, GETS
, MISSES
, SLEEPS
from v$latch_children
where name = 'cache buffers chains'
order by 5 desc, 1, 2, 3
)
where rownum < 6
)
select distinct * from (
select /*+ RULE */
sum(x.tch) over (partition by l.addr) total_touches,
l.addr latch_address,
o.name object_name,
x.tch touch_count,
l.child#,
l.sleeps sleeps,
l.gets
from
hot_latches l,
sys.x$bh x,
sys.obj$ o
where
x.hladdr = l.addr and
x.obj = o.obj#
)
order by total_touches desc, touch_count desc
/
orainternals said
Hi Charles
Thank you for reading my blog.
Thanks for sharing the script. There are couple of things I would like to point out though:
1) The query accesses v$latch_children, which is somewhat historical information, i.e. not real-time waits. This data can mislead if other parts of the application have also caused latch contention in the past [since instance restart].
2) There is a join to obj$, so undo blocks, dropped objects and other special types of blocks will not show up, further increasing confusion. This needs to be an outer join. Please see my script below.
You might also want to drive from v$session_wait, and I wrote a small script for that. Please test it and let me know how it works. Even this script is not completely dependable, since some addr columns are RAW(8) and some are RAW(4) in these v$ views. For example, in my 11.1.0.6 database on XP, v$session_wait.p1raw is RAW(8) but v$latch_children.addr is RAW(4); join conditions might fail because of that.
So, please use this script with care.
UPDATE2: I edited this script after posting.
with bh_lc as
(select /*+ ORDERED */
lc.addr, lc.child#, lc.gets, lc.misses, lc.immediate_gets, lc.immediate_misses, lc.spin_gets, lc.sleeps,
bh.hladdr, bh.tch tch, bh.file#, bh.dbablk, bh.class, bh.state, bh.obj
from
x$kslld ld,
v$session_wait sw,
v$latch_children lc,
x$bh bh
where lc.addr =sw.p1raw
and sw.p2= ld.indx
and ld.kslldnam='cache buffers chains'
and lower(sw.event) like '%latch%'
-- and state='WAITING'
and bh.hladdr=lc.addr
)
select bh_lc.hladdr, bh_lc.tch, o.owner, o.object_name, o.object_type, bh_lc.child#, bh_lc.gets, bh_lc.misses, bh_lc.immediate_gets,
bh_lc.immediate_misses, spin_gets, sleeps
from
bh_lc, dba_objects o
where bh_lc.obj = o.object_id(+)
union
select bh_lc.hladdr, bh_lc.tch, o.owner, o.object_name, o.object_type, bh_lc.child#, bh_lc.gets, bh_lc.misses, bh_lc.immediate_gets,
bh_lc.immediate_misses, spin_gets, sleeps
from
bh_lc, dba_objects o
where bh_lc.obj = o.data_object_id(+)
order by 1,2 desc
I usually execute a subsection of the above script first and confirm it is working fine before running the whole script.
select /*+ ORDERED */
lc.addr, lc.child#, lc.gets, lc.misses, lc.immediate_gets, lc.immediate_misses, lc.spin_gets, lc.sleeps,
bh.hladdr, bh.tch tch, bh.file#, bh.dbablk, bh.class, bh.state, bh.obj
from
x$kslld ld,
v$session_wait sw,
v$latch_children lc,
x$bh bh
where lc.addr =sw.p1raw
and sw.p2= ld.indx
and ld.kslldnam='cache buffers chains'
and lower(sw.event) like '%latch%'
-- and state='WAITING'
and bh.hladdr=lc.addr
HTH
Cheers
Riyaj
Cardinality feedback to resolve a Cache buffers chains latch contention issue « Oracle database internals by Riyaj said
[…] Earlier, I blogged about resolving cache buffers chains latch contention in my earlier entry , in which, root cause was excessive index access due to Nested Loops join. Recently, we resolved […]
Charles Schultz said
Thanks, Riyaj. Now I have to find a database that has cache buffers chain waits. =)
Your query only does a current snapshot; do you know of any way to use the AWR to obtain the same information historically? I am also curious about the union you have – why do we have to add in data_object_id? I am merely curious, as I am not that familiar with that column.
Charles Schultz said
I did find one minor issue with the query; the latch_children stats are a little misleading in the report. While it may be useful to see all the objects controlled by a specific latch, I would think that during a performance problem, one would want to know what the hot objects are; those objects that are being heavily requested (ie, high touch counts). Would it be pertinent to alter the query a little to make it a top-n type query?
I am thinking it would also be nice to cross-reference with the sessions that are experiencing the wait. I’ll try to play around with it some more. =)
orainternals said
Hi Charles
Thanks for reading my blog.
You had a question about the union branch in latch_cbc_to_buf.sql. The problem was that the x$bh.obj column has the object_id in a few cases and the data_object_id in a few other cases. The original SQL had only data_object_id, but in a few specific instances the SQL didn't print some objects, since object_id was populated in the x$bh.obj column (which may have been a bug, now that I think about it). That's why one branch joins on object_id and the other on data_object_id. Of course, it is an outer join, so we can't simply use an OR clause.
Side effect of above change is that bh_lc is materialized in the execution plan.
About the touch_count question: in some cases, it is the number of clones that causes issues, not necessarily touches to the buffer. As you may know, cloning buffers involves quite a bit of latching activity too. If we sort by touch count, cloned buffers are not visible, which can be misleading.
So, I guess what I am saying is that it is probably better to get more rows and throw away the junk rows manually than to throw them away in SQL. :)
Yes, I would very much appreciate, if you can enhance the script.
Cheers
Riyaj
orainternals said
Hi Charles
I removed the union branch joining on object_id and tested it out. It works fine with the data_object_id outer join by itself.
with bh_lc as
(select /*+ ORDERED */
lc.addr, lc.child#, lc.gets, lc.misses, lc.immediate_gets, lc.immediate_misses, lc.spin_gets, lc.sleeps,
bh.hladdr, bh.tch tch, bh.file#, bh.dbablk, bh.class, bh.state, bh.obj
from
x$kslld ld,
v$session_wait sw,
v$latch_children lc,
x$bh bh
where lc.addr =sw.p1raw
and sw.p2= ld.indx
and ld.kslldnam='cache buffers chains'
and lower(sw.event) like '%latch%'
-- and state='WAITING'
and bh.hladdr=lc.addr
)
select bh_lc.hladdr, bh_lc.tch, o.owner, o.object_name, o.object_type, bh_lc.child#, bh_lc.gets, bh_lc.misses, bh_lc.immediate_gets,
bh_lc.immediate_misses, spin_gets, sleeps
from
bh_lc, dba_objects o
where bh_lc.obj = o.data_object_id(+)
order by 1,2 desc
/
Cheers
Riyaj
Muthu said
Riyaj,
Very nice article. We’re experiencing latch contention in our EBS environment off and on. I’m sure, this article will help troubleshooting the issues.
Also, have you blogged with a focus on tuning SQLs? We have a situation in which SQLs that were written over 3 years ago, during the implementation phase, have undergone some changes; and with the growth of the database from 1 TB to around 4 TB, performance troubleshooting has become a big challenge now.
Thanks and regds.
Muthu
orainternals said
Hello Muthu
Thanks for visiting. I am glad this blog is helpful.
Cheers
Riyaj
Osama Karim said
Riyaz,
Thanks for the lovely blog. We are facing a similar problem where CPU usage is going up to 80%+. We will try to implement your suggestion.
Thanks
CBC Latch Contention – My Experience « Anand's Blog said
[…] https://orainternals.wordpress.com/2008/07/30/tuning-latch-contention-cache-buffers-chain-latches/ […]
Cardinality feedback to resolve a Cache buffers chains latch contention issue - OraInternals said
[…] I blogged about resolving cache buffers chains latch contention in my earlier entry , in which, root cause was excessive index access due to Nested Loops join. Recently, we resolved […]
Bad Stats and the Strage "Cache Buffers Chain Latches" Wait said
[…] know what this mysterious “latch” thing was, so of course I google’d it and found an article from “orainternals”. Inside is a lot of Oracle DBA stuff that I probably don’t have access to but toward […]