Oracle database internals by Riyaj

Discussions about Oracle performance tuning, RAC, Oracle internal & E-business suite.

Archive for the ‘Oracle database internals’ Category

High global cache waits on tab$ ?

Posted by Riyaj Shamsudeen on February 25, 2009

Yes, you read it correct. That shocked me too.

I was trying to understand global cache waits for a client. Naturally, I queried statspack tables and analyzed the data for just one day using a script. Surprisingly, tab$ came as top consumer of global cache waits. I was shocked by the revelations and couldn’t believe it! If something doesn’t make sense, look more closely, right?

Global cache waits

Database version is 9i. Statspack table stats$seg_stat is the source for this script and that table is populated from v$segment_stats. So, these column values (global_cache_cu_blocks_served and global_cache_cr_blocks_served) are cumulative. To find statistics for a specific period, we need to subtract column value of prior row from current row. Analytic function lag can be useful for this.

In this analytic function printed below, partitioning clause uses instance_number, startup_time and dataobj#. All rows with same value for these three columns will be considered in one data partition and then rows ordered by snap_id within that data partition. Lag will pull the row from prior snap_id in that partition. Then, we subtract from current value to get the difference. Please refer to this paper: Performance tuning with SQL new features – paper for more information about analytic functions.

   global_cache_cu_blocks_served -
   lag(global_cache_cu_blocks_served,1,0) over (partition by instance_number,startup_time, dataobj#, obj#
              order by  snap_id ) global_cache_cu_blocks_served,

Script and output

Complete script printed below and running that against a database.

Continue Reading

Posted in EBS11i, Oracle database internals, Performance tuning | Tagged: , , , , | 6 Comments »

A stroll through shared pool heaps

Posted by Riyaj Shamsudeen on January 15, 2009

Last week, we were discussing about increasing shared_pool_reserved_size to combat a performance issue(bug) in a conference call. I thought, it was a common knowledge that shared_pool reserved area is part of a shared_pool and surprisingly it is not-so-common.

In this blog, we will discuss about shared_pool and shared_pool reserved area internals. First, we will discuss about details specific to release 9i and then discuss changes in later releases 10g/11g.

oradebug command

We will use oradebug command to dump the heap with level 2. Level 2 is to dump shared_pool heap in to a trace file.

 oradebug setmypid
 oradebug dump heapdump 2

Above command generates a trace file and we will walk through the trace file and review various areas closely.


In this test instance, we have a bigger SGA. Shared_pool (6GB) and shared_pool_reserved_size values are printed below.

SQL> show parameter shared_pool
shared_pool_reserved_size            big integer 629145600
shared_pool_size                     big integer 6442450944

Trace file analysis
Continue Reading

Posted in Oracle database internals, Performance tuning | Tagged: , , , , , , , , | 8 Comments »

Log file synch tuning #2

Posted by Riyaj Shamsudeen on November 24, 2008

After reading my earlier blog about log file sync tuning a reader sent an interesting issue, worth blogging about it.


Excessive wait time for ‘log file sync’ event while wait time for ‘log file parallel write’ is minimal. See statspack top events below


Top 5 Timed Events
~~~~~~~~~~~~~~~~~~                                                     % Total
Event                                               Waits    Time (s) Ela Time
-------------------------------------------- ------------ ----------- --------
CPU time                                                       18,435    31.06
PL/SQL lock timer                                   4,370      12,137    20.45
log file sync                                      57,939       8,565    14.43  <-----
db file sequential read                         1,303,477       7,030    11.85
db file scattered read                            505,147       3,992     6.73

                                                     Total Wait   wait    Waits
Event                               Waits   Timeouts   Time (s)   (ms)     /txn
---------------------------- ------------ ---------- ---------- ------ --------
log file sync                      57,939      6,571      8,565    148      0.5
log file parallel write           215,112          0      1,138      5      1.9

Average wait time for ‘log file sync’ event is 148ms and it is 5ms for ‘log file parallel write’ event. There is an oddity here. Wait time for ‘log file sync’ event seems bit excessive compared to ‘log file parallel write’ event. Ratio between number of waits for ‘log file sync’ and ‘log file parallel write’ is approximately 4. In an average case scenario, if each commit resulted in 4 log writes, then I would expect an Average wait time of 20ms for ‘log file sync’ event. Wait time of 148ms? Needs further analysis.

pstack, pmap and more

Database is running in a Solaris 10 server and database version Since this is a Solaris operating environment, we can use proc utilities to debug this further. We used a script to dump pstack and pmap of LGWR in a loop. This wasn’t much help, but we could guess that LGWR was waiting for I/O since there were many lines in the stack with kaio and aio wait calls. Few lines from pstack output printed below.

6924:	ora_lgwr_ERPPRD2
-----------------  lwp# 1 / thread# 1  --------------------
 ffffffff7ddd4838 semsys   (4, 3, ffffffff7fffd5cc, 1, ffffffff7fffd5b8)
 00000001005b2694 ksliwat (0, 3c0, 77e8a7790, 1037c0b30, 3, 0) + 274
-----------------  lwp# 2 / thread# 2  --------------------
 ffffffff7ddd3a34 kaio     (6, 0, ffffffff7def7f80, 10, ffffffff7def9c18, ffffffff7f700a00)
-----------------  lwp# 3 / thread# 3  --------------------
 ffffffff7ddd2f1c lwp_park (0, 0, 0)
 ffffffff7ddcc748 cond_wait_queue (1038e5a70, 1038e5a80, 0, 0, ffffffff7def9d08, 0) + 28
 ffffffff7ddccca8 cond_wait (1038e5a70, 1038e5a80, 0, 0, 1c00, 0) + 10
 ffffffff7db03a60 _aio_idle (1038e5a00, 1, 1038e5a80, 0, ffffffff7def9c18, 0) + 28
 ffffffff7db034fc _aio_send_sigev (1038e5a00, 0, 104b60, ffffffff7ddd2cc8, ffffffff7db03498, ffffffff7dc08000) + 64


I/O statistics did not provide much clarity either. We need to find if LGWR is suffering from a performance issue. To see if LGWR is suffering from any OS related issues, we need to trace system calls from LGWR and Truss utility provides such a facility. Suffering from truss related paranoia, we didn’t want to run truss against LGWR since that can affect performance, more importantly database stability.

Fortunately, dtruss came handy. dtruss is based upon dtrace utility and by design, dtrace is safe in Solaris Sparc platform.

Keep Reading

Posted in Oracle database internals, Performance tuning | Tagged: , , , , , | 12 Comments »

Performance issue: High Kernel mode CPU usage

Posted by Riyaj Shamsudeen on October 31, 2008

Recently, we resolved a performance issue with one of our esteemed clients. Problem was interesting and worth blogging about it.

An application makes 300 static connections to database DB1 in database server, say CLNTDB server. Application relies on database link and over a period of time, each session executes SQL through a database link, creating a connection in central database PROD1. So, there are 300 connections in PROD1 database coming from DB1 database through database links. Performance is fine during normal operation.

Problem starts when the application is shutdown. Shutting down the application in DB1, creates massive CPU consumption in PROD1DB server. Unfortunately, this spike in CPU usage lasts for 5-10 seconds and causes ASM heartbeat to fail. Considering that PROD1 is a central database and this has a global effect on application(s) functionality. See presentation below for graphical representation of this problem.

Looking at symptoms in detail, it is visible that CPU usage in sys mode. It is not uncommon to see high CPU usage during a storm of disconnects. But, this specific CPU usage is much higher and uses all CPUs in kernel mode. If this is a problem due to process destruction, then this will show up in CLNTDB server too as there are 300 disconnects there too. But, this problem manifests only in PROD1DB server.

mpstat output from PROD1DB server shows that CPU usage in %sys mode. Notice the numbers below under sys column. almost all CPUs are used in sys mode. We need to drill down further to understand why this much CPUs are used in sys mode. It is also visible that no other columns have any abnormal higher values.

CPU ..intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  0    554  237  651  219   87  491    0  4349    9  91   0   0
  1    911    0 2412  591  353  630    0 15210   30  63   0   7
  2    313    0  613  106  190  105    0  3562    8  90   0   2
  3    255    0  492   92  161  530    0  2914    6  92   0   2
  4     86    1    2    3    1   63    0     8    0 100   0   0
  5    662    0 1269  153  326  211    0  6753   13  77   0  10
  6    349    0  589   54  170 1534    0  3002    7  88   0   5
  7    317   81  411  105   51   61    0  2484    4  93   0   3
  8   6423 6321  697   36  148  546    0  3663    6  86   0   9
  9    363    0  639   41  154 1893    0  3214    9  85   0   6
 10    456    0  964  194  193   77    0  5797   10  81   0   8
 11    104    0   42    3    9  183    0   207    0  95   0   5
 12    195    0  279  110   31   80    0  1590    3  97   0   0
 13    449    0  844  117  158  127    0  4486    7  85   0   8

Test case
Let’s analyze this issue for a single connection to simplify the problem. Created a test user in DB1 database, created a private database link from DB1 to PROD1 and then executed a select statement over that database link.

In db1:

create user test1 identified by test1;
grant connect, resource to test1;
conn test1/test1
create dtabase ink proddb
connect to test1 identified by test1
using 'proddb';

select * from dual@proddb;

At this point, a database link connection has been created in PROD1 database. Querying v$session in PROD1 database to find session created for database link. SID 4306 is the session created for our test connection from db1 database.

select sid, serial#, LOGON_TIME,LAST_CALL_ET from v$session where
  logon_time > sysdate-(1/24)*(1/60) and machine='machine_name_here'
---------- ---------- -------------------- ------------
      4306      51273 12-SEP-2008 20:47:29            1

Associating this session to v$process, found UNIX process for this session. Now, we will start truss on this UNIX process( Solaris platform).

truss -p -d -o /tmp/truss.log

In DB1 executed: exit

This will disconnect session in PROD1 database.

Truss output
Truss output contains activity generated for disconnect in PROD1 server. Reading through truss output, we can see that system call (shmdt) consumed CPU time. Each shmdt call consumes approximately 24ms from output below (18.5053-18.4807=0.0242). We are using -d flag as above to print time spent in that call. Of course, this is a system call and so this CPU usage will be in kernel mode.

18.4630 close(10)                                       = 0
18.4807 shmdt(0x380000000)                              = 0
18.5053 shmdt(0x440000000)                              = 0
18.5295 shmdt(0x640000000)                              = 0
18.5541 shmdt(0x840000000)                              = 0
18.5784 shmdt(0xA40000000)                              = 0
18.6026 shmdt(0xC40000000)                              = 0
18.6273 shmdt(0xE40000000)                              = 0
18.6512 shmdt(0x1040000000)                             = 0
18.6752 shmdt(0x1240000000)                             = 0
18.6753 shmdt(0x1440000000)                             = 0

So, one disconnect executes 10 system calls and consumes approximately 0.24 CPU seconds in kernel mode. Shmdt calls are used to detach from shared memory segment. Since there are 10 shared memory segments ( as visible in ipcs -ma), 10 shmdt calls are executed per session disconnect.

Projecting this calculation for 300 connections, CPU consumption will be for approximately 72 seconds in total. With 16 CPUs, at least, 5-6 seconds will be used in kernel mode, assuming linear projections[ But, in practice, due to mutex calls and such this may not be linear and will be over 72 seconds]. This is matching with our observation: 5-10 seconds of CPU consumption in kernel mode. First thing, we need to do is reduce # of shmdt calls. One way is to reduce # of shared memory segments.

Shared memory segments
We thought that SHMMAX kernel parameter is limiting since SGA size was bigger than SHMMAX size. After changing SHMMAX parameter, restarting server and such, still many shared memory segments were created.

That’s interesting! TRUSSing instance startup to understand why multiple shared memory segments are created.

Few lines from truss output shows calls to system calls _lgrpsys and pset_bind.

4.5957 munmap(0xFFFFFD7FFDAE0000, 32768)               = 0
4.5958 lgrp_version(1, )                               = 1
4.5958 _lgrpsys(1, 0, )                                = 42
4.5958 _lgrpsys(3, 0x00000000, 0x00000000)             = 19108
4.5959 _lgrpsys(3, 0x00004AA4, 0x06399D60)             = 19108
4.5959 _lgrpsys(1, 0, )                                = 42
4.5960 pset_bind(PS_QUERY, P_LWPID, 4294967295, 0xFFFFFD7FFFDFB11C) = 0
4.5960 pset_info(PS_MYID, 0x00000000, 0xFFFFFD7FFFDFB0D4, 0x00000000) = 0
4.5961 pset_info(PS_MYID, 0x00000000, 0xFFFFFD7FFFDFB0D4, 0x061AA2B0) = 0

_lgrpsys calls indicates that there is some form of NUMA activity going on here. pset_bind is used to bind a thread or process to a specific processor set.

Enter NUMA.

NUMA or Locality groups

Keep Reading

Posted in Oracle database internals, Performance tuning | Tagged: , , , , , , , | 13 Comments »

How to find objects creating nologging changes?

Posted by Riyaj Shamsudeen on September 11, 2008

In an Oracle-l thread, a question was raised: How to find objects creating nologging changes?

What is a ‘nologging’ change?

Redo logging mechanism plays critical role in media recovery. Media recovery relies on archivelog files generated to roll forward the database. Standby database or dataguard recovery also relies on archivelog files. It is possible to do DML changes with minimal logging, aka nologging changes or direct mode inserts. For example, insert /*+ append */ can be used to populate the rows in to a table without generating much redo. This can invalidate standby database and might trigger rebuilding some or all parts of the standby database.

Nologging changes generates minimal redo, since the blocks are pre-formatted and written to disk directly. A redo record is generated invalidating a range of affected blocks. This invalidation redo record size is far smaller, for e.g. hundreds of blocks can be invalidated using just a single redo record. Of course, recovery is severely affected as the changes performed with nologging operations can NOT be reapplied / recovered.

Internals of nologging changes

Since nologging is all about redo records, dumping redo log file or archivelog file is a concrete way to see what happens under the hood. Let’s consider an example to explain internals of nologging changes.

We will create a table, insert rows, and review redo records closely.

  create table t4 ( a number) nologging tablespace users;
  REM switch log file so that we wil have smaller log files to dump.
  alter system switch logfile;
  REM Direct mode or nologging inserts
  insert /*+ append */ into t4 select object_id from dba_objects;
  alter system switch logfile;

Prior online redo log file contains nologging changes from table t4 above. We could dump that log file in its entirety, but we will dump just a layer specific to direct mode changes alone to limit trace file size. Layer 19 is for nologging changes. Following script will dump last online redo log file for layer 19.

 set serveroutput on size 1000000
   v_sqltext varchar2(255);
   -- Find name of latest but one log file.
   select 'alter system dump logfile '||chr(39)||
                member||chr(39) || ' layer 19 '
   into v_sqltext
   v$log  lg, v$logfile lgfile
   where = and
   lg.sequence# =
       (select sequence#-1 from v$log where status='CURRENT' )
   and rownum <2;
   dbms_output.put_line ('Executing :'||v_sqltext);
   --Execute above SQL to dump log file and print it to see.
   execute immediate v_sqltext;

alter system dump logfile 'D:\ORACLE\ORADATA\ORCL11G\REDO02.LOG' layer 19

Above command generated a trace file in user_dump_dest directory and few redo records from that trace file printed below:

REDO RECORD - Thread:1 RBA: 0x0000b0.0000000f.00d4 LEN: 0x0034 VLD: 0x01
SCN: 0x0000.00486397 SUBSCN:  1 09/03/2008 10:54:38
CHANGE #1 INVLD AFN:4 DBA:0x01038824 BLKS:0x0001 OBJ:72852 SCN:0x0000.00486397 SEQ:  1 OP:19.2
Direct Loader invalidate block range redo entry

REDO RECORD - Thread:1 RBA: 0x0000b0.00000014.0118 LEN: 0x0034 VLD: 0x01
SCN: 0x0000.0048639a SUBSCN:  1 09/03/2008 10:54:38
CHANGE #1 INVLD AFN:4 DBA:0x01038832 BLKS:0x0001 OBJ:72852 SCN:0x0000.0048639a SEQ:  1 OP:19.2
Direct Loader invalidate block range redo entry

Explanation of redo records

Let’s review following two lines.

CHANGE #1 INVLD AFN:4 DBA:0x01038832 BLKS:0x0001 OBJ:72852 SCN:0x0000.0048639a SEQ:  1 OP:19.2
Direct Loader invalidate block range redo entry

OP:19.2 indicates that layer is 19 for this change vector and 2 is the opcode. This vector also specifies that BLKS=1 and DBA as 1038832. Essentially this change vector says “Invalidate range of blocks starting at block DBA 0×01038832 for 1 block”. This could be for a range of blocks too.

For nologging changes a block range invalidation redo generated, block formatted, populated with rows and written directly to disk. But, standby database and media recovery rely on redo records to replay these changes. When recovery code encounters above direct loader invalidate redo record, it throws a warning to alert log and simply mark that range of blocks as invalid. Any attempt to access those blocks will throw ORA-1578 block corruption error.

Just to reiterate, standby and media recovery are affected by nologging changes. But, Current database is still fine and there is no corruption in primary database. If we backup the primary database again or rebuild that tablespace in standby database, we avoid corruption. Point is that, if there is a business need to have nologging changes, consider taking a backup immediately after nologging changes. Of course, if there is a standby database involved, then that tablespace need to be re-copied from production.

Back to our problem

We need to find the object causing nologging changes. Field OBJ:72852 shows object_id in decimal. We can query dba_objects to find object_name.

select owner, object_name from dba_objects where
object_id=72852 or data_object_id=72852
SQL> /
OWNER                          OBJECT_NAM
------------------------------ ----------
CBQT                           T4

In essence, procedure to find object_name causing nologging generation is:

    1. Find archivelog containing nologging changes. If timestamp is known, then v$archived_log can be queried to find archivelog file name.
    2. Dump archivelog file for layer 19: alter system dump logfile ‘filename’ layer 19;
    3. Locate redo record with opcode 19.2.
    4. Query dba_objects for that object_id.More information about redo internals can be found in redo internals doc

Posted in Oracle database internals, Performance tuning, Uncategorized | Tagged: | 14 Comments »

Import performance: Does import of date column resorts to single row inserts, like lob columns ?

Posted by Riyaj Shamsudeen on August 3, 2008

We had a discussion about import performance in this otn forum . During the discussion, OP raised a doubt that import will resort to single row inserts for tables with date columns. Buffer parameter specifies, essentially, size of this array for array inserts.

We know that if a table has lob columns, then import parameter buffer is not honored and import utility resorts to single row inserts for those tables. But, claim here is, tables with date columns also suffers from single row inserts. We will probe this further and validate that claim in this blog.

Let’s create a table and populate 300K rows.

 create table t1 (n1 number, v1 varchar2 (512), d1 date);
 insert into t1
 select n1, lpad(n1, 500, 'x'), sysdate
 from (select level n1 from dual connect by level <=300003);

 REM Creating an  export file..
 host exp userid=cbqt/cbqt file=exp_t1.dmp log=exp_t1.log tables=t1

Above code fragment created a table, inserted 300,000 rows and exported that table to an export dump file. This dump file is ready to be imported. But, we need to trace the import to measure the effect of buffer parameter. Problem is that how to we trace import session alone, without generating every session in the database? This can be achieved by creating a logon trigger as below. Only sessions from a test user will have trace enabled from this trigger (username is cbqt).

REM I could potentially , use "on schema clause too, but this is part of a generic code that I use.
REM Riyaj Shamsudeen - To trace a session through logon trigger
create or replace trigger
after logon  on database
v_user dba_users.username%TYPE:=user;
sql_stmt1 varchar2(256) :='alter session set events '||chr(39)||'10046 trace name context forever, level 12'||chr(39);
  if (v_user = 'CBQT') THEN
      execute immediate sql_stmt1;
  end if;

Let’s drop the table, import with a default buffer size of 64K. Through logon trigger a new sql trace file will be generated. That trace file will be analyzed with tkprof utility as shown in the code fragment below:

drop table t1;

imp userid=cbqt/cbqt file=exp_t1.dmp log=imp_t1.log commt=Y full=Y

tkprof orcl11g_ora_3840.trc orcl11g_ora_3840.trc.out sort=execpu, fchcpu

From the tkprof output file generated, pertinent lines are printed below. Insert statement was executed 5455 times which works out to be an average array size of 157 rows.


SQL ID : c9nv9yq6w2ydp
INSERT /*+NESTED_TABLE_SET_REFS+*/ INTO "T1" ("N1", "V1", "D1") 
 (:1, :2, :3)

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute   5455     15.06      20.10        108      43261     212184      300003
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total     5456     15.06      20.10        108      43261     212184      300003

Misses in library cache during parse: 1
Misses in library cache during execute: 1
Optimizer mode: ALL_ROWS
Parsing user id: 88  

Rows     Row Source Operation
-------  ---------------------------------------------------
      0  LOAD TABLE CONVENTIONAL  (cr=7 pr=0 pw=0 time=0 us)

Let’s repeat this test case for a buffer size of 1MB.

sqlplus cbqt/cbqt <<EOF
drop table t1;

imp userid=cbqt/cbqt file=exp_t1.dmp log=imp_t1.log buffer=1048576 commt=Y full=Y 

tkprof orcl11g_ora_3846.trc orcl11g_ora_3846.trc.out sort=execpu, fchcpu

Trace lines from the tkprof output file for 1MB test case shown below:

Keep Reading

Posted in Oracle database internals, Performance tuning | Tagged: , , , , , , , | 1 Comment »

Tuning latch contention: Cache buffers chain latches

Posted by Riyaj Shamsudeen on July 30, 2008

Recently, I had an opportunity to tune latch contention for cache buffers chain (CBC) latches. Problem statement is that high CPU usage combined with poor application performance. Quick review of statspack report of 15 minutes showed a latch free wait as top event and consuming 3600 seconds approximately, in a 8 CPU server. Further CPU usage was quite high, which is a typical symptom of latch contention, due to spinning involved. v$session_wait showed that hundreds of sessions were waiting for latch free event.

SQL> @waits10g

   SID PID     EVENT         P1_P2_P3_TEXT
------ ------- ------------  --------------------------------------
   294  17189  latch free    address 15873156640-number 127-tries 0
   628  17187  latch free    address 15873156640-number 127-tries 0
   343  17191  latch free    address 15873156640-number 127-tries 0
   599  17199  latch: cache  address 17748373096-number 122-tries 0
               buffers chains
   337  17214  latch: cache  address 17748373096-number 122-tries 0
               buffers chains
   695  17228  latch: cache  address 17748373096-number 122-tries 0
               buffers chains
   276  15153  latch: cache  address 19878655176-number 122-tries 1
               buffers chains

We will use two pronged approach to find root cause scientifically. First, we will find SQL suffering from latch contention and objects associated with access plan for that SQL. Next, we will find buffers involved in latch contention, map that back to objects. Finally, we will match these two techniques to pinpoint root cause.

Before we go any further, let’s do a quick summary of internals of latch operations.

Brief Introduction to CBC latches and not-so-brief reason why this is a complicated topic to discuss briefly

Latches are internal memory structures to coordinate access to shared resources. Locks aka enqueues are different from latches. Key difference is that enqueues, as name suggests, provides a FIFO queueing mechanisms and latches do not provide a queueing mechanism. On the other hand, latches are held very briefly and locks are usually held longer.

In Oracle SGA, buffer cache is the memory area data blocks are read in to, aka buffer cache. [If ASMM - Automatic Shared Memory Management is in use, then part of Shared pool can be tagged as KGH:NO ALLOC and remapped to buffer cache area too].

Each buffer in the buffer cache has an associated element the buffer header array, externalized as x$bh. Buffer headers keeps track of various attributes and state of buffers in the buffer cache. This Buffer header array is allocated in shared pool. These buffer headers are chained together in a doubly linked list and linked to a hash bucket. There are many hash buckets (# of buckets are derived and governed by _db_block_hash_buckets parameter). Access (both inspect and change) to these hash chains are protected by cache buffers chains latches.

Further, buffer headers can be linked and delinked from hash buckets dynamically.

Simple algorithm to access a buffer is: (I had to deliberately cut out so as not to deviate too much from our primary discussion.)

  1. Hash data block address (DBA: Combination of tablespace, file_id and block_id) to find hash bucket.
  2. Get latch protecting hash bucket.
  3. If (success) then Walk the hash chain reading buffer headers to see if a specific version of the block is already in the chain.
  4. If found, access the buffer in buffer cache, with protection of buffer pin/unpin actions.
    If not found, then find a free buffer in buffer cache, unlink the buffer header for that buffer from its current chain, link that buffer header with this hash chain, release the latch and read block in to that free buffer in buffer cache with buffer header pinned.

  5. If (not success) spin for spin_count times and go to step 2.
  6. If this latch was not got with spinning, then sleep, with increasing exponential back-off sleep time and go to step 2.

Obviously, latches are playing crucial role controlling access to critical resources such as hash chain. My point is that repeated access to few buffers can increase latch activity.

There are many CBC latch children (derived by size of buffer cache). Parameter _db_block_hash_latches control # of latches and derived based upon buffer cache size. Further, In Oracle 10g, sharable latches are used and inspecting an hash chain needs to acquire latches in share mode, which is compatible with other shared mode operations. Note that these undocumented parameters are usually sufficient and changes to these parameters must get approval from Oracle support.

Back to our problem…

Let’s revisit our problem at hand. Wait graph printed above shows that this latch contention is caused by two types of latches. Latch # 127 is simulator lru latch and #122 is cache buffers chains latch.
Keep Reading

Posted in CBO, Oracle database internals, Performance tuning | Tagged: , , , , | 17 Comments »

Log group switch sequence

Posted by Riyaj Shamsudeen on January 11, 2007

Q: How does Oracle determines which log group to use next,
at the onset of log switch ?
Q: How does adding a log group alters the sequence ?

Log group to switch, is determined from controlfile.
Let’s review the controlfile for pertinent information
Dumping the control file using following commands reveals more information:

oradebug setmypid
oradebug dump controlf 12

Above commands generated a trace file in
user_dump_dest directory and following is the excerpt from the file.
Only few lines shown to improve readability.


(blkno = 0×9, size = 72, max = 16, in-use = 8, last-recid= 15)
siz: 0xc8000 seq: 0x0000022b hws: 0×2 bsz: 512 nab: 0×9 flg: 0×1 dup: 1
siz: 0xc8000 seq: 0x0000022e hws: 0×2 bsz: 512 nab: 0×39 flg: 0×1 dup: 1
siz: 0xc8000 seq: 0×00000230 hws: 0×1 bsz: 512 nab: 0xffffffff flg: 0×8 dup: 1
siz: 0×5000 seq: 0×00000228 hws: 0×2 bsz: 512 nab: 0xc flg: 0×1 dup: 1
siz: 0×5000 seq: 0x0000022a hws: 0×2 bsz: 512 nab: 0×4 flg: 0×1 dup: 1
siz: 0×5000 seq: 0x0000022c hws: 0×2 bsz: 512 nab: 0×98 flg: 0×1 dup: 1
siz: 0×5000 seq: 0x0000022f hws: 0×2 bsz: 512 nab: 0xe9 flg: 0×1 dup: 1
siz: 0×5000 seq: 0x0000022d hws: 0×2 bsz: 512 nab: 0×5 flg: 0×1 dup: 1

Sequence# above is log sequence # in hexadecimal.

Oracle determines the next log group to switch:
i) searching for the log group with lowest sequence# that is archived.
ii) If a new log group is added, then that new log group will have log sequence# as zero and so that group will be selected
iii) If there are two new log groups, then first one encountered in the above array is selected.

Third point is quite important, since if we drop a log group, then that log group is simply marked as deleted, not completely removed from the array. So, if the same log group is readded, then deleted entry is reused.

Luckily, above controlfile array is externalized as x$kccle table. Instead of dumping the controlfile, we could use x$kccle to test our understanding:

Following SQL, can predict which log group will be selected by Oracle, during next log switch:

SELECT indx, lenum group#, leseq sequence#, archived,
DECODE (leflg, 8, ‘CURRENT’) curstatus,
CASE WHEN leseq = minseq AND indx = minind THEN ‘NEXT’
FROM (SELECT indx, lenum, leseq, leflg,
DECODE (BITAND (leflg, 1), 0, ‘NO’, ‘YES’) archived,
FIRST_VALUE (leseq) OVER (ORDER BY leseq, indx,BITAND (leflg, 1) DESC) minseq,
FIRST_VALUE (indx) OVER (ORDER BY leseq, indx, BITAND (leflg, 1) DESC) minind
FROM x$kccle
WHERE lesiz != 0 AND inst_id = USERENV (‘instance’))

For RAC:

SELECT lethr, indx, lenum group#, leseq sequence#, archived,
DECODE (leflg, 8, ‘CURRENT’) curstatus,
CASE WHEN leseq = minseq AND indx = minind THEN ‘NEXT’
FROM (SELECT lethr, indx, lenum, leseq, leflg,
DECODE (BITAND (leflg, 1), 0, ‘NO’, ‘YES’) archived,
FIRST_VALUE (leseq) OVER (partition by lethr ORDER BY leseq, indx,BITAND (leflg, 1) DESC) minseq,
FIRST_VALUE (indx) OVER ( partition by lethr ORDER BY leseq, indx, BITAND (leflg, 1) DESC) minind
FROM x$kccle
WHERE lesiz != 0 AND inst_id = USERENV (‘instance’))

See Oracle log sequence for complete test case.

Posted in Oracle database internals | Leave a Comment »


Get every new post delivered to your Inbox.

Join 193 other followers