Oracle database internals by Riyaj

Discussions about Oracle performance tuning, RAC, Oracle internal & E-business suite.

RAC performance tuning: Understanding Global cache performance

Posted by Riyaj Shamsudeen on December 23, 2009

Global cache performance metrics are not correctly measured. It is not understood clearly either. There are even few blogs and web pages disseminating incorrect information. This blog entry is an attempt to offer few methods and scripts to understand global cache performance.

Always review all instances

It is very important to review the performance metrics from all instances in that RAC cluster, not just one instance that you are connected. If you have access to AWR reports, then it is critical to generate AWR reports (or statspack reports) from all instances. But, the problem is that, DBAs tend to generate AWR reports after logging in to each instance iteratively, enter couple of parameters and then reports are generated. Not exactly a convenient practice.

  REM connect to each instance separately, type in the beginning snap_id and ending snap_id for each node etc..
   sqlplus mydba@proddb1 
   @$ORACLE_HOME/rdbms/admin/awrrpt.sql
   exit;
   sqlplus mydba@proddb2
   @$ORACLE_HOME/rdbms/admin/awrrpt.sql
   exit;
   sqlplus mydba@proddb3
   @$ORACLE_HOME/rdbms/admin/awrrpt.sql
   exit;

  

There are few issues with this approach. It is a cumbersome practice if the instance count is higher. In addition to that, all of AWR reports are, in turn, accessing underlying AWR tables. Physically, rows from all instances are together in the same block and so, by executing these reports connecting to various instances, Global cache traffic is increased. If the database is suffering from Global cache (GC) performance issues then generating reports connecting to various instances is probably not a grand idea.

Keep Reading

Posted in Oracle database internals, Performance tuning, RAC | Tagged: , , , , , , , , | 23 Comments »

UKOUG 2009

Posted by Riyaj Shamsudeen on November 24, 2009

Well, it is that time of the year !

Few Presentations

I will be presenting two papers in Birmingham, UK Oracle User Group Technology conference . My sessions are :


  1. Tuesday 10:55-11:55: Battle of the nodes: RAC Performance myths .

  2. Wednesday 15:15-16:00: Advanced performance issues: A close encounter with real world performance issues .

I will upload new versions of these presentations in this blog, few days after the conference. See you in Birmingham.

Update: Papers and presentations can be downloaded following the links below:
1. Battle of the nodes RAC performance myths ppt

2. Battle of the nodes RAC performance myths doc

3. Close encounter with real world and odd perf issues ppt

Posted in Presentations, RAC | 2 Comments »

Is plan_hash_value a final say?

Posted by Riyaj Shamsudeen on September 13, 2009

I was reviewing a performance issue with a client recently. Problem is that increased global cache waits causing application slowdown affecting few critical business functions. Using one of my script gc_traffic.sql > and graphing the results with Excel spreadsheet, it is established that there is a marked increase in GC traffic today compared to week earlier. Similar jobs runs every day and so comparing two week days is sufficient to show the GC traffic increase. Graph is between total blocks and AWR snap time in 30 minutes interval. [Click the picture below to review the graph clearly.]

Identifying the object creating this increased GC traffic is essential to identify root cause. We were able to quickly determine that this increase in GC traffic was localized around few SQL statements using ADDM and AWR reports. We decided to focus on one SQL with an obvious increase in elapsed time compared to prior week. So, first question asked, is there a change in the plan? plan_hash_value was reviewed and quickly determined that there is no change in the plan_hash_value.

Keep Reading

Posted in CBO, Performance tuning, RAC, Uncategorized | Tagged: , , , , , | 7 Comments »

Exciting seminars in Dallas arena

Posted by Riyaj Shamsudeen on August 21, 2009

If you live in Dallas area, there are two very important seminars coming up:

  1. Tanel Poder is presenting his Advanced Oracle troubleshooting seminar on September 9-11, 2009. .
  2. Jonathan Lewis presenting two day seminar cost based optimization and writing efficient SQL on Oct 28 and Oct 29 2009
  3. .

These two gentlemen needs no introduction. Tanel Poder is an expert in Oracle internals and Jonathan Lewis is a Guru in Cost based optimization and writing efficient SQL.

Tell them I sent ya :-)

Update: I just heard from Tanel and he is unable to make it to Dallas this September. He is rescheduling his seminar and I can’t wait for his seminar. I will update this blog with his rescheduled dates.

Posted in CBO, Oracle database internals, Performance tuning, Presentations | 5 Comments »

Shared pool freelists (and durations)

Posted by Riyaj Shamsudeen on August 9, 2009

My earlier blog about shared pool duration got an offline response from one of my reader:
” So, you say that durations aka mini-heaps have been introduced from 10g onwards. I have been using Steve Adams’ script shared_pool_free_lists.sql. Is that not accurate anymore?”

</p

Shared pool free lists

I have a great respect for Steve Adams . In many ways, he has been a great virtual mentor and his insights are so remarkable.

Coming back to the question, I have used Steve’s script before and it is applicable prior to Oracle version 9i. In 9i, sub-heaps were introduced. Further, shared pool durations were introduced in Oracle version 10g. So, his script may not be applicable from version 9i onwards. We will probe this further in this blog.

This is the problem with writing anything about internals stuff, they tend to change from version to version and In many cases, our work can become obsolete in future releases(including this blog!).

Keep Reading

Posted in 11g, Oracle database internals, Performance tuning, shared_pool | Tagged: , , , , , , , , , , , | 2 Comments »

ORA-4031 and Shared Pool Duration

Posted by Riyaj Shamsudeen on August 6, 2009

After reading my earlier post on shared pool A stroll through shared pool heap , one of my client contacted me with an interesting ORA-4031 issue. Client was getting ORA-4031 errors and shared pool size was over 4GB ( in a RAC environment). Client DBA queried v$sgastat to show that there is plenty of free memory in the shared pool. We researched the issue and it is worth blogging. Client DBA was confused as to how there can be ORA-4031 errors when the shared pool free memory is few GBs.

Heapdump Analysis

At this point, it is imperative to take heapdump in level 2 and Level 2 is for the shared pool heap dump. [ Please be warned that it is not advisable to take shared pool heap dumps excessively, as that itself can cause performance issue. During an offline conversation, Tanel Poder said that heapdump can freeze instance as his clients have experienced.]. This will create a trace file in user_dump_dest destination and that trace file is quite useful in analyzing the contents of shared pool heap. Tanel Poder has an excellent script heapdump_analyzer . I modified that script adding code for aggregation at hea, extent and type levels to debug this issue further and it is available as heapdump_dissect.ksh . ( with a special permission from Tanel to publish this script.)

Shared pool review

You can read much more about shared pool in my earlier blog entry posted above. Just as a cursory review, shared pool is split in to multiple sub heaps. In 10g, each of those sub heaps are divided in to even smaller sub heaps, let’s call it mini-heaps. For example, in this specific database, there are three sub heaps. Each of those sub heaps are further split in to four mini-heaps (1,0), (1,1), (1,2) and (1,3) each.

	sga heap(1,0)
	sga heap(1,1)
	sga heap(1,2)
	sga heap(1,3)

	sga heap(2,0)
	sga heap(2,1)
	sga heap(2,2)
	sga heap(2,3)

	sga heap(3,0)
	sga heap(3,1)
	sga heap(3,2)
	sga heap(3,3)	

Keep Reading

Posted in Oracle database internals, Performance tuning, RAC | Tagged: , , , , , , , , , | 15 Comments »

RAC, parallel query and udpsnoop

Posted by Riyaj Shamsudeen on June 20, 2009

I presented about various performance myths in my ‘battle of the nodes’ presentation. One of the myth was that how spawning parallel query slaves across multiple RAC instances can cause major bottleneck in the interconnect. In fact, that myth was direct result of a lessons learnt presentation from a client engagement. Client was suffering from performance issues with enormous global cache waits running in to 30+ms average response time for global cache CR traffic and crippling application performance. Essentially, their data warehouse queries were performing hundreds of parallel queries concurrently with slaves spawning across three node RAC instances.

Of course, I had to hide the client details and simplified using a test case to explain the myth. Looks like either a)my test case is bad or b) some sort of bug I encountered in 9.2.0.5 version c) I made a mistake in my analysis somewhere. Most likely it is the last one :-(. Greg Rahn questioned that example and this topic deserves more research to understand this little bit further. At this point, I don’t have 9.2.0.5 and database is in 10.2.0.4 and so we will test this in 10.2.0.4.

udpsnoop

UDP is one of the protocol used for cache fusion traffic in RAC and it is the Oracle recommended protocol. In this article, UDP traffic size must be measured. Measuring Global cache traffic using AWR reports was not precise. So, I decided to use a dtrace tool kit tool:udpsnoop.d to measure the traffic between RAC nodes. There are two RAC nodes in this setup. You can read more about udpsnoop.d. That tool udpsnoop.d can be downloaded from dtrace toolkit . Output of this script is of the form:

PID        LADDR           LPORT           DR         RADDR           RPORT                 SIZE
---------- --------------- --------------- ---------- --------------- --------------- -----------
15393      1.1.59.192      38395           ->         2.1.59.192      40449                 8240
...

Keep Reading

Posted in Oracle database internals, Performance tuning, RAC | Tagged: , , , , , , , , , , | 15 Comments »

Resolving corruption issue with file_hdrs dump

Posted by Riyaj Shamsudeen on June 18, 2009

One of our client had an interesting and bizarre corruption issue. Intermittently their database is corrupted.

ORA-01171: datafile 178 going offline due to error advancing checkpoint
ORA-01122: database file 178 failed verification check
ORA-01110: data file 178: ‘/app/u04/oradata/somedb/some_data_01.dbf’
ORA-01251: Unknown File Header Version read for file number 178

Error message printed above indicates that file header is corrupted. This is not a one time issue and recurring many times in the past 45 days or so. Database become unusable and only option was to restore the database from backup and recover it. A painful and business impacting issue. Client’s frustration is understandable.

Of course, Client tried normal channels to resolve this problem with out luck. Opening Service tickets with vendors, Relink software stack, update dbid, modify kernel parameters and replace hardware etc. Client has even planned to reinstall OS soon. Still, problem re-occurred every week. That’s when he reached out to us.

file_hdrs dump

Realizing that file header is corrupted, First thing, I wanted to do was that to find what exactly is there in those file headers. What type of corruption in those file headers? Is it a block formatted with null characters? or Is it that few fields are corrupted? This will give more clues and direct us in the right path. So, I asked the client to perform a file_hdrs dump when this problem re-occurs. Following commands were sent to the client.

oradebug setmypid
oradebug dump file_hdrs 10
exit

Above statement is to dump file headers at level 10. oradebug file_hdrs dump will print blocks from the file headers.

oradebug setmypid
oradebug dump controlf 10

Second dump (controlf dump) should print control file records to the trace file. Idea here is to match the file headers and control file file header section to see if we gain more insights in to this corruption issue.

It is quite possible that this could be a control file corruption and control file corruption can’t be ruled out yet. File headers and files can be corrupted. Essentially, this needs cross verification between control file and file headers to identify the root cause. At least, I was hopeful that this will point us in right direction.

dd of file header block

I also requested the client to take a backup of first few blocks of the corrupted file using ‘dd if=filename of=filehdr.out bs=8192 count=2′ command. In UNIX platform, this dd command will copy first two blocks (of block size 8K) in to a flat file. After every corruption, only workaround is to restore and recover the database and I wanted to collect as much information as possible in one round.

What happens on Friday?

Client also pointed out that this issue happens on Fridays only. Various groups were checking UNIX, SAN layers to see if there is anything special about Friday, looking for maintenance jobs that runs on Friday etc. Nothing of any significance shows up. At this point, we were waiting for the problem to reoccur.

And then it did…

file_hdr and controlf trace files

We received trace files from the client and reviewed it. We will concentrate on just one corrupted data file, even though 68 files were corrupted at this point.

control file section for that file printed below:

DATA FILE #32: 
(name #41) /app/u04/oradata/somedb/somedata1.dbf
 creation size=12800 block size=8192 status=0x1c head=41 tail=41 dup=1
 tablespace 32, index=33 krfil=32 prev_file=0
 unrecoverable scn: 0x0000.00000000 01/01/1988 00:00:00
 Checkpoint cnt:48 scn: 0x0910.62fd2f34 05/08/2009 10:30:32
 Stop scn: 0x0910.6305098e 05/08/2009 17:42:13
 Creation Checkpointed at scn:  0x0000.000935a6 04/30/2009 21:42:29

Nothing jumps out. Checkpoint SCN is recent enough [ Of course, this problem happened in May 2009].
Let’s review the file header section for that corrupted file. Clearly , there is a corruption since controlf dump command itself is unable to print correct file header version.

 
File header version cannot be determined due to corruption <<<<
Dump may be suspect
 V10 STYLE FILE HEADER:
	Compatibility Vsn = 169870080=0xa200300
	Db ID=1755378070=0x68a0f196, Db Name='SOMEDB'
	Activation ID=0=0x0
	Control Seq=3032023=0x2e43d7, File size=401792=0x62180
	File Number=18, Blksiz=8192, File Type=3 DATA

If the file header is filled with null then above section starting with ‘V10 STYLE FILE HEADER:’ can not be printed. oradebug command is complaining that file header version can not be determined, while also printing fields from file header. Confusing and that doesn’t make sense: File header version is corrupt but fields from file headers can be printed. This means that we might need to review the output of dd command. But, before going that route , decided to compare corrupted file header output with another uncorrupted file in the control file trace file to see if we can spot any obvious issue(s).

controlf trace file for an uncorrupted file

Following first two sections prints the control file record section for a uncorrupted file. Checkpoint scn time line is matching for corrupted and uncorrupted data file records. This gives us a clue that control file record for these files may not be corrupted. Genuinely, there must be a corruption in the file header.

DATA FILE #33: 
  (name #42) /app/u03/oradata/qnoldv01/ANOTHER_GOOD_FILE_01.dbf
creation size=12800 block size=8192 status=0xe head=42 tail=42 dup=1
 tablespace 33, index=34 krfil=33 prev_file=0
 unrecoverable scn: 0x0000.00000000 01/01/1988 00:00:00
 Checkpoint cnt:53 scn: 0x0910.641e926d 05/13/2009 16:25:26
 Stop scn: 0xffff.ffffffff 05/02/2009 13:41:42
 Creation Checkpointed at scn:  0x0000.000935c5 04/30/2009 21:42:31
 thread:1 rba:(0x3.e3bbd.10)

Let’s also review the file header section of an uncorrupted data file.

 Hot Backup end marker scn: 0x0000.00000000
 aux_file is NOT DEFINED 
 V10 STYLE FILE HEADER:
	Compatibility Vsn = 169870080=0xa200300
	Db ID=1908337442=0x71beeb22, Db Name='SOMEDB'
	Activation ID=0=0x0
	Control Seq=34895=0x884f, File size=2434616=0x252638
	File Number=33, Blksiz=8192, File Type=3 DATA

Comparison of corrupted and uncorrupted file is not showing any insights either. OR Is it?

Not so fast…

There is a difference in the file header section between these these two file headers. Let me reprint both these file headers side-by-side to see if the difference can be spotted easily.

Corrupted file:

 
File header version cannot be determined due to corruption
Dump may be suspect
 V10 STYLE FILE HEADER:
	Compatibility Vsn = 169870080=0xa200300
	Db ID=1755378070=0x68a0f196, Db Name='SOMEDB'
	Activation ID=0=0x0
	Control Seq=3032023=0x2e43d7, File size=401792=0x62180
	File Number=18, Blksiz=8192, File Type=3 DATA

Uncorrupted file:

 Hot Backup end marker scn: 0x0000.00000000
 aux_file is NOT DEFINED 
 V10 STYLE FILE HEADER:
	Compatibility Vsn = 169870080=0xa200300
	Db ID=1908337442=0x71beeb22, Db Name='SOMEDB'
	Activation ID=0=0x0
	Control Seq=34895=0x884f, File size=2434616=0x252638
	File Number=33, Blksiz=8192, File Type=3 DATA

Ahh.. Right there. DBID is different between these two files! How can the DBID be different in this situation? Client assured me that there is no transportable tablespace or read only tablespaces shared between databases going on here. How can the dbids different between two files in the same database? These tablespaces are both opened with read and write activity. We need to follow this clue.

Corrupted file:     :Db ID=1755378070=0x68a0f196 , Db Name='SOMEDB'
...
Uncorruped file     :Db ID=1908337442=0x71beeb22 , Db Name='SOMEDB'

After pointing out that dbid is different between uncorrupted and corrupted files, client dumped control files from other development and test databases to see if they can locate a db with dbid 1755378070. No databases had the same dbid.

Finally..

With dbid difference in mind, few hours later, client DBA recollected something. When they migrated from old database server to new database server, they used scp to move the database files from old server to new server. Of course, client DBA did not want scp connection to die when his VPN connection dies and so wrote a script to scp the files from cron entry as any good DBA will do.

After migration, system admins changed oracle password. Machine was supposed to be shutdown and kept in pristine condition for few weeks before returning the hardware. Unfortunately, it was not down.

I am sure, you figured out where I am going with it. There was a cron entry scheduled in that old DB server and that was quietly scp’ing the files from old database server to new database server corrupting the database. Client disabled that cron entry and also created new ssh key to avoid these issues.

I heard that client is able to enjoy his Fridays :-)
This blog can be read in a traditional format resolving_corruption_issue_with_filehdr_dumps.
PS: Thanks to the client for allowing me to blog about this issue.

Posted in corruption, Oracle database internals, recovery | Tagged: , , , , | 7 Comments »

Library cache lock and library cache pin waits

Posted by Riyaj Shamsudeen on June 2, 2009

I encountered few customer issues centered around library cache lock and library cache pin waits. Library cache lock and pin waits can hang instance, and in few cases, whole clusters of RAC instances can be hung due to library cache lock and pin waits.

Why Library cache locks are needed?

Library cache locks aka parse locks are needed to maintain dependency mechanism between objects and their dependent objects like SQL etc. For example, if an object definition need to be modified or if parse locks are to be broken, then dependent objects objects must be invalidated. This dependency is maintained using library cache locks. For example, if a column is dropped from a table then all SQLs dependent upon the table must be invalidated and reparsed during next access to that object. Library cache locks are designed to implement this tracking mechanism.

In a regular enqueue locking scenarios there is a resource (example TM table level lock) and sessions enqueue to lock that resource. More discussion on enqueue locking can be found in Internal of locks. Similarly, library cache locks uses object handles as resource structures and locks are taken on that resource. If the resources are not available in a compatible mode, then sessions must wait for library cache objects to be available.

Why Library cache pins are needed?

Library cache pins deals with current execution of dependent objects. For example, an underlying objects should not be modified when a session is executing or accessing a dependent object. So, before parse locks on a library cache object can be broken, library cache pins must be acquired in an Exclusive mode and then only library cache objects can be dropped. If a session is executing a stored object, then the library cache pins will not be available and there will be waits for library cache pins. Typically, this happens for a long running statement executing a stored object.

x$kgllk, x$kglpn and x$kglob

Library cache locks and pins are externalized in three x$ tables. x$kgllk is externalizing all locking structures on an object. Entries in x$kglob acts as a resource structure. x$kglpn is externalizing all library cache pins.

x$kglob.kglhdadr acts as a pointer to the resource structure. Presumably, kglhdadr stands KGL handle address. x$kgllk acts as a lock structure and x$kgllk.kgllkhdl points to x$kglob.kglhdadr. Also, x$kglpn acts as a pin stucture and x$kglpn.kglpnhdl points to x$kglob.kglhdadr to pin a resource. To give an analogy between object locking scenarios, x$kglob acts as resource structure and x$kgllk acts as lock structures for library cache locks. For library cache pins, x$kglpn acts as pin structure. x$kglpn also pins that resource using kglpnhdl. This might be clear after reviewing the example below.

Test case

We will create a simple test case to create library cache locks and pin waits

create or replace procedure backup.test_kgllk (l_sleep in boolean , l_compile in boolean)
as
 begin
  if (l_sleep ) then
	sys.dbms_lock.sleep(60);
  elsif (l_compile )  then
  	execute immediate 'alter procedure test_kgllk compile';
  end if;
 end;
/

In this test case above, we create a procedure and it accepts two boolean parameters: sleep and compile. Passing true to first argument will enable the procedure to sleep for a minute and passing true for the second argument will enable the procedure to recompile itself.

Let’s create two sessions in the database and then execute them as below.

Session #1: exec test_kgllk ( true, false); — Sleep for 1 minutes and no compile
Session #2: exec test_kgllk ( false, true); — No sleep,but compile..

At this point both sessions are waiting. Following SQL can be used to print session wait details.

select
 distinct
   ses.ksusenum sid, ses.ksuseser serial#, ses.ksuudlna username,ses.ksuseunm machine,
   ob.kglnaown obj_owner, ob.kglnaobj obj_name
   ,pn.kglpncnt pin_cnt, pn.kglpnmod pin_mode, pn.kglpnreq pin_req
   , w.state, w.event, w.wait_Time, w.seconds_in_Wait
   -- lk.kglnaobj, lk.user_name, lk.kgllksnm,
   --,lk.kgllkhdl,lk.kglhdpar
   --,trim(lk.kgllkcnt) lock_cnt, lk.kgllkmod lock_mode, lk.kgllkreq lock_req,
   --,lk.kgllkpns, lk.kgllkpnc,pn.kglpnhdl
 from
  x$kglpn pn,  x$kglob ob,x$ksuse ses 
   , v$session_wait w
where pn.kglpnhdl in
(select kglpnhdl from x$kglpn where kglpnreq >0 )
and ob.kglhdadr = pn.kglpnhdl
and pn.kglpnuse = ses.addr
and w.sid = ses.indx
order by seconds_in_wait desc
/

Output of above SQL is:

                                                                pin  pin  pin                                 wait seconds
  SID   SERIAL# USERNAME     MACHINE   OBJ_OWNER  OBJ_NAME      cnt  mode req  STATE      EVENT               time in_wait
----- --------- ------------ --------- ---------- ------------- ---- ---- ---- ---------- ------------------- ----- -------
  268     12409 SYS          orap      SYS        TEST_KGLLK    3    2    0    WAITING    PL/SQL lock timer       0       7
  313     45572 SYS          orap      SYS        TEST_KGLLK    0    0    3    WAITING    library cache pin       0       3
  313     45572 SYS          orap      SYS        TEST_KGLLK    3    2    0    WAITING    library cache pin       0       3

  1. Session 268 (session #1) is sleeping while holding library cache pin on test_kgllk object (waiting on PL/SQL lock timer more accurately).
  2. Session 313 is holding library cache pin in mode 2 and waiting for library cache pin in mode 3.

Obviously, session 313 is waiting for session 268 to release library cache pins. Since session 268 is executing, session 313 should not be allowed to modify test_kgllk library cache object. That’s exactly why library cache pins are needed.

Adding another session to this mix..

Let’s add one more session as below

 
exec test_kgllk (false, true);

Output of above query is:

 
                                                                                   pin  pin  pin                                            wait seconds
  SID   SERIAL# USERNAME     MACHINE              OBJ_OWNER  OBJ_NAME              cnt mode  req STATE      EVENT                           time in_wait
----- --------- ------------ -------------------- ---------- -------------------- ---- ---- ---- ---------- ------------------------------ ----- -------
  268     12409 SYS          oraperf              SYS        TEST_KGLLK              3    2    0 WAITING    PL/SQL lock timer                  0      34
  313     45572 SYS          oraperf              SYS        TEST_KGLLK              0    0    3 WAITING    library cache pin                  0      29
  313     45572 SYS          oraperf              SYS        TEST_KGLLK              3    2    0 WAITING    library cache pin                  0      29
  442      4142 SYS          oraperf              SYS        TEST_KGLLK              0    0    2 WAITING    library cache pin                  0       3

Well, no surprise there. New session 442 also waiting for library cache pin. But, notice the request mode for session 442. It is 2. Session 442 needs that library cache pin in share mode to start execution. But 313 has already requested that library cache pin in mode 3. A queue is building up here. Many processes can queue behind session 313 at this point leading to an hung instance.

library cache locks..

Let’s execute same package but both with same parameters.

 Session #1: exec test_kgllk(false, true);
 Session #2: exec test_kgllk(false, true);

Rerunning above query tells us that session 313 is waiting for the self. Eventually, this will lead library cache pin self deadlock.


Library cache pin holders/waiters
---------------------------------
                                                                                   pin  pin  pin                                            wait seconds
  SID   SERIAL# USERNAME     MACHINE              OBJ_OWNER  OBJ_NAME              cnt mode  req STATE      EVENT                           time in_wait
----- --------- ------------ -------------------- ---------- -------------------- ---- ---- ---- ---------- ------------------------------ ----- -------
  313     45572 SYS          oraperf              SYS        TEST_KGLLK              0    0    3 WAITING    library cache pin                  0      26
  313     45572 SYS          oraperf              SYS        TEST_KGLLK              3    2    0 WAITING    library cache pin                  0      26

Wait, what happened to session #2? It is not visible in x$kglpn. Querying v$session_wait shows that Session #2 is waiting for library cache lock. We will run yet another query against x$kgllk to see library cache lock waits.

 
  Querying x$kgllk with the query below: 
select
 distinct
   ses.ksusenum sid, ses.ksuseser serial#, ses.ksuudlna username,KSUSEMNM module,
   ob.kglnaown obj_owner, ob.kglnaobj obj_name
   ,lk.kgllkcnt lck_cnt, lk.kgllkmod lock_mode, lk.kgllkreq lock_req
   , w.state, w.event, w.wait_Time, w.seconds_in_Wait
 from
  x$kgllk lk,  x$kglob ob,x$ksuse ses
  , v$session_wait w
where lk.kgllkhdl in
(select kgllkhdl from x$kgllk where kgllkreq >0 )
and ob.kglhdadr = lk.kgllkhdl
and lk.kgllkuse = ses.addr
and w.sid = ses.indx
order by seconds_in_wait desc
/

Library cache lock holders/waiters
---------------------------------
                                                                                   lock lock                                            wait seconds
  SID   SERIAL# USERNAME     MODULE     OBJ_OWNER  OBJ_NAME                LCK_CNT mode  req STATE      EVENT                           time in_wait
----- --------- ------------ ---------- ---------- -------------------- ---------- ---- ---- ---------- ------------------------------ ----- -------
  313     45572 SYS          wsqfinc1a  SYS        TEST_KGLLK                    1    1    0 WAITING    library cache pin                  0      29
  313     45572 SYS          wsqfinc1a  SYS        TEST_KGLLK                    1    3    0 WAITING    library cache pin                  0      29
  268     12409 SYS          wsqfinc1a  SYS        TEST_KGLLK                    0    0    2 WAITING    library cache lock                 0      12
  268     12409 SYS          wsqfinc1a  SYS        TEST_KGLLK                    1    1    0 WAITING    library cache lock                 0      12

Session 313 is holding library cache lock on that object in mode 3 and session 268 is requesting lock on that library cache object in mode 2. So, session 268 is waiting for library cache lock while session 313 is waiting for library cache pin (self ). Again, point here is that session 268 is trying to access library cache object and need to acquire library cache lock in correct mode. That library cache lock is not available leading to a wait.

Complete script can be downloaded from my script archive.

RAC, library cache locks and pins

Things are different in RAC. Library cache locks and pins are global resources controlled by GES layer. So, these scripts might not work if these library cache lock and pin waits are global events. Let’s look at what happens in a RAC environment

exec test_kgllk ( false, true); — node 1
exec test_kgllk ( false, true); — node 2

In node1, only one session is visible.

Library cache pin holders/waiters
----------------------------------
                                                                                   pin  pin  pin                                            wait seconds
  SID   SERIAL# USERNAME     MACHINE              OBJ_OWNER  OBJ_NAME              cnt mode  req STATE      EVENT                           time in_wait
----- --------- ------------ -------------------- ---------- -------------------- ---- ---- ---- ---------- ------------------------------ ----- -------
  268     12409 SYS          oraperf              SYS        TEST_KGLLK              0    0    3 WAITING    library cache pin                  0      18
  268     12409 SYS          oraperf              SYS        TEST_KGLLK              3    2    0 WAITING    library cache pin                  0      18

 In node 2, only requestor of the lock is visible. 

lock lock wait seconds SID SERIAL# USERNAME MODULE OBJ_OWNER OBJ_NAME LCK_CNT mode req STATE EVENT time in_wait ----- --------- ------------ ---------- ---------- -------------------- ---------- ---- ---- ---------- ------------------------------ ----- ------- 377 43558 SYS wsqfinc2a SYS TEST_KGLLK 0 0 2 WAITING library cache lock 0 86

Essentially, this script does not work in a RAC environment since it accesses x$ tables directly, which are local to an instance. To understand the issue in a RAC environment we need to access gv$ views, based on x$kgllk, x$kglpn etc. But, I don’t see gv$ views over these x$ tables. We are out of luck there unless we do some more coding.

Nevertheless, we can see lockers and waiters accessing gv$ges_blocking_enqneue to understand locking in RAC.

  1  select inst_id, handle, grant_level, request_level, resource_name1, resource_name2, pid , transaction_id0, transaction_id1
  2* ,owner_node, blocked, blocker, state from gv$ges_blocking_enqueue
SQL> /

   INST_ID HANDLE           GRANT_LEV REQUEST_L RESOURCE_NAME1                 RESOURCE_NAME2                        PID
---------- ---------------- --------- --------- ------------------------------ ------------------------------ ----------
TRANSACTION_ID0 TRANSACTION_ID1 OWNER_NODE    BLOCKED    BLOCKER
--------------- --------------- ---------- ---------- ----------
STATE
----------------------------------------------------------------
         2 00000008DD779258 KJUSERNL  KJUSERPR  [0x45993b44][0x3a1b9eee],[LB]  1167670084,974888686,LB              8700
              0               0          1          1          0
OPENING

         1 00000008E8123878 KJUSEREX  KJUSEREX  [0x45993b44][0x3a1b9eee],[LB]  1167670084,974888686,LB             12741
              0               0          0          0          1
GRANTED


We can see that PID 12741 from instance 1 is holding a library cache global lock [LB]. Global resource in this case is [0x45993b44][0x3a1b9eee],[LB] which uniquely identifies a library cache object at the cluster level. Grant_level is KJUSEREX or Exclusive level and request_level from node 2 is KJUSERPR which is Protected Read level. PID 8700 in node 2 is waiting for library cache lock held by PID 12741 in node1. Using this output and our script output, we can pin point which process is holding library cache lock or pin. While Library cache locks are globalized as global locks in the range of [LA] – [LZ], Library cache pins are also globalized as lock types in the range [NA]-[NZ].

This blog can be read in a document format from
Library_cache_locks_and_library_cache_pin_waits
Update #1: Updated broken links.
Update #2: Updated verbatim after a reader comment.

Posted in Oracle database internals, Performance tuning | Tagged: , , , , , , , | 34 Comments »

COLLABORATE 2009 presentation: 11g performance new feature.

Posted by Riyaj Shamsudeen on May 6, 2009

I just presented about Oracle 11g new features specific to performance in COLLABORATE 2009, Orlando Florida. You can download presentation and paper from here:

11g performance specific new features – presentation
11g performance specific new features – paper

I met couple of new friends and many familiar faces.

Catherin Devlin introduced sqlpython tool to me. This tool is an user friendly replacement for sqlplus with many UNIX like goodies. This tool will be an useful addition to command line tools. You can see installation instructions for sqlpython here.

I also attended an interesting application performance panel discussion with fellow Oakie Mark W Farnheim, Mike Brown and Sandra Vucinic. We discussed few customer issues.

An interesting problem discussion is worth mentioning. A client clones production database to development database using full rapidclone methodology. But, access plans for a SQL statement is different between production and development, even though everything was exactly the same. I am hoping, that client will send 10053 trace files from both databases and I will blog about it if I get anything from that client. I think, bind peeking is causing the plan difference, but need to analyze 10053 trace file to confirm that. Another possibility is that, may be , he is not cloning Oracle Software and some software patches are not existing in cloned environment. Need trace files to proceed further here.

Of course, I met few other friends Jeremy Schneider, Dan Norris to mention few.

Posted in 11g, CBO, EBS11i, Oracle database internals, Performance tuning, Presentations | Tagged: , , , , , , , , , , | Leave a Comment »

 
Follow

Get every new post delivered to your Inbox.

Join 200 other followers