Oracle database internals by Riyaj

Discussions about Oracle performance tuning, RAC, Oracle internal & E-business suite.

Archive for the ‘RAC’ Category


Posted by Riyaj Shamsudeen on April 29, 2012

We know that database blocks are transferred between the nodes through the interconnect, aka cache fusion traffic. Common misconception is that packet transfer size is always database block size for block transfer (Of course, messages are smaller in size). That’s not entirely true. There is an optimization in the cache fusion code to reduce the packet size (and so reduces the bits transferred over the private network). Don’t confuse this note with Jumbo frames and MTU size, this note is independent of MTU setting.

In a nutshell, if free space in a block exceeds a threshold (_gc_fusion_compression) then instead of sending the whole block, LMS sends a smaller packet, reducing private network traffic bits. Let me give an example to illustrate my point. Let’s say that the database block size is 8192 and a block to be transferred is a recently NEWed block, say, with 4000 bytes of free space. Transfer of this block over the interconnect from one node to another node in the cluster will result in a packet size of ~4200 bytes. Transfer of bytes representing free space can be avoided completely, just a symbolic notation of free space begin offset and free space end offset is good enough to reconstruct the block in the receiving side without any loss of data.This optimization makes sense as there is no need to clog the network unnecessarily.

Read the rest of this entry »

Posted in 11g, Oracle database internals, Performance tuning, RAC | Tagged: , , , | 9 Comments »


Posted by Riyaj Shamsudeen on April 19, 2012

If you are attending Collaborate 2012, you might be interested in my content-rich sessions below :

Session Number: 326
Session Title: SCAN, VIP, HAIP, and other RAC acronyms
Session Date/Time/Room: Tue, Apr 24, 2012 (10:45 AM – 11:45 AM) : Surf C

Session Number: 327
Session Title: Internals and Performance Boot Camp: Truss, pstack, pmap, and more
Session Date/Time/Room: Wed, Apr 25, 2012 (03:00 PM – 04:00 PM) : Palm A

Hope to see you there!

Update: I am uploading presentation files. Presentations are much more recent than the document 🙂


Thanks for attending!

Posted in Oracle database internals, Performance tuning, Presentations, RAC | Tagged: , , , , , , , , , , , , | 1 Comment »

gc buffer busy acquire vs release

Posted by Riyaj Shamsudeen on April 19, 2012

Last week (March 2012), I was conducting Advanced RAC Training online. During the class, I was recreating a ‘gc buffer busy’ waits to explain the concepts and methods to troubleshoot the issue.


Let’s define these events first. Event ‘gc buffer busy’ event means that a session is trying to access a buffer,but there is an open request for Global cache lock for that block already, and so, the session must wait for the GC lock request to complete before proceeding. This wait is instrumented as ‘gc buffer busy’ event.

From 11g onwards, this wait event is split in to ‘gc buffer busy acquire’ and ‘gc buffer busy release’. An attendee asked me to show the differentiation between these two wait events. Fortunately, we had a problem with LGWR writes and we were able to inspect the waits with much clarity during the class.

Remember that Global cache enqueues are considered to be owned by an instance. From 11g onwards, gc buffer busy event differentiated between two cases:

  1. If existing GC open request originated from the local instance, then current session will wait for ‘gc buffer busy acquire’. Essentially, current process is waiting for another process in the local instance to acquire GC lock, on behalf of the local instance. Once GC lock is acquired, current process can access that buffer without additional GC processing (if the lock is acquired in a compatible mode).
  2. If existing GC open request originated from a remote instance, then current session will wait for ‘gc buffer busy release’ event. In this case session is waiting for another remote session (hence another instance) to release the GC lock, so that local instance can acquire buffer.


Following output should show the differentiation with much clarity.

Read the rest of this entry »

Posted in 11g, Oracle database internals, Performance tuning, RAC | Tagged: , , , , | 19 Comments »

Next Advanced RAC Training class scheduled.

Posted by Riyaj Shamsudeen on February 27, 2012

Just a quick note, I will be conducting next 2-week advanced RAC Training, online class in March 26-30 and April 9-13.

You can find agenda and other details here .

Update 1:
I received few emails about the training outline. You can find the outline below:


Update 2:
Yes, I do accept Purchase Orders and can invoice you 🙂

Posted in Presentations, RAC | Tagged: , , | 1 Comment »

Temporary tablespaces in RAC

Posted by Riyaj Shamsudeen on February 13, 2012

Temporary tablespaces are shared objects and they are associated to an user or whole database (using default temporary tablespace). So, in RAC, temporary tablespaces are shared between the instances. Many temporary tablespaces can be created in a database, but all of those temporary tablespaces are shared between the instances. Hence, temporary tablespaces must be allocated in shared storage or ASM. We will explore the space allocation in temporary tablespace in RAC, in this blog entry.

In contrast, UNDO tablespaces are owned by an instance and all transactions from that instance is exclusively allocated in that UNDO tablespace. Remember that other instances can read blocks from remote undo tablespace, and so, undo tablespaces also must be allocated from shared storage or ASM.

Space allocation in TEMP tablespace

TEMP tablespaces are divided in to extents (In 11.2, extent size is 1M, not sure whether the size of an extent is controllable or not). These extent maps are cached in local SGA, essentially, soft reserving those extents for the use of sessions connecting to that instance. But, note that, extents in a temporary tablespace are not cached at instance startup, instead instance caches the extents as the need arises. We will explore this with a small example:

Read the rest of this entry »

Posted in 11g, Oracle database internals, Performance tuning, RAC | Tagged: , , , , , , , | 16 Comments »


Posted by Riyaj Shamsudeen on February 10, 2012

I just uploaded my presentation materials for ‘Truss, pstack etc’ for HOTSOS 2012 symposium , a performance intensive conference, happening right here in my home town Dallas, TX.

I can’t believe, it is been ten years from the start of this annual conference! This is the tenth annual symposium and I have been presenting in this symposium for almost all years except few early years. Quality of presentations and quality of audience is very high in this symposium and many of the audience are repeat audience, almost this feels like an annual pilgrimage to “sanctum of performance”. If you are interested in learning the techniques and methods to debug and resolve performance issues in a correct way, you should definitely consider attending this symposium. To top it off, Jonathan Lewis is conducting Training Day this year.

There are many great authors talking in this symposium.

Let me take this opportunity to welcome you to Dallas and encourage you to attend this symposium !
PS: Mark Bobak has been presenting or attending all ten years of this symposium, kudos Mark! And, Yes, that’s the same Mark Bobak who is the list admin for that most famous oracle-l mailing list.
PPS: Even though I live in Texas, no, I do not ride an horse to commute, instead, I drive a Ford Mustang car.(Incidentally, Mustang means “a small breed of horse, often wild or half wild, found in the southwestern US” as defined by, So, after all, I am driving a string of wild horse).

Posted in Performance tuning, Presentations, RAC | Tagged: , , | 1 Comment »

My sessions in RMOUG 2012

Posted by Riyaj Shamsudeen on February 10, 2012

I will be leaving to Denver in few days to talk about the following presentations in RMOUG 2012. Stop by and say hello to me if you intend to attend RMOUG training days.

My sessions in RMOUG 2012 are

  1. Room 402:Session 2: Parallel Execution in RAC – Wednesday 10:45 AM to 11:45AM
  2. Room 4f: Session 10: Troubleshooting RAC background processes – Thursday 1:30PM to 2:30PM
  3. Room 4f: Session 11: A kind and Gentle introduction to RAC – Thursday 2:45 PM to 3:45 PM

Hope to see you there.

Posted in Performance tuning, Presentations, RAC | Tagged: , , , | 2 Comments »

Nologging redo size

Posted by Riyaj Shamsudeen on January 25, 2012

It is probably easy to calculate hourly redo rate or daily redo rate using AWR data. For example, my script awr_redo_size.sql can be used to calculate daily redo rate, and awr_redo_size_history.sql can be used to calculate hourly redo rate. Hourly redo rate is especially useful since you can export to an excel spreadsheet, graph it to see redo rate trend.

Update: I added another script to calculate redo rate if you don’t have AWR license. redo_size_archived_log.sql.

Introduction to Direct Mode Writes

Direct mode operations write directly in to the database file skipping buffer cache. Minimal redo(aka invalidation redo) is generated, if the database is not in force logging mode. Keeping the database in no force logging mode is peachy as long as you don’t use Data guard, Streams, or Golden Gate.

Suddenly, business decide to use one of these log mining based replication products. This means that you must turn on Force logging at the database level so that replication tools can capture (just replay in the case of Data guard) the redo information correctly and consistently.

But, what if your application performs high amount of direct mode operation, such as insert /*+ append */ operations? Now, you need to estimate the redo size to identify the effect of FORCE LOGGING mode That estimation gets little tricky.
But wait, there’s more!

Posted in Oracle database internals, Performance tuning, RAC | Tagged: , , , , | 8 Comments »

Video: deep review of LMS

Posted by Riyaj Shamsudeen on January 20, 2012

This video was created circa July 2011. Click the Read More link to review the video. Version Oracle Database

Synopsis: Essentially, we probe the importance of LMS processes using DTrace. Explain why LMS should run in elevated priority. How to review deep statistics about LMS processes and much more.

Posted in Oracle database internals, Performance tuning, Presentations, RAC, video | Tagged: , , , , | 7 Comments »

SCN – What, why, and how?

Posted by Riyaj Shamsudeen on January 19, 2012

In this blog entry, we will explore the wonderful world of SCNs and how Oracle database uses SCN internally. We will also explore few new bugs and clarify few misconceptions about SCN itself.

What is SCN?

SCN (System Change Number) is a primary mechanism to maintain data consistency in Oracle database. SCN is used primarily in the following areas, of course, this is not a complete list:

  1. Every redo record has an SCN version of the redo record in the redo header (and redo records can have non-unique SCN). Given redo records from two threads (as in the case of RAC), Recovery will order them in SCN order, essentially maintaining a strict sequential order. As explained in my paper, every redo record has multiple change vectors too.
  2. Every data block also has block SCN (aka block version). In addition to that, a change vector in a redo record also has expected block SCN. This means that a change vector can be applied to one and only version of the block. Code checks if the target SCN in a change vector is matching with the block SCN before applying the redo record. If there is a mismatch, corruption errors are thrown.
  3. Read consistency also uses SCN. Every query has query environment which includes an SCN at the start of the query. A session can see the transactional changes only if that transaction commit SCN is lower then the query environment SCN.
  4. Commit. Every commit will generate SCN, aka commit SCN, that marks a transaction boundary. Group commits are possible too.

SCN format

SCN is a huge number with two components to it: Base and wrap. Wrap is a 16 bit number and base is a 32 bit number. It is of the format wrap.base. When the base exceeds 4 billion, then the wrap is incremented by 1. Essentially, wrap counts the number of  times base wrapped around 4 billion. Few simple SQL script will enumerate this better:
But wait, there’s more!

Posted in 11g, corruption, Oracle database internals, Performance tuning, RAC, recovery | Tagged: , , , , , , , | 36 Comments »