Oracle database internals by Riyaj

Discussions about Oracle performance tuning, RAC, Oracle internal & E-business suite.

Archive for the ‘11g’ Category

IOUG Collaborate 2017

Posted by Riyaj Shamsudeen on March 30, 2017

I have a few presentations scheduled in IOUG Collaborate 2017, Las Vegas.

1. Session: 621: RAC Clusterware internals
**Date/time: Mon, Apr 03, 2017 (01:30 PM – 02:30 PM) : Jasmine A

2. Session: RAC cache fusion internals. ( OakTable track )
**Date/time: Wed, Apr 05, 2017 (09:45 AM – 10:45 AM) Room: South Seas C

3. Session: 479: An in-depth review of ASM and internals
**Date/time: Wed, Apr 05, 2017 (02:45 PM – 03:45 PM) : Palm B

Presentation files ( Updated after the presentations).

Thank you for coming to my presentation(s).

ASM review and internals
clusterware internals

Come to Vegas 🙂

( Hopefully, I will have better luck in Blackjack tables too. That reminds me, I need to practice in the blackjack simulator 🙂 )

Posted in 11g, 12c, Oracle database internals, Performance tuning, Presentations, RAC | Tagged: , , , , , , , , | Leave a Comment »

RMOUG Training Days 2015

Posted by Riyaj Shamsudeen on January 19, 2015

I will be talking in Rocky Mountain Oracle User Group Training Days 2015(, with live demos (hopefully there will be no failures in the demo). My topics are:

Feb 17: Deep dive: 3:15PM to 5:15PM – RAC 12c optimization: I will discuss RAC global cache layer in detail with a few demos. You probably can’t find these deep Global Cache layer details anywhere else 🙂

Feb 19: Wednesday: 2:45PM to 3:45PM – Advanced UNIX tools: I will discuss both Solaris and Linux advanced tools to debug deep performance issues.

Feb 19: Wednesday: 12:15PM – 1:15PM – Exadata SIG panel with Alex Fatkulin.

Come to Denver. Come on, it won’t be cold ( I think 🙂 )

Uploading presentation and scripts of the presentation files: RMOUG_2015_Riyaj_RAC_12c_optim files

Presentation files for Unix tools demos: RMOUG_2015_Riyaj_Unix_tools

Posted in 11g, 12c, Presentations, RAC | Tagged: , , , | 1 Comment »

Visualizing AWR data using python

Posted by Riyaj Shamsudeen on August 1, 2014

In my earlier post, I talked about, how tableau can be used to visualize the data. In some cases, I find it useful to query AWR base tables directly using Python and graph it using matplotlib package quickly. Since python is preinstalled in almost all computers, I think, this method will be useful for almost everyone. Of course, you may not have all necessary packages installed in your computer, you can install the packages using install python packages . Of course, if you improve the script, please send it to me, I will share it in this blog entry.

Script is available as a zip file:


Script usage is straight forward. Unzip the zip file and you will have a .py script in the current directory. Execute the script (after adjusting permissions of the script) using the format described below:

# To graph the events for the past 60 days, for inst_id=1, connecting to PROD, with username system. 
./ -d PROD -u system -n 'latch free' -t e -i 1
# To graph the statistics for the past 60 days, for inst_id=2, connecting to PROD
./ -d PRD -u system  -n 'physical reads' -t s -i 2

A typical graph from the above script is:



Posted in 11g, 12c | Tagged: , | 1 Comment »

Data visualization, px qref waits, and a kernel bug!

Posted by Riyaj Shamsudeen on July 2, 2014

Data visualization is a useful method to identify performance patterns. In most cases, I pull custom performance metrics from AWR repository and use tableau to visualize the data. Of course, you can do the visualization using excel spreadsheet too.

Problem definition
We had huge amount of PX qref waits in a database:

                                            Tota    Wait   % DB
Event                                 Waits Time Avg(ms)   time Wait Class
------------------------------ ------------ ---- ------- ------ ----------
PX qref latch                    63,669,198 3341       0   34.2 Other
DB CPU                                      3031           31.1
direct path read temp                92,996 199.       2    2.0 User I/O
direct path write temp               12,029 109.       9    1.1 User I/O
direct path read                      5,454 61.1      11     .6 User I/O

Read the rest of this entry »

Posted in 11g, 12c, Oracle database internals, Performance tuning, weird stuff | Tagged: , , , , , , , | 5 Comments »

Golden rules of RAC performance diagnostics

Posted by Riyaj Shamsudeen on March 20, 2014

After collaborating with many performance engineers in a RAC database, I have come to realize that there are common pattern among the (mis)diagnosis. This blog about discussing those issues. I talked about this in Hotsos 2014 conference also.

Golden rules

Here are the golden rules of RAC performance diagnostics. These rules may not apply general RAC configuration issues though.

  1. Beware of top event tunnel vision
  2. Eliminate infrastructure as an issue
  3. Identify problem-inducing instance
  4. Review send-side metrics also
  5. Use histograms, not just averages

Looks like, this may be better read as a document. So, please use the pdf files of the presentation and a paper. Presentation slide #10 shows indepth coverage on gc buffer busy* wait events. I will try to blog about that slide later (hopefully).

Golden rules of RAC diagnostics paper

Golden rules of rac diagnostics ppt

Scripts mentioned in the presentation can be downloaded here.


Posted in 11g, Performance tuning, Presentations, RAC | Tagged: , , , , , | 3 Comments »

Dude, where is my redo?

Posted by Riyaj Shamsudeen on June 12, 2013

This blog entry is to discuss a method to identify the objects inducing higher amount of redo. First,we will establish that redo size increased sharply and then identify the objects generating more redo. Unfortunately, redo size is not tracked at a segment level. However, you can make an educated guess using ‘db block changes’ statistics. But, you must use logminer utility to identify the objects generating more redo scientifically.

Detecting redo size increase

AWR tables (require Diagnostics license) can be accessed to identify the redo size increase. Following query spools the daily rate of redo size. You can easily open the output file redosize.lst in an Excel spreadsheet and graph the data to visualize the redo size change. Use pipe symbol as the delimiter while opening the file in excel spreadsheet.

spool redosize.lst
REM  You need Diagnostic Pack licence to execute this query!
REM  Author: Riyaj Shamsudeen
col begin_interval_time format a30
set lines 160 pages 1000
col end_interval_time format a30
set colsep '|'
alter session set nls_date_format='DD-MON-YYYY';
with redo_sz as (
SELECT  sysst.snap_id, sysst.instance_number, begin_interval_time ,end_interval_time ,  startup_time,
VALUE - lag (VALUE) OVER ( PARTITION BY  startup_time, sysst.instance_number
                ORDER BY begin_interval_time, startup_time, sysst.instance_number) stat_value,
EXTRACT (DAY    FROM (end_interval_time-begin_interval_time))*24*60*60+
            EXTRACT (HOUR   FROM (end_interval_time-begin_interval_time))*60*60+
            EXTRACT (MINUTE FROM (end_interval_time-begin_interval_time))*60+
            EXTRACT (SECOND FROM (end_interval_time-begin_interval_time)) DELTA
  FROM sys.wrh$_sysstat sysst , DBA_HIST_SNAPSHOT snaps
WHERE (sysst.dbid, sysst.stat_id) IN ( SELECT dbid, stat_id FROM sys.wrh$_stat_name WHERE  stat_name='redo size' )
AND snaps.snap_id = sysst.snap_id
AND snaps.dbid =sysst.dbid
AND sysst.instance_number=snaps.instance_number
and begin_interval_time > sysdate-90
select instance_number, 
  to_date(to_char(begin_interval_time,'DD-MON-YYYY'),'DD-MON-YYYY') dt 
, sum(stat_value) redo1
from redo_sz
group by  instance_number,
order by instance_number, 2
spool off

Visualizing the data will help you to quickly identify any pattern anomalies in redo generation. Here is an example graph created from the excel spreadsheet and see that redo size increased recently.

Read the rest of this entry »

Posted in 11g, Oracle database internals, Performance tuning, RAC | Tagged: , , , , | 22 Comments »

Clusterware Startup

Posted by Riyaj Shamsudeen on June 5, 2013

The restart of a UNIX server call initialization scripts to start processes and daemons. Every platform has a unique directory structure and follows a method to implement server startup sequence. In Linux platform (prior to Linux 6), initialization scripts are started by calling scripts in the /etc/rcX.d directories, where X denotes the run level of the UNIX server. Typically, Clusterware is started at run level 3. For example, ohasd daemon started by /etc/rc3.d/S96ohasd file by supplying start as an argument. File S96ohasd is linked to /etc/init.d/ohasd.

S96ohasd -> /etc/init.d/ohasd

/etc/rc3.d/S96ohasd start  # init daemon starting ohasd.

Similarly, a server shutdown will call scripts in rcX.d directories, for example, ohasd is shut down by calling K15ohasd script:

K15ohasd -> /etc/init.d/ohasd
/etc/rc3.d/K15ohasd stop  #UNIX daemons stopping ohasd

In Summary, server startup will call files matching the pattern of S* in the /etc/rcX.d directories. Calling sequence of the scripts is in the lexical order of script name. For example, S10cscape will be called prior to S96ohasd, as the script S10cscape occurs earlier in the lexical sequence.

Google if you want to learn further about RC startup sequence. Of course, Linux 6 introduces Upstart feature and the mechanism is a little different:

That’s not the whole story!

Have you ever thought why the ‘crsctl start crs’ returns immediately? You can guess that Clusterware is started in the background as the command returns to UNIX prompt almost immediately. Executing the crsctl command just modifies the ohasdrun file content to ‘restart’. It doesn’t actually perform the task of starting the clusterware. Daemon init.ohasd reads the ohasdrun file every few seconds and starts the Clusterware if the file content is changed to ‘restart’.

# cat /etc/oracle/scls_scr/oel6rac1/root/ohasdrun

Read the rest of this entry »

Posted in 11g, Oracle database internals, RAC | Tagged: , , , , , | 21 Comments »

Do you need asmlib?

Posted by Riyaj Shamsudeen on August 29, 2012

There are many questions from few of my clients about asmlib support in RHEL6, as they are gearing up to upgrade the database servers to RHEL6. There is a controversy about asmlib support in RHEL6.  As usual, I will only discuss technical details in this blog entry.

ASMLIB is applicable only to Linux platform and does not apply to any other platform.

Now, you might ask why bother and why not just use OEL and UK? Well, not every Linux server is used as a database server. In a typical company, there are hundreds of Linux servers and just few percent of those servers are used as Database servers. Linux system administrators prefer to keep one flavor of Linux distribution for management ease and so, asking clients to change the distribution from RHEL to OEL or OEL to RHEL is always not a viable option.

Do you need to use ASMLIB in Linux?

Short answer is No. Long answer is possibly No. ASMLIB is an optional support library and eases the administration of ASM devices. Especially, it is helpful while adding new devices to the nodes in a cluster. ASMLIB essentially stamps the devices and so, it is easily visible in other nodes of a cluster in the next asm scandisk. asmlib also provides device persistence, which is the important benefit of ASM (see the discussion below for more details about device persistence).

Read the rest of this entry »

Posted in 11g, RAC | Tagged: , , , , , , , , | 14 Comments »

All about RAC and MTU with a video

Posted by Riyaj Shamsudeen on May 22, 2012

Let’s first discuss how RAC traffic works before continuing. Environment for the discussion is: 2 node cluster with 8K database block size, UDP protocol is used for cache fusion. (BTW, UDP and RDS protocols are supported in UNIX platform; whereas Windows uses TCP protocol).

UDP protocol, fragmentation, and assembly

UDP Protocol is an higher level protocol stack, and it is implemented over IP Protocol ( UDP/IP). Cache Fusion uses UDP protocol to send packets over the wire (Exadata uses RDS protocol though).

MTU defines the Maximum Transfer Unit of an IP packet. Let us consider an example of MTU set to 1500 in a network interface. One 8K block transfer can not be performed with just one IP packet  as the IP packet size (1500 bytes) is less than 8K. So, one transfer of UDP packet of 8K size is fragmented to 6 IP packets and sent over the wire. In the receiving side, those 6 packets are reassembled to create one UDP buffer of size 8K. After the assembly, that UDP buffer is delivered to an UDP port of a UNIX process. Usually, a foreground process will listen on that port to receive the UDP buffer.

Read the rest of this entry »

Posted in 11g, Oracle database internals, Performance tuning, Presentations, RAC, video | Tagged: , , , , , , , , , , , , , , , , , , | 13 Comments »


Posted by Riyaj Shamsudeen on April 29, 2012

We know that database blocks are transferred between the nodes through the interconnect, aka cache fusion traffic. Common misconception is that packet transfer size is always database block size for block transfer (Of course, messages are smaller in size). That’s not entirely true. There is an optimization in the cache fusion code to reduce the packet size (and so reduces the bits transferred over the private network). Don’t confuse this note with Jumbo frames and MTU size, this note is independent of MTU setting.

In a nutshell, if free space in a block exceeds a threshold (_gc_fusion_compression) then instead of sending the whole block, LMS sends a smaller packet, reducing private network traffic bits. Let me give an example to illustrate my point. Let’s say that the database block size is 8192 and a block to be transferred is a recently NEWed block, say, with 4000 bytes of free space. Transfer of this block over the interconnect from one node to another node in the cluster will result in a packet size of ~4200 bytes. Transfer of bytes representing free space can be avoided completely, just a symbolic notation of free space begin offset and free space end offset is good enough to reconstruct the block in the receiving side without any loss of data.This optimization makes sense as there is no need to clog the network unnecessarily.

Read the rest of this entry »

Posted in 11g, Oracle database internals, Performance tuning, RAC | Tagged: , , , | 9 Comments »