Archive for the ‘11g’ Category
Posted by Riyaj Shamsudeen on June 12, 2013
This blog entry discusses a method to identify the objects generating a higher amount of redo. First, we will establish that redo size increased sharply, and then identify the objects generating more redo. Unfortunately, redo size is not tracked at the segment level. However, you can make an educated guess using the ‘db block changes’ statistic. To identify the objects generating more redo rigorously, you must use the LogMiner utility.
Detecting redo size increase
AWR tables (which require a Diagnostics Pack license) can be queried to identify the redo size increase. The following query spools the redo size generated per day. You can open the output file redosize.lst in an Excel spreadsheet and graph the data to visualize the change in redo size. Use the pipe symbol as the delimiter while opening the file in Excel.
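REM You need a Diagnostics Pack license to execute this query!
REM Author: Riyaj Shamsudeen
REM Lines marked "reconstructed" were missing from this excerpt and are inferred from the surrounding text.
spool redosize.lst
col begin_interval_time format a30
set lines 160 pages 1000
col end_interval_time format a30
set colsep '|'
alter session set nls_date_format='DD-MON-YYYY';
with redo_sz as (
SELECT sysst.snap_id, sysst.instance_number, begin_interval_time, end_interval_time, startup_time,
-- 'redo size' is cumulative since instance startup, so subtract the previous snapshot's value
VALUE - lag (VALUE) OVER ( PARTITION BY startup_time, sysst.instance_number
ORDER BY begin_interval_time, startup_time, sysst.instance_number) stat_value,
-- length of the snapshot interval in seconds
EXTRACT (DAY FROM (end_interval_time-begin_interval_time))*24*60*60+
EXTRACT (HOUR FROM (end_interval_time-begin_interval_time))*60*60+
EXTRACT (MINUTE FROM (end_interval_time-begin_interval_time))*60+
EXTRACT (SECOND FROM (end_interval_time-begin_interval_time)) DELTA
FROM sys.wrh$_sysstat sysst , DBA_HIST_SNAPSHOT snaps
WHERE (sysst.dbid, sysst.stat_id) IN ( SELECT dbid, stat_id FROM sys.wrh$_stat_name WHERE stat_name='redo size' )
AND snaps.snap_id = sysst.snap_id
AND snaps.dbid = sysst.dbid
AND snaps.instance_number = sysst.instance_number -- reconstructed: avoids duplicate rows across RAC instances
and begin_interval_time > sysdate-90
) -- reconstructed
select instance_number, trunc(begin_interval_time) dt -- reconstructed
, sum(stat_value) redo1
from redo_sz -- reconstructed
group by instance_number, trunc(begin_interval_time) -- date expression reconstructed
order by instance_number, 2
/
spool off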
Visualizing the data will help you quickly identify any anomalies in the redo generation pattern. Here is an example graph created from the Excel spreadsheet; note that the redo size increased recently.
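The core of the query is the LAG ... PARTITION BY startup_time expression: the 'redo size' statistic is cumulative since instance startup, so each snapshot's contribution is its value minus the previous snapshot's value within the same instance incarnation. Here is a minimal Python sketch of that logic, with made-up snapshot values; the function name and the tuple layout are my own assumptions for illustration, not anything from AWR.

```python
from collections import defaultdict
from datetime import datetime

def daily_redo(snapshots):
    """Sum per-snapshot redo deltas by (instance, day).

    Each snapshot is a tuple:
    (instance_number, startup_time, begin_interval_time, cumulative_redo_size).
    """
    # Order rows as the analytic window would: within one instance
    # incarnation (instance_number, startup_time), by interval start time.
    ordered = sorted(snapshots, key=lambda s: (s[0], s[1], s[2]))
    previous = {}              # last cumulative value seen per incarnation
    totals = defaultdict(int)  # (instance_number, date) -> redo bytes
    for inst, startup, begin, value in ordered:
        key = (inst, startup)
        if key in previous:    # the first row of an incarnation has no delta
            totals[(inst, begin.date())] += value - previous[key]
        previous[key] = value
    return dict(totals)

# Hypothetical data: instance 1 restarts on 11-JUN, which resets the counter.
boot1 = datetime(2013, 6, 1)
boot2 = datetime(2013, 6, 11, 12)
rows = [
    (1, boot1, datetime(2013, 6, 10, 0), 1_000),
    (1, boot1, datetime(2013, 6, 10, 12), 1_500),
    (1, boot1, datetime(2013, 6, 11, 0), 2_100),
    (1, boot2, datetime(2013, 6, 11, 12), 50),
    (1, boot2, datetime(2013, 6, 11, 18), 400),
]
result = daily_redo(rows)
```

Partitioning by startup time is what keeps the restart from producing a large negative delta (50 minus 2,100) in the daily totals.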
Posted in 11g, Oracle database internals, Performance tuning, RAC | Tagged: identify objects redo, redo internals, segment_stats.sql, v$logmnr_contents, v$segment_stats | 8 Comments »
Posted by Riyaj Shamsudeen on June 5, 2013
Restarting a UNIX server calls initialization scripts to start processes and daemons. Every platform has its own directory structure and its own method of implementing the server startup sequence. On the Linux platform (prior to Linux 6, that is, RHEL/OEL 6), initialization scripts are started by calling scripts in the /etc/rcX.d directories, where X denotes the run level of the UNIX server. Typically, Clusterware is started at run level 3. For example, the ohasd daemon is started by the /etc/rc3.d/S96ohasd file, with start supplied as an argument. The file S96ohasd is a link to /etc/init.d/ohasd.
S96ohasd -> /etc/init.d/ohasd
/etc/rc3.d/S96ohasd start # init daemon starting ohasd.
Similarly, a server shutdown calls scripts in the rcX.d directories. For example, ohasd is shut down by calling the K15ohasd script:
K15ohasd -> /etc/init.d/ohasd
/etc/rc3.d/K15ohasd stop #UNIX daemons stopping ohasd
In summary, server startup calls the files matching the pattern S* in the /etc/rcX.d directories. The scripts are called in the lexical order of their names. For example, S10cscape is called before S96ohasd, as S10cscape occurs earlier in the lexical sequence.
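The ordering rule is easy to check with a few lines of Python. This is only a sketch of the rule as described above: the directory listing is made up (only S10cscape and S96ohasd come from the text), and the helper name is hypothetical.

```python
def startup_order(entries):
    """Return the S* scripts from an rcX.d listing in lexical order."""
    return sorted(name for name in entries if name.startswith("S"))

# Hypothetical listing of /etc/rc3.d; S55sshd is a made-up entry.
listing = ["S96ohasd", "K15ohasd", "S10cscape", "S55sshd"]
order = startup_order(listing)
```

Because the two-digit number follows the S prefix, plain string sorting is enough to place S10cscape ahead of S96ohasd.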
Google it if you want to learn more about the RC startup sequence. Of course, Linux 6 introduces the Upstart feature, and the mechanism is a little different: http://en.wikipedia.org/wiki/Upstart
That’s not the whole story!
Have you ever wondered why ‘crsctl start crs’ returns immediately? You can guess that Clusterware is started in the background, as the command returns to the UNIX prompt almost immediately. Executing the crsctl command just modifies the content of the ohasdrun file to ‘restart’. It doesn’t actually perform the task of starting the Clusterware. The init.ohasd daemon reads the ohasdrun file every few seconds and starts the Clusterware if the file content has changed to ‘restart’.
# cat /etc/oracle/scls_scr/oel6rac1/root/ohasdrun
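The handshake described above (a command that only flags a request in a file, and a daemon that polls the file and does the slow work) can be sketched in Python. This is a toy model of the pattern, not the actual crsctl or init.ohasd code: the function names are hypothetical, a temporary file stands in for ohasdrun, and what the daemon writes back to the file after starting the stack is not modeled.

```python
import os
import tempfile

def request_start(control_file):
    """What the text says crsctl does: flag the request, then return."""
    with open(control_file, "w") as f:
        f.write("restart\n")

def poll_once(control_file, start_stack):
    """One polling pass of a hypothetical init.ohasd-like loop."""
    with open(control_file) as f:
        state = f.read().strip()
    if state == "restart":
        start_stack()  # the slow work happens here, inside the daemon
        return True
    return False

# Demo against a temporary file standing in for the ohasdrun file.
started = []
with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "ohasdrun")
    with open(path, "w") as f:
        f.write("stop\n")  # placeholder: any content other than 'restart'
    first = poll_once(path, lambda: started.append("stack"))
    request_start(path)    # returns at once; nothing has been started yet
    second = poll_once(path, lambda: started.append("stack"))
```

The design explains the behaviour at the prompt: the command's cost is one small file write, while the startup latency is absorbed by the polling daemon.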
Posted in 11g, Oracle database internals, RAC | Tagged: clusterware startup, ohasd startup, ohasdrun, ohasdstr, RAC, RC scripts clusterware | 1 Comment »
Posted by Riyaj Shamsudeen on August 29, 2012
A few of my clients have asked many questions about ASMLIB support in RHEL6, as they are gearing up to upgrade their database servers to RHEL6. There is a controversy about ASMLIB support in RHEL6. As usual, I will discuss only technical details in this blog entry.
ASMLIB is applicable only to the Linux platform and does not apply to any other platform.
Now, you might ask: why bother, and why not just use OEL and UEK? Well, not every Linux server is used as a database server. In a typical company, there are hundreds of Linux servers, and just a few percent of those servers are used as database servers. Linux system administrators prefer to keep one flavor of Linux distribution for ease of management, and so asking clients to change the distribution from RHEL to OEL or OEL to RHEL is not always a viable option.
Do you need to use ASMLIB in Linux?
The short answer is No. The long answer is possibly No. ASMLIB is an optional support library that eases the administration of ASM devices. It is especially helpful while adding new devices to the nodes in a cluster. ASMLIB essentially stamps the devices, so they are easily visible on the other nodes of the cluster after the next ASMLIB scandisks. ASMLIB also provides device persistence, which is the important benefit of ASMLIB (see the discussion below for more details about device persistence).
Posted in 11g, RAC | Tagged: asmlib, device mapper, multipath, multipath.conf, oracle performance, oracle RAC asmlib, RAC, udev, udev rules | 7 Comments »
Posted by Riyaj Shamsudeen on May 22, 2012
Let’s first discuss how RAC traffic works before continuing. The environment for the discussion is a 2-node cluster with an 8K database block size, using the UDP protocol for cache fusion. (BTW, UDP and RDS protocols are supported on UNIX platforms, whereas Windows uses the TCP protocol.)
UDP protocol, fragmentation, and assembly
UDP is a higher-level protocol implemented over the IP protocol (UDP/IP). Cache fusion uses the UDP protocol to send packets over the wire (Exadata uses the RDS protocol, though).
MTU (Maximum Transmission Unit) defines the maximum size of an IP packet that a network interface will send. Let us consider an example with the MTU set to 1500 on a network interface. One 8K block transfer cannot be performed with just one IP packet, as the maximum IP packet size (1500 bytes) is less than 8K. So, one UDP packet of 8K size is fragmented into 6 IP packets and sent over the wire. On the receiving side, those 6 packets are reassembled to create one UDP buffer of size 8K. After reassembly, that UDP buffer is delivered to the UDP port of a UNIX process. Usually, a foreground process listens on that port to receive the UDP buffer.
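The count of 6 packets follows from simple arithmetic, sketched below in Python. The sketch assumes IPv4 with a 20-byte header (no options) and an 8-byte UDP header, and it treats the UDP payload as exactly 8192 bytes; a real cache fusion message carries some additional header bytes that are ignored here.

```python
import math

IP_HEADER = 20   # bytes, IPv4 header without options
UDP_HEADER = 8   # bytes

def ip_fragments(udp_payload, mtu):
    """Number of IPv4 packets needed to carry one UDP datagram."""
    datagram = udp_payload + UDP_HEADER
    # Every fragment except the last must carry a multiple of 8 data bytes.
    per_fragment = (mtu - IP_HEADER) // 8 * 8
    return math.ceil(datagram / per_fragment)

fragments_1500 = ip_fragments(8192, 1500)  # 8200 bytes / 1480 per fragment
fragments_9000 = ip_fragments(8192, 9000)  # fits in a single packet
```

With MTU=9000 (Jumbo frames), the same 8K transfer fits in one IP packet, so no fragmentation or reassembly is needed for it.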
Posted in 11g, Oracle database internals, Performance tuning, Presentations, RAC, video | Tagged: cache fusion mtu, fragmentation and reassembly, gc lost packets, ipfrag_high_thres, ipfrag_low_thres, ipfrag_time, Jumbo frames, MTU, MTU=9000, oracle performance, RAC internals, RAC performance, RAC presentations, RAC training, RAC video, RAC videos, RDS, UDP vs tcp, wireshark | 10 Comments »
Posted by Riyaj Shamsudeen on April 29, 2012
We know that database blocks are transferred between the nodes through the interconnect, aka cache fusion traffic. A common misconception is that the packet size for a block transfer is always the database block size (of course, messages are smaller). That’s not entirely true. There is an optimization in the cache fusion code to reduce the packet size (and so reduce the bits transferred over the private network). Don’t confuse this note with Jumbo frames and MTU size; this note is independent of the MTU setting.
In a nutshell, if the free space in a block exceeds a threshold (_gc_fusion_compression), then instead of sending the whole block, LMS sends a smaller packet, reducing the bits sent over the private network. Let me give an example to illustrate my point. Let’s say that the database block size is 8192 and the block to be transferred is a recently NEWed block with, say, 4000 bytes of free space. Transferring this block over the interconnect from one node to another will result in a packet size of ~4200 bytes. The bytes representing free space need not be transferred at all: a symbolic notation of the free space begin offset and end offset is enough to reconstruct the block on the receiving side without any loss of data. This optimization makes sense, as there is no need to clog the network unnecessarily.
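The idea can be sketched in a few lines of Python. This is a toy illustration of the technique only: the function names, the tuple used as a packet, and the assumption that free space can be refilled with zero bytes are all mine, and none of this reflects Oracle's actual wire format.

```python
BLOCK_SIZE = 8192

def pack(block, free_begin, free_end):
    """Drop the free-space bytes block[free_begin:free_end]; keep the offsets."""
    return (free_begin, free_end, block[:free_begin] + block[free_end:])

def unpack(packet):
    """Rebuild the full block, refilling the free space with zero bytes."""
    free_begin, free_end, data = packet
    return data[:free_begin] + bytes(free_end - free_begin) + data[free_begin:]

# A block with 4000 bytes of (zeroed) free space between header and row data.
block = b"H" * 100 + bytes(4000) + b"R" * (BLOCK_SIZE - 4100)
packet = pack(block, 100, 4100)
payload_size = len(packet[2])   # 8192 - 4000 = 4192 bytes
rebuilt = unpack(packet)
```

The payload of 4192 bytes plus the two offsets is in line with the ~4200 byte packet mentioned above.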
Posted in 11g, Oracle database internals, Performance tuning, RAC | Tagged: RAC internals, RAC performance, RAC performance myths, _gc_fusion_compression | 9 Comments »
Posted by Riyaj Shamsudeen on April 19, 2012
Last week (March 2012), I was conducting Advanced RAC Training online. During the class, I was recreating ‘gc buffer busy’ waits to explain the concepts and the methods to troubleshoot the issue.
Let’s define these events first. The ‘gc buffer busy’ event means that a session is trying to access a buffer, but there is already an open request for a global cache lock on that block, and so the session must wait for the GC lock request to complete before proceeding. This wait is instrumented as the ‘gc buffer busy’ event.
From 11g onwards, this wait event is split into ‘gc buffer busy acquire’ and ‘gc buffer busy release’. An attendee asked me to show the difference between these two wait events. Fortunately, we had a problem with LGWR writes, and we were able to inspect the waits with much clarity during the class.
Remember that global cache enqueues are considered to be owned by an instance. From 11g onwards, the gc buffer busy event differentiates between two cases:
- If the existing GC open request originated from the local instance, then the current session waits for ‘gc buffer busy acquire’. Essentially, the current process is waiting for another process in the local instance to acquire the GC lock on behalf of the local instance. Once the GC lock is acquired, the current process can access that buffer without additional GC processing (if the lock is acquired in a compatible mode).
- If the existing GC open request originated from a remote instance, then the current session waits for the ‘gc buffer busy release’ event. In this case, the session is waiting for a session in another instance to release the GC lock, so that the local instance can acquire the buffer.
The following output should show the difference with much clarity.
Posted in 11g, Oracle database internals, Performance tuning, RAC | Tagged: gc buffer busy, gc buffer busy acquire, gc buffer busy release, oracle performance, RAC performance | 10 Comments »
Posted by Riyaj Shamsudeen on February 13, 2012
Temporary tablespaces are shared objects, and they are associated with a user or with the whole database (using the default temporary tablespace). So, in RAC, temporary tablespaces are shared between the instances. Many temporary tablespaces can be created in a database, but all of them are shared between the instances. Hence, temporary tablespaces must be allocated in shared storage or ASM. In this blog entry, we will explore space allocation in temporary tablespaces in RAC.
In contrast, UNDO tablespaces are owned by an instance, and all transactions from that instance are allocated exclusively in that UNDO tablespace. Remember that other instances can read blocks from a remote undo tablespace, and so undo tablespaces must also be allocated in shared storage or ASM.
Space allocation in TEMP tablespace
TEMP tablespaces are divided into extents (in 11.2, the extent size is 1M; I am not sure whether the size of an extent is controllable). The extent maps are cached in the local SGA, essentially soft-reserving those extents for the use of sessions connecting to that instance. Note, however, that extents in a temporary tablespace are not cached at instance startup; instead, the instance caches extents as the need arises. We will explore this with a small example:
Posted in 11g, Oracle database internals, Performance tuning, RAC | Tagged: CI enqueue, DFS lock handle, oracle performance, RAC performance, SS enqueue, temporary tablesapce, temporary tablespace, temporary tablespace groups | 14 Comments »
Posted by Riyaj Shamsudeen on January 19, 2012
In this blog entry, we will explore the wonderful world of SCNs and how Oracle database uses SCN internally. We will also explore a few new bugs and clarify a few misconceptions about SCN itself.
What is SCN?
SCN (System Change Number) is a primary mechanism to maintain data consistency in Oracle database. SCN is used primarily in the following areas (of course, this is not a complete list):
- Every redo record has the SCN version of the redo record in the redo header (and redo records can have non-unique SCNs). Given redo records from two threads (as in the case of RAC), recovery will order them in SCN order, essentially maintaining a strict sequential order. As explained in my paper, every redo record has multiple change vectors too.
- Every data block also has a block SCN (aka block version). In addition, a change vector in a redo record has an expected block SCN. This means that a change vector can be applied to one and only one version of the block. The code checks whether the target SCN in a change vector matches the block SCN before applying the redo record. If there is a mismatch, corruption errors are thrown.
- Read consistency also uses SCN. Every query has a query environment, which includes an SCN as of the start of the query. A session can see a transaction’s changes only if that transaction’s commit SCN is lower than the query environment SCN.
- Commit. Every commit generates an SCN, aka commit SCN, that marks a transaction boundary. Group commits are possible too.
SCN is a huge number with two components: base and wrap. Wrap is a 16-bit number and base is a 32-bit number. The format is wrap.base. When the base exceeds 4 billion (2^32), the wrap is incremented by 1. Essentially, wrap counts the number of times base has wrapped around 4 billion. A few simple SQL scripts will illustrate this better:
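The wrap.base arithmetic described above can also be checked outside the database. The following Python sketch assumes only what the paragraph states (a 16-bit wrap and a 32-bit base); the function names and the hexadecimal display format are my own choices for illustration.

```python
BASE_BITS = 32
WRAP_BITS = 16

def split_scn(scn):
    """Return (wrap, base) for a numeric SCN."""
    return scn >> BASE_BITS, scn & ((1 << BASE_BITS) - 1)

def join_scn(wrap, base):
    """Return the numeric SCN for a wrap.base pair."""
    return (wrap << BASE_BITS) | base

def format_scn(scn):
    """Render an SCN in wrap.base form, both parts in hexadecimal."""
    wrap, base = split_scn(scn)
    return f"0x{wrap:04x}.{base:08x}"

# The largest SCN that fits in 16 + 32 = 48 bits.
max_scn = join_scn((1 << WRAP_BITS) - 1, (1 << BASE_BITS) - 1)
```

For example, split_scn(2**32) gives wrap 1 and base 0: the base has wrapped around once.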
Posted in 11g, corruption, Oracle database internals, Performance tuning, RAC, recovery | Tagged: get_system_change_number, hot backup scn growth, kcmgas calls, kcvblg, ORA-600 , oracle performance, SCN bug, tracefile_name | 24 Comments »
Posted by Riyaj Shamsudeen on January 13, 2012
You might encounter the RAC wait event ‘gc cr disk read’ in 11.2 while tuning your applications in a RAC environment. Let’s probe this wait event to understand why a session would wait on it.
Understanding the wait event
Let’s say that a foreground process running on node 1 is trying to access a block using a SELECT statement, and that block is not in the local cache. To maintain read consistency, the foreground process requires a version of the block consistent with the query SCN. The sequence of operations is then (simplified):
- The foreground session calculates the master node of the block and requests an LMS process running on the master node to access the block.
- Let’s assume that the block is resident in the master node’s buffer cache. If the block is in a consistent state (meaning the block version SCN is lower than (or equal to?) the query SCN), then the LMS process can send the block to the foreground process immediately. Life is not that simple, so let’s assume that the requested block has an uncommitted transaction.
- Since the block has uncommitted changes, the LMS process cannot send the block immediately. The LMS process must create a CR (Consistent Read) version of the block: it clones the buffer and applies undo records to the cloned buffer, rolling the block back to an SCN consistent with the query SCN.
- Then the CR block is sent to the foreground process.
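The clone-and-roll-back step above can be sketched as a toy model in Python. This is not how Oracle represents blocks or undo: the dictionary layout, the undo tuple format, and the function name are assumptions made purely for illustration, and the model tracks versions by SCN only, without modeling commit status.

```python
import copy

def build_cr(block, undo_records, query_scn):
    """Return a consistent-read copy of `block` as of `query_scn`.

    block: {"scn": int, "rows": {row_id: value}}
    undo_records: newest first; each is (scn_before_change, row_id, old_value).
    """
    cr = copy.deepcopy(block)       # clone the buffer; the current block is untouched
    for scn_before, row_id, old_value in undo_records:
        if cr["scn"] <= query_scn:  # already consistent with the query
            break
        if old_value is None:       # the change was an insert: undo removes the row
            cr["rows"].pop(row_id, None)
        else:
            cr["rows"][row_id] = old_value
        cr["scn"] = scn_before      # the clone is now one version older
    return cr

current = {"scn": 120, "rows": {1: "c", 2: "new"}}
undo = [
    (110, 2, None),  # the change at SCN 120 inserted row 2
    (100, 1, "b"),   # the change at SCN 110 updated row 1 from "b" to "c"
    (90, 1, "a"),    # the change at SCN 100 updated row 1 from "a" to "b"
]
cr = build_cr(current, undo, query_scn=105)
```

In this example, two undo records are applied and the third is skipped, because the clone at SCN 100 already satisfies a query SCN of 105.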
LMS is a light weight process
Global cache operations must complete quickly, in the order of milliseconds, to maintain the overall performance of a RAC database. LMS is a critical process and does not do heavy lifting such as disk I/O. If the LMS process would have to initiate I/O, then instead of doing so, LMS downgrades the block mode and sends the block to the requesting foreground process (this is known as the Light Works rule). The foreground process then applies undo records to the block to construct the CR version of the block.
Posted in 11g, Oracle database internals, Performance tuning, Presentations, RAC | Tagged: gc cr disk read, RAC performance | 6 Comments »
Posted by Riyaj Shamsudeen on November 8, 2011
Waits for ‘DFS lock handle’ can cause massive performance issues in a busy RAC cluster. In this blog entry, we will explore the DFS lock handle wait event, and understand how to troubleshoot the root cause of these waits. I am also going to use locks and resources interchangeably in this blog, but internally, they are two different types of structures.
A little background
DFS (which stands for Distributed File System) is an ancient name associated with cluster file system operations in the lock manager supplied by vendors in the Oracle Parallel Server environment (the prior name for RAC). But this wait event has morphed and is now also associated with waits unrelated to database files. Hence, it is imperative to understand the underlying details to debug ‘DFS lock handle’ waits.
How does it work?
I have no access to the code, so read this paragraph with caution, as I may have misunderstood my test results. A process trying to acquire a lock on a global GES resource sends an AST (Asynchronous Trap) or BAST (Blocking Asynchronous Trap) message to the LCK process, constructing the message with lock pointer, resource pointer, and resource name information. If the resource is not available, then the LCK process sends a message to the lock holder for a lock downgrade.
Posted in 11g, Oracle database internals, Performance tuning, RAC | Tagged: AST, BAST, BB enqueue, CI enqueue, DFS lock handle, GES, gv$ges_resource, oracle performance, RAC, RAC performance, SV enqueue, v$lock_type | 10 Comments »