Oracle database internals by Riyaj

Discussions about Oracle performance tuning, RAC, Oracle internal & E-business suite.

Archive for the ‘Oracle database internals’ Category

Temporary tablespaces in RAC

Posted by Riyaj Shamsudeen on February 13, 2012

Temporary tablespaces are shared objects and they are associated to an user or whole database (using default temporary tablespace). So, in RAC, temporary tablespaces are shared between the instances. Many temporary tablespaces can be created in a database, but all of those temporary tablespaces are shared between the instances. Hence, temporary tablespaces must be allocated in shared storage or ASM. We will explore the space allocation in temporary tablespace in RAC, in this blog entry.

In contrast, UNDO tablespaces are owned by an instance and all transactions from that instance is exclusively allocated in that UNDO tablespace. Remember that other instances can read blocks from remote undo tablespace, and so, undo tablespaces also must be allocated from shared storage or ASM.

Space allocation in TEMP tablespace

TEMP tablespaces are divided in to extents (In 11.2, extent size is 1M, not sure whether the size of an extent is controllable or not). These extent maps are cached in local SGA, essentially, soft reserving those extents for the use of sessions connecting to that instance. But, note that, extents in a temporary tablespace are not cached at instance startup, instead instance caches the extents as the need arises. We will explore this with a small example:

Read the rest of this entry »

Posted in 11g, Oracle database internals, Performance tuning, RAC | Tagged: , , , , , , , | 16 Comments »

What is ‘rdbms ipc message’ wait event?

Posted by Riyaj Shamsudeen on February 10, 2012

Introduction

There was a question about the wait event ‘rdbms ipc message’ in Oracle-l list. Short answer is that ‘rdbms ipc message’ event means that a process is waiting for an IPC message to arrive. Usually, this wait event can be ignored, but there are few rare scenarios this wait event can’t be completely ignored.

What is ‘rdbms ipc message’ wait means?

It is typical of Oracle Database background processes to wait for more work. For example, LGWR will wait for more work until another (foreground or background ) process request LGWR to do a log flush. In UNIX platforms, wait mechanism is implemented as a sleep on a specific semaphore associated with that process. This wait time is accounted towards database wait events ‘rdbms ipc message’.

Also note that, semaphore based waits are used in other wait scenarios too, not just ‘rdbms ipc message’ waits.

Time to Trace

We will use UNIX utility TRUSS to trace system calls from LGWR; We will enable sql trace on LGWR process. Using the output of these two methods, we will explore this wait event.
Read the rest of this entry »

Posted in Oracle database internals, Performance tuning | Tagged: , , , , , , | 8 Comments »

Nologging redo size

Posted by Riyaj Shamsudeen on January 25, 2012

It is probably easy to calculate hourly redo rate or daily redo rate using AWR data. For example, my script awr_redo_size.sql can be used to calculate daily redo rate, and awr_redo_size_history.sql can be used to calculate hourly redo rate. Hourly redo rate is especially useful since you can export to an excel spreadsheet, graph it to see redo rate trend.

Update: I added another script to calculate redo rate if you don’t have AWR license. redo_size_archived_log.sql.

Introduction to Direct Mode Writes

Direct mode operations write directly in to the database file skipping buffer cache. Minimal redo(aka invalidation redo) is generated, if the database is not in force logging mode. Keeping the database in no force logging mode is peachy as long as you don’t use Data guard, Streams, or Golden Gate.

Suddenly, business decide to use one of these log mining based replication products. This means that you must turn on Force logging at the database level so that replication tools can capture (just replay in the case of Data guard) the redo information correctly and consistently.

But, what if your application performs high amount of direct mode operation, such as insert /*+ append */ operations? Now, you need to estimate the redo size to identify the effect of FORCE LOGGING mode That estimation gets little tricky.
But wait, there’s more!

Posted in Oracle database internals, Performance tuning, RAC | Tagged: , , , , | 8 Comments »

Video: deep review of LMS

Posted by Riyaj Shamsudeen on January 20, 2012

This video was created circa July 2011. Click the Read More link to review the video. Version Oracle Database 11.2.0.2

Synopsis: Essentially, we probe the importance of LMS processes using DTrace. Explain why LMS should run in elevated priority. How to review deep statistics about LMS processes and much more.

Posted in Oracle database internals, Performance tuning, Presentations, RAC, video | Tagged: , , , , | 7 Comments »

SCN – What, why, and how?

Posted by Riyaj Shamsudeen on January 19, 2012

In this blog entry, we will explore the wonderful world of SCNs and how Oracle database uses SCN internally. We will also explore few new bugs and clarify few misconceptions about SCN itself.

What is SCN?

SCN (System Change Number) is a primary mechanism to maintain data consistency in Oracle database. SCN is used primarily in the following areas, of course, this is not a complete list:

  1. Every redo record has an SCN version of the redo record in the redo header (and redo records can have non-unique SCN). Given redo records from two threads (as in the case of RAC), Recovery will order them in SCN order, essentially maintaining a strict sequential order. As explained in my paper, every redo record has multiple change vectors too.
  2. Every data block also has block SCN (aka block version). In addition to that, a change vector in a redo record also has expected block SCN. This means that a change vector can be applied to one and only version of the block. Code checks if the target SCN in a change vector is matching with the block SCN before applying the redo record. If there is a mismatch, corruption errors are thrown.
  3. Read consistency also uses SCN. Every query has query environment which includes an SCN at the start of the query. A session can see the transactional changes only if that transaction commit SCN is lower then the query environment SCN.
  4. Commit. Every commit will generate SCN, aka commit SCN, that marks a transaction boundary. Group commits are possible too.

SCN format

SCN is a huge number with two components to it: Base and wrap. Wrap is a 16 bit number and base is a 32 bit number. It is of the format wrap.base. When the base exceeds 4 billion, then the wrap is incremented by 1. Essentially, wrap counts the number of  times base wrapped around 4 billion. Few simple SQL script will enumerate this better:
But wait, there’s more!

Posted in 11g, corruption, Oracle database internals, Performance tuning, RAC, recovery | Tagged: , , , , , , , | 31 Comments »

gc cr disk read

Posted by Riyaj Shamsudeen on January 13, 2012

You might encounter RAC wait event ‘gc cr disk read’ in 11.2 while tuning your applications in RAC environment. Let’s probe this wait event to understand why a session would wait for this wait event.

Understanding the wait event

Let’s say that a foreground process running in node 1, is trying to access a block using a SELECT statement and that block is not in the local cache. To maintain the read consistency, foreground process will require the block consistent with the query SCN. Then the sequence of operation is(simplified):

  1. Foreground session calculates the master node of the block; Requests a LMS process running in the master node to access the block.
  2. Let’s assume that block is resident in the master node’s buffer cache. If the block is in a consistent state (meaning block version SCN is lower (or equal?) to query SCN), then LMS process can send the block to the foreground process immediately. Life is not that simple, so, let’s assume that requested block has an uncommitted transaction.
  3. Since the block has uncommitted changes, LMS process can not send the block immediately. LMS process must create a CR (Consistent Read) version of the block: clones the buffer, applies undo records to the cloned buffer rolling back the block to the SCN consistent with the requested query SCN.
  4. Then the CR block is sent to the foreground process.

LMS is a light weight process

Global cache operations must complete quickly, in the order of milli-seconds, to maintain the overall performance of RAC database. LMS is a critical process and does not do heavy lifting tasks such as disk I/O etc. If LMS process has to initiate I/O, instead of initiating I/O, LMS will downgrade the block mode and send the block to the requesting foreground process (this is known as Light Works rule). Foreground process will apply undo records to the block to construct CR version of the block.

Read more

Posted in 11g, Oracle database internals, Performance tuning, Presentations, RAC | Tagged: , | 6 Comments »

RMOUG 2012 – Hello Denver!

Posted by Riyaj Shamsudeen on January 10, 2012

On February 14-16, I’ll be at the Colorado Convention Center in Denver, Colorado for RMOUG’s Training Days Conference. This is the largest regional Oracle User Conference in North America and attracts presenters from all around the country and the globe. I’ll be presenting:

Presentation Name: Troubleshooting RAC Background Process

Abstract: RAC background process performance is critical to keep the application performance. This session will demo techniques to review the performance of RAC background processes such as LMS, LMD, LMON, etc. using various statistics and UNIX tools. The presentation will also discuss why certain background processes must run in higher priority to maintain the application performance in RAC.

Presentation Name: A Kind and Gentle Introduction to RAC

Abstract: This session will introduce basic concepts such as cache fusion, conversion to RAC, protocols for interconnect, general architectural overview, GES layer locks, clusterware, etc. The session will also discuss the srvctl command and demo a few of these commands to improve the understanding.

Presentation Name: Parallel Execution in RAC

Abstract: This presentation will start to discuss and demo parallel server allocation, intra, and inter node parallelism aspects. The session will discuss the new parallelism features such as parallel statement queuing, parallel auto dop, and discuss the interaction of those features with RAC. The session will probe a few critical parameters to improve PQ performance in RAC.

Click here for more information or to register for RMOUG’s Training Days.

Posted in Oracle database internals, Performance tuning, Presentations, RAC | Tagged: , , | Leave a Comment »

Troubleshooting ‘DFS lock handle’ waits

Posted by Riyaj Shamsudeen on November 8, 2011

Waits for ‘DFS lock handle’ can cause massive performance issues in a busy RAC cluster. In this blog entry, we will explore the DFS lock handle wait event, and understand how to troubleshoot the root cause of these waits. I am also going to use locks and resources interchangeably in this blog, but internally, they are two different types of structures.

A little background

DFS (stands for Distributed File System) is an ancient name, associated with cluster file system operations, in a Lock manager supplied by vendors in Oracle Parallel Server Environment (prior name for RAC). But, this wait event has morphed and is now associated with waits irrelevant to database files also. Hence, it is imperative to understand the underlying details to debug the ‘DFS lock handle’ waits.

How does it work?

I have no access to the code, so read this paragraph with caution, as I may have misunderstood my test results: A process trying to acquire a lock on a global GES resource sends a AST(Asynchronous Trap) or BAST (Blocking Asynchronous Trap) message to LCK process, constructing the message with (lock pointer, resource pointer, and resource name) information. If the resource is not available, then the LCK process sends a message to the lock holder for a lock downgrade.

Read more

Posted in 11g, Oracle database internals, Performance tuning, RAC | Tagged: , , , , , , , , , , , | 11 Comments »

Advanced RAC Training

Posted by Riyaj Shamsudeen on July 5, 2011

I will be delivering an intense, advanced 2-week RAC training seminar in Aug 22-26 and Sep 19-23 with numerous demos, dumps and scripts. We will meet 4 hours per day, in a virtual world, 8AM-12 Noon Pacific time for those two weeks. Tanel has written has a great blog post about this seminar series Advanced RAC training.

If you like my presentations about RAC and performance tuning, you will love my seminar series. You will gain better understanding about RAC internals, and you will be able to advance your debugging and performance tuning skills attending my seminar series.

Join me and Let’s explore RAC further.

Posted in Oracle database internals, Performance tuning, Presentations, RAC | Tagged: , , , , | 5 Comments »

Does an UPDATE statement modify the row if the update modifies the column to same value?

Posted by Riyaj Shamsudeen on November 4, 2010

Introduction

If a table column is updated with the same value in a row, does Oracle RDBMS engine modify the data? (or) Does RDBMS engine have an optimization skipping the update, as value of that column is not changing? This was the essence of a question asked in Oracle-l list and I think, it is a good topic for further discussion. Jared Still came up with a fine method to understand this issue measuring redo/undo size. We will explore the same questions with redo log dump method in this blog entry.

Following few lines shows a test case creating a table, an index, and then populating a row in the table.

create table updtest (v1 varchar2(30));

create index updtest_i1 on updtest(v1);

insert into updtest values ('Riyaj');

commit;

REDO records and change vectors

If a row is modified by a SQL statement, Oracle Database generates redo records describing that change. Generated redo records are applied to the database blocks taking the blocks to next version. Direct mode inserts aka nologging inserts are not discussed in this blog entry.

Changes made by the SQL statement might be rolled back too. So, undo records are created in the undo block and these undo records describe how to rollback the SQL changes. Redo records are generated describing the changes to those undo blocks too. Further, these redo records are applied to the data blocks and undo blocks taking the blocks to next version. So, reviewing the redo records generated is sufficient to understand exactly what happens underneath.

To keep the redo log file dumps as clean as possible, we need a database with no activity. I have one such database and I have disabled all automatic jobs to make the redo dumps as clean as possible. It is also important to dump the log file from another session to maintain clarity.

So, in the following script, we will perform a checkpoint, switch log file from session #1. Then, We update the row with the same value from another session and commit. From Session #1, we switch log again from session #1.

Test case #1: Updating column to the same value
============
Session #1:
 alter system checkpoint;
 alter system switch logfile;
Session #2:
 update updtest set v1='Riyaj';
 commit;
Session #1: alter system switch logfile;

Essentially, Last redo log file has the redo records generated by executing the update statement. We need to dump the redo log file to see contents of the log file.

Notice that I am using ‘Riyaj’ as the value to update, as I can search for ASCII representation of my first name in hex easily in the dump file. I mean, Come on, who would not know the ASCII representation of their first name in hex? Here is a way to get Hex values:

select dump('Riyaj',16) Hex from dual

HEX
----------------------------
Typ=96 Len=5: 52,69,79,61,6a

We will use following Script to dump the last redo log file using ‘alter system dump logfile’ syntax:

-------------------------------------------------------------------------------------------  Script : dump_last_log.sql
------------------------------------------------------------------------------------------- This script will dump the last log file.
--   If the log file is big with enormous activity, this might take much resource.
--
--
--  Author : Riyaj Shamsudeen
--  No implied or explicit warranty !
-----------------------------------------------------------------------------------------set serveroutput on size 1000000
declare
  v_sqltext varchar2(255);
begin
 select 'alter system dump logfile '||chr(39)||member||chr(39)  into v_sqltext
  from v$log  lg, v$logfile lgfile
  where lg.group# = lgfile.group# and
  lg.sequence# = (select sequence#-1 from v$log where status='CURRENT' )
  and rownum <2
 ;
  dbms_output.put_line ('Executing :'||v_sqltext);
  execute immediate v_sqltext;
end;
/

Analysis of redo records:

Dumping the redo log file creates a trace file in the user_dump_dest destination. Let’s review the redo records associated with this statement.
Please refer to the output below and I am showing only relevant details. A REDO RECORD contains multiple change vectors and each change vector describe an atomic change to a block. First change vector is for OBJ: 77534 and that object_id is associated with the table UPDTEST. CHANGE #1 specifies how to Update Row Piece (URP): Update the column 0 to value of ’52 69 79 61 6a’ which is ‘Riyaj’ in the data block with Data Block Address (DBA) 0x0100113d associated with the object 77534.

REDO RECORD - Thread:1 RBA: 0x000071.00000002.0010 LEN: 0x021c VLD: 0x0d
SCN: 0x0000.0032b4c9 SUBSCN:  1 11/03/2010 10:51:02

CHANGE #1 TYP:2 CLS:1 AFN:4 DBA:0x0100113d OBJ:77534 SCN:0x0000.0032b40b SEQ:2 OP:11.5 ENC:0 RBL:0
KTB Redo
op: 0x11  ver: 0x01
..
KDO Op code: URP row dependencies Disabled
...
tabn: 0 slot: 0(0x0) flag: 0x2c lock: 1 ckix: 0
ncol: 1 nnew: 1 size: 0
Vector content:
col  0: [ 5]  52 69 79 61 6a <--- 'Riyaj'

Change Vectors 2 and 3 are irrelevant for our discussion, we will ignore it. CHANGE vector #4 is modifying the block with DBA 0x00c00694 which is for the object with object_id 4294967295. Objects with object_id close to 4GB are for undo segments. This change vector holds an undo record. That undo record describes how to rollback the change: Update the row at slot 0 column 0 in the block with DBA 0x0100113d to ’52 69 79 61 6a’.

CHANGE #4 TYP:0 CLS:18 AFN:3 DBA:0x00c00694 OBJ:4294967295 SCN:0x0000.0032b4a6 SEQ:1 OP:5.1 ENC:0 RBL:0
...
Undo type:  Regular undo        Begin trans    Last buffer split:  No
...
KDO undo record:
KTB Redo
op: 0x04  ver: 0x01
compat bit: 4 (post-11) padding: 0
op: L  itl: xid:  0x000a.00d.000005ae uba: 0x00c00996.01a7.2c
                      flg: C---    lkc:  0     scn: 0x0000.0032b3f1
KDO Op code: URP row dependencies Disabled
  xtype: XAxtype KDO_KDOM2 flags: 0x00000080  bdba: 0x0100113d  hdba: 0x0100113a
itli: 1  ispac: 0  maxfr: 4858
tabn: 0 slot: 0(0x0) flag: 0x2c lock: 0 ckix: 0
ncol: 1 nnew: 1 size: 0
Vector content:
col  0: [ 5]  52 69 79 61 6a

In a nutshell, update to the table row with the same value, updated the row value from ‘Riyaj’ to ‘Riyaj’. Even though, supplied value and current row value is the same value, update to the row piece must happen. It is important as both external and internalized triggers need to fire correctly.

But, what happened to the index update?

We have an index on that column v1 and we updated that indexed column too. Did Oracle update the indexed column? NO. If the values are matching at the indexed column level, then the RDBMS code is not updating the index, a redo optimization feature. Only the table row piece is updated and the index is not updated.

Updating to a different value

To explain what happens at the index level, we need to consider a test case that updates the column value to a different value. This test case is similar to the Test case #1 except that we are updating the column value from ‘Riyaj’ to ‘RiyajS’.

Test case #2: Updating to a different value.
============
Session #1:
 alter system checkpoint;
 alter system switch logfile;
Session #2:
 update updtest set v1='RiyajS';
 commit;
Session #1: alter system switch logfile;

First Change vector is updating the table row piece to ‘RiyajS’.

CHANGE #1 TYP:2 CLS:1 AFN:4 DBA:0x0100113d OBJ:77534 SCN:0x0000.0032b4c9 SEQ:2 OP:11.5 ENC:0 RBL:0
...
tabn: 0 slot: 0(0x0) flag: 0x2c lock: 2 ckix: 0
ncol: 1 nnew: 1 size: 1
col  0: [ 6]  52 69 79 61 6a 53 <--RiyajS

CHANGE #3 below is updating the index leaf block. Update to an indexed column value results in delete(s) and insert(s) at the index level. Change #3 is deleting the leaf row and Change Vector #4 is inserting a leaf row.

CHANGE #3 TYP:0 CLS:1 AFN:4 DBA:0x01001153 OBJ:77535 SCN:0x0000.0032b567 SEQ:1 OP:10.4 ENC:0 RBL:0
index redo (kdxlde):  delete leaf row
KTB Redo
...
REDO: SINGLE / -- / --
itl: 2, sno: 0, row size 17

CHANGE vector #4 below is inserting a leaf row in the index leaf block with the key values:“06 52 69 79 61 6a 53 06 01 00 11 3d 00 00″.

06: is the length of key value
52 69 79 51 6a 53: ‘RiyajS’
01 00 11 3d 00 00: ROWID. (notice the dba of the block 0x0100113d). 
CHANGE #4 TYP:0 CLS:1 AFN:4 DBA:0x01001153 OBJ:77535 SCN:0x0000.0032b569 SEQ:1 OP:10.2 ENC:0 RBL:0
index redo (kdxlin):  insert leaf row
...
REDO: SINGLE / -- / --
itl: 2, sno: 1, row size 18
insert key: (14):  06 52 69 79 61 6a 53 06 01 00 11 3d 00 00

Undo change vectors

There are two more change vectors describing the undo block changes. CHANGE vector #6 specifies the undo record to rollback then change at the table block level. Basically, pre-image of the row piece is captured in here.

CHANGE #6 TYP:0 CLS:30 AFN:3 DBA:0x00c00832 OBJ:4294967295 SCN:0x0000.0032b4b1 SEQ:1 OP:5.1 ENC:0 RBL:0
...
KDO undo record:
KTB Redo
...
...
tabn: 0 slot: 0(0x0) flag: 0x2c lock: 0 ckix: 0
ncol: 1 nnew: 1 size: -1
col  0: [ 5]  52 69 79 61 6a

Change #7 and Change #8 specifies how to undo the changes at the index level. To undo the change at the index level, do a delete of current index entry and insert the older image of the index entry. Change #8 specifies to purge the leaf row with the key value ’06 52 69 79 61 6a 53 06 01 00 11 3d 00 00′ [ RiyajS + rowid combo].

CHANGE #7 TYP:0 CLS:30 AFN:3 DBA:0x00c00832 OBJ:4294967295 SCN:0x0000.0032b569 SEQ:1 OP:5.1 ENC:0 RBL:0
...
Undo type:  Regular undo       Undo type:  Last buffer split:  No
...
index undo for leaf key operations
...
(kdxlre): restore leaf row (clear leaf delete flags)
key ( 13):  05 52 69 79 61 6a 06 01 00 11 3d 00 00

Change #7 specifies to restore the leaf row with the key value ’05 52 69 79 61 6a 06 01 00 11 3d 00 00′ [Riyaj + rowid combo].

CHANGE #8 TYP:0 CLS:30 AFN:3 DBA:0x00c00832 OBJ:4294967295 SCN:0x0000.0032b569 SEQ:2 OP:5.1 ENC:0 RBL:0
...
Undo type:  Regular undo       Undo type:  Last buffer split:  No
...
index undo for leaf key operations
...
(kdxlpu): purge leaf row
key ( 14):  06 52 69 79 61 6a 53 06 01 00 11 3d 00 00

In a nutshell, updating an indexed column with a different column value, deletes current entry from the index leaf block and inserts an entry in the index leaf block with an updated column value. In addition, two change vectors are also added describing how to undo the change at the leaf block.

Wait, did RDBMS engine really delete from the index leaf block?

No, entries are not physically deleted from the index leaf block. Rather, entry with the old value is marked with a D flag and the new entry is added to the leaf block with updated value. This is visible dumping the index leaf block of the index UPDTEST_I1. Please review the lines from the leaf block dump shown below: Row #0 is for the value ‘Riyaj’ and Row #1 is for the value ‘RiyajS’. Row #0 is marked with a D flag indicating that entry as a deleted entry. Row #1 is the active current entry.

-- (Change history #2) Converting DBA 0x01001153 to decimal yields 16781651. 
-- You can use calculator for this conversion.
-- you can also use the following method:
-- select to_number('01001153','xxxxxxxx') from dual;

TO_NUMBER('01001153','XXXXXXXX
------------------------------
                      16781651
( Note that data_block_address_file returns relative file number, not absolute file number. So, you may have to join through dba_data_files to get absolute file number, if the relative file number does not equal absolute file number, for the file in question. Thank you Jeremy.).

--converting from dba to file#, block#.
undef dba
select
dbms_utility.DATA_BLOCK_ADDRESS_FILE(&&dba)||','||
dbms_utility.DATA_BLOCK_ADDRESS_BLOCK(&&dba)
from dual
/

Enter value for dba: 16781651
4,4435

Alter system dump datafile 4 block min 4435 block max 4435;

Trace file contents:
===================
row#0[7959] flag: ---D--, lock: 2, len=29
col 0; len 5; (5):  52 69 79 61 6a
col 1; len 6; (6):  00 00 00 00 00 00
col 2; len 6; (6):  01 00 11 3d 00 07
col 3; len 6; (6):  c0 91 86 99 e8 05

row#1[7905] flag: ------, lock: 2, len=30
col 0; len 6; (6):  52 69 79 61 6a 53
col 1; len 6; (6):  00 00 00 00 00 00
col 2; len 6; (6):  01 00 11 3d 00 07
col 3; len 6; (6):  c0 91 86 99 e8 05

Summary

If you have skipped all these hex dumps, I don’t blame you . But, if you have followed along, right on! To summarize:

  1. Updating the column value to the same value modifies the table block. Row piece in the table block is physically modified. But, the corresponding index entry is not modified.
  2. Updating the column value to a different value modifies both table and index blocks. Updating an indexed column value results in a delete and insert of index entries in the index leaf block.
  3. Delete of an index entry does not delete the entry physically, rather marks the entry with a D flag. Of course, future inserts in to this block will reuse the entry at some point in time. So, you don’t necessarily lose that space permanently.

Of course, there are other scenarios such as index leaf block split, branch block split etc, not covered in this blog entry.
Thanks to Mark Bobak and Jared Still for reviewing this blog entry and providing valuable contributions.

You can read this blog in a more conventional format as update to column value_v2 .

Version: Oracle Database release 11.2.0.1
Change history #2: Fixed a typo with block number. Thanks to Sreejith Sreekantan for pointing out the error.

Posted in Oracle database internals, Performance tuning, recovery | Tagged: , , , , , , , , | 9 Comments »

 
Follow

Get every new post delivered to your Inbox.

Join 264 other followers