Oracle database internals by Riyaj

Discussions about Oracle performance tuning, RAC, Oracle internal & E-business suite.

gc cr disk read

Posted by Riyaj Shamsudeen on January 13, 2012

You might encounter RAC wait event ‘gc cr disk read’ in 11.2 while tuning your applications in RAC environment. Let’s probe this wait event to understand why a session would wait for this wait event.

Understanding the wait event

Let’s say that a foreground process running in node 1, is trying to access a block using a SELECT statement and that block is not in the local cache. To maintain the read consistency, foreground process will require the block consistent with the query SCN. Then the sequence of operation is(simplified):

  1. Foreground session calculates the master node of the block; Requests a LMS process running in the master node to access the block.
  2. Let’s assume that block is resident in the master node’s buffer cache. If the block is in a consistent state (meaning block version SCN is lower (or equal?) to query SCN), then LMS process can send the block to the foreground process immediately. Life is not that simple, so, let’s assume that requested block has an uncommitted transaction.
  3. Since the block has uncommitted changes, LMS process can not send the block immediately. LMS process must create a CR (Consistent Read) version of the block: clones the buffer, applies undo records to the cloned buffer rolling back the block to the SCN consistent with the requested query SCN.
  4. Then the CR block is sent to the foreground process.

LMS is a light weight process

Global cache operations must complete quickly, in the order of milli-seconds, to maintain the overall performance of RAC database. LMS is a critical process and does not do heavy lifting tasks such as disk I/O etc. If LMS process has to initiate I/O, instead of initiating I/O, LMS will downgrade the block mode and send the block to the requesting foreground process (this is known as Light Works rule). Foreground process will apply undo records to the block to construct CR version of the block.

Now, the Foreground process might not find the undo blocks in the local cache as the transactions happened in the remote cache. A request is sent to remote LMS process to access undo block. If the undo block is not in the remote cache either, remote LMS process will send a grant to the foreground process to read the undo block from the disk. Foreground process accounts this wait time for the undo segment block grants to the ‘gc cr disk read’ wait event.

There are other reasons as to why FG process might have to read undo block. One of them is that Fairness downconvert triggered by LMS process. Essentially, if a block is requested too many times leading to many CR block fabrication, then instead of LMS doing more work, LMS process will simply down convert the block, send the block and grant to the requester an access to the block. FG process will apply undo to construct CR block itself.

gv$cr_block_server can be used to review the number of down converts, Light works etc. But, it is probably not possible to identify the reason for a block down convert after the event.

Why do we need this new event?

There is a very good reason why this event was introduced. Prior to 11g, waits for single block CR grants are accounted to wait event such as ‘gc cr block 2-way’, ‘gc cr block 3-way’ etc. Waits for grants on remote-instance-undo-blocks for CR fabrication is special, in the sense that, this is an additional unnecessary work from the application point of view. We need to be able to differentiate the amount of time spent waiting for undo block grants for CR fabrication vs other types of grants (such as data blocks etc). So, it looks like, Oracle has introduced this new event and I do think that this will be very useful for debugging performance issues.

Prior to 11g, you could still differentiate waits for single block grants for undo using ASH data or trace files. But, you will have to use the obj# field for this differentiation and obj# is set to 0 or -1 in the case of undo blocks/undo header blocks.

Test case

Of course, a test case would be nice, Just any regular table will do and my table structure have just two columns number, varchar2(255) with 1000 rows or so.

create table rs.t_1000 (n1 number, v1 varchar2(255));
insert into rs.t_1000 select n1, lpad(n1, 255, 'DEADBEAF') from (select level n1 from dual connect by level <=1000;
commit;

  1. node 1: update rs.t_1000 set v1=v1 where n1=100
  2. node 2; select * from rs.t_1000 where n1=100 – just to get parsing details away.
  3. node 1: alter system flush buffer cache; –flushed buffer cache.
  4. node 2: select * from rs.t_1000 where n1=100 — This SELECT statement suffers from gc cr disk read.

At step 3 in our test case, I flushed the buffer cache in node 2. When I reread the block again in node 1, here is an approximate sequence of operations that occurred:

  1. For SELECT statement in step 4, foreground sent a block request to LMS process in node2; LMS process in node 2 did not find the block in the buffer cache (since we flushed the buffer cache).
  2. So, LMS process in node2 sent a grant to the foreground process to read the data block from disk.
  3. Foreground process read the block from the disk, found that block version is higher than the query environment SCN and that there is a pending transaction in an ITL entry of the block.
  4. Foreground process clones the buffer and tries to apply undo records in order to reconstruct CR version of the block to match the query environment SCN.
  5. But, that pending transaction was initiated in node 2 and that undo segment is mastered by node 2. So, FG process sends a request for the block to node 2 LMS process. LMS process does not find the undo segment block in the node 2 cache and sends back a grant to read the block from the disk to the FG process. Meanwhile, the FG process is still waiting and the amount of time that FG process was waiting to receive the grant is accounted towards ‘gc cr disk read’.

(I formatted trace lines to improve readability and my comments are inline)

.

PARSING IN CURSOR #18446741324873694136 len=162 dep=0 uid=0 oct=3 lid=0 tim=1103269602 hv=361365006 ad='73fba918' sqlid='7cmy6msasmzhf'
select dbms_rowid.rowid_relative_fno (rowid) fno,
dbms_rowid.rowid_block_number(rowid) block,
dbms_rowid.rowid_object(rowid) obj, v1 from rs.t_1000 where n1=100
END OF STMT
c=0,e=17012,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=1,plh=479790187,tim=1103269601
c=0,e=12393,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=479790187,tim=1103282129
nam='SQL*Net message to client' ela= 4 driver id=1650815232 #bytes=1 p3=0 obj#=603 tim=1103288000
nam='Disk file operations I/O' ela= 7 FileOperation=2 fileno=4 filetype=2 obj#=603 tim=1103288120
nam='db file sequential read' ela= 786 file#=4 block#=2891 blocks=1 obj#=85844 tim=1103289701
nam='db file sequential read' ela= 560 file#=4 block#=2892 blocks=1 obj#=85844 tim=1103290450
nam='Disk file operations I/O' ela= 4 FileOperation=2 fileno=7 filetype=2 obj#=85844 tim=1103290546
nam='db file sequential read' ela= 542 file#=7 block#=2758 blocks=1 obj#=75154 tim=1103291149
-- RS: Following is for undo header block to find the transaction.
nam='gc cr disk read' ela= 633 p1=6 p2=176 p3=43 obj#=0 tim=1103292048
nam='db file sequential read' ela= 662 file#=6 block#=176 blocks=1 obj#=0 tim=1103292843
-- RS: Following read is for undo block itself to rollback the transaction changes.
nam='gc cr disk read' ela= 483 p1=6 p2=955 p3=44 obj#=0 tim=1103293699
nam='db file sequential read' ela= 569 file#=6 block#=955 blocks=1 obj#=0 tim=1103294355
nam='library cache pin' ela= 1045 handle address=2043988208 pin address=1969440144 100*mode+namespace=48820893384706 obj#=0 tim=1103295827
FETCH :c=0,e=8033,p=5,cr=6,cu=0,mis=0,r=1,dep=0,og=1,plh=479790187,tim=1103296065
WAIT : nam='SQL*Net message from client' ela= 323 driver id=1650815232 #bytes=1 p3=0 obj#=0 tim=1103296522
FETCH :c=0,e=24,p=0,cr=1,cu=0,mis=0,r=0,dep=0,og=1,plh=479790187,tim=1103296586
STAT id=1 cnt=1 pid=0 pos=1 obj=75154 op='TABLE ACCESS BY INDEX ROWID T_1000 (cr=7 pr=5 pw=0 time=6505 us cost=2 size=261 card=1)'
STAT id=2 cnt=1 pid=1 pos=1 obj=85844 op='INDEX RANGE SCAN T_1000_N1 (cr=3 pr=2 pw=0 time=2494 us cost=1 size=0 card=1)'
WAIT : nam='SQL*Net message to client' ela= 1 driver id=1650815232 #bytes=1 p3=0 obj#=0 tim=1103296728

Trace file analysis

Let’s review few lines from the trace file. Block 2758 holds the row physically. After reading that block, a ‘gc cr disk read’ wait event is encountered. Essentially, the FG identified that there is a pending transaction, so sent a request to LMS accounting the wait time to ‘gc cr disk read’ event. For this ‘gc cr disk read’ event, p1 is file_id, p2 is block_id, p3 seems to be a counter increasing by 1 for each of these waits. For example, for the next gc cr disk read, p3 is set to 44. obj#=0 indicates that this is an undo block or undo header block.

Notice that next line indicates a physical read for that undo header block occurred and the obj# is set to 0.I verified the undo header block by dumping the block too


nam='db file sequential read' ela= 542 file#=7 block#=2758 blocks=1 obj#=75154 tim=1103291149
nam='gc cr disk read' ela= 633 p1=6 p2=176 p3=43 obj#=0 tim=1103292048
nam='db file sequential read' ela= 662 file#=6 block#=176 blocks=1 obj#=0 tim=1103292843

Commit Cleanout

Commit cleanouts is another reason why you would encounter this wait event. When a session commits, that session will revisit the modified blocks to clean out the ITL entries in the modified blocks. But, this cleanout does not happen in all scenarios. For example, if the number of block changes exceeds list of blocks that session maintains in SGA then the session will mark the transaction table as committed, without cleaning out ITL entries in the actual blocks. Commit cleanout doesn’t happen if the modified block is not in buffer cache anymore (We will use this idea to trigger commit cleanout in the other node).

Next session reading that block will close out the ITL entry in the block with an upper bound SCN. Let’s tweak our test case little bit to simulate commit cleanouts.

One notable difference in the test case below is that after flushing the buffer cache, we switch back to the session updating the block,commit the transaction and flush the buffer cache again. With these two flush we guarantee that session will not commit cleanout and undo block is flushed from the cache. SELECT statement is executed after the commit. When we read the same block from a different node then the reader session will cleanout the ITL entry. Cleanout operation require to identify the transaction state and max query SCN. To identify whether the transaction is committed or not, session in node2 must read the undo header block. Waits to receive the grants for the undo|undo header block are accounted towards gc cr disk read wait event.

  1. node 1: update rs.t_1000 set v1=v1 where n1=100
  2. node 2; select * from rs.t_1000 where n1=100 – just to get parsing details away.
  3. node 1: alter system flush buffer cache; –flushed buffer cache.
  4. commit in node 1 from the other session.
  5. node 1: alter system flush buffer cache; –flushed buffer cache to remove undo blocks from cache.
  6. node 2: select * from rs.t_1000 where n1=100 .

Trace file

My comments are inline


...
-- block with the row is read below
WAIT : nam='db file sequential read' ela= 417 file#=7 block#=2758 blocks=1 obj#=75154 tim=783492616
WAIT : nam='Disk file operations I/O' ela= 2 FileOperation=2 fileno=6 filetype=2 obj#=75154 tim=783492751
-- Undo header block is read with time accounted towards gc cr disk read. Note that, there are no other reads after this.
-- In the prior test case that we saw earlier, there was an additional read to read undo block;
-- In this test case, no additional read for undo block as only commits need to be cleaned out.
WAIT : nam='gc cr disk read' ela= 12779 p1=6 p2=160 p3=41 obj#=0 tim=783507346
WAIT : nam='db file sequential read' ela= 778 file#=6 block#=160 blocks=1 obj#=0 tim=783508447

So, what can we do?

Is there any thing you can do to improve the performance of the application and reduce ‘gc cr disk read’ wait events? There are few options that you can consider, some are long-term options though.
  1. Consider Application affinity. If the application is running in the same node that transactions are aggressively modifying the objects, you can reduce/eliminate the grants. Application affinity is important even though cache fusion is in play.
  2. Of course, tune the statement such a way that number of block visits are reduced.
  3. Reduce commit cleanouts. For example, if a program is excessively modifying the block in node 1. Subsequent program is reading the same blocks in node 2, then you may have to cleanout the block manually from node 1 by reading the blocks. BTW, parallel queries do not perform commit cleanouts, only serial queries will perform commit cleanouts. Another operation that can get stuck is index creation. If the table has numerous blocks without commit cleanouts, then index build might be slower and can suffer from gc cr disk read events. This is magnified by the fact that these reads are single block reads and much slower for bulk operations such as index rebuild. Reading through all blocks of the table through SELECT statements might be good enough to complete commit cleanouts.
  4. Don’t keep long pending transactions on highly active tables in RAC environment. This typically happens for applications that uses tables as queue to pick up jobs i.e. fnd_concurrent_requests table in EBusiness suite. Proper configuration of concurrent manager, optimal values for sleep and cache might help here.
  5. For read only tablespace, you might want to make sure that there are no unnecessary commit cleanout operation, since the block will be never cleaned and so every session will try to clean out the block.

Summary

In summary, this is an useful event to differentiate CR fabrication performance issues and other performance issues. Using few techniques mentioned here, you can reduce the impact of this event.

 

6 Responses to “gc cr disk read”

  1. Most excellent posting, my friend. Clarity and technical depth winding up in 5 solid recommendations that have great potential to increase throughput and scalability and little risk that someone will be able to twist your well explained reasoning into anything harmful.

    A good whole chapter could be written about preparation of a tablespace to be made read only for a substantial period of time.

    In extreme cases someone might also need to hermitize* sequences in support of application affinity so that blocks of seq$ are spared from excessive instance migration as well.

    *hermitize meaning to isolate rows one per block (like hermits, one per dwelling place). Oracle normally stores about 75 sequence definitions per 8K block. The block grouping tends to be by order of definition and so may cross application affinity boundaries. Listing out the sequence names grouped by block can be quite illuminating.

  2. kulkarni said

    we are also facing similar issue in our production environment. A select query reads UNDO blocks endlessly. Here I traced the session and got the following lines in the trace file.

    WAIT #2: nam=’db file sequential read’ ela= 7208 file#=5 block#=1352468 blocks=1 obj#=0 tim=5088625409951
    WAIT #2: nam=’gc cr disk read’ ela= 446 p1=5 p2=1352466 p3=252 obj#=0 tim=5088625410501
    WAIT #2: nam=’db file sequential read’ ela= 6460 file#=5 block#=1352466 blocks=1 obj#=0 tim=5088625417000
    WAIT #2: nam=’gc cr disk read’ ela= 507 p1=5 p2=1352463 p3=252 obj#=0 tim=5088625417585
    WAIT #2: nam=’db file sequential read’ ela= 4773 file#=5 block#=1352463 blocks=1 obj#=0 tim=5088625422392
    WAIT #2: nam=’gc cr disk read’ ela= 681 p1=5 p2=1352460 p3=252 obj#=0 tim=5088625423231
    WAIT #2: nam=’db file sequential read’ ela= 9462 file#=5 block#=1352460 blocks=1 obj#=0 tim=5088625432742

    Here you can see that only UNDO blocks are read and there is no request for data block. Is this due to commit clean out? This is THREE node 11gR1 RAC. We kill the session and rerun the job then it goes through.

    We find this issue happening when ever this select query runs from either node1 or node2 and UNDO blocks are from node3( file#=5 is datafile for UNDO tablespace for node3)

    Here is some more information on traffic between the nodes.

    SQL> select INST_ID,CR_REQUESTS,CURRENT_REQUESTS,UNDO_REQUESTS,DISK_READ_RESULTS,FAIRNESS_DOWN_CONVERTS,LIGHT_WORKS,ERRORS from gv$cr_block_server order by 1;

    INST_ID CR_REQUESTS CURRENT_REQUESTS UNDO_REQUESTS DISK_READ_RESULTS FAIRNESS_DOWN_CONVERTS LIGHT_WORKS     ERRORS
    ------- ----------- ---------------- ------------- ----------------- ---------------------- ----------- ----------
          1    54693551          6422699       1676129           7381302                6125434     4608074          0
          2     2948493          3134301       2661336           2829933                 352236      157720          0
          3    58158519         25104616      18365357          19676544                6631054    16605103          0
    
    

    To avoid this commit clean out we need to find which process did modify the blocks in the node3 so that it can be run on the other nodes. We are not able to find it.

    • Hello Kulkarni

      For commit cleanouts:- undo header blocks (transaction table entries) and undo blocks associated with the transaction table control block changes must be read. The transaction table in the undo header block will indicate the transaction state. But, if the transaction table slot is overwritten for that transaction (and so commit SCN is not immediately available), then Database code will follow undo chain of the transaction table control blocks to create a prior version of transaction table recursively, in a quest to find commit SCN of the transaction.

      For consistent reads:- undo chain of a transaction (a transaction that is either active or commit SCN of that transaction is greater than query SCN) must be followed for consistent read buffer generation.

      First thing is to differentiate between two events: Does the session suffer from ‘data blocks consistent reads – undo records applied’ or ‘transaction table consistent reads – undo records applied” issue? (Stats names can be little different between versions) Check the session level stats to identify that:

      undefine sid
      undefine search_string
      accept search_string prompt 'Enter the wildcard character (Null=All):'
      accept value_threshold prompt 'Enter value threshold :'
      accept sid prompt 'Enter sid :'
      select  stat.name, ses.value from
      v$sesstat ses , v$statname stat
      where stat.statistic#=ses.statistic# and
      ses.sid=&sid
      and lower(stat.name) like  lower('%'||nvl('&&search_string','%')||'%')
      and ses.value >= to_number(nvl('&&value_threshold',0))
      /
      

      As an additional check, review the block contents by dumping the blocks. Check if the undo records are indeed application table changes. You may have to dump the blocks to see contents. alter system dump datafile block min block max is the syntax.

      (1) If the blocks are undo header blocks and stats indicate high ‘transaction table consistent reads – undo records applied’, then it is possible that you are running in to a side effect of auto undo retention tuning feature. Essentially, auto tuning of undo retention tries to reduce probability of ORA-1555 errors, by keeping as much undo possible tuning undo_retention upwards. But, this can negatively affect commit cleanouts since the code is seemingly trying to find exact commit SCN by rolling back the transaction table header changes numerous times.

      In this scenario, you can setup max undo_retention and reduce the effects of auto tuned undo retention feature. you can also setup a job to select from the table, serially, and perform commit cleanouts in the same node.

      (2) If the changes are for CR generation: (not undo header blocks), then it would imply that these blocks are read for Consistent Read generation. Meaning, few processes have updated objects heavily in one node,and sessions in other nodes are reading the blocks, but can not use the blocks since there is a pending transaction (or that commit SCN is later then query env SCN), and so rolling back the transactions to create a Consistent Read copies.

      You could solve this problem with two approaches. If node 3 is modifying blocks heavily by (a) either schedule those intensive updates during less active period, so that, you can avoid these CR storm (b) Modify the application affinity such a way that programs modifying the objects aggressively and programs reading the blocks aggressively are executed in the same node. Features such as “in memory undo” etc doesn’t work well with RAC and so, other node sessions might have to read the blocks physically.

      I do see that light_works in node 3 is much higher than other nodes, this would indicate that node 3 is performing massive changes and other nodes are reading the blocks from node 3.

      I thought, I was going to write few words :)

      HTH

      Cheers,
      Riyaj

      • kulkarni said

        Hello Riyaj,
        Thanks a lot for your wonderful explanation. I will run the query provided by you to check session level stats when this problem occurs again. This will confirm if waits are for data block consistent reads or for transaction table consistent reads.

        However by the following lines in the trace file it appears that the session is suffering from ‘transaction table consistent reads – undo records applied’.

        WAIT #2: nam=’db file sequential read’ ela= 4278 file#=5 block#=1351837 blocks=1 obj#=0 tim=5088627255019 –> here block being waited for is from an UNDO segment.

        WAIT #2: nam=’gc cr disk read’ ela= 593 p1=5 p2=1351834 p3=252 obj#=0 tim=5088627255928 –> again blocks are read from UNDO segment for UNDO apply for transaction table consistent reads.

        I will also dump the blocks next time when this issue occurs again( I hope it is better to dump the data block( here UNDO block) when this issue is happening)

        Thanks again for sparing your valuable,

        Bheemasen

  3. [...] quias os latch contetion que estão ocorrendo na base de dados; • Top 10 Queries using more disk reads: mostra as 10 queries que estão consumindo mais disco neste momento; • Top 10 Queries using more [...]

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

 
Follow

Get every new post delivered to your Inbox.

Join 199 other followers

%d bloggers like this: