Oracle database internals by Riyaj

Discussions about Oracle performance tuning, RAC, Oracle internal & E-business suite.

RAC performance tuning: Understanding Global cache performance

Posted by Riyaj Shamsudeen on December 23, 2009

Global cache performance metrics are not correctly measured. It is not understood clearly either. There are even few blogs and web pages disseminating incorrect information. This blog entry is an attempt to offer few methods and scripts to understand global cache performance.

Gentle remainder that all scripts are now uploaded as zip file racperf_gc_scripts.zip. Please download that zip file instead of clicking the links below.

Always review all instances

It is very important to review the performance metrics from all instances in that RAC cluster, not just one instance that you are connected. If you have access to AWR reports, then it is critical to generate AWR reports (or statspack reports) from all instances. But, the problem is that, DBAs tend to generate AWR reports after logging in to each instance iteratively, enter couple of parameters and then reports are generated. Not exactly a convenient practice.

  REM connect to each instance separately, type in the beginning snap_id and ending snap_id for each node etc..
   sqlplus mydba@proddb1 
   @$ORACLE_HOME/rdbms/admin/awrrpt.sql
   exit;
   sqlplus mydba@proddb2
   @$ORACLE_HOME/rdbms/admin/awrrpt.sql
   exit;
   sqlplus mydba@proddb3
   @$ORACLE_HOME/rdbms/admin/awrrpt.sql
   exit;

  

There are few issues with this approach. It is a cumbersome practice if the instance count is higher. In addition to that, all of AWR reports are, in turn, accessing underlying AWR tables. Physically, rows from all instances are together in the same block and so, by executing these reports connecting to various instances, Global cache traffic is increased. If the database is suffering from Global cache (GC) performance issues then generating reports connecting to various instances is probably not a grand idea.

I have created few basic scripts to generate AWR reports from all instances in a single step. Script awrrpt_all_gen.sql (Refer to zip file in this blog) queries AWR tables and gv$instance to access last snap_id and various instance_ids. Then this script generates AWR reports for all instances in that cluster in the current directory, for the last AWR snap. This script is quite handy while debugging RAC performance issues. Now, we have AWR reports from all instances in one directory and we can easily grep these files to understand metrics in various instances.

@awrrpt_all_gen.sql

awrrpt_all_gen.sql v1.00 by Riyaj Shamsudeen @orainternals.com

To generate AWR Report from all RAC instances concurrently.
Creates reports using last two snap_ids.

...Generating awrrpt_all.sql script.... Please wait....
...Completed script generation.

Executing awrrpt_all.sql to generate AWR reports.
...Generates AWR reports with file name format awrrpt___.txt for each instance.
...Please wait for few minutes...

...AWR report created for instance 1. Please wait..
...AWR report created for instance 2. Please wait..
...AWR report created for instance 3. Please wait..
...AWR report created for instance 4. Please wait..

AWR reports created.

If you would like ability to choose the snap_ids, then use the script awrrpt_all_range_gen.sql in the zip file.

Use granular approach to GC measurements

It is important to measure Global cache performance metrics correctly, especially those damned averages. Incorrect measurements of averages can lead to faulty analysis. For example, following query is retrieving average global cache CR receive time from gv$sysstat view. Problem with this SQL statement is that this statement is retrieving average global cache receive time from the restart of the instances. Global cache performance is quite sensitive to workload and querying over a wide time range, as in this query, can lead to faulty analysis.


REM This is really a bad idea!!

select b1.inst_id, b2.value "RECEIVED",
b1.value "RECEIVE TIME",
((b1.value / b2.value) * 10) "AVG RECEIVE TIME (ms)"
from gv$sysstat b1, gv$sysstat b2
where b1.name = 'gc current block receive time' and
b2.name = 'gc current blocks received' and b1.inst_id = b2.inst_id;

As an analogy, above query is equivalent to a patient-Doctor conversation:

Patient: Doctor, I don’t feel good. I feel that I am having fever.
Doctor : Nope. Your average body temperature from the beginning of this year is around 98. So, you are doing excellent.

It is essential to review the global cache performance metrics at a granular level. Use AWR reports or statspack reports for a short period of time. Better yet, if you want to quickly review GC performance metrics, you can use my script gc_traffic_print.sql in the zip file. This script captures various statistics relevant to Global cache performance in to a PL/SQL table, sleeps for N seconds (default N=60) and re-captures the statistics after the sleep. Then the script prints all the relevant details in easily readable tabular format. Very useful in quickly reviewing the Global cache performance metrics.

SQL> @gc_traffic_print.sql

gc_traffic_print.sql v1.20 by Riyaj Shamsudeen @orainternals.com

...Calculating GC Rx and Tx timing and blocks..
...Default collection period is 60 seconds.... Please wait for at least 60 seconds...
Enter value for sleep: 30
---------|--------------|---------|----------------|----------|---------------|---------------|-------------|
Inst     | CR blocks Rx | CR time |  CUR blocks Rx | CUR time |  CR blocks Tx | CUR blocks Tx |Tot blocks   |
---------|--------------|---------|----------------|----------|---------------|---------------|-------------|
1        |         29288|    12.17|          14914|      4.58|          27059|           17439|        88700|
2        |         21201|     5.75|          12835|      5.19|          26831|           16741|        77608|
3        |         22942|     4.64|          11751|      3.82|          23918|            9205|        67816|
4        |         32075|     4.78|          19260|      4.55|          31046|           19331|       101712|
---------|--------------|---------|----------------|----------|---------------|---------------|-------------|

PL/SQL procedure successfully completed.

Column key for the above script is:
CR block RX : GC CR blocks received
CR time : Average GC CR receive time
CUR blocks RX: GC CUR blocks received
CUR time: Average GC CuR receive time
CR blocks TX: GC CR blocks transmitted
CUR blocks TX: GC CUR blocks transmitted
tot blocks: Sum of transmitted + received for both CR and CUR traffic

Breakup: global cache performance

Global cache processing performance can be broken down to its sub components ( excluding wire latency, but that can be calculated). Breakup of this metrics must be done at a granular level also. We can use yet another script gc_traffic_processing.sql to review the performance metrics for the past N seconds. For example, in the output printed below, Global cache CR processing time is broken in to various sub-components of global cache processing. Remember that these performance metrics are in the SEND side of Global cache performance metrics. Meaning, inefficient global cache processing in one instance affects other instances’ global cache receive time, but not the self. For example, in the table above, GC CR receive performance was at 12.1ms for instance 1. That is due to another instance (probably node 4 in this case) suffering from high GC processing time (more specifically high ‘CR block flush time’ in node 4). It is far more accurate to review global cache performance issues in a granular basis. Caution though, try not to specify at least a minute so that these numbers are consistent.

@gc_traffic_procssing.sql
gc_traffic_processing.sql v1.00 by Riyaj Shamsudeen @orainternals.com

...Prints various timing related information for the past N seconds
...Default collection period is 60 seconds.... Please wait for at least 60 seconds...
Enter value for sleep:

---------|-----------|---------|-----------|----------|------------|------------|------------|----------|
Inst     | CR blk Tx | CR bld  | CR fls tm | CR snd tm| CUR blk TX | CUR pin tm | CUR fls tm |CUR snd tm|
---------|-----------|---------|-----------|----------|------------|------------|------------|----------|
1        |      62732|       .2|       1.25|       .21|       48920|        .96|         .22|        .22|
2        |      46037|      .13|        .76|        .2|       42019|        .68|         .17|         .2|
3        |      72237|      .63|        .79|       .19|       22697|        .44|         .29|        .23|
4        |     113816|      .13|       2.04|       .15|       51020|        .87|         .12|        .13|
--------------------------------------------------------------------------------------------------------

PL/SQL procedure successfully completed.

Here is the column definitions for the script output:
CR blk TX: GC CR blocks served
CR bld : Average GC CR build time/CR block served
CR fls tm: Average GC CR flush time/CR block served
CR snd tm: Average GC CR send time/CR block served
CUR blk TX: GC CUR blocks served
CUR pin tm: Average GC Current pin time /CUR block served
CUR fls tm : Average GC Current flush time/CUR block served
CUR snd tm: Average GC current send time/CUR block served

Review histograms for GC events

Averages can be misleading. Revealing performance issues underneath the average is a very important task to do. For example, we see that ‘gc cr block receive time’ average is high. But, let’s review the histogram for the gc events related to Global cache receive time. My script gc_event_histogram.sql( refer to zip file) is quite handy to review event histograms. This script provides a difference in performance statistics between two samples and prints them in a readable format. In this case, node 1 is has 8578 waits for the event ‘gc cr block 2-way’ under 2ms and the sessions in instance 1 had 9851 waits between 2-4ms. Only few waits over 16ms.

@gc_event_histogram.sql

gc_event_histogram.sql v1.0 by Riyaj Shamsudeen @orainternals.com

Default collection period is sleep seconds. Please wait..
Enter value for event: gc cr block 2-way
Enter value for sleep: 60
---------|-----------------------|----------------|----------|
Inst id  | Event                 |wait time milli |wait cnt  |
---------|-----------------------|----------------|----------|
1        |gc cr block 2-way      |               1|        98|
1        |gc cr block 2-way      |               2|      8578|
1        |gc cr block 2-way      |               4|      9851|
1        |gc cr block 2-way      |               8|      2291|
1        |gc cr block 2-way      |              16|       117|
1        |gc cr block 2-way      |              32|         6|
1        |gc cr block 2-way      |              64|         1|
1        |gc cr block 2-way      |             128|         2|
1        |gc cr block 2-way      |             256|         0|
1        |gc cr block 2-way      |             512|         0|
1        |gc cr block 2-way      |            1024|         0|
1        |gc cr block 2-way      |            2048|         0|
1        |gc cr block 2-way      |            4096|         0|
1        |gc cr block 2-way      |            8192|         0|
1        |gc cr block 2-way      |           16384|         0|
2        |gc cr block 2-way      |               1|       155|
2        |gc cr block 2-way      |               2|     10792|
2        |gc cr block 2-way      |               4|     14201|
2        |gc cr block 2-way      |               8|      3887|
2        |gc cr block 2-way      |              16|       204|
...

But, we also need to see 3-way waits. Let’s review 3-way waits with the above script. We will simply supply ‘gc cr block 3-way’ as the wait event. In this case, we see that there are 12,062 waits over between 4-8ms. Of course, this must be improved.


@gc_event_histogram.sql

gc_event_histogram.sql v1.0 by Riyaj Shamsudeen @orainternals.com

Default collection period is sleep seconds. Please wait..
Enter value for event: gc cr block 3-way
Enter value for sleep: 60
---------|-----------------------|----------------|----------|
Inst id  | Event                 |wait time milli |wait cnt  |
---------|-----------------------|----------------|----------|
1        |gc cr block 3-way      |               1|         3|
1        |gc cr block 3-way      |               2|       320|
1        |gc cr block 3-way      |               4|     17017|
1        |gc cr block 3-way      |               8|     12062|
1        |gc cr block 3-way      |              16|      1725|
1        |gc cr block 3-way      |              32|       113|
1        |gc cr block 3-way      |              64|         0|
1        |gc cr block 3-way      |             128|         0|
1        |gc cr block 3-way      |             256|         0|
1        |gc cr block 3-way      |             512|         0|
1        |gc cr block 3-way      |            1024|         0|
1        |gc cr block 3-way      |            2048|         0|
1        |gc cr block 3-way      |            4096|         0|
1        |gc cr block 3-way      |            8192|         0|
...

In essence, these four scripts will be useful in debugging Global cache performance issues. Please let me know if you encounter errors in these scripts. We will use these scripts in future discussions to show how we resolved few RAC performance issues.
Update 1/13/12: Updated broken links.
Update 6/3/12: Updated script versions correctly.
Update 2/23/15: zip file containing all scripts.
racperf_gc_scripts.zip

29 Responses to “RAC performance tuning: Understanding Global cache performance”

  1. Statistique said

    Hi,

    Nice series of script. But gc_traffic_print.sql seems to have a little “bug”. If the sleeping period is too short or this is no activity on the database the following error happens :
    ERROR at line 1:
    ORA-01476: divisor is equal to zero
    ORA-06512: at line 121

    Environnement is Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 – 64bit Production
    With the Partitioning, Real Application Clusters, OLAP, Data Mining
    and Real Application Testing options

    One of those operation migth result into a division by zero :

    (e_cr_tm_recv (i) – b_cr_tm_recv (i))
    / (e_cr_blks_recv (i) – b_cr_blks_recv (i))

    I thought you might like to know !

    And oh !!! Happy holidays !

    • Thank you Statistique.

      I have modified the scripts to handle zero divisor issue. Please download the new versions.

      But, it is probably not a good idea to keep the time window too small either such as 1 or 2 seconds. Time out for few events are at 1 second and keeping the window too small can cause issues too.

      Cheers
      Riyaj

  2. Idem:

    SQL> /
    Enter value for sleep: 1
    old 77: select nvl (‘&sleep’,60) into l_sleep from dual;
    new 77: select nvl (‘1’,60) into l_sleep from dual;
    ———|————–|———|—————-|———-|—————|—————|————-|
    Inst | CR blocks Rx | CR time | CUR blocks Rx | CUR time | CR blocks Tx | CUR blocks Tx |Tot blocks |
    ———|————–|———|—————-|———-|—————|—————|————-|
    declare
    *
    ERROR at line 1:
    ORA-01476: divisor is equal to zero
    ORA-06512: at line 115

  3. Yasser said

    Thanks for sharing scripts.

    Eagerly waiting for future posts where you have used these scripts.

    -Yasser

  4. […] 16-How to diagnose global cache performance ? -Scripts collection Riyaj Shamsudeen-RAC performance tuning: Understanding Global cache performance […]

  5. karlarao said

    Cool script! this will be very useful.. 🙂

    – Karl Arao

  6. MirchiFM said

    Thanks for sharing the scripts.

  7. Hi ,

    I executed the script in my env but the output i got is of zero bytes..

    SQL> @awrrpt_all_gen.sql

    awrrpt_all_gen.sql v1.00 by Riyaj Shamsudeen @orainternals.com

    To generate AWR Report from all RAC instances concurrently.

    Creates reports using last two snap_ids.

    …Generating awrrpt_all.sql script…. Please wait….
    …Completed script generation.

    Executing awrrpt_all.sql to generate AWR reports.
    …Generates AWR reports with file name format awrrpt___.txt for each instance.
    …Please wait for few minutes…

    AWR reports created.
    SQL> !ls -lrt awrrpt_all*
    -rw-r–r– 1 osteelp dba 4247 Jul 19 14:32 awrrpt_all_gen.sql
    -rw-r–r– 1 osteelp dba 0 Jul 19 14:32 awrrpt_all.sql

    ??!!!

    thanks,
    baskar.l

    • Hello Baskar
      That’s because serveroutput is off in your session. Please turn on serveroutput on with “set serveroutput on” command, then rerun the script.
      Cheers
      Riyaj

  8. Hi Thanks very much Riyaj…Thanks again for sharing a very useful script..

    regards,
    baskar.l

  9. Hi Riyaj,

    Just got confused..i executed the script first time it generated the output..and executed again i didnt generate a output..

    First time

    SQL> set termout off
    SQL> set serveroutput on
    @awrrpt_all.sql
    SQL> set termout on
    set pagesize 24
    PROMPT AWR reports created.
    set serveroutput off
    …AWR report created for instance 1. Please wait..
    SQL> SQL> SQL> AWR reports created.
    SQL> SQL>

    second time when i executed the same script..it doesnt generate..

    SQL> PROMPT

    SQL> set termout off
    SQL> set serveroutput on
    @awrrpt_all.sql
    set termout on
    SQL> SQL> SQL> set pagesize 24
    SQL> PROMPT AWR reports created.
    AWR reports created.
    SQL> set serveroutput off
    SQL>

    thanks,
    baskar.l

    • It worked correctly both times for me. Are you calling awrpt_all_gen or awrrpt_all.sql?

      What version?

      SQL> @awrrpt_all_gen

      awrrpt_all_gen.sql v1.00 by Riyaj Shamsudeen @orainternals.com

      To generate AWR Report from all RAC instances concurrently.

      Creates reports using last two snap_ids.

      …Generating awrrpt_all.sql script…. Please wait…. …Completed script generation.

      Executing awrrpt_all.sql to generate AWR reports. …Generates AWR reports with file name format awrrpt___.txt for each instance. …Please wait for few minutes…

      …AWR report created for instance 1. Please wait.. AWR reports created. SQL> @awrrpt_all_gen

      awrrpt_all_gen.sql v1.00 by Riyaj Shamsudeen @orainternals.com

      To generate AWR Report from all RAC instances concurrently.

      Creates reports using last two snap_ids.

      …Generating awrrpt_all.sql script…. Please wait…. …Completed script generation.

      Executing awrrpt_all.sql to generate AWR reports. …Generates AWR reports with file name format awrrpt___.txt for each instance. …Please wait for few minutes…

      …AWR report created for instance 1. Please wait.. AWR reports created.

      Cheers

      Riyaj Shamsudeen Principal DBA, Ora!nternals – http://www.orainternals.com Specialists in Performance, Recovery and EBS11i Blog: https://orainternals.wordpress.com OakTable member Oracle ACE Co-author of the book”Expert Oracle Practices” http://tinyurl.com/book-expert-oracle-practices

      • Baskar
        As we discussed, serveroutput should be on as the script uses dbms_output utility. I have uploaded recent version of the script.

        Cheers
        Riyaj

  10. Adhika said

    Hi Riyaj,

    What is the difference between CR and Current? What are their relations?

    Thank you so much for the scripts.

    Thanks,
    Adhika

    • Hello Adhika
      Sorry for the late reply. CR means Consistent Read buffers, essentially, a buffer consistent at a SCN (time). When a SELECT statement accesses a buffer that session can not see uncommitted changes. So, undo records are applied to rollback uncommitted transactions creating a consistent copy of the buffer (database block) and SELECT statement can use that buffer.

      In RAC, When a session opens an access request for a buffer, the request will include SCN version of the block required. If there are pending transactions in the block in remote cache, then LMS process in the remote instance will apply undo records to create a consistent version of the block and then ship the block to the requestor. This is known as CR transfer.
      Current buffer means, that buffer includes latest changes, i.e. it is current. SELECT statement can access current mode buffers also, (1) if the buffer version in the memory and the block version in the disk are the same. (2) Buffer contains latest changes with no pending transactions.
      Obviously, application changes only happen in current mode buffers even if there are many outstanding transactions in that block already. Row level locking mechanism protect session stepping at each other.

      In a nutshell, CR mode buffers are use-and-throw buffers as the CR mode buffers are created consistent at a specific SCN. Current mode buffers contains all latest changes and it is this mode buffer that can be written to the disk. CR mode buffers can not be written to the disk and they are transient. (may or may not have pending transactions).
      Hope this helps.
      Cheers
      Riyaj

      • adhikarexuss said

        Hi Riyaj,

        Thank you so much for the explanation. It helps a lot to understand this.
        May I know which Oracle documentation also explain about this? (if exist)

        Thank you,
        Adhika

      • Hello Adhika
        I am sure that concepts manual will indicate this difference. But, I can’t find the page with this information though..

        Thanks
        Riyaj

  11. An Ng said

    Hi Riyah,

    Thank you for such as productive topic. Based on experience you’ve carried out analyzing GC performance, when you encounter poor performance in the GC, which causes are the most common and how did you get rid of them?

    Thanks in advance,

    An Ng

  12. Pat said

    Thanks for sharing these scripts!!! very usuful!

  13. chiteke said

    Great scripts, Riyaj

  14. getting 404 and other errors trying to get the scripts

    • Hi
      Which specific script is causing 404 errors ? I checked a few scripts and it works fine for me. Please let me know so that I can fix the problem.
      Thanks for reading.

      Cheers
      Riyaj

  15. ashz arora said

    Hi Riyaj,

    when i try to open the link that has these scripts i am getting 404 error , may be the link of your scripts have become invalid , can u pls repost these scripts .

    bless you
    Ashz Arora

  16. Yousuf said

    Getting this error while clicking on script section……

    Please request to have correction on broken link..

    ———————————————————————–
    Page Not Found

    The page you are looking for might have been removed, had its name changed, or is temporarily unavailable. Please try the following:

    If you typed the page address in the Address bar, make sure that it is spelled correctly.
    Click the Back button in your browser to try another link.
    Use a search engine like Google to look for information on the Internet.
    ————————————————————————

  17. Dallas Deeds said

    Riyaj,

    The scripts page is gone!

    • Hello Dallas
      This blog entry has a zip file with all scripts. Please download that zip file and use the scripts. I updated the blog entry removing the links to my scripts folder (it is long gone).

  18. […] in RAC?Ans:http://satya-racdba.blogspot.com/2012/10/wait-events-in-oracle-rac-wait-events.htmlhttps://orainternals.wordpress.com/2009/12/23/rac-performance-tuning-understanding-global-cache-perfo…gc buffer busygc buffer busy acquiregc current request gc cr requestgc cr failure gc […]

Leave a reply to Adhika Cancel reply