Oracle database internals by Riyaj

Discussions about Oracle performance tuning, RAC, Oracle internals & E-Business Suite.

Dude, where is my redo?

Posted by Riyaj Shamsudeen on June 12, 2013

This blog entry discusses a method to identify the objects generating a high amount of redo. First, we will establish that redo size increased sharply, and then identify the objects responsible. Unfortunately, redo size is not tracked at the segment level. You can make an educated guess using the ‘db block changes’ statistic, but for a rigorous answer you must use the LogMiner utility to identify the objects generating the most redo.

Detecting redo size increase

AWR tables (which require the Diagnostics Pack license) can be queried to identify the redo size increase. The following query spools the daily redo size. You can open the output file redosize.lst in an Excel spreadsheet and graph the data to visualize the change in redo size. Use the pipe symbol as the delimiter when opening the file in Excel.

spool redosize.lst
REM  You need the Diagnostics Pack license to execute this query!
REM  Author: Riyaj Shamsudeen
col begin_interval_time format a30
set lines 160 pages 1000
col end_interval_time format a30
set colsep '|'
alter session set nls_date_format='DD-MON-YYYY';
with redo_sz as (
SELECT  sysst.snap_id, sysst.instance_number, begin_interval_time ,end_interval_time ,  startup_time,
VALUE - lag (VALUE) OVER ( PARTITION BY  startup_time, sysst.instance_number
                ORDER BY begin_interval_time, startup_time, sysst.instance_number) stat_value,
EXTRACT (DAY    FROM (end_interval_time-begin_interval_time))*24*60*60+
            EXTRACT (HOUR   FROM (end_interval_time-begin_interval_time))*60*60+
            EXTRACT (MINUTE FROM (end_interval_time-begin_interval_time))*60+
            EXTRACT (SECOND FROM (end_interval_time-begin_interval_time)) DELTA
  FROM sys.wrh$_sysstat sysst , DBA_HIST_SNAPSHOT snaps
WHERE (sysst.dbid, sysst.stat_id) IN ( SELECT dbid, stat_id FROM sys.wrh$_stat_name WHERE  stat_name='redo size' )
AND snaps.snap_id = sysst.snap_id
AND snaps.dbid =sysst.dbid
AND sysst.instance_number=snaps.instance_number
and begin_interval_time > sysdate-90
)
select instance_number, 
  to_date(to_char(begin_interval_time,'DD-MON-YYYY'),'DD-MON-YYYY') dt 
, sum(stat_value) redo1
from redo_sz
group by  instance_number,
  to_date(to_char(begin_interval_time,'DD-MON-YYYY'),'DD-MON-YYYY') 
order by instance_number, 2
/
spool off

Visualizing the data will help you quickly identify anomalies in the redo generation pattern. Here is an example graph created from the Excel spreadsheet; notice that redo size increased sharply in the recent period.

[Figure: screenshot_redo — graph of daily redo size, showing the recent increase]

Guess the object using ‘db block changes’ statistics

A quick method to guess which objects generate the most redo is to use the ‘db block changes’ statistic. The premise behind this technique is that a heavily modified object will probably generate more redo. That is not an entirely accurate assumption, though: a less frequently modified object can still generate more redo, and vice versa. If you are lucky, one or two objects will stand out as a problem, and you can review those segments further to reduce redo size.

@segment_stats.sql
To show all segment level statistics in one screen

Enter value for statistic_name: db block changes
old   6:           where value >0  and statistic_name like '%'||'&&statistic_name' ||'%'
new   6:           where value >0  and statistic_name like '%'||'db block changes' ||'%'
   INST_ID STATISTIC_NAME                 OWNER        OBJECT_NAME        OBJECT_TYP        VALUE   PERC
---------- ------------------------------ ------------ ----------------- ---------- ------------ ------
         1 db block changes               INV          SALES_TEMP_N1     INDEX        3831599856  48.66
         3                                INV          MTL_RESERV        TABLE        3794818912  23.78
         3                                ZX           DET_FACTORS_      INDEX        2468120576  15.47
         2                                APPLSYS      FND_TAB           TABLE        2346839248  16.33
….
  

The segment_stats.sql script can be downloaded from segment_stats.
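
If you do not have the script handy, here is a minimal equivalent sketch that queries gv$segment_statistics directly (a standard view; the perc column here is computed with ratio_to_report and only approximates the PERC column in the script output above):

-- Top 20 segments by 'db block changes' across all instances;
-- perc is each segment's share of the total 'db block changes'.
select *
from (
  select inst_id, statistic_name, owner, object_name, object_type, value,
         round(100 * ratio_to_report(value) over (), 2) perc
  from gv$segment_statistics
  where statistic_name = 'db block changes'
    and value > 0
  order by value desc
)
where rownum <= 20
/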

Identify objects using LogMiner

The rigorous method to identify the objects generating the most redo uses the LogMiner package (DBMS_LOGMNR). Objects can be identified with the following steps:

Step 1: Start LogMiner as the SYS or SYSTEM user in SQL*Plus. The example given here mines an archived log file from the finprod2 instance.

begin
  sys.dbms_logmnr.ADD_LOGFILE ('/opt/app/prod/finprod2/arch/finprod_212_2_1212121221.arch');
end;
/
begin
  sys.dbms_logmnr.START_LOGMNR;
end;
/
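
If you are unsure which archived log file covers the interval of interest, you can locate candidates through v$archived_log. A minimal sketch; the thread number and time window below are hypothetical:

-- List archived logs for thread 2 covering a suspect two-hour window.
select name, thread#, sequence#, first_time, next_time
from v$archived_log
where thread# = 2
  and first_time between to_date('12-JUN-2013 10:00','DD-MON-YYYY HH24:MI')
                     and to_date('12-JUN-2013 12:00','DD-MON-YYYY HH24:MI')
order by sequence#
/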

Step 2: Create a table by querying the v$logmnr_contents dynamic performance view. I tend to create a separate table for each archived log file, for two reasons: (a) it improves query performance, and (b) I have not tested this thoroughly with multiple archived log files. The following SQL statement finds the length of each redo record by subtracting the RBA (Redo Byte Address) of the current record from the RBA of the next record. The RBA gives the physical location of a redo record in a redo log file, so the difference between the current record's RBA and the next record's RBA is the length of the current record.

Update 1: As Greg pointed out in the comments section, I was using a hard-coded 512 bytes for the redo block size in my script, which is correct on the Solaris and Linux platforms. On HP-UX, however, the redo block size is 1024 bytes. You can use the following SQL statement to identify the redo block size; I have modified the create table script to query the redo block size dynamically.

SQL> select max(lebsz) from x$kccle;

MAX(LEBSZ)
----------
       512

drop table redo_analysis_212_2;
CREATE TABLE redo_analysis_212_2 nologging AS
SELECT data_obj#, oper,
  rbablk * le.bsz + rbabyte curpos,
  lead(rbablk*le.bsz+rbabyte,1,0) over (order by rbasqn, rbablk, rbabyte) nextpos
FROM
  ( SELECT DISTINCT data_obj#, operation oper, rbasqn, rbablk, rbabyte
  FROM v$logmnr_contents
  ORDER BY rbasqn, rbablk, rbabyte
  ) ,
  (SELECT MAX(lebsz) bsz FROM x$kccle ) le 
/
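
Once the analysis table is created, the LogMiner session can be ended to release its resources:

begin
  sys.dbms_logmnr.end_logmnr;
end;
/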

Step 3: Query the table to identify the object_name: In this step, we join the table created and obj$ table to identify the objects inducing redo size. Outer join is needed as the object may have been dropped recently. START indicates the redo record for the start of a transaction and COMMIT indicates the redo record for the end of a transaction.

set lines 120 pages 40
column data_obj# format  9999999999
column oper format A15
column object_name format A60
column total_redo format 99999999999999
compute sum label 'Total Redo size' of total_Redo on report
break on report
spool /tmp/redo_212_2.lst
select data_obj#, oper, obj_name, sum(redosize) total_redo
from
(
select data_obj#, oper, obj.name obj_name , nextpos-curpos-1 redosize
from redo_analysis_212_2 redo1, sys.obj$ obj
where (redo1.data_obj# = obj.obj# (+) )
and  nextpos !=0 -- For the boundary condition
and redo1.data_obj#!=0
union all
select data_obj#, oper, 'internal ' , nextpos-curpos  redosize
from redo_analysis_212_2 redo1
where redo1.data_obj# = 0
and nextpos!=0
)
group by data_obj#, oper, obj_name
order by 4
/
...
      46346 INSERT          WSH_EXCEPTIONS                        87006083
   12466144 INTERNAL        MSII_N9                               95800577
   12427363 INTERNAL        MSII_N1                               96445137
          0 START           internal                             125165844
          0 COMMIT          internal                             205600756
   12960642 UPDATE          XLA_GLT_1234567890                   243625297
                                                           ---------------
Total Redo                                                      3681252096
spool off

Notice that the objects identified using the LogMiner tool do not match the objects identified from the ‘db block changes’ statistics. In this example, the discrepancy is probably because the segment statistics are cumulative from instance startup, whereas LogMiner analyzes one archived log file; the segment-level view may therefore not reflect recent activity accurately.

In summary, the LogMiner utility can be used to identify the objects generating the most redo. This will help you understand why redo generation is higher and may give you a mechanism to reduce it.

21 Responses to “Dude, where is my redo?”

  1. Gargi said

    Awesome information. Thanks Riyaz.

  2. Greg said

    Thanks for this post. I'm wondering why you are using 512 as the multiplier?
    As far as I know, the redo block size is not hardcoded as 512 bytes. Please explain.
    Greg

    • Greg
      Thanks for reading my blog. Yes, you are absolutely correct. I have modified the script to query the redo block size, rather than hard-coding it to 512 bytes. Thank you for your feedback.

      Cheers
      Riyaj

  3. Suresh said

    Hi Riyaz

    Excellent Post,

    Regarding the objects mismatch: data_obj# represents data_object_id; if so, could that be the cause of the mismatch?

    • Hello Suresh
      I think you are referring to the outer join. The outer join is needed because the object can be dropped, and the redo vector will then refer to the dropped object. Please clarify your question if I have misunderstood.
      Thanks for reading.

  4. Yasir said

    Hi Riyaj,
    It's really sad that there is no way other than LogMiner to identify the objects and statements causing high redo.
    What about a production instance? All of a sudden redo generation goes high, switching redo logfiles every 3 seconds. What do we do in that case? If the database is in noarchivelog mode, it becomes even more difficult to find the cause of the problem.
    There is no view, either, where we can get historical information about the queries and objects that generated high redo in the past.

    • Greg said

      Hi Yasir,
      start taking snapshots of v$sesstat (look for ‘redo size’) or use the Snapper tool (google ‘snapper Tanel’). You can check ‘db block changes’ statistics from v$segment_statistics as well.
      Regards
      Greg
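
      A minimal sketch of such a v$sesstat snapshot (standard views; capture it periodically and diff the values between runs to see the per-session rate):

      -- Redo generated per session since logon.
      select s.sid, s.serial#, s.username, st.value redo_bytes
      from v$session s, v$sesstat st, v$statname sn
      where sn.name = 'redo size'
        and st.statistic# = sn.statistic#
        and st.sid = s.sid
        and st.value > 0
      order by st.value desc
      /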

      • Yasir said

        Hi Greg,
        Thanks for your reply. The problem is that some of our client databases generate lots of redo. By the time we get a ticket and the user enquires about it, the event has usually passed. For example, on 21-Jun-2013 from 10am to 12pm, a lot of redo was generated; by the time the user enquires, the high-redo interval is over. In such cases, how do we find out historically what objects and sessions generated high redo during that time?

  5. pradeep said

    Hi, this was a very useful post. I used LogMiner to analyze one of our databases, which suddenly started seeing higher redo. I compared 12-hour periods on either side of the behavior change. All the segment operations generated the same redo; all the difference in redo is attributed to commit redo. Commits suddenly started generating higher redo. We dumped the LogMiner info into two tables: previous_stats shows the info for the normal redo period, and after_stats shows the info for the higher redo period.

    This is the info for the period which had normal redo.

    SQL> select count(distinct timestamp), count(0) from previous_stats where operation='COMMIT';

    COUNT(DISTINCTTIMESTAMP)   COUNT(0)
    ------------------------ ----------
                       42823    1906277

    SQL> select sum(GREATEST(nextpos-currpos,0))/1024/1024/1024 from previous_stats where operation='COMMIT';

    SUM(GREATEST(NEXTPOS-CURRPOS,0))/1024/1024/1024
    -----------------------------------------------
                                         1.57005943

    SQL>

    This is the info for the period which had higher redo. Observe that a similar number of redo records and a similar number of distinct timestamps (I am assuming these represent the number of commits actually done) generated almost three times the redo.

    SQL> select count(distinct timestamp), count(0) from after_stats where operation='COMMIT';

    COUNT(DISTINCTTIMESTAMP)   COUNT(0)
    ------------------------ ----------
                       43079    1870068

    SQL> select sum(GREATEST(nextpos-currpos,0))/1024/1024/1024 from after_stats where operation='COMMIT';

    SUM(GREATEST(NEXTPOS-CURRPOS,0))/1024/1024/1024
    -----------------------------------------------
                                         4.81485652

    SQL>

    How can we troubleshoot this further? What could have changed to cause this? I am happy to run more queries and post the results.

    • Pradeep
      Thanks for reading. Visually read a few redo records to see if the redo record contains both DML activity and the commit operation. Can you also please send me AWR reports for these two timeframes?

  6. pradeep said

    BTW, this is how I created the after_stats/previous_stats tables after invoking LogMiner.

    create table after_stats as
    select seg_name, operation, timestamp,
           lead(rbablk*512+rbabyte,1,0) over (order by rbasqn, rbablk, rbabyte) as nextpos,
           (rbablk*512+rbabyte) as currpos
    from v$logmnr_contents;

  7. pradeep said

    What do you mean by visually read? Rather, how can I visually read a redo record? I cannot share the AWR report as per our privacy policy. I wish I could.

  8. If you have the archive log, you can dump the logfile and inspect the redo records visually. The queries in this blog post calculate the length of redo records, but to see the content of a redo record, one must dump the redo log / archived log file.

    Please refer to Julian’s blog for the command to dump redo log:

    http://www.juliandyke.com/Diagnostics/Dumps/RedoLogs.html
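
    For reference, the basic form of the dump command (see Julian's page for the full options; the file name here reuses the archived log from the example above) is:

    alter system dump logfile '/opt/app/prod/finprod2/arch/finprod_212_2_1212121221.arch';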

    My blog entry will give you details about reading redo records:

    http://orainternals.wordpress.com/2010/11/04/does-an-update-statement-modify-the-row-if-the-update-modifies-the-column-to-same-value/

    About the AWR reports: that's fine, I understand if you can't share them. But compare the stats around consistent redo record generation, stats such as:
    data blocks consistent reads – undo records applied
    transaction table consistent reads – ..
    etc. I have a theory and was trying to prove or disprove it by reviewing the AWR reports.

    Cheers
    Riyaj

  9. pradeep said

    Thanks much for the advice on analyzing a redo log. However, I ran into too many questions. I dumped one archived redo log and mined the same log. The size of the file is accounted for by the (nextpos-currpos) technique you described in this article, but I am not able to correlate the logmnr_contents entries to the dump file; there are far too many entries in the logmnr_contents table. Can you help me with how to correlate the entries? Just help me find the entries in the dump file which correspond to one row in the logmnr_contents table.

  10. pradeep said

    BTW,

    all the stats like

    data blocks consistent reads – undo records applied
    transaction table consistent reads-..

    are similar in both AWR reports.

    The only other stat which differs by a great degree is physical reads.

  11. pradeep said

    I might be onto something. I see that there are a lot of records for opcode 23.2 in the redo log dump. This is from the redo log after the higher redo generation started. Almost all of these records point to one table.

    First of all, what is opcode 23.2? I understand it is redo related to block writes, but I do not have concrete info.

    • Pradeep said

        Okay, I got to the bottom of the problem. There was a big table without an index. This ramped up the rate of physical reads as the table grew and more queries ran against it. With DB_LOST_WRITE_PROTECT set to TYPICAL, each physical read generates about 30 bytes of redo (http://externaltable.blogspot.com/2013/03/testing-lost-writes-in-oracle-and-with.html?m=1). I did some math and this seemed about correct in my case too. Half of the redo is now being consumed by these entries in our case!!!! So we are on a fast-track engineering cycle to create an index on that table to reduce physical reads and hence the redo generation rate.

        One question though: why did LogMiner show the higher redo under OPERATION=COMMIT? It threw me off. But thanks to the redo analysis, I was able to drill down.

      • Pradeep
        Impressive analysis! Yep, BRRs (block read records) will explain the increase in redo. You should also compare the AWR reports now and identify a method to detect this problem in the future without dumping the redo stream.

  12. pradeep said

    In the ‘Instance Activity Stats’ section, look for the values of ‘redo size’, ‘redo size for direct writes’ and ‘redo size for lost write detection’. In my case the redo size increased between the two periods, and all of the increase is because ‘redo size for lost write detection’ increased. If an increase in ‘redo size for lost write detection’ did not contribute to the increase in redo size, then it is safe to say the transaction rate increased. Another clue would be to check the physical reads value and see if DB_LOST_WRITE_PROTECT is set to TYPICAL.

    All put in text, it is less than 3 lines. But it took me 8 hours to troubleshoot :)
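
    A sketch of the same check against AWR history (Diagnostics Pack license required, as with the query at the top of this post):

    -- 'redo size' vs 'redo size for lost write detection' per snapshot.
    -- Values are cumulative since instance startup; diff consecutive snapshots.
    select sn.begin_interval_time, ss.instance_number, ss.stat_name, ss.value
    from dba_hist_sysstat ss, dba_hist_snapshot sn
    where ss.snap_id = sn.snap_id
      and ss.dbid = sn.dbid
      and ss.instance_number = sn.instance_number
      and ss.stat_name in ('redo size', 'redo size for lost write detection')
    order by sn.begin_interval_time, ss.instance_number, ss.stat_name
    /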

    • Pradeep
      Thanks for sharing. Hindsight is 20/20, so it is easier to see the pattern after knowing the root cause :) But one must go through proper analysis to identify the root cause, and that time is well spent. I am sure this analysis will come in handy in a future problem analysis.

  13. Rahul said

    Very good explanation.

    In addition to this, can we check which user process is generating more redo?
