Oracle database internals by Riyaj

Discussions about Oracle performance tuning, RAC, Oracle internal & E-business suite.

Archive for the ‘recovery’ Category

How to reformat corrupt blocks which are not part of any segment?

Posted by Riyaj Shamsudeen on July 11, 2014

There was a question in . The problem: there were many corrupt blocks in the SYSTEM tablespace not belonging to any segment. Both DBV and RMAN throw errors, and RMAN backups fill v$database_block_corruption with numerous rows. The OP asked whether these blocks could be reinitialized. Note 336133.1 is also relevant to the issue at hand.

$ dbv file=/oracle/SID/system_1/system.data1

DBVERIFY: Release 10.2.0.5.0 - Production on Fri Jul 11 08:04:18 2014

Copyright (c) 1982, 2007, Oracle.  All rights reserved.

DBVERIFY - Verification starting : FILE = /oracle/SID/system_1/system.data1
Block Checking: DBA = 67121421, Block Type = Save undo data block
ERROR: SAVE Undo Block Corrupted.  Error Code = 50
kts4subck: record (3) seq# (0), split flag (0)
and total pieces(0)
…
REM Many such corruptions.
select * from v$database_block_corruption order by 1,2,3,4;

FILE# BLOCK# BLOCKS CORRUPTION_CHANGE# CORRUPTIO
----- ------ ------ ------------------ ---------
    1  12557      1                  1 UNKNOWN
    1  12561      1                  1 UNKNOWN
    1  12589      1                  1 UNKNOWN
    1  12593      1                  1 UNKNOWN
..

From the output above, the corrupt blocks are save undo blocks, which probably contain undo records referring to other data files as well. Since these blocks do not belong to any segment (note 336133.1 has a SQL statement to verify that), we should be able to force the allocation of these blocks to a table and load that table with numerous rows, which, in theory, should reinitialize the blocks. That should eliminate the RMAN and DBV errors.

First, let me take a selfie

Posted in corruption, Oracle database internals, recovery | 1 Comment »

SCN – What, why, and how?

Posted by Riyaj Shamsudeen on January 19, 2012

In this blog entry, we will explore the wonderful world of SCNs and how the Oracle database uses SCNs internally. We will also explore a few new bugs and clarify a few misconceptions about the SCN itself.

What is SCN?

The SCN (System Change Number) is a primary mechanism for maintaining data consistency in the Oracle database. SCNs are used primarily in the following areas (of course, this is not a complete list):

  1. Every redo record has an SCN version of the redo record in the redo header (and redo records can have non-unique SCNs). Given redo records from two threads (as in the case of RAC), recovery will order them by SCN, essentially maintaining a strict sequential order. As explained in my paper, every redo record also has multiple change vectors.
  2. Every data block also has a block SCN (aka block version). In addition, a change vector in a redo record carries the expected block SCN, meaning that a change vector can be applied to one and only one version of the block. The code checks that the target SCN in a change vector matches the block SCN before applying the redo record. If there is a mismatch, corruption errors are thrown.
  3. Read consistency also uses SCNs. Every query has a query environment, which includes an SCN captured at the start of the query. A session can see a transaction's changes only if that transaction's commit SCN is lower than the query environment SCN.
  4. Commit. Every commit generates an SCN, aka the commit SCN, that marks a transaction boundary. Group commits are possible too.
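The read-consistency rule in item 3 reduces to a simple comparison; here is a minimal sketch assuming a toy model (this is not Oracle's actual visibility code, and the function name is illustrative):

```python
# Simplified read-consistency check: a session sees a transaction's changes
# only if the transaction's commit SCN is lower than the SCN captured in the
# query environment at the start of the query.
def is_visible(commit_scn: int, query_env_scn: int) -> bool:
    return commit_scn < query_env_scn

query_env_scn = 0x32B4C9                    # SCN captured at query start
print(is_visible(0x32B40B, query_env_scn))  # committed earlier -> True
print(is_visible(0x32B567, query_env_scn))  # committed later   -> False
```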

SCN format

The SCN is a huge number with two components: base and wrap. Wrap is a 16-bit number and base is a 32-bit number, written in the format wrap.base. When the base exceeds 4 billion (2^32), the wrap is incremented by 1; essentially, wrap counts the number of times the base has wrapped around. A few simple SQL scripts will enumerate this better:
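Before the SQL, the wrap.base arithmetic itself can be sketched outside the database (an illustrative Python sketch, not Oracle code): the SCN is simply wrap × 2^32 + base.

```python
# SCN as wrap.base: wrap is a 16-bit count of how many times the 32-bit base
# has wrapped around 2^32 (~4 billion).
def scn_from_wrap_base(wrap: int, base: int) -> int:
    return (wrap << 32) + base

def wrap_base_from_scn(scn: int) -> tuple[int, int]:
    return scn >> 32, scn & 0xFFFFFFFF

# The wrap.base pair 0x0910.62fd2f34, as seen in trace file dumps:
scn = scn_from_wrap_base(0x0910, 0x62FD2F34)
print(hex(scn))  # 0x91062fd2f34
```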
But wait, there’s more!

Posted in 11g, corruption, Oracle database internals, Performance tuning, RAC, recovery | 28 Comments »

Does an UPDATE statement modify the row if the update modifies the column to same value?

Posted by Riyaj Shamsudeen on November 4, 2010

Introduction

If a table column is updated with the same value in a row, does the Oracle RDBMS engine modify the data? Or does the RDBMS engine have an optimization that skips the update, since the value of that column is not changing? This was the essence of a question asked on the Oracle-l list, and I think it is a good topic for further discussion. Jared Still came up with a fine method to understand this issue by measuring redo/undo size. We will explore the same questions with the redo log dump method in this blog entry.

The following lines show a test case creating a table, an index, and then populating a row in the table.

create table updtest (v1 varchar2(30));

create index updtest_i1 on updtest(v1);

insert into updtest values ('Riyaj');

commit;

REDO records and change vectors

If a row is modified by a SQL statement, Oracle Database generates redo records describing that change. Generated redo records are applied to the database blocks, taking the blocks to the next version. Direct-path inserts (aka nologging inserts) are not discussed in this blog entry.

Changes made by the SQL statement might be rolled back too. So, undo records are created in the undo block, and these undo records describe how to roll back the SQL changes. Redo records are generated describing the changes to those undo blocks as well. So, reviewing the generated redo records is sufficient to understand exactly what happens underneath.

To keep the redo log file dumps as clean as possible, we need a database with no activity. I have one such database, and I have disabled all automatic jobs in it. It is also important to dump the log file from another session to maintain clarity.

So, in the following script, we perform a checkpoint and switch the log file from session #1. Then we update the row with the same value from another session and commit. Finally, we switch the log file again from session #1.

Test case #1: Updating column to the same value
============
Session #1:
 alter system checkpoint;
 alter system switch logfile;
Session #2:
 update updtest set v1='Riyaj';
 commit;
Session #1: alter system switch logfile;

Essentially, the last redo log file has the redo records generated by executing the update statement. We need to dump that redo log file to see its contents.

Notice that I am using 'Riyaj' as the value to update, as I can easily search for the ASCII representation of my first name in hex in the dump file. I mean, come on, who would not know the ASCII representation of their first name in hex? Here is a way to get the hex values:

select dump('Riyaj',16) Hex from dual;

HEX
----------------------------
Typ=96 Len=5: 52,69,79,61,6a
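The same bytes can be reproduced outside the database; a quick sketch assuming single-byte ASCII characters, as in this test case:

```python
# Hex bytes of a string, matching the dump('Riyaj', 16) output above
# (assumes single-byte ASCII characters).
def hex_bytes(s: str) -> str:
    return ",".join(f"{b:x}" for b in s.encode("ascii"))

print(hex_bytes("Riyaj"))  # 52,69,79,61,6a
```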

We will use the following script to dump the last redo log file, using the 'alter system dump logfile' syntax:

-------------------------------------------------------------------------------------------
--  Script : dump_last_log.sql
--  This script will dump the last log file.
--  If the log file is big with enormous activity, this might take much resource.
--
--  Author : Riyaj Shamsudeen
--  No implied or explicit warranty!
-------------------------------------------------------------------------------------------
set serveroutput on size 1000000
declare
  v_sqltext varchar2(255);
begin
 select 'alter system dump logfile '||chr(39)||member||chr(39)  into v_sqltext
  from v$log  lg, v$logfile lgfile
  where lg.group# = lgfile.group# and
  lg.sequence# = (select sequence#-1 from v$log where status='CURRENT' )
  and rownum <2
 ;
  dbms_output.put_line ('Executing :'||v_sqltext);
  execute immediate v_sqltext;
end;
/

Analysis of redo records:

Dumping the redo log file creates a trace file in the user_dump_dest destination. Let's review the redo records associated with this statement.
Please refer to the output below; I am showing only the relevant details. A REDO RECORD contains multiple change vectors, and each change vector describes an atomic change to a block. The first change vector is for OBJ: 77534, and that object_id is associated with the table UPDTEST. CHANGE #1 specifies how to Update Row Piece (URP): update column 0 to the value '52 69 79 61 6a', which is 'Riyaj', in the data block with Data Block Address (DBA) 0x0100113d associated with object 77534.

REDO RECORD - Thread:1 RBA: 0x000071.00000002.0010 LEN: 0x021c VLD: 0x0d
SCN: 0x0000.0032b4c9 SUBSCN:  1 11/03/2010 10:51:02

CHANGE #1 TYP:2 CLS:1 AFN:4 DBA:0x0100113d OBJ:77534 SCN:0x0000.0032b40b SEQ:2 OP:11.5 ENC:0 RBL:0
KTB Redo
op: 0x11  ver: 0x01
..
KDO Op code: URP row dependencies Disabled
...
tabn: 0 slot: 0(0x0) flag: 0x2c lock: 1 ckix: 0
ncol: 1 nnew: 1 size: 0
Vector content:
col  0: [ 5]  52 69 79 61 6a <--- 'Riyaj'

Change vectors 2 and 3 are irrelevant for our discussion, so we will ignore them. CHANGE vector #4 is modifying the block with DBA 0x00c00694, which belongs to the object with object_id 4294967295. Objects with object_id close to 4294967295 (0xFFFFFFFF) are undo segments. This change vector holds an undo record, and that undo record describes how to roll back the change: update the row at slot 0, column 0 in the block with DBA 0x0100113d to '52 69 79 61 6a'.

CHANGE #4 TYP:0 CLS:18 AFN:3 DBA:0x00c00694 OBJ:4294967295 SCN:0x0000.0032b4a6 SEQ:1 OP:5.1 ENC:0 RBL:0
...
Undo type:  Regular undo        Begin trans    Last buffer split:  No
...
KDO undo record:
KTB Redo
op: 0x04  ver: 0x01
compat bit: 4 (post-11) padding: 0
op: L  itl: xid:  0x000a.00d.000005ae uba: 0x00c00996.01a7.2c
                      flg: C---    lkc:  0     scn: 0x0000.0032b3f1
KDO Op code: URP row dependencies Disabled
  xtype: XAxtype KDO_KDOM2 flags: 0x00000080  bdba: 0x0100113d  hdba: 0x0100113a
itli: 1  ispac: 0  maxfr: 4858
tabn: 0 slot: 0(0x0) flag: 0x2c lock: 0 ckix: 0
ncol: 1 nnew: 1 size: 0
Vector content:
col  0: [ 5]  52 69 79 61 6a

In a nutshell, the update of the table row with the same value updated the row value from 'Riyaj' to 'Riyaj'. Even though the supplied value and the current row value are the same, the update to the row piece must happen. This is important, as both external and internalized triggers need to fire correctly.

But, what happened to the index update?

We have an index on column v1, and we updated that indexed column too. Did Oracle update the index? No. If the value is unchanged at the index level, the RDBMS code does not update the index, a redo optimization feature. Only the table row piece is updated; the index is not.

Updating to a different value

To explain what happens at the index level, we need a test case that updates the column to a different value. This test case is similar to Test case #1, except that we are updating the column value from 'Riyaj' to 'RiyajS'.

Test case #2: Updating to a different value.
============
Session #1:
 alter system checkpoint;
 alter system switch logfile;
Session #2:
 update updtest set v1='RiyajS';
 commit;
Session #1: alter system switch logfile;

First Change vector is updating the table row piece to ‘RiyajS’.

CHANGE #1 TYP:2 CLS:1 AFN:4 DBA:0x0100113d OBJ:77534 SCN:0x0000.0032b4c9 SEQ:2 OP:11.5 ENC:0 RBL:0
...
tabn: 0 slot: 0(0x0) flag: 0x2c lock: 2 ckix: 0
ncol: 1 nnew: 1 size: 1
col  0: [ 6]  52 69 79 61 6a 53 <--RiyajS

CHANGE #3 below is updating the index leaf block. An update to an indexed column value results in delete(s) and insert(s) at the index level: Change #3 deletes the leaf row, and change vector #4 inserts a leaf row.

CHANGE #3 TYP:0 CLS:1 AFN:4 DBA:0x01001153 OBJ:77535 SCN:0x0000.0032b567 SEQ:1 OP:10.4 ENC:0 RBL:0
index redo (kdxlde):  delete leaf row
KTB Redo
...
REDO: SINGLE / -- / --
itl: 2, sno: 0, row size 17

CHANGE vector #4 below is inserting a leaf row in the index leaf block with the key value '06 52 69 79 61 6a 53 06 01 00 11 3d 00 00':

06: length of the key value
52 69 79 61 6a 53: 'RiyajS'
06: length of the rowid
01 00 11 3d 00 00: ROWID (notice the DBA of the block, 0x0100113d).

CHANGE #4 TYP:0 CLS:1 AFN:4 DBA:0x01001153 OBJ:77535 SCN:0x0000.0032b569 SEQ:1 OP:10.2 ENC:0 RBL:0
index redo (kdxlin):  insert leaf row
...
REDO: SINGLE / -- / --
itl: 2, sno: 1, row size 18
insert key: (14):  06 52 69 79 61 6a 53 06 01 00 11 3d 00 00
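As a sketch, the 14-byte insert key above is just length-prefixed pieces: a length byte, the column value, another length byte, and the 6-byte restricted rowid (the helper below is illustrative, not an Oracle API):

```python
# Rebuild an index leaf key as length-prefixed pieces:
# [len][column value][len][6-byte restricted rowid].
def leaf_key(value: bytes, rowid: bytes) -> bytes:
    return bytes([len(value)]) + value + bytes([len(rowid)]) + rowid

rowid = bytes.fromhex("0100113d0000")  # points into block DBA 0x0100113d
key = leaf_key(b"RiyajS", rowid)
print(len(key), key.hex(" "))  # 14 06 52 69 79 61 6a 53 06 01 00 11 3d 00 00
```

The same construction with b'Riyaj' yields the 13-byte key that appears in the undo change vectors.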

Undo change vectors

There are two more change vectors describing the undo block changes. CHANGE vector #6 specifies the undo record to roll back the change at the table block level. Basically, the pre-image of the row piece is captured here.

CHANGE #6 TYP:0 CLS:30 AFN:3 DBA:0x00c00832 OBJ:4294967295 SCN:0x0000.0032b4b1 SEQ:1 OP:5.1 ENC:0 RBL:0
...
KDO undo record:
KTB Redo
...
...
tabn: 0 slot: 0(0x0) flag: 0x2c lock: 0 ckix: 0
ncol: 1 nnew: 1 size: -1
col  0: [ 5]  52 69 79 61 6a

Changes #7 and #8 specify how to undo the changes at the index level: delete the current index entry and insert the old image of the index entry. Change #8 specifies to purge the leaf row with the key value '06 52 69 79 61 6a 53 06 01 00 11 3d 00 00' [RiyajS + rowid combo].

CHANGE #7 TYP:0 CLS:30 AFN:3 DBA:0x00c00832 OBJ:4294967295 SCN:0x0000.0032b569 SEQ:1 OP:5.1 ENC:0 RBL:0
...
Undo type:  Regular undo       Undo type:  Last buffer split:  No
...
index undo for leaf key operations
...
(kdxlre): restore leaf row (clear leaf delete flags)
key ( 13):  05 52 69 79 61 6a 06 01 00 11 3d 00 00

Change #7 specifies to restore the leaf row with the key value ’05 52 69 79 61 6a 06 01 00 11 3d 00 00′ [Riyaj + rowid combo].

CHANGE #8 TYP:0 CLS:30 AFN:3 DBA:0x00c00832 OBJ:4294967295 SCN:0x0000.0032b569 SEQ:2 OP:5.1 ENC:0 RBL:0
...
Undo type:  Regular undo       Undo type:  Last buffer split:  No
...
index undo for leaf key operations
...
(kdxlpu): purge leaf row
key ( 14):  06 52 69 79 61 6a 53 06 01 00 11 3d 00 00

In a nutshell, updating an indexed column to a different value deletes the current entry from the index leaf block and inserts an entry with the updated column value. In addition, two change vectors are added describing how to undo the changes at the leaf block.

Wait, did the RDBMS engine really delete from the index leaf block?

No, entries are not physically deleted from the index leaf block. Rather, the entry with the old value is marked with a D flag, and a new entry is added to the leaf block with the updated value. This is visible by dumping the leaf block of the index UPDTEST_I1. Please review the lines from the leaf block dump shown below: row #0 is for the value 'Riyaj' and row #1 is for the value 'RiyajS'. Row #0 is marked with a D flag, indicating it is a deleted entry; row #1 is the active current entry.

-- (Change history #2) Converting DBA 0x01001153 to decimal yields 16781651. 
-- You can use calculator for this conversion.
-- you can also use the following method:
-- select to_number('01001153','xxxxxxxx') from dual;

TO_NUMBER('01001153','XXXXXXXX
------------------------------
                      16781651

--converting from dba to file#, block#.
undef dba
select
dbms_utility.DATA_BLOCK_ADDRESS_FILE(&&dba)||','||
dbms_utility.DATA_BLOCK_ADDRESS_BLOCK(&&dba)
from dual
/

Enter value for dba: 16781651
4,4435
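The same conversion can be sketched offline: in a smallfile-tablespace DBA, the top 10 bits are the relative file number and the low 22 bits are the block number (a sketch; dbms_utility, as above, is the authoritative method):

```python
# Decode a data block address (DBA): for smallfile tablespaces, the top
# 10 bits are the relative file# and the low 22 bits are the block#.
def decode_dba(dba: int) -> tuple[int, int]:
    return dba >> 22, dba & ((1 << 22) - 1)

print(decode_dba(0x01001153))  # (4, 4435) -- file 4, block 4435
```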

Alter system dump datafile 4 block min 4435 block max 4435;

Trace file contents:
===================
row#0[7959] flag: ---D--, lock: 2, len=29
col 0; len 5; (5):  52 69 79 61 6a
col 1; len 6; (6):  00 00 00 00 00 00
col 2; len 6; (6):  01 00 11 3d 00 07
col 3; len 6; (6):  c0 91 86 99 e8 05

row#1[7905] flag: ------, lock: 2, len=30
col 0; len 6; (6):  52 69 79 61 6a 53
col 1; len 6; (6):  00 00 00 00 00 00
col 2; len 6; (6):  01 00 11 3d 00 07
col 3; len 6; (6):  c0 91 86 99 e8 05

Summary

If you have skipped all these hex dumps, I don't blame you. But if you have followed along, right on! To summarize:

  1. Updating the column value to the same value modifies the table block. Row piece in the table block is physically modified. But, the corresponding index entry is not modified.
  2. Updating the column value to a different value modifies both table and index blocks. Updating an indexed column value results in a delete and insert of index entries in the index leaf block.
  3. Deleting an index entry does not delete the entry physically; rather, it marks the entry with a D flag. Of course, future inserts into this block will reuse the entry at some point, so you don't necessarily lose that space permanently.

Of course, there are other scenarios, such as index leaf block splits and branch block splits, not covered in this blog entry.
Thanks to Mark Bobak and Jared Still for reviewing this blog entry and providing valuable contributions.

You can read this blog in a more conventional format: update to column value_v2.

Version: Oracle Database release 11.2.0.1
Change history #2: Fixed a typo with block number. Thanks to Sreejith Sreekantan for pointing out the error.

Posted in Oracle database internals, Performance tuning, recovery | 6 Comments »

Resolving corruption issue with file_hdrs dump

Posted by Riyaj Shamsudeen on June 18, 2009

One of our clients had an interesting and bizarre corruption issue: intermittently, their database was getting corrupted.

ORA-01171: datafile 178 going offline due to error advancing checkpoint
ORA-01122: database file 178 failed verification check
ORA-01110: data file 178: ‘/app/u04/oradata/somedb/some_data_01.dbf’
ORA-01251: Unknown File Header Version read for file number 178

The error messages printed above indicate that the file header is corrupted. This was not a one-time issue; it had recurred many times in the past 45 days or so. The database would become unusable, and the only option was to restore the database from backup and recover it. A painful, business-impacting issue; the client's frustration was understandable.

Of course, the client tried the normal channels to resolve this problem, without luck: opening service tickets with vendors, relinking the software stack, updating the dbid, modifying kernel parameters, replacing hardware, etc. The client had even planned to reinstall the OS soon. Still, the problem recurred every week. That's when they reached out to us.

file_hdrs dump

Realizing that the file header was corrupted, the first thing I wanted to do was to find out what exactly was in those file headers. What type of corruption was it? Was the block formatted with null characters, or were just a few fields corrupted? This would give more clues and point us in the right direction. So, I asked the client to perform a file_hdrs dump when the problem recurred. The following commands were sent to the client.

oradebug setmypid
oradebug dump file_hdrs 10
exit

The statements above dump the file headers at level 10. An oradebug file_hdrs dump prints the blocks from the file headers.

oradebug setmypid
oradebug dump controlf 10

The second dump (controlf dump) prints control file records to the trace file. The idea is to match the file headers against the control file's file header section, to see if we gain more insight into this corruption issue.

It is quite possible that this is a control file corruption; control file corruption can't be ruled out yet. Both file headers and files can be corrupted. Essentially, this needs cross-verification between the control file and the file headers to identify the root cause. At least, I was hopeful that this would point us in the right direction.

dd of file header block

I also requested the client to take a backup of the first few blocks of the corrupted file using the command 'dd if=filename of=filehdr.out bs=8192 count=2'. On UNIX platforms, this dd command copies the first two blocks (of block size 8K) into a flat file. After every corruption, the only workaround was to restore and recover the database, so I wanted to collect as much information as possible in one round.

What happens on Friday?

The client also pointed out that this issue happened only on Fridays. Various groups were checking the UNIX and SAN layers to see if there was anything special about Friday, looking for maintenance jobs that run on Fridays, etc. Nothing of any significance showed up. At this point, we were waiting for the problem to recur.

And then it did…

file_hdr and controlf trace files

We received the trace files from the client and reviewed them. We will concentrate on just one corrupted data file, even though 68 files were corrupted at this point.

The control file section for that file is printed below:

DATA FILE #32: 
(name #41) /app/u04/oradata/somedb/somedata1.dbf
 creation size=12800 block size=8192 status=0x1c head=41 tail=41 dup=1
 tablespace 32, index=33 krfil=32 prev_file=0
 unrecoverable scn: 0x0000.00000000 01/01/1988 00:00:00
 Checkpoint cnt:48 scn: 0x0910.62fd2f34 05/08/2009 10:30:32
 Stop scn: 0x0910.6305098e 05/08/2009 17:42:13
 Creation Checkpointed at scn:  0x0000.000935a6 04/30/2009 21:42:29

Nothing jumps out. The checkpoint SCN is recent enough [of course, this problem happened in May 2009].
Let's review the file header section for the corrupted file. Clearly, there is a corruption, since the controlf dump command itself is unable to print the correct file header version.

 
File header version cannot be determined due to corruption <<<<
Dump may be suspect
 V10 STYLE FILE HEADER:
	Compatibility Vsn = 169870080=0xa200300
	Db ID=1755378070=0x68a0f196, Db Name='SOMEDB'
	Activation ID=0=0x0
	Control Seq=3032023=0x2e43d7, File size=401792=0x62180
	File Number=18, Blksiz=8192, File Type=3 DATA

If the file header were filled with nulls, the section above starting with 'V10 STYLE FILE HEADER:' could not have been printed. The oradebug command complains that the file header version cannot be determined, while also printing fields from the file header. Confusing, and it doesn't make sense: the file header version is corrupt, yet fields from the file header can be printed. This means we might need to review the output of the dd command. But before going that route, I decided to compare the corrupted file header output with an uncorrupted file in the control file trace file, to see if we could spot any obvious issue(s).

controlf trace file for an uncorrupted file

The following two sections print the control file record for an uncorrupted file. The checkpoint SCN timeline matches between the corrupted and uncorrupted data file records. This gives us a clue that the control file records for these files may not be corrupted; genuinely, there must be a corruption in the file header itself.

DATA FILE #33: 
  (name #42) /app/u03/oradata/qnoldv01/ANOTHER_GOOD_FILE_01.dbf
creation size=12800 block size=8192 status=0xe head=42 tail=42 dup=1
 tablespace 33, index=34 krfil=33 prev_file=0
 unrecoverable scn: 0x0000.00000000 01/01/1988 00:00:00
 Checkpoint cnt:53 scn: 0x0910.641e926d 05/13/2009 16:25:26
 Stop scn: 0xffff.ffffffff 05/02/2009 13:41:42
 Creation Checkpointed at scn:  0x0000.000935c5 04/30/2009 21:42:31
 thread:1 rba:(0x3.e3bbd.10)

Let’s also review the file header section of an uncorrupted data file.

 Hot Backup end marker scn: 0x0000.00000000
 aux_file is NOT DEFINED 
 V10 STYLE FILE HEADER:
	Compatibility Vsn = 169870080=0xa200300
	Db ID=1908337442=0x71beeb22, Db Name='SOMEDB'
	Activation ID=0=0x0
	Control Seq=34895=0x884f, File size=2434616=0x252638
	File Number=33, Blksiz=8192, File Type=3 DATA

Comparing the corrupted and uncorrupted files does not show any insight either. Or does it?

Not so fast…

There is a difference in the file header section between these two file headers. Let me reprint both file headers side by side to see if the difference can be spotted easily.

Corrupted file:

 
File header version cannot be determined due to corruption
Dump may be suspect
 V10 STYLE FILE HEADER:
	Compatibility Vsn = 169870080=0xa200300
	Db ID=1755378070=0x68a0f196, Db Name='SOMEDB'
	Activation ID=0=0x0
	Control Seq=3032023=0x2e43d7, File size=401792=0x62180
	File Number=18, Blksiz=8192, File Type=3 DATA

Uncorrupted file:

 Hot Backup end marker scn: 0x0000.00000000
 aux_file is NOT DEFINED 
 V10 STYLE FILE HEADER:
	Compatibility Vsn = 169870080=0xa200300
	Db ID=1908337442=0x71beeb22, Db Name='SOMEDB'
	Activation ID=0=0x0
	Control Seq=34895=0x884f, File size=2434616=0x252638
	File Number=33, Blksiz=8192, File Type=3 DATA

Ahh.. right there. The DBID is different between these two files! How can the DBID be different in this situation? The client assured me that there were no transportable tablespaces or read-only tablespaces shared between databases here. How can the DBIDs differ between two files in the same database? Both tablespaces were open with read and write activity. We needed to follow this clue.

Corrupted file      :Db ID=1755378070=0x68a0f196, Db Name='SOMEDB'
...
Uncorrupted file    :Db ID=1908337442=0x71beeb22, Db Name='SOMEDB'
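A quick sanity sketch cross-checking the decimal and hex DBIDs shown above:

```python
# The two DBIDs from the file header dumps, decimal vs. hex.
corrupted_dbid = 1755378070    # from the corrupted file header
uncorrupted_dbid = 1908337442  # from the uncorrupted file header
print(hex(corrupted_dbid))     # 0x68a0f196
print(hex(uncorrupted_dbid))   # 0x71beeb22
```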

After I pointed out that the dbid differed between the uncorrupted and corrupted files, the client dumped control files from other development and test databases to see if they could locate a database with dbid 1755378070. No database had that dbid.

Finally..

With the dbid difference in mind, a few hours later, the client DBA recollected something. When they migrated from the old database server to the new database server, they used scp to move the database files. Of course, the client DBA did not want the scp connection to die when his VPN connection dropped, and so he scheduled the scp script from a cron entry, as any good DBA would.

After the migration, the system admins changed the oracle password. The machine was supposed to be shut down and kept in pristine condition for a few weeks before the hardware was returned. Unfortunately, it was not shut down.

I am sure you have figured out where I am going with this. There was a cron entry scheduled on that old DB server, and it was quietly scp'ing the files from the old database server to the new database server, corrupting the database. The client disabled that cron entry and also created a new ssh key to avoid such issues.

I heard that the client is now able to enjoy his Fridays :-)
This blog can be read in a traditional format: resolving_corruption_issue_with_filehdr_dumps.
PS: Thanks to the client for allowing me to blog about this issue.

Posted in corruption, Oracle database internals, recovery | 7 Comments »

 