Oracle database internals by Riyaj

Discussions about Oracle performance tuning, RAC, Oracle internal & E-business suite.

IOUG Collaborate 2015

Posted by Riyaj Shamsudeen on March 25, 2015

I will be presenting two topics in IOUG Collaborate 2015 in Vegas. Use the show planner and add my presentations to your schedule 🙂

Session #189: April 13 Monday 9:15 to 10:15AM Topic: Oracle Database 12c In-Memory Internals. Room Palm B

Session #145: April 13 Monday 12:45PM-1:45PM Topic: Tools and Techniques for Advanced Debugging in Solaris & Linux (mostly live demo). Room Palm B.

Posted in inmemory, Oracle database internals, Performance tuning, Presentations, RAC | Tagged: , , , , , | Leave a Comment »

RMOUG Training Days 2015

Posted by Riyaj Shamsudeen on January 19, 2015

I will be talking in Rocky Mountain Oracle User Group Training Days 2015( http://www.rmoug.org), with live demos (hopefully there will be no failures in the demo). My topics are:

Feb 17: Deep dive: 3:15PM to 5:15PM – RAC 12c optimization: I will discuss RAC global cache layer in detail with a few demos. You probably can’t find these deep Global Cache layer details anywhere else 🙂

Feb 19: Wednesday: 2:45PM to 3:45PM – Advanced UNIX tools: I will discuss both Solaris and Linux advanced tools to debug deep performance issues.

Feb 19: Wednesday: 12:15PM – 1:15PM – Exadata SIG panel with Alex Fatkulin.

Come to Denver. Come on, it won’t be cold ( I think 🙂 )

Uploading presentation and scripts of the presentation files: RMOUG_2015_Riyaj_RAC_12c_optim files

Presentation files for Unix tools demos: RMOUG_2015_Riyaj_Unix_tools

Posted in 11g, 12c, Presentations, RAC | Tagged: , , , | 1 Comment »

In-memory pre-population speed

Posted by Riyaj Shamsudeen on October 6, 2014

While presenting at Oaktable World 2014 in San Fransisco, I discussed the in-memory pre-population speed and indicated that it takes about 30 minutes to 1 hour to load ~300GB of tables. Someone asked me “Why?” and that was a fair question. So, I profiled the in-memory pre-population at startup.

Profiling methods

I profiled all in-memory worker sessions using Tanel’s snapper script and also profiled the processes in OS using Linux perf tool with 99Hz sample rate. As there is no other activity in the database server, it is okay to sample everything in the server. Snapper output will indicate where the time is spent; if the time is spent executing in CPU, then the perf report output will tell us the function call stack executing at that CPU cycle. Data from these two profiling methods will help us to understand the root cause of slowness.

  1. @snapper.sql out,gather=stw 600 10 “select sid from v$session where program like ‘%W00%'”
  2. Perf tool : perf record -F 99 -u oracle -g sleep 3600

continue reading

Posted in 12c, in-memory, inmemory, Oracle database internals, Performance tuning, Presentations | Tagged: , , , , , , , | 1 Comment »

Oaktable world 2014 is on!

Posted by Riyaj Shamsudeen on September 26, 2014

Many Oaktable members are planning to talk about deep technical topics in Oaktable world 2014. Looking at the agenda, I am excited, so many deep topics are planned. I will be talking about in-memory internals on Monday morning at 9AM, 9/29/2014, right after Mogens’ Keynote speech. You can find all details here: Oaktable world 2014. I will post my presentation slides after the presentation.

Start your open world week presentation with mine :). Sorry, no beers planned at that time, it is 9AM, after all!

Thanks for attending my presentation at Oaktable World 2014. You can download the slides : In-memory_internals.pdf.

Also, our book Expert Oracle RAC 12c has been translated to Chinese language. You can find details about that book in one of the translator’s blog: Alex lizx.

Posted in 12c, in-memory, inmemory | Tagged: , , | Leave a Comment »

inmemory: Why did that table was not populated in the column store?

Posted by Riyaj Shamsudeen on September 11, 2014

I enabled an huge 70G table for inmemory population, I expected the inmemory population to take a while, but the population didn’t complete even after letting it run for a day. Why?

ASH data

Initial review of the server shows no issues, no resource starvation. This must be a problem with Oracle processes itself. I started digging further, and ASH data shows that in numerous samples the process was seen reading block using single block I/I calls. Also object_id matches with the table I was trying to populate.

   select * from (
    select start_time, end_time, sql_id,event, current_obj#,  cnt_on_cpu + cnt_waiting tot_cnt,
  	rank () over ( order by (cnt_on_cpu + cnt_waiting) desc ) rnk
    from  (
    select
  	  min(start_time) start_time,
  	  max(end_time)   end_time,
  	  sql_id,event,current_obj#,
  	  sum(decode(session_state,'ON CPU',1,0))  cnt_on_cpu,
 	  sum(decode(session_state,'WAITING',1,0)) cnt_waiting
    from
     ( select
 	first_value(sample_time) over( order by sample_time ) start_time,
 	last_value(sample_time) over( order by sample_time
 				rows between unbounded preceding and unbounded following ) end_time,
 	sql_id,event, session_state, current_obj#
	from
 	     (select * from v$active_session_history ash where session_id= &&sid and session_serial#=&&serial_number)
     )
   group by sql_id, event, current_obj#
    )
  )
  where rnk 
/
START_TIME		  END_TIME		    SQL_ID	  EVENT 			 CURRENT_OBJ#	 TOT_CNT	RNK
------------------------- ------------------------- ------------- ------------------------------ ------------ ---------- ----------
18-AUG-14 08.42.03.702 AM 18-AUG-14 09.02.06.463 AM		  db file sequential read	       168967	     990	  1
												       168967	     156	  2
								  direct path read		       168967	      50	  3
						    bdwtqttka2w2y					   -1	       3	  4
						    bdwtqttka2w2y direct path read		       168967	       1	  5
						    24uqc4aqrhdrs				       168967	       1	  5
													   -1	       1	  5

Read the rest of this entry »

Posted in 12c, inmemory | Tagged: , | 3 Comments »

inmemory: sa00 process

Posted by Riyaj Shamsudeen on September 11, 2014

After the restart of a 12c inmemory database with 300GB+ SGA, I noticed that an Oracle background process sa00 was consuming a bit of CPU. Documentation suggests that it is SGA Allocator process, however, ipcs -ma command shows that the shared memory segment is already allocated. I was curious, of course, what would that background process will be allocating?.

pstack

Process stack of the process shows that it is touching SGA pages to pre-page SGA memory pages.

$ pstack 21131
#0  0x0000000000d9996e in ksmprepage_memory ()
#1  0x0000000000d99369 in ksm_prepage_sga_seg ()
#2  0x0000000003a5c78b in skgmapply ()
#3  0x0000000000da686a in ksmapply_v2 ()
#4  0x0000000000d9a82c in ksmprepage ()
#5  0x0000000000d99f89 in ksm_sslv_exec_cbk ()
#6  0x0000000000f79810 in ksvrdp ()
#7  0x00000000031013b7 in opirip ()
#8  0x0000000001bb0a08 in opidrv ()
#9  0x00000000026c0f71 in sou2o ()
#10 0x0000000000bbd85e in opimai_real ()
#11 0x00000000026cb6bc in ssthrdmain ()
#12 0x0000000000bbd72c in main ()

$ ps -ef|grep  21131
oracle   21131     1 96 15:00 ?        00:01:50 ora_sa00_XXXXXX

Two notable changes in this area:
1. Incidentally, pre_page_sga initialization parameter was defaulted to a value of FALSE until 11.2. In version 12.1, the parameter value defaults to TRUE.
2. As huge SGA is expected for inmemory databases, a new background process SA00 is also created to touch all SGA memory pages at startup.

As inmemory worker processes will be populating the inmemory column store soon after the startup, touching memory pages at instance startup makes sense, and the feature should improve the performance of inmemory population. At least, Worker processes doesn’t need to suffer from huge amount page faults. (note that this SGA is not using hugepages).

Change to the parameter pre_page_sga also should improve the performance of inmemory scan, as the memory map entries will be setup at the process startup. However, I am not quite clear, how this change will affect the performance of a connection storm, i.e. if there are numerous database connections in a short period of time and disconnects. Isn’t that the reason why the pre_page_sga was defaulted to FALSE? But, I need to test this thoroughly to understand the implications further.

Posted in 12c | Tagged: , , , | 6 Comments »

Inmemory: Not all inmemory_size is usable to store tables.

Posted by Riyaj Shamsudeen on September 11, 2014

I have been testing the inmemory column store product extensively and the product is performing well for our workload. However, I learnt a bit more about inmemory column store and I will be blogging a few them here. BTW, I will be talking about internals of inmemory in Oaktable world presentation, if you are in the open world 2014, you can come and see my talk: http://www.oraclerealworld.com/oaktable-world/agenda/

inmemory_size

Inmemory_size initialization parameter determines the amount of memory allocated to the in-memory column store. But only 80% of that memory value is allocated to store the objects. For examples, if you set inmemory_size=272G, then only 217G (=272*0.8) is used to store the objects, and the remaining 55GB is allocated for inmemory journal and internal objects. This is the reason why the inmemory heap is also split and tagged: IMCA_RO and IMCA_RW. IMCA_RW seems to be storing inmemory journal and IMCA_RO is to store objects in the memory. (Previous statement is not completely validated yet).

SELECT mem inmem_size,
       tot disk_size,
       bytes_not_pop,
       (tot/mem) compression_ratio,
       100 *((tot-bytes_not_pop)/tot) populate_percent
FROM
  (SELECT SUM(INMEMORY_SIZE)/1024/1024/1024 mem,
    SUM(bytes)              /1024/1024/1024 tot ,
    SUM(bytes_not_populated)/1024/1024/1024 bytes_not_pop
   FROM v$im_segments
   ) 
/
INMEM_SIZE  DISK_SIZE BYTES_NOT_POP COMPRESSION_RATIO POPULATE_PERCENT
---------- ---------- ------------- ----------------- ----------------
    217.25     231.17           .00        1.06407869              100

So, you should plan accordingly when you enable inmemory option. Notice that the compression ratio is about 1, but, that is because these objects are already compressed using HCC compression and so, memory compression is not going to give us a better compression. However, I see that size of a few objects have increased in-memory compared to the disk size, but, I am hoping that will be considered as a bug and will be fixed in later release.

Unfortunately, inmemory area chunks are not visible in v$sgastat at all, but v$sga shows the inmemory area correctly (show SGA output is correct as it queries v$sga too).

Posted in 12c, Oracle database internals, Performance tuning | Tagged: , , , , | Leave a Comment »

Visualizing AWR data using python

Posted by Riyaj Shamsudeen on August 1, 2014

In my earlier post, I talked about, how tableau can be used to visualize the data. In some cases, I find it useful to query AWR base tables directly using Python and graph it using matplotlib package quickly. Since python is preinstalled in almost all computers, I think, this method will be useful for almost everyone. Of course, you may not have all necessary packages installed in your computer, you can install the packages using install python packages . Of course, if you improve the script, please send it to me, I will share it in this blog entry.

Script is available as a zip file: plotdb.py

Usage:

Script usage is straight forward. Unzip the zip file and you will have a .py script in the current directory. Execute the script (after adjusting permissions of the script) using the format described below:

# To graph the events for the past 60 days, for inst_id=1, connecting to PROD, with username system. 
./plotdb.py -d PROD -u system -n 'latch free' -t e -i 1
# To graph the statistics for the past 60 days, for inst_id=2, connecting to PROD
./plotdb.py -d PRD -u system  -n 'physical reads' -t s -i 2

A typical graph from the above script is:

physical_reads

physical_reads

Posted in 11g, 12c | Tagged: , | 1 Comment »

inmemory area is another sub-heap of the top-level SGA heap

Posted by Riyaj Shamsudeen on July 30, 2014

I blogged earlier about heap dump shared pool heap duration and was curious to see how the inmemory – 12.1.0.2 new feature – is implemented. This is a short blog entry to discuss the inmemory area heap.

Parameters

I have set the initialization parameters sga_target=32G and inmemory_size=16G, meaning, out of 32GB SGA, 16GB will be allocated to inmemory area and the remaining 16GB will be allocated to the traditional areas such as buffer_cache, shared_pool etc. I was expecting v$sgastat view to show the memory allocated for inmemory area, unfortunately, there are no rows marked for inmemory area (Command “show sga” shows the inmemory area though). However, dumping heapdump at level 2 shows that the inmemory area is defined as a sub-heap of the top level SGA heap. Following are the commands to take an heap dump.

oradebug setmypid
oradebug heapdump 2 -- this command creates an heap dump trace file.
oradebug tracefile_name

Reviewing trace file

Trace file shows that the inmemory area is implemented as two sub-heaps namely IMCA_RO and IMCA_RW. Split is not equal between these two sub-heaps and I am not exactly sure about the algorithm for this split, about 12.75GB is allocated for IMCA_RO and the remaining 3.25GB is allocated for IMCA_RW area [ That’s about 80-20:) split ].

$ grep "heap name" *ora_56235*.trc
HEAP DUMP heap name="sga heap"  desc=0x600013d0
HEAP DUMP heap name="sga heap(1,0)"  desc=0x60063740
HEAP DUMP heap name="sga heap(1,3)"  desc=0x60068048
HEAP DUMP heap name="sga heap(2,0)"  desc=0x6006d490
HEAP DUMP heap name="sga heap(2,3)"  desc=0x60071d98
HEAP DUMP heap name="sga heap(3,0)"  desc=0x600771e0
...
HEAP DUMP heap name="sga heap(7,0)"  desc=0x6009e720
HEAP DUMP heap name="sga heap(7,3)"  desc=0x600a3028
HEAP DUMP heap name="IMCA_RO"  desc=0x60001130 <--- In memory Read only area?
HEAP DUMP heap name="IMCA_RW"  desc=0x60001278 <--- In memory Read write area?

You can learn all about SGA heap duration here , only last two lines are interesting to this blog entry and shows that two sub-heaps were allocated for Inmemory area.

The inmemory sub-heaps are split in to memory extents, similar to traditional SGA heap allocations. Each extent has numerous 64MB chunks allocated to it. These chunks are tagged as “cimadrv”. Total heap size is about 12.5GB.

HEAP DUMP heap name="IMCA_RO"  desc=0x60001130
 extent sz=0x1040 alt=288 het=32767 rec=0 flg=2 opc=2
 parent=(nil) owner=(nil) nex=(nil) xsz=0x30600000 heap=(nil)
 fl2=0x20, nex=(nil), dsxvers=1, dsxflg=0x0
 dsx first ext=0x64000000
 dsx empty ext bytes=0  subheap rc link=0x64000070,0x64000070
 pdb id=0
EXTENT 0 addr=0x363a00000
  Chunk        363a00010 sz=  8388304    free      "               "
  Chunk        3641ffee0 sz= 65011736    freeable  "cimadrv        "
  Chunk        367fffef8 sz= 67108888    freeable  "cimadrv        " <-- 64MB chunks
  Chunk        36bffff10 sz= 67108888    freeable  "cimadrv        "
  Chunk        36fffff28 sz= 67108888    freeable  "cimadrv        "
  Chunk        373ffff40 sz= 67108888    freeable  "cimadrv        "
...
EXTENT 1 addr=0x2e3b00000
  Chunk        2e3b00010 sz= 66059528    freeable  "cimadrv        "
  Chunk        2e79ffd18 sz= 67108888    freeable  "cimadrv        "
  Chunk        2eb9ffd30 sz= 67108888    freeable  "cimadrv        "
…
Total heap size    =13690208144 <-- Total heap size.

Next heap IMCA_RW is more interesting. This sub-heap also has extents with 64MB of chunks allocated it, however, I see that there are also smaller chunks in the heap. (I am still researching meaning of these chunks and trying to avoid guess work at this time.)

EAP DUMP heap name="IMCA_RW"  desc=0x60001278
 extent sz=0x1040 alt=304 het=32767 rec=0 flg=2 opc=2
 parent=(nil) owner=(nil) nex=(nil) xsz=0x50100000 heap=(nil)
 fl2=0x20, nex=(nil), dsxvers=1, dsxflg=0x0
 dsx first ext=0x790000030
 dsx empty ext bytes=0  subheap rc link=0x7900000a0,0x7900000a0
 pdb id=0
EXTENT 0 addr=0x80ff00000
  Chunk        80ff00010 sz= 17825296    free      "               "
  Chunk        810fffe20 sz= 50331672    freeable  "cimadrv        "
  Chunk        813fffe38 sz= 67108888    freeable  "cimadrv        "
  Chunk        817fffe50 sz= 67108888    freeable  "cimadrv        "
…
  Chunk        80f8d5ef8 sz=     8296    freeable  "cimcadrv-sb    " <-- smaller chunks. Most are about 8k or 16k.
  Chunk        80f8d7f60 sz=       48    freeable  "cimcadrv-sbrcv "
  Chunk        80f8d7f90 sz=      184    freeable  "cimcadrv-sblatc"
  Chunk        80f8d8048 sz=     8296    freeable  "cimcadrv-sb    "
  Chunk        80f8da0b0 sz=       48    freeable  "cimcadrv-sbrcv "
  Chunk        80f8da0e0 sz=      184    freeable  "cimcadrv-sblatc"
…
Total heap size    =3489660848

So, if this is similar to shared pool heap, is it possible to get an out-of-space error such as ORA-4031 for the shared pool heap?. There is such an error associated with inmemory option :).

 oerr ora 64356
64356, 00000, "in-memory area out of space"
// *Document: NO
// *Cause:    The in-memory area had no free space.
// *Action:   Drop the in-memory segments to make space.

In summary, I was expecting inmemory area to be allocated as integral part of buffer_cache buffers, however, that is not the case. Inmemory area size is allocated as sub-heaps very similar to the shared pool sub-heaps (but, NOT part of shared pool heaps though). As the software was released just recently, I need to research further to understand the intricate details.

Posted in 12c, Oracle database internals, Performance tuning | Tagged: , , , , | Leave a Comment »

How to reformat corrupt blocks which are not part of any segment?

Posted by Riyaj Shamsudeen on July 11, 2014

There was a question in . Problem is that there were many corrupt blocks in the system tablespace not belonging to any segment. Both DBV and rman throws errors, backup is filling the v$database_block_corruption with numerous rows. OP asked to see if these blocks can be reinitialized. Also, note 336133.1 is relevant to this issue on hand.

$ dbv file=/oracle/SID/system_1/system.data1

DBVERIFY: Release 10.2.0.5.0 - Production on Fri Jul 11 08:04:18 2014

Copyright (c) 1982, 2007, Oracle.  All rights reserved.

DBVERIFY - Verification starting : FILE = /oracle/SID/system_1/system.data1
Block Checking: DBA = 67121421, Block Type = Save undo data block
ERROR: SAVE Undo Block Corrupted.  Error Code = 50
kts4subck: record (3) seq# (0), split flag (0)
and total pieces(0)
…
REM Many such corruptions.
select * from v$database_block_corruption order by 1,2,3,4  ;

FILE# BLOCK# BLOCKS CORRUPTION_CHANGE# CORRUPTIO
----- ------ ------ ------------------ ---------
    1  12557      1                  1 UNKNOWN
    1  12561      1                  1 UNKNOWN
    1  12589      1                  1 UNKNOWN
    1  12593      1                  1 UNKNOWN
..

From the output above, blocks belonging to saved undo data blocks, which probably have undo records referring to other data files also. Since these blocks do not belong to any segment (note 336133.1 has a SQL statement to verify that), we should be able to force the allocation of these blocks to a table, load that table with numerous rows, which, in theory, should re-initialize the blocks. That should eliminate rman and dbv errors.

First, let me take a selfie
Read the rest of this entry »

Posted in corruption, Oracle database internals, recovery | Tagged: , | 1 Comment »