Posted by Riyaj Shamsudeen on June 7, 2016
I came across another strange SQL performance issue: Problem was that a SQL statement was running for about 3+ hours in an User Acceptance (UA) database, compared to 1 hour in a development database. I ruled out usual culprits such as statistics, degree of parallelism etc. Reviewing the SQL Monitor output posted below, you can see that the SQL statement has already done 6 Billion buffer gets and steps 21 through 27 were executed 3 Billion times so far.
Statistics and execution plan
Please refer to the execution plan. Due to the functionality of the SQL statement, the execution plan seems acceptable. 3 Billion executions in to the view (at step 21) is also expected, however, why would the SQL statement runs much slower in the UA database ? [ Table names are changed for security reasons. ].
Posted in 12c, Oracle database internals, Performance tuning | Tagged: EHCC, Exadata performance, HCC, oracle performance | 3 Comments »
Posted by Riyaj Shamsudeen on May 21, 2016
I was testing an application performance in 12c, and one job was constantly running slower than 11g. This post is to detail the steps. I hope the steps would be useful if you encounter similar issue.
In an one hour period, over 90% of the DB time spent on waiting for library cache lock waits. Upon investigation, one statement was suffering from excessive waits for ‘library cache lock’ event. We recreated the problem and investigated it further to understand the issue.
Following is the output of wait_details_rac.sql script (that I will upload here) and there are many PX query servers are waiting for ‘library cache lock’ wait event.
SID PID EVENT USERNAME OSUSER STATE WAIT_TIME WIS P1_P2_P3_TEXT
------ ---------- ------------------------------ ---------- ---------- ------------------- --------- ----- ----------------------------------------
276 12445 library cache lock TST_USR test WAITING 0 1 handle address 399021346904-lock address
288 12449 library cache lock TST_USR test WAITING 0 4 handle address 399021346904-lock address
303 12453 library cache lock TST_USR test WAITING 0 4 handle address 399021346904-lock address
315 12457 library cache lock TST_USR test WAITING 0 4 handle address 399021346904-lock address
Posted in 12c, inmemory, Oracle database internals, Performance tuning, RAC | Tagged: $BUILD$, inmemory internals, kglLockWait, library cache lock, oracle performance, pstack, x$kgllk, x$kglob | 2 Comments »
Posted by Riyaj Shamsudeen on March 24, 2016
Over the last many years, some of you have invited me to attend conferences in India, and talk about Oracle RAC and performance. I have not had an opportunity to make it to conferences in India, until now
I am excited to announce that I will be participating in OTN sponsored Oracle ACE Director’s tour in India (April 23rd to May 2nd 2016), and presenting ( with deep dive demos ) about RAC, performance, and in-memory. This is a golden opportunity for you to learn some of the internal stuff that I talk about in my class too.
Refer http://otnyathra.com for further details.
I am also excited to be in the tour with Connor McDonald, Biju Thomas, Sai, and many others. Don’t forget to send an email to Sai, the conference rooms will fill out quickly.
Did I mention deep dive demos?
Update 1: Updating the presentation slides and scripts. Do ask if I miss any script that you are looking for
Posted in 12c, Performance tuning, Presentations, RAC | Tagged: oracle performance, RAC performance | 5 Comments »
Posted by Riyaj Shamsudeen on January 19, 2015
I will be talking in Rocky Mountain Oracle User Group Training Days 2015( http://www.rmoug.org), with live demos (hopefully there will be no failures in the demo). My topics are:
Feb 17: Deep dive: 3:15PM to 5:15PM – RAC 12c optimization: I will discuss RAC global cache layer in detail with a few demos. You probably can’t find these deep Global Cache layer details anywhere else
Feb 19: Wednesday: 2:45PM to 3:45PM – Advanced UNIX tools: I will discuss both Solaris and Linux advanced tools to debug deep performance issues.
Feb 19: Wednesday: 12:15PM – 1:15PM – Exadata SIG panel with Alex Fatkulin.
Come to Denver. Come on, it won’t be cold ( I think )
Uploading presentation and scripts of the presentation files: RMOUG_2015_Riyaj_RAC_12c_optim files
Presentation files for Unix tools demos: RMOUG_2015_Riyaj_Unix_tools
Posted in 11g, 12c, Presentations, RAC | Tagged: gc buffer busy, RAC performance, RAC training, rmoug | 1 Comment »
Posted by Riyaj Shamsudeen on October 6, 2014
While presenting at Oaktable World 2014 in San Fransisco, I discussed the in-memory pre-population speed and indicated that it takes about 30 minutes to 1 hour to load ~300GB of tables. Someone asked me “Why?” and that was a fair question. So, I profiled the in-memory pre-population at startup.
I profiled all in-memory worker sessions using Tanel’s snapper script and also profiled the processes in OS using Linux perf tool with 99Hz sample rate. As there is no other activity in the database server, it is okay to sample everything in the server. Snapper output will indicate where the time is spent; if the time is spent executing in CPU, then the perf report output will tell us the function call stack executing at that CPU cycle. Data from these two profiling methods will help us to understand the root cause of slowness.
- @snapper.sql out,gather=stw 600 10 “select sid from v$session where program like ‘%W00%'”
- Perf tool : perf record -F 99 -u oracle -g sleep 3600
Posted in 12c, in-memory, inmemory, Oracle database internals, Performance tuning, Presentations | Tagged: 12c in-memory, in-memory, kdzu, perf record, perf report, pre-population cpu time, pre-population speed, snapper | Leave a Comment »
Posted by Riyaj Shamsudeen on September 26, 2014
Many Oaktable members are planning to talk about deep technical topics in Oaktable world 2014. Looking at the agenda, I am excited, so many deep topics are planned. I will be talking about in-memory internals on Monday morning at 9AM, 9/29/2014, right after Mogens’ Keynote speech. You can find all details here: Oaktable world 2014. I will post my presentation slides after the presentation.
Start your open world week presentation with mine. Sorry, no beers planned at that time, it is 9AM, after all!
Thanks for attending my presentation at Oaktable World 2014. You can download the slides : In-memory_internals.pdf.
Also, our book Expert Oracle RAC 12c has been translated to Chinese language. You can find details about that book in one of the translator’s blog: Alex lizx.
Posted in 12c, in-memory, inmemory | Tagged: in-memory, oaktable world 2014, presentations | Leave a Comment »
Posted by Riyaj Shamsudeen on September 11, 2014
I enabled an huge 70G table for inmemory population, I expected the inmemory population to take a while, but the population didn’t complete even after letting it run for a day. Why?
Initial review of the server shows no issues, no resource starvation. This must be a problem with Oracle processes itself. I started digging further, and ASH data shows that in numerous samples the process was seen reading block using single block I/I calls. Also object_id matches with the table I was trying to populate.
select * from (
select start_time, end_time, sql_id,event, current_obj#, cnt_on_cpu + cnt_waiting tot_cnt,
rank () over ( order by (cnt_on_cpu + cnt_waiting) desc ) rnk
sum(decode(session_state,'ON CPU',1,0)) cnt_on_cpu,
first_value(sample_time) over( order by sample_time ) start_time,
last_value(sample_time) over( order by sample_time
rows between unbounded preceding and unbounded following ) end_time,
sql_id,event, session_state, current_obj#
(select * from v$active_session_history ash where session_id= &&sid and session_serial#=&&serial_number)
group by sql_id, event, current_obj#
START_TIME END_TIME SQL_ID EVENT CURRENT_OBJ# TOT_CNT RNK
------------------------- ------------------------- ------------- ------------------------------ ------------ ---------- ----------
18-AUG-14 08.42.03.702 AM 18-AUG-14 09.02.06.463 AM db file sequential read 168967 990 1
168967 156 2
direct path read 168967 50 3
bdwtqttka2w2y -1 3 4
bdwtqttka2w2y direct path read 168967 1 5
24uqc4aqrhdrs 168967 1 5
-1 1 5
Read the rest of this entry »
Posted in 12c, inmemory | Tagged: 12c inmemory, inmemory internals | 3 Comments »
Posted by Riyaj Shamsudeen on September 11, 2014
After the restart of a 12c inmemory database with 300GB+ SGA, I noticed that an Oracle background process sa00 was consuming a bit of CPU. Documentation suggests that it is SGA Allocator process, however, ipcs -ma command shows that the shared memory segment is already allocated. I was curious, of course, what would that background process will be allocating?.
Process stack of the process shows that it is touching SGA pages to pre-page SGA memory pages.
$ pstack 21131
#0 0x0000000000d9996e in ksmprepage_memory ()
#1 0x0000000000d99369 in ksm_prepage_sga_seg ()
#2 0x0000000003a5c78b in skgmapply ()
#3 0x0000000000da686a in ksmapply_v2 ()
#4 0x0000000000d9a82c in ksmprepage ()
#5 0x0000000000d99f89 in ksm_sslv_exec_cbk ()
#6 0x0000000000f79810 in ksvrdp ()
#7 0x00000000031013b7 in opirip ()
#8 0x0000000001bb0a08 in opidrv ()
#9 0x00000000026c0f71 in sou2o ()
#10 0x0000000000bbd85e in opimai_real ()
#11 0x00000000026cb6bc in ssthrdmain ()
#12 0x0000000000bbd72c in main ()
$ ps -ef|grep 21131
oracle 21131 1 96 15:00 ? 00:01:50 ora_sa00_XXXXXX
Two notable changes in this area:
1. Incidentally, pre_page_sga initialization parameter was defaulted to a value of FALSE until 11.2. In version 12.1, the parameter value defaults to TRUE.
2. As huge SGA is expected for inmemory databases, a new background process SA00 is also created to touch all SGA memory pages at startup.
As inmemory worker processes will be populating the inmemory column store soon after the startup, touching memory pages at instance startup makes sense, and the feature should improve the performance of inmemory population. At least, Worker processes doesn’t need to suffer from huge amount page faults. (note that this SGA is not using hugepages).
Change to the parameter pre_page_sga also should improve the performance of inmemory scan, as the memory map entries will be setup at the process startup. However, I am not quite clear, how this change will affect the performance of a connection storm, i.e. if there are numerous database connections in a short period of time and disconnects. Isn’t that the reason why the pre_page_sga was defaulted to FALSE? But, I need to test this thoroughly to understand the implications further.
Posted in 12c | Tagged: 12c inmemory, inmemory, inmemory internals, pre_page_sga | 4 Comments »
Posted by Riyaj Shamsudeen on September 11, 2014
I have been testing the inmemory column store product extensively and the product is performing well for our workload. However, I learnt a bit more about inmemory column store and I will be blogging a few them here. BTW, I will be talking about internals of inmemory in Oaktable world presentation, if you are in the open world 2014, you can come and see my talk: http://www.oraclerealworld.com/oaktable-world/agenda/
Inmemory_size initialization parameter determines the amount of memory allocated to the in-memory column store. But only 80% of that memory value is allocated to store the objects. For examples, if you set inmemory_size=272G, then only 217G (=272*0.8) is used to store the objects, and the remaining 55GB is allocated for inmemory journal and internal objects. This is the reason why the inmemory heap is also split and tagged: IMCA_RO and IMCA_RW. IMCA_RW seems to be storing inmemory journal and IMCA_RO is to store objects in the memory. (Previous statement is not completely validated yet).
SELECT mem inmem_size,
100 *((tot-bytes_not_pop)/tot) populate_percent
(SELECT SUM(INMEMORY_SIZE)/1024/1024/1024 mem,
SUM(bytes) /1024/1024/1024 tot ,
INMEM_SIZE DISK_SIZE BYTES_NOT_POP COMPRESSION_RATIO POPULATE_PERCENT
---------- ---------- ------------- ----------------- ----------------
217.25 231.17 .00 1.06407869 100
So, you should plan accordingly when you enable inmemory option. Notice that the compression ratio is about 1, but, that is because these objects are already compressed using HCC compression and so, memory compression is not going to give us a better compression. However, I see that size of a few objects have increased in-memory compared to the disk size, but, I am hoping that will be considered as a bug and will be fixed in later release.
Unfortunately, inmemory area chunks are not visible in v$sgastat at all, but v$sga shows the inmemory area correctly (show SGA output is correct as it queries v$sga too).
Posted in 12c, Oracle database internals, Performance tuning | Tagged: 12c inmemory, inmem summary, inmemory compression, inmemory internals, v$im_segments | Leave a Comment »
Posted by Riyaj Shamsudeen on August 1, 2014
In my earlier post, I talked about, how tableau can be used to visualize the data. In some cases, I find it useful to query AWR base tables directly using Python and graph it using matplotlib package quickly. Since python is preinstalled in almost all computers, I think, this method will be useful for almost everyone. Of course, you may not have all necessary packages installed in your computer, you can install the packages using install python packages . Of course, if you improve the script, please send it to me, I will share it in this blog entry.
Script is available as a zip file: plotdb.py
Script usage is straight forward. Unzip the zip file and you will have a .py script in the current directory. Execute the script (after adjusting permissions of the script) using the format described below:
# To graph the events for the past 60 days, for inst_id=1, connecting to PROD, with username system.
./plotdb.py -d PROD -u system -n 'latch free' -t e -i 1
# To graph the statistics for the past 60 days, for inst_id=2, connecting to PROD
./plotdb.py -d PRD -u system -n 'physical reads' -t s -i 2
A typical graph from the above script is:
Posted in 11g, 12c | Tagged: oracle performance, performance visualisation | 1 Comment »