You can find pdf version of the presentation here: Performance tuning – for developers . Feel free to ask questions in the comments.
Thank you Mary Elizabeth Mcneely for the opportunity to talk in the Dallas Oracle User Group technology meetup.
Archive for the ‘EBS11i’ Category
Posted by Riyaj Shamsudeen on October 6, 2010
You can find pdf version of the presentation here: Performance tuning – for developers . Feel free to ask questions in the comments.
Posted by Riyaj Shamsudeen on September 30, 2010
So, Here I was merrily enjoying OpenWorld 2010 presentations in SFO, I got a call from a client about a performance issue. Client recently upgraded from Version 9i to Version 10g in an E-Business environment. I had the privilege of consulting before the upgrade, so we setup the environment optimally, and upgrade itself was seamless. Client did not see much regression except One query: That query was running for hours in 10g compared to 15 minutes in 9i.
Review and Analysis
Reviewed the execution plan in the development database and I did not see any issues with the plan. Execution plan in development and production looked decent enough. I wasn’t able to reproduce the issue in the development database either. So, the client allowed me to trace the SQL statement in the production database. Since the size of data in few tables is different between production and development databases, we had to analyze the problem in production environment.
I had to collect as much data possible as the tracing was a one-time thing. I setup a small script to get process stack and process memory area of that Unix dedicated server process to collect more details, in addition to tracing the process with waits => true.
Execution plan from the production database printed below. [ Review the execution plan carefully, it is giving away the problem immediately.] One execution of this statement took 13,445 seconds and almost all of it spent in the CPU time. Why would the process consume 13,719 seconds of CPU time?. Same process completed in just 15 minutes in 9i, as confirmed by Statspack reports. [ As a side note, We collected enormous amount of performance data in 9i in the Production environment before upgrading to 10g, just so that we can quickly resolve any performance issues, and you should probably follow that guideline too]. That collection came handy and It is clear that SQL statement was completing in 15 minutes in 9i and took nearly 3.75 hours after upgrading the database to version 10g.
call count cpu elapsed disk query current rows ------- ------ -------- ---------- ---------- ---------- ---------- ---------- Parse 1 0.00 0.00 0 0 0 0 Execute 1 0.00 0.00 0 0 0 0 Fetch 10 13719.71 13445.94 27 5086407 0 99938 ------- ------ -------- ---------- ---------- ---------- ---------- ---------- total 12 13719.71 13445.94 27 5086407 0 99938 24 HASH GROUP BY (cr=4904031 pr=27 pw=0 time=13240600266 us) 24 NESTED LOOPS OUTER (cr=4904031 pr=27 pw=0 time=136204709 us) 24 NESTED LOOPS (cr=4903935 pr=27 pw=0 time=133347961 us) 489983 NESTED LOOPS (cr=3432044 pr=27 pw=0 time=104239982 us) 489983 NESTED LOOPS (cr=2452078 pr=27 pw=0 time=91156653 us) 489983 TABLE ACCESS BY INDEX ROWID HR_LOCATIONS_ALL (cr=1472112 pr=27 pw=0 time=70907109 us) 489983 INDEX RANGE SCAN HR_LOCATIONS_UK2 (cr=981232 pr=0 pw=0 time=54338789 us)(object id 43397) 489983 INDEX UNIQUE SCAN MTL_PARAMETERS_U1 (cr=979966 pr=0 pw=0 time=17972426 us)(object id 37657) 489983 INDEX UNIQUE SCAN HR_ORGANIZATION_UNITS_PK (cr=979966 pr=0 pw=0 time=10876601 us)(object id 43498) 24 INDEX RANGE SCAN UXPP_FA_LOCATIONS_N3 (cr=1471891 pr=0 pw=0 time=27325172 us)(object id 316461) 24 TABLE ACCESS BY INDEX ROWID PER_ALL_PEOPLE_F (cr=96 pr=0 pw=0 time=2191 us) 24 INDEX RANGE SCAN PER_PEOPLE_F_PK (cr=72 pr=0 pw=0 time=1543 us)(object id 44403)
pstack, pmap, and truss
Reviewing pstack output generated from the script shows many function calls kghfrempty, kghfrempty_ex, qerghFreeHashTable etc, implying hash table operations. Something to do with hash table consuming time?
( Only partial entries shown ) 0000000103f41528 kghfrempty 0000000103f466ec kghfrempty_ex 0000000103191f1c qerghFreeHashTable 000000010318e080 qerghFetch 00000001030b1b3c qerstFetch ... 0000000103f41558 kghfrempty 0000000103f466ec kghfrempty_ex 0000000103191f1c qerghFreeHashTable 000000010318e080 qerghFetch 00000001030b1b3c qerstFetch
Truss of the process also showed quite a bit of mmap calls. So, the process is allocating more memory to an hash table?
... mmap(0xFFFFFFFF231C0000, 65536, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 7, 0) = 0xFFFFFFFF231C0000 ... pollsys(0xFFFFFFFF7FFF7EC8, 1, 0xFFFFFFFF7FFF7E00, 0x00000000) = 0 mmap(0xFFFFFFFF231D0000, 65536, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 7, 0) = 0xFFFFFFFF231D0000 ...
Execution plan again ..
Reviewing the execution plan again showed an interesting issue. I am going to post only two relevant lines from the execution plan below. As you can see that elapsed time at NESTED LOOPS OUTER step is 136 seconds. But the elapsed time at the next HASH GROUP BY step is 13240 seconds, meaning nearly 13,100 seconds spent in the HASH GROUP BY Step alone! Why would the process spend 13,100 seconds in a group by operation? Actual SQL execution took only 136 seconds, but the group by operation took 13,100 seconds. That doesn’t make sense, Does it?
24 HASH GROUP BY (cr=4904031 pr=27 pw=0 time=13240600266 us) 24 NESTED LOOPS OUTER (cr=4904031 pr=27 pw=0 time=136204709 us) ...
OFE = 9i
Knowing that time is spent in the Group by operation and that the 10g new feature Hash Grouping method is in use, I decided to test this SQL statement execution with 9i optimizer. The SQL completed in 908 seconds with OFE(optimizer_features_enabled) set to 188.8.131.52 (data is little bit different since production is an active environment). You can also see that SORT technique is used to group the data.
alter session set optimizer_features_enabled=184.108.40.206; Explain plan : call count cpu elapsed disk query current rows ------- ------ -------- ---------- ---------- ---------- ---------- ---------- Parse 1 0.00 0.00 0 0 0 0 Execute 1 0.00 0.00 0 0 0 0 Fetch 106985 887.41 908.25 282379 3344916 158 1604754 ------- ------ -------- ---------- ---------- ---------- ---------- ---------- total 106987 887.41 908.25 282379 3344916 158 1604754 4 SORT GROUP BY (cr=2863428 pr=0 pw=0 time=37934456 us) 4 NESTED LOOPS OUTER (cr=2863428 pr=0 pw=0 time=34902519 us) 4 NESTED LOOPS (cr=2863412 pr=0 pw=0 time=34198726 us) 286067 NESTED LOOPS (cr=2003916 pr=0 pw=0 time=24285794 us) 286067 NESTED LOOPS (cr=1431782 pr=0 pw=0 time=19288024 us) 286067 TABLE ACCESS BY INDEX ROWID HR_LOCATIONS_ALL (cr=859648 pr=0 pw=0 time=13568456 us) 286067 INDEX RANGE SCAN HR_LOCATIONS_UK2 (cr=572969 pr=0 pw=0 time=9271380 us)(object id 43397) 286067 INDEX UNIQUE SCAN MTL_PARAMETERS_U1 (cr=572134 pr=0 pw=0 time=4663154 us)(object id 37657) ...
Knowing the problem is in the GROUP BY step, we setup a profile with _gby_hash_aggregation_enabled set to FALSE, disabling the new 10g feature for that SQL statement. With the SQL profile, performance of the SQL statement is comparable to pre-upgrade timing.
This almost sounds like a bug! Bug 8223928 is matching with this stack, but it is the opposite. Well, client will work with the support to get a bug fix for this issue.
In summary, you can use scientific methods to debug performance issues. Revealing the details underneath, will enable you to come up with a workaround quickly, leading to a faster resolution.
Note that, I am not saying hash group by feature is bad. Rather, we seem to have encountered an unfortunate bug which caused performance issues at this client. I think, Hash Grouping is a good feature as the efficiency of grouping operations can be improved if you have ample amount of memory. That’s the reason why we disabled this feature at the statement level, NOT at the instance level.
This blog in a traditional format hash_group_by_orainternals
I am adding a script to capture pmap and pstack output in a loop for 1000 times, with 10 seconds interval. Tested in Oracle Solaris.
#! /bin/ksh pid=$1 (( cnt=1000 )) while [[ $cnt -gt 0 ]]; do date pmap -x $pid pstack $pid echo $cnt (( cnt=cnt-1 )) sleep 10 done
To call the script: assuming 7887 is the UNIX pid of the process.
nohup ./pmap_loop.ksh 7887 >> /tmp/a1.lst 2>>/tmp/a1.lst &
Syntax for the truss command is given below. Please remember, you can’t use pmap, pstack and truss concurrently. These commands stops the process (however short that may be!) and inspects them, so use these commands sparingly. [ I had a client who used to run truss on LGWR process on a continuous(!) basis and database used to crash randomly!]. I realize that pmap/pstack/truss can be scripted to work together, but that would involve submitting a background process for the truss command and killing that process after a small timeout window. That would be a risky approach in a Production environment and So, I prefer to use truss command manually and CTRL+C it after few seconds.
truss -d -E -o /tmp/truss.lst -p 7887
I can not stress enough, not to overuse these commands in a Production environment. Command strace( Linux), tusc (HP) are comparable commands of truss(Solaris).
Posted by Riyaj Shamsudeen on February 17, 2010
It is very disappointing to me that I had to cancel my trip to RMOUG training days. I am sick and was not able to catch the flight due to that.
But, I can always share my presentations here. I had two presentations planned in this training day and can be accessed as below:
Why optimizer hates my sql
RMOUG training days audience: Please accept my sincere apologies.
Posted in CBO, EBS11i, Oracle database internals, Performance tuning, Presentations, RAC | Tagged: CBO, object remastering, oracle performance, RAC performance, RAC performance myths, UDP RAC, udp_xmit_hiwat, upd_max_buf, _gc_affinitiy_minimum, _gc_affinity_limit, _gc_affinity_time | Leave a Comment »
Posted by Riyaj Shamsudeen on May 6, 2009
I just presented about Oracle 11g new features specific to performance in COLLABORATE 2009, Orlando Florida. You can download presentation and paper from here:
I met couple of new friends and many familiar faces.
Catherin Devlin introduced sqlpython tool to me. This tool is an user friendly replacement for sqlplus with many UNIX like goodies. This tool will be an useful addition to command line tools. You can see installation instructions for sqlpython here.
I also attended an interesting application performance panel discussion with fellow Oakie Mark W Farnheim, Mike Brown and Sandra Vucinic. We discussed few customer issues.
An interesting problem discussion is worth mentioning. A client clones production database to development database using full rapidclone methodology. But, access plans for a SQL statement is different between production and development, even though everything was exactly the same. I am hoping, that client will send 10053 trace files from both databases and I will blog about it if I get anything from that client. I think, bind peeking is causing the plan difference, but need to analyze 10053 trace file to confirm that. Another possibility is that, may be , he is not cloning Oracle Software and some software patches are not existing in cloned environment. Need trace files to proceed further here.
Posted in 11g, CBO, EBS11i, Oracle database internals, Performance tuning, Presentations | Tagged: 11g, adaptive cursor sharing, bind peeking, CBO, deduplicate, extended_stats, invisible indexes, lob compression, OD lock_type, oracle performance, securefile lobs | Leave a Comment »
Posted by Riyaj Shamsudeen on February 25, 2009
Yes, you read it correct. That shocked me too.
I was trying to understand global cache waits for a client. Naturally, I queried statspack tables and analyzed the data for just one day using a script. Surprisingly, tab$ came as top consumer of global cache waits. I was shocked by the revelations and couldn’t believe it! If something doesn’t make sense, look more closely, right?
Global cache waits
Database version is 9i. Statspack table stats$seg_stat is the source for this script and that table is populated from v$segment_stats. So, these column values (global_cache_cu_blocks_served and global_cache_cr_blocks_served) are cumulative. To find statistics for a specific period, we need to subtract column value of prior row from current row. Analytic function lag can be useful for this.
In this analytic function printed below, partitioning clause uses instance_number, startup_time and dataobj#. All rows with same value for these three columns will be considered in one data partition and then rows ordered by snap_id within that data partition. Lag will pull the row from prior snap_id in that partition. Then, we subtract from current value to get the difference. Please refer to this paper: Performance tuning with SQL new features – paper for more information about analytic functions.
... global_cache_cu_blocks_served - lag(global_cache_cu_blocks_served,1,0) over (partition by instance_number,startup_time, dataobj#, obj# order by snap_id ) global_cache_cu_blocks_served, ...
Script and output
Complete script printed below and running that against a database.
Posted by Riyaj Shamsudeen on January 2, 2009
Earlier, I blogged about resolving cache buffers chains latch contention in my earlier entry , in which, root cause was excessive index access due to Nested Loops join. Recently, we resolved another similar issue.
CPU usage was very high in production database server in user% mode. Dynamic performance view v$session_wait indicated excessive waits for latch contention. Output from a script wait_details.sql (scripts available at the end of this post) shows that many sessions were waiting for ‘latch free’ event. Also, address for these latch children are the same, meaning all these sessions are trying to access one latch children.
SQL> @wait_details SID PID EVENT USERNAME STATE WAIT_TIME WIS P1_P2_P3_TEXT ------ ------- ------------- --------- ---------- ------------------- --------- ----- ----------------------------------- 91 24242 latch free CSTMOP WAITING 0 0 address 69476807024-number 98-tries 0 101 4884 latch free CSTMOP WAITING 0 0 address 69476807024-number 98-tries 0 116 23899 latch free CSTMOP WAITING 0 0 address 69476807024-number 98-tries 0 187 19499 latch free CSTMOP WAITING 0 0 address 69476807024-number 98-tries 0 108 23498 latch free CSTMOP WAITING 0 0 address 69476807024-number 98-tries 3 194 23701 latch free CSTMOP WAITING 0 0 address 69476807024-number 98-tries 0 202 26254 latch free CSTMOP WAITING 0 0 address 69476807024-number 98-tries 4 220 23274 latch free CSTMOP WAITING 0 0 address 69476807024-number 98-tries 0 227 23643 latch free CSTMOP WAITED KNOWN TIME 2 0 address 69476807024-number 98-tries 0 331 26519 latch free CSTMOP WAITING 0 0 address 69476807024-number 98-tries 0 297 23934 latch free CSTMOP WAITING 0 0 address 69476807024-number 98-tries 3 ....
We can identify SQL causing latch contention querying v$session_wait. From the output below, SQL with hash_value 1509082258 is suspicious since there are many sessions executing that SQL and waiting / waited recently for ‘latch free’ event.
select substr(w.event, 1, 28 ) event, sql_hash_value, count(*) from v$session_wait w, v$session s, v$process p where s.sid=w.sid and p.addr = s.paddr and s.username is not null and event not like '%pipe%' and event not like 'SQL*%' group by substr(w.event, 1, 28), sql_hash_value ; EVENT SQL_HASH_VALUE COUNT(*) ------------------------------ -------------- ---------- enqueue 3740270 1 enqueue 747790152 1 enqueue 1192921796 1 latch free 622474477 3 latch free 1509082258 58 <--- latch free 1807800540 1 global cache null to x 3740270 1 global cache null to x 1473456670 1 global cache null to x 3094935671 1 db file sequential read 109444956 1
Mapping to object_name