Posted by Riyaj Shamsudeen on March 20, 2014
After collaborating with many performance engineers on RAC databases, I have come to realize that there are common patterns among the (mis)diagnoses. This blog entry discusses those issues. I also talked about this at the Hotsos 2014 conference.
Here are the golden rules of RAC performance diagnostics. These rules may not apply to general RAC configuration issues, though.
- Beware of top event tunnel vision
- Eliminate infrastructure as an issue
- Identify problem-inducing instance
- Review send-side metrics also
- Use histograms, not just averages
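To illustrate the last rule, here is a minimal Python sketch of how an average can hide a skewed wait-time distribution. The wait times and the power-of-two bucket limits are made-up assumptions for illustration, not data from the presentation.

```python
from collections import Counter

# Hypothetical wait times in milliseconds for one wait event:
# most waits are fast, but a few are very slow.
waits_ms = [1] * 90 + [500] * 10


def average(samples):
    """Arithmetic mean of the samples."""
    return sum(samples) / len(samples)


def histogram(samples, bucket_limits=(1, 2, 4, 8, 16, 32, 64, 128, 256, 512)):
    """Count samples per bucket; a sample lands in the first bucket whose limit is >= the sample."""
    counts = Counter()
    for sample in samples:
        for limit in bucket_limits:
            if sample <= limit:
                counts[limit] += 1
                break
        else:
            counts["over"] += 1  # larger than every bucket limit
    return dict(counts)


avg_ms = average(waits_ms)
buckets = histogram(waits_ms)

print(f"average wait: {avg_ms:.1f} ms")
for limit, count in buckets.items():
    print(f"bucket {limit}: {count} waits")
```

In this made-up sample, the average suggests a uniformly slow event, while the histogram shows that most waits are quick and a small group of very slow waits accounts for nearly all of the time.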
On reflection, this material may be better read as a document, so please use the PDF files of the presentation and the paper. Presentation slide #10 provides in-depth coverage of the gc buffer busy* wait events. I will try to blog about that slide later (hopefully).
Golden rules of RAC diagnostics paper
Golden rules of rac diagnostics ppt
Scripts mentioned in the presentation can be downloaded here.
Posted in 11g, Performance tuning, Presentations, RAC | Tagged: gc buffer busy, oracle performance, RAC performance, RAC performance diagnostics, RAC performance myths, RAC performance scripts | 3 Comments »
Posted by Riyaj Shamsudeen on February 28, 2014
I blogged about Dynamic Resource Mastering (DRM) in RAC here. DRM freezes the global resources during the reconfiguration event, and no new resources can be allocated during the reconfiguration. This freeze has the dramatic effect of inducing a huge number of waits for gc buffer busy [acquire|release] events and for other events such as gcs drm freeze release and gcs remaster. In database version 12c, DRM has been improved further.
A major improvement I see is that not all resources are frozen at the same time. Essentially, resources are broken down into partitions, and only one resource partition is frozen at a time. This improvement should tremendously decrease the impact of DRM-related waits.
LMON Trace file
The following snippet is from the LMON trace file. As you can see, only one resource partition is frozen at a time. Resources in the first partition are frozen, the resource remastering tasks are completed, and that resource partition is unfrozen. Then the next resource partition is frozen, and so on until all resources are remastered.
Read the rest of this entry »
Posted in 12c, Performance tuning, RAC | Tagged: DRM, DRM freeze, DRM RAC, gc buffer busy | Leave a Comment »
Posted by Riyaj Shamsudeen on February 25, 2014
I will be presenting at the HOTSOS Symposium 2014, discussing correct methods to diagnose RAC performance issues. Surprisingly, even very senior performance engineers make mistakes in their analysis while reviewing RAC issues. Come to my presentation and learn the golden rules of RAC performance diagnostics.
Posted in 12c, Performance tuning, Presentations, RAC | Leave a Comment »
Posted by Riyaj Shamsudeen on November 12, 2013
It is easy to create one or two AWR reports quickly using OEM. But what if you have to create AWR reports for many snapshots? For example, your Oracle support analyst wants you to supply ten 1-hour AWR reports, from 10AM to 8PM, in an 8-node cluster. That’s about 80 AWR reports to create! Okay, okay, I may(!) be overselling it, but you get the point. It is useful to have a script to create AWR reports for all instances for a given range of snapshot IDs. The following scripts are handy:
1. To create one AWR report per instance, for the last snap duration :
2. Same as (1) but in html format :
3. To create one AWR report per instance, for a range of snap IDs :
4. To create one AWR report, per instance, per snap ID :
Zip file: awrrpt_scripts
These scripts do not modify anything in the database; they just retrieve data using the dbms_workload_repository package. Test the scripts to understand them further. Of course, you need access to dbms_workload_repository and to gv$instance.
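To show how quickly the report count grows, here is a small Python sketch that enumerates one report per instance per pair of consecutive snapshots. It is not taken from the scripts in the zip file. The function name, the instance numbers and the snap IDs are hypothetical, and the sketch assumes snap IDs are consecutive integers.

```python
from itertools import product


def awr_report_jobs(instance_numbers, first_snap_id, last_snap_id):
    """Return (instance, begin_snap, end_snap) for every instance and every consecutive snapshot pair."""
    # Assumption: snapshot IDs are consecutive integers with no gaps.
    snap_pairs = [(snap, snap + 1) for snap in range(first_snap_id, last_snap_id)]
    return [(inst, begin, end) for inst, (begin, end) in product(instance_numbers, snap_pairs)]


# Hypothetical values: 8 instances, hourly snapshots 100..110 (ten 1-hour intervals).
jobs = awr_report_jobs(range(1, 9), 100, 110)

print(len(jobs))          # number of reports to generate
print(jobs[0], jobs[-1])  # first and last (instance, begin_snap, end_snap)
```

Each tuple corresponds to one report that a generation script would have to produce, which is why the 8-node, 10-hour example adds up to 80 reports.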
Posted in Oracle database internals, Performance tuning, RAC | Tagged: AWR reports, awrrpt.sql, awrrpt_all_gen.sql, awrrpt_all_range_gen.sql | 4 Comments »
Posted by Riyaj Shamsudeen on September 18, 2013
I will be hacking RAC internals with a few Linux tools in the Oaktable World presentation series in SFO. Details are available at Oaktable World 2013.
Hope to see you there!
Posted in Oracle database internals, RAC | Tagged: RAC internals | Leave a Comment »
Posted by Riyaj Shamsudeen on September 9, 2013
I blogged about DFS lock handle contention in an earlier blog entry. SV resources in the Global Resource Directory (GRD) are used to maintain the cached sequence values. In this entry, I will probe further into the internal mechanics of cached sequences. I will also discuss minor changes in the resource names to support pluggable databases (version 12c).
Let’s create an ordered sequence in the rs schema and then query values from the sequence a few times.
create sequence rs.test_seq order cache 100;
select rs.test_seq.nextval from dual; -- repeated a few times.
Sequence values are permanently stored in the seq$ dictionary table. Cached sequence values are maintained in SV resources in the GRD, and SV resource names follow a naming convention that includes the object_id of the sequence. I will generate the resource name string using a small helper script, and we will use that resource name to search the GRD.
SELECT DISTINCT '[0x'
|| '],[SV]' res
FROM dba_objects WHERE object_name=upper('&objname')
AND owner=upper('&owner') AND object_type LIKE 'SEQUENCE%'
Enter value for objname: TEST_SEQ
Enter value for owner: RS
Read the rest of this entry »
Posted in 12c, Oracle database internals, Performance tuning, RAC, weird stuff | Tagged: oracle performance, pluggable database, RAC internals, RAC performance, SV resource, weird stuff | 2 Comments »
Posted by Riyaj Shamsudeen on September 8, 2013
A quick note: the Expert Oracle RAC book that I co-wrote is available now: Expert Oracle RAC 12c. I have written about 6 chapters covering the RAC internals that you may want to learn :) I even managed to discuss the network internals in depth; after all, the network is one of the most important components of a RAC cluster.
Posted in 12c, Oracle database internals, Performance tuning, RAC | Tagged: oracle performance, performance, RAC internals, RAC performance | Leave a Comment »
Posted by Riyaj Shamsudeen on June 12, 2013
This blog entry discusses a method to identify the objects inducing a higher amount of redo. First, we will establish that redo size increased sharply, and then identify the objects generating more redo. Unfortunately, redo size is not tracked at the segment level. However, you can make an educated guess using the ‘db block changes’ statistic. To identify the objects generating more redo scientifically, you must use the LogMiner utility.
Detecting redo size increase
AWR tables (which require a Diagnostics Pack license) can be queried to identify the redo size increase. The following query spools the daily rate of redo size. You can easily open the output file redosize.lst in an Excel spreadsheet and graph the data to visualize the redo size change. Use the pipe symbol as the delimiter while opening the file in Excel.
REM You need Diagnostic Pack licence to execute this query!
REM Author: Riyaj Shamsudeen
col begin_interval_time format a30
set lines 160 pages 1000
col end_interval_time format a30
set colsep '|'
alter session set nls_date_format='DD-MON-YYYY';
with redo_sz as (
SELECT sysst.snap_id, sysst.instance_number, begin_interval_time ,end_interval_time , startup_time,
VALUE - lag (VALUE) OVER ( PARTITION BY startup_time, sysst.instance_number
ORDER BY begin_interval_time, startup_time, sysst.instance_number) stat_value,
EXTRACT (DAY FROM (end_interval_time-begin_interval_time))*24*60*60+
EXTRACT (HOUR FROM (end_interval_time-begin_interval_time))*60*60+
EXTRACT (MINUTE FROM (end_interval_time-begin_interval_time))*60+
EXTRACT (SECOND FROM (end_interval_time-begin_interval_time)) DELTA
FROM sys.wrh$_sysstat sysst , DBA_HIST_SNAPSHOT snaps
WHERE (sysst.dbid, sysst.stat_id) IN ( SELECT dbid, stat_id FROM sys.wrh$_stat_name WHERE stat_name='redo size' )
AND snaps.snap_id = sysst.snap_id
AND snaps.dbid =sysst.dbid
and begin_interval_time > sysdate-90
, sum(stat_value) redo1
group by instance_number,
order by instance_number, 2
Visualizing the data will help you quickly identify any anomalies in the redo generation pattern. Here is an example graph created from the Excel spreadsheet; notice that redo size increased recently.
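The query above turns the cumulative 'redo size' statistic into per-snapshot deltas by subtracting the previous snapshot's value within the same instance startup, and it converts the snapshot interval into seconds. The Python sketch below mimics only that delta logic. It is not a replacement for the query, and the snapshot rows are made up for illustration.

```python
from datetime import datetime

# Hypothetical snapshot rows:
# (instance_number, startup_time, begin_interval_time, end_interval_time, cumulative redo size)
snapshots = [
    (1, datetime(2013, 6, 1, 0, 0), datetime(2013, 6, 1, 9, 0), datetime(2013, 6, 1, 10, 0), 1_000),
    (1, datetime(2013, 6, 1, 0, 0), datetime(2013, 6, 1, 10, 0), datetime(2013, 6, 1, 11, 0), 4_600),
    # The instance restarted at 11:30, so the cumulative counter starts over.
    (1, datetime(2013, 6, 1, 11, 30), datetime(2013, 6, 1, 11, 30), datetime(2013, 6, 1, 12, 0), 900),
    (1, datetime(2013, 6, 1, 11, 30), datetime(2013, 6, 1, 12, 0), datetime(2013, 6, 1, 13, 0), 8_100),
]


def redo_deltas(rows):
    """Mimic VALUE - LAG(VALUE) partitioned by (startup_time, instance_number)."""
    previous = {}  # last cumulative value seen in each (startup_time, instance) partition
    result = []
    for inst, startup, begin, end, value in sorted(rows, key=lambda r: (r[0], r[2])):
        key = (startup, inst)
        # LAG returns NULL for the first row of a partition; None plays that role here.
        delta = value - previous[key] if key in previous else None
        previous[key] = value
        # Interval length in seconds, the same quantity the EXTRACT arithmetic computes.
        seconds = (end - begin).total_seconds()
        result.append((inst, begin, delta, seconds))
    return result


deltas = redo_deltas(snapshots)
for inst, begin, delta, seconds in deltas:
    print(inst, begin, delta, seconds)
```

Partitioning by startup time matters because the counter resets when an instance restarts. Without it, the first snapshot after a restart would produce a large negative delta.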
Read the rest of this entry »
Posted in 11g, Oracle database internals, Performance tuning, RAC | Tagged: identify objects redo, redo internals, segment_stats.sql, v$logmnr_contents, v$segment_stats | 21 Comments »
Posted by Riyaj Shamsudeen on June 5, 2013
The restart of a UNIX server calls initialization scripts to start processes and daemons. Every platform has a unique directory structure and follows its own method to implement the server startup sequence. On the Linux platform (prior to Linux 6), initialization scripts are started by calling scripts in the /etc/rcX.d directories, where X denotes the run level of the UNIX server. Typically, Clusterware is started at run level 3. For example, the ohasd daemon is started by the /etc/rc3.d/S96ohasd file, with start supplied as an argument. The file S96ohasd is linked to /etc/init.d/ohasd.
S96ohasd -> /etc/init.d/ohasd
/etc/rc3.d/S96ohasd start # init daemon starting ohasd.
Similarly, a server shutdown will call scripts in the rcX.d directories. For example, ohasd is shut down by calling the K15ohasd script:
K15ohasd -> /etc/init.d/ohasd
/etc/rc3.d/K15ohasd stop #UNIX daemons stopping ohasd
In summary, server startup will call files matching the pattern S* in the /etc/rcX.d directories. The scripts are called in the lexical order of their names. For example, S10cscape will be called prior to S96ohasd, as the script S10cscape occurs earlier in the lexical sequence.
Google it if you want to learn more about the RC startup sequence. Of course, Linux 6 introduces the Upstart feature, and the mechanism is a little different: http://en.wikipedia.org/wiki/Upstart
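The ordering rule can be sketched in a few lines of Python. This only illustrates lexical ordering and is not how init itself is implemented. S10cscape, S96ohasd and K15ohasd come from the text above, while S55sshd is a hypothetical extra entry.

```python
def startup_order(entries):
    """Return the S* entries of an rcX.d directory in the order they would be called."""
    # Keep only start scripts, then sort them as plain strings (lexical order).
    return sorted(name for name in entries if name.startswith("S"))


# S55sshd is a hypothetical entry added for illustration.
rc3_entries = ["S96ohasd", "K15ohasd", "S55sshd", "S10cscape"]
order = startup_order(rc3_entries)

print(order)
```

Because the names are compared as strings, a hypothetical S100foo would sort before S96ohasd in this sketch.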
That’s not the whole story!
Have you ever wondered why ‘crsctl start crs’ returns immediately? You can guess that Clusterware is started in the background, as the command returns to the UNIX prompt almost immediately. Executing the crsctl command just modifies the ohasdrun file content to ‘restart’. It doesn’t actually perform the task of starting the Clusterware. The init.ohasd daemon reads the ohasdrun file every few seconds and starts the Clusterware if the file content has changed to ‘restart’.
# cat /etc/oracle/scls_scr/oel6rac1/root/ohasdrun
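The control-file pattern described above can be sketched in Python as follows. This is not the init.ohasd code. The function name, the polling interval and the temporary file standing in for ohasdrun are all assumptions made for illustration. Only the 'restart' keyword comes from the text.

```python
import os
import tempfile
import time


def poll_control_file(path, start_action, interval_seconds=1.0, max_polls=5):
    """Re-read the control file periodically; run start_action once when it contains 'restart'."""
    for _ in range(max_polls):
        with open(path) as control:
            content = control.read().strip()
        if content == "restart":
            start_action()
            return True
        time.sleep(interval_seconds)
    return False  # nothing asked for a start within max_polls reads


# Demonstration with a temporary file standing in for the ohasdrun file.
started = []
with tempfile.TemporaryDirectory() as tmp_dir:
    control_file = os.path.join(tmp_dir, "ohasdrun")
    with open(control_file, "w") as control:
        control.write("restart\n")  # the change the text attributes to 'crsctl start crs'
    triggered = poll_control_file(
        control_file, lambda: started.append("clusterware"), interval_seconds=0.01
    )

print(triggered, started)
```

The command that writes the file returns at once, and the slow work happens later in the polling process, which matches the immediate return of crsctl noted above.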
Read the rest of this entry »
Posted in 11g, Oracle database internals, RAC | Tagged: clusterware startup, ohasd startup, ohasdrun, ohasdstr, RAC, RC scripts clusterware | 18 Comments »
Posted by Riyaj Shamsudeen on August 29, 2012
A few of my clients have many questions about asmlib support in RHEL6, as they are gearing up to upgrade their database servers to RHEL6. There is a controversy about asmlib support in RHEL6. As usual, I will only discuss technical details in this blog entry.
ASMLIB is applicable only to Linux platform and does not apply to any other platform.
Now, you might ask: why bother, and why not just use OEL and UEK? Well, not every Linux server is used as a database server. In a typical company, there are hundreds of Linux servers, and just a few percent of those servers are used as database servers. Linux system administrators prefer to keep one flavor of Linux distribution for ease of management, so asking clients to change the distribution from RHEL to OEL or OEL to RHEL is not always a viable option.
Do you need to use ASMLIB in Linux?
The short answer is No. The long answer is possibly No. ASMLIB is an optional support library that eases the administration of ASM devices. It is especially helpful while adding new devices to the nodes in a cluster. ASMLIB essentially stamps the devices, so they are easily visible on the other nodes of a cluster after the next asm scandisk. ASMLIB also provides device persistence, which is the important benefit of ASMLIB (see the discussion below for more details about device persistence).
Read the rest of this entry »
Posted in 11g, RAC | Tagged: asmlib, device mapper, multipath, multipath.conf, oracle performance, oracle RAC asmlib, RAC, udev, udev rules | 11 Comments »