Oracle database internals by Riyaj

Discussions about Oracle performance tuning, RAC, Oracle internal & E-business suite.

Clusterware Startup

Posted by Riyaj Shamsudeen on June 5, 2013

The restart of a UNIX server call initialization scripts to start processes and daemons. Every platform has a unique directory structure and follows a method to implement server startup sequence. In Linux platform (prior to Linux 6), initialization scripts are started by calling scripts in the /etc/rcX.d directories, where X denotes the run level of the UNIX server. Typically, Clusterware is started at run level 3. For example, ohasd daemon started by /etc/rc3.d/S96ohasd file by supplying start as an argument. File S96ohasd is linked to /etc/init.d/ohasd.

S96ohasd -> /etc/init.d/ohasd

/etc/rc3.d/S96ohasd start  # init daemon starting ohasd.

Similarly, a server shutdown will call scripts in rcX.d directories, for example, ohasd is shut down by calling K15ohasd script:

K15ohasd -> /etc/init.d/ohasd
/etc/rc3.d/K15ohasd stop  #UNIX daemons stopping ohasd

In Summary, server startup will call files matching the pattern of S* in the /etc/rcX.d directories. Calling sequence of the scripts is in the lexical order of script name. For example, S10cscape will be called prior to S96ohasd, as the script S10cscape occurs earlier in the lexical sequence.

Google if you want to learn further about RC startup sequence. Of course, Linux 6 introduces Upstart feature and the mechanism is a little different: http://en.wikipedia.org/wiki/Upstart

That’s not the whole story!

Have you ever thought why the ‘crsctl start crs’ returns immediately? You can guess that Clusterware is started in the background as the command returns to UNIX prompt almost immediately. Executing the crsctl command just modifies the ohasdrun file content to ‘restart’. It doesn’t actually perform the task of starting the clusterware. Daemon init.ohasd reads the ohasdrun file every few seconds and starts the Clusterware if the file content is changed to ‘restart’.

# cat /etc/oracle/scls_scr/oel6rac1/root/ohasdrun
restart

If you stop has using ‘crsctl stop has’ , then the ohasdstr file content is modified to stop and so, init.ohasd daemon will not restart Clusterware. However, stop command is synchronous and executes the stop of clusterware too.

# crsctl stop has
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'oel6rac1'
CRS-2673: Attempting to stop 'ora.crsd' on 'oel6rac1'
..

The content of ohasdrun is modified to stop:

# cat  /etc/oracle/scls_scr/oel6rac1/root/ohasdrun
stop # 

In a nutshell, init.ohasd daemon is monitoring the ohasdrun file and starts the Clusterware stack if the value in the file is modified to restart.

Inittab

Init.ohasd daemon is an essential daemon for Clusterware startup. Even if the Clusterware is not running on a node, you can start the Clusterware from a different node. How does that work? Init.ohasd is the reason.

The init.ohasd daemon is started from /etc/inittab. Entries in the inittab is monitored by the init daemon (pid=1) and init daemon will react if the inittab file is modified. The init daemon monitors all processes listed in the inittab file and reacts according to the configuration in the inittab file. For example, if init.ohasd fails for some reason, it is immediately restarted by init daemon.

Following is an example entry in the inittab file. Fields are separated a colon, second field indicates that init.ohasd will be started in run level 3, and the third field indicates an action field. Restart in the action field means that, if the target process exist, just continue scanning inittab file; if the target process does not exist, then restart the process.

#cat /etc/inittab
…
h1:3:respawn:/etc/init.d/init.ohasd run >/dev/null 2>&1 </dev/null

If you issue a clusterware startup command from a remote node, that a message sent to init.ohasd daemon in the target node, and the daemon initates the clusterware startup. So, init.ohasd will be always running irrespective of whether the Clusterware is running or not.

You can use strace on init.ohasd to verify this behavior. Following are a few relevant lines from the output of strace command of init.ohasd process:

…
5641  1369083862.828494 open("/etc/oracle/scls_scr/oel6rac1/root/ohasdrun", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3
5641  1369083862.828581 dup2(3, 1)      = 1
5641  1369083862.828606 close(3)        = 0
5641  1369083862.828631 execve("/bin/echo", ["/bin/echo", "restart"], [/* 12 vars */]) = 0
…

Just for fun!

So, what happens if I manually modify that ohasdrun to restart? I copied the ohasdrun to a temporary file (/tmp/a1.lst) and stopped the clusterware.

cp /etc/oracle/scls_scr/oel6rac1/root/ohasdrun /tmp/a1.lst 
# crsctl stop has

I verified that Clusterware is completely stopped. Now, I will copy the file again overlaying ohasdrun:

# cat /tmp/a1.lst
restart
# cp /tmp/a1.lst  /etc/oracle/scls_scr/oel6rac1/root/ohasdrun

After a minute or so, I see that Clusterware processes are started. Not that, you would use this type of hack in a Production cluster, but this test proves my point.

It’s also important not to remove the files in the scls_scr directories. Any removal of the files underneath the scls_scr directory structure can lead to an invalid configuration.

There are also two more files in the scls_scr directory structure. Ohasdstr file decides if the HAS daemon should be started automatically or not. For example, if you execute ‘crsctl disable has’, that command modifies ohasdstr file contents to ‘disable’. Similarly, crsstart file controls CRS daemon startup. Again, you should use recommended commands to control the startup, rather than modifying any of these files directly.

11.2.0.1 and HugePages

If you tried to configure hugepages in 11.2.0.1 clusterware, by increasing memlock kernel parameter for GRID and database owner, you would have realized that database doesn’t use hugepages if started by the clusterware. Database startup using sqlplus will use hugepages, but the database startup using srvctl may not use hugepages.

As the new processes are cloned from the init.ohasd daemon, until init.ohasd is restarted, user level memlock limit changes are not correctly reflected in an already running process. Only recommended way to resolve the problem is to restart the node completely (not just the clusterware), as init.ohasd daemon must be restarted to reflect the user level limits.

Version 11.2.0.2 fixes this issue by explicitly calling ulimit command from /etc/init.d/ohasd files.


Summary

In Summary, init.ohasd process is an important process. Files underneath scls_scr directory is controlling the startup behavior. This also means that if a server is restarted, you don’t need to explicitly stop the Clusterware. You can let the server startup to restart the Clusterware.

PS: Some of you have asked my why my blogging frequency has decreased. I have been extremely busy co-authoring a book on RAC titled “Expert RAC Practices 12c”. We are covering lots of interesting stuff in the book. As soon as 12c release is Production, we can release the book also.

6 Responses to “Clusterware Startup”

  1. Amos said

    very good, thanks!

  2. Mike said

    Awesome info. Much appreciated!

  3. NT said

    Good Stuff, very useful info.

  4. satya said

    Very Good Stuff .. Keep posting … Thanks a lot …

  5. Mohammed said

    Thanks for sharing the details…

    One small question out of topic related to VIP (virtual IP in RAC)

    If node 1 failed then VIP resources/services will move/shifted to Node 2. Very true…

    As per theory all old connection/sessions which are connected to VIP1 will be discarded from node 2 by RE-ARP Protocol broadcast message.

    Then how oracle handle basic + session failed over property..

    For Example:

    If I do select * from dba_source ..running on node 1…… & node 1 failed for some unexpected reason then same session will move to node 2 and continue select operation without any interruption. (How Oracle handle this request.. ?)

    Please request to shed some light on this.

    Thanks

    Yousuf,

    • Mohammed
      TAF is handled mostly by the client side JDBC/OCI driver. State of the connection, including cursor position, is remembered by the driver. Upon session failures, the driver requests a new connection to the listener process, replays the SQL statement, and returns rows from the cursor position.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

 
Follow

Get every new post delivered to your Inbox.

Join 177 other followers

%d bloggers like this: