Oracle database internals by Riyaj

Discussions about Oracle performance tuning, RAC, Oracle internal & E-business suite.

Clusterware Startup

Posted by Riyaj Shamsudeen on June 5, 2013

The restart of a UNIX server calls initialization scripts to start processes and daemons. Every platform has a unique directory structure and follows its own method to implement the server startup sequence. On the Linux platform (prior to Linux 6), initialization scripts are started by calling scripts in the /etc/rcX.d directories, where X denotes the run level of the UNIX server. Typically, Clusterware is started at run level 3. For example, the ohasd daemon is started by the /etc/rc3.d/S96ohasd file, with start supplied as an argument. The file S96ohasd is a link to /etc/init.d/ohasd.

S96ohasd -> /etc/init.d/ohasd

/etc/rc3.d/S96ohasd start  # init daemon starting ohasd.

Similarly, a server shutdown calls scripts in the rcX.d directories; for example, ohasd is shut down by calling the K15ohasd script:

K15ohasd -> /etc/init.d/ohasd
/etc/rc3.d/K15ohasd stop  #UNIX daemons stopping ohasd

In summary, server startup calls files matching the pattern S* in the /etc/rcX.d directories. The scripts are called in the lexical order of their names. For example, S10cscape is called before S96ohasd, as S10cscape occurs earlier in the lexical sequence.
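This ordering can be demonstrated with a plain sort; a small sketch using the two script names from the example above:

```shell
# Demonstrate the lexical calling order of rc scripts: sort orders the names,
# so S10cscape comes out before S96ohasd, just as init would call them.
printf '%s\n' S96ohasd S10cscape | sort
```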

Google if you want to learn more about the rc startup sequence. Of course, Linux 6 introduces the Upstart feature, and the mechanism is a little different: http://en.wikipedia.org/wiki/Upstart

That’s not the whole story!

Have you ever wondered why 'crsctl start crs' returns immediately? You can guess that Clusterware is started in the background, as the command returns to the UNIX prompt almost immediately. Executing the crsctl command merely modifies the ohasdrun file content to 'restart'; it doesn't actually perform the task of starting the Clusterware. The init.ohasd daemon reads the ohasdrun file every few seconds and starts the Clusterware if the file content has changed to 'restart'.
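A minimal sketch of this polling pattern (the function name and file path are illustrative, not the actual init.ohasd implementation):

```shell
#!/bin/sh
# Sketch: one iteration of an init.ohasd-style check. The real daemon repeats
# this every few seconds; the helper name and file path here are made up.
check_ohasdrun() {
  if [ "$(cat "$1" 2>/dev/null)" = "restart" ]; then
    echo "start clusterware"   # the real daemon would launch the stack here
  fi
}

# A 'crsctl start crs' equivalent just writes the word restart into the file:
OHASDRUN=$(mktemp)
echo restart > "$OHASDRUN"
check_ohasdrun "$OHASDRUN"     # prints: start clusterware
rm -f "$OHASDRUN"
```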

# cat /etc/oracle/scls_scr/oel6rac1/root/ohasdrun
restart

If you stop HAS using 'crsctl stop has', the ohasdrun file content is modified to 'stop', and so the init.ohasd daemon will not restart the Clusterware. However, the stop command is synchronous and performs the shutdown of the Clusterware as well.

# crsctl stop has
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'oel6rac1'
CRS-2673: Attempting to stop 'ora.crsd' on 'oel6rac1'
..

The content of ohasdrun is modified to stop:

# cat  /etc/oracle/scls_scr/oel6rac1/root/ohasdrun
stop # 

In a nutshell, the init.ohasd daemon monitors the ohasdrun file and starts the Clusterware stack if the value in the file is modified to 'restart'.

Inittab

The init.ohasd daemon is essential for Clusterware startup. Even if the Clusterware is not running on a node, you can start that node's Clusterware from a different node. How does that work? init.ohasd is the reason.

The init.ohasd daemon is started from /etc/inittab. Entries in the inittab are monitored by the init daemon (pid=1), and the init daemon reacts if the inittab file is modified. The init daemon monitors all processes listed in the inittab file and reacts according to the configuration there. For example, if init.ohasd fails for some reason, it is immediately restarted by the init daemon.

Following is an example entry in the inittab file. Fields are separated by a colon: the second field indicates that init.ohasd will be started in run level 3, and the third field is the action field. respawn in the action field means that if the target process exists, init just continues scanning the inittab file; if the target process does not exist, init restarts the process.

#cat /etc/inittab
…
h1:3:respawn:/etc/init.d/init.ohasd run >/dev/null 2>&1 </dev/null
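The fields of that entry can be pulled apart with cut, splitting on the colon separator:

```shell
# Split the inittab entry on ':' to inspect its fields.
entry='h1:3:respawn:/etc/init.d/init.ohasd run >/dev/null 2>&1 </dev/null'
echo "$entry" | cut -d: -f1   # id field: h1
echo "$entry" | cut -d: -f2   # run levels: 3
echo "$entry" | cut -d: -f3   # action: respawn
```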

If you issue a Clusterware startup command from a remote node, a message is sent to the init.ohasd daemon on the target node, and that daemon initiates the Clusterware startup. So, init.ohasd will always be running, irrespective of whether the Clusterware is running or not.

You can use strace on init.ohasd to verify this behavior. Following are a few relevant lines from the strace output of the init.ohasd process:

…
5641  1369083862.828494 open("/etc/oracle/scls_scr/oel6rac1/root/ohasdrun", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3
5641  1369083862.828581 dup2(3, 1)      = 1
5641  1369083862.828606 close(3)        = 0
5641  1369083862.828631 execve("/bin/echo", ["/bin/echo", "restart"], [/* 12 vars */]) = 0
…

Just for fun!

So, what happens if I manually modify ohasdrun to 'restart'? I copied ohasdrun to a temporary file (/tmp/a1.lst) and stopped the Clusterware.

cp /etc/oracle/scls_scr/oel6rac1/root/ohasdrun /tmp/a1.lst 
# crsctl stop has

I verified that the Clusterware was completely stopped. Now, I will copy the file back, overwriting ohasdrun:

# cat /tmp/a1.lst
restart
# cp /tmp/a1.lst  /etc/oracle/scls_scr/oel6rac1/root/ohasdrun

After a minute or so, I see that the Clusterware processes are started. Not that you would use this type of hack in a production cluster, but this test proves my point.

It’s also important not to remove the files in the scls_scr directories. Any removal of the files underneath the scls_scr directory structure can lead to an invalid configuration.

There are also two more files in the scls_scr directory structure. The ohasdstr file decides whether the HAS daemon should be started automatically or not. For example, if you execute 'crsctl disable has', that command modifies the ohasdstr file content to 'disable'. Similarly, the crsstart file controls CRS daemon startup. Again, you should use the recommended commands to control the startup behavior, rather than modifying any of these files directly.
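A hedged sketch of how such a flag file can gate autostart (the helper name is made up, and the 'disable' value follows the description above; the exact strings Oracle writes may differ):

```shell
#!/bin/sh
# Sketch: gate autostart on a flag file's content, as ohasdstr does for HAS.
# Helper name is hypothetical; the real file lives under
# /etc/oracle/scls_scr/<node>/root.
should_autostart() {
  case "$(cat "$1" 2>/dev/null)" in
    disable) return 1 ;;   # what 'crsctl disable has' is described as writing
    *)       return 0 ;;
  esac
}

f=$(mktemp)
echo disable > "$f"
should_autostart "$f" || echo "autostart disabled"   # prints: autostart disabled
rm -f "$f"
```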

11.2.0.1 and HugePages

If you tried to configure HugePages in 11.2.0.1 Clusterware by increasing the memlock kernel parameter for the GRID and database owners, you would have realized that the database doesn't use HugePages if it is started by the Clusterware. A database startup using sqlplus will use HugePages, but a database startup using srvctl may not.

As new processes are cloned from the init.ohasd daemon, user-level memlock limit changes are not correctly reflected in an already-running process until init.ohasd is restarted. The only recommended way to resolve the problem is to restart the node completely (not just the Clusterware), as the init.ohasd daemon must be restarted to pick up the user-level limits.

Version 11.2.0.2 fixes this issue by explicitly calling the ulimit command from the /etc/init.d/ohasd files.
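A sketch of that kind of fix: an init script raising the locked-memory (memlock) limit for itself before spawning its daemon, so children inherit the higher limit (the values and the fallback are illustrative, not the actual 11.2.0.2 code):

```shell
#!/bin/sh
# Sketch of an init-script fragment: raise the locked-memory limit for this
# shell and everything it spawns. 'unlimited' may be refused for non-root
# users; the KB fallback value here is illustrative.
ulimit -l unlimited 2>/dev/null || ulimit -l 4194304 2>/dev/null
ulimit -l   # show the effective limit that spawned daemons would inherit
```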


Summary

In summary, the init.ohasd process is an important process, and the files underneath the scls_scr directory control the startup behavior. This also means that if a server is restarted, you don't need to explicitly stop the Clusterware; you can let the server shutdown and startup sequences handle the Clusterware restart.

PS: Some of you have asked me why my blogging frequency has decreased. I have been extremely busy co-authoring a book on RAC titled “Expert RAC Practices 12c”. We are covering lots of interesting stuff in the book. As soon as the 12c release is in production, we can release the book as well.

16 Responses to “Clusterware Startup”

  1. Amos said

    very good, thanks!

  2. Mike said

    Awesome info. Much appreciated!

  3. NT said

    Good Stuff, very useful info.

  4. satya said

    Very Good Stuff .. Keep posting … Thanks a lot …

  5. Mohammed said

    Thanks for sharing the details…

    One small question out of topic related to VIP (virtual IP in RAC)

    If node 1 failed then VIP resources/services will move/shifted to Node 2. Very true…

    As per theory all old connection/sessions which are connected to VIP1 will be discarded from node 2 by RE-ARP Protocol broadcast message.

    Then how oracle handle basic + session failed over property..

    For Example:

    If I do select * from dba_source ..running on node 1…… & node 1 failed for some unexpected reason then same session will move to node 2 and continue select operation without any interruption. (How Oracle handle this request.. ?)

    Please request to shed some light on this.

    Thanks

    Yousuf,

    • Mohammed
      TAF is handled mostly by the client side JDBC/OCI driver. State of the connection, including cursor position, is remembered by the driver. Upon session failures, the driver requests a new connection to the listener process, replays the SQL statement, and returns rows from the cursor position.
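For context, this kind of TAF is typically configured with a FAILOVER_MODE clause in the client's tnsnames.ora; a sketch with illustrative service and host names:

```
MYSERVICE =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = rac-scan)(PORT = 1521))
    (CONNECT_DATA =
      (SERVICE_NAME = myservice)
      (FAILOVER_MODE = (TYPE = SELECT)(METHOD = BASIC)(RETRIES = 20)(DELAY = 3))
    )
  )
```

With TYPE = SELECT, the driver replays the statement after failover and scrolls to the previous cursor position, matching the behavior described above.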

  6. ash said

    It was good to know. Do you happen to know the process of starting 11gr2 crs/asm as active while 12c cluster passive or standby until you decide to start 12c cluster and stop 11gr2. On same machine?

    • Hello
      Thanks for reading. So, you are trying to install 12c clusterware, but not start using that 12c version until a later time. Do I understand your question correctly?

      If so, you can install the software only, and not run the rootupgrade.sh script. That should keep the cluster at 11gR2 until you are ready to switch over. However, considering that it doesn’t take too long to install the 12c GI software, I wouldn’t recommend taking that unnecessary risk, however minimal it is.

      Cheers
      Riyaj

      • ash said

        Hi
        Thank you. Yes, but configuring too.
        Basically, I have 11gR2 GI and would like to install and configure 12cR1 GI on the same cluster (2 nodes). Then I can switch and switch back between versions, 11.2 to 12.1 and vice versa. Therefore, what are the common cluster files that can be moved before the 12.1 GI install and configure, so as not to overwrite the current configuration, say 11.2? I have identified those files, at least 10, but I am not sure what conflicts can occur when stopping the 12c cluster and starting the 11gR2 cluster on the same cluster using the same owner and network. Better practice is to have a new setup on the same GI, excluding host name and IP. Anyway, I was trying to find a clue in your latest book, “Expert Oracle 12c RAC.” The files to be moved after stopping and before starting CRS, based on the current version, to either the higher or lower clusterware:
        inittab, oratab, ohasd, init.ohasd, ocr.loc, olr.loc, /etc/rc3.d/S9ohasd, K16ohasd, plus some files from /var/tmp/.oracle. Let me know if I am missing any. So far my cluster verification is successful, and I will install 12cR1 GI tomorrow. Additionally, I have created an ASM LUN for the OCR and vote disk based on the higher version, at least avoiding an OCR/vote disk conflict when starting up CRS.
        Thank you

      • I can’t recollect any other files that you need to back up. As long as the voting disk and OCR are on different devices, you may be able to do this. However, I have not done this exercise personally, so I can’t confirm it will work.

        I will be interested to know how this works out. Thanks..

  7. ash said

    Hi
    Thank you. Yes!! It is possible to do it, and I will provide the steps for how to do it. After backing up the listed files, I was able to install and configure the 12.1 GI. So, I am able to swap 11.2 GI to 12.1 GI with no issue reported. For the OCR and vote disk, I created a separate disk group on 12.1; therefore, there is no conflict between the 11.2 and 12.1 OCR/vote disks. One thing I noticed: the disk group that was created on 11.2 was dismounted, as expected, on the 12.1 cluster/ASM, but its ASM compatibility is 0.0.0.0.0 on version 12.1.0.1. I am not sure whether the directory /var/tmp/.oracle will be a conflict when I swap 12.1 back to 11.2, since during installation of the clusterware I received a warning and had to move it to another location. /usr/local/bin also has to be backed up before it is overwritten by root.sh on both occasions (GI and RDBMS), since they are in a common location. I may not worry too much, since this is just setting up the env.
    So it seems that whatever files I moved turned out to be the correct conflict files, which brought up the 12.1 GI without any issue, including the RDBMS. I will follow up tomorrow on the 12.1 to 11.2 swap.

  8. ash said

    Hi
    So I was able to conduct the swap between releases successfully. No issue while bringing up the CRS, one node at a time. The listed files are the correct ones to make it happen. The steps are simple: shut down the cluster/CRS. Move those files to a secure location. Allocate a new disk and use ASMLib to label the LUN. Create a new mount point for the new version. I used the same user id for the 12c install and configuration. Start the install and configure the 12c clusterware. Then install and configure the database, including a PDB. Do some sanity checks.
    Shut down the new release and back up those files.
    Overwrite those files with the 11gR2 backup files on top of the 12cR1 files and set up the 11gR2 env. Start up the CRS of the 11gR2 version. And you can do the same steps from 12cR1 to 11gR2.
    Thank you.
    -ashish

  9. Ash said

    Thank you. Yes, this was another best effort of mine on 12c setup. The first deployment was about plugging in a PDB (from a non-CDB and from a CDB) using the RMAN active feature, with a complex environment setup including Data Guard and Golden Gate. As a reader, I always look deeply, page by page and word by word, into any related articles and books for new concepts to accomplish given tasks. I wish I could share those 50 steps, but I can’t upload or attach the doc since I am working in a finance env. Again, thank you, and I will continue reading your book.
    -Ashish
