Oracle database internals by Riyaj

Discussions about Oracle performance tuning, RAC, Oracle internal & E-business suite.

Reverse Path Filtering and RAC

Posted by Riyaj Shamsudeen on June 1, 2012

This is a quick note about reverse path filtering and its impact on RAC. I encountered an interesting problem with a client recently, and it is worth blogging about in the hope that it might help one of you in the future.

Problem

The environment is 11.2.0.2 GI on Linux 5.6. In a 3-node cluster, Grid Infrastructure (GI) comes up cleanly on just one node, but never comes up on the others. If we shut down GI on the first node, we can start GI on the second node with no issues. In short, GI can be up on only one node at any time.

System admins indicated that there were no major changes, only a few bug fixes; seemingly, the problem started after those fixes. But there were a few other changes to the environment as well (an init.ora parameter change, etc.), so the problem was not immediately attributable to the OS changes alone.


Analysis

Reviewing the GI alert log, it was evident that the CSSD daemon was not joining the cluster. The CSSD log files showed the error message “Other_node has Disk HB, but no Network HB”, implying that the problem was in the network layer. Normal checks such as ping and traceroute to all other nodes were successful (and the network admin/sysadmin simply said that this is an Oracle issue, since ping/traceroute worked fine).

Update 1: An important note. After reading Brian’s comment below, I decided to clarify this entry. The “Other_node has Disk HB, but no Network HB” error can happen for many reasons, but almost all of them distill down to some type of network configuration issue. Essentially, this error means that network packets or multicast packets are not flowing properly between the nodes. In this entry, I am discussing JUST ONE of those reasons. If you encounter the “Other_node has Disk HB, but no Network HB” error, you should review your network configuration carefully, along with MOS note 1054902.1, “How to Validate Network and Name Resolution Setup for the Clusterware and RAC”. [Multicast issues are less prevalent (almost non-existent) in 11.2.0.3, though, as the software handles multicast beautifully now (essentially, it tries the 230.x.x.x IP range and then the 224.x.x.x IP range automatically).]

Time for advanced tools! With tcpdump and wireshark, I was able to see that packets were leaving the surviving node but were not received on the other node (and vice versa). I also checked the packets in the switch (via port mirroring) and could see that packets were flowing through the switch with no issues.
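For reference, a capture along these lines can be taken with tcpdump and inspected later in wireshark. This is a sketch: the interface name (eth3), the UDP filter, and the capture path are assumptions for a typical interconnect setup, not the exact commands from this incident.

```shell
# Capture interconnect traffic on the private interface (eth3 here).
# -n skips DNS lookups, -i selects the interface, -w writes raw packets
# to a file that wireshark can open. Run a capture like this on both the
# sending and the receiving node and compare what each side sees.
tcpdump -n -i eth3 -w /tmp/eth3_capture.pcap udp

# Quick command-line summary of the capture, without wireshark:
tcpdump -n -r /tmp/eth3_capture.pcap | head -20
```

If packets appear in the sender's capture (and at the mirrored switch port) but not in the receiver's capture, the drop is happening on or before the receiving host's capture point.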

Why would packets received on the interface not show up in the wireshark output? The kernel must be filtering the packets somehow.
At this point, we needed to prove that packets were being thrown away by the kernel. The aptly named log_martians kernel parameter came in handy: after changing net.ipv4.conf.eth3.log_martians and net.ipv4.conf.eth4.log_martians to 1, the system admins confirmed that packets were being discarded by the kernel.

I started reviewing sysctl.conf and comparing it with an old copy; there were no notable differences between the files over the past few weeks. So, no kernel parameter change either.

A puzzle!

I was expecting to see some kernel parameter change that would tell the kernel to filter the packets, such as a firewall rule. Not seeing any change, I was baffled by the mystery.

Finally, I decided to review all OS changes. One notable change stuck out: the kernel had been upgraded from 2.6.18 to 2.6.32. While that may not look like a major change, it is relevant, since we know that the kernel is throwing away packets for some reason.

Then I recollected seeing a note about 2.6.32 in MOS and searched for the string “2.6.32”. Note 1286796.1 was exactly what I was remembering: “rp_filter for multiple private interconnects and Linux Kernel 2.6.32”.

Reverse Path Filtering

Reverse Path Filtering (RPF) is a security feature: if the reply to a packet would not be routed out through the interface the packet arrived on, the kernel may throw the packet away. Ironically, this is not a new feature; the 2.6.32 kernel simply fixed a bug, and so RPF started to work as intended. That bug fix affects private interconnect traffic.

The solution was simple: relax RPF for the private interfaces. Modify /etc/sysctl.conf, add the following two kernel parameters, and then run sysctl -p (read MOS note 1286796.1 for the complete description):
net.ipv4.conf.eth3.rp_filter=2
net.ipv4.conf.eth4.rp_filter=2
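To apply and verify the change, something along these lines should work (a sketch; note that rp_filter=2 is "loose" reverse-path validation per the kernel's ip-sysctl documentation, versus 1 for strict mode and 0 for none):

```shell
# Reload all settings from /etc/sysctl.conf, including the new
# rp_filter entries. Requires root.
sysctl -p

# Confirm the running values on the private interfaces:
sysctl net.ipv4.conf.eth3.rp_filter net.ipv4.conf.eth4.rp_filter
```

One caveat worth knowing: the kernel applies the stricter of net.ipv4.conf.all.rp_filter and the per-interface value, so check the "all" setting as well if the per-interface change does not seem to take effect.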

I wish this were documented better so that this weird problem could be avoided. I also want to make it clear that not all CSSD heartbeat issues can be attributed to RPF; it is just that this client was unfortunate enough to encounter this particular issue.

As a side note, I have also scheduled the next RAC training class for Aug/Sept 2012:
Advanced RAC_Training

Update 1: Fixing the link for training.

11 Responses to “Reverse Path Filtering and RAC”

  1. Gayatri said

    Nice

  2. There are a number of issues that can give the “Other_node has Disk HB, but no Network HB” error message, and as you stated, all of them are network related. I once had a bad network configuration where my private interconnect network cards were not on the same device number. The result was the same. And in 11gR2, Oracle started using multicasting and if your multicasting does not work, or you do not apply the multicast patch to your GI home, then you can get this same error message. I figured I would post these possible causes in case someone Googled for the error message and had a problem other than reverse-path filtering. Thanks!

  3. Yi Lin said

    Riyay,
    Thanks for sharing. I ran into a similar problem with my 11202 2-node RAC on OEL 5.5 (2.6.32). After some google and MOS searches, I found out that my problem was caused by multicast settings. According to MOS 1212703.1, the fix is to ” turn off IGMP Snooping on a VLAN” in Cisco router:
    no ip igmp snooping

    • Hello Yi Lin

      Thanks for reading my blog. Yep, multicast issue in 11.2.0.2 is a much bigger issue, and I have encountered many issues with multicast setup problems. At this time, I recommend my clients to upgrade GI to 11.2.0.3 version (I do agree that you would have encountered multicast problem even in 11.2.0.3 if IGMP snooping is ON in the switch).

      Cheers
      Riyaj

  4. Andreas Krüger said

    ran into exactly the same issue and solved it by inserting the “net.ipv4.conf.ethX.rp_filter=2” bit into sysctl.config as you suggested

    uname -a
    Linux rac1 2.6.39-200.29.2.el6uek.x86_64 #1 SMP Sat Jul 14 10:50:56 EDT 2012 x86_64 x86_64 x86_64 GNU/Linux

    Thank you!

  5. Mark Bobak said

    Hmm…I was running the Oracle provided ‘raccheck’ script, on my preprod RAC cluster (5 nodes, Linux x86-64, RHEL5.7, running 11.2.0.2.3).

    It identified that we have ‘rp_filter’ set to ‘1’, but, this is a *working* cluster. So, this is only a problem if you have multiple (non-bonded) private interfaces? We have four Gig-E cards, bonded (O/S level bonding) into two pairs (bond0 and bond1), one designated public and one designated private.

    So, this rp_filter issue isn’t a problem for us, because we only have one private interface configured?

    • Hello Mark
      Thanks for reading my blog. I think, bonding driver provides a MAC address always and so, this problem is probably masked in your environment (just a speculation though). Either way, you are probably better off modifying rp_filter to 2 for private interfaces. Of course, change this parameter when you perform next reboot of the server since the environment is up and running at this time.

      Cheers
      Riyaj

  6. ira.easter said

    Thanks for your marvelous posting! I genuinely enjoyed reading it, you are a great author.
    I will ensure that I bookmark your blog and may come back very soon.
    I want to encourage continue your great job, have a nice weekend!

  7. Antti Backman said

    Great article, too bad I found it right after I managed to fix the issue!
