Reverse Path Filtering and RAC
Posted by Riyaj Shamsudeen on June 1, 2012
This is a quick note about reverse path filtering and its impact on RAC. I encountered an interesting problem recently with a client, and it is worth blogging about, in the strong hope that it might help one of you in the future.
Environment is 126.96.36.199 GI, Linux 5.6. In a 3-node cluster, Grid Infrastructure (GI) comes up cleanly on just one node, but never comes up on the other nodes. If we shut down GI on the first node, we can start GI on the second node with no issues. Meaning, GI can be up on just one node at any time.
System admins indicated that there were no major changes, only a few bug fixes. Seemingly, the problem started after those bug fixes. But there were a few other changes to the environment (init.ora parameter changes, etc.), so the problem was not immediately attributable to the OS changes alone.
Reviewing the GI alert log file, it was evident that the CSSD daemon was not joining the cluster. CSSD log files showed the error message “Other_node has Disk HB, but no Network HB”, implying that the problem was in the network layer. Normal checks such as ping and traceroute to all other nodes were successful (and the network admin/sysadmin simply said that this was an Oracle issue, as ping/traceroute were working fine).
Update 1: An important note. After reading Brian’s comment below, I decided to clarify my blog entry. The “Other_node has Disk HB, but no Network HB” error can happen for many reasons, but almost all of those reasons distill down to some type of network configuration issue. Essentially, this error means that network packets or multicast packets are not flowing properly between the nodes. In this entry, I am discussing JUST ONE of those reasons. If you encounter the “Other_node has Disk HB, but no Network HB” error, you should review your network configuration carefully and review note 1054902.1, “How to Validate Network and Name Resolution Setup for the Clusterware and RAC”. [Multicast issues are less prevalent (almost non-existent) in version 188.8.131.52 though, as the software now handles multicast issues beautifully (essentially, it tries the 230.x.x.x IP range and then the 224.x.x.x IP range automatically).]
Time for advanced tools! With tcpdump and wireshark, I was able to see that packets were leaving the surviving node, but were not received on the other node (and vice versa). I also checked the packets at the switch (port mirroring) and could see that packets were flowing through the switch with no issues.
Why would packets received on the interface not show up in the wireshark output? The kernel must somehow be filtering the packets.
At this point, we needed to prove that packets were being thrown away by the kernel. The interestingly named log_martians kernel parameter came in handy. After changing the parameters net.ipv4.conf.eth3.log_martians and net.ipv4.conf.eth4.log_martians to 1, the system admins confirmed that packets were being discarded by the kernel.
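For anyone repeating this check, here is a minimal sketch of how the logging could be switched on and the resulting kernel messages filtered. The interface names eth3 and eth4 are the ones from this environment; the helper name martian_lines is my own invention for illustration, and the sketch assumes the kernel reports dropped packets with lines containing "martian source ... on dev <interface>".

```shell
# Enable martian logging on the private interfaces (needs root):
#   sysctl -w net.ipv4.conf.eth3.log_martians=1
#   sysctl -w net.ipv4.conf.eth4.log_martians=1

# Hypothetical helper: read kernel log text on stdin and keep only
# the martian-source lines reported for the given device.
martian_lines() {
  dev=$1
  grep 'martian source' | grep -w "on dev $dev"
}

# Usage (needs permission to read the kernel log):
#   dmesg | martian_lines eth3
```

If the filter prints lines whose source addresses belong to the other cluster nodes, that is a reasonable sign the kernel is dropping interconnect traffic rather than the switch or the NIC.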
I started reviewing sysctl.conf and comparing it with an old copy of sysctl.conf; there were no notable differences between the files for the past few weeks. So, no kernel parameter change either.
I was expecting to see some kernel parameter change that would tell the kernel to filter the packets, such as a firewall setting. Not seeing any change, I was baffled by the mystery.
Finally, I decided to review all OS changes. One notable change from the OS point of view stuck out: the kernel was upgraded from 2.6.18 to 2.6.32. While that doesn’t look like a major change, it is relevant since we know that the kernel is throwing away packets for some reason.
Then, I recollected seeing a note about 2.6.32 in MOS and searched for the string 2.6.32. Note 1286796.1 was exactly what I was remembering: “rp_filter for multiple private interconnects and Linux Kernel 2.6.32”.
Reverse Path Filtering
Reverse Path Filtering (RPF) is a security feature: if the reply to a packet would not go out through the interface the packet was received on, the kernel can throw the packet away. Ironically, this is not a new feature; it is just that the 2.6.32 kernel fixed a bug, and so RPF started to work. This bug fix in the 2.6.32 kernel affects private interconnect traffic.
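To see how RPF is currently set on a node, the per-interface rp_filter values can be read straight from /proc. The function below is a small sketch; the name show_rp_filter and its optional directory argument are my own, added only so the function can be pointed at a copy of the tree. A value of 0 means no source validation, 1 means strict mode and 2 means loose mode. As far as I know, the kernel documentation says the effective value for an interface is the higher of its own setting and the "all" setting, so read both.

```shell
# Hypothetical helper: print the rp_filter setting of every interface
# found under a sysctl tree (defaults to the live /proc tree).
# 0 = no source validation, 1 = strict, 2 = loose.
show_rp_filter() {
  base=${1:-/proc/sys/net/ipv4/conf}
  for f in "$base"/*/rp_filter; do
    [ -r "$f" ] || continue          # skip if nothing matched or unreadable
    dir=${f%/rp_filter}              # strip the file name
    printf '%s rp_filter=%s\n' "${dir##*/}" "$(cat "$f")"
  done
}
```

Calling show_rp_filter with no argument reads the live settings, one line per interface, including the special "all" and "default" entries.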
The solution was simple: disable RPF for the private interfaces. Modify /etc/sysctl.conf to add the rp_filter kernel parameter for each of the two private interfaces, and then run sysctl -p (read MOS note 1286796.1 for the complete description).
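The exact entries are not reproduced in this copy of the post, so treat the fragment below as a sketch rather than the literal change made for the client. It assumes the two private interfaces are eth3 and eth4 (the names used with log_martians earlier) and assumes loose mode, value 2, which keeps a relaxed form of the check; value 0 would switch the check off entirely. The MOS note is the place to confirm which value applies to your setup.

```conf
# /etc/sysctl.conf (sketch; interface names and value are assumptions)
# 2 = loose reverse path filtering, 0 = no source validation
net.ipv4.conf.eth3.rp_filter = 2
net.ipv4.conf.eth4.rp_filter = 2
```

After saving the file, sysctl -p (run as root) loads the new values without a reboot.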
I wish this were documented better so that this weird problem could be avoided. I also want to make it clear that not all CSSD heartbeat issues can be attributed to RPF. It is just that this client was unfortunate enough to encounter this issue.