APPLIES TO:

Oracle Database - Enterprise Edition - Version 10.1.0.2 to 11.2.0.1 [Release 10.1 to 11.2]
Oracle Database Cloud Schema Service - Version N/A and later
Oracle Database Exadata Cloud Machine - Version N/A and later
Oracle Cloud Infrastructure - Database Service - Version N/A and later
Oracle Database Backup Service - Version N/A and later
Information in this document applies to any platform.

SYMPTOMS

Two cluster interconnects are defined for an 11.2.0.1 (or earlier) RAC database: eth0 (10.0.0.x) and eth2 (10.1.1.x). During network maintenance, eth0 is brought down and node 2 is restarted. After that, instance 2 fails to start with the following errors, while instance 1 remains up and running:

SQL> startup
ORA-27504: IPC error creating OSD context
ORA-27300: OS system dependent operation:check if cable failed with status: 0
ORA-27301: OS failure message: Error 0
ORA-27302: failure occurred at: skgxpcini1
ORA-27303: additional information: requested interface eth0 interface not running set _disable_interface_checking = TRUE to disable this check for single instance cluster. Check output from ifcon


CHANGES

One of the cluster_interconnect networks was brought down for maintenance.

CAUSE

For databases up to release 11.2.0.1, there is no interconnect failover at the database layer, even when two private networks are defined for cluster_interconnect. If one interface goes down, communication between the instances can break, so the second instance cannot start or cannot join the cluster.

With eth0 down, the following message during startup of the second instance is therefore expected: "ORA-27303: additional information: requested interface eth0 interface not running".
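The interface check behind ORA-27303 can be approximated from the operating system before attempting a startup. The sketch below is a hypothetical helper, not an Oracle tool, and assumes a Linux node: it reads `oifcfg getif` output (lines of the form `eth0  10.0.0.0  global  cluster_interconnect`), keeps the interfaces marked cluster_interconnect, and prints each one's link state from /sys/class/net. The function names and the SYSFS_NET override are made up for illustration.

```shell
#!/bin/sh
# Hypothetical pre-start check (not an Oracle utility), assuming Linux sysfs.
# SYSFS_NET is an illustrative override; it defaults to /sys/class/net.

# Read `oifcfg getif` output on stdin and print the names of the
# interfaces whose last column marks them as cluster_interconnect.
interconnect_ifs() {
    awk '$NF ~ /cluster_interconnect/ { print $1 }'
}

# For each interconnect interface, print "<name> <operstate>",
# or "<name> missing" if the interface does not exist on this node.
check_interconnect_state() {
    net_dir=${SYSFS_NET:-/sys/class/net}
    interconnect_ifs | while read -r ifname; do
        if [ -r "$net_dir/$ifname/operstate" ]; then
            state=$(cat "$net_dir/$ifname/operstate")
        else
            state=missing
        fi
        echo "$ifname $state"
    done
}
```

Usage, on node 2 during the maintenance window: `oifcfg getif | check_interconnect_state`. An interconnect reported as `down` or `missing` would be expected to produce the ORA-27303 error on startup.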

Setting _disable_interface_checking = TRUE only allows the instance to start in NOMOUNT mode; it still cannot join the cluster and fails with ORA-29702 "error occurred in Cluster Group Service operation" when LMON tries to join. This is because both interfaces were available when instance 1 was started, so instance 1 opened ports on both of them. The alert log of instance 1 shows:
Interface type 1 eth0 10.0.0.0 configured from GPnP Profile for use as a cluster interconnect
Interface type 1 eth2 10.1.xxx.yyy configured from GPnP Profile for use as a cluster interconnect
...
Starting up:
...
Cluster communication is configured to use the following interface(s) for this instance
 10.0.0.yyy
 10.1.xxx.yyy

When instance 2 starts, it opens ports only on eth2, so it cannot communicate with the instance 1 processes whose ports are open on eth0.
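One way to see this asymmetry is to compare the interconnect addresses each instance logged at startup. The hypothetical helpers below parse the "Cluster communication is configured to use the following interface(s) for this instance" block from an alert log (keeping the last such block, since a log can span several startups) and compare how many addresses two instances registered. Counts rather than addresses are compared because each node has its own IPs. The function names are illustrative, the log format is assumed to match the excerpt above, and the comparison is only meaningful when both alert logs actually contain the block.

```shell
#!/bin/sh
# Hypothetical helpers, assuming the 11.2 alert log layout shown above.

# Print the addresses listed in the last "Cluster communication is
# configured to use the following interface(s)" block of alert log $1.
interconnect_ips() {
    awk '
        /Cluster communication is configured to use the following interface\(s\)/ {
            n = 0; grab = 1; next          # new block: reset and start collecting
        }
        grab && /^[ \t]+[0-9]/ {
            ips[++n] = $1; next            # indented address line inside the block
        }
        { grab = 0 }                       # any other line ends the block
        END { for (i = 1; i <= n; i++) print ips[i] }
    ' "$1"
}

# Compare the number of interconnect addresses in two alert logs.
compare_interconnect_counts() {
    n1=$(interconnect_ips "$1" | awk 'END { print NR }')
    n2=$(interconnect_ips "$2" | awk 'END { print NR }')
    if [ "$n1" -eq "$n2" ]; then
        echo "match: $n1 interface(s) each"
    else
        echo "mismatch: $n1 vs $n2 interface(s)"
    fi
}
```

In the scenario described here, instance 1 would list two addresses and instance 2 only one, which is the mismatch that keeps instance 2 from joining.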

SOLUTION

1. Remove the down interface eth0 from OCR:
oifcfg delif -global eth0

2. Shut down instance 1.

3. Restart both instances.

After these steps, both instances should be able to start.
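The three steps can be scripted roughly as below. This is a sketch under stated assumptions: it runs as the Grid Infrastructure owner with oifcfg and srvctl on PATH, the database and instance names (`ORCL`, `ORCL1` in the usage) are placeholders, and srvctl is used for the shutdown and restart, although SQL*Plus would work as well. An `oifcfg getif` call is added after the delete to confirm the interface is gone. By default the commands are only printed; the RUN variable and the function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper around the documented procedure (11.2 srvctl syntax).
# Commands are echoed unless RUN=1 is set, so the plan can be reviewed first.

run() {
    if [ "${RUN:-0}" = 1 ]; then
        "$@"            # execute the command
    else
        echo "$*"       # dry run: just show it
    fi
}

# $1 = interface to remove, $2 = database name, $3 = instance 1 name
remove_down_interconnect() {
    ifname=$1 db=$2 inst1=$3
    run oifcfg delif -global "$ifname"         # step 1: remove from OCR
    run oifcfg getif                           # verify the interface is gone
    run srvctl stop instance -d "$db" -i "$inst1"   # step 2: stop instance 1
    run srvctl start database -d "$db"         # step 3: start all instances
}
```

`remove_down_interconnect eth0 ORCL ORCL1` prints the planned commands; `RUN=1 remove_down_interconnect eth0 ORCL ORCL1` executes them.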

Note: this issue does not occur in release 11.2.0.2 and later because of the HAIP (Highly Available IP) feature. HAIP fails over automatically when one of the underlying interfaces goes down; the failover is transparent to the database instances and does not affect communication between them.
