Tuesday, December 8, 2009

"Oracle CRS failure. Rebooting for cluster integrity": due to this error, my CRS was not able to start.

DB & CRS Version:
OS Version: Red Hat Linux 4 - 64 bit

Symptoms related to this issue as they were reported to Oracle Support have been identified as (but are not necessarily limited to):

- Cluster member reboots
- CLSOMON failing with status 13
- High CPU usage of ocssd.bin

Because of this, the nodes were rebooted and CRS failed.

While troubleshooting this issue, I found the following entries in the OS and CRS logs.

Operating System Log:

Dec 7 10:57:22 babuhost4 logger: Oracle clsomon failed with fatal status 137.
Dec 7 10:57:23 babuhost4 logger: Oracle CRS failure. Rebooting for cluster integrity.
Dec 7 11:02:19 babuhost4 syslogd 1.4.1: restart.
Dec 7 11:02:19 babuhost4 syslog: syslogd startup succeeded
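
To pick these entries out of a busy syslog, a small filter can help. This is only a minimal sketch: the function name crs_reboot_events is made up for illustration, and /var/log/messages is an assumed default location for the syslog file on this host. The two search patterns are copied from the log lines above.

```shell
#!/bin/sh
# Hypothetical helper: print syslog lines that record a clsomon failure
# or a CRS-initiated reboot. Patterns are taken from the OS log above.
crs_reboot_events() {
    # $1: log file to scan; /var/log/messages is an assumed default
    logfile="${1:-/var/log/messages}"
    grep -E 'Oracle clsomon failed with fatal status|Oracle CRS failure\. Rebooting for cluster integrity' "$logfile"
}

# Usage (as root, on the affected node):
#   crs_reboot_events /var/log/messages
```

Comparing the timestamps of the matching lines with the syslogd restart line shows how long the node took to come back after the reboot.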

Cluster Log:

[ CSSD]2009-12-08 11:22:35.200 [1262557536] >TRACE: clssnmRcfgMgrThread: Local Join
[ CSSD]2009-12-08 11:22:35.200 [1262557536] >WARNING: clssnmLocalJoinEvent: takeover aborted due to ALIVE node on Disk
[ CSSD]2009-12-08 11:22:35.885 [1136679264] >TRACE: clssnmReadDskHeartbeat: node(1) is down. rcfg(3) wrtcnt(5213) LATS(86809174) Disk lastSeqNo(5213)

Operating System:

Linux babuhost4 2.6.9-67.ELsmp #1 SMP Wed Nov 7 13:56:44 EST 2007 x86_64 x86_64 x86_64 GNU/Linux

CRS Health Check Failed

Checking CRS health...

Check: Health of CRS
Node Name                            CRS OK?
------------------------------------ ------------------------
babuhost5                            yes
babuhost4                            unknown
babuhost3                            yes
Result: CRS health check failed.
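
On a larger cluster it can be handy to list only the nodes that did not report "yes". The sketch below assumes the health report has been saved to a text file in the same layout as above (a dashed rule, then one "node status" pair per line, then a "Result:" line). The function name crs_unhealthy_nodes and the file name crs_health.txt are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read a CRS health report on stdin and print the
# nodes whose "CRS OK?" column is anything other than "yes".
crs_unhealthy_nodes() {
    awk '
        $1 ~ /^-+$/ { in_table = 1; next }   # dashed rule: node rows follow
        /^Result:/  { in_table = 0 }         # summary line: node rows are over
        in_table && NF == 2 && $2 != "yes" { print $1 }
    '
}

# Usage, with the report saved to a file (file name is an assumption):
#   crs_unhealthy_nodes < crs_health.txt
```

For the report shown here, only babuhost4 would be listed.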

Refer to Document ID "731599.1".
According to that document, this issue occurs in Oracle Server - Enterprise Edition - Version: to

It looks like the installed glibc package version is too old and needs to be upgraded to a higher version.

Oracle Enterprise Linux (OEL) / RHEL 4

* Problem exists with glibc-2.3.4-2.39
* Fixed in glibc-2.3.4-2.40 and above (only version -2.41 was actually released)

Current Version:

[root@babuhost4 log]# rpm -q glibc
[root@babuhost4 log]#
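
The comparison against the two builds listed above can be scripted. This is a rough sketch, not an official check: the function name glibc_status is hypothetical, and it only knows the two facts stated in this post (2.3.4-2.39 has the problem, 2.3.4-2.40 and above have the fix). Anything else is reported as "unknown".

```shell
#!/bin/sh
# Hypothetical helper: classify a glibc version-release string using
# only the builds named in this post.
glibc_status() {
    case "$1" in
        2.3.4-2.*)
            rel="${1#2.3.4-2.}"      # part after "2.3.4-2."
            rel="${rel%%[!0-9]*}"    # keep the leading digits only
            if [ -z "$rel" ]; then
                echo "unknown"
            elif [ "$rel" -eq 39 ]; then
                echo "affected"
            elif [ "$rel" -ge 40 ]; then
                echo "fixed"
            else
                echo "unknown"       # older builds are not covered here
            fi ;;
        *) echo "unknown" ;;
    esac
}

# Usage on the node (rpm may list more than one glibc package on x86_64):
#   rpm -q --qf '%{VERSION}-%{RELEASE}\n' glibc | while read -r v; do
#       echo "$v: $(glibc_status "$v")"
#   done
```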

