Friday, August 7, 2009

Monitoring Oracle DB in a HACMP Active/Passive Cluster

I developed the following script for HACMP to run every 10 seconds to verify that the database is up and running:

#!/usr/bin/bash

DATE=`date +%d%m%y` # Date stamp (ddmmyy) used in the log file name
LOG=/home/hacmp/logs/sblmon_${DATE}.log # One log file per day

if [ -f /home/hacmp/lock ]; then # If a file named "lock" exists, skip all checks and report success
echo `date` >> $LOG
echo "==================================================" >> $LOG
echo "Oracle Monitoring Script Disabled" >> $LOG
exit 0
fi

export smon=`ps -ef | grep -v grep | grep -c ora_smon_db`
export lsnr=`ps -ef | grep -v grep | grep -c LISTENER`
export pmon=`ps -ef | grep -v grep | grep -c ora_pmon_db`
export lgwr=`ps -ef | grep -v grep | grep -c ora_lgwr_db`
export dbw0=`ps -ef | grep -v grep | grep -c ora_dbw0_db`

# The statement below runs the sql_check_rw file (in /oracle, the home directory of user oracle), which contains the SQL statement that checks the DB OPEN_MODE

export dbstatus=`su - oracle <<EOF
. .profile
./sql_check_rw
EOF`

dbstatus=$(echo ${dbstatus:1}) # To remove the first character (as the string is preceded by an extra special character)

echo `date` >> $LOG
echo "==================================================" >> $LOG


if [ $dbw0 -eq 0 ]; then
echo "dbw0 failed" >> $LOG
exit 1 # Process doesn't exist
elif [ $lsnr -eq 0 ]; then
echo "lsnr failed" >> $LOG
exit 1
elif [ $pmon -eq 0 ]; then
echo "pmon failed" >> $LOG
exit 1
elif [ $lgwr -eq 0 ]; then
echo "lgwr failed" >> $LOG
exit 1
elif [ $smon -eq 0 ]; then
echo "smon failed" >> $LOG
exit 1
elif [ "$dbstatus" != "READ WRITE" ]; then
echo "DB OPEN_MODE is not READ WRITE" >> $LOG
exit 1
else
echo "Success" >> $LOG
exit 0 # All monitored processes are running and database is in "READ WRITE" OPEN_MODE
fi
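
The if/elif chain in the script repeats the same check for each background process. As a minimal sketch of one possible variation (not part of the original script), the check could be folded into a helper. The function name first_missing and its arguments are hypothetical; the process names are the ones used above.

```shell
# first_missing PS_OUTPUT NAME...
# Hypothetical helper: prints the first NAME that does not appear anywhere
# in PS_OUTPUT and returns 1. Prints nothing and returns 0 when every NAME
# is present. Matching is by substring, like the grep calls in the script.
first_missing() {
    ps_output=$1
    shift
    for name in "$@"; do
        case $ps_output in
            *"$name"*) ;;                    # found, keep checking
            *) echo "$name"; return 1 ;;     # first missing process
        esac
    done
    return 0
}

# Possible usage, in the same order as the if/elif chain:
#   ps_output=`ps -ef | grep -v grep`
#   missing=`first_missing "$ps_output" ora_dbw0_db LISTENER ora_pmon_db ora_lgwr_db ora_smon_db`
#   if [ -n "$missing" ]; then echo "$missing failed" >> $LOG; exit 1; fi
```

Capturing the ps output once also means every process is checked against the same snapshot, rather than five separate ps runs.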

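The contents of sql_check_rw are not shown in this post. The following is only a sketch of what such a check might look like, assuming it runs as the oracle user with the Oracle environment already loaded (for example from .profile) and that OS authentication ("/ as sysdba") is allowed. The function name check_open_mode is hypothetical. It queries OPEN_MODE from v$database and strips blank lines and trailing spaces, which may make the first-character removal in the script unnecessary.

```shell
# check_open_mode: print the database OPEN_MODE (for example "READ WRITE").
# Assumption: sqlplus is on the PATH and ORACLE_HOME/ORACLE_SID are set.
# The quoted 'EOF' keeps the shell from expanding $database in the query.
check_open_mode() {
    sqlplus -s "/ as sysdba" <<'EOF' | sed -e '/^[[:space:]]*$/d' -e 's/[[:space:]]*$//'
set heading off feedback off pagesize 0
select open_mode from v$database;
exit
EOF
}

# Possible usage from the monitoring script (run as user oracle):
#   dbstatus=`check_open_mode`
#   [ "$dbstatus" = "READ WRITE" ] || exit 1
```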