Cluster failure reason


 
# 1  
Old 01-24-2014

Hi guys!
I'm a French IT student working on AIX, and I'm not very fluent in English.

I have a task: write a script to inform the administrator if one of the UCs in a cluster is not working. I'm not going to ask you for the script ^^'

But I want to make a list of the possible failure reasons for a cluster (network, process, ...).

If you have any links or answers, that would be great.
# 2  
Old 01-24-2014
Hi,

As far as I know, each node of a cluster is already monitored by the cluster software itself (Bull ARF, IBM HACMP). In case of a failure the software will inform you, at least with a mail to root. As a simple solution, you just have to forward this mail (e.g. use /etc/aliases).

Regards
# 3  
Old 01-24-2014
Yeah, actually the final aim is not to use the IBM cluster software (HACMP, PowerHA).
The aim is to detect a failure in one of the cluster UCs and switch manually to another UC.

That's why I'm making a list of all the possible reasons for a cluster UC failover.
# 4  
Old 01-24-2014
Okay,

- LAN (ping, default gateway, routes, ...)
- SAN (disk errors, failed paths, IO errors (lvm_io_fail), ..)
- rootvg (disk errors, mirroring)
- errpt (permanent hardware errors, ...)

Monitoring the errpt for permanent hardware errors is a good start :)

Regards
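
For the errpt item in the list above, here is a minimal Python sketch of what such a check could look like. It assumes Python 3 is available on the node and that `errpt` without flags prints its usual one-line-per-entry summary (IDENTIFIER, TIMESTAMP, T, C, RESOURCE_NAME, DESCRIPTION). The function names and the sample text are made up for illustration; they are not from any real error log.

```python
import subprocess


def parse_errpt_summary(text):
    """Parse errpt summary lines into dicts.

    Assumes the column order IDENTIFIER TIMESTAMP T C RESOURCE_NAME DESCRIPTION.
    """
    entries = []
    for line in text.splitlines():
        fields = line.split(None, 5)
        # Skip blank or short lines and the header line.
        if len(fields) < 6 or fields[0] == "IDENTIFIER":
            continue
        identifier, timestamp, err_type, err_class, resource, description = fields
        entries.append({
            "identifier": identifier,
            "timestamp": timestamp,
            "type": err_type,        # P = permanent, T = temporary, I = informational
            "class": err_class,      # H = hardware, S = software, O = operator
            "resource": resource,
            "description": description.strip(),
        })
    return entries


def permanent_hardware_errors(entries):
    """Keep only entries of type P (permanent) and class H (hardware)."""
    return [e for e in entries if e["type"] == "P" and e["class"] == "H"]


def read_errpt():
    """Run errpt on the local node and return its output (only works on AIX)."""
    result = subprocess.run(["errpt"], capture_output=True, text=True, check=True)
    return result.stdout


# Made-up sample in the assumed layout, for trying the parser off the node.
SAMPLE = """\
IDENTIFIER TIMESTAMP  T C RESOURCE_NAME  DESCRIPTION
B6267342   0124153014 P H hdisk1         DISK OPERATION ERROR
A6DF45AA   0124120014 I O RMCdaemon      The daemon is started.
"""
```

On the node itself you would call `permanent_hardware_errors(parse_errpt_summary(read_errpt()))` and alert the administrator if the list is not empty. errpt can also filter by type and class on its own (`-T PERM -d H`, as far as I remember; check the man page on your AIX level).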
# 5  
Old 01-24-2014
Thank you,

How can I check the gateway? The ping and traceroute commands don't specify the gateway.
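
One possible approach, as a rough sketch only: read the default route from the output of `netstat -rn` and then ping that address directly. The Python below assumes the default route appears as a line whose first field is `default` and whose second field is the gateway address, and that `ping -c 1` sends a single probe. The function names are hypothetical.

```python
import subprocess


def default_gateways(netstat_output):
    """Return the gateway column of every 'default' line in netstat -rn output."""
    gateways = []
    for line in netstat_output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "default":
            gateways.append(fields[1])
    return gateways


def read_routing_table():
    """Run netstat -rn locally and return its text output."""
    result = subprocess.run(["netstat", "-rn"], capture_output=True, text=True, check=True)
    return result.stdout


def ping_once(host, timeout_s=5):
    """Send one ping; True if the command exits with status 0 within the timeout."""
    try:
        result = subprocess.run(
            ["ping", "-c", "1", host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except (subprocess.TimeoutExpired, OSError):
        return False
    return result.returncode == 0
```

Usage would be something like `all(ping_once(gw) for gw in default_gateways(read_routing_table()))`. Note that some gateways are configured not to answer ICMP, so a failed ping alone does not prove the gateway is down.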

---------- Post updated at 03:41 PM ---------- Previous update was at 03:18 PM ----------

Do you know how IBM's "active dead gateway detection" works?
And what the format of its standard output is?
# 6  
Old 01-24-2014
Keep in mind that the server may not respond at all (frozen or crashed), in which case your monitoring also has to take into account an external point of view (can I ping and connect to...).
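
A minimal sketch of the "can I connect to..." part, meant to run on a separate monitoring host rather than on the node being watched. It is plain Python 3 with only the standard library; the function name, the host name and the port in the usage note are placeholders.

```python
import socket


def can_connect(host, port, timeout_s=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        # create_connection raises OSError (refused, unreachable, timed out) on failure.
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        return False
```

For example, `can_connect("node1.example.com", 22)` would tell you whether the SSH port of a hypothetical node still accepts connections. A frozen node often still answers ping while its services hang, which is why a connect test on the real service port is worth more than a ping alone.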
# 7  
Old 01-24-2014
I'd take a different starting point. Instead of looking for possible failures, I'd define which resources have to be up and running in order to say that the cluster node is OK. If you keep in mind that the final goal of a cluster is to guarantee the availability of a service, and not the detection of errors, your script may look a bit different, while the list of resources is pretty much what XrAy wrote.
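
To illustrate the idea, here is a small Python sketch, not a finished monitor: the node is described by a table of required resources, each with a check that returns true or false, and the node counts as OK only if every check passes. The resource names and the `evaluate_node` function are invented for the example.

```python
def evaluate_node(checks):
    """Run every resource check.

    checks maps a resource name to a zero-argument callable returning a truthy
    value when the resource is up. Returns (node_ok, list_of_failed_names).
    """
    failed = []
    for name, check in checks.items():
        try:
            ok = bool(check())
        except Exception:
            # A check that crashes is treated as a failed resource.
            ok = False
        if not ok:
            failed.append(name)
    return (not failed, failed)
```

A script built this way would call something like `evaluate_node({"lan": check_lan, "san": check_san, "rootvg": check_rootvg})`, where the three check functions are ones you write yourself, and notify the administrator with the list of failed names whenever the first value is false.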