Broken Centos Cluster
Posted by gull04 (post 302925481) in Operating Systems / Linux / Red Hat on Monday, 17 November 2014 at 12:15 PM

Hi Guys,

Hopefully this is just a quick one - but you never know.

I have (or had) a CentOS cluster running a NetBackup server. We've had an outage and seem to have lost a node, and since I'm not familiar with this software either, I'm in a bit of a quandary.

The server is a Dell PowerEdge 1950 running CentOS 5.4 with kernel 2.6.18-164.11.1.el5PAE #1 SMP and, wait for it, a back-ported GFS for compatibility.

I've managed to get the system back and the GFS disk mounted by hacking the /etc/cluster/cluster.conf file as follows; the original file is first.

Code:
<?xml version="1.0"?>
<cluster alias="scsymbak00" config_version="93" name="scsymbak00">
        <fence_daemon clean_start="0" post_fail_delay="1" post_join_delay="3"/>
        <clusternodes>
                <clusternode name="scsymbak01.xxx.com" nodeid="1" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="scsymbak01_drac"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="scsymbak02.xxx.com" nodeid="2" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="scsymbak02_drac"/>
                                </method>
                        </fence>
                </clusternode>
        </clusternodes>
        <cman expected_votes="1" two_node="1"/>
        <fencedevices>
                <fencedevice agent="fence_drac" ipaddr="192.168.0.201" login="root" name="scsymbak01_drac" passwd="drut"/>
                <fencedevice agent="fence_drac" ipaddr="192.168.0.202" login="root" name="scsymbak02_drac" passwd="drut"/>
        </fencedevices>
        <rm>
                <failoverdomains>
                        <failoverdomain name="scsymbak_fd" ordered="1" restricted="1">
                                <failoverdomainnode name="scsymbak01.xxx.com" priority="2"/>
                                <failoverdomainnode name="scsymbak02.xxx.com" priority="1"/>
                        </failoverdomain>
                </failoverdomains>
                <resources>
                        <ip address="10.143.252.200" monitor_link="1"/>
                        <script file="/etc/init.d/nbclient" name="nbclient_init"/>
                        <script file="/etc/init.d/netbackup" name="netbackup_init"/>
                        <clusterfs device="/dev/mapper/VolGroup10-DATA" force_unmount="1" fsid="41517" fstype="gfs2" mountpoint="/data" name="symbak_GFS"/>
                        <lvm lv_name="DATA" name="VolGroup10_DATA_CLVM2" vg_name="VolGroup10"/>
                        <script file="/etc/init.d/xinetd" name="xinetd_init"/>
                        <script file="/etc/init.d/vxpbx_exchanged" name="vxpbx_init"/>
                        <ip address="10.143.224.200" monitor_link="1"/>
                        <ip address="10.143.226.200" monitor_link="1"/>
                </resources>
                <service autostart="1" domain="scsymbak_fd" exclusive="0" name="netbackup_srv" recovery="restart">
                        <ip ref="10.143.224.200"/>
                        <ip ref="10.143.226.200"/>
                        <ip ref="10.143.252.200"/>
                        <script ref="vxpbx_init"/>
                        <script ref="xinetd_init"/>
                </service>
        </rm>
</cluster>

This was changed to:

Code:
<?xml version="1.0"?>
<cluster alias="scsymbak00" config_version="93" name="scsymbak00">
        <fence_daemon clean_start="0" post_fail_delay="1" post_join_delay="3"/>
        <clusternodes>
                <clusternode name="scsymbak02.xxx.com" nodeid="1" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="scsymbak02_drac"/>
                                </method>
                        </fence>
                </clusternode>
        </clusternodes>
        <cman expected_votes="1" two_node="0"/>
        <fencedevices>
                <fencedevice agent="fence_drac" ipaddr="192.168.0.202" login="root" name="scsymbak02_drac" passwd="drut"/>
        </fencedevices>
        <rm>
                <failoverdomains>
                        <failoverdomain name="scsymbak_fd" ordered="1" restricted="1">
                                <failoverdomainnode name="scsymbak02.xxx.com" priority="1"/>
                        </failoverdomain>
                </failoverdomains>
                <resources>
                        <ip address="10.143.252.200" monitor_link="1"/>
                        <script file="/etc/init.d/nbclient" name="nbclient_init"/>
                        <script file="/etc/init.d/netbackup" name="netbackup_init"/>
                        <clusterfs device="/dev/mapper/VolGroup10-DATA" force_unmount="1" fsid="41517" fstype="gfs2" mountpoint="/data" name="symbak_GFS"/>
                        <lvm lv_name="DATA" name="VolGroup10_DATA_CLVM2" vg_name="VolGroup10"/>
                        <script file="/etc/init.d/xinetd" name="xinetd_init"/>
                        <script file="/etc/init.d/vxpbx_exchanged" name="vxpbx_init"/>
                        <ip address="10.143.224.200" monitor_link="1"/>
                        <ip address="10.143.226.200" monitor_link="1"/>
                </resources>
                <service autostart="1" domain="scsymbak_fd" exclusive="0" name="netbackup_srv" recovery="restart">
                        <ip ref="10.143.224.200"/>
                        <ip ref="10.143.226.200"/>
                        <ip ref="10.143.252.200"/>
                        <script ref="vxpbx_init"/>
                        <script ref="xinetd_init"/>
                </service>
        </rm>
</cluster>
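
For reference, the bring-up order I believe applies to a single surviving node is sketched below. The init-script names are the stock CentOS 5 cluster-suite ones and should be verified on the box itself; the `config_version` check at the end is a quick way to confirm a hand edit will actually be picked up, since (as I understand it) the cluster stack ignores a config whose version number hasn't been incremented.

```shell
# Sketch of a single-node bring-up, assuming the stock CentOS 5
# cluster-suite init scripts (verify the names on the node itself):
#
#   service cman start        # cluster membership, quorum and fencing
#   service clvmd start       # clustered LVM, so VolGroup10 is visible
#   service gfs2 start        # mount the GFS2 filesystem(s)
#   service rgmanager start   # resource groups, e.g. netbackup_srv
#
# After a hand edit, bump config_version and confirm it took; shown
# here against an inline sample line rather than the live file:
sample='<cluster alias="scsymbak00" config_version="94" name="scsymbak00">'
version=$(printf '%s\n' "$sample" | grep -o 'config_version="[0-9]*"')
echo "$version"    # prints: config_version="94"
```

On the real system the same grep can be pointed at /etc/cluster/cluster.conf instead of the sample line.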

When I run clustat I see the following.

Code:
Cluster Status for scsymbak00 @ Mon Nov 17 16:55:28 2014
Member Status: Quorate

 Member Name                                                     ID   Status
 ------ ----                                                     ---- ------
 scsymbak02.xxx.com                                            1 Online, Local, rgmanager

 Service Name                                                     Owner (Last)                                                     State
 ------- ----                                                     ----- ------                                                     -----
 service:netbackup_srv                                            (none)                                                           stopped
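
Since the node is quorate but the service shows as stopped, my guess is that rgmanager just needs to be told to enable it. Below is a dry-run sketch of the commands I have in mind, using the service and node names from the clustat output above; the clusvcadm flags are the standard rgmanager ones, but check `man clusvcadm` before running anything.

```shell
# Dry run: print the rgmanager commands rather than execute them, so
# they can be checked first. Names are taken from the clustat output.
svc=netbackup_srv
node=scsymbak02.xxx.com
echo "clusvcadm -e $svc -m $node   # enable the service on this node"
echo "clustat                      # confirm it comes up 'started'"
```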

Although the disks have come back, the cluster doesn't seem to be fully up. Is there anything else I should be looking at? The networking hasn't started properly either, as I'm not seeing the clustered IPs, so here is the output of ifconfig.

Code:
[root@scsymbak02 cluster]# ifconfig -a
bond0     Link encap:Ethernet  HWaddr 00:1E:C9:AB:BB:11
          inet addr:10.143.252.202  Bcast:10.143.253.255  Mask:255.255.254.0
          inet6 addr: fe80::21e:c9ff:feab:bb11/64 Scope:Link
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
          RX packets:41863 errors:0 dropped:22325 overruns:0 frame:0
          TX packets:47278 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:3704818 (3.5 MiB)  TX bytes:32203049 (30.7 MiB)

bond0:1   Link encap:Ethernet  HWaddr 00:1E:C9:AB:BB:11
          inet addr:192.168.0.102  Bcast:192.168.0.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1

bond1     Link encap:Ethernet  HWaddr 00:1B:21:18:29:68
          inet addr:10.143.224.202  Bcast:10.143.225.255  Mask:255.255.254.0
          inet6 addr: fe80::21b:21ff:fe18:2968/64 Scope:Link
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
          RX packets:9768 errors:0 dropped:0 overruns:0 frame:0
          TX packets:14251 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:694596 (678.3 KiB)  TX bytes:847554 (827.6 KiB)

bond2     Link encap:Ethernet  HWaddr 00:1B:21:18:29:69
          inet addr:10.143.226.202  Bcast:10.143.227.255  Mask:255.255.254.0
          UP BROADCAST MASTER MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)

eth0      Link encap:Ethernet  HWaddr 00:1E:C9:AB:BB:11
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:30568 errors:0 dropped:11059 overruns:0 frame:0
          TX packets:21803 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:2808757 (2.6 MiB)  TX bytes:11663865 (11.1 MiB)
          Interrupt:177 Memory:f8000000-f8012800

eth1      Link encap:Ethernet  HWaddr 00:1E:C9:AB:BB:13
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:11295 errors:0 dropped:11266 overruns:0 frame:0
          TX packets:25475 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:896061 (875.0 KiB)  TX bytes:20539184 (19.5 MiB)
          Interrupt:169 Memory:f4000000-f4012800

eth2      Link encap:Ethernet  HWaddr 00:1B:21:18:29:68
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:4758 errors:0 dropped:0 overruns:0 frame:0
          TX packets:7181 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:339738 (331.7 KiB)  TX bytes:426270 (416.2 KiB)
          Memory:fd2e0000-fd300000

eth3      Link encap:Ethernet  HWaddr 00:1B:21:18:29:69
          UP BROADCAST SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
          Memory:fd2a0000-fd2c0000

eth4      Link encap:Ethernet  HWaddr 00:1B:21:18:29:6C
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:5010 errors:0 dropped:0 overruns:0 frame:0
          TX packets:7070 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:354858 (346.5 KiB)  TX bytes:421284 (411.4 KiB)
          Memory:fcce0000-fcd00000

eth5      Link encap:Ethernet  HWaddr 00:1B:21:18:29:6D
          UP BROADCAST SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
          Memory:fcca0000-fccc0000

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:7247 errors:0 dropped:0 overruns:0 frame:0
          TX packets:7247 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:971488 (948.7 KiB)  TX bytes:971488 (948.7 KiB)

sit0      Link encap:IPv6-in-IPv4
          NOARP  MTU:1480  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
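
One caveat about the missing addresses: as I understand it, the rgmanager `ip` resource adds service addresses as unlabelled secondary IPs, which ifconfig does not display but `ip addr` does. So their absence above may simply mean the service hasn't started, and once it has, the VIPs are better checked with `ip` than with ifconfig. A dry-run sketch using the three addresses from cluster.conf:

```shell
# Dry run: print the checks rather than execute them. The three VIPs
# come from the <ip> resources in cluster.conf above.
for vip in 10.143.224.200 10.143.226.200 10.143.252.200; do
    echo "ip addr show | grep -F $vip   # visible only once rgmanager holds the VIP"
done
```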

So I guess the question is: if I want to boot this cluster as a single node, how should I go about it? Are there any other changes I should make to the cluster.conf file, or any other files I should be changing as well? Any help here would be really appreciated.

Unfortunately I have a dentist's appointment, but I'll be back online a little later, though out of the office. If there are any other files I have to look at or change, I'll do that first thing in the morning.

Regards

Dave