rg_move: Failure occurred


 
# 1  
Old 10-25-2010

Hi,

I have a two-node cluster with 5 resource groups (RGs): 3 of them on node2 and 2 on node1.

Cluster verification completes without errors.

When I try to move an RG or take it offline, I get this error for every RG:

Code:
rg_move: Failure occurred while processing Resource Group RG_XXX. Manual intervention required.

RG_XXX then goes into the ERROR state and the cluster becomes UNSTABLE.

I tried unmounting everything, varying off the VGs, and running smit hacmp -> Problem Determination Tools -> Recover From HACMP Script Failure, but it didn't help.

I had to reboot both nodes to get the cluster back to the STABLE state.

After that I started all RGs successfully.

Then I manually ran the stop script for RG_XXX, and all applications stopped successfully. I then unmounted the file systems, ran varyoffvg, and tried to take RG_XXX offline via HACMP, but I got the same error as above.

Could you tell me what might prevent every RG from being taken offline or moved to another node?
Could you give me some hints on where to look or what to check, please?

Thank you.
# 2  
Old 10-25-2010
Did you check whether /tmp/hacmp.out (if that is still the current path) contains more detailed information about the actions you issued?
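
A minimal sketch of how one could pull the lines around each ERROR banner out of a copy of hacmp.out, instead of paging through the whole file. It assumes the banner looks like the "!!!!!!!!!! ERROR !!!!!!!!!!" line HACMP prints; the function name and the window sizes are only illustrative, not part of any HACMP tooling.

```python
import re

# Assumption: the error banner is a run of '!' on both sides of the word ERROR.
BANNER = re.compile(r"!+ ERROR !+")


def error_context(lines, before=5, after=3):
    """Return (line_number, text) pairs for the lines around each ERROR banner."""
    result = []
    for i, line in enumerate(lines):
        if BANNER.search(line):
            start = max(0, i - before)
            end = min(len(lines), i + after + 1)
            for n in range(start, end):
                # Line numbers are 1-based, as an editor would show them.
                result.append((n + 1, lines[n].rstrip("\n")))
    return result
```

To use it, read the log into a list with open(path).readlines(), pass that list to error_context(), and print the returned pairs.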
# 3  
Old 10-25-2010
I got this from hacmp.out. Everything above the ERROR banner completed with exit code 0.

Code:
Calling /HACMP_V5/events/after_node_down_local with parameters: node_down_local 1
Oct 25 11:10:51 EVENT COMPLETED: after_node_down_local 0

Oct 25 11:10:51 EVENT COMPLETED: node_down_local 0



***************************
Oct 25 2010 11:10:51 !!!!!!!!!! ERROR !!!!!!!!!!
***************************
Oct 25 2010 11:10:51 rg_move: Failure occurred while processing Resource Group RG_XXX. Manual intervention required.

Oct 25 11:10:52 POST EVENT COMMAND: after_rg_move node2 2 RELEASE

Calling /HACMP_V5/events/after_rg_move with parameters: rg_move 1 node2 2 RELEASE
Oct 25 11:10:52 EVENT COMPLETED: after_rg_move node2 2 RELEASE 0

Oct 25 11:10:52 EVENT COMPLETED: rg_move node2 2 RELEASE 0


Oct 25 11:10:52 POST EVENT COMMAND: after_rg_move_release node2 2

Calling /HACMP_V5/events/after_rg_move_release with parameters: rg_move_release 1 node2 2
Oct 25 11:10:52 EVENT COMPLETED: after_rg_move_release node2 2 0

Oct 25 11:10:52 EVENT COMPLETED: rg_move_release node2 2 0

                        HACMP Event Summary
Event: TE_RG_MOVE
Start time: Mon Oct 25 11:10:27 2010

End time: Mon Oct 25 11:10:52 2010

Action:         Resource:                       Script Name:
----------------------------------------------------------------------------
Releasing resource group:       RG_XXX      node_down_local
Search on: Mon.Oct.25.11:10:28.MESZ.2010.node_down_local.RG_XXX.ref
Releasing resource:     All_servers     stop_server
Search on: Mon.Oct.25.11:10:29.MESZ.2010.stop_server.All_servers.RG_XXX.ref
Resource offline:       All_nonerror_servers    stop_server
Search on: Mon.Oct.25.11:10:49.MESZ.2010.stop_server.All_nonerror_servers.RG_XXX.ref
Releasing resource:     All_service_addrs       release_service_addr
Search on: Mon.Oct.25.11:10:50.MESZ.2010.release_service_addr.All_service_addrs.RG_XXX.ref
Resource offline:       All_nonerror_service_addrs      release_service_addr
Search on: Mon.Oct.25.11:10:51.MESZ.2010.release_service_addr.All_nonerror_service_addrs.RG_XXX.ref
Error encountered with group:   RG_XXX      node_down_local
Search on: Mon.Oct.25.11:10:51.MESZ.2010.node_down_local.RG_XXX.ref
----------------------------------------------------------------------------

Oct 25 11:10:52 EVENT START: event_error 1 TE_RG_MOVE


Oct 25 11:10:52 PRE EVENT COMMAND: before_event_error 1 TE_RG_MOVE

Calling /HACMP_V5/events/before_event_error with parameters: event_error 1 TE_RG_MOVE
Oct 25 11:10:52 EVENT COMPLETED: before_event_error 1 TE_RG_MOVE 0

WARNING: Cluster node0102 Failed while running event [RG], exit status was 1
Check hacmp.out on this node for errors.
FFDC event log collection saved to /tmp/ibmsupt/hacmp/eventlogs.2010.10.25.11.10

Oct 25 11:10:52 POST EVENT COMMAND: after_event_error 1 TE_RG_MOVE

Calling /HACMP_V5/events/after_event_error with parameters: event_error 0 1 TE_RG_MOVE
Oct 25 11:10:52 EVENT COMPLETED: after_event_error 1 TE_RG_MOVE 0

Oct 25 11:10:52 EVENT COMPLETED: event_error 1 TE_RG_MOVE 0


Oct 25 11:16:27 EVENT START: config_too_long 360 TE_RG_MOVE


Oct 25 11:16:28 PRE EVENT COMMAND: before_config_too_long 360 TE_RG_MOVE

Calling /HACMP_V5/events/before_config_too_long with parameters: config_too_long 360 TE_RG_MOVE
Oct 25 11:16:28 EVENT COMPLETED: before_config_too_long 360 TE_RG_MOVE 0

FFDC event log collection saved to /tmp/ibmsupt/hacmp/eventlogs.2010.10.25.11.16
WARNING: Cluster node0102 has been running recovery program 'TE_RG_MOVE' for 360 seconds. Please check cluster status.
WARNING: Cluster node0102 has been running recovery program 'TE_RG_MOVE' for 390 seconds. Please check cluster status.
WARNING: Cluster node0102 has been running recovery program 'TE_RG_MOVE' for 420 seconds. Please check cluster status.
WARNING: Cluster node0102 has been running recovery program 'TE_RG_MOVE' for 450 seconds. Please check cluster status.
WARNING: Cluster node0102 has been running recovery program 'TE_RG_MOVE' for 480 seconds. Please check cluster status.
WARNING: Cluster node0102 has been running recovery program 'TE_RG_MOVE' for 540 seconds. Please check cluster status.
WARNING: Cluster node0102 has been running recovery program 'TE_RG_MOVE' for 600 seconds. Please check cluster status.
WARNING: Cluster node0102 has been running recovery program 'TE_RG_MOVE' for 660 seconds. Please check cluster status.
WARNING: Cluster node0102 has been running recovery program 'TE_RG_MOVE' for 720 seconds. Please check cluster status.
WARNING: Cluster node0102 has been running recovery program 'TE_RG_MOVE' for 780 seconds. Please check cluster status.
WARNING: Cluster node0102 has been running recovery program 'TE_RG_MOVE' for 900 seconds. Please check cluster status.

The first line, "Calling /HACMP_V5/events/after_node_down_local with parameters: node_down_local 1", shows that it was run with parameter 1. Is that OK?

Thank you for support.
# 4  
Old 10-26-2010
I think that "node_down_local 1" is just an intermediate status while the RG is being brought down, i.e. not up yet. It does not look like the problem to me, since that event completed with a parameter or status of 0 in the very next lines.
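
To check that reading across the whole log rather than by eye, something like the following sketch could list every "EVENT COMPLETED" line with its trailing number. It assumes that the last token on such a line is the exit status, which matches the excerpt above but is not confirmed by documentation here; the function names are made up for this example.

```python
import re

# Assumption: "EVENT COMPLETED: <event> [args...] <exit status>" with the status last.
COMPLETED = re.compile(
    r"EVENT COMPLETED: (?P<event>\S+)(?P<args>.*?) (?P<rc>-?\d+)\s*$"
)


def completed_events(lines):
    """Return (event, args, exit_status) for every EVENT COMPLETED line."""
    found = []
    for line in lines:
        match = COMPLETED.search(line)
        if match:
            found.append(
                (match.group("event"), match.group("args").strip(), int(match.group("rc")))
            )
    return found


def failed_events(lines):
    """Return only the events whose trailing status is non-zero."""
    return [entry for entry in completed_events(lines) if entry[2] != 0]
```

If failed_events() comes back empty, as it would for the excerpt posted here, the failure was reported somewhere other than an event's final status line.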

While the cluster is in an undefined state, I would check which VGs are still active and which adapters are still up (you should know which network and disk resources are part of RG_XXX) to get a clue about which of them might have a problem. Also check whether the scripts you run for that RG have written a log somewhere, if they log at all.
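
The Event Summary in the posted hacmp.out can help narrow this down, because every action line is followed by a "Search on:" key. Here is a rough sketch that pairs the two and then looks for the key of the "Error encountered" entry elsewhere in the log. That the key also appears in the detailed part of hacmp.out is an assumption taken from the "Search on:" wording, and all function names are hypothetical.

```python
def summary_entries(lines):
    """Pair each summary action line with the 'Search on:' key that follows it."""
    entries = []
    previous = None
    for raw in lines:
        line = raw.strip()
        if line.startswith("Search on:") and previous:
            entries.append((previous, line[len("Search on:"):].strip()))
            previous = None
        elif line:
            previous = line
    return entries


def error_keys(lines):
    """Return the search keys of summary entries that report an error."""
    return [
        key
        for action, key in summary_entries(lines)
        if action.startswith("Error encountered")
    ]


def find_key(lines, key):
    """Return 1-based numbers of lines containing the key, skipping the summary itself."""
    return [
        number
        for number, line in enumerate(lines, 1)
        if key in line and not line.strip().startswith("Search on:")
    ]
```

Run error_keys() on the log first, then pass each key to find_key() to get the line numbers worth reading in detail.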
There are also two warnings pointing to FFDC event log files under /tmp/ibmsupt/hacmp, which could be investigated with one of the fc* commands in /usr/sbin/rsct/bin, such as fcreport. You may have to check the documentation for that; search for "aix ffdc" and you will find IBM documentation sites.

Is there anything related in errpt that might be helpful?

Did this cluster work when it was originally tested? Have there been hardware or software changes since then without a subsequent cluster test?

Which version of HA/CMP are you using?

Sadly, hacmp.out does not show the detailed information I had hoped to see. You can check and set its verbosity via
smitty hacmp -> Problem Determination -> HACMP Log Viewing and Management -> Change/Show HACMP Log File Parameters -> Select your node ...

This is the path on an HA/CMP 5.3 cluster node. Later versions should have a similar path, from what I have seen.
# 5  
Old 10-27-2010
Thanks for helping.
I will try to do what you suggested above.