Can HPOV monitor server hung state?


 
#8, 06-27-2019
You must develop your own detection algorithms.

Nothing is difficult if you put your mind to it.

Your brain has roughly 100 billion neurons - about as many as there are stars in our galaxy.

Algorithms are not difficult!

Just do it.


10 More Discussions You Might Find Interesting

1. UNIX for Dummies Questions & Answers

Flow control state changed in server logs

Hi, we observe the below logs from the switch - the database servers rebooted because they couldn't do I/O on the vfiler. Any pointers on the logs below, please? Switch logs: 2016 Apr 30 07:41:16.729 EAG-ECOM-POD111GPU-SWF1 %ETHPORT-5-IF_DOWN_LINK_FAILURE: Interface Ethernet152/1/8 is down (Link... (0 Replies)
Discussion started by: admin_db

2. Linux

Server hung, is this a stack trace?

Hi everyone, our Red Hat server hung yesterday, and I managed to log into the console and see the following message: RIP: 0010: mwait_idle_with_hints+0x66/0x67 RSP: 0018:ffffffff80457f40 EFLAGS: 00000046 RAX: 0000000000000010 RBX: ffff810c20075910 RCX: 0000000000000001 RDX:... (6 Replies)
Discussion started by: badoshi

3. Red Hat

How to find the process that caused a system hung state?

When the system is in a hung state due to swap, we reboot it through iLO. I want to know which process caused the system to hang. (1 Reply)
Discussion started by: Naveen.6025

4. SuSE

Server hung with firmware error

Hi all, we've had an issue over the weekend where one of the SUSE Linux Enterprise Server 11 machines hung and had to be rebooted. The thing is that I got the ticket alert for a FS exceeding its usage at about 22:41:49 on 23 March. I checked the dmesg, the messages log and the boot.msg but all I found... (1 Reply)
Discussion started by: hedkandi

5. Shell Programming and Scripting

How can I get the state of a WebLogic server?

Hi all, I want to write a shell script to get the state of a WebLogic server, and when the Managed Server's state is not OK after 3 checks, I will send a message to the system administrator by SMS. BTW, my environment is: Linux, Red Hat 5.4 64-bit; WebLogic version: 10.3.3; the count number... (1 Reply)
Discussion started by: wangsk

6. Solaris

HPOV in Solaris

Good day all, I have a Solaris 8 server and I want the procedure for installing HPOV, because I don't have much info about Solaris. (1 Reply)
Discussion started by: thecobra151

7. Solaris

How to clear the maintenance state for an apache2 server?

Can any one of you suggest a method to bring the Apache server online from maintenance mode? I tried the following, but couldn't get the service online. bash-3.00# svcs -a | grep apache legacy_run 9:51:55 lrc:/etc/rc3_d/S50apache offline 9:51:22... (3 Replies)
Discussion started by: Sesha

8. HP-UX

SSH session getting hung (similar to "HP-UX telnet session is getting hung after about 15 minutes")

Our network administrators implemented some sort of check to kill idle sessions, and now the burden is on us to run some sort of keep-alive. Client-based keep-alive doesn't do a very good job. I have the same issue with SSH. Does solution 2 provided above apply to SSH sessions also? (1 Reply)
Discussion started by: yoda9691

9. UNIX for Advanced & Expert Users

close_wait connections causing a server to hang

Hi guys, just wondering if any of you have been in a situation where you end up having around 100 CLOSE_WAIT connections; it seems to me those connections are locking up resources/processes in the server, so unless the server is rebooted those processes won't be released by the close_wait... (3 Replies)
Discussion started by: hariza

10. HP-UX

Server hung

So my server was hung when I came in this morning. It was responding to pings, but the console and telnet sessions would not respond. There was no disk activity. The display said FA1F; I discovered that the "A" represents a high CPU load. I tried several things to get it going but was forced... (6 Replies)
Discussion started by: biznatch
scdpm(1M)						  System Administration Commands						 scdpm(1M)

NAME
     scdpm - manage disk path monitoring daemon

SYNOPSIS
     scdpm [-a] {node | all}

     scdpm -f filename

     scdpm -m {[node | all][:/dev/did/rdsk/]dN | [:/dev/rdsk/]cNtXdY | all}

     scdpm -n {node | all}

     scdpm -p [-F] {[node | all][:/dev/did/rdsk/]dN | [/dev/rdsk/]cNtXdY | all}

     scdpm -u {[node | all][:/dev/did/rdsk/]dN | [/dev/rdsk/]cNtXdY | all}

DESCRIPTION
     Note - Beginning with the Sun Cluster 3.2 release, Sun Cluster software
     includes an object-oriented command set. Although Sun Cluster software
     still supports the original command set, Sun Cluster procedural
     documentation uses only the object-oriented command set. For more
     information about the object-oriented command set, see the Intro(1CL)
     man page.

     The scdpm command manages the disk path monitoring daemon in a cluster.
     You use this command to monitor and unmonitor disk paths. You can also
     use this command to display the status of disk paths or nodes. All of
     the accessible disk paths in the cluster or on a specific node are
     printed on the standard output. You must run this command on a cluster
     node that is online and in cluster mode.

     You can specify either a global disk name or a UNIX path name when you
     monitor a new disk path. Additionally, you can force the daemon to
     reread the entire disk configuration.

     You can use this command only in the global zone.

OPTIONS
     The following options are supported:

     -a
         Enables the automatic rebooting of a node when all monitored disk
         paths fail, provided that the following conditions are met:

         o   All monitored disk paths on the node fail.

         o   At least one of the disks is accessible from a different node
             in the cluster.

         You can use this option only in the global zone.

         Rebooting the node restarts all resource and device groups that
         are mastered on that node on another node.

         If all monitored disk paths on a node remain inaccessible after
         the node automatically reboots, the node does not automatically
         reboot again. However, if any monitored disk paths become
         available after the node reboots but then all monitored disk paths
         again fail, the node automatically reboots again.

         You need solaris.cluster.device.admin role-based access control
         (RBAC) authorization to use this option. See rbac(5).

     -F
         If you specify the -F option with the -p option, scdpm also prints
         the faulty disk paths in the cluster. The -p option prints the
         current status of a node or a specified disk path from all the
         nodes that are attached to the storage.

     -f filename
         Reads a list of disk paths to monitor or unmonitor in filename.

         You can use this option only in the global zone.

         The following example shows the contents of filename.

             u schost-1:/dev/did/rdsk/d5
             m schost-2:all

         Each line in the file must specify whether to monitor or unmonitor
         the disk path, the node name, and the disk path name. You specify
         the m option for monitor and the u option for unmonitor. You must
         insert a space between the command and the node name. You must
         also insert a colon (:) between the node name and the disk path
         name.

         You need solaris.cluster.device.admin RBAC authorization to use
         this option. See rbac(5).

     -m
         Monitors the new disk path that is specified by node:diskpath.

         You can use this option only in the global zone.

         You need solaris.cluster.device.admin RBAC authorization to use
         this option. See rbac(5).

     -n
         Disables the automatic rebooting of a node when all monitored disk
         paths fail.

         You can use this option only in the global zone.

         If all monitored disk paths on the node fail, the node is not
         rebooted.

         You need solaris.cluster.device.admin RBAC authorization to use
         this option. See rbac(5).

     -p
         Prints the current status of a node or a specified disk path from
         all the nodes that are attached to the storage.

         You can use this option only in the global zone.

         If you also specify the -F option, scdpm prints the faulty disk
         paths in the cluster.

         Valid status values for a disk path are Ok, Fail, Unmonitored, or
         Unknown.

         The valid status value for a node is Reboot_on_disk_failure. See
         the description of the -a and the -n options for more information
         about the Reboot_on_disk_failure status.

         You need solaris.cluster.device.read RBAC authorization to use
         this option. See rbac(5).

     -u
         Unmonitors a disk path. The daemon on each node stops monitoring
         the specified path.

         You can use this option only in the global zone.

         You need solaris.cluster.device.admin RBAC authorization to use
         this option. See rbac(5).

EXAMPLES
     Example 1 Monitoring All Disk Paths in the Cluster Infrastructure

     The following command forces the daemon to monitor all disk paths in
     the cluster infrastructure.

         # scdpm -m all

     Example 2 Monitoring a New Disk Path

     The following command monitors a new disk path. All nodes monitor
     /dev/did/dsk/d3 where this path is valid.

         # scdpm -m /dev/did/dsk/d3

     Example 3 Monitoring New Disk Paths on a Single Node

     The following command monitors new paths on a single node. The daemon
     on the schost-2 node monitors paths to the /dev/did/dsk/d4 and
     /dev/did/dsk/d5 disks.

         # scdpm -m schost-2:d4 -m schost-2:d5

     Example 4 Printing All Disk Paths and Their Status

     The following command prints all disk paths in the cluster and their
     status.

         # scdpm -p
         schost-1:reboot_on_disk_failure    enabled
         schost-2:reboot_on_disk_failure    disabled
         schost-1:/dev/did/dsk/d4           Ok
         schost-1:/dev/did/dsk/d3           Ok
         schost-2:/dev/did/dsk/d4           Fail
         schost-2:/dev/did/dsk/d3           Ok
         schost-2:/dev/did/dsk/d5           Unmonitored
         schost-2:/dev/did/dsk/d6           Ok

     Example 5 Printing All Failed Disk Paths

     The following command prints all of the failed disk paths on the
     schost-2 node.

         # scdpm -p -F all
         schost-2:/dev/did/dsk/d4           Fail

     Example 6 Printing the Status of All Disk Paths From a Single Node

     The following command prints the disk path and the status of all disks
     that are monitored on the schost-2 node.

         # scdpm -p schost-2:all
         schost-2:reboot_on_disk_failure    disabled
         schost-2:/dev/did/dsk/d4           Fail
         schost-2:/dev/did/dsk/d3           Ok

EXIT STATUS
     The following exit values are returned:

     0    The command completed successfully.

     1    The command failed completely.

     2    The command failed partially.

     Note - The disk path is represented by a node name and a disk name.
     The node name must be the host name or all. The disk name must be the
     global disk name, a UNIX path name, or all. The disk name can be
     either the full global path name or the disk name: /dev/did/dsk/d3 or
     d3. The disk name can also be the full UNIX path name:
     /dev/rdsk/c0t0d0s0.

     Disk path status changes are logged with the syslogd LOG_INFO facility
     level. All failures are logged with the LOG_ERR facility level.

ATTRIBUTES
     See attributes(5) for descriptions of the following attributes:

     +-----------------------------+-----------------------------+
     |       ATTRIBUTE TYPE        |       ATTRIBUTE VALUE       |
     +-----------------------------+-----------------------------+
     |Availability                 |SUNWsczu                     |
     +-----------------------------+-----------------------------+
     |Stability                    |Evolving                     |
     +-----------------------------+-----------------------------+

SEE ALSO
     Intro(1CL), cldevice(1CL), clnode(1CL), attributes(5)

     Sun Cluster System Administration Guide for Solaris OS

Sun Cluster 3.2                   22 Jun 2006                        scdpm(1M)