Full Discussion: MPIO reliability (Operating Systems: AIX)
Post 302491948 by kah00na, 01-28-2011 05:30 PM
UPDATE: Sorry, the hcheck_interval idea was already mentioned by smurphy; I should have read on to page 2 before replying.

One other thing to check is the "hcheck_interval" attribute, which is set at the disk level. It tells the system how often to check, or re-check, FAILED paths and inactive ENABLED paths (in the case of "algorithm" being set to "fail_over") to make sure they are still connected and functioning. I suggest setting hcheck_interval to 3600 (once an hour). You'll have to set this on each of your disks individually. If hcheck_interval is set to "0", health checking is disabled and the disk will never automatically change out of a FAILED or MISSING state.
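
Since the attribute is per disk, a quick loop over the lspv output can apply it everywhere. This is only a sketch, assuming all of your hdisks are MPIO devices that accept this attribute; the -P flag stores the change in the ODM so it takes effect at the next reboot, which is needed while a disk is in use (for example, part of an active volume group).
Code:
# Sketch: set hcheck_interval=3600 on every hdisk.
# Assumes every hdisk is an MPIO device that supports this attribute.
# -P defers the change to the next reboot, needed while the disk is open.
for d in $(lspv | awk '{print $1}'); do
    chdev -l $d -a hcheck_interval=3600 -P
done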

Remember that MPIO is not like an etherchannel, which automatically re-enables its links as soon as the plug is back in. Something has to happen on the disk side to make it recheck the paths: either the hcheck_interval comes around again, or you unplug your secondary fiber cable, which causes AIX to suddenly start sending checks for all your disks down all the paths, FAILED or MISSING, trying to find one that works; if it finds a working path, it sets it back to ENABLED.

Code:
hostname:/:$ lsattr -El hdisk0 | egrep "hcheck_interval"
hcheck_interval 3600                             Health Check Interval      True
hostname:/:$

Also, you can re-enable a path manually by doing a chpath on it:
Code:
chpath -l hdisk0 -p vscsi0 -s enable
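
If you want to confirm the state of each path before and after that, lspath lists every path with its status (Enabled, Failed, or Missing). A minimal example, assuming hdisk0 has paths through vscsi0 and vscsi1 (your parent devices and statuses will differ):
Code:
hostname:/:$ lspath -l hdisk0
Enabled hdisk0 vscsi0
Failed  hdisk0 vscsi1
hostname:/:$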

You can also see which path is being used by watching for numbers increasing in the output of "iostat -m":
Code:
hostname:/:$ iostat -m hdisk0

System configuration: lcpu=4 drives=7 ent=0.20 paths=10 vdisks=2

tty:      tin         tout    avg-cpu: % user % sys % idle % iowait physc % entc
          0.0         10.6                0.9   0.5   98.3      0.3   0.0    1.6

Disks:        % tm_act     Kbps      tps    Kb_read   Kb_wrtn
hdisk0           0.3      46.3       3.7   180755051  55682968

Paths:        % tm_act     Kbps      tps    Kb_read   Kb_wrtn
Path1            0.0       0.0       0.0          0         0
Path0            0.3      46.3       3.7   180755051  55682968
hostname:/:$
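
Because those Kb_read/Kb_wrtn counters are cumulative since boot, it can be easier to spot which path is actually carrying traffic by giving iostat an interval and count, so each report shows only that interval's activity. A sketch, sampling hdisk0 every 5 seconds for three reports (adjust the disk name and interval as needed):
Code:
# Sample hdisk0 every 5 seconds, 3 reports; the per-interval Path lines
# show which path is moving data right now.
iostat -m hdisk0 5 3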

 
