MPIO reliability

01-14-2011

Registered User

6, 1

Join Date: Dec 2010

Last Activity: 10 May 2011, 9:46 AM EDT

Posts: 6

Thanks Given: 0

Thanked 1 Time in 1 Post

settings you need implemented

My company have encountered similar problems, and we have seen that some settings need to be set.

Here are the settings which have to be implemented.

Each child fibre device (fscsiX) has to have the following two modes set:

Code:

chdev -l fscsiX -a dyntrk=yes -a fc_err_recov=fast_fail

Additionally, every hdisk device needs to be changed (which I didn't see mentioned in the post).

Code:

chdev -l hdiskX -a reserve_policy=no_reserve

Lastly, you may want to check that the hcheck_interval is NOT set to 0, as then it won't check at all. Usual recommendation is to set to 30 (but 10 should be sufficient).

Code:

chdev -l hdiskX -a hcheck_interval=10

Moderator's Comments:

Use code tags please.

Last edited by zaxxon; 01-17-2011 at 12:02 PM.. Reason: code tags

smurphy

View Public Profile for smurphy

Find all posts by smurphy

01-28-2011

Registered User

301, 28

Join Date: Jul 2007

Last Activity: 28 April 2020, 9:45 AM EDT

Location: Kansas

Posts: 301

Thanks Given: 21

Thanked 28 Times in 21 Posts

UPDATE: Sorry: The hcheck_interval idea was already mentioned by smurphy. I should have moved on to page 2.

One other thing to check is your "hcheck_interval" which is set at the disk level. The hcheck_interval tells your system how often to check, or re-check, FAILED paths and inactive ENABLED paths (in the case of "algorithm" being set to "fail_over") to ensure they are still connected and functioning. I suggest setting your hcheck_interval to 3600 (once an hour). You'll have to set this on all your disks individually. If the hcheck_interval is set to "0", then this disables it and the disk will never automatically change out of a FAILED or MISSING state.

Remember that MPIO is not like etherchannels, where it automatically re-enables all the paths as soon as the plug is back in. Something has to occur on the disk side to make it recheck them. Either the hcheck_interval comes around again, or you unplug your secondary fiber car which will cause AIX to suddenly start sending checks for all your disks down all the paths, FAILED or MISSING, and try to find a path that is working and it will set it back to ENABLED if it finds one.

Code:

hostname:/:$ lsattr -El hdisk0 | egrep "hcheck_interval"
hcheck_interval 3600                             Health Check Interval      True
hostname:/:$

Also, you can re-enable the paths manually by doing a chdev on it:

Code:

chdev -l hdisk0 -p vscsi0 -s enable

You can also see which path is being used by watching for numbers increasing in the output of "iostat -m":

Code:

hostname:/:$ iostat -m hdisk0

System configuration: lcpu=4 drives=7 ent=0.20 paths=10 vdisks=2

tty:      tin         tout    avg-cpu: % user % sys % idle % iowait physc % entc
          0.0         10.6                0.9   0.5   98.3      0.3   0.0    1.6

Disks:        % tm_act     Kbps      tps    Kb_read   Kb_wrtn
hdisk0           0.3      46.3       3.7   180755051  55682968

Paths:        % tm_act     Kbps      tps    Kb_read   Kb_wrtn
Path1            0.0       0.0       0.0          0         0
Path0            0.3      46.3       3.7   180755051  55682968
hostname:/:$

kah00na

View Public Profile for kah00na

Find all posts by kah00na

AIX

MPIO reliability

8 More Discussions You Might Find Interesting

1. AIX

DISK and MPIO

Discussion started by: Phat

2. AIX

Need Help with SDD / SDDPCM / MPIO

Discussion started by: filosophizer

3. Solaris

Reasons for NOT using LDOMs? reliability?

Discussion started by: User121

4. AIX

MPIO Driver

Discussion started by: clking

5. High Performance Computing

High reliability web server - cluster, redundancy, etc

Discussion started by: bsaadmin

6. AIX

AIX native MPIO

Discussion started by: zaxxon

7. UNIX for Advanced & Expert Users

AIX MPIO and EMC

Discussion started by: vxg0wa3

8. Filesystems, Disks and Memory

Optimizing the system reliability

Discussion started by: Deepa