MPIO reliability
Posted by zaxxon in Operating Systems / AIX on 08-26-2010, 03:08 AM

Hi,

we have a few boxes using MPIO; they are connected to a storage virtualization layer that manages some disk subsystems and presents volumes to the AIX boxes.
Sometimes, when a cable is pulled for a test or a real problem occurs, lspath correctly shows that for example one path is Failed while the other is Enabled. But when the cable is plugged back in or the problem has been fixed, the path still shows Failed. Even after waiting a while it does not recover, and nothing we tried changed that except a reboot of the box. I do not remember exactly whether the path shown as Failed still carried traffic despite what lspath reported (I think I ran fcstat and saw the byte counters increasing, but I am not sure, it was too long ago).
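
For completeness, this is roughly the kind of thing one would try to get the path back without a reboot (a sketch; hdisk2 and fscsi0 are just placeholders for the affected disk and its FC adapter):
Code:
# show the paths of the disk with parent adapter, connection and state
lspath -l hdisk2 -F "name parent connection status"

# try to re-enable the failed path by hand
chpath -l hdisk2 -p fscsi0 -s enable

# if that does not help, let cfgmgr rescan the adapter
cfgmgr -l fscsi0

# check again
lspath -l hdisk2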

Has anybody had a similar experience with MPIO? We thought that since MPIO has been on the market for some years now, an obvious problem like not updating the status of a path should be long gone. So we came to the conclusion that it might be some kind of incompatibility with our virtualization software.

I have never seen anything like this on a box using PowerPath.

Additionally, the problem does not occur every time, and not on all of the MPIO boxes.
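
To spot an affected box quickly, a simple overview of the path states is enough, e.g.:
Code:
# list any path that is not in Enabled state
lspath | grep -v Enabled

# count paths per state for a quick overview
lspath | awk '{count[$1]++} END {for (s in count) print s, count[s]}'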

Our boxes are running AIX 5.3 TL11 SP4.

Any hints are welcome.

---------- Post updated at 09:08 AM ---------- Previous update was at 08:54 AM ----------

Here are the attributes of an hdisk from a box that has had no problem so far; the other boxes use the same parameters for health check etc.:
Code:
> lsattr -El hdisk2
PCM             PCM/friend/dcfcpother                              Path Control Module              False
algorithm       fail_over                                          Algorithm                        True
clr_q           no                                                 Device CLEARS its Queue on error True
dist_err_pcnt   0                                                  Distributed Error Percentage     True
dist_tw_width   50                                                 Distributed Error Sample Time    True
hcheck_cmd      inquiry                                            Health Check Command             True
hcheck_interval 60                                                 Health Check Interval            True
hcheck_mode     nonactive                                          Health Check Mode                True
location                                                           Location Label                   True
lun_id          0x1000000000000                                    Logical Unit Number ID           False
max_transfer    0x40000                                            Maximum TRANSFER Size            True
node_name       0x20070030d910849e                                 FC Node Name                     False
pvid            00c6c34f19954aed0000000000000000                   Physical volume identifier       False
q_err           yes                                                Use QERR bit                     True
q_type          simple                                             Queuing TYPE                     True
queue_depth     16                                                 Queue DEPTH                      True
reassign_to     120                                                REASSIGN time out value          True
reserve_policy  single_path                                        Reserve Policy                   True
rw_timeout      70                                                 READ/WRITE time out value        True
scsi_id         0x829980                                           SCSI ID                          False
start_timeout   60                                                 START unit time out value        True
unique_id       3214fi220001_somelunidentifier                     Unique device identifier         False
ww_name         0x210100e08ba2958f                                 FC World Wide Name               False

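As far as I understand it, hcheck_interval and hcheck_mode are the attributes that should make MPIO notice that a failed path works again; with hcheck_interval=60 and hcheck_mode=nonactive a failed path should be probed roughly every 60 seconds. If someone wants to compare or adjust these settings, a sketch (on a disk that is in use, -P defers the change to the next boot):
Code:
# show the health check settings of all disks
for d in $(lsdev -Cc disk -F name); do
    echo "$d: $(lsattr -El $d -a hcheck_interval -a hcheck_mode -F value | tr '\n' ' ')"
done

# adjust the settings on one disk; -P applies the change at the next boot if the disk is busy
chdev -l hdisk2 -a hcheck_interval=60 -a hcheck_mode=nonactive -P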
 
