We have a few AIX boxes using MPIO. They are connected to a storage virtualization product that manages several disk subsystems and presents volumes to the AIX boxes.
Sometimes, when a cable has been pulled for a test or a real problem has occurred, lspath shows the state of the paths correctly: for example, one path failed, the other enabled. But when the cable is plugged back in, or the problem has been resolved, the path still shows as failed. Even after waiting some time it does not recover. Nothing we tried would change that except a reboot of the box. I do not remember exactly whether the path shown as "failed" still carried traffic (I think I ran fcstat and saw the byte counters increasing, but I am not sure, it was too long ago).
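For what it's worth, the kind of manual recovery one would normally try looks roughly like this (a sketch only; hdisk2 and fscsi0 are placeholders, and I don't recall exactly which of these we ran):
Code:
# Try to re-enable the failed path explicitly
chpath -s enable -l hdisk2 -p fscsi0

# Or put the path back to Defined state and rediscover it
rmpath -l hdisk2 -p fscsi0
cfgmgr -l fscsi0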
Has anybody had a similar experience with MPIO? We thought that, since MPIO has been on the market for some years now, an obvious problem like not updating the status of a path should long be fixed. So we came to the conclusion that it might be some kind of incompatibility with our virtualization software.
I have never seen anything like it on a box using PowerPath.
Additionally, this problem does not happen every time, and not on all of the MPIO boxes.
Our boxes are running AIX 5.3 TL11 SP4.
Any hints are welcome.
---------- Post updated at 09:08 AM ---------- Previous update was at 08:54 AM ----------
Here is the configuration of a disk from a box that has had no problems so far; the other boxes have the same parameters for health check etc.:
Code:
> lsattr -El hdisk2
PCM             PCM/friend/dcfcpother            Path Control Module              False
algorithm       fail_over                        Algorithm                        True
clr_q           no                               Device CLEARS its Queue on error True
dist_err_pcnt   0                                Distributed Error Percentage     True
dist_tw_width   50                               Distributed Error Sample Time    True
hcheck_cmd      inquiry                          Health Check Command             True
hcheck_interval 60                               Health Check Interval            True
hcheck_mode     nonactive                        Health Check Mode                True
location                                         Location Label                   True
lun_id          0x1000000000000                  Logical Unit Number ID           False
max_transfer    0x40000                          Maximum TRANSFER Size            True
node_name       0x20070030d910849e               FC Node Name                     False
pvid            00c6c34f19954aed0000000000000000 Physical volume identifier       False
q_err           yes                              Use QERR bit                     True
q_type          simple                           Queuing TYPE                     True
queue_depth     16                               Queue DEPTH                      True
reassign_to     120                              REASSIGN time out value          True
reserve_policy  single_path                      Reserve Policy                   True
rw_timeout      70                               READ/WRITE time out value        True
scsi_id         0x829980                         SCSI ID                          False
start_timeout   60                               START unit time out value        True
unique_id       3214fi220001_somelunidentifier   Unique device identifier         False
ww_name         0x210100e08ba2958f               FC World Wide Name               False
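A side note on the health check attributes above: with hcheck_interval=60 and hcheck_mode=nonactive, the PCM should probe the non-active paths (including failed ones) every 60 seconds and re-enable them once they respond, which is exactly what does not happen in our case. If anyone wants to compare values, these are changed with chdev; a sketch (hdisk2 as a placeholder, -P deferring the change to the next boot since the disk is usually in use):
Code:
chdev -l hdisk2 -a hcheck_interval=60 -a hcheck_mode=nonactive -P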
@funksen
Thanks for the info so far. I don't remember if we tried that one, but I will try it the next time I get a chance.
Quote:
Originally Posted by shockneck
I wonder if you could post the adapter settings as well?
Neither cost nor effort spared:
Code:
> lsattr -El fcs0
bus_intr_lvl  65765      Bus interrupt level                                False
bus_io_addr   0xefc00    Bus I/O address                                    False
bus_mem_addr  0xf0040000 Bus memory address                                 False
init_link     pt2pt      INIT Link flags                                    True
intr_priority 3          Interrupt priority                                 False
lg_term_dma   0x800000   Long term DMA                                      True
max_xfer_size 0x100000   Maximum Transfer Size                              True
num_cmd_elems 200        Maximum number of COMMANDS to queue to the adapter True
pref_alpa     0x1        Preferred AL_PA                                    True
sw_fc_class   2          FC Class for Fabric                                True
The other adapter has the same settings.
Here is the fscsi device:
Code:
> lsattr -El fscsi0
attach       switch    How this adapter is CONNECTED         False
dyntrk       yes       Dynamic Tracking of FC Devices        True
fc_err_recov fast_fail FC Fabric Event Error RECOVERY Policy True
scsi_id      0xa9f00   Adapter SCSI ID                       False
sw_fc_class  3         FC Class for Fabric                   True
The other device has the same settings.
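In case it matters for comparison, dyntrk and fc_err_recov would normally be set with chdev; a sketch (fscsi0 as the device, -P recording the change in the ODM for the next boot since the fscsi device is busy while its disks are configured):
Code:
chdev -l fscsi0 -a dyntrk=yes -a fc_err_recov=fast_fail -P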
Thanks.
Edit:
Just a note: I currently have no way to test or reproduce the problem, so don't put too much effort into this. Any hint is welcome, though.
In my case it takes some time for MPIO to rebuild paths (VIO + N_Port ID Virtualization). We have a script that:
- runs lsdev to look for disks stuck in Defined state, and rmdev to remove them (if any)
- runs lspath to look for Missing paths, and rmpath to remove them
- runs cfgmgr
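A minimal sketch of such a cleanup script (assuming the default lsdev/lspath output columns; untested here, adapt before use):
Code:
#!/bin/ksh
# Remove disk devices that are stuck in Defined state
lsdev -Cc disk | grep Defined | while read name rest; do
    rmdev -dl "$name"
done

# Remove paths that are in Missing state
# (default lspath output columns are: status name parent)
lspath | grep '^Missing' | while read status name parent; do
    rmpath -dl "$name" -p "$parent"
done

# Rediscover devices and paths
cfgmgr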
No clue if that was the case back then. Currently I have found mixed settings: on some boxes all paths have the same priority, while the paths on another box have different priorities according to which virtualized storage they primarily talk to (while having algorithm=fail_over).
I also asked a coworker about it a moment ago; he told me he has been given the task of checking and setting all paths to different priorities.
I will keep checking the path priorities in mind, just in case we see those strange effects again.
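For reference, per-path priority can be inspected and changed with lspath and chpath. A sketch (hdisk2, fscsi0 and fscsi1 are placeholder names; with more than one path per parent, the -w connection argument is needed as well):
Code:
# Show the attributes (including priority) of the path to hdisk2 via fscsi0
lspath -AHE -l hdisk2 -p fscsi0

# With algorithm=fail_over the path with the lowest priority value is the
# primary path; give the two paths different priorities
chpath -l hdisk2 -p fscsi0 -a priority=1
chpath -l hdisk2 -p fscsi1 -a priority=2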