[ASK] - AIX Fibre Channel behavior


 
# 1  
Old 03-07-2019

Hello all,

Let me describe the context and my environment.
We have an AIX 6.1 system with 4 FC channels:
Code:
[root@xxx] / > lsdev -Cc adapter | grep fcs
fcs0 Available 23-T1 Virtual Fibre Channel Client Adapter
fcs1 Available 23-T1 Virtual Fibre Channel Client Adapter
fcs2 Available 23-T1 Virtual Fibre Channel Client Adapter
fcs3 Available 23-T1 Virtual Fibre Channel Client Adapter

- 2 virtual FCs, fcs0 and fcs2, come from VIOS_A --> both mapped to a single physical FC
- 2 virtual FCs, fcs1 and fcs3, come from VIOS_B --> both mapped to a single physical FC

--> So we can say we have 2 physical FC paths.
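One way to double-check that claim is from the VIOS side, where `lsmap -all -npiv` shows which physical FC port backs each virtual client adapter. A minimal sketch, run against canned output (the vfchost names, location codes, and LPAR name below are invented for illustration; on a real VIOS you would pipe the command's output into the awk instead):

```shell
# Hypothetical sample in the shape of 'lsmap -all -npiv' output from one VIOS
# (real output carries more fields; all names here are made up).
sample='Name          Physloc                            ClntID ClntName  ClntOS
------------- ---------------------------------- ------ --------- -------
vfchost0      U8233.E8B.061AA6P-V1-C31                3 lpar_x    AIX
FC name:fcs0  FC loc code:U78A0.001.DNWHZS4-P1-C1-T1

Name          Physloc                            ClntID ClntName  ClntOS
------------- ---------------------------------- ------ --------- -------
vfchost1      U8233.E8B.061AA6P-V1-C33                3 lpar_x    AIX
FC name:fcs0  FC loc code:U78A0.001.DNWHZS4-P1-C1-T1'

# Print vfchost -> backing physical FC port; two vfchosts listing the same
# "FC name" means two client adapters share one physical port.
printf '%s\n' "$sample" | awk '
  /^vfchost/  { host = $1 }
  /^FC name:/ { split($2, a, ":"); print host " -> " a[2] }'
```

If both vfchosts print the same physical port, losing that one port takes out both virtual adapters at once, which matters for the redundancy question below.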


At some point I rebooted the machine and it could not boot up. It said that the boot partition was not found. In SMS mode I checked and found that fcs2 had failed and fcs3 only partially worked:

Code:
WorldWidePortName: c050760941350104
 1.  202700a0b86e87a4,0                 0 MB Disk drive - reserved
 2.  202700a0b86e87a4,1000000000000     107 GB Disk drive
 3.  202700a0b86e87a4,2000000000000     0 MB Disk drive - reserved
 4.  202700a0b86e87a4,3000000000000     0 MB Disk drive - reserved
 5.  202700a0b86e87a4,4000000000000     0 MB Disk drive - reserved
 6.  202700a0b86e87a4,5000000000000     0 MB Disk drive - reserved
 7.  202700a0b86e87a4,6000000000000     0 MB Disk drive - reserved
 8.  202700a0b86e87a4,7000000000000     0 MB Disk drive - reserved
 9.  202700a0b86e87a4,8000000000000     107 GB Disk drive
10.  202700a0b86e87a4,9000000000000     107 GB Disk drive
11.  202700a0b86e87a4,a000000000000     107 GB Disk drive
12.  202700a0b86e87a4,b000000000000     0 MB Disk drive - reserved

First action: I asked the storage guy to remove the fcs3 WWPN from the mapping and tried to detect the boot device, then asked again to remove the fcs2 WWPN. Neither case helped.

Second action: I asked the storage guy to map the fcs2 & fcs3 WWPNs back to the machine, tried to detect again, and got a positive result. Now fcs3 can see all the LUNs and detect the boot device.
Code:
Select Attached Device
  Pathname: /vdevice/vfc-client@300001a7
  WorldWidePortName: c050760941350104
 1.  202700a0b86e87a4,0                 107 GB Disk drive - bootable
 2.  202700a0b86e87a4,1000000000000     107 GB Disk drive
 3.  202700a0b86e87a4,2000000000000     107 GB Disk drive
 4.  202700a0b86e87a4,3000000000000     107 GB Disk drive
 5.  202700a0b86e87a4,4000000000000     107 GB Disk drive
 6.  202700a0b86e87a4,5000000000000     107 GB Disk drive
 7.  202700a0b86e87a4,6000000000000     107 GB Disk drive
 8.  202700a0b86e87a4,7000000000000     107 GB Disk drive
 9.  202700a0b86e87a4,8000000000000     107 GB Disk drive
10.  202700a0b86e87a4,9000000000000     107 GB Disk drive
11.  202700a0b86e87a4,a000000000000     107 GB Disk drive
12.  202700a0b86e87a4,b000000000000     107 GB Disk drive

In the end I could boot the AIX machine back to normal.

Checking further with multipath to verify fcs2, I found that the LUNs are missing on fcs2. This matches fcs2 having failed from the beginning:
Code:
Enabled hdisk7  fscsi1
Enabled hdisk8  fscsi1
Enabled hdisk9  fscsi1
Enabled hdisk10 fscsi1
Enabled hdisk11 fscsi1
Enabled hdisk12 fscsi1
Missing hdisk2  fscsi2
Missing hdisk3  fscsi2
Missing hdisk4  fscsi2
Missing hdisk5  fscsi2
Missing hdisk6  fscsi2
Missing hdisk7  fscsi2
Missing hdisk8  fscsi2

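The listing above looks like `lspath` output; a quick way to tally healthy vs. missing paths per parent adapter is a small awk pass. A sketch against a canned sample (the hdisk/fscsi names mirror the post; on the real system you would pipe `lspath` straight into the awk):

```shell
# Sample in the shape of 'lspath' output: status, hdisk, parent adapter.
sample='Enabled hdisk7  fscsi1
Enabled hdisk8  fscsi1
Missing hdisk7  fscsi2
Missing hdisk8  fscsi2'

# Tally path states per fscsi parent; a whole column of "Missing" under one
# parent points at that adapter (here fscsi2) having lost its SAN view.
printf '%s\n' "$sample" | awk '{ n[$3 " " $1]++ }
  END { for (k in n) print k, n[k] }' | sort

# Once the SAN side is fixed, paths can usually be recovered without a
# reboot (commands shown for illustration only, not run here):
#   cfgmgr -l fscsi2                      # rediscover devices on that adapter
#   chpath -l hdisk7 -p fscsi2 -s enable  # re-enable a recovered path
```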
So my concern here is the following. I repeat:
Code:
- 2 virtual FCs fcs0, fcs2 come from VIOS_A  --> mapped to a single physical FC
- 2 virtual FCs fcs1, fcs3 come from VIOS_B  --> mapped to a single physical FC

With the first action, fcs2 & fcs3 were removed. We still had fcs0 & fcs1 (mapped to 2 different physical FCs), which could see the LUNs, but not the bootable partition.

With the second action, fcs2 & fcs3 were re-added; this refreshed fcs3, which could then see the LUNs including the bootable partition.

Why, in the first action, were the LUNs & boot partition not detected? We still had full visibility of the LUNs.
Why, in the second action, could we see the LUNs and the boot partition?

As far as I know, an FC card has 2 ports; if 1 port fails, the other can continue to work. Please correct me if I'm wrong.

Here, in reality, we have 2 physical FCs with 1 port failure on each, and the server still would not boot until 1 failed port came up again.

Please advise.
# 2  
Old 03-08-2019
Quote:
Originally Posted by Phat
Let me describe the context and my environment.
We have an AIX 6.1 system with 4 FC channels:
Code:
[root@xxx] / > lsdev -Cc adapter | grep fcs
fcs0 Available 23-T1 Virtual Fibre Channel Client Adapter
fcs1 Available 23-T1 Virtual Fibre Channel Client Adapter
fcs2 Available 23-T1 Virtual Fibre Channel Client Adapter
fcs3 Available 23-T1 Virtual Fibre Channel Client Adapter

- 2 virtual FCs, fcs0 and fcs2, come from VIOS_A --> both mapped to a single physical FC
- 2 virtual FCs, fcs1 and fcs3, come from VIOS_B --> both mapped to a single physical FC

--> So we can say we have 2 physical FC paths.
First, a few general remarks: if you want to analyse aspects of a system configuration in a (virtualised) AIX environment, what you posted is not helpful at all. You need to look elsewhere, especially:

1) The LPAR's profile on the HMC, either via the web GUI or the command line (lssyscfg and lshwres)

2) The VIOS profile on the HMC

3) From the VIOS command line, the various aspects of virtualised resources (lsdev, lsvdev, lsmap, ...)

What you posted simply won't tell you (or us) anything useful (that is: useful on its own) about the system. It's as if I asked you how to repair my car and, when you asked back "which car?", I said "a yellow one".

Quote:
Originally Posted by Phat
At some point I rebooted the machine and it could not boot up. It said that the boot partition was not found. In SMS mode I checked and found that fcs2 had failed and fcs3 only partially worked:

Code:
WorldWidePortName: c050760941350104
 1.  202700a0b86e87a4,0                 0 MB Disk drive - reserved
 2.  202700a0b86e87a4,1000000000000     107 GB Disk drive
 3.  202700a0b86e87a4,2000000000000     0 MB Disk drive - reserved
 4.  202700a0b86e87a4,3000000000000     0 MB Disk drive - reserved
 5.  202700a0b86e87a4,4000000000000     0 MB Disk drive - reserved
 6.  202700a0b86e87a4,5000000000000     0 MB Disk drive - reserved
 7.  202700a0b86e87a4,6000000000000     0 MB Disk drive - reserved
 8.  202700a0b86e87a4,7000000000000     0 MB Disk drive - reserved
 9.  202700a0b86e87a4,8000000000000     107 GB Disk drive
10.  202700a0b86e87a4,9000000000000     107 GB Disk drive
11.  202700a0b86e87a4,a000000000000     107 GB Disk drive
12.  202700a0b86e87a4,b000000000000     0 MB Disk drive - reserved

First action: I asked the storage guy to remove the fcs3 WWPN from the mapping and tried to detect the boot device, then asked again to remove the fcs2 WWPN. Neither case helped.

Second action: I asked the storage guy to map the fcs2 & fcs3 WWPNs back to the machine, tried to detect again, and got a positive result. Now fcs3 can see all the LUNs and detect the boot device.
Code:
Select Attached Device
  Pathname: /vdevice/vfc-client@300001a7
  WorldWidePortName: c050760941350104
 1.  202700a0b86e87a4,0                 107 GB Disk drive - bootable
 2.  202700a0b86e87a4,1000000000000     107 GB Disk drive
 3.  202700a0b86e87a4,2000000000000     107 GB Disk drive
 4.  202700a0b86e87a4,3000000000000     107 GB Disk drive
 5.  202700a0b86e87a4,4000000000000     107 GB Disk drive
 6.  202700a0b86e87a4,5000000000000     107 GB Disk drive
 7.  202700a0b86e87a4,6000000000000     107 GB Disk drive
 8.  202700a0b86e87a4,7000000000000     107 GB Disk drive
 9.  202700a0b86e87a4,8000000000000     107 GB Disk drive
10.  202700a0b86e87a4,9000000000000     107 GB Disk drive
11.  202700a0b86e87a4,a000000000000     107 GB Disk drive
12.  202700a0b86e87a4,b000000000000     107 GB Disk drive

In the end I could boot the AIX machine back to normal.
First: what do you mean by "mapping"? Do you mean "zoning"?

Second: which storage do you use?

Third: how is your system connected to the storage? I mean physically connected: what does the FC cabling layout look like? I.e. are both ports on the physical adapters connected? And, if yes, do both of them work?

Quote:
Originally Posted by Phat
Code:
- 2 virtual FCs fcs0, fcs2 come from VIOS_A  --> mapped to a single physical FC
- 2 virtual FCs fcs1, fcs3 come from VIOS_B  --> mapped to a single physical FC

With the first action, fcs2 & fcs3 were removed. We still had fcs0 & fcs1 (mapped to 2 different physical FCs), which could see the LUNs, but not the bootable partition.

With the second action, fcs2 & fcs3 were re-added; this refreshed fcs3, which could then see the LUNs including the bootable partition.

Why, in the first action, were the LUNs & boot partition not detected? We still had full visibility of the LUNs.
Why, in the second action, could we see the LUNs and the boot partition?

As far as I know, an FC card has 2 ports; if 1 port fails, the other can continue to work. Please correct me if I'm wrong.
You are wrong: not wrong in that the card may have two ports, but wrong in the assumption that if it has two ports, both have to work. Maybe only one is connected; I don't know - but you don't know either, so I suggest you find out. See above: you seem not to know some pretty relevant details about your environment.

Quote:
Here, in reality, we have 2 physical FCs with 1 port failure on each, and the server still would not boot until 1 failed port came up again.
This can have all sorts of reasons and then some: the zoning was wrong before and correct afterwards, the zones were there but not correctly activated, there was a short outage of the FC connection - this happens frequently, which is why one uses multipath drivers. (This is also the reason why I feel more comfortable not booting off an NPIV device.) And, and, ....

Sorry to have no better answer for you, but you will have to learn how FC works, how zoning works, etc. - not to forget how IBM virtualisation works - to understand your environment. I am glad to explain some details to you, but I cannot teach you the job over the internet. And I am glad to help you, but I cannot troubleshoot your system over the internet either.

I hope this helps.

bakunin
# 3  
Old 03-08-2019
Hi Bakunin,

You are right: I don't understand very well how FC, zoning, and IBM virtualization work.
I just want to understand how FC ports and multipath (redundancy) work.

Let's talk about another aspect. For example, if you have 2 FC cards, each with 2 ports, then 4 ports are connected to the LUN. Assuming 3 ports fail and we have only 1 port left, can we still see the LUN? And in case 2 ports fail, can we see the LUN? Regarding redundancy: with 4 lines connected to the LUN, the system is only down if all 4 lines are down/broken; even if just 1 line is still available, the system keeps running. Please correct me on this.

What surprises me is that in my environment I could see the LUNs on the 2 still-working FC cards, but not the boot device. After 1 more FC came up again, I could see it. I am confused about this.
# 4  
Old 03-08-2019
Quote:
Originally Posted by Phat
Let's talk about another aspect. For example, if you have 2 FC cards, each with 2 ports, then 4 ports are connected to the LUN. Assuming 3 ports fail and we have only 1 port left, can we still see the LUN? And in case 2 ports fail, can we see the LUN? Regarding redundancy: with 4 lines connected to the LUN, the system is only down if all 4 lines are down/broken; even if just 1 line is still available, the system keeps running. Please correct me on this.
In principle: yes, you can. It depends on how your "zones" are configured. So here is a short introduction to zoning:

When you plug a network card into a network you immediately have an "any to any" connection. For instance, you plug a network card (and the accompanying computer) in and you start an ssh session to some other computer on this network. The connection itself is immediately possible, and only the remote computer decides whether you are allowed to proceed - by asking for your password or whatever. But on the network level, as in exchanging packets, the connection is immediate.

In an FC network this is not the case. When you plug your FC adapter in, it is NOT allowed to contact anybody. On the other hand there is no further authentication: once you can access something, you can immediately use it. You need to create "zones" to allow, on a per-case basis, access to other entities on the network.

Now, what is a "zone"? Every item on an FC network - FC adapters, switch ports, but also LUNs - has a "WWPN", which serves about the same role as a MAC address in a normal network: it is a unique identifier. A zone, then, is a rule stating which WWPN is allowed to contact/access which other WWPN. You can have more than one zone for an item; i.e. you may want a certain adapter to work with two disks, so you create a zone stating adapter X is allowed to access disk A, and another zone allowing adapter X to access disk B. You may also have several zones for the same disk, meaning that several adapters (and therefore maybe different systems?) are allowed to access it. This is dangerous, because you want to avoid two systems writing to the same disk, but on the other hand you need exactly that in clusters. The cluster software will in that case make sure that only one system at a time can write to the disk.

So, depending on how your environment is set up (ask your storage guy - he probably knows more about zones than I do), you may (or may not) have multiple paths to access a disk, because the zoning is set up this way.

Also, a multipath driver will be able to recognise that if you see a disk (=LUN) via such multiple paths, it is still one and the same disk. As with 4 pictures of the same house taken from different directions: you understand that there is one house, not four of them. In the case of the driver, that means you may have different device entries for each path, but there is a pseudo-device "above" these, which you use on the LVM level. Depending on the driver used this is done differently, but the principle is always the same: you have several devices (often, but not always, "hdisk"s) which represent the different views (paths) of a single LUN. Then you have a pseudo-device which represents the LUN itself, and when you address this pseudo-device the driver uses just one available path (or even several of them concurrently) to reach it.
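The "many paths, one disk" idea above can be made concrete: with AIX's native MPIO the pseudo-device is the hdisk itself, and `lspath` shows the per-path view underneath it. A toy illustration, counting distinct disks vs. distinct path entries in canned `lspath`-shaped output (the hdisk/fscsi names are invented):

```shell
# Hypothetical 'lspath'-style output: one LUN seen through four adapters.
sample='Enabled hdisk2 fscsi0
Enabled hdisk2 fscsi1
Enabled hdisk2 fscsi2
Enabled hdisk2 fscsi3'

# Four path entries collapse onto a single hdisk: the LVM layer only ever
# talks to hdisk2, while the driver picks among the underlying paths.
printf '%s\n' "$sample" | awk '
  { paths++; disks[$2] }
  END { n = 0; for (d in disks) n++; print "paths=" paths, "disks=" n }'
```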

Also notice that each adapter (physical as well as virtual) in an IBM environment has TWO WWPNs, not one! This is necessary for LPM (Live Partition Mobility), and both of these WWPNs need to be zoned.

I hope this helps.

bakunin

Last edited by bakunin; 03-08-2019 at 06:02 AM..
# 5  
Old 03-08-2019
Thanks Bakunin for that brilliant explanation; I wish I could be so clear...
I want to point out that the very last paragraph is crucial, as more than 75% of the issues I encounter are related to it. Usually, when migrating disk bays or servers in a hurry (a time frame to respect...), you go for the vital parts first to get things working fast, and when the pressure drops you remember that you haven't finished. To finalise, you need both of those WWPNs to be effective; if you are unlucky you hit an issue before then, and chances are you find yourself in this very same situation... In other words, when doing this sort of operation, be sure you have checked with the SAN team that everything is configured and correct before, let's say, a reboot after patching, or moving the VM. It sounds silly, but you, having your own schedule and issues, are not always aware of what the other team may have done in the meantime that could have side effects...
# 6  
Old 03-08-2019
Hi Bakunin,

Can you explain one thing to me? As you mentioned, each physical/virtual FC has 2 WWPNs. When I check on the HMC, I can see that the virtual FC assigned to my LPAR has 2 WWPNs, as in the attached screenshot.

But when I check on the LPAR, I see only 1:
Code:
[root@xxx] / > lscfg -vl fcs0 | grep -i network
        Network Address.............C050760671B10018
[root@xxx] / > lscfg -vl fcs1 | grep -i network
        Network Address.............C050760671B1001A
[root@xxx] / > lscfg -vl fcs2 | grep -i network
        Network Address.............C050760671B1001C
[root@xxx] / > lscfg -vl fcs3 | grep -i network
        Network Address.............C050760671B1001E

(Attached screenshot: aix_wwpn2.png)
# 7  
Old 03-08-2019
Even if it's not a zoning issue, I guess we should still see the WWPN info on the AIX side.