Sun Storage Tek 6140 compatible with RHEL5.7?

 
# 1  
Old 10-11-2012
Sun Storage Tek 6140 compatible with RHEL5.7?

Issue Description:
================

There are 4 servers:

siman7tdw: SunFire X4440 (affected server)
siman8tdw: SunFire X4440 (affected server)
siman9tdw: SunFire X4470
siman10tdw: SunFire X4470

Storage arrays: Sun Storage Tek 6140 (named simantdw_disk_bak) and Sun Storage Tek 6780

I) siman7tdw: a SunFire X4440 server with the following software:
#########################################
[root@siman7tdw ~]# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 5.7 (Tikanga)
[root@siman7tdw ~]# uname -a
Linux siman7tdw 2.6.18-274.el5 #1 SMP Fri Jul 8 17:36:59 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux
[root@siman7tdw ~]#
#########################################
II) siman8tdw: a SunFire X4440 server with the following software:
#########################################
[root@siman8tdw ~]# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 5.7 (Tikanga)
[root@siman8tdw ~]# uname -a
Linux siman8tdw 2.6.18-274.el5 #1 SMP Fri Jul 8 17:36:59 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux
[root@siman8tdw ~]#
#########################################
III) siman9tdw: a SunFire X4470 server with the following software:
#########################################
[root@siman9tdw ~]# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 5.7 (Tikanga)
[root@siman9tdw ~]# uname -a
Linux siman9tdw 2.6.18-274.el5 #1 SMP Fri Jul 8 17:36:59 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux
[root@siman9tdw ~]#
#########################################
IV) siman10tdw: a SunFire X4470 server with the following software:
#########################################
[root@siman10tdw ~]# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 5.7 (Tikanga)
[root@siman10tdw ~]# uname -a
Linux siman10tdw 2.6.18-274.el5 #1 SMP Fri Jul 8 17:36:59 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux
[root@siman10tdw ~]#
#########################################

All these servers are part of an Oracle RAC cluster and are connected to two Oracle storage arrays (Sun Storage Tek 6140 and Sun Storage Tek 6780) configured with multipathing (RDAC).

Since these servers were updated to RHEL Server release 5.7, we have observed that one of the storage arrays (the Sun Storage Tek 6140, named "simantdw_disk_bak") switches between controller A and controller B quite frequently (many times per day), which has sometimes even provoked server reboots (the file system is OCFS2).
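
For reference on why the controller flapping can end in a reboot: OCFS2 self-fences (reboots) a node when its disk heartbeat cannot be written within the configured threshold, so a long failover can look like a dead node to the cluster. A quick, read-only way to check those timeouts on RHEL 5, assuming the standard o2cb init script (the values shown are simply whatever your cluster is configured with):

#########################################
# Show the current O2CB cluster timeouts (heartbeat dead threshold, idle timeout, ...)
service o2cb status

# The same values are kept in /etc/sysconfig/o2cb on RHEL 5
grep -E 'O2CB_HEARTBEAT_THRESHOLD|O2CB_IDLE_TIMEOUT_MS' /etc/sysconfig/o2cb
#########################################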

To avoid these reboots, all LUNs on the "simantdw_disk_bak" array were unmounted on all the servers, but the controllers keep failing over on siman7tdw and siman8tdw.

Oracle support confirmed that the multipathing software fails over between controllers because of a timeout, and has ruled out any problem with the storage array itself.

The RDAC driver has been updated to a newer version on every server, but the behaviour has not changed.
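
For completeness, this is a way to double-check which RDAC (MPP) driver version is actually loaded on each node; the mppUpper/mppVhba module names and the /proc/mpp tree are the ones the Linux RDAC package normally installs, so adjust if your package differs:

#########################################
# Report the version of the installed RDAC (MPP) kernel modules
modinfo mppUpper | grep -i version
modinfo mppVhba | grep -i version

# Confirm both modules are loaded
lsmod | grep -i mpp

# The MPP driver also exposes per-array state under /proc/mpp
ls /proc/mpp
#########################################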

These are the messages displayed on the affected servers; we are concerned about the "mpp" messages and also the "lpfc_scsi" ones:

#########################################
[root@siman7tdw ~]# tail -f /var/log/messages
Oct 10 08:47:53 siman7tdw kernel: 122 [RAIDarray.mpp]simantdw_disk_bak:0:0:12 Controller IO time expired. Delta 400 secs
Oct 10 08:47:53 siman7tdw kernel: 497 [RAIDarray.mpp]simantdw_disk_bak:0:0:12 Failed controller to 1. retry. vcmnd SN 76094 pdev H3:C0:T1:L12 0x00/0x00/0x00 0x06000000 mpp_status:8
Oct 10 08:47:53 siman7tdw kernel: lpfc_scsi_prep_dma_buf_s3: Too many sg segments from dma_map_sg. Config 64, seg_cnt 128
Oct 10 08:47:53 siman7tdw kernel: lpfc_scsi_prep_dma_buf_s3: Too many sg segments from dma_map_sg. Config 64, seg_cnt 128
Oct 10 08:47:53 siman7tdw kernel: 10 [RAIDarray.mpp]simantdw_disk_bak:1 Failover command issued
Oct 10 08:47:53 siman7tdw kernel: lpfc_scsi_prep_dma_buf_s3: Too many sg segments from dma_map_sg. Config 64, seg_cnt 128
Oct 10 08:48:24 siman7tdw last message repeated 3363 times
Oct 10 08:49:25 siman7tdw last message repeated 8460 times
Oct 10 08:50:26 siman7tdw last message repeated 10103 times
Oct 10 08:51:27 siman7tdw last message repeated 10124 times
#########################################

#########################################
[root@siman8tdw ~]# tail -f /var/log/messages
Oct 10 08:53:28 siman8tdw kernel: 122 [RAIDarray.mpp]simantdw_disk_bak:1:0:10 Controller IO time expired. Delta 400 secs
Oct 10 08:53:28 siman8tdw kernel: 497 [RAIDarray.mpp]simantdw_disk_bak:1:0:10 Failed controller to 0. retry. vcmnd SN 27060 pdev H3:C0:T0:L10 0x00/0x00/0x00 0x06000000 mpp_status:8
Oct 10 08:53:28 siman8tdw kernel: lpfc_scsi_prep_dma_buf_s3: Too many sg segments from dma_map_sg. Config 64, seg_cnt 124
Oct 10 08:53:28 siman8tdw kernel: lpfc_scsi_prep_dma_buf_s3: Too many sg segments from dma_map_sg. Config 64, seg_cnt 124
Oct 10 08:53:28 siman8tdw kernel: 10 [RAIDarray.mpp]simantdw_disk_bak:0 Failover command issued
Oct 10 08:53:28 siman8tdw kernel: lpfc_scsi_prep_dma_buf_s3: Too many sg segments from dma_map_sg. Config 64, seg_cnt 124
Oct 10 08:53:30 siman8tdw last message repeated 2 times
Oct 10 08:53:30 siman8tdw kernel: 801 [RAIDarray.mpp]Failover succeeded to simantdw_disk_bak:0
Oct 10 08:53:30 siman8tdw kernel: lpfc_scsi_prep_dma_buf_s3: Too many sg segments from dma_map_sg. Config 64, seg_cnt 124
Oct 10 08:54:01 siman8tdw last message repeated 1678 times
#########################################
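
As far as we can tell, "Config 64, seg_cnt 128" in the lpfc_scsi_prep_dma_buf_s3 message means that I/O requests are arriving with more scatter-gather segments than the lpfc driver is currently configured to accept (64). A sketch of how such a limit would typically be inspected, and, if support agrees, raised on RHEL 5, assuming the stock Emulex lpfc_sg_seg_cnt module parameter (not something to change without confirming with Oracle/Emulex support):

#########################################
# Check that the lpfc driver exposes the scatter-gather segment count parameter
modinfo lpfc | grep -i sg_seg

# Illustrative only: raise the limit via /etc/modprobe.conf ...
echo "options lpfc lpfc_sg_seg_cnt=256" >> /etc/modprobe.conf

# ... then rebuild the initrd so the setting takes effect at the next boot
mkinitrd -f /boot/initrd-$(uname -r).img $(uname -r)
#########################################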

Could this problem have been caused by the upgrade to RHEL 5.7?
# 2  
Old 10-12-2012
If the problem came from the upgrade, it is most likely down to the kernel upgrade. I would boot one of the affected machines into an older 5.6 kernel (leaving the other on the current 5.7 kernel). If the 5.7 kernel keeps flapping while 5.6 remains steady, you can be reasonably sure it's a regression; hold the boxes on the older kernel until it is fixed, then let them go back to being upgraded.
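
A minimal sketch of pinning one box to the previous kernel via GRUB on RHEL 5 (2.6.18-238.el5 is just an example of a 5.6-era kernel; use whichever older kernel is still installed in /boot):

#########################################
# List the kernel entries GRUB knows about; the first "title" line is index 0
grep ^title /boot/grub/grub.conf

# Edit grub.conf and point "default=" at the index of the older kernel
# (e.g. the 2.6.18-238.el5 entry), then reboot that node only
vi /boot/grub/grub.conf
#########################################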

As a side note, upgrading to the bleeding edge (even though it does go through Red Hat's QA process) isn't advisable on boxes such as these. You can get away with it on a workstation or a plain file server, but not on anything this fundamental to the network that is likely to have plenty of third-party software running on it.

It would probably have been preferable to keep them at 5.6 and then put the migration to 5.7 through some sort of testing process of your own, or at the very least to upgrade one box at a time, with gaps in between to let issues present themselves. Had it been done that way, there would be no question in your mind whether the loss of functionality was due to the upgrade or not.