FAULTY DISK replacement HP rx4640


 
# 8  
Old 05-23-2012
I don't know how to deal with the disk names (glad the ioscan output looks as usual...). This one I can read:
Code:
disk 4 0/1/1/1.0.0 sdisk NO_HW DEVICE HP 146 GST3146855LC
/dev/dsk/c3t0d0 /dev/dsk/c3t0d0s2 /dev/rdsk/c3t0d0 /dev/rdsk/c3t0d0s2

The bad and gone disk...

Code:
# pvdisplay -v /dev/dsk/c3t0d0 | more

Why did you not compare with the good one to see the expected output (c2t1d0)...

Now more tricky: (this is on HP-UX 11.11...)
Code:
ant:/home/vbe $ echo boot_string/S | adb /stand/vmunix /dev/kmem
boot_string:
boot_string:    disk(0/0/1/1.2.0.0.0.0.0;0)/stand/vmunix

if it returns something like disk(0/1/1/0.1.0.0.0.0.0;0)/stand/vmunix
You are lucky... (well not doomed anyway...)

But there is nothing more to do now than get a new disk for the replacement...
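
The comparison described above can be scripted. This is only a sketch, assuming the boot_string value has the form shown above and that the boot path begins with the hardware path ioscan reports for the disk; boot_hw_path and booted_from are hypothetical helper names, not HP-UX commands.

```shell
# Hypothetical helper: pull the hardware path out of a boot_string value,
# e.g. disk(0/0/1/1.2.0.0.0.0.0;0)/stand/vmunix -> 0/0/1/1.2.0.0.0.0.0
boot_hw_path() {
    printf '%s\n' "$1" | sed -n 's/^[^(]*(\([^;)]*\).*/\1/p'
}

# Hypothetical helper: succeed if the boot path starts with the given
# ioscan hardware path ($1 = ioscan path, $2 = boot_string value).
# Assumption: the boot path is the ioscan path plus trailing ".0" fields.
booted_from() {
    case "$(boot_hw_path "$2")" in
        "$1"|"$1".*) return 0 ;;
        *) return 1 ;;
    esac
}

boot_hw_path 'disk(0/0/1/1.2.0.0.0.0.0;0)/stand/vmunix'
```

With the failed disk at 0/1/1/1.0.0, `booted_from 0/1/1/1.0.0 "$boot_string"` returning false would be the "lucky" case: the system was not booted from the dead disk.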
# 9  
Old 05-23-2012
Hi
I asked someone on site to execute the echo command and this is what was returned:
adb: warning: Unrecognized format character - 'S'
I don't know if this means that the command was not correct??
Anyway, I have more than 2 HP 146 ST3146855LC disks in stock, so what I'm missing is the procedure. I have been searching all these days and have come up with two very similar ones. In both of them I'm not very sure about the partitioning and the make-boot steps. Anyway, I'm posting both, and please, please correct me if I'm wrong. I apologize, as I don't know how to separate the tags.

PROCEDURE 1


1) Deactivate the Physical Volume before extraction:

Code:
#/root> pvchange -a N /dev/dsk/c3t0d0s2

Do disk replacement. Give 90-120 seconds after disk extract and after disk insert.



2) "Discover" new disk:

Code:
#/root> diskinfo /dev/rdsk/c3t0d0

Code:
#/root> ioscan -fnC disk | grep c3t0d0

Code:
#/root> insf -eC disk



3) Erase partition table and create a new one:

Code:
#/root> idisk -Rw /dev/rdsk/c3t0d0

Code:
#/root> cat /tmp/partition_file
3
EFI 500MB
HPUX 100%
HPSP 400MB

Code:
#/root> idisk -wf /tmp/partition_file /dev/rdsk/c3t0d0

Code:
#/root> insf -eC disk

Code:
#/root> efi_fsinit -d /dev/rdsk/c3t0d0s1


4) Restore LVM data and reactivate PV:


Code:
#/root> vgcfgrestore -n vg00 /dev/rdsk/c3t0d0s2

Code:
#/root> pvchange -a y /dev/dsk/c3t0d0s2




5) Create boot data:


Code:
#/root> mkboot -e -l /dev/rdsk/c3t0d0

I'M NOT SURE ABOUT THE BELOW

Code:
#/root> mkboot -a "boot vmunix -lq" /dev/rdsk/c3t0d0

Code:
#/root> lvlnboot -v -R /dev/vg00

Code:
#/root> vgchange -a y vg00


Some tests to see the sync process:



Code:
#/root> for i in 1 2 3 4 5 6 7 ; do lvdisplay -v /dev/vg00/lvol${i} ; done | grep "LV Stat"
LV Status available/syncd
LV Status available/syncd
LV Status available/syncd
LV Status available/syncd
LV Status available/syncd
LV Status available/stale
LV Status available/stale
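
To avoid reading the status lines by eye, the same output can be piped into a small counter. This is a sketch only: count_stale is a hypothetical helper, and the sample lines fed to it are copied from the output above.

```shell
# Hypothetical helper: count "LV Status" lines on stdin that report stale extents.
count_stale() {
    awk '/LV Status/ && /stale/ { n++ } END { print n + 0 }'
}

# Sample input; on the server, pipe the lvdisplay for-loop into count_stale instead.
printf '%s\n' \
    'LV Status available/syncd' \
    'LV Status available/stale' \
    'LV Status available/stale' | count_stale
```

The sample prints 2. When the count drops to 0, all logical volumes are back in sync.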


PROCEDURE 2



1) Save the hardware path
Run the ioscan command and note the hardware path
Code:
# ioscan -m lun /dev/disk/disk13
Class     I  Lun H/W Path  Driver  S/W State   H/W Type     Health    Description
========================================================================
disk     13  64000/0xfa00/0x5   esdisk  NO_HW       DEVICE       disabled  HP 146 GST3146855LC       
             0/1/1/1.0x0.0x0
                      /dev/disk/disk13      /dev/disk/disk13_p2   /dev/rdisk/disk13     /dev/rdisk/disk13_p2
                      /dev/disk/disk13_p1   /dev/disk/disk13_p3   /dev/rdisk/disk13_p1  /dev/rdisk/disk13_p3

Lun hardware path is 64000/0xfa00/0x5
Lunpath hardware path is 0/1/1/1.0x0.0x0

2) Halt LVM access to the disk

Code:
# pvchange -a N /dev/disk/disk13_p2

3) Replace the hot-swappable disk and wait 2 minutes

4) Notify the mass storage subsystem that the disk has been replaced

If the system was not rebooted, run scsimgr before using the new disk as a replacement for the old disk. For example:
Code:
# scsimgr replace_wwid -D /dev/rdisk/disk13

5) Determine the new LUN instance number for the replacement disk. For example:

Code:
# ioscan -m lun
Class     I  Lun H/W Path  Driver  S/W State   H/W Type     Health    Description
========================================================================
disk     13  64000/0xfa00/0x5   esdisk  NO_HW       DEVICE       offline   HP MSA Vol

                      /dev/disk/disk13      /dev/rdisk/disk13
                      /dev/disk/disk13_p1   /dev/rdisk/disk13_p1
                      /dev/disk/disk13_p2   /dev/rdisk/disk13_p2
                      /dev/disk/disk13_p3   /dev/rdisk/disk13_p3

disk     28  64000/0xfa00/0x1c  esdisk  CLAIMED     DEVICE       online    HP MSA Vol
             0/1/1/1.0x0.0x0
                      /dev/disk/disk28      /dev/rdisk/disk28


6) (HP Integrity servers only) Partition the replacement disk.

a. Partition the disk by using the idisk command and a partition description file

First clear the previous partition configuration on the disk:
Code:
idisk -Rw /dev/rdsk/c3t0d0

Create a partition description file. For example:
Code:
# vi /tmp/pdf

In this example, the partition description file contains:
Code:
3
EFI 500MB
HPUX 100%
HPSP 400MB

Partition the disk using idisk and the partition description file created above:
Code:
idisk -f /tmp/pdf -w /dev/rdsk/c3t0d0

To verify enter:
Code:
# idisk /dev/rdsk/c3t0d0

b. Enter the insf command with the -e option to create the legacy device files for the partitions:

Code:
# insf -eC disk

Use efi_fsinit to initialize the FAT filesystem on the EFI partition:

Code:
# efi_fsinit -d /dev/rdsk/c3t0d0s1

7) Assign the old instance number to the replacement disk. For example:
Code:
# io_redirect_dsf -d /dev/disk/disk13 -n /dev/disk/disk28

This assigns the old LUN instance number (13) to the replacement disk. In addition, the device
special files for the new disk are renamed to be consistent with the old LUN instance number.

The following ioscan -m lun output shows the result:

Code:
# ioscan -m lun /dev/disk/disk13

Class     I  Lun H/W Path  Driver  S/W State   H/W Type     Health    Description
========================================================================
disk     13  64000/0xfa00/0x1c  esdisk  CLAIMED     DEVICE       online    HP MSA Vol
             0/1/1/1.0x0.0x0
                      /dev/disk/disk13      /dev/rdisk/disk13
                      /dev/disk/disk13_p1   /dev/rdisk/disk13_p1
                      /dev/disk/disk13_p2   /dev/rdisk/disk13_p2
                      /dev/disk/disk13_p3   /dev/rdisk/disk13_p3

8) Restore LVM configuration information to the new disk.
Code:
# vgcfgrestore -n /dev/vg00 /dev/rdisk/disk13_p2

9) Restore LVM access to the disk.

Code:
# pvchange -a y /dev/disk/disk13_p2

10) Initialize boot information on the disk.

Code:
# mkboot -e -l /dev/rdsk/c3t0d0

I'M NOT SURE ABOUT THE BELOW

Code:
#/root> mkboot -a "boot vmunix -lq" /dev/rdsk/c3t0d0

#/root> lvlnboot -v -R /dev/vg00

#/root> vgchange -a y vg00
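
For the partition description file used in both procedures, a here-document avoids typing it in vi. This is a sketch under the assumption that idisk expects the partition count on the first line and one partition per line after it; check_pdf is a hypothetical sanity check, not an HP-UX command, and the file is written to a scratch path from mktemp rather than to /tmp/pdf.

```shell
# Write the partition description file to a scratch location.
pdf=$(mktemp)
cat > "$pdf" <<'EOF'
3
EFI 500MB
HPUX 100%
HPSP 400MB
EOF

# Hypothetical check: the first line must equal the number of
# non-empty partition lines that follow it.
check_pdf() {
    awk 'NR == 1 { want = $1; next }
         NF      { have++ }
         END     { exit (want == have) ? 0 : 1 }' "$1"
}

check_pdf "$pdf" && echo "partition file looks consistent"
```

The check only catches a wrong count; it does not validate the partition types or sizes, which idisk itself will do.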




Thanks again for the support,
Gjk

Last edited by gjk; 05-23-2012 at 08:25 AM..
# 10  
Old 05-23-2012
My two cents:
You have an Integrity box and I only know PA-RISC (I have worked with it since 1993...), where I have only had to change bad disks a couple of times. Having read the updated When Good Disks Go Bad: Dealing with Disk Failures Under LVM, from your point 10 onward I would stick to what page 51, steps 5 and 6, says to do, and do only that!
I will be away for 5 days but others here will surely take over (methyl?)


Point 4:
You must run
Code:
 scsimgr replace_wwid -D /dev/rdisk/disk13

because you have not rebooted, OK?
So for point 5, you run ioscan -m lun.

Point 6:
Code:
insf -eC disk

I am not sure that you need to initialize afterwards (using the above options...)...
So check first with
Code:
efi_ls -d /dev/rdsk/c3t0d0s1


Last edited by vbe; 05-23-2012 at 02:22 PM.. Reason: URL...
# 11  
Old 05-23-2012
@vbe
methyl is listening, but busy with a work job and on UK time.
Would like to know the exact hardware specification for this system (HP don't sell "300GB hot-swappable disks") and the expected mirror configuration. Not prepared to guess.
Let's eliminate the obvious. Is /var/adm/syslog/syslog.log full of SCSI LBOLT errors and other disc/controller disaster signs? Has someone visited the server to make sure that the power supply is intact and that the SCSI cable has not become displaced? What lights are lit on the disc drives (both of them)? Red, green, flashing green, none?

At the end of this, please consider fitting a third disc drive with a view to triple mirroring.
# 12  
Old 05-23-2012
The disks are
HP 146 GB 2.5" Hot Swap Hard Drive 10000RPM
Part #: 431958-B21

The second disk in the mirror is gone...
I'm also very busy with 2 audits...
# 13  
Old 05-23-2012
Assuming that the disc cannot be recovered by restoring power or plugging a cable in.

First impression is that all you need is to quiesce any applications (i.e. stop the lot) then:

1) Swap the dead disc for a new blank HP-supplied disc which will not have any previous LVM information on it whatsoever.

2) Run ioscan -fn and check that the disc has changed from NO_HW to CLAIMED.
If it doesn't recover then you need a hardware engineer to look at the computer in depth.

3) Check the size of the disc with diskinfo and compare with the output of the same command for the good disc (very important: it must be equal to or fractionally greater than the size of the good disc). If the new disc is smaller, get another one and do not proceed.

4) Run the vgcfgrestore to the new disc.
Then, make volume group active again:
vgchange -a y vg00
Wait for 5 mins.

5) If you don't find that vgsync is already running (check ps -ef) then issue vgsync vg00.

6) Keep an eye on /var/adm/syslog/syslog.log for progress (it only makes an entry when a sync has completed). This could take several hours.

7) Check periodically on the sync status of all partitions. (Others have posted the commands).

8) When all the partition syncs are complete, relax and start the applications.
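
The size comparison in step 3 can be sketched as follows. It assumes diskinfo prints a line of the form "size: N Kbytes"; disk_kb and size_ok are hypothetical helper names, and the figures are illustrative, not taken from this server.

```shell
# Hypothetical helper: read diskinfo output on stdin and print the size in Kbytes.
# Assumes a line such as "           size: 143374744 Kbytes".
disk_kb() {
    awk '$1 == "size:" { print $2 }'
}

# Hypothetical helper: succeed if the new disc ($1) is at least as big as the good one ($2).
size_ok() {
    [ "$1" -ge "$2" ]
}

# Illustrative values only; on the server use something like
#   new=$(diskinfo /dev/rdsk/c3t0d0 | disk_kb)
#   good=$(diskinfo /dev/rdsk/c2t1d0 | disk_kb)
new=143374744
good=143374744
if size_ok "$new" "$good"; then
    echo "size OK, proceed"
else
    echo "new disc is smaller, do not proceed"
fi
```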

Last edited by methyl; 05-23-2012 at 08:12 PM.. Reason: layout, typos, more typos, add vgchange (important)
# 14  
Old 05-24-2012
VBE and METHYL thank you both for your support so far.

@ methyl
After what you posted, I'm not sure whether I need to partition the new disk first or can run
Code:
vgcfgrestore

right after the new disk is recognized?

Thanks again

Last edited by gjk; 05-24-2012 at 08:46 AM..