Disk Failure


 
# 1  
06-07-2005

I am new to Unix administration and have a question about replacing some hardware. I have a K-class box running HP-UX 10.20 with three disks. Two of the drives are in one logical volume. Every 3 or 4 days, the syslog shows one of these drives reported as "POWERFAILED" and then recovering a few seconds or minutes later. My manager feels that the drive should be replaced.

From reading the documentation, it seems to me that I can shut down, replace the one failing drive, and then restore the whole logical volume. Do I need to re-create the logical volume before doing a restore? Are there any other steps I need to take when replacing only one drive when the volume group and logical volume span two drives?

Thank you in advance for any help.

# 2  
06-07-2005
Hi,
Follow these steps:
1. First, back up the VG configuration and structure (vgcfgbackup).
2. Test the disk with diskinfo /dev/rdsk/cXtXdX, dd if=/dev/dsk/cXtXdX of=/dev/null, and ioscan -fnC disk. If the disk does not answer, it has definitely failed.
3. Shut down the system.
4. Replace the disk.
5. Boot to single-user mode.
6. Run vgcfgrestore -n /dev/vgXX /dev/rdsk/cXtXdX.
7. Activate the VG (vgchange -a y /dev/vgXX).
8. If the data was spread across both disks, you will probably have to restore it from backup.

I hope it helps.

Cristian.
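
The steps above can be written out as a dry-run checklist. This is only a sketch: it prints the commands and runs nothing, and the names vg01 and c1t4d0 are made-up placeholders. The vgcfgrestore line uses the -n option and the raw (rdsk) device, which is the form I know from the HP-UX man page; check it against the man pages on your own release before relying on it.

```python
def replacement_plan(vg, disk):
    """Return the eight steps as (description, [commands]) pairs.

    vg   -- volume group name without /dev/, e.g. "vg01" (placeholder)
    disk -- device name, e.g. "c1t4d0" (placeholder)
    An empty command list marks a manual step.
    """
    block = f"/dev/dsk/{disk}"    # block device
    raw = f"/dev/rdsk/{disk}"     # raw (character) device
    group = f"/dev/{vg}"
    return [
        ("Back up the LVM configuration", [f"vgcfgbackup {group}"]),
        ("Test the disk", [
            f"diskinfo {raw}",
            f"dd if={block} of=/dev/null",
            "ioscan -fnC disk",
        ]),
        ("Shut down the system", []),
        ("Replace the disk", []),
        ("Boot to single-user mode", []),
        ("Restore the LVM configuration to the new disk",
         [f"vgcfgrestore -n {group} {raw}"]),
        ("Activate the volume group", [f"vgchange -a y {group}"]),
        ("Restore the data from backup if it spanned both disks", []),
    ]


# Print the checklist; nothing is executed.
for number, (step, commands) in enumerate(replacement_plan("vg01", "c1t4d0"), start=1):
    print(f"{number}. {step}")
    for command in commands:
        print(f"     {command}")
```

Keeping the commands as plain strings means the list can be reviewed, or pasted one line at a time at the console, before anything touches the volume group.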
# 3  
06-07-2005
Post the exact text of the error message. I would not immediately suspect a bad drive, although it is possible. Does your manager have a good reason for suspecting a bad drive? A bad drive should be diagnosed from the hardware logs.

Use the script command to record your session. Then, as root, run the command "cstm". From the cstm prompt, type "runutil logtool". Do an "sl" and pay attention to the output. It will tell you what the current log was renamed to. Type "sr", and when prompted, type in the name of that log. You will get a summary of the errors. Type "fr" to format the raw log. Now type "fl" to finally view the log.

Each logtool command is two letters, and you press Return after typing them. If a command wants more info, it will ask for it. To summarize the logtool commands:

sl [switch log]
sr [select raw]
fr [format raw]
fl [formatted log]

Then type "quit" to get out of logtool, and "quit" again to get out of cstm.

By the way, "powerfailed" sounds like a disk driver or an LVM driver thought an operation took too long. You might have an overloaded bus or an unreasonable timeout value. This is what I would be checking first.
# 4  
06-07-2005
Here is the exact error message from syslog. (I could not find cstm on my system.)

Jun 7 06:02:04 nvidev vmunix: xvfs: mesg 016 : vx_ilisterr - /fs5 file system error readin inode 473
Jun 7 12:40:07 nvidev vmunix: disc30 56/52.4.0 SCSI even UNKNOWN_RESELECT
Jun 7 12:40:07 nvidev vmunix: LVM: vg[1]: pvnum=0 (dev_t=0x1c00400) is POWERFAILED
Jun 7 12:40:07 nvidev vmunix: LVM: PV 0 has been returned to vg[1].
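
To see how often this happens and how long each outage lasts, the two LVM messages can be paired up with a small script. This is a rough sketch that assumes every event looks like the lines quoted above. Syslog does not record the year, so it has to be supplied, and the month names are parsed assuming an English/C locale.

```python
import re
from datetime import datetime

# Patterns follow the two LVM message shapes quoted in the excerpt above.
FAILED = re.compile(
    r"LVM: vg\[(\d+)\]: pvnum=(\d+) \(dev_t=(0x[0-9a-fA-F]+)\) is POWERFAILED")
RETURNED = re.compile(r"LVM: PV (\d+) has been returned to vg\[(\d+)\]")
STAMP = re.compile(r"^(\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})")


def parse_stamp(line, year):
    """Parse the leading syslog timestamp; returns None if the line has none."""
    match = STAMP.match(line)
    if match is None:
        return None
    return datetime.strptime(f"{year} {match.group(1)}", "%Y %b %d %H:%M:%S")


def powerfail_events(lines, year=2005):
    """Pair each POWERFAILED message with the next 'returned' message for the same PV.

    outage_seconds stays None when no matching 'returned' line follows.
    """
    pending = {}    # (vg, pv) -> event still waiting for its 'returned' line
    events = []
    for line in lines:
        when = parse_stamp(line, year)
        if when is None:
            continue
        match = FAILED.search(line)
        if match:
            vg, pv = int(match.group(1)), int(match.group(2))
            event = {"vg": vg, "pv": pv, "dev_t": match.group(3),
                     "failed_at": when, "outage_seconds": None}
            pending[(vg, pv)] = event
            events.append(event)
            continue
        match = RETURNED.search(line)
        if match:
            pv, vg = int(match.group(1)), int(match.group(2))
            event = pending.pop((vg, pv), None)
            if event is not None:
                delta = when - event["failed_at"]
                event["outage_seconds"] = delta.total_seconds()
    return events


SAMPLE = [
    "Jun 7 12:40:07 nvidev vmunix: disc30 56/52.4.0 SCSI even UNKNOWN_RESELECT",
    "Jun 7 12:40:07 nvidev vmunix: LVM: vg[1]: pvnum=0 (dev_t=0x1c00400) is POWERFAILED",
    "Jun 7 12:40:07 nvidev vmunix: LVM: PV 0 has been returned to vg[1].",
]

for event in powerfail_events(SAMPLE):
    print(event["failed_at"], "vg", event["vg"], "pv", event["pv"],
          "outage seconds:", event["outage_seconds"])
```

Run over the whole syslog file (for example `powerfail_events(open(path))`, where path is wherever your syslog lives), this shows whether the outages are getting longer or more frequent, which is useful evidence either for the drive or for a bus or timeout problem.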

Once this happened while one of our programmers was in the middle of something and the whole system froze up. After about 10 minutes of panic from several people, the system cleared up and he was able to save his work. Since this happens 3 or 4 times a week, the manager believes that the drive is failing and would like it replaced before it fails completely.

Just to let you know, we first noticed the problem when the whole system crashed. We restarted the machine and noticed the errors in the syslog. As far as I know, there were no changes to the system before the crash.

Thanx for your help

# 5  
06-10-2005
Thanx for the help, Cristian. I was not aware of vgcfgbackup and would have been lost without it.

Perderabo,
Based on the info from Syslog, would you do a drive replacement? Or should I be looking at something else?

Thanx for the help.

# 6  
06-10-2005
Looks like the drive needs replacement from what the log is saying, especially if the manager wants it and is willing to pay for a new drive. Make him feel good :)
# 7  
06-10-2005
There is some wisdom in what Just Ice says. The manager wants the drive replaced, so replace it. Drives are not super expensive and it can't hurt to replace one.

I don't know what I would do if I were in your position. There is no way that I would allow myself to be in that position. You have an HP-UX OS without support tools. Well, I would install them pronto. That means finding your support CD. Should you find it... here is the manual. In theory, another option is to download the tools, and that option is mentioned here. But I can't find the support tools for 10.20. I assume you know that 10.20 is no longer supported? Without the output from the diagnostics, I don't know where to point a finger.

So the best path I can see is to replace the drive as your manager wants. After that is done, you will know if it was the drive or not. :)