DiskSuite: Breaking mirrors.


 
# 1  
12-23-2005

OK, so I have a remote system (seven states away) that uses SDS to manage its two 18 GB disks, which hold /, swap, /var, /home, and /opt.

The mirroring procedure I created uses installboot to ensure there's a bootblk on both disks of an SDS mirror.
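As a minimal sketch of that installboot step, the per-disk commands can be generated instead of typed by hand. This assumes a Solaris 8 SPARC box where the UFS boot block lives at /usr/platform/<platform>/lib/fs/ufs/bootblk and <platform> is the output of `uname -i`. The function name and the platform string in the usage lines are hypothetical, and the script only prints commands; it runs nothing.

```python
from typing import List


def installboot_commands(root_slices: List[str], platform: str) -> List[List[str]]:
    """Build one installboot argv per root slice of the mirror.

    Assumes the Solaris 8 SPARC layout, where the UFS boot block is at
    /usr/platform/<platform>/lib/fs/ufs/bootblk (<platform> = `uname -i`).
    """
    bootblk = f"/usr/platform/{platform}/lib/fs/ufs/bootblk"
    # installboot writes to the raw device, so use /dev/rdsk, not /dev/dsk
    return [["installboot", bootblk, f"/dev/rdsk/{s}"] for s in root_slices]


if __name__ == "__main__":
    # Both halves of the root mirror from this thread, plus a hypothetical
    # platform name; substitute the real output of `uname -i`.
    for cmd in installboot_commands(["c0t0d0s0", "c0t2d0s0"], "SUNW,Ultra-60"):
        print(" ".join(cmd))
```

Generating the list from the mirror's slice names means a newly added mirror half is harder to forget.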

The system has a problem booting (it can't write to /var/adm/utmp), and there's no bootable CD on site. I have a "hands & eyes" person who's not familiar with Solaris. My intention was to break the mirror, boot from one disk, fsck the second disk, boot from it to recover, and re-mirror the system once it was back up.

OBP has boot-device=disk net
c0t0d0 and c0t2d0 are the two disks.
c0t2d0 is identified as the left side of the mirror, whereas normally t0 is the left side and t2 the right.

Remember, I'm not there so all the commands are being entered by the H&E guy.

1. Enter root's password to go to single-user mode.
2. fsck all the slices on t0 and t2 except t0's /var slice, since it's mounted read-only.
3. mount c0t2d0s0 /mnt
4. Remove the MDD entries from /mnt/etc/system.
5. Change the mounts in /mnt/etc/vfstab from the metadevices to the physical slices.
6. Run eeprom to set boot-device=disk1.
7. umount /mnt (ensures everything's written to disk).
8. To be sure, run installboot bootblk on /dev/rdsk/c0t0d0s0 and /dev/rdsk/c0t2d0s0.
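Step 5 is the one most prone to typos when dictated over the phone. Here is a minimal sketch of it, assuming vfstab entries of the form /dev/md/dsk/dN and /dev/md/rdsk/dN and a hand-built map from each metadevice to the physical slice of the half you intend to boot. The MD_TO_SLICE entries below are hypothetical examples; the real ones would come from your own `metastat -p` output.

```python
from typing import Dict

# Hypothetical mapping from metadevice to the physical slice of the
# submirror you plan to boot from. Build it from your own `metastat -p`
# output; these names are examples only.
MD_TO_SLICE: Dict[str, str] = {
    "d0": "c0t2d0s0",  # /
    "d1": "c0t2d0s1",  # swap
    "d3": "c0t2d0s3",  # /var
}


def demirror_vfstab(text: str, md_to_slice: Dict[str, str]) -> str:
    """Rewrite /dev/md/{dsk,rdsk}/dN fields to /dev/{dsk,rdsk}/<slice>.

    Blank and comment lines pass through untouched. Rewritten entries are
    re-joined with tabs. Metadevices missing from the map are left alone.
    """
    out = []
    for line in text.splitlines():
        if not line.strip() or line.lstrip().startswith("#"):
            out.append(line)
            continue
        fields = line.split()
        changed = False
        for i, field in enumerate(fields):
            for kind in ("dsk", "rdsk"):
                prefix = f"/dev/md/{kind}/"
                if field.startswith(prefix):
                    md = field[len(prefix):]
                    if md in md_to_slice:
                        fields[i] = f"/dev/{kind}/{md_to_slice[md]}"
                        changed = True
                    break
        out.append("\t".join(fields) if changed else line)
    return "\n".join(out) + "\n"
```

Leaving unmapped metadevices untouched, rather than guessing, makes an incomplete map show up as md entries that remain when you review the result.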

Upon boot, the system reported that c0t0d0s0 was not of this fstype, and we got the same "can't write to /var/adm/utmp" error.

I did some Google searching and didn't find anything specific to this issue. I can boot into single-user mode and mount the slice without a problem, so it's puzzling.

Because of this, I suspected that "disk" was defined as t2 instead of t0, so we brought the system to single-user mode and ran eeprom to set boot-device=disk. It produced exactly the same error.

Now, aside from the problems (we ultimately left the mirror broken, reinstalled Solaris on one disk and recovered the data from the second disk), does this sound like it should have worked?

One lesson I've taken from this: make sure installboot has been run on all SDS mirrors, and check the boot-device setting (some systems weren't set to "disk disk1").
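A minimal sketch of the boot-device check, assuming you capture the line printed by `eeprom boot-device` (for example "boot-device=disk net") from each host. The function names are hypothetical. The policy that both mirror halves should be listed first, in the order disk, disk1, is an assumption taken from the "disk disk1" setting mentioned above. Which physical target each alias points to depends on the devalias entries at the ok prompt, and this sketch does not check that.

```python
from typing import List, Sequence, Tuple


def parse_eeprom_line(line: str) -> Tuple[str, List[str]]:
    """Split one line of `eeprom` output, e.g. 'boot-device=disk net'."""
    key, _, value = line.strip().partition("=")
    return key, value.split()


def boots_from_both_halves(line: str, wanted: Sequence[str] = ("disk", "disk1")) -> bool:
    """True if boot-device lists the wanted aliases first, in that order."""
    key, devices = parse_eeprom_line(line)
    if key != "boot-device":
        raise ValueError(f"expected a boot-device line, got {key!r}")
    return devices[: len(wanted)] == list(wanted)


if __name__ == "__main__":
    for sample in ("boot-device=disk net", "boot-device=disk disk1 net"):
        print(sample, "->", boots_from_both_halves(sample))
```

Requiring the aliases to appear first means a setting like "net disk disk1" is flagged as well, because it would try a network boot before either disk.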

Carl
# 2  
12-23-2005
same process worked for me with the older version of disksuite on solaris 2.5.1 but definitely went bonkers with the newer version on solaris 8 ... your process would've worked if you had reformatted the metadb slice on one of the drives before rebooting ... the system enables disksuite on bootup and sees intact metadbs, so it tries to configure the filesystems under disksuite control as usual ... removing the metadbs ensures that disksuite isn't running ... and everything related to disksuite in /etc/system, from "Begin MDD" to "End MDD", needed to be removed and not just commented out ...
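The /etc/system cleanup described above can be sketched as a small filter that deletes whole blocks instead of commenting them out. This assumes the block boundaries are comment lines containing "Begin MDD" and "End MDD" (on the systems I've seen they read like "* Begin MDD root info (do not edit)", but check your own file and keep a backup). The function name is hypothetical. The sketch does not touch the metadb replicas themselves, which have to be handled separately (see metadb(1M)).

```python
def strip_mdd_blocks(system_text: str) -> str:
    """Remove every line from a 'Begin MDD' marker through the next
    'End MDD' marker, markers included. All other lines are kept verbatim.
    """
    kept = []
    inside = False
    for line in system_text.splitlines(keepends=True):
        if "Begin MDD" in line:
            inside = True
            continue
        if "End MDD" in line:
            inside = False
            continue
        if not inside:
            kept.append(line)
    return "".join(kept)
```

Working on the whole text, with line endings preserved, means the output can be written straight back over a copy of /etc/system and diffed against the original before rebooting.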

the fsck of the individual filesystems while they were still mirrored, however, did not sound too good ... i think they should have been done after the mirrors were broken and the box rebooted ...

if you didn't know this yet ...
you don't need to go into single user mode to reset the eeprom entries if you don't want to (see "man eeprom") ... and you could also set them from the ok prompt as required (see this )
# 3  
12-23-2005
Well, I must not be focusing on something in your post. OK, /var is screwed up somehow. So you break the mirror, do nothing to repair /var, and boot from one side of the mirror. And then... 'we received the same "can't write to /var/adm/utmp" error.' Was that not the expected result?

You say 'fsck all the slices on t0 and t2 except t0's /var slice since it's mounted in ro mode.' How did it come to pass that /var was mounted in ro mode? Even if you couldn't get it unmounted for some odd reason, fsck -n should have been possible. Repairing /var seems like the key to recovery. What's wrong with booting into single-user mode, squaring away /var, and rebooting?
# 4  
12-26-2005
Quote:
Originally Posted by Just Ice
same process worked for me with the older version of disksuite on solaris 2.5.1 but definitely went bonkers with the newer version on solaris 8 ...
Well, that might explain it since the last time I had to do this was on a 2.5.1 Solaris system.

Quote:
your process would've worked if you had reformatted the metadb slice on one of the drives before rebooting ... the system enables disksuite on bootup and sees intact metadbs, so it tries to configure the filesystems under disksuite control as usual ... removing the metadbs ensures that disksuite isn't running ... and everything related to disksuite in /etc/system, from "Begin MDD" to "End MDD", needed to be removed and not just commented out ...
I hadn't thought that formatting the metadb slice would matter. I removed the MDD entries from /etc/system and, just in case, removed /etc/system altogether on later boots.

Quote:
the fsck of the individual filesystems while they were still mirrored, however, did not sound too good ... i think they should have been done after the mirrors were broken and the box rebooted ...
Yeah, that could have been a problem; however, we booted it several times over a few days and fscked the filesystems a few times, so DiskSuite should have been out of the picture pretty quickly.

Quote:
if you didn't know this yet ...
you don't need to go into single user mode to reset the eeprom entries if you don't want to (see "man eeprom") ... and you could also set them from the ok prompt as required (see this )
Yep, knew that. Thanks though.

Carl
# 5  
12-26-2005
Quote:
Originally Posted by Perderabo
Well, I must not be focusing on something in your post. OK, /var is screwed up somehow. So you break the mirror, do nothing to repair /var
Well, I didn't "do nothing". I attempted to fsck /var, but fsck refused because /var was already mounted read-only. I don't know why it was mounted read-only. I can only assume (which is why I'm asking here) that, because of the initial problem, it couldn't be remounted read-write.

I thought the boot process mounted / and /var read-only, fscked them, and then remounted them read-write before mounting the rest of the slices.

Correction would be appreciated of course.

Quote:
, and boot from one side of the mirror. And then... 'we received the same "can't write to /var/adm/utmp" error.' Was that not the expected result?
The initial problem was that the system wasn't able to create utmp, possibly because /var couldn't be remounted read-write (again, assuming my comment above is true).

So my steps were to break the mirror so that I had two separate disks, reboot onto a single disk not under DiskSuite control, fsck the other disk, and then rebuild the mirror.

I cleared /etc/system, changed the md entries in /etc/vfstab back to mounting the physical disk slices rather than the metadevices, used eeprom to set the boot device to the second disk, and booted the system.

Since it was booting from a clean disk not under SDS control, it should have come up successfully, so I was puzzled to get exactly the same error.

Quote:
You say 'fsck all the slices on t0 and t2 except t0's /var slice since it's mounted in ro mode.' How did it come to pass that /var was mounted in ro mode?
I don't know. See my theory above. Since we don't have a Sun contract here at work (we're an AIX/Red Hat shop), I can't get at some of the deeper knowledge in SunSolve that I had access to when I worked at a Sun shop.

Quote:
Even if you couldn't get it unmounted for some odd reason, fsck -n should have been possible.
I don't know how -n would have worked (reply 'n' to all prompts) and it wouldn't have occurred to me to try it. Can you explain further how it might have helped?

Quote:
Repairing /var seems like the key to recovery.
Well, yeah, that's what I was trying to do. I thought, perhaps wrongly, that getting the system mounted without DiskSuite would let me fsck the other disk and then boot from the repaired disk to get the system back up. I could then re-mirror the disks afterwards.

Quote:
What's wrong with booting into single-user mode, squaring away /var, and rebooting?
Hence the questions. Thanks for taking the time though.

Carl
# 6  
12-26-2005
I'm not sure what version of Solaris you're using, but I thought that when I boot into single user mode, only / and /usr were mounted. Even if /var was mounted, I would think that a "umount /var" would take care of that. I'm not sure if stuff is mirrored in single user mode. But I would resist breaking a mirror if I didn't need to. And I don't see how that will help here. Actually, I now think that you simply had a typo in /etc/vfstab. Someone had changed /var to "ro". Later, a reboot caused your problem to occur.

As for "fsck -n", it simply provides information, and I try to gather information when I don't understand something. "fsck -n" might result in anything from "all looks cool" to "file system? what file system?". Knowing the state of /var would be a help. If I'm right about the typo in /etc/vfstab, "fsck -n" will not find a problem. That would lead me to stop looking at /var. Add that to the odd read-only status of /var and my next step would be to check vfstab.
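Checking vfstab for this kind of typo is easy to script, for example against the weekly Explorer copy of each host's vfstab. Here is a minimal sketch, assuming the standard seven-column Solaris vfstab layout (device to mount, device to fsck, mount point, FS type, fsck pass, mount at boot, mount options). The function name is hypothetical.

```python
from typing import List


def read_only_mounts(vfstab_text: str) -> List[str]:
    """Return mount points whose mount-options column contains 'ro'.

    vfstab has seven whitespace-separated fields: device to mount, device
    to fsck, mount point, FS type, fsck pass, mount at boot, mount options.
    """
    hits = []
    for line in vfstab_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        fields = stripped.split()
        if len(fields) < 7:
            continue  # malformed line; worth a manual look in its own right
        mount_point, options = fields[2], fields[6]
        # Options are comma-separated, e.g. "ro,logging"; "-" means none
        if "ro" in options.split(","):
            hits.append(mount_point)
    return hits
```

Splitting the options on commas, rather than searching for the substring "ro", avoids false hits on other options whose names happen to contain those two letters.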

I also like to run "fsck -n" to see how bad stuff is before I run a plain "fsck". I have deeply regretted not doing that on several occasions.
# 7  
12-26-2005
Quote:
Originally Posted by Perderabo
I'm not sure what version of Solaris you're using,
Solaris 8.

Quote:
but I thought that when I boot into single user mode, only / and /usr were mounted. Even if /var was mounted, I would think that a "umount /var" would take care of that.
Nope, umount /var gave me a "mount point busy" type of message.

Quote:
I'm not sure if stuff is mirrored in single user mode. But I would resist breaking a mirror if I didn't need to. And I don't see how that will help here.

Actually, I now think that you simply had a typo in /etc/vfstab. Someone had changed /var to "ro". Later, a reboot caused your problem to occur.
Well, I can check again. We're using Explorer to get a weekly dump of the system, and I checked the copied vfstab to make sure there wasn't a problem, but I wasn't looking for that in particular, so I'll check again when I get in tomorrow.

Quote:
As for "fsck -n", it simply provides information, and I try to gather information when I don't understand something. "fsck -n" might result in anything from "all looks cool" to "file system? what file system?". Knowing the state of /var would be a help. If I'm right about the typo in /etc/vfstab, "fsck -n" will not find a problem. That would lead me to stop looking at /var. Add that to the odd read-only status of /var and my next step would be to check vfstab.

I also like to run "fsck -n" to see how bad stuff is before I run a plain "fsck". I have deeply regretted not doing that on several occasions.
I just use fsck since it'll ask about each item, so I can review them as they come up. If you auto-answer 'y' or 'n', you won't be able to evaluate the problems as they come up.

I appreciate the thought on vfstab. I have found an error in another server's vfstab so there's a chance that was it. I'll check the explorer output.

I've spent a lot of time these past several months discovering problems, making repairs, and whipping up scripts so they won't happen again, so it wouldn't surprise me.

Thanks.

Carl