DiskSuite: Breaking mirrors.


 
# 8  
Old 12-26-2005
Quote:
Originally Posted by BOFH
Nope, umount /var gave me a "mount point busy" type of message.
That's not the end of the road. Use fuser to find out why. lsof would be better, but it probably won't be there in single user mode.
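A minimal sketch of that advice, assuming the Solaris `fuser` (where `-c` treats the argument as a mount point and `-u` adds login names) and a release whose `umount` accepts `-f` (Solaris 8 and later). The function names `why_busy` and `force_umount` are made up for illustration.

```shell
#!/bin/sh
# Sketch: find out what keeps a mount point busy before giving up on umount.
# why_busy and force_umount are hypothetical helper names.

why_busy() {
    # -c: report on the mount point and every file within it
    # -u: append the login name to each process ID
    fuser -cu "$1"
}

force_umount() {
    # Last resort: forcibly unmount; processes with open files will see errors.
    umount -f "$1"
}
```

Running `why_busy /var` lists the process IDs holding files under /var; stopping those processes is usually enough for a plain `umount /var` to succeed, which leaves `force_umount` as the fallback.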

Quote:
Originally Posted by BOFH
I just use fsck since it'll ask for each item. Then I can review them as they come up. If you autoanswer 'y' or 'n', you won't be able to evaluate the problems as they come up.
That policy is fine for the first few hundred questions. Then what? How many more questions will there be? 1,000 more? 10,000 more? Do you abort an fsck that might have had one more question to ask? Is it safe to abort fsck and restart it with -y? A few sessions like this and that "fsck -n" starts to look pretty good. "fsck -n" will only take a few seconds if everything is ok. And it can always be safely aborted since it doesn't change anything. But if you don't like "fsck -n", my other trick is jamming the "y" key down with a paper wad. :)
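One way to put the "look first" idea into practice is to capture the read-only pass in a log and count roughly how many questions an interactive run would ask. This is only a sketch: `survey_fs` and the `FSCK` variable are hypothetical names, and counting lines that contain "?" is an assumption about how fsck prints its prompts.

```shell
#!/bin/sh
# Sketch: survey the damage read-only before choosing how to repair.
# survey_fs and FSCK are hypothetical names; FSCK can point at a stub for testing.
FSCK=${FSCK:-fsck}

survey_fs() {
    rdev=$1                      # raw device, e.g. /dev/md/rdsk/d31
    log=${2:-/tmp/fsck-n.log}
    status=0
    # -n answers "no" to every question and never writes to the device.
    "$FSCK" -n "$rdev" >"$log" 2>&1 || status=$?
    # Assumption: each question fsck would have asked shows up as a line with "?".
    questions=$(grep -c '?' "$log" || true)
    echo "$rdev: exit status $status, about $questions questions, log in $log"
    return "$status"
}
```

With the count in hand, a handful of questions can be answered interactively, while thousands suggest going straight to `fsck -y` or to a restore.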
# 9  
Old 12-26-2005
Quote:
Originally Posted by Perderabo
That's not the end of the road. Use fuser to find out why. lsof would be better, but it probably won't be there in single user mode.
Isn't wtmpx the most likely issue here? If the system needs wtmp and it's holding on so I can't umount, how is knowing that going to help? And it might not be wtmpx, it could be that I should have been able to umount /var and didn't realize that. When I couldn't umount it or remount it rw (which I also tried), I figured it wasn't possible and moved on to breaking mirrors.

Can you really force a umount? Something I'll have to investigate.

But I will keep fuser in mind for next time. :)

Quote:
That policy is fine for the first few hundred questions. Then what? How many more questions will there be? 1,000 more? 10,000 more? Do you abort an fsck that might have had one more question to ask? Is it safe to abort fsck and restart it with -y? A few sessions like this and that "fsck -n" starts to look pretty good. "fsck -n" will only take a few seconds if everything is ok. And it can always be safely aborted since it doesn't change anything. But if you don't like "fsck -n", my other trick is jamming the "y" key down with a paper wad. :)
You can use fsck -y :)

Seriously though, if there are 1000 questions, what are you going to do by answering 'n' to all of them except know that there are a bleeding lot of them? Especially when there's some guy on the other side of the phone who has absolutely no idea what's scrolling off the console or even worse, if I'm tipping in and can't interrupt. Do I go away for coffee then and come back in 2 hours?

I figure that after 20 or so 'y's, there might be something even more serious going on and I may have to consider newfs'ing the slice and copying over the apparently good slice, or hoping there's a good backup somewhere.

And another couple of data points that I didn't think important for the initial question:

1) The keyboard wasn't a Sun keyboard. It was, I believe, an MVS type keyboard. Two rows of function keys along the top as near as I could understand. Chris couldn't find a break key and the control key was labeled something else. We had to use Ctrl+[ to generate the necessary escape sequence in vi since there wasn't an escape key either.

2) Also, the console connection was somewhat flaky. Kept throwing odd characters on the screen from time to time and locked the console up so that it had to be power cycled to recover.

Really though. In spite of all the other questions and suggestions, the real problem was, why didn't breaking the mirror succeed? Why couldn't I disassemble the mirror and bring the system back up on a single disk in the manner I described?
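For reference, the commonly documented sequence for unmirroring root on a system that still boots with a writable root looks roughly like the sketch below. It is not necessarily what was done in this case. The helper names are hypothetical, and the metadevice and slice names in the usage note are placeholders to be replaced with what `metastat -p` reports. `DRY_RUN` defaults to 1, so the commands are only printed.

```shell
#!/bin/sh
# Sketch: unmirror root on a system that still boots with a writable root.
# DRY_RUN defaults to 1, so commands are only printed; set DRY_RUN=0 to run them.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Step 1, before the reboot.
break_root_mirror() {
    mirror=$1   # the mirror mounted on /
    drop=$2     # submirror to detach (the one on the disk being removed)
    slice=$3    # slice under the submirror being kept
    run metadetach "$mirror" "$drop"
    # metaroot rewrites the root entry in /etc/vfstab and the rootdev
    # setting in /etc/system so that they point at the plain slice.
    run metaroot "/dev/dsk/$slice"
}

# Step 2, after rebooting onto the plain slice.
clear_root_mirror() {
    run metaclear -r "$1"   # removes the mirror and its remaining submirror
    run metaclear "$2"      # removes the detached submirror
}
```

For example, `break_root_mirror d30 d20 c0t0d0s0` prints the `metadetach` and `metaroot` commands for a hypothetical mirror d30 with submirror d20 being dropped. `metaroot` only handles the root filesystem; entries for /var, swap and the rest still have to be changed in /etc/vfstab by hand.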

Right now, mirroring is one of this company's backup schemes so this is important.

1. Tivoli for the paying customers.
It's interesting that user crontabs continue to function after a user's password expires on AIX and Red Hat, but stop processing on Solaris. This was a cause of backup failures on our Sun boxes. (This is the responsibility of another department so we don't get whacked.)

2. Mirrors to ensure system availability in the event of a failed disk.
Unless of course, we're unable to boot to the "good" disk for some reason.

3. flarchive for a jumpstart recovery if the system can't be mirrored.

Currently 2 and 3 are mutually exclusive. If the system isn't mirrored, we use flarchive. Not enough disk for flarchives of all the systems.

So is there a step I missed? Something else I should have tried? Granted, there might have been a simple problem in vfstab. I'll check that and check it on the other systems. I'll even post the vfstab here if it'll help. It sounds like I should have been able to umount /var. If true, I'll try harder to do that when I purposely break one of the lab boxes as a test.

Carl
# 10  
Old 12-27-2005
There are very few files that "the system" needs. Most files are opened by processes. wtmp is not held open by any properly functioning process. You still miss my point about fsck and I will just live with that. But I will point out that interrupting a process via tip is no problem. You just need to use stty to set your interrupt character to something usable.
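A small sketch of the stty suggestion, to be run in the shell on the far end of the tip connection. The helper name `set_intr` and the choice of Ctrl+X as the default are assumptions; any key the console path passes through cleanly will do.

```shell
#!/bin/sh
# Sketch: move the interrupt character to a key the console can actually send.
# set_intr is a hypothetical helper; '^X' (Ctrl+X) is an arbitrary default.
set_intr() {
    key=${1:-'^X'}
    if [ ! -t 0 ]; then
        echo "stdin is not a terminal; nothing changed" >&2
        return 1
    fi
    # Change the setting, then show the line of "stty -a" that confirms it.
    stty intr "$key" && stty -a | grep intr
}
```

After `set_intr '^X'`, pressing Ctrl+X interrupts the foreground process, and Ctrl+C is no longer treated as the interrupt character by that terminal.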
# 11  
Old 12-27-2005
Quote:
Originally Posted by BOFH
Isn't wtmpx the most likely issue here?
Not very likely actually as Perderabo pointed out.

Quote:
Can you really force a umount?
Yes

Quote:
But I will keep fuser in mind for next time
Good idea



Quote:
Do I go away for coffee then and come back in 2 hours?
Pretty much. You let it finish, then look in lost+found, which, if you're lucky, will contain nothing more than a few log file segments.

And another couple of data points that I didn't think important for the initial question:

Quote:
Really though. In spite of all the other questions and suggestions, the real problem was, why didn't breaking the mirror succeed? Why couldn't I disassemble the mirror and bring the system back up on a single disk in the manner I described?
Without knowing exactly what you did, it's really very difficult to say. I have used the procedure many, many times without problems.


Quote:
2. Mirrors to ensure system availability in the event of a failed disk.
Unless of course, we're unable to boot to the "good" disk for some reason.

3. flarchive for a jumpstart recovery if the system can't be mirrored.

Currently 2 and 3 are mutually exclusive. If the system isn't mirrored, we use flarchive. Not enough disk for flarchives of all the systems.
Carl
There is no reason at all why 2 and 3 above should be exclusive. Solaris supports any combination of the above during installation. Granted, in Solaris 8 you do need to be a little creative with a finish script, while in Solaris 9 there are jumpstart keywords which make it really easy. However, all these choices can be calculated using custom probes during installation, something that I have been doing for several years with great success.
# 12  
Old 12-27-2005
Quote:
Originally Posted by reborg
There is no reason at all why 2 and 3 above should be exclusive
Sorry. I wasn't clear. It's the policy here based on not having sufficient disk on the jumpstart server to hold all the images for all the servers.

And for completeness, here's the system vfstab from the explorer dump:

Code:
$ ls -la vfstab
-rw-------   1 carls    aixteam         453 Oct 29 2004  vfstab
$ more vfstab
#device         device          mount           FS      fsck    mount   mount
#to mount       to fsck         point           type    pass    at boot options
#
#/dev/dsk/c1d0s2 /dev/rdsk/c1d0s2 /usr          ufs     1       yes     -
fd      -       /dev/fd fd      -       no      -
/proc   -       /proc   proc    -       no      -
/dev/md/dsk/d32 -       -       swap    -       no      -
/dev/md/dsk/d30 /dev/md/rdsk/d30        /       ufs     1       no      -
/dev/md/dsk/d31 /dev/md/rdsk/d31        /var    ufs     1       no      -
/dev/md/dsk/d33 /dev/md/rdsk/d33        /home   ufs     2       yes     -
/dev/md/dsk/d34 /dev/md/rdsk/d34        /opt    ufs     2       yes     -
swap    -       /tmp    tmpfs   -       yes     -

Carl
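Since a stale vfstab is one of the suspects, here is a hedged sketch of how the metadevice entries in a copy of that file could be rewritten to plain slices when a mirror is taken apart by hand. The map file format and the function name `unmirror_vfstab` are invented for illustration, and the thread does not say which slices sit under d30 to d34, so the real pairs have to come from `metastat -p`.

```shell
#!/bin/sh
# Sketch: rewrite metadevice entries in a vfstab copy to plain slices.
# unmirror_vfstab and the map format are hypothetical. On Solaris, set AWK to
# nawk or /usr/xpg4/bin/awk, since the old /usr/bin/awk lacks -v.
AWK=${AWK:-awk}

unmirror_vfstab() {
    map=$1      # lines of "metadevice slice", one pair per line
    vfstab=$2   # a COPY of /etc/vfstab, never the live file
    "$AWK" -v map="$map" '
        BEGIN {
            OFS = "\t"
            # Load the metadevice-to-slice pairs.
            while ((getline line < map) > 0)
                if (split(line, f, " ") >= 2) slice[f[1]] = f[2]
        }
        /^#/ || NF == 0 { print; next }      # keep comments and blank lines
        {
            n = split($1, part, "/")
            md = part[n]                     # last path component, e.g. d30
            if ($1 ~ /^\/dev\/md\/dsk\// && (md in slice)) {
                $1 = "/dev/dsk/" slice[md]
                if ($2 != "-") $2 = "/dev/rdsk/" slice[md]
            }
            print
        }
    ' "$vfstab"
}
```

The output goes to stdout, so it can be compared against the original with `diff` before anything is copied into place. Lines that do not name a mapped metadevice, such as the /proc and tmpfs entries, pass through unchanged.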
# 13  
Old 12-27-2005
Quote:
Originally Posted by reborg
Pretty much, you let it finish, then look in lost+found which if you're lucky will contain nothing more than a few log file segments.
But if fsck -n doesn't do anything, there won't be anything in lost+found.

Quote:
Without knowing exactly what you did, it's really very difficult to say. I have used the procedure many, many times without problems.
I've used it several times without problems until this one which is why the questions. This is the first time I've had to try and talk someone through it who's not at least moderately familiar with Solaris.

I guess I'm fortunate that I haven't been in this position before.

Carl
# 14  
Old 12-27-2005
Quote:
Originally Posted by Perderabo
There are very few files that "the system" needs. Most files are opened by processes. wtmp is not held open by any properly functioning process.
So the basic concept here is that I should have been able to umount /var. Ok, I'll file that away. I'll be bringing up a lab box so I can test these out and be ready for next time.

Quote:
You still miss my point about fsck and I will just live with that.
I think I understand. You're using it as a troubleshooting step to see if there's a problem and just how bad the problem is. I'm not saying I won't do it in the future.

Quote:
But I will point out that interrupting a process via tip is no problem. You just need to use stty to set your interrupt character to something usable.
Well, ctrl+c interrupted the session to the point where we couldn't reestablish it. It seems the break sequence should be something different than the break sequence on the host system based on that response.

The other question would be how to restore the ability to reconnect. Is there a "hang-up" type of command or an open file somewhere?

Thanks for the responses though. I do want to learn how to be better at this. It does sound like I have to circle the concept a few times in order to fully understand it though. Sorry if I seem a little slow at attaining comprehension.

Carl