Solaris Boot Problems, random messages [/etc/rcS: /etc/dfs/sharetab: cannot create]


 
# 1  
Old 02-02-2008

Hello All,

I have all of a sudden developed issues with booting up one of my Solaris [V240] Servers. Upon a routine reboot, I was faced with the following errors:

Feb 1 07:56:44 sco1-au-tci scsi: WARNING: /pci@1c,600000/scsi@2/sd@0,0 (sd0):
Feb 1 07:56:44 sco1-au-tci Error for Command: read(10) Error Level: Retryable
Feb 1 07:56:44 sco1-au-tci scsi: Requested Block: 114007888 Error Block: 114007903
Feb 1 07:56:44 sco1-au-tci scsi: Vendor: SEAGATE Serial Number: 053532DN34
Feb 1 07:56:44 sco1-au-tci scsi: Sense Key: Media Error
Feb 1 07:56:44 sco1-au-tci scsi: ASC: 0x11 (unrecovered read error), ASCQ: 0x0, FRU: 0xf
Feb 1 07:56:45 sco1-au-tci scsi: WARNING: /pci@1c,600000/scsi@2/sd@0,0 (sd0):
Feb 1 07:56:45 sco1-au-tci Error for Command: read(10) Error Level: Fatal
Feb 1 07:56:45 sco1-au-tci scsi: Requested Block: 114007888 Error Block: 114007903
Feb 1 07:56:45 sco1-au-tci scsi: Vendor: SEAGATE Serial Number: 053532DN34
Feb 1 07:56:45 sco1-au-tci scsi: Sense Key: Media Error
Feb 1 07:56:45 sco1-au-tci scsi: ASC: 0x11 (unrecovered read error), ASCQ: 0x0, FRU: 0xf

So I figured, Oh ****...the disk is messed up. However, a few checks said otherwise: 'iostat -En' showed ALL error counters at '0', and the format -> analyze -> read test, which ran for about 10 hours, came back with 0 errors found to be repaired. So it appears nothing in particular is wrong with my hardware. After the 2nd reboot I no longer got the errors above, but now I can't get past single-user mode. I get the following errors:

mount: the state of /dev/dsk/c1t0d0s0 is not okay
and it was attempted to be mounted read/write
mount: Please run fsck and try again
/sbin/rcS: /etc/dfs/sharetab: cannot create
failed to open /etc/coreadm.confsyseventd: Unable to open daemon lock file '/etc/sysevent/syseventd_lock': 'Read-only file system'
INIT: Cannot create /var/adm/utmpx

INIT: failed write of utmpx entry:" "

INIT: failed write of utmpx entry:" "

INIT: SINGLE USER MODE

Type control-d to proceed with normal startup,
(or give root password for system maintenance):
single-user privilege assigned to /dev/console.
Entering System Maintenance Mode

I am unable to run fsck because this drive holds an image of a corrupted drive (one with a bunch of unreadable sectors/blocks). I used ufsdump/ufsrestore to copy it over, which obviously left a gaping hole at the tracks/sectors where the original, corrupted disk was unreadable. So even though the server does its job without any problems, fsck won't run and gives me a message like:

[root@sol8-ssw01 /]# fsck -y /dev/rdsk/c1t0d0s0
** /dev/rdsk/c1t0d0s0

CANNOT READ: BLK 143278112
CONTINUE? yes

THE FOLLOWING SECTORS COULD NOT BE READ: 143278112 143278113 143278114 143278115

I have read a whole bunch of stuff that I found on Google, like /var being full (it's not), the WWN not matching between vfstab, /dev, and the /devices directory, etc. I don't know what is wrong and I don't know what to do to fix this. Any ideas as to why this happened and what I can do?

PLEASE HELP!!!
# 2  
Old 02-02-2008
Have you tried another disk?

I'm also curious about the "routine reboots". Do you routinely reboot Solaris servers? Why?
# 3  
Old 02-02-2008
Solaris has the command "iostat -E" which reports hardware errors. I suggest the OP run that.

System Shock, I am favorably inclined towards routine reboots. My last employer's Data Center went down due to power problems (despite a super-UPS and an on-site generator!) and dozens of boxes which had been up for months did not reboot. Various changes had been made and no one had tested the startup scripts. Some of the boxes did not reboot because the battery in the id-prom had died. I finally figured out how to get them up, but this left them in a state where they would be unbootable should power drop again. Rebooting a few boxes at a time each week would have exposed those issues. Another time, we had to take a box down to move it and we noticed it had a .reconfigure in /. The guy who put it there had left over a year ago. We had no idea what the reboot would bring. Also, we were unable to install security patches because they would almost always reboot a box. If we have a reboot schedule, we can have a reasonable patch management policy.
# 4  
Old 02-02-2008
System Shock: I don't think the situation is at the point of trying new disks. If I had to do that I wouldn't be posting my question anywhere. I only replace disks when I know for sure that the problem is the disk and not something else. Not to mention that we don't have an on-site ops team and I live on a different continent from where the servers reside; add that it's a weekend and there's a 16-hour time difference, and the 'replace disk' card is not one I can play often or casually. As for 'routine reboot', it's exactly as Perderabo said. It exposes a lot of problems that one would never have caught otherwise.

Perderabo: iostat -En was the first thing I tried, and I said in my original message that it came back with 0 (zero) errors on ALL lines. Plus format -> analyze -> read showed no errors, so I'm guessing it's not the disk. Also, the media errors only showed up once and did not reappear after subsequent reboots, which they would have if the disk were damaged.
# 5  
Old 02-02-2008
Darren Dunham already gave you pretty much everything you needed to know about this elsewhere.

This is your disk:
Code:
[root@sol8-ssw01 /]# prtvtoc -s /dev/rdsk/c1t0d0s0
*                          First     Sector    Last
* Partition  Tag  Flags    Sector     Count    Sector  Mount Directory
       0      2    00          0 141476928 141476927
       1      3    01  141476928   1872384 143349311
       2      5    00          0 143349312 143349311

141476927 < 143278112

As you can see from this you have tried to restore a dump which contains more data than can fit in the slice you tried to restore it into. Re-layout the disk and try again.
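The comparison above can be sketched in a few lines of Python. This is only an illustration, not a diagnostic tool: the slice table is copied by hand from the prtvtoc output, the helper names `slices_containing` and `overshoot` are made up for this example, and it assumes that the block number fsck reports is a sector offset from the start of slice 0 (which begins at sector 0 here, so it coincides with the absolute sector number).

```python
# Slice table copied from the prtvtoc output above:
# (partition, first sector, sector count)
SLICES = [
    (0, 0, 141476928),
    (1, 141476928, 1872384),
    (2, 0, 143349312),  # tag 5: the conventional whole-disk "backup" slice
]


def slices_containing(sector, slices=SLICES):
    """Return the partition numbers whose sector range includes `sector`."""
    return [part for part, first, count in slices
            if first <= sector < first + count]


def overshoot(sector, part, slices=SLICES):
    """Return how many sectors `sector` lies past the last sector of `part`.

    Returns 0 when the sector is at or before the slice's last sector.
    """
    for p, first, count in slices:
        if p == part:
            last = first + count - 1
            return max(0, sector - last)
    raise ValueError(f"no such partition: {part}")


if __name__ == "__main__":
    bad_sector = 143278112  # first sector fsck could not read
    print("slices containing it:", slices_containing(bad_sector))
    print("sectors past the end of slice 0:", overshoot(bad_sector, 0))
```

Under those assumptions, the unreadable sector is outside slice 0 and inside the range covered by slice 1 (and by the whole-disk slice 2), which is consistent with fsck being asked to read beyond the end of the slice holding the file system.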
# 6  
Old 02-02-2008
Hi reborg,

Thanks for that. I was waiting on Darren to get back to me to confirm that I'm reading / understanding it correctly. What's confusing is that the ufsdump/ufsrestore was done from a disk with the exact same geometry / model / size etc. The partition table was copied from that disk as well, so I don't know how there could be more data than the original slice held. Also, would re-laying out the disk mean I have to reinstall everything from scratch, including OS/applications etc.?

The 2nd question is: will fixing the partitions, and presumably getting fsck to run, fix my original problem of not being able to boot up? Mind you, this server has been successfully booted/rebooted in the past with the same partitioning. It was up for about 178 days and I rebooted it just for maintenance, but ran into these errors. They somehow appeared all by themselves during the period it was running fat and happy.

Any thoughts on the original problem?

Thanks
\R

Last edited by ranjtech; 02-02-2008 at 05:41 PM.. Reason: additional query
# 7  
Old 02-02-2008
Quote:
Originally Posted by Perderabo

System Shock, I am favorably inclined towards routine reboots. My last employer's Data Center went down due to power problems (despite a super-UPS and an on-site generator!) and dozens of boxes which had been up for months did not reboot. Various changes had been made and no one had tested the startup scripts. Some of the boxes did not reboot because the battery in the id-prom had died. I finally figured out how to get them up, but this left them in a state where they would be unbootable should power drop again. Rebooting a few boxes at a time each week would have exposed those issues. Another time, we had to take a box down to move it and we noticed it had a .reconfigure in /. The guy who put it there had left over a year ago. We had no idea what the reboot would bring. Also, we were unable to install security patches because they would almost always reboot a box. If we have a reboot schedule, we can have a reasonable patch management policy.

You are in Rockville.. that total loss of power, did it happen in a data center around Beltsville, by any chance?