Solaris Boot Problems, random messages [/etc/rcS: /etc/dfs/sharetab: cannot create]

02-02-2008


Hello All,

One of my Solaris servers [V240] has suddenly developed boot problems. After a routine reboot, I was faced with the following errors:

Feb 1 07:56:44 sco1-au-tci scsi: WARNING: /pci@1c,600000/scsi@2/sd@0,0 (sd0):
Feb 1 07:56:44 sco1-au-tci Error for Command: read(10) Error Level: Retryable
Feb 1 07:56:44 sco1-au-tci scsi: Requested Block: 114007888 Error Block: 114007903
Feb 1 07:56:44 sco1-au-tci scsi: Vendor: SEAGATE Serial Number: 053532DN34
Feb 1 07:56:44 sco1-au-tci scsi: Sense Key: Media Error
Feb 1 07:56:44 sco1-au-tci scsi: ASC: 0x11 (unrecovered read error), ASCQ: 0x0, FRU: 0xf
Feb 1 07:56:45 sco1-au-tci scsi: WARNING: /pci@1c,600000/scsi@2/sd@0,0 (sd0):
Feb 1 07:56:45 sco1-au-tci Error for Command: read(10) Error Level: Fatal
Feb 1 07:56:45 sco1-au-tci scsi: Requested Block: 114007888 Error Block: 114007903
Feb 1 07:56:45 sco1-au-tci scsi: Vendor: SEAGATE Serial Number: 053532DN34
Feb 1 07:56:45 sco1-au-tci scsi: Sense Key: Media Error
Feb 1 07:56:45 sco1-au-tci scsi: ASC: 0x11 (unrecovered read error), ASCQ: 0x0, FRU: 0xf

So I figured, oh ****... the disk is messed up. However, 'iostat -En' showed ALL error counters at 0. In addition, I ran the format -> analyze -> read test, which ran for about 10 hours and came back reporting 0 errors found or repaired. So it appears nothing in particular is wrong with my hardware. After the second reboot the errors above no longer appeared, but now I can't get past single-user mode. I get the following errors:

mount: the state of /dev/dsk/c1t0d0s0 is not okay
and it was attempted to be mounted read/write
mount: Please run fsck and try again
/sbin/rcS: /etc/dfs/sharetab: cannot create
failed to open /etc/coreadm.confsyseventd: Unable to open daemon lock file '/etc/sysevent/syseventd_lock': 'Read-only file system'
INIT: Cannot create /var/adm/utmpx

INIT: failed write of utmpx entry:" "

INIT: failed write of utmpx entry:" "

INIT: SINGLE USER MODE

Type control-d to proceed with normal startup,
(or give root password for system maintenance):
single-user privilege assigned to /dev/console.
Entering System Maintenance Mode

I am unable to run fsck because this drive holds an image of a corrupted drive (which had a bunch of unreadable sectors/blocks). I used ufsdump/ufsrestore to copy it over, which obviously left a gaping hole at the tracks/sectors where the original, corrupted disk was unreadable. So although the server does its job without any problems, fsck will not run and gives me a message like this:

[root@sol8-ssw01 /]# fsck -y /dev/rdsk/c1t0d0s0
** /dev/rdsk/c1t0d0s0

CANNOT READ: BLK 143278112
CONTINUE? yes

THE FOLLOWING SECTORS COULD NOT BE READ: 143278112 143278113 143278114 143278115

I have read a whole bunch of things I found on Google, such as /var being full (it's not) or the WWN not matching between vfstab, /dev, and the /devices directory. I don't know what is wrong and I don't know how to fix it. Any ideas as to why this happened and what I can do?

PLEASE HELP!!!

ranjtech


02-02-2008


Have you tried another disk?

I'm also curious about the "routine reboots". Do you routinely reboot Solaris servers? Why?

System Shock


02-02-2008


Solaris has the command "iostat -E" which reports hardware errors. I suggest the OP run that.

System Shock, I am favorably inclined towards routine reboots. My last employer's data center went down due to power problems (despite a super-UPS and an on-site generator!) and dozens of boxes which had been up for months did not reboot. Various changes had been made and no one had tested the startup scripts. Some of the boxes did not reboot because the battery in the id-prom had died. I finally figured out how to get them up, but this left them in a state where they would be unbootable should power drop again. Rebooting a few boxes at a time each week would have exposed those issues. Another time, we had to take a box down to move it and we noticed it had a .reconfigure in /. The guy who put it there had left over a year before. We had no idea what the reboot would bring. Also, we were unable to install security patches because they would almost always reboot a box. If we have a reboot schedule, we can have a reasonable patch management policy.

Perderabo


02-02-2008


System Shock: I don't think the situation is at the point of trying new disks. If it were, I wouldn't be posting my question anywhere. I only replace a disk when I know for sure that the disk is the problem and not something else. On top of that, we don't have an on-site ops team, I live on a different continent from the servers, it is a weekend, and the time difference is 16 hours, so the 'replace disk' card isn't one I can play often or casually. As for 'routine reboot', it's exactly as Perderabo said: it exposes a lot of problems that one would otherwise never have caught.

Perderabo: 'iostat -En' was the first thing I tried, and as I said in my original message, it came back with 0 (zero) errors on ALL lines. Plus format -> analyze -> read showed no errors, so I'm guessing it's not the disk. Also, the media errors only showed up once and have not reappeared on subsequent reboots, which they would if the disk were damaged.

ranjtech


02-02-2008


Darren Dunham already gave you pretty much everything you needed to know about this elsewhere.

This is your disk:

Code:

[root@sol8-ssw01 /]# prtvtoc -s /dev/rdsk/c1t0d0s0
*                          First     Sector      Last
* Partition  Tag  Flags    Sector     Count     Sector  Mount Directory
       0      2    00          0  141476928  141476927
       1      3    01  141476928    1872384  143349311
       2      5    00          0  143349312  143349311

141476927 < 143278112

The last sector of slice 0 (141476927) is lower than the block fsck could not read (143278112). As you can see from this, you have tried to restore a dump which contains more data than can fit in the slice you tried to restore it into. Re-layout the disk and try again.
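The comparison above can be checked mechanically. The following is a minimal Python sketch, not part of the original reply: it parses the three slice lines from the prtvtoc output and tests whether the block fsck reported lies inside slice 0. The helper names (`parse_prtvtoc`, `block_fits`, `sectors_past_end`) are hypothetical, and the sketch assumes that the BLK number printed by fsck is a 512-byte sector offset counted from the start of the slice being checked.

```python
# Slice lines copied from the prtvtoc -s output in this thread.
PRTVTOC = """\
0 2 00 0 141476928 141476927
1 3 01 141476928 1872384 143349311
2 5 00 0 143349312 143349311
"""


def parse_prtvtoc(text):
    """Return {partition: (first_sector, sector_count, last_sector)}."""
    slices = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the '*' comment/header lines prtvtoc prints.
        if not line or line.startswith("*"):
            continue
        fields = line.split()
        part = int(fields[0])
        first, count, last = int(fields[3]), int(fields[4]), int(fields[5])
        slices[part] = (first, count, last)
    return slices


def block_fits(slices, part, block):
    """True if a slice-relative block number lies inside the slice.

    Assumption: 'block' is counted in sectors from the start of the slice.
    """
    _first, count, _last = slices[part]
    return 0 <= block < count


def sectors_past_end(slices, part, block):
    """How far past the last valid slice-relative sector the block lies."""
    _first, count, _last = slices[part]
    return block - (count - 1)


if __name__ == "__main__":
    slices = parse_prtvtoc(PRTVTOC)
    bad_block = 143278112  # from "CANNOT READ: BLK 143278112"
    print("fits in slice 0:", block_fits(slices, 0, bad_block))
    print("sectors past end:", sectors_past_end(slices, 0, bad_block))
```

Under that assumption the block falls outside slice 0 but would fall inside slice 2 (the whole disk), which is consistent with a filesystem that believes it is larger than the slice it sits on.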

reborg


02-02-2008


Hi reborg,

Thanks for that. I was waiting on Darren to get back to me to confirm that I'm reading and understanding it correctly. What's confusing is that the ufsdump/ufsrestore was done from a disk with the exact same geometry, model, size etc. The partition table was copied from that disk as well, so I don't see how there can be more data than the original slice would have held. Also, would re-laying out the disk mean reinstalling everything from scratch, including the OS and applications?

The second question: is fixing the partitions, and presumably getting fsck to run, going to fix my original problem of not being able to boot? Mind you, this server has been successfully booted and rebooted in the past with the same partitioning. It was up for about 178 days and I rebooted it during maintenance, but ran into these errors. They somehow appeared all by themselves during the period it was running fat and happy.

Any thoughts on the original problem?

Thanks
\R

Last edited by ranjtech; 02-02-2008 at 05:41 PM.. Reason: additional query

ranjtech


02-02-2008


Quote:

Originally Posted by Perderabo

System Shock, I am favorably inclined towards routine reboots. My last employer's data center went down due to power problems (despite a super-UPS and an on-site generator!) and dozens of boxes which had been up for months did not reboot. Various changes had been made and no one had tested the startup scripts. Some of the boxes did not reboot because the battery in the id-prom had died. I finally figured out how to get them up, but this left them in a state where they would be unbootable should power drop again. Rebooting a few boxes at a time each week would have exposed those issues. Another time, we had to take a box down to move it and we noticed it had a .reconfigure in /. The guy who put it there had left over a year before. We had no idea what the reboot would bring. Also, we were unable to install security patches because they would almost always reboot a box. If we have a reboot schedule, we can have a reasonable patch management policy.

You are in Rockville... Did that total loss of power happen in a data center around Beltsville, by any chance?

System Shock
