Unix Version 7 - man page for crash (v7 section 8)

CRASH(8)										 CRASH(8)

       crash - what to do when the system crashes

       This  section  gives  at least a few clues about how to proceed if the system crashes.  It
       can't pretend to be complete.

       Bringing it back up.  If the reason for the crash is not evident (see below  for  guidance
       on  `evident') you may want to try to dump the system if you feel up to debugging.  At the
       moment a dump can be taken only on magtape.  With a  tape  mounted  and	ready,	stop  the
       machine,  load address 44, and start.  This should write a copy of all of core on the tape
       with an EOF mark.  Caution: Any error is taken to mean the end of core has  been  reached.
       This  means that you must be sure the ring is in, the tape is ready, and the tape is clean
       and new.  If the dump fails, you can try again, but some of the registers  will	be  lost.
       See below for what to do with the tape.

       In restarting after a crash, always bring up the system single-user.  This is accomplished
       by following the directions in boot(8) as modified for  your  particular  installation;	a
       single-user  system  is	indicated  by  having  a particular value in the switches (173030
       unless you've changed init) as the system starts executing.  When it is running, perform a
       dcheck(1) and icheck(1) on all file systems which could have been in use at the time of the
       crash.  If any serious file system problems are found, they should be repaired.	When  you
       are  satisfied  with  the  health of your disks, check and set the date if necessary, then
       come up multi-user.  This is most easily accomplished by changing the single-user value in
       the switches to something else, then logging out by typing an EOT.

       To  even  boot  UNIX  at  all,  three  files (and the directories leading to them) must be
       intact.	First, the initialization program /etc/init must be present and  executable.   If
       it  is  not,  the  CPU  will loop in user mode at location 6.  For init to work correctly,
       /dev/tty8 and /bin/sh must be present.  If either does not  exist,  the	symptom  is  best
       described  as thrashing.  Init will go into a fork/exec loop trying to create a Shell with
       proper standard input and output.

       If you cannot get the system to boot, a runnable system must be	obtained  from	a  backup
       medium.	 The  root file system may then be doctored as a mounted file system as described
       below.  If there are any problems with the root file system, it is probably prudent to  go
       to a backup system to avoid working on a mounted file system.

       Repairing  disks.  The first rule to keep in mind is that an addled disk should be treated
       gently; it shouldn't be mounted unless necessary, and if it is very valuable yet in  quite
       bad shape, perhaps it should be dumped before trying surgery on it.  This is an area where
       experience and informed courage count for much.

       The problems reported by icheck typically fall into two kinds.  There can be problems with
       the  free  list:  duplicates in the free list, or free blocks also in files.  These can be
       cured easily with an icheck -s.	If the same block appears in more than one file or  if	a
       file  contains  bad  blocks, the files should be deleted, and the free list reconstructed.
       The best way to delete such a file is to use clri(1), then remove its  directory  entries.
       If any of the affected files is really precious, you can try to copy it to another device
       first.

       Dcheck may report files which have more directory entries than links.  Such situations are
       potentially  dangerous;	clri  discusses a special case of the problem.	All the directory
       entries for the file should be removed.	If on the other hand there are	more  links  than
       directory  entries,  there is no danger of spreading infection, but merely some disk space
       that is lost for use.  It is sufficient to copy the file (if it has  any  entries  and  is
       useful) then use clri on its inode and remove any directory entries that do exist.

       Finally,  there	may  be inodes reported by dcheck that have 0 links and 0 entries.  These
       occur on the root device when the system is stopped with pipes open,  and  on  other  file
       systems	when the system stops with files that have been deleted while still open.  A clri
       will free the inode, and an icheck -s will recover any missing blocks.
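
       The three dcheck cases above can be summarized as a small decision table.  The sketch
       below is only an illustration: the function name dcheck_action is hypothetical, and it
       assumes the entry and link counts have already been read off the dcheck report by hand.

```python
def dcheck_action(entries, links):
    """Suggest a repair for one inode from dcheck's entry and link counts.

    Hypothetical helper: it only restates the advice in the text and
    does not touch any file system.
    """
    if entries == 0 and links == 0:
        # Left behind by open pipes or by files deleted while still open.
        return "clri the inode, then icheck -s to recover any missing blocks"
    if entries > links:
        # Potentially dangerous: more directory entries than links.
        return "remove all directory entries for the file"
    if links > entries:
        # Harmless, but the disk space is lost until the inode is cleared.
        return "copy the file if useful, clri the inode, remove remaining entries"
    return "no action: counts agree"


if __name__ == "__main__":
    for counts in ((0, 0), (3, 2), (1, 2), (2, 2)):
        print(counts, "->", dcheck_action(*counts))
```

       The order of the tests matters only for the 0/0 case, which has to be recognized before
       the two comparisons.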

       Why did it crash?  UNIX types a message on the  console	typewriter  when  it  voluntarily
       crashes.   Here is the current list of such messages, with enough information to provide a
       hope at least of the remedy.  The message has the form `panic: ...', possibly  accompanied
       by  other  information.	 Left  unstated  in all cases is the possibility that hardware or
       software error produced the message in some unexpected way.

       blkdev
	    The getblk routine was called with a nonexistent major device as argument.  Definitely
	    hardware or software error.

       devtab
	    Null device table entry for the major device used as argument to getblk.  Definitely
	    hardware or software error.

       iinit
	    An I/O error reading the super-block for the root file system during initialization.

       out of inodes
	    A mounted file system has no more i-nodes when creating a file.   Sorry,  the  device
	    isn't available; the icheck should tell you.

       no fs
	    A device has disappeared from the mounted-device table.  Definitely hardware or
	    software error.

       no imt
	    Like `no fs', but produced elsewhere.

       no inodes
	    The in-core inode table is full.  Try increasing NINODE in param.h.  Shouldn't  be	a
	    panic, just a user error.

       no clock
	    During initialization, neither the line nor programmable clock was found to exist.

       swap error
	    An	unrecoverable  I/O  error  during a swap.  Really shouldn't be a panic, but it is
	    hard to fix.

       unlink - iget
	    The directory containing a file being deleted can't be found.  Hardware or software.

       out of swap space
	    A program needs to be swapped out, and there is no more swap space.   It  has  to  be
	    increased.	This really shouldn't be a panic, but there is no easy fix.

       out of text
	    A  pure  procedure	program is being executed, and the table for such things is full.
	    This shouldn't be a panic.

       trap
	    An unexpected trap has occurred within the system.  This is accompanied by three
	    numbers: a `ka6', which is the contents of the segmentation register for the area in
	    which the system's stack is kept; `aps', which is the location where the hardware
	    stored the program status word during the trap; and a `trap type' which encodes which
	    trap occurred.  The trap types, in octal, are:

       0	 bus error
       1	 illegal instruction
       2	 BPT/trace
       3	 IOT
       4	 power fail
       5	 EMT
       6	 recursive system call (TRAP instruction)
       7	 11/70 cache parity, or programmed interrupt
       10	 floating point trap
       11	 segmentation violation

       In some of these cases it is possible for octal 20 to be added into the	trap  type;  this
       indicates  that	the  processor	was  in user mode when the trap occurred.  If you wish to
       examine the stack after such a trap, either dump the system, or use the	console  switches
       to examine core; the required address mapping is described below.
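
       Decoding the trap type by hand is easy to get wrong because of the user-mode bit.  The
       following is a minimal sketch, assuming the number on the console is printed in octal
       (as the codes 10 and 11 in the table suggest).  The names TRAP_TYPES, USER_MODE and
       decode_trap are hypothetical and are not part of the system.

```python
# Trap type names, keyed by the octal codes listed in the table above.
TRAP_TYPES = {
    0o0: "bus error",
    0o1: "illegal instruction",
    0o2: "BPT/trace",
    0o3: "IOT",
    0o4: "power fail",
    0o5: "EMT",
    0o6: "recursive system call (TRAP instruction)",
    0o7: "11/70 cache parity, or programmed interrupt",
    0o10: "floating point trap",
    0o11: "segmentation violation",
}

# Octal 20 is added to the trap type when the processor was in user mode.
USER_MODE = 0o20


def decode_trap(text):
    """Decode a trap type given as an octal string.

    Returns (name, from_user). Assumes the console prints the value
    in octal; unknown codes are reported rather than guessed.
    """
    value = int(text, 8)
    from_user = bool(value & USER_MODE)
    name = TRAP_TYPES.get(value & ~USER_MODE, "unknown trap type")
    return name, from_user


if __name__ == "__main__":
    for code in ("0", "11", "31"):
        print(code, decode_trap(code))
```

       For example, a trap type of 31 is octal 20 plus 11: a segmentation violation taken in
       user mode.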

       Interpreting dumps.  All file system problems should be taken care of before attempting to
       look at dumps.  The dump should be read into the file /usr/sys/core; cp(1)  will  do.   At
       this  point,  you should execute ps -alxk and who to print the process table and the users
       who were on at the time of the crash.  You should dump (od(1)) the first 30 bytes of
       /usr/sys/core.  Starting at location 4, the registers R0, R1, R2, R3, R4, R5, SP and KDSA6
       (KISA6 for 11/40s) are stored.  If the dump had to be restarted, R0 will not be correct.
       Next, take the value of KA6 (location 022 octal in the dump), multiply it by 0100 octal,
       and dump 01000 octal words (02000 bytes) starting from there.  This is the per-process
       data associated with the process running at the time of the crash.  Relabel the
       addresses as 140000 to 141776 (octal), the range those 02000 bytes occupy.  R5 is
       C's frame or display pointer.  Stored at (R5) is the old R5 pointing to the previous stack
       frame.	At  (R5)+2  is	the  saved PC of the calling procedure.  Trace this calling chain
       until you obtain an R5 value of 141756, which is where the user's R5 is	stored.   If  the
       chain  is  broken,  you	have to look for a plausible R5, PC pair and continue from there.
       Each PC should be looked up in the system's name list using adb(1) and its `:' command, to
       get  a  reverse	calling order.	In most cases this procedure will give an idea of what is
       wrong.  A more complete discussion of system debugging is impossible here.
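
       The arithmetic in the procedure above can be sketched as follows.  This is an
       illustration under stated assumptions, not a replacement for od(1) and adb(1): it
       assumes the dump file holds core byte for byte with 16-bit words in PDP-11
       (little-endian) order, and it sizes the per-process area from the relabeled range
       140000 to 141776, that is 02000 octal bytes.  All function and constant names are
       hypothetical.

```python
import struct

REG_NAMES = ("R0", "R1", "R2", "R3", "R4", "R5", "SP", "KA6")
REG_BASE = 4              # registers are stored starting at location 4
U_BASE = 0o140000         # address the per-process area is relabeled to
U_SIZE = 0o2000           # bytes spanned by 140000 .. 141776 (assumption)
USER_R5_SLOT = 0o141756   # where the user's R5 is stored; ends the chain


def read_registers(core):
    """Return the eight saved words as a dict; KA6 lands at location 022."""
    words = struct.unpack_from("<8H", core, REG_BASE)
    return dict(zip(REG_NAMES, words))


def u_area(core, ka6):
    """Slice out the per-process data starting at KA6 * 0100."""
    start = ka6 * 0o100
    return core[start:start + U_SIZE]


def walk_frames(u, r5):
    """Follow the R5 chain and return the saved PCs, innermost call first.

    (R5) holds the old R5 and (R5)+2 the saved PC of the caller.
    Stops early if the chain leaves the area or loops, in which case a
    plausible R5, PC pair has to be found by hand.
    """
    pcs = []
    seen = set()
    while r5 != USER_R5_SLOT:
        off = r5 - U_BASE
        if r5 in seen or off < 0 or off + 4 > len(u):
            break
        seen.add(r5)
        r5, pc = struct.unpack_from("<2H", u, off)
        pcs.append(pc)
    return pcs


if __name__ == "__main__":
    with open("/usr/sys/core", "rb") as f:
        core = f.read()
    regs = read_registers(core)
    for name in REG_NAMES:
        print(name, oct(regs[name]))
    for pc in walk_frames(u_area(core, regs["KA6"]), regs["R5"]):
        print("saved PC", oct(pc))
```

       Each saved PC printed this way still has to be looked up in the system's name list to
       turn it into a routine name.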

       clri(1), icheck(1), dcheck(1), boot(8)
