01-24-2011
Well, I guess electrical issues would be the biggest. If the boxes shut down gracefully, there is no obvious reason why they wouldn't boot again gracefully.

First you'd have to identify which boxes are down (boxes on a UPS may have survived, or some may be down depending on the UPS battery time). I would recommend keeping a list of boxes to check for up/down status. An SNMP monitoring system may be useful for checking hosts, network appliances and services.

Once you have identified which services need to be started, you need to work out in what order (network switches, DNS, DHCP, SAN/NAS boxes, Active Directory, file servers, etc.). Make sure they booted OK before you move on to secondary services. Create a document detailing the definitive boot order and how you would test each of these services to make sure it is working. Once they are up, list the secondary services you would need to restart and how to test that they are working. On UNIX hosts check /var/log/messages (or the appropriate syslog entries); on Windows check Event Viewer to confirm that everything is running OK.

To be honest, you can't really predict why services may be down, so it is hard to preempt that. You should make sure you have all the necessary documentation, including error messages, for all the services you run, so that in an emergency you can find it quickly. You could also build a plan for what you would do if a piece (or multiple pieces) of hardware has failed, e.g. spare hardware, restore documentation, etc.

Keep a telephone list of people who may be called upon to fix hardware or software services in an emergency. Keep a list of hardware serial numbers, contracts, SLAs and emergency callout numbers for your hardware and software vendors, so that you can call them in an emergency to get things fixed.

Virtual machines are very useful, as you can have two or more host machines with standby virtual images containing up-to-date backups that can be started if a given piece of hardware has died. VMware, for example, allows you to create pools of virtual machine hosts that can take over functionality easily and quickly should one fail.

Otherwise I would get a book on the subject or google the subject as a whole, as I'm sure there are major areas I've missed. I hope this helps...
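As a rough illustration of the "list of boxes to check, in boot order" idea above, here is a minimal Python sketch. The host names, addresses and ports are hypothetical placeholders (the addresses come from the documentation range 192.0.2.0/24), and a successful TCP connection only shows that a port is open, not that the service behind it is healthy.

```python
import socket

# Hypothetical inventory in boot order: (name, address, TCP port to probe).
# Replace with your own hosts; these values are placeholders only.
BOOT_ORDER = [
    ("core-switch", "192.0.2.1", 22),    # management SSH
    ("dns", "192.0.2.10", 53),           # DNS over TCP
    ("directory", "192.0.2.20", 389),    # LDAP / Active Directory
    ("fileserver", "192.0.2.30", 445),   # SMB
]


def is_up(address, port, timeout=2.0):
    """Return True if a TCP connection to address:port succeeds."""
    try:
        with socket.create_connection((address, port), timeout=timeout):
            return True
    except OSError:
        return False


def status_report(hosts, probe=is_up):
    """Return a list of (name, is_up) pairs, preserving boot order."""
    return [(name, probe(address, port)) for name, address, port in hosts]


def first_down(hosts, probe=is_up):
    """Return the name of the first host in boot order that is down, or None."""
    for name, address, port in hosts:
        if not probe(address, port):
            return name
    return None


if __name__ == "__main__":
    for name, up in status_report(BOOT_ORDER):
        print(f"{name:12} {'up' if up else 'DOWN'}")
    blocker = first_down(BOOT_ORDER)
    if blocker:
        print(f"Fix '{blocker}' before moving on to later services.")
```

The `probe` parameter is there so the ordering logic can be exercised without touching the network; a UDP-only service such as DHCP would need a different kind of check.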
POWERD(8) System Manager's Manual POWERD(8)
NAME
powerd - UPS monitoring daemon
SYNOPSIS
/sbin/powerd [tty]
DESCRIPTION
powerd monitors the serial port connected to a UPS device and will perform an unattended shutdown of the system if the UPS is on battery
longer than a specified number of minutes. powerd needs to watch a tty with modem control properties. Please refer to the powerd
documentation for further information.
powerd can also notify other clients on the network, which may be powered by a UPS but not connected to its serial line, that
there is a power outage. It is configured through the powerd.conf file.
CONFIGURATION FILE
Here is the configuration format:
Lines beginning with '#' are ignored.
MODE <mode>
Specifies the operating mode. Valid arguments are MONITOR and PEER: MONITOR is the mode that actually watches a UPS serial
port, and PEER listens for a connection from a machine in MONITOR mode.
MONITOR <device>
Specifies which device to monitor while in MONITOR mode. Specify an actual device file. Example: /dev/ttyS0
POWERFAIL <line> <high|low>
Specifies which line on the serial port indicates that the power is out. Valid arguments are DCD, CAR, CTS, and RNG. Also specify
whether the line being HIGH or LOW indicates a power failure.
Since most people may not know this argument, please use the enclosed upsdetect program to find this line automatically.
NOTIFY <hostname[:port]> <password>
Specified in MONITOR mode to notify a client running in PEER mode. Specify the hostname of the machine, the optional port the daemon
is running on, and the password as specified by its LISTEN command. See below for more details.
LISTEN <hostname> <password>
Specified in PEER mode. Specifies a hostname that is allowed to notify us when the power is out, and the password it must give
us to authenticate itself. The two passwords must match: the NOTIFY password on the MONITOR-mode machine and the LISTEN
password on the PEER-mode machine.
LISTENPORT <port>
Specified in PEER mode. Specifies the port that powerd should listen on. If you use this argument, powerd will not default to
port 532, and the machine in MONITOR mode must specify the port you use in its NOTIFY command.
DELAY <delayinseconds>
Specifies how many seconds to wait before notifying init of a power outage. Note that this doesn't mean that the system will shut
down in that many seconds, as that depends on how init is configured. Init usually issues a 2-minute shutdown.
USER <username>
Specifies which username to drop to from root. The program will reobtain root access only when it needs to, such as when notifying init
that the power is out. This is simply a security feature and is not needed for powerd to operate. Note: powerd must still be run
initially as root. It will then drop to the user if, and only if, a username is specified.
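The configuration format described above can be illustrated with a small Python sketch that reads powerd.conf-style text and checks the two keywords whose valid values are listed in this page (MODE and POWERFAIL). This is not part of powerd. The sample values (hostname, password, delay, user) are hypothetical, and accepting HIGH/LOW and the line names case-insensitively, as well as skipping blank lines, are assumptions of this sketch rather than documented behaviour.

```python
VALID_MODES = {"MONITOR", "PEER"}
VALID_LINES = {"DCD", "CAR", "CTS", "RNG"}
VALID_LEVELS = {"HIGH", "LOW"}

# Hypothetical MONITOR-mode configuration; every value is a placeholder.
SAMPLE = """\
# Watch the UPS on the first serial port
MODE MONITOR
MONITOR /dev/ttyS0
POWERFAIL DCD LOW
NOTIFY peerhost.example.com:532 secret
DELAY 60
USER nobody
"""


def parse_powerd_conf(text):
    """Parse powerd.conf-style text into {keyword: [argument lists]}.

    Lines beginning with '#' are ignored, as the format above states.
    Blank lines are also skipped (an assumption of this sketch).
    """
    settings = {}
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        keyword, *args = line.split()
        if keyword == "MODE":
            if len(args) != 1 or args[0].upper() not in VALID_MODES:
                raise ValueError(f"line {number}: MODE must be MONITOR or PEER")
        elif keyword == "POWERFAIL":
            if (len(args) != 2
                    or args[0].upper() not in VALID_LINES
                    or args[1].upper() not in VALID_LEVELS):
                raise ValueError(
                    f"line {number}: POWERFAIL needs <line> <high|low>")
        settings.setdefault(keyword, []).append(args)
    return settings


if __name__ == "__main__":
    for keyword, values in parse_powerd_conf(SAMPLE).items():
        print(keyword, values)
```

A matching PEER-mode machine would, per the descriptions above, use MODE PEER with a LISTEN line carrying the same password.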
ARGUMENTS
None. Please use the configuration file /etc/powerd.conf, which can be generated with detectups. See detectups(8) for more information.
FILES
/etc/powerd.conf powerd configuration file
/etc/powerstatus indicates line power status
/etc/inittab init is what actually issues the shutdown
SEE ALSO
powerd(8), shutdown(8), wall(1), init(8), inittab(5).
AUTHOR
James Brents <James@nistix.com> (with parts of this man page borrowed from all over the Linux community)
POWERD(8)