Guides for new HPC admins

 
Posted 06-28-2011

At my company, it has fallen to me to serve as the admin of our new HPC cluster, a task that's very new to me. I want to lay a solid foundation and avoid unnecessary pitfalls. Can anyone recommend a succinct guide or a list of dos and don'ts for administering an HPC cluster? The cluster runs the latest CentOS, PGI compilers, MPICH2, WRF, etc.

Thanks!
cmdisklock(1m)                                                  cmdisklock(1m)

NAME
       cmdisklock - manage Serviceguard cluster lock devices.

SYNOPSIS
       cmdisklock check path
       cmdisklock [-f] reset path

DESCRIPTION
       cmdisklock is a tool to check the current state of a Serviceguard
       cluster lock device. It can also be used to reset the state of the
       cluster lock device. The need to reset the cluster lock device state
       could arise if the cluster lock device is replaced or becomes corrupt.

       A cluster lock device can be either an HP-UX LVM cluster lock or a
       cluster lock LUN device. HP-UX LVM cluster locks exist only on a disk
       in an LVM volume group. Cluster lock LUNs exist only on disks
       dedicated to cluster lock. cmdisklock is useful for checking either
       type of cluster lock and for re-initializing cluster lock LUN devices
       after a failure or corruption.

       NOTE   To restore an HP-UX LVM cluster lock, use vgcfgrestore.
              cmdisklock will fail until vgcfgrestore is run, and cmdisklock
              is unnecessary as long as vgcfgbackup was done after the
              cluster lock was initialized. See the Managing Serviceguard
              manual for details.

       The syntax of the path option depends on the type of lock. For HP-UX
       LVM cluster lock disks, the syntax is VG:PV (for example:
       /dev/vglock:/dev/dsk/c0t0d2). For cluster lock LUN disks, the path is
       the disk device path. For example, /dev/sdd1 (on Linux) or
       /dev/dsk/c0t1d2 (on HP-UX).

   Options
       cmdisklock supports the following options:

       check   Check the current state of the cluster lock device and report
               the results.

       reset   Reset (initialize) the state of the cluster lock device. This
               operation should only be performed on a cluster lock LUN
               device. For HP-UX LVM cluster lock, use vgcfgrestore as
               documented in the Managing Serviceguard manual. After
               performing a reset, a check can be used to verify that the
               lock is cleared.

EXAMPLES
       If the cluster lock LUN device becomes corrupted and the cluster is
       up, messages like the following will appear in syslog.

       Mar 15 12:20:41 usb cmdisklockd[17599]: WARNING: Cluster lock LUN /dev/dsk/c0t1d2 is corrupt: bad label. Until this situation is corrected, a single failure could cause all nodes in the cluster to crash.
       Mar 15 12:20:41 usb cmdisklockd[17599]: After ensuring that all active nodes in the cluster have logged this message, run 'cmdisklock reset /dev/dsk/c0t1d2' to repair
       Mar 15 12:20:41 usb cmdisklockd[17599]: Cluster lock disk /dev/dsk/c0t1d2 is inaccessible

       Once the above messages appear in syslog on all running nodes, the
       following command will re-initialize the cluster lock LUN:

       ucd:/> cmdisklock reset /dev/dsk/c0t1d2
       WARNING: Cluster lock LUN /dev/dsk/c0t1d2 is corrupt: bad label. Until this situation is corrected, a single failure could cause all nodes in the cluster to crash.
       After ensuring that all active nodes in the cluster have logged this message, run 'cmdisklock reset /dev/dsk/c0t1d2' to repair
       /dev/dsk/c0t1d2 is inaccessible
       Resetting cluster lock device /dev/dsk/c0t1d2
       Cluster lock reset completed
       /dev/dsk/c0t1d2 is accessible cleared

       After the lock is restored, a message like the following appears in
       syslog:

       Mar 15 12:23:11 usb cmdisklockd[17599]: Cluster lock disk /dev/dsk/c0t1d2 is accessible

WARNINGS
       CAUTION   For cluster lock LUN, reset is a potentially destructive
                 operation. While cmdisklock checks for known volume manager
                 and file system use (overridden by -f), it does not validate
                 that the device to be reset is actually used by any cluster.
                 If -f is used on the wrong device file, loss of data may
                 result.

       CAUTION   Care should be taken when doing a reset when the cluster is
                 active as there is a remote possibility that the cluster
                 will partition right when this command is run and both nodes
                 could end up thinking they have successfully acquired the
                 lock. To avoid this situation, make sure cmcld has logged a
                 message in syslog on all running nodes saying the device is
                 inaccessible, before performing a reset. Note that it is
                 safe to run cmdisklock when the cluster is down.

RETURN VALUE
       cmdisklock returns the following values:

       0   Successful completion.
       1   The disk is inaccessible or is not recognized as a cluster lock.

AUTHOR
       cmdisklock was developed by HP.

SEE ALSO
       cmapplyconf(1m), cmviewcl(1m), vgcfgbackup(1m), vgcfgrestore(1m)

Requires Optional Serviceguard Software                         cmdisklock(1m)
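
The second CAUTION above asks that every running node has logged the "is inaccessible" message before a reset is attempted. A rough sketch of that pre-check in Python might look like the following. It is only an illustration under several assumptions: the syslog text of each node has already been collected into a dictionary by some other means, the message wording matches the EXAMPLES section exactly, and the function names (last_reported_state, nodes_not_ready, reset_command) are hypothetical helpers, not part of Serviceguard. The sketch only builds the command line; it does not run cmdisklock.

```python
import re

# Assumption: message wording follows the man page's EXAMPLES section, e.g.
# "... cmdisklockd[17599]: Cluster lock disk /dev/dsk/c0t1d2 is inaccessible"
STATE = re.compile(r"Cluster lock disk (\S+) is (inaccessible|accessible)")


def last_reported_state(syslog_text, device):
    """Return the most recent state logged for `device`, or None if never logged."""
    state = None
    for match in STATE.finditer(syslog_text):
        if match.group(1) == device:
            state = match.group(2)  # later lines override earlier ones
    return state


def nodes_not_ready(syslogs_by_node, device):
    """List nodes whose latest syslog entry does not say the device is inaccessible."""
    return sorted(
        node
        for node, text in syslogs_by_node.items()
        if last_reported_state(text, device) != "inaccessible"
    )


def reset_command(syslogs_by_node, device):
    """Build the reset command only if every node has logged the device as inaccessible."""
    missing = nodes_not_ready(syslogs_by_node, device)
    if missing:
        raise RuntimeError(
            "not all nodes report %s as inaccessible: %s" % (device, ", ".join(missing))
        )
    # Deliberately no -f: the man page warns that it overrides the safety checks.
    return ["cmdisklock", "reset", device]
```

For example, with syslog text gathered from nodes named usb and ucd (the host names in the EXAMPLES section), reset_command(logs, "/dev/dsk/c0t1d2") returns the argument list only when both nodes show the inaccessible message as their latest state for that device, and raises otherwise. After an actual reset, the man page suggests running "cmdisklock check" on the same path and treating a return value of 0 as success.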