06-01-2007
Problem with parallel computing
I use the quantum chemistry program Gaussian 03, which I run on a cluster of computers. The cluster consists of a main node plus 11 other computers (nodes). Gaussian uses the Linda software for parallel computing.
My job (process) often freezes; I can get it running again by pressing Ctrl-C.
On closer examination, I found that the job freezes when one of the subprocesses becomes defunct. The master node then stops communicating with the other nodes, so the calculation stalls. Ctrl-C probably kills the defunct process and gets the calculation running again.
It is a big hassle and a waste of time to press Ctrl-C every few hours. What can I do about it?
Thanks.
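A minimal sketch for confirming the diagnosis above, assuming a Linux `ps` that accepts `-eo pid,ppid,stat,comm` (procps style); the function names are hypothetical. It lists processes in state `Z` (defunct) together with their parent PID. A defunct process has already exited and only stays in the process table until its parent collects its exit status, so the parent PID is usually the more useful thing to look at.

```python
import subprocess


def find_defunct(ps_output):
    """Return (pid, ppid, command) tuples for processes whose state starts with 'Z'."""
    defunct = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        fields = line.split(None, 3)  # pid, ppid, stat, rest of line
        if len(fields) < 4:
            continue
        pid, ppid, stat, comm = fields
        if stat.startswith("Z"):  # 'Z' marks a defunct (zombie) process
            defunct.append((int(pid), int(ppid), comm.strip()))
    return defunct


def list_defunct():
    """Run ps on the local node and report its defunct processes.

    Assumption: a procps-style ps that supports the 'stat' output column.
    """
    result = subprocess.run(
        ["ps", "-eo", "pid,ppid,stat,comm"],
        capture_output=True,
        text=True,
        check=True,
    )
    return find_defunct(result.stdout)
```

Calling `list_defunct()` on each node while the job is frozen would show which process is defunct and which parent has not yet reaped it.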
clinfo(1M) System Administration Commands clinfo(1M)
NAME
clinfo - display cluster information
SYNOPSIS
clinfo [-nh]
DESCRIPTION
The clinfo command displays cluster configuration information about the node from which the command is executed.
Without arguments, clinfo returns an exit status of 0 if the node is configured and booted as part of a cluster. Otherwise, clinfo returns an exit status of 1.
OPTIONS
The following options are supported:
-h    Displays the highest node number allowed to be configured. This is different from the maximum number of nodes supported in a given cluster. The current highest configured node number can change immediately after the command returns since new nodes can be dynamically added to a running cluster.
      For example, clinfo -h might return 64, meaning that the highest number you can use to identify a node is 64. See the Sun Cluster 3.0 System Administration Guide for a description of utilities you can use to determine the number of nodes in a cluster.
-n    Prints the number of the node from which clinfo is executed.
EXIT STATUS
The following exit values are returned:
0 Successful completion.
1 An error occurred.
This is usually because the node is not configured or booted as part of a cluster.
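The exit-status behavior described above lends itself to scripting. The following Python sketch is an illustration only and not part of Sun Cluster: `cluster_status` is a hypothetical helper, and it assumes that `clinfo -n` prints the node number as a bare integer.

```python
import subprocess


def cluster_status(run=subprocess.run):
    """Return this node's number if it is a cluster member, else None.

    `run` is injectable so the logic can be exercised without clinfo installed.
    """
    try:
        # Without arguments, clinfo exits 0 only if the node is configured
        # and booted as part of a cluster.
        if run(["clinfo"], capture_output=True, text=True).returncode != 0:
            return None
        # -n prints the number of the node the command runs on.
        result = run(["clinfo", "-n"], capture_output=True, text=True)
    except FileNotFoundError:
        return None  # clinfo is not installed on this machine
    if result.returncode != 0:
        return None
    # Assumption: the output is just the node number, e.g. "3".
    return int(result.stdout.strip())
```

A return value of `None` covers both documented failure cases: a non-zero exit status and a machine where the command does not exist.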
ATTRIBUTES
See attributes(5) for descriptions of the following attributes:
+-----------------------------+-----------------------------+
| ATTRIBUTE TYPE | ATTRIBUTE VALUE |
+-----------------------------+-----------------------------+
|Availability |SUNWcsu |
+-----------------------------+-----------------------------+
SEE ALSO
attributes(5)
SunOS 5.10 12 Mar 2002 clinfo(1M)