Problem with parallel computing


 
# 1  
Old 06-01-2007
Problem with parallel computing

I use a quantum chemistry program called Gaussian 03, which I run on a cluster of computers. The cluster consists of a main node plus 11 other computers (nodes). Gaussian uses the 'Linda' software for parallel computing.

My job (process) often freezes, and I can get it running again by pressing Ctrl-C.
On closer examination, I found that the job freezes when one of the subprocesses becomes defunct. The master node then stops communicating with the other nodes, so the calculation stalls. Ctrl-C probably kills the defunct process and gets the calculation running again.


It is a big hassle and a waste of time to press Ctrl-C every few hours. What can I do about it?
Thanks.
# 2  
Old 06-01-2007
Do you need advice on how to build a script that will simulate pressing "Ctrl+C"? You should report this issue to the software's developers.
# 3  
Old 06-01-2007
I favor sysgate's second answer. You may be losing data with ctrl/c as well.
This is definitely a vendor problem.
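
For the vendor report, it may help to record which processes are defunct at the moment the job freezes. A defunct (zombie) process has already exited and is only waiting for its parent to collect its exit status, so the parent PID is usually the interesting part. Below is a minimal Python sketch, assuming a Linux node where `ps` accepts `-e` and the `-o pid=`, `ppid=`, `stat=` and `args=` format options. The function names `find_defunct` and `list_defunct` are made up for illustration and are not part of Gaussian or Linda.

```python
import subprocess


def find_defunct(ps_output):
    """Return (pid, ppid, command) for each line whose state starts with 'Z'.

    Expects lines of the form "PID PPID STAT COMMAND..." with no header row.
    """
    defunct = []
    for line in ps_output.splitlines():
        fields = line.split(None, 3)
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        pid, ppid, state = fields[0], fields[1], fields[2]
        command = fields[3] if len(fields) > 3 else ""
        # 'Z' is the state code ps reports for a defunct (zombie) process.
        if state.startswith("Z") and pid.isdigit() and ppid.isdigit():
            defunct.append((int(pid), int(ppid), command))
    return defunct


def list_defunct():
    """Run ps on the local node and return its defunct processes.

    Assumption: a procps-style ps; the trailing '=' suppresses column headers.
    """
    result = subprocess.run(
        ["ps", "-e", "-o", "pid=", "-o", "ppid=", "-o", "stat=", "-o", "args="],
        capture_output=True,
        text=True,
        check=True,
    )
    return find_defunct(result.stdout)
```

Running `list_defunct()` on each node while the job is frozen, and saving the PID, parent PID and command of every entry, gives the vendor something concrete to work from. It only observes the processes and does not send any signals, so it should not risk the data loss mentioned above.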
 