I work for one of my professors, and we are trying to run SU2 in parallel on a university-owned cluster that uses Slurm as its workload manager. The problem we are running into is that when we ssh into the cluster and run the command:
on a node assigned by Slurm (using sbatch), the code hangs and won't run. The weird thing is that if we run the same command on the login node, it works just fine. Does anyone know what the problem could be?
Here is some additional information:
- We talked with the IT guy in charge of the cluster and he doesn't have enough background to know what is going on.
- On some of our output files we would get the escape sequence [!0134h. When we changed the terminal settings to get rid of it, the code's behavior was the same as described above.
- We can run the code in serial (SU2_CFD "config file") on both the login node and the cluster just fine.
- We have tried running an interactive session on a node (using srun), with no change in behavior.
Any thoughts would be appreciated! We really want to be able to run the code in-house instead of outsourcing it.
Moderator's Comments:
Please use CODE tags as required by forum rules!
Last edited by RudiC; 11-09-2016 at 04:07 AM..
Reason: Added CODE tags.