High Performance Computing forum. Post 302985382 by devinmgibson on Tuesday 8th of November 2016 05:29:10 PM
Python code runs on login node but not on cluster

I work for one of my professors, and we are trying to run SU2 in parallel on a university-owned cluster that uses Slurm as its workload manager. The problem is that when we ssh into the cluster and run the command:

Code:
parallel_computation.py -f SU2.cfg

on a node assigned by Slurm (using sbatch), the code hangs and won't run. The weird thing is that the same command works just fine on the login node. Does anyone know what the problem could be?
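
Since the same command behaves differently on the two kinds of node, one way to narrow things down is to record the environment on each and compare. The following is only a minimal sketch under stated assumptions: the function names are hypothetical, and `mpirun` is a guess at the MPI launcher in use (it is not mentioned above). The idea is to run the script once on the login node and once inside an sbatch job, then compare the two outputs.

```python
import os
import shutil
import socket
import sys


def collect_environment(tools=("mpirun", "SU2_CFD", "parallel_computation.py")):
    """Return a dict describing the environment of the current node.

    The default tool names are assumptions: SU2_CFD and
    parallel_computation.py come from the post, mpirun is a guess.
    """
    info = {
        "hostname": socket.gethostname(),
        "python": sys.executable,
        "TERM": os.environ.get("TERM", ""),
        "PATH": os.environ.get("PATH", ""),
    }
    # Record where (or whether) each tool resolves on this node's PATH.
    for tool in tools:
        info["which:" + tool] = shutil.which(tool) or "NOT FOUND"
    # Slurm sets SLURM_* variables inside a job; on a login node there
    # are normally none, so this loop adds nothing there.
    for key in sorted(os.environ):
        if key.startswith("SLURM_"):
            info[key] = os.environ[key]
    return info


def diff_environments(a, b):
    """Return {key: (a_value, b_value)} for every key whose value differs."""
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(set(a) | set(b))
        if a.get(key) != b.get(key)
    }


if __name__ == "__main__":
    # Print one key=value pair per line so two runs can be diffed.
    for key, value in collect_environment().items():
        print(f"{key}={value}")
```

If a tool resolves to a different path, or is not found at all on the compute node, that would be a first thing to look at. This does not diagnose the hang by itself; it only shows where the two environments differ.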

Here is some additional information:
- We talked with the IT guy in charge of the cluster, and he doesn't have enough background to know what is going on.
- Some of our output files contained the escape sequence [!0134h. When we changed the terminal settings to get rid of the escape sequence, the code behaved the same as described above.
- We can run SU2_CFD "config file" (the code in serial) on both the login node and the cluster just fine.
- We have tried running an interactive session on a node (using srun), with no change in behavior.
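
To check whether terminal escape sequences are still ending up in the output files, a small filter can help. This is a rough sketch, not part of SU2: the helper names are hypothetical, the pattern only covers standard CSI sequences (ESC followed by `[`), and the sample string is made up for illustration rather than copied from our files.

```python
import re

# CSI sequence as laid out in ECMA-48: ESC [, optional parameter bytes
# (0x30-0x3F), optional intermediate bytes (0x20-0x2F), one final byte
# (0x40-0x7E).
CSI_PATTERN = re.compile(r"\x1b\[[0-?]*[ -/]*[@-~]")


def find_csi(text):
    """Return every CSI escape sequence found in text, in order."""
    return CSI_PATTERN.findall(text)


def strip_csi(text):
    """Return text with all CSI escape sequences removed."""
    return CSI_PATTERN.sub("", text)


if __name__ == "__main__":
    # Hypothetical sample line; the bytes in a real output file may differ.
    sample = "\x1b[0mIter 1"
    print(repr(find_csi(sample)))
    print(strip_csi(sample))
```

Running `find_csi` over an output file written on the login node and over one written by the batch job would show whether the two runs differ in what reaches the file.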

Any thoughts would be appreciated! We really want to be able to run the code in-house instead of outsourcing it.


Moderator's Comments:
Please use CODE tags as required by forum rules!

Last edited by RudiC; 11-09-2016 at 04:07 AM. Reason: Added CODE tags.
 
