Problem with the time limit of a job on Slurm


 
Posted: 02-08-2012

Hello everyone.

I am trying to run a parallel computation. It runs for about an hour and then stops with this error:

slurmd[veredas4]: *** JOB 785385 CANCELLED AT 2012-02-08T20:18:42 DUE TO TIME LIMIT ***

Can anyone please tell me what is going on and how to fix this error? I changed the number of nodes, but in vain.

Thanks.
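
The message says the job reached its wall-clock time limit, which is set at submission time or taken from the partition default. Below is a minimal sketch of requesting a longer limit, assuming the job is submitted with sbatch. The file name job.sh, the helper name write_job_script, the node count, the 4-hour value, and ./my_parallel_program are all hypothetical, and the requested limit cannot exceed what the partition allows.

```shell
#!/bin/bash
# Sketch only: write a batch script that asks for a longer wall-clock limit.
# job.sh, the node count, the limit, and ./my_parallel_program are hypothetical.

write_job_script() {
    local outfile="$1" timelimit="$2"
    # --time takes values such as minutes, hours:minutes:seconds, or days-hours.
    cat > "$outfile" <<EOF
#!/bin/bash
#SBATCH --nodes=2
#SBATCH --time=${timelimit}
srun ./my_parallel_program
EOF
}

write_job_script job.sh 04:00:00

if command -v sbatch >/dev/null 2>&1; then
    # List each partition with its maximum time limit, then submit the job.
    sinfo -o "%P %l"
    sbatch job.sh
else
    echo "sbatch not found; wrote job.sh without submitting"
fi
```

If the partition maximum shown by sinfo is shorter than the run needs, the limit has to be raised by the cluster administrators or the computation split into shorter jobs.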
 
Slurm API(3)						      Slurm job signal calls						      Slurm API(3)

NAME
       slurm_kill_job, slurm_kill_job_step, slurm_signal_job, slurm_signal_job_step, slurm_terminate_job, slurm_terminate_job_step - Slurm job signal calls

SYNTAX
       #include <slurm/slurm.h>

       int slurm_kill_job ( uint32_t job_id, uint16_t signal, uint16_t batch_flag );

       int slurm_kill_job_step ( uint32_t job_id, uint32_t job_step_id, uint16_t signal );

       int slurm_signal_job ( uint32_t job_id, uint16_t signal );

       int slurm_signal_job_step ( uint32_t job_id, uint32_t job_step_id, uint16_t signal );

       int slurm_terminate_job ( uint32_t job_id );

       int slurm_terminate_job_step ( uint32_t job_id, uint32_t job_step_id );

ARGUMENTS
       batch_flag
              If non-zero then signal only the batch job shell.

       job_id Slurm job id number.

       job_step_id
              Slurm job step id number.

       signal Signal to be sent to the job or job step.

DESCRIPTION
       slurm_kill_job
              Request that a signal be sent to either the batch job shell (if batch_flag is non-zero) or all steps of the specified job. If the job is pending and the signal is SIGKILL, the job will be terminated immediately. This function may only be successfully executed by the job's owner or user root.

       slurm_kill_job_step
              Request that a signal be sent to a specific job step. This function may only be successfully executed by the job's owner or user root.

       slurm_signal_job
              Request that the specified signal be sent to all steps of an existing job.

       slurm_signal_job_step
              Request that the specified signal be sent to an existing job step.

       slurm_terminate_job
              Request termination of all steps of an existing job by sending a REQUEST_TERMINATE_JOB rpc to all slurmd in the job allocation, and then call slurm_complete_job().

       slurm_terminate_job_step
              Request termination of a job step by sending a REQUEST_TERMINATE_TASKS rpc to all slurmd of a job step, and then call slurm_complete_job_step() after verifying that all nodes in the job step no longer have running tasks from the job step. (May take over 35 seconds to return.)

RETURN VALUE
       On success, zero is returned. On error, -1 is returned, and Slurm error code is set appropriately.

ERRORS
       SLURM_PROTOCOL_VERSION_ERROR
              Protocol version has changed, re-link your code.

       ESLURM_DEFAULT_PARTITION_NOT_SET
              the system lacks a valid default partition.

       ESLURM_INVALID_JOB_ID
              the requested job id does not exist.

       ESLURM_JOB_SCRIPT_MISSING
              the batch_flag was set for a non-batch job.

       ESLURM_ALREADY_DONE
              the specified job has already completed and can not be modified.

       ESLURM_ACCESS_DENIED
              the requesting user lacks authorization for the requested action (e.g. trying to delete or modify another user's job).

       ESLURM_INTERCONNECT_FAILURE
              failed to configure the node interconnect.

       SLURM_PROTOCOL_SOCKET_IMPL_TIMEOUT
              Timeout in communicating with SLURM controller.

NOTE
       These functions are included in the libslurm library, which must be linked to your process for use (e.g. "cc -lslurm myprog.c").

COPYING
       Copyright (C) 2002 The Regents of the University of California. Produced at Lawrence Livermore National Laboratory (cf, DISCLAIMER). CODE-OCEC-09-009. All rights reserved.

       This file is part of SLURM, a resource management program. For details, see <http://www.schedmd.com/slurmdocs/>.

       SLURM is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

       SLURM is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

SEE ALSO
       scancel(1), slurm_get_errno(3), slurm_perror(3), slurm_strerror(3)

Morris Jette                          November 2003                          Slurm API(3)
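
For signalling a job from the shell rather than from C, scancel(1), listed under SEE ALSO, can be used. The following is a minimal sketch, not part of the original manual page: signal_job is a hypothetical helper name and the job id is only an example value. It assumes that scancel's --signal option is available on the installed Slurm version.

```shell
#!/bin/bash
# Sketch only: send a signal to a Slurm job with scancel(1).
# signal_job is a hypothetical helper; 785385 is an example job id.

signal_job() {
    local jobid="$1" signame="${2:-TERM}"
    if ! command -v scancel >/dev/null 2>&1; then
        # Without Slurm installed, only report what would be run.
        echo "scancel not found; would run: scancel --signal=${signame} ${jobid}"
        return 0
    fi
    scancel --signal="${signame}" "${jobid}"
}

signal_job 785385 USR1
```

As with the library calls, the request is expected to succeed only for the job's owner or for root.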