CUDA GPU terminates process at random instances


 
# 1  
Old 12-18-2016

I am trying to troubleshoot an error on a virtual server running Ubuntu 14.04. What happens, seemingly at random, is that the GPU stops processing and the process terminates. By seemingly random I mean that for three runs there is no error, then on run 4 the error appears. It has happened 4 times now, and about the only consistency is that it appears to fail at the same point each time: cycle 21 (as indicated by the log, not included here). If I reboot, the GPU starts up again and processes normally.
Are there any commands or recommendations that might help me figure out what is going on? Thank you.

Error:
Code:
CUDA: gpuDeviceConfig: device added for evaluation: 0:GeForce GTX 970 v5.2
3.99982GB
CUDA: gpuDeviceConfig: minimum compute version used for pipeline: 2.0
CUDA 0: gpuDeviceConfig::initDeviceContexts: Creating Context and Constant
memory on device with id: 0
terminate called after throwing an instance of 'cudaExecutionException'

+----------------------------------------
 | ** CUDA ERROR! **
 | Error: 46
 | Msg: all CUDA-capable devices are busy or unavailable
 | File: cudaWrapper.cpp
 | Line: 127
 +----------------------------------------
  what():  CUDA EXCEPTION: Error occurred during job Execution!


Last edited by rbatte1; 12-19-2016 at 09:13 AM.. Reason: Removed SIZE tags from within CODE tags
# 2  
Old 12-19-2016
I know very little about GPU programming, but from the error message I would assume that you are asking the GPU to start a new thread when the resources needed to run that thread are not available.

What does your documentation for your GeForce GTX 970 v5.2 say error code 46 means? What are you running on your GPU?

What is cycle 21 in your GPU code doing?
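
One way to test the "resources not available" idea is to look at the GPU's state right after a failure, before rebooting. The following is only a sketch and not something from the original posts: it assumes the NVIDIA driver's `nvidia-smi` tool is installed in the guest and on the PATH, and the helper names `parse_gpu_rows` and `query_gpus` are made up for illustration. If the compute mode is reported as something other than Default (for example Exclusive_Process), a second process holding the device could produce the "busy or unavailable" message; if `nvidia-smi` itself fails, the driver or the virtual machine's view of the card is the more likely suspect.

```python
import csv
import io
import shutil
import subprocess

# Fields requested from nvidia-smi; the order must match the parsed columns.
FIELDS = ["index", "name", "compute_mode"]


def parse_gpu_rows(text):
    """Turn nvidia-smi CSV output (no header) into a list of dicts keyed by FIELDS."""
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    return [dict(zip(FIELDS, (cell.strip() for cell in row))) for row in reader if row]


def query_gpus():
    """Return parsed GPU rows, or None if nvidia-smi is missing or exits with an error."""
    if shutil.which("nvidia-smi") is None:
        return None
    cmd = ["nvidia-smi", "--query-gpu=" + ",".join(FIELDS), "--format=csv,noheader"]
    try:
        out = subprocess.check_output(cmd, universal_newlines=True)
    except (subprocess.CalledProcessError, OSError):
        return None
    return parse_gpu_rows(out)


if __name__ == "__main__":
    gpus = query_gpus()
    if gpus is None:
        print("nvidia-smi is unavailable or returned an error")
    else:
        for gpu in gpus:
            print("GPU {}: {} (compute mode: {})".format(
                gpu.get("index"), gpu.get("name"), gpu.get("compute_mode")))
```

Running this once on a healthy system and once immediately after the exception gives two snapshots to compare.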
This User Gave Thanks to Don Cragun For This Post:
# 3  
Old 12-20-2016
Error 46 seems to be a CUDA API error. The GPU runs data-intensive analysis using HPC clustering and parallel processing.

Code:
File:
     /sw_results/R_2016_12_05_13_30_48_user_S5-00580-17-Medexome/X0_Y0/acq_0020.
     dat
     FileLoadWorker: ImageProcessing time for flow 21: 0.65(ld=0.39 pin=0.05
     cnc=0.11 xt=0.09 sem=0.00 cache=0.06) sec 16:07:13
     File:
     /sw_results/R_2016_12_05_13_30_48_user_S5-00580-17-Medexome/X0_Y0/acq_0021.
     dat
     CUDA: gpuDeviceConfig: device added for evaluation: 0:GeForce GTX 970 v5.2
     3.99982GB
     CUDA: gpuDeviceConfig: minimum compute version used for pipeline: 2.0
     CUDA 0: gpuDeviceConfig::initDeviceContexts: Creating Context and Constant
     memory on device with id: 0
     terminate called after throwing an instance of 'cudaExecutionException'

It seems the CUDA exception was thrown in flow 21 and the GPU was interrupted. Is there a way I can figure out the cause of that interruption? Thank you.
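
To confirm that all four failures really stop at the same point, the logs from each failed run can be scanned for the last flow reported before the crash. The sketch below is an illustration under stated assumptions: the two patterns it searches for are copied from the excerpt above, the function name `last_flow_before_crash` is hypothetical, and the log file paths are supplied on the command line because the real log location is not given in the thread.

```python
import re
import sys

# Patterns taken from the log excerpt in this thread.
FLOW_RE = re.compile(r"ImageProcessing time for flow (\d+)")
CRASH_MARKER = "terminate called after throwing"


def last_flow_before_crash(lines):
    """Return the last flow number logged before the crash marker.

    Returns None when the marker is absent, or when it appears
    before any flow timing line.
    """
    last_flow = None
    for line in lines:
        match = FLOW_RE.search(line)
        if match:
            last_flow = int(match.group(1))
        if CRASH_MARKER in line:
            return last_flow
    return None


if __name__ == "__main__":
    # Usage (paths are placeholders): python scan_log.py run1.log run2.log
    for path in sys.argv[1:]:
        with open(path, errors="replace") as handle:
            print(path, last_flow_before_crash(handle))
```

In the excerpt above, the exception follows the "Creating Context" step that comes right after the flow 21 timing line, so checking whether every failed run shows that same sequence would help separate a problem in GPU context creation from a problem in the flow data itself.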

Last edited by cmccabe; 12-20-2016 at 09:35 AM.. Reason: added details