CUDA GPU terminates process at random instances


# 1  
Old 12-18-2016

I am trying to troubleshoot an error on a virtual server running Ubuntu 14.04. Seemingly at random, the GPU stops processing and the process terminates. By "seemingly random" I mean that three runs complete without error and then the error appears on run 4. It has happened 4 times now, and the only consistency is that it appears to fail at the same point: cycle 21 (as indicated by the log, not included). If I reboot, the GPU starts up again and processes normally.
Are there any commands or recommendations that might help me figure out what is going on? Thank you.

Error:
Code:
CUDA: gpuDeviceConfig: device added for evaluation: 0:GeForce GTX 970 v5.2 3.99982GB
CUDA: gpuDeviceConfig: minimum compute version used for pipeline: 2.0
CUDA 0: gpuDeviceConfig::initDeviceContexts: Creating Context and Constant memory on device with id: 0
terminate called after throwing an instance of 'cudaExecutionException'

+----------------------------------------
 | ** CUDA ERROR! **
 | Error: 46
 | Msg: all CUDA-capable devices are busy or unavailable
 | File: cudaWrapper.cpp
 | Line: 127
 +----------------------------------------
  what():  CUDA EXCEPTION: Error occurred during job Execution!


# 2  
Old 12-19-2016
I know very little about GPU programming, but from the error message I would assume that you are asking the GPU to start a new thread when the resources needed to run that thread are not available.

What does your documentation for your GeForce GTX 970 v5.2 say error code 46 means? What are you running on your GPU?

What is cycle 21 in your GPU code doing?
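
For reference, in the CUDA runtime API error 46 is `cudaErrorDevicesUnavailable`, whose text matches the message in the log. NVIDIA's documentation lists an exclusive or prohibited compute mode, long-running kernels, or memory pressure from other active CUDA work as typical causes. The following is a minimal sketch for checking those conditions, assuming `nvidia-smi` is installed on the server and Python 3 is available. The `nvidia-smi` query options are real; the helper names (`parse_csv_rows`, `run_query`, `snapshot`) are made up for illustration.

```python
import shutil
import subprocess

# Field names used as dictionary keys (hypothetical, chosen for this sketch).
GPU_FIELDS = ["index", "name", "compute_mode", "memory_used", "memory_total"]
APP_FIELDS = ["pid", "process_name", "used_memory"]

# nvidia-smi query commands; output is one comma-separated row per GPU/process.
GPU_QUERY = ["nvidia-smi",
             "--query-gpu=index,name,compute_mode,memory.used,memory.total",
             "--format=csv,noheader"]
APP_QUERY = ["nvidia-smi",
             "--query-compute-apps=pid,process_name,used_memory",
             "--format=csv,noheader"]


def parse_csv_rows(text, fields):
    """Turn nvidia-smi CSV output into a list of dicts keyed by `fields`.

    Assumes no value contains a comma, which holds for typical output.
    """
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        values = [v.strip() for v in line.split(",")]
        rows.append(dict(zip(fields, values)))
    return rows


def run_query(cmd):
    """Run a command and return its stdout, or None if it is unavailable or fails."""
    if shutil.which(cmd[0]) is None:
        return None
    result = subprocess.run(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE, universal_newlines=True)
    return result.stdout if result.returncode == 0 else None


def snapshot():
    """Collect GPU state and the compute processes currently holding the GPU."""
    gpu_out = run_query(GPU_QUERY)
    if gpu_out is None:
        return None
    app_out = run_query(APP_QUERY) or ""
    return {"gpus": parse_csv_rows(gpu_out, GPU_FIELDS),
            "apps": parse_csv_rows(app_out, APP_FIELDS)}


if __name__ == "__main__":
    state = snapshot()
    if state is None:
        print("nvidia-smi not found or query failed")
    else:
        for gpu in state["gpus"]:
            print("GPU", gpu)
        for app in state["apps"]:
            print("compute process", app)
```

Running this just before a job starts, and again right after a failure, would show whether the compute mode is something other than `Default` or whether another process still holds memory on the device when the new context is requested.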
# 3  
Old 12-20-2016
Error 46 seems to be a CUDA API error. The GPU runs data-intensive analysis using HPC clustering and parallel processing.

Code:
File: /sw_results/R_2016_12_05_13_30_48_user_S5-00580-17-Medexome/X0_Y0/acq_0020.dat
FileLoadWorker: ImageProcessing time for flow 21: 0.65(ld=0.39 pin=0.05 cnc=0.11 xt=0.09 sem=0.00 cache=0.06) sec 16:07:13
File: /sw_results/R_2016_12_05_13_30_48_user_S5-00580-17-Medexome/X0_Y0/acq_0021.dat
CUDA: gpuDeviceConfig: device added for evaluation: 0:GeForce GTX 970 v5.2 3.99982GB
CUDA: gpuDeviceConfig: minimum compute version used for pipeline: 2.0
CUDA 0: gpuDeviceConfig::initDeviceContexts: Creating Context and Constant memory on device with id: 0
terminate called after throwing an instance of 'cudaExecutionException'

It seems the CUDA exception was thrown in flow 21 and the GPU was interrupted. Is there a way I can figure out the cause of that interruption? Thank you.
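
One way to start narrowing this down is to compare the failing runs systematically. The sketch below assumes the log has the format shown in this thread ("ImageProcessing time for flow N:" lines and the boxed "CUDA ERROR" block); the function name `summarize_log` and the result keys are hypothetical. It reports the last flow that finished image processing before the exception, plus the error code, message, file, and line, so that several failed runs can be checked for the same pattern.

```python
import re
import sys

FLOW_RE = re.compile(r"ImageProcessing time for flow (\d+):")
FIELD_RE = re.compile(r"\|\s*(Error|Msg|File|Line):\s*(.*)")
TERMINATE = "terminate called after throwing an instance of"


def summarize_log(text):
    """Extract the last completed flow and the CUDA error details from a log."""
    last_flow = None        # most recent flow number seen so far
    flow_at_crash = None    # value of last_flow when the exception appeared
    exception = None
    fields = {}
    pending = None          # field whose value wrapped onto the next line

    for raw in text.splitlines():
        line = raw.strip()

        flow = FLOW_RE.search(line)
        if flow:
            last_flow = int(flow.group(1))

        if TERMINATE in line and exception is None:
            flow_at_crash = last_flow
            exception = line.split("instance of", 1)[1].strip().strip("'")
            continue

        field = FIELD_RE.search(line)
        if field:
            key, value = field.group(1), field.group(2).strip()
            if value:
                fields[key] = value
                pending = None
            else:
                pending = key   # e.g. "File:" followed by the name on the next line
            continue

        if pending and line and not line.startswith(("|", "+")):
            fields[pending] = line
            pending = None

    return {"last_flow_before_crash": flow_at_crash,
            "exception": exception,
            "error": fields}


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as handle:
        print(summarize_log(handle.read()))
```

In the excerpt above, the exception follows directly after the pipeline announces that it is creating a context on device 0, which is the point where an "all CUDA-capable devices are busy or unavailable" error would typically surface. If every failed run shows the same last flow and the same file and line, that points to a specific step in the pipeline rather than a truly random fault.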
