CUDA GPU terminates process at random instances

Tags
gpu, ubuntu 14.04.3

    #1  
Old Unix and Linux 12-18-2016
cmccabe cmccabe is offline
Registered User
 
Join Date: Nov 2013
Last Activity: 28 April 2017, 3:02 PM EDT
Location: Chicago
Posts: 1,098
Thanks: 640
Thanked 13 Times in 12 Posts

I am trying to troubleshoot an error on a virtual server running Ubuntu 14.04. Seemingly at random, the GPU stops processing and the job terminates. By seemingly random I mean that, for example, 3 runs complete with no error and then on run 4 the error appears. It has happened 4 times now, and about the only consistency is that it appears to fail at the same point: cycle 21 (as indicated by the log, not included). If I reboot, the GPU starts up again and processes normally.
Are there any commands/recommendations that might help me figure out what is going on? Thank you.

Error:

Code:
CUDA: gpuDeviceConfig: device added for evaluation: 0:GeForce GTX 970 v5.2 3.99982GB
CUDA: gpuDeviceConfig: minimum compute version used for pipeline: 2.0
CUDA 0: gpuDeviceConfig::initDeviceContexts: Creating Context and Constant memory on device with id: 0
terminate called after throwing an instance of 'cudaExecutionException'

+----------------------------------------
 | ** CUDA ERROR! **
 | Error: 46
 | Msg: all CUDA-capable devices are busy or unavailable
 | File: cudaWrapper.cpp
 | Line: 127
 +----------------------------------------
  what():  CUDA EXCEPTION: Error occurred during job Execution!
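
Since the error has now appeared on several runs, it may help to pull the fields of the error box out of each log so the runs can be compared side by side. The following is only a minimal sketch in Python, assuming the box always has the "| Key: value" layout shown above; `parse_cuda_error` is a hypothetical helper name, not part of CUDA or of the pipeline software. The sample string leaves the File value wrapped onto its own line to show that a wrapped value is still picked up.

```python
import re

# Excerpt of the error box; the "File:" value is deliberately left wrapped.
LOG = """\
terminate called after throwing an instance of 'cudaExecutionException'

+----------------------------------------
 | ** CUDA ERROR! **
 | Error: 46
 | Msg: all CUDA-capable devices are busy or unavailable
 | File: 
cudaWrapper.cpp
 | Line: 127
 +----------------------------------------
  what():  CUDA EXCEPTION: Error occurred during job Execution!
"""


def parse_cuda_error(text):
    """Return the fields of the first '** CUDA ERROR! **' box as a dict, or None."""
    # Capture everything between the banner line and the closing "+-----" rule.
    box = re.search(r"\*\* CUDA ERROR! \*\*(.*?)\+-{5,}", text, re.S)
    if box is None:
        return None
    fields = {}
    # Each field looks like "| Key: value"; a value may wrap onto the next line.
    for key, value in re.findall(r"\|\s*(\w+):\s*([^|+]*)", box.group(1)):
        fields[key] = " ".join(value.split())
    return fields


if __name__ == "__main__":
    print(parse_cuda_error(LOG))
```

Running this over the log of every failed run would show whether the error number, source file, and line are identical each time, or whether only the timing is the same.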


Last edited by rbatte1; 12-19-2016 at 08:13 AM.. Reason: Removed SIZE tags from within CODE tags
    #2  
Old Unix and Linux 12-19-2016
Don Cragun Don Cragun is online now Forum Staff  
Administrator
 
Join Date: Jul 2012
Last Activity: 30 April 2017, 4:43 AM EDT
Location: San Jose, CA, USA
Posts: 10,225
Thanks: 514
Thanked 3,550 Times in 3,023 Posts
I know very little about GPU programming, but from the error message I would assume that you are asking the GPU to start a new thread when the resources needed to run that thread are not available.

What does your documentation for your GeForce GTX 970 v5.2 say error code 46 means? What are you running on your GPU?

What is cycle 21 in your GPU code doing?
    #3  
Old Unix and Linux 12-20-2016
cmccabe cmccabe is offline
Error 46 seems to be a CUDA API error. The GPU runs data-intensive analysis using HPC clustering and parallel processing.


Code:
File: /sw_results/R_2016_12_05_13_30_48_user_S5-00580-17-Medexome/X0_Y0/acq_0020.dat
FileLoadWorker: ImageProcessing time for flow 21: 0.65(ld=0.39 pin=0.05 cnc=0.11 xt=0.09 sem=0.00 cache=0.06) sec 16:07:13
File: /sw_results/R_2016_12_05_13_30_48_user_S5-00580-17-Medexome/X0_Y0/acq_0021.dat
CUDA: gpuDeviceConfig: device added for evaluation: 0:GeForce GTX 970 v5.2 3.99982GB
CUDA: gpuDeviceConfig: minimum compute version used for pipeline: 2.0
CUDA 0: gpuDeviceConfig::initDeviceContexts: Creating Context and Constant memory on device with id: 0
terminate called after throwing an instance of 'cudaExecutionException'

It seems the CUDA exception was thrown in flow 21 and the GPU was interrupted. Is there a way to figure out the cause of that interruption? Thank you.
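
To check whether the failure really lines up with flow 21 on every run, one option is to extract the last flow reported before the "terminate called" line from each run's log. This is a rough sketch in Python, assuming the FileLoadWorker lines keep the format shown in the excerpt above; `last_flow_before_crash` is a hypothetical helper name and the sample string is just the relevant lines copied from the log.

```python
import re

# Lines copied from the log excerpt, including the wrapped FileLoadWorker line.
LOG = """\
     FileLoadWorker: ImageProcessing time for flow 21: 0.65(ld=0.39 pin=0.05
     cnc=0.11 xt=0.09 sem=0.00 cache=0.06) sec 16:07:13
     CUDA 0: gpuDeviceConfig::initDeviceContexts: Creating Context and Constant
     memory on device with id: 0
     terminate called after throwing an instance of 'cudaExecutionException'
"""

# re.S lets ".*?" span the line break inside a wrapped FileLoadWorker entry.
FLOW_RE = re.compile(r"time for flow (\d+):.*?sec (\d{2}:\d{2}:\d{2})", re.S)


def last_flow_before_crash(text):
    """Return (flow number, timestamp) of the last flow logged before the crash.

    Returns None if the log has no 'terminate called' marker or no flow lines
    before it.
    """
    head, marker, _ = text.partition("terminate called")
    if not marker:
        return None  # this run did not hit the exception
    flows = FLOW_RE.findall(head)
    if not flows:
        return None
    flow, stamp = flows[-1]
    return int(flow), stamp


if __name__ == "__main__":
    print(last_flow_before_crash(LOG))
```

Comparing the returned flow number and timestamp across the failed runs would confirm whether the exception always follows the same flow, and the timestamp gives a point to look up in the system logs for that run.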

Last edited by cmccabe; 12-20-2016 at 08:35 AM.. Reason: added details