Post 302987926 by cmccabe on Sunday 18th of December 2016 10:43:38 AM
CUDA GPU terminates process at random instances

I am trying to troubleshoot an error on a virtual server running Ubuntu 14.04. Seemingly at random, the GPU stops processing and the process terminates. By seemingly random I mean that for 3 runs there is no error, then on run 4 the error appears. It has happened 4 times now, and about the only consistency is that it appears to fail at the same point: cycle 21 (as indicated by the log, not included). If I reboot, the GPU starts up again and processes normally.
Are there any commands or recommendations that might help me figure out what is going on? Thank you.

Error:
Code:
CUDA: gpuDeviceConfig: device added for evaluation: 0:GeForce GTX 970 v5.2
3.99982GB
CUDA: gpuDeviceConfig: minimum compute version used for pipeline: 2.0
CUDA 0: gpuDeviceConfig::initDeviceContexts: Creating Context and Constant
memory on device with id: 0
terminate called after throwing an instance of 'cudaExecutionException'

 +----------------------------------------
 | ** CUDA ERROR! **
 | Error: 46
 | Msg: all CUDA-capable devices are busy or unavailable
 | File: cudaWrapper.cpp
 | Line: 127
 +----------------------------------------
  what():  CUDA EXCEPTION: Error occurred during job Execution!
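
When the same failure has to be compared across several runs, it can help to pull the fields out of the error box automatically. The following is a minimal sketch, assuming the log uses the box layout shown above; the function name `parse_cuda_error_box` is hypothetical and not part of any CUDA tool. It also tolerates a field value that has wrapped onto the next line.

```python
import re

# Field lines inside the error box look like " | Error: 46".
FIELD_RE = re.compile(r"^\s*\|\s*(Error|Msg|File|Line):\s*(.*)$")


def parse_cuda_error_box(log_text):
    """Return a dict of the fields found in the first '** CUDA ERROR! **' box.

    Hypothetical helper: it assumes the box layout from the log above.
    """
    fields = {}
    pending = None   # field whose value wrapped onto the following line
    in_box = False
    for raw in log_text.splitlines():
        line = raw.rstrip()
        if "** CUDA ERROR! **" in line:
            in_box = True
            continue
        if not in_box:
            continue
        if line.lstrip().startswith("+--"):
            break    # closing border of the box
        match = FIELD_RE.match(line)
        if match:
            key, value = match.group(1), match.group(2).strip()
            fields[key] = value
            pending = key if not value else None
        elif pending and line.strip():
            fields[pending] = line.strip()
            pending = None
    return fields


sample = "\n".join([
    "+----------------------------------------",
    " | ** CUDA ERROR! **",
    " | Error: 46",
    " | Msg: all CUDA-capable devices are busy or unavailable",
    " | File: ",
    "cudaWrapper.cpp",
    " | Line: 127",
    " +----------------------------------------",
])
info = parse_cuda_error_box(sample)
print(info)
```

Running this over the logs of each failed run makes it easy to check whether the error code, file, and line are identical every time, which narrows down whether the failure is always raised from the same call.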


Last edited by rbatte1; 12-19-2016 at 09:13 AM.. Reason: Removed SIZE tags from within CODE tags
 

Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.