Post 302987926 by cmccabe on Sunday 18th of December 2016 10:43:38 AM
CUDA GPU terminates process at random instances

I am trying to troubleshoot an error on a virtual server running Ubuntu 14.04. What happens, seemingly at random, is that the GPU stops processing and the process is terminated. By seemingly random I mean that 3 runs complete without error and then the error appears on run 4. It has happened 4 times now, and the only consistency is that it appears to fail at the same point each time: cycle 21 (as indicated by the log, not included here). If I reboot, the GPU starts up again and processes normally.
Are there any commands or recommendations that might help me figure out what is going on? Thank you.

Error:
Code:
CUDA: gpuDeviceConfig: device added for evaluation: 0:GeForce GTX 970 v5.2 3.99982GB
CUDA: gpuDeviceConfig: minimum compute version used for pipeline: 2.0
CUDA 0: gpuDeviceConfig::initDeviceContexts: Creating Context and Constant memory on device with id: 0
terminate called after throwing an instance of 'cudaExecutionException'

+----------------------------------------
 | ** CUDA ERROR! **
 | Error: 46
 | Msg: all CUDA-capable devices are busy or unavailable
 | File: cudaWrapper.cpp
 | Line: 127
 +----------------------------------------
  what():  CUDA EXCEPTION: Error occurred during job Execution!


Last edited by rbatte1; 12-19-2016 at 09:13 AM.. Reason: Removed SIZE tags from within CODE tags
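
One way to start narrowing this down is to capture the GPU's state at the moment the failure appears. In CUDA runtime releases of that period, error 46 is `cudaErrorDevicesUnavailable`, and one commonly reported cause is that the device is in an exclusive or prohibited compute mode, or is still held by another (possibly stale) process. The sketch below is only an illustration, not a diagnosis: it pulls the fields out of the error box shown in the post and, if the code is 46, asks `nvidia-smi` for the compute mode and the list of compute processes. The function names, the field pattern, and the assumption that `nvidia-smi` is installed on the virtual server are mine, not something from the original post.

```python
import re
import shutil
import subprocess

# Error box copied from the post. The "File:" value sits on its own line
# there, so the parser tolerates a value wrapped onto the next line.
SAMPLE_LOG = """\
+----------------------------------------
 | ** CUDA ERROR! **
 | Error: 46
 | Msg: all CUDA-capable devices are busy or unavailable
 | File:
cudaWrapper.cpp
 | Line: 127
 +----------------------------------------
"""

# Matches " | Key: value" rows of the box (hypothetical pattern for this log).
FIELD_RE = re.compile(r"^\s*\|\s*(Error|Msg|File|Line):\s*(.*)$")


def parse_cuda_error(log_text):
    """Return the Error/Msg/File/Line fields found in log_text, or {}."""
    fields = {}
    pending = None  # key whose value is expected on the following line
    for raw in log_text.splitlines():
        match = FIELD_RE.match(raw)
        if match:
            key, value = match.group(1), match.group(2).strip()
            fields[key] = value
            pending = None if value else key
        elif pending and raw.strip() and not raw.strip().startswith("+"):
            fields[pending] = raw.strip()
            pending = None
    for key in ("Error", "Line"):
        if str(fields.get(key, "")).isdigit():
            fields[key] = int(fields[key])
    return fields


def gpu_snapshot():
    """Return nvidia-smi's view of the GPUs and compute processes.

    Returns None when nvidia-smi is not on PATH.
    """
    if shutil.which("nvidia-smi") is None:
        return None
    queries = (
        "--query-gpu=index,name,compute_mode,memory.used,memory.total",
        "--query-compute-apps=pid,process_name,used_memory",
    )
    parts = []
    for query in queries:
        cmd = ["nvidia-smi", query, "--format=csv"]
        try:
            result = subprocess.run(cmd, capture_output=True, text=True, timeout=30)
        except subprocess.TimeoutExpired:
            parts.append("nvidia-smi timed out: " + " ".join(cmd))
            continue
        parts.append(result.stdout.strip() or result.stderr.strip())
    return "\n".join(parts)


if __name__ == "__main__":
    info = parse_cuda_error(SAMPLE_LOG)
    print(info)
    if info.get("Error") == 46:
        print(gpu_snapshot() or "nvidia-smi not found on PATH")
```

Running the same snapshot before every job, not only after a failure, would make it possible to compare the state before the three good runs with the state before the failing one.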
 

CUDA Driver API Specific Functions(3)			     Hardware Locality (hwloc)			     CUDA Driver API Specific Functions(3)

NAME
       CUDA Driver API Specific Functions -

   Functions
       static inline int hwloc_cuda_get_device_pci_ids (hwloc_topology_t topology, CUdevice cudevice, int *domain, int *bus, int *dev)
       static inline int hwloc_cuda_get_device_cpuset (hwloc_topology_t topology, CUdevice cudevice, hwloc_cpuset_t set)
       static inline hwloc_obj_t hwloc_cuda_get_device_pcidev (hwloc_topology_t topology, CUdevice cudevice)
       static inline hwloc_obj_t hwloc_cuda_get_device_osdev (hwloc_topology_t topology, CUdevice cudevice)
       static inline hwloc_obj_t hwloc_cuda_get_device_osdev_by_index (hwloc_topology_t topology, unsigned idx)

Detailed Description
Function Documentation
   static inline int hwloc_cuda_get_device_cpuset (hwloc_topology_t topology, CUdevice cudevice, hwloc_cpuset_t set) [static]
       Get the CPU set of logical processors that are physically close to device cudevice. Return the CPU set describing the locality of the CUDA device cudevice.

       Topology topology and device cudevice must match the local machine. I/O devices detection and the CUDA component are not needed in the topology.

       The function only returns the locality of the device. If more information about the device is needed, OS objects should be used instead, see hwloc_cuda_get_device_osdev() and hwloc_cuda_get_device_osdev_by_index().

       This function is currently only implemented in a meaningful way for Linux; other systems will simply get a full cpuset.

   static inline hwloc_obj_t hwloc_cuda_get_device_osdev (hwloc_topology_t topology, CUdevice cudevice) [static]
       Get the hwloc OS device object corresponding to CUDA device cudevice. Return the hwloc OS device object that describes the given CUDA device cudevice. Return NULL if there is none.

       Topology topology and device cudevice must match the local machine. I/O devices detection and the NVML component must be enabled in the topology. If not, the locality of the object may still be found using hwloc_cuda_get_device_cpuset().

       Note: The corresponding hwloc PCI device may be found by looking at the result parent pointer.

   static inline hwloc_obj_t hwloc_cuda_get_device_osdev_by_index (hwloc_topology_t topology, unsigned idx) [static]
       Get the hwloc OS device object corresponding to the CUDA device whose index is idx. Return the OS device object describing the CUDA device whose index is idx. Return NULL if there is none.

       The topology topology does not necessarily have to match the current machine. For instance the topology may be an XML import of a remote host. I/O devices detection and the CUDA component must be enabled in the topology.

       Note: The corresponding PCI device object can be obtained by looking at the OS device parent object.

       This function is identical to hwloc_cudart_get_device_osdev_by_index().

   static inline int hwloc_cuda_get_device_pci_ids (hwloc_topology_t topology, CUdevice cudevice, int *domain, int *bus, int *dev) [static]
       Return the domain, bus and device IDs of the CUDA device cudevice. Device cudevice must match the local machine.

   static inline hwloc_obj_t hwloc_cuda_get_device_pcidev (hwloc_topology_t topology, CUdevice cudevice) [static]
       Get the hwloc PCI device object corresponding to the CUDA device cudevice. Return the PCI device object describing the CUDA device cudevice. Return NULL if there is none.

       Topology topology and device cudevice must match the local machine. I/O devices detection must be enabled in topology topology. The CUDA component is not needed in the topology.

Author
       Generated automatically by Doxygen for Hardware Locality (hwloc) from the source code.

Version 1.7 Sun Apr 7 2013 CUDA Driver API Specific Functions(3)