Orphaned process "D" state


 
# 1  
Old 04-22-2019

Hello,

How can we clear an orphaned process that is stuck in the D state? I have tried to kill it with kill -9, but that does not work.

The server is critical, so is there any way to clear the D-state process without rebooting the server?
# 2  
Old 04-22-2019
You can check what the parent process is and, if possible, kill or restart it (as long as the parent is not the root (init) process).

If remote mounts are causing the D state, you can check the parent networking process and decide how to proceed.

Some people have tried to be creative with zombies, as follows:
  1. Determine the zombie and parent PIDs (in this example, say the zombie's PID is 3200 and the parent's PID is 3100).
  2. Start gdb and attach to the parent; in this example, attach 3100.
  3. Call waitpid for the zombie process; for example, call waitpid(3200,0,0).
  4. Detach from the parent (detach) and exit the debugger.
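
For what it's worth, the four steps above can be collapsed into a single gdb invocation. This is only a sketch for the zombie case in the example: it assumes gdb is installed and that you are allowed to ptrace the parent (normally root). reap_zombie and DRY_RUN are made-up names for illustration, and by default the function only prints the command instead of running it.

```shell
#!/bin/sh
# Sketch of the gdb steps: attach to the parent, call waitpid() on the zombie, detach.
# reap_zombie and DRY_RUN are illustrative names, not part of any tool.
reap_zombie() {
    parent_pid=$1   # e.g. 3100 in the example above
    zombie_pid=$2   # e.g. 3200 in the example above
    # -p attaches to a running PID, -batch exits once the commands are done,
    # and each -ex runs one gdb command in order.
    set -- gdb -p "$parent_pid" -batch \
        -ex "call (int)waitpid($zombie_pid, 0, 0)" \
        -ex detach
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '%s\n' "$*"   # dry run: only show what would be executed
    else
        "$@"
    fi
}
```

Running reap_zombie 3100 3200 prints the command; setting DRY_RUN=0 first actually runs it. The dry-run default is deliberate, since calling functions inside a live process on a critical server is not something to do by accident.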

Update: Fixed typos (I think!)
# 3  
Old 04-22-2019
D state is "device waiting" and is a bit nasty.
Such a process cannot be killed.
It makes sense to guess the blocking device and fix it. Once it is fixed, the processes will leave the D state and continue.
# 4  
Old 04-22-2019
Here are the process state codes and their descriptions:

Code:
D    Uninterruptible sleep (usually IO)
R    Running or runnable (on run queue)
S    Interruptible sleep (waiting for an event to complete)
T    Stopped, either by a job control signal or because it is being traced.
W    paging (not valid since the 2.6.xx kernel)
X    dead (should never be seen)
Z    Defunct ("zombie") process, terminated but not reaped by its parent.

As you can see, D means uninterruptible sleep, usually due to I/O.
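
To see at a glance which processes are currently in that state, the ps output can be filtered on the state column. A small sketch; filter_dstate is a made-up helper name, and it assumes the state is the third column of the input.

```shell
#!/bin/sh
# Keep the ps header line plus every row whose state column starts with D.
# Assumes the input columns are ordered: pid, ppid, state, ...
filter_dstate() {
    awk 'NR == 1 || $3 ~ /^D/'
}
```

For example: ps -eo pid,ppid,state,wchan=WIDE-WCHAN-COLUMN,comm | filter_dstate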

You can check wchan, the name of the kernel function in which the process is sleeping, to understand what exactly is going on:
Code:
ps -eo pid,ppid,state,wchan=WIDE-WCHAN-COLUMN,comm,args | ( read -r; printf "  %s\n" "$REPLY"; grep <your process name/pid> )

Usually it will be the exit_mm() function, which releases the memory descriptors and related data structures.

According to the Linux kernel documentation, exit_mm() first checks whether the mm->core_waiters flag is set. If it is, the process is dumping the contents of memory to a core file (I/O). In that case, I believe it will not respond to a KILL signal until the core dump is complete, to avoid corruption.
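
Since ps cuts the wchan name off at the column width, the same value can also be read in full from /proc/<pid>/wchan. A minimal sketch; full_wchan is a made-up name, and PROC_ROOT is a hypothetical override that exists only so the function can be tried against a test directory.

```shell
#!/bin/sh
# Print the untruncated wait channel of one PID, read from procfs.
# PROC_ROOT is an illustrative override; it defaults to the real /proc.
full_wchan() {
    pid=$1
    file="${PROC_ROOT:-/proc}/$pid/wchan"
    if [ -r "$file" ]; then
        # The file has no trailing newline, so print one after the content.
        printf '%s\n' "$(cat "$file")"
    else
        echo "cannot read $file" >&2
        return 1
    fi
}
```

For example: full_wchan 13613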
# 5  
Old 04-23-2019
Hi Neo,

Quote:
You can check to see what is the parent process, and if possible you can kill or restart the parent process (as long as the parent process is not the root process).
It's an orphan process, not a zombie, and its PPID is 1 :)

Code:
[root@xxx:~]# ps -ef | grep dsmc
root     13613     1  0 Apr19 ?        00:00:00 dsmc q systeminfo policy -console
root     17067 12166  0 14:33 pts/2    00:00:00 grep dsmc
root     21870     1  0 Apr22 ?        00:00:00 dsmc

Hi MadeinGermany,
Quote:
It makes sense to guess the blocking device, and fix it. Once fixed, the processes will leave the D state and continue.
You mean guessing which I/O device (disk) is blocking? The root cause is that the NFS server was disconnected unexpectedly, which made the NFS-mounted folder unresponsive. I forced an unmount and remounted it once the NFS server was back, but I still cannot kill the process.

Hi Yoda,
I have tried your command:
Code:
ps -eo pid,ppid,state,wchan=WIDE-WCHAN-COLUMN,comm,args | ( read -r; printf "  %s\n" "$REPLY"; grep <your process name/pid> )

It gave the following result:
Code:
[root@xxx:~]# ps -eo pid,ppid,state,wchan=WIDE-WCHAN-COLUMN,comm,args | ( read -r; printf "  %s\n" "$REPLY"; grep 13613 )
    PID  PPID S WIDE-WCHAN-COLUMN COMMAND         COMMAND
13613     1 D cifs_reconnect_tc dsmc            dsmc q systeminfo policy -console

[root@xxx:~]# ps -eo pid,ppid,state,wchan=WIDE-WCHAN-COLUMN,comm,args | ( read -r; printf "  %s\n" "$REPLY"; grep 21870 )
    PID  PPID S WIDE-WCHAN-COLUMN COMMAND         COMMAND
21870     1 D cifs_reconnect_tc dsmc            dsmc

It looks like this matches my finding above (NFS disconnected). The NFS-mounted folders are back now. Since the processes are in state D and cannot be killed, is a reboot the only way to clear them?
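
Note that the wchan in the output above points into the CIFS client code rather than NFS, so it may be worth looking at CIFS mounts as well as NFS ones when hunting for the blocking mount. A small sketch that lists both kinds from /proc/mounts; net_mounts is a made-up name, and the optional file argument is only there for testing.

```shell
#!/bin/sh
# List CIFS and NFS mounts as: device, mount point, filesystem type.
# The optional argument is an illustrative override; it defaults to /proc/mounts.
net_mounts() {
    awk '$3 ~ /^(cifs|nfs4?)$/ { print $1, $2, $3 }' "${1:-/proc/mounts}"
}
```

Running net_mounts with no argument reads the live mount table. Reading /proc/mounts avoids touching the mount points themselves, which matters here because commands that stat a dead network mount can hang as well.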
# 6  
Old 04-23-2019
Quote:
Originally Posted by Phat
Hi Neo,

It's an orphan process, not a zombie, and its PPID is 1 :)

Code:
[root@xxx:~]# ps -ef | grep dsmc
root     13613     1  0 Apr19 ?        00:00:00 dsmc q systeminfo policy -console
root     17067 12166  0 14:33 pts/2    00:00:00 grep dsmc
root     21870     1  0 Apr22 ?        00:00:00 dsmc

....
Yes, I understand that this is an orphan in D state and not a zombie (Z).

However, the procedure for attaching gdb to the process is the same.

The "creative process" I suggested using gdb can be tried before rebooting if you absolutely do not want to reboot.

Don't you agree?
# 7  
Old 04-23-2019
Actually, I expect gdb to hang as well when it is attached to a process in D state. But it's worth a try.

If processes are permanently hung in cifs_reconnect_tcon, then it looks like a kernel bug (or a missing interrupt/timeout feature).
Is your kernel at the latest patch level?