BASH - Handling background processes - distributed processing


 
# 1  
Old 04-17-2011

NOTE: I am using BASH and Solaris 10 for this.

Currently in the process of building a script that has a main "watcher" daemon that reads a configuration file and starts background processes based on its global configuration. It is basically an infinite loop of configuration reading. Some of the background processes do things like decrypting and encrypting files, all driven by a configuration table that is read in. Yes, the sub processes have configuration files too. The idea is that the watcher process calls the sub process with an "ID" that is valid in the sub process's configuration.

What I'm having trouble deciding on is how to deal with things like email notifications about how a sub process finished. The sub processes run in the background from the watcher process, so once one finishes it can't simply tell the watcher what happened. These sub processes can also be called without the watcher, e.g.

Code:
# ClientDataDecrypt 1

Where 1 is the ID from the configuration table.

My thoughts were to:
1. Have the watcher touch a stat file when it kicks off a particular subtask, which the sub process can then update (sketched after this list). I can also use this to stop the watcher from kicking off another sub process too quickly.
2. Have the watcher pass the relevant email addresses to the sub process and let the sub process handle the notifications. There may still be an issue with spam notifications if the sub process fails on particular files.
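
For example, a minimal sketch of option 1, assuming a hypothetical stat directory and a simple STATUS= convention (none of this is the final implementation):

Code:
#!/bin/bash
# Watcher side: skip the task if its stat file says it is still running.
STATDIR=/var/run/clientdata              # hypothetical location
ID=$1
STATFILE="$STATDIR/decrypt_${ID}.stat"

if [ -f "$STATFILE" ] && grep -q '^STATUS=RUNNING' "$STATFILE"; then
    echo "ID $ID still running, not re-kicking" >&2
else
    echo "STATUS=RUNNING" > "$STATFILE"
    ClientDataDecrypt "$ID" &
fi

# Sub-process side, at the end of ClientDataDecrypt ($rc being its
# decrypt exit code):
#   echo "STATUS=DONE"  >  "$STATFILE"
#   echo "RESULT=$rc"   >> "$STATFILE"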

Sorry if I have confused what I'm trying to do. Your thoughts and feedback are welcome.

---------- Post updated at 04:07 PM ---------- Previous update was at 02:42 PM ----------

Thinking further about this, when I kick off the sub process I could have it write to an output file:

Code:
# ClientDataEncrypt -i <id> -o <path>_<id>_<parent>.lock

Where <id> is the ID in the config, <path> is the parent file path, and <parent> is the parent (watcher) process ID.

I can then have the watcher keep checking for files matching the above pattern as it parses through. The output file can contain something like this to read in:

Code:
SUCCESS=
FAIL=
SOURCE_DIR=
DEST_DIR=

It can then construct a notification based on this.
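
For instance, since the output file is plain KEY=VALUE lines, the watcher could simply source it. A minimal sketch, where the file glob, RECIPIENTS variable, and mailx invocation are assumptions:

Code:
# Watcher side: pick up finished reports matching <path>_<id>_<parent>.lock,
# where $$ is this watcher's own PID.
for f in "${path}"_*_"$$".lock; do
    [ -f "$f" ] || continue
    . "$f"    # sets SUCCESS, FAIL, SOURCE_DIR, DEST_DIR
    printf 'success=%s fail=%s src=%s dst=%s\n' \
        "$SUCCESS" "$FAIL" "$SOURCE_DIR" "$DEST_DIR" |
        mailx -s "ClientDataEncrypt report" "$RECIPIENTS"
    rm -f "$f"    # consume the report so it is not mailed twice
done
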
# 2  
Old 04-17-2011
In these kinds of situations you'll find the special parameter $! (the PID of the most recent background process) and the man page for the wait builtin very useful.
I recently made something to launch SQL statements and gzip the output files with a maximum number of subprocesses (I chose the maximum as a function of the number of CPU cores).

Basically, in the script you start a subprocess:
Code:
ClientDataEncrypt <args> &
BPID=$!

and keep track of the background PIDs by storing them in an array.

From there you can limit the number of subprocesses, as in the sketch below.
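
A minimal sketch of that idea, assuming a hypothetical MAXJOBS derived from the core count (this uses bash arrays, so it needs bash rather than plain sh):

Code:
MAXJOBS=4                              # e.g. the number of CPU cores
PIDS=()

for id in 1 2 3 4; do                  # illustrative IDs
    ClientDataEncrypt -i "$id" &
    PIDS[${#PIDS[@]}]=$!               # remember the background PID
    if [ "${#PIDS[@]}" -ge "$MAXJOBS" ]; then
        wait "${PIDS[0]}"              # block until the oldest job exits
        PIDS=("${PIDS[@]:1}")          # drop it from the array
    fi
done
wait                                   # drain any remaining jobs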

(P.S. on Solaris 10 pgrep is very useful in these situations, especially if you want to be able to launch ClientDataEncrypt both manually and via the watcher and want the watcher daemon to know about it)

I hope this helps

Last edited by pbillast; 04-17-2011 at 07:16 AM..
# 3  
Old 04-17-2011
I recently wrote something similar to this: essentially a map-reduce for a single host, made more robust for shared-host execution by guarding against CPU and memory hogging.

Interfacing between the master and slave workers is not that great, as I used lock files to communicate between master and slave. Just before a slave starts its assigned work, it creates a lock file that is accessible by the master, and the master checks periodically to see whether the lock file is still there. Once the worker completes the work, the lock file is removed, which indicates that the worker is done; based on the current load on the system, a new process can then be spawned or not.

Although I said the lock interfacing isn't that great, it works very well because the master is very pessimistic and reserves the veto to kill a worker if it detects that the worker has gone stale, is no longer needed, or that more than one worker is executing the same work in parallel. A sketch of the handshake is below.
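
A minimal sketch of that lock-file handshake, with hypothetical paths, payload, and a one-hour staleness veto (the stale check uses GNU find's -mmin for brevity; stock Solaris find would need -mtime or an explicit timestamp comparison):

Code:
LOCK=/tmp/worker_$1.lock

# Worker side:
echo $$ > "$LOCK"          # announce that work has started
run_assigned_work          # hypothetical payload
rm -f "$LOCK"              # removal signals completion

# Master side, polling:
while [ -f "$LOCK" ]; do
    if [ -n "$(find "$LOCK" -mmin +60 2>/dev/null)" ]; then
        kill "$(cat "$LOCK")" 2>/dev/null    # veto a stale worker
        rm -f "$LOCK"
        break
    fi
    sleep 10
done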

Is that the same problem that you are trying to solve?

---------- Post updated at 06:58 PM ---------- Previous update was at 06:53 PM ----------

Quote:
Originally Posted by pbillast
In these kinds of situations you'll find the special parameter $! (the PID of the most recent background process) and the man page for the wait builtin very useful.
I recently made something to launch SQL statements and gzip the output files with a maximum number of subprocesses (I chose the maximum as a function of the number of CPU cores).

Basically, in the script you start a subprocess:
Code:
ClientDataEncrypt <args> &
BPID=$!

and keep track of the background PIDs by storing them in an array.

From there you can limit the number of subprocesses.

(P.S. on Solaris 10 pgrep is very useful in these situations, especially if you want to be able to launch ClientDataEncrypt both manually and via the watcher and want the watcher daemon to know about it)

I hope this helps
Is the reason for storing PIDs to check periodically whether the sub process has completed execution or not? If so, there is a potential problem with it, and yes, it will happen on ultra-busy nodes.

For example: suppose the stored PID is 'p1'. By the time the watcher scans the PID list, 'p1' can complete its processing and die, and a new, unrelated process with PID 'p1' can be spawned. The watcher, unaware of this, will think 'p1' is still alive, which is not true.

Basically, the problem is due to expanded scope; if a boundary is drawn using a process group, this problem can be avoided. Even if 'p1' is respawned as part of some other process, it won't be part of the process group boundary that we are checking.
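
A minimal sketch of that boundary, assuming the watcher leads its own process group and that the workers' executable name is visible to pgrep (Solaris pgrep supports -g pgrplist):

Code:
PGRP=$$    # assumes the watcher is its own process-group leader

# Only processes inside our group count as live workers, so a recycled
# PID belonging to some unrelated process will not match.
if pgrep -g "$PGRP" ClientDataDecrypt >/dev/null; then
    echo "a worker from this watcher is still running"
fi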

# 4  
Old 04-18-2011
Thanks for your responses so far. I never really thought about managing the number of sub processes, but I am going to implement this, as it could quite easily get out of control.

I've changed the way I am dealing with the "watcher" and "worker" processes in terms of notifications. Instead of the watcher daemon doing the notifications, I will have the watcher pass an email address argument, specified in the watcher config, to the worker processes.

At the end of the day, all these processes are syslogging, so as an admin I know what's going on, and the staff who care about a particular client system will get the notifications they need (whether decryption failed or not, for example). This way operations staff can call the worker processes manually if the worker config isn't set to auto-decrypt for that system, and have the results sent to their email address. The sub processes themselves determine whether they need to run, based on the ID that's passed by the watcher and on whether there's actually data in the path!

In other words, in the instance of an auto-decrypt for example:

1. ClientDataWatcher is running and hits an entry with auto-decrypt.
2. ClientDataWatcher checks whether there is a lock file matching the ID (let's assume there isn't).
3. ClientDataWatcher kicks off the subtask and touches a lock file with the ID in the file name.
Code:
ClientDataDecrypt "3" "person1@bla.com,person2@bla.com"

4. ClientDataDecrypt sees if it has an entry for 3 in its config.
5. ClientDataDecrypt finds the entry and checks its config for the folder of files it has to decrypt.
6. ClientDataDecrypt does the decryption if it has to, otherwise EXITS.
7. ClientDataDecrypt sends the relevant notification based on the above result and the recipients passed.

Obviously there is a lot more going on than the above, but that's the summarized logic; a worker-side sketch follows.
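
A minimal worker-side sketch of steps 4 through 7; the config format, the decrypt_all helper, and the mailx call are all hypothetical:

Code:
ID=$1
RECIPIENTS=$2
CONF=/etc/clientdata/decrypt.conf             # hypothetical config

LINE=$(grep "^${ID}:" "$CONF") || exit 0      # step 4: no entry, nothing to do
DIR=${LINE#*:}                                # step 5: folder to decrypt

ls "$DIR"/* >/dev/null 2>&1 || exit 0         # step 6: no data, EXIT

if decrypt_all "$DIR"; then RESULT=SUCCESS; else RESULT=FAILED; fi
echo "Decrypt $RESULT for ID $ID in $DIR" |
    mailx -s "ClientDataDecrypt $RESULT" "$RECIPIENTS"   # step 7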

In terms of process sanity control, I'm going to have the watcher touch a lock file as it kicks off a process for a particular ID task based on the config entry (as above). The lock file will contain an epoch timestamp of when the process started, and the watcher will use this, together with some constant interval (specified somewhere), to determine when it can rerun an auto-decrypt, as sketched below.
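
A minimal sketch of that epoch check, where MIN_INTERVAL and the lock path are assumptions (note: stock Solaris date has no %s, so epoch seconds come from perl here):

Code:
MIN_INTERVAL=300                               # seconds between reruns
LOCK="/var/run/clientdata/decrypt_${ID}.lock"
NOW=$(perl -e 'print time')                    # epoch seconds on Solaris 10

if [ -f "$LOCK" ]; then
    STARTED=$(cat "$LOCK")
    if [ $((NOW - STARTED)) -lt "$MIN_INTERVAL" ]; then
        exit 0                                 # ran too recently, skip
    fi
fi
echo "$NOW" > "$LOCK"
ClientDataDecrypt "$ID" "$RECIPIENTS" &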

Feedback, thoughts and abuse welcome!
# 5  
Old 04-18-2011
I am trying to understand the points you have stated.
How is the notification mechanism handled? Is it email based, or is it based on a pub-sub model, so that notification and notification processing can be automated and made more real-time?

Is this processing being extended to a lot of hosts, or is it just a single host doing the processing?