BASH - Handling background processes - distributed processing


 
# 1  
Old 04-17-2011

NOTE: I am using BASH and Solaris 10 for this.

Currently in the process of building a script that has a main "watcher" daemon that reads a configuration file and starts background processes based on its global configuration. It is basically an infinite loop of configuration reading. Some of the background processes do things like decrypting and encrypting files, all driven by a configuration table that is read in. Yes, the sub processes have configuration files too. The idea is that the watcher process calls the sub process with an "ID" that is valid in the sub process's configuration.

What I'm having trouble deciding on is how to deal with things like email notifications about how a sub process finished. The sub processes run in the background from the watcher process, so once one finishes it can't simply tell the watcher what happened. These sub processes can also be called without the watcher, e.g.

Code:
# ClientDataDecrypt 1

Where 1 is the ID from the configuration table.

My thoughts were to:
1. Have the watcher touch a stat file when it kicks off a particular subtask, which the sub process can then update (sketched after this list). I can also use this to stop the watcher from kicking off another sub process too quickly.
2. Have the watcher pass the relevant email addresses to the sub process and let the sub process handle the notifications. There may still be an issue with spam notifications if the sub process fails on particular files.
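
For example, a minimal sketch of option 1, assuming a hypothetical stat directory and a simple STATUS= convention (none of this is the final implementation):

Code:
#!/bin/bash
# Watcher side: skip the task if its stat file says it is still running.
STATDIR=/var/run/clientdata              # hypothetical location
ID=$1
STATFILE="$STATDIR/decrypt_${ID}.stat"

if [ -f "$STATFILE" ] && grep -q '^STATUS=RUNNING' "$STATFILE"; then
    echo "ID $ID still running, not re-kicking" >&2
else
    echo "STATUS=RUNNING" > "$STATFILE"
    ClientDataDecrypt "$ID" &
fi

# Sub-process side, at the end of ClientDataDecrypt ($rc being its
# decrypt exit code):
#   echo "STATUS=DONE"  >  "$STATFILE"
#   echo "RESULT=$rc"   >> "$STATFILE"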

Sorry if I have confused what I'm trying to do. Your thoughts and feedback are welcome.

---------- Post updated at 04:07 PM ---------- Previous update was at 02:42 PM ----------

Thinking further about this, when I kick off the sub process I could have it write to an output file:

Code:
# ClientDataEncrypt -i <id> -o <path>_<id>_<parent>.lock

Where <id> is the ID in the config, <path> is the parent file path, and <parent> is the parent (watcher) process ID.

I can then have the watcher keep checking for files matching the above pattern as it parses through. The output file can contain something like this to read in:

Code:
SUCCESS=
FAIL=
SOURCE_DIR=
DEST_DIR=

It can then construct a notification based on this.
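
For instance, since the output file is plain KEY=VALUE lines, the watcher could simply source it. A minimal sketch, where the file glob, RECIPIENTS variable, and mailx invocation are assumptions:

Code:
# Watcher side: pick up finished reports matching <path>_<id>_<parent>.lock,
# where $$ is this watcher's own PID.
for f in "${path}"_*_"$$".lock; do
    [ -f "$f" ] || continue
    . "$f"    # sets SUCCESS, FAIL, SOURCE_DIR, DEST_DIR
    printf 'success=%s fail=%s src=%s dst=%s\n' \
        "$SUCCESS" "$FAIL" "$SOURCE_DIR" "$DEST_DIR" |
        mailx -s "ClientDataEncrypt report" "$RECIPIENTS"
    rm -f "$f"    # consume the report so it is not mailed twice
done
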
# 2  
Old 04-17-2011
In these kinds of situations you'll find the special parameter $! (the PID of the most recent background process) and the man page for the wait builtin very useful.
I recently made something to launch SQL statements and gzip the output files with a maximum number of subprocesses (I chose the maximum as a function of the number of CPU cores).

Basically, in the script you start a subprocess:
Code:
ClientDataEncrypt <args> &
BPID=$!

and keep track of the background PIDs by storing them in an array.

From there you can limit the number of subprocesses, as in the sketch below.
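
A minimal sketch of that idea, assuming a hypothetical MAXJOBS derived from the core count (this uses bash arrays, so it needs bash rather than plain sh):

Code:
MAXJOBS=4                              # e.g. the number of CPU cores
PIDS=()

for id in 1 2 3 4; do                  # illustrative IDs
    ClientDataEncrypt -i "$id" &
    PIDS[${#PIDS[@]}]=$!               # remember the background PID
    if [ "${#PIDS[@]}" -ge "$MAXJOBS" ]; then
        wait "${PIDS[0]}"              # block until the oldest job exits
        PIDS=("${PIDS[@]:1}")          # drop it from the array
    fi
done
wait                                   # drain any remaining jobs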

(P.S. on Solaris 10 pgrep is very useful in these situations, especially if you want to be able to launch ClientDataEncrypt both manually and via the watcher and want the watcher daemon to know about it)

I hope this helps

Last edited by pbillast; 04-17-2011 at 07:16 AM..
# 3  
Old 04-17-2011
I recently wrote something similar to this: essentially a map-reduce for a single host, made more robust for shared-host execution by guarding against CPU and memory hogging.

Interfacing between the master and slave workers is not that great, as I used lock files to communicate between master and slave. Just before a slave starts its assigned work, it creates a lock file that is accessible by the master, and the master checks periodically to see whether the lock file is still there. Once the worker completes the work, the lock file is removed, which indicates that the worker is done; based on the current load on the system, a new process can then be spawned or not.

Although I said the lock interfacing isn't that great, it works very well because the master is very pessimistic and reserves the veto to kill a worker if it detects that the worker has gone stale, is no longer needed, or that more than one worker is executing the same work in parallel. A sketch of the handshake is below.
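
A minimal sketch of that lock-file handshake, with hypothetical paths, payload, and a one-hour staleness veto (the stale check uses GNU find's -mmin for brevity; stock Solaris find would need -mtime or an explicit timestamp comparison):

Code:
LOCK=/tmp/worker_$1.lock

# Worker side:
echo $$ > "$LOCK"          # announce that work has started
run_assigned_work          # hypothetical payload
rm -f "$LOCK"              # removal signals completion

# Master side, polling:
while [ -f "$LOCK" ]; do
    if [ -n "$(find "$LOCK" -mmin +60 2>/dev/null)" ]; then
        kill "$(cat "$LOCK")" 2>/dev/null    # veto a stale worker
        rm -f "$LOCK"
        break
    fi
    sleep 10
done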

Is that the same problem that you are trying to solve?

---------- Post updated at 06:58 PM ---------- Previous update was at 06:53 PM ----------

Quote:
Originally Posted by pbillast
In these kinds of situations you'll find the special parameter $! (the PID of the most recent background process) and the man page for the wait builtin very useful.
I recently made something to launch SQL statements and gzip the output files with a maximum number of subprocesses (I chose the maximum as a function of the number of CPU cores).

Basically, in the script you start a subprocess:
Code:
ClientDataEncrypt <args> &
BPID=$!

and keep track of the background PIDs by storing them in an array.

From there you can limit the number of subprocesses.

(P.S. on Solaris 10 pgrep is very useful in these situations, especially if you want to be able to launch ClientDataEncrypt both manually and via the watcher and want the watcher daemon to know about it)

I hope this helps
Is the reason for storing PIDs to check periodically whether the sub process has completed execution or not? If so, there is a potential problem with it, and yes, it will happen on ultra-busy nodes.

For example: suppose the stored PID is 'p1'. By the time the watcher scans the PID list, 'p1' can complete its processing and die, and a new, unrelated process with PID 'p1' can be spawned. The watcher, unaware of this, will think 'p1' is still alive, which is not true.

Basically, the problem is due to expanded scope; if a boundary is drawn using a process group, this problem can be avoided. Even if 'p1' is respawned as part of some other process, it won't be part of the process group boundary that we are checking.
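
A minimal sketch of that boundary, assuming the watcher leads its own process group and that the workers' executable name is visible to pgrep (Solaris pgrep supports -g pgrplist):

Code:
PGRP=$$    # assumes the watcher is its own process-group leader

# Only processes inside our group count as live workers, so a recycled
# PID belonging to some unrelated process will not match.
if pgrep -g "$PGRP" ClientDataDecrypt >/dev/null; then
    echo "a worker from this watcher is still running"
fi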

# 4  
Old 04-18-2011
Thanks for your responses so far. I never really thought about managing the number of sub processes, but I am going to implement this, as it could quite easily get out of control.

I've changed the way I am dealing with the "watcher" and "worker" processes in terms of notifications. Instead of the watcher daemon doing the notifications, I will have the watcher pass an email address argument, specified in the watcher config, to the worker processes.

At the end of the day, all these processes are syslogging, so as an admin I know what's going on, and the staff who care about a particular client system will get the notifications they need (whether decryption failed or not, for example). This way operations staff can call the worker processes manually if the worker config isn't set to auto-decrypt for that system, and have the results sent to their email address. The sub processes themselves determine whether they need to run, based on the ID that's passed by the watcher and on whether there's actually data in the path!

In other words, in the instance of an auto-decrypt for example:

1. ClientDataWatcher is running and hits an entry with auto-decrypt.
2. ClientDataWatcher checks whether there is a lock file matching the ID (let's assume there isn't).
3. ClientDataWatcher kicks off the subtask and touches a lock file with the ID in the file name.
Code:
ClientDataDecrypt "3" "person1@bla.com,person2@bla.com"

4. ClientDataDecrypt sees if it has an entry for 3 in its config.
5. ClientDataDecrypt finds the entry and checks its config for the folder of files it has to decrypt.
6. ClientDataDecrypt does the decryption if it has to, otherwise EXITS.
7. ClientDataDecrypt sends the relevant notification based on the above result and the recipients passed.

Obviously there is a lot more going on than the above, but that's the summarized logic; a worker-side sketch follows.
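
A minimal worker-side sketch of steps 4 through 7; the config format, the decrypt_all helper, and the mailx call are all hypothetical:

Code:
ID=$1
RECIPIENTS=$2
CONF=/etc/clientdata/decrypt.conf             # hypothetical config

LINE=$(grep "^${ID}:" "$CONF") || exit 0      # step 4: no entry, nothing to do
DIR=${LINE#*:}                                # step 5: folder to decrypt

ls "$DIR"/* >/dev/null 2>&1 || exit 0         # step 6: no data, EXIT

if decrypt_all "$DIR"; then RESULT=SUCCESS; else RESULT=FAILED; fi
echo "Decrypt $RESULT for ID $ID in $DIR" |
    mailx -s "ClientDataDecrypt $RESULT" "$RECIPIENTS"   # step 7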

In terms of process sanity control, I'm going to have the watcher touch a lock file as it kicks off a process for a particular ID task based on the config entry (as above). The lock file will contain an epoch timestamp of when the process started, and the watcher will use this, together with some constant interval (specified somewhere), to determine when it can rerun an auto-decrypt, as sketched below.
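
A minimal sketch of that epoch check, where MIN_INTERVAL and the lock path are assumptions (note: stock Solaris date has no %s, so epoch seconds come from perl here):

Code:
MIN_INTERVAL=300                               # seconds between reruns
LOCK="/var/run/clientdata/decrypt_${ID}.lock"
NOW=$(perl -e 'print time')                    # epoch seconds on Solaris 10

if [ -f "$LOCK" ]; then
    STARTED=$(cat "$LOCK")
    if [ $((NOW - STARTED)) -lt "$MIN_INTERVAL" ]; then
        exit 0                                 # ran too recently, skip
    fi
fi
echo "$NOW" > "$LOCK"
ClientDataDecrypt "$ID" "$RECIPIENTS" &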

Feedback, thoughts and abuse welcome!
# 5  
Old 04-18-2011
I am trying to understand the points you have stated.
How is the notification mechanism handled? Is it email based, or is it based on a pub-sub model, so that notification and notification processing can be automated and made more real-time?

Is this processing being extended to a lot of hosts, or is it just a single host doing the processing?