I apologize, Don. I don't provide all the code because I thought it would obfuscate things, but it seems I've made things more complicated. I really appreciate your patience, here.
I also apologize. I should have gone to bed at midnight this morning instead of trying to help you with your problem. I completely overlooked line #10 in your code, which wipes out the data you have just copied (and occasionally one or more additional chunks of data that were appended to the file between the time the cp on the previous line completes and the time the redirection wipes out the file you copied; with 400 jobs running on your system, that window could be minutes long).
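For reference, one way to close that window is to rename the file instead of copying and then truncating it; rename is atomic on the same filesystem. This is a sketch, and it assumes the producer opens the file in append mode for each write (a producer holding a long-lived open descriptor would follow the rename instead):

```shell
#!/bin/sh
# Sketch: take ownership of the accumulated data with an atomic rename
# instead of cp followed by a truncating redirection.

printf 'rec1\nrec2\n' > source.txt   # demo stand-in for the producer

mv source.txt work.txt   # atomic on one filesystem: no window in which
                         # freshly appended lines can be truncated away
: > source.txt           # recreate an empty file for the producer

while IFS= read -r line; do
    printf 'got: %s\n' "$line"
done < work.txt
rm -f work.txt
```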
Quote:
First, I run "ps" and look for instances of "my_job" and have a maximum number that's checked before spawning another. I've run as many as 400 to stress things, and it worked (with the exception being the problem I'm talking about here, which doesn't seem to be affected at all by that number). I currently run a maximum of 20, but at this instant, for debugging purposes, I've set the limit at one. There is some proprietary stuff inside "my_job" that I'm hesitant to show (yes, I understand how difficult that makes this!)
If you mean that you run ps somewhere in the first three lines of your script (which you stripped out of the code you showed us), that won't have any effect on the number of jobs started in the background on line 19 in the loop on lines 5 through 21.
If you mean that you run ps in my_job, that won't affect the number of jobs started in the background on line 19 in your script nor the speed with which they are spawned.
If you mean that you have another loop between lines 18 and 19 in the code you showed us that keeps you from getting to line 19 until some of your background jobs complete, that would be CRUCIAL information that completely changes the way your script works that you have hidden from us.
From what you have shown us, the only thing limiting the number of invocations of my_job that you try to run concurrently is the number of lines available to process in your input file and how fast your "producer" can write data into that file.
Quote:
As for reading the files continuously, the source of data is always on, populating the source text file. I copy the file over, erase the source copy, and then read each line until a counter exceeds the line size of the file, OR if the last line I read has a timestamp that is too old.
As I mentioned above, the way you are copying and erasing the source file will sometimes silently discard data. But if you discard data that is too old anyway (something else we can't see in your code), maybe it doesn't matter.
Quote:
As for your suggestion to read the file a single time, yes, I used that successfully and just switched back with the suspicion that that method (the method you recommend here) was causing my current problem.
I can assure you that that wasn't your problem, unless the problem was that you ran out of disk space due to the size of the file, or you exceeded the maximum file size that could be written by the process that is adding data to your source file. (And the description of the symptoms you have provided does not support either of these possibilities.)
Quote:
Exit status: It executes an "exit 0" on success or failure, but results are all echoed to a log file. Failures I check are all for functions that read or write data to and from hardware, but I still exit 0, and simply report the results of those functions.
You tell us that you limit the number of jobs you are running simultaneously, but you don't show us any code that suggests that this is true. From what you have shown us, there is a high likelihood that attempts to spawn my_job in the background will fail due to exceeding the number of processes a user is allowed to run at once. Since you never wait for any of your background jobs to complete and never check their exit status, you will never know how many attempts to start my_job failed (and in those cases, my_job can't possibly log the fact that it never started).
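As a sketch of what checking would look like (the stub my_job here just exits with a given status; it is not Mark's program):

```shell
#!/bin/sh
# Sketch: launch background jobs, then reap each one with wait and
# count failures instead of firing and forgetting.

my_job() { return "$1"; }      # stub standing in for the real my_job

pids=""
for rc in 0 1 0; do
    my_job "$rc" &
    pids="$pids $!"
done

failed=0
for pid in $pids; do
    wait "$pid" || failed=$((failed + 1))
done
echo "failed jobs: $failed"    # here: 1 of 3
```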
Quote:
Would you suggest waiting until the process count (of running "my_job" instances) dropped to some lower number or perhaps zero before fetching a new file full of records?
You have ignored my requests for information about the type of system you're using and the number of threads you might be able to run concurrently. Unless you have a massively parallel processing system, running 400 background jobs is much more likely to cause thrashing and scheduling problems than it is likely to improve throughput.
What you have shown us is logically equivalent to a script like this:
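(The original code block was stripped here; the unbounded-spawn pattern being described is roughly this, reproduced with a finite demo input so it can be run safely:)

```shell
#!/bin/sh
# The shape of the posted loop: every input line spawns another
# background my_job, with no wait and no upper bound on how many run.

my_job() { sleep 0; }              # stand-in; the real one runs up to 90 s

printf 'a\nb\nc\n' > records.txt   # finite demo input; the real file grows forever

while IFS= read -r line; do
    my_job "$line" &               # one more concurrent process per line
done < records.txt
wait                               # added only so this demo exits cleanly
```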
which will bring any system to its knees in seconds.
Quote:
Thanks again!!
Mark
Let me know what else I can provide, such as more of what these called programs contain. Essentially, the "my_job" runs for up to 90 seconds maximum and then deposits whatever results in its own file for retrieval by the "unloader".
The garbage collection routine periodically checks for "heartbeat" files that haven't been updated in several minutes, tries to kill the process whose PID is recorded inside each one, if it is still alive, and then discards the file.
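A sketch of that kind of garbage collector; the directory layout, file suffix, and staleness threshold are all illustrative assumptions (and find's -mmin is GNU/BSD, not strict POSIX):

```shell
#!/bin/sh
# Sketch: sweep heartbeat files; any not touched for STALE_MINUTES is
# presumed dead, so kill its recorded PID (if alive) and drop the file.

HEARTBEAT_DIR=./heartbeats
STALE_MINUTES=5

mkdir -p "$HEARTBEAT_DIR"
echo $$ > "$HEARTBEAT_DIR/demo.hb"     # fresh heartbeat, just for the demo

for hb in "$HEARTBEAT_DIR"/*.hb; do
    [ -f "$hb" ] || continue
    # -mmin +N: last modified more than N minutes ago
    if [ -n "$(find "$hb" -mmin +"$STALE_MINUTES")" ]; then
        pid=$(cat "$hb")
        kill -0 "$pid" 2>/dev/null && kill "$pid"   # still alive: kill it
        rm -f "$hb"
    fi
done
```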
Append a:
wait
before the final done, as already mentioned a few times.
As of now, during each loop iteration you spawn jobs into the background, ignoring whether or not the earlier ones have even finished, while starting new jobs in the same loop.
As already said, even while true; do sleep 1 & done can bring a machine to its knees; imagine what background jobs that actually do something will do...
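A minimal sketch of the capped version of that loop (MAX_JOBS, the stub worker, and the fractional sleeps are illustrative; fractional sleep needs GNU sleep, and jobs -rp is bash):

```shell
#!/bin/bash
# Sketch: the same loop, capped. Before each spawn, count our own
# running background jobs and back off while the cap is reached.

MAX_JOBS=2
my_job() { sleep 0.2; }            # stand-in for the real worker

printf '1\n2\n3\n4\n5\n' > records.txt   # demo input

spawned=0
while IFS= read -r line; do
    while [ "$(jobs -rp | wc -l)" -ge "$MAX_JOBS" ]; do
        sleep 0.1                  # back off until a slot frees up
    done
    my_job "$line" &
    spawned=$((spawned + 1))
done < records.txt
wait                               # reap the stragglers before exiting
echo "spawned $spawned jobs, never more than $MAX_JOBS at once"
```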
hth
PS:
You might want to have a look at: [BASH] Script to manage background scripts (running, finished, exit code)
The mods were kind and provided several working scripts.
And on the 3rd page, its (currently) last post shows my solution using TUI, which runs multiple scripts in the background, limits the number of allowed scripts, and reports their exit status.
Now, this is the code that checks for existing processes (the job name is "my_job") and only sleeps if the number hits MAX_NUM_PROCS (which has been set as high as 400, but is now set at 2):
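(The code block itself did not survive the forum formatting, so here is a guess at the shape being described; the MAX_NUM_PROCS value and the exact ps pipeline are assumptions, not the original code:)

```shell
#!/bin/sh
# Guessed reconstruction, not the original post's code: sleep while
# the count of running my_job processes is at MAX_NUM_PROCS or above.

MAX_NUM_PROCS=2

count_jobs() {
    # the [m] bracket keeps grep from counting itself
    ps -ef | grep '[m]y_job' | wc -l
}

while [ "$(count_jobs)" -ge "$MAX_NUM_PROCS" ]; do
    sleep 1
done
# ...safe to spawn another my_job here...
```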
Does this work?
---------- Post updated at 01:28 PM ---------- Previous update was at 01:26 PM ----------
Someone asked what "heartbeat" did, since I only "touch" it. I check that with a background watchdog process that just sees whether it's been touched recently; if not, it assumes this process isn't well, kills it if it still exists, and replaces it.
Make sense?
Last edited by Don Cragun; 01-29-2015 at 03:45 PM..
Reason: Add CODE tags.
That was me; I removed it because you had already answered it, but I had overlooked that.
Though, it's not clear to me how it would identify which process to kill, as the file is just touched and you spawned multiple jobs without saving their corresponding PIDs. (Edit: unless that is handled in that other script.)
Ok, so you want to check whether enough processes have already been started. The issue is, MAX_NUM_PROCS is not set anywhere in the code you posted.
Okay -- I've seen a few references to process-limiting methods. What is "bctl" and how would I use it in my situation? Is my approach of using "ps" and grepping for the function name unusable? Or perhaps something like "bctl" just does it better?
Thanks again!
---------- Post updated at 01:59 PM ---------- Previous update was at 01:57 PM ----------
sea -- thanks again!
The "my_job" function keeps the PID in its own "heartbeat" file, so when the garbage collection routine comes around, it sees how long that file's been untouched, and then tries to kill "old" jobs using the contained PID.
Again, if there's a better way to do this, please, feel free to straighten me out!
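For what it's worth, the worker side of that scheme as I understand the description; the file name and location are guesses:

```shell
#!/bin/sh
# Sketch of the worker side: record our PID once so the garbage
# collector can kill a stale job, then touch the file as we progress.

HB=./heartbeats/demo-job.hb
mkdir -p ./heartbeats
echo $$ > "$HB"            # the PID the watchdog will kill if we stall

for step in 1 2 3; do      # stand-in for the real work loop
    touch "$HB"            # prove we are still making progress
    : real work would go here
done

rm -f "$HB"                # clean exit leaves nothing for the watchdog
```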
bctl is a tool written in C (or C#?) by DGPickett.
He shared the code, so you can compile it on your system.
tui-psm is a tool written in bash by me (psm stands for parallel script manager).
I shared the code and made it part of its own dependency (TUI).
There was a discussion whether to use kill, ps, or the /proc way to identify the processes.
For me, ps worked the best.
So, give the others a try, if they work better for you - switch, otherwise keep using ps.
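For completeness, the kill route looks like this; signal 0 is never delivered, it only tests whether the PID can be signalled:

```shell
#!/bin/sh
# Liveness check without ps: kill -0 sends nothing, it only reports
# whether the process exists and we are allowed to signal it.

is_alive() {
    kill -0 "$1" 2>/dev/null
}

is_alive $$ && echo "current shell: alive"
is_alive 99999999 || echo "pid 99999999: gone"
```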