BASH: Script jams Cygwin to 100% CPU -


 
# 1  
09-20-2010

I'd like to streamline the code more than a bit to get it to run faster.

There's a thread about this and related issues of mine on the Cygwin mailing list, but I want to rule out the possibility that it's just inefficient/inelegant/crappy code. A previous run of the same script on Cygwin and in Ubuntu 9.04's GNOME Terminal, using source and list files of identical length and content, showed a large difference in execution time: 2m20s in GNOME Terminal versus just under 3/4 of an hour on Cygwin.

The code:
Code:
#!/bin/bash
SAVEIFS=$IFS
IFS=$(echo -en "\n\b")

while read 'line';
do
#8-point check:
	cate=$(exiv2 -g Iptc.Application2.Category -Pv $line)
	if [[ "$cate" == "" ]]; then cate2=" Category Tag";fi
	cred=$(exiv2 -g Iptc.Application2.Credit -Pv $line)
	if [[ "$cred" == "" ]]; then cred2=" Credit Tag";fi
	sour=$(exiv2 -g Iptc.Application2.Source -Pv $line)
	if [[ "$sour" == "" ]]; then sour2=" Source Tag";fi
	writ=$(exiv2 -g Iptc.Application2.Writer -Pv $line)
	if [[ "$writ" == "" ]]; then writ2=" Writer Tag";fi
	trans=$(exiv2 -g Iptc.Application2.TransmissionReference -Pv $line)
	if [[ "$trans" == "" ]]; then trans2=" Transmission Reference Tag";fi
	fixid=$(exiv2 -g Iptc.Application2.FixtureId -Pv $line)
	if [[ "$fixid" == "" ]]; then fixid2=" Event/Fixture Identifier Tag";fi
	objnm=$(exiv2 -g Iptc.Application2.ObjectName -Pv $line)
	if [[ "$objnm" == "" ]]; then objnm2=" Object Name Tag";fi
	locn=$(exiv2 -g Iptc.Application2.SubLocation -Pv $line)
	if [[ "$locn" == "" ]]; then locn2=" Location Tag";fi
echo -e "Evaluating File #$x \ $line."
iptc=$(echo $cate2$cred2$sour2$writ2$trans2$fixid2$objnm2$locn2)
case "$iptc" in 
 "" ) ;;
 * ) echo -e "$line:$iptc">>fieldsmissing.txt 
 y=$[y+1]
 ;;
esac
x=$[x+1]
unset cate2
unset cred2
unset sour2
unset writ2
unset trans2
unset fixid2
unset objnm2
unset iptc
done<list.txt
if [ "$y" -gt "0" ]; then
	echo -ne "\n$x JPEG files evaluated,\nwith $y missing some or all of their required IPTC tag data.\nFile-by-file is in fieldsmissing.txt.\n"
else 
	echo -ne "\n$x JPEG files evaluated, none missing any required IPTC tag data.\nNo new information was written to fieldsmissing.txt.\n"
fi
IFS=$SAVEIFS

I wouldn't pick on this script for any reason but this one: since I uninstalled one or two binaries from the Cygwin mirrors and from elsewhere, it is the only script (written by yours truly within the last two months) that consistently drives CPU usage, as I've observed in Task Manager, to between 94% and 100% while it runs. That is Cygwin "biting itself in the a**," to coin a phrase: the longer the cycle time, the longer a script takes to finish.

Anyone's advice on how to streamline the code would be much appreciated.

BZT
# 2  
09-20-2010
Each $( exiv2 ) call creates a child process (fork/exec), which is very expensive: it creates a process, opens and reads the executable, opens the input file, etc.

You have 8 of those per line. You are probably creating 10K+ child processes, which is a huge amount of CPU overhead, and you still haven't done anything toward getting your result. One exiv2 call can print everything at once; you can then capture all of that data in one variable (with command substitution, $( ... )) and parse it.
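A minimal sketch of that suggestion, assuming `exiv2 -Pkv FILE` prints one "key value" line per tag (the form used later in this thread). It starts one exiv2 process per file instead of eight and checks all eight keys with bash builtins only. The names `missing_tags`, `check_list`, `keys` and `labels` are made up for illustration, and a tag that is present but empty is counted as missing, as in the original script.

```shell
#!/bin/bash
# Sketch: one exiv2 call per file instead of eight.
# Assumption: `exiv2 -Pkv FILE` prints one "KEY   VALUE" line per tag.

# Required IPTC keys and the report label for each (same order).
keys=(Category Credit Source Writer TransmissionReference FixtureId ObjectName SubLocation)
labels=("Category Tag" "Credit Tag" "Source Tag" "Writer Tag"
        "Transmission Reference Tag" "Event/Fixture Identifier Tag"
        "Object Name Tag" "Location Tag")

# missing_tags TEXT
# Sets $missing to " Label1 Label2 ..." for each required key that is absent
# from TEXT or has an empty value, or to "" if nothing is missing.
# Builtins only; it sets a variable instead of printing, so the caller
# doesn't need another $( ) subshell (one more fork on Cygwin).
missing_tags() {
    local key value i found=" "
    missing=""
    # Remember every key that carries a non-empty value.
    while IFS=$' \t' read -r key value; do
        if [[ -n $value ]]; then
            found+="$key "
        fi
    done <<< "$1"
    for i in "${!keys[@]}"; do
        if [[ $found != *" Iptc.Application2.${keys[i]} "* ]]; then
            missing+=" ${labels[i]}"
        fi
    done
}

# check_list LISTFILE: check every file named in LISTFILE (one per line).
check_list() {
    local line x=0 y=0
    while IFS= read -r line; do
        x=$((x + 1))
        echo "Evaluating file #$x: $line"
        # The only external process per file; </dev/null keeps exiv2
        # from ever reading the list file on the loop's stdin.
        missing_tags "$(exiv2 -Pkv "$line" 2>/dev/null </dev/null)"
        if [[ -n $missing ]]; then
            printf '%s:%s\n' "$line" "$missing" >> fieldsmissing.txt
            y=$((y + 1))
        fi
    done < "$1"
    printf '\n%d JPEG files evaluated, %d missing required IPTC tag data.\n' "$x" "$y"
}

# Usage: check_list list.txt
```

Each report line has the same "file: Label Label" shape the original wrote to fieldsmissing.txt, and because `missing` is rebuilt from scratch on every call, no value can carry over from an earlier file.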
# 3  
09-21-2010
I was just thinking of that.

This dovetails, coincidentally, with something the author of Exiv2, Andreas Huggel, and I were discussing on his project's forum just yesterday and the day before. The -g option in his command-line tool, as implemented, has one shortcoming: for metadata tags, keys and fields that can hold multiple entries, -g returns only the first one.

Right now, you have to use one of the capital-P ("Print") options with modifiers, specifically -Pnv, to get the multi-line data in two columns, and then grep for the key name you want. I was thinking of a similar approach, one ...
Code:
fullfiledata=$(exiv2 -Pnv $line)

...followed by greps and cuts (or their equivalents using IFS settings and other builtins) to break the variable fullfiledata down into its component parts, which can then be evaluated by the if/then/fi conditionals and printed to a file the same way the current script does (unless there's a better approach to that as well).

Eight calls at once is indeed heavy loading. Especially with Cygwin, which handles that sort of thing pißß-poorly anyway, it's well-known.

This may also solve another problem with the script: it hasn't been clearing those *foo2* variables consistently. Before I used the unsets, I had a single line of variable=;nextvariable=; and so forth, which did the job, but (idiot me) I thought that was what was slowing things down. The proof was in the pudding, though: a fieldsmissing.txt file I had GNOME Terminal create listed several items with the same missing tag, and when I checked them in XnView, the pictures were not missing the tag the list said they were. The root of that problem: a value left hanging in a variable from evaluating some earlier item (maybe one two dozen lines back in list.txt).
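One likely cause is visible in the first script: `locn2` is set inside the loop but is missing from the list of `unset`s, so once one file lacks a location tag, every later file is reported as lacking it too. A safer pattern is to reset each flag at the top of every iteration. The sketch below simulates three files with an invented `locations` array instead of calling exiv2; `report` is a hypothetical name.

```shell
#!/bin/bash
# Invented SubLocation values for three files:
# the first file has none, the other two do.
locations=("" "Stadium" "Arena")

report() {
    local i locn2
    for i in "${!locations[@]}"; do
        locn2=""                      # reset at the top of every iteration
        if [[ -z ${locations[i]} ]]; then
            locn2=" Location Tag"
        fi
        echo "file$i:$locn2"
    done
}

report
# file0: Location Tag
# file1:
# file2:
```

Without the `locn2=""` line, file1 and file2 would also print " Location Tag", which is the kind of false report described above.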

If "one call [to Exiv2] does it all," the variable-clearing issue and the CPU load problem may both go away. I'll keep the unsets just in case. Now I think I should ask which builtins I can use to break down the "fullfiledata" variable. I've been an 'externals junkie' in my scripting for so long that I have yet to get my mind around the builtins in any big way.
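For the builtins question, and for keys that can repeat (the case where -g returns only the first entry), here is a sketch that collects every value of one key from the captured output into an array using only `read` and `[[ ]]`. It assumes the one "key value" pair per line layout of -Pkv output; `tag_values` and `values` are hypothetical names, and `Iptc.Application2.Keywords` is just an example of a repeatable IPTC key.

```shell
#!/bin/bash
# tag_values KEY TEXT
# Fills the array `values` with every value recorded for KEY in TEXT,
# e.g. TEXT captured with: fullfiledata=$(exiv2 -Pkv "$line")
# Assumption: each line of TEXT looks like "KEY   VALUE".
tag_values() {
    local want=$1 key value
    values=()
    while IFS=$' \t' read -r key value; do
        # Quoting "$want" makes this an exact string match, not a pattern.
        if [[ $key == "$want" ]]; then
            values+=("$value")
        fi
    done <<< "$2"
}

# Example with invented data: Keywords appears twice, Credit once.
fullfiledata=$'Iptc.Application2.Keywords  football\nIptc.Application2.Credit  Jane Doe\nIptc.Application2.Keywords  cup final'
tag_values Iptc.Application2.Keywords "$fullfiledata"
printf '%s\n' "${values[@]}"      # prints "football", then "cup final"
echo "${#values[@]} values"       # prints "2 values"
```

Setting IFS only for the `read` command keeps the script's global IFS=$'\n\b' from interfering with the split into key and value.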

Thanks for the help so far. Hoping it continues.

BZT

Last edited by SilversleevesX; 09-21-2010 at 10:11 AM. Reason: Completed a complex thought; style slimming, rephrasing.
# 4  
09-21-2010
Quote:
Originally Posted by SilversleevesX
Eight calls at once is indeed heavy loading. Especially with Cygwin, which handles that sort of thing pißß-poorly anyway, it's well-known.
Small correction here: the problem ain't Cygwin, it's Windows. UNIX software (including software running inside Cygwin) relies heavily on fork() being a fast process-spawning call. On Windows, however, creating a new process is very, very slow compared to creating a new thread, which is Microsoft's preferred model. Each command substitution that runs an external program costs a fork+exec; combine that with slow process creation and... well, you've seen the results.
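A rough way to see this cost for yourself is to time a loop that opens a `$( )` subshell on every iteration against one that uses a plain builtin assignment. This is a sketch, not a benchmark: `compare_cost` is a hypothetical name and the default count of 1000 is arbitrary. If the point above holds, the gap between the two timings should be much wider on Cygwin than on Linux, though the actual figures depend on the machine.

```shell
#!/bin/bash
# compare_cost [N]: time N command substitutions against N builtin assignments.
compare_cost() {
    local n=${1:-1000} i v
    echo "With a \$( ) subshell per iteration:"
    time for ((i = 0; i < n; i++)); do v=$(echo x); done
    echo "With a builtin assignment only:"
    time for ((i = 0; i < n; i++)); do v=x; done
}

# Usage: compare_cost 1000
```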
# 5  
09-21-2010
Be that as it may...

I'm trying to re-create the script using one call to exiv2. Grepping the output of "-Pnv"/"-Pkv" is no good, because when that output is assigned to a variable it comes back as one block of text. So I have it writing "fullfiledata" to an ASCII file, like this:
Code:
fullfiledata=$(exiv2 -Pkv $line>file$x)
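Two things about that line, shown with invented sample data so the sketch runs without exiv2. First, because `>file$x` sends exiv2's output into the file, the `$( )` has nothing to capture and `fullfiledata` stays empty. Second, the data doesn't need a temp file at all: captured into a variable and expanded inside double quotes, its line breaks survive. An unquoted `echo $fullfiledata` word-splits the value and prints it on one line, which may be the "block of text" described above.

```shell
#!/bin/bash
# Invented sample standing in for: fullfiledata=$(exiv2 -Pkv "$line" 2>/dev/null)
fullfiledata=$'Iptc.Application2.Category  Sports\nIptc.Application2.Credit  Jane Doe'

# 1. A redirect inside $( ) leaves nothing for the substitution to capture.
captured=$(printf '%s\n' "$fullfiledata" > /dev/null)
echo "captured: [$captured]"          # captured: []

# 2. Unquoted, the value is word-split and echo prints it on one line.
echo $fullfiledata

# 3. Quoted, the line breaks survive: one tag per line.
printf '%s\n' "$fullfiledata"

# 4. A builtin pattern test finds a key without grep or a temp file
#    (the leading newline anchors the key at the start of a line).
if [[ $'\n'$fullfiledata == *$'\n'"Iptc.Application2.Credit "* ]]; then
    echo "Credit tag present"
fi
```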

Calling any part of that back, I'm reminded of the problems I had when first writing a script to perform this function.
Code:
cate1=$(grep Iptc.Application2.Category file$x)
	if [[ -z "$cate" ]]; then cate2=" Category Tag";fi

On the command line, run as two separate lines, these leave cate2 empty if there is indeed a line such as "Iptc.Application2.Category" in file$x. The same goes for every other variable set by grep, through to the end of the set of conditionals (the other 7 tags that exiv2 -g fetched in the original script).

The problem occurs when assembling the variable iptc: something ignores any and all empties and fills it with the "if set but empty" value of foo2. Printing iptc to a file, then, gives me a "false report" on every file from the first one in list.txt to the last (or until I get so annoyed that I terminate the script, which is more likely at this point).

So far, adding "else" statements that set a particular foo2 variable to nothing has made no difference.

Here's the revised code, pulling out all the stops and (I think) allowing for most possible combinations of missing fields.
Code:
#!/bin/bash
SAVEIFS=$IFS
IFS=$(echo -en "\n\b")
x=1
while read 'line';
do
#8-point check:
	
	fullfiledata=$(exiv2 -Pkv $line>file$x 2>/dev/null)
	cate1=$(grep -o Iptc.Application2.Category file$x)
	if [[ "$cate1" == * ]]; then cate2=" Category Tag";fi
	if [[ "$cate2" != * ]]; then iptc=$(echo "$cate2" ); fi
	cred1=$(grep -o Iptc.Application2.Credit file$x)
	if [[ "$cred" == * ]]; then cred2=" Credit Tag";fi
	if [[ "$cred2" != * ]]; then iptc=$(echo "$cred2" ); fi
	sour1=$(grep -o Iptc.Application2.Source file$x)
	if [[ "$sour" == * ]]; then sour2=" Source Tag";fi
	if [[ "$sour2" != * ]]; then iptc=$(echo "$sour2" ); fi
	writ1=$(grep -o Iptc.Application2.Writer file$x)
	if [[ "$writ" == * ]]; then writ2=" Writer Tag";fi
	if [[ "$writ2" != * ]]; then iptc=$(echo "$writ2" ); fi
	trans1=$(grep -o Iptc.Application2.TransmissionReference file$x)
	if [[ "$trans" == * ]]; then trans2=" Transmission Reference Tag";fi
	if [[ "$trans2" != * ]]; then iptc=$(echo "$trans2" ); fi
	fixid1=$(grep -o Iptc.Application2.FixtureId file$x)
	if [[ "$fixid" == * ]]; then fixid2=" Event/Fixture Identifier Tag";fi
	if [[ "$fixid2" != * ]]; then iptc=$(echo "$fixid2" ); fi
	objnm1=$(grep -o Iptc.Application2.ObjectName file$x)
	if [[ -z "$objnm1" ]]; then objnm2=" Object Name Tag";fi
	if [[ "$objnm2" != * ]]; then iptc=$(echo "$objnm2" ); fi
	locn1=$(grep -o Iptc.Application2.SubLocation file$x)
	if [[ "$locn" == * ]]; then locn2=" Location Tag";fi
	if [[ "$locn2" != * ]]; then iptc=$(echo "$locn2" ); fi
echo -e "Evaluating File #$x \ $line."
if [[ "$cate2" != * ]] && [[ "$cred2" != * ]]; then iptc=$(echo "$cate2$cred2"); fi
if [[ "$cate2" != * ]] && [[ "$cred2" != * ]] && [[ "$sour2" != * ]]; then iptc=$(echo "$cate2$cred2$sour2"); fi
if [[ "$cate2" != * ]] && [[ "$cred2" != * ]] && [[ "$sour2" != * ]] && [[ "$writ2" != * ]]; then iptc=$(echo "$cate2$cred2$sour2$writ2"); fi
if [[ "$cate2" != * ]] && [[ "$cred2" != * ]] && [[ "$sour2" != * ]] && [[ "$writ2" != * ]] && [[ "$trans2" != * ]]; then iptc=$(echo "$cate2$cred2$sour2$writ2$trans2"); fi
if [[ "$cate2" != * ]] && [[ "$cred2" != * ]] && [[ "$sour2" != * ]] && [[ "$writ2" != * ]] && [[ "$trans2" != * ]] && [[ "$fixid2" != * ]]; then iptc=$(echo "$cate2$cred2$sour2$writ2$trans2$fixid2"); fi
if [[ "$cate2" != * ]] && [[ "$cred2" != * ]] && [[ "$sour2" != * ]] && [[ "$writ2" != * ]] && [[ "$trans2" != * ]] && [[ "$fixid2" != * ]] && [[ "$objnm2" != * ]]; then iptc=$(echo "$cate2$cred2$sour2$writ2$trans2$fixid2$objnm2"); fi
if [[ "$cate2" != * ]] && [[ "$cred2" != * ]] && [[ "$sour2" != * ]] && [[ "$writ2" != * ]] && [[ "$trans2" != * ]] && [[ "$fixid2" != * ]] && [[ "$objnm2" != * ]] && [[ "$locn" != * ]]; then iptc=$(echo "$cate2$cred2$sour2$writ2$trans2$fixid2$objnm2"); fi
case "$iptc" in 
 "" ) ;;
 * ) echo -e "$line:$iptc">>fieldsmissingm.txt 
 y=$[y+1]
 ;;
esac
if [[ "$iptc" -gt "" ]]; then
	echo -e "$line:$iptc">>fieldsmissingm.txt 
	y=$[y+1]
fi
rm file$x
x=$[x+1]
done<msxm-eventmissing.txt
if [ "$y" -gt "0" ]; then
	echo -ne "\n$x JPEG files evaluated,\nwith $y missing some or all of their required IPTC tag data.\nFile-by-file is in fieldsmissing.txt.\n"
else 
	echo -ne "\n$x JPEG files evaluated, none missing any required IPTC tag data.\nNo new information was written to fieldsmissing.txt.\n"
fi
IFS=$SAVEIFS
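A likely reason the report can't be trusted: inside `[[ ]]`, an unquoted `*` on the right of `==` or `!=` is a pattern that matches any string, the empty one included. So every `== *` test above is true and every `!= *` test is false: the foo2 labels get set for every file whatever grep found, and none of the lines that assign `iptc` ever run. (Several tests also read `$cred`, `$sour` and so on rather than the `cred1`, `sour1` that grep just set, and `-gt` is a numeric comparison, not a reliable non-empty test.) A minimal sketch of the intended logic, with invented grep results standing in for one file:

```shell
#!/bin/bash
# Inside [[ ]], an unquoted * on the right of == or != is a pattern that
# matches ANY string, the empty string included:
v=""
[[ $v == * ]] && echo '[[ "" == * ]] is true'
[[ $v != * ]] || echo '[[ "" != * ]] is false'

# What the tests were meant to ask: -z means "empty", -n means "not empty".
# Invented grep results for one file: Credit was found, Category was not.
cate1=""
cred1="Iptc.Application2.Credit"

iptc=""                                   # reset for every file
[[ -z $cate1 ]] && iptc+=" Category Tag"
[[ -z $cred1 ]] && iptc+=" Credit Tag"
# ...the same one-line pattern for the other six keys...

# Appending covers every combination of missing fields, so the chain of
# && tests isn't needed, and the report is written in exactly one place.
if [[ -n $iptc ]]; then
    echo "example.jpg:$iptc"              # example.jpg: Category Tag
fi
```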

BZT
# 6  
09-21-2010
Exiv2 has a C++ library that can do this.

If this is a production matter, consider writing C code, where you have infinitely more control. IMO your shell script is going to be hell to maintain, especially if someone else gets handed it for a bug fix.

Unix dictum:
If it can't be done reasonably in a shell script,
if it can't be done reasonably in (perl ruby python),
then go to C.

I think you have reached the 'go to C' stage in this project.
# 7  
09-22-2010
Thanks, Jim.

I'll start by asking some folks on the Exiv2 forum who appreciate new use concepts for the C++ library, and who know far better than I do how to write code that employs (I didn't want to use the word "exploit," though it may suit better) the resources in both.

BZT
