I'd like to streamline the code quite a bit to get it to run faster.
There's a thread about this and related issues of mine on the Cygwin mailing-list, but I want to eliminate any chance that it's just inefficient/inelegant/crappy code. A previous run of the same script in Cygwin and in Ubuntu 9.04's GNOME Terminal, using a source file and a list file of equal length and content, showed a large difference in execution time: about 2m20s in GNOME Terminal versus just under 45 minutes in Cygwin.
The code:
I wouldn't single out this script for any reason other than this: since I uninstalled one or two binaries from the Cygwin mirrors and elsewhere, it is the only script (written by yours truly within the last two months) that consistently drives CPU usage, as I've observed in Task Manager, to between 94 and 100% while it runs. That is Cygwin "biting itself in the a**," to coin a phrase: the longer each cycle takes, the longer the script takes to finish.
Anyone's advice on how to streamline the code would be much appreciated.
Each $( exiv2 ) call creates a child process (fork + exec), which is very expensive: it creates a process, opens and reads the executable, opens the input file, and so on.
You have eight of them per line of input, so you are probably creating 10K+ child processes. That is a huge amount of CPU overhead, and it still does nothing toward getting your result. One exiv2 call can print everything to a file at once; you can then use process substitution to get all of the data into one variable that you can parse.
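A minimal sketch of the one-call idea in bash, assuming exiv2 -Pkv prints one "key value" pair per line (that option comes up later in this thread). The names load_tags and tags are hypothetical. The loop reads from a redirect of a process substitution rather than from a pipe, so it runs in the current shell and the array is still populated afterwards; exiv2 -Pkv "$file" | while read ... would fill the array in a subshell and lose it.

```shell
#!/usr/bin/env bash
# Sketch: read every tag from ONE exiv2 call into an associative array (bash 4+).
# Assumption: input lines look like "<key> <value>", as printed by exiv2 -Pkv.
declare -A tags

load_tags() {
    local key value
    tags=()                                  # start clean for every file
    while read -r key value; do
        [[ -n $key ]] || continue            # skip blank lines
        if [[ -n ${tags[$key]+set} ]]; then
            tags[$key]+=$'\n'"$value"        # repeated key (multi-entry tag): keep every value
        else
            tags[$key]=$value
        fi
    done
}

# Real use (one child process per file instead of eight):
#   load_tags < <(exiv2 -Pkv "$file")
# Demonstration with made-up sample lines instead of a real exiv2 run:
load_tags < <(printf '%s\n' \
    'Iptc.Application2.Category  ABC' \
    'Iptc.Application2.Keywords  sunset' \
    'Iptc.Application2.Keywords  beach')
printf '%s\n' "${tags[Iptc.Application2.Keywords]}"   # prints sunset and beach on two lines
```

This is still one fork+exec per file, but one instead of eight.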
This dovetails, coincidentally, with something the author of Exiv2, Andreas Huggel, and I were discussing on his project's forum just yesterday and the day before. The "-g" option in his command-line tool, as implemented, has one shortcoming: for metadata tags, keys and fields that can hold multiple entries, -g returns only the first one.
Right now, you have to use one of the capital-P (for "Print") options with modifiers, specifically -Pnv, to get the multi-entry data in two columns, and then grep for the key name you want. I was thinking of a similar approach, one with successive greps and cuts (or their equivalents using IFS settings and other builtins) to break the variable fullfiledata down into its component parts, which can then be evaluated by the if/then/fi conditionals and printed to a file the same way the current script does (unless there's a better approach to that as well).
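As a sketch of the "builtins instead of grep/cut" idea, assuming fullfiledata holds -Pnv or -Pkv style output (a name or key, whitespace, then the value on each line): the hypothetical function get_tag below collects every value for one name into a hypothetical global array values. Multi-entry tags are not cut down to the first entry the way -g is described above, and because the result comes back in an array rather than through $( ... ), no extra subshell is forked.

```shell
#!/usr/bin/env bash
# Sketch: pull every value for one tag out of a multi-line variable using only
# bash builtins (read, [[ ]], a here-string), so no grep/cut child processes are created.
# Assumption: each line is "<name-or-key> <value>", as exiv2 -Pnv / -Pkv print it.
get_tag() {
    local want=$1 k v
    values=()                                   # hypothetical global result array
    while read -r k v; do
        [[ $k == "$want" ]] && values+=("$v")   # keep ALL matches, not just the first
    done <<< "$2"
    return 0
}

# Made-up stand-in for the real exiv2 -Pnv output:
fullfiledata=$'Keywords  sunset\nKeywords  beach\nCategory  ABC'
get_tag Keywords "$fullfiledata"
printf '%s\n' "${values[@]}"                    # prints sunset and beach on two lines
```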
Eight calls per line is indeed a heavy load, especially with Cygwin, which is well known to handle that sort of thing pißß-poorly anyway.
This may solve another problem with the script: it hasn't been clearing those *foo2* variables consistently. Before I used the unsets, I had one line of just variable=;nextvariable=; and so forth, which did the job, but (idiot me) I thought that was what was slowing things down. The proof was in the pudding, though: a fieldsmissing.txt file I had my GNOME Terminal create had several line items with the same missing tag, and when I checked them in XnView, the pictures were not missing the tag the list said they were. The root of that problem: a hung variable value left over from evaluating some earlier line item (maybe one two dozen lines back in list.txt).
If "one call [to Exiv2] does it all," the variable-clearing issue and the CPU load problem may both go away. I'll keep the unsets just in case. Now I think I should ask which builtins I can use to break down the "fullfiledata" variable. I've been an external-command 'junkie' in my scripting for so long that I've yet to get my mind around them all in a big way.
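One way to rule out a value hanging over from an earlier line item is to check each file inside a function whose variables are declared local with an explicit empty value, so every call starts clean without relying on unset. A sketch, assuming tag lines arrive on stdin as "key value" (for example from exiv2 -Pkv); the function name report_missing and the choice of Category and Caption as the checked tags are illustrative only.

```shell
#!/usr/bin/env bash
# Sketch: check one file's tags inside a function whose variables are `local`,
# so nothing can survive from the previous file in list.txt.
# Assumption: tag lines arrive on stdin as "<key> <value>" (e.g. from exiv2 -Pkv).
report_missing() {
    local file=$1 key value category="" caption=""   # reset on every call
    while read -r key value; do
        case $key in
            Iptc.Application2.Category) category=$value ;;
            Iptc.Application2.Caption)  caption=$value ;;
        esac
    done
    [[ -n $category ]] || printf '%s: Category missing\n' "$file"
    [[ -n $caption ]]  || printf '%s: Caption missing\n' "$file"
    return 0
}

# Possible driver loop (untested against the original script, which isn't shown):
#   while IFS= read -r file; do
#       report_missing "$file" < <(exiv2 -Pkv "$file")
#   done < list.txt > fieldsmissing.txt
# Demonstration with made-up input: a.jpg has both tags, b.jpg lacks Category.
report_missing a.jpg < <(printf '%s\n' 'Iptc.Application2.Category  ABC' 'Iptc.Application2.Caption  Beach')
report_missing b.jpg < <(printf '%s\n' 'Iptc.Application2.Caption  Dunes')
# prints: b.jpg: Category missing
```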
Thanks for the help so far. Hoping it continues.
BZT
Last edited by SilversleevesX; 09-21-2010 at 10:11 AM.
Reason: Completed a complex thought; style slimming, rephrasing.
Small correction here: the problem ain't Cygwin, it's Windows. UNIX software (including software running inside Cygwin) relies heavily on fork() being a fast process-spawning call. On Windows, however, creating a new process is very, very slow compared to creating a new thread, which is Microsoft's preferred model. Since each subshell is a fork+exec, combined with slow process creation... well, you've seen the results.
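To make the fork+exec point concrete: much of the per-line cost in scripts like this comes from small $( ... ) pipelines around echo, cut, basename and the like. Bash parameter expansion does the same string work inside the shell, with no new process. A sketch; the sample line and path are made up.

```shell
#!/usr/bin/env bash
# Sketch: bash parameter expansions that replace common one-line external commands.
# Each replaced $( ... ) pipeline saves a fork+exec, which is the slow part on Windows.
line='Iptc.Application2.Category  ABC'
path='/cygdrive/c/photos/img001.jpg'

key=${line%% *}      # instead of: $(echo "$line" | cut -d' ' -f1)
name=${path##*/}     # instead of: $(basename "$path")
dir=${path%/*}       # instead of: $(dirname "$path")   (for paths that contain a '/')
stem=${name%.*}      # instead of: $(basename "$path" .jpg)   (when the extension is .jpg)
is_iptc=""
if [[ $line == Iptc.* ]]; then   # instead of: echo "$line" | grep -q '^Iptc\.'
    is_iptc=yes
fi
printf '%s\n' "$key" "$name" "$dir" "$stem" "$is_iptc"
```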
I'm trying to re-create the script using one call to exiv2. "-Pnv"/"-Pkv" and grep are no good, because the output of the former is a block of text when set to a variable. So I have it writing "fullfiledata" to an ASCII file thus:
Calling any part of that back, I'm reminded of the problems I had when first writing a script to perform this function.
On the command line, as two separate lines, these will return cate2 as empty if indeed there is such a line as "Iptc.Application2.Category" in file x. Likewise for every variable set to a tag by grep to the end of the set of conditionals (the other 7 tags Exiv2 grepped in the original script).
The problem occurs when assembling the variable iptc: something ignores any and all empties and stocks it with the "if set but empty" value of foo2. Printing iptc to a file therefore gives me a "false report" on every file from the first one in list.txt to the last (or until I get annoyed enough to terminate the script, which is more likely at this point).
It has so far made no difference to include "else" statements setting the value of a particular second variable to nothing.
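The original code isn't shown, so this is only a guess at the "if set but empty" behaviour: in bash, ${v+word} expands to word whenever v is set, even to the empty string, while ${v:+word} expands to word only when v is set and non-empty. Using the first where the second was meant could make an empty marker look like a missing tag on every file, and adding else branches that set the variable to nothing would not help, because it is still set. The sketch assumes the thread's convention that cate2 is empty when the tag was found; capt2 is a hypothetical second marker.

```shell
#!/usr/bin/env bash
# Sketch: ${v+word} fires when v is SET (even to ""); ${v:+word} only when v is non-empty.
# Mixing them up makes an empty "missing" marker look like a real one.
cate2=""                 # set, but empty: tag was found
capt2="Caption"          # non-empty: tag is missing (hypothetical marker)

echo "with +  : [${cate2+Category}]"     # prints [Category]  (set, though empty)
echo "with :+ : [${cate2:+Category}]"    # prints []          (empty, so nothing)

# Build the report line only from markers that are non-empty.
iptc=""
for marker in "$cate2" "$capt2"; do
    iptc+=${marker:+"$marker "}
done
iptc=${iptc% }           # drop the trailing space
printf 'missing: %s\n' "$iptc"           # prints: missing: Caption
```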
Here's the revised code, pulling out all the stops and (I think) allowing for most possible combinations of missing fields.
BZT
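Regarding the -Pnv/-Pkv output turning into "a block of text when set to a variable": the code isn't shown, but a common cause is expanding the variable unquoted (echo $fullfiledata | grep ...), which lets word splitting turn every newline into a space. Quoting the variable, or feeding it to grep through a here-string, keeps one tag per line, so an intermediate ASCII file isn't needed. The sample text below is made up.

```shell
#!/usr/bin/env bash
# Sketch: a multi-line value only collapses into "one block" when expanded unquoted.
# Made-up stand-in for fullfiledata=$(exiv2 -Pnv "$file"):
fullfiledata=$'Category  ABC\nKeywords  sunset\nKeywords  beach'

# Unquoted: word splitting turns the newlines into spaces, so grep sees ONE line
# and returns the whole block.
echo $fullfiledata | grep 'Keywords'
# prints: Category ABC Keywords sunset Keywords beach

# Quoted (here-string): each tag stays on its own line, so grep returns only the matches.
grep '^Keywords' <<< "$fullfiledata"
# prints the two Keywords lines
```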
If this is a production matter, consider writing C code, where you have infinitely more control. IMO your shell script is going to be hell to maintain, especially if someone else gets it to do a bug fix.
Unix dictum:
If it can't be done reasonably in shell script,
and it can't be done reasonably in (perl ruby python),
then go to C.
I think you have reached the 'go to C' stage in this project.
I'll start by asking some folks on the Exiv2 forum who appreciate new usage concepts for the C++ library and know far better than I do how to write code that employs (I didn't want to use the word "exploit," though it may suit better) the resources in both.
BZT
Quote:
Originally Posted by jim mcnamara
exiv/exif has C/C++ libraries to do this.