Need help optimizing this piece of code (Shell script Busybox)
I am looking for suggestions on how I could optimize the piece of code where this script spends most of its time. In a nutshell, the script creates one or more XML files, based on certain criteria, that will be used by a movie jukebox.
Example of data:
$SORTEDTMP is a file, created using find, that contains all the movies (200+) on my drive. For example:
/USB/movies/science-fiction/alien/alien.avi
/USB/movies/comedy/funnymovie/funnymovie.avi
....etc
Movieinfo is just an XML file containing data about the movie. I only care about extracting the movie title from it if the file exists; otherwise I use the filename as the movie title, which may or may not be accurate.
All the other files (jpg, bmp) may or may not exist, and appropriate action is taken based on their existence.
Currently the XML file is appended to for every movie that matches the grep on MOVIESPATH, which could include all the movies. This part of the code is called many times to create an XML file for each movie category, including "All movies". I am not sure whether it would be faster to store all the data temporarily in memory and write the file once before leaving the loop. I don't know, and I do not have much experience.
So I'll ask again: does anyone see anything there that could be optimized to significantly increase the running speed of this piece of code?
The performance problem is not due to your redirection -- that's not ideal, but doesn't really HURT you that much.
It's mostly due to running grep | sed | awk | cut | kitchen | sink repeatedly...
For instance I might do something like
Running one awk for the whole batch is much faster than running 10,000 awks for 10,000 lines. And you may be able to use awk for more than one thing here, with its output fed into several variables in a loop, which would let you do even more with one program. You could probably even move the grep inside awk.
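As a rough sketch of that idea, assuming an awk is available at all: the function below filters the list by path prefix (the job grep was doing) and derives a title from the file name, all in one awk process. The name list_movies and the tab-separated output are my own choices for illustration, not something taken from the original script.

```shell
#!/bin/sh
# Hypothetical helper: print "path<TAB>title" for every movie under a prefix.
# One awk process handles the whole list instead of one process per line.
# $1 = list file (one path per line), $2 = path prefix to keep
list_movies() {
    awk -v prefix="$2" '
        index($0, prefix) == 1 {          # keep only paths under the prefix
            n = split($0, parts, "/")     # last component is the file name
            title = parts[n]
            sub(/\.[^.]*$/, "", title)    # drop the final extension
            print $0 "\t" title
        }' "$1"
}

# Possible use inside the script (variable names from the original post):
#   list_movies "$SORTEDTMP" "$MOVIESPATH" |
#   while IFS="$(printf '\t')" read -r plik MovieName; do
#       ...
#   done
```

The loop then gets both values per line from a single read, with no extra processes started per movie.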
You can also move your redirection out of the loop completely:
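A minimal sketch of that, with placeholder tag names since I don't know the real output format. The function name write_entries is hypothetical; the point is only that the output file is opened once for the whole loop instead of once per echo.

```shell
#!/bin/sh
# Hypothetical sketch: read paths from $1 and write one <movie> element per
# path to $2. The redirections sit on "done", so both files are opened once.
write_entries() {
    while IFS= read -r plik; do
        echo "<movie>"
        echo "<file>$plik</file>"     # placeholder element, not the real format
        echo "</movie>"
    done < "$1" > "$2"
}
```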
For any further processing needed, it's much more efficient to use shell builtins than awk | cut | sed | kitchen | sink. This being busybox and not bash, you will have rather few builtins available, but it's still not impossible.
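For example, the title can be derived from a path with parameter expansion alone, which starts no external process. This is a sketch that assumes the shell on the device supports the POSIX ${var##pattern} and ${var%pattern} forms; movie_title is a made-up name.

```shell
#!/bin/sh
# Hypothetical helper: derive a title from a path using only builtins.
movie_title() {
    name=${1##*/}      # strip everything up to the last "/" (like basename)
    name=${name%.*}    # strip the final ".ext" (like the sed expression)
    printf '%s\n' "$name"
}
```

Calling it as $(movie_title "$plik") still forks a subshell, so inside a tight loop it is cheaper to write the two expansions inline and assign straight to the variable.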
Unfortunately, as you have not posted your input data, I have no idea what data needs to be processed into what. What those XML files look like, and exactly what the output from that grep looks like, would be good.
Unfortunately this script needs to run on a media player, and awk is not available on it. Of course I could always copy it there, but I would rather use the default tools available on the media player because that makes the script easier to distribute.
The input file for this loop, as I mentioned in the first post, is the list of all the movies on my hard drive, created using the find command:
Even if there's no explicit link made for it, you can run it as busybox awk. Try it and see. awk will make everything easier if you have it.
You would think that awk would be available, but unfortunately they are using an old version of busybox, 1.1.3, that they recompiled themselves to be as small as possible, since it is flashed onto an embedded device (the media player) that only has 128 MB of NAND flash. Trust me, it is not available: running busybox by itself lists all the available commands, and awk is not one of them. If it were, I would definitely use it; it would make my life a lot easier.
Quote:
Originally Posted by Corona688
Lastly, you forgot to post data from your XML files. I can't do anything without seeing the data I'm supposed to transform, with or without awk.
The input to the script, $SORTEDTMP, is not an XML file but a list of all the movies in my movie directory, as explained in the previous post.
The output is XML and can be deduced from the code. For example, for the input from the previous post and $MOVIEPATH = "/USB/movies/science-fiction/", the output created by the code would be:
For all the movies in the /USB/movies/science-fiction/ directory.
I left out the XML header, which is created before the loop, and the footer, which is created after the loop; in case anyone is interested, here it is:
As you can see, all the <movie> elements are appended to the file $RSS (jukebox.xml) on every loop iteration. I am not sure how time consuming this is. If all the data were sent to some buffer, array or variable, and the file $RSS were created only once after the loop is done, would that be more efficient than a file write on each loop iteration? Would it even make things faster? If so, what would be the best method to do that?
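One way to get the "write once" effect without a buffer or array is to group the header, the loop and the footer in braces and redirect the whole group. This is only a sketch: build_feed and the tag names are placeholders, not the real jukebox format. It saves reopening the file on every iteration, which is probably a small gain next to cutting the external commands run per movie.

```shell
#!/bin/sh
# Hypothetical sketch: build_feed LIST OUT writes header, entries and footer
# to OUT through a single redirection on the brace group.
build_feed() {
    {
        echo '<?xml version="1.0"?>'
        echo '<jukebox>'                              # placeholder root element
        while IFS= read -r plik; do
            echo "<movie><file>$plik</file></movie>"  # placeholder entry
        done < "$1"
        echo '</jukebox>'
    } > "$2"
}
```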
Ah!!! You got me there; this XML file contains the description of the movie, e.g. title, synopsis, rating, etc. The only information I am looking for in that XML file, if it is available, is the movie title; I do not care about the rest. In a lot of cases the XML file does not even exist, and the movie title is derived from the base filename with the .avi, .mpg, etc. removed,
see this line: MovieName=`basename "$plik" | sed 's/\(.*\)\..*/\1/'`
Well, at least here's one thing I could improve. I could just do this:
instead of assigning MovieName when it is not even required.
That saves a basename call and a sed call whenever the MovieInfo.nfo file exists.
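A sketch of what that could look like. The fallback branch is the line from the script; the <title> element and the one-line layout of the info file are assumptions, and pick_title is a made-up name.

```shell
#!/bin/sh
# Hypothetical sketch: $1 = movie path, $2 = path to the info file.
pick_title() {
    if [ -e "$2" ]; then
        # Assumption: the title is on one line as <title>...</title>
        sed -n 's:.*<title>\(.*\)</title>.*:\1:p' "$2" | head -n 1
    else
        # Only now pay for basename and sed
        basename "$1" | sed 's/\(.*\)\..*/\1/'
    fi
}
```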
DELIM=`tvar=/'"$OPTARG"'/ svar="$DEFAULT_DELIM" awk 'BEGIN{T=ENVIRON;S=ENVIRON; while(index(T,S)!=0){S=S"0"};print... (0 Replies)