Need help optimizing this piece of code (Shell script Busybox)


 
# 1  
10-03-2011

I am looking for suggestions on how I could optimize this piece of code, which is where my script spends most of its time. In a nutshell, this script creates one or more XML files, based on certain criteria, that will be used by a movie jukebox.

Example of data:

$SORTEDTMP = a file, created using find, that contains all the movies (200+) on my drive, e.g.:
/USB/movies/science-fiction/alien/alien.avi
/USB/movies/comedy/funnymovie/funnymovie.avi
....etc

$MOVIESPATH = a string, e.g. "/USB/movies/science-fiction/"

MovieInfo.nfo = just an XML file containing data about the movie. If the file exists, I only care about extracting the movie title from it, instead of using the file name as the movie title, which may or may not be accurate.

The other .jpg and .bmp files may or may not exist, and the appropriate action is taken based on their existence.

Currently, the XML file is appended to for every movie that matches the grep on $MOVIESPATH, which could be all of the movies. This part of the code is called many times, to create XML files for all the different movie categories, including "All movies". Would it be faster to store all the data in a temporary place in memory and write the file just before leaving the loop? I don't know, and I don't have much experience.

So I'll ask again: does anyone see anything in there that could be optimized to significantly increase the running speed of this piece of code?

Code:
 
# Emit one <Movie> element for every movie under $MOVIESPATH
grep "$MOVIESPATH" "$SORTEDTMP" | while IFS= read -r plik
do
  # File name without directory and extension, and the movie's directory
  MovieName=`basename "$plik" | sed 's/\(.*\)\..*/\1/'`
  DirectoryPath=`dirname "$plik"`
  # Title: from MovieInfo.nfo if present, else from the file name
  if [ -e "$DirectoryPath/MovieInfo.nfo" ];
  then
     MOVIEINFO="$DirectoryPath/MovieInfo.nfo"
     MOVIETITLE=`grep "<title>.*<.title>" "$MOVIEINFO" | sed -e "s/^.*<title/<title/" | cut -f2 -d">"| cut -f1 -d"<"`
  else
     MOVIETITLE=$MovieName
  fi
  # Poster: folder.jpg, then <name>.jpg, else a placeholder
  if [ -e "$DirectoryPath/folder.jpg" ];
  then
     MOVIEPOSTER=$DirectoryPath/folder.jpg
  elif [ -e "$DirectoryPath/${MovieName}.jpg" ];
  then
     MOVIEPOSTER=$DirectoryPath/${MovieName}.jpg
  else
     MOVIEPOSTER=/usr/local/etc/srjg/nofolder.bmp
  fi

  # Info sheet: about.jpg, then 0001.jpg, then <name>_sheet.jpg, else a placeholder
  if [ -e "$DirectoryPath/about.jpg" ];
  then
     MOVIESHEET=$DirectoryPath/about.jpg
  elif [ -e "$DirectoryPath/0001.jpg" ];
  then
     MOVIESHEET=$DirectoryPath/0001.jpg
  elif [ -e "$DirectoryPath/${MovieName}_sheet.jpg" ];
  then
     MOVIESHEET=$DirectoryPath/${MovieName}_sheet.jpg
  else
     MOVIESHEET=/usr/local/etc/srjg/NoMovieinfo.bmp
  fi
  # Append one element; the quotes keep spaces in titles and paths intact
  echo -e '<Movie>
  <title>'"$MOVIETITLE"'</title>
  <poster>'"$MOVIEPOSTER"'</poster>
  <info>'"$MOVIESHEET"'</info>
  <file>'"$plik"'</file>
  </Movie>' >> "$RSS"
done

Any help appreciated. Thank you.
# 2  
10-03-2011
The performance problem is not due to your redirection; that's not ideal, but it doesn't really HURT you that much.

It's mostly due to running grep | sed | awk | cut | kitchen | sink over and over, once for every line...

For instance, I might do something like:
Code:
grep "$STRING" | awk ... | while read ITEM1 ITEM2 ITEM3 ; do ... ; done

Running one awk for the whole batch is much faster than running 10,000 awks for 10,000 lines. You may also be able to use awk for more than one thing here, feeding several variables into a loop, which would let you do even more with one program. You could probably even move the grep inside awk.
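Since the player's BusyBox turns out to have no awk, here is a minimal sketch of the same one-process-for-the-whole-batch idea using sed, which the original script already relies on. One sed run splits every path into directory, name and full path, and a single `read` fills several variables per line. The helper name `split_paths` and the `|` field separator are assumptions; paths containing `|` would break it.

```shell
#!/bin/sh
# split_paths (hypothetical helper): read paths on stdin and print
# "directory|name-without-extension|full path" for each one, using a
# single sed process for the whole list. Assumes no '|' in the paths and
# that every file has an extension (the find/egrep filter ensures that);
# lines that do not match pass through unchanged.
split_paths() {
  sed 's#^\(.*\)/\([^/]*\)\.[^./]*$#\1|\2|&#'
}

# One sed for all movies, several variables filled per loop iteration.
echo "/USB/movies/science-fiction/alien/alien.avi" |
  split_paths |
  while IFS='|' read -r DirectoryPath MovieName plik
  do
    echo "dir=$DirectoryPath name=$MovieName file=$plik"
  done
```

In the real script, `grep "$MOVIESPATH" "$SORTEDTMP" | split_paths | while IFS='|' read -r ...` would replace the per-line basename, dirname and sed calls with one sed for the whole category.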

You can also move your redirection out of the loop completely:

Code:
grep "$STRING" | awk ... | while read ITEM1 ITEM2 ITEM3 ; do echo something ; done >> appended_file
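A minimal sketch of that change applied to the loop from the first post, assuming a POSIX-style shell: the `>>` sits after `done`, so the output file is opened once for the whole loop instead of once per movie. The helper `write_entries` and its three arguments are hypothetical, and only the `<file>` element is written to keep it short.

```shell
#!/bin/sh
# write_entries PATTERN LISTFILE OUTFILE (hypothetical helper)
# Appends one <file> element per matching path. The redirection applies
# to the whole while loop, so OUTFILE is opened once, not once per line.
write_entries() {
  grep "$1" "$2" | while IFS= read -r plik
  do
    echo "<file>$plik</file>"
  done >> "$3"
}
```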

For any further processing needed, it's much more efficient to use shell builtins than awk | cut | sed | kitchen | sink. This being BusyBox and not bash, the builtins available to you will be rather few, but it's still not impossible.
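One concrete example of builtins replacing external programs, assuming the player's ash supports POSIX parameter expansion (`${var%pattern}`, `${var##pattern}`; not verified on BusyBox 1.1.3 specifically): the directory and the extension-less name can be computed without running dirname, basename or sed at all. `split_path` and `FileName` are hypothetical names.

```shell
#!/bin/sh
# split_path PATH (hypothetical helper): sets DirectoryPath and MovieName
# using only shell parameter expansion, so no process is started.
split_path() {
  plik=$1
  DirectoryPath=${plik%/*}    # drop the last "/file" part -> directory
  FileName=${plik##*/}        # drop everything up to the last "/"
  MovieName=${FileName%.*}    # drop the last ".ext", like the original sed
}

split_path /USB/movies/science-fiction/alien/alien.avi
echo "$DirectoryPath"   # /USB/movies/science-fiction/alien
echo "$MovieName"       # alien
```

Unlike dirname, `${plik%/*}` yields an empty string for a file sitting directly under `/`, which does not happen with paths like the ones in this thread.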

Unfortunately, since you have not posted your input data, I have no idea what needs to be processed into what. Seeing what those XML files look like, and exactly what the output from that grep looks like, would help.
# 3  
10-03-2011
Unfortunately, this script needs to run on a media player, and awk is not available on it. Of course, I could always copy it there, but I'd rather use the default tools available on the media player because that makes the script easier to distribute.

As I mentioned in the first post, the input file for this loop is the list of all the movies on my hard drive, created using the find command:

Code:
 
find "$MOVIESPATH" | egrep -i '\.(asf|avi|dat|divx|flv|img|iso|m1v|m2p|m2t|m2ts|m2v|m4v|mkv|mov|mp4|mpg|mts|qt|rm|rmp4|rmvb|tp|trp|ts|vob|wmv)$' | egrep -iv "$FILTER" > $TMP

It is then sorted and saved as $SORTEDTMP.

Hence the input is:

/USB/movies/science-fiction/alien/alien.avi
/USB/movies/comedy/funnymovie/funnymovie.avi
/USB/movies/science-fiction/Ironman/IronMan.avi
etc ....

The grep output depends on $MOVIESPATH, so if it is equal to "/USB/movies/science-fiction/", the grep command produces two lines of output:

/USB/movies/science-fiction/alien/alien.avi
/USB/movies/science-fiction/Ironman/IronMan.avi

Thank you for your input.

Snappy46
# 4  
10-03-2011
Quote:
Originally Posted by snappy46
Unfortunately this script need to run on a media player and awk is not available on it.
BusyBox almost certainly comes with awk.

If there's no explicit link for it, you can still run it as busybox awk. Try it and see; awk will make everything easier if you have it.

Lastly, you forgot to post data from your XML files. I can't do anything without seeing the data I'm supposed to transform, with or without awk.
# 5  
10-04-2011
Quote:
Originally Posted by Corona688
BusyBox almost certainly comes with awk.

If there's no explicit link for it, you can still run it as busybox awk. Try it and see; awk will make everything easier if you have it.
You would think awk would be available, but unfortunately they use an old version of BusyBox (1.1.3) that they recompiled themselves to be as small as possible, since it is flashed onto an embedded device (a media player) with only 128 MB of NAND flash. Trust me, it is not available: running busybox by itself lists all the available commands, and awk is not one of them. Otherwise I would definitely use it; it would make my life a lot easier.

Quote:
Originally Posted by Corona688
Lastly, you forgot to post data from your XML files. I can't do anything without seeing the data I'm supposed to transform, with or without awk.
The input to the script, $SORTEDTMP, is not an XML file but a list of all the movies in my movie directory, as explained in the previous post.

The output is XML and can be deduced from the code. For example, for the input from the previous post and $MOVIESPATH = "/USB/movies/science-fiction/", the code would produce:

<Movie>
<title>alien</title>
<poster>/USB/movies/science-fiction/alien/folder.jpg</poster>
<info>/USB/movies/science-fiction/alien/about.jpg</info>
<file>/USB/movies/science-fiction/alien/alien.avi</file>
</Movie>
<Movie>
<title>IronMan</title>
<poster>/USB/movies/science-fiction/Ironman/folder.jpg</poster>
<info>/USB/movies/science-fiction/Ironman/about.jpg</info>
<file>/USB/movies/science-fiction/Ironman/IronMan.avi</file>
</Movie>
etc.......

That continues for all the movies in the /USB/movies/science-fiction/ directory.


I left out the XML header, which is created before the loop, and the footer, which is created after it; in case anyone is interested, here they are:

Code:
 
# Header, written before the loop
echo -e '<?xml version="1.0" encoding="UTF-8"?>
<Jukebox>' > "$RSS"

# Footer, written after the loop
echo -e "</Jukebox>" >> "$RSS"

As you can see, all the <Movie> elements are appended to the file $RSS (jukebox.xml) on every loop iteration, and I am not sure how time-consuming that is. Would it be more efficient to send all the data to some buffer/array/variable and create $RSS (jukebox.xml) after the loop is done, instead of doing a file write on each iteration? Would that even make it faster? If so, what would be the best way to do it?

Any ideas on this specific subject?
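A minimal sketch of one way to do this, assuming a POSIX-style shell: instead of buffering into a variable, put the header, the loop and the footer inside one `{ ...; }` group with a single `>` redirection, so the file is opened once and written as a stream. Buffering in a variable would also be awkward here, because in ash the `while` at the end of a `grep | while` pipeline runs in a subshell, so anything assigned inside it is gone after `done`. The helper `write_jukebox` is hypothetical, only the title and file elements are written, and nothing is XML-escaped (the original does not escape either).

```shell
#!/bin/sh
# write_jukebox PATTERN LISTFILE OUTFILE (hypothetical helper)
# Header, one <Movie> per matching path, and footer all go through the
# single "> OUTFILE" on the closing brace, so the file is opened once.
write_jukebox() {
  {
    echo '<?xml version="1.0" encoding="UTF-8"?>'
    echo '<Jukebox>'
    grep "$1" "$2" | while IFS= read -r plik
    do
      name=${plik##*/}                  # file name without directory
      echo '<Movie>'
      echo "<title>${name%.*}</title>"  # file name without extension
      echo "<file>$plik</file>"
      echo '</Movie>'
    done
    echo '</Jukebox>'
  } > "$3"
}
```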

Thank you.
# 6  
10-04-2011
If none of the input is XML, then what is
Code:
MOVIETITLE=`grep "<title>.*<.title>" "$MOVIEINFO" | sed -e "s/^.*<title/<title/" | cut -f2 -d">"| cut -f1 -d"<"`

for?
# 7  
10-04-2011
Quote:
Originally Posted by Corona688
If none of the input is XML, then what is
Code:
MOVIETITLE=`grep "<title>.*<.title>" "$MOVIEINFO" | sed -e "s/^.*<title/<title/" | cut -f2 -d">"| cut -f1 -d"<"`

for?
Ah!!! You got me there; this XML file contains descriptions of the movies, e.g. title, synopsis, rating, etc. The only information I am looking for in that XML, if it is available, is the movie title; I do not care about the rest. In a lot of cases that XML file does not even exist, and the movie title is derived from the file's basename with the extension (.avi, .mpg, etc.) removed.

See this line: MovieName=`basename "$plik" | sed 's/\(.*\)\..*/\1/'`
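For the title itself, here is a minimal sketch that replaces the four-process grep | sed | cut | cut pipeline with a single sed, assuming (as the original grep does) that `<title>` and `</title>` are on the same line. `nfo_title` is a hypothetical helper name; if the file has several `<title>` lines, it prints all of them, just as the original pipeline would.

```shell
#!/bin/sh
# nfo_title FILE (hypothetical helper): print the text between <title>
# and </title>, using one sed process instead of grep | sed | cut | cut.
nfo_title() {
  sed -n 's/.*<title>\([^<]*\)<\/title>.*/\1/p' "$1"
}
```

Inside the loop it would be used as MOVIETITLE=`nfo_title "$DirectoryPath/MovieInfo.nfo"`.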

Well, at least here's one improvement: I could just do this:

Code:
 
if [ -e "$DirectoryPath/MovieInfo.nfo" ];
then
   MOVIEINFO="$DirectoryPath/MovieInfo.nfo"
   MOVIETITLE=`grep "<title>.*<.title>" "$MOVIEINFO" | sed -e "s/^.*<title/<title/" | cut -f2 -d">"| cut -f1 -d"<"`
else
   MOVIETITLE=`basename "$plik" | sed 's/\(.*\)\..*/\1/'`
fi

Instead of assigning MovieName even when it isn't required.

That saves a basename and a sed call whenever the MovieInfo.nfo file exists.
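One caveat, going by the loop in the first post: MovieName is also used later for the `${MovieName}.jpg` and `${MovieName}_sheet.jpg` checks. If it is only set in the `else` branch, those checks see an empty or stale value (left over from the previous movie) whenever MovieInfo.nfo exists. A minimal sketch that keeps the saving without that side effect, assuming the player's ash supports POSIX parameter expansion: compute MovieName with builtins, which costs no process, and only run sed when the .nfo file exists. `movie_info` is a hypothetical helper, and the sheet lookup is left out for brevity.

```shell
#!/bin/sh
# movie_info PATH (hypothetical helper): sets MovieName, MOVIETITLE and
# MOVIEPOSTER for one movie file, following the checks in the first post.
movie_info() {
  plik=$1
  DirectoryPath=${plik%/*}
  MovieName=${plik##*/}
  MovieName=${MovieName%.*}   # always set: the poster check below needs it
  if [ -e "$DirectoryPath/MovieInfo.nfo" ]; then
    # one sed instead of grep | sed | cut | cut
    MOVIETITLE=$(sed -n 's/.*<title>\([^<]*\)<\/title>.*/\1/p' "$DirectoryPath/MovieInfo.nfo")
  else
    MOVIETITLE=$MovieName
  fi
  if [ -e "$DirectoryPath/folder.jpg" ]; then
    MOVIEPOSTER=$DirectoryPath/folder.jpg
  elif [ -e "$DirectoryPath/${MovieName}.jpg" ]; then
    MOVIEPOSTER=$DirectoryPath/${MovieName}.jpg
  else
    MOVIEPOSTER=/usr/local/etc/srjg/nofolder.bmp
  fi
}
```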


Cheers !!!