Shell Programming and Scripting

10-04-2011

Registered User

16, 0

Join Date: May 2011

Last Activity: 6 April 2012, 12:46 PM EDT

Posts: 16

Thanks Given: 3

Thanked 0 Times in 0 Posts

Quote:

Originally Posted by Corona688

I'm assuming the XML data for that looks like what you posted before?

Working on something.

Yes, but extracting the title from the MovieInfo.nfo file is pretty fast; it's all those if/test statements and the writing to the file that seem to take a long time. For example, if my drive contains approximately 200 movies (not really that many, considering some people have thousands) divided into 9 movie genres, the loop would run 10 times using the same list but a different MOVIESPATH each time. For example:

First run = all 200 movies indexed to create the XML file
Second run = genre science-fiction, maybe 30 movies indexed to create the XML file
Third run = genre romance, maybe 20 movies indexed to create the XML file
etc.

Obviously, the second through the last (10th) run will together cover all 200 movies.

I hope I am making sense and thanks again for your help/time.

snappy46

10-04-2011

Registered User

23,310, 4,623

Join Date: Aug 2005

Last Activity: 7 July 2020, 11:47 AM EDT

Location: Saskatchewan

Posts: 23,310

Thanks Given: 1,331

Thanked 4,623 Times in 4,217 Posts

Here's a version tested in busybox which uses almost pure shell builtins:

Code:

MOVIESPATH="./moviedir/"
SORTEDTMP="./movie"
OLDIFS="$IFS"
RSS="rssfile"

# I just do this to have any info at all...
find ./moviedir -iname '*.avi' > "$SORTEDTMP"

grep "$MOVIESPATH" "$SORTEDTMP" | while read LINE
do
        MOVIEPATH="${LINE%/*}"  # Shell builtins instead of dirname
        MOVIEFILE="${LINE##*/}" # Shell builtins instead of basename

        if ! [ "$MOVIEPATH/$MOVIEFILE" = "$LINE" ]
        then
                echo "Error processing line" >&2
                continue
        fi

        # Initialize defaults, replace later
        MOVIENAME="${MOVIEFILE/.*}"  # Strip off .ext; used by the case below
        MOVIETITLE="$MOVIENAME"
        MOVIESHEET=/usr/local/etc/srjg/NoMovieinfo.bmp
        MOVIEPOSTER=/usr/local/etc/srjg/nofolder.bmp

        if [ -e "$MOVIEPATH/MovieInfo.nfo" ]
        then
                # Look for lines matching <title>
                while read LINE
                do
                        # Strip out <title> to make it shorter.
                        SHORT="${LINE/<title>}"
                        # If it's not shorter, it didn't have <title>
                        [ "${#SHORT}" = "${#LINE}" ] && continue

                        LINE="${LINE//<title>}"  # Strip out <title>
                        LINE="${LINE//<?title>}" # Strip out </title>

                        MOVIETITLE="$LINE"
                        break   # Found <title>, quit looking
                done <"$MOVIEPATH/MovieInfo.nfo"
        fi

        # Check for any files of known purpose inside the movie's folder.
        for FILE in "$MOVIEPATH"/*
        do
                [ -e "$FILE" ] || break # No files exist?

                case "${FILE##*/}" in
                "folder.jpg")           MOVIEPOSTER="$FILE"     ;;
                "${MOVIENAME}.jpg")     MOVIEPOSTER="$FILE"     ;;
                "about.jpg")            MOVIESHEET="$FILE"      ;;
                "0001.jpg")             MOVIESHEET="$FILE"      ;;
                "${MOVIENAME}_sheet.jpg")       MOVIESHEET="$FILE" ;;
                *)      ;;
                esac
        done

        # Print it all in one whack with a here-document.
        cat <<EOF
<Movie>
<title>$MOVIETITLE</title>
<poster>$MOVIEPOSTER</poster>
<info>$MOVIESHEET</info>
<file>$MOVIEFILE</file>
</Movie>

EOF
        # Note:  OVERWRITES $RSS
done > "$RSS"

---------- Post updated at 02:02 PM ---------- Previous update was at 01:57 PM ----------

Quote:

Originally Posted by snappy46

Yes, but extracting the title from the MovieInfo.nfo file is pretty fast; it's all those if/test statements and the writing to the file that seem to take a long time.

What sort of disk are you writing to?

Shell builtins are hundreds of times faster than calling an external utility to operate on tiny amounts of data.
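To make the comparison concrete, here is a minimal sketch with a made-up sample path (not taken from the original script). It computes the directory and file name twice, once with the external utilities and once with parameter expansion, so the two can be compared side by side:

```shell
# Hypothetical sample path, used only for illustration.
LINE="./moviedir/Alien (1979)/alien.avi"

# External utilities: every call starts a new process.
DIR_EXT=$(dirname "$LINE")
FILE_EXT=$(basename "$LINE")

# Parameter expansion: done inside the shell, no new process.
DIR_BUILTIN="${LINE%/*}"    # remove the shortest trailing "/..."
FILE_BUILTIN="${LINE##*/}"  # remove the longest leading ".../"

echo "$DIR_EXT | $DIR_BUILTIN"
echo "$FILE_EXT | $FILE_BUILTIN"
```

The two approaches agree for paths that contain a directory part, as the find output here does. For a bare file name with no slash, dirname prints "." while "${LINE%/*}" leaves the string unchanged, so the expansions are not a drop-in replacement in every case.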

I think I've reduced the number of if/else's by using a case, too.

Also, you were reopening $RSS dozens of times, which probably didn't help.
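As a rough illustration of the last point, the sketch below writes the same three records twice: once by appending inside the loop, and once by redirecting the whole loop. The file names and the temporary directory are assumptions made for the example only:

```shell
# Hypothetical output files in a throwaway directory.
OUTDIR=$(mktemp -d)
SLOW="$OUTDIR/slow.xml"
FAST="$OUTDIR/fast.xml"

# Variant 1: the file is opened and closed once per iteration.
: > "$SLOW"
for N in 1 2 3
do
        echo "<Movie>$N</Movie>" >> "$SLOW"
done

# Variant 2: the file is opened once for the entire loop.
for N in 1 2 3
do
        echo "<Movie>$N</Movie>"
done > "$FAST"
```

Both files end up with identical contents; the second form simply avoids the repeated open and close, which may matter more on slow flash or USB storage.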

Corona688

10-05-2011

Quote:

Originally Posted by Corona688

What sort of disk are you writing to?

Shell builtins are hundreds of times faster than calling an external utility to operate on tiny amounts of data.

I think I've reduced the number of if/else's by using a case, too.

Also, you were reopening $RSS dozens of times, which probably didn't help.

That is awesome; I did not expect anyone to do all the dirty work for me. Hopefully this script will work fine on the old default busybox available on the media player.

I forgot to mention in my previous post that the whole indexing process, all movies plus all genres (200 movies), takes about 3 to 4 minutes. Hopefully your way of doing things will cut that down some.

The jukebox.xml file created by the loop is normally written to the media player's internal drive or to an externally connected USB drive; it all depends on where the movies are located. The jukebox.xml is stored in the genre root directory, or in the movie directory for "All movies".

I can't wait to try this out. I will post the results once I have.

Again thank you.

---------- Post updated at 11:27 PM ---------- Previous update was at 04:55 PM ----------

Hi Corona,

Here's your script with some minor changes to make it work for me.

Code:

#MOVIESPATH="./moviedir/"
#SORTEDTMP="./movie"
#OLDIFS="$IFS"
#RSS="rssfile"

# I just do this to have any info at all...
#find ./moviedir -iname '*.avi' > "$SORTEDTMP"

grep "$MOVIESPATH" "$SORTEDTMP" | while read LINE
do
        MOVIEPATH="${LINE%/*}"  # Shell builtins instead of dirname
        MOVIEFILE="${LINE##*/}" # Shell builtins instead of basename
        MOVIENAME="${MOVIEFILE%.*}"  # Strip off .ext       

        if ! [ "$MOVIEPATH/$MOVIEFILE" = "$LINE" ]
        then
                echo "Error processing line" >&2
                continue
        fi

        # Initialize defaults, replace later


        MOVIETITLE="$MOVIENAME"
        MOVIESHEET=/usr/local/etc/srjg/NoMovieinfo.bmp
        MOVIEPOSTER=/usr/local/etc/srjg/nofolder.bmp

        if [ -e "$MOVIEPATH/MovieInfo.nfo" ]
        then
                MOVIEINFO="$MOVIEPATH/MovieInfo.nfo"
                MOVIETITLE=`grep "<title>.*<.title>" "$MOVIEINFO" | sed -e "s/^.*<title/<title/" | cut -f2 -d">"| cut -f1 -d"<"`
        fi

#        if [ -e "$MOVIEPATH/MovieInfo.nfo" ]
#        then
                # Look for lines matching <title>
#                while read LINE
#                do
                        # Strip out <title> to make it shorter.
#                        SHORT="${LINE/<title>}"
                        # If it's not shorter, it didn't have <title>
#                        [ "${#SHORT}" = "${#LINE}" ] && continue

#                        LINE="${LINE//<title>}"  # Strip out <title>
#                        LINE="${LINE//<?title>}" # Strip out </title>

#                        MOVIETITLE="$LINE"
#                        break   # Found <title>, quit looking
#                done <"$MOVIEPATH/MovieInfo.nfo"
#        fi

        # Check for any files of known purpose inside the movie's folder.
        for FILE in "$MOVIEPATH"/*
        do
                [ -e "$FILE" ] || break # No files exist?

                case "${FILE##*/}" in
                "folder.jpg")           MOVIEPOSTER="$FILE"     ;;
                "${MOVIENAME}.jpg")     MOVIEPOSTER="$FILE"     ;;
                "about.jpg")            MOVIESHEET="$FILE"      ;;
                "0001.jpg")             MOVIESHEET="$FILE"      ;;
                "${MOVIENAME}_sheet.jpg")       MOVIESHEET="$FILE" ;;
                *)      ;;
                esac
        done

        # Print it all in one whack with a here-document.
        cat <<EOF
<Movie>
<title>$MOVIETITLE</title>
<poster>$MOVIEPOSTER</poster>
<info>$MOVIESHEET</info>
<file>$MOVIEPATH/$MOVIEFILE</file>
</Movie>

EOF
        # Note:  APPENDS to $RSS
done >> "$RSS"

I could not get the function to extract the title from the MovieInfo.nfo file to work, so I just inserted the one I already had in my script just to test the difference between the two scripts. The results were very surprising to me, to say the least.

The original script from my first post took 1 minute and 49 seconds to process my 200 movies. The new code you provided took 2 minutes and 46 seconds, almost a whole minute longer. I could not believe the results, so I tried it again and got the same results. It would appear that using sed/cut/grep etc. to get the work done is faster than using the built-in substitution command? I was quite shocked; something in there seems to take a lot of time to accomplish.

I still think that I can use some of your code to cut the processing time down further... well, maybe. I would think that creating the file only once would be faster than appending to the file for every movie element. I will try introducing your changes one at a time and see what makes things go faster and what makes things go slower.

I learned a lot from your inputs/code so again thank you.

Snappy46

snappy46

10-05-2011

Quote:

Originally Posted by snappy46

I could not get the function to extract the title from the MovieInfo.nfo file to work, so I just inserted the one I already had in my script just to test the difference between the two scripts.

Throwing away nearly all the speed gain in the process

If you could post the literal contents of one of those files, that'd be good. I built it to work with your test data. If it's actually any different, then I really need to see what it is.

Quote:

It would appear that using sed/cut/grep etc. to get the work done is faster than using the built-in substitution command? I was quite shocked; something in there seems to take a lot of time to accomplish.

The built-in substitution operator is hundreds of times faster than calling an external utility. At the very least, the builtins used in place of basename and dirname ought to be better. The rest of my code may have depended on certain assumptions about your data.

Are these folders full of irrelevant files? If so, the for FILE in "$MOVIEPATH"/* loop will waste a lot of time. Come to think of it, since we're only interested in .jpg, you can make it for FILE in "$MOVIEPATH"/*.jpg

---------- Post updated at 09:29 AM ---------- Previous update was at 09:20 AM ----------

Here's a version which works without trawling every file in the folder. You can replace the long if-else chain with two for's. It also helps make the lists longer without making your code longer (though testing for too many things will slow you down in any case).

Code:

for FILE in "folder.jpg" "${MOVIENAME}.jpg"
do
        [ ! -e "$MOVIEPATH/$FILE" ] && continue
        MOVIEPOSTER="$MOVIEPATH/$FILE"
        break
done

for FILE in "about.jpg" "0001.jpg" "${MOVIENAME}_sheet.jpg"
do
        [ ! -e "$MOVIEPATH/$FILE" ] && continue
        MOVIESHEET="$MOVIEPATH/$FILE"
        break
done
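Since both loops follow the same "first candidate that exists wins" pattern, they could also be folded into one small helper. This is only a sketch: first_existing and the FOUND variable are hypothetical names, not part of the original script, and the demo directory and file are created just for the example. The result is returned in a variable rather than through command substitution so that no subshell is forked:

```shell
# first_existing DIR NAME...
# Sets FOUND to the first DIR/NAME that exists and returns 0;
# leaves FOUND empty and returns 1 if none exist.
# Hypothetical helper; DIR, NAME and FOUND are plain globals.
first_existing() {
        FOUND=""
        DIR="$1"
        shift
        for NAME in "$@"
        do
                [ -e "$DIR/$NAME" ] || continue
                FOUND="$DIR/$NAME"
                return 0
        done
        return 1
}

# Demo with a throwaway directory and a made-up file.
DEMO=$(mktemp -d)
: > "$DEMO/0001.jpg"

MOVIESHEET=/usr/local/etc/srjg/NoMovieinfo.bmp   # default from the script
first_existing "$DEMO" "about.jpg" "0001.jpg" && MOVIESHEET="$FOUND"
echo "$MOVIESHEET"
```

Whether this is worth it over the two explicit loops is a matter of taste; the loops are at least as fast, and the helper only pays off if the candidate lists keep growing.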

---------- Post updated at 09:35 AM ---------- Previous update was at 09:29 AM ----------

And you can strip out this completely:

Code:

        if ! [ "$MOVIEPATH/$MOVIEFILE" = "$LINE" ]
        then
                echo "Error processing line" >&2
                continue
        fi

I just put it there in case your input data was radically different from what I assumed it was.

Last edited by Corona688; 10-05-2011 at 12:35 PM..

Corona688

10-06-2011

Quote:

Originally Posted by Corona688

Throwing away nearly all the speed gain in the process

I know; this is temporary. I plan on figuring out why it does not work. I am getting "bad substitution" from the busybox version that I am using on these lines:

SHORT="${LINE/<title>}"
LINE="${LINE//<title>}" # Strip out <title>
etc.

I cannot remember exactly which ones, but more than one of those substitutions caused a problem. I had the same issue with this line:

Code:

 
="${MOVIEFILE/.*}"  # Strip off .ext

which I fixed by changing to this:

Code:

 
="${MOVIEFILE%.*}"  # Strip off .ext
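For anyone following along, the likely reason is that the ${VAR/pattern} form is a bash/ksh extension rather than POSIX, and a busybox ash build may not have it compiled in, whereas % and %% are POSIX. The two are also not quite equivalent. A small sketch with a made-up file name shows the difference between the shortest and longest match:

```shell
# Made-up file name containing several dots.
MOVIEFILE="The.Matrix.1999.avi"

SHORTEST="${MOVIEFILE%.*}"    # removes only the final ".avi"
LONGEST="${MOVIEFILE%%.*}"    # removes everything from the first dot on

echo "$SHORTEST"
echo "$LONGEST"
```

In shells that do support it, "${MOVIEFILE/.*}" behaves like the %% form here, since it deletes from the first dot onward, so the single % is probably closer to what "strip off .ext" is meant to do.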

I did not have much time to play around with it last night and was curious about the change in processing time, so I jumped right in with the old code. Apparently my wife's needs are more important than working on this script.

Quote:

Originally Posted by Corona688

If you could post the literal contents of one of those files, that'd be good. I built it to work with your test data. If it's actually any different, then I really need to see what it is. The built-in substitution operator is hundreds of times faster than calling an external utility. At the very least, the builtins used in place of basename and dirname ought to be better. The rest of my code may have depended on certain assumptions about your data.

I will provide samples of the input files (SORTEDTMP, MovieInfo.nfo) and the output file (jukebox.xml), hopefully tonight, time permitting; I am not home right now and do not have access to those files. Agreed, built-in substitution should be faster.

Quote:

Originally Posted by Corona688

Are these folders full of irrelevant files? If so, the for FILE in "$MOVIEPATH"/* loop will waste a lot of time. Come to think of it, since we're only interested in .jpg, you can make it for FILE in "$MOVIEPATH"/*.jpg

Yeah, there are quite a few files in there, some relevant, some not so much. I think you just nailed why it takes so much more time. I will use the new for loops you provided, and I have a feeling that I will finally see some speed increase compared to the old script.

Thanks again from one fellow Canadian to another.

Cheers!!!

---------- Post updated 10-06-11 at 12:17 AM ---------- Previous update was 10-05-11 at 01:51 PM ----------

OK, when using the for loops and deleting the unnecessary if statement as indicated in your previous post, I can now process my 200 movies in 1 minute and 10 seconds... wow! That is an improvement of about 40 seconds compared to the original script. That makes more sense; I guess the case statement plus the number of files was definitely slowing things down.

Now I am trying to get the procedure that pulls the title from the MovieInfo.nfo file to work, but I am still stuck on one thing. In my version of busybox we must use "#" to delete a match from the left and "%" to delete a match from the right. Once I figured that out, you would think that things would have gone pretty smoothly, but of course they did not. I can easily delete the <title>, but I am unsuccessful in deleting the </title>. I think that the "/" forward slash is creating a problem? The funny thing is that it works fine in interactive mode but refuses to work in the script.

This works fine in the shell:

export foo="<title>hello</title>"
echo "${foo%</title>}"
or
echo "${foo%%</title>}"
or
echo "${foo%<?title>}"
or
echo "${foo%<*title>}"

All of those return "<title>hello", but none of those combinations will remove the </title> when used in the script. This is driving me crazy; any ideas?

But this works fine in the script to remove the <title>

Code:

LINE="${LINE#<title>}"  # Strip out <title>

By the way, I also removed that line since it is not really needed, and just used SHORT for stripping out </title>.

Here's the procedure, which works fine except for stripping out </title>.

Code:

        if [ -e "$MOVIEPATH/MovieInfo.nfo" ]
        then
                # Look for lines matching <title>
                while read LINE
                do
                        # Strip out <title> to make it shorter.
                        SHORT="${LINE#<title>}"
                        # If it's not shorter, it didn't have <title>
                        [ "${#SHORT}" = "${#LINE}" ] && continue

#                        LINE="${LINE#<title>}"  # Strip out <title>
                        LINE="${SHORT%%</title>}" # Strip out </title>

                        MOVIETITLE="$LINE"
                        break   # Found <title>, quit looking
                done <"$MOVIEPATH/MovieInfo.nfo"
        fi

Thanks

snappy46

10-06-2011

# strips from the beginning of the string, % from the end. ## and %% give the same result as # and % when you're removing a literal string, so you might as well use # and %. See string operations. Not all shells have all of these. Some don't have any of them. Developers on embedded busybox are more fortunate than some people using 'real' shells; it has at least a few.

Code:

VAR="${VAR#<title>}" # Strip out <title>
VAR="${VAR%</title>}" # Then strip out </title>
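A short sketch of how the four operators behave, using the same sample string as in the posts above (the variable names A through D and TITLE are just for the demonstration):

```shell
VAR="<title>hello</title>"

# With a literal pattern, the short and long forms give the same result.
A="${VAR#<title>}"     # hello</title>
B="${VAR##<title>}"    # hello</title>

# With a wildcard they differ: # removes the shortest match, ## the longest.
C="${VAR#*>}"          # hello</title>
D="${VAR##*>}"         # empty, since the whole string ends in ">"

# Strip both tags in two steps.
TITLE="${VAR#<title>}"
TITLE="${TITLE%</title>}"
echo "$TITLE"
```

Note that # and % are anchored: the pattern has to match at the very start or the very end of the value, otherwise nothing is removed.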

---------- Post updated at 03:17 PM ---------- Previous update was at 03:08 PM ----------

You could also try putting quotes around the pattern: "${VAR%"</title>"}"

---------- Post updated at 03:20 PM ---------- Previous update was at 03:17 PM ----------

Globbing should also work inside it, so:

Code:

VAR="${VAR#<[!>]*>}" # Remove first <...>
VAR="${VAR%<[!>]*>}" # Remove last <...>

---------- Post updated at 03:21 PM ---------- Previous update was at 03:20 PM ----------

Also: If it works in the prompt and not in your script, then your data may not be what you really think it is. You feed it a nice "<title>stuf</title>" and it works but the data actually has stuff after <title> perhaps. PLEASE post a sample of your input data!!
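One guess, and it is only a guess until the real data is posted, is that the file has DOS line endings, so each line read by the script ends in an invisible carriage return. The sketch below simulates such a line and shows why an anchored pattern would then fail while a wildcard version still works:

```shell
# Simulate a line from a file with CR/LF endings (assumption, not real data).
CR=$(printf '\r')
LINE="<title>hello</title>$CR"

# Anchored patterns: the value ends in CR, not ">", so % removes nothing.
EXACT="${LINE#<title>}"
EXACT="${EXACT%</title>}"

# Wildcard patterns: tolerate junk before <title> and after </title>.
LOOSE="${LINE#*<title>}"
LOOSE="${LOOSE%%</title>*}"

[ "$EXACT" = "hello" ] || echo "anchored patterns left the closing tag in place"
echo "$LOOSE"
```

Running a suspect line through od -c should show whether a \r or other stray character is hiding at the end.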

Last edited by Corona688; 10-06-2011 at 06:16 PM..

Corona688