Help with Slow File Processing

06-27-2011


Hello,

Hope you are doing fine. Let me describe the problem. I have a script that calls another script, K2Test.sh. This script (created by another team) takes a date as an argument and generates approximately 1365 files in the localcurves directory for the given date.

Of these 1365 files I am only interested in 133, so I have created a list of the file names we need to process (ZEROCURVEFILES, below).

I loop through the 1365 files (`ls $localcurves`, below) and check whether each file name is in the 133-file list (ZEROCURVEFILES). If it is, I process the file by reading it line by line.
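A note on the membership check described here: `echo ... | grep $FILE` starts at least two processes for each of the ~1365 files, and because grep matches substrings, a short file name can match a longer list entry. A minimal sketch of a builtin alternative using a `case` pattern; the three-name list is a hypothetical stand-in for the full ZEROCURVEFILES, and `is_wanted` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical, shortened stand-in for the real 133-name list.
ZEROCURVEFILES="saud1-monthlinmid saud6-monthlinmid szarzar_usdlinmid"

# is_wanted NAME: succeed if NAME is a whole word in $ZEROCURVEFILES.
# Padding both sides with spaces stops partial matches such as "saud1"
# matching "saud1-monthlinmid". No external process is started.
is_wanted() {
    case " $ZEROCURVEFILES " in
        *" $1 "*) return 0 ;;
        *)        return 1 ;;
    esac
}
```

In the posted loop, `if is_wanted "$FILE"` could replace the `zerocurvefile=...` line and the test that follows it.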

It takes too long just to process 133 files. Am I using some inefficient code below? Is there a way to process it faster? Is it slow because I open and read 133 files line by line?

I need to run this script for 400 days, which means looping over the 1365 files 400 times (once per day) and processing 133 files for each day.

I would really appreciate any help to make it faster. Here is the code. I know it is a lot of code; please let me know if anything in the script is unclear.

Code:

#!/bin/sh
#e.g. 20110627 (june 27 2011)
currdate=$1
#e.g. 20100310 (march 10 2010)
enddate=$2

#directory where 1365 files get generated
localcurves="/home/sratta/feds/localCurves/curves"
outputdir="/home/sratta/curves"
#output file to be generated
OUTFILE="/home/sratta/ZeroCurves/BulkLoad.csv"
touch $OUTFILE

# List of 133 curve file names
ZEROCURVEFILES="saud1-monthlinmid \
saud6-monthlinmid \
.....
suvruvr_usdlinmid \
szarzar_usdlinmid "

#Loop until currdate equals enddate (reverse loop)
while [ $currdate -ne $enddate ]
do

  #Call K2test.sh which generates 1365 files for a given date in $localcurves directory
 ./K2test.sh $currdate
 filesfound=0

#Loop through the 1365 files generated by K2test.sh in $localcurves directory
 for FILE in `ls $localcurves`
 do
  filesfound=1
  #Check whether the file name is one of the 133 files we want; process it only if it is, otherwise ignore it
  zerocurvefile=`echo cat $ZEROCURVEFILES|grep $FILE`

  # If file is in the list then process it
   if [ "$zerocurvefile" != "" ]
   then
    echo "Processing $LOWERCASEFILE.$currdate file"

  #THIS PROCESSING IS SLOW LINE BY LINE
   exec 3<&0
  #Open the file
   exec 0<"$localcurves/$FILE"
   cnt=0
   rowstoprocess=0
  #Read file line by line
   while read line
   do
    cnt=`expr $cnt + 1`
    # First line in file contains number of records to process
    if [ "$cnt" -eq "1" ]
    then
     numheadrecords=`echo $line | awk '{FS=""}{print $1}'`
     rowstoprocess=`expr $numheadrecords + 2`
     echo "Total Number of Rows in header for $LOWERCASEFILE.$currdate is: $numheadrecords"
    fi
    
    if [ "$cnt" -gt "1" ] && [ "$cnt" -lt "$rowstoprocess" ]
    then
     julianmdate=`echo $line | awk '{FS=" "}{print $1}'`
     rate=`echo $line | awk '{FS=" "}{print $2}'`
     mdate=`echo $line | awk '{FS=" "}{print $4}'`
     # extract certain columns and put the data into out file
     echo "$LOWERCASEFILE,$currdate,$julianmdate,$rate,$mdate" >> $OUTFILE
    fi
    
   # Stop once we have processed the number of records given on the first line
    if [ "$cnt" -eq "$rowstoprocess" ]
    then
     break
    fi
   done
   exec 0<&3
  fi
 done
 
#Subtract 1 day from currdate (reverse loop)
 currdate=`./shift_date $currdate -1`
done
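Most of the per-line cost in the loop above probably comes from process creation rather than from reading the files: each data row runs at least four external programs (one `expr` and three `echo | awk` pipelines), so a 200-row file runs some 800 of them. A minimal sketch that reads one file with shell builtins only. It assumes a POSIX shell with `$(( ))` arithmetic (for example ksh; the Solaris 10 Bourne shell does not have it), and `process_curve` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: assumes a POSIX shell with $(( )) arithmetic (e.g. ksh).
# process_curve NAME DATE FILE (hypothetical helper)
# Writes CSV rows to stdout and the header message to stderr.
process_curve() {
    name=$1 date=$2 file=$3
    {
        # Header line: the first field is the number of data rows that follow.
        read numheadrecords rest
        echo "Total Number of Rows in header for $name.$date is: $numheadrecords" >&2
        cnt=0
        # read splits each row on whitespace: fields 1, 2, 3 (unused), 4, rest.
        while [ "$cnt" -lt "$numheadrecords" ] &&
              read julianmdate rate skip mdate rest
        do
            cnt=$((cnt + 1))
            echo "$name,$date,$julianmdate,$rate,$mdate"
        done
    } < "$file"
}
```

It would be called as `process_curve "$LOWERCASEFILE" "$currdate" "$localcurves/$FILE" >> "$OUTFILE"`; sending the header message to stderr keeps it out of the CSV.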

Last edited by srattani; 06-28-2011 at 07:58 AM.

srattani


06-28-2011


What Operating System and version are you running?
What Shell is /bin/sh on your computer?
How many lines are processed from the 133 files? Is it definitely not the whole of each file?
Does the script work?

What are these lines for? Is there a local reason for these complex redirects?

Quote:

#THIS PROCESSING IS SLOW LINE BY LINE
exec 3<&0
#Open the file
exec 0<"$localcurves/$FILE"

exec 0<&3

There is great scope for efficiency in this script but let's get a feel for the environment and the size of the data files first.
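On the redirect question: `exec 3<&0` saves standard input as descriptor 3, `exec 0<file` points standard input at the file, and `exec 0<&3` restores it. Redirecting the loop itself is the more common form. One caveat: in the traditional Bourne shell a redirected `while` loop runs in a subshell, so variables set inside it are not visible after `done`; that may be why the colleague's version used `exec`. The posted loop only uses its variables inside the loop, so the plain form should be enough there. A minimal sketch; `first_fields` is a hypothetical helper name.

```shell
#!/bin/sh
# first_fields FILE: print the first whitespace-separated field of each line.
first_fields() {
    while read first rest
    do
        echo "$first"
    done < "$1"    # stdin is redirected only for the duration of the loop
}
```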

methyl


06-28-2011


Hi methyl,

Thanks for looking at my post, I really appreciate it. I am new to Unix scripting, so I definitely need guidance. Please see my answers:

What Operating System and version are you running? It is Sun Solaris.

What Shell is /bin/sh on your computer? How do I tell? I just know I am using sh.

How many lines are processed from the 133 files? Is it definitely not the whole of each file? Each file has the number of records on its very first line; I read that and process that many rows, which can be anywhere from 10 to 200.

Does the script work? Yes, the script works, but each file takes approximately 4 seconds to process, and the 133 files take 523 seconds (almost 9 minutes) for one day. I have to process 400 days, which would take about 58 hours.

What are these lines for? Is there a local reason for these complex redirects? I copied it from a colleague, so if you think there is no reason for these redirections I would appreciate your guidance.

srattani


06-28-2011


Just a snippet: if your shell accepts it, try changing all the single square brackets to double square brackets, e.g.

Code:

while [ $currdate -ne $enddate ]
to
while [[ $currdate -ne $enddate ]]

michaelrozar17


06-28-2011


Try this version of your script (not tested):

Code:

#!/bin/sh
#e.g. 20110627 (june 27 2011)
currdate=$1
#e.g. 20100310 (march 10 2010)
enddate=$2

#directory where 1365 files get generated
localcurves="/home/sratta/feds/localCurves/curves"
outputdir="/home/sratta/curves"
#output file to be generated
OUTFILE="/home/sratta/ZeroCurves/BulkLoad.csv"
touch $OUTFILE

# List of 133 curve file names
ZEROCURVEFILES="saud1-monthlinmid \
saud6-monthlinmid \
.....
suvruvr_usdlinmid \
szarzar_usdlinmid "

#Loop until currdate equals enddate (reverse loop)
while [ $currdate -ne $enddate ]
do

  #Call K2test.sh which generates 1365 files for a given date in $localcurves directory
 ./K2test.sh $currdate
 filesfound=0

#Loop only over the wanted files; ls prints nothing for names that were not generated
 for FILE in `cd "$localcurves"; ls $ZEROCURVEFILES 2>/dev/null`
 do
  filesfound=1
  echo "Processing $LOWERCASEFILE.$currdate file"

  # On Solaris, /usr/bin/awk is the old awk; nawk or /usr/xpg4/bin/awk
  # may be needed to run this program.
  awk '
    FNR==1 {
        numheadrecords = $1;
        rowstoprocess  = numheadrecords + 2;
        printf "Total Number of Rows in header for %s.%s is %s\n", LowFile, Date, numheadrecords;
        next;
    }
    FNR<rowstoprocess {
        julianmdate = $1;
        rate        = $2;
        mdate       = $4;
        # append the extracted columns to the output file
        printf "%s,%s,%s,%s,%s\n", LowFile, Date, julianmdate, rate, mdate >> OutFile;
    }
  ' LowFile="$LOWERCASEFILE" Date="$currdate" OutFile="$OUTFILE" "$localcurves/$FILE"

 done

#Subtract 1 day from currdate (reverse loop)
 currdate=`./shift_date $currdate -1`
done

Jean-Pierre.
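A possible further step beyond the version above: awk accepts several files and resets FNR for each one, so all selected files for a date could be handled by one awk process instead of 133. A minimal sketch. It assumes a POSIX awk (on Solaris probably nawk or /usr/xpg4/bin/awk, since the old /usr/bin/awk has no `-v`), and it assumes the curve name written to the CSV is the lower-cased file name, which is what the posted script's `$LOWERCASEFILE` appears to hold (its assignment is not shown). `extract_day` is a hypothetical helper name.

```shell
#!/bin/sh
# Assumes a POSIX awk; override with e.g. AWK=nawk on Solaris.
AWK=${AWK:-awk}

# extract_day DATE FILE...: write CSV rows for every FILE to stdout.
extract_day() {
    day=$1; shift
    "$AWK" -v Date="$day" '
        FNR == 1 {                  # header line of each new file
            n = FILENAME
            sub(/.*\//, "", n)      # strip the directory part
            name = tolower(n)
            last = $1 + 1           # data rows are 2 .. count+1
            next
        }
        FNR <= last {
            printf "%s,%s,%s,%s,%s\n", name, Date, $1, $2, $4
        }
    ' "$@"
}
```

It would be called as `extract_day "$currdate" file1 file2 ... >> "$OUTFILE"` with the paths of the selected files for that date.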

aigles


06-28-2011


michaelrozar17, I did put in double square brackets and I get a syntax error. What is this for? Do you want to know which shell it is?

Thanks Jean-Pierre, I will try it out and let you know.

---------- Post updated at 09:44 AM ---------- Previous update was at 09:38 AM ----------

Jean-Pierre, I am encountering a problem.

The 1365 files generated in the $localcurves directory have mixed-case names, e.g. sCADTierTwolinMid, but I need them in lower case.

As you can see, the list $ZEROCURVEFILES is all lower case, so `ls $ZEROCURVEFILES` will not find any of the files. Is there a way to make ls case-insensitive?
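One way to get this without a case-insensitive ls is to list the directory once and compare lower-cased names in awk, which also replaces the per-file membership check with a single process. A minimal sketch, assuming a POSIX awk with `tolower` (nawk or /usr/xpg4/bin/awk on Solaris) and file names without spaces; `wanted_files` is a hypothetical helper name.

```shell
#!/bin/sh
# Assumes a POSIX awk; override with e.g. AWK=nawk on Solaris.
AWK=${AWK:-awk}

# wanted_files DIR "LIST": print the real (mixed-case) names of the files in
# DIR whose lower-cased name appears in the space-separated LIST.
wanted_files() {
    ls "$1" | "$AWK" -v list="$2" '
        BEGIN {
            n = split(list, w, " ")     # default whitespace splitting
            for (i = 1; i <= n; i++) want[w[i]] = 1
        }
        (tolower($0) in want)           # print matching names unchanged
    '
}
```

In the posted loop this could be used as `for FILE in \`wanted_files "$localcurves" "$ZEROCURVEFILES"\``.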

srattani


06-28-2011


For those who are unaware of it ... on Solaris 10 and earlier, the default shell (/bin/sh) is the Bourne Shell.
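This is consistent with the error reported for `[[ ]]`: double brackets are a ksh/bash extension that the Bourne shell does not have, so a script run by that /bin/sh should keep `[ ]`. If a script needs to find out at run time, one hedged approach is to try the construct inside `eval` in a subshell, so a failure cannot abort the caller; `has_double_brackets` is a hypothetical helper name.

```shell
#!/bin/sh
# has_double_brackets: succeed only if the running shell understands [[ ]].
# The subshell and eval keep a "not found" or syntax error contained.
has_double_brackets() {
    (eval '[[ 1 -eq 1 ]]') 2>/dev/null
}

if has_double_brackets; then
    echo "this shell supports [[ ]]"
else
    echo "use [ ] (test) instead"
fi
```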

fpmurphy
