Help with File Slow Processing


 
Top Forums Shell Programming and Scripting Help with File Slow Processing
# 1  
Old 06-27-2011
Help with File Slow Processing

Hello,

Hope you are doing fine. Let me describe the problem: I have a script that calls another script, K2Test.sh (created by another team). K2Test.sh takes a date as an argument and generates approximately 1365 files in the localcurves directory for that date.

Out of these 1365 I am only interested in 133 files, so I have created a list of the file names we need to process (ZEROCURVEFILES, below).

I loop through the 1365 files (`ls $localcurves` below) and check whether each file name is in the 133-file list (ZEROCURVEFILES); if it is, I process the file by reading it line by line.

It seems to take too long just to process 133 files. Am I using inefficient code below? Is there a way to make it faster? Is it slow because I open and read the 133 files line by line?

I need to run this script for 400 days, which means I would be looping 400 * 1365 times, i.e. once per day, and for each day processing 133 files.

I would really appreciate any help to make it faster. Here is the code. I know it is a lot of code; please let me know if anything in the script is unclear.

Code:
#!/bin/sh
#e.g. 20110627 (june 27 2011)
currdate=$1
#e.g. 20100310 (march 10 2010)
enddate=$2

#directory where 1365 files get generated
localcurves="/home/sratta/feds/localCurves/curves"
outputdir="/home/sratta/curves"
#output fileto be generated
OUTFILE="/home/sratta/ZeroCurves/BulkLoad.csv"
touch $OUTFILE

# List of 133 curve file names
ZEROCURVEFILES="saud1-monthlinmid \
saud6-monthlinmid \
.....
suvruvr_usdlinmid \
szarzar_usdlinmid "

#Loop until currdate is not equal to enddate (reverse loop)
while [ $currdate -ne $enddate ]
do

  #Call K2test.sh which generates 1365 files for a given date in $localcurves directory
 ./K2test.sh $currdate
 filesfound=0

#Loop through the 1365 files generated by K2test.sh in $localcurves directory
 for FILE in `ls $localcurves`
 do
  filesfound=1
  #Check if the filename is one of the 133 files we want?  If it is only then process otherwise ignore
  zerocurvefile=`echo "$ZEROCURVEFILES" | grep "$FILE"`

  # If file is in the list then process it
   if [ "$zerocurvefile" != "" ]
   then
    echo "Processing $LOWERCASEFILE.$currdate file"

  #THIS PROCESSING IS SLOW LINE BY LINE
   exec 3<&0
  #Open the file
   exec 0<"$localcurves/$FILE"
   cnt=0
   rowstoprocess=0
  #Read file line by line
   while read line
   do
    cnt=`expr $cnt + 1`
    # First line in file contains number of records to process
    if [ "$cnt" -eq "1" ]
    then
     numheadrecords=`echo $line | awk '{print $1}'`
     rowstoprocess=`expr $numheadrecords + 2`
     echo "Total Number of Rows in header for $LOWERCASEFILE.$currdate is: $numheadrecords"
    fi
    
    if [ "$cnt" -gt "1" ] && [ "$cnt" -lt "$rowstoprocess" ]
    then
     julianmdate=`echo $line | awk '{print $1}'`
     rate=`echo $line | awk '{print $2}'`
     mdate=`echo $line | awk '{print $4}'`
     # extract certain columns and put the data into out file
     echo "$LOWERCASEFILE,$currdate,$julianmdate,$rate,$mdate" >> $OUTFILE
    fi
    
   # If we have processed number of records as in first line then break the loop
    if [ "$cnt" -eq "$rowstoprocess" ]
    then
     break
    fi
   done
   exec 0<&3
  fi
 done
 
#Subtract 1 day from currdate (reverse loop)
 currdate=`./shift_date $currdate -1`
done


Last edited by srattani; 06-28-2011 at 08:58 AM..
# 2  
Old 06-28-2011
What Operating System and version are you running?
What Shell is /bin/sh on your computer?
How many lines are processed from the 133 files? Is it definitely not the whole of each file?
Does the script work?

What are these lines for? Is there a local reason for these complex redirects?
Quote:
#THIS PROCESSING IS SLOW LINE BY LINE
exec 3<&0
#Open the file
exec 0<"$localcurves/$FILE"

exec 0<&3
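For reference, that `exec` pair just repoints the script's stdin at the file and restores it afterwards. The same effect is usually written more simply by redirecting on the loop itself. A small self-contained sketch (the sample file and its contents are made up purely for illustration):

```shell
# Create a tiny sample file (hypothetical data, just for illustration)
printf '2 records\nrow1\nrow2\nignored\n' > /tmp/curve_sample

# Redirecting on the loop replaces the exec 3<&0 / exec 0<file / exec 0<&3 dance;
# the loop's stdin comes from the file, the script's own stdin is untouched.
cnt=0
while read line
do
  cnt=`expr $cnt + 1`
done < /tmp/curve_sample

echo "lines read: $cnt"    # prints: lines read: 4
rm -f /tmp/curve_sample
```

The loop-level redirection is also Bourne-shell safe, so it should work unchanged under Solaris /bin/sh.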

There is great scope for efficiency in this script but let's get a feel for the environment and the size of the data files first.
# 3  
Old 06-28-2011
Hi methyl,

Thanks for looking at my post, I really appreciate it. I am new to Unix scripting, so I definitely need guidance. Please see my answers:

What Operating System and version are you running? It is Sun Solaris.

What Shell is /bin/sh on your computer? How do I tell? I just know I am using sh.

How many lines are processed from the 133 files? Is it definitely not the whole of each file? Each file has the number of records on its very first line; I read that and process that many rows. It can be anywhere from 10 to 200.

Does the script work? Yes, the script works, but each file takes approx 4 seconds to process, and the 133 files take 523 seconds, which is almost 9 minutes for one day. I have to process 400 days, which would take about 58 hours.

What are these lines for? Is there a local reason for these complex redirects? I copied it from a colleague, so if you think there is no reason for these redirections I would appreciate your guidance.
# 4  
Old 06-28-2011
Just a snippet. If your shell accepts it, try changing all the single square brackets to double square brackets, e.g.
Code:
while [ $currdate -ne $enddate ]
to
while [[ $currdate -ne $enddate ]]

# 5  
Old 06-28-2011
Try this version of your script (not tested):
Code:
#!/bin/sh
#e.g. 20110627 (june 27 2011)
currdate=$1
#e.g. 20100310 (march 10 2010)
enddate=$2

#directory where 1365 files get generated
localcurves="/home/sratta/feds/localCurves/curves"
outputdir="/home/sratta/curves"
#output fileto be generated
OUTFILE="/home/sratta/ZeroCurves/BulkLoad.csv"
touch $OUTFILE

# List of 133 curve file names
ZEROCURVEFILES="saud1-monthlinmid \
saud6-monthlinmid \
.....
suvruvr_usdlinmid \
szarzar_usdlinmid "

#Loop until currdate is not equal to enddate (reverse loop)
while [ $currdate -ne $enddate ]
do

  #Call K2test.sh which generates 1365 files for a given date in $localcurves directory
 ./K2test.sh $currdate
 filesfound=0

#Loop through the 1365 files generated by K2test.sh in $localcurves directory
 for FILE in `cd $localcurves; ls $ZEROCURVEFILES 2>/dev/null`
 do
  filesfound=1
  echo "Processing $LOWERCASEFILE.$currdate file"

  awk '
    FNR==1 {
        numheadrecords = $1;
        rowstoprocess  = numheadrecords + 2;
        printf "Total Number of Rows in header for %s.%s is %s\n", LowFile, Date, numheadrecords;
        next;
    }
    FNR<rowstoprocess {
        julianmdate = $1;
        rate        = $2;
        mdate       = $4;
        printf "%s,%s,%s,%s,%s\n", LowFile, Date, julianmdate, rate, mdate >> Out;
    }
    FNR>=rowstoprocess { exit }
  ' LowFile=$LOWERCASEFILE Date=$currdate Out=$OUTFILE $localcurves/$FILE
    
 done
 
#Subtract 1 day from currdate (reverse loop)
 currdate=`./shift_date $currdate -1`
done

Jean-Pierre.
# 6  
Old 06-28-2011
michaelrozar17, I did put in the double square brackets and I get a syntax error. What is this for? Do you want to know which shell it is?

Thanks Jean-Pierre, I will try it out and let you know.

---------- Post updated at 09:44 AM ---------- Previous update was at 09:38 AM ----------

Jean-Pierre, I am encountering a problem.

The 1365 files generated in the $localcurves directory have mixed-case names, e.g. sCADTierTwolinMid, but I need them in lower case.

If you look at the list, $ZEROCURVEFILES is all lower case, so when we do `ls $ZEROCURVEFILES` it will not find any of the files. Is there a way to do a case-insensitive ls?
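One common way to sidestep the case mismatch is to fold each generated file name to lower case with `tr` before comparing it against the list. A rough sketch (the directory and file names here are hypothetical stand-ins, not the real curve files):

```shell
# Hypothetical stand-ins for the real directory and list
localcurves=/tmp/curves_demo
ZEROCURVEFILES="saud1-monthlinmid szarzar_usdlinmid"
mkdir -p $localcurves
touch $localcurves/sAUD1-monthLinMid $localcurves/sOTHERcurve

for FILE in `ls $localcurves`
do
  # Fold the mixed-case name to lower case before the membership test
  lcfile=`echo $FILE | tr '[A-Z]' '[a-z]'`
  case " $ZEROCURVEFILES " in
    *" $lcfile "*) echo "match: $FILE -> $lcfile" ;;
  esac
done
rm -rf /tmp/curves_demo
```

The `case` pattern is Bourne-compatible, and the old `tr '[A-Z]' '[a-z]'` bracket form is the one that works on Solaris /usr/bin/tr.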
# 7  
Old 06-28-2011
For those who are unaware of it ... on Solaris 10 and earlier, the default shell (/bin/sh) is the Bourne Shell.
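To answer the earlier "How do I tell?" question: on most systems you can simply inspect /bin/sh and the current shell. A quick probe (output varies by OS, so this is only a sketch):

```shell
# Show whether /bin/sh is a plain binary or a link to another shell
ls -l /bin/sh

# $0 usually names the shell (or script) currently running this code
echo "running as: $0"
```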