Help with File Slow Processing


 
# 8  
Old 06-28-2011
Try and adapt this new version (not tested):
Code:
#!/bin/sh
#e.g. 20110627 (june 27 2011)
currdate=$1
#e.g. 20100310 (march 10 2010)
enddate=$2

# directory where the 1365 files get generated
localcurves="/home/sratta/feds/localCurves/curves"
outputdir="/home/sratta/curves"
# output file to be generated
OUTFILE="/home/sratta/ZeroCurves/BulkLoad.csv"
touch "$OUTFILE"

# List of 133 curve file names
ZEROCURVEFILES="saud1-monthlinmid
saud6-monthlinmid
.....
suvruvr_usdlinmid
szarzar_usdlinmid"

echo "$ZEROCURVEFILES" > /tmp/zerocurvefiles.tmp

# Loop backwards while currdate has not reached enddate
while [ "$currdate" -ne "$enddate" ]
do

  #Call K2test.sh which generates 1365 files for a given date in $localcurves directory
 ./K2test.sh $currdate
 filesfound=0

#Loop through the 1365 files generated by K2test.sh in $localcurves directory
 for FILE in `cd "$localcurves"; ls | /usr/xpg4/bin/grep -iwf /tmp/zerocurvefiles.tmp 2>/dev/null`
 do
  filesfound=1
  echo "Processing $LOWERCASEFILE.$currdate file"

  nawk '
    FNR==1 {
        numheadrecords = $1;
        rowstoprocess  = numheadrecords + 2;
        printf "Total Number of Rows in header for %s.%s is %s\n", LowFile, Date, numheadrecords;
        next;
    }
    FNR<rowstoprocess {
        juliandate = $1;
        rate       = $2;
        mdate      = $4;
        printf "%s,%s,%s,%s,%s\n", LowFile, Date, juliandate, rate, mdate;
    }
  ' LowFile=$LOWERCASEFILE Date=$currdate $FILE
    
 done
 
#Subtract 1 day from currdate (reverse loop)
 currdate=`./shift_date $currdate -1`
done
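
For example, to process dates backwards from 27 June 2011 to 10 March 2010 (assuming you save the script as bulkload.sh; the name is just an example):
Code:
./bulkload.sh 20110627 20100310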

Jean-Pierre.
# 9  
Old 06-28-2011
(Late post - lost connection, may be out of context)
Quote:
What Operating System and version are you running?
It is Sun Solaris.
The version is in the output from the "uname -a" command. It should then be possible to look up whether your Solaris is an old one which has the old Bourne Shell for /bin/sh, or a new one with the more modern Posix Shell.
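For example:
Code:
uname -a
The release field (e.g. "SunOS ... 5.10") gives the Solaris version: SunOS 5.10 is Solaris 10. Note that on Solaris /bin/sh is traditionally the old Bourne shell; a Posix shell is usually available as /usr/xpg4/bin/sh.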

1) The big inefficiency is using a Shell "read" to read records line-by-line from a data file, then running multiple "awk" processes to separate the fields.
I see now why you reassigned the channels: you are already using the Shell input channel to read a list of files.

I agree with the ideas behind "agiles" modifications.

2) As you have a list of required files, use that list.
I'd add a test to the script to check whether the file exists.
I see that "agiles" modification is ingenious because it allows for this by sending errors to /dev/null:
Quote:
for FILE in `cd localcurves; ls $ZEROCURVEFILES 2>/dev/null`
3) Invoke awk only once and use it to read the data from the files.
A lot of the inefficiency comes from the number of times the original script starts "awk" to process the same $line.

4) Hold the list of 133 files in a real file, not an environment variable, and use "while" rather than "for". Some Bourne shells will not let you have an environment variable that big. (A sketch combining points 2 to 4 follows after this list.)

5) Consider making a version of K2test.sh which only generates the relevant 133 files in /home/sratta/feds/localCurves/curves .

6) Noticed that the variable $LOWERCASEFILE is not set anywhere.

7) If you have a journalling filesystem it is inefficient to repeatedly create a batch of files then overwrite them with Shell. Depends whether K2test.sh removes old files before generating new files.
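
Putting points 2) to 4) together, here is an untested sketch of the inner part of the script, assuming the list file holds the correct (exact) names, one per line, and that this still runs inside the existing date loop (paths and the list file name are only examples):
Code:
LISTFILE=/tmp/zerocurvefiles.tmp   # real file holding the 133 required names
localcurves="/home/sratta/feds/localCurves/curves"
OUTFILE="/home/sratta/ZeroCurves/BulkLoad.csv"

while read FILE
do
  # point 2: skip names that were not generated for this date
  [ -f "$localcurves/$FILE" ] || continue

  # lower-case version of the name for the output fields
  LOWERCASEFILE=`echo "$FILE" | tr '[A-Z]' '[a-z]'`

  # points 1 and 3: one awk run per file, no Shell "read" loop
  nawk '
    FNR==1 { rowstoprocess = $1 + 2; next }
    FNR<rowstoprocess { printf "%s,%s,%s,%s,%s\n", LowFile, Date, $1, $2, $4 }
  ' LowFile="$LOWERCASEFILE" Date="$currdate" "$localcurves/$FILE" >> "$OUTFILE"
done < "$LISTFILE"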
# 10  
Old 06-28-2011
Hi Jean Pierre/Methyl,

Jean Pierre, I see that in your awk code you are using printf. I want the data elements to be written to a file rather than printed to the screen. How do I make the data elements go to $OUTFILE, where OUTFILE is a variable holding the name of the file?

I would appreciate your help.

Thanks
Regards
# 11  
Old 06-28-2011
Check "agiles" next post, but I think this is enough for the redirect:
' LowFile=$LOWERCASEFILE Date=$currdate $FILE >> ${OUTFILE}
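
An alternative (untested) is to put the redirection on the "done" of the inner loop instead, so that $OUTFILE is opened once rather than once per file:
Code:
 for FILE in ...
 do
   ...
   nawk '
     ...
   ' LowFile=$LOWERCASEFILE Date=$currdate $FILE
 done >> ${OUTFILE}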

However there are other problems:
e.g. there is no value in $LOWERCASEFILE.

It would be so much easier if $ZEROCURVEFILES was the name of a file containing a list of the required files with their correct names. This could be created from an "ls -1" report by deleting the ones you don't want. It could equally be created using a "here document" within the script.
Translating the mixed upper-and-lower filename to lower case is a trivial task for the unix "tr" command; working from a lower-case list is proving to be not trivial.
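
For example (untested; the names in the here document are just the first and last entries from post # 8):
Code:
# create the list file from a "here document" within the script
cat <<EOF > /tmp/zerocurvefiles.tmp
saud1-monthlinmid
saud6-monthlinmid
szarzar_usdlinmid
EOF

# translate a mixed-case file name to lower case
LOWERCASEFILE=`echo "$FILE" | tr '[A-Z]' '[a-z]'`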

Any chance you can let us know your version of Solaris?
# 12  
Old 06-29-2011
Thanks Jean Pierre and Methyl,

The awk command you sent will only run for the number of records given in the header, correct? And for the lines following the first line, it uses whitespace as the delimiter to extract the fields? I ask because I don't see the separator being set to space anywhere.

I will modify $ZEROCURVEFILES so that it has the exact names. After doing that, how do I loop through only those files? Basically, I have to somehow give that list to the ls command, and then inside the loop change each file name to lower case using tr, like you mentioned.

I will let you know the version of Solaris when I reach work. I appreciate your help.

Thanks
Regards

Last edited by srattani; 06-29-2011 at 07:20 AM..