Comparison and For Loop Taking Too Long

# 1  
Old 07-07-2009
Comparison and For Loop Taking Too Long

I'd like to
1. Check each of the 10,000 PNT files (each containing a single record) in the /$ROOTDIR/scp/inbox/string1 directory against the 39 bad PNT files in the /$ROOTDIR/output/tma/pnt/bad/string1 directory, comparing on the fam_id column value at positions 38 to 47 of the record. Here is an example of a record from a file in either directory:
PNT0220060503081122003700100000091049000005629001005146417001407712SFirstname Lastname
2. If the fam_id matches, move the current file from the /$ROOTDIR/scp/inbox/string1 directory into the /$ROOTDIR/output/tma/pnt/bad/string1 directory.
If not, continue the normal process.
The code below works, but it takes over two hours to complete the comparison. Please advise whether there is a better way to rewrite or improve the comparison process so that it runs faster. Thanks
Code:
pntcnt1=`ls /$ROOTDIR/scp/inbox/string1 | grep -c '^PNT'`
if [[ $pntcnt1 -gt 0 ]]; then

for gfile in /$ROOTDIR/scp/inbox/string1/PNT.2*
 do
   gline=`sed '1q' "$gfile"`
   x=`echo "$gline" | awk '{ print substr( $0, 38, 9 ) }'`
   for bfile in /$ROOTDIR/output/tma/pnt/bad/string1/PNT.2*
    do
      bline=`sed '1q' "$bfile"`
      y=`echo "$bline" | awk '{ print substr( $0, 38, 9 ) }'`
      if [ "$x" = "$y" ]
      then
        echo "file moved $gfile"
        mv -f "$gfile" /$ROOTDIR/output/tma/pnt/bad/string1
        break
      fi
   done
done
fi
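
For reference, the fam_id extraction at the heart of both loops can be checked in isolation. A minimal sketch using the sample record from the post (the `cut` range 38-46 mirrors the `substr($0, 38, 9)` call in the loop):
Code:
```shell
# Sample record from the post; fam_id is the 9 characters starting at position 38
rec='PNT0220060503081122003700100000091049000005629001005146417001407712SFirstname Lastname'

# Two equivalent extractions of the fam_id field
fam_id=$(printf '%s\n' "$rec" | cut -c38-46)
fam_id2=$(printf '%s\n' "$rec" | awk '{ print substr($0, 38, 9) }')

echo "$fam_id"   # prints 000005629
```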

# 2  
Old 07-10-2009
There is room for improvement, though I'm not sure how much it will buy you. In the end, you need a double loop. There is a possibility for another way, below.
Code:
# pntcnt1=`ls -l /$ROOTDIR/scp/inbox/string1 | grep 'PNT.*' | wc -l`
## replaced with:
find /$ROOTDIR/scp/inbox/string1/ -name "PNT.2*" -print |
# if [[ $pntcnt1 -gt 0 ]] then
## replaced with a while-pipe:
while read gfile
 do
   # gline=`sed '1q' $gfile` # no longer needed here; awk does it all
   x=`awk 'FNR==1 { print substr( $0, 38, 9 ); exit }' "$gfile"`

   # for bfile in `ls -1 /$ROOTDIR/output/tma/pnt/bad/string1/PNT.2*`
   find /$ROOTDIR/output/tma/pnt/bad/string1/ -name "PNT.2*" -print |
   while read bfile
    do
      # let awk do the string comparison
      if awk -v x="$x" 'FNR==1 { if (x == substr( $0, 38, 9 )) exit 0; exit 1 }' "$bfile"
      then
         echo "file moved $gfile"
         mv -f "$gfile" /$ROOTDIR/output/tma/pnt/bad/string1
         break
      fi
   done
done
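
A cheaper variant of the same double pass is to read each bad file only once, caching its fam_id up front, and then test every inbox file against the cached list. A sketch (the function name and the PNT.2* pattern are assumptions carried over from the thread; untested against the real data):
Code:
```shell
# move_if_bad BAD_DIR IN_DIR
# Cache the fam_id (positions 38-46) of every bad file once, then move any
# inbox file whose fam_id appears in that cache into BAD_DIR.
move_if_bad() {
  bad_dir=$1 in_dir=$2
  # one fam_id per line, read from the first record of each bad file
  bad_ids=$(for b in "$bad_dir"/PNT.2*; do
              awk 'FNR==1 { print substr($0, 38, 9); exit }' "$b"
            done)
  for g in "$in_dir"/PNT.2*; do
    x=$(awk 'FNR==1 { print substr($0, 38, 9); exit }' "$g")
    # -x: match the whole line, -F: fixed string, -q: quiet
    if printf '%s\n' "$bad_ids" | grep -qxF "$x"; then
      echo "file moved $g"
      mv -f "$g" "$bad_dir"
    fi
  done
}
```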

The other method is memory-intensive: You go through the first directory and build up a hash of fam_id/filename pairs; then you go through the second directory and compare each file's first record against the hash entries. It can be done in awk, but here's how to do it in perl:
Code:
#!/usr/bin/perl -w
$dir1 = "/$ENV{ROOTDIR}/scp/inbox/string1";          # the first (inbox) dir
$dir2 = "/$ENV{ROOTDIR}/output/tma/pnt/bad/string1"; # the second (bad) dir

opendir(D1,$dir1) || die "Cannot open $dir1: $!";
opendir(D2,$dir2) || die "Cannot open $dir2: $!";

# read record snippets from dir1
while ( $file1 = readdir(D1) ) {
   next unless $file1 =~ /^PNT\.2/;
   open(FILE,$dir1."/".$file1) || do { warn "Could not open $dir1/$file1, skipping: $!"; next; };
   $line = <FILE>;
   $X{ substr($line,37,9) } = $file1;
}
close FILE;
closedir D1;

# compare to files in dir2
while ( $file2 = readdir(D2) ) {
   next unless $file2 =~ /^PNT\.2/;
   open(FILE,$dir2."/".$file2) || do { warn "Could not open $dir2/$file2, skipping: $!"; next; };
   $line = <FILE>;
   $y = substr($line,37,9);
   if (exists $X{ $y }) {
      print "mv -f $dir1/$X{$y} $dir2\n";
      delete $X{$y};
   }
}
closedir D2;

That perl code is untested. It prints the mv commands rather than executing them, so you can check that the output is right and then replace the final "print" with "system". Files with spaces or odd characters in their names may not work in this case. The substr(...,37,...) isn't a mistake: Perl counts string positions from 0, while awk counts from 1.
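
For completeness, here is the awk version of the same hash idea, using awk's own associative array and a pass=N marker between the two file groups. This is a sketch with an assumed function name; note that `nextfile` is supported by gawk, mawk, and BWK awk, but not by some older POSIX awks:
Code:
```shell
# print_bad_matches BAD_DIR IN_DIR
# First pass (pass=1): record each bad file's fam_id in the 'seen' array.
# Second pass (pass=2): print the name of any inbox file whose fam_id was seen.
print_bad_matches() {
  bad_dir=$1 in_dir=$2
  awk 'FNR == 1 {
         k = substr($0, 38, 9)
         if (pass == 1) seen[k] = 1
         else if (k in seen) print FILENAME
         nextfile
       }' pass=1 "$bad_dir"/PNT.2* pass=2 "$in_dir"/PNT.2*
}

# The output can then be fed to mv, mirroring the perl sketch above:
# print_bad_matches "$bad" "$in" | while read -r f; do mv -f "$f" "$bad"; done
```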