Sponsored Content
Top Forums UNIX for Advanced & Expert Users Comparison and For Loop Taking Too Long Post 302332843 by otheus on Friday 10th of July 2009 07:50:48 AM
Old 07-10-2009
There is room for improvement, but I'm not sure how much improvement it will be. In the end, you need to have a double-loop. There is a possibility for another way, below.
Code:
# pntcnt1=`ls -l /$ROOTDIR/scp/inbox/string1 | grep 'PNT.*' | wc -l`
## replaced with:
find /$ROOTDIR/scp/inbox/string1/ -name "*PNT.2*" -print |
# if [[ $pntcnt1 -gt 0 ]] then
## replaced with a while-pipe:
while read gfile 
 do
   # gline=`sed '1q' $gfile` # no longer needed here; awk does it all
   x=`awk 'FNR==1 { print substr( $0, 38, 9 ); exit }' $gfile`

   # for bfile in `ls -1 /$ROOTDIR/output/tma/pnt/bad/string1/PNT.2*`
   find /$ROOTDIR/scp/inbox/string1/ -name "*PNT.2*" -print |
   while read bfile
    do
      # let awk do the string comparison. 
      if awk -v x="$x" 'FNR==1 { if x == substr( $0, 38, 9 ) exit(0); exit(1); }' $bfile` 
      then
         echo "file moved $gfile"
         mv -f $gfile /$ROOTDIR/output/tma/pnt/bad/string1
         break
      fi
  done
done

The other method is memory-intensive: You go through the first directory and build up a tree of filename-string pairs; then you go through the second directory and compare each file's first row to your entries. It can be done in awk, but here's how to do it in perl:
Code:
#!/usr/bin/perl -w
$dir1= ; # put the first dir name here
$dir2= ; # put the second dir name here

opendir(D1,$dir1) || die "Cannot open $dir1: $!";
opendir(D2,$dir2) || die "Cannot open $dir2: $!";

# read record snippets from dir1
while ( $file1=readdir(D1) ) { 
   next unless $file1 =~ /PNT\.2/;
   open(FILE,$dir1."/".$file1) || do { warn "Could not open $dir1/$file1, skipping: $!"; next; }
   $line=<FILE>;
   $X{ substr($line,37,9) } = $file1;
}
close FILE;

# compare to files in dir2
while ( $file2=readdir(D2) ) { 
   next unless $file2 =~ /PNT\.2/;
   open(FILE,$dir2."/".$file2) || do { warn "Could not open $dir2/$file2, skipping: $!"; next; }
   $line=<FILE>; 
   $y=substr($line,37,9);
   if (exists $X{ $y }) { 
      print "mv -f $dir1/$X{$y} $dir2";
      delete $X{$y}; 
   }
}

That perl code is untested. It prints out the mv commands, rather than executing them. You can then examine the output is right, and replace the last "print" with "system". Files with spaces and funny characters in them might not work in this case. The substr...37 isn't a mistake. Perl starts counting strings at 0, while awk starts at 1.
 

10 More Discussions You Might Find Interesting

1. Red Hat

login process taking a long time

I'm having a bit of a login performance issue.. wondering if anyone has any ideas where I might look. Here's the scenario... Linux Red Hat ES 4 update 5 regardless of where I login from (ssh or on the text console) after providing the password the system seems to pause for between 30... (4 Replies)
Discussion started by: retlaw
4 Replies

2. Shell Programming and Scripting

For Loop Taking Too Long

I'm new from UNIX scripting. Please help. I have about 10,000 files from the $ROOTDIR/scp/inbox/string1 directory to compare with the 50 files from /$ROOTDIR/output/tma/pnt/bad/string1/ directory and it takes about 2 hours plus to complete the for loop. Is there a better way to re-write the... (5 Replies)
Discussion started by: hanie123
5 Replies

3. Shell Programming and Scripting

<AIX>Problem in purge script, taking very very long time to complete 18.30hrs

Hi, I have here a script which is used to purge older files/directories based on defined purge period. The script consists of 45 find commands, where each command will need to traverse through more than a million directories. Therefore a single find command executes around 22-25 mins... (7 Replies)
Discussion started by: sravicha
7 Replies

4. UNIX for Dummies Questions & Answers

gref -f taking long time for big file

grep -f taking long time to compare for big files, any alternate for fast check I am using grep -f file1 file2 to check - to ckeck dups/common rows prsents. But my files contains file1 contains 5gb and file 2 contains 50 mb and its taking such a long time to compare the files. Do we have any... (10 Replies)
Discussion started by: gkskumar
10 Replies

5. UNIX for Dummies Questions & Answers

Job is taking long time

Hi , We have 20 jobs are scheduled. In that one of our job is taking long time ,it's not completing. If we are not terminating it's running infinity time actually the job completion time is 5 minutes. The job is deleting some records from the table and two insert statements and one select... (7 Replies)
Discussion started by: ajaykumarkona
7 Replies

6. Solaris

Re-sync Taking Extremely Long.

It's almost 3 days now and my resync/re-attach is only at 80%. Is there something I can check in Solaris 10 that would be causing the degradation. It's only a standby machine. My live system completed in 6hrs. (9 Replies)
Discussion started by: ravzter
9 Replies

7. Solaris

How to find out bottleneck if system is taking long time in gzip

Dear All, OS = Solaris 5.10 Hardware Sun Fire T2000 with 1 Ghz quode core We have oracle application 11i with 10g database. When ever i am trying to take cold backup of database with 55GB size its taking long time to finish. As the application is down nobody is using the server at all... (8 Replies)
Discussion started by: yoojamu
8 Replies

8. UNIX for Dummies Questions & Answers

ls is taking long time to list

Hi, All the data are kept on Netapp using NFS. some directories are so fast when doing ls but few of them are slow. After doing few times, it becomes fast. Then again after few minutes, it becomes slow again. Can you advise what's going on? This one directory I am very interested is giving... (3 Replies)
Discussion started by: samnyc
3 Replies

9. Shell Programming and Scripting

While loop problem taking too long

while read myhosts do while read discovered do echo "$discovered" done < $LOGFILE | grep -Pi "|" | egrep... (7 Replies)
Discussion started by: SkySmart
7 Replies

10. Shell Programming and Scripting

Rm -rf is taking very long, will it timeout?

I have so many (hundreds of thousands) files and directories within this one specific directory that my "rm -rf" command to delete them has been taking forever. I did this via the SSH, my question is: if my SSH connection times out before rm -rf finishes, will it continue to delete all of those... (5 Replies)
Discussion started by: phpchick
5 Replies
diff(1) 							   User Commands							   diff(1)

NAME
diff - compare two files SYNOPSIS
diff [-bitw] [-c | -e | -f | -h | -n | -u] file1 file2 diff [-bitw] [-C number | -U number] file1 file2 diff [-bitw] [-D string] file1 file2 diff [-bitw] [-c | -e | -f | -h | -n | -u] [-l] [-r] [-s] [-S name] directory1 directory2 DESCRIPTION
The diff utility will compare the contents of file1 and file2 and write to standard output a list of changes necessary to convert file1 into file2. This list should be minimal. Except in rare circumstances, diff finds a smallest sufficient set of file differences. No output will be produced if the files are identical. The normal output contains lines of these forms: n1 a n3,n4 n1,n2 d n3 n1,n2 c n3,n4 where n1 and n2 represent lines file1 and n3 and n4 represent lines in file2 These lines resemble ed(1) commands to convert file1 to file2. By exchanging a for d and reading backward, file2 can be converted to file1. As in ed, identical pairs, where n1=n2 or n3=n4, are abbrevi- ated as a single number. Following each of these lines come all the lines that are affected in the first file flagged by `<', then all the lines that are affected in the second file flagged by `>'. OPTIONS
The following options are supported: -b Ignores trailing blanks (spaces and tabs) and treats other strings of blanks as equivalent. -i Ignores the case of letters. For example, `A' will compare equal to `a'. -t Expands TAB characters in output lines. Normal or -c output adds character(s) to the front of each line that may adversely affect the indentation of the original source lines and make the output lines difficult to interpret. This option will preserve the origi- nal source's indentation. -w Ignores all blanks (SPACE and TAB characters) and treats all other strings of blanks as equivalent. For example, `if ( a == b )' will compare equal to `if(a==b)'. The following options are mutually exclusive: -c Produces a listing of differences with three lines of context. With this option, output format is modified slightly. That is, output begins with identification of the files involved and their creation dates, then each change is separated by a line with a dozen *'s. The lines removed from file1 are marked with '--'. The lines added to file2 are marked '+'. Lines that are changed from one file to the other are marked in both files with '!'. -C number Produces a listing of differences identical to that produced by -c with number lines of context. -D string Creates a merged version of file1 and file2 with C preprocessor controls included so that a compilation of the result without defining string is equivalent to compiling file1, while defining string will yield file2. -e Produces a script of only a, c, and d commands for the editor ed, which will recreate file2 from file1. In connection with the -e option, the following shell program may help maintain multiple versions of a file. Only an ancestral file ($1) and a chain of version-to-version ed scripts ($2,$3,...) made by diff need be on hand. A ``latest version'' appears on the standard output. (shift; cat $*; echo a'1,$p') | ed - $1 -f Produces a similar script, not useful with ed, in the opposite order. -h Does a fast, half-hearted job. It works only when changed stretches are short and well separated, but does work on files of unlimited length. Options -c, -C, -D, -e, -f, and -n are unavailable with -h. diff does not descend into directories with this option. -n Produces a script similar to -e, but in the opposite order and with a count of changed lines on each insert or delete command. -u Produces a listing of differences with three lines of context. The output is similar to that of the -c option, except that the context is "unified". Removed and changed lines in file1 are marked by a '-' while lines added or changed in file2 are marked by a '+'. Both versions of changed lines appear in the output, while added, removed, and context lines appear only once. The identification of file1 and file2 is different, with "---" and "+++" being printed where "***" and "---" would appear with the -c option. Each change is separated by a line of the form @@ -n1,n2 +n3,n4 @@ -U number Produces a listing of differences identical to that produced by -u with number lines of context. The following options are used for comparing directories: -l Produces output in long format. Before the diff, each text file is piped through pr(1) to paginate it. Other differences are remembered and summarized after all text file differences are reported. -r Applies diff recursively to common subdirectories encountered. -s Reports files that are identical. These identical files would not otherwise be mentioned. -S name Starts a directory diff in the middle, beginning with the file name. OPERANDS
The following operands are supported: file1 A path name of a file or directory to be compared. If either file1 or file2 is -, the standard input will be used in its file2 place. directory1 A path name of a directory to be compared. directory2 If only one of file1 and file2 is a directory, diff will be applied to the non-directory file and the file contained in the directory file with a filename that is the same as the last component of the non-directory file. USAGE
See largefile(5) for the description of the behavior of diff when encountering files greater than or equal to 2 Gbyte ( 2^31 bytes). EXAMPLES
Example 1 Typical output of the diff command In the following command, dir1 is a directory containing a directory named x, dir2 is a directory containing a directory named x, dir1/x and dir2/x both contain files named date.out, and dir2/x contains a file named y: example% diff -r dir1 dir2 Common subdirectories: dir1/x and dir2/x Only in dir2/x: y diff -r dir1/x/date.out dir2/x/date.out 1c1 < Mon Jul 2 13:12:16 PDT 1990 --- > Tue Jun 19 21:41:39 PDT 1990 ENVIRONMENT VARIABLES
See environ(5) for descriptions of the following environment variables that affect the execution of diff: LANG, LC_ALL, LC_CTYPE, LC_MES- SAGES, LC_TIME, and NLSPATH. TZ Determines the locale for affecting the timezone used for calculating file timestamps written with the -C and -c options. EXIT STATUS
The following exit values are returned: 0 No differences were found. 1 Differences were found. >1 An error occurred. FILES
/tmp/d????? temporary file used for comparison /usr/lib/diffh executable file for -h option ATTRIBUTES
See attributes(5) for descriptions of the following attributes: +-----------------------------+-----------------------------+ | ATTRIBUTE TYPE | ATTRIBUTE VALUE | +-----------------------------+-----------------------------+ |Availability |SUNWesu | +-----------------------------+-----------------------------+ |CSI |Enabled | +-----------------------------+-----------------------------+ |Interface Stability |Standard | +-----------------------------+-----------------------------+ SEE ALSO
bdiff(1), cmp(1), comm(1), dircmp(1), ed(1), pr(1), sdiff(1), attributes(5), environ(5), largefile(5), standards(5) NOTES
Editing scripts produced under the -e or -f options are naive about creating lines consisting of a single period (.). Missing NEWLINE at end of file indicates that the last line of the file in question did not have a NEWLINE. If the lines are different, they will be flagged and output, although the output will seem to indicate they are the same. SunOS 5.11 22 Sep 2004 diff(1)
All times are GMT -4. The time now is 05:47 PM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy