The UNIX and Linux Forums  



Shell Programming and Scripting Post questions about KSH, CSH, SH, BASH, PERL, PHP, SED, AWK and OTHER shell scripts here.

  #1  
01-02-2008
Registered User
 

Join Date: Dec 2007
Posts: 8
File Comparison

I have to compare two text files. Only a few lines in these files will differ, in some column.
The files are gigabytes in size.
Sample lines are as below:
11111122222222333333aaaaaaaaaabbbbbbbbbccccccccdddddd
11111122222222333333aaaaaaaaaabbbbbbbbbccccccccddeddd

Assuming these two lines come from file1 and file2 respectively, the line from file2 should be written to a new output file (the difference file).

What I would like to do is read line 1 from file1, loop through all the lines in file2, and stop when a match is found; otherwise, print that line to the output file. Then repeat the same steps for every line in file1.
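The nested-loop idea can be sketched in plain sh, which both ksh and bash accept. This is a minimal sketch, assuming the goal shown by the sample lines: print each line of file2 that has no exact match anywhere in file1, so file2 is the outer loop here. The function name nested_diff is hypothetical. Because it rescans file1 for every line of file2, the work grows with the product of the two line counts, so it is only practical for small files.

```shell
#!/bin/sh
# Hypothetical helper: print every line of file2 that has no exact
# match anywhere in file1, using the "loop and stop on match" approach.
nested_diff() {
    file1=$1
    file2=$2
    # Outer loop: one line of file2 at a time (IFS= and -r keep it verbatim)
    while IFS= read -r line2; do
        found=0
        # Inner loop: rescan file1 from the top for this line
        while IFS= read -r line1; do
            if [ "$line1" = "$line2" ]; then
                found=1
                break    # match found: stop checking this line
            fi
        done < "$file1"
        # No match anywhere in file1: this line goes to the difference output
        if [ "$found" -eq 0 ]; then
            printf '%s\n' "$line2"
        fi
    done < "$file2"
}
```

Usage: nested_diff file1 file2 > difffile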

Appreciate any help in this regard.
  #2  
01-02-2008
...@...
 

Join Date: Feb 2004
Location: NM
Posts: 4,294
What do you mean by "stop when a match is found"? Do you then read more from file1?
Do you want the line number? "Stop" usually means exiting the read loop.
  #3  
01-02-2008
Registered User
 

Join Date: Dec 2007
Posts: 8
Yes, I want to exit the read loop when a match is found; I do not need to check that line any further.
No, I do not need the line number.
  #4  
01-02-2008
Registered User
 

Join Date: Jan 2008
Location: Pittsburgh, PA
Posts: 2
If I understand what you're trying to do correctly, here's a quick bash script.

Code:
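#!/bin/bash

# No spaces around "=" in shell assignments
compareFile="/path/to/file/to/compare.txt"
outputFile="/path/to/outputFile.txt"

for filename in /some/dir/of/text/files/*.txt; do

        # Number of lines in the current file
        numlines=$(wc -l < "$filename")

        for i in $(seq 1 $numlines); do
                # Extract line $i (re-reads the file every time, so slow on big files)
                current=$(head -n "$i" "$filename" | tail -n 1)

                # -F: fixed string, not a regex; -x: whole-line match; -q: no output
                if ! grep -qxF -- "$current" "$compareFile"; then
                        # doesn't exist, append to $outputFile
                        echo "${filename}:${current}" >> "$outputFile"
                fi
        done
done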
  #5  
01-02-2008
Registered User
 

Join Date: Dec 2007
Posts: 8
Hi, thank you for the quick solution; it looks like pretty much what I want.
But I am unable to run this script. I use ksh.
One of the errors is "seq: command not found".
  #6  
01-02-2008
Registered User
 

Join Date: Jan 2008
Location: Pittsburgh, PA
Posts: 2
Check with: which seq (it usually resides in /usr/bin/)

It's a standalone executable, not a shell builtin; on Linux it should be part of the coreutils package.

If it exists on your system, add its full path to the script:
seq="/path/to/seq"

Then change the for statement to use the variable: for i in `${seq}...
  #7  
01-02-2008
Registered User
 

Join Date: Sep 2006
Posts: 1,580
Quote:
Originally Posted by dislusive
If I understand what you're trying to do correctly, here's a quick bash script.

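As the OP mentioned, the files are gigabytes in size. I suspect there will be a noticeable performance lag (just a guess), since the script re-reads the input file with head/tail and scans the whole compare file with grep once for every line.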
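For gigabyte-sized files, one alternative (a different technique from the script in this thread, not something proposed by its posters) is to sort both files and let comm pick out the lines that appear only in the second one. sort handles inputs larger than memory by spilling to temporary files on disk, and comm then reads each sorted file once. A minimal sketch, assuming mktemp is available (it is on Linux and the BSDs, but it is not part of POSIX); the function name lines_only_in_second is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print lines present in file2 but not in file1.
# Output is in sorted order, not in file2's original order.
lines_only_in_second() {
    file1=$1
    file2=$2
    tmp1=$(mktemp) || return 1
    tmp2=$(mktemp) || { rm -f "$tmp1"; return 1; }
    # LC_ALL=C gives a byte-wise ordering that sort and comm agree on
    LC_ALL=C sort "$file1" > "$tmp1"
    LC_ALL=C sort "$file2" > "$tmp2"
    # -1 hides lines only in file1, -3 hides lines common to both files
    LC_ALL=C comm -13 "$tmp1" "$tmp2"
    rm -f "$tmp1" "$tmp2"
}
```

Usage: lines_only_in_second file1 file2 > difffile. The sorted copies need roughly as much free disk space as the two inputs. In bash or ksh93, process substitution (LC_ALL=C comm -13 <(LC_ALL=C sort file1) <(LC_ALL=C sort file2)) avoids the explicit temporary files.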
Also, seq is not a standard command on some *nix systems. If you want a loop over a counter, you can use a while loop instead, e.g. while [ $num -le $numlines ]
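Following that suggestion, here is the same per-line check with a while-loop counter in place of seq. It is a minimal POSIX sh sketch; the function name missing_lines_counter and the use of sed -n to fetch line N are assumptions. It still rereads the input for every line, so it fixes portability, not speed.

```shell
#!/bin/sh
# Hypothetical helper: for each line of $2 not found (as a whole line)
# in $1, print "filename:line", counting with a while loop instead of seq.
missing_lines_counter() {
    compareFile=$1
    filename=$2
    # $(( )) strips any leading spaces some wc implementations print
    numlines=$(( $(wc -l < "$filename") ))
    num=1
    while [ "$num" -le "$numlines" ]; do
        # sed -n "Np" prints only line N of the file
        current=$(sed -n "${num}p" "$filename")
        if ! grep -qxF -- "$current" "$compareFile"; then
            printf '%s:%s\n' "$filename" "$current"
        fi
        num=$((num + 1))
    done
}
```

Usage: missing_lines_counter /path/to/file/to/compare.txt somefile.txt >> /path/to/outputFile.txt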
All times are GMT -7.
The UNIX and Linux Forums Content Copyright ©1993-2008. All Rights Reserved.