The UNIX and Linux Forums  
#1
01-02-2008
net_shree
File Comparison

I have to compare two text files. Only a few lines will differ between them, and then only in some column.
The files are gigabytes in size.
Sample lines are as below:
11111122222222333333aaaaaaaaaabbbbbbbbbccccccccdddddd
11111122222222333333aaaaaaaaaabbbbbbbbbccccccccddeddd

Assuming these two lines come from file1 and file2 respectively, the line from file2 should be written to a new output file (the difference file).
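
If the files hold the same records in the same order, as the two sample lines suggest but do not confirm, a single lockstep pass is enough: line N of file2 is compared only against line N of file1. A minimal sketch under that assumption, using awk from a shell function (diff_aligned is a hypothetical name):

```shell
#!/bin/sh
# Sketch only: assumes line N of the first file corresponds to line N of
# the second file (same records, same order). diff_aligned is a
# hypothetical helper name, not a standard command.
# Usage: diff_aligned file1 file2   -> prints the file2 lines that differ.
diff_aligned() {
    awk -v other="$1" '
        {
            # Read the line at the same position from the first file.
            if ((getline ref < other) > 0) {
                # Append "" to force a string comparison; otherwise awk
                # would compare numeric-looking lines (e.g. 0100 vs 100) as numbers.
                if (($0 "") != (ref "")) print
            } else {
                print    # first file ran out of lines: this line is extra
            }
        }
    ' "$2"
}
```

For example, diff_aligned file1 file2 > diff.txt. Memory use stays constant whatever the file size, but if a line is inserted or deleted, every later line will be reported as different; that case needs diff or a sort-based approach instead.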

What I would like to do is read a line from file1, loop through the lines in file2, and stop as soon as a match is found; if no match is found, print that line to the output file. Then repeat the same steps for every line in file1.
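
A direct translation of that loop, as a sketch assuming exact whole-line matches are wanted (lines_missing_from is a hypothetical name). grep -q exits at the first match, which gives the "stop when a match is found" behaviour; -x and -F make it a whole-line, literal comparison rather than a substring regex match.

```shell
#!/bin/sh
# Sketch: for each line of $1, scan $2 and stop at the first exact match;
# if there is no match, print the line. lines_missing_from is hypothetical.
lines_missing_from() {
    # "|| [ -n "$line" ]" also handles a last line without a trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        # -q: quit on first match, -x: whole line, -F: literal, not a regex.
        if ! grep -qxF -- "$line" "$2"; then
            printf '%s\n' "$line"
        fi
    done < "$1"
}
```

To get the file2 lines that are missing from file1, swap the arguments: lines_missing_from file2 file1 > diff.txt. This still rescans the second file once per line of the first, so on multi-GB files it will be very slow.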

Appreciate any help in this regard.
#2
01-02-2008
jim mcnamara (Forum Staff)
What do you mean by "stop when a match is found" and then read more from file1?
Stop usually means exiting the read loop. Do you want the line number?
#3
01-02-2008
net_shree
Yes, I want to exit the read loop when a match is found; I do not want to check that line any further.
No, I do not need the line number.
#4
01-02-2008
dislusive
If I understand what you're trying to do correctly, here's a quick bash script.

Code:
#!/bin/bash

# No spaces around "=" in shell variable assignments.
compareFile="/path/to/file/to/compare.txt"
outputFile="/path/to/outputFile.txt"

for filename in /some/dir/of/text/files/*.txt; do

        numlines=$(wc -l < "$filename")

        for i in $(seq 1 $numlines); do
                # Extract line number $i of the current file.
                current=$(sed -n "${i}p" "$filename")

                # -q: quiet, -x: match the whole line, -F: fixed string (no regex).
                if ! grep -qxF -- "$current" "$compareFile"; then
                        # doesn't exist, append to $outputFile
                        printf '%s:%s\n' "$filename" "$current" >> "$outputFile"
                fi
        done
done
#5
01-02-2008
net_shree
Hi, thank you for the quick solution; it looks like pretty much what I want.
But I am unable to run this script. I use ksh.
One of the errors is "seq: command not found".
#6
01-02-2008
dislusive
Run "which seq" to check whether it is installed (it usually resides in /usr/bin/).

It is a separate executable, not a shell builtin; on Linux it is part of the coreutils package.

If it exists on your system, set its path in the script:
seq="/path/to/seq"

then change the for statement to use the variable: for i in `${seq}...
#7
01-02-2008
ghostdog74 (Forum Advisor)
Quote:
Originally Posted by dislusive
If I understand what you're trying to do correctly, here's a quick bash script.
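As the OP mentioned, the files are gigabytes in size, so I expect this to be very slow: for every input line the script re-reads the file to extract that line and then runs grep over the entire comparison file, so the work grows with the product of the two file sizes.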
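
One common alternative, not proposed in the thread, is to sort both files and let comm walk them together once. The sketch below assumes a mktemp that supports -d (GNU and BSD systems do; it is not in POSIX) and that output in sorted order, rather than original file order, is acceptable; new_lines is a hypothetical name.

```shell
#!/bin/sh
# Sketch: print the lines of $2 that do not occur in $1, without a per-line loop.
# new_lines is a hypothetical helper name.
new_lines() {
    tmpdir=$(mktemp -d) || return 1
    # LC_ALL=C gives sort and comm the same byte-wise collation order,
    # which comm requires to pair up lines correctly.
    LC_ALL=C sort "$1" > "$tmpdir/a" &&
    LC_ALL=C sort "$2" > "$tmpdir/b" &&
    # -1 hides lines only in $1, -3 hides lines common to both,
    # leaving the lines that appear only in $2.
    LC_ALL=C comm -13 "$tmpdir/a" "$tmpdir/b"
    status=$?
    rm -rf "$tmpdir"
    return "$status"
}
```

GNU sort performs an external merge sort, using temporary files when the input does not fit in memory, so this copes with GB-sized inputs far better than one grep per line. Usage: new_lines file1 file2 > diff.txt.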
Also, seq is not a standard command on some *nix systems. If you need a loop over a counter, use a while loop instead, e.g. while [ $num -le $numlines ], incrementing num on each pass.
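
A minimal sketch of that while-loop replacement for seq 1 N, using only POSIX shell arithmetic (so it also runs under ksh). count_up is a hypothetical helper, and it uses a global variable num.

```shell
#!/bin/sh
# Sketch: portable stand-in for "seq 1 N". count_up is hypothetical.
count_up() {
    num=1
    while [ "$num" -le "$1" ]; do
        echo "$num"
        num=$((num + 1))    # POSIX arithmetic expansion, works in ksh and bash
    done
}
```

It can replace the seq call, for example for i in $(count_up $numlines). For very large files, though, it is better to do the per-line work inside the while loop itself rather than building the whole list of numbers first.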
The UNIX and Linux Forums Content Copyright ©1993-2009. All Rights Reserved.