Help with script or command to differentiate difference between two input file?


 
# 8  
Old 12-31-2010
See if this works faster:
Code:
awk -F '' '{getline s<f;split(s,T);for(i=1;i<=NF;i++)if($i==T[i])$i=" "}1' OFS= f=file2 file1

Try mawk instead of awk if you have that available...
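To see what the command does, here is a minimal sketch on two made-up two-line files. The sample data and the mask_same wrapper are my own illustration, not part of the thread. It assumes an awk that splits a record into single characters when FS is empty (gawk and mawk do; POSIX leaves this undefined), and it uses the FS= operand form instead of -F ''.

```shell
#!/bin/sh
# mask_same FILE1 FILE2 (hypothetical wrapper around the one-liner above):
# print FILE1 with every character that equals the character at the same
# position in FILE2 replaced by a space.
mask_same() {
    awk '{getline s<f;split(s,T);for(i=1;i<=NF;i++)if($i==T[i])$i=" "}1' \
        FS= OFS= f="$2" "$1"
}

# Made-up sample data, for illustration only.
dir=$(mktemp -d)
printf 'ACGTACGT\nTTTTCCCC\n' > "$dir/file1"
printf 'ACGAACGT\nTTTTCCCG\n' > "$dir/file2"

mask_same "$dir/file1" "$dir/file2"
rm -rf "$dir"
```

Characters that match at the same position are blanked, so only the differing characters of file1 remain: here a T in column 4 of the first line and a C in column 8 of the second.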
This User Gave Thanks to Scrutinizer For This Post:
# 9  
Old 01-02-2011
Thanks, Scrutinizer.
I just tried your awk command and it worked fine.
However, I found that it needs a huge amount of memory when comparing two huge files (>1GB).
Do you have a better idea for solving this problem?

---------- Post updated at 09:03 PM ---------- Previous update was at 09:01 PM ----------

Thanks, rdcwayx.
Your awk command worked fine.
When comparing two huge data files, do you have any suggestion for reducing the memory required by the awk command?
# 10  
Old 01-03-2011
Quote:
Originally Posted by perl_beginner
Thanks, Scrutinizer.
I just tried your awk command and it worked fine.
However, I found that it needs a huge amount of memory when comparing two huge files (>1GB).
Do you have a better idea for solving this problem?
[..]
That surprises me. The little program shouldn't hold more than about two times two lines in its internal variables at any time... Are you sure the application itself is using that memory, and that it is not caching by the OS, as happens on Linux for example, which is in fact free memory? How long are the lines? How did you determine the memory use?
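One way to answer that last question is to sample the memory of the awk process itself rather than reading system-wide figures. The sketch below is only an illustration: rss_kb is a hypothetical helper name, it reads VmRSS from /proc where that exists (Linux), and otherwise assumes a ps that accepts "-o rss=".

```shell
#!/bin/sh
# rss_kb PID: print the resident set size of one process in kB.
# Assumption: Linux-style /proc/PID/status, with a fallback to "ps -o rss=".
rss_kb() {
    if [ -r "/proc/$1/status" ]; then
        awk '/^VmRSS:/ {print $2}' "/proc/$1/status"
    else
        ps -o rss= -p "$1" | tr -d ' '
    fi
}

# Usage sketch: start the comparison in the background, then sample it.
#   awk '...' FS= OFS= f=file2 file1 > result.txt &
#   rss_kb $!
# Here we only sample the current shell, to show the output format.
rss_kb $$
```

If this number stays small while the system's overall "used" memory grows, the growth is more likely file caching than memory held by awk.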
# 11  
Old 01-03-2011
Some of the reads are around 10,000,000 characters long or more.
The huge memory use of the awk program shows up when I run "top" in the bash shell.
# 12  
Old 01-03-2011
That is a bit much. Perhaps you could insert some linefeeds to limit the line length to, for example, 80 characters:
Code:
awk '{getline s<f;split(s,T);for(i=1;i<=NF;i++)if($i==T[i])$i=" "}1' FS= OFS= f=<(fold -w80 file2) <(fold -w80 file1)

This example relies on process substitution, <(...), so on most operating systems it needs a shell that supports it, such as bash or ksh93. But you can always prepare the input files with the fold command first and then use those files as input.
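A minimal sketch of that second route, for shells without process substitution. The compare_folded name and the temporary file names are hypothetical. It also assumes an awk that splits on an empty FS (gawk, mawk), and that corresponding lines in the two files have the same length, since otherwise fold would wrap them into different numbers of pieces and the line pairs would no longer match up.

```shell
#!/bin/sh
# compare_folded FILE1 FILE2 [WIDTH]
# Fold both inputs to WIDTH columns (default 80) into temporary files,
# then run the same awk comparison on the folded copies.
compare_folded() {
    width=${3:-80}
    tmp=$(mktemp -d) || return 1
    fold -w "$width" "$1" > "$tmp/f1"
    fold -w "$width" "$2" > "$tmp/f2"
    awk '{getline s<f;split(s,T);for(i=1;i<=NF;i++)if($i==T[i])$i=" "}1' \
        FS= OFS= f="$tmp/f2" "$tmp/f1"
    rm -rf "$tmp"
}

# Example call (file names as used in the thread):
#   compare_folded file1 file2 80 > result.txt
```

Note that the output stays wrapped at the chosen width; the original long lines are not rejoined.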
This User Gave Thanks to Scrutinizer For This Post:
# 13  
Old 01-03-2011
Thanks for your advice, Scrutinizer.
# 14  
Old 01-03-2011
Since I like Perl:
Code:
use strict;
use warnings;
use File::Basename;

my $NAME = basename $0;

$\ = "\n";
$, = '';
$" = '';

if (2 != @ARGV) {
    print STDERR 'USAGE: ', $NAME, ' <file1> <file2>';
    exit 1;
}

my $F1 = shift @ARGV;
my $F2 = shift @ARGV;

my $len = length($F1) > length($F2) ? length($F1) : length($F2);
my $fmt = "\%-${len}s(%d): \%s\n";

open F1, '<', $F1 or die "$F1: $!";
open F2, '<', $F2 or die "$F2: $!";

my $L1;
my $L2;

while (1) {
    $L1 = <F1>;
    $L2 = <F2>; 

    last unless defined $L1 && defined $L2;

    if ($L1 eq $L2) {
        print '';
        next;
    }

    chomp $L1;
    chomp $L2;

    my @L1 = split //, $L1;
    my @L2 = split //, $L2;

    my @R = ();

    while (0 < @L1 || 0 < @L2 ) {
        my $c1 = shift @L1; $c1 = ' ' unless defined $c1;
        my $c2 = shift @L2; $c2 = ''  unless defined $c2;

        push @R, $c1 eq $c2 ? ' ' : $c1;
    }

    print @R;
}

# if file-1 is longer than file-2

while (defined $L1) {
    chomp $L1;
    print $L1;
    $L1 = <F1>;
}

# if file-2 is longer than file-1

while (defined $L2) {
    chomp $L2;    # drop the newline so it is not counted as a character
    print ' ' x length($L2);
    $L2 = <F2>;
}

It is not a typo that $c1 is set to a space when undefined while $c2 is set to an empty string when undefined. The former adds a space to the result when line 1 is shorter than line 2; the latter adds the value of $c1 when line 2 is shorter than line 1.
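For comparison, the same padding rule can be sketched in awk with substr(), which returns an empty string past the end of a line and does not depend on an empty FS. This is my own illustration, not m.d.ludwig's code: mask_pad is a hypothetical name, and the sketch does not reproduce the Perl script's shortcut for identical lines or its handling of extra trailing lines in file 2.

```shell
#!/bin/sh
# mask_pad FILE1 FILE2: blank out characters of FILE1 that match FILE2 at
# the same position, padding with a space where line 1 is the shorter one.
mask_pad() {
    awk -v f="$2" '{
        if ((getline s < f) <= 0) s = ""     # file 2 exhausted: compare with ""
        n = (length($0) > length(s)) ? length($0) : length(s)
        out = ""
        for (i = 1; i <= n; i++) {
            c1 = substr($0, i, 1)            # "" once line 1 has ended
            c2 = substr(s, i, 1)             # "" once line 2 has ended
            if (c1 == "") c1 = " "           # line 1 shorter: emit a space
            out = out ((c1 == c2) ? " " : c1)
        }
        print out
    }' "$1"
}

# Example call: mask_pad file1 file2 > result.txt
```

Because c2 becomes an empty string past the end of line 2, it never equals a real character, so the remaining characters of line 1 are kept as they are.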

A billion+ characters will take some time to process.
This User Gave Thanks to m.d.ludwig For This Post: