Difference of two data files & writing to an outfile. Post 302530668 by filter on Tuesday 14th of June 2011 05:15:42 PM
Hi rdcwayx,

I really appreciate the time you took to write the awk script.

However, each of these files has 500K records.

I have managed to write a Perl script that generates the diff file:

Code:
#!/usr/local/bin/perl
$self = $0;
$self =~ s!^.*/!!;
#
$[ = 1; # first index into arrays and strings is 1 (deprecated; a fatal error as of Perl 5.30)
#
$FIELD_SEPARATOR = '\t';
$FIELD_NUMBER_LIST = '38,82'; # comma-separated field numbers; split on ',' below
$field_separator = $FIELD_SEPARATOR;
$field_number_list = $FIELD_NUMBER_LIST;
#
while (@ARGV)
{
    $_ = shift;
    if    (/^-F$/)    { $field_separator = shift; }
    elsif (/^-L$/)    { $field_number_list = shift; }
    elsif (/^-F.+$/)  { $field_separator = substr($_,$[+2); }
    elsif (/^-L.+$/)  { $field_number_list = substr($_,$[+2); }
    #else              { push(@filename, $_); }
}
#
$file_a = 'file1';
$file_b = 'file2';
#
unless (($file_a ne "") && (-f $file_a))
{
    die "Error: Can't find file '$file_a'!\n";
}
unless (($file_b ne "") && (-f $file_b))
{
    die "Error: Can't find file '$file_b'!\n";
}
#
@index_list = split(/,/, $field_number_list);
#
# Scan first file, Pass 1:
open(FILE_A, "<$file_a") || die "Can't open '$file_a': $!\n";
#
while (<FILE_A>)
{
    chop if /\n$/;
    undef $key;
    undef @field;
    @field = split(/$field_separator/o);
    foreach $index (@index_list)
    {
        if (defined $key)
        {
            $key .= "\n" . $field[$index];
        }
        else
        {
            $key = $field[$index];
        }
    }
    $intersection{$key} = 1;
}
#
close(FILE_A);
# Scan second file, Pass 1:
#
$empty_intersection = 1;
#
open(FILE_B, "<$file_b") || die "Can't open '$file_b': $!\n";
#
while (<FILE_B>)
{
    chop if /\n$/;
    undef $key;
    undef @field;
    @field = split(/$field_separator/o);
    foreach $index (@index_list)
    {
        if (defined $key)
        {
            $key .= "\n" . $field[$index];
        }
        else
        {
            $key = $field[$index];
        }
    }
    $code = $intersection{$key};
    if ($code == 1)
    {
        $intersection{$key} = 3;
        $empty_intersection = 0;
    }
    else
    {
        if ($code != 3) { $intersection{$key} = 2; }
    }
}
#
close(FILE_B);
#
# Prepare output file names:
$file_a_1 = $file_a . '.1';
#
# Scan first file, Pass 2:
#
open(FILE_A, "<$file_a")     || die "Can't open '$file_a': $!\n";
open(FILE_A_1, ">$file_a_1") || die "Can't write '$file_a_1': $!\n";
#
while (<FILE_A>)
{
    chop if /\n$/;
    undef $key;
    undef @field;
    @field = split(/$field_separator/o);
    foreach $index (@index_list)
    {
        if (defined $key)
        {
            $key .= "\n" . $field[$index];
        }
        else
        {
            $key = $field[$index];
        }
    }
    if ($intersection{$key} == 3)
    {
       # 
    }
    else
    {
        print FILE_A_1 $_, "\n";
    }
}
#
close(FILE_A);
close(FILE_A_1);
#
# Scan second file, Pass 2:
#
open(FILE_B, "<$file_b")     || die "Can't open '$file_b': $!\n";
open(FILE_A_1, ">>$file_a_1") || die "Can't write '$file_a_1': $!\n";
#
while (<FILE_B>)
{
    chop if /\n$/;
    undef $key;
    undef @field;
    @field = split(/$field_separator/o);
    foreach $index (@index_list)
    {
        if (defined $key)
        {
            $key .= "\n" . $field[$index];
        }
        else
        {
            $key = $field[$index];
        }
    }
    if ($intersection{$key} == 3)
    {
       #
    }
    else
    {
        print FILE_A_1 $_, "\n";
    }
}
#
close(FILE_B);
close(FILE_A_1);
#
# Display results:
#
printf("The Diff file created '%s'\n\n", $file_a_1);
#


The above code works for generating the diff file, i.e., based on the primary keys (two fields in this case), the outfile contains the records that exist in file1 but not in file2 and the records that exist in file2 but not in file1.

Now I need to compare the whole record (line) whenever the primary keys in file1 match the primary keys in file2. If both lines are equal, the record should be discarded; otherwise it should be written to the outfile.

Could someone please help me with the above step?

I would really appreciate your thoughts on this.
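
For reference, the whole-record comparison described above could be sketched roughly as follows. This is only a minimal sketch, not a drop-in replacement for the script: the helper names make_key and diff_records are hypothetical, it works on in-memory arrays of lines rather than on file1 and file2, it assumes the key fields are unique within each file, and it assumes that when the keys match but the lines differ, both versions of the record should go to the outfile. Field numbers are 1-based, as in the script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build a composite key from the given 1-based field numbers.
sub make_key {
    my ($line, $sep, @field_numbers) = @_;
    my @field = split /$sep/, $line, -1;    # -1 keeps trailing empty fields
    return join "\n",
        map { defined $field[$_ - 1] ? $field[$_ - 1] : '' } @field_numbers;
}

# Hypothetical helper: returns the records that belong in the outfile.
sub diff_records {
    my ($lines_a, $lines_b, $sep, @field_numbers) = @_;
    my (%record_a, %seen_in_b, @out);

    # Remember the full line of every file1 record under its key.
    for my $line (@$lines_a) {
        $record_a{ make_key($line, $sep, @field_numbers) } = $line;
    }

    for my $line (@$lines_b) {
        my $key = make_key($line, $sep, @field_numbers);
        $seen_in_b{$key} = 1;
        if (!exists $record_a{$key}) {
            push @out, $line;                     # key exists only in file2
        }
        elsif ($record_a{$key} ne $line) {
            push @out, $record_a{$key}, $line;    # keys match, records differ
        }
        # keys match and the records are identical: discard
    }

    # Records whose key exists only in file1.
    for my $line (@$lines_a) {
        my $key = make_key($line, $sep, @field_numbers);
        push @out, $line unless $seen_in_b{$key};
    }
    return @out;
}

# Tiny made-up data set; fields 1 and 2 act as the primary key.
my @file1 = ("1\tA\tfoo", "2\tB\tbar", "3\tC\tbaz");
my @file2 = ("1\tA\tfoo", "2\tB\tBAR", "4\tD\tqux");
print "$_\n" for diff_records(\@file1, \@file2, "\t", 1, 2);
```

With this sample data the record with key (1, A) is dropped because both lines are identical, both versions of (2, B) are kept because the third field differs, and (4, D) and (3, C) are kept because each key appears in only one file. Holding one file's lines in a hash costs memory proportional to that file, which may or may not be acceptable for 500K records depending on record size.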

Last edited by filter; 06-14-2011 at 07:04 PM..
 
