Please help: compare 2 files field by field and write the output to another file.


# 1  
Old 07-25-2012

Hi Friends,
I have two files: source.txt and target.txt. I want to keep source.txt as the baseline and compare target.txt against it. The data in the two files and the expected output are below.

source.txt
Code:
1|HYD|NAG|TRA|34.5|1234
2|CHE|ESW|DES|36.5|134
3|BAN|MEH|TRA|33.5|234
4|PUN|ABHI|TA|38.5|123
5|KIN|NAV|PRA|31.5|135

target.txt
Code:
1|HYD|NAG|TRA|34.5|1234
2|CHE|EW|DES|33.5|134
5|KIN|NV|PRA|31.5|136

Expected output should be :

Number of Extra records in Source file : 2
Code:
3|BAN|MEH|TRA|33.5|234
4|PUN|ABHI|TA|38.5|123

Number of mismatches:
Code:
KEYFIELD|COLUMN_NUMBER|SOURCE_VALUE|TARGET_VALUE
2 |3 |ESW |EW
2 |5 |36.5 |33.5
5 |3 |NAV |NV

Please help. I am very new to Unix shell scripting.

Last edited by Franklin52; 07-28-2012 at 02:25 PM.. Reason: Please use code tags for data and code samples, thank you
# 2  
Old 07-25-2012
I think this has already been discussed. I saw almost the same examples too.
# 3  
Old 07-25-2012
Hi Pikka45, yes, we did this with the following code:

Code:
paste -d '|' File1.txt File2.txt | awk -F '|' '{c=NF/2;for(i=1;i<=c;i++)if($i!=$(i+c))printf "line %-5s field %s\n",NR,i}'

The above code works only under the following conditions:

1) When both files have the same number of records and every key field (assume the first field) is present in both source.txt and target.txt.

2) When source.txt has fewer records than target.txt and every key field (assume the first field) is present in both source.txt and target.txt.

3) It does not work when source.txt has more records than target.txt and there are mismatches.

I hope the cases above are clear. Kindly help if there is any way to cover the third scenario.
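
To see why the third case fails, it helps to look at which records paste puts side by side: it pairs them by line number, not by key. A minimal sketch using the sample data from the first post (the file name pairs.txt is made up for illustration, and reading the target key from field 7 assumes six fields per record):

```shell
#!/bin/sh
# Sample data from the first post, so the sketch runs on its own.
cat >source.txt <<'EOF'
1|HYD|NAG|TRA|34.5|1234
2|CHE|ESW|DES|36.5|134
3|BAN|MEH|TRA|33.5|234
4|PUN|ABHI|TA|38.5|123
5|KIN|NAV|PRA|31.5|135
EOF
cat >target.txt <<'EOF'
1|HYD|NAG|TRA|34.5|1234
2|CHE|EW|DES|33.5|134
5|KIN|NV|PRA|31.5|136
EOF

# paste joins line N of source.txt with line N of target.txt.
# Field 1 is the source key; field 7 is the target key (assumes 6 fields per record).
paste -d '|' source.txt target.txt |
    awk -F'|' '{ print "line " NR ": source key=" $1 ", target key=" $7 }' >pairs.txt
cat pairs.txt
```

Line 3 pairs source key 3 with target key 5, and lines 4 and 5 have no target record at all, so any field comparison after the first missing key is comparing unrelated records.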

---------- Post updated at 12:56 PM ---------- Previous update was at 12:10 PM ----------

Hi Friends, can anyone look into this thread? Please help.
# 4  
Old 07-25-2012
Hi, check this

Code:
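#!/bin/bash

# Start with empty output files.
: >extra.txt
: >mismatch.txt

while IFS= read -r sLine; do
    # Split the source record on '|' without changing the global IFS.
    IFS='|' read -r -a sTab <<<"$sLine"
    # Find the target record with the same key; the trailing '|' stops key 1 from matching 10, 11, ...
    tLine="$(grep "^${sTab[0]}|" target.txt)"
    if [ -z "$tLine" ]; then
        # Key is missing from target.txt: extra record in source.
        echo "$sLine" >>extra.txt
        continue
    fi
    IFS='|' read -r -a tTab <<<"$tLine"
    # Index 0 is the key, so comparison starts at index 1; i+1 is the 1-based column number.
    for (( i = 1 ; i < ${#sTab[@]} ; i++ )); do
        [ "${sTab[$i]}" = "${tTab[$i]}" ] || echo "${sTab[0]}|$((i + 1))|${sTab[$i]}|${tTab[$i]}" >>mismatch.txt
    done
done <source.txt

echo "Number of Extra records in Source file : $(wc -l <extra.txt)"
cat extra.txt

echo "Number of mismatches : $(wc -l <mismatch.txt)"
cat mismatch.txt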

This User Gave Thanks to Chirel For This Post:
i150371485 (07-25-2012)
# 5  
Old 07-25-2012
@Chirel: Thank you so much, it worked as expected. I have one question: when I run it on about 50,000 records with 50 columns, it takes a long time. Is there any way to reduce the run time and improve performance? Please help.
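
One likely reason for the slowdown is that the script above starts a new grep for every line of source.txt, so target.txt is re-read once per source record. A possible alternative, offered as a sketch rather than a tested replacement, is to let a single awk process load target.txt into an array keyed on field 1 and then read source.txt once. It assumes keys are unique and that target.txt fits in memory; the output file names follow the script above.

```shell
#!/bin/sh
# Sample data from the first post; drop these two heredocs to use your own files.
cat >source.txt <<'EOF'
1|HYD|NAG|TRA|34.5|1234
2|CHE|ESW|DES|36.5|134
3|BAN|MEH|TRA|33.5|234
4|PUN|ABHI|TA|38.5|123
5|KIN|NAV|PRA|31.5|135
EOF
cat >target.txt <<'EOF'
1|HYD|NAG|TRA|34.5|1234
2|CHE|EW|DES|33.5|134
5|KIN|NV|PRA|31.5|136
EOF

# Make sure both output files exist even when nothing is written to them.
: >extra.txt
: >mismatch.txt

awk -F'|' '
    # First file on the command line (target.txt): store each record under its key.
    FILENAME == ARGV[1] { target[$1] = $0; next }
    # source.txt: a key never seen in target.txt is an extra record.
    !($1 in target) { print > "extra.txt"; next }
    {
        # Split the stored target record and compare field by field.
        n = split(target[$1], t, "|")
        if (NF > n) n = NF
        for (i = 2; i <= n; i++)
            # Appending "" forces a string comparison, so 34.5 and 34.50 count as different.
            if (($i "") != (t[i] ""))
                print $1 "|" i "|" $i "|" t[i] > "mismatch.txt"
    }
' target.txt source.txt

echo "Number of Extra records in Source file : $(wc -l <extra.txt)"
cat extra.txt
echo "Number of mismatches : $(wc -l <mismatch.txt)"
cat mismatch.txt
```

On the sample data this writes records 3 and 4 to extra.txt and four lines to mismatch.txt. The fourth is column 6 of key 5 (135 vs 136), which the expected output in the first post does not list.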
# 6  
Old 07-26-2012
perl

Code:
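use strict;
use warnings;

# Load the target records (a.txt) into a hash keyed on the first field.
my %hash;
open my $fh, '<', 'a.txt' or die "Cannot open a.txt: $!";
while(<$fh>){
	chomp;
	my @tmp = split(/[|]/, $_, -1);
	$hash{$tmp[0]} = [ @tmp[1..$#tmp] ];
}
close $fh;

# The source records follow __DATA__.
while(<DATA>){
	chomp;
	my @tmp = split(/[|]/, $_, -1);
	if(not exists $hash{$tmp[0]}){
		# Key missing from the target: extra source record.
		print "$_\n";
	}
	else{
		my @t = @{$hash{$tmp[0]}};
		my @diff;
		for(my $i=0;$i<=$#t;$i++){
			# $i+2 is the 1-based column number; source value first, then target value.
			if($t[$i] ne $tmp[$i+1]){
				push @diff, ($i+2,$tmp[$i+1],$t[$i]);
			}
		}
		# One line per key, printed only when at least one field differs.
		print join("|", $tmp[0], @diff), "\n" if @diff;
	}
}
__DATA__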
1|HYD|NAG|TRA|34.5|1234
2|CHE|ESW|DES|36.5|134
3|BAN|MEH|TRA|33.5|234
4|PUN|ABHI|TA|38.5|123
5|KIN|NAV|PRA|31.5|135

a.txt
Code:
1|HYD|NAG|TRA|34.5|1234
2|CHE|EW|DES|33.5|134
5|KIN|NV|PRA|31.5|136
