Compare files column to column based on keys


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting Compare files column to column based on keys
# 1  
Old 03-30-2010
Compare files column to column based on keys

Here is my situation. I need to compare two tab separated files (diff is not useful since there could be known difference between files).

I have found similar posts , but not fully matching.I was thinking of writing a shell script using cut and grep and while loop but after going thru posts it appears awk or perl would be more appropriate. Thanks

req 1- I need to extract keys from file1 (two columns) and match with file2 based on keys. This is done by the following which I found in other posts.

Code:
 
awk -F"\t" '
    FILENAME=="f1.txt" {
        Keys[$1 $2]++
    }
    FILENAME=="f2.txt" {
        if (Keys[$1 $2] == 0) {
            print $0
        }
    }
' f1.txt f2.txt > rf.txt

req 2- match rows based on keys for all columns except keys one at a time and produce a report if column value mismatch between files. Report will be examined to ignore some known differences.

File samples -
Code:
f1.txt

210	998877	phone	9981128209	add	111 nw st.
310	998877	usg	650	ex	11
410	998877	web	1003		

f2.txt

210	998877	phone	9981128209	add	111 nw st.
310	998877	usg	650	ex	11.00
410	998877	web	1203		

report -

f2	310	998877	column6 11	11.00
f2	410	998877	column4	1003	1203

# 2  
Old 03-30-2010
Write this code, but only get one line:
Code:
awk -F "\t" 'NR==FNR{a[$1]=$0;next}
                {split(a[$1],b,"\t") ;for (i=1;i<=NF;i++) {if ($i!=b[i]) print FILENAME,$1,$2, "column"i,b[i],$i}}' f1.txt f2.txt

f2.txt 410 998877 column4 1003 1203

The reason is ($i!=b[i]) , awk can't identify 11 is not same as 11.00

How can I fix the code?

---------- Post updated at 01:34 PM ---------- Previous update was at 01:24 PM ----------

Ok, I fix it by myself.

Code:
$ awk -F "\t" 'NR==FNR{a[$1]=$0;next}
                {split(a[$1],b,"\t") ;for (i=1;i<=NF;i++) {c=$i"x";d=b[i]"x"; if (c!=d) print FILENAME,$1,$2, "column"i,b[i],$i}}' f1.txt f2.txt   

f2.txt 310 998877 column6 11 11.00
f2.txt 410 998877 column4 1003 1203

# 3  
Old 03-31-2010
files are being matched using only first column instead of first two columns, so when there are multiple records with same column1 value (which is record id) but different column2(which transaction id) matching doesn't work.
Code:
f1.txt
210	998877	phone	9981128209	add	111 nw st.
310	998877	usg	650	ex	11
410	998877	web	1003		
210	998878	phone	9981128210	add	112 nw st.
310	998878	usg	750	ex	11
410	998878	web	930		

f2.txt
210	998877	phone	9981128209	add	111 nw st.
310	998877	usg	650	ex	11.00
410	998877	web	1203		
210	998878	phone	9981128210	add	112 nw st.
310	998878	usg	750	ex	11
410	998878	web	850		

result.txt

f2.txt 210 998877 column2 998878 998877
f2.txt 210 998877 column4 9981128210 9981128209
f2.txt 210 998877 column6 112 nw st. 111 nw st.
f2.txt 310 998877 column2 998878 998877
f2.txt 310 998877 column4 750 650
f2.txt 310 998877 column6 11 11.00
f2.txt 410 998877 column2 998878 998877
f2.txt 410 998877 column4 930 1203
f2.txt 410 998878 column4 930 850

it should be 

f2.txt 310 998877 column6 11 11.00
f2.txt 410 998877 column4 1003 1203
f2.txt 410 998878 column4 930 850

simple diff works too -
 
 diff -b f1.txt f2.txt

2,3c2,3
< 310   998877  usg     650     ex      11
< 410   998877  web     1003
---
> 310   998877  usg     650     ex      11.00
> 410   998877  web     1203
6c6
< 410   998878  web     930
---
> 410   998878  web     850

but I can't format diff output as required. Thought awk should do the trick. At the heart of it is that I'm trying to compare two huge directories with different filenames and generate a report to show only the diff in such a way that can be matched to some exception list. I wrote shell script to dothe work, but go stuck it generating the report.
Thanks again.

---------- Post updated at 05:46 PM ---------- Previous update was at 02:31 PM ----------

got it- added $1$2 instead of only $1

Code:
awk -F "\t" 'NR==FNR{a[$1$2]=$0;next}
                {split(a[$1$2],b,"\t") ;
                for (i=1;i<=NF;i++)
                {c=$i"x";d=b[i]"x"; if (c!=d) print FILENAME,$1,$2, "column"i,b[i],$i}
                }' f1.txt f2.txt

result -
f2.txt 310 998877 column6 11 11.00
f2.txt 410 998877 column4 1003 1203
f2.txt 410 998878 column4 930 850


Last edited by blackjack101; 03-31-2010 at 06:23 PM.. Reason: did more testing
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. UNIX for Beginners Questions & Answers

UNIX script to compare 3rd column value with first column and display

Hello Team, My source data (INput) is like below EPIC1 router EPIC2 Targetdefinition Exp1 Expres rtr1 Router SQL SrcQual Exp1 Expres rtr1 Router EPIC1 Targetdefinition My output like SQL SrcQual Exp1 Expres Exp1 Expres rtr1 Router rtr1 Router EPIC1 Targetdefinition... (5 Replies)
Discussion started by: sekhar.lsb
5 Replies

2. Shell Programming and Scripting

Need awk or Shell script to compare Column-1 of two different CSV files and print if column-1 matche

Example: I have files in below format file 1: zxc,133,joe@example.com cst,222,xyz@example1.com File 2 Contains: hxd hcd jws zxc cst File 1 has 50000 lines and file 2 has around 30000 lines : Expected Output has to be : hxd hcd jws (5 Replies)
Discussion started by: TestPractice
5 Replies

3. Shell Programming and Scripting

Compare two csv's with column based

Hi, I am having below two CSV's col_1,col_2,col_3 1,2,4 1,3,6 col_1,col_3,col2,col_5,col_6 1,2,3,4,5 1,6,3,,, I need to compare based on the columns where the mismatch is expected output col_1,col_2,col_3 1,2,4 (3 Replies)
Discussion started by: rohit_shinez
3 Replies

4. Shell Programming and Scripting

Combine multiple rows based on selected column keys

Hello I want to collapse a file with multiple rows into consolidated lines of entries based on selected columns as the 'key'. Example: 1 2 3 Abc def ghi 1 2 3 jkl mno p qrts 6 9 0 mno def Abc 7 8 4 Abc mno mno abc 7 8 9 mno mno abc 7 8 9 mno j k So if columns 1, 2 and 3 are... (6 Replies)
Discussion started by: linuxlearner123
6 Replies

5. Shell Programming and Scripting

Compare two files based on column

Hi, I have two files roughly 1200 fields in length for each row, sorted on the 2nd field. I need to compare based on that 2nd column between file1 and file2 and print lines that exist in both files into separate files (I can't guarantee that every line in file1 is in file2). Example: File1: ... (1 Reply)
Discussion started by: origon
1 Replies

6. Shell Programming and Scripting

Compare based on column value

Hi Experts, I want to compare 2 text files based on their column values text1 is like prd-1234 yes no yes yes prd-2345 no no no yes prd-6475 yes yes yes no and test 2 is prd-1234 no no no yes prd-2345 yes no no no desired out put as follows prd-1234 1 3 prd-235 1 4 basically it shows... (5 Replies)
Discussion started by: tijomonmathew
5 Replies

7. Shell Programming and Scripting

Compare 2 files and match column data and align data from 3 column

Hello experts, Please help me in achieving this in an easier way possible. I have 2 csv files with following data: File1 08/23/2012 12:35:47,JOB_5330 08/23/2012 12:35:47,JOB_5330 08/23/2012 12:36:09,JOB_5340 08/23/2012 12:36:14,JOB_5340 08/23/2012 12:36:22,JOB_5350 08/23/2012... (5 Replies)
Discussion started by: asnandhakumar
5 Replies

8. Shell Programming and Scripting

Nawk script to compare records of a file based on a particular column.

Hi Gurus, I am struggling with nawk command where i am processing a file based on columns. Here is the sample data file. UM113570248|24-AUG-11|4|man1|RR211 Alert: Master Process failure |24-AUG-11 UM113570624|24-AUG-11|4|man1| Alert: Pattern 'E_DCLeDAOException' found |24-AUG-11... (7 Replies)
Discussion started by: usha rao
7 Replies

9. Shell Programming and Scripting

Compare Two Files(Column By Column) In Perl or shell

Hi, I am writing a comparator script, which comapre two txt files(column by column) below are the precondition of this comparator 1)columns of file are not seperated Ex. file1.txt 8888812341181892 1243548895685687 8945896789897789 1111111111111111 file2.txt 9578956789567897... (2 Replies)
Discussion started by: kumar96877
2 Replies

10. Shell Programming and Scripting

Sum a column value based on multiple keys

Hi, I have below as i/p file: 5ABC 36488989 K 000010000ASB BYTRES 5PQR 45757754 K 000200005KPC HGTRET 5ABC 36488989 K 000045000ASB HGTRET 5GTH 36488989 K 000200200ASB BYTRES 5FTU ... (2 Replies)
Discussion started by: nirnkv
2 Replies
Login or Register to Ask a Question