Compare three files based on two fields


 
# 1  
Old 12-27-2012

Guys,

I searched the internet but couldn't find an answer to this problem. I have 3 files. The first two fields mean the same thing in all three files, even though the files come from different databases.

I need to find the entries that are not present in all 3 files.

Below are the files 1.txt, 2.txt and 3.txt, respectively:

Code:
2677|47876992|1|20:20:51|12/16/2012|1|1|496300|
2677|47877024|2|13:47:37|12/16/2012|1|1|008994|
2677|47877052|3|21:03:56|12/16/2012|1|1|647546|
2677|47877055|4|16:54:07|12/16/2012|1|1|133914|
2677|47877099|5|16:29:06|12/16/2012|1|1|379245|
2677|47877081|6|10:44:13|12/16/2012|1|1|014078|
2677|47877232|7|19:07:06|12/16/2012|1|1|242776|
2677|47877246|8|13:02:32|12/16/2012|1|1|623853|
2677|47877258|9|22:03:05|12/16/2012|1|1|997345|
2677|47877351|10|16:29:27|12/16/2012|1|1|792584|
 
2677|47876992|1|1|40|
2677|47877024|32|1|100|
2677|47877052|2|1|39|
2677|47877055|1|1|75|
2677|47877074|1|1|9|
2677|47877081|2|1|175|
2677|47877232|1|1|10|
2677|47877246|9|1|25|
2677|47877258|25|1|40|
2677|47877350|9|1|50|

2677|47876992|1|7000|603098|40|0|
2677|47877024|1|7000|603086|100|0|
2677|47877052|1|1700|200180|39|0|
2677|47877055|1|7000|603098|75|0|
2677|47877074|1|1700|003400|9|0|
2677|47877081|1|7000|603062|25|0|
2677|47877081|2|7000|603065|50|0|
2677|47877081|3|7000|603074|100|0|
2677|47877232|1|7000|601802|10|0|
2677|47877246|1|7000|252120|25|0|

The output should be the entries (first two fields) that are not present in all three files, like below:

Code:
2677|47877099
2677|47877258
2677|47877351
2677|47877350

It would be great if the output had the filenames as well, like below:

Code:
2677|47877099|1.txt|
2677|47877258|1.txt,2.txt|
2677|47877351|1.txt|
2677|47877350|2.txt|

# 2  
Old 12-27-2012
I'm still new to Unix, but I'm pretty sure you need awk for this. I'll try to find an example, but if you're not new to Unix, I'm sure you will find one quicker than me. Good luck, and I hope I was of some assistance.
# 3  
Old 12-27-2012
Code:
$ awk -F "|" 'f==1{A[$1,$2]++;B[$1,$2]=$1 FS $2 FS FILENAME}
f==2{A[$1,$2]++;B[$1,$2]=B[$1,$2]?B[$1,$2]","FILENAME:$1 FS $2 FS FILENAME}
f==3{A[$1,$2]++;B[$1,$2]=B[$1,$2]?B[$1,$2]","FILENAME:$1 FS $2 FS FILENAME}END{
for (i in A){if(A[i]<3){print B[i]"|"}}}
' f=1 file1 f=2 file2 f=3 file3

2677|47877074|file2,file3|
2677|47877258|file1,file2|
2677|47877099|file1|
2677|47877350|file2|
2677|47877351|file1|

EDIT:

Shortened it a little bit using a function:

Code:
awk -F "|" 'function define_arr() {
A[$1,$2]++;
B[$1,$2]=B[$1,$2]?B[$1,$2]","FILENAME:$1 FS $2 FS FILENAME
}
f==1{define_arr()}
f==2{define_arr()}
f==3{define_arr()}
END{for (i in A){if(A[i]<3){print B[i]"|"}}}' f=1 file1 f=2 file2 f=3 file3
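
A note on ordering: for (i in A) walks the array in an unspecified order, which is why the output above does not follow the input order. If a stable order matters, piping the result through sort is one option. The following is only a sketch under assumptions: the demo files hold a subset of the rows from the first post, they are created in a temporary directory, and the per-record logic is the same as above but without the f=N switches.

```shell
#!/bin/sh
# Demo files: a subset of the rows from the first post (assumption for
# illustration), written to a throwaway directory.
tmp=$(mktemp -d)
cd "$tmp" || exit 1
cat > file1 <<'EOF'
2677|47876992|1|20:20:51|12/16/2012|1|1|496300|
2677|47877099|5|16:29:06|12/16/2012|1|1|379245|
2677|47877258|9|22:03:05|12/16/2012|1|1|997345|
2677|47877351|10|16:29:27|12/16/2012|1|1|792584|
EOF
cat > file2 <<'EOF'
2677|47876992|1|1|40|
2677|47877074|1|1|9|
2677|47877258|25|1|40|
2677|47877350|9|1|50|
EOF
cat > file3 <<'EOF'
2677|47876992|1|7000|603098|40|0|
2677|47877074|1|1700|003400|9|0|
EOF

# Same counting logic as above; sort then orders the lines numerically
# by field 1 and field 2, using "|" as the field separator.
sorted=$(awk -F "|" '
{
  A[$1,$2]++
  B[$1,$2]=B[$1,$2]?B[$1,$2]","FILENAME:$1 FS $2 FS FILENAME
}
END{for (i in A){if(A[i]<3){print B[i]"|"}}}' file1 file2 file3 |
  sort -t "|" -k1,1n -k2,2n)

printf '%s\n' "$sorted"
```

With sort in the pipeline the order no longer depends on which awk implementation is used.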


Last edited by pamu; 12-27-2012 at 02:11 AM..
# 4  
Old 12-27-2012
pamu's suggestion could be shortened a bit further without a function:
Code:
awk -F\| '
{
  i=$1 FS $2
  A[i]++
  B[i]=(B[i]?B[i]",":i FS) FILENAME
}
END{
  for (i in A) if(A[i]<3) print B[i] FS
}' file*

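But this would miscount entries that occur multiple times in one file: every repeat increments the counter, so an entry that appears three times in a single file and in no other file would reach 3 and not be reported. To counteract that, one could count each entry only once per file, like this: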

Code:
awk -F\| '
{
  i=$1 FS $2
}
!D[i,FILENAME]++{
  A[i]++
  B[i]=(B[i]?B[i]",":i FS) FILENAME
}
END{
  for (i in A) if(A[i]<3) print B[i] FS
}' file*
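
To see the difference between the two versions, here is a small self-contained sketch. The file names dup1.txt to dup3.txt and their contents are made up for illustration: the entry 1|99 occurs three times in dup1.txt and nowhere else, so it should be reported.

```shell
#!/bin/sh
# Hypothetical demo data: 1|10 is in all three files,
# 1|99 occurs three times in dup1.txt only.
tmp=$(mktemp -d)
cd "$tmp" || exit 1
printf '1|10|a|\n1|99|x|\n1|99|y|\n1|99|z|\n' > dup1.txt
printf '1|10|b|\n' > dup2.txt
printf '1|10|c|\n' > dup3.txt

# Without the per-file check: 1|99 is counted three times, reaches the
# threshold of 3 and is wrongly left out.
naive=$(awk -F\| '
{
  i=$1 FS $2
  A[i]++
  B[i]=(B[i]?B[i]",":i FS) FILENAME
}
END{
  for (i in A) if(A[i]<3) print B[i] FS
}' dup*.txt)

# With the per-file check: D[i,FILENAME] is zero only the first time an
# entry is seen in a given file, so each entry counts once per file.
dedup=$(awk -F\| '
{
  i=$1 FS $2
}
!D[i,FILENAME]++{
  A[i]++
  B[i]=(B[i]?B[i]",":i FS) FILENAME
}
END{
  for (i in A) if(A[i]<3) print B[i] FS
}' dup*.txt)

echo "naive: [$naive]"
echo "dedup: [$dedup]"
```

The first command prints nothing for this data, while the second reports 1|99|dup1.txt|.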


Last edited by Scrutinizer; 12-27-2012 at 05:30 AM..
# 5  
Old 12-27-2012
awk

Code:
# Append the file name to every line; the lines end in "|", so the name
# becomes the last field ($NF) for the second awk.
awk '{print $0 FILENAME}' a b c | awk -F"[|]" '{
cnt[$1" "$2]++
file[$1" "$2]=sprintf("%s%s",file[$1" "$2],(file[$1" "$2])?","$NF:$NF)
}
END{
for (i in cnt){
  if(cnt[i] < 3){
    print i" "file[i]
  }
}
}'
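
All of the solutions in this thread hard-code the number of files as 3. If the number of files may vary, one possible variation is to compare against ARGC-1 instead. This is only a sketch, assuming every operand after the awk program is a file name (no var=value assignments); the four demo files a.txt to d.txt and their contents are hypothetical.

```shell
#!/bin/sh
# Hypothetical four-file demo: 1|1 is in every file,
# 1|2 is missing from d.txt only.
tmp=$(mktemp -d)
cd "$tmp" || exit 1
printf '1|1|w|\n1|2|w|\n' > a.txt
printf '1|1|x|\n1|2|x|\n' > b.txt
printf '1|1|y|\n1|2|y|\n' > c.txt
printf '1|1|z|\n'         > d.txt

missing=$(awk -F\| '
{
  i=$1 FS $2
}
!D[i,FILENAME]++{
  # count each entry once per file and remember where it was seen
  A[i]++
  B[i]=(B[i]?B[i]",":i FS) FILENAME
}
END{
  # ARGC-1 is the number of operands after the program text; this
  # assumes all of them are file names.
  n=ARGC-1
  for (i in A) if(A[i]<n) print B[i] FS
}' a.txt b.txt c.txt d.txt)

echo "$missing"
```

For this data the only line reported is 1|2|a.txt,b.txt,c.txt|. Unlike counting files as they are read, ARGC-1 also counts files that turn out to be empty.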


Last edited by Scrutinizer; 12-27-2012 at 05:41 AM.. Reason: code tags