comparing two fields from two different files in AWK


 
# 1  
Old 10-08-2011

Hi,

I have two files formatted as follows:

File 1: (user_num_ID , realID) (the NR here is 41671)

Code:
1  cust_034_60
2  cust_80_91
3  cust_406_4

..
..

File 2: (realID , clusterNumber) (total NR here is 1000)

Code:
cust_034_60  2
cust_406_4   3
..
..

I want to compare these two files on field $2 of File 1 and field $1 of File 2, and get a resulting file like this:

File 3: (user_num_ID, realID, clusterNumber)
Code:
1 cust_034_60 2
3 cust_406_4 3
..
..
..

Just to summarize:
Create a new file3.
Find the records where file2 ($1) is equal to file1 ($2),
and print file1 ($1) and file2 ($1, $2) to file3 as a single record.



Since I'm new to awk, I've tried several suggestions from previous threads, but it seems that I'm doing something wrong...

The latest one I've tried:

Code:
awk > file3 'NR==FNR{ _[$2]=$1 next}{print $0, _[$1,$2] }' file1 file2

Surely I'm doing something wrong.

Thanks in advance


Last edited by radoulov; 10-09-2011 at 06:06 AM..
# 2  
Old 10-09-2011
Code:
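awk > file3 'NR == FNR {
  # first file on the command line (file2): map realID -> clusterNumber
  f2[$1] = $2; next
  }
$2 in f2 {
  # second file (file1): print the line plus the cluster stored for its realID
  print $0, f2[$2]
  }' file2 file1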
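
For comparison, the attempt in the first post fails for two reasons, as far as I can tell: there is no `;` between `_[$2]=$1` and `next`, which is a syntax error, and `_[$1,$2]` looks up a combined key (`$1 SUBSEP $2`) that was never stored. A minimal sketch of that attempt with both points fixed, keeping the original file order (file1 first), could look like the following. The temporary directory, the three-line sample data and the array name `id` are only assumptions for illustration.

```shell
# Sample data copied from the first post; the temp directory is just for the demo.
tmp=$(mktemp -d)
printf '1  cust_034_60\n2  cust_80_91\n3  cust_406_4\n' > "$tmp/file1"
printf 'cust_034_60  2\ncust_406_4   3\n' > "$tmp/file2"

awk 'NR == FNR {
  # first file (file1): map realID -> user_num_ID
  id[$2] = $1; next
  }
$1 in id {
  # second file (file2): user_num_ID, realID, clusterNumber
  print id[$1], $1, $2
  }' "$tmp/file1" "$tmp/file2" > "$tmp/file3"

cat "$tmp/file3"
```

This variant keeps all of file1 in memory and prints the records in file2's order, whereas the version above stores the smaller file2 and prints in file1's order.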

# 3  
Old 10-09-2011
Hi radoulov,

Thank you for your quick response. I've tried the script you suggested, but I get an empty file3. Could the problem be that I'm using awk under Cygwin? I'm sure that I typed exactly what you suggested.

Thanks again

PS: I've tried using grep -f to compare the realID values of file2 with file1 and it worked fine, but I'm curious how this can be done using awk.
# 4  
Old 10-09-2011
No, this is not a Cygwin issue.

Consider the following (right now, I'm on Cygwin too):

This is the content of the two input files: file1 and file2:
Code:
% head file[12]
==> file1 <==
1  cust_034_60
2  cust_80_91
3  cust_406_4

==> file2 <==
cust_034_60  2
cust_406_4   3

This is what I get when I run the awk command:

Code:
% awk 'NR == FNR {
  f2[$1] = $2; next
  }
$2 in f2 {
  print $0, f2[$2]
  }' file2 file1
1  cust_034_60 2
3  cust_406_4 3

To debug further, try dumping the content of the array f2 and the content of file1:

Code:
% awk 'NR == FNR {
  f2[$1] = $2; next
  }
FNR == 1 {
  for (F in f2)
    printf "|%s| --> |%s|\n", f2[F], F
  }
{
  printf "|%s|\n", $2
  }' file2 file1
|2| --> |cust_034_60|
|3| --> |cust_406_4|
|cust_034_60|
|cust_80_91|
|cust_406_4|


Your output should be different than mine.
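
The thread does not say what the actual cause turned out to be. One common cause of this symptom, offered here only as an assumption, is Windows (CRLF) line endings in the input files: the last field of each line then ends in a carriage return, so `$2` from file1 never matches a key in f2, and the `|...|` dump above would show the closing bar in an odd place. A minimal sketch that strips a trailing carriage return before comparing (the temporary directory and the CRLF sample data are made up for the demo):

```shell
# Recreate the sample files with CRLF line endings, purely for illustration.
tmp=$(mktemp -d)
printf '1  cust_034_60\r\n2  cust_80_91\r\n3  cust_406_4\r\n' > "$tmp/file1"
printf 'cust_034_60  2\r\ncust_406_4   3\r\n' > "$tmp/file2"

awk '{
  # drop a trailing carriage return; awk re-splits the fields afterwards
  sub(/\r$/, "")
  }
NR == FNR {
  f2[$1] = $2; next
  }
$2 in f2 {
  print $0, f2[$2]
  }' "$tmp/file2" "$tmp/file1" > "$tmp/file3"

cat "$tmp/file3"
```

Without the `sub()` line, the same command on these CRLF files produces an empty file3.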
# 5  
Old 10-10-2011
It worked perfectly well now.

Thank you again radoulov, all the best!
# 6  
Old 12-07-2011
Hi, all:

I roughly understand that this code first reads the smaller file (file_2) and stores every line in an array f2.

Next, when it reads the bigger file, file_1, line by line, it checks each line's 2nd field against the array f2. If there is a match, it prints the current line and an element from the array f2.

Code:
awk > file3 'NR == FNR {
  f2[$1] = $2; next 
  }
$2 in f2 {
  print $0, f2[$2]
  }' file_2 file_1

#==> file_2 <==
#cust_034_60 2
#cust_406_4 3
#
#==> file_1 <==
#1 cust_034_60
#2 cust_80_91
#3 cust_406_4
#
#==> OUTPUT:
#1 cust_034_60 2
#3 cust_406_4 3


But I have some difficulty figuring out:

1. How are file_2's lines stored in f2?

Using the 1st line of file_2 as an example, "fcust_034_60 2"
==> f2[$1]=$2

It looks to me as if the code only puts the 2nd field, the number 2, into the array at position $1; the entire current line does not go in there.

2. How is f2's element retrieved by the code
==> f2[$2]?

If only a field was stored in the 1st step,
how can the code use "fcust_034_60", now read from file_1, to search the array f2?

To me, "fcust_034_60" is not even in f2.
And in the previous step, it seems to me, the code "f2[$1]=$2" just put $2 from the line into the $1 position of f2. How come it is retrieved from the $2 position of f2?

I am new here and awk is new to me too.
The book I have at hand only covers using awk to process a single file.

Using awk to process 2 or more files at a time will take me some getting used to.

Thanks to the person on the other page who clarified the NR/FNR issue very well,
I now have a rough idea of how awk uses NR/FNR to compare 2 files at a time.

Hopefully someone here can give me some hints on my questions above too. Thank you in advance.

Regards,
# 7  
Old 12-07-2011
Quote:
Originally Posted by Cybex2011
1. How are file_2's lines stored in f2?

Using the 1st line of file_2 as an example, "fcust_034_60 2"
==> f2[$1]=$2
In this example: f2["cust_034_60"] = 2
f2 is the array, and yes, "cust_034_60" is an array index.
(Not sure where you got fcust_034_60 from, however...)

Quote:
Originally Posted by Cybex2011
2. How is f2's element retrieved by the code
==> f2[$2]?

If only a field was stored in the 1st step,
how can the code use "fcust_034_60", now read from file_1, to search the array f2?
See my previous statement: the realID is the index, so f2[$2] with file_1's $2 (the realID) returns the cluster number stored under it.
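
To make that concrete, here is a small sketch (using the two sample lines of file_2 from this thread, fed in with printf purely for illustration) showing that awk arrays are indexed by strings: the realID is the key, the cluster number is the value stored under it, and `in` tests keys only.

```shell
out=$(printf 'cust_034_60 2\ncust_406_4 3\n' |
awk '{ f2[$1] = $2 }   # key: realID, value: clusterNumber
END {
  if ("cust_034_60" in f2)   print "cust_034_60 ->", f2["cust_034_60"]
  if (!("cust_80_91" in f2)) print "cust_80_91 is not a key"
  if (!(2 in f2))            print "2 is a value, not a key"
}')
printf '%s\n' "$out"
```

So when file_1 is read later, `$2 in f2` asks whether that line's realID was seen as a key in file_2, and `f2[$2]` returns the cluster number stored under it.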