Comparing two fields from two different files in AWK


 
#8 (12-07-2011)
Hi Cybex2011,
welcome to unix.com!

Quote:
But I have some difficulty figuring out:

1. How are file_2's lines stored in f2?

Take the 1st line of file_2, "cust_034_60 2", as an example:
==> f2[$1]=$2

It looks to me as if the code only puts the 2nd field, the number 2,
into the array at position $1; the entire current line does not go in there.
We associate the value of the first field (the key, the pseudo-index) with the value of the second one
(the value, the array element).
Together, key and value make up the entire line, because in this case file_2 contains only two fields.
Note that what you see in the output is the entire line from file_1
followed by only the second column from file_2:

Code:
==> OUTPUT:
1 cust_034_60       2
|__ file_1___|  |_file_2_|
3 cust_406_4        3
|__ file_1___|  |_file_2_|
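For reference, here is a minimal sketch of the complete two-file lookup being discussed, assembled from the fragments quoted later in this thread (f2[$1]=$2;, next, and $2 in f2 { print $0, f2[$2] }). The contents of file_1 and file_2 are hypothetical, chosen only so that the program reproduces the output above; the extra line "2 cust_999_1" has no key in file_2 and is therefore not printed.

```shell
#!/bin/sh
# Hypothetical sample data: file_2 holds "key value", file_1 holds "number key".
tmp=$(mktemp -d)
printf 'cust_034_60 2\ncust_406_4 3\n' > "$tmp/file_2"
printf '1 cust_034_60\n2 cust_999_1\n3 cust_406_4\n' > "$tmp/file_1"

out=$(cd "$tmp" && awk '
NR == FNR {          # true while reading the first file (file_2), assuming it is not empty
  f2[$1] = $2        # key: 1st field of file_2, value: its 2nd field
  next               # skip the rule below for file_2 lines
  }
$2 in f2 {           # file_1: is its 2nd field one of the stored keys?
  print $0, f2[$2]   # whole file_1 line plus the value from file_2
  }' file_2 file_1)

printf '%s\n' "$out"
rm -rf "$tmp"
```

Note the order of the file arguments: the lookup file must come first, because the NR == FNR block is the one that fills the array.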



awk's arrays are associative: think of them as key-value pairs.
"Array position $1" actually means that the value of the second field ($2)
is associated with a key, namely the value of the first field ($1):

Code:
cust_034_60       2
|___key___|   |_value_|
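A small experiment makes the key/value distinction visible. The input is hypothetical (two lines in file_2's "key value" shape). The END block looks a value up by its key, shows that the stored value 2 is not itself a key, and shows one awk detail worth knowing: merely referencing f2["no_such_key"] creates that key, whereas the in operator only tests for it.

```shell
#!/bin/sh
# Hypothetical input in the same "key value" shape as file_2.
out=$(printf 'cust_034_60 2\ncust_406_4 3\n' | awk '
{ f2[$1] = $2 }                        # key = 1st field, value = 2nd field
END {
  print f2["cust_034_60"]              # look a value up by its key
  if ("cust_406_4" in f2) print "cust_406_4 is a key"
  if (!("2" in f2)) print "2 is a value, not a key"
  n = 0; for (k in f2) n++; print n " keys"
  v = f2["no_such_key"]                # merely referencing an element creates it
  n = 0; for (k in f2) n++; print n " keys"
}')
printf '%s\n' "$out"
```

That last point is one practical reason to test with $2 in f2 before reading f2[$2].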

Check this page for more detailed info.

Quote:
2. How is f2's element retrieved by the code
==> f2[$2]?

If it only stored one field in the 1st step,
how can the code use "cust_034_60", now read in from file_1, to search the array f2?

To me, "cust_034_60" is not even in f2.
And in the previous step, it seems to me that the code "f2[$1]=$2" just put $2 from the line into position $1 of f2. How come it is retrieved from position $2 of f2?

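The value is retrieved by comparing the value of the second field of file_1
with the keys present in the associative array f2
(in this case: cust_034_60 and cust_406_4).
By the time the second block runs, awk is reading file_1, so $2 is the second field of the current file_1 line, not of file_2;
f2[$2] therefore means "the value stored under the key that equals file_1's second field".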
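The lookup side can be traced line by line. In this sketch (hypothetical sample files; the trace messages are only for illustration), every file_1 line reports whether its second field is one of f2's keys:

```shell
#!/bin/sh
# Hypothetical sample files; only the lookup logic comes from the thread.
tmp=$(mktemp -d)
printf 'cust_034_60 2\ncust_406_4 3\n' > "$tmp/file_2"
printf '1 cust_034_60\n2 cust_999_1\n3 cust_406_4\n' > "$tmp/file_1"

out=$(awk '
NR == FNR { f2[$1] = $2; next }
{
  # $2 is now the 2nd field of the current file_1 line; it is compared
  # with the keys of f2, never with the stored values
  if ($2 in f2)
    print "file_1 line " FNR ": key " $2 " found, value " f2[$2]
  else
    print "file_1 line " FNR ": key " $2 " not found"
}' "$tmp/file_2" "$tmp/file_1")

printf '%s\n' "$out"
rm -rf "$tmp"
```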

Hope this helps.
Posted by radoulov.
#9 (12-08-2011)
tukuyomi & radoulov,
Many thanks to you both.


f2[$1]=$2 and f2[$2] bogged me down for a whole afternoon.
I was confused by them. Thanks to your help, it is clear to me now.

In this block
Code:
$2 in f2 {
  print $0, f2[$2]
  }'

I now know that $2 refers to the 2nd column of file_1,
and at this stage file_2 has already been read completely.

One line back, in this statement
Code:
f2[$1]=$2;

$1 refers to the 1st column of file_2.

These 2 columns have some elements in common.
For those common values, the matching lines of file_1 are printed, followed by the corresponding value from file_2.
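The NR == FNR test that separates the two phases can be watched directly. This sketch (hypothetical two-line files) prints FILENAME, NR and FNR for every input line: FNR restarts at 1 for each file while NR keeps counting, so the two are equal only while the first file is being read.

```shell
#!/bin/sh
# Hypothetical two-line input files.
tmp=$(mktemp -d)
printf 'cust_034_60 2\ncust_406_4 3\n' > "$tmp/file_2"
printf '1 cust_034_60\n3 cust_406_4\n' > "$tmp/file_1"

# Print, for every input line, which file it came from and the two counters.
out=$(cd "$tmp" && awk '{
  which = (NR == FNR) ? "first file" : "second file"
  print FILENAME, "NR=" NR, "FNR=" FNR, which
}' file_2 file_1)

printf '%s\n' "$out"
rm -rf "$tmp"
```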

I also revised radoulov's original code as follows;
it feels more readable to someone coming from C/C++ like me.
It seems I don't need 'next' here any more,
and the output is the same.
Please correct me if there is any error in my revision. Thank you both again.
Code:
awk '{
  if (NR == FNR) {        # now reading file_2 into f2
    f2[$1] = $2
  } else {                # now reading file_1, because NR != FNR
    if ($2 in f2) { print $0, f2[$2] }
  }
}' file_2 file_1
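One caveat applies to both the next-based version and this if/else revision: when the first file is empty, NR and FNR count together through the second file, so NR == FNR is also true for every file_1 line and file_1 gets loaded into the array instead. The sketch below uses hypothetical files and counts how many lookup entries each variant ends up with. The FILENAME == ARGV[1] test is a swapped-in alternative, not something used in this thread, and it has its own limits (for example, it misbehaves if the same file name is passed twice).

```shell
#!/bin/sh
tmp=$(mktemp -d)
: > "$tmp/file_2"                               # an empty lookup file
printf '1 cust_034_60\n3 cust_406_4\n' > "$tmp/file_1"

# NR == FNR variant: with an empty file_2 it stays true for all file_1 lines.
a=$(cd "$tmp" && awk 'NR == FNR { f2[$1] = $2; next }
  END { n = 0; for (k in f2) n++; print n }' file_2 file_1)

# FILENAME variant: compare the current file name with the first argument.
b=$(cd "$tmp" && awk 'FILENAME == ARGV[1] { f2[$1] = $2; next }
  END { n = 0; for (k in f2) n++; print n }' file_2 file_1)

printf 'NR == FNR variant loaded %s entries\nFILENAME variant loaded %s entries\n' "$a" "$b"
rm -rf "$tmp"
```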

#10 (12-08-2011)
Hi Cybex2011,
your revision is correct.
In my version the next statement is needed because the second block
does not check the NR == FNR condition itself:

Code:
NR == FNR {
  f2[$1] = $2
  next                    # jump directly to the next input line,
                          # so the actions in the following blocks
                          # do not execute when NR == FNR
  }
# here we are sure NR != FNR
$2 in f2 { print $0, f2[$2] }
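The effect of next can be seen in a small experiment. The one-line files are hypothetical, and the second rule is a bare action that matches every line, chosen only to make the difference visible:

```shell
#!/bin/sh
tmp=$(mktemp -d)
printf 'cust_034_60 2\n' > "$tmp/file_2"
printf '1 cust_034_60\n' > "$tmp/file_1"

# Without next: the second rule also runs for the file_2 line.
without=$(cd "$tmp" && awk 'NR == FNR { f2[$1] = $2 }
  { print FILENAME ": second rule ran" }' file_2 file_1)

# With next: awk moves on to the next input line, so the second rule only sees file_1.
with=$(cd "$tmp" && awk 'NR == FNR { f2[$1] = $2; next }
  { print FILENAME ": second rule ran" }' file_2 file_1)

printf '%s\n---\n%s\n' "$without" "$with"
rm -rf "$tmp"
```

With $2 in f2 as the second pattern, omitting next would make awk test file_2 lines too; a file_2 line would then be printed whenever its 2nd field happened to equal one of the keys.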

#11 (12-09-2011)
Hi, radoulov:
Using your code as a foundation and remodeling it a bit, I found a solution to my own problem.

I wanted to interleave and merge my own 2 data files, shown below.
I realized they are not in key-value form, so I could not take advantage of awk's pseudo-index (associative array) feature.
Instead, I used NR/FNR to build a traditional, line-number-indexed array and got the job done.

My code works pretty well for me, except for a tiny imperfection at the very beginning of the output, which I can fix manually in a snap. It's not for commercial use, just for my own hobby, so it's good enough for now.

Thank you again. Without your support, I couldn't have finished this so quickly. I wish you a happy holiday.
Regards,



==> file_E
Code:
1
00:00:01,400 --> 00:00:10,300
In the air war of Vietnam, one day stands out among all the rest.
2
00:00:10,300 --> 00:00:21,666
May 10, 1972, the full fury of American air power is unleashed on North Vietnam.
3
00:00:22,600 --> 00:00:30,433
More Vietnamese MiGs are shot down on this day than on any other day of the war.


==> file_C
Code:
1
00:00:01,400 --> 00:00:10,300
在越戰空戰史上,有個特殊日子與眾不同。
2
00:00:10,300 --> 00:00:21,666
1972年5月10日,美國空中武力對北越釋出飽和之憤怒。
3
00:00:22,600 --> 00:00:30,433
更多的北越米格機在該天較任何其他日子被擊落。


==> awk code
Code:
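awk '{
  if (NR == FNR && length($0) > 0) {
    # now reading file_E: store each non-empty line under its line number
    f2[FNR] = $0
  } else if (length($0) > 0) {
    # now reading file_C, because NR != FNR
    if (FNR == 1) { end = NR - 1 }     # number of lines in file_E
    # a line that also occurs in file_E (index or timestamp) is printed once
    for (i = 1; i <= end; i++)
      if ($0 == f2[i]) { print $0; next }
    # printf "|%d| --> |%s|\n", FNR, f2[FNR]
    printf "%s\n", f2[FNR]             # English line with the same line number
    printf "%s\n\n", $0                # then the Chinese line and a blank line
  }
}' file_E file_C > "SRT"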



==> output:
Code:
1
00:00:01,400 --> 00:00:10,300
In the air war of Vietnam, one day stands out among all the rest.
在越戰空戰史上,有個特殊日子與眾不同。
2
00:00:10,300 --> 00:00:21,666
May 10, 1972, the full fury of American air power is unleashed on North Vietnam.
1972年5月10日,美國空中武力對北越釋出飽和之憤怒。
3
00:00:22,600 --> 00:00:30,433
More Vietnamese MiGs are shot down on this day than on any other day of the war.
更多的北越米格機在該天較任何其他日子被擊落。
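One plausible cause of the "tiny imperfection" at the very beginning of the output (an assumption; the thread does not say what it was) is a UTF-8 byte order mark at the start of a file, which radoulov's later version strips from the first line of each file. With a BOM, the first line is no longer equal to "1", so a comparison such as $0==f2[i] can fail for that line. This sketch feeds awk a hypothetical BOM-prefixed line; the octal escapes \357\273\277 are the same bytes as \xef\xbb\xbf, and LC_ALL=C is an added assumption so that awk treats the input as plain bytes in any locale.

```shell
#!/bin/sh
# A hypothetical first line "1" preceded by a UTF-8 byte order mark (bytes EF BB BF).
out=$(printf '\357\273\2771\n' | LC_ALL=C awk '
FNR == 1 {
  before = ($0 == "1")            # 0: the invisible BOM is part of the line
  sub(/^\357\273\277/, "")        # strip the BOM bytes at the start of the line
  after = ($0 == "1")             # 1: now the line compares equal
  print "before: " before ", after: " after
}')
printf '%s\n' "$out"
```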


p.s.
The file contains the subtitles of the History Channel program 《Dogfights》. The 2nd data file is in Chinese. Chinese-speaking people are not necessarily from China; in Asia, there are 3 countries that use Chinese as their official language.
#12 (12-09-2011)
Glad you enjoy awk!
If the input format is always exactly as shown,
this should work too:

Code:
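awk 'FNR == 1 {
  sub(/^\xef\xbb\xbf/, x)              # strip a UTF-8 byte order mark (x is unset, so "")
  }
/-->/ {                                # a timestamp line
  idx = $0                             # remember it as the key of the current entry
  $0 in d && $0 = $0 RS d[$0]          # in file_C: append the stored English text
  }
  NR == FNR {                          # reading file_E
    !/^[0-9]+$/ && d[idx] = $0         # store non-index lines under the current timestamp
    next                               # print nothing while reading file_E
    }
  1' file_E file_C                     # print every (possibly extended) line of file_C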
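To try this timestamp-keyed approach on the sample data quoted in #11, here is a lightly adapted sketch. The differences from radoulov's version are assumptions of this sketch: the BOM is written with octal escapes instead of \x escapes (POSIX awk defines octal escapes but not \x), the && shortcuts are spelled as if statements, and an NF test skips blank lines. That last change matters if the real files separate entries with blank lines, as SRT files normally do: in the original, !/^[0-9]+$/ is also true for an empty line, so d[idx] would be overwritten with an empty string.

```shell
#!/bin/sh
tmp=$(mktemp -d)
# The three entries of file_E and file_C exactly as quoted in #11.
cat > "$tmp/file_E" <<'EOF'
1
00:00:01,400 --> 00:00:10,300
In the air war of Vietnam, one day stands out among all the rest.
2
00:00:10,300 --> 00:00:21,666
May 10, 1972, the full fury of American air power is unleashed on North Vietnam.
3
00:00:22,600 --> 00:00:30,433
More Vietnamese MiGs are shot down on this day than on any other day of the war.
EOF
cat > "$tmp/file_C" <<'EOF'
1
00:00:01,400 --> 00:00:10,300
在越戰空戰史上,有個特殊日子與眾不同。
2
00:00:10,300 --> 00:00:21,666
1972年5月10日,美國空中武力對北越釋出飽和之憤怒。
3
00:00:22,600 --> 00:00:30,433
更多的北越米格機在該天較任何其他日子被擊落。
EOF

out=$(cd "$tmp" && LC_ALL=C awk '
FNR == 1 { sub(/^\357\273\277/, "") }     # drop a UTF-8 BOM (octal spelling)
/-->/ {
  idx = $0                                # current timestamp = key of this entry
  if ($0 in d) $0 = $0 RS d[$0]           # file_C: append the stored English line
}
NR == FNR {
  if (NF && !/^[0-9]+$/) d[idx] = $0      # skip blank and index lines
  next
}
1' file_E file_C)

printf '%s\n' "$out"
rm -rf "$tmp"
```

With these three entries, the output matches the one posted in #11. Because the entries are paired by timestamp rather than by line number, this approach relies on both files using identical timestamp lines.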

Happy Holidays!

Last edited by radoulov on 12-09-2011 at 03:33 PM.