Comparing multiple fields from 2 files uing awk


 
Thread Tools Search this Thread
Top Forums UNIX for Dummies Questions & Answers Comparing multiple fields from 2 files uing awk
# 1  
Old 08-17-2012
Comparing multiple fields from 2 files uing awk

Hi
I have 2 files as below
File 1
Chr Start End
chr1 120 130
chr1 140 150
chr2 130 140

File2
Chr Start End Value
chr1 121 128 ABC
chr1 144 149 XYZ
chr2 120 129 PQR

I would like to compare these files using awk; specifically if column 1 of file1 is equal to column 1 of file2 (and column 2 of file 1 if less than column 2 of file2 and column3 of file1 is greater than column 3 of file2). If all 3 conditions satisfy then extract the entire line of file2.

I know that one of file has to be loaded into an array, but just dont know how to index all the fields.

Any help appreciated.
Thanks
# 2  
Old 08-18-2012
Have a go with this:

Code:
awk '
    NR == 1 { getline <f2 buf; next; }  # skip header record (remove this line if no header)

    {
        getline buf <f2;
        split( buf, a, " " );
        if( $1 == a[1]  && $2+0 < a[2]+0 && $3 > a[3] )
            printf( "%s\n", buf );
    }
' f2="file-2" file-1

This assumes that there are the exact same number of lines in each file, and that records align between both files.
# 3  
Old 08-20-2012
Thanks!!!

Thanks a lot!!!! Your code worked perfectly for my test file.
However the file sizes are not the same. Also there is no one to one correspondence between the lines in the 2 files.
I would like each row of file1 to search through every row of file 2 and return all values in file 2 which were within the range of the single line in file 1.
For example:
1st row of file1
chr1 201 301

File2 (all rows)
chr1 220 230 abc
chr2 400 500 xyz
chr1 290 300 pqr

Result
chr1 201 301 abc
chr1 201 301 pqr

since both of these are within the range of 2 rows in file2. Therefore I was thinking of loading the file2 in an array and then looping over this array with each row of file1.

I would greatly appreciate any input.
Thanks once again.
# 4  
Old 08-20-2012
You have provided contradictory information.

Quote:
Originally Posted by sshetty
If all 3 conditions satisfy then extract the entire line of file2
But your sample output in the following is not extracting the entire line from file2. It is using the entire line from file1 with the final field in file2 appended.

Quote:
Originally Posted by sshetty
1st row of file1
chr1 201 301

File2 (all rows)
chr1 220 230 abc
chr2 400 500 xyz
chr1 290 300 pqr

Result
chr1 201 301 abc
chr1 201 301 pqr
Regards,
Alister

---------- Post updated at 01:42 PM ---------- Previous update was at 01:38 PM ----------

Assuming that your sample output in your most recent post is accurate ...
Code:
join f1 f2 | awk '$2<$4+0 && $3>$5+0 {print $1,$2,$3,$6}'

... will do the job if you sort f1 and f2.

Regards,
Alister

Last edited by alister; 08-20-2012 at 02:58 PM..
# 5  
Old 08-20-2012
Thanks Allister!!
Apologize for the mistake in my earlier post. Yes the most recent output example is the desired output.
I am not very familiar with awk and therefore do not understand what +0 does in this statement :$2<$4+0
Thanks

---------- Post updated at 01:04 PM ---------- Previous update was at 12:57 PM ----------

"Also, why is the highlighted line included in the result? Column 3 in file1 is not greater than column 3 in file2."

Allister, like I had mentioned I was looking to search through the entire file2, not just the corresponding lines in both files.
Therefore first line of file1 is repeated, since it overlaps (ie, $2 (file1)<$2(file2) and $3(file1)> $3(file2).
# 6  
Old 08-20-2012
The addition coerces one of the operands to numeric type. This, along with the numeric nature of the strings in your fields, ensures that the comparison is done numerically and not lexicographically. If the comparison were done using string types, then 2 is greater than 100, instead of less.

Regards,
Alister

---------- Post updated at 02:08 PM ---------- Previous update was at 02:06 PM ----------

Quote:
Originally Posted by sshetty
"Also, why is the highlighted line included in the result? Column 3 in file1 is not greater than column 3 in file2."
I only had that up there for about two minutes or so. It was the result of a minor brain aneurysm on my part. Column 3 in file1 obviously is greater than column 3 in file2. Disregard it and let us never speak of this again. Smilie

Regards,
Alister
# 7  
Old 08-20-2012
Thanks for explaining Allister. Smilie
 
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. UNIX for Beginners Questions & Answers

Awk: matching multiple fields between 2 files

Hi, I have 2 tab-delimited input files as follows. file1.tab: green A apple red B apple file2.tab: apple - A;Z Objective: Return $1 of file1 if, . $1 of file2 matches $3 of file1 and, . any single element (separated by ";") in $3 of file2 is present in $2 of file1 In order to... (3 Replies)
Discussion started by: beca123456
3 Replies

2. Shell Programming and Scripting

awk arrays comparing multiple columns across two files.

Hi, I'm trying to use awk arrays to compare values across two files based on multiple columns. I've attempted to load file 2 into an array and compare with values in file 1, but success has been absent. If anyone has any suggestions (and I'm not even sure if my script so far is on the right lines)... (4 Replies)
Discussion started by: hubleo
4 Replies

3. Shell Programming and Scripting

Download multiple files uing wget

Need Assistance . Using wget how can i download multiple files from http site. Http doesnt has wild card (*) but FTP has it . Any ideas will be appreciative. wget --timeout=120 --append-output=output.txt --no-directories --cut-dirs=1 -np -m --accept=grib2 -r http://sample.com/... (4 Replies)
Discussion started by: ajayram_arya
4 Replies

4. Shell Programming and Scripting

UNIX append field with comparing fields from multiple column

I have a csv dump from sql server that needs to be converted so it can be feed to another program. I already sorted on field 1 but there are multiple columns with same field 1 where it needs to be compared against and if it is same then append field 5. i.e from ANG SJ,0,B,LC22,LC22(0) BAT... (2 Replies)
Discussion started by: nike27
2 Replies

5. Shell Programming and Scripting

Comparing two files using four fields

Dear All, I want to compare File1 and File2 (Separated by spaces) using four fields (Column 1,2,4,5). Logic: If column 1 and 2 of File1 and File2 match exactly and if the File2 has the same characters as any of the characters present in column 4 and 5 of file1 then those lines of file1 and file2... (6 Replies)
Discussion started by: NamS
6 Replies

6. Shell Programming and Scripting

Comparing two files using four fields

I want to compare File1 and File2 (Separated by spaces) using four fields (Column 1,2,4,5). Logic: If column 1 and 2 of File1 and File2 match exactly and if the File2 has the same characters as any of the characters present in column 4 and 5 of file1 then those lines of file1 and file2 are... (1 Reply)
Discussion started by: NamS
1 Replies

7. Shell Programming and Scripting

Join fields comparing 4 fields using awk

Hi All, I am looking for an awk script to do the following Join the fields together only if the first 4 fields are same. Can it be done with join function in awk?? a,b,c,d,8,,, a,b,c,d,,7,, a,b,c,d,,,9, a,b,p,e,8,,, a.b,p,e,,9,, a,b,p,z,,,,9 a,b,p,z,,8,, desired output: ... (1 Reply)
Discussion started by: aksijain
1 Replies

8. Programming

comparing two fields from two different files in AWK

Hi, I have two files formatted as following: File 1: (user_num_ID , realID) (the NR here is 41671) 1 cust_034_60 2 cust_80_91 3 cust_406_4 .. .. File 2: (realID , clusterNumber) (total NR here is 1000) cust_034_60 2 cust_406_4 3 .. .. (11 Replies)
Discussion started by: amarn
11 Replies

9. Shell Programming and Scripting

Comparing fields in two files

Hi, i want to compare two files by one field say $3 in file1 needs to compare with $2 in file2. sample file1 - reqd_charge_code 2263881188,24570896,439 2263881964,24339077,439 2263883220,22619162,228 2263884224,24631840,442 2263884246,22612161,442 sample file2 - rg_j ... (2 Replies)
Discussion started by: raghavendra.cse
2 Replies

10. Shell Programming and Scripting

awk print fields to multiple files?

I am trying to print the output of a command to two separate files. Is it possible to use awk to print $1 to one file and $2 to another file? Thanks in advance! (1 Reply)
Discussion started by: TheCrunge
1 Replies
Login or Register to Ask a Question