awk arrays comparing multiple columns across two files.


 
# 1  
Old 01-02-2014

Hi, I'm trying to use awk arrays to compare values across two files based on multiple columns. I've tried loading file 2 into an array and comparing it with the values in file 1, but without success. If anyone has any suggestions (I'm not even sure my script so far is along the right lines), they would be very much appreciated.

file1
Code:
A 1 10 20
B 1 35 50
C 2 40 50
D 2 65 100
E 3 10 30
F 3 20 40
G 4 25 50
H 4 45 70

file2
Code:
ID1 11 16 1
ID2 75 100 1
ID3 45 47 2
ID4 15 30 3
ID5 40 45 4
ID6 55 65 4

Essentially, a line of file 1 and a line of file 2 match when all three conditions hold:
column 2 in file 1 equals column 4 in file 2,
column 2 in file 2 is >= column 3 in file 1,
and column 3 in file 2 is <= column 4 in file 1.
For each match, print the file 1 line followed by columns 1, 2 and 3 from file 2.

Desired output
Code:
A 1 10 20 ID1 11 16
C 2 40 50 ID3 45 47
E 3 10 30 ID4 15 30
G 4 25 50 ID5 40 45
H 4 45 70 ID6 55 65

Using a pseudo script based on similar problems online, I've got:
Code:
awk 'FNR == NR
{
f2[$0]++
next
}
{
for (i in f2)
{
split(i,f2_split)
if ((f2_split[4] == $2) && (f2_split[2] >= $3) && (f2_split[3] <= $4))
{print $0, f2_split[1],f2_split[2],f2_split[3]
}
}
}' file2 file1

This does a fantastic job of printing out the contents of file 2. Alas it's not what I was after. Any help would be much appreciated.
# 2  
Old 01-02-2014
Try:
Code:
awk 'NR==FNR{a[NR]=$0;next}{for (i in a){split(a[i],x," ");if (x[4]==$2&&x[2]>=$3&&x[3]<=$4)print $0,x[1],x[2],x[3]}}' file2 file1
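For readability, the same logic can be spread over several lines with comments. This is only a sketch of the one-liner above, not a different method. Two things in it are my own assumptions rather than part of the original answer: the counter `n` with a numeric `for` loop (so file 2 lines are tested in their original order, which `for (i in a)` does not guarantee), and the scratch directory and `result` file name, which are just there to make the example self-contained.

```shell
# Work in a scratch directory so no existing files are overwritten (assumption)
cd "$(mktemp -d)" || exit 1

# Sample data copied from post #1
cat > file1 <<'EOF'
A 1 10 20
B 1 35 50
C 2 40 50
D 2 65 100
E 3 10 30
F 3 20 40
G 4 25 50
H 4 45 70
EOF

cat > file2 <<'EOF'
ID1 11 16 1
ID2 75 100 1
ID3 45 47 2
ID4 15 30 3
ID5 40 45 4
ID6 55 65 4
EOF

awk '
NR == FNR {              # true only while reading the first file named (file2)
    a[++n] = $0          # remember every file2 line, numbered 1..n
    next                 # skip the second rule for file2 lines
}
{                        # runs for every line of file1
    for (i = 1; i <= n; i++) {
        split(a[i], x, " ")          # x[1]=ID, x[2]=start, x[3]=end, x[4]=group
        if (x[4] == $2 && x[2] >= $3 && x[3] <= $4)
            print $0, x[1], x[2], x[3]
    }
}' file2 file1 > result

cat result
```

Note that the opening brace of each action sits on the same line as its pattern (or starts the rule on its own when there is no pattern). With the sample data this should give the five lines listed under "Desired output" in post #1.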

# 3  
Old 01-02-2014
Thanks, that worked a treat. I had a little play around with that code and realised that one of the issues with my initial code was line breaks in inappropriate places. If I put a line break between the awk 'NR==FNR and the rest of your code, it just prints out file 2 as well.
Code:
awk 'NR==FNR
{a[NR]=$0;next}{for (i in a){split(a[i],x," ");if (x[4]==$2&&x[2]>=$3&&x[3]<=$4)print $0,x[1],x[2],x[3]}}' file2 file1

A valuable lesson that I'd previously overlooked. Thanks for your help.
# 4  
Old 01-02-2014
You can put a newline like this:
Code:
awk 'NR==FNR{
a[NR]=$0;next}{for (i in a){split(a[i],x," ");if (x[4]==$2&&x[2]>=$3&&x[3]<=$4)print $0,x[1],x[2],x[3]}}' file2 file1
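The reason the position of the newline matters: in awk a newline ends a rule, and a pattern with no action gets the default action of printing the record. So `NR==FNR` alone on a line prints every line of the first file, and the `{...;next}` block on the following line becomes a separate rule with no pattern, which runs for every line of both files and skips everything after it. A small sketch of both layouts follows; the file names `f1`, `f2`, `broken.out` and `fixed.out`, the array name `seen`, and the tiny data set are all made up for illustration.

```shell
# Scratch directory and toy data (hypothetical, for illustration only)
cd "$(mktemp -d)" || exit 1
printf 'x 1\ny 2\n' > f2
printf 'p 1\n'      > f1

# Broken layout: the pattern stands alone, so it becomes "NR==FNR { print }".
# The next line is then an unconditional rule whose "next" hides the last rule.
awk 'NR==FNR
{ seen[$2] = $1; next }
{ print "second file:", $0, seen[$2] }' f2 f1 > broken.out

# Working layout: the opening brace is on the same line as the pattern.
awk 'NR==FNR {
  seen[$2] = $1; next }
{ print "second file:", $0, seen[$2] }' f2 f1 > fixed.out

echo "--- broken ---"; cat broken.out
echo "--- fixed ---";  cat fixed.out
```

The broken version simply echoes `f2`, which is the same symptom described in post #1; the fixed version prints the `f1` line together with the value looked up from `f2`.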

# 5  
Old 01-02-2014
Awesome, that's really useful. A rookie mistake, no doubt, but hopefully one that I won't make again any time soon! Thanks.