Field matching in two data files


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting Field matching in two data files
# 1  
Old 04-08-2017
Field matching in two data files

Hello,
I am looking to output all of the lines from file2 whose 11th field is present in the first field in file1. Then the second field from file1 should be appended as such:

file1:
Code:
2222 0.35
4444 0.25
5555 0.75

file2:
Code:
col1 col2 col3 col4 col5 col6 col7 col8 col9 col10 1111
col1 col2 col3 col4 col5 col6 col7 col8 col9 col10 2222
col1 col2 col3 col4 col5 col6 col7 col8 col9 col10 3333
col1 col2 col3 col4 col5 col6 col7 col8 col9 col10 4444
col1 col2 col3 col4 col5 col6 col7 col8 col9 col10 5555

Desired output:
Code:
col1 col2 col3 col4 col5 col6 col7 col8 col9 col10 2222 0.35
col1 col2 col3 col4 col5 col6 col7 col8 col9 col10 4444 0.25
col1 col2 col3 col4 col5 col6 col7 col8 col9 col10 5555 0.75

Thanks so much!
# 2  
Old 04-08-2017
Hello palex,

Could you please try following and let me know if this helps you.
Code:
awk 'FNR==NR{A[$1]=$0;next} ($NF in A){print $0,A[$NF]}'  Input_file1   Input_file2

Thanks,
R. Singh
This User Gave Thanks to RavinderSingh13 For This Post:
# 3  
Old 04-08-2017
Column 11 of file2 was duplicated in the output, but this works for me. Thank you so much!
# 4  
Old 04-09-2017
Try, then,
Code:
awk 'FNR==NR{A[$1]=$2;next} ($NF in A){print $0,A[$NF]}' file1   file2
col1 col2 col3 col4 col5 col6 col7 col8 col9 col10 2222 0.35
col1 col2 col3 col4 col5 col6 col7 col8 col9 col10 4444 0.25
col1 col2 col3 col4 col5 col6 col7 col8 col9 col10 5555 0.75

This User Gave Thanks to RudiC For This Post:
# 5  
Old 04-09-2017
Hi.

If the input files are sorted on the fields to be matched, then one can use:
Code:
join [options] <files>

like this:
Code:
#!/usr/bin/env bash

# @(#) s1       Demonstrate blending matched-field files, join.

# Utility functions: print-as-echo, print-line-with-visual-space, debug.
# export PATH="/usr/local/bin:/usr/bin:/bin"
LC_ALL=C ; LANG=C ; export LC_ALL LANG
pe() { for _i;do printf "%s" "$_i";done; printf "\n"; }
pl() { pe;pe "-----" ;pe "$*"; }
em() { pe "$*" >&2 ; }
db() { ( printf " db, ";for _i;do printf "%s" "$_i";done;printf "\n" ) >&2 ; }
db() { : ; }
C=$HOME/bin/context && [ -f $C ] && $C join pass-fail

E=expected-output.txt

# Remove old results file.
rm -f f1

pl " Input data files data1, data2:"
head data[12]

pl " Expected output:"
cat $E

# output all of the lines from file2 whose 11th field is present
# in the first field in file1
pl " Results:"
format="2.1,2.2,2.3,2.4,2.5,2.6,2.7,2.8,2.9,2.10,1.1,1.2"
join -t " " -1 1 -2 11 -o "$format" data1 data2 |
tee f1

pl " Verify results if possible:"
C=$HOME/bin/pass-fail
[ -f $C ] && $C || ( pe; pe " Results cannot be verified." ) >&2

exit 0

producing:
Code:
$ ./s1

Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 3.16.0-4-amd64, x86_64
Distribution        : Debian 8.7 (jessie) 
bash GNU bash 4.3.30
join (GNU coreutils) 8.23
pass-fail (local) 1.9

-----
 Input data files data1, data2:
==> data1 <==
2222 0.35
4444 0.25
5555 0.75

==> data2 <==
col1 col2 col3 col4 col5 col6 col7 col8 col9 col10 1111
col1 col2 col3 col4 col5 col6 col7 col8 col9 col10 2222
col1 col2 col3 col4 col5 col6 col7 col8 col9 col10 3333
col1 col2 col3 col4 col5 col6 col7 col8 col9 col10 4444
col1 col2 col3 col4 col5 col6 col7 col8 col9 col10 5555

-----
 Expected output:
col1 col2 col3 col4 col5 col6 col7 col8 col9 col10 2222 0.35
col1 col2 col3 col4 col5 col6 col7 col8 col9 col10 4444 0.25
col1 col2 col3 col4 col5 col6 col7 col8 col9 col10 5555 0.75

-----
 Results:
col1 col2 col3 col4 col5 col6 col7 col8 col9 col10 2222 0.35
col1 col2 col3 col4 col5 col6 col7 col8 col9 col10 4444 0.25
col1 col2 col3 col4 col5 col6 col7 col8 col9 col10 5555 0.75

-----
 Verify results if possible:

-----
 Comparison of 3 created lines with 3 lines of desired results:
 Succeeded -- files (computed) f1 and (standard) expected-output.txt have same content.

See man join and experiment.

Best wishes ... cheers, drl
This User Gave Thanks to drl For This Post:
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. UNIX for Beginners Questions & Answers

Continued trouble matching fields in different files and selective field printing ([g]awk)

I apologize in advance, but I continue to have trouble searching for matches between two files and then printing portions of each to output in awk and would very much appreciate some help. I have data as follows: File1 PS012,002 PRQ 0 1 1 17 1 0 -1 3 2 1 2 -1 ... (7 Replies)
Discussion started by: jvoot
7 Replies

2. Shell Programming and Scripting

awk to update field using matching value in file1 and substring in field in file2

In the awk below I am trying to set/update the value of $14 in file2 in bold, using the matching NM_ in $12 or $9 in file2 with the NM_ in $2 of file1. The lengths of $9 and $12 can be variable but what is consistent is the start pattern will always be NM_ and the end pattern is always ;... (2 Replies)
Discussion started by: cmccabe
2 Replies

3. Shell Programming and Scripting

Matching two files with special field separator

Hello, I have a file with such structure: >ENSGALG00000000011|ENSGALT00000000012|57|1123|1125 AACTGTGTGTTTTTT >ENSGALG00000000012|ENSGALT00000000013|57|1145|1155 AAAAAAGGTCCTGTGTGC >ENSGALG00000000015|ENSGALT00000000014|57|1144|1155 AAAATGTGTGTGTGTGTGTGTG I want to use another file... (8 Replies)
Discussion started by: Homa
8 Replies

4. Shell Programming and Scripting

Matching and Merging csv data fields based on a common field

Dear List, I have a file of csv data which has a different line per compliance check per host. I do not want any omissions from this csv data file which looks like this: date,hostname,status,color,check 02-03-2012,COMP1,FAIL,Yellow,auth_pass_change... (3 Replies)
Discussion started by: landossa
3 Replies

5. Shell Programming and Scripting

Help to retrieve data from two files matching a string

Hello Experts, I have come back to this forum after a while now, since require a better way to get my result.. My query is as below.. I have 3 files -- 1 Input file, 2 Data files .. Based on the input file, data has to be retreived matching from two files which has one common key.. For EX:... (4 Replies)
Discussion started by: shaliniyadav
4 Replies

6. Shell Programming and Scripting

Compare two files and get matching data

Hi, Can anyone help me to compare two files and get the matching data... say i have file1 and file2 ... file1 has 300 unique data with that i need to match with file2 to see how may are matching.. file2 have 1000 records. (4 Replies)
Discussion started by: zooby
4 Replies

7. Shell Programming and Scripting

Field matching between 2 files

Okay so I'm pretty new to scripting therefore this problem seems pretty tough. I have a main file that has a column of IP addresses and I have to compare it with 3 separate files that also have IP address columns. These 3 files are automatically generated from 3 different servers. Each time... (2 Replies)
Discussion started by: Spunkerspawn
2 Replies

8. Shell Programming and Scripting

Matching lines across multiple csv files and merging a particular field

I have about 20 CSV's that all look like this: "","","","","","","","","","","","","","","",""What I've been told I need to produce is the exact same thing, but with each file now containing the start_code from every other file where the email matches. It doesn't matter if any of the other... (1 Reply)
Discussion started by: Demosthenes
1 Replies

9. Shell Programming and Scripting

How to find the matching data b/w 2 files in perl?

Hi friends,, i have find the matching data between 2files. My file1 have a data like rs3001336 rs3984736 rs2840532 File2 have a data like rs3736330 1 2359237 A G 0.28 1.099 0.010 rs2840532 1 2359977 G A 0.363 0.3373 1.123 rs3001336 1 ... (4 Replies)
Discussion started by: sureshraj
4 Replies

10. Programming

How to find the matching data b/w 2 files in perl?

Hi friends,, i have find the matching data between 2files. My file1 have a data like rs3001336 rs3984736 rs2840532 File2 have a data like rs3736330 1 2359237 A G 0.28 1.099 0.010 rs2840532 1 2359977 G A 0.363 0.3373 1.123 rs3001336 1 2365193 G A 0.0812 0.07319 1.12 ... (1 Reply)
Discussion started by: sureshraj
1 Replies
Login or Register to Ask a Question