Get columns from another file for match in col 2 in 1st file


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting Get columns from another file for match in col 2 in 1st file
# 1  
Old 12-27-2009
Get columns from another file for match in col 2 in 1st file

Hi,
My first file has
Code:
592155 9 rs16916098 1
592156 19 rs7249604 1
592157 4 rs885156 1
592158 5 rs350067 1

2nd file has
Code:
9       rs16916098      0       113228129       2       4
19      rs7249604       0       58709070        4       2
2       rs17042833      0       113558750       4       2
4       rs10517059      0       24534680        2       1
4       rs885156        0       140779838       1       2

You will notice that Col3 of file1 = Col2 of file2.
I want to get for each value of Col2 in file 1 the equivalent matching Col 4 from file 2 and append it to the last column of the first file like this..

result_file
Code:
592155     9     rs16916098     1    113228129
592156     19     rs7249604     1  58709070
592157     4     rs885156     1  140779838

Trying to get this done with awk. Found this old thread with a similar problem
https://www.unix.com/shell-programmin...uce-3rd-2.html
But I am not smart enough to figure the way awk works with the provided solution..
Code:
awk -F, 'NR==FNR{a[$1$2]=$3;next}a[$1$2]{$4=a[$1$2];print}' OFS="," file1 file2

It would be very helpful to de-mystify the above code for the solution in the link.


Thanks
~GH

Last edited by genehunter; 12-27-2009 at 10:35 PM..
# 2  
Old 12-28-2009
Debian

Hi, this should do the trick:

Code:
awk 'NR==FNR{A[$1$2]=$4;next} $5=A[$2$3]' file2 file1

output;
Code:
592155 9 rs16916098 1 113228129
592156 19 rs7249604 1 58709070
592157 4 rs885156 1 140779838

You asked for some demystification. The code says:
NR==FNR if we are reading the first input file (i.e. file2 if we are reading file1 then these are unequal) then execute the section in curly brackets.
A[$1$2]=$4 store the value in column 4 in array A with a keyvalue consist of field 1 and 2 glued together
nextread the next line of the input file; do not execute the statements that follow.
$5=A[$2$3] create a fifth field (when reading file1, the second file) and fill it with the previously stored value, but this time unlocked by the value $2 an $3 combined. In this context - outside of curly brackets - this also means that if this last operation is successful then print all fields.

You can introduce a tab character so the output looks neatly arranged if required:
Code:
awk 'NR==FNR{A[$1$2]=$4;next} $5=A[$2$3]' OFS="\t" file2 file1

output:
Code:
592155  9       rs16916098      1       113228129
592156  19      rs7249604       1       58709070
592157  4       rs885156        1       140779838


Last edited by Scrutinizer; 12-28-2009 at 01:18 AM..
# 3  
Old 12-30-2009
Thank you very much and especially for explaining the rationale.
It worked on the sample data. But I dont get output with my real data. I am trying to see if the format is messed up in the 1 million lines somewhere Smilie
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. UNIX for Beginners Questions & Answers

Compare 1st column from 2 file and if match print line from 1st file and append column 7 from 2nd

hi I have 2 file with more than 10 columns for both 1st file apple,0,0,0...... orange,1,2,3..... mango,2,4,5..... 2nd file apple,2,3,4,5,6,7... orange,2,3,4,5,6,8... watermerlon,2,3,4,5,6,abc... mango,5,6,7,4,6,def.... (1 Reply)
Discussion started by: tententen
1 Replies

2. Shell Programming and Scripting

Match Columns in one file and extract columns from another file

Kindly help merging information from two files with the following data structure. I want to match for the CHR-SNP in Foo and get the columns that match from CHROM-rsID Fields 1 & 2 of foo may have duplicates, however, a joint key of Fields $1$2$3$4 is unique. Also would be helpful to clean up... (4 Replies)
Discussion started by: genehunter
4 Replies

3. Shell Programming and Scripting

Match pattern from file 1 with any/all columns in file 2

Hi, I have been looking everywhere for an example so I can try and do this myself but I am having difficulty. I have 2 large files of different sizes and if the pattern in the 3rd column in file 1 is in "any" column in file 2 I want to print all of the line in file 1 and append that line with the... (5 Replies)
Discussion started by: kieranfoley
5 Replies

4. Shell Programming and Scripting

Run a program-print parameters to output file-replace op file contents with max 4th col

Hi Friends, This is the only solution to my task. So, any help is highly appreciated. I have a file cat input1.bed chr1 100 200 abc chr1 120 300 def chr1 145 226 ghi chr2 567 600 unix Now, I have another file by name input2.bed (This file is a binary file not readable by the... (7 Replies)
Discussion started by: jacobs.smith
7 Replies

5. Shell Programming and Scripting

Fortmating text file - single col to X columns

Hi, I have a large file with 100,000+ lines. Each line have a single value. How do I effiecntly format the text to be X number of commas spaced fields in width. i.e how do i turn this input: 26 53 79 80 81 82 111 131 135 139 192 193 198 199 203 209 (2 Replies)
Discussion started by: carlr
2 Replies

6. Shell Programming and Scripting

Match and print columns in second file

Hi All, I have to match each row in file 1 with 1st row in file 2 and print the corresponding column from file2. I am trying to use an awk script to do this. For example cat File1 X1 X3 X4 cat File2 ID X1 X2 X3 X4 A 1 6 2 1 B 2 7 3 3 C 3 8 4 1 D 4 9 1 1 (3 Replies)
Discussion started by: newpro
3 Replies

7. Shell Programming and Scripting

Strings from one file which exactly match to the 1st column of other file and then print lines.

Hi, I have two files. 1st file has 1 column (huge file containing ~19200000 lines) and 2nd file has 2 columns (small file containing ~6000 lines). ################################# huge_file.txt a a ab b ################################## small_file.txt a 1.5 b 2.5 ab ... (4 Replies)
Discussion started by: AshwaniSharma09
4 Replies

8. Shell Programming and Scripting

Compare - 1st col of file

Hi, I have two different files, one has two columns and other has only one column. I would like to compare the first column in the first file with the data in the second file and write a third file with the data that is not present is not common to them. First file:... (26 Replies)
Discussion started by: swame_sp
26 Replies

9. Ubuntu

Match col 1 of File 1 with col 1 File 2 and create a 3rd file

Hello, I have a 1.6 GB file that I would like to modify by matching some ids in col1 with the ids in col 1 of file2.txt and save the results into a 3rd file. For example: File 1 has 1411 rows, I ignore how many columns it has (thousands) File 2 has 311 rows, 1 column Would like to... (7 Replies)
Discussion started by: sogi
7 Replies

10. UNIX for Dummies Questions & Answers

two files.say a and b.both have long columns.i wanna match the column fron 1st file w

ex: a file has : 122323 123456456 125656879 678989965t635 234323432 b has : this is finance no. this is phone no this is extn ajkdgag idjsidj i want the o/p as: 122323 his is finance no. 123456456 this is phone no 123456456 ... (4 Replies)
Discussion started by: TRUPTI
4 Replies
Login or Register to Ask a Question