Matching 2 files based on one column


 
# 1  
Old 11-11-2010

Hi,

I have a question on a similar subject. I have two files:

file1.txt
Code:
dbSNP_rsID,Chromosome,Position,Gene
rs10399749,chr. 01,45162,?
rs4030303,chr. 01,72434,?
rs4030300,chr. 01,72515,?
rs940550,chr. 01,78032,?
rs13328714,chr. 01,81468,?
rs11490937,chr. 01,222077,?
rs6683466,chr. 01,524446,"OR4F29, OR4F16, OR4F3"
rs12025928,chr. 01,536560,"OR4F29, OR4F16, OR4F3"

file2.txt
Code:
dbSNP_rsID	R_square
rs6650104	0.000
rs9629043	0.000
rs11497407	0.000
rs12565286	0.332
rs11804171	0.338
rs2977670	0.352
rs2977656	0.000

And now I would like to match the files (file1 to file2) on dbSNP_rsID so that I get a new file:

file_new.txt
dbSNP_rsID,R_square,Chromosome,Position,Gene
rs12565286,0.332,chr. 01,711153,"OR4F29, OR4F16, OR4F3"


I've really tried to understand the code you guys have supplied, but I just don't get it. Could you please help me with some code?

Thanks!

Sander

Last edited by Franklin52; 11-12-2010 at 05:44 AM.. Reason: Please use code tags!
# 2  
Old 11-11-2010
Hi Sander,

Welcome to the forum. In future, it is best to create a new thread and refer to the existing thread as an example.
Your case is a little bit simpler, except that your input files use different field separators.
You could give this a try:
Code:
awk 'NR==FNR{A[$1]=$2;next}A[$1]{$2=A[$1] FS $2;print}' file2 FS=, OFS=, file1

This prints only those records of file1 that have a match in file2, and inserts the second field from file2 (R_square) right after the first field.
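A spelled-out variant of the one-liner above, offered as a sketch rather than a replacement. It tests `$1 in A` instead of `A[$1]`: in POSIX awk implementations such as gawk, a pattern like `A[$1]` tests the stored value, so an R_square of 0.000 counts as false and the row would be skipped. The sample rows are trimmed from the thread; the position for rs2977656 is a made-up placeholder.

```shell
# Throwaway sample data. file1 is comma-separated, file2 is tab-separated.
# The position given for rs2977656 is a made-up placeholder.
printf '%s\n' \
  'dbSNP_rsID,Chromosome,Position,Gene' \
  'rs12565286,chr. 01,711153,?' \
  'rs2977656,chr. 01,999999,?' \
  'rs3094315,chr. 01,742429,NCRNA00115' > file1.txt
printf '%s\t%s\n' \
  dbSNP_rsID R_square \
  rs12565286 0.332 \
  rs2977656 0.000 > file2.txt

awk '
  NR == FNR { A[$1] = $2; next }   # first file (file2): remember R_square per rsID
  $1 in A   {                      # second file (file1): key was seen in file2
    $2 = A[$1] FS $2               # put R_square in front of the Chromosome field
    print                          # assigning $2 rebuilds the record with OFS
  }
' file2.txt FS=, OFS=, file1.txt > file3.txt

cat file3.txt
```

The row for rs3094315 is dropped because it has no entry in file2, while rs2977656 is kept even though its R_square is 0.000.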
This User Gave Thanks to Scrutinizer For This Post:
# 3  
Old 11-12-2010
Code

Hi,

Thanks a bunch! It works! I now have a new file, made with this code:

Code:
awk 'NR==FNR{A[$1]=$2;next}A[$1]{$2=A[$1] FS $2;print}' file2.txt FS=, OFS=, file1.txt > file3.txt

Thank you also for pointing out how the forum works. I do have some questions about the code, though: can you explain the parts? I don't fully understand what each part does, and if I understood it I could maybe learn new commands to work with my files.

Thanks again.

Sander

---------- Post updated at 10:17 AM ---------- Previous update was at 10:13 AM ----------

I'd also like to extend the original file2 with the columns from file1 wherever there is a match. If there's no match, I just want those columns to remain blank for that row of file2. Is that possible too?
# 4  
Old 11-12-2010
Can you post the desired output, based on the given 2 files?

Please use code tags.
# 5  
Old 11-12-2010
Hi,

Here's the file I'd like to get:

Quote:
dbSNP_rsID,R_square,Chromosome,Position,Gene
rs6650104,0.000,,,
rs9629043,0.000,,,
rs11497407,0.000,,,
rs12565286,0.332,chr. 01,711153,?
rs11804171,0.338,chr. 01,713682,?
rs2977670,0.352,chr. 01,713754,?
rs3094315,0.997,chr. 01,742429,NCRNA00115
rs3131972,0.988,chr. 01,742584,?
rs3115860,0.934,chr. 01,743268,?
rs2073813,0.950,chr. 01,743404,?
For instance the first three lines are not in file1, but are in file2. I'd like to keep them in the final file3. So basically I'd like to add the information from file1 to file2, if there's information, if not, just leave it blank.

Thanks.

Sander
# 6  
Old 11-12-2010
Like this?
Code:
awk 'NR==FNR{A[$1]=$0;next}{if(A[$1]){sub(/[^,]*/,"",A[$1]);$2=$2 A[$1]}else $2=$2 ",,,"}1' FS=, OFS=, file1 FS="[ \t]+" file2

---------- Post updated at 12:30 ---------- Previous update was at 12:21 ----------

Or shorter:
Code:
awk 'NR==FNR{p=$1;$1=x;A[p]=$0;next}{$2=$2(A[$1]?A[$1]:",,,")}1' FS=, OFS=, file1 FS="[ \t]+" file2
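For readers who find the one-liners hard to follow, here is a longer sketch of the same idea: keep every record of file2 and append the file1 columns when the key matches, or three empty columns when it does not. The variable names `key`, `rest` and `extra` are only illustrative, and the sample rows are trimmed from the thread.

```shell
# Throwaway sample data. file1 is comma-separated, file2 is tab-separated.
printf '%s\n' \
  'dbSNP_rsID,Chromosome,Position,Gene' \
  'rs12565286,chr. 01,711153,?' \
  'rs3094315,chr. 01,742429,NCRNA00115' > file1.txt
printf '%s\t%s\n' \
  dbSNP_rsID R_square \
  rs6650104 0.000 \
  rs12565286 0.332 > file2.txt

awk '
  NR == FNR {                 # first file (file1, comma-separated)
    key = $1                  # save the rsID before the record is changed
    sub(/^[^,]*/, "")         # drop the key, keep ",Chromosome,Position,Gene"
    rest[key] = $0
    next
  }
  {                           # second file (file2): every record is printed
    extra = ($1 in rest) ? rest[$1] : ",,,"
    print $1 "," $2 extra
  }
' FS=, file1.txt FS='[ \t]+' file2.txt > file3.txt

cat file3.txt
```

Because the output line is built by hand, no OFS is needed here; the one-liners get the same effect by assigning to $2 with OFS set to a comma.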

---------- Post updated at 12:56 ---------- Previous update was at 12:30 ----------

Quote:
Originally Posted by swvanderlaan
[..]I do have some questions about the code, though: can you explain the parts? I don't fully understand what each part does, and if I understood it I could maybe learn new commands to work with my files.
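NR==FNR: true while we are reading the first file (the variables NR and FNR are only equal when reading the first file)
A[$1]=$2: store the second field in array A, using the first field as the index
next: skip the rest of the program and proceed to read the next record
A[$1]: for the second file, true if a value was stored for this $1. Note that this tests the stored value, so a value that looks like zero (such as 0.000) counts as false; $1 in A tests only whether the key exists
$2=A[$1] FS $2;print: replace the second field with A[$1], followed by FS (a comma), followed by the original second field, then print the record
FS=,: set the input field separator to ","
OFS=,: set the output field separator to ","
Because FS=, and OFS=, appear after file2 on the command line, they only take effect when awk starts reading file1; file2 is still split on whitespace.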
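The NR==FNR test is easiest to see by printing both counters. A minimal demo, using two throwaway files whose names are made up for this example:

```shell
# Two tiny throwaway files; the names are made up for this demo.
printf 'a\nb\n' > first.txt
printf 'c\nd\n' > second.txt

# NR counts records across all files, FNR restarts at 1 for each file,
# so the two are equal only while the first file is being read.
awk '{ print FILENAME, NR, FNR, (NR == FNR ? "first file" : "later file") }' \
  first.txt second.txt > nr_fnr_demo.txt

cat nr_fnr_demo.txt
# first.txt 1 1 first file
# first.txt 2 2 first file
# second.txt 3 1 later file
# second.txt 4 2 later file
```

One caveat: if the first file is empty, NR==FNR is also true for the second file, so the idiom assumes the first file has at least one record.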