Merging two files based on two columns to make a third file


 
# 1  
Old 04-26-2011

Hi there,

I'm trying to merge two files and make a third file.

However, two of the columns need to match exactly in both files AND I want everything from both files in the output if the two columns match in that row.

First file looks like this:

Code:
chr1    10001980    T    A

Second file looks like this:

Code:
1    chr1    41980    41981    snp    A    G    dbsnp.86:rs806721

I need columns 1 and 2 in file 1 to match columns 2 and 4 in file 2, respectively.

Any help you can provide is very much appreciated.

I've tried using the join command and awk, but I've failed miserably at both.

Thank you.
Moderator's Comments:
Please use code tags when posting data and code samples!

Last edited by vgersh99; 04-26-2011 at 06:00 PM.. Reason: code tags, please!
# 2  
Old 04-26-2011
Are you matching row 1 with row 1, row 2 with row 2, and so on, or can a match exist in any row?

Also, are you trying to write this in a specific language, or does it matter?
# 3  
Old 04-26-2011
A match can exist in any row. Every row of the first file should match some entry in the second file, but the second file has thousands of rows that will not match anything in the first file.

I'm working in bash in Terminal on a MacBook.

Thank you !!
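
Since every row of the first file is expected to find a partner in the second, it can help to list any rows that do not before merging. The following is a minimal sketch, assuming whitespace-delimited files; the scratch directory, the variable name `unmatched`, and the second sample row of file1 are made up for illustration.

```shell
#!/bin/bash
# Hypothetical sample data in a scratch directory (not the poster's real files).
tmpdir=$(mktemp -d)
printf '%s\n' 'chr1 10001980 T A' 'chr2 500 C G' > "$tmpdir/file1"
printf '%s\n' '1 chr1 41980 41981 snp A G dbsnp.86:rs806721' \
              '1 chr1 41980 10001980 snp A G dbsnp.86:rs806721' > "$tmpdir/file2"

# First pass (file2): remember every (column 2, column 4) pair.
# Second pass (file1): print rows whose (column 1, column 2) pair was never seen.
unmatched=$(awk 'FNR == NR { seen[$2, $4]; next }
                 !(($1, $2) in seen)' "$tmpdir/file2" "$tmpdir/file1")

echo "$unmatched"    # prints: chr2 500 C G
```

An empty result would mean every row of file1 has at least one matching row in file2.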
# 4  
Old 04-26-2011
This may not be the most efficient way if your files are really large, but it works. Let me know if any of it is unclear.

Code:
#!/bin/bash
# For every row of file1, find the rows of file2 whose columns 2 and 4
# equal columns 1 and 2 of that row, and append the merged row to file3.
while IFS= read -r line
do
    # Split on whitespace; only the first two fields are needed as keys.
    read -r col1 col2 _ <<< "$line"
    # awk compares whole fields, so space- and tab-delimited input both work.
    awk -v c1="$col1" -v c2="$col2" '$2 == c1 && $4 == c2' file2 |
    while IFS= read -r row_from_file2
    do
        echo "Found a match"
        echo "$line $row_from_file2" >> file3
    done
done < file1

Output:
Code:
# cat file1
chr1 10001980 T A
# cat file2
1 chr1 41980 41981 snp A G dbsnp.86:rs806721
1 chr1 41980 10001980 snp A G dbsnp.86:rs806721
# ./match_file1_file2.bash
Found a match
# cat file3
chr1 10001980 T A 1 chr1 41980 10001980 snp A G dbsnp.86:rs806721

# 5  
Old 04-26-2011
Wow, this is incredibly sophisticated compared to the basic Unix that I know.

The only thing I'm confused about is where to put the file paths. Each time it says file2, do I write the actual file path in (i.e. /users/etc/)?
# 6  
Old 04-26-2011
Yep. "file1", "file2" and "file3" were all local to my working directory. I'm a big fan of using full paths in my scripts, so I would suggest it. I just wrote everything local to get the logic working. Happy scripting.
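
One way to avoid retyping a path every time a file is mentioned is to set each path once in a variable at the top of the script and use the variable everywhere else. The following is a sketch under that assumption: `workdir` and the sample rows are placeholders so that it runs on its own, and the three assignments are where your own full paths would go.

```shell
#!/bin/bash
# Placeholder locations: swap in your own full paths here, and only here.
workdir=$(mktemp -d)
file1="$workdir/file1"
file2="$workdir/file2"
file3="$workdir/file3"

# Hypothetical sample rows so the sketch runs on its own.
echo 'chr1 10001980 T A' > "$file1"
echo '1 chr1 41980 10001980 snp A G dbsnp.86:rs806721' > "$file2"

# Stop early with a clear message if an input path is wrong.
for f in "$file1" "$file2"; do
    [[ -r "$f" ]] || { echo "cannot read $f" >&2; exit 1; }
done

# Same matching idea as the loop in post 4, with paths taken from the variables.
while read -r col1 col2 rest; do
    awk -v c1="$col1" -v c2="$col2" '$2 == c1 && $4 == c2' "$file2" |
    while IFS= read -r match; do
        echo "$col1 $col2 $rest $match"
    done
done < "$file1" > "$file3"

cat "$file3"
```

Quoting the variables ("$file2" rather than $file2) keeps the script working when a path contains spaces, which is common in macOS home directories.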
# 7  
Old 04-26-2011
Code:
nawk 'FNR==NR{f1[$1,$2]=$0;next}{idx=$2 SUBSEP $4; if(idx in f1) $0=f1[idx] OFS $0}1' file1 file2
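
For readers new to the idiom: the one-liner stores each row of file1 in an array keyed on columns 1 and 2, then looks up each row of file2 by its columns 2 and 4. As written, the trailing `1` prints every file2 row, whether or not it was merged. The sketch below is a commented variant that prints only the matched rows. It calls plain `awk` on the assumption that `nawk` may not be available under that name on every system, and it uses the sample rows from post 4 in a scratch directory.

```shell
#!/bin/bash
# Sample rows from the thread, written to a scratch directory.
tmpdir=$(mktemp -d)
echo 'chr1 10001980 T A' > "$tmpdir/file1"
printf '%s\n' '1 chr1 41980 41981 snp A G dbsnp.86:rs806721' \
              '1 chr1 41980 10001980 snp A G dbsnp.86:rs806721' > "$tmpdir/file2"

awk '
    # While reading the first file (FNR equals NR only there),
    # store each row under a key built from columns 1 and 2.
    FNR == NR { f1[$1, $2] = $0; next }

    # For each row of the second file, print the stored file1 row
    # followed by the file2 row, but only when columns 2 and 4 match a key.
    ($2, $4) in f1 { print f1[$2, $4], $0 }
' "$tmpdir/file1" "$tmpdir/file2" > "$tmpdir/file3"

cat "$tmpdir/file3"
```

This reads each file only once, so it should scale better than a loop that rescans file2 for every row of file1. One caveat: if file1 contains two rows with the same first two columns, only the last one is kept in the array.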
