

Join 2 files with multiple columns: awk/grep/join?



# 1  
Old 12-01-2009

My apologies if this has been posted elsewhere. I have looked at several threads, but I am still confused about how to use these tools. I have two files, each with 5 columns:

File A: (tab-delimited)
PDB CHAIN Start End Fragment
1avq A 171 176 awyfan
1avq A 172 177 wyfany
1c7k A 2 7 vtvtyd
1c7k A 3 8 tvtydp

File B: (tab-delimited)
PDB CHAIN Start End Fragment

1cfe A 104 109 awyfan
1cfe A 105 110 lgcgra
1dk0 A 50 55 awyfan
1d3g A 83 88 fveigs
1d3g A 84 89 vtvtyd
1dk0 A 51 56 gsqyai

I want to join the rows of the two tables wherever the fifth column (Fragment) matches. The output should read:

Fragment PDB CHAIN Start End PDB CHAIN Start End
awyfan 1avq A 171 176 1cfe A 104 109
awyfan 1avq A 171 176 1dk0 A 50 55
vtvtyd 1c7k A 2 7 1d3g A 84 89

Kindly note that a fragment can match more than once between Files A and B.

I have read about the Unix join command, but when I tried it (join -1 1 FileA FileB, based on a similar thread) I ended up with the contents of File A only.
Alternatively, I know that I can use awk (NC=='5', I think), but I am not very familiar with it. I know how to use fgrep when the file contains a single column, but not multiple columns.
Is there a simple way to write this?

Your help will be really appreciated!
Thanks in advance!
# 2  
Old 12-01-2009
I think you'll probably have to use something like awk, since you also move column 5 to column 1. E.g.:
awk '{i=$5;$5=x} NR==FNR{A[i]=$0;next} A[i]{print i,A[i]$0}' fileA fileB
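A commented sketch of the same idea, spelled out step by step. It assumes tab-delimited input with the fragment in column 5, and it keeps every fileA row per fragment, so a fragment that occurs several times in fileA produces one output line per pairing. The function name join_on_fragment is only an illustration, not something from the posts above.

```shell
# join_on_fragment FILE_A FILE_B
# Hypothetical helper; assumes tab-delimited files, fragment in column 5.
join_on_fragment() {
  awk -F'\t' -v OFS='\t' '
    # Skip blank or short lines that have no fifth column.
    NF < 5 { next }

    # First file (NR == FNR): store columns 1-4 of every row, per fragment.
    NR == FNR {
      n = ++count[$5]
      rows[$5, n] = $1 OFS $2 OFS $3 OFS $4
      next
    }

    # Second file: for each stored fileA row with the same fragment,
    # print fragment, fileA columns 1-4, then fileB columns 1-4.
    ($5 in count) {
      for (k = 1; k <= count[$5]; k++)
        print $5, rows[$5, k], $1, $2, $3, $4
    }
  ' "$1" "$2"
}
```

Called as join_on_fragment fileA fileB, the header lines match each other on the word "Fragment", so the header row of the output comes out for free. Rows appear in fileB order.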

Last edited by Scrutinizer; 12-01-2009 at 08:06 PM..
# 3  
Old 12-01-2009
join -t$'\t' -j5 -o1.5,1.1,1.2,1.3,1.4,2.1,2.2,2.3,2.4 <(sort -k5 FileA) <(sort -k5 FileB)
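For reference, a sketch of what the join approach is doing, written with temporary files instead of process substitution so it should also run in a plain POSIX sh. Assumptions: both files are tab-delimited with a one-line header, mktemp is available, and the function name join_sorted is made up for this example. join needs both inputs sorted on the join field, -t sets the delimiter, -1 5 -2 5 selects column 5 in each file, and -o lists the output columns as file.field pairs.

```shell
# join_sorted FILE_A FILE_B
# Hypothetical helper; assumes tab-delimited files with a header on line 1.
join_sorted() {
  tab=$(printf '\t')
  tmpdir=$(mktemp -d) || return 1

  # Drop the header line, then sort each file on column 5 only.
  # LC_ALL=C keeps the sort order consistent with what join expects.
  tail -n +2 "$1" | LC_ALL=C sort -t "$tab" -k5,5 > "$tmpdir/a"
  tail -n +2 "$2" | LC_ALL=C sort -t "$tab" -k5,5 > "$tmpdir/b"

  # Write the header by hand, since the data headers were removed above.
  printf 'Fragment\tPDB\tCHAIN\tStart\tEnd\tPDB\tCHAIN\tStart\tEnd\n'

  # Join on column 5 of both files; output the fragment first,
  # then columns 1-4 of file 1, then columns 1-4 of file 2.
  LC_ALL=C join -t "$tab" -1 5 -2 5 \
    -o 1.5,1.1,1.2,1.3,1.4,2.1,2.2,2.3,2.4 \
    "$tmpdir/a" "$tmpdir/b"

  rm -rf "$tmpdir"
}
```

Unlike the awk version, the output here is ordered by fragment, because both inputs have to be sorted first.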

Last edited by binlib; 12-01-2009 at 07:40 PM..
# 4  
Old 12-01-2009
Thank you both for your replies. Unfortunately, I have tried both of them:

1. With the awk script, I only got the headers as output:
PDB CHAIN StartPos EndPos PDB CHAIN StartPos EndPos

2. With the join command, I redirected the output to a txt file but ended up with nothing.

Could you both kindly explain the code to me? If I understand the logic, maybe I can modify it accordingly. Thanks!

---------- Post updated at 04:45 PM ---------- Previous update was at 04:39 PM ----------

I double-checked: my files were wrongly formatted, so the commands weren't working properly. Both are working fine now!
Thank you very much!!

PS: Is there a way to modify it so that I don't end up with duplicates?
For example, I end up with
glelk 1avq A 129 133 1avq A 129 133
which is basically an identical row in both File A and File B.
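One possible way to drop such self-matches is to filter the joined output afterwards, assuming it is tab-delimited with the nine columns shown earlier (Fragment, then PDB CHAIN Start End from each file) and a header on the first line. This is only a sketch, and the function name drop_self_matches is invented for the example. It removes a row when columns 2-5 are identical to columns 6-9.

```shell
# drop_self_matches [FILE...]
# Hypothetical filter; reads the joined output from files or from stdin.
drop_self_matches() {
  # Keep line 1 (the header, whose two halves are identical by design).
  # Keep any other line unless PDB, CHAIN, Start and End agree on both sides.
  awk -F'\t' 'NR == 1 || !($2 == $6 && $3 == $7 && $4 == $8 && $5 == $9)' "$@"
}
```

It can be put at the end of either pipeline, for example: awk '...' fileA fileB | drop_self_matches, provided the command before it writes tab-separated output.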

