Join 2 files with multiple columns: awk/grep/join?

    #1  
Old Unix and Linux 12-01-2009   -   Original Discussion by InfoSeeker
InfoSeeker, Registered User

Hello,
My apologies if this has been posted elsewhere. I have looked at several threads, but I am still confused about how to use these tools. I have two files, each with 5 columns:

File A: (tab-delimited)
PDB CHAIN Start End Fragment
1avq A 171 176 awyfan
1avq A 172 177 wyfany
1c7k A 2 7 vtvtyd
1c7k A 3 8 tvtydp

File B: (tab-delimited)
PDB CHAIN Start End Fragment
1cfe A 104 109 awyfan
1cfe A 105 110 lgcgra
1dk0 A 50 55 awyfan
1d3g A 83 88 fveigs
1d3g A 84 89 vtvtyd
1dk0 A 51 56 gsqyai

I want to join the rows of the two tables wherever the fifth column (Fragment) matches. The output should read:

Fragment PDB CHAIN Start End PDB CHAIN Start End
awyfan 1avq A 171 176 1cfe A 104 109
awyfan 1avq A 171 176 1dk0 A 50 55
vtvtyd 1c7k A 2 7 1d3g A 84 89

Kindly note that a fragment can match more than once between File A and File B.

I have read about the join command in Unix, but when I tried it (join -1 1 FileA FileB, based on a similar thread) I ended up with the output of File A only.
Alternatively, I know that I can use awk (NC=='5' I think), but I am not very familiar with its syntax. I know how to use fgrep when the file contains a single column, but not with multiple columns.
Is there a simple way to do this?

Your help will be really appreciated!
Thanks in advance!
DG
    #2  
Old Unix and Linux 12-01-2009   -   Original Discussion by InfoSeeker
Scrutinizer, Moderator (Forum Staff)
I think you'll probably have to use something like awk, since you also want to move column 5 to column 1. E.g.:

Code:
awk '{i=$5;$5=x} NR==FNR{A[i]=$0;next} A[i]{print i,A[i]$0}' fileA fileB


Last edited by Scrutinizer; 12-01-2009 at 09:06 PM..
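
For readers who want the logic spelled out, here is a commented expansion of the one-liner above. It is a sketch, not a replacement: the awk program is the same apart from whitespace and comments, the sample files are recreated from the first post, and the output name joined.txt is made up for illustration.

```shell
# Sample data recreated from the first post (tab-delimited).
printf 'PDB\tCHAIN\tStart\tEnd\tFragment\n1avq\tA\t171\t176\tawyfan\n1avq\tA\t172\t177\twyfany\n1c7k\tA\t2\t7\tvtvtyd\n1c7k\tA\t3\t8\ttvtydp\n' > fileA
printf 'PDB\tCHAIN\tStart\tEnd\tFragment\n1cfe\tA\t104\t109\tawyfan\n1cfe\tA\t105\t110\tlgcgra\n1dk0\tA\t50\t55\tawyfan\n1d3g\tA\t83\t88\tfveigs\n1d3g\tA\t84\t89\tvtvtyd\n1dk0\tA\t51\t56\tgsqyai\n' > fileB

awk '
{
    i = $5      # remember the fragment (the join key) of the current line
    $5 = x      # x is never set, so this empties field 5; awk rebuilds $0 with single spaces
}
NR == FNR {     # true only while the first file (fileA) is being read
    A[i] = $0   # store the remaining four columns under the fragment
    next        # do not run the rule below for fileA lines
}
A[i] {          # a fileB line whose fragment was stored from fileA
    print i, A[i] $0
}' fileA fileB > joined.txt

cat joined.txt
```

Two things follow from how this is written. Because A[i] holds a single string, a fragment that occurs more than once in fileA keeps only its last row. The output is space-separated (with a trailing space) because awk rebuilds $0 with its default output separator once a field is assigned. The header line comes through because "Fragment" is matched like any other key.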
    #3  
Old Unix and Linux 12-01-2009   -   Original Discussion by InfoSeeker
binlib, Registered User

Code:
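# bash; $'\t' is a literal tab, and both inputs must be sorted on the join field (column 5)
join -t$'\t' -j5 -o1.5,1.1,1.2,1.3,1.4,2.1,2.2,2.3,2.4 <(sort -t$'\t' -k5,5 FileA) <(sort -t$'\t' -k5,5 FileB)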


Last edited by binlib; 12-01-2009 at 08:40 PM..
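
The join approach can also be written step by step, which may make it easier to see what each option does. This is only a sketch under some assumptions: it uses plain sh with temporary files instead of process substitution, the sample data is recreated from the first post, and the file names A.sorted, B.sorted and joined_join.txt are made up for illustration. The header is stripped before sorting and printed separately so that it cannot be sorted into the middle of the data.

```shell
# Sample data recreated from the first post (tab-delimited).
printf 'PDB\tCHAIN\tStart\tEnd\tFragment\n1avq\tA\t171\t176\tawyfan\n1avq\tA\t172\t177\twyfany\n1c7k\tA\t2\t7\tvtvtyd\n1c7k\tA\t3\t8\ttvtydp\n' > FileA
printf 'PDB\tCHAIN\tStart\tEnd\tFragment\n1cfe\tA\t104\t109\tawyfan\n1cfe\tA\t105\t110\tlgcgra\n1dk0\tA\t50\t55\tawyfan\n1d3g\tA\t83\t88\tfveigs\n1d3g\tA\t84\t89\tvtvtyd\n1dk0\tA\t51\t56\tgsqyai\n' > FileB

tab=$(printf '\t')   # a literal tab character, portable across sh implementations

# join needs both inputs sorted on the join field, so sort on column 5 only.
# tail -n +2 skips the header line of each file.
tail -n +2 FileA | sort -t "$tab" -k5,5 > A.sorted
tail -n +2 FileB | sort -t "$tab" -k5,5 > B.sorted

{
    # Header written by hand, in the column order requested in the first post.
    printf 'Fragment\tPDB\tCHAIN\tStart\tEnd\tPDB\tCHAIN\tStart\tEnd\n'
    # -1 5 -2 5 : join on field 5 of file 1 and field 5 of file 2
    # -o        : output list as FILE.FIELD, putting the fragment first
    join -t "$tab" -1 5 -2 5 -o 1.5,1.1,1.2,1.3,1.4,2.1,2.2,2.3,2.4 A.sorted B.sorted
} > joined_join.txt

cat joined_join.txt
```

For comparison, the attempt in the first post, join -1 1 FileA FileB, joins on column 1 (PDB) of both files rather than on column 5, and it was run on unsorted input.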
    #4  
Old Unix and Linux 12-01-2009   -   Original Discussion by InfoSeeker
InfoSeeker, Registered User
Thank you both for your replies. Unfortunately, I have tried both of them without success:

1. With the awk script, I only got the headers as output:
PDB CHAIN StartPos EndPos PDB CHAIN StartPos EndPos

2. With the join command, I redirected the output to a txt file, but the file ended up empty.

Could you both kindly explain the code to me? Maybe if I get the logic then I can modify it accordingly. Thanks!

---------- Post updated at 04:45 PM ---------- Previous update was at 04:39 PM ----------

I double-checked: my files were wrongly formatted, so the commands weren't working properly. Both are working fine now!
Thank you very much!!

PS: Is there a way, however, to modify it so that I don't end up with duplicates?
For example, I end up with
glelk 1avq A 129 133 1avq A 129 133
which is basically the same row appearing in both File A and File B.
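
One possible way to drop such self-matches, offered as a sketch rather than an answer from the thread: compare the stored File A columns with the current File B columns and print only when they differ. The files dupA and dupB and the output name nodup.txt are hypothetical; they are built around the glelk row quoted above plus one row from the first post.

```shell
# Hypothetical input: the glelk row is identical in both files, the awyfan rows differ.
printf 'PDB\tCHAIN\tStart\tEnd\tFragment\n1avq\tA\t129\t133\tglelk\n1avq\tA\t171\t176\tawyfan\n' > dupA
printf 'PDB\tCHAIN\tStart\tEnd\tFragment\n1avq\tA\t129\t133\tglelk\n1cfe\tA\t104\t109\tawyfan\n' > dupB

awk '
{ i = $5; $5 = x }              # key on the fragment, then blank field 5 as before
NR == FNR { A[i] = $0; next }   # first file: remember the four remaining columns
# Second file: print when the fragment was seen and the rows are not identical.
# FNR == 1 lets the header through, since the two header lines are identical too.
(i in A) && (FNR == 1 || A[i] != $0) { print i, A[i] $0 }
' dupA dupB > nodup.txt

cat nodup.txt
```

With this input, the glelk line is skipped while the header and the awyfan match are kept. The same single-row-per-fragment limitation of A[i] applies here as well.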