Sponsored Content
Top Forums UNIX for Dummies Questions & Answers Join 2 files with multiple columns: awk/grep/join? Post 302376539 by InfoSeeker on Tuesday 1st of December 2009 04:06:38 PM
Old 12-01-2009
Data Join 2 files with multiple columns: awk/grep/join?

Hello,
My apologies if this has been posted elsewhere, I have had a look at several threads but I am still confused how to use these functions. I have two files, each with 5 columns:

File A: (tab-delimited)
PDB CHAIN Start End Fragment
1avq A 171 176 awyfan
1avq A 172 177 wyfany
1c7k A 2 7 vtvtyd
1c7k A 3 8 tvtydp

File B: (tab-delimited)
PDB CHAIN Start End Fragment

1cfe A 104 109 awyfan
1cfe A 105 110 lgcgra
1dk0 A 50 55 awyfan
1d3g A 83 88 fveigs
1d3g A 84 89 vtvtyd
1dk0 A 51 56 gsqyai

I want to join the rows of two tables based on overlaps of the fifth column (column fragment). As such, the output should read:

Fragment PDB CHAIN Start End PDB CHAIN Start End
awyfan 1avq A 171 176 1cfe A 104 109
awyfan 1avq A 171 176 1dk0 A 50 55
vtvtyd 1c7k A 2 7 1d3g A 84 89

Kindly note that there can be multiple overlaps between Files A and B.

I have read of using the join function in Unix, but when I tried it I ended up with output of File A only. (I tried join -1 1 FileA FileB) based on reading a similar thread.
Alternatively, I know that I can use awk (NC=='5' I think), but I am not very familiar with the script. I know how to fgrep if the file contains a single column, but not multiple columns.
Is there a simple way to write this???

Your help will be really appreciated!
Thanks in advance!
DG
 

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Join columns from 2 files

Hello guys. I need to join columns of Start Time and End time of 2 files. The Examples is: File 1 APPLICATION GROUP_NAME JOB_NAME ODATE STATUS START_TIME END_TIME :: MODCMPDA EDPONLINE MC00A1700 071102 Ended OK 20071102 17:00:01 File 2 APPLICATION GROUP_NAME JOB_NAME ODATE... (2 Replies)
Discussion started by: osramos
2 Replies

2. Programming

sql,multiple join,outer join issue

example sql: select a.a1,b.b1,c.c1,d.d1,e.e1 from a left outer join b on a.x=b.x left outer join c on b.y=c.y left outer join d on d.z=a.z inner join a.t=e.t I know how single outer or inner join works in sql. But I don't really understand when there are multiple of them. can... (0 Replies)
Discussion started by: robbiezr
0 Replies

3. Shell Programming and Scripting

Join multiple files by column with awk

Hi all, I searched through the forum but i can't manage to find a solution. I need to join a set of files placed in a directory (~1600) by column, and obtain an output with first and second column common to each file, but following columns are taken from the file in the list (precisely the fourth... (10 Replies)
Discussion started by: macsx82
10 Replies

4. Shell Programming and Scripting

Awk - join multiple files

Is it possible to join all the files with input1 based on 1st column? input1 a b c d e f input2 a b input3 a e input4 c (2 Replies)
Discussion started by: quincyjones
2 Replies

5. UNIX for Dummies Questions & Answers

How to use the the join command to join multiple files by a common column

Hi, I have 20 tab delimited text files that have a common column (column 1). The files are named GSM1.txt through GSM20.txt. Each file has 3 columns (2 other columns in addition to the first common column). I want to write a script to join the files by the first common column so that in the... (5 Replies)
Discussion started by: evelibertine
5 Replies

6. Shell Programming and Scripting

Join 4 files on first three columns

Hi, Can someone suggest me on how to join 4 files by comparing the first three columns? ---------- Post updated at 03:56 PM ---------- Previous update was at 03:42 PM ---------- Hope it helps someone. I was looking online for a solution and on stackoverflow, I found a solution and tried... (6 Replies)
Discussion started by: jacobs.smith
6 Replies

7. Shell Programming and Scripting

Sort and join multiple columns using awk

Is it possible to join all the values after sorting them based on 1st column key and replace empty rows with 0 like below ? input a1 0 a1 1 a1 1 a3 1 b2 1 a2 1 a4 1 a2 1 a4 1 c4 1 a3 1 d1 1 a3 1 b1 1 d1 1 a4 1 c4 1 b2 1 b1 1 b2 1 c4 1 d1 1 output... (8 Replies)
Discussion started by: quincyjones
8 Replies

8. Shell Programming and Scripting

Join two files combining multiple columns and produce mix and match output

I would like to join two files when two columns in each file matches with each other and then produce an output when taking multiple columns. Like I have file A 1234,ABCD,23,JOHN,NJ,USA 2345,ABCD,24,SAM,NY,USA 5678,GHIJ,24,TOM,NY,USA 5678,WXYZ,27,MAT,NJ,USA and file B ... (2 Replies)
Discussion started by: mady135
2 Replies

9. Shell Programming and Scripting

Join and merge multiple files with duplicate key and fill void columns

Join and merge multiple files with duplicate key and fill void columns Hi guys, I have many files that I want to merge: file1.csv: 1|abc 1|def 2|ghi 2|jkl 3|mno 3|pqr file2.csv: (5 Replies)
Discussion started by: yjacknewton
5 Replies

10. Shell Programming and Scripting

Join, merge, fill NULL the void columns of multiples files like sql "LEFT JOIN" by using awk

Hello, This post is already here but want to do this with another way Merge multiples files with multiples duplicates keys by filling "NULL" the void columns for anothers joinning files file1.csv: 1|abc 1|def 2|ghi 2|jkl 3|mno 3|pqr file2.csv: 1|123|jojo 1|NULL|bibi... (2 Replies)
Discussion started by: yjacknewton
2 Replies
join(1) 						      General Commands Manual							   join(1)

NAME
join - relational database operator SYNOPSIS
[options] file1 file2 DESCRIPTION
forms, on the standard output, a join of the two relations specified by the lines of file1 and file2. If file1 or file2 is the standard input is used. file1 and file2 must be sorted in increasing collating sequence (see Environment Variables below) on the fields on which they are to be joined; normally the first in each line. The output contains one line for each pair of lines in file1 and file2 that have identical join fields. The output line normally consists of the common field followed by the rest of the line from file1, then the rest of the line from file2. The default input field separators are space, tab, or new-line. In this case, multiple separators count as one field separator, and lead- ing separators are ignored. The default output field separator is a space. Some of the below options use the argument n. This argument should be a or a referring to either file1 or file2, respectively. Options In addition to the normal output, produce a line for each unpairable line in file n, where n is or Replace empty output fields by string s. Join on field m of both files. The argument m must be delimited by space characters. This option and the following two are provided for backward compatibility. Use of the and options ( see below ) is recommended for portability. Join on field m of file1. Join on field m of file2. Each output line comprises the fields specified in list, each element of which has the form where n is a file number and m is a field number. The common field is not printed unless specifically requested. Use character c as a separator (tab character). Every appearance of c in a line is significant. The character c is used as the field sepa- rator for both input and output. Instead of the default output, produce a line only for each unpairable line in file_number, where file_number is or Join on field f of file 1. Fields are numbered starting with 1. Join on field f of file 2. Fields are numbered starting with 1. EXTERNAL INFLUENCES
Environment Variables determines the collating sequence expects from input files. determines the alternative blank character as an input field separator, and the interpretation of data within files as single and/or multi- byte characters. also determines whether the separator defined through the option is a single- or multi-byte character. If or is not specified in the environment or is set to the empty string, the value of is used as a default for each unspecified or empty variable. If is not specified or is set to the empty string, a default of ``C'' (see lang(5)) is used instead of If any internationaliza- tion variable contains an invalid setting, behaves as if all internationalization variables are set to ``C'' (see environ(5)). International Code Set Support Single- and multi-byte character code sets are supported with the exception that multi-byte-character file names are not supported. EXAMPLES
The following command line joins the password file and the group file, matching on the numeric group ID, and outputting the login name, the group name, and the login directory. It is assumed that the files have been sorted in the collating sequence defined by the or environment variable on the group ID fields. The following command produces an output consisting all possible combinations of lines that have identical first fields in the two sorted files sf1 and sf2, with each line consisting of the first and third fields from and the second and fourth fields from WARNINGS
With default field separation, the collating sequence is that of with the sequence is that of a plain sort. The conventions of and are incongruous. Numeric filenames may cause conflict when the option is used immediately before listing filenames. AUTHOR
was developed by OSF and HP. SEE ALSO
awk(1), comm(1), sort(1), uniq(1). STANDARDS CONFORMANCE
join(1)
All times are GMT -4. The time now is 11:43 AM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy