Bash: join 2 files


 
# 1  
Old 01-31-2014

Hello !

I want to join 2 files.
They look like this:
Code:
KE340296.1    1    0/0:11
KE340296.1    2    0/0:12
KE340296.1    6    0/1:13
KE340297.1    1    0/1:14
KE340297.1    3    0/1:15
KE340297.1    4    0/1:16

and
Code:
KE340296.1    1    0/0:21
KE340296.1    2    0/0:22
KE340296.1    3    0/1:23
KE340297.1    1    0/1:24
KE340297.1    2    0/1:25
KE340297.1    3    0/1:26

I would like to obtain the output below. Columns 1 and 2 are the first two columns of the two files, column 3 is file1:col3 and column 4 is file2:col3.
Code:
KE340296.1    1    0/0:11   0/0:21
KE340296.1    2    0/0:12   0/0:22
KE340296.1    3    .        0/1:23
KE340296.1    6    0/1:13   .
KE340297.1    1    0/1:14   0/1:24
KE340297.1    2    .        0/1:25
KE340297.1    3    0/1:15   0/1:26
KE340297.1    4    0/1:16   .

Let me explain:
These two files are sorted first by column 1, which holds the names (KE...), and then by column 2 (position).
The tricky thing is that for some positions, I don't have a value in one of the files. For instance in KE340297.1, position 4, I only have a value in the first file.
The file I want to obtain contains all the positions represented in the two files (there may be holes, as for KE340296.1, which has no position 4 or 5 in either of the two files), and if there is no value for one of the files, then I would like a dot or some other placeholder character.

I wanted to use "join" but it seems that it won't work because positions are not perfectly matching.
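For reference, join can keep unpaired lines (-a) and fill in a placeholder (-e together with -o), but it matches on a single field only, so the two key columns have to be glued together first. A rough sketch, assuming the names never contain ":" and that a scratch directory is acceptable; the file names, the `tab` and `joined` variables and the ":" glue character are illustrative choices, not part of the original question:

```shell
# Sample inputs from this post, written to a scratch directory.
dir=$(mktemp -d)
cat > "$dir/file1" <<'EOF'
KE340296.1    1    0/0:11
KE340296.1    2    0/0:12
KE340296.1    6    0/1:13
KE340297.1    1    0/1:14
KE340297.1    3    0/1:15
KE340297.1    4    0/1:16
EOF
cat > "$dir/file2" <<'EOF'
KE340296.1    1    0/0:21
KE340296.1    2    0/0:22
KE340296.1    3    0/1:23
KE340297.1    1    0/1:24
KE340297.1    2    0/1:25
KE340297.1    3    0/1:26
EOF
tab=$(printf '\t')

# Glue columns 1 and 2 into one key, then sort on that key as join expects.
for f in file1 file2; do
  awk -v OFS='\t' '{ print $1 ":" $2, $3 }' "$dir/$f" |
    LC_ALL=C sort -t "$tab" -k1,1 > "$dir/$f.keyed"
done

# -a 1 -a 2 keeps unpaired lines from both files; -e . fills the missing value.
# The second awk splits the key back into two columns; the final sort orders
# the positions numerically.
joined=$(LC_ALL=C join -t "$tab" -a 1 -a 2 -e . -o 0,1.2,2.2 \
    "$dir/file1.keyed" "$dir/file2.keyed" |
  awk -F'\t' -v OFS='\t' '{ split($1, k, ":"); print k[1], k[2], $2, $3 }' |
  sort -t "$tab" -k1,1 -k2,2n)
printf '%s\n' "$joined"
rm -rf "$dir"
```

This takes more steps than a single awk script, so it is mostly useful for seeing what join can and cannot do.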

Does anyone know how I can do that?

Thanks a lot !

Mu
# 2  
Old 01-31-2014
I would suggest using awk. Here is an awk program with sort
Code:
awk '
        NR == FNR {
                A[$1 FS $2] = $NF
                next
        }
        {
                B[$1 FS $2] = $NF
        }
        END {
                for ( k in A )
                {
                        if ( k in B )
                                print k, A[k], B[k]
                        else
                                print k, A[k], "."
                }
                for ( k in B )
                {
                        if ( k in A )
                                print k, A[k], B[k]
                        else
                                print k, ".", B[k]
                }
        }
' OFS='\t' file1 file2 | sort -u

Replace file1 and file2 with your original input file names
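One caveat about the final sort: plain sort compares whole lines as text, so position 10 would come before position 2. A minimal sketch of a numeric sort on the second column, assuming the positions are integers and the fields are tab-separated (as set by OFS above); the sample lines and the variable names `merged` and `sorted` are made up for illustration:

```shell
# Hypothetical unsorted merge output; fields are tab-separated.
merged=$(printf 'KE340296.1\t10\t0/0:11\t.\nKE340296.1\t2\t0/0:12\t0/0:22\nKE340297.1\t1\t.\t0/1:24\n')

# Sort by column 1 as text, then by column 2 as a number.
sorted=$(printf '%s\n' "$merged" | sort -t "$(printf '\t')" -k1,1 -k2,2n)
printf '%s\n' "$sorted"
```

With the sample data in this thread every position is a single digit, so the plain sort happens to give the same order.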
# 3  
Old 01-31-2014
Hello,
Another awk solution:
Code:
awk '{X=$1 OFS $2}; FNR == NR {A[X]=$3 OFS ".";next};{if( X in A) { sub(/\.$/,$3,A[X]);next} else {A[X]="." OFS $3}};END{for (i in A) print i, A[i]}' OFS='\t' file1 file2 | sort

Regards.
# 4  
Old 02-01-2014
yaa*:
Code:
awk '
  {
    i=$1 OFS $2
    A[i]=NR==FNR ? $3 " . " : A[i] " . " $3
  }
  END{
    for(i in A) {
      $0=A[i]
      print i, $1, $NF
    }
  }
' OFS='\t' file1 file2 | sort

One line:
Code:
awk '{i=$1 OFS $2; A[i]=NR==FNR?$3 " . ":A[i] " . " $3} END{for(i in A) {$0=A[i]; print i, $1, $NF}}' OFS='\t' file1 file2 | sort

-- edit --
May be easier to follow:

Code:
awk '
  {
    C[i=$1 OFS $2]
    if(NR==FNR) A[i]=$3; else B[i]=$3
  } 
  END{
    for(i in C) print i, i in A ? A[i] : "." , i in B ? B[i] : "."
  }
' OFS='\t' file1 file2 | sort

Code:
awk '{C[i=$1 OFS $2]; if(NR==FNR)A[i]=$3; else B[i]=$3} END{for(i in C) {print i,i in A?A[i]:".", i in B?B[i]:"."}}' OFS='\t' file1 file2 | sort

---
* yaa = yet another awk

Last edited by Scrutinizer; 02-01-2014 at 07:00 AM..
# 5  
Old 02-03-2014
OK thanks a lot for your responses !

I didn't know NR, FNR and OFS so I learnt a lot here !

I picked the last solution as it seems the shortest, though it is not the simplest to understand.

Is it possible to have an explanation of the different steps ?
Thanks a lot !
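A commented version of the last solution may help with the steps. This is a sketch of one reading of it, not the poster's own explanation; the scratch directory, the `result` variable and the numeric sort keys are additions for illustration:

```shell
# Sample inputs from the first post, written to a scratch directory.
dir=$(mktemp -d)
cat > "$dir/file1" <<'EOF'
KE340296.1    1    0/0:11
KE340296.1    2    0/0:12
KE340296.1    6    0/1:13
KE340297.1    1    0/1:14
KE340297.1    3    0/1:15
KE340297.1    4    0/1:16
EOF
cat > "$dir/file2" <<'EOF'
KE340296.1    1    0/0:21
KE340296.1    2    0/0:22
KE340296.1    3    0/1:23
KE340297.1    1    0/1:24
KE340297.1    2    0/1:25
KE340297.1    3    0/1:26
EOF

result=$(awk '
  {
    # Build the key from name and position, joined by OFS (a tab here).
    i = $1 OFS $2
    # Merely referencing C[i] creates that element, so C ends up holding
    # every key seen in either file.
    C[i]
    # NR counts lines across all files; FNR restarts at 1 for each file.
    # They are equal only while the first file is being read.
    if (NR == FNR) A[i] = $3    # value from file1
    else           B[i] = $3    # value from file2
  }
  END {
    # For every key, print the value from each file, or "." if it is missing.
    # The order of "for (i in C)" is unspecified, hence the sort afterwards.
    for (i in C)
      print i, (i in A ? A[i] : "."), (i in B ? B[i] : ".")
  }
' OFS='\t' "$dir/file1" "$dir/file2" | sort -k1,1 -k2,2n)
printf '%s\n' "$result"
rm -rf "$dir"
```

OFS='\t' is given on the command line before the file names, so it is set before the first line is read and applies both to the key and to the print output.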