How to join 2 .txt files based on a common column?


 
Thread Tools Search this Thread
Top Forums UNIX for Dummies Questions & Answers How to join 2 .txt files based on a common column?
# 1  
Old 05-17-2013
How to join 2 .txt files based on a common column?

Hi all,

I'm trying to join two .txt file tab delimitated based on a common column.

File 1
Code:
transcript_id   gene_id length  effective_length        expected_count  TPM     FPKM    IsoPct
comp1000201_c0_seq1     comp1000201_c0  337     183.51  0.00    0.00    0.00    0.00
comp1000297_c0_seq1     comp1000297_c0  612     458.50  0.00    0.00    0.00    0.00
comp100036_c0_seq1      comp100036_c0   333     179.51  1.00    6.72    6.17    100.00
comp10003_c1_seq1       comp10003_c1    328     174.51  0.00    0.00    0.00    0.00
comp100041_c0_seq1      comp100041_c0   338     184.51  0.00    0.00    0.00    0.00
comp100041_c0_seq2      comp100041_c0   302     148.52  0.00    0.00    0.00    0.00
comp100041_c0_seq3      comp100041_c0   339     185.51  0.00    0.00    0.00    0.00
comp100051_c0_seq1      comp100051_c0   323     169.51  0.00    0.00    0.00    0.00
comp1000890_c0_seq1     comp1000890_c0  460     306.51  2.00    7.87    7.23    100.00


File 2
Code:
transcript_id       Seq. Description        Seq. Length     #Hits   min. eValue mean Similarity #GOs    GOs     Enzyme Codes    InterProScan
comp1000201_c0_seq1     ---NA---        337     0                       0       -               
comp1000297_c0_seq1     ---NA---        612     0                       0       -               
comp100036_c0_seq1      ---NA---        333     0                       0       -               
comp10003_c1_seq1       ---NA---        328     0                       0       -               
comp100041_c0_seq1      ---NA---        338     0                       0       -               
comp100041_c0_seq2      ---NA---        302     0                       0       -               
comp100041_c0_seq3      ---NA---        339     0                       0       -               
comp100051_c0_seq1      pogo transposable element with krab domain-like 323     5       8.05925533998611E-10    55.8%   0       -       
        
comp1000890_c0_seq1     ---NA---        460     0                       0       -               
comp1000928_c0_seq1     40s ribosomal protein s7        317     5       1.02118911037712E-20    66.6%   4       F:GO:0003735; P:GO:0006412; C:GO:0005840; C:GO:0005622


Both files have 307154 lines and the common column is the first one, transcript_id

I've been trying:
Code:
$join <file 1> <file 2> > <output file>

and two things are happening:

- I'm losing the tab delimitation in the output file
- My output file is 307049 lines, so i'm losing lines...

I've looking for some answer in previous threats without success.
Any help will be great!

Thanks!
# 2  
Old 05-18-2013
You have to specify the tab via "ctrl-v <tab>":
Code:
join -t'      ' f1 f2 > f3

Check out this info:Unix Join with tabs
 
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Join columns across multiple lines in a Text based on common column using BASH

Hello, I have a file with 2 columns ( tableName , ColumnName) delimited by a Pipe like below . File is sorted by ColumnName. Table1|Column1 Table2|Column1 Table5|Column1 Table3|Column2 Table2|Column2 Table4|Column3 Table2|Column3 Table2|Column4 Table5|Column4 Table2|Column5 From... (6 Replies)
Discussion started by: nv186000
6 Replies

2. Shell Programming and Scripting

Paste columns based on common column: multiple files

Hi all, I've multiple files. In this case 5. Space separated columns. Each file has 12 columns. Each file has 300-400K lines. I want to get the output such that if a value in column 2 is present in all the files then get all the columns of that value and print it side by side. Desired output... (15 Replies)
Discussion started by: genome
15 Replies

3. UNIX for Dummies Questions & Answers

Join 2 files based on certain column

I have file input1.txt 11103|11|OTTAWA|City|AA|CAR|0|0|1|-1|0|8526|2014-09-07 23:00:14 11103|11|OTTAWA|City|BB|TRAIN|0|0|2|-2|6|6359|2014-09-07 23:00:14 11104|11|CANADA|City|CC|CAR|0|0|2|-2|0|5947|2014-09-07 23:00:14 11104|11|CANADA|City|DD|TRAIN|0|0|2|-2|1|4523|2014-09-07 23:00:14... (5 Replies)
Discussion started by: radius
5 Replies

4. Shell Programming and Scripting

common entries between files based on 1st column

Hi, I am trying to get the common entries from 2 files based on 1st field.. However when I try to do in perl I am getting blank output.. How can I do this in awk? open(BUFF1, "my_genes"); open(BUFF3, "rawcounts"); #open(WRBUFF,">result_rawcounts"); while($line =<BUFF1>) { ... (3 Replies)
Discussion started by: Diya123
3 Replies

5. UNIX for Dummies Questions & Answers

How to use the the join command to join multiple files by a common column

Hi, I have 20 tab delimited text files that have a common column (column 1). The files are named GSM1.txt through GSM20.txt. Each file has 3 columns (2 other columns in addition to the first common column). I want to write a script to join the files by the first common column so that in the... (5 Replies)
Discussion started by: evelibertine
5 Replies

6. Shell Programming and Scripting

join two files based on one column

Hi All, I am trying to join to files based on one common column. Cat File1 ID HID Ab_1 23 Cd 45 df 22 Vv 33 Cat File2 ID pval Ab_1 0.3 Cd 10 Vv 0.0444 (3 Replies)
Discussion started by: newpro
3 Replies

7. Shell Programming and Scripting

join files based on a common field

Hi experts, Would you please help me with this? I have several files and I need to join the forth field of them based on the common first field. here's an example... first file: 280346 39.88 -75.08 547.8 280690 39.23 -74.83 538.7 280729 40.83 -75.08 499.2 280907 40.9 -74.4 507.8... (5 Replies)
Discussion started by: GoldenFire
5 Replies

8. Shell Programming and Scripting

Join multiple files based on 1 common column

I have n files (for ex:64 files) with one similar column. Is it possible to combine them all based on that column ? file1 ax100 20 30 40 ax200 22 33 44 file2 ax100 10 20 40 ax200 12 13 44 file2 ax100 0 0 4 ax200 2 3 4 (9 Replies)
Discussion started by: quincyjones
9 Replies

9. Shell Programming and Scripting

Merging 2 files based on a common column

Hi All, I do have 2 files file 1 has 4 tab delimited columns 234 a c dfgyu 294 b g fih 302 c h jzh 328 z c san 597 f g son File 2 has 2 tab delimted columns 234 23 302 24 597 24 I want to merge file 2 with file 1 based on the data common in both files which is the first column so... (6 Replies)
Discussion started by: Lucky Ali
6 Replies

10. Shell Programming and Scripting

"Join" or "Merge" more than 2 files into single output based on common key (column)

Hi All, I have working (Perl) code to combine 2 input files into a single output file using the join function that works to a point, but has the following limitations: 1. I am restrained to 2 input files only. 2. Only the "matched" fields are written out to the "matched" output file and... (1 Reply)
Discussion started by: Katabatic
1 Replies
Login or Register to Ask a Question