Sponsored Content
Top Forums UNIX for Dummies Questions & Answers How to join 2 .txt files based on a common column? Post 302808961 by alisrpp on Friday 17th of May 2013 06:57:47 PM
Old 05-17-2013
How to join 2 .txt files based on a common column?

Hi all,

I'm trying to join two .txt file tab delimitated based on a common column.

File 1
Code:
transcript_id   gene_id length  effective_length        expected_count  TPM     FPKM    IsoPct
comp1000201_c0_seq1     comp1000201_c0  337     183.51  0.00    0.00    0.00    0.00
comp1000297_c0_seq1     comp1000297_c0  612     458.50  0.00    0.00    0.00    0.00
comp100036_c0_seq1      comp100036_c0   333     179.51  1.00    6.72    6.17    100.00
comp10003_c1_seq1       comp10003_c1    328     174.51  0.00    0.00    0.00    0.00
comp100041_c0_seq1      comp100041_c0   338     184.51  0.00    0.00    0.00    0.00
comp100041_c0_seq2      comp100041_c0   302     148.52  0.00    0.00    0.00    0.00
comp100041_c0_seq3      comp100041_c0   339     185.51  0.00    0.00    0.00    0.00
comp100051_c0_seq1      comp100051_c0   323     169.51  0.00    0.00    0.00    0.00
comp1000890_c0_seq1     comp1000890_c0  460     306.51  2.00    7.87    7.23    100.00


File 2
Code:
transcript_id       Seq. Description        Seq. Length     #Hits   min. eValue mean Similarity #GOs    GOs     Enzyme Codes    InterProScan
comp1000201_c0_seq1     ---NA---        337     0                       0       -               
comp1000297_c0_seq1     ---NA---        612     0                       0       -               
comp100036_c0_seq1      ---NA---        333     0                       0       -               
comp10003_c1_seq1       ---NA---        328     0                       0       -               
comp100041_c0_seq1      ---NA---        338     0                       0       -               
comp100041_c0_seq2      ---NA---        302     0                       0       -               
comp100041_c0_seq3      ---NA---        339     0                       0       -               
comp100051_c0_seq1      pogo transposable element with krab domain-like 323     5       8.05925533998611E-10    55.8%   0       -       
        
comp1000890_c0_seq1     ---NA---        460     0                       0       -               
comp1000928_c0_seq1     40s ribosomal protein s7        317     5       1.02118911037712E-20    66.6%   4       F:GO:0003735; P:GO:0006412; C:GO:0005840; C:GO:0005622


Both files have 307154 lines and the common column is the first one, transcript_id

I've been trying:
Code:
$join <file 1> <file 2> > <output file>

and two things are happening:

- I'm losing the tab delimitation in the output file
- My output file is 307049 lines, so i'm losing lines...

I've looking for some answer in previous threats without success.
Any help will be great!

Thanks!
 

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

"Join" or "Merge" more than 2 files into single output based on common key (column)

Hi All, I have working (Perl) code to combine 2 input files into a single output file using the join function that works to a point, but has the following limitations: 1. I am restrained to 2 input files only. 2. Only the "matched" fields are written out to the "matched" output file and... (1 Reply)
Discussion started by: Katabatic
1 Replies

2. Shell Programming and Scripting

Merging 2 files based on a common column

Hi All, I do have 2 files file 1 has 4 tab delimited columns 234 a c dfgyu 294 b g fih 302 c h jzh 328 z c san 597 f g son File 2 has 2 tab delimted columns 234 23 302 24 597 24 I want to merge file 2 with file 1 based on the data common in both files which is the first column so... (6 Replies)
Discussion started by: Lucky Ali
6 Replies

3. Shell Programming and Scripting

Join multiple files based on 1 common column

I have n files (for ex:64 files) with one similar column. Is it possible to combine them all based on that column ? file1 ax100 20 30 40 ax200 22 33 44 file2 ax100 10 20 40 ax200 12 13 44 file2 ax100 0 0 4 ax200 2 3 4 (9 Replies)
Discussion started by: quincyjones
9 Replies

4. Shell Programming and Scripting

join files based on a common field

Hi experts, Would you please help me with this? I have several files and I need to join the forth field of them based on the common first field. here's an example... first file: 280346 39.88 -75.08 547.8 280690 39.23 -74.83 538.7 280729 40.83 -75.08 499.2 280907 40.9 -74.4 507.8... (5 Replies)
Discussion started by: GoldenFire
5 Replies

5. Shell Programming and Scripting

join two files based on one column

Hi All, I am trying to join to files based on one common column. Cat File1 ID HID Ab_1 23 Cd 45 df 22 Vv 33 Cat File2 ID pval Ab_1 0.3 Cd 10 Vv 0.0444 (3 Replies)
Discussion started by: newpro
3 Replies

6. UNIX for Dummies Questions & Answers

How to use the the join command to join multiple files by a common column

Hi, I have 20 tab delimited text files that have a common column (column 1). The files are named GSM1.txt through GSM20.txt. Each file has 3 columns (2 other columns in addition to the first common column). I want to write a script to join the files by the first common column so that in the... (5 Replies)
Discussion started by: evelibertine
5 Replies

7. Shell Programming and Scripting

common entries between files based on 1st column

Hi, I am trying to get the common entries from 2 files based on 1st field.. However when I try to do in perl I am getting blank output.. How can I do this in awk? open(BUFF1, "my_genes"); open(BUFF3, "rawcounts"); #open(WRBUFF,">result_rawcounts"); while($line =<BUFF1>) { ... (3 Replies)
Discussion started by: Diya123
3 Replies

8. UNIX for Dummies Questions & Answers

Join 2 files based on certain column

I have file input1.txt 11103|11|OTTAWA|City|AA|CAR|0|0|1|-1|0|8526|2014-09-07 23:00:14 11103|11|OTTAWA|City|BB|TRAIN|0|0|2|-2|6|6359|2014-09-07 23:00:14 11104|11|CANADA|City|CC|CAR|0|0|2|-2|0|5947|2014-09-07 23:00:14 11104|11|CANADA|City|DD|TRAIN|0|0|2|-2|1|4523|2014-09-07 23:00:14... (5 Replies)
Discussion started by: radius
5 Replies

9. Shell Programming and Scripting

Paste columns based on common column: multiple files

Hi all, I've multiple files. In this case 5. Space separated columns. Each file has 12 columns. Each file has 300-400K lines. I want to get the output such that if a value in column 2 is present in all the files then get all the columns of that value and print it side by side. Desired output... (15 Replies)
Discussion started by: genome
15 Replies

10. Shell Programming and Scripting

Join columns across multiple lines in a Text based on common column using BASH

Hello, I have a file with 2 columns ( tableName , ColumnName) delimited by a Pipe like below . File is sorted by ColumnName. Table1|Column1 Table2|Column1 Table5|Column1 Table3|Column2 Table2|Column2 Table4|Column3 Table2|Column3 Table2|Column4 Table5|Column4 Table2|Column5 From... (6 Replies)
Discussion started by: nv186000
6 Replies
JOIN(1) 						      General Commands Manual							   JOIN(1)

NAME
join - relational database operator SYNOPSIS
join [ options ] file1 file2 DESCRIPTION
Join forms, on the standard output, a join of the two relations specified by the lines of file1 and file2. If file1 is `-', the standard input is used. File1 and file2 must be sorted in increasing ASCII collating sequence on the fields on which they are to be joined, normally the first in each line. There is one line in the output for each pair of lines in file1 and file2 that have identical join fields. The output line normally con- sists of the common field, then the rest of the line from file1, then the rest of the line from file2. Fields are normally separated by blank, tab or newline. In this case, multiple separators count as one, and leading separators are dis- carded. These options are recognized: -an In addition to the normal output, produce a line for each unpairable line in file n, where n is 1 or 2. -e s Replace empty output fields by string s. -jn m Join on the mth field of file n. If n is missing, use the mth field in each file. -o list Each output line comprises the fields specifed in list, each element of which has the form n.m, where n is a file number and m is a field number. -tc Use character c as a separator (tab character). Every appearance of c in a line is significant. SEE ALSO
sort(1), comm(1), awk(1) BUGS
With default field separation, the collating sequence is that of sort -b; with -t, the sequence is that of a plain sort. The conventions of join, sort, comm, uniq, look and awk(1) are wildly incongruous. JOIN(1)
All times are GMT -4. The time now is 10:09 PM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy