    #1  
Old 07-05-2012
Registered User
 
Join Date: May 2011
Posts: 190
Thanks: 92
Thanked 0 Times in 0 Posts
How to use the join command to join multiple files by a common column

Hi,

I have 20 tab delimited text files that have a common column (column 1). The files are named GSM1.txt through GSM20.txt. Each file has 3 columns (2 other columns in addition to the first common column).

I want to write a script to join the files on the first column, so that in the output file the first column is the common column present in all 20 files, followed by the last two columns of each file in turn (i.e. columns 2 and 3 are columns 2 and 3 of GSM1.txt, columns 4 and 5 are columns 2 and 3 of GSM2.txt, and so on).

How do I go about doing that? Thanks!
    #2  
Old 07-05-2012
Mead Rotor
 
Join Date: Aug 2005
Location: Saskatchewan
Posts: 16,374
Thanks: 491
Thanked 2,535 Times in 2,418 Posts
join file1 file2 file3?
The Following User Says Thank You to Corona688 For This Useful Post:
evelibertine (07-05-2012)
    #3  
Old 07-05-2012
Registered User
 
Join Date: May 2011
Posts: 190
Thanks: 92
Thanked 0 Times in 0 Posts
Actually I get the error message

join: extra operand `3.txt'


When I try
Code:
join 1.txt 2.txt 3.txt > output.txt

    #4  
Old 07-05-2012
elixir_sinari
Gotham Knight
 
Join Date: Mar 2012
Location: India
Posts: 1,370
Thanks: 87
Thanked 476 Times in 456 Posts
Check the join man page; join can take only 2 files at a time. You'll need to run it in a loop or through xargs.
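
Because join takes only two file operands, one way to combine more files is to chain calls, feeding each result to the next join through standard input (the "-" operand). The following is only a sketch under assumptions: the join_demo directory, the a.txt/b.txt/c.txt names and their contents are made up for illustration, and the inputs are already sorted on column 1.

```shell
# Hypothetical sample data (not from the thread), already sorted on column 1.
mkdir -p join_demo
printf 'g1\t1\t2\ng2\t3\t4\n'    > join_demo/a.txt
printf 'g1\t5\t6\ng2\t7\t8\n'    > join_demo/b.txt
printf 'g1\t9\t10\ng2\t11\t12\n' > join_demo/c.txt

# Keep a literal tab in a variable so -t uses tab for input and output.
TAB=$(printf '\t')

# join accepts two files; "-" makes the second call read the first
# call's output from standard input.
join -t "$TAB" join_demo/a.txt join_demo/b.txt |
    join -t "$TAB" - join_demo/c.txt > join_demo/abc.txt

cat join_demo/abc.txt
```

Chaining by hand gets unwieldy for 20 files, which is why a loop is the more practical route.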
    #5  
Old 07-05-2012
Mead Rotor
 
Join Date: Aug 2005
Location: Saskatchewan
Posts: 16,374
Thanks: 491
Thanked 2,535 Times in 2,418 Posts
Thanks, I didn't realize that.

Okay then:


Code:
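awk -F"\t" -v OFS="\t" '
# Bump the file counter each time awk moves on to a new input file.
F != FILENAME { FNUM++; F = FILENAME }

# Remember the key, then blank $1 so that $0 becomes "<tab>col2<tab>col3".
{ COL[$1]++; C = $1; $1 = ""; A[C, FNUM] = $0 }

END {
        # Note: for (X in COL) visits the keys in an unspecified order.
        for (X in COL)
        {
                printf("%s", X);
                for (N = 1; N <= FNUM; N++) printf("%s", A[X, N]);
                printf("\n");
        }
}' file1 file2 file3 file4 ...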
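
One caveat with the awk approach above: when a key is missing from one of the files, nothing is printed for that file, so the later columns shift left. Here is a rough sketch of a variant that pads the gap with two empty fields, assuming every file has exactly two data columns as in the original question. The awk_demo directory, the sample data, and the KEYS array name are made up for illustration, and the output is piped through sort because awk's "for (X in ...)" order is unspecified.

```shell
# Hypothetical sample data; g2 is deliberately missing from GSM2.txt.
mkdir -p awk_demo
printf 'g1\t1\t2\ng2\t3\t4\n'  > awk_demo/GSM1.txt
printf 'g1\t5\t6\n'            > awk_demo/GSM2.txt
printf 'g1\t7\t8\ng2\t9\t10\n' > awk_demo/GSM3.txt

awk -F"\t" -v OFS="\t" '
# FNR resets to 1 at the start of every input file.
FNR == 1 { FNUM++ }

# Record the key, then blank $1 so $0 becomes "<tab>col2<tab>col3".
{ KEYS[$1]; C = $1; $1 = ""; A[C, FNUM] = $0 }

END {
    for (X in KEYS) {
        printf("%s", X)
        for (N = 1; N <= FNUM; N++)
            # Pad with two empty fields when this file lacks the key.
            printf("%s", ((X, N) in A) ? A[X, N] : (OFS OFS))
        printf("\n")
    }
}' awk_demo/GSM1.txt awk_demo/GSM2.txt awk_demo/GSM3.txt |
    sort > awk_demo/merged.txt

cat awk_demo/merged.txt
```

In this sample the g2 row keeps its GSM3.txt values in the last two columns, with two empty fields where GSM2.txt has no entry.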

The Following User Says Thank You to Corona688 For This Useful Post:
evelibertine (07-05-2012)
    #6  
Old 07-05-2012
...@...
 
Join Date: Feb 2004
Location: NM
Posts: 9,652
Thanks: 164
Thanked 645 Times in 622 Posts
Extend this to the number of files you have:

Code:
# join expects both inputs to be sorted on the join field (column 1 by default).
join GSM1.txt GSM2.txt > tmp.tmp
for f in GSM3.txt GSM4.txt GSM5.txt
do
    join tmp.tmp "$f" > tmpf
    mv tmpf tmp.tmp
done
mv tmp.tmp GSM_ALL.txt
cat GSM_ALL.txt
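
The loop above can be generalized so that the file list does not have to be typed out. What follows is a sketch rather than a drop-in script: the loop_demo directory and its three small files are stand-ins for GSM1.txt through GSM20.txt, and the variable n is an assumed name for the file count. It sorts each input first, since join needs sorted input, and passes -t with a tab so the output stays tab delimited. A counter is used instead of a GSM*.txt glob because a glob would expand in the order GSM1, GSM10, GSM11, ... and scramble the column order.

```shell
# Hypothetical demo data standing in for GSM1.txt .. GSM20.txt.
mkdir -p loop_demo
printf 'g2\t3\t4\ng1\t1\t2\n'    > loop_demo/GSM1.txt   # deliberately unsorted
printf 'g1\t5\t6\ng2\t7\t8\n'    > loop_demo/GSM2.txt
printf 'g1\t9\t10\ng2\t11\t12\n' > loop_demo/GSM3.txt

n=3                   # assumed variable; set to 20 for the files in the question
TAB=$(printf '\t')    # portable way to hand a literal tab to -t

# Seed the running result with the first file, sorted on the key column.
sort -t "$TAB" -k1,1 loop_demo/GSM1.txt > loop_demo/all.tmp

i=2
while [ "$i" -le "$n" ]; do
    # Sort the next file, then join it onto the running result.
    sort -t "$TAB" -k1,1 "loop_demo/GSM$i.txt" > loop_demo/next.sorted
    join -t "$TAB" loop_demo/all.tmp loop_demo/next.sorted > loop_demo/all.new
    mv loop_demo/all.new loop_demo/all.tmp
    i=$((i + 1))
done

rm -f loop_demo/next.sorted
mv loop_demo/all.tmp loop_demo/GSM_ALL.txt
cat loop_demo/GSM_ALL.txt
```

Note that join by default prints only the keys present in both of its inputs, so a key missing from any one file drops out of the final result.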

The Following User Says Thank You to jim mcnamara For This Useful Post:
evelibertine (07-05-2012)