Join 3 or more files using matching column


 
# 1 04-27-2011

Dear Forum,

The full title of the topic would be: "Join 3 or more files using matching column without full list in any of these columns"

I have several files, typically 3 or 4, which I need to join in a way similar to a FULL JOIN in SQL. All combinations of matches should be printed to an output file, including lines that have no match in any other file. I used MySQL, which has no FULL JOIN statement. Some workarounds do the job, at least for 3 files, but sometimes I got duplicated or even multiplied rows. Most importantly, MySQL is slow with big files.
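One common way to emulate a FULL JOIN without duplicates is to first build the distinct list of all keys (UNION, not UNION ALL, removes duplicates) and then LEFT JOIN every table to that key list. A minimal sketch using Python's built-in sqlite3 module follows; the table names t1, t2, t3 and the column name col1 are assumptions for illustration. The same SQL should carry over to MySQL 8.0, while older MySQL versions lack WITH, so the key list would have to become a derived table.

```python
import sqlite3

# Hypothetical single-column tables holding the example keys from this thread.
data = {
    "t1": ["aaa", "bbb", "abb", "fff"],
    "t2": ["bbb", "abb", "ccc", "fff"],
    "t3": ["aaa", "ccc", "dce", "fff"],
}

con = sqlite3.connect(":memory:")
for table, keys in data.items():
    con.execute(f"CREATE TABLE {table} (col1 TEXT)")
    con.executemany(f"INSERT INTO {table} VALUES (?)", [(k,) for k in keys])

# Step 1: distinct union of all keys. Step 2: LEFT JOIN each table to it,
# so no FULL JOIN is needed and each key appears once per match combination.
query = """
WITH all_keys(k) AS (
    SELECT col1 FROM t1 UNION SELECT col1 FROM t2 UNION SELECT col1 FROM t3
)
SELECT COALESCE(t1.col1, 'null'),
       COALESCE(t2.col1, 'null'),
       COALESCE(t3.col1, 'null')
FROM all_keys
LEFT JOIN t1 ON t1.col1 = all_keys.k
LEFT JOIN t2 ON t2.col1 = all_keys.k
LEFT JOIN t3 ON t3.col1 = all_keys.k
ORDER BY all_keys.k
"""
rows = con.execute(query).fetchall()
for row in rows:
    print(" ".join(row))
```

Adding a fourth file means one more UNION term and one more LEFT JOIN. If a key can repeat inside a single table, rows will still multiply; that is inherent to joining, not to this workaround.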

I give a single-column example, hoping that I can adapt it to multi-column cases:
File 1
col1
aaa
bbb
abb
fff

File2
col1
bbb
abb
ccc
fff

File3
aaa
ccc
dce
fff

Output
col1_file1 col1_file2 col1_file3
aaa null aaa
bbb bbb null
abb abb null
null ccc ccc
null null dce
fff fff fff

The best would be if I could easily add more files to the script!

I'd appreciate your ideas!

cyz
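For the multi-column case, one possible generalization keys each line on its first whitespace-separated field and lets every file contribute all of its fields, or a matching number of "null" placeholders when the key is missing. The sketch below is an assumption-laden illustration: the function name full_join is hypothetical, files are taken as lists of lines, output rows are sorted by key, and a key repeated inside one file keeps only its last line.

```python
from typing import Dict, List, Optional

def full_join(tables: List[List[str]]) -> List[List[str]]:
    """Full outer join of whitespace-separated lines on their first field.

    Each table contributes all of its fields when it has the key, or
    "null" placeholders (as many as its widest line) when it does not.
    If a key repeats inside one table, the last occurrence wins.
    """
    widths = [max((len(line.split()) for line in t), default=1) for t in tables]
    slots: Dict[str, List[Optional[List[str]]]] = {}
    for idx, table in enumerate(tables):
        for line in table:
            fields = line.split()
            if not fields:
                continue  # skip blank lines
            slots.setdefault(fields[0], [None] * len(tables))[idx] = fields
    rows = []
    for key in sorted(slots):
        row: List[str] = []
        for idx, fields in enumerate(slots[key]):
            got = fields or []
            # pad short or missing entries so every file keeps a fixed width
            row.extend(got + ["null"] * (widths[idx] - len(got)))
        rows.append(row)
    return rows

# Hypothetical two- and three-column inputs, already read into lists.
file1 = ["aaa 1", "bbb 2"]
file2 = ["bbb x y", "ccc z w"]
file3 = ["aaa 9"]
for row in full_join([file1, file2, file3]):
    print(" ".join(row))
```

To read real files, something like `[open(p).read().splitlines() for p in paths]` would produce the list of tables; adding a file is then just adding a path.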
# 2 04-27-2011
Three-file version (needs GNU awk, since ARGIND is a gawk extension):

Code:
awk 'ARGIND==1{a[$1]=$0 " null null";next;}
    ARGIND==2{if($1 in a)sub(/ null /," "$1" ",a[$1]);else a[$1]="null "$1" null";next;}
    ARGIND==3{if($1 in a)sub(/ null$/," "$1,a[$1]);else a[$1]="null null "$1;}
END{for(i in a)print a[i]}' file1 file2 file3

---------- Post updated at 13:17 ---------- Previous update was at 13:06 ----------

Multi-file version:

Just put the file names in the "files" list.

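Code:

#!/usr/bin/env python3
# Full outer join of single-column files: one output line per distinct value,
# with "null" in the column of every file that lacks that value.
files = ['file1', 'file2', 'file3']   # add more file names here

rows = {}                             # value -> one column per file
for idx, name in enumerate(files):
    with open(name) as f:
        for line in f:
            key = line.rstrip("\n")
            if key not in rows:
                rows[key] = ["null"] * len(files)
            rows[key][idx] = key      # fill this file's own column

for key in rows:
    print(" ".join(rows[key]))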
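A variation on the script above takes the file names from the command line instead of a hard-coded list and sorts the output, so adding a file means only adding an argument. This is a sketch under assumptions: the function name join_files and the script name fulljoin.py are hypothetical, and surrounding whitespace and blank lines are ignored.

```python
import sys
from typing import Dict, List

def join_files(paths: List[str]) -> List[str]:
    """Slot-filling full outer join of single-column files, one line per key."""
    rows: Dict[str, List[str]] = {}
    for idx, path in enumerate(paths):
        with open(path) as f:
            for line in f:
                key = line.strip()
                if not key:
                    continue  # ignore blank lines
                # create the row with "null" in every column, then fill ours
                rows.setdefault(key, ["null"] * len(paths))[idx] = key
    return [" ".join(rows[k]) for k in sorted(rows)]

if __name__ == "__main__":
    # e.g.  python3 fulljoin.py file1 file2 file3 file4
    for out in join_files(sys.argv[1:]):
        print(out)
```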
# 3 04-27-2011
Great!

For 4 files with awk I tried:

awk 'ARGIND==1{a[$1]=$0 "null null null";next;}
ARGIND==2{if($1 in a)sub(/ null /, " "$1" ",a[$1]);else a[$1]="null "$1" null null";next; }
ARGIND==3{if($1 in a)sub(/ null$/," "$1" ",a[$1]); else a[$1]="null null "$1" null";}
ARGIND==4{if($1 in a)sub(/ null$/," "$1" ",a[$1]); else a[$1]="null null null "$1"";}
END{for(i in a)print a[i]}' file1 file2 file3 file4

It seems to work, except that the columns coming from file3 and file4 are swapped...

A minor question: for some reason there are 13 spaces between columns 1 and 2. Why is that?

cyz
# 4 04-27-2011
The part below (highlighted in red in the original post) is not correct for 4 files.
I suggest trying the Python way if you have more than 3 files.

Code:
ARGIND==3{if($1 in a)sub(/ null$/," "$1" ",a[$1]); else a[$1]="null null "$1" null";}
    ARGIND==4{if($1 in a)sub(/ null$/," "$1" ",a[$1]); else a[$1]="null null null "$1"";}
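Why the columns swap can be seen by replaying the ARGIND==3 rule by hand: `sub(/ null$/, ...)` always rewrites the last "null", which for four files is column 4, not column 3. The sketch below uses Python's `re.sub` with `count=1` as a rough stand-in for awk's `sub()` (an approximation that holds for this pattern; the trailing space from post #3's replacement is dropped), then shows the slot-based alternative the Python script uses.

```python
import re

def awk_sub(pattern: str, repl: str, s: str) -> str:
    """Rough stand-in for awk's sub(): replace only the first match."""
    return re.sub(pattern, repl, s, count=1)

# Key "aaa" appears in file1 and file3 out of four files.
row = "aaa null null null"               # state after file1
row = awk_sub(r" null$", " aaa", row)    # the ARGIND==3 rule from post #3
print(row)  # file3's value lands in column 4, not column 3

# Slot-based alternative: each file writes into its own column index.
slots = ["aaa", "null", "null", "null"]  # state after file1
slots[2] = "aaa"                         # file3 is index 2
print(" ".join(slots))
```

Because the slot approach never searches for a "null" to overwrite, it does not depend on which earlier files contained the key, which is why it scales to any number of files.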

This User Gave Thanks to sk1418 For This Post: