UNIX outer join


 
Thread Tools Search this Thread
Top Forums UNIX for Dummies Questions & Answers UNIX outer join
# 1  
Old 10-02-2013
UNIX outer join

Hello.. I am trying to join two files of about 7000 records. it looked quite straight forward when i began, but i'm not getting the desired output.
here is what i'm trying to do:

Code:
cat xxx
item,So,Mo,Tu
aaa,1,1,1
bbb,1,1,4
ccc,1,1,0
ddd,1,1,1

Code:
cat yyy
item,Tu,We
aaa,1,1
bbb,4,0
eee,0,1
fff,0,2

Required output:
Code:
item,So,Mo,Tu,We
aaa,1,1,1,1
bbb,1,1,4,0
ccc,1,1,0,0
ddd,1,1,1,0
eee,0,0,0,1
fff,0,0,0,2

------------------------------------
( I sorted the files first )
1.This is what I could get: but makes no sense without the items!
Code:
join -t, -1 1 -2 1 -a 1 -a 2 -e 0 -o 1.2,1.3,1.4,2.3 xxx yyy
1,1,1,1
1,1,4,0
1,1,0,0
1,1,1,0
0,0,0,1
0,0,0,2
So,Mo,Tu,We

2. And this is what i get if i try to add them!
Code:
join -t, -1 1 -2 1 -a 1 -a 2 -e 0 -o 1.1,2.1,1.2,1.3,1.4,2.3 xxx yyy
aaa,aaa,1,1,1,1
bbb,bbb,1,1,4,0
ccc,0,1,1,0,0
ddd,0,1,1,1,0
0,eee,0,0,0,1
0,fff,0,0,0,2
item,item,So,Mo,Tu,We

Code:
join -t, -1 1 -2 1 -a 1 -a 2 -e 0 -o 0,1.2,1.3,1.4,2.3 xxx yyy
Usage: join [-aFilenum] [-vFilenum] [-e String] [-[j]Filenum Fieldnum] [-tCharacter] [-o [Filenum.Fieldnum[,Filenum.Fieldnum]...]] File1 File2

Is there a better,direct way in join to reach this output? else i think I will have to work with the output i got with 2nd statement.

many thanks in advance...

Moderator's Comments:
Mod Comment edit by bakunin: please use CODE-tags as required per forum rules. We even state that in the above the editor window you used to type your post into:

Do you have any code fragments or data samples in your post? If so wrap them in code tags using the code tag button in the editor below

hint: it looks like this -->

Image

How hard is it to ignore such explicit advice?


---------- Post updated at 03:00 PM ---------- Previous update was at 02:40 PM ----------

really sorry about the code-tags..

Last edited by bakunin; 10-02-2013 at 06:22 AM..
# 2  
Old 10-02-2013
If you compare the two variants, both of which worked partly, you may notice that in the first you left out the "items" field, which is why it didn't show up in the result. The join field is not automatically part of the output.:

Quote:
Originally Posted by wanderingmind16
Code:
join -t, -1 1 -2 1 -a 1 -a 2 -e 0 -o 1.2,1.3,1.4,2.3 xxx yyy

Instead of leaving it out you ordered it twice (once from file 1, once from file 2) here:
Code:
join -t, -1 1 -2 1 -a 1 -a 2 -e 0 -o 1.1,2.1,1.2,1.3,1.4,2.3 xxx yyy

The solution - as often, it is almost philosophical ;-)) - is the middle way of including the field only once. Wether you use "1.1" (include it from the first file) or "2.1" (include it from the second file) doesn't matter as these fields are equal anyway - you joined on them.

I hope this helps.

bakunin
# 3  
Old 10-02-2013
Thank you.
Yes, i have tried that as well..But then, for items not present in one file,it'll then print 0

Code:
join -t, -1 1 -2 1 -a 1 -a 2 -e 0 -o 1.1,1.2,1.3,1.4,2.3 xxx yyy
aaa,1,1,1,1
bbb,1,1,4,0
ccc,1,1,0,0
ddd,1,1,1,0
0,0,0,0,1
0,0,0,0,2
item,So,Mo,Tu,We

Code:
join -t, -1 1 -2 1 -a 1 -a 2 -e 0 -o 2.1,1.2,1.3,1.4,2.3 xxx yyy
aaa,1,1,1,1
bbb,1,1,4,0
0,1,1,0,0
0,1,1,1,0
eee,0,0,0,1
fff,0,0,0,2
item,So,Mo,Tu,We

# 4  
Old 10-02-2013
An approach using awk (not simple as join Smilie):
Code:
awk -F, '
        NR == FNR {
                        idx = idx < NF ? NF : idx
                        if ( FNR == 1 )
                        {
                                for ( i = 1; i <= NF; i++ )
                                        H1[i] = $i
                        }
                        else
                        {
                                if ( !(K[$1]) )
                                {
                                        T[++j] = $1
                                        K[$1] = $1
                                }
                                for ( i = 2; i <= NF; i++ )
                                        V[$1,H1[i]] = $i
                        }
                        next
        }
        {
                        idx = idx < NF ? NF : idx
                        if ( FNR == 1 )
                        {
                                for ( i = 1; i <= NF; i++ )
                                        H2[i] = $i
                        }
                        else
                        {
                                if ( !(K[$1]) )
                                {
                                        T[++j] = $1
                                        K[$1] = $1
                                }
                                for ( i = 2; i <= NF; i++ )
                                        V[$1,H2[i]] = $i
                        }
        }
        END {

                        for ( i = 1; i <= idx; i++ )
                        {
                                if ( !R[H1[i]] && H1[i] )
                                {
                                        H[++c] = H1[i]
                                        R[H1[i]] = H1[i]
                                }
                        }
                        for ( i = 1; i <= idx; i++ )
                        {
                                if ( !R[H2[i]] && H2[i] )
                                {
                                        H[++c] = H2[i]
                                        R[H2[i]] = H2[i]
                                }
                        }
                        for ( i = 1; i <= c; i++ )
                                h = h ? h OFS H[i] : H[i]
                        print h
                        for ( s = 1; s <= j; s++ )
                        {
                                for ( i = 1; i <= c; i++ )
                                {
                                        t = V[T[s],H[i]] ? V[T[s],H[i]] : "0"
                                        f = f ? f OFS t : T[s]
                                }
                                print f
                                f = ""
                        }

        }
' OFS=, xxx yyy

# 5  
Old 10-02-2013
The join field can always be included in the output format, even when it doesn't appear in one of the files, by using 0 instead of 1.x or 2.y.
Code:
$ cat file1
item,So,Mo,Tu
aaa,1,1,1
bbb,1,1,4
ccc,1,1,0
ddd,1,1,1

$ cat file2
item,Tu,We
aaa,1,1
bbb,4,0
eee,0,1
fff,0,2

$ join -t, -a1 -a2 -e0 -o0,1.2,1.3,1.4,2.3 file1 file2
item,So,Mo,Tu,We
aaa,1,1,1,1
bbb,1,1,4,0
ccc,1,1,0,0
ddd,1,1,1,0
eee,0,0,0,1
fff,0,0,0,2

Regards,
Alister

Last edited by alister; 10-02-2013 at 02:45 PM..
# 6  
Old 10-04-2013
Thank you alister.

But like i said in the first post, i already tried this with 0 and then i get this error . -o option is not taking 0. am i doing anything wrong?

Code:
join -t, -a1 -a2 -e0 -o0,1.2,1.3,1.4,2.3 xxx yyy
Usage: join [-aFilenum] [-vFilenum] [-e String] [-[j]Filenum Fieldnum] [-tCharacter] [-o [Filenum.Fieldnum[,Filenum.Fieldnum]...]] File1 File2

Thankyou Yoda..but it looks too big to put in this script Smilie

Last edited by wanderingmind16; 10-04-2013 at 04:37 AM..
# 7  
Old 10-04-2013
Quote:
Originally Posted by wanderingmind16
Thank you alister.

But like i said in the first post, i already tried this with 0 and then i get this error . -o option is not taking 0. am i doing anything wrong?

Code:
join -t, -a1 -a2 -e0 -o0,1.2,1.3,1.4,2.3 xxx yyy
Usage: join [-aFilenum] [-vFilenum] [-e String] [-[j]Filenum Fieldnum] [-tCharacter] [-o [Filenum.Fieldnum[,Filenum.Fieldnum]...]] File1 File2

Woops. I overlooked that command in your first post.

I see nothing wrong with your command. POSIX join -o has supported 0 for at least 15 years, cf. POSIX join 1997 version.

Which operating system or join implementation are you using?

Regards,
Alister
 
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

UNIX Join not working as expected

Hello All, I'm working on a Shell script to join data from two files using Join command but not able to get the desired output as its throwing me an error: I have sorted the two files on the Column 1 numerically which is used as Join clause File 1: 1,ABC,GGG,20160401 2,XYZ,KKK,20160401... (2 Replies)
Discussion started by: venkat_reddy
2 Replies

2. Shell Programming and Scripting

How to join two different file in UNIX?

Hello folks I have 2 files which are like this: file (a): id phenotype 100 1.2 200 -0.5 300 3.1 file (b) id genotype 100 0 1 2 ...... 200 1 1 1...... 300 2 0 0 ....... I should mention in file (a) I have 2... (3 Replies)
Discussion started by: sajmar
3 Replies

3. Shell Programming and Scripting

Break the outer loop

Hi I'm comparing same files names which are in different folders . The first for loop for the files in DAY1 folder and the second for loop for the files in DAY2 folder . the first IF condition is for checking whether the file names are equal the second If condtion is for checking the... (4 Replies)
Discussion started by: smile689
4 Replies

4. UNIX for Dummies Questions & Answers

how to join all lines in afile in unix

Hi, I have a unix file which has many lines, i need to join all the lines to single line. Eg: myfile.txt contains: a 123 45fg try and i need the output as : a 123 45fg try Please help me on this. Thanks! (2 Replies)
Discussion started by: RP09
2 Replies

5. Shell Programming and Scripting

Unix Join

Hi, I am trying to join two simple files but unable to do so properly. File R1 : 100 101 102 105 . . 1000 10001 File R2 100|x1 102|x2 1000|a1 10001|a2 and when i do (4 Replies)
Discussion started by: wanderingmind16
4 Replies

6. Programming

OUTER join?

Hi, I have a code where it traverses through each record froma table and creates records in another table. I use FOREACH cursor to do this, i'm using another cursor inside the FOREACH to fetch the details of other table. Please suggest me which would be more efficient, as this will run against... (2 Replies)
Discussion started by: dvah
2 Replies

7. Shell Programming and Scripting

How can i join three lines into one in unix?

Hi all, I am trying to merge three lines into one in my unix text file. My text file sis omething like this. xxxxxxxxx yyyyyyyyyyy zzz aaaaaaaaa bbbbbb ccccc Expected out put is xxxxxxxxx yyyyyyyyyyy zzz aaaaaaaaa bbbbbb ccccc I tried with awk as shown below. (23 Replies)
Discussion started by: rdhanek
23 Replies

8. Programming

sql,multiple join,outer join issue

example sql: select a.a1,b.b1,c.c1,d.d1,e.e1 from a left outer join b on a.x=b.x left outer join c on b.y=c.y left outer join d on d.z=a.z inner join a.t=e.t I know how single outer or inner join works in sql. But I don't really understand when there are multiple of them. can... (0 Replies)
Discussion started by: robbiezr
0 Replies

9. Programming

Oracle: fixing an outer join on subquery...

I have a stored procedure that is failing. The current query behind it is: SELECT DISTINCT <many, many values> FROM table1 LEFT OUTER JOIN table2 ON (table2.key = (select max (table2.key) from table2 where table2.key = table1.key) or ... (0 Replies)
Discussion started by: Elric of Grans
0 Replies

10. UNIX for Advanced & Expert Users

unix join !

just wandering if anyone knows what kind of algorithm unix join command is using to join 2 files. Thanks! (2 Replies)
Discussion started by: strike
2 Replies
Login or Register to Ask a Question