Join columns across multiple lines in a Text based on common column using BASH


 
# 1  
Old 03-06-2018

Hello,

I have a file with two columns (TableName, ColumnName) delimited by a pipe, as shown below. The file is sorted by ColumnName.

Code:
Table1|Column1
Table2|Column1
Table5|Column1
Table3|Column2
Table2|Column2
Table4|Column3
Table2|Column3
Table2|Column4
Table5|Column4
Table2|Column5

From the above file I am trying to generate a dynamic SQL join between the table names in column 1 whenever they share the same column name in column 2.

Code:
select * from Table1 a inner join Table2 b on a.Column1=b.column1 inner join Table5 c on a.Column1=c.column1

and

Code:
select * from Table3 a inner join Table2 b on a.column2 = b.column2

I want to do this by traversing the file row by row until I reach the end of the file.

Can you please advise what the best way to do this is?

Note: If I can just get the table names that share the same column name, I can also create the dynamic SQL outside the Bash logic.
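The row-by-row traversal described above could be sketched in plain shell roughly as follows. This is only a sketch, not a tested production script: the function name `group_by_column` and the temporary sample file are made up for illustration, and it assumes the input is already sorted (grouped) by the column name, as stated in the post. It avoids associative arrays, which old bash releases such as 3.2 do not have.

```shell
#!/bin/bash
# Hypothetical helper: for each run of consecutive rows with the same column
# name, print "ColumnName|TableA|TableB|...".
# Assumption: the input file is grouped by its second field.
group_by_column() {
    local prev line tab col
    prev=""
    line=""
    # The "|| [ -n "$tab" ]" part also handles a last row without a newline.
    while IFS='|' read -r tab col || [ -n "$tab" ]; do
        if [ "$col" != "$prev" ]; then
            # A new column name starts: flush the previous group, if any.
            if [ -n "$line" ]; then
                printf '%s\n' "$line"
            fi
            line="$col"
            prev="$col"
        fi
        line="$line|$tab"
    done < "$1"
    # Flush the last group.
    if [ -n "$line" ]; then
        printf '%s\n' "$line"
    fi
    return 0
}

# Demo with the sample data from the post (the temp file is arbitrary).
sample=$(mktemp)
printf '%s\n' 'Table1|Column1' 'Table2|Column1' 'Table5|Column1' \
              'Table3|Column2' 'Table2|Column2' 'Table4|Column3' \
              'Table2|Column3' 'Table2|Column4' 'Table5|Column4' \
              'Table2|Column5' > "$sample"
group_by_column "$sample"
rm -f "$sample"
```

For the sample data the first output line would be `Column1|Table1|Table2|Table5`. Each line carries everything needed to build one SQL statement, so the SQL text itself can be assembled elsewhere.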


Moderator's Comments:
Mod Comment Please use CODE tags - for data as well - as required by forum rules!

Last edited by RudiC; 03-06-2018 at 06:52 AM.. Reason: Added CODE tags.
# 2  
Old 03-06-2018
Welcome to the forum.

Any attempts / ideas / thoughts from your side? How would you "create the dynamic SQL outside the Bash logic"? Do solution proposals have to be shell or would text tools like e.g. awk be acceptable as well?
# 3  
Old 03-06-2018
Attempts so far

In another thread someone suggested the AWK code below. Unfortunately it does not work with our awk version, so I was not able to use it.

Code:
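# Needs gawk 4.0 or later: a[$2][$1] is an array of arrays.
# Expects the field separator to be "|" (e.g. awk -F"|").
{ a[$2][$1] };                      # register table $1 under column $2
END {
    for (col in a) {                # one output line per column name
        printf "%s", col;
        for (tab in a[col])         # every table that has this column
            printf "|%s", tab;
        print ""
    }
}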
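The same grouping can be done without arrays of arrays, so it should also run on older awks such as GNU Awk 3.1.x or mawk. The following is a minimal sketch under that assumption, not the code from the other thread: the function name `group_tables_awk` is hypothetical, and the table list for each column is simply accumulated in a string.

```shell
#!/bin/bash
# Hypothetical helper: print "ColumnName|TableA|TableB|..." per column name,
# in the order the column names first appear. Input does not need to be sorted.
group_tables_awk() {
    awk -F'|' '
        NF < 2        { next }                      # skip blank or malformed rows
        !($2 in tabs) { order[++n] = $2 }           # remember first-seen order
                      { tabs[$2] = tabs[$2] "|" $1 }  # append table to its column
        END {
            for (i = 1; i <= n; i++)
                print order[i] tabs[order[i]]
        }
    ' "$1"
}

# Demo with a few rows from the sample data (the temp file is arbitrary).
sample=$(mktemp)
printf '%s\n' 'Table1|Column1' 'Table2|Column1' 'Table5|Column1' \
              'Table3|Column2' 'Table2|Column2' > "$sample"
group_tables_awk "$sample"
rm -f "$sample"
```

Keeping a separate `order` array avoids relying on the traversal order of `for (col in a)`, which awk does not define.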

Another method I am trying now is a self join:

Code:
join -t "|" -1 2 -2 2  -o '1.1,2.1,1.2'  file  file

and I am getting the output below:

Code:
Table1|Table1|Column1
Table1|Table2|Column1
Table1|Table5|Column1
Table2|Table1|Column1
Table2|Table2|Column1
Table2|Table5|Column1
Table5|Table1|Column1
Table5|Table2|Column1
Table5|Table5|Column1
Table3|Table3|Column2
Table3|Table2|Column2
Table2|Table3|Column2
Table2|Table2|Column2
Table4|Table4|Column3
Table4|Table2|Column3
Table2|Table4|Column3
Table2|Table2|Column3
Table2|Table2|Column4
Table2|Table5|Column4
Table5|Table2|Column4
Table5|Table5|Column4
Table2|Table2|Column5

There are at least two problems with this approach:

1) At most I get two tables per common join column. If three tables share the same column, I have to split them into two separate SQL statements with one join each.

2) I need to remove functional duplicates, that is, joins of a table with itself and joins where just the table names are reversed (table1 join table2 == table2 join table1).
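For problem 2 alone, one possible filter is to keep only the pairs whose first table name sorts strictly before the second. That drops the self pairs and one of each reversed pair. A rough sketch, assuming the file is sorted on the join field as `join` requires (the function name `unique_pairs` is made up for illustration):

```shell
#!/bin/bash
# Hypothetical helper: self-join the file on field 2, then keep each
# unordered pair of distinct tables exactly once.
# Assumption: "$1" is sorted on its second field, as join(1) expects.
unique_pairs() {
    join -t '|' -1 2 -2 2 -o '1.1,2.1,1.2' "$1" "$1" |
        # Appending "" forces a string comparison even for numeric-looking names.
        awk -F'|' '($1 "") < ($2 "")'
}

# Demo with a few rows from the sample data (the temp file is arbitrary).
sample=$(mktemp)
printf '%s\n' 'Table1|Column1' 'Table2|Column1' 'Table5|Column1' \
              'Table3|Column2' 'Table2|Column2' 'Table2|Column5' > "$sample"
unique_pairs "$sample"
rm -f "$sample"
```

This does nothing for problem 1: three tables sharing a column still come out as three separate pairs, so a grouping approach is needed for multi-table joins.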

Note:
1) I am open to using awk, sed, or anything else that works.
2) By "outside Bash" I meant that I can use hard-coded concatenation to generate most of the SQL statement, apart from the join part.
Moderator's Comments:
Mod Comment Please use CODE tags for sample input and output as well as for code segments.

Last edited by Don Cragun; 03-06-2018 at 08:26 AM.. Reason: Add CODE and ICODE tags.
# 4  
Old 03-06-2018
Quote:
Originally Posted by nv186000
In another thread some one suggested below AWK code , unfortunately our awk version did not match and i was not able to use this .

Saying "unfortunately our awk version did not match" makes it clear that the operating system and shell you're using are important information that we need in order to give you suggestions that will work in your environment. (This information should be supplied in every thread started in this forum.) From what you have said above, I assume that you're using bash, but we don't know which version. So please tell us which operating system (including the release/version number) and which version of bash you're using.
# 5  
Old 03-06-2018
Thanks, Don. I will keep your points in mind before opening a new thread.

The details you asked for are given below.

Code:
 echo $BASH_VERSION
3.2.57(1)-release

Code:
 uname -a
Linux sedcahdp0390 3.0.101-80-default #1 SMP Fri Jul 15 14:30:41 UTC 2016 (eb2ba81) x86_64 x86_64 x86_64 GNU/Linux

Code:
 lsb_release -a
LSB Version:    core-2.0-noarch:core-3.2-noarch:core-4.0-noarch:core-2.0-x86_64:core-3.2-x86_64:core-4.0-x86_64:desktop-4.0-amd64:desktop-4.0-noarch:graphics-2.0-amd64:graphics-2.0-noarch:graphics-3.2-amd64:graphics-3.2-noarch:graphics-4.0-amd64:graphics-4.0-noarch
Distributor ID: SUSE LINUX
Description:    SUSE Linux Enterprise Server 11 (x86_64)
Release:        11
Codename:       n/a

Code:
awk --version
GNU Awk 3.1.8

Code:
sed --version
GNU sed version 4.1.5


Moderator's Comments:
Mod Comment Please use CODE tags as required by forum rules!

Last edited by RudiC; 03-06-2018 at 09:05 AM.. Reason: Added CODE tags.
# 6  
Old 03-06-2018
How about this (tested on Linux with mawk 1.3.3):
Code:
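awk -F\| '
        # Per column name ($2), build a "|"-separated list of its table names ($1).
        {TMP[$2] = TMP[$2] FA[$2] $1
         FA[$2] = FS                    # empty for the first table, "|" from then on
        }
END     {CH[1] = 97                     # character code of "a", the first alias
         for (t in TMP) {n = split (TMP[t], T)          # T[1..n]: tables sharing column t
                         printf "select * from %s %c", T[1], CH[1]
                         for (i=2; i<=n; i++)   {CH[i] = CH[i-1] + 1    # next alias: b, c, ...
                                                 printf " inner join %s %c on %c.%s=%c.%s", T[i], CH[i], CH[1], t, CH[i], t
                                                }
                         printf RS              # end the statement with a newline
                        }
        }
' file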
select * from Table1 a inner join Table2 b on a.Column1=b.Column1 inner join Table5 c on a.Column1=c.Column1
select * from Table3 a inner join Table2 b on a.Column2=b.Column2
select * from Table4 a inner join Table2 b on a.Column3=b.Column3
select * from Table2 a inner join Table5 b on a.Column4=b.Column4
select * from Table2 a
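Two details of the output above may matter with real data: a column that occurs in only one table still produces a statement without any join (`select * from Table2 a`), and awk does not define the order in which `for (t in TMP)` visits the column names. A possible variant that skips single-table columns and keeps the input order is sketched below. It is an illustration rather than a drop-in replacement: the function name `make_joins` is hypothetical, and aliases are generated from character codes, so it only makes sense for up to 26 tables per column.

```shell
#!/bin/bash
# Hypothetical helper: one "select ... inner join ..." statement per column
# name that is shared by at least two tables, in first-seen column order.
make_joins() {
    awk -F'|' '
        NF < 2        { next }                          # skip blank or malformed rows
        !($2 in tabs) { order[++n] = $2 }               # remember first-seen order
                      { tabs[$2] = (tabs[$2] == "" ? $1 : tabs[$2] FS $1) }
        END {
            for (i = 1; i <= n; i++) {
                col = order[i]
                cnt = split(tabs[col], T, "|")          # T[1..cnt]: tables with col
                if (cnt < 2) continue                   # nothing to join against
                printf "select * from %s a", T[1]
                for (j = 2; j <= cnt; j++) {
                    alias = sprintf("%c", 96 + j)       # 98 is "b", 99 is "c", ...
                    printf " inner join %s %s on a.%s=%s.%s", T[j], alias, col, alias, col
                }
                print ""
            }
        }
    ' "$1"
}

# Demo with a few rows from the sample data (the temp file is arbitrary).
sample=$(mktemp)
printf '%s\n' 'Table1|Column1' 'Table2|Column1' 'Table5|Column1' \
              'Table3|Column2' 'Table2|Column2' 'Table2|Column5' > "$sample"
make_joins "$sample"
rm -f "$sample"
```

With these demo rows, Column5 yields no statement because only Table2 has it.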

EDIT: or mayhap with a simpler END section:
Code:
END     {split ("a b c d e f g", CH, " ")              # aliases for up to seven tables per column
         for (t in TMP) {n = split (TMP[t], T)
                         printf "select * from %s %c", T[1], CH[1]
                         for (i=2; i<=n; i++)   printf " inner join %s %c on %c.%s=%c.%s", T[i], CH[i], CH[1], t, CH[i], t
                         printf RS
                        }
        }


Last edited by RudiC; 03-06-2018 at 08:53 AM..
# 7  
Old 03-06-2018
Thank you, RudiC. Both approaches work fine. I am testing the solution with real data and will let you know if anything comes up.