Join 4 files on first three columns


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting Join 4 files on first three columns
# 1  
Old 08-29-2012
Join 4 files on first three columns

Hi,

Can someone suggest me on how to join 4 files by comparing the first three columns?

---------- Post updated at 03:56 PM ---------- Previous update was at 03:42 PM ----------

Hope it helps someone.

I was looking online for a solution and on stackoverflow, I found a solution and tried the following command

Code:
awk '{o1=$1;o2=$2;o3=$3; $1=$2=$3="";gsub(" +","");_[o1 FS o2 FS o3]=_[o1 FS o2 FS o3] FS $0} END{ for(i in _) print i,_[i] }' file1 file2

My input files are this way

input1:
Code:
a 1 2 3 x
b 3 4 5 y
c 6 7 8 z

input2:

Code:
a 1 2 4 y
b 3 4 6 u
c 8 9 7 z

output:

Code:
c 8 9  7z
b 3 4  5y 6u
a 1 2  3x 4y
c 6 7  8z

---------- Post updated at 03:58 PM ---------- Previous update was at 03:56 PM ----------

Can the above code be modified for four files?

Last edited by jacobs.smith; 08-29-2012 at 04:57 PM.. Reason: code tags
# 2  
Old 08-30-2012
please provide the sample input and output required for 4 file
# 3  
Old 08-30-2012
Why don't you add the filenames on the command line and check for yourself? Smilie
This User Gave Thanks to elixir_sinari For This Post:
# 4  
Old 08-30-2012
...and why don't you use join to join files? It was meant for exactly this purpose.

Of course it is possible to solve any problem in arbitrarily complex form using tools not suited for the purpose. Still, this begs the question: is there any point to it? Using "awk" for what "join" was intended to do is like using poison ivy instead of toilet paper: less effective while being probably dangerous and perhaps cumbersome.

I hope this helps.

bakunin
# 5  
Old 08-30-2012
Quote:
Originally Posted by elixir_sinari
Why don't you add the filenames on the command line and check for yourself? Smilie
Hi elixir,

Thanks for the suggestion. Will try it now.

---------- Post updated at 09:17 AM ---------- Previous update was at 09:17 AM ----------

Quote:
Originally Posted by bakunin
...and why don't you use join to join files? It was meant for exactly this purpose.

Of course it is possible to solve any problem in arbitrarily complex form using tools not suited for the purpose. Still, this begs the question: is there any point to it? Using "awk" for what "join" was intended to do is like using poison ivy instead of toilet paper: less effective while being probably dangerous and perhaps cumbersome.

I hope this helps.

bakunin
That was a very funny comparison between poison ivy to toilet paper. I liked it Smilie

I am comparing on three columns in each of the file. Join is doing it only on one column. So, I was forced to use awk. Any further suggestions are appreciated.

---------- Post updated at 09:24 AM ---------- Previous update was at 09:17 AM ----------

@raj_saini20,

Here is wat I did? I am getting my expected output. But, how do I make it tab separated or sorted? Can u enhance it?


cat 1
Code:
a 1 2 3 x
b 3 4 5 y
c 6 7 8 z

cat 2
Code:
a 1 2 4 y
b 3 4 6 u
c 8 9 7 z

cat 3
Code:
a 1 2 7 g
b 3 4 9 k
c 45 67 9 m

cat 4
Code:
a 1 2 77 m
b 3 4 7 j
c 34 56 8 p

Code:
awk '{o1=$1;o2=$2;o3=$3; $1=$2=$3="";gsub(" +","");_[o1 FS o2 FS o3]=_[o1 FS o2 FS o3] FS $0} END{ for(i in _) print i,_[i] }' 1 2 3 4

generated output
Code:
c 8 9  7z
b 3 4  5y 6u 9k 7j
c 45 67  9m
a 1 2  3x 4y 7g 77m
c 6 7  8z
c 34 56  8p

@elixir, thanks for a beautiful suggestion my friend. It worked.
# 6  
Old 08-30-2012
Quote:
Originally Posted by jacobs.smith
I am comparing on three columns in each of the file. Join is doing it only on one column. So, I was forced to use awk. Any further suggestions are appreciated.
Actually join can use arbitrary delimiters, so it is possible to use your favourite text filter (awk, sed, ed, tr, ....) to put these in (alternatively remove the respective blanks to make it one key) and remove them (alternatively put them back in) after the join.

You could have come up with this yourself, couldn't you?

bakunin
This User Gave Thanks to bakunin For This Post:
# 7  
Old 08-30-2012
Quote:
Originally Posted by bakunin
Actually join can use arbitrary delimiters, so it is possible to use your favourite text filter (awk, sed, ed, tr, ....) to put these in (alternatively remove the respective blanks to make it one key) and remove them (alternatively put them back in) after the join.

You could have come up with this yourself, couldn't you?

bakunin
You are right, I got you.

Merge first three columns as one key, run join and after I see the output, re split them back with desired delimiters. Thanks for another valuable thought.
I can do that. Thanks for the confidence donation. Smilie
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. UNIX for Beginners Questions & Answers

Data match 2 files based on first 2 columns matching only and join if match

Hi, i have 2 files , the data i need to match is in masterfile and i need to pull out column 3 from master if column 1 and 2 match and output entire row to new file I have tried with join and awk and i keep getting blank outputs or same file is there an easier way than what i am... (4 Replies)
Discussion started by: axis88
4 Replies

2. Shell Programming and Scripting

Join, merge, fill NULL the void columns of multiples files like sql "LEFT JOIN" by using awk

Hello, This post is already here but want to do this with another way Merge multiples files with multiples duplicates keys by filling "NULL" the void columns for anothers joinning files file1.csv: 1|abc 1|def 2|ghi 2|jkl 3|mno 3|pqr file2.csv: 1|123|jojo 1|NULL|bibi... (2 Replies)
Discussion started by: yjacknewton
2 Replies

3. Shell Programming and Scripting

Join and merge multiple files with duplicate key and fill void columns

Join and merge multiple files with duplicate key and fill void columns Hi guys, I have many files that I want to merge: file1.csv: 1|abc 1|def 2|ghi 2|jkl 3|mno 3|pqr file2.csv: (5 Replies)
Discussion started by: yjacknewton
5 Replies

4. Shell Programming and Scripting

Join two files combining multiple columns and produce mix and match output

I would like to join two files when two columns in each file matches with each other and then produce an output when taking multiple columns. Like I have file A 1234,ABCD,23,JOHN,NJ,USA 2345,ABCD,24,SAM,NY,USA 5678,GHIJ,24,TOM,NY,USA 5678,WXYZ,27,MAT,NJ,USA and file B ... (2 Replies)
Discussion started by: mady135
2 Replies

5. Shell Programming and Scripting

Join two lines into one, but the second line only the last two columns

Hi guys, I hope you are doing well! I have a file and I need to join two lines into one, but the second line I need only the last two columns. ================= "eHealth Trend Report","logoRpt" "LAN/WAN Group 123" "Divide by Time" "switch1_a-RH-Serial0" "BW: 1.02 M" ... (4 Replies)
Discussion started by: antoniorajr
4 Replies

6. Shell Programming and Scripting

Join two files with matching columns

Hi, I need to join two files together with one common value in a column. I think I can use awk or join or a combination but I can't quite get it. Basically my data looks like this, with the TICKER columns matching up in each file File1 TICKER,column 1, column, 2, column, 3, column 4 ... (6 Replies)
Discussion started by: unkleruckus
6 Replies

7. Shell Programming and Scripting

Sort and join multiple columns using awk

Is it possible to join all the values after sorting them based on 1st column key and replace empty rows with 0 like below ? input a1 0 a1 1 a1 1 a3 1 b2 1 a2 1 a4 1 a2 1 a4 1 c4 1 a3 1 d1 1 a3 1 b1 1 d1 1 a4 1 c4 1 b2 1 b1 1 b2 1 c4 1 d1 1 output... (8 Replies)
Discussion started by: quincyjones
8 Replies

8. UNIX for Dummies Questions & Answers

How to use the the join command to join multiple files by a common column

Hi, I have 20 tab delimited text files that have a common column (column 1). The files are named GSM1.txt through GSM20.txt. Each file has 3 columns (2 other columns in addition to the first common column). I want to write a script to join the files by the first common column so that in the... (5 Replies)
Discussion started by: evelibertine
5 Replies

9. UNIX for Dummies Questions & Answers

Join 2 files with multiple columns: awk/grep/join?

Hello, My apologies if this has been posted elsewhere, I have had a look at several threads but I am still confused how to use these functions. I have two files, each with 5 columns: File A: (tab-delimited) PDB CHAIN Start End Fragment 1avq A 171 176 awyfan 1avq A 172 177 wyfany 1c7k A 2 7... (3 Replies)
Discussion started by: InfoSeeker
3 Replies

10. Shell Programming and Scripting

Join columns from 2 files

Hello guys. I need to join columns of Start Time and End time of 2 files. The Examples is: File 1 APPLICATION GROUP_NAME JOB_NAME ODATE STATUS START_TIME END_TIME :: MODCMPDA EDPONLINE MC00A1700 071102 Ended OK 20071102 17:00:01 File 2 APPLICATION GROUP_NAME JOB_NAME ODATE... (2 Replies)
Discussion started by: osramos
2 Replies
Login or Register to Ask a Question