Unix/Linux Go Back    


Shell Programming and Scripting BSD, Linux, and UNIX shell scripting — Post awk, bash, csh, ksh, perl, php, python, sed, sh, shell scripts, and other shell scripting languages questions here.

Merge input from two files into one based on conditions

Shell Programming and Scripting


Tags
awk, bash, join, linux, merging files

Closed    
 
Thread Tools Search this Thread Display Modes
    #1  
Old Unix and Linux 05-20-2015   -   Original Discussion by Zooma
Zooma's Unix or Linux Image
Zooma Zooma is offline
Registered User
 
Join Date: Mar 2015
Last Activity: 5 April 2017, 1:01 PM EDT
Posts: 37
Thanks: 41
Thanked 2 Times in 2 Posts
Merge input from two files into one based on conditions

Using Linux (bash), I have two files which contain information about berries. Example:

file1.txt:


Code:
Blueberry blue 14
Raspberry red 12
Blackberry dark 4

file2.txt


Code:
Blackberry sour 4 3
Blueberry tasty 12 78
Strawberry yummy 33 88

I want to merge these two files into one. The desired result from the above would be:


Code:
Blueberry blue 14 tasty 12 78
Raspberry red 12 - - -
Blackberry dark 4 sour 4 3
Strawberry - - yummy 33 88

So the three columns of file1.txt and the four columns of file2.txt should result in a new file with six columns where all berries present in these two files only are represented with one row each. If a berry is not present in one of the two files, the corresponding entries from that file are replaced with a dash (like for Strawberry and Raspberry above).

This command is getting me somewhere close:


Code:
awk 'FNR==NR{a[$1]=$2 FS $3 FS $4;next}{ print $0, a[$1]}' file2.txt file1.txt

Result:


Code:
Blueberry blue 14 tasty 12 78
Raspberry red 12
Blackberry dark 4 sour 4 3

Only prints the berries that are present in file1.txt though and it doesn't add the dashes. Can someone help?

Thanks!
/Z
Sponsored Links
    #2  
Old Unix and Linux 05-20-2015   -   Original Discussion by Zooma
RudiC's Unix or Linux Image
RudiC RudiC is offline Forum Staff  
Moderator
 
Join Date: Jul 2012
Last Activity: 22 May 2018, 5:11 PM EDT
Location: Aachen, Germany
Posts: 12,727
Thanks: 413
Thanked 3,912 Times in 3,598 Posts
You're not too far off:

Code:
awk 'FNR==NR{a[$1]=$2 FS $3 FS $4;next}{ print $0, a[$1]?a[$1]:"- -"; delete a[$1]} END {for (i in a) print i, "- -", a[i]}' file2 file1
Blueberry blue 14 tasty 12 78
Raspberry red 12 - -
Blackberry dark 4 sour 4 3
Strawberry - - yummy 33 88

The Following User Says Thank You to RudiC For This Useful Post:
Zooma (05-20-2015)
Sponsored Links
    #3  
Old Unix and Linux 05-20-2015   -   Original Discussion by Zooma
Zooma's Unix or Linux Image
Zooma Zooma is offline
Registered User
 
Join Date: Mar 2015
Last Activity: 5 April 2017, 1:01 PM EDT
Posts: 37
Thanks: 41
Thanked 2 Times in 2 Posts
Thanks RudiC! Just get a minor error (also added one missing dash in your command):


Code:
$ awk 'FNR==NR{a[$1]=$2 FS $3 FS $4;next}{ print $0, a[$1]?a[$1]:"- - -"; delete a[$1]} END {for (i in a) print i, "- -", a[i]}' file2.txt file1.txt
Blueberry blue 14 tasty 12 78
Raspberry red 12 - - -
Blackberry dark 4 sour 4 3
 - - -
Strawberry - - yummy 33 88

So there's an extra "- - -" there that I can't explain. Bonus question: if I don't know how many columns file1 and file2 have, is there some way of making this command more dynamic?

Thanks!
/Z
    #4  
Old Unix and Linux 05-20-2015   -   Original Discussion by Zooma
Chubler_XL's Unix or Linux Image
Chubler_XL Chubler_XL is offline Forum Staff  
Moderator
 
Join Date: Oct 2010
Last Activity: 20 May 2018, 10:39 PM EDT
Posts: 3,527
Thanks: 154
Thanked 1,258 Times in 1,152 Posts
If you sort your files you can then use join:



Code:
$ join -j 1 -e "-" -a 1 -a 2 -o 0,1.2,1.3,2.2,2.3,2.4 file1.sor file2.sor
Blackberry dark 4 sour 4 3
Blueberry blue 14 tasty 12 78
Raspberry red 12 - - -
Strawberry - - yummy 33 88

The Following User Says Thank You to Chubler_XL For This Useful Post:
Zooma (05-21-2015)
Sponsored Links
    #5  
Old Unix and Linux 05-21-2015   -   Original Discussion by Zooma
RudiC's Unix or Linux Image
RudiC RudiC is offline Forum Staff  
Moderator
 
Join Date: Jul 2012
Last Activity: 22 May 2018, 5:11 PM EDT
Location: Aachen, Germany
Posts: 12,727
Thanks: 413
Thanked 3,912 Times in 3,598 Posts
Dynamic case:

Code:
awk '
FNR==NR         {a[$1]=$2 
                 d="-"
                 for (i=3; i<=NF; i++) {a[$1]=a[$1] FS $i; d=d FS "-"}
                 next
                }
                {print $0, a[$1]?a[$1]:d
                 D="-"
                 for (i=3; i<=NF; i++) D=D FS "-"
                 delete a[$1]
                }
END             {for (i in a) print i, D, a[i]
                }
        ' file2 file1

That extra line doesn't appear when I try it...
The Following User Says Thank You to RudiC For This Useful Post:
Zooma (05-21-2015)
Sponsored Links
    #6  
Old Unix and Linux 05-21-2015   -   Original Discussion by Zooma
Zooma's Unix or Linux Image
Zooma Zooma is offline
Registered User
 
Join Date: Mar 2015
Last Activity: 5 April 2017, 1:01 PM EDT
Posts: 37
Thanks: 41
Thanked 2 Times in 2 Posts
Thanks a lot RudiC. #5 was just what I was looking for. :-)
Sponsored Links
Closed

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Linux More UNIX and Linux Forum Topics You Might Find Helpful
Thread Thread Starter Forum Replies Last Post
Need script for making files based on some conditions. ROCK_PLSQL Shell Programming and Scripting 26 05-19-2015 10:10 AM
Merge two input files and concatenate the rest jabriel Shell Programming and Scripting 3 11-01-2014 11:57 AM
Read input files and merge them in given order and write them to input one param or one file hyd1234 Shell Programming and Scripting 4 02-11-2014 10:59 AM
awk merging files based on 2 complex conditions ruby_sgp Shell Programming and Scripting 4 04-06-2010 04:15 AM
any script for joining files based on simple conditions stateperl UNIX for Dummies Questions & Answers 0 03-26-2010 07:12 PM



All times are GMT -4. The time now is 07:15 PM.