Sort and join multiple columns using awk | Unix Linux Forums | Shell Programming and Scripting

  Go Back    


Shell Programming and Scripting Post questions about KSH, CSH, SH, BASH, PERL, PHP, SED, AWK and OTHER shell scripts and shell scripting languages here.

Sort and join multiple columns using awk

Shell Programming and Scripting


Closed Thread    
 
Thread Tools Search this Thread Display Modes
    #1  
Old 01-30-2013
quincyjones quincyjones is offline
Registered User
 
Join Date: Aug 2010
Last Activity: 14 April 2014, 5:11 AM EDT
Posts: 128
Thanks: 30
Thanked 0 Times in 0 Posts
Sort and join multiple columns using awk

Is it possible to join all the values after sorting them based on 1st column key and replace empty rows with 0 like below ?

input

Code:
a1	0	a1	1	a1	1	a3	1	b2	1
a2	1	a4	1	a2	1	a4	1	c4	1
a3	1	d1	1	a3	1	b1	1	d1	1
a4	1			c4	1	b2	1		
b1	1								
b2	1								
c4	1								
d1	1

output

Code:
a1	0	1	1	0	0
a2	1	0	1	0	0
a3	1	0	1	1	0
a4	1	1	0	1	0
b1	1	0	0	1	0
b2	1	0	0	1	1
c4	1	0	1	0	1
d1	1	1	0	0	1

Sponsored Links
    #2  
Old 01-30-2013
neutronscott's Avatar
neutronscott neutronscott is online now Forum Advisor  
script kiddie
 
Join Date: Jun 2011
Last Activity: 18 April 2014, 2:04 PM EDT
Location: McMurdo Station, Antarctica
Posts: 745
Thanks: 25
Thanked 215 Times in 202 Posts

Code:
mute@clt:~/temp/quincyjones$ ./script
a1      0       1       1       0       0
a2      1       0       1       0       0
a3      1       0       1       1       0
a4      1       1       0       1       0
b1      1       0       0       1       0
b2      1       0       0       1       1
c4      1       0       1       0       1
d1      1       1       0       0       1


Code:
#!/bin/sh
awk 'BEGIN { FS=OFS="\t"; }
{
        if (NF > maxNF) maxNF=NF
        for (i=1;i<=NF;i+=2) {
                if (length($i) < 1) continue #skip blanks
                #new, map it
                if (!name_to_idx[$i]) {
                        idx_to_name[++idx]=$i
                        name_to_idx[$i]=idx
                }
                arr[$i,i]=$(i+1) #make matrix
        }
}
END {
        for (i=1;i<=idx;i++) {
                printf("%s",idx_to_name[i])
                for (j=1;j<maxNF;j+=2)
                        printf("%s%d", OFS, arr[idx_to_name[i],j])
                print "" #newline
        }
}' file | sort -k1,1


Last edited by neutronscott; 01-30-2013 at 04:28 PM.. Reason: it should work now?
The Following User Says Thank You to neutronscott For This Useful Post:
quincyjones (01-31-2013)
Sponsored Links
    #3  
Old 01-30-2013
Yoda's Avatar
Yoda Yoda is offline Forum Advisor  
Jedi Master
 
Join Date: Jan 2012
Last Activity: 17 April 2014, 8:35 PM EDT
Location: Galactic Empire
Posts: 3,282
Thanks: 227
Thanked 1,157 Times in 1,094 Posts
Here is another dirty approach using join command.

Since join command cannot accept more than 2 files at a time, we have to split the orignal file and perform the join operation:

Code:
awk -F'\t' '{ print $1,$2 > "file1"; print $3,$4 > "file2"; print $5,$6 > "file3"; print $7,$8 > "file4"; print $9,$10 > "file5"; }' filename
join -a1 -1 1 -2 1 -o 1.1 1.2 2.2 -e "0" file1 file2 > j1
join -a1 -1 1 -2 1 -o 1.1 1.2 1.3 2.2 -e "0" j1 file3 > j2
join -a1 -1 1 -2 1 -o 1.1 1.2 1.3 1.4 2.2 -e "0" j2 file4 > j3
join -a1 -1 1 -2 1 -o 1.1 1.2 1.3 1.4 1.5 2.2 -e "0" j3 file5


Last edited by Yoda; 01-30-2013 at 06:05 PM.. Reason: set field separator to tab spaces
    #4  
Old 01-30-2013
Scrutinizer's Avatar
Scrutinizer Scrutinizer is offline Forum Staff  
Moderator
 
Join Date: Nov 2008
Last Activity: 18 April 2014, 1:06 PM EDT
Location: Amsterdam
Posts: 8,874
Thanks: 230
Thanked 2,141 Times in 1,922 Posts
Assuming the columns are sorted like in the sample, try:

Code:
awk '{s=$1; for(i=1; i<=5; i++) {A[$(i*2-1),i]=$(i*2); s=s OFS A[$1,i]+0} print s}' OFS='\t' file

The Following 2 Users Say Thank You to Scrutinizer For This Useful Post:
quincyjones (01-31-2013), Yoda (01-30-2013)
Sponsored Links
    #5  
Old 01-30-2013
Yoda's Avatar
Yoda Yoda is offline Forum Advisor  
Jedi Master
 
Join Date: Jan 2012
Last Activity: 17 April 2014, 8:35 PM EDT
Location: Galactic Empire
Posts: 3,282
Thanks: 227
Thanked 1,157 Times in 1,094 Posts
Scrutinizer, I see some minor discrepancy in the o/p:

Code:
$ awk '{s=$1; for(i=1; i<=5; i++) {A[$(i*2-1),i]=$(i*2); s=s OFS A[$1,i]+0} print s}' OFS='\t' filename
a1      0       1       1       0       0                     
a2      1       0       1       0       0                     
a3      1       0       1       1       0                     
a4      1       1       0       1       0                     
b1      1       0       0       1       0                     
b2      1       0       1       0       1                    
c4      1       1       0       0       1
d1      1       1       0       0       1

EDIT: I noticed discrepancy even in my join command o/p.

EDIT: Got it fixed after setting the field separator to tab spaces -F'\t'

EDIT: Scrutinizer, setting the field separator to tab spaces for your awk code gives correct o/p
The Following User Says Thank You to Yoda For This Useful Post:
Scrutinizer (01-30-2013)
Sponsored Links
    #6  
Old 01-30-2013
Scrutinizer's Avatar
Scrutinizer Scrutinizer is offline Forum Staff  
Moderator
 
Join Date: Nov 2008
Last Activity: 18 April 2014, 1:06 PM EDT
Location: Amsterdam
Posts: 8,874
Thanks: 230
Thanked 2,141 Times in 1,922 Posts
OK, thanks, so:

Code:
awk '{s=$1; for(i=1; i<=5; i++) {A[$(i*2-1),i]=$(i*2); s=s OFS A[$1,i]+0} print s}' FS='\t' OFS='\t' file

The Following User Says Thank You to Scrutinizer For This Useful Post:
Yoda (01-30-2013)
Sponsored Links
    #7  
Old 01-31-2013
quincyjones quincyjones is offline
Registered User
 
Join Date: Aug 2010
Last Activity: 14 April 2014, 5:11 AM EDT
Posts: 128
Thanks: 30
Thanked 0 Times in 0 Posts
@Scrutinizer: Is it possible to apply your code even on unsorted file ?
@Neutron: Your code making all 0.* values as 0 ?

@Neutron : Is it possible to define the number of value-columns for a key column, when I run the script? In the above example (1 key and 1 value). This example (1 key 5 value cols)


Code:
a1	0	1	1	0	1	a1	1	1	1	0	1	a1	1	1	1	1	1	a1	0	0	0	0	1	a1	1	1	1	1	1	1
a2	1	0	1	0	1	a4	1	1	1	1	1	a4	0	0	1	1	1	a4	0	0	0	0	1	a4	1	1	1	1	1	1
a3	1	0	1	0	1	d1	1	1	1	0	1	d1	1	1	0	0	1	d1	0	0	0	0	1	d1	1	1	1	1	1	1
a4	1	0	1	0	1	c4	1	1	1	0	0	c4	1	1	1	0	0	c4	0	0	0	0	1	c4	1	1	1	1	1	1
b1	1	0	1	0	1																									
b2	1	1	1	0	1																									
c4	1	1	1	0	1																									
d1	1	1	1	0	1

---------- Post updated at 09:19 AM ---------- Previous update was at 04:50 AM ----------
Sponsored Links
Closed Thread

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

More UNIX and Linux Forum Topics You Might Find Helpful
Thread Thread Starter Forum Replies Last Post
sort on multiple columns deepakiniimt Shell Programming and Scripting 3 07-05-2012 10:15 AM
sort on multiple columns sramirez Shell Programming and Scripting 1 01-18-2012 04:29 AM
sort by based on multiple columns Takeeshe Shell Programming and Scripting 1 05-30-2010 07:07 PM
Join 2 files with multiple columns: awk/grep/join? InfoSeeker UNIX for Dummies Questions & Answers 3 12-01-2009 07:45 PM
awk or python to join alternating columns genehunter Shell Programming and Scripting 2 10-14-2009 09:45 PM



All times are GMT -4. The time now is 02:13 PM.