Sort and join multiple columns using awk


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting Sort and join multiple columns using awk
# 1  
Old 01-30-2013
Sort and join multiple columns using awk

Is it possible to join all the values after sorting them based on 1st column key and replace empty rows with 0 like below ?

input
Code:
a1	0	a1	1	a1	1	a3	1	b2	1
a2	1	a4	1	a2	1	a4	1	c4	1
a3	1	d1	1	a3	1	b1	1	d1	1
a4	1			c4	1	b2	1		
b1	1								
b2	1								
c4	1								
d1	1

output
Code:
a1	0	1	1	0	0
a2	1	0	1	0	0
a3	1	0	1	1	0
a4	1	1	0	1	0
b1	1	0	0	1	0
b2	1	0	0	1	1
c4	1	0	1	0	1
d1	1	1	0	0	1

# 2  
Old 01-30-2013
Code:
mute@clt:~/temp/quincyjones$ ./script
a1      0       1       1       0       0
a2      1       0       1       0       0
a3      1       0       1       1       0
a4      1       1       0       1       0
b1      1       0       0       1       0
b2      1       0       0       1       1
c4      1       0       1       0       1
d1      1       1       0       0       1

Code:
#!/bin/sh
awk 'BEGIN { FS=OFS="\t"; }
{
        if (NF > maxNF) maxNF=NF
        for (i=1;i<=NF;i+=2) {
                if (length($i) < 1) continue #skip blanks
                #new, map it
                if (!name_to_idx[$i]) {
                        idx_to_name[++idx]=$i
                        name_to_idx[$i]=idx
                }
                arr[$i,i]=$(i+1) #make matrix
        }
}
END {
        for (i=1;i<=idx;i++) {
                printf("%s",idx_to_name[i])
                for (j=1;j<maxNF;j+=2)
                        printf("%s%d", OFS, arr[idx_to_name[i],j])
                print "" #newline
        }
}' file | sort -k1,1


Last edited by neutronscott; 01-30-2013 at 05:28 PM.. Reason: it should work now?
This User Gave Thanks to neutronscott For This Post:
# 3  
Old 01-30-2013
Here is another dirty approach using join command.

Since join command cannot accept more than 2 files at a time, we have to split the orignal file and perform the join operation:
Code:
awk -F'\t' '{ print $1,$2 > "file1"; print $3,$4 > "file2"; print $5,$6 > "file3"; print $7,$8 > "file4"; print $9,$10 > "file5"; }' filename
join -a1 -1 1 -2 1 -o 1.1 1.2 2.2 -e "0" file1 file2 > j1
join -a1 -1 1 -2 1 -o 1.1 1.2 1.3 2.2 -e "0" j1 file3 > j2
join -a1 -1 1 -2 1 -o 1.1 1.2 1.3 1.4 2.2 -e "0" j2 file4 > j3
join -a1 -1 1 -2 1 -o 1.1 1.2 1.3 1.4 1.5 2.2 -e "0" j3 file5


Last edited by Yoda; 01-30-2013 at 07:05 PM.. Reason: set field separator to tab spaces
# 4  
Old 01-30-2013
Assuming the columns are sorted like in the sample, try:
Code:
awk '{s=$1; for(i=1; i<=5; i++) {A[$(i*2-1),i]=$(i*2); s=s OFS A[$1,i]+0} print s}' OFS='\t' file

These 2 Users Gave Thanks to Scrutinizer For This Post:
# 5  
Old 01-30-2013
Scrutinizer, I see some minor discrepancy in the o/p:
Code:
$ awk '{s=$1; for(i=1; i<=5; i++) {A[$(i*2-1),i]=$(i*2); s=s OFS A[$1,i]+0} print s}' OFS='\t' filename
a1      0       1       1       0       0                     
a2      1       0       1       0       0                     
a3      1       0       1       1       0                     
a4      1       1       0       1       0                     
b1      1       0       0       1       0                     
b2      1       0       1       0       1                    
c4      1       1       0       0       1
d1      1       1       0       0       1

EDIT: I noticed discrepancy even in my join command o/p.

EDIT: Got it fixed after setting the field separator to tab spaces -F'\t'

EDIT: Scrutinizer, setting the field separator to tab spaces for your awk code gives correct o/p
This User Gave Thanks to Yoda For This Post:
# 6  
Old 01-30-2013
OK, thanks, so:
Code:
awk '{s=$1; for(i=1; i<=5; i++) {A[$(i*2-1),i]=$(i*2); s=s OFS A[$1,i]+0} print s}' FS='\t' OFS='\t' file

This User Gave Thanks to Scrutinizer For This Post:
# 7  
Old 01-31-2013
@Scrutinizer: Is it possible to apply your code even on unsorted file ?
@Neutron: Your code making all 0.* values as 0 ?

@Neutron : Is it possible to define the number of value-columns for a key column, when I run the script? In the above example (1 key and 1 value). This example (1 key 5 value cols)

Code:
a1	0	1	1	0	1	a1	1	1	1	0	1	a1	1	1	1	1	1	a1	0	0	0	0	1	a1	1	1	1	1	1	1
a2	1	0	1	0	1	a4	1	1	1	1	1	a4	0	0	1	1	1	a4	0	0	0	0	1	a4	1	1	1	1	1	1
a3	1	0	1	0	1	d1	1	1	1	0	1	d1	1	1	0	0	1	d1	0	0	0	0	1	d1	1	1	1	1	1	1
a4	1	0	1	0	1	c4	1	1	1	0	0	c4	1	1	1	0	0	c4	0	0	0	0	1	c4	1	1	1	1	1	1
b1	1	0	1	0	1																									
b2	1	1	1	0	1																									
c4	1	1	1	0	1																									
d1	1	1	1	0	1

---------- Post updated at 09:19 AM ---------- Previous update was at 04:50 AM ----------
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Join, merge, fill NULL the void columns of multiples files like sql "LEFT JOIN" by using awk

Hello, This post is already here but want to do this with another way Merge multiples files with multiples duplicates keys by filling "NULL" the void columns for anothers joinning files file1.csv: 1|abc 1|def 2|ghi 2|jkl 3|mno 3|pqr file2.csv: 1|123|jojo 1|NULL|bibi... (2 Replies)
Discussion started by: yjacknewton
2 Replies

2. Shell Programming and Scripting

Join and merge multiple files with duplicate key and fill void columns

Join and merge multiple files with duplicate key and fill void columns Hi guys, I have many files that I want to merge: file1.csv: 1|abc 1|def 2|ghi 2|jkl 3|mno 3|pqr file2.csv: (5 Replies)
Discussion started by: yjacknewton
5 Replies

3. Shell Programming and Scripting

Join columns across multiple lines in a Text based on common column using BASH

Hello, I have a file with 2 columns ( tableName , ColumnName) delimited by a Pipe like below . File is sorted by ColumnName. Table1|Column1 Table2|Column1 Table5|Column1 Table3|Column2 Table2|Column2 Table4|Column3 Table2|Column3 Table2|Column4 Table5|Column4 Table2|Column5 From... (6 Replies)
Discussion started by: nv186000
6 Replies

4. Shell Programming and Scripting

Join two files combining multiple columns and produce mix and match output

I would like to join two files when two columns in each file matches with each other and then produce an output when taking multiple columns. Like I have file A 1234,ABCD,23,JOHN,NJ,USA 2345,ABCD,24,SAM,NY,USA 5678,GHIJ,24,TOM,NY,USA 5678,WXYZ,27,MAT,NJ,USA and file B ... (2 Replies)
Discussion started by: mady135
2 Replies

5. Programming

Ls -ltr Sort multiple columns

Hi All, I have one requirement, where I need to have output of ls -l command sorted on 1) first on filename 2) last modified time ( descending ) - latest change first. I am not able to figure out how to do it.. Also I dont have a way to change Date display for ls -ltr command.. I am... (1 Reply)
Discussion started by: freakabhi
1 Replies

6. Shell Programming and Scripting

sort on multiple columns

hi all, i have a file , having few columns. i wanted to sort it based on 2nd column and then based on 1st column. But i have some problem in first column. first column have characters and numbers, but problem is number of characters are not same in all rows. Few rows have 13 characters and then... (3 Replies)
Discussion started by: deepakiniimt
3 Replies

7. Shell Programming and Scripting

sort on multiple columns

Howdy! Need to sort a large .txt file containing the following, using sort. First based on the 1st column, and then on the 2nd column: Group01.01 1000500 31 0.913 -1.522974494 Group01.01 1001500 16 0.684 -0.967496041 Group01.01 36500 19 0.476 na Group01.02 365500 15 0.400 na... (1 Reply)
Discussion started by: sramirez
1 Replies

8. Shell Programming and Scripting

sort by based on multiple columns

Hi, Is there any way to sort a file in cshell by sort command, sorting it by multiple fields, like to sort it first by the second column and then by the first column. Thanks forhead (1 Reply)
Discussion started by: Takeeshe
1 Replies

9. UNIX for Dummies Questions & Answers

Join 2 files with multiple columns: awk/grep/join?

Hello, My apologies if this has been posted elsewhere, I have had a look at several threads but I am still confused how to use these functions. I have two files, each with 5 columns: File A: (tab-delimited) PDB CHAIN Start End Fragment 1avq A 171 176 awyfan 1avq A 172 177 wyfany 1c7k A 2 7... (3 Replies)
Discussion started by: InfoSeeker
3 Replies

10. Shell Programming and Scripting

awk or python to join alternating columns

Hi, here is my input TEST-SAM-A6ZZ SM-S6KV 0 0 2 1 2 3 3 3 2 2 2 2 1 1 2 2 1 3 1 3 1361 SM-HA4Q 0 0 2 1 3 3 3 3 2 2 2 2 1 1 2 2 1 3 3 3 4 2 4 2 2 276217 SM-H9ZG 0 0 2 2 3 3 3 3 2 2 2 2 1 1 2 2 3 3 1 3 2 2 2 2 GC15458 SM-HAQX 0 0 2 1 2 3 1 3 2 2 2 2 1 1 2 2 1 3 1 3 2 2 2 18331 SM-HA5E 0 0 2 2... (2 Replies)
Discussion started by: genehunter
2 Replies
Login or Register to Ask a Question