|
|||||||
| Forums | Search Forums | Register | Forum Rules | Man Pages | Albums | FAQ | Members | Calendar | Search | Today's Posts | Mark Forums Read |
| Shell Programming and Scripting Post questions about KSH, CSH, SH, BASH, PERL, PHP, SED, AWK and OTHER shell scripts and shell scripting languages here. |
|
|
|
Thread Tools | Search this Thread | Display Modes |
|
#1
|
|||
|
|||
|
Sort and join multiple columns using awk
Is it possible to join all the values after sorting them based on 1st column key and replace empty rows with 0 like below ? input Code:
a1 0 a1 1 a1 1 a3 1 b2 1 a2 1 a4 1 a2 1 a4 1 c4 1 a3 1 d1 1 a3 1 b1 1 d1 1 a4 1 c4 1 b2 1 b1 1 b2 1 c4 1 d1 1 output Code:
a1 0 1 1 0 0 a2 1 0 1 0 0 a3 1 0 1 1 0 a4 1 1 0 1 0 b1 1 0 0 1 0 b2 1 0 0 1 1 c4 1 0 1 0 1 d1 1 1 0 0 1 |
| Sponsored Links | ||
|
|
#2
|
||||
|
||||
|
Code:
mute@clt:~/temp/quincyjones$ ./script a1 0 1 1 0 0 a2 1 0 1 0 0 a3 1 0 1 1 0 a4 1 1 0 1 0 b1 1 0 0 1 0 b2 1 0 0 1 1 c4 1 0 1 0 1 d1 1 1 0 0 1 Code:
#!/bin/sh
awk 'BEGIN { FS=OFS="\t"; }
{
if (NF > maxNF) maxNF=NF
for (i=1;i<=NF;i+=2) {
if (length($i) < 1) continue #skip blanks
#new, map it
if (!name_to_idx[$i]) {
idx_to_name[++idx]=$i
name_to_idx[$i]=idx
}
arr[$i,i]=$(i+1) #make matrix
}
}
END {
for (i=1;i<=idx;i++) {
printf("%s",idx_to_name[i])
for (j=1;j<maxNF;j+=2)
printf("%s%d", OFS, arr[idx_to_name[i],j])
print "" #newline
}
}' file | sort -k1,1Last edited by neutronscott; 01-30-2013 at 04:28 PM.. Reason: it should work now? |
| The Following User Says Thank You to neutronscott For This Useful Post: | ||
quincyjones (01-31-2013) | ||
| Sponsored Links | ||
|
|
#3
|
||||
|
||||
|
Here is another dirty approach using
join command. Since join command cannot accept more than 2 files at a time, we have to split the orignal file and perform the join operation: Code:
awk -F'\t' '{ print $1,$2 > "file1"; print $3,$4 > "file2"; print $5,$6 > "file3"; print $7,$8 > "file4"; print $9,$10 > "file5"; }' filename
join -a1 -1 1 -2 1 -o 1.1 1.2 2.2 -e "0" file1 file2 > j1
join -a1 -1 1 -2 1 -o 1.1 1.2 1.3 2.2 -e "0" j1 file3 > j2
join -a1 -1 1 -2 1 -o 1.1 1.2 1.3 1.4 2.2 -e "0" j2 file4 > j3
join -a1 -1 1 -2 1 -o 1.1 1.2 1.3 1.4 1.5 2.2 -e "0" j3 file5Last edited by Yoda; 01-30-2013 at 06:05 PM.. Reason: set field separator to tab spaces |
|
#4
|
||||
|
||||
|
Assuming the columns are sorted like in the sample, try: Code:
awk '{s=$1; for(i=1; i<=5; i++) {A[$(i*2-1),i]=$(i*2); s=s OFS A[$1,i]+0} print s}' OFS='\t' file |
| The Following 2 Users Say Thank You to Scrutinizer For This Useful Post: | ||
quincyjones (01-31-2013), Yoda (01-30-2013) | ||
| Sponsored Links | |
|
|
#5
|
||||
|
||||
|
Scrutinizer, I see some minor discrepancy in the o/p: Code:
$ awk '{s=$1; for(i=1; i<=5; i++) {A[$(i*2-1),i]=$(i*2); s=s OFS A[$1,i]+0} print s}' OFS='\t' filename
a1 0 1 1 0 0
a2 1 0 1 0 0
a3 1 0 1 1 0
a4 1 1 0 1 0
b1 1 0 0 1 0
b2 1 0 1 0 1
c4 1 1 0 0 1
d1 1 1 0 0 1EDIT: I noticed discrepancy even in my join command o/p. EDIT: Got it fixed after setting the field separator to tab spaces -F'\t' EDIT: Scrutinizer, setting the field separator to tab spaces for your awk code gives correct o/p |
| The Following User Says Thank You to Yoda For This Useful Post: | ||
Scrutinizer (01-30-2013) | ||
| Sponsored Links | |
|
|
#6
|
||||
|
||||
|
OK, thanks, so: Code:
awk '{s=$1; for(i=1; i<=5; i++) {A[$(i*2-1),i]=$(i*2); s=s OFS A[$1,i]+0} print s}' FS='\t' OFS='\t' file |
| The Following User Says Thank You to Scrutinizer For This Useful Post: | ||
Yoda (01-30-2013) | ||
| Sponsored Links | |
|
|
#7
|
|||
|
|||
|
@Scrutinizer: Is it possible to apply your code even on unsorted file ? @Neutron: Your code making all 0.* values as 0 ? @Neutron : Is it possible to define the number of value-columns for a key column, when I run the script? In the above example (1 key and 1 value). This example (1 key 5 value cols) Code:
a1 0 1 1 0 1 a1 1 1 1 0 1 a1 1 1 1 1 1 a1 0 0 0 0 1 a1 1 1 1 1 1 1 a2 1 0 1 0 1 a4 1 1 1 1 1 a4 0 0 1 1 1 a4 0 0 0 0 1 a4 1 1 1 1 1 1 a3 1 0 1 0 1 d1 1 1 1 0 1 d1 1 1 0 0 1 d1 0 0 0 0 1 d1 1 1 1 1 1 1 a4 1 0 1 0 1 c4 1 1 1 0 0 c4 1 1 1 0 0 c4 0 0 0 0 1 c4 1 1 1 1 1 1 b1 1 0 1 0 1 b2 1 1 1 0 1 c4 1 1 1 0 1 d1 1 1 1 0 1 ---------- Post updated at 09:19 AM ---------- Previous update was at 04:50 AM ---------- |
| Sponsored Links | ||
|
![]() |
| Thread Tools | Search this Thread |
| Display Modes | |
More UNIX and Linux Forum Topics You Might Find Helpful
|
||||
| Thread | Thread Starter | Forum | Replies | Last Post |
| sort on multiple columns | deepakiniimt | Shell Programming and Scripting | 3 | 07-05-2012 10:15 AM |
| [Solved] sort on multiple columns | sramirez | Shell Programming and Scripting | 1 | 01-18-2012 04:29 AM |
| sort by based on multiple columns | Takeeshe | Shell Programming and Scripting | 1 | 05-30-2010 07:07 PM |
| Join 2 files with multiple columns: awk/grep/join? | InfoSeeker | UNIX for Dummies Questions & Answers | 3 | 12-01-2009 07:45 PM |
| awk or python to join alternating columns | genehunter | Shell Programming and Scripting | 2 | 10-14-2009 09:45 PM |
|
|