UNIX for Dummies Questions & Answers: If you're not sure where to post a UNIX or Linux question, post it here. All UNIX and Linux newbies welcome!
#1
gawk asort to sort record groups based on one subfield
Input ("/" delimited fields): Code:
style1/book1 (author_C)/editor1/2000
style1/book2 (author_A)/editor2/2004
style1/book3 (author_B)/editor3/2001
style2/book8 (author_B)/editor4/2010
style2/book5 (author_A)/editor2/1998
Records with the same field 1 belong to the same group. Using asort (not sort), I need to sort the records in each group in ascending order based on the string between parentheses in field 2, to obtain: Code:
style1/book2 (author_A)/editor2/2004
style1/book3 (author_B)/editor3/2001
style1/book1 (author_C)/editor1/2000
style2/book5 (author_A)/editor2/1998
style2/book8 (author_B)/editor4/2010
I tried to sort the records by field 1 and then by the second subfield of field 2, but it didn't work: Code:
BEGIN{FS=OFS="/"}
{
    array[$1] = $0
    split ($2, aut, " ")
    asort(array)
    o = asort(aut)
    for (o in aut)
        print array[aut[o]]
}
#2
The versions of awk that I use (on OS X) don't have the asort() and asorti() functions, but I have read the gawk man page. Unlike the sort utility, there is no way to specify a sort key for these functions; they always sort the array using the entire contents of the string as the sort key. If you want to use asort() in gawk to sort with field 1 as your primary sort key and the second part of field 2 as your secondary key, you need to prepend the primary and secondary sort fields to each line in your array, use asort() or asorti() to sort the modified records, and then strip off the added sort fields when you print (or otherwise process) the results.
#3
Quote:
1st) sort by field 1 and generate a first output
2nd) use this output to sort by subfield 2 and generate the final output.
I tried things like the code below, but it still doesn't work. Code:
BEGIN{FS=OFS="/"}
{
    # sort by field1
    array[$1] = $0
    asort(array)
    # first output
    for (i in array)
        $0 = array[i]
    # redefine fields in first output
    split($0, rec, FS)
    rec[$2] = $0
    split($0, sub, " ")
    aut[++a] = sub[2]
    # sort by subfield2
    n = asort(aut)
    # print final output
    for (j=1; j<=n; j++)
        print array[aut[j]]
}
#4
Quote:
I don't have access to a system running gawk, but just using standard interfaces, I get the output: Quote:
Code:
#!/bin/ksh
awk 'BEGIN{FS=OFS="/"
        tmpfile = "asorti.out"
        sortcommand = "sort -t/ -o " tmpfile
        cleanup = "rm " tmpfile
}
{       # "sub" is a built-in function name in awk, so the array is named "parts"
        split($2, parts, " ")
        # key = field 1 / author / whole record; value = the original record
        array[$1 "/" parts[2] "/" $0] = $0
}
END{    for (i in array) print i | sortcommand
        close(sortcommand)
        # read the sorted keys back and print the record stored under each one
        while ((getline i < tmpfile) > 0) print array[i]
        close(tmpfile)
        system(cleanup)
}' in
where in contains the data listed in your first posting on this thread. If I read the gawk man page correctly, this should be roughly equivalent to: Code:
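#!/bin/ksh
gawk 'BEGIN{FS = OFS = "/" }
{       split($2, parts, " ")   # "sub" is a built-in function name, so use "parts"
        array[$1 "/" parts[2] "/" $0] = $0
}
END{    # asorti() with one argument overwrites array with its sorted indices,
        # so put the sorted keys in a second array and keep the records intact
        n = asorti(array, sorted)
        for (i = 1; i <= n; i++) print array[sorted[i]]
}' in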