awk Grouping and Subgrouping with Counts Post: 302776737

Posted by Don Cragun on Wednesday 6th of March 2013 06:15:32 PM in UNIX for Dummies Questions & Answers

I'm not sure I understand all of your requirements, but here is an awk script that I think does what you want. It looks long, but most of this proposed solution is comments rather than running code:

Code:

#!/bin/ksh
PCF=".Product_Counts"
SCF=".Skew_Counts"
# Variable dictionary:
# cmd                           command string to be used to sort product and
#                               skew count files
# ec                            exit code
# i                             loop control
# ndp                           # of different products in top 3 products
# nds                           # of different skews in top 5 skews for a
#                               given top 3 product
# P                             current product name
# p["product"]                  # of times "product" appears in 1st field
# pcf                           sorted product count filename
# pl["product",n]               nth distinct skew value seen with "product"
# plc["product"]                # of skew values associated with "product"
# ppc                           previous product count
# psc                           previous skew count
# r1, r2                        getline return codes for reading pcf and scf
# S                             current skew
# s["product","skew"]           # of times "skew" appears with "product"
# scf                           sorted skew count filename
awk -v pcf="$PCF" -v scf="$SCF" '
BEGIN { FS = OFS = "|"
}
{       # Process input data...
        # Increment # of times we have seen this product.
        p[$1]++
        if(!(($1, $2) in s))
                # Add a new skew for this product.
                pl[$1, ++plc[$1]] = $2
        # Increment # of times we have seen this skew with this product.
        s[$1, $2]++
}
END {   # Sort the product counts.
        cmd = "sort -t \"|\" -k2,2nr -k1,1 -o " pcf
        for(i in p) print i, p[i] | cmd
        close(cmd)
        ppc = 0
        # Set the sort command to be used to process the skew counts.
        cmd = "sort -t \"|\" -k2,2nr -k1,1 -o " scf
        # Process the top 3 products.
        while((r1 = (getline < pcf)) == 1) {
                # Increment count of top 3 products, but include more if the
                # number of hits is the same for later products
                if(++ndp > 3 && ppc != $2) break
                P = $1
                ppc = $2
                printf("%d hits for product: %s\n", ppc, P)
                # Sort the skew counts for this product.
                for(i = 1; i <= plc[P]; i++)
                        print pl[P, i], s[P, pl[P, i]] | cmd
                close(cmd)
                # Process the top 5 skews for this product.
                nds = 0
                while((r2 = (getline < scf)) == 1) {
                        # Increment count of top 5 skews, but include more if
                        # the number of hits is the same for later skews.
                        if(++nds > 5 && psc != $2) break
                        S = $1
                        psc = $2
                        printf("\t%d hits for skew: %s\n", psc, S)
                }
                if(r2 < 0) {
                        printf("Error reading top 5 skew list from \"%s\".\n",
                                scf)
                        ec = 1
                }
                close(scf)
        }
        if(r1 < 0) {
                printf("Error reading top 3 product list from \"%s\".\n", pcf)
                ec = 1
        }
        close(pcf)
        exit(ec + 0)
}' file
if [ $? -eq 0 ]
then    # awk completed successfully...
        # Remove sort output files.
        rm "$PCF" "$SCF"
        exit
fi
exit 1

As always, if you are using a Solaris/SunOS system, use /usr/xpg4/bin/awk or nawk instead of awk. I used the Korn shell while testing this script, but any shell that accepts basic Bourne shell syntax can be used for this sample.
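If the same script has to run on both Solaris and other systems, the choice of awk can be made at run time. The following is only a sketch: the AWK and check variable names are mine, it assumes a POSIX shell (for command -v and $(...)), and it assumes the XPG4 awk is installed at /usr/xpg4/bin/awk when present.

```shell
#!/bin/sh
# Pick an awk that supports the features used in the script above.
# Assumption: on Solaris/SunOS the XPG4 awk lives at /usr/xpg4/bin/awk.
if [ -x /usr/xpg4/bin/awk ]
then    AWK=/usr/xpg4/bin/awk
elif command -v nawk >/dev/null 2>&1
then    AWK=nawk
else    AWK=awk
fi
# Quick check that the chosen awk handles multi-dimensional subscripts
# with the "in" operator, which the script above relies on.
check=$("$AWK" 'BEGIN { s["a", "b"]++; if (("a", "b") in s) print "ok" }')
echo "Using $AWK: $check"
```

The main script could then call "$AWK" in place of awk.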

This script produces the following output when given the input data shown in message #5 in this thread.

Code:

19 hits for product: p4
	9 hits for skew: 98707
	4 hits for skew: 098
	3 hits for skew: 98708
	2 hits for skew: 1234
	1 hits for skew: 98706
13 hits for product: p3
	9 hits for skew: 234345
	3 hits for skew: 234
	1 hits for skew: 2343
8 hits for product: p6
	8 hits for skew: 23467

If later products have the same number of hits as the product in 3rd place, those products are listed too. Likewise, if later skews within a product have the same number of hits as the skew in 5th place for that product, those skews are listed too.
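The tie rule can be tried on its own with a few lines of input. This is a minimal sketch, not part of the script above: the function name top_with_ties and the sample counts are made up, and the input is assumed to be "name|count" lines already sorted by descending count.

```shell
#!/bin/sh
# top_with_ties N: read "name|count" lines already sorted by descending count
# and print the first N lines plus any later lines tied with the Nth.
top_with_ties() {
        awk -F'|' -v n="$1" '
        NR > n && $2 != prev { exit }   # past N and no longer tied: stop
        { print; prev = $2 }            # remember the last printed count
        '
}
# Hypothetical counts, not taken from the thread.
result=$(printf '%s\n' 'a|9' 'b|7' 'c|5' 'd|5' 'e|2' | top_with_ties 3)
echo "$result"
```

With these counts, d is printed along with the top 3 because it ties with c, and e is dropped.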

I haven't tested this on a file with millions of entries, but it works the way I expected with a file containing a few hundred entries.
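To check the counting step by hand before running the whole script on real data, a tiny input is enough. The sketch below uses made-up input lines and a variable name (counts) of my own. Unlike the script above, which keeps a separate pl list of skews per product, it recovers the product and skew by splitting the array key on SUBSEP.

```shell
#!/bin/sh
# Count products (field 1) and product/skew pairs (fields 1 and 2).
# The input lines are made up for illustration.
counts=$(printf '%s\n' 'p1|100' 'p1|100' 'p1|200' 'p2|300' |
awk -F'|' '
{       p[$1]++         # hits per product
        s[$1, $2]++     # hits per product/skew pair
}
END {   for (k in s) {
                split(k, part, SUBSEP)  # recover product and skew from the key
                print part[1] "|" part[2] "|" s[k] "|" p[part[1]]
        }
}' | sort)
# Each output line is product|skew|pair hits|product hits.
echo "$counts"
```

The sort at the end is only there to make the output order predictable, since for (k in s) visits keys in an unspecified order.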
