awk Grouping and Subgrouping with Counts


 
# 8  
Old 03-06-2013
Quote:
Originally Posted by JoshCrosby
This looks awesome!! I have a really dumb question, though: in the variables, is it expecting two files, one with just the products and counts and one with the skews, products, and counts?
No. The variables PCF and SCF in the shell (and pcf and scf inside awk) are the names of temp files the script uses to store a sorted list of products and a sorted list of skews for a selected product, respectively, while producing output in the END actions. The input data comes from a single file named file, shown in this excerpt from the last few lines of the shell script:
Code:
        close(pcf)
        exit(ec + 0)
}' file
if [ $? -eq 0 ]
then    # awk completed successfully...

Obviously, you can change file to any other filename you want to use. Or you could change it to "$1" and pass the name of the file you want to process as the only argument to the shell script.
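For example, that change could look like this; only the tail end of the script is shown, matching the excerpt above, and the script name in the sample invocation below is made up:
Code:
        close(pcf)
        exit(ec + 0)
}' "$1"
if [ $? -eq 0 ]
then    # awk completed successfully...

You would then run it as something like ./group_counts.sh products.txt.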

This script doesn't need to sort the entire input file, but with millions of input lines these temporary sort result files could still be large. If they are too big to save in the directory where you run this script, you could add an option to the shell script to specify a different directory for these temp files.
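A minimal sketch of such an option, assuming the temp file names live in the shell variables PCF and SCF mentioned earlier; the -d option, the TMPDIR fallback, and the exact file names here are illustrative assumptions, not part of the posted script:
Code:
tmpdir=${TMPDIR:-.}     # default: keep the temp files in the current directory
while getopts d: opt
do      case "$opt" in
        (d)     tmpdir=$OPTARG;;
        (*)     echo "Usage: ${0##*/} [-d tempdir] file" >&2
                exit 2;;
        esac
done
shift $((OPTIND - 1))
PCF="$tmpdir/.product_list.$$"  # temp file: sorted list of products
SCF="$tmpdir/.skew_list.$$"     # temp file: sorted skews for a selected product

The rest of the script would then refer to "$PCF" and "$SCF" wherever it creates or reads those files.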

Let me know if you're still confused.


Quote:
Originally Posted by JoshCrosby
... ... ...

Works perfectly!!!

For those who want to know:

To create the .Skew_Counts file, use this one-liner:
Code:
awk -F"|" '$1 ~/p[0-9]/ { p[$2]++ }END{for (n in p) print n, p[n]}' products.txt >.Skew_Counts

To create the .Product_Counts file, use this:
Code:
awk -F"|" '$1 ~/p[0-9]/ { p[$1]++ }END{for (n in p) print n, p[n]}' products.txt > .Product_Counts

Don't forget to change file to products.txt at the end of the awk command in the script.

Don - HUGE THANK YOU!!!!!!!

By the way, I'm using a Mac, so I don't have KornShell; I used Bash without issue.
The code you have above will create the .Product_Counts file used by the script (before it is sorted in reverse order by the number of hits for the product and in increasing order of product name), but the script produces a .Skew_Counts file for each product entry in the top 3 list; it never produces the entire list of skews. (You did say that some skews could appear with more than one product.) The lists of skews produced by my script only show the skew counts for each of the displayed products, skipping all occurrences of the skew for other products.
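To make that distinction concrete, here is a made-up run; the sample lines are hypothetical, not from the real products.txt, and since the order of awk's for (n in p) loop is unspecified, the output order may differ on your system:
Code:
printf 'p1|s10\np1|s10\np1|s20\np2|s10\n' > sample.txt

# The one-liner: one global count per skew, no matter which product it came with.
awk -F"|" '$1 ~ /p[0-9]/ { p[$2]++ } END { for (n in p) print n, p[n] }' sample.txt
#   s10 3
#   s20 1

# The script instead counts each skew separately under each product.
awk -F"|" '$1 ~ /p[0-9]/ { s[$1, $2]++ }
        END { for (k in s) { split(k, a, SUBSEP); print a[1], a[2], s[k] } }' sample.txt
#   p1 s10 2
#   p1 s20 1
#   p2 s10 1

So a skew list printed for p1 reports s10 with a count of 2, not the 3 occurrences seen across the whole file.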

Note that I developed and tested this on my MacBook Pro. You should have the KornShell available as /bin/ksh on any recent version of OS X.
# 9  
Old 03-06-2013
No kidding, I wasn't aware that ksh was on the Mac; I never really looked, though.

One question, though, for my understanding: would you mind explaining this piece? I know it's reading the products.txt file; I'm just trying to understand it a bit better.

Code:
{       # Process input data...
        # Increment # of times we have seen this product.
        p[$1]++
        if(!(($1, $2) in s))
                # Add a new skew for this product.
                pl[$1, ++plc[$1]] = $2
        # Increment # of times we have seen this skew with this product.
        s[$1, $2]++
}

Thanks again!
# 10  
Old 03-07-2013
Quote:
Originally Posted by JoshCrosby
No kidding, I wasn't aware that ksh was on the Mac; I never really looked, though.

One question, though, for my understanding: would you mind explaining this piece? I know it's reading the products.txt file; I'm just trying to understand it a bit better.

Code:
1{       # Process input data...
2        # Increment # of times we have seen this product.
3        p[$1]++
4        if(!(($1, $2) in s))
5                # Add a new skew for this product.
6                pl[$1, ++plc[$1]] = $2
7        # Increment # of times we have seen this skew with this product.
8        s[$1, $2]++
9}

Thanks again!
I added line numbers to the code above to make it easier to refer to individual lines in this discussion.
  1. Since there is no pattern before the opening brace on line 1, the action on lines 1 through 9 is executed for every line read from the input file.
  2. Line 3 adds 1 to p[$1]. $1 is the product from the input file. So, p[$1] is the number of times that the product specified in the first field of this input line has been seen so far in the input file.
  3. Line 4 tests whether or not the skew listed in the 2nd column of this input line has already been seen with this product on any earlier line read from the input file.
  4. If this line is the first line that has this skew ($2) for this product ($1), line 6 increments the number of different skews that have been seen with this product (++plc[$1]) and saves this skew ($2) in the list of skews associated with this product (pl[$1, plc[$1]] = $2).
  5. Then line 8 increments the number of times this skew has been seen with this product (s[$1, $2]++).
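Putting those arrays together, the following is only a rough sketch of how p[], plc[], pl[], and s[] could be walked in an END action to print each product with its skews; it is not the actual END action from the script, which also sorts by count and limits the output to the top 3 products:
Code:
END {
        # Illustration only: walk every product and the skews recorded for it.
        for (prod in p) {
                print prod, p[prod]                     # product and # of lines seen
                for (i = 1; i <= plc[prod]; i++) {
                        skew = pl[prod, i]              # i-th distinct skew for prod
                        print "\t" skew, s[prod, skew]  # skew and its count under prod
                }
        }
}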
# 11  
Old 03-07-2013
Thank you again!
 