Sponsored Content
Top Forums UNIX for Dummies Questions & Answers awk Grouping and Subgrouping with Counts Post 302776483 by JoshCrosby on Wednesday 6th of March 2013 11:52:20 AM
Old 03-06-2013
Question

Quote:
Originally Posted by Don Cragun
A bit more, yes. But still not clear. In your sample input there are 6 occurrences each of products p1, p2, p3, and p4. You say you want the top 3 products, but your sample output only shows 2. (And since there are four products with the same number of occurrences, you don't say how to choose which 3 of those 4 should be chosen.) Your sample output didn't show the top 2 product/skew pairs p1/12345 and p2/23456 both of which appear 6 times even though p1 and p2 appear the same number of times as p3 and p4???
I apologize, I should have included a bit more data.

Quote:
From what you did with your 1 count sample, you chose the last two of the set of the four most common products based on the fact that their product names sorted last. Is that really what you want?
I didn't sort the output, so my bad.

Quote:
If there are ties, should your results include all products that match the number of occurrences of the third most common product? If there are ties in the number of appearances of a skew within a product, should the results include all skews with the fifth most common skew within that product?
Yes to the last question, if there are ties (skews) from the products, then those should rollup under that product.

Quote:
Will a single skew ever appear with more than one product, or are skews supposed to be unique to a product.
Not necessarily, a different product can have the same skew, even though it may mean something completely different.

Here is some more data, I hope this is enough to give you an idea. Like I said, these files are huge and contain some pretty sensitive data otherwise I would upload the entire file.

Thanks again with your help!!!!
Code:
product|skew
p1|12345
p2|23456
p3|234
p4|98707
p1|12345
p2|23456
p3|2343
p4|98706
p1|12345
p2|23456
p3|234
p5|36748
p4|98708
p1|12345
p2|23456
p3|234
p4|98708
p1|12345
p6|23467
p2|23456
p3|234345
p4|98708
p1|12345
p2|23456
p3|234345
p4|98707
p3|234345
p4|98707
p3|234345
p4|98707
p3|234345
p4|98707
p3|234345
p4|98707
p3|234345
p4|98707
p3|234345
p4|98707
p3|234345
p4|98707
p6|23467
p6|23467
p6|23467
p6|23467
p6|23467
p6|23467
p6|23467
p4|098
p4|098
p4|098
p4|098
p4|1234
p4|1234

In SQL Server, it may look like this...
Code:
-- create temporary table to hold the top 3 products
CREATE TABLE #temp_P
  (
   id INT IDENTITY(1, 1)
		  PRIMARY KEY
 , prod_id INT
 , pageviews INT
  )
INSERT	INTO [#temp_P]
		([prod_id]
	   , [pageviews])
		SELECT TOP 3
				[prod_id]
			  , COUNT(*) AS [Page Views]
		FROM	[dbo].[PageViews]
		GROUP BY [product-id]
		ORDER BY COUNT(*) DESC;

-- create temporary table to hold the top 5, by 3 products
CREATE TABLE #temp_S
  (
   id INT IDENTITY(1, 1)
		  PRIMARY KEY
 , skew_id VARCHAR(50)
 , pageviews INT
 , prod_id INT
  )
DECLARE	@counter INT
, @prod_id INT
SET @counter = 1
WHILE @counter <= (SELECT COUNT (id) FROM [#temp_P]) 
  BEGIN
	SET @prod_id = (SELECT [prod_id] FROM [#temp_P] WHERE id = @counter)
	 -- iterate through product ids and insert skews 
	BEGIN	
	  PRINT 'Inserting top 5 skews for: ' + CAST(@prod_id AS VARCHAR(100))
	  INSERT  INTO [#temp_S]
			  ([skew_id]
			 , [prod_id]
			 , [pageviews])
			  SELECT TOP 5
					  [product-id]
					, @prod_id
					, COUNT(*) AS [pageviews]
			  FROM	  [dbo].[PageViews]
			  WHERE	  [product-id] = @prod_id
			  GROUP BY [product-id]
			  ORDER BY COUNT(*) DESC;
	END

	SET @counter = @counter + 1    
  END

SELECT	[prod_id]
	  , [skew_id]
	  , [pageviews]
FROM	[#temp_S]
GROUP BY [prod_id]
	  , [skew_id]
	  , [pageviews]
ORDER BY [prod_id]
	  , [pageviews] DESC

 

10 More Discussions You Might Find Interesting

1. UNIX for Dummies Questions & Answers

counts

How can i do a simple record count in my shell script? i just want to count the number of records i receive from a specific file. (11 Replies)
Discussion started by: k@ssidy
11 Replies

2. UNIX for Dummies Questions & Answers

counts

To start I have a table that has ticketholders. Each ticket holder has a unique number and each ticket holder is associated to a so called household number. You can have multiple guests w/i a household. I would like to create 3 flags (form a, for a household that has 1-4 gst) form b 5-8 gsts... (3 Replies)
Discussion started by: sbr262
3 Replies

3. Shell Programming and Scripting

Grouping using sed/awk ?

I run awk cat $1|awk '{print $6}' and get a lot of results and I want results to group them. For example my result is (o/p is unknown to user) xyz xyz abc pqr xyz pqr etc I wanna group them as xyz=total found 7 abc=total .... pqr= Thank (3 Replies)
Discussion started by: pujansrt
3 Replies

4. Shell Programming and Scripting

awk grouping by name script

Hello I am trying to figure out a script which could group a log file by user names. I worked with awk command and I could trim the log file to: <USER: John Frisbie > /* Thu Aug 06 2009 15:11:45.7974 */ FLOAT GRANT WRITE John Frisbie (500 of 3005 write) <USER: Shawn Sanders > /* Thu Aug 06... (2 Replies)
Discussion started by: Avto
2 Replies

5. Shell Programming and Scripting

AWK script to create max value of 3rd column, grouping by first column

Hi, I need an awk script (or whatever shell-construct) that would take data like below and get the max value of 3 column, when grouping by the 1st column. clientname,day-of-month,max-users ----------------------------------- client1,20120610,5 client2,20120610,2 client3,20120610,7... (3 Replies)
Discussion started by: ckmehta
3 Replies

6. Shell Programming and Scripting

awk and perl grouping.

Hello folks. After awk, i have decided to start to learn perl, and i need some help. I have following output : 1 a 1 b 2 k 2 f 3 s 3 p Now with awk i get desired output by issuing : awk ' { a = a FS $2 } END { for ( i in a) print i,a }' input 1 a b 2 k f 3 s p Can... (1 Reply)
Discussion started by: Peasant
1 Replies

7. Shell Programming and Scripting

grouping using sed or awk

I have below inside a file. 11.22.33.44 user1 11.22.33.55 user2 I need this manipulated as alias server1.domain.com='ssh user1@11.22.33.44' alias server2.domain.com='ssh user2@11.22.33.55' (3 Replies)
Discussion started by: anil510
3 Replies

8. UNIX for Dummies Questions & Answers

awk adding counts together from column

Hello Im new treat me nicely, I have a headache :) I have a script that seemed to work now it doesnt anyway, the last part is adding counts of unique items in a csv file eg 05492U34 38 05492U34 47 two columns, (many different values like this in file) i want... (7 Replies)
Discussion started by: aniquebmx
7 Replies

9. Shell Programming and Scripting

Grouping and Subgrouping using awk

I have a data which looks like 1440993600|L|ABCDEF 1440993600|L|ABCD 1440993601|L|ABCDEF 1440993602|L|ABC 1440993603|L|ABCDE . . . 1441015200|L|AB 1441015200|L|ABC 1441015200|L|ABCDEF So basically, the $1 is epoch date, $2 and $3 is some application data From one if the... (5 Replies)
Discussion started by: hemanty4u
5 Replies

10. Shell Programming and Scripting

Output counts of all matching strings lessthan a number using awk

The awk below is supposed to count all the matching $5 strings and count how many $7 values is less than 20. I don't think I need the portion in bold as I do not need any decimal point or format, but can not seem to get the correct counts. Thank you :). file chr5 77316500 77316628 ... (6 Replies)
Discussion started by: cmccabe
6 Replies
All times are GMT -4. The time now is 08:25 AM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy