Grouping and Subgrouping using awk


 
# 1  
Old 08-31-2015

I have data that looks like this:

Code:
1440993600|L|ABCDEF
1440993600|L|ABCD
1440993601|L|ABCDEF
1440993602|L|ABC
1440993603|L|ABCDE
.
.
.

1441015200|L|AB
1441015200|L|ABC
1441015200|L|ABCDEF

So basically, $1 is an epoch timestamp, and $2 and $3 are application data.

Following one of the threads here, I am using this code to count the records per interval, based on the 1st column:

Code:
awk -F "|" '{while ($1>=t*w) t++; A[t]++} END {for (i=1;i<=t;i++) print "    "strftime("%F %T",(i-1)*w)" to "strftime("%F %T",i*w-1)"|"(A[i])}' w=3600 ${work_file} |tail


What I need is a sub-group based on the 3rd column as well.
That is, if the interval is 3600 seconds (1 hour), the output should be:
Code:
1440993600 to 1440997200|ABCDEF|2
1440993600 to 1440997200|ABCD|1
1440993600 to 1440997200|ABC|1
1440993600 to 1440997200|ABCDE|1
1440997200 to 1441000800|ABCDE|12
1440997200 to 1441000800|ABC|3
1441000800 to 1441004400|ABCD|5
1441000800 to 1441004400|ABCDE|3
1441000800 to 1441004400|ABCDEF|7
.
.
.

# 2  
Old 08-31-2015
Any attempts from your side?


Howsoever, try
Code:
awk -F\| '
                {INTV=int($1/3600)
                 I[INTV]++
                 AD[$3]++
                 T[INTV,$3]++
                }
END             {for (i in I) for (a in AD) if (T[i,a]) print i*3600 " to " (i+1)*3600, a, T[i,a]
                }
' OFS="|" file

Output:
1441015200 to 1441018800|AB|1
1441015200 to 1441018800|ABCDEF|1
1441015200 to 1441018800|ABC|1
1440993600 to 1440997200|ABCDEF|2
1440993600 to 1440997200|ABCDE|1
1440993600 to 1440997200|ABCD|1
1440993600 to 1440997200|ABC|1
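
A note on the END block above: it tests every interval against every distinct $3 value, and merely referencing T[i,a] creates an empty element for each pair that never occurred. Below is a minimal sketch of a variant that walks only the combined keys and splits them on SUBSEP. The function name group_by_interval and the trailing sort are assumptions of mine, not something from the thread. Because each record is bucketed on its own with int($1/w), the input does not have to be sorted.

```shell
# group_by_interval is a hypothetical name; the width defaults to 3600 seconds.
group_by_interval() {
    awk -F'|' -v OFS='|' -v w="${1:-3600}" '
    { T[int($1 / w), $3]++ }              # key: interval index SUBSEP value of $3
    END {
        for (key in T) {                  # visits only the pairs that occurred
            split(key, part, SUBSEP)      # part[1] = interval index, part[2] = $3
            lo = part[1] * w              # start of the interval in epoch seconds
            hi = lo + w                   # start of the next interval
            print lo " to " hi, part[2], T[key]
        }
    }' | sort -t'|' -k1,1 -k2,2           # for-in order is unspecified, so sort
}
```

Usage would be something like group_by_interval 3600 < file.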

# 3  
Old 08-31-2015
Another awk:
Code:
awk '$1>i{i=$1+3600} {A[i FS $3]++} END{for(i in A) {$0=i; print $1-3600 " to " $0, A[i]}}' FS=\| OFS=\| file | sort


Last edited by Scrutinizer; 08-31-2015 at 03:11 PM..
# 4  
Old 09-01-2015
Thanks All

For the same data file, I am trying to group by interval using the command I mentioned above:

Code:
awk -F "|" '{while ($1>=t*w) t++; A[t]++} END {for (i=1;i<=t;i++) print "    "strftime("%F %T",(i-1)*w)" to "strftime("%F %T",i*w-1)"|"(A[i])}' w=3600 ${work_file} |tail

This code behaves correctly for a small data set.
Once the file has some 2M records, it no longer gives correct results.

For example, when I run it over 2M records of the same kind of data as above, I get this output:

Code:
    2015-09-01 00:00:00 to 2015-09-01 00:59:59|377387
    2015-09-01 01:00:00 to 2015-09-01 01:59:59|372157
    2015-09-01 02:00:00 to 2015-09-01 02:59:59|386135
    2015-09-01 03:00:00 to 2015-09-01 03:59:59|335708
    2015-09-01 04:00:00 to 2015-09-01 04:59:59|382802
    2015-09-01 05:00:00 to 2015-09-01 05:59:59|6449915

The last count, 6449915, is not correct.
It appears to be
Code:
total records in the file - (377387 + 372157 + 386135 + 335708 + 382802)
which is clearly wrong.


Last edited by hemanty4u; 09-01-2015 at 11:57 AM.. Reason: Missed some statements
# 5  
Old 09-01-2015
@Scrutinizer: brilliant and terse. It works well if the first epoch time in a group is an integer multiple of 3600, but the entire output range shifts if that time has some additional seconds in it.

@hemanty4u: With the data as given, t is computed by adding 1 repeatedly, i.e. 400277 loop iterations, while a single division yields the same result. And in the END section you print your output line 400283 times. No surprise the run time is that long! Try for (a in A) print a, A[a] in the END section.
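
The suggestion above could be sketched roughly like this: one division per record instead of the while loop, and a for-in loop in END so that only intervals that actually hold data are printed. The function name count_per_interval is hypothetical, and the epoch bounds are printed as plain numbers; where awk provides strftime (gawk does), they can be wrapped in strftime("%F %T", ...) as in the original command. The thread does not say whether the 2M-record file is sorted by $1. If it is not, that might explain the inflated last count, because the while loop only ever advances t, so an earlier timestamp that turns up late is counted in the current interval. Bucketing each record by division does not depend on input order.

```shell
# count_per_interval is a hypothetical name; the width defaults to 3600 seconds.
count_per_interval() {
    awk -F'|' -v OFS='|' -v w="${1:-3600}" '
    { A[int($1 / w)]++ }                  # one division replaces the while loop
    END {
        for (a in A) {                    # only intervals that contain records
            lo = a * w                    # first second of the interval
            hi = lo + w - 1               # last second of the interval
            print lo " to " hi, A[a]
        }
    }' | sort -n                          # for-in order is unspecified, so sort
}
```

Usage would be something like count_per_interval 3600 < ${work_file}.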

Last edited by RudiC; 09-02-2015 at 04:26 AM..
# 6  
Old 09-01-2015
@RudiC. Good point, thanks. So this adaptation should improve things:
Code:
awk '$1>i{i=int($1/t+0.5)*t+t} {A[i FS $3]++} END{for(i in A) {$0=i; print $1-t " to " $0, A[i]}}' t=3600 FS=\| OFS=\| file | sort
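
One caveat with the adaptation above: int($1/t+0.5) rounds to the nearest multiple of t, so a group whose first timestamp falls in the second half of an interval is labelled with the following interval. Also, $1>i keeps a record that sits exactly on a boundary in the earlier window. Here is a minimal sketch of a variant that rounds down and uses >= instead. It still assumes the input is sorted by $1, and the function name group_sorted and the variable names are mine, not from the thread.

```shell
# group_sorted is a hypothetical name; the width defaults to 3600 seconds.
# Assumes the input on stdin is sorted by the epoch timestamp in $1.
group_sorted() {
    awk -F'|' -v OFS='|' -v t="${1:-3600}" '
    $1 >= end { end = int($1 / t) * t + t }   # round down to the window start, add one width
              { A[end FS $3]++ }              # key: window end, field separator, value of $3
    END {
        for (k in A) {
            split(k, part, FS)                # part[1] = window end, part[2] = $3
            lo = part[1] - t                  # window start
            print lo " to " part[1], part[2], A[k]
        }
    }' | sort -t'|' -k1,1 -k2,2
}
```

Usage would be something like group_sorted 3600 < file.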
