Grouping and Subgrouping using awk

08-31-2015

I have data that looks like this:

Code:

1440993600|L|ABCDEF
1440993600|L|ABCD
1440993601|L|ABCDEF
1440993602|L|ABC
1440993603|L|ABCDE
.
.
.

1441015200|L|AB
1441015200|L|ABC
1441015200|L|ABCDEF

So basically, $1 is an epoch timestamp, and $2 and $3 are application data.

Following one of the threads here, I am using this code to group the first column into fixed intervals:

Code:

awk -F "|" '{while ($1>=t*w) t++; A[t]++} END {for (i=1;i<=t;i++) print "    "strftime("%F %T",(i-1)*w)" to "strftime("%F %T",i*w-1)"|"(A[i])}' w=3600 ${work_file} |tail

What I need is a subgroup based on the 3rd column as well.
That is, if the interval is, say, 3600 seconds (1 hour), the output should be:

Code:

1440993600 to 1440997200|ABCDEF|2
1440993600 to 1440997200|ABCD|1
1440993600 to 1440997200|ABC|1
1440993600 to 1440997200|ABCDE|1
1440997200 to 1441000800|ABCDE|12
1440997200 to 1441000800|ABC|3
1441000800 to 1441004400|ABCD|5
1441000800 to 1441004400|ABCDE|3
1441000800 to 1441004400|ABCDEF|7
.
.
.

hemanty4u

08-31-2015

Any attempts from your side?

---------- Post updated at 17:29 ---------- Previous update was at 17:28 ----------

Anyhow, try:

Code:

awk -F\| '
                {INTV=int($1/3600)
                 I[INTV]++
                 AD[$3]++
                 T[INTV,$3]++
                }
END             {for (i in I) for (a in AD) if (T[i,a]) print i*3600 " to " (i+1)*3600, a, T[i,a]
                }
' OFS="|" file
1441015200 to 1441018800|AB|1
1441015200 to 1441018800|ABCDEF|1
1441015200 to 1441018800|ABC|1
1440993600 to 1440997200|ABCDEF|2
1440993600 to 1440997200|ABCDE|1
1440993600 to 1440997200|ABCD|1
1440993600 to 1440997200|ABC|1
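
RudiC's END block loops over every interval and every distinct $3 value and prints only the pairs that occurred. A minimal sketch of an alternative, assuming the same pipe-separated input: it iterates over the combined array directly, splitting each key on SUBSEP (the separator awk inserts between the subscripts of T[a,b]), and sorts the result by interval and then by $3. The function name group_by_hour and its optional width argument are hypothetical, not part of the thread.

```shell
#!/bin/sh
# Hypothetical helper: count records per (interval, $3) pair.
# Reads pipe-separated lines on stdin: epoch|flag|value; $1 = width in seconds.
group_by_hour() {
    awk -F'|' -v OFS='|' -v w="${1:-3600}" '
        {
            start = int($1 / w) * w     # start of the interval containing $1
            T[start, $3]++              # composite key: start SUBSEP $3
        }
        END {
            for (k in T) {
                split(k, p, SUBSEP)     # p[1] = interval start, p[2] = $3 value
                e = p[1] + w            # interval end
                print p[1] " to " e, p[2], T[k]
            }
        }' | LC_ALL=C sort -t'|' -k1,1 -k2,2
}
```

For example, printf '%s\n' '1440993600|L|ABCDEF' '1440993601|L|ABCDEF' | group_by_hour 3600 prints 1440993600 to 1440997200|ABCDEF|2.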

RudiC

08-31-2015

Another awk:

Code:

awk '$1>i{i=$1+3600} {A[i FS $3]++} END{for(i in A) {$0=i; print $1-3600 " to " $0, A[i]}}' FS=\| OFS=\| file | sort
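
The END block relies on a small trick: assigning a string to $0 makes awk split it again with the current FS, so for a stored key such as 1440997200|ABCDEF, $1 is the group end and $0 is still the whole key. A minimal illustration, with a hypothetical helper name key_to_range and the interval width passed as an optional second argument:

```shell
#!/bin/sh
# Hypothetical helper showing how assigning to $0 re-splits a stored key.
# $1 = key such as "1440997200|ABCDEF", $2 = interval width (default 3600).
key_to_range() {
    awk -v FS='|' -v OFS='|' -v key="$1" -v w="${2:-3600}" 'BEGIN {
        $0 = key                 # re-split on FS: $1 becomes the group end
        start = $1 - w           # group start
        print start " to " $0    # start, then the whole key unchanged
    }'
}
```

For example, key_to_range '1440997200|ABCDEF' prints 1440993600 to 1440997200|ABCDEF.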

Last edited by Scrutinizer; 08-31-2015 at 03:11 PM.

Scrutinizer

09-01-2015

Thanks, all.

For the same data file, I am trying to group using the command I mentioned above:

Code:

awk -F "|" '{while ($1>=t*w) t++; A[t]++} END {for (i=1;i<=t;i++) print "    "strftime("%F %T",(i-1)*w)" to "strftime("%F %T",i*w-1)"|"(A[i])}' w=3600 ${work_file} |tail

This code behaves correctly for a small data set in a file, but as soon as the file has around 2M records, it no longer gives correct results.

For example, with data like the sample above, when I run it over 2M records I get this output:

Code:

    2015-09-01 00:00:00 to 2015-09-01 00:59:59|377387
    2015-09-01 01:00:00 to 2015-09-01 01:59:59|372157
    2015-09-01 02:00:00 to 2015-09-01 02:59:59|386135
    2015-09-01 03:00:00 to 2015-09-01 03:59:59|335708
    2015-09-01 04:00:00 to 2015-09-01 04:59:59|382802
    2015-09-01 05:00:00 to 2015-09-01 05:59:59|6449915

The last count, 6449915, is not correct. It comes out as the total number of records in the file minus the sum of the other counts (377387 + 372157 + 386135 + 335708 + 382802), which is clearly wrong.
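
One way to get a last bucket that absorbs everything else (the thread does not confirm this is the cause here) is input that is not sorted by $1: the while loop only ever increases t, so a record older than the current interval is counted in the current one. A small check under that assumption, reusing the same loop on three illustrative timestamps and printing the raw bucket number t instead of formatted dates; the function name incremental_counts is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: the same incremental bucketing as the command above,
# printing "bucket_number|count" instead of formatted date ranges.
incremental_counts() {
    awk -F'|' -v OFS='|' -v w=3600 '
        { while ($1 >= t * w) t++; A[t]++ }   # t only ever moves forward
        END { for (k in A) print k, A[k] }' | LC_ALL=C sort
}
```

With the input 1440993600, 1440997200, 1440993601 (in that order), the third record belongs with the first one, yet it is counted in bucket 400278, so this prints 400277|1 and 400278|2 rather than 400277|2 and 400278|1.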

Last edited by hemanty4u; 09-01-2015 at 11:57 AM. Reason: Missed some statements

hemanty4u

09-01-2015

@Scrutinizer: brilliant and terse; it works well if the first epoch time in a group is an integer multiple of 3600, but it shifts the entire range output if that time has some additional seconds in it.

@hemanty4u: With the data as given, t is calculated by successive addition of 1, i.e. 400277 loop iterations, while one single division yields the same result. And in the END section, you print your output line 400283 times. No wonder the run time is that long! Try for (a in A) print a, A[a] in the END section.
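
A sketch of the hourly totals along the lines RudiC suggests: one int($1/w) division per record and an END loop over only the intervals that actually occur. As a side effect, the result no longer depends on the input being sorted. It prints epoch seconds rather than dates and pipes through sort, because for (s in A) visits keys in no particular order; the function name hourly_totals is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: record count per interval; $1 = width in seconds.
# Output lines look like "start to inclusive_end|count".
hourly_totals() {
    awk -F'|' -v OFS='|' -v w="${1:-3600}" '
        { A[int($1 / w) * w]++ }         # one division per record; key = interval start
        END {
            for (s in A) {               # only intervals that actually occur
                e = s + w - 1            # inclusive end, as in the original output
                print s " to " e, A[s]
            }
        }' | LC_ALL=C sort
}
```

If gawk is available, s and e can be formatted with strftime("%F %T", ...) as in the original command; strftime is a gawk extension, not part of POSIX awk.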

Last edited by RudiC; 09-02-2015 at 04:26 AM.

RudiC

09-01-2015

@RudiC. Good point, thanks. So this adaptation should improve things:

Code:

awk '$1>=i{i=int($1/t)*t+t} {A[i FS $3]++} END{for(i in A) {$0=i; print $1-t " to " $0, A[i]}}' t=3600 FS=\| OFS=\| file | sort
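
A minimal sketch of the boundary-anchored grouping wrapped as a shell function (the name group_sorted is hypothetical). It assumes the input is sorted by $1, because a group only closes when a record reaches its end. It uses int($1/t), which truncates: a record such as 1440995500, about 32 minutes into the hour, stays in the group ending at 1440997200, whereas int($1/t+0.5) would round it up into the following group. A record exactly on a boundary, such as 1440997200, starts a new group.

```shell
#!/bin/sh
# Hypothetical helper: boundary-anchored grouping for input sorted by $1.
# $1 = interval width in seconds (default 3600).
group_sorted() {
    awk -F'|' -v OFS='|' -v t="${1:-3600}" '
        $1 >= i { i = int($1 / t) * t + t }   # end of the interval holding $1
        { A[i FS $3]++ }                      # key: "end|value of $3"
        END {
            for (k in A) {
                $0 = k                        # re-split "end|value" on FS
                s = $1 - t                    # interval start
                print s " to " $0, A[k]
            }
        }' | LC_ALL=C sort -t'|' -k1,1 -k2,2
}
```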

Scrutinizer
