Counting multiple entries in a file using awk

09-23-2010


Hi,

I have a big file (~960 MB) of epoch time values (~50 million entries), which looks like this:

897393601
897393601
897393601
897393601
897393602
897393602
897393602
897393602
897393602
897393603
897393603
897393603
897393603

and so on. Each timestamp occurs more than once.

I want an AWK / SED program (as the file is considerably big) that reads this file, counts the number of entries within a fixed interval given by the user (e.g. 2 hrs or 7200 secs), and writes an output file that looks something like this:

first 2 hrs X entries
next 2 hrs Y entries
next 2 hrs Z entries
.
.
.
and so on
where "first 2 hrs" means the interval from the start time (897393601) to start time + time interval (7200), and so on.

I have written a bash script that does this, but it is far too slow for such a big file, so I am looking for a solution in AWK or SED.

Any quick help will be highly appreciated.

Thanks !

sajal.bhatia

09-24-2010

This should get you started. You'll have to add a conversion if you want to let the user supply "2 hours" rather than 7200 seconds. I'm not sure how fast this will be, as I don't have the patience tonight to create a large data set, but it will likely be faster than bash.

Code:

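#!/usr/bin/env ksh

# Reads timestamps from stdin; the window length in seconds is $1 (default 7200).
# Each window starts at the first timestamp past the previous window,
# so windows with no entries are skipped rather than reported as 0.
awk -v window="${1:-7200}" '
        {
                if( $1 > end_window )    # reached the end of the time window
                {
                        if( idx )              # if not the first record, print count for the window just closed
                                printf( "range %d: %.0f values\n", idx, count );
                        idx++;                 # number of the window that starts here
                        count = 0;           # reset count and set next end of window

                        end_window = $1 + window;
                }

                count++;    # count observations in this window
        }

        END {
                if( idx )                # nothing to report when there was no input
                        printf( "range %d: %.0f values\n", idx, count );   # print count in progress as we reach eof
        }
'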
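
For the conversion mentioned above, a rough sketch in plain shell could look like this. The to_seconds name and the h/m/s suffix convention are assumptions made for illustration and are not part of the script above.

```shell
# to_seconds: hypothetical helper, not part of the original script.
# Accepts a plain number of seconds, or a number followed by h, m, or s.
to_seconds() {
    case "$1" in
        *h) echo $(( ${1%h} * 3600 )) ;;   # hours to seconds
        *m) echo $(( ${1%m} * 60 )) ;;     # minutes to seconds
        *s) echo "${1%s}" ;;               # explicit seconds, strip the suffix
        *)  echo "$1" ;;                   # no suffix: assume seconds
    esac
}

to_seconds 2h      # prints 7200
to_seconds 90m     # prints 5400
to_seconds 7200    # prints 7200
```

The result could then be handed to the script as its first argument, for example with "$(to_seconds 2h)".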

Forgot to mention that this reads from stdin and counts duplicate timestamps. If you need to drop the duplicates, the easy way out is to run

Code:

sort -u

and pipe its output into the awk script.
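
As an illustration of that de-duplication step, here is a small sketch. count_distinct is a made-up name, and -n is added so that sort compares the values numerically. Since the sample already looks sorted, plain uniq might be enough, but that is an assumption about the real file.

```shell
# count_distinct: hypothetical wrapper; reads timestamps on stdin
# and prints how many distinct values remain after sort -u.
count_distinct() {
    sort -n -u | awk '{ n++ } END { print n + 0 }'
}

# Five lines, three distinct timestamps.
printf '%s\n' 897393601 897393601 897393602 897393602 897393603 | count_distinct   # prints 3
```

The same "sort -n -u |" prefix can be put in front of the windowed awk script, in which case each window counts distinct timestamps instead of lines.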

agama

09-24-2010

I guess some intervals may have no entries at all. To test that, I changed two lines of the sample input file.

Code:

awk '
NR==1{start=$1} 
{t=int(($1-start)/7200);a[t]++;s=(t>s)?t:s}
END{
        print "first 2 hours", a[0] , "entries"
        for (i=1;i<=s;i++) print "next 2 hours", (a[i])?a[i]:"0" , "entries"
    }' infile

Output:

first 2 hours 11 entries
next 2 hours 0 entries
next 2 hours 0 entries
next 2 hours 2 entries
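
The fixed-interval idea above can also be written with the window length as a parameter instead of a hard-coded 7200. This is only a sketch: count_windows is a made-up name, the input is assumed to be in ascending order, and the output wording differs from the original.

```shell
# count_windows: hypothetical name; usage: count_windows SECONDS < infile
count_windows() {
    awk -v window="${1:-7200}" '
    NR == 1 { start = $1 }                  # first timestamp anchors window 0
    {
        t = int(($1 - start) / window)      # index of the fixed window for this line
        a[t]++
        if (t > last) last = t              # remember the highest window seen
    }
    END {
        if (NR == 0) exit                   # no input, nothing to report
        for (i = 0; i <= last; i++)         # empty windows print as 0
            printf "window %d: %d entries\n", i + 1, a[i]
    }'
}

# Small made-up input with a 10-second window.
printf '%s\n' 100 100 105 109 131 | count_windows 10
# window 1: 4 entries
# window 2: 0 entries
# window 3: 0 entries
# window 4: 1 entries
```

Only one counter per window is kept in memory, so the array stays small even for a file with tens of millions of lines.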

rdcwayx

09-24-2010

Hi, this script gives syntax errors when I run it. Can you help me fix them? I am new to AWK.

When I run test.awk with the command awk -f test.awk input_file.txt, this is the error I get:

awk: test.awk:3: awk -v window=${1:-7200} '
awk: test.awk:3: ^ syntax error
awk: test.awk:3: awk -v window=${1:-7200} '
awk: test.awk:3: ^ invalid char ''' in expression

Please help !

sajal.bhatia

09-24-2010

Quote:

Originally Posted by sajal.bhatia

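You can run agama's code directly in the shell; there is no need to pass it to awk again with -f. The file contains a shell command that calls awk, so awk -f tries to parse that shell line as awk code, which causes the syntax error.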

Code:

awk -v window="${1:-7200}" '
        {
                if( $1 > end_window )    # reached the end of the time window
                {
                        if( idx )              # if not the first record, print count for the window just closed
                                printf( "range %d: %.0f values\n", idx, count );
                        idx++;                 # number of the window that starts here
                        count = 0;           # reset count and set next end of window

                        end_window = $1 + window;
                }

                count++;    # count observations in this window
        }

        END {
                if( idx )                # nothing to report when there was no input
                        printf( "range %d: %.0f values\n", idx, count );   # print count in progress as we reach eof
        }
' input_file.txt
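
If you would rather keep the awk -f form, the file has to contain only the awk program, and the window has to be passed with -v, because ${1:-7200} is shell syntax. A minimal sketch, assuming a made-up file name count.awk in a temporary directory and a tiny made-up input:

```shell
# count.awk is a hypothetical file name; it holds only awk code, no shell wrapper.
tmpdir=$(mktemp -d)
cat > "$tmpdir/count.awk" <<'EOF'
{
    if ($1 > end_window) {               # this line starts a new window
        if (idx) printf("range %d: %d values\n", idx, count)
        idx++
        count = 0
        end_window = $1 + window         # window comes from -v on the command line
    }
    count++
}
END { if (idx) printf("range %d: %d values\n", idx, count) }
EOF

# Four timestamps, 10-second window.
out=$(printf '%s\n' 100 100 105 200 | awk -v window=10 -f "$tmpdir/count.awk")
printf '%s\n' "$out"
# range 1: 3 values
# range 2: 1 values

rm -rf "$tmpdir"
```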

rdcwayx

09-24-2010

Thanks a lot :-)

sajal.bhatia

09-24-2010

Another approach:

Code:

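awk '!e{e=$1+7200}     # the first line sets the end of range 1
# close every range that ended before this line, including empty ones
{while($1-e>0){print "Range " ++i , (c+0) " entries"; e+=7200; c=0}}
{c++}
END{print "Range " ++i , (c+0) " entries"}
' file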
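
To try any of the scripts in this thread on something bigger than the sample, a test file can be generated with awk itself. This is only a sketch: gen_timestamps is a made-up name, and four entries per second is an arbitrary assumption about the data.

```shell
# gen_timestamps: hypothetical helper; prints N timestamps in ascending order,
# four per second, starting at the first value from the sample (897393601).
gen_timestamps() {
    awk -v n="${1:-1000}" 'BEGIN { for (i = 0; i < n; i++) print 897393601 + int(i / 4) }'
}

# 20 lines at four per second cover 5 distinct seconds.
gen_timestamps 20 | awk '{ c[$1]++ } END { for (t in c) n++; print n, "distinct seconds" }'
# 5 distinct seconds
```

Redirecting a larger run, such as gen_timestamps 50000000, to a file gives an input of roughly the size described in the first post, which can then be timed with the shell's time command.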

Franklin52
