Counting entries in a file

Small tweaks to the original awk script to show the number of new IPs in the current bin compared with the previous bin.

#!/usr/bin/env ksh

awk -v bin_size=${1:-5} '
    function dump( )
    {
        if( NR == 1 )
            return;

        new_count = 0;
        for( u in unique )              # compute total in this bin that were not in last bin
            if( last_bin[u] == 0 )
                new_count++;

        printf( "%3d %3d %3d\n", bin, total, new_count );
    }

    {
        if( $1+0 >= next_bin )
        {
            dump( );
            bin++;                          # number of the bin that starts with this record
            next_bin = $1 + bin_size;

            delete last_bin;
            for( u in unique )              # copy hits from this bin
                last_bin[u] = 1;
            delete unique;
            total = 0;
        }

        unique[$2]++;
        total++;
    }

    END {
        if( total )
            dump( );
    }
'    # reads "timestamp IP" lines from standard input


Have fun!
It works!

Can you give a brief explanation? I am new to awk.
Glad it worked for you. I've added some comments. I'll watch the thread if you have specific questions.

awk -v bin_size=${1:-5} '
    # use a function so we can call it as we process input and at the end without duplicating the code
    function dump( )                # dump out the information that we collected about the last bin
    {
        if( NR == 1 )               # we will call this for the first record;
            return;                 # if this is the first record (NR equals 1) then we skip the print

        new_count = 0;
        for( u in unique )              # look at each unique IP we saved
            if( last_bin[u] == 0 )      # if it was not noticed last time, count it
                new_count++;

        printf( "%3d %3d %3d\n", bin, total, new_count );       # print all of the counts
    }

    {                                   # process each record in the file (implied true condition)
        if( $1+0 >= next_bin )          # if timestamp (col 1) is in the next bin
        {
            dump( );                    # print data from the previous bin
            bin++;                      # number of the bin that starts with this record
            next_bin = $1 + bin_size;   # mark the start of the next bin

            delete last_bin;            # must delete contents of last bin
            for( u in unique )          # copy hits from this bin
                last_bin[u] = 1;        # for comparison when we see the start of next bin
            delete unique;              # must delete the list of unique IPs from the current bin before we start
            total = 0;                  # zero number of hits in the bin
        }

        unique[$2]++;                   # count the number of times this IP address was seen in the bin
        total++;                        # total number of entries in the bin
    }

    END {               # at the end of the file, one last print if we saw something in the previous bin
        if( total )
            dump( );
    }
'    # reads "timestamp IP" lines from standard input

Can you modify it to compute the new IPs in the current bin compared with the ENTIRE HISTORY up to that interval, instead of just the previous interval?
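A minimal sketch of one way to do this, assuming the same input as agama's script (timestamp in column 1, IP in column 2, sorted by time): keep a second array, seen, that is never cleared, and count an IP as new only if it is not in it. The function name new_vs_history and the demo data are hypothetical, and the bin logic is a compact rewrite of agama's, not his exact code.

```shell
#!/bin/sh
# Sketch: agama-style bins, but an IP counts as "new" only if it was
# never seen in ANY earlier bin (full history), not just the previous one.
# Assumes "timestamp IP" lines sorted by timestamp on standard input.
# new_vs_history is a hypothetical name; $1 is the bin size (default 5).
new_vs_history() {
    awk -v bin_size="${1:-5}" '
        function dump(   u, new_count) {
            new_count = 0
            for (u in unique)          # IPs in this bin
                if (!(u in seen))      # never seen in any earlier bin
                    new_count++
            print bin, total, new_count
            for (u in unique)          # add this bin to the history
                seen[u] = 1
        }
        {
            if (NR == 1 || $1 + 0 >= next_bin) {   # record starts a new bin
                if (NR > 1)
                    dump()             # report the bin that just ended
                delete unique
                total = 0
                bin++
                next_bin = $1 + bin_size
            }
            unique[$2]++
            total++
        }
        END { if (total) dump() }
    '
}

# Demo with made-up data: 10.0.0.2 reappears in bin 3 but is not new.
printf '%s\n' '1 10.0.0.1' '2 10.0.0.2' '3 10.0.0.1' \
              '6 10.0.0.1' '7 10.0.0.3' '11 10.0.0.2' | new_vs_history 5
```

Because seen only grows, this uses memory proportional to the number of distinct IPs in the whole file, which is usually fine for log-sized input. For the demo data it prints "1 3 2", "2 2 1" and "3 1 0"; comparing against the previous bin only would report 1 new IP in bin 3.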

I have another column in the input file and would like to sum up and print its entries for the user-specified time interval. For example, if the user specifies 5 seconds as the input, the script should add up all the entries of the third column within each 5-second interval and print the total alongside the information the script above already prints, i.e. the timestamp, the number of packets, the number of unique IPs in that interval and the number of new IPs in that interval compared with the previous interval.
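A sketch that extends agama's bin logic with a unique-IP counter and a column-3 total, assuming input sorted by time with the timestamp in column 1, the IP in column 2 and a numeric value in column 3. The function name bin_report is hypothetical, and the bin start time is printed in place of a bin number.

```shell
#!/bin/sh
# Sketch combining agama's bins with a column-3 total.
# Assumed input (standard input, sorted by time): timestamp IP value
# bin_report is a hypothetical name; $1 is the bin size in seconds (default 5).
# Output per bin: start-time packets unique-IPs new-IPs(vs previous bin) sum-of-col3
bin_report() {
    awk -v bin_size="${1:-5}" '
        function dump(   u, new_count) {
            new_count = 0
            for (u in unique)              # IPs not present in the previous bin
                if (!(u in last_bin))
                    new_count++
            print bin_start, total, nuniq, new_count, sum
        }
        {
            if (NR == 1 || $1 + 0 >= next_bin) {   # record starts a new bin
                if (NR > 1)
                    dump()                 # report the bin that just ended
                delete last_bin
                for (u in unique)          # keep this bin for the next comparison
                    last_bin[u] = 1
                delete unique
                total = nuniq = sum = 0
                bin_start = $1
                next_bin = $1 + bin_size
            }
            if (!($2 in unique))           # first time this IP shows up in the bin
                nuniq++
            unique[$2]++
            total++                        # packets in the bin
            sum += $3                      # running total of column 3
        }
        END { if (total) dump() }
    '
}

printf '%s\n' '1 10.0.0.1 100' '2 10.0.0.2 200' '3 10.0.0.1 50' \
              '6 10.0.0.1 10' '7 10.0.0.3 20' | bin_report 5
```

The demo prints "1 3 2 2 350" and "6 2 2 1 30". With a real file it would be called as bin_report 5 < infile. Testing membership with (u in last_bin) instead of last_bin[u] == 0 avoids creating empty entries as a side effect.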

Looking for a solution.

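This assumes the source file contains only raw data, with no title line. Otherwise, you need to adjust the script (for example the NR==1{min=$1} part, which assumes the first line is data).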
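Instead of editing the awk script, one option is to strip a title line before the script sees it. A minimal sketch, assuming a title line can be recognized because its first field is not a number; skip_title is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: drop a leading title line so scripts that assume NR==1 is data still work.
# skip_title is a hypothetical helper; it treats line 1 as a title when its
# first field is not a number. Extra arguments are passed to awk as input files.
skip_title() {
    awk 'NR == 1 && $1 !~ /^[0-9]+([.][0-9]+)?$/ { next } { print }' "$@"
}

printf '%s\n' 'time ip' '1 10.0.0.1' '2 10.0.0.2' | skip_title
```

Usage would then be along the lines of skip_title infile | awk -v s=$interval '...', with the awk program reading standard input instead of infile.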

$ interval=2
$ awk -v s=$interval 'NR==1{min=$1}
                    {NoP[$1]++;UnIP[$1 FS $2]++;IP[$2];min=min>$1?$1:min;max=max>$1?max:$1}
                   END{for (i=min;i<=max;i=i+s)
                         { b=i
                           while (b<i+s)
                                 { t+=NoP[b]
                                   for (j in IP) if (UnIP[b FS j]) u++
                                   b++
                                 }
                           print ++e,t,u
                           t=u=0
                         }
                      }' infile

1 5 4
2 7 5

The third column it prints is incorrect.
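One likely reason: in the script above, u is incremented once per (timestamp, IP) pair, so an IP that appears at two different timestamps in the same interval is counted twice. A sketch that keeps the same END-loop approach but counts each IP at most once per interval, assuming integer timestamps (b steps one second at a time); interval_report and the demo data are hypothetical.

```shell
#!/bin/sh
# Sketch: same END-loop idea as the posted script, but each IP is counted at
# most once per interval. Assumes integer timestamps (b steps by 1).
# interval_report is a hypothetical name; $1 is the interval length (default 2).
interval_report() {
    awk -v s="${1:-2}" '
        NR == 1 { min = $1 }
        {
            NoP[$1]++                      # packets per timestamp
            UnIP[$1 FS $2]++               # (timestamp, IP) pairs seen
            IP[$2]                         # every IP seen
            min = min > $1 ? $1 : min
            max = max > $1 ? max : $1
        }
        END {
            for (i = min; i <= max; i += s) {
                t = u = 0
                for (b = i; b < i + s; b++)
                    t += NoP[b]            # packets in the interval
                for (j in IP)
                    for (b = i; b < i + s; b++)
                        if ((b FS j) in UnIP) {   # IP active at least once in the interval
                            u++
                            break
                        }
                print ++e, t, u
            }
        }'
}

printf '%s\n' '1 10.0.0.1' '1 10.0.0.2' '2 10.0.0.1' \
              '3 10.0.0.3' '4 10.0.0.3' '4 10.0.0.1' | interval_report 2
```

For this sample the sketch prints "1 3 2" and "2 3 2"; the posted version reports 3 unique IPs for the first interval because 10.0.0.1 appears at timestamps 1 and 2.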

I am looking for a solution that integrates agama's script above with your previous solution.
