Counting entries in a file


 
# 1  
Old 08-06-2011

Hi,

I have a very large two column log file in the following format:

# Epoch Time IP Address

899726401 112.254.1.0
899726401 112.254.1.0
899726402 154.162.38.0
899726402 160.114.12.0
899726402 165.161.7.0
899726403 101.226.38.0
899726403 101.226.38.0
899726403 101.226.38.0
899726403 73.214.29.0
899726403 144.12.40.0
899726404 144.12.40.0
899726404 1.14.4.0

Each row represents a packet, with its time stamp (epoch time) and source IP address. Because the granularity is one second, there are multiple entries for the same time stamp. So the 1st second has two packets (from 1 IP), the 2nd second has three (from 3 IPs), the 3rd second has five (from 3 IPs), and so on.

I would like a script using sed/awk (as the log files are quite big) that takes an interval length in seconds as user input and outputs, for each interval of that length, the number of packets (rows) and the number of unique IP addresses.

For example, if the user gives 1 second as the input, the output file (3 columns) should look like this:

#Time No of Packets No of Unique IPs

1 2 1
2 3 3
3 5 3
4 2 2


Similarly, for a user input of 2 seconds the output file should look like this:

#Time No of Packets No of Unique IPs

1 5 4
2 7 4

PS: Here 1 and 2 in the first column mean the first two seconds and the next two seconds, respectively.

Looking forward to the reply.

Thanks,
# 2  
Old 08-06-2011
Have a go with this:

Code:
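#!/usr/bin/env ksh

awk -v bin_size="${1:-5}" '
    # Print one row for the bin just finished: bin number, packets, unique IPs.
    # Note: length( array ) and "delete array" need an awk that supports them
    # on whole arrays (gawk does).
    function dump( )
    {
        if( total == 0 )        # nothing collected yet (first data line)
            return;
        printf( "%3d %3d %3d\n", bin+1, total, length( unique ) );
        bin++;
    }

    /^#/ || NF < 2 { next }     # skip a comment header and blank lines

    {
        if( $1+0 >= next_bin )  # this packet starts a new bin
        {
            dump( );
            next_bin = $1 + bin_size;

            delete unique;
            total = 0;
        }

        unique[$2]++;           # remember each source IP seen in this bin
        total++;
    }
    END {
        if( total )
            dump( );
    }
'

exit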

The size of the bin in seconds is passed on the command line as the only parameter to the script; if it is omitted, a bin size of 5 is used.
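For anyone who wants to check the binning against the sample data from the first post, here is a minimal, self-contained sketch. It is not the script above verbatim: the function name bin_packets is made up for this example, the input file is passed as a second argument, and the unique IPs are counted with a plain counter (nuniq) and the array is emptied with split("", unique), on the assumption that some older awk implementations lack length( array ) and whole-array delete.

```shell
#!/bin/sh
# Sample data from the first post, header line included on purpose.
sample=$(mktemp)
cat >"$sample" <<'EOF'
# Epoch Time IP Address

899726401 112.254.1.0
899726401 112.254.1.0
899726402 154.162.38.0
899726402 160.114.12.0
899726402 165.161.7.0
899726403 101.226.38.0
899726403 101.226.38.0
899726403 101.226.38.0
899726403 73.214.29.0
899726403 144.12.40.0
899726404 144.12.40.0
899726404 1.14.4.0
EOF

# bin_packets BIN_SIZE FILE  (hypothetical helper, not part of the original script)
bin_packets() {
    awk -v bin_size="$1" '
        function dump() {
            if (total == 0) return          # no bin collected yet
            printf("%d %d %d\n", ++bin, total, nuniq)
        }
        /^#/ || NF < 2 { next }             # skip header and blank lines
        $1 + 0 >= next_bin {                # this packet starts a new bin
            dump()
            next_bin = $1 + bin_size
            split("", unique)               # portable way to empty an array
            total = nuniq = 0
        }
        {
            if (!($2 in unique)) { unique[$2] = 1; nuniq++ }
            total++
        }
        END { dump() }
    ' "$2"
}

out1=$(bin_packets 1 "$sample")
out2=$(bin_packets 2 "$sample")
printf 'bin size 1:\n%s\nbin size 2:\n%s\n' "$out1" "$out2"
rm -f "$sample"
```

Like the script above, this starts each new bin at the first time stamp found at or after the end of the previous bin, so seconds with no packets do not produce empty rows.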
This User Gave Thanks to agama For This Post:
# 3  
Old 08-06-2011
It should also take an "input file" as one of the user arguments, along with the "bin size". It doesn't seem to work.

Can you check it again?

Thanks,

Last edited by sajal.bhatia; 08-06-2011 at 02:53 AM..
# 4  
Old 08-06-2011
It works perfectly and it's a very good script (I tried to write one but I couldn't). Get your INPUTFILE from stdin like this:
Code:
./this_script 1 <INPUTFILE

These 2 Users Gave Thanks to yazu For This Post:
# 5  
Old 08-06-2011
Cheers
# 6  
Old 08-06-2011
Quote:
Originally Posted by sajal.bhatia
It should also take an "input file" as one of the user arguments, along with the "bin size". It doesn't seem to work.

Can you check it again?

Thanks,
Sorry, I forgot to point out that the script assumes input from stdin -- I'm an old school programmer and generally write code to take input from stdin as this allows preprocessing of the input (through sed or grep) if needed without having to make a change to the script, or code the script to do extra work.
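As a small illustration of that preprocessing idea, the sketch below filters the log with grep and sed before it reaches a program that only reads stdin. count_packets is a made-up stand-in for the script (it just counts rows), and the three data rows are taken from the sample in the first post.

```shell
#!/bin/sh
# count_packets is a hypothetical stand-in for any stdin-reading script.
count_packets() { awk 'END { print NR }'; }

log='# Epoch Time IP Address
899726401 112.254.1.0
899726402 154.162.38.0
899726403 101.226.38.0'

# Drop the comment header with grep before the data reaches the script.
n_all=$(printf '%s\n' "$log" | grep -v '^#' | count_packets)

# Keep only the packets from one source address with sed.
n_one=$(printf '%s\n' "$log" | sed -n '/ 154\.162\.38\.0$/p' | count_packets)

echo "$n_all $n_one"
```

In both cases the counting code stays untouched; only the pipeline in front of it changes.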

The script could also be modified slightly (the very last line) to allow the name of the input file to be supplied on the command line, while still reading from stdin when no file name is given:

Code:
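# ${2:+"$2"} expands to the quoted file name if one was given, else to nothing (awk then reads stdin)
awk -v bin_size="${1:-5}" '
# body of awk from previous example
' ${2:+"$2"}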

Assuming the script is saved in foo.ksh, this allows it to be invoked both ways:
Code:
foo.ksh 1 <input-file
foo.ksh 1 input-file
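A quick way to convince yourself that both invocations behave the same is sketched below. count_lines is a hypothetical stand-in for foo.ksh that only counts rows; the part being shown is the optional file operand, written here as ${2:+"$2"} so that a file name containing spaces stays one argument and a missing one leaves awk reading stdin.

```shell
#!/bin/sh
# count_lines BIN_SIZE [FILE]  (stand-in for foo.ksh; BIN_SIZE is unused here)
count_lines() {
    awk 'END { print NR }' ${2:+"$2"}
}

tmp=$(mktemp)
printf '899726401 112.254.1.0\n899726402 154.162.38.0\n' >"$tmp"

from_stdin=$(count_lines 1 <"$tmp")    # input redirected to stdin
from_file=$(count_lines 1 "$tmp")      # input named on the command line

echo "$from_stdin $from_file"
rm -f "$tmp"
```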

# 7  
Old 08-07-2011
Hi,

Instead of the "number of unique IPs" for each time interval, I would like to calculate the "number of NEW IPs" (comparing the IPs in the current time interval to the IPs in the previous time interval). Could someone suggest the modifications to the script?

Cheers,
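
One possible modification, as a sketch only. It assumes that "new" means an IP that appears in the current interval but did not appear in the interval immediately before it, so every IP in the first interval counts as new. The names new_ips, cur, prev and nnew are invented for this example; the binning logic follows the script earlier in the thread.

```shell
#!/bin/sh
# new_ips BIN_SIZE [FILE]  (hypothetical helper; reads stdin if FILE is omitted)
new_ips() {
    awk -v bin_size="$1" '
        # Print the finished bin, then make its IP set the "previous" one.
        function dump(   ip) {
            if (total == 0) return          # no bin collected yet
            printf("%d %d %d\n", ++bin, total, nnew)
            split("", prev)                 # empty the old previous set
            for (ip in cur) prev[ip] = 1
            split("", cur)
            total = nnew = 0
        }
        /^#/ || NF < 2 { next }             # skip header and blank lines
        $1 + 0 >= next_bin { dump(); next_bin = $1 + bin_size }
        {
            if (!($2 in cur)) {             # first time this IP shows up in this bin
                cur[$2] = 1
                if (!($2 in prev)) nnew++   # and it was absent from the previous bin
            }
            total++
        }
        END { dump() }
    ' ${2:+"$2"}
}

sample='899726401 112.254.1.0
899726401 112.254.1.0
899726402 154.162.38.0
899726402 160.114.12.0
899726402 165.161.7.0
899726403 101.226.38.0
899726403 101.226.38.0
899726403 101.226.38.0
899726403 73.214.29.0
899726403 144.12.40.0
899726404 144.12.40.0
899726404 1.14.4.0'

out=$(printf '%s\n' "$sample" | new_ips 1)
printf '%s\n' "$out"
```

With a bin size of 1 on the sample data, the fourth row differs from the unique-IP output: 144.12.40.0 was already seen in the third second, so only 1.14.4.0 counts as new there. Comparing against every interval seen so far, rather than just the previous one, would mean never emptying prev.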