Sampling pcap file


 
# 1  
Old 11-11-2010

Hi,

I have a standard capture file created using tcpdump. In text form, the file looks like this:
Code:
06:49:36.487629 IP 202.1.175.252 > 71.126.222.64: ICMP echo request, id 52765, seq 1280, length 40
06:49:36.489552 IP 192.120.148.227 > 71.126.222.64: ICMP echo request, id 512, seq 1280, length 40
06:49:36.491812 IP 51.81.166.201 > 71.126.222.64: ICMP echo request, id 61249, seq 1280, length 40

Since each entry in the file represents a packet, I have to calculate the number of packets and the number of distinct IP addresses in each 10-second interval.

So the output file should look something like this:
Code:
#Time    Packets    IP's
10          2000         50
20          1000         30
30          1500         20
.
.
.

and so on till end of the file.

The last entry in the output file may cover less than a full 10 seconds, so its first column (i.e. time) need not be a multiple of 10.

Since the pcap file is quite big (~500-600 MB) I am looking for a solution in sed/awk.

Any help will be highly appreciated.

Thanks!!

Last edited by Franklin52; 11-14-2010 at 10:48 AM.. Reason: Please use code tags
# 2  
Old 11-12-2010
Not sure if the 2nd IP (71.126.222.64) should be counted too, but here it is:

Code:
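awk -F"[:, ]" '
{
    # mktime() is a gawk extension; only the whole seconds of $3 are used
    now = mktime("2000 1 1 " $1 " " $2 " " $3)
    if (NR == 1) {
        print "#Time Packets IPs"
        to = now + 10                 # end of the first 10-second interval
    } else if (now >= to) {
        printf("%d %d %d\n", count += 10, found, length(IPs))
        # print a zero row for every interval that saw no packets at all
        while (to + 10 <= now) { printf("%d 0 0\n", count += 10); to += 10 }
        delete IPs
        found = 0
        to += 10
    }
    found++
    IPs[$5]++                         # $5 is the source address
}
END { printf("%d %d %d\n", count + 10 - to + now, found, length(IPs)) }
' logfile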
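To see why the script uses $1, $2, $3 and $5, it can help to print the fields that the separator `[:, ]` produces for one line of the sample. This is only a small sketch; `show_fields` is a made-up helper name, not part of the script above.

```shell
# Every single ":", "," or space is a separator, so the time splits into
# $1 $2 $3, the source address lands in $5 and the destination in $7.
show_fields() {
    awk -F"[:, ]" '{ printf("hh=%s mm=%s ss=%s src=%s dst=%s\n", $1, $2, $3, $5, $7) }'
}

echo '06:49:36.487629 IP 202.1.175.252 > 71.126.222.64: ICMP echo request, id 52765, seq 1280, length 40' | show_fields
# prints: hh=06 mm=49 ss=36.487629 src=202.1.175.252 dst=71.126.222.64
```

Counting the destination too would mean adding `$7` to the array in the same way as `$5`.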

---------- Post updated at 03:11 PM ---------- Previous update was at 12:49 PM ----------

What about times past midnight, or more than one day's worth of logs?

If a timestamp is earlier than the one before it, this version assumes we have moved into the next day and adds 24 hours. It also calculates the time without using mktime:

Code:
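awk -F"[:, ]" '
{
    new = $1 * 3600 + $2 * 60 + $3          # seconds since midnight, fraction kept
    while (new < now) new += 3600 * 24      # clock went backwards: assume a new day
    now = new
    if (NR == 1) {
        print "#Time Packets IPs"
        to = now + 10                       # end of the first 10-second interval
    } else if (now >= to) {
        printf("%d %d %d\n", count += 10, found, length(IPs))
        # print a zero row for every interval that saw no packets at all
        while (to + 10 <= now) { printf("%d 0 0\n", count += 10); to += 10 }
        delete IPs
        found = 0
        to += 10
    }
    found++
    IPs[$5]++
}
END { printf("%d %d %d\n", count + 10 - to + now, found, length(IPs)) }
' infile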
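The day-rollover step can be looked at on its own. The sketch below (the helper name `monotonic_seconds` and the `offset`/`last` variables are made up for illustration) turns the timestamps into a non-decreasing count of seconds. It assumes the lines are in chronological order and that no gap between two lines reaches 24 hours.

```shell
monotonic_seconds() {
    awk -F"[:, ]" '{
        t = $1 * 3600 + $2 * 60 + $3                 # seconds since midnight, with fraction
        while (t + offset < last) offset += 86400    # time went backwards: next day
        last = t + offset
        printf("%.6f\n", last)
    }'
}

printf '%s\n' \
    '23:59:59.500000 IP 1.1.1.1 > 2.2.2.2: ICMP echo request' \
    '00:00:00.250000 IP 1.1.1.1 > 2.2.2.2: ICMP echo request' | monotonic_seconds
# prints:
# 86399.500000
# 86400.250000
```

Because the fraction is kept here, interval boundaries follow the exact time of the first packet, whereas the mktime version works in whole seconds. That may explain small differences in the per-interval packet counts between the two scripts.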


Last edited by Chubler_XL; 11-12-2010 at 01:18 AM.. Reason: Updated to include zero readings for missing lines
# 3  
Old 11-12-2010
Thanks !!

The first one works better. I don't know why, but the second one makes some error in counting the packets, though the IP count is the same.

Can this be extended to print only the IP addresses which are new in each interval, compared with the previous interval? For example, suppose the second interval (10-20 sec) had 30 IPs and the third interval (20-30) had 50 IPs, but 10 of these 50 are common (i.e. also present in the second interval). The output file would then have one more column that prints the number of new IPs, i.e. 40 in this case.

The output file looks like this
#Time Packets IPs New IPs

The first interval (0-10) will have the same values for column 3 (IPs) and Column 4 (New IPs)


Thanks again
# 4  
Old 11-12-2010
Here is the update for Global New IPs:

Code:
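awk -F"[:, ]" '
{
    # mktime() is a gawk extension; only the whole seconds of $3 are used
    now = mktime("2000 1 1 " $1 " " $2 " " $3)
    if (NR == 1) {
        print "#Time Packets IPs New_IPs"
        to = now + 10                 # end of the first 10-second interval
        new = 0
    } else if (now >= to) {
        printf("%d %d %d %d\n", count += 10, found, length(IPs), new)
        # print a zero row for every interval that saw no packets at all
        while (to + 10 <= now) { printf("%d 0 0 0\n", count += 10); to += 10 }
        delete IPs
        new = found = 0
        to += 10
    }
    found++
    IPs[$5]++
    if (!($5 in GIPs)) {              # first time this IP appears anywhere in the file
        new++
        GIPs[$5]++
    }
}
END { printf("%d %d %d %d\n", count + 10 - to + now, found, length(IPs), new) }
' logfile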

# 5  
Old 11-12-2010
Hi,

Can you explain a bit how this works?

What does this Global IP mean ??

Thanks!
# 6  
Old 11-14-2010
This is what I understood your requirement to be: each line displays a count of the IPs used in the current interval and a count of the new IPs introduced, i.e. those not seen in the file up to this point.

The array GIPs (Global IPs) contains every IP seen in the file so far. Each time an IP that is not in this array is seen, it is added to the array and the new counter is incremented.

Perhaps this is wrong. When I re-read your post, it appears you only want those IPs not seen in the previous interval (as opposed to the whole file). Is this correct?
Code:
Interval   IP
A           192.168.1.1
A           192.168.1.2
A           192.168.1.3
B           192.168.1.1
B           192.168.1.4
C           192.168.1.3
C           192.168.1.2

For Interval,Count,New, should we get
Code:
A,3,3
B,2,1
C,2,2

or
Code:
A,3,3
B,2,1
C,2,0
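The two readings can be checked side by side on the sample above. This is a rough sketch only: `count_new` is a made-up helper, and it reads "interval IP" pairs rather than tcpdump lines. The third output column counts IPs not seen anywhere earlier in the file, and the fourth counts IPs not seen in the immediately preceding interval.

```shell
count_new() {
    awk '
    function flush(   ip) {
        if (label == "") return
        printf("%s,%d,%d,%d\n", label, n, new_all, new_prev)
        split("", prev)                       # forget the older interval
        for (ip in cur) prev[ip] = 1          # current interval becomes "previous"
        split("", cur)
        n = new_all = new_prev = 0
    }
    $1 != label { flush(); label = $1 }       # interval changed: report the old one
    !($2 in cur) {                            # count each IP once per interval
        cur[$2] = 1; n++
        if (!($2 in seen)) { seen[$2] = 1; new_all++ }
        if (!($2 in prev)) new_prev++
    }
    END { flush() }'
}

printf '%s\n' 'A 192.168.1.1' 'A 192.168.1.2' 'A 192.168.1.3' \
              'B 192.168.1.1' 'B 192.168.1.4' \
              'C 192.168.1.3' 'C 192.168.1.2' | count_new
# prints:
# A,3,3,3
# B,2,1,1
# C,2,0,2
```

So C,2,0 is what a whole-file comparison gives, and C,2,2 is what a comparison against interval B alone gives.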

# 7  
Old 11-14-2010
Hi,

Yes, I want a count of IPs in the current interval (whose length should be user-controlled) and a count of the IPs in that interval that are new compared with the previous interval, not the whole file.

So the output should be the second one you posted, i.e.

A,3,3
B,2,1
C,2,0

Thanks !!
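
The thread stops here without a script for this variant, so the following is only a sketch of one way it might be done, not a tested answer from the thread. It assumes the same tcpdump text format as above, takes the interval length as an optional argument (default 10), works in whole seconds, and avoids gawk-only functions. The names `sample_pcap_text`, `step` and `fresh` are made up. One caveat: in the A/B/C sample, comparing C against interval B alone gives 2 new IPs; the C,2,0 result is what the whole-file (GIPs) script in post #4 already produces.

```shell
sample_pcap_text() {   # usage: sample_pcap_text [interval_seconds] < tcpdump_text
    awk -F"[:, ]" -v step="${1:-10}" '
    function flush(   ip) {
        elapsed += step
        printf("%d %d %d %d\n", elapsed, packets, ips, fresh)
        split("", prev)                       # keep only the interval just reported
        for (ip in cur) prev[ip] = 1
        split("", cur)
        packets = ips = fresh = 0
    }
    {
        t = $1 * 3600 + $2 * 60 + int($3)             # whole seconds since midnight
        while (t + offset < last) offset += 86400     # time went backwards: next day
        t += offset; last = t
        if (NR == 1) { print "#Time Packets IPs New_IPs"; to = t + step }
        while (t >= to) { flush(); to += step }       # also emits rows for empty intervals
        packets++
        if (!($5 in cur)) {                           # distinct source IPs in this interval
            cur[$5] = 1; ips++
            if (!($5 in prev)) fresh++                # not present in the previous interval
        }
    }
    END { if (NR) printf("%d %d %d %d\n", elapsed + step - to + last, packets, ips, fresh) }'
}
```

It would be called as `sample_pcap_text 10 < logfile`. An empty interval clears the "previous" set, so every IP that follows a gap counts as new. As in the earlier scripts, the time on the last line is the offset of the final packet from the first one.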