So, I've been working on a project that takes layer 7 metadata from pcap dumps and archives it. However, the dumps contain a lot of entries with no payload data, and I don't want those in my output. I know ways to produce the output I want from the specific input file below, but I want a method that works regardless of the exact text.
The problem:
I want to eliminate the lone timestamp lines that sit between the "----" delimiters. Below is a sample input file:
Code:
$ cat sample.txt
-------------------------------
13:30:01.651115 IP 24.7.10.70.284 > 7.2.3.186.80: tcp 0
-------------------------------
13:30:01.651125 IP 24.7.10.70.284 > 7.2.3.186.80: tcp 430
-------------------------------
13:30:01.651743 IP 234.234.45.654.2054 > 657.435.23.453.80: tcp 430
:./f"J}/i...P.....:..P.......GET /afsonline/show_afs_ads.js HTTP/1.1
Accept: */*
Referer: http://url-removed/shop/juniors/apparel/jackets-blazers?id=35786&edge=hybrid
Accept-Language: en-us
UA-CPU: x86
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 2.0.50727; .NET CLR 1.1.4322; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729; InfoPath.2; .NET4.0C)
Host: Google
Connection: Keep-Alive
-------------------------------
13:30:01.651744 IP 24.7.10.70.284 > 7.2.3.186.80: tcp 123
-------------------------------
13:30:01.651743 IP 132.23.235.11.2054 > 234.234.345.1.80: tcp 430
:./f"J}/i...P.....:..P.......GET /afsonline/show_afs_ads.js HTTP/1.1
Accept: */*
Referer: http://url-removed/shop/juniors/appa...86&edge=hybrid
Accept-Language: en-us
UA-CPU: x86
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 2.0.50727; .NET CLR 1.1.4322; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729; InfoPath.2; .NET4.0C)
Host: Google
Connection: Keep-Alive
-------------------------------
13:30:01.651745 IP 24.7.10.70.284 > 7.2.3.186.80: tcp 0
-------------------------------
13:30:01.651745 IP 24.7.10.70.284 > 7.2.3.186.80: tcp 0
-------------------------------
I would like the output file to be:
Code:
-------------------------------
13:30:01.651743 IP 234.234.45.654.2054 > 657.435.23.453.80: tcp 430
:./f"J}/i...P.....:..P.......GET /afsonline/show_afs_ads.js HTTP/1.1
Accept: */*
Referer: http://url-removed/shop/juniors/apparel/jackets-blazers?id=35786&edge=hybrid
Accept-Language: en-us
UA-CPU: x86
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 2.0.50727; .NET CLR 1.1.4322; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729; InfoPath.2; .NET4.0C)
Host: Google
Connection: Keep-Alive
-------------------------------
13:30:01.651743 IP 132.23.235.11.2054 > 234.234.345.1.80: tcp 430
:./f"J}/i...P.....:..P.......GET /afsonline/show_afs_ads.js HTTP/1.1
Accept: */*
Referer: http://url-removed/shop/juniors/appa...86&edge=hybrid
Accept-Language: en-us
UA-CPU: x86
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 2.0.50727; .NET CLR 1.1.4322; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729; InfoPath.2; .NET4.0C)
Host: Google
Connection: Keep-Alive
You'll notice that the only text in the output file comes from packets that contain additional layer 7 headers. If a packet does not contain these headers, I do not want it displayed. Again, the data does not ALWAYS have these exact header fields, but as long as it has header fields at all, I want it. I could get rid of the empty entries by removing the timestamps, but then I would lose ALL timestamps, which I don't want.
The solution in my head is: remove everything between two "----" delimiters if what lies between them does not exceed one line. Is that reasonable? If so, what is the best way to do that? Alternatively, I could find every timestamp line whose next line is "----" and delete it; I just don't know the best way to do that either!
Thanks in advance.
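A minimal sketch of the second idea (delete every timestamp line whose next line is "----"), assuming delimiter lines consist only of dashes and timestamp lines start in tcpdump's `HH:MM:SS.micro` form, as in the sample. The function name `drop_lone_stamps` is hypothetical. It holds one line back so it can see what follows, and it also holds delimiters back until real content appears, so the runs of "----" left behind by deleted timestamps collapse to one and no trailing delimiter is printed. It uses only POSIX awk features.

```shell
# drop_lone_stamps is a hypothetical helper name; it reads the named files
# (or stdin) and writes the filtered text to stdout.
drop_lone_stamps() {
    awk '
    # Assumption: delimiter lines are made only of dashes.
    function is_delim(s) { return s ~ /^-+$/ }
    # Assumption: timestamp lines start like tcpdump output, e.g. "13:30:01.651115 ".
    function is_stamp(s) { return s ~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]\.[0-9]+ / }
    # Hold delimiters back until real content follows, so a run of them
    # collapses to one and no delimiter is printed at the very end.
    function emit(s) {
        if (is_delim(s)) { pending = s; return }
        if (pending != "") { print pending; pending = "" }
        print s
    }
    # prev is the line before the current one: drop it if it is a timestamp
    # immediately followed by a delimiter, otherwise pass it on.
    NR > 1 { if (!(is_stamp(prev) && is_delim($0))) emit(prev) }
    { prev = $0 }
    # The last line has no successor, so it is always passed on.
    END { if (NR > 0) emit(prev) }
    ' "$@"
}

# Usage: drop_lone_stamps sample.txt > filtered.txt
```

On the sample input above, this should leave exactly the two blocks with headers, each preceded by one dashed line.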
Is the first ------------- line necessary? If not, awk makes this easy.
Set RS="------..." so awk treats each block between -------'s as one record.
Set FS="\n" so each line is one field.
Set ORS="-------..." so it prints a dashed line after each record on the way out.
Then, for each record/block, print it only if it has more than 2 fields. (A block holding a single timestamp line has exactly 2 fields: the line itself and the empty string after its trailing newline.)
Code:
$ cat 2line.awk
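BEGIN { RS="-------------------------------\n"   # one record per block (multi-char RS: gawk/mawk, not POSIX)
ORS="-------------------------------\n"          # dashed line after each printed record
FS="\n" }                                        # one field per line
# a lone timestamp block has 2 fields: the line and the empty field after its newline
(NF>2)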
$ awk -f 2line.awk < data
13:30:01.651743 IP 234.234.45.654.2054 > 657.435.23.453.80: tcp 430
:./f"J}/i...P.....:..P.......GET /afsonline/show_afs_ads.js HTTP/1.1
Accept: */*
Referer: http://url-removed/shop/juniors/a...86&edge=hybrid
Accept-Language: en-us
UA-CPU: x86
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 2.0.50727; .NET CLR 1.1.4322; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729; InfoPath.2; .NET4.0C)
Host: Google
Connection: Keep-Alive
-------------------------------
13:30:01.651743 IP 132.23.235.11.2054 > 234.234.345.1.80: tcp 430
:./f"J}/i...P.....:..P.......GET /afsonline/show_afs_ads.js HTTP/1.1
Accept: */*
Referer: http://url-removed/shop/juniors/a...86&edge=hybrid
Accept-Language: en-us
UA-CPU: x86
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 2.0.50727; .NET CLR 1.1.4322; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729; InfoPath.2; .NET4.0C)
Host: Google
Connection: Keep-Alive
-------------------------------
$
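If the leading dashed line does need to be kept, here is a sketch of the same rule (keep only blocks with more than one line) that prints each block's delimiter before the block instead of after it, which should reproduce the output requested in the question. It also avoids the multi-character RS, so it should run on POSIX awk. The function name `keep_multiline_blocks` is hypothetical, and delimiter lines are assumed to consist only of dashes.

```shell
# keep_multiline_blocks is a hypothetical helper name; it reads the named
# files (or stdin) and writes the kept blocks to stdout.
keep_multiline_blocks() {
    awk '
    # Print the buffered block, preceded by its delimiter, only if it has
    # more than one line (i.e. more than the bare timestamp). i is local.
    function flush(    i) {
        if (n > 1) {
            if (delim != "") print delim
            for (i = 1; i <= n; i++) print buf[i]
        }
        n = 0
    }
    # Assumption: delimiter lines are made only of dashes.
    /^-+$/ { flush(); delim = $0; next }
    # Any other line belongs to the current block.
    { buf[++n] = $0 }
    # The final block may not be followed by a delimiter.
    END { flush() }
    ' "$@"
}

# Usage: keep_multiline_blocks sample.txt > filtered.txt
```

Buffering lines in an array keeps the "more than one line" test explicit instead of relying on the empty trailing field that makes the RS-based version use NF>2.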