Removing a line IF the next line contains a string


 
# 1  
09-22-2011

So, I've been working on a project that takes layer 7 metadata from pcap dumps and archives it. However, there is a lot of dataless information that I don't want in my output. I know how to get the output I want for the specific input file below, but I want a method that works regardless of the text.

The problem:
I want to eliminate the blocks that contain only a single timestamp line between the dashed delimiters. Below is a sample input file:

Code:
$ cat sample.txt

-------------------------------
13:30:01.651115 IP 24.7.10.70.284 > 7.2.3.186.80: tcp 0
-------------------------------
13:30:01.651125 IP 24.7.10.70.284 > 7.2.3.186.80: tcp 430
-------------------------------
13:30:01.651743 IP 234.234.45.654.2054 > 657.435.23.453.80: tcp 430
:./f"J}/i...P.....:..P.......GET /afsonline/show_afs_ads.js HTTP/1.1
Accept: */*
Referer: http://url-removed/shop/juniors/apparel/jackets-blazers?id=35786&edge=hybrid
Accept-Language: en-us
UA-CPU: x86
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 2.0.50727; .NET CLR 1.1.4322; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729; InfoPath.2; .NET4.0C)
Host: Google
Connection: Keep-Alive
-------------------------------
13:30:01.651744 IP 24.7.10.70.284 > 7.2.3.186.80: tcp 123
-------------------------------
13:30:01.651743 IP 132.23.235.11.2054 > 234.234.345.1.80: tcp 430
:./f"J}/i...P.....:..P.......GET /afsonline/show_afs_ads.js HTTP/1.1
Accept: */*
Referer: http://url-removed/shop/juniors/appa...86&edge=hybrid
Accept-Language: en-us
UA-CPU: x86
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 2.0.50727; .NET CLR 1.1.4322; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729; InfoPath.2; .NET4.0C)
Host: Google
Connection: Keep-Alive
-------------------------------
13:30:01.651745 IP 24.7.10.70.284 > 7.2.3.186.80: tcp 0
-------------------------------
13:30:01.651745 IP 24.7.10.70.284 > 7.2.3.186.80: tcp 0
-------------------------------

I would like the output file to be:

Code:
-------------------------------
13:30:01.651743 IP 234.234.45.654.2054 > 657.435.23.453.80: tcp 430
:./f"J}/i...P.....:..P.......GET /afsonline/show_afs_ads.js HTTP/1.1
Accept: */*
Referer: http://url-removed/shop/juniors/apparel/jackets-blazers?id=35786&edge=hybrid
Accept-Language: en-us
UA-CPU: x86
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 2.0.50727; .NET CLR 1.1.4322; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729; InfoPath.2; .NET4.0C)
Host: Google
Connection: Keep-Alive
-------------------------------
13:30:01.651743 IP 132.23.235.11.2054 > 234.234.345.1.80: tcp 430
:./f"J}/i...P.....:..P.......GET /afsonline/show_afs_ads.js HTTP/1.1
Accept: */*
Referer: http://url-removed/shop/juniors/appa...86&edge=hybrid
Accept-Language: en-us
UA-CPU: x86
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 2.0.50727; .NET CLR 1.1.4322; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729; InfoPath.2; .NET4.0C)
Host: Google
Connection: Keep-Alive

You'll notice that the only text in the output file comes from packets that carry additional layer 7 headers. If a packet does not contain these headers, I do not want it displayed. Again, the data does not ALWAYS have these exact header fields, but as long as it has header fields at all, I want it. I could get rid of the empty packets by deleting the timestamp lines, but then I would lose ALL timestamps, which I don't want.

The solution in my head is: remove everything between "----" delimiters if what lies between them is no more than one line. Is that reasonable? If so, what is the best way to do that? Alternatively, I could find every timestamp line whose next line is "----" and delete it; I just don't know the best way to do that either!
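The alternative idea (delete a timestamp line when the line after it is the delimiter) needs one line of lookahead. Below is a minimal sketch in POSIX awk wrapped in a shell function: it holds each line in `prev` until the next line has been read, then decides whether to print it. Deleting the timestamps alone would leave runs of delimiter lines, so the sketch also prints at most one delimiter in a row. The function name `drop_lone_timestamps` and the timestamp pattern (`HH:MM:SS.` plus digits and a space at the start of a line) are assumptions based on the sample, not something from the thread.

```shell
# drop_lone_timestamps: hypothetical name. Reads the files given as arguments
# (or stdin), deletes each timestamp line whose NEXT line is a dashed
# delimiter, and prints at most one delimiter line in a row.
# Assumption: timestamps start the line as HH:MM:SS.<digits> followed by a space.
drop_lone_timestamps() {
  awk '
  # Print a line, but skip a delimiter if the last printed line was one too.
  function emit(line) {
      if (line ~ /^-+$/) {
          if (last_was_delim) return
          last_was_delim = 1
      } else {
          last_was_delim = 0
      }
      print line
  }
  # One line of lookahead: decide about the previous line now that the
  # current one is known.
  NR > 1 {
      lone = (prev ~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]\.[0-9]+ / && $0 ~ /^-+$/)
      if (!lone) emit(prev)
  }
  { prev = $0 }
  # The final line has no successor, so it is never treated as a lone timestamp.
  END { if (NR > 0) emit(prev) }
  ' "$@"
}
```

Usage: `drop_lone_timestamps sample.txt`. On the sample this leaves one trailing delimiter line that the desired output does not have.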

Thanks in advance.

Last edited by radoulov; 09-22-2011 at 12:33 PM. Reason: URL removed.
# 2  
09-22-2011
Is the first ------------- necessary? If not, awk makes this easy.

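Set RS="-------------------------------\n" so awk treats each chunk of text between the dashed lines as one record.
Set FS="\n" so each line of a chunk is one field.
Set ORS="-------------------------------\n" so the dashed line is printed back out after each record.

Then, for each record/block, print it only if it has more than 2 fields. Each record keeps its trailing newline, so a block holding only a timestamp splits into 2 fields (the timestamp and an empty last field); NF>2 therefore means the block has at least one line after the timestamp.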
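The field-count arithmetic behind `NF>2` can be checked directly. With a one-character separator, awk's `split()` divides a string the same way `FS="\n"` divides a record, so this small sketch prints the field count for a timestamp-only block and for a block with payload lines. The helper name `count_fields` is hypothetical.

```shell
# count_fields: hypothetical helper. awk -v turns the "\n" escapes in $1 into
# real newlines; split() then cuts the string at each newline and returns the
# number of pieces, which matches NF for a record split with FS="\n".
count_fields() {
  awk -v s="$1" 'BEGIN { print split(s, parts, "\n") }'
}

# timestamp-only block (with its trailing newline)
count_fields '13:30:01.651115 IP 24.7.10.70.284 > 7.2.3.186.80: tcp 0\n'
# block with two payload lines after the timestamp
count_fields '13:30:01.651743 IP 132.23.235.11.2054 > 234.234.345.1.80: tcp 430\nAccept: */*\nHost: Google\n'
```

The first call prints 2 and the second prints 4, which is why the filter uses `NF>2` rather than `NF>1`.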
Code:
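$ cat 2line.awk
BEGIN { RS="-------------------------------\n"   # each dash-delimited block is one record
        ORS="-------------------------------\n"  # print the dashed line after each kept block
        FS="\n" }                                 # each line of a block is one field

# a timestamp-only block has 2 fields (its trailing newline leaves an empty last field),
# so keep only blocks with more lines than that
(NF>2)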
$ awk -f 2line.awk < data
13:30:01.651743 IP 234.234.45.654.2054 > 657.435.23.453.80: tcp 430
:./f"J}/i...P.....:..P.......GET /afsonline/show_afs_ads.js HTTP/1.1
Accept: */*
Referer: http://url-removed/shop/juniors/a...86&edge=hybrid
Accept-Language: en-us
UA-CPU: x86
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 2.0.50727; .NET CLR 1.1.4322; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729; InfoPath.2; .NET4.0C)
Host: Google
Connection: Keep-Alive
-------------------------------
13:30:01.651743 IP 132.23.235.11.2054 > 234.234.345.1.80: tcp 430
:./f"J}/i...P.....:..P.......GET /afsonline/show_afs_ads.js HTTP/1.1
Accept: */*
Referer: http://url-removed/shop/juniors/a...86&edge=hybrid
Accept-Language: en-us
UA-CPU: x86
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 2.0.50727; .NET CLR 1.1.4322; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729; InfoPath.2; .NET4.0C)
Host: Google
Connection: Keep-Alive
-------------------------------
$
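The RS trick above depends on a multi-character record separator. POSIX leaves that behavior unspecified; gawk and mawk treat such an RS as a regular expression, but other awk implementations may not. A sketch that avoids the issue reads the input line by line, buffers each block, and prints a block (preceded by the delimiter that opened it) only if it holds more than one line. It also keeps the leading delimiter and drops the trailing one, as in the requested output. The function name `keep_payload_blocks` is hypothetical, and any line made up entirely of dashes is assumed to be a delimiter.

```shell
# keep_payload_blocks: hypothetical name. Portable (POSIX awk) version of the
# block filter: buffer the lines between delimiters and print a block, with
# the delimiter before it, only if it holds more than one line.
# Assumption: any line made up entirely of dashes is a delimiter.
keep_payload_blocks() {
  awk '
  # Print the buffered block if it has a timestamp plus at least one more
  # line, then empty the buffer. i is a local variable.
  function flush(    i) {
      if (n > 1) {
          print delim
          for (i = 1; i <= n; i++) print buf[i]
      }
      n = 0
  }
  /^-+$/ { flush(); delim = $0; next }  # a delimiter closes the current block
  { buf[++n] = $0 }                      # any other line joins the block
  END { flush() }                        # handle a final block with no closing delimiter
  ' "$@"
}
```

Usage: `keep_payload_blocks sample.txt > filtered.txt`.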


Last edited by radoulov; 09-22-2011 at 12:34 PM. Reason: URL removed.
This User Gave Thanks to Corona688 For This Post:
# 3  
09-22-2011
That is solid gold, good sir. And thanks for editing to add in those extra explanations!

Thanks!