

Sed Comparing Parenthesized Values In Previous Line To Current Line


 
# 1  

I am trying to delete lines in archived Apache httpd logs.

Each line has the pattern:

<ip-address> - - [<date-time>] "GET <document-request-URL> HTTP/1.1" <http-response> <size-of-req'd-doc> "<referring-document-URL>" "<user-agent>"

This pattern is shown in the example of 6 lines from the log in the code box below. These 6 lines are consecutive, and all start with the same IP address. They are actually the first 6 of about 20-25 lines in which a document was served with multiple gif images.

My purpose is to delete lines under the following conditions:

First, determine whether a line in the log becomes a "reference line".
This happens if the line being tested:
  1. has a different IP address from the current reference line, or there is no reference line yet (uninitialized reference line);
  2. has the same IP address as the reference line, but the referring document is not from my web site;
  3. has the same IP address as the reference line, but requests (GET) a different HTML document from my web site;
  4. has the same IP address as the current reference line, but was logged more than 60 seconds after the request in the current reference line.
Note that a "subsequent line" is one that does not qualify to become a "reference line."
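Condition 4 needs the timestamps as numbers, which sed cannot produce by itself. A minimal POSIX shell sketch, under these assumptions: `apache_ts_to_seconds` is a hypothetical helper name (not part of sed or Apache), month names are English abbreviations as in the sample log, and the zone offset is ignored because all lines of one log share it. The date arithmetic is the standard days-from-civil calculation (as published by Howard Hinnant), swapped in here for illustration.

```shell
#!/bin/sh
# apache_ts_to_seconds: hypothetical helper, not part of sed or Apache.
# Turns a timestamp such as 18/Sep/2012:20:48:16 into seconds since
# 1970-01-01 00:00:00, treating the clock time as UTC and ignoring the
# +0300 offset; lines from one log share one offset, so the difference
# between two lines is still correct.
# The body runs in a subshell so its variables do not leak.
apache_ts_to_seconds() (
  ts=$1
  d=${ts%%/*};     rest=${ts#*/}    # d=18,    rest=Sep/2012:20:48:16
  mon=${rest%%/*}; rest=${rest#*/}  # mon=Sep, rest=2012:20:48:16
  y=${rest%%:*};   rest=${rest#*:}  # y=2012,  rest=20:48:16
  H=${rest%%:*};   rest=${rest#*:}  # H=20,    rest=48:16
  M=${rest%%:*};   S=${rest#*:}     # M=48,    S=16
  case $mon in
    Jan) m=1 ;; Feb) m=2 ;; Mar) m=3 ;; Apr) m=4 ;;
    May) m=5 ;; Jun) m=6 ;; Jul) m=7 ;; Aug) m=8 ;;
    Sep) m=9 ;; Oct) m=10 ;; Nov) m=11 ;; Dec) m=12 ;;
    *) exit 1 ;;
  esac
  # Drop one leading zero so values like 08 are not parsed as octal.
  d=${d#0}; H=${H#0}; M=${M#0}; S=${S#0}
  # Days since 1970-01-01 via days-from-civil (years start in March).
  if [ "$m" -le 2 ]; then y=$((y - 1)); mp=$((m + 9)); else mp=$((m - 3)); fi
  era=$((y / 400))
  yoe=$((y - era * 400))
  doy=$(( (153 * mp + 2) / 5 + d - 1 ))
  doe=$(( yoe * 365 + yoe / 4 - yoe / 100 + doy ))
  days=$(( era * 146097 + doe - 719468 ))
  echo $(( days * 86400 + H * 3600 + M * 60 + S ))
)
```

For example, `$(( $(apache_ts_to_seconds 18/Sep/2012:20:48:19) - $(apache_ts_to_seconds 18/Sep/2012:20:48:16) ))` gives 3, and condition 4 holds when such a difference exceeds 60. This needs a POSIX-compatible shell; MS cmd cannot run it.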

I have only one sed command line now: essentially the regular expression that matches the pattern identifying a line, with parenthesized subexpressions for the fields. To be safe, I am using POSIX basic regular expression syntax rather than extensions such as Perl-style `\d` for digits.

Code:
/\(\([0-9]\{1,3\}\.\)\{3\}[0-9]\{1,3\}\).*\[\([0-9]\{2\}\/[a-zA-Z]\{3\}\/[0-9]\{4\}:[0-9]\{2\}:[0-9]\{2\}:[0-9]\{2\}\) [+-][0-9]\{4\}] "GET \(.*\) HTTP\/1\.1" [0-9]\{3\} [0-9]\{1,\} "http:\/\/my\.website\.org\//

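I have wrapped three fields in parentheses: the IP address, the date-time, and the requested document (between GET and HTTP/1.1). Because the inner group around the repeated IP octet also counts, they are captured as \1, \3 and \4 respectively.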
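As a quick check of the groups, the address can be turned into a substitution that prints only the captured fields. A minimal sketch, assuming a POSIX shell and sed; `show_fields` is a hypothetical wrapper name, and the `^` anchor and trailing `.*` were added so that the whole line is replaced.

```shell
#!/bin/sh
# show_fields: hypothetical wrapper name, used only for illustration.
# Prints "IP DATE DOC" (groups \1, \3, \4) for each matching line;
# -n plus the p flag suppresses every line that does not match, e.g.
# lines whose referrer is not http://my.website.org/.
show_fields() {
  sed -n 's/^\(\([0-9]\{1,3\}\.\)\{3\}[0-9]\{1,3\}\).*\[\([0-9]\{2\}\/[a-zA-Z]\{3\}\/[0-9]\{4\}:[0-9]\{2\}:[0-9]\{2\}:[0-9]\{2\}\) [+-][0-9]\{4\}] "GET \(.*\) HTTP\/1\.1" [0-9]\{3\} [0-9]\{1,\} "http:\/\/my\.website\.org\/.*/\1 \3 \4/p'
}
```

Fed the dichlorotriazinyl.gif line from the log below, this prints `172.16.77.182 18/Sep/2012:20:48:18 /reference/imagesHistoHTML/dichlorotriazinyl.gif`; the Google-referred lines print nothing.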

This becomes the line to be tested. The tests implement the conditions above:
  1. check for an IP address difference;
  2. check the time difference;
  3. examine the requested document field for file types (gif, ico, css, js, png, jpg, jpeg, etc.); anything that is not HTML gets deleted;
  4. make sure the hostname/server name in the referring document is 'http://my.website.org/'.
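Test (3) on its own is within plain sed's reach. A sketch under these assumptions: extensions are lowercase, the request has no query string (`/x.gif?v=1` would slip through), and `drop_non_html` is a hypothetical name. One `-e` address per extension keeps to POSIX BRE instead of relying on GNU's `\|` alternation.

```shell
#!/bin/sh
# drop_non_html: hypothetical wrapper name for illustration only.
# Deletes lines whose GET request ends in one of the listed
# extensions; every other line passes through unchanged.
# [^ ]* works because spaces in URLs are logged as %20.
drop_non_html() {
  sed -e '/"GET [^ ]*\.gif HTTP\//d' \
      -e '/"GET [^ ]*\.ico HTTP\//d' \
      -e '/"GET [^ ]*\.css HTTP\//d' \
      -e '/"GET [^ ]*\.js HTTP\//d' \
      -e '/"GET [^ ]*\.png HTTP\//d' \
      -e '/"GET [^ ]*\.jpe\{0,1\}g HTTP\//d'
}
```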

I think I need to use sed's hold space (the h/H, g/G and x commands), but I am not sure how to go about that. More importantly, I must compare text: convert the date/time expressions into integers so they can be compared, and also do string comparisons. As far as I know, sed has no built-in features for this, so I may have to pass these values as parameters to a shell (?) that does the comparisons and returns a result sed can work with.
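The hold space is how sed remembers something from one line to the next. A minimal sketch of the pattern, covering only condition 1 (IP address change) and treating the previous line as the reference line; `first_line_per_ip` is a hypothetical name. G, h and x are standard sed commands; the string comparison itself is done by the back-reference `\1`, which must match both the current IP and the held one.

```shell
#!/bin/sh
# first_line_per_ip: hypothetical wrapper name for illustration only.
# Prints a line only when its leading IP address differs from the IP
# of the line before it (a simplified form of condition 1); later
# lines from the same IP are deleted.
first_line_per_ip() {
  sed '
    # Append a newline plus the previous IP (held in the hold space).
    G
    # Same leading IP as the previous line: delete this line.
    # The hold space already holds that IP, so nothing to update.
    /^\([0-9.]*\) .*\n\1$/d
    # New IP: remove the appended part again.
    s/\n.*//
    # Copy the line to the hold space, cut the pattern space down to
    # its IP, then swap: the hold space keeps only the IP and the
    # full line is printed.
    h
    s/ .*//
    x
  '
}
```

Under MS cmd the single-quoted program cannot be passed as-is; saving the lines between the quotes to a file and running `sed -f thatfile access.log` works with GNU sed.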

I have even more of a challenge too: see the NB below.

What I need is a good pointer or reference to what I should be telling sed to do, more than just being given the answer. Thanks.

Code:
[line lengths broken up to avoid an annoying presentation]

172.16.77.182 - - [18/Sep/2012:20:48:16 +0300] "GET /reference/imagesHistoHTML/ethyl-eosin.gif HTTP/1.1" 200 3300 "http://www.google.com/imgres?hl=en&sa=X&rlz=1C1CHKZ_enUS433US433&biw=1280&
   bih=670&tbm=isch&prmd=imvns&tbnid=fEILNrTkTl2MzM:&imgrefurl=http://my.website.org/reference/histo.html&docid=r1tvojxQaLyVFM&imgurl=http://my.website.org/reference/imagesHistoHTML/ethyl-eosin.gif
   &w=366&h=265&ei=ybNYUPD5CNSO0QHoqYG4BQ&zoom=1&iact=hc&vpx=635&vpy=80&dur=2959&hovh=191&hovw=264&tx=168&ty=103&sig=114532110125230912831&page=1&tbnh=120&tbnw=166&start=0&
   ndsp=18&ved=1t:429,r:15,s:0,i:123" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.89 Safari/537.1"
172.16.77.182 - - [18/Sep/2012:20:48:17 +0300] "GET /reference/style/std.css HTTP/1.1" 200 5429 "http://my.website.org/reference/histo.html" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 
   (KHTML, like Gecko) Chrome/21.0.1180.89 Safari/537.1"
172.16.77.182 - - [18/Sep/2012:20:48:17 +0300] "GET /style/std.css HTTP/1.1" 200 5429 "http://my.website.org/reference/histo.html" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) 
    Chrome/21.0.1180.89 Safari/537.1"
172.16.77.182 - - [18/Sep/2012:20:48:16 +0300] "GET /reference/histo.html HTTP/1.1" 200 118818 "http://www.google.com/imgres?hl=en&sa=X&rlz=1C1CHKZ_enUS433US433&biw=1280&bih=670&tbm=isch&
     prmd=imvns&tbnid=fEILNrTkTl2MzM:&imgrefurl=http://my.website.org/reference/histo.html&docid=r1tvojxQaLyVFM&imgurl=http://my.website.org/reference/imagesHistoHTML/ethyl-eosin.gif
    &w=366&h=265&ei=ybNYUPD5CNSO0QHoqYG4BQ&zoom=1&iact=hc&vpx=635&vpy=80&dur=2959&hovh=191&hovw=264&tx=168&ty=103&sig=114532110125230912831&page=1&tbnh=120
    &tbnw=166&start=0&ndsp=18&ved=1t:429,r:15,s:0,i:123" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.89 Safari/537.1"
172.16.77.182 - - [18/Sep/2012:20:48:18 +0300] "GET /reference/imagesHistoHTML/dichlorotriazinyl.gif HTTP/1.1" 200 1406 "http://my.website.org/reference/histo.html" "Mozilla/5.0 (Windows NT 6.1; WOW64) 
    AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.89 Safari/537.1"
172.16.77.182 - - [18/Sep/2012:20:48:19 +0300] "GET /reference/imagesHistoHTML/nitroso%20dye%20structure.gif HTTP/1.1" 200 2259 "http://my.website.org/reference/histo.html" "Mozilla/5.0
    (Windows NT 6.1; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.89 Safari/537.1"

(To respect the privacy of those accessing the server, I changed the IP address to 172.16.77.182, which lies in the RFC 1918 private range 172.16.0.0/12.)


NB: I am running this sed script as sed.exe (GNU sed version 4.2.1, (c) 2009) under Microsoft Windows 7, so any solution that requires a shell must use a command processor that is installed (MS cmd version 6.1.7601) or installable under Windows 7. I am aware that I could process the logs within a VM running a Linux distro (I have, for instance, Ubuntu and TinyCore Linux installed as VMs), but (1) I have not stopped using MS Windows as my everyday system and (2) my bash scripting experience is more than a decade old.

Last edited by Proteomist; 10-01-2012 at 03:03 AM.. Reason: break up code-boxed line lengths
# 2  
Sed scripts that handle multiple lines usually have a different flavor -- I like to call them loopers.
  • You add more lines using N. Often, the only line not read with N is the first! The behavior of N at $ (eof) was buggy in some early versions, so I test for that before the N.
  • Then you can write regexes that span lines or anchor on the '\n' between them; in GNU sed, '.' also matches that '\n'.
  • Using :labels and t or b branching, you can pile up lines in the buffer to your heart's content (or up to your old sed version's fixed buffer size).
  • You can use P to spit out just the first line.
  • With s and \(\) and \1 \2 ... you can swap lines around.
  • Not much use for D, since you start over.
  • In POSIX sed, '\n' inside [ ... ] is not a newline (it matches a backslash or an 'n'); GNU sed treats it as a newline there unless POSIXLY_CORRECT is set.
My sed to remove extra blank lines in a row:
Code:
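sed '
  :loop
  # last line: jump to the end, where autoprint flushes what is left
  $b
  # append the next line
  N
  # two blank lines in a row: keep one pending blank line, read on
  s/^\n$//
  t loop
  # otherwise print the first line and keep only the second
  P
  s/.*\n//
  t loop
 '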

