Sed/awk gods, I need your Help! Fancy log extraction


 
# 1  
Old 06-22-2007

Hi! I'm trying to find a way to extract a specific set of lines from a log file. This would allow me to "follow" a web user through our log files.

Here is a fake sample log file to explain what I want to accomplish:
[2007-06-22 09:33:15,843][thread-1][BEG_]BEGIN REQUEST sessionID=123456
[2007-06-22 09:33:15,844][thread-1][DEB_]boatload of lines for thread-1 detailing the whereabouts of the customer

[2007-06-22 09:33:15,844][thread-2][DEB_]Here is activity from another customer - we don't need that
[2007-06-22 09:33:15,844][thread-1][DEB_]boatload of lines for thread-1 detailing the whereabouts of the customer
[2007-06-22 09:33:15,844][thread-3][DEB_]more activity from yet another customer- we don't need that
[2007-06-22 09:33:15,844][thread-1][DEB_]boatload of lines for thread-1 detailing the whereabouts of the customer
[2007-06-22 09:33:15,843][thread-1][BEG_]END REQUEST
[2007-06-22 09:33:15,843][thread-34][BEG_]BEGIN REQUEST sessionID=123456

[2007-06-22 09:33:15,844][thread-1][DEB_]Another customer took thread-1! We don't want that log entry either
[2007-06-22 09:33:15,844][thread-34][DEB_]yet more activity from the customer but under a different thread!
[2007-06-22 09:33:15,843][thread-34][BEG_]END REQUEST


What I need is a command that, using sessionID=123456, identifies the matching thread ID and extracts the lines containing that thread ID between the BEGIN REQUEST and END REQUEST tags.

So basically, the result would be:
[2007-06-22 09:33:15,843][thread-1][BEG_]BEGIN REQUEST sessionID=123456
[2007-06-22 09:33:15,844][thread-1][DEB_]boatload of lines for thread-1 detailing the whereabouts of the customer
[2007-06-22 09:33:15,844][thread-1][DEB_]boatload of lines for thread-1 detailing the whereabouts of the customer
[2007-06-22 09:33:15,844][thread-1][DEB_]boatload of lines for thread-1 detailing the whereabouts of the customer
[2007-06-22 09:33:15,843][thread-1][BEG_]END REQUEST
[2007-06-22 09:33:15,843][thread-34][BEG_]BEGIN REQUEST sessionID=123456
[2007-06-22 09:33:15,844][thread-34][DEB_]yet more activity from the customer but under a different thread!
[2007-06-22 09:33:15,843][thread-34][BEG_]END REQUEST

What the expression would need to do:
1 - locate sessionID=123456
2 - grab threadID from the same line
3 - dump all threadID lines up to threadID.*END REQUEST
4 - rinse and repeat

Unfortunately, I'm only a neophyte with sed and awk, so I have no idea how to proceed...

I'm not even sure this can be done. If not, I'll use Perl, but having a nice expression that could do it (and understanding it) would be a big help for me.

If someone can lend me a hand or at least give me pointers, that would be much appreciated. I hope my question is clear enough!

Thanks

Last edited by gnagus; 06-22-2007 at 04:09 PM.. Reason: Edit : removed references to UniqueID, replaced by sessionID
# 2  
Old 06-22-2007
Gnagus,
See if this works for you:
Code:
sed -n '/BEGIN REQUEST.*34444/,/END REQUEST/p' input_file

# 3  
Old 06-22-2007
Try the following script (named th.sh):

Code:
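awk -v Id=123456 -v FS='[][]' '
   # FS splits on [ and ], so $4 is the thread name and $7 is the message.
   $7 ~ "BEGIN REQUEST sessionID=" Id {
      thread = $4               # start following this thread
   }
   thread != "" && $4 == thread # print lines of the followed thread only
   $4 == thread && $7 ~ /END REQUEST/ { thread = "" }   # stop at its own END REQUEST
' th.txt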

Inputfile (th.txt):
Code:
[2007-06-22 09:33:15,840][thread-1][BEG_]BEGIN REQUEST sessionID=100001
[2007-06-22 09:33:15,843][thread-1][BEG_]BEGIN REQUEST sessionID=123456
[2007-06-22 09:33:15,844][thread-1][DEB_]boatload of lines for thread-1 detailing the whereabouts of the customer
[2007-06-22 09:33:15,844][thread-2][DEB_]Here is activity from another customer - we don't need that
[2007-06-22 09:33:15,844][thread-1][DEB_]boatload of lines for thread-1 detailing the whereabouts of the customer
[2007-06-22 09:33:15,844][thread-3][DEB_]more activity from yet another customer- we don't need that
[2007-06-22 09:33:15,844][thread-1][DEB_]boatload of lines for thread-1 detailing the whereabouts of the customer
[2007-06-22 09:33:15,843][thread-1][BEG_]END REQUEST
[2007-06-22 09:33:15,843][thread-34][BEG_]BEGIN REQUEST sessionID=34444
[2007-06-22 09:33:15,844][thread-1][DEB_]Another customer took thread-1! We don't want that log entry either
[2007-06-22 09:33:15,844][thread-34][DEB_]yet more activity from the customer but under a different thread!
[2007-06-22 09:33:15,843][thread-34][BEG_]END REQUEST

Output:
Code:
[2007-06-22 09:33:15,843][thread-1][BEG_]BEGIN REQUEST sessionID=123456
[2007-06-22 09:33:15,844][thread-1][DEB_]boatload of lines for thread-1 detailing the whereabouts of the customer
[2007-06-22 09:33:15,844][thread-1][DEB_]boatload of lines for thread-1 detailing the whereabouts of the customer
[2007-06-22 09:33:15,844][thread-1][DEB_]boatload of lines for thread-1 detailing the whereabouts of the customer
[2007-06-22 09:33:15,843][thread-1][BEG_]END REQUEST
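
The script above follows one thread at a time and matches the session ID as a regular expression, so sessionID=1234567 would also match Id=123456. Here is a minimal sketch of a variant that can follow several threads of the same session at once and compares the ID exactly. It assumes a POSIX awk (nawk, /usr/xpg4/bin/awk or gawk) and that only optional whitespace follows the session ID on the BEGIN REQUEST line. The names follow_session and active are made up for this illustration.

```shell
#!/bin/sh
# follow_session is a hypothetical helper name, not something from the thread.
# Usage: follow_session SESSION_ID LOGFILE
follow_session() {
    awk -v Id="$1" -F'[][]' '
        # Fields split on [ and ]: $4 is the thread name, $7 is the message.
        $7 ~ /^BEGIN REQUEST sessionID=/ {
            sid = $7
            sub(/^BEGIN REQUEST sessionID=/, "", sid)   # keep what follows the key
            sub(/[ \t\r].*$/, "", sid)                  # drop trailing whitespace, if any
            if (sid == Id) active[$4] = 1               # exact match, not a prefix match
        }
        ($4 in active) {
            print                                       # line belongs to a followed thread
            if ($7 ~ /^END REQUEST/) delete active[$4]  # stop at that thread END REQUEST
        }
    ' "$2"
}
```

Called as follow_session 123456 th.txt, it should print the same five lines as the output above for that input file. Keeping the followed threads in an array means an END REQUEST from an unrelated thread cannot stop the extraction early.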


Last edited by aigles; 06-22-2007 at 04:35 PM.. Reason: typo error, remove $ from Id assignment
# 4  
Old 06-22-2007
Quote:
Originally Posted by Shell_Life
Gnagus,
See if this works for you:
Code:
sed -n '/BEGIN REQUEST.*34444/,/END REQUEST/p' input_file

ShellLife, I believe that won't work because I have many, many customers accessing the site at the same time, so it will most likely stop at the next "END REQUEST" it finds, regardless of whether it's related to 34444 or not.

NOTE: Edited the original post to have two threads for 34444. That's what I was aiming for in the first place, typo!
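
A quick way to see the problem with the sed range, as a sketch on a made-up five-line sample (the `[t]` timestamps and the text of the lines are placeholders, not real log data):

```shell
#!/bin/sh
# Hypothetical sample: thread-2 belongs to a different customer.
sample='[t][thread-1][BEG_]BEGIN REQUEST sessionID=123456
[t][thread-2][DEB_]someone else
[t][thread-2][BEG_]END REQUEST
[t][thread-1][DEB_]our customer
[t][thread-1][BEG_]END REQUEST'

# The range opens on our BEGIN REQUEST and closes on the first END REQUEST
# from any thread, printing every line in between.
sed_range=$(printf '%s\n' "$sample" | sed -n '/BEGIN REQUEST.*123456/,/END REQUEST/p')
printf '%s\n' "$sed_range"
```

This prints the first three lines only: the range includes the two thread-2 lines and closes before the remaining thread-1 lines are reached. A plain address range has no notion of which thread a line belongs to, so it cannot filter interleaved requests on its own.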
# 5  
Old 06-22-2007
Quote:
Originally Posted by aigles
Try the following script (named th.sh):

Code:
awk -v Id=$123456 -v FS='[][]' '
   $7 ~ "BEGIN REQUEST sessionID=" Id {
      thread = $4;
   }
   $4 == thread
   $7 ~/END REQUEST/ { thread="" }
' th.txt


Hi there, little French cousin!

I don't know what awk/nawk version you're using, but mine definitely doesn't like your script... it just dies with a not-very-helpful "awk: syntax error near line 1".

I'm running awk under Solaris 9 here.
# 6  
Old 06-22-2007
Try nawk or /usr/xpg4/bin/awk instead of plain awk.
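
If the same script has to run on several machines, one option is to probe for an awk that accepts -v before using it. This is only a sketch: pick_awk is a made-up helper name, the candidate list is an assumption, and it needs a POSIX shell such as ksh or bash rather than the old Bourne /bin/sh.

```shell
#!/bin/sh
# pick_awk is a hypothetical helper: it prints the first awk that accepts -v.
pick_awk() {
    for candidate in /usr/xpg4/bin/awk nawk gawk awk; do
        if command -v "$candidate" >/dev/null 2>&1 &&
           [ "$("$candidate" -v probe=ok 'BEGIN { print probe }' 2>/dev/null)" = ok ]; then
            echo "$candidate"
            return 0
        fi
    done
    return 1
}

AWK=$(pick_awk) || { echo "no awk with -v support found" >&2; exit 1; }
"$AWK" -v Id=123456 'BEGIN { print "using Id=" Id }'
```

The script can then call "$AWK" wherever it would have called awk.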
# 7  
Old 06-22-2007
I have fixed a typo: removed the $ from the Id variable assignment:
Code:
awk -v Id=123456 -v FS='[][]' '
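
Why the stray $ matters: in a POSIX shell, $123456 is read as $1 (the first positional parameter) followed by the literal text 23456, so awk never receives 123456. A small sketch to illustrate, where the value first-arg and the variable names are made up:

```shell
#!/bin/sh
set -- first-arg            # pretend the script was called with one argument
with_dollar=$123456         # parsed as ${1} followed by the text 23456
without_dollar=123456       # the intended literal value
echo "$with_dollar"         # prints first-arg23456
echo "$without_dollar"      # prints 123456
```

When the script is run with no arguments, $1 is empty and Id ends up as 23456.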
