grep'ing and sed'ing chunks in bash... need help on speeding up a log parser.


 
# 1  
Old 04-21-2009

I have a file that is 20 - 80+ MB in size that is a certain type of log file.

It logs one of our processes and this process is multi-threaded. Therefore the log file is kind of a mess. Here's an example:

The logfile looks like: "DATE TIME - THREAD ID - Details", and a new file is created for each day
Quote:
20090409 000122 - BD0 - Order 123 starting session
20090409 000122 - BD0 - Processing 1
20090409 000122 - BD0 - More Processing
20090409 000123 - EF0 - Order 234 starting session
20090409 000124 - EF0 - Processing
20090409 000124 - BD0 - Processing 2
20090409 000125 - BD0 - More Processing
20090409 000125 - EF0 - Processing
20090409 000125 - DD1 - Cancel 345 starting session
20090409 000125 - DD1 - Processing
20090409 000126 - DD1 - Processing 2
20090409 000126 - BD0 - Order 123 shutting down
20090409 000127 - 11F - Query 543 starting session
20090409 000127 - 11F - Processing
..
..
20090409 000135 - 11F - Query 543 shutting down
..
20090409 000140 - EF0 - Order 234 shutting down
..
..
..
20090409 000143 - DD1 - Cancel 345 shutting down
Now, here's where it gets to be a pain... I need to pull out the lines from "starting session" to "shutting down" for each Thread ID, and dump these to separate files. HOWEVER, a Thread ID CAN be reused over the course of a day -- but usually not for many hours.

A session can last from 30 seconds to 4 minutes or so (~1200 lines) in the logfile, and there can be up to 20 concurrent sessions.

Now, I have something that works -- although quite slowly. I end up grepping and sedding the file over and over. When the file gets large, it takes a MASSIVE amount of time. I am hoping that someone here can help me optimize this. If possible, I'd like to use bash.

Thanks,
Eric

Here is the code I have that works, but is _slow_

Code:
    if [[ -e "$log_file" ]]
    then
        echo "parsing: "$log_file
        grep "starting session" $log_file | while read line 
        do
            thread=`echo $line | cut -d' ' -f4`
            sessiontype=`echo $line | cut -d' ' -f6`
            sessionnumber=`echo $line | cut -d' ' -f7`

            echo "  first line of session: "${line:0:25}"..."
            line2=`echo  - $thread - $sessiontype $sessionnumber shutting down`
            echo "  last line of session: "${line2:0:25}"..."
            sed -n "/$line/,/$line2/p" $log_file | grep " - $thread - ">session.$thread.$sessiontype.$sessionnumber
        done
    ....

This gives me a number of files, that using the example log above would be created as shown below:
Quote:
file: session.BD0.Order.123
20090409 000122 - BD0 - Order 123 starting session
20090409 000122 - BD0 - Processing 1
20090409 000122 - BD0 - More Processing
20090409 000124 - BD0 - Processing 2
20090409 000125 - BD0 - More Processing
20090409 000126 - BD0 - Order 123 shutting down

file: session.DD1.Cancel.345
20090409 000125 - DD1 - Cancel 345 starting session
20090409 000125 - DD1 - Processing
20090409 000126 - DD1 - Processing 2
..
..
..
20090409 000143 - DD1 - Cancel 345 shutting down

file: session.11F.Query.543
20090409 000127 - 11F - Query 543 starting session
20090409 000127 - 11F - Processing
..
..
20090409 000135 - 11F - Query 543 shutting down

file: session.EF0.Order.234
20090409 000123 - EF0 - Order 234 starting session
20090409 000124 - EF0 - Processing
20090409 000125 - EF0 - Processing
20090409 000140 - EF0 - Order 234 shutting down
# 2  
Old 04-21-2009
Assuming the first line of a session ends with "starting session" you can try this (not tested):

Code:
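awk '
  # first line seen for a thread ID: remember it and build the output file name
  !a[$4]{a[$4]=$4; n[$4]="session."$4"."$6"."$7}
  # write every line of a known thread ID to that file
  a[$4]{print > (n[$4])}
' file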

Use nawk or /usr/xpg4/bin/awk on Solaris if you get errors.

Regards
# 3  
Old 04-21-2009
Sorry, I should have been more specific. The starting session lines all end with something like:

20090409 000122 - BD0 - Order 123 starting session with client 12 port 34
20090409 000123 - EF0 - Order 234 starting session with client 347 port 38
...

And both the client and port are dynamic values.

Yeah, I'm getting errors -- I'm running this under cygwin, so I don't have easy access to nawk.
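
Given the start-line format above (extra client and port text after "starting session"), one option is a single pass that keeps track of which thread IDs currently have an open session. What follows is only a sketch under assumptions: the `logdemo` directory, the sample lines, and the `dir` and `out` names are made up for illustration, and it relies on the field layout shown in the examples (thread ID in field 4, session type and number in fields 6 and 7). It uses only POSIX awk features, so the gawk that Cygwin typically provides as `awk` should be enough.

```shell
# Hypothetical demo directory and sample data; point awk at the real log instead.
mkdir -p logdemo
cat > logdemo/sample.log <<'EOF'
20090409 000122 - BD0 - Order 123 starting session with client 12 port 34
20090409 000122 - BD0 - Processing 1
20090409 000123 - EF0 - Order 234 starting session with client 347 port 38
20090409 000124 - EF0 - Processing
20090409 000124 - BD0 - Processing 2
20090409 000126 - BD0 - Order 123 shutting down
20090409 000140 - EF0 - Order 234 shutting down
20090409 050000 - BD0 - Query 777 starting session with client 9 port 40
20090409 050001 - BD0 - Processing
20090409 050002 - BD0 - Query 777 shutting down
EOF

awk -v dir="logdemo" '
# A start line has "starting session" in fields 8 and 9, whatever follows.
$8 == "starting" && $9 == "session" {
    out[$4] = dir "/session." $4 "." $6 "." $7
}
# Copy every line of a thread that currently has an open session.
($4 in out) {
    print > (out[$4])
    if ($8 == "shutting" && $9 == "down") {
        close(out[$4])     # release the file handle
        delete out[$4]     # the thread ID may be reused by a later session
    }
}
' logdemo/sample.log
```

Because the entry is deleted at "shutting down", a thread ID that comes back later in the day gets a new file rather than being appended to the old one. Lines from a thread with no open session are silently dropped, which may or may not be what you want.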
# 4  
Old 04-21-2009
Try this:
$ sort -k 4 logfile | awk 'NR==1{prev=$4;txt="session."$4"."$6"."$7;printf("%s\n%s",txt,$0);getliine}{if (prev !~ $4){txt="session."$4"."$6"."$7;printf("%s\n%s",txt,$0);prev=$4}else {print $0;prev=$4}}'

This will give something like:

session.11F.Processing.
20090409 000127 - 11F - Processing20090409 000127 - 11F - Processing
20090409 000127 - 11F - Query 543 starting session
session.BD0.More.Processing
20090409 000122 - BD0 - More Processing20090409 000125 - BD0 - More Processing
20090409 000126 - BD0 - Order 123 shutting down
20090409 000122 - BD0 - Order 123 starting session
20090409 000122 - BD0 - Processing 1
20090409 000124 - BD0 - Processing 2
session.DD1.Cancel.345
20090409 000125 - DD1 - Cancel 345 starting session20090409 000125 - DD1 - Processing
20090409 000126 - DD1 - Processing 2
session.EF0.Order.234
20090409 000123 - EF0 - Order 234 starting session20090409 000124 - EF0 - Processing
20090409 000125 - EF0 - Processing


cheers,
Devaraj Takhellambam
# 5  
Old 04-21-2009
Try this one:

Code:
awk '
{if (!($4 in a)) {a[$4]=$4; n[$4]="session."$4"."$6"."$7}}
{if ($4 in a) {print > (n[$4])}}
' file

Regards
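
Since the original question asked for bash if possible, here is a rough pure-bash sketch of the same single-pass idea. It assumes bash 4 or later (for `declare -A`), and the `bashdemo` directory, sample lines, and variable names are hypothetical. A `while read` loop over a log of tens of megabytes will likely be noticeably slower than awk, though it still reads the file only once instead of once per session.

```shell
#!/bin/bash
# Pure-bash sketch; needs bash 4+ for associative arrays (declare -A).
mkdir -p bashdemo
cat > bashdemo/sample.log <<'EOF'
20090409 000122 - BD0 - Order 123 starting session with client 12 port 34
20090409 000122 - BD0 - Processing 1
20090409 000123 - EF0 - Order 234 starting session with client 347 port 38
20090409 000124 - BD0 - Processing 2
20090409 000126 - BD0 - Order 123 shutting down
20090409 000140 - EF0 - Order 234 shutting down
EOF

declare -A out    # thread ID -> output file of the session currently open

while IFS= read -r line; do
    # Split a copy of the line into fields; "rest" holds everything after field 7.
    read -r _ _ _ thread _ type number rest <<< "$line"
    [[ -n $thread ]] || continue            # skip blank or malformed lines
    if [[ $rest == "starting session"* ]]; then
        out[$thread]="bashdemo/session.$thread.$type.$number"
        : > "${out[$thread]}"               # start the session file empty
    fi
    if [[ -n ${out[$thread]+set} ]]; then
        printf '%s\n' "$line" >> "${out[$thread]}"
        # Session over: forget the thread so a reused ID starts a new file.
        [[ $rest == "shutting down" ]] && unset "out[$thread]"
    fi
done < bashdemo/sample.log
```

The original line is written out unchanged, so spacing inside the details text is preserved; only the throwaway copy is split into fields.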
