The UNIX and Linux Forums  
  #1 (permalink)  
04-21-2009
elinenbe
grep'ing and sed'ing chunks in bash... need help on speeding up a log parser.

I have a log file that is anywhere from 20 to 80+ MB in size.

It logs one of our processes, which is multi-threaded, so lines from different threads are interleaved and the log file is kind of a mess. Here's an example:

Each line looks like "DATE TIME - THREAD ID - Details", and a new file is created for each day.
Quote:
20090409 000122 - BD0 - Order 123 starting session
20090409 000122 - BD0 - Processing 1
20090409 000122 - BD0 - More Processing
20090409 000123 - EF0 - Order 234 starting session
20090409 000124 - EF0 - Processing
20090409 000124 - BD0 - Processing 2
20090409 000125 - BD0 - More Processing
20090409 000125 - EF0 - Processing
20090409 000125 - DD1 - Cancel 345 starting session
20090409 000125 - DD1 - Processing
20090409 000126 - DD1 - Processing 2
20090409 000126 - BD0 - Order 123 shutting down
20090409 000127 - 11F - Query 543 starting session
20090409 000127 - 11F - Processing
..
..
20090409 000135 - 11F - Query 543 shutting down
..
20090409 000140 - EF0 - Order 234 shutting down
..
..
..
20090409 000143 - DD1 - Cancel 345 shutting down
Now, here's where it gets to be a pain... I need to pull out the lines from "starting session" to "shutting down" for each Thread ID, and dump these to separate files. HOWEVER, the Thread ID CAN be reused over the course of a day -- but usually not for many hours.

A session can last from 30 seconds to 4 minutes or so (~1200 lines) in the logfile, and there can be up to 20 concurrent sessions.

Now, I have something that works -- although quite slowly. I end up grepping and sedding the file over and over. When the file gets large, it takes a MASSIVE amount of time. I am hoping that someone here can help me optimize this. If possible, I'd like to use bash.

Thanks,
Eric

Here is the code I have that works, but is _slow_

Code:
    if [[ -e "$log_file" ]]
    then
        echo "parsing: $log_file"
        # one pass over the log to find every session start line
        grep "starting session" "$log_file" | while read -r line
        do
            # fields: DATE TIME - THREAD - TYPE NUMBER ...
            thread=$(echo "$line" | cut -d' ' -f4)
            sessiontype=$(echo "$line" | cut -d' ' -f6)
            sessionnumber=$(echo "$line" | cut -d' ' -f7)

            echo "  first line of session: ${line:0:25}..."
            line2="- $thread - $sessiontype $sessionnumber shutting down"
            echo "  last line of session: ${line2:0:25}..."
            # rescans the whole log once per session, which is the slow part
            sed -n "/$line/,/$line2/p" "$log_file" | grep " - $thread - " > "session.$thread.$sessiontype.$sessionnumber"
        done
    ....
This gives me a number of files. Using the example log above, they would be created as shown below:
Quote:
file: session.BD0.Order.123
20090409 000122 - BD0 - Order 123 starting session
20090409 000122 - BD0 - Processing 1
20090409 000122 - BD0 - More Processing
20090409 000124 - BD0 - Processing 2
20090409 000125 - BD0 - More Processing
20090409 000126 - BD0 - Order 123 shutting down

file: session.DD1.Cancel.345
20090409 000125 - DD1 - Cancel 345 starting session
20090409 000125 - DD1 - Processing
20090409 000126 - DD1 - Processing 2
..
..
..
20090409 000143 - DD1 - Cancel 345 shutting down

file: session.11F.Query.543
20090409 000127 - 11F - Query 543 starting session
20090409 000127 - 11F - Processing
..
..
20090409 000135 - 11F - Query 543 shutting down

file: session.EF0.Order.234
20090409 000123 - EF0 - Order 234 starting session
20090409 000124 - EF0 - Processing
20090409 000125 - EF0 - Processing
20090409 000140 - EF0 - Order 234 shutting down
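
Since the request is for bash if possible, here is a minimal sketch of the same split done in a single pass over the log, instead of one sed scan per session. It is not from the original script: the function name split_sessions is made up, it assumes bash 4 or later (associative arrays), and it assumes "starting session" and "shutting down" appear only on the first and last line of a session. A line-by-line loop in bash is not fast either, so treat this as an illustration of the single-pass idea rather than a tuned solution.

```shell
#!/usr/bin/env bash
# Sketch only: needs bash 4+ (associative arrays).
# split_sessions is a hypothetical helper name, not part of the original script.
split_sessions() {
    local log_file=$1 line thread stype snum
    local -A out=()    # thread ID -> output file of the session currently open on it
    while IFS= read -r line; do
        # Fields: DATE TIME - THREAD - TYPE NUMBER ...
        read -r _ _ _ thread _ stype snum _ <<< "$line"
        [[ -n $thread ]] || continue              # skip blank or malformed lines
        if [[ $line == *"starting session"* ]]; then
            out[$thread]="session.$thread.$stype.$snum"
            : > "${out[$thread]}"                 # start the session file empty
        fi
        [[ -n ${out[$thread]:-} ]] || continue    # line belongs to no open session
        printf '%s\n' "$line" >> "${out[$thread]}"
        if [[ $line == *"shutting down"* ]]; then
            unset "out[$thread]"                  # the thread ID may be reused later
        fi
    done < "$log_file"
}
```

It would be called as split_sessions "$log_file". Dropping the thread from the table on "shutting down" is what lets a reused Thread ID start a new file later in the day.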
  #2 (permalink)  
04-21-2009
Franklin52 (Moderator)
Assuming the first line of a session ends with "starting session", you can try this (not tested):

Code:
awk '
  !a[$4]{a[$4]=$4; n[$4]="session."$4"."$6"."$7}
  a[$4]{print > n[$4]}
' file
Use nawk or /usr/xpg4/bin/awk on Solaris if you get errors.

Regards
  #3 (permalink)  
04-21-2009
elinenbe
Sorry, I should have been more specific. The starting session lines all end with something like:

20090409 000122 - BD0 - Order 123 starting session with client 12 port 34
20090409 000123 - EF0 - Order 234 starting session with client 347 port 38
...

And both the client and port are dynamic values.

Yeah, I'm getting errors -- I'm running this under cygwin, so I don't have easy access to nawk.
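
The extra "with client N port M" text does not move the thread ID, session type, or session number out of fields 4, 6 and 7, as long as the start line is found by matching "starting session" anywhere in the line rather than at its end. A small sketch to check that, assuming the awk available under cygwin behaves like a POSIX awk (session_names is a made-up helper name):

```shell
# Sketch: list the output file name each "starting session" line maps to.
# session_names is a hypothetical helper; it assumes the field layout
# "DATE TIME - THREAD - TYPE NUMBER ..." shown in the sample log.
session_names() {
    awk '/ starting session/ { print "session." $4 "." $6 "." $7 }' "$1"
}
```

Running session_names on a day's log prints one file name per session start, which is a quick way to confirm the field positions before splitting anything.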
  #4 (permalink)  
04-21-2009
devtakh
Try this:
$ sort -k 4 logfile | awk 'NR==1{prev=$4;txt="session."$4"."$6"."$7;printf("%s\n%s\n",txt,$0);next}{if (prev != $4){txt="session."$4"."$6"."$7;printf("%s\n%s\n",txt,$0);prev=$4}else {print $0}}'

It will give something like this:

session.11F.Processing.
20090409 000127 - 11F - Processing
20090409 000127 - 11F - Query 543 starting session
session.BD0.More.Processing
20090409 000122 - BD0 - More Processing
20090409 000125 - BD0 - More Processing
20090409 000126 - BD0 - Order 123 shutting down
20090409 000122 - BD0 - Order 123 starting session
20090409 000122 - BD0 - Processing 1
20090409 000124 - BD0 - Processing 2
session.DD1.Cancel.345
20090409 000125 - DD1 - Cancel 345 starting session
20090409 000125 - DD1 - Processing
20090409 000126 - DD1 - Processing 2
session.EF0.Order.234
20090409 000123 - EF0 - Order 234 starting session
20090409 000124 - EF0 - Processing
20090409 000125 - EF0 - Processing


cheers,
Devaraj Takhellambam
  #5 (permalink)  
04-21-2009
Franklin52 (Moderator)
Try this one:

Code:
awk '
{if (!($4 in a)) {a[$4]=$4; n[$4]="session."$4"."$6"."$7}}
{if ($4 in a) {print > n[$4]}}
' file
Regards
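
One point the thread leaves open is the reused Thread ID: an awk script that remembers the first file name it saw for a thread will keep writing a later session on that thread into the same file. The following is a sketch of one way to handle that, not a tested answer from the thread. The wrapper name split_sessions_awk is made up, and it assumes "starting session" and "shutting down" occur only on the first and last line of a session.

```shell
# Sketch: one pass, one output file per session, thread IDs may repeat.
# split_sessions_awk is a hypothetical name; the start/stop phrases are taken
# from the sample log and assumed to appear on no other lines.
split_sessions_awk() {
    awk '
    / starting session/ {                  # open a new session for this thread
        n[$4] = "session." $4 "." $6 "." $7
    }
    ($4 in n) {                            # line belongs to an open session
        print > (n[$4])
    }
    / shutting down/ && ($4 in n) {        # session finished
        close(n[$4])                       # release the file handle
        delete n[$4]                       # later reuse of the ID starts fresh
    }
    ' "$1"
}
```

It would be run as split_sessions_awk logfile. Lines for a thread with no open session are dropped, and closing each file at "shutting down" keeps the number of open files near the number of concurrent sessions.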