Performance of log parsing shell script very slow


 
# 1  
Old 04-08-2009
Performance of log parsing shell script very slow

Hello,
I am an absolute newbie and whatever I've written in the shell script (below) has all been built with generous help from googling the net and this forum. Please forgive any schoolboy mistakes.

Now to the question: my input file looks like this -

2009:04:03 08:21:41:513,INFO ,servername,yyyy,undefined,,INFO,null,.....Out:xxService,Start - SvcName: OIS - EndUserName: - RequestId: null20090403082313
2009:04:03 08:21:41:775,INFO ,servername,yyyy,undefined,,INFO,null,.....Out:xxService,End - SvcName: OIS - EndUserName: - RequestId: null20090403082313 - StatusCode: 0 - StatusText: Success
2009:04:03 08:21:45:660,INFO ,servername,yyyy,undefined,,INFO,null,.....Out:xxService,Start - SvcName: VCC - EndUserName: - RequestId: 411111111111111120090403082318
2009:04:03 08:21:46:171,INFO ,servername,yyyy,undefined,,INFO,null,.....Out:xxService,End - SvcName: VCC - EndUserName: - RequestId: 411111111111111120090403082318 - StatusCode: 0 - StatusText: Success
2009:04:03 08:21:49:583,INFO ,servername,yyyy,undefined,,INFO,null,.....Out:xxService,Start - SvcName: CO - EndUserName: - RequestId: 20090403082321
2009:04:03 08:22:03:571,INFO ,servername,yyyy,undefined,,INFO,null,.....Out:xxService,End - SvcName: CO - EndUserName: - RequestId: 20090403082321 - StatusCode: 0 - StatusText: Success

From this I have to work out the response time (end time minus start time) for each request id for the SvcName CO. I have written the shell script below (not finished yet), but its performance is very slow: it takes about a minute to process a 100-line file. Can you please point me in the right direction to improve the performance? Although not the exact subject of this post, any pointers on calculating the difference between the start and end times would also be very helpful.
Code:
#!/bin/bash
#script to extract resptime timings for the CO call from the logfile
#////////////////////////////////////
#if no command line args
if [ $# -ne 1 ]; then
    echo 1>&2 "Oops......Usage is wrong: $0 <tgtsrchfile>"
    exit 2
fi
#assigning command line params to variables
srchFN=$1
#remove resptime.log and temp files if already present
checkfile="./resptime.log"
tempfile="./temp.log"
tempfile1="./temp1.log"
if [ -e "$checkfile" ]; then
    rm -f "$checkfile"
fi
if [ -e "$tempfile" ]; then
    rm -f "$tempfile"
fi
if [ -e "$tempfile1" ]; then
    rm -f "$tempfile1"
fi
#if target search file not present or not readable
if [ ! -r "$srchFN" ]; then
    echo "Target search file $srchFN not present"
    exit 2
fi
#grep for request id
grep 'Start.*CO' $srchFN | awk -F "RequestId: " '{print $2}'>temp.log
#for each request id get start time and end time and print into temp file
cat temp.log | while read line; do
    #skip empty request ids
    if [ -n "$line" ]; then
        sttime=`grep "Start.*CO.*$line" "$srchFN" | awk -F "," '{print $1}' | awk -F " " '{print $2}'`
        endtime=`grep "End.*CO.*$line" "$srchFN" | awk -F "," '{print $1}' | awk -F " " '{print $2}'`
        if [ -n "$sttime" -o -n "$endtime" ]; then
            echo "$line,$sttime,$endtime" >> temp1.log
        fi
    fi
done
        
#/////////////////////////////////////

Thanks in advance

# 2  
Old 04-08-2009
something looks off with the delimiter handling on this line:

Code:
grep 'Start.*CO' $srchFN | awk -F "RequestId: " '{print $2}'>temp.log

should be something like:

Code:
grep 'Start.*CO' $srchFN | awk -F ',|RequestId: ' '{print $NF}' > temp.log


it looks like the problem is that you're re-reading the whole $srchFN twice for every line
of your temp.log, through those two grep commands.

You'll need to rework your logic so that those extra greps aren't done at all.

Try to rethink things so that you know what you're looking for while only
reading the file once.
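
For instance, a rough single-pass awk sketch of that idea (an illustration only, assuming the sample format above; it keys the Start timestamps on RequestId and prints each CO duration in milliseconds, and it ignores requests that cross midnight; "logfile" is a placeholder for the real file name):

Code:
# single-pass sketch; "logfile" is a placeholder for the actual log file
awk -F ',' '
/Out:xxService,(Start|End) - SvcName: CO / {
    # request id is the first word after "RequestId: "
    split($0, parts, "RequestId: ")
    split(parts[2], tok, " ")
    id = tok[1]

    # timestamp is field 1, e.g. "2009:04:03 08:21:41:513"
    split($1, dt, " ")
    split(dt[2], t, ":")
    ms = ((t[1] * 60 + t[2]) * 60 + t[3]) * 1000 + t[4]

    if ($0 ~ /Service,Start/) {
        start[id] = ms
    } else if (id in start) {
        printf "%s,%d ms\n", id, ms - start[id]
    }
}' logfile > resptime.log

Against the sample lines above that should print something like 20090403082321,13988 ms, which agrees with the 14-second difference reported further down in the thread.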
# 3  
Old 04-08-2009
Quote:
Originally Posted by quirkasaurus
it looks like the problem is that you're re-reading the whole $srchFN twice for every line
of your temp.log, through those two grep commands.

Try to rethink things so that you know what you're looking for while only
reading the file once.
My input file has one line for the start of a request and one for the end of the request. And there are hundreds of unique requests. So I am struggling to think of how else I can get the start and end time with one grep, for each req id...
# 4  
Old 04-08-2009
too many greps, too many awks, useless cats....
# 5  
Old 04-08-2009
something like this? substitute 'd' for your log file:

Code:
cat d |
while read line ; do

#----------------------------------------------------------------------#
# Start time and service name.                                         #
#----------------------------------------------------------------------#
  if [[ $line = *Service,Start* ]] ; then
    # command substitution keeps the variables in the current shell;
    # piping into "read" would set them in a subshell under bash and lose them
    service_name=$(echo "$line" | sed -e 's/^.*SvcName: //' -e 's/ - .*$//')
    start_time=$(echo "$line" | sed -e 's/,INFO.*$//')
  fi

#----------------------------------------------------------------------#
# End time.                                                            #
#----------------------------------------------------------------------#
  if [[ $line = *Service,End*SvcName*$service_name*EndUser* ]] ; then
    end_time=$(echo "$line" | sed -e 's/,INFO.*$//')
    echo start-time $start_time end-time $end_time service-name $service_name
  fi
done

# 6  
Old 04-08-2009
btw -- that'll really only work if each request's End record immediately follows its Start record. if that's not the case, let me know.

there's another trick we'll have to use to preprocess the log file.

specifically, creating a key from the svcname plus a "Service,Start" / "Service,End" marker and sorting on that key, so that each Start line is immediately followed by its matching End line.

Then processing the file again using my script above, roughly along the lines of the sketch below.
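
A rough illustration of that preprocessing idea (a sketch, not from the thread itself; it keys on RequestId rather than SvcName since the request ids are unique, and "logfile" / "paired.log" are placeholder names):

Code:
# prefix each CO line with "<requestid> <1|2>" (1 = Start, 2 = End),
# sort so every Start is immediately followed by its End, then strip the key;
# "logfile" and "paired.log" are placeholders
awk '/SvcName: CO / {
    split($0, parts, "RequestId: ")
    split(parts[2], tok, " ")
    marker = (index($0, "Service,Start") ? 1 : 2)
    print tok[1], marker, $0
}' logfile | sort -k1,1 -k2,2n | cut -d' ' -f3- > paired.log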
# 7  
Old 04-08-2009
I would use Perl to calculate the date and time differences (using an external module Date::Manip):

Code:
#!/usr/bin/perl 

use warnings;
use strict;

use Date::Manip;

my $infile = 'file';

open FH, $infile or die "$infile: $!";

my ( $sdate, $edate, $err, $ustart, $uend, $rtime );

while (<FH>) {
    if (/SvcName: CO/) {
        /Start -/ and ( $sdate = ( split ',' )[0] ) =~ s|:(.*?):|/$1/|;
        /End -/   and ( $edate = ( split ',' )[0] ) =~ s|:(.*?):|/$1/|;
    }
    if ( $sdate && $edate ) {
        # response time in whole seconds, in case a plain number is wanted
        $ustart = UnixDate( $sdate, "%s" );
        $uend   = UnixDate( $edate, "%s" );
        $rtime  = $uend - $ustart;

        print $sdate, " - ", $edate, "\n";
        print DateCalc( $sdate, $edate, \$err ), "\n";

        # reset so the next CO request is paired on its own
        undef $sdate;
        undef $edate;
    }
}

With your sample data it produces the following output:


Code:
% cat file
2009:04:03 08:21:41:513,INFO ,servername,yyyy,undefined,,INFO,null,.....Out:xxService,Start - SvcName: OIS - EndUserName: - RequestId: null20090403082313
2009:04:03 08:21:41:775,INFO ,servername,yyyy,undefined,,INFO,null,.....Out:xxService,End - SvcName: OIS - EndUserName: - RequestId: null20090403082313 - StatusCode: 0 - StatusText: Success
2009:04:03 08:21:45:660,INFO ,servername,yyyy,undefined,,INFO,null,.....Out:xxService,Start - SvcName: VCC - EndUserName: - RequestId: 411111111111111120090403082318
2009:04:03 08:21:46:171,INFO ,servername,yyyy,undefined,,INFO,null,.....Out:xxService,End - SvcName: VCC - EndUserName: - RequestId: 411111111111111120090403082318 - StatusCode: 0 - StatusText: Success
2009:04:03 08:21:49:583,INFO ,servername,yyyy,undefined,,INFO,null,.....Out:xxService,Start - SvcName: CO - EndUserName: - RequestId: 20090403082321
2009:04:03 08:22:03:571,INFO ,servername,yyyy,undefined,,INFO,null,.....Out:xxService,End - SvcName: CO - EndUserName: - RequestId: 20090403082321 - StatusCode: 0 - StatusText: Success
zsh-4.3.9[t]% ./s       
2009/04/03 08:21:49:583 - 2009/04/03 08:22:03:571
+0:0:0:0:0:0:14
