The UNIX and Linux Forums  
Hello and Welcome from United States to the UNIX and Linux Forums! Thank You for Visiting and Joining Our Global Community.

Go Back   The UNIX and Linux Forums > Top Forums > Shell Programming and Scripting
.
google unix.com



Shell Programming and Scripting Post questions about KSH, CSH, SH, BASH, PERL, PHP, SED, AWK and OTHER shell scripts and shell scripting languages here.

More UNIX and Linux Forum Topics You Might Find Helpful
Thread Thread Starter Forum Replies Last Post
FTP run from shell script gives slow transfer rates Countificus Shell Programming and Scripting 8 04-07-2009 05:50 PM
Shell script to parsing log justbow Shell Programming and Scripting 10 12-06-2008 03:20 PM
egrep is very slow : How to improve performance hidnana Shell Programming and Scripting 7 02-12-2008 07:13 AM
Help! Slow Performance Neo Post Here to Contact Site Administrators and Moderators 6 08-25-2003 04:08 PM

Closed Thread
English Japanese Spanish French German Portuguese Italian Dutch Swedish Russian Norwegian Hungarian Hebrew Danish Bulgarian Greek Powered by Powered by Google
 
LinkBack Thread Tools Search this Thread Rate Thread Display Modes
  #1 (permalink)  
Old 04-08-2009
sowmitr sowmitr is offline
Registered User
  
 

Join Date: Apr 2009
Posts: 2
Performance of log parsing shell script very slow

Hello,
I am an absolute newbie and whatever I've written in the shell script (below) has all been built with generous help from googling the net and this forum. Please forgive any schoolboy mistakes.

Now to the qn, my input file looks like this -

2009:04:03 08:21:41:513,INFO ,servername,yyyy,undefined,,INFO,null,.....Out:xxService,Start - SvcName: OIS - EndUserName: - RequestId: null20090403082313
2009:04:03 08:21:41:775,INFO ,servername,yyyy,undefined,,INFO,null,.....Out:xxService,End - SvcName: OIS - EndUserName: - RequestId: null20090403082313 - StatusCode: 0 - StatusText: Success
2009:04:03 08:21:45:660,INFO ,servername,yyyy,undefined,,INFO,null,.....Out:xxService,Start - SvcName: VCC - EndUserName: - RequestId: 411111111111111120090403082318
2009:04:03 08:21:46:171,INFO ,servername,yyyy,undefined,,INFO,null,.....Out:xxService,End - SvcName: VCC - EndUserName: - RequestId: 411111111111111120090403082318 - StatusCode: 0 - StatusText: Success
2009:04:03 08:21:49:583,INFO ,servername,yyyy,undefined,,INFO,null,.....Out:xxService,Start - SvcName: CO - EndUserName: - RequestId: 20090403082321
2009:04:03 08:22:03:571,INFO ,servername,yyyy,undefined,,INFO,null,.....Out:xxService,End - SvcName: CO - EndUserName: - RequestId: 20090403082321 - StatusCode: 0 - StatusText: Success

From this I have to find out the resp time (start time - end time) for each request id for the SvcName CO. I have written this shell script (not finished though) however the performance is very slow (takes a minute to process a 100 line file). can you please point me in the right direction to improve the performance? Although not the exact subject of this post, if any pointers can be given to calculate the difference between start and end time, it will be quite helpful.
Code:
#!/bin/bash
#script to resptime timings for CO call from logfile
#////////////////////////////////////
#if no command line args
    if [ $# -ne 1 ]
    then
    echo 1>&2"Oops......Usage is wrong. $0 <tgtsrchfile>"
    exit 2
    fi
#assigning command line params to variables
srchFN=$1
#remove resptime.log if already present
checkfile="./resptime.log"
tempfile="./temp.log"
tempfile1="./temp1.log"
    if [ -e $checkfile ];then
    rm -r $checkfile
    fi
    if [ -e $tempfile ];then
        rm -r $tempfile
    fi
    if [ -e $tempfile1 ];then
        rm -r $tempfile1
    fi
    #if keywordfile not present
    if [ ! -r $srchFN ]; then
    echo Target search file $srchFN not present
    exit 2
    fi
#grep for request id
        grep 'Start.*CO' $srchFN | awk -F "RequestId: " '{print $2}'>temp.log
#for each request id get starttime and end time and print into temp file
        cat temp.log | while read line; do
        #if string is empty
        if [ -n $line ];then
        sttime=`grep Start.*CO.*$line $srchFN | awk -F "," '{print $1}'|awk -F " " '{print $2}'`
        endtime=`grep End.*CO.*$line $srchFN | awk -F "," '{print $1}'|awk -F " " '{print $2}'`
            if [ -n "$sttime" -o -n "$endtime" ];then
            echo $line,$sttime,$endtime>>temp1.log
            fi
        fi        
        done;
        
#/////////////////////////////////////
Thanks in advance

Last edited by jim mcnamara; 04-08-2009 at 10:51 AM.. Reason: code tags
  #2 (permalink)  
Old 04-08-2009
quirkasaurus's Avatar
quirkasaurus quirkasaurus is offline
Registered User
  
 

Join Date: Jan 2009
Location: canton, michigan
Posts: 373
something wrong with this line, no delimiter specified:

Code:
grep 'Start.*CO' $srchFN | awk -F "RequestId: " '{print $2}'>temp.log
should be something like:

Code:
grep 'Start.*CO' $srchFN | awk -F, "RequestId: " '{print $2}'>temp.log

it looks like the problem is you're rereading the whole $srchFN for every line
of your temp.log twice through those grep commands.

You'll need to reformat your logic so that these extra greps are NOT even done.

Try to rethink things so that you know what you're looking for while only
reading the file once.
  #3 (permalink)  
Old 04-08-2009
sowmitr sowmitr is offline
Registered User
  
 

Join Date: Apr 2009
Posts: 2
Quote:
Originally Posted by quirkasaurus View Post
it looks like the problem is you're rereading the whole $srchFN for every line
of your temp.log twice through those grep commands.

Try to rethink things so that you know what you're looking for while only
reading the file once.
My input file has one line for the start of a request and one for the end of the request. And there are hundreds of unique requests. So I am struggling to think of how else I can get the start and end time with one grep, for each req id...
  #4 (permalink)  
Old 04-08-2009
ghostdog74 ghostdog74 is offline Forum Advisor  
Registered User
  
 

Join Date: Sep 2006
Posts: 2,538
too many greps, too many awks, useless cats....
  #5 (permalink)  
Old 04-08-2009
quirkasaurus's Avatar
quirkasaurus quirkasaurus is offline
Registered User
  
 

Join Date: Jan 2009
Location: canton, michigan
Posts: 373
something like this? substitute 'd' for your log file:

Code:
cat d |
while read line ; do

#----------------------------------------------------------------------#
# Start time and service name.                                         #
#----------------------------------------------------------------------#
  if [[ $line = *Service,Start* ]] ; then
    echo $line |
      sed -e 's/^.*SvcName: //' -e 's/ - .*$//' |
      read service_name

    echo $line |
      sed -e 's/,INFO.*$//' |
      read start_time
  fi

#----------------------------------------------------------------------#
# End time.                                                            #
#----------------------------------------------------------------------#
  if [[ $line = *Service,End*SvcName*$svcname*EndUser* ]] ; then
    echo $line |
      sed -e 's/,INFO.*$//' |
      read end_time
  echo start-time $start_time end-time $end_time service-name $service_name
  fi
done
  #6 (permalink)  
Old 04-08-2009
radoulov's Avatar
radoulov radoulov is online now Forum Staff  
addict
  
 

Join Date: Jan 2007
Location: Варна, България / Milano, Italia
Posts: 2,872
I would use Perl to calculate the date and time differences (using an external module Date::Manip):

Code:
#!/usr/bin/perl 

use warnings;
use strict;

use Date::Manip;

my $infile = 'file';

open FH, $infile or die "$infile: $!";

my ( $sdate, $edate, $delta, $err, $ustart, $uend, $rtime );

while (<FH>) {
    if (/SvcName: CO/) {
        /Start -/ and ( $sdate = ( split ',' )[0] ) =~ s|:(.*?):|/$1/|;
        /End -/   and ( $edate = ( split ',' )[0] ) =~ s|:(.*?):|/$1/|;
    }
    if ( $sdate && $edate ) {
        $ustart = UnixDate( $sdate, "%s" );
        $uend   = UnixDate( $edate, "%s" );
        if ($edate) {
            $rtime = $ustart - $uend;
            print $sdate, " - ", $edate, "\n";
            print DateCalc( $sdate, $edate, \$err ), "\n";
        }
    }
}
With your sample data it produces the following output:


Code:
% cat file
2009:04:03 08:21:41:513,INFO ,servername,yyyy,undefined,,INFO,null,.....Out:xxService,Start - SvcName: OIS - EndUserName: - RequestId: null20090403082313
2009:04:03 08:21:41:775,INFO ,servername,yyyy,undefined,,INFO,null,.....Out:xxService,End - SvcName: OIS - EndUserName: - RequestId: null20090403082313 - StatusCode: 0 - StatusText: Success
2009:04:03 08:21:45:660,INFO ,servername,yyyy,undefined,,INFO,null,.....Out:xxService,Start - SvcName: VCC - EndUserName: - RequestId: 411111111111111120090403082318
2009:04:03 08:21:46:171,INFO ,servername,yyyy,undefined,,INFO,null,.....Out:xxService,End - SvcName: VCC - EndUserName: - RequestId: 411111111111111120090403082318 - StatusCode: 0 - StatusText: Success
2009:04:03 08:21:49:583,INFO ,servername,yyyy,undefined,,INFO,null,.....Out:xxService,Start - SvcName: CO - EndUserName: - RequestId: 20090403082321
2009:04:03 08:22:03:571,INFO ,servername,yyyy,undefined,,INFO,null,.....Out:xxService,End - SvcName: CO - EndUserName: - RequestId: 20090403082321 - StatusCode: 0 - StatusText: Success
zsh-4.3.9[t]% ./s       
2009/04/03 08:21:49:583 - 2009/04/03 08:22:03:571
+0:0:0:0:0:0:14
  #7 (permalink)  
Old 04-08-2009
fpmurphy's Avatar
fpmurphy fpmurphy is offline Forum Staff  
Moderator
  
 

Join Date: Dec 2003
Location: Florida
Posts: 1,927
Here is another way to solve your problem using ksh93 ....
Code:
#!/usr/bin/ksh93

LOGFILE=./logfile
SVCNAME="CO"

awk -v str="$SVCNAME" -F, '$10 ~ "SvcName: "str {
     split($10, arr1, " ")
     date1=$1
     getline
     date2=$1
     print arr1[4], arr1[9], date1, date2
     next
}' $LOGFILE |  while read svc id sdate stime fdate ftime
do
    sdate1=$(printf '%(%s.%3N)T' "${sdate//:/-} ${stime:0:8}.${stime:9}")
    fdate1=$(printf '%(%s.%3N)T' "${fdate//:/-} ${ftime:0:8}.${ftime:9}")

    diff=$(( fdate1 - sdate1 ))
    idiff=$(( int(diff) ))
    usecs=$(( 1000 * (diff - int(diff)) ))
    secs=$(( idiff % 60 ))
    mins=$(( idiff % (60 * 60) / 60 ))
    hours=$(( idiff / (60 * 60) ))

    printf "%s %s - %s %s +%02d:%02d:%02d.%03d\n"  ${sdate//://} $stime ${fdate//://} $ftime $hours $mins $secs $usecs
done

exit 0
This is the output for your sample file
Code:
2009/04/03 08:21:49:583 - 2009/04/03 08:22:03:571 +00:00:13.988
Closed Thread

Bookmarks

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes Rate This Thread
Rate This Thread:

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On




All times are GMT -4. The time now is 09:06 AM.


Powered by: vBulletin, Copyright ©2000 - 2006, Jelsoft Enterprises Limited. Language Translations Powered by .
vBCredits v1.4 Copyright ©2007 - 2008, PixelFX Studios
The UNIX and Linux Forums Content Copyright ©1993-2009. All Rights Reserved.Ad Management by RedTyger

Content Relevant URLs by vBSEO 3.2.0