parsing logfiles (performance issue)


 
# 1  
Old 05-05-2008

Hi All,

I am reading some log files, parsing the data, and writing it to a text file.
Here is my code:

OLDIFS=$IFS
IFS='
' # just a newline, in single quotes
while read data
do
if [ "$data" != " " -a "$data" != "" ]
then
#Parsing the Frontend log file
MSISDN="`echo "$data" | cut -c1-10`"
HOUR="`echo "$data" | cut -c11-16`"
ID_SA_SOURCE="`echo "$data" | cut -c17-34`"
ID_SA_DEST="`echo "$data" | cut -c35-52`"
ID_VIR_PORTAL="`echo "$data" | cut -c53-70`"
NW_BEARER="`echo "$data" | cut -c71-74`"
TERMINAL_TYPE="`echo "$data" | cut -c 75-75`"
TRADE_MODEL="`echo "$data" | cut -c 76-105`"
HOUR=$LOGDATE$HOUR

echo $HOUR";"$MSISDN";"$ID_SA_SOURCE";"$ID_SA_DEST";"$ID_VIR_PORTAL";"$NW_BEARER";"$TERMINAL_TYPE";"$TRADE_MODEL >> OFR_Processed_data.txt
fi
done < $TRACKING_LOGDIR/$listdata

The fields in the log file are always fixed width, which is why I am using cut to extract them.
This code works correctly, but performance-wise it is a big failure:
reading a log file with 1 lakh (100,000) records takes more than 2 hours to produce the output.
Can anyone tell me why it takes this long? How can I improve my code?

Thanks in advance
Subin
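
Editorial note on the likely cause (this is not stated in the thread): each of the eight backtick substitutions in the loop starts a subshell and an external cut process, so 100,000 records mean at least 800,000 cut runs, and process start-up is probably what dominates the run time. The sketch below only illustrates the principle of running the tool once over the whole input instead of once per line. The two sample records are built from values quoted later in the thread, and the variable names sample, msisdns and hours are made up for the example.

```shell
# Two sample records in the fixed-width layout from the post above
# (10 + 6 + 18 + 18 + 18 + 4 + 1 + 30 columns). printf reuses the
# format string for the second group of eight arguments.
sample=$(printf '%-10s%-6s%-18s%-18s%-18s%-4s%-1s%-30s\n' \
    0678111627 000012 "" 3001339 60180 GPRS 4 SonyEricsson_W600i \
    0637729992 000012 "" 3025451 0     GPRS 1 MOTOROLA-W375)

# One cut process handles every record, instead of one process per record.
msisdns=$(printf '%s\n' "$sample" | cut -c1-10)
hours=$(printf '%s\n' "$sample" | cut -c11-16)

echo "$msisdns"
echo "$hours"
```

With a real log the same idea applies to the file directly, for example cut -c1-10 "$TRACKING_LOGDIR/$listdata". Reordering and joining several columns in one pass is easier in awk, which is the direction the replies take.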
# 2  
Old 05-05-2008
Change the print statement to get the exact format you want:
Code:
awk  ' {
         if($0 > " ")
         {
           MSISDN=       substr($0,1,10);
           HOUR=         substr($0,11,16)
           ID_SA_SOURCE= substr($0,17,16)
           ID_SA_DEST=   substr($0,35,16)
           ID_VIR_PORTAL=substr($0,53,16)
           NW_BEARER=    substr($0,71,4 )
           TERMINAL_TYPE=substr($0,75,1 )
           TRADE_MODEL=  substr($0,76,30)                    
           print HOUR, MSIDSN,ID_SA_SOURCE,ID_SA_DEST,ID_VIR_PORTAL,NW_BEARER,TERMINAL_TYPE,TRADE_MODEL >> OFR_Processed_data.txt           
          }                 
        }' $TRACKING_LOGDIR/$listdata

# 3  
Old 05-05-2008
Hi ,

Thanks for the reply.
I replaced my code with the awk version,
but I am getting the following error:

awk: cmd. line:11: print HOUR,MSIDSN,ID_SA_SOURCE,ID_SA_DEST,ID_VIR_PORTAL,NW_BEARER,TERMINAL_TYPE,TRADE_MODEL >> OFR_Processed_data.txt
awk: cmd. line:11:
^ syntax error



Can you look into this error?
# 4  
Old 05-05-2008
My bad: the output filename after >> needs double quotes around it, i.e. >> "OFR_Processed_data.txt".
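
Editorial note: inside an awk program the target of > or >> is an expression, so a bare OFR_Processed_data.txt is parsed as a variable name followed by .txt, which is a syntax error. A quoted string works. The following is a minimal sketch, run in a throwaway directory so it does not touch real data; the names workdir and written and the sample input lines are made up for the example.

```shell
# Work in a temporary directory so the example leaves no files behind
# in the current one.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# The file name is a quoted string inside the awk program.
printf 'first line\nsecond line\n' |
    awk '{ print NR, $0 >> "OFR_Processed_data.txt" }'

written=$(cat OFR_Processed_data.txt)
echo "$written"
```

An alternative with the same effect is to let awk print to standard output and redirect in the shell, as in awk '...' file >> OFR_Processed_data.txt.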
# 5  
Old 05-06-2008
Hi ,
Thanks for the help. I changed the code like this, and performance-wise it looks OK.

awk ' {
if($0 > " ")
{
MSISDN=substr($0,1,10);
HOUR=substr($0,11,16);
ID_SA_SOURCE=substr($0,17,16);
ID_SA_DEST=substr($0,35,16);
ID_VIR_PORTAL=substr($0,53,16);
NW_BEARER=substr($0,71,4 );
TERMINAL_TYPE=substr($0,75,1);
TRADE_MODEL=substr($0,76,30);
print $HOUR";"$MSIDSN";"$ID_SA_SOURCE";"$ID_SA_DEST";"$ID_VIR_PORTAL";"$NW_BEARER";"$TERMINAL_TYPE";"$TRADE_MODEL >> "OFR.txt"
}
}' logger_fe.log.199804261105


Now the print statement is not working as I want. I need each record written on a single line, with the fields separated by semicolons,
eg:
20080426000012;0678111627;;3001339;60180;GPRS;4;SonyEricsson_W600i
20080426000012;0637729992;;3025451;0;GPRS;1;MOTOROLA-W375
20080426000012;0670704419;;3001339;60180;GPRS;2;Motorola_V3i
20080426000013;0671228789;;3001339;60180;GPRS;4;SonyEricssonW300i


But the output is written in an incorrect format, like this:

;0677562376000013 3001339 60180UMTS4SonyEricsson_W850i;0677562376000013 3001339
60180UMTS4SonyEricsson_W850i;;;0677562376000013 3001339 60180UMTS4SonyEricsson_W850i;;0677562376000013
3001339 60180UMTS4SonyEricsson_W850i
;0608166896000009 3204235 3207492 GPRS4LG_KU380;;;0608166896000009 3204235 3207492
GPRS4LG_KU380;0608166896000009 3204235 3207492 GPRS4LG_KU380;GPRS4LG_KU380;0608166896000009 3204235
3207492 GPRS4LG_KU380
;0670473398000009 3001339 60180UMTS4SonyEricsson_k610i;0670473398000009 3001339
60180UMTS4SonyEricsson_k610i;;;0670473398000009 3001339 60180UMTS4SonyEricsson_k610i;;0670473398000009


The records run together on the same line and are not properly separated with semicolons. Can you please look into this again?

Thanks in advance
Subin
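
Editorial note, since the thread ends without an answer: three things in the awk code above look responsible. First, awk variables are written without a dollar sign; $HOUR means "the field whose number is the value of HOUR", and $MSIDSN (a misspelling of MSISDN, so an unset variable) means $0, the whole line, which is why whole records repeat in the output. Second, the third argument of substr is a length, not an end column, so columns 11-16 are substr($0,11,6) and the 18-column fields need a length of 18. Third, the date prefix from the original shell script ($LOGDATE) was dropped. Below is a sketch of a corrected version, assuming the column layout from post #1. The LOGDATE value, the trim helper and the record and result variables are assumptions added for illustration, and trimming the padding is a guess based on the expected output shown above.

```shell
# One sample record in the fixed-width layout from post #1; the values come
# from the first expected-output line in post #5.
record=$(printf '%-10s%-6s%-18s%-18s%-18s%-4s%-1s%-30s' \
    0678111627 000012 "" 3001339 60180 GPRS 4 SonyEricsson_W600i)

LOGDATE=20080426   # assumed value; the original script sets LOGDATE elsewhere

result=$(printf '%s\n' "$record" | awk -v logdate="$LOGDATE" '
BEGIN { OFS = ";" }   # print joins its comma-separated arguments with OFS

# Hypothetical helper: strip leading and trailing blanks from a field.
function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }

# Process only lines that contain something other than blanks.
/[^ \t]/ {
    # substr(s, start, length): the third argument is a length.
    MSISDN        = trim(substr($0,  1, 10))
    HOUR          = trim(substr($0, 11,  6))
    ID_SA_SOURCE  = trim(substr($0, 17, 18))
    ID_SA_DEST    = trim(substr($0, 35, 18))
    ID_VIR_PORTAL = trim(substr($0, 53, 18))
    NW_BEARER     = trim(substr($0, 71,  4))
    TERMINAL_TYPE = trim(substr($0, 75,  1))
    TRADE_MODEL   = trim(substr($0, 76, 30))
    # Plain variable names, no dollar sign; logdate HOUR is string concatenation.
    print logdate HOUR, MSISDN, ID_SA_SOURCE, ID_SA_DEST, ID_VIR_PORTAL, NW_BEARER, TERMINAL_TYPE, TRADE_MODEL
}')

echo "$result"
```

For a real log, the printf pipe would be replaced by the file name as the last argument to awk, with the shell appending the output, for example awk ... logger_fe.log.199804261105 >> OFR.txt.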