The UNIX and Linux Forums  

#1  fedora (Registered User), 11-13-2006
what is the most effective way to process a large logfile?

I am dealing with a very large firewall logfile (more than 10 GB).
The logfile looks like this:


*snip*
Nov 9 10:12:01 testfirewall root: [ID 702911 local5.info]

Nov 9 10:12:01 testfirewall root: [ID 702911 local5.info] 0:00:11 accept testfw01-hme0 >hme0 proto: icmp;
src: test001.example.net; dst: abc.dst.net; rule: 1; icmp-type: 8; icmp-code: 0; product: VPN-1 & Fire
*snip*

I don't need any line containing "icmp" or "snmp". Since many lines have no content (like the first line in the example, which has nothing after local5.info), I grep for "src". Then I keep only the lines whose 16th field neither starts with 192.12 or 192.34 nor contains "test". Finally, I print several fields, separated by tabs (\t) instead of spaces, and delete every ";" character.

My command is as follows:

egrep -vi "icmp|snmp" /logs/logfile | egrep -i "src" | awk '$16!~/(^192.(12|34)|.*test.*)/' | awk 'BEGIN {OFS="\t"} {print $1$2, $11,$10,$14,$16,$18,$20," ",$26}' | sed 's/;//g' > /tmp/logfile2

I don't think my approach is efficient. Can anyone here give me some suggestions on how to organize my command? Thank you!

Last edited by fedora; 11-13-2006 at 07:24 PM..

#2  aigles (Forum Advisor), 11-14-2006
You can do all the work within a single awk program:

Code:
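awk '
     BEGIN { 
        OFS = "\t"                  # separate output fields with a tab
     }
     { 
        l0 = tolower($0)            # lower-cased copy for case-insensitive tests
     }
     # Skip icmp/snmp lines, lines without "src", and unwanted sources in field 16
     l0   ~ /icmp|snmp/ || l0  !~ /src/ ||  $16  ~ /^192\.(12|34)|test/ {
        next
     }
     { 
        gsub(/;/, "")               # strip every ";" (awk then re-splits the fields)
        print $1 $2,$11,$10,$14,$16,$18,$20," ",$26   # $1 $2 as in the original command
     }
   ' /logs/logfile > /tmp/logfil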
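
Before pointing a filter like this at the full 10 GB file, it may be worth feeding it a few hand-made lines first. The following is only a sketch: the function names filter_log and sample_lines are hypothetical, the sample lines are invented to resemble the format shown in the question (they are not real log data), and the date is printed as $1 $2 as in the original command.

```shell
#!/bin/sh
# filter_log: hypothetical wrapper around the single-awk filter; reads stdin.
filter_log() {
    awk '
        BEGIN { OFS = "\t" }
        { l0 = tolower($0) }
        l0 ~ /icmp|snmp/ || l0 !~ /src/ || $16 ~ /^192\.(12|34)|test/ { next }
        { gsub(/;/, ""); print $1 $2, $11, $10, $14, $16, $18, $20, " ", $26 }
    '
}

# sample_lines: invented test input. Only the last line should survive:
#   line 1 has no "src", line 2 is icmp, line 3 has a 192.12 source.
sample_lines() {
cat <<'EOF'
Nov 9 10:12:01 testfirewall root: [ID 702911 local5.info]
Nov 9 10:12:01 testfirewall root: [ID 702911 local5.info] 0:00:11 accept testfw01-hme0 >hme0 proto: icmp; src: test001.example.net; dst: abc.dst.net; rule: 1; icmp-type: 8; icmp-code: 0; product: VPN-1
Nov 9 10:12:02 testfirewall root: [ID 702911 local5.info] 0:00:12 accept testfw01-hme0 >hme0 proto: tcp; src: 192.12.0.5; dst: abc.dst.net; rule: 2; service: 80; s_port: 1025; product: VPN-1
Nov 9 10:12:03 testfirewall root: [ID 702911 local5.info] 0:00:13 drop testfw01-hme0 >hme0 proto: tcp; src: 10.1.2.3; dst: abc.dst.net; rule: 3; service: 22; s_port: 4000; product: VPN-1
EOF
}

# Print the filtered sample so the selected fields can be checked by eye.
sample_lines | filter_log
```

If the surviving line shows the wrong columns, the field numbers can be adjusted here much faster than on the real file.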


Jean-Pierre.

#3  fedora (Registered User), 11-14-2006
Thank you, aigles!

I am just wondering whether this is the most efficient way to do the job.

I will compare the time difference.
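
One possible way to run that comparison is sketched below. It assumes a sample file cut from the real log (for example with head), uses grep -E in place of egrep, and measures wall-clock seconds with date +%s; the function names and the temporary files are hypothetical, and the single-awk version prints $1 $2 so that both versions select the same fields.

```shell
#!/bin/sh
# pipeline_version FILE: the five-stage pipeline from the first post.
pipeline_version() {
    grep -Evi "icmp|snmp" "$1" | grep -i "src" |
        awk '$16!~/(^192\.(12|34)|.*test.*)/' |
        awk 'BEGIN {OFS="\t"} {print $1$2, $11,$10,$14,$16,$18,$20," ",$26}' |
        sed 's/;//g'
}

# single_awk_version FILE: the same filter as one awk process.
single_awk_version() {
    awk '
        BEGIN { OFS = "\t" }
        { l0 = tolower($0) }
        l0 ~ /icmp|snmp/ || l0 !~ /src/ || $16 ~ /^192\.(12|34)|test/ { next }
        { gsub(/;/, ""); print $1 $2, $11, $10, $14, $16, $18, $20, " ", $26 }
    ' "$1"
}

# compare_versions FILE: time both versions and check that they agree.
compare_versions() {
    sample=$1
    out_dir=$(mktemp -d) || return 1

    start=$(date +%s)
    pipeline_version "$sample" > "$out_dir/pipeline.out"
    middle=$(date +%s)
    single_awk_version "$sample" > "$out_dir/single.out"
    end=$(date +%s)

    echo "pipeline:   $((middle - start)) s"
    echo "single awk: $((end - middle)) s"

    # Timings only mean something if both versions produce the same output.
    if cmp -s "$out_dir/pipeline.out" "$out_dir/single.out"; then
        echo "outputs identical"; status=0
    else
        echo "outputs differ"; status=1
    fi
    rm -rf "$out_dir"
    return "$status"
}

# Run only when a readable sample file is given, e.g.
#   head -n 1000000 /logs/logfile > /tmp/sample.log; sh compare.sh /tmp/sample.log
if [ -n "${1:-}" ] && [ -r "$1" ]; then
    compare_versions "$1"
fi
```

The second run may benefit from the file already being in the disk cache, so it is probably fairer to run the comparison more than once, or to swap the order, before drawing conclusions.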

#4  Perderabo (Forum Staff), 11-14-2006
You want a single process and this does that. A perl or ksh solution might beat this by a little bit, provided that they carefully use only built-in commands and never invoke anything external. Perl and ksh compile the script while awk does not. And a custom C program can beat anything else.

Your 5-stage pipeline will not be even close to a single process. Even if you have 5 CPUs available that can be dedicated to the pipeline, all of that reading and writing to pipes is expensive. (Anything is expensive when you do it many millions of times.) And you probably do not have 5 CPUs available for the entire run. Without 5 dedicated CPUs you will need to context switch several million times as well.

#5  rajeeb_d (Registered User), 11-14-2006
Just an idea

Why don't you split the file into smaller files of 1 GB each? Then use aigles's awk script to go through each of the split files. Awk works on its input as a stream, and there is little that is faster for processing data than working on data streams.
After you are done cleansing the files, you could append the results into a single file.

Splitting the file is just an idea. It might save time because you would be handling small sets of data, each flowing in one continuous stream, rather than one large 10 GB set.
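
A rough sketch of how that could look is below. It is only an illustration under some assumptions: the function names are hypothetical, the chunks are cut by line count with split -l (so that no log line is broken in two) rather than at exactly 1 GB, and the chunks are filtered one after another. Note that split itself has to read and rewrite the whole file once, so it would be worth timing this against a single pass.

```shell
#!/bin/sh
# filter_log FILE...: the single-awk filter from earlier in the thread,
# printing $1 $2 as the date as in the original command.
filter_log() {
    awk '
        BEGIN { OFS = "\t" }
        { l0 = tolower($0) }
        l0 ~ /icmp|snmp/ || l0 !~ /src/ || $16 ~ /^192\.(12|34)|test/ { next }
        { gsub(/;/, ""); print $1 $2, $11, $10, $14, $16, $18, $20, " ", $26 }
    ' "$@"
}

# filter_in_chunks INPUT OUTPUT [LINES_PER_CHUNK]
# Splits INPUT on line boundaries, filters each chunk, and appends the
# results to OUTPUT in the original order.
filter_in_chunks() {
    input=$1
    output=$2
    lines=${3:-1000000}            # assumed default chunk size, in lines
    work_dir=$(mktemp -d) || return 1

    split -l "$lines" "$input" "$work_dir/chunk." || return 1

    : > "$output"                  # start with an empty result file
    # The glob expands in sorted order (chunk.aa, chunk.ab, ...),
    # which matches the order in which split wrote the chunks.
    for chunk in "$work_dir"/chunk.*; do
        [ -f "$chunk" ] || continue    # empty input: the glob matched nothing
        filter_log "$chunk" >> "$output"
    done
    rm -rf "$work_dir"
}

# Run only when both file names are given, e.g.
#   sh chunks.sh /tmp/sample.log /tmp/sample.filtered
if [ "$#" -ge 2 ]; then
    filter_in_chunks "$@"
fi
```

The work directory needs enough free space for a second copy of the log while the chunks exist.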

#6  fedora (Registered User), 11-14-2006
I really appreciate your advice, guys. Have a great day!