What is the most effective way to process a large logfile?


 
# 1  
Old 11-13-2006
What is the most effective way to process a large logfile?

I am dealing with a very large firewall logfile (more than 10 GB).
The logfile looks like this:


*snip*
Nov 9 10:12:01 testfirewall root: [ID 702911 local5.info]

Nov 9 10:12:01 testfirewall root: [ID 702911 local5.info] 0:00:11 accept testfw01-hme0 >hme0 proto: icmp;
src: test001.example.net; dst: abc.dst.net; rule: 1; icmp-type: 8; icmp-code: 0; product: VPN-1 & Fire
*snip*

I don't need any line including "icmp" or "snmp", and since there are many lines with no real content (like the first line in the example, with nothing after local5.info), I first grep for "src". Then I keep only the lines whose 16th field neither starts with 192.12 or 192.34 nor contains "test". Finally I print several fields, using a tab (\t) instead of a space to separate them, and delete every ";" character from the output.

My command is as follows:

Code:
egrep -vi "icmp|snmp" /logs/logfile | egrep -i "src" | awk '$16!~/(^192.(12|34)|.*test.*)/' | awk 'BEGIN {OFS="\t"} {print $1$2, $11,$10,$14,$16,$18,$20," ",$26}' | sed 's/;//g' > /tmp/logfile2

I don't think my way is efficient, so can anyone here give me some suggestions on how to reorganize my command? Thank you!

Last edited by fedora; 11-13-2006 at 07:24 PM.
# 2  
Old 11-14-2006
You can do all the work within a single awk program:
Code:
awk '
     BEGIN {
        OFS = "\t"                 # separate output fields with tabs
     }
     {
        l0 = tolower($0)           # lower-cased copy for case-insensitive tests
     }
     # skip icmp/snmp lines, lines without "src", and unwanted sources
     l0 ~ /icmp|snmp/ || l0 !~ /src/ || $16 ~ /^192\.(12|34)|test/ {
        next
     }
     {
        gsub(/;/, "")              # delete every ";" from the record
        print $1 $2, $11, $10, $14, $16, $18, $20, " ", $26
     }
   ' /logs/logfile > /tmp/logfile2
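
If you want to keep the program around, you can save it in a file, for example filter.awk (any name will do), and run it like this:

Code:
awk -f filter.awk /logs/logfile > /tmp/logfile2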


Jean-Pierre.
# 3  
Old 11-14-2006
Thank you, aigles!

I am just wondering: is this the most efficient way to do the job?

I will compare the time difference.
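
Something like this is probably how I will compare them (in bash or ksh the time keyword covers a whole pipeline; filter.awk is just a file where I saved your awk program):

Code:
# time the original 5-stage pipeline (bash/ksh: time spans the whole pipe)
time egrep -vi "icmp|snmp" /logs/logfile | egrep -i "src" |
    awk '$16!~/(^192.(12|34)|.*test.*)/' |
    awk 'BEGIN {OFS="\t"} {print $1$2,$11,$10,$14,$16,$18,$20," ",$26}' |
    sed 's/;//g' > /tmp/logfile2.pipe

# time the single awk process
time awk -f filter.awk /logs/logfile > /tmp/logfile2.awk

Then I can cmp the two output files to make sure they agree before trusting the timings.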
# 4  
Old 11-14-2006
You want a single process, and this does that. A Perl or ksh solution might beat it by a little, provided it carefully uses only built-in commands and never invokes anything external. Perl and ksh compile the script while awk does not. And a custom C program can beat anything else.

Your 5-stage pipeline will not be even close to a single process. Even if you have 5 CPUs available that can be dedicated to the pipeline, all of that reading and writing to pipes is expensive. (Anything is expensive when you do it many millions of times.) And you probably do not have 5 CPUs available for the entire run. Without 5 dedicated CPUs you will need to context switch several million times as well.
# 5  
Old 11-14-2006
Just an idea

Why don't you split the file into small files of 1 GB each? Then use the awk script from post #2 to go through each of the split files; awk processes its input as a stream, so it handles each piece in one continuous pass without holding it all in memory.
After you are done with the cleansing of the pieces, you can append them back into a single file.

Splitting the file is just an idea, but it might help because you would be handling small sets of data in several short streams rather than one large 10 GB stream. A rough sketch is below.
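
Something like this (assuming GNU split, and that the awk program from post #2 is saved as filter.awk):

Code:
# split on line boundaries into pieces of at most 1 GB each
# (-C keeps whole lines together; plain -b could cut a log line in half)
split -C 1G /logs/logfile /tmp/logpart.

# run the filter over each piece, appending to one cleaned file
for f in /tmp/logpart.*
do
    awk -f filter.awk "$f" >> /tmp/logfile2
done

# remove the temporary pieces
rm -f /tmp/logpart.*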
# 6  
Old 11-14-2006
I really appreciate your advice, guys. Have a great day!
# 7  
Old 11-14-2006
Splitting the file won't help. Now you have added reading and rewriting 10 GB of data to your list of things to do.