What is the most effective way to process a large logfile?


 
# 1  
Old 11-13-2006
What is the most effective way to process a large logfile?

I am dealing with a very large firewall logfile (more than 10 GB).
The logfile looks like this:


*snip*
Nov 9 10:12:01 testfirewall root: [ID 702911 local5.info]

Nov 9 10:12:01 testfirewall root: [ID 702911 local5.info] 0:00:11 accept testfw01-hme0 >hme0 proto: icmp;
src: test001.example.net; dst: abc.dst.net; rule: 1; icmp-type: 8; icmp-code: 0; product: VPN-1 & Fire
*snip*

I don't need any line including "icmp" or "snmp", and since there are many lines with no real content (like the first line in the example, with nothing after local5.info), I first grep for "src". Then I keep only the lines whose 16th field neither starts with 192.12 or 192.34 nor contains "test". Finally I print several fields, using a tab (\t) instead of a space to separate them, and delete every ";" character from the output.

My command is as follows:

Code:
egrep -vi "icmp|snmp" /logs/logfile | egrep -i "src" | awk '$16!~/(^192.(12|34)|.*test.*)/' | awk 'BEGIN {OFS="\t"} {print $1$2, $11,$10,$14,$16,$18,$20," ",$26}' | sed 's/;//g' > /tmp/logfile2

I don't think my way is efficient, so can anyone here give me some suggestions on how to reorganize my command? Thank you!

Last edited by fedora; 11-13-2006 at 07:24 PM.
# 2  
Old 11-14-2006
You can do all the work within a single awk program:
Code:
awk '
     BEGIN {
        OFS = "\t"                 # separate output fields with tabs
     }
     {
        l0 = tolower($0)           # lower-cased copy for case-insensitive tests
     }
     # skip icmp/snmp lines, lines without "src", and unwanted sources
     l0 ~ /icmp|snmp/ || l0 !~ /src/ || $16 ~ /^192\.(12|34)|test/ {
        next
     }
     {
        gsub(/;/, "")              # delete every ";" from the record
        print $1 $2, $11, $10, $14, $16, $18, $20, " ", $26
     }
   ' /logs/logfile > /tmp/logfile2
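
If you want to keep the program around, you can save it in a file, for example filter.awk (any name will do), and run it like this:

Code:
awk -f filter.awk /logs/logfile > /tmp/logfile2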


Jean-Pierre.
# 3  
Old 11-14-2006
Thank you, aigles!

I am just wondering: is this the most efficient way to do the job?

I will compare the time difference.
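
Something like this is probably how I will compare them (in bash or ksh the time keyword covers a whole pipeline; filter.awk is just a file where I saved your awk program):

Code:
# time the original 5-stage pipeline (bash/ksh: time spans the whole pipe)
time egrep -vi "icmp|snmp" /logs/logfile | egrep -i "src" |
    awk '$16!~/(^192.(12|34)|.*test.*)/' |
    awk 'BEGIN {OFS="\t"} {print $1$2,$11,$10,$14,$16,$18,$20," ",$26}' |
    sed 's/;//g' > /tmp/logfile2.pipe

# time the single awk process
time awk -f filter.awk /logs/logfile > /tmp/logfile2.awk

Then I can cmp the two output files to make sure they agree before trusting the timings.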
# 4  
Old 11-14-2006
You want a single process, and this does that. A Perl or ksh solution might beat it by a little, provided it carefully uses only built-in commands and never invokes anything external. Perl and ksh compile the script while awk does not. And a custom C program can beat anything else.

Your 5-stage pipeline will not be even close to a single process. Even if you have 5 CPUs available that can be dedicated to the pipeline, all of that reading and writing to pipes is expensive. (Anything is expensive when you do it many millions of times.) And you probably do not have 5 CPUs available for the entire run. Without 5 dedicated CPUs you will need to context switch several million times as well.
# 5  
Old 11-14-2006
Just an idea

Why don't you split the file into small files of 1 GB each? Then use the awk script from post #2 to go through each of the split files; awk processes its input as a stream, so it handles each piece in one continuous pass without holding it all in memory.
After you are done with the cleansing of the pieces, you can append them back into a single file.

Splitting the file is just an idea, but it might help because you would be handling small sets of data in several short streams rather than one large 10 GB stream. A rough sketch is below.
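
Something like this (assuming GNU split, and that the awk program from post #2 is saved as filter.awk):

Code:
# split on line boundaries into pieces of at most 1 GB each
# (-C keeps whole lines together; plain -b could cut a log line in half)
split -C 1G /logs/logfile /tmp/logpart.

# run the filter over each piece, appending to one cleaned file
for f in /tmp/logpart.*
do
    awk -f filter.awk "$f" >> /tmp/logfile2
done

# remove the temporary pieces
rm -f /tmp/logpart.*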
# 6  
Old 11-14-2006
I really appreciate your advice, guys. Have a great day!
# 7  
Old 11-14-2006
Splitting the file won't help. Now you have added reading and rewriting 10 GB of data to your list of things to do.