Getting info from a huge log file


 
# 8  
Old 09-10-2011
Thank you. I will try the solution once I'm on the production server.

I'm planning to turn this into a shell script that takes input arguments, so that it's easy to get the data I need, like this:

"parser.sh serverIP timestamp"

I'm also planning to write an expect script for the automation. I will look for an example script; maybe you can point me to a link where I can find one or learn how to write it.

Thanks.


Quote:
Originally Posted by yazu
Oh, yes. But because your file is really huge it's better to embed this check in awk:
Code:
awk -v t="$time" -F': ' '
$0 ~ "^" t && $2 == "BIOS Name" {
   name = $0; getline; date = $0; getline; quality = $0;
}
name && $3 == "BIOS INFO" {
  printf "%s\n%s\n%s\n", name, date, quality
  exit
}

===

You shouldn't use grep here: it will search the whole file, whereas you need to quit as soon as you have your lines (just imagine that your information is in the first 100 kilo). But I'm afraid the above awk solution would be slow because of the string regex. But you can embed the time variable (you need a variable for easy further automation) in the awk regex literal:
Code:
awk  -F': ' '
$0 ~ /^'"$time"'/ && $2 == "BIOS Name" {
   name = $0; getline; date = $0; getline; quality = $0;
}
name && $3 == "BIOS INFO" {
  printf "%s\n%s\n%s\n", name, date, quality
  exit
}

# 9  
Old 09-10-2011
OK. But I somehow lost the closing quote. Here is the correct (I hope) version:
Code:
time=06:10:28 # for example
awk  -F': ' '
$0 ~ /^'"$time"'/ && $2 == "BIOS Name" {
   name = $0; getline; date = $0; getline; quality = $0;
}
name && $3 == "BIOS INFO" {
  printf "%s\n%s\n%s\n", name, date, quality
  exit
}' LOGFILE

# 10  
Old 09-10-2011
Quote:
Originally Posted by yazu
Code:
$0 ~ "^" t

But I'm afraid the above awk solution would be slow because of the string regex. But you can embed the time variable (you need a variable for easy further automation) in the awk regex literal:
Code:
$0 ~ /^'"$time"'/

Upon reading that, I was skeptical. I expected that there would be an improvement, but I didn't think it would be a large difference. Wow, was I mistaken.

Test file generation:
Code:
jot -b 'foo bar baz
bar baz foo' 5000000 > data

The goal is to match lines that begin with "foo". The command above yields a 10-million-line file whose lines alternately match and don't match.

Approximate results (in seconds) using nawk (aka bwk awk aka one true awk):
4.738 -- awk '{$0 ~ /^foo/}'
5.541 -- awk -v t=foo '{index($0,t)==1}'
7.680 -- awk -v t=foo '{$0 ~ "^" t}'
8.740 -- awk -v t=foo '{substr($0,1,length(t))==t}'

The regular expression literal (4.738 s) takes about 38% less time than the dynamic regular expression (7.680 s).

I measured similar results (a 41% improvement) with an ancient version of mawk on a 12-year-old laptop (which still has a sticker proudly announcing "Designed for Microsoft Windows 95").
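
For anyone without jot (it may not be installed on Linux systems), the comparison can be reproduced roughly as follows. This is only a sketch: gen_data, count_literal and count_dynamic are hypothetical helper names, and unlike the one-liners above the awk programs count the matching lines, so that both variants do the same amount of work. Absolute timings will differ between machines and awk implementations.

```shell
# gen_data N (hypothetical helper): write N copies of the two-line pattern
# (2*N lines in total) to stdout, as a stand-in for the jot command.
gen_data() {
    awk -v n="$1" 'BEGIN { for (i = 0; i < n; i++) print "foo bar baz\nbar baz foo" }'
}

# count_literal FILE: count lines starting with "foo" using a regex literal.
count_literal() {
    awk '/^foo/ { c++ } END { print c + 0 }' "$1"
}

# count_dynamic FILE: same count, but the regex is built from a string
# at run time (the "dynamic regex" case).
count_dynamic() {
    awk -v t=foo '$0 ~ "^" t { c++ } END { print c + 0 }' "$1"
}

# Example run, commented out because it writes a large file
# (5000000 reproduces the 10-million-line test file; the "time" keyword
# of bash, ksh or zsh is assumed, since it can time shell functions):
# gen_data 5000000 > data
# time count_literal data
# time count_dynamic data
```

Both functions should print the same count; only the elapsed time is expected to differ.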

Thank you, yazu, for the enlightenment.

Regards,
Alister

Last edited by alister; 09-10-2011 at 02:44 PM..
# 11  
Old 09-11-2011
I managed to get a sample of the log file, which I grep'd by timestamp.

I found out that the timestamp format is different, so the awk code doesn't work. Could anyone help rebuild the code based on the sample log below?

I need to find the word BIOS-INFOXXX based on the timestamp and get the 3 detail lines below.

|06:10:22.211| mymachine | | kernel:| RBL: RBL Code 10
|06:10:22.211| mymachine | | kernel:| DRPD: DRPD 789123
|06:10:22.211| mymachine | | kernel:| RTR: RTR Incomplete


Code:
|06:10:22.211| mymachine | | syslogd| 1.4.1: restart.
|06:10:22.211| mymachine | | syslog:| syslogd startup succeeded
|06:10:22.211| mymachine | | kernel:| klogd 1.4.1, log source = /proc/kmsg started.
|06:10:22.211| mymachine | | kernel:| Linux version 2.4.22-1.2115.nptlsmp (gcc version 3.2.3 20030422 (Red Hat Linux 3.2.3-6)) #1 SMP Wed Oct 29 15:30:09 EST 2003
|06:10:22.211| mymachine | | kernel:| BIOS-provided physical RAM map:
|06:10:22.211| mymachine | | kernel:| BIOS-e820: 0000000000000000 - 00000000000a0000 (usable)
|06:10:22.211| mymachine | | kernel:| BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
|06:10:22.211| mymachine | | kernel:| BIOS-e820: 0000000000100000 - 000000003ff70000 (usable)
|06:10:22.211| mymachine | | kernel:| BIOS-e820: 000000003ff70000 - 000000003ff72000 (ACPI NVS)
|06:10:22.211| mymachine | | kernel:| BIOS-e820: 000000003ff72000 - 000000003ff93000 (ACPI data)
|06:10:22.211| mymachine | | kernel:| BIOS-e820: 000000003ff93000 - 0000000040000000 (reserved)
|06:10:22.211| mymachine | | kernel:| BIOS-e820: 00000000fec00000 - 00000000fec10000 (reserved)
|06:10:22.211| mymachine | | kernel:|  RBL: RBL Code 10
|06:10:22.211| mymachine | | kernel:|  DRPD: DRPD 789123
|06:10:22.211| mymachine | | kernel:|  RTR: RTR Incomplete
|06:10:22.211| mymachine | | kernel:| BIOS-e820: 000000003ff93000 - 0000000040000000 (reserved)
|06:10:22.211| mymachine | | kernel:| BIOS-e820: 00000000fecf0000 - 00000000fecf1000 (reserved)
|06:10:22.211| mymachine | | kernel:| BIOS-e820: 00000000fed20000 - 00000000fed90000 (reserved)
|06:10:22.211| mymachine | | kernel:| BIOS-e820: 00000000fee00000 - 00000000fee10000 (reserved)
|06:10:22.211| mymachine | | kernel:| BIOS-e820: 00000000ffb00000 - 0000000100000000 (reserved)
|06:10:23.221| mymachine | | kernel:| 127MB HIGHMEM available.
|06:10:23.221| mymachine | | kernel:| 896MB LOWMEM available.
|06:10:23.221| mymachine | | kernel:| found SMP MP-table at 000fe710
|06:10:23.221| mymachine | | kernel:| hm, page 000fe000 reserved twice.
|06:10:23.221| mymachine | | kernel:| Intel machine check reporting enabled on CPU#0.
|06:10:23.221| mymachine | | kernel:| CPU0: Intel(R) Pentium(R) 4 CPU 3.20GHz stepping 09
|06:10:23.221| mymachine | | random:| Initializing random number generator: succeeded
|06:10:23.221| mymachine | | kernel:| per-CPU timeslice cutoff: 1462.76 usecs.
|06:10:23.221| mymachine | | kernel:| task migration cache decay timeout: 10 msecs.
|06:10:23.221| mymachine | | kernel:| enabled ExtINT on CPU#0
|06:10:23.221| mymachine | | kernel:| ESR value before enabling vector: 00000040
|06:10:23.221| mymachine | | kernel:| ESR value after enabling vector: 00000000
|06:10:23.221| mymachine | | kernel:| Booting processor 1/1 eip 3000
|06:10:23.221| mymachine | | kernel:| Initializing CPU#1
|06:10:23.221| mymachine | | kernel:| masked ExtINT on CPU#1
|06:10:28.201| mymachine | | kernel:| ESR value CPU1: found SMP MP table
|06:10:28.201| mymachine | | kernel:| ESR value before enabling vector: 00000000
|06:10:28.201| mymachine | | kernel:| ESR value after enabling vector: 00000000
|06:10:28.201| mymachine | | kernel:| Calibrating delay loop... 6383.20 BogoMIPS
|06:10:28.201| mymachine | | kernel:| CPU: Trace cache: 12K uops, L1 D cache: 8K
|06:10:28.201| mymachine | | kernel:| CPU: L2 cache: 512K
|06:10:28.201| mymachine | | kernel:| CPU: Physical Processor ID: 0
|06:10:28.201| mymachine | | kernel:| CMainConnection::LockAccess:: RunThread(): - This is it:: BIOS-INFOXXX : Alert, get info on the upper part.
|06:10:28.201| mymachine | | kernel:| Intel machine check reporting enabled on CPU#1.
|06:10:28.201| mymachine | | kernel:| CPU1: Intel(R) Pentium(R) 4 CPU 3.20GHz stepping 09
|06:10:28.201| mymachine | | kernel:| Total of 2 processors activated (12753.30 BogoMIPS).
|06:10:28.201| mymachine | | rc: 111|Starting pcmcia: succeeded
|06:10:28.201| mymachine | | kernel:| ENABLING IO-APIC IRQs
|06:10:28.201| mymachine | | kernel:| Setting 2 in the phys_id_present_map
|06:10:28.201| mymachine | | kernel:| ...changing IO-APIC physical APIC ID to 2 ... ok.
|06:10:28.201| mymachine | | netfs: | Mounting other filesystems: succeeded
|06:10:28.201| mymachine | | kernel:| ..TIMER: vector=0x31 pin1=2 pin2=0
|06:10:28.201| mymachine | | kernel:| testing the IO APIC.......................
|06:10:28.201| mymachine | | autofs:| automount startup succeeded

Thanks
# 12  
Old 09-12-2011
"Grepping" for the time in awk is just an optimization. If it doesn't work, remove it. And if you have a different field separator or different "main" and "trigger" regexes, just change them:
Code:
fs=':\\| +'; main='RBL: '; trig='BIOS-INFO'
awk  -F"$fs" -v main="$main" -v trig="$trig" '
$2 ~ main {
   first = $0; getline; second = $0; getline; third = $0;
}
first && $2 ~ trig {       
  printf "%s\n%s\n%s\n", first, second, third
  exit
}' INPUTFILE

===

Remember the optimization: embedding shell variables in the awk program turns dynamic regexes into static ones. But this is said to be unsafe, and if you are going to write a script that takes these parameters from the command line, it's very unsafe. It's possible to embed any shell command in such a parameter and have it executed (though you can validate the parameters first).
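
One way to do that validation, as a minimal sketch: accept only strings that look like a timestamp before splicing them into the awk program. The helper names is_valid_time and first_line_at are made up for illustration, and the leading | in the regex assumes the log layout from the sample posted earlier in the thread. Passing the value with -v instead sidesteps the code injection problem altogether, at the cost of the slower dynamic regex.

```shell
# is_valid_time STRING (hypothetical helper): succeed only if STRING is
# HH:MM:SS, optionally followed by .mmm as in the sample log.
is_valid_time() {
    case $1 in
        [0-9][0-9]:[0-9][0-9]:[0-9][0-9]) return 0 ;;
        [0-9][0-9]:[0-9][0-9]:[0-9][0-9].[0-9][0-9][0-9]) return 0 ;;
        *) return 1 ;;
    esac
}

# first_line_at TIME FILE (hypothetical helper): print the first line whose
# timestamp starts with TIME, then stop reading the file.
first_line_at() {
    if ! is_valid_time "$1"; then
        echo "invalid time: $1" >&2
        return 2
    fi
    # Safe to splice in: $1 now contains only digits, colons and at most one
    # dot (the dot is a regex wildcard here, which is harmless for this match).
    awk '/^\|'"$1"'/ { print; exit }' "$2"
}
```

Usage would then be something like `first_line_at 06:10:28 LOGFILE`; anything that does not look like a timestamp is rejected before awk ever sees it.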

Last edited by yazu; 09-12-2011 at 12:36 AM..
# 13  
Old 09-12-2011
Hello everyone.

I was trying to get another set of data, but I am having a problem with the field separator for the log sample below.

main
Code:
|08:52:01.304|0x00001450|2 |        |RAM_24   |      RBL: RBL Code 10   
|08:52:01.304|0x00001450|2 |        |RAM_24   |      DRPD: DRPD 789123
|08:52:01.304|0x00001450|2 |        |RAM_24   |      RTR: RTR Incomplete

trigger
Code:
|08:52:01.335|0x00001450|-1|        |BIOPDB    |.\ESRvalueafter(618) : Caught ERROR(MFC) exception - BIOS-INFO20530: Alert on the bios get info on top

If the log sample has a separator like the one in the samples below, this works:

Code:
fs=':\\| +'; main='RBL: '; trig='BIOS-INFO'
awk -F"$fs" -v main="$main" -v trig="$trig" '$2 ~ main { first = $0; getline; second = $0; getline; third = $0; } first && $2 ~ trig { printf "%s\n%s\n%s\n", first, second, third }' INPUTFILE

main
Code:
|08:52:01.304|0x00001450|2 |        |RAM_24   :|      RBL: RBL Code 10   
|08:52:01.304|0x00001450|2 |        |RAM_24   :|      DRPD: DRPD 789123
|08:52:01.304|0x00001450|2 |        |RAM_24   :|      RTR: RTR Incomplete

trigger
Code:
|08:52:01.335|0x00001450|-1|        |BIOPDB    :|.\ESRvalueafter(618) : Caught ERROR(MFC) exception - BIOS-INFO20530: Alert on the bios get info on top

But with the sample log above, the following doesn't produce any output:

Code:
fs='|'; main='RBL: '; trig='BIOS-INFO'
awk -F"$fs" -v main="$main" -v trig="$trig" '$6 ~ main { first = $0; getline; second = $0; getline; third = $0; } first && $6 ~ trig { printf "%s\n%s\n%s\n", first, second, third }' INPUTFILE

Can anyone explain why? I can't understand why it works with FS :\\| and with the field variable $2. Could anyone dissect it, please? Thanks.
# 14  
Old 09-12-2011
Well, for this kind of problem it is very hard to give you an exact solution. Something changes and you need a different solution. The structure of the data you gave was defined slightly wrong, so a solution can be wrong, or it can take hours instead of minutes. If you are in this business (text/file processing), you really should take some time to learn and understand at least the basics.

OK, I was wrong there: you need $7, not $6 (there is an additional empty first field).

As for the -F option: when you set it to a single character, that character itself is the field separator. When you set it to a longer string, it is treated as a regex. '|' divides your lines into 7 fields, and ':\\|' divides them into two.
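
The field counts can be checked directly. Below is a small sketch that uses one line from each of the two sample layouts in the previous post; count_fields and show_fields are hypothetical helper names, not part of awk.

```shell
# One line from each sample layout (without and with the colon before the pipe).
line1='|08:52:01.304|0x00001450|2 |        |RAM_24   |      RBL: RBL Code 10'
line2='|08:52:01.304|0x00001450|2 |        |RAM_24   :|      RBL: RBL Code 10'

# count_fields FS LINE (hypothetical helper): print how many fields awk
# sees in LINE when -F is set to FS.
count_fields() {
    printf '%s\n' "$2" | awk -F"$1" '{ print NF }'
}

# show_fields FS LINE (hypothetical helper): print every field with its
# number, wrapped in brackets so that empty fields and spaces are visible.
show_fields() {
    printf '%s\n' "$2" | awk -F"$1" '{ for (i = 1; i <= NF; i++) printf "$%d=[%s]\n", i, $i }'
}

count_fields '|' "$line1"       # single character, taken literally: 7
show_fields  '|' "$line1"       # $1 is empty, the message is in $7
count_fields ':\\|' "$line2"    # regex "colon then literal pipe": 2
```

Because the line starts with the separator, $1 is the empty string before the first pipe, which is why the message ends up in $7 rather than $6.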