Parse apache log file with three different time formats


 
# 1  
Old 08-29-2019

Hi,

I want to parse the file below and write a function to extract the log lines between two given timestamps.

Apache (Unix) Log Samples - MonitorWare

The challenge here is that there are three date and time formats.

First :- 07/Mar/2004:16:05:49
Second :- Sun Mar 7 16:02:00 2004
Third :- 29-Mar 15:18:20.54

I have sed commands that can do this, but they force the user to specify the format. I want this to be general. How can I achieve this? I would like to parse the log file and create a new file in which the time format is uniform; after that, using sed or grep is pretty simple.

Code:
sed -n '/07\/Mar\/2004:16:05:49/,/07\/Mar\/2004:16:31:48/p' log

sed -n '/Sun Mar 7 16:02:00 2004/,/Mon Mar 8 00:11:22 2004/p' log
sed -n '/29-Mar 15:18:20.50/,/29-Mar 15:18:20.54/p' log

Please let me know a good way to achieve this. Any pointers will also help.
# 2  
Old 08-29-2019
There are two basic approaches: one for Linux, another for non-Linux. So which one do you have? Knowing your shell would be helpful, too.
# 3  
Old 08-29-2019
Also, utilities like GoAccess have nice analytics and reporting tools that handle several types of timestamp formats.

Last edited by rdrtx1; 02-18-2020 at 08:30 PM..
# 4  
Old 08-29-2019
Try this to prefix the date/time to every log line:
Code:
awk -vDM="$(LC_ALL=C locale abday abmon)" '
BEGIN           {gsub (/;/, "|", DM)
                 split (DM, T)
                 MStr1 = "(" T[1] ") (" T[2] ") *[0-9]* [0-9:]* [0-9]*"
                 MStr2 = "[0-9]*/(" T[2] ")/[0-9:]* -[0-9]*"
                 MStr3 = "[0-9]*-(" T[2] ") [0-9:.]*"
                 MStr  = "(" MStr1 ")|(" MStr2 ")|(" MStr3 ")"
                }
match ($0, MStr)        {print substr ($0, RSTART, RLENGTH), $0
                        }
 ' /tmp/*log


EDIT: or, somewhat simplified,



Code:
awk -vDM="$(LC_ALL=C locale abday abmon)" '
BEGIN           {gsub (/;/, "|", DM)
                 split (DM, T)
                 MStr1 = "(" T[1] ") (" T[2] ") *[0-9]* [0-9:]* [0-9]*"
                 MStr2 = "[0-9]*[-/](" T[2] ")(/[0-9:]* -| )*[0-9:.]*"
                 MStr  = "(" MStr1 ")|(" MStr2 ")"
                }
match ($0, MStr)        {print substr ($0, RSTART, RLENGTH), $0
                        }
' /tmp/*log
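To see what the prefixing does without depending on `locale`, here is a minimal sketch of the same idea. The function name `prefix_timestamp` is made up for illustration, the English day and month abbreviations are hardcoded (an assumption), and unlike the code above it does not capture the zone offset and separates the prefix with a tab rather than a space.

```shell
#!/bin/sh
# Sketch only: reads log lines on stdin and writes "<timestamp><TAB><line>".
# prefix_timestamp is a hypothetical helper name, not part of any tool.
prefix_timestamp() {
    awk '
    BEGIN {
        # Assumption: English abbreviations, hardcoded instead of read from locale.
        days   = "Sun|Mon|Tue|Wed|Thu|Fri|Sat"
        months = "Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec"
        f1 = "(" days ") (" months ") +[0-9]+ [0-9:]+ [0-9]+"   # Sun Mar 7 16:02:00 2004
        f2 = "[0-9]+/(" months ")/[0-9:]+"                      # 07/Mar/2004:16:05:49
        f3 = "[0-9]+-(" months ") [0-9:.]+"                     # 29-Mar 15:18:20.54
        re = "(" f1 ")|(" f2 ")|(" f3 ")"
    }
    # Print the matched timestamp, a tab, then the untouched line.
    match($0, re) { print substr($0, RSTART, RLENGTH) "\t" $0 }
    '
}
```

Call it as `prefix_timestamp < my.log`. Lines in which none of the three formats is found are dropped, as in the code above.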

These 2 Users Gave Thanks to RudiC For This Post:
# 5  
Old 08-29-2019
Classic approach: convert the dates to epoch seconds and simply compare them (classic: unexciting, not extraordinarily short, simple logic).

Code:
#!/bin/sh

awk -vstart="$1" -vend="$2" ' 

BEGIN {
        start_epoch = mktime(start)
        end_epoch   = mktime(end)
}

# Jan -> 1, Feb -> 2, ... Dec -> 12
function monthnumber(monthname) {
        return (index("JanFebMarAprMayJunJulAugSepOctNovDec", monthname) + 2) / 3
}

# Each pattern is anchored with ^, so the timestamp must be the first thing on the line.
match($0,/^([0-9]+)\/([a-zA-Z]+)\/([0-9]{4}):([0-9]{2}):([0-9]{2}):([0-9]{2})/,r) { 
        current=mktime( sprintf("%s %s %s %s %s %s", r[3],monthnumber(r[2]),r[1],r[4],r[5],r[6])); }

match($0,/^[a-zA-Z]+ ([a-zA-Z]+) ([0-9]+) ([0-9]+):([0-9]+):([0-9]+) ([0-9]{4})/,r) { 
        current=mktime( sprintf("%s %s %s %s %s %s", r[6],monthnumber(r[1]),r[2],r[3],r[4],r[5])); }

match($0,/^([0-9]+)-([a-zA-Z]+) ([0-9]+):([0-9]+):([0-9]+)/,r) { 
        current=mktime( sprintf("%s %s %s %s %s %s", strftime("%Y"),monthnumber(r[2]),r[1],r[3],r[4],r[5])); } 

(current < start_epoch) { next }
(current > end_epoch  ) { exit }

1
' "$3"

Run it like this:

Code:
# call is: ./logsearch "YYYY mm dd HH MM SS" "YYYY mm dd HH MM SS" logfile

./logsearch "2010 10 24 16 34 00" "2020 10 25 23 59 00" my.log

Notes
  • This needs GNU awk
  • I assume the missing year in format #3 is the current year, which may not be the case. If the search stays within a single year, this does not matter.
  • I do not take care of fractions of a second in format #3, so you get a bit more out of the log than you specify
  • Not locale aware (look at RudiC's post for a possible method)
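For systems without GNU awk, here is a rough sketch of a portable variant. Instead of calling mktime, it turns every timestamp into a fixed-width key of the form YYYYMMDDHHMMSS and compares the keys as strings. The name `logsearch_portable` and the optional fourth argument (the year to assume for format #3) are made up for this sketch. It ignores time zones, so it assumes all timestamps are in the same zone, and it finds the timestamp anywhere in the line rather than only at the start.

```shell
#!/bin/sh
# logsearch_portable START END LOGFILE [YEAR]   (hypothetical helper)
# START and END use the same "YYYY mm dd HH MM SS" form as the script above.
# YEAR supplies the year for format #3 and defaults to the current year.
logsearch_portable() {
    awk -v start="$1" -v end="$2" -v year="${4:-$(date +%Y)}" '
    # Month abbreviation to month number (Jan=1 ... Dec=12).
    function mon(name) {
        return (index("JanFebMarAprMayJunJulAugSepOctNovDec", name) + 2) / 3
    }
    # Fixed-width key: string order equals chronological order.
    function key(y, m, d, H, M, S) {
        return sprintf("%04d%02d%02d%02d%02d%02d", y, m, d, H, M, S)
    }
    BEGIN {
        split(start, s, " "); lo = key(s[1], s[2], s[3], s[4], s[5], s[6])
        split(end,   e, " "); hi = key(e[1], e[2], e[3], e[4], e[5], e[6])
    }
    {
        if (match($0, "[0-9]+/[A-Z][a-z][a-z]/[0-9]+:[0-9]+:[0-9]+:[0-9]+")) {
            # 07/Mar/2004:16:05:49
            split(substr($0, RSTART, RLENGTH), p, "[/:]")
            cur = key(p[3], mon(p[2]), p[1], p[4], p[5], p[6])
        } else if (match($0, "[A-Z][a-z][a-z] [A-Z][a-z][a-z] +[0-9]+ [0-9]+:[0-9]+:[0-9]+ [0-9]+")) {
            # Sun Mar 7 16:02:00 2004
            split(substr($0, RSTART, RLENGTH), p, "[ :]+")
            cur = key(p[7], mon(p[2]), p[3], p[4], p[5], p[6])
        } else if (match($0, "[0-9]+-[A-Z][a-z][a-z] [0-9]+:[0-9]+:[0-9]+")) {
            # 29-Mar 15:18:20.54 (fraction ignored, year taken from YEAR)
            split(substr($0, RSTART, RLENGTH), p, "[- :]")
            cur = key(year, mon(p[2]), p[1], p[3], p[4], p[5])
        }
        # Lines without a timestamp keep the key of the previous line.
        if (cur != "" && cur >= lo && cur <= hi) print
    }
    ' "$3"
}
```

For example, `logsearch_portable "2004 03 07 16 00 00" "2004 03 08 12 00 00" my.log 2004`. This version scans the whole file instead of exiting at the first line past the end time, which costs time on big logs but tolerates out-of-order entries.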

Last edited by stomp; 08-30-2019 at 04:13 PM..
These 2 Users Gave Thanks to stomp For This Post:
# 6  
Old 08-30-2019
It's always best in my view to convert date and time strings to unixtime and do all calculations in unixtime and then convert the results back to a time string based on locale (local time information, timezone information, etc.).

It's kinda "nutty" in my view to try to manipulate / process time using formatted strings which are only a string representation of a "time" in the local time format.

That is why we store "time" in databases as unix timestamps. We do not, generally speaking, store "time" as a formatted time string.
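A minimal sketch of that round trip in the shell, assuming GNU date (the -d option is not POSIX). The helper names `to_epoch` and `from_epoch` are made up for illustration, and -u pins everything to UTC so the result does not depend on the machine's time zone.

```shell
#!/bin/sh
# Sketch assuming GNU date; to_epoch and from_epoch are hypothetical helpers.

# "YYYY-MM-DD HH:MM:SS" (read as UTC) to seconds since the epoch
to_epoch() { date -u -d "$1" +%s; }

# seconds since the epoch back to a formatted UTC string
from_epoch() { date -u -d "@$1" '+%Y-%m-%d %H:%M:%S'; }
```

With these, the time between two log entries is plain integer arithmetic: `echo $(( $(to_epoch '2004-03-07 16:31:48') - $(to_epoch '2004-03-07 16:05:49') ))` gives 1559 seconds, and only the final result needs formatting via `from_epoch`.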
# 7  
Old 08-30-2019
If the logs are very big, it may be a good trick to read them backwards: the interesting part is more likely to be at the end of the file, so we can avoid reading tons of old lines that way:

Code:
#!/bin/sh
logfile="$3"

# reverse at the beginning to read from end to start
tac "$logfile" | awk -vstart="$1" -vend="$2" ' 

BEGIN {
        start_epoch = mktime(start)
        end_epoch   = mktime(end)
}

# Jan -> 1, Feb -> 2, ... Dec -> 12
function monthnumber(monthname) {
        return (index("JanFebMarAprMayJunJulAugSepOctNovDec", monthname) + 2) / 3
}

match($0,/^([0-9]+)\/([a-zA-Z]+)\/([0-9]{4}):([0-9]{2}):([0-9]{2}):([0-9]{2})/,r) { 
        current=mktime( sprintf("%s %s %s %s %s %s", r[3],monthnumber(r[2]),r[1],r[4],r[5],r[6])); }

match($0,/^[a-zA-Z]+ ([a-zA-Z]+) ([0-9]+) ([0-9]+):([0-9]+):([0-9]+) ([0-9]{4})/,r) { 
        current=mktime( sprintf("%s %s %s %s %s %s", r[6],monthnumber(r[1]),r[2],r[3],r[4],r[5])); }

match($0,/^([0-9]+)-([a-zA-Z]+) ([0-9]+):([0-9]+):([0-9]+)/,r) { 
        current=mktime( sprintf("%s %s %s %s %s %s", strftime("%Y"),monthnumber(r[2]),r[1],r[3],r[4],r[5])); } 

# skip trailing lines that are read before any timestamp has been seen
(current == "") { next }

# we have to swap the actions here!
(current < start_epoch) { exit }
(current > end_epoch  ) { next }

1
' | tac 
# and reverse again at the end to return to chronological order

Script call stays the same.

Last edited by stomp; 08-30-2019 at 07:57 PM..
