Extract lines if string found from last 30 min only


# 8  
Old 02-12-2019
Corrected the format: days are now padded to two digits, hours are converted from 12-hour to 24-hour format, and the fractional seconds and everything after them are dropped.
Code:
awk -F "<|>| |, |," -v d="$(LANG=C date -d -30minutes -u +" %b %d %Y %T")" '
 /^</           { line = $0
                if ( length($3) < 2 ) $3 = "0" $3
                split($5, a, ":" s)
                if ($7 == "PM" && a[1] != 12) $5 = (a[1]+=12) ":" a[2] ":" a[3]
                NF = 5
                }
(d < $0)        { print line }
' file.log

formats of compared values
Code:
$0 = Feb 02 2019 14:26:54
 d = Feb 12 2019 18:47:48


Last edited by nezabudka; 02-12-2019 at 02:40 PM..
# 9  
Old 02-12-2019
Any time you're trying to compare dates as strings you're doomed to failure if your strings contain a year that is not in the high-order position, a month that is an abbreviated English month name instead of a month number, and/or days of the month that are sometimes one digit and sometimes two digits. You need to compare date strings that are in the same format and contain the same number of characters (unless you're going to convert everything to seconds since the Epoch and perform a numeric comparison). The optimum string comparison format until the year 10000 is YYYYmmddHHMMSS. You could try adding milliseconds to the end of that if you want to, but I don't think GNU date will give you anything other than 0 for milliseconds if you ask it for a date and time that is 1800 seconds ago. (And if you ask it for a date and time that is 30 minutes ago, it will probably also give you 0 for the seconds part of your timestamp.)

Note that I'm guessing on that, I don't have access to a GNU date utility. I do have access to a ksh version 93u+ which has a printf statement of the form:
Code:
printf "%(GNU_date_format_string)T\n" '1800 seconds ago'

that will give me date and time strings from 30 minutes ago (where GNU_date_format_string is a GNU date format string without the leading <plus-sign> character).
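To make the string-comparison problem concrete, here is a small sketch. The helper name is_earlier and the sample timestamps are made up for illustration; it only assumes a POSIX awk. It compares two timestamps as strings, first in the "%b %d %Y %T" layout and then as YYYYmmddHHMMSS:

```shell
#!/bin/sh
# is_earlier A B: print "yes" if A sorts before B as a string, else "no".
# Appending "" forces awk to compare as strings, never as numbers.
is_earlier() {
    awk -v a="$1" -v b="$2" 'BEGIN { print ((a "" < b "") ? "yes" : "no") }'
}

# April 1st is later than February 12th, but "A" sorts before "F":
is_earlier "Apr 01 2019 10:00:00" "Feb 12 2019 10:00:00"   # prints "yes" (wrong)

# The same two instants with the year first and a numeric month:
is_earlier 20190401100000 20190212100000                   # prints "no" (right)
```

Some month pairs (Feb and Mar, for example) happen to sort correctly by name, which is why a script built on month abbreviations can appear to work for weeks before it fails.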

The following script seems to do what you want. I ran it using the Korn shell on macOS Mojave version 10.14.3. It first creates a test log file with timestamps from 1900 seconds ago up to 1700 seconds ago in 15-second intervals, so you can verify that it is converting dates correctly and starts printing records from the log file that are no more than 30 minutes old. If you comment out the printf statements that are printing dates and uncomment the date commands that are currently commented out, this code should work with either bash or ksh on a Linux system with a GNU date utility installed.

If you invoke this script with an argument (any argument), the awk script will print out debugging information showing how the split() function split up the lines in the date format you want to process until it finds a timestamp that meets your criteria.

Code:
#!/bin/ksh
# Create sample logfile for this test creating entries with two different date
# and time formats.
for ((i=1900; i>1700; i-=15))
do	TZ=UCT0 LC_ALL=C printf \
	    "%(<%b %e, %Y, %I:%M:%S,%3N %p %Z> <$i seconds ago>)T\n" \
	    "$i seconds ago"
#	LC_ALL=C date -u --date "$i seconds ago" \
#	    "+<%b %e, %Y, %I:%M:%S,%3N %p %Z> <$i seconds ago>" 
	    
	TZ=UCT0 LC_ALL=C printf \
	    "%(%Y-%m-%dT%H:%M:%S.%3N+0000: $i seconds ago)T\n" \
	    "$i seconds ago"
#	LC_ALL=C date -u --date "$i seconds ago" \
#	    "+%Y-%m-%dT%H:%M:%S.%3N+0000: $i seconds ago" 
done > logfile 
printf 'Using logfile containing:\n'
cat logfile

printf '\nstarting awk at about '
TZ=UCT0 LC_ALL=C printf '%(%Y-%m-%d %H:%M:%S,%3N)T\n'
#LC_ALL=C date -u '+%Y-%m-%d %H:%M:%S,%3N'

start_date=$(TZ=UCT0 LC_ALL=C printf '%(%Y%m%d%H%M%S)T' '1800 seconds ago') 
#start_date=$(LC_ALL=C date -u '+%Y%m%d%H%M%S' --date '1800 seconds ago')
printf 'start_date=%s\n' "$start_date";date -u

awk -v start="$start_date" -v Log=$# '
BEGIN {	split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", m2b)
	for(i = 1; i <= 12; i++)
		b2m[m2b[i]] = sprintf("%02d", i)
}
{	if($1 ~ /</) {
		if(print_it) {
			print
			next
		}
	} else	next
	split($0, fields, /[<> ,:]+/)
	if(Log) for(i=1; i<=12; i++) printf("fields[%d]=%s\n",i,fields[i])
	if(fields[5] == 12)
		fields[5] = "00"
	if(fields[9] == "PM")
		fields[5] += 12
	linedate = fields[4] b2m[fields[2]] sprintf("%02d", fields[3]) \
	    fields[5] fields[6] fields[7]
	if(Log)printf("linedate:%s from %s\n", linedate, substr($0,1,45))
	if(linedate >= start) {
		print_it = 1
		print
	}
}' logfile

Running this script a few minutes ago produced the following output:
Code:
Using logfile containing:
<Feb 12, 2019, 11:48:31,000 PM GMT> <1900 seconds ago>
2019-02-12T23:48:31.000+0000: 1900 seconds ago
<Feb 12, 2019, 11:48:46,000 PM GMT> <1885 seconds ago>
2019-02-12T23:48:46.000+0000: 1885 seconds ago
<Feb 12, 2019, 11:49:01,000 PM GMT> <1870 seconds ago>
2019-02-12T23:49:01.000+0000: 1870 seconds ago
<Feb 12, 2019, 11:49:16,000 PM GMT> <1855 seconds ago>
2019-02-12T23:49:16.000+0000: 1855 seconds ago
<Feb 12, 2019, 11:49:31,000 PM GMT> <1840 seconds ago>
2019-02-12T23:49:31.000+0000: 1840 seconds ago
<Feb 12, 2019, 11:49:46,000 PM GMT> <1825 seconds ago>
2019-02-12T23:49:46.000+0000: 1825 seconds ago
<Feb 12, 2019, 11:50:01,000 PM GMT> <1810 seconds ago>
2019-02-12T23:50:01.000+0000: 1810 seconds ago
<Feb 12, 2019, 11:50:16,000 PM GMT> <1795 seconds ago>
2019-02-12T23:50:16.000+0000: 1795 seconds ago
<Feb 12, 2019, 11:50:31,000 PM GMT> <1780 seconds ago>
2019-02-12T23:50:31.000+0000: 1780 seconds ago
<Feb 12, 2019, 11:50:46,000 PM GMT> <1765 seconds ago>
2019-02-12T23:50:46.000+0000: 1765 seconds ago
<Feb 12, 2019, 11:51:01,000 PM GMT> <1750 seconds ago>
2019-02-12T23:51:01.000+0000: 1750 seconds ago
<Feb 12, 2019, 11:51:16,000 PM GMT> <1735 seconds ago>
2019-02-12T23:51:16.000+0000: 1735 seconds ago
<Feb 12, 2019, 11:51:31,000 PM GMT> <1720 seconds ago>
2019-02-12T23:51:31.000+0000: 1720 seconds ago
<Feb 12, 2019, 11:51:46,000 PM GMT> <1705 seconds ago>
2019-02-12T23:51:46.000+0000: 1705 seconds ago

starting awk at about 2019-02-13 00:20:11,614
start_date=20190212235011
Wed Feb 13 00:20:11 UTC 2019
<Feb 12, 2019, 11:50:16,000 PM GMT> <1795 seconds ago>
<Feb 12, 2019, 11:50:31,000 PM GMT> <1780 seconds ago>
<Feb 12, 2019, 11:50:46,000 PM GMT> <1765 seconds ago>
<Feb 12, 2019, 11:51:01,000 PM GMT> <1750 seconds ago>
<Feb 12, 2019, 11:51:16,000 PM GMT> <1735 seconds ago>
<Feb 12, 2019, 11:51:31,000 PM GMT> <1720 seconds ago>
<Feb 12, 2019, 11:51:46,000 PM GMT> <1705 seconds ago>

Maybe this will give you something you can build on.
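For anyone who wants to try the conversion step on its own, here is a cut-down sketch of the same split-and-rebuild logic applied to a single line. The function name line_key is hypothetical and not part of the script above; the field numbers are the ones that script uses.

```shell
#!/bin/sh
# line_key: read log lines on stdin and print a YYYYmmddHHMMSS key for each
# line that starts with "<"; other lines are ignored.
line_key() {
    awk '
    BEGIN { split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", m2b)
            for (i = 1; i <= 12; i++) b2m[m2b[i]] = sprintf("%02d", i)
    }
    /^</ {  # The leading "<" makes f[1] empty, so the month lands in f[2].
            split($0, f, /[<> ,:]+/)
            if (f[5] == 12) f[5] = "00"      # 12 AM -> 00, 12 PM -> 00 + 12
            if (f[9] == "PM") f[5] += 12
            print f[4] b2m[f[2]] sprintf("%02d", f[3]) f[5] f[6] f[7]
    }'
}

echo '<Feb 12, 2019, 11:50:16,000 PM GMT> <1795 seconds ago>' | line_key
# prints 20190212235016
```

That key is greater than the start_date=20190212235011 shown in the output above, which is why the 1795-seconds-ago record was the first one printed.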

Last edited by Don Cragun; 4 Weeks Ago at 05:12 AM.. Reason: Add LC_ALL=C where missing in some of the GNU date utility invocations.
# 10  
Old 02-12-2019
Based on Don Cragun's comments, I'll just fix my script. Thanks.
Code:
awk -F "<|>| |, |," -v d="$(LANG=C date -d -30minutes -u +"%Y%m%d%T")" '
BEGIN   { split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", m2b)
        for(i = 1; i <= 12; i++)
        b2m[m2b[i]] = sprintf("%02d", i)
}
/^</    { line=$0
        if ( length($3) < 2 ) $3 = "0" $3
        split($5, a, ":" s)
        if ($5 == 24) $5 = "00"
        if ($7 == "PM") $5 = (a[1]+=12) ":" a[2] ":" a[3]
        $0 = $4 b2m[$2] $3 $5
        if ( d < $0 ) print line
}
' file

formats of compared values
2019021212:26:55
# 11  
Old 02-13-2019
As an alternative to the string constant holding 12 English month names, you could try
Code:
awk -v"abmon=$(locale abmon)" 'BEGIN {for (n=split(abmon, MTH, ";"); n;n--) NumMTH[MTH[n]]=n} ... '
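Filled out into something runnable, that idea might look like the sketch below. The function name month_number and the hard-coded fallback list (used only when no locale utility is available) are my own additions for illustration. LC_ALL=C pins the names to English so that they match the month names in the log.

```shell
#!/bin/sh
# Ask the C locale for its abbreviated month names ("Jan;Feb;...;Dec").
# Fall back to a literal list if the locale utility is missing.
abmon=$(LC_ALL=C locale abmon 2>/dev/null || true)
[ -n "$abmon" ] || abmon="Jan;Feb;Mar;Apr;May;Jun;Jul;Aug;Sep;Oct;Nov;Dec"

# month_number NAME: print the two-digit month number for NAME.
month_number() {
    awk -v abmon="$abmon" -v name="$1" 'BEGIN {
        for (n = split(abmon, MTH, ";"); n; n--)
            NumMTH[MTH[n]] = sprintf("%02d", n)
        print NumMTH[name]
    }'
}

month_number Feb   # prints 02
```

Padding the number with sprintf keeps every key the same width, which matters as soon as the result is used in a string comparison.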

# 12  
Old 4 Weeks Ago
Quote:
Originally Posted by nezabudka
Based on Don Cragun's comments, I'll just fix my script. Thanks.
Code:
awk -F "<|>| |, |," -v d="$(LANG=C date -d -30minutes -u +"%Y%m%d%T")" '
BEGIN   { split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", m2b)
        for(i = 1; i <= 12; i++)
        b2m[m2b[i]] = sprintf("%02d", i)
}
/^</    { line=$0
        if ( length($3) < 2 ) $3 = "0" $3
        split($5, a, ":" s)
        if ($5 == 24) $5 = "00"
        if ($7 == "PM") $5 = (a[1]+=12) ":" a[2] ":" a[3]
        $0 = $4 b2m[$2] $3 $5
        if ( d < $0 ) print line
}
' file

formats of compared
2019021212:26:55
Hi nezabudka,
Note that although the above code will work in many cases, there are a few issues that will cause it to fail intermittently:
First, the command in the command substitution:
Code:
LANG=C date -d -30minutes -u +"%Y%m%d%T"

You may have noticed that when I used a similar construct in the code I suggested in post #9 (correctly in all the ksh93 printf calls and sometimes correctly in the GNU date invocations [all have now been fixed]), I used LC_ALL=C instead of LANG=C. These environment variables (along with the other LC_* variables for the various locale categories) have a hierarchy that determines which variable controls the operation when more than one of them is found in the environment. For example, suppose I run the command RudiC mentioned in post #11 to get a locale's abbreviated month names with the three variables that control those strings all set to different values: LC_ALL=ru_RU, specifying a Russian locale for all locale categories no matter what other locale variables are set; LC_TIME=it_IT, specifying an Italian locale for the time-related strings defined by the standards; and LANG=C, specifying the locale to be used if none of the other locale environment variables are set. We see that if LC_ALL is defined on the command line (or in your environment), it overrides all of the other locale variables:
Code:
LC_ALL=ru_RU LC_TIME=it_IT LANG=C locale abmon
янв;фев;мар;апр;май;июн;июл;авг;сен;окт;ноя;дек

which gives us the abbreviated month names in Russian. If we drop the setting for LC_ALL (and don't have LC_ALL set in the environment), we can run the command:
Code:
LC_TIME=it_IT LANG=C locale abmon
Gen;Feb;Mar;Apr;Mag;Giu;Lug;Ago;Set;Ott;Nov;Dic

which gives us Italian abbreviated month names. So, if you want to guarantee that the date utility will use English names for things like "minutes" and "seconds" when using date -d time_base or date --date time_base, you need to use LC_ALL=C or LC_ALL=POSIX instead of LANG=C or LANG=POSIX. Note that I don't have a GNU date utility installed on my system and I don't know which locale category it uses to match the time period strings in -d option-arguments. I would guess that they are controlled by LC_TIME, but they could also be controlled by LC_MESSAGES. Either way, setting LC_ALL will override it and give you what you want.
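One way to sidestep the question of which locale category governs words like "minutes" is to keep words out of the date argument altogether. The sketch below is a variation of my own, not something taken from the posts above: it subtracts the age from the current epoch seconds in the shell and only asks date to format the result. It assumes a date utility that supports +%s and accepts -d @seconds (GNU date does); the function name cutoff_stamp is hypothetical.

```shell
#!/bin/sh
# cutoff_stamp SECONDS: print the UTC time SECONDS ago as YYYYmmddHHMMSS.
cutoff_stamp() {
    now=$(date +%s) || return 1
    when=$((now - $1))
    # "@N" means N seconds since the Epoch, so no month or unit names
    # have to be parsed; LC_ALL=C is kept as a belt-and-braces measure.
    LC_ALL=C date -u -d "@$when" '+%Y%m%d%H%M%S'
}

d=$(cutoff_stamp 1800)    # 30 minutes ago
echo "cutoff=$d"
```

The result can be passed to awk with -v d="$d" and compared against keys built in the same 14-digit layout.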

Second, in the awk statement:
Code:
split($5, a, ":" s)

you only get the results you want because the variable s is not defined in your script. To reduce confusion and protect against a user invoking your awk script with a defined s variable, change the last argument in that function call to just ":" instead of ":" s.
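A short sketch of what can go wrong, with a made-up helper name (split_time) and a sample time string chosen for the demonstration. The only thing that changes between the two calls is whether s has a value:

```shell
#!/bin/sh
# split_time S: split "11:48:31" on ":" S, then print the piece count and a[1].
split_time() {
    awk -v s="$1" 'BEGIN { n = split("11:48:31", a, ":" s); print n, a[1] }'
}

split_time ""    # s empty, separator is ":"   -> prints "3 11"
split_time "x"   # s set,   separator is ":x"  -> prints "1 11:48:31"
```

With s set, the separator no longer occurs in the string, so a[1] holds the whole time and any arithmetic on the hour silently goes wrong.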

Third, the expression in the awk if statement:
Code:
if ($5 == 24) $5 = "00"

can't ever yield a true result. In this script, $5 on the lines you're processing will always be of the form hh:mm:ss, where hh is the hour in 12-hour clock format (01-12), mm is the minute (00-59), and ss is the seconds (00-60). Because a comma is one of your field separators, the subseconds that follow the comma (apparently 1 to 3 decimal digits representing tenths, hundredths, or thousandths of a second) end up in $6. There is no way that a string representing a clock time in the above form will ever be the string 24, nor even start with that string. Presumably you want to determine if the hour portion of the time field is 12 and, if it is, reset it to 00 (which will be the correct 24-hour clock hour field if the AM/PM indicator is AM and will later be incremented back to 12 if the AM/PM indicator is PM). I would guess that you would get what you had intended if you change the two lines in your code:
Code:
        if ($5 == 24) $5 = "00"
        if ($7 == "PM") $5 = (a[1]+=12) ":" a[2] ":" a[3]

to:
Code:
        if (a[1] == 12) $5 = (a[1] = "00") ":" a[2] ":" a[3]
        if ($7 == "PM") $5 = (a[1]+=12) ":" a[2] ":" a[3]

or to:
Code:
        if (a[1] == 12 && $7 == "AM") $5 = "00:" a[2] ":" a[3]
        if (a[1] < 12 && $7 == "PM") $5 = (a[1] + 12) ":" a[2] ":" a[3]
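A quick way to convince yourself that a 12-hour to 24-hour conversion is right is to run the four boundary cases through it. The sketch below uses a third formulation (take the hour modulo 12, then add 12 for PM) rather than either pair of lines above; the function name to24 is made up for the example.

```shell
#!/bin/sh
# to24 HH:MM:SS AM|PM: print the time on a 24-hour clock.
to24() {
    awk -v t="$1" -v ampm="$2" 'BEGIN {
        split(t, a, ":")
        h = a[1] % 12                 # 12 becomes 0; 01 through 11 are unchanged
        if (ampm == "PM") h += 12     # afternoon hours move to 12 through 23
        printf "%02d:%s:%s\n", h, a[2], a[3]
    }'
}

to24 12:00:05 AM   # prints 00:00:05
to24 11:59:59 AM   # prints 11:59:59
to24 12:00:05 PM   # prints 12:00:05
to24 01:15:00 PM   # prints 13:15:00
```

The two cases that usually expose a bug are 12 AM and 12 PM, so they are worth checking first.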

If you run into issues similar to these in the future, I hope these comments will help you understand some of the pitfalls you have to watch out for when writing code to deal with various date and time formats.

Cheers,
Don
# 13  
Old 4 Weeks Ago
Thank you very much for the comments. I will take all of the above into account in the future.
As for the last remark: that was carelessness on my part, and a bug. The expressions were in the wrong order.
Apparently I meant to write something like this.
Code:
        if (a[1] == 12) a[1] = "00"     # reset 12 first: 12 AM -> 00
        if ($7 == "PM") a[1] += 12      # then 12 PM -> 00 + 12 = 12, 01 PM -> 13
        $5 = a[1] ":" a[2] ":" a[3]

Thanks to @RudiC. There are no options in the man page on this issue:
Code:
locale abday
locale abmon

Thank you for teaching, it was very informative.
# 14  
Old 4 Weeks Ago
Quote:
Originally Posted by nezabudka
...
@RudiC. There are no options in the man page on this issue:
Code:
locale abday
locale abmon

...
Aren't there?

Quote:
man 5 locale
.
.
.
LC_TIME
The definition starts with the string LC_TIME in the first column.
The following keywords are allowed:
abday followed by a list of abbreviated names of the days of the week. The list starts with the first day of the week as specified by week (Sunday by default).
day followed by a list of names of the days of the week. The list starts with the first day of the week as specified by week (Sunday by default). See NOTES.
abmon followed by a list of abbreviated month names.
.
.
.