Extract lines if string found from last 30 min only


 
# 8  
Old 02-12-2019
Corrected the format: days are padded to two digits, hours are converted from 12-hour to 24-hour format, and the fractional seconds and everything after them are dropped.
Code:
awk -F "<|>| |, |," -v d="$(LANG=C date -d -30minutes -u +" %b %d %Y %T")" '
 /^</           { line = $0
                if ( length($3) < 2 ) $3 = "0" $3
                split($5, a, ":" s)
                if ($7 == "PM" && a[1] != 12) $5 = (a[1]+=12) ":" a[2] ":" a[3]
                NF = 5
                }
(d < $0)        { print line }
' file.log

formats of compared values
Code:
$0 = Feb 02 2019 14:26:54
 d = Feb 12 2019 18:47:48


Last edited by nezabudka; 02-12-2019 at 03:40 PM..
# 9  
Old 02-12-2019
Any time you're trying to compare dates as strings you're doomed to failure if your strings contain a year that is not in the high-order position, a month that is an abbreviated English month name instead of a month number, and/or days of the month that are sometimes one digit and sometimes two digits. You need to be comparing date strings that are in the same format and contain the same number of characters (unless you're going to convert everything to seconds since the Epoch and perform a numeric comparison). The optimum string comparison format until the year 10000 is: YYYYmmddHHMMSS. You could try adding milliseconds to the end of that if you want to, but I don't think GNU date will give you anything other than 0 for milliseconds if you ask it to give you a date and time that is 1800 seconds ago. (And, if you tell it to give you a date and time that is 30 minutes ago, it will probably also give you 0 for the seconds part of your timestamp.)
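
A minimal sketch of the string-comparison problem, assuming any POSIX awk. The helper name later_than is made up for illustration; it compares its two arguments strictly as strings, the way the awk scripts in this thread do.

```shell
# later_than A B: print "yes" if string A sorts after string B, else "no".
# (Hypothetical helper; the "" concatenation forces a string comparison.)
later_than() {
    LC_ALL=C awk -v a="$1" -v b="$2" \
        'BEGIN { print ((a "" > b "") ? "yes" : "no") }'
}

# April 1 is later than February 12, but "A" sorts before "F":
later_than "Apr 01 2019 00:00:00" "Feb 12 2019 18:47:48"   # prints "no" (wrong)

# The same two instants written as YYYYmmddHHMMSS compare correctly:
later_than "20190401000000" "20190212184748"               # prints "yes"
```

With the year in the high-order position and fixed-width numeric fields, string order and chronological order agree.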

Note that I'm guessing on that, I don't have access to a GNU date utility. I do have access to a ksh version 93u+ which has a printf statement of the form:
Code:
printf "%(GNU_date_format_string)T\n" '1800 seconds ago'

that will give me date and time strings from 30 minutes ago (where GNU_date_format_string is a GNU date format string without the leading <plus-sign> character).

The following script seems to do what you want. It uses the Korn shell on macOS Mojave version 10.14.3 to create a test log file with timestamps from 1900 seconds ago to 1705 seconds ago in 15-second intervals, to verify that the awk script starts printing records from the log file at the first one that is no more than 30 minutes old. If you comment out the printf statements that are printing dates and uncomment the date commands that are currently commented out, this code should work with either bash or ksh on a Linux system with a GNU date utility installed.

If you invoke this script with an argument (any argument), the awk script will print out debugging information showing how the split() function split up the lines in the date format you want to process until it finds a timestamp that meets your criteria.

Code:
#!/bin/ksh
# Create sample logfile for this test creating entries with two different date
# and time formats.
for ((i=1900; i>1700; i-=15))
do	TZ=UCT0 LC_ALL=C printf \
	    "%(<%b %e, %Y, %I:%M:%S,%3N %p %Z> <$i seconds ago>)T\n" \
	    "$i seconds ago"
#	LC_ALL=C date -u --date "$i seconds ago" \
#	    "+<%b %e, %Y, %I:%M:%S,%3N %p %Z> <$i seconds ago>" 
	    
	TZ=UCT0 LC_ALL=C printf \
	    "%(%Y-%m-%dT%H:%M:%S.%3N+0000: $i seconds ago)T\n" \
	    "$i seconds ago"
#	LC_ALL=C date -u --date "$i seconds ago" \
#	    "+%Y-%m-%dT%H:%M:%S.%3N+0000: $i seconds ago" 
done > logfile 
printf 'Using logfile containing:\n'
cat logfile

printf '\nstarting awk at about '
TZ=UCT0 LC_ALL=C printf '%(%Y-%m-%d %H:%M:%S,%3N)T\n'
#LC_ALL=C date -u '+%Y-%m-%d %H:%M:%S,%3N'

start_date=$(TZ=UCT0 LC_ALL=C printf '%(%Y%m%d%H%M%S)T' '1800 seconds ago') 
#start_date=$(LC_ALL=C date -u '+%Y%m%d%H%M%S' --date '1800 seconds ago')
printf 'start_date=%s\n' "$start_date";date -u

awk -v start="$start_date" -v Log=$# '
BEGIN {	split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", m2b)
	for(i = 1; i <= 12; i++)
		b2m[m2b[i]] = sprintf("%02d", i)
}
{	if($1 ~ /</) {
		if(print_it) {
			print
			next
		}
	} else	next
	split($0, fields, /[<> ,:]+/)
	if(Log) for(i=1; i<=12; i++) printf("fields[%d]=%s\n",i,fields[i])
	if(fields[5] == 12)
		fields[5] = "00"
	if(fields[9] == "PM")
		fields[5] += 12
	linedate = fields[4] b2m[fields[2]] sprintf("%02d", fields[3]) \
	    fields[5] fields[6] fields[7]
	if(Log)printf("linedate:%s from %s\n", linedate, substr($0,1,45))
	if(linedate >= start) {
		print_it = 1
		print
	}
}' logfile

Running this script a few minutes ago produced the following output:
Code:
Using logfile containing:
<Feb 12, 2019, 11:48:31,000 PM GMT> <1900 seconds ago>
2019-02-12T23:48:31.000+0000: 1900 seconds ago
<Feb 12, 2019, 11:48:46,000 PM GMT> <1885 seconds ago>
2019-02-12T23:48:46.000+0000: 1885 seconds ago
<Feb 12, 2019, 11:49:01,000 PM GMT> <1870 seconds ago>
2019-02-12T23:49:01.000+0000: 1870 seconds ago
<Feb 12, 2019, 11:49:16,000 PM GMT> <1855 seconds ago>
2019-02-12T23:49:16.000+0000: 1855 seconds ago
<Feb 12, 2019, 11:49:31,000 PM GMT> <1840 seconds ago>
2019-02-12T23:49:31.000+0000: 1840 seconds ago
<Feb 12, 2019, 11:49:46,000 PM GMT> <1825 seconds ago>
2019-02-12T23:49:46.000+0000: 1825 seconds ago
<Feb 12, 2019, 11:50:01,000 PM GMT> <1810 seconds ago>
2019-02-12T23:50:01.000+0000: 1810 seconds ago
<Feb 12, 2019, 11:50:16,000 PM GMT> <1795 seconds ago>
2019-02-12T23:50:16.000+0000: 1795 seconds ago
<Feb 12, 2019, 11:50:31,000 PM GMT> <1780 seconds ago>
2019-02-12T23:50:31.000+0000: 1780 seconds ago
<Feb 12, 2019, 11:50:46,000 PM GMT> <1765 seconds ago>
2019-02-12T23:50:46.000+0000: 1765 seconds ago
<Feb 12, 2019, 11:51:01,000 PM GMT> <1750 seconds ago>
2019-02-12T23:51:01.000+0000: 1750 seconds ago
<Feb 12, 2019, 11:51:16,000 PM GMT> <1735 seconds ago>
2019-02-12T23:51:16.000+0000: 1735 seconds ago
<Feb 12, 2019, 11:51:31,000 PM GMT> <1720 seconds ago>
2019-02-12T23:51:31.000+0000: 1720 seconds ago
<Feb 12, 2019, 11:51:46,000 PM GMT> <1705 seconds ago>
2019-02-12T23:51:46.000+0000: 1705 seconds ago

starting awk at about 2019-02-13 00:20:11,614
start_date=20190212235011
Wed Feb 13 00:20:11 UTC 2019
<Feb 12, 2019, 11:50:16,000 PM GMT> <1795 seconds ago>
<Feb 12, 2019, 11:50:31,000 PM GMT> <1780 seconds ago>
<Feb 12, 2019, 11:50:46,000 PM GMT> <1765 seconds ago>
<Feb 12, 2019, 11:51:01,000 PM GMT> <1750 seconds ago>
<Feb 12, 2019, 11:51:16,000 PM GMT> <1735 seconds ago>
<Feb 12, 2019, 11:51:31,000 PM GMT> <1720 seconds ago>
<Feb 12, 2019, 11:51:46,000 PM GMT> <1705 seconds ago>

Maybe this will give you something you can build on.
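
The timestamp conversion at the heart of the awk script above can also be tried on a single line. This is only a sketch under the same assumptions about the log format; to_key is a hypothetical helper name, and the field numbers are the ones the split() call in the script produces.

```shell
# to_key LINE: turn "<Mon D, YYYY, hh:mm:ss,sss AM|PM TZ> ..." into
# YYYYmmddHHMMSS, using the same field splitting as the script above.
to_key() {
    printf '%s\n' "$1" | LC_ALL=C awk '
    BEGIN { split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", m2b)
            for (i = 1; i <= 12; i++) b2m[m2b[i]] = sprintf("%02d", i) }
    {       split($0, f, /[<> ,:]+/)       # f[1] is empty (text before "<")
            hour = f[5]
            if (hour == 12) hour = 0       # 12 AM is hour 00
            if (f[9] == "PM") hour += 12   # 12 PM goes back to 12
            printf("%04d%02d%02d%02d%02d%02d\n",
                   f[4], b2m[f[2]], f[3], hour, f[6], f[7]) }'
}

to_key '<Feb 12, 2019, 11:50:16,000 PM GMT> <1795 seconds ago>'  # 20190212235016
to_key '<Feb 2, 2019, 12:05:09,000 AM GMT>'                       # 20190202000509
```

The resulting strings can be compared directly against a start value such as 20190212235011.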

Last edited by Don Cragun; 02-15-2019 at 06:12 AM.. Reason: Add LC_ALL=C where missing in some of the GNU date utility invocations.
# 10  
Old 02-13-2019
Based on Don Cragun's comments, I'll just fix my script. Thanks.
Code:
awk -F "<|>| |, |," -v d="$(LANG=C date -d -30minutes -u +"%Y%m%d%T")" '
BEGIN   { split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", m2b)
        for(i = 1; i <= 12; i++)
        b2m[m2b[i]] = sprintf("%02d", i)
}
/^</    { line=$0
        if ( length($3) < 2 ) $3 = "0" $3
        split($5, a, ":" s)
        if ($5 == 24) $5 = "00"
        if ($7 == "PM") $5 = (a[1]+=12) ":" a[2] ":" a[3]
        $0 = $4 b2m[$2] $3 $5
        if ( d < $0 ) print line
}
' file

Format of the compared values:
2019021212:26:55
# 11  
Old 02-13-2019
As an alternative to the string constant holding 12 English month names, you could try
Code:
awk -v"abmon=$(locale abmon)" 'BEGIN {for (n=split(abmon, MTH, ";"); n;n--) NumMTH[MTH[n]]=n} ... '
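
A slightly fuller sketch of the same idea, assuming the semicolon-separated output format of locale abmon. The helper name month_number is made up for illustration.

```shell
# month_number ABMON NAME: print the two-digit number of month NAME,
# where ABMON is a semicolon-separated list of abbreviated month names.
month_number() {
    awk -v abmon="$1" -v name="$2" 'BEGIN {
        # split() returns the number of names; walk them from last to first
        for (n = split(abmon, MTH, ";"); n; n--)
            NumMTH[MTH[n]] = sprintf("%02d", n)
        print NumMTH[name]
    }'
}

month_number "Jan;Feb;Mar;Apr;May;Jun;Jul;Aug;Sep;Oct;Nov;Dec" Feb   # 02
month_number "Gen;Feb;Mar;Apr;Mag;Giu;Lug;Ago;Set;Ott;Nov;Dic" Ago   # 08
```

In a real script the first argument would normally be "$(locale abmon)", run under whatever locale the log file's month names were written in.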

# 12  
Old 02-15-2019
Quote:
Originally Posted by nezabudka
Based on Don Cragun's comments, I'll just fix my script. Thanks.
Code:
awk -F "<|>| |, |," -v d="$(LANG=C date -d -30minutes -u +"%Y%m%d%T")" '
BEGIN   { split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", m2b)
        for(i = 1; i <= 12; i++)
        b2m[m2b[i]] = sprintf("%02d", i)
}
/^</    { line=$0
        if ( length($3) < 2 ) $3 = "0" $3
        split($5, a, ":" s)
        if ($5 == 24) $5 = "00"
        if ($7 == "PM") $5 = (a[1]+=12) ":" a[2] ":" a[3]
        $0 = $4 b2m[$2] $3 $5
        if ( d < $0 ) print line
}
' file

Format of the compared values:
2019021212:26:55
Hi nezabudka,
Note that although the above code will work in many cases, there are a few issues that will cause it to fail intermittently:
First, the command in the command substitution:
Code:
LANG=C date -d -30minutes -u +"%Y%m%d%T"

You may have noticed that when I used a similar construct in the code I suggested in post #9 (correctly in all the ksh93 printf calls and sometimes correctly in the GNU date invocations [all have now been fixed]), I used LC_ALL=C instead of LANG=C. These environment variables (along with the other LC_* variables for the various locale categories) have a hierarchy that determines which variable controls the operation when more than one of them is found in the environment. For example, suppose I run the command RudiC mentioned in post #11 to get a locale's abbreviated month names with the three variables that control those strings all set to different values: LC_ALL=ru_RU specifying a Russian locale for all locale categories no matter what other locale variables are set, LC_TIME=it_IT specifying an Italian locale for the time-related strings defined by the standards, and LANG=C specifying the locale to be used if none of the other locale environment variables are set. We see that if LC_ALL is defined on the command line (or in your environment), it overrides all of the other locale variables:
Code:
LC_ALL=ru_RU LC_TIME=it_IT LANG=C locale abmon
янв;фев;мар;апр;май;июн;июл;авг;сен;окт;ноя;дек

which gives us the abbreviated month names in Russian. If we drop the setting for LC_ALL (and don't have LC_ALL set in the environment), we have the command:
Code:
LC_TIME=it_IT LANG=C locale abmon
Gen;Feb;Mar;Apr;Mag;Giu;Lug;Ago;Set;Ott;Nov;Dic

which gives us Italian abbreviated month names. So, if you want to guarantee that the date utility will use English names for things like "minutes" and "seconds" when using date -d time_base or date --date time_base, you need to use LC_ALL=C or LC_ALL=POSIX instead of LANG=C or LANG=POSIX. Note that I don't have a GNU date utility installed on my system and I don't know which locale category it uses to match the time period strings in -d option-arguments. I would guess that they are controlled by LC_TIME, but they could also be controlled by LC_MESSAGES. Either way, setting LC_ALL will override it and give you what you want.
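
A small sketch of that precedence applied to date output, using only the portable +format operand. The function name english_month_abbrev is hypothetical, and the it_IT and ru_RU locale names are just the examples from above; they need not be installed for the result to hold.

```shell
# LC_ALL outranks LC_TIME and LANG, so %b comes out in the C locale
# (English abbreviation) no matter what the other two are set to.
english_month_abbrev() {
    LC_ALL=C LC_TIME=it_IT LANG=ru_RU date -u +%b
}

english_month_abbrev    # one of Jan, Feb, ..., Dec
```

The same LC_ALL=C prefix is what the date and printf invocations in post #9 use.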

Second, in the awk statement:
Code:
split($5, a, ":" s)

you only get the results you want because the variable s is not defined in your script. To reduce confusion and protect against a user invoking your awk script with a defined s variable, change the last argument in that function call to just ":" instead of ":" s.
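
To see the hazard, here is a minimal sketch; hour_of is a hypothetical helper that repeats the split() call and lets the caller set s the way a -v s=... option would.

```shell
# hour_of TIME [S]: split TIME on ":" S and print the field count and a[1].
hour_of() {
    awk -v t="$1" -v s="${2-}" \
        'BEGIN { n = split(t, a, ":" s); print n, a[1] }'
}

hour_of "11:50:16"        # s is empty, separator is ":"   -> prints "3 11"
hour_of "11:50:16" "x"    # s is "x",   separator is ":x"  -> prints "1 11:50:16"
```

With a non-empty s the separator never matches, so a[1] holds the whole time string and a[2] and a[3] are empty.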

Third, the expression in the awk if statement:
Code:
if ($5 == 24) $5 = "00"

can't ever yield a true result. In this script, $5 on the lines you're processing will always be of the form hh:mm:ss where hh is the hour in 12-hour clock format (01-12), mm is the minute (00-59), and ss is the seconds (00-60); the subseconds that follow the comma (apparently 1 to 3 decimal digits representing tenths, hundredths, or thousandths of a second) end up in $6 because "," is one of your field separators. There is no way that a string representing a clock time in the above form will ever be the string 24, nor even start with that string. Presumably you want to determine if the hour portion of the time field is 12 and, if it is, reset it to 00 (which will be the correct 24-hour clock hour field if the AM/PM indicator is AM and will later be incremented back to 12 if the AM/PM indicator is PM). I would guess that you would get what you had intended if you change the two lines in your code:
Code:
        if ($5 == 24) $5 = "00"
        if ($7 == "PM") $5 = (a[1]+=12) ":" a[2] ":" a[3]

to:
Code:
        if (a[1] == 12) $5 = (a[1] = "00") ":" a[2] ":" a[3]
        if ($7 == "PM") $5 = (a[1]+=12) ":" a[2] ":" a[3]

or to:
Code:
        if (a[1] == 12 && $7 == "AM") $5 = "00:" a[2] ":" a[3]
        if (a[1] < 12 && $7 == "PM") $5 = (a[1] + 12) ":" a[2] ":" a[3]
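
The edge cases are easy to check in isolation. The sketch below is not either of the replacements above; it uses a modulo instead of the two if statements, and to24 is a hypothetical helper name.

```shell
# to24 HOUR AMPM: convert a 12-hour clock hour to a two-digit 24-hour hour.
to24() {
    awk -v h="$1" -v ampm="$2" 'BEGIN {
        h = h % 12                   # 12 becomes 0, 01-11 are unchanged
        if (ampm == "PM") h += 12    # afternoon hours shift by 12
        printf("%02d\n", h)
    }'
}

to24 12 AM    # 00
to24 01 AM    # 01
to24 12 PM    # 12
to24 11 PM    # 23
```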

If you run into issues similar to these in the future, I hope these comments will help you understand some of the pitfalls you have to watch out for when writing code to deal with various date and time formats.

Cheers,
Don
# 13  
Old 02-16-2019
Thank you very much for the comments. I will take all of the above into account in the future.
As for the last remark: that was my carelessness and a bug. The expressions were in the wrong order.
Apparently I wanted to write something like this:
Code:
        if (a[1] == 12) a[1] = "00"     # 12 AM -> 00; 12 PM goes back to 12 below
        if ($7 == "PM") a[1] += 12      # shift afternoon hours by 12
        $5 = a[1] ":" a[2] ":" a[3]

Thanks to @RudiC. The man page doesn't list these keywords:
Code:
locale abday
locale abmon

Thank you for the lesson; it was very informative.
# 14  
Old 02-16-2019
Quote:
Originally Posted by nezabudka
...
@RudiC. There are no options in the man page on this issue:
Code:
locale abday
locale abmon

...
Aren't there?

Quote:
man 5 locale
.
.
.
LC_TIME
The definition starts with the string LC_TIME in the first column.
The following keywords are allowed:
abday followed by a list of abbreviated names of the days of the week. The list starts with the first day of the week as specified by week (Sunday by default).
day followed by a list of names of the days of the week. The list starts with the first day of the week as specified by week (Sunday by default). See NOTES.
abmon followed by a list of abbreviated month names.
.
.
.