Quote:
Originally Posted by
Don Cragun
I must be missing something here. But since the starting and ending timestamps in the awk code in the sample pipeline are on the same date, and the times are in 24-hour format (not 12-hour with AM/PM), I see no reason to convert the two string arguments to Seconds since the Epoch values and perform numeric comparisons on those converted values instead of comparing the input values as strings. Furthermore, performing the string comparisons should be faster than converting the strings to integers and then performing a numeric comparison. However, if the start and end timestamps are on different dates, the comments made by vgersh99 and jim mcnamara are absolutely correct.
I have never heard of the unpigz command used at the head of the pipeline, and I have no idea how the files matched by the pattern nginx* are named, nor how big they are. If there are lots of huge compressed files and unpigz is being used to produce uncompressed text from all of those files as input to awk (or if unpigz is a typo and the intended utility at the start of the pipeline was gunzip -c or, equivalently, zcat), and if the part of the name matched by the asterisk in nginx* encodes the dates contained in that file, the way to speed up your pipeline might well be to select a smaller set of files to uncompress instead of trying to speed up the awk code. The slow part of your pipeline may well be the time needed to uncompress unneeded data and then filter that unneeded data out in your awk code.
First, as you said, comparing strings should be faster than converting them to times for a numeric comparison later; that is exactly what I think happens, which is why my code is written the way it is, and it works just fine. I have one directory per day, and I ran awk on the files of just one day, so the string comparison works because of that.
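For context, the comparison in question is roughly of this shape (a sketch only; the field number and the exact timestamp format are assumptions, not my real log layout):

    # Same-day logs with zero-padded 24-hour times, so a plain string
    # comparison orders correctly and no mktime() conversion is needed.
    awk -v start="10:00:00" -v end="11:00:00" \
        '$4 >= start && $4 <= end' access.log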
pigz is a parallel implementation of gzip. It is parallel only when compressing files, but it is still a little faster than zcat for decompressing because it uses additional threads for reading, writing, and checksum calculation (I read this in an answer on Stack Overflow but can't link it here because I'm new to the forum). That's why I use it instead of zcat.

About selecting files instead of using nginx*, which matches all files in the given directory: that's not possible, because I can't easily tell what the contents of each file are. That's why I thought there might be something I could do in awk to make it a little faster.
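Put together, the pipeline is roughly this (a sketch; the glob, field number, and time window are placeholders, not my actual values):

    # unpigz -c decompresses to stdout like zcat, but with extra
    # threads for reading, writing, and checksum calculation.
    unpigz -c nginx* |
        awk -v start="10:00:00" -v end="11:00:00" '$4 >= start && $4 <= end'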