Thank you so much for your time. I will give this a try this evening after business hours. I will also cut the mtime down to 1 so it only processes 24 zipped files at once for the test run.
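For the record, this is roughly what I mean, assuming the script picks up the logs with find; the path and filename pattern below are just placeholders, not the real ones:

    # -mtime -1 restricts find to files modified within the last 24 hours,
    # i.e. roughly one day's worth of hourly log files
    find /var/log/firewall -name '*.gz' -mtime -1 -print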
Sorry for the small snippet of sample data. This is a production firewall that is generating the raw data so I was trying to be careful to only include non-identifying data in the sample.
As for the output to the files, I can tweak that to get exactly what I want if any of the fields are incorrect.
I believe I am beginning to understand, at a basic level, how this script interacts with the files. I do think you're correct that the majority of the time is spent repeatedly uncompressing and recompressing data unnecessarily.
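If I follow correctly, the gain comes from streaming each compressed file through the tools once with zcat (gzcat on some systems), instead of the gunzip/search/gzip round trip my version did. A rough sketch of the difference (the filename and search pattern are made up for illustration):

    # old approach: decompress to disk, search, then recompress -- two extra passes per file
    gunzip fw-hour-00.log.gz
    grep 'DENY' fw-hour-00.log >> results.txt
    gzip fw-hour-00.log

    # single-pass approach: stream the decompressed data straight into the search,
    # leaving the .gz file untouched on disk
    zcat fw-hour-00.log.gz | grep 'DENY' >> results.txt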
While the script runs, I will have a second connection open to the box running the top command to watch processor and memory usage and gauge the load being placed on the system before expanding the script to additional files.
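If it turns out to be useful, I may also log the readings rather than just watch them; something like this, assuming the box has a procps-style top (the interval and sample count are arbitrary, and the flags differ on BSD/Solaris):

    # batch mode: write a full snapshot every 10 seconds for about an hour
    top -b -d 10 -n 360 > ~/top_during_run.log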
I will let you know how it runs. Your script is far more elegant than my cobbled-together one. It just goes to show that just because "my" way works doesn't mean it is the best way to get things done.
---------- Post updated at 11:00 PM ---------- Previous update was at 02:26 PM ----------
The script you provided cuts the time by over 50%!
I do have a few tweaks to do to get the output correct but that is something I can easily handle.
I did run a few tests:
1. Copied the files down to a local directory (/home/kenneth.cramer/temp) and changed the script to search that directory, to see whether the files sitting on ZFS was having an impact on speed. I did not see any improvement in how quickly the script decompressed the files.
2. Copied the files to a local directory and unzipped them first. The search itself was faster, but the time taken to copy and uncompress the files balanced it out. There is no real gain unless I set up a timed script to copy down and uncompress the files before I need to run the search, so that approach is impractical.
3. Compared the sizes of the compressed vs. uncompressed files. Each file represents an hour's worth of data. Compressed, each file averages 54 MB; uncompressed, they average about 1 GB. Seven days at 24 files per day is 168 files, so taking an hour or so to sift through roughly 168 GB of data is not bad. The sheer size also makes it impractical to copy the files down and uncompress them just to run these few operations on them (a quick way to check the sizes is shown below).
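For anyone repeating the comparison, gzip can report both sizes without actually extracting anything; the path below is just an example:

    # -l lists compressed size, uncompressed size and ratio for each archive
    gzip -l /var/log/firewall/*.gz

    # rough weekly total from my averages:
    # 24 files/day x 7 days = 168 files; 168 x ~1 GB uncompressed = ~168 GB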
Thank you for all the help. I believe I can manage the last few tweaks from here and get the output into the format I need.
Thank you again for all your help.