Hello. I just found out about awk, and it appears that this could handle the problem I'm having right now.
I first stumbled on the thread
"How to extract first and last line of different record from a file", and that problem is very similar to mine.
In my case, an ASCII file will contain the following fields, among many others:
record date time code
----------------------------
dd 1-3-2006 1:00:20 AM 2
aa 1-2-2006 8:34:21 AM 1
bb 1-2-2006 7:34:21 AM 1
aa 1-2-2006 10:30:22 AM 2
cc 1-2-2006 10:34:21 AM 1
aa 1-2-2006 15:30:22 AM 3
dd 1-2-2006 8:04:11 PM 1
aa 1-2-2006 6:24:44 PM 1
bb 1-2-2006 10:30:22 AM 2
aa 1-2-2006 11:03:19 AM 2
aa 1-2-2006 16:03:19 AM 3
bb 1-2-2006 12:03:19 PM 2
cc 1-2-2006 4:04:11 PM 1
dd 1-3-2006 12:20:00 AM 2
bb 1-2-2006 7:24:44 PM 1
dd 1-3-2006 2:20:02 AM 1
....
.
.
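A minimal Python sketch of how one line in this layout could be read, under two assumptions that the sample does not confirm: dates are month-day-year, and an hour above 12 (as in "15:30:22 AM") is already on a 24-hour clock, so its AM/PM marker is ignored. The name parse_line is made up for illustration.

```python
from datetime import datetime

def parse_line(line):
    """Split one data line into (record, timestamp, code)."""
    record, date_s, time_s, ampm, code = line.split()
    # Assumption: the date is month-day-year.
    month, day, year = (int(p) for p in date_s.split("-"))
    hour, minute, second = (int(p) for p in time_s.split(":"))
    if hour <= 12:
        # Ordinary 12-hour clock: 12 AM is midnight, 12 PM is noon.
        hour = hour % 12 + (12 if ampm.upper() == "PM" else 0)
    # Assumption: an hour above 12 is taken as 24-hour time as written.
    return record, datetime(year, month, day, hour, minute, second), int(code)

print(parse_line("dd 1-3-2006 12:20:00 AM 2"))
```

Keeping the date and time together in one timestamp means later subtractions do not need to know whether midnight was crossed.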
At the end of any given interval of days, I am trying to output
aa <total time>
bb <total time>
cc <total time>
dd <total time>
where the total time elapsed between the matching code 1 records is reduced by the total time elapsed between the matching code 2 and code 3 records, and so forth.
(example from above sample, sorted to display matching records)
aa 1-2-2006 8:34:21 AM 1
aa 1-2-2006 6:24:44 PM 1 (total time 1: X)
aa 1-2-2006 10:30:22 AM 2
aa 1-2-2006 11:03:19 AM 2 (total time 2: Y)
aa 1-2-2006 15:30:22 AM 3
aa 1-2-2006 16:03:19 AM 3 (total time 3: Z)
total time = X-(Y+Z)
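The X-(Y+Z) arithmetic for record aa could be sketched in Python roughly as follows. This assumes each elapsed time is the last timestamp minus the first timestamp for that code, and that "15:30:22 AM" and "16:03:19 AM" are meant as 24-hour times. The function name net_time is hypothetical.

```python
from datetime import datetime, timedelta

def net_time(stamps_by_code):
    """stamps_by_code maps code -> list of timestamps for one record name."""
    # Elapsed time per code: last timestamp minus first timestamp.
    spans = {code: max(ts) - min(ts) for code, ts in stamps_by_code.items()}
    x = spans.get(1, timedelta(0))
    # Every code other than 1 is deducted from the code 1 total.
    deductions = sum(
        (span for code, span in spans.items() if code != 1), timedelta(0)
    )
    return x - deductions

aa = {
    1: [datetime(2006, 1, 2, 8, 34, 21), datetime(2006, 1, 2, 18, 24, 44)],
    2: [datetime(2006, 1, 2, 10, 30, 22), datetime(2006, 1, 2, 11, 3, 19)],
    3: [datetime(2006, 1, 2, 15, 30, 22), datetime(2006, 1, 2, 16, 3, 19)],
}
print(net_time(aa))  # 8:44:29, i.e. 9:50:23 - (0:32:57 + 0:32:57)
```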
I'm hoping to do it with Python, but I've seen some awesome awk scripts and I'm wondering if it could be done with awk instead.
I'm also confused about how to compute elapsed time past midnight (i.e. record dd). If the records for a code don't match, the total time is not computed; instead the record is marked in the output as
xx "data error"
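One possible way to handle both points in Python is sketched below. It subtracts full date-plus-time values, so a span that crosses midnight needs no special case. It also assumes that "don't match" means a code that does not have exactly two records for a given record name, which is a guess at the intended rule. The record ee is hypothetical and is there only to trigger the error path; the dd rows come from the sample, read as month-day-year.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def report(rows):
    """rows: iterable of (record, timestamp, code) tuples; returns output lines."""
    stamps = defaultdict(lambda: defaultdict(list))
    for record, stamp, code in rows:
        stamps[record][code].append(stamp)

    lines = []
    for record in sorted(stamps):
        by_code = stamps[record]
        # Assumed meaning of "don't match": a code without exactly two records.
        if any(len(ts) != 2 for ts in by_code.values()):
            lines.append(f'{record} "data error"')
            continue
        total = timedelta(0)
        for code, ts in by_code.items():
            span = max(ts) - min(ts)  # full datetimes, so midnight is not special
            total += span if code == 1 else -span
        lines.append(f"{record} {total}")
    return lines

rows = [
    ("dd", datetime(2006, 1, 3, 1, 0, 20), 2),
    ("dd", datetime(2006, 1, 2, 20, 4, 11), 1),
    ("dd", datetime(2006, 1, 3, 0, 20, 0), 2),
    ("dd", datetime(2006, 1, 3, 2, 20, 2), 1),
    ("ee", datetime(2006, 1, 2, 9, 0, 0), 1),  # hypothetical unmatched record
]
print("\n".join(report(rows)))
# dd 5:35:31
# ee "data error"
```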