You're obviously carrying out complex operations, and I'm not going to be able to simplify that code, though someone else might. But I would bet it's those arrays causing the problem by growing until you run out of memory. I suspect mawk itself is innocent.
Just as a diagnostic, maybe use the length function on the arrays at some point to see how big they are. You could set it to print just before the crash, since the error message tells you where it runs out. If you find out the sizes, it would be interesting if you posted them.
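A rough sketch of that diagnostic, in case it helps. Everything here is a stand-in, not your code: the function name report_sizes, the array name seen, and the reporting interval are all made up. It counts elements with a loop instead of calling length(array), only because I'm not sure every awk build accepts an array argument to length.

```shell
# report_sizes [EVERY]: read records on stdin, build a (hypothetical) array,
# and print its element count to stderr every EVERY records.
report_sizes() {
    awk -v every="${1:-1000000}" '
    # Portable element count: loop over the keys rather than relying on
    # length(array), which some older awk builds may not accept.
    function count(arr,    k, n) { n = 0; for (k in arr) n++; return n }

    { seen[$1] += $2 }        # stand-in for the real array work

    # Progress report on stderr, so it does not mix with normal output.
    NR % every == 0 { printf("NR=%d seen=%d\n", NR, count(seen)) > "/dev/stderr" }

    END { printf("final seen=%d\n", count(seen)) }
    '
}
```

You would run it as something like `report_sizes 100000 < bigfile`. The last line on stderr before the crash gives an approximate array size at the point of failure. Note the counting loop itself costs time on a huge array, so keep the interval large.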
There are also ways of monitoring program memory usage, external to awk. That would be worth doing.
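One way to do that from the shell, as a minimal sketch: run the program in the background and poll its resident set size with ps. The function name peak_rss and the one-second interval are my own choices, and it assumes your ps accepts `-o rss=` (procps on Linux and the BSD ps do). Where GNU time is installed, `/usr/bin/time -v` reports a maximum resident set size without any polling.

```shell
# peak_rss CMD [ARGS...]: run CMD in the background, sample its resident set
# size once a second, and print the largest value seen to stderr.
# The value is in whatever unit ps reports for rss (kilobytes on Linux).
peak_rss() {
    "$@" &
    pid=$!
    peak=0
    while kill -0 "$pid" 2>/dev/null; do
        rss=$(ps -o rss= -p "$pid" 2>/dev/null | tr -d ' ')
        [ -n "$rss" ] && [ "$rss" -gt "$peak" ] && peak=$rss
        sleep 1
    done
    wait "$pid"               # collect the exit status of CMD
    status=$?
    echo "peak RSS: ${peak} KB" >&2
    return "$status"
}
```

Because the command runs in the background, give it the input as a file argument rather than piping into it, for example `peak_rss mawk -f prog.awk bigfile > out.txt` (prog.awk and bigfile being placeholders). Sampling once a second can miss a short spike, so treat the number as a lower bound.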
Quote:
splitting file is time consuming
How long does it take to run? In my experience, doing these kinds of industrial-strength operations in one pass is often problematic. I typically break the job into a few steps. That could well take longer, as you suggest, and that's a negative. But I think it's usually a better approach, and the time difference does not have to be that large. It's also easier to verify, since each step can be validated on its own instead of trying to check everything at once.
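To show what I mean by steps, here is a toy sketch. It is not your job: the two-field "key value" record format, the function name two_step, and the sum-per-key task are all assumptions for illustration. Step 1 filters into an intermediate file and reports a count you can check. Step 2 sorts, so equal keys are adjacent and awk only has to hold one key at a time instead of a big array.

```shell
# two_step INPUT: hypothetical two-stage replacement for a one-pass job.
two_step() {
    in=$1
    tmp=$(mktemp) || return 1

    # Step 1: keep only well-formed records (two fields, numeric second field).
    awk 'NF == 2 && $2 ~ /^[0-9]+$/' "$in" > "$tmp"

    # Checkpoint: report how many lines survived, so step 1 can be verified alone.
    kept=$(wc -l < "$tmp" | tr -d ' ')
    total=$(wc -l < "$in" | tr -d ' ')
    echo "step 1 kept $kept of $total lines" >&2

    # Step 2: sort by key, then sum each run of equal keys without an array.
    sort -k1,1 "$tmp" |
    awk '
        NR > 1 && $1 != key { print key, sum; sum = 0 }
        { key = $1; sum += $2 }
        END { if (NR > 0) print key, sum }
    '
    rm -f "$tmp"
}
```

The sort does the heavy lifting on disk, which is usually where the extra time goes, but the awk part then runs in nearly constant memory. Whether your operations can be rearranged this way depends on what the arrays are actually for.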