Quote:
Originally Posted by
r3d3
@Chubler_XL, @Don Cragun, thank you very much for your help. Both of these scripts worked on the sample I posted. When I tried it on the actual text file I have (about 600K lines, around 300-400 lines between the start and end regexps), the scripts are taking a lot of time. Do you have any suggestions on reducing the process time?
What OS are you using? (I.e., what is the output from uname -a?)
How many different departments are in huge_file.txt?
Is there any chance that start_regexp and end_regexp occur in unmatched pairs? (My code will copy a start_regexp line found between a start_regexp and the next end_regexp without restarting the copy, and will ignore an end_regexp if no start_regexp has been seen since the last end_regexp.)
Will there ever be a sequence of lines between the start and end lines that does not contain a Department=value line?
Answers to the above questions could be used to improve speed, at the cost of an increased chance of things going wrong if the input data is malformed for some reason.
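To make the pairing behavior in question 3 concrete, here is a minimal sketch of that state machine in awk, using the hypothetical placeholder patterns START and END in place of your real start_regexp and end_regexp:

```shell
#!/bin/sh
# Sketch only: START/END are placeholders for the real regexps.
extract() {
    awk '
        /START/ && !in_block { in_block = 1 }  # open a block on the first START
        in_block                               # copy every line while a block is open
        /END/   { in_block = 0 }               # close the block; a stray END with no open block is a no-op
    ' "$@"
}

# A second START inside a block is copied but does not restart the copy,
# and the END after "stray" is ignored because no block is open.
printf '%s\n' junk START a START b END stray END c | extract
```

Note that this prints the inner START and the closing END as ordinary copied lines; if your data can contain unmatched pairs, decide explicitly whether that is the behavior you want.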
Expanding on what Chubler_XL said: if your input and output files are on different disk controllers, that might improve performance. But if your input files are on one filesystem and your output files are on a different filesystem on the same physical drive, that will be worse than having both on the same filesystem, since on a spinning disk the heads have to seek back and forth between the two filesystems.