The main reason is that your original had the following logic:-
- Start a process to read a line from the input
- Start a process to perform the cut *1
- Compare the result, looking for the value 27
- If it matches, start a process for another cut *2
- Display the result
- Go back to the top to read the next line
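To make that concrete, here is a hypothetical reconstruction of that loop. I don't have your actual script, so the comma delimiter, the field numbers and the sample data are all assumptions; only the shape of the logic is the point:

```shell
# Per-line loop: every iteration launches one or two cut processes.
while read -r line; do
    key=$(printf '%s\n' "$line" | cut -d, -f2)      # cut process *1, once per line
    if [ "$key" = "27" ]; then
        val=$(printf '%s\n' "$line" | cut -d, -f3)  # cut process *2, once per match
        echo "$val"
    fi
done <<'EOF'
1,27,apple
2,5,banana
3,27,cherry
EOF
```

Three input lines here means three launches of cut for *1 and two more for *2; scale that up to 400 lines and the process count dominates the runtime.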
For a 400 line file, you are forcing 400 cut processes to be run for *1 and another set for the cut in *2. Depending on your shell, you might also start 400 read processes, plus 400 echo statements in *1, and more for *2 for each line matching the value 27.
All of this generates vast amounts of work just in the overheads. I'm not very good with awk myself, but it all runs in a single process, so it is excellent if you can invest the time to get into the syntax. My variation removed many of these processes, but could probably still be improved. Every process launch requires memory to be allocated, perhaps logs to be written, paging/swap space to be adjusted and so on, so before a process actually does anything there is a significant overhead - and there may be end-of-process overheads too.
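For comparison, here is the same filter sketched as a single awk process. Again, the delimiter, field numbers and sample data are assumptions rather than your actual file, but the structure carries over directly:

```shell
# One awk process scans every line: split fields on commas, test
# field 2 against 27, print field 3 on a match. No per-line process
# launches at all.
awk -F, '$2 == 27 { print $3 }' <<'EOF'
1,27,apple
2,5,banana
3,27,cherry
EOF
```

Whether the file is 3 lines or 400, this still costs exactly one process.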
The use of the cat at the front makes it more readable for some, although I'm sure purists may not agree. I suppose it depends how you describe your logic in your mind before writing the code. I just tried to follow your logic with a few tweaks, so that it doesn't become too different and need documentation or lots of work on your part to decipher, but it's the difference between thinking:-
- Working on this file, I will do these things to it, versus
- Do these things on this input file
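Those two mindsets map directly onto two ways of feeding the loop. A small sketch (the file name and contents are made up for illustration):

```shell
tmp=$(mktemp)                     # hypothetical input file
printf 'red\ngreen\n' > "$tmp"

# "Working on this file, I will do these things to it":
cat "$tmp" | while read -r line; do echo "$line"; done

# "Do these things on this input file":
while read -r line; do echo "$line"; done < "$tmp"

rm -f "$tmp"
```

Both print the same lines, but the redirected form saves the cat process, and in many shells it also keeps the loop out of a subshell, so variables set inside the loop survive after it finishes.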
I hope that this clarifies and helps,
Robin
Liverpool/Blackburn
UK