If the patterns are always fixed strings, using fgrep or grep -F may result in a huge performance boost.
If possible, run fgrep without -i; that will get you another performance boost. Also put LANG=C before the fgrep command, which should speed things up a little too.
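As a minimal sketch of the invocation described above: the file names patterns.txt and big.txt and their contents are made up for illustration, and the temporary directory is only there to keep the example self-contained. It assumes the patterns are stored one per line in a file and passed with -f. Note that if LC_ALL is set in your environment it takes precedence over LANG, so you may need LC_ALL=C instead.

```shell
# Hypothetical sample data; replace with your own pattern file and big file.
workdir=$(mktemp -d)
printf 'foo.bar\nbaz\n' > "$workdir/patterns.txt"
printf 'has foo.bar here\nhas fooxbar here\nnothing\nbaz at start\n' > "$workdir/big.txt"

# -F: treat every pattern as a literal string (no regex), same as fgrep
# -f: read the patterns from a file, one per line
# LANG=C: use the plain C locale for this one command
# No -i here, so matching stays case-sensitive.
matches=$(LANG=C grep -F -f "$workdir/patterns.txt" "$workdir/big.txt")
printf '%s\n' "$matches"

rm -rf "$workdir"
```

Because of -F, the dot in foo.bar is matched literally, so the line containing fooxbar is not reported, whereas a regular grep would treat the dot as "any character" and match it.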
Sidenote
There was a scripting task request in the German Linux forum (www.linuxforen.de) here:
Linuxforen.de Thread regarding fgrep
The task was similar. The big file had 5,000,000 lines (300 MB). The smaller file had 100,000 lines (3 MB). The results:
- Winner fgrep: 7 seconds
- extremely optimized Lua script: 8.6 seconds
- awk script: ~97 hours (obviously the great awk hackers here would get a whole lot more out of awk)
- regular grep: stopped after 45 minutes of runtime and 12 GB of RAM usage
I think that situation is not far from the one here. I suppose the smaller file here is a lot smaller, so the task will not be as CPU-intensive as the other one, but this task has a lot more data to read (5-6 GB, as nikhil said).