Help in faster and quicker grepping


 
# 8  
Old 06-03-2013
Hi,

Thanks for the help. I will let you know if I need any more help.

Thanks,
senthil
# 9  
Old 06-03-2013
If your string is not a regex but a fixed string, try grep -F, switching off expensive regex matching.
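
A minimal sketch of what -F changes; the sample file and its contents are made up for the demonstration. It shows only the difference in matching semantics. Whether -F is actually faster depends on the grep implementation.

```shell
# Hypothetical sample data, created only for this illustration.
sample=$(mktemp)
printf '%s\n' 'FLOW a.b started' 'FLOW axb started' 'IDLE a.b' > "$sample"

# As a regex, the dot matches any character, so the "a.b" and "axb" lines both count.
regex_count=$(grep -c 'FLOW a.b' "$sample")

# With -F the pattern is a literal string, so only the "a.b" line counts.
fixed_count=$(grep -F -c 'FLOW a.b' "$sample")

echo "regex: $regex_count  fixed: $fixed_count"
rm -f "$sample"
```

If the search string contains characters such as . * [ or \, -F also saves you from escaping them.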
# 10  
Old 06-03-2013
No more suggestions.
E.g. grep '^FLOW' isn't noticeably faster: searching a line takes about as long as finding the start of the next line.
And an RE search for a simple "string" has zero overhead compared to a plain search.
An 8GB file should take about 2 minutes, which is fast.
Everything else (sed, awk, perl) is slower.
This amount of log data should be written to a DB file; at the very least, a text log file should be rotated more often!
# 11  
Old 06-03-2013
I disagree.
On a Solaris 10 M4000 under ksh, the anchored pattern (^) is faster. The files are about 50MB and ~510000 lines each, differing by about 40 lines:

Contents of t.shl:

Code:
cd $BANNER_HOME/logs
time grep -c '^2013-05-22 11:04' uzpplpl_ng02cprd.log.54
time grep -c '2013-05-22 11:04'  uzpplpl_ng02cprd.log.56

Results (I ran it twice to show the effect of filesystem and disk controller caching):

Code:
$> ./t.shl
791

real    0m1.86s
user    0m0.23s
sys     0m0.29s
774

real    0m2.05s
user    0m1.07s
sys     0m0.36s

appworx> ./t.shl
791

real    0m0.39s
user    0m0.20s
sys     0m0.18s
774

real    0m1.15s
user    0m0.93s
sys     0m0.21s

Note the user mode times. I know the OP is on a different box, so this may not be a fair comparison. However, scale the user times of the second run by a factor of 8GB/50MB, roughly 20 * 8 = 160, so (in seconds):
Code:
 .93 * 160 = ~149
 .20 * 160 =   32
 diff        ~117
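
The same extrapolation can be sketched in the shell. This assumes, as the estimate above does, that user time grows linearly with file size; the helper name `scaled` and the variable names are made up for the illustration.

```shell
# Scale a measured user time (seconds) from a 50 MB file up to an 8 GB file.
# Assumption: user time grows linearly with file size.
scale=160   # 8 GB / 50 MB, rounded as in the post

scaled() {
    # $1 = measured user time in seconds
    awk -v t="$1" -v k="$scale" 'BEGIN { printf "%.1f\n", t * k }'
}

plain=$(scaled 0.93)      # unanchored pattern, second run
anchored=$(scaled 0.20)   # anchored pattern, second run
delta=$(awk -v a="$plain" -v b="$anchored" 'BEGIN { printf "%.1f\n", a - b }')

echo "unanchored: ${plain}s  anchored: ${anchored}s  difference: ${delta}s"
```

With the second-run figures this gives 148.8 and 32.0 seconds, a difference of 116.8 seconds of user time.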

Two points:

If you ran a timed comparison of '^FLOW' vs 'FLOW' (in that order) on the same file, your results were confounded by caching.

The user time is independent of caching and reflects the work a regex does.
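
A rough sketch of how to compare user time only, on the same file and in alternating order. It uses the POSIX `times` builtin instead of parsing the output of `time`; the helper name `user_time` and the generated test data are assumptions made for the illustration.

```shell
# user_time PATTERN FILE
# Prints the user CPU time of one grep run. The grep runs inside a subshell,
# so the "children" line (line 2) printed by `times` covers only that grep.
# `times` writes to a file because a pipe or $(...) would fork a new shell
# whose child times start again at zero.
user_time() {
    out=$(mktemp)
    ( grep -c -- "$1" "$2" > /dev/null; times > "$out" )
    sed -n '2p' "$out" | cut -d' ' -f1
    rm -f "$out"
}

# Generated stand-in data; replace with a real log file.
data=$(mktemp)
yes 'FLOWWWWWWWW' | head -n 100000 > "$data"

# Alternate the patterns so that neither one always runs first.
for pattern in '^FLOW' 'FLOW' '^FLOW' 'FLOW'; do
    printf '%-6s %s\n' "$pattern" "$(user_time "$pattern" "$data")"
done
rm -f "$data"
```

The absolute numbers mean little on such a small input; the point is only that both patterns see the same file and the same cache state.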

Henry Spencer wrote a white paper on this kind of thing; I cannot find it, so I cannot cite it.

YMMV.
# 12  
Old 06-03-2013
Quote:
Originally Posted by MadeInGermany
No more suggestions.
E.g. grep '^FLOW' isn't noticeably faster: searching a line takes about as long as finding the start of the next line.
And an RE search for a simple "string" has zero overhead compared to a plain search.
An 8GB file should take about 2 minutes, which is fast.
Everything else (sed, awk, perl) is slower.

No offense intended, but all of those unqualified statements are worthless. I have done some work with NFA (cached to DFA) regular expression engines, and the nature and quality of implementations vary massively.

While jim's implementation performs better with an anchor, GNU grep 2.5.1 does much worse with one: it takes more than twice as long. (The tests were repeated multiple times in differing order on obsolete hardware, and there was never a discrepancy.)
Code:
$ yes 'FLOWWWWWWWW' | head -n1000000 | time -p grep -c 'FLOW'
1000000
real    1.84
user    0.71
sys     0.07
$ yes 'FLOWWWWWWWW' | head -n1000000 | time -p grep -c '^FLOW'
1000000
real    4.83
user    3.70
sys     0.10

As an aside, some implementations will silently optimize depending on the contents of the pattern. An example from OpenBSD's grep.c (excerpt):
Code:
	for (i = 0; i < patterns; ++i) {
		/* Check if cheating is allowed (always is for fgrep). */
#ifndef SMALL
		if (Fflag) {
			fgrepcomp(&fg_pattern[i], pattern[i]);
		} else
#endif
		{
			if (fastcomp(&fg_pattern[i], pattern[i])) {
				/* Fall back to full regex library */
				c = regcomp(&r_pattern[i], pattern[i], cflags);

My point: regular expression performance is highly implementation-dependent, and unqualified statements are seldom valid.

Regards,
Alister
# 13  
Old 06-03-2013
No offense, but on an 8-gig file optimizing may be pointless; disk speed may be the limiting factor.
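
One rough way to check this is to compare how long it takes just to read the file with how long grep takes on it. The helper name `elapsed_seconds` and the generated file are made up for this sketch. Whole-second resolution is only meaningful on a file of several GB, and the first read may warm the cache for the second unless the file is much larger than RAM.

```shell
# elapsed_seconds COMMAND [ARGS...]
# Prints the wall-clock time of a command in whole seconds.
elapsed_seconds() {
    start=$(date +%s)
    "$@" > /dev/null
    end=$(date +%s)
    echo $((end - start))
}

# Small generated stand-in; point this at the real log instead.
log=$(mktemp)
yes '2013-05-22 11:04 FLOW sample line' | head -n 200000 > "$log"

read_only=$(elapsed_seconds cat "$log")          # cost of reading the file
search=$(elapsed_seconds grep -c 'FLOW' "$log")  # cost of reading plus matching

echo "read only: ${read_only}s  grep: ${search}s"
rm -f "$log"
```

If the two figures are close on the real file, the disk is the bottleneck and tuning the pattern will not help much.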
# 14  
Old 06-03-2013
I thought GNU grep was the crème de la crème of speed.

On my system (GNU grep 2.6.3) it behaves better with an anchor than without one.

The original author (who no longer maintains it) has thoroughly defended it. I'm not sure whether, after all these years, it has been beaten by something else.