Help in faster and quicker grepping

06-03-2013


Hi,

Thanks for the help.

I will let you know if I need any more help.

Thanks,
senthil

senkerth


06-03-2013


If your string is not a regex but a fixed string, try grep -F, switching off expensive regex matching.
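A minimal sketch of the difference, assuming a small made-up log (the file contents below are hypothetical and only serve to show how `-F` changes what the pattern means):

```shell
# Hypothetical sample log; the contents are invented for illustration.
log=$(mktemp)
printf '%s\n' 'FLOW start id=1' 'noise a.b' 'FLOW end id=1' 'axb' > "$log"

# -F treats the pattern as a literal string, so '.' matches only a real dot.
fixed_count=$(grep -F -c 'a.b' "$log")

# Without -F the same pattern is a regex, where '.' matches any character.
regex_count=$(grep -c 'a.b' "$log")

echo "fixed-string matches: $fixed_count"
echo "regex matches:        $regex_count"
rm -f "$log"
```

Whether `-F` is actually faster depends on the grep implementation, so it is worth timing on the real file.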

RudiC


06-03-2013


No more suggestions.
E.g. grep '^FLOW' isn't noticeably faster; searching a line takes about as long as finding the next line.
And an RE search for a simple "string" has zero overhead compared to a plain search.
An 8GB file should take about 2 minutes - this is fast.
Everything else - sed, awk, perl - is slower.
This much log data should be written to a DB file; at the very least, a text log file should be rotated more often!

MadeInGermany


06-03-2013


I disagree -
On a Solaris 10 M4000 with ksh, the anchored pattern (^) is faster. The files are about 50MB and ~510,000 lines each, within about 40 lines of one another:

contents of t.shl:

Code:

# Anchored vs. unanchored pattern; note that the two runs read different log files.
cd "$BANNER_HOME/logs"
time grep -c '^2013-05-22 11:04' uzpplpl_ng02cprd.log.54
time grep -c '2013-05-22 11:04'  uzpplpl_ng02cprd.log.56

Results (I ran it twice to show the effect of filesystem and disk controller caching):

Code:

$> ./t.shl
791

real    0m1.86s
user    0m0.23s
sys     0m0.29s
774

real    0m2.05s
user    0m1.07s
sys     0m0.36s

appworx> ./t.shl
791

real    0m0.39s
user    0m0.20s
sys     0m0.18s
774

real    0m1.15s
user    0m0.93s
sys     0m0.21s

Note the user-mode times. I know the OP is on a different box, so this may not be a fair comparison. However, scale the user times from the second run by a factor of 8GB/50MB, roughly 20*8 = 160, so:

Code:

 .93 * 160 = ~149
 .20 * 160 =  ~32
diff          ~117
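The extrapolation can be rechecked with a few lines of shell. This is only a sketch: it assumes decimal units (8GB/50MB = 8000/50) and uses the user times from the second run, and the variable names are made up for illustration.

```shell
# Scale factor from a 50MB file to an 8GB file, assuming decimal units.
scale=$((8000 / 50))

# User times in seconds from the second run: anchored 0.20, unanchored 0.93.
anchored=$(awk -v t=0.20 -v s="$scale" 'BEGIN { printf "%.1f", t * s }')
unanchored=$(awk -v t=0.93 -v s="$scale" 'BEGIN { printf "%.1f", t * s }')

# Estimated extra user time the unanchored pattern would cost on 8GB.
delta=$(awk -v a="$anchored" -v u="$unanchored" 'BEGIN { printf "%.1f", u - a }')

echo "scale: $scale"
echo "anchored: ${anchored}s  unanchored: ${unanchored}s  difference: ${delta}s"
```

The result is a rough estimate only, since it assumes user time grows linearly with file size.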

Two points:

If you ran a timing comparison of '^FLOW' vs 'FLOW' (in that order) on the same file, your results were confounded by caching.

The user time is independent of caching and reflects the work the regex engine does.
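One way to reduce the caching effect is to read the file once before timing and then run both patterns in both orders. The sketch below assumes a generated test file (hypothetical contents) and a shell that provides `time`; substitute a real log to get meaningful numbers.

```shell
# Hypothetical test file standing in for a real log.
file=$(mktemp)
yes 'FLOW some log text' | head -n 200000 > "$file"

# Read the file once so every timed run starts from the same cache state.
cat "$file" > /dev/null

# Time each pattern in both orders; compare the "user" figures across runs.
runs=0
for pattern in '^FLOW' 'FLOW' 'FLOW' '^FLOW'; do
    echo "pattern: $pattern"
    time grep -c "$pattern" "$file"
    runs=$((runs + 1))
done

lines=$(wc -l < "$file")
rm -f "$file"
```

If the two orders give consistent user times, caching is unlikely to explain the difference between the patterns.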

Henry Spencer wrote a white paper on this kind of thing; I cannot find it, so I cannot cite it.

YMMV.

jim mcnamara


06-03-2013


Quote:

Originally Posted by MadeInGermany

No more suggestions.
E.g. grep '^FLOW' isn't noticeably faster; searching a line takes about as long as finding the next line.
And an RE search for a simple "string" has zero overhead compared to a plain search.
An 8GB file should take about 2 minutes - this is fast.
Everything else - sed, awk, perl - is slower.

No offense intended, but all of those unqualified statements are worthless. I have done some work with NFA (cached to DFA) regular expression engines, and the nature and quality of implementations varies massively.

While jim's implementation performs better with an anchor, GNU grep 2.5.1 does much worse: it takes more than twice as long. (The tests were repeated multiple times in differing order on obsolete hardware and there was never a discrepancy.)

Code:

$ yes 'FLOWWWWWWWW' | head -n1000000 | time -p grep -c 'FLOW'
1000000
real    1.84
user    0.71
sys     0.07
$ yes 'FLOWWWWWWWW' | head -n1000000 | time -p grep -c '^FLOW'
1000000
real    4.83
user    3.70
sys     0.10

As an aside, some implementations will silently optimize depending on the contents of the pattern. A BSD example from OpenBSD :: grep.c:

Code:

	for (i = 0; i < patterns; ++i) {
		/* Check if cheating is allowed (always is for fgrep). */
#ifndef SMALL
		if (Fflag) {
			fgrepcomp(&fg_pattern[i], pattern[i]);
		} else
#endif
		{
			if (fastcomp(&fg_pattern[i], pattern[i])) {
				/* Fall back to full regex library */
				c = regcomp(&r_pattern[i], pattern[i], cflags);

My point: regular expression performance is highly implementation-dependent, and unqualified statements are seldom valid.

Regards,
Alister

alister


06-03-2013


No offense, but on an 8-gig file optimizing may be pointless; disk speed may be the limiting factor...
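A rough way to check this is to compare the time for a plain read of the file with the time grep needs. The sketch below uses a small generated file (hypothetical contents) and assumes the shell provides `time`; on the real log, run it with a cold cache, since a cached file hides the disk cost.

```shell
# Hypothetical stand-in for the large log file.
file=$(mktemp)
yes 'FLOW some log text' | head -n 200000 > "$file"
size=$(wc -c < "$file")
echo "file size in bytes: $size"

# Baseline: how long a plain sequential read takes.
echo "raw read:"
time cat "$file" > /dev/null

# The search itself; grep's output is left on stdout on purpose, because
# GNU grep can stop at the first match when writing to /dev/null.
echo "grep:"
time grep -c 'FLOW' "$file"

rm -f "$file"
```

If the "real" time of grep is close to that of the plain read, the disk is the bottleneck and tuning the pattern will gain little.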

Corona688


06-03-2013


I thought GNU grep was the crème de la crème of speed.

On my system (GNU grep 2.6.3) it performs better with an anchor than without one.

The original author (who does not maintain it any longer) has thoroughly defended it. Not sure if after all these years it has been beaten by something else.


verdepollo
