|
|||||||
| Forums | Search Forums | Register | Forum Rules | Man Pages | Albums | FAQ | Members | Calendar | Search | Today's Posts | Mark Forums Read |
| UNIX for Advanced & Expert Users Expert-to-Expert. Learn advanced UNIX, UNIX commands, Linux, Operating Systems, System Administration, Programming, Shell, Shell Scripts, Solaris, Linux, HP-UX, AIX, OS X, BSD. |
|
|
|
Thread Tools | Search this Thread | Display Modes |
|
#1
|
|||
|
|||
|
penalty for case insensitive grep
I just found out there were a big performance penalty for case insensitive "grep" on big files.
It would be understandable, except that the penalty seems to be exaggerated out of proportion. A real example, if I only grep a single letter "V" (or "v") , without "-i", on a big file, (file is doctored so only a few "V" or "v" exist). It takes 0.157 user time to finish. Then I grep the same letter "V", but with "-i" option, it takes 32.0 user time to finish. That is 200 times longer than without "-i" for a single character. Can someone provide some insight why this is the case? Thanks. NB Phil |
| Sponsored Links | ||
|
|
#2
|
|||
|
|||
|
probably because it has to convert the case of every line. try this and see what happens. Code:
grep -e V -e v ... |
| The Following User Says Thank You to frank_rizzo For This Useful Post: | ||
phil518 (02-25-2011) | ||
| Sponsored Links | ||
|
|
#3
|
|||
|
|||
|
Is this GNU grep? The GNU utilities, unusually, try to handle your character set as appropriate. This means when you tell it to be case insensitive, that busts out some pretty heavy-duty routines in order to do so properly.
|
| The Following User Says Thank You to Corona688 For This Useful Post: | ||
phil518 (02-25-2011) | ||
|
#4
|
|||
|
|||
|
Quote:
Is this a known fact? is x200 the upper bound of the performance penalty, regardless the length of the input string? (thinking n-character long string will have 2^n case permutations). |
| Sponsored Links | |
|
|
#5
|
|||
|
|||
|
Again, it's the fault of I18N. Set LC_ALL to C, you will see that the -i run is only twice as long.
|
| The Following User Says Thank You to binlib For This Useful Post: | ||
phil518 (02-25-2011) | ||
| Sponsored Links | |
|
|
#6
|
|||
|
|||
|
This problem has cropped up a few times recently. The default "locale" for GNU utility programs including "grep" and "sort" has changed to "UTF" which means that mapping one "character" can take more than one character. If your file definitely does not contain "UTF" characters you can massively improve performance by changing your locale back to the basic value of "C". To check how your system is now, type and check the output from this enquiry: Code:
locale |
| The Following User Says Thank You to methyl For This Useful Post: | ||
phil518 (02-25-2011) | ||
| Sponsored Links | |
|
|
#7
|
|||
|
|||
|
Quote:
![]() |
| The Following User Says Thank You to Corona688 For This Useful Post: | ||
phil518 (02-25-2011) | ||
| Sponsored Links | ||
|
![]() |
| Tags |
| grep, grep case insensitive penalty |
| Thread Tools | Search this Thread |
| Display Modes | |
More UNIX and Linux Forum Topics You Might Find Helpful
|
||||
| Thread | Thread Starter | Forum | Replies | Last Post |
| Using sed for case insensitive search | sbhuvana20 | UNIX for Dummies Questions & Answers | 7 | 02-19-2011 07:05 AM |
| Case Insensitive search | gregarion | Shell Programming and Scripting | 3 | 01-19-2010 10:31 AM |
| How to make sed case insensitive | vickylife | Shell Programming and Scripting | 2 | 01-06-2009 04:57 AM |
| case insensitive | ROOZ | Shell Programming and Scripting | 4 | 03-11-2008 12:40 PM |
| awk case-insensitive | forever_49ers | Shell Programming and Scripting | 7 | 09-13-2006 05:15 PM |
|
|