penalty for case insensitive grep


 
# 1  
Old 02-24-2011

I just found out that case-insensitive "grep" carries a big performance penalty on big files.

It would be understandable, except that the penalty seems to be exaggerated out of proportion.

A real example: if I grep for a single letter "V" (or "v") without "-i" on a big file (doctored so that only a few "V" or "v" characters exist), it takes 0.157 seconds of user time to finish.

Then I grep for the same letter "V", but with the "-i" option, and it takes 32.0 seconds of user time. That is about 200 times longer than without "-i", for a single character.

Can someone provide some insight why this is the case?

Thanks.

NB Phil
# 2  
Old 02-24-2011
Probably because it has to convert the case of every line. Try this and see what happens.
Code:
grep -e V -e v ...

This User Gave Thanks to frank_rizzo For This Post:
# 3  
Old 02-24-2011
Quote:
Originally Posted by phil518
Can someone provide some insight why this is the case?
Is this GNU grep? The GNU utilities, unusually, try to handle your character set as appropriate. This means when you tell it to be case insensitive, that busts out some pretty heavy-duty routines in order to do so properly.
This User Gave Thanks to Corona688 For This Post:
# 4  
Old 02-24-2011
Quote:
Originally Posted by Corona688
Is this GNU grep? The GNU utilities, unusually, try to handle your character set as appropriate. This means when you tell it to be case insensitive, that busts out some pretty heavy-duty routines in order to do so properly.
It is GNU grep. Wow, 200 times longer for case insensitive grep. Those routines sound very heavy.

Is this a known fact?

Is 200x the upper bound of the performance penalty, regardless of the length of the search string? (I am thinking that an n-character string will have 2^n case permutations.)
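
On the 2^n concern: a case-insensitive pattern does not have to enumerate every permutation. One bracket expression per letter covers them all, so the pattern grows linearly with the length of the string. The sketch below is only an illustration of that idea, not what grep does internally; `ci_pattern` is a made-up helper name, and it assumes the input is plain ASCII letters and digits with no regex metacharacters.

```shell
# Hypothetical helper: turn an ASCII word into a pattern that matches it in
# any mix of upper and lower case, e.g. "vi" becomes "[Vv][Ii]".
# Assumption: $1 holds only ASCII letters/digits (nothing is escaped).
ci_pattern() {
    printf '%s\n' "$1" | LC_ALL=C awk '{
        out = ""
        for (i = 1; i <= length($0); i++) {
            c = substr($0, i, 1)
            u = toupper(c)
            l = tolower(c)
            # Letters get a two-character bracket expression; anything
            # without distinct cases is copied through unchanged.
            out = out ((u != l) ? "[" u l "]" : c)
        }
        print out
    }'
}
```

Usage would look like `grep -e "$(ci_pattern vi)" file`, with no "-i" involved. Whether this is faster than "-i" on a given system is something to measure rather than assume.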
# 5  
Old 02-25-2011
Again, it's the fault of I18N. Set LC_ALL to C and you will see that the -i run takes only twice as long.
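
A minimal sketch of that comparison, assuming a POSIX-style shell and a grep that supports "-i" and "-c". The two function names are made up for illustration. Putting `LC_ALL=C` in front of the command changes the locale for that one grep only and leaves the rest of the session alone. One caveat: in the C locale only ASCII letters are case-folded, so this is safe only when the pattern and the matches you care about are plain ASCII.

```shell
# Hypothetical helper: count case-insensitive matches under the session locale.
# $1 = pattern, $2 = file
count_ci_default() {
    grep -i -c -e "$1" -- "$2"
}

# Hypothetical helper: same search, but with the plain C locale forced for
# this single grep invocation.
count_ci_c_locale() {
    LC_ALL=C grep -i -c -e "$1" -- "$2"
}
```

In bash, `time count_ci_default V bigfile` followed by `time count_ci_c_locale V bigfile` (where `bigfile` stands in for your own file) shows the difference in user time; both should report the same count for an ASCII pattern.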
This User Gave Thanks to binlib For This Post:
# 6  
Old 02-25-2011
This problem has cropped up a few times recently.

The default locale that GNU utility programs such as "grep" and "sort" run under has changed to UTF-8 on many systems, which means that one "character" can take more than one byte. If your file definitely does not contain multibyte (non-ASCII) characters, you can massively improve performance by changing your locale back to the basic value of "C".

To check your current settings, run this command and look at the output:
Code:
locale
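
When reading that output, keep in mind the order in which the variables are consulted for character handling: LC_ALL wins if it is set and non-empty, then LC_CTYPE, then LANG, and if none is set the locale is "C". The sketch below just mirrors that lookup order in shell; `effective_ctype` is a made-up name, and it reports the variable's value rather than asking the C library, so treat it as a reading aid and not a replacement for `locale`.

```shell
# Hypothetical helper: print the locale name that decides character
# classification and case mapping, following the LC_ALL > LC_CTYPE > LANG order.
effective_ctype() {
    if [ -n "${LC_ALL:-}" ]; then
        printf '%s\n' "$LC_ALL"
    elif [ -n "${LC_CTYPE:-}" ]; then
        printf '%s\n' "$LC_CTYPE"
    elif [ -n "${LANG:-}" ]; then
        printf '%s\n' "$LANG"
    else
        # Nothing set: programs fall back to the C (POSIX) locale.
        printf 'C\n'
    fi
}
```

If this prints a name ending in something like "UTF-8", a case-insensitive grep in that session runs with multibyte handling unless the locale is overridden for the command.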

This User Gave Thanks to methyl For This Post:
# 7  
Old 02-25-2011
Quote:
Originally Posted by phil518
It is GNU grep. Wow, 200 times longer for case insensitive grep. Those routines sound very heavy.
Imagine every possible language UTF-8 supports, including ones where a letter's "case" has strict and complicated rules. All of that is what you're asking grep to check for when doing a case-insensitive search on UTF-8.
This User Gave Thanks to Corona688 For This Post: