| UNIX for Dummies Questions & Answers If you're not sure where to post a UNIX or Linux question, post it here. All UNIX and Linux newbies welcome !! |
#1
What's the Diff Between These Two Regexes?
Trying to understand what's happening here, but I cannot figure it out. I'm reading Mastering Regular Expressions, by Friedl, and he uses this as an example of how to grab quoted text:
Code:
egrep -o '"[^"]*"' ~/File.txt
...should pull in any quoted phrases. Match a literal double-quote, match anything not a double-quote until you hit the next literal double-quote. But, he says [^"]* can match a newline, thereby returning quoted text even if it crosses lines. If you want to keep it from crossing lines, you should use this:
Code:
egrep -o '"[^"\n]*"' ~/File.txt
But this is where my head starts to hurt because a star should never fail, right? In other words, if it hits a newline, isn't a newline zero (or more) occurrences of 'not a newline', thereby allowing the regex to keep chugging along?

But looking at the difference between what each regex returns, the differences don't seem to have anything to do with newlines. Check out the different results each regex pulls from this snippet:
Quote:
Code:
egrep -o '"[^"]*"' ~/File.txt
"What uncouth dialect is that?"
"The Doric."
egrep -o '"[^"\n]*"' ~/File.txt
"The Doric."
Why does the second regex miss the first quote? The first regex returns about twice as many hits as the second one, and they all appear to be valid, single line quotes.

I should also mention that I'm not trying to accomplish anything. My interest is purely academic, and I'm a total noob.

GNU grep
OS X 10.6.8
Original file is plain vanilla ASCII, each line ends in a newline.
#2
Regexes are greedy, so only the last line matches the class that excludes double quotes and newlines.
Try grep's -P or -z options to match across newlines. By the way, this is an area where regex functionality differs a bit between grep, awk, Perl, etc.
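To illustrate the -z suggestion, here is a minimal sketch. It assumes GNU grep, since -z is not a POSIX option and other greps may lack it or behave differently. The two-line sample text is made up for the example and is not from the original file.

```shell
# Hypothetical sample: one quoted phrase that spans two lines.
sample='He said "one
two" and left.'

# Default mode: grep examines each line separately, so no single line
# holds both quote marks and nothing is printed.
per_line=$(printf '%s\n' "$sample" | grep -Eo '"[^"]*"' || true)

# With -z (GNU grep), records end at NUL instead of newline, so the whole
# input is one record and [^"]* is free to run across the line break.
# tr turns the NUL that terminates each printed match into a newline.
whole_input=$(printf '%s\n' "$sample" | grep -Ezo '"[^"]*"' | tr '\0' '\n' || true)

printf 'per line:    [%s]\n' "$per_line"
printf 'whole input: [%s]\n' "$whole_input"
```

The `|| true` is only there so that a no-match exit status from grep does not abort a script running under `set -e`.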
#3
|
|||
|
|||
|
Quote:
The regular expression with the bracket expression [^"\n] does not match "What uncouth dialect is that?" because, within a bracket expression, the backslash ceases to be a special character. The sequence \n in [^"\n] does not represent a newline character but a backslash and an n, two separate characters. The bracket expression therefore matches any character that is not a quote, backslash, or n, so the n in "uncouth" prevents the match from occurring. For the nitty gritty on bracket expressions, refer to Regular Expressions, from which the following is extracted:
Quote:
Alister
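The effect described in this post can be reproduced with the two quoted lines from the question. This is only a sketch: `extract` is a hypothetical helper name introduced here, and it uses `grep -E` in place of `egrep`, which should behave the same way for these patterns.

```shell
# The two quoted lines shown in the question's output.
first='"What uncouth dialect is that?"'
second='"The Doric."'

# Hypothetical helper: print whatever pattern $1 extracts from line $2.
# "|| true" keeps a no-match exit status from aborting under "set -e".
extract() {
    printf '%s\n' "$2" | grep -Eo "$1" || true
}

# [^"] excludes only the double quote, so the whole phrase is matched.
plain_first=$(extract '"[^"]*"' "$first")

# [^"\n] excludes the double quote, the backslash, and the letter n.
# The n in "uncouth" stops the run before a closing quote is reached.
class_first=$(extract '"[^"\n]*"' "$first")

# "The Doric." contains no n and no backslash, so it still matches.
class_second=$(extract '"[^"\n]*"' "$second")

printf 'plain, first line:  [%s]\n' "$plain_first"
printf 'class, first line:  [%s]\n' "$class_first"
printf 'class, second line: [%s]\n' "$class_second"
```

Because grep hands the regex one line at a time without its trailing newline, the plain `[^"]*` form already cannot cross lines here.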
The Following User Says Thank You to alister For This Useful Post: sudon't (06-11-2012)
#4
|
|||
|
|||
|
@alister: My bad! Greediness indeed has nothing to do with this problem. Thanks for putting that right.
#5
|
||||
|
||||
|
@alister, it may be interesting to add that, like grep, sed is also line oriented, but it does have the ability to match \n, which can occur through the use of the N or H commands. But indeed \n loses its meaning within square brackets. \n is not part of POSIX regular expressions, but POSIX sed has an addition for this:
Quote:
POSIX awk goes even further: it extends POSIX regular expressions to include the C-language escape sequences, and they are valid within bracket expressions...
Quote:
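Both points can be tried from the shell. This is a rough sketch: the `one`/`two` input is made up, the quoted line is the one from the question, and the variable names are only for illustration.

```shell
# sed: N appends the next input line to the pattern space, separated by an
# embedded newline, and \n in the regex matches that newline.
joined=$(printf 'one\ntwo\n' | sed 'N;s/\n/+/')

# awk: \n inside the bracket expression means newline, so the letter n is
# not excluded and the whole quoted phrase is extracted.
awk_match=$(printf '%s\n' '"What uncouth dialect is that?"' |
    awk 'match($0, /"[^"\n]*"/) { print substr($0, RSTART, RLENGTH) }')

printf 'sed: %s\n' "$joined"
printf 'awk: %s\n' "$awk_match"
```

The awk line uses the same pattern that came up empty with egrep earlier in the thread, which shows how much the result depends on the tool's regex flavor.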
#6
|
|||
|
|||
|
Quote:
ex also has its own flavor. Even within the confines of the POSIX standard, there are quite a few RE flavors to master: basic, extended, sed, AWK, and also ex. Add to that proprietary extensions by implementations of the standard tools and the dynamic programming languages and you have quite a melange.
Regards,
Alister
#7
|
||||
|
||||
|
Quote:
It even turns out that there are different greps, which behave differently! A lifetime of Mac OS use has not prepared me for unix.
Tags: egrep, regex, regular expressions