Post 302653849 by sudon't on Sunday 10th of June 2012 07:33:36 PM
What's the Diff Between These Two Regexes?

Trying to understand what's happening here, but I cannot figure it out.
I'm reading Mastering Regular Expressions, by Friedl, and he uses this as an example of how to grab quoted text:
Code:
egrep -o '"[^"]*"' ~/File.txt

...which should pull out any quoted phrases: match a literal double-quote, then anything that is not a double-quote, up to the next literal double-quote.
But he says [^"]* can match a newline, so it returns quoted text even when the quote crosses lines. If you want to keep it from crossing lines, you should use this:
Code:
egrep -o '"[^"\n]*"' ~/File.txt

But this is where my head starts to hurt because a star should never fail, right? In other words, if it hits a newline, isn't a newline zero (or more) occurrences of 'not a newline', thereby allowing the regex to keep chugging along?
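On the star question, a small sketch may help. It swaps the newline for an ordinary letter x (an assumption made only so the test fits on a single line) and uses grep -E, which is equivalent to egrep. The star itself never fails, but whatever follows it still has to match at the spot where the star stopped:

```shell
# Two sample lines: one quoted string containing an x, one without.
sample='"abxcd"
"abcd"'

# [^"x]* happily matches "ab" and stops in front of the x. The closing "
# in the pattern then has to match that x, which it cannot, and giving
# back characters does not help either, so the first line has no match.
star_demo=$(printf '%s\n' "$sample" | grep -Eo '"[^"x]*"')

printf '%s\n' "$star_demo"
# prints: "abcd"
```

So zero occurrences is always allowed for the starred part, but the overall match still fails unless a closing double-quote comes next.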
But when I compare what each regex returns, the differences don't seem to have anything to do with newlines. Check out the different results each regex pulls from this snippet:
Quote:
those with whom he was most pleased. Having asked one Zeno, upon his
using some far-fetched phrases, "What uncouth dialect is that?" he
replied, "The Doric." For this answer he banished him to Cinara [354],
Code:
egrep -o '"[^"]*"' ~/File.txt
"What uncouth dialect is that?"
"The Doric."

egrep -o '"[^"\n]*"' ~/File.txt
"The Doric."

Why does the second regex miss the first quote? The first regex returns about twice as many hits as the second one, and they all appear to be valid, single-line quotes.
I should also mention that I'm not trying to accomplish anything in particular. My interest is purely academic, and I'm a total noob.

GNU grep
OS X 10.6.8
Original file is plain vanilla ASCII, each line ends in a newline.
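
One possible explanation, offered as a sketch rather than a definitive answer: under POSIX bracket-expression rules, which GNU grep follows, a backslash inside [...] is an ordinary character. That would make [^"\n] mean "anything except a double-quote, a backslash, or the letter n" rather than "anything except a double-quote or a newline". grep also reads its input one line at a time, so the pattern never gets to see a newline in the first place. The experiment below reuses the snippet from the post; the variable names are made up for illustration.

```shell
# The three lines quoted in the post.
snippet='those with whom he was most pleased. Having asked one Zeno, upon his
using some far-fetched phrases, "What uncouth dialect is that?" he
replied, "The Doric." For this answer he banished him to Cinara [354],'

# Original pattern: anything but a double-quote between the quotes.
all_quotes=$(printf '%s\n' "$snippet" | grep -Eo '"[^"]*"')

# Pattern from the post: \n inside brackets is read as backslash and n.
with_backslash_n=$(printf '%s\n' "$snippet" | grep -Eo '"[^"\n]*"')

# Same idea with the backslash dropped: exclude only " and the letter n.
with_plain_n=$(printf '%s\n' "$snippet" | grep -Eo '"[^"n]*"')

printf 'all quotes:\n%s\n' "$all_quotes"
printf 'with [^"\\n]:\n%s\n' "$with_backslash_n"
printf 'with [^"n]:\n%s\n' "$with_plain_n"
```

On this snippet the last two patterns give the same single hit, "The Doric.", because "What uncouth dialect is that?" contains the letter n. If that reading is right, the second regex would drop every quote containing an n or a backslash, which would fit it returning roughly half as many hits.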
 
