Understanding regex behaviour when using quantifiers

#1 chidori, 12-28-2012


Code:
# echo "Teest string" | sed 's/e*/=>replaced=</'
=>replaced=<Teest string

So, in the above code, sed inserts the replacement at the start of the line. Does that mean that the pattern e* settles for zero occurrences? Why did sed not replace the "ee" in Teest string?


Code:
# echo "Teest string" | sed 's/e*//g'
Tst string

How does it work when the global flag is turned on?
#2 DGPickett (Forum Advisor), 12-28-2012
Some flavors of regex have + for one or more, but you can just say 'ee*'. There is also '\{1,99\}' for 1 to 99 occurrences in the sed flavor. There must be about a dozen regex flavors, especially after the Perl guys dominated a POSIX version, so that the word edge '\>' became '\b': Regex Tutorial - \b Word Boundaries

There are even schemes to make the * lazy as opposed to the normal greedy behavior. Consider ksh/bash: ${pathname##*/} is greedy and leaves just the entry name, but ${pathname#*/} removes only the first slash and anything before it. These are shell patterns, not standard regexes, but I recall MULTICS qedx having a way to switch between aggressive and lazy matching back when. I wonder if regexes are older than UNIX?

The g flag says how many times to apply the substitution: once for every match on the line. You can also say 3 to skip to the third match before substituting. It has to do with the writing, not the matching. No flag is the same as 1. http://www.regular-expressions.info/possessive.html
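
For anyone who wants to try the two points above, here is a minimal sketch. The path is a made-up sample and the variable names are only for illustration; the numeric flag is used on a plain 'e' so the effect is easy to see.

Code:
```shell
#!/bin/sh
# Hypothetical sample path, chosen only for illustration.
pathname=/usr/local/bin/sed

# '##' removes the longest prefix matching '*/' (greedy): only the entry name is left.
greedy=${pathname##*/}

# '#' removes the shortest prefix matching '*/' (lazy): only the leading slash goes.
lazy=${pathname#*/}

# A numeric flag tells sed which match to substitute: here, only the second "e".
second=$(echo "Teest string" | sed 's/e/E/2')

printf '%s\n' "$greedy" "$lazy" "$second"
# sed
# usr/local/bin/sed
# TeEst string
```

Note that the shell expansions use pathname patterns, where * means "any string", not "zero or more of the previous item" as in a regex.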
#3 Don Cragun (Forum Staff, Moderator), 12-28-2012
 
Quote:
Originally posted by chidori:
Code:
# echo "Teest string" | sed 's/e*/=>replaced=</'
=>replaced=<Teest string

So, in the above code, sed inserts the replacement at the start of the line. Does that mean that the pattern e* settles for zero occurrences? Why did sed not replace the "ee" in Teest string?


Code:
# echo "Teest string" | sed 's/e*//g'
Tst string

How does it work when the global flag is turned on?

Despite what DGPickett said, perl had no effect on the description of regular expressions in the POSIX standards.

There are several variations on RE processing, but there are three main types (basic regular expressions [BREs], extended regular expressions [EREs], and pathname pattern matching) in the standards (POSIX, the Single UNIX Specification [SUS], the System V Interface Definition [SVID], and the Linux Standard Base [LSB]). According to POSIX, SUS and SVID, sed and a bunch of other utilities use BREs, awk and a bunch of other utilities use EREs, and the shell and a bunch of other utilities use pathname pattern matching when expanding pathnames. POSIX has about five full pages describing BREs, three and a half pages that describe the differences between BREs and EREs, another four and a half pages that give the formal grammar for the interpretation of BREs and EREs, and about two and a half pages that describe the differences between pattern matching and REs.

Some utilities (like grep) have options to choose between BREs, EREs, and fixed strings. Although not specified by the standards, some implementations have options for sed to choose between BREs and EREs, but using EREs with sed is not portable.
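
As a rough illustration of the BRE/ERE difference, here is a small sketch using grep, since its -E option is in the standard. The variable names are made up for the example, and it only assumes a grep that follows the standard's treatment of + in BREs.

Code:
```shell
#!/bin/sh
# In a BRE an unescaped '+' is an ordinary character,
# so 'e+' only matches an "e" followed by a literal plus sign.
bre_literal=$(printf 'e+\n' | grep -c 'e+')
bre_miss=$(printf 'Teest string\n' | grep -c 'e+' || true)

# The BRE spelling of "one or more" is an interval expression.
bre_interval=$(printf 'Teest string\n' | grep -c 'e\{1,\}')

# With -E the pattern is an ERE, where '+' means "one or more".
ere_plus=$(printf 'Teest string\n' | grep -E -c 'e+')

# grep -c prints the number of matching lines.
printf '%s %s %s %s\n' "$bre_literal" "$bre_miss" "$bre_interval" "$ere_plus"
# 1 0 1 1
```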

Meanwhile, back to your question. In the BREs used in sed, the expression e* matches zero or more occurrences of "e". The beginning of the string Teest string (before the "T") matches zero occurrences of "e", and sed takes the first match it finds when scanning from the left, so the pipeline:

Code:
echo "Teest string" | sed 's/e*/=>replaced=</'

produces:

Code:
=>replaced=<Teest string

and the command:

Code:
sed 's/e*/=>replaced=</g'

would replace every match of zero or more "e"s with your replacement string: the run "ee" and also each empty match elsewhere in the line. I.e.,
Code:
=>replaced=<T=>replaced=<s=>replaced=<t=>replaced=< =>replaced=<s=>replaced=<t=>replaced=<r=>replaced=<i=>replaced=<n=>replaced=<g=>replaced=<
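
One way to see what e* actually matched is to put & (the matched text) in the replacement. This is only a sketch; the second input string and the variable names are made up for the comparison.

Code:
```shell
#!/bin/sh
# '&' in the replacement stands for the text that the pattern matched.

# The leftmost match of 'e*' is the empty string before the "T".
at_start=$(echo "Teest string" | sed 's/e*/[&]/')

# Made-up input that starts with "e"s: the leftmost match is now the whole run.
at_run=$(echo "eeTst string" | sed 's/e*/[&]/')

printf '%s\n' "$at_start" "$at_run"
# []Teest string
# [ee]Tst string
```

So the match is greedy wherever it starts, but sed takes the leftmost starting point first, even if the match there is empty. Adding the g flag to the first command should mark every match on the line in the same way.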

Two portable sed pipelines to do what you were trying to do are:

Code:
echo "Teest string" | sed 's/e\{1,\}/=>replaced=</'

or (as DGPickett suggested):

Code:
echo "Teest string" | sed 's/ee*/=>replaced=</'

Both replace the first occurrence of one or more "e"s with the specified replacement string. In this case:

Code:
T=>replaced=<st string
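
A quick side-by-side check of the three patterns, as a sketch only. The short replacement X and the variable names are stand-ins chosen to keep the output readable.

Code:
```shell
#!/bin/sh
input="Teest string"

# 'e*' can match the empty string, so the replacement lands before the "T".
star=$(echo "$input" | sed 's/e*/X/')

# Both of these require at least one "e", so they skip ahead to the run "ee".
interval=$(echo "$input" | sed 's/e\{1,\}/X/')
doubled=$(echo "$input" | sed 's/ee*/X/')

# With an empty replacement and the g flag, every run of "e"s is deleted.
deleted=$(echo "$input" | sed 's/e\{1,\}//g')

printf '%s\n' "$star" "$interval" "$doubled" "$deleted"
# XTeest string
# TXst string
# TXst string
# Tst string
```

The last line is the same result as the original 's/e*//g' command, because replacing an empty match with nothing changes nothing.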

#4 jim mcnamara (Forum Staff), 12-28-2012
There are even more interesting aspects to regular expressions.
Consider reading the 'owl' book: Mastering Regular Expressions, 3rd Ed., by J. Friedl.
#5 chidori, 12-29-2012
:)

Quote:
Originally posted by jim mcnamara:
There are even more interesting aspects to regular expressions.
Consider reading the 'owl' book: Mastering Regular Expressions, 3rd Ed., by J. Friedl.

Thanks, I've got one already. I guess it's a good time to start reading it.

Thanks to all for the brief explanation of this.
#6 jim mcnamara (Forum Staff), 12-29-2012
FWIW: The standards issues with regex appear to be coming together well. Or better, anyway.

Basically when you are using UNIX tools, IMO, regex use has this sort of feel to it:

Code:
If today == Tuesday 
  then 
     we must be in Belgium
end if

This is the way UNIX was overall back in the 90's - XOPEN, SUS, SVID, SYSV, BSD, Torvalds etc.

Henry Spencer (a zoologist) wrote the first open source version of UNIX regex, which then allowed the creation of a cascade of modern regex "flavors". Larry Wall appears to have used Spencer's regex as a model for perl regex, for example.

So, if you understand the difference between extended regular expressions (ERE) and basic (BRE) you are well on the way.... to Belgium.
#7 DGPickett (Forum Advisor), 12-31-2012
This is shorter: the Wikipedia article "Regular expression".

QED went to MULTICS, probably before UNIX, and became qedx, which is very close to ed and ex. Later, Waterloo ported it to FRED.

grep -E is essentially egrep: both use EREs.