Regexp help -- (a*)(b*|(ab)*) and (a*)((b|ab)*) on "aabab"


 
# 1  
Old 03-23-2009

Hi all

A regular expression question -- (a*)(b*|(ab)*) and (a*)((b|ab)*) on "aabab"

First, (a*)(b*|(ab)*) on "aabab". I thought it would match "aa-b", but it turns out to match "a-abab". So why isn't a* greedy here?

Then I tried (a*)((b|ab)*). It matches "aa-bab", so this time a* is greedy. Why?
# 2  
Old 03-23-2009
It depends: different regular expression engines work differently.

Code:
% perl -ple's/(a*)(b*|(ab)*)/-->$1<-- -->$2<--/'<<<aabab
-->aa<-- -->b<--ab
% sed -r 's/(a*)(b*|(ab)*)/-->\1<-- -->\2<--/'<<<aabab  
-->a<-- -->abab<--
% awk '{print gensub(/(a*)(b*|(ab)*)/,"-->\\1<-- -->\\2<--",1)}'<<<aabab
-->a<-- -->abab<--

# 3  
Old 03-23-2009
Thanks for the reply.

TCL 8.5 also works in the way I described.

Which version of Perl do you use?
# 4  
Old 03-23-2009
v5.10.0 built for cygwin-thread-multi-64int for this example.

Tried also 5.005_03 built for sun4-solaris and v5.8.8 built for IA64.ARCHREV_0-thread-multi with the same result.

And, of course, the Perl regular expression engine is quite powerful. Consider the following:

Code:
$ perl -pe's/(a*?)(b*|(ab)*)/-->$1<-- -->$2<--/'<<<aabab
--><-- --><--aabab # empty match: 0 or more, non-greedy, so a*? takes 0 characters
$ perl -pe's/(a+?)(b*|(ab)*)/-->$1<-- -->$2<--/'<<<aabab
-->a<-- --><--abab # one or more, non-greedy


Last edited by radoulov; 03-23-2009 at 01:09 PM..
# 5  
Old 03-23-2009
Did you also try (a*)((b|ab)*) to see whether the difference is still there?
# 6  
Old 03-23-2009
Only the second backreference is different:

Code:
% perl -pe's/(a*)((b|ab)*)/-->$1<-- -->$2<--/'<<<aabab 
-->aa<-- -->bab<--


Last edited by radoulov; 03-23-2009 at 01:27 PM.. Reason: corrected
# 7  
Old 03-23-2009
This might be helpful:

Code:
% perl -M're debug' -pe'/(a*)(b*|(ab)*)/'<<<aabab
Compiling REx "(a*)(b*|(ab)*)"
Final program:
   1: OPEN1 (3)
   3:   STAR (6)
   4:     EXACT <a> (0)
   6: CLOSE1 (8)
   8: OPEN2 (10)
  10:   BRANCH (14)
  11:     STAR (25)
  12:       EXACT <b> (0)
  14:   BRANCH (FAIL)
  15:     CURLYM[3] {0,32767} (25)
  19:       EXACT <ab> (23)
  23:       SUCCEED (0)
  24:     NOTHING (25)
  25: CLOSE2 (27)
  27: END (0)
minlen 0 
Matching REx "(a*)(b*|(ab)*)" against "aabab%n"
   0 <> <aabab%n>            |  1:OPEN1(3)
   0 <> <aabab%n>            |  3:STAR(6)
                                  EXACT <a> can match 2 times out of 2147483647...
   2 <aa> <bab%n>            |  6:  CLOSE1(8)
   2 <aa> <bab%n>            |  8:  OPEN2(10)
   2 <aa> <bab%n>            | 10:  BRANCH(14)
   2 <aa> <bab%n>            | 11:    STAR(25)
                                      EXACT <b> can match 1 times out of 2147483647...
   3 <aab> <ab%n>            | 25:      CLOSE2(27)
   3 <aab> <ab%n>            | 27:      END(0)
Match successful!
aabab
Freeing REx: "(a*)(b*|(ab)*)"

Code:
% perl -M're debug' -pe'/(a*)((b|ab)*)/'<<<aabab      
Compiling REx "(a*)((b|ab)*)"
Final program:
   1: OPEN1 (3)
   3:   STAR (6)
   4:     EXACT <a> (0)
   6: CLOSE1 (8)
   8: OPEN2 (10)
  10:   CURLYX[1] {0,32767} (23)
  12:     OPEN3 (14)
  14:       TRIE-EXACT[ab] (20)
            <b> 
            <ab> 
  20:     CLOSE3 (22)
  22:   WHILEM[1/1] (0)
  23:   NOTHING (24)
  24: CLOSE2 (26)
  26: END (0)
minlen 0 
Matching REx "(a*)((b|ab)*)" against "aabab%n"
   0 <> <aabab%n>            |  1:OPEN1(3)
   0 <> <aabab%n>            |  3:STAR(6)
                                  EXACT <a> can match 2 times out of 2147483647...
   2 <aa> <bab%n>            |  6:  CLOSE1(8)
   2 <aa> <bab%n>            |  8:  OPEN2(10)
   2 <aa> <bab%n>            | 10:  CURLYX[1] {0,32767}(23)
   2 <aa> <bab%n>            | 22:    WHILEM[1/1](0)
                                      whilem: matched 0 out of 0..32767
   2 <aa> <bab%n>            | 12:      OPEN3(14)
   2 <aa> <bab%n>            | 14:      TRIE-EXACT[ab](20)
   2 <aa> <bab%n>            |          State:    1 Accepted:    0 Charid:  1 CP:  62 After State:    2
   3 <aab> <ab%n>            |          State:    2 Accepted:    1 Charid:  1 CP:   0 After State:    0
                                        got 1 possible matches
                                        only one match left: #1 <b>
   3 <aab> <ab%n>            | 20:      CLOSE3(22)
   3 <aab> <ab%n>            | 22:      WHILEM[1/1](0)
                                        whilem: matched 1 out of 0..32767
   3 <aab> <ab%n>            | 12:        OPEN3(14)
   3 <aab> <ab%n>            | 14:        TRIE-EXACT[ab](20)
   3 <aab> <ab%n>            |            State:    1 Accepted:    0 Charid:  2 CP:  61 After State:    3
   4 <aaba> <b%n>            |            State:    3 Accepted:    0 Charid:  1 CP:  62 After State:    4
   5 <aabab> <%n>            |            State:    4 Accepted:    1 Charid:  2 CP:   0 After State:    0
                                          got 1 possible matches
                                          only one match left: #2 <ab>
   5 <aabab> <%n>            | 20:        CLOSE3(22)
   5 <aabab> <%n>            | 22:        WHILEM[1/1](0)
                                          whilem: matched 2 out of 0..32767
   5 <aabab> <%n>            | 12:          OPEN3(14)
   5 <aabab> <%n>            | 14:          TRIE-EXACT[ab](20)
                                            failed to match trie start class...
                                          whilem: failed, trying continuation...
   5 <aabab> <%n>            | 23:          NOTHING(24)
   5 <aabab> <%n>            | 24:          CLOSE2(26)
   5 <aabab> <%n>            | 26:          END(0)
Match successful!
aabab
Freeing REx: "(a*)((b|ab)*)"

For more information on how to interpret the output, try:

Code:
perldoc perldebguts|less -p'regular'


Last edited by radoulov; 03-23-2009 at 01:50 PM..