Regexp help -- (a*)(b*|(ab)*) and (a*)((b|ab)*) on "aabab"


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting Regexp help -- (a*)(b*|(ab)*) and (a*)((b|ab)*) on "aabab"
# 8  
Old 03-24-2009
I also tried python 2.5.2 for (a*)(b*|(ab)*) and it was aab, same as perl.
Actually aa-b was I expected and I don't understand why tcl/awk/sed will get a-abab.

Any idea ?

Last edited by qiulang; 03-25-2009 at 05:48 AM..
# 9  
Old 03-24-2009
Yes,
read about Regex Engine Types.
Observe the differences between GNU sed and super sed:

Code:
% sed -r 's/(a*)(b*|(ab)*)/-->\1<-- -->\2<--/'<<<aabab  
-->a<-- -->abab<--

% ssed -r 's/(a*)(b*|(ab)*)/-->\1<-- -->\2<--/'<<<aabab
-->aa<-- -->b<--ab

NFA and DFA:

Code:
% sed -r 's/(nfa|nfa not)/-->\1<--/'<<<'nfa not'      
-->nfa not<--

% ssed -r 's/(nfa|nfa not)/-->\1<--/'<<<'nfa not'  
-->nfa<-- not

# 10  
Old 03-24-2009
NFA vs DFA

Thanks for pointing out NFA & DFA difference. I document what I learn here in case someone out there is as boring as I am. Smilie
So NFA is straightforward. It tries the patterns in question one by one. In the (a*)(b*|(ab)*) against aabab case, NFA first tries a*, then (b*|(ab)*). And when it comes to alternation, traditional NFA engine chooses the ordered one but not the greedy one. So in the case (nfa|nfa not) against "nfa not", it chooses nfa. So (a*)(b*|(ab)*) against aabab is aab

DFA is some what counterintuitive. It scans the string to rule out the patterns in question and if there exists multi-matches, DFA finds the longest possible match (for alternation, DFA is greedy not ordered). Take the (a*)(b*|(ab)*) against aabab case for instance, starting from aa, there exists 2 possible matches, aa-b and a-abab. So it chooses a-abab.

And I think the case (a*)((b|ab)*)against aabab is a more interesting case now. Because both aa-bab and a-abab match aabab and they are the same length. So why choose aa-bab over a-abab ? I still don't have a clear answer for it.
Login or Register to Ask a Question

Previous Thread | Next Thread

9 More Discussions You Might Find Interesting

1. AIX

Apache 2.4 directory cannot display "Last modified" "Size" "Description"

Hi 2 all, i have had AIX 7.2 :/# /usr/IBMAHS/bin/apachectl -v Server version: Apache/2.4.12 (Unix) Server built: May 25 2015 04:58:27 :/#:/# /usr/IBMAHS/bin/apachectl -M Loaded Modules: core_module (static) so_module (static) http_module (static) mpm_worker_module (static) ... (3 Replies)
Discussion started by: penchev
3 Replies

2. Shell Programming and Scripting

Bash script - Print an ascii file using specific font "Latin Modern Mono 12" "regular" "9"

Hello. System : opensuse leap 42.3 I have a bash script that build a text file. I would like the last command doing : print_cmd -o page-left=43 -o page-right=22 -o page-top=28 -o page-bottom=43 -o font=LatinModernMono12:regular:9 some_file.txt where : print_cmd ::= some printing... (1 Reply)
Discussion started by: jcdole
1 Replies

3. UNIX for Dummies Questions & Answers

Using "mailx" command to read "to" and "cc" email addreses from input file

How to use "mailx" command to do e-mail reading the input file containing email address, where column 1 has name and column 2 containing “To” e-mail address and column 3 contains “cc” e-mail address to include with same email. Sample input file, email.txt Below is an sample code where... (2 Replies)
Discussion started by: asjaiswal
2 Replies

4. Shell Programming and Scripting

Regexp to separated rows by "asterisks-new line" in awk

Hello to all, I have the text file below, how would be the REGEXP to set the RS to separate registers by asterisks-newline-asterisks (highlighted in red) and FS as the default, in order that the fourth field ($4) always be the number after REG (in blue)? I'm trying with code below, but is... (5 Replies)
Discussion started by: Ophiuchus
5 Replies

5. Shell Programming and Scripting

how to use "cut" or "awk" or "sed" to remove a string

logs: "/home/abc/public_html/index.php" "/home/abc/public_html/index.php" "/home/xyz/public_html/index.php" "/home/xyz/public_html/index.php" "/home/xyz/public_html/index.php" how to use "cut" or "awk" or "sed" to get the following result: abc abc xyz xyz xyz (8 Replies)
Discussion started by: timmywong
8 Replies

6. Shell Programming and Scripting

awk command to replace ";" with "|" and ""|" at diferent places in line of file

Hi, I have line in input file as below: 3G_CENTRAL;INDONESIA_(M)_TELKOMSEL;SPECIAL_WORLD_GRP_7_FA_2_TELKOMSEL My expected output for line in the file must be : "1-Radon1-cMOC_deg"|"LDIndex"|"3G_CENTRAL|INDONESIA_(M)_TELKOMSEL"|LAST|"SPECIAL_WORLD_GRP_7_FA_2_TELKOMSEL" Can someone... (7 Replies)
Discussion started by: shis100
7 Replies

7. Shell Programming and Scripting

Regexp to match all characters including "?"

Hi everyone, I'm almost tearing my hairs to find a valid regexp which will match EVERY character in a string, including the question mark! Specifically I need to match a string which contains the word (example) "stringtobematched" at the end of it. Everyone would suggest this: ... (4 Replies)
Discussion started by: lycaon
4 Replies

8. Shell Programming and Scripting

cat $como_file | awk /^~/'{print $1","$2","$3","$4}' | sed -e 's/~//g'

hi All, cat file_name | awk /^~/'{print $1","$2","$3","$4}' | sed -e 's/~//g' Can this be done by using sed or awk alone (4 Replies)
Discussion started by: harshakusam
4 Replies

9. UNIX for Dummies Questions & Answers

Explain the line "mn_code=`env|grep "..mn"|awk -F"=" '{print $2}'`"

Hi Friends, Can any of you explain me about the below line of code? mn_code=`env|grep "..mn"|awk -F"=" '{print $2}'` Im not able to understand, what exactly it is doing :confused: Any help would be useful for me. Lokesha (4 Replies)
Discussion started by: Lokesha
4 Replies
Login or Register to Ask a Question