The UNIX and Linux Forums  




#1
07-23-2008
Registered User
 

Join Date: Nov 2007
Posts: 69
q with Perl Regex

For a programming exercise, I am meant to design a Perl script that detects double letters in a text file.

I tried the following expressions

Code:
# Check for any double letter within the alphabet

/[a-zA-Z]+/

# Check for any repetition of an alphanumeric character

/\w+/
I'm aware that the + means to search for one or more occurrences of that character; however, neither of these patterns met the requirements of my program.

Also

Code:
/[a-zA-Z]{1}/
did not help either.

After doing some searching, I stumbled across the correct form of the regex for the double letter case. It turned out to be

Code:
/(.)\1/
Now, I know that . refers to any single character and that \1 refers to the first character in the line being read (if s/..../.... is being used), but I'm still puzzled as to why /(.)\1/ works for double letters and /[a-zA-Z]+/ does not.

Many thanks,
James
#2
07-23-2008
Moderator
 

Join Date: Sep 2002
Location: Hong Kong, China
Posts: 1,430
Quote:
Originally Posted by JamesGoh View Post
Now I know that . refers to any single character and the \1 refers to the first character in the line being read (if s/..../.... is being used), but Im still puzzled as to why /(.)\1/ works instead of /[a-zA-Z]+/ for the case of double letters ?
* Incorrect text removed *

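/[a-zA-Z]+/ only means "a contiguous run of one or more letters", so not only 'AA' or 'zz' will match, but 'Az' will too. In fact, because + also accepts a run of length one, any string that contains at least one letter matches.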
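A quick way to see the difference is to run a few strings through both patterns. This is a minimal sketch; the sample words are made up for illustration. All three words match /[a-zA-Z]+/, while only 'book' matches /(.)\1/.

```perl
use strict;
use warnings;

# Sample words (made up): one with a double letter, two without.
for my $w ('book', 'Az', 'cat') {
    # /[a-zA-Z]+/ succeeds as soon as the string contains at least one letter.
    my $run = $w =~ /[a-zA-Z]+/ ? 'yes' : 'no';
    # /(.)\1/ succeeds only if some character is immediately followed by itself.
    my $double = $w =~ /(.)\1/ ? 'yes' : 'no';
    printf "%-5s [a-zA-Z]+: %-3s (.)\\1: %s\n", $w, $run, $double;
}
```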

Last edited by cbkihong; 07-23-2008 at 10:28 PM. Reason: Incorrect text removed
#3
07-23-2008
Registered User
 

Join Date: Jan 2008
Posts: 306
\1 is a backreference to whatever was matched by the first pair of parentheses in the regexp. So /(.)\1/ finds a double occurrence of whatever (.) matched. It is similar to $1, but it is used inside the regexp itself. It is discussed in some detail here:

perlretut - perldoc.perl.org
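Putting that together for the original exercise, a double-letter finder could look roughly like the sketch below. It assumes the text files are named on the command line; the subroutine name find_doubles and the output format are made up for illustration. Note that (.) matches any character, so doubled digits, spaces or punctuation are reported as well.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return every doubled character in $line, in order of appearance.
# Inside the pattern, \1 requires the next character to repeat what (.) captured;
# after each successful match, $1 holds that captured character.
sub find_doubles {
    my ($line) = @_;
    my @found;
    while ($line =~ /(.)\1/g) {
        push @found, $1;
    }
    return @found;
}

# Scan the files named on the command line (nothing happens if none are given).
if (@ARGV) {
    while (my $line = <>) {
        chomp $line;
        my @doubles = find_doubles($line);
        # $ARGV is the current file name and $. the current line number.
        printf "%s:%d: %s\n", $ARGV, $., join(', ', map { "$_$_" } @doubles)
            if @doubles;
        close ARGV if eof;    # reset $. at the end of each file
    }
}
```

Run it as perl doubles.pl notes.txt (both file names are hypothetical); each line containing a double is printed with its file name, line number and the doubled characters.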
#4
07-23-2008
Registered User
 

Join Date: Jan 2008
Posts: 306
Quote:
Originally Posted by cbkihong View Post
Actually, not even /(.)\1/ is correct. In Perl, you should use /(.)$1/. The former syntax is there for compatibility with I think awk or sed but that should in general not be used in Perl, because Perl has more uses of backslash that may interfere with backtracking.
That is not correct. Using \1 is perfectly good Perl code. \1 and $1 really have two separate uses; see the link I posted in my previous post. A short test shows they do not do the same thing:

Code:
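use strict;

$_ = 'foobar';
# $1 is interpolated into the pattern *before* the match runs. No earlier
# match has set it, so it is empty and the pattern is effectively /(.)/.
if (/(.)$1/) {
   print "\$1 = $1", "\n";    # prints: $1 = f
}
# \1 is a backreference checked *during* the match: the next character must
# equal whatever (.) just captured, so this finds the "oo" in "foobar".
if (/(.)\1/) {
   print "\\1 = $1", "\n";    # prints: \1 = o
}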
output:

Code:
$1 = f
\1 = o
#5
07-23-2008
Registered User
 

Join Date: Nov 2007
Posts: 69
Thanks everyone for your messages.

I also found that re-reading my notes more carefully was very helpful!
#6
07-23-2008
Registered User
 

Join Date: Jan 2008
Posts: 306
This does not work:

/[a-zA-Z]+/

because it means one or more of the characters inside the square brackets: any of them, in any order. You want to find the same character repeated twice in a row, not a run of any characters from inside the [] brackets.
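/(.)\1/ answers the question as asked, but . also matches digits, spaces and punctuation, so "1100" or two spaces in a row would count as a "double letter". If the exercise really means letters only (an assumption about its requirements), one option is to put your original character class inside the capturing group. A minimal sketch, with a made-up helper name has_double_letter:

```perl
use strict;
use warnings;

# Capture exactly one letter, then require the same letter again via \1.
# Case-sensitive: 'Aa' is not treated as a double letter here.
sub has_double_letter {
    my ($s) = @_;
    return $s =~ /([a-zA-Z])\1/ ? 1 : 0;
}

for my $s ('book', '1100', 'a  b', 'Aa') {
    printf "%-6s %s\n", "'$s'", has_double_letter($s) ? 'double letter' : 'none';
}
```

Only 'book' is reported. Whether 'Aa' should count is up to the exercise; in Perl, adding the /i modifier makes the backreference match case-insensitively as well.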
#7
07-23-2008
Moderator
 

Join Date: Dec 2003
Location: /dev/florida
Posts: 946
Interesting and thoughtful question. You use "(" and ")" to mark (remember) part of a pattern, and recall what was remembered with "\" followed by the group's number, e.g. \1 for the first pair of parentheses (a backreference).

In your particular case, "(.)\1" means remember a character and recall the character.

You can extend this method to find words with several double letters. '(.)\1(.)\2(.)\3' will match any word with three double letters in a row, e.g. bookkeeper.
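Because the three groups in that pattern sit right next to each other, it only finds back-to-back doubles. The sketch below, using made-up sample words, compares it with a variant that allows anything in between (.*) and with a global match that lists every double letter. Here 'bookkeeper' matches both patterns, 'committee' matches only the second, and 'book' matches neither.

```perl
use strict;
use warnings;

# Sample words (made up for illustration).
for my $word ('bookkeeper', 'committee', 'book') {
    # Three double letters back to back, as in the post above.
    my $consecutive = $word =~ /(.)\1(.)\2(.)\3/ ? 'yes' : 'no';
    # Three double letters anywhere, with anything (or nothing) in between.
    my $anywhere = $word =~ /(.)\1.*(.)\2.*(.)\3/ ? 'yes' : 'no';
    # In list context, /g returns the (.) capture of every match.
    my @doubles = $word =~ /(.)\1/g;
    printf "%-10s consecutive: %-3s anywhere: %-3s doubles: %s\n",
        $word, $consecutive, $anywhere, join(',', @doubles);
}
```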
All times are GMT -7.


The UNIX and Linux Forums Content Copyright ©1993-2008 The CEP Blog All Rights Reserved