    #1  
Old 08-03-2012
Registered User
 
Join Date: Feb 2011
Posts: 153
Thanks: 68
Thanked 1 Time in 1 Post
How do we write an exception in a regex?

Hello,
This is a follow-up to my earlier request about identifying sentence boundaries while generating snippets for a search engine. The basic regex I have written to delimit sentence boundaries handles numbers and acronyms, but I cannot get it to handle cases such as:
Quote:
Mr. Andrew visited me.
Mrs. Smith left for London.
The full stops after Mr. and Mrs. are automatically treated as sentence delimiters, which is not desirable.
I tried the following syntax:

Code:
!(Dr\.|Mr\.|Mrs\.|Ms\.|[A-Z]\.|i\.e\.|w\.r\.t\.|e\.g\.|etc\.|viz\.)

to make the regex ignore a full stop after the cases enumerated, but it does not work.
In fact, the simple regex I had written has become murky and no longer works.
Any help in correcting the regex would be appreciated.

Some sample sentences are given below:
Quote:
Mr. Andrew came.
Ms. Smith left for London.
He brought three things viz. bread, cheese and wine
This is w.r.t. your application
    #2  
Old 08-03-2012
Mead Rotor
 
Join Date: Aug 2005
Location: Saskatchewan
Posts: 16,374
Thanks: 491
Thanked 2,535 Times in 2,418 Posts
Instead of writing things for the regex not to match, try getting something else in your regex to match them first. Regexes match greedily, so whatever consumes the text first 'wins'.

What language is this regex for? This works in grep:


Code:
$ echo "Mr. Andrew visited me.  fleeb narf stuff." | egrep -o "([a-zA-Z]|(Mr|Ms|Dr|Mrs)[.]| )*[.]"
Mr. Andrew visited me.
  fleeb narf stuff.

$

A simplified example, but hopefully it conveys the idea.

Just a preference of mine, but I sometimes find it clearer to put special characters in [] than to escape them to make them literal.
The Following User Says Thank You to Corona688 For This Useful Post:
gimley (08-03-2012)
    #3  
Old 08-03-2012
Registered User
 
Join Date: Feb 2011
Posts: 153
Thanks: 68
Thanked 1 Time in 1 Post
Many thanks. It works beautifully in egrep, but dies in Java. I wonder why. Does anybody know whether Java requires a special regex syntax?
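One possible explanation, offered as a guess since the Java code was not posted: POSIX egrep picks the longest match among alternatives, while java.util.regex tries alternatives left to right and keeps the first one that lets the overall match succeed. With the egrep pattern unchanged, Java therefore consumes "M" and "r" as plain letters and stops at the first full stop. A minimal sketch, assuming the pattern is used with Matcher.find(); the class, field and method names are made up for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ConsumeFirstDemo {
    // The egrep pattern from post #2, unchanged.
    static final Pattern POSIX_ORDER =
        Pattern.compile("([a-zA-Z]|(Mr|Ms|Dr|Mrs)[.]| )*[.]");

    // Same pattern with the title alternative moved to the front, so that
    // Java's left-to-right alternation tries "Mr." before a single letter.
    static final Pattern TITLES_FIRST =
        Pattern.compile("((Mr|Ms|Dr|Mrs)[.]|[a-zA-Z]| )*[.]");

    // Collect every match of p in text, trimming surrounding spaces.
    static List<String> findAll(Pattern p, String text) {
        List<String> out = new ArrayList<>();
        Matcher m = p.matcher(text);
        while (m.find()) {
            out.add(m.group().trim());
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "Mr. Andrew visited me.  fleeb narf stuff.";
        // prints [Mr., Andrew visited me., fleeb narf stuff.]
        System.out.println(findAll(POSIX_ORDER, text));
        // prints [Mr. Andrew visited me., fleeb narf stuff.]
        System.out.println(findAll(TITLES_FIRST, text));
    }
}
```

Like the egrep original, this is a simplified pattern: it only knows letters, spaces and four titles, so real text with digits or commas would need a wider character class.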
    #4  
Old 08-04-2012
Mead Rotor
 
Join Date: Aug 2005
Location: Saskatchewan
Posts: 16,374
Thanks: 491
Thanked 2,535 Times in 2,418 Posts
Regex really isn't the same everywhere. It might have been a good idea to mention from the start that you were using Java.
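As an illustration of how the dialects differ: java.util.regex supports negative lookbehind, which egrep does not, and that is one way to write the "exception" the original question asked for. The sketch below splits on whitespace that follows a full stop unless that full stop ends a listed abbreviation. It is only a sketch under assumptions: the abbreviation list is a subset of the one in post #1, the class and method names are invented, and the sample text adds the final full stops that two of the original sample sentences lack.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class LookbehindSplitDemo {
    // A sentence boundary is whitespace that
    //   (?<=[.])   comes right after a full stop, and
    //   (?<!...)   does NOT come right after one of the abbreviations.
    // Java needs the lookbehind to have a bounded length, which a fixed
    // list of abbreviations satisfies.
    static final Pattern BOUNDARY = Pattern.compile(
        "(?<=[.])"
        + "(?<!\\b(?:Mr|Mrs|Ms|Dr|viz|i\\.e|e\\.g|w\\.r\\.t)[.])"
        + "\\s+");

    // Split text into sentences at the boundaries defined above.
    static List<String> sentences(String text) {
        return Arrays.asList(BOUNDARY.split(text.trim()));
    }

    public static void main(String[] args) {
        String text = "Mr. Andrew came. Ms. Smith left for London. "
            + "He brought three things viz. bread, cheese and wine. "
            + "This is w.r.t. your application.";
        // Prints one sentence per line (four lines for this sample).
        for (String s : sentences(text)) {
            System.out.println(s);
        }
    }
}
```

The trade-off is that every abbreviation has to be listed by hand, and an abbreviation that genuinely ends a sentence will suppress the split there.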