regular expression matching whole words

05-25-2012

Hi

Consider the file

Code:

this is a good line

when running

Code:

grep '\b(good|great|excellent)\b' file5

I expect it to match the line, but it doesn't. What am I doing wrong?
(Ultimately this regex will be in an awk script; I'm just using grep to test it.)

Thanks,

Storms

05-25-2012

For grep to work with regular expressions you need to enable it (preferred) or use egrep:

Code:

grep -E "(good|great|excellent)" filename
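A minimal sketch of how the original test could be run while keeping the whole-word requirement. It assumes GNU grep, where `\b` is an extension to the regex syntax and `-w` restricts matches to whole words; neither is required by POSIX, so other greps may behave differently. The sample lines are made up for illustration.

```shell
# Sample input (hypothetical): one line with the whole word, one with a longer word.
sample='this is a good line
these are goods'

# -E enables ERE, so ( | ) act as grouping and alternation.
# \b is a GNU grep extension that matches at a word boundary.
with_b=$(printf '%s\n' "$sample" | grep -E '\b(good|great|excellent)\b')

# -w asks grep itself to accept only whole-word matches (also not in POSIX).
with_w=$(printf '%s\n' "$sample" | grep -Ew 'good|great|excellent')

printf '%s\n' "$with_b"
printf '%s\n' "$with_w"
```

Both commands should select only the first sample line, since "goods" contains the word but does not end at a word boundary.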

agama

05-25-2012

Quote:

Originally Posted by agama

For grep to work with regular expressions you need to enable it (preferred) or use egrep:

Code:

grep -E "(good|great|excellent)" filename

Sorry for my denseness, but how can I get it to work in the awk script? The following doesn't seem to match the line:

Code:

if ($0 ~ /^.*\b(good|two|three)\b.*$/) { print "match" }

Storms

05-25-2012

The \b escape pattern doesn't work in my version of awk. I prefer match() to the ~ syntax, but either should work:

Code:

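awk '
    {
        # Form 1: the ~ operator with a regex literal
        if( $0 ~ /[[:space:]](foo|bar|goo)[[:space:]]/ )
            print $0;

        # Form 2: match() with the same regex given as a string
        # (a line that matches is therefore printed twice)
        if( match( $0, "[[:space:]](foo|bar|goo)[[:space:]]" ) )
            print;
    }
'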

Note that the leading ^.* and trailing .*$ are unneeded. The leading space implies that none of these words can be at the beginning of the line, while the trailing space implies that they may not be the last word on the line. If you need either, change it to something like:

Code:

if( match( $0, "[[:space:]]*(foo|bar|goo)[[:space:]]*" ) )

to indicate that zero or more space characters may precede/follow the word.
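As background on why `\b` fails here: in awk, `\b` inside a regular expression is the escape sequence for a backspace character, not a word boundary. A small sketch to check this on a given awk; the sample input is made up for illustration.

```shell
# The sample line contains no backspace, so /\bgood\b/ finds nothing.
plain=$(printf 'this is a good line\n' | awk '/\bgood\b/ { print "match" }')

# Surround the word with literal backspace characters and the same regex matches.
# (printf turns \b in its format string into a backspace character.)
with_bs=$(printf 'a \bgood\b line\n' | awk '/\bgood\b/ { print "match" }')

printf 'plain: [%s]\n' "$plain"
printf 'with backspaces: [%s]\n' "$with_bs"
```

If the first result is empty and the second is `match`, the awk in use reads `\b` as a backspace, which explains why the original pattern never matched ordinary text.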

agama

05-25-2012

Thanks for that. After your reply I did some further googling and found that \y works in place of \b in awk. I'm using this to match whole words, so it matches good but not goodd:

Code:

if (match($0, /\y(good|excellent|three)\y/)) { print "match", $0 }
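A quick way to check the claim above, assuming GNU awk is installed under the name `gawk` (`\y` is specific to gawk). The sample lines are made up, and the check is skipped when gawk is not available.

```shell
if command -v gawk >/dev/null 2>&1; then
    # \y matches the empty string at either edge of a word (gawk only).
    hits=$(printf '%s\n' 'a good line' 'a goodd line' 'excellent!' |
        gawk '/\y(good|excellent|three)\y/ { print }')
    printf '%s\n' "$hits"
else
    echo 'gawk not found; skipping the \y check' >&2
fi
```

With gawk present, the line containing "goodd" should be the only one left out; the trailing "!" does not prevent a match because punctuation is not part of a word.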

Storms

05-26-2012

Quote:

Originally Posted by agama

For grep to work with regular expressions you need to enable it (preferred) or use egrep:

Code:

grep -E "(good|great|excellent)" filename

grep uses basic regular expressions (BRE) by default. Did you mean extended regular expressions (ERE), which support alternation (|) and are enabled with the "-E" switch?
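The BRE/ERE difference can be seen with a small sketch like the following. The sample text is invented for illustration; the second line deliberately contains the pattern as literal characters.

```shell
sample='this is a good line
a literal (good|great) string'

# BRE (default): ( | ) are ordinary characters, so only the literal text matches.
bre_hits=$(printf '%s\n' "$sample" | grep -c '(good|great)')

# ERE (-E): ( | ) mean grouping and alternation, so both lines match.
ere_hits=$(printf '%s\n' "$sample" | grep -Ec '(good|great)')

echo "BRE matching lines: $bre_hits, ERE matching lines: $ere_hits"
```

This is why the original command found nothing: without -E, grep was looking for the parentheses and bars themselves.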

Quote:

Originally Posted by agama

[..] If you need either, change it to something like:

Code:

if( match( $0, "[[:space:]]*(foo|bar|goo)[[:space:]]*" ) )

to indicate that zero or more space characters may precede/follow the word.

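That will not fly, since "may" allows too much liberty: [[:space:]]* can also match nothing at all, so the pattern succeeds wherever the word occurs, even inside a longer word. A word like "goods" would match too. And what about punctuation? What constitutes a word?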
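A sketch of the problem, assuming an awk that supports POSIX character classes such as [[:space:]]. The stricter pattern in the second command is only an illustration of requiring whitespace or a line edge on each side; it is not something proposed in the thread, and it still ignores punctuation.

```shell
# Loose pattern: both [[:space:]]* parts may match nothing, so this is a substring test.
loose=$(printf '%s\n' 'goods' 'a goo here' |
    awk '/[[:space:]]*(foo|bar|goo)[[:space:]]*/ { print }')

# Stricter variant: require whitespace or a line edge on each side of the word.
strict=$(printf '%s\n' 'goods' 'a goo here' 'goo' |
    awk '/(^|[[:space:]])(foo|bar|goo)([[:space:]]|$)/ { print }')

printf 'loose:\n%s\n' "$loose"
printf 'strict:\n%s\n' "$strict"
```

The loose pattern accepts "goods", while the stricter one accepts only the lines where "goo" stands alone. A line such as "goo, then" would still be rejected by the stricter pattern, which is the punctuation question raised above.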

Quote:

Originally Posted by Storms

Thanks for that. After your reply I did some further googling and found that \y works in place of \b in awk. I'm using this to match whole words, so it matches good but not goodd:

Code:

if (match($0, /\y(good|excellent|three)\y/)) { print "match", $0 }

\y is a GNU extension and will not work across awks. An alternative would be to use \< and \> instead:

Code:

gawk '/\<(good|excellent|three)\>/{ print "match", $0 }'

But this isn't universal either.

A universal awk approach would be something like this, I guess:

Code:

awk -F'[[:space:][:punct:]]+' '{for(i=1;i<=NF;i++)if($i~/^(good|great|excellent)$/){print; next}}'

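A special case would perhaps need to be made for the underscore character: [:punct:] includes it, so it acts as a separator here, whereas \y and \< \> treat it as part of a word.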
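A sketch of the field-splitting idea on a few made-up lines, written with `+` as the quantifier so that a separator is always at least one character long. The function name `words_match` is hypothetical, and the awk in use is assumed to support POSIX character classes.

```shell
words_match() {
    # Split each line on runs of whitespace/punctuation, then compare
    # every resulting field against the word list as a whole string.
    awk -F'[[:space:][:punct:]]+' '{
        for (i = 1; i <= NF; i++)
            if ($i ~ /^(good|great|excellent)$/) { print; next }
    }'
}

result=$(printf '%s\n' 'Good? No: great!' 'these are goods' 'good_enough' | words_match)
printf '%s\n' "$result"
```

The first line is selected because of "great" (the match is case-sensitive, so "Good" does not count), "goods" is rejected, and "good_enough" is selected because the underscore splits it into "good" and "enough". That last result is the underscore caveat in practice.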

Last edited by Scrutinizer; 05-26-2012 at 03:20 AM..

Scrutinizer
