regular expression matching whole words


 
# 1  
Old 05-25-2012
regular expression matching whole words

Hi

Consider the file

Code:
this is a good line

when running
Code:
grep '\b(good|great|excellent)\b' file5

I expect it to match the line, but it doesn't. What am I doing wrong?
(Ultimately this regex will be in an awk script; I'm just using grep to test it.)

Thanks,

Storms
# 2  
Old 05-25-2012
For grep to work with regular expressions you need to enable it (preferred) or use egrep:

Code:
grep -E "(good|great|excellent)" filename
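To keep the whole-word requirement from the original question, the alternation can be combined with grep's word-match option. This is only a minimal sketch: it assumes a grep that supports -w (GNU and BSD grep do, but the option is not required by POSIX), and whole_words is a hypothetical helper name used just for this illustration.

```shell
# whole_words is a hypothetical helper name, not part of grep.
whole_words() {
    # -E enables extended regular expressions, so | needs no backslash.
    # -w accepts a match only when it forms a whole word.
    grep -wE 'good|great|excellent'
}

# Sample input made up for this sketch.
printf '%s\n' 'this is a good line' 'these are goods' 'an excellent one' | whole_words
```

The line containing "goods" is filtered out, because the match "good" is followed by another word character.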

# 3  
Old 05-25-2012
Quote:
Originally Posted by agama
For grep to work with regular expressions you need to enable it (preferred) or use egrep:

Code:
grep -E "(good|great|excellent)" filename

Sorry for my denseness, but how can I get it to work in the awk script? The following doesn't seem to match the line:
Code:
if ($0 ~ /^.*\b(good|two|three)\b.*$/) { print "match" }

# 4  
Old 05-25-2012
The \b escape pattern doesn't work in my version of awk. I prefer match() to the ~ syntax, but either should work:

Code:
awk '
    {
        if( $0 ~ /[[:space:]](foo|bar|goo)[[:space:]]/ )
            print "" $0;

        if( match( $0, "[[:space:]](foo|bar|goo)[[:space:]]" ) )
            print;
    }
'



Note that the leading ^.* and trailing .*$ are unneeded. The leading space implies that none of these words can be at the beginning of the line, while the trailing space implies that they may not be the last word on the line. If you need either, change to something like:

Code:
if( match( $0, "[[:space:]]*(foo|bar|goo)[[:space:]]*" ) )



to indicate that zero or more space characters may precede/follow the word.
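Another way to allow the word at either end of the line while still requiring a separator next to it is to pair each space class with an anchor. This is a sketch only, assuming an awk that understands POSIX character classes such as [[:space:]]; has_word is a hypothetical helper name.

```shell
# has_word is a hypothetical helper name used only for this sketch.
has_word() {
    awk '
        # (^|[[:space:]]) : start of line, or a whitespace character
        # ([[:space:]]|$) : a whitespace character, or end of line
        /(^|[[:space:]])(foo|bar|goo)([[:space:]]|$)/ { print }
    '
}

# Sample input made up for this sketch.
printf '%s\n' 'foo first' 'ends with bar' 'a goo here' 'food fight' 'rebar' | has_word
```

Here "food fight" and "rebar" are rejected because the word is glued to another letter. Punctuation next to the word is not handled by this pattern.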
# 5  
Old 05-25-2012
thanks for that, after your reply i did some further googling and found that \y works in place of \b in awk. I'm using this to match whole words...

so it matches good, but not goodd
Code:
if (match($0, /\y(good|excellent|three)\y/)) { print "match", $0 }

# 6  
Old 05-26-2012
Quote:
Originally Posted by agama
For grep to work with regular expressions you need to enable it (preferred) or use egrep:

Code:
grep -E "(good|great|excellent)" filename

grep works with regular expressions (BRE) by default. Did you mean extended regular expressions (ERE) that support alternation (|) and enabling with the "-E" switch?
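The difference can be seen with a small test. This is only an illustrative sketch; sample is a hypothetical helper that prints two made-up lines, one of which contains the pattern text literally.

```shell
# sample is a hypothetical helper that produces test input for this sketch.
sample() {
    printf '%s\n' 'this is a good line' 'literal (good|great) text'
}

# BRE (the default): ( and | are ordinary characters,
# so only the line containing the literal text matches.
sample | grep '(good|great)'

# ERE (-E): ( ) group and | alternates, so both lines match.
sample | grep -E '(good|great)'

# Alternation without ERE: pass each word as its own -e pattern.
sample | grep -e 'good' -e 'great'
```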

Quote:
Originally Posted by agama
[..]If you need either change to something like:

Code:
if( match( $0, "[[:space:]]*(foo|bar|goo)[[:space:]]*" ) )



to indicate that zero or more space characters may precede/follow the word.
That will not fly, since "may" allows too much liberty. A word like "goods" would match too. And what about punctuation? What constitutes a word?

Quote:
Originally Posted by Storms
thanks for that, after your reply i did some further googling and found that \y works in place of \b in awk. I'm using this to match whole words...

so it matches good, but not goodd
Code:
if (match($0, /\y(good|excellent|three)\y/)) { print "match", $0 }

\y is a GNU extension and will not work across awks. An alternative would be to use \< and \> instead:
Code:
gawk '/\<(good|excellent|three)\>/{ print "match", $0 }'

But this isn't universal either.

A universal awk approach would be something like this I guess:
Code:
awk -F'[[:space:][:punct:]]+' '{for(i=1;i<=NF;i++)if($i~/^(good|great|excellent)$/){print; next}}'

A special case would perhaps need to be made for the underscore character, since [:punct:] includes it, whereas \y, \< and \> treat it as part of a word.
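One possible way to handle the underscore is to split on anything that is not a letter, digit, or underscore, so that a name like good_value stays a single field. This is a sketch under that assumption about what counts as a word; match_words is a hypothetical helper name.

```shell
# match_words is a hypothetical helper name used only for this sketch.
match_words() {
    # Field separator: one or more characters that are not alphanumeric or "_".
    awk -F'[^[:alnum:]_]+' '{
        # Print the line once if any field is exactly one of the words.
        for (i = 1; i <= NF; i++)
            if ($i ~ /^(good|great|excellent)$/) { print; next }
    }'
}

# Sample input made up for this sketch.
printf '%s\n' 'a good, solid line' 'good_value = 1' 'goods' 'great!' | match_words
```

With this separator, "good_value = 1" and "goods" are not reported, while "a good, solid line" and "great!" are.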

Last edited by Scrutinizer; 05-26-2012 at 03:20 AM..