The UNIX and Linux Forums  

#1, 10-24-2008, jonathanm

How can I match lines with just one occurrence of a string in awk?

Hi,

I'm trying to use awk to match records that contain only one occurrence of my string. I know how to match one or more (+), but matching exactly one is eluding me without writing some convoluted bit of code. I was hoping there would be a simple pattern-matching operator similar to '+' that means 'one and only one occurrence of'.

My matching code looks like this:

Code:
$10 !~ /&| and | AND | And |\// && $11 !~ /FLAT|Flat|Apartment|APARTMENT/ && $10 ~ /MR|MISS|MRS|MS|Mr|Miss|Mrs|Ms/ {

But some records have multiple names in their name field, such as
Quote:
Mr Magoo Mr Smith Miss Demeanor
and I do not want to match those records.

Any help with this would be grand!

The only alternative I can think of is a counting loop that splits the name into an array and checks whether any of Mr, Mrs, MR, MRS, etc. occurs more than once, which sounds long-winded and unnecessary.

#2, 10-25-2008, drl
Hi.

I find that such things are relatively straightforward in perl because of the power of its regular expression infrastructure. I don't know if awk exposes this feature as visibly as perl does, but here is a shell script that drives a small perl script:

Code:
#!/bin/bash -

# @(#) s1       Demonstrate perl.

echo
echo "(Versions displayed with local utility \"version\")"
version >/dev/null 2>&1 && version "=o" $(_eat $0 $1) perl
set -o nounset
echo

FILE=${1-data1}

echo " Data file $FILE:"
cat $FILE

echo
echo " perl script file:"
cat p1

echo
echo " Results:"
./p1 $FILE

exit 0

Producing:

Code:
% ./s1

(Versions displayed with local utility "version")
Linux 2.6.11-x1
GNU bash 2.05b.0
perl 5.8.4

 Data file data1:
Mr Magoo
Mr Magoo mr magoo
Mr Magoo Mr Smith Miss Demeanor
Mr Smith Miss Demeanor
Miss Demeanor Miss Taken
Miss Taken

 perl script file:
#!/usr/bin/perl

# @(#) p1       Demonstrate skipping of line with repeated matches.

use warnings;
use strict;

my($debug);
$debug = 0;
$debug = 1;
my($t1);

my($lines) = 0;

# Make the entire line lower case to simplify matches. Use the captured
# string to omit lines which contain more than one match.

while ( <> ) {
        chomp;
        print " Working on |$_|\n";
        $lines++;
        $t1 = lc $_;
        next if $t1 =~ /(mr|miss).*\1/;
        print "$_\n";
}

print STDERR " ( Lines read: $lines )\n";

exit(0);

 Results:
 Working on |Mr Magoo|
Mr Magoo
 Working on |Mr Magoo mr magoo|
 Working on |Mr Magoo Mr Smith Miss Demeanor|
 Working on |Mr Smith Miss Demeanor|
Mr Smith Miss Demeanor
 Working on |Miss Demeanor Miss Taken|
 Working on |Miss Taken|
Miss Taken
 ( Lines read: 6 )

Best wishes ... cheers, drl
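
For comparison, the counting loop mentioned in the original question can stay fairly short in awk. The following is only a sketch: it treats the whole line as the name, and the title list and the function name one_title are assumptions, not something from the thread. Unlike the regular expression above, which only catches the same title appearing twice, it counts every title word, so a line such as "Mr Smith Miss Demeanor" is dropped too.

```shell
#!/bin/sh
# Sketch: keep only lines that contain exactly one title word.
# The title list and the name one_title are assumptions for illustration.
one_title() {
    awk '{
        n = 0
        # Compare each whitespace-separated word, lower-cased, with the titles.
        for (i = 1; i <= NF; i++)
            if (tolower($i) ~ /^(mr|mrs|miss|ms)$/)
                n++
    }
    n == 1' "$@"
}

printf '%s\n' 'Mr Magoo' 'Mr Magoo mr magoo' 'Mr Smith Miss Demeanor' 'Miss Taken' | one_title
# prints:
# Mr Magoo
# Miss Taken
```

Because whole words are compared, a surname that merely contains "mr" or "miss" is not counted as a title.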

#3, 10-25-2008, radoulov
With GNU AWK:


Code:
$ cat file
Mr Magoo
Mr Magoo mr magoo
Mr Magoo Mr Smith Miss Demeanor
Mr Smith Miss Demeanor
Miss Demeanor Miss Taken
Miss Taken
$ awk -F'm(r|iss)' 'NF==2' IGNORECASE=9 file
Mr Magoo
Miss Taken

Or another version of drl's solution:


Code:
perl -nle'!/(m(r|iss)).*\2/i&&print' file

Some versions of sed:


Code:
sed -nr '/(m(r|iss)).*\2/I!p' file

... I can't manage to make it work with grep.
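
The awk one-liner above works because every match of the field separator splits the line once more, so a line with exactly one title has exactly two fields (the text before the title, possibly empty, and the text after it). A rough way to see this is to print the separator count for each line. The sketch below lower-cases the input with tr instead of relying on the GNU-specific IGNORECASE; count_titles is a made-up name.

```shell
#!/bin/sh
# Sketch: show how many times the separator regex occurs on each line.
# count_titles is a hypothetical helper; the regex is the one from the thread.
count_titles() {
    # NF - 1 is the number of separators found, because N separators
    # cut a non-empty line into N + 1 fields.
    tr '[:upper:]' '[:lower:]' | awk -F'm(r|iss)' '{ printf "%d: %s\n", NF - 1, $0 }'
}

printf '%s\n' 'Mr Magoo' 'Mr Magoo Mr Smith Miss Demeanor' 'Miss Taken' | count_titles
# prints:
# 1: mr magoo
# 3: mr magoo mr smith miss demeanor
# 1: miss taken
```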

Last edited by radoulov; 10-25-2008 at 06:03 PM..

#4, 10-25-2008, danmero
Quote:
Originally Posted by radoulov
With GNU AWK:


Code:
$ cat file
Mr Magoo
Mr Magoo mr magoo
Mr Magoo Mr Smith Miss Demeanor
Mr Smith Miss Demeanor
Miss Demeanor Miss Taken
Miss Taken
$ awk -F'm(r|iss)' 'NF==2' IGNORECASE=9 file
Mr Magoo
Miss Taken

Oops, what is the logic here?

Code:
# cat file
Mr Magoo
Mr Magoo mr magoo
Mr Magoo Mr Smith Miss Demeanor
Mr Smith Miss Demeanor
Miss Demeanor Miss Taken
Miss Taken
# awk 'NF==2' file
Mr Magoo
Miss Taken
# awk -F'm(r|iss)' 'NF==2' IGNORECASE=9 file
Mr Magoo mr magoo


#5, 10-25-2008, radoulov
Quote:
Originally Posted by danmero
Oops, what is the logic here?
[...]
I said GNU AWK.


Code:
$ cat file
Mr Magoo A
Mr Magoo mr magoo
Mr Magoo Mr Smith Miss Demeanor
Mr Smith Miss Demeanor
Miss Demeanor Miss Taken
Miss Taken B
$ awk -F'm(r|iss)' 'NF==2' IGNORECASE=9 file
Mr Magoo A
Miss Taken B
$ nawk -F'm(r|iss)' 'NF==2' IGNORECASE=9 file
Mr Magoo mr magoo

Just for completeness:


Code:
$ awk --version|head -1                      
GNU Awk 3.1.6
$ strings =nawk|grep -Fm1 version
version 20070501

The problem with your second example is the case-sensitive search (IGNORECASE is GNU-specific):


Code:
$ print 'mr
mr mr
miss
miss miss'|nawk -F'm(r|iss)' 'NF==2{print NR,$0}' 
1 mr
3 miss

You could make it case-insensitive with more verbose code.
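
One possible form of that more verbose code, as a sketch: lower-case a copy of the line with tolower() and count the matches with gsub(), which returns the number of substitutions it made. Both functions are specified by POSIX awk, so no IGNORECASE is needed. The function name one_title_portable is an assumption; the regular expression is the one used above.

```shell
#!/bin/sh
# Sketch of a case-insensitive filter that avoids the GNU-only IGNORECASE.
# one_title_portable is a hypothetical name.
one_title_portable() {
    awk '{
        # Work on a lower-cased copy so the original line is printed unchanged.
        s = tolower($0)
        # gsub() returns how many matches it replaced in s.
        n = gsub(/m(r|iss)/, "", s)
    }
    n == 1' "$@"
}

printf '%s\n' 'Mr Magoo A' 'Mr Magoo mr magoo' 'Miss Taken B' | one_title_portable
# prints:
# Mr Magoo A
# Miss Taken B
```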

Last edited by radoulov; 10-25-2008 at 06:38 PM..

#6, 10-25-2008, drl
Hi.
Quote:
Originally Posted by radoulov
...... I can't manage to make it work with grep.
If grep is compiled with support for perl regular expressions, one can get further. I had two versions where it was not compiled in. Here's a sample:

Code:
#!/bin/bash -

# @(#) s2       Demonstrate perl regular expressions in grep.

echo
echo "(Versions displayed with local utility \"version\")"
version >/dev/null 2>&1 && version "=o" $(_eat $0 $1) grep
set -o nounset
echo

FILE=${1-data1}

echo " Data file $FILE:"
cat $FILE

echo
echo " Results:"
grep -v -i --perl-regexp '(mr).*\1' $FILE

exit 0

Producing (on openSUSE 11.0 (i586)):

Code:
$ ./s2

(Versions displayed with local utility "version")
Linux 2.6.25.16-0.1-pae
GNU bash 3.2.39
GNU grep 2.5.2

 Data file data1:
Mr Magoo
Mr Magoo mr magoo
Mr Magoo Mr Smith Miss Demeanor
Mr Smith Miss Demeanor
Miss Demeanor Miss Taken
Miss Taken

 Results:
Mr Magoo
Mr Smith Miss Demeanor
Miss Demeanor Miss Taken
Miss Taken

cheers, drl

#7, 10-26-2008, radoulov
Quote:
Originally Posted by drl
Hi.

If grep is compiled with perl regular expressions, one can get farther. I had 2 versions where it was not compiled in.
[...]
Great point drl,
thank you!

It was not obvious to me that this option was needed:


Code:
$ cat file
Mr Magoo
Mr Magoo mr magoo
Mr Magoo Mr Smith Miss Demeanor
Mr Smith Miss Demeanor
Miss Demeanor Miss Taken
Miss Taken

$ grep -viP '(m|(r|iss)).*\2' file
Mr Magoo
Miss Taken

Just an addition (I don't know how I missed that yesterday):
it seems to work with EREs too:


Code:
$ grep -Evi '(m|(r|iss)).*\2' file
Mr Magoo
Miss Taken

$ egrep -vi '(m|(r|iss)).*\2' file
Mr Magoo
Miss Taken
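
One caveat about the backreference, offered as a sketch rather than a correction: \2 refers to the inner group, (r|iss), so these patterns drop a line whenever the matched "r" or "iss" shows up again anywhere later, not only when a whole title repeats. With the sample data the result is the expected one, but a name such as "Mr Carter" would be dropped. Referring to \1 in a pattern like '(m(r|iss)).*\1' requires the whole title to repeat, though it then misses mixed titles such as "Mr Smith Miss Demeanor". The check below assumes a grep that accepts backreferences with -E (GNU grep does); the function names are hypothetical.

```shell
#!/bin/sh
# Sketch comparing the two backreferences. Assumes grep -E accepts
# backreferences (a GNU extension). Function names are hypothetical.

# \2 is the inner group (r|iss): the line is dropped when "r" or "iss" recurs.
drop_if_inner_repeats() { grep -Evi '(m(r|iss)).*\2' "$@"; }

# \1 is the whole title (mr|miss): the line is dropped only when the same
# title recurs.
drop_if_title_repeats() { grep -Evi '(m(r|iss)).*\1' "$@"; }

printf '%s\n' 'Mr Carter' 'Mr Magoo Mr Smith' | drop_if_title_repeats
# prints: Mr Carter

printf '%s\n' 'Mr Carter' 'Mr Magoo Mr Smith' | drop_if_inner_repeats || true
# prints nothing: a later "r" in Carter satisfies \2
```

Counting the titles, rather than looking for a repeat, avoids both problems.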


Last edited by radoulov; 10-26-2008 at 08:23 AM..