How can I match lines with just one occurance of a string in awk?


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting How can I match lines with just one occurance of a string in awk?
# 1  
Old 10-24-2008
Java How can I match lines with just one occurance of a string in awk?

Hi,

I'm trying to match records using awk which contain only one occurance of my string, I know how to match one or more (+) but matching only one is eluding me without developing some convoluted bit of code. I was hoping there would be some simple pattern matching thing similar to '+' but which means 'one and only one occurance of'.

My matching code looks like this:
Code:
$10 !~ /&| and | AND | And |\// && $11 !~ /FLAT|Flat|Apartment|APARTMENT/ && $10 ~ /MR|MISS|MRS|MS|Mr|Miss|Mrs|Ms/ {

But some records have in their name field multiple names, such as
Quote:
Mr Magoo Mr Smith Miss Demeanor
and I want to not match those records.

Any help with this would be grand!

The only alternative I can think of is some convoluted counting loop which goes through the name split as an array to count if any of the Mr, Mrs, MR, MRS, etc occur more than once, which sounds quite long-winded and unnecessary.
# 2  
Old 10-25-2008
Hi.

I find that such things are relatively straight-forward in perl because of the power of regular expression infrastructure. I don't know if awk has this feature as visibly as does perl, but here is a shell script that drives a small perl script:
Code:
#!/bin/bash -

# @(#) s1       Demonstrate perl.

echo
echo "(Versions displayed with local utility \"version\")"
version >/dev/null 2>&1 && version "=o" $(_eat $0 $1) perl
set -o nounset
echo

FILE=${1-data1}

echo " Data file $FILE:"
cat $FILE

echo
echo " perl script file:"
cat p1

echo
echo " Results:"
./p1 $FILE

exit 0

Producing:
Code:
% ./s1

(Versions displayed with local utility "version")
Linux 2.6.11-x1
GNU bash 2.05b.0
perl 5.8.4

 Data file data1:
Mr Magoo
Mr Magoo mr magoo
Mr Magoo Mr Smith Miss Demeanor
Mr Smith Miss Demeanor
Miss Demeanor Miss Taken
Miss Taken

 perl script file:
#!/usr/bin/perl

# @(#) p1       Demonstrate skipping of line with repeated matches.

use warnings;
use strict;

my($debug);
$debug = 0;
$debug = 1;
my($t1);

my($lines) = 0;

# Make entire line lower case to simply matches. Use captured
# string to omit lines with contain more than one match.

while ( <> ) {
chomp;
        print " Working on |$_|\n";
        $lines++;
        $t1 = lc $_;
        next if $t1 =~ /(mr|miss).*\1/;
    print "$_\n";;
}

print STDERR " ( Lines read: $lines )\n";

exit(0);

 Results:
 Working on |Mr Magoo|
Mr Magoo
 Working on |Mr Magoo mr magoo|
 Working on |Mr Magoo Mr Smith Miss Demeanor|
 Working on |Mr Smith Miss Demeanor|
Mr Smith Miss Demeanor
 Working on |Miss Demeanor Miss Taken|
 Working on |Miss Taken|
Miss Taken
 ( Lines read: 6 )

Best wishes ... cheers, drl
# 3  
Old 10-25-2008
Quote:
Originally Posted by jonathanm
Hi,

I'm trying to match records using awk which contain only one occurance of my string, I know how to match one or more (+) but matching only one is eluding me without developing some convoluted bit of code. I was hoping there would be some simple pattern matching thing similar to '+' but which means 'one and only one occurance of'.
I prefer perl too, in cases like this, but this is easily solvable in awk. Basically, you want to match X but not X.*X.

Code:
$10 !~ /&| and | AND | And |\// && $11 !~ /FLAT|Flat|Apartment|APARTMENT/ && $10 ~ /MR|MISS|MRS|MS|Mr|Miss|Mrs|Ms/ && $10 !~ /(MR|MISS|MRS|MS|Mr|Miss|Mrs|Ms).*(MR|MISS|MRS|MS|Mr|Miss|Mrs|Ms)/ {

And yes, it's a bit ugly, but awk isn't always very pretty. Smilie
# 4  
Old 10-25-2008
With GNU AWK:

Code:
$ cat file
Mr Magoo
Mr Magoo mr magoo
Mr Magoo Mr Smith Miss Demeanor
Mr Smith Miss Demeanor
Miss Demeanor Miss Taken
Miss Taken
$ awk -F'm(r|iss)' 'NF==2' IGNORECASE=9 file
Mr Magoo
Miss Taken

Or another version of drl's solution:

Code:
perl -nle'!/(m(r|iss)).*\2/i&&print' file

Some versions of sed:

Code:
sed -nr '/(m(r|iss)).*\2/I!p' file

... I can't manage to make it work with grep. Smilie

Last edited by radoulov; 10-25-2008 at 06:03 PM..
# 5  
Old 10-25-2008
Quote:
Originally Posted by radoulov
With GNU AWK:

Code:
$ cat file
Mr Magoo
Mr Magoo mr magoo
Mr Magoo Mr Smith Miss Demeanor
Mr Smith Miss Demeanor
Miss Demeanor Miss Taken
Miss Taken
$ awk -F'm(r|iss)' 'NF==2' IGNORECASE=9 file
Mr Magoo
Miss Taken

..
.ops , what is the logic here?
Code:
# cat file
Mr Magoo
Mr Magoo mr magoo
Mr Magoo Mr Smith Miss Demeanor
Mr Smith Miss Demeanor
Miss Demeanor Miss Taken
Miss Taken
# awk 'NF==2' file
Mr Magoo
Miss Taken
# awk -F'm(r|iss)' 'NF==2' IGNORECASE=9 file
Mr Magoo mr magoo

# 6  
Old 10-25-2008
Quote:
Originally Posted by danmero
.ops , what is the logic here?
[...]
I said GNU AWK.

Code:
$ cat file
Mr Magoo A
Mr Magoo mr magoo
Mr Magoo Mr Smith Miss Demeanor
Mr Smith Miss Demeanor
Miss Demeanor Miss Taken
Miss Taken B
$ awk -F'm(r|iss)' 'NF==2' IGNORECASE=9 file
Mr Magoo A
Miss Taken B
$ nawk -F'm(r|iss)' 'NF==2' IGNORECASE=9 file
Mr Magoo mr magoo

Just for completeness:

Code:
$ awk --version|head -1                      
GNU Awk 3.1.6
$ strings =nawk|grep -Fm1 version
version 20070501

The problem with your second example is the case sensitive search (IGNORECASE is GNU specific):

Code:
$ print 'mr
mr mr
miss
miss miss'|nawk -F'm(r|iss)' 'NF==2{print NR,$0}' 
1 mr
3 miss

You may try to make it case insensitive using more verbose code Smilie

Last edited by radoulov; 10-25-2008 at 06:38 PM..
# 7  
Old 10-25-2008
Hi.
Quote:
Originally Posted by radoulov
...... I can't manage to make it work with grep. Smilie
If grep is compiled with perl regular expressions, one can get farther. I had 2 versions where it was not compiled in. Here's a sample:
Code:
#!/bin/bash -

# @(#) s1       Demonstrate perl regular expressions in grep.

echo
echo "(Versions displayed with local utility \"version\")"
version >/dev/null 2>&1 && version "=o" $(_eat $0 $1) grep
set -o nounset
echo

FILE=${1-data1}

echo " Data file $FILE:"
cat $FILE

echo
echo " Results:"
grep -v -i --perl-regexp '(mr).*\1' $FILE

exit 0

Producing (on openSUSE 11.0 (i586)):
Code:
$ ./s2

(Versions displayed with local utility "version")
Linux 2.6.25.16-0.1-pae
GNU bash 3.2.39
GNU grep 2.5.2

 Data file data1:
Mr Magoo
Mr Magoo mr magoo
Mr Magoo Mr Smith Miss Demeanor
Mr Smith Miss Demeanor
Miss Demeanor Miss Taken
Miss Taken

 Results:
Mr Magoo
Mr Smith Miss Demeanor
Miss Demeanor Miss Taken
Miss Taken

cheers, drl
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

awk to combine lines if fields match in lines

In the awk below, what I am attempting to do is check each line in the tab-delimeted input, which has ~20 lines in it, for a keyword SVTYPE=Fusion. If the keyword is found I am splitting $3 using the . (dot) and reading the portion before and after the dot in an array a. If it does have that... (12 Replies)
Discussion started by: cmccabe
12 Replies

2. UNIX for Dummies Questions & Answers

awk - (URGENT!) Print lines sort and move lines if match found

URGENT HELP IS NEEDED!! I am looking to move matching lines (01 - 07) from File1 and 77 tab the matching string from File2, to File3.txt. I am almost done but - Currently, script is not printing lines to File3.txt in order. - Also the matching lines are not moving out of File1.txt ... (1 Reply)
Discussion started by: High-T
1 Replies

3. Shell Programming and Scripting

ksh sed - Extract specific lines with mulitple occurance of interesting lines

Data file example I look for primary and * to isolate the interesting slot number. slot=`sed '/^primary$/,/\*/!d' filename | tail -1 | sed s'/*//' | awk '{print $1" "$2}'` Now I want to get the Touch line for only the associate slot number, in this case, because the asterisk... (2 Replies)
Discussion started by: popeye
2 Replies

4. Shell Programming and Scripting

Grep word after last occurance of string and display next few lines

Hi, I wanted to grep string "ERROR" and "WORNING" after last occurrence of String "Starting" only and wanted to display two lines after searched ERROR and WORNING string and one line before. I have following cronjob log file "errorlog" file and I have written the code for same in Unix as below... (17 Replies)
Discussion started by: nes
17 Replies

5. Shell Programming and Scripting

Print lines that match regex on xth string

Hello, I need an awk command to print only the lines that match regex on xth field from file. For example if I use this command awk -F"|" ' $22 == "20130117090000.*" 'It wont work, I think, because single quotes wont allow the usage of the metacharacter star * . On the other hand I dont know... (2 Replies)
Discussion started by: black_fender
2 Replies

6. Shell Programming and Scripting

Remove lines that match string at end of column

I have this: 301205 0000030000041.49000000.00 2011111815505 908 301205 0000020000029.10000000.00 2011111815505 962 301205 0000010000027.56000000.00 2011111815505 3083 312291 ... (2 Replies)
Discussion started by: herot
2 Replies

7. UNIX for Dummies Questions & Answers

awk display the match and 2 lines after the match is found.

Hello, can someone help me how to find a word and 2 lines after it and then send the output to another file. For example, here is myfile1.txt. I want to search for "Error" and 2 lines below it and send it to myfile2.txt I tried with grep -A but it's not supported on my system. I tried with awk,... (4 Replies)
Discussion started by: eurouno
4 Replies

8. Shell Programming and Scripting

Multi line document to single lines based on occurance of string

Hi Guys, I am new to awk and sed, i am working multiline document, i want to make make that document into SINGLE lines based on occurace of string "dwh". here's the sample of my problem.. dwh123 2563 4562 4236 1236 78956 12394 4552 dwh192 2656 46536 231326 65652 6565 23262 16625623... (5 Replies)
Discussion started by: victor369
5 Replies

9. Shell Programming and Scripting

awk to print lines based on string match on another line and condition

Hi folks, I have a text file that I need to parse, and I cant figure it out. The source is a report breaking down softwares from various companies with some basic info about them (see source snippet below). Ultimately what I want is an excel sheet with only Adobe and Microsoft software name and... (5 Replies)
Discussion started by: rowie718
5 Replies

10. UNIX for Dummies Questions & Answers

Extracting lines that match string at certain position

I have a fixed length file in the following format <date><product_code><other data> The file size is huge and I have to extract only the lines that match a certain product code which is of 2 bytes length. I cannot use normal grep since that may give undesirable results. When I search for prod... (5 Replies)
Discussion started by: paruthiveeran
5 Replies
Login or Register to Ask a Question