How can I match lines with just one occurrence of a string in awk?

Tags

awk, shell scripts, string

10-24-2008

Hi,

I'm trying to match records using awk which contain only one occurrence of my string. I know how to match one or more (+), but matching only one is eluding me without developing some convoluted bit of code. I was hoping there would be some simple pattern matching thing similar to '+' but which means 'one and only one occurrence of'.

My matching code looks like this:

Code:

$10 !~ /&| and | AND | And |\// && $11 !~ /FLAT|Flat|Apartment|APARTMENT/ && $10 ~ /MR|MISS|MRS|MS|Mr|Miss|Mrs|Ms/ {

But some records have multiple names in their name field, such as

Quote:

Mr Magoo Mr Smith Miss Demeanor

and I want to not match those records.

Any help with this would be grand!

The only alternative I can think of is some convoluted counting loop which goes through the name split as an array to count if any of the Mr, Mrs, MR, MRS, etc occur more than once, which sounds quite long-winded and unnecessary.
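For what it's worth, the counting idea does not have to be a long loop. Here is a minimal sketch, assuming it is acceptable to lower-case a copy of the text first: awk's gsub() returns the number of substitutions it made, so replacing every match with itself ("&") counts the titles without altering anything. The function name exactly_one_title is made up for illustration, and the sketch tests the whole line ($0); for the original record layout the same test would presumably be applied to $10 instead. Note that it matches substrings, so a name such as "Amsterdam" would be counted as containing "ms".

```shell
# Hypothetical helper: print only the lines that contain exactly one title.
# gsub() returns the number of substitutions made; replacing each match
# with itself ("&") counts matches without changing the text.
exactly_one_title() {
  awk '{
    s = tolower($0)                       # work on a lower-cased copy
    n = gsub(/miss|mrs|mr|ms/, "&", s)    # n = number of titles found
    if (n == 1) print                     # print the original, unmodified line
  }' "$@"
}

# Example run on a few of the sample names:
printf '%s\n' 'Mr Magoo' 'Mr Magoo Mr Smith Miss Demeanor' 'Miss Taken' |
  exactly_one_title
```

With no file arguments the function reads standard input, so it can sit at the end of a pipeline or be given file names directly.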

jonathanm

10-25-2008

Hi.

I find that such things are relatively straightforward in perl because of the power of its regular expression infrastructure. I don't know if awk has this feature as visibly as perl does, but here is a shell script that drives a small perl script:

Code:

#!/bin/bash -

# @(#) s1       Demonstrate perl.

echo
echo "(Versions displayed with local utility \"version\")"
version >/dev/null 2>&1 && version "=o" $(_eat $0 $1) perl
set -o nounset
echo

FILE=${1-data1}

echo " Data file $FILE:"
cat $FILE

echo
echo " perl script file:"
cat p1

echo
echo " Results:"
./p1 $FILE

exit 0

Producing:

Code:

% ./s1

(Versions displayed with local utility "version")
Linux 2.6.11-x1
GNU bash 2.05b.0
perl 5.8.4

 Data file data1:
Mr Magoo
Mr Magoo mr magoo
Mr Magoo Mr Smith Miss Demeanor
Mr Smith Miss Demeanor
Miss Demeanor Miss Taken
Miss Taken

 perl script file:
#!/usr/bin/perl

# @(#) p1       Demonstrate skipping of line with repeated matches.

use warnings;
use strict;

my($debug);
$debug = 0;
$debug = 1;
my($t1);

my($lines) = 0;

# Make entire line lower case to simplify matches. Use captured
# string to omit lines which contain more than one match.

while ( <> ) {
        chomp;
        print " Working on |$_|\n";
        $lines++;
        $t1 = lc $_;
        next if $t1 =~ /(mr|miss).*\1/;
        print "$_\n";
}

print STDERR " ( Lines read: $lines )\n";

exit(0);

 Results:
 Working on |Mr Magoo|
Mr Magoo
 Working on |Mr Magoo mr magoo|
 Working on |Mr Magoo Mr Smith Miss Demeanor|
 Working on |Mr Smith Miss Demeanor|
Mr Smith Miss Demeanor
 Working on |Miss Demeanor Miss Taken|
 Working on |Miss Taken|
Miss Taken
 ( Lines read: 6 )

Best wishes ... cheers, drl

drl

10-25-2008

Quote:

Originally Posted by jonathanm

Hi,

I'm trying to match records using awk which contain only one occurrence of my string. I know how to match one or more (+), but matching only one is eluding me without developing some convoluted bit of code. I was hoping there would be some simple pattern matching thing similar to '+' but which means 'one and only one occurrence of'.

I prefer perl too, in cases like this, but this is easily solvable in awk. Basically, you want to match X but not X.*X.

Code:

$10 !~ /&| and | AND | And |\// && $11 !~ /FLAT|Flat|Apartment|APARTMENT/ && $10 ~ /MR|MISS|MRS|MS|Mr|Miss|Mrs|Ms/ && $10 !~ /(MR|MISS|MRS|MS|Mr|Miss|Mrs|Ms).*(MR|MISS|MRS|MS|Mr|Miss|Mrs|Ms)/ {

And yes, it's a bit ugly, but awk isn't always very pretty.
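The "match X but not X.*X" idea can be made a little less ugly by keeping the title list in a string and building the doubled pattern from it, since awk accepts a string as a dynamic regular expression. The following is only a sketch under assumptions: the function name single_title is hypothetical, the test is applied to the whole line rather than to $10, and the conditions on "and", "&" and $11 from the original rule are left out. The list stays case-sensitive as in the code above, so a lower-case "mr" is not treated as a title.

```shell
# Hypothetical helper: print lines that contain a title (X) but not two
# titles separated by anything (X.*X).
single_title() {
  awk '
    BEGIN {
      t = "(MR|MISS|MRS|MS|Mr|Miss|Mrs|Ms)"   # X: one title
      twice = t ".*" t                         # X.*X: two titles on one line
    }
    $0 ~ t && $0 !~ twice                      # no action block: matching lines are printed
  ' "$@"
}

printf '%s\n' 'Mr Magoo' 'Mr Smith Miss Demeanor' 'Miss Taken' | single_title
```

Keeping the list in one variable means it only has to be edited in one place when a new title such as "Dr" is added.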

otheus

10-25-2008

With GNU AWK:

Code:

$ cat file
Mr Magoo
Mr Magoo mr magoo
Mr Magoo Mr Smith Miss Demeanor
Mr Smith Miss Demeanor
Miss Demeanor Miss Taken
Miss Taken
$ awk -F'm(r|iss)' 'NF==2' IGNORECASE=9 file
Mr Magoo
Miss Taken

Or another version of drl's solution:

Code:

perl -nle'!/(m(r|iss)).*\2/i&&print' file

Some versions of sed:

Code:

sed -nr '/(m(r|iss)).*\2/I!p' file

... I can't manage to make it work with grep.

Last edited by radoulov; 10-25-2008 at 06:03 PM..

radoulov

10-25-2008

Quote:

Originally Posted by radoulov

With GNU AWK:

Code:

$ cat file
Mr Magoo
Mr Magoo mr magoo
Mr Magoo Mr Smith Miss Demeanor
Mr Smith Miss Demeanor
Miss Demeanor Miss Taken
Miss Taken
$ awk -F'm(r|iss)' 'NF==2' IGNORECASE=9 file
Mr Magoo
Miss Taken

..

Oops, what is the logic here?

Code:

# cat file
Mr Magoo
Mr Magoo mr magoo
Mr Magoo Mr Smith Miss Demeanor
Mr Smith Miss Demeanor
Miss Demeanor Miss Taken
Miss Taken
# awk 'NF==2' file
Mr Magoo
Miss Taken
# awk -F'm(r|iss)' 'NF==2' IGNORECASE=9 file
Mr Magoo mr magoo

danmero

10-25-2008

Quote:

Originally Posted by danmero

Oops, what is the logic here?
[...]

I said GNU AWK.

Code:

$ cat file
Mr Magoo A
Mr Magoo mr magoo
Mr Magoo Mr Smith Miss Demeanor
Mr Smith Miss Demeanor
Miss Demeanor Miss Taken
Miss Taken B
$ awk -F'm(r|iss)' 'NF==2' IGNORECASE=9 file
Mr Magoo A
Miss Taken B
$ nawk -F'm(r|iss)' 'NF==2' IGNORECASE=9 file
Mr Magoo mr magoo

Just for completeness:

Code:

$ awk --version|head -1                      
GNU Awk 3.1.6
$ strings =nawk|grep -Fm1 version
version 20070501

The problem with your second example is the case-sensitive search (IGNORECASE is GNU-specific):

Code:

$ print 'mr
mr mr
miss
miss miss'|nawk -F'm(r|iss)' 'NF==2{print NR,$0}' 
1 mr
3 miss

You may try to make it case-insensitive using more verbose code.
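One possible shape for that more verbose code, as a sketch rather than a tested recommendation for every awk: the logic of the -F trick is that each title acts as a separator, so a line with exactly one title splits into exactly two pieces (NF==2). Without IGNORECASE the same count can be obtained by lower-casing the line with tolower() and calling split() with the regular expression as the separator. The function name one_title_portable is invented for this example.

```shell
# Hypothetical helper: case-insensitive "exactly one title" test that
# does not rely on the gawk-only IGNORECASE variable.
one_title_portable() {
  awk '{
    # Each title found acts as a separator, so k titles give k+1 pieces.
    n = split(tolower($0), parts, /m(r|iss)/)
    if (n == 2) print        # exactly one title on this line
  }' "$@"
}

printf '%s\n' 'Mr Magoo' 'Mr Magoo mr magoo' 'Miss Demeanor Miss Taken' 'Miss Taken' |
  one_title_portable
```

Because the original line is never modified, the output keeps its original capitalisation.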

Last edited by radoulov; 10-25-2008 at 06:38 PM..

radoulov

10-25-2008

Hi.

Quote:

Originally Posted by radoulov

... I can't manage to make it work with grep.

If grep is compiled with perl regular expressions, one can get farther. I had 2 versions where it was not compiled in. Here's a sample:

Code:

#!/bin/bash -

# @(#) s2       Demonstrate perl regular expressions in grep.

echo
echo "(Versions displayed with local utility \"version\")"
version >/dev/null 2>&1 && version "=o" $(_eat $0 $1) grep
set -o nounset
echo

FILE=${1-data1}

echo " Data file $FILE:"
cat $FILE

echo
echo " Results:"
grep -v -i --perl-regexp '(mr).*\1' $FILE

exit 0

Producing (on openSUSE 11.0 (i586)):

Code:

$ ./s2

(Versions displayed with local utility "version")
Linux 2.6.25.16-0.1-pae
GNU bash 3.2.39
GNU grep 2.5.2

 Data file data1:
Mr Magoo
Mr Magoo mr magoo
Mr Magoo Mr Smith Miss Demeanor
Mr Smith Miss Demeanor
Miss Demeanor Miss Taken
Miss Taken

 Results:
Mr Magoo
Mr Smith Miss Demeanor
Miss Demeanor Miss Taken
Miss Taken

cheers, drl

drl
