How can I match lines with just one occurrence of a string in awk?
Hi,
I'm trying to use awk to match records that contain only one occurrence of my string. I know how to match one or more (+), but matching exactly one is eluding me without some convoluted bit of code. I was hoping there would be a simple pattern-matching operator similar to '+' that means 'one and only one occurrence of'.
My matching code looks like this:
Code:
$10 !~ /&| and | AND | And |\// && $11 !~ /FLAT|Flat|Apartment|APARTMENT/ && $10 ~ /MR|MISS|MRS|MS|Mr|Miss|Mrs|Ms/ {
But some records have multiple names in the name field, such as
Quote:
Mr Magoo Mr Smith Miss Demeanor
and I want to not match those records.
Any help with this would be grand!
The only alternative I can think of is a counting loop that splits the name into an array and counts whether any of Mr, Mrs, MR, MRS, etc. occurs more than once, which sounds long-winded and unnecessary.
Location: Saint Paul, MN USA / BSD, CentOS, Debian, OS X, Solaris
Posts: 2,288
Thanks Given: 430
Thanked 480 Times in 395 Posts
Hi.
I find that such things are relatively straightforward in perl because of the power of its regular expression infrastructure. I don't know whether awk exposes backreferences as readily as perl does, but here is a shell script that drives a small perl script:
Code:
#!/bin/bash -
# @(#) s1 Demonstrate perl.
echo
echo "(Versions displayed with local utility \"version\")"
version >/dev/null 2>&1 && version "=o" $(_eat $0 $1) perl
set -o nounset
echo
FILE=${1-data1}
echo " Data file $FILE:"
cat $FILE
echo
echo " perl script file:"
cat p1
echo
echo " Results:"
./p1 $FILE
exit 0
Producing:
Code:
% ./s1
(Versions displayed with local utility "version")
Linux 2.6.11-x1
GNU bash 2.05b.0
perl 5.8.4
Data file data1:
Mr Magoo
Mr Magoo mr magoo
Mr Magoo Mr Smith Miss Demeanor
Mr Smith Miss Demeanor
Miss Demeanor Miss Taken
Miss Taken
perl script file:
#!/usr/bin/perl
# @(#) p1 Demonstrate skipping of line with repeated matches.
use warnings;
use strict;
my($debug);
$debug = 0;
$debug = 1;
my($t1);
my($lines) = 0;
# Make entire line lower case to simplify matches. Use captured
# string to omit lines which contain more than one match.
while ( <> ) {
chomp;
print " Working on |$_|\n";
$lines++;
$t1 = lc $_;
next if $t1 =~ /(mr|miss).*\1/;
print "$_\n";
}
print STDERR " ( Lines read: $lines )\n";
exit(0);
Results:
Working on |Mr Magoo|
Mr Magoo
Working on |Mr Magoo mr magoo|
Working on |Mr Magoo Mr Smith Miss Demeanor|
Working on |Mr Smith Miss Demeanor|
Mr Smith Miss Demeanor
Working on |Miss Demeanor Miss Taken|
Working on |Miss Taken|
Miss Taken
( Lines read: 6 )
Quote:
I'm trying to match records using awk which contain only one occurrence of my string. I know how to match one or more (+), but matching only one is eluding me without developing some convoluted bit of code. I was hoping there would be some simple pattern-matching thing similar to '+' but which means 'one and only one occurrence of'.
I prefer perl too, in cases like this, but this is easily solvable in awk. Basically, you want to match X but not X.*X.
Code:
$10 !~ /&| and | AND | And |\// && $11 !~ /FLAT|Flat|Apartment|APARTMENT/ && $10 ~ /MR|MISS|MRS|MS|Mr|Miss|Mrs|Ms/ && $10 !~ /(MR|MISS|MRS|MS|Mr|Miss|Mrs|Ms).*(MR|MISS|MRS|MS|Mr|Miss|Mrs|Ms)/ {
And yes, it's a bit ugly, but awk isn't always very pretty.
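One way to make it a little less ugly is to hold the title alternation in a variable and build the "X.*X" test from it, so the list is written only once. This is just a sketch under assumptions that are not from the original post: the sample input is tab-separated with the name in field 1 (not $10), and the helper name one_title is made up for illustration.

```shell
#!/bin/sh
# Sketch: print records whose name field contains exactly one title.
# Assumptions (hypothetical, not from the thread): tab-separated input,
# name in field 1, helper called one_title.
one_title() {
    awk '
        BEGIN {
            FS = "\t"
            title = "MR|MISS|MRS|MS|Mr|Miss|Mrs|Ms"
            twice = "(" title ").*(" title ")"   # "X.*X" built from X
        }
        # keep the record if a title occurs, but not two of them
        $1 ~ title && $1 !~ twice
    ' "$@"
}

printf '%s\n' 'Mr Magoo' 'Mr Magoo Mr Smith Miss Demeanor' 'Miss Taken' | one_title
```

The other conditions from the original pattern (the "and" and "FLAT" tests) can be added to the same expression with &&. Note that the patterns are plain substrings, so a surname that happens to contain "Ms" or "Mr" would also count as a title.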
$ cat file
Mr Magoo
Mr Magoo mr magoo
Mr Magoo Mr Smith Miss Demeanor
Mr Smith Miss Demeanor
Miss Demeanor Miss Taken
Miss Taken
$ awk -F'm(r|iss)' 'NF==2' IGNORECASE=9 file
Mr Magoo
Miss Taken
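For anyone wondering why NF==2 works: as far as I can tell, the title pattern is used as the field separator, so a line containing n titles is cut into n+1 fields, and NF==2 selects the lines with exactly one title. The sketch below prints that count per line. It uses split() on a lower-cased copy of the line instead of IGNORECASE, so it should not depend on GNU awk; the helper name count_titles is hypothetical.

```shell
#!/bin/sh
# Sketch: show how many title matches each line contains.
count_titles() {
    awk '{
        # split() returns the number of pieces; for a non-empty line
        # the number of separators (titles) is pieces - 1
        pieces = split(tolower($0), part, /m(r|iss)/)
        print pieces - 1 ": " $0
    }' "$@"
}

printf '%s\n' 'Mr Magoo' 'Mr Magoo mr magoo' 'Miss Demeanor Miss Taken' | count_titles
```

Lines reported with a count of 1 are the ones the NF==2 test would print.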
Oops, what is the logic here?
Code:
# cat file
Mr Magoo
Mr Magoo mr magoo
Mr Magoo Mr Smith Miss Demeanor
Mr Smith Miss Demeanor
Miss Demeanor Miss Taken
Miss Taken
# awk 'NF==2' file
Mr Magoo
Miss Taken
# awk -F'm(r|iss)' 'NF==2' IGNORECASE=9 file
Mr Magoo mr magoo
$ cat file
Mr Magoo A
Mr Magoo mr magoo
Mr Magoo Mr Smith Miss Demeanor
Mr Smith Miss Demeanor
Miss Demeanor Miss Taken
Miss Taken B
$ awk -F'm(r|iss)' 'NF==2' IGNORECASE=9 file
Mr Magoo A
Miss Taken B
$ nawk -F'm(r|iss)' 'NF==2' IGNORECASE=9 file
Mr Magoo mr magoo
Just for completeness:
Code:
$ awk --version|head -1
GNU Awk 3.1.6
$ strings =nawk|grep -Fm1 version
version 20070501
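The problem with your second example is that the search is case-sensitive. IGNORECASE is specific to GNU awk, so nawk treats IGNORECASE=9 as an ordinary variable assignment and matches only lowercase mr and miss: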
Code:
$ print 'mr
mr mr
miss
miss miss'|nawk -F'm(r|iss)' 'NF==2{print NR,$0}'
1 mr
3 miss
You may try to make it case-insensitive using more verbose code.
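One possible reading of "more verbose" is to spell out both cases of every letter with bracket expressions, which needs no GNU extension. This is a sketch, not necessarily the code the poster had in mind, and the helper name one_title_portable is hypothetical.

```shell
#!/bin/sh
# Sketch: case-insensitive NF==2 test without IGNORECASE.
one_title_portable() {
    # each letter is matched in either case, so Mr, MR, mr, Miss, MISS ... all split the line
    awk -F'[Mm][Rr]|[Mm][Ii][Ss][Ss]' 'NF == 2' "$@"
}

printf '%s\n' 'Mr Magoo' 'Mr Magoo mr magoo' 'MISS Taken' 'Mr Smith Miss Demeanor' | one_title_portable
```

Only the lines with a single title should be printed, regardless of how the title is capitalised.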
Hi.
Quote:
Originally Posted by radoulov
...... I can't manage to make it work with grep.
If grep is compiled with support for Perl regular expressions, one can get further. I had two versions where it was not compiled in. Here's a sample:
Code:
#!/bin/bash -
# @(#) s2 Demonstrate perl regular expressions in grep.
echo
echo "(Versions displayed with local utility \"version\")"
version >/dev/null 2>&1 && version "=o" $(_eat $0 $1) grep
set -o nounset
echo
FILE=${1-data1}
echo " Data file $FILE:"
cat $FILE
echo
echo " Results:"
grep -v -i --perl-regexp '(mr).*\1' $FILE
exit 0
Producing (on openSUSE 11.0 (i586)):
Code:
$ ./s2
(Versions displayed with local utility "version")
Linux 2.6.25.16-0.1-pae
GNU bash 3.2.39
GNU grep 2.5.2
Data file data1:
Mr Magoo
Mr Magoo mr magoo
Mr Magoo Mr Smith Miss Demeanor
Mr Smith Miss Demeanor
Miss Demeanor Miss Taken
Miss Taken
Results:
Mr Magoo
Mr Smith Miss Demeanor
Miss Demeanor Miss Taken
Miss Taken
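Where grep has no --perl-regexp support, a two-stage pipeline with plain extended regular expressions is one alternative. Unlike '(mr).*\1', it also drops lines that repeat "miss" or mix different titles, which is closer to what the original question asked for. A sketch, with the hypothetical helper name grep_one_title:

```shell
#!/bin/sh
# Sketch: select lines with exactly one title using only grep -E.
grep_one_title() {
    # stage 1: keep lines that contain at least one title
    # stage 2: drop lines where a title is followed by another title
    grep -i -E 'mr|miss' "$@" | grep -v -i -E '(mr|miss).*(mr|miss)'
}

printf '%s\n' 'Mr Magoo' 'Mr Smith Miss Demeanor' 'Miss Demeanor Miss Taken' 'Miss Taken' | grep_one_title
```

Called with a file name, as in grep_one_title data1, it reads that file instead of standard input.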