Grepping for Exact Strings


 
# 15  
Old 04-13-2009
working solution.

Here you go:

My input file "a" contains:

Code:
my-do-not-change-airplane and other frogs
airplane.seriously change me
sometimes in the file i'm using,
there are words like this: db-airplane, db-12.airplane.
in cases like that, your code turns
the words into db-helicopter, db-12.helicopter.
airplane,frogs,somewerirdairplane buggly buggly
aardvark,chameleon,airplane,dugong,basilisk
aardvark,chameleon,dugong,basilisk,airplane

My script, "clam", looks like this:

Code:
#!/bin/ksh
#----------------------------------------------------------------------#
# Find funky occurrences of airplane...                                #
#----------------------------------------------------------------------#
cat a |
#----------------------------------------------------------------------#
# Translate word delimiters (. space , tab) to newlines...             #
#----------------------------------------------------------------------#
  tr '. ,\011' '\012\012\012\012' |
  grep airplane |
#----------------------------------------------------------------------#
# Grep OUT our target word to change...                                #
#----------------------------------------------------------------------#
  grep -v "^airplane$" |
  sort -u |
while read pattern ; do
#----------------------------------------------------------------------#
# Create list of sed commands...                                       #
#----------------------------------------------------------------------#
  print "s/$pattern/${pattern}DONOTEDIT/g;"
done > sed.file
#----------------------------------------------------------------------#
# Finish our little sed script. Remove our DONOTEDIT strings.          #
#----------------------------------------------------------------------#
cat << EOF >> sed.file
s/\<airplane\>/HELICOPTER/g;
s/DONOTEDIT//g;
EOF
#----------------------------------------------------------------------#
# ... and voila....                                                    #
#----------------------------------------------------------------------#
sed -f sed.file a

... and the output is:

Code:
my-do-not-change-airplane and other frogs
HELICOPTER.seriously change me
sometimes in the file i'm using,
there are words like this: db-airplane, db-12.HELICOPTER.
in cases like that, your code turns
the words into db-helicopter, db-12.helicopter.
HELICOPTER,frogs,somewerirdairplane buggly buggly
aardvark,chameleon,HELICOPTER,dugong,basilisk
aardvark,chameleon,dugong,basilisk,HELICOPTER

# 16  
Old 04-13-2009
I see that the script also changed db-12.airplane to db-12.HELICOPTER.

You can change this behavior in one of two ways:

Remove the . from the "tr" translation list, and remove one of the \012 sequences along with it. (By the way, the number of word delimiters has to match the number of \012 sequences.)

Or simply intercept the sed.file and modify its contents first.
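
To make the first option concrete, here is a minimal sketch of just the splitting step. It assumes the run of whitespace in the original tr list is a space followed by a tab (written as \011 here), and split_words is a made-up helper name, not part of the script:

```shell
#!/bin/sh
# Sketch only: split_words is a hypothetical helper, not part of the original script.
# tr pairs each character of the first set with the character at the same
# position in the second set, so three delimiters (space, comma, tab) need
# three \012 (newline) entries.
split_words() {
  tr ' ,\011' '\012\012\012'
}

# With "." left out of the delimiter set, db-12.airplane stays in one piece,
# so it ends up protected by a DONOTEDIT rule later on.
printf 'db-12.airplane, airplane\n' | split_words
```

Note that the list is written without surrounding [ ]: inside a tr set those brackets count as ordinary characters and would need their own entries in the second set.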
# 17  
Old 04-13-2009
Anyway, I removed the . word delimiter. Here's the code:

Code:
#!/bin/ksh

#----------------------------------------------------------------------#
# Find funky occurrences of airplane...                                #
#----------------------------------------------------------------------#
cat a |

#----------------------------------------------------------------------#
# Translate word delimiters (space , tab) to newlines...               #
#----------------------------------------------------------------------#
  tr ' ,\011' '\012\012\012' |
  grep airplane |

#----------------------------------------------------------------------#
# Grep OUT our target word to change...                                #
#----------------------------------------------------------------------#
  grep -v "^airplane$" |
  sort -u |
while read pattern ; do

#----------------------------------------------------------------------#
# Create list of sed commands...                                       #
#----------------------------------------------------------------------#
  print "s/$pattern/${pattern}DONOTEDIT/g;"

done > sed.file

#----------------------------------------------------------------------#
# Finish our little sed script. Remove our DONOTEDIT strings.          #
#----------------------------------------------------------------------#
cat << EOF >> sed.file
s/\<airplane\>/HELICOPTER/g;
s/DONOTEDIT//g;
EOF

#----------------------------------------------------------------------#
# ... and voila....                                                    #
#----------------------------------------------------------------------#
sed -f sed.file a

# 18  
Old 04-19-2009
I spent some time trying to get into awk. So, here is my solution:
Code:
awk 'BEGIN {FS="[., ]"; OFS=" ";} {for(i=1; i<=NF; ++i) {if($i=="airplane") {sub(/airplane/, "helicopter", $i);}} print $0;}' airplane > airplane.new

There's a small flaw in it, though, due to the fact that the input field delimiter (FS) is a regular expression instead of a static string/character. Unfortunately, I didn't find a way to "preserve" the actual input delimiter for output (OFS) but had to set it to a static (whitespace) character.

Maybe one of you guys knows a way to achieve this. :)

--Gunther
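
One possible way to keep the original delimiters, sketched under assumptions: instead of letting awk split the record on FS, walk the line with match() and copy each delimiter through unchanged. The name replace_word is made up for illustration, and the delimiter set [., ] is taken from the FS above, so db-12.airplane is still treated as two words here.

```shell
#!/bin/sh
# Sketch only: replace_word is a hypothetical helper name.
# Reads stdin (or the files given as arguments) and writes to stdout.
replace_word() {
  awk '{
    out = ""
    rest = $0
    # Peel off one token plus the delimiter that follows it,
    # until no delimiter is left in the remainder of the line.
    while (match(rest, /[., ]/)) {
      tok = substr(rest, 1, RSTART - 1)
      sep = substr(rest, RSTART, RLENGTH)
      if (tok == "airplane") tok = "helicopter"
      out = out tok sep            # the delimiter is copied as-is
      rest = substr(rest, RSTART + RLENGTH)
    }
    # Whatever follows the last delimiter is the final token.
    if (rest == "airplane") rest = "helicopter"
    print out rest
  }' "$@"
}

printf 'airplane,frogs db-12.airplane. someairplane\n' | replace_word
```

Because nothing is ever assigned to $i, awk never rebuilds the record with OFS, which is what caused the delimiters to collapse to a single space in the one-liner.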
# 19  
Old 04-19-2009
Skysmart, you have to be more precise about what your definition of "airplane alone" is.
Given that definition, it should be feasible to do the replacement with sed alone, no pun intended.

Usually the definition of "word" is: a sequence of alphanumeric characters surrounded by punctuation characters or by the start/end of the line.
Usually the definition of alphanumeric is: 0,1,...9,a,b,...z,A,B,...Z,_
Usually the definition of punctuation is: any non-alphanumeric character.

I understand that you assume that the character "-" is not punctuation.
Is there any other character that you deem not to be punctuation?

Code:
sed -r 's/(^|[^0-9a-zA-Z_-])airplane([^0-9a-zA-Z_-]|$)/\1helicopter\2/g'

Tested with GNU sed version 4.1.5.

-r makes sed understand extended regular expression syntax.
^ stands for the start of the line and $ stands for the end of the line; they are "anchors" and do not match any character.
| is the alternation operator; it matches either of the alternatives around it.
[] is a class of characters; this expression matches a single character; [^...] is the class of characters that are not in the class [...].
\1 and \2 are backreferences; they are required to keep the characters surrounding airplane.
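
A quick way to try the expression on a sample line, as a sketch: swap_standalone is a made-up wrapper name, and -E is used in place of -r on the assumption that your sed accepts it (GNU and BSD sed both do).

```shell
#!/bin/sh
# Sketch only: swap_standalone is a hypothetical wrapper around the command above.
# Assumption: the local sed accepts -E for extended regular expressions.
swap_standalone() {
  sed -E 's/(^|[^0-9a-zA-Z_-])airplane([^0-9a-zA-Z_-]|$)/\1helicopter\2/g'
}

# "-" is inside the excluded class, so db-airplane is left alone, while the
# "." around the last airplane counts as punctuation and allows the swap.
printf 'airplane.seriously db-airplane, db-12.airplane.\n' | swap_standalone
```

One thing to watch: each match consumes the delimiter on both sides, so two occurrences separated by a single character can leave the second one untouched in one pass. A common workaround is to apply the same substitution twice.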

Last edited by colemar; 04-19-2009 at 09:43 AM..
# 20  
Old 04-19-2009
Quote:
Originally Posted by SkySmart
...
you see what i'm saying? I don't want the code to touch anything in the file that isn't "airplane", alone. all i want is to replace places where the airplane stands alone.

thank you so much for your suggestions

If I understand the OP's request correctly, here is another way, with perl.

In my test file, many words contain the required pattern, but only the occurrences that are surrounded by whitespace, or that are found at the start or end of the record, are to be replaced:

Code:
$ cat file
data data data data airplane. data data data data data data airplane data data

data airplane. data airplanewomen, airplaneman, sweetairplane, sourairplane.

airplane data data data data data data data data data db-airplane, db-12.airplane.

data data airplane, data data data data data data wantairplane data airplane

data data data data airplane data data data data data .airplane data data data


Code:
perl -ne 's/(^|(?<=\s))airplane((?=\s)|$)/helicopter/g; print' file


Output:

Code:
data data data data airplane. data data data data data data helicopter data data

data airplane. data airplanewomen, airplaneman, sweetairplane, sourairplane.

helicopter data data data data data data data data data db-airplane, db-12.airplane.

data data airplane, data data data data data data wantairplane data helicopter

data data data data helicopter data data data data data .airplane data data data

Tested with perl v5.10.0 (Linux) and v5.8.4 (Solaris 10).
# 21  
Old 04-20-2009
rubin -- could you explain the pipe and stuff?
I don't see why you left off the < on the end-of-line expression,
or what the question mark is doing...

Code:
s/(^|(?<=\s))airplane((?=\s)|$)/helicopter/g;

Thanks,