awk search pattern with special characters passed from CL


 
# 1  
02-22-2010

I'm very new to awk and sed and I've been struggling with this for a while.

I'm trying to search a file for a string that contains special characters; the string is passed as a command-line argument to a simple script.

./myscript "searchpattern" file

Code:
#!/bin/sh

awk "/$1/" $2 > dupelistfilter.txt
sed "/$1/d" $2 >> deletelisttest.txt

Since the search pattern is a path containing / and whitespace, what's the best way to deal with special characters in it?
I have to do this in the original Bourne shell, which has no printf %q.

Nesting quotes throws an awk error at me, even if I use eval to get around the $1.

Any advice appreciated.
# 2  
02-22-2010
Code:
$ x="/tmp/tmp"
$ echo "/tmp/tmp121/" | awk ' $0 ~ "'"$x"'" '
/tmp/tmp121/

# 3  
02-22-2010
Hey, cue:

I would recommend against your awk/sed approach, and I also recommend not using anbu23's code (nothing personal, anbu23). All of the proposed solutions are vulnerable to special characters, whether they are special to sed and awk regular expressions (cue's examples) or to awk strings (anbu23's case).

You've already seen the problem with your attempts. Even if you escaped the forward slashes in the variable's value, or used a different delimiter (sed allows \#regexp# in an address such as your /$1/d, and s#regexp#replacement#flags in a substitution), you could still run into trouble with a ".", a "*", or any other metacharacter.
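A small sketch of the delimiter point (the two sample paths are made up): in a sed address, \#...# replaces / as the delimiter, so the slashes in a path no longer end the expression early, but the dot is still a regular-expression wildcard and the pattern also matches a path it shouldn't.

```shell
#!/bin/sh
# Two made-up paths that differ only where the pattern has a dot.
sample() {
    printf '%s\n' '/tmp/tmp.ext' '/tmp/tmpnext'
}

# "\#...#" is a sed address that uses "#" instead of "/" as its delimiter,
# so the slashes in the path need no escaping. The "." is still a
# metacharacter, though, so this prints BOTH lines, including the false
# positive /tmp/tmpnext.
sample | sed -n '\#/tmp/tmp.ext#p'
```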

anbu23's version fails with a syntax error if the value contains a double quote, and matches erroneously if it contains backslash sequences, because the value is parsed as part of an AWK string literal.

Example:
Code:
$ x='/tmp/tmp"'
$ echo '/tmp/tmp"121/' | awk ' $0 ~ "'"$x"'" '
awk: non-terminated string  tmp/tmp ... at source line 1
 context is
         >>>  <<<
awk: giving up
 source line number 2



In my opinion, the best (most future-proof) approach is to use something that doesn't treat the search string as a regular expression at all. I suggest:
Code:
awk -v x="$1" 'index($0,x)' "$2" > dupelistfilter.txt

If you want to negate the logic of the match:
Code:
awk -v x="$1" '!index($0,x)' "$2" >> deletelisttest.txt
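The sketch below (sample paths made up) runs the index() approach on the two inputs that broke the earlier versions, a double quote and a dot. One caveat: awk still processes backslash escape sequences in a -v assignment, so a value containing \t arrives as a tab. The second function is a hypothetical variant that passes the value through the environment and reads it with ENVIRON, which awk does not escape-process; it assumes an awk with ENVIRON support (POSIX awk, nawk, gawk and mawk have it; the old pre-nawk awk may not).

```shell
#!/bin/sh
# Print the lines of standard input that contain $1 as a literal substring.
# -v hands the value to awk without splicing it into the program text.
contains_v() {
    awk -v x="$1" 'index($0, x)'
}

# Hypothetical variant: pass the value through the environment instead, so
# awk does not interpret backslash escapes in it (requires ENVIRON support).
contains_env() {
    PAT=$1 awk 'index($0, ENVIRON["PAT"])'
}

# Neither the double quote nor the dot is special to index(); each command
# prints only the first of its two input lines.
printf '%s\n' '/tmp/tmp"121/' '/tmp/tmpnext' | contains_v '/tmp/tmp"'
printf '%s\n' '/tmp/tmp.ext' '/tmp/tmpnext' | contains_v '/tmp/tmp.ext'

# With -v the "\t" below becomes a tab, so nothing matches; ENVIRON keeps
# the backslash and the line is printed.
printf '%s\n' 'C:\temp' | contains_v 'C:\temp'
printf '%s\n' 'C:\temp' | contains_env 'C:\temp'
```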

Regards,
Alister
# 4  
02-22-2010
If the input has quotes, then:
Code:
echo '/tmp/tmp"121/' | awk -v x='/tmp/tmp"' ' $0 ~ x '

# 5  
02-22-2010
Quote:
Originally Posted by anbu23
If the input has quotes, then:
Code:
echo '/tmp/tmp"121/' | awk -v x='/tmp/tmp"' ' $0 ~ x '

In that case, the value of x inside AWK is vulnerable to regular expression metacharacters. Say, for example, that you wanted to match a pathname that had a dot. The dot would not be treated literally, but would be a wildcard matching any character. In the following example, it yields a false positive.

Code:
$ echo '/tmp/tmpnext' | awk -v x='/tmp/tmp.ext' ' $0 ~ x '
/tmp/tmpnext

There's simply no way around it. Unless you are absolutely certain that no metacharacters will be involved, you cannot pass a value through sed's or AWK's regular-expression parser (or AWK's string parser) without first passing it through some sort of sanitizing step that properly escapes those special characters. That would be something of a nightmare if the value had to be made safe for AWK's string parsing before it even reached the regular-expression parsing stage.
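To give an idea of what even the simplest sanitizing step involves, here is a sketch that escapes a value for use in a sed address. The helper name bre_escape is hypothetical; it backslash-escapes the characters that are special in a POSIX basic regular expression plus the / delimiter. It assumes a POSIX shell (for $(...)) rather than the original Bourne shell, and it does not handle a value containing a newline. Making the result safe for AWK's string parser as well would need yet another layer of escaping on top.

```shell
#!/bin/sh
# Hypothetical helper: escape a string for use inside a sed /.../ address.
# Step 1 doubles every backslash; step 2 puts a backslash in front of the
# other BRE metacharacters ( [ . * ^ $ ) and the "/" delimiter.
bre_escape() {
    printf '%s\n' "$1" | sed -e 's|\\|\\\\|g' -e 's|[[/.*^$]|\\&|g'
}

esc=$(bre_escape '/tmp/tmp.ext')   # \/tmp\/tmp\.ext

# The escaped address deletes only the literal path, leaving /tmp/tmpnext.
printf '%s\n' '/tmp/tmp.ext' '/tmp/tmpnext' | sed "/$esc/d"
```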

Alister

---------- Post updated at 01:18 PM ---------- Previous update was at 01:15 PM ----------

cue:

Now that I think about it, by far the simplest solution to this is fgrep. I became fixated on AWK and sed because they were mentioned in the original post. Unless I've missed something, the following should work just fine and is not susceptible to metacharacter interference.

Code:
fgrep "$1" "$2" > dupelistfilter.txt
fgrep -v "$1" "$2" >> deletelisttest.txt
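For reference, POSIX specifies fixed-string matching as grep -F, and recent GNU grep warns that fgrep is obsolescent. A minimal sketch of the same two commands in that spelling follows; the function names are hypothetical. The -e option keeps a pattern that starts with "-" from being read as an option, and -- does the same for the file name. One remaining edge case: a pattern containing a newline is treated as several patterns.

```shell
#!/bin/sh
# Hypothetical helpers equivalent to the two fgrep lines.
# -F: match the pattern as a fixed string (no metacharacters at all)
# -e: the next argument is the pattern, even if it starts with "-"
# --: end of options, so a file name starting with "-" is safe too
keep_matches() {
    grep -F -e "$1" -- "$2"
}

drop_matches() {
    grep -F -v -e "$1" -- "$2"
}

# Usage inside the original script would be:
#   keep_matches "$1" "$2" > dupelistfilter.txt
#   drop_matches "$1" "$2" >> deletelisttest.txt
```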

# 6  
02-22-2010
Thanks, anbu23 and alister, for the suggestions. It seems there is always one character that can cause a problem. As long as that particular metacharacter isn't permitted in a filename, it's fine for my use of the script, even if it isn't fully sanitized. Thanks again.


Edit: never mind, you're right; I should have just used grep/fgrep.

Last edited by cue; 02-22-2010 at 02:50 PM.
# 7  
02-22-2010
Quote:
Originally Posted by cue
alister, can you elaborate on this part of the command? What exactly does it do?
index($0,x)
It checks whether the string you're searching for (stored in the variable x) is present in the current line (stored in $0). If so, index() returns the position where the string starts, a non-zero value, which AWK treats as boolean true. If the string is not found, index() returns zero (false). When the result is true, AWK prints the line, because the implied default action is {print $0}.
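The return values described above can be checked directly; the strings here are arbitrary examples. index() returns the 1-based position of the first occurrence, or 0 when the substring is absent, and a pattern with no action prints the line whenever it evaluates to non-zero:

```shell
#!/bin/sh
# index(s, t) returns the 1-based position of t in s, or 0 if t is absent.
awk 'BEGIN {
    print index("/tmp/tmp.ext", ".")    # 9: the dot is just a character here
    print index("/tmp/tmp.ext", "xyz")  # 0: not found, i.e. false
}'

# A pattern with no action: the line is printed when index() is non-zero,
# so only "abc" is printed.
printf '%s\n' 'abc' 'xyz' | awk -v x='b' 'index($0, x)'
```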

My post with the awk solutions included two commands; the second negates the return value with "!", so that it excludes matching lines (which is what you are doing with sed's d command).

All that said, you're probably best off using the fgrep commands at the end of my previous post.

Cheers,
Alister

10 More Discussions You Might Find Interesting

1. UNIX for Beginners Questions & Answers

Replace Pattern with another that has Special Characters

Hello Team, Any help would be much appreciated for the below scenario: I have a sed command below where I am trying to replace the contents of 'old_pkey' variable with 'new_pkey' variable in a Soap request file (delete_request.txt). This works fine for regular string values, but this new_pkey... (8 Replies)
Discussion started by: ChicagoBlues
8 Replies

2. UNIX for Dummies Questions & Answers

Search special characters in a file and replace with meaningful text messages like Hello

Search special characters in a file and replace with meaningful text messages like Hello (2 Replies)
Discussion started by: raka_rjit
2 Replies

3. Shell Programming and Scripting

Search avoiding special characters

Hi all, I have a list which I want to search in another file. I can do that using grep -f but the search is failing due to special characters, how do I solve this? One row in that list is amino-acid permease inda1 gb|EDU41782.1| amino-acid permease inda1 Input file to be searched... (2 Replies)
Discussion started by: gina.lizar
2 Replies

4. Shell Programming and Scripting

Sed or awk : pattern selection based on special characters

Hello All, I am here again scratching my head on pattern selection with special characters. I have a large file having around 200 entries and i have to select a single line based on a pattern. I am able to do that: Code: cat mytest.txt | awk -F: '/myregex/ { print $2}' ... (6 Replies)
Discussion started by: usha rao
6 Replies

5. Shell Programming and Scripting

SED equivalent for grep -w -f with pattern having special characters

I'm looking for SED equivalent for grep -w -f. All I want is to search a list of patterns from a file. Also If the pattern doesn't match I do not want "null returned", rather I would prefer some text as place holder say "BLANK LINE" as I intend to process the output file based on line number. ... (1 Reply)
Discussion started by: novice_man
1 Reply

6. Shell Programming and Scripting

sed delete pattern with special characters

Hi all, I have the following lines <b>A gtwrhwrthwr text hghthwrhtwrtw </b><font color='#06C'>; text text (text) <b>B gtwrhwrthwr text hghthwrhtwrtw </b><font color='#06C'>; text text (text) <b>J gtwrhwrthwr text hghthwrhtwrtw </b><font color='#06C'>; text text (text) and I would like to... (5 Replies)
Discussion started by: stinkefisch
5 Replies

7. AIX

Removing a filename which has special characters passed from a pipe with xargs

Hi, On AIX 5200-07-00 I have a find command as following to delete files from a certain location that are more than 7 days old. I am being told that I cannot use -exec option to delete files from these directories. Having said that I am more curious to know how this can be done. an sample... (3 Replies)
Discussion started by: jerardfjay
3 Replies

8. Shell Programming and Scripting

NAWK - seach pattern for special characters - } dbl qt - sng qt

i'm puzzled.... trying to look for the pattern }"'. but the below code returns to me the message below (pattern is curley queue + dbl qt + sng qt + period) nawk -v pat="\}\"\'\."' { if (match($0, pat)) { before = substr($0,1,RSTART-1); ... (11 Replies)
Discussion started by: danmauer
11 Replies

9. Shell Programming and Scripting

Perl code to search for filenames that contain special characters

Hello, I have a requirement to search a directory, which contains any number of other directories for file names that contain special characters. directory structure DIR__ |__>DIR1 |__>DIR2__ |__>DIR2.1 |__>DIR2.2 |__>DIR3 .. ... (8 Replies)
Discussion started by: jerardfjay
8 Replies

10. UNIX for Dummies Questions & Answers

search special characters in a file

Hello I am new to shell scripting and can anyone tell me how to check if there are any special characters in a file. Can i use grep ? thanks susie (2 Replies)
Discussion started by: cramya80
2 Replies