GSUB/Regex Help


 
# 1  
Old 08-21-2012

I am trying to write a gsub() regex that replaces a set of special characters with spaces, so I can split the result into an array and look at each word independently.

However, my regex skills are slightly lacking and I appear to be missing a quote or something here.

I am trying to replace the following characters \/;:'"`()| with a space.

Code:
gsub ( "[\\'|\\"|\\:|\\;|\\`|\\(|\\)|\\\|\\/|\\|]"," ",$0 )

When I run this piece of the code, I get the following error:

line 240: syntax error at line 289: `)' unexpected
# 2  
Old 08-21-2012
What OS are you on? Which shell?
# 3  
Old 08-21-2012
Can you please try this:

Code:
awk '{gsub(/[\|\(\)\\\/\;\:\47\`\"]*/," ",$0);}1' input_file

Caution: this might be a naive solution, since I am a beginner here.
\47 stands for ' (the single quote); it is an octal value.
# 4  
Old 08-21-2012
That almost works. You don't need (or want) the asterisk in a call to gsub(). With the asterisk, the bracket expression matches every string of zero or more of the listed characters, including the empty string, which in this case effectively adds a space before any of the characters in the input that aren't in the list. You have several backslash characters escaping characters that aren't special inside a bracket expression, but they shouldn't hurt anything. Also, if the last argument to gsub() is left off, it uses $0 as a default.

The following line works:
Code:
awk '{gsub(/[|()\\\/;:\47`"]/," ");}1' input_file
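To see the effect of the asterisk for yourself, here is a small sketch that runs both versions on the same line. The sample text and the variable names are made up for this illustration, and the exact placement of the extra spaces in the asterisk version may differ between awk implementations, so only the version without the asterisk is described precisely.

```shell
# Hypothetical sample line, chosen only for this illustration.
line='a(b);c'

# With the asterisk: the regex can also match the empty string,
# so spaces may appear where no listed character was present.
with_star=$(printf '%s\n' "$line" | awk '{gsub(/[|()\\\/;:\47`"]*/," ")}1')

# Without the asterisk: each listed character becomes exactly one space.
without_star=$(printf '%s\n' "$line" | awk '{gsub(/[|()\\\/;:\47`"]/," ")}1')

printf 'with *:    [%s]\nwithout *: [%s]\n' "$with_star" "$without_star"
```

Without the asterisk, the three listed characters in the sample each turn into one space, which is what you want before splitting the line into words.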

Note, however, that this solution won't work on a system with EBCDIC as the codeset for the C locale. (I think IBM still supports systems like this.) On a system using EBCDIC, you'd need to use \175 instead of \47. If you want to put this in an awk program file (where the script won't have quote processing performed by the shell before awk sees it), the following line should work:

script:
Code:
 {gsub(/[|()\\\/;:'`"]/," ");print}

Code:
awk -f script input_file

will work without codeset dependencies.
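As a rough sketch of the program-file approach, the following creates the file from a shell script using a quoted here-document, so the shell leaves the single-quote alone. The temporary file and the sample input are assumptions made for this example only.

```shell
# Hypothetical temporary file holding the awk program.
script=$(mktemp)

# The quoted delimiter ('EOF') stops the shell from interpreting
# quotes, backquotes, and backslashes inside the here-document.
cat > "$script" <<'EOF'
{gsub(/[|()\\\/;:'`"]/," ");print}
EOF

# Made-up input line containing a literal single-quote and parentheses.
result=$(printf '%s\n' "it's (a) test" | awk -f "$script")
printf '[%s]\n' "$result"

rm -f "$script"
```

Because awk reads the program from a file, the literal ' reaches awk unchanged and no octal escape is needed.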

# 5  
Old 08-22-2012
Superb, Don! Your explanation will be useful to many people here who, like me, are trying to learn the best way to use awk or any other programming language. Thanks a million.
# 6  
Old 08-22-2012
Don

Thanks for the tips. I have it working on my HP boxes and Red Hat boxes now.
I am running this as part of a ksh script, so I still need the quote processing.

Here is the current code (I added a few more symbols).
Code:
gsub ( "['\)''\('=;:/'\'''\`''\\''\"''\|''\.''\$''\-''\@''\%']"," ",$0 )

I'm not sure why, but Solaris does not like this. I keep getting the following error:

Quote:
/usr/xpg4/bin/awk: line 8 (NR=1): /[)(=;:/'`"|.$-@%]/: invalid endpoint in range
# 7  
Old 08-22-2012

Just to be sure I understand what you're saying, you have a ksh shell script that at some point contains something like:
Code:
awk 'first line of awk program
second line of awk program
third line of awk program
fourth line of awk program
fifth line of awk program
sixth line of awk program
seventh line of awk program
pattern {gsub ( "['\)''\('=;:/'\'''\`''\\''\"''\|''\.''\$''\-''\@''\%']"," ",$0 )}
possibly more lines of awk program'
possibly followed by more lines in your ksh shell script

and you have chosen to use the call to gsub() shown above rather than the suggestion I made in an earlier post:
Code:
gsub(/[|()\\\/;:\47`"]/," ")

because you now also want to change the characters <period>, <dollar-sign>, <hyphen>, <at-sign>, and <percent-sign> to a <space> in addition to the characters you were changing before. Is that correct?

Note that having $-@ in a bracket expression in the 1st argument to gsub after quote removal is a range expression matching <dollar-sign>, <at-sign> and everything that comes between them in your current locale definition. In the POSIX locale, that should match the following characters
Code:
$%&'()*+,-./0123456789:;<=>?@

not just the $, -, and @.
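A quick way to check the difference is to run the same input through a bracket expression with the hyphen at the end and one with the hyphen between $ and @. This is only a sketch: the sample string and variable names are invented, and LC_ALL=C is set so the range follows the POSIX locale ordering described above.

```shell
# Hypothetical sample containing $, -, @ and a digit.
sample='a$b-c@d5e'

# Hyphen last in the bracket expression: it is a literal hyphen.
literal=$(printf '%s\n' "$sample" | LC_ALL=C awk '{gsub(/[$@%-]/," ")}1')

# Hyphen between $ and @: it forms a range, which also covers digits.
ranged=$(printf '%s\n' "$sample" | LC_ALL=C awk '{gsub(/[$-@%]/," ")}1')

printf 'literal hyphen: [%s]\nrange $-@:      [%s]\n' "$literal" "$ranged"
```

In the first result the digit 5 survives; in the second it is replaced, because it falls inside the range.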

I put together an input file to use to test various calls to gsub:

in.gsubspecial:
Code:
backslash[\] slash[/] semi[;] colon[:]
single-quote['] double-quote["] back-quote[`]
open-paren[(] close-paren[)] open-brace[{] close-brace[}]
dollar-sign[$] at-sign[@] percent[%] hyphen[-] 
digits[0123456789]
range-expression[$%&'()*+,-./0123456789:;<=>@]

and used the following commands in a shell script to test out three sample
gsub() calls I produced and the gsub call you have above.
Code:
awk ' { #print input line
        print
        #make copies
        x=$0
        y=$0
        z=$0
        #Previously suggested gsub, extended with the characters $, @, %, and -
        #This version uses \47 to represent the single-quote
        gsub(/[|()\\\/;:\47`"$@%-]/," ");print $0,"my original gsub"
        #The following versions use '\'' to get out of the quoted string
        #  containing the program, insert an escaped quote, and get back into
        #  the quoted string containing the rest of the program (which gets rid
        #  of the codeset dependency).
        #Previous gsub with added characters and no range expression
        #Note that - is at the end of the bracket expression
        gsub("[|()\\\\\/;:'\''`\"$@%-]"," ", x);print x,"expanded gsub, no range exp"
        #Previous gsub with added characters using a range expression
        #Note that - is in between $ and @
        gsub("[|()\\\\\/;:'\''`\"$-@%]"," ", y);print y,"Expanded gsub w/range"

        gsub("['\)''\('=;:/'\'''\`''\\''\"''\|''\.''\$''\-''\@''\%']"," ",z);print z,"gsub from nitrobass24"
}' in.gsubspecial

When I run this script, I get the following output:
Code:
backslash[\] slash[/] semi[;] colon[:]
backslash[ ] slash[ ] semi[ ] colon[ ] my original gsub
backslash[ ] slash[ ] semi[ ] colon[ ] expanded gsub, no range exp
backslash[ ] slash[ ] semi[ ] colon[ ] Expanded gsub w/range
backslash[\] slash[ ] semi[ ] colon[ ] gsub from nitrobass24
single-quote['] double-quote["] back-quote[`]
single quote[ ] double quote[ ] back quote[ ] my original gsub
single quote[ ] double quote[ ] back quote[ ] expanded gsub, no range exp
single quote[ ] double quote[ ] back quote[ ] Expanded gsub w/range
single quote[ ] double quote[ ] back quote[ ] gsub from nitrobass24
open-paren[(] close-paren[)] open-brace[{] close-brace[}]
open paren[ ] close paren[ ] open brace[{] close brace[}] my original gsub
open paren[ ] close paren[ ] open brace[{] close brace[}] expanded gsub, no range exp
open paren[ ] close paren[ ] open brace[{] close brace[}] Expanded gsub w/range
open paren[ ] close paren[ ] open brace[{] close brace[}] gsub from nitrobass24
dollar-sign[$] at-sign[@] percent[%] hyphen[-]
dollar sign[ ] at sign[ ] percent[ ] hyphen[ ] my original gsub
dollar sign[ ] at sign[ ] percent[ ] hyphen[ ] expanded gsub, no range exp
dollar sign[ ] at sign[ ] percent[ ] hyphen[ ] Expanded gsub w/range
dollar sign[ ] at sign[ ] percent[ ] hyphen[ ] gsub from nitrobass24
digits[0123456789]
digits[0123456789] my original gsub
digits[0123456789] expanded gsub, no range exp
digits[          ] Expanded gsub w/range
digits[          ] gsub from nitrobass24
range-expression[$%&'()*+,-./0123456789:;<=>@]
range expression[  &   *+, . 0123456789  <=> ] my original gsub
range expression[  &   *+, . 0123456789  <=> ] expanded gsub, no range exp
range expression[                            ] Expanded gsub w/range
range expression[                            ] gsub from nitrobass24
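The '\'' quoting used in the test script can also be looked at in isolation. In this minimal sketch (the input word and the variable name are assumptions for illustration), the shell joins three pieces into one argument: the quoted text up to the bracket, an escaped single-quote, and the quoted remainder.

```shell
# The awk program that awk actually receives is:  {gsub("[']"," ")}1
# '\'' closes the quoted string, adds an escaped quote, and reopens it.
out=$(printf '%s\n' "don't" | awk '{gsub("['\'']"," ")}1')

printf '[%s]\n' "$out"
```

This keeps the program inline in the shell script while avoiding the octal escape and its codeset dependency.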

I don't know why your gsub() wouldn't work with /usr/xpg4/bin/awk on Solaris 10. (Is there any chance that you're using a locale with a non-standard setting for the LC_COLLATE category? Are you sure that you are using exactly the same script on Solaris 10 that you're using on the other systems? Having another single-quote anywhere in your awk script [even in a comment] could greatly change the behavior.) I do see that your gsub() call fails to change a backslash character into a space. If you intended to use $-@ as a range expression, we can get rid of several characters in the bracket expression that are listed individually but are also included in the range (including the single-quote).

Hopefully, this gives you something you can adapt to your needs.