Command line not recognizing metacharacters in awk

08-13-2014

Hello, I'm new to command line coding (and coding in general) and have run into a problem. I'm using awk to perform a global find and replace in a text file in the Terminal provided by Mac.

Here is a sample of my textfile where the fields are separated by tabs.

Code:

1 Ps 1,1  VWB/(J    VWB      VWB
2 Ps 1,1  &WHJ==  WHJ      HJ==   
3 Ps 1,1  L             L          L      
4 Ps 1,1  GBR/~>   GBR>    GBR
5 Ps 1,1  D            D          D

I want to add a space after the digit that is preceded by a comma (this is because my columns become misaligned when there is a two-digit number after the comma in other parts of my data). To do this, I used the following:

Code:

awk '{gsub(",[0-9]",",[0-9] "); print}' textfile

However, I got the following output whereby the character class '[0-9]' was not recognized.

Code:

1 Ps 1,[0-9]   VWB/(J    VWB      VWB 
2 Ps 1,[0-9]   &WHJ==  WHJ      HJ==   
3 Ps 1,[0-9]   L             L          L      
4 Ps 1,[0-9]   GBR/~>   GBR>    GBR
5 Ps 1,[0-9]   D            D          D

Can someone please tell me what I'm doing wrong and how to set up my code and/or Terminal to recognize regex metacharacters?

Moderator's Comments:

Unfortunately, the sample data that you posted here contains spaces instead of tabs.

Last edited by jvoot; 08-13-2014 at 08:55 PM.. Reason: Add CODE tags.

jvoot

08-13-2014

Looks like you are using the gsub function incorrectly in your awk statement. I feel sed would be the best tool for this job.

Assuming you want your output to look like this (if this is not what you want your output to look like please provide sample output):

Code:

1 Ps 1 ,1 VWB/(J VWB VWB
2 Ps 1 ,1 &WHJ== WHJ HJ==
3 Ps 1 ,1 L L L
4 Ps 1 ,1 GBR/~> GBR> GBR
5 Ps 1 ,1 D D D

You can use the below sed one liner which will work on your input:

Code:

sed 's/\([0-9]\),/\1 ,/g' textfile

Reading your post again, it looks like you want your output like this instead:

Code:

1 Ps 1,1  VWB/(J VWB VWB
2 Ps 1,1  &WHJ== WHJ HJ==
3 Ps 1,1  L L L
4 Ps 1,1  GBR/~> GBR> GBR
5 Ps 1,1  D D D

In which case change the sed one liner to this:

Code:

sed 's/\(,[0-9]\)/\1 /g' textfile

Last edited by pilnet101; 08-13-2014 at 09:00 PM..

pilnet101

08-13-2014

Thanks so much for this pilnet101. Actually, I'm trying to get the space *after* the digit which follows the comma, that is to say, a space before the tab that delimits the field.

Ultimately, I'd like each record of the textfile to look like this:

Code:

1 Ps 1,1[space][tab] VWB/(J ...

I'm trying to add that [space] above to accommodate a future double digit, i.e.,

Code:

Ps 1,10[Tab]Field $2

So, I'm trying to add a single space when there is only one digit after the comma (,1 ) and not have that space when there are two digits after the comma (,11).

Thanks so much and I apologize for my inability to explain this sufficiently. I am very new to this.

jvoot

08-13-2014

Got it, try this one:

Code:

sed 's/\(,[0-9]\) /\1  /g' textfile
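
Note that the pattern above looks for a literal space after the digit, so it only matches data like the posted sample. If the real file uses tabs, a tab-aware variant may be needed. The following is only a sketch, assuming a POSIX shell; the tab variable and the two sample records fed through printf are made up for illustration and are not from the original post:

```shell
# Hold a literal tab in a variable; not every sed understands \t in a pattern.
tab=$(printf '\t')

# Two made-up records: one digit after the comma, then two digits.
result=$(printf '1 Ps 1,1\tVWB\n2 Ps 1,10\tWHJ\n' |
  sed "s/\(,[0-9]\)$tab/\1 $tab/")

printf '%s\n' "$result"
```

Only the first record should gain a space before its tab, since in the second record the digit after the comma is followed by another digit rather than a tab.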

pilnet101

08-13-2014

There are two things to consider here:

If you only want to change one occurrence of something, use sub() instead of gsub().
The problem you're seeing isn't the ERE; the real problem is the replacement string you're using. The [0-9] in the replacement string is literal text, not an RE. In the replacement text, an unescaped ampersand (&) is replaced by the string that was matched by the RE.
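
The difference between a literal replacement and an ampersand can be seen on a single line. This is a minimal sketch: the sample record and the variable names (line, literal, matched) are made up for illustration, and the input is fed through printf rather than read from the real file:

```shell
# One made-up record standing in for a line of the real file.
line='1 Ps 1,1 VWB'

# [0-9] in the replacement is literal text, so it is copied into the output.
literal=$(printf '%s\n' "$line" | awk '{sub(",[0-9]", ",[0-9] "); print}')

# An unescaped & in the replacement stands for whatever the regex matched.
matched=$(printf '%s\n' "$line" | awk '{sub(",[0-9]", "& "); print}')

printf '%s\n%s\n' "$literal" "$matched"
```

The first result should contain the text ,[0-9] verbatim, while the second keeps the matched ,1 and adds a space after it.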

pilnet101 already showed you how to do this with sed, but if you're doing other stuff to your file in awk (that you haven't shown us), there is no reason to use both sed and awk. So, if you want to do it in awk try changing:

Code:

gsub(",[0-9]",",[0-9] ")

to:

Code:

sub(",[0-9]","& ")

The & would also simplify the sed command. Some versions of awk will allow you to use the backreferences pilnet101 used in sed in sub() calls in awk, but that is a non-standard extension that is not always available. (The awk utility uses extended regular expressions, while the sed utility uses basic regular expressions.)

If you have tabs in your input file as field separators and want to add a space before an existing tab in your input and preserve other tabs in your input, change your script to:

Code:

awk -F '\t' '{sub(",[0-9]","& "); print}' OFS='\t' textfile

or, using sed:

Code:

sed 's/,[0-9]/& /' textfile

Don Cragun

08-13-2014

Don, just one thing regarding your post - you would need a space after the [0-9] expression, as per the OP's requirement to "not have that space when there are two digits after the comma".
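
One way to meet that requirement in awk alone, without backreferences, might be to test the end of the first tab-separated field. This is only a sketch: it assumes the file really is tab-delimited and that the comma and digit sit at the end of the first field, and the sample records and the variable name fixed are made up for illustration:

```shell
# Made-up records: first field ends in one digit, then in two digits.
fixed=$(printf '1 Ps 1,1\tVWB\n2 Ps 1,10\tWHJ\n' |
  awk 'BEGIN { FS = OFS = "\t" }
       # Pad only when the first field ends in a comma plus exactly one digit.
       $1 ~ /,[0-9]$/ { $1 = $1 " " }
       { print }')

printf '%s\n' "$fixed"
```

Because the pattern is anchored to the end of the field with $, a two-digit value such as ,10 is left untouched, and setting OFS to a tab keeps the remaining field separators as tabs when the record is rebuilt.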

pilnet101

08-13-2014

That's right pilnet101. This resulted in a space after every digit that followed the comma, rather than just when there is one digit.

I really really appreciate the help pilnet101 and Don!

jvoot
