Command line not recognizing metacharacters in awk


 
# 1  
Old 08-13-2014

Hello, I'm new to command-line coding (and coding in general) and have run into a problem. I'm using awk to perform a global find and replace on a text file in the Mac's built-in Terminal.

Here is a sample of my text file, where the fields are separated by tabs.

Code:
1 Ps 1,1  VWB/(J    VWB      VWB 
2 Ps 1,1  &WHJ==  WHJ      HJ==   
3 Ps 1,1  L             L          L      
4 Ps 1,1  GBR/~>   GBR>    GBR
5 Ps 1,1  D            D          D

I want to add a space after the digit that is preceded by a comma (this is because my columns become misaligned when there is a two-digit number after the comma in other parts of my data). To do this, I used the following:

Code:
awk '{gsub(",[0-9]",",[0-9] "); print}' textfile

However, I got the following output whereby the character class '[0-9]' was not recognized.

Code:
1 Ps 1,[0-9]   VWB/(J    VWB      VWB 
2 Ps 1,[0-9]   &WHJ==  WHJ      HJ==   
3 Ps 1,[0-9]   L             L          L      
4 Ps 1,[0-9]   GBR/~>   GBR>    GBR
5 Ps 1,[0-9]   D            D          D

Can someone please tell me what I'm doing wrong and how to set up my code and/or Terminal to recognize regex metacharacters?
Moderator's Comments:
Mod Comment: Unfortunately, the sample data that you posted here contains spaces instead of tabs.

Last edited by jvoot; 08-13-2014 at 08:55 PM.. Reason: Add CODE tags.
# 2  
Old 08-13-2014
Looks like you are using the gsub function incorrectly in your awk statement. I feel sed would be the best tool for this job.

Assuming you want your output to look like this (if this is not what you want your output to look like please provide sample output):

Code:
1 Ps 1 ,1 VWB/(J VWB VWB
2 Ps 1 ,1 &WHJ== WHJ HJ==
3 Ps 1 ,1 L L L
4 Ps 1 ,1 GBR/~> GBR> GBR
5 Ps 1 ,1 D D D

You can use the sed one-liner below, which will work on your input:

Code:
 sed 's/\([0-9]\),/\1 ,/g'  textfile

Reading your post again, it looks like you want your output like this instead:

Code:
1 Ps 1,1  VWB/(J VWB VWB
2 Ps 1,1  &WHJ== WHJ HJ==
3 Ps 1,1  L L L
4 Ps 1,1  GBR/~> GBR> GBR
5 Ps 1,1  D D D

In which case, change the sed one-liner to this:

Code:
sed 's/\(,[0-9]\)/\1 /g' textfile


Last edited by pilnet101; 08-13-2014 at 09:00 PM..
# 3  
Old 08-13-2014
Thanks so much for this pilnet101. Actually, I'm trying to get the space *after* the digit which follows the comma, that is to say, a space before the tab that delimits the field.

Ultimately, I'd like each record of the textfile to look like this:

Code:
1 Ps 1,1[space][tab] VWB/(J ...

I'm trying to add that [space] above to accommodate a future double digit, i.e.,

Code:
Ps 1,10[Tab]Field $2

So, I'm trying to add a single space when there is only one digit after the comma (,1 ) and not have that space when there are two digits after the comma (,11).

Thanks so much and I apologize for my inability to explain this sufficiently. I am very new to this.
# 4  
Old 08-13-2014
Got it, try this one:

Code:
sed 's/\(,[0-9]\) /\1  /g' textfile

# 5  
Old 08-13-2014
There are two things to consider here:
  1. if you only want to change one occurrence of something, use sub() instead of gsub(), and
  2. the problem you're seeing isn't the ERE; the real problem is the replacement string you're using. The [0-9] in the replacement string is literal text, not an RE. In the replacement text, an unescaped ampersand (&) is replaced by the string that was matched by the RE.
pilnet101 already showed you how to do this with sed, but if you're doing other stuff to your file in awk (that you haven't shown us), there is no reason to use both sed and awk. So, if you want to do it in awk try changing:
Code:
gsub(",[0-9]",",[0-9] ")

to:
Code:
sub(",[0-9]","& ")

The & would also simplify the sed command. Some versions of awk will allow you to use the backreferences pilnet101 used in sed in sub() calls in awk, but that is a non-standard extension that is not always available. (The awk utility uses extended regular expressions while the sed utility uses basic regular expressions.)
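
To make the difference concrete, here is a minimal sketch. The sample line is made up for illustration (with a real tab produced by printf), and the variable names are hypothetical. It runs the same sub() call twice, once with the literal replacement from the original attempt and once with &:

```shell
# Hypothetical one-line sample; printf turns \t into a real tab.
line=$(printf '1 Ps 1,1\tVWB/(J')

# Literal replacement: "[0-9]" is copied into the output as plain text.
literal=$(printf '%s\n' "$line" | awk '{sub(/,[0-9]/, ",[0-9] "); print}')

# "&" in the replacement stands for whatever the regex actually matched.
amp=$(printf '%s\n' "$line" | awk '{sub(/,[0-9]/, "& "); print}')

printf '%s\n%s\n' "$literal" "$amp"
```

The first result contains the literal text [0-9], as in the output you posted; the second keeps the matched digit and adds the space after it.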

If your input file uses tabs as field separators and you want to add a space before an existing tab while preserving the other tabs, change your script to:
Code:
awk -F '\t' '{sub(",[0-9]","& "); print}' OFS='\t' textfile

or, using sed:
Code:
sed 's/,[0-9]/& /' textfile
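
As a small illustration of point 1 above (sub() versus gsub()), the following sketch uses a made-up line that contains two comma-digit pairs; the data and variable names are assumptions for the example only:

```shell
# Hypothetical line with two comma-digit pairs separated by a tab.
line=$(printf 'Ps 1,1\tPs 2,3')

# sub() replaces only the first match on the line.
first_only=$(printf '%s\n' "$line" | awk '{sub(/,[0-9]/, "& "); print}')

# gsub() replaces every match on the line.
every_match=$(printf '%s\n' "$line" | awk '{gsub(/,[0-9]/, "& "); print}')

printf '[%s]\n[%s]\n' "$first_only" "$every_match"
```

With sub(), only the first pair gets the extra space; with gsub(), the second pair gets one too, so the line ends in a space. The brackets in the final printf are only there to make that trailing space visible.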

# 6  
Old 08-13-2014
Don, just one thing regarding your post - you would require a space after the [0-9] expression as per OP's requirement to "not have that space when there are two digits after the comma".
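
One way to sketch that requirement in awk is to test the end of the first field instead of calling sub() on the whole line. This assumes the file really is tab-separated and that the comma-digit pair always sits at the end of the first field (an assumption based on the sample data, not something confirmed in this thread); the sample lines and variable names are hypothetical:

```shell
# Hypothetical sample: one single-digit and one two-digit number after the comma.
sample=$(printf '1 Ps 1,1\tVWB\n2 Ps 1,10\tWHJ')

padded=$(printf '%s\n' "$sample" |
  awk 'BEGIN { FS = OFS = "\t" }
       $1 ~ /,[0-9]$/ { $1 = $1 " " }   # comma + exactly one digit at end of field 1
       { print }')

printf '%s\n' "$padded"
```

Because $1 is assigned, awk rebuilds the record using OFS, which is why OFS is set to a tab here. The line ending in ,1 gets a space before the tab, while the line ending in ,10 is left unchanged.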
# 7  
Old 08-13-2014
That's right, pilnet101. This resulted in a space after every digit that followed the comma, rather than only when there is a single digit.

I really really appreciate the help pilnet101 and Don!
