Gsub function in awk


 
# 1  
Old 07-11-2013

Hello, I am having trouble understanding the gsub function, and maybe the regex, in this script, which is meant to remove all punctuation:
Code:
awk  'gsub(/[^a-zA-Z0-9_ \t]/, " ", $0)'  text.txt

File text.txt:
Code:
This is a test for gsub
I typed this random text file
which contains punctuation like ,.;!'"?/\ etc.
The script should remove all the punctuations

The output is only one line:
Code:
which contains punctuation like        etc

What did I miss with gsub or the regex? Thanks a lot!

Last edited by yifangt; 07-11-2013 at 04:32 PM..
# 2  
Old 07-11-2013
You could do:
Code:
awk '{gsub(/[^[:alnum:][:blank:]]/, ""); print}' file

For further reference: Character Classes

By the way, the problem with your approach is that you put the gsub call in the condition section; you should put it in the action section instead:
Code:
awk  '{gsub(/[^a-zA-Z0-9_ \t]/, " ", $0); print}' file

If gsub is in the condition section, awk prints only those lines on which gsub made at least one substitution.
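To make the difference concrete, here is a minimal sketch comparing the two forms. The three input lines are made up for illustration and are not from the original text.txt; only the second one contains punctuation.

```shell
# Hypothetical sample input: only the second line contains punctuation.
input=$(printf '%s\n' 'no punctuation here' 'some, punctuation!' 'plain again')

# gsub() as the condition: a line is printed only when gsub() returns non-zero.
as_condition=$(printf '%s\n' "$input" | awk 'gsub(/[^a-zA-Z0-9_ \t]/, " ")')

# gsub() in the action: every line is printed, whether it changed or not.
as_action=$(printf '%s\n' "$input" | awk '{ gsub(/[^a-zA-Z0-9_ \t]/, " "); print }')

printf 'condition form:\n%s\n\naction form:\n%s\n' "$as_condition" "$as_action"
```

The condition form should print only the middle line, while the action form should print all three.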
# 3  
Old 07-11-2013
Yoda's suggestion still seems overly complex for what was requested. Why not just:
Code:
awk '{gsub(/[[:punct:]]/,""); print}' file

If you're going to try this on a Solaris system, use /usr/xpg4/bin/awk, /usr/xpg6/bin/awk, or nawk instead of /usr/bin/awk or /bin/awk.
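One thing to keep in mind is that [[:punct:]] also matches the underscore, which the regex in the original question kept. A small sketch of the difference follows; the sample line is invented for illustration, and it assumes an awk that supports POSIX character classes and the C/POSIX locale.

```shell
# Hypothetical sample line containing an underscore and other punctuation.
line='snake_case, with "quotes" & more!'

# [[:punct:]] deletes every punctuation character, underscore included.
stripped=$(printf '%s\n' "$line" | awk '{ gsub(/[[:punct:]]/, ""); print }')

# The negated set from the original question keeps letters, digits, "_", space and tab.
kept=$(printf '%s\n' "$line" | awk '{ gsub(/[^a-zA-Z0-9_ \t]/, ""); print }')

printf '%s\n%s\n' "$stripped" "$kept"
```

Both commands delete the matches here rather than replacing them with a space, so neighbouring spaces are left as they were.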
# 4  
Old 07-11-2013
Can you post what you expect the output to look like?
# 5  
Old 07-11-2013
Thanks a lot Yoda, and all!

Yes, the difference between the condition and the action is what I missed. However, when I ran:
Code:
awk 'gsub(/[[:punct:]_[:blank:]]/, " ", $0)' text.txt

it worked fine:
Code:
This is a test for gsub
I typed this random text file
which contains punctuation like           etc 
The script should remove all the punctuations

Why does this work? Thank you again!

Last edited by yifangt; 07-11-2013 at 04:32 PM..
# 6  
Old 07-11-2013
When a condition has no action and evaluates to true (a non-empty string or a non-zero numeric value), the action defaults to printing the current line. gsub() returns the number of substitutions performed, and every one of your input lines contains a space character. Because [:blank:] matches each space and gsub() replaces it with a space, the return value is non-zero on every line, so every line is printed, which got you what you wanted. The underscore in your bracket expression is redundant, since underscore is already a punctuation character.

Try your script again when your input file contains an empty line (nothing but a newline) and a line that contains a single word with no leading or trailing spaces or punctuation to see the difference.
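As a rough sketch of that experiment, the following prints gsub()'s return value in front of each line. The three input lines are hypothetical, and the example assumes an awk with POSIX character class support.

```shell
# Hypothetical input: a two-word line, an empty line, and a single word.
input=$(printf '%s\n' 'two words' '' 'fox')

# Print the number of substitutions gsub() made, followed by the resulting line.
counts=$(printf '%s\n' "$input" |
  awk '{ n = gsub(/[[:punct:]_[:blank:]]/, " "); printf "%d:%s\n", n, $0 }')

printf '%s\n' "$counts"
```

Lines reported with a count of 0 are the ones the condition-only form would silently drop.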
# 7  
Old 07-11-2013
Thanks Don!
Your explanation couldn't be more helpful. I tried your suggestion with a blank line and a one-word line, and I saw the difference. I am posting it here in case others need it.
Code:
 $ awk 'gsub(/[[:punct:]_[:blank:]]/, " ", $0)' text.txt
This is a test for gsub
I typed this random text file
which contains punctuation like           etc 
The script should remove all the punctuations

Code:
$ awk '{gsub(/[[:punct:]_[:blank:]]/, " ", $0); print}' /tmp/text.txt 
This is a test for gsub
I typed this random text file
which contains punctuation like           etc 
The script should remove all the punctuations

fox

This is another example of why this forum is so great!