Having Problems with AWK

# 1  
05-31-2014

So, I'm having a lot of crazy problems with Awk that I cannot understand. This one in particular is driving me nuts. Here is one section of my Awk script:

Code:
    print $0
    
    sub(/Jan/,"",$2)
    sub(/Feb/,"",$2)
    sub(/Mar/,"",$2)
    sub(/Apr/,"",$2)
    sub("May","",$2)
    sub(/Jun/,"",$2)
    sub(/Jul/,"",$2)
    sub(/Aug/,"",$2)
    sub(/Sep/,"",$2)
    sub(/Oct/,"",$2)
    sub(/Nov/,"",$2)
    sub(/Dec/,"",$2)

    print $0

And here is the single line in the datafile I am working with:


Code:
Thursday, May 29, 2014,4,11,20 PM GMT-0700,1,Hard to get mario and pokemon gamecube cases ,http,//www.listia.com/auction/17243312-hard-to-get-mario-and-pokemon-gamecube

I am trying to simply delete the word "May" and do nothing else. I have tried this a few different ways, including a double-quoted string instead of a regexp, and the output was exactly the same. Here is the output I am getting:

Code:
Thursday, May 29, 2014,4,11,20 PM GMT-0700,1,Hard to get mario and pokemon gamecube cases ,http,//www.listia.com/auction/17243312-hard-to-get-mario-and-pokemon-gamecube
Thursday   29  2014 4 11 20 PM GMT-0700 1 Hard to get mario and pokemon gamecube cases  http //www.listia.com/auction/17243312-hard-to-get-mario-and-pokemon-gamecube

You can see that after running the script I get a very weird result. The word "May" has been stripped from the second field, but all of the commas have also disappeared from the output; spaces now appear where they used to be.

For the record I am using Gawk under Ubuntu. Any help is greatly appreciated.

---------- Post updated at 05:55 AM ---------- Previous update was at 05:43 AM ----------

Here is a more complete explanation. This is the datafile:

Code:
"Thursday, May 29, 2014","4:11:20 PM GMT-0700","1",                      "Hard to get mario and pokemon gamecube cases"           ,"http://www.listia.com/auction/17243312-hard-to-get-mario-and-pokemon-gamecube"

This is the AWK script:

Code:
{
    gsub(/  /, "")
    gsub(/"/, "")
    
    gsub(/:/, ",")
       
    print $0
    
    sub(/Jan/,"",$2)
    sub(/Feb/,"",$2)
    sub(/Mar/,"",$2)
    sub(/Apr/,"",$2)
    sub("May","",$2)
    sub(/Jun/,"",$2)
    sub(/Jul/,"",$2)
    sub(/Aug/,"",$2)
    sub(/Sep/,"",$2)
    sub(/Oct/,"",$2)
    sub(/Nov/,"",$2)
    sub(/Dec/,"",$2)

    print $0
    
    # The double quotes were already removed above, so match the bare space.
    gsub(/ AM/, ",AM", $4)
    gsub(/ PM/, ",PM", $4)
    gsub(/ GMT/, ",GMT", $4)
    
    gsub(/ /,"", $2)

    gsub(/ /,"", $3)

}
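As an aside, the twelve sub() calls can probably be collapsed into a single call with an alternation regexp. This is a minimal sketch. It assumes at most one month abbreviation appears in $2 and a POSIX awk (gawk behaves the same). It also borrows the `-v OFS=,` setting from the first reply. `strip_month` is a hypothetical helper name used only for illustration.

```shell
#!/bin/sh
# Hypothetical helper: remove the first month abbreviation from field 2.
# One alternation regexp replaces the twelve separate sub() calls,
# assuming at most one month name appears in $2.
strip_month() {
    printf '%s\n' "$1" | awk -F, -v OFS=, '
    {
        sub(/Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec/, "", $2)
        print
    }'
}

strip_month 'Thursday, May 29, 2014'   # Thursday,  29, 2014
```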

And here are the two output lines, printed before and after the block of month substitutions above:

Code:
Thursday, May 29, 2014,4,11,20 PM GMT-0700,1,Hard to get mario and pokemon gamecube cases ,http,//www.listia.com/auction/17243312-hard-to-get-mario-and-pokemon-gamecube
Thursday   29  2014 4 11 20 PM GMT-0700 1 Hard to get mario and pokemon gamecube cases  http //www.listia.com/auction/17243312-hard-to-get-mario-and-pokemon-gamecube

The command I am using to launch awk is:

Code:
gawk -F, -f <awkfile> <datafile>

Hope that helps.

Last edited by rrdein; 05-31-2014 at 11:04 AM..
# 2  
05-31-2014
Since you are modifying a field ($2), the record gets recomputed, and every input field separator (FS, here a comma) gets replaced by the output field separator OFS, which defaults to a single space. Try it like this:

Code:
gawk -F, -v OFS=, -f <awkfile> <datafile>
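To make the effect visible, here is a minimal sketch that runs the same sub() on a shortened copy of the original data line, once with the default OFS and once with `-v OFS=,`. It assumes a POSIX awk (gawk gives the same result). The function names `strip_default` and `strip_with_ofs` are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helpers comparing the default OFS with OFS=",".

# OFS left at its default (" "): changing $2 rebuilds $0 with spaces.
strip_default() {
    printf '%s\n' "$1" | awk -F, '{ sub(/May/, "", $2); print }'
}

# OFS set to ",": the rebuilt record keeps its commas.
strip_with_ofs() {
    printf '%s\n' "$1" | awk -F, -v OFS=, '{ sub(/May/, "", $2); print }'
}

strip_default  'Thursday, May 29, 2014,4,11,20 PM GMT-0700,1'
# Thursday   29  2014 4 11 20 PM GMT-0700 1
strip_with_ofs 'Thursday, May 29, 2014,4,11,20 PM GMT-0700,1'
# Thursday,  29, 2014,4,11,20 PM GMT-0700,1
```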


Last edited by Scrutinizer; 05-31-2014 at 11:18 AM..
# 3  
05-31-2014
Just curious: I have the book "Effective Awk Programming". Suffice it to say that it does not do a very effective job at actually teaching Awk. This is all it really has to say about OFS:

"It is output between the fields output by a print statement. Its default value is " ", a string consisting of a single space."

I am just wondering if you can tell me why the first print statement in my code leaves the commas, while the second one removes them? Thank you.
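The quoted definition refers to the comma-separated arguments of print. Here is a small sketch of the difference, assuming a POSIX awk. The helper names `with_commas` and `concatenated` are hypothetical. With commas, print puts OFS between its arguments. Without commas, the arguments are simply concatenated and OFS is not used.

```shell
#!/bin/sh
# Hypothetical helpers showing where print inserts OFS.

# Comma-separated arguments: OFS ("-" here) goes between them.
with_commas() {
    printf '%s\n' "$1" | awk -v OFS=- '{ print $1, $2 }'
}

# No comma: the two fields are concatenated and OFS is ignored.
concatenated() {
    printf '%s\n' "$1" | awk -v OFS=- '{ print $1 $2 }'
}

with_commas 'foo bar'    # foo-bar
concatenated 'foo bar'   # foobar
```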
# 4  
05-31-2014
Because in the first case none of the fields ($1 .. $NF) have been altered. The preceding gsub() calls operate on the record itself ($0), so the record is not rebuilt from its fields and the original separators are not replaced by OFS.
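A minimal sketch of the two cases, assuming a POSIX awk (gawk behaves the same). The helper names `edit_record` and `touch_field` are hypothetical. Editing $0 keeps the text as it is. Assigning to any field, even the no-op `$1 = $1`, makes awk rebuild $0 from $1..$NF joined with OFS.

```shell
#!/bin/sh
# Hypothetical helpers contrasting edits to $0 with edits to a field.

# gsub() on $0: the record is re-split into fields, but its text,
# including the original commas, is left as it is.
edit_record() {
    printf '%s\n' "$1" | awk -F, '{ gsub(/:/, ","); print }'
}

# Any field assignment rebuilds $0 from the fields joined with OFS,
# which defaults to a single space.
touch_field() {
    printf '%s\n' "$1" | awk -F, '{ $1 = $1; print }'
}

edit_record 'a,b:c'   # a,b,c
touch_field 'a,b,c'   # a b c
```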
# 5  
05-31-2014
Hi.
Quote:
Originally Posted by rrdein
Just curious: I have the book "Effective Awk Programming". Suffice it to say that it does not do a very effective job at actually teaching Awk. This is all it really has to say about OFS:

"It is output between the fields output by a print statement. Its default value is " ", a string consisting of a single space." ...
My copy says:
Quote:
When you change the value of a field (as perceived by awk), the text of the input record is recalculated to contain the new field where the old one was.
...
The recomputation affects and is affected by NF ... and by a feature that has not been discussed yet, the output field separator, OFS ...
EAP 2nd Edition, page 42, A Robbins, SSC.

I think the real issue is changing the field, although I can see your point that references to OFS could be covered more completely in the index.
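The quoted passage says the recomputation also interacts with NF. Here is a small sketch of that, assuming a POSIX awk. `extend_record` is a hypothetical helper name. Assigning to a field past the last one raises NF and creates the missing fields as empty strings. It then rebuilds $0 with OFS.

```shell
#!/bin/sh
# Hypothetical helper: assign to $5 on a three-field record.
# NF grows to 5, $4 is created empty, and $0 is rebuilt using OFS (",").
extend_record() {
    printf '%s\n' "$1" | awk -F, -v OFS=, '{ $5 = "e"; print NF ": " $0 }'
}

extend_record 'a,b,c'   # 5: a,b,c,,e
```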

Best wishes ... cheers, drl
# 6  
05-31-2014
Thank you guys for your help. I believe drl hit the nail on the head with the index. I originally read the Aho/Weinberger book many years ago and found it very simple and easy to follow. I misplaced the book over the years and thought I could save a buck with the Robbins book (2nd Edition), but I have also noticed that its index is very sparse and doesn't even cover many topics that are shown prominently in the ToC.

I have company today, but as soon as I get a chance I will try these solutions and give some more thanks.