Awk; gsub in fields 3 and 4


# 1  

I want to transform a log file into input for a database.

Here's the log file:
Code:
Tue Aug 4 20:17:01 PDT 2009
Wireless users: 339
Daily Average: 48.4285
=
Tue Aug 11 20:17:01 PDT 2009
Wireless users: 295
Daily Average: 42.1428
=
Tue Aug 18 20:17:01 PDT 2009
Wireless users: 294
Daily Average: 42.0000
=
Tue Aug 25 20:17:01 PDT 2009
Wireless users: 289
Daily Average: 41.2857
=

I need to strip the descriptions for "Wireless users" and "Daily Average" but keep the date as is.

So far, I thought I could use "=" for the record separator and "\n" as the field separator. Here's what I've got so far:
Code:
awk -F'\n' 'RS="\="{for(i=1;i<=NF;i++){gsub(/[^[:digit:].]/,"",$4)}}; 1' rotate1.log

The output confuses me:

Code:
Tue Aug 4 20:17:01 PDT 2009
Wireless users: 339
Daily Average: 48.4285

 Tue Aug 11 20:17:01 PDT 2009 Wireless users: 295 42.1428 
 Tue Aug 18 20:17:01 PDT 2009 Wireless users: 294 42.0000 
 Tue Aug 25 20:17:01 PDT 2009 Wireless users: 289 41.2857 
 Tue Sep 1 20:17:01 PDT 2009 Wireless users: 379 54.1428

Why is it printing the first record as is, printing the rest as specified by RS and FS?

Secondly, I need to gsub on field 3 as well. Here's one with two gsub statements:

Code:
awk -F'\n' 'RS="\="{for(i=1;i<=NF;i++){gsub(/[^[:digit:].]/,"",$4)}{gsub(/[^[:digit:]]/,"",$3)}}; 1' rotate1.log

Output ( still printing first record unscathed ):
Code:
Tue Aug 4 20:17:01 PDT 2009
Wireless users: 339
Daily Average: 48.4285

 Tue Aug 11 20:17:01 PDT 2009 295 42.1428 
 Tue Aug 18 20:17:01 PDT 2009 294 42.0000 
 Tue Aug 25 20:17:01 PDT 2009 289 41.2857 
 Tue Sep 1 20:17:01 PDT 2009 379 54.1428

Is there a way to throw an "or" in there to reduce the gsubs to one?

So the output above is OK except for the printing of the first record "AS IS".

With OFS set as tab I've nearly got what I need:

Code:
awk -F'\n' 'RS="\="{for(i=1;i<=NF;i++){gsub(/[^[:digit:].]/,"",$4)}{gsub(/[^[:digit:]]/,"",$3)}};{OFS="\t"};1' rotate1.log

So what's up with printing the first record undigested?

Thanks for reading!

Bubnoff
# 2  
Quote:
Why is it printing the first record as is, printing the rest as specified by RS and FS?
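Because you put the RS assignment in the wrong place. Written as a pattern in the main body, RS="\=" is evaluated only after awk has already read the first record with the default RS (newline), so it takes effect from the second record on.
Put the RS assignment outside of the code:

Code:
awk -v RS='=' ...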
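To see the timing difference in isolation, here is a minimal sketch on a hypothetical two-record sample. The sample text and both function names are made up for illustration and are not part of the original commands.

```shell
# Hypothetical two-record sample in the same "=" separated layout as rotate1.log.
sample='one
two
=
three
four
=
'

# RS assigned in the pattern position: record 1 has already been split on newline.
rs_in_pattern() { awk 'RS="=" { printf "%d:[%s]\n", NR, $0 }'; }

# RS assigned with -v: in effect before the first record is read.
rs_with_v() { awk -v RS='=' '{ printf "%d:[%s]\n", NR, $0 }'; }

printf '%s' "$sample" | rs_in_pattern
printf '%s' "$sample" | rs_with_v
```

In the first run record 1 is only the line "one"; in the second it holds both "one" and "two", because the separator was already "=" when awk started reading.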

Please post an example of the desired output based on the previously posted input.
This User Gave Thanks to radoulov For This Post:
# 3  
Thanks Radoulov ~

However, it's still printing the first record incorrectly. Here's what I need:
Code:
Tue Aug 11 20:17:01 PDT 2009    295     42.1428 
Tue Aug 18 20:17:01 PDT 2009    294     42.0000 
Tue Aug 25 20:17:01 PDT 2009    289     41.2857 
Tue Sep 1 20:17:01 PDT 2009     379     54.1428 
Tue Sep 8 20:17:01 PDT 2009     287     41.0000

Here's the current Awk ( with your suggestion ):

Code:
 awk -F'\n' -v RS='=' '{for(i=1;i<=NF;i++){gsub(/[^[:digit:].]/,"",$4)}{gsub(/[^[:digit:]]/,"",$3)}};{OFS="\t"};1' rotate1.log

Here's the resulting output:
Code:
Tue Aug 4 20:17:01 PDT 2009     Wireless users: 339     484285  
        Tue Aug 11 20:17:01 PDT 2009    295     42.1428 
        Tue Aug 18 20:17:01 PDT 2009    294     42.0000 
        Tue Aug 25 20:17:01 PDT 2009    289     41.2857 
        Tue Sep 1 20:17:01 PDT 2009     379     54.1428 
        Tue Sep 8 20:17:01 PDT 2009     287     41.0000

I'm very very close here, but that first record is still not being transformed. I also notice that there are blank new lines at the bottom of the output that are not present in the input.

Thanks again ~

Bubnoff

---------- Post updated at 09:30 AM ---------- Previous update was at 09:07 AM ----------

I notice that I really don't need the "for loop".


Code:
awk -F'\n' -v RS='=' '{gsub(/[^[:digit:]]/,"",$3)}{gsub(/[^[:digit:].]/,"",$4)};{OFS="\t";print}' rotate1.log

Still get the partially transformed first record:

Code:
Tue Aug 4 20:17:01 PDT 2009     Wireless users: 339     484285

Note that the gsub for field 4 looks only partially applied, while the one for field 3 is not applied at all. The period in the expression for field 4 is also not respected: everything that is not a digit gets stripped.

Should look like this:

Code:
Tue Aug 4 20:17:01 PDT 2009     339     48.4285



---------- Post updated at 09:37 AM ---------- Previous update was at 09:30 AM ----------

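In the input file there was no RS (i.e., =) before the first record, so its fields are numbered one lower than in the later records, each of which begins with the newline that follows the =.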

So with this:

Code:
 awk -F'\n' -v RS='=' '{gsub(/[^[:digit:]]/,"",$3);gsub(/[^[:digit:].]/,"",$4)};{OFS="\t";print}' rotate1.log

I get the expected results. Is it possible to reduce this to one gsub for both the 3rd and 4th fields?
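One possible way to get down to a single gsub is to loop over the two field numbers and use the broader character class for both. This is only a sketch: the function name strip_labels is hypothetical, and it assumes every record (including the first) follows a "=" line and that field 3 never contains a period that has to go.

```shell
# Sketch: one gsub, applied to fields 3 and 4 in a loop.
# Assumption: every record, including the first, follows a "=" line,
# so $1 is empty, $2 is the date, $3 the user count, $4 the average.
# [^0-9.] stands in for [^[:digit:].]; on this input both match the same characters.
strip_labels() {
  awk -F'\n' -v RS='=' -v OFS='\t' '{
    for (i = 3; i <= 4; i++)
      gsub(/[^0-9.]/, "", $i)   # keep digits and the decimal point
    print
  }'
}

printf '=\nTue Aug 4 20:17:01 PDT 2009\nWireless users: 339\nDaily Average: 48.4285\n' | strip_labels
```

Keeping the period in the class is harmless for field 3 here because "Wireless users: 339" contains none.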

Also, do I need sed to remove the blank lines in the output, or is there something I'm missing in my Awk?

Code:
awk -F'\n' -v RS='=' '{gsub(/[^[:digit:]]/,"",$3);gsub(/[^[:digit:].]/,"",$4)};{OFS="\t";print}' rotate1.log | sed '/^[ \t]*$/d'
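The sed step can probably be avoided by letting awk skip the leftover records itself. The sketch below (log_to_tsv is a hypothetical name) also removes the leading newline from each record, so the first record needs no "=" in front of it. It assumes each real record has exactly three lines.

```shell
# Sketch: normalize each record, skip the empty leftovers, print three fields.
log_to_tsv() {
  awk -F'\n' -v RS='=' -v OFS='\t' '{
    sub(/^\n/, "")             # drop the newline left after "="; $0 is re-split
    if (NF < 3) next           # skip the short leftover after the last "="
    for (i = 2; i <= 3; i++)
      gsub(/[^0-9.]/, "", $i)  # $2 is the user count, $3 the average
    print $1, $2, $3           # date, users, average
  }'
}

log_to_tsv <<'EOF'
Tue Aug 4 20:17:01 PDT 2009
Wireless users: 339
Daily Average: 48.4285
=
Tue Aug 11 20:17:01 PDT 2009
Wireless users: 295
Daily Average: 42.1428
=
EOF
```

Printing $1, $2, $3 explicitly, instead of the whole record, is what removes the leading and trailing tabs as well as the blank lines.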

Thanks again for your suggestions!

Bubnoff

Last edited by Bubnoff; 09-30-2010 at 01:39 PM.. Reason: Another observation
# 4  
You can use something like this:

Code:
awk -F: '/=/ { print x }
NF > 2 { printf "%s", $0; next }
{ printf "%s", $2 }' infile

Given the fixed format, you could also write something like this:

Code:
awk 'NF {
  for (i = 0; ++i <= 6;)
    printf "%s ", $i
  printf "%s %s\n", $9, $12
  }' RS== infile
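The field numbers 9 and 12 come from the default field splitting, which treats newlines inside a record as ordinary whitespace once RS is "=". A small sketch that lists the fields of one record with their indexes (show_fields is a hypothetical helper, not part of the solution above):

```shell
# Print every whitespace-separated field of each non-empty record with its index.
show_fields() {
  awk -v RS='=' 'NF { for (i = 1; i <= NF; i++) printf "%d=%s\n", i, $i }'
}

printf 'Tue Aug 4 20:17:01 PDT 2009\nWireless users: 339\nDaily Average: 48.4285\n=\n' | show_fields
```

Fields 1 to 6 make up the date, field 9 is the user count and field 12 is the average, which is why the loop stops at 6 and the last printf picks $9 and $12.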

This User Gave Thanks to radoulov For This Post:
# 5  
Nice solutions but how do they work?
Code:
awk -F: '/=/ { print x } NF > 2 { printf "%s", $0; next } { printf "%s", $2 }' infile

I get the NF > 2 { ... } rule and what comes after it, but why does the /=/ {print x} work?

I think I understand the second one.

Thanks again!

Bub
# 6  
Quote:
... but why does the /=/ {print x} work?
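Just to print a newline :) x is an uninitialized variable, so print x outputs an empty string followed by ORS (a newline by default).
You can use printf "\n" instead if you wish.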
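For reference, here is the same one-liner spread over several lines with a comment on each rule. It is wrapped in a hypothetical function, join_records, and fed one sample record instead of infile.

```shell
# Annotated copy of the line-based one-liner; reads the log on standard input.
join_records() {
  awk -F: '
    /=/    { print x }                 # separator line: x is unset, so only a newline is printed
    NF > 2 { printf "%s", $0; next }   # date line: the time has two colons, so NF is 3
           { printf "%s", $2 }         # label lines: print only the text after the colon
  '
}

printf 'Tue Aug 4 20:17:01 PDT 2009\nWireless users: 339\nDaily Average: 48.4285\n=\n' | join_records
```

The "=" line also reaches the third rule, but its $2 is empty, so nothing extra is printed.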
# 7  
You rock!!

Three ways to do this ...one record based ( record as defined by input file ) and two essentially line/field based.

Thanks again!

Bub

Last edited by Bubnoff; 09-30-2010 at 06:21 PM..