The UNIX and Linux Forums  

#1  04-25-2008  kinksville

Multiple input field Separators in awk.

I saw a couple of posts here referencing how to handle more than one input field separator in awk. I figured I would share how I (just!) figured out how to turn this line in a logfile:

90000000000000000000010001 name D0.90000000000103787900010001QF840840916070000007085814Y216254@D1111111111111111=1107xxxxxxxxxxxxxxx x919MENCHIES

into this format:

90000000000000000000010001,name,840840916070000007085814Y216254,1111111111111111,1107,919MENCHIES

I have an entire script, since this is just one step in a process of turning logs into useful information, but here's the relevant portion.

#Author: kinksville
#Date: April 24, 2008
#Revised: April 24, 2008
#Revision: Revision 1.00
#Other files: cclookup.s, cclookup.rep
#Changelog:
#April 24, 2008: Initial creation of the script.
#
#End changelog.

BEGIN {
    FS="[ \. QF \@D = x]+"
    OFS = ","
}
#First iteration of the @D search, stripping out the . character and inserting a OFS.
/\@D/ {                                     #Search for any line containing the string @D
    report2="cclookup.rep2";                #Define report2 variable.
    report="cclookup.rep";                  #Define report variable.
    num_cclookup++;                         #Get number of auth requests.
    print $1, $2, $5, $6, $7, $8 > report;
    print $0 > report2;
}                                           #End of the @D search.


The key is the fact that awk will accept a regular expression as the field separator. This regexp FS="[ \. QF \@D = x]+" matches spaces, the . the string QF, the string @D, the =, and the character x. The + after the closing bracket is the key, since that allows for 1 or more instances of any of the characters matched by the regexp.

That means that x and xxxxxx are both treated as a single field separator.
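
A quick way to see what a regex FS really does is to dump every field next to its index. The sketch below is only an illustration, not part of the original script: show_fields is a hypothetical helper name, and the FS is meant to be the same character set as above, written without the backslashes and repeated spaces.

```shell
# Sample log line from the post above.
line='90000000000000000000010001 name D0.90000000000103787900010001QF840840916070000007085814Y216254@D1111111111111111=1107xxxxxxxxxxxxxxx x919MENCHIES'

# show_fields REGEX: hypothetical helper, prints each field of stdin as index=value.
show_fields() {
    awk -v FS="$1" '{ for (i = 1; i <= NF; i++) printf "%d=%s\n", i, $i }'
}

printf '%s\n' "$line" | show_fields '[ .QF@D=x]+'
```

The line splits into 8 fields. Field 3 comes out as 0 rather than D0, because the lone D in front of it is also inside the brackets and is swallowed as part of the separator.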

I still need to work on the output, since now I need to trim the name off the end of the last field. Unfortunately the number in the last field can range anywhere from 9999999 to 1 and that is the part that I want to preserve. Maybe a [^0-9]+ expression?
#2  04-25-2008  aigles (Forum Advisor)

Are you sure that your FS definition is valid for your requirement?
You don't define "@D" and "QF" as separators.
The characters @, D, Q and F are defined as separators.

The valid syntax is :

Code:
FS  = "([[:space:]]|\\.|QF|=|x)+";

To get the last field without its leading digits:

Code:
last_field=$NF
sub(/^[0-9]*/, "", last_field);

Jean-Pierre.
#3  04-25-2008  kinksville

I was a little confused by the fact that QF and @D were working too. I think it's because [QF]+ matches QQ, QQQ, QF, QQFF, etc.

It's not as clean as I might like but those characters are always at that particular place in the logged message, so it does what I want it to.
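
The difference between a bracket expression and a grouped string is easy to check on a short made-up input. In this sketch join_fields is a hypothetical helper and aQbFcQFd is an invented test string, not log data.

```shell
# join_fields REGEX TEXT: hypothetical helper, prints the fields of TEXT joined by "|".
join_fields() {
    printf '%s\n' "$2" | awk -v FS="$1" '{
        out = $1
        for (i = 2; i <= NF; i++) out = out "|" $i
        print out
    }'
}

join_fields '[QF]+' 'aQbFcQFd'    # bracket expression: any run of Q and/or F
join_fields '(QF)+' 'aQbFcQFd'    # group: only the literal string QF, repeated
```

The first call splits on every Q and F and prints a|b|c|d. The second splits only where the two letters appear together and prints aQbFc|d.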

I'll sub in your expression and see what happens too
#4  04-25-2008  kinksville

No such luck

Quote:
Originally Posted by aigles
The valid syntax is :

Code:
FS  = "([[:space:]]|\\.|QF|=|x)+";

Quote:
Originally Posted by aigles
To get the last field without its leading digits:

Code:
last_field=$NF
sub(/^[0-9]*/, "", last_field);

Jean-Pierre.


Neither of those snippets worked correctly for me. The FS syntax that you used probably changed the number of fields and so they didn't all get printed out.
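
The field-count guess can be checked directly on the sample line. This is a sketch under a few assumptions: nf_and_fields is a hypothetical helper, a literal space stands in for [[:space:]], and [.] stands in for \\. so that no backslash has to survive the shell and awk escaping layers.

```shell
# Sample log line from the first post.
line='90000000000000000000010001 name D0.90000000000103787900010001QF840840916070000007085814Y216254@D1111111111111111=1107xxxxxxxxxxxxxxx x919MENCHIES'

# nf_and_fields REGEX: hypothetical helper, prints NF, $3 and $5 for the sample line.
nf_and_fields() {
    printf '%s\n' "$line" | awk -v FS="$1" '{ print NF, $3, $5 }'
}

nf_and_fields '( |[.]|QF|=|x)+'       # the suggested alternation, which has no @D
nf_and_fields '( |[.]|QF|@D|=|x)+'    # the same alternation with @D added
```

Without @D the line has only 7 fields, $5 still contains the @D and the number after it, and $8 is empty. With @D added the count is back to 8, so print $1, $2, $5, $6, $7, $8 lines up again. $3 is D0 instead of 0, which does not matter here because $3 is never printed.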

The second snippet just seemed to append a 1 to the last field, i.e. (,619MENCHIES1).
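
sub() returns the number of substitutions it made (0 or 1), not the modified string, so a stray 1 usually means that return value ended up in the output. The code that produced the result above is not shown, so the following is only a guess at the pattern, not a reconstruction of it.

```shell
result=$(printf '%s\n' '619MENCHIES' | awk '{
    last_field = $1
    n = sub(/^[0-9]*/, "", last_field)   # n is 1: one substitution was made on the copy
    print $1 n                           # untouched field with the count glued on
    print last_field                     # the modified copy
}')
printf '%s\n' "$result"
```

The first line printed is 619MENCHIES1 and the second is MENCHIES. The second line also shows that /^[0-9]*/ removes the number and keeps the name, which is the opposite of what the report needs.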

I'll play with it some more and see what happens.
#5  04-25-2008  kinksville

So here's what finally got me the result I was looking for.


Code:
#This script scans the appropriate log file and copies lines containing authorization requests to the output.
#All output is comma separated.
#Author:        kinksville
#Date:          April 24, 2008
#Revised:       April 25, 2008
#Revision:      Revision 1.01
#Other files:   cclookup.s, cclookup.rep
#Changelog:
#April 24, 2008: Initial creation of the script.
#April 25, 2008: Updated the regex for the input FS to match multiple characters.
#
#End changelog.

BEGIN {
#The input field separator is a bracket expression: any run of the characters blank space, ., Q, F, @, D, = and x counts as one separator.
#The + after the closing bracket makes it match 1 or more of those characters in any combination.
#%  Any comments with the % sign are temporarily there for testing purposes.
FS="[ \. QF \@D = x]+"
#Output field separator is defined as a comma.
OFS = ","
}
#@D search, stripping out the field separator characters and inserting a OFS.
/\@D/                {                                                          #Search for any line containing the string @D
                        last_field=$8 ;
                        sub(/[^0-9]+$/,"",last_field );                         #Strip the trailing non-digits (the name), keep the number.
                        dollar_val=last_field/100 ;
                        report="cclookup.rep";                                  #Define report variable.
                        num_cclookup++;                                         #Get number of auth requests.
                        field1=$1 ;
                        field2=$2 ;
                        field3=$5 ;
                        field4=$6 ;
                        field5=$7 ;
                        printf ("%s,%s,%s,%s,%s,$%-.2f\n",field1,field2,field3,field4,field5,dollar_val) > report
                        #print $1, $2, $5, $6, $7, $8 > report;                 #Print fields 1-2 with the OFS between them to report.
                        }                                                       #End of the @D search.

It's a bit of a kludge but it works. I couldn't seem to get the last_field variable to print out no matter what I did using the plain print command, which is why I eventually went with printf instead. That also allowed me to output the results in a decimal format since those numbers before the MENCHIES were dollar amounts.
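
Where the regex is anchored decides whether the trimming step does anything. A minimal sketch with the sample value: an unanchored [^0-9]* finds its leftmost match as the empty string in front of the first digit, so sub() leaves the field alone, while [^0-9]+$ removes the trailing name. The division gives the same answer either way, because awk converts a string to a number using only its leading numeric part.

```shell
result=$(awk 'BEGIN {
    a = "919MENCHIES"
    b = a

    # Unanchored: the leftmost match is the empty string before the 9, so nothing changes.
    sub(/[^0-9]*/, "", a)

    # Anchored at the end: removes the trailing run of non-digits.
    sub(/[^0-9]+$/, "", b)

    print a
    print b
    # Both values convert to 919 in arithmetic, so both amounts are the same.
    printf "$%.2f $%.2f\n", a / 100, b / 100
}')
printf '%s\n' "$result"
```

This prints 919MENCHIES, then 919, then $9.19 $9.19. It would also explain why plain print kept showing the name while printf with the divided value looked right.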

Last edited by kinksville; 04-25-2008 at 06:13 PM.. Reason: Removed full name from the comments.