Multiple input field Separators in awk.


 
# 1  
04-25-2008

I saw a couple of posts here asking how to handle more than one input field separator in awk, so I figured I would share how I (just!) worked out how to turn this line from a logfile:

90000000000000000000010001 name D0.90000000000103787900010001QF840840916070000007085814Y216254@D1111111111111111=1107xxxxxxxxxxxxxxx x919MENCHIES

into this format:

90000000000000000000010001,name,840840916070000007085814Y216254,1111111111111111,1107,919MENCHIES

I have an entire script, since this is just one step in a process of turning logs into useful information, but here's the relevant portion:

#Author: kinksville
#Date: April 24, 2008
#Revised: April 24, 2008
#Revision: Revision 1.00
#Other files: cclookup.s, cclookup.rep
#Changelog:
#April 24, 2008: Initial creation of the script.
#
#End changelog.

BEGIN {
FS="[ \. QF \@D = x]+"
OFS = ","
}
#First iteration of the @D search, stripping out the . character and inserting an OFS.
/@D/ { #Search for any line containing the string @D
report2="cclookup.rep2"; #Define report2 variable.
report="cclookup.rep"; #Define report variable.
num_cclookup++; #Get number of auth requests.
print $1, $2, $5, $6, $7, $8 > report;
print $0 > report2;
} #End of the @D search.


The key is that awk accepts a regular expression as the field separator. This regexp, FS="[ \. QF \@D = x]+", matches spaces, the ., the string QF, the string @D, the = and the character x. The + after the closing bracket is what makes it work, since it allows for 1 or more instances of any of the characters matched by the regexp.

That means that x and xxxxxx are both treated as a single field separator.
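A minimal sketch for checking the split from a shell prompt, assuming a POSIX shell and a POSIX awk (the function names `to_csv` and `show_fields` are made up for illustration). It uses `[ .QF@D=x]+`, which is the same set of characters as the FS above, written without the backslashes and repeated spaces that some awks warn about inside a string:

```sh
line='90000000000000000000010001 name D0.90000000000103787900010001QF840840916070000007085814Y216254@D1111111111111111=1107xxxxxxxxxxxxxxx x919MENCHIES'

# Print the six wanted fields, comma separated.
to_csv() {
    awk 'BEGIN { FS = "[ .QF@D=x]+"; OFS = "," }
         { print $1, $2, $5, $6, $7, $8 }'
}

# List every field with its number, to see how the line was split.
show_fields() {
    awk 'BEGIN { FS = "[ .QF@D=x]+" }
         { for (i = 1; i <= NF; i++) printf "$%d = %s\n", i, $i }'
}

printf '%s\n' "$line" | to_csv
printf '%s\n' "$line" | show_fields
```

Listing every field also shows why $3 and $4 are skipped in the print statement: they hold the 0 and the long number between "D" and "QF".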

I still need to work on the output, since I now need to trim the name off the end of the last field. Unfortunately, the number at the start of the last field can be anywhere from 1 to 9999999, and that is the part I want to preserve. Maybe a [^0-9]+ expression?
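One way to do the trim, as a sketch: delete everything from the first non-digit to the end of the field. `leading_amount` is a hypothetical helper name, and this assumes the amount is always the run of digits at the very start of the field:

```sh
# Hypothetical helper: keep only the leading digits of a field such as 919MENCHIES.
leading_amount() {
    awk '{
        amount = $0
        # Delete from the first non-digit to the end of the string.
        sub(/[^0-9].*$/, "", amount)
        print amount
    }'
}

echo '919MENCHIES' | leading_amount    # 919
```

Anchoring with `.*$` matters if the name could ever contain digits: `sub(/[^0-9]+/, "", s)` on its own removes only the first run of non-digits, so 919MENCHIES2 would become 9192.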
# 2  
04-25-2008
Are you sure that your FS definition is valid for your requirement?
You don't define "@D" and "QF" as separators.
The characters @, D, Q and F are each defined as separators.
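A two-line illustration of the difference between a bracket expression and a grouped string, as a sketch (`aQbQFc` is made-up input):

```sh
# Bracket expression: Q and F are separators on their own, and + joins any run of them.
echo 'aQbQFc' | awk -F'[QF]+' '{ print NF, $1 }'   # 3 a
# Grouping: only the complete string QF is a separator.
echo 'aQbQFc' | awk -F'(QF)+' '{ print NF, $1 }'   # 2 aQb
```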

The valid syntax is:
Code:
FS  = "([[:space:]]|\\.|QF|=|x)+";

To get the last field without its leading digits:
Code:
last_field=$NF
sub(/^[0-9]*/, "", last_field);

Jean-Pierre.
# 3  
04-25-2008
I was a little confused by the fact that QF and @D were working too. I think it's because [QF]+ matches Q, F, QQ, QQQ, QF, QQFF, etc.

It's not as clean as I might like but those characters are always at that particular place in the logged message, so it does what I want it to.
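That holds as long as the data itself never contains any of those characters. A quick check with a hypothetical name, Dexter, in place of `name` shows what happens otherwise, because D and x are separators on their own:

```sh
# Hypothetical variant of the sample line with "Dexter" in place of "name".
line='90000000000000000000010001 Dexter D0.90000000000103787900010001QF840840916070000007085814Y216254@D1111111111111111=1107xxxxxxxxxxxxxxx x919MENCHIES'

# Same character set as the post's FS, written without backslashes.
printf '%s\n' "$line" | awk -F'[ .QF@D=x]+' '{ print NF, $2, $3 }'
# 9 e ter   (the name is split apart and every later field number goes up by one)
```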

I'll sub in your expression and see what happens too.
# 4  
04-25-2008
No such luck

Quote:
Originally Posted by aigles
The valid syntax is:
Code:
FS  = "([[:space:]]|\\.|QF|=|x)+";


Quote:
Originally Posted by aigles
To get the last field without its leading digits:
Code:
last_field=$NF
sub(/^[0-9]*/, "", last_field);

Jean-Pierre.

Neither of those snippets worked correctly for me. The FS syntax that you used probably changed the number of fields and so they didn't all get printed out.
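That guess can be checked directly. The sketch below (function names are illustrative) runs the alternation from the reply as posted and the same alternation with an @D branch added. Without @D, the card number and the text after @D stay together in $5, and everything after it shifts one position earlier ($6 becomes 1107, $7 becomes 919MENCHIES, $8 is empty). With @D added, the six wanted fields are back at $1, $2 and $5 to $8:

```sh
line='90000000000000000000010001 name D0.90000000000103787900010001QF840840916070000007085814Y216254@D1111111111111111=1107xxxxxxxxxxxxxxx x919MENCHIES'

split_posted() {
    # FS as given in the reply: there is no @D branch in the alternation.
    awk 'BEGIN { FS = "([[:space:]]|\\.|QF|=|x)+" } { print NF, $5 }'
}

split_fixed() {
    # Same alternation with @D added as a whole-string separator.
    awk 'BEGIN { FS = "([[:space:]]|\\.|QF|@D|=|x)+"; OFS = "," }
         { print $1, $2, $5, $6, $7, $8 }'
}

printf '%s\n' "$line" | split_posted
# 7 840840916070000007085814Y216254@D1111111111111111
printf '%s\n' "$line" | split_fixed
# 90000000000000000000010001,name,840840916070000007085814Y216254,1111111111111111,1107,919MENCHIES
```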

The second snippet just seemed to add a 1 to the end of the last field, i.e. (,619MENCHIES1).
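The reply's second snippet removes the leading digits, which is the opposite of what was wanted here: it keeps MENCHIES and drops 919. The exact print statement used isn't shown, so this is only one possible explanation for the extra 1: sub() changes its third argument in place and returns the number of substitutions it made, so printing that return value next to the field appends a 1. `show_sub` is a hypothetical name:

```sh
show_sub() {
    awk '{
        last_field = $NF
        n = sub(/^[0-9]*/, "", last_field)  # sub() edits last_field in place
        print last_field                    # the leading digits are gone
        print n                             # sub() itself returns the count of replacements
        print $NF n                         # concatenating that count appends a "1"
    }'
}

echo '919MENCHIES' | show_sub
# MENCHIES
# 1
# 919MENCHIES1
```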

I'll play with it some more and see what happens.
# 5  
04-25-2008
So here's what finally got me the result I was looking for.

Code:
#This script scans the appropriate log file and copies lines containing authorization requests to the output.
#All output is comma separated.
#Author:        kinksville
#Date:          April 24, 2008
#Revised:       April 25, 2008
#Revision:      Revision 1.01
#Other files:   cclookup.s, cclookup.rep
#Changelog:
#April 24, 2008: Initial creation of the script.
#April 25, 2008: Updated the regex for the input FS to match multiple characters.
#
#End changelog.

BEGIN {
#Input field separators will match any run of the following characters: blank space, ., Q, F, @, D, = and x.
#The + outside the brackets allows 1 or more of these characters in any combination, so QF, @D and "xxxx x" each count as a single separator.
#%  Any comments with the % sign are temporarily there for testing purposes.
FS="[ \. QF \@D = x]+"
#Output field separator is defined as a comma.
OFS = ","
}
#@D search, stripping out the field separator characters and inserting an OFS.
/@D/                 {                                                          #Search for any line containing the string @D
                        last_field=$8 ;
                        sub(/[^0-9].*$/, "", last_field);                       #Strip everything from the first non-digit on (919MENCHIES -> 919).
                        dollar_val=last_field/100 ;
                        report="cclookup.rep";                                  #Define report variable.
                        num_cclookup++;                                         #Get number of auth requests.
                        field1=$1 ;
                        field2=$2 ;
                        field3=$5 ;
                        field4=$6 ;
                        field5=$7 ;
                        printf ("%s,%s,%s,%s,%s,$%-.2f\n",field1,field2,field3,field4,field5,dollar_val) > report
                        #print $1, $2, $5, $6, $7, $8 > report;                 #Print fields 1, 2, 5, 6, 7 and 8, separated by the OFS, to report.
                        }                                                       #End of the @D search.

It's a bit of a kludge but it works. I couldn't seem to get the last_field variable to print out no matter what I did using the plain print command, which is why I eventually went with printf instead. That also allowed me to output the results in a decimal format since those numbers before the MENCHIES were dollar amounts.
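A small check of how the dollar amount comes out, as a sketch (`dollar_demo` is a hypothetical name). With `/[^0-9]*/`, the leftmost match is the empty string in front of the first 9, so sub() removes nothing and the field is still 919MENCHIES. Dividing by 100 gives 9.19 anyway, because awk converts a string to a number using its leading numeric prefix. Anchoring the pattern as `/[^0-9].*$/` removes the name explicitly:

```sh
dollar_demo() {
    awk 'BEGIN {
        last_field = "919MENCHIES"

        star = last_field
        sub(/[^0-9]*/, "", star)          # matches the empty string at the start: no change
        print star                        # 919MENCHIES

        print last_field / 100            # 9.19 (numeric value of the leading 919)

        anchored = last_field
        sub(/[^0-9].*$/, "", anchored)    # deletes from the first non-digit to the end
        print anchored                    # 919

        printf "$%.2f\n", anchored / 100  # $9.19
    }'
}

dollar_demo
```

Relying on the explicit sub() rather than the implicit numeric conversion keeps last_field printable as a clean number with plain print as well.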

Last edited by kinksville; 04-25-2008 at 06:13 PM. Reason: Removed full name from the comments.

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Parsing out data with multiple field separators

I have a large file that I need to print certain sections out of. file.txt /alpha/beta/delta/gamma/425/590/USC00015420.blah.lt.0.01.str:USC00015420Y2017M10BLALT.01 12 13 14 -9 1 -9 -9 -9 -9 -9 1 2 3 4 5 -9 -9 I need to print the "USC00015420" and... (5 Replies)
Discussion started by: ncwxpanther
5 Replies

2. Shell Programming and Scripting

awk multiple filed separators

There is an usual ifconfig output vlan30 Link encap:Ethernet HWaddr inet addr:192.168.0.1 Bcast:192.168.0.255 Mask:255.255.255.0 inet6 addr: 2407:4c00:0:1:aaff::1/64 Scope:Global inet6 addr: fe80::224:e8ff:fe6b:cc4f/64 Scope:Link UP BROADCAST... (1 Reply)
Discussion started by: urello
1 Replies

3. Shell Programming and Scripting

awk multiple fields separators

Can you please help me with this .... Input File share "FTPTransfer" "/v31_fs01/root/FTP-Transfer" umask=022 maxusr=4294967295 netbios=NJ09FIL530 share "Test" "/v31_fs01/root/Test" umask=022 maxusr=4294967295 netbios=NJ09FIL530 share "ENR California" "/v31_fs01/root/ENR California"... (14 Replies)
Discussion started by: greycells
14 Replies

4. Shell Programming and Scripting

Multiple long field separators

How do I use multiple field separators in awk? I know that if I use awk -F"", both a and b will be field separators. But what if I need two field separators that both are longer than one letter? If I want the field separators to be "ab" and "cd", I will not be able to use awk -F"". The ... (2 Replies)
Discussion started by: locoroco
2 Replies

5. Shell Programming and Scripting

Splitting record into multiple records by appending values from an input field (AWK)

Hello, For the input file, I am trying to split those records which have multiple values seperated by '|' in the last input field, into multiple records and each record corresponds to the common input fields + one of the value from the last field. I was trying with an example on this forum... (4 Replies)
Discussion started by: imtiaz99
4 Replies

6. UNIX for Dummies Questions & Answers

Can one use 2 field separators in awk?

I have files such as n02-z30-dsr65-terr0.25-dc0.008-16x12drw-run1.cmd I am wondering if it is possible to define two field separators "-" and "." for these strings so that $7 is run1. (5 Replies)
Discussion started by: kristinu
5 Replies

7. UNIX Desktop Questions & Answers

awk Varing Field Separators

Hi Guys, I have small dilemma which I could do with a little help solving . I currently have text HDD S.M.A.R.T report which I have pasted below: smartctl 5.39 2008-10-24 22:33 (openSUSE RPM) Copyright (C) 2002-8 by Bruce Allen, http://smartmontools.sourceforge.net Device: COMPAQ... (2 Replies)
Discussion started by: bikerben
2 Replies

8. Shell Programming and Scripting

AWK multiple fields separators

I need to print the second field of a file, taking spaces, tab and = as field separators. ; for 16-bit app support MAPI=1 CMC=1 CMCDLLNAME32=mapi32.dll CMCDLLNAME=mapi.dll MAPIX=1 MAPIXVER=1.0.0.1 OLEMessaging=1 asf=MPEGVideo asx=MPEGVideo ivf=MPEGVideo m3u=MPEGVideo (2 Replies)
Discussion started by: PamPam
2 Replies

9. UNIX for Dummies Questions & Answers

Multiple field separators in awk? (First a space, then a colon)

How do I deal with extracting a portion of a record when multiple field separators are involved. Let's say I have: Mike Harrington;(555) 555-5555:250:100:175 Christian Dobbins;(555) 555-2358:155:90:201 Susan Dalsass;(555) 555-6279:250:60:50 Archie McNichol;(555) 555-1348:250:100:175 Jody... (3 Replies)
Discussion started by: doubleminus
3 Replies

10. Shell Programming and Scripting

Awk Multiple Field Separators

Hi Guys, I'm tying to split a line similar to this:YO6-2000-30.htm: (3 properties found).......into separate columns, so effectively I need to check for a -, ., :, a tab and a space in the statement. Any help would be appreciated Thanks! (7 Replies)
Discussion started by: Tonka52
7 Replies