Multiple input field Separators in awk.
Post 302189337 by kinksville on Friday 25th of April 2008 01:01:58 PM

I saw a couple of posts here referencing how to handle more than one input field separator in awk. I figured I would share how I (just!) figured out how to turn this line in a logfile:

90000000000000000000010001 name D0.90000000000103787900010001QF840840916070000007085814Y216254@D1111111111111111=1107xxxxxxxxxxxxxxx x919MENCHIES

into this format:

90000000000000000000010001,name,840840916070000007085814Y216254,1111111111111111,1107,919MENCHIES

I have an entire script, since this is just one step in a process of turning logs into useful information, but here's the relevant portion.

#Author: kinksville
#Date: April 24, 2008
#Revised: April 24, 2008
#Revision: Revision 1.00
#Other files: cclookup.s, cclookup.rep
#Changelog:
#April 24, 2008: Initial creation of the script.
#
#End changelog.

BEGIN {
    FS = "[ .QF@D=x]+"          #Any run of these characters separates fields.
    OFS = ","
    report = "cclookup.rep"     #Define report variable.
    report2 = "cclookup.rep2"   #Define report2 variable.
}
#First iteration of the @D search, stripping out the . character and inserting an OFS.
/@D/ {                          #Search for any line containing the string @D
    num_cclookup++              #Get number of auth requests.
    print $1, $2, $5, $6, $7, $8 > report
    print $0 > report2
}                               #End of the @D search.
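
To check which field numbers the print statement relies on, the sample line can be fed through the same separator and each field listed. This is only a sketch: the shell wrapper, the variable name line and the function name show_fields are made up for this illustration, and the separator is written without backslashes on the assumption that . and @ are literal inside brackets.

```shell
# Sample log line from the post, stored in a shell variable.
line='90000000000000000000010001 name D0.90000000000103787900010001QF840840916070000007085814Y216254@D1111111111111111=1107xxxxxxxxxxxxxxx x919MENCHIES'

# Print every field produced by the bracket-expression separator,
# one per line, prefixed with its field number.
show_fields() {
    printf '%s\n' "$1" |
        awk -F'[ .QF@D=x]+' '{ for (i = 1; i <= NF; i++) print i ": " $i }'
}

show_fields "$line"
```

The sample line splits into eight fields; the print statement in the script skips $3 and $4.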


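The key is the fact that awk will accept a regular expression as the field separator. The FS regexp in the BEGIN block is a bracket expression followed by a +. A bracket expression matches any single character listed inside it, so the separator set here is the space, the ., Q, F, @, D, = and x. It does not match QF or @D as strings; those are swallowed whole only because both of their characters are in the set, which also means a name containing D, F, Q or x would be split. The . and @ are literal inside brackets, so they need no backslash. The + after the closing bracket is what makes this work, since it lets one or more of the listed characters in a row count as one separator.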

That means that x and xxxxxx are both treated as a single field separator.
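
If QF and @D should only count as separators when they appear as whole strings, one option is alternation instead of a bracket expression alone. The sketch below assumes that is the intent; the function name split_strict and the shell wrapper are hypothetical and not part of the original script. For the sample line the field numbers come out the same, so the same print list still applies.

```shell
# Sample log line from the post.
line='90000000000000000000010001 name D0.90000000000103787900010001QF840840916070000007085814Y216254@D1111111111111111=1107xxxxxxxxxxxxxxx x919MENCHIES'

# QF and @D are matched as whole strings; the bracket part covers the
# single-character separators (space, ., = and x). The trailing + still
# collapses a run of separators into one.
split_strict() {
    printf '%s\n' "$1" |
        awk -F'(QF|@D|[ .=x])+' -v OFS=',' '{ print $1, $2, $5, $6, $7, $8 }'
}

split_strict "$line"
```

With this pattern a lone D, F or Q no longer splits a field, but x, the space, . and = still do.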

I still need to work on the output, since now I need to trim the name off the end of the last field. Unfortunately the number in the last field can range anywhere from 9999999 to 1 and that is the part that I want to preserve. Maybe a [^0-9]+ expression?
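
One way a [^0-9] expression could do the trimming is to delete everything from the first non-digit to the end of the value, which leaves only the leading number. This is a minimal sketch under the assumption that the number always comes first in the field, as in 919MENCHIES; the function name leading_digits is made up for the example.

```shell
# Keep only the leading digits of a value such as "919MENCHIES".
# sub() removes the first non-digit and everything after it.
leading_digits() {
    printf '%s\n' "$1" | awk '{ sub(/[^0-9].*$/, ""); print }'
}

leading_digits '919MENCHIES'
```

Inside the script the same call could target the last field, for example sub(/[^0-9].*$/, "", $8) before the first print. Assigning to a field makes awk rebuild $0 using OFS, so the print $0 line would need to run first, or the trimming would need to be done on a copy of the field.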
 
