Post 302724913 by goddevil on Thursday 1st of November 2012 12:39:07 PM
Parsing a mixed format (flatfile+xml) logfile

I am trying to parse a log file whose lines look like this:
Quote:
2012/09/10 12:18:18: username@192.168.1.1: OPERATION user (<event_n>blah</event_n><column>username</column><old_val>blabla</old_val><new_val>newblablah</new_value><time>1347270053954</time><new_val></new_val>[XML file continues..]</event_data>) - succeeded
There are thousands of lines like the one above, and the file is expected to grow to hundreds of thousands of lines.

The issue I have is the mixed format of the file. If it were a pure XML file, I would use xmllint to parse it. At the moment I use nawk to parse the XML and convert it into a flat file, pipe that into sed to remove unwanted characters, and then pipe the result into nawk again to parse the flat file. Sample code is below:

Code:
nawk -F'(<)|(>)' '{print $1 "\t" $2 "\n" $8 "\t" $14 .....  $60 }' testfile.log | sed -e s/event_n//g  ......... -e 's/[()]//g' -e s/-/RESULT/g | nawk -f present.awk > output

The awk file present.awk is as follows:
Code:
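BEGIN {
    # Inside a bracket expression "|" is a literal character, not alternation,
    # so this splits on a space, "|", ":", "@" or a tab.
    FS = "[ |:|@||\t]";
}
{
    print "Date & time : " $1, $2":"$3":"$4;
    print "Login : " $6"@"$7;
    print "Ops : " $9;
    print "Mod : " $10;
    print "Det: "
    print $11,$12,$13,$14,$15,$16,$17,$18,$19,$20,$21,$22,$23,$24;
}
END {
    print NR,"Records Processed";
}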

The output file looks like this:
Quote:
Date & time : 2012/09/10 08:43:01
Login : username@192.168.1.1
Ops: UPDATE
Mod : users
Det:
username old_val blah new_val newblablah time old_val 1386782 new_value 1347270053954 RESULT succeeded
---------------------------------------------------------------------

I have the following queries/concerns regarding my work:

1. In present.awk, the final print statement (print $11,$12,$13,$14,$15,$16,$17,$18,$19,$20,$21,$22,$23,$24) produces untidy output because that part of the input is separated by tabs. There are often multiple tabs between the fields, so sometimes nothing is printed for a field. So my question is this: when using a tab as a field separator, can I treat a run of consecutive tabs as a single tab? How do I do this in the FS statement?

2. Going forward, I want to extract the previous day's data from the file every day. My plan is to get the previous day's date, grep the log file for that date, and then pipe the result into the above code. Is there a better way to do this that avoids the grep and the pipe?

3. I have avoided pipes as much as possible to keep the running time down, but I couldn't avoid the pipes above. Is there any way I can do the parsing without using any pipes? I have a nagging feeling that there should be a better way than painstakingly working out which fields correspond to the data I need and then printing those particular fields from awk.

4. Once I have the above output file, is there anything I can do to convert it into a format that is easily readable on Windows? I would like to add some logos and page breaks to the file. Is this possible?
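
For question 1, one possible approach is sketched below. It assumes a POSIX-style awk (nawk should behave the same way, but that is not tested here), where an FS longer than one character is treated as an extended regular expression. Setting FS to "\t+" then makes any run of tabs count as one separator. The helper name collapse_tabs and the sample input are made up for illustration.

```shell
# Sketch: treat any run of consecutive tabs as a single field separator.
# collapse_tabs is a hypothetical helper name used only for illustration.
collapse_tabs() {
    awk 'BEGIN { FS = "\t+" }      # multi-character FS is a regex: one or more tabs
         { print $1 "|" $2 "|" $3 }'
}

# Usage: three fields separated by uneven runs of tabs.
printf 'username\t\t\told_val\tblah\n' | collapse_tabs
```

Note that a line starting with a tab would still yield an empty first field, because the leading run of tabs counts as a separator after an empty $1.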

I would be grateful if you could take some time to help me with my predicament.
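
For questions 2 and 3, a single awk pass might be enough. The sketch below is only an illustration under several assumptions: each record sits on one line, the XML payload starts at the first "(", the tags of interest are flat (not nested), and only the first occurrence of each tag matters. The names parse_log and tag are hypothetical, and the fields printed under "Det:" are just the three tags visible in the sample line.

```shell
# Sketch: one awk pass instead of grep | nawk | sed | nawk.
# parse_log and tag() are hypothetical names used only for illustration.
parse_log() {
    # $1 = date prefix to keep (e.g. 2012/09/10), $2 = log file
    awk -v day="$1" '
    # Return the text after the first <name> up to the next closing tag,
    # or "" if <name> does not occur in s.
    function tag(s, name,    otag, start, rest, stop) {
        otag = "<" name ">"
        start = index(s, otag)
        if (start == 0) return ""
        rest = substr(s, start + length(otag))
        stop = index(rest, "</")
        return (stop > 0) ? substr(rest, 1, stop - 1) : rest
    }
    index($0, day) == 1 {              # keep only lines that start with the date
        lp = index($0, "(")            # assumed: XML payload follows the first "("
        split(substr($0, 1, lp - 1), head, " ")
        sub(/:$/, "", head[2])         # strip trailing ":" from the time
        sub(/:$/, "", head[3])         # strip trailing ":" from the login
        result = $NF                   # last word on the line, e.g. "succeeded"
        xml = substr($0, lp + 1)
        print "Date & time : " head[1], head[2]
        print "Login : " head[3]
        print "Ops : " head[4]
        print "Mod : " head[5]
        print "Det: " tag(xml, "column"), tag(xml, "old_val"), tag(xml, "new_val"), result
        n++
    }
    END { print n + 0, "Records Processed" }
    ' "$2"
}

# Example call (the file name is taken from the post above):
#   parse_log 2012/09/10 testfile.log > output
```

Looking tags up by name avoids counting field positions, and the date test inside awk replaces the separate grep. How to compute the previous day's date depends on the platform: GNU date accepts `date -d yesterday +%Y/%m/%d`, but that option is not universal, so it is left as a parameter here.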
 
