More complicated log parsing


 
# 1  
Old 06-06-2007

Hey Guys,

I am trying to grep within a file to find and output certain parts of lines to other file(s). The output files need to have a dynamic file name based on a field in the main log.

The problem is that not every line of the log has the same format, and often the lines are not even similar.


To explain further, the lines in the log look like:

Code:
2007-06-05 14:03:48,337 INFO  External- PXgcGllGX1TMdFCXrKyc8GQTwvLlfQ6B9wYQLyGXTQpKX5yxW8FC!-1784053810!1181066628296|>> [HandleRequest] QService<?xml version="1.0" encoding="utf-8" standalone="yes"?>
2007-06-05 14:03:51,236 INFO  External- PXgcGllGX1TMdFCXrKyc8GQTwvLlfQ6B9wYQLyGXTQpKX5yxW8FC!-1784053810!1181066628296|<< [HandleResponse] QService<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
2007-06-05 14:03:56,900 INFO  External- |||>> [HandleRequest] QService<?xml version="1.0" encoding="utf-8" standalone="yes"?>
2007-06-05 14:03:58,492 INFO  External- |||<< [HandleResponse] QService<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
2007-06-05 14:11:09,570 INFO  External- |02-20070605-510669||>> [HandleRequest] LService<?xml version="1.0" encoding="utf-8" standalone="yes"?>
2007-06-05 14:11:12,752 INFO  External- |02-20070605-510669||<< [HandleResponse] LService<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
2007-06-05 14:11:22,997 INFO  External- |02-20070605-510669||>> [HandleRequest] AService<?xml version="1.0" encoding="utf-8" standalone="yes"?>
2007-06-05 14:11:38,191 INFO  External- |02-20070605-510669||<< [HandleResponse] AService<?xml version="1.0" encoding="UTF-8" standalone="yes"?>

What I want to do is ignore everything before the first pipe and take the '02-YYYYMMDD-XXXXXX' value between the pipes as the $FILENAME, then append everything after the XService (i.e. from <?xml to the end of the line) to the new file.

I would appreciate any suggestions, thanks in advance.
# 2  
Old 06-06-2007
Try this (not tested, though):

Code:
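# Read the log line by line; -r keeps backslashes intact
while IFS= read -r line
do
  # The ID is the text between " |" and the following "||"
  filename=`echo "$line" | sed 's/^.* |\(.*\)||\(.*\)/\1/'`
  echo "filename is $filename"
  # Keep everything from "<?" onward and append it to the file named after the ID
  echo "$line" | sed 's/^.*<?\(.*\)/<?\1/' >> "$filename"
done < source_file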

# 3  
Old 06-06-2007
Thanks for the amazingly quick response.

As you said, it's untested, but it's a good start so far.
There are two main issues.
First, it doesn't quite work. I neglected to mention that I only want what comes after the HandleRequest for AService ([HandleRequest] AService).
In its current state it does create files, but only selectively, and not for the correct handle/service.

Second, millions of other files get created, named after random tags that sit on their own lines. All of these garbage files start with "<". What would the if statement in the brackets look like (if [ $filename != <* ] ?)?

Last edited by sjug; 06-06-2007 at 04:15 PM..
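
On the if statement asked about above, here is a minimal sketch, assuming a POSIX shell. `[ $filename != <* ]` will not work as written, since an unquoted `<` is parsed as an input redirection and `[` compares strings literally rather than matching patterns. A `case` pattern is one common way to test a prefix. The function name `is_garbage_name` is made up for illustration.

```shell
# Hypothetical helper: succeeds when the candidate filename starts with "<"
is_garbage_name() {
  case "$1" in
    "<"*) return 0 ;;   # looks like an XML tag, not an ID
    *)    return 1 ;;
  esac
}

# Example: only the first name would be used as an output file
for name in "02-20070605-510669" "<requestHeader>"
do
  if is_garbage_name "$name"; then
    echo "skip: $name"
  else
    echo "keep: $name"
  fi
done
```

Inside the while loop, the same test could guard the redirection so that no file is created for such lines.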
# 4  
Old 06-06-2007
Try this awk program:
Code:
#!/usr/bin/awk -f
# Awk script: extract.awk

BEGIN {
   FS = "|";
}
$2 ~ /^02-[0-9]+-[0-9]+$/ {
   if (file && file != $2) close(file);
   file = $2;
   sub(/^.*.Service/, "", $0);
   print $0 >> file;
}

Output with your data:
Code:
$ ls 02-*
/bin/ls: cannot access 02-*: No such file or directory
$ awk -f extract.awk logfile
$ ls 02-*
02-20070605-510669
$ cat 02-*
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
$

Jean-Pierre.
# 5  
Old 06-06-2007
Thanks for your response Jean-Pierre,

The script does work with my sample file, but not every line has the same format. For example, the lines below are interspersed with the ones I posted previously.

As a result, there is no output for my actual logs.

Code:
2007-06-06 11:05:32,863 INFO  External- 4164445555|01-20070606-280684||WService dueDate call with requestXML=
2007-06-06 11:05:32,863 INFO  External- 4164445555|01-20070606-280684||<?xml version="1.0" encoding="ISO-8859-1"?>
    <requestHeader>
        <sourceSystemTimestamp>
        <requestType>
            <miscServices>dueDate</miscServices>
        </requestType>
        <asyncIndr>no</asyncIndr>
        <responseReqtList>
            <responseReqt>
                <responseType>confirmation</responseType>
                <responseMode>http</responseMode>
                <responseAddress>
                <responseLanguage>E</responseLanguage>
            </responseReqt>
            <totalResponseReqts>1</totalResponseReqts>
        </responseReqtList>
        <sourceRequestIdList>
            <sourceRequestId>01-20070606-280684</sourceRequestId>
            <totalSourceRequestIds>1</totalSourceRequestIds>
        </sourceRequestIdList>
    </requestHeader>
    <telephoneNumber>
    <serviceAddressRequest>
    <dueDateRequestList>
        <dueDateRequestListItem>
            <dueDateRequestMode>query</dueDateRequestMode>
        </dueDateRequestListItem>
        <totalDueDateRequestListItems>1</totalDueDateRequestListItems>
    </dueDateRequestList>
</dueDateRequest>


It may be better to do an initial sweep: make a copy of the log, sweep through the copy and remove these lines, then process the result with your initial script.

Also, I do not need all entries for a specific 02-YYYYMMDD-XXXXXX, only the ones that have '[HandleRequest] AService'.
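
The initial sweep described above could be sketched like this, assuming a grep that supports `-F` (fixed-string matching, which is part of POSIX grep). With `-F` the square brackets are taken literally, so no escaping is needed. The function name `prefilter` and the file names in the note below are placeholders.

```shell
# Hypothetical helper: print only the lines of file $1 that contain
# the literal text "[HandleRequest] AService"
prefilter() {
  # -F treats the pattern as a fixed string, so [ and ] are not special
  grep -F '[HandleRequest] AService' "$1"
}
```

It would be run as `prefilter logfile > filtered.log`, and the filtered copy then passed to the extraction script. The indented XML continuation lines are dropped by this sweep as well, since they do not contain the marker text.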

Last edited by sjug; 06-06-2007 at 06:09 PM..
# 6  
Old 06-06-2007
Perhaps:
Code:
#!/usr/bin/awk -f
# Awk script: extract.awk

BEGIN {
   FS = "|";
}
$2 ~ /^02-[0-9]+-[0-9]+$/ && /\[HandleRequest\] AService/ {
   if (file && file != $2) close(file);
   file = $2;
   sub(/^.*.Service/, "", $0);
   print $0 >> file;
}
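
A note on the `[HandleRequest]` part of the pattern: in an awk regular expression, unescaped square brackets form a bracket expression that matches a single character from the set, so the literal text needs the brackets escaped as `\[HandleRequest\]`. The sketch below compares the two forms on one sample line from the log; the function names are made up for illustration.

```shell
# Sample line taken from the log above (XML declaration shortened)
line='2007-06-05 14:11:22,997 INFO  External- |02-20070605-510669||>> [HandleRequest] AService<?xml version="1.0"?>'

# Unescaped: [HandleRequest] is a bracket expression (one character from the set)
count_unescaped() {
  awk '/[HandleRequest] AService/ { n++ } END { print n + 0 }'
}

# Escaped: matches the literal text "[HandleRequest] AService"
count_escaped() {
  awk '/\[HandleRequest\] AService/ { n++ } END { print n + 0 }'
}

printf '%s\n' "$line" | count_unescaped   # prints 0
printf '%s\n' "$line" | count_escaped     # prints 1
```

The unescaped form finds nothing here because the character just before " AService" is "]", which is not one of the letters in the set.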

Input file:
Code:
2007-06-05 14:03:48,337 INFO  External- PXgcGllGX1TMdFCXrKyc8GQTwvLlfQ6B9wYQLyGXTQpKX5yxW8FC!-1784053810!1181066628296|>> [HandleRequest] QService<?xml version="1.0" encoding="utf-8" standalone="yes"?>
2007-06-05 14:03:51,236 INFO  External- PXgcGllGX1TMdFCXrKyc8GQTwvLlfQ6B9wYQLyGXTQpKX5yxW8FC!-1784053810!1181066628296|<< [HandleResponse] QService<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
2007-06-05 14:03:56,900 INFO  External- |||>> [HandleRequest] QService<?xml version="1.0" encoding="utf-8" standalone="yes"?>
2007-06-05 14:03:58,492 INFO  External- |||<< [HandleResponse] QService<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
2007-06-05 14:11:09,570 INFO  External- |02-20070605-510669||>> [HandleRequest] LService<?xml version="1.0" encoding="utf-8" standalone="yes"?>
2007-06-05 14:11:12,752 INFO  External- |02-20070605-510669||<< [HandleResponse] LService<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
2007-06-05 14:11:22,997 INFO  External- |02-20070605-510669||>> [HandleRequest] AService<?xml version="1.0" encoding="utf-8" standalone="yes"?>
2007-06-05 14:11:38,191 INFO  External- |02-20070605-510669||<< [HandleResponse] AService<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
2007-06-06 11:05:32,863 INFO  External- 4164445555|01-20070606-280684||WService dueDate call with requestXML=
2007-06-06 11:05:32,863 INFO  External- 4164445555|01-20070606-280684||<?xml version="1.0" encoding="ISO-8859-1"?>
    <requestHeader>
        <sourceSystemTimestamp>
        <requestType>
            <miscServices>dueDate</miscServices>
        </requestType>
        <asyncIndr>no</asyncIndr>
        <responseReqtList>
            <responseReqt>
                <responseType>confirmation</responseType>
                <responseMode>http</responseMode>
                <responseAddress>
                <responseLanguage>E</responseLanguage>
            </responseReqt>
            <totalResponseReqts>1</totalResponseReqts>
        </responseReqtList>
        <sourceRequestIdList>
            <sourceRequestId>01-20070606-280684</sourceRequestId>
            <totalSourceRequestIds>1</totalSourceRequestIds>
        </sourceRequestIdList>
    </requestHeader>
    <telephoneNumber>
    <serviceAddressRequest>
    <dueDateRequestList>
        <dueDateRequestListItem>
            <dueDateRequestMode>query</dueDateRequestMode>
        </dueDateRequestListItem>
        <totalDueDateRequestListItems>1</totalDueDateRequestListItems>
    </dueDateRequestList>
</dueDateRequest>

Output file 02-20070605-510669:
Code:
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<?xml version="1.0" encoding="utf-8" standalone="yes"?>

Jean-Pierre.
# 7  
Old 06-06-2007
Continued thanks for all your support Jean-Pierre,

but I still get no output at all from the latest input file (as pasted by yourself) and the latest extract script.