seeking help in text processing


 
# 1  
Old 04-15-2008

Hi,

I am a newbie at shell scripting and would like expert help with a text processing issue.

From the log file contents below, I need to extract each block of lines (a block may be a single line) based on a regular expression and store the blocks in separate files.

One approach that comes to mind is to extract the lines between two regular expression patterns and append them to a file named after the block's MID. The start pattern would match a timestamp such as "06 Oct 00:04:10:334" and the end pattern would match a string such as "(MID=0003080248636816, UBID=, FACTID=)"; the lines in between, both ends inclusive, form the block. The action part would then extract the MID "0003080248636816", create a file with that name, and append the matched lines to it.

I guess it can be done with awk, but I am still learning it. Any help would be greatly appreciated. If there is an easier or better approach to this problem, please suggest it.


The output I wanted to generate is like this:


File: 0003080248636816
------------------

06 Oct 00:04:10:334 [Servlet.Engine.Transports : 11] INFO com.orbitz.axis.m2c.soap.axis.AxisInteractionInitializer -
---- SOAP Request Detail Start ----
Target Service Name: MasconWebService
Transport Name: http
Soap Envelope: <SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"><SOAP-ENV:Body><ns0:processConsumerMessage xmlns:ns0="http://gateway.mascon.implementation.axis.orbitz.com">
<ns0:masconConsumerRequest>
<ns0:applicationType>Consumer</ns0:applicationType>
<ns0:messageId>1000</ns0:messageId>

</ns0:masconConsumerRequest>
</ns0:processConsumerMessage></SOAP-ENV:Body></SOAP-ENV:Envelope>
---- SOAP Request Detail End ---- (MID=0003080248636816, UBID=, FACTID=)

06 Oct 00:04:10:891 [Servlet.Engine.Transports : 11] AUDIT org.apache.commons.httpclient.HttpConnectionManagerProxy - org.apache.commons.httpclient.SimpleHttpConnectionManager@186f242 releaseConnection() - Released back connection [org.apache.commons.httpclient.HttpConnectionProxy@1b727b3];it was checked out at [06 Oct 00:04:10:563];the duration of usage was [327] ms (MID=0003080248636816, UBID=0000050244656716, FACTID=0000786987)

06 Oct 00:07:22:193 [Servlet.Engine.Transports : 11] AUDIT org.apache.commons.httpclient.HttpConnectionManagerProxy - org.apache.commons.httpclient.SimpleHttpConnectionManager@186f242 releaseConnection() - Released back connection [org.apache.commons.httpclient.HttpConnectionProxy@1b727b3];it was checked out at [06 Oct 00:07:22:193];the duration of usage was [327] ms (MID=0003080248636816, UBID=, FACTID=)



File: 0003080248636817
------------------


06 Oct 00:04:10:563 [Servlet.Engine.Transports : 11] AUDIT org.apache.commons.httpclient.HttpConnectionManagerProxy - org.apache.commons.httpclient.SimpleHttpConnectionManager@186f242- Received connection org.apache.commons.httpclient.HttpConnection@8b0027 for host configuration HostConfiguration[host=http://app62.atl.ec.orbitz.com:84] in [0] ms (MID=0003080248636817, UBID=0000050244656716, FACTID=0000786982)

06 Oct 00:04:10:967 [Servlet.Engine.Transports : 11] INFO com.orbitz.axis.m2c.soap.axis.AxisInteractionFinalizer -
---- SOAP Response Detail Start ----
Soap Envelope: <soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"><soapenv:Body><processConsumerMessageResponse xmlns="http://gateway.mascon.implementation.axis.orbitz.com"></processConsumerMessageResponse></soapenv:Body></soapenv:Envelope>
---- SOAP Response Detail End ---- (MID=0003080248636817, UBID=0000050244656716, FACTID=)

06 Oct 00:07:20:256 [Servlet.Engine.Transports : 11] AUDIT org.apache.commons.httpclient.HttpConnectionManagerProxy - org.apache.commons.httpclient.SimpleHttpConnectionManager@186f242- Received connection org.apache.commons.httpclient.HttpConnection@8b0027 for host configuration HostConfiguration[host=http://app62.atl.ec.orbitz.com:84] in [0] ms (MID=0003080248636817, UBID=, FACTID=)



File: 0003080248636818
------------------


06 Oct 00:06:52:299 [Servlet.Engine.Transports : 5] INFO com.orbitz.axis.m2c.soap.axis.AxisInteractionInitializer -
---- SOAP Request Detail Start ----
Target Service Name: MasconWebService
Transport Name: http
Soap Envelope: <SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"><SOAP-ENV:Body><ns0:processConsumerMessage xmlns:ns0="http://gateway.mascon.implementation.axis.orbitz.com">
<ns0:masconConsumerRequest>
<ns0:applicationType>Consumer</ns0:applicationType>
<ns0:messageId>1000</ns0:messageId>

</ns0:masconConsumerRequest>
</ns0:processConsumerMessage></SOAP-ENV:Body></SOAP-ENV:Envelope>
---- SOAP Request Detail End ---- (MID=0003080248636818, UBID=0000050244656718, FACTID=0000786987)

06 Oct 00:06:52:344 [Servlet.Engine.Transports : 5] ERROR com.orbitz.axis.m2c.soap.XmlBeanDocumentServiceOperation - Caught exception in validateInput() (MID=0003080248636818, UBID=0000050244656718, FACTID=)





The original log content is given below:


06 Oct 00:04:10:334 [Servlet.Engine.Transports : 11] INFO com.orbitz.axis.m2c.soap.axis.AxisInteractionInitializer -
---- SOAP Request Detail Start ----
Target Service Name: MasconWebService
Transport Name: http
Soap Envelope: <SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"><SOAP-ENV:Body><ns0:processConsumerMessage xmlns:ns0="http://gateway.mascon.implementation.axis.orbitz.com">
<ns0:masconConsumerRequest>
<ns0:applicationType>Consumer</ns0:applicationType>
<ns0:messageId>1000</ns0:messageId>

</ns0:masconConsumerRequest>
</ns0:processConsumerMessage></SOAP-ENV:Body></SOAP-ENV:Envelope>
---- SOAP Request Detail End ---- (MID=0003080248636816, UBID=, FACTID=)

06 Oct 00:04:10:563 [Servlet.Engine.Transports : 11] AUDIT org.apache.commons.httpclient.HttpConnectionManagerProxy - org.apache.commons.httpclient.SimpleHttpConnectionManager@186f242- Received connection org.apache.commons.httpclient.HttpConnection@8b0027 for host configuration HostConfiguration[host=http://app62.atl.ec.orbitz.com:84] in [0] ms (MID=0003080248636817, UBID=0000050244656716, FACTID=0000786982)

06 Oct 00:04:10:891 [Servlet.Engine.Transports : 11] AUDIT org.apache.commons.httpclient.HttpConnectionManagerProxy - org.apache.commons.httpclient.SimpleHttpConnectionManager@186f242 releaseConnection() - Released back connection [org.apache.commons.httpclient.HttpConnectionProxy@1b727b3];it was checked out at [06 Oct 00:04:10:563];the duration of usage was [327] ms (MID=0003080248636816, UBID=0000050244656716, FACTID=0000786987)

06 Oct 00:04:10:967 [Servlet.Engine.Transports : 11] INFO com.orbitz.axis.m2c.soap.axis.AxisInteractionFinalizer -
---- SOAP Response Detail Start ----
Soap Envelope: <soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"><soapenv:Body><processConsumerMessageResponse xmlns="http://gateway.mascon.implementation.axis.orbitz.com"></processConsumerMessageResponse></soapenv:Body></soapenv:Envelope>
---- SOAP Response Detail End ---- (MID=0003080248636817, UBID=0000050244656716, FACTID=)

06 Oct 00:06:52:299 [Servlet.Engine.Transports : 5] INFO com.orbitz.axis.m2c.soap.axis.AxisInteractionInitializer -
---- SOAP Request Detail Start ----
Target Service Name: MasconWebService
Transport Name: http
Soap Envelope: <SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"><SOAP-ENV:Body><ns0:processConsumerMessage xmlns:ns0="http://gateway.mascon.implementation.axis.orbitz.com">
<ns0:masconConsumerRequest>
<ns0:applicationType>Consumer</ns0:applicationType>
<ns0:messageId>1000</ns0:messageId>

</ns0:masconConsumerRequest>
</ns0:processConsumerMessage></SOAP-ENV:Body></SOAP-ENV:Envelope>
---- SOAP Request Detail End ---- (MID=0003080248636818, UBID=0000050244656718, FACTID=0000786987)

06 Oct 00:06:52:344 [Servlet.Engine.Transports : 5] ERROR com.orbitz.axis.m2c.soap.XmlBeanDocumentServiceOperation - Caught exception in validateInput() (MID=0003080248636818, UBID=0000050244656718, FACTID=)

06 Oct 00:07:20:256 [Servlet.Engine.Transports : 11] AUDIT org.apache.commons.httpclient.HttpConnectionManagerProxy - org.apache.commons.httpclient.SimpleHttpConnectionManager@186f242- Received connection org.apache.commons.httpclient.HttpConnection@8b0027 for host configuration HostConfiguration[host=http://app62.atl.ec.orbitz.com:84] in [0] ms (MID=0003080248636817, UBID=, FACTID=)

06 Oct 00:07:22:193 [Servlet.Engine.Transports : 11] AUDIT org.apache.commons.httpclient.HttpConnectionManagerProxy - org.apache.commons.httpclient.SimpleHttpConnectionManager@186f242 releaseConnection() - Released back connection [org.apache.commons.httpclient.HttpConnectionProxy@1b727b3];it was checked out at [06 Oct 00:07:22:193];the duration of usage was [327] ms (MID=0003080248636816, UBID=, FACTID=)

Last edited by Alecs; 04-15-2008 at 03:59 PM..
# 2  
Old 04-15-2008
Quote:
Originally Posted by Alecs
The output I wanted to generate is like this:

Please describe exactly and concisely what you want done. No one is going to hunt through all that to find the criteria.
# 3  
Old 04-15-2008
cp file1 file2 script help, immediate help needed......

Do you know how to make up the script "cp file1 file2"? If you don't know, do you know of anyone else that could help me? I need a response as soon as possible, I greatly appreciate any help you can give me.
# 4  
Old 04-15-2008
Quote:
Originally Posted by xNaTe128x
Do you know how to make up the script "cp file1 file2"? If you don't know, do you know of anyone else that could help me? I need a response as soon as possible, I greatly appreciate any help you can give me.
Don't 'hijack' other people's threads - start a new thread. Show the effort!!! WARNING!

Last edited by vgersh99; 04-15-2008 at 09:24 PM..
# 5  
Old 04-15-2008
Perhaps try...
Code:
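awk '{a[++i]=$0}                      # buffer every line of the current block
     match($0,/MID=[0-9]*/){          # a line carrying a MID ends the block
        # output file name is "outfile." plus the digits after "MID="
        f="outfile." substr($0,RSTART+4,RLENGTH-4)
        for(n=1;n<=i;n++)
            print a[n] >> f           # append the buffered lines to that file
        i=0                           # start a new, empty buffer
        close(f)                      # avoid running out of open file handles
     }' infile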
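
In case the match() step is unfamiliar, here is a minimal sketch of just the MID extraction, run on a made-up one-line sample. The shell variables `line` and `mid` are illustrative names only and are not part of the script above; the pattern uses `+` instead of `*` so that an empty MID would not match, which is an assumption about the data rather than something the sample requires.

```shell
# Hypothetical one-line sample; only the MID field matters here.
line='... ms (MID=0003080248636816, UBID=, FACTID=)'

mid=$(printf '%s\n' "$line" | awk '
    # match() sets RSTART (1-based start) and RLENGTH (length) of the match.
    match($0, /MID=[0-9]+/) {
        # Skip the 4 characters of "MID=" and keep the remaining digits.
        print substr($0, RSTART + 4, RLENGTH - 4)
    }')

echo "$mid"
```

The same substr() arithmetic is what builds the output file name in the script above.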

Tested on the sample data...
Code:
$ head -1000 outfile.*
==> outfile.0003080248636816 <==
06 Oct 00:04:10:334 [Servlet.Engine.Transports : 11] INFO com.orbitz.axis.m2c.soap.axis.AxisInteractionInitializer - 
---- SOAP Request Detail Start ----
Target Service Name: MasconWebService
Transport Name: http
Soap Envelope: <SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"><SOAP-ENV:Body><ns0:processConsumerMessage xmlns:ns0="http://gateway.mascon.implementation.axis.orbitz.com">
<ns0:masconConsumerRequest>
<ns0:applicationType>Consumer</ns0:applicationType>
<ns0:messageId>1000</ns0:messageId>

</ns0:masconConsumerRequest>
</ns0:processConsumerMessage></SOAP-ENV:Body></SOAP-ENV:Envelope>
---- SOAP Request Detail End ---- (MID=0003080248636816, UBID=, FACTID=)

06 Oct 00:04:10:891 [Servlet.Engine.Transports : 11] AUDIT org.apache.commons.httpclient.HttpConnectionManagerProxy - org.apache.commons.httpclient.SimpleHttpConnectionManager@186f242 releaseConnection() - Released back connection [org.apache.commons.httpclient.HttpConnectionProxy@1b727b3];it was checked out at [06 Oct 00:04:10:563];the duration of usage was [327] ms (MID=0003080248636816, UBID=0000050244656716, FACTID=0000786987)

06 Oct 00:07:22:193 [Servlet.Engine.Transports : 11] AUDIT org.apache.commons.httpclient.HttpConnectionManagerProxy - org.apache.commons.httpclient.SimpleHttpConnectionManager@186f242 releaseConnection() - Released back connection [org.apache.commons.httpclient.HttpConnectionProxy@1b727b3];it was checked out at [06 Oct 00:07:22:193];the duration of usage was [327] ms (MID=0003080248636816, UBID=, FACTID=)

==> outfile.0003080248636817 <==

06 Oct 00:04:10:563 [Servlet.Engine.Transports : 11] AUDIT org.apache.commons.httpclient.HttpConnectionManagerProxy - org.apache.commons.httpclient.SimpleHttpConnectionManager@186f242- Received connection org.apache.commons.httpclient.HttpConnection@8b0027 for host configuration HostConfiguration[host=http://app62.atl.ec.orbitz.com:84] in [0] ms (MID=0003080248636817, UBID=0000050244656716, FACTID=0000786982)

06 Oct 00:04:10:967 [Servlet.Engine.Transports : 11] INFO com.orbitz.axis.m2c.soap.axis.AxisInteractionFinalizer - 
---- SOAP Response Detail Start ----
Soap Envelope: <soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"><soapenv:Body><processConsumerMessageResponse xmlns="http://gateway.mascon.implementation.axis.orbitz.com"></processConsumerMessageResponse></soapenv:Body></soapenv:Envelope>
---- SOAP Response Detail End ---- (MID=0003080248636817, UBID=0000050244656716, FACTID=)

06 Oct 00:07:20:256 [Servlet.Engine.Transports : 11] AUDIT org.apache.commons.httpclient.HttpConnectionManagerProxy - org.apache.commons.httpclient.SimpleHttpConnectionManager@186f242- Received connection org.apache.commons.httpclient.HttpConnection@8b0027 for host configuration HostConfiguration[host=http://app62.atl.ec.orbitz.com:84] in [0] ms (MID=0003080248636817, UBID=, FACTID=)

==> outfile.0003080248636818 <==

06 Oct 00:06:52:299 [Servlet.Engine.Transports : 5] INFO com.orbitz.axis.m2c.soap.axis.AxisInteractionInitializer - 
---- SOAP Request Detail Start ----
Target Service Name: MasconWebService
Transport Name: http
Soap Envelope: <SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"><SOAP-ENV:Body><ns0:processConsumerMessage xmlns:ns0="http://gateway.mascon.implementation.axis.orbitz.com">
<ns0:masconConsumerRequest>
<ns0:applicationType>Consumer</ns0:applicationType>
<ns0:messageId>1000</ns0:messageId>

</ns0:masconConsumerRequest>
</ns0:processConsumerMessage></SOAP-ENV:Body></SOAP-ENV:Envelope>
---- SOAP Request Detail End ---- (MID=0003080248636818, UBID=0000050244656718, FACTID=0000786987)

06 Oct 00:06:52:344 [Servlet.Engine.Transports : 5] ERROR com.orbitz.axis.m2c.soap.XmlBeanDocumentServiceOperation - Caught exception in validateInput() (MID=0003080248636818, UBID=0000050244656718, FACTID=)

# 6  
Old 04-17-2008
Quote:
awk '{a[++i]=$0}
match($0,/MID=[0-9]*/){
f="outfile." substr($0,RSTART+4,RLENGTH-4)
for(n=1;n<=i;n++)
print a[n] >> f
i=0
close(f)
}' infile
The above code does exactly what I want. Thank you very much, Ygor!
# 7  
Old 04-22-2008
Quote:
f="outfile." substr($0,RSTART+4,RLENGTH-4)
for(n=1;n<=i;n++)
print a[n] >> f
Is there any way, inside the awk block, to create a dynamic array named after the output file name used in the script and append the contents of array a to that array? Basically, I would like to replace the temporary files with arrays in awk. Any help on creating dynamic arrays is appreciated.

Thanks,
Alecs
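
One possible way to do this, sketched under assumptions rather than taken from the thread: POSIX awk has no arrays of arrays, but an associative array indexed by the MID can hold all the text collected for that MID as one string, which plays the role of a "dynamic array per file". The names `block`, `order`, `nmid`, `sample.log` and `grouped.txt` are made up for this example, and the three-record input is a shortened stand-in for the real log.

```shell
#!/bin/sh
# Hypothetical sample input: three short records standing in for the real log.
cat > sample.log <<'EOF'
06 Oct 00:04:10:334 request start
detail line (MID=0003080248636816, UBID=, FACTID=)

06 Oct 00:04:10:563 received connection (MID=0003080248636817, UBID=1, FACTID=2)

06 Oct 00:04:10:891 released connection (MID=0003080248636816, UBID=1, FACTID=3)
EOF

awk '
    # Buffer every line of the current block as one string.
    { buf = buf $0 "\n" }

    # A line carrying a MID closes the block.
    match($0, /MID=[0-9]+/) {
        mid = substr($0, RSTART + 4, RLENGTH - 4)
        if (!(mid in block))           # remember the first-seen order of MIDs
            order[++nmid] = mid
        block[mid] = block[mid] buf    # append the block to the entry for that MID
        buf = ""
    }

    # Everything is now held in memory; emit one section per MID.
    END {
        for (n = 1; n <= nmid; n++) {
            mid = order[n]
            printf "==> %s <==\n%s", mid, block[mid]
        }
    }
' sample.log > grouped.txt

cat grouped.txt
```

The `order` array is there only because `for (k in block)` visits keys in an unspecified order. Note that the whole log ends up in memory, so for very large files the original approach of appending to per-MID files may still be the safer choice.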