Extract strings from multiple lines into one file -


 
# 1  
Old 03-21-2011

input file
Quote:
<af type="tenured" id="11" timestamp="Mar 17 13:09:04 2011" intervalms="213620040.225">
<minimum requested_bytes="40" />
<time exclusiveaccessms="0.082" />
<tenured freebytes="2147328" totalbytes="268435456" percent="0" >
<soa freebytes="0" totalbytes="266288128" percent="0" />
<loa freebytes="2147328" totalbytes="2147328" percent="100" />
</tenured>
<gc type="global" id="11" totalid="11" intervalms="213620040.989">
<refs_cleared soft="0" threshold="32" weak="15" phantom="0" />
<finalization objectsqueued="85439" />
<timesms mark="134.171" sweep="9.157" compact="0.000" total="143.712" />
<tenured freebytes="197381840" totalbytes="268435456" percent="73" >
<soa freebytes="195502800" totalbytes="266556416" percent="73" />
<loa freebytes="1879040" totalbytes="1879040" percent="100" />
</tenured>
</gc>
<tenured freebytes="197381320" totalbytes="268435456" percent="73" >
<soa freebytes="195502280" totalbytes="266556416" percent="73" />
<loa freebytes="1879040" totalbytes="1879040" percent="100" />
</tenured>
<time totalms="144.596" />
</af>
<af type="tenured" id="12" timestamp="Mar 20 00:37:37 2011" intervalms="214112729.198">
<minimum requested_bytes="40" />
<time exclusiveaccessms="0.083" />
<tenured freebytes="1879040" totalbytes="268435456" percent="0" >
<soa freebytes="0" totalbytes="266556416" percent="0" />
<loa freebytes="1879040" totalbytes="1879040" percent="100" />
</tenured>
<gc type="global" id="12" totalid="12" intervalms="214112730.074">
<refs_cleared soft="0" threshold="32" weak="15" phantom="0" />
<finalization objectsqueued="85638" />
<timesms mark="132.993" sweep="8.944" compact="0.000" total="142.291" />
<tenured freebytes="197293616" totalbytes="268435456" percent="73" >
<soa freebytes="195683376" totalbytes="266825216" percent="73" />
<loa freebytes="1610240" totalbytes="1610240" percent="100" />
</tenured>
</gc>
<tenured freebytes="197293016" totalbytes="268435456" percent="73" >
<soa freebytes="195682776" totalbytes="266825216" percent="73" />
<loa freebytes="1610240" totalbytes="1610240" percent="100" />
</tenured>
<time totalms="144.242" />
</af>
<af type="tenured" id="13" timestamp="Mar 20 21:30:59 2011" intervalms="75201653.293">
<minimum requested_bytes="24" />
<time exclusiveaccessms="0.069" />
<tenured freebytes="1610240" totalbytes="268435456" percent="0" >
<soa freebytes="0" totalbytes="266825216" percent="0" />
<loa freebytes="1610240" totalbytes="1610240" percent="100" />
</tenured>
<gc type="global" id="13" totalid="13" intervalms="75201654.737">
<refs_cleared soft="4" threshold="32" weak="1678" phantom="0" />
<finalization objectsqueued="30104" />
<timesms mark="98.374" sweep="7.513" compact="0.000" total="106.094" />
<tenured freebytes="213732984" totalbytes="268435456" percent="79" >
<soa freebytes="212391032" totalbytes="267093504" percent="79" />
<loa freebytes="1341952" totalbytes="1341952" percent="100" />
</tenured>
</gc>
<tenured freebytes="213732384" totalbytes="268435456" percent="79" >
<soa freebytes="212390432" totalbytes="267093504" percent="79" />
<loa freebytes="1341952" totalbytes="1341952" percent="100" />
</tenured>
<time totalms="108.518" />
</af>
Desired csv output
gc_type, date/time, milli secs
af, Mar 17 13:09:04 2011, 144.596
af, Mar 20 00:37:37 2011, 144.242
af, Mar 20 21:30:59 2011, 108.518

Hi All,

Any help in achieving the above would be appreciated. I would like to parse the lines of one file and produce one consolidated file with gc_type, date/time, and milliseconds. A sample input file and the desired CSV output are listed above.
# 2  
Old 03-21-2011
Here's one way to do it with Perl -

Code:
$ perl -lne 'BEGIN {print "gc_type, date/time, milli secs"}
             if (/^.*?af.*timestamp="(.*?)".*$/) {$str="af,$1"}
             elsif (/^.*totalms="(.*?)".*$/) {print "$str,$1"}' input
gc_type, date/time, milli secs
af,Mar 17 13:09:04 2011,144.596
af,Mar 20 00:37:37 2011,144.242
af,Mar 20 21:30:59 2011,108.518
$
$

tyler_durden
# 3  
Old 03-21-2011
How about this?
Code:
awk -F"[=\"]" 'BEGIN{print "gc_type, date/time, milli secs";OFS=","} {if(/timestamp/){ts=$9} if (/<time totalms/){tms=$3;getline; print substr($0,3,(length($0))-3) OFS ts OFS tms}}'  inputfile


Last edited by pravin27; 03-21-2011 at 10:09 AM..
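
A note on the one-liner above: it picks values by field position ($9 for the timestamp, $3 for totalms) and uses getline to read the closing </af> line for the type name, so it relies on the attribute order of the sample and on </af> directly following the <time totalms=...> line. Below is a minimal sketch that looks attributes up by name instead. It assumes each <af ...> opening tag sits on a single line, as in the sample; the names gc_to_csv and attr are made up for illustration and are not from any post in this thread.

```shell
# Hypothetical helper: gc_to_csv FILE
# Prints a header plus one CSV row per <af ...> ... <time totalms=...> pair.
gc_to_csv() {
    awk '
        # Return the value of attribute "name" on the current line, or "".
        function attr(name,    re, s) {
            re = name "=\"[^\"]*\""
            if (!match($0, re)) return ""
            s = substr($0, RSTART, RLENGTH)   # e.g. totalms="144.596"
            sub(/^[^"]*"/, "", s)             # strip the name and opening quote
            sub(/"$/, "", s)                  # strip the closing quote
            return s
        }
        BEGIN { OFS = ","; print "gc_type, date/time, milli secs" }
        # Opening tag: remember the type and timestamp for this block.
        /<af /           { type = "af"; ts = attr("timestamp") }
        # Total time line: emit the row, then forget the timestamp.
        /<time totalms=/ { if (ts != "") print type, ts, attr("totalms"); ts = "" }
    ' "$1"
}
```

Possible usage, with gc.csv as an arbitrary output name: gc_to_csv inputfile > gc.csv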
# 4  
Old 03-21-2011
Thank you so much for your quick response. It worked like a champ.

---------- Post updated at 07:00 PM ---------- Previous update was at 06:34 PM ----------

One more small bit of help required.
How do I include the hostname in the desired output?
desired output
Quote:
hostname, gc_type, date/time, milli secs
hostname, af,Mar 17 13:09:04 2011,144.596
hostname, af,Mar 20 00:37:37 2011,144.242
hostname, af,Mar 20 21:30:59 2011,108.518
# 5  
Old 03-21-2011
If the hostname is a constant string, use
Code:
awk -F"[=\"]" 'BEGIN{print "hostname, gc_type, date/time, milli secs";OFS=","} {if(/timestamp/){ts=$9} if (/<time totalms/){tms=$3;getline; print "hostname" OFS substr($0,3,(length($0))-3) OFS ts OFS tms}}'  inputfile

Or, if you want to print the hostname of the system, use
Code:
awk -v hst="$(hostname)" -F"[=\"]" 'BEGIN{print "hostname, gc_type, date/time, milli secs";OFS=","} {if(/timestamp/){ts=$9} if (/<time totalms/){tms=$3;getline; print hst OFS substr($0,3,(length($0))-3) OFS ts OFS tms}}' inputfile
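
Another option is to leave the extraction command untouched and add the hostname column afterwards. The sketch below reads CSV text on standard input, treats the first line as the header, and prefixes every other line with the host. The function name add_host_column and its optional argument are assumptions made for illustration, not something from this thread.

```shell
# Hypothetical helper: add_host_column [HOST] < csv_in > csv_out
# Prepends a hostname column to CSV text read from standard input.
add_host_column() {
    # Use the first argument if given, otherwise ask the system.
    host=${1:-$(hostname)}
    awk -v host="$host" '
        NR == 1 { print "hostname, " $0; next }   # header row gets the column name
                { print host "," $0 }             # data rows get the host value
    '
}
```

Possible usage, piping the output of any extraction command shown in this thread (gc.csv is an arbitrary file name): extraction_command | add_host_column > gc.csv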

# 6  
Old 03-21-2011
Or, using Perl -

Code:
perl -lne 'BEGIN {print "hostname, gc_type, date/time, milli secs"; chomp($h=`hostname`)}
         if (/^.*?af.*timestamp="(.*?)".*$/) {$str="$h,af,$1"}
         elsif (/^.*totalms="(.*?)".*$/) {print "$str,$1"}' input_file

tyler_durden