Extract and parse XML data (statistic value) to csv
Hi All,
I need to parse some statistics data from a "measInfo" section, e.g. 25250000 (as highlighted), and return the results line by line, erasing all other unnecessary info/tags.
I thought of starting with grep 'measInfoId="25250000"', but this only returns one line. How do I get all the output below this measInfoId and return each value line by line, as in my desired output? I assume sed is needed to erase some of the data, and perhaps awk to loop?
Any help would be appreciated. Thanks all
Long xml data
Desired output
The desired output should be in CSV format (not sure if "," is needed; I just want it to be easy to process further with awk using the variables $1...$n).
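On the question of whether "," is needed: awk splits on whitespace by default, so comma-separated rows need an explicit field separator. A minimal sketch, assuming a made-up row (the `row` value is hypothetical, not taken from the real file):

```shell
#!/bin/sh
# Hypothetical CSV row, used only for illustration
row='Site-O:MD0035-O-A-2,ID=59135,0,0,0,27300'

# -F, makes awk split on commas, so $1..$NF address the individual fields
fields=$(printf '%s\n' "$row" | awk -F, '{ print $1, $2, $NF }')
echo "$fields"
```

Here $1 is the site label, $2 the ID and $NF the last counter on the row.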
Last edited by Franklin52; 01-06-2012 at 03:27 AM..
Reason: Please use code tags for code and data samples, thank you
Try this out:
Let me explain this mess:
This assumes that the <measResults> data is always on the line immediately after <measValue>.
If you put this code into your script, make sure to keep the comments.
Thanks mirni.
That code is very complicated lol. I can never understand sed as its syntax is too confusing. I tried it but it returned no result. Something must have gone wrong.
Hi chapakrani,
What are xmllint and XPath? I tried searching online for an XML-to-CSV parser but could not find a useful one.
Last edited by fpmurphy; 01-06-2012 at 11:41 AM..
Reason: code tags please!
Hmm... it works for me:
Where stats.xml is the copied'n'pasted stuff from your first post.
Does your file sampleCellbasedscript.txt contain exactly what you posted? Are there by any chance any whitespace characters at the beginning of the line with the measValue tag?
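To illustrate why leading whitespace matters: a pattern anchored with ^ only matches when the tag starts in column 1. A small sketch, assuming an indented line (the `line` value is a hypothetical example, not taken from the real file):

```shell
#!/bin/sh
# Hypothetical indented line, as it might appear in the XML file
line='    <measValue measObjLdn="Label=Site-O:MD0035-O-A-2, ID=59135">'

# Anchored at column 1: does not match an indented tag
anchored=$(printf '%s\n' "$line" | grep -c '^<measValue ' || true)

# Allowing optional leading whitespace: matches
tolerant=$(printf '%s\n' "$line" | grep -c '^[[:space:]]*<measValue ' || true)

echo "anchored=$anchored tolerant=$tolerant"
```

Either dropping the ^ or allowing [[:space:]]* after it makes the address match indented tags as well.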
Sed is infamous for its obscurity, but only at first sight. Once you understand how it works, it is no mystery.
Try this:
It should print the section between measInfo tags.
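A minimal sketch of such a range print, assuming a made-up two-section input (the sample data and the second measInfoId are hypothetical; only the measInfo/measValue/measResults tag names and the ID 25250000 come from this thread):

```shell
#!/bin/sh
# Hypothetical sample input with two measInfo sections
sample=$(mktemp)
cat > "$sample" <<'EOF'
<measInfo measInfoId="25250000">
    <measValue measObjLdn="RNC/Label=Site-O:MD0035-O-A-2, ID=59135">
         <measResults>0 0 27300</measResults>
    </measValue>
</measInfo>
<measInfo measInfoId="99999999">
    <measValue measObjLdn="RNC/Label=Other, ID=1">
         <measResults>1 2 3</measResults>
    </measValue>
</measInfo>
EOF

# -n suppresses default output; the /start/,/end/ range with p prints
# only the lines from the opening tag through the first closing tag
section=$(sed -n '/<measInfo measInfoId="25250000">/,/<\/measInfo>/p' "$sample")
printf '%s\n' "$section"

rm -f "$sample"
```

Neither pattern is anchored with ^, so indentation in front of the tags does not affect the match.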
Thank you so much, Mirni.
Yes, there are 4 whitespace characters before measValue and 9 before measResults. I have tried to amend the code as follows, and it now gives close to my desired result. However, how do I add the extra "," between the values in the last string returned by the sed on <measResults>?
root@localhost:~/xmlproj> sed -n '/<measInfo measInfoId="25250000">/,/<\/measInfo>/ {/^<measValue / {s/.*Label=\([^"]*\).*/\1/ ;x; n; s/^<measResults>\([0-9 ]*\).*/\1/ ;H; x; s/\n//; p } }' sampleCellbasedscript.txt
(initial code - no result due to white spaces)
Question:
1) How do I remove the space before ID?
2) How do I print a "," after each result, just like the desired output?
3) If there are decimal points in the input file, this code fails to output the floating-point numbers. I think \([0-9 ]*\) only matches the digits 0 to 9 and spaces, so a float will fail. How do I resolve this?
I also tried piping the result into a second sed to insert the "," separators, yet could not erase the existing "," (hence the double ",,"):
root@localhost:~/xmlproj> sed -n '/<measInfo measInfoId="25250000">/,/<\/measInfo>/ {/<measValue / {s/.*Label=\([^"]*\).*/\1,/ ;x; n; s/.*<measResults>\([0-9 ]*\).*/\1/ ;H; x; s/\n//; p } }' sampleCellbasedscript.txt | sed -e 's/ /,/g'
Site-O:MD0035-O-A-2,,ID=59135,0,0,0,27300,100194,141378,2282,0,0,379,5849362,0,0,2497,0,
Site-O:MA0340-O-A-2,,ID=56575,0,0,0,2099,11649,11091,28,0,0,74,249108,0,0,119,0,
Site-O:MD8001-O-A-3,,ID=59646,0,0,0,0,549,0,0,0,0,0,1967,0,0,0,0,
Site-O:MA0056-O-A-2,,ID=59155,0,0,0,0,1571,37,0,0,0,41,24453,0,0,0,0,
Site-O:MA0056-O-A-1,,ID=59154,0,0,0,1349,4921,878,0,0,0,48,24651,0,0,0,0,
Site-O:MA0146-O-A-3,,ID=57106,0,0,0,0,7018,106949,0,0,0,10,3928360,0,0,0,0,
Site-O:MA0120-O-B-3,,ID=12561,0,0,0,8021,31504,1743,53,0,0,12,3939629,0,0,0,0,
Site-O:MA8105-O-A-3,,ID=58896,0,0,0,0,2807,195,0,0,0,0,50977,0,0,0,0,
Site-O:MA0289-O-A-3,,ID=57616,0,0,0,0,15665,10976,0,0,0,4,692551,0,0,831,0,
Site-O:MA0146-O-A-1,,ID=57104,0,0,0,0,1884,237,0,0,0,1,13943,0,0,0,0,
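One possible way to address the three questions in a single sed call, offered as a sketch rather than a definitive fix. The sample input is hypothetical, reconstructed from the patterns in the commands above; in particular the measObjLdn attribute name and the trailing space inside <measResults> are assumptions (the latter would explain the trailing "," in the output), and extract_stats is just an illustrative helper name.

```shell
#!/bin/sh
# Hypothetical sample input; the real file layout may differ
sample=$(mktemp)
cat > "$sample" <<'EOF'
<measInfo measInfoId="25250000">
    <measValue measObjLdn="RNC/Label=Site-O:MD0035-O-A-2, ID=59135">
         <measResults>0 0 27300 12.5 </measResults>
    </measValue>
    <measValue measObjLdn="RNC/Label=Site-O:MA0340-O-A-2, ID=56575">
         <measResults>0 2099 11649 0.75 </measResults>
    </measValue>
</measInfo>
EOF

# Illustrative helper: prints one CSV row per measValue in section 25250000
extract_stats() {
    sed -n '/<measInfo measInfoId="25250000">/,/<\/measInfo>/ {
        /<measValue / {
            # Keep only the text after Label= up to the closing quote
            s/.*Label=\([^"]*\).*/\1/
            # Question 1: drop the space that follows the comma before ID
            s/, */,/g
            # Save the label, then fetch the next line (the results)
            h
            n
            # Question 3: capture everything up to the closing tag,
            # so decimal points survive
            s/.*<measResults>\([^<]*\)<.*/\1/
            # Trim leading and trailing spaces to avoid stray commas
            s/^ *//
            s/ *$//
            # Question 2: turn each run of spaces into one comma
            s/  */,/g
            # Append results to the label and join the two lines
            H
            x
            s/\n/,/
            p
        }
    }' "$1"
}

result=$(extract_stats "$sample")
printf '%s\n' "$result"

rm -f "$sample"
```

Like the original code, this still relies on <measResults> being on the line right after <measValue>. Doing the space-to-comma conversion inside the same sed call, after trimming, avoids the second sed in the pipe and with it the double ",," and the trailing ",".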
I am not too savvy with arrays and am assuming that what I am looking for needs arrays. This is my requirement.
So I have the raw data that gets updated to a log as shown below
StudentInfo:
FullInfo = {
Address = Newark
Age = 20
Name= John
}
StudentInfo:... (2 Replies)
Hi
I have xml file with multiple records and would like to extract records from xml with specific condition if specific tag is present extract entire row otherwise skip .
<logentry revision="21510">
<author>mantest</author>
<date>2015-02-27</date>
<QC_ID>334566</QC_ID>... (12 Replies)
Hi All,
Hope all you are doing good! Need your help. I have an XML file which needs to be converted CSV file. I am not an expert of awk/sed so your help is highly appreciated!!
XML file looks like this:
<l:event dateTime="2013-03-13 07:15:54.713" layerName="OSB" processName="ABC"... (2 Replies)
Hi friend i have input as following XML file
<?xml version="1.0"?>
<Document xmlns="urn:iso:std:iso:20022:tech:xsd:camt.054.001.02">
<BkToCstmrDbtCdtNtfctn>
<GrpHdr><MsgId>LBP-RDJ-TE000000-130042430010001001</MsgId><CreDtTm>2013-01-04T03:21:30</CreDtTm></GrpHdr>... (3 Replies)
Hi,
Can anybody help to solve this. I want to parse some xmldata along with the URL in the Shell.
I'm calling the URL via the curl command
Given below is my shell script file
export... (7 Replies)
Hi All,
Need your assistance on another xml tag related issue. I have a xml file as below:
<INVOICES>
<INVOICE>
<BILL>
<BILL_NO>1234</BILL_NO>
<BILL_DATE>01 JAN 2011</BILL_DATE>
</BILL>
<NAMEINFO>
<NAME>ABC</NAME>
</NAMEINFO>
</INVOICE>
<INVOICE>
<BILL>
<BILL_NO>5678</BILL_NO>... (12 Replies)
Hi All,
I am having an XML tag like:
<detail sim_ser_no_1="898407109001000090"
imsi_1="452070001000090">
<security>ADM1=????</security>
<security>PIN1=????</security>
<security>PIN2=????</security>
... (2 Replies)
Hi , I have a billing CDR file which is separated by “!”. I need to extract and format data between the starting (“!”) and the end of the line (“1.2.1.8”). These two variables are permanent tags to show begin and end.
! TICKET NBR : 2 ! GSI : 101 ! 3100.2.112.1 24/03/2010 00:41:14 !... (3 Replies)
Hi all,
I have the following xml file :
<xmlhead><xmlelement1>element1value</xmlelement1>\0a<xmlelement2>jjasd</xmlelement2>...</xmlhead>
As you can see there are no lines or spaces seperating the elements, just the character \0a. How can i find and print the values of a specific element?... (1 Reply)
Hi,
It's been a few years since college when I did stuff like this all the time. Can someone help me figure out how to best tackle this problem? I need to parse a file full of entries that look like this:
<eq action="A" sectyType="0" symbol="PGR" exch="CA" curr="VEF" sess="NORM"... (7 Replies)