Extract and parse XML data (statistic value) to csv
Hi All,
I need to parse some statistics from a "measInfo" block, e.g. 25250000 (as highlighted), and return the results line by line, erasing all other unnecessary info/tags.
I thought of starting with grep 'measInfoId="25250000"', but this only returns one line. How do I get all the output below this measInfoId and return each value, line by line, as per my desired output? I am assuming sed is needed to erase some of the data, and perhaps awk to loop?
Desired output
The desired output should be in CSV format (not sure if "," is needed... I just want it to be easy to process further with awk, using the fields $1...$n)
mID=25250000
sed -n '/<measInfo measInfoId="'"$mID"'">/,/<\/measInfo>/ {/^<measValue / {s/.*Label=\([^"]*\).*/\1/ ;x; n; s/^<measResults>\([0-9 ]*\).*/\1/ ;H; x; s/\n//; p } }' stats.xml
Let me explain this mess:
Code:
mID=25250000 #use a variable
sed -n '
/<measInfo measInfoId="'"$mID"'">/,/<\/measInfo>/ { #consider only the section between the measInfo tags
    /^<measValue / { #on lines that start with the measValue tag
        s/.*Label=\([^"]*\).*/\1/ ; #get the stuff behind Label=
        x; #and put it into hold buffer
        n; #read next line
        s/^<measResults>\([0-9 ]*\).*/\1/ ; #extract just the numbers
        H; #and append them to hold buffer
        x; #retrieve hold buffer
        s/\n//; #get rid of an extra newline
        p #and print out
    }
}' stats.xml
This assumes that the <measResults> data is always on the next line after <measValue>.
If you put this code into your script, make sure to keep the comments.
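If the <measResults> line is not guaranteed to come directly after <measValue>, an awk version that remembers the label until the results show up might be easier to follow. This is only a rough sketch: the sample file, the measObjLdn attribute name and the RNC-1/Cell prefix are made up here to mimic the layout discussed in this thread, so adjust them to the real data.

```shell
#!/bin/sh
# Hypothetical sample data; the real stats.xml layout is an assumption.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<measInfo measInfoId="25250000">
<measValue measObjLdn="RNC-1/Cell:Label=Site-O:MD0035-O-A-2, ID=59135">

<measResults>0 0 27300</measResults>
</measValue>
<measValue measObjLdn="RNC-1/Cell:Label=Site-O:MA0340-O-A-2, ID=56575">
<measResults>0 2099 11649</measResults>
</measValue>
</measInfo>
<measInfo measInfoId="11110000">
<measValue measObjLdn="RNC-1/Cell:Label=Site-X:ZZ0001-O-A-1, ID=1">
<measResults>7 8 9</measResults>
</measValue>
</measInfo>
EOF

mID=25250000
awk_out=$(awk -v id="$mID" '
  # Switch on at the wanted measInfo opening tag, off at any closing tag.
  index($0, "<measInfo measInfoId=\"" id "\">") { inside = 1 }
  /<\/measInfo>/ { inside = 0 }
  # Remember the text between Label= and the closing double quote.
  inside && /<measValue / {
    label = $0
    sub(/.*Label=/, "", label)
    sub(/".*/, "", label)
  }
  # Print the remembered label followed by the raw results.
  inside && /<measResults>/ {
    vals = $0
    sub(/.*<measResults>/, "", vals)
    sub(/<.*/, "", vals)
    print label " " vals
  }
' "$sample")
printf '%s\n' "$awk_out"
rm -f "$sample"
```

Because the label is kept in a variable, blank or extra lines between <measValue> and <measResults> do not matter, and leading whitespace is ignored as well.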
Thanks mirni.
That code is very complicated lol. I can never understand sed as its syntax is too confusing. I tried it but it returned no result. Something must have gone wrong.
Code:
root@localhost:~/xmlproj> sed -n '/<measInfo measInfoId="25250000">/,/<\/measInfo>/ {/^<measValue / {s/.*Label=\([^"]*\).*/\1/ ;x; n; s/^<measResults>\([0-9 ]*\).*/\1/ ;H; x; s/\n//; p } }' sampleCellbasedscript.txt
root@localhost:~/xmlproj> sed -n '/<measInfo measInfoId="25250000">/,/<\/measInfo>/ {/^<measValue / {s/.*Label=\([^"]*\).*/\1/ ;x; n; s/^<measResults>\([0-9 ]*\).*/\1/ ;H; x; s/\n//; p } }' sampleCellbasedscript.txt
root@localhost:~/xmlproj>
Hi chapakrani,
What are xmllint and XPath? I tried searching online for an XML-to-CSV parser but I could not find a useful one.
Last edited by fpmurphy; 01-06-2012 at 11:41 AM..
Reason: code tags please!
Where stats.xml is the copied'n'pasted stuff from your first post.
Does your file sampleCellbasedscript.txt contain exactly what you posted? Are there by any chance any whitespace characters at the beginning of the line with the measValue tag?
Sed is infamous for its obscurity, but that is only at first sight. Once you understand how it works, it is no mystery.
Try this:
Code:
sed -n '/<measInfo measInfoId="25250000">/,/<\/measInfo>/ p' inputFile
It should print the section between measInfo tags.
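To check the leading-whitespace question directly, one option is to count how many measValue lines start at column one and how many are indented. A small sketch, using a made-up sample file (the attribute name and values are assumptions, not the real data):

```shell
#!/bin/sh
# Hypothetical sample with indented tags; replace with the real input file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<measInfo measInfoId="25250000">
    <measValue measObjLdn="RNC-1/Cell:Label=Site-O:MD0035-O-A-2, ID=59135">
         <measResults>0 0 27300</measResults>
    </measValue>
</measInfo>
EOF

# Lines where <measValue is the very first thing on the line.
# grep -c exits non-zero on a zero count, hence the "|| true".
at_start=$(grep -c '^<measValue ' "$sample" || true)
# Lines where <measValue is preceded by at least one space or tab.
indented=$(grep -c '^[[:space:]][[:space:]]*<measValue ' "$sample" || true)

echo "at line start: $at_start, indented: $indented"
rm -f "$sample"
```

If the first count is 0 and the second is not, the ^<measValue anchor in the sed script is what prevents any match.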
Thank you so much, Mirni.
Yes, there are 4 white spaces before measValue and 9 before measResults. I have tried to amend the code as shown further below, and it now gives something close to my desired result. However, how do I add a "," between each of the values returned from <measResults>?
root@localhost:~/xmlproj> sed -n '/<measInfo measInfoId="25250000">/,/<\/measInfo>/ {/^<measValue / {s/.*Label=\([^"]*\).*/\1/ ;x; n; s/^<measResults>\([0-9 ]*\).*/\1/ ;H; x; s/\n//; p } }' sampleCellbasedscript.txt
(initial code - no result due to white spaces)
Question:
1) How do I remove the space before ID?
2) How do I print a "," after each result, just like the desired output?
3) If there is a decimal point in the input file, this code fails to output the float number. I think \([0-9 ]*\) only matches digits and spaces, so the match stops at the first decimal point. How do I resolve this?
I also tried piping the result into a second sed to print the commas, but I could not erase the existing "," (hence the double ",,"):
root@localhost:~/xmlproj> sed -n '/<measInfo measInfoId="25250000">/,/<\/measInfo>/ {/<measValue / {s/.*Label=\([^"]*\).*/\1,/ ;x; n; s/.*<measResults>\([0-9 ]*\).*/\1/ ;H; x; s/\n//; p } }' sampleCellbasedscript.txt | sed -e 's/ /,/g'
Site-O:MD0035-O-A-2,,ID=59135,0,0,0,27300,100194,141378,2282,0,0,379,5849362,0,0,2497,0,
Site-O:MA0340-O-A-2,,ID=56575,0,0,0,2099,11649,11091,28,0,0,74,249108,0,0,119,0,
Site-O:MD8001-O-A-3,,ID=59646,0,0,0,0,549,0,0,0,0,0,1967,0,0,0,0,
Site-O:MA0056-O-A-2,,ID=59155,0,0,0,0,1571,37,0,0,0,41,24453,0,0,0,0,
Site-O:MA0056-O-A-1,,ID=59154,0,0,0,1349,4921,878,0,0,0,48,24651,0,0,0,0,
Site-O:MA0146-O-A-3,,ID=57106,0,0,0,0,7018,106949,0,0,0,10,3928360,0,0,0,0,
Site-O:MA0120-O-B-3,,ID=12561,0,0,0,8021,31504,1743,53,0,0,12,3939629,0,0,0,0,
Site-O:MA8105-O-A-3,,ID=58896,0,0,0,0,2807,195,0,0,0,0,50977,0,0,0,0,
Site-O:MA0289-O-A-3,,ID=57616,0,0,0,0,15665,10976,0,0,0,4,692551,0,0,831,0,
Site-O:MA0146-O-A-1,,ID=57104,0,0,0,0,1884,237,0,0,0,1,13943,0,0,0,0,
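One possible way to handle all three questions in a single sed call is sketched below. It assumes the layout described in this thread (indented tags, <measResults> on the line right after <measValue>); the sample file, the measObjLdn attribute name and the values are made up for illustration.

```shell
#!/bin/sh
# Hypothetical sample data; the real input layout is an assumption.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<measInfo measInfoId="25250000">
    <measValue measObjLdn="RNC-1/Cell:Label=Site-O:MD0035-O-A-2, ID=59135">
         <measResults>0 0 27300 1.5 </measResults>
    </measValue>
    <measValue measObjLdn="RNC-1/Cell:Label=Site-O:MA0340-O-A-2, ID=56575">
         <measResults>0 2099 11649 0.25</measResults>
    </measValue>
</measInfo>
EOF

mID=25250000
out=$(sed -n '
/<measInfo measInfoId="'"$mID"'">/,/<\/measInfo>/ {
  # allow leading whitespace before the measValue tag
  /^[[:space:]]*<measValue / {
    # keep only the text after Label=
    s/.*Label=\([^"]*\).*/\1/
    # question 1: drop the space after the comma before ID
    s/,[[:space:]]*/,/
    # copy the label to the hold buffer and read the next line
    h
    n
    # question 3: accept digits, dots and spaces
    s/.*<measResults>[[:space:]]*\([0-9. ]*\).*/\1/
    s/[[:space:]]*$//
    # question 2: turn each run of spaces into one comma
    s/  */,/g
    # append to the label, fetch both, join them with a comma
    H
    x
    s/\n/,/
    p
  }
}' "$sample")
printf '%s\n' "$out"
rm -f "$sample"
```

With the sample above this prints two lines, Site-O:MD0035-O-A-2,ID=59135,0,0,27300,1.5 and Site-O:MA0340-O-A-2,ID=56575,0,2099,11649,0.25. There is no trailing comma, and no second sed is needed, so the double ",," does not appear.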