Full Discussion: Parsing XML
Post 302556514 by ManoharMa on Monday 19th of September 2011 02:45:50 AM

Learned People,

Hello!

Until today, most of the tricky questions and situations I ran into had already been posted by other folks; all I had to do was read through them one at a time, find some sort of answer, add a few minor tweaks, and it would fly.

Today I ran into a situation where I'm parsing XML data that is stored in flat files. I found a Perl-based answer on this forum that sounds usable, but there is no Perl installation on the machine, and everything else I have tried is not working. Also, this is not well-formed XML: most of it is one continuous line of 2 million+ characters, and even when the file size runs to 1 GB or 1.2 GB, it has only 20,000 lines.

Here is an example:

Sample input file ------
HTML Code:
$head -1 ROAMERCDRFILE

MDN>9168761121</MDN>\n<HomeSId>NA</HomeSId>\n<ESN>NA</ESN>\n</HOMESID_Data>\n<HOMESID_Data>\n<roamer>CDMA BC6</roamer>\n<Subroamer>Mod</Subroamer>\n<Subsubroamer>Port A BC6 CH 25 Mid2 Power Phase Error Peak</Subsubroamer>\n<ServSid>CDMA BC6 Mod Port A BC6 CH 25 Mid2 Power Phase Error Peak</ServSid>\n<SPCS_Priority>1</SPCS_Priority>\n<Value>0</Value>\n<MSN>Degree</MSN>\n<HomeSId>NA</HomeSId>\n<ESN>NA</ESN>\n</HOMESID_Data>\n<HOMESID_Data>\n<roamer>CDMA BC6</roamer>\n<Subroamer>Mod</Subroamer>\n<Subsubroamer>Port A BC6 CH 25 Mid2 Power Phase Error RMS</Subsubroamer>\n<ServSid>CDMA BC6 Mod Port A BC6 CH 25 Mid2 Power Phase Error RMS</ServSid>\n<SPCS_Priority>1</SPCS_Priority>\n<Value>1.87</Value>\n<MSN>Degree</MSN>\n<HomeSId>NA</HomeSId>\n<ESN>NA</ESN>\n</HOMESID_Data>\n<HOMESID_Data>\n<roamer>CDMA BC6</roamer>\n<Subroamer>Mod</Subroamer>\n<Subsubroamer>Port A BC6 CH 25 Mid2 Power Freq Error</Subsubroamer>\n<ServSid>CDMA BC6 Mod Port A BC6 CH 25 Mid2 Power Freq Error</ServSid>\n<SPCS_Priority>1</SPCS_Priority>\n<Value>4.47</Value>\n<MSN>Hz</MSN>\n<HomeSId>150</HomeSId>\n<ESN>-150</ESN>\n</HOMESID_Data>\n<HOMESID_Data>\n<roamer>CDMA BC6</roamer>\n<Subroamer>Mod</Subroamer>\n<Subsubroamer>Port A BC6 CH 25 Mid2 Power Transmit Time Error</Subsubroamer>\n<ServSid>CDMA BC6 Mod Port A BC6 CH 25 Mid2 Power Transmit Time Error</ServSid>\n<SPCS_Priority>1</SPCS_Priority>\n<Value>0</Value>\n<MSN>us</MSN>\n<HomeSId>NA</HomeSId>\n<ESN>NA</ESN>\n</HOMESID_Data>\n<HOMESID_Data>\n<roamer>CDMA BC6</roamer>\n<Subroamer>Power</Subroamer>\n<Subsubroamer>Port A BC6 CH 25 Mid1 Power</Subsubroamer>\n<ServSid>CDMA BC6 Power Port A BC6 CH 25 Mid1Power</ServSid>\n<SPCS_Priority>1</SPCS_Priority>\n<Value>-10.35</Value>\n<MSN>dBm</MSN>\n<HomeSId>-7</HomeSId>\n<ESN>-13</ESN>\n</HOMESID_Data>\n<HOMESID_Data>\n<roamer>CDMA BC6</roamer>\n<Subroamer>Mod</Subroamer>\n<Subsubroamer>Port A BC6 CH 25 Mid1 Power Rho</Subsubroamer>\n<ServSid>CDMA BC6 Mod Port A BC6 CH 25 Mid1 Power 
Rho</ServSid>\n<SPCS_Priority>1</SPCS_Priority>\n<Value>99.86</Value>\n<MSN>N/A</MSN>\n<HomeSId>100</HomeSId>\n<ESN>98</ESN>\n
What I'm looking for is:
HTML Code:
9168761121
	NA
	Mod
	CDMA BC6
	0
	NA
            ....
            .....
            ...
This is what I stitched together from copied pieces:

HTML Code:
awk 'BEGIN{FS="<|>"}
{print ESN, SubRoamer, Roamer, Value
ESN=""
SubRoamer =""          
Roamer =""
Value=""
} 
{for(i=1;i<=NF;i++) {if($i=="ESN"){ESN=$(i+1);continue}}}
{for(i=1;i<=NF;i++) {if($i=="SubRoamer"){SubRoamer =$(i+1); continue}}}
{for(i=1;i<=NF;i++) {if($i=="Roamer"){Roamer =$(i+1); continue}}}
{for(i=1;i<=NF;i++) {if($i=="Value"){Value =$(i+1); continue}}}
END {print ESN, SubRoamer, Roamer, Value}' ROAMERCDRFILE
and it prints the following:
HTML Code:
98   99.86
which is wrong.
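
Two things in the attempt above probably explain that output. The tag names in the data are `roamer` and `Subroamer`, while the script compares against `Roamer` and `SubRoamer`, and awk string comparison is case-sensitive, so those two variables stay empty. The script also prints once per input line rather than once per `</HOMESID_Data>` record, so only the last ESN and Value seen on the line survive. A minimal sketch of one way around both, assuming a POSIX awk and assuming one `|`-separated output line per record is acceptable (the helper name `extract_records` and the separator are illustrative, not from the thread; it is untested on lines of 2 million+ characters, where some awk implementations may hit record or field limits):

```shell
# Sketch only: extract_records is a hypothetical helper, not from the thread.
# Assumes a POSIX awk and that every record ends with </HOMESID_Data>.
extract_records() {
    awk -F'[<>]' '
    {
        # Splitting on < and > puts each tag name in one field
        # and its value in the next field. Names are case-sensitive.
        for (i = 1; i <= NF; i++) {
            if      ($i == "ESN")       esn    = $(i + 1)
            else if ($i == "Subroamer") subr   = $(i + 1)
            else if ($i == "roamer")    roamer = $(i + 1)
            else if ($i == "Value")     value  = $(i + 1)
            else if ($i == "/HOMESID_Data") {
                # End of one record: emit it, then reset for the next one.
                print esn "|" subr "|" roamer "|" value
                esn = subr = roamer = value = ""
            }
        }
    }' "$@"
}
```

It would be called as `extract_records ROAMERCDRFILE`. Because the variables persist across input lines, a record that straddles a line break is still emitted whole; a single value that is itself split across a line break would be truncated.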

I also pieced together a very laborious-looking pipeline, shown below:

HTML Code:
cat ROAMERCDRFILE  | tr '>' '\n' | egrep 'ESN|Subroamer|roamer|Value' | sed 's/ESN//g' | sed 's/Subroamer//g' | sed 's/roamer//g' | sed 's/Value//g' | sed 's/<//g' |sed 's/>//g' | sed 's/\///g' | sed 's/\\//g'
and it shows records like this:


HTML Code:
n
NA
n
CDMA BC6
n
Mod
n
Port A BC6 CH 25 Mid2 Power Phase Error Peak
n
0
n
NA
n
CDMA BC6
n
Mod
n
Port A BC6 CH 25 Mid2 Power Phase Error RMS
n
1.87
n
NA
n
But that is taking a very long time, and so far I haven't seen any results come out, not even once.
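
The stray `n` lines appear to come from the literal `\n` text in the data: after `tr` splits on `>`, an opening-tag line looks like `\n<roamer`, and once the tag name, `<`, and `\` are stripped, only the `n` is left. The pattern `roamer` also matches `Subsubroamer` lines. A minimal sketch of a shorter pipeline, assuming only the closing-tag lines (such as `Mod</Subroamer`) carry the wanted values; the function name `strip_tags` is illustrative, and whether this is fast enough on a 1 GB file is not something the sketch can promise:

```shell
# Sketch only: strip_tags is a hypothetical helper, not from the thread.
# Reads the data on stdin and prints one wanted value per line.
strip_tags() {
    # tr breaks the huge lines at every ">" so sed only sees short lines.
    tr '>' '\n' |
    sed -n \
        -e 's/\\n//g' \
        -e 's/<\/ESN$//p' \
        -e 's/<\/Subroamer$//p' \
        -e 's/<\/roamer$//p' \
        -e 's/<\/Value$//p'
}
```

It would be called as `strip_tags < ROAMERCDRFILE`. The first expression deletes the literal backslash-n text; each remaining expression prints a line only when it ends in that exact closing tag, so `Subsubroamer` and `ServSid` lines are dropped, and one sed process replaces the chain of nine.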

Any and all help would be such a huge relief.

Please help !

regards,
Manohar
 
