Learned People,
Hello!
Until today, nearly every tricky question or situation I ran into had already been posted by someone else. All I had to do was read through those threads one at a time, find some sort of answer, add a few minor tweaks, and it would fly.
Today I ran into a situation where I'm parsing XML data stored in flat files. I found a Perl-based answer on this forum that sounds usable, but there is no Perl installation on the machine, and whatever else I could think of I have tried without success. Also, this is not well-formed XML: most of it is one continuous line of 2 million+ characters, and even when a file runs to 1 GB or 1.2 GB, it has only 20,000 lines.
Here is an example below -
Sample input file ------
$ head -1 ROAMERCDRFILE
MDN>9168761121</MDN>\n<HomeSId>NA</HomeSId>\n<ESN>NA</ESN>\n</HOMESID_Data>\n<HOMESID_Data>\n<roamer>CDMA BC6</roamer>\n<Subroamer>Mod</Subroamer>\n<Subsubroamer>Port A BC6 CH 25 Mid2 Power Phase Error Peak</Subsubroamer>\n<ServSid>CDMA BC6 Mod Port A BC6 CH 25 Mid2 Power Phase Error Peak</ServSid>\n<SPCS_Priority>1</SPCS_Priority>\n<Value>0</Value>\n<MSN>Degree</MSN>\n<HomeSId>NA</HomeSId>\n<ESN>NA</ESN>\n</HOMESID_Data>\n<HOMESID_Data>\n<roamer>CDMA BC6</roamer>\n<Subroamer>Mod</Subroamer>\n<Subsubroamer>Port A BC6 CH 25 Mid2 Power Phase Error RMS</Subsubroamer>\n<ServSid>CDMA BC6 Mod Port A BC6 CH 25 Mid2 Power Phase Error RMS</ServSid>\n<SPCS_Priority>1</SPCS_Priority>\n<Value>1.87</Value>\n<MSN>Degree</MSN>\n<HomeSId>NA</HomeSId>\n<ESN>NA</ESN>\n</HOMESID_Data>\n<HOMESID_Data>\n<roamer>CDMA BC6</roamer>\n<Subroamer>Mod</Subroamer>\n<Subsubroamer>Port A BC6 CH 25 Mid2 Power Freq Error</Subsubroamer>\n<ServSid>CDMA BC6 Mod Port A BC6 CH 25 Mid2 Power Freq Error</ServSid>\n<SPCS_Priority>1</SPCS_Priority>\n<Value>4.47</Value>\n<MSN>Hz</MSN>\n<HomeSId>150</HomeSId>\n<ESN>-150</ESN>\n</HOMESID_Data>\n<HOMESID_Data>\n<roamer>CDMA BC6</roamer>\n<Subroamer>Mod</Subroamer>\n<Subsubroamer>Port A BC6 CH 25 Mid2 Power Transmit Time Error</Subsubroamer>\n<ServSid>CDMA BC6 Mod Port A BC6 CH 25 Mid2 Power Transmit Time Error</ServSid>\n<SPCS_Priority>1</SPCS_Priority>\n<Value>0</Value>\n<MSN>us</MSN>\n<HomeSId>NA</HomeSId>\n<ESN>NA</ESN>\n</HOMESID_Data>\n<HOMESID_Data>\n<roamer>CDMA BC6</roamer>\n<Subroamer>Power</Subroamer>\n<Subsubroamer>Port A BC6 CH 25 Mid1 Power</Subsubroamer>\n<ServSid>CDMA BC6 Power Port A BC6 CH 25 Mid1Power</ServSid>\n<SPCS_Priority>1</SPCS_Priority>\n<Value>-10.35</Value>\n<MSN>dBm</MSN>\n<HomeSId>-7</HomeSId>\n<ESN>-13</ESN>\n</HOMESID_Data>\n<HOMESID_Data>\n<roamer>CDMA BC6</roamer>\n<Subroamer>Mod</Subroamer>\n<Subsubroamer>Port A BC6 CH 25 Mid1 Power Rho</Subsubroamer>\n<ServSid>CDMA BC6 Mod Port A BC6 CH 25 Mid1 Power 
Rho</ServSid>\n<SPCS_Priority>1</SPCS_Priority>\n<Value>99.86</Value>\n<MSN>N/A</MSN>\n<HomeSId>100</HomeSId>\n<ESN>98</ESN>\n
What I'm looking for is:
9168761121
NA
Mod
CDMA BC6
0
NA
....
.....
...
This is what I stitched together by copying bits and pieces around:
awk 'BEGIN{FS="<|>"}
{print ESN, SubRoamer, Roamer, Value
ESN=""
SubRoamer =""
Roamer =""
Value=""
}
{for(i=1;i<=NF;i++) {if($i=="ESN"){ESN=$(i+1);continue}}}
{for(i=1;i<=NF;i++) {if($i=="SubRoamer"){SubRoamer =$(i+1); continue}}}
{for(i=1;i<=NF;i++) {if($i=="Roamer"){Roamer =$(i+1); continue}}}
{for(i=1;i<=NF;i++) {if($i=="Value"){Value =$(i+1); continue}}}
END {print ESN, SubRoamer, Roamer, Value}' ROAMERCDRFILE
It prints the following ...
which is wrong.
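Two things in the awk attempt probably explain the wrong output. First, awk string comparison is case-sensitive, and the tags in the sample are Subroamer and roamer, not SubRoamer and Roamer, so those two tests never match. Second, the first block prints and clears the variables before the loops have scanned the current line. Below is a minimal sketch of a corrected scan, assuming each record ends at the closing HOMESID_Data tag and that the wanted order is ESN, Subroamer, roamer, Value. The function name scan_records and the "|" separator are illustrative choices, not part of the original script, and a regular-expression FS needs a POSIX awk (nawk or gawk rather than a very old awk).

```shell
# Hypothetical helper: reads the flat file on stdin, prints one line per record.
scan_records() {
  awk 'BEGIN { FS = "[<>]"; OFS = "|" }   # split on either angle bracket
  {
    for (i = 1; i <= NF; i++) {
      # Tag names must match the data exactly, including case.
      if      ($i == "ESN")       esn       = $(i + 1)
      else if ($i == "Subroamer") subroamer = $(i + 1)
      else if ($i == "roamer")    roamer    = $(i + 1)
      else if ($i == "Value")     value     = $(i + 1)
      else if ($i == "/HOMESID_Data") {
        # Assumed record boundary: print only once the record is complete.
        print esn, subroamer, roamer, value
        esn = subroamer = roamer = value = ""
      }
    }
  }'
}
```

It would be run as: scan_records < ROAMERCDRFILE. Some awk implementations limit the number of fields per record, so a single line of 2 million+ characters may still be too much for this approach on some machines.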
I also pieced together a rather laborious-looking pipeline:
cat ROAMERCDRFILE | tr '>' '\n' | egrep 'ESN|Subroamer|roamer|Value' | sed 's/ESN//g' | sed 's/Subroamer//g' | sed 's/roamer//g' | sed 's/Value//g' | sed 's/<//g' |sed 's/>//g' | sed 's/\///g' | sed 's/\\//g'
and it shows records like this:
n
NA
n
CDMA BC6
n
Mod
n
Port A BC6 CH 25 Mid2 Power Phase Error Peak
n
0
n
NA
n
CDMA BC6
n
Mod
n
Port A BC6 CH 25 Mid2 Power Phase Error RMS
n
1.87
n
NA
n
But that takes a very long time, and so far I haven't seen any results come out, not even once.
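A note on that pipeline's output: the stray "n" lines are what is left of the literal backslash-n sequences in the data once the last sed strips the backslashes, and the egrep pattern roamer also matches Subsubroamer, which is why the long "Port A ..." values show up. A possibly faster route, sketched here under assumptions, is to let tr put every tag on its own line and have a single awk pick out the wanted tags. It assumes the real line breaks in the file are arbitrary wraps that can be dropped, that a record ends at the closing HOMESID_Data tag, and that the wanted order is ESN, Subroamer, roamer, Value. The name extract_fields and the "|" separator are made up for illustration.

```shell
# Hypothetical helper: reads the flat file on stdin, prints one line per record.
extract_fields() {
  tr -d '\n' |        # drop real line breaks (assumed to be arbitrary wraps)
  tr '<' '\n' |       # start a new line at every tag, e.g. "roamer>CDMA BC6"
  awk 'BEGIN { FS = ">"; OFS = "|" }
    # $1 is the tag name, $2 the text that follows it.
    $1 == "ESN"       { esn       = $2 }
    $1 == "Subroamer" { subroamer = $2 }
    $1 == "roamer"    { roamer    = $2 }
    $1 == "Value"     { value     = $2 }
    $1 == "/HOMESID_Data" {
      # Assumed record boundary: emit the record and reset.
      print esn, subroamer, roamer, value
      esn = subroamer = roamer = value = ""
    }'
}
```

It would be run as: extract_fields < ROAMERCDRFILE. Because awk only ever sees short lines here, the 2 million+ character line should no longer matter, and setting OFS to "\n" instead of "|" would give one value per line.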
Any and all help would be such a huge relief.
Please help!
Regards,
Manohar