XML text bounded with tag


 
# 1  
Old 01-04-2015

Could you please give your input on the issue below?

source.xml
Code:
<?xml version="1.0" encoding="UTF-16"?>
<P1 >
<C1 type="i"><2></C1>
<V1 type="string"><6.2></V1>
<D1 type="string">
	<D2><1.0></D2>
	<D2><2.0></D2>
</D1>
......................
......................
many more records.....
</P1>

The problem with the above XML is that the text is enclosed between < and >, so I am unable to read the XML. Could you please guide me on how to remove the < and > around the text?
# 2  
Old 01-04-2015
What output are you looking for?
# 3  
Old 01-04-2015
The issue will be determining what is a valid XML tag and what is data that appears between "<" and ">". Is it always numeric? Are there negative numbers? Character strings? With or without spaces?
But making a guess, should the results be:
Code:
<?xml version="1.0" encoding="UTF-16"?>
<P1 >
<C1 type="i">2</C1>
<V1 type="string">6.2</V1>
<D1 type="string">
    <D2>1.0</D2>
    <D2>2.0</D2>
</D1>
......................
......................
many more records.....
</P1>

This was done with:
Code:
perl -pe 's{<(\d+(?:\.\d+)?)>}{$1}g;'

or with:
Code:
sed -e 's/<\([1-9][0-9]*\)>/\1/g' -e 's/<\([1-9][0-9]*\.[0-9]*\)>/\1/g'
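
For anyone who wants to try the numeric-only approach before running it on real data, here is a minimal sketch. The function name strip_numeric and the sample lines are made up for illustration, and the pattern is a small variation on the sed command above: it uses [0-9] for the first digit, on the assumption that values such as 0.5 may also occur.

```shell
#!/bin/sh
# strip_numeric is a hypothetical helper name, not something from the thread.
# It removes "<" and ">" only when they enclose a plain integer or decimal,
# so real tags such as <C1 type="i"> are left untouched.
strip_numeric() {
  sed -e 's/<\([0-9][0-9]*\)>/\1/g' \
      -e 's/<\([0-9][0-9]*\.[0-9][0-9]*\)>/\1/g'
}

# Sample lines modelled on source.xml from the first post.
printf '%s\n' \
  '<C1 type="i"><2></C1>' \
  '<V1 type="string"><6.2></V1>' \
  '<D1 type="string">' | strip_numeric
```

Text that is not purely numeric, for example <abc txt>, is deliberately left alone by this version.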

This User Gave Thanks to derekludwig For This Post:
# 4  
Old 01-04-2015
Thanks a lot for your input. The text can also contain letters and spaces.
Input:
Code:
<?xml version="1.0" encoding="UTF-16"?>
<P1 >
<C1 type="i"><abc txt></C1>
<V1 type="string"><6.2 txt></V1>
<D1 type="string">
    <D2>1.0</D2>
    <D2>2.0</D2>
</D1>
......................
......................
many more records.....
</P1>

desired output:
Code:
<?xml version="1.0" encoding="UTF-16"?>
<P1 >
<C1 type="i">abc txt</C1>
<V1 type="string">6.2 txt</V1>
<D1 type="string">
    <D2>1.0</D2>
    <D2>2.0</D2>
</D1>
......................
......................
many more records.....
</P1>

# 5  
Old 01-04-2015
With this particular format you could try:
Code:
sed 's/<\([^>]*\)>\(<[^>]*>\)$/\1\2/' file
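
A short sketch of how this expression behaves, in case it helps. It looks for two consecutive <...> groups where the second one ends the line, strips the angle brackets from the first group and keeps the second (the closing tag) as it is. The function name strip_wrapped_text and the sample lines are only illustrative assumptions.

```shell
#!/bin/sh
# strip_wrapped_text is a hypothetical wrapper around the one-liner above.
# \1 is the text inside the first <...>, \2 is the closing tag at end of line.
strip_wrapped_text() {
  sed 's/<\([^>]*\)>\(<[^>]*>\)$/\1\2/'
}

# The first line has bracketed text and is rewritten; the second has plain
# text between its tags, so the pattern does not match and it passes through.
printf '%s\n' \
  '<C1 type="i"><abc txt></C1>' \
  '    <D2>1.0</D2>' | strip_wrapped_text
```

Because the pattern is anchored with $, a trailing space or carriage return after the closing tag would stop it from matching.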

This User Gave Thanks to Scrutinizer For This Post:
# 6  
Old 01-04-2015
With respect to Scrutinizer, if the XML tags are nested on the same line, are empty, have multiples on a line, or span multiple lines, as in:
Code:
<?xml version="1.0" encoding="UTF-16"?>
<P1 >
<C1 type="i"><Z1 ><abc txt></Z1></C1>
<V1 type="string"><6.2 txt></V1>
<D1 type="string">
    <D2><1.0></D2>
    <D2><2.0></D2>
    <Y2><one 1.0></Y2><Y2><two 2.0></Y2><Y2><three 3.0></Y2><Y2><four 4.0></Y2>
    <W3 alpha="beta"></W3>
    <X4>
    <  foo 42 bar  >
    </X4>
</D1>
......................
......................
many more records.....
</P1>

Then the sed one-liner won't work; it produces:
Code:
<?xml version="1.0" encoding="UTF-16"?>
<P1 >
<C1 type="i"><Z1 ><abc txt>/Z1</C1>
<V1 type="string">6.2 txt</V1>
<D1 type="string">
    <D2>1.0</D2>
    <D2>2.0</D2>
    <Y2><one 1.0></Y2><Y2><two 2.0></Y2><Y2><three 3.0></Y2><Y2>four 4.0</Y2>
    W3 alpha="beta"</W3>
    <X4>
    <  foo 42 bar  >
    </X4>
</D1>
......................
......................
many more records.....
</P1>

A partial perlish solution:
Code:
perl -0777 -pe 's{<([^/>\s]+)([^>]*)>(?:\s*<([^>]*)\s*>\s*)?</\1\s*>}{<$1$2>$3</$1>}gms;'

which generates:
Code:
<?xml version="1.0" encoding="UTF-16"?>
<P1 >
<C1 type="i"><Z1 >abc txt</Z1></C1>
<V1 type="string">6.2 txt</V1>
<D1 type="string">
    <D2>1.0</D2>
    <D2>2.0</D2>
    <Y2>one 1.0</Y2><Y2>two 2.0</Y2><Y2>three 3.0</Y2><Y2>four 4.0</Y2>
    <W3 alpha="beta"></W3>
    <X4>  foo 42 bar  </X4>
</D1>
......................
......................
many more records.....
</P1>

Mind you, what works will depend entirely on your input data. If the sed one-liner works, use it.

Last edited by derekludwig; 01-04-2015 at 11:22 AM.. Reason: typo: s/online/one-liner/
This User Gave Thanks to derekludwig For This Post:
# 7  
Old 01-05-2015
Code:
perl -0777 -pe 's{<([^/>\s]+)([^>]*)>(?:\s*<([^>]*)\s*>\s*)?</\1\s*>}{<$1$2>$3</$1>}gms;'

The above code works perfectly. Thanks a lot to all of you for your input.