reformatting xml file, sed or awk I think (possibly perl)


 
# 1  
Old 04-15-2011

I have some XML files that cannot be read with a standard parser, or perhaps I am using the wrong parser. The issue seems to be spaces in some of the tags.

Here is a sample,
Code:
<UgUn 2 >
<Un>
-0.426753
</Un>
</UgUn>

The parser can't find the number 2, so that information is lost. It seems to want something like,
Code:
<UgUn>
<Id>
2
</Id>
<Un>
-0.426753
</Un>
</UgUn>

It seems like it would be pretty simple to script something to convert between these two formats.

I believe that you can also do,
Code:
<UgUn Id="2">
<Un>
-0.426753
</Un>
</UgUn>

Which would be much easier to manage in sed, since it only involves one line.

I don't know much of anything about xml, so suggestions would be appreciated.

LMHmedchem
# 2  
Old 04-15-2011
First format:
Code:
perl -pe 's/<(\w+) (\d+) >/<\1>\n<Id>\n\2\n<\/Id>/' file.xml

Second format:
Code:
perl -pe 's/<(\w+) (\d+) >/<\1 Id="\2">/' file.xml
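
For readers without Perl, the first format (a child `<Id>` element) can also be sketched in awk. This is only a sketch under assumptions: each malformed tag sits alone on its line, the tag name contains only letters, and the value is an unsigned integer. The function name `attr_to_child` is made up for illustration.

```shell
attr_to_child() {
  awk '
    # Lines such as "<UgUn 2 >": a tag name, a number, then ">".
    /^<[A-Za-z]+ +[0-9]+ *>$/ {
      tag = $1; sub(/^</, "", tag)   # "<UgUn" becomes "UgUn"
      id  = $2; sub(/>$/, "", id)    # "2>" becomes "2" when no space precedes ">"
      print "<" tag ">"
      print "<Id>"
      print id
      print "</Id>"
      next
    }
    { print }                        # every other line passes through unchanged
  '
}

# Usage with the sample from the first post:
printf '%s\n' '<UgUn 2 >' '<Un>' '-0.426753' '</Un>' '</UgUn>' | attr_to_child
```

Files with Windows line endings would need the trailing carriage returns stripped first, since the pattern is anchored at the end of the line.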

# 3  
Old 04-15-2011
Yes, that was invalid XML. If that approach is right, it is easy in sed or awk.
Code:
sed '
  s/<UgUn  *\([^ ]\{1,9\}\) *>/<UgUn Id="\1">/g
 ' infile >outfile

Assuming Id is unique, the second form works and is far simpler.
# 4  
Old 04-16-2011
After consulting my sed dictionary,
http://www.ancientscripts.com/images/su_signs.gif

I find the above suggestion to work just fine.

I don't seem to have re-installed perl the last time I re-installed cygwin, so I didn't get to try that suggestion.

I have a few other things that need to be converted to proper metadata arguments.
Code:
<Fmt TEXT>
<Name Net_0>
<Epoch 7300>
<Lay Input>

It seemed like the following should work,
Code:
#for <text text>
sed 's/<\([a-z,A-Z]*\)\ \([a-z,A-Z]*\)>/<\1\ \1="\2">/g' temp1 > temp2

#for <text number>
sed 's/<\([a-z,A-Z]*\)\ \([1-9]*\)>/<\1\ \1="\2">/g' temp2 > temp3

The first line works for <Fmt TEXT> and <Lay Input> and gives the expected <Fmt Fmt="TEXT"> and <Lay Lay="Input">, but it does not recognize <Name Net_0>, I expect because of the underscore. The <text number> command doesn't work at all, though it seems properly formed.

Perhaps I need a newer dictionary???

LMHmedchem
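
A likely reason the <text number> command changes nothing: `[1-9]*` cannot match the digit 0, so on `<Epoch 7300>` the pattern stops after `73` and never reaches the closing `>`. The sketch below compares the two character classes side by side. It uses the same expression with the unneeded backslashes and the comma dropped from the brackets (inside brackets a comma is a literal comma, not a separator). The function names are hypothetical.

```shell
# Digits 1 through 9 only: fails on any number that contains a 0.
with_1_to_9() {
  sed 's/<\([a-zA-Z]*\) \([1-9]*\)>/<\1 \1="\2">/g'
}

# All ten digits: the whole number is captured.
with_0_to_9() {
  sed 's/<\([a-zA-Z]*\) \([0-9]*\)>/<\1 \1="\2">/g'
}

printf '%s\n' '<Epoch 7300>' | with_1_to_9   # prints <Epoch 7300> (unchanged)
printf '%s\n' '<Epoch 7300>' | with_0_to_9   # prints <Epoch Epoch="7300">
```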

---------- Post updated 04-16-11 at 05:13 PM ---------- Previous update was 04-15-11 at 05:23 PM ----------

I have found a few things that I think I need to do in awk or perl.

I have this,
Code:
<Cn Cn="11">
0 1.42767
1 1.16508
2 -0.56867
3 -0.272873
4 -0.14623
5 -0.053066
6 0.345557
7 -0.424821
8 -0.507607
9 -0.459116
10 -1.19002
</Cn>

This has the format <Cn Cn="int">, where each subsequent line holds an int and a float separated by a space. I need to turn each int into a tag, with the float as the content of that tag.
Code:
<Cn Cn="11">
<C0>1.42767</C0>
<C1>1.16508</C1>
<C2>-0.56867</C2>
<C3>-0.272873</C3>
<C4>-0.14623</C4>
<C5>-0.053066</C5>
<C6>0.345557</C6>
<C7>-0.424821</C7>
<C8>-0.507607</C8>
<C9>-0.459116</C9>
<C10>-1.19002</C10>
</Cn>

There is also this, which seems easier,
Code:
<Un>
-0.0552877

where there is a <Un> tag with a float on the next line. All that is needed here is to insert the float into a tag,
Code:
<Un>
 <Bias>-0.0552877</Bias>

I tried a few things like,
Code:
's/<\(<Un>/n)\ \([1-9]*\)/<\1\ <Bias>\2</Bias>/g'

but I obviously don't have that quite right.

LMHmedchem

Last edited by LMHmedchem; 04-15-2011 at 06:49 PM..
# 5  
Old 04-16-2011
For the "<Cn>" format try this:
Code:
perl -pe 's/(\d+) ([\d.-]+)/<C\1>\2<\/C\1>/ if /<Cn/../<\/Cn/' file
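
Where Perl is not installed, roughly the same conversion can be sketched in awk. It assumes the opening `<Cn ...>` and closing `</Cn>` tags each start their own line and that every data line inside the block has exactly two whitespace-separated fields. The function name `cn_to_tags` is made up for illustration.

```shell
cn_to_tags() {
  awk '
    /^<Cn[ >]/ { in_cn = 1; print; next }   # opening tag: start of the block
    /^<\/Cn>/  { in_cn = 0; print; next }   # closing tag: end of the block
    # Inside the block, "N value" becomes "<CN>value</CN>".
    in_cn && NF == 2 { print "<C" $1 ">" $2 "</C" $1 ">"; next }
    { print }                               # everything else is left alone
  '
}

# Usage with a shortened sample:
printf '%s\n' '<Cn Cn="11">' '0 1.42767' '10 -1.19002' '</Cn>' | cn_to_tags
```

Tracking the block with a flag keeps two-field lines elsewhere in the file from being rewritten.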

For <Un> try this:
Code:
perl -p0e 's/(<Un>\n)(.*)/\1<Bias>\2<\/Bias>/g' file
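
An awk sketch of the same idea, for systems without Perl: remember that the previous line was `<Un>` and wrap the line that follows it. It assumes `<Un>` sits alone on its line and that the value occupies exactly the next line. The function name `un_to_bias` is hypothetical.

```shell
un_to_bias() {
  awk '
    # The previous line was <Un>, so this line is the value to wrap.
    after_un { print "<Bias>" $0 "</Bias>"; after_un = 0; next }
    /^<Un>$/ { after_un = 1 }   # set the flag, then fall through to print the tag
    { print }
  '
}

# Usage:
printf '%s\n' '<Un>' '-0.0552877' '</Un>' | un_to_bias
```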

# 6  
Old 04-16-2011
That took care of quite a bit of the issue. I still have a few things going on.

I have,
Code:
<Fmt TEXT>
<Epoch 7300>
<Lay Input>

which I have fixed with
Code:
sed 's/<\([a-z,A-Z]*\)\ \([a-z,A-Z]*\)>/<\1\ \1="\2">/g'
sed 's/<\([a-z,A-Z]*\)\ \([0-9]*\)>/<\1\ \1="\2">/g'

which gives,
Code:
<Fmt Fmt="TEXT">
<Epoch Epoch="7300">
<Lay Lay="Input">

but this does not work for
Code:
<Name Network_0>

I assume because of the _0.

LMHmedchem
# 7  
Old 04-16-2011
What should
Code:
<Name Network_0>

be transformed into?
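
The thread ends without an answer to this question. Assuming, by analogy with the other tags, that the intended result is `<Name Name="Network_0">` (an assumption the thread never confirms), the value needs a character class that admits letters, digits, and underscores together. Neither `[a-z,A-Z]*` nor `[0-9]*` alone can match `Network_0`. A minimal sketch, with the hypothetical function name `tag_to_attr`:

```shell
tag_to_attr() {
  # \1 = tag name (letters only), \2 = value (letters, digits, underscores).
  sed 's/<\([A-Za-z][A-Za-z]*\) \([A-Za-z0-9_][A-Za-z0-9_]*\)>/<\1 \1="\2">/g'
}

# Usage with the tags from the thread:
printf '%s\n' '<Fmt TEXT>' '<Name Network_0>' '<Epoch 7300>' '<Lay Input>' | tag_to_attr
```

Tags that already carry a quoted attribute, such as `<UgUn Id="2">`, are left alone because `=` and `"` are outside the value class.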