OK, but is everything working fine now? I really don't feel like analyzing a 1000+ line XML file. If something is not reformatted properly, which particular tag is it?
Sorry about that, I meant to post an abbreviated version of the ill-formed XML, which is only 65 lines. I have attached it here in case anyone wants to have a look. The .zip also includes two .doc files. The first is the ill-formed XML with comments describing the issues; the problem tags are in bold blue. The second is the revised version with comments describing the corrections; additions are in bold red. I hope this will be helpful to anyone looking to correct a similar issue.
The script is working and, I suppose, isn't overly kludgey. I am hard-coding the text format as ASCII. Is there an easy way in bash to identify the encoding of the input file?
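One possible approach to the encoding question is the standard file(1) utility, which can print its best guess at a file's character set. The sketch below assumes file is installed and supports --mime-encoding (GNU and BSD versions do); detect_encoding is a hypothetical helper name, not something from the script in this thread. The result is a heuristic guess based on the bytes in the file, not a guarantee.

```shell
#!/bin/bash
# Sketch: ask file(1) for its best guess at a file's character encoding.
# detect_encoding is a hypothetical helper name used only for illustration.
detect_encoding() {
    # -b suppresses the file name; --mime-encoding prints only the charset
    file -b --mime-encoding -- "$1"
}

# When run with a file argument, print the guessed encoding for it.
if [ "$#" -gt 0 ]; then
    detect_encoding "$1"
fi
```

A plain 7-bit text file is typically reported as us-ascii, and a file containing valid multi-byte UTF-8 sequences as utf-8. The value could then be passed to a tool such as iconv if a conversion is needed.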
I have made a couple of changes. I am now naming all the created attributes the same as the tag name, since this is simpler and lets the replacements be done with a more general rule. The exception is the UgUn tag, I guess because there is an extra space to be dealt with. I have also switched to perl, based on the format of the earlier posts.
Here is the current script
The exception is the UnUg tag, which I can't get perl to find, and the tags with multiple args, which are still in awk. I am not sure about the current solution, since it doesn't take into account the fact that the args may have different values that are being searched for.
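To illustrate what a general "attribute named after the tag" rule might look like, here is a minimal sketch. It is not the script from this thread. It assumes, purely for illustration, that an ill-formed tag has the shape `<Tag value>` (a bare value where an attribute should be); the real input may look different. It uses sed rather than perl, and add_named_attribute and the Atom tag are hypothetical names.

```shell
#!/bin/bash
# Assumption: an ill-formed tag looks like <Tag value>, with one bare value.
# add_named_attribute is a hypothetical helper, not the thread's script.
add_named_attribute() {
    # \1 = tag name, reused as the attribute name; \2 = the bare value.
    # The value class excludes space, <, >, =, quotes and / so that
    # closing tags and already well-formed attributes are left untouched.
    sed -E 's/<([A-Za-z][A-Za-z0-9]*) ([^ <>="\/]+)>/<\1 \1="\2">/g'
}

# Example: printf '<Atom 12>\n' | add_named_attribute
```

Because the rule only matches a single bare value, tags with multiple arguments would still need their own handling, which matches the limitation described above.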
Quote:
Originally Posted by matrixmadhan
In the script posted above, I see a lot of sed and awk commands chained together and executed within a bash wrapper. As the file size increases, this approach is going to slow down processing terribly, as it's going to keep spawning multiple processes.
Have you considered writing it in perl, reading and processing the file line by line? That will be way faster than the current approach.
I could switch to using more temp files if you think that would help.
I really don't know perl all that well. On the whole, I am pretty poor at regex stuff. I use it some, since it's so d**n convenient, but what I know is just bits and pieces of various interpreters and stream editors. I can modify stuff in perl and awk, but that is about it. I wish I knew this material better, but that is just one more item on the long list of such things.
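For what it's worth, the process-spawning concern can be reduced without learning much perl: several chained sed and awk stages can often be folded into one sed invocation that applies every rule in a single pass over each line. The sketch below uses three made-up cleanup rules (they are assumptions, not the rules from the script in this thread), and clean_lines is a hypothetical name.

```shell
#!/bin/bash
# Chained form, three processes per file (shown for comparison only):
#   sed 's/[[:space:]]*$//' "$in" | sed 's/ *>/>/g' | awk 'NF' > "$out"
#
# Single-process form: the same three illustrative rules in one sed pass.
# clean_lines is a hypothetical helper; the rules are placeholders.
clean_lines() {
    # 1) trim trailing whitespace, 2) drop spaces before '>',
    # 3) delete lines that are empty after trimming
    sed -E -e 's/[[:space:]]+$//' -e 's/ +>/>/g' -e '/^$/d'
}

# Example: clean_lines < input.xml > output.xml
```

Each -e expression is applied in order to every line, so the file is read once by one process instead of being piped through several.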
LMHmedchem