I have an XML file that contains information like this:
I want to remove duplicate entries. The problem is that I cannot sort, because the comment structure of some entries makes them wrap onto a new line.
I tried the following code
(called as awk -f script file)
provided by Aigles in another post and modified for my input. For some reason it does not work: it does not change anything. I have tried variations of it, so far unsuccessfully.
Any ideas would be much appreciated.
Many thanks.
Last edited by TasosARISFC; 09-08-2011 at 05:34 AM..
Hi, sadly I have no XSLT processor, nor can I install one, as my machine is restricted.
---------- Post updated at 03:57 PM ---------- Previous update was at 03:53 PM ----------
Is there a way to check what's between <ID></ID> and see whether it exists somewhere else in the file, and if it does, delete it? Also, how can I escape "/" in awk? For example, I cannot do this:
Unfortunately, processing XML isn't trivial. Without a proper recursive parser for it you end up building one yourself, brute-force, character by character, because the record-based language constructs of awk don't help you. That's why tools like XSLT processors exist.
Working on something.
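On the narrower question of the slash: inside a /.../ regex literal, awk treats "/" as the end of the pattern, so it has to be written as \/ there. A minimal sketch, assuming the opening and closing ID tags sit on the same line (the function name extract_ids is made up for illustration):

```shell
# Print the text between <ID> and </ID> for every line that has both tags.
# Assumption: <ID> and </ID> are on the same line.
extract_ids() {
    awk '/<ID>.*<\/ID>/ {
        id = $0
        sub(/.*<ID>/, "", id)      # drop everything up to and including <ID>
        sub(/<\/ID>.*/, "", id)    # drop </ID> and everything after it
        print id
    }' "$@"
}
```

Called as extract_ids file.xml, or at the end of a pipe. A string pattern also works without escaping, e.g. $0 ~ "</ID>", because the slash is only special inside /.../.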
---------- Post updated at 10:55 AM ---------- Previous update was at 09:06 AM ----------
Here's a semi-ugly GNU awk solution. It works by breaking apart records on < and fields on >. That means the first token is always a complete tag and the second, if any, is text -- <stuff param=1>text would get split into "stuff param=1", "text". I've tried to make it tolerate improperly nested tags, uppercase vs. lowercase, etc., but can't possibly make it perfect.
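To illustrate just the splitting idea described above (this is not the script from this post, only a small sketch of the RS/FS trick; the function name tokenize_xml and the output format are made up):

```shell
# Split the input into records on "<" and into fields on ">".
# For each tag, $1 is the tag contents and $2 is the text that follows it.
tokenize_xml() {
    awk 'BEGIN { RS = "<"; FS = ">" }
    NF > 1 {                                        # skip anything that is not "tag>text"
        text = $2
        gsub(/^[ \t\r\n]+|[ \t\r\n]+$/, "", text)   # trim whitespace around the text
        printf "tag=[%s] text=[%s]\n", $1, text
    }' "$@"
}
```

Note that a literal > inside text or inside a comment would confuse this, which is one reason the approach can't be made perfect.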
Last edited by Corona688; 08-31-2011 at 02:33 PM..
Reason: [edit] improved version with fewer extra newlines
Hi Corona, thank you for your effort. As in your example, what this did was create a list of <ID> 0000000</ID> lines, removing all other tags, but still with duplicates.
I thought of using sort and uniq to get the duplicate IDs from this list, then deleting them from the original file (not the list). However, this does not solve the problem that some elements have comments that extend onto new lines, and those lines do not get removed.
e.g.
will only delete the first line.
So again, I am looking for a way to find the XML tags <ID></ID> and, if what's between them already exists in the file, delete it.
---------- Post updated at 10:11 AM ---------- Previous update was at 09:29 AM ----------
I can see this is rather complicated... Another way I could do this is by providing the list of duplicate IDs, searching the file for them and deleting them. I already know the duplicate IDs from a sys out, so I can place them in a text file as a list. However, I have two issues.
First, how do I delete all but one?
Second, how do I define where to start deleting and where to stop?
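One possible way to handle both issues at once, as a rough sketch only: buffer each entry from its opening tag to its closing tag, and print the buffer only the first time its ID is seen. The real element names are not shown in this thread, so the wrapper tag <ENTRY>...</ENTRY> below is purely an assumption and would need to be replaced with whatever actually encloses each record; the function name dedupe_entries is made up too. It also assumes <ID> and </ID> are on one line.

```shell
# Keep only the first entry for each ID; later entries with the same ID are
# dropped whole, including any comment lines that wrap onto new lines.
# Assumption: every entry is wrapped in <ENTRY> ... </ENTRY> (hypothetical name).
dedupe_entries() {
    awk '
    /<ENTRY>/ { inside = 1; buf = ""; id = "" }        # start buffering a new entry
    inside {
        buf = buf $0 "\n"
        if (id == "" && match($0, /<ID>[^<]*<\/ID>/))
            id = substr($0, RSTART + 4, RLENGTH - 9)   # text between <ID> and </ID>
        if ($0 ~ /<\/ENTRY>/) {                        # entry is complete
            if (id == "" || !(id in seen)) {           # entries without an ID are kept
                seen[id] = 1
                printf "%s", buf
            }
            inside = 0
        }
        next
    }
    { print }                                          # lines outside any entry pass through
    ' "$@"
}
```

Called as dedupe_entries input.xml > output.xml (file names are placeholders). IDs are compared exactly as written, so " 0000000" and "0000000" would count as different; no separate list of duplicate IDs is needed.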