Sponsored Content
Full Discussion: Duplicates in an XML file
Top Forums Shell Programming and Scripting Duplicates in an XML file Post 302551715 by Corona688 on Wednesday 31st of August 2011 12:55:36 PM
Old 08-31-2011
Escape it with \, like \/

Unfortunately processing XML isn't trivial. Without a proper recursive parser for it you end up building one yourself, brute-force, character by character, because the record-based language constructs of awk don't help you. That's why tools like XSLT processors exist..

working on something.

---------- Post updated at 10:55 AM ---------- Previous update was at 09:06 AM ----------

Here's a semi-ugly GNU awk solution. It works by breaking apart records on < and fields on >. Meaning, the first token is always a complete tag and the second, if any, is text -- <stuff param=1>text would get split into "stuff param=1", "text". I've tried to make it tolerate improperly nested tags, uppercase vs lowercase, etc, but can't possibly make it perfect.

Code:
$ cat xml.awk
#!/usr/bin/gawk -f

function inside(STR, OFF, N)
{
        for(N=(D-(1+OFF)); N>=0; N--)
        if(TAG[N] == tolower(STR))      return(1);

        return(0);
}

BEGIN {         RS="<"  ;       FS=">"  ;       LAST="C"        }

{
        # closing tag
        if($0 ~ /^[/]/)
        {
                TMP=D;
                D--;
                # Try to deal gracefully with improperly nested tags
                while(tolower(substr($1, 2, length($1)-1)) != TAG[D])
                {
                        if(D > 0)       D--;
                        else    {       D=TMP;  break;          }
                }

                if(!inside("ID", 0))    printf("<%s>\n", $1);
                LAST="C"
        }
        else if($1)     # open tag
        {
                if(!inside("ID", 0))
                {
                        if(LAST == "O") { printf("\n"); }

                        printf("<%s>", $1);
                        LAST="O";
                }

                if(! ($1 ~ /[/]$/))     # ignore self-closing tags
                {
                        split($1, a, "[ \r\n\t]");
                        TAG[D++]=tolower(a[1]);
                }

                if(!inside("ID", 1))
                if(NF > 1)      printf("%s", $2);
        }
}
$ ./xml.awk < data.xml
<ID>574922</ID>
<ID>574922</ID>
<ID>412659</ID>
<ID>873520</ID>
<ID>480622</ID>
<ID>873520</ID>
<ID>480622</ID>
$


Last edited by Corona688; 08-31-2011 at 02:33 PM.. Reason: [edit] improved version with fewer extra newlines
 

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

How to parse a XML file using PERL and XML::DOm

I need to know the way. I have got parsing down some nodes. But I was unable to get the child node perfectly. If you have code please send it. It will be very useful for me. (0 Replies)
Discussion started by: girigopal
0 Replies

2. Shell Programming and Scripting

How to remove xml namespace from xml file using shell script?

I have an xml file: <AutoData xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"> <Table1> <Data1 10 </Data1> <Data2 20 </Data2> <Data3 40 </Data3> <Table1> </AutoData> and I have to remove the portion xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" only. I tried using sed... (10 Replies)
Discussion started by: Gary1978
10 Replies

3. Shell Programming and Scripting

How to add the multiple lines of xml tags before a particular xml tag in a file

Hi All, I'm stuck with adding multiple lines(irrespective of line number) to a file before a particular xml tag. Please help me. <A>testing_Location</A> <value>LA</value> <zone>US</zone> <B>Region</B> <value>Russia</value> <zone>Washington</zone> <C>Country</C>... (0 Replies)
Discussion started by: mjavalkar
0 Replies

4. Shell Programming and Scripting

Help required in Splitting a xml file into multiple and appending it in another .xml file

HI All, I have to split a xml file into multiple xml files and append it in another .xml file. for example below is a sample xml and using shell script i have to split it into three xml files and append all the three xmls in a .xml file. Can some one help plz. eg: <?xml version="1.0"?>... (4 Replies)
Discussion started by: ganesan kulasek
4 Replies

5. Shell Programming and Scripting

XML: parsing of the Google contacts XML file

I am trying to parse the XML Google contact file using tools like xmllint and I even dived into the XSL Style Sheets using xsltproc but I get nowhere. I can not supply any sample file as it contains private data but you can download your own contacts using this script: #!/bin/sh # imports... (9 Replies)
Discussion started by: ripat
9 Replies

6. Shell Programming and Scripting

Comparing delta values of one xml file in other xml file

Hi All, I have two xml files. One is having below input <NameValuePair> <name>Daemon</name> <value>tcp:7474</value> </NameValuePair> <NameValuePair> <name>Network</name> <value></value> </NameValuePair> ... (2 Replies)
Discussion started by: sharsour
2 Replies

7. Shell Programming and Scripting

Split xml file into multiple xml based on letterID

Hi All, We need to split a large xml into multiple valid xml with same header(2lines) and footer(last line) for N number of letterId. In the example below we have first 2 lines as header and last line as footer.(They need to be in each split xml file) Header: <?xml version="1.0"... (5 Replies)
Discussion started by: vx04
5 Replies

8. Shell Programming and Scripting

Splitting a single xml file into multiple xml files

Hi, I'm having a xml file with multiple xml header. so i want to split the file into multiple files. Sample.xml consists multiple headers so how can we split these multiple headers into multiple files in unix. eg : <?xml version="1.0" encoding="UTF-8"?> <ml:individual... (3 Replies)
Discussion started by: Narendra921631
3 Replies

9. UNIX for Beginners Questions & Answers

Grepping multiple XML tag results from XML file.

I want to write a one line script that outputs the result of multiple xml tags from a XML file. For example I have a XML file which has below XML tags in the file: <EMAIL>***</EMAIL> <CUSTOMER_ID>****</CUSTOMER_ID> <BRANDID>***</BRANDID> Now I want to grep the values of all these specified... (1 Reply)
Discussion started by: shubh752
1 Replies

10. UNIX for Beginners Questions & Answers

How to pull multiple XML tags from the same XML file in Shell.?

I'm searching for the names of a TV show in the XML file I've attached at the end of this post. What I'm trying to do now is pull out/list the data from each of the <SeriesName> tags throughout the document. Currently, I'm only able to get data the first instance of that XML field using the... (9 Replies)
Discussion started by: hungryd
9 Replies
IPTABLES-XML(8) 														   IPTABLES-XML(8)

NAME
iptables-xml -- Convert iptables-save format to XML SYNOPSIS
iptables-xml [-c] [-v] DESCRIPTION
iptables-xml is used to convert the output of iptables-save into an easily manipulatable XML format to STDOUT. Use I/O-redirection pro- vided by your shell to write to a file. -c, --combine combine consecutive rules with the same matches but different targets. iptables does not currently support more than one target per match, so this simulates that by collecting the targets from consecutive iptables rules into one action tag, but only when the rule matches are identical. Terminating actions like RETURN, DROP, ACCEPT and QUEUE are not combined with subsequent targets. -v, --verbose Output xml comments containing the iptables line from which the XML is derived iptables-xml does a mechanistic conversion to a very expressive xml format; the only semantic considerations are for -g and -j targets in order to discriminate between <call> <goto> and <nane-of-target> as it helps xml processing scripts if they can tell the difference between a target like SNAT and another chain. Some sample output is: <iptables-rules> <table name="mangle"> <chain name="PREROUTING" policy="ACCEPT" packet-count="63436" byte-count="7137573"> <rule> <conditions> <match> <p>tcp</p> </match> <tcp> <sport>8443</sport> </tcp> </conditions> <actions> <call> <check_ip/> </call> <ACCEPT/> </actions> </rule> </chain> </table> </iptables-rules> Conversion from XML to iptables-save format may be done using the iptables.xslt script and xsltproc, or a custom program using libxsltproc or similar; in this fashion: xsltproc iptables.xslt my-iptables.xml | iptables-restore BUGS
None known as of iptables-1.3.7 release AUTHOR
Sam Liddicott <azez@ufomechanic.net> SEE ALSO
iptables-save(8), iptables-restore(8), iptables(8) Jul 16, 2007 IPTABLES-XML(8)
All times are GMT -4. The time now is 11:29 AM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy