Hi folks. I would like to remove the full parent (outer) xml tag from a file given a matching child (inner) tag, in a bash shell.
To be more specific, this is what I have so far:
The goal is to remove all Outer tags that contain an Inner tag with value 0. However, the above command clearly doesn't do what I want it to. In particular, the first .+? seems to be greedy, and I don't understand why. Does anybody know how I can do it? I appreciate any help.
I'm not bound to Perl, but AFAIK Perl is the easiest choice for multi-line matching. Any working alternative (sed, awk?) is perfectly welcome.
Last edited by BatManWSL; 05-18-2010 at 10:32 AM..
XSLT is generally a better mechanism for handling this sort of document transformation.
Assuming you convert your XML document into a well-formed XML document by adding a root element, the following stylesheet will do what you want.
The first template does all the heavy lifting. The second template is just an identity transformation.
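For readers without an XSLT processor at hand, the same tree-based idea can be sketched in Python with the standard library. This is not the stylesheet referred to above, only a rough equivalent. The sample document, the Root and Name element names, and the function name drop_outer_with_zero_inner are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the poster's file is not shown in the thread.
# A <Root> wrapper is added so the document is well formed.
SAMPLE = """<Root>
<Outer><Name>a</Name><Inner>0</Inner></Outer>
<Outer><Name>b</Name><Inner>1</Inner></Outer>
<Outer><Name>c</Name><Inner>0</Inner></Outer>
</Root>"""


def drop_outer_with_zero_inner(xml_text):
    """Return the XML with every <Outer> that contains <Inner>0</Inner> removed."""
    root = ET.fromstring(xml_text)
    # Snapshot the element lists first so nothing is mutated while iterating.
    for parent in list(root.iter()):
        for outer in list(parent.findall("Outer")):
            has_zero = any(
                (inner.text or "").strip() == "0" for inner in outer.iter("Inner")
            )
            if has_zero:
                parent.remove(outer)
    return ET.tostring(root, encoding="unicode")


print(drop_outer_with_zero_inner(SAMPLE))
```

Because the parser works on elements rather than on raw text, this approach does not care about line breaks or about how many other children sit between Outer and Inner.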
...
The goal is to remove all Outer tags that contain an Inner tag with value 0. However, the above command clearly doesn't do what I want it to. In particular, the first .+? seems to be greedy, and I don't understand why. Does anybody know how I can do it?...
I agree with fpmurphy on this mainly for two reasons:
(a) complexity of Perl regular expressions increases with that of your XML processing requirements, and
(b) a change in the XML structure could render the entire regex useless. That could be a *very* frustrating experience.
Nevertheless, I'd like to answer your questions.
Firstly, all the standard quantifiers (*, +, ? and {m,n}) are greedy by definition; appending "?" to one of them (as in ".+?") makes it non-greedy.
Secondly, your regex works exactly as it is expected to.
I've color coded the parts of the regex that match the parts in the xml file.
Note that when you write ".+?", Perl matches everything between the first "<Outer>" and "<Inner>", and that includes the part of the string that has "</Outer>" in it.
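The effect can be reproduced in a few lines. The sketch below uses Python's re module, which has the same non-greedy syntax as Perl. The input text and the pattern are assumptions that only guess at the general shape of the problem. They are not the poster's actual file or regex.

```python
import re

# Hypothetical input: two blocks, only the second one has Inner = 0.
TEXT = (
    "<Outer>\n<Inner>1</Inner>\n</Outer>\n"
    "<Outer>\n<Inner>0</Inner>\n</Outer>\n"
)

# Assumed pattern shape, with DOTALL so "." also matches newlines
# (the counterpart of Perl's /s modifier).
LAZY = re.compile(r"<Outer>.+?<Inner>0</Inner>.+?</Outer>", re.DOTALL)

match = LAZY.search(TEXT)

# The engine tries the leftmost start position first, and ".+?" is free to
# step over "</Outer>", so the match begins at the FIRST <Outer>.
starts_at_first_outer = match.start() == 0
swallows_both_blocks = match.group(0).count("<Outer>") == 2

print(starts_at_first_outer, swallows_both_blocks)
```

Both values come out True: non-greedy only means "as short as possible from this starting point", not "start as late as possible".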
You will need to tell Perl to look ahead after "<Outer>" but not match if the look-ahead string has "</Outer>" in it. The same applies to the string after "</Inner>": look ahead, but don't match if the string has "<Outer>" in it.
That's where the "negative lookahead" construct, (?!...), comes into the picture.
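As a rough illustration of that idea, here is a minimal Python sketch. The (?!...) syntax is the same in Python's re module as in Perl. The input text, the pattern, and the name strip_zero_blocks are assumptions made for illustration, not the regex from this thread.

```python
import re

# Hypothetical input: only the middle block has Inner = 0.
TEXT = (
    "<Outer>\n<Inner>1</Inner>\n</Outer>\n"
    "<Outer>\n<Inner>0</Inner>\n</Outer>\n"
    "<Outer>\n<Inner>2</Inner>\n</Outer>\n"
)

GUARDED = re.compile(
    r"<Outer>"
    r"(?:(?!</Outer>).)*?"   # any character, unless "</Outer>" starts here
    r"<Inner>0</Inner>"
    r"(?:(?!<Outer>).)*?"    # any character, unless "<Outer>" starts here
    r"</Outer>\n?",          # also consume the newline after the block, if any
    re.DOTALL,               # let "." match newlines, like Perl's /s
)


def strip_zero_blocks(text):
    """Remove every Outer block whose Inner value is 0."""
    return GUARDED.sub("", text)


print(strip_zero_blocks(TEXT))
```

Each "." is checked by the lookahead before it is consumed, so a match can never run past the end of one Outer block into the next. The first and third blocks are left untouched.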
Your regex should've been so -
Having said that, if you want to explore XML processing with Perl, check out the XML::Twig module on CPAN or at xmltwig.com.
HTH,
tyler_durden
Last edited by durden_tyler; 05-18-2010 at 04:02 PM..