XML Problem

# 1  
06-03-2008

Hello, I need a script to edit a custom XML file. Although I know it should be fairly easy to write such a script, I'm failing miserably.
The script should read a file containing the IDs of one kind of XML element (for example the contentid in <content contentid="XXX" ...) and then remove each matching element.
For instance, given a simple XML file like this:
Code:
<categorygroup categorygroupid="test">
 <category categoryid="test_category1">
  <content contentid="0001" name="content_test">
  ...
  </content>
  <content contentid="0002" name="content_test2">
  ...
  </content>
  <content contentid="0003" name="content_test3">
  ...
  </content>
 </category>
 <categorygroup categorygroupid="test">
 <category categoryid="test_category2">
  <content contentid="0011" name="content_test1">
  ...
  </content>
  <content contentid="0012" name="content_test12">
  ...
  </content>
  <content contentid="0013" name="content_test13">
  ...
  </content>
 </category>
</categorygroup>

If the file contains the codes 0001, 0012 and 0013, the XML should become this:
Code:
<categorygroup categorygroupid="test">
 <category categoryid="test_category1">
  <content contentid="0002" name="content_test2">
  ...
  </content>
  <content contentid="0003" name="content_test3">
  ...
  </content>
 </category>
 <categorygroup categorygroupid="test">
 <category categoryid="test_category2">
  <content contentid="0011" name="content_test1">
  ...
  </content>
 </category>
</categorygroup>

Now, I'm pretty sure this should be easy, but I'm having a lot of trouble doing it (I've tried Perl, Ruby, PHP and even sed with grep). Can anyone help me?

Thanks.
# 2  
06-03-2008
This appears to work for the sample you posted:

Code:
perl -0777 -pe 's%^\s*<content contentid="(0001|001[23])"[^<>]*>(.*?)</content>\s*$%%msg' file.xml

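The ^ and $ decorations are probably unnecessary if the result is mainly intended to be machine-readable. The real work is done by the -0777 option, which makes Perl read the whole file as a single record, and by the non-greedy .*? coupled with the /s modifier, which lets . match newlines, so one match can cover a multi-line <content> element but stops at the first </content>. (The /m modifier makes ^ and $ match at the start and end of every line.) See the Perl FAQ for more on these.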
# 3  
06-04-2008
Hmm... that looks good, but where do I put the codes to remove from the XML? (I'm really no expert at regular expressions... yet.)
Also, please remember that these codes come from a file, and honestly, I know absolutely nothing about Perl... or at least not enough to read a file and feed every line (minus the trailing \n) into a specific regexp.

Thanks a lot.

Last edited by Zarnick; 06-04-2008 at 01:47 PM..
# 4  
06-05-2008
That's the entire program. Replace file.xml with the name of the input file. Redirect to a temporary file, or use perl -i to change the original file "in place".
# 5  
06-05-2008
That part I understood: file.xml is the XML file to remove the content from. But how do I feed the codes to remove into the Perl program? I tried creating a big file with all the codes separated by pipes (e.g. 0001|0002|3142|5342|7890....) and then cat-ing it inside the Perl program you posted:
Code:
perl -0777 -pe 's%^\s*<content contentid="(`cat codes.txt`)"[^<>]*>(.*?)</content>\s*$%%msg' file.xml

But it didn't work. Am I missing something here?

Thanks.
# 6  
06-05-2008
Inside the single quotes the shell never runs `cat codes.txt`, so Perl searches for that literal text instead of the codes. You need to read the file and process its contents into a proper regular expression.

It's better to do that in Perl directly, too.

Code:
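perl -0777 -pe 'BEGIN {
    # -0777 puts <C> in slurp mode too, so $c receives the whole file
    open (C, "codes.txt") || die "$!"; $c = <C>; close C;
    # chomp is a no-op in slurp mode; trim and join the codes with "|" instead
    $c =~ s/^\s+|\s+\z//g; $c =~ s/\s+/|/g; }
  s%^\s*<content contentid="($c)"[^<>]*>(.*?)</content>\s*$%%msg' file.xml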

This isn't particularly elegant; at this size it really wants to live in a script file rather than pretend to still be a one-liner. You should probably refactor it a bit when you do that.

Last edited by era; 06-05-2008 at 09:36 AM.. Reason: Oops, <C> is influenced by -0777 too