The UNIX and Linux Forums  

#1
06-03-2008
Zarnick
Registered User
Join Date: May 2004
Location: Brazil
Posts: 40
XML Problem

Hello, I need a script to edit a custom XML file. I know it should be fairly easy to write such a script, but I'm failing miserably.
The script should read a file containing the ids of one tag of the XML (<content contentid="XXX"...., for example) and then remove the matching content elements.
For instance, given a simple XML file like this:
Code:
<categorygroup categorygroupid="test">
 <category categoryid="test_category1">
  <content contentid="0001" name="content_test">
  ...
  </content>
  <content contentid="0002" name="content_test2">
  ...
  </content>
  <content contentid="0003" name="content_test3">
  ...
  </content>
 </category>
 <categorygroup categorygroupid="test">
 <category categoryid="test_category2">
  <content contentid="0011" name="content_test1">
  ...
  </content>
  <content contentid="0012" name="content_test12">
  ...
  </content>
  <content contentid="0013" name="content_test13">
  ...
  </content>
 </category>
</categorygroup>
If the file contains the codes 0001, 0012 and 0013, the XML file should become this:
Code:
<categorygroup categorygroupid="test">
 <category categoryid="test_category1">
  <content contentid="0002" name="content_test2">
  ...
  </content>
  <content contentid="0003" name="content_test3">
  ...
  </content>
 </category>
 <categorygroup categorygroupid="test">
 <category categoryid="test_category2">
  <content contentid="0011" name="content_test1">
  ...
  </content>
 </category>
</categorygroup>
Now, I'm pretty sure this should be easy, but I'm having a lot of trouble doing it (I've tried Perl, Ruby, PHP and even sed with grep). Can anyone help me?

Thanks.
#2
06-03-2008
era
Forum Advisor
Herder of Useless Cats (On Sabbatical)
Join Date: Mar 2008
Location: /there/is/only/bin/sh
Posts: 3,652
This appears to work for the sample you posted:

Code:
perl -0777 -pe 's%^\s*<content contentid="(0001|001[23])"[^<>]*>(.*?)</content>\s*$%%msg' file.xml
The ^ and $ decorations are probably unnecessary, if the result is mainly intended to be machine-readable. The real beef is the -0777 option and the .*? regex coupled with the /s modifier. See the Perl FAQ for more on these.
#3
06-04-2008
Zarnick
Registered User
Join Date: May 2004
Location: Brazil
Posts: 40
Hmm... that seems good, but where do I put the codes to remove from the XML? (I'm really no expert at regular expressions... yet.)
Also, please remember that these codes are read from a file and, honestly, I know absolutely nothing about Perl, or at least not enough to read a file and feed every line (minus the \n) to a specific regexp.

Thanks a lot.

Last edited by Zarnick; 06-04-2008 at 01:47 PM..
#4
06-05-2008
era
Forum Advisor
Herder of Useless Cats (On Sabbatical)
Join Date: Mar 2008
Location: /there/is/only/bin/sh
Posts: 3,652
That's the entire program. Replace file.xml with the name of the input file. Redirect to a temporary file, or use perl -i to change the original file "in place".
#5
06-05-2008
Zarnick
Registered User
Join Date: May 2004
Location: Brazil
Posts: 40
That part I understood: file.xml is the XML file to remove the content from. But how do I feed the Perl program the codes to remove? I tried creating a big file with all the codes separated by pipes (e.g.: 0001|0002|3142|5342|7890....) and then cat it into the Perl program you posted:
Code:
perl -0777 -pe 's%^\s*<content contentid="(`cat codes.txt`)"[^<>]*>(.*?)</content>\s*$%%msg' file.xml
But it didn't work. Am I missing something here?

Thanks.
#6
06-05-2008
era
Forum Advisor
Herder of Useless Cats (On Sabbatical)
Join Date: Mar 2008
Location: /there/is/only/bin/sh
Posts: 3,652
It's looking for the literal text between the backticks, because single quotes stop the shell from running cat. You need to process the file to make a decent regular expression out of it.
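The quoting difference can be seen without Perl at all. This is only a sketch: the temporary codes file and the variable names are made up for the demonstration.

```shell
# Hypothetical codes file in the piped format described in the question.
codes_file=$(mktemp)
printf '0001|0012|0013\n' > "$codes_file"

# Single quotes: the shell passes the backticks through untouched.
single='contentid="(`cat codes.txt`)"'
# Double quotes: the shell runs cat and substitutes the file contents.
double="contentid=\"($(cat "$codes_file"))\""

printf '%s\n' "$single" "$double"
# prints:
#   contentid="(`cat codes.txt`)"
#   contentid="(0001|0012|0013)"
```

Switching the whole Perl program to double quotes would let the substitution happen, but then every $ and backslash in the program would need escaping from the shell as well.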

Better do that in Perl directly, too.

Code:
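perl -0777 -pe 'BEGIN {
    # -0777 also puts <C> in slurp mode, so one read returns the whole codes file
    open(C, "<", "codes.txt") || die "codes.txt: $!"; $c = <C>; close C;
    # chomp is a no-op in slurp mode: trim the end explicitly, then join lines with |
    $c =~ s/\s+\z//; $c =~ y/\n/|/; }
  s%^\s*<content contentid="($c)"[^<>]*>(.*?)</content>\s*$%%msg' file.xml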
This isn't particularly elegant; there is some pressure to put this into a file rather than try to pretend it's still a one-liner. You should probably refactor it a bit then.

Last edited by era; 06-05-2008 at 09:36 AM.. Reason: Oops, <C> is influenced by -0777 too
Tags
perl, perl regex, regex, regular expressions

The UNIX and Linux Forums Content Copyright ©1993-2009. All Rights Reserved.
