XML Problem
Hello, I need a script to edit a custom XML file. I know it should be fairly easy to write such a script, but I'm failing miserably.
The script should read a file containing the ids of one tag of the XML (<content contentid="XXX" ...>, for example) and then remove each matching content element. For instance, given a simple XML file like this:
Code:
<categorygroup categorygroupid="test">
  <category categoryid="test_category1">
    <content contentid="0001" name="content_test"> ... </content>
    <content contentid="0002" name="content_test2"> ... </content>
    <content contentid="0003" name="content_test3"> ... </content>
  </category>
<categorygroup categorygroupid="test">
  <category categoryid="test_category2">
    <content contentid="0011" name="content_test1"> ... </content>
    <content contentid="0012" name="content_test12"> ... </content>
    <content contentid="0013" name="content_test13"> ... </content>
  </category>
</categorygroup>
the script should produce this (content 0001, 0012 and 0013 removed):
Code:
<categorygroup categorygroupid="test">
  <category categoryid="test_category1">
    <content contentid="0002" name="content_test2"> ... </content>
    <content contentid="0003" name="content_test3"> ... </content>
  </category>
<categorygroup categorygroupid="test">
  <category categoryid="test_category2">
    <content contentid="0011" name="content_test1"> ... </content>
  </category>
</categorygroup>
Thanks.
|
||||
|
This appears to work for the sample you posted:
Code:
perl -0777 -pe 's%^\s*<content contentid="(0001|001[23])"[^<>]*>(.*?)</content>\s*$%%msg' file.xml
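For anyone who finds the one-liner dense, here is roughly the same substitution sketched in Python rather than Perl. The sample XML and the `remove_content` name are made up for illustration, and unlike the Perl pattern this version also swallows the newline after each removed element so no blank lines are left behind. Like any regex approach, it assumes `<content>` elements are never nested.

```python
import re

# Hypothetical sample in the same shape as the XML from the question.
SAMPLE = """\
<categorygroup categorygroupid="test">
  <category categoryid="test_category1">
    <content contentid="0001" name="content_test">
      ...
    </content>
    <content contentid="0002" name="content_test2">
      ...
    </content>
  </category>
</categorygroup>
"""

def remove_content(xml_text, ids):
    # Build an alternation such as 0001|0012|0013; escape each id in case
    # it contains regex metacharacters.
    alternation = "|".join(re.escape(i) for i in ids)
    pattern = re.compile(
        r'^[ \t]*<content contentid="(?:' + alternation + r')"[^<>]*>'
        r'.*?</content>[ \t]*\n?',
        # MULTILINE: ^ matches at every line start (Perl /m).
        # DOTALL: . also matches newlines (Perl /s).
        re.MULTILINE | re.DOTALL,
    )
    return pattern.sub("", xml_text)

print(remove_content(SAMPLE, ["0001"]))
```

The lazy `.*?` is what stops the match at the first `</content>` instead of running on to the last one in the file.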
|
||||
|
Hum... that seems good, but where do I put the input codes to remove from the XML? (I'm really no expert at regular expressions... yet)
Also, please remember that these codes are fed in from a file, and honestly, I know absolutely nothing about Perl, or at least not enough to read a file and feed every line (minus the \n) to a specific regexp. Thanks a lot.
Last edited by Zarnick; 06-04-2008 at 01:47 PM.
|
||||
|
This I understood: file.xml is the XML file to remove the content from. But how do I feed the Perl program the codes to remove? I tried creating one big file with all the codes separated by pipes (e.g. 0001|0002|3142|5342|7890...) and then cat it into the Perl program you posted:
Code:
perl -0777 -pe 's%^\s*<content contentid="(`cat codes.txt`)"[^<>]*>(.*?)</content>\s*$%%msg' file.xml
Thanks.
|
||||
|
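Because the Perl code is inside single quotes, the shell never runs `cat codes.txt`; Perl looks for that text literally. The list of codes has to be turned into a proper regular expression first.
Better to do that in Perl directly, too. With one code per line in codes.txt:
Code:
perl -0777 -pe 'BEGIN {
    # -0777 applies here too, so <C> slurps all of codes.txt at once
    open (C, "codes.txt") || die "$!"; $c = <C>; close C;
    $c =~ s/\s+\z//;   # chomp is a no-op in slurp mode, so trim the trailing newline by hand
    $c =~ y/\n/|/; }   # join the ids into an alternation, e.g. 0001|0012|0013
s%^\s*<content contentid="($c)"[^<>]*>(.*?)</content>\s*$%%msg' file.xml
Last edited by era; 06-05-2008 at 09:36 AM. Reason: Oops, <C> is influenced by -0777 too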
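The regex route works for the regular layout in the sample, but it is tied to attribute order and spacing. As a possible alternative, here is a minimal sketch using a real XML parser, written in Python rather than Perl. The `load_ids` and `strip_content` names and the inline sample are hypothetical; the ids are assumed to come one per line, as in codes.txt. A parser needs well-formed input, which the sample as posted is not quite (the first categorygroup is never closed), so the sample below closes it.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for file.xml.
SAMPLE = """<categorygroup categorygroupid="test">
  <category categoryid="test_category1">
    <content contentid="0001" name="content_test">...</content>
    <content contentid="0002" name="content_test2">...</content>
  </category>
  <category categoryid="test_category2">
    <content contentid="0012" name="content_test12">...</content>
  </category>
</categorygroup>"""

def load_ids(lines):
    # One id per line; blank lines are skipped.
    # Any iterable of strings works, including an open file.
    return {line.strip() for line in lines if line.strip()}

def strip_content(root, ids):
    # Snapshot parents and children with list() so that removing
    # elements while looping is safe.
    removed = 0
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag == "content" and child.get("contentid") in ids:
                parent.remove(child)
                removed += 1
    return removed

root = ET.fromstring(SAMPLE)
ids = load_ids("0001\n0012\n".splitlines())
removed = strip_content(root, ids)
remaining = [c.get("contentid") for c in root.iter("content")]
print(removed, remaining)
```

With real files, `ET.parse("file.xml")` and `load_ids(open("codes.txt"))` would take the place of the inline strings, and the tree's `write()` method would save the result.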