Extracting tag values from XML using perl
Hi All,
I'm trying to extract the values of the 'src' and 'alt' attributes from an XML file. In the files that I'm searching, they always appear inside an 'img' tag. Typically:
<img src="diwiz01.gif" width="576" height="254" alt="Out-of-process and In-process COM Objects"><bookmark name="f003"/></img>
I grep for 'img' and pipe the output to the following perl code, which successfully extracts the required data:
Code:
#!/usr/bin/perl
while (<>) {
    while (m/img src=\"(.*?)\"/ig) {
        print $1,"|";
    }
    while (m/alt=\"(.*?)\"/ig) {
        print $1,"\n";
    }
}
However, 'src' and 'alt' don't always appear in that order, e.g.:
<img width="470" height="321" alt="A Remote COM Object" src="dicwiz02.gif"><bookmark name="f004"/></img>
Consequently, the above code doesn't work. The basis of the code was originally used for a different problem and I didn't write it; I've modified it in an attempt to solve this one. Unfortunately, although I know the basics of sed and awk (but hardly any perl), I'm not a programmer and I'm struggling a bit. Any help gratefully received. Thanks.
There might be cleverer solutions, but this one works:
[CODE]
Tsunami xml # cat xml
<img width="470" height="321" alt="A Remote COM Object" src="dicwiz02.gif"><bookmark name="f004"/></img>
<img src="diwiz01.gif" width="576" height="254" alt="Out-of-process and In-process COM Objects"><bookmark name="f003"/></img>
Tsunami xml # perl -ne 'print "$1 $2\n" if /<img.*?(?:src|alt)=\"(.*?)\".*?(?:alt|src)=\"(.*?)\".*?<\/img>/;' xml
A Remote COM Object dicwiz02.gif
diwiz01.gif Out-of-process and In-process COM Objects
Tsunami xml #
[/CODE]
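One thing to watch with the one-liner: it prints the two values in whatever order they occur in the tag, so in the output above the first line has alt first and the second has src first. If you need src|alt every time, a possible variation is to grab the attribute text of each img tag first and then look up each attribute by name. This is only a sketch: the sub name img_src_alt is made up for illustration, and it assumes double-quoted values, no spaces around the '=', and the whole img start tag on a single line. For anything less regular, a real XML parser is the safer route.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns a list of [src, alt]
# pairs, one pair per <img ...> start tag found in the given line.
sub img_src_alt {
    my ($line) = @_;
    my @pairs;
    # Capture everything between "<img" and the closing ">" of the start tag.
    while ($line =~ /<img\b([^>]*)>/gi) {
        my $attrs = $1;
        # Look each attribute up by name, so their order does not matter.
        my ($src) = $attrs =~ /\ssrc="([^"]*)"/i;
        my ($alt) = $attrs =~ /\salt="([^"]*)"/i;
        # A missing attribute becomes an empty string.
        push @pairs, [ $src // '', $alt // '' ];
    }
    return @pairs;
}

# The two sample lines from this thread, one per attribute order.
my @sample = (
    '<img src="diwiz01.gif" width="576" height="254" alt="Out-of-process and In-process COM Objects"><bookmark name="f003"/></img>',
    '<img width="470" height="321" alt="A Remote COM Object" src="dicwiz02.gif"><bookmark name="f004"/></img>',
);

for my $line (@sample) {
    print join('|', @$_), "\n" for img_src_alt($line);
}
```

For the two sample lines this prints diwiz01.gif|Out-of-process and In-process COM Objects followed by dicwiz02.gif|A Remote COM Object. To run it on real input, replace the loop over @sample with while (my $line = <>) { ... } and pipe the file (or the grep output) into the script, as in the original post.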