Extracting tag values from XML using perl


 
# 1  
Old 08-06-2008

Hi All,

I'm trying to extract the values of the 'src' and 'alt' attributes from an XML file. In the files that I'm searching, these attributes always appear inside an 'img' tag. Typically:

<img src="diwiz01.gif" width="576" height="254" alt="Out-of-process and In-process COM Objects"><bookmark name="f003"/></img>

I grep for 'img' and pipe the output to the following Perl code, which successfully extracts the required data:

Code:
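#!/usr/bin/perl
# Reads lines from STDIN or from files named on the command line.
while (<>) {
    # Print every src value on the line, followed by a pipe.
    while (m/img src="(.*?)"/ig) {
        print $1, "|";
    }
    # Print every alt value on the line, followed by a newline.
    while (m/alt="(.*?)"/ig) {
        print $1, "\n";
    }
}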

However, the XML source occasionally has the 'src' and 'alt' attributes in a different order within the 'img' tag. For example:

<img width="470" height="321" alt="A Remote COM Object" src="dicwiz02.gif"><bookmark name="f004"/></img>

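Consequently, the above code doesn't work: the first pattern expects 'src' immediately after 'img', so it no longer matches and the src value is never printed.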

The code was originally written by someone else for a different problem, and I've modified it to try to solve this one. Although I know the basics of sed and awk (but hardly any Perl), I'm not a programmer and I'm struggling a bit.

Any help gratefully received.

Thanks.
# 2  
Old 08-06-2008
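Replace:
Code:
while (m/img src=\"(.*?)\"/ig) {
    print $1,"|";

with:
Code:
# The first (.*?) skips any attributes that come before src,
# so the src value is now in $2 rather than $1.
while (m/img(.*?)src=\"(.*?)\"/ig) {
    print $2,"|";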
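
For reference, here is a rough sketch of how the whole script could look if each 'img' tag is matched first and 'src' and 'alt' are then looked up separately inside it, so their order stops mattering. The img_pairs helper and the @sample lines are made up for illustration (the sample lines are taken from the first post). It assumes double-quoted attribute values, no '>' inside a value, and Perl 5.10 or later for the // operator.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): returns one
# "src|alt" string for every <img ...> start tag found in a line.
sub img_pairs {
    my ($line) = @_;
    my @pairs;
    # Capture the attribute text of each img start tag.
    # Assumption: no '>' character appears inside an attribute value.
    while ($line =~ /<img\b([^>]*)>/ig) {
        my $attrs = $1;
        # Look up each attribute on its own, so order is irrelevant.
        my ($src) = $attrs =~ /\bsrc="(.*?)"/i;
        my ($alt) = $attrs =~ /\balt="(.*?)"/i;
        # A missing attribute becomes an empty field (// needs Perl 5.10+).
        push @pairs, join '|', $src // '', $alt // '';
    }
    return @pairs;
}

# Sample input copied from the first post.
my @sample = (
    '<img src="diwiz01.gif" width="576" height="254" alt="Out-of-process and In-process COM Objects"><bookmark name="f003"/></img>',
    '<img width="470" height="321" alt="A Remote COM Object" src="dicwiz02.gif"><bookmark name="f004"/></img>',
);

for my $line (@sample) {
    print "$_\n" for img_pairs($line);
}
```

To use it as a filter like the original, the loop over @sample could be swapped for while (my $line = <>) { ... }.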

# 3  
Old 08-06-2008
There might be other more clever solutions, but this one works.

Code:
Tsunami xml # cat xml
<img width="470" height="321" alt="A Remote COM Object" src="dicwiz02.gif"><bookmark name="f004"/></img>
<img src="diwiz01.gif" width="576" height="254" alt="Out-of-process and In-process COM Objects"><bookmark name="f003"/></img>
Tsunami xml # perl -ne 'print "$1 $2\n" if /<img.*?(?:src|alt)=\"(.*?)\".*?(?:alt|src)=\"(.*?)\".*?<\/img>/;' xml
A Remote COM Object dicwiz02.gif
diwiz01.gif Out-of-process and In-process COM Objects
Tsunami xml #
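
One thing to watch in the output above: the values come out in the order they appear in the tag, so the two columns swap between the first and second line. If a fixed "src|alt" order is needed, one possible variation is to put each attribute in its own lookahead, so that $1 is always src and $2 is always alt. This is only a sketch: the src_alt helper and the $IMG_RE name are made up for illustration, and it assumes double-quoted values, no '>' inside a value, and at most one 'img' tag per line.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Each lookahead scans forward from just after "<img" without consuming
# text, so the two attributes can appear in either order.
# [^>]*? keeps the scan inside the img start tag.
my $IMG_RE = qr/<img\b(?=[^>]*?\bsrc="(.*?)")(?=[^>]*?\balt="(.*?)")/i;

# Hypothetical helper: returns (src, alt) for the first img tag on the
# line, or an empty list if the tag or either attribute is missing.
sub src_alt {
    my ($line) = @_;
    return $line =~ $IMG_RE ? ($1, $2) : ();
}

# Sample input copied from the transcript above.
my @sample = (
    '<img width="470" height="321" alt="A Remote COM Object" src="dicwiz02.gif"><bookmark name="f004"/></img>',
    '<img src="diwiz01.gif" width="576" height="254" alt="Out-of-process and In-process COM Objects"><bookmark name="f003"/></img>',
);

for my $line (@sample) {
    my ($src, $alt) = src_alt($line) or next;
    print "$src|$alt\n";
}
# prints:
# dicwiz02.gif|A Remote COM Object
# diwiz01.gif|Out-of-process and In-process COM Objects
```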
# 4  
Old 08-06-2008
Thanks guys. Both solutions work. I appreciate your efforts!