Sponsored Content
Top Forums Shell Programming and Scripting Read content between xml tags with awk, grep, awk or what ever... Post 302402760 by durden_tyler on Wednesday 10th of March 2010 02:51:05 PM
Old 03-10-2010
Here's a Perl solution. Assume your file is as follows -

Code:
$
$
$ cat sample.xml
<?xml version="1.0"?>
<catalog>
   <book id="bk101">
      <author>Gambardella, Matthew</author>
      <title>XML Developer's Guide</title>
      <genre>Computer</genre>
      <price>44.95</price>
      <publish_date>2000-10-01</publish_date>
      <description>An in-depth look at creating applications with XML.</description>
   </book>
   <book id="bk102">
      <author>Ralls, Kim</author>
      <title>Midnight Rain</title>
      <genre>Fantasy</genre>
      <price>5.95</price>
      <publish_date>2000-12-16</publish_date>
      <description>A former architect battles corporate zombies,
      an evil sorceress, and her own childhood to become queen
      of the world.</description>
   </book>
   <book id="bk103">
      <author>Corets, Eva</author>
      <title>Maeve Ascendant</title>
      <genre>Fantasy</genre>
      <price>5.95</price>
      <publish_date>2000-11-17</publish_date>
      <description>After the collapse of a nanotechnology
      society in England, the young survivors lay the
      foundation for a new society.</description>
   </book>
   <book id="bk104">
      <author>Corets, Eva</author>
      <title>Oberon's Legacy</title>
      <genre>Fantasy</genre>
      <price>5.95</price>
      <publish_date>2001-03-10</publish_date>
      <description>In post-apocalypse England, the mysterious
      agent known only as Oberon helps to create a new life
      for the inhabitants of London. Sequel to Maeve
      Ascendant.</description>
   </book>
   <book id="bk105">
      <author>Corets, Eva</author>
      <title>The Sundered Grail</title>
      <genre>Fantasy</genre>
      <price>5.95</price>
      <publish_date>2001-09-10</publish_date>
      <description>The two daughters of Maeve, half-sisters,
      battle one another for control of England. Sequel to
      Oberon's Legacy.</description>
   </book>
</catalog>
$
$

You want to pick up the stuff between the "<description>, </description>" tags.

The first occurrence is on a single line. The rest of them span multiple lines and you want the newlines to be preserved. I shall assume that you want the whitespaces to be preserved as well.

Here's the script -

Code:
$
$ perl -lne 'BEGIN{undef $/} while (/<description>(.*?)<\/description>/sg){print $1}' sample.xml
An in-depth look at creating applications with XML.
A former architect battles corporate zombies,
      an evil sorceress, and her own childhood to become queen
      of the world.
After the collapse of a nanotechnology
      society in England, the young survivors lay the
      foundation for a new society.
In post-apocalypse England, the mysterious
      agent known only as Oberon helps to create a new life
      for the inhabitants of London. Sequel to Maeve
      Ascendant.
The two daughters of Maeve, half-sisters,
      battle one another for control of England. Sequel to
      Oberon's Legacy.
$
$

In case you want the newlines preserved, but want to remove the whitespace at the beginning, then -

Code:
$
$ perl -lne 'BEGIN{undef $/} while (/<description>(.*?)<\/description>/sg){($x = $1) =~ s/\n\s*/\n/g; print $x}' sample.xml
An in-depth look at creating applications with XML.
A former architect battles corporate zombies,
an evil sorceress, and her own childhood to become queen
of the world.
After the collapse of a nanotechnology
society in England, the young survivors lay the
foundation for a new society.
In post-apocalypse England, the mysterious
agent known only as Oberon helps to create a new life
for the inhabitants of London. Sequel to Maeve
Ascendant.
The two daughters of Maeve, half-sisters,
battle one another for control of England. Sequel to
Oberon's Legacy.
$
$

And in case you want to neither the newline nor the whitespace i.e. each chunk between "<description>" tags on a single line, then -

Code:
$
$ perl -lne 'BEGIN{undef $/} while (/<description>(.*?)<\/description>/sg){($x = $1) =~ s/\n\s*//g; print $x}' sample.xml
An in-depth look at creating applications with XML.
A former architect battles corporate zombies, an evil sorceress, and her own childhood to become queen of the world.
After the collapse of a nanotechnology society in England, the young survivors lay the foundation for a new society.
In post-apocalypse England, the mysterious agent known only as Oberon helps to create a new life for the inhabitants of London. Sequel to Maeve Ascendant.
The two daughters of Maeve, half-sisters, battle one another for control of England. Sequel to Oberon's Legacy.
$
$

HTH,
tyler_durden
 

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Need help with awk - how to read a content of a file from every file from file list

Hi Experts. I need to list the file and the filename comes from the file ListOfFile.txt. Basicly I have a filename "ListOfFile.txt" and it contain Example of ListOfFile.txt /home/Dave/Program/Tran1.P /home/Dave/Program/Tran2.P /home/Dave/Program/Tran3.P /home/Dave/Program/Tran4.P... (7 Replies)
Discussion started by: tanit
7 Replies

2. UNIX for Dummies Questions & Answers

Using Awk within awk to read all files in directory

I am wondering if anyone has any idea how to use an awk within awk to read files and find a match which adds to count. Say I am searching how many times the word crap appears in each files within a directory. How would i do that from the command prompt ... thanks (6 Replies)
Discussion started by: flevongo
6 Replies

3. Shell Programming and Scripting

Read a file content with awk and sed

Hello , I have huge file with below content. I need to read the numeric values with in the paranthesis after = sign. Please help me with awk and sed script for it. 11.10.2009 04:02:47 Customer login not found: identifier=(0748502889) prefix=(TEL) serviceCode=(). 11.10.2009 04:03:12... (13 Replies)
Discussion started by: rmv
13 Replies

4. Shell Programming and Scripting

Help on awk to read xml file

Hello, I have a xml file as shown below. I want to parse the file and store data in variables. xml file looks like: <TEST NAME="DataBaseurl">jdbc:oracle:thin:@localhost:1521:ora10</TEST> <TEST NAME="Databaseuser">Pradeep</TEST> ...... and many other such lines i want to read this file and... (2 Replies)
Discussion started by: pradeepmacha
2 Replies

5. Shell Programming and Scripting

how to get tags content by grep

1) Is it possible to get tags content by grep -E ? For example title. Source text "<title>My page<title>"; to print "My page". 2) which bash utility to use when I want to use regex in this format? (?<=title>).*(?=</title) (11 Replies)
Discussion started by: visitor123
11 Replies

6. Shell Programming and Scripting

awk to retrieve the particular value from a same list of xml tags

Hi All, I have the following code in one of my xml file: <com:parameter> <com:name>secretKey</com:name> <com:value>31XA874821172E89B00B1C</com:value> </com:parameter> <com:parameter> <com:name>tryDisinfect</com:name> <com:value>false</com:value> </com:parameter> <com:parameter>... (4 Replies)
Discussion started by: mjavalkar
4 Replies

7. Shell Programming and Scripting

awk and or sed command to sum the value in repeating tags in a XML

I have a XML in which <Amt Ccy="EUR">3.1</Amt> tag repeats. This is under another tag <Main>. I need to sum all the values of <Amt Ccy=""> (Ccy may vary) coming under <Main> using awk and or sed command. can some help? Sample looks like below <root> <Main> ... (6 Replies)
Discussion started by: bk_12345
6 Replies

8. Shell Programming and Scripting

How to add Xml tags to an existing xml using shell or awk?

Hi , I have a below xml: <ns:Body> <ns:result> <Date Month="June" Day="Monday:/> </ns:result> </ns:Body> i have a lookup abc.txtt text file with below details Month June July August Day Monday Tuesday Wednesday I need a output xml with below tags <ns:Body> <ns:result>... (2 Replies)
Discussion started by: Nevergivup
2 Replies

9. UNIX for Dummies Questions & Answers

Grep content in xml file

I have an xml file with header as below. <Provider xmlns="http://www.xyzx.gov/xyz" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.xyzx.gov/xyz xyz.xsd" SCHEMA_VERSION="2.5" PROVIDER="5"> I want to get the schema version here that is 2.5 and put in a... (7 Replies)
Discussion started by: Ariean
7 Replies

10. UNIX for Dummies Questions & Answers

Piping grep into awk, read the next line using grep

Hi, I have a number of files containing the information below. """"" Fundallinfo 6.3950 14.9715 14.0482 """"" I would like to grep for Fundallinfo and use it to read the next line? I ideally would like to read the three numbers that follow in the next line and... (2 Replies)
Discussion started by: Paul Moghadam
2 Replies
All times are GMT -4. The time now is 02:29 PM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy