Extract text between two specified "constant" texts using awk
Hi All,
From the title you may know that this question has been asked several times, and I have done a lot of Googling on this.
I have a Wikipedia dump file in XML format. All the contents are in one XML file, i.e. all the different topics have been put in one file. Now I need to split them into a separate file for each topic. After carefully going through the XML file, I found that each topic occurs between <page> and </page> tags. I want to use awk to extract the topics and their descriptions into separate files: the first topic goes into 1.dat, the second into 2.dat, and so on until the end of the file.
This is how the Wikipedia XML file looks:
<page><title>APRIL</title>
.........(text contents that I need to extract and store in 1.dat including the <title> tag)
</page><page><title>August</title>
....(text contents that I need to store in 2.dat including the <title> tag)
</page>
so on.......
I have done this but it created havoc.
Last edited by shoaibjameel123; 03-10-2011 at 08:58 AM.
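One way to approach the splitting described above is sketched here. Because the sample shows </page><page> on a single line, a plain line-range match such as /<page>/,/<\/page>/ would probably not separate the topics cleanly, so this sketch scans each line for the tags instead. The function name split_pages and the file name dump.xml are assumptions, not from the original post. The tags themselves are dropped, and the text between them goes to 1.dat, 2.dat, and so on.

```shell
# Hypothetical helper: split_pages FILE writes the text between each
# <page> ... </page> pair to 1.dat, 2.dat, ... in the current directory.
split_pages() {
    awk '
    {
        line = $0
        while (1) {
            if (!inpage) {
                i = index(line, "<page>")
                if (i == 0) break                 # no opening tag left on this line
                inpage = 1
                out = (++n) ".dat"                # next output file
                line = substr(line, i + 6)        # keep what follows <page>
                if (line == "") break             # tag was at the end of the line
            } else {
                j = index(line, "</page>")
                if (j == 0) {                     # page continues on the next line
                    print line > out
                    break
                }
                if (j > 1) print substr(line, 1, j - 1) > out
                close(out)                        # avoid running out of open files
                inpage = 0
                line = substr(line, j + 7)        # rescan the rest for another <page>
            }
        }
    }' "$1"
}

# Usage (dump.xml is a placeholder name):
#   split_pages dump.xml
```

Closing each output file as soon as its page ends matters for a full dump, since most awk implementations limit the number of files that can be open at once.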
Hi Friends,
Can any of you explain the line of code below?
mn_code=`env|grep "..mn"|awk -F"=" '{print $2}'`
I'm not able to understand what exactly it is doing.
Any help would be useful for me.
Lokesha (4 Replies)
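To illustrate what the quoted line does, here is a small sketch that runs the same grep and awk stages on simulated env output. The variable names and values (db_mn_code, ABC123, and the rest) are invented for the illustration; only the grep and awk stages come from the post, and $(...) is used as the equivalent of the backticks.

```shell
# Stand-in for `env`: prints NAME=value lines (all names and values are hypothetical).
sample_env() {
    printf '%s\n' 'HOME=/home/lokesha' 'db_mn_code=ABC123' 'SHELL=/bin/ksh'
}

# grep "..mn" keeps lines that contain any two characters followed by "mn";
# awk -F"=" splits each kept line on "=" and prints the second field (the value).
mn_code=$(sample_env | grep "..mn" | awk -F"=" '{print $2}')
echo "$mn_code"   # prints ABC123
```

Two caveats: the pattern is matched against the whole line, so a variable whose value contains "mn" also qualifies, and a value that itself contains "=" is cut at that point because only $2 is printed.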
I have been lurking on this forum for some time now and appreciate everyone's help. I need to find a way to get the SystemID from this XML file. The file is much larger than just this one line, but I can grep and get this line printed. I really just need the "systemid".
<test123: prefintem... (9 Replies)
Hi,
I have a file from which I need to extract data between two constant strings.
The data looks like this:
Line 1 SUN> read db @cmpd unit 60
Line 2 Parameter: CMPD -> "C00071"
Line 3
Line 4 SUN> generate
Line 5 tabint>ERROR: (Variable data)
The data I need to extract is... (11 Replies)
Hi,
I have a line in the input file as below:
3G_CENTRAL;INDONESIA_(M)_TELKOMSEL;SPECIAL_WORLD_GRP_7_FA_2_TELKOMSEL
My expected output for that line must be:
"1-Radon1-cMOC_deg"|"LDIndex"|"3G_CENTRAL|INDONESIA_(M)_TELKOMSEL"|LAST|"SPECIAL_WORLD_GRP_7_FA_2_TELKOMSEL"
Can someone... (7 Replies)
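A possible awk approach, assuming the three input fields are always separated by ";" and that "1-Radon1-cMOC_deg", "LDIndex" and LAST are fixed text, as they appear in the expected output above. The question is cut off, so these assumptions may not hold for the full file. The function name reformat_line is hypothetical.

```shell
# Hypothetical helper: reads ";"-separated lines from the named files
# (or from standard input) and prints them in the pipe-delimited layout.
reformat_line() {
    awk -F';' '{
        # The quoted constants are copied from the expected output and assumed fixed.
        printf "\"1-Radon1-cMOC_deg\"|\"LDIndex\"|\"%s|%s\"|LAST|\"%s\"\n", $1, $2, $3
    }' "$@"
}
```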
Hello everyone
Sorry, I have to add another sed question. I am searching a log file and need only the first 2 occurrences of the text that comes after "string " (note the space) and before a ",". I have tried
sed -n 's/.*string \(*\),.*/\1/p' file
with some, but limited, success. This gives out all... (10 Replies)
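As written, the group \(*\) in the quoted command appears to match only literal asterisks, since a * at the start of a group is not a repetition operator in a basic regular expression. A hedged sketch of one fix follows: capture everything up to the next comma with [^,]*, then keep the first two results with head. It assumes at most one "string " of interest per line; the function name first_two_matches is hypothetical.

```shell
# Hypothetical helper: prints the text between "string " and the next ","
# for the first two matching lines of the file given as $1.
first_two_matches() {
    sed -n 's/.*string \([^,]*\),.*/\1/p' "$1" | head -n 2
}
```

Because .* is greedy, a line containing "string " more than once yields the text after the last occurrence, not the first.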
logs:
"/home/abc/public_html/index.php"
"/home/abc/public_html/index.php"
"/home/xyz/public_html/index.php"
"/home/xyz/public_html/index.php"
"/home/xyz/public_html/index.php"
How can I use "cut", "awk" or "sed" to get the following result:
abc
abc
xyz
xyz
xyz (8 Replies)
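A minimal sketch for the paths shown above, assuming every line has the form "/home/USER/..." with the leading double quote included. The function name user_from_path and the file name logs.txt are placeholders.

```shell
# Hypothetical helper: prints the directory name that follows /home/.
user_from_path() {
    # With "/" as the separator, field 1 is the opening quote,
    # field 2 is "home" and field 3 is the user name.
    awk -F'/' '{print $3}' "$@"
}

# The same field can be taken with cut:
#   cut -d/ -f3 logs.txt
```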
I am trying to extract multiple strings from snmp-mib files like below.
-----
$ cat IF-MIB.mib
<snip>
linkDown NOTIFICATION-TYPE
OBJECTS { ifIndex, ifAdminStatus, ifOperStatus }
STATUS current
DESCRIPTION
"A linkDown trap signifies that the SNMP entity, acting in... (5 Replies)
Hi All,
I have 2 pipe-delimited files, viz. file_old and file_new. I'm trying to compare these 2 files and extract all the rows that differ between them into new_file.
comm -3 < sort file_old < sort file_new > new_file
I am getting the below error:
-ksh: sort: cannot open
But if I do... (7 Replies)
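The error message suggests that the shell treated "< sort" as a redirection from a file named "sort". In shells with process substitution (bash, ksh93) the usual spelling is comm -3 <(sort file_old) <(sort file_new) > new_file. The sketch below does the same with temporary files, so it does not depend on process substitution. The function name diff_rows is hypothetical, and mktemp is assumed to be available.

```shell
# Hypothetical helper: diff_rows OLD NEW prints the rows that are not
# common to both files.
diff_rows() {
    old_sorted=$(mktemp) || return 1
    new_sorted=$(mktemp) || return 1
    # comm needs both inputs sorted in the same collation order
    LC_ALL=C sort "$1" > "$old_sorted"
    LC_ALL=C sort "$2" > "$new_sorted"
    # -3 suppresses lines common to both files; lines found only in the
    # second file are printed with a leading tab
    LC_ALL=C comm -3 "$old_sorted" "$new_sorted"
    rm -f "$old_sorted" "$new_sorted"
}

# Usage:
#   diff_rows file_old file_new > new_file
```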
I am hoping to pull multiple strings from one file and use them to search within a block of text within another file.
File 1
PS001,001 HLK
PS002,004 MWQ
PS004,002 RXM
PS004,006 DBX
PS004,006 SBR
PS005,007 ML
PS005,009 DBR
PS005,011 MR
PS005,012 SBR
PS006,003 RXM
PS006,003 >SJ
PS006,010... (11 Replies)
Discussion started by: jvoot
LEARN ABOUT DEBIAN
template::xml
Template::XML(3pm) User Contributed Perl Documentation Template::XML(3pm)
NAME
Template::XML - XML plugins for the Template Toolkit
SYNOPSIS
[% USE XML;
dom = XML.dom('foo.xml');
xpath = XML.xpath('bar.xml');
simple = XML.simple('baz.xml');
rss = XML.simple('news.rdf');
%]
DESCRIPTION
The Template-XML distribution provides a number of Template Toolkit plugin modules for working with XML.
The Template::Plugin::XML module is a front-end to the various other XML plugin modules. Through this you can access XML files and
directories of XML files via the Template::Plugin::XML::File and Template::Plugin::XML::Directory modules (which subclass from the
Template::Plugin::File and Template::Plugin::Directory modules respectively). You can then create a Document Object Model (DOM) from an XML
file (Template::Plugin::XML::DOM), examine it using XPath queries (Template::Plugin::XML::XPath), turn it into a Perl data structure
(Template::Plugin::XML::Simple) or parse it as an RSS (RDF Site Summary) file.
The basic XML plugins were distributed as part of the Template Toolkit until version 2.15 released in May 2006. At this time they were
extracted into this separate Template-XML distribution and an alpha version of this Template::Plugin::XML front-end module was added.
AUTHORS
Andy Wardley wrote the Template Toolkit plugin modules, with assistance from Simon Matthews in the case of the XML::DOM plugin. Matt
Sergeant wrote the XML::XPath module. Enno Derksen and Clark Cooper wrote the XML::DOM module. Jonathan Eisenzopf wrote the XML::RSS
module. Grant McLean wrote the XML::Simple module. Clark Cooper and Larry Wall wrote the XML::Parser module. James Clark wrote the expat
library.
COPYRIGHT
Copyright (C) 1996-2006 Andy Wardley. All Rights Reserved.
This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
SEE ALSO
Template, Template::Plugins, Template::Plugin::XML, Template::Plugin::XML::DOM, Template::Plugin::XML::RSS, Template::Plugin::XML::Simple,
Template::Plugin::XML::XPath
perl v5.8.8 2008-03-01 Template::XML(3pm)