Extract URL from RSS Feed in AWK
Post 302449047 by kurumi on Saturday 28th of August 2010 04:06:34 AM
Code:
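#!/bin/bash
# Print the value of the xmlUrl="..." attribute from each matching line of "file".
# If a line holds several xmlUrl attributes, only the last one is printed.
exec 6<"file"                        # open the input on file descriptor 6
while IFS= read -r LINE <&6
do
  case "$LINE" in
    *xmlUrl=\"*)
      LINE=${LINE##*xmlUrl=\"}       # strip everything up to and including xmlUrl="
      printf '%s\n' "${LINE%%\"*}"   # strip everything from the closing quote onward
      ;;
  esac
done
exec 6<&-                            # close file descriptor 6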
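
Since the thread title asks for AWK, the same idea can be sketched with awk's match() and substr(). This is only an illustration under a few assumptions: the attribute is always written as xmlUrl="..." with double quotes, the helper name extract_xml_urls is made up for this example, and the sample lines and example.com URLs are placeholders rather than data from the original question.

```shell
#!/bin/bash
# Sketch: print every xmlUrl="..." value, including several on one line.
# extract_xml_urls is a hypothetical helper name, not part of the original post.
extract_xml_urls() {
  awk '{
    line = $0
    # match() sets RSTART and RLENGTH for the leftmost xmlUrl="..." in line
    while (match(line, /xmlUrl="[^"]*"/)) {
      # skip the 8 characters of xmlUrl=" and drop the closing quote
      print substr(line, RSTART + 8, RLENGTH - 9)
      # continue scanning after the attribute just printed
      line = substr(line, RSTART + RLENGTH)
    }
  }' "$@"
}

# Made-up sample input; the example.com URLs are placeholders.
extract_xml_urls <<'EOF'
<outline text="A" xmlUrl="http://example.com/a.rss" htmlUrl="http://example.com/"/>
<outline xmlUrl="http://example.com/b.rss"/><outline xmlUrl="http://example.com/c.rss"/>
EOF
```

Unlike the shell loop, this version reports every xmlUrl attribute on a line rather than only the last one. It reads standard input when called without arguments, or the files named as arguments, for example extract_xml_urls file.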

 

XML::Feed(3pm)						User Contributed Perl Documentation					    XML::Feed(3pm)

NAME
       XML::Feed - Syndication feed parser and auto-discovery

SYNOPSIS
           use XML::Feed;
           my $feed = XML::Feed->parse(URI->new('http://example.com/atom.xml'))
               or die XML::Feed->errstr;
           print $feed->title, "\n";
           for my $entry ($feed->entries) {
           }

           ## Find all of the syndication feeds on a given page, using
           ## auto-discovery.
           my @feeds = XML::Feed->find_feeds('http://example.com/');

DESCRIPTION
       XML::Feed is a syndication feed parser for both RSS and Atom feeds. It also implements feed auto-discovery for finding feeds, given a URI.

       XML::Feed supports the following syndication feed formats:

       o   RSS 0.91

       o   RSS 1.0

       o   RSS 2.0

       o   Atom

       The goal of XML::Feed is to provide a unified API for parsing and using the various syndication formats. The different flavors of RSS and Atom handle data in different ways: date handling; summaries and content; escaping and quoting; etc. This module attempts to remove those differences by providing a wrapper around the formats and the classes implementing those formats (XML::RSS and XML::Atom::Feed). For example, dates are handled differently in each of the above formats. To provide a unified API for date handling, XML::Feed converts all date formats transparently into DateTime objects, which it then returns to the caller.

USAGE
   XML::Feed->new($format)
       Creates a new empty XML::Feed object using the format $format.

           $feed = XML::Feed->new('Atom');
           $feed = XML::Feed->new('RSS');
           $feed = XML::Feed->new('RSS', version => '0.91');

   XML::Feed->parse($stream)
   XML::Feed->parse($stream, $format)
       Parses a syndication feed identified by $stream and returns an XML::Feed object. $stream can be any one of the following:

       o   Scalar reference

           A reference to a string containing the XML body of the feed.

       o   Filehandle

           An open filehandle from which the feed XML will be read.

       o   File name

           The name of a file containing the feed XML.

       o   URI object

           A URI from which the feed XML will be retrieved.

       $format allows you to override format guessing.

   XML::Feed->find_feeds($uri)
       Given a URI $uri, use auto-discovery to find all of the feeds linked from that page (using <link> tags).

       Returns a list of feed URIs.

   XML::Feed->identify_format($xml)
       Given the XML of a feed, return what format it is in ("Atom", or some version of "RSS").

   $feed->convert($format)
       Converts the XML::Feed object into the $format format, and returns the new object.

   $feed->splice($other_feed)
       Splices in all of the entries from the feed $other_feed into $feed, skipping posts that are already in $feed.

   $feed->format
       Returns the format of the feed ("Atom", or some version of "RSS").

   $feed->title([ $title ])
       The title of the feed/channel.

   $feed->base([ $base ])
       The URL base of the feed/channel.

   $feed->link([ $uri ])
       The permalink of the feed/channel.

   $feed->tagline([ $tagline ])
       The description or tagline of the feed/channel.

   $feed->description([ $description ])
       Alias for $feed->tagline.

   $feed->author([ $author ])
       The author of the feed/channel.

   $feed->language([ $language ])
       The language of the feed.

   $feed->copyright([ $copyright ])
       The copyright notice of the feed.

   $feed->modified([ $modified ])
       A DateTime object representing the last-modified date of the feed.

       If present, $modified should be a DateTime object.

   $feed->generator([ $generator ])
       The generator of the feed.

   $feed->self_link ([ $uri ])
       The Atom Self-link of the feed:

       <http://validator.w3.org/feed/docs/warning/MissingAtomSelfLink.html>

       A string.

   $feed->entries
       A list of the entries/items in the feed. Returns an array containing XML::Feed::Entry objects.

   $feed->items
       A synonym (alias) for $feed->entries.

   $feed->add_entry($entry)
       Adds an entry to the feed. $entry should be an XML::Feed::Entry object in the correct format for the feed.

   $feed->as_xml
       Returns an XML representation of the feed, in the format determined by the current format of the $feed object.

PACKAGE VARIABLES
   $XML::Feed::Format::RSS::PREFERRED_PARSER
       If you want to use an RSS parser class other than XML::RSS (the default), you can change the class by setting the $PREFERRED_PARSER variable in the XML::Feed::Format::RSS package.

           $XML::Feed::Format::RSS::PREFERRED_PARSER = "XML::RSS::LibXML";

       Note: this will only work for parsing feeds, not creating feeds.

       Note: Only "XML::RSS::LibXML" version 0.3004 is known to work at the moment.

   $XML::Feed::MULTIPLE_ENCLOSURES
       Although the RSS specification states that there can be at most one enclosure per item, some feeds break this rule.

       If this variable is set then "XML::Feed" captures all of them and makes them available as a list.

       Otherwise it returns the last enclosure parsed.

       Note: "XML::RSS" version 1.44 is needed for this to work.

VALID FEEDS
       For reference, this cgi script will create valid, albeit nonsensical feeds (according to "http://feedvalidator.org" anyway) for Atom 1.0 and RSS 0.90, 0.91, 1.0 and 2.0.

           #!perl -w

           use strict;
           use CGI;
           use CGI::Carp qw(fatalsToBrowser);
           use DateTime;
           use XML::Feed;

           my $cgi  = CGI->new;
           my @args = ( $cgi->param('format') || "Atom" );
           push @args, ( version => $cgi->param('version') ) if $cgi->param('version');

           my $feed = XML::Feed->new(@args);
           $feed->id("http://".time.rand()."/");
           $feed->title('Test Feed');
           $feed->link($cgi->url);
           $feed->self_link($cgi->url( -query => 1, -full => 1, -rewrite => 1) );
           $feed->modified(DateTime->now);

           my $entry = XML::Feed::Entry->new();
           $entry->id("http://".time.rand()."/");
           $entry->link("http://example.com");
           $entry->title("Test entry");
           $entry->summary("Test summary");
           $entry->content("Foo");
           $entry->modified(DateTime->now);
           $entry->author('test@example.com (Testy McTesterson)');
           $feed->add_entry($entry);

           my $mime = ("Atom" eq $feed->format) ? "application/atom+xml" : "application/rss+xml";
           print $cgi->header($mime);
           print $feed->as_xml;

LICENSE
       XML::Feed is free software; you may redistribute it and/or modify it under the same terms as Perl itself.

AUTHOR & COPYRIGHT
       Except where otherwise noted, XML::Feed is Copyright 2004-2008 Six Apart, cpan@sixapart.com. All rights reserved.

SUBVERSION
       The latest version of XML::Feed can be found at

           http://code.sixapart.com/svn/XML-Feed/trunk/

perl v5.14.2                          2012-03-21                          XML::Feed(3pm)