Extract URL from RSS Feed in AWK Post: 302449040

Post 302449040 by fahdmirza on Saturday 28th of August 2010 03:30:15 AM

08-28-2010

Extract URL from RSS Feed in AWK

Hi,
I have the following data file:

Code:

<outline title="Matt Cutts" type="rss" version="RSS" xmlUrl="http://www.mattcutts.com/blog/feed/" htmlUrl="http://www.mattcutts.com/blog"/>
<outline title="Stone" text="Stone" type="rss" version="RSS" xmlUrl="http://feeds.feedburner.com/STC-Art" htmlUrl="http://www.stone.com/S.shtml"/>
<outline title="Stone" text="Stone" type="rss" version="RSS" ymlUrl="http://feeds.feedburner.com/STC-Art" htmlUrl="http://www.stone.com/S.shtml"/>
<outline title="Adam Leventhal's Weblog" text="Adam Leventhal's Weblog" type="rss" version="RSS" xmlUrl="http://blogs.sun.com/ahl/feed/entries/atom" htmlUrl="http://blogs.sun.com/ahl/"/>

I want to extract just the URL in the xmlUrl attribute of each line and save it to another file. I want to do it in awk.

Thanks for your time.

regards

fahdmirza
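
One possible approach, offered as a minimal sketch and not as part of the original post: use awk's POSIX match() function to locate the xmlUrl="..." attribute on each line, then cut the value out with substr(). The file names feeds.opml (input) and urls.txt (output) are assumptions chosen for illustration, and the sketch assumes every attribute value is enclosed in double quotes, as in the sample data.

```shell
#!/bin/sh
# Recreate the sample data from the post.
# feeds.opml and urls.txt are hypothetical file names, not from the post.
cat > feeds.opml <<'EOF'
<outline title="Matt Cutts" type="rss" version="RSS" xmlUrl="http://www.mattcutts.com/blog/feed/" htmlUrl="http://www.mattcutts.com/blog"/>
<outline title="Stone" text="Stone" type="rss" version="RSS" xmlUrl="http://feeds.feedburner.com/STC-Art" htmlUrl="http://www.stone.com/S.shtml"/>
<outline title="Stone" text="Stone" type="rss" version="RSS" ymlUrl="http://feeds.feedburner.com/STC-Art" htmlUrl="http://www.stone.com/S.shtml"/>
<outline title="Adam Leventhal's Weblog" text="Adam Leventhal's Weblog" type="rss" version="RSS" xmlUrl="http://blogs.sun.com/ahl/feed/entries/atom" htmlUrl="http://blogs.sun.com/ahl/"/>
EOF

# match() sets RSTART and RLENGTH for the first occurrence of the pattern.
# The pattern starts with a space so that only a whole attribute named
# xmlUrl is matched. The 9 characters of ` xmlUrl="` are skipped at the
# front and the closing quote is dropped at the end.
awk 'match($0, / xmlUrl="[^"]*"/) {
    print substr($0, RSTART + 9, RLENGTH - 10)
}' feeds.opml > urls.txt

cat urls.txt
```

With the sample data above, urls.txt ends up with three lines. The third input line is skipped because its attribute is spelled ymlUrl, so it never matches the pattern. Lines without an xmlUrl attribute produce no output at all, which keeps the result file free of blank lines.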
