Extract URL from RSS Feed in AWK


 
# 1  
Old 08-28-2010

Hi,
I have the following data file:
Code:
<outline title="Matt Cutts" type="rss" version="RSS" xmlUrl="http://www.mattcutts.com/blog/feed/" htmlUrl="http://www.mattcutts.com/blog"/>
<outline title="Stone" text="Stone" type="rss" version="RSS" xmlUrl="http://feeds.feedburner.com/STC-Art" htmlUrl="http://www.stone.com/S.shtml"/>
<outline title="Stone" text="Stone" type="rss" version="RSS" ymlUrl="http://feeds.feedburner.com/STC-Art" htmlUrl="http://www.stone.com/S.shtml"/>
<outline title="Adam Leventhal's Weblog" text="Adam Leventhal's Weblog" type="rss" version="RSS" xmlUrl="http://blogs.sun.com/ahl/feed/entries/atom" htmlUrl="http://blogs.sun.com/ahl/"/>

I want to extract just the URL in the xmlUrl attribute and save it to another file. I want to do it in awk.

Thanks for your time.

regards
# 2  
Old 08-28-2010
Code:
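#!/bin/bash
# Read "file" on descriptor 6 and print the value of every xmlUrl attribute.
exec 6<"file"
while IFS= read -r LINE <&6
do
  case "$LINE" in
   *xmlUrl=\"*)
      # Drop everything up to and including xmlUrl="
      LINE=${LINE##*xmlUrl=\"}
      # Drop everything from the closing quote onwards
      printf '%s\n' "${LINE%%\"*}";;
  esac
done
exec 6<&-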

# 3  
Old 08-28-2010
Code:
awk 'BEGIN{RS=FS}/^xmlUrl/{print $2}' FS='"' infile

Output:
Code:
http://www.mattcutts.com/blog/feed/
http://feeds.feedburner.com/STC-Art
http://blogs.sun.com/ahl/feed/entries/atom

# 4  
Old 08-28-2010
Hi Scrutinizer,
Thanks a lot. It works. Could you do me a favor and explain your code in words?

regards
# 5  
Old 08-28-2010
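Hi fahdmirza, this awk script sets the record separator to the value of the field separator, which is still the default single space while the BEGIN block runs, so that every space-separated field of a line becomes a record of its own. The FS='"' assignment on the command line takes effect before the input is read, so each of these new records is then split into new fields at the double quotes. For the new records that start with xmlUrl, the required value is the second field.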
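
To make the first step visible, here is a minimal sketch that prints every record together with its record number. It assumes the first sample line from post #1 is fed on standard input; the variable names line and records are only for illustration.

```shell
line='<outline title="Matt Cutts" type="rss" version="RSS" xmlUrl="http://www.mattcutts.com/blog/feed/" htmlUrl="http://www.mattcutts.com/blog"/>'

# FS is still a single space inside BEGIN, so RS=FS makes awk
# start a new record at every space instead of at every newline.
records=$(printf '%s\n' "$line" | awk 'BEGIN{RS=FS}{print NR": "$0}')
printf '%s\n' "$records"
```

Record 6 should come out as xmlUrl="http://www.mattcutts.com/blog/feed/", which is the only record in that line starting with xmlUrl.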

Last edited by Scrutinizer; 08-28-2010 at 06:33 AM..
# 6  
Old 08-28-2010
Code:
RS=FS, good idea

# 7  
Old 08-28-2010
Hi Scrutinizer, thanks for the reply. Pardon my ignorance, but I am still a little confused.

For example take the following line from the data:

<outline title="Matt Cutts" type="rss" version="RSS" xmlUrl="http://www.mattcutts.com/blog/feed/" htmlUrl="http://www.mattcutts.com/blog"/>

First, your code turns the whole line above (the record) into one field by setting RS=FS.

Then it matches xmlUrl at the start of that line, and the field separator is now ".

My question is: how does $2 come to contain the required URL? Please explain.

Thanks.
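
One way to see where $2 comes from is to print the fields of the matching record for exactly this line. This is only a sketch under the assumption that the line arrives on standard input (the trailing - operand); the variable names line and fields are made up for the illustration. Note that RS=FS does not turn the line into one field: it splits the line at every space into separate records, and only the record xmlUrl="http://www.mattcutts.com/blog/feed/" matches /^xmlUrl/.

```shell
line='<outline title="Matt Cutts" type="rss" version="RSS" xmlUrl="http://www.mattcutts.com/blog/feed/" htmlUrl="http://www.mattcutts.com/blog"/>'

# Records are split at spaces (RS=FS in BEGIN); FS='"' is applied afterwards,
# so the matching record is split into fields at the double quotes.
fields=$(printf '%s\n' "$line" |
  awk 'BEGIN{RS=FS} /^xmlUrl/{print "$1=[" $1 "]"; print "$2=[" $2 "]"}' FS='"' -)
printf '%s\n' "$fields"
```

The text before the first double quote (xmlUrl=) is $1 and the text between the two quotes is $2, which is the URL. To save the URLs, the output of the original command can be redirected, for example by appending > urls.txt (a hypothetical file name).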