Extract URL from RSS Feed in AWK


 
# 8  
Old 08-28-2010
Quote:
Originally Posted by fahdmirza
Hi Scrutinizer, thanks for the reply. Pardon my ignorance, but I am a little confused.

For example take the following line from the data:

<outline title="Matt Cutts" type="rss" version="RSS" xmlUrl="http://www.mattcutts.com/blog/feed/" htmlUrl="http://www.mattcutts.com/blog"/>

First, your code turns the whole line (or record) above into one field by setting RS=FS.

Then it matches the start of xmlUrl in that line, and the field separator is now ".

My question is how $2 ends up containing the required URL. Please explain.

Thanks.
It is the other way around: it takes the fields in the line and turns each field into a record of its own.
Then it matches the records that start with xmlUrl.

With " as the field separator, the record we are looking for has three fields:
$1 contains the part to the left of the first double quote, xmlUrl=
$2 contains the url and
$3 contains the part to the right of the second double quote, which is an empty string...
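
The original command is not shown in this part of the thread, so the following is only a minimal sketch of the technique described above, under a few assumptions: the record separator is a single space (which is what RS=FS gives with the default FS), the field separator is a double quote, and the helper name extract_xmlurl is made up for illustration. It also assumes the URL itself contains no spaces.

```shell
# Sample line taken from the question above.
line='<outline title="Matt Cutts" type="rss" version="RSS" xmlUrl="http://www.mattcutts.com/blog/feed/" htmlUrl="http://www.mattcutts.com/blog"/>'

# extract_xmlurl is a hypothetical helper name, not from the thread.
extract_xmlurl() {
  awk '
    BEGIN {
      RS = " "    # every space-separated word becomes its own record
      FS = "\""   # inside a record, fields are split on double quotes
    }
    # Only the record xmlUrl="..." starts with xmlUrl=;
    # htmlUrl="..." does not, because the pattern is anchored with ^.
    /^xmlUrl=/ {
      # $1 is xmlUrl=, $2 is the URL, $3 is the empty string after the closing quote
      print $2
    }
  '
}

printf '%s\n' "$line" | extract_xmlurl
```

For the sample line this prints http://www.mattcutts.com/blog/feed/, because the only record starting with xmlUrl= is xmlUrl="http://www.mattcutts.com/blog/feed/" and its second quote-delimited field is the URL.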

Does that answer your question?

Last edited by Scrutinizer; 08-28-2010 at 05:23 PM..
# 9  
Old 08-29-2010
Crystal clear. Many, many thanks.

Best regards