UNIX for Dummies Questions & Answers If you're not sure where to post a UNIX or Linux question, post it here. All UNIX and Linux newbies welcome !!

    #1  
Old 03-02-2012
Noob question about parsing a website

I'm trying to parse the page finance.yahoo.com/q?s=ge&ql=1 and retrieve the text inside <span id="yfs_l84_ge">18.98</span>, which in this case is 18.98.

What would be the best way to go about this in a bash script?

Any help or suggestions will be much appreciated.
Thanks!

Last edited by mayson; 03-02-2012 at 06:50 PM..
    #2  
Old 03-03-2012
If bash isn't a strict requirement, I would suggest Python, Perl or PHP. Python has a package called BeautifulSoup that does exactly this (you will also need urllib2 to fetch the page). Googling for these should turn up some ready-made scripts as well.

For a bash script, you will need curl and awk (or wget and grep), but it will be time-consuming to get right, whereas soup.find("span", {'id' : "yfs_l84_ge"}) in a Python parser gets you the required element directly.

Here's some quick, untested Python code:


Code:
# Python 2 with BeautifulSoup 3
from BeautifulSoup import BeautifulSoup
import urllib2

url = "<your website to crawl>"
user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'

# Send a browser-like User-Agent; some sites reject the default one
headers = {'User-Agent' : user_agent}

# A plain page fetch needs no POST data, so pass only the headers
req = urllib2.Request(url, headers=headers)
# Open the Request object (not the bare URL) so the headers are actually sent
response = urllib2.urlopen(req)
the_page = response.read()
soup = BeautifulSoup(the_page)

# Find the span holding the quote and print its text
span = soup.find("span", {'id' : "yfs_l84_ge"})
if span:
    print span.contents[0].strip()
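
For anyone on Python 3 without BeautifulSoup installed, the same lookup can be sketched with only the standard library's html.parser. This is a minimal sketch, not a drop-in replacement: the class and function names (SpanTextById, extract_span_text) and the sample markup are made up for illustration, and it assumes the target span contains plain text with no nested tags.

```python
from html.parser import HTMLParser


class SpanTextById(HTMLParser):
    """Collect the text inside the first <span> whose id matches target_id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.inside = False   # True while we are inside the matching span
        self.done = False     # True once the matching span has been closed
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "span" and not self.done and dict(attrs).get("id") == self.target_id:
            self.inside = True

    def handle_endtag(self, tag):
        # Assumes no nested tags inside the target span
        if tag == "span" and self.inside:
            self.inside = False
            self.done = True

    def handle_data(self, data):
        if self.inside:
            self.parts.append(data)


def extract_span_text(html, target_id):
    """Return the stripped text of the matching span, or None if absent."""
    parser = SpanTextById(target_id)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip() if parser.done else None


# Hypothetical sample markup standing in for the downloaded page
sample = '<div><span id="other">x</span><span id="yfs_l84_ge">18.98</span></div>'
print(extract_span_text(sample, "yfs_l84_ge"))
```

To run it against a live page, the HTML would first be fetched (for example with urllib.request, passing a User-Agent header as above) and decoded to a string before being handed to extract_span_text.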


Last edited by eosbuddy; 03-03-2012 at 12:52 AM.. Reason: add code
    #3  
Old 03-03-2012
Scrutinizer
Moderator
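Try awk, with the page piped in on standard input (for example from curl or wget). Any of these three variants should do: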

Code:
awk -F\> '/^span id="yfs_l84_ge"/{print $2}' RS=\<


Code:
awk '$1==s{print $2}' RS=\< FS=\> s='span id="yfs_l84_ge"'


Code:
awk '$1=="span id=\"" i "\""{print $2}' RS=\< FS=\> i="yfs_l84_ge"
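
To see why these one-liners work: with RS set to < every record begins right after a tag's opening bracket, and with FS set to > the first field is the tag text and the second is whatever follows the tag. The Python sketch below mimics that splitting on a sample string. The function name span_value and the sample markup are hypothetical, and unlike awk it returns only the first match instead of printing every match.

```python
def span_value(html, span_id):
    """Mimic awk with RS='<' and FS='>': return the text after the matching tag."""
    wanted = 'span id="%s"' % span_id
    for record in html.split("<"):    # RS=\< : each record starts just after a '<'
        fields = record.split(">")    # FS=\> : fields[0] is the tag, fields[1] the text after it
        if fields[0] == wanted and len(fields) > 1:
            return fields[1]
    return None


# Hypothetical sample markup standing in for the downloaded page
sample = '<p>GE <span id="yfs_l84_ge">18.98</span></p>'
print(span_value(sample, "yfs_l84_ge"))
```

Like the awk versions, this compares the tag text exactly, so it stops matching if the site adds another attribute to the span or changes the quoting.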


Last edited by Scrutinizer; 03-03-2012 at 05:45 AM..
All times are GMT -4.