#1
Noob question about parsing a website
I'm trying to parse the page finance.yahoo.com/q?s=ge&ql=1 and retrieve the value inside <span id="yfs_l84_ge">18.98</span>, i.e. 18.98.
What would be the best way to go about this in a bash script? Any help or suggestions would be much appreciated. Thanks!
Last edited by mayson; 03-02-2012 at 06:50 PM.
#2
If a bash script weren't a strict requirement, I would suggest Python, Perl or PHP. Python has a package called BeautifulSoup that does exactly this (you will also need the urllib modules to fetch the page). Googling for these should turn up some ready-made scripts as well. For a bash script, you will need curl and awk (or wget and grep), but it will be time-consuming to get right, whereas soup.find("span", {'id' : "yfs_l84_ge"}) in a Python parser gets you the required element directly. Here's a quick, untested Python script:
Code:
# Python 2, using BeautifulSoup 3 (the "BeautifulSoup" package)
from BeautifulSoup import BeautifulSoup
import urllib2

url = "<your website to crawl>"
user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
headers = {'User-Agent': user_agent}

# Plain GET request; passing form data to Request() would turn it into a POST
req = urllib2.Request(url, headers=headers)
response = urllib2.urlopen(req)  # open the Request so the headers are actually sent
the_page = response.read()

# Find the span by its id and print its text
soup = BeautifulSoup(the_page)
span = soup.find("span", {'id': "yfs_l84_ge"})
if span:
    print span.contents[0].strip()
Last edited by eosbuddy; 03-03-2012 at 12:52 AM. Reason: add code
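For readers without BeautifulSoup, the same "find the span by id" idea can be sketched with only the Python 3 standard library. This is an illustration under assumptions, not the poster's code: the class name SpanTextById, the helper extract_span_text and the sample HTML string are all made up here, and the sketch assumes the target span contains plain text with no nested span inside it.

```python
from html.parser import HTMLParser


class SpanTextById(HTMLParser):
    """Collect the text inside the first <span> whose id matches target_id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.inside = False  # True between the matching <span> and its </span>
        self.text = None     # text of the first match, or None if nothing matched

    def handle_starttag(self, tag, attrs):
        # Only the first matching span is captured (self.text is None until then)
        if tag == "span" and self.text is None and dict(attrs).get("id") == self.target_id:
            self.inside = True
            self.text = ""

    def handle_data(self, data):
        if self.inside:
            self.text += data

    def handle_endtag(self, tag):
        # Assumption: no nested <span> inside the target, so the next </span> closes it
        if tag == "span" and self.inside:
            self.inside = False


def extract_span_text(html, target_id):
    """Return the stripped text of the matching span, or None if it is absent."""
    parser = SpanTextById(target_id)
    parser.feed(html)
    parser.close()
    return parser.text.strip() if parser.text is not None else None


# Hypothetical sample input standing in for the downloaded page
sample = '<div><span id="other">x</span><span id="yfs_l84_ge">18.98</span></div>'
print(extract_span_text(sample, "yfs_l84_ge"))  # prints 18.98
```

In real use, the string passed to extract_span_text would be the decoded body of the fetched page rather than the sample literal.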
#3
Try awk:
Code:
awk -F\> '/^span id="yfs_l84_ge"/{print $2}' RS=\<
Code:
awk '$1==s{print $2}' RS=\< FS=\> s='span id="yfs_l84_ge"'
Code:
awk '$1=="span id=\"" i "\""{print $2}' RS=\< FS=\> i="yfs_l84_ge"
Last edited by Scrutinizer; 03-03-2012 at 05:45 AM.
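To see why the awk one-liners work, it may help to spell out the record/field logic in Python. With RS set to "<", every tag starts a new record; with FS set to ">", the first field is the tag text and the second field is whatever follows the tag. The sketch below mirrors that logic only approximately; the function name awk_like_extract and the sample string are hypothetical.

```python
def awk_like_extract(html, span_id):
    # Rough equivalent of:
    #   awk '$1==s{print $2}' RS=\< FS=\> s='span id="..."'
    wanted = 'span id="%s"' % span_id
    results = []
    for record in html.split("<"):   # RS=\< : a new record starts at every "<"
        fields = record.split(">")   # FS=\> : fields are separated by ">"
        # $1 must equal the tag text exactly; $2 is the text after the tag
        if fields[0] == wanted and len(fields) > 1:
            results.append(fields[1])
    return results


# Hypothetical sample input standing in for the downloaded page
sample = '<td><span id="yfs_l84_ge">18.98</span></td>'
print(awk_like_extract(sample, "yfs_l84_ge"))  # prints ['18.98']
```

Because the comparison is an exact string match on the tag text, this approach (like the awk versions) would miss the span if it carried extra attributes or used different quoting.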