SED to extract HTML text data, not quite right!

01-30-2010


I am attempting to extract weather data from the following website, but for the Victoria area only:

Text Forecasts - Environment Canada

I use this:

Code:

 sed -n "/Greater Victoria./,/Fraser Valley./p"

But that expression sometimes fails to capture the whole forecast. I suspect the page contains extra line feeds or carriage returns that break my pattern. Any ideas appreciated.

Larry

lagagnon


01-31-2010


Hi.

If I copy the text from the site, and modify your statement slightly:

Code:

sed -n "/Greater Victoria./,/^$/p" file1

I get this output:

Code:

Greater Victoria.
Monday..Cloudy. High 8.
Tuesday..Cloudy with 60 percent chance of showers. Low plus 3.
 High 7.
Wednesday..Cloudy with 40 percent chance of flurries or rain showers.
 Low plus 3. High 8.
Thursday..Cloudy with 60 percent chance of rain showers mixed with
 flurries. Low plus 3. High 8.
Friday..Cloudy with 30 percent chance of showers. Low plus 3. High 9.
Normals for the period..Low plus 2. High 7.

It looks exactly the same as on the website. Is that not what you wanted?
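One thing worth checking: if the page was saved with Windows-style line endings, a line that looks empty actually holds a carriage return, so `/^$/` never matches and the range runs on to the end of the file. Here is a minimal sketch of one way to guard against that. It assumes the forecast text has been saved to a file; the name `forecast.txt`, the sample contents, and the `extract_region` helper are all made up for illustration. It strips carriage returns with `tr`, escapes the dot so it matches a literal period, and stops at the first blank line.

```shell
# Build a small sample file with CRLF line endings (hypothetical data).
printf 'Greater Victoria.\r\nMonday..Cloudy. High 8.\r\n\r\nFraser Valley.\r\nMonday..Rain. High 7.\r\n' > forecast.txt

# Hypothetical helper: $1 is the region heading (without the trailing dot),
# $2 is the file to read. Carriage returns are removed first so that
# "blank" lines really are empty, then sed prints from the heading
# through the first empty line.
extract_region() {
    tr -d '\r' < "$2" | sed -n "/^$1\\./,/^\$/p"
}

extract_region "Greater Victoria" forecast.txt
```

On the sample file this prints the heading and the Monday line, followed by the blank line that closes the range; the Fraser Valley section is left out.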

Scott


01-31-2010


Try searching Google for "weather api". There are a bunch of them.

For example, I looked at Yahoo! Weather RSS Feed to come up with this for Victoria weather:

Code:

curl -s "http://weather.yahooapis.com/forecastrss?w=9848&u=c"|sed "1,/Current Conditions/d"|head -n1

The text is simpler and more predictable than a regular web site, so it shouldn't be too difficult to parse out the interesting items.
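As a rough sketch of that parsing step, the same `sed | head` pipeline can be followed by a second `sed` that strips leftover HTML tags from the line. The sample text below is a made-up excerpt shaped like a feed description block, not real output from the feed, and `current_conditions` is a hypothetical helper name; the feed itself may not be reachable, so the example runs on the inline sample instead of calling `curl`.

```shell
# Hypothetical excerpt shaped like the feed's description block; not real data.
sample_feed='<description><![CDATA[
<b>Current Conditions:</b><br />
Cloudy, 6 C<BR />
<b>Forecast:</b><BR />
]]></description>'

# Delete everything up to and including the "Current Conditions" line,
# keep only the next line, then strip any remaining HTML tags from it.
current_conditions() {
    sed '1,/Current Conditions/d' | head -n 1 | sed 's/<[^>]*>//g'
}

printf '%s\n' "$sample_feed" | current_conditions
```

With the sample above this prints `Cloudy, 6 C`. To use it on live data, pipe the `curl -s` output into `current_conditions` in place of the `printf`.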

KenJackson
