help with sed needed to extract content from html tags

03-04-2012

Hi,
I've been searching for a few hours now and can't find anything that works the way I want. I have a web page saved in a file named par, containing a form like this:

Code:

<html><body><form name='sendme' action='http://example.com/' method='POST'>
<textarea name='1st'>abc123def678</textarea>
<textarea name='2nd'>Text</textarea>
<textarea name='3rd'>Text</textarea>
</form></body></html>

and I want to extract the content of the textarea tags. This is what I tried:

Code:

cat par | sed -e 's/.*1st//' -e 's/<\/textarea>.*//'

which returns

Code:

'>abc123def678

I can't seem to get rid of the '>. Can anyone recommend a working solution with sed?
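
A note on why the '> survives: the first expression stops right after 1st, so the rest of the opening tag stays in the line. Below is a minimal sketch of one way around it. It assumes the whole page sits on a single line (the output above suggests it does) and uses a throwaway temp file as a stand-in for par; the variable names are only for illustration.

```shell
# Stand-in for the file "par": the form from the post, written as ONE line (assumption).
par=$(mktemp)
printf '%s\n' "<html><body><form name='sendme' action='http://example.com/' method='POST'><textarea name='1st'>abc123def678</textarea><textarea name='2nd'>Text</textarea><textarea name='3rd'>Text</textarea></form></body></html>" > "$par"

# 1st expression: delete everything up to and including the '>' that closes
#                 the opening tag of the textarea named 1st.
# 2nd expression: delete from the first remaining </textarea> to the end of the line.
first=$(sed -e "s/.*name='1st'[^>]*>//" -e 's/<\/textarea>.*//' "$par")
echo "$first"

rm -f "$par"
```

The first sed script is in double quotes only so that the single quotes around 1st can be matched literally.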

seb001

03-04-2012

Code:

sed '/1st/ s:<textarea[^>]*>\([^<]*\)</textarea>:\1:;q' par

balajesuri

03-04-2012

That returns

Code:

<html><body><form name='sendme' action='http://example.com/' method='POST'>
abc123def678
<textarea name='2nd'>Text</textarea>
<textarea name='3rd'>Text</textarea>
</form></body></html>

I've also tried

Code:

sed '/1st/ s:<textarea[^>]*>\([^<]*\)</textarea>.*:\1:;q' par

with this result

Code:

<html><body><form name='sendme' action='http://example.com/' method='POST'>
abc123def678

Code:

sed '/1st/ s:.*<textarea[^>]*>\([^<]*\)</textarea>.*:\1:;q' par

returns the text of the last textarea.

Any idea how to modify it?
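
Two things seem to be at work in the attempts above. The /1st/ address only selects which line the s command runs on, not where in the line the match starts, and the q after the semicolon is not covered by that address, so sed quits after the first input line either way. And a leading .* is greedy: it takes as much as it can, so the <textarea that follows ends up matching the last one on the line. A small sketch of the difference, using a shortened one-line sample rather than the real file:

```shell
# Shortened sample: two textareas on one line (illustrative, not the full form).
line="<textarea name='1st'>abc123def678</textarea><textarea name='2nd'>Text</textarea>"

# Greedy leading .* swallows the first textarea, so the capture comes from the last one.
last=$(printf '%s\n' "$line" | sed 's:.*<textarea[^>]*>\([^<]*\)</textarea>.*:\1:')

# Naming the attribute inside the pattern pins the match to the wanted textarea.
first=$(printf '%s\n' "$line" | sed "s:.*<textarea name='1st'[^>]*>\([^<]*\)</textarea>.*:\1:")

echo "$last"
echo "$first"
```

With the name inside the pattern the address is no longer needed, and adding -n with a trailing p flag on the s command would also suppress any lines that do not match.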

seb001

03-04-2012

AttributeError: 'module' object has no attribute 'logger'

Hi all, when installing Sage there is a problem with emacs.py.
This screen appeared after running ./sage:
----------------------------------------------------------------------
| Sage Version 4.4.2, Release Date: 2010-05-19 |
| Type notebook() for the GUI, and license() for information. |
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/zid/sage/local/bin/sage-ipython", line 18, in <module>
    import IPython
  File "/usr/lib/python2.7/dist-packages/IPython/__init__.py", line 58, in <module>
    __import__(name,glob,loc,[])
  File "/usr/lib/python2.7/dist-packages/IPython/ipstruct.py", line 17, in <module>
    from IPython.genutils import list2dict2
  File "/usr/lib/python2.7/dist-packages/IPython/genutils.py", line 114, in <module>
    import IPython.rlineimpl as readline
  File "/usr/lib/python2.7/dist-packages/IPython/rlineimpl.py", line 18, in <module>
    from pyreadline import *
  File "/usr/local/lib/python2.7/dist-packages/pyreadline-2.0_dev1-py2.7.egg/pyreadline/__init__.py", line 11, in <module>
    from . import unicode_helper, logger, clipboard, lineeditor, modes, console
  File "/usr/local/lib/python2.7/dist-packages/pyreadline-2.0_dev1-py2.7.egg/pyreadline/modes/__init__.py", line 3, in <module>
    from . import emacs, notemacs, vi
  File "/usr/local/lib/python2.7/dist-packages/pyreadline-2.0_dev1-py2.7.egg/pyreadline/modes/emacs.py", line 11, in <module>
    import pyreadline.logger as logger
AttributeError: 'module' object has no attribute 'logger'
Can anyone help me, please?

regards
Zid

youssefmahdia

03-04-2012

Code:

sed -n '/textarea/p' infile | sed -e 's/<[^>]*>//g'

gives

Code:

abc123def678
Text
Text

fpmurphy

03-05-2012

Or this?

Code:

sed -n '/text/s/<[^<]*>//gp' inputfile
sed -n '/text.*1st/s/<[^<]*>//gp' inputfile
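
These commands differ from fpmurphy's in one character: <[^<]*> here versus <[^>]*> there. On the sample form both strip the same tags, but they can diverge once the text itself contains a >. A small sketch with a made-up input line (not taken from the original file):

```shell
# Hypothetical input: textarea content that contains a literal '>'.
line="<textarea name='1st'>a > b</textarea>"

# <[^>]*> ends each match at the first '>', i.e. at the end of the tag.
stop_at_gt=$(printf '%s\n' "$line" | sed 's/<[^>]*>//g')

# <[^<]*> runs to the last '>' before the next '<', so it also removes "a >".
stop_at_lt=$(printf '%s\n' "$line" | sed 's/<[^<]*>//g')

# Brackets make leading spaces in the results visible.
printf '[%s]\n' "$stop_at_gt" "$stop_at_lt"
```

For stripping tags, the [^>] form is probably the safer default.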

Last edited by michaelrozar17; 03-05-2012 at 02:03 AM.. Reason: alternate sed solution..

michaelrozar17

03-05-2012

Still not there. Both commands (from fpmurphy and michaelrozar17) return the same result: everything between all the HTML tags, run together.

Code:

abc123def678TextText
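
Getting abc123def678TextText on one line suggests the saved page has no line breaks, so line-based filters such as /textarea/p see the whole form at once. One possible way out, sketched here under that assumption, is to break the line before every tag with tr and only then pick out the textarea pieces. The temp file is a stand-in for par and the variable names are only for illustration.

```shell
# Stand-in for the file "par": the form from the first post as ONE line (assumption).
par=$(mktemp)
printf '%s\n' "<html><body><form name='sendme' action='http://example.com/' method='POST'><textarea name='1st'>abc123def678</textarea><textarea name='2nd'>Text</textarea><textarea name='3rd'>Text</textarea></form></body></html>" > "$par"

# Turn every '<' into a newline so each tag starts its own line, then print only
# the lines that began with an opening textarea tag, with that tag removed.
all=$(tr '<' '\n' < "$par" | sed -n 's/^textarea[^>]*>//p')
echo "$all"

# Same idea, restricted to the textarea named 1st.
first=$(tr '<' '\n' < "$par" | sed -n "s/^textarea name='1st'[^>]*>//p")
echo "$first"

rm -f "$par"
```

This relies on the text never containing a < of its own; if it can, the splitting breaks down and a real HTML parser is likely the better tool.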

seb001
