help with sed needed to extract content from html tags Post: 302604325

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

How to supplement HTML tags with SED

I am cleaning up HTML with sed. With the regexp <a name="+"></a><h>*<span class="mw-headline" >+</span></h> I can find the tags I need. But when I place them in a sed command, sed fails. So I started building up from a smaller command. This is where I am now: sed -r -e s/"<a...

2. UNIX for Advanced & Expert Users

sed to extract HTML content

Hiya, I am trying to extract a news article from a web page. The sed I have written brings back a lot of JavaScript code and sometimes advertisements too. Can anyone please help with this one? I need to fix this sed so it picks up the article ONLY (don't worry about the title or date; I got...

3. Shell Programming and Scripting

sed to extract only floating point numbers from HTML

Hi All, I'm trying to extract some floating point numbers from within some HTML code like this: <TR><TD class='awrc'>Parse CPU to Parse Elapsd %:</TD><TD ALIGN='right' class='awrc'> 64.50</TD><TD class='awrc'>% Non-Parse CPU:</TD><TD ALIGN='right' class='awrc'> ...

4. Shell Programming and Scripting

Extract URLs from HTML code using sed

Hello, I am trying to extract URLs from Google search results, but I have a problem with sed filtering of the HTML code. What I want is just a list of the URLs that appear between ........<p><a href=" and the next following " in the HTML code. Here is my code; I use wget and pipelines for the filtering. wget works, but...

5. Shell Programming and Scripting

sed - stripping out html tags

I have pasted the contents of a log file (swmbackup.wrkstn.1262071383.sales2a) below: Workstation: sales2a<BR Vault sales2a-hogwarts will be initialized.<BR <font color="red"There was a problem mounting /mnt/sales2a/desktop$ </FONT<BR <font color="red"There was a problem mounting...

6. Shell Programming and Scripting

SED to extract HTML text data, not quite right!

I am attempting to extract weather data from the following website, but for the Victoria area only: Text Forecasts - Environment Canada. I use this: sed -n "/Greater Victoria./,/Fraser Valley./p" But that pattern sometimes does not get it all, and I think perhaps the website has more...

7. Shell Programming and Scripting

awk -- Extract data from html within multiple tags as reference

Hi, I'm trying to get some data from an HTML file, but the problem is that several patterns have to be matched before the information can be extracted. https://www.unix.com/shell-programming-scripting/150711-extract-data-awk-html-files.html is a similar problem. The only...

8. UNIX for Dummies Questions & Answers

Replacing HTML tags with sed

Ok, so this is stupid simple, and I know I am going to feel like an idiot when I get help. I am altering an HTML report that has contraband in it so that the links to said contraband and the images are not shown. The link/img pairs are in the form of: <a...

9. Shell Programming and Scripting

Print content between two html tags

Hi Expert, is there any other way to print the content between two HTML tags and write it back to the same file? Here is the sample: cat file.html <div id="outline"> hello world<br> </div> <div id="container_faq"> test1<br> </div> <div class="widget_quick"> thead test<br> </div> ...

10. Shell Programming and Scripting

Awk/sed HTML extract

I'm extracting text between table tags in HTML <th><a href="/wiki/Buick_LeSabre" title="Buick LeSabre">Buick LeSabre</a></th> using this: awk -F "</*th>" '/<\/*th>/ {print $2}' auto2 > auto3 then this (text between a href): sed -e 's/$<*>$//g' auto3 > auto4 How to shorten this into one...

XML::DOM::Text(3)					User Contributed Perl Documentation					 XML::DOM::Text(3)

NAME

       XML::DOM::Text - A piece of XML text in XML::DOM

DESCRIPTION

       XML::DOM::Text extends XML::DOM::CharacterData, which extends XML::DOM::Node.

       The Text interface represents the textual content (termed character data in XML) of an Element or Attr. If there is no markup inside an
       element's content, the text is contained in a single object implementing the Text interface that is the only child of the element.  If
       there is markup, it is parsed into a list of elements and Text nodes that form the list of children of the element.
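       As a rough illustration of the two cases above, the following sketch lists the children of an element with and without inner markup. It uses Python's standard xml.dom.minidom rather than XML::DOM itself, on the assumption that both follow the same W3C DOM model; the helper name child_names and the sample markup are made up for the example.

```python
from xml.dom.minidom import parseString


def child_names(xml_text):
    """Return the nodeName of each child of the document's root element."""
    root = parseString(xml_text).documentElement
    return [child.nodeName for child in root.childNodes]


# No markup inside the element: the content is a single Text child.
print(child_names("<p>hello world</p>"))

# Markup inside the element: Text and Element children are interleaved.
print(child_names("<p>hello <b>big</b> world</p>"))
```

       In minidom a Text node reports its nodeName as "#text", so the first call should list one entry and the second three.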

       When a document is first made available via the DOM, there is only one Text node for each block of text. Users may create adjacent Text
       nodes that represent the contents of a given element without any intervening markup, but should be aware that there is no way to represent
       the separations between these nodes in XML or HTML, so they will not (in general) persist between DOM editing sessions. The normalize()
       method on Element merges any such adjacent Text objects into a single node for each block of text; this is recommended before employing
       operations that depend on a particular document structure, such as navigation with XPointers.
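       The behaviour described above can be sketched as follows. This is an illustration in Python's standard xml.dom.minidom, not XML::DOM, assuming the two behave alike here because both implement the W3C DOM normalize() method; the helper text_children is hypothetical.

```python
from xml.dom.minidom import parseString


def text_children(element):
    """Return the data of each direct Text child of element."""
    return [c.data for c in element.childNodes if c.nodeType == c.TEXT_NODE]


doc = parseString("<p>hello</p>")
p = doc.documentElement

# Create a second Text node directly after the first, with no markup between.
p.appendChild(doc.createTextNode(" world"))

before = text_children(p)   # two adjacent Text nodes
serialized = p.toxml()      # the boundary between them is not representable
p.normalize()               # merge adjacent Text nodes into one
after = text_children(p)

print(before, serialized, after)
```

       The serialized form carries no trace of the split, which is why adjacent Text nodes do not survive a save and reload.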

   METHODS
       splitText (offset)
	   Breaks this Text node into two Text nodes at the specified offset, keeping both in the tree as siblings. This node then contains
	   only the content up to the offset point, and a new Text node, inserted as the next sibling of this node, contains all the
	   content at and after the offset point.

	   Parameters:
	    offset  The offset at which to split, starting from 0.

	   Return Value: The new Text node.

	   DOMExceptions:

	   o   INDEX_SIZE_ERR

	       Raised if the specified offset is negative or greater than the number of characters in data.

	   o   NO_MODIFICATION_ALLOWED_ERR

	       Raised if this node is readonly.
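       A minimal sketch of splitText and its INDEX_SIZE_ERR case, written against Python's standard xml.dom.minidom on the assumption that it mirrors the method documented here; it is not the XML::DOM implementation. The read-only case (NO_MODIFICATION_ALLOWED_ERR) is not exercised.

```python
import xml.dom
from xml.dom.minidom import parseString

doc = parseString("<p>hello world</p>")
text = doc.documentElement.firstChild

# Split at offset 5: this node keeps "hello"; the returned node holds the rest
# and is inserted as the next sibling.
tail = text.splitText(5)
print(repr(text.data), repr(tail.data), text.nextSibling is tail)

# An offset greater than the number of characters raises an INDEX_SIZE_ERR.
try:
    tail.splitText(100)
    error_code = None
except xml.dom.IndexSizeErr as exc:
    error_code = exc.code

print(error_code == xml.dom.INDEX_SIZE_ERR)
```

       An offset equal to the length of the data is still valid and simply yields an empty trailing node.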

perl v5.16.3							    2000-01-31							 XML::DOM::Text(3)

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

How to supplement HTML tags with SED

Discussion started by: DocBrewer

2. UNIX for Advanced & Expert Users

sed to extract HTML content

Discussion started by: stargazerr

3. Shell Programming and Scripting

sed to extract only floating point numbers from HTML

Discussion started by: pondlife

4. Shell Programming and Scripting

Extract URLs from HTML code using sed

Discussion started by: L0rd