Since HTML is very similar to XML, you can use an XML tool to parse your file.
Since your HTML file is not well-formed XML, the parser complains about it, so the file has to be either adapted by hand or preprocessed before parsing. The <br> element is the problem: XML requires <br/>, with a slash inside the tag.
So you can do it with an XML parser like xmlstarlet in three steps:
1. Make the HTML file compliant by replacing the <br> tags.
2. Extract the wanted HTML element with xmlstarlet.
3. Suppress unwanted empty lines and leading whitespace in the data (the xmlstarlet output).
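As a rough sketch of the three steps above: the function names are purely illustrative, and the XPath //td is only an assumption, since the post does not say which element is wanted.

```shell
#!/bin/sh
# Step 1: turn bare <br> tags into self-closing <br/> tags
# so that an XML parser accepts them. Reads stdin or the given files.
fix_br() {
    sed 's/<[bB][rR] *>/<br\/>/g' "$@"
}

# Step 2: print the text of every node matched by an XPath expression.
# The default //td is a placeholder (assumption); pass your own XPath as $1.
extract_element() {
    xmlstarlet sel -t -m "${1:-//td}" -v . -n
}

# Step 3: strip leading whitespace, then drop lines that are now empty.
tidy() {
    sed -e 's/^[[:space:]]*//' -e '/^$/d'
}

# Possible usage (myfile.html is a placeholder name):
#   fix_br myfile.html | extract_element '//td' | tidy
```

Keeping each step in its own function makes it easy to check the intermediate output when the parser still rejects the file for some other reason.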
I have an HTML file called myfile.html. If I simply run "cat myfile.html" in UNIX, it shows all the HTML tags, like <a href=r/26><img src="http://www>. But I want to extract only the text part.
The same problem happens with the "type" command in MS-DOS.
I know you can do it by opening it in Internet Explorer,... (4 Replies)
Hi friends,
I have a small question:
how can we use HTML tags in shell scripting?
Code:
echo "<html>"
echo "<body>"
echo " welcome to peace world "
echo "</body>"
echo "</html>"
The output is displayed like this:
<html>
<body>
welcome to peace world
</body>
</html> (5 Replies)
Hi all,
I have an HTML file similar to this:
<tr class="evenrow">
<td class="data">added</td><td class="data">xyz@abc.com</td>
<td class="data">filename.sql</td><td class="modifications-data">08/25/2009 07:58:40</td><td class="data">Added TK prof script</td>
</tr>
<tr... (1 Reply)
Hi!
I have a bunch of HTML files that I want to convert to CSV files. Every page has a table in it, and I need to turn each row into a CSV record.
With awk and sed, I managed to put every table row on a separate line. So my file looks like this:
<TR> .... </TR>
<TR> .... </TR>
...One... (1 Reply)
Guys,
I have a little script that I got off the internet and that I use in Squid to block ads.
I used that script with Linux, but now I have moved my servers to FreeBSD. I have a steep learning curve there, but it is fun. Back to the script issue.
The script used to work with Linux but... (15 Replies)
I have an XML tag like this:
<property name="agent" value="/var/tmp/root/eclipse" />
Is there a way, using awk, to get the value from the above tag? The output should be:
/var/tmp/root/eclipse
Help will be appreciated.
Regards,
Adi (6 Replies)
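One possible approach to the question above, as a minimal sketch: it assumes the attribute is always written as value="..." with double quotes and appears at most once per line, and the function name get_value is made up for illustration.

```shell
#!/bin/sh
# Print the contents of the first value="..." attribute on each line.
# match() sets RSTART and RLENGTH for the matched text; the offsets
# skip the 7 characters of value=" and the closing quote.
get_value() {
    awk '{
        if (match($0, /value="[^"]*"/))
            print substr($0, RSTART + 7, RLENGTH - 8)
    }' "$@"
}

# Possible usage (config.xml is a placeholder name):
#   get_value config.xml
```

For anything beyond a single, regularly formatted tag, a real XML parser is usually the safer choice.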
I want to print everything from the <fruits> tag to the </fruits> tag for blocks whose <fruit> is mango. I also want both <fruits> and </fruits> in the output. Please help.
E.g.
<fruits>
<fruit id="111">mango</fruit>
.
another 20 lines
.
</fruits> (3 Replies)
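A minimal sketch of one way to do this with awk, assuming that <fruits> and </fruits> each sit on their own line and that blocks are not nested. The function name print_mango_blocks is hypothetical.

```shell
#!/bin/sh
# Buffer each <fruits>...</fruits> block and print it only if it
# contains a <fruit> element whose text starts with "mango".
print_mango_blocks() {
    awk '
        /<fruits>/                         { buf = ""; inblock = 1; hit = 0 }
        inblock                            { buf = buf $0 "\n" }
        inblock && /<fruit( [^>]*)?>mango/ { hit = 1 }
        /<\/fruits>/ {
            if (inblock && hit) printf "%s", buf
            inblock = 0
        }
    ' "$@"
}

# Possible usage (fruits.xml is a placeholder name):
#   print_mango_blocks fruits.xml
```

Because the buffering rule runs before the closing-tag rule, the </fruits> line is already in the buffer when the block is printed.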
Hi guys,
Here is my input:
<?xml version="1.0" encoding="UTF-8"?>
<xn:MeContext id="01736">
<xn:VsDataContainer id="01736">
<xn:attributes>
<xn:vsDataType>vsDataMeContext</xn:vsDataType>
... (12 Replies)
I want to clean an HTML file.
I am trying to remove the script part of the HTML, then strip the remaining tags and empty lines.
The code I am trying to use is the following:
sed '/<script/,/<\/script>/d' webpage.html | sed -e 's/<*>//g' | sed '/^\s*$/d' > output.txt
However, in this method, I can not... (10 Replies)
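The rest of this question is cut off, but one problem in the command above is visible: the pattern <*> means "zero or more < characters followed by >", so it does not match a whole tag. Also, \s is a GNU sed extension. A hedged sketch of a corrected version follows; strip_html is an illustrative name, and it assumes the <script> and </script> tags are on separate lines and that no tag spans several lines.

```shell
#!/bin/sh
# Delete script blocks, strip the remaining tags, drop blank lines.
strip_html() {
    sed -e '/<script/,/<\/script>/d' \
        -e 's/<[^>]*>//g' \
        -e '/^[[:space:]]*$/d' "$@"
}

# Possible usage:
#   strip_html webpage.html > output.txt
```

A single sed call with several -e expressions does the same work as the three piped sed processes in the original command.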
Discussion started by: YuhuiFeng
HTML::ElementRaw(3pm) User Contributed Perl Documentation HTML::ElementRaw(3pm)
NAME
HTML::ElementRaw - Perl extension for HTML::Element(3).
SYNOPSIS
use HTML::ElementRaw;
$er = new HTML::ElementRaw;
$text = '<p>I would like this HTML to not be encoded</p>';
$er->push_content($text);
$h = new HTML::Element 'h2';
$h->push_content($er);
# Now $text will appear as you typed it, non-escaped,
# embedded in the HTML produced by $h.
print $h->as_HTML;
DESCRIPTION
Provides a way to graft raw HTML strings into your HTML::Element(3) structures. Since they represent raw text, these can only be leaves in
your HTML element tree. The only methods that are of any real use in this degenerate element are push_content() and as_HTML(). The
push_content() method will simply prepend the provided text to the current content. If you happen to pass an HTML::Element to
push_content(), the output of the as_HTML() method in that element will be prepended.
REQUIRES
HTML::Element(3)
AUTHOR
Matthew P. Sisk, <sisk@mojotoad.com>
COPYRIGHT
Copyright (c) 1998-2010 Matthew P. Sisk. All rights reserved. All wrongs revenged. This program is free software; you can redistribute it
and/or modify it under the same terms as Perl itself.
SEE ALSO
HTML::Element(3), HTML::ElementSuper(3), HTML::Element::Glob(3), HTML::ElementTable(3), perl(1).
perl v5.10.1 2010-06-09 HTML::ElementRaw(3pm)