How do I extract text only from html file without HTML tag
Post 83908 by LanceBoyles on Tuesday 20th of September 2005 07:29:28 PM
Use Lynx with the --dump option, which renders the page and writes the result to standard output as plain text. For a local file:
Code:
lynx --dump myfile.html > myfile.txt

Or, to fetch and convert a page directly from a URL:
Code:
lynx --dump http://some.where.com/whatever.html > myfile.txt

To convert many files, wrap the same command in a shell script that loops over them, so the whole batch runs unattended.
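As a minimal sketch of such a script, assuming the HTML files sit in a single directory and end in .html: the function name convert_all is made up for this illustration and is not part of Lynx. Each output file takes the name of its input, with .html replaced by .txt.

```shell
#!/bin/sh
# Convert every .html file in a directory to plain text with Lynx.
# convert_all is a hypothetical helper name chosen for this example.
convert_all() {
    dir=${1:-.}    # directory to process; defaults to the current one
    for f in "$dir"/*.html; do
        # If nothing matched, the pattern is left unexpanded; skip it.
        [ -e "$f" ] || continue
        # page.html -> page.txt, written next to the original file
        lynx --dump "$f" > "${f%.html}.txt"
    done
}
```

After sourcing or pasting the function, something like convert_all /path/to/pages processes that directory. An existing .txt file with the same base name would be overwritten, so try it on a copy first.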

Last edited by Yogesh Sawant; 12-20-2010 at 08:35 AM. Reason: added code tags
 
