Removing HTML tags (Post: 302642383)

9 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Removing HTML tags via parameter expansion

Hi all, I have a variable that contains a web page: echo $STUFF <html> <head> <title>my page</title></head> <body> blah blah etc. Can I use the shell's parameter expansion abilities to remove just the tags? I thought that FIXHTML=${STUFF//<*>/} might do it, but it didn't seem to...

2. Shell Programming and Scripting

Searching and replacing/removing only certain HTML tags

I generally save a lot of web pages for reading offline, which works out great for school. Now I have to spend a lot of time on the bus and I am looking for the best way to read some of these web pages using my Nokia 7610. I have uploaded the files to my phone, but they are deadly deadly slow to...

3. UNIX for Advanced & Expert Users

Removing HTML tags

Hello Unix gurus, I am having a problem with one of the files that I am generating using a Unix script. This script connects to the MySQL server and loads the data into a text file. While generating the text file for one of the tables, the value in one of the columns is as follows: <p>...

4. Shell Programming and Scripting

Removing HTML formatting with sed

Hello, I am trying to remove the HTML formatting from the file using sed, for example removing <p> </p>. I tried this: sed -e 's/<*>//g' test > test.t but I still have some HTML formatting. Please help if you have any suggestions. Let's say this is the HTML file 1...

5. Shell Programming and Scripting

Parsing HTML, get text between 2 HTML tags

Hi there, I'm quite new to the forum and to shell scripting. I want to filter out the "166.0 points". The results that I found via Google / the forum search didn't help me :( <a href="/user/test" class="headitem menu" style="color:rgb(83,186,224);">test</a><a href="/points" class="headitem...

6. Shell Programming and Scripting

Remove html tags with particular string inside the tags

Could someone please provide a solution to the following: I would like to remove some tags from the "head" of multiple HTML documents across the website. They look like <link rel="alternate" type="application/rss+xml" title="Business and Investment in the Philippines"...

7. Shell Programming and Scripting

Removing all except a couple of HTML tags from an HTML file

I tried to find an elegant (or at least simple) way to remove all but a couple of HTML tags from an HTML file, but all the examples I found dealt with removing all the tags. The logic of the script would be: - if there is <li> or <ul> on the line, do nothing (= write the same line to output) - if there is:...

8. Homework & Coursework Questions

Script: Removing HTML tags and duplicate lines

Use and complete the template provided. The entire template must be completed. If you don't, your post may be deleted! 1. The problem statement, all variables and given/known data: You will write a script that will remove all HTML tags from an HTML document and remove any consecutive...

9. UNIX for Beginners Questions & Answers

HTML - Removing transparency on tooltips

I want to use tooltips in HTML; however, the transparency is creating a problem for detailed tooltips, as the text behind them interferes with the readability of the tooltip text. I have made the following changes, but the normal tooltip is still transparent. I call it using <a...
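Threads 1 and 4 above quote tag-stripping attempts that misfire for different reasons. The sketch below shows why, and a corrected pattern for each approach, assuming bash and a POSIX sed; strip_tags_sed and strip_tags_bash are hypothetical helper names, not taken from the threads. Treat it as quick cleanup only: sed edits line by line, so a tag split across two lines survives, and a '>' inside an attribute value ends the match early. For anything beyond simple markup, a proper HTML/XML parser is the sturdier choice.

```shell
# Requires bash: strip_tags_bash relies on shopt and extglob.

# Thread 4 tried  sed -e 's/<*>//g'.  In a regex, "<*" means "zero or more
# '<' characters", so that pattern only deletes each '>' (plus any '<'
# directly before it): "<p>" becomes "<p".
# "<[^>]*>" matches '<', then any run of non-'>' characters, then '>'.
strip_tags_sed() {
  sed -e 's/<[^>]*>//g'    # filters stdin to stdout, one line at a time
}

# Thread 1 tried  ${STUFF//<*>/}.  As a glob, '*' may also match '>', and
# ${var//pattern/} replaces the longest match, so "<*>" spans from the
# first '<' to the last '>' and deletes everything in between.
# With extglob, "*([^>])" limits the middle to characters other than '>'.
shopt -s extglob
strip_tags_bash() {
  local s=$1
  printf '%s\n' "${s//<*([^>])>/}"
}

# Usage, with the page fragment from thread 1:
page='<html> <head> <title>my page</title></head> <body> blah blah'
printf '%s\n' "$page" | strip_tags_sed
strip_tags_bash "$page"
```

The pure-bash variant avoids spawning sed for a value already held in a variable, at the cost of needing extglob; the sed variant suits whole files.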

LEARN ABOUT CENTOS

xml_pp

XML_PP(1)						User Contributed Perl Documentation						 XML_PP(1)

NAME

       xml_pp - xml pretty-printer

SYNOPSIS

       xml_pp [options] [<files>]

DESCRIPTION

       XML pretty printer using XML::Twig

OPTIONS

       -i[<extension>]
	   edits the file(s) in place. If an extension is provided (no space between "-i" and the extension), the original file is backed up
	   with that extension.

	   The rules for the extension are the same as Perl's (see perldoc perlrun): if the extension includes no "*", then it is appended to the
	   original file name. If the extension does contain one or more "*" characters, then each "*" is replaced with the current filename.

       -s <style>
	   the style to use for pretty printing: none, nsgmls, nice, indented, record, or record_c (see the XML::Twig docs for the exact description
	   of those styles); 'indented' by default

       -p <tag(s)>
	   preserves whitespace in the given tags. You can use several "-p" options, or quote the tags, if you need more than one

       -e <encoding>
	   use XML::Twig output_encoding (based on Text::Iconv or Unicode::Map8 and Unicode::String) to set the output encoding. By default the
	   original encoding is preserved.

	   If this option is used the XML declaration is updated (and created if there was none).

	   Make sure that the encoding is supported by the parser you use if you want to be able to process the pretty-printed file (XML::Parser
	   does not support 'latin1', for example; you have to use 'iso-8859-1').

       -l  loads the documents in memory instead of outputting them as they are being parsed.

	   This prevents a bug (see BUGS) but uses more memory

       -f <file>
	   read the list of files to process from <file>, one per line

       -v  verbose (list the current file being processed)

       --  stop argument processing (to process files that start with -)

       -h  display help
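The backup-naming rule described under -i can be sketched in bash as follows. backup_name is a hypothetical helper for illustration only, not part of xml_pp: it just computes the name string from the rule as stated above and does not copy or create any files.

```shell
# Hypothetical helper (not part of xml_pp): prints the backup name that the
# -i rule would produce for FILE, given the extension EXT.
backup_name() {
  local ext=$1 file=$2 name
  if [[ -z $ext ]]; then
    return 1                      # plain -i: no backup is kept
  elif [[ $ext == *'*'* ]]; then
    # Each '*' in the extension is replaced with the file name.
    name=${ext//\*/"$file"}
    printf '%s\n' "$name"
  else
    # No '*': the extension is appended to the file name.
    printf '%s\n' "$file$ext"
  fi
}

backup_name .bak foo.xml        # foo.xml.bak  (as in: xml_pp -v -i.bak *.xml)
backup_name 'orig_*' foo.xml    # orig_foo.xml (as in: xml_pp -v -i'orig_*' *.xml)
```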

EXAMPLES

	 xml_pp foo.xml > foo_pp.xml	       # pretty print foo.xml
	 xml_pp < foo.xml > foo_pp.xml	       # pretty print from standard input

	 xml_pp -v -i.bak *.xml 	       # pretty print .xml files, with backups
	 xml_pp -v -i'orig_*' *.xml	       # backups are named orig_<filename>

	 xml_pp -i -p pre foo.xhtml	       # preserve spaces in pre tags

	 xml_pp -i.bak -p 'pre code' foo.xml   # preserve spaces in pre and code tags
	 xml_pp -i.bak -p pre -p code foo.xml  # same

	 xml_pp -i -s record mydb_export.xml   # pretty print using the record style

	 xml_pp -e utf8 -i foo.xml	       # output will be in utf8
	 xml_pp -e iso-8859-1 -i foo.xml       # output will be in iso-8859-1

	 xml_pp -v -i.bak -f lof	       # pretty print in place files from lof

	 xml_pp -- -i.xml		       # pretty print the -i.xml file

	 xml_pp -l foo.xml		       # loads the entire file in memory
					       # before pretty printing it

	 xml_pp -h			       # display help

BUGS

       Elements with mixed content that start with an embedded element get an extra newline after their start tag, so

	 <elt><b>b</b>toto<b>bold</b></elt>

       will be output as

	 <elt>
	   <b>b</b>toto<b>bold</b></elt>

       Using the "-l" option solves this bug (but uses more memory)

TODO

       update XML::Twig to use Encode with perl 5.8.0

AUTHOR

       Michel Rodriguez <mirod@xmltwig.com>

perl v5.16.3							    2012-11-14								 XML_PP(1)