Removing HTML formatting with sed


 
# 1  
Old 02-26-2012

Hello, I am trying to remove the HTML markup from a file using sed, for example the <p> and </p> tags.

I tried this: sed -e 's/<[^>]*>//g' test > test.t
but some HTML markup is still left. Please help if you have any suggestions.

Let's say this is the HTML file:

PHP Code:
tructions><![CDATA[<class='exerciseMeta'>
               Write a statement that prints
       <font color="#006600">
       <b>
      Hello World
       </b>
       </font>
       to the screen.
       </P>
       
                      ]]></Instructions>
         <Instructions><![CDATA[
      <class='exerciseMeta'>
              Write a complete program that prints <span class='code'>Hello World</span> to the screen.
      </P>
       
                      ]]></Instructions>
          <Instructions><![CDATA[
      <class='exerciseMeta'>
             Suppose your name was <span class='code'>Alan Turing</span>.
              Write a statement that would print your last name, followed by a comma, followed by your
              first name. Do not print anything else (that includes blanks).
      </P>
       
                      ]]></Instructions>
         <Instructions><![CDATA[
      <class='exerciseMeta'>
              Suppose your name was <span class='code'>George Gershwin</span>.
              Write a complete program that would print your last name, followed by a comma, followed by your
              first name. Do not print anything else (that includes blanks).
      </P
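For reference, the leftovers from s/<[^>]*>//g can be reproduced on a small made-up sample. sed applies the expression to one line at a time, so a tag that opens on one line and closes on the next is never matched, and the ]]> that ends a CDATA section contains no < at all. A minimal sketch (the sample text and the strip_tags name are illustrative, not taken from the real file):

```shell
#!/bin/sh
# strip_tags: the expression from the post, applied to standard input.
strip_tags() {
    sed -e 's/<[^>]*>//g'
}

# Hypothetical sample: one CDATA wrapper and one tag split over two lines.
sample='<Instructions><![CDATA[<b>Hello</b>
]]></Instructions>
<font
color="red">World</font>'

# Line 1 comes out clean; "]]>", "<font" and 'color="red">World' survive.
printf '%s\n' "$sample" | strip_tags
```

On this sample only the first line is fully cleaned, which is the same kind of debris described in the post.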
# 2  
Old 02-26-2012
How about this:

Code:
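#! /bin/sed -f
# Delete html tags
# i.e. everything between < and >
:t
# Remove from the last "<" that still has a ">" after it through the last ">"
s|\(.*\)<.*>|\1|
# If something was removed, try again
tt
# No "<" left: print the line and start the next cycle
/</!b
# An unclosed "<" remains: append the next line and retry
N
bt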

Output for your example text is:

Code:
 tructions>
          
           
          <![CDATA[
       
               Suppose your name was George Gershwin.
               Write a complete program that would print your last name, followed by a comma, followed by your
               first name. Do not print anything else (that includes blanks).

# 3  
Old 02-27-2012
Can you please show the output you want?
# 4  
Old 02-27-2012
Quote:
Originally Posted by parthmittal2007
Can you please show the output you want?
The output I want is:
PHP Code:
               Write a statement that prints
       
       
      Hello World
       
       
       to the screen.
       
      
     
      
              
Write a complete program that prints Hello World to the screen.
      
      
      
             
Suppose your name was Alan Turing.
              
Write a statement that would print your last name, followed by a comma, followed by your
              first name. Do not print anything else that includes blanks.
      
      
       
              
Suppose your name was George Gershwin.
              
Write a complete program that would print your last name, followed by a comma, followed by your
              first name. Do not print anything else that includes blanks

I also tried this, but it doesn't work:
sed -e 's/<[^>]*>//g' | sed 's/![^!]*!//g' | head test > test.t1
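A likely reason this pipeline changes nothing: the first sed is given no file, so it reads standard input, and head test opens the file test itself and ignores whatever is piped into it, so test.t1 just receives the first ten lines of the original file. Separately, s/![^!]*!//g looks for text between two exclamation marks, and <![CDATA[ contains only one. A small sketch of the plumbing issue, using a throwaway temp file rather than the real data:

```shell
#!/bin/sh
# Throwaway file standing in for "test"; its content is made up.
tmp=$(mktemp)
printf '<p>Hello</p>\n' > "$tmp"

# head is given a file operand, so the piped-in text is never read.
ignored=$(printf 'from the pipe\n' | head "$tmp")
printf '%s\n' "$ignored"

# Give the file to the first command and let the data flow left to right.
fixed=$(sed -e 's/<[^>]*>//g' "$tmp" | head)
printf '%s\n' "$fixed"

rm -f "$tmp"
```

The first printf shows the untouched line from the file; the second shows it with the tags removed.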
# 5  
Old 02-27-2012
Code:
perl -pe 's/\s*[><\[\]].*[><\[\]]\s*//g' inputfile

# 6  
Old 02-27-2012
Quote:
Originally Posted by balajesuri
Code:
perl -pe 's/\s*[><\[\]].*[><\[\]]\s*//g' inputfile

Thanks, it works, but how can I do that in sed?
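One possible sed approach, offered as a sketch rather than a literal translation of the perl one-liner: read the whole file into the pattern space so tags that span lines can be matched, remove the CDATA markers first, then remove the tags, and finally drop lines that are left blank. The function name strip_markup is made up, and the sketch assumes the text contains no bare < characters that are not part of a tag.

```shell
#!/bin/sh
# strip_markup: hypothetical helper; reads stdin, writes plain text.
strip_markup() {
    # Pass 1: append every line to the pattern space, then edit it as one
    # string so a tag split across lines is still matched.
    # The CDATA opener is removed before the generic tag expression,
    # otherwise "<![CDATA[ text <b>" would be swallowed as a single tag.
    sed -e ':a' -e '$!{' -e 'N' -e 'ba' -e '}' \
        -e 's/<!\[CDATA\[//g' \
        -e 's/]]>//g' \
        -e 's/<[^>]*>//g' |
    # Pass 2: drop lines that are now empty or whitespace only.
    sed -e '/^[[:space:]]*$/d'
}
```

It could be called as strip_markup < test > test.t. Because the whole file is held in memory, this suits small files like the example; the second pass can be left out if the blank lines should be kept.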
# 7  
Old 02-27-2012
Using awk; this works for your example data:

Code:
$ nawk -F"[<>]" '{for(i=1;i<=NF && i%2!=0;i++){gsub("[\\\])(]","",$i);print $i}}' test.txt                                                          
              Write a statement that prints
      
      
     Hello World
      
      
      to the screen.
      
       
                      
         
      
              Write a complete program that prints 
      
       
                      
          
      
             Suppose your name was 
              Write a statement that would print your last name, followed by a comma, followed by your
              first name. Do not print anything else that includes blanks.
      
       
                      
         
      
              Suppose your name was 
              Write a complete program that would print your last name, followed by a comma, followed by your
              first name. Do not print anything else that includes blanks.
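One thing worth noting about the one-liner above: the loop condition i<=NF && i%2!=0 becomes false as soon as i reaches 2, so only the first field of each line is printed, which would explain why the text after the first tag on a line (for example the names) is missing from the output. A hedged sketch that visits every odd-numbered field instead; strip_fields is a made-up name, plain awk is assumed in place of nawk, and the odd/even split only holds when every < on a line is closed by a > before the next < appears:

```shell
#!/bin/sh
# strip_fields: hypothetical helper; reads stdin, writes text outside <...>.
strip_fields() {
    awk -F'[<>]' '{
        line = ""
        # With < and > as separators, odd-numbered fields are the text
        # outside the tags; step by two to collect all of them.
        for (i = 1; i <= NF; i += 2) line = line $i
        # Drop "]" and parentheses, roughly as the original gsub does.
        gsub(/[]()]/, "", line)
        print line
    }'
}
```

Lines such as <![CDATA[<class='exerciseMeta'> break the odd/even assumption, so the CDATA markers would still need to be removed separately before this step.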
