Removal of HTML ASCII Codes from file


 
# 1  
Old 10-11-2011

Hi all,

I have a file with extended ASCII codes in the description field, and they need to be removed.

List of extended ASCII codes:

"Œ", "œ", "Š", "š", "Ÿ", "ƒ", "–", "—", "‘",
"’", "‚", "“", "”", "„", "†", "‡", "•",
"…", "‰", "€", "™"


Sample data:

Test Details–HAVE BEEN PUBLISHED on date 8/11
Please tag‘the notes and activate the pool’in all systems and reporting programs by 01/21
Select“new TRIP pool for sale.”Pick letter to be sent 6/26
Test - Obama w/ Sys.Admin•rights
Description would go hereŒClinton!
files documents by wikileaksšsearch

Using sed to remove the codes

sed -e 's/[”]*[“]*[–]*[’]*[•]*[‘]*[Œ]*[š]/ /g' spcl_char.dat > spcl_char_2.dat

Test Details HAVE BEEN PUBLISHED on date 8/11
Please tag the notes and activate the pool in all systems and reporting programs by 01/ 1
Select new TRIP pool for sale. Pick letter to be sent 6/ 6
Test - Obama w/ Sys.Admin rights
Description would go here Clinton!
files documents by wikileaks search


The special characters are removed, but so are some digits in the description: 26 has become 6 and 21 has become 1.

How can I strip the extended ASCII codes without touching the rest of the description?


Thanks.

Last edited by btt3165; 10-11-2011 at 12:10 PM..
# 2  
Old 10-11-2011
You will find out that some special characters are very hard to delete.

Instead, you can reverse the logic: keep only the characters you do want and replace everything else:
Code:
sed 's/[^a-zA-Z0-9]/ /g' File

Add more characters inside the brackets as you need them (for example / and . so that the dates and sentence endings in your sample survive).
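
One way to avoid growing the bracket list by hand is to keep the whole printable ASCII range (space through tilde) and replace everything else. The sketch below is only an illustration, not a tested solution for your file: the function name strip_non_ascii is made up, and it assumes your sed treats input as raw bytes under LC_ALL=C (GNU and BSD sed generally do). Each run of non-ASCII bytes is replaced by a single space, so a multibyte UTF-8 character does not leave several spaces behind. Note that tabs are outside the kept range and would be replaced too.

```shell
#!/bin/sh
# Hypothetical helper: replace each run of bytes outside printable ASCII
# (space through tilde) with one space. LC_ALL=C makes sed work on raw
# bytes, so every byte of a multibyte character is matched by the class.
strip_non_ascii() {
    LC_ALL=C sed 's/[^ -~]\{1,\}/ /g'
}

# One line of the sample data. \342\200\234 and \342\200\235 are the UTF-8
# bytes of the left and right curly double quotes (assumed encoding).
cleaned=$(printf 'Select\342\200\234new TRIP pool for sale.\342\200\235Pick letter to be sent 6/26\n' | strip_non_ascii)

printf '%s\n' "$cleaned"
```

For a whole file the same function can be used as a filter, e.g. strip_non_ascii < spcl_char.dat > spcl_char_2.dat. Digits, slashes and ordinary hyphens fall inside the kept range, so dates such as 6/26 are left alone.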