remove special and unicode characters


 
# 1  
Old 11-29-2008

Hi,
How do I remove the lines where special characters or Unicode characters appear?
The following query does work but I wonder if there is a better way.

cat test.txt | egrep -v '\)|#|,|&|-|\(|\\|\/|\.'

The following lines show that my query is incomplete.

Warning: The word "*Khan" is invalid. The character '*' (U+2A) may not appear at the beginning of a word. Skipping word.
Warning: The word "Khan]" is invalid. The character ']' (U+5D) may not appear at the end of a word. Skipping word.
Warning: The word "Khandewa;l" is invalid. The character ';' (U+3B) may not appear in the middle of a word. Skipping word.
Warning: The word "[khanna" is invalid. The character '[' (U+5B) may not appear at the beginning of a word. Skipping word.
Warning: The word "Khar**Closed" is invalid. The character '*' (U+2A) may not appear in the middle of a word. Skipping word.
Warning: The word "Khelani]" is invalid. The character ']' (U+5D) may not appear at the end of a word. Skipping word.
Warning: The word "Khwaja[physician]" is invalid. The character '[' (U+5B) may not appear in the middle of a word. Skipping word.
Warning: The word "Kids@play" is invalid. The character '@' (U+40) may not appear in the middle of a word. Skipping word.
# 2  
Old 12-05-2008
To drop every line that contains a non-printable character:
Code:
grep -v '[^[:print:]]' test.txt
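As a rough illustration of what the printable-character filter keeps, here is a minimal sketch. The function name `drop_nonprintable` and the sample input are made up for this example, and `LC_ALL=C` is an added assumption so that only ASCII counts as printable; in a UTF-8 locale, accented letters would normally count as printable and survive.

```shell
# Hypothetical helper: print only the lines made up entirely of printable characters.
# LC_ALL=C is an assumption; it limits "printable" to ASCII space through tilde.
drop_nonprintable() {
    LC_ALL=C grep -v '[^[:print:]]' "$@"
}

# The middle line contains a tab, which is a control character, so it is dropped.
printf 'plain words\ntab\there\nKhan\n' | drop_nonprintable
```

Without a file argument the helper reads standard input; `drop_nonprintable test.txt` would filter the file directly.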

There's also [:punct:], which matches any punctuation character. Or you can drop every line that contains anything other than an ASCII letter or digit (note that this also drops lines containing spaces):
Code:
grep -v '[^A-Za-z0-9]' test.txt
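To see how the punctuation filter and the letters-and-digits filter differ, the sketch below runs both on a few of the words from the warnings in the first post. The function names `no_punct` and `alnum_only` are hypothetical, the entry "Khan Sahib" is invented to show the effect of a space, and `LC_ALL=C` is an added assumption that keeps the character ranges limited to ASCII.

```shell
# Hypothetical helpers wrapping the two filters.
no_punct()   { LC_ALL=C grep -v '[[:punct:]]' "$@"; }    # drop lines with any punctuation
alnum_only() { LC_ALL=C grep -v '[^A-Za-z0-9]' "$@"; }   # drop lines with anything but letters/digits

# Sample input, one entry per line ("Khan Sahib" is made up for illustration).
words=$(printf '%s\n' 'Khan' '*Khan' 'Khan]' 'Kids@play' 'Khan Sahib')

printf '%s\n' "$words" | no_punct      # keeps "Khan" and "Khan Sahib"
printf '%s\n' "$words" | alnum_only    # keeps only "Khan"; the space rules out "Khan Sahib"
```

If the file holds multi-word lines, the punctuation filter is probably the safer choice, since a space is not punctuation.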

 