Post 302262810 by shantanuo on Saturday 29th of November 2008 02:35:27 AM
remove special and unicode characters

Hi,
How do I remove the lines that contain special characters or Unicode characters?
The following command works, but I wonder if there is a better way.

grep -Ev '\)|#|,|&|-|\(|\\|/|\.' test.txt

The following warnings show that my command is incomplete; it misses characters such as '*', '[', ']', ';' and '@':

Warning: The word "*Khan" is invalid. The character '*' (U+2A) may not appear at the beginning of a word. Skipping word.
Warning: The word "Khan]" is invalid. The character ']' (U+5D) may not appear at the end of a word. Skipping word.
Warning: The word "Khandewa;l" is invalid. The character ';' (U+3B) may not appear in the middle of a word. Skipping word.
Warning: The word "[khanna" is invalid. The character '[' (U+5B) may not appear at the beginning of a word. Skipping word.
Warning: The word "Khar**Closed" is invalid. The character '*' (U+2A) may not appear in the middle of a word. Skipping word.
Warning: The word "Khelani]" is invalid. The character ']' (U+5D) may not appear at the end of a word. Skipping word.
Warning: The word "Khwaja[physician]" is invalid. The character '[' (U+5B) may not appear in the middle of a word. Skipping word.
Warning: The word "Kids@play" is invalid. The character '@' (U+40) may not appear in the middle of a word. Skipping word.
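A list of characters to exclude will always miss something, as the warnings show. A more robust approach is usually the opposite: keep only the lines made entirely of characters you allow. A minimal sketch, assuming a valid line is one or more ASCII letters or digits (widen the bracket expression if, say, spaces or apostrophes should survive). The function names keep_plain_words and list_special_chars are hypothetical helpers, not standard commands, and grep -o is supported by GNU and BSD grep but is not in POSIX.

```shell
#!/bin/sh
# Whitelist instead of blacklist: describe what IS allowed, drop the rest.
# Assumption: a valid line is one or more ASCII letters or digits.

# keep_plain_words FILE (hypothetical helper): print only the lines of FILE
# that consist entirely of A-Z, a-z and 0-9. LC_ALL=C makes grep compare raw
# bytes, so any other byte (punctuation, or each byte of a multi-byte UTF-8
# character such as "ü") makes the line fail the -x whole-line match.
# Empty lines are dropped too, because + needs at least one character.
keep_plain_words() {
    LC_ALL=C grep -Ex '[A-Za-z0-9]+' "$1"
}

# list_special_chars FILE (hypothetical helper): count every byte that
# keep_plain_words would reject, to see what a blacklist has missed.
list_special_chars() {
    LC_ALL=C grep -o '[^A-Za-z0-9]' "$1" | LC_ALL=C sort | uniq -c
}
```

For the file in the question, `keep_plain_words test.txt > clean.txt` writes the surviving lines to clean.txt, and `list_special_chars test.txt` shows each rejected byte with its count. Under LC_ALL=C a multi-byte character such as ü is reported as its separate bytes.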
 

10 More Discussions You Might Find Interesting

1. Programming

How to display unicode characters / unicode string

I have a stream of characters like "\u8BBE\u5907\u7BA1" and I want to display it. I have already tried the following without any luck: 1) printf("%s",L("\u8BBE\u5907\u7BA1")); 2) printf("%lc",0x8BBE); 3) setlocale followed by fwide followed by wprintf; 4) also changed the locale manually... (3 Replies)
Discussion started by: jackdorso

2. UNIX for Dummies Questions & Answers

Remove a directory that has special characters

Hi All, I have written a script that creates a new directory within the shell program, and if a parameter isn't passed in, it creates a strange directory name by mistake. So I have a directory like "-_12" and I am unable to remove it. I tried removing it using double quotes and many other ways. I have... (12 Replies)
Discussion started by: datherriault

3. UNIX for Dummies Questions & Answers

How to Remove Special Characters

Dear Members, we have a file which contains some special characters. I need to replace these special characters with a newline character (\n). The special character is \x85. I am not sure what this character means or how we can remove it. Any input is greatly appreciated. Thanks... (5 Replies)
Discussion started by: sandeep_1105

4. UNIX for Dummies Questions & Answers

Files with special characters - how to remove

Hi, I have a directory that contains a file with special characters in its filename. Can someone please advise how to remove the file, preferably with rm -i? Thanks in advance. The listing is below: {oracle}> ls -1b bplog.bkup.001 bplog.bkup.002 bplog.bkup.003 bplog.bkup.004... (1 Reply)
Discussion started by: newbie_01

5. Shell Programming and Scripting

remove special characters

Hello all, I am writing Perl code and I wish to remove special characters from text. I wish to remove all extended ASCII characters. If the list of special characters is huge, how can I do this using the substitute command s/specialcharacters/null/g? I really want to code like... (3 Replies)
Discussion started by: vasuarjula

6. UNIX for Dummies Questions & Answers

Remove Unicode/special chars from XML

Hi, we are receiving an XML file in Unix which has some special characters between tags, such as '^': <Tag> 1e^O7f%<2304e.$d8f57e8^Bf-&e.^Zh7/327e^O7 </Tag> We need to remove all special characters like the ^ ones, and also any '&', '<' or '>' being sent within the start and close tags, i.e.... (6 Replies)
Discussion started by: dsrookie7

7. Shell Programming and Scripting

Remove the special characters from field

Hi, in the source data a few columns contain special characters (like *), and because of this I am not able to display the data in the flat file; it displays some junk data in the flat file instead. Source data example: Address1="XDERFTG * HYJUYTG". How do I remove the special characters in a string? (2 Replies)
Discussion started by: koti_rama

8. Shell Programming and Scripting

How to remove some special characters in a string?

Hi, I have a string like this: ="Lookup Procedure" but I want the output to be: Lookup Procedure. =," should be removed. Please suggest a solution. Regards, Madhuri (2 Replies)
Discussion started by: srimadhuri

9. Shell Programming and Scripting

How to remove special characters?

Hi Gurus, I have a file which contains some Unicode characters like "ü". I want to replace them with some other characters. I searched the Internet and found the command sed "s/ü/-/g", but I don't know how to type ü on the Unix command line. Please help me with this one. Thanks in advance (7 Replies)
Discussion started by: ken6503

10. Shell Programming and Scripting

Remove Special Characters Within Text

Hi, I have a "|" delimited file that is exported from a database. There is one column in the file which holds descriptions/comments entered by application users. It contains "Control-M" and "New Line" characters in the middle of the text. Hence, when I export the data, this record with the new... (4 Replies)
Discussion started by: tarun.trehan
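Several of the related threads listed above (replacing the \x85 byte, typing "ü" inside a sed command, and stray Control-M characters) come down to matching one specific byte sequence that is awkward to type. A minimal sketch, assuming the \x85 file uses a single-byte encoding and the "ü" file is UTF-8: generate the bytes with octal escapes, which POSIX printf and tr both understand. The function names are hypothetical.

```shell
#!/bin/sh
# Target bytes you cannot easily type by spelling them as octal escapes.
# Assumptions: the \x85 file is single-byte encoded; the "ü" file is UTF-8.

# bytes_to_newline FILE (hypothetical): replace every 0x85 byte (octal 205)
# with a newline. 0x85 is an ellipsis in Windows-1252 and the NEL
# ("next line") control in ISO-8859-1.
bytes_to_newline() {
    LC_ALL=C tr '\205' '\n' < "$1"
}

# replace_u_umlaut FILE (hypothetical): replace the UTF-8 encoding of
# "ü" (U+00FC, bytes C3 BC = octal 303 274) with "-".
replace_u_umlaut() {
    u=$(printf '\303\274')
    LC_ALL=C sed "s/$u/-/g" "$1"
}

# strip_cr FILE (hypothetical): delete Control-M (carriage return) bytes.
strip_cr() {
    tr -d '\r' < "$1"
}
```

Because the bytes come from octal escapes, nothing non-ASCII has to be typed on the command line. Note that strip_cr only deletes carriage returns; rejoining records that an embedded newline split apart requires knowing the record layout.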
UNICODE(1)						      General Commands Manual							UNICODE(1)

NAME
unicode - command line unicode database query tool

SYNOPSIS
unicode [options] string

DESCRIPTION
This manual page documents the unicode command. unicode is a command line unicode database query tool.

OPTIONS
-h, --help: Show help and exit.
-x, --hexadecimal: Assume string to be a hexadecimal number.
-d, --decimal: Assume string to be a decimal number.
-r, --regexp: Assume string to be a regular expression.
-s, --string: Assume string to be a sequence of characters.
-a, --auto: Try to guess the type of string from one of the above (default).
-mMAXCOUNT, --max=MAXCOUNT: Maximum number of code points to display (default: 20; use 0 for unlimited).
-iCHARSET, --io=IOCHARSET: I/O character set. For maximal pleasure, run unicode on a UTF-8 capable terminal and specify IOCHARSET to be UTF-8. unicode tries to guess this value from your locale, so with a properly set up locale you should not need to specify it.
-cADDCHARSET, --charset-add=ADDCHARSET: Show the hexadecimal representation of displayed characters in this additional charset.
-CUSE_COLOUR, --colour=USE_COLOUR: USE_COLOUR is one of on, off or auto. --colour=on will use ANSI colour codes to colourise the output; --colour=off won't use colours; --colour=auto will test if standard output is a tty, and use colours only when it is. --color is a synonym of --colour.
-v, --verbose: Be more verbose about displayed characters, e.g. display Unihan information, if available.
-w, --wikipedia: Spawn a browser pointing to the Wikipedia entry about the character.

USAGE
unicode tries to guess the type of an argument. For example, you can use any of the following to display information about U+00E1 LATIN SMALL LETTER A WITH ACUTE (á):

unicode 00E1
unicode U+00E1
unicode á
unicode 'latin small letter a with acute'

You can specify a range of characters as arguments; unicode will show these characters in a nice tabular format, aligned to 256-code-point boundaries. Use two dots ".." to indicate the range, e.g.:

unicode 0450..0520 will display the whole Cyrillic and Hebrew blocks and everything between them (characters from U+0400 to U+05FF).
unicode 0400.. will display just the characters from U+0400 up to U+04FF.

BUGS
Tabular format does not deal well with full-width, combining, control and RTL characters.

SEE ALSO
ascii(1)

AUTHOR
Radovan Garabik <garabik @ kassiopeia.juls.savba.sk>

2003-01-31 UNICODE(1)
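The USAGE section of the unicode(1) page says that unicode 0450..0520 covers U+0400 to U+05FF and that unicode 0400.. covers U+0400 to U+04FF. Assuming this is simply rounding the start down and the end up to multiples of 0x100 (an inference from those two examples, not taken from unicode's source), the arithmetic can be sketched with POSIX shell arithmetic. table_bounds is a hypothetical name, and it takes bare hex digits without the U+ prefix.

```shell
#!/bin/sh
# Sketch of the range alignment described in USAGE. Assumption: the table
# is widened to whole 0x100-code-point rows; this is NOT unicode's own code.

# table_bounds START [END] (hypothetical): print the code points a table
# for START..END would cover. Arguments are bare hex digits, e.g. 0450.
table_bounds() {
    start=$(( 0x$1 & ~0xFF ))        # round START down to a 0x100 boundary
    if [ -n "$2" ]; then
        end=$(( 0x$2 | 0xFF ))       # round END up to the last code point of its row
    else
        end=$(( start | 0xFF ))      # open-ended "START..": a single 256-entry row
    fi
    printf 'U+%04X..U+%04X\n' "$start" "$end"
}
```

With this sketch, `table_bounds 0450 0520` prints U+0400..U+05FF and `table_bounds 0400` prints U+0400..U+04FF, matching the two examples in USAGE.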
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.