How to remove special characters?


 
# 1  
Old 07-09-2013

Hi Gurus,

I have a file which contains some Unicode characters like "ü". I want to replace each of them with some other character. I searched the internet and found the command
Code:
 sed "s/ü/-/g"

, but I don't know how to type ü on the UNIX command line.
Please help me with this one.

Thanks in advance
# 2  
Old 07-09-2013
Be very careful: do not assume that the character is a single byte in size.

For a quick assessment use this command to check:-

Code:
hexdump -C /full/path/to/your/filename

This is an example; I copied your character and put it into an editor:-

Code:
This is the _byte_ ü _end_.

Note the character is between two spaces...

Now using the above command:-

Code:
Last login: Tue Jul  9 18:46:23 on ttys000
AMIGA:barrywalker~> hexdump -C /Users/barrywalker/byte_test.txt
00000000  54 68 69 73 20 69 73 20  74 68 65 20 5f 62 79 74  |This is the _byt|
00000010  65 5f 20 c3 bc 20 5f 65  6e 64 5f 2e 0a           |e_ .. _end_..|
0000001d
AMIGA:barrywalker~>

Note that at offsets 00000013 and 00000014 the two bytes c3 and bc (the UTF-8 encoding of ü) appear instead of the single byte you might be expecting...

So be very, very careful...

Hope this helps...
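
One way to avoid typing the character at all is to let printf build it from its byte values. This is only a minimal sketch, assuming the file is UTF-8 encoded as in the hexdump above (ü stored as the bytes c3 bc, octal 303 274); the function name replace_u_umlaut is made up for illustration:

```shell
# Build ü from its UTF-8 bytes (0xC3 0xBC = octal 303 274) instead of typing it.
u_umlaut=$(printf '\303\274')

# Hypothetical helper: replace every ü on stdin with "-".
# LC_ALL=C makes sed match the raw two-byte sequence regardless of the
# locale the terminal happens to be using.
replace_u_umlaut() {
    LC_ALL=C sed "s/$u_umlaut/-/g"
}

printf 'This is the _byte_ \303\274 _end_.\n' | replace_u_umlaut
```

Run on the test line from this post, it should print "This is the _byte_ - _end_.". If the file uses a different encoding, the octal values have to be changed to match what hexdump shows.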
# 3  
Old 07-09-2013
Thanks for your quick reply.
I ran the following command and got this result:
Code:
 # echo 'ADDÜL' |hexdump -C
00000000  41 44 44 dc 4c 0a                                 |ADD.L.|
00000006

Actually, I was running the following command to split a file consisting of one long line into separate lines. When it hits the character Ü, it stops.
What should I do to make the command split the file without stopping?
Code:
awk -v L="$2" '{for (i=1; i<=length($0); i+=L) print substr($0, i, L)}' "$1" > "$1"_split

Thanks in advance
# 4  
Old 07-09-2013
I get a completely different result to you using the same command AFTER copying and pasting:-

Code:
Last login: Tue Jul  9 19:07:11 on ttys000
AMIGA:barrywalker~> echo 'ADDÜL' |hexdump -C
00000000  41 44 44 c3 9c 4c 0a                              |ADD..L.|
00000007
AMIGA:barrywalker~>

Notice I get two bytes, c3 and 9c, where you got the single byte dc, so it makes for a difficult cure...

However, IF you can guarantee a single constant byte value, you could try out a derivative of this idea:-

https://www.unix.com/shell-programmin...ipulation.html
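
For the single-byte case, here is a minimal sketch. It is not taken from the linked thread; it assumes the file is ISO 8859-1 (Latin-1) encoded, where Ü is the one byte dc (octal 334) shown in the hexdump in post 3, and the function name is hypothetical:

```shell
# Hypothetical helper: replace the single byte 0xDC (octal 334) with "-".
# LC_ALL=C keeps tr working on bytes rather than multi-byte characters.
replace_byte_dc() {
    LC_ALL=C tr '\334' '-'
}

printf 'ADD\334L\n' | replace_byte_dc
```

This should print "ADD-L". It is not suitable for UTF-8 input, where Ü is two bytes and the byte dc can be part of a different character.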
# 5  
Old 07-09-2013
I guess I don't understand the problem. You have created the command line you want to use above. Why can't you just copy it and paste it into your shell? Or copy it and paste it into a shell script using your editor?

The way you type unicode characters using a keyboard will vary depending on your operating system, your keyboard, and your current locale settings, but as long as your current locale and the character you're copying are both using the same underlying codeset, copy and paste should work.

Note that on most UNIX and Linux systems there won't be a locale that uses Unicode as the underlying codeset, but there are probably several that use UTF-8 (which is a multi-byte codeset that can encode any Unicode character).

Note that whether ü is a single-byte character (as it is in some EBCDIC code page variants and some ISO 8859-* codesets) or a multi-byte character (as it is in UTF-8) shouldn't matter to sed. The sed utility operates on characters, not bytes. You just need to be sure that the locale you're using when running sed uses a codeset with the same encoding for ü as the encoding used in the file you're editing.
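
If the file's encoding and the locale disagree, one option is to convert the file first. The following is a sketch only, assuming the file is ISO 8859-1 and the shell runs in a UTF-8 locale; it relies on the standard iconv utility, and the function name is made up for illustration:

```shell
# Hypothetical helper: convert ISO 8859-1 text on stdin to UTF-8 on stdout.
latin1_to_utf8() {
    iconv -f ISO-8859-1 -t UTF-8
}

# The single byte dc (Ü in ISO 8859-1) becomes the two bytes c3 9c.
# od prints the resulting bytes in hex so the change is visible.
printf 'ADD\334L\n' | latin1_to_utf8 | od -An -tx1
```

After the conversion, a sed command pasted into a UTF-8 terminal sees the same encoding for Ü as the file uses.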
# 6  
Old 07-09-2013
Also in post 1 this character:-
Code:
ü

Is not the same as this character in post 3:-
Code:
Ü

So which is it?

This could be part of your problem...
# 7  
Old 07-09-2013
It is
Code:
Ü
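
Given that the character is Ü and the hexdump in post 3 shows it stored as the single byte dc, the file is probably not UTF-8. One thing worth trying, as an assumption rather than a confirmed diagnosis, is to run the split in the C locale so that awk counts bytes instead of trying to decode multi-byte characters. A sketch that reads from stdin; the function name split_fixed is hypothetical:

```shell
# Hypothetical helper: split each input line into pieces of at most $1 bytes.
# LC_ALL=C makes length() and substr() count bytes, so a stray high byte
# such as 0xDC is treated like any other character.
split_fixed() {
    LC_ALL=C awk -v L="$1" '{
        for (i = 1; i <= length($0); i += L)
            print substr($0, i, L)
    }'
}

# Example: an 8-byte line split into pieces of 3 bytes.
printf 'ADD\334LMNO\n' | split_fixed 3
```

This only works cleanly for single-byte encodings; on UTF-8 input a byte-based split could cut a character in half.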
