replace UTF-8 characters with tr


 
# 1  
Old 06-26-2008

Hi,

I am trying to get tr to replace multibyte characters with their ASCII equivalents. For example:

"Je vais à l'école" ---> "Je vais a l'ecole"

But my version of tr (5.97) doesn't seem to support multibyte sets.
Code:
$ locale charmap; echo "Je vais à l'école" | tr éà ea
UTF-8
Je vais aa l'aacole
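
The doubled "aa" is consistent with tr working on bytes rather than characters. The sketch below looks at the raw bytes; the accented letters are written as octal escapes so the snippet does not depend on how the script file itself is encoded. The last command assumes GNU tr, which pads the second set with its last character and keeps only the final mapping when a byte is repeated in the first set.

```shell
# "é" is the two bytes c3 a9 in UTF-8, "à" is c3 a0.
printf '\303\251' | od -An -tx1
printf '\303\240' | od -An -tx1

# The two-character set "éà" is therefore four bytes long.
printf '\303\251\303\240' | wc -c

# Byte-wise translation of "é" with the sets from the post above.
# Assumption: GNU tr; the four-byte SET1 is mapped onto "e a a a",
# so both bytes of "é" come out as "a".
printf '\303\251\n' | LC_ALL=C tr '\303\251\303\240' 'ea'
```

Both accented letters share the lead byte c3, which is why every one of them ends up as two ASCII letters.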

I would rather avoid multibyte-friendly tools like perl or python, since I want my script to work on platforms that don't have them. sed could do the job with something like:
Code:
$ sed 's/[àâä]/a/g; s/[ÀÂÄ]/A/g; s/[éèêë]/e/g; s/[ÉÈÊË]/E/g; s/[îï]/i/g; s/[ÎÏ]/I/g'

but I find it rather clumsy and less elegant than tr.
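
For platforms where sed is not multibyte-aware either, a variant of the same idea is to drop the bracket expressions and substitute each accented letter as a literal byte sequence. This is only a sketch: the function name `strip_accents` is made up for illustration, the script file is assumed to be saved as UTF-8, and the input is assumed to use precomposed characters (not a base letter followed by a combining accent).

```shell
# Hypothetical helper: replace a few accented French letters with ASCII.
# LC_ALL=C makes sed compare raw bytes, so each pattern is simply the
# two-byte UTF-8 sequence of one letter and no locale support is needed.
strip_accents() {
    LC_ALL=C sed \
        -e 's/à/a/g' -e 's/â/a/g' -e 's/ä/a/g' \
        -e 's/À/A/g' -e 's/Â/A/g' -e 's/Ä/A/g' \
        -e 's/é/e/g' -e 's/è/e/g' -e 's/ê/e/g' -e 's/ë/e/g' \
        -e 's/É/E/g' -e 's/È/E/g' -e 's/Ê/E/g' -e 's/Ë/E/g' \
        -e 's/î/i/g' -e 's/ï/i/g' \
        -e 's/Î/I/g' -e 's/Ï/I/g'
}

printf '%s\n' "Je vais à l'école" | strip_accents
# prints: Je vais a l'ecole
```

It is no more elegant than the bracket version, but it does not rely on the locale being honored.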

I have also tried iconv to no avail.
# 2  
Old 06-26-2008
Maybe you can use POSIX equivalence classes, e.g. [[=e=]] matches any of e, è, or é.
It should work with awk, but not with sed. Hope this helps.

Regards
# 3  
Old 06-26-2008
Thanks, but I couldn't get awk to do the job:

Code:
$ echo "Je vais à l'école" | awk '{gsub(/[[=e=]]/, "*")}1'
J* vais à l'écol*

Despite what the man page says:
Quote:
Equivalence Classes
An equivalence class is a locale-specific name for a list of characters that are equivalent. The name is enclosed in [= and =]. For example, the name e might be used to represent all of “e,” “é,” and “è.” In this case, [[=e=]] is a regular expression that matches any of e, é, or è.

These features are very valuable in non-English speaking locales. The library functions that gawk uses for regular expression matching currently only recognize POSIX character classes; they do not recognize collating symbols or equivalence classes.
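
Since [[=e=]] is not honored here, one workaround that stays within awk is to spell the equivalents out in a small table and apply them in a loop. This is a sketch under assumptions: the function name `deaccent` and the table contents are illustrative only, the script is assumed to be saved as UTF-8, and LC_ALL=C is set so that awk matches plain byte sequences.

```shell
# Hypothetical helper: table-driven accent removal with plain awk.
deaccent() {
    LC_ALL=C awk '
    BEGIN {
        # ASCII letter -> space-separated list of accented forms
        map["a"] = "à â ä"
        map["e"] = "é è ê ë"
        map["i"] = "î ï"
    }
    {
        for (ascii in map) {
            n = split(map[ascii], forms, " ")
            for (i = 1; i <= n; i++)
                gsub(forms[i], ascii)   # each form is a literal byte string
        }
        print
    }'
}

printf '%s\n' "Je vais à l'école" | deaccent
# prints: Je vais a l'ecole
```

Adding a letter means adding one table entry rather than another substitution command.
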
I still have a little hope for tr, but I will have to wait:
Quote:
Currently `tr' fully supports only single-byte characters.
Eventually it will support multibyte characters; when it does, the `-C'
option will cause it to complement the set of characters, whereas `-c'
will cause it to complement the set of values.
Eventually...

All that noise about i18n on Linux is only buzz, and I am not good enough at C to change the source code of tr.