Post 302209131 by ripat on Thursday 26th of June 2008 02:53:05 AM
replace UTF-8 characters with tr

Hi,

I am trying to get tr to replace multibyte characters with their ASCII equivalents. For example:

"Je vais à l'école" ---> "Je vais a l'ecole"

But my version of tr (5.97) doesn't seem to support multibyte sets.
Code:
$ locale charmap; echo "Je vais à l'école" | tr éà ea
UTF-8
Je vais aa l'aacole
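
The doubled "aa" is what you would expect if tr works on bytes rather than characters. A minimal sketch to check this, assuming the script file is saved as UTF-8 and `od` is available; `hex_bytes` is a hypothetical helper name, not something from my original script:

```shell
# hex_bytes is a hypothetical helper: print the bytes of its argument as hex.
hex_bytes() {
    printf '%s' "$1" | od -An -tx1 | tr -d ' \n'
    echo
}

hex_bytes 'é'   # c3a9 when this file is saved as UTF-8
hex_bytes 'à'   # c3a0 when this file is saved as UTF-8
```

Seen this way, the first set holds four bytes (c3 a9 c3 a0) and the second only two (e, a). GNU tr pads the shorter second set by repeating its last character, and when a byte is listed twice in the first set the last mapping wins, so c3, a9 and a0 all end up mapped to a. That would explain why each accented letter comes out as "aa".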

I would like to avoid multibyte-friendly tools like perl or python, because I want my script to work on platforms that don't have them. Sed could do the job with something like:
Code:
$ sed 's/[àâä]/a/g; s/[ÀÂÄ]/A/g; s/[éèêë]/e/g; s/[ÉÈÊË]/E/g; s/[îï]/i/g; s/[ÎÏ]/I/g'

but I find it rather clumsy and less elegant than tr.
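
One thing to watch with the bracket expressions: [àâä] only matches whole characters when sed runs in a UTF-8 locale, while in the C locale each byte is matched separately. A sketch of a variant that should not depend on the locale, assuming a sed that accepts -E (GNU and BSD sed do) and a script file saved as UTF-8; `strip_accents` is a hypothetical name and the character list is only the subset from the command above:

```shell
# strip_accents is a hypothetical filter: stdin -> stdout.
# LC_ALL=C makes sed treat input as bytes; alternation (a|b|c) then
# matches each multibyte sequence literally instead of byte by byte.
strip_accents() {
    LC_ALL=C sed -E \
        -e 's/à|â|ä/a/g' -e 's/À|Â|Ä/A/g' \
        -e 's/é|è|ê|ë/e/g' -e 's/É|È|Ê|Ë/E/g' \
        -e 's/î|ï/i/g' -e 's/Î|Ï/I/g'
}

echo "Je vais à l'école" | strip_accents   # Je vais a l'ecole
```

It is no shorter than the command above, but the result does not change with the locale the script happens to run in.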

I have also tried iconv to no avail.
 
