Sponsored Content
Top Forums Shell Programming and Scripting replace UTF-8 characters with tr Post 302209131 by ripat on Thursday 26th of June 2008 02:53:05 AM
Old 06-26-2008
replace UTF-8 characters with tr

Hi,

I try to get tr to replace multibytes characters by ascii equivalent. For example

"Je vais à l'école" ---> 'Je vais a l'ecole"

But my version of tr (5.97) doesn't seem to support multibyte sets.
Code:
$ locale charmap; echo "Je vais à l'école" | tr éà ea
UTF-8
Je vais aa l'aacole

I try to avoid using multibyte friendly tools like perl or python as I want my script to work on platforms that don't have these tools. Sed could do the job with something like:
Code:
$ sed 's/[àâä]/a/g; s/[ÀÂÄ]/A/g; s/[éèêë]/e/g; s/[ÉÈÊË]/E/g; s/[îï]/i/g; s/[ÎÏ]/I/g'

but I find it rather clumsy and less elegant than tr.

I have also tried iconv to no avail.
 

10 More Discussions You Might Find Interesting

1. UNIX for Dummies Questions & Answers

Replace Characters...

In a file, How do I replace a set number of characters in each line? For example.... substitute the first 54 characters of each line with mv? Thanks! Lisa (8 Replies)
Discussion started by: lgardner17325
8 Replies

2. Shell Programming and Scripting

Want to replace characters

Hi I have searched for a way to replace odd characters in a FOLDER NAME. All search-and-replace issues I have seen, only involves how to make search-and-replace on a FILE och with TEXT INSIDE a FILE. My problem is with the FOLDER NAME. My case is this: I have a couple of persons that every... (5 Replies)
Discussion started by: arndorff
5 Replies

3. Shell Programming and Scripting

Header Replace characters

Hi, I have a flat file with header with tab delimiter. nbr id name salesid detail num source num jun_2007 jul_2007 aug_2007 sep_2007 ....feb_2008 I need to modify the header for the columns nbr to Id1 jun_2007 to Jun07 jul_2007 to Jul07 aug_2007 to Aug07 sep_2007 to Sep07... (3 Replies)
Discussion started by: umathurumella
3 Replies

4. HP-UX

utf-8, problem with special characters

Hi all, We are facing the following problem in our HP-UX machine: software that manipulates utf-8 encoded strings (e.g. during string cut), fails to correctly manipulate strings (all containing Greek characters) that contain special characters like @, &, # etc. Actually, in different... (3 Replies)
Discussion started by: alina
3 Replies

5. Shell Programming and Scripting

How to replace characters with random characters

I've got a file (numbers.txt) filled with numbers and I want to replace each one of those numbers with a new random number between 0 and 9. This is my script so far: #!/bin/bash rand=$(($RANDOM % 9)) sed -i s//$rand/g numbers.txtThe problem that I have is that it replaces each number with just... (2 Replies)
Discussion started by: hellocatfood
2 Replies

6. Shell Programming and Scripting

how to replace characters using tr

Hi, I have a file which includes some French Characters and I want to change them to other characters like À to &Agrave; Â to &Acirc; É to &Eacute; ..... ..... and so on. I am tyring to use tr command like tr ÀÂÉ &Agrave;&Acirc;&Eacute; < input file But it does not work. Only... (2 Replies)
Discussion started by: naveed
2 Replies

7. Shell Programming and Scripting

Replace special characters with Escape characters?

i need to replace the any special characters with escape characters like below. test!=123-> test\!\=123 !@#$%^&*()-= to be replaced by \!\@\#\$\%\^\&\*\(\)\-\= (8 Replies)
Discussion started by: laknar
8 Replies

8. Linux

Help to Convert file from UNIX UTF-8 to Windows UTF-16

Hi, I have tried to convert a UTF-8 file to windows UTF-16 format file as below from unix machine unix2dos < testing.txt | iconv -f UTF-8 -t UTF-16 > out.txt and i am getting some chinese characters as below which l opened the converted file on windows machine. LANG=en_US.UTF-8... (3 Replies)
Discussion started by: phanidhar6039
3 Replies

9. Shell Programming and Scripting

Convert UTF-8 file to ASCII/ISO8859-1 OR replace characters

I am trying to develop a script which will work on a source UTF-8 file and perform one or more of the following It will accept the target encoding as an argument e.g. US-ASCII or ISO-8859-1, etc 1. It should replace all occurrences of characters outside target character set by " " (space) or... (3 Replies)
Discussion started by: hemkiran.s
3 Replies

10. Shell Programming and Scripting

Replace characters between $ and . with .

Hi - I have below in put to demo.txt /test/xyz/ibcdownload.jsp /test/xyz/pvxprogramtreeovermain.jsp /test/xyz/jtfrsrsr$HtmlTag.jsp /test/xyz/csdronumlov.jsp /test/xyz/iecvaluereset.jsp /test/xyz/ibecumpassignrole.jsp /test/xyz/ozfoffermarketmain.jsp output should be... (4 Replies)
Discussion started by: oraclermanpt
4 Replies
Locale::Codes::LangFam(3pm)				 Perl Programmers Reference Guide			       Locale::Codes::LangFam(3pm)

NAME
Locale::Codes::LangFam - standard codes for language extension identification SYNOPSIS
use Locale::Codes::LangFam; $lext = code2langfam('apa'); # $lext gets 'Apache languages' $code = langfam2code('Apache languages'); # $code gets 'apa' @codes = all_langfam_codes(); @names = all_langfam_names(); DESCRIPTION
The "Locale::Codes::LangFam" module provides access to standard codes used for identifying language families, such as those as defined in ISO 639-5. Most of the routines take an optional additional argument which specifies the code set to use. If not specified, the default ISO 639-5 language family codes will be used. SUPPORTED CODE SETS
There are several different code sets you can use for identifying language families. A code set may be specified using either a name, or a constant that is automatically exported by this module. For example, the two are equivalent: $lext = code2langfam('apa','alpha'); $lext = code2langfam('apa',LOCALE_LANGFAM_ALPHA); The codesets currently supported are: alpha This is the set of three-letter (lowercase) codes from ISO 639-5 such as 'apa' for Apache languages. This is the default code set. ROUTINES
code2langfam ( CODE [,CODESET] ) langfam2code ( NAME [,CODESET] ) langfam_code2code ( CODE ,CODESET ,CODESET2 ) all_langfam_codes ( [CODESET] ) all_langfam_names ( [CODESET] ) Locale::Codes::LangFam::rename_langfam ( CODE ,NEW_NAME [,CODESET] ) Locale::Codes::LangFam::add_langfam ( CODE ,NAME [,CODESET] ) Locale::Codes::LangFam::delete_langfam ( CODE [,CODESET] ) Locale::Codes::LangFam::add_langfam_alias ( NAME ,NEW_NAME ) Locale::Codes::LangFam::delete_langfam_alias ( NAME ) Locale::Codes::LangFam::rename_langfam_code ( CODE ,NEW_CODE [,CODESET] ) Locale::Codes::LangFam::add_langfam_code_alias ( CODE ,NEW_CODE [,CODESET] ) Locale::Codes::LangFam::delete_langfam_code_alias ( CODE [,CODESET] ) These routines are all documented in the Locale::Codes::API man page. SEE ALSO
Locale::Codes The Locale-Codes distribution. Locale::Codes::API The list of functions supported by this module. http://www.loc.gov/standards/iso639-5/id.php ISO 639-5 . AUTHOR
See Locale::Codes for full author history. Currently maintained by Sullivan Beck (sbeck@cpan.org). COPYRIGHT
Copyright (c) 2011-2013 Sullivan Beck This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself. perl v5.18.2 2013-11-04 Locale::Codes::LangFam(3pm)
All times are GMT -4. The time now is 06:04 PM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy