Replacing French special characters


 
# 1  
Old 07-09-2008

Hi,

I have tonnes of .txt files written in French. I need to replace the French special characters with their plain English equivalents (e.g. é -> e and ç -> c).

I have tried this:

---

#!/bin/bash
# Convert French characters to normal characters

# Treat each of the files

exec 3<&0
exec 0<frenchCharacters.txt

while read currentFrenchCharacter
do
read currentReplacementCharacter
sed -e "s/$currentFrenchCharacter/$currentReplacementCharacter/g" $1 > $1.frenchCharactersReplaced
mv $1.frenchCharactersReplaced $1
done

# Close the file
exec 3<&0

---

where "frenchCharacters.txt" contains a list of characters in pairs: the first of each pair is the character to find and the second is its replacement.

The problem is that it doesn't make any changes to the file that I pass in (stored in $1). Does anyone know why? Also, does anyone know of a better way to do this?
# 2  
Old 07-23-2008
Perhaps this might help

I don't think sed is the right tool for this. sed replaces one pattern with another; it does not lend itself to looking characters up in a list, as you were starting to do.

I used tr below. You will need to know the octal values of the characters in your character set.


Code:
> cat file1 
numero
telefono
vehiculo

> cat file1 | tr "u" "\372" | tr "e" "\351" | tr "i" "\354" >file2
> cat file2
núméro
téléfono
véhìcúlo

> cat file2 | tr "\372" "u" | tr "\351" "e" | tr "\354" "i" >file3
> cat file3
numero
telefono
vehiculo
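
The same idea extends to a fuller set of French lowercase letters in one call. A minimal sketch, assuming the file is ISO-8859-1 (one byte per character, which is what the octal codes above correspond to); latin1_unaccent is just a name made up for this example. On UTF-8 files the accented letters are multi-byte, and a byte-wise tr such as GNU tr will not match them, so convert the file first or use sed there.

```shell
# Hypothetical helper: strip common French accents with one tr call.
# Assumes the input is ISO-8859-1 (single byte per character).
latin1_unaccent() {
    # Source set, in order: à â ä ç è é ê ë î ï ô ö ù û ü (Latin-1 octal codes)
    tr '\340\342\344\347\350\351\352\353\356\357\364\366\371\373\374' \
       'aaaceeeeiioouuu'
}

# \351 is é in Latin-1, so this feeds "numéro" through the filter.
printf 'num\351ro\n' | latin1_unaccent
```

Uppercase letters would need their own codes added to both sets in the same positions.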

# 3  
Old 07-23-2008
No magic tool. Just sed:
Code:
sed 's/[àâä]/a/g; s/[ÀÂÄ]/A/g; s/[éèêë]/e/g; s/[ÉÈÊË]/E/g; s/[îï]/i/g;
s/[ÎÏ]/I/g; s/[ôö]/o/g; s/[ÖÔ]/O/g; s/[ûüù]/u/g; s/[ÛÜÙ]/U/g; s/ç/c/g; s/Ç/C/g' your_file

If you want sed to change your file in place, just add the -i switch.

Code:
sed -i 'sed command' your_file

But be careful: there is no way back.
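
If the pairs already live in a file, as in the original script, sed can still do the job in a single pass: turn the list into a sed script first. A rough sketch, assuming a hypothetical mapping file with one pair per line separated by a space (for example "é e") and containing nothing that is special to sed (/, &, \, ., [); unaccent_with_map is a made-up name. The mapping file has to be in the same encoding as the text files, otherwise nothing will match.

```shell
# Hypothetical helper: build a sed script from a mapping file and apply it.
# $1 = mapping file ("é e", one pair per line), $2 = file to convert.
# Writes the converted text to stdout; the input file is not modified.
unaccent_with_map() {
    map_file=$1
    target=$2
    # Turn every "from to" pair into one s/from/to/g line; skip blank lines.
    script=$(
        while read -r from to; do
            if [ -n "$from" ]; then
                printf 's/%s/%s/g\n' "$from" "$to"
            fi
        done < "$map_file"
    )
    sed "$script" "$target"
}
```

Usage would be something like: unaccent_with_map mapping.txt letter.txt > letter.clean.txt. Since it writes to stdout, you can look at the result before replacing anything.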
# 4  
Old 07-23-2008
My copy of sed supports y//; is that a GNU extension? Seems tailor-made for the problem.
# 5  
Old 07-24-2008
Well, yes. And it's even more elegant:

Code:
SPEC_CHAR="ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿ"
NORM_CHAR="AAAAAAACEEEEIIIIDNOOOOOOUUUUYPSaaaaaaaceeeeiiiionoooooouuuuyby"

sed -i.bk "y/$SPEC_CHAR/$NORM_CHAR/" file-to-process

The in-place switch, given a suffix as in -i.bk, keeps a backup of the original file with a .bk extension.
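
Since the original question was about tonnes of files, the same command can be pointed at a whole directory tree. A small sketch, assuming a sed that accepts -i with an attached suffix (GNU and BSD sed both do, but -i itself is not in POSIX, whereas y is); sed_txt_tree is a made-up name for this example:

```shell
# Hypothetical helper: run one sed script in place on every .txt file
# under a directory, keeping a .bk backup next to each original.
# $1 = directory to search, $2 = sed script to apply.
sed_txt_tree() {
    dir=$1
    script=$2
    find "$dir" -type f -name '*.txt' -exec sed -i.bk "$script" {} +
}
```

With the variables above it would be called as: sed_txt_tree . "y/$SPEC_CHAR/$NORM_CHAR/". Note that sed has to run in a locale that matches the encoding of the files (e.g. a UTF-8 locale for UTF-8 files); otherwise it sees the accented characters as several bytes each and may complain that the two y strings have different lengths.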