French accented characters in XML file come out as numbers

# 1  
12-14-2017

Hello all, I am using AIX 7.1. Whenever XML files containing accented French characters are read, the accented characters do not display correctly. For example, take the name Andree where the first e has an accent mark on top: AIX should show it as Andrée, but the first e comes out as strange numeric characters. What do I need to do to fix this? I want to test with one FTP user, such as itftp, by changing its profile and reading the file before making a global change in /etc/environment. Please help me fix this. I have tried changing the language to en_US.UTF-8, and it still reads incorrectly.

I have the following in the .profile for the itftp user:
Code:
LANG=en_US.UTF-8
export LC_ALL=en_US.UTF-8

Thank you

Last edited by rbatte1; 12-15-2017 at 08:40 AM.. Reason: Added CODE tags
# 2  
12-14-2017
You need to decide whether you want to see English or French. English locales don't have accented vowels.

You might (or might not) have some luck with:
Code:
unset LC_ALL
export LANG=en_US.UTF-8
export LC_CTYPE=fr_FR.UTF-8

assuming that the French locales are loaded on your AIX system.
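That suggestion assumes the French locales are installed. One way to check is to look for the exact name in the output of `locale -a`. Below is a minimal sketch using a hypothetical helper, `have_locale`. It matches names exactly, and spellings differ between systems: the AIX list later in this thread shows `fr_FR.UTF-8`, while glibc-based Linux systems typically list the same locale as `fr_FR.utf8`.

```shell
#!/bin/sh
# have_locale NAME: exit 0 if NAME appears, exactly as written, in the list
# printed by `locale -a`; exit 1 otherwise. have_locale is a hypothetical name.
have_locale() {
    # -F fixed string, -x whole-line match, -q no output; "--" ends options.
    # 2>/dev/null hides the error if the locale command is unavailable.
    locale -a 2>/dev/null | grep -Fxq -- "$1"
}

# Example: check the two locale names used in this thread.
for name in fr_FR.UTF-8 en_US.UTF-8; do
    if have_locale "$name"; then
        echo "$name: installed"
    else
        echo "$name: not listed by locale -a"
    fi
done
```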
This User Gave Thanks to Don Cragun For This Post:
# 3  
12-15-2017
Thank you, Don. It is a regular XML file that contains French names once in a while, and it needs to read both.

---------- Post updated at 11:49 PM ---------- Previous update was at 11:48 PM ----------

Don, what does the second command, export LC_CTYPE=fr_FR.UTF-8, do?

Last edited by rbatte1; 12-15-2017 at 08:41 AM.. Reason: Added ICODE tags
# 4  
12-15-2017
Quote:
Originally Posted by pregmi
Thank you Don. It is regular xml file that would have French Names once in a while and it needs to read both.

---------- Post updated at 11:49 PM ---------- Previous update was at 11:48 PM ----------

Don what does the second command

export LC_CTYPE=fr_FR.UTF-8 do?
It uses a French locale for the definition of characters that are to be considered valid when looking at strings, character classes, etc. Setting LC_ALL to any value overrides any values assigned to LANG and all of the other LC_* locale setting environment variables (which is why the first step in my suggestion was to undefine LC_ALL).

But, of course, I don't have an AIX system to test this on. I just guessed the name of a French locale from the naming convention suggested by the name of your English UTF-8 locale.
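The precedence described above can be written out as a small shell function. This is a minimal sketch of the POSIX rules (a non-empty LC_ALL first, then the variable for the category such as LC_CTYPE, then LANG, then an implementation-defined default that this sketch assumes is the POSIX locale). `effective_locale` is a hypothetical name; it inspects shell variables whether or not they are exported, and it does not check that the named locale is installed. On a real system, running `locale` with no arguments reports the values actually in effect.

```shell
#!/bin/sh
# effective_locale CATEGORY: print the locale that the POSIX precedence rules
# select for CATEGORY (for example LC_CTYPE) from the current shell variables.
# effective_locale is a hypothetical helper, not a system command.
effective_locale() {
    _cat=$1
    # Only accept the standard category names, since the name goes into eval.
    case $_cat in
        LC_COLLATE|LC_CTYPE|LC_MESSAGES|LC_MONETARY|LC_NUMERIC|LC_TIME) ;;
        *) echo "effective_locale: unknown category: $_cat" >&2; return 2 ;;
    esac
    # 1. A non-empty LC_ALL overrides LANG and every other LC_* variable.
    if [ -n "${LC_ALL:-}" ]; then
        printf '%s\n' "$LC_ALL"
        return 0
    fi
    # 2. Otherwise a non-empty variable named after the category is used.
    eval "_val=\${$_cat:-}"
    if [ -n "$_val" ]; then
        printf '%s\n' "$_val"
        return 0
    fi
    # 3. Otherwise LANG is used. If LANG is empty too, the default is
    #    implementation-defined; this sketch assumes the POSIX locale.
    if [ -n "${LANG:-}" ]; then
        printf '%s\n' "$LANG"
    else
        printf '%s\n' POSIX
    fi
}

# LC_ALL still set, as in the original .profile: LC_CTYPE is ignored.
( LANG=en_US.UTF-8; LC_ALL=en_US.UTF-8; LC_CTYPE=fr_FR.UTF-8
  echo "LC_CTYPE resolves to $(effective_locale LC_CTYPE)" )
# Don's suggestion: LC_ALL unset, so LC_CTYPE takes effect.
( unset LC_ALL; LANG=en_US.UTF-8; LC_CTYPE=fr_FR.UTF-8
  echo "LC_CTYPE resolves to $(effective_locale LC_CTYPE)" )
```

The two subshells at the end contrast the original .profile (LC_ALL still set) with the suggested settings; the parentheses keep those assignments from touching the calling shell.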
# 5  
12-15-2017
Still the same problem, Don. I have the locales loaded, and I have this in the itftp .profile:

Code:
 unset LC_ALL
export LANG=en.US.UTF-8
export LC_CTYPE=fr_FR.UTF-8
  
 [root@teamaix]/app/user/itftp ->locale -a
C
POSIX
EN_US.UTF-8
EN_US
FR_CA.UTF-8
FR_CA
FR_FR.UTF-8@euro
FR_FR.UTF-8@preeuro
FR_FR.UTF-8
FR_FR@euro
FR_FR@preeuro
FR_FR
en_US.8859-15
en_US.ISO8859-1
en_US.UTF-8
en_US
fr_BE.8859-15@euro
fr_BE.8859-15@preeuro
fr_BE.8859-15
fr_BE.IBM-1252@euro
fr_BE.IBM-1252@preeuro
fr_BE.IBM-1252
fr_BE.ISO8859-1
fr_BE
fr_CA.8859-15
fr_CA.ISO8859-1
fr_CA.UTF-8
fr_CA
fr_CH.8859-15
fr_CH.ISO8859-1
fr_CH
fr_FR.UTF-8
fr_LU.8859-15@euro
fr_LU.8859-15@preeuro
fr_LU.8859-15
fr_LU@euro
fr_LU@preeuro
fr_LU

But when I read the XML file, I still get the same error:

Code:
 teamaix(itftp): /app/user/itftp -> grep Andr F18GRAD014.xml
<FirstName>Andrée</FirstName>---->This one
<FirstName>Andrew</FirstName>

Moderator's Comments:
Please use CODE tags when displaying sample input, output, and code segments, as required by forum rules.

Last edited by Don Cragun; 12-15-2017 at 01:36 PM.. Reason: Add CODE tags.
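Since grep copies the bytes of matching lines to the terminal unchanged, it may help to check how the é is actually stored in the file. In UTF-8 it is the two bytes 0xC3 0xA9; in ISO8859-1 it is the single byte 0xE9. The sketch below uses a hypothetical helper, `check_e_acute`, that prints the encoding named in the XML declaration (if any) and reports which of the two byte forms appear in the file.

```shell
#!/bin/sh
# check_e_acute FILE: report how the letter é is stored in FILE.
# check_e_acute is a hypothetical helper name.
#   UTF-8:     é is the two bytes 0xC3 0xA9
#   ISO8859-1: é is the single byte 0xE9
check_e_acute() {
    file=$1
    # Encoding named in the XML declaration on line 1 (single or double
    # quotes). LC_ALL=C lets sed handle any byte without encoding errors.
    decl=$(LC_ALL=C sed -n "1s/.*encoding=[\"']\([^\"']*\).*/\1/p" "$file")
    echo "declared encoding: ${decl:-none}"

    utf8=$(printf '\303\251')   # é encoded as UTF-8
    latin1=$(printf '\351')     # é encoded as ISO8859-1
    # LC_ALL=C makes grep compare raw bytes rather than characters.
    if LC_ALL=C grep -Fq -- "$utf8" "$file"; then
        echo "found UTF-8 e-acute (bytes c3 a9)"
    fi
    # A lone 0xE9 can also begin some three-byte UTF-8 characters,
    # so treat this result as a hint rather than proof.
    if LC_ALL=C grep -Fq -- "$latin1" "$file"; then
        echo "found ISO8859-1 e-acute (byte e9)"
    fi
}
```

For example, `check_e_acute F18GRAD014.xml`. To inspect the raw bytes directly, `grep Andr F18GRAD014.xml | od -c` shows non-printable bytes as octal escapes. If the bytes do not match the encoding your terminal emulator is set to, the name may still look wrong on screen regardless of the LC_* settings.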
# 6  
12-15-2017
I'm confused by what you have shown us.
Setting these locale environment variables in the itftp user's .profile file will not affect the output of commands you run in your shell while you are logged in as yourself. Try running the following commands in your shell and tell us what happens:
Code:
( unset LC_ALL
# export so that grep inherits these values
export LANG=en_US.UTF-8
export LC_CTYPE=fr_FR.UTF-8
grep Andr F18GRAD014.xml
)

(Note that the parentheses run all of these commands in a subshell environment, so the locale environment variables outside of this subshell are not affected. If it doesn't work, you haven't modified your current shell execution environment.)

If it does work, you can decide whether you want to make the changes I suggested to your .profile file, log out, and log in again so that all future commands you run use these locale settings, or whether you want to type in these commands only when you run certain commands (like this grep) in the future.
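For the second option, running only certain commands with these settings, a small wrapper function could do the typing for you. This is a sketch with a hypothetical name, `with_fr_ctype`. It runs the command in a subshell so the calling shell's locale variables are left alone, and it uses `export` because a plain assignment to a variable that is not already exported is not passed to child processes such as grep.

```shell
#!/bin/sh
# with_fr_ctype COMMAND [ARG...]: run one command with the locale settings
# suggested in this thread, leaving the calling shell unchanged.
# with_fr_ctype is a hypothetical helper; fr_FR.UTF-8 must be installed.
with_fr_ctype() {
    (
        # Everything inside ( ) runs in a subshell, so the unset and the
        # exports below are discarded when the command finishes.
        unset LC_ALL
        # export makes sure COMMAND inherits the values even if these
        # variables were not already in the environment.
        export LANG=en_US.UTF-8
        export LC_CTYPE=fr_FR.UTF-8
        exec "$@"
    )
}

# Example: show the locale variables the wrapped command would see.
with_fr_ctype env | grep -E '^(LANG|LC_)'
```

With the thread's file, the call would be `with_fr_ctype grep Andr F18GRAD014.xml`.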