Operating Systems > AIX
French Accented characters in xml file comes as numbers
Post 303009513 by pregmi on Friday 15th of December 2017 08:42:33 AM
Still the same problem, Don. I have the locales loaded, and I have this in the itftp user's .profile:

Code:
 unset LC_ALL
export LANG=en.US.UTF-8
export LC_CTYPE=fr_FR.UTF-8
  
 [root@teamaix]/app/user/itftp ->locale -a
C
POSIX
EN_US.UTF-8
EN_US
FR_CA.UTF-8
FR_CA
FR_FR.UTF-8@euro
FR_FR.UTF-8@preeuro
FR_FR.UTF-8
FR_FR@euro
FR_FR@preeuro
FR_FR
en_US.8859-15
en_US.ISO8859-1
en_US.UTF-8
en_US
fr_BE.8859-15@euro
fr_BE.8859-15@preeuro
fr_BE.8859-15
fr_BE.IBM-1252@euro
fr_BE.IBM-1252@preeuro
fr_BE.IBM-1252
fr_BE.ISO8859-1
fr_BE
fr_CA.8859-15
fr_CA.ISO8859-1
fr_CA.UTF-8
fr_CA
fr_CH.8859-15
fr_CH.ISO8859-1
fr_CH
fr_FR.UTF-8
fr_LU.8859-15@euro
fr_LU.8859-15@preeuro
fr_LU.8859-15
fr_LU@euro
fr_LU@preeuro
fr_LU

But when I read the XML file, I still get the same error.

Code:
 teamaix(itftp): /app/user/itftp -> grep Andr F18GRAD014.xml
<FirstName>Andrée</FirstName>---->This one
<FirstName>Andrew</FirstName>
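
One way to narrow this down is to look at the bytes actually stored in the file, independently of the terminal and the locale. The following is only a sketch, assuming a POSIX shell with grep and od; the helper name show_bytes and the throwaway sample file are hypothetical and not from the thread. In the hex dump, c3 a9 would mean the é is stored as UTF-8, a lone e9 would suggest ISO8859-1 or 8859-15, and 26 23 32 33 33 3b would mean the file contains the numeric character reference &#233; rather than the character itself.

```shell
# Hypothetical helper: dump the lines matching a pattern as hex bytes.
# LC_ALL=C makes grep treat the input as plain bytes, whatever the locale.
show_bytes() {
    # $1 = ASCII pattern, $2 = file to inspect
    LC_ALL=C grep "$1" "$2" | od -An -tx1
}

# Demo on a throwaway sample that stores the e-acute as UTF-8 (octal 303 251).
# On the real system the call would be: show_bytes Andr F18GRAD014.xml
sample="${TMPDIR:-/tmp}/show_bytes_demo.$$"
printf '<FirstName>Andr\303\251e</FirstName>\n' > "$sample"
show_bytes Andr "$sample"
rm -f "$sample"
```

Separately, the LANG value en.US.UTF-8 in the profile does not match any name in the locale -a listing (en_US.UTF-8 does), so that may be worth double-checking, although LC_CTYPE is set explicitly and should take precedence for character handling.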

Moderator's Comments:
Please use CODE tags when displaying sample input, output, and code segments, as required by forum rules.

Last edited by Don Cragun; 12-15-2017 at 01:36 PM. Reason: Add CODE tags.
 

euro(5) 							File Formats Manual							   euro(5)

NAME
euro, Euro, EUR - Euro currency sign

DESCRIPTION
The Euro currency is the new currency for European countries belonging to the Economic and Monetary Union (EMU). Euro currency is scheduled for introduction on January 1, 1999. By the end of 2002, the new currency should completely replace local currencies for EMU member countries.

The Euro currency has its own euro currency sign, which looks like an equal sign (=) superimposed on the capital letter C. Most character sets do not support this sign. Note that the string EUR can be prepended before monetary amounts in Euro currency in the same way USD is sometimes used to specify U.S. dollars in certain kinds of financial reports. However, for the euro character itself, the string C= is the closest representation that most of the current character sets support and this approximation is not appropriate for some applications.

Several character sets have been updated or invented to include the euro character. Among these are:

  Unicode Version 2.1
  ISO/IEC 8859-15 (Latin-9)
  Certain DOS and Microsoft code pages

The following table specifies the encoding position of the euro character in each of these character sets:

--------------------------------------------
Character Set                  Euro Position
--------------------------------------------
Unicode Version 2.1            0x20AC
ISO/IEC 8859-15 (Latin-9)      0xA4
CP1250 (Windows Latin-2)       0x80
CP1251 (Windows Cyrillic)      0x88
CP1252 (Windows Latin-1)       0x80
CP1253 (Windows Greek)         0x80
CP1254 (Windows Turkish)       0x80
CP1255 (Windows Hebrew)        0x80
CP1256 (Windows Arabic)        0x80
CP1257 (Windows Baltic)        0x80
CP1258 (Windows Vietnamese)    0x80
CP874 (DOS Thai)               0x80
--------------------------------------------

Locales That Support the Euro Character

Tru64 UNIX locales that support the euro character use either the UTF-8 or ISO 8859-15 codeset.
The following table lists these locales by language and country:

  ca_ES.UTF-8, ca_ES.ISO8859-15
  da_DK.UTF-8, da_DK.ISO8859-15
  nl_NL.UTF-8, nl_NL.ISO8859-15
  de_DE.UTF-8, de_DE.ISO8859-15
  de_CH.UTF-8, de_CH.ISO8859-15
  en_GB.UTF-8, en_GB.ISO8859-15
  en_EU.UTF-8@euro (This is a special-purpose locale that is explained following the list.)
  en_US.UTF-8, en_US.UTF-8@euro, en_US.ISO8859-15
  fi_FI.UTF-8, fi_FI.ISO8859-15
  nl_BE.UTF-8, nl_BE.ISO8859-15
  fr_BE.UTF-8, fr_BE.ISO8859-15
  fr_CA.UTF-8, fr_CA.ISO8859-15
  fr_FR.UTF-8, fr_FR.ISO8859-15
  fr_CH.UTF-8, fr_CH.ISO8859-15
  is_IS.UTF-8, is_IS.ISO8859-15
  it_IT.UTF-8, it_IT.ISO8859-15
  no_NO.UTF-8, no_NO.ISO8859-15
  pt_PT.UTF-8, pt_PT.ISO8859-15
  es_ES.UTF-8, es_ES.ISO8859-15
  sv_SE.UTF-8, sv_SE.ISO8859-15

CDE users can select locales by using the Language menu at session login time and selecting languages whose names are followed by "(Unicode)." Alternatively, users can set the LANG environment variable to one of the locales in a terminal emulation window. The Latin-9 locales can be set in a terminal emulation window. When set in a terminal emulation window, the locale setting applies to child applications subsequently invoked from that window.

The @euro locale variants provide LC_MONETARY definitions for the euro character and are intended for assignment specifically to the LC_MONETARY locale variable. In these locales, the local currency sign is defined to be the euro character and the international currency sign is defined to be EUR. The en_US.UTF-8@euro locale defines the radix point to be the period (.) and the thousands separator to be the comma (,). The en_EU.UTF-8@euro locale reverses these character assignments; the radix point is a comma (,) and the thousands separator is a period (.). Because en_EU.UTF-8@euro is intended for assignment only to LC_MONETARY, the locale is useful for languages other than English.
For example, support for the euro character in Germany can be obtained by setting LANG to de_DE.UTF-8 and LC_MONETARY to en_EU.UTF-8@euro.

Note

The LC_ALL environment variable overrides settings of all locale category variables, such as LC_MONETARY. When setting LC_MONETARY to be different from settings for the remainder of locale categories, be sure to use the LANG, not the LC_ALL, environment variable.

Applications that currently assume that one character of data is represented by one byte of data in file code can more easily support the euro character by running in a Latin-9 locale rather than a UTF-8 locale. Because UTF-8 is basically a multibyte character encoding format, programmers cannot assume that one character is equal to one byte of input data. To run in a UTF-8 locale, applications should use functions that handle multibyte and wide-character data rather than older functions that operate only on single-byte characters. For more information on this topic, see Writing Software for the International Market. For more information about UTF-8 and UCS-4 encoding formats, see Unicode(5).

Codeset Converters That Support the Euro Character

Codeset converters are available to convert data between encoding formats that support the euro character. Codeset converters can convert file data between the following formats:

  Unicode encoding formats and the 874 and 125* codepages
  Unicode encoding formats and ISO 8859-15 (Latin-9)

For more information about these codeset converters, see iconv_intro(5), Unicode(5), code_page(5), and iso8859-15(5).

Keyboard Entry of the Euro Character

Depending on locale setting and keyboard style, you can use particular key sequences to enter the euro character. When using a UTF-8 or Latin-9 locale and a keyboard that supports the Compose-character entry method, you can use the Compose key input method to enter the euro character. For Compose-key input, you press and release certain keys in sequence, starting with the key defined as the Compose key.
For the euro character, use one of the following two sequences:

  Compose C =
  Compose = C

The following table lists more efficient key sequences that are supported for specific languages and keyboard styles. Note that the key sequences in the table are supported only by xkb format keymaps (which are the default for CDE users). When using these key sequences, you hold down the first key while pressing the other.

-----------------------------------------------------------
Keymap Description    VT-Style Keyboard   PC-Style Keyboard
-----------------------------------------------------------
Belgian               Left Compose+E      Right Alt+E
Czech                 Left Compose+E      Right Alt+E
Danish                Left Compose+E      Right Alt+E
Dutch                 Left Compose+E      Right Alt+E
English Canadian      Left Compose+E      Right Alt+E
Finnish               Left Compose+E      Right Alt+E
Flemish               Left Compose+E      Right Alt+E
French                Left Compose+E      Right Alt+E
French Canadian       Left Compose+E      Right Alt+E
Swiss French          Left Compose+E      Right Alt+E
German                Left Compose+E      Right Alt+E
Swiss German          Left Compose+E      Right Alt+E
Hungarian             Left Compose+E      Right Alt+E
Italian               Left Compose+E      Right Alt+E
Lithuanian            Left Compose+E      Right Alt+E
Norwegian             Left Compose+E      Right Alt+E
Polish                Left Compose+U      Right Alt+u
Portuguese            None                Right Alt+E
Serb/Croat/Slovene    Left Compose+E      Right Alt+E
Slovak                Left Compose+E      Right Alt+E
Spanish               Left Compose+E      Right Alt+E
Swedish               Left Compose+E      Right Alt+E
Turkish               Left Compose+E      Right Alt+E
United Kingdom        Left Compose+4      Right Alt+4
-----------------------------------------------------------

For more information about keyboards, keymaps, and character-entry methods, see keyboard(5).

Font Support for the Euro Character

The operating system does not provide native Unicode fonts that include glyphs for the euro character. However, the character is supported by a set of Latin-9 fonts. The X font library has been extended to combine a number of fonts together to provide logical Unicode fonts for applications to use.
The names of these logical fonts end with ISO10646-1. You can use the xlsfonts utility to find out if these fonts are installed on your system.

Printer Support for the Euro Character

Printing of file data in UTF-8 or Latin-9 format is supported by a generic PostScript print filter. See wwpsof(8) for information on how to configure this print filter.

SEE ALSO
Commands: xlsfonts(1X), wwpsof(8)

Others: code_page(5), i18n_intro(5), i18n_printing(5), iconv_intro(5), iso8859-15(5), keyboard(5), l10n_intro(5), Unicode(5)

Writing Software for the International Market
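
As a rough cross-check of the euro encoding positions tabulated in the euro(5) DESCRIPTION, the sketch below converts the euro sign from UTF-8 into a few of the listed codesets and prints the resulting bytes in hex. It is not part of the man page. The helper name euro_bytes is hypothetical, and the codeset names are the ones GNU iconv accepts, which is an assumption; other systems may spell them differently (the locale listing earlier on this page uses 8859-15 and IBM-1252, for instance).

```shell
# Hypothetical helper: print the bytes that encode the euro sign (U+20AC)
# in a given target codeset, as lowercase hex without separators.
euro_bytes() {
    # $1 = target codeset name as understood by the local iconv (assumption)
    # The octal escapes 342 202 254 are the UTF-8 encoding of U+20AC.
    printf '\342\202\254' | iconv -f UTF-8 -t "$1" | od -An -tx1 | tr -d ' \n'
    echo
}

# Compare a few codesets against the table: one byte for the 8-bit
# codesets, three bytes for UTF-8.
for cs in UTF-8 ISO-8859-15 CP1252 CP1251; do
    printf '%-12s %s\n' "$cs" "$(euro_bytes "$cs")"
done
```

The single-byte results should line up with the 0xA4, 0x80, and 0x88 entries in the table, while the three-byte UTF-8 result illustrates why one character cannot be assumed to equal one byte in a UTF-8 locale.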