strlen for UTF-8

10 More Discussions You Might Find Interesting

1. Programming

Problems with Strlen

Hello, I have a problem with strlen. I have written this: for(y=13,z=0; cInBuf!=' ';y++) { cBuf=cInBuf; z++; } len = strlen(cBuf); out=len/2; fprintf(outfile,"F%i",out); If strlen is e.g. 22, it writes F22. I want to write F2F2. How can I do this?...

2. Shell Programming and Scripting

Problem with the strlen function in ksh

Hello, Just a little problem with the ksh function : strlen I want to use this function in this little ksh program : while read line ; do TOTO=$line TOTONB=strlen($TOTO) echo $TOTONB

3. Shell Programming and Scripting

UTF 8 and SED

Colleagues, I tried to manipulate UTF-8 data using the following script: cat $1 | sed 's/ലായി$/ലായി LAYI/g' | sed 's/ുടെ/ുടെ UTE/g' | sed 's/യില്*/യില്* YIL/g' But it says "cannot execute binary file". Any solution? Jaganadh, Linguist

4. Programming

'strlen' of a constant string

In a declaration, I have: const char comment_begin = ""; const int comment_begin_len = strlen(comment_begin); const int comment_end_len = strlen(comment_end); When I compile, I get the warnings: emhttpc.c:64: warning: initializer element is not...

5. Programming

pointer arithmetic vs. strlen() & strnlen()?

I have been getting some flack recently for my use of strlen() and strnlen(). Honestly I have always just taken their functionality for granted as being the easiest way of getting the length of a string. Is it really so much better to do pointer arithmetic? What am I gaining besides more...

6. UNIX for Advanced & Expert Users

vi and UTF-8 errors

We just installed icu for UTF-8 compliance on our AIX 5.3 system. While using vi on some files we get the following error: ex: 0602-169 Incomplete or invalid multibyte character encountere yte character encountered, conversion failed.ex: 0602-169 Incomplete or invalidb ractersultibyte...

7. UNIX for Dummies Questions & Answers

UTF-8 in xterm

I need to use sort, uniq, grep, wc,... and the like to work with lists of words in UTF-8 (the "words" being phonetic transcriptions using the IPA). I have been using Google a lot and I even found at least one previous post on this topic, but it didn't help. I tried following the instructions...

8. Linux

Help to Convert file from UNIX UTF-8 to Windows UTF-16

Hi, I tried to convert a UTF-8 file to a Windows UTF-16 file on a Unix machine as follows: unix2dos < testing.txt | iconv -f UTF-8 -t UTF-16 > out.txt and I am getting some Chinese characters, shown below, when I open the converted file on a Windows machine. LANG=en_US.UTF-8...

9. Programming

Segment fault related to strlen.S

Hello, This function was copied into my code, which was compiled without error/warning, but when executed there is always Segmentation fault at the end after the output (which seems correct!): void get_hashes(unsigned int hash, unsigned char *in) { unsigned char *str = in; int pos =...

10. Shell Programming and Scripting

PHP: declared variables, strlen vs isset

greetings, pretty new to php and i think i might be missing some fundamental limitation of isset. i have two php scripts below that are executed by crond, one using --host X and one that does not. and below that are three different attempts at generating a command line that will be executed. the...

euro(5) 							File Formats Manual							   euro(5)

NAME

       euro, Euro, EUR - Euro currency sign

DESCRIPTION

       The Euro currency is the new currency for European countries belonging to the Economic and Monetary Union (EMU). Euro currency is scheduled
       for introduction on January 1, 1999. By the end of 2002, the new currency should completely replace local currencies for EMU  member  coun-
       tries.

       The  Euro  currency has its own euro currency sign, which looks like an equal sign (=) superimposed on the capital letter C. Most character
       sets do not support this sign. Note that the string EUR can be prepended before monetary amounts in Euro currency in the same  way  USD	is
       sometimes used to specify U. S. dollars in certain kinds of financial reports. However, for the euro character itself, the string C= is the
       closest representation that most of the current character sets support and this approximation is not appropriate for some applications.

       Several character sets have been updated or invented to include the euro character. Among these are:

              Unicode Version 2.1
              ISO/IEC 8859-15 (Latin-9)
              Certain DOS and Microsoft code pages

       The following table specifies the encoding position of the euro character in each of these character sets:

       --------------------------------------------
       Character Set		     Euro Position
       --------------------------------------------
       Unicode Version 2.1	     0x20AC
       ISO/IEC 8859-15 (Latin-9)     0xA4
       CP1250 (Windows Latin-2)      0x80
       CP1251 (Windows Cyrillic)     0x88
       CP1252 (Windows Latin-1)      0x80
       CP1253 (Windows Greek)	     0x80
       CP1254 (Windows Turkish)      0x80
       CP1255 (Windows Hebrew)	     0x80
       CP1256 (Windows Arabic)	     0x80
       CP1257 (Windows Baltic)	     0x80
       CP1258 (Windows Vietnamese)   0x80
       CP874 (DOS Thai) 	     0x80
       --------------------------------------------
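
       The Unicode position 0x20AC in the table is a code point, not a byte sequence. In UTF-8 file code it occupies three bytes (0xE2 0x82 0xAC),
       whereas the Latin-9 position 0xA4 is a single byte. The following is a minimal sketch of the standard UTF-8 bit layout; utf8_encode is a
       name chosen here for illustration and is not a function this manual page or the operating system provides.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode one Unicode code point as UTF-8 into out (room for 4 bytes).
 * Returns the number of bytes written, or 0 if cp cannot be encoded.
 * utf8_encode is a hypothetical helper, not a system library call. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {                       /* 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                      /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                    /* 1110xxxx 10xxxxxx 10xxxxxx */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                      /* surrogates are not characters */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                  /* 11110xxx followed by 3 x 10xxxxxx */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                              /* beyond the Unicode range */
}
```

       Calling utf8_encode(0x20AC, buf) fills buf with 0xE2, 0x82, 0xAC and returns 3.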

   Locales That Support the Euro Character
       Tru64 UNIX locales that support the euro character use either the UTF-8 or ISO 8859-15 codeset. The following list gives these locales,
       one language and country per line:

              ca_ES.UTF-8, ca_ES.ISO8859-15
              da_DK.UTF-8, da_DK.ISO8859-15
              nl_NL.UTF-8, nl_NL.ISO8859-15
              de_DE.UTF-8, de_DE.ISO8859-15
              de_CH.UTF-8, de_CH.ISO8859-15
              en_GB.UTF-8, en_GB.ISO8859-15
              en_EU.UTF-8@euro (This is a special-purpose locale that is explained following the list.)
              en_US.UTF-8, en_US.UTF-8@euro, en_US.ISO8859-15
              fi_FI.UTF-8, fi_FI.ISO8859-15
              nl_BE.UTF-8, nl_BE.ISO8859-15
              fr_BE.UTF-8, fr_BE.ISO8859-15
              fr_CA.UTF-8, fr_CA.ISO8859-15
              fr_FR.UTF-8, fr_FR.ISO8859-15
              fr_CH.UTF-8, fr_CH.ISO8859-15
              is_IS.UTF-8, is_IS.ISO8859-15
              it_IT.UTF-8, it_IT.ISO8859-15
              no_NO.UTF-8, no_NO.ISO8859-15
              pt_PT.UTF-8, pt_PT.ISO8859-15
              es_ES.UTF-8, es_ES.ISO8859-15
              sv_SE.UTF-8, sv_SE.ISO8859-15

       CDE  users  can	select locales by using the Language menu at session login time and selecting languages whose names are followed by "(Uni-
       code)." Alternatively, users can set the LANG environment variable to one of the locales  in  a	terminal  emulation  window.  The  Latin-9
       locales	can  be  set in a terminal emulation window. When set in a terminal emulation window, the locale setting applies to child applica-
       tions subsequently invoked from that window.

       The @euro locale variants provide LC_MONETARY definitions for the euro character and  are  intended  for  assignment  specifically  to  the
       LC_MONETARY  locale  variable. In these locales, the local currency sign is defined to be the euro character and the international currency
       sign is defined to be EUR. The en_US.UTF-8@euro locale defines the radix point to be the period (.) and the thousands separator to  be  the
       comma (,). The en_EU.UTF-8@euro locale reverses these character assignments; the radix point is a comma(,) and the thousands separator is a
       period (.). Because en_EU.UTF-8@euro is intended for assignment only to LC_MONETARY, the locale is useful for languages other than English.
       For example, support for the euro character in Germany can be obtained by setting LANG to de_DE.UTF-8 and LC_MONETARY to en_EU.UTF-8@euro.

									  Note

       The  LC_ALL  environment variable overrides settings of all locale category variables, such as LC_MONETARY.  When setting LC_MONETARY to be
       different from settings for the remainder of locale categories, be sure to use the LANG, not the LC_ALL, environment variable.
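
       The precedence described in this note can be sketched as a small lookup. This is an illustration only: effective_locale is a hypothetical
       helper, an empty value is treated as unset, and the final "C" fallback is an assumption because the default is implementation-defined.

```c
#include <stddef.h>

/* Return the locale name that takes effect for one locale category.
 * Precedence: LC_ALL first, then the category variable (for example
 * LC_MONETARY), then LANG. NULL or "" means the variable is not set.
 * effective_locale is a hypothetical helper written for illustration. */
const char *effective_locale(const char *lc_all,
                             const char *lc_category,
                             const char *lang)
{
    if (lc_all != NULL && lc_all[0] != '\0')
        return lc_all;            /* LC_ALL overrides everything else */
    if (lc_category != NULL && lc_category[0] != '\0')
        return lc_category;       /* category-specific setting */
    if (lang != NULL && lang[0] != '\0')
        return lang;              /* general default */
    return "C";                   /* assumed fallback when nothing is set */
}
```

       With LANG set to de_DE.UTF-8 and LC_MONETARY set to en_EU.UTF-8@euro, the monetary category resolves to en_EU.UTF-8@euro. If LC_ALL is
       also set to de_DE.UTF-8, that value wins and the @euro definitions are never consulted. A program would typically pass
       getenv("LC_ALL"), getenv("LC_MONETARY"), and getenv("LANG") as the three arguments.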

       Applications that currently assume that one character of data is represented by one byte of data in file code can more easily support the
       euro character by running in a Latin-9 locale rather than a UTF-8 locale. Because UTF-8 is basically a multibyte character encoding
       format, programmers cannot assume that one character is equal to one byte of input data. To run in a UTF-8 locale, applications should
       use functions that handle multibyte and wide-character data rather than older functions that operate only on single-byte characters. For
       more information on this topic, see Writing Software for the International Market. For more information about UTF-8 and UCS-4 encoding
       formats, see Unicode(5).
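
       As a minimal sketch of the difference between bytes and characters, the helper below counts characters in a UTF-8 string by skipping
       continuation bytes (those of the form 10xxxxxx), while strlen() counts bytes. utf8_strlen is a hypothetical name, and the sketch assumes
       the input is valid UTF-8. In a program that has selected a UTF-8 locale with setlocale(), the standard call mbstowcs(NULL, s, 0) is
       another way to obtain a character count.

```c
#include <stddef.h>
#include <string.h>

/* Count the characters (code points) in a NUL-terminated UTF-8 string.
 * Every character contributes exactly one byte that is NOT a
 * continuation byte, so counting those bytes counts characters.
 * Assumes valid UTF-8; utf8_strlen is a hypothetical helper. */
size_t utf8_strlen(const char *s)
{
    size_t count = 0;
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            count++;
    }
    return count;
}
```

       For the string "\xE2\x82\xAC" "5" (the euro character followed by the digit 5), strlen() returns 4 and utf8_strlen() returns 2.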

   Codeset Converters That Support the Euro Character
       Codeset converters are available to convert data between encoding formats that support the euro character. Codeset converters can convert
       file data between the following formats:

              Unicode encoding formats and the 874 and 125* codepages
              Unicode encoding formats and ISO 8859-15 (Latin-9)

       For more information about these codeset converters, see iconv_intro(5), Unicode(5), code_page(5), and iso8859-15(5).
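
       From a program, the same kind of conversion is available through the iconv_open(), iconv(), and iconv_close() functions. The sketch below
       converts Latin-9 data to UTF-8. Codeset names are platform-specific: the strings "ISO-8859-15" and "UTF-8" used here are assumptions that
       work with common iconv implementations and may need adjusting to the names listed in iconv_intro(5). latin9_to_utf8 is a hypothetical
       helper name.

```c
#include <iconv.h>
#include <stddef.h>

/* Convert inlen bytes of Latin-9 data to UTF-8.
 * Returns the number of bytes written to out, or (size_t)-1 if the
 * converter is unavailable or the conversion fails (for example when
 * out is too small). The codeset names are assumptions; see the text. */
size_t latin9_to_utf8(const char *in, size_t inlen, char *out, size_t outcap)
{
    iconv_t cd = iconv_open("UTF-8", "ISO-8859-15");   /* (to, from) */
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    char *inp = (char *)in;      /* iconv() takes a non-const char ** */
    char *outp = out;
    size_t inleft = inlen;
    size_t outleft = outcap;

    size_t rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    iconv_close(cd);

    if (rc == (size_t)-1)
        return (size_t)-1;
    return outcap - outleft;     /* bytes produced */
}
```

       Converting the single Latin-9 byte 0xA4 should produce the three UTF-8 bytes 0xE2 0x82 0xAC, matching the encoding positions in the
       table earlier in this page.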

   Keyboard Entry of the Euro Character
       Depending on locale setting and keyboard style, you can use particular key sequences to enter the euro character.

       When using a UTF-8 or Latin-9 locale and a keyboard that supports the Compose-character entry method, you can use the Compose key input
       method to enter the euro character. For Compose-key input, you press and release certain keys in sequence, starting with the key defined
       as the Compose key. For the euro character, use one of the following two sequences:

              Compose C =
              Compose = C

       The following table lists more efficient key sequences that are supported for specific languages and keyboard styles.  Note  that  the  key
       sequences  in the table are supported only by xkb format keymaps (which are the default for CDE users). When using these key sequences, you
       hold down the first key while pressing the other.

       -----------------------------------------------------------
       Keymap Description   VT-Style Keyboard	PC-Style Keyboard
       -----------------------------------------------------------
       Belgian		    Left Compose+E	Right Alt+E
       Czech		    Left Compose+E	Right Alt+E
       Danish		    Left Compose+E	Right Alt+E
       Dutch		    Left Compose+E	Right Alt+E
       English Canadian     Left Compose+E	Right Alt+E
       Finnish		    Left Compose+E	Right Alt+E
       Flemish		    Left Compose+E	Right Alt+E
       French		    Left Compose+E	Right Alt+E
       French Canadian	    Left Compose+E	Right Alt+E
       Swiss French	    Left Compose+E	Right Alt+E
       German		    Left Compose+E	Right Alt+E
       Swiss German	    Left Compose+E	Right Alt+E
       Hungarian	    Left Compose+E	Right Alt+E
       Italian		    Left Compose+E	Right Alt+E
       Lithuanian	    Left Compose+E	Right Alt+E
       Norwegian	    Left Compose+E	Right Alt+E
       Polish		    Left Compose+U	Right Alt+u
       Portuguese	    None		Right Alt+E
       Serb/Croat/Slovene   Left Compose+E	Right Alt+E
       Slovak		    Left Compose+E	Right Alt+E
       Spanish		    Left Compose+E	Right Alt+E
       Swedish		    Left Compose+E	Right Alt+E
       Turkish		    Left Compose+E	Right Alt+E
       United Kingdom	    Left Compose+4	Right Alt+4
       -----------------------------------------------------------

       For more information about keyboards, keymaps, and character-entry methods, see keyboard(5).

   Font Support for the Euro Character
       The operating system does not provide native Unicode fonts that include glyphs for the euro character. However, the character is  supported
       by  a set of Latin-9 fonts. The X font library has been extended to combine a number of fonts together to provide logical Unicode fonts for
       applications to use.  The names of these logical fonts end with ISO10646-1. You can use the xlsfonts utility to find out if these fonts are
       installed on your system.

   Printer Support for the Euro Character
       Printing of file data in UTF-8 or Latin-9 format is supported by a generic PostScript print filter. See wwpsof(8) for information on how to
       configure this print filter.

SEE ALSO

       Commands: xlsfonts(1X), wwpsof(8)

       Others: code_page(5), i18n_intro(5), i18n_printing(5), iconv_intro(5), iso8859-15(5), keyboard(5), l10n_intro(5), Unicode(5)

       Writing Software for the International Market

																	   euro(5)
