Post 302500654 by strolchFX on Tuesday 1st of March 2011 07:34:52 AM
Real UNICODE back to string

I'm looking for the proper NLS_LANG setting for the following case: I have a delimited string of real Unicode values (hex code points), which also contains multibyte characters, and I'm using a small Java program to convert them back to the local character set.

For example: '0056;0065;006E;0064;006F;0072;0020;0054;0065;0078;0074;003A;5355;8EAB;6D4B;8BD5;4E2D;6587;5B57;7B26;FF0C;9884;795D;5927;8FD0;4F1A;987A;5229;53EC;5F00;'

I've tried:

export NLS_LANG=Japanese_Japan.UTF8
export NLS_LANG=American_America.UTF8

Neither works for the multibyte characters, because both settings point to UTF8 rather than to real Unicode code points.
In UTF8, the multibyte characters apparently are also expected to start with '00..'.
Somehow I cannot get the hex code points handled; UTF8 code units are always expected.
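
One way to sidestep the NLS_LANG question is to treat each field as a code point number rather than as encoded bytes, and only pick an encoding when writing the result out. The following is a minimal Java sketch under that assumption, using the semicolon-separated hex format from the example string. The class name `CodePointDecoder` and the method `decode` are hypothetical and not taken from the original program; `StringBuilder.appendCodePoint` and `String.getBytes(Charset)` are standard Java.

```java
import java.nio.charset.StandardCharsets;

class CodePointDecoder {

    // Hypothetical helper: turns "0056;0065;..." into a Java String.
    public static String decode(String delimited) {
        StringBuilder out = new StringBuilder();
        for (String field : delimited.split(";")) {
            String hex = field.trim();
            if (hex.isEmpty()) {
                continue; // tolerate stray whitespace or empty fields
            }
            // Each field is a Unicode code point number, not UTF-8 bytes.
            out.appendCodePoint(Integer.parseInt(hex, 16));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String text = decode("0056;0065;006E;0064;006F;0072;0020;5355;8EAB;");
        // Encode explicitly on output instead of relying on the platform default.
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        System.out.write(utf8, 0, utf8.length);
        System.out.println();
    }
}
```

Whether the output then displays correctly still depends on the terminal or database client expecting the same encoding that was chosen in `getBytes`; UTF-8 is only an assumption here.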

Last edited by strolchFX; 03-01-2011 at 08:45 AM.
 

multibyte(3C)

NAME
     mblen(), mbtowc(), mbstowcs(), wctomb(), wcstombs() - multibyte characters and strings conversions

SYNOPSIS
     #include <stdlib.h>

     int mblen(const char *s, size_t n);
     int mbtowc(wchar_t *pwc, const char *s, size_t n);
     int wctomb(char *s, wchar_t wchar);
     size_t mbstowcs(wchar_t *pwcs, const char *s, size_t n);
     size_t wcstombs(char *s, const wchar_t *pwcs, size_t n);

DESCRIPTION
     A multibyte character is composed of one or more bytes that represent a "whole" character in a character encoding. A wide character (type wchar_t) is composed of a fixed number of bytes whose code value can represent any character in a character encoding.

     mblen()
          Determine the number of bytes in the multibyte character pointed to by s. Equivalent to:

               mbtowc((wchar_t *)0, s, n);

          If s is a null pointer, mblen() returns a nonzero or zero value, depending on whether the multibyte character encodings do or do not have state-dependent encodings, respectively. Since no character encodings currently supported by HP-UX are state-dependent, zero is always returned in this case. However, for maximum portability to other systems, application programs should not depend on this.

          If s is not a null pointer, mblen() returns the number of bytes in the multibyte character if the next n or fewer bytes form a valid multibyte character, or returns -1 if they do not form a valid multibyte character. If s points to the null character, mblen() returns 0.

     mbtowc()
          Determine the number of bytes in the multibyte character pointed to by s, determine the code for the value of type wchar_t corresponding to that multibyte character, then store the code in the object pointed to by pwc. The value of the code corresponding to the null character is zero. At most n characters are examined, starting at the character pointed to by s.

          If s is a null pointer, mbtowc() returns a nonzero or zero value, depending on whether the multibyte character encodings do or do not have state-dependent encodings, respectively. Since no character encodings currently supported by HP-UX are state-dependent, zero is always returned in this case. However, for maximum portability to other systems, application programs should not depend on this.

          If s is not a null pointer, mbtowc() returns the number of bytes in the converted multibyte character if the next n or fewer bytes form a valid multibyte character, or -1 if they do not form a valid multibyte character. If s points to the null character, mbtowc() returns 0. The value returned is never greater than n or the value of the MB_CUR_MAX macro.

     wctomb()
          Determine the number of bytes needed to represent the multibyte character corresponding to the code whose value is wchar and store the multibyte character representation in the array object pointed to by s. At most MB_CUR_MAX characters are stored.

          If s is a null pointer, wctomb() returns a nonzero or zero value, depending on whether the multibyte character encodings do or do not have state-dependent encodings, respectively. Since no character encodings currently supported by HP-UX are state-dependent, zero is always returned in this case. However, for maximum portability to other systems, application programs should not depend on this.

          If s is not a null pointer, wctomb() returns the number of bytes in the multibyte character corresponding to the value of wchar, or -1 if the value of wchar does not correspond to a valid multibyte character. The value returned is never greater than the value of the MB_CUR_MAX macro.

     mbstowcs()
          Convert a sequence of multibyte characters from the array pointed to by s into a sequence of corresponding codes and store these codes into the array pointed to by pwcs, stopping after either n codes or a code with value zero (a converted null character) is stored. Each multibyte character is converted as if by a call to mbtowc(). No more than n elements are modified in the array pointed to by pwcs.

          If an invalid multibyte character is encountered, mbstowcs() returns (size_t)-1. Otherwise, mbstowcs() returns the number of array elements modified, not including a terminating zero code, if any. The array is not null- or zero-terminated if the value returned is n. If pwcs is a null pointer, mbstowcs() returns the number of elements required for the wide-character-code array.

     wcstombs()
          Convert a sequence of codes corresponding to multibyte characters from the array pointed to by pwcs into a sequence of multibyte characters and store them into the array pointed to by s, stopping if a multibyte character exceeds the limit of n total bytes or if a null character is stored. Each code is converted as if by a call to wctomb(). No more than n bytes are modified in the array pointed to by s.

          If a code is encountered that does not correspond to a valid multibyte character, wcstombs() returns (size_t)-1. Otherwise, wcstombs() returns the number of bytes modified, not including a terminating null character, if any. The array is not null- or zero-terminated if the value returned is n. If s is a null pointer, wcstombs() returns the number of bytes required for the character array.

EXTERNAL INFLUENCES
     Locale
          The LC_CTYPE category determines the behavior of the multibyte character and string functions.

ERRORS
     Functions in this entry may fail, and errno is set, if the following condition is encountered:

          [EILSEQ]  An invalid multibyte sequence or wide character code was found.

WARNINGS
     With the exception of ASCII characters, the code values of wide characters (type wchar_t) are specific to the effective locale specified by the LC_CTYPE environment variable. These values may not be compatible with values obtained by specifying other locales that are supported now, or which may be supported in the future. It is recommended that wide character constants and wide string literals not be used, and that wide character code values not be stored in files or devices, because future standards may dictate changes in the code value assignments of the wide characters. However, wide character constants and wide string literals corresponding to the characters of the ASCII code set can be safely used, since their values are guaranteed to be the same as their ASCII code set values.

AUTHOR
     The multibyte functions in this entry were developed by OSF and HP.

SEE ALSO
     setlocale(3C), wctype(3C), thread_safety(5), glossary(9).

STANDARDS CONFORMANCE
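
The distinction this man page draws between wide-character codes and multibyte sequences shows up in the thread's Java setting as code points versus encoded bytes. The sketch below illustrates that analogy only; it is not a port of the C functions. The class name `MultibyteDemo` and its methods are hypothetical. `String.getBytes(Charset)` and the `String(byte[], Charset)` constructor are standard Java and loosely play the roles of wcstombs() and mbstowcs(), with UTF-8 assumed as the multibyte encoding.

```java
import java.nio.charset.StandardCharsets;

class MultibyteDemo {

    // Code point -> multibyte sequence (loosely like wctomb()).
    public static byte[] toUtf8(int codePoint) {
        return new String(Character.toChars(codePoint)).getBytes(StandardCharsets.UTF_8);
    }

    // Multibyte sequence -> code points (loosely like mbstowcs()).
    public static int[] toCodePoints(byte[] utf8) {
        return new String(utf8, StandardCharsets.UTF_8).codePoints().toArray();
    }

    // Render bytes as uppercase hex for display.
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(hex(toUtf8(0x0056))); // prints 56
        System.out.println(hex(toUtf8(0x5355))); // prints E58D95
    }
}
```

The code point 5355 from the question becomes the three bytes E5 8D 95 in UTF-8, so the hex digits of a code point and the bytes of its UTF-8 form generally differ outside the ASCII range. Unlike wchar_t values, which the WARNINGS section describes as locale-specific, Java code points are always Unicode values.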