Sponsored Content
Top Forums Programming How will the behaviour of multibyte char differ because of different LC_CTYPE locale? Post 302509328 by DGPickett on Wednesday 30th of March 2011 02:15:44 PM
Old 03-30-2011
Sybase says en_GB is like Sybase iso_1: New Functionality in Adaptive Server Enterprise Version 12.5.x

and similar to iso 8859-1, often called or similar to latin-1, an one-character code page or font like ASCII but with the upper 128 loaded with western european support, umlau-A, diaresis-o and such.

UTF-8 is an variable width encoding of wide unicode such that the ASCII characters match in one character.

So, until the high bit comes on, you are pretty good, perhaps give or take a few symbol shifts like # versus pound-sterling L with slashes.
 

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

split string with multibyte delimiter

Hi, I need to split a string, either using awk or cut or basic unix commands (no programming) , with a multibyte charectar as a delimeter. Ex: abcd-efgh-ijkl split by -efgh- to get two segments abcd & ijkl Is it possible? Thanks A.H.S (1 Reply)
Discussion started by: azmathshaikh
1 Replies

2. Shell Programming and Scripting

Multibyte characters to ASCII

Hello, Is there any UNIX utility/command/executable that will convert mutlibyte characters to standard single byte ASCII characters in a given file? and Is there any UNIX utility/command/executable that will recognize multibyte characters in a given file name? The typical multibyte... (8 Replies)
Discussion started by: jerardfjay
8 Replies

3. Shell Programming and Scripting

PHP: preg_match_all with multibyte characters?

Hi! I'm trying to separate text into sentences, like this: $pattern = "/(|]|,)**/"; preg_match_all($pattern, $text, $matches); This works fine unless the text contains multibyte characters, like "åäö". How can I make this work with these exotic characters? (2 Replies)
Discussion started by: Ilja
2 Replies

4. Shell Programming and Scripting

PHP: preg_match_all with multibyte characters?

Hi! I'm trying to separate text into sentences, like this: $pattern = "/(|]|,)**/"; preg_match_all($pattern, $text, $matches); This works fine unless the text contains multibyte characters, like "åäö". How can I make this work with these exotic characters? An example phrase that doesn't match:... (1 Reply)
Discussion started by: Ilja
1 Replies

5. Shell Programming and Scripting

sftp file size differ

Hi, I have one doubt over sftp. I am trnasferring a file from server1 to server2 using sftp. The size of the file shows different in file 1 and file2 after sftp even though it shows same number of byte transferred. I don't understand the problem. For example: I have file1 having size... (3 Replies)
Discussion started by: siba.s.nayak
3 Replies

6. Programming

error: invalid conversion from ‘const char*’ to ‘char*’

Compiling xpp (The X Printing Panel) on SL6 (RHEL6 essentially): xpp.cxx: In constructor ‘printFiles::printFiles(int, char**, int&)’: xpp.cxx:200: error: invalid conversion from ‘const char*’ to ‘char*’ The same error with all c++ constructors - gcc 4.4.4. If anyone can throw any light on... (8 Replies)
Discussion started by: GSO
8 Replies

7. Shell Programming and Scripting

Store user input in differ

Hello all Can anyone help me to solve the below issue I want to take user input with space separated .The number of inputs can be variable like if user inputs 1 2 3 4 ouput will stored in as array a where i=4 and I can retreive the value like a =3 any thoughts how to do it ... (2 Replies)
Discussion started by: Pratik4891
2 Replies

8. Shell Programming and Scripting

List the file names that differ

Hello, I have two directories - prev and current . They both have same multiple subdirectories and files. Now the current directory can have some updated files and some new files added that is not in prev. I want to find the list of file names that differ. I am doing this because i can not... (2 Replies)
Discussion started by: jakSun8
2 Replies

9. Shell Programming and Scripting

User switching without carrying over LC_CTYPE env variable

I am using Solaris8, userA's shell '/usr/ace/prog/sdshell', AppuserB's shell '/bin/ksh'. serverT:/home/userA>LC_CTYPE=iso_8859_1; export LC_CTYPE; vtemp='userA variable'; export vtemp serverT:/home/userA>echo "LC_CTYPE=$LC_CTYPE, vtemp=$vtemp"; LC_CTYPE=iso_8859_1, vtemp=userA... (4 Replies)
Discussion started by: kchinnam
4 Replies

10. Shell Programming and Scripting

Positional insertion for multibyte characters

Hi I have a requirement to insert a dot "." after a position in each line, say 110th position. For which, I have written the below command. cat filename | sed 's/./&\./110' > new_filename The code is working fine, but when we have multi byte (2 or 3) characters in the input file, the... (3 Replies)
Discussion started by: tostay2003
3 Replies
UTF8_ENCODE(3)								 1							    UTF8_ENCODE(3)

utf8_encode - Encodes an ISO-8859-1 string to UTF-8

SYNOPSIS
string utf8_encode (string $data) DESCRIPTION
This function encodes the string $data to UTF-8, and returns the encoded version. UTF-8 is a standard mechanism used by Unicode for encoding wide character values into a byte stream. UTF-8 is transparent to plain ASCII characters, is self-synchronized (meaning it is possible for a program to figure out where in the bytestream characters start) and can be used with normal string comparison functions for sorting and such. PHP encodes UTF-8 characters in up to four bytes, like this: UTF-8 encoding +------+-------------------------------------+---+ |bytes | | | | | | | | | bits | | | | | | | | representation | | | | | | +------+-------------------------------------+---+ | 1 | | | | | | | | | 7 | | | | | | | | 0bbbbbbb | | | | | | | 2 | | | | | | | | | 11 | | | | | | | | 110bbbbb 10bbbbbb | | | | | | | 3 | | | | | | | | | 16 | | | | | | | | 1110bbbb 10bbbbbb 10bbbbbb | | | | | | | 4 | | | | | | | | | 21 | | | | | | | | 11110bbb 10bbbbbb 10bbbbbb 10bbbbbb | | | | | | +------+-------------------------------------+---+ Each b represents a bit that can be used to store character data. PARAMETERS
o $data - An ISO-8859-1 string. RETURN VALUES
Returns the UTF-8 translation of $data. SEE ALSO
utf8_decode(3). PHP Documentation Group UTF8_ENCODE(3)
All times are GMT -4. The time now is 03:37 PM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy