UTF8_ENCODE(3) 1 UTF8_ENCODE(3)
utf8_encode - Encodes an ISO-8859-1 string to UTF-8
SYNOPSIS
string utf8_encode (string $data)
DESCRIPTION
This function encodes the string $data to UTF-8 and returns the encoded version. UTF-8 is a standard mechanism used by Unicode for
encoding wide character values into a byte stream. UTF-8 is transparent to plain ASCII characters, is self-synchronizing (meaning a
program can figure out where in the byte stream characters start), and can be used with normal string comparison functions for
sorting and such. PHP encodes UTF-8 characters in up to four bytes, like this:
UTF-8 encoding
+-------+------+-------------------------------------+
| bytes | bits | representation                      |
+-------+------+-------------------------------------+
| 1     | 7    | 0bbbbbbb                            |
| 2     | 11   | 110bbbbb 10bbbbbb                   |
| 3     | 16   | 1110bbbb 10bbbbbb 10bbbbbb          |
| 4     | 21   | 11110bbb 10bbbbbb 10bbbbbb 10bbbbbb |
+-------+------+-------------------------------------+
Each b represents a bit that can be used to store character data.
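The bit layouts in the table can be sketched in a few lines of Python. This is only an illustration of the table, not PHP's implementation; the function names encode_code_point and char_starts are hypothetical. The sketch does not reject surrogate values or values above U+10FFFF, which a strict UTF-8 encoder would.

```python
def encode_code_point(cp: int) -> bytes:
    """Encode one character value using the 1- to 4-byte layouts from the table."""
    if cp < 0:
        raise ValueError("character value must be non-negative")
    if cp < 0x80:        # 7 bits:  0bbbbbbb
        return bytes([cp])
    if cp < 0x800:       # 11 bits: 110bbbbb 10bbbbbb
        return bytes([0xC0 | (cp >> 6),
                      0x80 | (cp & 0x3F)])
    if cp < 0x10000:     # 16 bits: 1110bbbb 10bbbbbb 10bbbbbb
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    if cp < 0x200000:    # 21 bits: 11110bbb 10bbbbbb 10bbbbbb 10bbbbbb
        return bytes([0xF0 | (cp >> 18),
                      0x80 | ((cp >> 12) & 0x3F),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    raise ValueError("character value needs more than 21 bits")


def char_starts(data: bytes) -> list:
    """Return the offsets where characters start.

    Continuation bytes always look like 10bbbbbb, so any byte that does
    not match that pattern begins a character. This is the
    self-synchronizing property mentioned in the description.
    """
    return [i for i, b in enumerate(data) if (b & 0xC0) != 0x80]
```

For example, encode_code_point(0xE9) yields the two bytes 0xC3 0xA9, and char_starts() applied to those two bytes reports a single character starting at offset 0.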
PARAMETERS
o $data - An ISO-8859-1 string.
RETURN VALUES
Returns the UTF-8 translation of $data.
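Because every ISO-8859-1 byte has the same numeric value as its Unicode character, the translation only ever needs the 1-byte and 2-byte forms. The following Python sketch mirrors the documented behaviour under that assumption; it is not taken from the PHP source, and the name latin1_to_utf8 is hypothetical.

```python
def latin1_to_utf8(data: bytes) -> bytes:
    """Translate ISO-8859-1 bytes to UTF-8, one input byte at a time."""
    out = bytearray()
    for byte in data:
        if byte < 0x80:
            # Plain ASCII passes through unchanged: 0bbbbbbb
            out.append(byte)
        else:
            # 0x80-0xFF become two bytes: 110bbbbb 10bbbbbb
            out.append(0xC0 | (byte >> 6))
            out.append(0x80 | (byte & 0x3F))
    return bytes(out)
```

Under this sketch the ISO-8859-1 bytes for "café" (0x63 0x61 0x66 0xE9) become 0x63 0x61 0x66 0xC3 0xA9.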
SEE ALSO
utf8_decode(3).
PHP Documentation Group UTF8_ENCODE(3)
UTF2(5) BSD File Formats Manual UTF2(5)
NAME
utf2 -- Universal character set Transformation Format encoding of runes
SYNOPSIS
ENCODING "UTF2"
DESCRIPTION
The UTF2 encoding has been deprecated in favour of UTF-8. New applications should not use UTF2.
The UTF2 encoding is based on a proposed X-Open multibyte FSS-UCS-TF (File System Safe Universal Character Set Transformation Format) encoding
as used in Plan 9 from Bell Labs. Although it is capable of representing more than 16 bits, the current implementation is limited to 16
bits as defined by the Unicode Standard.
UTF2 representation is backwards compatible with ASCII, so 0x00-0x7f refer to the ASCII character set. The multibyte encoding of runes
between 0x0080 and 0xffff consists entirely of bytes whose high order bit is set. The actual encoding is represented by the following table:
[0x0000 - 0x007f] [00000000.0bbbbbbb] -> 0bbbbbbb
[0x0080 - 0x07ff] [00000bbb.bbbbbbbb] -> 110bbbbb, 10bbbbbb
[0x0800 - 0xffff] [bbbbbbbb.bbbbbbbb] -> 1110bbbb, 10bbbbbb, 10bbbbbb
If more than a single representation of a value exists (for example, 0x00; 0xC0 0x80; 0xE0 0x80 0x80) the shortest representation is always
used (but the longer ones will be correctly decoded).
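The rule above (emit the shortest form, but accept longer ones when decoding) can be sketched in Python as follows. This is an illustration of the three-row table only, limited to 16-bit runes, and is not the BSD library code; the names utf2_encode and utf2_decode are hypothetical.

```python
def utf2_encode(rune: int) -> bytes:
    """Encode a 16-bit rune using the shortest representation."""
    if not 0 <= rune <= 0xFFFF:
        raise ValueError("rune must fit in 16 bits")
    if rune < 0x80:       # 0bbbbbbb
        return bytes([rune])
    if rune < 0x800:      # 110bbbbb, 10bbbbbb
        return bytes([0xC0 | (rune >> 6), 0x80 | (rune & 0x3F)])
    # 1110bbbb, 10bbbbbb, 10bbbbbb
    return bytes([0xE0 | (rune >> 12),
                  0x80 | ((rune >> 6) & 0x3F),
                  0x80 | (rune & 0x3F)])


def utf2_decode(data: bytes) -> list:
    """Decode a byte string into runes, tolerating over-long forms."""
    runes = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:
            value, extra = lead, 0
        elif (lead & 0xE0) == 0xC0:
            value, extra = lead & 0x1F, 1
        elif (lead & 0xF0) == 0xE0:
            value, extra = lead & 0x0F, 2
        else:
            raise ValueError(f"unsupported lead byte 0x{lead:02x} at offset {i}")
        tail = data[i + 1:i + 1 + extra]
        # Every trailing byte must exist and look like 10bbbbbb.
        if len(tail) != extra or any((b & 0xC0) != 0x80 for b in tail):
            raise ValueError(f"truncated or malformed sequence at offset {i}")
        for b in tail:
            value = (value << 6) | (b & 0x3F)
        # No shortest-form check here: over-long encodings decode normally.
        runes.append(value)
        i += 1 + extra
    return runes
```

With this sketch, the three representations of zero from the example (0x00; 0xC0 0x80; 0xE0 0x80 0x80) all decode to the rune 0, while utf2_encode(0) produces only the single byte 0x00.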
The final three encodings provided by X-Open:
[00000000.000bbbbb.bbbbbbbb.bbbbbbbb] ->
11110bbb, 10bbbbbb, 10bbbbbb, 10bbbbbb
[000000bb.bbbbbbbb.bbbbbbbb.bbbbbbbb] ->
111110bb, 10bbbbbb, 10bbbbbb, 10bbbbbb, 10bbbbbb
[0bbbbbbb.bbbbbbbb.bbbbbbbb.bbbbbbbb] ->
1111110b, 10bbbbbb, 10bbbbbb, 10bbbbbb, 10bbbbbb, 10bbbbbb
which provide for the entire proposed 31-bit ISO-10646 standard, are currently not implemented.
SEE ALSO
mklocale(1), setlocale(3), utf8(5)
BSD October 11, 2002 BSD