UTF-8(7)                  Linux Programmer's Manual                  UTF-8(7)

NAME
UTF-8 - an ASCII compatible multi-byte Unicode encoding

DESCRIPTION
The Unicode 3.0 character set occupies a 16-bit code space. The most obvious Unicode encoding (known as UCS-2) consists of a sequence of 16-bit words. Such strings can contain, as parts of many 16-bit characters, bytes like '