UTF-8(7)                Linux Programmer's Manual                UTF-8(7)

NAME
UTF-8 - an ASCII compatible multibyte Unicode encoding

DESCRIPTION
The Unicode 3.0 character set occupies a 16-bit code space. The most obvious Unicode encoding (known as UCS-2) consists of a sequence of 16-bit words. Such strings can contain--as part of many 16-bit characters--bytes such as '