NORMALIZER(3) 1 NORMALIZER(3)
The Normalizer class
INTRODUCTION
Normalization is a process that involves transforming characters and sequences of characters into a formally defined underlying
representation. This process is most important when text needs to be compared for sorting and searching, but it is also used when storing
text to ensure that it is stored in a consistent representation.
The Unicode Consortium has defined a number of normalization forms reflecting the various needs of applications:
o Normalization Form D (NFD) - Canonical Decomposition
o Normalization Form C (NFC) - Canonical Decomposition followed by Canonical Composition
o Normalization Form KD (NFKD) - Compatibility Decomposition
o Normalization Form KC (NFKC) - Compatibility Decomposition followed by Canonical Composition
The different forms are defined in terms of a set of transformations on the text, transformations that are expressed by both an algorithm
and a set of data files.
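The difference between canonical decomposition and composition can be sketched with a single accented letter. The example below is an
illustration only; it assumes PHP 7 or later (for the "\u{...}" escape syntax), UTF-8 strings, and a loaded intl extension. The variable
names are made up for the example.

```php
<?php
// Requires the intl extension, which provides the Normalizer class.

$composed   = "\u{00E9}";   // "é" as a single code point (U+00E9)
$decomposed = "e\u{0301}";  // "e" followed by COMBINING ACUTE ACCENT (U+0301)

// The two strings render identically but differ byte for byte.
var_dump($composed === $decomposed);    // bool(false)

// NFD splits the precomposed character; NFC recombines the sequence.
$nfd = Normalizer::normalize($composed, Normalizer::FORM_D);
$nfc = Normalizer::normalize($decomposed, Normalizer::FORM_C);

var_dump($nfd === $decomposed);         // bool(true)
var_dump($nfc === $composed);           // bool(true)

// Byte lengths in UTF-8: 2 for the composed form, 3 for the decomposed one.
var_dump(strlen($nfc), strlen($nfd));   // int(2) int(3)
```

Because the composed and decomposed spellings are distinct byte sequences, a plain string comparison treats them as different text unless
both sides are first brought to the same form.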
CLASS SYNOPSIS
Normalizer
Methods
o public static bool Normalizer::isNormalized (string $input, [int $form = Normalizer::FORM_C])
o public static string Normalizer::normalize (string $input, [int $form = Normalizer::FORM_C])
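A typical use of the two methods is to bring strings to a common form before comparing them. The sketch below assumes UTF-8 input and a
loaded intl extension; the functions toNfc() and canonicallyEqual() are hypothetical helpers written for this example and are not part of
the Normalizer class.

```php
<?php
/**
 * Hypothetical helper: return $s in NFC, or false if it cannot be normalized.
 */
function toNfc(string $s)
{
    // isNormalized() lets already-normalized input pass through untouched.
    if (Normalizer::isNormalized($s, Normalizer::FORM_C)) {
        return $s;
    }
    // normalize() returns false when an error occurs.
    return Normalizer::normalize($s, Normalizer::FORM_C);
}

/**
 * Hypothetical helper: compare two strings for canonical equivalence.
 */
function canonicallyEqual(string $a, string $b): bool
{
    $na = toNfc($a);
    $nb = toNfc($b);

    if ($na === false || $nb === false) {
        return false;   // treat strings that fail to normalize as unequal
    }
    return $na === $nb;
}

var_dump(canonicallyEqual("\u{00E9}", "e\u{0301}"));   // bool(true)
var_dump(canonicallyEqual("e", "\u{00E9}"));           // bool(false)
```

Comparing in NFC is a choice made for this sketch; any single form works as long as both operands use the same one.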
PREDEFINED CONSTANTS
The following constants define the normalization form used by the normalizer:
o Normalizer::FORM_C (integer) - Normalization Form C (NFC) - Canonical Decomposition followed by Canonical Composition
o Normalizer::FORM_D (integer) - Normalization Form D (NFD) - Canonical Decomposition
o Normalizer::FORM_KC (integer) - Normalization Form KC (NFKC) - Compatibility Decomposition followed by Canonical Composition
o Normalizer::FORM_KD (integer) - Normalization Form KD (NFKD) - Compatibility Decomposition
o Normalizer::NONE (integer) - No decomposition/composition
o Normalizer::OPTION_DEFAULT (integer) - Default normalization options
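The effect of choosing a compatibility form rather than a canonical one can be illustrated as follows. This is a sketch under the same
assumptions as any intl-based code (PHP 7 or later, UTF-8 strings, intl extension loaded); the sample characters were picked for the
example.

```php
<?php
$ligature = "\u{FB01}";   // LATIN SMALL LIGATURE FI, a compatibility character

// Canonical forms leave compatibility characters alone.
$nfc = Normalizer::normalize($ligature, Normalizer::FORM_C);
var_dump($nfc === $ligature);   // bool(true)

// Compatibility forms replace them with their plain equivalents.
$nfkc = Normalizer::normalize($ligature, Normalizer::FORM_KC);
var_dump($nfkc);                // string(2) "fi"

// FORM_KD decomposes without recomposing: "é" becomes "e" + U+0301,
// and SUPERSCRIPT TWO (U+00B2) becomes the digit "2".
$nfkd = Normalizer::normalize("\u{00E9}\u{00B2}", Normalizer::FORM_KD);
var_dump($nfkd === "e\u{0301}2");   // bool(true)
```

Compatibility forms discard formatting distinctions such as ligatures and superscripts, so they tend to suit searching and matching
better than storing text that must round-trip unchanged.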
SEE ALSO
o Unicode Normalization
o Unicode Normalization FAQ
o ICU User Guide - Normalization
o ICU API Reference - Normalization
PHP Documentation Group NORMALIZER(3)