GRAPHEME_EXTRACT(3) 1 GRAPHEME_EXTRACT(3)
grapheme_extract - Function to extract a sequence of default grapheme clusters from a text buffer, which must be encoded in UTF-8.
Procedural style
SYNOPSIS
string grapheme_extract (string $haystack, int $size, [int $extract_type], [int $start], [int &$next])
DESCRIPTION
Function to extract a sequence of default grapheme clusters from a text buffer, which must be encoded in UTF-8.
PARAMETERS
o $haystack
- String to search.
o $size
- Maximum number of items to return, counted in the units selected by $extract_type.
o $extract_type
- Defines the type of units referred to by the $size parameter:
o GRAPHEME_EXTR_COUNT (default) - $size is the number of default grapheme clusters to extract.
o GRAPHEME_EXTR_MAXBYTES - $size is the maximum number of bytes returned.
o GRAPHEME_EXTR_MAXCHARS - $size is the maximum number of UTF-8 characters returned.
o $start
- Starting position in $haystack, in bytes. If given, it must be zero or a positive value that is less than or equal to the length
of $haystack in bytes. If $start does not point to the first byte of a UTF-8 character, the start position is moved to the next
character boundary.
o $next
- Reference to a value that will be set to the next starting position. When the call returns, this may point to the first byte
position past the end of the string.
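The $start and $next parameters can be combined to walk a string one grapheme cluster at a time. The following is a minimal sketch, not an official example: the variable names ($text, $pos, $clusters) are illustrative, and it assumes the intl extension is loaded.

```php
<?php
// Two grapheme clusters of three bytes each:
// "a" + COMBINING RING ABOVE, then "o" + COMBINING DIAERESIS.
$text = "a\xCC\x8Ao\xCC\x88";

$clusters = array();
$pos = 0;                 // current byte offset into $text
$next = 0;                // receives the next byte offset from each call
$length = strlen($text);  // length in bytes, not in characters

while ($pos < $length) {
    // Extract exactly one grapheme cluster starting at byte offset $pos.
    $cluster = grapheme_extract($text, 1, GRAPHEME_EXTR_COUNT, $pos, $next);
    if ($cluster === false || $cluster === '') {
        break; // stop on failure or an empty result rather than loop forever
    }
    $clusters[] = $cluster;
    $pos = $next;         // continue from where the previous call stopped
}
```

Keeping $pos and $next as separate variables makes it explicit which value is read by the call and which is written by it.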
RETURN VALUES
A string starting at offset $start and ending on a default grapheme cluster boundary that conforms to the $size and $extract_type
specified, or FALSE on failure.
EXAMPLES
Example #1
grapheme_extract(3) example
<?php
$char_a_ring_nfd = "a\xCC\x8A"; // 'LATIN SMALL LETTER A WITH RING ABOVE' (U+00E5) normalization form "D"
$char_o_diaeresis_nfd = "o\xCC\x88"; // 'LATIN SMALL LETTER O WITH DIAERESIS' (U+00F6) normalization form "D"
print urlencode(grapheme_extract( $char_a_ring_nfd . $char_o_diaeresis_nfd, 1, GRAPHEME_EXTR_COUNT, 2));
?>
The above example will output:
o%CC%88
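Example #2
Comparing the three $extract_type modes. This is an illustrative sketch rather than part of the original manual page. It assumes the intl extension is loaded, and the variable names are hypothetical. In every mode the result ends on a grapheme cluster boundary, so a byte or character limit that falls inside a cluster yields fewer units than requested.

```php
<?php
// 2 grapheme clusters, 4 code points, 6 bytes in total.
$text = "a\xCC\x8Ao\xCC\x88";

// One grapheme cluster: "a" plus its combining mark (3 bytes).
$by_count = grapheme_extract($text, 1, GRAPHEME_EXTR_COUNT);

// At most 4 bytes: the second cluster would bring the total to 6 bytes,
// so only the first cluster is returned.
$by_bytes = grapheme_extract($text, 4, GRAPHEME_EXTR_MAXBYTES);

// At most 3 code points: the second cluster would bring the total to 4,
// so again only the first cluster is returned.
$by_chars = grapheme_extract($text, 3, GRAPHEME_EXTR_MAXCHARS);

print urlencode($by_count) . "\n";
print urlencode($by_bytes) . "\n";
print urlencode($by_chars) . "\n";
```

With this input, all three calls are expected to return the same first cluster, which urlencode() renders as a%CC%8A.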
SEE ALSO
grapheme_substr(3), Unicode Text Segmentation: Grapheme Cluster Boundaries.
PHP Documentation Group GRAPHEME_EXTRACT(3)