UTF8_ENCODE(3)
utf8_encode - Encodes an ISO-8859-1 string to UTF-8
SYNOPSIS
string utf8_encode (string $data)
DESCRIPTION
This function encodes the string $data to UTF-8 and returns the encoded version. UTF-8 is a standard mechanism used by Unicode for
encoding wide character values into a byte stream. UTF-8 is transparent to plain ASCII characters, is self-synchronizing (meaning a
program can determine where in the byte stream each character starts), and can be used with normal string comparison functions for
sorting and such. PHP encodes UTF-8 characters in up to four bytes, like this:
UTF-8 encoding
+------+------+-------------------------------------+
|bytes | bits | representation                      |
+------+------+-------------------------------------+
|  1   |  7   | 0bbbbbbb                            |
|  2   |  11  | 110bbbbb 10bbbbbb                   |
|  3   |  16  | 1110bbbb 10bbbbbb 10bbbbbb          |
|  4   |  21  | 11110bbb 10bbbbbb 10bbbbbb 10bbbbbb |
+------+------+-------------------------------------+
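Each b represents a bit that can be used to store character data. Because ISO-8859-1 covers only code points 0 to 255, the output of
this function uses only the one-byte and two-byte forms.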
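The bit layouts in the table can be sketched in code. The following is a minimal illustration in Python rather than PHP, and it is not how PHP implements the function internally; the name encode_code_point is hypothetical. It packs the bits of one code point into the 1-, 2-, 3- or 4-byte form, choosing the shortest form that fits.

```python
def encode_code_point(cp: int) -> bytes:
    """Encode one code point to UTF-8 using the table's bit layouts.

    Hypothetical helper for illustration only. Unlike a strict encoder,
    it does not reject surrogate code points (U+D800 to U+DFFF).
    """
    if cp < 0:
        raise ValueError("code point must be non-negative")
    if cp < 0x80:
        # 7 bits: 0bbbbbbb
        return bytes([cp])
    if cp < 0x800:
        # 11 bits: 110bbbbb 10bbbbbb
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        # 16 bits: 1110bbbb 10bbbbbb 10bbbbbb
        return bytes([
            0xE0 | (cp >> 12),
            0x80 | ((cp >> 6) & 0x3F),
            0x80 | (cp & 0x3F),
        ])
    if cp < 0x110000:
        # 21 bits: 11110bbb 10bbbbbb 10bbbbbb 10bbbbbb
        # (Unicode ends at U+10FFFF, so not all 21-bit values are valid.)
        return bytes([
            0xF0 | (cp >> 18),
            0x80 | ((cp >> 12) & 0x3F),
            0x80 | ((cp >> 6) & 0x3F),
            0x80 | (cp & 0x3F),
        ])
    raise ValueError("code point out of Unicode range")
```

For example, encode_code_point(0x20AC) (the Euro sign) takes the three-byte branch, and the result matches Python's own "\u20ac".encode("utf-8").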
PARAMETERS
o $data
- An ISO-8859-1 string.
RETURN VALUES
Returns the UTF-8 translation of $data.
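As a rough sketch of the translation described here, the following Python function (hypothetical name latin1_to_utf8, not the PHP implementation) treats each input byte as an ISO-8859-1 code point. Bytes below 0x80 are copied unchanged, and bytes from 0x80 to 0xFF are expanded to the two-byte UTF-8 form.

```python
def latin1_to_utf8(data: bytes) -> bytes:
    """Convert ISO-8859-1 bytes to UTF-8 bytes.

    Hypothetical helper for illustration; it assumes the input really is
    ISO-8859-1 and does no validation.
    """
    out = bytearray()
    for byte in data:
        if byte < 0x80:
            # ASCII range: identical in ISO-8859-1 and UTF-8.
            out.append(byte)
        else:
            # 0x80..0xFF: two-byte form 110bbbbb 10bbbbbb.
            out.append(0xC0 | (byte >> 6))
            out.append(0x80 | (byte & 0x3F))
    return bytes(out)
```

For instance, latin1_to_utf8(b"caf\xe9") yields b"caf\xc3\xa9". Input that is already UTF-8 would be encoded a second time, so a function of this kind should only be applied to data known to be ISO-8859-1.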
SEE ALSO
utf8_decode(3).
PHP Documentation Group UTF8_ENCODE(3)
SQLITE_LIBENCODING(3)
sqlite_libencoding - Returns the encoding of the linked SQLite library
SYNOPSIS
string sqlite_libencoding (void )
DESCRIPTION
The SQLite library may be compiled in either ISO-8859-1 or UTF-8 compatible modes. This function allows you to determine which encoding
scheme is used by your version of the library.
Warning
The default PHP distribution builds libsqlite in ISO-8859-1 encoding mode. However, this is a misnomer; rather than handling
ISO-8859-1, it operates according to your current locale settings for string comparisons and sort ordering. So, rather than
ISO-8859-1, you should think of it as being '8-bit' instead.
When compiled with UTF-8 support, sqlite handles encoding and decoding of UTF-8 multi-byte character sequences, but does not yet do a
complete job when working with the data (no normalization is performed, for example), and some comparison operations may still not be
carried out correctly.
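To illustrate why missing normalization can affect comparisons, here is a small Python sketch. It does not call SQLite. It only shows that the same visible character can have two different UTF-8 byte sequences, which compare as unequal unless the text is normalized first. The variable names are chosen for illustration.

```python
import unicodedata

# "é" as one precomposed code point, and as "e" plus a combining acute accent.
composed = "\u00e9"
decomposed = "e\u0301"

composed_bytes = composed.encode("utf-8")      # b"\xc3\xa9"
decomposed_bytes = decomposed.encode("utf-8")  # b"e\xcc\x81"

# A byte-wise comparison treats the two spellings as different strings.
bytes_equal = composed_bytes == decomposed_bytes

# After normalizing to NFC, both spellings become the same code point.
normalized_equal = unicodedata.normalize("NFC", decomposed) == composed
```

Here bytes_equal is False while normalized_equal is True, so software that compares stored bytes without normalizing may treat equivalent text as distinct.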
Warning
It is not recommended that you use PHP in a web-server configuration with a version of the SQLite library compiled with UTF-8
support, since libsqlite will abort the process if it detects a problem with the UTF-8 encoding.
RETURN VALUES
Returns the library encoding.
SEE ALSO
sqlite_lib_version(3).
PHP Documentation Group SQLITE_LIBENCODING(3)