02-04-2014
iconv and BOMs are a gray area in the Unicode specification. A useful discussion regarding iconv and presence or lack of a BOM is
here
10 More Discussions You Might Find Interesting
1. Programming
While working with russian text under FreeBSD&MySQL I need to convert a string from MySQL to the Unicode format.
I've just started my way in C++ under FreeBSD , so please explain me how can I get ascii code of Char variable and also how can i get a character into variable with the specified ascii... (3 Replies)
Discussion started by: macron
3 Replies
2. UNIX for Dummies Questions & Answers
I'm using shell scripting in Applescript. When searching a file with the ANSEL character set (for GEDCOM files) using (grep '1 CHAR ANSEL' filepath) gives the expected result. When searching a UNICODE formatted file (utf-16), searching for text known to exist in the file using (grep '1 CHAR... (4 Replies)
Discussion started by: Whiterock
4 Replies
3. UNIX for Advanced & Expert Users
Hi,
I have a non-ascii character (Ŵ), which can be represented in UTF-8 encoding as equivalent hex value (\xC5B4). Is there a function in unix to convert this hex value back to display the charcter ? (10 Replies)
Discussion started by: sumirmehta
10 Replies
4. UNIX for Advanced & Expert Users
Hi all,
At present a file from AS400 system is being FTPed to an AIX system.
Now, a similar file needs to be sent from our Unix box (Solaris)
Is there any tool available which does the conversion in Unix from UTF-8 to EBCDIC?
Any suggestions/ pointers are really appreciated.
Thanks,... (4 Replies)
Discussion started by: sridhar_423
4 Replies
5. Red Hat
Hello,
I am trying to convert a 7bit ASCII file to UTF-8.
I have used iconv before though it can't recognize it for some reason and says unknown file encoding.
When I used ascii2uni package with different package, ./ascii2uni -a K -a I -a J -a X test_file > new_test_file
It still... (2 Replies)
Discussion started by: rockf1bull
2 Replies
6. UNIX for Dummies Questions & Answers
Sometimes we recieve some excel files containing French/Japanese characters over the mail, and these files are manually transferred to the server by using SFTP (security is not a huge concern here). The data is changed to text format before transferring it using Notepad.
Problem is: When saving... (4 Replies)
Discussion started by: jawsnnn
4 Replies
7. Shell Programming and Scripting
Hello all
i have utf-8 file that i try to convert to WINDOWS-1251 on linux
without any success
the file name is utf-8 when i try to do :
file -bi test.txt
it gives me :
text/plain; charset=utf-8
when i try to convert the file i do :
/usr/bin/iconv -f UTF-8 -t WINDOWS-1251 test.txt >... (1 Reply)
Discussion started by: umen
1 Replies
8. Shell Programming and Scripting
Hi,
I need to run a SQL which check for special UTF char in DB. When I try to copy that in UNIX file it changes it to some wierd chat. How can in retain the UTF chars in my script?
e.g. ο|π|ρ|σ|τ|υ|φ|χ|ψ
Any help will be appriciated.
Thanks, (14 Replies)
Discussion started by: varun22486
14 Replies
9. Shell Programming and Scripting
I am trying to develop a script which will work on a source UTF-8 file and perform one or more of the following
It will accept the target encoding as an argument e.g. US-ASCII or ISO-8859-1, etc
1. It should replace all occurrences of characters outside target character set by " " (space) or... (3 Replies)
Discussion started by: hemkiran.s
3 Replies
10. UNIX for Beginners Questions & Answers
Dears,
I have a shell script - working perfectly on Oracle Linux - that detects the encoding (the charset to be exact) of the files in a specified directory using the "file" command (The file command outputs the charset in Linux, but doesn't do that in AIX), then if the file isn't a UTF-8 text... (4 Replies)
Discussion started by: JeanM-1
4 Replies
LEARN ABOUT PHP
utf8_encode
UTF8_ENCODE(3) 1 UTF8_ENCODE(3)
utf8_encode - Encodes an ISO-8859-1 string to UTF-8
SYNOPSIS
string utf8_encode (string $data)
DESCRIPTION
This function encodes the string $data to UTF-8, and returns the encoded version. UTF-8 is a standard mechanism used by Unicode for
encoding wide character values into a byte stream. UTF-8 is transparent to plain ASCII characters, is self-synchronized (meaning it is
possible for a program to figure out where in the bytestream characters start) and can be used with normal string comparison functions for
sorting and such. PHP encodes UTF-8 characters in up to four bytes, like this:
UTF-8 encoding
+------+-------------------------------------+---+
|bytes | | |
| | | |
| | bits | |
| | | |
| | representation | |
| | | |
+------+-------------------------------------+---+
| 1 | | |
| | | |
| | 7 | |
| | | |
| | 0bbbbbbb | |
| | | |
| 2 | | |
| | | |
| | 11 | |
| | | |
| | 110bbbbb 10bbbbbb | |
| | | |
| 3 | | |
| | | |
| | 16 | |
| | | |
| | 1110bbbb 10bbbbbb 10bbbbbb | |
| | | |
| 4 | | |
| | | |
| | 21 | |
| | | |
| | 11110bbb 10bbbbbb 10bbbbbb 10bbbbbb | |
| | | |
+------+-------------------------------------+---+
Each b represents a bit that can be used to store character data.
PARAMETERS
o $data
- An ISO-8859-1 string.
RETURN VALUES
Returns the UTF-8 translation of $data.
SEE ALSO
utf8_decode(3).
PHP Documentation Group UTF8_ENCODE(3)