02-04-2014
How iconv treats BOMs is a gray area in the Unicode specification. A useful discussion of iconv and the presence or absence of a BOM is here
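Since a converter may or may not write or expect a BOM, one way to see what a file actually contains is to inspect its first bytes. The following is only a minimal sketch in C: `detect_bom` is a hypothetical helper name and not part of iconv or any library. The byte sequences are the standard BOM encodings of U+FEFF.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of iconv: report which BOM (if any) starts
 * the buffer. Returns a static label, or NULL when no BOM is present.
 * The BOM length is stored in *bom_len when bom_len is not NULL. */
const char *detect_bom(const unsigned char *buf, size_t len, size_t *bom_len)
{
    static const struct {
        const char *name;
        unsigned char bytes[4];
        size_t n;
    } boms[] = {
        /* UTF-32LE comes first: its BOM (FF FE 00 00) begins with the
         * UTF-16LE BOM (FF FE). A UTF-16LE file whose first character is
         * U+0000 is therefore misreported; that ambiguity is inherent. */
        { "UTF-32LE", { 0xFF, 0xFE, 0x00, 0x00 }, 4 },
        { "UTF-32BE", { 0x00, 0x00, 0xFE, 0xFF }, 4 },
        { "UTF-8",    { 0xEF, 0xBB, 0xBF, 0x00 }, 3 },
        { "UTF-16LE", { 0xFF, 0xFE, 0x00, 0x00 }, 2 },
        { "UTF-16BE", { 0xFE, 0xFF, 0x00, 0x00 }, 2 },
    };
    size_t i;

    assert(buf != NULL);

    for (i = 0; i < sizeof boms / sizeof boms[0]; i++) {
        if (len >= boms[i].n && memcmp(buf, boms[i].bytes, boms[i].n) == 0) {
            if (bom_len != NULL)
                *bom_len = boms[i].n;
            return boms[i].name;
        }
    }

    if (bom_len != NULL)
        *bom_len = 0;
    return NULL;
}
```

Running such a check on the first four bytes of a file before and after conversion shows whether a BOM was added or dropped.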
caca_utf8_to_utf32
libcaca character set conversions(3caca) libcaca libcaca character set conversions(3caca)
NAME
libcaca character set conversions -
Functions
__extern uint32_t caca_utf8_to_utf32 (char const *, size_t *)
Convert a UTF-8 character to UTF-32.
__extern size_t caca_utf32_to_utf8 (char *, uint32_t)
Convert a UTF-32 character to UTF-8.
__extern uint8_t caca_utf32_to_cp437 (uint32_t)
Convert a UTF-32 character to CP437.
__extern uint32_t caca_cp437_to_utf32 (uint8_t)
Convert a CP437 character to UTF-32.
__extern char caca_utf32_to_ascii (uint32_t)
Convert a UTF-32 character to ASCII.
__extern int caca_utf32_is_fullwidth (uint32_t)
Tell whether a UTF-32 character is fullwidth.
Detailed Description
These functions perform conversions between common character sets.
Function Documentation
__extern uint32_t caca_utf8_to_utf32 (char const *s, size_t *bytes)
Convert a UTF-8 character read from a string and return its value in the UTF-32 character set. If the second argument is not null, the total number of bytes read is written to it.
If a null byte is reached before the expected end of the UTF-8 sequence, this function returns zero and the number of bytes read is set to zero.
This function never fails, but its behaviour with illegal UTF-8 sequences is undefined.
Parameters:
s A string containing the UTF-8 character.
bytes A pointer to a size_t to store the number of bytes in the character, or NULL.
Returns:
The corresponding UTF-32 character, or zero if the character is incomplete.
Referenced by caca_put_str().
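The contract described above can be illustrated with a small stand-alone decoder. This is a sketch under stated assumptions, not libcaca's source: `sketch_utf8_to_utf32` is a hypothetical name, and because the man page leaves illegal sequences undefined, the choice to return zero for an invalid lead byte is this sketch's own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical re-implementation of the documented contract (not libcaca
 * code): decode one UTF-8 character from s, return its code point, and
 * store the number of bytes consumed in *bytes when bytes is not NULL. */
uint32_t sketch_utf8_to_utf32(const char *s, size_t *bytes)
{
    const unsigned char *p = (const unsigned char *)s;
    uint32_t ch;
    size_t len, i;

    assert(s != NULL);

    /* The lead byte gives the sequence length and the first payload bits. */
    if (p[0] == 0x00)               { len = 0; ch = 0; }            /* empty input */
    else if (p[0] < 0x80)           { len = 1; ch = p[0]; }
    else if ((p[0] & 0xE0) == 0xC0) { len = 2; ch = p[0] & 0x1F; }
    else if ((p[0] & 0xF0) == 0xE0) { len = 3; ch = p[0] & 0x0F; }
    else if ((p[0] & 0xF8) == 0xF0) { len = 4; ch = p[0] & 0x07; }
    else                            { len = 0; ch = 0; }            /* assumption: invalid lead byte yields 0 */

    /* Each continuation byte contributes six more bits. */
    for (i = 1; i < len; i++) {
        if (p[i] == 0x00) {         /* null byte before the sequence ended */
            len = 0;
            ch = 0;
            break;
        }
        ch = (ch << 6) | (uint32_t)(p[i] & 0x3F);
    }

    if (bytes != NULL)
        *bytes = len;
    return ch;
}
```

A caller can walk a string by adding the stored byte count to its pointer after each call and stopping when the count is zero.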
__extern size_t caca_utf32_to_utf8 (char *buf, uint32_t ch)
Convert a UTF-32 character and write its value in the UTF-8 character set into the given buffer.
This function never fails, but its behaviour with illegal UTF-32 characters is undefined.
Parameters:
buf A pointer to a character buffer where the UTF-8 sequence will be written.
ch The UTF-32 character.
Returns:
The number of bytes written.
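The reverse direction can be sketched the same way. Again, `sketch_utf32_to_utf8` is a hypothetical stand-in and not libcaca's implementation. The sketch assumes the caller provides at least four bytes, writes no terminating null, and returns zero for values above U+10FFFF, a case the man page leaves undefined.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical re-implementation of the documented contract (not libcaca
 * code): encode ch as UTF-8 into buf and return the number of bytes written.
 * Assumptions: buf holds at least 4 bytes; no terminating null is written;
 * surrogate code points are not rejected. */
size_t sketch_utf32_to_utf8(char *buf, uint32_t ch)
{
    unsigned char *p = (unsigned char *)buf;

    assert(buf != NULL);

    if (ch < 0x80) {
        p[0] = (unsigned char)ch;
        return 1;
    }
    if (ch < 0x800) {
        p[0] = (unsigned char)(0xC0 | (ch >> 6));
        p[1] = (unsigned char)(0x80 | (ch & 0x3F));
        return 2;
    }
    if (ch < 0x10000) {
        p[0] = (unsigned char)(0xE0 | (ch >> 12));
        p[1] = (unsigned char)(0x80 | ((ch >> 6) & 0x3F));
        p[2] = (unsigned char)(0x80 | (ch & 0x3F));
        return 3;
    }
    if (ch < 0x110000) {
        p[0] = (unsigned char)(0xF0 | (ch >> 18));
        p[1] = (unsigned char)(0x80 | ((ch >> 12) & 0x3F));
        p[2] = (unsigned char)(0x80 | ((ch >> 6) & 0x3F));
        p[3] = (unsigned char)(0x80 | (ch & 0x3F));
        return 4;
    }
    return 0; /* assumption: out-of-range input writes nothing */
}
```

Using the return value as the advance count lets a caller append successive characters to one output buffer.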
__extern uint8_t caca_utf32_to_cp437 (uint32_t ch)
Convert a UTF-32 character and return its value in the CP437 character set, or '?' if the character has no equivalent.
This function never fails.
Parameters:
ch The UTF-32 character.
Returns:
The corresponding CP437 character, or '?' if not representable.
__extern uint32_t caca_cp437_to_utf32 (uint8_t ch)
Convert a CP437 character and return its value in the UTF-32 character set, or zero if the character is a CP437 control character.
This function never fails.
Parameters:
ch The CP437 character.
Returns:
The corresponding UTF-32 character, or zero if not representable.
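The two CP437 conversions are table lookups in opposite directions. The sketch below shows the shape of such a lookup with a deliberately tiny table. The function names and `sample_map` are hypothetical, the table holds only eight of the 128 upper-half entries, and treating every value below 0x20 as a control character is an assumption of this sketch, not a statement about libcaca's table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A few well-known CP437 code points; a real table covers all of 0x80-0xFF. */
static const struct { uint8_t cp437; uint32_t ucs; } sample_map[] = {
    { 0x80, 0x00C7 }, /* LATIN CAPITAL LETTER C WITH CEDILLA */
    { 0x81, 0x00FC }, /* LATIN SMALL LETTER U WITH DIAERESIS */
    { 0x82, 0x00E9 }, /* LATIN SMALL LETTER E WITH ACUTE */
    { 0xB0, 0x2591 }, /* LIGHT SHADE */
    { 0xB1, 0x2592 }, /* MEDIUM SHADE */
    { 0xB2, 0x2593 }, /* DARK SHADE */
    { 0xDB, 0x2588 }, /* FULL BLOCK */
    { 0xE1, 0x00DF }, /* LATIN SMALL LETTER SHARP S */
};
#define SAMPLE_COUNT (sizeof sample_map / sizeof sample_map[0])

/* Hypothetical sketch: returns '?' when ch is not representable. */
uint8_t sketch_utf32_to_cp437(uint32_t ch)
{
    size_t i;

    if (ch >= 0x20 && ch < 0x7F)       /* printable ASCII is shared */
        return (uint8_t)ch;
    for (i = 0; i < SAMPLE_COUNT; i++) {
        assert(sample_map[i].cp437 >= 0x80); /* table holds upper half only */
        if (sample_map[i].ucs == ch)
            return sample_map[i].cp437;
    }
    return '?';
}

/* Hypothetical sketch: returns zero for control codes and, because the
 * sample table is partial, for any upper-half value it does not list. */
uint32_t sketch_cp437_to_utf32(uint8_t ch)
{
    size_t i;

    if (ch < 0x20)                     /* assumption: treated as control */
        return 0;
    if (ch < 0x7F)
        return ch;
    for (i = 0; i < SAMPLE_COUNT; i++) {
        if (sample_map[i].cp437 == ch)
            return sample_map[i].ucs;
    }
    return 0;
}
```

Note the asymmetry in the failure values: the Unicode-to-CP437 direction reports '?', while the CP437-to-Unicode direction reports zero.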
__extern char caca_utf32_to_ascii (uint32_t ch)
Convert a UTF-32 character into an ASCII character. When no equivalent exists, a graphically close equivalent is sought.
This function never fails, but its behaviour with illegal UTF-32 characters is undefined.
Parameters:
ch The UTF-32 character.
Returns:
The corresponding ASCII character, or a graphically close equivalent if found, or '?' if not representable.
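What counts as a "graphically close equivalent" is up to the library. As an illustration only, the sketch below picks one uncontroversial case: the Unicode fullwidth forms U+FF01 to U+FF5E are the ASCII characters 0x21 to 0x7E shifted by 0xFEE0. `sketch_utf32_to_ascii` is a hypothetical name, and the set of fallbacks it handles is this sketch's choice, not libcaca's.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch (not libcaca code): map a code point to ASCII,
 * falling back to a look-alike where one is obvious, else '?'. */
char sketch_utf32_to_ascii(uint32_t ch)
{
    if (ch < 0x80)                      /* already ASCII */
        return (char)ch;

    if (ch >= 0xFF01 && ch <= 0xFF5E) { /* fullwidth forms of '!'..'~' */
        uint32_t ascii = ch - 0xFEE0;
        assert(ascii >= 0x21 && ascii <= 0x7E);
        return (char)ascii;
    }

    if (ch == 0x3000)                   /* IDEOGRAPHIC SPACE */
        return ' ';

    return '?';                         /* no fallback known to this sketch */
}
```

A fuller table would add further look-alikes, but the structure stays the same: an exact match first, then fallbacks, then '?'.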
__extern int caca_utf32_is_fullwidth (uint32_t ch)
Check whether the given UTF-32 character should be printed at twice the normal width (fullwidth characters). If the character is unknown or if its status cannot be decided, it is treated as a standard-width character.
This function never fails.
Parameters:
ch The UTF-32 character.
Returns:
1 if the character is fullwidth, 0 otherwise.
Referenced by caca_put_char(), and caca_put_str().
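A width check of this kind is usually a range test against blocks that East Asian terminals draw two cells wide. The sketch below uses a short, incomplete list of such blocks. `sketch_utf32_is_fullwidth` and its range table are hypothetical and chosen for illustration, so its answers can differ from libcaca's for characters outside the listed blocks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch (not libcaca code): return 1 when ch falls in one of
 * a few blocks that are conventionally double-width, 0 otherwise. Unknown
 * characters fall through to 0, matching the documented default. */
int sketch_utf32_is_fullwidth(uint32_t ch)
{
    static const struct { uint32_t first, last; } wide[] = {
        { 0x3041, 0x3096 }, /* Hiragana letters */
        { 0x30A1, 0x30FA }, /* Katakana letters */
        { 0x4E00, 0x9FFF }, /* CJK Unified Ideographs */
        { 0xAC00, 0xD7A3 }, /* Hangul Syllables */
        { 0xFF01, 0xFF60 }, /* Fullwidth Forms */
    };
    size_t i;

    for (i = 0; i < sizeof wide / sizeof wide[0]; i++) {
        assert(wide[i].first <= wide[i].last); /* table sanity check */
        if (ch >= wide[i].first && ch <= wide[i].last)
            return 1;
    }
    return 0;
}
```

A renderer would advance its cursor by two cells when the result is 1 and by one cell otherwise.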
Author
Generated automatically by Doxygen for libcaca from the source code.
Version 0.99.beta18 Fri Apr 6 2012 libcaca character set conversions(3caca)