I am not too sure about MIME::Base64, as I have not used it before. However, Base64 itself is encoding-agnostic: it encodes and decodes without regard to the encoding of the original message. It is used not only for text but also for images, zip files and just about any binary data you can imagine, which have no notion of an "encoding" at all. So what Base64 sees and acts on is just a byte stream; it doesn't care what is inside.
So, for a text message:
Code:
Text content --(character encoding)--> byte stream --(Base64 encoding)--> Base64-encoded message
Base64-encoded message --(Base64 decoding)--> byte stream --(character decoding)--> Text content
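In other words, you still need to decode the bytes as UTF-8 yourself (for example with the decode function from Perl's Encode module) before Perl will treat them as proper UTF-8 text. By default, Perl does not know that the bytes are UTF-8 and treats each byte as a separate character, which may explain why your output is wrong.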
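The byte-stream point is easier to see in code. Here is a small sketch in Python rather than Perl; the sample text and variable names are illustrative. Base64 round-trips the bytes exactly. Only the final step, decoding those bytes as UTF-8, turns them back into the original text. Reading each byte as its own character instead (Python's Latin-1 codec does exactly that) gives one plausible form of "wrong output".

```python
import base64

text = "Привет, мир"                 # sample text (a Unicode string)
raw = text.encode("utf-8")           # character encoding: text -> byte stream
b64 = base64.b64encode(raw)          # Base64 encoding: byte stream -> ASCII-safe bytes

decoded = base64.b64decode(b64)      # Base64 decoding returns the identical bytes,
assert decoded == raw                # with no idea what those bytes represent

wrong = decoded.decode("latin-1")    # one character per byte: garbled text
right = decoded.decode("utf-8")      # proper character decoding: original text

print(b64.decode("ascii"))
print(len(text), len(raw), len(wrong))   # 11 characters, 20 bytes, 20 "characters"
```

The length mismatch (11 vs 20) is the usual telltale sign that UTF-8 bytes are being treated as single-byte characters.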
Perl has specific quirks with respect to Unicode, and they depend heavily on the version of Perl you are using. I have investigated Perl's Unicode support in the 5.8 branch fairly thoroughly, but I am not sure whether anything changed in 5.10. If you have Perl 5.6 or earlier, chances are its Unicode support is not adequate to ensure Unicode safety.
I can't explain much in so little space here. I recommend starting with the perluniintro manpage for further information.
While working with Russian text under FreeBSD and MySQL, I need to convert a string from MySQL to Unicode.
I've just started out with C++ under FreeBSD, so please explain how I can get the ASCII code of a char variable, and also how I can put a character with a specified ASCII code into a variable...
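Here is a sketch of both parts in Python, not C++. It assumes the MySQL column holds KOI8-R bytes, which is a historically common Russian charset on Unix systems; check your actual column and connection charset first. The code decodes those bytes into a Unicode string and re-encodes it as UTF-8. For the second question, ord() and chr() convert between a character and its numeric code. In C++ the rough equivalent is converting a char to int and back.

```python
# Assumption: the MySQL column holds KOI8-R bytes; "koi8_r" is Python's codec name.
koi8_bytes = "Москва".encode("koi8_r")   # stand-in for bytes fetched from MySQL

text = koi8_bytes.decode("koi8_r")       # legacy charset bytes -> Unicode string
utf8_bytes = text.encode("utf-8")        # Unicode string -> UTF-8 bytes

# Character <-> numeric code
code = ord("A")          # 65
ch = chr(65)             # "A"
cyr = ord("\u041c")      # 1052: Cyrillic capital EM is a Unicode code point, not ASCII
```

Note that Cyrillic letters have no ASCII code at all. Asking for "the ASCII code" only makes sense for characters 0 to 127.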
Hello,
I am trying to convert a 7-bit ASCII file to UTF-8.
I have used iconv before, but it doesn't recognize the file for some reason and says "unknown file encoding".
When I used the ascii2uni package with different options, ./ascii2uni -a K -a I -a J -a X test_file > new_test_file
It still...
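It is worth noting that every 7-bit ASCII file is already valid UTF-8, byte for byte, so there may be nothing to convert. A minimal Python check (the function name is illustrative) confirms two things: every byte is below 0x80, and decoding as ASCII and as UTF-8 gives identical text.

```python
def is_seven_bit_ascii(data: bytes) -> bool:
    """True if every byte is below 0x80, i.e. plain 7-bit ASCII."""
    return all(b < 0x80 for b in data)

sample = b"Hello, world\n"   # illustrative file contents
# 7-bit ASCII bytes are already valid UTF-8 and decode to the same text,
# so such a file needs no conversion at all.
assert is_seven_bit_ascii(sample)
assert sample.decode("ascii") == sample.decode("utf-8")
```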
I have a shell script that loads some data from a text file into a database. The text file contains some non-ASCII characters like ü. How can I convert these characters to UTF-8 before loading them into the DB?
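The key missing piece is the file's current encoding. Assuming it is ISO-8859-1 (Latin-1), which is only a guess here, re-encoding means a decode followed by an encode. The shell equivalent would be `iconv -f ISO-8859-1 -t UTF-8`. The helper name below is hypothetical.

```python
SOURCE_ENCODING = "iso-8859-1"   # assumption: adjust to the file's real encoding

def to_utf8(raw: bytes, source_encoding: str = SOURCE_ENCODING) -> bytes:
    """Re-encode bytes from source_encoding to UTF-8."""
    return raw.decode(source_encoding).encode("utf-8")

latin1 = b"M\xfcller"    # "Müller" in Latin-1: ü is the single byte 0xFC
utf8 = to_utf8(latin1)   # in UTF-8, ü becomes the two bytes 0xC3 0xBC
```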
Can someone help me convert hex streams to decimal values using a Perl script?
Hex value:
$my_hex_stream="0c07ac14001676";
Every hex byte (pair of hex digits) in the above stream should be converted to decimal, with the values separated by commas.
The output should be: 12,07,172,20,00,22,118
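Here is a Python sketch of the requested conversion; the function name is hypothetical. It splits the string into two-digit pairs, parses each pair as base 16, and formats the result zero-padded to two digits. That padding is what produces the "07" and "00" in the expected output.

```python
def hex_stream_to_decimal(stream: str) -> str:
    """Split a hex string into 2-digit bytes and join their decimal values with commas."""
    if len(stream) % 2:
        raise ValueError("hex stream must have an even number of digits")
    values = (int(stream[i:i + 2], 16) for i in range(0, len(stream), 2))
    # two-digit zero padding reproduces "07" and "00" in the expected output
    return ",".join(f"{v:02d}" for v in values)

my_hex_stream = "0c07ac14001676"
print(hex_stream_to_decimal(my_hex_stream))  # 12,07,172,20,00,22,118
```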
Sometimes we receive Excel files containing French/Japanese characters by mail, and these files are manually transferred to the server using SFTP (security is not a huge concern here). Before the transfer, the data is converted to text format using Notepad.
Problem is: When saving...
Hi,
I tried to convert a UTF-8 file to a Windows UTF-16 file on a Unix machine as below:
unix2dos < testing.txt | iconv -f UTF-8 -t UTF-16 > out.txt
and I get some Chinese characters, as below, when I open the converted file on a Windows machine.
LANG=en_US.UTF-8...
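This is offered as a possible explanation, not a diagnosis of this truncated post. One common way stray CJK characters appear is when bytes are read as UTF-16 even though they are not laid out that way: two ASCII bytes read as one little-endian UTF-16 unit land in the CJK range. The Python sketch below writes what many Windows programs expect: a little-endian byte order mark, UTF-16LE code units and CRLF line endings. It also shows the misreading. The helper name is illustrative.

```python
import codecs

def to_windows_utf16(text: str) -> bytes:
    """BOM + UTF-16LE + CRLF line endings, a layout many Windows tools expect."""
    crlf = text.replace("\r\n", "\n").replace("\n", "\r\n")
    return codecs.BOM_UTF16_LE + crlf.encode("utf-16-le")

data = to_windows_utf16("ab\n")   # b'\xff\xfea\x00b\x00\r\x00\n\x00'

# Two ASCII bytes read as one UTF-16LE unit form a CJK code point: b"ab" -> U+6261.
misread = b"ab".decode("utf-16-le")
print(hex(ord(misread)))          # 0x6261
```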
Hello all,
I have a UTF-8 file that I am trying to convert to WINDOWS-1251 on Linux,
without any success.
The file is UTF-8; when I run:
file -bi test.txt
it gives me:
text/plain; charset=utf-8
To convert the file, I run:
/usr/bin/iconv -f UTF-8 -t WINDOWS-1251 test.txt >...
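Because the post is cut off, the exact failure is unknown. A frequent cause is that the UTF-8 text contains characters with no Windows-1251 equivalent, which makes a strict conversion stop. The usual workarounds in GNU iconv are the -c option (omit such characters) or a //TRANSLIT suffix on the target encoding. Here is the same behaviour in Python, with a check mark as the example of an unrepresentable character; the helper name is illustrative.

```python
def to_cp1251(text: str, errors: str = "strict") -> bytes:
    """Encode text as Windows-1251; errors controls unrepresentable characters."""
    return text.encode("cp1251", errors=errors)

ok = to_cp1251("Привет")                      # Cyrillic maps 1:1 into Windows-1251
try:
    to_cp1251("Привет \u2713")                # the check mark has no Windows-1251 code
except UnicodeEncodeError as exc:
    print("cannot convert:", exc.reason)
lossy = to_cp1251("Привет \u2713", errors="replace")   # unrepresentable -> b"?"
```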
Hi All,
I am trying to obtain a count of characters using awk, but the "length" function returns 1 for 2-byte or 3-byte characters, unlike the wc -c command.
I have tried to use the commands below within awk, but it does not seem to work:
{
cmd="wc -c "stringtocheck
( cmd )...
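The difference is characters versus bytes. In a UTF-8 locale, awk implementations such as GNU awk count characters, while wc -c counts bytes. GNU awk's -b (--characters-as-bytes) option is one way to get byte semantics. Here is a Python sketch of the two counts; the function name is illustrative.

```python
from typing import Tuple

def char_and_byte_counts(s: str) -> Tuple[int, int]:
    """Return (characters, UTF-8 bytes) for s."""
    return len(s), len(s.encode("utf-8"))

print(char_and_byte_counts("na\u00efve \u20ac"))   # (7, 10): ï is 2 bytes, € is 3
```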
I am trying to develop a script that will work on a source UTF-8 file and perform one or more of the following.
It will accept the target encoding as an argument, e.g. US-ASCII or ISO-8859-1.
1. It should replace all occurrences of characters outside the target character set with " " (space) or...
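Here is a minimal sketch of requirement 1 in Python, with the target encoding passed as a Python codec name. The function name and the filler default are illustrative. Each character is tried against the target encoding and replaced with a space if it cannot be encoded.

```python
def replace_unencodable(text: str, target: str, filler: str = " ") -> str:
    """Replace each character that the target encoding cannot represent with filler."""
    out = []
    for ch in text:
        try:
            ch.encode(target)          # raises if target has no code for ch
            out.append(ch)
        except UnicodeEncodeError:
            out.append(filler)
    return "".join(out)

sample = "caf\u00e9 \u0175"                              # "café ŵ"
ascii_only = replace_unencodable(sample, "us-ascii")     # é and ŵ become spaces
latin1_ok = replace_unencodable(sample, "iso-8859-1")    # é kept, ŵ replaced
```

Checking one character at a time keeps the logic simple. It assumes a stateless target encoding, which holds for US-ASCII and the ISO-8859 family.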
Dear all,
I have a shell script, working perfectly on Oracle Linux, that detects the encoding (the charset, to be exact) of the files in a specified directory using the "file" command (the file command outputs the charset on Linux, but doesn't on AIX); then, if the file isn't a UTF-8 text...
Discussion started by: JeanM-1
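One portable alternative to relying on file's charset output is to try a strict UTF-8 decode of each file. This Python sketch assumes Python is available on the AIX box. Passing the check only means the bytes are valid UTF-8 (plain ASCII passes too), not that UTF-8 was intended. The helper names are hypothetical, and each file is read fully into memory.

```python
from pathlib import Path
from typing import List

def looks_like_utf8(data: bytes) -> bool:
    """True if data is valid UTF-8 (plain ASCII also passes)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

def non_utf8_files(directory: str) -> List[Path]:
    """Regular files in directory whose contents are not valid UTF-8."""
    return [p for p in sorted(Path(directory).iterdir())
            if p.is_file() and not looks_like_utf8(p.read_bytes())]
```

Files flagged this way could then be handed to iconv once their real source encoding is known.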
LEARN ABOUT PLAN9
ascii
ASCII(1)                General Commands Manual                ASCII(1)
NAME
ascii, unicode - interpret ASCII, Unicode characters
SYNOPSIS
ascii [ -8 ] [ -oxdbn ] [ -nct ] [ text ]
unicode [ -nt ] hexmin-hexmax
unicode [ -t ] hex [ ... ]
unicode [ -n ] characters
look hex /lib/unicode
DESCRIPTION
Ascii prints the ASCII values corresponding to characters and vice versa; under the -8 option, the ISO Latin-1 extensions (codes 0200-0377) are included. The values are interpreted in a settable numeric base; -o specifies octal, -d decimal, -x hexadecimal (the default), and -bn base n.
With no arguments, ascii prints a table of the character set in the specified base. Characters of text are converted to their ASCII values, one per line. If, however, the first text argument is a valid number in the specified base, conversion goes the opposite way. Control characters are printed as two- or three-character mnemonics. Other options are:
-n Force numeric output.
-c Force character output.
-t Convert from numbers to running text; do not interpret control characters or insert newlines.
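The character-to-value conversion that ascii performs can be sketched in Python. This is an illustration, not Plan 9's implementation. It omits the -8 option and the control-character mnemonics, and the helper names are hypothetical.

```python
from typing import List

DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"

def to_base(n: int, base: int = 16) -> str:
    """Render a non-negative integer in base 2..36 (hex by default, as in ascii)."""
    if not 2 <= base <= 36:
        raise ValueError("base must be between 2 and 36")
    if n == 0:
        return "0"
    out = []
    while n:
        n, r = divmod(n, base)
        out.append(DIGITS[r])
    return "".join(reversed(out))

def chars_to_values(text: str, base: int = 16) -> List[str]:
    """One value per character of text."""
    return [to_base(ord(c), base) for c in text]

def value_to_char(value: str, base: int = 16) -> str:
    """The opposite direction: a number in the given base back to a character."""
    return chr(int(value, base))
```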
Unicode is similar; it converts between UTF and character values from the Unicode Standard (see utf(6)). If given a range of hexadecimal numbers, unicode prints a table of the specified Unicode characters -- their values and UTF representations. Otherwise it translates from UTF to numeric value or vice versa, depending on the appearance of the supplied text; the -n option forces numeric output to avoid ambiguity with numeric characters. If converting to UTF, the characters are printed one per line unless the -t flag is set, in which case the output is a single string containing only the specified characters. Unlike ascii, unicode treats no characters specially.
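Here is a rough Python sketch of unicode's range table, pairing each code point with its UTF-8 bytes in hex. The output layout is invented for illustration. Surrogate code points, which have no UTF-8 form, are skipped.

```python
from typing import List, Tuple

def unicode_table(lo: int, hi: int) -> List[Tuple[str, str, str]]:
    """(hex code point, character, UTF-8 bytes in hex) for each code point lo..hi."""
    rows = []
    for cp in range(lo, hi + 1):
        if 0xD800 <= cp <= 0xDFFF:   # surrogates have no UTF-8 encoding
            continue
        ch = chr(cp)
        rows.append((f"{cp:04x}", ch, ch.encode("utf-8").hex()))
    return rows

for code, _ch, utf8 in unicode_table(0x2200, 0x2203):
    print(code, utf8)
```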
The output of ascii and unicode may be unhelpful if the characters printed are not available in the current font.
The file /lib/unicode contains a table of characters and descriptions, sorted in hexadecimal order, suitable for look(1) on the lower case
hex values of characters.
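look(1) works on /lib/unicode because the file is sorted. All lines sharing a prefix are contiguous, so a binary search finds the first one. Here is a Python sketch of that prefix lookup using bisect. The table lines are a hypothetical excerpt, not the real file format.

```python
import bisect
from typing import List

def look(prefix: str, sorted_lines: List[str]) -> List[str]:
    """Lines beginning with prefix, found by binary search in a sorted list."""
    start = bisect.bisect_left(sorted_lines, prefix)
    matches = []
    for line in sorted_lines[start:]:
        if not line.startswith(prefix):
            break      # sorted order: no later line can match
        matches.append(line)
    return matches

# Hypothetical excerpt; the real /lib/unicode layout may differ.
table = sorted([
    "0391 greek capital letter alpha",
    "0392 greek capital letter beta",
    "03a3 greek capital letter sigma",
    "0400 cyrillic capital letter ie with grave",
])
```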
EXAMPLES
ascii -d
Print the ASCII table base 10.
unicode p
Print the hex value of `p'.
unicode 2200-22f1
Print a table of miscellaneous mathematical symbols.
look 039 /lib/unicode
See the start of the Greek alphabet's encoding in the Unicode Standard.
FILES
/lib/unicode
table of characters and descriptions.
SOURCE
/sys/src/cmd/ascii.c
/sys/src/cmd/unicode.c
SEE ALSO
look(1), tcs(1), utf(6), font(6)