Convert UTF-8 encoded hex value to a character Post: 302252175


Post 302252175 by cbkihong on Tuesday 28th of October 2008 10:31:09 PM


I am not too sure about MIME::Base64, as I have not used it before. However, Base64 itself is encoding-agnostic: it encodes and decodes without regard to the character encoding of the original message. That is because it is used not only for textual data but also for images, zip files, or just about any binary data you can imagine, none of which has the notion of an "encoding" at all. So what Base64 sees and acts on is just a bytestream. It doesn't really care what is inside.

So, for a text message:

Code:

               encoding             Base64 encoding
Text content ------------> bytestream ------------> Base64-encoded message

                         Base64 decoding            decoding
Base64-encoded message ------------> bytestream ------------> Text content
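
The two pipelines above can be sketched in a few lines. This is only an illustration, written in Python rather than Perl because the idea is language-neutral; the sample string and the variable names are my own assumptions, not something taken from the original question.

```python
import base64

# Hypothetical sample text containing non-ASCII characters.
text = "Grüße, 世界"

# Sending side: text -> bytestream -> Base64-encoded message
raw_bytes = text.encode("utf-8")        # the "encoding" step
message = base64.b64encode(raw_bytes)   # Base64 only ever sees bytes

# Receiving side: Base64-encoded message -> bytestream -> text
decoded_bytes = base64.b64decode(message)     # still bytes, not text
decoded_text = decoded_bytes.decode("utf-8")  # the explicit "decoding" step

print(decoded_text == text)
```

Note that Base64 decoding only gets you back to the bytestream. Turning those bytes into text is a separate step, and it only works if you name the same encoding that was used on the sending side.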

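In other words, you still need to handle the decoding yourself so that Perl decodes the bytes as UTF-8 properly (the core Encode module is the usual tool for this). By default, Perl does not assume UTF-8: it treats data as plain bytes, effectively Latin-1 when it has to interpret them as characters, so that may explain why you get the wrong output.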
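
To illustrate why the explicit decoding step matters, here is a hedged sketch, again in Python rather than Perl. The hex value e4b8ad is a sample I picked (it is the UTF-8 encoding of U+4E2D), not the value from the original question, and Latin-1 merely stands in for "some single-byte interpretation".

```python
# Hypothetical input: a UTF-8 encoded character given as a hex string.
hex_value = "e4b8ad"

raw_bytes = bytes.fromhex(hex_value)   # three bytes: 0xE4 0xB8 0xAD

# Correct: tell the decoder that the bytes are UTF-8.
char = raw_bytes.decode("utf-8")

# Wrong: treating each byte as a separate one-byte character
# (Latin-1 here) yields three unrelated characters.
garbled = raw_bytes.decode("latin-1")

print(len(char), hex(ord(char)))   # 1 0x4e2d
print(len(garbled))                # 3
```

The bytes are identical in both cases. Only the interpretation differs, which is why the output looks wrong whenever the UTF-8 decoding step is skipped.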

Perl has specific quirks with respect to Unicode, and they depend very much on the version of Perl you are using. I have investigated Unicode support in the Perl 5.8 branch rather thoroughly, but I am not sure whether anything has changed in 5.10. If you have Perl 5.6 or earlier, chances are that its Unicode support is not adequate to ensure Unicode safety.

I cannot explain much more in so little space here. I recommend that you start with the perluniintro manpage for further information:

perluniintro - perldoc.perl.org

You will need to provide more information about what is going on on the Perl side if you would like to pursue this in a more constructive manner.

Last edited by cbkihong; 10-28-2008 at 11:48 PM.. Reason: typo

cbkihong


10 More Discussions You Might Find Interesting

1. Programming

Howto convert Ascii -> UTF-8 & back C++

While working with Russian text under FreeBSD and MySQL, I need to convert a string from MySQL to the Unicode format. I have just started with C++ under FreeBSD, so please explain how I can get the ASCII code of a char variable, and also how I can get a character into a variable with the specified ascii...

2. Red Hat

Can't convert 7bit ASCII to UTF-8

Hello, I am trying to convert a 7bit ASCII file to UTF-8. I have used iconv before though it can't recognize it for some reason and says unknown file encoding. When I used ascii2uni package with different package, ./ascii2uni -a K -a I -a J -a X test_file > new_test_file It still...

3. Shell Programming and Scripting

How to modify character to UTF-8 in shell script?

I have a shell script running to load some data from a text file to database. Text file contains some non-ASCII characters like �. How can i convert these characters to UTF-8 codes before loading to DB.

4. Shell Programming and Scripting

Convert hex to decimal

Can someone help me convert hex streams to decimal values using a Perl script? Hex value: $my_hex_stream="0c07ac14001676"; Every hex value in the above stream should be converted to decimal and separated by commas. The output should be: 12,07,172,20,00,22,118

5. UNIX for Dummies Questions & Answers

Issue with UTF-8 BOM character in text file

Sometimes we receive Excel files containing French/Japanese characters by mail, and these files are manually transferred to the server using SFTP (security is not a huge concern here). The data is changed to text format using Notepad before it is transferred. Problem is: When saving...

6. Linux

Help to Convert file from UNIX UTF-8 to Windows UTF-16

Hi, I have tried to convert a UTF-8 file to a Windows UTF-16 file from a Unix machine, as below: unix2dos < testing.txt | iconv -f UTF-8 -t UTF-16 > out.txt and I am getting some Chinese characters, as below, when I open the converted file on a Windows machine. LANG=en_US.UTF-8...

7. Shell Programming and Scripting

Trying to convert utf-8 to WINDOWS-1251

Hello all, I have a UTF-8 file that I am trying to convert to WINDOWS-1251 on Linux without any success. The file name is utf-8. When I run: file -bi test.txt it gives me: text/plain; charset=utf-8. When I try to convert the file I run: /usr/bin/iconv -f UTF-8 -t WINDOWS-1251 test.txt >...

8. UNIX for Advanced & Expert Users

UTF-8,16,32 character lengths using awk

Hi All, I am trying to obtain count of characters using awk, but "length" function returns a value of 1 for 2-byte or 3-byte characters as well unlike wc -c command. I have tried to use the below commands within awk function, but it does not seem to work { cmd="wc -c "stringtocheck ( cmd )...

9. Shell Programming and Scripting

Convert UTF-8 file to ASCII/ISO8859-1 OR replace characters

I am trying to develop a script which will work on a source UTF-8 file and perform one or more of the following It will accept the target encoding as an argument e.g. US-ASCII or ISO-8859-1, etc 1. It should replace all occurrences of characters outside target character set by " " (space) or...

10. UNIX for Beginners Questions & Answers

Convert files to UTF-8 on AIX 7.1

Dears, I have a shell script - working perfectly on Oracle Linux - that detects the encoding (the charset to be exact) of the files in a specified directory using the "file" command (The file command outputs the charset in Linux, but doesn't do that in AIX), then if the file isn't a UTF-8 text...

LEARN ABOUT PLAN9

ascii

ASCII(1)						      General Commands Manual							  ASCII(1)

NAME

       ascii, unicode - interpret ASCII, Unicode characters

SYNOPSIS

       ascii [ -8 ] [ -oxdbn ] [ -nct ] [ text ]

       unicode [ -nt ] hexmin-hexmax

       unicode [ -t ] hex [ ...  ]

       unicode [ -n ] characters

       look hex /lib/unicode

DESCRIPTION

       Ascii prints the ASCII values corresponding to characters and vice versa; under the -8 option, the ISO Latin-1 extensions (codes 0200-0377)
       are included.  The values are interpreted in a settable numeric base; -o specifies octal, -d decimal, -x hexadecimal (the default), and -bn
       base n.

       With no arguments, ascii prints a table of the character set in the specified base.  Characters of text are converted to their ASCII
       values, one per line.  If, however, the first text argument is a valid number in the specified base, conversion goes the opposite way.
       Control characters are printed as two- or three-character mnemonics.  Other options are:

       -n     Force numeric output.

       -c     Force character output.

       -t     Convert from numbers to running text; do not interpret control characters or insert newlines.

       Unicode is similar; it converts between UTF and character values from the Unicode Standard (see utf(6)).  If given a range of hexadecimal
       numbers, unicode prints a table of the specified Unicode characters -- their values and UTF representations.  Otherwise it translates from
       UTF to numeric value or vice versa, depending on the appearance of the supplied text; the -n option forces numeric output to avoid
       ambiguity with numeric characters.  If converting to UTF, the characters are printed one per line unless the -t flag is set, in which case
       the output is a single string containing only the specified characters.  Unlike ascii, unicode treats no characters specially.

       The output of ascii and unicode may be unhelpful if the characters printed are not available in the current font.

       The  file /lib/unicode contains a table of characters and descriptions, sorted in hexadecimal order, suitable for look(1) on the lower case
       hex values of characters.

EXAMPLES

       ascii -d
	      Print the ASCII table base 10.

       unicode p
	      Print the hex value of `p'.

       unicode 2200-22f1
	      Print a table of miscellaneous mathematical symbols.

       look 039 /lib/unicode
	      See the start of the Greek alphabet's encoding in the Unicode Standard.

FILES

       /lib/unicode
	      table of characters and descriptions.

SOURCE

       /sys/src/cmd/ascii.c
       /sys/src/cmd/unicode.c

SEE ALSO

       look(1), tcs(1), utf(6), font(6)

																	  ASCII(1)
