02-27-2018
If you don't know which codeset was used to encode a file, there is little you can do beyond making an educated guess.
It is easy to guess that a file is plain ASCII if no byte has the high-order bit set and there are no NUL bytes. It is easy to guess that it might be UTF-16 if every other byte is a NUL byte, as happens when the text is mostly ASCII characters. Guessing that some text is encoded in one of the EBCDIC codesets might not be too hard, but correctly guessing which variant is another matter. Beyond that, good luck. The differences between the various ISO 8859-* character sets are obvious only if you already know what the text in the file is supposed to say.
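The two easy guesses described above can be sketched in a few lines of Python. This is only a rough illustration: the function name `guess_codeset` and its return labels are made up for this example, and skipping a leading UTF-16 byte order mark is an added assumption rather than something stated in the post. Anything that fails both tests is reported as unknown, which matches the "good luck" case.

```python
def guess_codeset(data: bytes) -> str:
    """Return a rough guess: 'ascii', 'utf-16?', or 'unknown' (hypothetical labels)."""
    if not data:
        return "unknown"

    has_nul = 0 in data
    has_high_bit = any(b >= 0x80 for b in data)

    # No high-order bits and no NUL bytes: plain ASCII is a safe guess.
    if not has_nul and not has_high_bit:
        return "ascii"

    # Assumption: skip a UTF-16 byte order mark if one is present.
    body = data[2:] if data[:2] in (b"\xff\xfe", b"\xfe\xff") else data

    # Mostly-ASCII text in UTF-16 has a NUL in every other byte position.
    if body and len(body) % 2 == 0:
        even_bytes = body[0::2]
        odd_bytes = body[1::2]
        if all(b == 0 for b in even_bytes) or all(b == 0 for b in odd_bytes):
            return "utf-16?"

    # 8859-* variants, EBCDIC flavours, and everything else end up here.
    return "unknown"
```

A Latin-1 file and a Latin-9 file both come back as unknown here, which is the point of the post: byte patterns alone do not say which 8-bit codeset was intended.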
encoding(n) Tcl Built-In Commands encoding(n)
__________________________________________________________________________________________________________________________________________________
NAME
encoding - Manipulate encodings
SYNOPSIS
encoding option ?arg arg ...?
_________________________________________________________________
INTRODUCTION
Strings in Tcl are encoded using 16-bit Unicode characters. Different operating system interfaces or applications may generate strings in
other encodings such as Shift-JIS. The encoding command helps to bridge the gap between Unicode and these other formats.
DESCRIPTION
Performs one of several encoding related operations, depending on option. The legal options are:
encoding convertfrom ?encoding? data
Convert data to Unicode from the specified encoding. The characters in data are treated as binary data where the lower 8 bits of
each character are taken as a single byte. The resulting sequence of bytes is treated as a string in the specified encoding. If
encoding is not specified, the current system encoding is used.
encoding convertto ?encoding? string
Convert string from Unicode to the specified encoding. The result is a sequence of bytes that represents the converted string.
Each byte is stored in the lower 8 bits of a Unicode character. If encoding is not specified, the current system encoding is used.
encoding names
Returns a list containing the names of all of the encodings that are currently available.
encoding system ?encoding?
Set the system encoding to encoding. If encoding is omitted then the command returns the current system encoding. The system
encoding is used whenever Tcl passes strings to system calls.
EXAMPLE
It is common practice to write script files using a text editor that produces output in the euc-jp encoding, which represents the ASCII
characters as single bytes and Japanese characters as two bytes. This makes it easy to embed literal strings that correspond to non-ASCII
characters by simply typing the strings in place in the script. However, because the source command always reads files using the ISO8859-1
encoding, Tcl will treat each byte in the file as a separate character that maps to the 00 page in Unicode. The resulting Tcl strings will
not contain the expected Japanese characters. Instead, they will contain a sequence of Latin-1 characters that correspond to the bytes of
the original string. The encoding command can be used to convert this string to the expected Japanese Unicode characters. For example,
set s [encoding convertfrom euc-jp "\xA4\xCF"]
would return the Unicode string "\u306F", which is the Hiragana letter HA.
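For readers who want to check the byte arithmetic outside Tcl, the same round trip can be sketched in Python. This is an analogy only, not part of the Tcl command: the variable names are illustrative, and Python's `latin-1` and `euc_jp` codecs stand in for the ISO8859-1 and euc-jp encodings.

```python
# The two bytes that an euc-jp editor writes for the Hiragana letter HA.
raw = b"\xA4\xCF"

# Reading them as ISO8859-1 yields two Latin-1 characters, one per byte.
as_latin1 = raw.decode("latin-1")

# Taking the low 8 bits of each character back out and decoding them as
# euc-jp recovers the intended single character.
fixed = as_latin1.encode("latin-1").decode("euc_jp")
```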
SEE ALSO
Tcl_GetEncoding(3)
KEYWORDS
encoding
Tcl 8.1 encoding(n)