01-04-2013
Try running 'od' on it to see if there is a pattern you can recognize. Is it Unicode, EUC, JIS, EBCDIC, BCDIC, or just an odd code page? Hard to say! I use 'od -bc' because I was raised on octal, but there are options for hex and decimal offsets. But yes, really, you should know!
Often, 'C' is linked to ISO-8859-1 (Latin-1), but your file is not that.
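If 'od' is not handy, the same kind of eyeballing can be sketched in Python. This is only a rough illustration, not a real detector: the names dump() and guess_encoding() are made up for this post, and the guess only covers BOMs, pure ASCII, and valid UTF-8. Anything else (EUC, Shift JIS, EBCDIC, a code page) comes back as "unknown-8bit" and you still have to read the bytes yourself.

```python
import codecs

# Byte-order marks, longest first so a UTF-32 BOM is not mistaken for UTF-16.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def dump(data: bytes, width: int = 16) -> str:
    """Return an od-like listing: octal offset, hex bytes, printable ASCII."""
    rows = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        text = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        rows.append(f"{offset:07o}  {hex_part.ljust(width * 3)}{text}")
    return "\n".join(rows)


def guess_encoding(data: bytes) -> str:
    """Very rough guess: BOM, then pure ASCII, then strict UTF-8."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    if all(b < 0x80 for b in data):
        return "ascii"
    try:
        data.decode("utf-8")  # strict decoding rejects malformed sequences
        return "utf-8"
    except UnicodeDecodeError:
        return "unknown-8bit"


if __name__ == "__main__":
    sample = "caf\u00e9\n".encode("utf-8")  # stand-in for bytes read from a file
    print(dump(sample))
    print(guess_encoding(sample))
```

For a real file, read it in binary mode (open(path, "rb")) and pass the first few hundred bytes to both functions.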
Encode::JP(3perl) Perl Programmers Reference Guide Encode::JP(3perl)
NAME
Encode::JP - Japanese Encodings
SYNOPSIS
use Encode qw/encode decode/;
$euc_jp = encode("euc-jp", $utf8); # loads Encode::JP implicitly
$utf8 = decode("euc-jp", $euc_jp); # ditto
ABSTRACT
This module implements Japanese charset encodings. Encodings supported are as follows.
  Canonical      Alias                Description
  --------------------------------------------------------------------
  euc-jp         /euc.*jp$/i          EUC (Extended Unix Character)
                 /jp.*euc/i
                 /ujis$/i
  shiftjis       /shift.*jis$/i       Shift JIS (aka MS Kanji)
                 /sjis$/i
  7bit-jis       /jis$/i              7bit JIS
  iso-2022-jp                         ISO-2022-JP                  [RFC1468]
                                      = 7bit JIS with all Halfwidth Kana
                                        converted to Fullwidth
  iso-2022-jp-1                       ISO-2022-JP-1                [RFC2237]
                                      = ISO-2022-JP with JIS X 0212-1990
                                        support.  See below
  MacJapanese                         Shift JIS + Apple vendor mappings
  cp932          /windows-31j$/i      Code Page 932
                                      = Shift JIS + MS/IBM vendor mappings
  jis0201-raw                         JIS0201, raw format
  jis0208-raw                         JIS0208, raw format
  jis0212-raw                         JIS0212, raw format
  --------------------------------------------------------------------
DESCRIPTION
To find out how to use this module in detail, see Encode.
Note on ISO-2022-JP(-1)?
ISO-2022-JP-1 (RFC2237) is a superset of ISO-2022-JP (RFC1468) which adds support for JIS X 0212-1990. That means you can use the same
code to decode to utf8 but not vice versa.
$utf8 = decode('iso-2022-jp-1', $stream);
and
$utf8 = decode('iso-2022-jp', $stream);
yield the same result but
$with_0212 = encode('iso-2022-jp-1', $utf8);
is now different from
$without_0212 = encode('iso-2022-jp', $utf8 );
In the latter case, characters that map to 0212 are first converted to U+3013 (0xA2AE in EUC-JP; a white square also known as 'Tofu' or
'geta mark') then fed to the encoding engine. U+FFFD is not used, in order to preserve text layout as much as possible.
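
The asymmetry described in this note can be illustrated outside Perl. The following is a minimal sketch using Python's standard codecs iso2022_jp and iso2022_jp_1, not the Encode::JP implementation. The helper encode_jp() is hypothetical, and the geta-mark substitution is done by hand here to imitate the behaviour described above; Python's own codec raises UnicodeEncodeError instead. The sample character U+4E02 is assumed to be covered by JIS X 0212 but not by JIS X 0208.

```python
GETA = "\u3013"  # GETA MARK, the placeholder described in the note above


def encode_jp(text: str, with_0212: bool) -> bytes:
    """Encode to ISO-2022-JP-1 or ISO-2022-JP, replacing unmappable
    characters with the geta mark (done by hand; hypothetical helper)."""
    codec = "iso2022_jp_1" if with_0212 else "iso2022_jp"
    out = []
    for ch in text:
        try:
            ch.encode(codec)  # probe: can this codec represent the character?
            out.append(ch)
        except UnicodeEncodeError:
            out.append(GETA)
    return "".join(out).encode(codec)


if __name__ == "__main__":
    # Decoding a plain ISO-2022-JP stream gives the same text with either codec.
    stream = "\u65e5\u672c\u8a9e".encode("iso2022_jp")
    print(stream.decode("iso2022_jp") == stream.decode("iso2022_jp_1"))

    # Encoding differs once a JIS X 0212 character is involved.
    sample = "\u65e5\u4e02"
    print(encode_jp(sample, with_0212=True))
    print(encode_jp(sample, with_0212=False))
```

Substituting a character of the same width rather than U+FFFD matches the layout-preserving intent stated above.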
BUGS
The ASCII region (0x00-0x7f) is preserved for all encodings, even though this conflicts with mappings by the Unicode Consortium.
SEE ALSO
Encode
perl v5.14.2 2010-12-30 Encode::JP(3perl)