Determining the encoding of a file
UNIX for Dummies Questions & Answers. Post 302751859 by MIA651 on Friday, 4 January 2013, 03:45:10 PM
Quote:
Originally Posted by DGPickett
Well, UTF-8 and Unicode have a pattern in their encoding. The dd command has an EBCDIC decoder that I have used. Might it be from big blue land?

Googling around the subject, one result suggests file -i, another mentions enca (enca(1): detect/convert encoding of text files, Linux man page) and, for Solaris, auto_ef. There is also 'chardet', a Python-based tool.
Yes, I tried file -i and it only tells me it is a regular file. By big blue land, I assume you mean IBM? If so, yes: I am using an AIX machine, so auto_ef and enca are unrecognized commands. I have yet to try chardet... I'll have to dig deeper. Thanks though!
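
For anyone in the same spot (no enca, no auto_ef), the two hints in the quote can be tried with nothing but dd and od. This is only a rough sketch: sniff_bom and peek_ebcdic are names made up for this example, not standard commands. A missing byte-order mark proves nothing (most UTF-8 files have none), and dd's conv=ascii uses its own built-in EBCDIC table, which may not match the exact IBM code page the file was written in.

```shell
#!/bin/sh
# sniff_bom FILE: report a Unicode byte-order mark if the file starts with one.
# Hypothetical helper; it only looks at the first four bytes.
sniff_bom() {
    # od -An -tx1 prints the bytes as hex; tr removes spaces and newlines.
    head_hex=$(dd if="$1" bs=1 count=4 2>/dev/null | od -An -tx1 | tr -d ' \n')
    case "$head_hex" in
        efbbbf*)  echo "UTF-8 (BOM)" ;;
        0000feff) echo "UTF-32BE (BOM)" ;;
        fffe0000) echo "UTF-32LE (BOM)" ;;   # must be tested before fffe*
        feff*)    echo "UTF-16BE (BOM)" ;;
        fffe*)    echo "UTF-16LE (BOM)" ;;
        *)        echo "no BOM" ;;
    esac
}

# peek_ebcdic FILE: decode the first 256 bytes as EBCDIC so you can eyeball
# whether readable text comes out. Hypothetical helper.
peek_ebcdic() {
    dd if="$1" bs=256 count=1 conv=ascii 2>/dev/null
}
```

Usage would be something like `sniff_bom myfile` followed by `peek_ebcdic myfile`. If the second one prints readable words, the big blue guess was probably right.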
 

auto_ef(1)							   User Commands							auto_ef(1)

NAME
     auto_ef - auto encoding finder

SYNOPSIS
     /usr/bin/auto_ef [-e encoding_list] [-a] [-l level] [file ...]

     /usr/bin/auto_ef -h

DESCRIPTION
     The auto_ef utility identifies the encoding of a given file. The utility
     judges the encoding by using the iconv code conversion, determining
     whether a certain code conversion was successful with the file, and also
     by performing frequency analyses on the character sequences that appear
     in the file.

     The auto_ef utility might produce unexpected output if the string is
     binary, a character table, a localized digit list, or a chronogram, or
     if the string or file is very small in size (for example, less than 100
     bytes).

     ASCII
     ISO-2022-JP         JIS
     eucJP               Japanese EUC
     PCK                 Japanese PC Kanji, CP932, Shift JIS
     UTF-8
     ko_KR.euc           Korean EUC
     ko_KR.cp949         Unified Hangul
     ISO-2022-KR         ISO-2022 Korean
     zh_CN.iso2022-CN    ISO-2022 CN/CN-EXT
     zh_CN.euc           Simplified Chinese EUC, GB2312
     GB18030             Simplified Chinese GB18030/GBK
     zh_TW-big5          BIG5
     zh_TW-euc           Traditional Chinese EUC
     zh_TW.hkscs         Hong Kong BIG5
     iso-8859-1          West European, and similar
     iso-8859-2          East European, and similar
     iso-8859-5          Cyrillic, and similar
     iso-8859-6          Arabic
     iso-8859-7          Greek
     iso-8859-8          Hebrew
     CP1250              windows-1250, corresponding to ISO-8859-2
     CP1251              windows-1251, corresponding to ISO-8859-5
     CP1252              windows-1252, corresponding to ISO-8859-1
     CP1253              windows-1253, corresponding to ISO-8859-7
     CP1255              windows-1255, corresponding to ISO-8859-8
     koi8-r              corresponding to iso-8859-5

     By default, auto_ef returns a single, most likely encoding for text in a
     specified file. To get all possible encodings for the file, use the -a
     option.

     Also by default, auto_ef uses the fastest process to examine the file.
     For more accurate results, use the -l option.

     To examine data with a limited set of encodings, use the -e option.

OPTIONS
     The following options are supported:

     -a                  Shows all possible encodings in order of
                         possibility, with scores in the range between 0.0
                         and 1.0. A higher score means a higher possibility.
                         For example,

                           example% auto_ef -a test_file
                           eucJP       0.89
                           zh_CN.euc   0.04
                           ko_KR.euc   0.01

                         Without this option, only one encoding with the
                         highest score is shown.

     -e encoding_list    Examines data only with specified encodings. For
                         example, when encoding_list is specified as
                         "ko_KR.euc:ko_KR.cp949", auto_ef examines text only
                         with CP949 and ko_KR.euc. Without this option,
                         auto_ef examines text with all encodings. Multiple
                         encodings can be specified by separating the
                         encodings using a colon (:).

     -h                  Shows the usage message.

     -l level            Specifies the level of judgment. The value of level
                         can be 0, 1, 2, or 3. Level 3 produces the best
                         result but can be slow. Level 0 is fastest but
                         results can be less accurate than in higher levels.
                         The default is level 0.

OPERANDS
     The following operands are supported:

     file                File name to examine.

EXAMPLES
     Example 1 Examining encoding of a file

       example% auto_ef file_name

     Example 2 Examining encoding of a file at level 2

       example% auto_ef -l 2 file_name

     Example 3 Examining encoding of a file with only eucJP or ko_KR.euc

       example% auto_ef -e "eucJP:ko_KR.euc" file_name

EXIT STATUS
     The following exit values are returned:

     0                   Successful completion.

     1                   An error occurred.

ATTRIBUTES
     See attributes(5) for descriptions of the following attributes:

     +-----------------------------+-----------------------------+
     |       ATTRIBUTE TYPE        |       ATTRIBUTE VALUE       |
     +-----------------------------+-----------------------------+
     |Availability                 |SUNWautoef                   |
     +-----------------------------+-----------------------------+
     |Interface Stability          |See below.                   |
     +-----------------------------+-----------------------------+

     Interface Stability of output format, when option -a is specified, is
     Evolving. Other interfaces are Stable.

SEE ALSO
     auto_ef(3EXT), libauto_ef(3LIB), attributes(5)

     International Language Environments Guide

SunOS 5.11                        26 Sep 2004                        auto_ef(1)
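
The DESCRIPTION above says auto_ef judges a file partly by whether iconv conversions succeed. On a system without auto_ef (AIX, in this thread), that half of the idea can be roughly approximated with plain iconv. This is a sketch under assumptions, not a reimplementation: try_encodings is a made-up helper name, no frequency analysis is attempted, and the encoding names must be the ones your local iconv accepts, which differ between platforms.

```shell
#!/bin/sh
# try_encodings FILE ENC...: print every candidate encoding from which iconv
# can convert FILE to UTF-8 without an error. Hypothetical helper.
try_encodings() {
    file=$1
    shift
    for enc in "$@"; do
        # A failed conversion (non-zero exit) rules the candidate out.
        if iconv -f "$enc" -t UTF-8 "$file" >/dev/null 2>&1; then
            echo "$enc"
        fi
    done
}

# Example call (candidate names are assumptions; adjust for your system):
#   try_encodings myfile UTF-8 EUC-JP ISO-8859-1
```

A successful conversion is only a necessary condition. Single-byte encodings such as ISO-8859-1 accept almost any byte sequence, so list strict multi-byte candidates like UTF-8 first and treat a hit on a single-byte encoding as "not ruled out" rather than as a positive identification.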