#1
Determining the encoding of a file
Hi, I am trying to determine the encoding of a file, because to convert it to UTF-8 it seems I have to know the encoding of the source. I tried this:
Code:
file <filename>
which gives me this:
<filename>:data or International Language text
I also checked the locale, and this is the output:
LANG=C
LC_COLLATE="C"
LC_CTYPE="C"
LC_MONETARY="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_MESSAGES="C"
LC_ALL=
Not really much help there either. Any help will be appreciated!
#2
Try using 'od' on it to see if there is a pattern you can recognize. Is it Unicode, EUC, JIS, EBCDIC, BCDIC, or just an odd code page? Hard to say! I use 'od -bc' because I was raised on octal, but there are options for hex and decimal offsets. But yes, really, you should know!
Often, 'C' is linked to ISO-8859-1 (Latin-1), but your file is not that.
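If the raw od output is hard to read, a rough Python equivalent may help. This is only a sketch: the function names `dump` and `byte_classes` are made up for illustration, and the sample bytes stand in for the start of the real file. It prints offsets in octal like od, then counts a few byte classes, since lots of NUL bytes hint at UTF-16/UTF-32 and lots of high-bit bytes hint at an 8-bit code page, UTF-8, or EBCDIC.

```python
from collections import Counter


def dump(data: bytes, width: int = 16) -> str:
    """Return an od-style dump: octal offset, hex bytes, printable ASCII."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:07o}  {hex_part:<{width * 3}} {text_part}")
    return "\n".join(lines)


def byte_classes(data: bytes) -> Counter:
    """Count NUL, control, printable ASCII, and high-bit bytes."""
    counts = Counter()
    for b in data:
        if b == 0:
            counts["nul"] += 1
        elif b < 32 or b == 127:
            counts["control"] += 1
        elif b < 127:
            counts["ascii"] += 1
        else:
            counts["high"] += 1
    return counts


if __name__ == "__main__":
    # Stand-in sample; in practice read the first few hundred bytes
    # of the file with open(path, "rb").read(512).
    sample = "héllo".encode("utf-16-le")
    print(dump(sample))
    print(byte_classes(sample))
```

The counts are only a hint about where to look next, not a verdict on the encoding.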
#3
|
|||
|
|||
Sorry DGPickett, I tried that and it all looked Greek to me (not in a literal sense, lol).
#4
Well, UTF-8 and Unicode have a pattern in their encoding. The dd command has an EBCDIC decoder that I have used. Might the file be from big blue land?
Googling around the subject, one source suggests file -i, another mentions enca (http://linux.die.net/man/1/enca), and for Solaris, auto_ef. There is also 'chardet', a Python-based tool.
Last edited by DGPickett; 01-04-2013 at 03:35 PM.
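To make the "pattern" remark concrete, here is a minimal sketch using only the Python standard library. It checks for a byte order mark and then tests whether the bytes are well-formed UTF-8. The helper names `bom_encoding` and `is_valid_utf8` are hypothetical. This is not how file, enca, or chardet work internally; it is just the simplest structural test I know of.

```python
import codecs

# Longer BOMs are listed first, so a UTF-32-LE BOM (ff fe 00 00)
# is not mistaken for a UTF-16-LE BOM (ff fe).
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def bom_encoding(data: bytes):
    """Return the encoding implied by a leading BOM, or None if there is none."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode as UTF-8 without errors."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    data = "héllo".encode("utf-8")  # stand-in for open(path, "rb").read()
    print(bom_encoding(data), is_valid_utf8(data))
```

Run it on the whole file rather than a fixed-size prefix, because a prefix can cut a multi-byte sequence in half and look invalid. Also note that pure ASCII passes the UTF-8 test, and text in an 8-bit code page with accented characters usually fails it.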
#5
Quote:
#6
Yes, IBM is a world unto itself, and EBCDIC is the dominant charset there; even then, to print correctly you may need the code page. BCDIC was the 6-bit code, Binary Coded Decimal Interchange Code, so called because it was closely related to card codes with a decimal basis, where A is 21 base 8, B is 22, I is 31 (20+9), then J is 41 through R at 51, then / is 61, and S is 62 through Z at 71. The r-x-0 rows of the card became the upper bits, and 1-9 were binary coded. EBCDIC is BCDIC extended to 8 bits.
You can probably get the enca binary or source, and Python and chardet, for free, and install them:
http://www.perzl.org/aix/index.php?n=Main.Enca
http://www.python.org/getit/other/
http://pypi.python.org/pypi/chardet
Last edited by DGPickett; 01-04-2013 at 04:20 PM.
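If you want to test the EBCDIC theory without dd, Python ships a few EBCDIC codecs. A small sketch follows. It assumes code page cp037, which is only one of several EBCDIC code pages and may not be the one your file uses, and the helper name `ebcdic_to_text` is made up. The loop shows the card-zone heritage described above: in cp037 the letters fall in three runs, A-I at 0xC1-0xC9, J-R at 0xD1-0xD9, and S-Z at 0xE2-0xE9.

```python
# cp037 is an assumption: one EBCDIC code page that ships with Python.
# The real file may need a different one (cp500, cp1140, ...).
EBCDIC_CODEC = "cp037"


def ebcdic_to_text(data: bytes, codec: str = EBCDIC_CODEC) -> str:
    """Decode EBCDIC bytes to a Python string using the given code page."""
    return data.decode(codec)


if __name__ == "__main__":
    # Show where the three letter runs start and end.
    for letter in "AIJRSZ":
        print(letter, hex(letter.encode(EBCDIC_CODEC)[0]))
    # Five EBCDIC bytes that spell a familiar word in cp037.
    print(ebcdic_to_text(b"\xc8\x85\x93\x93\x96"))
```

If the decoded text is readable, the file is very likely EBCDIC, and writing it back out with `.encode("utf-8")` finishes the conversion.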
#7
Did you consider using iconv or recode? Maybe on a trial-and-error basis, but I think they complain if an unsuitable from-charset is given.
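The same trial-and-error idea can be sketched in Python, where a strict decode raises an error on bytes that are invalid for the chosen charset. The candidate list and the helper names `encodings_that_decode` and `to_utf8` are hypothetical; put your own guesses in the list.

```python
# Hypothetical shortlist; replace with the charsets you suspect.
CANDIDATES = ["utf-8", "utf-16", "cp1252", "cp037"]


def encodings_that_decode(data: bytes, candidates=CANDIDATES):
    """Return the candidate encodings that decode the data without error."""
    ok = []
    for name in candidates:
        try:
            data.decode(name)  # strict error handling is the default
        except UnicodeDecodeError:
            continue
        ok.append(name)
    return ok


def to_utf8(data: bytes, source_encoding: str) -> bytes:
    """Re-encode data from source_encoding to UTF-8."""
    return data.decode(source_encoding).encode("utf-8")


if __name__ == "__main__":
    data = "café".encode("cp1252")  # stand-in for the real file contents
    print(encodings_that_decode(data))
```

A caveat: a clean decode does not prove the guess is right. Many single-byte code pages map every byte value to some character, so they never complain. Treat the result as a shortlist and look at the decoded text before converting.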