How to know file encoding? Post: 303038018

10 More Discussions You Might Find Interesting

1. UNIX for Dummies Questions & Answers

File encoding in Unix

1. I have a shell script which creates a file using cat command. How can i find what encoding the file follows (e.g. UTF8, ANSI)? 2. I want to convert that file to PC-ANSI format. How can i achieve that? I am using HP-Unix.

2. Shell Programming and Scripting

Delete original wav file if lame was successful encoding.

In a bash script: src=�cooltrack.wav� dst=�cooltrack.mp3� lame $src $dst I would like to add some line that would delete the source wav file like: rm $src but I would like this only if the encoding was successful. What should I include before deleting the original to check that the...

3. Shell Programming and Scripting

get the file encoding

Hello! The system is AIX 5.3 Give please command or script to get the file encoding Thanks

4. AIX

get the file encoding

Hello! The system is AIX 5.3 Give please command or script to get the file encoding

5. Shell Programming and Scripting

Cygwin vi XML file encoding problem

Hi, I have got a zip (binary) file transferred from MacOS (thus it has additional __MACOSX directory packed inside). On extracting this zip, there are few *.xml files available. When I opened this *.xml file in vim editor using Cygwin (on windows) the editor displayed in the bottom. I tried...

6. HP-UX

how to find the character encoding of a file in hp_ux

7. Shell Programming and Scripting

How to find the file encoding and updating the file encoding?

Hi, I am beginner to Unix. My requirement is to validate the encoding used in the incoming file(csv,txt).If it is encoded with UTF-8 format,then the file should remain as such otherwise i need to chnage the encoding to UTF-8. Please advice me how to proceed on this.

8. UNIX for Dummies Questions & Answers

Determing the encoding of a file

Hi, I am trying to determine the encoding for the file, because to convert to UTF-8, it seems as though I have to know the encoding of the source. Tried this file <filename> give me this: <filename>:data or International Language text Tried to see the locale and this is the output:...

9. UNIX for Advanced & Expert Users

ISO 88591 file encoding charset in Linux

Hello Experts, please help to provide any insight as I am facing issue migrating java application from hpux to redhat. The java program is using InputStreamReader to read a file without specifying any charset parameter. However, in new Linux Redhat 5.6 environent, when reading a file that...

10. Solaris

View file encoding then change encoding.

Hi all!! I�m using command file -i myfile.xml to validate XML file encoding, but it is just saying regular file . I�m expecting / looking an output as UTF8 or ANSI / ASCII Is there command to display the files encoding? Thank you!

LEARN ABOUT SUNOS

iconv_unicode

iconv_unicode(5)					Standards, Environments, and Macros					  iconv_unicode(5)

NAME

       iconv_unicode - code set conversion tables for Unicode

DESCRIPTION

       The following code set conversions are supported:

			   CODE SET CONVERSIONS SUPPORTED
			   ------------------------------
	 FROM Code Set				     TO Code Set
	     Code	       FROM	     Target Code	    TO
			       Filename 			    Filename
			       Element				    Element

       ISO 8859-1 (Latin 1)    8859-1		 UTF-8		     UTF-8
       ISO 8859-2 (Latin 2)    8859-2		 UTF-8		     UTF-8
       ISO 8859-3 (Latin 3)    8859-3		 UTF-8		     UTF-8
       ISO 8859-4 (Latin 4)    8859-4		 UTF-8		     UTF-8
       ISO 8859-5 (Cyrillic)   8859-5		 UTF-8		     UTF-8
       ISO 8859-6 (Arabic)     8859-6		 UTF-8		     UTF-8
       ISO 8859-7 (Greek)      8859-7		 UTF-8		     UTF-8
       ISO 8859-8 (Hebrew)     8859-8		 UTF-8		     UTF-8
       ISO 8859-9 (Latin 5)    8859-9		 UTF-8		     UTF-8
       ISO 8859-10 (Latin 6)   8859-10		 UTF-8		     UTF-8
       Japanese EUC	       eucJP		 UTF-8		     UTF-8
       Chinese/PRC EUC
       (GB 2312-1980)	       gb2312		 UTF-8		     UTF-8
       ISO-2022 	       iso2022		 UTF-8		     UTF-8
       Korean EUC	       ko_KR-euc	 Korean UTF-8	     ko_KR-UTF-8
       ISO-2022-KR	       ko_KR-iso2022-7	 Korean UTF-8	     ko_KR_UTF-8
       Korean Johap
       (KS C 5601-1987)        ko_KR-johap	 Korean UTF-8	     ko_KR-UTF-8
       Korean Johap
       (KS C 5601-1992)        ko_KR-johap92	 Korean UTF-8	     ko_KR-UTF-8
       Korean UTF-8	       ko_KR-UTF-8	 Korean EUC	     ko_KR-euc
       Korean UTF-8	       ko_KR-UTF-8	 Korean Johap	     ko_KR-johap
						 (KS C 5601-1987)
       Korean UTF-8	       ko_KR-UTF-8	 Korean Johap	     ko_KR-johap92
						 (KS C 5601-1992)
       KOI8-R (Cyrillic)       KOI8-R		 UCS-2		     UCS-2
       KOI8-R (Cyrillic)       KOI8-R		 UTF-8		     UTF-8
       PC Kanji (SJIS)	       PCK		 UTF-8		     UTF-8
       PC Kanji (SJIS)	       SJIS		 UTF-8		     UTF-8
       UCS-2		       UCS-2		 KOI8-R (Cyrillic)   KOI8-R
       UCS-2		       UCS-2		 UCS-4		     UCS-4

			   CODE SET CONVERSIONS SUPPORTED
			   ------------------------------
	 FROM Code Set				     TO Code Set
	     Code	       FROM	     Target Code	    TO
			       Filename 			    Filename
			       Element				    Element

       UCS-2		  UCS-2 	  UTF-7 		  UTF-7
       UCS-2		  UCS-2 	  UTF-8 		  UTF-8
       UCS-4		  UCS-4 	  UCS-2 		  UCS-2
       UCS-4		  UCS-4 	  UTF-16		  UTF-16
       UCS-4		  UCS-4 	  UTF-7 		  UTF-7
       UCS-4		  UCS-4 	  UTF-8 		  UTF-8
       UTF-16		  UTF-16	  UCS-4 		  UCS-4
       UTF-16		  UTF-16	  UTF-8 		  UTF-8
       UTF-7		  UTF-7 	  UCS-2 		  UCS-2
       UTF-7		  UTF-7 	  UCS-4 		  UCS-4
       UTF-7		  UTF-7 	  UTF-8 		  UTF-8
       UTF-8		  UTF-8 	  ISO 8859-1 (Latin 1)	  8859-1
       UTF-8		  UTF-8 	  ISO 8859-2 (Latin 2)	  8859-2
       UTF-8		  UTF-8 	  ISO 8859-3 (Latin 3)	  8859-3
       UTF-8		  UTF-8 	  ISO 8859-4 (Latin 4)	  8859-4
       UTF-8		  UTF-8 	  ISO 8859-5 (Cyrillic)   8859-5
       UTF-8		  UTF-8 	  ISO 8859-6 (Arabic)	  8859-6
       UTF-8		  UTF-8 	  ISO 8859-7 (Greek)	  8859-7
       UTF-8		  UTF-8 	  ISO 8859-8 (Hebrew)	  8859-8
       UTF-8		  UTF-8 	  ISO 8859-9 (Latin 5)	  8859-9
       UTF-8		  UTF-8 	  ISO 8859-10 (Latin 6)   8859-10
       UTF-8		  UTF-8 	  Japanese EUC		  eucJP
       UTF-8		  UTF-8 	  Chinese/PRC EUC	  gb2312
					  (GB 2312-1980)
       UTF-8		  UTF-8 	  ISO-2022		  iso2022
       UTF-8		  UTF-8 	  KOI8-R (Cyrillic)	  KOI8-R
       UTF-8		  UTF-8 	  PC Kanji (SJIS)	  PCK
       UTF-8		  UTF-8 	  PC Kanji (SJIS)	  SJIS
       UTF-8		  UTF-8 	  UCS-2 		  UCS-2
       UTF-8		  UTF-8 	  UCS-4 		  UCS-4
       UTF-8		  UTF-8 	  UTF-16		  UTF-16
       UTF-8		  UTF-8 	  UTF-7 		  UTF-7
       UTF-8		  UTF-8 	  Chinese/PRC EUC	  zh_CN.euc
					  (GB 2312-1980)

			   CODE SET CONVERSIONS SUPPORTED
			   ------------------------------
	 FROM Code Set				     TO Code Set
	     Code	       FROM	     Target Code	    TO
			       Filename 			    Filename
			       Element				    Element

       UTF-8		     UTF-8	       ISO 2022-CN	     zh_CN.iso2022-7
       UTF-8		     UTF-8	       Chinese/Taiwan Big5   zh_TW-big5
       UTF-8		     UTF-8	       Chinese/Taiwan  EUC   zh_TW-euc
					       (CNS 11643-1992)
       UTF-8		     UTF-8	       ISO 2022-TW	     zh_TW-iso2022-7
       Chinese/PRC EUC	     zh_CN.euc	       UTF-8		     UTF-8
       (GB 2312-1980)
       ISO 2022-CN	     zh_CN.iso2022-7   UTF-8		     UTF-8
       Chinese/Taiwan Big5   zh_TW-big5        UTF-8		     UTF-8
       Chinese/Taiwan  EUC   zh_TW-euc	       UTF-8		     UTF-8
       (CNS 11643-1992)
       ISO 2022-TW	     zh_TW-iso2022-7   UTF-8		     UTF-8

EXAMPLES

       Example 1: The library module filename

       In  the	conversion  library, /usr/lib/iconv (see iconv(3C)), the library module filename is composed of two symbolic elements separated by
       the percent sign (%). The first symbol specifies the code set that is being converted; the second symbol specifies the  target  code,  that
       is, the code set to which the first one is being converted.

       In  the	conversion  table  above, the first  symbol is termed the "FROM Filename Element". The second symbol, representing the target code
       set, is the "TO Filename Element".

       For example, the library module filename to convert from the Korean EUC code set to the Korean UTF-8 code set is

       ko_KR-euc%ko_KR-UTF-8

FILES

       /usr/lib/iconv/*.so	       conversion modules

SEE ALSO

       iconv(1), iconv(3C), iconv(5)

       Chernov, A., Registration of a Cyrillic Character Set, RFC 1489, RELCOM Development Team, July 1993.

       Chon, K., H. Je Park, and U. Choi, Korean Character Encoding for Internet Messages, RFC 1557, Solvit Chosun Media, December 1993.

       Goldsmith, D., and M. Davis, UTF-7 - A Mail-Safe Transformation Format of Unicode, RFC 1642, Taligent, Inc., July 1994.

       Lee, F., HZ - A Data Format for Exchanging Files of Arbitrarily Mixed Chinese and ASCII characters, RFC 1843, Stanford  University,  August
       1995.

       Murai,  J.,  M.	Crispin, and E. van der Poel, Japanese Character Encoding for Internet Messages, RFC 1468, Keio University, Panda Program-
       ming, June 1993.

       Nussbacher, H., and Y. Bourvine, Hebrew Character Encoding for Internet Messages, RFC 1555, Israeli  Inter-University,  Hebrew  University,
       December 1993.

       Ohta, M., Character Sets ISO-10646 and ISO-10646-J-1, RFC 1815, Tokyo Institute of Technology, July 1995.

       Ohta, M., and K. Handa, ISO-2022-JP-2: Multilingual Extension of ISO-2022-JP, RFC 1554, Tokyo Institute of Technology, December 1993.

       Reynolds, J., and J. Postel, ASSIGNED NUMBERS, RFC 1700, University of Southern California/Information Sciences Institute, October 1994.

       Simonson, K., Character Mnemonics & Character Sets, RFC 1345, Rationel Almen Planlaegning, June 1992.

       Spinellis, D., Greek Character Encoding for Electronic Mail Messages, RFC 1947, SENA S.A., May 1996.

       The Unicode Consortium, The Unicode Standard, Version 2.0, Addison Wesley Developers Press, July 1996.

       Wei,  Y.,  Y.  Zhang,  J. Li, J. Ding, and Y. Jiang, ASCII Printable Characters-Based Chinese Character Encoding for Internet Messages, RFC
       1842, AsiaInfo Services Inc., Harvard University, Rice University, University of Maryland, August 1995.

       Yergeau, F., UTF-8, a transformation format of Unicode and ISO 10646, RFC 2044, Alis Technologies, October 1996.

       Zhu, H., D. Hu, Z. Wang, T. Kao, W. Chang, and M. Crispin, Chinese Character Encoding for Internet Messages, RFC 1922, Tsinghua University,
       China  Information  Technology Standardization Technical Committee (CITS), Institute for Information Industry (III), University of Washing-
       ton, March 1996.

NOTES

       ISO 8859 character sets using Latin alphabetic characters are distinguished as follows:

       ISO 8859-1 (Latin 1)

	   For most West European languages, including:

	   Albanian		Finnish 	      Italian
	   Catalan		French		      Norwegian
	   Danish		German		      Portuguese
	   Dutch		Galician	      Spanish
	   English		Irish		      Swedish
	   Faeroese		Icelandic

       ISO 8859-2 (Latin 2)

	   For most Latin-written Slavic and Central European languages:

	   Czech		Polish		      Slovak
	   German		Rumanian	      Slovene
	   Hungarian		Croatian

       ISO 8859-3 (Latin 3)

	   Popularly used for Esperanto, Galician, Maltese, and Turkish.

       ISO 8859-4 (Latin 4)

	   Introduces letters for Estonian, Latvian, and Lithuanian. It is an incomplete predecessor of ISO 8859-10 (Latin 6).

       ISO 8859-9 (Latin 5)

	   Replaces the rarely needed Icelandic letters in ISO 8859-1 (Latin 1) with the Turkish ones.

       ISO 8859-10 (Latin 6)

	   Adds the last Inuit (Greenlandic) and Sami (Lappish) letters that were not included in ISO 8859-4 (Latin 4) to complete coverage of the
	   Nordic area.

SunOS 5.10							    18 Apr 1997 						  iconv_unicode(5)

10 More Discussions You Might Find Interesting

1. UNIX for Dummies Questions & Answers

File encoding in Unix

Discussion started by: ssmallya

2. Shell Programming and Scripting

Delete original wav file if lame was successful encoding.

Discussion started by: Aia

3. Shell Programming and Scripting

get the file encoding

Discussion started by: vinment

4. AIX

get the file encoding

Discussion started by: vinment

5. Shell Programming and Scripting

Cygwin vi XML file encoding problem

Discussion started by: royalibrahim

6. HP-UX

how to find the character encoding of a file in hp_ux

Discussion started by: alokjyotibal

7. Shell Programming and Scripting

How to find the file encoding and updating the file encoding?

Discussion started by: cnraja

8. UNIX for Dummies Questions & Answers

Determing the encoding of a file

Discussion started by: MIA651

9. UNIX for Advanced & Expert Users

ISO 88591 file encoding charset in Linux

Discussion started by: sonic_air

10. Solaris

View file encoding then change encoding.

Discussion started by: mrreds

LEARN ABOUT SUNOS

iconv_unicode