Unix/Linux Go Back    


CentOS 7.0 - man page for tcl_getencodingsearchpath (centos section 3)

Linux & Unix Commands - Search Man Pages
Man Page or Keyword Search:   man
Select Man Page Set:       apropos Keyword Search (sections above)


Tcl_GetEncoding(3)		      Tcl Library Procedures		       Tcl_GetEncoding(3)

_________________________________________________________________________________________________

NAME
       Tcl_GetEncoding,   Tcl_FreeEncoding,   Tcl_GetEncodingFromObj,	Tcl_ExternalToUtfDString,
       Tcl_ExternalToUtf,   Tcl_UtfToExternalDString,	 Tcl_UtfToExternal,    Tcl_WinTCharToUtf,
       Tcl_WinUtfToTChar,  Tcl_GetEncodingName, Tcl_SetSystemEncoding, Tcl_GetEncodingNameFromEn-
       vironment, Tcl_GetEncodingNames, Tcl_CreateEncoding, Tcl_GetEncodingSearchPath, Tcl_SetEn-
       codingSearchPath,  Tcl_GetDefaultEncodingDir,  Tcl_SetDefaultEncodingDir  - procedures for
       creating and using encodings

SYNOPSIS
       #include <tcl.h>

       Tcl_Encoding
       Tcl_GetEncoding(interp, name)

       void
       Tcl_FreeEncoding(encoding)

       int											  |
       Tcl_GetEncodingFromObj(interp, objPtr, encodingPtr)					  |

       char *
       Tcl_ExternalToUtfDString(encoding, src, srcLen, dstPtr)

       char *
       Tcl_UtfToExternalDString(encoding, src, srcLen, dstPtr)

       int
       Tcl_ExternalToUtf(interp, encoding, src, srcLen, flags, statePtr,
			 dst, dstLen, srcReadPtr, dstWrotePtr, dstCharsPtr)

       int
       Tcl_UtfToExternal(interp, encoding, src, srcLen, flags, statePtr,
			 dst, dstLen, srcReadPtr, dstWrotePtr, dstCharsPtr)

       char *
       Tcl_WinTCharToUtf(tsrc, srcLen, dstPtr)

       TCHAR *
       Tcl_WinUtfToTChar(src, srcLen, dstPtr)

       const char *
       Tcl_GetEncodingName(encoding)

       int
       Tcl_SetSystemEncoding(interp, name)

       const char *										  |
       Tcl_GetEncodingNameFromEnvironment(bufPtr)						  |

       void
       Tcl_GetEncodingNames(interp)

       Tcl_Encoding
       Tcl_CreateEncoding(typePtr)

       Tcl_Obj *										  |
       Tcl_GetEncodingSearchPath()								  |

       int											  |
       Tcl_SetEncodingSearchPath(searchPath)							  |

       const char *
       Tcl_GetDefaultEncodingDir(void)

       void
       Tcl_SetDefaultEncodingDir(path)

ARGUMENTS
       Tcl_Interp *interp (in)				 Interpreter to use for error  reporting,
							 or   NULL   if  no  error  reporting  is
							 desired.

       const char *name (in)				 Name of encoding to load.

       Tcl_Encoding encoding (in)			 The encoding to query, free, or use  for
							 converting  text.   If encoding is NULL,
							 the current system encoding is used.

       Tcl_Obj *objPtr (in)				 Name of encoding to get token for.	  |

       Tcl_Encoding *encodingPtr (out)			 Points to storage where  encoding  token |
							 is to be written.

       const char *src (in)				 For  the Tcl_ExternalToUtf functions, an
							 array of bytes in the specified encoding
							 that  are to be converted to UTF-8.  For
							 the   Tcl_UtfToExternal   and	 Tcl_Win-
							 UtfToTChar  functions, an array of UTF-8
							 characters to be converted to the speci-
							 fied encoding.

       const TCHAR *tsrc (in)				 An  array of Windows TCHAR characters to
							 convert to UTF-8.

       int srcLen (in)					 Length of src or tsrc in bytes.  If  the
							 length  is  negative,	the encoding-spe-
							 cific length of the string is used.

       Tcl_DString *dstPtr (out)			 Pointer  to  an  uninitialized  or  free
							 Tcl_DString   in   which  the	converted
							 result will be stored.

       int flags (in)					 Various  flag	 bits	OR-ed	together.
							 TCL_ENCODING_START  signifies	that  the
							 source buffer is the first  block  in	a
							 (potentially  multi-block) input stream,
							 telling the conversion routine to  reset
							 to an initial state and perform any ini-
							 tialization that needs to  occur  before
							 the  first byte is converted. TCL_ENCOD-
							 ING_END signifies that the source buffer
							 is  the  last	block  in  a (potentially
							 multi-block) input stream,  telling  the
							 conversion routine to perform any final-
							 ization that needs to	occur  after  the
							 last byte is converted and then to reset
							 to an initial state.	TCL_ENCODING_STO-
							 PONERROR  signifies  that the conversion
							 routine should return	immediately  upon
							 reading a source character that does not
							 exist in the target encoding;	otherwise
							 a  default fallback character will auto-
							 matically be substituted.

       Tcl_EncodingState *statePtr (in/out)		 Used when converting a  (generally  long
							 or  indefinite  length) byte stream in a
							 piece-by-piece fashion.  The  conversion
							 routine  stores  its  current	state  in
							 *statePtr after src (the buffer contain-
							 ing  the  current  piece)  has been con-
							 verted; that state information  must  be
							 passed  back  when  converting  the next
							 piece of the stream  so  the  conversion
							 routine  knows what state it was in when
							 it left off  at  the  end  of	the  last
							 piece.   May  be NULL, in which case the
							 value specified for flags is ignored and
							 the  source buffer is assumed to contain
							 the complete string to convert.

       char *dst (out)					 Buffer in  which  the	converted  result
							 will  be  stored.   No  more than dstLen
							 bytes will be stored in dst.

       int dstLen (in)					 The maximum length of the output  buffer
							 dst in bytes.

       int *srcReadPtr (out)				 Filled with the number of bytes from src
							 that were actually converted.	This  may
							 be  less than the original source length
							 if there was a problem  converting  some
							 source characters.  May be NULL.

       int *dstWrotePtr (out)				 Filled  with  the  number  of bytes that
							 were actually stored in the output  buf-
							 fer  as a result of the conversion.  May
							 be NULL.

       int *dstCharsPtr (out)				 Filled with  the  number  of  characters
							 that  correspond  to the number of bytes
							 stored in the	output	buffer.   May  be
							 NULL.

       Tcl_DString *bufPtr (out)			 Storage for the prescribed system encod- |
							 ing name.

       const Tcl_EncodingType *typePtr (in)		 Structure that defines  a  new  type  of
							 encoding.

       Tcl_Obj *searchPath (in) 			 List  of filesystem directories in which |
							 to search for encoding data files.

       const char *path (in)				 A path to the location of  the  encoding
							 file.
_________________________________________________________________

INTRODUCTION
       These routines convert between Tcl's internal character representation, UTF-8, and charac-
       ter representations used by various operating systems or file systems,  such  as  Unicode,
       ASCII,  or  Shift-JIS.	When operating on strings, such as such as obtaining the names of
       files or displaying characters using international fonts, the strings must  be  translated
       into  one  or  possibly	multiple  formats  that  the  various  system calls can use.  For
       instance, on a Japanese Unix workstation, a user might obtain a	filename  represented  in
       the  EUC-JP  file encoding and then translate the characters to the jisx0208 font encoding
       in order to display the filename in a Tk widget.  The purpose of the encoding  package  is
       to help bridge the translation gap.  UTF-8 provides an intermediate staging ground for all
       the various encodings.  In the example above, text would be  translated	into  UTF-8  from
       whatever  file  encoding  the operating system is using.  Then it would be translated from
       UTF-8 into whatever font encoding the display routines require.

       Some basic encodings are compiled into Tcl.  Others can be defined by the user or  dynami-
       cally loaded from encoding files in a platform-independent manner.

DESCRIPTION
       Tcl_GetEncoding	finds  an  encoding given its name.  The name may refer to a built-in Tcl
       encoding, a user-defined encoding registered by calling Tcl_CreateEncoding, or  a  dynami-
       cally-loadable  encoding  file.	 The return value is a token that represents the encoding
       and can be used in subsequent calls to procedures such as Tcl_GetEncodingName, Tcl_FreeEn-
       coding,	and Tcl_UtfToExternal.	If the name did not refer to any known or loadable encod-
       ing, NULL is returned and an error message is returned in interp.

       The encoding package maintains a database of all encodings currently in	use.   The  first
       time  name  is  seen, Tcl_GetEncoding returns an encoding with a reference count of 1.  If
       the same name is requested further times, then the reference count for  that  encoding  is
       incremented  without the overhead of allocating a new encoding and all its associated data
       structures.

       When an encoding is no longer needed, Tcl_FreeEncoding should be  called  to  release  it.
       When an encoding is no longer in use anywhere (i.e., it has been freed as many times as it
       has been gotten) Tcl_FreeEncoding will release all storage  the	encoding  was  using  and
       delete it from the database.

       Tcl_GetEncodingFromObj treats the string representation of objPtr as an encoding name, and |
       finds an encoding with that name, just as Tcl_GetEncoding does. When an encoding is found, |
       it is cached within the objPtr value for future reference, the Tcl_Encoding token is writ- |
       ten to the storage pointed to by encodingPtr, and the value TCL_OK is returned. If no such |
       encoding  is  found, the value TCL_ERROR is returned, and no writing to *encodingPtr takes |
       place. Just as with Tcl_GetEncoding,  the  caller  should  call	Tcl_FreeEncoding  on  the |
       resulting encoding token when that token will no longer be used.

       Tcl_ExternalToUtfDString  converts  a  source  buffer src from the specified encoding into
       UTF-8.  The converted bytes are stored in dstPtr,  which  is  then  null-terminated.   The
       caller  should  eventually  call Tcl_DStringFree to free any information stored in dstPtr.
       When converting, if any of the characters in the source buffer cannot  be  represented  in
       the  target  encoding,  a  default fallback character will be used.  The return value is a
       pointer to the value stored in the DString.

       Tcl_ExternalToUtf converts a source buffer src from the specified encoding into UTF-8.  Up
       to  srcLen bytes are converted from the source buffer and up to dstLen converted bytes are
       stored in dst.  In all cases, *srcReadPtr is filled with the number  of	bytes  that  were
       successfully  converted	from src and *dstWrotePtr is filled with the corresponding number
       of bytes that were stored in dst.  The return value is one of the following:

	      TCL_OK			   All bytes of src were converted.

	      TCL_CONVERT_NOSPACE	   The destination buffer was not large enough for all of
					   the	converted  data;  as many characters as could fit
					   were converted though.

	      TCL_CONVERT_MULTIBYTE	   The last few bytes  in  the	source	buffer	were  the
					   beginning of a multibyte sequence, but more bytes were
					   needed to complete this sequence.  A  subsequent  call
					   to  the  conversion	routine should pass a buffer con-
					   taining the unconverted bytes  that	remained  in  src
					   plus  some  further	bytes  from  the source stream to
					   properly  convert  the  formerly  split-up	multibyte
					   sequence.

	      TCL_CONVERT_SYNTAX	   The	source	buffer	contained  an  invalid	character
					   sequence.  This may occur if the input stream has been
					   damaged or if the input encoding method was misidenti-
					   fied.

	      TCL_CONVERT_UNKNOWN	   The source buffer contained a character that could not
					   be  represented  in the target encoding and TCL_ENCOD-
					   ING_STOPONERROR was specified.

       Tcl_UtfToExternalDString converts a source buffer src from UTF-8 into the specified encod-
       ing.   The  converted bytes are stored in dstPtr, which is then terminated with the appro-
       priate encoding-specific null.  The caller should eventually call Tcl_DStringFree to  free
       any information stored in dstPtr.  When converting, if any of the characters in the source
       buffer cannot be represented in the target encoding, a default fallback character will  be
       used.  The return value is a pointer to the value stored in the DString.

       Tcl_UtfToExternal converts a source buffer src from UTF-8 into the specified encoding.  Up
       to srcLen bytes are converted from the source buffer and up to dstLen converted bytes  are
       stored  in  dst.   In  all cases, *srcReadPtr is filled with the number of bytes that were
       successfully converted from src and *dstWrotePtr is filled with the  corresponding  number
       of bytes that were stored in dst.  The return values are the same as the return values for
       Tcl_ExternalToUtf.

       Tcl_WinUtfToTChar and Tcl_WinTCharToUtf are Windows-only convenience  functions	for  con-
       verting between UTF-8 and Windows strings.  On Windows 95 (as with the Unix operating sys-
       tem), all strings exchanged between Tcl and the operating system  are  "char"  based.   On
       Windows	NT,  some  strings exchanged between Tcl and the operating system are "char" ori-
       ented while others are in Unicode.  By convention, in Windows a TCHAR is  a  character  in
       the ANSI code page on Windows 95 and a Unicode character on Windows NT.

       If  you planned to use the same "char" based interfaces on both Windows 95 and Windows NT,
       you could use Tcl_UtfToExternal and Tcl_ExternalToUtf (or their	Tcl_DString  equivalents)
       with an encoding of NULL (the current system encoding).	On the other hand, if you planned
       to use the Unicode interface when running on Windows NT and  the  "char"  interfaces  when
       running	on Windows 95, you would have to perform the following type of test over and over
       in your program (as represented in pseudo-code):
	      if (running NT) {
		  encoding <- Tcl_GetEncoding("unicode");
		  nativeBuffer <- Tcl_UtfToExternal(encoding, utfBuffer);
		  Tcl_FreeEncoding(encoding);
	      } else {
		  nativeBuffer <- Tcl_UtfToExternal(NULL, utfBuffer);
	      }
       Tcl_WinUtfToTChar and Tcl_WinTCharToUtf automatically handle this test and use the  proper
       encoding  based on the current operating system.  Tcl_WinUtfToTChar returns a pointer to a
       TCHAR string, and Tcl_WinTCharToUtf expects a TCHAR string  pointer  as	the  src  string.
       Otherwise,   these   functions	behave	 identically   to   Tcl_UtfToExternalDString  and
       Tcl_ExternalToUtfDString.

       Tcl_GetEncodingName is roughly the inverse of Tcl_GetEncoding.	Given  an  encoding,  the
       return  value  is  the  name  argument  that  was used to create the encoding.  The string
       returned by Tcl_GetEncodingName is only	guaranteed  to	persist  until	the  encoding  is
       deleted.  The caller must not modify this string.

       Tcl_SetSystemEncoding  sets  the  default  encoding  that should be used whenever the user
       passes a NULL value for the encoding argument to any of the other encoding functions.   If
       name is NULL, the system encoding is reset to the default system encoding, binary.  If the
       name did not refer to any known or loadable encoding, TCL_ERROR is returned and	an  error
       message	is  left  in interp.  Otherwise, this procedure increments the reference count of
       the new system encoding, decrements the reference count of the old  system  encoding,  and
       returns TCL_OK.

       Tcl_GetEncodingNameFromEnvironment  provides  a	means  for  the Tcl library to report the |
       encoding name it believes to be the correct one to use as the system  encoding,	based  on |
       system  calls  and  examination	of the environment suitable for the platform.  It accepts |
       bufPtr, a pointer to an uninitialized or freed Tcl_DString and writes the encoding name to |
       it.  The Tcl_DStringValue is returned.

       Tcl_GetEncodingNames  sets  the interp result to a list consisting of the names of all the
       encodings that are currently defined or can be dynamically loaded, searching the  encoding
       path  specified	by  Tcl_SetDefaultEncodingDir.	 This  procedure does not ensure that the
       dynamically-loadable encoding files contain valid data, but merely that they exist.

       Tcl_CreateEncoding defines a new encoding and registers the C procedures that  are  called
       back  to  convert between the encoding and UTF-8.  Encodings created by Tcl_CreateEncoding
       are thereafter visible in  the  database  used  by  Tcl_GetEncoding.   Just  as	with  the
       Tcl_GetEncoding	procedure,  the  return value is a token that represents the encoding and
       can be used in subsequent calls to other encoding functions.   Tcl_CreateEncoding  returns
       an  encoding  with  a reference count of 1. If an encoding with the specified name already
       exists, then its entry in the database is replaced with the new encoding;  the  token  for
       the  old encoding will remain valid and continue to behave as before, but users of the new
       token will now call the new encoding procedures.

       The typePtr argument to Tcl_CreateEncoding contains information	about  the  name  of  the
       encoding  and  the  procedures  that  will  be called to convert between this encoding and
       UTF-8.  It is defined as follows:

	      typedef struct Tcl_EncodingType {
		      const char *encodingName;
		      Tcl_EncodingConvertProc *toUtfProc;
		      Tcl_EncodingConvertProc *fromUtfProc;
		      Tcl_EncodingFreeProc *freeProc;
		      ClientData clientData;
		      int nullSize;
	      } Tcl_EncodingType;

       The encodingName provides a string name for the encoding, by which it can be  referred  in
       other procedures such as Tcl_GetEncoding.  The toUtfProc refers to a callback procedure to
       invoke to convert text from this encoding into UTF-8.  The fromUtfProc refers to  a  call-
       back  procedure	to  invoke  to	convert text from UTF-8 into this encoding.  The freeProc
       refers to a callback procedure to invoke when this  encoding  is  deleted.   The  freeProc
       field  may  be NULL.  The clientData contains an arbitrary one-word value passed to toUtf-
       Proc, fromUtfProc, and freeProc whenever they are called.  Typically, this is a pointer to
       a data structure containing encoding-specific information that can be used by the callback
       procedures.  For instance, two very similar encodings such as ascii and macRoman  may  use
       the  same callback procedure, but use different values of clientData to control its behav-
       ior.  The nullSize specifies the number of zero bytes that signify end-of-string  in  this
       encoding.   It must be 1 (for single-byte or multi-byte encodings like ASCII or Shift-JIS)
       or 2 (for double-byte encodings like Unicode).  Constant-sized encodings with  3  or  more
       bytes per character (such as CNS11643) are not accepted.

       The  callback  procedures toUtfProc and fromUtfProc should match the type Tcl_EncodingCon-
       vertProc:

	      typedef int Tcl_EncodingConvertProc(
		      ClientData clientData,
		      const char *src,
		      int srcLen,
		      int flags,
		      Tcl_EncodingState *statePtr,
		      char *dst,
		      int dstLen,
		      int *srcReadPtr,
		      int *dstWrotePtr,
		      int *dstCharsPtr);

       The  toUtfProc  and  fromUtfProc  procedures  are  called  by  the  Tcl_ExternalToUtf   or
       Tcl_UtfToExternal  family  of  functions to perform the actual conversion.  The clientData
       parameter to these procedures is the same as the clientData field  specified  to  Tcl_Cre-
       ateEncoding when the encoding was created.  The remaining arguments to the callback proce-
       dures are the same as the arguments,  documented  at  the  top,	to  Tcl_ExternalToUtf  or
       Tcl_UtfToExternal,  with the following exceptions.  If the srcLen argument to one of those
       high-level functions is negative, the value passed to the callback procedure will  be  the
       appropriate  encoding-specific  string  length  of  src.   If  any of the srcReadPtr, dst-
       WrotePtr, or dstCharsPtr arguments to one of the high-level functions is NULL, the  corre-
       sponding value passed to the callback procedure will be a non-NULL location.

       The callback procedure freeProc, if non-NULL, should match the type Tcl_EncodingFreeProc:
	      typedef void Tcl_EncodingFreeProc(
		      ClientData clientData);

       This  freeProc  function is called when the encoding is deleted.  The clientData parameter
       is the same as the clientData field specified to Tcl_CreateEncoding when the encoding  was
       created.

       Tcl_GetEncodingSearchPath  and  Tcl_SetEncodingSearchPath are called to access and set the |
       list of filesystem directories searched for encoding data files. 			  |

       The value returned by Tcl_GetEncodingSearchPath is the value stored by the last successful |
       call   to  Tcl_SetEncodingSearchPath.   If  no  calls  to  Tcl_SetEncodingSearchPath  have |
       occurred, Tcl will compute an initial value based on the environment.  There is one encod- |
       ing search path for the entire process, shared by all threads in the process.		  |

       Tcl_SetEncodingSearchPath stores searchPath and returns TCL_OK, unless searchPath is not a |
       valid Tcl list, which causes TCL_ERROR to be returned.  The elements of searchPath are not |
       verified  as  existing  readable filesystem directories.  When searching for encoding data |
       files takes place, and non-existent or non-readable filesystem directories on the  search- |
       Path are silently ignored.								  |

       Tcl_GetDefaultEncodingDir  and  Tcl_SetDefaultEncodingDir  are  obsolete  interfaces  best |
       replaced with calls to Tcl_GetEncodingSearchPath and Tcl_SetEncodingSearchPath.	They  are |
       called  to  access  and	set the first element of the searchPath list.  Since Tcl searches |
       searchPath for encoding data files in list order, these routines establish  the	"default" |
       directory in which to find encoding data files.

ENCODING FILES
       Space  would  prohibit  precompiling  into  Tcl every possible encoding algorithm, so many
       encodings are stored on disk as dynamically-loadable encoding files.  This  behavior  also
       allows  the  user  to  create  additional encoding files that can be loaded using the same
       mechanism.  These encoding files  contain  information  about  the  tables  and/or  escape
       sequences used to map between an external encoding and Unicode.	The external encoding may
       consist of single-byte, multi-byte, or double-byte characters.

       Each dynamically-loadable encoding is represented as a text file.  The initial line of the
       file, beginning with a "#" symbol, is a comment that provides a human-readable description
       of the file.  The next line identifies the type of encoding file.  It can be  one  of  the
       following letters:

       [1] S  A  single-byte  encoding, where one character is always one byte long in the encod-
	      ing.  An example is iso8859-1, used by many European languages.

       [2] D  A double-byte encoding, where one character is always two bytes long in the  encod-
	      ing.  An example is big5, used for Chinese text.

       [3] M  A  multi-byte  encoding,	where  one character may be either one or two bytes long.
	      Certain bytes are lead bytes, indicating that another byte  must	follow	and  that
	      together the two bytes represent one character.  Other bytes are not lead bytes and
	      represent themselves.  An example is shiftjis, used by many Japanese computers.

       [4] E  An escape-sequence encoding, specifying that certain sequences of bytes do not rep-
	      resent  characters, but commands that describe how following bytes should be inter-
	      preted.

       The rest of the lines in the file depend on the type.

       Cases [1], [2], and [3] are collectively referred to as table-based encoding  files.   The
       lines in a table-based encoding file are in the same format as this example taken from the
       shiftjis encoding (this is not the complete file):
	      # Encoding file: shiftjis, multi-byte
	      M
	      003F 0 40
	      00
	      0000000100020003000400050006000700080009000A000B000C000D000E000F
	      0010001100120013001400150016001700180019001A001B001C001D001E001F
	      0020002100220023002400250026002700280029002A002B002C002D002E002F
	      0030003100320033003400350036003700380039003A003B003C003D003E003F
	      0040004100420043004400450046004700480049004A004B004C004D004E004F
	      0050005100520053005400550056005700580059005A005B005C005D005E005F
	      0060006100620063006400650066006700680069006A006B006C006D006E006F
	      0070007100720073007400750076007700780079007A007B007C007D203E007F
	      0080000000000000000000000000000000000000000000000000000000000000
	      0000000000000000000000000000000000000000000000000000000000000000
	      0000FF61FF62FF63FF64FF65FF66FF67FF68FF69FF6AFF6BFF6CFF6DFF6EFF6F
	      FF70FF71FF72FF73FF74FF75FF76FF77FF78FF79FF7AFF7BFF7CFF7DFF7EFF7F
	      FF80FF81FF82FF83FF84FF85FF86FF87FF88FF89FF8AFF8BFF8CFF8DFF8EFF8F
	      FF90FF91FF92FF93FF94FF95FF96FF97FF98FF99FF9AFF9BFF9CFF9DFF9EFF9F
	      0000000000000000000000000000000000000000000000000000000000000000
	      0000000000000000000000000000000000000000000000000000000000000000
	      81
	      0000000000000000000000000000000000000000000000000000000000000000
	      0000000000000000000000000000000000000000000000000000000000000000
	      0000000000000000000000000000000000000000000000000000000000000000
	      0000000000000000000000000000000000000000000000000000000000000000
	      300030013002FF0CFF0E30FBFF1AFF1BFF1FFF01309B309C00B4FF4000A8FF3E
	      FFE3FF3F30FD30FE309D309E30034EDD30053006300730FC20152010FF0F005C
	      301C2016FF5C2026202520182019201C201DFF08FF0930143015FF3BFF3DFF5B
	      FF5D30083009300A300B300C300D300E300F30103011FF0B221200B100D70000
	      00F7FF1D2260FF1CFF1E22662267221E22342642264000B0203220332103FFE5
	      FF0400A200A3FF05FF03FF06FF0AFF2000A72606260525CB25CF25CE25C725C6
	      25A125A025B325B225BD25BC203B301221922190219121933013000000000000
	      000000000000000000000000000000002208220B2286228722822283222A2229
	      000000000000000000000000000000002227222800AC21D221D4220022030000
	      0000000000000000000000000000000000000000222022A52312220222072261
	      2252226A226B221A223D221D2235222B222C0000000000000000000000000000
	      212B2030266F266D266A2020202100B6000000000000000025EF000000000000

       The third line of the file is three numbers.  The first number is the  fallback	character
       (in base 16) to use when converting from UTF-8 to this encoding.  The second number is a 1
       if this file represents the encoding for a symbol font, or 0 otherwise.	The  last  number
       (in base 10) is how many pages of data follow.

       Subsequent lines in the example above are pages that describe how to map from the encoding
       into 2-byte Unicode.  The first line in a page identifies the page number.   Following  it
       are  256 double-byte numbers, arranged as 16 rows of 16 numbers.  Given a character in the
       encoding, the high byte of that character is used to select which page, and the	low  byte
       of  that  character  is	used as an index to select one of the double-byte numbers in that
       page - the value obtained being the corresponding Unicode character.   By  examination  of
       the example above, one can see that the characters 0x7E and 0x8163 in shiftjis map to 203E
       and 2026 in Unicode, respectively.

       Following the first page will be all the other pages, each  in  the  same  format  as  the
       first: one number identifying the page followed by 256 double-byte Unicode characters.  If
       a character in the encoding maps to the Unicode character 0000, it means that the  charac-
       ter does not actually exist.  If all characters on a page would map to 0000, that page can
       be omitted.

       Case [4] is the escape-sequence encoding file.  The lines in an this type of file  are  in
       the same format as this example taken from the iso2022-jp encoding:
	      # Encoding file: iso2022-jp, escape-driven
	      E
	      init	     {}
	      final	     {}
	      iso8859-1      \x1b(B
	      jis0201	     \x1b(J
	      jis0208	     \x1b$@
	      jis0208	     \x1b$B
	      jis0212	     \x1b$(D
	      gb2312	     \x1b$A
	      ksc5601	     \x1b$(C

       In the file, the first column represents an option and the second column is the associated
       value.  init is a string to emit or expect before the first character is converted,  while
       final is a string to emit or expect after the last character.  All other options are names
       of table-based encodings; the associated value is  the  escape-sequence	that  marks  that
       encoding.   Tcl	syntax	is  used for the values; in the above example, for instance, "{}"
       represents the empty string and "\x1b" represents character 27.

       When Tcl_GetEncoding encounters an encoding name that has not been loaded, it attempts  to
       load  an  encoding  file  called name.enc from the encoding subdirectory of each directory
       that Tcl searches for its script library.  If the encoding file exists, but is  malformed,
       an error message will be left in interp.

KEYWORDS
       utf, encoding, convert

Tcl					       8.1			       Tcl_GetEncoding(3)
Unix & Linux Commands & Man Pages : ©2000 - 2018 Unix and Linux Forums


All times are GMT -4. The time now is 02:10 PM.