Handling Invisible Characters in a File (Post: 302487876)

10 More Discussions You Might Find Interesting

1. UNIX for Dummies Questions & Answers

Identifying invisible characters in Unix file

I have a file that, when you look at it, appears to contain spaces. But sometimes it has tabs, NULs, or some other character that we cannot see. How can I find exactly which character is in the file wherever we see a space? (I am on Solaris.)

2. UNIX for Advanced & Expert Users

Invisible login

Hello! This is my first post here, and I am not sure whether it belongs in the "Advanced" category. I have searched extensively for a way to log in to a system so that others will not be able to "see" me with the "who" command. Is there anybody here who can help me with this? :rolleyes: ...

3. Shell Programming and Scripting

Null Character Handling

Hi all, I have a problem with null values while reading a text file line by line. I wrote a shell script that reads a set of file names from a text file line by line, zips each individual file, copies the zip files into a separate directory, and removes the original file...

4. Shell Programming and Scripting

read in a file character by character - replace any unknown ASCII characters with space

Can someone help me write a script or command that reads a file character by character, replaces any unknown ASCII characters with a space, and then writes the result out to a new file? Thanks!

5. Shell Programming and Scripting

Deleting all characters from 350th character to 450th character from the log file

Hi all, I have a big log file, and I want to delete all characters from the 350th character position through the 450th character position. Please advise or provide sample code.

6. UNIX for Dummies Questions & Answers

Removing invisible files

Hi, I have lots of invisible files under a directory tree that I would like to delete from the top level rather than going down into each folder. All the files start with ._ and are stub files that get generated. Does anyone have a script that will do this? Thanks, Treds

7. Virtualization and Cloud Computing

Invisible/Transparent Background in VM

Hello, if you switch to "seamless mode" in VirtualBox, you can see the OS taskbar on your screen, as if your VM had a transparent background. My question: is it possible to do the same in VMware Workstation 7? I know and use the "Unity" mode in Workstation/Player, but...

8. UNIX for Advanced & Expert Users

NCurses not handling hindi half character correctly

Hello, I am working in Ubuntu's virtual terminal, typing in Hindi. Most characters are typed correctly, but a problem occurs when I type a half character. A Hindi character is 'converted' into a half character by typing ' ् ' after the...

9. UNIX for Beginners Questions & Answers

To remove any invisible and special characters from the file(exclude @!#$&*)

Hi guys, my requirement is to remove any invisible and special characters from the file, such as Control-M (carriage return) and Alt-numeric characters, but it should not replace @#!$%. For example: abc|xyz|acd�� adc|123| 12�� Please help with this. Thanks, Rakesh

10. UNIX for Advanced & Expert Users

To remove any invisible and special characters from the file(exclude @#!$*)
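Several of the threads above (items 1, 4 and 9) come down to finding which invisible byte sits where the text looks blank. The standard od -c command gives a byte-by-byte view; the following is a minimal Python sketch of the same idea, not taken from any of those threads, and find_invisible and report are hypothetical names. It assumes bytes 0x20 through 0x7E are the printable ones and flags everything else, skipping ordinary LF line endings.

```python
from typing import List, Tuple

# Printable ASCII is space (0x20) through tilde (0x7E); DEL (0x7F) is a control byte.
PRINTABLE = range(0x20, 0x7F)

# Labels for a few invisible bytes that commonly masquerade as spaces.
NAMES = {0x00: "NUL", 0x09: "TAB", 0x0D: "CR (^M)", 0x0A: "LF"}


def find_invisible(data: bytes, skip_newlines: bool = True) -> List[Tuple[int, int, str]]:
    """Return (offset, byte value, label) for every byte that is not printable ASCII."""
    hits = []
    for offset, b in enumerate(data):
        if b in PRINTABLE:
            continue
        if skip_newlines and b == 0x0A:
            continue  # ordinary Unix line endings are rarely the problem
        label = NAMES.get(b, "non-ASCII" if b >= 0x80 else "control")
        hits.append((offset, b, label))
    return hits


def report(path: str) -> None:
    """Print one line per invisible byte found in the file at path."""
    with open(path, "rb") as f:
        for offset, b, label in find_invisible(f.read()):
            print(f"offset {offset}: 0x{b:02X} {label}")
```

For example, report("data.txt") (a hypothetical file name) prints the offset and hex value of every tab, NUL, carriage return, or high-bit byte in that file.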

LEARN ABOUT OSF1

dechanzi

dechanzi(5)							File Formats Manual						       dechanzi(5)

NAME

       dechanzi - A character encoding system (codeset) for Simplified Chinese

DESCRIPTION

       The DEC Hanzi (dechanzi) codeset consists of the following character sets:

              ASCII
              GB2312-80
              Extended GB

       DEC Hanzi uses a 2-byte data representation for symbols and ideographic characters that are defined in GB2312-80.

   ASCII Characters
       All ASCII characters are represented as single-byte, 7-bit data in the DEC Hanzi codeset; that is, the most significant bit (MSB) of the
       byte that represents an ASCII character is always 0. For more information on ASCII characters, refer to ascii(5).

   GB2312-80 Characters
       The code table for GB2312-80 characters is divided into 94 rows (Qu), numbered from 1 to 94. Each row has 94 columns (Wei), also numbered
       from 1 to 94. The code table defines a total of 7445 characters, of which 6763 are Chinese characters. The characters are grouped as
       follows:

       Graphic symbols
              There are 682 graphic symbols, which occupy rows 1 to 9 in the code table.

       Frequently used (Level 1) characters
              There are 3755 frequently used characters, which occupy rows 16 to 55 in the code table.

       Less frequently used (Level 2) characters
              There are 3008 less frequently used characters, which occupy rows 56 to 87 in the code table.
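The row ranges above can be captured in a small lookup. This is a Python sketch in which gb2312_row_group is a hypothetical helper; rows that none of the three groups lists (10 to 15 and 88 to 94) are reported as unassigned. The totals are consistent: Level 2 fills 32 full rows (32 × 94 = 3008), Level 1 spans 40 rows of 94 cells (3760 cells, so 5 cells in that range are unused), and 682 + 3755 + 3008 = 7445.

```python
# Row ranges for GB2312-80 as listed in dechanzi(5); range() excludes its upper bound.
GROUPS = [
    ("graphic symbols", range(1, 10)),    # rows 1-9, 682 symbols
    ("Level 1", range(16, 56)),           # rows 16-55, 3755 characters
    ("Level 2", range(56, 88)),           # rows 56-87, 3008 characters
]


def gb2312_row_group(row: int) -> str:
    """Name the group that a GB2312-80 row (Qu) belongs to."""
    if not 1 <= row <= 94:
        raise ValueError("GB2312-80 rows are numbered 1 to 94")
    for name, rows in GROUPS:
        if row in rows:
            return name
    return "unassigned"  # rows not listed in any group above
```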

       To differentiate GB2312-80 character codes from ASCII and Extended GB character codes, the most significant bit (MSB) of both the first
       byte and the second byte is set to 1. The following formulas show how to calculate the value of a GB2312-80 character from its row and
       column numbers:

       1st byte = A0 + Row number
       2nd byte = A0 + Column number

       For example, if a GB2312-80 character is in the first column of the 16th row, the character's value is B0A1, which is calculated as
       follows:

       1st byte = A0(hex) + 16 = B0(hex)
       2nd byte = A0(hex) + 01 = A1(hex)
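The two formulas and the worked example can be expressed as a pair of helper functions. This is a Python sketch in which gb2312_code and gb2312_row_col are hypothetical names. Because the resulting two-byte layout (both bytes A1 to FE hex) is the EUC-CN form of GB2312-80, Python's standard gb2312 codec can serve as an independent cross-check, for this character set only; it knows nothing about Extended GB.

```python
from typing import Tuple


def gb2312_code(row: int, col: int) -> bytes:
    """Encode a GB2312-80 row/column pair as its two DEC Hanzi bytes."""
    if not (1 <= row <= 94 and 1 <= col <= 94):
        raise ValueError("row and column must be in 1..94")
    # 1st byte = A0(hex) + row, 2nd byte = A0(hex) + column, so both MSBs are set.
    return bytes([0xA0 + row, 0xA0 + col])


def gb2312_row_col(code: bytes) -> Tuple[int, int]:
    """Recover (row, column) from a 2-byte GB2312-80 code."""
    if len(code) != 2 or not all(0xA1 <= b <= 0xFE for b in code):
        raise ValueError("not a GB2312-80 code: both bytes must be A1-FE (hex)")
    return code[0] - 0xA0, code[1] - 0xA0
```

gb2312_code(16, 1).hex().upper() gives "B0A1", matching the example above, and gb2312_code(16, 1).decode("gb2312") returns the character U+554A.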

   Extended GB Characters
       The Extended GB code table is similar to the GB2312-80 code table and is divided into 94 rows and 94 columns (8836 code points). However,
       the Extended GB code table provides code points for user-defined characters (UDC). The 8836 code points in this table are divided into two
       areas:

       User-defined area
              This area spans rows 1 to 87 and provides 8178 code points.

       User-defined (reserved) area
              This area spans rows 88 to 94 and provides 658 code points. This area is where users can define special and long-lasting
              user-defined characters.
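The area sizes follow directly from the number of rows times 94 columns. A short Python sketch, with extended_gb_area and area_size as hypothetical names, that classifies an Extended GB row and recomputes the counts given above.

```python
COLUMNS = 94  # every Extended GB row has 94 columns


def extended_gb_area(row: int) -> str:
    """Name the Extended GB area that a row falls in."""
    if 1 <= row <= 87:
        return "user-defined"
    if 88 <= row <= 94:
        return "user-defined (reserved)"
    raise ValueError("Extended GB rows are numbered 1 to 94")


def area_size(area: str) -> int:
    """Count code points in an area: its number of rows times 94 columns."""
    return sum(COLUMNS for row in range(1, 95) if extended_gb_area(row) == area)
```

area_size("user-defined") returns 8178 (87 × 94) and area_size("user-defined (reserved)") returns 658 (7 × 94); together they make 8836, which is 94 × 94.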

       To differentiate Extended GB codes from ASCII codes and GB2312-80 codes, the most significant bit (MSB) of the first byte is set to 1,
       while that of the second byte is 0. The following formulas show how the code value of an Extended GB character is calculated from its row
       and column numbers:

       1st byte = A0 + Row number
       2nd byte = 20 + Column number

       For example, if a character is positioned at the first column of the 16th row on the Extended GB code plane, the character's value is
       B021, which is calculated as follows:

       1st byte = A0(hex) + 16 = B0(hex)
       2nd byte = 20(hex) + 01 = 21(hex)
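Combining the three MSB rules (ASCII: byte below 80 hex; GB2312-80: both bytes 80 hex or above; Extended GB: first byte 80 hex or above, second byte below 80 hex) gives a simple way to walk a DEC Hanzi byte stream. This is a Python sketch that assumes well-formed input; it does not check that lead bytes fall in the A1 to FE range, and extended_gb_code and split_dechanzi are hypothetical names.

```python
from typing import List, Tuple


def extended_gb_code(row: int, col: int) -> bytes:
    """Encode an Extended GB row/column pair: A0(hex) + row, 20(hex) + column."""
    if not (1 <= row <= 94 and 1 <= col <= 94):
        raise ValueError("row and column must be in 1..94")
    # The second byte stays in 21-7E (hex), so its MSB is off.
    return bytes([0xA0 + row, 0x20 + col])


def split_dechanzi(data: bytes) -> List[Tuple[str, bytes]]:
    """Split a DEC Hanzi byte string into (character set, code) tokens by MSB."""
    tokens = []
    i = 0
    while i < len(data):
        if data[i] < 0x80:                      # MSB off: single-byte ASCII
            tokens.append(("ASCII", data[i:i + 1]))
            i += 1
            continue
        if i + 1 >= len(data):
            raise ValueError(f"truncated 2-byte code at offset {i}")
        # First MSB on: the second byte's MSB tells the two sets apart.
        kind = "GB2312-80" if data[i + 1] >= 0x80 else "Extended GB"
        tokens.append((kind, data[i:i + 2]))
        i += 2
    return tokens
```

extended_gb_code(16, 1).hex().upper() gives "B021", matching the example above, and splitting the bytes 41 B0A1 B021 (hex) yields one ASCII token, one GB2312-80 token, and one Extended GB token.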

   Codeset Conversion
       The following codeset converter pairs are available for converting Simplified Chinese characters between dechanzi and other encoding
       formats. Refer to iconv_intro(5) for an introduction to codeset conversion. For more information about the other codeset for which
       dechanzi is the input or output, see the reference page specified in the list item.

       big5_dechanzi, dechanzi_big5
              Converting from and to the Big-5 codeset: big5(5)

       dechanyu_dechanzi, dechanzi_dechanyu
              Converting from and to the DEC Hanyu codeset: dechanyu(5)

       eucTW_dechanzi, dechanzi_eucTW
              Converting from and to Taiwanese Extended UNIX Code: eucTW(5)

       UCS-2_dechanzi, dechanzi_UCS-2
              Converting from and to UCS-2 format: Unicode(5)

       UCS-4_dechanzi, dechanzi_UCS-4
              Converting from and to UCS-4 format: Unicode(5)

       UTF-8_dechanzi, dechanzi_UTF-8
              Converting from and to UTF-8 format: Unicode(5)
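The dechanzi_UTF-8 converter itself belongs to the OSF1 iconv framework and is not reproduced here. As a rough stand-in, assuming the input holds only ASCII and GB2312-80 characters (for which the DEC Hanzi byte layout matches EUC-CN), Python's standard gb2312 codec should produce the same UTF-8 text. dechanzi_to_utf8 is a hypothetical helper, and it deliberately rejects Extended GB codes, since user-defined characters have no standard Unicode mapping.

```python
def dechanzi_to_utf8(data: bytes) -> bytes:
    """Convert DEC Hanzi text holding only ASCII and GB2312-80 characters to UTF-8.

    Assumption: for these two character sets the DEC Hanzi bytes coincide
    with EUC-CN, so Python's standard "gb2312" codec can decode each code.
    Extended GB (user-defined) codes are rejected rather than guessed.
    """
    chars = []
    i = 0
    while i < len(data):
        if data[i] < 0x80:                       # ASCII: one byte, MSB off
            chars.append(chr(data[i]))
            i += 1
            continue
        pair = data[i:i + 2]
        if len(pair) < 2 or pair[1] < 0x80:      # truncated, or Extended GB
            raise ValueError(f"no standard mapping for code at offset {i}")
        chars.append(pair.decode("gb2312"))      # GB2312-80: both MSBs set
        i += 2
    return "".join(chars).encode("utf-8")
```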

       DEC Hanzi encoding is identical to the Microsoft code-page format (cp936) used for Simplified Chinese characters on PC systems. However,
       DEC Hanzi supports fewer characters than the code page does. Therefore, using converters that have dechanzi in their names to convert
       between cp936 and other formats can result in some data loss. Refer to code_page(5) for more information about PC code pages.
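The data-loss warning can be illustrated with Python's standard codecs, which include cp936 but no dechanzi converter. Assuming, for illustration, that the standard (non-user-defined) repertoire of DEC Hanzi is GB2312-80, any cp936 character outside GB2312-80 has nowhere to go; U+4E02 is one such character. chars_lost_in_gb2312 and lossy_to_gb2312 are hypothetical helpers.

```python
from typing import List


def chars_lost_in_gb2312(text: str) -> List[str]:
    """List the characters in text that GB2312-80 cannot encode."""
    lost = []
    for ch in text:
        try:
            ch.encode("gb2312")
        except UnicodeEncodeError:
            lost.append(ch)
    return lost


def lossy_to_gb2312(text: str) -> bytes:
    """Encode text as GB2312, substituting '?' for characters it lacks."""
    return text.encode("gb2312", errors="replace")
```

The string "\u554a\u4e02" encodes to cp936 without error, but chars_lost_in_gb2312 reports U+4E02, and lossy_to_gb2312 turns that character into a literal "?", which is the kind of loss described above.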

   DEC Hanzi Fonts
       The operating system provides both screen and printer fonts for DEC Hanzi characters.

       The following bitmap fonts are grouped according to family and reflect various sizes and typefaces for 75dpi and 100dpi display devices:

       Fangsongti Family:

       -adecw-fangsongti-medium-r-normal--24-240-75-75-m-240-gb2312.1980-1
       -adecw-fangsongti-medium-r-normal--34-340-75-75-m-340-gb2312.1980-1
       -adecw-fangsongti-medium-r-normal--24-240-100-100-m-240-gb2312.1980-1
       -adecw-fangsongti-medium-r-normal--34-340-100-100-m-340-gb2312.1980-1

       Heiti Family:

       -adecw-heiti-medium-r-normal--16-160-75-75-m-160-gb2312.1980-1
       -adecw-heiti-medium-r-normal--24-240-75-75-m-240-gb2312.1980-1
       -adecw-heiti-medium-r-normal--34-340-75-75-m-340-gb2312.1980-1
       -adecw-heiti-medium-r-normal--16-160-100-100-m-160-gb2312.1980-1
       -adecw-heiti-medium-r-normal--24-240-100-100-m-240-gb2312.1980-1
       -adecw-heiti-medium-r-normal--34-340-100-100-m-340-gb2312.1980-1

       Kaiti Family:

       -adecw-kaiti-medium-r-normal--24-240-75-75-m-240-gb2312.1980-1
       -adecw-kaiti-medium-r-normal--34-340-75-75-m-340-gb2312.1980-1
       -adecw-kaiti-medium-r-normal--24-240-100-100-m-240-gb2312.1980-1
       -adecw-kaiti-medium-r-normal--34-340-100-100-m-340-gb2312.1980-1

       Screen Family:

       -adecw-screen-medium-r-normal--18-180-75-75-m-160-gb2312.1980-1
       -adecw-screen-medium-r-normal--24-240-75-75-m-240-gb2312.1980-1
       -adecw-screen-medium-r-normal--18-180-100-100-m-160-gb2312.1980-1
       -adecw-screen-medium-r-normal--24-240-100-100-m-240-gb2312.1980-1
       -adecw-screen-medium-r-normal--18-180-100-100-m-160-gb2312.1980-UDC
       -adecw-screen-medium-r-normal--24-240-100-100-m-240-gb2312.1980-UDC

       Songti Family:

       -adecw-songti-medium-r-normal--16-160-75-75-m-160-gb2312.1980-1
       -adecw-songti-medium-r-normal--24-240-75-75-m-240-gb2312.1980-1
       -adecw-songti-medium-r-normal--34-340-75-75-m-340-gb2312.1980-1
       -adecw-songti-medium-r-normal--16-160-100-100-m-160-gb2312.1980-1
       -adecw-songti-medium-r-normal--24-240-100-100-m-240-gb2312.1980-1
       -adecw-songti-medium-r-normal--34-340-100-100-m-340-gb2312.1980-1
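The font names above follow the X Logical Font Description (XLFD) convention: fourteen hyphen-separated fields after a leading hyphen, including the pixel size, the point size in tenths of a point, the horizontal and vertical resolution in dpi, and the character set registry and encoding (gb2312.1980-1, or gb2312.1980-UDC for the user-defined-character screen fonts). A small Python sketch that splits such a name into fields; parse_xlfd is a hypothetical helper, and it assumes no field value itself contains a hyphen, which holds for every name listed here.

```python
from typing import Dict

# The 14 XLFD fields, in order.
XLFD_FIELDS = [
    "foundry", "family", "weight", "slant", "setwidth", "add_style",
    "pixel_size", "point_size", "resolution_x", "resolution_y",
    "spacing", "average_width", "charset_registry", "charset_encoding",
]


def parse_xlfd(name: str) -> Dict[str, str]:
    """Split an XLFD font name into a dict of named fields."""
    if not name.startswith("-"):
        raise ValueError("XLFD names start with '-'")
    parts = name[1:].split("-")  # empty fields (e.g. add_style) stay as ""
    if len(parts) != len(XLFD_FIELDS):
        raise ValueError(f"expected {len(XLFD_FIELDS)} fields, got {len(parts)}")
    return dict(zip(XLFD_FIELDS, parts))
```

For the first Fangsongti name, parse_xlfd reports family "fangsongti", pixel size "24", point size "240" (24 points), and a 75 × 75 dpi resolution.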

       The operating system provides the following PostScript printer fonts for DEC Hanzi characters:

              Hei-GB2312-80
              XiSong-GB2312-80

       For general information on printing Asian language text, refer to i18n_printing(5).

SEE ALSO

       Commands: locale(1)

       Others: ascii(5), big5(5), Chinese(5), code_page(5), dechanyu(5), eucTW(5), GBK(5), i18n_intro(5), i18n_printing(5), iconv_intro(5),
       l10n_intro(5), sbig5(5), telecode(5), Unicode(5)
