Shell Programming and Scripting: Handling Invisible character in a file
Post 302487876 by Scrutinizer on Friday 14th of January 2011 02:21:53 AM
It is extended ASCII, so it can mean all kinds of things depending on which extended ASCII character set the file was written in. Octal 240 is decimal 160 (hex A0), which in ISO 8859-1 (Latin-1) and Windows-1252 is a non-breaking space. If that is the case here, it is probably best to replace it with a regular space:
Code:
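# LC_ALL=C makes sed work on raw bytes, so the lone 0xA0 byte matches even in a UTF-8 locale
LC_ALL=C sed "s/$(printf '\240')/ /g" file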
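Before running the replacement, you may want to confirm that the invisible character really is byte 0xA0 (octal 240). A minimal sketch, assuming a POSIX shell with the standard od, grep, cut and sed utilities; the file names sample.txt and sample.fixed.txt are made up for illustration:

```shell
# Create a small sample file with a 0xA0 byte (octal 240) between two words.
printf 'foo\240bar\n' > sample.txt

# In the C locale, od -c prints non-printable bytes as three-digit octal
# numbers, so the invisible character shows up as 240.
LC_ALL=C od -c sample.txt

# Print the numbers of the lines that contain the byte (LC_ALL=C: match raw bytes).
LC_ALL=C grep -n "$(printf '\240')" sample.txt | cut -d: -f1

# Count the lines that contain it (grep -c counts lines, not occurrences).
count=$(LC_ALL=C grep -c "$(printf '\240')" sample.txt)
echo "lines containing 0xA0: $count"

# Replace every 0xA0 byte with a regular space, as in the sed command above,
# writing the result to a new file instead of standard output.
LC_ALL=C sed "s/$(printf '\240')/ /g" sample.txt > sample.fixed.txt
```

Writing to a separate file keeps the original intact until you have checked the result.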

 

10 More Discussions You Might Find Interesting

1. UNIX for Dummies Questions & Answers

Identifying invisible characters in Unix file

I have a file that appears to contain spaces when you look at it, but sometimes it has a tab, a NUL, or some other character that we cannot see. How can I find out exactly which character is in the file wherever we see a space? (I am on Solaris.) ... (1 Reply)
Discussion started by: thanuman

2. UNIX for Advanced & Expert Users

Invisible login

Hello! This is my first post here, and I am not sure whether it belongs in the "Advanced" category. I have searched a lot for a way to log in to a system so that others will not be able to "see" me with the "who" command. Is there anybody here who can help me with this? ... (1 Reply)
Discussion started by: SmileKilled

3. Shell Programming and Scripting

Null Character Handling

Hi All, I have a problem with null values while reading a text file line by line. I wrote a shell script that reads a set of file names from a text file line by line, zips each individual file, copies the zip files into a separate directory, and removes the original file... (3 Replies)
Discussion started by: npk2210

4. Shell Programming and Scripting

read in a file character by character - replace any unknown ASCII characters with spa

Can someone help me write a script or command that reads in a file character by character, replaces any unknown ASCII characters with a space, and then writes the result out to a new file? Thanks! (1 Reply)
Discussion started by: raghav525

5. Shell Programming and Scripting

Deleting all characters from 350th character to 450th character from the log file

Hi All, I have a big log file, and I want to delete all characters starting at the 350th character position through the 450th character position. Please advise or provide sample code. (6 Replies)
Discussion started by: rajeshorpu

6. UNIX for Dummies Questions & Answers

Removing invisible files

Hi, I have lots of invisible files under a directory structure that I would like to delete from the top level rather than going down into each folder. All the files start with ._ (these are stub files that get generated). Does anyone have a script that will do this? Thanks, Treds (7 Replies)
Discussion started by: treds

7. Virtualization and Cloud Computing

Invisible/Transparent Background in VM

Hello, if you switch to "seamless mode" in VirtualBox, you can see the taskbar of the OS on your screen, as if your VM had a transparent background. My question: is it possible to do the same in VMware Workstation (7)? I know and use the "Unity" mode in Workstation/Player, but... (0 Replies)
Discussion started by: al0x

8. UNIX for Advanced & Expert Users

NCurses not handling Hindi half character correctly

Hello, I am working in Ubuntu's virtual terminal. On the virtual terminal I am typing in Hindi. Most of the characters are typed correctly, but a problem occurs when typing a half character. A Hindi character is 'converted' into a half character by typing ' ् ' after the... (0 Replies)
Discussion started by: syed.waris

9. UNIX for Beginners Questions & Answers

To remove any invisible and special characters from the file (exclude @!#$&*)

Hi Guys, My requirement is to remove any invisible and special characters from the file, such as Control-M (carriage return) and Alt-numeric characters, but it should not replace @#!$%. Example input: abc|xyz|acd¥£ó adc|123| 12áí Please help with this. Thanks, Rakesh (1 Reply)
Discussion started by: rakeshp

10. UNIX for Advanced & Expert Users

To remove any invisible and special characters from the file (exclude @#!$*)

Hi Guys, My requirement is to remove any invisible and special characters from the file, such as Control-M (carriage return) and Alt-numeric characters, but it should not replace @#!$%. Example input: abc|xyz|acd¥£ó adc|123| 12áí Please help with this. Thanks, Rakesh (1 Reply)
Discussion started by: rakeshp
dechanyu(5)							File Formats Manual						       dechanyu(5)

NAME
dechanyu - A character encoding system (codeset) for Traditional Chinese

DESCRIPTION
The DEC Hanyu (dechanyu) codeset consists of the following sets of characters:

   ASCII
   The first and second character planes of CNS 11643-1986
   Digital Taiwan Supplemental Character Set (DTSCS)
   User-defined characters

DEC Hanyu uses a combination of single-byte, 2-byte, and 4-byte data to represent ASCII characters, symbols, or ideographic characters.

ASCII Characters

All ASCII characters are represented as single-byte, 7-bit data in DEC Hanyu; that is, the most significant bit (MSB) of a byte that represents an ASCII character is always off. Refer to ascii(5) for more information about the ASCII character set.

CNS 11643-1986 Characters (Planes 1 and 2)

Each plane of the CNS 11643-1986 character set is divided into 94 rows, and each row has 94 columns. The characters defined in plane 1 and plane 2 of CNS 11643-1986 are as follows:

------------------------------------------------------------------------
Character Plane   Character Type                    Number of Characters
------------------------------------------------------------------------
1                 Special characters                651
                  Control characters                33
                  Frequently used characters        5401
2                 Less frequently used characters   7650
------------------------------------------------------------------------

Note that the first two planes of the CNS 11643-1986 character set are the same as those specified for the revised CNS 11643-1992 character set.

In DEC Hanyu, each CNS 11643-1986 character is represented by two bytes, in conformance with the CNS 11643-1986 standard. The MSB of the first byte is always on, while the MSB of the second byte is on for the first character plane and off for the second character plane. The first byte determines the row number of the character, and the second byte determines its column number.
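The row/column rule described here, spelled out as formulas in the paragraph that follows, can be checked with a short shell function. This is only a sketch: the function name cns_to_dechanyu is hypothetical, it reads the row and column as decimal numbers (as in the 36th-row examples on this page), and it covers planes 1 and 2 only, not DTSCS or user-defined characters.

```shell
# Hypothetical helper: compute the 2-byte DEC Hanyu code of a CNS 11643-1986
# character from its plane (1 or 2), row (1-94) and column (1-94), given in decimal.
#   1st byte = A0(hex) + row
#   2nd byte = A0(hex) + column on plane 1 (MSB on)
#            = 20(hex) + column on plane 2 (MSB off)
cns_to_dechanyu() {
    plane=$1 row=$2 col=$3
    case $plane in
        1) off2=160 ;;   # 0xA0
        2) off2=32 ;;    # 0x20
        *) echo "plane must be 1 or 2" >&2; return 1 ;;
    esac
    if [ "$row" -lt 1 ] || [ "$row" -gt 94 ] || [ "$col" -lt 1 ] || [ "$col" -gt 94 ]; then
        echo "row and column must be in 1..94" >&2
        return 1
    fi
    # Print both bytes as two-digit uppercase hex.
    printf '%02X%02X\n' $((160 + row)) $((off2 + col))
}

cns_to_dechanyu 1 36 1   # prints C4A1
cns_to_dechanyu 2 36 1   # prints C421
```

Row 1, column 1 and row 94, column 94 give the ends of each plane's code range (A1A1 to FEFE on plane 1, A121 to FE7E on plane 2).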
Code ranges for the two character planes are as follows:

   Plane 1:  A1A1 to FEFE
   Plane 2:  A121 to FE7E

The following formulas determine the value of a CNS 11643-1986 character in relation to its row and column numbers (the row and column numbers are decimal).

For a CNS 11643-1986 Plane 1 character:

   1st byte = A0(hex) + Row number
   2nd byte = A0(hex) + Column number

For a CNS 11643-1986 Plane 2 character:

   1st byte = A0(hex) + Row number
   2nd byte = 20(hex) + Column number

For example, if a character is positioned at the first column of the 36th row on CNS 11643 plane 1, its value is C4A1, which is calculated as follows:

   1st byte = A0(hex) + 36 = C4(hex)
   2nd byte = A0(hex) + 01 = A1(hex)

Similarly, if a character is positioned at the first column of the 36th row on CNS 11643 plane 2, its value is C421, which is calculated as follows:

   1st byte = A0(hex) + 36 = C4(hex)
   2nd byte = 20(hex) + 01 = 21(hex)

DTSCS Characters

Currently, only the EDPC (Electronic Data Processing Centre) Recommended Character Set, which defines a total of 6319 characters (rows 1 to 68), is included in the Digital Taiwan Supplementary Character Set (DTSCS).
In the revised CNS 11643-1992 standard, the 6319 characters in the EDPC Recommended Character Set are assigned to the third and fourth character planes as follows:

--------------------------------------------------------
EDPC Characters   Character Plane   Number of Characters
--------------------------------------------------------
Part I            Plane 3           6148
Part II           Plane 4           171
--------------------------------------------------------

The characters defined in Plane 3 and Plane 4 of CNS 11643-1992 are as follows:

-----------------------------------------------------------------------------------
Character Plane   Character Type                               Number of Characters
-----------------------------------------------------------------------------------
3                 Rarely used characters (EDPC Part I)         6148
4                 Used for residency system, ISO 2nd edition   7298
                  DIS 10646 Han characters
                  EDPC Part II characters                      171
-----------------------------------------------------------------------------------

In DEC Hanyu, each DTSCS character is represented by a 4-byte value. The first two bytes are a leading value, specifically C2CB, which is used as a designator sequence for the DTSCS character set. The MSB of the third and fourth bytes is on for the EDPC Recommended Character Set.

User-Defined Characters

In addition to the two Chinese character sets described in the preceding sections, DEC Hanyu provides an area of 3587 positions for user-defined characters (UDC). The UDC positions are the unused (but not reserved) code points on the first and second character planes of CNS 11643-1986. The encoding for UDC is exactly the same as that for CNS 11643-1986, except that the two sets of characters occupy different regions.
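The rules above (single-byte ASCII with the MSB off, 2-byte CNS 11643-1986 and UDC characters, and 4-byte DTSCS characters introduced by C2CB) determine how many bytes each character occupies. The following shell sketch splits a string of uppercase hex digits into characters by those rules. The function name dechanyu_split and the hex-string input format are assumptions made for illustration; the function does not check that the trailing bytes fall inside a valid code range.

```shell
# Hypothetical helper: split an uppercase hex string (two digits per byte)
# into DEC Hanyu characters, printing each character's bytes on its own line.
#   first byte below 0x80 (MSB off)  -> 1-byte ASCII character
#   bytes beginning with C2CB        -> 4-byte DTSCS character
#   any other byte with the MSB on   -> 2-byte CNS 11643-1986 or UDC character
dechanyu_split() {
    rest=$1
    while [ -n "$rest" ]; do
        first=$(printf '%s\n' "$rest" | cut -c1-2)
        if [ $((0x$first)) -lt 128 ]; then
            n=2                      # 1 byte = 2 hex digits
        else
            case $rest in
                C2CB*) n=8 ;;        # 4 bytes
                *)     n=4 ;;        # 2 bytes
            esac
        fi
        printf '%s\n' "$rest" | cut -c1-"$n"
        rest=$(printf '%s\n' "$rest" | cut -c"$((n + 1))"-)
    done
}

# Prints 41, C4A1, C2CBA1A1 and C421 on separate lines.
dechanyu_split 41C4A1C2CBA1A1C421
```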
Code ranges for UDC are as follows:

----------------------------------------------
Character Plane   Number of UDC   Code Range
----------------------------------------------
1                 145             FDCC to FEFE
1                 2256            AAA1 to C1FE
2                 1186            F245 to FE7E
----------------------------------------------

Codeset Conversion

The following codeset converter pairs are available for converting Traditional Chinese characters between dechanyu and other encoding formats. Refer to iconv_intro(5) for an introduction to codeset conversion. For more information about the other codeset for which dechanyu is the input or output, see the reference page specified in the list item.

big5_dechanyu, dechanyu_big5
   Converting from and to the Big-5 codeset: big5(5). Note that Big-5 encoding is equivalent to the Microsoft code-page format used on PCs for Traditional Chinese. See code_page(5) for information about PC code pages.

dechanzi_dechanyu, dechanyu_dechanzi
   Converting from and to the DEC Hanzi codeset: dechanzi(5).

eucTW_dechanyu, dechanyu_eucTW
   Converting from and to Taiwanese Extended UNIX Code: eucTW(5).

telecode_dechanyu, dechanyu_telecode
   Converting from and to the Telecode codeset: telecode(5).

UCS-2_dechanyu, dechanyu_UCS-2
   Converting from and to UCS-2 format: Unicode(5).

UCS-4_dechanyu, dechanyu_UCS-4
   Converting from and to UCS-4 format: Unicode(5).

UTF-8_dechanyu, dechanyu_UTF-8
   Converting from and to UTF-8 format: Unicode(5).

Fonts for DEC Hanyu Characters

The operating system provides both screen and printer fonts for DEC Hanyu characters.
The following DECwindows Motif fonts are grouped according to character set and family; they reflect various sizes and typefaces for 75dpi and 100dpi display devices:

CNS 11643-1986 fonts (Hei family):
   -adecw-hei-medium-r-normal--16-160-75-75-m-160-dec.cns11643.1986-2
   -adecw-hei-medium-r-normal--24-240-75-75-m-240-dec.cns11643.1986-2
   -adecw-hei-medium-r-normal--16-160-100-100-m-160-dec.cns11643.1986-2
   -adecw-hei-medium-r-normal--24-240-100-100-m-240-dec.cns11643.1986-2

CNS 11643-1986 fonts (Screen family):
   -adecw-screen-medium-r-normal--18-180-75-75-m-160-dec.cns11643.1986-2
   -adecw-screen-medium-r-normal--24-240-75-75-m-240-dec.cns11643.1986-2
   -adecw-screen-medium-r-normal--18-180-100-100-m-160-dec.cns11643.1986-2
   -adecw-screen-medium-r-normal--24-240-100-100-m-240-dec.cns11643.1986-2
   -adecw-screen-medium-r-normal--18-180-100-100-m-160-dec.cns11643.1986-UDC
   -adecw-screen-medium-r-normal--24-240-100-100-m-240-dec.cns11643.1986-UDC

CNS 11643-1986 fonts (Sung family):
   -adecw-sung-medium-r-normal--24-240-75-75-m-240-dec.cns11643.1986-2
   -adecw-sung-medium-r-normal--32-320-75-75-m-320-dec.cns11643.1986-2
   -adecw-sung-medium-r-normal--24-240-100-100-m-240-dec.cns11643.1986-2
   -adecw-sung-medium-r-normal--32-320-100-100-m-320-dec.cns11643.1986-2

DTSCS fonts (Hei family):
   -adecw-hei-medium-r-normal--16-160-75-75-m-160-dec.dtscs.1990-2
   -adecw-hei-medium-r-normal--24-240-75-75-m-240-dec.dtscs.1990-2
   -adecw-hei-medium-r-normal--16-160-100-100-m-160-dec.dtscs.1990-2
   -adecw-hei-medium-r-normal--24-240-100-100-m-240-dec.dtscs.1990-2

DTSCS fonts (Screen family):
   -adecw-screen-medium-r-normal--18-180-75-75-m-160-dec.dtscs.1990-2
   -adecw-screen-medium-r-normal--24-240-75-75-m-240-dec.dtscs.1990-2
   -adecw-screen-medium-r-normal--18-180-100-100-m-160-dec.dtscs.1990-2
   -adecw-screen-medium-r-normal--24-240-100-100-m-240-dec.dtscs.1990-2

DTSCS fonts (Sung family):
   -adecw-sung-medium-r-normal--24-240-75-75-m-240-dec.dtscs.1990-2
   -adecw-sung-medium-r-normal--32-320-75-75-m-320-dec.dtscs.1990-2
   -adecw-sung-medium-r-normal--24-240-100-100-m-240-dec.dtscs.1990-2
   -adecw-sung-medium-r-normal--32-320-100-100-m-320-dec.dtscs.1990-2

The operating system provides the following PostScript printer fonts for CNS 11643-1986 characters:

   Hei-Light-CNS11643
   Sung-Light-CNS11643

These PostScript fonts support only the Traditional Chinese characters in planes 1 and 2 of the CNS 11643 character set. The Traditional Chinese characters in the DTSCS character set are not supported by printer fonts. The restriction also applies to the eucTW codeset, which also includes DTSCS characters and is supported by the same fonts as dechanyu. For general information on printing Asian language text, refer to i18n_printing(5).

SEE ALSO
Commands: locale(1)

Others: ascii(5), big5(5), Chinese(5), code_page(5), dechanzi(5), eucTW(5), GBK(5), i18n_intro(5), i18n_printing(5), iconv_intro(5), l10n_intro(5), sbig5(5), telecode(5)
All times are GMT -4.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.