Handling Invisible character in a file
Post 302487876 by Scrutinizer on Friday 14th of January 2011 02:21:53 AM
Octal 240 is decimal 160 (hex A0), which lies outside 7-bit ASCII, so its meaning depends on which extended (8-bit) character set the file was written in. In ISO 8859-1 and Windows-1252, for example, that byte is a non-breaking space. If that is the case here, it is probably best to replace it with a regular space:
Code:
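# Replace every 0xA0 byte (octal 240) with a space; LC_ALL=C makes sed match bytes, not characters
LC_ALL=C sed "s/$(printf '\240')/ /g" file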
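
Before replacing anything, it may help to confirm which byte is really in the file. The following is only a rough sketch, assuming a POSIX shell with `od`, `tr`, `wc` and `sed` available; the file name `sample.txt` and the variables `count` and `cleaned` are made up for illustration and are not from the original thread:

```shell
# Hypothetical sample file with one 0xA0 byte (octal 240) between two words.
printf 'foo\240bar\n' > sample.txt

# Dump every byte in octal; the suspect byte shows up as 240.
od -An -to1 sample.txt

# Count the 0xA0 bytes. LC_ALL=C makes tr work on bytes, not characters.
count=$(LC_ALL=C tr -cd '\240' < sample.txt | wc -c)
echo "0xA0 bytes found: $((count))"

# Apply the substitution and keep the result for inspection.
cleaned=$(LC_ALL=C sed "s/$(printf '\240')/ /g" sample.txt)
echo "$cleaned"
```

If the octal dump shows the pair 302 240 instead of a lone 240, the file is more likely UTF-8, and the two bytes together form the non-breaking space, so a single-byte replacement would leave a stray 302 behind.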

 

dechanzi(5)                         File Formats Manual                         dechanzi(5)

NAME
       dechanzi - A character encoding system (codeset) for Simplified Chinese

DESCRIPTION
       The DEC Hanzi (dechanzi) codeset consists of the following character sets:

              ASCII
              GB2312-80
              Extended GB

       DEC Hanzi uses a 2-byte data representation for symbols and ideographic
       characters that are defined in GB2312-80.

   ASCII Characters
       All ASCII characters are represented in the form of single-byte, 7-bit data
       in the DEC Hanzi codeset; that is, the most significant bit (MSB) of the
       byte that represents an ASCII character is always set off. For more
       information on ASCII characters, refer to ascii(5).

   GB2312-80 Characters
       The code table for GB2312-80 characters is divided into 94 rows (Qu),
       numbered from 1 to 94. Each row has 94 columns (Wei), also numbered from 1
       to 94. The code table defines a total of 7445 characters, of which 6763 are
       Chinese characters. The characters are grouped as follows:

       Graphic symbols
              There are 682 graphic symbols, which occupy rows 1 to 9 in the code
              table.

       Frequently used (Level 1) characters
              There are 3755 frequently used characters, which occupy rows 16 to 55
              in the code table.

       Less frequently used (Level 2) characters
              There are 3008 less frequently used characters, which occupy rows 56
              to 87 in the code table.

       To differentiate GB2312-80 character codes from ASCII and Extended GB
       character codes, the most significant bit (MSB) of both the first byte and
       the second byte is set on. The following formulas show how to calculate the
       value for a GB2312-80 character from its row and column numbers:

              1st byte = A0 + Row number
              2nd byte = A0 + Column number

       For example, if a GB2312-80 character is in the first column of the 16th
       row, the character's value is B0A1, which is calculated as follows:

              1st byte = A0(hex) + 16 = B0(hex)
              2nd byte = A0(hex) + 01 = A1(hex)

   Extended GB Characters
       The Extended GB code table is similar to the GB2312 code table and is
       divided into 94 rows and 94 columns (8836 code points). However, the
       Extended GB code table provides code points for user-defined characters
       (UDC). The 8836 code points in this table are divided into two areas:

       User-defined area
              This area spans rows 1 to 87 and provides 8178 code points.

       User-defined (reserved) area
              This area spans rows 88 to 94 and provides 658 code points. This area
              is where users can define special and long-lasting user-defined
              characters.

       To differentiate Extended GB codes from ASCII codes and GB2312-80 codes,
       the most significant bit (MSB) of the first byte is set on while that of the
       second byte is set off. The following formulas show how the code value of an
       Extended GB character is calculated from its row and column numbers:

              1st byte = A0 + Row number
              2nd byte = 20 + Column number

       For example, if a character is positioned at the first column of the 16th
       row in the Extended GB code table, the character's value is B021, which is
       calculated as follows:

              1st byte = A0(hex) + 16 = B0(hex)
              2nd byte = 20(hex) + 01 = 21(hex)

   Codeset Conversion
       The following codeset converter pairs are available for converting
       Simplified Chinese characters between dechanzi and other encoding formats.
       Refer to iconv_intro(5) for an introduction to codeset conversion. For more
       information about the other codeset for which dechanzi is the input or
       output, see the reference page specified in the list item.

       big5_dechanzi, dechanzi_big5
              Converting from and to the Big-5 codeset: big5(5)

       dechanyu_dechanzi, dechanzi_dechanyu
              Converting from and to the DEC Hanyu codeset: dechanyu(5)

       eucTW_dechanzi, dechanzi_eucTW
              Converting from and to Taiwanese Extended UNIX Code: eucTW(5)

       UCS-2_dechanzi, dechanzi_UCS-2
              Converting from and to UCS-2 format: Unicode(5)

       UCS-4_dechanzi, dechanzi_UCS-4
              Converting from and to UCS-4 format: Unicode(5)

       UTF-8_dechanzi, dechanzi_UTF-8
              Converting from and to UTF-8 format: Unicode(5)

       DEC Hanzi encoding is identical to the Microsoft code-page format (cp936)
       used for Simplified Chinese characters on PC systems. However, DEC Hanzi
       supports fewer characters than supported by the code page. Therefore, using
       converters with dechanzi in the converter name to convert between cp936 and
       other formats can result in some data loss. Refer to code_page(5) for more
       information about PC code pages.

   DEC Hanzi Fonts
       The operating system provides both screen and printer fonts for DEC Hanzi
       characters. The following bitmap fonts are grouped according to family and
       reflect various sizes and typefaces for 75dpi and 100dpi display devices:

       Fangsongti Family:
              -adecw-fangsongti-medium-r-normal--24-240-75-75-m-240-gb2312.1980-1
              -adecw-fangsongti-medium-r-normal--34-340-75-75-m-340-gb2312.1980-1
              -adecw-fangsongti-medium-r-normal--24-240-100-100-m-240-gb2312.1980-1
              -adecw-fangsongti-medium-r-normal--34-340-100-100-m-340-gb2312.1980-1

       Heiti Family:
              -adecw-heiti-medium-r-normal--16-160-75-75-m-160-gb2312.1980-1
              -adecw-heiti-medium-r-normal--24-240-75-75-m-240-gb2312.1980-1
              -adecw-heiti-medium-r-normal--34-340-75-75-m-340-gb2312.1980-1
              -adecw-heiti-medium-r-normal--16-160-100-100-m-160-gb2312.1980-1
              -adecw-heiti-medium-r-normal--24-240-100-100-m-240-gb2312.1980-1
              -adecw-heiti-medium-r-normal--34-340-100-100-m-340-gb2312.1980-1

       Kaiti Family:
              -adecw-kaiti-medium-r-normal--24-240-75-75-m-240-gb2312.1980-1
              -adecw-kaiti-medium-r-normal--34-340-75-75-m-340-gb2312.1980-1
              -adecw-kaiti-medium-r-normal--24-240-100-100-m-240-gb2312.1980-1
              -adecw-kaiti-medium-r-normal--34-340-100-100-m-340-gb2312.1980-1

       Screen Family:
              -adecw-screen-medium-r-normal--18-180-75-75-m-160-gb2312.1980-1
              -adecw-screen-medium-r-normal--24-240-75-75-m-240-gb2312.1980-1
              -adecw-screen-medium-r-normal--18-180-100-100-m-160-gb2312.1980-1
              -adecw-screen-medium-r-normal--24-240-100-100-m-240-gb2312.1980-1
              -adecw-screen-medium-r-normal--18-180-100-100-m-160-gb2312.1980-UDC
              -adecw-screen-medium-r-normal--24-240-100-100-m-240-gb2312.1980-UDC

       Songti Family:
              -adecw-songti-medium-r-normal--16-160-75-75-m-160-gb2312.1980-1
              -adecw-songti-medium-r-normal--24-240-75-75-m-240-gb2312.1980-1
              -adecw-songti-medium-r-normal--34-340-75-75-m-340-gb2312.1980-1
              -adecw-songti-medium-r-normal--16-160-100-100-m-160-gb2312.1980-1
              -adecw-songti-medium-r-normal--24-240-100-100-m-240-gb2312.1980-1
              -adecw-songti-medium-r-normal--34-340-100-100-m-340-gb2312.1980-1

       The operating system provides the following PostScript printer fonts for
       DEC Hanzi characters:

              Hei-GB2312-80
              XiSong-GB2312-80

       For general information on printing Asian language text, refer to
       i18n_printing(5).

SEE ALSO
       Commands: locale(1)

       Others: ascii(5), big5(5), Chinese(5), code_page(5), dechanyu(5), eucTW(5),
       GBK(5), i18n_intro(5), i18n_printing(5), iconv_intro(5), l10n_intro(5),
       sbig5(5), telecode(5), Unicode(5)
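
The row/column formulas in the DESCRIPTION above can be tried out with plain shell arithmetic. This is only an illustrative sketch, assuming a POSIX shell; the function names `gb2312_code` and `extgb_code` are hypothetical helpers invented here and are not part of the operating system or of dechanzi itself:

```shell
# Hypothetical helpers: compute a 2-byte code value (in hex) from a
# row (Qu) and a column (Wei) number, each expected to be in 1..94.

gb2312_code() {
    # GB2312-80: MSB set on both bytes, so 0xA0 + row and 0xA0 + column.
    printf '%02X%02X\n' $((0xA0 + $1)) $((0xA0 + $2))
}

extgb_code() {
    # Extended GB: MSB set on the first byte only, so 0xA0 + row and 0x20 + column.
    printf '%02X%02X\n' $((0xA0 + $1)) $((0x20 + $2))
}

gb2312_code 16 1   # prints B0A1
extgb_code 16 1    # prints B021
```

Both calls reproduce the worked examples from the page: row 16, column 1 gives B0A1 in GB2312-80 and B021 in Extended GB. The helpers do no range checking, so values outside 1..94 produce meaningless codes.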