Extended ASCII Characters keep on getting reintroduced to text files
Post 302976989 by Scrutinizer on Sunday 10th of July 2016 09:18:15 AM
07-10-2016
One thing I noticed is that this:
Code:
[[:^ascii:]]

should be:
Code:
[^[:ascii:]]

Also, this:
Code:
grep -v $'[^\t\r -~]' filename_01.csv > filename_02.csv

does not just remove non-ASCII characters; it discards every line that contains at least one character outside [\t\r -~].
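If the goal is to strip only the offending characters and keep the rest of each line, one option is tr instead of grep -v. This is a minimal sketch, assuming byte-wise handling under LC_ALL=C is acceptable; the sample data and the name filename_03.csv are made up for illustration:

```shell
# Hypothetical sample input: the second line contains a non-ASCII character (õ, UTF-8 bytes 0xC3 0xB5).
printf 'first line\nsec\303\265nd line\nthird line\n' > filename_01.csv

# grep -v discards the whole second line, because it contains a byte outside [\t\r -~].
# printf builds the pattern portably (real tab and CR), instead of relying on $'...'.
LC_ALL=C grep -v "$(printf '[^\t\r -~]')" filename_01.csv > filename_02.csv

# tr -cd deletes every byte NOT in the listed set and leaves the rest of the line alone.
# \n must be in the kept set, otherwise the line breaks are deleted as well.
LC_ALL=C tr -cd '\t\n\r -~' < filename_01.csv > filename_03.csv
```

With this input, filename_02.csv ends up with two lines, while filename_03.csv keeps all three and only loses the õ. Note that tr works on bytes here, so every byte of a multi-byte UTF-8 character is removed.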

Last edited by Scrutinizer; 07-10-2016 at 10:28 AM..
 
