07-15-2018
By definition, a text file cannot contain NUL bytes.
If the file you're reading contains pointers or other binary values, you need to really understand the format of the data you are processing and use tools appropriate to your task. Without understanding the format of the data you're reading, all bets are off. Note that understanding the format means knowing not only where the binary values are in your data (if there are any), but also which codeset is used to encode the characters in your file. (For example, there is obviously a big difference between extended ASCII characters encoded in ISO 8859-1 and extended ASCII characters encoded in UTF-8.)
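To make the two points above concrete, here is a minimal Python sketch. It is only an illustration: the helper names `contains_nul` and `encodings_of` are made up for this example and do not come from any tool mentioned in this thread. It checks a buffer for NUL bytes and shows how the same character ends up as different bytes in ISO 8859-1 and in UTF-8.

```python
def contains_nul(data: bytes) -> bool:
    """Return True if the byte string holds at least one NUL (0x00) byte."""
    return b"\x00" in data


def encodings_of(text: str) -> dict:
    """Encode the same text in ISO 8859-1 and in UTF-8 for comparison."""
    return {
        "iso-8859-1": text.encode("iso-8859-1"),
        "utf-8": text.encode("utf-8"),
    }


if __name__ == "__main__":
    print(contains_nul(b"plain text\n"))      # False
    print(contains_nul(b"\x00\x10\x00\x00"))  # True

    # The same character is one byte in ISO 8859-1 and two bytes in UTF-8.
    for name, raw in encodings_of("ö").items():
        print(name, raw.hex())                # iso-8859-1 f6, then utf-8 c3b6

    # Reading UTF-8 bytes as if they were ISO 8859-1 silently yields
    # the wrong characters instead of raising an error.
    print("ö".encode("utf-8").decode("iso-8859-1"))  # Ã¶
```

A NUL check like this can only tell you that a file is not a text file. It cannot tell you which codeset a text file uses, which is why you still have to know the format in advance.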
These 2 Users Gave Thanks to Don Cragun For This Post:
TCS(1) General Commands Manual TCS(1)
NAME
tcs - translate character sets
SYNOPSIS
tcs [ -slcv ] [ -f ics ] [ -t ocs ] [ file ... ]
DESCRIPTION
Tcs interprets the named file(s) (standard input default) as a stream of characters from the ics character set or format, converts them to
runes, and then converts them into a stream of characters from the ocs character set or format on the standard output. The default value
for ics and ocs is utf, the UTF encoding described in utf(6). The -l option lists the character sets known to tcs. Processing continues
in the face of conversion errors (the -s option prevents reporting of these errors). The -c option forces the output to contain only
correctly converted characters; otherwise, 0x80 characters will be substituted for UTF encoding errors and 0xFFFD characters will be
substituted for unknown characters.
The -v option generates various diagnostic and summary information on standard error, or makes the -l output more verbose.
Tcs recognizes an ever changing list of character sets. In particular, it supports a variety of Russian and Japanese encodings. Some of
the supported encodings are
utf The Plan 9 UTF encoding, known by ISO as UTF-8
utf1 The deprecated original UTF encoding from ISO 10646
ascii 7-bit ASCII
8859-1 Latin-1 (Central European)
8859-2 Latin-2 (Czech .. Slovak)
8859-3 Latin-3 (Dutch .. Turkish)
8859-4 Latin-4 (Scandinavian)
8859-5 Part 5 (Cyrillic)
8859-6 Part 6 (Arabic)
8859-7 Part 7 (Greek)
8859-8 Part 8 (Hebrew)
8859-9 Latin-5 (Finnish .. Portuguese)
koi8 KOI-8 (GOST 19769-74)
jis-kanji ISO 2022-JP
ujis EUC-JX: JIS 0208
ms-kanji Microsoft, or Shift-JIS
jis (from only) guesses between ISO 2022-JP, EUC or Shift-Jis
gb Chinese national standard (GB2312-80)
big5 Big 5 (HKU version)
unicode Unicode Standard 1.0
tis Thai character set plus ASCII (TIS 620-1986)
msdos IBM PC: CP 437
atari Atari-ST character set
EXAMPLES
tcs -f 8859-1
Convert 8859-1 (Latin-1) characters into UTF format.
tcs -s -f jis
Convert characters encoded in one of several shift JIS encodings into UTF format. Unknown Kanji will be converted into 0xFFFD characters.
tcs -lv
Print an up to date list of the supported character sets.
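For readers without tcs installed, the decode-then-encode flow described above can be approximated with Python's built-in codecs. This is a rough sketch, not a reimplementation of tcs: the function name `transcode` is hypothetical, the codec names are Python's rather than the tcs names listed above, and Python's "replace" handler substitutes U+FFFD for undecodable input, which is not the same substitution rule tcs documents.

```python
def transcode(data: bytes, ics: str = "utf-8", ocs: str = "utf-8",
              strict: bool = False) -> bytes:
    """Decode data from the ics codec into Unicode text, then encode as ocs.

    With strict=False, bytes that cannot be decoded become U+FFFD and
    characters that cannot be encoded become '?' (Python's "replace"
    behavior). With strict=True, a UnicodeError is raised instead.
    """
    errors = "strict" if strict else "replace"
    text = data.decode(ics, errors=errors)   # bytes -> Unicode code points
    return text.encode(ocs, errors=errors)   # Unicode code points -> bytes


if __name__ == "__main__":
    # Roughly comparable to: tcs -f 8859-1
    print(transcode(b"K\xf6ln", ics="iso-8859-1"))  # b'K\xc3\xb6ln'

    # An invalid UTF-8 byte is replaced by U+FFFD (EF BF BD in UTF-8).
    print(transcode(b"abc\xff"))                    # b'abc\xef\xbf\xbd'
```

Passing `strict=True` makes the function raise on the first bad byte, which is useful when silently substituted characters would be worse than a failed conversion.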
SOURCE
/sys/src/cmd/tcs
SEE ALSO
ascii(1), rune(2), utf(6).
TCS(1)