What we are getting at: your choice of the C locale does not "work" with the file.
So, is the file from an external source like a vendor or is it corrupted?
If a line holds 100 bytes of characters in a given locale, the output of your sed will also be 100 bytes of data, not 101. So something is going on with the data in the file.
This uses cat, a UUOC (useless use of cat), to simplify the example. You know the fixed record length of your file; for this example, assume it is 100.
This should produce an answer of zero, meaning all records are the same, correct size. Try it to make sure the file is not corrupt and that we are not barking up the wrong tree.
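A minimal sketch of such a record-length check, not necessarily the exact command meant here. The function name count_bad_records is hypothetical, and forcing LC_ALL=C for awk is an assumption made so that length() counts bytes rather than multibyte characters.

```shell
# count_bad_records FILE LEN   (hypothetical helper name)
# Prints how many lines of FILE are not exactly LEN bytes long.
# A result of 0 means every record has the expected size.
# The cat is a deliberate UUOC, kept only to mirror the description.
count_bad_records() {
    cat "$1" | LC_ALL=C awk -v len="$2" '
        length($0) != len { bad++ }   # count records of the wrong size
        END { print bad + 0 }         # "+ 0" prints 0 when nothing was counted
    '
}

# Usage, assuming a fixed record length of 100:
#   count_bad_records yourfile 100
```

If the count is not zero, printing the offending line numbers (for example with NR in the awk program) is a reasonable next step.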
ascii
ASCII(1) Development Tools ASCII(1)
NAME
ascii - report character aliases
SYNOPSIS
ascii [-dxohv] [-t] [char-alias...]
OPTIONS
Called with no options, ascii behaves like `ascii -h'. Options are as follows:
-t
Script-friendly mode, emits only ISO/decimal/hex/octal/binary encodings of the character.
-s
Parse multiple characters. Convenient way of parsing strings.
-d
Ascii table in decimal.
-x
Ascii table in hex.
-o
Ascii table in octal.
-h, -?
Show summary of options and a simple ASCII table.
-v
Show version of program.
DESCRIPTION
Characters in the ASCII set can have many aliases, depending on context. A character's possible names include:
* Its bit pattern (binary representation).
* Its hex, decimal and octal representations.
* Its teletype mnemonic and caret-notation form (for control chars).
* Its backslash-escape form in C (for some control chars).
* Its printed form (for printables).
* Its full ISO official name in English.
* Its ISO/ECMA code table reference.
* Its name as an HTML/SGML entity.
* Slang and other names in wide use for it among hackers.
This utility accepts command-line strings and tries to interpret them as one of the above. When it finds a value, it prints all of the
names of the character. The constructs in the following list can be used to specify character values. If an argument could be interpreted
in two or more ways, names for all the different characters it might be are dumped.
character
Any character not described by one of the following conventions represents the character itself.
^character
A caret followed by a character.
\character
A backslash followed by certain special characters (abfnrtv).
mnemonic
An ASCII teletype mnemonic.
hexadecimal
A hexadecimal (hex) sequence consists of one or two case-insensitive hex digit characters (0123456789abcdef). To ensure hex
interpretation use hexh, 0xhex, xhex or \xhex.
decimal
A decimal sequence consists of one, two or three decimal digit characters (0123456789). To ensure decimal interpretation use
0ddecimal, ddecimal, or \ddecimal.
octal
An octal sequence consists of one, two or three octal digit characters (01234567). To ensure octal interpretation use \octal, 0ooctal,
ooctal, or \ooctal.
bit pattern
A bit pattern (binary) sequence consists of one to eight binary digit characters (01). To ensure bit interpretation use 0bbit pattern,
bbit pattern or \bbit pattern.
ISO/ECMA code
An ISO/ECMA code sequence consists of one or two decimal digit characters, a slash, and one or two decimal digit characters.
name
An official ASCII or slang name.
The slang names recognized and printed out are from a rather comprehensive list that first appeared on USENET in early 1990 and has been
continuously updated since. Mnemonics recognized and printed include the official ASCII set, some official ISO names (where those differ)
and a few common-use alternatives (such as NL for LF). HTML/SGML entity names are also printed when applicable. All comparisons are
case-insensitive, and dashes are mapped to spaces. Any unrecognized arguments or out of range values are silently ignored. Note that the -s
option will not recognize 'long' names, as it cannot differentiate them from other parts of the string.
For correct results, be careful to stringize or quote shell metacharacters in arguments (especially backslash).
This utility is particularly handy for interpreting cc(1)'s ugly octal `invalid-character' messages, or when coding anything to do with
serial communications. As a side effect it serves as a handy base-converter for random 8-bit values.
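For systems where ascii is not installed, the base-conversion side effect can be approximated with the shell's own printf. This is only a sketch: the function name to_bases is hypothetical, and it converts numeric values only (it knows nothing about mnemonics, slang names or entities).

```shell
# to_bases VALUE   (hypothetical helper, not part of ascii)
# VALUE may be decimal (65), hex with 0x prefix (0x41) or octal with
# a leading 0 (0101); printf parses it like a C integer constant.
to_bases() {
    printf 'dec=%d hex=%x oct=%o\n' "$1" "$1" "$1"
}

# Usage:
#   to_bases 0x41
```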
AUTHOR
Eric S. Raymond <esr@snark.thyrsus.com>; November 1990 (home page at http://www.catb.org/~esr/). Reproduce, use, and modify as you like as
long as you don't remove this authorship notice. Ioannis E. Tambouras <ioannis@debian.org> added command options and minor enhancements.
Brian J. Ginsbach <ginsbach@sgi.com> fixed several bugs and expanded the man page. David N. Welton <davidw@efn.org> added the -s option.
Matej Vela corrected the ISO names. Dave Capella contributed the idea of listing HTML/SGML entities.
ascii 03/26/2011 ASCII(1)