Hey all,
Just found your forum...Looks super rich with info! Can't wait to get through it all.
I am currently writing a web app in .NET that telnets into a Unix server (requires uid + passwd), runs a command, and returns the output to be displayed on the web page.
I have gotten through the... (8 Replies)
Hi,
Can I know how to grep for lines with non-ascii characters in a file?
If not grep, at least can we do it with command-line perl or awk? I tried the functionality of perl, but still could not get the result. Any help??
PS: I was sure that someone should have asked this question... (9 Replies)
Is there a way to determine the ASCII value of a character? For example, let's say a shell variable has the value 'A'. I would like its ASCII value (e.g. 65 in this case). I would like to do this from a script (preferably ksh). (12 Replies)
Can someone help me write a script / command to read in a file character by character, replace any unknown ASCII characters with a space, then write out the file to a new filename?
Thanks! (1 Reply)
Hi, I need to do a global search and replacement of a non-ascii character. Let me first give the background of my problem.
Very frequently, I need to copy a set of references from different sources. Typically, a reference would look like this:
Banumathy et al., 2002 G. Banumathy, V. Singh and U.... (1 Reply)
A very simple question but I have scoured the web and can't find an answer. How do I search for a character by ASCII code in a regular expression using grep?
For example, we use the End of Medium symbol as a delimiter in certain files. (this is ascii 031 in oct, displays as ^Y) I want to grep... (6 Replies)
I have a .dat file on a Windows server containing the following text
"Bürki"
Now when I use the FTP (get) command from the UNIX server, the text appears as "Bürki"
I want to preserve the text in the file on UNIX server as it is in source file.
Could you please suggest some... (2 Replies)
Hi,
In my file, for a few fields I have to print the next ASCII character for every character.
In the file below, I have to do this for the 2nd, 3rd and 5th fields.
Input File
========
1|abc|def|5|ghi
2|jkl|mno|6|pqr
Expected Output file
=======
1|bcd|efg|5|hij
2|klm|nop|6|qrs (2 Replies)
Hello
I have this special character after retrieving rows from SQL Server:
"....spasses: • Entrem al valort 6050108002811 • El donem..."
I would like a sed command to remove it, or just to know its ASCII code in order to replace it in my SQL statement. Hope someone knows how to do that.... (7 Replies)
Hi Guru,
I put up one post yesterday and got an answer. Thanks for your help.
My question today is: what are the ASCII codes for the following non-printable characters? (We need to filter these characters out in another process.)
^MM-^E^MM-^E.
Old post link: ... (5 Replies)
Discussion started by: ken002
LEARN ABOUT PLAN9
utf
UTF(6) Games Manual UTF(6)
NAME
UTF, Unicode, ASCII, rune - character set and format
DESCRIPTION
The Plan 9 character set and representation are based on the Unicode Standard and on the ISO multibyte UTF-8 encoding (Universal Character
Set Transformation Format, 8 bits wide). The Unicode Standard represents its characters in 16 bits; UTF-8 represents such values in an
8-bit byte stream. Throughout this manual, UTF-8 is shortened to UTF.
In Plan 9, a rune is a 16-bit quantity representing a Unicode character. Internally, programs may store characters as runes. However, any
external manifestation of textual information, in files or at the interface between programs, uses a machine-independent, byte-stream
encoding called UTF.
UTF is designed so that the 7-bit ASCII characters (values hexadecimal 00 to 7F) appear only as themselves in the encoding. Runes with values above
7F appear as sequences of two or more bytes with values only from 80 to FF.
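
This property can be checked with a short sketch. It uses Python's built-in UTF-8 codec as a stand-in for Plan 9 UTF, which is an assumption: the two should agree for the 16-bit values described here, but the codec is not the Plan 9 implementation. The sample characters are chosen only for illustration.

```python
# Assumption: Python's "utf-8" codec stands in for Plan 9 UTF on 16-bit values.

# ASCII characters encode to a single byte equal to their own value.
for ch in "A~\n":
    encoded = ch.encode("utf-8")
    assert encoded == bytes([ord(ch)])

# Runes above 7F encode to two or more bytes, all in the range 80..FF.
samples = ["\u00fc", "\u2022", "\uffff"]  # u-umlaut, bullet, highest 16-bit value
for ch in samples:
    encoded = ch.encode("utf-8")
    assert len(encoded) >= 2
    assert all(0x80 <= b <= 0xFF for b in encoded)

utf_u_umlaut = "\u00fc".encode("utf-8")
print(utf_u_umlaut.hex())  # c3bc
```

Because no byte below 80 ever appears inside a multibyte sequence, a program that scans for an ASCII delimiter byte by byte cannot hit a false match in the middle of a non-ASCII rune.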
The UTF encoding of the Unicode Standard is backward compatible with ASCII: programs presented only with ASCII work on Plan 9 even if not
written to deal with UTF, as do programs that deal with uninterpreted byte streams. However, programs that perform semantic processing on
ASCII graphic characters must convert from UTF to runes in order to work properly with non-ASCII input. See rune(2).
Letting numbers be binary, a rune x is converted to a multibyte UTF sequence as follows:
01. x in [00000000.0bbbbbbb] -> 0bbbbbbb
10. x in [00000bbb.bbbbbbbb] -> 110bbbbb, 10bbbbbb
11. x in [bbbbbbbb.bbbbbbbb] -> 1110bbbb, 10bbbbbb, 10bbbbbb
Conversion 01 provides a one-byte sequence that spans the ASCII character set in a compatible way. Conversions 10 and 11 represent
higher-valued characters as sequences of two or three bytes with the high bit set. Plan 9 does not support the 4, 5, and 6 byte sequences
proposed by X-Open. When there are multiple ways to encode a value, for example rune 0, the shortest encoding is used.
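
The three conversions can be sketched as a small encoder. This is an illustration in Python, not the Plan 9 library routine; the function name rune_to_utf is hypothetical, and the choice to raise an error for values wider than 16 bits is an assumption.

```python
def rune_to_utf(x):
    """Encode one 16-bit rune following conversions 01, 10 and 11."""
    if not 0 <= x <= 0xFFFF:
        # Assumption: values that do not fit in 16 bits are rejected outright.
        raise ValueError("rune must fit in 16 bits")
    if x <= 0x7F:
        # 01: 0bbbbbbb
        return bytes([x])
    if x <= 0x7FF:
        # 10: 110bbbbb, 10bbbbbb
        return bytes([0xC0 | (x >> 6), 0x80 | (x & 0x3F)])
    # 11: 1110bbbb, 10bbbbbb, 10bbbbbb
    return bytes([0xE0 | (x >> 12),
                  0x80 | ((x >> 6) & 0x3F),
                  0x80 | (x & 0x3F)])


print(rune_to_utf(0x41).hex())    # 41
print(rune_to_utf(0xFC).hex())    # c3bc
print(rune_to_utf(0x2022).hex())  # e280a2
```

Picking the branch by the size of x always yields the shortest encoding, so rune 0 comes out as the single byte 00.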
In the inverse mapping, any sequence except those described above is incorrect and is converted to rune hexadecimal 0080.
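
A minimal sketch of the inverse mapping in Python follows. The names utf_to_rune and RUNE_ERROR are hypothetical. Two behaviours are assumptions not stated above: non-shortest encodings are treated as incorrect, and an incorrect sequence consumes exactly one byte.

```python
RUNE_ERROR = 0x0080  # value given above for incorrect sequences


def utf_to_rune(data):
    """Decode one rune from the start of data; return (rune, bytes consumed)."""
    if not data:
        raise ValueError("empty input")
    b0 = data[0]
    if b0 < 0x80:
        # 0bbbbbbb
        return b0, 1
    if (b0 & 0xE0) == 0xC0 and len(data) >= 2 and (data[1] & 0xC0) == 0x80:
        # 110bbbbb, 10bbbbbb
        x = ((b0 & 0x1F) << 6) | (data[1] & 0x3F)
        if x >= 0x80:  # assumption: reject non-shortest forms
            return x, 2
    elif ((b0 & 0xF0) == 0xE0 and len(data) >= 3
          and (data[1] & 0xC0) == 0x80 and (data[2] & 0xC0) == 0x80):
        # 1110bbbb, 10bbbbbb, 10bbbbbb
        x = ((b0 & 0x0F) << 12) | ((data[1] & 0x3F) << 6) | (data[2] & 0x3F)
        if x >= 0x800:  # assumption: reject non-shortest forms
            return x, 3
    # Anything else is incorrect; consuming one byte is an assumption.
    return RUNE_ERROR, 1


print(utf_to_rune(b"\xc3\xbc"))          # (252, 2)
print(utf_to_rune(b"\xf0\x90\x80\x80"))  # (128, 1)
```

The second call shows a 4-byte lead byte, which Plan 9 does not support, being mapped to rune 0080.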
FILES
/lib/unicode
table of characters and descriptions, suitable for look(1).
SEE ALSO
ascii(1), tcs(1), rune(2), keyboard(6), The Unicode Standard.
UTF(6)