09-02-2011
Finding out the actual data that's making it throw up would be a good start. If the data's not actually UTF-8, setting the encoding to UTF-8 won't help.
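One way to track down the offending data is to try a strict UTF-8 decode and look at where it fails. This is only a rough sketch in Python, assuming you can get at the input as raw bytes; the helper name `find_invalid_utf8` and the sample bytes are made up for illustration.

```python
def find_invalid_utf8(data: bytes, context: int = 8):
    """Return (offset, nearby bytes) for the first invalid UTF-8 sequence, or None."""
    try:
        data.decode("utf-8")  # strict decoding raises on the first bad sequence
    except UnicodeDecodeError as err:
        start = max(err.start - context, 0)
        return err.start, data[start:err.end + context]
    return None


# Hypothetical sample: "caf" followed by a Latin-1 e-acute byte, which is not valid UTF-8.
sample = b"caf\xe9"
print(find_invalid_utf8(sample))
```

If this returns an offset, the bytes around it usually make it obvious which encoding the data was really written in.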
UTF(6) Games Manual UTF(6)
NAME
UTF, Unicode, ASCII, rune - character set and format
DESCRIPTION
The Plan 9 character set and representation are based on the Unicode Standard and on the ISO multibyte UTF-8 encoding (Universal Character
Set Transformation Format, 8 bits wide). The Unicode Standard represents its characters in 16 bits; UTF-8 represents such values in an
8-bit byte stream. Throughout this manual, UTF-8 is shortened to UTF.
In Plan 9, a rune is a 16-bit quantity representing a Unicode character. Internally, programs may store characters as runes. However, any
external manifestation of textual information, in files or at the interface between programs, uses a machine-independent, byte-stream
encoding called UTF.
UTF is designed so that the 7-bit ASCII set (values hexadecimal 00 to 7F) appears only as itself in the encoding. Runes with values above
7F appear as sequences of two or more bytes with values only from 80 to FF.
The UTF encoding of the Unicode Standard is backward compatible with ASCII: programs presented only with ASCII work on Plan 9 even if not
written to deal with UTF, as do programs that deal with uninterpreted byte streams. However, programs that perform semantic processing on
ASCII graphic characters must convert from UTF to runes in order to work properly with non-ASCII input. See rune(2).
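The compatibility property can be illustrated with a short Python sketch. It assumes Python's built-in UTF-8 codec as a stand-in for UTF, and Python strings (sequences of code points) as a stand-in for runes; the sample text is hypothetical.

```python
text = "na\u00efve:caf\u00e9"      # hypothetical sample with two non-ASCII characters
raw = text.encode("utf-8")

# Every byte below 0x80 in the stream is an ASCII character standing for itself,
# so splitting on an ASCII delimiter cannot cut a multibyte sequence in half.
fields = raw.split(b":")
decoded_fields = [f.decode("utf-8") for f in fields]

# Byte counts and character counts differ once non-ASCII input appears,
# which is why semantic processing has to work on decoded characters.
byte_len = len(raw)
char_len = len(text)
print(decoded_fields, byte_len, char_len)
```

Byte-oriented splitting works without knowing about UTF, while anything that counts or classifies characters needs the decoded form.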
Letting numbers be binary, a rune x is converted to a multibyte UTF sequence as follows:
01. x in [00000000.0bbbbbbb] -> 0bbbbbbb
10. x in [00000bbb.bbbbbbbb] -> 110bbbbb, 10bbbbbb
11. x in [bbbbbbbb.bbbbbbbb] -> 1110bbbb, 10bbbbbb, 10bbbbbb
Conversion 01 provides a one-byte sequence that spans the ASCII character set in a compatible way. Conversions 10 and 11 represent
higher-valued characters as sequences of two or three bytes with the high bit set. Plan 9 does not support the 4, 5, and 6 byte sequences
proposed by X-Open. When there are multiple ways to encode a value, for example rune 0, the shortest encoding is used.
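The three conversions can be sketched in Python as follows. This is an illustration of the bit layout described above, not Plan 9's own code; the function name `rune_to_utf` is hypothetical, and a rune is assumed to be an integer that fits in 16 bits.

```python
def rune_to_utf(x: int) -> bytes:
    """Encode a 16-bit rune using the three conversions listed above."""
    if not 0 <= x <= 0xFFFF:
        raise ValueError("rune must fit in 16 bits")
    if x <= 0x7F:
        # Conversion 01: 0bbbbbbb
        return bytes([x])
    if x <= 0x7FF:
        # Conversion 10: 110bbbbb, 10bbbbbb
        return bytes([0xC0 | (x >> 6), 0x80 | (x & 0x3F)])
    # Conversion 11: 1110bbbb, 10bbbbbb, 10bbbbbb
    return bytes([0xE0 | (x >> 12), 0x80 | ((x >> 6) & 0x3F), 0x80 | (x & 0x3F)])


print(rune_to_utf(0x41).hex(), rune_to_utf(0xE9).hex(), rune_to_utf(0x20AC).hex())
```

Choosing the branch by the smallest range that holds the value is what gives the shortest encoding.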
In the inverse mapping, any sequence except those described above is incorrect and is converted to rune hexadecimal 0080.
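A minimal Python sketch of the inverse mapping follows. Two points are assumptions rather than statements from this page: overlong forms are treated as incorrect (a reading of the shortest-encoding rule), and one byte is skipped after each incorrect sequence. The names `utf_to_runes`, `BAD_RUNE`, and `is_cont` are hypothetical.

```python
BAD_RUNE = 0x0080  # value the page gives for incorrect sequences


def is_cont(b: int) -> bool:
    """True for a continuation byte of the form 10bbbbbb."""
    return b & 0xC0 == 0x80


def utf_to_runes(data: bytes) -> list:
    """Decode UTF bytes into 16-bit runes; incorrect sequences become BAD_RUNE."""
    runes = []
    i, n = 0, len(data)
    while i < n:
        b0 = data[i]
        rune, size = BAD_RUNE, 1  # assumption: consume one byte on error
        if b0 < 0x80:
            rune = b0  # 0bbbbbbb
        elif b0 & 0xE0 == 0xC0 and i + 1 < n and is_cont(data[i + 1]):
            x = ((b0 & 0x1F) << 6) | (data[i + 1] & 0x3F)
            if x >= 0x80:  # assumption: reject overlong two-byte forms
                rune, size = x, 2
        elif (b0 & 0xF0 == 0xE0 and i + 2 < n
              and is_cont(data[i + 1]) and is_cont(data[i + 2])):
            x = ((b0 & 0x0F) << 12) | ((data[i + 1] & 0x3F) << 6) | (data[i + 2] & 0x3F)
            if x >= 0x800:  # assumption: reject overlong three-byte forms
                rune, size = x, 3
        runes.append(rune)
        i += size
    return runes


print([hex(r) for r in utf_to_runes(b"A\xc3\xa9\xe2\x82\xac\xff")])
```

Because rune 0080 also has a valid two-byte encoding, a caller cannot tell a decoding error from a genuine 0080 by looking at the output alone.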
FILES
/lib/unicode
table of characters and descriptions, suitable for look(1).
SEE ALSO
ascii(1), tcs(1), rune(2), keyboard(6), The Unicode Standard.