python - string encoding error (Shell Programming and Scripting)
Post 302552354 by Corona688, Friday 2nd of September 2011, 02:25:17 PM
Finding out the actual data that's making it throw up would be a good start. If the data's not actually UTF-8, setting the encoding to UTF-8 won't help.
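A minimal Python 3 sketch of that first step. It assumes the raw input can be read as bytes, for example with `open(path, "rb").read()`, where `path` is a placeholder. The sketch decodes the bytes as UTF-8 and, on failure, uses the `start` and `end` attributes of `UnicodeDecodeError` to report where the offending bytes are and what surrounds them. `find_bad_utf8` is a hypothetical helper name.

```python
def find_bad_utf8(data: bytes, context: int = 10):
    """Locate the first byte sequence in `data` that is not valid UTF-8.

    Returns None if everything decodes, otherwise a tuple of
    (offset, offending_bytes, surrounding_bytes).
    """
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start and err.end delimit the bytes the codec rejected
        lo = max(0, err.start - context)
        hi = err.end + context
        return err.start, data[err.start:err.end], data[lo:hi]
    return None


sample = b"caf\xe9 au lait"  # 'café' written in Latin-1, not UTF-8
print(find_bad_utf8(sample))
```

A lone byte such as `0xE9` in otherwise ASCII text usually points to a single-byte encoding like Latin-1 or cp1252. In that case, decoding with that encoding is the fix, not forcing UTF-8. This is a guess to confirm against the real data. The bytes alone do not prove it.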
 

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

python and string.find

Hi all, I'm not sure if this is the right forum, but I'll give it a try. Here is my problem: I have two files containing basically the same things (hostnames). File 1: mituap01 mituap02 mituap03. File 2: mituap01 mituap04 mituap05. My goal is to get a .py out to check if pcs' in...
Discussion started by: penguin-friend
0 Replies
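The post is cut off, so the exact goal is unknown. If the goal is to find which hostnames are in one file but not the other, Python sets do most of the work. `read_hosts` and `compare_hosts` are hypothetical helper names, and the file names in the comment are placeholders.

```python
from pathlib import Path


def read_hosts(path):
    """Return the set of whitespace-separated hostnames in a file."""
    return set(Path(path).read_text().split())


def compare_hosts(a, b):
    """Return (only_in_a, only_in_b, in_both) as sorted lists."""
    return sorted(a - b), sorted(b - a), sorted(a & b)


# With real files: compare_hosts(read_hosts("file1.txt"), read_hosts("file2.txt"))
file1 = {"mituap01", "mituap02", "mituap03"}
file2 = {"mituap01", "mituap04", "mituap05"}
only1, only2, both = compare_hosts(file1, file2)
print("only in file 1:", only1)
print("only in file 2:", only2)
print("in both:", both)
```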

2. Shell Programming and Scripting

Python - Scan for string

Hi, I have a variable 'reform' that stores lines like: reform= { record string(8) ID; string(4) PRD; date("YYMMDD", split = "800101") DateofManufact; string(4) PRDC_MODULE_NUM; string(1) END_OF_RECORD = "\n"; } I need to search for the character "\n" in the above variable...
Discussion started by: dhanamurthy
1 Reply
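Whether the search succeeds depends on what `reform` actually holds. It may contain a real newline character, or it may contain the two characters backslash and `n`. A record definition like `END_OF_RECORD = "\n"` read from a file usually contains the second. The short sketch below uses a shortened, hypothetical stand-in for the poster's string and shows both searches with `str.find`, which returns -1 when there is no match.

```python
# Shortened, hypothetical stand-in for the poster's 'reform' text.
# Inside a normal Python literal, "\\n" is a backslash followed by 'n';
# "\n" would be a single newline character.
reform = 'string(1) END_OF_RECORD = "\\n"; }'

literal_pos = reform.find("\\n")  # backslash + 'n' (same as r"\n")
newline_pos = reform.find("\n")   # an actual newline character

print("backslash-n at index", literal_pos)
print("newline at index", newline_pos)
print("contains backslash-n:", "\\n" in reform)
```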

3. Shell Programming and Scripting

Python String <--> Number

My question is simple: A = raw_input("A ") if A == '56': VAR = (A + 54)/13 else: print "other operations". If I enter 5656565656, I want to do some arithmetic operations when the input is of the form 56XXX, but the output is TypeError: cannot concatenate 'str' and...
Discussion started by: kazikamuntu
2 Replies
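The `TypeError` comes from adding a `str`, which is what `raw_input` returns, to an `int`. Below is a minimal Python 3 sketch of one fix. It assumes "56XXX" means "the input starts with 56". The sketch tests the text with `str.startswith`, then converts it with `int()` before doing arithmetic. `handle` is a hypothetical name. The sketch uses `//` because that matches the integer division `/` performed on ints in the poster's Python 2.

```python
def handle(a: str):
    """Hypothetical Python 3 rework of the poster's snippet."""
    if a.startswith("56"):       # compare text with text
        number = int(a)          # convert explicitly; raises ValueError on non-digits
        return (number + 54) // 13
    return None                  # stands in for "other operations"


# In the real script the value would come from input("A "), which in
# Python 3 returns a str just like Python 2's raw_input().
A = "5656565656"
result = handle(A)
print(result if result is not None else "other operations")
```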

4. Shell Programming and Scripting

How to find the file encoding and updating the file encoding?

Hi, I am a beginner to Unix. My requirement is to validate the encoding used in an incoming file (csv, txt). If it is encoded in UTF-8, the file should remain as is; otherwise I need to change the encoding to UTF-8. Please advise me how to proceed.
Discussion started by: cnraja
7 Replies
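One common approach is sketched here in Python rather than with Unix tools. The sketch tries to decode the file as UTF-8. Only if that fails does it re-encode the file from an assumed legacy encoding. The `fallback="cp1252"` default is an assumption. Bytes alone cannot prove which non-UTF-8 encoding a file uses, so set it to whatever the senders actually produce. `to_utf8` and `ensure_utf8` are hypothetical helpers.

```python
from pathlib import Path


def to_utf8(raw: bytes, fallback: str = "cp1252"):
    """Return None if `raw` is already valid UTF-8 (plain ASCII included),
    otherwise the bytes re-encoded to UTF-8 from the assumed `fallback`."""
    try:
        raw.decode("utf-8")
        return None
    except UnicodeDecodeError:
        # May itself raise if the fallback guess is wrong for these bytes.
        return raw.decode(fallback).encode("utf-8")


def ensure_utf8(path, fallback: str = "cp1252") -> bool:
    """Rewrite the file as UTF-8 if needed; return True if it was changed."""
    p = Path(path)
    converted = to_utf8(p.read_bytes(), fallback)
    if converted is None:
        return False
    p.write_bytes(converted)  # write bytes so line endings are left untouched
    return True
```

For example, `ensure_utf8("incoming.csv")`, where the file name is a placeholder, leaves a valid UTF-8 file untouched and converts a cp1252 file in place. Any file that happens to decode as UTF-8 is accepted. That is usually what is wanted, but it is still a heuristic.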

5. Shell Programming and Scripting

[python]string to list conversion

I have a file command.txt. Its contents are as follows:- The content of the file is actually a command with a script name and its arguments. arg1 and arg2 are dummy arguments, format: -arg arg_value. test is an argument specifying the run mode, format: -arg. In my python code, I read it and...
Discussion started by: animesharma
1 Reply
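If the goal is to turn the command line stored in `command.txt` into a list of arguments, `shlex.split` from the standard library can do it. It splits the line the way a POSIX shell would and keeps quoted values together. The file's real contents are not shown in the post, so the line below is a hypothetical example.

```python
import shlex

# Hypothetical stand-in for the line in command.txt; with the real file:
#     with open("command.txt") as f:
#         argv = shlex.split(f.read())
line = 'run_script.py -arg1 value1 -arg2 "value with spaces" -test'

argv = shlex.split(line)
print(argv)
```

The resulting list can be passed straight to `subprocess.run`. This avoids handing the raw string to a shell.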

6. Shell Programming and Scripting

How to check string encoding?

I want to check whether a string is WINDOWS-1251 or UTF-8. Can you help me find the string encoding? Or maybe get the URL Content-Type charset with wget? This is my function in PHP: function check_utf8($str) { $len = strlen($str); for($i = 0; $i < $len; $i++){ $c =...
Discussion started by: sanantonio7777
2 Replies
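The second idea is to read the charset from the server's `Content-Type` header. Python's `email.message.Message.get_content_charset()` already parses the `charset` parameter and lowercases it. This is a Python alternative to wget, not the poster's PHP approach. `charset_from_content_type` is a hypothetical wrapper. A header can be missing or wrong, so a byte-level UTF-8 check is still a useful fallback.

```python
from email.message import Message


def charset_from_content_type(header_value: str, default: str = "utf-8") -> str:
    """Return the charset named in a Content-Type header value, or `default`."""
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_charset(default)


print(charset_from_content_type("text/html; charset=Windows-1251"))
print(charset_from_content_type("text/html"))
```

When the page is fetched with `urllib.request.urlopen(url)`, the response's `headers` object is also a `Message` subclass. So `resp.headers.get_content_charset()` gives the same answer without the wrapper.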

7. Shell Programming and Scripting

Remove lines between the start string and end string including start and end string Python

Hi, I am trying to remove lines from the point where one string is found until another string is found, including the start string and the end string. I basically want to grab all the lines starting with color (closing bracket). PS: The line after the closing bracket for color could be anything (currently 'more')....
Discussion started by: Dabheeruz
1 Reply
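The excerpt is ambiguous about the exact markers, so here is a generic sketch. It drops every block of lines that runs from a line containing a start string through the next line containing an end string, with both marker lines removed. `drop_blocks` and the `"color"`/`"}"` markers are hypothetical. The simple flag logic assumes the two markers are different strings.

```python
def drop_blocks(lines, start, end):
    """Yield the lines that are NOT inside a start...end block (inclusive)."""
    skipping = False
    for line in lines:
        if not skipping and start in line:
            skipping = True          # the start line itself is dropped
        if not skipping:
            yield line
        elif end in line:
            skipping = False         # the end line is dropped too


sample = ["keep1", "color {", "  red", "}", "more", "keep2"]
print(list(drop_blocks(sample, "color", "}")))
```

Iterating over an open file works the same way, because `drop_blocks` only needs an iterable of lines. Each yielded line then keeps its trailing newline.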

8. Shell Programming and Scripting

Python replace string

Hi, I have a python variable with a value like this: string = "abc.de.fghijk.com:zyz.ab.fgfijk.com:abc.ef.fghijk.com" They are hostnames separated by the special character ":". From this string I want to make a list with values:
Discussion started by: ctrld
2 Replies
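The expected values are cut off in the excerpt. If the wanted list simply has one hostname per element, `str.split` produces it directly. The variable is renamed `hosts` here because `string` is also the name of a standard-library module.

```python
hosts = "abc.de.fghijk.com:zyz.ab.fgfijk.com:abc.ef.fghijk.com"

host_list = hosts.split(":")
# Filtering empty items copes with a leading/trailing or doubled ':'.
clean_list = [h for h in hosts.split(":") if h]

print(host_list)
```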

9. Solaris

View file encoding then change encoding.

Hi all! I'm using the command file -i myfile.xml to validate the XML file encoding, but it just says regular file. I'm expecting output such as UTF8 or ANSI/ASCII. Is there a command to display a file's encoding? Thank you!
Discussion started by: mrreds
2 Replies

10. Shell Programming and Scripting

Url encoding a string using sed

Hi, I was hoping someone would know whether it is possible to URL-encode a string using sed. My problem is that I have extracted some key-value pairs from a text file with sed, and I will be inserting these pairs as source variables into a curl script to automatically download some XML from our server. My...
Discussion started by: Paul Walker
5 Replies
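Percent-encoding in pure sed is awkward because each byte has to be mapped to its `%XX` form. The sketch below swaps in a different technique: Python's `urllib.parse`, which a shell script can call before running curl. The key/value pairs are hypothetical stand-ins for the ones extracted with sed.

```python
from urllib.parse import quote, urlencode

# Hypothetical key/value pairs standing in for the ones extracted with sed.
pairs = {"q": "hello world", "filter": "a&b=c"}

# Query-string style: space -> '+', reserved characters -> %XX
query = urlencode(pairs)

# Path style: space -> '%20'; '/' is left alone by default (safe="/")
path_part = quote("reports/2011 Q3")

print(query)
print(path_part)
```

From a shell script, a one-liner can encode a single value: `python3 -c 'import sys, urllib.parse; print(urllib.parse.quote(sys.argv[1], safe=""))' "$value"`. curl can also do the encoding itself with `--data-urlencode`. Combined with `-G`, it puts the encoded result in the query string.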
UTF(6)    Games Manual    UTF(6)

NAME
UTF, Unicode, ASCII, rune - character set and format
DESCRIPTION
The Plan 9 character set and representation are based on the Unicode Standard and on the ISO multibyte UTF-8 encoding (Universal Character Set Transformation Format, 8 bits wide). The Unicode Standard represents its characters in 16 bits; UTF-8 represents such values in an 8-bit byte stream. Throughout this manual, UTF-8 is shortened to UTF.

In Plan 9, a rune is a 16-bit quantity representing a Unicode character. Internally, programs may store characters as runes. However, any external manifestation of textual information, in files or at the interface between programs, uses a machine-independent, byte-stream encoding called UTF.

UTF is designed so that the characters of the 7-bit ASCII set (values hexadecimal 00 to 7F) appear only as themselves in the encoding. Runes with values above 7F appear as sequences of two or more bytes with values only from 80 to FF.

The UTF encoding of the Unicode Standard is backward compatible with ASCII: programs presented only with ASCII work on Plan 9 even if not written to deal with UTF, as do programs that deal with uninterpreted byte streams. However, programs that perform semantic processing on ASCII graphic characters must convert from UTF to runes in order to work properly with non-ASCII input. See rune(2).

Letting numbers be binary, a rune x is converted to a multibyte UTF sequence as follows:

01. x in [00000000.0bbbbbbb] -> 0bbbbbbb
10. x in [00000bbb.bbbbbbbb] -> 110bbbbb, 10bbbbbb
11. x in [bbbbbbbb.bbbbbbbb] -> 1110bbbb, 10bbbbbb, 10bbbbbb

Conversion 01 provides a one-byte sequence that spans the ASCII character set in a compatible way. Conversions 10 and 11 represent higher-valued characters as sequences of two or three bytes with the high bit set. Plan 9 does not support the 4-, 5-, and 6-byte sequences proposed by X-Open. When there are multiple ways to encode a value, for example rune 0, the shortest encoding is used.

In the inverse mapping, any sequence except those described above is incorrect and is converted to rune hexadecimal 0080.
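The three conversions can be checked mechanically. The minimal Python sketch below assumes 16-bit runes as described above. `rune_to_utf` is a hypothetical name; it applies conversions 01, 10 and 11 with shifts and masks, and the check compares its output with Python's own UTF-8 codec. The check skips the surrogate range D800 to DFFF, which modern UTF-8 encoders refuse. The sketch also does not model the Plan 9 rule that maps bad input sequences to rune 0080.

```python
def rune_to_utf(x: int) -> bytes:
    """Encode a 16-bit rune using conversions 01, 10 and 11."""
    if not 0 <= x <= 0xFFFF:
        raise ValueError("runes are 16-bit values")
    if x <= 0x7F:                              # 01: 0bbbbbbb
        return bytes([x])
    if x <= 0x7FF:                             # 10: 110bbbbb 10bbbbbb
        return bytes([0xC0 | (x >> 6),
                      0x80 | (x & 0x3F)])
    return bytes([0xE0 | (x >> 12),            # 11: 1110bbbb 10bbbbbb 10bbbbbb
                  0x80 | ((x >> 6) & 0x3F),
                  0x80 | (x & 0x3F)])


# Compare against Python's codec for every non-surrogate 16-bit value.
agrees = all(rune_to_utf(x) == chr(x).encode("utf-8")
             for x in range(0x10000) if not 0xD800 <= x <= 0xDFFF)
print("matches Python's UTF-8 codec:", agrees)
```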
FILES
/lib/unicode  table of characters and descriptions, suitable for look(1).
SEE ALSO
ascii(1), tcs(1), rune(2), keyboard(6), The Unicode Standard.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.