python - string encoding error


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting python - string encoding error
# 1  
Old 09-02-2011
python - string encoding error

I'm trying to pull a google calendar (successful) and then put the contents into a mysql db (almost successful). On one of the field I keep getting an encode error:
Code:
#!/usr/bin/python   

from xml.etree import ElementTree 
import gdata.calendar.data 
import gdata.calendar.client 
import gdata.acl.data 
import atom.data 
import time 
import MySQLdb

calendar_client = gdata.calendar.client.CalendarClient() username = 'user@gmail.com' visibility = 'public' projection = 'full' 
feed_uri = calendar_client.GetCalendarEventFeedUri(calendar=username, visibility=visibility, projection=projection) 

# define mysql db connection/credentials 
conn = MySQLdb.connect (host = "localhost", user = "test1", passwd = "test1", db = "googlecal") 
cursor = conn.cursor ()  
feed = calendar_client.GetCalendarEventFeed(uri=feed_uri) 

print 'Events on Primary Calendar: %s' % (feed.title.text,)
for i, an_event in enumerate(feed.entry):
     print '\t%s. %s' % (i, an_event.title.text,)
     data_point = {}
     data_point[ 'title' ] = (i, an_event.title.text,)
     print '\t%s. %s' % data_point[ 'title' ]
     for a_when in an_event.when:
         print '\t\t%s. %s' % (i, an_event.content.text,)
         data_point[ 'content' ] = (i, an_event.content.text,)
         print '\t\tStart time: %s' % (a_when.start,)
         data_point[ 'start' ] = (a_when.start,)
         print '\t\tEnd time:   %s' % (a_when.end,)
         data_point[ 'end' ] = (a_when.end,)

         cursor.execute("""insert into events (id, title, content, start, end)
         values (NULL, %s, %s, %s, %s)""",
         (data_point[ 'title' ],
           data_point[ 'content' ],
          data_point[ 'start' ],
          data_point[ 'end' ]))

conn.commit()
cursor.close () 
conn.close ()  

Traceback (most recent call last):
   File "/home/unclecameron/Documents/workspace/google_pull_calendar/src/get_gmail.cal.py", line 45, in <module>
     data_point[ 'end' ]))   
File "/usr/lib/pymodules/python2.7/MySQLdb/cursors.py", line 151, in execute
     query = query % db.literal(args)   
File "/usr/lib/pymodules/python2.7/MySQLdb/connections.py", line 247, in literal
     return self.escape(o, self.encoders)   
File "/usr/lib/pymodules/python2.7/MySQLdb/connections.py", line 185, in unicode_literal return db.literal(u.encode(unicode_literal.charset))
    UnicodeEncodeError: 'latin-1' codec can't encode character u'\u201c' in position 404: ordinal not in range(256)

the db is utf-8, I've also tried inserting it after encoding like this:
Code:
data_point[ 'content' ].encode('utf-8', 'ignore')

and
Code:
data_point['content'] = i, smart_str(an_event.content.text)

the rest of the fields work fine, what am I doing wrong?

Last edited by unclecameron; 09-02-2011 at 02:59 PM..
# 2  
Old 09-02-2011
Finding out the actual data that's making it throw up would be a good start. If the data's not actually UTF-8, setting the encoding to UTF-8 won't help.
# 3  
Old 09-02-2011
okay, here's the example that makes it data_point [ 'content' ] break:
Code:
0. A group of Rogue River area actors has formed a new
 performance troupe, calling themselves Exit 48. Their first
 production will be an original murder mystery, written by 
Ronda Bagley, a member of the group, to be staged as a dinner
 theater event at The Station Restaurant in Rogue River on
 Friday, September 9th. Dinner in the outdoor setting will begin
 at 6 pm. The interactive show, a western entitled “Welcome
 Wagon”, will begin shortly after that. 

Tickets are $30 per person.  For reservations, please call The
 Station at 541-582-9725. The restaurant is located at 595
 Main Street in Rogue River. 

Guests are invited to participate as much as they like, even
 taking on the roles of some of the minor characters in the
 play. It’s a real whodunit, with lots of clues and plenty of
 suspects. And of course, there’s that wonderful dinner fare for
 which The Station is so well known.

The actors, most of whom live in the Rogue River area, will be
 familiar to patrons of Barnstormers Theater in Grants Pass as
 they have all been involved in a variety of plays for many
 years at that icon of local community theater. 

For information and details, contact:
Eric Epstein, Manager
541-226-6703

(I put some linebreaks in there, the code tags didn't word wrap)
# 4  
Old 09-02-2011
Just copy-pasting it as text doesn't tell me what original encoding it had unfortunately. It also won't show bogus nonprinting characters. Could you save it to a file, so you can hexdump -C < filename and post that?
This User Gave Thanks to Corona688 For This Post:
# 5  
Old 09-02-2011
Code:
00000000  41 20 67 72 6f 75 70 20  6f 66 20 52 6f 67 75 65  |A group of Rogue|
00000010  20 52 69 76 65 72 20 61  72 65 61 20 61 63 74 6f  | River area acto|
00000020  72 73 20 68 61 73 20 66  6f 72 6d 65 64 20 61 20  |rs has formed a |
00000030  6e 65 77 20 70 65 72 66  6f 72 6d 61 6e 63 65 20  |new performance |
00000040  74 72 6f 75 70 65 2c 20  63 61 6c 6c 69 6e 67 20  |troupe, calling |
00000050  74 68 65 6d 73 65 6c 76  65 73 20 45 78 69 74 20  |themselves Exit |
00000060  34 38 2e 20 54 68 65 69  72 20 66 69 72 73 74 20  |48. Their first |
00000070  70 72 6f 64 75 63 74 69  6f 6e 20 77 69 6c 6c 20  |production will |
00000080  62 65 20 61 6e 20 6f 72  69 67 69 6e 61 6c 20 6d  |be an original m|
00000090  75 72 64 65 72 20 6d 79  73 74 65 72 79 2c 20 77  |urder mystery, w|
000000a0  72 69 74 74 65 6e 20 62  79 20 52 6f 6e 64 61 20  |ritten by Ronda |
000000b0  42 61 67 6c 65 79 2c 20  61 20 6d 65 6d 62 65 72  |Bagley, a member|
000000c0  20 6f 66 20 74 68 65 20  67 72 6f 75 70 2c 20 74  | of the group, t|
000000d0  6f 20 62 65 20 73 74 61  67 65 64 20 61 73 20 61  |o be staged as a|
000000e0  20 64 69 6e 6e 65 72 20  74 68 65 61 74 65 72 20  | dinner theater |
000000f0  65 76 65 6e 74 20 61 74  20 54 68 65 20 53 74 61  |event at The Sta|
00000100  74 69 6f 6e 20 52 65 73  74 61 75 72 61 6e 74 20  |tion Restaurant |
00000110  69 6e 20 52 6f 67 75 65  20 52 69 76 65 72 20 6f  |in Rogue River o|
00000120  6e 20 46 72 69 64 61 79  2c 20 53 65 70 74 65 6d  |n Friday, Septem|
00000130  62 65 72 20 39 74 68 2e  20 44 69 6e 6e 65 72 20  |ber 9th. Dinner |
00000140  69 6e 20 74 68 65 20 6f  75 74 64 6f 6f 72 20 73  |in the outdoor s|
00000150  65 74 74 69 6e 67 20 77  69 6c 6c 20 62 65 67 69  |etting will begi|
00000160  6e 20 61 74 20 36 20 70  6d 2e 20 54 68 65 20 69  |n at 6 pm. The i|
00000170  6e 74 65 72 61 63 74 69  76 65 20 73 68 6f 77 2c  |nteractive show,|
00000180  20 61 20 77 65 73 74 65  72 6e 20 65 6e 74 69 74  | a western entit|
00000190  6c 65 64 20 e2 80 9c 57  65 6c 63 6f 6d 65 20 57  |led ...Welcome W|
000001a0  61 67 6f 6e e2 80 9d 2c  20 77 69 6c 6c 20 62 65  |agon..., will be|
000001b0  67 69 6e 20 73 68 6f 72  74 6c 79 20 61 66 74 65  |gin shortly afte|
000001c0  72 20 74 68 61 74 2e 20  0a 0a 54 69 63 6b 65 74  |r that. ..Ticket|
000001d0  73 20 61 72 65 20 24 33  30 20 70 65 72 20 70 65  |s are $30 per pe|
000001e0  72 73 6f 6e 2e 20 20 46  6f 72 20 72 65 73 65 72  |rson.  For reser|
000001f0  76 61 74 69 6f 6e 73 2c  20 70 6c 65 61 73 65 20  |vations, please |
00000200  63 61 6c 6c 20 54 68 65  20 53 74 61 74 69 6f 6e  |call The Station|
00000210  20 61 74 20 35 34 31 2d  35 38 32 2d 39 37 32 35  | at 541-582-9725|
00000220  2e 20 54 68 65 20 72 65  73 74 61 75 72 61 6e 74  |. The restaurant|
00000230  20 69 73 20 6c 6f 63 61  74 65 64 20 61 74 20 35  | is located at 5|
00000240  39 35 20 4d 61 69 6e 20  53 74 72 65 65 74 20 69  |95 Main Street i|
00000250  6e 20 52 6f 67 75 65 20  52 69 76 65 72 2e 20 0a  |n Rogue River. .|
00000260  0a 47 75 65 73 74 73 20  61 72 65 20 69 6e 76 69  |.Guests are invi|
00000270  74 65 64 20 74 6f 20 70  61 72 74 69 63 69 70 61  |ted to participa|
00000280  74 65 20 61 73 20 6d 75  63 68 20 61 73 20 74 68  |te as much as th|
00000290  65 79 20 6c 69 6b 65 2c  20 65 76 65 6e 20 74 61  |ey like, even ta|
000002a0  6b 69 6e 67 20 6f 6e 20  74 68 65 20 72 6f 6c 65  |king on the role|
000002b0  73 20 6f 66 20 73 6f 6d  65 20 6f 66 20 74 68 65  |s of some of the|
000002c0  20 6d 69 6e 6f 72 20 63  68 61 72 61 63 74 65 72  | minor character|
000002d0  73 20 69 6e 20 74 68 65  20 70 6c 61 79 2e 20 49  |s in the play. I|
000002e0  74 e2 80 99 73 20 61 20  72 65 61 6c 20 77 68 6f  |t...s a real who|
000002f0  64 75 6e 69 74 2c 20 77  69 74 68 20 6c 6f 74 73  |dunit, with lots|
00000300  20 6f 66 20 63 6c 75 65  73 20 61 6e 64 20 70 6c  | of clues and pl|
00000310  65 6e 74 79 20 6f 66 20  73 75 73 70 65 63 74 73  |enty of suspects|
00000320  2e 20 41 6e 64 20 6f 66  20 63 6f 75 72 73 65 2c  |. And of course,|
00000330  20 74 68 65 72 65 e2 80  99 73 20 74 68 61 74 20  | there...s that |
00000340  77 6f 6e 64 65 72 66 75  6c 20 64 69 6e 6e 65 72  |wonderful dinner|
00000350  20 66 61 72 65 20 66 6f  72 20 77 68 69 63 68 20  | fare for which |
00000360  54 68 65 20 53 74 61 74  69 6f 6e 20 69 73 20 73  |The Station is s|
00000370  6f 20 77 65 6c 6c 20 6b  6e 6f 77 6e 2e 0a 0a 54  |o well known...T|
00000380  68 65 20 61 63 74 6f 72  73 2c 20 6d 6f 73 74 20  |he actors, most |
00000390  6f 66 20 77 68 6f 6d 20  6c 69 76 65 20 69 6e 20  |of whom live in |
000003a0  74 68 65 20 52 6f 67 75  65 20 52 69 76 65 72 20  |the Rogue River |
000003b0  61 72 65 61 2c 20 77 69  6c 6c 20 62 65 20 66 61  |area, will be fa|
000003c0  6d 69 6c 69 61 72 20 74  6f 20 70 61 74 72 6f 6e  |miliar to patron|
000003d0  73 20 6f 66 20 42 61 72  6e 73 74 6f 72 6d 65 72  |s of Barnstormer|
000003e0  73 20 54 68 65 61 74 65  72 20 69 6e 20 47 72 61  |s Theater in Gra|
000003f0  6e 74 73 20 50 61 73 73  20 61 73 20 74 68 65 79  |nts Pass as they|
00000400  20 68 61 76 65 20 61 6c  6c 20 62 65 65 6e 20 69  | have all been i|
00000410  6e 76 6f 6c 76 65 64 20  69 6e 20 61 20 76 61 72  |nvolved in a var|
00000420  69 65 74 79 20 6f 66 20  70 6c 61 79 73 20 66 6f  |iety of plays fo|
00000430  72 20 6d 61 6e 79 20 79  65 61 72 73 20 61 74 20  |r many years at |
00000440  74 68 61 74 20 69 63 6f  6e 20 6f 66 20 6c 6f 63  |that icon of loc|
00000450  61 6c 20 63 6f 6d 6d 75  6e 69 74 79 20 74 68 65  |al community the|
00000460  61 74 65 72 2e 20 0a 0a  0a 46 6f 72 20 69 6e 66  |ater. ...For inf|
00000470  6f 72 6d 61 74 69 6f 6e  20 61 6e 64 20 64 65 74  |ormation and det|
00000480  61 69 6c 73 2c 20 63 6f  6e 74 61 63 74 3a 0a 45  |ails, contact:.E|
00000490  72 69 63 20 45 70 73 74  65 69 6e 2c 20 4d 61 6e  |ric Epstein, Man|
000004a0  61 67 65 72 0a 35 34 31  2d 32 32 36 2d 36 37 30  |ager.541-226-670|
000004b0  33 0a                                             |3.|
000004b2

# 6  
Old 09-02-2011
It's choking on the utf-8 encoded (e2 80 9c) left double quotation mark which does not exist (same goes for the right double quotation mark) in the latin-1 encoding.

Regards,
Alister
# 7  
Old 09-02-2011
If that's the raw unconverted data it's definitely UTF8. The 'smart quotes' Microsoft Word likes to substitute for everything under the sun are the only things in there that need it.

The database still seems to believe it has to encode to latin-1. How did you change the database to UTF8?
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Url encoding a string using sed

Hi I was hoping some one would know if it is possible to url encode a string using sed? My problem is I have extracted some key value pairs from a text file with sed, and will be inserting these pairs as source variables into a curl script to automatically download some xml from our server. My... (5 Replies)
Discussion started by: Paul Walker
5 Replies

2. Solaris

View file encoding then change encoding.

Hi all!! I´m using command file -i myfile.xml to validate XML file encoding, but it is just saying regular file . I´m expecting / looking an output as UTF8 or ANSI / ASCII Is there command to display the files encoding? Thank you! (2 Replies)
Discussion started by: mrreds
2 Replies

3. Shell Programming and Scripting

Python replace string

Hi, I have a python variable with a value like this : string = "abc.de.fghijk.com:zyz.ab.fgfijk.com:abc.ef.fghijk.com" They are hostnames separated by the special character ":" . From this string I want to make a list with values : (2 Replies)
Discussion started by: ctrld
2 Replies

4. Shell Programming and Scripting

Remove lines between the start string and end string including start and end string Python

Hi, I am trying to remove lines once a string is found till another string is found including the start string and end string. I want to basically grab all the lines starting with color (closing bracket). PS: The line after the closing bracket for color could be anything (currently 'more').... (1 Reply)
Discussion started by: Dabheeruz
1 Replies

5. Shell Programming and Scripting

How to check string encoding?

I want to check if the string is WINDOWS-1251 or UTF-8 can you help me to find the string encoding??? or maybe to get URL Content-Type charset with wget? this is my function on PHP function check_utf8($str) { $len = strlen($str); for($i = 0; $i < $len; $i++){ $c =... (2 Replies)
Discussion started by: sanantonio7777
2 Replies

6. Shell Programming and Scripting

[python]string to list conversion

I have a file command.txt. It's content are as follows:- The content of file is actually a command with script name and respective arguments. arg1 and arg2 are dummy arguments , format : -arg arg_value test is a argument specifying run mode , format : -arg In my python code, i read it and... (1 Reply)
Discussion started by: animesharma
1 Replies

7. Shell Programming and Scripting

How to find the file encoding and updating the file encoding?

Hi, I am beginner to Unix. My requirement is to validate the encoding used in the incoming file(csv,txt).If it is encoded with UTF-8 format,then the file should remain as such otherwise i need to chnage the encoding to UTF-8. Please advice me how to proceed on this. (7 Replies)
Discussion started by: cnraja
7 Replies

8. Shell Programming and Scripting

Python String <--> Number

My question is so simple: A = raw_input("A ") if A == '56': VAR = (A + 54)/13 else: print "other operations" if I write in input 5656565656 i want to make some arithmetic operations if the first input is 56XXX but the output is TypeError: cannot concatenate 'str' and... (2 Replies)
Discussion started by: kazikamuntu
2 Replies

9. Shell Programming and Scripting

Python - Scan for string

Hi i have a variable 'reform' and store the lines like reform= { record string(8) ID; string(4) PRD; date("YYMMDD", split = "800101") DateofManufact; string(4) PRDC_MODULE_NUM; string(1) END_OF_RECORD = "\n"; } I need to search for the character "\n"in the above variable... (1 Reply)
Discussion started by: dhanamurthy
1 Replies

10. Shell Programming and Scripting

python and string.find

Hi all, I'm not sure if this is the right forum, but i'll give it a try. Here is my problem: i have two files having basically the same things in it (hostnames): File 1 mituap01 mituap02 mituap03 File 2: mituap01 mituap04 mituap05 my goal is to get a .py out to check if pcs' in... (0 Replies)
Discussion started by: penguin-friend
0 Replies
Login or Register to Ask a Question