Python: Encoding problem

Question

I want to copy data from one database to another, so I wrote a Python script for this purpose.

The names are in German, but I don't think that will be a problem for understanding my question.

The script does the following:

db = mysql.connect(db='', charset="utf8", use_unicode=True, **v.MySQLServer[server]);
...
cursor = db.cursor();

cursor.execute('select * from %s.%s where %s = %d;' % (eingangsDatenbankName, tabelle, syncFeldname, v.NEU))
daten = cursor.fetchall()

for zeile in daten:
    sql = 'select * from %s.%s where ' % (hauptdatenbankName, tabelle)
    ...
    for i in xrange(len(spalten)):
        sql += " %s, " % db_util.formatierFeld(unicode(str(zeile[i]), "utf-8"), feldTypen[i])

The function "db_util.formatierFeld" looks like this:

def formatierFeld(inhalt, feldTyp):

    if inhalt.lower() == "none":
        return "NULL"
    # String types
    if "char" in feldTyp.lower() or "text" in feldTyp.lower() or "blob" in feldTyp.lower() or "date" in feldTyp.lower() or "time" in feldTyp.lower():
        return '"%s"' % inhalt
    else:
        return '%s' % inhalt

Well, to some of you this stuff will seem quite odd, but I can assure you I MUST do it this way, so please no discussion about style etc.

When I run this code, I get the following error message as soon as it hits words with umlauts:

Traceback (most recent call last):
  File "db_import.py", line 222, in <module>
    main()
  File "db_import.py", line 219, in main
    importieren(server, lokaleMaschine, dbEingang, dbHaupt)
  File "db_import.py", line 145, in importieren
    sql += " %s, " %  db_util.formatierFeld(unicode(str(zeile[i]), "utf-8"), feldTypen[i])
UnicodeEncodeError: 'ascii' codec can't encode character u'\xfc' in position 1: ordinal not in range(128)

I do not understand why this string can't be built that way. In my opinion this should work, since I explicitly tell the program to use unicode here.

Does anybody have a guess as to what is going wrong here?

Jean-Paul Calderone · Accepted Answer · 2011-05-23 14:26:59Z

The error is made more difficult to interpret by the deep nesting of expressions you have.

In the line

sql += " %s, " % db_util.formatierFeld(unicode(str(zeile[i]), "utf-8"), feldTypen[i])

where does the exception come from? It's difficult to say. However, I would suppose that it comes from str(zeile[i]). If zeile[i] is unicode containing non-ASCII characters, then you cannot convert it to a byte string using str, because str implicitly encodes with the default codec, which is ASCII. Instead, you must encode it to a byte string using a codec which can represent all of the characters it contains.
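A minimal sketch of that failure, assuming zeile[i] holds a unicode value with an umlaut at position 1. The sample value is made up for illustration and is not taken from the real data. The implicit step that str performs on Python 2 is spelled out as an explicit encode call, so the snippet behaves the same on Python 2 and Python 3:

```python
# Hypothetical sample value standing in for zeile[i].
wert = u"M\xfcller"

# On Python 2, str(wert) does the equivalent of wert.encode("ascii")
# when the default encoding has not been changed.
try:
    wert.encode("ascii")
    fehler = None
except UnicodeEncodeError as exc:
    fehler = exc  # same exception type as in the traceback

# Encoding explicitly with a codec that covers the character succeeds.
kodiert = wert.encode("utf-8")
```

Here fehler ends up holding a UnicodeEncodeError for the character at position 1, while kodiert is an ordinary UTF-8 byte string.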

However...

unicode(str(zeile[i]), "utf-8")

This is pointless, if zeile[i] is a unicode string. First you try to encode it to a byte string, then you try to decode it back into a unicode string. You could skip all that and just do zeile[i]. formatierFeld doesn't really matter because execution never gets that far.
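One way to drop the round trip could look like the sketch below. The helper als_text is hypothetical (it is not part of the question's code), formatierFeld is a condensed copy of the function from the question, and the row and field types are made-up sample data. The helper passes text through untouched and only converts values that are not already text, such as None or numbers, which the original str call also covered:

```python
try:
    text_type = unicode  # Python 2
except NameError:
    text_type = str      # Python 3


def als_text(wert):
    """Hypothetical helper: return wert as text without an ASCII round trip."""
    if isinstance(wert, text_type):
        return wert                  # already text, nothing to do
    if isinstance(wert, bytes):
        return wert.decode("utf-8")  # assumes byte strings are UTF-8
    return text_type(wert)           # None, numbers, dates, ...


def formatierFeld(inhalt, feldTyp):
    # Condensed copy of the function from the question.
    if inhalt.lower() == "none":
        return "NULL"
    typ = feldTyp.lower()
    if any(name in typ for name in ("char", "text", "blob", "date", "time")):
        return u'"%s"' % inhalt
    return u"%s" % inhalt


# Made-up sample row and column types.
zeile = (u"M\xfcller", 42, None)
feldTypen = ("varchar(50)", "int(11)", "varchar(50)")

sql = u""
for i in range(len(zeile)):
    sql += u" %s, " % formatierFeld(als_text(zeile[i]), feldTypen[i])
```

With this, the umlaut never passes through the ASCII codec, and None still reaches formatierFeld as the text "None", so it is turned into NULL as before.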
