
I have a weird encoding problem between my PyQt app and my MySQL database. It is weird in the sense that it works in one case and not in the others, even though I seem to be doing exactly the same thing in all of them.

My process is the following:

I have some QFocusOutTextEdit elements in which I write text that may contain accented characters (é, à, è, ...).

I get the text with:

    text = self.ui.text_area.toPlainText()
    text = text.toUtf8()

Then, to insert it into my database, I do:

    text = str(text).decode('unicode_escape').encode('iso8859-1').decode('utf8')

I also set the character set of my database, the specific tables and the specific columns of the table to utf8.

It works for one of my text areas, but for the others it puts weird characters in my database instead.

Any hint on this would be appreciated!

RESOLVED: sorry for the disturbance. Apparently some fields in my database weren't up to date, and this was somehow breaking the encoding.

  • WOW. Encode to UTF-8 then decode again and encode again and then decode again? Perhaps you'd better explain what all that munging is supposed to mean! Shouldn't you be able to simplify this A LOT? Commented Jan 13, 2012 at 9:30
  • Yeah, I know, it looked weird to me too. I saw this solution on some forum, and for some reason it works, so I didn't look further. I don't understand much about encoding issues. If you have a simpler solution, I'll take it! Commented Jan 13, 2012 at 9:34
  • How do you insert it into the database? Commented Jan 13, 2012 at 9:48
  • @Celeda: That's the code to convert UTF-8 \x escapes to the corresponding characters. Commented Jan 17, 2012 at 17:37

1 Answer


You are doing a lot of encoding, decoding, and re-encoding, which is hard to follow even if you know what all of it means. You should try to simplify this down to working natively with Unicode strings. In Python 3 that means str (normal strings), and in Python 2 that means unicode (u"this kind of string").
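To see what the chain from the question actually does, here is a minimal sketch written for Python 3, where bytes and str play the roles of Python 2's str and unicode. The function names convoluted and direct are made up for illustration. It suggests that the chain is just a roundabout UTF-8 decode that also interprets backslash escapes, which can silently change text containing a backslash.

```python
# Python 3 sketch: bytes/str here correspond to str/unicode in Python 2.

def convoluted(raw: bytes) -> str:
    """Replay the chain from the question, one step at a time."""
    # Each byte becomes one code point, and backslash escapes are interpreted.
    step1 = raw.decode("unicode_escape")
    # Map those code points back to the original byte values.
    step2 = step1.encode("iso8859-1")
    # Finally do the real work: decode the bytes as UTF-8.
    return step2.decode("utf8")


def direct(raw: bytes) -> str:
    """Decode UTF-8 bytes once, with no detours."""
    return raw.decode("utf8")


accented = "é à è".encode("utf8")
print(convoluted(accented) == direct(accented))

# Text containing a backslash is where the two approaches diverge:
# the chain turns the two characters "\n" into a real newline.
tricky = "C:\\new é".encode("utf8")
print(repr(direct(tricky)))
print(repr(convoluted(tricky)))
```

For plain accented text both functions agree, which would explain why the chain "works" in simple cases; the single decode is the safer choice.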

Arrange for your connection to the MySQL database to use Unicode on input and output. If you use something high-level like SQLAlchemy, you probably don't need to do anything. If you use MySQLdb directly, make sure you pass charset="UTF8" (which implies use_unicode) to the connect() method.
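As a rough sketch of the MySQLdb case, assuming the MySQLdb (mysqlclient) driver is installed: the helper names connection_kwargs and open_connection and all credential values are placeholders invented for this example, not part of the library.

```python
def connection_kwargs(host, user, password, database):
    """Build keyword arguments for MySQLdb.connect() with Unicode enabled.

    Hypothetical helper; only the keyword names belong to MySQLdb.
    """
    return {
        "host": host,
        "user": user,
        "passwd": password,
        "db": database,
        "charset": "utf8",    # character set used on the connection
        "use_unicode": True,  # already implied by charset; spelled out for clarity
    }


def open_connection(**kwargs):
    """Open the connection; MySQLdb is imported lazily, only when called."""
    import MySQLdb  # assumption: the MySQLdb/mysqlclient package is available
    return MySQLdb.connect(**kwargs)


# Placeholder credentials, for illustration only.
kwargs = connection_kwargs("localhost", "app_user", "secret", "app_db")
print(kwargs["charset"], kwargs["use_unicode"])
```

Calling open_connection(**kwargs) would then give a connection that accepts and returns Unicode strings, so no manual encoding should be needed around queries.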

Then make sure the value you are getting from PyQt is a Unicode value. I don't know PyQt. Check the type of self.ui.text_area or self.ui.text_area.toPlainText(). Hopefully it is already a Unicode string. If yes: you're all set. If no: it's a byte string, probably encoded in UTF-8, so you can decode it with theresult.decode('utf8'), which will give you a Unicode object.

Once your code is dealing only with Unicode objects and no more encoded byte strings, you don't need to do any kind of encoding or decoding anymore. Just pass the strings directly from PyQt to MySQL.
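The type check described above could look something like this. It is a small sketch in Python 3 terms (bytes versus str); the helper name to_text is made up for illustration, and the UTF-8 default is an assumption about how the bytes were produced.

```python
def to_text(value, encoding="utf8"):
    """Return a Unicode string, decoding only if we were handed bytes.

    Hypothetical helper: decode once at the boundary, then work with
    Unicode everywhere else.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


print(type(to_text(b"\xc3\xa9")).__name__)
print(to_text(b"\xc3\xa9") == to_text("é"))
```

If PyQt hands back its own string type rather than bytes or a native string (as a QString would be), it needs converting to a native Unicode string first; this helper does not cover that case.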


2 Comments

Thanks for the tip! I temporarily resolved my problem, but my fix is a bit convoluted, I have to admit. I'll try your solution as soon as I have some time!
Great! I added the line self.conn.set_character_set('utf8') to my connection class, removed the decode-encode-decode step, and it seems to work.
