Python 3 character encoding issue

Question

I am selecting values from a MySQL/MariaDB database that uses the latin1 charset with the latin1_swedish_ci collation. The data may contain characters from different European languages, such as Spanish ñ, German ä, or Norwegian ø.

I get the data with

#!/usr/bin/env python3
# coding: utf-8

...
sql.execute("SELECT name FROM myTab")
for row in sql:
    print(row[0])

There is an error message: UnicodeEncodeError: 'ascii' codec can't encode character '\xf1'

Okay, I have changed my print to

print(str(row[0].encode('utf8')))

and the result looks like this: b'\xc3\xb1'

I looked at "Working with utf-8 encoding in Python source", but I have already declared the coding header. Also, decode('utf8').encode('cp1250') does not help.

Thanks for the support. This returns UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf1 in position 0
– Joe Platano, Commented Jun 19, 2017 at 23:23
Possible duplicate of How to set sys.stdout encoding in Python 3?
– Joe Platano, Commented Jun 26, 2017 at 21:23

Joe Platano · Accepted Answer · 2017-06-26 21:22:26Z

3

Okay, the encoding issue has finally been solved. Coldspeed gave an important hint about the locale, so all kudos to him! Unfortunately, it was not that easy.

I found a workaround that fixes the problem.

import sys
sys.stdout = open(sys.stdout.fileno(), mode='w', encoding='utf8', buffering=1)

The solution is from Jack O'Connor, posted in this answer.
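To see why replacing sys.stdout helps, here is a minimal sketch that reproduces the failure without touching the real stdout. It assumes the web server hands Python an ASCII-only stdout, as the 'ascii' codec error suggests. The in-memory streams and the helper name utf8_text_stream are made up for illustration and are not part of the original workaround.

```python
import io


def utf8_text_stream(binary_stream):
    """Wrap a binary stream so that text written to it is encoded as UTF-8.

    Hypothetical helper for illustration only.
    """
    return io.TextIOWrapper(binary_stream, encoding="utf-8", line_buffering=True)


# Simulate a stdout whose encoding came from an ASCII-only locale.
ascii_sink = io.BytesIO()
ascii_stdout = io.TextIOWrapper(ascii_sink, encoding="ascii")
try:
    print("ñ", file=ascii_stdout)
    ascii_stdout.flush()
    ascii_failed = False
except UnicodeEncodeError:
    # Same class of error as in the question: 'ascii' codec can't encode '\xf1'.
    ascii_failed = True

# The same print succeeds once the text layer is told to use UTF-8.
utf8_sink = io.BytesIO()
utf8_stdout = utf8_text_stream(utf8_sink)
print("ñ", file=utf8_stdout)
utf8_bytes = utf8_sink.getvalue()  # starts with the two UTF-8 bytes of 'ñ'
```

On Python 3.7 and later, sys.stdout.reconfigure(encoding='utf-8') should have the same effect without reopening the file descriptor. Setting the PYTHONIOENCODING environment variable to utf-8 before the interpreter starts is another option.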

answered Jun 26, 2017 at 21:22

Joe Platano


2 Comments

Richard Corden Over a year ago

+1 as this has allowed me to move forward. However, shouldn't this be written in flashing lights at the top of somewhere like docs.python.org/3/howto/unicode.html? My issue relates to using a jinja2 template. Where the template doesn't contain any unicode everything is OK, however, once there is a single unicode character somewhere in the template it breaks. My system locale is 'en_US.UTF-8' and no amount of encode/decode solved the problem. But the above just feels like such a fundamental thing that it cannot be the "correct way"?

domenukk Over a year ago

A thousand times this! How is this not the default in 2018 :/

cs95 · Answer · 2017-06-19 23:37:03Z

1

Python 3 chooses the encoding for printed text based on your locale settings. If your locale does not match the encoding of the string, you get garbled text, or printing fails altogether. You can try encoding the string as latin-1 and then decoding the bytes as cp1252 (this seems to be the encoding of the data).

print(row[0].encode('latin-1').decode('cp1252'))
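As a rough illustration of what these conversions do, the sketch below runs them on a single 'ñ'. It assumes row[0] already arrives as a correctly decoded str; the variable name stands in for row[0] and is hypothetical.

```python
# Hypothetical stand-in for row[0]; assumes the driver already returned a str.
name = "ñ"

latin1_bytes = name.encode("latin-1")  # one byte: b'\xf1'
utf8_bytes = name.encode("utf-8")      # two bytes: b'\xc3\xb1'

# str() of a bytes object gives its repr, which is why the question's
# print(str(row[0].encode('utf8'))) showed b'\xc3\xb1' instead of ñ.
as_text = str(utf8_bytes)

# latin-1 and cp1252 agree on ñ, so this round trip returns the same string.
round_trip = latin1_bytes.decode("cp1252")

# Decoding the latin-1 byte as UTF-8 fails, matching the UnicodeDecodeError
# reported in the question's comments.
decode_error = None
try:
    latin1_bytes.decode("utf-8")
except UnicodeDecodeError as exc:
    decode_error = exc
```

Under that assumption the string is already correct before it is printed, so none of these conversions can fix an error that is raised while writing to stdout.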

answered Jun 19, 2017 at 23:37

cs95


5 Comments

Joe Platano Over a year ago

It seems the point about the locale leads toward the goal. Unfortunately, your approach still does not give the correct result, but with the locale I am getting closer.

cs95 Over a year ago

@JoePlatano what about row[0].encode('latin-1').decode('utf-8')?

Joe Platano Over a year ago

No, that does not work. Well, it does in the shell: if I run the script as python script.py, it works. On the web server it does not. I added the lines print(sys.stdout.encoding) and print(sys.getdefaultencoding()). In the shell, both print utf-8. If I run the script through the browser, sys.stdout.encoding is ANSI_X3.4-1968 and sys.getdefaultencoding() is utf-8. I think there is some locale issue in Apache.

cs95 Over a year ago

@JoePlatano Oh, I see... afraid I'm at a loss here. Hope you figure it out! You should try different encodings and see which works.

Joe Platano Over a year ago

yeah thanks anyway for pushing me in a good direction! Therefore the upvote. Thanks buddy
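The diagnosis from the comments above can be repeated with a small sketch like the following. It is only an illustration: the function names are hypothetical, and the check relies on ANSI_X3.4-1968 being an alias that Python's codec registry resolves to ASCII.

```python
import codecs
import locale
import sys


def describe_encodings():
    """Collect the encodings Python is using in the current environment."""
    return {
        "stdout": getattr(sys.stdout, "encoding", None),
        "default": sys.getdefaultencoding(),
        "locale": locale.getpreferredencoding(False),
    }


def is_ascii_only(encoding_name):
    """Return True if the given name is an alias of the ASCII codec."""
    return codecs.lookup(encoding_name).name == "ascii"


report = describe_encodings()
for key, value in report.items():
    print(key, value)

# Value reported for sys.stdout.encoding when the script ran under Apache.
apache_stdout = "ANSI_X3.4-1968"
print(is_ascii_only(apache_stdout))
```

Running this both in the shell and through the web server should show whether only the stdout encoding differs between the two environments, which is what the comments describe.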
