How to use .encode('utf-8') in Python?

Question

I'm administering some Python code in which I now see an error in the logs:

Traceback (most recent call last):
  File "./app/core.py", line 772, in scrapeEmail
    l.info('EMAIL SUBJECT: ', header['value'])
  File "./app/__init__.py", line 44, in info
    logging.info(str(datetime.utcnow()) + ' INFO     ' + caller.filename + ':' + str(caller.lineno) + ' - ' + ' '.join([str(x) for x in args]))
UnicodeEncodeError: 'ascii' codec can't encode character u'\xea' in position 25: ordinal not in range(128)

which I guess means that header['value'] contains differently encoded characters.

I searched around, and this SO answer suggests to "put .encode('utf-8') at the end of the object for recent versions of Python".

This raised two questions for me:

1. On what object do I need to use .encode('utf-8'): on x or on str(x)? In other words, should it be str(x.encode('utf-8')) or str(x).encode('utf-8')?
2. What does the writer mean by "recent versions of Python"? Can I still use .encode('utf-8') in Python 2.7?

Normally I would simply try it, but it is not easy (actually impossible) to find the string on which the error occurred. So I can't really test it.

A little help would be greatly appreciated here.

For 1) unless your object x implements a method encode, you use it on the string (which has a method .encode)
– DainDwarf, Commented Dec 8, 2015 at 13:55
That answer is not relevant to you; randomly putting encode on the end of string calls is unlikely to help. The problem is more likely that you have overridden the info method with your own implementation, which does not do the right thing. The decision about what to put in a log message belongs to the formatter, not a logger subclass.
– Daniel Roseman, Commented Dec 8, 2015 at 14:01
Have you tried using unicode('something') instead of str('something')?
– pazitos10, Commented Dec 8, 2015 at 14:06

Ryan Chou · Accepted Answer · 2015-12-08 14:13:35Z

I suggest you first get a clear understanding of the relationship between unicode and other encodings (e.g. GB2312, GBK). Once you have that, encoding and decoding should no longer cause major problems. :)

The following diagram shows the relationship. Once you grasp the main point, you will know when and how to encode and decode in your code. :)

---------              -----------             ----------
|       |  1.decode(A) |         | 2.encode(B) |        |
|   A   | -----------> | unicode | ----------->|   B    |
|       | <----------- |         | <---------- |        |
|       |  4.encode(A) |         | 3.decode(B) |        |
---------              -----------             ----------
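
As a minimal sketch of the four numbered arrows, assume A is GBK and B is UTF-8. Both choices are assumptions for illustration only, since the question does not say which encodings are involved. The u'' and b'' prefixes keep the snippet valid on both Python 2.7 and Python 3.

```python
# Assumption: encoding A is GBK and encoding B is UTF-8 (illustrative only).
text = u'\u4e2d\u6587'            # unicode text, the middle box of the diagram

bytes_a = text.encode('gbk')      # arrow 4: unicode -> A
bytes_b = text.encode('utf-8')    # arrow 2: unicode -> B

from_a = bytes_a.decode('gbk')    # arrow 1: A -> unicode
from_b = bytes_b.decode('utf-8')  # arrow 3: B -> unicode

# Converting A to B always passes through unicode: decode(A), then encode(B).
a_to_b = bytes_a.decode('gbk').encode('utf-8')
```

Note that decode always starts from bytes and encode always starts from unicode text; there is no direct arrow from A to B.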

So, according to the diagram, you need to know what encoding your data is in now and what encoding you want to convert it to, and then follow the arrows in the diagram.
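
Applied to the logging call in the question, one possible approach is to turn every argument into unicode text first and encode only once, at the output boundary. The u'\xea' in the traceback suggests the header value was already unicode, so in Python 2 the str(x) call attempted an implicit ASCII encode and failed. The sketch below is not the asker's code: to_text and format_message are hypothetical helper names, and treating incoming byte strings as UTF-8 is an assumption.

```python
def to_text(value, encoding='utf-8'):
    """Hypothetical helper: return value as unicode text.

    Assumes byte strings are encoded as `encoding` (UTF-8 by default).
    """
    text_type = type(u'')  # unicode on Python 2, str on Python 3
    if isinstance(value, bytes):
        # Bytes must be decoded; undecodable bytes are replaced, not raised.
        return value.decode(encoding, 'replace')
    if isinstance(value, text_type):
        return value
    # Numbers and other objects: convert directly to text.
    return text_type(value)


def format_message(*args):
    """Hypothetical helper: join arguments as unicode text."""
    return u' '.join(to_text(x) for x in args)


line = format_message(u'EMAIL SUBJECT: ', u'caf\xea', 42)
encoded = line.encode('utf-8')  # encode once, only when bytes are required
```

This corresponds to neither str(x.encode('utf-8')) nor str(x).encode('utf-8'): the str() call is avoided for text values, and .encode('utf-8') is applied to the final joined string.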
