I am having a problem with my encoding in Python. I have tried different methods but I can't seem to find the best way to encode my output to UTF-8.

This is what I am trying to do:

result = unicode(google.searchGoogle(param), "utf-8").encode("utf-8")

searchGoogle returns the first Google result for param.

This is the error I get:

exceptions.TypeError: decoding Unicode is not supported

Does anyone know how I can make Python encode my output in UTF-8 to avoid this error?

1 Answer

Looks like google.searchGoogle(param) already returns unicode:

>>> unicode(u'foo', 'utf-8')
Traceback (most recent call last):
  File "<pyshell#1>", line 1, in <module>
    unicode(u'foo', 'utf-8')
TypeError: decoding Unicode is not supported

So what you want is:

result = google.searchGoogle(param).encode("utf-8")

As a side note, your code expects searchGoogle() to return a UTF-8 encoded byte string. If it did, what would be the point of decoding it (with unicode()) and then encoding it back (with .encode()) using the same encoding?
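To make the point above concrete, here is a minimal sketch. It assumes the search call returns text rather than bytes. fake_search_google is a hypothetical stand-in for google.searchGoogle(), and text_type is an illustrative alias so the snippet runs on both Python 2 and Python 3 (where the text type is called str).

```python
# Runs on Python 2 and Python 3; on Python 3 the text type is named str.
try:
    text_type = unicode  # Python 2
except NameError:
    text_type = str      # Python 3


def fake_search_google(param):
    """Hypothetical stand-in for google.searchGoogle(); returns text, not bytes."""
    return u"caf\u00e9 " + param


value = fake_search_google(u"menu")

# Decoding something that is already text raises TypeError, as in the question.
try:
    text_type(value, "utf-8")
    decode_failed = False
except TypeError:
    decode_failed = True

# Encoding alone is enough to get UTF-8 bytes.
result = value.encode("utf-8")
```

Here decode_failed ends up True and result holds the UTF-8 bytes, so the extra unicode() call only adds a failure point.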

5 Comments

Honestly, the unicode() was just fooling around trying to understand what was happening. Thank you very much :-)
Now I will sometimes get 'ascii' codec can't decode byte 0xc3 in position. Do you know why that is?
In the line I suggested? Then searchGoogle() must have returned a byte string (str) containing a 0xC3 byte. Calling .encode() on a byte string makes Python decode it to unicode first, using the ascii codec, and that implicit decode fails on any non-ASCII byte. I don't know why searchGoogle() would sometimes return unicode and sometimes str. Maybe it depends on what you pass in param? Try to stick to one type.
I wish there was a safe, simple way to cast to unicode.
@EricWalker You could write an awkward helper function like def uors2u(object, encoding=..., errors=...) that returns object unchanged if it is already unicode and decodes it if it is a str. However, this is a code smell. You should convert all input to unicode as soon as you receive it from the outside (a file system, for example) and convert it back, if needed, only just before sending it out. There should be only one place where you convert str to unicode, so a helper function like the one I described should not be needed.
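For illustration only, the kind of helper described in the last comment might look roughly like the sketch below. The name to_unicode, the default of UTF-8, and the strict error handling are assumptions, not anything from the thread. The snippet also reproduces the ascii decode failure explicitly, since that is the step Python 2 performs implicitly when .encode() is called on a byte string. It is written to run on both Python 2 and Python 3.

```python
try:
    text_type = unicode  # Python 2
except NameError:
    text_type = str      # Python 3


def to_unicode(value, encoding="utf-8", errors="strict"):
    """Return value as text: unchanged if already text, decoded if bytes."""
    if isinstance(value, text_type):
        return value
    if isinstance(value, bytes):  # bytes is an alias of str on Python 2
        return value.decode(encoding, errors)
    raise TypeError("expected text or bytes, got %s" % type(value).__name__)


raw = b"caf\xc3\xa9"  # UTF-8 bytes containing a 0xC3 byte

# Decoding with the ascii codec fails on the non-ASCII byte.
try:
    raw.decode("ascii")
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

text = to_unicode(raw)                      # bytes are decoded as UTF-8
encoded = to_unicode(text).encode("utf-8")  # text passes through, then encodes
```

As the comment argues, needing such a helper in many places usually means the decoding should be moved to the single point where data enters the program.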
