While attempting to render a template I'm getting the following error:

DjangoUnicodeDecodeError: 'utf8' codec can't decode bytes in position 26-27: invalid data. You passed in '\xce\x88\xce\xbe\xce\xbf\xce\xb4\xce\xb1 \xcf\x83\xcf\x84\xce\xb7\xce\xbd \xce\xb5\xcf\x81\xce\xb3\xce...' (<type 'str'>)

The template is fairly large and complex, so I'm hoping for some tips on how to track down where exactly this is coming from.

A few facts that might be helpful:

  • The template is generally Unicode friendly; we display a fair amount of Unicode data through it
  • The MySQL table the data is coming from has utf8 encoding
  • This is a strange one: the error doesn't show up on my staging server, which runs the same code base against the same production data. Its setup is very similar to the production server: Python 2.5.1, Django 1.1.1, MySQL 5.0.38, Ubuntu.

I'm not sure where exactly to look for the badly encoded data, any hints or pointers would be appreciated.

3 Answers

4

Somewhere you're truncating a string, but you're doing it on a str instead of a unicode, so you end up splitting a multi-byte UTF-8 character sequence in half. Always perform text operations on unicode, never on str.
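
A minimal sketch of how the split happens. It is written for Python 3, where bytes plays the role of Python 2's str and str plays the role of unicode. The first 14 characters of the sample text are decoded from the error message in the question; the trailing "αβγ" is made-up filler, since the real string is elided in the error, and the cut points (27 bytes, 15 characters) are chosen only for illustration.

```python
# Python 3 sketch: bytes ~ Python 2 str, str ~ Python 2 unicode.
# "Έξοδα στην εργ" comes from the error message; "αβγ" is hypothetical filler.
text = "Έξοδα στην εργ" + "αβγ"
encoded = text.encode("utf-8")  # each Greek letter takes two bytes

# Slicing the encoded bytes counts bytes, so the cut can land inside a character.
broken = encoded[:27] + b"..."
try:
    broken.decode("utf-8")
    decode_failed = False
    error_start = None
except UnicodeDecodeError as exc:
    decode_failed = True
    error_start = exc.start  # offset of the first undecodable byte

# Slicing the text counts characters, so the result always encodes cleanly.
safe = text[:15] + "..."
round_trip = safe.encode("utf-8").decode("utf-8")
```

Here the decode fails at byte offset 26, the same position the exception in the question reports, while the character-based slice round-trips through UTF-8 without trouble.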

3 Comments

Aaah, excellent suggestion, will try it when I get back in front of the computer.
Ignacio was dead on. I'd written a tag to truncate titles and had used str() instead of unicode() to convert the tag parameters to strings. Switched it over to unicode() and the problems went away.
This just bit me too. Should we then ever again use str()?
1

What is reported by the exception is 26 bytes of valid UTF-8 followed by '\xce...'

It looks very much to me as though some piece of software, either in your code or in Django's, is doing something like this:

def too_big_display(strg, maxlen):
    # Slices a byte string (str), so the cut can land inside a multi-byte UTF-8 character.
    return strg[:maxlen-3] + "..."

and in your case calling it with too_big_display(your_Greek_text_encoded_in_utf8, 30)

and so you are seeing a secondary error: the truncated string ends in \xce followed by ".", and \xce. is not valid UTF-8.

I suggest that you look very carefully through the traceback (which you should have shown us, and still can, by editing your question) to see whether there is any evidence of a primary error. If not, scrutinise your code for such a truncation.
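
If the truncation does turn out to be in your own code, one possible repair is to decode before slicing. The following is only a sketch in Python 3 terms (bytes for Python 2's str, str for unicode); the encoding parameter and the choice to leave short strings untouched are my assumptions, not something taken from your code or from Django.

```python
def too_big_display(value, maxlen, encoding="utf-8"):
    """Return value cut to at most maxlen characters, ending in "..." if shortened."""
    if isinstance(value, bytes):
        # Decode first so that the slice below counts characters, not bytes.
        value = value.decode(encoding)
    if len(value) <= maxlen:
        return value  # assumption: short strings are returned unchanged
    return value[:max(maxlen - 3, 0)] + "..."


# Hypothetical input: the Greek prefix from the question plus made-up filler.
sample = "Έξοδα στην εργαβγ".encode("utf-8")
shortened = too_big_display(sample, 10)
```

In Python 2 the equivalent would be to call unicode-aware code on unicode objects throughout and to encode only at the output boundary.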

0

In case somebody is in a situation similar to mine: I recently changed a MySQL table to use the collation utf8_bin and ran into the same problem. I found out that staging was running MySQL-python 1.2.3. Upgrading to 1.2.4 solved the problem for me. I am using Python 2.7 and Django 1.4.2.
