Python format UnicodeDecodeError

Question

Hello, I want to store a string in a variable like this:

  msg=_(u'Uživatel <a href="{0}">{1} {2}</a>').format(request.user.get_absolute_url, request.user.first_name, request.user.last_name)

But because the inserted variables contain accented characters such as š, I get a UnicodeDecodeError, even though I have set the encoding with # -*- coding: utf-8 -*-.

Oddly (IMHO), it worked when I built the same string by concatenating the variables like this:

msg=u'Uživatel <a href="' + request.user.get_absolute_url + ...

I have no idea why it doesn't work, since this is a running project and I have used statements like this many times.

If you have any advice on how to solve this, I would be very grateful.

Apart from the problem of non-ASCII str being passed to unicode.format(), you also appear to be injecting raw strings into HTML markup, which likely represents an XSS security hole. When you create HTML from plain-text variables, you need to HTML-escape them before adding them to the markup string, for example using django.utils.html.escape.
bobince, commented May 6, 2015 at 21:34
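bobince's point can be sketched with Python 3's standard-library html.escape, used here as a stand-in for django.utils.html.escape so the example runs without Django installed. The helper name user_link and the sample values are hypothetical.

```python
from html import escape


def user_link(url, first_name, last_name):
    """Build the profile link, escaping every plain-text value first.

    user_link is a hypothetical helper; html.escape stands in for
    django.utils.html.escape. With its default quote=True it also escapes
    double and single quotes, which matters inside href="...".
    """
    return u'Uživatel <a href="{0}">{1} {2}</a>'.format(
        escape(url), escape(first_name), escape(last_name))


# A hostile last name is rendered as text instead of becoming markup.
print(user_link('/users/42/', 'Jiří', '<script>alert(1)</script>'))
# → Uživatel <a href="/users/42/">Jiří &lt;script&gt;alert(1)&lt;/script&gt;</a>
```

In real Django code, django.utils.html.format_html, which escapes its arguments, would be the more idiomatic choice; it is not used here only to keep the sketch dependency-free.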

Peter DeGlopper · Accepted Answer · 2015-05-06 21:21:09Z

One of your user lookups is returning an encoded bytestring rather than a Unicode object.

When Python 2.x is asked to concatenate Unicode and encoded bytestrings, it does so by decoding the bytestring into Unicode using the default encoding, which is ascii unless you go to some effort to change it. The # -*- coding: utf-8 -*- directive sets the encoding for your source code, but not the system default encoding.
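Python 3 no longer decodes bytes implicitly, but the failure described above can be reproduced by performing, explicitly, the ASCII decode that Python 2 does behind the scenes. This is a sketch in Python 3 syntax; the name 'Jiří' is just sample data.

```python
# UTF-8 bytes, standing in for a Python 2 str that holds an encoded name.
raw = u'Jiří'.encode('utf-8')

# Python 2 mixes str and unicode by decoding the str with the default
# codec, which is 'ascii'; doing that step by hand fails the same way.
try:
    raw.decode('ascii')
except UnicodeDecodeError as exc:
    print(exc)
    # → 'ascii' codec can't decode byte 0xc5 in position 2: ordinal not in range(128)

# Decoding with the codec the bytes were actually written in succeeds.
print(raw.decode('utf-8'))  # → Jiří
```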

From testing format, it looks like it tries to convert the argument to match the type of the left-hand side.

Under 2.x, things will work fine as long as the bytestring you're using can be decoded using ascii:

>>> u'test\u270c {0}'.format('bar')
u'test\u270c bar'

Or, of course, if you're formatting another Unicode object into it:

>>> u'test\u270c {0}'.format(u'bar\u270d')
u'test\u270c bar\u270d'

If you omit the u prefix on the format string, you'll get a UnicodeEncodeError:

>>> 'foo {0}'.format(u'test\u270c')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\u270c' in position 4: ordinal not in range(128)

Conversely, if you format an encoded string with non-ascii bytes into a Unicode object, you'll get a UnicodeDecodeError:

>>> u'foo {0}'.format(u'test\u270c'.encode('utf-8'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 4: ordinal not in range(128)
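For comparison, here is a sketch of the same experiments under Python 3 (not the Python 2 the question uses). Text and bytes are separate types there, so str.format never decodes a bytes argument: it calls str() on it and silently embeds its repr, which arguably makes the mistake harder to spot because nothing raises.

```python
victory = u'test\u270c'            # text (str in Python 3)
encoded = victory.encode('utf-8')  # bytes: b'test\xe2\x9c\x8c'

# Text into text works, like the unicode examples above.
as_text = u'foo {0}'.format(victory)
print(as_text)     # → foo test✌

# Bytes are not decoded; their repr ends up in the output instead.
as_bytes = u'foo {0}'.format(encoded)
print(as_bytes)    # → foo b'test\xe2\x9c\x8c'

# Decoding explicitly gives the intended result.
as_decoded = u'foo {0}'.format(encoded.decode('utf-8'))
print(as_decoded)  # → foo test✌
```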

I'd start by checking the get_absolute_url implementation. Valid URLs never contain unescaped non-ASCII characters, so they should always be decodable as ASCII. But if you're using standard Django models, first_name and last_name should already be Unicode objects, so my first bet would be a buggy get_absolute_url implementation.

Thanks for the explanation. The cause is embarrassingly simple, although in my defense the traceback specifically said that the string that could not be decoded was the name: I used get_absolute_url instead of get_absolute_url().
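The missing-parentheses bug can be reproduced in plain Python 3 with a hypothetical User class standing in for the Django model. Formatting a bound method does not call it; format() falls back to the method's repr. One plausible reason the traceback pointed at the name (an assumption, not confirmed in the thread) is that this repr embeds the model instance's own repr; here that repr is assumed to include the user's name.

```python
class User:
    """Hypothetical stand-in for a Django user model."""

    def __init__(self, pk, first_name, last_name):
        self.pk = pk
        self.first_name = first_name
        self.last_name = last_name

    def __repr__(self):
        # Loosely modeled on Django's '<Model: ...>' repr; assumed to show the name.
        return '<User: {0} {1}>'.format(self.first_name, self.last_name)

    def get_absolute_url(self):
        return '/users/{0}/'.format(self.pk)


user = User(42, u'Jiří', u'Novák')
template = u'Uživatel <a href="{0}">{1} {2}</a>'

# Bug: the bound method itself is formatted, so its repr lands in the href.
wrong = template.format(user.get_absolute_url, user.first_name, user.last_name)
print(wrong)

# Fix: call the method so the returned URL string is formatted.
right = template.format(user.get_absolute_url(), user.first_name, user.last_name)
print(right)  # → Uživatel <a href="/users/42/">Jiří Novák</a>
```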

Chris · Accepted Answer · 2015-05-06 21:16:52Z


Check the types of the arguments to format; I guess they are str, not unicode. Before using them, decode them appropriately, e.g.:

url = request.user.get_absolute_url()
if isinstance(url, str):
    print 'url was str'
    # Rebind url to the decoded unicode value so it is what gets formatted
    url = url.decode('utf-8')
msg = u'Uživatel <a href="{0}">...</a>'.format(url)

(The if and print statements are just for demonstration purposes.) Handle the other values the same way.
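Applying this advice to every argument can be sketched as a small helper. to_text is a hypothetical name, written for Python 3, where encoded data arrives as bytes rather than str; Django ships comparable helpers in django.utils.encoding (for example force_str). The callable check is an extra assumption added to catch the get_absolute_url mistake from this thread.

```python
def to_text(value, encoding='utf-8'):
    """Return value as text, decoding bytes explicitly (hypothetical helper)."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    if callable(value):
        # Catch the get_absolute_url vs. get_absolute_url() mistake early.
        raise TypeError('got a callable; did you forget to call it?')
    return str(value)


# Mixed input: bytes and text, as might arrive from different sources.
args = [b'/users/42/', u'Ji\u0159\u00ed', b'Nov\xc3\xa1k']
msg = u'Uživatel <a href="{0}">{1} {2}</a>'.format(*[to_text(a) for a in args])
print(msg)  # → Uživatel <a href="/users/42/">Jiří Novák</a>
```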


Dalbenn · Accepted Answer · 2015-05-07 19:35:11Z


The solution is pretty simple: I used get_absolute_url instead of get_absolute_url(). Sorry to bother you.
