Hello, I want to save a string into a variable like this:

  msg=_(u'Uživatel <a href="{0}">{1} {2}</a>').format(request.user.get_absolute_url, request.user.first_name, request.user.last_name)

But since the inserted variables contain accented characters such as š, I get a UnicodeDecodeError, even though I have set the encoding with # -*- coding: utf-8 -*-

It seems weird (IMHO) that it worked when I built this string by concatenating the variables like this:

msg=u'Uživatel <a href="' + request.user.get_absolute_url + ...

I have no idea why it doesn't work, since this is a running project and I have used statements like this many times.

If you have any advice on how to solve this, I will be very grateful.

  • Apart from the problem of non-ASCII str being passed to unicode.format(), you also appear to be injecting raw strings into HTML markup, which likely represents an XSS security hole. When you create HTML from plain-text variables, you need to HTML-escape them before adding them to the markup string, for example using django.utils.html.escape. Commented May 6, 2015 at 21:34
  • That's a good point, thank you. Commented May 7, 2015 at 19:22
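A minimal sketch of the escaping described in the first comment. It assumes Python 3, where the standard library's html.escape stands in for django.utils.html.escape; inside a Django project, django.utils.html.format_html, which escapes its arguments, would be the more usual choice. The function name user_link and the sample values are hypothetical.

```python
from html import escape  # Python 3 stdlib; stand-in for django.utils.html.escape here


def user_link(url, first_name, last_name):
    """Hypothetical helper: build the link, escaping each value before insertion."""
    return u'Uživatel <a href="{0}">{1} {2}</a>'.format(
        escape(url), escape(first_name), escape(last_name))


# A hostile last name is rendered as text instead of becoming live markup.
print(user_link(u'/users/7/', u'Šárka', u'<script>alert(1)</script>'))
```

Escaping every interpolated value, not just the ones that look dangerous, keeps the rule simple: nothing reaches the markup without passing through escape().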

3 Answers

Answer 1

One of your user lookups is returning an encoded bytestring rather than a Unicode object.

When Python 2.x is asked to concatenate Unicode and encoded bytestrings, it does so by decoding the bytestring into Unicode using the default encoding, which is ascii unless you go to some effort to change it. The # -*- coding: utf-8 -*- directive sets the encoding for your source code, but not the system default encoding.

From testing format, it looks like it tries to convert the argument to match the type of the left-hand side.

Under 2.x, things will work fine as long as the bytestring you're using can be decoded using ascii:

>>> u'test\u270c {0}'.format('bar')
u'test\u270c bar'

Or, of course, if you're formatting in another Unicode object:

>>> u'test\u270c {0}'.format(u'bar\u270d')
u'test\u270c bar\u270d'

If you omit the u before your format string and the argument contains non-ASCII characters, you'll get a UnicodeEncodeError:

>>> 'foo {0}'.format(u'test\u270c')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\u270c' in position 4: ordinal not in range(128)

Conversely, if you format an encoded string with non-ascii bytes into a Unicode object, you'll get a UnicodeDecodeError:

>>> u'foo {0}'.format(u'test\u270c'.encode('utf-8'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 4: ordinal not in range(128)
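A small sketch of what happens under the hood and of the explicit fix, using a made-up name as sample data. Python 2 implicitly decodes a str with the ascii codec when it meets a unicode object; the code below performs that decode by hand, so the same UnicodeDecodeError also shows up on Python 3. Decoding with the codec the bytes were actually written in (UTF-8 is assumed here) succeeds.

```python
# -*- coding: utf-8 -*-
raw = u'Šárka'.encode('utf-8')  # UTF-8 bytes, i.e. what a Python 2 str would hold

try:
    # This is the decode Python 2 performs implicitly when str meets unicode.
    raw.decode('ascii')
except UnicodeDecodeError as exc:
    print('ascii decode failed: {0}'.format(exc))

# Decoding explicitly with the encoding the bytes were written in works.
text = raw.decode('utf-8')
msg = u'Uživatel {0}'.format(text)
print(msg)
```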

I'd start by checking the get_absolute_url implementation. Valid URLs can never contain unescaped non-ASCII characters, so a correct URL should always be decodable as ASCII. On the other hand, if you're using standard Django models, first_name and last_name should already be Unicode objects, so my first bet would be a buggy implementation of get_absolute_url.

1 Comment

Thanks for the explanation. The reason is so stupid: the traceback specifically said that the string that could not be decoded was the name, but the real problem was that I used get_absolute_url instead of get_absolute_url().

Answer 2

Check the types of the arguments to format; I guess they are 'str', not 'unicode'. Before using them, decode them appropriately, e.g.:

url = request.user.get_absolute_url()
if isinstance(url, str):
    print('url was str')
    url = url.decode('utf-8')  # turn the UTF-8 bytestring into a unicode object
msg = u'Uživatel <a href="{0}">...</a>'.format(url)

(The if and print statements are just for demonstration purposes.) Treat the other values the same way.
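The same idea can be wrapped in a small helper and applied to every value before formatting. This is a sketch that assumes any bytestring is UTF-8 encoded; to_text is a hypothetical name (Django's django.utils.encoding module ships a comparable helper, force_text in the 1.x releases). The isinstance(value, bytes) check works on Python 2.6+ and on Python 3, because bytes is an alias of str on Python 2.

```python
# -*- coding: utf-8 -*-
def to_text(value, encoding='utf-8'):
    """Return value as text, decoding it first if it is a bytestring (assumed UTF-8)."""
    if isinstance(value, bytes):  # on Python 2, bytes is an alias of str
        return value.decode(encoding)
    return value


# Hypothetical sample values: a mix of bytestrings and text.
url = to_text(b'/users/7/')
first_name = to_text(u'Šárka'.encode('utf-8'))
last_name = to_text(u'Nováková')

# The values should still be HTML-escaped, as the comments on the question point out.
msg = u'Uživatel <a href="{0}">{1} {2}</a>'.format(url, first_name, last_name)
print(msg)
```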

Answer 3

The solution is pretty simple: I used get_absolute_url instead of get_absolute_url(). Sorry to bother you.
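A minimal illustration of that difference, using a hypothetical stand-in class rather than a real Django model: without the parentheses, format() receives the bound method object, so the result is the method's repr rather than a URL; with them, the method is called and its return value is formatted.

```python
class User(object):
    """Hypothetical stand-in for a Django user model."""

    def __init__(self, pk):
        self.pk = pk

    def get_absolute_url(self):
        return u'/users/{0}/'.format(self.pk)


user = User(7)

# Without parentheses: the bound method object itself is formatted.
print(u'{0}'.format(user.get_absolute_url))

# With parentheses: the method is called and its return value is formatted.
print(u'{0}'.format(user.get_absolute_url()))
```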
