Python will check the first or second line for an Emacs/Vim-style encoding declaration. More precisely, the first or second line must match the regular expression "coding[:=]\s*([-\w.]+)". The first group of this expression is then interpreted as the encoding name. If the encoding is unknown to Python, an error is raised during compilation. (Source: PEP 263.)
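As a quick illustration (a sketch, not Python's actual startup code), the quoted regular expression can be applied to a typical declaration line:

```python
import re

# The regular expression quoted above from PEP 263.
coding_re = re.compile(r"coding[:=]\s*([-\w.]+)")

# A typical first line of a source file:
match = coding_re.search("# -*- coding: utf-8 -*-")
print(match.group(1))  # -> utf-8
```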
(A BOM would also make Python interpret the source as UTF-8.)

I would recommend using this declaration over .decode('utf8'):
# -*- coding: utf-8 -*-
special_char_string = u"äöüáèô"
In any case, special_char_string will then contain a unicode object, no longer a str.
As you can see, the two are semantically equivalent:
>>> u"äöüáèô" == "äöüáèô".decode('utf8')
True
And the reverse:
>>> u"äöüáèô".encode('utf8')
'\xc3\xa4\xc3\xb6\xc3\xbc\xc3\xa1\xc3\xa8\xc3\xb4'
>>> "äöüáèô"
'\xc3\xa4\xc3\xb6\xc3\xbc\xc3\xa1\xc3\xa8\xc3\xb4'
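The round trip above can be checked with a short, self-contained snippet (written here in Python 3 syntax, where the encoded result is an explicit bytes object rather than a str):

```python
# Encoding the text yields the UTF-8 byte sequence shown above,
# and decoding those bytes recovers the original string.
special = "äöüáèô"
encoded = special.encode("utf8")
print(encoded)  # -> b'\xc3\xa4\xc3\xb6\xc3\xbc\xc3\xa1\xc3\xa8\xc3\xb4'
assert encoded.decode("utf8") == special
```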
There is a technical difference, however: u"something" tells the parser directly that it is a unicode literal, so it should be a bit faster than decoding at runtime.