Python: Using .format() on a Unicode-escaped string

Question

I am using Python 2.6.5. My code requires the "greater than or equal to" sign (≥). Here it goes:

>>> s = u'\u2265'
>>> print s
≥
>>> print "{0}".format(s)
Traceback (most recent call last):
  File "<input>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2265' in position 0: ordinal not in range(128)

Why do I get this error? Is there a right way to do this? I need to use the .format() function.

Mad Scientist · Accepted Answer · 2010-07-13 08:34:48Z

251

Just make the format string a unicode string too:

>>> s = u'\u2265'
>>> print s
≥
>>> print "{0}".format(s)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2265' in position 0: ordinal not in range(128)
>>> print u"{0}".format(s)
≥
>>>
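For code in a script rather than at the prompt, the same fix can be sketched as follows. This is only an illustration: the helper name at_least is hypothetical. The code sticks to u'' literals and {0} fields so that it runs on Python 2.6+ and on Python 3.3+, where u'' literals are accepted again.

```python
GE = u'\u2265'  # U+2265 GREATER-THAN OR EQUAL TO


def at_least(value):
    """Hypothetical helper: return a Unicode label made of the sign and a value."""
    # Format string and argument are both Unicode, so .format() never has to
    # convert the result to an ASCII byte string (the source of the error).
    return u'{0} {1}'.format(GE, value)


label = at_least(5)           # Unicode text: the sign, a space, then '5'
data = label.encode('utf-8')  # encode explicitly only where bytes are needed
```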


3 Comments

Philipp Over a year ago

@Kit: If you want all literals to be Unicode (like in Python 3), put from __future__ import unicode_literals at the beginning of your source files.

Hylidan Over a year ago

Yeah, this will catch you out if you're used to % formatting: "%s" % u"\u2265" works (the result is silently promoted to unicode), but "{}".format(u"\u2265") throws an exception.

Iosu S. Over a year ago

What a simple thing... what a terrible headache I had until I found this bit of enlightenment.
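Philipp's unicode_literals suggestion can be sketched as a tiny module. With the __future__ import at the top, a plain '{0}' literal is already Unicode on Python 2, so .format() behaves as in the accepted answer without any u prefix. On Python 3 the import is accepted and has no effect. The variable names are illustrative.

```python
from __future__ import unicode_literals  # must precede all other statements in the module

s = '\u2265'    # Unicode on Python 2 as well, thanks to the import above
fmt = '{0}'     # the format string is Unicode too, so no ASCII conversion happens
result = fmt.format(s)
```

One trade-off: once the import is in place, any literal that really must be a byte string needs an explicit b prefix.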

Ignacio Vazquez-Abrams · 2010-07-13 08:35:36Z

73

Unicode strings need Unicode format strings.

>>> print u'{0}'.format(s)
≥


lps · 2019-03-18 19:15:12Z

A bit more information on why that happens.

>>> s = u'\u2265'
>>> print s

works because print encodes the unicode string using your terminal's encoding, which was likely UTF-8. (You can check with import sys; print sys.stdout.encoding.)

>>> print "{0}".format(s)

fails because format tries to produce a result of the same type as the string it is called on (I couldn't find documentation on this, but this is the behavior I've noticed). A plain string literal is a byte string (str) in Python 2, so format has to turn the unicode argument into bytes, and it does so with the default encoding, ASCII. Encoding s as ASCII is exactly what raises the exception. Observe:

>>> s = u'\u2265'
>>> s.encode('ascii')
Traceback (most recent call last):
  File "<input>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2265' in position 0: ordinal not in range(128)
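The failing step can be isolated with a small helper (the name try_encode is hypothetical) that attempts an encoding and reports failure instead of raising. It behaves the same on Python 2.6+ and Python 3. On a stock Python 2, the implicit conversion uses the encoding reported by sys.getdefaultencoding(), which is 'ascii'.

```python
s = u'\u2265'


def try_encode(text, encoding):
    """Return `text` encoded with `encoding`, or None if it cannot be represented."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None


ascii_bytes = try_encode(s, 'ascii')  # None: U+2265 is not in range(128)
utf8_bytes = try_encode(s, 'utf-8')   # three bytes: 0xE2 0x89 0xA5
```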

So that is basically why these approaches work (the auto-numbered {} field needs Python 2.7; on 2.6, as in the question, write {0} instead):

>>> s = u'\u2265'
>>> print u'{}'.format(s)
≥
>>> print '{}'.format(s.encode('utf-8'))
≥

The source character set is defined by the encoding declaration; it is ASCII if no encoding declaration is given in the source file (https://docs.python.org/2/reference/lexical_analysis.html#string-literals)

Oh, and I found this a great help in understanding Unicode in Python, and text representation in computer systems in general: nedbatchelder.com/text/unipain.html
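Both approaches come down to giving .format() a format string and arguments of the same kind. The sketch below takes the "decode early, format as Unicode" route. It assumes UTF-8 for any byte strings, and to_text and format_text are hypothetical helpers. It uses {0}-style fields so it also runs on 2.6, and it runs unchanged on Python 3.

```python
def to_text(value, encoding='utf-8'):
    """Decode byte strings to Unicode; return anything else unchanged."""
    # On Python 2.6+, bytes is an alias of str, so plain '...' literals are decoded too.
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def format_text(fmt, *args):
    """Format using a Unicode format string and Unicode (or non-string) arguments."""
    return to_text(fmt).format(*[to_text(arg) for arg in args])


result = format_text('{0} {1}', u'\u2265', b'\xe2\x89\xa5')  # the sign, twice
```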

personal_cloud · 2024-02-05 20:01:34Z

Sorry to chime in more than a decade later, but it looks like folks missed the simple answer here: str format strings need str arguments. And '≥' is a perfectly acceptable str literal, because since 2001 (PEP 263) you can declare the source encoding. This lets you use UTF-8 directly in your source code:

# -*- coding: utf-8 -*-
print '{}'.format('≥')

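Note that in Python 3, bytes objects have no format() method and bytes literals may contain only ASCII characters. So for byte strings, consider switching to % formatting (available for bytes since Python 3.5) and hex escapes: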

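import sys

def bprint(b):
    # Write raw bytes plus a newline straight to stdout, bypassing text encoding
    sys.stdout.buffer.write(b + b"\n")

bprint(b'%s' % b'\xe2\x89\xa5')  # b'\xe2\x89\xa5' is U+2265 (≥) in UTF-8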

(should you choose to port your program to Python 3 one day)
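For the Python 3 side of this answer, here is a minimal sketch (Python 3.5+, where PEP 461 added % formatting for bytes). It compares the bytes route with the more common approach of formatting text and encoding once at the end. The variable names are illustrative.

```python
GE_TEXT = '\u2265'                 # Python 3 str: text
GE_UTF8 = GE_TEXT.encode('utf-8')  # the same sign as UTF-8 bytes

# Python 3 bytes objects have no .format() method...
has_bytes_format = hasattr(bytes, 'format')

# ...but %-interpolation works on bytes (Python 3.5+)
bytes_line = b'x %s 5' % GE_UTF8

# Usually simpler: format as text, then encode once when writing out
text_line = 'x {0} 5'.format(GE_TEXT)
```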
