Wrong string formatting when using encoded unicode strings

Question

I am dealing with some unicode strings, which I encode as UTF-8 whenever I need to display them. This ensures that the proper encoding is used even when the output of my script is redirected to a file (I know there are other ways to do this, but that is not the point).

Now, sometimes I need to tabulate some data, and for that I use format specifiers, as shown below:

def tabulate(uni1, uni2):
    print "%-15s,%-15s" % (uni1.encode('utf-8'), uni2.encode('utf-8'))

print '01234567890123456789' # ruler
tabulate(u'HELLO', u'BYE')
tabulate(u'ñññññ', u'BYE')

This program produces the following output:

01234567890123456789
HELLO          ,BYE            
ñññññ     ,BYE

As you can see, the second row is not properly aligned. I guess that %s is not aware of the string's encoding and computes its length incorrectly.

Is there a solution to this problem?
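The guess above can be checked directly. The following is a minimal sketch, not part of the original post, written so that it should run on both Python 2.7 and Python 3.5+. It compares the character count of the unicode string with the byte count of its UTF-8 encoding, and shows that padding the encoded form to 15 fills by bytes. The variable names are illustrative only.

```python
# -*- coding: utf-8 -*-
text = u'ñññññ'
encoded = text.encode('utf-8')

print(len(text))      # 5 characters
print(len(encoded))   # 10 bytes: each 'ñ' takes two bytes in UTF-8

# Padding the encoded form to width 15 adds only 5 fill bytes,
# because the width is measured in bytes, not characters.
padded_bytes = b'%-15s' % encoded
print(len(padded_bytes))                  # 15 bytes

# Decoded back, that is 5 characters plus 5 spaces: 10 columns on screen.
print(len(padded_bytes.decode('utf-8')))  # 10
```

This matches the misaligned row in the output above: the comma appears after 10 columns instead of 15.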

Andrew Clark · Accepted Answer · 2012-03-20 23:41:27Z

Here is an implementation of what Ignacio pointed out: do the formatting on the unicode strings first, then encode the result, so that %-15s pads by characters rather than by bytes:

def tabulate(uni1, uni2):
    print (u"%-15s,%-15s" % (uni1, uni2)).encode('utf-8')

>>> tabulate(u'HELLO', u'BYE')
HELLO          ,BYE            
>>> tabulate(u'ñññññ', u'BYE')
ñññññ          ,BYE
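As a rough illustration of the same idea in a form that is easy to verify, here is a sketch that returns the encoded row instead of printing it. The helper name `format_row` and the `width` parameter are hypothetical and not from the answer; the code is meant to run on both Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
def format_row(uni1, uni2, width=15):
    """Pad both unicode strings to `width` characters, then encode.

    `format_row` and `width` are illustrative names, not from the answer.
    """
    # '%-*s' takes the field width from the argument tuple.
    row = u"%-*s,%-*s" % (width, uni1, width, uni2)
    return row.encode('utf-8')

row = format_row(u'ñññññ', u'BYE')

# Decoding shows that each column is 15 characters wide again.
decoded = row.decode('utf-8')
print(len(decoded))        # 31: 15 + 1 comma + 15
print(decoded.index(u','))  # 15
```

Note that this aligns by character count. Strings containing combining marks or double-width characters may still not line up on screen, since their display width differs from their length.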

Ignacio Vazquez-Abrams · Accepted Answer · 2012-03-20 23:36:01Z

Format as unicode, then encode.
