I am dealing with some unicode strings, which I am encoding using utf-8 whenever I need to display them. This way I make sure that, even when redirecting the output of my script to a file, the proper encoding is used (I know there are other ways to do this, but this is not the point).

Now, sometimes I need to tabulate some data, and for that I use format specifiers, as shown below:

def tabulate(uni1, uni2):
    print "%-15s,%-15s" % (uni1.encode('utf-8'), uni2.encode('utf-8'))

print '01234567890123456789' # ruler
tabulate(u'HELLO', u'BYE')
tabulate(u'ñññññ', u'BYE')

This program produces the following output:

01234567890123456789
HELLO          ,BYE            
ñññññ     ,BYE

As you can see, the second row is not aligned properly. I guess that %s is not aware of the string's encoding and pads it according to its length in bytes rather than in characters.
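A quick check of that guess, as a minimal sketch: it compares the character count of the unicode string with the byte count of its UTF-8 encoding, and pads each one to 15. The variable names are only illustrative, and the non-ASCII text is written with escapes so the snippet does not depend on the source file encoding. It is meant to run on Python 2.7 as well as Python 3.5+ (bytes formatting with `%` is assumed to be available).

```python
from __future__ import print_function

text = u'\xf1' * 5                 # five "n with tilde" characters
encoded = text.encode('utf-8')     # each one takes 2 bytes in UTF-8

print(len(text))                   # number of characters
print(len(encoded))                # number of bytes

# Padding the encoded bytes counts bytes, so only 5 spaces are added.
padded_bytes = b"%-15s" % encoded
# Padding the unicode text counts characters, so 10 spaces are added.
padded_text = u"%-15s" % text

print(len(padded_bytes.decode('utf-8')))   # characters actually displayed
print(len(padded_text))                    # characters actually displayed
```

The byte-padded version ends up 10 characters wide when displayed, which matches the misaligned row above.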

Is there a solution to this problem?

2 Answers

Here is an implementation for what Ignacio pointed out, which is to do the formatting before the encoding:

def tabulate(uni1, uni2):
    print (u"%-15s,%-15s" % (uni1, uni2)).encode('utf-8')

>>> tabulate(u'HELLO', u'BYE')
HELLO          ,BYE            
>>> tabulate(u'ñññññ', u'BYE')
ñññññ          ,BYE    
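The same idea can be sketched with formatting and encoding split into two steps, which makes the alignment easy to check. The helper names `format_row` and `encode_row` are hypothetical, introduced only for this illustration, and the snippet is written to run on both Python 2.7 and Python 3.

```python
from __future__ import print_function

def format_row(uni1, uni2):
    """Pad both columns while the data is still unicode text."""
    return u"%-15s,%-15s" % (uni1, uni2)

def encode_row(uni1, uni2):
    """Encode only the finished row, as the last step before output."""
    return format_row(uni1, uni2).encode('utf-8')

rows = [
    format_row(u'HELLO', u'BYE'),
    format_row(u'\xf1' * 5, u'BYE'),   # escapes avoid source-encoding issues
]

# The comma should sit at the same character position in every row.
comma_columns = [row.index(u',') for row in rows]
print(comma_columns)
```

Because the padding is computed on characters, both rows place the comma in the same column; the encoded second row is simply longer in bytes.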

Format as unicode, then encode.
