I am working with unicode strings, which I encode as UTF-8 whenever I need to display them. This ensures that the proper encoding is used even when the output of my script is redirected to a file (I know there are other ways to do this, but that is not the point).
Sometimes I need to tabulate data, and for that I use format specifiers, as shown below:
def tabulate(uni1, uni2):
    print "%-15s,%-15s" % (uni1.encode('utf-8'), uni2.encode('utf-8'))

print '01234567890123456789' # ruler
tabulate(u'HELLO', u'BYE')
tabulate(u'ñññññ', u'BYE')
This program produces the following output:
01234567890123456789
HELLO          ,BYE
ñññññ     ,BYE
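As you can see, the second row is not properly aligned. My guess is that %s is not aware of the encoding of the string: it pads based on the number of bytes in the encoded string rather than the number of characters, so it miscomputes the displayed width.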
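A quick check seems consistent with this guess. This is only a minimal sketch: the names `text`, `encoded`, and `padded` are just for illustration, and `ljust(15)` stands in for what I assume `%-15s` does when it is given an already-encoded byte string.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

# Five 'ñ' characters, written with an escape to avoid source-encoding issues
text = u'\xf1' * 5
encoded = text.encode('utf-8')

char_count = len(text)      # characters in the unicode string
byte_count = len(encoded)   # bytes after UTF-8 encoding (each 'ñ' takes two)

# Assumption: padding the encoded bytes to 15 mimics "%-15s" on a byte string
padded = encoded.ljust(15)

# Width actually seen on a UTF-8 terminal: decode and count characters
displayed_width = len(padded.decode('utf-8'))

print(char_count, byte_count, displayed_width)
```

If the guess is right, the padded field ends up narrower than 15 columns once displayed, because the padding was computed against the byte count rather than the character count.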
Is there a solution to this problem?