Why does the mysql command-line client output utf8 columns twice as wide as non-utf8 columns? Example:

$ mysql -u user --default-character-set=utf8
mysql> select "αβγαβγαβγαβγαβγαβγαβγ";
+--------------------------------------------+
| αβγαβγαβγαβγαβγαβγαβγ                      |
+--------------------------------------------+
| αβγαβγαβγαβγαβγαβγαβγ                      |
+--------------------------------------------+
1 row in set (0.00 sec)

mysql> select "abcabcabcabcabcabcabc";
+-----------------------+
| abcabcabcabcabcabcabc |
+-----------------------+
| abcabcabcabcabcabcabc |
+-----------------------+
1 row in set (0.00 sec)

As you can see, the column in the first table is twice as wide as the column in the second, and this often breaks the formatting once lines get wider than half the screen.

I tried this with the mysql client Ver 14.14 (MySQL) and Ver 15.1 (MariaDB).

Is there a way to output utf8 columns with the same width as non-utf8 columns?

edit:

MariaDB [(none)]> show variables like 'char%';
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | utf8                       |
| character_set_connection | utf8                       |
| character_set_database   | utf8                       |
| character_set_filesystem | binary                     |
| character_set_results    | utf8                       |
| character_set_server     | utf8                       |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
  • SHOW VARIABLES LIKE 'char%'; -- I suspect something is set wrong. Greek letters take 2 bytes in utf8. Commented Sep 9, 2017 at 16:40
  • @RickJames - Added the output of that command to the question. I do know that most non-latin1 chars take 2 bytes in utf8, but why does mysql use the byte count instead of the character count when computing the column display width? Commented Sep 9, 2017 at 16:56
  • Hi @Rogach, I'm having the same issue with other utf8 languages. Have you found a solution to it? Commented Mar 24, 2021 at 18:01

1 Answer

The explanation is in mysql.cc, the source file for the mysql client. The comment block for the function get_field_disp_length(), which is used in formatting result set output, says:

Return the length of a field after it would be rendered into text.

This doesn't know or care about multibyte characters. Assume we're using such a charset. We can't know that all of the upcoming rows for this column will have bytes that each render into some fraction of a character. It's at least possible that a row has bytes that all render into one character each, and so the maximum length is still the number of bytes. (Assumption 1: This can't be better because we can never know the number of characters that the DB is going to send -- only the number of bytes. 2: Chars <= Bytes.)

In other words, UTF-8 is a variable-length encoding in which some characters (such as basic Latin letters) take 1 byte each, and the client can't know what the data is before it fetches it for display. It only knows the length in bytes, so it must assume that any or all of those bytes may be one-byte characters and reserves one display column per byte.
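
To make the byte-versus-character gap concrete, here is a minimal Python sketch. It is not the mysql client's code: the helper names `byte_width`, `char_width`, and `border` are hypothetical, and the "+2" padding is an assumption read off the tables in the question (one space on each side of the cell).

```python
# Illustration only; these helpers are hypothetical, not taken from mysql.cc.

def byte_width(text: str, encoding: str = "utf-8") -> int:
    """Width a formatter would reserve if it counted encoded bytes."""
    return len(text.encode(encoding))


def char_width(text: str) -> int:
    """Number of Unicode code points in the string."""
    return len(text)


def border(width: int) -> str:
    """Table border for one cell, assuming one space of padding per side."""
    return "+" + "-" * (width + 2) + "+"


greek = "αβγ" * 7  # the string from the first query in the question
latin = "abc" * 7  # the string from the second query

for label, value in (("greek", greek), ("latin", latin)):
    print(label, "chars:", char_width(value), "bytes:", byte_width(value))
# greek chars: 21 bytes: 42
# latin chars: 21 bytes: 21

print(border(byte_width(greek)))  # sized by bytes
print(border(char_width(greek)))  # sized by characters
```

Both strings have 21 characters, but the Greek one occupies 42 bytes in UTF-8, which matches the roughly doubled column width shown in the question.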

The story might be different if you used a character set that uses a constant 2 bytes per character, like UCS-2. But I have never heard of anyone using UCS-2, since MySQL supports variable-length Unicode encodings.
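
As a rough illustration of the fixed-width case, the Python sketch below compares byte counts only. It assumes UTF-16-BE as a stand-in for UCS-2 (Python has no codec named "ucs-2", and the two agree for characters in the Basic Multilingual Plane), and the helper name `fixed_width_bytes` is hypothetical. It says nothing about what the mysql client would actually print with such a character set.

```python
# Illustration only; UTF-16-BE is used here as an assumed stand-in for UCS-2.

def fixed_width_bytes(text: str) -> int:
    """Encoded length in bytes under a 2-bytes-per-character encoding (BMP only)."""
    return len(text.encode("utf-16-be"))


greek = "αβγ" * 7
latin = "abc" * 7

for label, value in (("greek", greek), ("latin", latin)):
    n_bytes = fixed_width_bytes(value)
    # With a constant 2 bytes per character, the character count is bytes // 2.
    print(label, "bytes:", n_bytes, "chars:", n_bytes // 2)
# greek bytes: 42 chars: 21
# latin bytes: 42 chars: 21
```

With a constant number of bytes per character, the byte length alone is enough to recover the character count, which is the property the variable-length UTF-8 case lacks.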
