5

I'm working on a script that displays UTF-8 characters as output. In my Gnome Terminal, this prints out a pretty maple leaf (🍁):

$ echo -e '\xF0\x9F\x8D\x81'

In rxvt, it prints out a box (the character it uses for "unknown"). The locale is UTF-8 in both, but the fonts are different. Is there a way to determine, on a user's machine, whether certain characters are supported?

6
  • I'm assuming you are using some flavour of Linux - but you should specify the distro and version as I think the answer will likely depend on it (detect installation of various font packages). Commented May 23, 2016 at 18:48
  • I'm wondering how to do this in the general case (for other users) not for my machine. Commented May 23, 2016 at 18:57
  • I don't know the definitive answer, but this might help, if you can first programmatically figure out which font a given terminal is using Commented May 23, 2016 at 18:59
  • Possible duplicate of Get the display width of a string of characters Commented May 23, 2016 at 23:18
  • Does aterm support UTF-8 at all? Doesn't seem so. Commented May 24, 2016 at 11:46

1 Answer

7

An application running in a terminal has no way to find out from the terminal what the glyphs that the terminal has drawn look like (or even if they are substitute/placeholder characters).

One thing the application can do is find out if the terminal supports UTF-8 at all, and if it does, if it supports variable width characters. The method is as follows:

  • Read the cursor position by writing ESC [ 6 n and expecting ESC [ line ; col R in reply.
  • Write the 2-byte sequence "\xc2\xa0". If the terminal supports UTF-8, this is a single nonbreaking space. If the terminal does not support UTF-8, it's something unknown that probably occupies 2 character positions (probably Â followed by a nonbreaking space, in fact).
  • Read the cursor position again and find out whether the cursor moved by one position or two.
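The three steps above can be sketched in Python roughly as follows. This is a minimal sketch, not a definitive implementation: it assumes a POSIX system with a controlling terminal at /dev/tty, and the helper names (parse_cpr, cursor_position, columns_advanced, terminal_decodes_utf8) are made up for illustration. It also moves to the start of the current line before probing and erases the line afterwards, which is a design choice of this sketch rather than part of the method.

```python
import os
import re
import termios
import tty

# Cursor position report sent by the terminal: ESC [ line ; col R
CPR = re.compile(rb"\x1b\[(\d+);(\d+)R")


def parse_cpr(reply):
    """Return (line, col) parsed from a cursor position report, or None."""
    match = CPR.search(reply)
    return (int(match.group(1)), int(match.group(2))) if match else None


def cursor_position(fd):
    """Send ESC [ 6 n on fd and read the reply; fd must already be in raw mode."""
    os.write(fd, b"\x1b[6n")
    reply = b""
    while not reply.endswith(b"R"):
        byte = os.read(fd, 1)
        if not byte:  # read timed out: the terminal did not answer
            return None
        reply += byte
    return parse_cpr(reply)


def columns_advanced(data):
    """Write data at the start of the line; return how many columns the cursor moved.

    Returns None when there is no controlling terminal or it does not reply.
    """
    try:
        fd = os.open("/dev/tty", os.O_RDWR | os.O_NOCTTY)
    except OSError:
        return None
    before = after = None
    try:
        saved = termios.tcgetattr(fd)
        try:
            tty.setraw(fd)
            attrs = termios.tcgetattr(fd)
            attrs[6][termios.VMIN] = 0    # allow reads to return empty...
            attrs[6][termios.VTIME] = 10  # ...after 1 second without input
            termios.tcsetattr(fd, termios.TCSANOW, attrs)
            os.write(fd, b"\r")  # start from column 1 so the probe cannot wrap
            before = cursor_position(fd)
            if before is not None:
                os.write(fd, data)
                after = cursor_position(fd)
                os.write(fd, b"\r\x1b[K")  # go back and erase the probe
        finally:
            termios.tcsetattr(fd, termios.TCSADRAIN, saved)
    except termios.error:
        return None
    finally:
        os.close(fd)
    if before is None or after is None:
        return None
    return after[1] - before[1]


def terminal_decodes_utf8():
    """True if U+00A0, sent as 2 bytes of UTF-8, advanced the cursor one column."""
    moved = columns_advanced(b"\xc2\xa0")
    return None if moved is None else moved == 1
```

Calling terminal_decodes_utf8() from a script attached to a terminal should give True when the cursor moved one column, False when it moved some other distance, and None when the terminal never answered the query.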

If the terminal does support UTF-8, you can find out whether it supports variable character widths using basically the same trick. Read the cursor position, write a character which is supposed to be double-width in monospace fonts, such as あ, then read the cursor position again. If the terminal does not support double-width characters, the cursor will probably have naively moved by only one position.
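A hedged sketch of the double-width check, again in Python. The expected width here comes from the standard unicodedata module's East Asian Width property, which is only a convention and may not match what a given terminal does. The measure argument is a hypothetical callable, supplied by the caller, that writes bytes to the terminal and returns how many columns the cursor advanced, using the cursor position query described above.

```python
import unicodedata


def expected_columns(ch):
    """Columns a character is conventionally given in a monospace grid.

    Wide (W) and Fullwidth (F) characters count as 2, everything else as 1.
    Combining marks and other zero-width cases are ignored in this sketch.
    """
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1


def handles_wide_characters(measure, probe="あ"):
    """Compare the measured cursor advance for probe with the expected width.

    measure(data: bytes) is assumed to return the advance in columns,
    or None if the terminal did not reply.
    """
    moved = measure(probe.encode("utf-8"))
    if moved is None:
        return None
    return moved == expected_columns(probe)
```

As the answer notes, this only reveals the width the terminal assigns to the character. It says nothing about whether the font actually has a glyph for it.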

5
  • 1
    That's only half the problem: the terminal may "know" a width for a character, but not display it due to font limitations. Also, the width may not be what you expect. Commented May 23, 2016 at 20:46
  • 1
    Character set detection by width packaged in a script Commented May 23, 2016 at 22:58
  • 1
    As Celada correctly points out, there's no way to detect whether a glyph is displayed correctly. Although the answer correctly shows how to detect whether UTF-8 is supported, I recommend that you not do this. Any emulator not supporting UTF-8 should've been ditched a long time ago. If the terminal's behavior doesn't match the locale's charset, all the applications will fall apart big time. Imagine if every app repeated this check, and then... then what? It's not feasible. Apps should assume that the underlying system is set up correctly. If you really care, I recommend adding a FAQ entry. Commented May 24, 2016 at 12:10
  • @egmont is quite right to recommend not using the trick I proposed in general. It's from a login script I wrote in 2003 whose job was to autodetect the UTF-8 support of the terminal and set the locale appropriately. A normal app should just assume the locale is set right without testing. I agree that if you need this kind of trick in 2016 you have a sorry system. Commented May 25, 2016 at 19:38
  • "An application running in a terminal has no way to find out from the terminal what the glyphs that the terminal has drawn look like (or even if they are substitute/placeholder characters)": This is not quite true, at least on Linux systems. It is sometimes possible to figure out what font a given terminal is using by reading configuration variables, and then detect supported glyphs by examining the corresponding font file. The setfont command can also be used to output a map from Unicode code points to terminal font code points, which provides clues about which glyphs can be rendered. Commented Jun 2, 2021 at 13:06
