I'm working on an embedded Linux distribution based on Yocto Morty.

I have used an Ubuntu distribution to create the following two files:

  • fòò.dàt
  • bàr.dàt

I copied the files to a pendrive and connected the pendrive to my embedded system.

I used PuTTY to connect to the embedded system over a serial line and browse the contents of the pendrive. The files are listed as follows:

root@imx6qsabresd:/media/linux_desktop# ls -la
total 8
drwxr-xr-x 2 root root 4096 Mar 17  2017 .
drwxr-xr-x 9 root root 4096 Jan  1  1970 ..
-rwxr-xr-x 1 root root    0 Mar 17  2017 b?r.d?t
-rwxr-xr-x 1 root root    0 Mar 17  2017 f??.d?t

The locale of the Ubuntu distribution is:

user@user-VirtualBox:~$ locale
LANG=en_US.UTF-8
LANGUAGE=en_US
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC=it_IT.UTF-8
LC_TIME=it_IT.UTF-8
LC_COLLATE="en_US.UTF-8"
LC_MONETARY=it_IT.UTF-8
LC_MESSAGES="en_US.UTF-8"
LC_PAPER=it_IT.UTF-8
LC_NAME=it_IT.UTF-8
LC_ADDRESS=it_IT.UTF-8
LC_TELEPHONE=it_IT.UTF-8
LC_MEASUREMENT=it_IT.UTF-8
LC_IDENTIFICATION=it_IT.UTF-8
LC_ALL=

The locale of the embedded distribution is:

root@imx6qsabresd:/media/linux_desktop# locale
LANG=en_US
LC_CTYPE="en_US"
LC_NUMERIC="en_US"
LC_TIME="en_US"
LC_COLLATE="en_US"
LC_MONETARY="en_US"
LC_MESSAGES="en_US"
LC_PAPER="en_US"
LC_NAME="en_US"
LC_ADDRESS="en_US"
LC_TELEPHONE="en_US"
LC_MEASUREMENT="en_US"
LC_IDENTIFICATION="en_US"
LC_ALL=en_US

Even though the .UTF-8 suffix isn't explicit, I assume the embedded system locale uses UTF-8 because:

root@imx6qsabresd:/media/linux_desktop# locale charmap
UTF-8

See https://stackoverflow.com/a/42797421/5321161 for further details.

Below is the list of locales currently installed on my embedded distribution:

root@imx6qsabresd:/media/linux_desktop# locale -a
C
de_DE
en_GB
en_GB.ISO-8859-1
en_US
en_US.ISO-8859-1
fr_FR
POSIX
zh_CN

The PuTTY terminal emulator is configured to use UTF-8 as the remote character set.

Why are the accented characters replaced by question marks?

  • Have you verified your assertion by setting the locale correctly? (en_US.UTF-8) Do you have a clue on how the file names are actually written on the disk? It may not be UTF at all. You should do some tests that don't rely on the existing saved file names to make sure the communication link is not partially responsible for the issue either. Commented Mar 17, 2017 at 15:57
  • Either those characters are not recognised as printable in the locale (unlikely), or more likely, the mount options of the device make it that the file names are translated to a different charset. What's the output of ls | LC_ALL=C send -n l on the embedded system? Or it could very well be that that ls doesn't support localisation (though I'd expect to see f????.d??t if it were UTF-8 characters). Commented Mar 17, 2017 at 16:13
  • @JuliePelletier: I don't have any en_US.UTF-8 locale on my embedded system, en_US is the UTF-8 encoded version. I've checked the file name using the answer provided here: unix.stackexchange.com/a/351899/191458, e.g. fòò.dàt is encoded as: 66 c3 b2 c3 b2 2e 64 c3 a0 74, that corresponds to UTF-8 Commented Mar 17, 2017 at 16:19
  • @StéphaneChazelas: I don't have "send" on my BSP, I'll try to add it. I confirm that "ls" supports localization, I tried to set zh_CN locale (export LANGUAGE=zh_CN) and "ls --help" shows the man in chinese, see:"用法:ls [选项]... [文件]..." Commented Mar 17, 2017 at 16:25
  • Sorry typo, I meant sed, not send. Or use od -tx1 -tc Commented Mar 17, 2017 at 16:50
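
The byte-level check suggested in the comments can be sketched roughly as follows. This is only an illustration and not part of the original thread: the helper name dump_names is hypothetical, and it assumes a POSIX shell with od and tr available on the target. It prints the bytes of every entry name in a directory as hex, so a UTF-8 encoded ò would show up as c3 b2, while an ISO-8859-1 encoded one would show up as a single f2.

```shell
#!/bin/sh
# dump_names DIR: print the raw bytes of every entry name in DIR as hex.
# Hypothetical helper, for illustration only.
dump_names() {
    dir=${1:-.}
    for path in "$dir"/*; do
        [ -e "$path" ] || continue      # skip the literal pattern if nothing matched
        name=${path##*/}                # strip the directory prefix
        # od -An: no offset column, -tx1: one byte per hex group;
        # tr joins od's output lines and squeezes repeated spaces.
        printf '%s' "$name" | od -An -tx1 | tr -s ' \n' ' '
        printf '\n'
    done
}
```

For example, dump_names /media/linux_desktop would list one line of hex bytes per file on the pendrive.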

1 Answer

The problem was caused by the way the pendrive was mounted. I usually mount the device without specifying any options, e.g.:

mount /dev/sdb1 /media

The result is:

/dev/sdb1 on /media type vfat (rw,relatime,fmask=0022,dmask=0022,codepage=437,iocharset=iso8859-1,shortname=mixed,errors=remount-ro)

According to the mount man page (https://linux.die.net/man/8/mount), the default value of the iocharset option is iso8859-1.
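
To make the effect of that default more concrete, the sketch below prints the bytes of the name fòò.dàt in both encodings. It is an illustration under assumptions, not something from the original setup: the helper name hex_of is hypothetical, the names are written with octal escapes so the script stays pure ASCII, and od and tr are assumed to be available.

```shell
#!/bin/sh
# hex_of FORMAT: print the bytes produced by a printf format string as hex.
# Hypothetical helper, for illustration only.
hex_of() {
    printf "$1" | od -An -tx1 | tr -s ' \n' ' '
    printf '\n'
}

# "fòò.dàt" in the two encodings, written with octal escapes.
utf8_name='f\303\262\303\262.d\303\240t'   # UTF-8:      ò = c3 b2, à = c3 a0
latin1_name='f\362\362.d\340t'             # ISO-8859-1: ò = f2,    à = e0

printf 'UTF-8:     '; hex_of "$utf8_name"
printf 'ISO-8859-1:'; hex_of "$latin1_name"
```

In the ISO-8859-1 form each accented letter is a single byte (f2 or e0) that is not a valid UTF-8 sequence on its own. That is consistent with ls on a UTF-8 terminal showing exactly one question mark per accented letter, as in f??.d?t.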

Mounting the pendrive with the option iocharset=utf8 solved the problem:

mount -o iocharset=utf8 /dev/sdb1 /media
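
A quick way to confirm which iocharset is actually in effect for a mount point is to read it back from the mount table. The sketch below is a minimal example under assumptions: the function name iocharset_of is hypothetical, awk is assumed to be present on the target, and the mount table is assumed to be /proc/mounts unless another file is passed in.

```shell
#!/bin/sh
# iocharset_of MOUNTPOINT [MOUNTS_FILE]
# Print the iocharset option of the filesystem mounted at MOUNTPOINT.
# MOUNTS_FILE defaults to /proc/mounts; a different file can be passed
# to try the parsing on saved output. Hypothetical helper.
iocharset_of() {
    mnt=$1
    mounts=${2:-/proc/mounts}
    # Mount table fields: device mountpoint fstype options dump pass
    awk -v mnt="$mnt" '
        $2 == mnt {
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++)
                if (opts[i] ~ /^iocharset=/) {
                    sub(/^iocharset=/, "", opts[i])
                    print opts[i]
                }
        }' "$mounts"
}
```

Running iocharset_of /media before and after remounting shows whether the option was picked up. It prints nothing if the mount point is not found or the filesystem does not report an iocharset option.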

With this option, file names containing non-ASCII characters are displayed correctly in the console:

root@imx6qsabresd:/media/win/mix# ls -la
total 28
drwxr-xr-x 7 root root 4096 Mar 13 15:19 .
drwxr-xr-x 9 root root 4096 Mar 16  2017 ..
drwxr-xr-x 2 root root 4096 Mar 13 15:13 Île-de-France
-rwxr-xr-x 1 root root    0 Mar 13 15:13 Île-de-France.txt
drwxr-xr-x 2 root root 4096 Mar 13 15:14 madrileños
-rwxr-xr-x 1 root root    0 Mar 13 15:15 madrileños.txt
drwxr-xr-x 2 root root 4096 Mar 13 14:58 mà_però
-rwxr-xr-x 1 root root    0 Mar 13 14:57 mà_però.txt
drwxr-xr-x 2 root root 4096 Mar 13 15:12 Märkisch-Oderland
-rwxr-xr-x 1 root root    0 Mar 13 15:13 Märkisch-Oderland.txt
drwxr-xr-x 2 root root 4096 Mar 13 15:08 أبو ظبي
-rwxr-xr-x 1 root root    0 Mar 13 15:09 أبو ظبي.txt