
When I run the "tree" command in the console, here's what I get:

.
├── Annexe\ 1\ -\ Sch\303\251ma\ global\ de\ la\ base\ de\ donn\303\251es.raw
...

The result contains escaped UTF-8 byte sequences. I need the file names in human-readable form for a report. How can I convert that nasty thing?

  • You can try exporting LC_ALL=C. Commented May 13, 2014 at 14:18
  • There is no change. Commented May 13, 2014 at 14:21
  • What does the locale command report? Commented May 13, 2014 at 14:22
  • What's the output of printf %s Annex*.raw | hd? Commented May 13, 2014 at 15:00
  • In which console? Linux? Commented May 13, 2014 at 23:19

2 Answers

6

You can specify any character set you want it to use with the --charset switch.

   --charset charset
          Set the character set to use when outputting HTML and for line 
          drawing.

There are also these 2 switches which may help:

   -q     Print non-printable characters in filenames as question marks 
          instead of the default.

   -N     Print non-printable characters as is instead of as escaped octal 
          numbers.
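
If the escaped listing has already been saved (for instance, pasted into a draft of the report), the octal escapes can also be decoded after the fact. The following is only a sketch: the function name decode_tree_escapes is hypothetical, and it assumes the only escapes present are backslash-space and three-digit octal sequences like the ones shown in the question. Any other backslash sequence in a file name would be interpreted by printf's %b as well.

```shell
# Hypothetical helper: turn "\ " into a space and "\ooo" octal escapes
# into the raw bytes they stand for. Reads stdin, writes stdout.
decode_tree_escapes() {
    # printf's %b expects octal escapes in the form \0ooo, so insert the 0.
    sed -e 's/\\ / /g' -e 's/\\\([0-7][0-7][0-7]\)/\\0\1/g' |
    while IFS= read -r line; do
        printf '%b\n' "$line"
    done
}

# Example: decode one escaped name fragment.
printf '%s\n' 'Sch\303\251ma\ global' | decode_tree_escapes
```

The decoded bytes are UTF-8, so they only display as "Schéma global" on a terminal that is itself configured for UTF-8.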

Also you can augment the output using these switches:

   -A     Turn on ANSI line graphics hack when printing the indentation 
          lines.

   -S     Turn on ASCII line graphics (useful when using Linux console mode 
          fonts). This option is now equivalent to `--charset=IBM437' and 
          may eventually be depreciated.
2

I can get that output with:

LC_ALL=C tree -A

You'd see \303\251 if tree thought that the bytes 0303 and 0251 were not valid characters (or a valid sequence of characters) in your locale.

However, those bytes are valid in UTF-8 locales, where \303\251 is é, and in iso-8859-1 or iso-8859-15 (the two single-byte charsets common in French-speaking countries), where \303 is Ã and \251 is ©.
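
As a quick check of that byte-level reading, the two octal escapes can be written out with printf and dumped as hex. This is a minimal sketch using only printf, od and tr; the variable name bytes is just for illustration.

```shell
# Write the bytes named by the octal escapes \303 and \251, then dump them as hex.
bytes=$(printf '\303\251' | od -An -tx1 | tr -d ' \n')
echo "$bytes"   # prints c3a9
```

0xC3 0xA9 is the two-byte UTF-8 encoding of é (U+00E9), while a single-byte charset reads the same two bytes as two separate characters.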

So this suggests you're in a locale whose charset only defines the first 128 byte values, such as ASCII in the C locale.

You could tell tree that your charset is UTF-8 or iso-8859-15, and then it would not translate those 0303 bytes to \303.

locale -a will tell you if there's a locale on your system with a UTF-8 charset. Then you can pick one like fr_FR.UTF-8:

LC_ALL=fr_FR.UTF-8 tree
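
If you are not sure which UTF-8 locale exists on the machine, the output of locale -a can be filtered. The helper below is a sketch; pick_utf8_locale is a hypothetical name, and it assumes locale names end in a suffix such as .utf8 or .UTF-8 (names with a trailing @modifier are skipped).

```shell
# Hypothetical helper: read locale names on stdin and print the first one
# whose name ends in a UTF-8 charset suffix (.utf8 or .UTF-8, any case).
pick_utf8_locale() {
    grep -i -E '\.utf-?8$' | head -n 1
}

# Typical use (assumes tree is installed):
#   loc=$(locale -a | pick_utf8_locale)
#   [ -n "$loc" ] && LC_ALL=$loc tree
```

Reading the list from stdin keeps the helper independent of what is installed, so it can be tried on a hand-written list first.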

But then, whether it's going to be displayed properly or not will depend on what your terminal emulator understands. If it's not configured to display UTF-8 characters, it won't work.

If your terminal emulator displays iso-8859-1 rather than UTF-8, you could make tree output UTF-8 and convert that with iconv (without -t, iconv converts to the charset of your current locale):

LC_ALL=fr_FR.UTF-8 tree | iconv -f UTF-8
