60 votes
Accepted

Why did {} start appearing as äå in Terminal.app?

I can reproduce it with the xterm terminal emulator (version 366), if I do: $ printf '\e[?42h\e(H'; cat chars.txt; printf '\e(B\e[?42l' !É#$%Ü&*()_+äå Where: \e[?42h. Enables National ...
Stéphane Chazelas
27 votes
Accepted

Strange character in a file

This file contains bytes C2 96, which are the UTF-8 encoding of codepoint U+0096. That codepoint is one of the C1 control characters commonly called SPA "Start of Guarded Area" (or "Protected Area"). ...
Michael Homer
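
As a rough illustration of the excerpt above (a Python sketch, not part of the original answer), the two bytes C2 96 decode as UTF-8 to the single code point U+0096, which Unicode places in the control character category:

```python
import unicodedata

raw = bytes([0xC2, 0x96])          # the two bytes reported in the file
text = raw.decode("utf-8")         # decodes to exactly one code point

print(len(text))                   # number of code points
print(f"U+{ord(text):04X}")        # code point in U+ notation
print(unicodedata.category(text))  # Unicode general category
```

The general category printed for this code point is `Cc` (control), which is why a terminal or editor shows nothing sensible for it.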
26 votes
Accepted

UTF-8 characters in POSIX shell script *comments* - anything against it?

POSIX specifies how tokens should be recognised, including comments: If the current character is a '#', it and all subsequent characters up to, but excluding, the next <newline> shall be ...
Stephen Kitt
17 votes

How can I correctly decompress a ZIP archive of files with Hebrew names?

I had success with the command 7z x <source.zip>. Version: p7zip Version 16.02 (locale=utf8,Utf16=on,HugeFiles=on,64 bits,[...]) Potentially relevant environment: LANG=en_US.UTF-8 LC_ALL=...
vsz
  • 171
13 votes

Can not use `cut -c` (`--characters`) with UTF-8?

colrm (part of util-linux, should be already installed on most distributions) seems to handle internationalization much better : $ echo 'αβγ' | colrm 3 αβ $ echo 'αβγ' | colrm 2 α Beware of the ...
Skippy le Grand Gourou
13 votes

Why do some characters show as squares in Chrome?

There's a better way to determine what font you're missing instead of blindly installing font packages. For example I did the following to resolve missing fonts: I received an email with two unknown ...
2bluesc
  • 479
12 votes
Accepted

How to translate Unicode characters?

Both GNU and BSD sed are multibyte-aware in appropriate locales, and the y command is analogous to tr: $ echo hello | sed -e 'y/abcdefghijklmnopqrstuvwxyz/abcdefghijklmnopqrstuvwxyz/' hello This ...
Michael Homer
12 votes
Accepted

Problems with UTF-8 when attaching to a tmux session over ssh

You're checking the locale settings inside the tmux sessions, but not those that tmux itself receives. server.ltd likely doesn't have AcceptEnv LANG LC_* in its sshd_config and/or you don't have ...
Stéphane Chazelas
12 votes

UTF-8 characters in POSIX shell script *comments* - anything against it?

The accepted answer is fine, but let me explain the same with a slightly different angle: POSIX is very exact and complete in its handling of character encodings. That is, any conceivable effect of ...
AnoE
  • 947
12 votes
Accepted

Revert filenames after they were garbled by using different encoding

Those look like file names that were initially encoded in CP866 but were incorrectly converted to UTF-8 assuming they were encoded in MAC-CYRILLIC instead. $ echo СМП структура | iconv -t cp866 | # ...
Stéphane Chazelas
11 votes
Accepted

View file names in hex?

Pipe the file names to od or a similar tool: printf '%s\n' * | od -t x1 -a $ ls Accentué bar foo $ printf '%s\n' * | od -t x1 -a 0000000 41 63 63 65 6e 74 75 c3 a9 0a 62 61 72 0a 66 ...
Stephen Kitt
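
The same byte view can be sketched in Python. This is only an illustration, assuming the file name is stored as UTF-8; the sample name is taken from the excerpt above, and everything else is hypothetical:

```python
name = "Accentu\u00e9"            # "Accentué", with a precomposed e-acute
encoded = name.encode("utf-8")    # assumption: file names are UTF-8 encoded

print(encoded.hex(" "))           # space-separated hex, one pair per byte
```

The last two bytes, `c3 a9`, are the UTF-8 encoding of the accented letter, matching the start of the od output quoted above.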
10 votes

How to print a variable that contains unprintable characters?

Some various approaches at giving visual representations of strings: POSIX $ printf %s "$IFS" | od -vtc -to1 0000000 \t \n \0 040 011 012 000 0000004 $ printf '%s\n' "$IFS" | LC_ALL=C ...
Stéphane Chazelas
10 votes
Accepted

What's the right way to base64 encode a binary file on CentOS 7?

$ echo foo |base64 Zm9vCg== $ echo foo |base64 |wc -c 9 Note the trailing newline in the output of base64, it's the ninth character here. For longer input, it'll produce more than one line, as it ...
ilkkachu
  • 148k
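
A small Python sketch of the byte count in the excerpt above. It is not taken from the answer, and the trailing newline is added by hand here to mimic what the command-line tool prints:

```python
import base64

data = b"foo\n"                    # what `echo foo` writes, newline included
encoded = base64.b64encode(data)   # b64encode itself adds no trailing newline
cli_output = encoded + b"\n"       # assumption: the CLI appends one newline

print(encoded)                     # the 8 base64 characters
print(len(cli_output))             # 8 characters plus the newline
```

The encoded text is 8 bytes long, so the ninth byte counted by wc -c is the newline.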
10 votes

How is data encoded in pipes/STDOUT/STDIN?

I'll address each of your points below: Pipes deal with binary, and are agnostic to the encoding Correct. Applications on each side of the pipe (including STDOUT/STDIN) should have consensus on the ...
Andy Dalton
  • 14.7k
10 votes

What is the difference between a byte and a character (at least *nixwise)?

POSIXly, emphasis mine: 3.87 Character A sequence of one or more bytes representing a single graphic symbol or control code. In practice, the exact meaning depends on the locale in effect, e.g. ...
ilkkachu
  • 148k
10 votes

What is "modifier" in locale name?

There is no single unified meaning for the modifier. For example, in the early 2000s, when parts of the EU transitioned from their own national currencies to the Euro, the @euro modifier was used to ...
Jörg W Mittag
9 votes

Why do some characters show as squares in Chrome?

Installing the Noto fonts from Google did it for me: yay -S noto-fonts Now, reload the font cache: fc-cache -vf
Manuel Schmitzberger
9 votes
Accepted

Unexpected non-null encoding of /proc/<pid>/cmdline

at least one uses spaces for delimiters Incorrect. If you look at the end of the pseudo-file on FreeBSD/TrueOS, where you can encounter exactly the same behaviour with Chromium, you will find a &#...
JdeBP
  • 71.9k
9 votes
Accepted

What is `â<80><98>` and how to avoid it?

Your distribution uses UTF-8 character encoding. This is normal for most current distributions. What you see is the effect of UTF-8 coded characters displayed as another encoding. Many GNU utilities ...
RalfFriedl
  • 9,239
8 votes

How can I correctly decompress a ZIP archive of files with Hebrew names?

I have just had the same problem, and it turns out that my version of unzip that is available from Ubuntu repositories (UnZip 6.00 of 20 April 2009, by Debian. Original by Info-ZIP.) can handle ...
Igor Zinov'yev
8 votes
Accepted

Bash convert \xC3\x89 to É?

Hexadecimal numeric constants are usually represented with the 0x prefix. Character and string constants may express character codes in hexadecimal with the prefix \x followed by two hex digits. echo -ne '\...
RomanPerekhrest
8 votes

Generate collating order of a list of individual characters

There are several aspects to that. We need to list all the characters in the locale's charset, select the graphical ones (like your 33 to 126 ASCII ones) and sort them. And there's also the question ...
Stéphane Chazelas
7 votes

How to print a variable that contains unprintable characters?

Especially with IFS, you absolutely want to quote it, since otherwise it, well, turns to nothing. You did that already, so no problem there. As for echo, it depends on the shell. Some versions of echo ...
ilkkachu
  • 148k
7 votes
Accepted

How do I properly convert the file to UTF-16LE encoding without strange characters appearing in the file?

Your vim hasn't recognised the encoding, and is showing the 16-bit characters as 8-bit characters. The ^@ markers represent the higher order 8-bits, which for common Latin characters will be zero ...
Chris Davies
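
To make the zero bytes mentioned above concrete, here is a minimal Python sketch (an illustration only, using a hypothetical sample string) of how plain Latin text looks once encoded as UTF-16LE:

```python
text = "abc"                          # hypothetical sample, plain ASCII letters
encoded = text.encode("utf-16-le")    # two bytes per character, low byte first

print(encoded)                        # every second byte is a NUL
high_bytes = encoded[1::2]            # the high-order byte of each 16-bit unit
print(high_bytes == b"\x00" * len(text))
```

An editor that reads these bytes as an 8-bit encoding shows each NUL as ^@ between the letters.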
7 votes

What is "modifier" in locale name?

The @modifier setting specifies a variant: a minor addition to the encoding set. As an example: European countries have long relied on ISO definitions. Some French, for example (language fr, ...
MC68020
  • 8,617
7 votes
Accepted

Collect chars from strings and print their unicode

With perl: perl -C -lne ' if (/=(.*)/) {$c{$_}++ for split //, $1} END{print join ",", map {sprintf "0x%X", ord$_} sort keys %c} ' your-file Gives: 0x42,0x46,0x61,0x63,0x64,...
Stéphane Chazelas
7 votes
Accepted

Converting from ISO-IR-87 to UTF-8 encoding

GNU recode seems to support it: $ recode -l | grep -i ISO-IR-87 JIS_X0208 csISO87JISX0208 ISO-IR-87 JIS0208 JISX0208.1983-0 JISX0208.1990-0 JIS_X0208-1983 JIS_X0208-1990 X0208 So: recode ISO-IR-87.....
Stéphane Chazelas
7 votes
Accepted

Allow unicode characters in zsh shell variable names on MacOS

Zsh variable names have to be made of alphanumeric characters only, and the first one can't be an ASCII digit, which is reserved for positional parameters. When the posixidentifiers option is enabled (...
Stéphane Chazelas
6 votes
Accepted

How to make `less` understand codepage?

Running less as LC_ALL=ru_RU.CP1251 less file provided that ru_RU.CP1251 locale exists on your system (see if LC_ALL=ru_RU.CP1251 locale charmap returns CP1251) tells less that you are in that locale,...
Stéphane Chazelas

Only top scored, non community-wiki answers of a minimum length are eligible