LC_ALL is the environment variable that overrides all the other localisation settings (except $LANGUAGE under some circumstances).
Different aspects of localisation (like the thousand separator or decimal point character, character set, sorting order, month and day names, currency symbol) can be set using a few variables.
You'll typically set $LANG to your preference. The individual LC_xxx variables each override one aspect, and LC_ALL overrides them all. The locale command, when called without arguments, gives a summary of the current settings.
For instance, on a GNU system, I get:
$ locale
LANG=en_GB.UTF-8
LANGUAGE=
LC_CTYPE="en_GB.UTF-8"
LC_NUMERIC="en_GB.UTF-8"
LC_TIME="en_GB.UTF-8"
LC_COLLATE="en_GB.UTF-8"
LC_MONETARY="en_GB.UTF-8"
LC_MESSAGES="en_GB.UTF-8"
LC_PAPER="en_GB.UTF-8"
LC_NAME="en_GB.UTF-8"
LC_ADDRESS="en_GB.UTF-8"
LC_TELEPHONE="en_GB.UTF-8"
LC_MEASUREMENT="en_GB.UTF-8"
LC_IDENTIFICATION="en_GB.UTF-8"
LC_ALL=
I can override an individual setting with, for instance:
$ LC_TIME=fr_FR.UTF-8 date
jeudi 22 août 2013, 10:41:30 (UTC+0100)
Or:
$ LC_MONETARY=fr_FR.UTF-8 locale currency_symbol
€
Or override everything with LC_ALL.
$ LC_ALL=C LANG=fr_FR.UTF-8 LC_MESSAGES=fr_FR.UTF-8 cat /
cat: /: Is a directory
In a script, if you want to force a specific setting, as you don't know what settings the user has forced (possibly LC_ALL as well), your best, safest and generally only option is to force LC_ALL.
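A minimal sketch of that advice, assuming a POSIX sh (the variable name child_view is only there for illustration). Assigning and exporting LC_ALL near the top of the script should make every command started afterwards see the C locale, whatever LANG, LC_xxx or LC_ALL the caller had set:

```shell
#! /bin/sh -
# Force the C locale for everything this script runs from here on.
LC_ALL=C
export LC_ALL

# Illustration only: a child process sees the forced value,
# not whatever the caller had in its environment.
child_view=$(sh -c 'printf "%s\n" "$LC_ALL"')
printf 'child sees LC_ALL=%s\n' "$child_view"
```

To limit the effect to a single command instead, prefix just that command, as in LC_ALL=C sort file.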
The C locale is a special locale that is meant to be the simplest locale. You could also say that while the other locales are for humans, the C locale is for computers. In the C locale, characters are single bytes, the charset is ASCII, the sorting order is based on the byte values, the language is US English and things like currency symbols are not defined.
On some systems, there's a difference with the POSIX locale where for instance the sort order for non-ASCII characters is not defined.
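To see what ordering by byte value means in practice, here is a small sketch that only relies on the C locale, so it should behave the same whatever locales are installed (the variable name sorted is just for illustration). Uppercase ASCII letters have lower byte values than lowercase ones, so they all sort first:

```shell
# Sort four one-letter lines by byte value.
# A=0x41 and B=0x42 come before a=0x61 and b=0x62.
sorted=$(printf '%s\n' b A a B | LC_ALL=C sort)
printf '%s\n' "$sorted"
# prints A, B, a, b, one per line
```

In most other locales, the same input would typically come out as a, A, b, B or similar, since case is only used as a tie-breaker there.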
You generally run a command with LC_ALL=C to prevent the user's settings from interfering with your script. For instance, if you want [a-z] to match the 26 ASCII characters from a to z, you have to set LC_ALL=C.
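As a sketch of the [a-z] case, the helper below (is_ascii_lower is a hypothetical name, not a standard utility) checks that a string is made only of the 26 ASCII lowercase letters. The LC_ALL=C prefix applies to that one grep invocation only, so the rest of the script keeps the user's locale:

```shell
# Hypothetical helper: succeeds only if "$1" is made of one or more
# ASCII lowercase letters (a to z) and nothing else.
is_ascii_lower() {
  printf '%s\n' "$1" | LC_ALL=C grep -qx '[a-z][a-z]*'
}

is_ascii_lower hello && echo 'hello: ASCII lowercase only'
is_ascii_lower Hello || echo 'Hello: rejected (uppercase H)'
```

Note that the sketch assumes "$1" is a single line; with several lines, grep would succeed as soon as one of them matches.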
On GNU systems, LC_ALL=C and LC_ALL=POSIX (or LC_MESSAGES=C|POSIX) override $LANGUAGE, while LC_ALL=anything-else wouldn't.
A few cases where you typically need to set LC_ALL=C:
sort -u or sort ... | uniq.... In many locales other than C, some characters have the same sorting order. sort -u doesn't report unique lines, but one of each group of lines that have equal sorting order. So if you do want unique lines, you need a locale where characters are bytes and all characters have different sorting order (which the C locale guarantees).
Character ranges like in grep. If you mean to match a letter in the user's language, use grep '[[:alpha:]]' and don't modify LC_ALL. But if you want to match the a-zA-Z ASCII characters, you need either LC_ALL=C grep '[[:alpha:]]' or LC_ALL=C grep '[a-zA-Z]'. [a-z] matches the characters that sort after a and before z. In other locales, you generally don't know what those are. For instance some locales ignore case for sorting so [a-z] could include [B-Z] or [A-Y]. In many UTF-8 locales (including en_US.UTF-8 on most systems), [a-z] will include the Latin letters from a to y with diacritics but not those of z (since z sorts before them) which I can't imagine would be what you want (why would you want to include é and not ź?).
Floating point arithmetic in ksh93. ksh93 honours the decimal_point setting in LC_NUMERIC. If you write a script that contains a=$((1.2/7)), it will stop working when run by a user whose locale has comma as the decimal separator:
$ ksh93 -c 'echo $((1.1/2))'
0.55
$ LANG=fr_FR.UTF-8 ksh93 -c 'echo $((1.1/2))'
ksh93: 1.1/2: arithmetic syntax error
Then you need things like:
#! /bin/ksh93 -
float input="$1" # get it as input from the user in his locale
float output
arith() { typeset LC_ALL=C; (($@)); }
arith output=input/1.2 # use the dot here as it will be interpreted
# under LC_ALL=C
echo "$output" # output in the user's locale
As a side note: the , decimal separator conflicts with the , arithmetic operator which can cause even more confusion.
When you need characters to be bytes. Nowadays, most locales are UTF-8 based, which means characters can take from 1 to 4 bytes (up to 6 in the original UTF-8 definition). When dealing with data that is meant to be bytes, with text utilities, you'll want to set LC_ALL=C. It will also improve performance significantly because parsing UTF-8 data has a cost.
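A small sketch of the difference, using wc -m (which counts characters) on the UTF-8 encoding of é. The variable name count is only for illustration, and the bytes are written as octal escapes so that the example does not depend on how the script file itself is encoded:

```shell
# "é" encoded in UTF-8 is the two bytes 0xC3 0xA9.
# wc -m counts characters; in the C locale every byte is one character.
count=$(printf '\303\251' | LC_ALL=C wc -m)
count=$((count))   # strip the padding some wc implementations add
printf '%s\n' "$count"
# prints 2
```

Without the LC_ALL=C prefix, and assuming the user's locale is a UTF-8 one, the same pipeline would be expected to report 1.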
A corollary of the previous point: when processing text where you don't know what character set the input is written in, but can assume it's compatible with ASCII (as virtually all charsets are). For instance grep '<.*>' to look for lines containing a <, > pair will not work if you're in a UTF-8 locale and the input is encoded in a single-byte 8-bit character set like iso8859-15. That's because . only matches characters and non-ASCII characters in iso8859-15 are likely not to form a valid character in UTF-8. On the other hand, LC_ALL=C grep '<.*>' will work because any byte value forms a valid character in the C locale.
Any time where you process input data or output data that is not intended from/for a human. If you're talking to a user, you may want to use their convention and language, but for instance, if you generate some numbers to feed some other application that expects English style decimal points, or English month names, you'll want to set LC_ALL=C:
$ printf '%g\n' 1e-2
0,01
$ LC_ALL=C printf '%g\n' 1e-2
0.01
$ date +%b
août
$ LC_ALL=C date +%b
Aug
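Going back to the grep '<.*>' case on input in an unknown single-byte charset, here is a minimal sketch that can be tried without any extra locale installed. The sample line and the variable name matches are made up for illustration; the é is given as the single byte it would be in iso8859-15:

```shell
# "<café>" encoded in ISO-8859-15: é is the single byte 0xE9 (octal 351),
# which on its own is not valid UTF-8.
matches=$(printf '<caf\351>\n' | LC_ALL=C grep -c '<.*>')
printf '%s\n' "$matches"
# prints 1: in the C locale, . matches the 0xE9 byte like any other
```

Run without LC_ALL=C in a UTF-8 locale, the same grep would likely count 0 matching lines, since . cannot match the stray 0xE9 byte there.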