Revisions to What does "LC_ALL=C" do?

need `function f { ...}` syntax in ksh93 to get local variable scope.

edited Jul 30 at 6:10

    #! /bin/ksh93 -
    float input="$1" # get it as input from the user in his locale
    float output
    arith() { typeset LC_ALL=C; (($@)); }
    arith output=input/1.2 # use the dot here as it will be interpreted
                           # under LC_ALL=C
    echo "$output" # output in the user's locale

    #! /bin/ksh93 -
    float input="$1" # get it as input from the user in his locale
    float output
    function arith { typeset LC_ALL=C; (($@)); }
    arith output=input/1.2 # use the dot here as it will be interpreted
                           # under LC_ALL=C
    echo "$output" # output in the user's locale
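
A plain POSIX sh has neither ksh93's `float` type nor function-local `typeset`. The sketch below is one possible approach for such shells. It is a swapped-in technique, not the ksh93 one above: awk does the arithmetic, and LC_ALL=C is placed only in the environment of that single awk command, so the rest of the script keeps the user's locale. The helper name `divide_by_1_2` is hypothetical. Unlike the ksh93 version, both the input and the printed result use a dot as the decimal separator.

```shell
#!/bin/sh -
# Assumption: plain POSIX sh, which has no floating-point arithmetic,
# so awk performs the division instead.
# LC_ALL=C is set only for the awk command. awk therefore reads $1 and
# prints the result with a dot as the decimal separator (some awks, such
# as gawk by default, do so whatever the locale).
# The rest of the script is unaffected and keeps the user's locale.
divide_by_1_2() {  # hypothetical helper name
  LC_ALL=C awk -v x="$1" 'BEGIN { printf "%g\n", x / 1.2 }'
}

divide_by_1_2 3   # prints 2.5
```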

Make sentence clearer

edited Dec 19, 2023 at 13:35

Stéphane Chazelas

When you need characters to be bytes. Nowadays, most locales are UTF-8-based, which means characters can take up from 1 to 6 bytes³. When processing data that is meant to be bytes with text utilities, you'll want to set LC_ALL=C. It will also improve performance significantly, because parsing UTF-8 data has a cost.
A corollary of the previous point: when processing text where you don't know what character set the input is written in, but can assume it's compatible with ASCII (as virtually all charsets are). For instance, grep '<.*>' to look for lines containing a <, > pair will not work if you're in a UTF-8 locale and the input is encoded in a single-byte 8-bit character set like iso8859-15. That's because . only matches characters, and non-ASCII characters in iso8859-15 are likely not to form a valid character in UTF-8. On the other hand, LC_ALL=C grep '<.*>' will work because any byte value forms a valid character in the C locale.
Any time you process input or output data that does not come from, or is not intended for, a human. If you're talking to a user, you may want to use their conventions and language, but if, for instance, you generate numbers to feed another application that expects English-style decimal points or English month names, you'll want to set LC_ALL=C:
```
 $ printf '%g\n' 1e-2
 0,01
 $ LC_ALL=C printf '%g\n' 1e-2
 0.01
 $ date +%b
 août
 $ LC_ALL=C date +%b
 Aug
```
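
The first two points can be checked with a small sketch. It relies only on POSIX printf octal escapes, `wc -m` and `grep -c`. It only asserts what happens under LC_ALL=C, because whether a UTF-8 locale is installed varies from system to system. The helper names are hypothetical.

```shell
#!/bin/sh -
# The two bytes 0xC3 0xA9 (octal 303 251) encode "é" in UTF-8.
# Under LC_ALL=C every byte is a character, so wc -m counts two.
count_chars_c() {  # hypothetical helper name
  printf '\303\251' | LC_ALL=C wc -m
}

# Byte 0xE9 (octal 351) is "é" in iso8859-15 but is not valid UTF-8 on
# its own. Under LC_ALL=C, "." matches it, so grep finds the line.
match_in_c() {  # hypothetical helper name
  printf '<\351>\n' | LC_ALL=C grep -c '<.*>'
}

count_chars_c   # prints 2 (some wc implementations pad it with spaces)
match_in_c      # prints 1
```

In a UTF-8 locale, the same `wc -m` pipeline would typically report 1, and the `grep` would typically find no match, which is the behaviour described above.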

Make sentence clearer

edit approved Dec 19, 2023 at 13:35

Iulian Onofrei

When you need characters to be bytes. Nowadays, most locales are UTF-8-based, which means characters can take up from 1 to 6 bytes³. When processing data that is meant to be bytes with text utilities, you'll want to set LC_ALL=C. It will also improve performance significantly, because parsing UTF-8 data has a cost.
A corollary of the previous point: when processing text where you don't know what character set the input is written in, but can assume it's compatible with ASCII (as virtually all charsets are). For instance, grep '<.*>' to look for lines containing a <, > pair will not work if you're in a UTF-8 locale and the input is encoded in a single-byte 8-bit character set like iso8859-15. That's because . only matches characters and non-ASCII characters in iso8859-15 are likely not to form a valid character in UTF-8. On the other hand, LC_ALL=C grep '<.*>' will work because any byte value forms a valid character in the C locale.
Any time you process input or output data that does not come from, or is not intended for, a human. If you're talking to a user, you may want to use their conventions and language, but if, for instance, you generate numbers to feed another application that expects English-style decimal points or English month names, you'll want to set LC_ALL=C:
```
 $ printf '%g\n' 1e-2
 0,01
 $ LC_ALL=C printf '%g\n' 1e-2
 0.01
 $ date +%b
 août
 $ LC_ALL=C date +%b
 Aug
```

When you need characters to be bytes. Nowadays, most locales are UTF-8-based, which means characters can take up from 1 to 6 bytes³. When processing data that is meant to be bytes with text utilities, you'll want to set LC_ALL=C. It will also improve performance significantly, because parsing UTF-8 data has a cost.
A corollary of the previous point: when processing text where you don't know what character set the input is written in, but can assume it's compatible with ASCII (as virtually all charsets are). For instance, grep '<.*>' to look for lines containing a <, > pair will not work if you're in a UTF-8 locale and the input is encoded in a single-byte 8-bit character set like iso8859-15. That's because . only matches characters, and non-ASCII characters in iso8859-15 are likely not to form a valid character in UTF-8. On the other hand, LC_ALL=C grep '<.*>' will work because any byte value forms a valid character in the C locale.
Any time you process input or output data that does not come from, or is not intended for, a human. If you're talking to a user, you may want to use their conventions and language, but if, for instance, you generate numbers to feed another application that expects English-style decimal points or English month names, you'll want to set LC_ALL=C:
```
 $ printf '%g\n' 1e-2
 0,01
 $ LC_ALL=C printf '%g\n' 1e-2
 0.01
 $ date +%b
 août
 $ LC_ALL=C date +%b
 Aug
```

added 308 characters in body

edited Feb 10, 2021 at 8:13

Stéphane Chazelas

added 450 characters in body

edited Feb 8, 2021 at 15:07

Stéphane Chazelas

added 1240 characters in body

edited Jun 23, 2017 at 15:03

Stéphane Chazelas

added 91 characters in body

edited Nov 3, 2015 at 10:20

Stéphane Chazelas

added 182 characters in body

edited Oct 1, 2015 at 9:26

Stéphane Chazelas

added 99 characters in body

edited Mar 30, 2015 at 20:28

Stéphane Chazelas

added 338 characters in body

edited Mar 30, 2015 at 16:22

Stéphane Chazelas

added 16 characters in body

edited Mar 10, 2015 at 16:47

Stéphane Chazelas

added 171 characters in body

edited Sep 10, 2014 at 15:32

Stéphane Chazelas

added 382 characters in body

edited Jun 20, 2014 at 9:20

Stéphane Chazelas

added 643 characters in body

edited Jun 16, 2014 at 14:59

Stéphane Chazelas

added 503 characters in body

edited May 15, 2014 at 8:46

Stéphane Chazelas

added 2550 characters in body

edited Aug 22, 2013 at 20:52

Stéphane Chazelas

added 95 characters in body

edited Aug 22, 2013 at 16:15

Stéphane Chazelas

added 151 characters in body

edited Aug 22, 2013 at 16:07

Stéphane Chazelas

answered Aug 22, 2013 at 9:50

Stéphane Chazelas