Stephen Kitt

As you say, your version of awk seems to count bytes, not characters. To fix this, use a character-aware implementation such as GNU Awk or The One True Awk (as updated for the second edition of The AWK Programming Language).

GNU Awk produces

Лорем ипсум долор сит амет, консектетур адиписцинг элит, сед до еусимо
темпор инцидидант ют лаборе эт долоре магна аликуа.

with your example input in a UTF-8 locale.
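A quick way to check which behaviour a given awk has is to ask it for the length of a short non-ASCII string. The following is only a minimal sketch, assuming a UTF-8 locale; the helper name `awk_length` and the sample word are illustrative and not taken from the question. The word has five Cyrillic letters, each encoded as two bytes in UTF-8, so a character-aware awk should report 5 and a byte-counting one 10.

```shell
# awk_length is a hypothetical helper: it prints the length of its second
# argument as computed by the awk implementation named in its first argument.
awk_length() {
    printf '%s\n' "$2" | "$1" '{ print length($0) }'
}

# Five Cyrillic letters, ten bytes in UTF-8.
awk_length awk 'Лорем'
```

Passing a different command name as the first argument (for example `gawk`, if it is installed) checks that implementation instead of the default `awk`.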

On macOS, both of these implementations can be installed using Homebrew, albeit one at a time (they conflict with each other):

brew install gawk

installs GNU Awk, whereas

brew install awk

installs The One True Awk.
