4

I need to use fmt to format some text output in Greek, but it does not behave as it does with Latin characters. Consider, for example, the 15-character sentences below.

With Latin characters:

 $ echo "Have a nice day" | fmt -w 16
 Have a nice day

but, strangely, with non-Latin characters:

 $ echo "Ηαωε α νιψε δαυ" | fmt -w 16
 Ηαωε α
 νιψε δαυ

In fact, for the above string, the smallest width that prints the sentence without a line break is -w 28:

 $ echo "Ηαωε α νιψε δαυ" | fmt -w 28
 Ηαωε α νιψε δαυ
 $ echo "Ηαωε α νιψε δαυ" | fmt -w 27
 Ηαωε α νιψε
 δαυ

Can somebody explain why this happens and how to fix it, if possible?

1
  • 1
    fmt -w 120 on ASCII text gives me results similar to fmt -w 205-220 on Unicode text. So just double the width (the exact factor depends on the encoding) to get acceptable results ;) The coreutils people were doing some work on multibyte characters last year; perhaps we'll see results of that soon. lists.gnu.org/archive/html/coreutils/2016-07/msg00013.html Commented Jul 6, 2017 at 5:08
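
The doubling rule of thumb can be made a bit more precise for a given piece of text. The following is a minimal sketch, assuming that fmt measures width in bytes and that the input is valid UTF-8; `scaled_width` is a hypothetical helper name, not part of any tool mentioned here.

```shell
#!/bin/sh
# Hypothetical helper: convert a width given in characters into an
# approximate width in bytes for one specific piece of UTF-8 text.
# Assumption: fmt counts bytes, not characters.
scaled_width() {
    text=$1
    want=$2
    # Total number of bytes in the text.
    bytes=$(( $(printf '%s' "$text" | wc -c) ))
    # Number of characters: delete UTF-8 continuation bytes (0x80-0xBF,
    # octal 200-277) and count what is left. LC_ALL=C makes tr work on bytes.
    chars=$(( $(printf '%s' "$text" | LC_ALL=C tr -d '\200-\277' | wc -c) ))
    # Empty input: nothing to scale, return the width unchanged.
    if [ "$chars" -eq 0 ]; then
        echo "$want"
        return
    fi
    # Integer scaling by the average bytes per character (truncates).
    echo $(( want * bytes / chars ))
}

s="Ηαωε α νιψε δαυ"
w=$(scaled_width "$s" 16)
echo "$w"
```

For the sentence in the question this yields 28, which matches the smallest width the asker found to keep the line unbroken; the result can be passed on as `fmt -w "$w"`. Because the scaling is an average over the whole text, lines that mix Latin and Greek will still wrap unevenly.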

3 Answers

5

To answer your question: it does not work because Greek characters are non-Latin Unicode characters, and:

Unlike par, fmt has no Unicode support, ...

https://en.wikipedia.org/wiki/Fmt
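
One way to check this, assuming fmt counts bytes rather than characters (which is consistent with the numbers in the question), is to compare the byte lengths of the two sentences. This is only an illustrative sketch; the variable names are made up.

```shell
#!/bin/sh
latin="Have a nice day"
greek="Ηαωε α νιψε δαυ"

# wc -c counts bytes regardless of locale; $(( )) strips any padding
# that some wc implementations print around the number.
latin_bytes=$(( $(printf '%s' "$latin" | wc -c) ))
greek_bytes=$(( $(printf '%s' "$greek" | wc -c) ))
alpha_bytes=$(( $(printf '%s' "α" | wc -c) ))

echo "latin sentence:   $latin_bytes bytes"   # 15
echo "greek sentence:   $greek_bytes bytes"   # 27
echo "one Greek letter: $alpha_bytes bytes"   # 2
```

Both sentences have 15 characters, but in UTF-8 each of the 12 Greek letters takes 2 bytes, so the Greek sentence is 12 × 2 + 3 = 27 bytes long. That would explain why it wraps at -w 27 and fits at -w 28.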

Additional notes

The second part of your question, how to fix it, is unfortunately harder to answer.

There seems to be a fairly recent technical report on how to wrap Unicode text (Heninger, Unicode Line Breaking Algorithm, 2015-06-01, http://www.unicode.org/reports/tr14/). However, it appears to be a specification only, with no actual implementation and no mention of software or how-to examples. You could try asking the author via the email address listed there.

Since the Wikipedia article on fmt referred to par, and it was available via apt-get, I decided to try it on your posted text.

But I was unsuccessful; it still doesn't wrap the way you wish:

$ echo "Ηαωε α νιψε δαυ" | par 16gr
Ηαωε α
νιψε δαυ

The man page is difficult enough that even the author cautions that it is not well written for the end user, but if you are determined you could try your luck reading it.

4

fmt, as such, generally does not support "non-Latin" text (in your example, probably UTF-8). You could perhaps use par, which does.

Interestingly, the Solaris and FreeBSD manual pages for fmt are very similar, hinting that the program has not been noticeably improved since the mid-1980s:

par may be available as a package for your system. If not, it is easy to compile, and found here:

http://www.nicemice.net/par/

On the other hand, proper support for UTF-8 in par has been lacking:

0

Plan 9's utilities are usually UTF-8 aware. You can get them on your Unix machine by installing plan9port. On macOS you can use Homebrew:

brew install plan9port

which installs their fmt under the 9 command. It seems to behave the way you wanted:

$ echo "Ηαωε α νιψε δαυ" | 9 fmt -w 16
Ηαωε α νιψε δαυ
$ echo "Ηαωε α νιψε δαυ" | 9 fmt -w 14
Ηαωε α νιψε
δαυ
