
Scenario:

# on Linux
$ cat r456.txt
e+e+l
e+e-c

$ cat r456.txt | sort
e+e-c
e+e+l

$ sort --version
sort (GNU coreutils) 8.30

# on Cygwin
$ cat r456.txt
e+e+l
e+e-c

$ cat r456.txt | sort
e+e+l
e+e-c

$ sort --version
sort (GNU coreutils) 9.0
Packaged by Cygwin (9.0-1)

Here sort on Linux and sort on Cygwin return different results for the same input. Why?

How can I make sort on Linux and on Cygwin return the same results?


UPD. Locales:

# on Linux
$ locale
LANG=en_US.UTF-8
LANGUAGE=en_US:
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

# on Cygwin
$ locale
LANG=
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="C.UTF-8"
LC_TIME="C.UTF-8"
LC_COLLATE="C.UTF-8"
LC_MONETARY="C.UTF-8"
LC_MESSAGES="C.UTF-8"
LC_ALL=

UPD2. I've changed all locales on Cygwin to en_US.UTF-8. However, sort still returns different results:

# on Linux
$ cat r456a.txt
u1
u-1

$ cat r456a.txt | sort
u-1
u1

# on Cygwin
$ cat r456a.txt
u1
u-1

$ cat r456a.txt | sort
u1
u-1

$ locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_ALL=en_US.UTF-8

How can I fix this?


UPD3. Observation: on both Linux and Windows (Cygwin), all LC_* variables are set to en_US.UTF-8. The strings to sort are:

ProfileDataContainer.cpp
ProfileData.cpp

Linux sort sorts them as

ProfileDataContainer.cpp
ProfileData.cpp

Cygwin sort sorts them as

ProfileData.cpp
ProfileDataContainer.cpp

Is this a bug in Linux sort or in Cygwin sort?

How can I make Linux sort produce the same results as Cygwin sort?

Versions: Linux sort: 8.32, Cygwin sort: 9.0. Both are GNU coreutils.

  • @steeldriver How to check the locale? Commented Feb 18, 2024 at 19:51
  • Can reproduce it with C or C.UTF-8 versus en_US.UTF-8 Commented Feb 18, 2024 at 19:52
  • @steeldriver See UPD. Commented Feb 18, 2024 at 19:53
  • @steeldriver See UPD2. Commented Feb 18, 2024 at 20:14
  • 1
    have you tried the --debug option? Commented Sep 30, 2024 at 22:03

1 Answer


I cannot give a definite answer to your question, but here is something that is probably worth considering:

How strings are sorted is defined by the collation that the respective locale uses. A collation is a set of rules that determines which of two given strings is "greater" than the other and thus should be sorted later, typically by assigning weights to the individual code points (characters). Theoretically, a language or locale can use whatever collation it wants, although every locale uses a default collation (AFAIK).
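As a rough illustration of which locale name sort consults for collation: POSIX gives LC_ALL precedence over LC_COLLATE, which in turn takes precedence over LANG. The helper name effective_collate below is made up for this sketch; it only reads the environment and does not query the C library.

```shell
#!/bin/sh
# effective_collate is a hypothetical helper, not part of coreutils.
# It applies the POSIX precedence LC_ALL > LC_COLLATE > LANG; an empty
# value counts as unset.
effective_collate() {
    if [ -n "${LC_ALL:-}" ]; then
        printf '%s\n' "$LC_ALL"
    elif [ -n "${LC_COLLATE:-}" ]; then
        printf '%s\n' "$LC_COLLATE"
    elif [ -n "${LANG:-}" ]; then
        printf '%s\n' "$LANG"
    else
        # Nothing is set: the platform picks its own default.
        printf '%s\n' '(implementation default)'
    fi
}

effective_collate
```

With the settings shown in UPD2 this should print en_US.UTF-8 on both systems, so the locale name matches. What can still differ is the collation data behind that name.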

In Linux, many locales use the iso14651_t1_common collation. For example, on one of my Debian 11 systems, the en_US locale includes (more precisely: literally copies) the iso14651_t1 collation, which in turn includes the iso14651_t1_common collation:

root@morn ~ # cat /usr/share/i18n/locales/en_US
...
LC_COLLATE

% Copy the template from ISO/IEC 14651
copy "iso14651_t1"

END LC_COLLATE
...

root@morn ~ # cat /usr/share/i18n/locales/iso14651_t1
...
LC_COLLATE

copy "iso14651_t1_common"
...
END LC_COLLATE
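The chain of copy directives shown above can be followed with a small script. This is only a sketch: collate_chain is a name invented here, the default directory is the Debian path from the listings above and may differ on other distributions, and the script assumes each LC_COLLATE section has at most one relevant copy line.

```shell
#!/bin/sh
# collate_chain is a hypothetical helper. It prints a locale source file
# name, then follows the first `copy "..."` directive found inside its
# LC_COLLATE section, and repeats until no readable file or copy is left.
collate_chain() {
    name=$1
    dir=${2:-/usr/share/i18n/locales}   # assumed default, taken from the listing
    while [ -n "$name" ] && [ -r "$dir/$name" ]; do
        printf '%s\n' "$name"
        # Only look between "LC_COLLATE" and "END LC_COLLATE".
        name=$(sed -n '/^LC_COLLATE/,/^END LC_COLLATE/s/^copy "\([^"]*\)".*/\1/p' \
            "$dir/$name" | head -n 1)
    done
}

collate_chain en_US
```

On a system laid out like the Debian 11 machine above, I would expect this to list en_US, iso14651_t1 and iso14651_t1_common. On Cygwin it would most likely print nothing, since I could not find such files there.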

In contrast, Cygwin seems to use the collation data that Windows offers. At least, I am interpreting this page that way, and additionally, I was not able to spot collation data in any of the files in the Cygwin directory.

Given that, it may be the case that the default en_US.UTF-8 collation in Windows differs from that in Linux. Unfortunately, the collation files in Windows are binary, and their format is not documented according to this discussion. Therefore, it would be very difficult to prove that theory.

However, to me this seems the most convincing explanation for your observation. I do not believe that there is a bug in Cygwin's sort or in Linux's sort. Rather, I believe that there are differences in the en_US.UTF-8 default collation between Windows (and hence, Cygwin) and Linux.
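If the collation tables are indeed the cause, one way to sidestep them is to not use them at all: with LC_ALL=C, sort compares raw bytes, which does not depend on the platform's locale data. The sketch below wraps that in a function; sort_bytes is a name chosen for this example. Note that byte order is not the en_US.UTF-8 order of either system, it is merely the same on both.

```shell
#!/bin/sh
# sort_bytes is a hypothetical wrapper: it runs sort in the C locale so
# that lines are compared byte by byte, independent of collation tables.
sort_bytes() {
    LC_ALL=C sort "$@"
}

# '+' (0x2B) sorts before '-' (0x2D):
printf 'e+e-c\ne+e+l\n' | sort_bytes
# prints e+e+l, then e+e-c

# '-' (0x2D) sorts before '1' (0x31):
printf 'u1\nu-1\n' | sort_bytes
# prints u-1, then u1

# '.' (0x2E) sorts before 'C' (0x43):
printf 'ProfileDataContainer.cpp\nProfileData.cpp\n' | sort_bytes
# prints ProfileData.cpp, then ProfileDataContainer.cpp
```

The same effect can be had without a wrapper by running LC_ALL=C sort directly on both machines.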
