
If I create a text file containing the following lines:

>TESTTEXT_10000000
>TESTTEXT_1000000
>TESTTEXT_10000002
>TESTTEXT_10000001

and run sort myfile, the output is:

>TESTTEXT_1000000
>TESTTEXT_10000000
>TESTTEXT_10000001
>TESTTEXT_10000002

However, if I append /1 or /2 to every line, the sort output changes drastically, and I do not know why.

Input:

>TESTTEXT_10000000/1
>TESTTEXT_1000000/1
>TESTTEXT_10000002/1
>TESTTEXT_10000001/1

Output:

>TESTTEXT_10000000/1
>TESTTEXT_1000000/1
>TESTTEXT_10000001/1
>TESTTEXT_10000002/1

Input:

>TESTTEXT_10000000/2
>TESTTEXT_1000000/2
>TESTTEXT_10000002/2
>TESTTEXT_10000001/2

Output:

>TESTTEXT_10000000/2
>TESTTEXT_10000001/2
>TESTTEXT_1000000/2
>TESTTEXT_10000002/2

Is the forward slash being recognised as a separator? Using --field-separator did not alter the behaviour. If so, why is 1000000/2 in between the 10000001/2 and 10000002/2 entries? Human sort, numeric sort and other options never produced a consistent order. Can anyone help me out here?

Edit: because it seems relevant given the answers, the value of LC_ALL on this machine is en_GB.UTF-8.
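One plausible reading of these outputs, assuming GNU sort on glibc: in locales such as en_GB.UTF-8 the collation rules ignore most punctuation (here >, _ and /) on the first comparison pass, so the digit after the slash is compared as if it were one more digit of the number. The sketch below approximates only that first pass by stripping those three characters into a sort key and comparing keys byte by byte. `punct_blind_sort` is a hypothetical helper name, and real glibc collation has further levels that this sketch does not model.

```shell
#!/bin/sh
# Hypothetical helper: approximates a collation that skips '>', '_' and '/'
# on its first pass. Reads lines on stdin, writes them sorted to stdout.
punct_blind_sort() {
  tab=$(printf '\t')
  # Decorate: emit "<key><TAB><original line>", where the key is the line
  # with the ignored punctuation removed.
  while IFS= read -r line; do
    key=$(printf '%s' "$line" | tr -d '>_/')
    printf '%s\t%s\n' "$key" "$line"
  done |
    # Compare only the key, byte by byte, then drop the key again.
    LC_ALL=C sort -t "$tab" -k1,1 |
    cut -f2
}

# The "/2" lines from the question.
input=$(printf '%s\n' \
  '>TESTTEXT_10000000/2' '>TESTTEXT_1000000/2' \
  '>TESTTEXT_10000002/2' '>TESTTEXT_10000001/2')

printf '%s\n' "$input" | punct_blind_sort
# >TESTTEXT_10000000/2
# >TESTTEXT_10000001/2
# >TESTTEXT_1000000/2
# >TESTTEXT_10000002/2
```

This reproduces the /2 output shown above: the key for 1000000/2 becomes ...10000002, which falls between ...100000012 and ...100000022. Fed the /1 lines instead, the same helper also reproduces the /1 order from the question.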

Comment (Nov 1, 2016 at 13:52): Try this: LC_ALL=C sort file
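For reference, a minimal sketch of what that suggestion does with the /2 lines, feeding them through printf rather than a file: in the C locale, sort compares raw bytes, and since '/' (0x2F) sorts before '0' (0x30), the shorter number comes first.

```shell
#!/bin/sh
# The "/2" lines from the question.
input=$(printf '%s\n' \
  '>TESTTEXT_10000000/2' '>TESTTEXT_1000000/2' \
  '>TESTTEXT_10000002/2' '>TESTTEXT_10000001/2')

# Plain byte comparison: no punctuation is skipped.
printf '%s\n' "$input" | LC_ALL=C sort
# >TESTTEXT_1000000/2
# >TESTTEXT_10000000/2
# >TESTTEXT_10000001/2
# >TESTTEXT_10000002/2
```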

1 Answer


/ is before 0 in your locale. Using LC_ALL=C or another locale will probably not change anything.

In your use case you should probably be able to use version sort (-V):

sort -V myfile
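A minimal sketch of version sort on the /2 lines, assuming GNU coreutils sort (-V is a GNU extension): runs of digits are compared as numbers, so 1000000 sorts before 10000000 whatever follows the slash, and the comparison does not go through the locale's collation rules.

```shell
#!/bin/sh
# The "/2" lines from the question.
input=$(printf '%s\n' \
  '>TESTTEXT_10000000/2' '>TESTTEXT_1000000/2' \
  '>TESTTEXT_10000002/2' '>TESTTEXT_10000001/2')

# Digit runs are compared numerically: 1000000 < 10000000 < 10000001 < ...
printf '%s\n' "$input" | sort -V
# >TESTTEXT_1000000/2
# >TESTTEXT_10000000/2
# >TESTTEXT_10000001/2
# >TESTTEXT_10000002/2
```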

Alternatively, you can specify the separator and the keys to sort on:

sort -t/ -k1,1 myfile
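A sketch of that command on the /2 lines: with -t/ and -k1,1 the key stops at the first slash, so the /2 suffix never takes part in the comparison. ">TESTTEXT_1000000" is then a prefix of ">TESTTEXT_10000000" and sorts first, both byte-wise and under a collation that skips punctuation.

```shell
#!/bin/sh
# The "/2" lines from the question.
input=$(printf '%s\n' \
  '>TESTTEXT_10000000/2' '>TESTTEXT_1000000/2' \
  '>TESTTEXT_10000002/2' '>TESTTEXT_10000001/2')

# Field separator '/', key = field 1 only (the text before the slash).
printf '%s\n' "$input" | sort -t/ -k1,1
# >TESTTEXT_1000000/2
# >TESTTEXT_10000000/2
# >TESTTEXT_10000001/2
# >TESTTEXT_10000002/2
```

If two lines share the same first field (for example a /1 and a /2 line for the same ID), GNU sort falls back to comparing the whole lines; adding -s keeps them in input order instead.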

Comment:

Oddly enough, LC_ALL=C did solve my issue and made the ordering consistent. However, the sort -V option also solved the issue without changing the locale, so I'll accept this answer.
