Improved formatting. Minor fixes.
Pablo A

It's caused by the way your current locale defines its collation (sorting) rules, combined with how -kN uses everything from field N to the end of the line as the sort key, not just field N. Some locales sort bc,pe before b,foo because they ignore the commas when comparing.

Use -k1,1 to restrict the key to that one field, or specify the "C" locale, and you should get the expected results:

$ LC_ALL=en_US.utf8 sort -t, -k1 test.txt
a,foo
ant,baz
bcd,ty
bc,pe
b,foo
c,bar
cn,cn 

$ LC_ALL=en_US.utf8 sort -t, -k1,1 test.txt
a,foo
ant,baz
b,foo
bc,pe
bcd,ty
c,bar
cn,cn 

$ LC_ALL=C sort -t, -k1 test.txt
a,foo
ant,baz
b,foo
bc,pe
bcd,ty
c,bar
cn,cn
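
To make the fix easy to reproduce, here is a minimal sketch that combines both suggestions. The input file is an assumption: its contents are reconstructed from the sorted output above, and it is written to a temporary file rather than test.txt.

```shell
# Hypothetical input, reconstructed from the sorted listings above.
tmp=$(mktemp)
printf '%s\n' c,bar b,foo bcd,ty a,foo bc,pe cn,cn ant,baz > "$tmp"

# -k1,1 limits the key to field 1 only; LC_ALL=C compares raw bytes,
# so the result does not depend on the caller's locale.
result=$(LC_ALL=C sort -t, -k1,1 "$tmp")
printf '%s\n' "$result"

rm -f "$tmp"
```

If you are on GNU coreutils, adding --debug to the sort command marks the part of each line that is actually used as the key, which is a quick way to see the difference between -k1 and -k1,1.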

better answer
Shawn

It's caused by the way your current locale defines its collation (sorting) rules, combined with how -kN uses everything from field N to the end of the line as the sort key, not just field N. Some locales sort bc,pe before b,foo because they ignore the commas when comparing. Use -k1,1 to restrict the key to that one field, or specify the "C" locale, and you should get the expected results:

$ LC_ALL=en_US.utf8 sort -t, -k1 test.txt
a,foo
ant,baz
bcd,ty
bc,pe
b,foo
c,bar
cn,cn
$ LC_ALL=en_US.utf8 sort -t, -k1,1 test.txt
a,foo
ant,baz
b,foo
bc,pe
bcd,ty
c,bar
cn,cn
$ LC_ALL=C sort -t, -k1 test.txt
a,foo
ant,baz
b,foo
bc,pe
bcd,ty
c,bar
cn,cn
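
The surprising en_US.utf8 order can be approximated without relying on any particular locale being installed. This is only a rough sketch of the idea, assuming the locale effectively skips the commas on its first comparison pass: it deletes the commas by hand and then compares bytes.

```shell
# The three lines as -k1 sees them: field 1 through the end of the line.
keys='b,foo
bc,pe
bcd,ty'

# Approximate a comma-ignoring collation: drop the commas, then byte-compare.
stripped=$(printf '%s\n' "$keys" | tr -d ',' | LC_ALL=C sort)
printf '%s\n' "$stripped"
# bcdty
# bcpe
# bfoo
```

With the commas gone, bcdty sorts before bcpe, which sorts before bfoo, matching the bcd,ty / bc,pe / b,foo order shown for -k1 above. With -k1,1 the keys are just b, bc and bcd, so there is nothing after the field for the locale to fold in.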
Shawn

It's the way your current locale defines collations/sorting rules that's causing it (I have no clue about the reasoning for sorting bc before b, but I can reproduce it). Specify the "C" locale and you should get the expected results:

$ LC_ALL=en_US.utf8 sort -t, -k1 test.txt
a,foo
ant,baz
bcd,ty
bc,pe
b,foo
c,bar
cn,cn
$ LC_ALL=C sort -t, -k1 test.txt
a,foo
ant,baz
b,foo
bc,pe
bcd,ty
c,bar
cn,cn