
I just wanted to point out that uniq seems terribly slow, even on a sorted list.

I just tried getting a list of directory prefixes from a list of sorted filenames:

$ pv all_files | cut -d '/' -f 1,2,3,4 | uniq > all_prefixes

36.7GiB 0:07:41 [81.4MiB/s]

$ pv all_files | cut -d '/' -f 1,2,3,4 | sort -u > all_prefixes2

36.7GiB 0:03:14 [ 193MiB/s]

sort -u comes out more than twice as fast as uniq (3:14 versus 7:41), and that is with sort reading from stdin and writing to stdout, so I don't think it is doing any parallelization yet. I have no idea why uniq should be so much slower than sort, since it doesn't have to sort the list...

The speeds stay the same if I swap the order of the two commands, so the pipeline is limited by CPU time here, not by disk access or caching (I only have 8GB of RAM, and my swap is not used).
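
For anyone who wants to check that the two pipelines really are interchangeable on sorted input, here is a minimal sketch on a tiny made-up sample. The file names (sample_files, prefixes_uniq, prefixes_sort) and the paths in them are hypothetical stand-ins for all_files, not my real data, and this only checks the output, not the speed:

```shell
# Work in a scratch directory so nothing real gets overwritten.
tmp=$(mktemp -d)

# Hypothetical sample: already-sorted paths standing in for all_files.
printf '%s\n' a/b/c/d/1 a/b/c/d/2 a/b/c/e/1 a/b/x/y/1 a/b/x/y/2 > "$tmp/sample_files"

# uniq only compares each line with the previous one, so on sorted input
# it should produce the same set of prefixes as sort -u.
cut -d '/' -f 1,2,3,4 "$tmp/sample_files" | uniq    > "$tmp/prefixes_uniq"
cut -d '/' -f 1,2,3,4 "$tmp/sample_files" | sort -u > "$tmp/prefixes_sort"

# cmp exits 0 only if the two files are byte-for-byte identical.
cmp "$tmp/prefixes_uniq" "$tmp/prefixes_sort" && echo "identical"
```

To compare the timings on a larger file, the same two pipelines can be prefixed with time instead of being fed through pv.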