3

I am trying to sort the output of ls by a certain part of the file names. The files are named like this:

file1-2025-09-30.tgz
file1-2025-10-01.tgz
file1-2025-10-15.tgz
file2-2025-09-30.tgz
file2-2025-10-01.tgz
file2-2025-10-15.tgz
file3-2025-09-30.tgz
file3-2025-10-01.tgz
file3-2025-10-15.tgz
special-file-2025-09-30.tgz
special-file-2025-10-01.tgz
special-file-2025-10-15.tgz
yet-another-file-2025-09-30.tgz
yet-another-file-2025-10-01.tgz
yet-another-file-2025-10-15.tgz

I need to sort these files by the part of their names that represents their date. The problem I am facing is that there are files containing a dash as part of their file name, so using "--field-separator=-" doesn't work there.

The output I need to achieve should be:

file1-2025-09-30.tgz
file2-2025-09-30.tgz
file3-2025-09-30.tgz
special-file-2025-09-30.tgz
yet-another-file-2025-09-30.tgz
file1-2025-10-01.tgz
file2-2025-10-01.tgz
file3-2025-10-01.tgz
special-file-2025-10-01.tgz
yet-another-file-2025-10-01.tgz
file1-2025-10-15.tgz
file2-2025-10-15.tgz
file3-2025-10-15.tgz
special-file-2025-10-15.tgz
yet-another-file-2025-10-15.tgz

In other words: files need to be grouped by the date part.

I can't use find . -type f -mtime... either, because the files' creation dates are not identical to the dates in their names. In other words: file1-2025-10-01.tgz was not created on 2025-10-01; it was copied to that directory later.

How can I achieve the desired output?

Many thanks in advance for any help!

  • Beware most sort implementations don't have "--key" and "--field-separator" options. Those are non-standard alternative names by the GNU implementation of sort for the standard -k and -t options. Commented 15 hours ago

3 Answers

4

Yes, you can't do it with sort alone unless you pre- and post- process the list in a decorate-sort-undecorate fashion.

Here you can use zsh whose globs you can sort using arbitrary transformations.

$ print -rl -- *(oe['REPLY=${(M)REPLY%-*-*-*}-$REPLY'])
file1-2025-09-30.tgz
file2-2025-09-30.tgz
file3-2025-09-30.tgz
special-file-2025-09-30.tgz
yet-another-file-2025-09-30.tgz
file1-2025-10-01.tgz
file2-2025-10-01.tgz
file3-2025-10-01.tgz
special-file-2025-10-01.tgz
yet-another-file-2025-10-01.tgz
file1-2025-10-15.tgz
file2-2025-10-15.tgz
file3-2025-10-15.tgz
special-file-2025-10-15.tgz
yet-another-file-2025-10-15.tgz

Here we use oe[code] to define the sort order based on the evaluation of code (with $REPLY holding the name of the file being worked on).

${(M)REPLY%-*-*-*} expands to the shortest part at the end of $REPLY that Matches -*-*-* (the M flag keeps the matched portion instead of removing it). For special-file-2025-10-15.tgz, for instance, it expands to -2025-10-15.tgz, to which we append -$REPLY, so we end up sorting on -2025-10-15.tgz-special-file-2025-10-15.tgz instead of special-file-2025-10-15.tgz.
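
For comparison, the same sort key can be built without zsh using standard parameter expansion in two steps. This is a minimal sketch, assuming a POSIX shell; the function name date_key and its variables are hypothetical and only serve to illustrate the expansion:

```shell
# Hypothetical helper: print the sort key "<date suffix>-<name>" for one file name.
date_key() {
  name=$1
  # Remove the shortest trailing match of -*-*-* to get the leading part ...
  prefix=${name%-*-*-*}
  # ... then strip that leading part, keeping only the matched suffix,
  # which is what the (M) flag of zsh returns in a single step.
  suffix=${name#"$prefix"}
  printf '%s\n' "$suffix-$name"
}

date_key special-file-2025-10-15.tgz
# prints: -2025-10-15.tgz-special-file-2025-10-15.tgz
```

A name with fewer than three hyphens is left with an empty suffix, so its key is just a hyphen followed by the name itself.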

The list is passed to print -rl instead of ls -d because ls does its own sorting by default, though if your ls is GNU ls, you can add the -U option to disable that sorting.

Another approach:

print -rl -- *(oe['REPLY[1,0]=$REPLY[(ws[-])-3,-1]-'])

Here we take the third-last (-3) to last (-1) words of $REPLY (the words being separated by -) and prepend them to $REPLY by assigning to $REPLY[1,0].

If you don't have zsh, you could use perl:

perl -le '
  print $_->[0] for
    sort {$a->[1] cmp $b->[1]}
    map {[$_, s/(.*)((-.*){3})/$2-$1/sr]} @ARGV' -- *

Where you'll recognise a typical Schwartzian transform, which is essentially what zsh does with its oe glob qualifier.

Here the list of (non-hidden) files is passed to perl via the expansion of the shell glob *, but you could also use perl's own glob:

perl -le '
  print $_->[0] for
    sort {$a->[1] cmp $b->[1]}
    map {[$_, s/(.*)((-.*){3})/$2-$1/sr]} <*>'

Which would also have the advantage of avoiding:

  • the limit on the size of arguments passed to a command.
  • issues when there's no non-hidden file in the current directory.

To pass those to a command such as GNU ls -ogdU --, change it to:

perl -le '
  exec qw(ls -ogdU --), map {$_->[0]}
    sort {$a->[1] cmp $b->[1]}
    map {[$_, s/(.*)((-.*){3})/$2-$1/sr]} <*>'

Though that reintroduces the problem with the limit on size of arguments which you could work around with:

perl -l0e '
  print $_->[0] for
    sort {$a->[1] cmp $b->[1]}
    map {[$_, s/(.*)((-.*){3})/$2-$1/sr]} <*>' |
  xargs -r0 ls -ogdU --

printing the list 0-byte-delimited (0 being the only byte value that can't be found in a file path) and using xargs to split that list into as many invocations of ls as necessary to avoid that limit; zsh has its own zargs for that.

Note that zsh globs and sort sort files based on the locale's collation¹ while perl's cmp compares byte-wise à la memcmp().
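
To get byte-wise comparison out of sort as well, the locale can be overridden for that one command. A small sketch; the helper name bytewise_sort and the sample strings are made up for illustration:

```shell
# Hypothetical helper: sort standard input byte-wise, regardless of the user's locale.
bytewise_sort() {
  LC_ALL=C sort
}

# In the C locale, uppercase ASCII letters sort before lowercase ones.
printf '%s\n' b A a B | bytewise_sort
# prints A, B, a, b (one per line)
```

Without the override, the order of mixed-case or punctuated names depends on the collation rules of whichever locale is in effect.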


¹ though see the n qualifier in zsh or the -V/--sort=version of GNU sort to take into account the numeric value of sequences of digits within the strings; for a YYYY-MM-DD date, they should all be equivalent.

1
  • 1
    yep, with a good shell you don't have to implement the "decorate, sort, undecorate" yourself, it can do it by itself. Commented 16 hours ago
3

Welcome here! When people start doing processing on the output of ls, we typically point them to this question, which, in short, says:

don't do it, there's a better way! ls was meant for humans to read, not for software to process in a deterministic way.

That's true here as well. The solution is relatively straightforward: as with any sorting problem where the things being sorted are not identical to the sorting keys, the general approach is a programming pattern called "decorate, sort, undecorate".

  1. decorate: you take your file names and transform them in a way that makes them sort correctly with a standard sorting algorithm, without losing track of what the original name was. In this case, that means prepending the date to each file name.
  2. sort: you sort the transformed items
  3. undecorate: you remove the decoration, i.e., the leading date, to get back the original file names.

So, without wanting to do all your work for you and steal the feeling of success from you, here is an outline.

You start by getting all file names in an iterable way. No need for ls or find: the good old filename expansion (globbing) of your shell does that for you. You can make use of that like this:

for filename in * ; do
   # Do something with "${filename}" here
done

So, what would you want to "do" to the filename there? Well, you can (among other things) use a regular expression to get only the date string before .tgz at the end of the line:

printf '%s\0' "${filename}" | sed -z 's/.*\([0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\}\)\.tgz$/\1 &/'

will transform a filename like 'file2-2025-10-15.tgz' into '2025-10-15 file2-2025-10-15.tgz'.

Great!

Now you have a for-loop that can transform file names into sortable strings, separated by \0 zero bytes.

You can sort that with sort -z.

You can then take the output of that sorting and remove the leading [0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\} again, to get the original file names.
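
Put together, the steps outlined above might look like the following. This is only a sketch, assuming GNU sed and GNU sort (both need their -z option) and names that end in a YYYY-MM-DD date followed by .tgz; the function name sort_by_date is hypothetical:

```shell
# Hypothetical helper: print the given file names, one per line, sorted by the date in their names.
sort_by_date() {
  for filename in "$@"; do
    printf '%s\0' "$filename"
  done |
    # decorate: "name" becomes "YYYY-MM-DD name"
    sed -z 's/.*\([0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\}\)\.tgz$/\1 &/' |
    # sort the NUL-delimited records
    sort -z |
    # undecorate: drop the leading date and the space after it
    sed -z 's/^[0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\} //' |
    # turn NUL delimiters into newlines, for display only
    tr '\0' '\n'
}
```

Called as sort_by_date * in the directory holding the archives, it should list them grouped by date. Names that do not match the pattern pass through the decorate step unchanged and are therefore sorted by their full name; drop the final tr if another program is going to consume the list.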

1
  • Note GNU ls now has a --zero which goes with GNU sort's -z. Commented 16 hours ago
1

With this date format, once the date is isolated from the leading text, a simple sort is all that is required. If you reverse each line, you only need to replace the third hyphen with a space, which is enough to create two sort fields. The rest is easy:

$ rev input_file.txt | sed 's/-/ /3' | rev | sort -k2 | sed 's/ /-/'

file1-2025-09-30.tgz
file2-2025-09-30.tgz
file3-2025-09-30.tgz
special-file-2025-09-30.tgz
yet-another-file-2025-09-30.tgz
file1-2025-10-01.tgz
file2-2025-10-01.tgz
file3-2025-10-01.tgz
special-file-2025-10-01.tgz
yet-another-file-2025-10-01.tgz
file1-2025-10-15.tgz
file2-2025-10-15.tgz
file3-2025-10-15.tgz
special-file-2025-10-15.tgz
yet-another-file-2025-10-15.tgz
1
  • 2
    But that moves the problem to file names that contain blanks (or newlines). Commented 13 hours ago
