How can I determine the "biggest number" so that sort -n will always put it at the end? I'm thinking of something along the lines of Inf in some languages, but I'm not sure whether anything like this exists for sort.


The background is that I'm sorting a list of potentially non-existent paths by age so that

  • existent go first, from newest to oldest,

  • non-existent go last.

I'm using a decoration approach and trying to put a penalty on the non-existent ones:

INFINITY=99999999999999   # ...close enough, right?

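# Print the age of a path in seconds, or $INFINITY if it does not exist.
age_of() {
    if [ -e "$1" ]
    then
        local ctime now
        ctime=$(stat -c %Z "$1" 2>/dev/null)   # GNU stat: status-change time
        now=$(date +%s)
        echo $((now - ctime))
    else
        echo "$INFINITY"
    fi
}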

# Decorate each path with its age, sort numerically, then drop the age again.
while IFS= read -r path
do
    echo "$(age_of "$path") $path"
done \
  | sort -n \
  | cut -d' ' -f 2-

but obviously the comment is pretty naive; it's just a matter of time before even 99999999999999 is not close enough. ;)

So is there a better value for INFINITY?

  • Do you need it to work on all platforms or just on yours? Commented Mar 14, 2014 at 20:11
  • @ALiX Since it's a rather theoretical question, I'd prefer a solution that works on all platforms or, say, within some reasonable standard like POSIX. (I'll be using part of that code on my Fedora box, but I'm not planning on living that long, so I'm OK with the current value there ;)). Commented Mar 14, 2014 at 20:16
  • I don't think that the -n flag necessarily has a limit. It could be implemented without actually converting to a number (so there wouldn't be a largest number). If you output something non-numeric, then all the sort versions I have seen will put the non-numerics last. Commented Mar 14, 2014 at 20:26
  • @ALiX I'm not really looking for a limit, since that would make the behavior undefined if age_of returned the same number. I'm rather looking for something that will always come after all numbers ... if that makes sense. Commented Mar 14, 2014 at 20:32
  • sort -g (a GNUism, but then you're using GNU stat) recognises "inf". perl -lne '$age = -M; $age = "inf" unless defined $age; print "$age $_"' | sort -g Commented Mar 14, 2014 at 21:16

1 Answer

Not a POSIX solution, but GNU sort offers the -g option which supports a wider range of number specifications including infinity. From http://www.gnu.org/software/coreutils/manual/html_node/sort-invocation.html -

‘-g’
‘--general-numeric-sort’
‘--sort=general-numeric’

Sort numerically, converting a prefix of each line to a long double-precision
floating point number. See Floating point. Do not report overflow, underflow, or
conversion errors. Use the following collating sequence:

    Lines that do not start with numbers (all considered to be equal).
    NaNs (“Not a Number” values, in IEEE floating point arithmetic) in a
      consistent but machine-dependent order.
    Minus infinity.
    Finite numbers in ascending numeric order (with -0 and +0 equal).
    Plus infinity. 

Use this option only if there is no alternative; it is much slower than
--numeric-sort (-n) and it can lose information when converting to floating
point.

From my own tests it seems that any line beginning with Inf (any combination of upper/lower case) will appear after any numbers.
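As a rough illustration of the above, the sketch below decorates a few made-up paths with ages and uses "inf" as the penalty for the missing one. It assumes GNU sort (since -g is not POSIX); the paths and ages are invented sample data, and the variable name sorted_g is only for illustration.

```shell
# Assumes GNU sort; -g is a GNU extension, not part of POSIX.
# Sample decorated input: "<age> <path>", with "inf" marking a missing path.
sorted_g=$(printf '%s\n' \
    'inf /no/such/file' \
    '120 /tmp/b' \
    '5 /tmp/a' \
    '99999999999999 /tmp/c' \
  | LC_ALL=C sort -g \
  | cut -d' ' -f 2-)

# Print the undecorated paths, smallest age first.
printf '%s\n' "$sorted_g"
```

With GNU sort, the line decorated with "inf" should come out last, even after the very large finite age.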

Failing that, I don't think there are any character sequences that are reliably sorted after numbers using sort -n. GNU sort seems to treat all other sequences as zero, placing them after negative numbers but before positive ones. What you could do, if it is timestamps that are being sorted, is to use the maximum value for a signed 64-bit timestamp plus one:

 9,223,372,036,854,775,808

This is a few more digits than you started out with!
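A minimal sketch of that idea with sort -n follows. The sample paths and ages are made up, and the value has to be written without the thousands separators shown above so that sort reads it as a single number. It relies on sort -n coping with a 19-digit number, which GNU sort does; other implementations may differ.

```shell
# Sentinel: 2^63, one more than the largest signed 64-bit value,
# written without thousands separators.
INFINITY=9223372036854775808

# Sample decorated input: "<age> <path>"; the paths are invented.
sorted_n=$(printf '%s\n' \
    "$INFINITY /no/such/file" \
    '100000000000000 /tmp/c' \
    '120 /tmp/b' \
    '5 /tmp/a' \
  | LC_ALL=C sort -n \
  | cut -d' ' -f 2-)

printf '%s\n' "$sorted_n"
```

Note that /tmp/c has an age larger than the original 99999999999999, so it would have sorted after the missing path with the old value.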

  • "GNU sort seems to put all other sequences first." From my tests (coreutils-8.21-21.fc20.x86_64), it actually seems that sort -g does that, e.g. (I'm using : as a separator for brevity here) "a:b:-inf:-10:0:10:inf", while sort -n treats every non-number as zero so that they end up in the middle, e.g. "-10:a:b:-inf:inf:0:10". Commented Mar 14, 2014 at 21:58
  • That said, your great answer is exactly what I was looking for. Commented Mar 14, 2014 at 22:01
  • @Alois, good point, I never thought to try negative numbers. Commented Mar 14, 2014 at 22:03
  • I also had the maximum timestamp approach in mind, but then I wonder if there's a command that can give me that number regardless of architecture (64-bit, 32-bit, etc.). Commented Mar 14, 2014 at 22:07
  • @Alois, I don't think there is a maximum value that sort can handle. A prudent implementation would avoid converting the string to a number internally, so length wouldn't be a problem. You can still compare numbers with an approach similar to a pure lexical comparison, which would be more robust than a conversion. You would then only be limited in extreme cases, perhaps where a number has INT_MAX digits (if the system hasn't already run out of memory). Commented Mar 14, 2014 at 22:45
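
To make the last comment concrete, here is a small sketch of comparing two numbers as strings rather than converting them. It is not how any particular sort is implemented; compare_digits is a hypothetical helper that assumes non-negative decimal integers without leading zeros.

```shell
# compare_digits A B: print -1, 0 or 1 depending on whether A is less than,
# equal to or greater than B. Hypothetical helper; assumes non-negative
# decimal integers without leading zeros. No numeric conversion is done,
# so the number of digits is not limited by the shell's integer width.
compare_digits() {
    cd_a=$1
    cd_b=$2
    if [ "${#cd_a}" -lt "${#cd_b}" ]; then
        echo -1                       # fewer digits means smaller
    elif [ "${#cd_a}" -gt "${#cd_b}" ]; then
        echo 1                        # more digits means larger
    elif [ "$cd_a" = "$cd_b" ]; then
        echo 0
    else
        # Same length: byte-wise (C locale) order matches numeric order.
        cd_first=$(printf '%s\n%s\n' "$cd_a" "$cd_b" | LC_ALL=C sort | head -n 1)
        if [ "$cd_first" = "$cd_a" ]; then echo -1; else echo 1; fi
    fi
}

# Usage: compare the original guess with a value too large for 64-bit arithmetic.
compare_digits 99999999999999 9223372036854775808000
```

The length check does most of the work; the lexical comparison is only needed when both strings have the same number of digits.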
