Stéphane Chazelas
  • 585.1k
  • 96
  • 1.1k
  • 1.7k

If you want the cumulative disk usage (as your use of du suggests) of the regular files that are over 60 days old, and it only needs to be portable to GNU and busybox systems, you can do:

find . -type f -mtime +59 -print0 |
  xargs -r0 stat -c '%D:%i %b' | awk '
    !seen[$1]++ {sum += $2}
    END {print sum * 512}'

(And yes, you need -mtime +59 for files older than 60 x 24 hours: find discards the fractional part of the age in days before comparing, so -mtime +60 would not match a file that is 60.9 days old, as that is rounded down to 60 days and 60 is not greater than 60.)
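To see that rounding in practice, here is a small sketch. It assumes GNU touch (for the relative -d date) and mktemp -d, and the file name old is made up for the illustration:

```shell
# Sketch only: relies on GNU touch -d and on mktemp -d being available.
dir=$(mktemp -d)
touch -d '-60 days -12 hours' "$dir/old"   # roughly 60.5 days old

# The age is rounded down to 60 whole days; 60 is not greater than 60: no match.
plus60=$(find "$dir" -type f -mtime +60)
# 60 is greater than 59: the file is matched.
plus59=$(find "$dir" -type f -mtime +59)

printf -- '-mtime +60: [%s]\n-mtime +59: [%s]\n' "$plus60" "$plus59"
rm -rf "$dir"
```

The first pair of brackets should come out empty and the second should contain the path of the file.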

That reports the total in bytes. Hard links are counted only once (as GNU du does; busybox du doesn't when the hard links are passed as separate arguments, as opposed to being found while traversing a single directory argument). However, like du, it won't detect the cases where data is shared between files that are not hard links of each other, such as files copied with cp --reflink=always on file systems like btrfs, or data deduplicated by the file system itself.
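The hard link handling comes down to the !seen[$1]++ condition in awk. As a minimal sketch, here is the same awk code fed with made-up lines in the format that stat -c '%D:%i %b' prints (the device and inode numbers are invented, and the first two lines stand for two hard links to one file):

```shell
# Invented sample input: "device:inode blocks", two links to inode 1001.
total=$(printf '%s\n' 'fd01:1001 96' 'fd01:1001 96' 'fd01:2002 8' | awk '
  !seen[$1]++ {sum += $2}   # true only the first time a device:inode pair shows up
  END {print sum * 512}')   # blocks are 512-byte units, so convert to bytes
echo "$total"
```

The 96 blocks of the duplicated entry are added once, so the result is (96 + 8) x 512 = 53248.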

That should be equivalent to the GNU-specific:

find . -type f -mtime +59 -print0 |
  du -cB1 --files0-from=- |
  awk 'END{print $1}'

POSIXly, and assuming all files are on the same file system, you could do:

LC_ALL=C LS_BLOCK_SIZE=512 BLOCKSIZE=512 POSIXLY_CORRECT=1 \
  find . -type f -mtime +59 -exec ls -nisqd {} + | awk '
    !seen[$1]++ {sum += $2}
    END {print sum * 512}'

(with LS_BLOCK_SIZE=512 BLOCKSIZE=512 POSIXLY_CORRECT=1 to work around the fact that some ls implementations, like GNU ls, are not POSIX-compliant by default and would otherwise report block counts in units other than 512 bytes).

After (here on a GNU system):

$ seq 10000 > a
$ truncate -s14T a
$ ln a b
$ touch -d '-60 days' a
$ BLOCKSIZE=1 ls -lis --full-time
total 98304
59944369 49152 -rw-rw-r-- 2 me me 15393162788864 2019-07-29 09:49:25.933 +0100 a
59944369 49152 -rw-rw-r-- 2 me me 15393162788864 2019-07-29 09:49:25.933 +0100 b
$ date --iso-8601=s
2019-09-27T09:50:03+01:00
$ du -h
52K     .

All give me 49152, which is the cumulative disk usage of both a and b, but is different from the sum of their apparent sizes (28 TiB) or the sum of their individual disk usages (49152 x 2).

(Note that the 52K above also includes the disk usage of the current directory itself (., 4 KiB in my case).)

For the sum of the apparent sizes:

find . -type f -mtime +59 -print0 |
  xargs -r0 stat -c %s | awk -v sum=0 '
    {sum += $0}; END{print sum}'

Or with GNU du:

find . -type f -mtime +59 -print0 |
  du -cbl --files0-from=- |
  awk 'END{print $1}'

Or POSIXly (here without the restriction about single file system):

LC_ALL=C find . -type f -mtime +59 -exec ls -nqd {} + |
  awk -v sum=0 '{sum += $5}; END {print sum}'

On the above example, they all give 30786325577728 (28 TiB).
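That figure can be checked by hand with shell arithmetic. This is only a sanity check of the numbers from the example, and it assumes a shell whose arithmetic uses at least 64-bit integers:

```shell
# Assumes 64-bit shell arithmetic (true of most current sh implementations).
tib=$((1024 * 1024 * 1024 * 1024))
size=$((14 * tib))    # apparent size set by truncate -s14T
both=$((2 * size))    # a and b are each counted with their full apparent size
echo "$size $both $((both / tib))"
```

It prints 15393162788864 30786325577728 28, matching the size shown by ls and the total above.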
