Stéphane Chazelas

With the GNU implementations of du, awk and xargs, to work with arbitrary file names, you do:

(
  cd ~/foo &&
    du --block-size=1 -l0d1 |
      awk -v RS='\0' -v ORS='\0' '
        $1 < 50*1024 && !/^[0-9]+\t\.$/ && sub("^[^\t]+\t", "")' |
      xargs -r0 echo rm -rf --
)

That is:

  • specify a block size, as otherwise which one GNU du uses depends on the environment; 1 guarantees maximum precision (you get disk usage in bytes).
  • Use -0 to work with NUL-delimited records (NUL being the only character that may not be found in a file path).
  • -d1 to only get the cumulative disk usage of dirs up to depth 1 (depth 0 (.) is excluded with !/^[0-9]+\t\.$/ in awk).
  • -l to make sure a file's disk usage is counted against every directory it appears in as an entry, not just the first.

Remove the echo (dry-run) to actually do it.
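As a sanity check, here is the same pipeline run against a throwaway tree (the big and small directory names are made up for this demo; it assumes GNU du, awk and xargs, and that a 100 KiB file actually occupies its blocks on your filesystem):

```shell
# Build a scratch tree: one directory over the 50 KiB limit, one under it.
demo=$(mktemp -d)
mkdir "$demo/big" "$demo/small"
dd if=/dev/zero of="$demo/big/data" bs=1024 count=100 2>/dev/null  # ~100 KiB of real blocks
printf 'tiny\n' > "$demo/small/note"                               # well under 50 KiB

(
  cd "$demo" &&
    du --block-size=1 -l0d1 |
      awk -v RS='\0' -v ORS='\0' '
        $1 < 50*1024 && !/^[0-9]+\t\.$/ && sub("^[^\t]+\t", "")' |
      xargs -r0 echo rm -rf --        # prints: rm -rf -- ./small
)
rm -rf -- "$demo"
```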

Or with perl instead of gawk (as a drop-in replacement for the awk command in the pipeline):

perl -0ne 'print $2 if m{(\d+)\t(.*)}s && $1 < 50<<10'

POSIXly, you'd need something like:

(
  unset -v BLOCK_SIZE BLOCKSIZE DU_BLOCKSIZE
  cd ~/foo &&
   LC_ALL=C POSIXLY_CORRECT=1 find . ! -name . -prune -type d -exec sh -c '
     for dir do
       du -s "$dir" | awk "{exit \$1 < 50*1024/512 ? 41 : 0}"
       [ "$?" -eq 41 ] && echo rm -rf "$dir"
     done' sh {} +
)

(The unset -v BLOCK_SIZE BLOCKSIZE DU_BLOCKSIZE and POSIXLY_CORRECT=1 are there to make sure GNU du uses 512 as the block size, as POSIX requires; 50*1024/512 is the same 50 KiB limit expressed in 512-byte units.)
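The POSIX variant can be sanity-checked on the same kind of scratch tree (mktemp is not POSIX but near-universal; note the awk script is in double quotes with \$1 escaped, so it survives inside the single-quoted sh -c script):

```shell
demo=$(mktemp -d)
mkdir "$demo/big" "$demo/small"
dd if=/dev/zero of="$demo/big/data" bs=1024 count=100 2>/dev/null
printf 'tiny\n' > "$demo/small/note"

(
  unset -v BLOCK_SIZE BLOCKSIZE DU_BLOCKSIZE
  cd "$demo" &&
   LC_ALL=C POSIXLY_CORRECT=1 find . ! -name . -prune -type d -exec sh -c '
     for dir do
       du -s "$dir" | awk "{exit \$1 < 50*1024/512 ? 41 : 0}"
       [ "$?" -eq 41 ] && echo rm -rf "$dir"
     done' sh {} +
)
rm -rf -- "$demo"
```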
