Stéphane Chazelas

With the GNU implementations of du, awk and xargs, to work with arbitrary file names, you do:

(
  cd ~/foo &&
    du --block-size=1 -l0d1 |
      awk -v RS='\0' -v ORS='\0' '
        $1 < 50*1024 && !/^[0-9]+\t\.$/ && sub("^[^\t]+\t", "")' |
      xargs -r0 echo rm -rf --
)

That is:

  • specify a block size, as otherwise which one GNU du uses depends on the environment; 1 guarantees maximum precision (you get disk usage in bytes).
  • Use -0 to work with NUL-delimited records (NUL being the only character that may not be found in a file path).
  • -d1 to only get the cumulative disk usage of dirs up to depth 1 (depth 0 (.) is excluded with !/^[0-9]+\t\.$/ in awk).
  • -l to make sure a file's disk usage is counted against every directory it appears in as an entry, not just the first.

Remove the echo (dry-run) to actually do it.
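As a sanity check, here is the same pipeline run against a throwaway tree (the big and small directory names are made up for this demo; it assumes GNU du, awk and xargs, and that a 100 KiB file actually occupies its blocks on your filesystem):

```shell
# Build a scratch tree: one directory over the 50 KiB limit, one under it.
demo=$(mktemp -d)
mkdir "$demo/big" "$demo/small"
dd if=/dev/zero of="$demo/big/data" bs=1024 count=100 2>/dev/null  # ~100 KiB of real blocks
printf 'tiny\n' > "$demo/small/note"                               # well under 50 KiB

(
  cd "$demo" &&
    du --block-size=1 -l0d1 |
      awk -v RS='\0' -v ORS='\0' '
        $1 < 50*1024 && !/^[0-9]+\t\.$/ && sub("^[^\t]+\t", "")' |
      xargs -r0 echo rm -rf --        # prints: rm -rf -- ./small
)
rm -rf -- "$demo"
```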

Or with perl instead of gawk (as a drop-in replacement for the awk command in the pipeline):

perl -0ne 'print $2 if m{(\d+)\t(.*)}s && $1 < 50<<10'

POSIXly, you'd need something like:

(
  unset -v BLOCK_SIZE BLOCKSIZE DU_BLOCKSIZE
  cd ~/foo &&
   LC_ALL=C POSIXLY_CORRECT=1 find . ! -name . -prune -type d -exec sh -c '
     for dir do
       du -s "$dir" | awk "{exit \$1 < 50*1024/512 ? 41 : 0}"
       [ "$?" -eq 41 ] && echo rm -rf "$dir"
     done' sh {} +
)

(The unset -v BLOCK_SIZE BLOCKSIZE DU_BLOCKSIZE and POSIXLY_CORRECT=1 are there to make sure GNU du uses 512 as the block size, as POSIX requires; 50*1024/512 is the same 50 KiB limit expressed in 512-byte units.)
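The POSIX variant can be sanity-checked on the same kind of scratch tree (mktemp is not POSIX but near-universal; note the awk script is in double quotes with \$1 escaped, so it survives inside the single-quoted sh -c script):

```shell
demo=$(mktemp -d)
mkdir "$demo/big" "$demo/small"
dd if=/dev/zero of="$demo/big/data" bs=1024 count=100 2>/dev/null
printf 'tiny\n' > "$demo/small/note"

(
  unset -v BLOCK_SIZE BLOCKSIZE DU_BLOCKSIZE
  cd "$demo" &&
   LC_ALL=C POSIXLY_CORRECT=1 find . ! -name . -prune -type d -exec sh -c '
     for dir do
       du -s "$dir" | awk "{exit \$1 < 50*1024/512 ? 41 : 0}"
       [ "$?" -eq 41 ] && echo rm -rf "$dir"
     done' sh {} +
)
rm -rf -- "$demo"
```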
