With the GNU implementations of `du`, `awk` and `xargs`, to work with arbitrary file names, you can do:

    (
    cd ~/foo &&
      du --block-size=1 -l0d1 |
      awk -v RS='\0' -v ORS='\0' '
        $1 < 50*1024 && !/^[0-9]+\t\.$/ && sub("^[^\t]+\t", "")' |
      xargs -r0 echo rm -rf --
    )
That is:

- Specify a block size, as otherwise which one GNU `du` uses depends on the environment. `--block-size=1` guarantees maximum precision (you get the disk usage in number of bytes).
- Use `-0` to work with NUL-delimited records (NUL being the only character that may not occur in a file path).
- Use `-d1` to report the cumulative disk usage of directories down to depth 1 only (depth 0, i.e. `.`, is excluded with `!/^[0-9]+\t\.$/` in `awk`).
- Use `-l` to make sure a file's disk usage is accounted against every directory it appears in as an entry, not just the first one encountered.

Remove the `echo` (dry-run) to actually do it.
Or with `perl` instead of `gawk` (here with `-l` added so the output is NUL-delimited as well, and with the depth-0 `.` entry excluded explicitly):

    perl -0lne 'print $2 if m{(\d+)\t(.*)}s && $1 < 50<<10 && $2 ne "."'
POSIXly, you'd need something like:

    (
    unset -v BLOCK_SIZE BLOCKSIZE DU_BLOCK_SIZE
    cd ~/foo &&
      LC_ALL=C POSIXLY_CORRECT=1 find . ! -name . -prune -type d -exec sh -c '
        for dir do
          du -s "$dir" | awk "{exit (\$1 < 50*1024/512 ? 41 : 0)}"
          [ "$?" -eq 41 ] && echo rm -rf "$dir"
        done' sh {} +
    )
(The `unset -v BLOCK_SIZE BLOCKSIZE DU_BLOCK_SIZE` and `POSIXLY_CORRECT=1` are there for GNU `du`, to make sure it uses 512-byte units as POSIX requires.)
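The `awk` exit-status trick can be checked in isolation: in 512-byte units, the 50 KiB threshold is `50*1024/512 = 100` blocks, and `awk`'s exit status (41 here, an arbitrary value distinct from 0) carries the comparison result back to the shell:

```shell
# Pretend du reported 42 blocks for some directory:
printf '42\t./dir\n' | awk '{exit ($1 < 50*1024/512 ? 41 : 0)}'
echo "$?"   # prints 41: 42 blocks is under the 100-block threshold
```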