How can I create a compressed archive consisting of the largest files from multiple directories?

Question

I want to write a script which will create a single compressed-archive file consisting of the largest files above a certain threshold from multiple directories. For example, I would like to know how to take the 5 largest files above 2MB and put them into a compressed archive file called largestfile.tar.gz. Here is what I have so far:

du -a $path | sort -n -r | head -n 5 > diskspacefile.txt
file=$(cat diskspacefile.txt)
while read p; do
    filesize=echo $p | awk '{print $1 }'
    if [ $filesize > 2000000 ] 
    then
        zipfile=`echo $p | awk '{print $2 }'`
        tar -zcvf largestfile.tar.gz $zipfile
    fi
done

Unfortunately this does not appear to work. When I run it I either get only a single file in the archive or no files at all. For context, the directories I'm interested in applying this to are /root and /boot (i.e. these will be the values for the path variable in the code snippet).

du -a $path | sort -n -r | head -n 5 > diskspacefile.txt file=$(cat diskspacefile.txt) while read p; do filesize=echo $p | awk '{print $1 }'` if [ $filesize > 20000000 ] ; then zipfile=echo $p | awk '{print $2 }' tar -zcvf largestfile.tar.gz $zipfile fi done < diskspacefile.txt
– Suganthan Raj, Commented Jan 10, 2017 at 18:59
It looks like your target tar file is static (largestfile.tar.gz), so the same file gets overwritten on every iteration.
– franklinsijo, Commented Jan 10, 2017 at 19:10
@SuganthanRaj If you want all the files in a single archive you should use the -r option ("-r, --append: append files to the end of an archive"). Otherwise your tar command will create a new archive containing only a single file each time.
– Centimane, Commented Jan 10, 2017 at 19:25
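
To make the points raised in these comments concrete, here is a minimal sketch of the loop from the question reworked so that tar runs only once. Note that GNU tar refuses to append (-r) to a compressed archive, so the sketch builds the file list first instead. The function name collect_large and its arguments are hypothetical. It assumes the list file holds du -a output (a size in KiB, whitespace, then a path) and that paths contain no newlines.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the paths from a du -a listing whose size
# exceeds a threshold. Sizes are in du's default unit (KiB with GNU du).
collect_large() {
    local threshold_kib=$1 list=$2 size path
    # read puts the first field in "size" and the rest of the line in "path",
    # so paths containing spaces survive.
    while read -r size path; do
        # -gt compares integers; a bare > inside [ ] is an output redirection.
        # du -a also lists directories, so keep regular files only.
        if [ "$size" -gt "$threshold_kib" ] && [ -f "$path" ]; then
            printf '%s\n' "$path"
        fi
    done < "$list"
}
```

In bash, the result could then be handed to a single tar call, for example: tar -czf largestfile.tar.gz -T <(collect_large 2048 diskspacefile.txt)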

igal · Accepted Answer · 2017-11-21 06:48:04Z

First, notice that we can use find to generate a list of all the files larger than 2MB in a given directory:

find . -type f -size +2M

We want to extract the 5 largest files from this list. To do that we can use GNU find's -printf action to print both the file size (in bytes) and the file path. Here -maxdepth 1 also restricts the search to the top level of the directory:

find . -maxdepth 1 -type f -size +2M -printf '%s:%P\n'

Now we can sort these results by file size (in descending order) and keep the top five results from this ordered list:

find . -maxdepth 1 -type f -size +2M -printf '%s:%P\n' | sort -rnk1 | head -n 5

Next we remove the file size to retrieve the relative paths to the 5 largest files above 2MB in the current directory:

find . -maxdepth 1 -type f -size +2M -printf '%s:%P\n' | sort -rnk1 | head -n 5 | cut -d: -f2-

Finally we can pass this list of file paths to the tar command (using bash process substitution) in order to create a gzip-compressed archive of these files:

tar czf largestfile.tar.gz -T <(find . -maxdepth 1 -type f -size +2M -printf '%s:%P\n' | sort -rnk1 | head -n 5 | cut -d: -f2-)
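
The question asks about several directories (/root and /boot), while the commands above only look at the top level of the current directory. The following is a minimal sketch of how the same pipeline might be generalized. The function name archive_largest and its argument order are hypothetical, and it assumes GNU find, sort, head, cut, and tar. It searches recursively, prints %p (the path including the starting directory) instead of %P, and uses NUL-terminated records so that unusual file names are passed through intact.

```shell
#!/usr/bin/env bash
# Hypothetical helper: archive the N largest files above a size threshold
# found under one or more directories (GNU tools assumed).
archive_largest() {
    local archive=$1 count=$2 min_size=$3
    shift 3
    # The remaining arguments are the directories to search.

    # %s is the size in bytes and %p the path as find reached it;
    # each record ends with a NUL byte instead of a newline.
    find "$@" -type f -size "+$min_size" -printf '%s\t%p\0' |
        sort -z -rn |          # largest first, comparing the leading size
        head -z -n "$count" |  # keep the first N records
        cut -z -f2- |          # drop the size, keep the path
        tar --null -T - -czf "$archive"
}
```

For the case in the question this could be called as: archive_largest largestfile.tar.gz 5 2M /root /boot. With absolute directories, GNU tar strips the leading / from member names and prints a notice about it. The archive should be written outside the searched directories so that an older copy of it is not picked up as one of the large files.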
