I want to write a script which will create a single compressed-archive file consisting of the largest files above a certain threshold from multiple directories. For example, I would like to know how to take the 5 largest files above 2MB and put them into a compressed archive file called largestfile.tar.gz. Here is what I have so far:

du -a $path | sort -n -r | head -n 5 > diskspacefile.txt
file=$(cat diskspacefile.txt)
while read p; do
    filesize=echo $p | awk '{print $1 }'
    if [ $filesize > 2000000 ] 
    then
        zipfile=`echo $p | awk '{print $2 }'`
        tar -zcvf largestfile.tar.gz $zipfile
    fi
done

Unfortunately this does not appear to work. When I run it I either get only a single file in the archive or no files at all. For context, the directories I'm interested in applying this to are /root and /boot (i.e. these will be the values for the path variable in the code snippet).

  • The code is not complete. Commented Jan 10, 2017 at 18:47
  • du -a $path | sort -n -r | head -n 5 > diskspacefile.txt file=$(cat diskspacefile.txt) while read p; do filesize=echo $p | awk '{print $1 }'` if [ $filesize > 20000000 ] ; then zipfile=echo $p | awk '{print $2 }' tar -zcvf largestfile.tar.gz $zipfile fi done < diskspacefile.txt Commented Jan 10, 2017 at 18:59
  • Still at least done is missing. Commented Jan 10, 2017 at 19:02
  • looks like your target tar-file is static largestfile.tar.gz. For every iteration, same file is getting overwritten. Commented Jan 10, 2017 at 19:10
  • @SuganthanRaj If you want all the files in a single archive you should use the -r option: -r, --append append files to the end of an archive. Otherwise your tar command will create a new archive with only a single file in it each time. Commented Jan 10, 2017 at 19:25

1 Answer

First, notice that we can use find to generate a list of all the files larger than 2MB in a given directory:

find . -type f -size +2M

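We want to extract the 5 largest files from this list. To do that we can use the -printf option to print out both the file size in bytes (%s) and the file path relative to the starting directory (%P). Here -maxdepth 1 also restricts the search to the current directory itself, without descending into subdirectories: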

find . -maxdepth 1 -type f -size +2M -printf '%s:%P\n'

Now we can sort these results by file size (in descending order) and take the top five results from this ordered list:

find . -maxdepth 1 -type f -size +2M -printf '%s:%P\n' | sort -rnk1 | head -n 5

Next we remove the filesize to retrieve the relative paths to the 5 largest files above 2M in the current directory:

find . -maxdepth 1 -type f -size +2M -printf '%s:%P\n' | sort -rnk1 | head -n 5 | cut -d: -f2-

Finally we can pass this list of file paths to the tar command in order to create a gzip-compressed archive of these files (the <(...) process substitution requires bash or a similar shell):

tar czf largestfile.tar.gz -T <(find . -maxdepth 1 -type f -size +2M -printf '%s:%P\n' | sort -rnk1 | head -n 5 | cut -d: -f2-)
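
The question asks about several directories (/root and /boot), so here is a rough sketch of how the same pipeline might be wrapped up for more than one starting point. It assumes GNU find, sort, head, cut and tar, since it relies on -printf and on NUL-terminated records. The function name archive_largest and its argument order are made up for illustration. It prints %p (the path as found) instead of %P so that files from different directories keep distinct names, drops -maxdepth 1 so subdirectories are searched too, and uses NUL separators so that unusual file names (spaces, newlines) survive the pipeline:

```shell
#!/usr/bin/env bash
# Sketch only: assumes GNU find/sort/head/cut/tar (needs -printf, -z and --null).

# archive_largest ARCHIVE COUNT MIN_SIZE DIR...
# Hypothetical helper: archives the COUNT largest regular files matching
# find's -size test MIN_SIZE (e.g. +2M) found under the given directories.
archive_largest() {
    local archive=$1 count=$2 min_size=$3
    shift 3

    # %s = size in bytes, %p = path as found; each record ends with NUL.
    find "$@" -type f -size "$min_size" -printf '%s:%p\0' |
        sort -z -t: -k1,1nr |      # largest first, comparing the size field
        head -z -n "$count" |      # keep the top COUNT records
        cut -z -d: -f2- |          # drop the size, keep the path
        tar -czf "$archive" --null -T -   # read NUL-separated names from stdin
}
```

It could then be called as archive_largest largestfile.tar.gz 5 +2M /root /boot. With absolute paths like these, GNU tar strips the leading / from the member names and prints a warning about it. Because the archive is written by a single tar invocation, there is no need to append to it file by file.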
