
I had a large (~60G) compressed file (tar.gz).

I used split to break it into 4 parts and then cat to join them back together.

However, when I now try to estimate the size of the uncompressed file, it turns out to be smaller than the original. How is this possible?

$ gzip -l myfile.tar.gz 
         compressed        uncompressed  ratio uncompressed_name
        60680003101          3985780736 -1422.4% myfile.tar

1 Answer


This is caused by the size of the field used to store the uncompressed size in gzipped files: it is only 32 bits wide, so it can only represent sizes below 4 GiB, and for anything larger it holds the size modulo 2³². Larger files are still compressed and uncompressed correctly, but gzip -l reports an incorrect uncompressed size in versions 1.11 and older.
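To see the wrap-around directly, the stored field can be read from the last four bytes of the file. The following is a minimal sketch, not part of gzip itself: the function names gzip_isize and candidate_sizes are made up for illustration, and it assumes a single-member gzip file and a shell with 64-bit arithmetic.

```shell
#!/bin/sh
# Hypothetical helper: print the 32-bit size field (ISIZE) stored in the
# last four bytes of a gzip file. The field is little-endian and holds the
# uncompressed size modulo 2^32. Assumes a single-member gzip file.
gzip_isize() {
    # od prints the four trailer bytes as unsigned decimal numbers.
    set -- $(tail -c 4 "$1" | od -An -tu1)
    echo $(( $1 + $2 * 256 + $3 * 65536 + $4 * 16777216 ))
}

# Hypothetical helper: list the first COUNT sizes that are consistent with
# a stored value, i.e. stored + k * 2^32 for k = 0, 1, 2, ...
candidate_sizes() {
    stored=$1
    count=$2
    k=0
    while [ "$k" -lt "$count" ]; do
        echo $(( stored + k * 4294967296 ))
        k=$(( k + 1 ))
    done
}
```

For example, gzip_isize myfile.tar.gz would print the same wrapped value that gzip -l reported, and candidate_sizes 3985780736 16 would list sizes the tarball could really have. The field alone cannot tell you which candidate is the right one.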

So splitting the tarball and reconstructing it didn't cause this, and shouldn't have affected the file. If you want to make sure, you can check its integrity with gzip -tv.

See "Fastest way of working out uncompressed size of large GZIPPED file" for more details, and the gzip manual:

The gzip format represents the input size modulo 2³², so the uncompressed size and compression ratio are listed incorrectly for uncompressed files 4 GiB and larger.
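If you need the real uncompressed size, one approach that does not depend on the 32-bit field is to decompress to a pipe and count the bytes. This is a rough sketch; the function name gzip_true_size is made up for illustration, and on a file of this size it will take a while because the whole archive has to be decompressed.

```shell
#!/bin/sh
# Hypothetical helper: print the actual uncompressed size in bytes by
# streaming the decompressed data through wc. Nothing is written to disk,
# but the entire file is read and decompressed.
gzip_true_size() {
    gzip -dc "$1" | wc -c | tr -d ' '
}
```

Running gzip_true_size myfile.tar.gz should then print the full size of myfile.tar rather than the wrapped value.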
