I have a big .gz file. I would like to split it into 100 smaller gzip files, that can each be decompressed by itself. In other words: I am not looking for a way of chopping up the .gz file into chunks that would have to be put back together to be able to decompress it. I want to be able to decompress each of the smaller files independently.

Can it be done without recompressing the whole file?

Can it be done if the original file is compressed with --rsyncable? ("Cater better to the rsync program by periodically resetting the internal structure of the compressed data stream." sounds like these reset points might be good places to split at and probably prepend a header.)

Can it be done for any of the other compressed formats? I would imagine bzip2 would be doable - as it is compressed in blocks.

  • Have you tried split -b? Commented Jan 17, 2017 at 23:24
  • @GeorgeVasiliou It will not result in smaller gzip files that can be decompressed. Commented Jan 17, 2017 at 23:39
  • The answer to your first question is no, this has been covered in Delete last line of gz file. The answer is probably no with most compressed formats, since what you're asking for goes against compression. I think the answer is also no with gzip --rsyncable given that “gunzip cannot tell the difference” (if you could find a place to split, you could tell that there is a place to split). It might be doable with bzip2 because of its peculiar block feature. Commented Jan 17, 2017 at 23:58
  • This may help: stackoverflow.com/a/22628945/4941495 Just let the standard input stream be the output of gzip -d -c bigfile.gz. Commented Jan 18, 2017 at 0:18
  • What you want is almost just a tarball with gzip files inside (not big.tar.gz but big-gzs.tar). Then all, or only a few, of the files can be extracted and decompressed. I have not tried extracting only the last file from a tarball, but I guess tar can "fast forward" the way a tape drive can (a sketch of this idea follows these comments). Commented Feb 8, 2017 at 12:00
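
A rough sketch of that "tarball of gzip members" idea, assuming the individually gzipped pieces already exist (the part_*.gz names are made up; they could be produced as in the answers below):

    tar -cf big-gzs.tar part_*.gz     # plain tar; the members are already compressed
    tar -xf big-gzs.tar part_ac.gz    # later, pull out a single member by name...
    gzip -dc part_ac.gz > part_ac     # ...and decompress it on its own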

2 Answers

Splitting and joining the big file works, but it is impossible to decompress individual pieces of the compressed file, because essential information is distributed across the whole data stream. Another way: split the uncompressed file and compress the individual parts. Then you can decompress each piece on its own. But why? You have to merge all the decompressed parts before further processing anyway.
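
For example, with GNU split's --filter option, each part can be gzipped on the fly (a sketch; the file names and the choice of exactly 100 parts come from the question, everything else is an assumption):

    gzip -dc big.gz > big                                  # needs disk space for the uncompressed data
    split -n 100 --filter='gzip > "$FILE.gz"' big part_    # 100 chunks, each piped through gzip
    gzip -dc part_aa.gz | head                             # every part decompresses on its own

Note that split -n needs a regular file (it has to know the total size), which is why the data is decompressed to disk first; with split -b and a fixed chunk size you could pipe straight from gzip -dc instead.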

  • Fun fact: When you have the individually compressed parts (using gzip or xz), you may do concatenation and decompression, or decompression and concatenation. The order doesn't matter (see the sketch after these comments). Commented Feb 24, 2017 at 9:13
  • Maybe, it depends on the data. If you split and compress disk images, you have a chance to recover parts of the filesystem. If you first compress and then split, you definitely have no chance. Commented Feb 24, 2017 at 9:24
  • No, and that was not my premise either. I just said that the order in which you do concatenation and decompression, when you have individually compressed parts, does not matter (this is due to the compressed file formats). If you compress first and then split, you obviously need to recombine first. Commented Feb 24, 2017 at 9:26
  • Oh, that's cool. It works, even though every part contains an individual file header! Commented Feb 24, 2017 at 9:36
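
A quick way to convince yourself of that ordering claim (a sketch; the part_*.gz names are hypothetical):

    cat part_*.gz | gzip -dc > whole_a                      # concatenate, then decompress
    for f in part_*.gz; do gzip -dc "$f"; done > whole_b    # decompress, then concatenate
    cmp whole_a whole_b                                      # identical output

This works because a gzip stream may consist of several members, and gzip -dc simply decompresses them all in sequence.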

Unless I am mistaken, I think this is not possible without altering your file and losing the ability to rebuild and decompress the big file, because you would lose the metadata (header and trailer) of the original big compressed file, and that metadata does not exist for each of your small files.

But you could create a wrapper that does the following (a rough shell sketch follows the list):

  1. (optional) compress the big file
  2. split your big file into 100 small chunks
  3. compress each of your small chunks with gzip
  4. decompress each chunk with gzip
  5. concatenate the chunks back into the big file
  6. (optional) decompress the big file
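
A rough shell sketch of the split/restore part of such a wrapper (steps 2 to 5; "bigfile" and the chunk_ names are made up):

    split -n 100 --filter='gzip > "$FILE.gz"' bigfile chunk_   # steps 2-3: split and gzip each chunk
    cat chunk_*.gz | gzip -dc > bigfile.rebuilt                # steps 4-5: decompress and concatenate
    cmp bigfile bigfile.rebuilt                                # check the round trip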

Note: I am not sure about your purpose... saving storage? saving time on network transfers? limited disk space? What is your underlying need?

Best Regards

  • I want to take a gzip file and decompress it in parallel. And before you say pigz, please test how well DEcompressing works on a 64-core machine (a parallel-decompression sketch follows these comments). Commented Oct 27, 2020 at 22:49
  • I was trying to say the same thing as ingopingo, but with my poor English. As for the proposal: yes, it is not interesting in itself, it was just an explanation. Better to use the simple way: compress what you want into small independent archives, and afterwards make one big archive with the compression of your choice. You can compress some files that are already compressed (not sure you will gain a lot, but you can organize things differently, and add a password or special security, why not). --rsyncable seems to be used only by rsync when transferring data, to avoid retransferring the whole archive. The basic… Commented Nov 17, 2021 at 11:09
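
For the parallel-decompression use case mentioned in the first comment above, once the data exists as independent gzip members, something along these lines should work (a sketch; the part_*.gz names and the job count are assumptions):

    printf '%s\n' part_*.gz | xargs -P 64 -n 1 gzip -d    # decompress up to 64 parts at a time, in place
    # or, keeping the output ordered as one stream (needs GNU parallel):
    # parallel -k gzip -dc ::: part_*.gz > whole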
