I have a top directory ds237 which has multiple sub-directories under it as below:

ds237/
├── dataset_description.json
├── derivatives
├── sub-01
├── sub-02
├── sub-03
├── sub-04
├── sub-05
├── sub-06
├── sub-07
├── sub-08
├── sub-09
├── sub-10
├── sub-11
├── sub-12
├── sub-13
├── sub-21
├── sub-22
├── sub-23
├── sub-24
├── sub-25
├── sub-26
├── sub-27
├── sub-28
├── sub-29

I am trying to split ds237 into multiple zip files, grouped by size and named after their contents. For example, sub01-07.zip would contain sub-01 to sub-07, and sub08-13.zip would contain sub-08 to sub-13.

I have written logic that builds a list of sub-directories, e.g. [sub-01, sub-02, sub-03, sub-04, sub-05], such that the total size of all sub-directories in the list does not exceed 5 GB.

My question: how can I write a function that zips these sub-directories (given as a list) into a destination zip file with a proper name? Basically, I want to write a function like this:

def zipit([list of subdirs], 'path/to/zipfile/sub*-*.zip'):

In Linux I generally achieve this with:

'zip -r compress/sub01-08.zip ds237/sub-0[1-8]'

4 Answers

Looking at https://stackoverflow.com/a/1855118/375530, you can re-use that answer's function to add a directory to a ZipFile.

import os
import zipfile


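def zipdir(path, ziph):
    # ziph is an open zipfile.ZipFile handle
    for root, dirs, files in os.walk(path):
        for file in files:
            full_path = os.path.join(root, file)
            # Store the path relative to the parent of `path`, so each entry
            # starts with the directory's own name (e.g. sub-01/...)
            ziph.write(full_path,
                       os.path.relpath(full_path, os.path.join(path, '..')))


def zipit(dir_list, zip_name):
    # The context manager closes the archive even if zipdir raises
    with zipfile.ZipFile(zip_name, 'w', zipfile.ZIP_DEFLATED) as zipf:
        for directory in dir_list:
            zipdir(directory, zipf)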

The zipit function should be called with your pre-chunked list and a given name. You can use string formatting if you want to use a programmatic name (e.g. "path/to/zipfile/sub{}-{}.zip".format(start, end)).
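As an illustration of the programmatic name, here is a minimal sketch that derives it from the first and last directory in the chunk. The helper name `zip_name_for` and the `out_dir` default are hypothetical, and the sketch assumes the list is already sorted and that every directory name ends in digits, as in `sub-01`.

```python
import os
import re


def zip_name_for(dir_list, out_dir="compress"):
    """Build a name like 'compress/sub01-05.zip' from the first and last directory.

    Assumes dir_list is sorted and each directory name ends in digits
    (e.g. 'sub-01'); out_dir is a hypothetical output folder.
    """
    def number_of(path):
        # normpath strips a trailing slash so basename is never empty
        match = re.search(r"(\d+)$", os.path.basename(os.path.normpath(path)))
        if match is None:
            raise ValueError(f"no trailing number in {path!r}")
        return match.group(1)

    start, end = number_of(dir_list[0]), number_of(dir_list[-1])
    return os.path.join(out_dir, "sub{}-{}.zip".format(start, end))
```

The result can then be passed as the second argument to `zipit`, for example `zipit(chunk, zip_name_for(chunk))`.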

4 Comments

The above script creates the zip file without the parent directory in the stored paths. Say I zip /Users/aba/ds100/sub-0[1-6] into sub01-06.zip; when I uncompress the zip, it should generate the path ds100/sub-01, and likewise for the other directories.
You can also change the relpath to go two directories up from path. So change os.path.join(path, '..') to os.path.join(path, '..', '..') and it should work.
It does the job partially. When I uncompress sub01-06.zip and sub07-09.zip, ideally they should extract into ds100/sub-01, ds100/sub-02, ds100/sub-03, ds100/sub-04, ds100/sub-05, ds100/sub-06, ds100/sub-07, ds100/sub-08 and ds100/sub-09. However, the above script with the changes you suggested creates two different ds100 directories.
Not sure what you're seeing. I ran a similar test and was able to extract both zips to fill in the ds100 directory. It may be down to how your unzip tool is configured. You can also use unzip zip_file.zip -d output_directory to unzip the file zip_file.zip to output_directory. This would also be an alternative to changing the code to put ds100 in there: you would just specify the output directory as ds100.

You can use subprocess calling 'zip' and passing the paths as arguments
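A minimal sketch of that approach, assuming the `zip` command-line tool is installed and on the PATH. The function names `build_zip_command` and `zip_with_cli` are hypothetical. Because no shell is involved, a glob such as `ds237/sub-0[1-8]` is not expanded, so the directories are passed explicitly.

```python
import subprocess


def build_zip_command(zip_path, dir_list):
    # -r recurses into each directory, as in the question's command line
    return ["zip", "-r", zip_path, *dir_list]


def zip_with_cli(zip_path, dir_list, cwd=None):
    # check=True raises CalledProcessError if zip exits with a non-zero status;
    # cwd controls which directory the relative paths are resolved against
    subprocess.run(build_zip_command(zip_path, dir_list), cwd=cwd, check=True)
```

Passing the arguments as a list rather than a single string avoids quoting problems with unusual directory names.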

1 Comment

I intend to do this in a Pythonic way.

The following will give you a zip file whose top-level folder is ds100:

import os
import zipfile    

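def zipit(folders, zip_filename):
    # Store paths relative to the directory two levels above the first folder
    # (/Users/aba in the example below), so every entry starts with ds100/
    base = os.path.join(folders[0], '../..')

    # The context manager closes the archive even if a write fails
    with zipfile.ZipFile(zip_filename, 'w', zipfile.ZIP_DEFLATED) as zip_file:
        for folder in folders:
            for dirpath, dirnames, filenames in os.walk(folder):
                for filename in filenames:
                    full_path = os.path.join(dirpath, filename)
                    zip_file.write(full_path, os.path.relpath(full_path, base))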


folders = [
    "/Users/aba/ds100/sub-01",
    "/Users/aba/ds100/sub-02",
    "/Users/aba/ds100/sub-03",
    "/Users/aba/ds100/sub-04",
    "/Users/aba/ds100/sub-05"]

zipit(folders, "/Users/aba/ds100/sub01-05.zip")

For example sub01-05.zip would have a structure similar to:

ds100
├── sub-01
|   ├── 1
|       ├── 2
|   ├── 1
|   ├── 2
├── sub-02
    ├── 1
        ├── 2
    ├── 1
    ├── 2
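One way to check that an archive really starts with a single ds100 folder is to inspect the stored names without extracting anything. This is only a sketch; the helper name `top_level_entries` is hypothetical.

```python
import zipfile


def top_level_entries(zip_filename):
    """Return the sorted first path components of all names stored in the archive."""
    with zipfile.ZipFile(zip_filename) as zf:
        # Names stored in a zip archive use '/' as the separator
        return sorted({name.split("/", 1)[0] for name in zf.namelist()})
```

If every chunked archive reports the same single top-level entry, extracting them into one directory should merge their contents under that folder.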

To zip folders in batches, building on the previous answer, you can use the following:

import os
from zipfile import ZipFile, ZIP_DEFLATED

base_dir = "."
base_zip_dir = f"{base_dir}/zip"
target_dir = f"{base_dir}/data"
folders_per_zip = 500

os.makedirs(base_zip_dir, exist_ok=True)

def zipdir(path, ziph):
    for root, _, files in os.walk(path):
        for file in files:
            ziph.write(os.path.join(root, file),
                       os.path.relpath(os.path.join(root, file), os.path.join(path, '..')))

def batch_zip(folder_list, folders_per_zip, zip_dir):
    # Slice the list into chunks; the last chunk may be shorter, so no folder is dropped
    for i, start in enumerate(range(0, len(folder_list), folders_per_zip), start=1):
        folders = folder_list[start:start + folders_per_zip]
        zip_filename = f"{zip_dir}/{i}.zip"
        with ZipFile(zip_filename, 'w', ZIP_DEFLATED) as zipf:
            for folder in folders:
                # Skip folders that have no files directly inside them
                if any(os.path.isfile(os.path.join(folder, f)) for f in os.listdir(folder)):
                    zipdir(folder, zipf)
        print(f"Zip file {zip_filename} created.")

folders = [f.path for f in os.scandir(target_dir) if f.is_dir()]
batch_zip(folders , folders_per_zip, base_zip_dir)
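The question groups directories by total size rather than by count. A rough sketch of that grouping follows, with hypothetical helper names `dir_size` and `chunk_by_size`. It assumes the limit applies to the uncompressed size on disk; the resulting zip files will usually be smaller.

```python
import os


def dir_size(path):
    """Total size in bytes of all files under path."""
    total = 0
    for root, _, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


def chunk_by_size(folders, max_bytes, size_of=dir_size):
    """Greedily group folders, in order, so each group stays within max_bytes.

    A single folder larger than max_bytes still gets a group of its own.
    """
    chunks, current, current_size = [], [], 0
    for folder in folders:
        size = size_of(folder)
        # Start a new group when adding this folder would exceed the limit
        if current and current_size + size > max_bytes:
            chunks.append(current)
            current, current_size = [], 0
        current.append(folder)
        current_size += size
    if current:
        chunks.append(current)
    return chunks
```

Each returned group could then be written to its own archive, for instance with `max_bytes=5 * 1024**3` for the 5 GB limit mentioned in the question.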
