
I am working on a research project that involves large video datasets (100s of GB, possibly multiple TB in the near future). I am fairly new to Linux, sysadmin work, and setting up servers, so please bear with me. I've provided quite a bit of info below; let me know if anything else would be helpful.

I am using Ubuntu, Docker (with docker-compose), nginx, Python 3.5, and Django 1.10.

Uploading a large-ish (60GB) dataset leads to the following error:

$ sudo docker-compose build
postgres uses an image, skipping
Building django
Step 1 : FROM python:3.5-onbuild
# Executing 3 build triggers...
Step 1 : COPY requirements.txt /usr/src/app/
 ---> Using cache
Step 1 : RUN pip install --no-cache-dir -r requirements.txt
 ---> Using cache
Step 1 : COPY . /usr/src/app
ERROR: Service 'django' failed to build: Error processing tar file(exit status 1): write /usr/src/app/media/packages/video_3/video/video_3.mkv: no space left on device

My files are on a drive with 500GB free, and the current dataset is only ~60GB.

I found this discussion on container size. Perhaps I am misunderstanding Docker, but I believe I just want my volumes to be larger, not the containers themselves, so this doesn't seem appropriate. It also doesn't use docker-compose, so I'm not sure how to apply it to my current setup.

Just to be clear: with help from this question, I am able to serve static files and media files with a small test set of data. (It's unclear to me whether they're served from the django container or the nginx container, as the data appears in both containers when I ssh in.)

How can I get my setup to handle this large amount of data? I would like to be able to upload additional data later, so if a solution exists that can do this without having to rebuild volumes all the time, that'd be swell.

My Setup

Directory Structure

film_web
├── docker-compose.yml
├── Dockerfile
├── film_grammar
│   ├── #django code lives here
├── gunicorn_conf.py
├── media
│   ├── #media files live here
├── nginx
│   ├── Dockerfile
│   └── nginx.conf
├── requirements.txt
└── static
    ├── #static files live here

docker-compose.yml

nginx:
  build: ./nginx
  volumes:
    - ./media:/usr/src/app/film_grammar/media
    - ./static:/usr/src/app/film_grammar/static
  links:
    - django
  ports:
    - "80:80"
  volumes_from:
    - django

django:
  build: .
  volumes:
    - ./film_grammar:/usr/src/app/film_grammar
  expose:
    - "8000"
  links:
    - postgres

postgres:
  image: postgres:9.3

film_web Dockerfile

FROM python:3.5-onbuild
ENV DJANGO_CONFIGURATION Docker
CMD ["gunicorn", "-c", "gunicorn_conf.py", "--chdir", "film_grammar", "fg.wsgi:application", "--reload"]

VOLUME /home/alexhall/www/film_web/static
VOLUME /home/alexhall/www/film_web/media

nginx Dockerfile:

FROM nginx
COPY nginx.conf /etc/nginx/nginx.conf

nginx.conf

worker_processes 1;

events {
    worker_connections   1024;
}

http {
    include /etc/nginx/mime.types;
    server {
        listen 80;
        server_name film_grammar_server;

        access_log /dev/stdout;
        error_log /dev/stdout info;

        location /static {
            alias /usr/src/app/film_grammar/static/;
        }

        location /media {
            alias /usr/src/app/film_grammar/media/;
        }


        location / {
            proxy_pass http://django:8000;
            proxy_set_header   Host $host;
            proxy_set_header   X-Real-IP $remote_addr;
            proxy_set_header   X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header   X-Forwarded-Host $server_name;
        }
    }
}

Thanks in advance for your help!

1 Comment

I'd suggest leaving the data out of the container and using a bind mount volume to get the data in (docker run -v <video data folder on host>:<video data folder on container>).

1 Answer

build starts by creating a tarball of the build context directory (in your case .) and sending that tarball to the Docker daemon. The tarball is created in a temp directory, I believe, which is probably why you're running out of space when trying to build: the ~60GB media directory is included in the context even though it ends up being mounted as a volume later.
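
To confirm where the space is going, you can check the size of the build context and the free space where the daemon works (the paths below assume a default install with the Docker root at /var/lib/docker):

$ du -sh . media static                  # size of the whole context and of the big directories
$ docker info | grep "Docker Root Dir"   # where the daemon stores images and build data
$ df -h /var/lib/docker                  # free space on that filesystem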

When you're working with large datasets, the recommended approach is to use a volume. You can use a bind mount volume to mount the files into the container directly from the host.
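
Here is a sketch in the same docker-compose syntax as the setup above (the host path /mnt/bigdrive/media is a hypothetical stand-in for wherever the videos actually live on the big drive):

django:
  build: .
  volumes:
    - ./film_grammar:/usr/src/app/film_grammar
    - /mnt/bigdrive/media:/usr/src/app/film_grammar/media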

Since you're providing the data with a volume, you'll want to exclude it from the image build context. To do this, create a .dockerignore file in the context directory (.) and add all the paths with large data (.git, media, static).
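
For this layout, a minimal .dockerignore might look like this (one pattern per line, relative to the build context; the media files still reach the containers through the volume mounts):

.git
media
static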

Once you've ignored the large directories, the build should work.

3 Comments

Thank you. I thought these lines in docker-compose.yml were already using a volume as you suggested: volumes: - ./media:/usr/src/app/film_grammar/media - ./static:/usr/src/app/film_grammar/static. Is this not the case?
Yes, you are using a volume, but volumes are only mounted when a container runs; they don't get used until after the build.
Thanks! Adding the big media and static folders to the .dockerignore did the trick, and now I understand the system a little better.
