KubeCon EU 2018

This repository contains the demo code for my KubeCon EU 2018 talk about automating GPU infrastructure for Kubernetes on Container Linux.

Prerequisites

You will need a Google Cloud account with available quota for NVIDIA GPUs.

Getting Started

Edit the require.tf Terraform file: uncomment the relevant lines and fill in the details for your Google Cloud project:

$EDITOR require.tf

Modify the provided terraform.tfvars file to suit your project:

$EDITOR terraform.tfvars

Running

  1. create cluster:

    terraform apply --auto-approve
  2. get nodes:

    export KUBECONFIG="$(pwd)"/assets/auth/kubeconfig
    watch -n 1 kubectl get nodes
  3. create GPU manifests:

    kubectl apply -f manifests
  4. check status of driver installer:

    kubectl logs $(kubectl get pods -n kube-system | grep nvidia-driver-installer | awk '{print $1}') -c modulus -n kube-system -f
  5. check status of device plugin:

    kubectl logs $(kubectl get pods -n kube-system | grep nvidia-gpu-device-plugin | awk '{print $1}' | head -n1) -n kube-system -f
  6. verify worker node has allocatable GPUs:

    kubectl describe node $(kubectl get nodes | grep worker | awk '{print $1}')
  7. let's inspect the GPU workload:

    less manifests/darkapi.yaml
  8. let's see if the GPU workload has been scheduled:

    watch -n 2 kubectl get pods
    kubectl logs $(kubectl get pods | grep darkapi | awk '{print $1}') -f
  9. for fun, let's test the GPU workload (this assumes a local checkout of darkapi with its client at ~/code/darkapi/client):

    export INGRESS=$(terraform output | grep ingress_static_ip | awk '{print $3}')
    ~/code/darkapi/client http://$INGRESS/api/yolo
  10. finally, let's clean up:

    terraform destroy --auto-approve
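
The pod and node lookups in the steps above repeat the same grep/awk pattern, so they can be wrapped in small helpers. The following is a minimal sketch, not part of this repository: `first_pod_name` and `allocatable_gpus` are hypothetical names, and the sketch assumes the device plugin advertises GPUs under the `nvidia.com/gpu` resource name and that `kubectl describe node` prints an `Allocatable:` section with indented entries. Both helpers read from stdin, so they can be tried on saved output without a live cluster.

```shell
#!/bin/sh
# Sketch only: both helpers are hypothetical and not part of this repository.

# Print the name of the first pod whose name contains the given substring.
# Expects `kubectl get pods` table output on stdin.
first_pod_name() {
    # NR > 1 skips the header row; index() does a literal substring match
    # on the first column (the pod name).
    awk -v pat="$1" 'NR > 1 && index($1, pat) { print $1; exit }'
}

# Print the allocatable GPU count from `kubectl describe node` output on stdin.
# Assumes the resource is named nvidia.com/gpu (an assumption, see above).
allocatable_gpus() {
    awk '
        /^Allocatable:/ { in_alloc = 1; next }
        # Any other unindented line ends the Allocatable section.
        /^[^ \t]/       { in_alloc = 0 }
        in_alloc && $1 == "nvidia.com/gpu:" { print $2; exit }
    '
}

# Intended use against a live cluster (not run here):
#   kubectl logs "$(kubectl get pods | first_pod_name darkapi)" -f
#   kubectl describe node "$(kubectl get nodes | first_pod_name worker)" | allocatable_gpus
```

Reading from stdin keeps the helpers independent of kubectl itself. If several pods or nodes match, only the first one is used, which mirrors the `head -n1` in step 5.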

Projects Leveraged In This Demo

Component                 URL
Kubernetes installer      https://github.com/poseidon/typhoon
GPU driver installer      https://github.com/squat/modulus
Kubernetes device plugin  https://github.com/kubernetes/kubernetes/blob/master/cluster/addons/device-plugins/nvidia-gpu/daemonset.yaml
Sample workload           https://github.com/squat/darkapi