
I want to create 100 virtual servers. They will be used for testing, so they should be easy to create and destroy.

  • They must be accessible through SSH from another physical machine (I provide the public ssh-key)
  • They must have their own IP-address and be accessible from another physical host as ssh I.P.n.o e.g. ssh 10.0.0.99 (IPv4 or IPv6, private address space OK, port-forwarding is not - so this may involve setting up a bridge)
  • They must have basic UNIX tools installed (preferably a full distro)
  • They must have /proc/cpuinfo, a root user, and a netcard (This is probably only relevant if the machine is not fully virtualized)
  • Added bonus if they can be made to run an X server that can be connected to remotely (using VNC or similar)

What is the fastest way (wall clock time) to do this given:

  • The host system runs Ubuntu 20.04 and has plenty of RAM and CPU
  • The LAN has a DHCP-server (it is also OK to use a predefined IP-range)
  • I do not care which Free virtualization technology is used (Containerization is also OK if the other requirements are met)

and what are the actual commands I should run/files I should create?

I have the feeling that given the right technology this is a 50 line job that can be set up in minutes.

The few lines can probably be split into a few bash functions:

install() {
  # Install needed software once
}
setup() {
  # Configure the virtual servers
}
start() {
  # Start the virtual servers
  # After this it is possible to do:
  #   ssh 10.0.0.99
  # from another physical server
}
stop() {
  # Stop the virtual servers
  # After this there are no running processes on the host server
  # and after this it is no longer possible to do:
  #   ssh 10.0.0.99
  # from another physical server
  # The host server returns to the state before running `start`
}
destroy() {
  # Remove the setup
  # After this the host server returns to the state before running `setup`
}

Background

For developing GNU Parallel I need an easy way to test running on 100 machines in parallel.

For other projects it would also be handy to be able to create a bunch of virtual machines, test some race conditions and then destroy the machines again.

In other words: This is not for a production environment and security is not an issue.

Docker

Based on @danielleontiev's notes below:

install() {
    # Install needed software once
    sudo apt -y install docker.io
    sudo groupadd docker
    sudo usermod -aG docker $USER
    # Logout and login if you were not in group 'docker' before
    docker run hello-world
}

setup() {
    # Configure the virtual servers
    mkdir -p my-ubuntu/ ssh/
    cp ~/.ssh/id_rsa.pub ssh/
    cat ssh/*.pub > my-ubuntu/authorized_keys
    cat >my-ubuntu/Dockerfile <<EOF
FROM ubuntu:bionic
RUN apt update && \
    apt install -y openssh-server
RUN mkdir /root/.ssh
COPY authorized_keys /root/.ssh/authorized_keys
# Run a blocking command so the container does not exit immediately after start.
CMD service ssh start && tail -f /dev/null
EOF
    docker build my-ubuntu -t my-ubuntu
}

start() {
    # start container number x..y
    servers_min=$1
    servers_max=$2
    
    testssh() {
        ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/tmp/known root@"$1" echo "'$1'" '`uptime`'
    }
    export -f testssh
    setup_bridge() {
    # OMG why is this so hard
    # Default interface must have IP-addr removed
    # bridge must have IP-addr + routing copied from $dif, so it takes over default interface
    # Why on earth could we not just: brctl addif dock0 $dif - and be done?
    default_interface=$(ip -4 route ls | grep default | grep -Po '(?<=dev )(\S+)')
    dif=$default_interface
    gw=$(ip -4 route ls | grep default | grep -Po '(?<=via )(\S+)')
    dif_ip=$(ip -4 route ls | grep default | grep -Po '(?<=src )(\S+)')
    echo Add bridge
    docker network create --driver bridge --subnet=172.20.0.0/16 --opt com.docker.network.bridge.name=dock0 net0
    # $dif must be up, but with no ip addr
    sudo ip addr flush dev $dif
    sudo brctl addif dock0 $dif
    sudo ifconfig dock0:ext $dif_ip
    sudo route add -net 0.0.0.0 gw $gw
    }
    # Start the containers
    startone() {
        id=$1
        net=$2
        docker run -d --rm --name ubuntu-$id-$net --network $net my-ubuntu
        docker inspect ubuntu-$id-$net
    }
    export -f startone

    setup_bridge
    echo Start containers
    seq $servers_min $servers_max | parallel startone {} net0 |
        # After this it is possible to do:
        #   ssh 10.0.0.99
        # from another physical server
        perl -nE '/"IPAddress": "(\S+)"/ and not $seen{$1}++ and say $1' |
    # Keep a list of the IP addresses in /tmp/ipaddr
    tee /tmp/ipaddr |
        parallel testssh
    docker ps
    route -n
}

stop() {
    # Stop the virtual servers
    # After this there are no running processes on the host server
    # and after this it is no longer possible to do:
    #   ssh 10.0.0.99
    # from another physical server
    # The host server returns to the state before running `start`
    echo Stop containers
    docker ps -q | parallel docker stop {} |
    perl -pe '$|=1; s/^............\n$/./'
    echo
    echo If any containers are remaining it is an error
    docker ps
    # Take down bridge
    docker network rm net0   # net0 was created in setup_bridge
    # Re-establish default interface
    dif=$default_interface
    sudo ifconfig $dif $dif_ip
    # Routing takes a while to be updated
    sleep 2
    route -n
}

destroy() {
    # Remove the setup
    # After this the host server returns to the state before running `setup`
    rm -rf my-ubuntu/
    docker rmi my-ubuntu
}

full() {
    install
    setup
    start 1 100
    stop
    destroy
}

$ time full
real    2m21.611s
user    0m47.337s
sys     0m31.882s

This takes up 7 GB RAM in total for running 100 virtual servers. So you do not even need to have plenty of RAM to do this.

It scales up to 1024 servers, after which the Docker bridge complains (probably because each Linux bridge device can have at most 1024 ports).

The script can be adapted to run 6000 containers (see the related question "Run > 1024 docker containers"), but at 6055 it blocks (https://serverfault.com/questions/1091520/docker-blocks-when-running-multiple-containers).
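Since a single bridge seems to top out at about 1024 ports, one way to go beyond that is to spread the containers over several Docker networks. The following is a minimal sketch of the bookkeeping only, and not necessarily how the linked answer does it. The helper names `net_for`, `subnet_for` and `plan`, the figure of 1000 containers per network, and the 172.20.0.0/16 and upward subnets are assumptions chosen for illustration. It does not attempt the LAN bridging that `setup_bridge` does.

```shell
# Assumption: stay safely below the 1024-port limit of one bridge.
per_net=1000

# Print the name of the Docker network that container number $1 (1-based) belongs to.
net_for() {
    echo "net$(( ($1 - 1) / per_net ))"
}

# Print a /16 subnet for network index $1 (0-based): 172.20.0.0/16, 172.21.0.0/16, ...
subnet_for() {
    echo "172.$(( 20 + $1 )).0.0/16"
}

# Print (but do not run) the commands needed for containers $1..$2.
plan() {
    first_net=$(( ($1 - 1) / per_net ))
    last_net=$(( ($2 - 1) / per_net ))
    n=$first_net
    while [ "$n" -le "$last_net" ]; do
        echo "docker network create --driver bridge --subnet=$(subnet_for "$n") net$n"
        n=$(( n + 1 ))
    done
    i=$1
    while [ "$i" -le "$2" ]; do
        echo "docker run -d --rm --name ubuntu-$i --network $(net_for "$i") my-ubuntu"
        i=$(( i + 1 ))
    done
}
```

`plan 1 2500` only prints the commands, so they can be reviewed first and then piped to `sh` or `parallel`. With these subnets the scheme stays inside the private 172.16.0.0/12 range for network indexes 0 to 11.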

Vagrant

Based on @Martin's notes below:

install() {
    # Install needed software once
    sudo apt install -y vagrant virtualbox
}
setup() {
    # Configure the virtual servers
    mkdir -p ssh/
    cp ~/.ssh/id_rsa.pub ssh/
    cat ssh/*.pub > authorized_keys
    cat >Vagrantfile <<'EOF'
Vagrant.configure("2") do |config|
  config.vm.box = "debian/buster64"
  (1..100).each do |i|
    config.vm.define "vm%d" % i do |node|
      node.vm.hostname = "vm%d" % i
      node.vm.network "public_network", ip: "192.168.1.%d" % (100+i)
    end
  end

  config.vm.provision "shell" do |s|
    ssh_pub_key = File.readlines("authorized_keys").first.strip
    s.inline = <<-SHELL
      mkdir /root/.ssh
      echo #{ssh_pub_key} >> /home/vagrant/.ssh/authorized_keys
      echo #{ssh_pub_key} >> /root/.ssh/authorized_keys
      apt-get update
      apt-get install -y parallel
    SHELL
  end
end
EOF
}
start() {
    testssh() {
        ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null root@"$1" echo "'$1'" '`uptime`'
    }
    export -f testssh
    # Start the virtual servers
    seq 100 | parallel --lb vagrant up vm{}
    # After this it is possible to do:
    #   ssh 192.168.1.111
    # from another physical server
    parallel testssh ::: 192.168.1.{101..200}
}
stop() {
    # Stop the virtual servers
    # After this there are no running processes on the host server
    # and it is no longer possible to do:
    #   ssh 192.168.1.111
    # from another physical server
    # The host server returns to the state before running `start`
    seq 100 | parallel vagrant halt vm{}
}
destroy() {
    # Remove the setup
    # After this the host server returns to the state before running `setup`
    seq 100 | parallel vagrant destroy -f vm{}
    rm -r Vagrantfile .vagrant/
}

full() {
    install
    setup
    start
    stop
    destroy
}

start gives a lot of warnings:

NOTE: Gem::Specification.default_specifications_dir is deprecated; use Gem.default_specifications_dir instead. It will be removed on or after 2020-02-01.

stop gives this warning:

NOTE: Gem::Specification.default_specifications_dir is deprecated; use Gem.default_specifications_dir instead. It will be removed on or after 2020-02-01.
Gem::Specification.default_specifications_dir called from /usr/share/rubygems-integration/all/gems/vagrant-2.2.6/lib/vagrant/bundler.rb:428.
NOTE: Gem::Specification.default_specifications_dir is deprecated; use Gem.default_specifications_dir instead. It will be removed on or after 2020-02-01.
Gem::Specification.default_specifications_dir called from /usr/share/rubygems-integration/all/gems/vagrant-2.2.6/lib/vagrant/bundler.rb:428.
/usr/share/rubygems-integration/all/gems/vagrant-2.2.6/plugins/kernel_v2/config/vm.rb:354: warning: Using the last argument as keyword parameters is deprecated; maybe ** should be added to the call
/usr/share/rubygems-integration/all/gems/vagrant-2.2.6/plugins/kernel_v2/config/vm_provisioner.rb:92: warning: The called method `add_config' is defined here
/usr/share/rubygems-integration/all/gems/vagrant-2.2.6/lib/vagrant/errors.rb:103: warning: Using the last argument as keyword parameters is deprecated; maybe ** should be added to the call
/usr/share/rubygems-integration/all/gems/i18n-1.8.2/lib/i18n.rb:195: warning: The called method `t' is defined here

Each virtual machine takes up 0.5 GB of RAM on the host system.

The Vagrant machines are much slower to start than the Docker containers above. The big difference is that the Vagrant machines are complete virtual machines, so they do not have to run the same kernel as the host.

12
  • I have avoided it being opinion based by putting in the measuring stick: fastest. So we can objectively test if the solution given is the fastest. Commented Jun 4, 2020 at 16:05
  • First thing I'd try is to spin up 50 docker containers; with replicas in docker-compose that's a one-liner once the configuration file is written, both for creation and deletion. Pick any distro image you like. You can run Xvfb (but that will use up more memory, and don't know how much "plenty of RAM" is). But both for GNU parallel and race condition tests you don't have to. I'll leave it as an exercise for you to figure out the actual commands. Commented Jun 4, 2020 at 16:21
  • @dirkt Plenty literally means plenty. Commented Jun 4, 2020 at 16:34
  • I would implement this using Ansible. Sorry, I did not have time for a full answer. Commented Jun 6, 2020 at 16:33
  • have you considered systemd-nspawn? it should suit the job well and also be fairly simple and fast to setup on a ubuntu 20.04 host Commented Jun 7, 2020 at 13:59

5 Answers

I think docker meets your requirements.

1) Install Docker (https://docs.docker.com/engine/install/). Make sure you have also completed the Linux post-installation steps (https://docs.docker.com/engine/install/linux-postinstall/).

2) I assume you have the following directory structure:

.
└── my-ubuntu
    ├── Dockerfile
    └── id_rsa.pub

1 directory, 2 files

id_rsa.pub is your public key; the Dockerfile is discussed below.

3) First, we build a Docker image. It is a template for the containers we are going to run; each container is an instance of the image.

4) To build the image we need a recipe. That is the Dockerfile:

FROM ubuntu:bionic
RUN apt update && \
    apt install -y openssh-server
RUN mkdir /root/.ssh
COPY id_rsa.pub /root/.ssh/authorized_keys

CMD service ssh start && tail -f /dev/null

  • FROM ubuntu:bionic defines our base image. You can find base images for Arch, Debian, Alpine, Ubuntu, etc. on hub.docker.com
  • The apt install part installs the SSH server
  • COPY from to copies our public key to the place where it will be in the container
  • Here you could add more RUN statements to do additional things: install software, create files, etc.
  • The last line is tricky. The first part starts the SSH server when the container starts, which is obvious, but the second part is important: it runs a blocking command that prevents the container from exiting immediately after start.
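The `tail -f /dev/null` trick works, but an alternative is to let the SSH daemon itself be the container's foreground process. Below is a minimal sketch, written as a shell function that generates the Dockerfile. The function name `write_dockerfile` is made up for this example, and it assumes id_rsa.pub already sits in the target directory as in step 2.

```shell
# Write a Dockerfile into directory $1 in which sshd runs in the foreground.
write_dockerfile() {
    mkdir -p "$1"
    cat > "$1/Dockerfile" <<'EOF'
FROM ubuntu:bionic
RUN apt update && \
    apt install -y openssh-server
# sshd needs its privilege separation directory; `service ssh start` normally creates it
RUN mkdir -p /root/.ssh /run/sshd
COPY id_rsa.pub /root/.ssh/authorized_keys
# -D keeps sshd in the foreground, so the container lives as long as sshd does
CMD ["/usr/sbin/sshd", "-D"]
EOF
}
```

After `write_dockerfile my-ubuntu`, the image is built the same way as in the next step. With this variant the container stops when sshd stops, so `docker ps` reflects whether SSH is actually available.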

5) docker build my-ubuntu -t my-ubuntu - builds image. The output of this command:

Sending build context to Docker daemon  3.584kB
Step 1/5 : FROM ubuntu:bionic
 ---> c3c304cb4f22
Step 2/5 : RUN apt update &&     apt install -y openssh-server
 ---> Using cache
 ---> 40c56d549c0e
Step 3/5 : RUN mkdir /root/.ssh
 ---> Using cache
 ---> c50d8b614b21
Step 4/5 : COPY id_rsa.pub /root/.ssh/authorized_keys
 ---> Using cache
 ---> 34d1cf4e9f69
Step 5/5 : CMD service ssh start && tail -f /dev/null
 ---> Using cache
 ---> a442db47bf6b
Successfully built a442db47bf6b
Successfully tagged my-ubuntu:latest

6) Let's run my-ubuntu (once again, my-ubuntu is the name of the image). This starts a container named my-ubuntu-1 from the my-ubuntu image:

docker run -d --rm --name my-ubuntu-1 my-ubuntu

Options:

  • -d detaches, i.e. runs the container in the background
  • --rm removes the container after it stops. This matters because when you deal with a lot of containers they can quickly clutter your disk.
  • --name sets the name of the container
  • my-ubuntu is the image we start from

7) The container is running. docker ps can prove this:

CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS                NAMES
ee6bc20fd820        my-ubuntu           "/bin/sh -c 'service…"   5 minutes ago       Up 5 minutes                             my-ubuntu-1

8) To execute command in the container run:

docker exec -it my-ubuntu-1 bash - to get into the container's bash. It is possible to provide any command

9) If running commands this way is not enough, run docker inspect my-ubuntu-1 and look for the IPAddress field. For me it is 172.17.0.2.

ssh root@172.17.0.2
Welcome to Ubuntu 18.04.4 LTS (GNU/Linux 5.6.15-arch1-1 x86_64)
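
Rather than reading the docker inspect output by eye, the address can be pulled out in a script. docker inspect accepts a Go template through -f, for example `docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' my-ubuntu-1`. The sketch below is a plain-text alternative that filters the JSON on stdin. The helper name `extract_ips` is an assumption, and it relies on the `"IPAddress": "..."` layout that docker inspect prints.

```shell
# Print each distinct, non-empty "IPAddress" value found in `docker inspect` JSON on stdin.
extract_ips() {
    grep -oE '"IPAddress": *"[0-9a-fA-F.:]+"' |   # keep only the non-empty address fields
        sed 's/.*"\([^"]*\)"$/\1/' |              # strip everything but the address itself
        awk '!seen[$0]++'                         # drop duplicates, keep first-seen order
}
```

Typical use would be `docker inspect my-ubuntu-1 | extract_ips`, or with several container names at once to collect a list of addresses.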

10) To stop container: docker stop my-ubuntu-1

11) Now it is possible to run 100 containers:

#!/bin/bash

for i in $(seq 1 100); do
    docker run -d --rm --name my-ubuntu-$i my-ubuntu
done

My docker ps:

... and so on ...
ee2ccce7f642        my-ubuntu           "/bin/sh -c 'service…"   46 seconds ago      Up 45 seconds                            my-ubuntu-20
9fb0bfb0d6ec        my-ubuntu           "/bin/sh -c 'service…"   47 seconds ago      Up 45 seconds                            my-ubuntu-19
ee636409a8f8        my-ubuntu           "/bin/sh -c 'service…"   47 seconds ago      Up 46 seconds                            my-ubuntu-18
9c146ca30c9b        my-ubuntu           "/bin/sh -c 'service…"   48 seconds ago      Up 46 seconds                            my-ubuntu-17
2dbda323d57c        my-ubuntu           "/bin/sh -c 'service…"   48 seconds ago      Up 47 seconds                            my-ubuntu-16
3c349f1ff11a        my-ubuntu           "/bin/sh -c 'service…"   49 seconds ago      Up 47 seconds                            my-ubuntu-15
19741651df12        my-ubuntu           "/bin/sh -c 'service…"   49 seconds ago      Up 48 seconds                            my-ubuntu-14
7a39aaf669ba        my-ubuntu           "/bin/sh -c 'service…"   50 seconds ago      Up 48 seconds                            my-ubuntu-13
8c8261b92137        my-ubuntu           "/bin/sh -c 'service…"   50 seconds ago      Up 49 seconds                            my-ubuntu-12
f8eec379ee9c        my-ubuntu           "/bin/sh -c 'service…"   51 seconds ago      Up 49 seconds                            my-ubuntu-11
128894393dcd        my-ubuntu           "/bin/sh -c 'service…"   51 seconds ago      Up 50 seconds                            my-ubuntu-10
81944fdde768        my-ubuntu           "/bin/sh -c 'service…"   52 seconds ago      Up 50 seconds                            my-ubuntu-9
cfa7c259426a        my-ubuntu           "/bin/sh -c 'service…"   52 seconds ago      Up 51 seconds                            my-ubuntu-8
bff538085a3a        my-ubuntu           "/bin/sh -c 'service…"   52 seconds ago      Up 51 seconds                            my-ubuntu-7
1a50a64eb82c        my-ubuntu           "/bin/sh -c 'service…"   53 seconds ago      Up 51 seconds                            my-ubuntu-6
88c2e538e578        my-ubuntu           "/bin/sh -c 'service…"   53 seconds ago      Up 52 seconds                            my-ubuntu-5
1d10f232e7b6        my-ubuntu           "/bin/sh -c 'service…"   54 seconds ago      Up 52 seconds                            my-ubuntu-4
e827296b00ac        my-ubuntu           "/bin/sh -c 'service…"   54 seconds ago      Up 53 seconds                            my-ubuntu-3
91fce445b706        my-ubuntu           "/bin/sh -c 'service…"   55 seconds ago      Up 53 seconds                            my-ubuntu-2
54c70789d1ff        my-ubuntu           "/bin/sh -c 'service…"   2 minutes ago       Up 2 minutes                             my-ubuntu-1

I can, for example, run docker inspect my-ubuntu-15, get its IP and connect to it over ssh, or use docker exec.

It is possible to ping containers from containers (install iputils-ping to reproduce):

root@5cacaf03bf89:~# ping 172.17.0.2 
PING 172.17.0.2 (172.17.0.2) 56(84) bytes of data.
64 bytes from 172.17.0.2: icmp_seq=1 ttl=64 time=1.19 ms
64 bytes from 172.17.0.2: icmp_seq=2 ttl=64 time=0.158 ms
64 bytes from 172.17.0.2: icmp_seq=3 ttl=64 time=0.160 ms
^C
--- 172.17.0.2 ping statistics ---

N.B. running containers from bash is a quick solution. If you would like a scalable approach, consider using Kubernetes or Swarm.

P.S. Useful commands:

  • docker ps
  • docker stats
  • docker container ls
  • docker image ls

  • docker stop $(docker ps -aq) - stops all running containers
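
Note that `docker stop $(docker ps -aq)` touches every container on the host, including unrelated ones. A minimal sketch that limits the stop to the test containers, assuming they are all named my-ubuntu-N as in the loop above; the function name `stop_test_containers` is made up for this example.

```shell
# Stop only the running containers whose name contains "my-ubuntu-".
stop_test_containers() {
    ids=$(docker ps -q --filter "name=my-ubuntu-")
    if [ -n "$ids" ]; then
        # $ids is intentionally unquoted: one argument per container ID
        docker stop $ids
    fi
}
```

Because the containers were started with --rm, stopping them also removes them.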

Also, follow the basics from docs.docker.com - it is one hour well spent for a better experience working with containers.

Additional:

The base image in the example is a really minimal image. It does not have a desktop environment or even Xorg. You could install them manually (adding packages to the RUN apt install ... section) or use an image that already has the software you need. Quick googling gives me this (https://github.com/fcwu/docker-ubuntu-vnc-desktop). I have never tried it, but I think it should work. If you definitely need VNC access, I can try to play around a bit and add info to the answer.

Exposing to local network:

This one may be tricky. I am sure it can be done with some obscure port forwarding, but the straightforward solution is to change the run script as follows:

#!/bin/bash

for i in $(seq 1 100); do
    docker run -d --rm -p $((10000 + i)):22 --name my-ubuntu-$i my-ubuntu
done

After that, the containers can be reached through the host machine's IP address and the mapped port:

ssh root@localhost -p 10001
The authenticity of host '[localhost]:10001 ([::1]:10001)' can't be established.
ECDSA key fingerprint is SHA256:erW9kguSvn1k84VzKHrHefdnK04YFg8eE6QEH33HmPY.
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added '[localhost]:10001' (ECDSA) to the list of known hosts.
Welcome to Ubuntu 18.04.4 LTS (GNU/Linux 5.6.15-arch1-1 x86_64)
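
Remembering 100 port numbers is awkward, so the mapping can be written into an ssh client configuration. This is a minimal sketch that only generates text; the function name `gen_ssh_config` and the alias scheme my-ubuntu-N are assumptions, and it presumes container i was started with -p $((10000 + i)):22 as in the loop above. It is still port forwarding underneath, only hidden behind host aliases.

```shell
# Print ssh_config entries for containers 1..$2 published on host $1,
# assuming container i was started with -p $((10000 + i)):22.
gen_ssh_config() {
    host=$1
    count=$2
    i=1
    while [ "$i" -le "$count" ]; do
        printf 'Host my-ubuntu-%d\n    HostName %s\n    Port %d\n    User root\n\n' \
            "$i" "$host" "$(( 10000 + i ))"
        i=$(( i + 1 ))
    done
}
```

For example, `gen_ssh_config 192.168.1.31 100 >> ~/.ssh/config` on the client machine would let `ssh my-ubuntu-42` reach container 42 through port 10042 of the Docker host.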
8
  • This is close to an answer. I can see, I have not made it clear, that port-forwarding is not enough: The virtual server must be accessible from another physical machine on its IP address without a port number (e.g. ssh 10.0.0.99). Commented Jun 8, 2020 at 6:17
  • I will try to do something with ports later today Commented Jun 8, 2020 at 7:32
  • docker network create --driver=bridge --ip-range=10.0.0.0/24 --subnet=10.0.0.0/16 --aux-address='ip1=10.0.0.1' -o "com.docker.network.bridge.name=br0" br0 , then add --network=br0 to the docker run command and find a way to either FORWARD with iptables or attach a network card to your br0 bridge Commented Jun 9, 2020 at 10:50
  • I have some problems with my local network and in fact I am only able to test on my machine. Maybe @oletange will try it Commented Jun 9, 2020 at 11:09
  • @BenjiBear Can you adapt the precise commands given the host has IP 192.168.1.31 on interface eno1? Commented Jun 9, 2020 at 16:57
5
  • create a virtual network

    ( either with virtualbox

    or by using docker , e.g.: docker network create --driver=bridge --ip-range=10.0.190.0/24 --subnet=10.0.0.0/16 --aux-address='ip1=10.0.190.1' --aux-address='ip2=10.0.190.2' --aux-address='ip3=10.0.190.3' -o "com.docker.network.bridge.name=br0" br0 )

  • if you want virtualbox/kvm :

    prepare a PXE/HTTP server and a distribution like SLAX or Alpine Linux. With SLAX and savechanges you could build a system with all software prepackaged. On the other hand, that is a lot of overhead, but with tools like Cluster SSH you can trigger your commands simultaneously by running

    cssh [email protected].{04..254} -p 22

  • when using docker: attach all containers to the named network , either via docker-compose or manually , you could also modify the CMD to run dropbear if you want to have ssh access

4
  • This does not seem to be a complete answer, but mostly ideas for an answer. I am looking for a complete answer. A complete answer would include all the actual commands with complete config files to run on a stock Ubuntu 20.04. Commented Jun 7, 2020 at 22:28
  • i am really sorry that i am not going to do your homework Commented Jun 8, 2020 at 10:33
  • @BashStack, lol. If I knew how to, I wouldn't mind doing some of Ole's "homework" so he could put the time to better use maintaining GNU Parallel Commented Jun 8, 2020 at 17:25
  • @OleTange @iruvar since @Ole found a way with docker: you can create a network named br0 ( the physical bridge will have the same name ) with the command above , and then just modify your current version in the post to have "docker run ... --network=br0 " ( or however you named it ) Commented Jun 8, 2020 at 19:31
5

You could use Vagrant for spinning up your test environments. Once you have written a Vagrantfile defining the distro to run, the network configuration etc., you can bring up the machines by running vagrant up <vmname>, or just vagrant up to fire all of them up. Vagrant supports various virtualization providers including VirtualBox, VMware, KVM, AWS, Docker,... Vagrant is able to spin up development environments quickly since it uses pre-built "box" files rather than installing each system from scratch. At the same time Vagrant allows you to run your custom provisioning for each VM using Ansible, Puppet, Chef, CFEngine or simply a short shell script. You can mix and match different distributions in the same Vagrantfile. SSH access is set up automatically. You can get access to a machine by running vagrant ssh <vmname>. Synced folders make it easy to bring files from your host system into your test environments.


Here are the steps in detail:

  1. Download and install Vagrant and your favorite virtualization provider:

    $ sudo apt install -y vagrant virtualbox
    
  2. Create Vagrantfile with the following content:

    Vagrant.configure("2") do |config|
      config.vm.box = "debian/buster64"
      (1..100).each do |i|
        config.vm.define "vm%03d" % i do |node|
          node.vm.hostname = "vm%03d" % i
          node.vm.network "public_network", ip: "192.168.1.%d" % (99 + i)
        end
      end
    
      config.vm.provision "shell" do |s|
        ssh_pub_key = File.readlines("#{Dir.home}/.ssh/id_rsa.pub").first.strip
        s.inline = <<-SHELL
          mkdir /root/.ssh
          echo #{ssh_pub_key} >> /home/vagrant/.ssh/authorized_keys
          echo #{ssh_pub_key} >> /root/.ssh/authorized_keys
          apt-get update
          apt-get install -y parallel
        SHELL
      end
    end
    
  3. Spin up the VMs:

    $ parallel vagrant up ::: vm{001..100}
    
  4. SSH to the VMs: The Vagrant way (using the key generated by Vagrant):

    $ vagrant ssh vm001
    

    Using your own key (which we installed into the VMs during the provisioning phase):

    $ ssh vagrant@<IP>
    

    Or to get root access:

    $ ssh root@<IP>
    
  5. You can suspend the VMs by running vagrant suspend and bring them up a few days later to continue testing (vagrant up). If you have many test environments but only limited disk space you can destroy some VMs and recreate them later.

  6. Destroy the VMs and delete the configuration:

    vagrant destroy -f
    rm -rf Vagrantfile .vagrant
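
Because the Vagrantfile above assigns addresses by a fixed rule, the name-to-address mapping can be reproduced on any other machine without asking Vagrant. A minimal sketch, assuming the same rule as in step 2 (VM number i gets 192.168.1.(99 + i)); the function name `vm_hosts` is made up for this example.

```shell
# Print "IP name" lines for vm001..vmNNN using the same rule as the Vagrantfile:
# VM number i gets 192.168.1.(99 + i).
vm_hosts() {
    i=1
    while [ "$i" -le "$1" ]; do
        printf '192.168.1.%d vm%03d\n' "$(( 99 + i ))" "$i"
        i=$(( i + 1 ))
    done
}
```

The output of `vm_hosts 100` could be appended to /etc/hosts on the remote machine, or the first column could be fed to parallel to run ssh against every VM. With this addressing rule the last octet runs out at 254, so it covers at most 155 VMs.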
    
4
  • This does not seem to be a complete answer, but mostly ideas for an answer. I am looking for a complete answer. A complete answer would include all the actual commands with complete config files to run on a stock Ubuntu 20.04. vagrant ssh <vmname> does not seem to meet the criteria of being accessible from a different physical machine using plain ssh. Commented Jun 7, 2020 at 22:30
  • I found some time to outline the steps for setting this up. With the configuration above you can also get direct SSH access from a different machine. The "Vagrant way" would be to let Vagrant set up SSH and the keys and then run vagrant ssh-config to get an SSH config snippet + a private key you can use to get access from outside. Commented Jun 9, 2020 at 2:45
  • This is getting closer to an answer. I have updated my question with how I interpret your notes. But I still cannot access the virtual machines on their IP-address from a remote machine. I hope you will update your answer showing where I got you wrong. Commented Jun 9, 2020 at 6:29
  • I was expecting your DHCP/DNS to take care of that based on the host name. I updated my answer to use static IPs. Commented Jun 10, 2020 at 6:02
1

This might be a job well suited for systemd-nspawn containers, except for the X server (unless Xvfb is enough). I made a couple of complete scripts, including basic network connectivity to the LAN.

I've made them along the lines of your skeleton script, and they are tailored for maximum speed of setting-up.

The first script builds containers based on Ubuntu 20.04, providing the same tools as in your docker attempt, as it seems you are happy with those for your use case. On a single-CPU Xeon Silver 4114 2.20 GHz (10 cores + HT) with 32 GB of RAM this script completes a full run from install to destroy of 100 containers in about 35 seconds, with a RAM usage of about 600 MB.

The second script builds containers that more closely resemble a true VM, with a fuller Ubuntu 20.04 distro including its own systemd and the typical service daemons like cron, rsyslog etc. This completes in under 3 minutes, with a footprint of about 3.3 GB for 100 "machines".

In both cases the great majority of time is spent in the setup phase, downloading/bootstrapping the image template etc.


First script, "docker-like" experience:

#!/bin/bash --
# vim: ts=4 noet

install() {
    [ -e /etc/radvd.conf ] || cat > /etc/radvd.conf <<EOF
interface bogus {
    IgnoreIfMissing on;
};
EOF
    apt -y install systemd-container debootstrap wget radvd
}

setup() {
    mkdir -p "$machines"

    # Fetch Ubuntu 20.04 basic system
    #debootstrap focal "$machines/$tmpl" # <-- either this, or the below wget + tar + mount
    wget -P "$machines" https://partner-images.canonical.com/core/focal/current/ubuntu-focal-core-cloudimg-amd64-root.tar.gz
    mkdir -p "$machines/$tmpl"
    tar -C "$machines/$tmpl" -xzf "$machines/ubuntu-focal-core-cloudimg-amd64-root.tar.gz"
    mount --bind /etc/resolv.conf "$machines/$tmpl/etc/resolv.conf"

    # Put our ssh pubkeys
    mkdir -p "$machines/$tmpl/root/.ssh"
    (shopt -s failglob; : ~/.ssh/*.pub) 2>/dev/null \
        && cat ~/.ssh/*.pub > "$machines/$tmpl/root/.ssh/authorized_keys"
    # Let nspawn use our parameterized hostname
    rm -f "$machines/$tmpl/etc/hostname"
    # Allow apt to function in chroot without complaints
    mount -o bind,slave,unbindable /dev "$machines/$tmpl/dev"
    mount -o bind,slave,unbindable /dev/pts "$machines/$tmpl/dev/pts"
    export DEBIAN_FRONTEND=noninteractive LANG=C.UTF-8
    chroot "$machines/$tmpl" sh -c 'apt-get update && apt-get install -y --no-install-recommends apt-utils'
    # No init-scripts are to be run while in chroot
    cat >> "$machines/$tmpl/usr/sbin/policy-rc.d" <<'EOF'
#!/bin/sh --
exit 101
EOF
    chmod +x "$machines/$tmpl/usr/sbin/policy-rc.d"
    # Install additional packages for the use case
    chroot "$machines/$tmpl" apt-get install -y --no-install-recommends \
            bash-completion iproute2 vim iputils-ping \
            openssh-server
    # Uncomment these to allow root in, with password "let-me-in"
#   echo 'PermitRootLogin yes' > "$machines/$tmpl/etc/ssh/sshd_config.d/allow-root-with-password.conf" \
#       && chroot "$machines/$tmpl" chpasswd <<<'root:let-me-in'
    umount -l "$machines/$tmpl/dev/pts" "$machines/$tmpl/dev" "$machines/$tmpl/etc/resolv.conf"
}

start() {
    # Connect to physical LAN by building a temporary bridge over the specified physical interface
    # Of course this is not required if the interface facing the LAN is already a bridge interface, in which case you can just use that as "$mybr" and skip this pipeline
    # TODO: check on possible "$mybr" existence, and/or being already a bridge, and/or enslaving of "$intf" already in place
    # NOTE: be careful how the interface in "$intf" is named, as here it is used in sed's regex
    ip -o -b - <<EOF | awk '{print "route list " $4}' | ip -b - | sed "s/^/route replace /;s/ $intf / $mybr /g" | ip -b -
link add $mybr type bridge
link set $mybr up
link set $intf master $mybr
addr show $intf up
EOF

    # Advertise a temporary private IPv6 network in LAN
    ipv6pfx='fddf:' # this arbitrary pfx is not properly compliant, but very handy for quick use in simple LANs
    cat >> /etc/radvd.conf <<EOF
### $tmpl
interface $mybr {
    AdvSendAdvert on;
    prefix $ipv6pfx:/64 {
        AdvValidLifetime 7200;
        AdvPreferredLifetime 3600;
    };
};
###
EOF
    systemctl start radvd

    for i in $(seq "$vmnum"); do
        # Spawn containers that don't persist on disk
        systemd-run --unit="$tmpl-mini-$i" --service-type=notify \
            systemd-nspawn --notify-ready=no --register=no --keep-unit --kill-signal=RTMIN+3 \
                -M "${tmpl:0:8}$i" \
                -D "$machines/$tmpl" --read-only --link-journal no \
                --overlay +/etc::/etc --overlay +/var::/var \
                --network-bridge="$mybr" \
                --as-pid2 sh -c 'ip link set host0 up && ip addr add '"$ipv6pfx:$i/64"' dev host0 && mkdir -p /run/sshd && exec /usr/sbin/sshd -D' \
                & # Run in bg and wait later; this way we allow systemd's parallel spawning
                # Below is a --as-pid2 alternative for using dhcp, but beware bombing on LAN's dhcp server
                #--as-pid2 sh -c 'udhcpc -fbi host0; mkdir -p /run/sshd && exec /usr/sbin/sshd -D' \
    done
    wait
}

stop() {
    systemctl stop "$tmpl-mini-*"
    systemctl stop radvd
    ip link del "$mybr" 2>/dev/null
    netplan apply
    sed -i "/^### $tmpl/,/^###$/d" /etc/radvd.conf
}

destroy() {
    rm -rf "$machines/$tmpl"
    rm -f "$machines/ubuntu-focal-core-cloudimg-amd64-root.tar.gz"
}

: "${machines:=/var/lib/machines}" # default location for systemd-nspawn containers
: "${vmnum:=100}" # how many containers to spawn
: "${intf:?specify the physical interface facing the LAN to connect to}"
: "${tmpl:?specify directory basename under $machines to store the containers\' OS template into}"
: "${mybr:=$tmpl-br}" # the temporary bridge to LAN will be named this

install
setup
start
stop
destroy

Once you have spawned the "docker-like" containers you can handle them through systemctl. They are all spawned as systemd services named <template-name>-mini-<number>.

You may enter a shell into any one of them either through ssh or via nsenter -at <pid-of-any-process-belonging-to-a-specific-container>
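
For the ssh route, the start function of the first script gives container number i the address "$ipv6pfx:$i". A minimal sketch that lists those addresses so they can be fed to ssh or parallel; the function name `container_addrs` is made up for this example, and the prefix is copied from the script.

```shell
# Same prefix as in start(); with the trailing colon, "$ipv6pfx:$i" becomes fddf::<i>.
ipv6pfx='fddf:'

# Print the addresses of containers 1..$1, one per line.
container_addrs() {
    i=1
    while [ "$i" -le "$1" ]; do
        echo "$ipv6pfx:$i"
        i=$(( i + 1 ))
    done
}
```

From another machine on the LAN, something like `container_addrs 100 | parallel ssh root@{} uptime` would then reach every container. The decimal number is read as a hexadecimal group of the IPv6 address, so the addresses stay unique and valid for up to 9999 containers.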


Second script, "vm-like" experience:

#!/bin/bash --
# vim: ts=4 noet

install() {
    [ -e /etc/radvd.conf ] || cat > /etc/radvd.conf <<EOF || return
interface bogus {
    IgnoreIfMissing on;
};
EOF
    apt -y install systemd-container debootstrap radvd || return
}

setup() {
    mkdir -p "$machines/$tmpl" || return
    # Fetch Ubuntu 20.04 base system
    debootstrap focal "$machines/$tmpl" || return

    # Allow apt to function in chroot without complaints
    trap "umount -l $machines/$tmpl/dev/pts" RETURN
    mount -o bind,slave,unbindable /dev/pts "$machines/$tmpl/dev/pts" || return
    # Put our ssh pubkeys
    mkdir -p "$machines/$tmpl/root/.ssh" || return
    (shopt -s failglob; : ~/.ssh/*.pub) 2>/dev/null \
        && { cat ~/.ssh/*.pub > "$machines/$tmpl/root/.ssh/authorized_keys" || return; }
    # Let nspawn use our parameterized hostname
    rm -f "$machines/$tmpl/etc/hostname" || return
    # Enable container's systemd-networkd, it blends automatically with host's systemd-networkd
    chroot "$machines/$tmpl" systemctl enable systemd-networkd || return
    # Make provision for static addresses passed along at start time (see start phase below)
    cat > "$machines/$tmpl/etc/networkd-dispatcher/carrier.d/$tmpl-static-addrs.sh" <<'EOF' || return
#!/bin/bash --
[ -n "$static_ipaddrs" ] && printf 'addr add %s dev host0\n' ${static_ipaddrs//,/ } | ip -b -
EOF
    chmod +x "$machines/$tmpl/etc/networkd-dispatcher/carrier.d/$tmpl-static-addrs.sh" || return
    # Uncomment this to mind about updates and security
#   printf 'deb http://%s.ubuntu.com/ubuntu/ focal-%s main\n' \
#      archive updates security security \
#      >> "$machines/$tmpl/etc/apt/sources.list" || return
    # Uncomment this to consider [uni|multi]verse packages
#   sed -i 's/$/ universe multiverse/' "$machines/$tmpl/etc/apt/sources.list" || return

    export DEBIAN_FRONTEND=noninteractive LANG=C.UTF-8
    chroot "$machines/$tmpl" apt-get update || return
    # To upgrade or not to upgrade? that is the question..
    #chroot "$machines/$tmpl" apt-get -y upgrade || return
    # Install additional packages for the use case
    chroot "$machines/$tmpl" apt-get install -y --no-install-recommends \
            bash-completion \
            openssh-server \
        || return
    # Uncomment these to allow root in, with password "let-me-in"
#   echo 'PermitRootLogin yes' > "$machines/$tmpl/etc/ssh/sshd_config.d/allow-root-with-password.conf" || return
#   chroot "$machines/$tmpl" chpasswd <<<'root:let-me-in' || return
}

start() {
    # For full-system modes we need inotify limits greater than default even for just a bunch of containers
    (( (prev_max_inst = $(sysctl -n fs.inotify.max_user_instances)) < 10*vmnum )) \
        && { sysctl fs.inotify.max_user_instances=$((10*vmnum)) || return 1; }
    (( (prev_max_wd = $(sysctl -n fs.inotify.max_user_watches)) < 40*vmnum )) \
        && { sysctl fs.inotify.max_user_watches=$((40*vmnum)) || return 1; }
    [ -s "$machines/prev_inotifys" ] || declare -p ${!prev_max_*} > "$machines/prev_inotifys"

    # Connect to physical LAN by building a temporary bridge over the specified physical interface
    # Of course this is not required if the interface facing the LAN is already a bridge interface, in which case you can just use that as "$mybr" and skip this pipeline
    # TODO: check on possible "$mybr" existence, and/or being already a bridge, and/or enslaving of "$intf" already in place
    # NOTE: be careful how the interface in "$intf" is named, as here it is used in sed's regex
    ip -o -b - <<EOF | awk '{print "route list " $4}' | ip -b - | sed "s/^/route replace /;s/ $intf / $mybr /g" | ip -b -
link add $mybr type bridge
link set $mybr up
link set $intf master $mybr
addr show $intf up
EOF

    # Advertise a temporary private IPv6 network in LAN
    ipv6pfx='fddf:' # this arbitrary pfx is not properly compliant, but very handy for quick use in simple LANs
    cat >> /etc/radvd.conf <<EOF || return
### $tmpl
interface $mybr {
    AdvSendAdvert on;
    prefix $ipv6pfx:/64 {
        AdvValidLifetime 7200;
        AdvPreferredLifetime 3600;
    };
};
###
EOF
    systemctl start radvd

    for i in $(seq "$vmnum"); do
        # Spawn containers that don't persist on disk
        systemd-run --unit="$tmpl-full-$i" --service-type=notify \
            systemd-nspawn --notify-ready=yes -b \
                -M "${tmpl:0:8}$i" \
                -D "$machines/$tmpl" --read-only --link-journal no \
                --overlay +/etc::/etc --overlay +/var::/var \
                --network-bridge="$mybr" \
                --capability=all --drop-capability=CAP_SYS_MODULE \
                "systemd.setenv=static_ipaddrs=$ipv6pfx:$i/64" \
                & # Run in bg and wait later; this way we allow systemd's parallel spawning
                # All capabilities allowed and no users isolation provide an experience which is
                # closer to a true vm (though with less security)
                # The comma separated list of static addresses will be set by our script in networkd-dispatcher
    done
    wait
}

stop() {
    # Machine names are built from the first 8 characters of "$tmpl" (see start)
    systemctl stop "machine-${tmpl:0:8}*" "$tmpl-full-*"
    systemctl stop radvd
    ip link del "$mybr" 2>/dev/null
    netplan apply
    sed -i "/^### $tmpl/,/^###$/d" /etc/radvd.conf
    # restore previous inotify limits
    source "$machines/prev_inotifys" || return
    rm -f "$machines/prev_inotifys"
    (( prev_max_wd > 0 )) && sysctl fs.inotify.max_user_watches="$prev_max_wd"
    (( prev_max_inst > 0 )) && sysctl fs.inotify.max_user_instances="$prev_max_inst"
}

destroy() {
    rm -rf "$machines/$tmpl"
}

: "${machines:=/var/lib/machines}" # default location for systemd-nspawn machines
: "${vmnum:=100}" # how many containers to spawn
: "${intf:?specify the physical interface facing the LAN to connect to}"
: "${tmpl:?specify directory basename under $machines to store the containers' OS template into}"
: "${mybr:=$tmpl-br}" # the temporary bridge will be named this

install || exit
setup || { destroy; exit 1; }
start || { stop; exit 1; }
stop
destroy

Once you have spawned "vm-like" containers, on the host you can use either machinectl or systemctl to handle them. Examples:

  • machinectl shell <container-name> provides a handy way to get a shell into a specific container
  • machinectl on its own, or systemctl list-machines, lists the running containers
  • machinectl poweroff <container-name>, or also systemctl stop machine-<container-name> stops a container (you can also do poweroff from a shell inside the container)

For both scripts I went for IPv6 connectivity, as it has native features for host autoconfiguration. If all the hosts on your LAN are friendly IPv6 citizens, they will self-configure a temporary address in the fddf::/64 network, which my scripts set up on the fly and advertise to the entire LAN (and of course share with all containers).

The "fddf::/64" IPv6 prefix is entirely arbitrary, though it falls within the range reserved for private networks (unique local addresses, fc00::/7). I chose it because it is handy: from any host on your LAN you can just do ssh root@fddf::<vm-number>.

However, it is not strictly compliant with how such prefixes should be generated. Should you wish to generate a compliant private prefix, please read RFC 4193, particularly section 3.2.2.
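
For reference, a compliant prefix has the form fdXX:XXXX:XXXX::/48, where the 40 bits after "fd" are a pseudo-random Global ID. The sketch below simply draws those 40 bits from /dev/urandom, which is a simplification and not the sample algorithm given in the RFC (that one hashes a timestamp and an EUI-64 with SHA-1). The function name gen_ula_prefix is made up for this example.

```bash
#!/bin/bash --
# Hypothetical helper: print a random unique-local /48 prefix (fdXX:XXXX:XXXX::/48).
# Assumption: 40 random bits from /dev/urandom stand in for the RFC's sample algorithm.
gen_ula_prefix() {
    local hex
    # 5 random bytes -> 10 lowercase hex digits
    hex=$(od -An -N5 -tx1 /dev/urandom | tr -d ' \n') || return
    printf 'fd%s:%s:%s::/48\n' "${hex:0:2}" "${hex:2:4}" "${hex:6:4}"
}
```

Stripping the trailing ":/48" from the result (for instance with p=$(gen_ula_prefix); ipv6pfx=${p%:/48}) should give a value shaped like the scripts' ipv6pfx, at the cost of longer addresses to type than fddf::<vm-number>.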

In any case, said "vm-number" is from 1 to whatever number of guests you'll spawn, and I've left them decimal in an IPv6 hextet so you have room for up to 9999 addresses.
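
To make the numbering concrete, here is a minimal sketch that maps a guest number to its address and runs a command on the first N guests over ssh. The helper names container_addr and on_all are hypothetical; the fddf:: prefix and the root login are the ones used by the scripts above.

```bash
#!/bin/bash --
# Hypothetical helper: address of guest number $1 under the scripts' fddf::/64 prefix.
# The decimal number is written as-is into the last hextet, hence the 9999 limit.
container_addr() {
    printf 'fddf::%s\n' "$1"
}

# Hypothetical helper: run a command on guests 1..$1, one after the other
# usage: on_all <count> <command...>
on_all() {
    local n=$1 i
    shift
    for i in $(seq "$n"); do
        # -n keeps ssh from swallowing the caller's stdin
        ssh -n "root@$(container_addr "$i")" "$@" || return
    done
}
```

For example, on_all 100 uname -n would print the hostname of each of 100 guests, assuming your key was installed at setup time.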

Of course you may also use IPv4 addresses, and here you have two options:

  • static addresses: just add them to the command line that spawns the containers (see comments there); however you will have to implement a way to pre-compute such addresses as per your needs
  • dhcp: the "docker-like" script has a commented-out line for enabling dhcp; the "vm-like" script already does it of its own accord, as per the default behavior of systemd in Ubuntu 20.04
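
For the static option, one possible way to pre-compute the addresses is to add the guest number to a base address. The sketch below does only that arithmetic; the function name ipv4_for and the 10.0.0.0 base are assumptions for illustration, and you would have to pick a range that does not collide with your DHCP server's pool.

```bash
#!/bin/bash --
# Hypothetical helper: print the IPv4 address that lies $2 addresses after base address $1.
# Assumes $1 is a plain dotted quad without leading zeros in its octets.
ipv4_for() {
    local base=$1 n=$2 a b c d num
    IFS=. read -r a b c d <<<"$base"
    # Fold the four octets into one integer, add the offset, then unfold
    num=$(( (a << 24) + (b << 16) + (c << 8) + d + n ))
    printf '%d.%d.%d.%d\n' \
        $(( (num >> 24) & 255 )) $(( (num >> 16) & 255 )) \
        $(( (num >> 8) & 255 )) $(( num & 255 ))
}
```

In the "vm-like" script the result could then be appended to the comma-separated list, for instance static_ipaddrs=$ipv6pfx:$i/64,$(ipv4_for 10.0.0.0 "$i")/24, where the /24 netmask is again just an assumption about your LAN.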

I would suggest self hosted Gitlab. It has kubernetes and docker integrated out of the box and will allow for automation of pretty much everything you describe needing to do.
