- Create a per-host directory called overlays/$(overlay)/hostname/$(host.hostname) to hold files that ought to be included only for that host. Things like /etc/vpp/config/interface.vpp go there.
- Rename the "templates" directory to overlays/$(overlay)/common/.
- Render one after the other, so a file can exist in both common and hostname, the latter taking precedence.
- Remove the config for 'pubkeys' and instead just make these common/root/.ssh/* and common/home/ipng/.ssh/*.
- Split out bootstrap.vpp so that a per-host include can be overridden for interfaces.vpp.
- But keep a default common/etc/vpp/config/interface.vpp as a placeholder, so that VPP will still start even if the per-hostname override isn't provided.
- Generate a fresh output for 'default' on all machines.
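As a sketch only (the overlay name `bird` and hostname `vpp0-0` are borrowed from the examples further below, and the exact paths are assumptions), the resulting overlay tree could be laid out like this:

```
# Hypothetical layout after the restructuring described above; overlay "bird"
# and host "vpp0-0" are illustrative. A file under hostname/<host>/ overrides
# the same path under common/ because hostname/ is rendered last.
mkdir -p overlays/bird/common/etc/vpp/config
mkdir -p overlays/bird/common/root/.ssh overlays/bird/common/home/ipng/.ssh
mkdir -p overlays/bird/hostname/vpp0-0/etc/vpp/config

# Placeholder so VPP still starts when no per-hostname override is provided:
touch overlays/bird/common/etc/vpp/config/interface.vpp
# Per-host override, included only for vpp0-0:
touch overlays/bird/hostname/vpp0-0/etc/vpp/config/interface.vpp
```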
# IPng Networks Lab environment

## High level overview
There's a disk image on each hypervisor called the `proto` image, which serves as the base
image for all VMs on it. Every now and again, the `proto` image is updated (Debian, FRR and VPP),
and from that base image lab VMs are cloned, with local filesystem overrides put in place on
each clone. The lab is used, and when we're done with it, we simply destroy all clones. This
way, each time the lab is started, it is in a pristine state.

The `proto` image is shared among the hypervisors. Typically, maintenance will be performed
on one of the hypervisors, after which the `proto` image is snapshotted and copied to the other
machines.
## Proto maintenance
The main `vpp-proto` image runs on `hvn0.chbtl0.ipng.ch` with a VM called `vpp-proto`.
When you want to refresh the image, you can:
```
spongebob:~$ ssh -A root@hvn0.chbtl0.ipng.ch
SNAP=$(date +%Y%m%d)    ## 20221012
zfs snapshot ssd-vol0/vpp-proto-disk0@${SNAP}-before
virsh start --console vpp-proto
## Do the upgrades, make changes to vpp-proto's disk image
## You can always roll back to the -before image if you'd like to revert
virsh shutdown vpp-proto
zfs snapshot ssd-vol0/vpp-proto-disk0@${SNAP}-release
zrepl signal wakeup vpp-proto-snapshots
```
There is a `zrepl` running on this machine, which can pick up the snapshot by manually
waking up the daemon (see the last command above). Each of the hypervisors in the fleet
watches this replication endpoint, and when it sees new snapshots arrive, it does an
incremental pull of the data into its own ZFS filesystem as a snapshot. Old or currently
running labs will not be disrupted, as they are cloned off of older snapshots.
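For illustration only, the incremental pull that `zrepl` automates is conceptually the same as a manual `zfs send -i | zfs recv` between the previous and the new release snapshot. A hand-rolled sketch (the `OLD` snapshot date here is a made-up example) might look like:

```
# Hand-rolled equivalent of the zrepl incremental pull, run on a lab hypervisor.
# Illustrative only: in practice zrepl does this automatically.
OLD=20220901-release    # snapshot already present locally (example date)
NEW=20221013-release    # newly published release snapshot on the proto machine

ssh root@hvn0.chbtl0.ipng.ch \
  "zfs send -i ssd-vol0/vpp-proto-disk0@${OLD} ssd-vol0/vpp-proto-disk0@${NEW}" \
  | zfs recv ssd-vol0/hvn0.chbtl0.ipng.ch/ssd-vol0/vpp-proto-disk0
```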
You will find the image as `ssd-vol0/hvn0.chbtl0.ipng.ch/ssd-vol0/vpp-proto-disk0`:
```
spongebob:~$ ssh -A root@hvn0.lab.ipng.ch 'zfs list -t snap'
NAME                                                                     USED  AVAIL  REFER  MOUNTPOINT
ssd-vol0/hvn0.chbtl0.ipng.ch/ssd-vol0/vpp-proto-disk0@20221013-release     0B      -  6.04G  -
```
## Usage
There are three hypervisor nodes, each running one isolated lab environment:
- `hvn0.lab.ipng.ch` runs VPP lab0
- `hvn1.lab.ipng.ch` runs VPP lab1
- `hvn2.lab.ipng.ch` runs VPP lab2
Now that we have a base image (in the form of `vpp-proto-disk0@$(date)-release`), we can
make point-in-time clones of it and copy over any specifics (like IP addresses, hostname,
SSH keys, Bird/FRR configs, etc). We do this on the lab controller `lab.ipng.ch`, which:
- Looks on the hypervisor to see if there is a running VM, and if there is, bails
- Looks on the hypervisor to see if there is an existing cloned image, and if there is, bails
- Builds a local overlay directory using a generator and Jinja2 (e.g. `build/vpp0-0/`)
- Creates a new cloned filesystem based off of a base `vpp-proto-disk0` snapshot on the hypervisor
- Mounts that filesystem
- Rsyncs the built overlay into that filesystem
- Unmounts the filesystem
- Starts the VM using the newly built filesystem
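On the hypervisor itself, the clone/mount/rsync/boot steps amount to roughly the following sketch. The clone name, zvol path and mountpoint are assumptions for illustration (this assumes the proto disk is a zvol with a single root partition); the actual scripts drive this over SSH:

```
# Rough sketch of what happens on the hypervisor for one node; names are
# hypothetical examples.
SNAP=ssd-vol0/hvn0.chbtl0.ipng.ch/ssd-vol0/vpp-proto-disk0@20221013-release
CLONE=ssd-vol0/vpp0-0-disk0

zfs clone ${SNAP} ${CLONE}                       # point-in-time clone of the proto image
mount /dev/zvol/${CLONE}-part1 /mnt/staging      # stage the clone's root filesystem
rsync -av build/bird/vpp0-0/ /mnt/staging/       # copy in the generated per-node overlay
umount /mnt/staging
virsh start vpp0-0                               # boot the VM from the freshly built disk
```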
Of course, the first two steps are meant to ensure we don't clobber running labs; they can
be overridden with the `--force` flag. And when the lab is finished, it's common practice to
shut down the VMs and destroy the clones.
```
lab:~/src/ipng-lab$ ./destroy --host hvn0.lab.ipng.ch
lab:~/src/ipng-lab$ ./generate --host hvn0.lab.ipng.ch --overlay bird
lab:~/src/ipng-lab$ ./create --host hvn0.lab.ipng.ch --overlay bird
```
## Generate
The generator reads the input YAML files one after another, merging and overriding them as it
goes along, and then builds a `node` dictionary for each node, alongside the `lab` and other
information from the config files. Next, it reads the `overlays` dictionary for the given
`--overlay` type, renders all the template files from that overlay directory, and assembles an
output directory which will hold the per-node overrides, emitting them to the directory
specified by the `--build` flag. It also copies in any per-node files (if they exist) from
`overlays/$(overlay)/blobs/$(node.hostname)/`, giving full control over the filesystem's contents.
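As a purely illustrative example (the `--build` argument value and the generated file names are assumptions, not taken from the actual templates), the output for one node could be inspected like this:

```
# Hypothetical: generate the "bird" overlay and inspect the per-node output.
# The file names shown are examples of what the templates might produce.
lab:~/src/ipng-lab$ ./generate --host hvn0.lab.ipng.ch --overlay bird --build build/
lab:~/src/ipng-lab$ find build/bird/vpp0-0 -type f
build/bird/vpp0-0/etc/hostname
build/bird/vpp0-0/etc/bird/bird.conf
build/bird/vpp0-0/etc/vpp/config/interface.vpp
build/bird/vpp0-0/root/.ssh/authorized_keys
```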
## Create
Based on a generated directory and a lab YAML description, `create` uses SSH to connect to the
hypervisor, creates a clone of the base `vpp-proto` snapshot, mounts it locally in a staging
directory, then rsyncs over the generated overlay files from the generator output
(`build/$(overlay)/$(node.hostname)`), after which the directory is unmounted and the virtual
machine is booted from the clone.
If the VM is running, or there exists a clone, an error is printed and the process skips over
that node. It's wise to run `destroy` before `create` to ensure the hypervisors are in a
pristine state.
## Destroy
`destroy` ensures that the VMs are not running (stopping them if they are) and that their
filesystem clones are destroyed. Obviously, this is the most dangerous operation of the bunch.
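Under the hood, the destructive part per node boils down to something like this (the VM and dataset names are illustrative):

```
# Hypothetical per-node teardown on the hypervisor; names are examples only.
virsh destroy vpp0-0                 # hard-stop the VM if it is still running
zfs destroy ssd-vol0/vpp0-0-disk0    # remove the clone; the proto snapshot itself is untouched
```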