# IPng Networks Lab environment

## High level overview

There's a disk image on each hypervisor called the `proto` image, which serves as the base image for all VMs on it. Every now and again, the proto image is updated (Debian, FRR and VPP). Lab VMs are then cloned from that base image, and local filesystem overrides are put in place on each clone. The lab is used, and when we're done with it, we simply destroy all clones. This way, each time the lab is started, it is in a pristine state.

The `proto` image is shared among the hypervisors. Typically, maintenance will be performed on one of the hypervisors, and then the `proto` image is snapshotted and copied to the other machines.

### Proto maintenance

The main `vpp-proto` image runs on `hvn0.chbtl0.ipng.ch` with a VM called `vpp-proto`. When you want to refresh the image, you can:

```
hvn0-chbtl0:~$ virsh start --console vpp-proto

## Do the upgrades, make changes to vpp-proto's disk image
## You can always roll back to the previous snapshot image if you'd like to revert

hvn0-chbtl0:~$ SNAP=$(date +%Y%m%d) ## 20221012
hvn0-chbtl0:~$ virsh shutdown vpp-proto
hvn0-chbtl0:~$ sudo zfs snapshot ssd-vol0/vpp-proto-disk0@${SNAP}-release
hvn0-chbtl0:~$ sudo zrepl signal wakeup vpp-proto-snapshots
```

There is a `zrepl` daemon running on this machine, which picks up the snapshot when it is manually woken up (see the last command above). Each of the hypervisors in the fleet watches this replication endpoint, and when new snapshots arrive, they do an incremental pull of the data to their own ZFS filesystem as a snapshot. Old and currently running labs will not be disrupted, as they were cloned off of older snapshots.
You will find the image as `ssd-vol0/hvn0.chbtl0.ipng.ch/ssd-vol0/vpp-proto-disk0`:

```
lab:~$ ssh -A root@hvn0.lab.ipng.ch 'zfs list -t snap'
NAME                                                                    USED  AVAIL  REFER  MOUNTPOINT
ssd-vol0/hvn0.chbtl0.ipng.ch/ssd-vol0/vpp-proto-disk0@20221013-release    0B      -  6.04G  -
```

## Install

Make sure that you're logged in to the `lab.ipng.ch` machine with an SSH key agent, or key forwarding, so that you can manipulate the hypervisors. Installing for the first time requires adding a few PIP packages:

```
lab:~/src/lab$ pip3 install hiyapyco
lab:~/src/lab$ pip3 install ipaddress
lab:~/src/lab$ pip3 install jinja2
lab:~/src/lab$ pip3 install jinja2-ansible-filters
```

.. after which the generator should be able to create its artifacts!

## Usage

There are three hypervisor nodes, each running one isolated lab environment:

* hvn0.lab.ipng.ch runs VPP lab0
* hvn1.lab.ipng.ch runs VPP lab1
* hvn2.lab.ipng.ch runs VPP lab2

Now that we have a base image (in the form of `vpp-proto-disk0@$(date)-release`), we can make point-in-time clones of it and copy over any specifics (like IP addresses, hostname, SSH keys, Bird/FRR configs, etc). We do this on the lab controller `lab.ipng.ch` which:

1. Looks on the hypervisor to see if there is a running VM, and if there is, bails
1. Looks on the hypervisor to see if there is an existing cloned image, and if there is, bails
1. Builds a local overlay directory using a generator and Jinja2 (e.g. `build/vpp0-0/`)
1. Creates a new cloned filesystem based off of a base `vpp-proto-disk0` snapshot on the hypervisor
1. Mounts that filesystem
1. Rsyncs the built overlay into that filesystem
1. Unmounts the filesystem
1. Starts the VM using the newly built filesystem
1. Commits the `openvswitch` topology configuration (see `overlays/*/ovs-config.sh`)

Of course, the first two steps are meant to ensure we don't clobber running labs; these checks can be overridden with the `--force` flag.
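
The guard checks and the `--force` override can be sketched as follows. This is a minimal illustration and not the lab's actual code: `HypervisorState`, `BUILD_STEPS` and `plan_create` are hypothetical names, and the step strings are labels paraphrasing the list above rather than real commands.

```python
from dataclasses import dataclass, field


@dataclass
class HypervisorState:
    """Hypothetical view of what the controller found on a hypervisor."""
    running_vms: set = field(default_factory=set)
    existing_clones: set = field(default_factory=set)


# Labels paraphrasing steps 3-9 of the list above; not real commands.
BUILD_STEPS = [
    "build overlay",
    "clone base snapshot",
    "mount clone",
    "rsync overlay",
    "unmount clone",
    "start VM",
    "commit openvswitch topology",
]


def plan_create(node, state, force=False):
    """Return the steps to run for `node`, or raise if a guard check fails.

    With force=True both guard checks are skipped, mirroring `--force`.
    """
    if not force:
        if node in state.running_vms:
            raise RuntimeError(f"{node}: VM is running, refusing to continue")
        if node in state.existing_clones:
            raise RuntimeError(f"{node}: clone already exists, refusing to continue")
    return list(BUILD_STEPS)


if __name__ == "__main__":
    state = HypervisorState(running_vms={"vpp0-0"})
    print(plan_create("vpp0-1", state))              # no conflict, full plan
    print(plan_create("vpp0-0", state, force=True))  # guards skipped
```

Keeping the checks separate from the build steps means a forced run executes exactly the same plan as a normal one; only the safety net is removed.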
When the lab is finished, it's common practice to shut down the VMs and destroy the clones.

```
lab:~/src/lab$ ./generate --host hvn0.lab.ipng.ch --overlay bird
lab:~/src/lab$ LAB=0 ./destroy            ## remove VMs and ZFS clones
lab:~/src/lab$ LAB=0 ./create             ## create ZFS 'pristine' snapshot
lab:~/src/lab$ LAB=0 ./virshall start     ## Start the VMs
lab:~/src/lab$ LAB=0 ./virshall shutdown  ## Gracefully stop the VMs
lab:~/src/lab$ LAB=0 ./virshall destroy   ## Hard poweroff the VMs
lab:~/src/lab$ LAB=0 ./pristine           ## return the lab to the latest 'pristine' snapshot
```

### Generate

The generator reads input YAML files one after another, merging and overriding them as it goes along, and then builds a `node` dictionary for each node, alongside the `lab` and other information from the config files. It then reads the `overlays` dictionary for the given `--overlay` type, reads all the common files from that overlay directory, and assembles an output directory holding the per-node overrides, which it emits to the directory specified by the `--build` flag. It also copies in any per-node files (if they exist) from `overlays/$(overlay)/hostname/$(node.hostname)/`, giving full control of the filesystem's ultimate contents.

```
lab:~/src/lab$ ./generate --host hvn0.lab.ipng.ch --overlay bird
lab:~/src/lab$ git status build/bird/hvn0.lab.ipng.ch/
```

### Destroy

Ensures both that the VMs are not running (stopping them if they are) and that their filesystem clones are destroyed. This is the most dangerous operation of the bunch, but the philosophy of the lab is that the VMs can be re-created off of a stable base image and a generated build.
```
lab:~/src/lab$ LAB=0 ./command destroy ## remove VMs and ZFS clones on hvn0.lab.ipng.ch
```

### Create

Based on a generated directory and a lab YAML description, this uses SSH to connect to the hypervisor, creates a clone of the base `vpp-proto` snapshot, mounts it locally in a staging directory, and then rsyncs over the generated overlay files from the generator output (`build/$(overlay)/$(node.hostname)`). The directory is then unmounted and a ZFS snapshot called `pristine` is created. The VMs are booted off of their `pristine` snapshot.

Typically, it's only necessary to destroy/create when the build or the base image changes. Otherwise, the lab can be brought back into a _factory default_ state by rolling back to the `pristine` snapshot.

```
lab:~/src/lab$ LAB=0 ./create ## create ZFS clones, copy in the build
```

### Pristine

In the process of creating the ZFS clones and their per-node filesystems, a snapshot of each VM's boot disk is made, called the `pristine` snapshot. After using the lab, it can be quickly brought back into a default state by rolling back the disks to the `pristine` snapshot and restarting the virtual machines.

```
lab:~/src/lab$ LAB=0 ./command start     ## Start the VMs
lab:~/src/lab$ LAB=0 ./command shutdown  ## Gracefully stop the VMs
lab:~/src/lab$ LAB=0 ./command pristine  ## return the lab to the latest 'pristine' snapshot
```
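
As a rough sketch of what a rollback to `pristine` might involve on the hypervisor, the snippet below only builds the command lines and does not run them. The dataset layout `<pool>/<node>-disk0`, the function name `pristine_commands`, and the choice of a hard poweroff before the rollback are all assumptions made for illustration; the lab's real script may differ.

```python
import shlex


def pristine_commands(nodes, pool="ssd-vol0"):
    """Build, per node, the commands to power off, roll back and restart.

    Assumption: each node's disk lives at "<pool>/<node>-disk0" and has a
    snapshot named "pristine". Nothing is executed here.
    """
    commands = []
    for node in nodes:
        dataset = f"{pool}/{node}-disk0"  # hypothetical dataset layout
        commands.append(["virsh", "destroy", node])  # hard poweroff
        commands.append(["zfs", "rollback", f"{dataset}@pristine"])
        commands.append(["virsh", "start", node])
    return commands


if __name__ == "__main__":
    for cmd in pristine_commands(["vpp0-0", "vpp0-1"]):
        print(shlex.join(cmd))
```

A hard poweroff is used in this sketch because any state written since the snapshot is discarded by the rollback anyway.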