---
date: "2022-10-14T19:52:11Z"
title: VPP Lab - Setup
aliases:
- /s/articles/2022/10/14/lab-1.html
---

{{< image width="200px" float="right" src="/assets/vpp/fdio-color.svg" alt="VPP" >}}

# Introduction

In a previous post ([VPP Linux CP - Virtual Machine Playground]({{< ref "2021-12-23-vpp-playground" >}})), I
wrote a bit about building a QEMU image so that folks can play with the [Vector Packet Processor](https://fd.io)
and the Linux Control Plane code. Judging by our access logs, this image has definitely been downloaded a bunch,
and I use it regularly myself when I want to tinker a little bit without impacting the production
routers at [AS8298]({{< ref "2021-02-27-network" >}}).

The topology of my tests has become a bit more complicated over time, and often just one router would not be
enough. Yet, repeatability is quite important, and I found myself constantly reinstalling / recheckpointing
the `vpp-proto` virtual machine I was using. I got my hands on some LAB hardware, so it's time for an upgrade!

## IPng Networks LAB - Physical

{{< image width="300px" float="left" src="/assets/lab/physical.png" alt="Physical" >}}

First, I specc'd out a few machines that will serve as hypervisors. From top to bottom in the picture here, two
FS.com S5680-20SQ switches -- I reviewed these earlier [[ref]({{< ref "2021-08-07-fs-switch" >}})], and I really
like them, as they come with 20x10G, 4x25G and 2x40G ports, an OOB management port and serial to configure them.
Under them is their larger brother, the FS.com S5860-48SC, with 48x10G and 8x100G ports. Although it's a bit more
expensive, it's also necessary because I often test VPP at higher bandwidths, and as such being able to build
ethernet topologies mixing 10, 25, 40 and 100G is super useful for me. So, this switch is `fsw0.lab.ipng.ch`
and dedicated to lab experiments.

Connected to the switch are my trusty `Rhino` and `Hippo` machines. If you remember the game _Hungry Hungry Hippos_,
that's where the name comes from. They are both Ryzen 5950X machines on ASUS B550 motherboards, each with 2x1G i350 copper
NICs (pictured here not connected), and 2x100G E810 QSFP network cards (properly slotted in the motherboard's
PCIe v4.0 x16 slot).

Finally, three Dell R720XD machines serve as the VPP testbed that is to be built. They each come with 128GB of RAM, 2x500G
SSDs, two Intel 82599ES dual 10G NICs (four ports total), and four Broadcom BCM5720 1G NICs. The first 1G port is
connected to a management switch, and it doubles up as an IPMI speaker, so I can turn the hypervisors on and off
remotely. All four 10G ports are connected with DACs to `fsw0-lab`, as are two 1G copper ports (the blue UTP
cables). Everything can be turned on and off remotely, which is useful for noise, heat and the environment overall 🍀.

## IPng Networks LAB - Logical

{{< image width="200px" float="right" src="/assets/lab/logical.svg" alt="Logical" >}}

I have three of these Dell R720XD machines in the lab, and each one of them will run one complete lab environment,
consisting of four VPP virtual machines, network plumbing, and an uplink. That way, I can turn on one hypervisor,
say `hvn0.lab.ipng.ch`, prepare and boot the VMs, mess around with it, and when I'm done, return the VMs to a
pristine state and turn off the hypervisor. And, because I have three of these machines, I can run three separate
LABs at the same time, or one really big one spanning all the machines. Pictured on the right is a logical sketch
of one of the LABs (LAB id=0), with a bunch of VPP virtual machines, each with four NICs, daisychained together, with
a few NICs left over for experimenting.

### Headend

At the top of the logical environment, I am going to be using one of our production machines (`hvn0.chbtl0.ipng.ch`),
which will run a permanently running LAB _headend_, a Debian VM called `lab.ipng.ch`. This allows me to hermetically
seal the LAB environments, letting me run them entirely in RFC1918 space, and by forcing the LABs to be connected
under this machine, I can ensure that no unwanted traffic enters or exits the network [imagine a loadtest at
100Gbit accidentally leaking; this may or totally may not have happened to me once before ...].

### Disk images

On this production hypervisor (`hvn0.chbtl0.ipng.ch`), I'll also prepare and maintain a prototype `vpp-proto` disk
image, which will serve as a consistent image to boot the LAB virtual machines. This _main_ image will be replicated
over the network to all three `hvn0 - hvn2` hypervisor machines. This way, I can do periodic maintenance on the
_main_ `vpp-proto` image, snapshot it, and publish it as a QCOW2 for downloading (see my [[VPP Linux CP - Virtual Machine
Playground]({{< ref "2021-12-23-vpp-playground" >}})] post for details on how it's built and what you can do with it
yourself!). The snapshots will then also be sync'd to all hypervisors, and from there I can use simple ZFS filesystem
_cloning_ and _snapshotting_ to maintain the LAB virtual machines.

### Networking

Each hypervisor will get an install of [Open vSwitch](https://openvswitch.org/), a production quality, multilayer virtual switch designed to
enable massive network automation through programmatic extension, while still supporting standard management interfaces
and protocols. This takes a lot of the guesswork and tinkering out of Linux bridges in KVM/QEMU, and it's a perfect fit
due to its tight integration with `libvirt` (the thing most of us use on Debian/Ubuntu hypervisors). If need be, I can
add one or more of the 1G or 10G ports to the OVS fabric as well, to build more complicated topologies. And, because
the OVS infrastructure and libvirt both allow themselves to be configured over the network, I can control all aspects
of the runtime directly from the `lab.ipng.ch` headend, without having to log in to the hypervisor machines at all. Slick!

# Implementation Details

I start with image management. On the production hypervisor, I create a 6GB ZFS dataset that will serve as my `vpp-proto`
machine, and install it using the exact same method as the playground [[ref]({{< ref "2021-12-23-vpp-playground" >}})].
Once I have it the way I like it, I'll power off the VM, and see to this image being replicated to all hypervisors.

## ZFS Replication

Enter [zrepl](https://zrepl.github.io/), a one-stop, integrated solution for ZFS replication. This tool is incredibly
powerful, and can do snapshot management and source/sink replication, of course using incremental snapshots as they
are native to ZFS. Because this is a LAB article, not a zrepl tutorial, I'll just cut to the chase and show the
configuration I came up with.

```
pim@hvn0-chbtl0:~$ cat << EOF | sudo tee /etc/zrepl/zrepl.yml
global:
  logging:
    # use syslog instead of stdout because it makes journald happy
    - type: syslog
      format: human
      level: warn

jobs:
  - name: snap-vpp-proto
    type: snap
    filesystems:
      'ssd-vol0/vpp-proto-disk0<': true
    snapshotting:
      type: manual
    pruning:
      keep:
        - type: last_n
          count: 10

  - name: source-vpp-proto
    type: source
    serve:
      type: stdinserver
      client_identities:
        - "hvn0-lab"
        - "hvn1-lab"
        - "hvn2-lab"
    filesystems:
      'ssd-vol0/vpp-proto-disk0<': true   # all filesystems
    snapshotting:
      type: manual
EOF

pim@hvn0-chbtl0:~$ cat << EOF | sudo tee -a /root/.ssh/authorized_keys
# ZFS Replication Clients for IPng Networks LAB
command="zrepl stdinserver hvn0-lab",restrict ecdsa-sha2-nistp256 <omitted> root@hvn0.lab.ipng.ch
command="zrepl stdinserver hvn1-lab",restrict ecdsa-sha2-nistp256 <omitted> root@hvn1.lab.ipng.ch
command="zrepl stdinserver hvn2-lab",restrict ecdsa-sha2-nistp256 <omitted> root@hvn2.lab.ipng.ch
EOF
```

To unpack this, there are two jobs configured in **zrepl**:

*   `snap-vpp-proto` - the purpose of this job is to track snapshots as they are created. Normally, zrepl is configured
    to automatically make snapshots every hour and copy them out, but in my case, I only want to take snapshots when I have
    changed and released the `vpp-proto` image, not periodically. So, I set the snapshotting to manual, and let the system
    keep the last ten images.
*   `source-vpp-proto` - this is a source job that uses a _lazy_ (albeit fine in this lab environment) method to serve the
    snapshots to clients. The SSH keys above are added to the _authorized_keys_ file, but restricted so that they can execute
    only the `zrepl stdinserver` command, and nothing else (i.e. these keys cannot log in to the machine). If any given server
    presents one of these keys, I can now map it to a **zrepl client** (for example, `hvn0-lab` for the SSH key presented by
    hostname `hvn0.lab.ipng.ch`). The source job now knows to serve the listed filesystems (and their dataset children, noted by
    the `<` suffix) to those clients.

For the client side, each of the hypervisors gets only one job, called a _pull_ job, which will periodically wake up (every
minute) and ensure that any pending snapshots and their incrementals from the remote _source_ are slurped in and replicated
into a _root_fs_ dataset; in this case I called it `ssd-vol0/hvn0.chbtl0.ipng.ch` so I can track where the datasets come from.

```
pim@hvn0-lab:~$ sudo ssh-keygen -t ecdsa -f /etc/zrepl/ssh/identity -C "root@$(hostname -f)"
pim@hvn0-lab:~$ cat << EOF | sudo tee /etc/zrepl/zrepl.yml
global:
  logging:
    # use syslog instead of stdout because it makes journald happy
    - type: syslog
      format: human
      level: warn

jobs:
  - name: vpp-proto
    type: pull
    connect:
      type: ssh+stdinserver
      host: hvn0.chbtl0.ipng.ch
      user: root
      port: 22
      identity_file: /etc/zrepl/ssh/identity
    root_fs: ssd-vol0/hvn0.chbtl0.ipng.ch
    interval: 1m
    pruning:
      keep_sender:
        - type: regex
          regex: '.*'
      keep_receiver:
        - type: last_n
          count: 10
    recv:
      placeholder:
        encryption: off
EOF
```

After restarting zrepl on each of the machines (the _source_ machine and the three _pull_ machines), I can now do the
following cool hat trick:

```
pim@hvn0-chbtl0:~$ virsh start --console vpp-proto
## Do whatever maintenance, and then poweroff the VM
pim@hvn0-chbtl0:~$ sudo zfs snapshot ssd-vol0/vpp-proto-disk0@20221019-release
pim@hvn0-chbtl0:~$ sudo zrepl signal wakeup source-vpp-proto
```

This signals the zrepl daemon to re-read the snapshots, which will pick up the newest one, and then without me doing
much of anything else:

```
pim@hvn0-lab:~$ sudo zfs list -t all | grep vpp-proto
ssd-vol0/hvn0.chbtl0.ipng.ch/ssd-vol0/vpp-proto-disk0                     6.60G   367G     6.04G  -
ssd-vol0/hvn0.chbtl0.ipng.ch/ssd-vol0/vpp-proto-disk0@20221013-release     499M      -     6.04G  -
ssd-vol0/hvn0.chbtl0.ipng.ch/ssd-vol0/vpp-proto-disk0@20221018-release    24.1M      -     6.04G  -
ssd-vol0/hvn0.chbtl0.ipng.ch/ssd-vol0/vpp-proto-disk0@20221019-release       0B      -     6.04G  -
```

That last image was just pushed automatically to all hypervisors! If they're turned off, no worries, as soon as they
start up, their local **zrepl** will make its next minutely poll, and pull in all snapshots, bringing the machine up
to date. So even when the hypervisors are normally turned off, this is zero-touch and maintenance free.

## VM image maintenance

Now that I have a stable image to work off of, all I have to do is `zfs clone` this image into new per-VM datasets,
after which I can mess around on the VMs all I want, and when I'm done, I can `zfs destroy` the clone and bring it
back to normal. However, I clearly don't want one and the same clone for each of the VMs, as they do have lots of
config files that are specific to that one _instance_. For example, the mgmt IPv4/IPv6 addresses are unique, and
the VPP and Bird/FRR configs are unique as well. But how unique are they, really?

Enter Jinja (known mostly from Ansible). I decide to make some form of per-VM config files that are generated based
on some templates. That way, I can clone the base ZFS dataset, copy in the deltas, and boot that instead. And to
be extra efficient, I can also make a per-VM `zfs snapshot` of the cloned+updated filesystem, before tinkering with
the VMs, which I'll call a `pristine` snapshot. Still with me?

1. First, clone the base dataset into a per-VM dataset, say `ssd-vol0/vpp0-0`
1. Then, generate a bunch of override files, copying them into the per-VM dataset `ssd-vol0/vpp0-0`
1. Finally, create a snapshot of that, called `ssd-vol0/vpp0-0@pristine` and boot off of that.

Now, returning the VM to a pristine state is simply a matter of shutting down the VM, performing a `zfs rollback`
to the `pristine` snapshot, and starting the VM again. Ready? Let's go!

### Generator

So off I go, writing a small Python generator that uses Jinja to read a bunch of YAML files, merging them along
the way, and then traversing a set of directories with template files and per-VM overrides, to assemble a build
output directory with a fully formed set of files that I can copy into the per-VM dataset.

Take a look at this as a minimally viable configuration:

```
pim@lab:~/src/lab$ cat config/common/generic.yaml
overlays:
  default:
    path: overlays/bird/
    build: build/default/
lab:
  mgmt:
    ipv4: 192.168.1.80/24
    ipv6: 2001:678:d78:101::80/64
    gw4: 192.168.1.252
    gw6: 2001:678:d78:101::1
  nameserver:
    search: [ "lab.ipng.ch", "ipng.ch", "rfc1918.ipng.nl", "ipng.nl" ]
  nodes: 4

pim@lab:~/src/lab$ cat config/hvn0.lab.ipng.ch.yaml
lab:
  id: 0
  ipv4: 192.168.10.0/24
  ipv6: 2001:678:d78:200::/60
  nameserver:
    addresses: [ 192.168.10.4, 2001:678:d78:201::ffff ]
  hypervisor: hvn0.lab.ipng.ch
```

Here I define a common config file with fields and attributes which will apply to all LAB environments: things
such as the mgmt network, nameserver search paths, and how many VPP virtual machine nodes I want to build. Then,
for `hvn0.lab.ipng.ch`, I specify an IPv4 and IPv6 prefix assigned to it, and some specific nameserver endpoints
that will point at an `unbound` running on `lab.ipng.ch` itself.

I can now create any file I'd like, which may use variable substitution and other Jinja2 style templating. Take
for example these two files:

```
pim@lab:~/src/lab$ cat overlays/bird/common/etc/netplan/01-netcfg.yaml.j2
network:
  version: 2
  renderer: networkd
  ethernets:
    enp1s0:
      optional: true
      accept-ra: false
      dhcp4: false
      addresses: [ {{node.mgmt.ipv4}}, {{node.mgmt.ipv6}} ]
      gateway4: {{lab.mgmt.gw4}}
      gateway6: {{lab.mgmt.gw6}}

pim@lab:~/src/lab$ cat overlays/bird/common/etc/netns/dataplane/resolv.conf.j2
domain lab.ipng.ch
search{% for domain in lab.nameserver.search %} {{ domain }}{% endfor %}

{% for resolver in lab.nameserver.addresses %}
nameserver {{ resolver }}
{% endfor %}
```

The first file is a [[NetPlan.io](https://netplan.io/)] configuration that substitutes the correct management
IPv4 and IPv6 addresses and gateways. The second one enumerates a set of search domains and nameservers, so that
each LAB can have its own unique resolvers. I point these at the `lab.ipng.ch` uplink interface; in the case
of the LAB on `hvn0.lab.ipng.ch`, this will be 192.168.10.4 and 2001:678:d78:201::ffff, but on `hvn1.lab.ipng.ch`
I can override that to become 192.168.11.4 and 2001:678:d78:211::ffff.

There's one subdirectory for each _overlay_ type (imagine that I want a lab that runs Bird2, but I may also
want one which runs FRR, or another thing still). Within the _overlay_ directory, there's one _common_
tree, with files that apply to every machine in the LAB, and a _hostname_ tree, with files that apply
only to specific nodes (VMs) in the LAB:

```
pim@lab:~/src/lab$ tree overlays/default/
overlays/default/
├── common
│   ├── etc
│   │   ├── bird
│   │   │   ├── bfd.conf.j2
│   │   │   ├── bird.conf.j2
│   │   │   ├── ibgp.conf.j2
│   │   │   ├── ospf.conf.j2
│   │   │   └── static.conf.j2
│   │   ├── hostname.j2
│   │   ├── hosts.j2
│   │   ├── netns
│   │   │   └── dataplane
│   │   │       └── resolv.conf.j2
│   │   ├── netplan
│   │   │   └── 01-netcfg.yaml.j2
│   │   ├── resolv.conf.j2
│   │   └── vpp
│   │       ├── bootstrap.vpp.j2
│   │       └── config
│   │           ├── defaults.vpp
│   │           ├── flowprobe.vpp.j2
│   │           ├── interface.vpp.j2
│   │           ├── lcp.vpp
│   │           ├── loopback.vpp.j2
│   │           └── manual.vpp.j2
│   ├── home
│   │   └── ipng
│   └── root
├── hostname
│   ├── vpp0-0
│   │   └── etc
│  (etc)    └── vpp
│               └── config
│                   └── interface.vpp
```

Now all that's left to do is generate this hierarchy, and of course I can check this in to git and track changes to the
templates and their resulting generated filesystem overrides over time:

```
pim@lab:~/src/lab$ ./generate -q --host hvn0.lab.ipng.ch
pim@lab:~/src/lab$ find build/default/hvn0.lab.ipng.ch/vpp0-0/ -type f
build/default/hvn0.lab.ipng.ch/vpp0-0/home/ipng/.ssh/authorized_keys
build/default/hvn0.lab.ipng.ch/vpp0-0/etc/hosts
build/default/hvn0.lab.ipng.ch/vpp0-0/etc/resolv.conf
build/default/hvn0.lab.ipng.ch/vpp0-0/etc/bird/static.conf
build/default/hvn0.lab.ipng.ch/vpp0-0/etc/bird/bfd.conf
build/default/hvn0.lab.ipng.ch/vpp0-0/etc/bird/bird.conf
build/default/hvn0.lab.ipng.ch/vpp0-0/etc/bird/ibgp.conf
build/default/hvn0.lab.ipng.ch/vpp0-0/etc/bird/ospf.conf
build/default/hvn0.lab.ipng.ch/vpp0-0/etc/vpp/config/loopback.vpp
build/default/hvn0.lab.ipng.ch/vpp0-0/etc/vpp/config/flowprobe.vpp
build/default/hvn0.lab.ipng.ch/vpp0-0/etc/vpp/config/interface.vpp
build/default/hvn0.lab.ipng.ch/vpp0-0/etc/vpp/config/defaults.vpp
build/default/hvn0.lab.ipng.ch/vpp0-0/etc/vpp/config/lcp.vpp
build/default/hvn0.lab.ipng.ch/vpp0-0/etc/vpp/config/manual.vpp
build/default/hvn0.lab.ipng.ch/vpp0-0/etc/vpp/bootstrap.vpp
build/default/hvn0.lab.ipng.ch/vpp0-0/etc/netplan/01-netcfg.yaml
build/default/hvn0.lab.ipng.ch/vpp0-0/etc/netns/dataplane/resolv.conf
build/default/hvn0.lab.ipng.ch/vpp0-0/etc/hostname
build/default/hvn0.lab.ipng.ch/vpp0-0/root/.ssh/authorized_keys
```

## Open vSwitch maintenance

The OVS install on each Debian hypervisor in the lab is the same. I install the required Debian packages, create
a switch fabric, and add one physical network port (the one that will serve as the _uplink_, VLAN 10 in the sketch above,
for the LAB) as well as all the virtio ports from KVM.

```
pim@hvn0-lab:~$ sudo vi /etc/netplan/01-netcfg.yaml
network:
  vlans:
    uplink:
      optional: true
      accept-ra: false
      dhcp4: false
      link: eno1
      id: 200

pim@hvn0-lab:~$ sudo netplan apply
pim@hvn0-lab:~$ sudo apt install openvswitch-switch python3-openvswitch
pim@hvn0-lab:~$ sudo ovs-vsctl add-br vpplan
pim@hvn0-lab:~$ sudo ovs-vsctl add-port vpplan uplink tag=10
```

The `vpplan` switch fabric and its uplink port will persist across reboots. Then I add a small change to the `libvirt`
defined virtual machines:

```
pim@hvn0-lab:~$ virsh edit vpp0-0
...
    <interface type='bridge'>
      <mac address='52:54:00:00:10:00'/>
      <source bridge='vpplan'/>
      <virtualport type='openvswitch' />
      <target dev='vpp0-0-0'/>
      <model type='virtio'/>
      <mtu size='9216'/>
      <address type='pci' domain='0x0000' bus='0x10' slot='0x00' function='0x0' multifunction='on'/>
    </interface>
    <interface type='bridge'>
      <mac address='52:54:00:00:10:01'/>
      <source bridge='vpplan'/>
      <virtualport type='openvswitch' />
      <target dev='vpp0-0-1'/>
      <model type='virtio'/>
      <mtu size='9216'/>
      <address type='pci' domain='0x0000' bus='0x10' slot='0x00' function='0x1'/>
    </interface>
... etc
```

The only two things I need to do are to ensure that the _source bridge_ is named the same as
the OVS fabric, in my case `vpplan`, and that the _virtualport_ type is `openvswitch`, and that's it!
Once all four `vpp0-*` virtual machines each have all four of their network cards updated, when they
boot, the hypervisor will add them each as new untagged ports in the OVS fabric.

To then build the topology that I have in mind for the LAB, where each VPP machine is daisychained to
its sibling, all we have to do is program that into the OVS configuration:

```
pim@hvn0-lab:~$ cat << 'EOF' > ovs-config.sh
#!/bin/sh
#
# OVS configuration for the `default` overlay

LAB=${LAB:=0}
for node in 0 1 2 3; do
  for int in 0 1 2 3; do
    ovs-vsctl set port vpp${LAB}-${node}-${int} vlan_mode=native-untagged
  done
done

# Uplink is VLAN 10
ovs-vsctl add port vpp${LAB}-0-0 tag 10
ovs-vsctl add port uplink tag 10

# Link vpp${LAB}-0 <-> vpp${LAB}-1 in VLAN 20
ovs-vsctl add port vpp${LAB}-0-1 tag 20
ovs-vsctl add port vpp${LAB}-1-0 tag 20

# Link vpp${LAB}-1 <-> vpp${LAB}-2 in VLAN 21
ovs-vsctl add port vpp${LAB}-1-1 tag 21
ovs-vsctl add port vpp${LAB}-2-0 tag 21

# Link vpp${LAB}-2 <-> vpp${LAB}-3 in VLAN 22
ovs-vsctl add port vpp${LAB}-2-1 tag 22
ovs-vsctl add port vpp${LAB}-3-0 tag 22
EOF

pim@hvn0-lab:~$ chmod 755 ovs-config.sh
pim@hvn0-lab:~$ sudo ./ovs-config.sh
```

The first block here loops over all nodes and, for each of their ports, sets the VLAN mode to what
OVS calls 'native-untagged'. In this mode, the `tag` becomes the VLAN in which the port will operate,
but to additionally carry dot1q tagged VLANs, we can use the syntax `add port ... trunks 10,20,30`.

To see the configuration, `ovs-vsctl list port vpp0-0-0` will show the switch port configuration, while
`ovs-vsctl list interface vpp0-0-0` will show the virtual machine's NIC configuration (think of the
difference here as the switch port on the one hand, and the NIC (interface) plugged into it on the other).

### Deployment

There are three main points to consider when deploying these lab VMs:

1. Create the VMs and their ZFS datasets
1. Destroy the VMs and their ZFS datasets
1. Bring the VMs into a pristine state

#### Create

If the hypervisor doesn't yet have a LAB running, we need to create it:

```
BASE=${BASE:=ssd-vol0/hvn0.chbtl0.ipng.ch/ssd-vol0/vpp-proto-disk0@20221019-release}
BUILD=${BUILD:=default}
LAB=${LAB:=0}

## Do not touch below this line
LABDIR=/var/lab
STAGING=$LABDIR/staging
HVN="hvn${LAB}.lab.ipng.ch"

echo "* Cloning base"
ssh root@$HVN "set -x; for node in 0 1 2 3; do VM=vpp${LAB}-\${node}; \
  mkdir -p $STAGING/\$VM; zfs clone $BASE ssd-vol0/\$VM; done"
sleep 1

echo "* Mounting in staging"
ssh root@$HVN "set -x; for node in 0 1 2 3; do VM=vpp${LAB}-\${node}; \
  mount /dev/zvol/ssd-vol0/\$VM-part1 $STAGING/\$VM; done"

echo "* Rsyncing build"
rsync -avugP build/$BUILD/$HVN/ root@hvn${LAB}.lab.ipng.ch:$STAGING

echo "* Setting permissions"
ssh root@$HVN "set -x; for node in 0 1 2 3; do VM=vpp${LAB}-\${node}; \
  chown -R root. $STAGING/\$VM/root; done"

echo "* Unmounting and snapshotting pristine state"
ssh root@$HVN "set -x; for node in 0 1 2 3; do VM=vpp${LAB}-\${node}; \
  umount $STAGING/\$VM; zfs snapshot ssd-vol0/\${VM}@pristine; done"

echo "* Starting VMs"
ssh root@$HVN "set -x; for node in 0 1 2 3; do VM=vpp${LAB}-\${node}; \
  virsh start \$VM; done"

echo "* Committing OVS config"
scp overlays/$BUILD/ovs-config.sh root@$HVN:$LABDIR
ssh root@$HVN "set -x; LAB=$LAB $LABDIR/ovs-config.sh"
```

After running this, the hypervisor will have 4 clones, and 4 snapshots (one for each virtual machine):

```
root@hvn0-lab:~# zfs list -t all
NAME                                                                       USED  AVAIL     REFER  MOUNTPOINT
ssd-vol0                                                                  6.80G   367G       24K  /ssd-vol0
ssd-vol0/hvn0.chbtl0.ipng.ch                                              6.60G   367G       24K  none
ssd-vol0/hvn0.chbtl0.ipng.ch/ssd-vol0                                     6.60G   367G       24K  none
ssd-vol0/hvn0.chbtl0.ipng.ch/ssd-vol0/vpp-proto-disk0                     6.60G   367G     6.04G  -
ssd-vol0/hvn0.chbtl0.ipng.ch/ssd-vol0/vpp-proto-disk0@20221013-release     499M      -     6.04G  -
ssd-vol0/hvn0.chbtl0.ipng.ch/ssd-vol0/vpp-proto-disk0@20221018-release    24.1M      -     6.04G  -
ssd-vol0/hvn0.chbtl0.ipng.ch/ssd-vol0/vpp-proto-disk0@20221019-release       0B      -     6.04G  -
ssd-vol0/vpp0-0                                                           43.6M   367G     6.04G  -
ssd-vol0/vpp0-0@pristine                                                  1.13M      -     6.04G  -
ssd-vol0/vpp0-1                                                           25.0M   367G     6.04G  -
ssd-vol0/vpp0-1@pristine                                                  1.14M      -     6.04G  -
ssd-vol0/vpp0-2                                                           42.2M   367G     6.04G  -
ssd-vol0/vpp0-2@pristine                                                  1.13M      -     6.04G  -
ssd-vol0/vpp0-3                                                           79.1M   367G     6.04G  -
ssd-vol0/vpp0-3@pristine                                                  1.13M      -     6.04G  -
```

The last thing the create script does is commit the OVS configuration, because when the VMs are shut down
or newly created, KVM will add them to the switching fabric as untagged/unconfigured ports.

But would you look at that! The delta between the base image and the `pristine` snapshots is about 1MB of
configuration files, the ones that I generated and rsync'd in above, and then once the machine boots, it
will have a read/write mounted filesystem as per normal, except it's a delta on top of the snapshotted,
cloned dataset.

#### Destroy

I love destroying things! But in this case, I'm removing what are essentially ephemeral disk images, as
I still have the base image to clone from. But, the destroy is conceptually very simple:

```
BASE=${BASE:=ssd-vol0/hvn0.chbtl0.ipng.ch/ssd-vol0/vpp-proto-disk0@20221018-release}
LAB=${LAB:=0}

## Do not touch below this line
HVN="hvn${LAB}.lab.ipng.ch"

echo "* Destroying VMs"
ssh root@$HVN "set -x; for node in 0 1 2 3; do VM=vpp${LAB}-\${node}; \
  virsh destroy \$VM; done"

echo "* Destroying ZFS datasets"
ssh root@$HVN "set -x; for node in 0 1 2 3; do VM=vpp${LAB}-\${node}; \
  zfs destroy -r ssd-vol0/\$VM; done"
```

After running this, the VMs are shut down and their cloned filesystems (including any snapshots
those may have) are wiped. To get back into a working state, all I have to do is run `./create` again!

#### Pristine

Sometimes though, I don't need to completely destroy the VMs, but rather I want to put them back into
the state they were in just after creating the LAB. Luckily, the create script made a snapshot (called `pristine`)
for each VM before booting it, so bringing the LAB back to _factory default_ settings is really easy:

```
BUILD=${BUILD:=default}
LAB=${LAB:=0}

## Do not touch below this line
LABDIR=/var/lab
STAGING=$LABDIR/staging
HVN="hvn${LAB}.lab.ipng.ch"

## Bring back into pristine state
echo "* Restarting VMs from pristine snapshot"
ssh root@$HVN "set -x; for node in 0 1 2 3; do VM=vpp${LAB}-\${node}; \
  virsh destroy \$VM;
  zfs rollback ssd-vol0/\${VM}@pristine;
  virsh start \$VM; done"

echo "* Committing OVS config"
scp overlays/$BUILD/ovs-config.sh root@$HVN:$LABDIR
ssh root@$HVN "set -x; $LABDIR/ovs-config.sh"
```

## Results

After completing this project, I have a completely hands-off, automated, autogenerated and very manageable set
of three LABs, each booting up into a running OSPF/OSPFv3 enabled topology for IPv4 and IPv6:

```
pim@lab:~/src/lab$ traceroute -q1 vpp0-3
traceroute to vpp0-3 (192.168.10.3), 30 hops max, 60 byte packets
 1  e0.vpp0-0.lab.ipng.ch (192.168.10.5)  1.752 ms
 2  e0.vpp0-1.lab.ipng.ch (192.168.10.7)  4.064 ms
 3  e0.vpp0-2.lab.ipng.ch (192.168.10.9)  5.178 ms
 4  vpp0-3.lab.ipng.ch (192.168.10.3)  7.469 ms

pim@lab:~/src/lab$ ssh ipng@vpp0-3

ipng@vpp0-3:~$ traceroute6 -q1 vpp2-3
traceroute to vpp2-3 (2001:678:d78:220::3), 30 hops max, 80 byte packets
 1  e1.vpp0-2.lab.ipng.ch (2001:678:d78:201::3:2)  2.088 ms
 2  e1.vpp0-1.lab.ipng.ch (2001:678:d78:201::2:1)  6.958 ms
 3  e1.vpp0-0.lab.ipng.ch (2001:678:d78:201::1:0)  8.841 ms
 4  lab0.lab.ipng.ch (2001:678:d78:201::ffff)  7.381 ms
 5  e0.vpp2-0.lab.ipng.ch (2001:678:d78:221::fffe)  8.304 ms
 6  e0.vpp2-1.lab.ipng.ch (2001:678:d78:221::1:21)  11.633 ms
 7  e0.vpp2-2.lab.ipng.ch (2001:678:d78:221::2:22)  13.704 ms
 8  vpp2-3.lab.ipng.ch (2001:678:d78:220::3)  15.597 ms
```

If you read this far, thanks! Each of these three LABs comes with 4x10Gbit DPDK based packet generators (Cisco T-Rex),
four VPP machines running either Bird2 or FRR, and together they are connected to a 100G capable switch.

**These LABs are for rent, and we offer hands-on training on them.** Please **[contact](/s/contact/)** us for
daily/weekly rates, and custom training sessions.

I checked the generator and deploy scripts into a git repository, which I'm happy to share if there's
interest. But because it contains a few implementation details and doesn't do a lot of fool-proofing, and
because most of this can easily be recreated by interested parties from this blogpost, I decided not to publish
the LAB project on GitHub, but on our private git.ipng.ch server instead. Mail us if you'd like to take a closer look;
I'm happy to share the code.