---
date: "2025-05-03T15:07:23Z"
title: 'VPP in Containerlab - Part 1'
---
{{< image float="right" src="/assets/containerlab/containerlab.svg" alt="Containerlab Logo" width="12em" >}}
# Introduction
From time to time the subject of containerized VPP instances comes up. At IPng, I run the routers in
AS8298 on bare metal (Supermicro and Dell hardware), as it allows me to maximize performance.
However, VPP is quite friendly to virtualization. Notably, it runs really well in virtual machines
under Qemu/KVM or VMWare. I can pass through PCI devices directly to the guest, and use CPU pinning
to give the guest virtual machine access to the underlying physical hardware. In such a mode, VPP
performance is almost the same as on bare metal. But did you know that VPP can also run in Docker?

The other day I joined the [[ZANOG'25](https://nog.net.za/event1/zanog25/)] in Durban, South Africa.
One of the presenters was Nardus le Roux of Nokia, who showed off a project called
[[Containerlab](https://containerlab.dev/)], which provides a CLI for orchestrating and managing
container-based networking labs. It starts the containers, builds virtual wiring between them to
create lab topologies of the user's choice, and manages the lab lifecycle.

Quite regularly I am asked 'when will you add VPP to Containerlab?', and at ZANOG I made a promise
to actually do it. Here I go, on a journey to integrate VPP into Containerlab!

## Containerized VPP
The folks at [[Tigera](https://www.tigera.io/project-calico/)] maintain a project called _Calico_,
which accelerates Kubernetes CNI (Container Network Interface) using [[FD.io](https://fd.io)]
VPP. Since Kubernetes has its origins in running containers in a Docker environment, it stands to
reason that it should be possible to run a containerized VPP. I start by reading up on how they
create their Docker image, and I learn a lot.
### Docker Build
Considering IPng runs bare metal Debian (currently Bookworm) machines, my Docker image will be based
on `debian:bookworm` as well. The build starts off quite modest:
```
pim@summer:~$ mkdir -p src/vpp-containerlab
pim@summer:~/src/vpp-containerlab$ cat << 'EOF' > Dockerfile.bookworm
FROM debian:bookworm
ARG DEBIAN_FRONTEND=noninteractive
ARG VPP_INSTALL_SKIP_SYSCTL=true
ARG REPO=release
RUN apt-get update && apt-get -y install curl procps && apt-get clean
# Install VPP
RUN curl -s https://packagecloud.io/install/repositories/fdio/${REPO}/script.deb.sh | bash
RUN apt-get update && apt-get -y install vpp vpp-plugin-core && apt-get clean
CMD ["/usr/bin/vpp","-c","/etc/vpp/startup.conf"]
EOF
pim@summer:~/src/vpp-containerlab$ docker build -f Dockerfile.bookworm . -t pimvanpelt/vpp-containerlab
```
One gotcha: when I install the upstream VPP Debian packages, they generate a `sysctl` file which the
post-install script then tries to apply. However, I can't set sysctls in the container, so the build
fails. I take a look at the VPP source code and find `src/pkg/debian/vpp.postinst`, which helpfully
contains a means to skip setting the sysctls, using an environment variable called
`VPP_INSTALL_SKIP_SYSCTL`.
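
Both `ARG`s can also be overridden at build time; here's a minimal sketch using Docker's standard
`--build-arg` flag, with the same values the Dockerfile above already defaults to:
```
pim@summer:~/src/vpp-containerlab$ docker build -f Dockerfile.bookworm . \
    -t pimvanpelt/vpp-containerlab \
    --build-arg VPP_INSTALL_SKIP_SYSCTL=true --build-arg REPO=release
```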
### Running VPP in Docker
With the Docker image built, I need to tweak the VPP startup configuration a little bit, to allow it
to run well in a Docker environment. There are a few things I make note of:
1. We may not have hugepages on the host machine, so I'll set all the page sizes to the Linux
   default of 4kB rather than 2MB or 1GB hugepages. This creates a performance regression, but in
   the case of Containerlab, we're not here to build high performance stuff; rather, users will be
   doing functional testing.
1. DPDK requires either the UIO or VFIO kernel driver, so that it can bind its so-called _poll mode
   driver_ to the network cards. It also requires hugepages. Since my first version will be using
   only virtual ethernet interfaces, I'll disable DPDK and VFIO altogether.
1. VPP can run any number of CPU worker threads. In its simplest form, I can also run it with only
   one thread. Of course, this will not be a high performance setup, but since I'm already not
   using hugepages, I'll use only one thread.
The VPP `startup.conf` configuration file I came up with:
```
pim@summer:~/src/vpp-containerlab$ cat << EOF > clab-startup.conf
unix {
interactive
log /var/log/vpp/vpp.log
full-coredump
cli-listen /run/vpp/cli.sock
cli-prompt vpp-clab#
cli-no-pager
poll-sleep-usec 100
}
api-trace {
on
}
memory {
main-heap-size 512M
main-heap-page-size 4k
}
buffers {
buffers-per-numa 16000
default data-size 2048
page-size 4k
}
statseg {
size 64M
page-size 4k
per-node-counters on
}
plugins {
plugin default { enable }
plugin dpdk_plugin.so { disable }
}
EOF
```
Just a couple of notes for those who run VPP in production. Each of the `*-page-size` config
settings uses the normal Linux page size of 4kB, which effectively prevents VPP from using any
hugepages. Then, I specifically disable the DPDK plugin, even though I didn't install it in the
Dockerfile build anyway, as it lives in its own dedicated Debian package called `vpp-plugin-dpdk`.
Finally, I make VPP use less CPU by telling it to sleep for 100 microseconds between each poll
iteration. In production environments, VPP will use 100% of the CPUs it's assigned, but in this lab
it will not be quite as hungry. By the way, even in this sleepy mode, it'll still easily handle a
gigabit of traffic!
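
For contrast, and purely as an illustration (the values below are made up, not part of this lab), a
production-style `startup.conf` would typically pin workers to cores, use hugepages, and enable DPDK
rather than sleep between polls:
```
cpu {
  # pin the main thread and give VPP four worker threads
  main-core 1
  corelist-workers 2-5
}
memory {
  # a larger heap, backed by 1GB hugepages instead of 4kB pages
  main-heap-size 2G
  main-heap-page-size 1G
}
dpdk {
  # bind a physical NIC to DPDK's poll mode driver
  dev 0000:0b:00.0
}
```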
Now, VPP wants to run as root and it needs a few host features, notably tuntap devices and
vhost-net, and a few capabilities, notably NET_ADMIN, SYS_NICE and SYS_PTRACE. I take a look at the
[[manpage](https://man7.org/linux/man-pages/man7/capabilities.7.html)]:
* ***CAP_SYS_NICE***: allows setting real-time scheduling, CPU affinity and I/O scheduling class,
  and migrating and moving memory pages.
* ***CAP_NET_ADMIN***: allows performing various network-related operations like interface
  configuration, routing tables, nested network namespaces, multicast, setting promiscuous mode,
  and so on.
* ***CAP_SYS_PTRACE***: allows tracing arbitrary processes using `ptrace(2)`, and a few related
  kernel system calls.

Being a networking dataplane implementation, VPP wants to be able to tinker with network devices.
This is not typically allowed in Docker containers, although the Docker developers did make some
concessions for those containers that need just that little bit more access. They describe it in
their
[[docs](https://docs.docker.com/engine/containers/run/#runtime-privilege-and-linux-capabilities)] as
follows:

> The --privileged flag gives all capabilities to the container. When the operator executes docker
> run --privileged, Docker enables access to all devices on the host, and reconfigures AppArmor or
> SELinux to allow the container nearly all the same access to the host as processes running outside
> containers on the host. Use this flag with caution. For more information about the --privileged
> flag, see the docker run reference.

{{< image width="4em" float="left" src="/assets/shared/warning.png" alt="Warning" >}}
At this point, I feel I should point out that running a Docker container with the `--privileged`
flag set does give it _a lot_ of privileges. A container with `--privileged` is not a securely
sandboxed process. Containers in this mode can get a root shell on the host and take control over
the system.

With that little fine-print warning out of the way, I am going to Yolo like a boss:
```
pim@summer:~/src/vpp-containerlab$ docker run --name clab-pim \
--cap-add=NET_ADMIN --cap-add=SYS_NICE --cap-add=SYS_PTRACE \
--device=/dev/net/tun:/dev/net/tun --device=/dev/vhost-net:/dev/vhost-net \
--privileged -v $(pwd)/clab-startup.conf:/etc/vpp/startup.conf:ro \
docker.io/pimvanpelt/vpp-containerlab
clab-pim
```
### Configuring VPP in Docker
And with that, the Docker container is running! I post a screenshot on
[[Mastodon](https://ublog.tech/@IPngNetworks/114392852468494211)] and my buddy John responds with a
polite but firm insistence that I explain myself. Here you go, buddy :)
In another terminal, I can play around with this VPP instance a little bit:
```
pim@summer:~$ docker exec -it clab-pim bash
root@d57c3716eee9:/# ip -br l
lo UNKNOWN 00:00:00:00:00:00 <LOOPBACK,UP,LOWER_UP>
eth0@if530566 UP 02:42:ac:11:00:02 <BROADCAST,MULTICAST,UP,LOWER_UP>
root@d57c3716eee9:/# ps auxw
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 2.2 0.2 17498852 160300 ? Rs 15:11 0:00 /usr/bin/vpp -c /etc/vpp/startup.conf
root 10 0.0 0.0 4192 3388 pts/0 Ss 15:11 0:00 bash
root 18 0.0 0.0 8104 4056 pts/0 R+ 15:12 0:00 ps auxw
root@d57c3716eee9:/# vppctl
_______ _ _ _____ ___
__/ __/ _ \ (_)__ | | / / _ \/ _ \
_/ _// // / / / _ \ | |/ / ___/ ___/
/_/ /____(_)_/\___/ |___/_/ /_/
vpp-clab# show version
vpp v25.02-release built by root on d5cd2c304b7f at 2025-02-26T13:58:32
vpp-clab# show interfaces
Name Idx State MTU (L3/IP4/IP6/MPLS) Counter Count
local0 0 down 0/0/0/0
```
Slick! I can see that the container has an `eth0` device, which Docker has connected to the main
bridged network. For now, there's only one process running: pid 1 proudly shows VPP (as in Docker,
the `CMD` entry simply replaces `init`). Later on, I can imagine running a few more daemons like
SSH and so on, but for now, I'm happy.

Looking at VPP itself, it has no network interfaces yet, except for the default `local0` interface.

### Adding Interfaces in Docker
But if I don't have DPDK, how will I add interfaces? Enter `veth(4)`. From the
[[manpage](https://man7.org/linux/man-pages/man4/veth.4.html)], I learn that veth devices are
virtual Ethernet devices. They can act as tunnels between network namespaces to create a bridge to
a physical network device in another namespace, but can also be used as standalone network devices.
veth devices are always created in interconnected pairs.
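
As a hand-rolled illustration of what the manpage describes (the names and PID below are
placeholders, not something Docker or Containerlab runs verbatim), a veth pair can be created with
iproute2 and one end pushed into another network namespace:
```
# create a pair of interconnected veth devices on the host
sudo ip link add veth-host type veth peer name veth-ctr
# move one end into the container's network namespace (12345 is a placeholder PID)
sudo ip link set veth-ctr netns 12345
# attach the other end to a host bridge and bring it up
sudo ip link set veth-host master br0 up
```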
Of course, Docker users will recognize this. It's like bread and butter for containers to
communicate with one another - and with the host they're running on. I can simply create a Docker
network and attach one half of it to a running container, like so:
```
pim@summer:~$ docker network create --driver=bridge clab-network \
--subnet 192.0.2.0/24 --ipv6 --subnet 2001:db8::/64
5711b95c6c32ac0ed185a54f39e5af4b499677171ff3d00f99497034e09320d2
pim@summer:~$ docker network connect clab-network clab-pim --ip '' --ip6 ''
```
The first command here creates a new network called `clab-network` in Docker. As a result, a new
bridge called `br-5711b95c6c32` shows up on the host. The bridge name is chosen from the UUID of the
Docker object. Seeing as I added an IPv4 and IPv6 subnet to the bridge, it gets configured with the
first address in both:
```
pim@summer:~/src/vpp-containerlab$ brctl show br-5711b95c6c32
bridge name bridge id STP enabled interfaces
br-5711b95c6c32 8000.0242099728c6 no veth021e363
pim@summer:~/src/vpp-containerlab$ ip -br a show dev br-5711b95c6c32
br-5711b95c6c32 UP 192.0.2.1/24 2001:db8::1/64 fe80::42:9ff:fe97:28c6/64 fe80::1/64
```
The second command creates a `veth` pair and puts one half of it in the bridge; this interface is
called `veth021e363` above. The other half pops up as `eth1` in the Docker container:
```
pim@summer:~/src/vpp-containerlab$ docker exec -it clab-pim bash
root@d57c3716eee9:/# ip -br l
lo UNKNOWN 00:00:00:00:00:00 <LOOPBACK,UP,LOWER_UP>
eth0@if530566 UP 02:42:ac:11:00:02 <BROADCAST,MULTICAST,UP,LOWER_UP>
eth1@if530577 UP 02:42:c0:00:02:02 <BROADCAST,MULTICAST,UP,LOWER_UP>
```
One of the many awesome features of VPP is its ability to attach to these `veth` devices by means of
its `af-packet` driver. I first take a look at the Linux
[[manpage](https://man7.org/linux/man-pages/man7/packet.7.html)] for it, and then read up on the VPP
[[documentation](https://fd.io/docs/vpp/v2101/gettingstarted/progressivevpp/interface)] on the
topic. Armed with this knowledge, I can bind the container-side half of the veth pair, called
`eth1`, to VPP, like so:
```
root@d57c3716eee9:/# vppctl
_______ _ _ _____ ___
__/ __/ _ \ (_)__ | | / / _ \/ _ \
_/ _// // / / / _ \ | |/ / ___/ ___/
/_/ /____(_)_/\___/ |___/_/ /_/
vpp-clab# create host-interface v2 name eth1
vpp-clab# set interface name host-eth1 eth1
vpp-clab# set interface mtu 1500 eth1
vpp-clab# set interface ip address eth1 192.0.2.2/24
vpp-clab# set interface ip address eth1 2001:db8::2/64
vpp-clab# set interface state eth1 up
vpp-clab# show int addr
eth1 (up):
L3 192.0.2.2/24
L3 2001:db8::2/64
local0 (dn):
```
## Results
After all this work, I've successfully created a Docker image based on Debian Bookworm and VPP 25.02
(the current stable release version), started a container with it, added a network bridge in Docker,
which binds the host `summer` to the container. Proof, as they say, is in the ping-pudding:
```
pim@summer:~/src/vpp-containerlab$ ping -c5 2001:db8::2
PING 2001:db8::2(2001:db8::2) 56 data bytes
64 bytes from 2001:db8::2: icmp_seq=1 ttl=64 time=0.113 ms
64 bytes from 2001:db8::2: icmp_seq=2 ttl=64 time=0.056 ms
64 bytes from 2001:db8::2: icmp_seq=3 ttl=64 time=0.202 ms
64 bytes from 2001:db8::2: icmp_seq=4 ttl=64 time=0.102 ms
64 bytes from 2001:db8::2: icmp_seq=5 ttl=64 time=0.100 ms
--- 2001:db8::2 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4098ms
rtt min/avg/max/mdev = 0.056/0.114/0.202/0.047 ms
pim@summer:~/src/vpp-containerlab$ ping -c5 192.0.2.2
PING 192.0.2.2 (192.0.2.2) 56(84) bytes of data.
64 bytes from 192.0.2.2: icmp_seq=1 ttl=64 time=0.043 ms
64 bytes from 192.0.2.2: icmp_seq=2 ttl=64 time=0.032 ms
64 bytes from 192.0.2.2: icmp_seq=3 ttl=64 time=0.019 ms
64 bytes from 192.0.2.2: icmp_seq=4 ttl=64 time=0.041 ms
64 bytes from 192.0.2.2: icmp_seq=5 ttl=64 time=0.027 ms
--- 192.0.2.2 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4063ms
rtt min/avg/max/mdev = 0.019/0.032/0.043/0.008 ms
```
## What's Next
This was a nice exercise for me! I'm going in this direction because the
[[Containerlab](https://containerlab.dev)] framework starts containers with given NOS images, not
too dissimilar from the one I just made, and then attaches `veth` pairs between the containers.
I started dabbling with a [[pull-request](https://github.com/srl-labs/containerlab/pull/2569)], but
I got stuck on a part of the Containerlab code that pre-deploys config files into the containers.

You see, I will need to generate two files:
1. A `startup.conf` file that is specific to each Containerlab Docker container. I'd like them to
   each set their own hostname, so that the CLI has a unique prompt. I can do this by setting `unix
   { cli-prompt {{ .ShortName }}# }` in the template renderer.
1. Containerlab will know all of the veth pairs that it plans to create in each VPP container. I'll
   need it to then write a little snippet of config that does the `create host-interface` spiel,
   attaching these `veth` pairs to the VPP dataplane (see the sketch after this list).
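
Something along these lines is what I have in mind for that second snippet; a rough sketch only,
reusing the commands I typed by hand earlier, with placeholder interface names:
```
create host-interface v2 name eth1
set interface name host-eth1 eth1
set interface state eth1 up
create host-interface v2 name eth2
set interface name host-eth2 eth2
set interface state eth2 up
```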
I reached out to Roman from Nokia, who is one of the authors and the current maintainer of
Containerlab. Roman was keen to help out, and seeing as he knows the Containerlab internals well and
I know the VPP stuff well, this is a reasonable partnership! Soon, he and I plan to have a
bare-bones setup that connects a few VPP containers together with an SR Linux node in a lab. Stand
by!

Once we have that, there's still quite some work for me to do. Notably:
* Configuration persistence. `clab` allows you to save the running config. For that, I'll need to
  introduce [[vppcfg](https://github.com/pimvanpelt/vppcfg.git)] and a means to invoke it when the
  lab operator wants to save their config, and then reconfigure VPP when the container restarts
  (a rough sketch of what that could look like follows after this list).
* I'll need to have a few files from `clab` shared with the host, notably the `startup.conf` and
`vppcfg.yaml`, as well as some manual pre- and post-flight configuration for the more esoteric
stuff. Building the plumbing for this is a TODO for now.
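
For the persistence piece, I imagine something along these lines; a sketch only, assuming vppcfg's
`check` and `plan` subcommands, and a file layout that is entirely made up at this point:
```
# validate the saved YAML configuration (path is a placeholder)
vppcfg check -c /etc/vpp/vppcfg.yaml
# turn the YAML into a list of vppctl commands
vppcfg plan -c /etc/vpp/vppcfg.yaml -o /etc/vpp/vppcfg.exec
# replay those commands into the dataplane when the container (re)starts
vppctl exec /etc/vpp/vppcfg.exec
```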
First order of business: get it to ping at all :)
