From 8918821413c14e5433d0e209ceb02ac585b46e36 Mon Sep 17 00:00:00 2001
From: Pim van Pelt
Date: Sat, 3 May 2025 18:15:39 +0200
Subject: [PATCH] Add clab part 1

---
 content/articles/2025-05-03-containerlab-1.md | 337 ++++++++++++++++++
 static/assets/containerlab/containerlab.svg   |   1 +
 2 files changed, 338 insertions(+)
 create mode 100644 content/articles/2025-05-03-containerlab-1.md
 create mode 100644 static/assets/containerlab/containerlab.svg

diff --git a/content/articles/2025-05-03-containerlab-1.md b/content/articles/2025-05-03-containerlab-1.md
new file mode 100644
index 0000000..fa86d22
--- /dev/null
+++ b/content/articles/2025-05-03-containerlab-1.md
@@ -0,0 +1,337 @@
+---
+date: "2025-05-03T15:07:23Z"
+title: 'VPP in Containerlab - Part 1'
+---

{{< image float="right" src="/assets/containerlab/containerlab.svg" alt="Containerlab Logo" width="12em" >}}

# Introduction

From time to time the subject of containerized VPP instances comes up. At IPng, I run the routers in
AS8298 on bare metal (Supermicro and Dell hardware), as it allows me to maximize performance.
However, VPP is quite friendly to virtualization: it runs really well in virtual machines such as
Qemu/KVM or VMware. I can pass through PCI devices directly to the guest, and use CPU pinning to
give the virtual machine access to the underlying physical hardware. In such a mode, VPP performance
is almost the same as on bare metal. But did you know that VPP can also run in Docker?

The other day I joined the [[ZANOG'25](https://nog.net.za/event1/zanog25/)] in Durban, South Africa.
One of the presenters was Nardus le Roux of Nokia, and he showed off a project called
[[Containerlab](https://containerlab.dev/)], which provides a CLI for orchestrating and managing
container-based networking labs. It starts the containers, builds virtual wiring between them to
create lab topologies of the user's choice, and manages the lab's lifecycle.

Quite regularly I am asked 'when will you add VPP to Containerlab?', but at ZANOG I made a promise
to actually add it. Here I go, on a journey to integrate VPP into Containerlab!

## Containerized VPP

The folks at [[Tigera](https://www.tigera.io/project-calico/)] maintain a project called _Calico_,
which accelerates Kubernetes CNI (Container Network Interface) by using [[FD.io](https://fd.io)]
VPP. Since Kubernetes has its origins in running containers in a Docker environment, it stands to
reason that it should be possible to run a containerized VPP. I start by reading up on how they
create their Docker image, and I learn a lot.

### Docker Build

Considering IPng runs bare metal Debian (currently Bookworm) machines, my Docker image will be based
on `debian:bookworm` as well. The build starts off quite modest:

```
pim@summer:~$ mkdir -p src/vpp-containerlab
pim@summer:~/src/vpp-containerlab$ cat << 'EOF' > Dockerfile.bookworm
FROM debian:bookworm
ARG DEBIAN_FRONTEND=noninteractive
ARG VPP_INSTALL_SKIP_SYSCTL=true
ARG REPO=release
RUN apt-get update && apt-get -y install curl procps && apt-get clean

# Install VPP
RUN curl -s https://packagecloud.io/install/repositories/fdio/${REPO}/script.deb.sh | bash
RUN apt-get update && apt-get -y install vpp vpp-plugin-core && apt-get clean

CMD ["/usr/bin/vpp","-c","/etc/vpp/startup.conf"]
EOF
pim@summer:~/src/vpp-containerlab$ docker build -f Dockerfile.bookworm . -t pimvanpelt/vpp-containerlab
```

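Before doing anything else, it doesn't hurt to sanity-check the freshly built image. A small
sketch, using only stock Docker and `dpkg` commands; the image tag is the one given to `docker
build` above, and the `dpkg` query is just my own convenience, not part of the build itself:

```
# Confirm the image exists and note its size
docker image ls pimvanpelt/vpp-containerlab

# Without starting VPP, list the VPP packages that ended up in the image
docker run --rm --entrypoint dpkg pimvanpelt/vpp-containerlab -l 'vpp*'
```
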
One gotcha - when I install the upstream VPP Debian packages, they generate a `sysctl` file which
the postinstall script tries to apply. However, I can't set sysctls in the container, so the build
fails. I take a look at the VPP source code and find `src/pkg/debian/vpp.postinst`, which helpfully
contains a means to skip setting the sysctls, using an environment variable called
`VPP_INSTALL_SKIP_SYSCTL`.

### Running VPP in Docker

With the Docker image built, I need to tweak the VPP startup configuration a little bit, to allow it
to run well in a Docker environment. There are a few things I make note of:
1. We may not have huge pages on the host machine, so I'll set all the page sizes to the
   Linux-default 4kB rather than 2MB or 1GB hugepages. This creates a performance regression, but
   in the case of Containerlab, we're not here to build high performance stuff; rather, users
   will be doing functional testing.
1. DPDK requires either UIO or VFIO kernel drivers, so that it can bind its so-called _poll mode
   driver_ to the network cards. It also requires huge pages. Since my first version will be
   using only virtual ethernet interfaces, I'll disable DPDK and VFIO altogether.
1. VPP can run any number of CPU worker threads. In its simplest form, I can also run it with only
   one thread. Of course, this will not be a high performance setup, but since I'm already not
   using hugepages, I'll use only one thread.

The VPP `startup.conf` configuration file I came up with:

```
pim@summer:~/src/vpp-containerlab$ cat << EOF > clab-startup.conf
unix {
  interactive
  log /var/log/vpp/vpp.log
  full-coredump
  cli-listen /run/vpp/cli.sock
  cli-prompt vpp-clab#
  cli-no-pager
  poll-sleep-usec 100
}

api-trace {
  on
}

memory {
  main-heap-size 512M
  main-heap-page-size 4k
}

buffers {
  buffers-per-numa 16000
  default data-size 2048
  page-size 4k
}

statseg {
  size 64M
  page-size 4k
  per-node-counters on
}

plugins {
  plugin default { enable }
  plugin dpdk_plugin.so { disable }
}
EOF
```

Just a couple of notes for those who are running VPP in production. Each of the `*-page-size` config
settings takes the normal Linux page size of 4kB, which effectively keeps VPP from using any
hugepages. Then, I specifically disable the DPDK plugin, even though I didn't install it in the
Dockerfile build, as it lives in its own dedicated Debian package called `vpp-plugin-dpdk`. Finally,
I make VPP use less CPU by telling it to sleep for 100 microseconds between each poll iteration.
In production environments, VPP will use 100% of the CPUs it's assigned, but in this lab, it will
not be quite as hungry. By the way, even in this sleepy mode, it'll still easily handle a gigabit
of traffic!

Now, VPP wants to run as root, and it needs a few host features, notably tuntap devices and vhost,
as well as a few capabilities, notably NET_ADMIN and SYS_PTRACE. I take a look at the
[[manpage](https://man7.org/linux/man-pages/man7/capabilities.7.html)]:
* ***CAP_SYS_NICE***: allows setting real-time scheduling, CPU affinity and I/O scheduling class,
  and migrating and moving memory pages.
* ***CAP_NET_ADMIN***: allows performing various network-related operations like interface
  configuration, routing tables, nested network namespaces, multicast, setting promiscuous mode,
  and so on.
* ***CAP_SYS_PTRACE***: allows tracing arbitrary processes using `ptrace(2)`, and a few related
  kernel system calls.

Being a networking dataplane implementation, VPP wants to be able to tinker with network devices.
This is not typically allowed in Docker containers, although the Docker developers did make some
concessions for those containers that need just that little bit more access. They describe it in
their
[[docs](https://docs.docker.com/engine/containers/run/#runtime-privilege-and-linux-capabilities)] as
follows:

> The --privileged flag gives all capabilities to the container. When the operator executes docker
> run --privileged, Docker enables access to all devices on the host, and reconfigures AppArmor or
> SELinux to allow the container nearly all the same access to the host as processes running outside
> containers on the host. Use this flag with caution. For more information about the --privileged
> flag, see the docker run reference.

{{< image width="4em" float="left" src="/assets/shared/warning.png" alt="Warning" >}}
At this point, I feel I should point out that running a Docker container with the `--privileged`
flag set does give it _a lot_ of privileges. A container with `--privileged` is not a securely
sandboxed process. Containers in this mode can get a root shell on the host and take control over
the system.

With that little fine-print warning out of the way, I am going to Yolo like a boss:

```
pim@summer:~/src/vpp-containerlab$ docker run --name clab-pim \
    --cap-add=NET_ADMIN --cap-add=SYS_NICE --cap-add=SYS_PTRACE \
    --device=/dev/net/tun:/dev/net/tun --device=/dev/vhost-net:/dev/vhost-net \
    --privileged -v $(pwd)/clab-startup.conf:/etc/vpp/startup.conf:ro \
    docker.io/pimvanpelt/vpp-containerlab
clab-pim
```

### Configuring VPP in Docker

And with that, the Docker container is running! I post a screenshot on
[[Mastodon](https://ublog.tech/@IPngNetworks/114392852468494211)] and my buddy John responds with a
polite but firm insistence that I explain myself. Here you go, buddy :)

In another terminal, I can play around with this VPP instance a little bit:

```
pim@summer:~$ docker exec -it clab-pim bash
root@d57c3716eee9:/# ip -br l
lo               UNKNOWN   00:00:00:00:00:00
eth0@if530566    UP        02:42:ac:11:00:02

root@d57c3716eee9:/# ps auxw
USER  PID %CPU %MEM      VSZ    RSS TTY   STAT START  TIME COMMAND
root    1  2.2  0.2 17498852 160300 ?     Rs   15:11  0:00 /usr/bin/vpp -c /etc/vpp/startup.conf
root   10  0.0  0.0     4192   3388 pts/0 Ss   15:11  0:00 bash
root   18  0.0  0.0     8104   4056 pts/0 R+   15:12  0:00 ps auxw

root@d57c3716eee9:/# vppctl
    _______    _        _   _____  ___
 __/ __/ _ \  (_)__    | | / / _ \/ _ \
 _/ _// // / / / _ \   | |/ / ___/ ___/
 /_/ /____(_)_/\___/   |___/_/  /_/

vpp-clab# show version
vpp v25.02-release built by root on d5cd2c304b7f at 2025-02-26T13:58:32
vpp-clab# show interfaces
              Name               Idx    State  MTU (L3/IP4/IP6/MPLS)     Counter          Count
local0                            0     down          0/0/0/0
```

Slick! I can see that the container has an `eth0` device, which Docker has connected to the main
bridged network. For now, there's only one process running: pid 1 proudly shows VPP (as in Docker,
the `CMD` field simply replaces `init`). Later on, I can imagine running a few more daemons like
SSH and so on, but for now, I'm happy.

Looking at VPP itself, it has no network interfaces yet, except for the default `local0` interface.

### Adding Interfaces in Docker

But if I don't have DPDK, how will I add interfaces? Enter `veth(4)`. From the
[[manpage](https://man7.org/linux/man-pages/man4/veth.4.html)], I learn that veth devices are
virtual Ethernet devices. They can act as tunnels between network namespaces to create a bridge to
a physical network device in another namespace, but can also be used as standalone network devices.
veth devices are always created in interconnected pairs.

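To make that last sentence concrete, here's a quick sketch of what such a pair looks like when
created by hand with `iproute2`; the `vethA`/`vethB` names are placeholders I made up for
illustration:

```
# Create a veth pair: two interfaces wired back-to-back
sudo ip link add vethA type veth peer name vethB

# Both ends show up, each referencing the other as its peer
ip -br link show type veth

# Deleting either end tears down the whole pair
sudo ip link del vethA
```
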
Of course, Docker users will recognize this. It's like bread and butter for containers to
communicate with one another - and with the host they're running on. I can simply create a Docker
network and attach a running container to it, like so:

```
pim@summer:~$ docker network create --driver=bridge clab-network \
    --subnet 192.0.2.0/24 --ipv6 --subnet 2001:db8::/64
5711b95c6c32ac0ed185a54f39e5af4b499677171ff3d00f99497034e09320d2
pim@summer:~$ docker network connect clab-network clab-pim --ip '' --ip6 ''
```

The first command here creates a new network called `clab-network` in Docker. As a result, a new
bridge called `br-5711b95c6c32` shows up on the host. The bridge name is derived from the ID of the
Docker network object. Seeing as I added an IPv4 and an IPv6 subnet to the bridge, it gets
configured with the first address in both:

```
pim@summer:~/src/vpp-containerlab$ brctl show br-5711b95c6c32
bridge name       bridge id           STP enabled   interfaces
br-5711b95c6c32   8000.0242099728c6   no            veth021e363

pim@summer:~/src/vpp-containerlab$ ip -br a show dev br-5711b95c6c32
br-5711b95c6c32   UP   192.0.2.1/24 2001:db8::1/64 fe80::42:9ff:fe97:28c6/64 fe80::1/64
```

The second command creates a `veth` pair and puts one half of it in the bridge; this interface is
called `veth021e363` above. The other half of it pops up as `eth1` in the Docker container:

```
pim@summer:~/src/vpp-containerlab$ docker exec -it clab-pim bash
root@d57c3716eee9:/# ip -br l
lo               UNKNOWN   00:00:00:00:00:00
eth0@if530566    UP        02:42:ac:11:00:02
eth1@if530577    UP        02:42:c0:00:02:02
```

One of the many awesome features of VPP is its ability to attach to these `veth` devices by means of
its `af-packet` driver. I first take a look at the Linux
[[manpage](https://man7.org/linux/man-pages/man7/packet.7.html)] for it, and then read up on the VPP
[[documentation](https://fd.io/docs/vpp/v2101/gettingstarted/progressivevpp/interface)] on the
topic. Armed with this knowledge, I can bind the container-side half of the veth pair, called
`eth1`, to VPP, like so:

```
root@d57c3716eee9:/# vppctl
    _______    _        _   _____  ___
 __/ __/ _ \  (_)__    | | / / _ \/ _ \
 _/ _// // / / / _ \   | |/ / ___/ ___/
 /_/ /____(_)_/\___/   |___/_/  /_/

vpp-clab# create host-interface v2 name eth1
vpp-clab# set interface name host-eth1 eth1
vpp-clab# set interface mtu 1500 eth1
vpp-clab# set interface ip address eth1 192.0.2.2/24
vpp-clab# set interface ip address eth1 2001:db8::2/64
vpp-clab# set interface state eth1 up
vpp-clab# show int addr
eth1 (up):
  L3 192.0.2.2/24
  L3 2001:db8::2/64
local0 (dn):
```

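Before ringing the victory bell, I can also poke at the data path from VPP's side. A minimal
sketch, run from the host; it assumes the `ping` plugin that ships with `vpp-plugin-core` is
loaded, which the `plugin default { enable }` stanza in `clab-startup.conf` should take care of:

```
# Ask VPP about its af-packet interface and its learned neighbors
docker exec -it clab-pim vppctl show hardware-interfaces brief
docker exec -it clab-pim vppctl show ip neighbors

# Ping the bridge address on the host from within the VPP dataplane
docker exec -it clab-pim vppctl ping 192.0.2.1
```
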
## Results

After all this work, I've successfully created a Docker image based on Debian Bookworm and VPP 25.02
(the current stable release version), started a container with it, and added a network bridge in
Docker, which binds the host `summer` to the container. The proof, as they say, is in the
ping-pudding:

```
pim@summer:~/src/vpp-containerlab$ ping -c5 2001:db8::2
PING 2001:db8::2(2001:db8::2) 56 data bytes
64 bytes from 2001:db8::2: icmp_seq=1 ttl=64 time=0.113 ms
64 bytes from 2001:db8::2: icmp_seq=2 ttl=64 time=0.056 ms
64 bytes from 2001:db8::2: icmp_seq=3 ttl=64 time=0.202 ms
64 bytes from 2001:db8::2: icmp_seq=4 ttl=64 time=0.102 ms
64 bytes from 2001:db8::2: icmp_seq=5 ttl=64 time=0.100 ms

--- 2001:db8::2 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4098ms
rtt min/avg/max/mdev = 0.056/0.114/0.202/0.047 ms
pim@summer:~/src/vpp-containerlab$ ping -c5 192.0.2.2
PING 192.0.2.2 (192.0.2.2) 56(84) bytes of data.
64 bytes from 192.0.2.2: icmp_seq=1 ttl=64 time=0.043 ms
64 bytes from 192.0.2.2: icmp_seq=2 ttl=64 time=0.032 ms
64 bytes from 192.0.2.2: icmp_seq=3 ttl=64 time=0.019 ms
64 bytes from 192.0.2.2: icmp_seq=4 ttl=64 time=0.041 ms
64 bytes from 192.0.2.2: icmp_seq=5 ttl=64 time=0.027 ms

--- 192.0.2.2 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4063ms
rtt min/avg/max/mdev = 0.019/0.032/0.043/0.008 ms
```

## What's Next

This was a nice exercise for me! I'm going in this direction because the
[[Containerlab](https://containerlab.dev)] framework will start containers with given NOS images,
not too dissimilar from the one I just made, and then attach `veth` pairs between the containers.
I started dabbling with a [[pull-request](https://github.com/srl-labs/containerlab/pull/2569)], but
I got stuck on a part of the Containerlab code that pre-deploys config files into the containers.
You see, I will need to generate two files:

1. A `startup.conf` file that is specific to each Containerlab Docker container. I'd like each of
   them to set its own hostname so that the CLI has a unique prompt. I can do this by setting `unix
   { cli-prompt {{ .ShortName }}# }` in the template renderer.
1. Containerlab will know all of the veth pairs that are planned to be created in each VPP
   container. I'll need it to then write a little snippet of config that does the `create
   host-interface` spiel, to attach these `veth` pairs to the VPP dataplane.

I reached out to Roman from Nokia, who is one of the authors and the current maintainer of
Containerlab. Roman was keen to help out, and seeing as he knows the Containerlab stuff well, and I
know the VPP stuff well, this is a reasonable partnership! Soon, he and I plan to have a bare-bones
setup that will connect a few VPP containers together with an SR Linux node in a lab. Stand by!

Once we have that, there's still quite some work for me to do. Notably:
* Configuration persistence. `clab` allows you to save the running config. For that, I'll need to
  introduce [[vppcfg](https://github.com/pimvanpelt/vppcfg.git)] and a means to invoke it when
  the lab operator wants to save their config, and then reconfigure VPP when the container
  restarts.
* I'll need to have a few files from `clab` shared with the host, notably the `startup.conf` and
  `vppcfg.yaml`, as well as some manual pre- and post-flight configuration for the more esoteric
  stuff. Building the plumbing for this is a TODO for now.

First order of business: get it to ping at all :)
diff --git a/static/assets/containerlab/containerlab.svg b/static/assets/containerlab/containerlab.svg
new file mode 100644
index 0000000..c26dfc6
--- /dev/null
+++ b/static/assets/containerlab/containerlab.svg
@@ -0,0 +1 @@
+
\ No newline at end of file