---
date: "2025-05-03T15:07:23Z"
title: 'VPP in Containerlab - Part 1'
---
{{< image float="right" src="/assets/containerlab/containerlab.svg" alt="Containerlab Logo" width="12em" >}}
# Introduction
From time to time the subject of containerized VPP instances comes up. At IPng, I run the routers in
AS8298 on bare metal (Supermicro and Dell hardware), as it allows me to maximize performance.
However, VPP is quite friendly to virtualization. Notably, it runs really well in virtual machines
under Qemu/KVM or VMWare. I can pass through PCI devices directly to the guest, and use CPU pinning
to give the guest virtual machine access to the underlying physical hardware. In such a mode, VPP
performance is almost the same as on bare metal. But did you know that VPP can also run in Docker?

The other day I joined the [[ZANOG'25](https://nog.net.za/event1/zanog25/)] in Durban, South Africa.
One of the presenters was Nardus le Roux of Nokia, who showed off a project called
[[Containerlab](https://containerlab.dev/)], which provides a CLI for orchestrating and managing
container-based networking labs. It starts the containers, builds virtual wiring between them to
create lab topologies of the user's choice, and manages the lab lifecycle.

Quite regularly I am asked 'when will you add VPP to Containerlab?', and at ZANOG I made a promise
to actually do it. Here I go, on a journey to integrate VPP into Containerlab!

## Containerized VPP
The folks at [[Tigera](https://www.tigera.io/project-calico/)] maintain a project called _Calico_,
which accelerates Kubernetes CNI (Container Network Interface) using [[FD.io](https://fd.io)]
VPP. Since Kubernetes has its origins in running containers in a Docker environment, it stands to
reason that it should be possible to run a containerized VPP. I start by reading up on how they
create their Docker image, and I learn a lot.
### Docker Build
Considering IPng runs bare metal Debian (currently Bookworm) machines, my Docker image will be based
on `debian:bookworm` as well. The build starts off quite modest:
```
pim@summer:~$ mkdir -p src/vpp-containerlab
pim@summer:~/src/vpp-containerlab$ cat << 'EOF' > Dockerfile.bookworm
FROM debian:bookworm
ARG DEBIAN_FRONTEND=noninteractive
ARG VPP_INSTALL_SKIP_SYSCTL=true
ARG REPO=release
RUN apt-get update && apt-get -y install curl procps && apt-get clean
# Install VPP
RUN curl -s https://packagecloud.io/install/repositories/fdio/${REPO}/script.deb.sh | bash
RUN apt-get update && apt-get -y install vpp vpp-plugin-core && apt-get clean
CMD ["/usr/bin/vpp","-c","/etc/vpp/startup.conf"]
EOF
pim@summer:~/src/vpp-containerlab$ docker build -f Dockerfile.bookworm . -t pimvanpelt/vpp-containerlab
```
One gotcha: when I install the upstream VPP Debian packages, they generate a `sysctl` file which the
post-install script then tries to apply. However, I can't set sysctls in the container, so the build
fails. I take a look at the VPP source code and find `src/pkg/debian/vpp.postinst`, which helpfully
contains a means to skip setting the sysctls, using an environment variable called
`VPP_INSTALL_SKIP_SYSCTL`.
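
Both `ARG`s can also be overridden at build time; here's a minimal sketch using Docker's standard
`--build-arg` flag, with the same values the Dockerfile above already defaults to:
```
pim@summer:~/src/vpp-containerlab$ docker build -f Dockerfile.bookworm . \
    -t pimvanpelt/vpp-containerlab \
    --build-arg VPP_INSTALL_SKIP_SYSCTL=true --build-arg REPO=release
```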
### Running VPP in Docker
With the Docker image built, I need to tweak the VPP startup configuration a little bit, to allow it
to run well in a Docker environment. There are a few things I make note of:
1. We may not have hugepages on the host machine, so I'll set all the page sizes to the Linux
   default of 4kB rather than 2MB or 1GB hugepages. This creates a performance regression, but in
   the case of Containerlab, we're not here to build high performance stuff; rather, users will be
   doing functional testing.
1. DPDK requires either the UIO or VFIO kernel driver, so that it can bind its so-called _poll mode
   driver_ to the network cards. It also requires hugepages. Since my first version will be using
   only virtual ethernet interfaces, I'll disable DPDK and VFIO altogether.
1. VPP can run any number of CPU worker threads. In its simplest form, I can also run it with only
   one thread. Of course, this will not be a high performance setup, but since I'm already not
   using hugepages, I'll use only one thread.
The VPP `startup.conf` configuration file I came up with:
```
pim@summer:~/src/vpp-containerlab$ cat << EOF > clab-startup.conf
unix {
interactive
log /var/log/vpp/vpp.log
full-coredump
cli-listen /run/vpp/cli.sock
cli-prompt vpp-clab#
cli-no-pager
poll-sleep-usec 100
}
api-trace {
on
}
memory {
main-heap-size 512M
main-heap-page-size 4k
}
buffers {
buffers-per-numa 16000
default data-size 2048
page-size 4k
}
statseg {
size 64M
page-size 4k
per-node-counters on
}
plugins {
plugin default { enable }
plugin dpdk_plugin.so { disable }
}
EOF
```
Just a couple of notes for those who run VPP in production. Each of the `*-page-size` config
settings uses the normal Linux page size of 4kB, which effectively prevents VPP from using any
hugepages. Then, I specifically disable the DPDK plugin, even though I didn't install it in the
Dockerfile build anyway, as it lives in its own dedicated Debian package called `vpp-plugin-dpdk`.
Finally, I make VPP use less CPU by telling it to sleep for 100 microseconds between each poll
iteration. In production environments, VPP will use 100% of the CPUs it's assigned, but in this lab
it will not be quite as hungry. By the way, even in this sleepy mode, it'll still easily handle a
gigabit of traffic!
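
For contrast, and purely as an illustration (the values below are made up, not part of this lab), a
production-style `startup.conf` would typically pin workers to cores, use hugepages, and enable DPDK
rather than sleep between polls:
```
cpu {
  # pin the main thread and give VPP four worker threads
  main-core 1
  corelist-workers 2-5
}
memory {
  # a larger heap, backed by 1GB hugepages instead of 4kB pages
  main-heap-size 2G
  main-heap-page-size 1G
}
dpdk {
  # bind a physical NIC to DPDK's poll mode driver
  dev 0000:0b:00.0
}
```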
Now, VPP wants to run as root and it needs a few host features, notably tuntap devices and
vhost-net, and a few capabilities, notably NET_ADMIN, SYS_NICE and SYS_PTRACE. I take a look at the
[[manpage](https://man7.org/linux/man-pages/man7/capabilities.7.html)]:
* ***CAP_SYS_NICE***: allows setting real-time scheduling, CPU affinity and I/O scheduling class,
  and migrating and moving memory pages.
* ***CAP_NET_ADMIN***: allows performing various network-related operations like interface
  configuration, routing tables, nested network namespaces, multicast, setting promiscuous mode,
  and so on.
* ***CAP_SYS_PTRACE***: allows tracing arbitrary processes using `ptrace(2)`, and a few related
  kernel system calls.

Being a networking dataplane implementation, VPP wants to be able to tinker with network devices.
This is not typically allowed in Docker containers, although the Docker developers did make some
concessions for those containers that need just that little bit more access. They describe it in
their
[[docs](https://docs.docker.com/engine/containers/run/#runtime-privilege-and-linux-capabilities)] as
follows:

> The --privileged flag gives all capabilities to the container. When the operator executes docker
> run --privileged, Docker enables access to all devices on the host, and reconfigures AppArmor or
> SELinux to allow the container nearly all the same access to the host as processes running outside
> containers on the host. Use this flag with caution. For more information about the --privileged
> flag, see the docker run reference.

{{< image width="4em" float="left" src="/assets/shared/warning.png" alt="Warning" >}}
At this point, I feel I should point out that running a Docker container with the `--privileged`
flag set does give it _a lot_ of privileges. A container with `--privileged` is not a securely
sandboxed process. Containers in this mode can get a root shell on the host and take control over
the system.

With that little fine-print warning out of the way, I am going to Yolo like a boss:
```
pim@summer:~/src/vpp-containerlab$ docker run --name clab-pim \
--cap-add=NET_ADMIN --cap-add=SYS_NICE --cap-add=SYS_PTRACE \
--device=/dev/net/tun:/dev/net/tun --device=/dev/vhost-net:/dev/vhost-net \
--privileged -v $(pwd)/clab-startup.conf:/etc/vpp/startup.conf:ro \
docker.io/pimvanpelt/vpp-containerlab
clab-pim
```
### Configuring VPP in Docker
And with that, the Docker container is running! I post a screenshot on
[[Mastodon](https://ublog.tech/@IPngNetworks/114392852468494211)] and my buddy John responds with a
polite but firm insistence that I explain myself. Here you go, buddy :)
In another terminal, I can play around with this VPP instance a little bit:
```
pim@summer:~$ docker exec -it clab-pim bash
root@d57c3716eee9:/# ip -br l
lo UNKNOWN 00:00:00:00:00:00 <LOOPBACK,UP,LOWER_UP>
eth0@if530566 UP 02:42:ac:11:00:02 <BROADCAST,MULTICAST,UP,LOWER_UP>
root@d57c3716eee9:/# ps auxw
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 2.2 0.2 17498852 160300 ? Rs 15:11 0:00 /usr/bin/vpp -c /etc/vpp/startup.conf
root 10 0.0 0.0 4192 3388 pts/0 Ss 15:11 0:00 bash
root 18 0.0 0.0 8104 4056 pts/0 R+ 15:12 0:00 ps auxw
root@d57c3716eee9:/# vppctl
_______ _ _ _____ ___
__/ __/ _ \ (_)__ | | / / _ \/ _ \
_/ _// // / / / _ \ | |/ / ___/ ___/
/_/ /____(_)_/\___/ |___/_/ /_/
vpp-clab# show version
vpp v25.02-release built by root on d5cd2c304b7f at 2025-02-26T13:58:32
vpp-clab# show interfaces
Name Idx State MTU (L3/IP4/IP6/MPLS) Counter Count
local0 0 down 0/0/0/0
```
Slick! I can see that the container has an `eth0` device, which Docker has connected to the main
bridged network. For now, there's only one process running: pid 1 proudly shows VPP (as in Docker,
the `CMD` entry simply replaces `init`). Later on, I can imagine running a few more daemons like
SSH and so on, but for now, I'm happy.

Looking at VPP itself, it has no network interfaces yet, except for the default `local0` interface.

### Adding Interfaces in Docker
But if I don't have DPDK, how will I add interfaces? Enter `veth(4)`. From the
[[manpage](https://man7.org/linux/man-pages/man4/veth.4.html)], I learn that veth devices are
virtual Ethernet devices. They can act as tunnels between network namespaces to create a bridge to
a physical network device in another namespace, but can also be used as standalone network devices.
veth devices are always created in interconnected pairs.
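
As a hand-rolled illustration of what the manpage describes (the names and PID below are
placeholders, not something Docker or Containerlab runs verbatim), a veth pair can be created with
iproute2 and one end pushed into another network namespace:
```
# create a pair of interconnected veth devices on the host
sudo ip link add veth-host type veth peer name veth-ctr
# move one end into the container's network namespace (12345 is a placeholder PID)
sudo ip link set veth-ctr netns 12345
# attach the other end to a host bridge and bring it up
sudo ip link set veth-host master br0 up
```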
Of course, Docker users will recognize this. It's like bread and butter for containers to
communicate with one another - and with the host they're running on. I can simply create a Docker
network and attach one half of it to a running container, like so:
```
pim@summer:~$ docker network create --driver=bridge clab-network \
--subnet 192.0.2.0/24 --ipv6 --subnet 2001:db8::/64
5711b95c6c32ac0ed185a54f39e5af4b499677171ff3d00f99497034e09320d2
pim@summer:~$ docker network connect clab-network clab-pim --ip '' --ip6 ''
```
The first command here creates a new network called `clab-network` in Docker. As a result, a new
bridge called `br-5711b95c6c32` shows up on the host. The bridge name is chosen from the UUID of the
Docker object. Seeing as I added an IPv4 and IPv6 subnet to the bridge, it gets configured with the
first address in both:
```
pim@summer:~/src/vpp-containerlab$ brctl show br-5711b95c6c32
bridge name bridge id STP enabled interfaces
br-5711b95c6c32 8000.0242099728c6 no veth021e363
pim@summer:~/src/vpp-containerlab$ ip -br a show dev br-5711b95c6c32
br-5711b95c6c32 UP 192.0.2.1/24 2001:db8::1/64 fe80::42:9ff:fe97:28c6/64 fe80::1/64
```
The second command creates a `veth` pair and puts one half of it in the bridge; this interface is
called `veth021e363` above. The other half pops up as `eth1` in the Docker container:
```
pim@summer:~/src/vpp-containerlab$ docker exec -it clab-pim bash
root@d57c3716eee9:/# ip -br l
lo UNKNOWN 00:00:00:00:00:00 <LOOPBACK,UP,LOWER_UP>
eth0@if530566 UP 02:42:ac:11:00:02 <BROADCAST,MULTICAST,UP,LOWER_UP>
eth1@if530577 UP 02:42:c0:00:02:02 <BROADCAST,MULTICAST,UP,LOWER_UP>
```
One of the many awesome features of VPP is its ability to attach to these `veth` devices by means of
its `af-packet` driver. I first take a look at the Linux
[[manpage](https://man7.org/linux/man-pages/man7/packet.7.html)] for it, and then read up on the VPP
[[documentation](https://fd.io/docs/vpp/v2101/gettingstarted/progressivevpp/interface)] on the
topic. Armed with this knowledge, I can bind the container-side half of the veth pair, called
`eth1`, to VPP, like so:
```
root@d57c3716eee9:/# vppctl
_______ _ _ _____ ___
__/ __/ _ \ (_)__ | | / / _ \/ _ \
_/ _// // / / / _ \ | |/ / ___/ ___/
/_/ /____(_)_/\___/ |___/_/ /_/
vpp-clab# create host-interface v2 name eth1
vpp-clab# set interface name host-eth1 eth1
vpp-clab# set interface mtu 1500 eth1
vpp-clab# set interface ip address eth1 192.0.2.2/24
vpp-clab# set interface ip address eth1 2001:db8::2/64
vpp-clab# set interface state eth1 up
vpp-clab# show int addr
eth1 (up):
L3 192.0.2.2/24
L3 2001:db8::2/64
local0 (dn):
```
## Results
After all this work, I've successfully created a Docker image based on Debian Bookworm and VPP 25.02
(the current stable release version), started a container with it, added a network bridge in Docker,
which binds the host `summer` to the container. Proof, as they say, is in the ping-pudding:
```
pim@summer:~/src/vpp-containerlab$ ping -c5 2001:db8::2
PING 2001:db8::2(2001:db8::2) 56 data bytes
64 bytes from 2001:db8::2: icmp_seq=1 ttl=64 time=0.113 ms
64 bytes from 2001:db8::2: icmp_seq=2 ttl=64 time=0.056 ms
64 bytes from 2001:db8::2: icmp_seq=3 ttl=64 time=0.202 ms
64 bytes from 2001:db8::2: icmp_seq=4 ttl=64 time=0.102 ms
64 bytes from 2001:db8::2: icmp_seq=5 ttl=64 time=0.100 ms
--- 2001:db8::2 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4098ms
rtt min/avg/max/mdev = 0.056/0.114/0.202/0.047 ms
pim@summer:~/src/vpp-containerlab$ ping -c5 192.0.2.2
PING 192.0.2.2 (192.0.2.2) 56(84) bytes of data.
64 bytes from 192.0.2.2: icmp_seq=1 ttl=64 time=0.043 ms
64 bytes from 192.0.2.2: icmp_seq=2 ttl=64 time=0.032 ms
64 bytes from 192.0.2.2: icmp_seq=3 ttl=64 time=0.019 ms
64 bytes from 192.0.2.2: icmp_seq=4 ttl=64 time=0.041 ms
64 bytes from 192.0.2.2: icmp_seq=5 ttl=64 time=0.027 ms
--- 192.0.2.2 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4063ms
rtt min/avg/max/mdev = 0.019/0.032/0.043/0.008 ms
```
## What's Next
This was a nice exercise for me! I'm going in this direction because the
[[Containerlab](https://containerlab.dev)] framework starts containers with given NOS images, not
too dissimilar from the one I just made, and then attaches `veth` pairs between the containers.
I started dabbling with a [[pull-request](https://github.com/srl-labs/containerlab/pull/2569)], but
I got stuck on a part of the Containerlab code that pre-deploys config files into the containers.

You see, I will need to generate two files:
1. A `startup.conf` file that is specific to each Containerlab Docker container. I'd like them to
   each set their own hostname, so that the CLI has a unique prompt. I can do this by setting `unix
   { cli-prompt {{ .ShortName }}# }` in the template renderer.
1. Containerlab will know all of the veth pairs that it plans to create in each VPP container. I'll
   need it to then write a little snippet of config that does the `create host-interface` spiel,
   attaching these `veth` pairs to the VPP dataplane (see the sketch after this list).
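
Something along these lines is what I have in mind for that second snippet; a rough sketch only,
reusing the commands I typed by hand earlier, with placeholder interface names:
```
create host-interface v2 name eth1
set interface name host-eth1 eth1
set interface state eth1 up
create host-interface v2 name eth2
set interface name host-eth2 eth2
set interface state eth2 up
```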
I reached out to Roman from Nokia, who is one of the authors and the current maintainer of
Containerlab. Roman was keen to help out, and seeing as he knows the Containerlab internals well and
I know the VPP stuff well, this is a reasonable partnership! Soon, he and I plan to have a
bare-bones setup that connects a few VPP containers together with an SR Linux node in a lab. Stand
by!

Once we have that, there's still quite some work for me to do. Notably:
* Configuration persistence. `clab` allows you to save the running config. For that, I'll need to
  introduce [[vppcfg](https://github.com/pimvanpelt/vppcfg.git)] and a means to invoke it when the
  lab operator wants to save their config, and then reconfigure VPP when the container restarts
  (a rough sketch of what that could look like follows after this list).
* I'll need to have a few files from `clab` shared with the host, notably the `startup.conf` and
`vppcfg.yaml`, as well as some manual pre- and post-flight configuration for the more esoteric
stuff. Building the plumbing for this is a TODO for now.
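
For the persistence piece, I imagine something along these lines; a sketch only, assuming vppcfg's
`check` and `plan` subcommands, and a file layout that is entirely made up at this point:
```
# validate the saved YAML configuration (path is a placeholder)
vppcfg check -c /etc/vpp/vppcfg.yaml
# turn the YAML into a list of vppctl commands
vppcfg plan -c /etc/vpp/vppcfg.yaml -o /etc/vpp/vppcfg.exec
# replay those commands into the dataplane when the container (re)starts
vppctl exec /etc/vpp/vppcfg.exec
```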
First order of business: get it to ping at all :)
