---
date: "2025-05-03T15:07:23Z"
title: 'VPP in Containerlab - Part 1'
---

{{< image float="right" src="/assets/containerlab/containerlab.svg" alt="Containerlab Logo" width="12em" >}}

# Introduction

From time to time the subject of containerized VPP instances comes up. At IPng, I run the routers in
AS8298 on bare metal (Supermicro and Dell hardware), as it allows me to maximize performance.
However, VPP is quite happy to run virtualized. Notably, it runs really well in virtual machines
under Qemu/KVM or VMWare. I can pass PCI devices from the host directly through to the guest, and
use CPU pinning to give the guest virtual machine access to the underlying physical hardware. In
such a mode, VPP performance is almost the same as on bare metal. But did you know that VPP can
also run in Docker?

The other day I joined the [[ZANOG'25](https://nog.net.za/event1/zanog25/)] in Durban, South Africa.
One of the presenters was Nardus le Roux of Nokia, and he showed off a project called
[[Containerlab](https://containerlab.dev/)], which provides a CLI for orchestrating and managing
container-based networking labs. It starts the containers, builds virtual wiring between them to
create lab topologies of the user's choice, and manages the lab's lifecycle.

Quite regularly I am asked 'when will you add VPP to Containerlab?', but at ZANOG I made a promise
to actually do it. Here I go, on a journey to integrate VPP into Containerlab!

## Containerized VPP

The folks at [[Tigera](https://www.tigera.io/project-calico/)] maintain a project called _Calico_,
which accelerates Kubernetes CNI (Container Network Interface) by using [[FD.io](https://fd.io)]
VPP. Since the origins of Kubernetes are to run containers in a Docker environment, it stands to
reason that it should be possible to run a containerized VPP. I start by reading up on how they
create their Docker image, and I learn a lot.

### Docker Build

Considering IPng runs bare metal Debian (currently Bookworm) machines, my Docker image will be based
on `debian:bookworm` as well. The build starts off quite modest:

```
pim@summer:~$ mkdir -p src/vpp-containerlab
pim@summer:~/src/vpp-containerlab$ cat << 'EOF' > Dockerfile.bookworm
FROM debian:bookworm
ARG DEBIAN_FRONTEND=noninteractive
ARG VPP_INSTALL_SKIP_SYSCTL=true
ARG REPO=release
RUN apt-get update && apt-get -y install curl procps && apt-get clean

# Install VPP
RUN curl -s https://packagecloud.io/install/repositories/fdio/${REPO}/script.deb.sh | bash
RUN apt-get update && apt-get -y install vpp vpp-plugin-core && apt-get clean

CMD ["/usr/bin/vpp","-c","/etc/vpp/startup.conf"]
EOF
pim@summer:~/src/vpp-containerlab$ docker build -f Dockerfile.bookworm . -t pimvanpelt/vpp-containerlab
```

One gotcha: when I install the upstream VPP Debian packages, the post-install script generates a
`sysctl` file and tries to apply it. However, I can't set sysctls in the container, so the build
fails. I take a look at the VPP source code and find `src/pkg/debian/vpp.postinst`, which helpfully
contains a means to skip setting the sysctls, using an environment variable called
`VPP_INSTALL_SKIP_SYSCTL`.
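
For the curious, the relevant bit of the postinst is roughly along these lines (paraphrased from
memory rather than quoted verbatim, so check the source for the exact logic):

```
# Sketch of the check in src/pkg/debian/vpp.postinst (paraphrased, not verbatim).
# The sysctl settings shipped in /etc/sysctl.d/80-vpp.conf are only applied when
# VPP_INSTALL_SKIP_SYSCTL is unset, which is why the Dockerfile exports it as 'true'.
if [ -z "${VPP_INSTALL_SKIP_SYSCTL}" ]; then
  sysctl -p /etc/sysctl.d/80-vpp.conf
fi
```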

### Running VPP in Docker

With the Docker image built, I need to tweak the VPP startup configuration a little bit, to allow it
to run well in a Docker environment. There are a few things I make note of:
1.   We may not have hugepages on the host machine, so I'll set all the page sizes to the
     Linux default of 4kB rather than 2MB or 1GB hugepages. This costs some performance, but
     in the case of Containerlab we're not here to build high performance stuff; rather, users
     will be doing functional testing.
1.   DPDK requires either UIO or VFIO kernel drivers, so that it can bind its so-called _poll mode
     driver_ to the network cards. It also requires huge pages. Since my first version will be
     using only virtual ethernet interfaces, I'll disable DPDK and VFIO altogether.
1.   VPP can run any number of CPU worker threads. In its simplest form, I can also run it with only
     one thread. Of course, this will not be a high performance setup, but since I'm already not
     using hugepages, I'll use only 1 thread.

The VPP `startup.conf` configuration file I came up with:

```
pim@summer:~/src/vpp-containerlab$ cat << 'EOF' > clab-startup.conf
unix {
  interactive
  log /var/log/vpp/vpp.log
  full-coredump
  cli-listen /run/vpp/cli.sock
  cli-prompt vpp-clab#
  cli-no-pager
  poll-sleep-usec 100
}

api-trace {
  on
}

memory {
  main-heap-size 512M
  main-heap-page-size 4k
}
buffers {
  buffers-per-numa 16000
  default data-size 2048
  page-size 4k
}

statseg {
  size 64M
  page-size 4k
  per-node-counters on
}

plugins {
  plugin default { enable }
  plugin dpdk_plugin.so { disable }
}
EOF
```

Just a couple of notes for those who are running VPP in production. Each of the `*-page-size` config
settings takes the normal Linux pagesize of 4kB, which effectively keeps VPP from using any
hugepages. Then, I specifically disable the DPDK plugin, even though I didn't install it in the
Dockerfile build, as it lives in its own dedicated Debian package called `vpp-plugin-dpdk`. Finally,
I make VPP use less CPU by telling it to sleep for 100 microseconds between each poll iteration.
In production environments, VPP will use 100% of the CPUs it's assigned, but in this lab, it will
not be quite as hungry. By the way, even in this sleepy mode, it'll still easily handle a gigabit
of traffic!

Now, VPP wants to run as root and it needs a few host features, notably tuntap devices and vhost,
and a few capabilities, notably NET_ADMIN, SYS_NICE and SYS_PTRACE. I take a look at the
[[manpage](https://man7.org/linux/man-pages/man7/capabilities.7.html)]:
*   ***CAP_SYS_NICE***: allows the process to set real-time scheduling, CPU affinity and I/O
    scheduling class, and to migrate and move memory pages.
*   ***CAP_NET_ADMIN***: allows the process to perform various network-related operations like
    interface configuration, routing tables, nested network namespaces, multicast, promiscuous
    mode, and so on.
*   ***CAP_SYS_PTRACE***: allows the process to trace arbitrary processes using `ptrace(2)`, and a
    few related kernel system calls.

Being a networking dataplane implementation, VPP wants to be able to tinker with network devices.
This is not typically allowed in Docker containers, although the Docker developers did make some
concessions for those containers that need just that little bit more access. They described it in
their
[[docs](https://docs.docker.com/engine/containers/run/#runtime-privilege-and-linux-capabilities)] as
follows:

> The --privileged flag gives all capabilities to the container. When the operator executes docker
> run --privileged, Docker enables access to all devices on the host, and reconfigures AppArmor or
> SELinux to allow the container nearly all the same access to the host as processes running outside
> containers on the host. Use this flag with caution. For more information about the --privileged
> flag, see the docker run reference.

{{< image width="4em" float="left" src="/assets/shared/warning.png" alt="Warning" >}}
At this point, I feel I should point out that running a Docker container with the `--privileged`
flag set does give it _a lot_ of privileges. A container with `--privileged` is not a securely
sandboxed process. Containers in this mode can get a root shell on the host and take control over
the system.

With that little fine-print warning out of the way, I am going to Yolo like a boss:

```
pim@summer:~/src/vpp-containerlab$ docker run --name clab-pim \
                --cap-add=NET_ADMIN --cap-add=SYS_NICE --cap-add=SYS_PTRACE \
                --device=/dev/net/tun:/dev/net/tun --device=/dev/vhost-net:/dev/vhost-net \
                --privileged -v $(pwd)/clab-startup.conf:/etc/vpp/startup.conf:ro \
                docker.io/pimvanpelt/vpp-containerlab
clab-pim
```

### Configuring VPP in Docker

And with that, the Docker container is running! I post a screenshot on
[[Mastodon](https://ublog.tech/@IPngNetworks/114392852468494211)] and my buddy John responds with a
polite but firm insistence that I explain myself. Here you go, buddy :)

In another terminal, I can play around with this VPP instance a little bit:
```
pim@summer:~$ docker exec -it clab-pim bash
root@d57c3716eee9:/# ip -br l
lo               UNKNOWN        00:00:00:00:00:00 <LOOPBACK,UP,LOWER_UP> 
eth0@if530566    UP             02:42:ac:11:00:02 <BROADCAST,MULTICAST,UP,LOWER_UP> 

root@d57c3716eee9:/# ps auxw   
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root           1  2.2  0.2 17498852 160300 ?     Rs   15:11   0:00 /usr/bin/vpp -c /etc/vpp/startup.conf
root          10  0.0  0.0   4192  3388 pts/0    Ss   15:11   0:00 bash
root          18  0.0  0.0   8104  4056 pts/0    R+   15:12   0:00 ps auxw

root@d57c3716eee9:/# vppctl
    _______    _        _   _____  ___ 
 __/ __/ _ \  (_)__    | | / / _ \/ _ \
 _/ _// // / / / _ \   | |/ / ___/ ___/
 /_/ /____(_)_/\___/   |___/_/  /_/    

vpp-clab# show version
vpp v25.02-release built by root on d5cd2c304b7f at 2025-02-26T13:58:32
vpp-clab# show interfaces
              Name               Idx    State  MTU (L3/IP4/IP6/MPLS)     Counter          Count     
local0                            0     down          0/0/0/0       
```

Slick! I can see that the container has an `eth0` device, which Docker has connected to the main
bridged network. For now, there's only one process running, and pid 1 proudly shows VPP (as in
Docker, the `CMD` field simply replaces `init`). Later on, I can imagine running a few more daemons
like SSH and so on, but for now, I'm happy.

Looking at VPP itself, it has no network interfaces yet, except for the default `local0` interface.

### Adding Interfaces in Docker

But if I don't have DPDK, how will I add interfaces? Enter `veth(4)`. From the
[[manpage](https://man7.org/linux/man-pages/man4/veth.4.html)], I learn that veth devices are
virtual Ethernet devices.  They can act as tunnels between network namespaces to create a bridge to
a physical network device in another namespace, but can also be used as standalone network devices.
veth devices are always created in interconnected pairs.

Of course, Docker users will recognize this. It's like bread and butter for containers to
communicate with one another - and with the host they're running on. I can simply create a Docker
network and attach one half of it to a running container, like so:

```
pim@summer:~$ docker network create --driver=bridge clab-network \
                     --subnet 192.0.2.0/24 --ipv6 --subnet 2001:db8::/64
5711b95c6c32ac0ed185a54f39e5af4b499677171ff3d00f99497034e09320d2
pim@summer:~$ docker network connect clab-network clab-pim --ip '' --ip6 ''
```

The first command here creates a new network called `clab-network` in Docker. As a result, a new
bridge called `br-5711b95c6c32` shows up on the host. The bridge name is chosen from the UUID of the
Docker object. Seeing as I added an IPv4 and IPv6 subnet to the bridge, it gets configured with the
first address in both:

```
pim@summer:~/src/vpp-containerlab$ brctl show br-5711b95c6c32
bridge name       bridge id               STP enabled     interfaces
br-5711b95c6c32   8000.0242099728c6       no              veth021e363


pim@summer:~/src/vpp-containerlab$ ip -br a show dev br-5711b95c6c32
br-5711b95c6c32  UP     192.0.2.1/24 2001:db8::1/64 fe80::42:9ff:fe97:28c6/64 fe80::1/64 
```

The second command creates a `veth` pair and puts one half of it in the bridge; this is the
interface called `veth021e363` above. The other half pops up as `eth1` in the Docker container:

```
pim@summer:~/src/vpp-containerlab$ docker exec -it clab-pim bash
root@d57c3716eee9:/# ip -br l
lo               UNKNOWN        00:00:00:00:00:00 <LOOPBACK,UP,LOWER_UP> 
eth0@if530566    UP             02:42:ac:11:00:02 <BROADCAST,MULTICAST,UP,LOWER_UP> 
eth1@if530577    UP             02:42:c0:00:02:02 <BROADCAST,MULTICAST,UP,LOWER_UP> 
```

One of the many awesome features of VPP is its ability to attach to these `veth` devices by means of
its `af-packet` driver, by reusing the same MAC address (in this case `02:42:c0:00:02:02`). I first
take a look at the linux [[manpage](https://man7.org/linux/man-pages/man7/packet.7.html)] for it,
and then read up on the VPP
[[documentation](https://fd.io/docs/vpp/v2101/gettingstarted/progressivevpp/interface)] on the
topic.


However, my attention is drawn to Docker assigning an IPv4 and IPv6 address to the container:
```
root@d57c3716eee9:/# ip -br a
lo               UNKNOWN        127.0.0.1/8 ::1/128
eth0@if530566    UP             172.17.0.2/16
eth1@if530577    UP             192.0.2.2/24 2001:db8::2/64 fe80::42:c0ff:fe00:202/64
root@d57c3716eee9:/# ip addr del 192.0.2.2/24  dev eth1
root@d57c3716eee9:/# ip addr del 2001:db8::2/64 dev eth1
```

I decide to remove them from here, as in the end, `eth1` will be owned by VPP, so _it_ should be
setting the IPv4 and IPv6 addresses. For the life of me, I don't see how I can keep Docker from
assigning IPv4 and IPv6 addresses to this container ... and the
[[docs](https://docs.docker.com/engine/network/)] seem to be off as well, as they suggest I can pass
a flag `--ipv4=False`, but that flag doesn't exist, at least not on my Bookworm Docker variant. I
make a mental note to discuss this with the folks in the Containerlab community.
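
Until then, the pragmatic move is simply to strip whatever Docker handed out right after attaching
the network. The two deletions above can be collapsed into a single command from the host, which
leaves the link-local address alone (just a convenience, not something I've wired into the image):

```
pim@summer:~$ docker exec clab-pim ip addr flush dev eth1 scope global
```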


Anyway, armed with this knowledge I can bind the container-side veth pair called `eth1` to VPP, like
so:

```
root@d57c3716eee9:/# vppctl
    _______    _        _   _____  ___ 
 __/ __/ _ \  (_)__    | | / / _ \/ _ \
 _/ _// // / / / _ \   | |/ / ___/ ___/
 /_/ /____(_)_/\___/   |___/_/  /_/    

vpp-clab# create host-interface name eth1 hw-addr 02:42:c0:00:02:02
vpp-clab# set interface name host-eth1 eth1
vpp-clab# set interface mtu 1500 eth1
vpp-clab# set interface ip address eth1 192.0.2.2/24
vpp-clab# set interface ip address eth1 2001:db8::2/64
vpp-clab# set interface state eth1 up
vpp-clab# show int addr
eth1 (up):
  L3 192.0.2.2/24
  L3 2001:db8::2/64
local0 (dn):
```

## Results

After all this work, I've successfully created a Docker image based on Debian Bookworm and VPP 25.02
(the current stable release), started a container with it, and added a Docker network bridge which
connects the host `summer` to the container. Proof, as they say, is in the ping-pudding:

```
pim@summer:~/src/vpp-containerlab$ ping -c5 2001:db8::2
PING 2001:db8::2(2001:db8::2) 56 data bytes
64 bytes from 2001:db8::2: icmp_seq=1 ttl=64 time=0.113 ms
64 bytes from 2001:db8::2: icmp_seq=2 ttl=64 time=0.056 ms
64 bytes from 2001:db8::2: icmp_seq=3 ttl=64 time=0.202 ms
64 bytes from 2001:db8::2: icmp_seq=4 ttl=64 time=0.102 ms
64 bytes from 2001:db8::2: icmp_seq=5 ttl=64 time=0.100 ms

--- 2001:db8::2 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4098ms
rtt min/avg/max/mdev = 0.056/0.114/0.202/0.047 ms
pim@summer:~/src/vpp-containerlab$ ping -c5 192.0.2.2
PING 192.0.2.2 (192.0.2.2) 56(84) bytes of data.
64 bytes from 192.0.2.2: icmp_seq=1 ttl=64 time=0.043 ms
64 bytes from 192.0.2.2: icmp_seq=2 ttl=64 time=0.032 ms
64 bytes from 192.0.2.2: icmp_seq=3 ttl=64 time=0.019 ms
64 bytes from 192.0.2.2: icmp_seq=4 ttl=64 time=0.041 ms
64 bytes from 192.0.2.2: icmp_seq=5 ttl=64 time=0.027 ms

--- 192.0.2.2 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4063ms
rtt min/avg/max/mdev = 0.019/0.032/0.043/0.008 ms
```

And in case that simple ping-test wasn't enough to get you excited, here's a packet trace from VPP
itself, while I'm performing this ping:

```
vpp-clab# trace add af-packet-input 100
vpp-clab# wait 3
vpp-clab# show trace
------------------- Start of thread 0 vpp_main -------------------
Packet 1

00:07:03:979275: af-packet-input
  af_packet: hw_if_index 1 rx-queue 0 next-index 4
    block 47:
      address 0x7fbf23b7d000 version 2 seq_num 48 pkt_num 0
    tpacket3_hdr:
      status 0x20000001 len 98 snaplen 98 mac 92 net 106
      sec 0x68164381 nsec 0x258e7659 vlan 0 vlan_tpid 0
    vnet-hdr:
      flags 0x00 gso_type 0x00 hdr_len 0
      gso_size 0 csum_start 0 csum_offset 0
00:07:03:979293: ethernet-input
  IP4: 02:42:09:97:28:c6 -> 02:42:c0:00:02:02
00:07:03:979306: ip4-input
  ICMP: 192.0.2.1 -> 192.0.2.2
    tos 0x00, ttl 64, length 84, checksum 0x5e92 dscp CS0 ecn NON_ECN
    fragment id 0x5813, flags DONT_FRAGMENT
  ICMP echo_request checksum 0xc16 id 21197
00:07:03:979315: ip4-lookup
  fib 0 dpo-idx 9 flow hash: 0x00000000
  ICMP: 192.0.2.1 -> 192.0.2.2
    tos 0x00, ttl 64, length 84, checksum 0x5e92 dscp CS0 ecn NON_ECN
    fragment id 0x5813, flags DONT_FRAGMENT
  ICMP echo_request checksum 0xc16 id 21197
00:07:03:979322: ip4-receive
    fib:0 adj:9 flow:0x00000000
  ICMP: 192.0.2.1 -> 192.0.2.2
    tos 0x00, ttl 64, length 84, checksum 0x5e92 dscp CS0 ecn NON_ECN
    fragment id 0x5813, flags DONT_FRAGMENT
  ICMP echo_request checksum 0xc16 id 21197
00:07:03:979323: ip4-icmp-input
  ICMP: 192.0.2.1 -> 192.0.2.2
    tos 0x00, ttl 64, length 84, checksum 0x5e92 dscp CS0 ecn NON_ECN
    fragment id 0x5813, flags DONT_FRAGMENT
  ICMP echo_request checksum 0xc16 id 21197
00:07:03:979323: ip4-icmp-echo-request
  ICMP: 192.0.2.1 -> 192.0.2.2
    tos 0x00, ttl 64, length 84, checksum 0x5e92 dscp CS0 ecn NON_ECN
    fragment id 0x5813, flags DONT_FRAGMENT
  ICMP echo_request checksum 0xc16 id 21197
00:07:03:979326: ip4-load-balance
  fib 0 dpo-idx 5 flow hash: 0x00000000
  ICMP: 192.0.2.2 -> 192.0.2.1
    tos 0x00, ttl 64, length 84, checksum 0x88e1 dscp CS0 ecn NON_ECN
    fragment id 0x2dc4, flags DONT_FRAGMENT
  ICMP echo_reply checksum 0x1416 id 21197
00:07:03:979325: ip4-rewrite
  tx_sw_if_index 1 dpo-idx 5 : ipv4 via 192.0.2.1 eth1: mtu:1500 next:3 flags:[] 0242099728c60242c00002020800 flow hash: 0x00000000
  00000000: 0242099728c60242c00002020800450000542dc44000400188e1c0000202c000
  00000020: 02010000141652cd00018143166800000000399d0900000000001011
00:07:03:979326: eth1-output
  eth1 flags 0x02180005
  IP4: 02:42:c0:00:02:02 -> 02:42:09:97:28:c6
  ICMP: 192.0.2.2 -> 192.0.2.1
    tos 0x00, ttl 64, length 84, checksum 0x88e1 dscp CS0 ecn NON_ECN
    fragment id 0x2dc4, flags DONT_FRAGMENT
  ICMP echo_reply checksum 0x1416 id 21197
00:07:03:979327: eth1-tx
  af_packet: hw_if_index 1 tx-queue 0
    tpacket3_hdr:
      status 0x1 len 108 snaplen 108 mac 0 net 0
      sec 0x0 nsec 0x0 vlan 0 vlan_tpid 0
    vnet-hdr:
      flags 0x00 gso_type 0x00 hdr_len 0
      gso_size 0 csum_start 0 csum_offset 0
    buffer 0xf97c4:
      current data 0, length 98, buffer-pool 0, ref-count 1, trace handle 0x0
      local l2-hdr-offset 0 l3-hdr-offset 14 
    IP4: 02:42:c0:00:02:02 -> 02:42:09:97:28:c6
    ICMP: 192.0.2.2 -> 192.0.2.1
      tos 0x00, ttl 64, length 84, checksum 0x88e1 dscp CS0 ecn NON_ECN
      fragment id 0x2dc4, flags DONT_FRAGMENT
    ICMP echo_reply checksum 0x1416 id 21197
```

Well, that's a mouthful, isn't it! Here, I get to show you VPP in action. After receiving the
packet on its `af-packet-input` node from 192.0.2.1 (Summer, who is pinging us) to 192.0.2.2 (the
VPP container), the packet traverses the dataplane graph. It goes through `ethernet-input`, then
`ip4-input` and `ip4-lookup`, which sees that it's destined to a locally configured IPv4 address, so
the packet is handed to `ip4-receive`. That one sees that the IP protocol is ICMP, so it hands the
packet to `ip4-icmp-input`, which notices that the packet is an ICMP echo request, so off to
`ip4-icmp-echo-request` our little packet goes. The ICMP plugin in VPP now answers by
`ip4-rewrite`'ing the packet, sending the reply to 192.0.2.1 at MAC address `02:42:09:97:28:c6`
(this is Summer, the host doing the pinging!), after which the newly created ICMP echo-reply is
handed to `eth1-output`, which marshals it back into the kernel's AF_PACKET interface using
`eth1-tx`.

Boom. I could not be more pleased.

## What's Next

This was a nice exercise for me! I'm going in this direction because the
[[Containerlab](https://containerlab.dev)] framework starts containers with given NOS images,
not too dissimilar from the one I just made, and then attaches `veth` pairs between the containers.
I started dabbling with a [[pull-request](https://github.com/srl-labs/containerlab/pull/2571)], but
I got stuck with a part of the Containerlab code that pre-deploys config files into the containers.
You see, I will need to generate two files:

1.   A `startup.conf` file that is specific to the containerlab Docker container. I'd like them to
     each set their own hostname so that the CLI has a unique prompt. I can do this by setting `unix
     { cli-prompt {{ .ShortName }}# }` in the template renderer.
1.   Containerlab will know all of the `veth` pairs that it plans to create for each VPP
     container. I'll need it to then write a little snippet of config that does the `create
     host-interface` spiel, to attach these `veth` pairs to the VPP dataplane (see the sketch
     right below).
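
For a container with a single `veth` called `eth1`, such a generated snippet might look something
like the commands I typed by hand earlier (illustrative only; the exact template output is still
being hashed out in the pull-request):

```
create host-interface name eth1 hw-addr 02:42:c0:00:02:02
set interface name host-eth1 eth1
set interface mtu 1500 eth1
set interface state eth1 up
```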

I reached out to Roman from Nokia, who is one of the authors and current maintainer of Containerlab.
Roman was keen to help out, and seeing as he knows the Containerlab stuff well, and I know the VPP
stuff well, this is a reasonable partnership! Soon, he and I plan to have a bare-bones setup that
will connect a few VPP containers together with an SR Linux node in a lab. Stand by!

Once we have that, there's still quite some work for me to do. Notably:
*    Configuration persistence. `clab` allows you to save the running config. For that, I'll need to
     introduce [[vppcfg](https://git.ipng.ch/ipng/vppcfg)] and a means to invoke it when
     the lab operator wants to save their config, and then reconfigure VPP when the container
     restarts.
*    I'll need a few files to be shared between `clab` and the container, notably the `startup.conf`
     and `vppcfg.yaml`, as well as some manual pre- and post-flight configuration for the more
     esoteric stuff. Building the plumbing for this is a TODO for now; a sketch of what that could
     look like follows below.
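
To make that last point a bit more concrete: in a Containerlab topology file, I imagine the VPP
node definition growing a few bind mounts, something like this (illustrative only; the node kind,
image name and file layout are exactly the things Roman and I still need to settle):

```
topology:
  nodes:
    vpp1:
      kind: linux
      image: pimvanpelt/vpp-containerlab
      binds:
        - vpp1/startup.conf:/etc/vpp/startup.conf:ro
        - vpp1/vppcfg.yaml:/etc/vpp/vppcfg.yaml
```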

## Acknowledgements

I wanted to give a shout-out to Nardus le Roux who inspired me to contribute this Containerlab VPP
node type, and to Roman Dodin for his help getting the Containerlab parts squared away when I got a
little bit stuck.

First order of business: get it to ping at all ... it'll go faster from there on out :)