Merge branch 'main' of git.ipng.ch:ipng/ipng.ch
All checks were successful
continuous-integration/drone/push Build is passing
464
content/articles/2025-05-03-containerlab-1.md
Normal file
@ -0,0 +1,464 @@
---
date: "2025-05-03T15:07:23Z"
title: 'VPP in Containerlab - Part 1'
---

{{< image float="right" src="/assets/containerlab/containerlab.svg" alt="Containerlab Logo" width="12em" >}}

# Introduction

From time to time the subject of containerized VPP instances comes up. At IPng, I run the routers in
AS8298 on bare metal (Supermicro and Dell hardware), as it allows me to maximize performance.
However, VPP is quite friendly in virtualization. Notably, it runs really well in virtual machines
on hypervisors like QEMU/KVM or VMware. I can pass through PCI devices directly to the guest, and use
CPU pinning to give the guest virtual machine access to the underlying physical hardware. In such a
mode, VPP performance is almost the same as on bare metal. But did you know that VPP can also run in
Docker?

The other day I joined the [[ZANOG'25](https://nog.net.za/event1/zanog25/)] in Durban, South Africa.
One of the presenters was Nardus le Roux of Nokia, and he showed off a project called
[[Containerlab](https://containerlab.dev/)], which provides a CLI for orchestrating and managing
container-based networking labs. It starts the containers, builds virtual wiring between them to
create lab topologies of the user's choice, and manages the lab lifecycle.

Quite regularly I am asked 'when will you add VPP to Containerlab?', but at ZANOG I made a promise
to actually add it. Here I go, on a journey to integrate VPP into Containerlab!

## Containerized VPP

The folks at [[Tigera](https://www.tigera.io/project-calico/)] maintain a project called _Calico_,
which accelerates Kubernetes CNI (Container Network Interface) by using [[FD.io](https://fd.io)]
VPP. Since the origins of Kubernetes are to run containers in a Docker environment, it stands to
reason that it should be possible to run a containerized VPP. I start by reading up on how they
create their Docker image, and I learn a lot.

### Docker Build

Considering IPng runs bare metal Debian (currently Bookworm) machines, my Docker image will be based
on `debian:bookworm` as well. The build starts off quite modest:

```
pim@summer:~$ mkdir -p src/vpp-containerlab
pim@summer:~/src/vpp-containerlab$ cat << 'EOF' > Dockerfile.bookworm
FROM debian:bookworm
ARG DEBIAN_FRONTEND=noninteractive
ARG VPP_INSTALL_SKIP_SYSCTL=true
ARG REPO=release
RUN apt-get update && apt-get -y install curl procps && apt-get clean

# Install VPP
RUN curl -s https://packagecloud.io/install/repositories/fdio/${REPO}/script.deb.sh | bash
RUN apt-get update && apt-get -y install vpp vpp-plugin-core && apt-get clean

CMD ["/usr/bin/vpp","-c","/etc/vpp/startup.conf"]
EOF
pim@summer:~/src/vpp-containerlab$ docker build -f Dockerfile.bookworm . -t pimvanpelt/vpp-containerlab
```

One gotcha: when I install the upstream VPP Debian packages, the post-install script generates a
`sysctl` file and then tries to apply it. However, I can't set sysctls in the container, so the
build fails. I take a look at the VPP source code and find `src/pkg/debian/vpp.postinst`, which
helpfully contains a means to skip setting the sysctls, using an environment variable called
`VPP_INSTALL_SKIP_SYSCTL`.

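The guard can be pictured roughly as follows. This is a minimal sketch of the general pattern and
not a copy of `vpp.postinst`: the function name `apply_sysctl_unless_skipped` is hypothetical, and
it only prints what it would do rather than touching any kernel settings.

```sh
#!/bin/sh
# Illustrative only: the real logic lives in src/pkg/debian/vpp.postinst.
# apply_sysctl_unless_skipped is a hypothetical name used for this sketch.
apply_sysctl_unless_skipped() {
    if [ -n "${VPP_INSTALL_SKIP_SYSCTL:-}" ]; then
        # Assumption: any non-empty value counts as "skip" in this sketch.
        echo "skip"
    else
        # A real post-install script would apply the sysctl settings here.
        echo "apply"
    fi
}

# Run in a subshell so the variable does not leak into the caller.
( VPP_INSTALL_SKIP_SYSCTL=true; apply_sysctl_unless_skipped )
```

In the Dockerfile, the `ARG VPP_INSTALL_SKIP_SYSCTL=true` line is what makes the variable visible
in the environment of the `RUN apt-get ... install vpp` step.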
### Running VPP in Docker

With the Docker image built, I need to tweak the VPP startup configuration a little bit, to allow it
to run well in a Docker environment. There are a few things I make note of:

1. We may not have huge pages on the host machine, so I'll set all the page sizes to the
   Linux-default 4kB rather than 2MB or 1GB hugepages. This creates a performance regression, but
   in the case of Containerlab, we're not here to build high performance stuff, but rather users
   will be doing functional testing.
1. DPDK requires either UIO or VFIO kernel drivers, so that it can bind its so-called _poll mode
   driver_ to the network cards. It also requires huge pages. Since my first version will be
   using only virtual ethernet interfaces, I'll disable DPDK and VFIO altogether.
1. VPP can run any number of CPU worker threads. In its simplest form, I can also run it with only
   one thread. Of course, this will not be a high performance setup, but since I'm already not
   using hugepages, I'll use only 1 thread.

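To see whether a given host has hugepages configured at all, a quick look at `/proc/meminfo` is
enough. The helper below is a small sketch: the function name `hugepages_total` is made up for this
example, and it takes the file as an optional argument so it can be tried against a saved copy.

```sh
#!/bin/sh
# hugepages_total (hypothetical helper): print the HugePages_Total value
# from a meminfo-style file, or 0 if the field is absent.
hugepages_total() {
    awk '$1 == "HugePages_Total:" { n = $2 } END { print n + 0 }' "${1:-/proc/meminfo}"
}

# On a Linux host this reads the live value; the check keeps the snippet
# harmless on systems without /proc/meminfo.
if [ -r /proc/meminfo ]; then
    echo "hugepages configured: $(hugepages_total)"
fi
```

A result of 0 means there are no hugepages to allocate from, which is the situation the 4kB page
settings are meant to cope with.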
The VPP `startup.conf` configuration file I came up with:

```
pim@summer:~/src/vpp-containerlab$ cat << EOF > clab-startup.conf
unix {
  interactive
  log /var/log/vpp/vpp.log
  full-coredump
  cli-listen /run/vpp/cli.sock
  cli-prompt vpp-clab#
  cli-no-pager
  poll-sleep-usec 100
}

api-trace {
  on
}

memory {
  main-heap-size 512M
  main-heap-page-size 4k
}
buffers {
  buffers-per-numa 16000
  default data-size 2048
  page-size 4k
}

statseg {
  size 64M
  page-size 4k
  per-node-counters on
}

plugins {
  plugin default { enable }
  plugin dpdk_plugin.so { disable }
}
EOF
```

Just a couple of notes for those who are running VPP in production. Each of the `*-page-size` config
settings takes the normal Linux page size of 4kB, which effectively prevents VPP from using any
hugepages. Then, I'll explicitly disable the DPDK plugin, although I didn't install it in the
Dockerfile build, as it lives in its own dedicated Debian package called `vpp-plugin-dpdk`. Finally,
I'll make VPP use less CPU by telling it to sleep for 100 microseconds between each poll iteration.
In production environments, VPP will use 100% of the CPUs it's assigned, but in this lab, it will
not be quite as hungry. By the way, even in this sleepy mode, it'll still easily handle a gigabit
of traffic!

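A rough back-of-the-envelope check makes the gigabit claim plausible. The sketch below assumes that
a 100 microsecond sleep caps VPP at roughly 10,000 poll iterations per second (ignoring the time
spent processing packets, so the real number is lower), and computes how many packets arrive per
iteration at 1 Gbit/s. The 20 bytes of per-frame overhead are the Ethernet preamble and inter-frame
gap. VPP's usual vector size of 256 packets serves only as a yardstick here, and the function name
is hypothetical.

```sh
#!/bin/sh
# pkts_per_poll (hypothetical helper): packets arriving per poll iteration
# at 1 Gbit/s for a given Ethernet frame size in bytes (including FCS).
pkts_per_poll() {
    frame_bytes=$1
    polls_per_sec=10000                        # assumption: one poll per 100us sleep
    wire_bits=$(( (frame_bytes + 20) * 8 ))    # add preamble and inter-frame gap
    pps=$(( 1000000000 / wire_bits ))
    echo "$frame_bytes-byte frames: $pps pps, about $(( pps / polls_per_sec )) per poll"
}

pkts_per_poll 1518   # full-size frames
pkts_per_poll 64     # minimum-size frames, the worst case
```

Under these assumptions even minimum-size frames stay below 256 packets per iteration, which fits
the observation that a sleepy VPP keeps up with a gigabit.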
Now, VPP wants to run as root and it needs a few host features, notably tuntap devices and vhost,
and a few capabilities, notably NET_ADMIN, SYS_NICE and SYS_PTRACE. I take a look at the
[[manpage](https://man7.org/linux/man-pages/man7/capabilities.7.html)]:

* ***CAP_SYS_NICE***: allows setting real-time scheduling, CPU affinity, I/O scheduling class, and
  migrating and moving memory pages.
* ***CAP_NET_ADMIN***: allows performing various network-related operations like interface
  configs, routing tables, nested network namespaces, multicast, set promiscuous mode, and so on.
* ***CAP_SYS_PTRACE***: allows tracing arbitrary processes using `ptrace(2)`, and a few related
  kernel system calls.

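Whether a process actually holds these capabilities can be read from the `CapEff` bitmask in
`/proc/<pid>/status`. The sketch below decodes such a mask using the bit numbers from
`linux/capability.h` (`CAP_NET_ADMIN` is 12, `CAP_SYS_PTRACE` is 19, `CAP_SYS_NICE` is 23). The
helper names and the two sample masks are assumptions for illustration: `00000000a80425fb` is what
an unprivileged Docker container typically reports, and `000001ffffffffff` is a full set on a
recent kernel.

```sh
#!/bin/sh
# has_cap MASK BIT (hypothetical helper): succeed if bit BIT is set in the
# hexadecimal capability mask MASK, as found in the CapEff line of
# /proc/<pid>/status.
has_cap() {
    [ $(( (0x$1 >> $2) & 1 )) -eq 1 ]
}

# report_caps MASK (hypothetical helper): report the three capabilities
# discussed in the text for a given mask.
report_caps() {
    for entry in NET_ADMIN:12 SYS_PTRACE:19 SYS_NICE:23; do
        name=${entry%%:*}
        bit=${entry##*:}
        if has_cap "$1" "$bit"; then
            echo "CAP_$name yes"
        else
            echo "CAP_$name no"
        fi
    done
}

report_caps 00000000a80425fb   # sample default container mask
report_caps 000001ffffffffff   # sample full capability mask
```

Inside a running container, the live mask can be obtained with `grep CapEff /proc/1/status`.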
Being a networking dataplane implementation, VPP wants to be able to tinker with network devices.
This is not typically allowed in Docker containers, although the Docker developers did make some
concessions for those containers that need just that little bit more access. They described it in
their
[[docs](https://docs.docker.com/engine/containers/run/#runtime-privilege-and-linux-capabilities)] as
follows:

| The --privileged flag gives all capabilities to the container. When the operator executes docker
| run --privileged, Docker enables access to all devices on the host, and reconfigures AppArmor or
| SELinux to allow the container nearly all the same access to the host as processes running outside
| containers on the host. Use this flag with caution. For more information about the --privileged
| flag, see the docker run reference.

{{< image width="4em" float="left" src="/assets/shared/warning.png" alt="Warning" >}}
At this point, I feel I should point out that running a Docker container with the `--privileged`
flag set does give it _a lot_ of privileges. A container with `--privileged` is not a securely
sandboxed process. Containers in this mode can get a root shell on the host and take control over
the system.

With that little fine-print warning out of the way, I am going to Yolo like a boss:

```
pim@summer:~/src/vpp-containerlab$ docker run --name clab-pim \
    --cap-add=NET_ADMIN --cap-add=SYS_NICE --cap-add=SYS_PTRACE \
    --device=/dev/net/tun:/dev/net/tun --device=/dev/vhost-net:/dev/vhost-net \
    --privileged -v $(pwd)/clab-startup.conf:/etc/vpp/startup.conf:ro \
    docker.io/pimvanpelt/vpp-containerlab
clab-pim
```

### Configuring VPP in Docker

And with that, the Docker container is running! I post a screenshot on
[[Mastodon](https://ublog.tech/@IPngNetworks/114392852468494211)] and my buddy John responds with a
polite but firm insistence that I explain myself. Here you go, buddy :)

In another terminal, I can play around with this VPP instance a little bit:
```
pim@summer:~$ docker exec -it clab-pim bash
root@d57c3716eee9:/# ip -br l
lo               UNKNOWN        00:00:00:00:00:00 <LOOPBACK,UP,LOWER_UP>
eth0@if530566    UP             02:42:ac:11:00:02 <BROADCAST,MULTICAST,UP,LOWER_UP>

root@d57c3716eee9:/# ps auxw
USER         PID %CPU %MEM      VSZ    RSS TTY      STAT START   TIME COMMAND
root           1  2.2  0.2 17498852 160300 ?        Rs   15:11   0:00 /usr/bin/vpp -c /etc/vpp/startup.conf
root          10  0.0  0.0     4192   3388 pts/0    Ss   15:11   0:00 bash
root          18  0.0  0.0     8104   4056 pts/0    R+   15:12   0:00 ps auxw

root@d57c3716eee9:/# vppctl
    _______    _        _   _____  ___
 __/ __/ _ \  (_)__    | | / / _ \/ _ \
 _/ _// // / / / _ \   | |/ / ___/ ___/
 /_/ /____(_)_/\___/   |___/_/  /_/

vpp-clab# show version
vpp v25.02-release built by root on d5cd2c304b7f at 2025-02-26T13:58:32
vpp-clab# show interfaces
              Name               Idx    State  MTU (L3/IP4/IP6/MPLS)     Counter          Count
local0                            0     down          0/0/0/0
```

Slick! I can see that the container has an `eth0` device, which Docker has connected to the main
bridged network. For now, there's only one process running: PID 1 proudly shows VPP (as in Docker,
the `CMD` field will simply replace `init`). Later on, I can imagine running a few more daemons like
SSH and so on, but for now, I'm happy.

Looking at VPP itself, it has no network interfaces yet, except for the default `local0` interface.

### Adding Interfaces in Docker

But if I don't have DPDK, how will I add interfaces? Enter `veth(4)`. From the
[[manpage](https://man7.org/linux/man-pages/man4/veth.4.html)], I learn that veth devices are
virtual Ethernet devices. They can act as tunnels between network namespaces to create a bridge to
a physical network device in another namespace, but can also be used as standalone network devices.
veth devices are always created in interconnected pairs.

Of course, Docker users will recognize this. It's like bread and butter for containers to
communicate with one another - and with the host they're running on. I can simply create a Docker
network and attach one half of it to a running container, like so:

```
pim@summer:~$ docker network create --driver=bridge clab-network \
    --subnet 192.0.2.0/24 --ipv6 --subnet 2001:db8::/64
5711b95c6c32ac0ed185a54f39e5af4b499677171ff3d00f99497034e09320d2
pim@summer:~$ docker network connect clab-network clab-pim --ip '' --ip6 ''
```

The first command here creates a new network called `clab-network` in Docker. As a result, a new
bridge called `br-5711b95c6c32` shows up on the host. The bridge name is derived from the first
twelve characters of the Docker network ID. Seeing as I added an IPv4 and IPv6 subnet to the
network, the bridge gets configured with the first address in both:

```
pim@summer:~/src/vpp-containerlab$ brctl show br-5711b95c6c32
bridge name       bridge id           STP enabled   interfaces
br-5711b95c6c32   8000.0242099728c6   no            veth021e363


pim@summer:~/src/vpp-containerlab$ ip -br a show dev br-5711b95c6c32
br-5711b95c6c32  UP             192.0.2.1/24 2001:db8::1/64 fe80::42:9ff:fe97:28c6/64 fe80::1/64
```

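Two details in this output follow simple rules that are easy to reproduce. The bridge name is `br-`
followed by the first twelve characters of the network ID, and the `fe80::42:9ff:fe97:28c6`
link-local address is the modified EUI-64 form (RFC 4291) of the bridge MAC address
`02:42:09:97:28:c6`. The sketch below shows both; the function names are made up for this example.

```sh
#!/bin/sh
# bridge_name ID (hypothetical helper): "br-" plus the first 12 characters
# of a Docker network ID.
bridge_name() {
    printf 'br-%.12s\n' "$1"
}

# mac_to_linklocal MAC (hypothetical helper): modified EUI-64 link-local
# address for a MAC written as aa:bb:cc:dd:ee:ff. Flip the universal/local
# bit of the first octet and insert ff:fe in the middle.
mac_to_linklocal() {
    old_ifs=$IFS
    IFS=:
    set -- $1            # split the MAC into six positional parameters
    IFS=$old_ifs
    printf 'fe80::%x:%x:%x:%x\n' \
        $(( ((0x$1 ^ 0x02) << 8) | 0x$2 )) \
        $(( (0x$3 << 8) | 0xff )) \
        $(( 0xfe00 | 0x$4 )) \
        $(( (0x$5 << 8) | 0x$6 ))
}

bridge_name 5711b95c6c32ac0ed185a54f39e5af4b499677171ff3d00f99497034e09320d2
mac_to_linklocal 02:42:09:97:28:c6
```

The sketch does not compress runs of zeroes, so its output is not always the canonical text form,
but for the addresses in this article it matches what `ip` prints.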
The second command creates a `veth` pair and puts one half of it in the bridge; this is the
interface called `veth021e363` above. The other half of it pops up as `eth1` in the Docker container:

```
pim@summer:~/src/vpp-containerlab$ docker exec -it clab-pim bash
root@d57c3716eee9:/# ip -br l
lo               UNKNOWN        00:00:00:00:00:00 <LOOPBACK,UP,LOWER_UP>
eth0@if530566    UP             02:42:ac:11:00:02 <BROADCAST,MULTICAST,UP,LOWER_UP>
eth1@if530577    UP             02:42:c0:00:02:02 <BROADCAST,MULTICAST,UP,LOWER_UP>
```

One of the many awesome features of VPP is its ability to attach to these `veth` devices by means of
its `af-packet` driver, by reusing the same MAC address (in this case `02:42:c0:00:02:02`). I first
take a look at the Linux [[manpage](https://man7.org/linux/man-pages/man7/packet.7.html)] for it,
and then read up on the VPP
[[documentation](https://fd.io/docs/vpp/v2101/gettingstarted/progressivevpp/interface)] on the
topic.

However, my attention is drawn to Docker assigning an IPv4 and IPv6 address to the container:
```
root@d57c3716eee9:/# ip -br a
lo               UNKNOWN        127.0.0.1/8 ::1/128
eth0@if530566    UP             172.17.0.2/16
eth1@if530577    UP             192.0.2.2/24 2001:db8::2/64 fe80::42:c0ff:fe00:202/64
root@d57c3716eee9:/# ip addr del 192.0.2.2/24 dev eth1
root@d57c3716eee9:/# ip addr del 2001:db8::2/64 dev eth1
```

I decide to remove them from here, as in the end, `eth1` will be owned by VPP so _it_ should be
setting the IPv4 and IPv6 addresses. For the life of me, I don't see how I can prevent Docker from
assigning IPv4 and IPv6 addresses to this container ... and the
[[docs](https://docs.docker.com/engine/network/)] seem to be off as well, as they suggest I can pass
a flag `--ipv4=False` but that flag doesn't exist, at least not on my Bookworm Docker variant. I
make a mental note to discuss this with the folks in the Containerlab community.

Anyway, armed with this knowledge I can bind `eth1`, the container-side half of the veth pair, to
VPP, like so:

```
root@d57c3716eee9:/# vppctl
    _______    _        _   _____  ___
 __/ __/ _ \  (_)__    | | / / _ \/ _ \
 _/ _// // / / / _ \   | |/ / ___/ ___/
 /_/ /____(_)_/\___/   |___/_/  /_/

vpp-clab# create host-interface name eth1 hw-addr 02:42:c0:00:02:02
vpp-clab# set interface name host-eth1 eth1
vpp-clab# set interface mtu 1500 eth1
vpp-clab# set interface ip address eth1 192.0.2.2/24
vpp-clab# set interface ip address eth1 2001:db8::2/64
vpp-clab# set interface state eth1 up
vpp-clab# show int addr
eth1 (up):
  L3 192.0.2.2/24
  L3 2001:db8::2/64
local0 (dn):
```

## Results

After all this work, I've successfully created a Docker image based on Debian Bookworm and VPP 25.02
(the current stable release version), started a container with it, and added a network bridge in
Docker, which binds the host `summer` to the container. Proof, as they say, is in the ping-pudding:

```
pim@summer:~/src/vpp-containerlab$ ping -c5 2001:db8::2
PING 2001:db8::2(2001:db8::2) 56 data bytes
64 bytes from 2001:db8::2: icmp_seq=1 ttl=64 time=0.113 ms
64 bytes from 2001:db8::2: icmp_seq=2 ttl=64 time=0.056 ms
64 bytes from 2001:db8::2: icmp_seq=3 ttl=64 time=0.202 ms
64 bytes from 2001:db8::2: icmp_seq=4 ttl=64 time=0.102 ms
64 bytes from 2001:db8::2: icmp_seq=5 ttl=64 time=0.100 ms

--- 2001:db8::2 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4098ms
rtt min/avg/max/mdev = 0.056/0.114/0.202/0.047 ms
pim@summer:~/src/vpp-containerlab$ ping -c5 192.0.2.2
PING 192.0.2.2 (192.0.2.2) 56(84) bytes of data.
64 bytes from 192.0.2.2: icmp_seq=1 ttl=64 time=0.043 ms
64 bytes from 192.0.2.2: icmp_seq=2 ttl=64 time=0.032 ms
64 bytes from 192.0.2.2: icmp_seq=3 ttl=64 time=0.019 ms
64 bytes from 192.0.2.2: icmp_seq=4 ttl=64 time=0.041 ms
64 bytes from 192.0.2.2: icmp_seq=5 ttl=64 time=0.027 ms

--- 192.0.2.2 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4063ms
rtt min/avg/max/mdev = 0.019/0.032/0.043/0.008 ms
```

And in case that simple ping-test wasn't enough to get you excited, here's a packet trace from VPP
itself, while I'm performing this ping:

```
vpp-clab# trace add af-packet-input 100
vpp-clab# wait 3
vpp-clab# show trace
------------------- Start of thread 0 vpp_main -------------------
Packet 1

00:07:03:979275: af-packet-input
  af_packet: hw_if_index 1 rx-queue 0 next-index 4
    block 47:
      address 0x7fbf23b7d000 version 2 seq_num 48 pkt_num 0
    tpacket3_hdr:
      status 0x20000001 len 98 snaplen 98 mac 92 net 106
      sec 0x68164381 nsec 0x258e7659 vlan 0 vlan_tpid 0
    vnet-hdr:
      flags 0x00 gso_type 0x00 hdr_len 0
      gso_size 0 csum_start 0 csum_offset 0
00:07:03:979293: ethernet-input
  IP4: 02:42:09:97:28:c6 -> 02:42:c0:00:02:02
00:07:03:979306: ip4-input
  ICMP: 192.0.2.1 -> 192.0.2.2
    tos 0x00, ttl 64, length 84, checksum 0x5e92 dscp CS0 ecn NON_ECN
    fragment id 0x5813, flags DONT_FRAGMENT
  ICMP echo_request checksum 0xc16 id 21197
00:07:03:979315: ip4-lookup
  fib 0 dpo-idx 9 flow hash: 0x00000000
  ICMP: 192.0.2.1 -> 192.0.2.2
    tos 0x00, ttl 64, length 84, checksum 0x5e92 dscp CS0 ecn NON_ECN
    fragment id 0x5813, flags DONT_FRAGMENT
  ICMP echo_request checksum 0xc16 id 21197
00:07:03:979322: ip4-receive
  fib:0 adj:9 flow:0x00000000
  ICMP: 192.0.2.1 -> 192.0.2.2
    tos 0x00, ttl 64, length 84, checksum 0x5e92 dscp CS0 ecn NON_ECN
    fragment id 0x5813, flags DONT_FRAGMENT
  ICMP echo_request checksum 0xc16 id 21197
00:07:03:979323: ip4-icmp-input
  ICMP: 192.0.2.1 -> 192.0.2.2
    tos 0x00, ttl 64, length 84, checksum 0x5e92 dscp CS0 ecn NON_ECN
    fragment id 0x5813, flags DONT_FRAGMENT
  ICMP echo_request checksum 0xc16 id 21197
00:07:03:979323: ip4-icmp-echo-request
  ICMP: 192.0.2.1 -> 192.0.2.2
    tos 0x00, ttl 64, length 84, checksum 0x5e92 dscp CS0 ecn NON_ECN
    fragment id 0x5813, flags DONT_FRAGMENT
  ICMP echo_request checksum 0xc16 id 21197
00:07:03:979326: ip4-load-balance
  fib 0 dpo-idx 5 flow hash: 0x00000000
  ICMP: 192.0.2.2 -> 192.0.2.1
    tos 0x00, ttl 64, length 84, checksum 0x88e1 dscp CS0 ecn NON_ECN
    fragment id 0x2dc4, flags DONT_FRAGMENT
  ICMP echo_reply checksum 0x1416 id 21197
00:07:03:979325: ip4-rewrite
  tx_sw_if_index 1 dpo-idx 5 : ipv4 via 192.0.2.1 eth1: mtu:1500 next:3 flags:[] 0242099728c60242c00002020800 flow hash: 0x00000000
  00000000: 0242099728c60242c00002020800450000542dc44000400188e1c0000202c000
  00000020: 02010000141652cd00018143166800000000399d0900000000001011
00:07:03:979326: eth1-output
  eth1 flags 0x02180005
  IP4: 02:42:c0:00:02:02 -> 02:42:09:97:28:c6
  ICMP: 192.0.2.2 -> 192.0.2.1
    tos 0x00, ttl 64, length 84, checksum 0x88e1 dscp CS0 ecn NON_ECN
    fragment id 0x2dc4, flags DONT_FRAGMENT
  ICMP echo_reply checksum 0x1416 id 21197
00:07:03:979327: eth1-tx
  af_packet: hw_if_index 1 tx-queue 0
    tpacket3_hdr:
      status 0x1 len 108 snaplen 108 mac 0 net 0
      sec 0x0 nsec 0x0 vlan 0 vlan_tpid 0
    vnet-hdr:
      flags 0x00 gso_type 0x00 hdr_len 0
      gso_size 0 csum_start 0 csum_offset 0
    buffer 0xf97c4:
      current data 0, length 98, buffer-pool 0, ref-count 1, trace handle 0x0
      local l2-hdr-offset 0 l3-hdr-offset 14
    IP4: 02:42:c0:00:02:02 -> 02:42:09:97:28:c6
    ICMP: 192.0.2.2 -> 192.0.2.1
      tos 0x00, ttl 64, length 84, checksum 0x88e1 dscp CS0 ecn NON_ECN
      fragment id 0x2dc4, flags DONT_FRAGMENT
    ICMP echo_reply checksum 0x1416 id 21197
```

Well, that's a mouthful, isn't it! Here, I get to show you VPP in action. After receiving the
packet on its `af-packet-input` node from 192.0.2.1 (Summer, who is pinging us) to 192.0.2.2 (the
VPP container), the packet traverses the dataplane graph. It goes through `ethernet-input`, then
`ip4-input` and `ip4-lookup`, which sees it's destined to a locally configured IPv4 address, so the
packet is handed to `ip4-receive`. That one sees that the IP protocol is ICMP, so it hands the
packet to `ip4-icmp-input` which notices that the packet is an ICMP echo request, so off to
`ip4-icmp-echo-request` our little packet goes. The ICMP plugin in VPP now answers by
`ip4-rewrite`'ing the packet (after a pass through `ip4-load-balance`), sending the return to
192.0.2.1 at MAC address `02:42:09:97:28:c6` (this is Summer, the host doing the pinging!), after
which the newly created ICMP echo-reply is handed to `eth1-output` which marshals it back into the
kernel's AF_PACKET interface using `eth1-tx`.

Boom. I could not be more pleased.

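To see the path at a glance without the per-node details, the node names can be pulled out of a
saved `show trace` output. This is a small sketch that assumes the trace format shown above, where
each node line starts with a timestamp followed by the node name, and that the input holds a single
packet. The function name `trace_path` is made up for this example.

```sh
#!/bin/sh
# trace_path (hypothetical helper): read `show trace` output on stdin and
# print the graph nodes a packet visited, joined with arrows.
trace_path() {
    awk '
        # Node lines look like "00:07:03:979275: af-packet-input".
        /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]:[0-9]+: / {
            path = (path == "" ? "" : path " -> ") $2
        }
        END { print path }
    '
}

# A shortened excerpt of the trace, with most details omitted.
trace_path << 'EOF'
00:07:03:979275: af-packet-input
  af_packet: hw_if_index 1 rx-queue 0 next-index 4
00:07:03:979293: ethernet-input
  IP4: 02:42:09:97:28:c6 -> 02:42:c0:00:02:02
00:07:03:979306: ip4-input
  ICMP: 192.0.2.1 -> 192.0.2.2
EOF
```

Fed with the full trace, the same helper lists every node from `af-packet-input` through `eth1-tx`
in the order they appear.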
## What's Next

This was a nice exercise for me! I'm going this direction because the
[[Containerlab](https://containerlab.dev)] framework will start containers with given NOS images,
not too dissimilar from the one I just made, and then attaches `veth` pairs between the containers.
I started dabbling with a [[pull-request](https://github.com/srl-labs/containerlab/pull/2569)], but
I got stuck with a part of the Containerlab code that pre-deploys config files into the containers.
You see, I will need to generate two files:

1. A `startup.conf` file that is specific to the containerlab Docker container. I'd like them to
   each set their own hostname so that the CLI has a unique prompt. I can do this by setting `unix
   { cli-prompt {{ .ShortName }}# }` in the template renderer.
1. Containerlab will know all of the veth pairs that are planned to be created into each VPP
   container. I'll need it to then write a little snippet of config that does the `create
   host-interface` spiel, to attach these `veth` pairs to the VPP dataplane.

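To make the second file a bit more concrete, here is a rough shell sketch of what such a generated
snippet could look like. It is not how Containerlab does it (the real integration would live in its
template renderer); the function name `vpp_bind_config` and its `ifname=mac` argument format are
made up, and the emitted commands are the same ones entered by hand earlier in this article.

```sh
#!/bin/sh
# vpp_bind_config (hypothetical helper): for every "ifname=mac" argument,
# print the VPP CLI commands that attach that veth to the dataplane.
vpp_bind_config() {
    for pair in "$@"; do
        ifname=${pair%%=*}
        mac=${pair#*=}
        echo "create host-interface name $ifname hw-addr $mac"
        echo "set interface name host-$ifname $ifname"
        echo "set interface mtu 1500 $ifname"
        echo "set interface state $ifname up"
    done
}

vpp_bind_config eth1=02:42:c0:00:02:02
```

Inside the container, the MAC address for each interface could be read from
`/sys/class/net/<ifname>/address`; that lookup is left out to keep the sketch small.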
I reached out to Roman from Nokia, who is one of the authors and current maintainer of Containerlab.
Roman was keen to help out, and seeing as he knows the Containerlab stuff well, and I know the VPP
stuff well, this is a reasonable partnership! Soon, he and I plan to have a bare-bones setup that
will connect a few VPP containers together with an SR Linux node in a lab. Stand by!

Once we have that, there's still quite some work for me to do. Notably:

* Configuration persistence. `clab` allows you to save the running config. For that, I'll need to
  introduce [[vppcfg](https://github.com/pimvanpelt/vppcfg.git)] and a means to invoke it when
  the lab operator wants to save their config, and then reconfigure VPP when the container
  restarts.
* I'll need to have a few files from `clab` shared with the host, notably the `startup.conf` and
  `vppcfg.yaml`, as well as some manual pre- and post-flight configuration for the more esoteric
  stuff. Building the plumbing for this is a TODO for now.

## Acknowledgements

I wanted to give a shout-out to Nardus le Roux who inspired me to contribute this Containerlab VPP
node type, and to Roman Dodin for his help getting the Containerlab parts squared away when I got a
little bit stuck.

First order of business: get it to ping at all ... it'll go faster from there on out :)
1
static/assets/containerlab/containerlab.svg
Normal file
File diff suppressed because one or more lines are too long