ipng.ch/content/articles/2021-08-13-vpp-2.md
Pim van Pelt fdb77838b8
Rewrite github.com to git.ipng.ch for popular repos
2025-05-04 21:54:16 +02:00


---
date: "2021-08-13T15:33:14Z"
title: VPP Linux CP - Part2
aliases:
- /s/articles/2021/08/13/vpp-2.html
params:
asciinema: true
---
{{< image width="200px" float="right" src="/assets/vpp/fdio-color.svg" alt="VPP" >}}
# About this series
Ever since I first saw VPP - the Vector Packet Processor - I have been deeply impressed with its
performance and versatility. For those of us who have used Cisco IOS/XR devices, like the classic
_ASR_ (aggregation services router), VPP will look and feel quite familiar as many of the approaches
are shared between the two. One thing notably missing is the higher-level control plane, that is
to say: there is no OSPF or ISIS, BGP, LDP and the like. This series of posts details my work on a
VPP _plugin_ which is called the **Linux Control Plane**, or LCP for short, which creates Linux network
devices that mirror their VPP dataplane counterpart. IPv4 and IPv6 traffic, and associated protocols
like ARP and IPv6 Neighbor Discovery can now be handled by Linux, while the heavy lifting of packet
forwarding is done by the VPP dataplane. Or, said another way: this plugin will allow Linux to use
VPP as a software ASIC for fast forwarding, filtering, NAT, and so on, while keeping control of the
interface state (links, addresses and routes) itself. When the plugin is completed, running software
like [FRR](https://frrouting.org/) or [Bird](https://bird.network.cz/) on top of VPP and achieving forwarding
rates of >100Mpps and >100Gbps will be well within reach!
In this second post, let's make the plugin a bit more useful by making it copy state changes made
to interfaces in VPP forward into their Linux CP counterparts.
## My test setup
I'm using the same setup from the [previous post]({{< ref "2021-08-12-vpp-1" >}}). The goal of this
post is to show what code needed to be written and which changes needed to be made to the plugin, in
order to propagate changes to VPP interfaces to the Linux TAP devices.
### Starting point
The `linux-cp` plugin that ships with VPP 21.06, even with my [changes](https://gerrit.fd.io/r/c/vpp/+/33481),
is still _only_ able to create _LIP_ (Linux Interface Pair) devices. It's not very user friendly to
have to apply state changes meticulously on both sides, but it can be done:
```
vppctl lcp create TenGigabitEthernet3/0/0 host-if e0
vppctl set interface state TenGigabitEthernet3/0/0 up
vppctl set interface mtu packet 9000 TenGigabitEthernet3/0/0
vppctl set interface ip address TenGigabitEthernet3/0/0 10.0.1.1/30
vppctl set interface ip address TenGigabitEthernet3/0/0 2001:db8:0:1::1/64
ip link set e0 up
ip link set e0 mtu 9000
ip addr add 10.0.1.1/30 dev e0
ip addr add 2001:db8:0:1::1/64 dev e0
```
In this snippet, we can see that after creating the _LIP_, thus conjuring up the unconfigured
`e0` interface in Linux, I changed the VPP interface in three ways:
1. I set the state of the VPP interface to 'up'
1. I set the MTU of the VPP interface to 9000
1. I added an IPv4 and an IPv6 address to the interface
Because state does not (yet) propagate, I have to make those changes as well on the Linux side
with the subsequent `ip` commands.
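
To make the duplication concrete, here is a minimal Python sketch (not part of the plugin) that derives the `ip` commands needed to repeat a list of VPP-side changes on the Linux side. The `Change` tuple and `mirror_to_linux()` are hypothetical names used only for illustration.

```python
from typing import List, Tuple

# Hypothetical representation of one VPP-side change: (kind, value).
Change = Tuple[str, str]


def mirror_to_linux(host_if: str, changes: List[Change]) -> List[str]:
    """Return the `ip` commands that repeat the given VPP changes in Linux."""
    cmds = []
    for kind, value in changes:
        if kind == "state":
            cmds.append(f"ip link set {host_if} {value}")
        elif kind == "mtu":
            cmds.append(f"ip link set {host_if} mtu {value}")
        elif kind == "addr":
            cmds.append(f"ip addr add {value} dev {host_if}")
        else:
            raise ValueError(f"unknown change kind: {kind}")
    return cmds


# The three kinds of change made to TenGigabitEthernet3/0/0 above.
vpp_changes = [
    ("state", "up"),
    ("mtu", "9000"),
    ("addr", "10.0.1.1/30"),
    ("addr", "2001:db8:0:1::1/64"),
]
for cmd in mirror_to_linux("e0", vpp_changes):
    print(cmd)
```

Every VPP change needs exactly one Linux twin, which is the bookkeeping the rest of this post moves into the plugin.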
### Configuration
I can imagine that operators want to have more control and facilitate the Linux and VPP changes
themselves. This is why I'll start off by adding a variable called `lcp_sync`, along with a
startup configuration keyword and a CLI setter. This allows me to turn the whole sync behavior on
and off, for example in `startup.conf`:
```
linux-cp {
  default netns dataplane
  lcp-sync
}
```
And in the CLI:
```
DBGvpp# show lcp
lcp default netns dataplane
lcp lcp-sync on
DBGvpp# lcp lcp-sync off
DBGvpp# show lcp
lcp default netns dataplane
lcp lcp-sync off
```
The prep work for the rest of the interface syncer starts with this
[[commit](https://git.ipng.ch/ipng/lcpng/commit/2d00de080bd26d80ce69441b1043de37e0326e0a)], and
for the rest of this blog post, the behavior will be in the 'on' position.
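
As a rough illustration of what the startup keyword amounts to, the sketch below parses a `linux-cp { ... }` stanza into a settings dict. It is a toy written in Python, not VPP's own configuration parser; `parse_linux_cp()` is a hypothetical helper, and treating a missing `lcp-sync` keyword as 'off' is my assumption.

```python
def parse_linux_cp(stanza: str) -> dict:
    """Parse a `linux-cp { ... }` stanza into a small settings dict."""
    # Assumption: sync stays off unless the keyword is present.
    cfg = {"default_netns": None, "lcp_sync": False}
    for line in stanza.splitlines():
        words = line.split()
        # Skip blank lines and the opening/closing lines of the stanza.
        if not words or words in (["linux-cp", "{"], ["}"]):
            continue
        if len(words) == 3 and words[:2] == ["default", "netns"]:
            cfg["default_netns"] = words[2]
        elif words == ["lcp-sync"]:
            cfg["lcp_sync"] = True
        else:
            raise ValueError(f"unknown linux-cp option: {line.strip()}")
    return cfg


STANZA = """
linux-cp {
  default netns dataplane
  lcp-sync
}
"""
print(parse_linux_cp(STANZA))
```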
### Change interface: state
Immediately, I find a dissonance between VPP and Linux: When Linux sets a parent interface down,
all children go to state `M-DOWN`. When Linux sets a parent interface up, all of its children
automatically go to state `UP` and `LOWER_UP`. To illustrate:
```
ip link set enp66s0f1 down
ip link add link enp66s0f1 name foo type vlan id 1234
ip link set foo down
## Both interfaces are down, which makes sense because I set them both down
ip link | grep enp66s0f1
9: enp66s0f1: <BROADCAST,MULTICAST> mtu 9000 qdisc mq state DOWN mode DEFAULT group default qlen 1000
61: foo@enp66s0f1: <BROADCAST,MULTICAST,M-DOWN> mtu 9000 qdisc noop state DOWN mode DEFAULT group default qlen 1000
ip link set enp66s0f1 up
ip link | grep enp66s0f1
## Both interfaces are up, which doesn't make sense because I only changed one of them!
9: enp66s0f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP mode DEFAULT group default qlen 1000
61: foo@enp66s0f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UP mode DEFAULT group default qlen 1000
```
VPP does not work this way. In VPP, the admin state of each interface is individually
controllable, so it's possible to bring up the parent while leaving the sub-interface in
the state it was. I did notice that you can't bring up a sub-interface if its parent
is down, which I found counterintuitive, but that's neither here nor there.
All of this is to say that we have to be careful when copying state forward. As this
[[commit](https://git.ipng.ch/ipng/lcpng/commit/7c15c84f6c4739860a85c599779c199cb9efef03)]
shows, issuing `set int state ... up` on an interface won't touch its sub-interfaces in VPP, but
the subsequent netlink message to bring the _LIP_ for that interface up **will** update the
children, thus desynchronising Linux and VPP: Linux will have the interface **and all its
sub-interfaces** up unconditionally; VPP will have the interface up and its sub-interfaces in
whatever state they were before.
To address this, a second
[[commit](https://git.ipng.ch/ipng/lcpng/commit/a3dc56c01461bdffcac8193ead654ae79225220f)] was
needed. I'm not too sure I want to keep this behavior, but for now, it results in an intuitive
end-state, which is that all interface states are exactly the same between Linux and VPP.
```
DBGvpp# create sub TenGigabitEthernet3/0/0 10
DBGvpp# lcp create TenGigabitEthernet3/0/0 host-if e0
DBGvpp# lcp create TenGigabitEthernet3/0/0.10 host-if e0.10
DBGvpp# set int state TenGigabitEthernet3/0/0 up
## Correct: parent is up, sub-int is not
694: e0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UNKNOWN mode DEFAULT group default qlen 1000
695: e0.10@e0: <BROADCAST,MULTICAST> mtu 9000 qdisc noqueue state DOWN mode DEFAULT group default qlen 1000
DBGvpp# set int state TenGigabitEthernet3/0/0.10 up
## Correct: both interfaces up
694: e0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UNKNOWN mode DEFAULT group default qlen 1000
695: e0.10@e0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UP mode DEFAULT group default qlen 1000
DBGvpp# set int state TenGigabitEthernet3/0/0 down
DBGvpp# set int state TenGigabitEthernet3/0/0.10 down
DBGvpp# set int state TenGigabitEthernet3/0/0 up
## Correct: only the parent is up
694: e0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UNKNOWN mode DEFAULT group default qlen 1000
695: e0.10@e0: <BROADCAST,MULTICAST> mtu 9000 qdisc noqueue state DOWN mode DEFAULT group default qlen 1000
```
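
The interaction can be modelled in a few lines of Python. This is a toy model under my own assumptions, not the plugin's C code: `LinuxLink`, `sync_state()` and `resync_children()` are hypothetical names, and the rule that a child is only up when its parent is up and VPP has it admin-up is how I chose to model the fix.

```python
class LinuxLink:
    """Toy model of a Linux netdev with optional VLAN children."""

    def __init__(self, name, parent=None):
        self.name = name
        self.up = False
        self.children = []
        if parent is not None:
            parent.children.append(self)


def linux_set_state(link, up):
    """Model of the Linux behaviour shown earlier: children follow the parent."""
    link.up = up
    for child in link.children:
        linux_set_state(child, up)


def resync_children(vpp_up, link):
    """Re-apply VPP's per-interface admin state to everything below `link`."""
    for child in link.children:
        child.up = link.up and vpp_up[child.name]
        resync_children(vpp_up, child)


def sync_state(vpp_up, links, name, up, fix_children=True):
    """Record the VPP admin state and copy it to the Linux side."""
    vpp_up[name] = up
    linux_set_state(links[name], up)
    if fix_children:
        resync_children(vpp_up, links[name])


def demo(fix_children):
    e0 = LinuxLink("e0")
    sub = LinuxLink("e0.10", parent=e0)
    links = {"e0": e0, "e0.10": sub}
    vpp_up = {"e0": False, "e0.10": False}
    sync_state(vpp_up, links, "e0", True, fix_children)
    return vpp_up, {name: link.up for name, link in links.items()}


print(demo(fix_children=False))  # Linux has e0.10 up, VPP does not
print(demo(fix_children=True))   # both sides agree
```

Without the second pass over the children, a single `set int state ... up` on the parent is enough to make the two sides disagree.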
### Change interface: MTU
Finally, a straightforward
[[commit](https://git.ipng.ch/ipng/lcpng/commit/39bfa1615fd1cafe5df6d8fc9d34528e8d3906e2)], or
so I thought. When the MTU changes in VPP (with `set interface mtu packet N <int>`), there is a
callback that can be registered which copies this into the _LIP_. I did notice a specific corner
case: in VPP, a sub-interface can have a larger MTU than its parent. In Linux, this cannot happen,
so the following remains problematic:
```
DBGvpp# create sub TenGigabitEthernet3/0/0 10
DBGvpp# set int mtu packet 1500 TenGigabitEthernet3/0/0
DBGvpp# set int mtu packet 9000 TenGigabitEthernet3/0/0.10
## Incorrect: sub-int has larger MTU than parent, valid in VPP, not in Linux
694: e0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UNKNOWN mode DEFAULT group default qlen 1000
695: e0.10@e0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
```
I think the best way to ensure this works is to _clamp_ the sub-int to a maximum MTU of
that of its parent, and revert the user's request to change the VPP sub-int to anything
higher than that, perhaps logging an error explaining why. This means two things:
1. Any change in VPP that sets a child MTU larger than its parent's must be reverted.
1. Any change in VPP of a parent MTU should ensure all children are clamped to at most that value.
I addressed the issue in this
[[commit](https://git.ipng.ch/ipng/lcpng/commit/79a395b3c9f0dae9a23e6fbf10c5f284b1facb85)].
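
The two rules can be sketched as a small Python model. This is illustrative only and not the code in the commit: `set_mtu()`, `mtus` and `parent_of` are hypothetical names, and I'm assuming 'revert' means the child keeps its previous MTU.

```python
def set_mtu(mtus, parent_of, name, new_mtu):
    """Apply an MTU change under the two rules above; return log messages."""
    parent = parent_of.get(name)
    if parent is not None and new_mtu > mtus[parent]:
        # Rule 1: a child may not exceed its parent, so the request is refused
        # and the previous value stays in place (assumed meaning of "revert").
        return [f"{name}: mtu {new_mtu} > {parent} mtu {mtus[parent]}, reverted"]
    mtus[name] = new_mtu
    logs = []
    # Rule 2: lowering a parent clamps every child that is now too large.
    for child, its_parent in parent_of.items():
        if its_parent == name and mtus[child] > new_mtu:
            mtus[child] = new_mtu
            logs.append(f"{child}: clamped to {new_mtu}")
    return logs


mtus = {"e0": 9000, "e0.10": 9000}
parent_of = {"e0.10": "e0"}
print(set_mtu(mtus, parent_of, "e0", 1500))     # parent shrinks, child follows
print(set_mtu(mtus, parent_of, "e0.10", 9000))  # refused, child stays at 1500
print(mtus)
```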
### Change interface: IP Addresses
There are three scenarios in which IP addresses will need to be copied from
VPP into the companion Linux devices:
1. `set interface ip address` adds an IPv4 or IPv6 address. This is handled by
`lcp_itf_ip[46]_add_del_interface_addr()` which is a callback installed in
`lcp_itf_pair_init()` at plugin initialization time.
1. `set interface ip address del` removes addresses. This is also handled by
`lcp_itf_ip[46]_add_del_interface_addr()` but curiously there is no
upstream `vnet_netlink_del_ip[46]_addr()` so I had to write them inline here.
I will try to get them upstreamed, as they appear to be obvious companions
in `vnet/device/netlink.h`.
1. This one is easy to overlook, but upon _LIP_ creation, it could be that there
are already L3 addresses present on the VPP interface. If so, set them in the
_LIP_ with `lcp_itf_set_interface_addr()`.
This means that with this
[[commit](https://git.ipng.ch/ipng/lcpng/commit/f7e1bb951d648a63dfa27d04ded0b6261b9e39fe)],
any time a new _LIP_ is created, the IPv4 and IPv6 addresses on the VPP interface are fully copied
over by the third change, while at runtime, new addresses can be set or removed by the first
and second changes.
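
Here is a minimal Python model of the three scenarios, assuming addresses are tracked as plain sets per interface. It is a sketch of the behaviour, not the plugin's implementation; `SyncModel`, `add_del_addr()` and `create_lip()` are hypothetical names.

```python
import ipaddress


class SyncModel:
    """Toy model: VPP addresses per interface, mirrored into Linux per LIP."""

    def __init__(self):
        self.vpp_addrs = {}    # VPP interface name -> set of addresses
        self.linux_addrs = {}  # Linux host-if name -> set of addresses
        self.lips = {}         # VPP interface name -> Linux host-if name

    def add_del_addr(self, vpp_if, addr, is_del=False):
        """Scenarios 1 and 2: an address is added or removed at runtime."""
        addr = str(ipaddress.ip_interface(addr))  # validates IPv4 and IPv6
        addrs = self.vpp_addrs.setdefault(vpp_if, set())
        host_if = self.lips.get(vpp_if)
        if is_del:
            addrs.discard(addr)
            if host_if is not None:
                self.linux_addrs[host_if].discard(addr)
        else:
            addrs.add(addr)
            if host_if is not None:
                self.linux_addrs[host_if].add(addr)

    def create_lip(self, vpp_if, host_if):
        """Scenario 3: copy addresses that already exist on the VPP interface."""
        self.lips[vpp_if] = host_if
        self.linux_addrs[host_if] = set(self.vpp_addrs.get(vpp_if, set()))


m = SyncModel()
m.add_del_addr("TenGigabitEthernet3/0/0.1237", "10.0.5.1/30")         # before the LIP exists
m.create_lip("TenGigabitEthernet3/0/0.1237", "e0.1237")               # scenario 3
m.add_del_addr("TenGigabitEthernet3/0/0.1237", "2001:db8:0:5::1/64")  # scenario 1
m.add_del_addr("TenGigabitEthernet3/0/0.1237", "10.0.5.1/30", is_del=True)  # scenario 2
print(sorted(m.linux_addrs["e0.1237"]))
```

The first call shows why the third scenario matters: without the copy in `create_lip()`, an address configured before the _LIP_ existed would never reach Linux.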
### Further work
I noticed that [Bird](https://bird.network.cz/) periodically scans the Linux
interface list and (re)learns information from it. I have a suspicion that
such a feature might be useful in the VPP plugin as well: I can imagine a
periodic process that walks over the _LIP_ interface list, and compares
what it finds in Linux with what is configured in VPP. What's not entirely
clear to me is which direction should 'trump', that is, should the Linux
state be forced into VPP, or should the VPP state be forced into Linux? I
don't yet have a good feeling for the answer, so I'll punt on that for now.
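
To make the idea a little more tangible, here is a speculative Python sketch of one pass of such a process. None of this exists in the plugin: `diff_state()` and `reconcile()` are hypothetical, both dicts are assumed to describe the same set of _LIPs_, and the open question of direction is left as the `winner` parameter.

```python
FIELDS = ("up", "mtu", "addrs")


def diff_state(vpp, linux):
    """Return (interface, field, vpp_value, linux_value) for every mismatch."""
    diffs = []
    for name in sorted(vpp):
        for field in FIELDS:
            if vpp[name][field] != linux[name][field]:
                diffs.append((name, field, vpp[name][field], linux[name][field]))
    return diffs


def reconcile(vpp, linux, winner):
    """Force the winner's view onto the other side; `winner` is 'vpp' or 'linux'."""
    if winner not in ("vpp", "linux"):
        raise ValueError("winner must be 'vpp' or 'linux'")
    src, dst = (vpp, linux) if winner == "vpp" else (linux, vpp)
    changed = diff_state(vpp, linux)
    for name, field, _, _ in changed:
        dst[name][field] = src[name][field]
    return changed


# Addresses are frozensets so that copying a value never aliases mutable state.
vpp = {"e0": {"up": True, "mtu": 9000, "addrs": frozenset({"10.0.1.1/30"})}}
linux = {"e0": {"up": True, "mtu": 1500, "addrs": frozenset({"10.0.1.1/30"})}}
print(reconcile(vpp, linux, winner="vpp"))  # reports the MTU mismatch it fixed
print(diff_state(vpp, linux))               # nothing left to fix
```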
## Results
After applying the configuration to VPP (see the Appendix), here are the results:
```
pim@hippo:~/src/lcpng$ ip ro
default via 194.1.163.65 dev enp6s0 proto static
10.0.1.0/30 dev e0 proto kernel scope link src 10.0.1.1
10.0.2.0/30 dev e0.1234 proto kernel scope link src 10.0.2.1
10.0.3.0/30 dev e0.1235 proto kernel scope link src 10.0.3.1
10.0.4.0/30 dev e0.1236 proto kernel scope link src 10.0.4.1
10.0.5.0/30 dev e0.1237 proto kernel scope link src 10.0.5.1
194.1.163.64/27 dev enp6s0 proto kernel scope link src 194.1.163.88
pim@hippo:~/src/lcpng$ fping 10.0.1.2 10.0.2.2 10.0.3.2 10.0.4.2 10.0.5.2
10.0.1.2 is alive
10.0.2.2 is alive
10.0.3.2 is alive
10.0.4.2 is alive
10.0.5.2 is alive
pim@hippo:~/src/lcpng$ fping6 2001:db8:0:1::2 2001:db8:0:2::2 \
2001:db8:0:3::2 2001:db8:0:4::2 2001:db8:0:5::2
2001:db8:0:1::2 is alive
2001:db8:0:2::2 is alive
2001:db8:0:3::2 is alive
2001:db8:0:4::2 is alive
2001:db8:0:5::2 is alive
```
In case you were wondering whether my previous post ended in the same huzzah moment: it did.
The difference is that now the VPP configuration is _much shorter_! Comparing
the Appendix from this post with my [first post]({{< ref "2021-08-12-vpp-1" >}}), after
all of this work I no longer have to manually copy the configuration (like link states,
MTU changes, IP addresses) from VPP into Linux. Instead, the plugin does all of this work
for me, and I can configure both sides entirely with `vppctl` commands!
### Bonus screencast!
Humor me as I take the code out for a six-minute screencast [[asciinema](/assets/vpp/430411.cast),
[gif](/assets/vpp/430411.gif)] :-)
{{< asciinema src="/assets/vpp/430411.cast" >}}
## Credits
I'd like to make clear that the Linux CP plugin is a great collaboration between several great folks
and that my work stands on their shoulders. I've had a little bit of help along the way from Neale
Ranns, Matthew Smith and Jon Loeliger, and I'd like to thank them for their work!
## Appendix
#### Ubuntu config
```
# Untagged interface
ip addr add 10.0.1.2/30 dev enp66s0f0
ip addr add 2001:db8:0:1::2/64 dev enp66s0f0
ip link set enp66s0f0 up mtu 9000
# Single 802.1q tag 1234
ip link add link enp66s0f0 name enp66s0f0.q type vlan id 1234
ip link set enp66s0f0.q up mtu 9000
ip addr add 10.0.2.2/30 dev enp66s0f0.q
ip addr add 2001:db8:0:2::2/64 dev enp66s0f0.q
# Double 802.1q tag 1234 inner-tag 1000
ip link add link enp66s0f0.q name enp66s0f0.qinq type vlan id 1000
ip link set enp66s0f0.qinq up mtu 9000
ip addr add 10.0.3.2/30 dev enp66s0f0.qinq
ip addr add 2001:db8:0:3::2/64 dev enp66s0f0.qinq
# Single 802.1ad tag 2345
ip link add link enp66s0f0 name enp66s0f0.ad type vlan id 2345 proto 802.1ad
ip link set enp66s0f0.ad up mtu 9000
ip addr add 10.0.4.2/30 dev enp66s0f0.ad
ip addr add 2001:db8:0:4::2/64 dev enp66s0f0.ad
# Double 802.1ad tag 2345 inner-tag 1000
ip link add link enp66s0f0.ad name enp66s0f0.qinad type vlan id 1000 proto 802.1q
ip link set enp66s0f0.qinad up mtu 9000
ip addr add 10.0.5.2/30 dev enp66s0f0.qinad
ip addr add 2001:db8:0:5::2/64 dev enp66s0f0.qinad
```
#### VPP config
```
## Look mom, no `ip` commands!! :-)
vppctl set interface state TenGigabitEthernet3/0/0 up
vppctl lcp create TenGigabitEthernet3/0/0 host-if e0
vppctl set interface mtu packet 9000 TenGigabitEthernet3/0/0
vppctl set interface ip address TenGigabitEthernet3/0/0 10.0.1.1/30
vppctl set interface ip address TenGigabitEthernet3/0/0 2001:db8:0:1::1/64
vppctl create sub TenGigabitEthernet3/0/0 1234
vppctl set interface mtu packet 9000 TenGigabitEthernet3/0/0.1234
vppctl lcp create TenGigabitEthernet3/0/0.1234 host-if e0.1234
vppctl set interface state TenGigabitEthernet3/0/0.1234 up
vppctl set interface ip address TenGigabitEthernet3/0/0.1234 10.0.2.1/30
vppctl set interface ip address TenGigabitEthernet3/0/0.1234 2001:db8:0:2::1/64
vppctl create sub TenGigabitEthernet3/0/0 1235 dot1q 1234 inner-dot1q 1000 exact-match
vppctl set interface state TenGigabitEthernet3/0/0.1235 up
vppctl set interface mtu packet 9000 TenGigabitEthernet3/0/0.1235
vppctl lcp create TenGigabitEthernet3/0/0.1235 host-if e0.1235
vppctl set interface ip address TenGigabitEthernet3/0/0.1235 10.0.3.1/30
vppctl set interface ip address TenGigabitEthernet3/0/0.1235 2001:db8:0:3::1/64
vppctl create sub TenGigabitEthernet3/0/0 1236 dot1ad 2345 exact-match
vppctl set interface state TenGigabitEthernet3/0/0.1236 up
vppctl lcp create TenGigabitEthernet3/0/0.1236 host-if e0.1236
vppctl set interface mtu packet 9000 TenGigabitEthernet3/0/0.1236
vppctl set interface ip address TenGigabitEthernet3/0/0.1236 10.0.4.1/30
vppctl set interface ip address TenGigabitEthernet3/0/0.1236 2001:db8:0:4::1/64
vppctl create sub TenGigabitEthernet3/0/0 1237 dot1ad 2345 inner-dot1q 1000 exact-match
vppctl set interface state TenGigabitEthernet3/0/0.1237 up
vppctl set interface mtu packet 9000 TenGigabitEthernet3/0/0.1237
vppctl set interface ip address TenGigabitEthernet3/0/0.1237 10.0.5.1/30
vppctl set interface ip address TenGigabitEthernet3/0/0.1237 2001:db8:0:5::1/64
vppctl lcp create TenGigabitEthernet3/0/0.1237 host-if e0.1237
```
#### Final note
You may have noticed that the [commit] links are all git commits in my private working copy. I want
to wait until my [previous work](https://gerrit.fd.io/r/c/vpp/+/33481) is reviewed and submitted
before piling on more changes. Feel free to contact vpp-dev@ for more information in the meantime
:-)