---
date: "2021-08-25T08:55:14Z"
title: VPP Linux CP - Part4
aliases:
- /s/articles/2021/08/25/vpp-4.html
params:
asciinema: true
---
{{< image width="200px" float="right" src="/assets/vpp/fdio-color.svg" alt="VPP" >}}
# About this series
Ever since I first saw VPP - the Vector Packet Processor - I have been deeply impressed with its
performance and versatility. For those of us who have used Cisco IOS/XR devices, like the classic
_ASR_ (aggregation services router), VPP will look and feel quite familiar as many of the approaches
are shared between the two. One thing notably missing is the higher level control plane, that is
to say: there is no OSPF or ISIS, BGP, LDP and the like. This series of posts details my work on a
VPP _plugin_ which is called the **Linux Control Plane**, or LCP for short, which creates Linux network
devices that mirror their VPP dataplane counterpart. IPv4 and IPv6 traffic, and associated protocols
like ARP and IPv6 Neighbor Discovery can now be handled by Linux, while the heavy lifting of packet
forwarding is done by the VPP dataplane. Or, said another way: this plugin will allow Linux to use
VPP as a software ASIC for fast forwarding, filtering, NAT, and so on, while keeping control of the
interface state (links, addresses and routes) itself. When the plugin is completed, running software
like [FRR](https://frrouting.org/) or [Bird](https://bird.network.cz/) on top of VPP and achieving
&gt;100Mpps and &gt;100Gbps forwarding rates will be well in reach!
In the first three posts, I added the ability for VPP to synchronize its state (like link state,
MTU, and interface addresses) into Linux. In this post, I'll make a start on the other direction:
allowing changes to interfaces made in Linux to make their way back into VPP!
## My test setup
I'm keeping the setup from the [third post]({{< ref "2021-08-15-vpp-3" >}}). A Linux machine has an
interface `enp66s0f0` which has 4 sub-interfaces (one dot1q tagged, one q-in-q, one dot1ad tagged,
and one q-in-ad), giving me five flavors in total. Then, I created an LACP `bond0` interface, which
also has the whole kit and caboodle of sub-interfaces defined, see below in the Appendix for details,
but here's the table again for reference:
| Name | Type | Addresses
|-----------------|------|----------
| enp66s0f0 | untagged | 10.0.1.2/30 2001:db8:0:1::2/64
| enp66s0f0.q | dot1q 1234 | 10.0.2.2/30 2001:db8:0:2::2/64
| enp66s0f0.qinq | outer dot1q 1234, inner dot1q 1000 | 10.0.3.2/30 2001:db8:0:3::2/64
| enp66s0f0.ad | dot1ad 2345 | 10.0.4.2/30 2001:db8:0:4::2/64
| enp66s0f0.qinad | outer dot1ad 2345, inner dot1q 1000 | 10.0.5.2/30 2001:db8:0:5::2/64
| bond0 | untagged | 10.1.1.2/30 2001:db8:1:1::2/64
| bond0.q | dot1q 1234 | 10.1.2.2/30 2001:db8:1:2::2/64
| bond0.qinq | outer dot1q 1234, inner dot1q 1000 | 10.1.3.2/30 2001:db8:1:3::2/64
| bond0.ad | dot1ad 2345 | 10.1.4.2/30 2001:db8:1:4::2/64
| bond0.qinad | outer dot1ad 2345, inner dot1q 1000 | 10.1.5.2/30 2001:db8:1:5::2/64
The goal of this post is to show what code needed to be written and introduces an entirely _new
plugin_, so that we can separate concerns (and have a higher chance of community acceptance
of the plugins). In the first plugin, now called the **Interface Mirror**, I have previously
implemented the VPP-to-Linux synchronization. In this new plugin (called the **Netlink Listener**)
I implement the Linux-to-VPP synchronization using, _quelle surprise_, Netlink message handlers.
### Starting point
Based on the state of the plugin after the [third post]({{< ref "2021-08-15-vpp-3" >}}),
operators can enable `lcp-sync` (which copies changes made in VPP into their Linux counterpart)
and `lcp-auto-subint` (which extends sub-interface creation in VPP to automatically create a
Linux Interface Pair, or _LIP_, and its companion Linux network interface):
```
DBGvpp# lcp lcp-sync on
DBGvpp# lcp lcp-auto-subint on
DBGvpp# lcp create TenGigabitEthernet3/0/0 host-if e0
DBGvpp# create sub TenGigabitEthernet3/0/0 1234
DBGvpp# create sub TenGigabitEthernet3/0/0 1235 dot1q 1234 inner-dot1q 1000 exact-match
DBGvpp# create sub TenGigabitEthernet3/0/0 1236 dot1ad 2345 exact-match
DBGvpp# create sub TenGigabitEthernet3/0/0 1237 dot1ad 2345 inner-dot1q 1000 exact-match
pim@hippo:~/src/lcpng$ ip link | grep e0
1286: e0.1234@e0: <BROADCAST,MULTICAST,M-DOWN> mtu 9000 qdisc noqueue state DOWN mode DEFAULT group default qlen 1000
1287: e0.1235@e0.1234: <BROADCAST,MULTICAST,M-DOWN> mtu 9000 qdisc noqueue state DOWN mode DEFAULT group default qlen 1000
1288: e0.1236@e0: <BROADCAST,MULTICAST,M-DOWN> mtu 9000 qdisc noqueue state DOWN mode DEFAULT group default qlen 1000
1289: e0.1237@e0.1236: <BROADCAST,MULTICAST,M-DOWN> mtu 9000 qdisc noqueue state DOWN mode DEFAULT group default qlen 1000
1701: e0: <BROADCAST,MULTICAST> mtu 9050 qdisc mq state DOWN mode DEFAULT group default qlen 1000
```
The vision for this plugin has been that Linux can drive most control-plane operations, such as
creating sub-interfaces, adding/removing addresses, changing MTU on links, etc. We can do that by
listening to [Netlink](https://en.wikipedia.org/wiki/Netlink) messages, which were designed for
transferring miscellaneous networking information between the kernel space and userspace processes
(like `VPP`). Networking utilities, such as the _iproute2_ family and its command line utilities
(like `ip`) use Netlink to communicate with the Linux kernel from userspace.
## Netlink Listener
The first task at hand is to install a Netlink listener. In this new plugin, I first register
`lcp_nl_init()` which adds Linux interface pair (_LIP_) add/del callbacks from the first plugin.
I'm now made aware of new _LIPs_ as they are created.
In `lcp_nl_pair_add_cb()`, I will initiate the Netlink listener for the first interface that gets
created, noting its netns. If subsequent adds are in other netns, I'll just issue a warning. And, I
will keep a refcount so I know how many _LIPs_ are bound to this listener.
In `lcp_nl_pair_del_cb()`, I can remove the listener when the last interface pair is removed.
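
The add/del bookkeeping described above can be modelled in a few lines of Python. This is only a sketch and not the plugin's C code: the class `ListenerState` and its fields are made up for illustration, opening the socket is reduced to a boolean, and the choice to not count _LIPs_ from a foreign netns towards the refcount is an assumption.

```python
class ListenerState:
    """Toy model of a single Netlink listener shared by all LIPs in one netns."""

    def __init__(self):
        self.netns = None       # namespace the listener is bound to
        self.refcount = 0       # number of LIPs bound to this listener
        self.listening = False  # stands in for an open Netlink socket
        self.warnings = []

    def pair_add(self, netns):
        if self.refcount == 0:
            # First LIP: start listening and remember its namespace.
            self.netns = netns
            self.listening = True
        elif netns != self.netns:
            # LIP in another namespace: warn only (assumption: not counted).
            self.warnings.append(f"ignoring netns {netns!r}, bound to {self.netns!r}")
            return
        self.refcount += 1

    def pair_del(self, netns):
        if self.refcount == 0 or netns != self.netns:
            return
        self.refcount -= 1
        if self.refcount == 0:
            # Last LIP removed: the listener can go away.
            self.listening = False
            self.netns = None
```

Adding two _LIPs_ in the same namespace leaves the refcount at 2, and the listener only stops once both have been removed again.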
Then for listening itself, a Netlink socket is opened, and because Linux can be quite chatty on
Netlink sockets, I'll raise its read/write buffers to something quite large (typically 64M read
and 16K write size). One note on this size: it needs a few sysctls to be set before VPP starts,
typically done as follows:
```
pim@hippo:~/src/vpp$ cat << EOF | sudo tee /etc/sysctl.d/81-vpp-Netlink.conf
# Increase Netlink to 64M
net.core.rmem_default=67108864
net.core.wmem_default=67108864
net.core.rmem_max=67108864
net.core.wmem_max=67108864
EOF
pim@hippo:~/src/vpp$ sudo sysctl -p /etc/sysctl.d/81-vpp-Netlink.conf
```
After creating the Netlink socket, I add its file descriptor to VPP's built in file handler, which
will see to polling it. On the file handler, I install `lcp_nl_read_cb()` and `lcp_nl_error_cb()`
callbacks which will be invoked when anything interesting happens on the socket. Messages that are
read get pushed onto a queue, rather than being handled on the spot.
A bit of explanation on why I'd use a queue rather than just consuming the Netlink messages directly
as they are offered. I _have to_ use a queue for the common case in which VPP is running single threaded.
Consuming a block of potentially a million route del/adds in one go (say, if BGP is reconverging)
would block VPP from reading new packets from DPDK and, more importantly, new Netlink messages from
the kernel. Those unread messages will fill the 64M socket buffer and overflow it, losing Netlink
messages. That is bad, because it requires an end to end resync of the Linux namespace into the VPP
dataplane, something called an `NLM_F_DUMP`, but that's a story for another day.
So I process only a batch of messages and only for a maximum amount of time per batch. If there are still
some messages left in the queue, I'll just reschedule consumption after M milliseconds. This allows new
Netlink messages to continuously be read from the kernel by VPP's file handler, even if there's a lot of
work to do. The two socket callbacks behave as follows:
* `lcp_nl_read_cb()` calls `lcp_nl_callback()` which pushes Netlink messages onto a queue and
issues an `NL_EVENT_READ` event; any socket read error issues an `NL_EVENT_READ_ERR` event.
* `lcp_nl_error_cb()` simply issues an `NL_EVENT_READ_ERR` event and moves on with life.
To capture these events, I initialize a process node called `lcp_nl_process()`, which handles:
* `NL_EVENT_READ` by calling `lcp_nl_process_msgs()` and processing a batch of messages (either
a maximum count, or a maximum duration, whichever is reached first).
* `NL_EVENT_READ_ERR` is the other event that can happen, in case VPP's file handler or my own
`lcp_nl_read_cb()` encounters a read error. All it does is close and reopen the Netlink socket
in the same network namespace we were in before, in an attempt to minimize the damage, _dazed and
confused, but trying to continue_.
Alright, so at this point, I have a producer queue that gets added to by the Netlink reader
machinery, so all I have to do is consume it. `lcp_nl_process_msgs()` processes up to N messages
and/or for up to M msecs, whichever comes first, and for each individual Netlink message, it
will call `lcp_nl_dispatch()` to handle messages of a given type.
For now, `lcp_nl_dispatch()` just throws the message away after logging it with `format_nl_object()`,
a function that will come in very useful as I start to explore all the different Netlink message types.
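
The consume-in-batches idea can be illustrated with a short Python model. It is a sketch under assumptions, not the plugin's code: the function name `process_msgs`, the default budget of 64 messages and 20 milliseconds, and the string messages are all invented here, standing in for N, M and real Netlink objects.

```python
import time
from collections import deque


def process_msgs(queue, dispatch, max_count=64, max_seconds=0.02):
    """Consume queued messages until the queue is empty or a budget runs out.

    Returns True if messages are left over, meaning the caller should
    reschedule another round instead of blocking the main loop.
    """
    deadline = time.monotonic() + max_seconds
    handled = 0
    while queue and handled < max_count and time.monotonic() < deadline:
        dispatch(queue.popleft())
        handled += 1
    return len(queue) > 0


# The reader side only appends; here 150 fake messages are queued up front.
queue = deque(f"msg-{i}" for i in range(150))
log = []

# For now, "dispatching" is just logging the message, as in the text above.
while process_msgs(queue, log.append):
    pass  # a real event loop would go do other work before the next round
```

Because each round stops at whichever budget is hit first, a burst of messages is spread over several rounds and the reader gets a chance to drain the socket in between.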
The code that forms the basis of our Netlink Listener lives in [[this
commit](https://git.ipng.ch/ipng/lcpng/commit/c4e3043ea143d703915239b2390c55f7b6a9b0b1)]. I
specifically want to call out that I was not the primary author here: I worked off of Matt and
Neale's awesome work in this pending [Gerrit](https://gerrit.fd.io/r/c/vpp/+/31122).
### Netlink: Neighbor
ARP and IPv6 Neighbor Discovery will trigger a set of Netlink messages, which are of type
`RTM_NEWNEIGH` and `RTM_DELNEIGH`.
First, I'll add a new source file `lcpng_nl_sync.c` that will house these handler functions.
Their purpose is to take state learned from Netlink messages, and apply that state to VPP.
Then, I add `lcp_nl_neigh_add()` and `lcp_nl_neigh_del()` which implement the following
pattern: Most Netlink messages are somehow about a `link`, which is identified by an
interface index (`ifindex` or just idx for short). That's the same interface index I stored
when I created the _LIP_, calling it `vif_index` because in VPP, it describes a `virtio`
device which implements the IO for the TAP.
If I'm handling a message for a link with a given ifindex, I can correlate it with a _LIP_. Not all
messages will be related to something VPP knows or cares about; I'll discuss that more later when
I discuss `RTM_NEWLINK` messages.
If there is no _LIP_ associated with the `ifindex`, then clearly this message is about a
Linux interface VPP is not aware of. But, if I can find the _LIP_, I can convert the lladdr
(MAC address) and IP address from the Netlink message into their VPP variants, and then simply
add or remove the ip4/ip6 neighbor adjacency.
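
As an illustration of that pattern, here is a minimal Python model of the lookup-then-apply flow. Everything in it is hypothetical: the dictionaries `lip_by_vif_index` and `neighbors` and the functions `neigh_add()` and `neigh_del()` only mimic the shape of the handlers, using the ifindex and addresses that appear in this post's logs.

```python
import ipaddress

# Linux ifindex -> VPP interface name, recorded when a LIP is created (sample data)
lip_by_vif_index = {1488: "TenGigabitEthernet3/0/0"}

# (VPP interface, IP address) -> MAC address as bytes
neighbors = {}


def neigh_add(ifindex, dst, lladdr):
    iface = lip_by_vif_index.get(ifindex)
    if iface is None:
        return False  # a Linux interface VPP is not aware of: ignore
    ip = ipaddress.ip_address(dst)                    # IPv4 or IPv6
    mac = bytes.fromhex(lladdr.replace(":", ""))      # text form -> 6 raw bytes
    neighbors[(iface, ip)] = mac
    return True


def neigh_del(ifindex, dst):
    iface = lip_by_vif_index.get(ifindex)
    if iface is None:
        return False
    # True only if there actually was an entry to remove
    return neighbors.pop((iface, ipaddress.ip_address(dst)), None) is not None
```

A call such as `neigh_add(1488, "10.0.1.2", "68:05:ca:32:45:94")` stores the entry, while the same call with an unknown ifindex is silently ignored.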
The code for this first Netlink message handler lives in this
[[commit](https://git.ipng.ch/ipng/lcpng/commit/30bab1d3f9ab06670fbef2c7c6a658e7b77f7738)]. An
ironic insight is that after writing the code, I don't think any of it will be necessary, because
the interface plugin will already copy ARP and IPv6 ND packets back and forth and itself update its
neighbor adjacency tables; but I'm leaving the code in for now.
### Netlink: Address
A decidedly more interesting message is `RTM_NEWADDR` and its deletion companion `RTM_DELADDR`.
It's pretty straightforward to add and remove IPv4 and IPv6 addresses on interfaces. I have
to convert the Netlink representation of an IP address to its VPP counterpart with a helper, add
it or remove it, and if there are no link-local addresses left, disable IPv6 on the interface.
There are also a few multicast routes to add (notably 224.0.0.0/24 and ff00::/8, all-local-subnet).
The code for IP address handling is in this
[[commit](https://git.ipng.ch/ipng/lcpng/commit/87742b4f541d389e745f0297d134e34f17b5b485)], but
when I took it out for a spin, I noticed something curious, looking at the log lines that are
generated for the following sequence:
```
ip addr add 10.0.1.1/30 dev e0
debug linux-cp/nl addr_add: Netlink route/addr: add idx 1488 family inet local 10.0.1.1/30 flags 0x0080 (permanent)
warn linux-cp/nl dispatch: ignored route/route: add family inet type 2 proto 2 table 255 dst 10.0.1.1 nexthops { idx 1488 }
warn linux-cp/nl dispatch: ignored route/route: add family inet type 1 proto 2 table 254 dst 10.0.1.0/30 nexthops { idx 1488 }
warn linux-cp/nl dispatch: ignored route/route: add family inet type 3 proto 2 table 255 dst 10.0.1.0 nexthops { idx 1488 }
warn linux-cp/nl dispatch: ignored route/route: add family inet type 3 proto 2 table 255 dst 10.0.1.3 nexthops { idx 1488 }
ping 10.0.1.2
debug linux-cp/nl neigh_add: Netlink route/neigh: add idx 1488 family inet lladdr 68:05:ca:32:45:94 dst 10.0.1.2 state 0x0002 (reachable) flags 0x0000
notice linux-cp/nl neigh_add: Added 10.0.1.2 lladdr 68:05:ca:32:45:94 iface TenGigabitEthernet3/0/0
ip addr del 10.0.1.1/30 dev e0
debug linux-cp/nl addr_del: Netlink route/addr: del idx 1488 family inet local 10.0.1.1/30 flags 0x0080 (permanent)
notice linux-cp/nl addr_del: Deleted 10.0.1.1/30 iface TenGigabitEthernet3/0/0
warn linux-cp/nl dispatch: ignored route/route: del family inet type 1 proto 2 table 254 dst 10.0.1.0/30 nexthops { idx 1488 }
warn linux-cp/nl dispatch: ignored route/route: del family inet type 3 proto 2 table 255 dst 10.0.1.3 nexthops { idx 1488 }
warn linux-cp/nl dispatch: ignored route/route: del family inet type 3 proto 2 table 255 dst 10.0.1.0 nexthops { idx 1488 }
warn linux-cp/nl dispatch: ignored route/route: del family inet type 2 proto 2 table 255 dst 10.0.1.1 nexthops { idx 1488 }
debug linux-cp/nl neigh_del: Netlink route/neigh: del idx 1488 family inet lladdr 68:05:ca:32:45:94 dst 10.0.1.2 state 0x0002 (reachable) flags 0x0000
error linux-cp/nl neigh_del: Failed 10.0.1.2 iface TenGigabitEthernet3/0/0
```
It is this very last message that's a bit of a surprise. The ping brought the peer's lladdr into
the neighbor cache, and the subsequent address deletion first removed the address, then all the
typical local routes (the connected, the broadcast, the network, and the self/local). Linux then
also explicitly deleted the neighbor, which I suppose is correct behavior for Linux, were it not
that VPP already invalidates the neighbor cache and adds/removes the connected routes itself,
for example in `ip/ip4_forward.c` L826-L830 and L583.
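
To make the four ignored route messages in the log above less mysterious, here is a small Python sketch that derives them from the address alone. The helper name `kernel_routes_for()` is made up; the numeric values follow the Linux rtnetlink definitions (route type 1 is unicast, 2 is local, 3 is broadcast; table 254 is `main` and 255 is `local`). It only covers the IPv4 case.

```python
import ipaddress


def kernel_routes_for(addr):
    """Return (type, table, destination) for the routes Linux adds with an IPv4 address."""
    iface = ipaddress.ip_interface(addr)
    net = iface.network
    return [
        (2, 255, str(iface.ip)),               # local: the address itself
        (1, 254, str(net)),                    # unicast: the connected prefix
        (3, 255, str(net.network_address)),    # broadcast: the network address
        (3, 255, str(net.broadcast_address)),  # broadcast: the broadcast address
    ]


for rtype, table, dst in kernel_routes_for("10.0.1.1/30"):
    print(f"type {rtype} table {table} dst {dst}")
```

For `10.0.1.1/30` this yields the same four destinations as the `ignored route/route` lines: `10.0.1.1`, `10.0.1.0/30`, `10.0.1.0` and `10.0.1.3`.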
I can see more of these false positive non-errors like the one on `lcp_nl_neigh_del()` because
interface and directly connected route addition/deletion is slightly different in VPP than in Linux.
So, I decide to take a little shortcut -- if an addition returns "already there", or a deletion returns
"no such entry", I'll just consider it a successful addition and deletion respectively, saving my eyes
from being screamed at by this red error message. I changed that in this
[[commit](https://git.ipng.ch/ipng/lcpng/commit/d63fbd8a9a612d038aa385e79a57198785d409ca)],
turning this situation into a friendly green notice instead.
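
The shortcut boils down to a small decision table, sketched here in Python. The result strings and the functions `neigh_add()`, `neigh_del()` and `is_success()` are stand-ins invented for this example; the real plugin checks VPP's own return codes.

```python
ALREADY_THERE = "already there"
NO_SUCH_ENTRY = "no such entry"

neighbors = set()


def neigh_add(key):
    """Return None on success, or a reason string on failure."""
    if key in neighbors:
        return ALREADY_THERE
    neighbors.add(key)
    return None


def neigh_del(key):
    if key not in neighbors:
        return NO_SUCH_ENTRY
    neighbors.remove(key)
    return None


def is_success(operation, result):
    """Treat 'the state is already what was asked for' as success."""
    if result is None:
        return True
    if operation == "add" and result == ALREADY_THERE:
        return True
    if operation == "del" and result == NO_SUCH_ENTRY:
        return True
    return False  # anything else is still a real error
```

With this rule, deleting a neighbor that VPP already flushed on its own is logged as a notice rather than an error, while unrelated failures still surface.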
### Netlink: Link (existing)
There's a bunch of use cases for these messages `RTM_NEWLINK` and `RTM_DELLINK`. They carry information
about carrier (link, no-link), admin state (up/down), MTU, and so on. The function `lcp_nl_link_del()`
is the easier of the two. If I see a message like this for an ifindex that VPP has a _LIP_ for, I'll
just remove it. This means first calling the `lcp_itf_pair_delete()` function and then, if the message
was for a VLAN interface, removing the accompanying sub-interface (both the physical one, eg. `TenGigabitEthernet3/0/0.1234`,
as well as the TAP that we used to communicate with the host, eg. `tap8.1234`).
The other message (the `RTM_NEWLINK` one), is much more complicated, because it's actually many types
of operation all in one message type: We can set the link up/down, change its MTU, and change its MAC
address, in any combination, perhaps like so:
```
ip link set e0 mtu 9216 address 00:11:22:33:44:55 down
```
So in turn, `lcp_nl_link_add()` will first look at admin state and apply it to the phy and tap,
apply the MTU if it's different to what VPP has, and apply the MAC address if it's different to
what VPP has, notably applying MAC addresses only in 'hardware' interfaces, which I now know are
not just physical ones like `TenGigabitEthernet3/0/0` but also virtual ones like `BondEthernet0`.
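
A compact way to picture this is a function that compares the wanted state from the message with what VPP currently has. The Python below is a model only: the `Link` dataclass, its `is_hardware` field and the function `link_add()` are hypothetical, and comparing the admin state before applying it is a simplification of mine.

```python
from dataclasses import dataclass


@dataclass
class Link:
    admin_up: bool
    mtu: int
    mac: str
    is_hardware: bool = True  # phy or bond; a sub-interface would be False


def link_add(vpp, want_up, want_mtu, want_mac):
    """Apply one NEWLINK-style update to `vpp` and return what was changed."""
    changes = []
    if vpp.admin_up != want_up:
        vpp.admin_up = want_up
        changes.append("state")
    if vpp.mtu != want_mtu:
        vpp.mtu = want_mtu
        changes.append("mtu")
    # MAC addresses are only applied on 'hardware' interfaces
    if vpp.is_hardware and vpp.mac != want_mac:
        vpp.mac = want_mac
        changes.append("mac")
    return changes
```

Replaying the `ip link set e0 mtu 9216 address 00:11:22:33:44:55 down` example against an interface that is up with MTU 9000 changes all three attributes, and replaying it a second time changes nothing.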
One thing I noticed, is that link state and MTU changes tend to go around in circles (from Netlink
into VPP, with this code, but when `lcp-sync` is on in the interface mirror plugin, changes to link
and mtu will trigger a callback there, which will in turn generate a Netlink message, and so on).
To avoid this loop, I temporarily turn off `lcp-sync` just before handling a batch of messages, and
turn it back to its original state when I'm done with that.
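
The save, disable, restore dance is a natural fit for a guard. As a hedged sketch in Python (the `settings` dictionary and the `sync_paused()` helper are invented for this example; the plugin does the equivalent in C around its batch handler):

```python
from contextlib import contextmanager

settings = {"lcp-sync": True}


@contextmanager
def sync_paused(cfg):
    """Turn lcp-sync off for the duration of a batch, then restore it."""
    original = cfg["lcp-sync"]
    cfg["lcp-sync"] = False
    try:
        yield
    finally:
        # Restored on every exit path, including errors, and restored to the
        # original value rather than blindly switched on.
        cfg["lcp-sync"] = original


with sync_paused(settings):
    # Changes applied to VPP here will not be echoed back into Linux.
    pass
```

Restoring the saved value matters: if the operator had `lcp-sync` turned off to begin with, handling a batch must not silently turn it on.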
The code for add/del of existing links is in this
[[commit](https://git.ipng.ch/ipng/lcpng/commit/e604dd34784e029b41a47baa3179296d15b0632e)].
### Netlink: Link (new)
Here's where it gets interesting! What if the `RTM_NEWLINK` message was for an interface that VPP
doesn't have a _LIP_ for, but specifically describes a VLAN interface? Well, then clearly the operator
is trying to create a new sub-interface. And supporting that operation would be super cool, so let's go!
Using the earlier placeholder hint in `lcp_nl_link_add()` (see the previous
[[commit](https://git.ipng.ch/ipng/lcpng/commit/e604dd34784e029b41a47baa3179296d15b0632e)]),
I know that I've gotten a NEWLINK request but the Linux ifindex doesn't have a _LIP_. This could be
because the interface is entirely foreign to VPP, for example somebody created a dummy interface or
a VLAN sub-interface on one:
```
ip link add dum0 type dummy
ip link add link dum0 name dum0.10 type vlan id 10
```
Or perhaps more interestingly, the operator is actually trying to create a VLAN sub-interface on an
interface we created in VPP earlier, like these:
```
ip link add link e0 name e0.1234 type vlan id 1234
ip link add link e0.1234 name e0.1235 type vlan id 1000
ip link add link e0 name e0.1236 type vlan id 2345 proto 802.1ad
ip link add link e0.1236 name e0.1237 type vlan id 1000
```
None of these `RTM_NEWLINK` messages, each identified by its vif (Linux ifindex), will have a corresponding _LIP_.
So, I try to _create one_ by calling `lcp_nl_link_add_vlan()`.
First, I'll look up the parent ifindex (`dum0` or `e0` in the examples above). The first example parent,
`dum0`, doesn't have a _LIP_, so I bail after logging a warning. The second example however, `e0`,
definitely does have a _LIP_, so it's known to VPP.
Now, I have two further choices:
1. the _LIP_ is a phy (ie `TenGigabitEthernet3/0/0` or `BondEthernet0`) and this is a regular tagged
   interface with a given proto (dot1q or dot1ad); or
1. the _LIP_ is itself a subint (ie `TenGigabitEthernet3/0/0.1234`) and what I'm being asked for is
   actually a QinQ or QinAD sub-interface. Remember, there's an important difference:
   - In Linux these sub-interfaces are chained (`e0` creates child `e0.1234@e0` for a normal VLAN,
     and `e0.1234` creates child `e0.1235@e0.1234` for the QinQ).
   - In VPP these are actually all flat sub-interfaces, with the 'regular' VLAN interface carrying
     the `one_tag` flag with only an `outer_vlan_id` set, and the latter QinQ carrying the `two_tags`
     flag with both an `outer_vlan_id` (1234) and an `inner_vlan_id` (1000).
So I look up both the parent _LIP_ as well as the phy _LIP_. I now have all the ingredients I need to create
the VPP sub-interfaces with the correct inner-dot1q and outer dot1q or dot1ad.
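
The translation from Linux's chained view to VPP's flat view can be sketched as follows. This Python model is an illustration under assumptions: the `LinuxVlan` dataclass and the `flatten()` function are hypothetical names, and the sample data mirrors the four `ip link add` commands shown earlier.

```python
from dataclasses import dataclass


@dataclass
class LinuxVlan:
    name: str
    parent: str            # name of the Linux parent interface
    vlan_id: int
    proto: str = "dot1q"   # or "dot1ad"


def flatten(link, links, phys):
    """Return (phy, outer_proto, outer_vlan_id, inner_vlan_id), or None to bail.

    `links` maps names of known VLAN sub-interfaces to LinuxVlan objects and
    `phys` is the set of phy names that have a LIP.
    """
    if link.parent in phys:
        # Parent is a phy: a regular single-tagged sub-interface (one tag).
        return (link.parent, link.proto, link.vlan_id, None)
    parent = links.get(link.parent)
    if parent is None or parent.parent not in phys:
        return None  # parent is unknown to VPP, nothing to create
    # Parent is itself a sub-interface: its tag is the outer, ours the inner.
    return (parent.parent, parent.proto, parent.vlan_id, link.vlan_id)


links = {
    "e0.1234": LinuxVlan("e0.1234", "e0", 1234),
    "e0.1235": LinuxVlan("e0.1235", "e0.1234", 1000),
    "e0.1236": LinuxVlan("e0.1236", "e0", 2345, "dot1ad"),
    "e0.1237": LinuxVlan("e0.1237", "e0.1236", 1000),
}
```

Here `flatten(links["e0.1237"], links, {"e0"})` resolves to the phy `e0` with outer dot1ad 2345 and inner dot1q 1000, which is the QinAD case, while a VLAN on top of `dum0` returns `None`.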
Of course, I don't really know what subinterface ID to use. It's appealing to "just" use the vlan id,
but that's not helpful if the outer tag and the inner tag are the same. So I write a helper function
`vnet_sw_interface_get_available_subid()` whose job it is to return an unused subid for the phy,
starting from 1.
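
The idea behind such a helper fits in a few lines. The Python below is a sketch, not the C helper itself: it assumes the set of sub-interface IDs already in use on the phy is available as an iterable, and picking the lowest free ID is my assumption.

```python
def get_available_subid(used_subids):
    """Return the lowest sub-interface ID not yet in use, starting from 1."""
    used = set(used_subids)
    subid = 1
    while subid in used:
        subid += 1
    return subid
```

Note that the result is independent of the VLAN tags: a phy that already has sub-interfaces 1, 2 and 4 gets 3 for the next one, whatever tags it carries.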
Here as well, the interface plugin can be configured to automatically create _LIPs_ for sub-interfaces,
which I have to turn off temporarily to let my new form of creation do its thing. I carefully ensure that
the thread barrier is taken/released and the original setting of `lcp-auto-subint` is restored at all
exit points. One cool thing is that the new link's name is given in the Netlink message, so I can just
use that one. I like the aesthetic a bit more, because here the operator can give the Linux interface
any name they like, whereas in the other direction, VPP's `lcp-auto-subint` feature has to make up
a boring `<phy>.<subid>` name.
Alright, without further ado, the code for the main innovation here, the implementation of
`lcp_nl_link_add_vlan()`, is in this
[[commit](https://git.ipng.ch/ipng/lcpng/commit/45f408865688eb7ea0cdbf23aa6f8a973be49d1a)].
## Results
The functional regression test I made on day one, the one that ensures end-to-end connectivity to and
from the Linux host interfaces works for all 5 interface types (untagged, .1q tagged, QinQ, .1ad tagged
and QinAD) and for both physical and virtual interfaces (like `TenGigabitEthernet3/0/0` and `BondEthernet0`),
still works.
After this code is in, the operator will only have to create a _LIP_ for any phy interfaces, and
can rely on the new Netlink Listener plugin and the use of `ip` in Linux for all the rest. This
implementation starts approaching 'vanilla' Linux user experience!
Here's a new screencast [[asciinema](/assets/vpp/432243.cast), [gif](/assets/vpp/432243.gif)]
showing me playing around a bit, demonstrating that synchronization works pretty well in both
directions, a huge improvement from the [[previous asciinema](/assets/vpp/430411.cast),
[gif](/assets/vpp/430411.gif)] in my [[second post]({{< ref "2021-08-13-vpp-2"
>}})], which was only two weeks ago:
{{< asciinema src="/assets/vpp/432243.cast" >}}
### Further Work
You will note that there's one important Netlink message type that's missing: routes! They are so
important in fact, that they're a topic of their very own post. Also, I haven't written the code
for them yet :-)
A few things worth noting, as future work.
**Multiple NetNS** - The original Netlink Listener ([ref](https://gerrit.fd.io/r/c/vpp/+/31122)) would
only listen to the default netns specified in the configuration file. This is problematic because the
interface plugin does allow interfaces to be made in other namespaces (by issuing
`lcp create ... host-if X netns foo`), the Netlink world of which will be unknown to VPP. I
created `struct lcp_nl_netlink_namespace` to hold the stuff needed for the Netlink listener,
which is a good starting point to create not one but multiple listeners, one for each unique
namespace that has one or more _LIPs_ defined. This is version-two work :)
**Multithreading** - In testing, I noticed that while my plugins themselves are (or seem to be..) thread
safe, `virtio` may not be totally clean, and I noticed that in a multithreaded VPP instance with many
workers, there's a crash in `lcp_arp_phy_node()` where `vlib_buffer_copy()` returns NULL, which should
not happen. When VPP is in such a state, other plugins (notably DHCP and IPv6 ND) also start complaining,
and `show errors` shows millions of `virtio-input` errors about unavailable buffers.
I do confirm though, that running VPP single threaded does not have these issues.
## Credits
I'd like to make clear that the Linux CP plugin is a collaboration between several great minds,
and that my work stands on other software engineers' shoulders. In particular most of the Netlink
socket handling and Netlink message queueing was written by Matthew Smith, and I've had a little bit
of help along the way from Neale Ranns and Jon Loeliger. I'd like to thank them for their work!
## Appendix
#### Ubuntu config
This configuration has been the exact same ever since [my first post]({{< ref "2021-08-12-vpp-1" >}}):
```
# Untagged interface
ip addr add 10.0.1.2/30 dev enp66s0f0
ip addr add 2001:db8:0:1::2/64 dev enp66s0f0
ip link set enp66s0f0 up mtu 9000
# Single 802.1q tag 1234
ip link add link enp66s0f0 name enp66s0f0.q type vlan id 1234
ip link set enp66s0f0.q up mtu 9000
ip addr add 10.0.2.2/30 dev enp66s0f0.q
ip addr add 2001:db8:0:2::2/64 dev enp66s0f0.q
# Double 802.1q tag 1234 inner-tag 1000
ip link add link enp66s0f0.q name enp66s0f0.qinq type vlan id 1000
ip link set enp66s0f0.qinq up mtu 9000
ip addr add 10.0.3.2/30 dev enp66s0f0.qinq
ip addr add 2001:db8:0:3::2/64 dev enp66s0f0.qinq
# Single 802.1ad tag 2345
ip link add link enp66s0f0 name enp66s0f0.ad type vlan id 2345 proto 802.1ad
ip link set enp66s0f0.ad up mtu 9000
ip addr add 10.0.4.2/30 dev enp66s0f0.ad
ip addr add 2001:db8:0:4::2/64 dev enp66s0f0.ad
# Double 802.1ad tag 2345 inner-tag 1000
ip link add link enp66s0f0.ad name enp66s0f0.qinad type vlan id 1000 proto 802.1q
ip link set enp66s0f0.qinad up mtu 9000
ip addr add 10.0.5.2/30 dev enp66s0f0.qinad
ip addr add 2001:db8:0:5::2/64 dev enp66s0f0.qinad
## Bond interface
ip link add bond0 type bond mode 802.3ad
ip link set enp66s0f2 down
ip link set enp66s0f3 down
ip link set enp66s0f2 master bond0
ip link set enp66s0f3 master bond0
ip link set enp66s0f2 up
ip link set enp66s0f3 up
ip link set bond0 up
ip addr add 10.1.1.2/30 dev bond0
ip addr add 2001:db8:1:1::2/64 dev bond0
ip link set bond0 up mtu 9000
# Single 802.1q tag 1234
ip link add link bond0 name bond0.q type vlan id 1234
ip link set bond0.q up mtu 9000
ip addr add 10.1.2.2/30 dev bond0.q
ip addr add 2001:db8:1:2::2/64 dev bond0.q
# Double 802.1q tag 1234 inner-tag 1000
ip link add link bond0.q name bond0.qinq type vlan id 1000
ip link set bond0.qinq up mtu 9000
ip addr add 10.1.3.2/30 dev bond0.qinq
ip addr add 2001:db8:1:3::2/64 dev bond0.qinq
# Single 802.1ad tag 2345
ip link add link bond0 name bond0.ad type vlan id 2345 proto 802.1ad
ip link set bond0.ad up mtu 9000
ip addr add 10.1.4.2/30 dev bond0.ad
ip addr add 2001:db8:1:4::2/64 dev bond0.ad
# Double 802.1ad tag 2345 inner-tag 1000
ip link add link bond0.ad name bond0.qinad type vlan id 1000 proto 802.1q
ip link set bond0.qinad up mtu 9000
ip addr add 10.1.5.2/30 dev bond0.qinad
ip addr add 2001:db8:1:5::2/64 dev bond0.qinad
```
#### VPP config
We can whittle down the VPP configuration to the bare minimum:
```
vppctl lcp default netns dataplane
vppctl lcp lcp-sync on
vppctl lcp lcp-auto-subint on
## Create `e0`
vppctl lcp create TenGigabitEthernet3/0/0 host-if e0
## Create `be0`
vppctl create bond mode lacp load-balance l34
vppctl bond add BondEthernet0 TenGigabitEthernet3/0/2
vppctl bond add BondEthernet0 TenGigabitEthernet3/0/3
vppctl set interface state TenGigabitEthernet3/0/2 up
vppctl set interface state TenGigabitEthernet3/0/3 up
vppctl lcp create BondEthernet0 host-if be0
```
And the rest of the configuration work is done entirely from the Linux side!
```
IP="sudo ip netns exec dataplane ip"
## `e0` aka TenGigabitEthernet3/0/0
$IP link add link e0 name e0.1234 type vlan id 1234
$IP link add link e0.1234 name e0.1235 type vlan id 1000
$IP link add link e0 name e0.1236 type vlan id 2345 proto 802.1ad
$IP link add link e0.1236 name e0.1237 type vlan id 1000
$IP link set e0 up mtu 9000
$IP addr add 10.0.1.1/30 dev e0
$IP addr add 2001:db8:0:1::1/64 dev e0
$IP addr add 10.0.2.1/30 dev e0.1234
$IP addr add 2001:db8:0:2::1/64 dev e0.1234
$IP addr add 10.0.3.1/30 dev e0.1235
$IP addr add 2001:db8:0:3::1/64 dev e0.1235
$IP addr add 10.0.4.1/30 dev e0.1236
$IP addr add 2001:db8:0:4::1/64 dev e0.1236
$IP addr add 10.0.5.1/30 dev e0.1237
$IP addr add 2001:db8:0:5::1/64 dev e0.1237
## `be0` aka BondEthernet0
$IP link add link be0 name be0.1234 type vlan id 1234
$IP link add link be0.1234 name be0.1235 type vlan id 1000
$IP link add link be0 name be0.1236 type vlan id 2345 proto 802.1ad
$IP link add link be0.1236 name be0.1237 type vlan id 1000
$IP link set be0 up mtu 9000
$IP addr add 10.1.1.1/30 dev be0
$IP addr add 2001:db8:1:1::1/64 dev be0
$IP addr add 10.1.2.1/30 dev be0.1234
$IP addr add 2001:db8:1:2::1/64 dev be0.1234
$IP addr add 10.1.3.1/30 dev be0.1235
$IP addr add 2001:db8:1:3::1/64 dev be0.1235
$IP addr add 10.1.4.1/30 dev be0.1236
$IP addr add 2001:db8:1:4::1/64 dev be0.1236
$IP addr add 10.1.5.1/30 dev be0.1237
$IP addr add 2001:db8:1:5::1/64 dev be0.1237
```
#### Final note
You may have noticed that the [commit] links are all to git commits in my private working copy. I
want to wait until my [previous work](https://gerrit.fd.io/r/c/vpp/+/33481) is reviewed and
submitted before piling on more changes. Feel free to contact vpp-dev@ for more information in the
meantime :-)