---
date: "2022-04-02T08:50:19Z"
title: VPP Configuration - Part2
---

{{< image width="200px" float="right" src="/assets/vpp/fdio-color.svg" alt="VPP" >}}

# About this series

I use VPP - Vector Packet Processor - extensively at IPng Networks. Earlier this year, the VPP community
merged the [Linux Control Plane]({%post_url 2021-08-12-vpp-1 %}) plugin. I wrote about its deployment
to both regular servers like the [Supermicro]({%post_url 2021-09-21-vpp-7 %}) routers that run on our
[AS8298]({% post_url 2021-02-27-network %}), as well as virtual machines running in
[KVM/Qemu]({% post_url 2021-12-23-vpp-playground %}).

Now that I've been running VPP in production for about half a year, I can't help but notice one specific
drawback: VPP is a programmable dataplane, and _by design_ it does not include any configuration or
controlplane management stack. It's meant to be integrated into a full stack by operators. For end-users,
this unfortunately means that typing on the CLI won't persist any configuration, and if VPP is restarted,
it will not pick up where it left off. There's one developer convenience in the form of the `exec`
command-line (and startup.conf!) option, which will read a file and apply the contents to the CLI line
by line. However, if any typo is made in the file, processing immediately stops. It's meant as a convenience
for VPP developers, and is certainly not a useful configuration method for all but the simplest topologies.

Luckily, VPP comes with an extensive set of APIs to allow it to be programmed. So in this series of posts,
I'll detail the work I've done to create a configuration utility that can take a YAML configuration file,
compare it to a running VPP instance, and step-by-step plan through the API calls needed to safely apply
the configuration to the dataplane. Welcome to `vppcfg`!

In this second post of the series, I want to talk a little bit about what planning a path from a running
configuration to a desired new configuration might look like.

**Note**: Code is on [my GitHub](https://github.com/pimvanpelt/vppcfg), but it's not quite ready for
prime-time yet. Take a look, and engage with us on GitHub (pull requests preferred over issues themselves)
or reach out by [contacting us](/s/contact/).

## VPP Config: a DAG

Before we dive into my `vppcfg` code, let me first introduce a mental model of how configuration is built. We
rarely stop and think about it, but when we configure our routers (no matter if it's a Cisco or a Juniper or
a VPP router), in our mind we logically order the operations in a very particular way. To state the obvious,
if I want to create a sub-interface which also has an address, I would create the sub-int _before_ adding the
address, right? Similarly, if I wanted to expose a sub-interface `Hu12/0/0.100` in Linux as a _LIP_, I would
create it only _after_ having created a _LIP_ for the parent interface `Hu12/0/0`, to satisfy Linux's
requirement that all sub-interfaces have a parent interface, like so:

```
vpp# create sub HundredGigabitEthernet12/0/0 100
vpp# set interface ip address HundredGigabitEthernet12/0/0.100 192.0.2.1/29
vpp# lcp create HundredGigabitEthernet12/0/0 host-if ice0
vpp# lcp create HundredGigabitEthernet12/0/0.100 host-if ice0.100
vpp# set interface state HundredGigabitEthernet12/0/0 up
vpp# set interface state HundredGigabitEthernet12/0/0.100 up
```

Of course some of the ordering doesn't strictly matter. For example, I can set the state of
`Hu12/0/0.100` up before adding the address, or after adding the address, or even after adding the
_LIP_, but one thing is certain: I cannot set its state to up before it was created in the first place!
In the other direction, when removing things, it's easy to see that you cannot manipulate the state
of a sub-interface after deleting it, so to cleanly remove the construction above, I would have to
walk the statements back in reverse, like so:

```
vpp# set interface state HundredGigabitEthernet12/0/0.100 down
vpp# set interface state HundredGigabitEthernet12/0/0 down
vpp# lcp delete HundredGigabitEthernet12/0/0.100 host-if ice0.100
vpp# lcp delete HundredGigabitEthernet12/0/0 host-if ice0
vpp# set interface ip address del HundredGigabitEthernet12/0/0.100 192.0.2.1/29
vpp# delete sub HundredGigabitEthernet12/0/0.100
```

Because of this reasonably straightforward ordering, it's possible to construct a graph showing
operations that depend on other operations having been completed beforehand. Such a graph is called
a [Directed Acyclic Graph](https://en.wikipedia.org/wiki/Directed_acyclic_graph) or _DAG_.

{{< image width="400px" float="left" src="/assets/vppcfg/vppcfg-dag.png" alt="DAG" >}}

First some theory (from Wikipedia): A directed graph is a DAG if and only if it can be topologically
ordered, by arranging the vertices as a linear ordering that is consistent with all edge directions.
DAGs have numerous scientific and computational applications, but the one I'm mostly interested in here
is dependency mapping and computational scheduling.

A graph is formed by vertices and by edges connecting pairs of vertices, where the vertices are
objects that might exist in VPP (interfaces, bridge-domains, VXLAN tunnels, IP addresses, etc),
and these objects are connected in pairs by edges. In the case of a directed graph, each edge has an
orientation (or direction), from one (source) vertex to another (destination) vertex. A path in a
directed graph is a sequence of edges having the property that the ending vertex of each edge in the
sequence is the same as the starting vertex of the next edge in the sequence; a path forms a cycle
if the starting vertex of its first edge equals the ending vertex of its last edge. A directed acyclic
graph is a directed graph that has no cycles, which in this particular case means that objects'
existence can't rely on other things that ultimately rely back on their own existence.

After I got that technobabble out of the way, practically speaking, the _edges_ in this graph model
dependencies. Let me give a few examples:

1. The arrow from _Sub Interface_ pointing at _BondEther_ and _Physical Int_ makes the claim that
   for the sub-int to exist, it _depends on_ the existence of either a BondEthernet, or a PHY.
1. The arrow from the _BondEther_ to the _Physical Int_ makes the claim that for the BondEthernet
   to work, it must have one or more PHYs in it.
1. There is no arrow from _BondEther_ to _Sub Interface_, which makes the claim that a BondEthernet
   does not depend on sub-interfaces: there is no need for a sub-int to exist in order for a
   BondEthernet to work.

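To make the DAG idea concrete, here is a minimal sketch using Python's standard `graphlib` module.
It is not taken from `vppcfg`; the `DEPENDS_ON` map and both helper functions are hypothetical names
I made up for illustration, and the object labels simply reuse the `Hu12/0/0` example from above.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map (an assumption for illustration, not vppcfg's
# data model): each key depends on every object in its set of predecessors.
DEPENDS_ON = {
    "Hu12/0/0": set(),
    "Hu12/0/0.100": {"Hu12/0/0"},
    "lcp ice0": {"Hu12/0/0"},
    "lcp ice0.100": {"Hu12/0/0.100", "lcp ice0"},
}


def creation_order(depends_on):
    """Return the objects so that every dependency precedes its dependents."""
    return list(TopologicalSorter(depends_on).static_order())


def removal_order(depends_on):
    """Removal walks the same ordering backwards: dependents go first."""
    return list(reversed(creation_order(depends_on)))


if __name__ == "__main__":
    print("create:", creation_order(DEPENDS_ON))
    print("remove:", removal_order(DEPENDS_ON))
```

If the map contained a cycle, `static_order()` would raise `graphlib.CycleError`, which is exactly the
"objects can't rely on themselves" property described above.
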
## VPP Config: Ordering

In my [previous]({% post_url 2022-03-27-vppcfg-1 %}) post, I talked about a bunch of constraints that
make certain YAML configurations invalid (for example, having both _dot1q_ and _dot1ad_ on a sub-interface,
which wouldn't make any sense). Here, I'm going to talk about another type of constraint: ***Temporal
Constraints*** are statements about the ordering of operations. With the example DAG above, I derive the
following constraints:

* A parent interface must exist before a sub-interface can be created on it
* An interface (regardless of sub-int or phy) must exist before an IP address can be added to it
* A _LIP_ can be created on a sub-int only if its parent PHY has a _LIP_
* _LIPs_ must be removed from all sub-interfaces before a PHY's _LIP_ can be removed
* The admin-state of a sub-interface can only be up if its PHY is up
* ... and so on.

But there's a second thing to keep in mind, and this is a bit more specific to the VPP configuration
operations themselves. Sometimes, I may find that an object already exists, say a sub-interface, but
that it has configuration attributes that are not what I wanted. For example, I may have previously
configured a sub-int to be of a certain encapsulation `dot1q 1000 inner-dot1q 1234`, but I changed
my mind and want the sub-int to now be `dot1ad 1000 inner-dot1q 1234` instead. Some attributes of
an interface can be changed on the fly (like the MTU, for example), but some really cannot, and in
my example here, the encapsulation change has to be done another way.

I'll make an obvious but hopefully helpful observation: I can't create the second sub-int with
the same subid, because one already exists (duh). The intuitive way to solve this, of course, is to
delete the old sub-int _first_ and then create a _new_ sub-int with the correct attributes (`dot1ad`
outer encapsulation).

Here's another scenario that illustrates the ordering: Let's say I want to move an IP address
from interface A to interface B. In VPP, I can't configure the same IP address/prefixlen on two
interfaces at the same time, so as with the previous scenario of the encap changing, I will want
to remove the IP address from A before adding it to B.

Come to think of it, there are lots of scenarios where remove-before-add is required:
* If an interface was in bridge-domain A but now wants to be put in bridge-domain B, it'll have
  to be _removed_ from the first bridge before being _added_ to the second bridge, because an
  interface can't be in two bridges at the same time.
* If an interface was a member of a BondEthernet, but will be moved to be a member of a
  bridge-domain now, it will have to be _removed_ from the bond before being _added_ to the
  bridge, because an interface can't be both a bondethernet member and a member of a bridge
  at the same time.
* And to add to the list, the scenario above: A sub-interface that differs in its intended
  encapsulation must be _removed_ before a new one with the same `subid` can be _created_.

All of these cases can be modeled as edges (arrows) between vertices (objects) in the graph
describing the ordering of operations in VPP! I'm now ready to draw two important conclusions:

1. All objects that differ from their intended configuration must be removed before being
   added elsewhere, in order to avoid them being referenced/used twice.
1. All objects must be created before their attributes can be set.

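As a small illustration of remove-before-add, here is a sketch of the "move an IP address from A to B"
scenario. This is not `vppcfg` code: the function `plan_addresses` and its dictionary inputs are
assumptions of mine, and the emitted strings only mirror the `set interface ip address` CLI forms
shown earlier in this post.

```python
def plan_addresses(running, target):
    """Plan address changes so that every removal precedes every addition.

    Both arguments map an interface name to a set of address/prefixlen
    strings. This shape is a hypothetical simplification for illustration.
    """
    removes = []
    for ifname in sorted(running):
        stale = running[ifname] - target.get(ifname, set())
        removes += [f"set interface ip address del {ifname} {a}" for a in sorted(stale)]

    adds = []
    for ifname in sorted(target):
        missing = target[ifname] - running.get(ifname, set())
        adds += [f"set interface ip address {ifname} {a}" for a in sorted(missing)]

    # Remove first, then add: an address never exists on two interfaces at once.
    return removes + adds


if __name__ == "__main__":
    for line in plan_addresses({"A": {"192.0.2.1/29"}}, {"B": {"192.0.2.1/29"}}):
        print(line)
```

Running the same function with identical `running` and `target` inputs yields an empty list, which is
the behavior I would expect from any reconciler once the dataplane matches the configuration.
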
### vppcfg: Path Planning

By thinking about the configuration in this way, I can precisely predict the order of
operations needed to go from any running dataplane configuration to _any new_ target
dataplane configuration. A so-called path-planner emerges, which has three main phases of
execution:

1. **Prune** phase (remove objects from VPP that are not in the config)
1. **Create** phase (add objects to VPP that are in the config but not VPP)
1. **Sync** phase, for each object in the configuration

When removing things, care has to be taken to remove inner-most objects first (first removing
LCP, then QinQ, Dot1Q, BondEthernet, and lastly PHY), because indeed, there exists a dependency
relationship between objects in this DAG. Conversely, when creating objects, the edges flip their
directionality, because creation must be done on outer-most objects first (first creating the
PHY, then BondEthernet, Dot1Q, QinQ and lastly LCP).

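The three phases and their opposite orderings can be sketched in a few lines. This is a simplified
model and not how `vppcfg` is actually implemented: the `RANK` table, the `plan` function and the
"name to kind" dictionaries are all hypothetical, chosen only to mirror the PHY, BondEthernet, Dot1Q,
QinQ, LCP ordering described above.

```python
# Hypothetical ranking of object kinds, outer-most (0) to inner-most (4).
RANK = {"phy": 0, "bondethernet": 1, "dot1q": 2, "qinq": 3, "lcp": 4}


def plan(running, target):
    """Return (phase, name) steps; running/target map object name -> kind."""
    # Prune: objects not in the target config, inner-most kinds first.
    prune = sorted(
        (n for n in running if n not in target),
        key=lambda n: (-RANK[running[n]], n),
    )
    # Create: objects missing from the dataplane, outer-most kinds first.
    create = sorted(
        (n for n in target if n not in running),
        key=lambda n: (RANK[target[n]], n),
    )
    # Sync: visit every object in the target config (attribute diffing omitted).
    sync = sorted(target, key=lambda n: (RANK[target[n]], n))
    return (
        [("prune", n) for n in prune]
        + [("create", n) for n in create]
        + [("sync", n) for n in sync]
    )


if __name__ == "__main__":
    running = {"Hu12/0/0": "phy", "Hu12/0/0.100": "dot1q", "ice0.100": "lcp"}
    target = {"Hu12/0/0": "phy", "BondEthernet0": "bondethernet", "BondEthernet0.10": "dot1q"}
    for step in plan(running, target):
        print(step)
```

Note that this sketch only compares object names. Objects that exist but carry the wrong create-time
attributes (such as an encapsulation) would additionally need to land in both the prune and the
create lists.
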
For example, QinQ/QinAD sub-interfaces should be removed before their intermediary
Dot1Q/Dot1AD can be removed. As another example, the MTU of parents should grow before that of
their children, while children should shrink before their parent.

**Order matters**.

**Pruning**: First, `vppcfg` will ensure that no object has attributes which it should not have (e.g. IP
addresses) and that objects which are not needed (i.e. have been removed from the target config)
are destroyed. After this phase, I am certain that any object that exists in the dataplane,
both (a) has the right to exist (because it's in the target configuration), and (b) has the
correct create-time (i.e. non syncable) attributes.

**Creating**: Next, `vppcfg` will ensure that all objects that are not yet present (including the ones that
it just removed because they were present but had incorrect attributes), get (re)created in the
right order. After this phase, I am certain that _all objects_ in the dataplane now (a) have the
right to exist (because they are in the target configuration), (b) have the correct attributes,
but newly, also that (c) all objects that are in the target configuration also got created and
now exist in the dataplane.

**Syncing**: Finally, all objects are synchronized with the target configuration (IP addresses,
MTU etc), taking care to shrink children before their parents, and growing parents before their
children (this is for the special case of any given sub-interface's MTU having to be equal to or
lower than its parent's MTU).

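The MTU rule in that last phase is easy to state as a check. Below is a minimal sketch, assuming a
flat `mtu` dictionary and a `parent` dictionary that I invented for this example; `vppcfg`'s real
validators are structured differently.

```python
def mtu_violations(mtu, parent):
    """Return the children whose MTU exceeds their parent's MTU.

    mtu maps interface name -> MTU, parent maps child name -> parent name.
    Both shapes are hypothetical simplifications for illustration.
    """
    return sorted(child for child, par in parent.items() if mtu[child] > mtu[par])


if __name__ == "__main__":
    mtu = {"Hu12/0/0": 9000, "Hu12/0/0.1234": 1200, "Hu12/0/0.1235": 1100}
    parent = {"Hu12/0/0.1234": "Hu12/0/0", "Hu12/0/0.1235": "Hu12/0/0.1234"}
    print(mtu_violations(mtu, parent) or "no violations")
```

A check like this must hold for the target configuration, and the ordering of the sync phase exists
so that it also holds at every intermediate step along the way.
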
### vppcfg: Demonstration

I'll create three configurations and let vppcfg path-plan between them. I start a completely
empty VPP dataplane which has two GigabitEthernet and two HundredGigabitEthernet interfaces:

```
pim@hippo:~/src/vpp$ make run
    _______    _        _   _____  ___
 __/ __/ _ \  (_)__    | | / / _ \/ _ \
 _/ _// // / / / _ \   | |/ / ___/ ___/
 /_/ /____(_)_/\___/   |___/_/  /_/

DBGvpp# show interface
              Name               Idx    State  MTU (L3/IP4/IP6/MPLS)     Counter          Count
GigabitEthernet3/0/0              1     down         9000/0/0/0
GigabitEthernet3/0/1              2     down         9000/0/0/0
HundredGigabitEthernet12/0/0      3     down         9000/0/0/0
HundredGigabitEthernet12/0/1      4     down         9000/0/0/0
local0                            0     down          0/0/0/0
```

#### Demo 1: First time config (empty VPP)

First, starting simple, I write the following YAML configuration called `hippo4.yaml`. It defines a
few sub-interfaces, a bridgedomain with one QinQ sub-interface `Hu12/0/0.101` in it, and it then
cross-connects `Gi3/0/0.100` with `Hu12/0/1.100`, keeping most sub-interfaces at an MTU of 2000
(`Hu12/0/0.100` gets 3000) and their PHYs at an MTU of 9216:

```
interfaces:
  GigabitEthernet3/0/0:
    mtu: 9216
    sub-interfaces:
      100:
        mtu: 2000
        l2xc: HundredGigabitEthernet12/0/1.100
  GigabitEthernet3/0/1:
    description: Not Used
  HundredGigabitEthernet12/0/0:
    mtu: 9216
    sub-interfaces:
      100:
        mtu: 3000
      101:
        mtu: 2000
        encapsulation:
          dot1q: 100
          inner-dot1q: 200
          exact-match: True
  HundredGigabitEthernet12/0/1:
    mtu: 9216
    sub-interfaces:
      100:
        mtu: 2000
        l2xc: GigabitEthernet3/0/0.100

bridgedomains:
  bd10:
    description: "Bridge Domain 10"
    mtu: 2000
    interfaces: [ HundredGigabitEthernet12/0/0.101 ]
```

If I offer this config to `vppcfg` and ask it to plan a path, there won't be any **pruning** going on,
because there are no objects in the newly started VPP dataplane that need to be deleted. But I do expect
to see a bunch of sub-interface and one bridge-domain **creation**, followed by **syncing** a bunch of
interfaces with bridge-domain memberships and L2 Cross Connects. Finally, the MTU of the interfaces will
be sync'd to their configured values, and the path is planned like so:

```
pim@hippo:~/src/vppcfg$ ./vppcfg -c hippo4.yaml plan
[INFO ] root.main: Loading configfile hippo4.yaml
[INFO ] vppcfg.config.valid_config: Configuration validated successfully
[INFO ] root.main: Configuration is valid
[INFO ] vppcfg.vppapi.connect: VPP version is 22.06-rc0~324-g247385823
create sub GigabitEthernet3/0/0 100 dot1q 100 exact-match
create sub HundredGigabitEthernet12/0/0 100 dot1q 100 exact-match
create sub HundredGigabitEthernet12/0/1 100 dot1q 100 exact-match
create sub HundredGigabitEthernet12/0/0 101 dot1q 100 inner-dot1q 200 exact-match
create bridge-domain 10
set interface l2 bridge HundredGigabitEthernet12/0/0.101 10
set interface l2 tag-rewrite HundredGigabitEthernet12/0/0.101 pop 2
set interface l2 xconnect GigabitEthernet3/0/0.100 HundredGigabitEthernet12/0/1.100
set interface l2 tag-rewrite GigabitEthernet3/0/0.100 pop 1
set interface l2 xconnect HundredGigabitEthernet12/0/1.100 GigabitEthernet3/0/0.100
set interface l2 tag-rewrite HundredGigabitEthernet12/0/1.100 pop 1
set interface mtu 9216 GigabitEthernet3/0/0
set interface mtu 9216 HundredGigabitEthernet12/0/0
set interface mtu 9216 HundredGigabitEthernet12/0/1
set interface mtu packet 1500 GigabitEthernet3/0/1
set interface mtu packet 9216 GigabitEthernet3/0/0
set interface mtu packet 9216 HundredGigabitEthernet12/0/0
set interface mtu packet 9216 HundredGigabitEthernet12/0/1
set interface mtu packet 2000 GigabitEthernet3/0/0.100
set interface mtu packet 3000 HundredGigabitEthernet12/0/0.100
set interface mtu packet 2000 HundredGigabitEthernet12/0/1.100
set interface mtu packet 2000 HundredGigabitEthernet12/0/0.101
set interface mtu 1500 GigabitEthernet3/0/1
set interface state GigabitEthernet3/0/0 up
set interface state GigabitEthernet3/0/0.100 up
set interface state GigabitEthernet3/0/1 up
set interface state HundredGigabitEthernet12/0/0 up
set interface state HundredGigabitEthernet12/0/0.100 up
set interface state HundredGigabitEthernet12/0/0.101 up
set interface state HundredGigabitEthernet12/0/1 up
set interface state HundredGigabitEthernet12/0/1.100 up
[INFO ] root.main: Planning succeeded
```

On the `vppctl` commandline, I can simply cut-and-paste these CLI commands and the dataplane ends up
configured exactly as desired in the `hippo4.yaml` configuration file. One nice way to tell if
the reconciliation of the config file into the running VPP instance was successful is by running the
planner again with the same YAML config file. It should not find anything worth pruning, creating or
syncing, and indeed:

```
pim@hippo:~/src/vppcfg$ ./vppcfg -c hippo4.yaml plan
[INFO ] root.main: Loading configfile hippo4.yaml
[INFO ] vppcfg.config.valid_config: Configuration validated successfully
[INFO ] root.main: Configuration is valid
[INFO ] vppcfg.vppapi.connect: VPP version is 22.06-rc0~324-g247385823
[INFO ] root.main: Planning succeeded
```

#### Demo 2: Moving from one config to another

To demonstrate how my reconciliation algorithm works in practice, I decide to invent a radically
different configuration for Hippo, called `hippo12.yaml`, in which a new BondEthernet appears,
two of its sub-interfaces are cross connected, `Hu12/0/0` now gets a _LIP_ and some IP addresses, and
the bridge-domain `bd10` is replaced by two others, `bd1` and `bd11`, the former of which also sports
a BVI (with a _LIP_ called `bvi1`) and a VXLAN Tunnel bridged into `bd1` for good measure:

```
bondethernets:
  BondEthernet0:
    interfaces: [ GigabitEthernet3/0/0, GigabitEthernet3/0/1 ]

interfaces:
  GigabitEthernet3/0/0:
    mtu: 9000
    description: "LAG #1"
  GigabitEthernet3/0/1:
    mtu: 9000
    description: "LAG #2"

  HundredGigabitEthernet12/0/0:
    lcp: "ice12-0-0"
    mtu: 9000
    addresses: [ 192.0.2.17/30, 2001:db8:3::1/64 ]
    sub-interfaces:
      1234:
        mtu: 1200
        lcp: "ice0.1234"
        encapsulation:
          dot1q: 1234
          exact-match: True
      1235:
        mtu: 1100
        lcp: "ice0.1234.1000"
        encapsulation:
          dot1q: 1234
          inner-dot1q: 1000
          exact-match: True

  HundredGigabitEthernet12/0/1:
    mtu: 2000
    description: "Bridged"
  BondEthernet0:
    mtu: 9000
    lcp: "bond0"
    sub-interfaces:
      10:
        lcp: "bond0.10"
        mtu: 3000
      100:
        mtu: 2500
        l2xc: BondEthernet0.200
        encapsulation:
          dot1q: 100
          exact-match: False
      200:
        mtu: 2500
        l2xc: BondEthernet0.100
        encapsulation:
          dot1q: 200
          exact-match: False
      500:
        mtu: 2000
        encapsulation:
          dot1ad: 500
          exact-match: False
      501:
        mtu: 2000
        encapsulation:
          dot1ad: 501
          exact-match: False
  vxlan_tunnel1:
    mtu: 2000

loopbacks:
  loop0:
    lcp: "lo0"
    addresses: [ 10.0.0.1/32, 2001:db8::1/128 ]
  loop1:
    lcp: "bvi1"
    addresses: [ 10.0.1.1/24, 2001:db8:1::1/64 ]

bridgedomains:
  bd1:
    mtu: 2000
    bvi: loop1
    interfaces: [ BondEthernet0.500, BondEthernet0.501, HundredGigabitEthernet12/0/1, vxlan_tunnel1 ]
  bd11:
    mtu: 1500

vxlan_tunnels:
  vxlan_tunnel1:
    local: 192.0.2.1
    remote: 192.0.2.2
    vni: 101
```

Before writing `vppcfg`, the art of moving from `hippo4.yaml` to this radically different `hippo12.yaml`
would have been a nightmare, and would almost certainly have caused me to miss a step and cause an
outage. But, due to the fundamental understanding of ordering, and the methodical execution of
**pruning**, **creating** and **syncing** the objects, the path planner comes up with the following
sequence, which I'll break down in its three constituent phases:

```
pim@hippo:~/src/vppcfg$ ./vppcfg -c hippo12.yaml plan
[INFO ] root.main: Loading configfile hippo12.yaml
[INFO ] vppcfg.config.valid_config: Configuration validated successfully
[INFO ] root.main: Configuration is valid
[INFO ] vppcfg.vppapi.connect: VPP version is 22.06-rc0~324-g247385823
set interface state HundredGigabitEthernet12/0/0.101 down
set interface state GigabitEthernet3/0/0.100 down
set interface state HundredGigabitEthernet12/0/0.100 down
set interface state HundredGigabitEthernet12/0/1.100 down
set interface l2 tag-rewrite HundredGigabitEthernet12/0/0.101 disable
set interface l3 HundredGigabitEthernet12/0/0.101
create bridge-domain 10 del
set interface l2 tag-rewrite GigabitEthernet3/0/0.100 disable
set interface l3 GigabitEthernet3/0/0.100
set interface l2 tag-rewrite HundredGigabitEthernet12/0/1.100 disable
set interface l3 HundredGigabitEthernet12/0/1.100
delete sub HundredGigabitEthernet12/0/0.101
delete sub GigabitEthernet3/0/0.100
delete sub HundredGigabitEthernet12/0/0.100
delete sub HundredGigabitEthernet12/0/1.100
```

First, `vppcfg` concludes that `Hu12/0/0.100`, `Hu12/0/0.101`, `Hu12/0/1.100` and `Gi3/0/0.100` are
no longer needed, so it sets them all admin-state down. The bridge-domain `bd10` no longer has the
right to exist, the poor thing. But before it is deleted, the interface that was in `bd10` can be pruned
(membership _depends_ on the bridge, so in pruning, dependents are removed before their dependencies).
Considering `Hu12/0/1.100` and `Gi3/0/0.100` were an L2XC pair before, they are returned to default
(L3) mode and because it's no longer needed, the [VLAN Gymnastics]({%post_url 2022-02-14-vpp-vlan-gym %})
tag rewriting is also cleaned up for both interfaces. Finally, the sub-interfaces that do not appear
in the target configuration are deleted, completing the **pruning** phase.

It then continues with the **create** phase:

```
create loopback interface instance 0
create loopback interface instance 1
create bond mode lacp load-balance l34 id 0
create vxlan tunnel src 192.0.2.1 dst 192.0.2.2 instance 1 vni 101 decap-next l2
create sub HundredGigabitEthernet12/0/0 1234 dot1q 1234 exact-match
create sub BondEthernet0 10 dot1q 10 exact-match
create sub BondEthernet0 100 dot1q 100
create sub BondEthernet0 200 dot1q 200
create sub BondEthernet0 500 dot1ad 500
create sub BondEthernet0 501 dot1ad 501
create sub HundredGigabitEthernet12/0/0 1235 dot1q 1234 inner-dot1q 1000 exact-match
create bridge-domain 1
create bridge-domain 11
lcp create HundredGigabitEthernet12/0/0 host-if ice12-0-0
lcp create BondEthernet0 host-if bond0
lcp create loop0 host-if lo0
lcp create loop1 host-if bvi1
lcp create HundredGigabitEthernet12/0/0.1234 host-if ice0.1234
lcp create BondEthernet0.10 host-if bond0.10
lcp create HundredGigabitEthernet12/0/0.1235 host-if ice0.1234.1000
```

Here, interfaces are created in order of loopbacks first, then BondEthernets, then Tunnels, and
finally sub-interfaces, first creating single-tagged and then creating dual-tagged sub-interfaces.
Of course, the BondEthernet has to be created before any sub-int will be able to be created on it.
Note that the QinQ `Hu12/0/0.1235` will be created after its intermediary parent `Hu12/0/0.1234`
due to this ordering requirement.

Then, the two new bridgedomains `bd1` and `bd11` are created, and finally the _LIP_ plumbing is
performed, starting with the PHY `ice12-0-0` and BondEthernet `bond0`, then the two loopbacks,
and only then advancing to the two single-tag dot1q interfaces and finally the QinQ interface. For
LCPs, this is very important, because in Linux, the interfaces are a tree, not a list. `ice12-0-0`
must be created before its child `ice0.1234@ice12-0-0` can be created, and only then can the QinQ
`ice0.1234.1000@ice0.1234` be created. This creation order follows from the DAG having an edge
signalling an LCP depending on the sub-interface, and an edge between the sub-interface with two
tags depending on the sub-interface with one tag, and an edge between the single-tagged sub-interface
depending on its PHY.

After all this work, `vppcfg` can assert (a) every object that now exists in VPP is in the
target configuration and (b) that any object that exists in the configuration also is present in
VPP (with the correct attributes).

But there's one last thing to do, and that's ensure that the attributes that can be changed at
runtime (IP addresses, L2XCs, BondEthernet and bridge-domain members, etc) are **sync'd** into
their respective objects in VPP based on what's in the target configuration:

```
bond add BondEthernet0 GigabitEthernet3/0/0
bond add BondEthernet0 GigabitEthernet3/0/1
comment { ip link set bond0 address 00:25:90:0c:05:01 }
set interface l2 bridge loop1 1 bvi
set interface l2 bridge BondEthernet0.500 1
set interface l2 tag-rewrite BondEthernet0.500 pop 1
set interface l2 bridge BondEthernet0.501 1
set interface l2 tag-rewrite BondEthernet0.501 pop 1
set interface l2 bridge HundredGigabitEthernet12/0/1 1
set interface l2 tag-rewrite HundredGigabitEthernet12/0/1 disable
set interface l2 bridge vxlan_tunnel1 1
set interface l2 tag-rewrite vxlan_tunnel1 disable
set interface l2 xconnect BondEthernet0.100 BondEthernet0.200
set interface l2 tag-rewrite BondEthernet0.100 pop 1
set interface l2 xconnect BondEthernet0.200 BondEthernet0.100
set interface l2 tag-rewrite BondEthernet0.200 pop 1
set interface state GigabitEthernet3/0/1 down
set interface mtu 9000 GigabitEthernet3/0/1
set interface state GigabitEthernet3/0/1 up
set interface mtu packet 9000 GigabitEthernet3/0/0
set interface mtu packet 9000 HundredGigabitEthernet12/0/0
set interface mtu packet 2000 HundredGigabitEthernet12/0/1
set interface mtu packet 2000 vxlan_tunnel1
set interface mtu packet 1500 loop0
set interface mtu packet 1500 loop1
set interface mtu packet 9000 GigabitEthernet3/0/1
set interface mtu packet 1200 HundredGigabitEthernet12/0/0.1234
set interface mtu packet 3000 BondEthernet0.10
set interface mtu packet 2500 BondEthernet0.100
set interface mtu packet 2500 BondEthernet0.200
set interface mtu packet 2000 BondEthernet0.500
set interface mtu packet 2000 BondEthernet0.501
set interface mtu packet 1100 HundredGigabitEthernet12/0/0.1235
set interface state GigabitEthernet3/0/0 down
set interface mtu 9000 GigabitEthernet3/0/0
set interface state GigabitEthernet3/0/0 up
set interface state HundredGigabitEthernet12/0/0 down
set interface mtu 9000 HundredGigabitEthernet12/0/0
set interface state HundredGigabitEthernet12/0/0 up
set interface state HundredGigabitEthernet12/0/1 down
set interface mtu 2000 HundredGigabitEthernet12/0/1
set interface state HundredGigabitEthernet12/0/1 up
set interface ip address HundredGigabitEthernet12/0/0 192.0.2.17/30
set interface ip address HundredGigabitEthernet12/0/0 2001:db8:3::1/64
set interface ip address loop0 10.0.0.1/32
set interface ip address loop0 2001:db8::1/128
set interface ip address loop1 10.0.1.1/24
set interface ip address loop1 2001:db8:1::1/64
set interface state HundredGigabitEthernet12/0/0.1234 up
set interface state HundredGigabitEthernet12/0/0.1235 up
set interface state BondEthernet0 up
set interface state BondEthernet0.10 up
set interface state BondEthernet0.100 up
set interface state BondEthernet0.200 up
set interface state BondEthernet0.500 up
set interface state BondEthernet0.501 up
set interface state vxlan_tunnel1 up
set interface state loop0 up
set interface state loop1 up
```

I'm not gonna lie, it's a tonne of work, but it's all a pretty straightforward juggle. The sync
phase will look at each object in the config and ensure that the attributes that same object has in the
dataplane are present and correct. In my demo, `hippo12.yaml` creates a lot of interfaces and IP
addresses, and changes the MTU of pretty much every interface, but in order:

* The bondethernet gets its members `Gi3/0/0` and `Gi3/0/1`. As an interesting aside, when VPP
|
|
creates a BondEthernet it'll initially assign it an ephemeral MAC address. Then, when its first
|
|
member is added, the MAC address of the BondEthernet will change to that of the first member.
|
|
The comment reminds me to also set this MAC on the Linux device `bond0`. In the future, I'll add
|
|
some `PyRoute2` code to do that automatically.
|
|
* BridgeDomains are next. The BVI `loop1` is added first, then a few sub-interfaces and a tunnel,
|
|
and VLAN tag-rewriting for tagged interfaces is configured. There are two bridges, but only one
|
|
of them has members, so there's not much (in fact, there's nothing) to do for the other one.
|
|
* L2 Cross Connects can be changed at runtime, and they're next. The two interfaces `BE0.100` and
|
|
`BE0.200` are connected to one another and tag-rewrites are set up for them, considering they
|
|
are both tagged sub-interfaces.
|
|
* MTU is next. There's two variants of this. The first one `set interface mtu` is actually a
|
|
change in the DPDK driver to change the maximum allowable frame size. For this change, some
|
|
interface types have to be brought down first, the max frame size changed, and then brought back
|
|
up again. For all the others, the MTU will be changed in a specific order:
|
|
1. PHYs will grow their MTU first, as growing a PHY is guaranteed to be always safe.
|
|
1. Sub-interfaces will shrink QinX first, then Dot1Q/Dot1AD, then untagged interfaces. This is
|
|
to ensure we do not leave VPP and LinuxCP in a state where a QinQ sub-int has a higher MTU
|
|
than any of its parents.
|
|
1. Sub-interfaces will grow untagged first, then DOt1Q/Dot1AD, and finally QinX sub-interfaces.
|
|
Same reason as step 2, no sub-interface will end up with a higher MTU than any of its
|
|
parents.
|
|
1. PHYs will shrink their MTU last. The YAML configuration validation asserts that no PHY can
|
|
have an MTU lower than any of its children, so this is safe.
|
|
* Finally, IP addresses are added to `Hu12/0/0`, `loop0` and `loop1`. I can guarantee that adding
IP addresses will not clash with any other interface, because pruning has already removed IP
addresses from interfaces where they don't belong.
|
|
* And to finish off, the admin state for interfaces is set, again going from PHY, Bond, Tunnel,
|
|
1-tagged sub-interfaces and finally 2-tagged sub-interfaces and loopbacks.
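
The four-step MTU ordering above can be sketched in a few lines of Python. This is a minimal illustration and not the actual `vppcfg` code: the `MtuChange` tuple, the level constants and the interface names are all hypothetical, made up for this example.

```python
from typing import NamedTuple

# Hypothetical nesting levels: a higher number means more deeply nested.
PHY, UNTAGGED, ONE_TAG, TWO_TAGS = range(4)


class MtuChange(NamedTuple):
    name: str      # interface name (made up for this example)
    level: int     # one of PHY, UNTAGGED, ONE_TAG, TWO_TAGS
    current: int   # MTU currently set in the dataplane
    target: int    # MTU wanted by the YAML config


def order_mtu_changes(changes):
    """Order MTU changes so no sub-interface ever exceeds its parent's MTU."""
    phys = [c for c in changes if c.level == PHY]
    subs = [c for c in changes if c.level != PHY]
    ordered = []
    # 1. PHYs grow first.
    ordered += [c for c in phys if c.target > c.current]
    # 2. Sub-interfaces shrink, deepest (QinX) first.
    ordered += sorted((c for c in subs if c.target < c.current),
                      key=lambda c: -c.level)
    # 3. Sub-interfaces grow, shallowest (untagged) first.
    ordered += sorted((c for c in subs if c.target > c.current),
                      key=lambda c: c.level)
    # 4. PHYs shrink last.
    ordered += [c for c in phys if c.target < c.current]
    return ordered


changes = [
    MtuChange("phyB", PHY, 9000, 1500),
    MtuChange("phyA.101", TWO_TAGS, 1500, 7000),
    MtuChange("phyB.200", ONE_TAG, 9000, 1500),
    MtuChange("phyA", PHY, 1500, 9000),
    MtuChange("phyB.201", TWO_TAGS, 9000, 1500),
    MtuChange("phyA.100", ONE_TAG, 1500, 8000),
]
for c in order_mtu_changes(changes):
    print(f"{c.name}: {c.current} -> {c.target}")
```

Here `phyA` grows before its children do, and `phyB` only shrinks once `phyB.201` and `phyB.200` have already been lowered.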
|
|
|
|
Let's put it to the test:
|
|
|
|
```
|
|
pim@hippo:~/src/vppcfg$ ./vppcfg -c hippo12.yaml plan -o hippo4-to-12.exec
|
|
[INFO ] root.main: Loading configfile hippo12.yaml
|
|
[INFO ] vppcfg.config.valid_config: Configuration validated successfully
|
|
[INFO ] root.main: Configuration is valid
|
|
[INFO ] vppcfg.vppapi.connect: VPP version is 22.06-rc0~324-g247385823
|
|
[INFO ] vppcfg.reconciler.write: Wrote 94 lines to hippo4-to-12.exec
|
|
[INFO ] root.main: Planning succeeded
|
|
|
|
pim@hippo:~/src/vppcfg$ vppctl exec ~/src/vppcfg/hippo4-to-12.exec
|
|
|
|
pim@hippo:~/src/vppcfg$ ./vppcfg -c hippo12.yaml plan
|
|
[INFO ] root.main: Loading configfile hippo12.yaml
|
|
[INFO ] vppcfg.config.valid_config: Configuration validated successfully
|
|
[INFO ] root.main: Configuration is valid
|
|
[INFO ] vppcfg.vppapi.connect: VPP version is 22.06-rc0~324-g247385823
|
|
[INFO ] root.main: Planning succeeded
|
|
```
|
|
|
|
Notice that after applying `hippo4-to-12.exec`, the planner had nothing else to say. VPP is now in
|
|
the target configuration state, slick!
|
|
|
|
### Demo 3: Returning VPP to empty
|
|
|
|
This one is easy, but shows the pruning in action. Let's say I wanted to return VPP to a default
|
|
configuration without any objects, and its interfaces all at MTU 1500:
|
|
|
|
```
interfaces:
  GigabitEthernet3/0/0:
    mtu: 1500
    description: Not Used
  GigabitEthernet3/0/1:
    mtu: 1500
    description: Not Used
  HundredGigabitEthernet12/0/0:
    mtu: 1500
    description: Not Used
  HundredGigabitEthernet12/0/1:
    mtu: 1500
    description: Not Used
```
|
|
|
|
Simply applying that plan:
|
|
|
|
```
|
|
pim@hippo:~/src/vppcfg$ ./vppcfg -c hippo-empty.yaml plan -o 12-to-empty.exec
|
|
[INFO ] root.main: Loading configfile hippo-empty.yaml
|
|
[INFO ] vppcfg.config.valid_config: Configuration validated successfully
|
|
[INFO ] root.main: Configuration is valid
|
|
[INFO ] vppcfg.vppapi.connect: VPP version is 22.06-rc0~324-g247385823
|
|
[INFO ] vppcfg.reconciler.write: Wrote 66 lines to 12-to-empty.exec
|
|
[INFO ] root.main: Planning succeeded
|
|
|
|
pim@hippo:~/src/vppcfg$ vppctl
|
|
vpp# exec ~/src/vppcfg/12-to-empty.exec
|
|
vpp# show interface
|
|
              Name               Idx    State  MTU (L3/IP4/IP6/MPLS)     Counter          Count
GigabitEthernet3/0/0              1      up          1500/0/0/0
GigabitEthernet3/0/1              2      up          1500/0/0/0
HundredGigabitEthernet12/0/0      3      up          1500/0/0/0
HundredGigabitEthernet12/0/1      4      up          1500/0/0/0
local0                            0     down          0/0/0/0
|
|
```
|
|
|
|
### Final notes
|
|
|
|
Now you may have been wondering why I would call the first file `hippo4.yaml` and the second one
|
|
`hippo12.yaml`. This is because I have 20 such YAML files that bring Hippo into all sorts of
|
|
esoteric configuration states, and I do this so that I can do a full integration test of any config
|
|
morphing into any other config:
|
|
|
|
```
for i in hippo[0-9]*.yaml; do
  echo "Clearing: Moving to hippo-empty.yaml"
  ./vppcfg -c hippo-empty.yaml > /tmp/vppcfg-exec-empty
  [ -s /tmp/vppcfg-exec-empty ] && vppctl exec /tmp/vppcfg-exec-empty

  for j in hippo[0-9]*.yaml; do
    echo " - Moving to $i .. "
    ./vppcfg -c "$i" > "/tmp/vppcfg-exec_$i"
    [ -s "/tmp/vppcfg-exec_$i" ] && vppctl exec "/tmp/vppcfg-exec_$i"

    echo " - Moving from $i to $j"
    ./vppcfg -c "$j" > "/tmp/vppcfg-exec_${i}_${j}"
    [ -s "/tmp/vppcfg-exec_${i}_${j}" ] && vppctl exec "/tmp/vppcfg-exec_${i}_${j}"

    echo " - Checking that from $j to $j is empty"
    ./vppcfg -c "$j" > "/tmp/vppcfg-exec_${j}_${j}_null"
  done
done
```
|
|
|
|
This starts Hippo off with an empty config, then moves it to `hippo1.yaml`, and from
there moves the configuration to each YAML file and back to `hippo1.yaml`. Doing this proves
that no matter which configuration I want to obtain, I can get there safely when the VPP dataplane
config starts out looking like what is described in `hippo1.yaml`. I'll then move it back to empty,
and into `hippo2.yaml`, doing the whole cycle again. So for 20 files, this means 400 or so
configuration transitions. Some of these are special: notably, moving from `hippoN.yaml` to
the same `hippoN.yaml` should result in zero diffs.
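
To make the size of that test matrix concrete, here's a small Python sketch that enumerates the transitions. The file names `hippo1.yaml` through `hippo20.yaml` are an assumption for illustration (the actual set of 20 files may be numbered differently), and `transitions()` is a hypothetical helper, not part of `vppcfg`.

```python
from itertools import product


def transitions(configs):
    """Return every (source, target) pair, including a config morphing into itself."""
    return list(product(configs, repeat=2))


# Assumed file names, for illustration only.
configs = [f"hippo{n}.yaml" for n in range(1, 21)]

pairs = transitions(configs)
# Transitions where source and target are the same file should yield an empty plan.
noop = [(src, dst) for src, dst in pairs if src == dst]

print(f"{len(pairs)} transitions, {len(noop)} of which should be no-ops")
```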
|
|
|
|
With this path planner reasonably well tested, I have pretty high confidence that `vppcfg` can
|
|
change the dataplane from any existing configuration to any desired target configuration.
|
|
|
|
## What's next
|
|
|
|
One thing I haven't mentioned yet is that the `vppcfg` path planner works by reading the API
|
|
configuration state exactly once (at startup), and then it figures out the CLI calls to print
|
|
without needing to talk to VPP again. This is super useful as it's a non-intrusive way to inspect
|
|
the changes before applying them, and it's a property I'd like to carry forward.
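
As a rough sketch of that idea (and explicitly not the real `vppcfg` implementation), planning can be written as a pure function over two snapshots: the state read once from VPP, and the target from the YAML file. The hypothetical `plan_addresses()` below only handles IP addresses, assumes both snapshots are plain dictionaries mapping interface names to sets of prefixes, and emits prune statements before create statements.

```python
def plan_addresses(current, target):
    """Return CLI lines moving `current` to `target` without talking to VPP.

    Both arguments map an interface name to a set of prefixes. This data
    model is an assumption made for the example.
    """
    prune, create = [], []
    # Prune first: remove addresses that the target config does not want.
    for ifname in sorted(current):
        for addr in sorted(current[ifname] - target.get(ifname, set())):
            prune.append(f"set interface ip address del {ifname} {addr}")
    # Create second: add addresses that are missing from the dataplane.
    for ifname in sorted(target):
        for addr in sorted(target[ifname] - current.get(ifname, set())):
            create.append(f"set interface ip address {ifname} {addr}")
    return prune + create


# Example: one prefix moves from loop1 to loop0.
current = {"loop0": {"192.0.2.1/32"}, "loop1": {"198.51.100.1/24"}}
target = {"loop0": {"192.0.2.1/32", "198.51.100.1/24"}, "loop1": set()}

for line in plan_addresses(current, target):
    print(line)
```

Because the function never touches VPP, planning from a state to itself simply returns an empty list, which is the zero-diff property the integration test checks for.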
|
|
|
|
However, I don't necessarily think that emitting the CLI statements is the best user experience;
they're mostly useful for analysis. What I really want to do is emit
|
|
API calls after the plan is created and reviewed/approved, directly reprogramming the VPP dataplane,
|
|
and likely the Linux network namespace interfaces as well, for example setting the MAC address of
|
|
a BondEthernet as I showed in that one comment above, or setting interface alias names based on
|
|
the configured descriptions.
|
|
|
|
However, the VPP API set needed to do this is not 100% baked yet. For example, I observed crashes
|
|
when tinkering with BVIs and Loopbacks ([thread](https://lists.fd.io/g/vpp-dev/message/21116)), and
|
|
fixed a few obvious errors in the Linux CP API ([gerrit](https://gerrit.fd.io/r/c/vpp/+/35479)) but
|
|
there are still a few more issues to work through before I can take the next step with `vppcfg`.
|
|
|
|
But for now, it's already helping me out tremendously at IPng Networks and I hope it'll be useful
|
|
for others, too.
|