---
date: "2022-04-02T08:50:19Z"
title: VPP Configuration - Part2
aliases:
- /s/articles/2022/04/02/vppcfg-2.html
---

{{< image width="200px" float="right" src="/assets/vpp/fdio-color.svg" alt="VPP" >}}

# About this series

I use VPP - Vector Packet Processor - extensively at IPng Networks. Earlier this year, the VPP
community merged the [Linux Control Plane]({{< ref "2021-08-12-vpp-1" >}}) plugin. I wrote about its
deployment to both regular servers like the [Supermicro]({{< ref "2021-09-21-vpp-7" >}}) routers that
run on our [AS8298]({{< ref "2021-02-27-network" >}}), as well as virtual machines running in
[KVM/Qemu]({{< ref "2021-12-23-vpp-playground" >}}).

Now that I've been running VPP in production for about half a year, I can't help but notice one specific
drawback: VPP is a programmable dataplane, and _by design_ it does not include any configuration or
controlplane management stack. It's meant to be integrated into a full stack by operators. For end-users,
this unfortunately means that typing on the CLI won't persist any configuration, and if VPP is restarted,
it will not pick up where it left off. There's one developer convenience in the form of the `exec`
command-line (and startup.conf!) option, which will read a file and apply the contents to the CLI line
by line. However, if any typo is made in the file, processing immediately stops. It's meant as a convenience
for VPP developers, and is certainly not a useful configuration method for all but the simplest topologies.

Luckily, VPP comes with an extensive set of APIs to allow it to be programmed. So in this series of posts,
I'll detail the work I've done to create a configuration utility that can take a YAML configuration file,
compare it to a running VPP instance, and step-by-step plan through the API calls needed to safely apply
the configuration to the dataplane. Welcome to `vppcfg`!
In this second post of the series, I want to talk a little bit about what planning a path from a running
configuration to a desired new configuration might look like.

**Note**: Code is on [my Github](https://git.ipng.ch/ipng/vppcfg), but it's not quite ready for
prime-time yet. Take a look, and engage with us on GitHub (pull requests preferred over issues themselves)
or reach out by [contacting us](/s/contact/).

## VPP Config: a DAG

Before we dive into my `vppcfg` code, let me first introduce a mental model of how configuration is
built. We rarely stop and think about it, but when we configure our routers (no matter if it's a
Cisco or a Juniper or a VPP router), in our mind we logically order the operations in a very particular
way. To state the obvious, if I want to create a sub-interface which also has an address, I would
create the sub-int _before_ adding the address, right? Similarly, if I wanted to expose a sub-interface
`Hu12/0/0.100` in Linux as a _LIP_, I would create it only _after_ having created a _LIP_ for the
parent interface `Hu12/0/0`, to satisfy Linux's requirement that all sub-interfaces have a parent
interface, like so:

```
vpp# create sub HundredGigabitEthernet12/0/0 100
vpp# set interface ip address HundredGigabitEthernet12/0/0.100 192.0.2.1/29
vpp# lcp create HundredGigabitEthernet12/0/0 host-if ice0
vpp# lcp create HundredGigabitEthernet12/0/0.100 host-if ice0.100
vpp# set interface state HundredGigabitEthernet12/0/0 up
vpp# set interface state HundredGigabitEthernet12/0/0.100 up
```

Of course some of the ordering doesn't strictly matter. For example, I can set the state of
`Hu12/0/0.100` up before adding the address, or after adding the address, or even after adding the
_LIP_, but one thing is certain: I cannot set its state to up before it was created in the first place!
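The ordering intuition above can be sketched as a small dependency table that is sorted so every operation comes after the operations it relies on. This is only an illustration using Python's standard `graphlib` module; the operation labels and the `deps` table are assumptions made up for this example, not how `vppcfg` stores things internally.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency table (illustrative labels, not vppcfg internals):
# each operation maps to the set of operations that must be completed first.
deps = {
    "create sub Hu12/0/0.100": set(),
    "add address to Hu12/0/0.100": {"create sub Hu12/0/0.100"},
    "create LIP for Hu12/0/0": set(),
    "create LIP for Hu12/0/0.100": {
        "create sub Hu12/0/0.100",
        "create LIP for Hu12/0/0",
    },
    "set Hu12/0/0.100 up": {"create sub Hu12/0/0.100"},
}

# static_order() yields every operation after all of its prerequisites.
create_order = list(TopologicalSorter(deps).static_order())

# Tearing things down can simply walk the same order backwards.
remove_order = list(reversed(create_order))

for step in create_order:
    print(step)
```

Several orderings are valid here (the address and the admin-state are independent of each other), which matches the observation that some of the ordering doesn't strictly matter. A cycle in the table would make `static_order()` raise `graphlib.CycleError`.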
In the other direction, when removing things, it's easy to see that you cannot manipulate the state
of a sub-interface after deleting it, so to cleanly remove the construction above, I would have to
walk the statements back in reverse, like so:

```
vpp# set interface state HundredGigabitEthernet12/0/0.100 down
vpp# set interface state HundredGigabitEthernet12/0/0 down
vpp# lcp delete HundredGigabitEthernet12/0/0.100 host-if ice0.100
vpp# lcp delete HundredGigabitEthernet12/0/0 host-if ice0
vpp# set interface ip address del HundredGigabitEthernet12/0/0.100 192.0.2.1/29
vpp# delete sub HundredGigabitEthernet12/0/0.100
```

Because of this reasonably straightforward ordering, it's possible to construct a graph showing
operations that depend on other operations having been completed beforehand. Such a graph is called
a [Directed Acyclic Graph](https://en.wikipedia.org/wiki/Directed_acyclic_graph) or _DAG_.

{{< image width="400px" float="left" src="/assets/vppcfg/vppcfg-dag.png" alt="DAG" >}}

First some theory (from Wikipedia):

A directed graph is a DAG if and only if it can be topologically ordered, by arranging the vertices
as a linear ordering that is consistent with all edge directions. DAGs have numerous scientific and
computational applications, but the one I'm mostly interested in here is dependency mapping and
computational scheduling.

A graph is formed by vertices and by edges connecting pairs of vertices, where the vertices are
objects that might exist in VPP (interfaces, bridge-domains, VXLAN tunnels, IP addresses, etc),
and these objects are connected in pairs by edges. In the case of a directed graph, each edge has an
orientation (or direction), from one (source) vertex to another (destination) vertex.
A path in a directed graph is a sequence of edges having the property that the ending vertex of each
edge in the sequence is the same as the starting vertex of the next edge in the sequence; a path
forms a cycle if the starting vertex of its first edge equals the ending vertex of its last edge.
A directed acyclic graph is a directed graph that has no cycles, which in this particular case means
that objects' existence can't rely on other things that ultimately rely back on their own existence.

After I got that technobabble out of the way, practically speaking, the _edges_ in this graph model
dependencies. Let me give a few examples:

1.  The arrow from _Sub Interface_ pointing at _BondEther_ and _Physical Int_ makes the claim that
    for the sub-int to exist, it _depends on_ the existence of either a BondEthernet, or a PHY.
1.  The arrow from the _BondEther_ to the _Physical Int_ makes the claim that for the BondEthernet
    to work, it must have one or more PHYs in it.
1.  There is no arrow from _BondEther_ to _Sub Interface_, which makes the claim that a BondEthernet
    does not depend on sub-interfaces: there is no need for a sub-int to exist in order for a
    BondEthernet to work.

## VPP Config: Ordering

In my [previous]({{< ref "2022-03-27-vppcfg-1" >}}) post, I talked about a bunch of constraints that
make certain YAML configurations invalid (for example, having both _dot1q_ and _dot1ad_ on a
sub-interface, that wouldn't make any sense). Here, I'm going to talk about another type of
constraint: ***Temporal Constraints*** are statements about the ordering of operations.
With the example DAG above, I derive the following constraints:

*   A parent interface must exist before a sub-interface can be created on it
*   An interface (regardless of sub-int or phy) must exist before an IP address can be added to it
*   A _LIP_ can be created on a sub-int only if its parent PHY has a _LIP_
*   _LIPs_ must be removed from all sub-interfaces before a PHY's _LIP_ can be removed
*   The admin-state of a sub-interface can only be up if its PHY is up
*   ... and so on.

But there's a second thing to keep in mind, and this is a bit more specific to the VPP configuration
operations themselves. Sometimes, I may find that an object already exists, say a sub-interface, but
that it has configuration attributes that are not what I wanted. For example, I may have previously
configured a sub-int to be of a certain encapsulation `dot1q 1000 inner-dot1q 1234`, but I changed
my mind and want the sub-int to now be `dot1ad 1000 inner-dot1q 1234` instead. Some attributes of an
interface can be changed on the fly (like the MTU, for example), but some really cannot, and in my
example here, the encapsulation change has to be done another way.

I'll make an obvious but hopefully helpful observation: I can't create the second sub-int with the
same subid, because one already exists (duh). The intuitive way to solve this, of course, is to
delete the old sub-int _first_ and then create a _new_ sub-int with the correct attributes (`dot1ad`
outer encapsulation).

Here's another scenario that illustrates the ordering: Let's say I want to move an IP address from
interface A to interface B. In VPP, I can't configure the same IP address/prefixlen on two interfaces
at the same time, so as with the previous scenario of the encap changing, I will want to remove the
IP address from A before adding it to B.
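The address move described above can be sketched in a few lines of Python. This is a minimal illustration, not `vppcfg`'s code: the function name `plan_address_moves` and the dictionaries `running` and `target` are hypothetical, and only the two CLI forms already shown earlier in this post are emitted.

```python
def plan_address_moves(running, target):
    """Return CLI-style steps, with every removal ahead of every addition.

    Both arguments map an interface name to the set of addresses on it.
    (Hypothetical helper for illustration; not part of vppcfg.)
    """
    removals = []
    additions = []
    # Addresses present in the dataplane but absent from the target config.
    for ifname, addrs in sorted(running.items()):
        for addr in sorted(addrs - target.get(ifname, set())):
            removals.append(f"set interface ip address del {ifname} {addr}")
    # Addresses wanted by the target config but not yet in the dataplane.
    for ifname, addrs in sorted(target.items()):
        for addr in sorted(addrs - running.get(ifname, set())):
            additions.append(f"set interface ip address {ifname} {addr}")
    # Remove-before-add: the address is never on two interfaces at once.
    return removals + additions


running = {"A": {"192.0.2.1/29"}, "B": set()}
target = {"A": set(), "B": {"192.0.2.1/29"}}
steps = plan_address_moves(running, target)
for step in steps:
    print(step)
```

Collecting all removals before any addition is the simplest way to honour the constraint; a finer-grained planner could interleave them, but would then have to track which addition waits on which removal.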
Come to think of it, there are lots of scenarios where remove-before-add is required:

*   If an interface was in bridge-domain A but now wants to be put in bridge-domain B, it'll have
    to be _removed_ from the first bridge before being _added_ to the second bridge, because an
    interface can't be in two bridges at the same time.
*   If an interface was a member of a BondEthernet, but will be moved to be a member of a
    bridge-domain now, it will have to be _removed_ from the bond before being _added_ to the bridge,
    because an interface can't be both a bondethernet member and a member of a bridge at the same time.
*   And to add to the list, the scenario above: A sub-interface that differs in its intended
    encapsulation must be _removed_ before a new one with the same `subid` can be _created_.

All of these cases can be modeled as edges (arrows) between vertices (objects) in the graph
describing the ordering of operations in VPP! I'm now ready to draw two important conclusions:

1.  All objects that differ from their intended configuration must be removed before being added
    elsewhere, in order to avoid them being referenced/used twice.
1.  All objects must be created before their attributes can be set.

### vppcfg: Path Planning

By thinking about the configuration in this way, I can precisely predict the order of operations
needed to go from any running dataplane configuration to _any new_ target dataplane configuration.
A so-called path-planner emerges, which has three main phases of execution:

1.  **Prune** phase (remove objects from VPP that are not in the config)
1.  **Create** phase (add objects to VPP that are in the config but not VPP)
1.  **Sync** phase, for each object in the configuration

When removing things, care has to be taken to remove inner-most objects first (first removing LCP,
then QinQ, Dot1Q, BondEthernet, and lastly PHY), because indeed, there exists a dependency
relationship between objects in this DAG.
Conversely, when creating objects, the edges flip their directionality, because creation must be
done on outer-most objects first (first creating the PHY, then BondEthernet, Dot1Q, QinQ and lastly
LCP).

For example, QinQ/QinAD sub-interfaces should be removed before their intermediary Dot1Q/Dot1AD can
be removed. As another example, the MTU of parents should grow before that of their children, while
children should shrink before their parents. **Order matters**.

**Pruning**: First, `vppcfg` will ensure that no object has attributes which it should not have
(eg. IP addresses) and that objects which are not needed (ie. have been removed from the target
config) are destroyed. After this phase, I am certain that any object that exists in the dataplane,
both (a) has the right to exist (because it's in the target configuration), and (b) has the correct
create-time (ie non syncable) attributes.

**Creating**: Next, `vppcfg` will ensure that all objects that are not yet present (including the
ones that it just removed because they were present but had incorrect attributes), get (re)created
in the right order. After this phase, I am certain that _all objects_ in the dataplane now (a) have
the right to exist (because they are in the target configuration), (b) have the correct attributes,
but newly, also that (c) all objects that are in the target configuration also got created and now
exist in the dataplane.

**Syncing**: Finally, all objects are synchronized with the target configuration (IP addresses, MTU
etc), taking care to shrink children before their parents, and growing parents before their children
(this is for the special case of any given sub-interface's MTU having to be equal to or lower than
their parent's MTU).

### vppcfg: Demonstration

I'll create three configurations and let vppcfg path-plan between them.
I start a completely empty VPP dataplane which has two GigabitEthernet and two
HundredGigabitEthernet interfaces:

```
pim@hippo:~/src/vpp$ make run
    _______    _        _   _____  ___
 __/ __/ _ \  (_)__    | | / / _ \/ _ \
 _/ _// // / / / _ \   | |/ / ___/ ___/
 /_/ /____(_)_/\___/   |___/_/  /_/

DBGvpp# show interface
              Name               Idx    State  MTU (L3/IP4/IP6/MPLS)     Counter          Count
GigabitEthernet3/0/0              1     down         9000/0/0/0
GigabitEthernet3/0/1              2     down         9000/0/0/0
HundredGigabitEthernet12/0/0      3     down         9000/0/0/0
HundredGigabitEthernet12/0/1      4     down         9000/0/0/0
local0                            0     down          0/0/0/0
```

#### Demo 1: First time config (empty VPP)

First, starting simple, I write the following YAML configuration called `hippo4.yaml`. It defines a
few sub-interfaces, a bridgedomain with one QinQ sub-interface `Hu12/0/0.101` in it, and it then
cross-connects `Gi3/0/0.100` with `Hu12/0/1.100`, keeping the sub-interfaces at an MTU of 2000
(3000 for `Hu12/0/0.100`) and their PHYs at an MTU of 9216:

```
interfaces:
  GigabitEthernet3/0/0:
    mtu: 9216
    sub-interfaces:
      100:
        mtu: 2000
        l2xc: HundredGigabitEthernet12/0/1.100
  GigabitEthernet3/0/1:
    description: Not Used
  HundredGigabitEthernet12/0/0:
    mtu: 9216
    sub-interfaces:
      100:
        mtu: 3000
      101:
        mtu: 2000
        encapsulation:
          dot1q: 100
          inner-dot1q: 200
          exact-match: True
  HundredGigabitEthernet12/0/1:
    mtu: 9216
    sub-interfaces:
      100:
        mtu: 2000
        l2xc: GigabitEthernet3/0/0.100

bridgedomains:
  bd10:
    description: "Bridge Domain 10"
    mtu: 2000
    interfaces: [ HundredGigabitEthernet12/0/0.101 ]
```

If I offer this config to `vppcfg` and ask it to plan a path, there won't be any **pruning** going
on, because there are no objects in the newly started VPP dataplane that need to be deleted. But I
do expect to see a bunch of sub-interface and one bridge-domain **creation**, followed by
**syncing** a bunch of interfaces with bridge-domain memberships and L2 Cross Connects.
Finally, the MTU of the interfaces will be sync'd to their configured values, and the path is
planned like so:

```
pim@hippo:~/src/vppcfg$ ./vppcfg -c hippo4.yaml plan
[INFO ] root.main: Loading configfile hippo4.yaml
[INFO ] vppcfg.config.valid_config: Configuration validated successfully
[INFO ] root.main: Configuration is valid
[INFO ] vppcfg.vppapi.connect: VPP version is 22.06-rc0~324-g247385823
create sub GigabitEthernet3/0/0 100 dot1q 100 exact-match
create sub HundredGigabitEthernet12/0/0 100 dot1q 100 exact-match
create sub HundredGigabitEthernet12/0/1 100 dot1q 100 exact-match
create sub HundredGigabitEthernet12/0/0 101 dot1q 100 inner-dot1q 200 exact-match
create bridge-domain 10
set interface l2 bridge HundredGigabitEthernet12/0/0.101 10
set interface l2 tag-rewrite HundredGigabitEthernet12/0/0.101 pop 2
set interface l2 xconnect GigabitEthernet3/0/0.100 HundredGigabitEthernet12/0/1.100
set interface l2 tag-rewrite GigabitEthernet3/0/0.100 pop 1
set interface l2 xconnect HundredGigabitEthernet12/0/1.100 GigabitEthernet3/0/0.100
set interface l2 tag-rewrite HundredGigabitEthernet12/0/1.100 pop 1
set interface mtu 9216 GigabitEthernet3/0/0
set interface mtu 9216 HundredGigabitEthernet12/0/0
set interface mtu 9216 HundredGigabitEthernet12/0/1
set interface mtu packet 1500 GigabitEthernet3/0/1
set interface mtu packet 9216 GigabitEthernet3/0/0
set interface mtu packet 9216 HundredGigabitEthernet12/0/0
set interface mtu packet 9216 HundredGigabitEthernet12/0/1
set interface mtu packet 2000 GigabitEthernet3/0/0.100
set interface mtu packet 3000 HundredGigabitEthernet12/0/0.100
set interface mtu packet 2000 HundredGigabitEthernet12/0/1.100
set interface mtu packet 2000 HundredGigabitEthernet12/0/0.101
set interface mtu 1500 GigabitEthernet3/0/1
set interface state GigabitEthernet3/0/0 up
set interface state GigabitEthernet3/0/0.100 up
set interface state GigabitEthernet3/0/1 up
set interface state HundredGigabitEthernet12/0/0 up
set interface state HundredGigabitEthernet12/0/0.100 up
set interface state HundredGigabitEthernet12/0/0.101 up
set interface state HundredGigabitEthernet12/0/1 up
set interface state HundredGigabitEthernet12/0/1.100 up
[INFO ] root.main: Planning succeeded
```

On the `vppctl` commandline, I can simply cut-and-paste these CLI commands and the dataplane ends up
configured exactly as desired in the `hippo4.yaml` configuration file. One nice way to tell if the
reconciliation of the config file into the running VPP instance was successful is by running the
planner again with the same YAML config file. It should not find anything worth pruning, creating
or syncing, and indeed:

```
pim@hippo:~/src/vppcfg$ ./vppcfg -c hippo4.yaml plan
[INFO ] root.main: Loading configfile hippo4.yaml
[INFO ] vppcfg.config.valid_config: Configuration validated successfully
[INFO ] root.main: Configuration is valid
[INFO ] vppcfg.vppapi.connect: VPP version is 22.06-rc0~324-g247385823
[INFO ] root.main: Planning succeeded
```

#### Demo 2: Moving from one config to another

To demonstrate how my reconciliation algorithm works in practice, I decide to invent a radically
different configuration for Hippo, called `hippo12.yaml`, in which a new BondEthernet appears, two
of its sub-interfaces are cross connected, `Hu12/0/0` now gets a _LIP_ and some IP addresses, and
the bridge-domain `bd10` is replaced by two others, `bd1` and `bd11`, the former of which also
sports a BVI (with a _LIP_ called `bvi1`) and a VXLAN Tunnel bridged into `bd1` for good measure:

```
bondethernets:
  BondEthernet0:
    interfaces: [ GigabitEthernet3/0/0, GigabitEthernet3/0/1 ]

interfaces:
  GigabitEthernet3/0/0:
    mtu: 9000
    description: "LAG #1"
  GigabitEthernet3/0/1:
    mtu: 9000
    description: "LAG #2"

  HundredGigabitEthernet12/0/0:
    lcp: "ice12-0-0"
    mtu: 9000
    addresses: [ 192.0.2.17/30, 2001:db8:3::1/64 ]
    sub-interfaces:
      1234:
        mtu: 1200
        lcp: "ice0.1234"
        encapsulation:
          dot1q: 1234
          exact-match: True
      1235:
        mtu: 1100
        lcp: "ice0.1234.1000"
        encapsulation:
          dot1q: 1234
          inner-dot1q: 1000
          exact-match: True

  HundredGigabitEthernet12/0/1:
    mtu: 2000
    description: "Bridged"

  BondEthernet0:
    mtu: 9000
    lcp: "bond0"
    sub-interfaces:
      10:
        lcp: "bond0.10"
        mtu: 3000
      100:
        mtu: 2500
        l2xc: BondEthernet0.200
        encapsulation:
          dot1q: 100
          exact-match: False
      200:
        mtu: 2500
        l2xc: BondEthernet0.100
        encapsulation:
          dot1q: 200
          exact-match: False
      500:
        mtu: 2000
        encapsulation:
          dot1ad: 500
          exact-match: False
      501:
        mtu: 2000
        encapsulation:
          dot1ad: 501
          exact-match: False

  vxlan_tunnel1:
    mtu: 2000

loopbacks:
  loop0:
    lcp: "lo0"
    addresses: [ 10.0.0.1/32, 2001:db8::1/128 ]
  loop1:
    lcp: "bvi1"
    addresses: [ 10.0.1.1/24, 2001:db8:1::1/64 ]

bridgedomains:
  bd1:
    mtu: 2000
    bvi: loop1
    interfaces: [ BondEthernet0.500, BondEthernet0.501, HundredGigabitEthernet12/0/1, vxlan_tunnel1 ]
  bd11:
    mtu: 1500

vxlan_tunnels:
  vxlan_tunnel1:
    local: 192.0.2.1
    remote: 192.0.2.2
    vni: 101
```

Before writing `vppcfg`, the art of moving from `hippo4.yaml` to this radically different
`hippo12.yaml` would have been a nightmare, and would almost certainly have caused me to miss a step
and cause an outage.
But, due to the fundamental understanding of ordering, and the methodical execution of **pruning**,
**creating** and **syncing** the objects, the path planner comes up with the following sequence,
which I'll break down into its three constituent phases:

```
pim@hippo:~/src/vppcfg$ ./vppcfg -c hippo12.yaml plan
[INFO ] root.main: Loading configfile hippo12.yaml
[INFO ] vppcfg.config.valid_config: Configuration validated successfully
[INFO ] root.main: Configuration is valid
[INFO ] vppcfg.vppapi.connect: VPP version is 22.06-rc0~324-g247385823
set interface state HundredGigabitEthernet12/0/0.101 down
set interface state GigabitEthernet3/0/0.100 down
set interface state HundredGigabitEthernet12/0/0.100 down
set interface state HundredGigabitEthernet12/0/1.100 down
set interface l2 tag-rewrite HundredGigabitEthernet12/0/0.101 disable
set interface l3 HundredGigabitEthernet12/0/0.101
create bridge-domain 10 del
set interface l2 tag-rewrite GigabitEthernet3/0/0.100 disable
set interface l3 GigabitEthernet3/0/0.100
set interface l2 tag-rewrite HundredGigabitEthernet12/0/1.100 disable
set interface l3 HundredGigabitEthernet12/0/1.100
delete sub HundredGigabitEthernet12/0/0.101
delete sub GigabitEthernet3/0/0.100
delete sub HundredGigabitEthernet12/0/0.100
delete sub HundredGigabitEthernet12/0/1.100
```

First, `vppcfg` concludes that `Hu12/0/0.101`, `Hu12/0/0.100`, `Hu12/0/1.100` and `Gi3/0/0.100` are
no longer needed, so it sets them all admin-state down. The bridge-domain `bd10` no longer has the
right to exist, the poor thing. But before it is deleted, the interface that was in `bd10` can be
pruned (membership _depends_ on the bridge, so in pruning, dependents are removed before the things
they depend on). Considering `Hu12/0/1.100` and `Gi3/0/0.100` were an L2XC pair before, they are
returned to default (L3) mode and because it's no longer needed, the
[VLAN Gymnastics]({{< ref "2022-02-14-vpp-vlan-gym" >}}) tag rewriting is also cleaned up for both
interfaces.
Finally, the sub-interfaces that do not appear in the target configuration are deleted, completing
the **pruning** phase.

It then continues with the **create** phase:

```
create loopback interface instance 0
create loopback interface instance 1
create bond mode lacp load-balance l34 id 0
create vxlan tunnel src 192.0.2.1 dst 192.0.2.2 instance 1 vni 101 decap-next l2
create sub HundredGigabitEthernet12/0/0 1234 dot1q 1234 exact-match
create sub BondEthernet0 10 dot1q 10 exact-match
create sub BondEthernet0 100 dot1q 100
create sub BondEthernet0 200 dot1q 200
create sub BondEthernet0 500 dot1ad 500
create sub BondEthernet0 501 dot1ad 501
create sub HundredGigabitEthernet12/0/0 1235 dot1q 1234 inner-dot1q 1000 exact-match
create bridge-domain 1
create bridge-domain 11
lcp create HundredGigabitEthernet12/0/0 host-if ice12-0-0
lcp create BondEthernet0 host-if bond0
lcp create loop0 host-if lo0
lcp create loop1 host-if bvi1
lcp create HundredGigabitEthernet12/0/0.1234 host-if ice0.1234
lcp create BondEthernet0.10 host-if bond0.10
lcp create HundredGigabitEthernet12/0/0.1235 host-if ice0.1234.1000
```

Here, interfaces are created in order of loopbacks first, then BondEthernets, then Tunnels, and
finally sub-interfaces, first creating single-tagged and then creating dual-tagged sub-interfaces.
Of course, the BondEthernet has to be created before any sub-int will be able to be created on it.
Note that the QinQ `Hu12/0/0.1235` will be created after its intermediary parent `Hu12/0/0.1234` due
to this ordering requirement.

Then, the two new bridgedomains `bd1` and `bd11` are created, and finally the _LIP_ plumbing is
performed, starting with the PHY `ice12-0-0` and BondEthernet `bond0`, then the two loopbacks, and
only then advancing to the two single-tag dot1q interfaces and finally the QinQ interface. For LCPs,
this is very important, because in Linux, the interfaces are a tree, not a list.
`ice12-0-0` must be created before its child `ice0.1234@ice12-0-0` can be created, and only then can
the QinQ `ice0.1234.1000@ice0.1234` be created. This creation order follows from the DAG having an
edge signalling an LCP depending on the sub-interface, and an edge between the sub-interface with
two tags depending on the sub-interface with one tag, and an edge between the single-tagged
sub-interface depending on its PHY.

After all this work, `vppcfg` can assert (a) every object that now exists in VPP is in the target
configuration and (b) that any object that exists in the configuration also is present in VPP (with
the correct attributes).

But there's one last thing to do, and that's ensure that the attributes that can be changed at
runtime (IP addresses, L2XCs, BondEthernet and bridge-domain members, etc), are **sync'd** into
their respective objects in VPP based on what's in the target configuration:

```
bond add BondEthernet0 GigabitEthernet3/0/0
bond add BondEthernet0 GigabitEthernet3/0/1
comment { ip link set bond0 address 00:25:90:0c:05:01 }
set interface l2 bridge loop1 1 bvi
set interface l2 bridge BondEthernet0.500 1
set interface l2 tag-rewrite BondEthernet0.500 pop 1
set interface l2 bridge BondEthernet0.501 1
set interface l2 tag-rewrite BondEthernet0.501 pop 1
set interface l2 bridge HundredGigabitEthernet12/0/1 1
set interface l2 tag-rewrite HundredGigabitEthernet12/0/1 disable
set interface l2 bridge vxlan_tunnel1 1
set interface l2 tag-rewrite vxlan_tunnel1 disable
set interface l2 xconnect BondEthernet0.100 BondEthernet0.200
set interface l2 tag-rewrite BondEthernet0.100 pop 1
set interface l2 xconnect BondEthernet0.200 BondEthernet0.100
set interface l2 tag-rewrite BondEthernet0.200 pop 1
set interface state GigabitEthernet3/0/1 down
set interface mtu 9000 GigabitEthernet3/0/1
set interface state GigabitEthernet3/0/1 up
set interface mtu packet 9000 GigabitEthernet3/0/0
set interface mtu packet 9000 HundredGigabitEthernet12/0/0
set interface mtu packet 2000 HundredGigabitEthernet12/0/1
set interface mtu packet 2000 vxlan_tunnel1
set interface mtu packet 1500 loop0
set interface mtu packet 1500 loop1
set interface mtu packet 9000 GigabitEthernet3/0/1
set interface mtu packet 1200 HundredGigabitEthernet12/0/0.1234
set interface mtu packet 3000 BondEthernet0.10
set interface mtu packet 2500 BondEthernet0.100
set interface mtu packet 2500 BondEthernet0.200
set interface mtu packet 2000 BondEthernet0.500
set interface mtu packet 2000 BondEthernet0.501
set interface mtu packet 1100 HundredGigabitEthernet12/0/0.1235
set interface state GigabitEthernet3/0/0 down
set interface mtu 9000 GigabitEthernet3/0/0
set interface state GigabitEthernet3/0/0 up
set interface state HundredGigabitEthernet12/0/0 down
set interface mtu 9000 HundredGigabitEthernet12/0/0
set interface state HundredGigabitEthernet12/0/0 up
set interface state HundredGigabitEthernet12/0/1 down
set interface mtu 2000 HundredGigabitEthernet12/0/1
set interface state HundredGigabitEthernet12/0/1 up
set interface ip address HundredGigabitEthernet12/0/0 192.0.2.17/30
set interface ip address HundredGigabitEthernet12/0/0 2001:db8:3::1/64
set interface ip address loop0 10.0.0.1/32
set interface ip address loop0 2001:db8::1/128
set interface ip address loop1 10.0.1.1/24
set interface ip address loop1 2001:db8:1::1/64
set interface state HundredGigabitEthernet12/0/0.1234 up
set interface state HundredGigabitEthernet12/0/0.1235 up
set interface state BondEthernet0 up
set interface state BondEthernet0.10 up
set interface state BondEthernet0.100 up
set interface state BondEthernet0.200 up
set interface state BondEthernet0.500 up
set interface state BondEthernet0.501 up
set interface state vxlan_tunnel1 up
set interface state loop0 up
set interface state loop1 up
```

I'm not gonna lie, it's a tonne of work, but it's all a pretty straightforward juggle.
The sync phase will look at each object in the config and ensure that the attributes that same
object has in the dataplane are present and correct. In my demo, `hippo12.yaml` creates a lot of
interfaces and IP addresses, and changes the MTU of pretty much every interface, but in order:

*   The bondethernet gets its members `Gi3/0/0` and `Gi3/0/1`. As an interesting aside, when VPP
    creates a BondEthernet it'll initially assign it an ephemeral MAC address. Then, when its first
    member is added, the MAC address of the BondEthernet will change to that of the first member.
    The comment reminds me to also set this MAC on the Linux device `bond0`. In the future, I'll
    add some `PyRoute2` code to do that automatically.
*   BridgeDomains are next. The BVI `loop1` is added first, then a few sub-interfaces and a tunnel,
    and VLAN tag-rewriting for tagged interfaces is configured. There are two bridges, but only one
    of them has members, so there's not much (in fact, there's nothing) to do for the other one.
*   L2 Cross Connects can be changed at runtime, and they're next. The two interfaces `BE0.100` and
    `BE0.200` are connected to one another and tag-rewrites are set up for them, considering they
    are both tagged sub-interfaces.
*   MTU is next. There are two variants of this. The first one, `set interface mtu`, is actually a
    change in the DPDK driver to change the maximum allowable frame size. For this change, some
    interface types have to be brought down first, the max frame size changed, and then brought
    back up again. For all the others, the MTU will be changed in a specific order:
    1.  PHYs will grow their MTU first, as growing a PHY is guaranteed to be always safe.
    1.  Sub-interfaces will shrink QinX first, then Dot1Q/Dot1AD, then untagged interfaces. This is
        to ensure we do not leave VPP and LinuxCP in a state where a QinQ sub-int has a higher MTU
        than any of its parents.
    1.  Sub-interfaces will grow untagged first, then Dot1Q/Dot1AD, and finally QinX sub-interfaces.
        Same reason as step 2, no sub-interface will end up with a higher MTU than any of its
        parents.
    1.  PHYs will shrink their MTU last. The YAML configuration validation asserts that no PHY can
        have an MTU lower than any of its children, so this is safe.
*   Finally, IP addresses are added to `Hu12/0/0`, `loop0` and `loop1`. I can guarantee that adding
    IP addresses will not clash with any other interface, because pruning will already have removed
    IP addresses from interfaces where they no longer belong.
*   And to finish off, the admin state for interfaces is set, again going from PHY, Bond, Tunnel,
    1-tagged sub-interfaces and finally 2-tagged sub-interfaces and loopbacks.

Let's put it to the test:

```
pim@hippo:~/src/vppcfg$ ./vppcfg -c hippo12.yaml plan -o hippo4-to-12.exec
[INFO ] root.main: Loading configfile hippo12.yaml
[INFO ] vppcfg.config.valid_config: Configuration validated successfully
[INFO ] root.main: Configuration is valid
[INFO ] vppcfg.vppapi.connect: VPP version is 22.06-rc0~324-g247385823
[INFO ] vppcfg.reconciler.write: Wrote 94 lines to hippo4-to-12.exec
[INFO ] root.main: Planning succeeded

pim@hippo:~/src/vppcfg$ vppctl exec ~/src/vppcfg/hippo4-to-12.exec

pim@hippo:~/src/vppcfg$ ./vppcfg -c hippo12.yaml plan
[INFO ] root.main: Loading configfile hippo12.yaml
[INFO ] vppcfg.config.valid_config: Configuration validated successfully
[INFO ] root.main: Configuration is valid
[INFO ] vppcfg.vppapi.connect: VPP version is 22.06-rc0~324-g247385823
[INFO ] root.main: Planning succeeded
```

Notice that after applying `hippo4-to-12.exec`, the planner had nothing else to say. VPP is now in
the target configuration state, slick!

#### Demo 3: Returning VPP to empty

This one is easy, but shows the pruning in action.
Let's say I wanted to return VPP to a default configuration without any objects, and its interfaces
all at MTU 1500:

```
interfaces:
  GigabitEthernet3/0/0:
    mtu: 1500
    description: Not Used
  GigabitEthernet3/0/1:
    mtu: 1500
    description: Not Used
  HundredGigabitEthernet12/0/0:
    mtu: 1500
    description: Not Used
  HundredGigabitEthernet12/0/1:
    mtu: 1500
    description: Not Used
```

Simply applying that plan:

```
pim@hippo:~/src/vppcfg$ ./vppcfg -c hippo-empty.yaml plan -o 12-to-empty.exec
[INFO ] root.main: Loading configfile hippo-empty.yaml
[INFO ] vppcfg.config.valid_config: Configuration validated successfully
[INFO ] root.main: Configuration is valid
[INFO ] vppcfg.vppapi.connect: VPP version is 22.06-rc0~324-g247385823
[INFO ] vppcfg.reconciler.write: Wrote 66 lines to 12-to-empty.exec
[INFO ] root.main: Planning succeeded

pim@hippo:~/src/vppcfg$ vppctl
vpp# exec ~/src/vppcfg/12-to-empty.exec
vpp# show interface
              Name               Idx    State  MTU (L3/IP4/IP6/MPLS)     Counter          Count
GigabitEthernet3/0/0              1      up          1500/0/0/0
GigabitEthernet3/0/1              2      up          1500/0/0/0
HundredGigabitEthernet12/0/0      3      up          1500/0/0/0
HundredGigabitEthernet12/0/1      4      up          1500/0/0/0
local0                            0     down          0/0/0/0
```

### Final notes

Now you may have been wondering why I would call the first file `hippo4.yaml` and the second one
`hippo12.yaml`. This is because I have 20 such YAML files that bring Hippo into all sorts of
esoteric configuration states, and I do this so that I can do a full integration test of any config
morphing into any other config:

```
for i in hippo[0-9]*.yaml; do
  echo "Clearing: Moving to hippo-empty.yaml"
  ./vppcfg -c hippo-empty.yaml > /tmp/vppcfg-exec-empty
  [ -s /tmp/vppcfg-exec-empty ] && vppctl exec /tmp/vppcfg-exec-empty

  for j in hippo[0-9]*.yaml; do
    echo " - Moving to $i .. "
    ./vppcfg -c $i > /tmp/vppcfg-exec_$i
    [ -s /tmp/vppcfg-exec_$i ] && vppctl exec /tmp/vppcfg-exec_$i

    echo " - Moving from $i to $j"
    ./vppcfg -c $j > /tmp/vppcfg-exec_${i}_${j}
    [ -s /tmp/vppcfg-exec_${i}_${j} ] && vppctl exec /tmp/vppcfg-exec_${i}_${j}

    echo " - Checking that from $j to $j is empty"
    ./vppcfg -c $j > /tmp/vppcfg-exec_${j}_${j}_null
  done
done
```

What this does is start Hippo off with an empty config, then move it to `hippo1.yaml`, and from
there move the configuration to each YAML file and back to `hippo1.yaml`. Doing this proves that,
no matter which configuration I want to obtain, I can get there safely when the VPP dataplane
config starts out looking like what is described in `hippo1.yaml`. I'll then move it back to empty,
and into `hippo2.yaml`, doing the whole cycle again. So for 20 files, this means roughly 400
configuration transitions. And some of these are special: notably, moving from `hippoN.yaml` to the
same `hippoN.yaml` should result in zero diffs.

With this path planner reasonably well tested, I have pretty high confidence that `vppcfg` can
change the dataplane from any existing configuration to any desired target configuration.

## What's next

One thing that I haven't mentioned yet is that the `vppcfg` path planner works by reading the API
configuration state exactly once (at startup), and then it figures out the CLI calls to print
without needing to talk to VPP again. This is super useful as it's a non-intrusive way to inspect
the changes before applying them, and it's a property I'd like to carry forward. However, I don't
necessarily think that emitting the CLI statements is the best user experience; they are mostly
useful for the purposes of analysis.
What I really want to do is emit API calls after the plan is created and reviewed/approved, directly
reprogramming the VPP dataplane, and likely the Linux network namespace interfaces as well, for
example setting the MAC address of a BondEthernet as I showed in that one comment above, or setting
interface alias names based on the configured descriptions.

However, the VPP API set needed to do this is not 100% baked yet. For example, I observed crashes
when tinkering with BVIs and Loopbacks ([thread](https://lists.fd.io/g/vpp-dev/message/21116)), and
fixed a few obvious errors in the Linux CP API ([gerrit](https://gerrit.fd.io/r/c/vpp/+/35479)) but
there are still a few more issues to work through before I can take the next step with `vppcfg`.

But for now, it's already helping me out tremendously at IPng Networks and I hope it'll be useful
for others, too.