---
date: "2022-03-27T14:19:23Z"
title: VPP Configuration - Part1
aliases:
- /s/articles/2022/03/27/vppcfg-1.html
---

{{< image width="200px" float="right" src="/assets/vpp/fdio-color.svg" alt="VPP" >}}

# About this series

I use VPP - Vector Packet Processor - extensively at IPng Networks. Earlier this year, the VPP community
merged the [Linux Control Plane]({{< ref "2021-08-12-vpp-1" >}}) plugin. I wrote about its deployment
to both regular servers like the [Supermicro]({{< ref "2021-09-21-vpp-7" >}}) routers that run on our
[AS8298]({{< ref "2021-02-27-network" >}}), as well as virtual machines running in
[KVM/Qemu]({{< ref "2021-12-23-vpp-playground" >}}).

Now that I've been running VPP in production for about half a year, I can't help but notice one specific
drawback: VPP is a programmable dataplane, and _by design_ it does not include any configuration or
controlplane management stack. It's meant to be integrated into a full stack by operators. For end-users,
this unfortunately means that typing on the CLI won't persist any configuration, and if VPP is restarted,
it will not pick up where it left off. There's one developer convenience in the form of the `exec`
command-line (and startup.conf!) option, which will read a file and apply the contents to the CLI line
by line. However, if any typo is made in the file, processing immediately stops. It's meant as a convenience
for VPP developers, and is certainly not a useful configuration method for anything but the simplest topologies.

Luckily, VPP comes with an extensive set of APIs to allow it to be programmed. So in this series of posts,
I'll detail the work I've done to create a configuration utility that can take a YAML configuration file,
compare it to a running VPP instance, and step-by-step plan through the API calls needed to safely apply
the configuration to the dataplane. Welcome to `vppcfg`!

In this first post, let's take a look at table stakes: writing a YAML specification which models the main
configuration elements of VPP, and then ensuring that the YAML file is both syntactically and
semantically correct.

**Note**: Code is on [my Github](https://git.ipng.ch/ipng/vppcfg), but it's not quite ready for
prime-time yet. Take a look, and engage with us on GitHub (pull requests preferred over issues themselves)
or reach out by [contacting us](/s/contact/).

## YAML Specification

I decide to use [Yamale](https://github.com/23andMe/Yamale/), which is a schema description language
and validator for [YAML](http://www.yaml.org/spec/1.2/spec.html). YAML is a very simple, text/human-readable
annotation format that can be used to store a wide range of data types. An interesting, but quick introduction
to the YAML language can be found on CraftIRC's [GitHub](https://github.com/Animosity/CraftIRC/wiki/Complete-idiot's-introduction-to-yaml)
page.

The first order of business for me is to devise a YAML file specification which models the configuration
options of VPP objects in an idiomatic way. It's appealing to make the decision to immediately build a
higher level abstraction, but I resist the urge and instead look at the types of objects that exist in
VPP, for example the `VNET_DEVICE_CLASS` types:

*   ***ethernet_simulated_device_class***: Loopbacks
*   ***bvi_device_class***: Bridge Virtual Interfaces
*   ***dpdk_device_class***: DPDK Interfaces
*   ***rdma_device_class***: RDMA Interfaces
*   ***bond_device_class***: BondEthernet Interfaces
*   ***vxlan_device_class***: VXLAN Tunnels

There are several others, but I decide to start with these, as I'll be needing each one of these in my
own network. Looking over the device class specification, I learn a lot about how they are configured,
which arguments they need and of which type, and which data structures they are represented as in VPP
internally.

### Syntax Validation

Yamale first reads a _schema_ definition file, and then holds a given YAML file against the definition
and shows if the file has a syntax that is well-formed or not. As a practical example, let me start
with the following definition:

```
$ cat << EOF > schema.yaml
sub-interfaces: map(include('sub-interface'),key=int())
---
sub-interface:
  description: str(exclude='\'"',max=64,required=False)
  lcp: str(max=15,matches='[a-z]+[a-z0-9-]*',required=False)
  mtu: int(min=128,max=9216,required=False)
  addresses: list(ip(version=6),required=False)
  encapsulation: include('encapsulation',required=False)
---
encapsulation:
  dot1q: int(min=1,max=4095,required=False)
  dot1ad: int(min=1,max=4095,required=False)
  inner-dot1q: int(min=1,max=4095,required=False)
  exact-match: bool(required=False)
EOF
```

This snippet creates two types, one called `sub-interface` and the other called `encapsulation`. The fields
of the sub-interface, for example the `description` field, must follow the given typing to be valid. In the
case of the description, it must be at most 64 characters long and it must not contain the `'` or `"`
characters. The designation `required=False` notes that this is an optional field and may be omitted.
The `lcp` field is also a string but it must match a certain regular expression, and start with a lowercase
letter. The `mtu` field must be an integer between 128 and 9216, and so on.

One nice feature of Yamale is the ability to reference other object types. I do this here with the `encapsulation`
field, which references an object type of the same name, and again, is optional. This means that when the
`encapsulation` field is encountered in the YAML file Yamale is validating, it'll hold the contents of that
field to the schema below. There, we have `dot1q`, `dot1ad`, `inner-dot1q` and `exact-match` fields, which are
all optional.

Then, at the top of the file, I create the entrypoint schema, which expects YAML files to contain a map
called `sub-interfaces` which is keyed by integers and contains values of type `sub-interface`, tying it all
together.

Yamale comes with a commandline utility to do direct schema validation, which is handy. Let me demonstrate with
the following terrible YAML:
```
$ cat << EOF > bad.yaml
sub-interfaces:
  100:
     description: "Pim's illegal description"
     lcp: "NotAGoodName-AmIRite"
     mtu: 16384
     addresses: 192.0.2.1
     encapsulation: False
EOF

$ yamale -s schema.yaml bad.yaml
Validating /home/pim/bad.yaml...
Validation failed!
Error validating data '/home/pim/bad.yaml' with schema '/home/pim/schema.yaml'
	sub-interfaces.100.description: 'Pim's illegal description' contains excluded character '''
	sub-interfaces.100.lcp: Length of NotAGoodName-AmIRite is greater than 15
	sub-interfaces.100.lcp: NotAGoodName-AmIRite is not a regex match.
	sub-interfaces.100.mtu: 16384 is greater than 9216
	sub-interfaces.100.addresses: '192.0.2.1' is not a list.
	sub-interfaces.100.encapsulation : 'False' is not a map
```

This file trips so many syntax violations, it should be a crime! In fact, every single field is invalid. The one that
is closest to being correct is the `addresses` field, but the schema expects a _list_ there (not a scalar), and even
then, the list elements are expected to be IPv6 addresses, not IPv4 ones.

So let me try again:

```
$ cat << EOF > good.yaml
sub-interfaces:
  100:
     description: "Core: switch.example.com Te0/1"
     lcp: "xe3-0-0"
     mtu: 9216
     addresses: [ 2001:db8::1, 2001:db8:1::1 ]
     encapsulation:
       dot1q: 100
       exact-match: True
EOF

$ yamale good.yaml
Validating /home/pim/good.yaml...
Validation success! 👍
```

### Semantic Validation

When using Yamale, I can make a good start in _syntax_ validation, that is to say, if a field is present, it follows
a prescribed type. But that's not the whole story. There are many configuration files I can think of that
would be syntactically correct, but still make no sense in practice. For example, creating an encapsulation which
has both `dot1q` as well as `dot1ad`, or creating a _LIP_ (Linux Interface Pair) for a sub-interface which does not
have `exact-match` set. Or how about having two sub-interfaces with the exact same encapsulation?

Here's where _semantic_ validation comes into play. So I set out to create all sorts of constraints, and after
reading the (Yamale validated, so syntactically correct) YAML file, I can hand it to a set of validators that
check for violations of these constraints. By way of example, let me create a few constraints that might capture
the issues described above:

1.  If a sub-interface has encapsulation:
    1.  It MUST have `dot1q` OR `dot1ad` set
    1.  It MUST NOT have `dot1q` AND `dot1ad` both set
1.  If a sub-interface has one or more `addresses`:
    1.  Its encapsulation MUST be set to `exact-match`
    1.  It MUST have an `lcp` set.
    1.  Each individual `address` MUST NOT occur in any other interface
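
The constraints above could be checked roughly as follows. This is a minimal sketch and not the actual `vppcfg` code: the function name `check_sub_interfaces` and the message texts are made up for illustration, and it assumes the YAML has already been parsed into a Python dict shaped like the `sub-interfaces` map from the schema earlier. It only compares addresses within that one map.

```python
def check_sub_interfaces(cfg):
    """Return a list of constraint violations (empty if there are none)."""
    msgs = []
    seen_addresses = {}  # address -> sub-interface id that used it first

    for sub_id, sub in cfg.get("sub-interfaces", {}).items():
        encap = sub.get("encapsulation")
        if encap is not None:
            has_dot1q = "dot1q" in encap
            has_dot1ad = "dot1ad" in encap
            if not (has_dot1q or has_dot1ad):
                msgs.append(f"sub-interface {sub_id} encapsulation needs dot1q or dot1ad")
            if has_dot1q and has_dot1ad:
                msgs.append(f"sub-interface {sub_id} cannot have both dot1q and dot1ad")

        addresses = sub.get("addresses", [])
        if addresses:
            # A missing encapsulation block is not flagged here; how defaults
            # are filled in is outside the scope of this sketch.
            if encap is not None and not encap.get("exact-match", False):
                msgs.append(f"sub-interface {sub_id} has addresses but is not exact-match")
            if "lcp" not in sub:
                msgs.append(f"sub-interface {sub_id} has addresses but no lcp")

        for addr in addresses:
            if addr in seen_addresses:
                msgs.append(
                    f"sub-interface {sub_id} address {addr} "
                    f"already used by {seen_addresses[addr]}"
                )
            else:
                seen_addresses[addr] = sub_id

    return msgs
```

Returning a list of messages rather than raising on the first violation means one run can report everything that is wrong with the file.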

## Config Validation

After spending a few weeks thinking about the problem, I came up with 59 semantic constraints, that is to say
things that might appear OK, but will yield impossible to implement or otherwise erratic VPP configurations.
This article would be a bad place to discuss them all, so I will talk about the structure of `vppcfg` instead.

First, a `Validator` class is instantiated with the Yamale schema. Then, a YAML file is read and passed to the
validator's `validate()` method. It will first run Yamale on the YAML file and make note of any issues that arise.
If there are any, it will enumerate them in a list and return `(bool, [list-of-messages])`. The validation has failed
if the boolean returned is _false_, in which case the list of messages will help understand which constraint was
violated.
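
To make that flow concrete, here is a small sketch of what such a class could look like. It is an illustration under assumptions, not the real `vppcfg` implementation: the real `Validator` takes a Yamale schema, whereas here the schema check is stood in for by a plain callable so the example has no third-party dependencies. Returning early when the syntax check fails is my reading of the description above.

```python
class Validator:
    """Run a syntax check first, then semantic checks, collecting messages."""

    def __init__(self, syntax_check, semantic_checks):
        # syntax_check: callable(cfg) -> list of messages (stand-in for Yamale)
        # semantic_checks: list of callables(cfg) -> list of messages
        self.syntax_check = syntax_check
        self.semantic_checks = semantic_checks

    def validate(self, cfg):
        """Return (ok, messages); ok is False if any message was produced."""
        msgs = list(self.syntax_check(cfg))
        if msgs:
            # Semantic checks assume well-formed input, so stop here.
            return False, msgs
        for check in self.semantic_checks:
            msgs.extend(check(cfg))
        return len(msgs) == 0, msgs


# Hypothetical usage with two trivial checks
def has_interfaces(cfg):
    return [] if isinstance(cfg.get("interfaces"), dict) else ["interfaces must be a map"]

def mtu_is_sane(cfg):
    return [
        f"interface {name} has MTU below 128"
        for name, iface in cfg["interfaces"].items()
        if iface.get("mtu", 1500) < 128
    ]

validator = Validator(has_interfaces, [mtu_is_sane])
ok, messages = validator.validate({"interfaces": {"GigabitEthernet1/0/0": {"mtu": 64}}})
```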

The `vppcfg` schema consists of toplevel types, which are validated in order:

*   ***validate_bondethernets()***'s job is to ensure that anything configured in the `bondethernets` toplevel map
    is correct. For example, if a _BondEthernet_ device is created there, its members should reference existing
    interfaces, and it itself should make an appearance in the `interfaces` map, and the MTU of each member should
    be equal to the MTU of the _BondEthernet_, and so on. See `config/bondethernet.py` for a complete rundown.
*   ***validate_loopbacks()*** is pretty straightforward. It makes a few common assertions, such as that if the
    loopback has addresses, it must also have an LCP, and if it has an LCP, that no other interface has the same
    LCP name, and that all of the addresses configured are unique.
*   ***validate_vxlan_tunnels()***: Yamale already asserts that the `local` and `remote` fields are present and are
    IP addresses. The semantic validator ensures that the address family of both tunnel endpoints is the same, and that
    the used `VNI` is unique.
*   ***validate_bridgedomains()*** fiddles with its _Bridge Virtual Interface_, making sure that its addresses and
    LCP name are unique. Further, it makes sure that a given member interface is in at most one bridge, and that said
    member is in L2 mode, in other words, that it doesn't have an LCP or an address. An L2 interface can be either in
    a bridgedomain, or act as an L2 Cross Connect, but not both. Finally, it asserts that each member has an MTU
    identical to the bridge's MTU value.
*   ***validate_interfaces()*** is by far the most complex, but a few common things worth calling out are that each
    sub-interface must have a unique encapsulation, and if a given QinQ or QinAD 2-tagged sub-interface has an LCP,
    that there exists a parent Dot1Q or Dot1AD interface with the correct encapsulation, and that it also has an LCP.
    See `config/interface.py` for an extensive overview.
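
As an illustration of one of these validators, the two VXLAN checks described above might be sketched like this. The function body and message texts are my own guesses rather than the real code; only the `local`, `remote` and `vni` field names come from the configuration format itself. The sketch relies on the schema step having already guaranteed that both endpoints are valid IP addresses.

```python
import ipaddress


def validate_vxlan_tunnels(cfg):
    """Return (ok, messages) for the 'vxlan_tunnels' toplevel map."""
    msgs = []
    vni_owner = {}  # VNI -> name of the first tunnel that used it

    for name, tunnel in cfg.get("vxlan_tunnels", {}).items():
        local = ipaddress.ip_address(tunnel["local"])
        remote = ipaddress.ip_address(tunnel["remote"])
        if local.version != remote.version:
            msgs.append(f"vxlan_tunnel {name} local and remote differ in address family")

        vni = tunnel["vni"]
        if vni in vni_owner:
            msgs.append(f"vxlan_tunnel {name} VNI {vni} is also used by {vni_owner[vni]}")
        else:
            vni_owner[vni] = name

    return len(msgs) == 0, msgs
```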

## Testing

Of course, in a configuration model as complex as a VPP router, a lot of testing is needed to ensure that
the constraints above are implemented correctly. To help this along, I use _regular_ unit testing as provided by
the Python3 [unittest](https://docs.python.org/3/library/unittest.html) framework, and I extend it to also run
a special kind of test which I call a `YAMLTest`.

### Unit Testing

This is bread and butter, and should be straightforward for software engineers. I took a model of so-called
test-driven development, where I start off by writing a test, which of course fails because the code hasn't been
implemented yet. Then I implement the code, and run this and all other unittests expecting them to pass.

Let me give an example based on BondEthernets, with a YAML config file as follows:

```
bondethernets:
  BondEthernet0:
    interfaces: [ GigabitEthernet1/0/0, GigabitEthernet1/0/1 ]
interfaces:
  GigabitEthernet1/0/0:
    mtu: 3000
  GigabitEthernet1/0/1:
    mtu: 3000
  GigabitEthernet2/0/0:
    mtu: 3000
    sub-interfaces:
      100:
        mtu: 2000

  BondEthernet0:
    mtu: 3000
    lcp: "be012345678"
    addresses: [ 192.0.2.1/29, 2001:db8::1/64 ]
    sub-interfaces:
      100:
        mtu: 2000
        addresses: [ 192.0.2.9/29, 2001:db8:1::1/64 ]
```

As I mentioned when discussing the semantic constraints, there's a few here that jump out at me. First, the
BondEthernet members `Gi1/0/0` and `Gi1/0/1` must exist. There is one BondEthernet defined in this file (obvious,
I know, but bear with me), and `Gi2/0/0` is not a bond member, and certainly `Gi2/0/0.100` is not a bond member,
because having a sub-interface as an LACP member would be super weird. Taking things like this into account, here's
a few tests that could assert that the behavior of the `bondethernets` map in the YAML config is correct:

```
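import unittest

import yaml

# Assumed import path, based on the helpers living in config/bondethernet.py
import config.bondethernet as bondethernet


class TestBondEthernetMethods(unittest.TestCase):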
    def setUp(self):
        with open("unittest/test_bondethernet.yaml", "r") as f:
            self.cfg = yaml.load(f, Loader = yaml.FullLoader)

    def test_get_by_name(self):
        ifname, iface = bondethernet.get_by_name(self.cfg, "BondEthernet0")
        self.assertIsNotNone(iface)
        self.assertEqual("BondEthernet0", ifname)
        self.assertIn("GigabitEthernet1/0/0", iface['interfaces'])
        self.assertNotIn("GigabitEthernet2/0/0", iface['interfaces'])

        ifname, iface = bondethernet.get_by_name(self.cfg, "BondEthernet-notexist")
        self.assertIsNone(iface)
        self.assertIsNone(ifname)

    def test_members(self):
        self.assertTrue(bondethernet.is_bond_member(self.cfg, "GigabitEthernet1/0/0"))
        self.assertTrue(bondethernet.is_bond_member(self.cfg, "GigabitEthernet1/0/1"))
        self.assertFalse(bondethernet.is_bond_member(self.cfg, "GigabitEthernet2/0/0"))
        self.assertFalse(bondethernet.is_bond_member(self.cfg, "GigabitEthernet2/0/0.100"))

    def test_is_bondethernet(self):
        self.assertTrue(bondethernet.is_bondethernet(self.cfg, "BondEthernet0"))
        self.assertFalse(bondethernet.is_bondethernet(self.cfg, "BondEthernet-notexist"))
        self.assertFalse(bondethernet.is_bondethernet(self.cfg, "GigabitEthernet1/0/0"))

    def test_enumerators(self):
        ifs = bondethernet.get_bondethernets(self.cfg)
        self.assertEqual(len(ifs), 1)
        self.assertIn("BondEthernet0", ifs)
        self.assertNotIn("BondEthernet-noexist", ifs)
```

Every single function that is defined in the file `config/bondethernet.py` (there are four) has
an accompanying unittest to ensure it works as expected. And every validator module has a suite
of unittests fully covering its functionality. In total, I wrote a few dozen unit tests like this,
in an attempt to be reasonably certain that the config validator functionality works as advertised.
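
For a feel of what is being tested, here is one way the four helpers could be written so that the tests above pass. This is a sketch derived purely from the test expectations; the real code in `config/bondethernet.py` may well look different, and the `_bonds()` helper is my own addition.

```python
def _bonds(cfg):
    """Return the 'bondethernets' map, or an empty dict if absent or empty."""
    return cfg.get("bondethernets") or {}


def get_bondethernets(cfg):
    """Return the names of all BondEthernets in the config."""
    return list(_bonds(cfg).keys())


def get_by_name(cfg, name):
    """Return (name, bond) if the BondEthernet exists, else (None, None)."""
    bond = _bonds(cfg).get(name)
    if bond is None:
        return None, None
    return name, bond


def is_bondethernet(cfg, ifname):
    """Return True if ifname is the name of a BondEthernet."""
    return ifname in _bonds(cfg)


def is_bond_member(cfg, ifname):
    """Return True if ifname is listed as a member of any BondEthernet."""
    return any(
        ifname in (bond or {}).get("interfaces", [])
        for bond in _bonds(cfg).values()
    )
```

Note that `GigabitEthernet2/0/0.100` is reported as a non-member simply because it is not listed in any `interfaces` list; rejecting sub-interfaces as bond members outright would be the job of a validator, not of these lookup helpers.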

### YAML Testing

I added one additional class of unittest called a ***YAMLTest***. What happens here is that a certain YAML configuration
file, which may be valid or have errors, is offered to the end-to-end config parser (so both the Yamale schema
validator as well as the semantic validators), and all errors are accounted for. As an example, two sub-interfaces
on the same parent cannot have the same encapsulation, so offering the following file to the config validator
is _expected_ to trip errors:

```
$ cat << EOF > unittest/yaml/error-subinterface1.yaml
test:
  description: "Two subinterfaces can't have the same encapsulation"
  errors:
    expected:
     - "sub-interface .*.100 does not have unique encapsulation"
     - "sub-interface .*.102 does not have unique encapsulation"
    count: 2
---
interfaces:
  GigabitEthernet1/0/0:
    sub-interfaces:
      100:
        description: "VLAN 100"
      101:
        description: "Another VLAN 100, but without exact-match"
        encapsulation:
          dot1q: 100
      102:
        description: "Another VLAN 100, but with exact-match"
        encapsulation:
          dot1q: 100
          exact-match: True
EOF
```

You can see the file here has two YAML documents (separated by `---`); the first one explains to the YAMLTest
class what to expect. There can either be no errors (in which case `test.errors.count=0`), or there can be
specific errors that are expected. In this case, `Gi1/0/0.100` and `Gi1/0/0.102` have the same encapsulation
but `Gi1/0/0.101` is unique (if you're curious, this is because the encap on 100 and 102 has exact-match,
but the one on 101 does _not_ have exact-match).
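
One way to picture why 100 and 102 collide while 101 does not is to normalize each sub-interface's encapsulation into a tuple and look for duplicates. The sketch below does that under two assumptions taken from the explanation above: a sub-interface without an explicit `encapsulation` defaults to a single `dot1q` tag equal to its own number with exact-match set, and an `encapsulation` without an `exact-match` key counts as not exact-match. The function names are hypothetical.

```python
from collections import defaultdict


def normalized_encap(sub_id, sub):
    """Return (dot1q, dot1ad, inner-dot1q, exact-match) for a sub-interface."""
    encap = sub.get("encapsulation")
    if encap is None:
        # Assumed default: dot1q tag equal to the sub-interface id, exact-match
        return (sub_id, None, None, True)
    return (
        encap.get("dot1q"),
        encap.get("dot1ad"),
        encap.get("inner-dot1q"),
        encap.get("exact-match", False),
    )


def check_unique_encapsulation(ifname, iface):
    """Return one message per sub-interface that shares its encapsulation."""
    by_encap = defaultdict(list)
    for sub_id, sub in iface.get("sub-interfaces", {}).items():
        by_encap[normalized_encap(sub_id, sub)].append(sub_id)

    msgs = []
    for sub_ids in by_encap.values():
        if len(sub_ids) > 1:
            for sub_id in sub_ids:
                msgs.append(
                    f"sub-interface {ifname}.{sub_id} does not have unique encapsulation"
                )
    return msgs
```

Fed the `GigabitEthernet1/0/0` entry from the test file, this yields one message for `.100` and one for `.102`, matching the two expected errors.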

The implementation of this YAMLTest class is in `tests.py`, which in turn runs all YAML tests on the files it
finds in `unittest/yaml/*.yaml` (currently 47 specific cases are tested there, which cover 100% of the
semantic constraints), and regular unittests (currently 42, which is a coincidence, I swear!).
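
The core comparison a YAMLTest has to make could look something like the following. This is a sketch under assumptions, not the code from `tests.py`: it treats each entry under `errors.expected` as a regular expression searched for anywhere in a message, and the function name is hypothetical.

```python
import re


def check_expected_errors(expected, count, msgs):
    """Compare validator messages against a test header; return failures."""
    failures = []

    if len(msgs) != count:
        failures.append(f"expected {count} error(s), got {len(msgs)}")

    # Every expected pattern must be matched by at least one message
    for pattern in expected:
        if not any(re.search(pattern, msg) for msg in msgs):
            failures.append(f"expected error not seen: {pattern}")

    # Every message must be matched by at least one expected pattern
    for msg in msgs:
        if not any(re.search(pattern, msg) for pattern in expected):
            failures.append(f"unexpected error: {msg}")

    return failures
```

Checking in both directions matters: a test should fail when an expected error goes missing, but also when the validator starts emitting an error nobody asked for.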

# What's next?

These tests, together, give me a pretty strong assurance that any given YAML file that passes the validator,
is indeed a valid configuration for VPP. In my next post, I'll go one step further, and talk about applying
the configuration to a running VPP instance, which is of course the overarching goal. But I would not want
to mess up my (or your!) VPP router by feeding it garbage, so the lion's share of my time so far on this project
has been to assert the YAML file is both syntactically and semantically valid.


In the meantime, you can take a look at my code on [GitHub](https://git.ipng.ch/ipng/vppcfg), but to
whet your appetite, here's a hefty configuration that demonstrates all implemented types:

```
bondethernets:
  BondEthernet0:
    interfaces: [ GigabitEthernet3/0/0, GigabitEthernet3/0/1 ]

interfaces:
  GigabitEthernet3/0/0:
    mtu: 9000
    description: "LAG #1"
  GigabitEthernet3/0/1:
    mtu: 9000
    description: "LAG #2"

  HundredGigabitEthernet12/0/0:
    lcp: "ice0"
    mtu: 9000
    addresses: [ 192.0.2.17/30, 2001:db8:3::1/64 ]
    sub-interfaces:
      1234:
        mtu: 1200
        lcp: "ice0.1234"
        encapsulation:
          dot1q: 1234
          exact-match: True
      1235:
        mtu: 1100
        lcp: "ice0.1234.1000"
        encapsulation:
          dot1q: 1234
          inner-dot1q: 1000
          exact-match: True

  HundredGigabitEthernet12/0/1:
    mtu: 2000
    description: "Bridged"

  BondEthernet0:
    mtu: 9000
    lcp: "be0"
    sub-interfaces:
      100:
        mtu: 2500
        l2xc: BondEthernet0.200
        encapsulation:
           dot1q: 100
           exact-match: False
      200:
        mtu: 2500
        l2xc: BondEthernet0.100
        encapsulation:
           dot1q: 200
           exact-match: False
      500:
        mtu: 2000
        encapsulation:
           dot1ad: 500
           exact-match: False
      501:
        mtu: 2000
        encapsulation:
           dot1ad: 501
           exact-match: False
  vxlan_tunnel1:
    mtu: 2000

loopbacks:
  loop0:
    lcp: "lo0"
    addresses: [ 10.0.0.1/32, 2001:db8::1/128 ]
  loop1:
    lcp: "bvi1"
    addresses: [ 10.0.1.1/24, 2001:db8:1::1/64 ]

bridgedomains:
  bd1:
    mtu: 2000
    bvi: loop1
    interfaces: [ BondEthernet0.500, BondEthernet0.501, HundredGigabitEthernet12/0/1, vxlan_tunnel1 ]
  bd11:
    mtu: 1500

vxlan_tunnels:
  vxlan_tunnel1:
    local: 192.0.2.1
    remote: 192.0.2.2
    vni: 101
```

The vision for my VPP Configuration utility is that it can move from any existing VPP configuration to any
other (successfully validated) configuration with a minimal number of steps, and that it will plan its
way declaratively from A to B, ordering the calls to the API safely and quickly. Interested? Good, because
I do expect that a utility like this would be very valuable to serious VPP users!