---
date: "2021-09-02T12:19:14Z"
title: VPP Linux CP - Part5
aliases:
- /s/articles/2021/09/02/vpp-5.html
params:
  asciinema: true
---

{{< image width="200px" float="right" src="/assets/vpp/fdio-color.svg" alt="VPP" >}}

# About this series
Ever since I first saw VPP - the Vector Packet Processor - I have been deeply impressed with its
performance and versatility. For those of us who have used Cisco IOS/XR devices, like the classic
_ASR_ (aggregation services router), VPP will look and feel quite familiar as many of the approaches
are shared between the two. One thing notably missing is the higher-level control plane, that is
to say: there is no OSPF or ISIS, BGP, LDP and the like. This series of posts details my work on a
VPP _plugin_ which is called the **Linux Control Plane**, or LCP for short, which creates Linux network
devices that mirror their VPP dataplane counterparts. IPv4 and IPv6 traffic, and associated protocols
like ARP and IPv6 Neighbor Discovery, can now be handled by Linux, while the heavy lifting of packet
forwarding is done by the VPP dataplane. Or, said another way: this plugin will allow Linux to use
VPP as a software ASIC for fast forwarding, filtering, NAT, and so on, while keeping control of the
interface state (links, addresses and routes) itself. When the plugin is completed, running software
like [FRR](https://frrouting.org/) or [Bird](https://bird.network.cz/) on top of VPP and
achieving >100Mpps and >100Gbps forwarding rates will be well in reach!

In the previous post, I added support for VPP to consume Netlink messages that describe interfaces,
IP addresses and ARP/ND neighbor changes. This post completes the table-stakes Netlink handler by
adding IPv4 and IPv6 route messages, and ends up with a router in the DFZ consuming 133K IPv6
prefixes and 870K IPv4 prefixes.

## My test setup

The goal of this post is to show what code needed to be written to extend the **Netlink Listener**
plugin I wrote in the [fourth post]({{< ref "2021-08-25-vpp-4" >}}), so that it can consume
route additions/deletions, a thing that is common in dynamic routing protocols such as OSPF and
BGP.

The setup from my [third post]({{< ref "2021-08-15-vpp-3" >}}) is still there, but it's no longer
a focal point for me. I use it (the regular interface + subints and the BondEthernet + subints)
just to ensure my new code doesn't introduce a regression.

Instead, I'm creating two VLAN interfaces now:

- The first is in my home network's _servers_ VLAN. There are three OSPF speakers there:
  - `chbtl0.ipng.ch` and `chbtl1.ipng.ch` are my main routers, they run DANOS and are in
    the Default Free Zone (or DFZ for short).
  - `rr0.chbtl0.ipng.ch` is one of AS50869's three route-reflectors. Every one of the 13
    routers in AS50869 exchanges BGP information with these, and it cuts down on the total
    number of iBGP sessions I have to maintain -- see [here](https://networklessons.com/bgp/bgp-route-reflector)
    for details on Route Reflectors.
- The second is an L2 connection to a local BGP exchange with three members: IPng Networks
  AS50869, Openfactory AS58299, and Stucchinet AS58280. In this VLAN, Openfactory was so kind
  as to configure a full transit session for me, and I'll use it in my test bench.

The test setup offers me the ability to consume OSPF, OSPFv3 and BGP.

### Starting point

Based on the state of the plugin after the [fourth post]({{< ref "2021-08-25-vpp-4" >}}),
operators can create VLANs (including .1q, .1ad, QinQ and QinAD subinterfaces) directly in
Linux. They can change link attributes (like set admin state 'up' or 'down', or change
the MTU on a link), they can add/remove IP addresses, and the system will add/remove IPv4
and IPv6 neighbors. But notably, the following Netlink messages are not yet consumed, as shown
by the following example:

```
pim@hippo:~/src/lcpng$ sudo ip link add link e1 name servers type vlan id 101
pim@hippo:~/src/lcpng$ sudo ip link set servers mtu 1500 up
pim@hippo:~/src/lcpng$ sudo ip addr add 194.1.163.86/27 dev servers
pim@hippo:~/src/lcpng$ sudo ip ro add default via 194.1.163.65
```

which handles the first three commands just fine, but ignores the fourth:

```
linux-cp/nl [debug ]: dispatch: ignored route/route: add family inet type 1 proto 3
table 254 dst 0.0.0.0/0 nexthops { gateway 194.1.163.65 idx 197 }
```

In this post, I'll implement that last missing piece in two functions called `lcp_nl_route_add()`
and `lcp_nl_route_del()`. Here we go!

## Netlink Routes

Reusing the approach from the work-in-progress [[Gerrit](https://gerrit.fd.io/r/c/vpp/+/31122)],
I introduce two FIB sources: one for manual routes (ie. the ones that an operator might set with
`ip route add`), and another one for dynamic routes (ie. what a routing protocol like Bird or FRR
might set). This is handled in `lcp_nl_proto_fib_source()`. Next, I need a bunch of helper
functions that can translate the Netlink message information into VPP primitives:

- `lcp_nl_mk_addr46()` converts a Netlink `nl_addr` to a VPP `ip46_address_t`.
- `lcp_nl_mk_route_prefix()` converts a Netlink `rtnl_route` to a VPP `fib_prefix_t`.
- `lcp_nl_mk_route_mprefix()` converts a Netlink `rtnl_route` to a VPP `mfib_prefix_t` (for
  multicast routes).
- `lcp_nl_mk_route_entry_flags()` generates `fib_entry_flag_t` from the Netlink route type,
  table and proto metadata.
- `lcp_nl_proto_fib_source()` selects the most appropriate FIB source by looking at the
  `rt_proto` field from the Netlink message (see `/etc/iproute2/rt_protos` for a list of
  these). Anything **RTPROT_STATIC** or better is `fib_src`, while anything above that
  becomes `fib_src_dynamic` -- see the sketch after this list.
- `lcp_nl_route_path_parse()` converts a Netlink `rtnl_nexthop` to a VPP `fib_route_path_t`
  and adds that to a growing list of paths. Just as Netlink's nexthops form a list, so do
  the individual paths in VPP, so the two line up perfectly.
- `lcp_nl_route_path_add_special()` adds a blackhole/unreach/prohibit route to the list
  of paths, in the special case where there is not yet a path for the destination.

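To make that split concrete, here's a minimal sketch of such a selector, assuming `fib_src` and
`fib_src_dynamic` are `fib_source_t` handles registered at plugin initialization; the `RTPROT_*`
values come from `linux/rtnetlink.h`:

```c
#include <linux/rtnetlink.h> /* RTPROT_STATIC and friends */

/* Sketch: routes at or below RTPROT_STATIC (redirect, kernel, boot,
 * static) count as manual; higher values (bird, zebra, bgp, ...) are
 * dynamic. fib_src and fib_src_dynamic are assumed to be fib_source_t
 * handles registered elsewhere in the plugin. */
static fib_source_t
lcp_nl_proto_fib_source (uint8_t rt_proto)
{
  return (rt_proto <= RTPROT_STATIC) ? fib_src : fib_src_dynamic;
}
```
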

With these helpers, I have enough to manipulate VPP's forwarding information base, or _FIB_
for short. But in VPP, the _FIB_ consists of any number of _tables_ (think of them as _VRFs_,
or Virtual Routing/Forwarding domains). So first, I need to add these:

- `lcp_nl_table_find()` selects the matching `{table-id,protocol}` (v4/v6) tuple from
  an internally kept hash of tables.
- `lcp_nl_table_add_or_lock()` creates a table in VPP if one with key `{table-id,protocol}`
  (v4/v6) hasn't been used yet, and stores it for future reference. Otherwise it increments
  a table reference counter so I know how many FIB entries VPP will have in this table --
  see the sketch after this list.
- `lcp_nl_table_unlock()` given a table, decreases the refcount on it, and if no more
  prefixes are in the table, removes it from VPP.

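The bookkeeping is plain reference counting. A sketch of the add-or-lock path follows, with a
hypothetical table structure and an assumed `lcp_nl_table_alloc()` helper;
`fib_table_find_or_create_and_lock()` is VPP's real FIB API:

```c
/* Hypothetical bookkeeping entry for a {table-id, protocol} tuple. */
typedef struct
{
  uint32_t table_id;    /* e.g. 254 for the Linux 'main' table */
  fib_protocol_t proto; /* FIB_PROTOCOL_IP4 or FIB_PROTOCOL_IP6 */
  uint32_t fib_index;   /* VPP's index for this FIB table */
  uint32_t refcount;    /* how many FIB entries we put in it */
} lcp_nl_table_t;

static lcp_nl_table_t *
lcp_nl_table_add_or_lock (uint32_t table_id, fib_protocol_t proto)
{
  lcp_nl_table_t *t = lcp_nl_table_find (table_id, proto);

  if (!t)
    {
      /* First prefix in this table: create it in VPP and remember it. */
      t = lcp_nl_table_alloc (table_id, proto); /* assumed helper */
      t->fib_index =
        fib_table_find_or_create_and_lock (proto, table_id, fib_src);
    }
  t->refcount++;
  return t;
}
```
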

All of this code was heavily inspired by the pending [[Gerrit](https://gerrit.fd.io/r/c/vpp/+/31122)],
but a few finishing touches were added, and it's all wrapped up in this
[[commit](https://git.ipng.ch/ipng/lcpng/commit/7a76498277edc43beaa680e91e3a0c1787319106)].

### Deletion

Our main function `lcp_nl_route_del()` removes a route from the given table-id/protocol.
I do this by applying `rtnl_route_foreach_nexthop()` callbacks to the list of Netlink message
nexthops, converting each of them into VPP paths in a `lcp_nl_route_path_parse_t` structure.
If the route is for unreachable/blackhole/prohibit in Linux, I add that path too.

Then, I remove the VPP paths from the FIB and decrease the refcount, removing the table if
it's empty. This is reasonably straightforward; a sketch of the flow follows below.

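Condensed to its essence, the flow could look like this sketch, not the actual plugin code. The
libnl and VPP FIB calls (`rtnl_route_foreach_nexthop()`, `rtnl_route_get_table()`,
`rtnl_route_get_protocol()`, `fib_table_entry_path_remove2()`) are real APIs; the `lcp_nl_*`
helpers and the `paths` member are assumed from the descriptions above:

```c
/* Sketch of the deletion flow; error handling elided. */
static void
lcp_nl_route_del (struct rtnl_route *rr)
{
  fib_prefix_t pfx;
  lcp_nl_route_path_parse_t np = { 0 };
  lcp_nl_table_t *t;

  lcp_nl_mk_route_prefix (rr, &pfx);
  t = lcp_nl_table_find (rtnl_route_get_table (rr), pfx.fp_proto);
  if (!t)
    return; /* we never added anything to this table */

  /* Convert each Netlink nexthop into a VPP fib_route_path_t ... */
  rtnl_route_foreach_nexthop (rr, lcp_nl_route_path_parse, &np);
  /* ... and unreachable/blackhole/prohibit, which carry no nexthop. */
  lcp_nl_route_path_add_special (rr, &np);

  fib_table_entry_path_remove2 (
    t->fib_index, &pfx,
    lcp_nl_proto_fib_source (rtnl_route_get_protocol (rr)), np.paths);
  lcp_nl_table_unlock (t);
}
```
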
### Addition

Adding routes to the FIB is done with `lcp_nl_route_add()`. It immediately becomes obvious
that not all routes are relevant for VPP. A prime example are those in table 255: they are
'local' routes, which have already been set up by the IPv4 and IPv6 address addition functions
in VPP. There are some other route types that are invalid, so I'll just skip those.

Link-local IPv6 and IPv6 multicast routes are also skipped, because they're added when interfaces
get their IP addresses configured. But for the other routes, similar to deletion, I'll extract
the paths from the Netlink message's nexthops list, by constructing an `lcp_nl_route_path_parse_t`
by walking those Netlink nexthops, and optionally add a _special_ route (in case the route was
for unreachable/blackhole/prohibit in Linux -- those won't have a nexthop).

Then, I insert the VPP paths found in the Netlink message into the FIB or the multicast FIB,
respectively.

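To make the early-outs concrete, here's a minimal sketch; the helper name is hypothetical, and the
`RTN_*` constants come from `linux/rtnetlink.h` (table 255 is the kernel's 'local' table):

```c
#include <linux/rtnetlink.h>     /* RTN_UNICAST, RTN_BLACKHOLE, ... */
#include <netlink/route/route.h> /* rtnl_route_get_table(), _get_type() */

/* Sketch of the early-out filtering in lcp_nl_route_add(). */
static int
lcp_nl_route_is_relevant (struct rtnl_route *rr)
{
  uint32_t table_id = rtnl_route_get_table (rr);
  uint8_t type = rtnl_route_get_type (rr);

  /* Table 255 ('local') is maintained by the kernel and already
   * mirrored into VPP by the address-addition code path. */
  if (table_id == 255)
    return 0;

  /* Only unicast and the blackhole/unreach/prohibit specials make
   * sense for the VPP FIB; other types are skipped. */
  return (type == RTN_UNICAST || type == RTN_BLACKHOLE
          || type == RTN_UNREACHABLE || type == RTN_PROHIBIT);
}
```
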
## Control Plane: Bird

So with this newly added code, the example above of setting a default route springs to life.
But I can do better! At IPng Networks, my routing suite of choice is Bird2, and I have some
code to generate configurations for it and push those configs safely to routers. So, let's
take a closer look at a configuration on the test machine running VPP + Linux CP with this
new Netlink route handler.

```
router id 194.1.163.86;
protocol device { scan time 10; }
protocol direct { ipv4; ipv6; check link yes; }
```

These first two protocols are internal implementation details. The first, called _device_,
periodically scans the network interface list in Linux, to pick up new interfaces. You can
compare it to issuing `ip link` and acting on additions/removals as they occur. The second,
called _direct_, generates directly connected routes for interfaces that have IPv4 or IPv6
addresses configured. It turns out that if I add `194.1.163.86/27` as an IPv4 address on
an interface, it'll generate several Netlink messages: one for the `RTM_NEWADDR` which
I discussed in my [fourth post]({{< ref "2021-08-25-vpp-4" >}}), and also a `RTM_NEWROUTE`
for the connected `194.1.163.64/27` in this case. It helps the kernel understand that if
we want to send a packet to a host in that prefix, we should not send it to the default
gateway, but rather to a nexthop of the device. Those are interchangeably called `direct`
or `connected` routes. Ironically, these are called `RTS_DEVICE` routes in Bird2
[ref](https://github.com/BIRD/bird/blob/master/nest/route.h#L373) even though they are
generated by the `direct` routing protocol.

That brings me to the third protocol, one for each address type:

```
protocol kernel kernel4 {
  ipv4 {
    import all;
    export where source != RTS_DEVICE;
  };
}

protocol kernel kernel6 {
  ipv6 {
    import all;
    export where source != RTS_DEVICE;
  };
}
```

We're asking Bird to import any route it learns from the kernel, and we're asking it to
export any route that's not an `RTS_DEVICE` route. The reason for this is that when we
create IPv4/IPv6 addresses, the `ip` command already adds the connected route, and this
keeps Bird from inserting a second, identical route for those connected routes. And with
that, I have a very simple view, given for example these two interfaces:

```
pim@hippo:~/src/lcpng$ sudo ip netns exec dataplane ip route
45.129.224.232/29 dev ixp proto kernel scope link src 45.129.224.235
194.1.163.64/27 dev servers proto kernel scope link src 194.1.163.86

pim@hippo:~/src/lcpng$ sudo ip netns exec dataplane ip -6 route
2a0e:5040:0:2::/64 dev ixp proto kernel metric 256 pref medium
2001:678:d78:3::/64 dev servers proto kernel metric 256 pref medium

pim@hippo:/etc/bird$ birdc show route
BIRD 2.0.7 ready.
Table master4:
45.129.224.232/29    unicast [direct1 20:48:55.547] * (240)
        dev ixp
194.1.163.64/27      unicast [direct1 20:48:55.547] * (240)
        dev servers

Table master6:
2a0e:5040:1001::/64  unicast [direct1 20:48:55.547] * (240)
        dev stucchi
2001:678:d78:3::/64  unicast [direct1 20:48:55.547] * (240)
        dev servers
```


## Control Plane: OSPF

Considering the `servers` network above has a few OSPF speakers in it, I will introduce this
router there as well. The configuration is very straightforward in Bird; let's just add
the OSPF and OSPFv3 protocols as follows:

```
protocol ospf v2 ospf4 {
  ipv4 { export where source = RTS_DEVICE; import all; };
  area 0 {
    interface "lo" { stub yes; };
    interface "servers" { type broadcast; cost 5; };
  };
}

protocol ospf v3 ospf6 {
  ipv6 { export where source = RTS_DEVICE; import all; };
  area 0 {
    interface "lo" { stub yes; };
    interface "servers" { type broadcast; cost 5; };
  };
}
```


Here, I tell OSPF to export all `connected` routes, and accept any route given to it. The only
difference between IPv4 and IPv6 is that the former uses OSPF version 2 of the protocol, and IPv6
uses version 3 of the protocol. And, as with the `kernel` routing protocol above, each instance
has to have its own unique name, so I make the obvious choice.

Within a few seconds, the OSPF Hello packets can be seen going out of the `servers` interface,
and adjacencies form shortly thereafter:

```
pim@hippo:~/src/lcpng$ sudo ip netns exec dataplane ip ro | wc -l
83
pim@hippo:~/src/lcpng$ sudo ip netns exec dataplane ip -6 ro | wc -l
74

pim@hippo:~/src/lcpng$ birdc show ospf nei ospf4
BIRD 2.0.7 ready.
ospf4:
Router ID       Pri     State       DTime   Interface  Router IP
194.1.163.3       1     Full/Other  39.588  servers    194.1.163.66
194.1.163.87      1     Full/DR     39.588  servers    194.1.163.87
194.1.163.4       1     Full/Other  39.588  servers    194.1.163.67

pim@hippo:~/src/lcpng$ birdc show ospf nei ospf6
BIRD 2.0.7 ready.
ospf6:
Router ID       Pri     State       DTime   Interface  Router IP
194.1.163.87      1     Full/DR     32.221  servers    fe80::5054:ff:feaa:2b24
194.1.163.3       1     Full/BDR    39.504  servers    fe80::9e69:b4ff:fe61:7679
194.1.163.4       1     2-Way/Other 38.357  servers    fe80::9e69:b4ff:fe61:a1dd
```


And all of these were inserted into the VPP forwarding information base. Take for example
the IPng router in Amsterdam, with loopback addresses `194.1.163.32` and `2001:678:d78::8`:

```
DBGvpp# show ip fib 194.1.163.32
ipv4-VRF:0, fib_index:0, flow hash:[src dst sport dport proto flowlabel ] epoch:0 flags:none locks:[adjacency:1, recursive-resolution:1, default-route:1, lcp-rt:1, nat-hi:2, ]
194.1.163.32/32 fib:0 index:70 locks:2
  lcp-rt-dynamic refs:1 src-flags:added,contributing,active,
    path-list:[49] locks:142 flags:shared,popular, uPRF-list:49 len:1 itfs:[16, ]
      path:[69] pl-index:49 ip4 weight=1 pref=32 attached-nexthop: oper-flags:resolved,
        194.1.163.67 TenGigabitEthernet3/0/1.3
      [@0]: ipv4 via 194.1.163.67 TenGigabitEthernet3/0/1.3: mtu:1500 next:5 flags:[] 9c69b461a1dd6805ca324615810000650800

 forwarding:   unicast-ip4-chain
  [@0]: dpo-load-balance: [proto:ip4 index:72 buckets:1 uRPF:49 to:[0:0]]
    [0] [@5]: ipv4 via 194.1.163.67 TenGigabitEthernet3/0/1.3: mtu:1500 next:5 flags:[] 9c69b461a1dd6805ca324615810000650800

DBGvpp# show ip6 fib 2001:678:d78::8
ipv6-VRF:0, fib_index:0, flow hash:[src dst sport dport proto flowlabel ] epoch:0 flags:none locks:[adjacency:1, default-route:1, ]
2001:678:d78::8/128 fib:0 index:130058 locks:2
  lcp-rt-dynamic refs:1 src-flags:added,contributing,active,
    path-list:[116] locks:220 flags:shared,popular, uPRF-list:106 len:1 itfs:[16, ]
      path:[141] pl-index:116 ip6 weight=1 pref=32 attached-nexthop: oper-flags:resolved,
        fe80::9e69:b4ff:fe61:a1dd TenGigabitEthernet3/0/1.3
      [@0]: ipv6 via fe80::9e69:b4ff:fe61:a1dd TenGigabitEthernet3/0/1.3: mtu:1500 next:5 flags:[] 9c69b461a1dd6805ca3246158100006586dd

 forwarding:   unicast-ip6-chain
  [@0]: dpo-load-balance: [proto:ip6 index:130060 buckets:1 uRPF:106 to:[0:0]]
    [0] [@5]: ipv6 via fe80::9e69:b4ff:fe61:a1dd TenGigabitEthernet3/0/1.3: mtu:1500 next:5 flags:[] 9c69b461a1dd6805ca3246158100006586dd
```


In the snippet above, we can see the Linux CP Netlink Listener plugin doing its work.
It found the right nexthop, the right interface, enabled the FIB entry, and marked it with the
correct _FIB_ source `lcp-rt-dynamic`. And, with OSPF and OSPFv3 now enabled, VPP has gained
visibility to all of my internal network:

```
pim@hippo:~/src/lcpng$ traceroute nlams0.ipng.ch
traceroute to nlams0.ipng.ch (2001:678:d78::8) from 2001:678:d78:3::86, 30 hops max, 24 byte packets
 1  chbtl1.ipng.ch (2001:678:d78:3::1)  0.3182 ms  0.2840 ms  0.1841 ms
 2  chgtg0.ipng.ch (2001:678:d78::2:4:2)  0.5473 ms  0.6996 ms  0.6836 ms
 3  chrma0.ipng.ch (2001:678:d78::2:0:1)  0.7700 ms  0.7693 ms  0.7692 ms
 4  defra0.ipng.ch (2001:678:d78::7)  6.6586 ms  6.6443 ms  6.9292 ms
 5  nlams0.ipng.ch (2001:678:d78::8)  12.8321 ms  12.9398 ms  12.6225 ms
```


## Control Plane: BGP

But the holy grail, and what got me started on this whole adventure, is to be able to participate in the
_Default Free Zone_ using BGP. So let's put these plugins to the test and load up a so-called _full table_,
which means: all the routing information needed to reach any part of the internet. As of August'21,
there are about 870'000 such prefixes for IPv4, and about 133'000 prefixes for IPv6. We passed the magic
1M number, which I'm sure makes some silicon vendors anxious, because lots of older kit in the field won't
scale beyond a certain size. VPP is totally immune to this problem, so here we go!

```
template bgp T_IBGP4 {
  local as 50869;
  neighbor as 50869;
  source address 194.1.163.86;
  ipv4 { import all; export none; next hop self on; };
};
protocol bgp rr4_frggh0 from T_IBGP4 { neighbor 194.1.163.140; }
protocol bgp rr4_chplo0 from T_IBGP4 { neighbor 194.1.163.148; }
protocol bgp rr4_chbtl0 from T_IBGP4 { neighbor 194.1.163.87; }

template bgp T_IBGP6 {
  local as 50869;
  neighbor as 50869;
  source address 2001:678:d78:3::86;
  ipv6 { import all; export none; next hop self ibgp; };
};
protocol bgp rr6_frggh0 from T_IBGP6 { neighbor 2001:678:d78:6::140; }
protocol bgp rr6_chplo0 from T_IBGP6 { neighbor 2001:678:d78:7::148; }
protocol bgp rr6_chbtl0 from T_IBGP6 { neighbor 2001:678:d78:3::87; }
```


And with these two blocks, I've added six new protocols -- three of them are IPv4 route-reflector
clients, and three of them are IPv6 ones. Once this configuration is committed, Bird will be able
to find these IP addresses due to the OSPF routes being loaded into the _FIB_, and once it does
that, each of the route-reflector servers will download a full routing table into Bird's memory,
and in turn Bird will use the `kernel4` and `kernel6` protocols to export them into Linux
(essentially performing an `ip ro add ... via ...` for each), and the kernel will then generate a
Netlink message, which the Linux CP **Netlink Listener** plugin will pick up and the rest, as they
say, is history.

I gotta tell you - the first time I saw this working end to end, I was elated. Just seeing blocks
of 6800-7000 of these being pumped into VPP's _FIB_ every 40ms was just .. magical. And the
performance is pretty good, too: 7000 messages per 40ms is 175K/sec, which means a VPP operator
can not only consume, but also program into the _FIB_, a full IPv4 and IPv6 table in about 6
seconds, whoa!

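A quick back-of-the-envelope check on those numbers, using the full-table sizes quoted earlier:

```latex
% ~7000 messages per 40 ms batch:
\frac{7000\ \text{msgs}}{40\ \text{ms}} = 175\,000\ \text{msgs/s}
% ~870k IPv4 + ~133k IPv6 prefixes:
\frac{870\,000 + 133\,000\ \text{prefixes}}{175\,000\ \text{msgs/s}} \approx 5.7\ \text{s}
```
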
```
DBGvpp#
linux-cp/nl [warn ]: process_msgs: Processed 6550 messages in 40001 usecs, 2607 left in queue
linux-cp/nl [warn ]: process_msgs: Processed 6368 messages in 40000 usecs, 7012 left in queue
linux-cp/nl [warn ]: process_msgs: Processed 6460 messages in 40001 usecs, 13163 left in queue
...
linux-cp/nl [warn ]: process_msgs: Processed 6418 messages in 40004 usecs, 93606 left in queue
linux-cp/nl [warn ]: process_msgs: Processed 6438 messages in 40002 usecs, 96944 left in queue
linux-cp/nl [warn ]: process_msgs: Processed 6575 messages in 40002 usecs, 99986 left in queue
linux-cp/nl [warn ]: process_msgs: Processed 6552 messages in 40004 usecs, 94767 left in queue
linux-cp/nl [warn ]: process_msgs: Processed 5890 messages in 40001 usecs, 88877 left in queue
linux-cp/nl [warn ]: process_msgs: Processed 6829 messages in 40003 usecs, 82048 left in queue
...
linux-cp/nl [warn ]: process_msgs: Processed 6685 messages in 40004 usecs, 13576 left in queue
linux-cp/nl [warn ]: process_msgs: Processed 6701 messages in 40003 usecs, 6893 left in queue
linux-cp/nl [warn ]: process_msgs: Processed 6579 messages in 40003 usecs, 314 left in queue
DBGvpp#
```


Due to a good cooperative multitasking approach in the Netlink message queue producer, I
continuously read Netlink messages from the kernel and put them in a queue, but only consume
40ms worth or 8000 messages, whichever comes first, after which I yield control back to VPP.
So you can see here that when the kernel is flooding the Netlink messages of the learned BGP
routing table, the plugin correctly consumes what it can, the queue grows (in this case to just
about 100K messages) and then quickly shrinks again. A sketch of that consumer loop follows below.

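Here is a minimal sketch of such a budgeted consumer; the queue helpers (`lcp_nl_queue_pop()`,
`lcp_nl_dispatch()`) and the constants are hypothetical, while `vlib_time_now()` and libnl's
`nlmsg_free()` are real APIs:

```c
#include <vlib/vlib.h>   /* vlib_time_now(), vlib_get_main() */
#include <netlink/msg.h> /* struct nl_msg, nlmsg_free() */

#define LCP_NL_BATCH_MAX_MSGS  8000  /* assumed per-run message budget */
#define LCP_NL_BATCH_MAX_USECS 40000 /* assumed per-run time budget */

/* Sketch: drain at most 40ms / 8000 messages worth of queued Netlink
 * messages per invocation, then yield back to VPP's main loop; the
 * remainder stays queued for the next run. */
static void
lcp_nl_process_msgs (void)
{
  f64 start = vlib_time_now (vlib_get_main ());
  u32 n = 0;
  struct nl_msg *msg;

  while (n < LCP_NL_BATCH_MAX_MSGS
         && (msg = lcp_nl_queue_pop ()) != NULL) /* assumed helper */
    {
      lcp_nl_dispatch (msg); /* route/addr/link/neigh handling */
      nlmsg_free (msg);
      n++;
      if ((vlib_time_now (vlib_get_main ()) - start) * 1e6
          >= LCP_NL_BATCH_MAX_USECS)
        break; /* time budget exhausted */
    }
}
```
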

And indeed, Bird, IP and VPP all seem to agree, we did a good job:
```
pim@hippo:~/src/lcpng$ birdc show route count
BIRD 2.0.7 ready.
1741035 of 1741035 routes for 870479 networks in table master4
396518 of 396518 routes for 132479 networks in table master6
Total: 2137553 of 2137553 routes for 1002958 networks in 2 tables

pim@hippo:~/src/lcpng$ sudo ip netns exec dataplane ip -6 ro | wc -l
132430
pim@hippo:~/src/lcpng$ sudo ip netns exec dataplane ip ro | wc -l
870494

pim@hippo:~/src/lcpng$ vppctl sh ip6 fib sum | awk '$1~/[0-9]+/ { total += $2 } END { print total }'
132479
pim@hippo:~/src/lcpng$ vppctl sh ip fib sum | awk '$1~/[0-9]+/ { total += $2 } END { print total }'
870529
```


## Results

The functional regression test I made on day one, the one that ensures end-to-end connectivity to and
from the Linux host interfaces works for all 5 interface types (untagged, .1q tagged, QinQ, .1ad tagged
and QinAD) and for both physical and virtual interfaces (like `TenGigabitEthernet3/0/0` and `BondEthernet0`),
still works. Great.

Here's a screencast [[asciinema](/assets/vpp/432943.cast), [gif](/assets/vpp/432942.gif)] showing me
playing around a bit with the configuration shown above, demonstrating that RIB and FIB
synchronisation works pretty well in both directions, making the combination of these two plugins
sufficient to run a VPP router in the _Default Free Zone_. Whoohoo!

{{< asciinema src="/assets/vpp/432943.cast" >}}

### Future work

**Atomic Updates** - When running VPP + Linux CP in a default-free-zone BGP environment,
IPv4 and IPv6 prefixes will be constantly updated as the internet topology morphs and changes.
One thing I noticed is that those are often deletes followed by adds with the exact same
nexthop (ie. something in Germany flapped, and this is not deduplicated), which shows up
as many pairs of messages like so:

```
linux-cp/nl [debug ]: route_del: netlink route/route: del family inet6 type 1 proto 12 table 254 dst 2a10:cc40:b03::/48 nexthops { gateway fe80::9e69:b4ff:fe61:a1dd idx 197 }
linux-cp/nl [debug ]: route_path_parse: path ip6 fe80::9e69:b4ff:fe61:a1dd, TenGigabitEthernet3/0/1.3, []
linux-cp/nl [info ]: route_del: table 254 prefix 2a10:cc40:b03::/48 flags
linux-cp/nl [debug ]: route_add: netlink route/route: add family inet6 type 1 proto 12 table 254 dst 2a10:cc40:b03::/48 nexthops { gateway fe80::9e69:b4ff:fe61:a1dd idx 197 }
linux-cp/nl [debug ]: route_path_parse: path ip6 fe80::9e69:b4ff:fe61:a1dd, TenGigabitEthernet3/0/1.3, []
linux-cp/nl [info ]: route_add: table 254 prefix 2a10:cc40:b03::/48 flags
linux-cp/nl [info ]: process_msgs: Processed 2 messages in 225 usecs
```


See how `2a10:cc40:b03::/48` is first removed, and then immediately reinstalled with the exact same
nexthop `fe80::9e69:b4ff:fe61:a1dd` on interface `TenGigabitEthernet3/0/1.3`? Although it only takes
225µs, it's still a bit sad to parse and create paths, just to remove the entry from the FIB and
re-insert the exact same thing. But more importantly, if a packet destined for this prefix arrives
in that 225µs window, it will be lost. So I think I'll build a peek-ahead mechanism to capture
specifically this occurrence, and let the two del+add messages cancel each other out -- see the
sketch below.

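Such a peek-ahead could look like the following sketch; `lcp_nl_queue_peek()`,
`lcp_nl_queue_drop()` and `lcp_nl_route_identical()` are hypothetical helpers, while
`nl_object_get_msgtype()` is libnl's real accessor:

```c
#include <linux/rtnetlink.h> /* RTM_NEWROUTE, RTM_DELROUTE */
#include <netlink/object.h>  /* nl_object_get_msgtype() */

/* Sketch: before executing an RTM_DELROUTE, peek at the next queued
 * message; if it is an RTM_NEWROUTE for the same prefix with the same
 * nexthops, drop both and leave the FIB entry untouched. */
static int
lcp_nl_try_cancel_del_add (struct nl_object *del)
{
  struct nl_object *next = lcp_nl_queue_peek (); /* assumed helper */

  if (next && nl_object_get_msgtype (del) == RTM_DELROUTE
      && nl_object_get_msgtype (next) == RTM_NEWROUTE
      && lcp_nl_route_identical (del, next)) /* prefix + paths match */
    {
      lcp_nl_queue_drop (); /* consume the add without applying it */
      return 1;             /* tell the caller to skip the delete, too */
    }
  return 0;
}
```
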
**Prefix updates towards lo** - When writing the code, I borrowed a bunch from the
pending [[Gerrit](https://gerrit.fd.io/r/c/vpp/+/31122)], but that one has a nasty crash which was
hard to debug and which I haven't yet fully understood. It happens when an add/del occurs for a
route towards IPv6 localhost (these are typically seen when Bird shuts down eBGP sessions: if it
no longer has a path to a prefix, it'll mark the prefix as 'unreachable' rather than deleting it).
These are *additions* which have a nexthop without a gateway but with an interface index of 1
(which, in Netlink, is 'lo'). This makes VPP intermittently crash, so I have currently commented
this out while I gain a better understanding. As a result, blackhole/unreachable/prohibit specials
can not be set using the plugin. Beware!
(disabled in this [[commit](https://git.ipng.ch/ipng/lcpng/commit/7c864ed099821f62c5be8cbe9ed3f4dd34000a42)]).

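Until then, the guard that skips these messages could be as small as this sketch (the helper name
is hypothetical; the libnl accessors are real):

```c
#include <netlink/route/nexthop.h> /* rtnl_route_nh_get_*() */

/* Sketch: detect the problematic 'unreachable via lo' nexthop shape,
 * ie. no gateway and interface index 1 ('lo' in Netlink), so the
 * caller can skip it instead of crashing VPP. */
static int
lcp_nl_nh_is_lo_special (struct rtnl_nexthop *nh)
{
  return (rtnl_route_nh_get_gateway (nh) == NULL
          && rtnl_route_nh_get_ifindex (nh) == 1);
}
```
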
## Credits

I'd like to make clear that the Linux CP plugin is a collaboration between several great minds,
and that my work stands on other software engineers' shoulders. In particular, most of the Netlink
socket handling and Netlink message queueing was written by Matthew Smith, and I've had a little bit
of help along the way from Neale Ranns and Jon Loeliger. I'd like to thank them for their work!

## Appendix

#### VPP config

We only use one TenGigabitEthernet device on the router, and create two VLANs on it:

```
IP="sudo ip netns exec dataplane ip"

vppctl set logging class linux-cp rate-limit 1000 level warn syslog-level notice
vppctl lcp create TenGigabitEthernet3/0/1 host-if e1 netns dataplane
$IP link set e1 mtu 1500 up

$IP link add link e1 name ixp type vlan id 179
$IP link set ixp mtu 1500 up
$IP addr add 45.129.224.235/29 dev ixp
$IP addr add 2a0e:5040:0:2::235/64 dev ixp

$IP link add link e1 name servers type vlan id 101
$IP link set servers mtu 1500 up
$IP addr add 194.1.163.86/27 dev servers
$IP addr add 2001:678:d78:3::86/64 dev servers
```


#### Bird config

I'm using a purposefully minimalist configuration for demonstration purposes, posted here
in full for posterity:

```
log syslog all;
log "/var/log/bird/bird.log" { debug, trace, info, remote, warning, error, auth, fatal, bug };

router id 194.1.163.86;

protocol device { scan time 10; }
protocol direct { ipv4; ipv6; check link yes; }
protocol kernel kernel4 { ipv4 { import all; export where source != RTS_DEVICE; }; }
protocol kernel kernel6 { ipv6 { import all; export where source != RTS_DEVICE; }; }

protocol ospf v2 ospf4 {
  ipv4 { export where source = RTS_DEVICE; import all; };
  area 0 {
    interface "lo" { stub yes; };
    interface "servers" { type broadcast; cost 5; };
  };
}

protocol ospf v3 ospf6 {
  ipv6 { export where source = RTS_DEVICE; import all; };
  area 0 {
    interface "lo" { stub yes; };
    interface "servers" { type broadcast; cost 5; };
  };
}

template bgp T_IBGP4 {
  local as 50869;
  neighbor as 50869;
  source address 194.1.163.86;
  ipv4 { import all; export none; next hop self on; };
};
protocol bgp rr4_frggh0 from T_IBGP4 { neighbor 194.1.163.140; }
protocol bgp rr4_chplo0 from T_IBGP4 { neighbor 194.1.163.148; }
protocol bgp rr4_chbtl0 from T_IBGP4 { neighbor 194.1.163.87; }

template bgp T_IBGP6 {
  local as 50869;
  neighbor as 50869;
  source address 2001:678:d78:3::86;
  ipv6 { import all; export none; next hop self ibgp; };
};
protocol bgp rr6_frggh0 from T_IBGP6 { neighbor 2001:678:d78:6::140; }
protocol bgp rr6_chplo0 from T_IBGP6 { neighbor 2001:678:d78:7::148; }
protocol bgp rr6_chbtl0 from T_IBGP6 { neighbor 2001:678:d78:3::87; }
```


#### Final note

You may have noticed that the [commit] links are all to git commits in my private working copy. I
want to wait until my [previous work](https://gerrit.fd.io/r/c/vpp/+/33481) is reviewed and
submitted before piling on more changes. Feel free to contact vpp-dev@ for more information in the
mean time :-)