---
date: "2023-05-21T11:01:14Z"
title: VPP MPLS - Part 3
aliases:
- /s/articles/2023/05/21/vpp-mpls-3.html
---

{{< image width="200px" float="right" src="/assets/vpp/fdio-color.svg" alt="VPP" >}}

# About this series

**Special Thanks**: Adrian _vifino_ Pistol for writing this code and for the wonderful collaboration!

Ever since I first saw VPP - the Vector Packet Processor - I have been deeply impressed with its
performance and versatility. For those of us who have used Cisco IOS/XR devices, like the classic
_ASR_ (aggregation service router), VPP will look and feel quite familiar as many of the approaches
are shared between the two.

In the [[first article]({{< ref "2023-05-07-vpp-mpls-1" >}})] of this series, I took a look at MPLS
in general, and how setting up static _Label Switched Paths_ can be done in VPP. A few details on
special case labels (such as _Implicit Null_, which enables the fabled _Penultimate Hop Popping_)
were missing, so I took a good look at them in the [[second article]({{< ref "2023-05-17-vpp-mpls-2" >}})] of the series.

This was all just good fun, but it also allowed me to buy some time for
[@vifino](https://chaos.social/@vifino) who has been implementing MPLS handling within the Linux
Control Plane plugin for VPP! This final article in the series shows the engineering considerations
that went into writing the plugin, which is currently under review but reasonably complete.
Considering the VPP 23.06 cutoff is next week, I'm not super hopeful that we'll be able to get a
full community / committer review in time, but at this point both @vifino and I think this code is
ready for consumption. Considering FRR has a good _Label Distribution Protocol_ daemon, I'll switch
out of my usual habitat of Bird and install a LAB with FRR.

Caveat emptor: outside of a modest functional and load-test, this MPLS functionality
hasn't seen a lot of mileage as it's only a few weeks old at this point, so it could definitely
contain some rough edges. Use at your own risk, but if you do want to discuss issues, the
[[vpp-dev@](mailto:vpp-dev@lists.fd.io)] mailing list is a good first stop.

## Introduction

MPLS support is fairly complete in VPP already, but programming the dataplane would require custom
integrations, while using the Linux netlink subsystem feels easier from an end-user point of view.
This is a technical deep dive into the implementation of MPLS in the Linux Control Plane plugin for
VPP. If you haven't already, now is a good time to read up on the initial implementation of LCP:

* [[Part 1]({{< ref "2021-08-12-vpp-1" >}})]: Punting traffic through TUN/TAP interfaces into Linux
* [[Part 2]({{< ref "2021-08-13-vpp-2" >}})]: Mirroring VPP interface configuration into Linux
* [[Part 3]({{< ref "2021-08-15-vpp-3" >}})]: Automatically creating sub-interfaces in Linux
* [[Part 4]({{< ref "2021-08-25-vpp-4" >}})]: Synchronize link state, MTU and addresses to Linux
* [[Part 5]({{< ref "2021-09-02-vpp-5" >}})]: Netlink Listener, synchronizing state from Linux to VPP
* [[Part 6]({{< ref "2021-09-10-vpp-6" >}})]: Observability with LibreNMS and VPP SNMP Agent
* [[Part 7]({{< ref "2021-09-21-vpp-7" >}})]: Productionizing and reference Supermicro fleet at IPng

To keep this writeup focused, I'll assume the anatomy of VPP plugins and the Linux Controlplane
_Interface_ and _Netlink_ plugins are understood. That way, I can focus on the _changes_ needed for
MPLS integration, which at first glance seem reasonably straightforward.

## VPP Linux-CP: Interfaces

First off, to enable any MPLS forwarding at all in VPP, I have to create the MPLS forwarding table
and enable MPLS on one or more interfaces:

```
vpp# mpls table add 0
vpp# lcp create GigabitEthernet10/0/0 host-if e0
vpp# set int mpls GigabitEthernet10/0/0 enable
```

When the Gi10/0/0 interface has a _Linux Interface Pair (LIP)_, there exists a
corresponding TAP interface in the dataplane (typically called `tapX`), which in turn appears on the
Linux side as `e0`. Linux will want to be able to send MPLS datagrams into `e0`, and for that, two
things must happen:

1. The Linux kernel must enable MPLS input on `e0`, typically with a sysctl.
1. VPP must enable MPLS on the TAP, in addition to the phy Gi10/0/0.

Therefore, the first order of business is to create a hook where the Linux CP interface plugin can
be made aware if MPLS is enabled or disabled in VPP. It turns out that such a callback function
_definition_ already exists, but it was never implemented. [[Gerrit
38826](https://gerrit.fd.io/r/c/vpp/+/38826)] adds a function `mpls_interface_state_change_add_callback()`,
which implements the ability to register a callback on MPLS on/off in VPP.

Now that the callback plumbing exists, Linux CP will want to register one of these, so that it can
set MPLS to the same enabled or disabled state on the Linux interface using
`/proc/sys/net/mpls/conf/${host-if}/input` (which is the moral equivalent of running `sysctl`), and
it'll also call `mpls_sw_interface_enable_disable()` on the TAP interface. With both of these changes
implemented, enabling MPLS now looks like this in the logs:

```
linux-cp/mpls-sync: sync_state_cb: called for sw_if_index 1
linux-cp/mpls-sync: sync_state_cb: mpls enabled 1 parent itf-pair: [1] GigabitEthernet10/0/0 tap2 e0 97 type tap netns dataplane
linux-cp/mpls-sync: sync_state_cb: called for sw_if_index 8
linux-cp/mpls-sync: sync_state_cb: set mpls input for e0
```

Take a look at the code that implements enable/disable semantics in `src/plugins/linux-cp/lcp_mpls_sync.c`.

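To make the callback plumbing a bit more concrete, here is a toy, self-contained sketch of the pattern in plain C. It is not the code from Gerrit 38826: the names `mpls_state_add_callback()`, `mpls_state_notify()`, `lip_host_if()` and the fixed-size listener array are hypothetical stand-ins, and the example listener only records which sysctl path it would write rather than touching `/proc`.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical callback type: invoked with (sw_if_index, is_enable). */
typedef void (*mpls_state_cb_t) (unsigned sw_if_index, int is_enable);

#define MAX_MPLS_STATE_CBS 8
static mpls_state_cb_t mpls_state_cbs[MAX_MPLS_STATE_CBS];
static size_t n_mpls_state_cbs;

/* Register a listener, in the spirit of
 * mpls_interface_state_change_add_callback(). */
static int
mpls_state_add_callback (mpls_state_cb_t cb)
{
  if (n_mpls_state_cbs == MAX_MPLS_STATE_CBS)
    return -1;
  mpls_state_cbs[n_mpls_state_cbs++] = cb;
  return 0;
}

/* Called whenever MPLS is toggled on an interface: fan out to listeners. */
static void
mpls_state_notify (unsigned sw_if_index, int is_enable)
{
  for (size_t i = 0; i < n_mpls_state_cbs; i++)
    mpls_state_cbs[i](sw_if_index, is_enable);
}

/* Build e.g. /proc/sys/net/mpls/conf/e0/input for a host interface. */
static int
mpls_input_sysctl_path (char *buf, size_t len, const char *host_if)
{
  int n = snprintf (buf, len, "/proc/sys/net/mpls/conf/%s/input", host_if);
  return (n < 0 || (size_t) n >= len) ? -1 : 0;
}

/* Write "1" or "0" to a sysctl file (needs root and the mpls_router module). */
static int
mpls_write_sysctl (const char *path, int is_enable)
{
  FILE *f = fopen (path, "w");
  if (!f)
    return -1;
  int ok = fputs (is_enable ? "1\n" : "0\n", f) >= 0;
  return (fclose (f) == 0 && ok) ? 0 : -1;
}

/* Hypothetical LIP lookup: only sw_if_index 1 (the phy) has a pair, "e0". */
static const char *
lip_host_if (unsigned sw_if_index)
{
  return sw_if_index == 1 ? "e0" : NULL;
}

static char last_sysctl_path[128];
static int last_is_enable = -1;

/* Example listener. A real plugin would also enable MPLS on the TAP and
 * call mpls_write_sysctl(); this one only records what it would do. */
static void
lcp_mpls_listener (unsigned sw_if_index, int is_enable)
{
  const char *host_if = lip_host_if (sw_if_index);
  if (!host_if)
    return; /* not a LIP phy, e.g. the TAP itself */
  if (mpls_input_sysctl_path (last_sysctl_path, sizeof last_sysctl_path,
                              host_if) == 0)
    last_is_enable = is_enable;
}
```

In this sketch, calls for interfaces that have no LIP of their own, such as the TAP, are simply ignored by the listener.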
## VPP Linux-CP: Netlink

When Linux installs a route with MPLS labels, it will be seen in the return value of
`rtnl_route_nh_get_encap_mpls_dst()`. One or more labels can now be read using
`nl_addr_get_binary_addr()`, yielding `struct mpls_label`, which contains the label value, experimental
bits and TTL, and these can be added to the route path in VPP by converting them to
`fib_mpls_label_t`. The last label in the stack will have the S-bit set, so we can continue consuming these
until we find that condition. The first patchset that plays around with these semantics is
[[38702#2](https://gerrit.fd.io/r/c/vpp/+/38702/2)]. As you can see, MPLS is going to look very much
like IPv4 and IPv6 route updates in [[previous work]({{< ref "2021-09-02-vpp-5" >}})], in that they
take the Netlink representation, rewrite it into VPP representation, and update the FIB.

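The label-walking loop can be sketched as below, assuming the binary address holds an array of network-order 32-bit label stack entries laid out like Linux's `struct mpls_label` (20-bit label, 3-bit EXP/TC, S-bit, 8-bit TTL, per RFC 3032). The `label_t` struct and `decode_label_stack()` are hypothetical stand-ins for VPP's `fib_mpls_label_t` and the patch's own loop, not the actual implementation.

```c
#include <arpa/inet.h> /* ntohl, htonl */
#include <stddef.h>
#include <stdint.h>

/* Bit positions inside one 32-bit label stack entry (RFC 3032). */
#define LSE_LABEL_SHIFT 12
#define LSE_TC_SHIFT 9
#define LSE_S_SHIFT 8

typedef struct
{
  uint32_t value; /* 20-bit label */
  uint8_t exp;    /* 3-bit EXP / traffic class */
  uint8_t ttl;
} label_t; /* hypothetical stand-in for fib_mpls_label_t */

/* Walk a network-order label stack until the entry with S=1.
 * Returns the number of labels decoded, or -1 if no bottom-of-stack
 * entry was found within n_entries (or max_out). */
static int
decode_label_stack (const uint32_t *stack_be, size_t n_entries,
                    label_t *out, size_t max_out)
{
  for (size_t i = 0; i < n_entries && i < max_out; i++)
    {
      uint32_t e = ntohl (stack_be[i]);
      out[i].value = e >> LSE_LABEL_SHIFT;
      out[i].exp = (e >> LSE_TC_SHIFT) & 0x7;
      out[i].ttl = e & 0xff;
      if ((e >> LSE_S_SHIFT) & 0x1)
        return (int) (i + 1); /* bottom of stack reached */
    }
  return -1;
}
```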
Up until now, the Linux Controlplane netlink plugin understands only IPv4 and IPv6. So some
preparation work is called for:

* ***lcp_router_proto_k2f()*** gains the ability to map Linux `AF_*` into VPP's `FIB_PROTOCOL_*`.
* ***lcp_router_route_mk_prefix()*** turns into a switch statement that creates a `fib_prefix_t`
  for the MPLS address family, in addition to the existing IPv4 and IPv6 types. It uses the non-EOS
  type.
* ***lcp_router_mpls_nladdr_to_path()*** implements the loop that I described above, taking the
  stack of `struct mpls_label` from Netlink and turning them into a vector of `fib_mpls_label_t`
  for the VPP FIB.
* ***lcp_router_route_path_parse()*** becomes aware of MPLS SWAP and POP operations (the latter
  being the case if there are 0 labels in the Netlink label stack).
* ***lcp_router_fib_route_path_dup()*** is a helper function to make a copy of the FIB path
  for the EOS and non-EOS VPP FIB inserts.

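The first two bullets can be sketched like this, with hypothetical `proto_t` and `prefix_t` types in place of VPP's `fib_protocol_t` and `fib_prefix_t`. `AF_MPLS` is 28 on Linux; the fallback `#define` only exists for C libraries that don't expose it.

```c
#include <sys/socket.h> /* AF_INET, AF_INET6, usually AF_MPLS */

#ifndef AF_MPLS
#define AF_MPLS 28 /* Linux value */
#endif

/* Hypothetical mirror of FIB_PROTOCOL_IP4 / _IP6 / _MPLS. */
typedef enum
{
  PROTO_IP4,
  PROTO_IP6,
  PROTO_MPLS,
  PROTO_UNKNOWN
} proto_t;

/* Kernel address family to FIB protocol, like lcp_router_proto_k2f(). */
static proto_t
proto_k2f (int af)
{
  switch (af)
    {
    case AF_INET:
      return PROTO_IP4;
    case AF_INET6:
      return PROTO_IP6;
    case AF_MPLS:
      return PROTO_MPLS;
    default:
      return PROTO_UNKNOWN;
    }
}

typedef struct
{
  proto_t proto;
  unsigned label; /* only meaningful for PROTO_MPLS */
  int eos;        /* 1 = EOS (S=1), 0 = non-EOS (S=0) */
} prefix_t;

/* An MPLS prefix starts out as the non-EOS variant; the EOS twin is
 * derived later when the route is inserted. */
static prefix_t
mk_mpls_prefix (unsigned label)
{
  prefix_t p = { PROTO_MPLS, label, 0 };
  return p;
}
```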
The VPP FIB differentiates between entries that are non-EOS (S=0), and can treat them differently to
those which are EOS (end of stack, S=1). Linux does not make this distinction, so it's safest to
just install non-EOS **and** EOS entries for each route from Linux. This is why
`lcp_router_fib_route_path_dup()` exists: without a separate copy of the path for each entry,
Netlink route deletions for the MPLS routes would yield a double free later on.

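Why the deep copy matters, in toy form: if the EOS and non-EOS entries shared one heap-allocated label vector, tearing down both would free the same pointer twice. The `path_t` type and the `path_dup()`/`path_free()` helpers are hypothetical, standing in for `fib_route_path_t` and `lcp_router_fib_route_path_dup()`.

```c
#include <stdlib.h>
#include <string.h>

typedef struct
{
  unsigned *labels; /* heap-allocated out-label stack */
  size_t n_labels;
} path_t; /* hypothetical stand-in for fib_route_path_t */

/* Deep-copy a path so each FIB entry owns its own label vector. */
static int
path_dup (const path_t *src, path_t *dst)
{
  dst->n_labels = src->n_labels;
  dst->labels = NULL;
  if (src->n_labels == 0)
    return 0;
  dst->labels = malloc (src->n_labels * sizeof *src->labels);
  if (!dst->labels)
    return -1;
  memcpy (dst->labels, src->labels, src->n_labels * sizeof *src->labels);
  return 0;
}

/* Release a path; safe to call once per owner after path_dup(). */
static void
path_free (path_t *p)
{
  free (p->labels);
  p->labels = NULL;
  p->n_labels = 0;
}
```

With a shallow copy (`eos = neos;`), the two `path_free()` calls would hit the same pointer; with `path_dup()` each entry frees only its own vector.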
This prep work then allows for the following two main functions to become MPLS aware:

* ***lcp_router_route_add()***: when Linux sends a Netlink message about a new route, and that
  route carries MPLS labels, make a copy of the path for the EOS entry and proceed to insert
  both the non-EOS and newly created EOS entries into the FIB;
* ***lcp_router_route_del()***: when Linux sends a Netlink message about a deleted route, we can
  remove both the EOS and non-EOS variants of the route from VPP's FIB.

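The add/delete symmetry can be illustrated with a toy table keyed on (label, EOS bit). This is a sketch under that simplification only: `toy_fib`, `route_add()` and `route_del()` are hypothetical and stand in for the real VPP FIB calls made by `lcp_router_route_add()` and `lcp_router_route_del()`.

```c
#include <stddef.h>

#define TOY_FIB_SIZE 64

typedef struct
{
  unsigned label;
  int eos; /* 0 = non-EOS (S=0), 1 = EOS (S=1) */
  unsigned out_label;
  int in_use;
} toy_entry_t; /* hypothetical; the real MPLS FIB is far richer */

static toy_entry_t toy_fib[TOY_FIB_SIZE];

static toy_entry_t *
toy_find (unsigned label, int eos)
{
  for (size_t i = 0; i < TOY_FIB_SIZE; i++)
    if (toy_fib[i].in_use && toy_fib[i].label == label && toy_fib[i].eos == eos)
      return &toy_fib[i];
  return NULL;
}

/* Insert or update one (label, eos) entry. */
static int
toy_insert (unsigned label, int eos, unsigned out_label)
{
  toy_entry_t *e = toy_find (label, eos);
  for (size_t i = 0; !e && i < TOY_FIB_SIZE; i++)
    if (!toy_fib[i].in_use)
      e = &toy_fib[i];
  if (!e)
    return -1; /* table full */
  *e = (toy_entry_t){ label, eos, out_label, 1 };
  return 0;
}

/* A Netlink delete removes both variants. */
static void
route_del (unsigned in_label)
{
  for (int eos = 0; eos <= 1; eos++)
    {
      toy_entry_t *e = toy_find (in_label, eos);
      if (e)
        e->in_use = 0;
    }
}

/* A Netlink add installs both variants, or neither on failure. */
static int
route_add (unsigned in_label, unsigned out_label)
{
  if (toy_insert (in_label, 0, out_label) != 0)
    return -1;
  if (toy_insert (in_label, 1, out_label) != 0)
    {
      route_del (in_label);
      return -1;
    }
  return 0;
}
```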
## VPP Linux-CP: MPLS with FRR

{{< image src="/assets/vpp-mpls/LAB v2.svg" alt="Lab Setup" >}}

I finally get to show off @vifino's lab! It's based on a Debian Bookworm build, because
there are a few Netlink Library changes that haven't made their way into Debian Bullseye yet. The LAB
image is quickly built and distributed, and for this LAB I'm specifically choosing
[[FRR](https://frrouting.org/)] because it ships with a _Label Distribution Protocol_ daemon out of
the box.

The first order of business is to enable MPLS on the correct interfaces, and create the MPLS FIB table.
On each machine, I insert the following in the startup sequence:

```
ipng@vpp0-1:~$ cat << EOF | tee -a /etc/vpp/config/manual.vpp
mpls table add 0
set interface mpls GigabitEthernet10/0/0 enable
set interface mpls GigabitEthernet10/0/1 enable
EOF
```

The lab comes with OSPF and OSPFv3 enabled on each of the Gi10/0/0 and Gi10/0/1 interfaces that go
from East to West. This extra sequence enables MPLS on those interfaces, and because they have a
_Linux Interface Pair (LIP)_, VPP will enable MPLS on the internal TAP interfaces, as well as set
the Linux `sysctl` to allow the kernel to send MPLS encapsulated packets towards VPP.

Next up, turning on _LDP_ for FRR, which is easy enough:

```
ipng@vpp0-1:~$ vtysh
vpp0-1# conf t
vpp0-1(config)# mpls ldp
 router-id 192.168.10.1
 dual-stack cisco-interop
 ordered-control
 !
 address-family ipv4
  discovery transport-address 192.168.10.1
  label local advertise explicit-null
  interface e0
  interface e1
 exit-address-family
 !
 address-family ipv6
  discovery transport-address 2001:678:d78:200::1
  label local advertise explicit-null
  ttl-security disable
  interface e0
  interface e1
 exit-address-family
exit
```

I configure _LDP_ here to prefer advertising locally connected routes as _MPLS Explicit NULL_, which I
described in detail in the [[previous post]({{< ref "2023-05-17-vpp-mpls-2" >}})]. It tells the
penultimate router to send packets to this router as MPLS with label value 0,S=1 for IPv4 and value 2,S=1
for IPv6, so that VPP knows immediately to decapsulate the packet and continue to IPv4/IPv6 forwarding.
An alternative here is setting implicit-null, which instructs the router before this one to perform
_Penultimate Hop Popping_. If this is confusing, take a look at that article for reference!

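On the wire, explicit-null is nothing more than a reserved 20-bit label value in a 32-bit label stack entry. A small sketch, assuming the RFC 3032 layout; the helper names are hypothetical, and the receiving side is simplified to "pop and hand off to IPv4/IPv6 when S=1".

```c
#include <stdint.h>

/* Reserved label values (RFC 3032). */
enum
{
  LABEL_IPV4_EXPLICIT_NULL = 0,
  LABEL_IPV6_EXPLICIT_NULL = 2,
  LABEL_IMPLICIT_NULL = 3, /* only signalled, never sent: requests PHP */
};

/* Pack one entry: label(20) | TC(3) | S(1) | TTL(8). */
static uint32_t
lse_encode (uint32_t label, uint32_t tc, uint32_t s, uint32_t ttl)
{
  return ((label & 0xfffff) << 12) | ((tc & 0x7) << 9) | ((s & 0x1) << 8)
         | (ttl & 0xff);
}

/* What the penultimate router pushes when we advertise explicit-null:
 * label 0 for IPv4, label 2 for IPv6, bottom of stack. */
static uint32_t
explicit_null_entry (int is_ipv6, uint32_t ttl)
{
  return lse_encode (is_ipv6 ? LABEL_IPV6_EXPLICIT_NULL
                             : LABEL_IPV4_EXPLICIT_NULL,
                     0, 1, ttl);
}

/* Receiving side: 4 or 6 means "pop and do an IPv4/IPv6 lookup",
 * 0 means a regular MPLS lookup is needed. */
static int
explicit_null_next_proto (uint32_t entry)
{
  uint32_t label = entry >> 12, s = (entry >> 8) & 0x1;
  if (!s)
    return 0; /* more labels follow */
  if (label == LABEL_IPV4_EXPLICIT_NULL)
    return 4;
  if (label == LABEL_IPV6_EXPLICIT_NULL)
    return 6;
  return 0;
}
```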
Otherwise, I just give each router a transport-address of a loopback interface and a unique router-id,
the same as used in OSPF and OSPFv3, and we're off to the races. Just take a look at how easy this was:

```
vpp0-1# show mpls ldp discovery
AF   ID              Type  Source  Holdtime
ipv4 192.168.10.0    Link  e0      15
ipv4 192.168.10.2    Link  e1      15
ipv6 192.168.10.0    Link  e0      15
ipv6 192.168.10.2    Link  e1      15

vpp0-1# show mpls ldp neighbor
AF   ID              State        Remote Address          Uptime
ipv6 192.168.10.0    OPERATIONAL  2001:678:d78:200::      19:49:10
ipv6 192.168.10.2    OPERATIONAL  2001:678:d78:200::2     19:49:10
```

The first `show ... discovery` shows which interfaces are receiving multicast _LDP Hello Packets_,
and because I enabled discovery for both IPv4 and IPv6, I can see two pairs there. If I look at
which interfaces formed adjacencies, `show ... neighbor` reveals that LDP is preferring IPv6, and
that both adjacencies to `vpp0-0` and `vpp0-2` are operational. Awesome sauce!

I see _LDP_ neighbor adjacencies, so let me show you what label information was actually
exchanged, in three different places: **FRR**'s label distribution protocol daemon, **Linux**'s
IPv4, IPv6 and MPLS routing tables, and **VPP**'s dataplane forwarding information base.

### MPLS: FRR view

There are two things to note: the label bindings for the IPv4 and IPv6 prefixes, each of which is a
_Forwarding Equivalence Class (FEC)_, and the MPLS forwarding table, called the _MPLS FIB_:

```
vpp0-1# show mpls ldp binding
AF   Destination              Nexthop         Local Label Remote Label  In Use
ipv4 192.168.10.0/32          192.168.10.0    20          exp-null      yes
ipv4 192.168.10.1/32          0.0.0.0         exp-null    -             no
ipv4 192.168.10.2/32          192.168.10.2    16          exp-null      yes
ipv4 192.168.10.3/32          192.168.10.2    33          33            yes
ipv4 192.168.10.4/31          192.168.10.0    21          exp-null      yes
ipv4 192.168.10.6/31          192.168.10.0    exp-null    exp-null      no
ipv4 192.168.10.8/31          192.168.10.2    exp-null    exp-null      no
ipv4 192.168.10.10/31         192.168.10.2    17          exp-null      yes
ipv6 2001:678:d78:200::/128   192.168.10.0    18          exp-null      yes
ipv6 2001:678:d78:200::1/128  0.0.0.0         exp-null    -             no
ipv6 2001:678:d78:200::2/128  192.168.10.2    31          exp-null      yes
ipv6 2001:678:d78:200::3/128  192.168.10.2    38          34            yes
ipv6 2001:678:d78:210::/60    0.0.0.0         48          -             no
ipv6 2001:678:d78:210::/128   0.0.0.0         39          -             no
ipv6 2001:678:d78:210::1/128  0.0.0.0         40          -             no
ipv6 2001:678:d78:210::2/128  0.0.0.0         41          -             no
ipv6 2001:678:d78:210::3/128  0.0.0.0         42          -             no

vpp0-1# show mpls table
Inbound Label  Type  Nexthop                   Outbound Label
------------------------------------------------------------------
16             LDP   192.168.10.9              IPv4 Explicit Null
17             LDP   192.168.10.9              IPv4 Explicit Null
18             LDP   fe80::5054:ff:fe00:1001   IPv6 Explicit Null
19             LDP   fe80::5054:ff:fe00:1001   IPv6 Explicit Null
20             LDP   192.168.10.6              IPv4 Explicit Null
21             LDP   192.168.10.6              IPv4 Explicit Null
31             LDP   fe80::5054:ff:fe02:1000   IPv6 Explicit Null
32             LDP   fe80::5054:ff:fe02:1000   IPv6 Explicit Null
33             LDP   192.168.10.9              33
38             LDP   fe80::5054:ff:fe02:1000   34
```

In the first table, each entry of the IPv4 and IPv6 routing table, as fed by OSPF and OSPFv3,
gets a label associated with it. The negotiation of _LDP_ will ask our peer to set a
specific label, and it'll inform the peer on which label we are intending to use for the
_Label Switched Path_ towards that destination. I'll give two examples to illustrate how
this table is used:

1. This router (`vpp0-1`) has a peer `vpp0-0`, and when this router wants to send traffic to
   it, it'll be sent with `exp-null` (because `vpp0-0` is the last router in the _LSP_), but when
   other routers want to use this router to reach `vpp0-0`, they should use the MPLS
   label value 20.
1. This router (`vpp0-1`) is _not_ directly connected to `vpp0-3` and as such, its IPv4 and IPv6
   loopback addresses are going to contain labels in both directions: if `vpp0-1` itself
   wants to send a packet to `vpp0-3`, it will use the remote label values 33 and 34 respectively.
   However, if other routers want to use this router to reach `vpp0-3`, they should use the
   local label values 33 and 38 respectively.

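Both examples boil down to one rule: traffic this router originates gets the *remote* label pushed, and traffic arriving with our *local* label gets swapped to the remote one. A toy sketch using three rows of the binding table above, with `exp-null` stored as label 0 since the only such row included is IPv4; the types and functions are hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define NO_LABEL (-1)

typedef struct
{
  const char *fec;  /* prefix, as printed by FRR */
  int local_label;  /* what we advertise: upstream peers send us this */
  int remote_label; /* what the downstream peer advertised to us */
} binding_t;

/* A few rows from vpp0-1's "show mpls ldp binding" above. */
static const binding_t bindings[] = {
  { "192.168.10.0/32", 20, 0 }, /* remote exp-null (IPv4) = label 0 */
  { "192.168.10.3/32", 33, 33 },
  { "2001:678:d78:200::3/128", 38, 34 },
};

#define N_BINDINGS (sizeof bindings / sizeof bindings[0])

/* Ingress: traffic originated here for a FEC gets the remote label. */
static int
ingress_push_label (const char *fec)
{
  for (size_t i = 0; i < N_BINDINGS; i++)
    if (strcmp (bindings[i].fec, fec) == 0)
      return bindings[i].remote_label;
  return NO_LABEL;
}

/* Transit: a packet arriving with our local label is swapped to the
 * remote label of the same FEC. */
static int
transit_swap_label (int in_label)
{
  for (size_t i = 0; i < N_BINDINGS; i++)
    if (bindings[i].local_label == in_label)
      return bindings[i].remote_label;
  return NO_LABEL;
}
```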
The second table describes the MPLS _Forwarding Information Base (FIB)_. When receiving an
MPLS packet with an inbound label noted in this table, the operation applied is a _SWAP_ to the
outbound label, followed by forwarding towards the nexthop. This is the stuff that _P-Routers_ use when
transiting MPLS traffic.

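A SWAP on a single label stack entry can be sketched as follows, assuming the RFC 3032 layout and the common behavior of decrementing the TTL at each label switching router. Only the label and TTL fields change, so the TC and S bits survive the swap untouched. `lse_swap()` is a hypothetical helper, not VPP's implementation.

```c
#include <stdint.h>

/* Field masks of one label stack entry (RFC 3032). */
#define LSE_LABEL_MASK 0xfffff000u
#define LSE_TTL_MASK 0x000000ffu

/* Replace the label, keep TC and S, decrement TTL. Returns 1 on success,
 * or 0 (leaving *entry untouched) if the TTL would expire; a real router
 * would drop such a packet instead of forwarding it. */
static int
lse_swap (uint32_t *entry, uint32_t out_label)
{
  uint32_t ttl = *entry & LSE_TTL_MASK;
  if (ttl <= 1)
    return 0;
  *entry = ((out_label << 12) & LSE_LABEL_MASK)
           | (*entry & ~(LSE_LABEL_MASK | LSE_TTL_MASK)) /* TC + S */
           | (ttl - 1);
  return 1;
}
```

For example, the `38 -> 34` entry in the table above would turn an incoming `38,S=1,TTL=64` into `34,S=1,TTL=63` under this model.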
### MPLS: Linux view

FRR's LDP daemon will offer both of these routing tables to the Linux kernel using Netlink
messages, so the Linux view looks similar:

```
root@vpp0-1:~# ip ro
192.168.10.0 nhid 230 encap mpls 0 via 192.168.10.6 dev e0 proto ospf src 192.168.10.1 metric 20
192.168.10.2 nhid 226 encap mpls 0 via 192.168.10.9 dev e1 proto ospf src 192.168.10.1 metric 20
192.168.10.3 nhid 227 encap mpls 33 via 192.168.10.9 dev e1 proto ospf src 192.168.10.1 metric 20
192.168.10.4/31 nhid 230 encap mpls 0 via 192.168.10.6 dev e0 proto ospf src 192.168.10.1 metric 20
192.168.10.6/31 dev e0 proto kernel scope link src 192.168.10.7
192.168.10.8/31 dev e1 proto kernel scope link src 192.168.10.8
192.168.10.10/31 nhid 226 encap mpls 0 via 192.168.10.9 dev e1 proto ospf src 192.168.10.1 metric 20

root@vpp0-1:~# ip -6 ro
2001:678:d78:200:: nhid 231 encap mpls 2 via fe80::5054:ff:fe00:1001 dev e0 proto ospf src 2001:678:d78:200::1 metric 20 pref medium
2001:678:d78:200::1 dev loop0 proto kernel metric 256 pref medium
2001:678:d78:200::2 nhid 237 encap mpls 2 via fe80::5054:ff:fe02:1000 dev e1 proto ospf src 2001:678:d78:200::1 metric 20 pref medium
2001:678:d78:200::3 nhid 239 encap mpls 34 via fe80::5054:ff:fe02:1000 dev e1 proto ospf src 2001:678:d78:200::1 metric 20 pref medium
2001:678:d78:201::/112 nhid 231 encap mpls 2 via fe80::5054:ff:fe00:1001 dev e0 proto ospf src 2001:678:d78:200::1 metric 20 pref medium
2001:678:d78:201::1:0/112 dev e0 proto kernel metric 256 pref medium
2001:678:d78:201::2:0/112 dev e1 proto kernel metric 256 pref medium
2001:678:d78:201::3:0/112 nhid 237 encap mpls 2 via fe80::5054:ff:fe02:1000 dev e1 proto ospf src 2001:678:d78:200::1 metric 20 pref medium

root@vpp0-1:~# ip -f mpls ro
16 as to 0 via inet 192.168.10.9 dev e1 proto ldp
17 as to 0 via inet 192.168.10.9 dev e1 proto ldp
18 as to 2 via inet6 fe80::5054:ff:fe00:1001 dev e0 proto ldp
19 as to 2 via inet6 fe80::5054:ff:fe00:1001 dev e0 proto ldp
20 as to 0 via inet 192.168.10.6 dev e0 proto ldp
21 as to 0 via inet 192.168.10.6 dev e0 proto ldp
31 as to 2 via inet6 fe80::5054:ff:fe02:1000 dev e1 proto ldp
32 as to 2 via inet6 fe80::5054:ff:fe02:1000 dev e1 proto ldp
33 as to 33 via inet 192.168.10.9 dev e1 proto ldp
38 as to 34 via inet6 fe80::5054:ff:fe02:1000 dev e1 proto ldp
```

The first two tables show a 'regular' Linux routing table for IPv4 and IPv6 respectively, except there's
an `encap mpls <X>` added for all not-directly-connected prefixes. In this case, `vpp0-1` connects on
`e0` to `vpp0-0` to the West, and on interface `e1` to `vpp0-2` to the East. These connected routes do
not carry MPLS information, and in fact, this is how LDP can continue to work and exchange information
naturally even when no _LSPs_ are established yet.

The third table is the _MPLS FIB_, and it shows the special case of _MPLS Explicit NULL_ clearly. All IPv4
routes for which this router is the penultimate hop carry the outbound label value 0,S=1, while the IPv6
routes carry the value 2,S=1. Booyah!

### MPLS: VPP view

The _FIB_ information in general is super densely populated in VPP. Rather than dumping the whole table,
I'll show one example, for `192.168.10.3`, which we can see above will be encapsulated into an MPLS
packet with label value 33,S=1 before being forwarded:

```
root@vpp0-1:~# vppctl show ip fib 192.168.10.3
ipv4-VRF:0, fib_index:0, flow hash:[src dst sport dport proto flowlabel ] epoch:0 flags:none locks:[adjacency:1, default-route:1, lcp-rt:1, ]
192.168.10.3/32 fib:0 index:78 locks:2
  lcp-rt-dynamic refs:1 src-flags:added,contributing,active,
    path-list:[29] locks:6 flags:shared, uPRF-list:53 len:1 itfs:[2, ]
      path:[41] pl-index:29 ip4 weight=1 pref=20 attached-nexthop: oper-flags:resolved,
        192.168.10.9 GigabitEthernet10/0/1
      [@0]: ipv4 via 192.168.10.9 GigabitEthernet10/0/1: mtu:9000 next:7 flags:[] 5254000210005254000110010800
    Extensions:
      path:41 labels:[[33 pipe ttl:0 exp:0]]
 forwarding: unicast-ip4-chain
  [@0]: dpo-load-balance: [proto:ip4 index:81 buckets:1 uRPF:53 to:[2421:363846]]
    [0] [@13]: mpls-label[@4]:[33:64:0:eos]
        [@1]: mpls via 192.168.10.9 GigabitEthernet10/0/1: mtu:9000 next:3 flags:[] 5254000210005254000110018847
```

The trick is looking at the Extensions, which shows the _out-labels_ set to 33, with ttl=0 (which makes
VPP copy the TTL from the IPv4 packet itself), and exp=0. It can then forward the packet as MPLS onto
the nexthop at 192.168.10.9 (`vpp0-2.e0` on Gi10/0/1).

The MPLS _FIB_ is also a bit chatty, and shows a fundamental difference with Linux:

```
root@vpp0-1:~# vppctl show mpls fib 33
MPLS-VRF:0, fib_index:0 locks:[interface:6, CLI:1, lcp-rt:1, ]
33:neos/21 fib:0 index:37 locks:2
  lcp-rt-dynamic refs:1 src-flags:added,contributing,active,
    path-list:[57] locks:12 flags:shared, uPRF-list:21 len:1 itfs:[2, ]
      path:[81] pl-index:57 ip4 weight=1 pref=0 attached-nexthop: oper-flags:resolved,
        192.168.10.9 GigabitEthernet10/0/1
      [@0]: ipv4 via 192.168.10.9 GigabitEthernet10/0/1: mtu:9000 next:7 flags:[] 5254000210005254000110010800
    Extensions:
      path:81 labels:[[33 pipe ttl:0 exp:0]]
 forwarding: mpls-neos-chain
  [@0]: dpo-load-balance: [proto:mpls index:40 buckets:1 uRPF:21 to:[0:0]]
    [0] [@6]: mpls-label[@28]:[33:64:0:neos]
        [@1]: mpls via 192.168.10.9 GigabitEthernet10/0/1: mtu:9000 next:3 flags:[] 5254000210005254000110018847

33:eos/21 fib:0 index:64 locks:2
  lcp-rt-dynamic refs:1 src-flags:added,contributing,active,
    path-list:[57] locks:12 flags:shared, uPRF-list:21 len:1 itfs:[2, ]
      path:[81] pl-index:57 ip4 weight=1 pref=0 attached-nexthop: oper-flags:resolved,
        192.168.10.9 GigabitEthernet10/0/1
      [@0]: ipv4 via 192.168.10.9 GigabitEthernet10/0/1: mtu:9000 next:7 flags:[] 5254000210005254000110010800
    Extensions:
      path:81 labels:[[33 pipe ttl:0 exp:0]]
 forwarding: mpls-eos-chain
  [@0]: dpo-load-balance: [proto:mpls index:67 buckets:1 uRPF:21 to:[73347:10747680]]
    [0] [@6]: mpls-label[@29]:[33:64:0:eos]
        [@1]: mpls via 192.168.10.9 GigabitEthernet10/0/1: mtu:9000 next:3 flags:[] 5254000210005254000110018847
```

I note that there are two entries here, which I wrote about above. The MPLS implementation in VPP
allows for a different forwarding behavior in the case that the label inspected is the last one in the
stack (S=1), which is the usual case called _End of Stack (EOS)_. But it also has a second entry,
which tells it what to do if S=0, or _Not End of Stack (NEOS)_. Linux doesn't make the distinction, so
@vifino added two identical entries using that ***lcp_router_fib_route_path_dup()*** function.

What the entries themselves mean is that if this `vpp0-1` router were to receive an MPLS
packet with label value 33,S=1 (or value 33,S=0), it'll perform a _SWAP_ operation, putting the
(same) value 33 as the new outbound label, and forward the packet as MPLS onto 192.168.10.9 on Gi10/0/1.

## Results

And with that, I think we achieved a running LDP with IPv4 and IPv6, and forwarding + encapsulation
of MPLS with VPP. One cool wrapup I thought I'd leave you with is showing how these MPLS routers
are transparent with respect to IP traffic going through them. If I look at the diagram above, `lab`
reaches `vpp0-3` via three hops: first into `vpp0-0`, where it is wrapped into MPLS and forwarded
to `vpp0-1`, and then through `vpp0-2`, which sets the _Explicit NULL_ label and forwards again
as MPLS onto `vpp0-3`, which does the IPv4 and IPv6 lookup.

Check this out:

```
pim@lab:~$ for node in $(seq 0 3); do traceroute -4 -q1 vpp0-$node; done
traceroute to vpp0-0 (192.168.10.0), 30 hops max, 60 byte packets
1 vpp0-0.lab.ipng.ch (192.168.10.0) 1.907 ms
traceroute to vpp0-1 (192.168.10.1), 30 hops max, 60 byte packets
1 vpp0-1.lab.ipng.ch (192.168.10.1) 2.460 ms
traceroute to vpp0-2 (192.168.10.2), 30 hops max, 60 byte packets
1 vpp0-2.lab.ipng.ch (192.168.10.2) 3.860 ms
traceroute to vpp0-3 (192.168.10.3), 30 hops max, 60 byte packets
1 vpp0-3.lab.ipng.ch (192.168.10.3) 4.414 ms

pim@lab:~$ for node in $(seq 0 3); do traceroute -6 -q1 vpp0-$node; done
traceroute to vpp0-0 (2001:678:d78:200::), 30 hops max, 80 byte packets
1 vpp0-0.lab.ipng.ch (2001:678:d78:200::) 3.037 ms
traceroute to vpp0-1 (2001:678:d78:200::1), 30 hops max, 80 byte packets
1 vpp0-1.lab.ipng.ch (2001:678:d78:200::1) 5.125 ms
traceroute to vpp0-2 (2001:678:d78:200::2), 30 hops max, 80 byte packets
1 vpp0-2.lab.ipng.ch (2001:678:d78:200::2) 7.135 ms
traceroute to vpp0-3 (2001:678:d78:200::3), 30 hops max, 80 byte packets
1 vpp0-3.lab.ipng.ch (2001:678:d78:200::3) 8.763 ms
```

With MPLS, each of these routers appears to the naked eye to be directly connected to the
`lab` headend machine, but we know better! :)

## What's next

I joined forces with [@vifino](https://chaos.social/@vifino), who has effectively added MPLS handling
to the Linux Control Plane, so VPP can start to function as an MPLS router using FRR's label
distribution protocol implementation. Gosh, I wish Bird3 would have LDP :)

Our work is mostly complete. There are two pending Gerrit changes, which should be ready for review and
certainly ready to play with:

1. [[Gerrit 38826](https://gerrit.fd.io/r/c/vpp/+/38826)]: This adds the ability to listen to internal
   state changes of an interface, so that the Linux Control Plane plugin can enable MPLS on the
   _LIP_ interfaces and Linux sysctl for MPLS input.
1. [[Gerrit 38702](https://gerrit.fd.io/r/c/vpp/+/38702)]: This adds the ability to listen to Netlink
   messages in the Linux Control Plane plugin, and sensibly apply these routes to the IPv4, IPv6
   and MPLS FIB in the VPP dataplane.

Finally, a note from your friendly neighborhood developers: this code is brand-new and has had _very
limited_ peer-review from the VPP developer community. It adds a significant feature to the Linux
Controlplane plugin, so make sure you understand the semantics, the differences between Linux
and VPP, and the overall implementation before attempting to use it in production. We're pretty sure
we got at least some of this right, but testing and runtime experience will tell.

I will be silently porting the change into my own copy of the Linux Controlplane called lcpng on
[[GitHub](https://git.ipng.ch/ipng/lcpng.git)]. If you'd like to test this, reach out to the VPP
Developer [[mailing list](mailto:vpp-dev@lists.fd.io)] any time!