---
date: "2023-05-21T11:01:14Z"
title: VPP MPLS - Part 3
---

{{< image width="200px" float="right" src="/assets/vpp/fdio-color.svg" alt="VPP" >}}

# About this series

**Special Thanks**: Adrian _vifino_ Pistol for writing this code and for the wonderful collaboration!

Ever since I first saw VPP - the Vector Packet Processor - I have been deeply impressed with its
performance and versatility. For those of us who have used Cisco IOS/XR devices, like the classic
_ASR_ (aggregation service router), VPP will look and feel quite familiar as many of the approaches
are shared between the two.

In the [[first article]({%post_url 2023-05-07-vpp-mpls-1 %})] of this series, I took a look at MPLS
in general, and how setting up static _Label Switched Paths_ can be done in VPP. A few details on
special case labels (such as _Implicit Null_, which enables the fabled _Penultimate Hop Popping_)
were missing, so I took a good look at them in the [[second article]({% post_url
2023-05-17-vpp-mpls-2 %})] of the series.

This was all good fun, but it also allowed me to buy some time for
[@vifino](https://chaos.social/@vifino), who has been implementing MPLS handling within the Linux
Control Plane plugin for VPP! This final article in the series shows the engineering considerations
that went into writing the plugin, which is currently under review but reasonably complete.
Considering the VPP 23.06 cutoff is next week, I'm not super hopeful that we'll be able to get a
full community / committer review in time. At this point, however, both @vifino and I think this code is
ready for consumption. Because FRR has a good _Label Distribution Protocol_ daemon, I'll switch
out of my usual habitat of Bird and set up a lab with FRR.

Caveat emptor: outside of a modest functional and load test, this MPLS functionality
hasn't seen a lot of mileage, as it's only a few weeks old at this point, so it could definitely
contain some rough edges. Use at your own risk, but if you do want to discuss issues, the
[[vpp-dev@](mailto:vpp-dev@lists.fd.io)] mailing list is a good first stop.

## Introduction

MPLS support is fairly complete in VPP already, but programming the dataplane would require custom
integrations, while using the Linux netlink subsystem feels easier from an end-user point of view.
This is a technical deep dive into the implementation of MPLS in the Linux Control Plane plugin for
VPP. If you haven't already, now is a good time to read up on the initial implementation of LCP:

* [[Part 1]({% post_url 2021-08-12-vpp-1 %})]: Punting traffic through TUN/TAP interfaces into Linux
* [[Part 2]({% post_url 2021-08-13-vpp-2 %})]: Mirroring VPP interface configuration into Linux
* [[Part 3]({% post_url 2021-08-15-vpp-3 %})]: Automatically creating sub-interfaces in Linux
* [[Part 4]({% post_url 2021-08-25-vpp-4 %})]: Synchronize link state, MTU and addresses to Linux
* [[Part 5]({% post_url 2021-09-02-vpp-5 %})]: Netlink Listener, synchronizing state from Linux to VPP
* [[Part 6]({% post_url 2021-09-10-vpp-6 %})]: Observability with LibreNMS and VPP SNMP Agent
* [[Part 7]({% post_url 2021-09-21-vpp-7 %})]: Productionizing and reference Supermicro fleet at IPng

To keep this writeup focused, I'll assume the anatomy of VPP plugins and the Linux Controlplane
_Interface_ and _Netlink_ plugins are understood. That way, I can focus on the _changes_ needed for
MPLS integration, which at first glance seem reasonably straightforward.

## VPP Linux-CP: Interfaces

First off, to enable any MPLS forwarding at all in VPP, I have to create the MPLS forwarding table
and enable MPLS on one or more interfaces:

```
vpp# mpls table add 0
vpp# lcp create GigabitEthernet10/0/0 host-if e0
vpp# set int mpls GigabitEthernet10/0/0 enable
```

When Gi10/0/0 has a _Linux Interface Pair (LIP)_, there is a corresponding TAP interface in the
dataplane (typically called `tapX`), which in turn appears on the Linux side as `e0`. Linux will
want to be able to send MPLS datagrams into `e0`, and for that, two things must happen:

1. The Linux kernel must enable MPLS input on `e0`, typically with a sysctl.
1. VPP must enable MPLS on the TAP, in addition to the phy Gi10/0/0.

Therefore, the first order of business is to create a hook that makes the Linux CP interface plugin
aware of MPLS being enabled or disabled in VPP. It turns out that such a callback function
_definition_ already exists, but it was never implemented. [[Gerrit
38826](https://gerrit.fd.io/r/c/vpp/+/38826)] adds a function `mpls_interface_state_change_add_callback()`,
which implements the ability to register a callback on MPLS on/off in VPP.

Now that the callback plumbing exists, Linux CP will want to register one of these, so that it can
set MPLS to the same enabled or disabled state on the Linux interface using
`/proc/sys/net/mpls/conf/${host-if}/input` (which is the moral equivalent of running `sysctl`).
It'll also call `mpls_sw_interface_enable_disable()` on the TAP interface. With both of these
changes implemented, enabling MPLS now looks like this in the logs:

```
linux-cp/mpls-sync: sync_state_cb: called for sw_if_index 1
linux-cp/mpls-sync: sync_state_cb: mpls enabled 1 parent itf-pair: [1] GigabitEthernet10/0/0 tap2 e0 97 type tap netns dataplane
linux-cp/mpls-sync: sync_state_cb: called for sw_if_index 8
linux-cp/mpls-sync: sync_state_cb: set mpls input for e0
```

Take a look at the code that implements enable/disable semantics in `src/plugins/linux-cp/lcp_mpls_sync.c`.

## VPP Linux-CP: Netlink

When Linux installs a route with MPLS labels, the label stack can be found in the return value of
`rtnl_route_nh_get_encap_mpls_dst()`. One or more labels can then be read using
`nl_addr_get_binary_addr()`, yielding an array of `struct mpls_label`. Each entry encodes the label
value, the experimental (EXP) bits and the TTL. These can be added to the route path in VPP by
converting each one into a `fib_mpls_label_t`. The last label in the stack will have the S-bit set,
so we can continue consuming labels until we find that condition. The first patchset that plays
around with these semantics is [[38702#2](https://gerrit.fd.io/r/c/vpp/+/38702/2)]. As you can see,
MPLS is going to look very much like IPv4 and IPv6 route updates in [[previous work]({%post_url 2021-09-02-vpp-5 %})]:
they take the Netlink representation, rewrite it into VPP's representation, and update the FIB.

Up until now, the Linux Controlplane netlink plugin understood only IPv4 and IPv6, so some
preparation work is called for:

* ***lcp_router_proto_k2f()*** gains the ability to map Linux `AF_*` into VPP's `FIB_PROTOCOL_*`.
* ***lcp_router_route_mk_prefix()*** turns into a switch statement that creates a `fib_prefix_t`
  for the MPLS address family, in addition to the existing IPv4 and IPv6 types. It uses the non-EOS
  type.
* ***lcp_router_mpls_nladdr_to_path()*** implements the loop that I described above, taking the
  stack of `struct mpls_label` from Netlink and turning them into a vector of `fib_mpls_label_t`
  for the VPP FIB.
* ***lcp_router_route_path_parse()*** becomes aware of MPLS SWAP and POP operations (the latter
  being the case if there are 0 labels in the Netlink label stack).
* ***lcp_router_fib_route_path_dup()*** is a helper function to make a copy of the FIB path
  for the EOS and non-EOS VPP FIB inserts.

The VPP FIB differentiates between entries that are non-EOS (S=0) and those that are EOS (end of
stack, S=1), and can treat them differently. Linux does not make this distinction, so it's safest to
just install non-EOS **and** EOS entries for each route from Linux. This is why
`lcp_router_fib_route_path_dup()` exists. Without it, Netlink route deletions for the MPLS routes
would yield a double free later on.

This prep work then allows for the following two main functions to become MPLS aware:

* ***lcp_router_route_add()***: when Linux sends a Netlink message about a new route, and that
  route carries MPLS labels, make a copy of the path for the EOS entry and proceed to insert
  both the non-EOS and newly created EOS entries into the FIB;
* ***lcp_router_route_del()***: when Linux sends a Netlink message about a deleted route, we can
  remove both the EOS and non-EOS variants of the route from VPP's FIB.

## VPP Linux-CP: MPLS with FRR

{{< image src="/assets/vpp-mpls/LAB v2.svg" alt="Lab Setup" >}}

I finally get to show off @vifino's lab! It's based on a Debian Bookworm build, because
there are a few Netlink library changes that haven't made their way into Debian Bullseye yet. The LAB
image is quickly built and distributed. For this LAB, I'm specifically choosing
[[FRR](https://frrouting.org/)] because it ships with a _Label Distribution Protocol_ daemon out of
the box.

The first order of business is to enable MPLS on the correct interfaces and create the MPLS FIB table.
On each machine, I insert the following in the startup sequence:

```
ipng@vpp0-1:~$ cat << EOF | tee -a /etc/vpp/config/manual.vpp
mpls table add 0
set interface mpls GigabitEthernet10/0/0 enable
set interface mpls GigabitEthernet10/0/1 enable
EOF
```

The lab comes with OSPF and OSPFv3 enabled on each of the Gi10/0/0 and Gi10/0/1 interfaces that go
from East to West. This extra sequence enables MPLS on those interfaces. Because they have a
_Linux Interface Pair (LIP)_, VPP will enable MPLS on the internal TAP interfaces, as well as set
the Linux `sysctl` to allow the kernel to send MPLS encapsulated packets towards VPP.

Next up is turning on _LDP_ for FRR, which is easy enough:

```
ipng@vpp0-1:~$ vtysh
vpp0-1# conf t
vpp0-1(config)# mpls ldp
 router-id 192.168.10.1
 dual-stack cisco-interop
 ordered-control
 !
 address-family ipv4
  discovery transport-address 192.168.10.1
  label local advertise explicit-null
  interface e0
  interface e1
 exit-address-family
 !
 address-family ipv6
  discovery transport-address 2001:678:d78:200::1
  label local advertise explicit-null
  ttl-security disable
  interface e0
  interface e1
 exit-address-family
exit
```

I configure _LDP_ here to prefer advertising locally connected routes as _MPLS Explicit NULL_, which I
described in detail in the [[previous post]({% post_url 2023-05-17-vpp-mpls-2 %})]. It tells the
penultimate router to send packets destined to this router as MPLS with label value 0,S=1 for IPv4
and value 2,S=1 for IPv6. That way, VPP knows immediately to decapsulate the packet and continue with
IPv4/IPv6 forwarding. An alternative here is setting implicit-null, which instructs the router before
this one to perform _Penultimate Hop Popping_. If this is confusing, take a look at that article for reference!

Otherwise, I just give each router a transport-address on its loopback interface and a unique
router-id, the same as used in OSPF and OSPFv3, and we're off to the races. Just take a look at how
easy this was:

```
vpp0-1# show mpls ldp discovery
AF   ID              Type     Source           Holdtime
ipv4 192.168.10.0    Link     e0                     15
ipv4 192.168.10.2    Link     e1                     15
ipv6 192.168.10.0    Link     e0                     15
ipv6 192.168.10.2    Link     e1                     15

vpp0-1# show mpls ldp neighbor
AF   ID              State       Remote Address             Uptime
ipv6 192.168.10.0    OPERATIONAL 2001:678:d78:200::       19:49:10
ipv6 192.168.10.2    OPERATIONAL 2001:678:d78:200::2      19:49:10
```

The first `show ... discovery` shows which interfaces are receiving multicast _LDP Hello Packets_.
Because I enabled discovery for both IPv4 and IPv6, I can see two pairs there. If I look at
which interfaces formed adjacencies, `show ... neighbor` reveals that LDP is preferring IPv6, and
that both adjacencies to `vpp0-0` and `vpp0-2` are operational. Awesome sauce!

I see _LDP_ neighbor adjacencies, so let me show you what label information was actually
exchanged, in three different places: **FRR**'s label distribution protocol daemon, **Linux**'s
IPv4, IPv6 and MPLS routing tables, and **VPP**'s dataplane forwarding information base.

### MPLS: FRR view

There are two things to note: the label bindings for each IPv4 and IPv6 prefix (in LDP terms, each
prefix is a _Forwarding Equivalence Class (FEC)_), and the MPLS forwarding table, called the _MPLS FIB_:

```
vpp0-1# show mpls ldp binding
AF   Destination              Nexthop         Local Label Remote Label  In Use
ipv4 192.168.10.0/32          192.168.10.0    20          exp-null         yes
ipv4 192.168.10.1/32          0.0.0.0         exp-null    -                 no
ipv4 192.168.10.2/32          192.168.10.2    16          exp-null         yes
ipv4 192.168.10.3/32          192.168.10.2    33          33               yes
ipv4 192.168.10.4/31          192.168.10.0    21          exp-null         yes
ipv4 192.168.10.6/31          192.168.10.0    exp-null    exp-null          no
ipv4 192.168.10.8/31          192.168.10.2    exp-null    exp-null          no
ipv4 192.168.10.10/31         192.168.10.2    17          exp-null         yes
ipv6 2001:678:d78:200::/128   192.168.10.0    18          exp-null         yes
ipv6 2001:678:d78:200::1/128  0.0.0.0         exp-null    -                 no
ipv6 2001:678:d78:200::2/128  192.168.10.2    31          exp-null         yes
ipv6 2001:678:d78:200::3/128  192.168.10.2    38          34               yes
ipv6 2001:678:d78:210::/60    0.0.0.0         48          -                 no
ipv6 2001:678:d78:210::/128   0.0.0.0         39          -                 no
ipv6 2001:678:d78:210::1/128  0.0.0.0         40          -                 no
ipv6 2001:678:d78:210::2/128  0.0.0.0         41          -                 no
ipv6 2001:678:d78:210::3/128  0.0.0.0         42          -                 no

vpp0-1# show mpls table
 Inbound Label  Type  Nexthop                  Outbound Label
 ------------------------------------------------------------------
 16             LDP   192.168.10.9             IPv4 Explicit Null
 17             LDP   192.168.10.9             IPv4 Explicit Null
 18             LDP   fe80::5054:ff:fe00:1001  IPv6 Explicit Null
 19             LDP   fe80::5054:ff:fe00:1001  IPv6 Explicit Null
 20             LDP   192.168.10.6             IPv4 Explicit Null
 21             LDP   192.168.10.6             IPv4 Explicit Null
 31             LDP   fe80::5054:ff:fe02:1000  IPv6 Explicit Null
 32             LDP   fe80::5054:ff:fe02:1000  IPv6 Explicit Null
 33             LDP   192.168.10.9             33
 38             LDP   fe80::5054:ff:fe02:1000  34
```

In the first table, each entry of the IPv4 and IPv6 routing table, as fed by OSPF and OSPFv3,
gets a label associated with it. Through _LDP_, each router tells its peers which label it expects
to receive for a given prefix (the _Local Label_). It also learns from its peers which label it must
use when sending along the _Label Switched Path_ towards that destination (the _Remote Label_). I'll
give two examples to illustrate how this table is used:

1. This router (`vpp0-1`) has a peer `vpp0-0`. When this router wants to send traffic to
   it, the traffic will be sent with `exp-null`, because `vpp0-0` is the last router in the _LSP_. However, when
   other routers want to use this router to reach `vpp0-0`, they should use the MPLS
   label value 20.
1. This router (`vpp0-1`) is _not_ directly connected to `vpp0-3`. As such, the IPv4 and IPv6
   loopback addresses of `vpp0-3` carry labels in both directions. If `vpp0-1` itself
   wants to send a packet to `vpp0-3`, it will use the remote label values 33 and 34, respectively.
   However, if other routers want to use this router to reach `vpp0-3`, they should use the local
   label values 33 and 38, respectively.

The second table describes the MPLS _Forwarding Information Base (FIB)_. When this router receives an
MPLS packet with an inbound label noted in this table, the operation applied is to _SWAP_ it for the
outbound label and forward it towards a nexthop -- this is the stuff that _P-Routers_ use when
transiting MPLS traffic.

### MPLS: Linux view

FRR's LDP daemon will offer both of these routing tables to the Linux kernel using Netlink
messages, so the Linux view looks similar:

```
root@vpp0-1:~# ip ro
192.168.10.0 nhid 230 encap mpls 0 via 192.168.10.6 dev e0 proto ospf src 192.168.10.1 metric 20
192.168.10.2 nhid 226 encap mpls 0 via 192.168.10.9 dev e1 proto ospf src 192.168.10.1 metric 20
192.168.10.3 nhid 227 encap mpls 33 via 192.168.10.9 dev e1 proto ospf src 192.168.10.1 metric 20
192.168.10.4/31 nhid 230 encap mpls 0 via 192.168.10.6 dev e0 proto ospf src 192.168.10.1 metric 20
192.168.10.6/31 dev e0 proto kernel scope link src 192.168.10.7
192.168.10.8/31 dev e1 proto kernel scope link src 192.168.10.8
192.168.10.10/31 nhid 226 encap mpls 0 via 192.168.10.9 dev e1 proto ospf src 192.168.10.1 metric 20

root@vpp0-1:~# ip -6 ro
2001:678:d78:200:: nhid 231 encap mpls 2 via fe80::5054:ff:fe00:1001 dev e0 proto ospf src 2001:678:d78:200::1 metric 20 pref medium
2001:678:d78:200::1 dev loop0 proto kernel metric 256 pref medium
2001:678:d78:200::2 nhid 237 encap mpls 2 via fe80::5054:ff:fe02:1000 dev e1 proto ospf src 2001:678:d78:200::1 metric 20 pref medium
2001:678:d78:200::3 nhid 239 encap mpls 34 via fe80::5054:ff:fe02:1000 dev e1 proto ospf src 2001:678:d78:200::1 metric 20 pref medium
2001:678:d78:201::/112 nhid 231 encap mpls 2 via fe80::5054:ff:fe00:1001 dev e0 proto ospf src 2001:678:d78:200::1 metric 20 pref medium
2001:678:d78:201::1:0/112 dev e0 proto kernel metric 256 pref medium
2001:678:d78:201::2:0/112 dev e1 proto kernel metric 256 pref medium
2001:678:d78:201::3:0/112 nhid 237 encap mpls 2 via fe80::5054:ff:fe02:1000 dev e1 proto ospf src 2001:678:d78:200::1 metric 20 pref medium

root@vpp0-1:~# ip -f mpls ro
16 as to 0 via inet 192.168.10.9 dev e1 proto ldp
17 as to 0 via inet 192.168.10.9 dev e1 proto ldp
18 as to 2 via inet6 fe80::5054:ff:fe00:1001 dev e0 proto ldp
19 as to 2 via inet6 fe80::5054:ff:fe00:1001 dev e0 proto ldp
20 as to 0 via inet 192.168.10.6 dev e0 proto ldp
21 as to 0 via inet 192.168.10.6 dev e0 proto ldp
31 as to 2 via inet6 fe80::5054:ff:fe02:1000 dev e1 proto ldp
32 as to 2 via inet6 fe80::5054:ff:fe02:1000 dev e1 proto ldp
33 as to 33 via inet 192.168.10.9 dev e1 proto ldp
38 as to 34 via inet6 fe80::5054:ff:fe02:1000 dev e1 proto ldp
```

The first two tables show a 'regular' Linux routing table for IPv4 and IPv6 respectively, except there's
an `encap mpls <X>` added for all prefixes that are not directly connected. In this case, `vpp0-1` connects on
`e0` to `vpp0-0` to the West, and on interface `e1` to `vpp0-2` to the East. These connected routes do
not carry MPLS information. In fact, this is how LDP can continue to work and exchange information
naturally even when no _LSPs_ are established yet.

The third table is the _MPLS FIB_, and it clearly shows the special case of _MPLS Explicit NULL_. All IPv4
routes for which this router is the penultimate hop carry the outbound label value 0,S=1, while the IPv6
routes carry the value 2,S=1. Booyah!

### MPLS: VPP view

The _FIB_ information in general is super densely populated in VPP. Rather than dumping the whole table,
I'll show one example, for `192.168.10.3`. As we saw above, traffic to it will be encapsulated into an MPLS
packet with label value 33,S=1 before being forwarded:

```
root@vpp0-1:~# vppctl show ip fib 192.168.10.3
ipv4-VRF:0, fib_index:0, flow hash:[src dst sport dport proto flowlabel ] epoch:0 flags:none locks:[adjacency:1, default-route:1, lcp-rt:1, ]
192.168.10.3/32 fib:0 index:78 locks:2
  lcp-rt-dynamic refs:1 src-flags:added,contributing,active,
    path-list:[29] locks:6 flags:shared, uPRF-list:53 len:1 itfs:[2, ]
      path:[41] pl-index:29 ip4 weight=1 pref=20 attached-nexthop: oper-flags:resolved,
        192.168.10.9 GigabitEthernet10/0/1
      [@0]: ipv4 via 192.168.10.9 GigabitEthernet10/0/1: mtu:9000 next:7 flags:[] 5254000210005254000110010800
    Extensions:
         path:41 labels:[[33 pipe ttl:0 exp:0]]
 forwarding: unicast-ip4-chain
  [@0]: dpo-load-balance: [proto:ip4 index:81 buckets:1 uRPF:53 to:[2421:363846]]
    [0] [@13]: mpls-label[@4]:[33:64:0:eos]
        [@1]: mpls via 192.168.10.9 GigabitEthernet10/0/1: mtu:9000 next:3 flags:[] 5254000210005254000110018847
```

The trick is to look at the Extensions, which show the _out-labels_ set to 33, with ttl=0 (which makes
VPP copy the TTL from the IPv4 packet itself) and exp=0. VPP can then forward the packet as MPLS onto
the nexthop at 192.168.10.9 (`vpp0-2.e0` on Gi10/0/1).

The MPLS _FIB_ is also a bit chatty, and shows a fundamental difference with Linux:

```
root@vpp0-1:~# vppctl show mpls fib 33
MPLS-VRF:0, fib_index:0 locks:[interface:6, CLI:1, lcp-rt:1, ]
33:neos/21 fib:0 index:37 locks:2
  lcp-rt-dynamic refs:1 src-flags:added,contributing,active,
    path-list:[57] locks:12 flags:shared, uPRF-list:21 len:1 itfs:[2, ]
      path:[81] pl-index:57 ip4 weight=1 pref=0 attached-nexthop: oper-flags:resolved,
        192.168.10.9 GigabitEthernet10/0/1
      [@0]: ipv4 via 192.168.10.9 GigabitEthernet10/0/1: mtu:9000 next:7 flags:[] 5254000210005254000110010800
    Extensions:
         path:81 labels:[[33 pipe ttl:0 exp:0]]
 forwarding: mpls-neos-chain
  [@0]: dpo-load-balance: [proto:mpls index:40 buckets:1 uRPF:21 to:[0:0]]
    [0] [@6]: mpls-label[@28]:[33:64:0:neos]
        [@1]: mpls via 192.168.10.9 GigabitEthernet10/0/1: mtu:9000 next:3 flags:[] 5254000210005254000110018847

33:eos/21 fib:0 index:64 locks:2
  lcp-rt-dynamic refs:1 src-flags:added,contributing,active,
    path-list:[57] locks:12 flags:shared, uPRF-list:21 len:1 itfs:[2, ]
      path:[81] pl-index:57 ip4 weight=1 pref=0 attached-nexthop: oper-flags:resolved,
        192.168.10.9 GigabitEthernet10/0/1
      [@0]: ipv4 via 192.168.10.9 GigabitEthernet10/0/1: mtu:9000 next:7 flags:[] 5254000210005254000110010800
    Extensions:
         path:81 labels:[[33 pipe ttl:0 exp:0]]
 forwarding: mpls-eos-chain
  [@0]: dpo-load-balance: [proto:mpls index:67 buckets:1 uRPF:21 to:[73347:10747680]]
    [0] [@6]: mpls-label[@29]:[33:64:0:eos]
        [@1]: mpls via 192.168.10.9 GigabitEthernet10/0/1: mtu:9000 next:3 flags:[] 5254000210005254000110018847
```

I note that there are two entries here -- I wrote about them above. The MPLS implementation in VPP
allows for a different forwarding behavior in the case that the label inspected is the last one in the
stack (S=1), which is the usual case, called _End of Stack (EOS)_. It also has a second entry,
which tells it what to do if S=0, called _Not End of Stack (NEOS)_. Linux doesn't make the distinction, so
@vifino added two identical entries using that ***lcp_router_fib_route_path_dup()*** function.

As for what the entries themselves mean: if this `vpp0-1` router were to receive an MPLS
packet with label value 33,S=1 (or value 33,S=0), it would perform a _SWAP_ operation, using the same
value 33 as the new outbound label, and forward the packet as MPLS onto 192.168.10.9 on Gi10/0/1.

## Results

And with that, I think we've achieved a running LDP with IPv4 and IPv6, as well as forwarding and encapsulation
of MPLS with VPP. One cool wrap-up I thought I'd leave you with is how these MPLS routers
are transparent with respect to IP traffic going through them. If I look at the diagram above, `lab`
reaches `vpp0-3` via three hops. First, traffic goes into `vpp0-0`, where it is wrapped into MPLS and forwarded
to `vpp0-1`. Then it passes through `vpp0-2`, which sets the _Explicit NULL_ label and forwards it again
as MPLS onto `vpp0-3`, which does the IPv4 and IPv6 lookup.

Check this out:

```
pim@lab:~$ for node in $(seq 0 3); do traceroute -4 -q1 vpp0-$node; done
traceroute to vpp0-0 (192.168.10.0), 30 hops max, 60 byte packets
 1  vpp0-0.lab.ipng.ch (192.168.10.0)  1.907 ms
traceroute to vpp0-1 (192.168.10.1), 30 hops max, 60 byte packets
 1  vpp0-1.lab.ipng.ch (192.168.10.1)  2.460 ms
traceroute to vpp0-2 (192.168.10.2), 30 hops max, 60 byte packets
 1  vpp0-2.lab.ipng.ch (192.168.10.2)  3.860 ms
traceroute to vpp0-3 (192.168.10.3), 30 hops max, 60 byte packets
 1  vpp0-3.lab.ipng.ch (192.168.10.3)  4.414 ms

pim@lab:~$ for node in $(seq 0 3); do traceroute -6 -q1 vpp0-$node; done
traceroute to vpp0-0 (2001:678:d78:200::), 30 hops max, 80 byte packets
 1  vpp0-0.lab.ipng.ch (2001:678:d78:200::)  3.037 ms
traceroute to vpp0-1 (2001:678:d78:200::1), 30 hops max, 80 byte packets
 1  vpp0-1.lab.ipng.ch (2001:678:d78:200::1)  5.125 ms
traceroute to vpp0-2 (2001:678:d78:200::2), 30 hops max, 80 byte packets
 1  vpp0-2.lab.ipng.ch (2001:678:d78:200::2)  7.135 ms
traceroute to vpp0-3 (2001:678:d78:200::3), 30 hops max, 80 byte packets
 1  vpp0-3.lab.ipng.ch (2001:678:d78:200::3)  8.763 ms
```

With MPLS, each of these routers appears to the naked eye to be directly connected to the
`lab` headend machine, but we know better! :)

## What's next

I joined forces with [@vifino](https://chaos.social/@vifino), who has effectively added MPLS handling
to the Linux Control Plane, so VPP can start to function as an MPLS router using FRR's label
distribution protocol implementation. Gosh, I wish Bird3 would have LDP :)

Our work is mostly complete. There are two pending Gerrits, which should be ready for review and
are certainly ready to play with:

1. [[Gerrit 38826](https://gerrit.fd.io/r/c/vpp/+/38826)]: This adds the ability to listen to internal
   state changes of an interface, so that the Linux Control Plane plugin can enable MPLS on the
   _LIP_ interfaces and set the Linux sysctl for MPLS input.
1. [[Gerrit 38702](https://gerrit.fd.io/r/c/vpp/+/38702)]: This adds the ability to listen to Netlink
   messages in the Linux Control Plane plugin, and sensibly apply these routes to the IPv4, IPv6
   and MPLS FIB in the VPP dataplane.

Finally, a note from your friendly neighborhood developers: this code is brand-new and has had _very
limited_ peer-review from the VPP developer community. It adds a significant feature to the Linux
Controlplane plugin, so make sure you understand the semantics, the differences between Linux
and VPP, and the overall implementation before attempting to use it in production. We're pretty sure
we got at least some of this right, but testing and runtime experience will tell.

I will be silently porting the change into my own copy of the Linux Controlplane called lcpng on
[[GitHub](https://github.com/pimvanpelt/lcpng.git)]. If you'd like to test this, reach out to the VPP
Developer [[mailing list](mailto:vpp-dev@lists.fd.io)] any time!