---
date: "2023-05-07T10:01:14Z"
title: VPP MPLS - Part 1
aliases:
- /s/articles/2023/05/07/vpp-mpls-1.html
---

{{< image width="200px" float="right" src="/assets/vpp/fdio-color.svg" alt="VPP" >}}
|
|
|
|
# About this series
|
|
|
|
Ever since I first saw VPP - the Vector Packet Processor - I have been deeply impressed with its
performance and versatility. For those of us who have used Cisco IOS/XR devices, like the classic
_ASR_ (Aggregation Services Router), VPP will look and feel quite familiar, as many of the approaches
are shared between the two.
|
|
|
|
I've deployed an MPLS core for IPng Networks, which allows me to provide L2VPN services, and at the
same time keep an IPng Site Local network with IPv4 and IPv6 that is separate from the internet,
using silicon-based forwarding at line rate and with high availability. You can read all
about my Centec MPLS shenanigans in [[this article]({{< ref "2023-03-11-mpls-core" >}})].
|
|
|
|
Ever since the release of the Linux Control Plane [[ref](https://git.ipng.ch/ipng/lcpng)]
plugin in VPP, folks have asked "What about MPLS?" -- I have never really felt the need to go down this
rabbit hole, because I figured that in this day and age, higher level IP protocols that do tunneling
are just as performant, and a little bit less of an 'art' to get right. For example, the Centec
switches I deployed perform VxLAN, GENEVE and GRE all at line rate in silicon. And in an earlier
article, I showed that the performance of VPP in these tunneling protocols is actually pretty good.
Take a look at my [[VPP L2 article]({{< ref "2022-01-12-vpp-l2" >}})] for context.
|
|
|
|
You might ask yourself: _Then why bother?_ To which I would respond: if you have to ask that question,
clearly you don't know me :) This article will form a deep dive into MPLS as implemented by VPP. In
a later set of articles, I'll partner with the incomparable [@vifino](https://chaos.social/@vifino) who
is adding MPLS support to the Linux Control Plane plugin. After that, I do expect VPP to be able to act
as a fully fledged provider and provider-edge MPLS router.
|
|
|
|
## Lab Setup
|
|
|
|
A while ago I created a [[VPP Lab]({{< ref "2022-10-14-lab-1" >}})] which is pretty slick, I use it
|
|
all the time. Most of the time I find myself messing around on the hypervisor and adding namespaces
|
|
with interfaces in it, to pair up with the VPP interfaces. And I tcpdump a lot! It's time for me to
|
|
make an upgrade to the Lab -- take a look at this picture:
|
|
|
|
{{< image src="/assets/vpp-mpls/LAB v2.svg" alt="Lab Setup" >}}
|
|
|
|
There's quite a bit to unpack here, but it will be useful to know this layout as I'll be referring
|
|
to the components here throughout the rest of the article. Each **lab** now has seven virtual
|
|
machines:
|
|
|
|
1. **vppX-Y** are four Debian Testing machines running a reasonably fresh VPP. They are daisy-chained:
   the first one attaches to the headend called `lab.ipng.ch` using its Gi10/0/0 interface, and each
   router connects onwards to its eastbound neighbor (for example `vpp0-1`) using its Gi10/0/1 interface.
1. **hostX-Y** are two Debian machines which each have four network cards (enp16s0fX), each connected
   to one VPP instance's Gi10/0/2 interface (for `host0-0`) or Gi10/0/3 interface (for `host0-1`). This way, I
   can test all sorts of topologies with one router, two routers, or multiple routers.
1. **tapX-0** is a special virtual machine which receives a copy of every packet on the underlying
   Open vSwitch network fabric.

***NOTE***: X is the 0-based lab number, and Y stands for the 0-based logical machine number, so `vpp1-3`
is the fourth VPP virtual machine on the second lab.
|
|
|
|
### Detour 1: Open vSwitch
|
|
|
|
To explain this tap a little bit - let me first talk about the underlay. All seven of these machines
(each with four network cards) are bound by the hypervisor into an Open vSwitch bridge called
`vpplan`. Then, I use two features to build this topology:
|
|
|
|
Firstly, each pair of interfaces is added as an access port into its own VLAN. For
example, `vpp0-0.Gi10/0/1` connects with `vpp0-1.Gi10/0/0` in VLAN 20 (annotated in orange), and
`vpp0-0.Gi10/0/2` connects to `host0-0.enp16s0f0` in VLAN 30 (annotated in purple). You can see that the
East-West traffic over the VPP backbone is in the 20s, the `host0-0` traffic northbound is in the
30s, and the `host0-1` traffic southbound is in the 40s. Finally, the whole Open vSwitch fabric is
connected to `lab.ipng.ch` using VLAN 10 and a physical network card on the hypervisor (annotated in
green). The `lab.ipng.ch` machine then has internet connectivity.
|
|
|
|
```
BR=vpplan
for p in $(ovs-vsctl list-ifaces $BR); do
  ovs-vsctl set port $p vlan_mode=access
done

# Uplink (Green)
ovs-vsctl set port uplink tag=10 ## eno1.200 on the Hypervisor
ovs-vsctl set port vpp0-0-0 tag=10

# Backbone (Orange)
ovs-vsctl set port vpp0-0-1 tag=20
ovs-vsctl set port vpp0-1-0 tag=20
...

# Northbound (Purple)
ovs-vsctl set port vpp0-0-2 tag=30
ovs-vsctl set port host0-0-0 tag=30
...

# Southbound (Red)
...
ovs-vsctl set port vpp0-3-3 tag=43
ovs-vsctl set port host0-1-3 tag=43
```
|
|
|
|
**NOTE**: The KVM interface names are of the form `vppX-Y-Z`, where X is the lab number (0 in this case --
IPng does have multiple labs so I can run experiments and lab environments independently and in isolation),
Y is the machine number, and Z is the interface number on the machine (from [0..3]).
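
The port names and VLAN tags above follow a regular pattern, which can be written down in a few lines of Python. This is only a sketch: the function name `vlan_plan` is hypothetical, and the rule that router Y uses VLAN 20+Y towards its eastbound neighbor, 30+Y towards `host0-0` and 40+Y towards `host0-1` is inferred from the example tags shown above, not taken from the actual lab tooling.

```python
def vlan_plan(lab: int = 0, routers: int = 4) -> dict:
    """Map each OVS port name to its access VLAN (inferred numbering scheme)."""
    # Uplink (green): the headend and the first router share VLAN 10.
    plan = {"uplink": 10, f"vpp{lab}-0-0": 10}
    for y in range(routers - 1):
        # Backbone (orange): router Y's Gi10/0/1 faces router Y+1's Gi10/0/0.
        plan[f"vpp{lab}-{y}-1"] = 20 + y
        plan[f"vpp{lab}-{y + 1}-0"] = 20 + y
    for y in range(routers):
        # Northbound (purple): router Y's Gi10/0/2 faces NIC Y of the first host.
        plan[f"vpp{lab}-{y}-2"] = 30 + y
        plan[f"host{lab}-0-{y}"] = 30 + y
        # Southbound (red): router Y's Gi10/0/3 faces NIC Y of the second host.
        plan[f"vpp{lab}-{y}-3"] = 40 + y
        plan[f"host{lab}-1-{y}"] = 40 + y
    return plan


plan = vlan_plan(lab=0)
for port in sorted(plan):
    print(f"ovs-vsctl set port {port} tag={plan[port]}")
print(len(set(plan.values())), "distinct VLANs")
```

With four routers this yields one uplink VLAN, three backbone VLANs and four VLANs towards each host, so twelve in total.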
|
|
|
|
### Detour 2: Mirroring Traffic
|
|
|
|
Secondly, now that I have created a 29 port switch with 12 VLANs, I decide to create an OVS _mirror
|
|
port_, which can be used to make a copy of traffic going in- or out of (a list of) ports. This is a
|
|
super powerful feature, and it looks like this:
|
|
|
|
```
BR=vpplan
MIRROR=mirror-rx
ovs-vsctl set port tap0-0-0 vlan_mode=access

# If the mirror already exists, remove it first so the script stays idempotent
ovs-vsctl list mirror $MIRROR >/dev/null 2>&1 && \
  ovs-vsctl --id=@m get mirror $MIRROR -- remove bridge $BR mirrors @m

ovs-vsctl --id=@m create mirror name=$MIRROR \
  -- --id=@p get port tap0-0-0 \
  -- add bridge $BR mirrors @m \
  -- set mirror $MIRROR output-port=@p \
  -- set mirror $MIRROR select_dst_port=[] \
  -- set mirror $MIRROR select_src_port=[]

for iface in $(ovs-vsctl list-ports $BR); do
  [[ $iface == tap* ]] && continue
  ovs-vsctl add mirror $MIRROR select_dst_port $(ovs-vsctl get port $iface _uuid)
done
```
|
|
|
|
The first call sets up the OVS switchport called `tap0-0-0` (which is enp16s0f0 on the machine
`tap0-0`) as an access port. To make this script idempotent, the second command checks whether
the mirror already exists and if so, deletes it. Then, I (re)create a mirror with a given name
(`mirror-rx`), add it to the bridge, make the mirror's output port `tap0-0-0`, and finally
clear the selected source and destination ports (this is where the traffic is mirrored _from_). At
this point, I have an empty mirror. To give it something useful to do, I loop over all of the ports
in the `vpplan` bridge and add each of them to the mirror as a _destination_ port (here I have
to specify the uuid of the port, not its name). I add all ports except those
of the `tap0-0` machine itself, to avoid loops.
|
|
|
|
In the end, I create two of these, one called `mirror-rx` which is forwarded to `tap0-0-0`
|
|
(enp16s0f0) and the other called `mirror-tx` which is forwarded to `tap0-0-1` (enp16s0f1). I can use
|
|
tcpdump on either of these ports, to show all the traffic either going _ingress_ to any port on any
|
|
machine, or emitting _egress_ from any port on any machine, respectively.
|
|
|
|
## Preparing the LAB
|
|
|
|
I wrote a little bit about the automation I use to maintain a few reproducible lab environments in a
[[previous article]({{< ref "2022-10-14-lab-1" >}})], so I'll only show the commands themselves here,
not the underlying systems. When the LAB boots up, it comes with a basic Linux CP configuration that
uses OSPF and OSPFv3 running in Bird2, to connect the `vpp0-0` through `vpp0-3` machines together (each
router's Gi10/0/1 port connects to the Gi10/0/0 port of its eastbound neighbor). LAB0 is in use by
[@vifino](https://chaos.social/@vifino) at the moment, so I'll take the next one running on its own
hypervisor, called LAB1.
|
|
|
|
Each machine has an IPv4 and IPv6 loopback, so the LAB will come up with basic connectivity:
|
|
|
|
```
|
|
pim@lab:~/src/lab$ LAB=1 ./create
|
|
pim@lab:~/src/lab$ LAB=1 ./command pristine
|
|
pim@lab:~/src/lab$ LAB=1 ./command start && sleep 150
|
|
pim@lab:~/src/lab$ traceroute6 vpp1-3.lab.ipng.ch
|
|
traceroute to vpp1-3.lab.ipng.ch (2001:678:d78:211::3), 30 hops max, 24 byte packets
|
|
1 e0.vpp1-0.lab.ipng.ch (2001:678:d78:211::fffe) 2.0363 ms 2.0123 ms 2.0138 ms
|
|
2 e0.vpp1-1.lab.ipng.ch (2001:678:d78:211::1:11) 3.0969 ms 3.1261 ms 3.3413 ms
|
|
3 e0.vpp1-2.lab.ipng.ch (2001:678:d78:211::2:12) 6.4845 ms 6.3981 ms 6.5409 ms
|
|
4 vpp1-3.lab.ipng.ch (2001:678:d78:211::3) 7.4610 ms 7.5698 ms 7.6413 ms
|
|
```
|
|
|
|
## MPLS For Dummies
|
|
|
|
.. like me! MPLS stands for [[Multi Protocol Label
Switching](https://en.wikipedia.org/wiki/Multiprotocol_Label_Switching)]. Rather than looking at the
IPv4 or IPv6 header in the packet, and making the routing decision based on the destination address,
MPLS takes the whole packet and encapsulates it into a new datagram that carries a 20-bit number
(called the _label_), three bits to classify the traffic, one _S-bit_ to signal that this is the
last label in a stack of labels, and finally 8 bits of TTL.
|
|
|
|
In total, 32 bits are prepended to the whole IP packet, or Ethernet frame, or any other type of inner
datagram. This is why it's called _Multi Protocol_. The _S-bit_ allows routers to know if the
following data is the inner payload (S=1), or if the following 32 bits are another MPLS label (S=0).
This way, routers can add more than one label into a _label stack_.
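
To make the 32-bit layout concrete, here is a minimal Python sketch that packs and unpacks a single label stack entry (20 bits of label, 3 traffic class bits, the S-bit and 8 bits of TTL, in network byte order). The function names `encode_lse` and `decode_lse` are made up for this illustration and are not part of VPP or any library.

```python
import struct


def encode_lse(label: int, tc: int, s: int, ttl: int) -> bytes:
    """Pack one MPLS label stack entry: label(20) | TC(3) | S(1) | TTL(8)."""
    if not 0 <= label < 2**20:
        raise ValueError("label must fit in 20 bits")
    if not 0 <= tc < 8 or s not in (0, 1) or not 0 <= ttl < 256:
        raise ValueError("tc, s or ttl out of range")
    word = (label << 12) | (tc << 9) | (s << 8) | ttl
    return struct.pack("!I", word)  # "!I" = big-endian unsigned 32-bit


def decode_lse(data: bytes) -> dict:
    """Unpack the first four bytes of data as an MPLS label stack entry."""
    (word,) = struct.unpack("!I", data[:4])
    return {
        "label": word >> 12,
        "tc": (word >> 9) & 0x7,
        "s": (word >> 8) & 0x1,
        "ttl": word & 0xFF,
    }


entry = encode_lse(label=100, tc=0, s=1, ttl=64)
print(entry.hex())        # 00064140
print(decode_lse(entry))  # {'label': 100, 'tc': 0, 's': 1, 'ttl': 64}
```

A single label 100 with S=1 and TTL 64 therefore shows up on the wire as the four bytes `00 06 41 40`, directly after the Ethernet header.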
|
|
|
|
Forwarding decisions are made on the contents of this MPLS _label_, without the need to examine
|
|
the packet itself. Two significant benefits become obvious:
|
|
|
|
1. The inner data payload (i.e. an IPv6 packet or an Ethernet frame) doesn't have to be
   rewritten, no new checksum created, no TTL decremented. Any datagram can be stuffed into an MPLS
   packet, and the routing (or _packet switching_) happens entirely using only the MPLS headers.

2. Importantly, no source or destination IP addresses have to be looked up in a possibly very
   large FIB (~1M entries) to figure out the next hop. Rather than traversing a [[Radix
   Trie](https://en.wikipedia.org/wiki/Radix_tree)] or other datastructure to find the next-hop, a
   static [[Hash Table](https://en.wikipedia.org/wiki/Hash_table)] with literal integer MPLS labels
   can be consulted. This greatly reduces the computational cost in transit.
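
The difference between the two lookups can be sketched in Python. This is a toy illustration under stated assumptions: the prefixes and next-hop names are invented, and the linear scan stands in for a real trie. It only shows that an IP lookup has to find the _longest_ matching prefix, while a label lookup is a single exact-match on an integer key.

```python
import ipaddress

# Hypothetical IPv4 FIB: several prefixes can match the same destination.
IP_FIB = {
    ipaddress.ip_network("0.0.0.0/0"): "upstream",
    ipaddress.ip_network("10.0.0.0/8"): "core",
    ipaddress.ip_network("10.0.1.1/32"): "host",
}

# Hypothetical MPLS FIB: one integer key, one entry.
MPLS_FIB = {100: "swap to 100, send to the next router"}


def ip_lookup(dst: str) -> str:
    """Longest-prefix match: consider every matching prefix, keep the most specific."""
    addr = ipaddress.ip_address(dst)
    matches = [net for net in IP_FIB if addr in net]
    best = max(matches, key=lambda net: net.prefixlen)
    return IP_FIB[best]


def mpls_lookup(label: int) -> str:
    """Exact match: a single hash table lookup on the label value."""
    return MPLS_FIB[label]


print(ip_lookup("10.0.1.1"))   # host (the /32 wins over the /8 and the default)
print(ip_lookup("10.2.3.4"))   # core
print(mpls_lookup(100))
```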
|
|
|
|
***P-Router***: The simplest form of an MPLS router is a so-called Label-Switch-Router (_LSR_), which is synonymous
with Provider-Router (_P-Router_). This is the router that sits in the core of the network, and its
only purpose is to receive MPLS packets, look up what to do with them based on the _label_ value,
and then forward the packet onto the next router. Sometimes the router can (and will) rewrite the
label, in an operation called a SWAP, but it can also leave the label as it was (in other words, the
input label value can be the same as the outgoing label value). The logic kind of goes like
**MPLS In-Label** => **{ MPLS Out-Label, Out-Interface, Out-NextHop }**. It's this behavior
that explains the name _Label Switching_.
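
That mapping can be sketched as a small Python table. All names and values here (`LABEL_TABLE`, `switch`, the interface names and next-hop addresses) are hypothetical and only illustrate the idea; the TTL handling is a simplification in which a packet that would leave with TTL 0 is dropped.

```python
from typing import NamedTuple, Optional


class Forward(NamedTuple):
    out_label: int
    out_interface: str
    next_hop: str


# Hypothetical label table of one P-Router: in-label -> forwarding action.
LABEL_TABLE = {
    100: Forward(out_label=100, out_interface="eth0", next_hop="198.51.100.1"),
    200: Forward(out_label=201, out_interface="eth1", next_hop="198.51.100.3"),
}


def switch(in_label: int, ttl: int) -> Optional[tuple]:
    """Return (out_label, new_ttl, out_interface, next_hop), or None to drop."""
    action = LABEL_TABLE.get(in_label)
    if action is None or ttl <= 1:
        return None  # unknown label, or TTL would expire
    return (action.out_label, ttl - 1, action.out_interface, action.next_hop)


print(switch(100, 64))  # label stays 100, TTL becomes 63
print(switch(200, 64))  # label is SWAPped from 200 to 201
print(switch(999, 64))  # None: no entry for this label
```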
|
|
|
|
If you were to imagine plotting a path through the lab network from say `vpp1-0` on the one side,
through `vpp1-1` and `vpp1-2` and finally onwards to `vpp1-3`, each router would be receiving MPLS packets on one
interface, and emitting them on their way to the next router on another interface. That *path* of
*switching* operations on the *labels* of those MPLS packets thus forms a so-called _Label-Switched-Path
(LSP)_. These LSPs are fundamental building blocks of MPLS networks, as I'll demonstrate later.
|
|
|
|
***PE-Router***: Some routers have a less boring job to do - those that sit at the edge of an MPLS network, accept
customer traffic and do something useful with it. Such a router is called a Label-Edge-Router (_LER_),
often colloquially called a Provider-Edge-Router (_PE-Router_). These routers receive normal
packets (ethernet or IP or otherwise), and either perform the encapsulation by adding MPLS labels to them
upon receipt (ingress, called PUSH), or remove the encapsulation (egress, called POP) to find the
inner payload, and continue to handle it as per normal. The logic for these can be much more
complicated, but you can imagine it goes something like **MPLS In-Label** => **{ Operation }**
where the operation may be "take the resulting datagram, assume it is an IPv4 packet, so look it up
in the IPv4 routing table" or "take the resulting datagram, assume it is an ethernet frame, and emit
it on a specific interface", and really any number of other "operations".
|
|
|
|
The cool thing about MPLS is that the types of operations are vendor-extensible. If two routers A and B
agree what label 1234 means _to them_, they can simply place it at the bottom of a _label stack_, say
{100,1234}. The outer label (the 100 label that all the _P-Routers_ see) carries the semantic
meaning of "switch this packet onto the destination _PE-router_". That _PE-router_ can pop the
outer label to reveal the 1234 label, which it can look up in its table to decide what to do next
with the MPLS payload, in any way it chooses. The _P-Routers_ don't have to understand the meaning
of label 1234; they don't have to use or inspect it at all!
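
As a rough illustration of such a two-label stack, the Python sketch below encodes {100,1234} and then pops the entries one at a time, the way the destination _PE-router_ would. The helper names `encode_stack` and `pop` are hypothetical, and the traffic class bits are simply left at zero.

```python
import struct


def encode_stack(labels: list, ttl: int = 64) -> bytes:
    """Encode labels outermost first; only the last entry gets S=1."""
    out = b""
    for i, label in enumerate(labels):
        s = 1 if i == len(labels) - 1 else 0
        out += struct.pack("!I", (label << 12) | (s << 8) | ttl)
    return out


def pop(stack: bytes) -> tuple:
    """Remove the outermost entry; return (label, s_bit, remaining bytes)."""
    (word,) = struct.unpack("!I", stack[:4])
    return word >> 12, (word >> 8) & 1, stack[4:]


stack = encode_stack([100, 1234])
print(stack.hex())                 # 00064040004d2140

label, s, rest = pop(stack)
print(label, s)                    # 100 0 -> S=0, another label follows
label, s, rest = pop(rest)
print(label, s, rest)              # 1234 1 b'' -> S=1, the payload starts here
```

The _P-Routers_ only ever look at the first four bytes; the 1234 entry travels along untouched until the outer label is popped.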
|
|
|
|
### Step 0: End Host setup
|
|
|
|
{{< image src="/assets/vpp-mpls/LAB v2 (1).svg" alt="Lab Setup" >}}
|
|
|
|
For this lab, I'm going to boot up instance LAB1 with no changes (for posterity, using image
|
|
`vpp-proto-disk0@20230403-release`). As an aside, IPng Networks has several of these lab
|
|
environments, and while [@vifino](https://chaos.social/@vifino) is doing some development testing on
|
|
LAB0, I simply switch to LAB1 to let him work in peace.
|
|
|
|
With the MPLS concepts introduced, let me start by configuring `host1-0` and `host1-1`, giving each of them
an IPv4 loopback address and a transit network to their routers `vpp1-3` and `vpp1-0` respectively:
|
|
|
|
```
root@host1-1:~# ip link set enp16s0f0 up mtu 1500
root@host1-1:~# ip addr add 192.0.2.2/31 dev enp16s0f0
root@host1-1:~# ip addr add 10.0.1.1/32 dev lo
root@host1-1:~# ip ro add 10.0.1.0/32 via 192.0.2.3

root@host1-0:~# ip link set enp16s0f3 up mtu 1500
root@host1-0:~# ip addr add 192.0.2.0/31 dev enp16s0f3
root@host1-0:~# ip addr add 10.0.1.0/32 dev lo
root@host1-0:~# ip ro add 10.0.1.1/32 via 192.0.2.1
root@host1-0:~# ping -I 10.0.1.0 10.0.1.1
```
|
|
|
|
At this point, I don't expect to see much, as I haven't configured VPP yet. But `host1-0` will start
|
|
ARPing for 192.0.2.1 on `enp16s0f3`, which is connected to `vpp1-3.e2`. Let me take a look on the
|
|
Open vSwitch mirror to confirm that:
|
|
|
|
```
|
|
root@tap1-0:~# tcpdump -vni enp16s0f0 vlan 33
|
|
12:41:27.174052 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 192.0.2.1 tell 192.0.2.0, length 28
|
|
12:41:28.333901 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 192.0.2.1 tell 192.0.2.0, length 28
|
|
12:41:29.517415 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 192.0.2.1 tell 192.0.2.0, length 28
|
|
12:41:30.645418 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 192.0.2.1 tell 192.0.2.0, length 28
|
|
```
|
|
|
|
Alright! I'm going to leave the ping running in the background, and I'll trace packets through the
|
|
network using the Open vSwitch mirror, as well as take a look at what VPP is doing with the packets
|
|
using its packet tracer.
|
|
|
|
### Step 1: PE Ingress
|
|
|
|
```
vpp1-3# set interface state GigabitEthernet10/0/2 up
vpp1-3# set interface ip address GigabitEthernet10/0/2 192.0.2.1/31
vpp1-3# mpls table add 0
vpp1-3# set interface mpls GigabitEthernet10/0/1 enable
vpp1-3# set interface mpls GigabitEthernet10/0/0 enable
vpp1-3# ip route add 10.0.1.1/32 via 192.168.11.10 GigabitEthernet10/0/0 out-labels 100
```
|
|
|
|
Now the ARP resolution succeeds, and I can see that `host1-0` starts sending ICMP packets towards the
|
|
loopback that I have configured on `host1-1`, and it's of course using the newly learned L2 adjacency
|
|
for 192.0.2.1 at 52:54:00:13:10:02 (which is `vpp1-3.e2`). But, take a look at what the VPP router
|
|
does next: due to the `ip route add ...` command, I've told it to reach 10.0.1.1 via a nexthop of
|
|
`vpp1-2.e1`, but it will PUSH a single MPLS label 100,S=1 and forward it out on its Gi10/0/0
|
|
interface:
|
|
|
|
```
|
|
root@tap1-0:~# tcpdump -eni enp16s0f0 vlan or mpls
|
|
12:45:56.551896 52:54:00:20:10:03 > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 46: vlan 33
|
|
p 0, ethertype ARP (0x0806), Request who-has 192.0.2.1 tell 192.0.2.0, length 28
|
|
12:45:56.553311 52:54:00:13:10:02 > 52:54:00:20:10:03, ethertype 802.1Q (0x8100), length 46: vlan 33
|
|
p 0, ethertype ARP (0x0806), Reply 192.0.2.1 is-at 52:54:00:13:10:02, length 28
|
|
|
|
12:45:56.620924 52:54:00:20:10:03 > 52:54:00:13:10:02, ethertype 802.1Q (0x8100), length 102: vlan 33
|
|
p 0, ethertype IPv4 (0x0800), 10.0.1.0 > 10.0.1.1: ICMP echo request, id 38791, seq 184, length 64
|
|
12:45:56.621473 52:54:00:13:10:00 > 52:54:00:12:10:01, ethertype 802.1Q (0x8100), length 106: vlan 22
|
|
p 0, ethertype MPLS unicast (0x8847), MPLS (label 100, exp 0, [S], ttl 64)
|
|
10.0.1.0 > 10.0.1.1: ICMP echo request, id 38791, seq 184, length 64
|
|
```
|
|
|
|
My MPLS journey on VPP has officially begun! The first exchange in the tcpdump (packets 1 and 2)
|
|
is the ARP resolution of 192.0.2.1 by `host1-0`, after which it knows where to send the ICMP echo
|
|
(packet 3, on VLAN33), which is then sent out by `vpp1-3` as MPLS to `vpp1-2` (packet 4, on VLAN22).
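
The frame lengths in this capture (102 bytes for the plain IPv4 packet, 106 bytes for the MPLS packet) can be checked with a little arithmetic. The Python sketch below uses the standard header sizes; it assumes an IPv4 header without options, and that the lengths tcpdump prints on the mirror port include the 802.1Q tag added by Open vSwitch.

```python
# Header sizes in bytes.
ETHERNET = 14    # destination MAC, source MAC, ethertype
DOT1Q = 4        # 802.1Q VLAN tag as seen on the mirror port
MPLS_ENTRY = 4   # one 32-bit label stack entry
IPV4 = 20        # IPv4 header without options
ICMP = 64        # tcpdump's "length 64": 8 bytes ICMP header + 56 bytes of ping data


def frame_length(labels: int, vlan_tagged: bool) -> int:
    """Length of the ICMP echo frame with the given number of MPLS labels."""
    tag = DOT1Q if vlan_tagged else 0
    return ETHERNET + tag + labels * MPLS_ENTRY + IPV4 + ICMP


print(frame_length(labels=0, vlan_tagged=True))   # 102: packet 3, IPv4 on VLAN 33
print(frame_length(labels=1, vlan_tagged=True))   # 106: packet 4, MPLS on VLAN 22
print(frame_length(labels=0, vlan_tagged=False))  # 98: the same frames without the tag
print(frame_length(labels=1, vlan_tagged=False))  # 102
```

The only difference between packet 3 and packet 4 is the four bytes of the single MPLS label that `vpp1-3` pushed.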
|
|
|
|
Let me show you what such a packet looks like from the point of view of VPP. It has a _packet
tracing_ function which shows how any individual packet traverses the graph of nodes through the
router. It's a lot of information, but for a VPP operator, let alone a developer, it's a really
important skill to learn -- so off I go, capturing and tracing a handful of packets:
|
|
|
|
```
|
|
vpp1-3# trace add dpdk-input 10
|
|
vpp1-3# show trace
|
|
------------------- Start of thread 0 vpp_main -------------------
|
|
Packet 1
|
|
|
|
20:15:00:496109: dpdk-input
|
|
GigabitEthernet10/0/2 rx queue 0
|
|
buffer 0x4c44df: current data 0, length 98, buffer-pool 0, ref-count 1, trace handle 0x0
|
|
ext-hdr-valid
|
|
PKT MBUF: port 2, nb_segs 1, pkt_len 98
|
|
buf_len 2176, data_len 98, ol_flags 0x0, data_off 128, phys_addr 0x2ed13840
|
|
packet_type 0x0 l2_len 0 l3_len 0 outer_l2_len 0 outer_l3_len 0
|
|
rss 0x0 fdir.hi 0x0 fdir.lo 0x0
|
|
IP4: 52:54:00:20:10:03 -> 52:54:00:13:10:02
|
|
ICMP: 10.0.1.0 -> 10.0.1.1
|
|
tos 0x00, ttl 64, length 84, checksum 0x46a2 dscp CS0 ecn NON_ECN
|
|
fragment id 0x2706, flags DONT_FRAGMENT
|
|
ICMP echo_request checksum 0x3bd6 id 8399
|
|
|
|
20:15:00:496167: ethernet-input
|
|
frame: flags 0x1, hw-if-index 3, sw-if-index 3
|
|
IP4: 52:54:00:20:10:03 -> 52:54:00:13:10:02
|
|
|
|
20:15:00:496201: ip4-input
|
|
ICMP: 10.0.1.0 -> 10.0.1.1
|
|
tos 0x00, ttl 64, length 84, checksum 0x46a2 dscp CS0 ecn NON_ECN
|
|
fragment id 0x2706, flags DONT_FRAGMENT
|
|
ICMP echo_request checksum 0x3bd6 id 8399
|
|
|
|
20:15:00:496225: ip4-lookup
|
|
fib 0 dpo-idx 1 flow hash: 0x00000000
|
|
ICMP: 10.0.1.0 -> 10.0.1.1
|
|
tos 0x00, ttl 64, length 84, checksum 0x46a2 dscp CS0 ecn NON_ECN
|
|
fragment id 0x2706, flags DONT_FRAGMENT
|
|
ICMP echo_request checksum 0x3bd6 id 8399
|
|
|
|
20:15:00:496256: ip4-mpls-label-imposition-pipe
|
|
mpls-header:[100:64:0:eos]
|
|
|
|
20:15:00:496258: mpls-output
|
|
adj-idx 25 : mpls via 192.168.11.10 GigabitEthernet10/0/0: mtu:9000 next:2 flags:[] 5254001210015254001310008847 flow hash: 0x00000000
|
|
|
|
20:15:00:496260: GigabitEthernet10/0/0-output
|
|
GigabitEthernet10/0/0 flags 0x0018000d
|
|
MPLS: 52:54:00:13:10:00 -> 52:54:00:12:10:01
|
|
label 100 exp 0, s 1, ttl 64
|
|
|
|
20:15:00:496262: GigabitEthernet10/0/0-tx
|
|
GigabitEthernet10/0/0 tx queue 0
|
|
buffer 0x4c44df: current data -4, length 102, buffer-pool 0, ref-count 1, trace handle 0x0
|
|
ext-hdr-valid
|
|
l2-hdr-offset 0 l3-hdr-offset 14
|
|
PKT MBUF: port 2, nb_segs 1, pkt_len 102
|
|
buf_len 2176, data_len 102, ol_flags 0x0, data_off 124, phys_addr 0x2ed13840
|
|
packet_type 0x0 l2_len 0 l3_len 0 outer_l2_len 0 outer_l3_len 0
|
|
rss 0x0 fdir.hi 0x0 fdir.lo 0x0
|
|
MPLS: 52:54:00:13:10:00 -> 52:54:00:12:10:01
|
|
label 100 exp 0, s 1, ttl 64
|
|
```
|
|
|
|
This packet has gone through a total of eight nodes, and the local timestamps are the uptime of VPP
|
|
when the packets were received. I'll try to explain them in turn:
|
|
|
|
1. ***dpdk-input***: The packet is initially received from Gi10/0/2 receive queue 0. It was an
|
|
ethernet packet from 52:54:00:20:10:03 (`host1-0.enp16s0f3`) to 52:54:00:13:10:02 (`vpp1-3.e2`). Some
|
|
more information is gleaned here, notably that it was an ethernet frame, an L3 IPv4 and L4 ICMP
|
|
packet.
|
|
1. ***ethernet-input***: Since it was an ethernet frame, it was passed into this node. Here VPP
|
|
concludes that this is an IPv4 packet, because the ethertype is 0x0800.
|
|
1. ***ip4-input***: We know it's an IPv4 packet, and the layer4 information shows this is an ICMP
|
|
echo packet from 10.0.1.0 to 10.0.1.1 (configured on `host1-1.lo`). VPP now needs to figure out where
|
|
to route this packet.
|
|
1. ***ip4-lookup***: VPP takes a look at its FIB for 10.0.1.1 - note the information I specified
|
|
above (the `ip route add ...` on `vpp1-3`) - the next-hop here is 192.168.11.10 on Gi10/0/0 _but_ VPP
|
|
also sees that I'm intending to add an MPLS _out-label_ of 100.
|
|
1. ***ip4-mpls-label-imposition-pipe***: An MPLS packet header is prepended in front of the IPv4
|
|
packet, which will have only one label (100) and since it's the only label, it will set the S-bit
|
|
(end-of-stack) to 1, and the MPLS TTL initializes at 64.
|
|
1. ***mpls-output***: Now that the IPv4 packet is wrapped into an MPLS packet, VPP uses the rest
|
|
of the FIB entry (notably the next-hop 192.168.11.10 and the output interface Gi10/0/0) to find where
|
|
this thing is supposed to go.
|
|
1. ***Gi10/0/0-output***: VPP now prepares the packet to be sent out on Gi10/0/0 as an MPLS
|
|
ethernet type. It uses the L2FIB adjacency table to figure out that we'll be sending it from our mac
|
|
address 52:54:00:13:10:00 (`vpp1-3.e0`) to the next hop on 52:54:00:12:10:01 (`vpp1-2.e1`).
|
|
1. ***Gi10/0/0-tx***: VPP hands the fully formed packet with all necessary information back to
|
|
DPDK to marshall it on the wire.
|
|
|
|
Can you imagine this router can do such a thing at a rate of 18-20 Million packets per second,
|
|
linearly scaling up per added CPU thread? I learn something new every time I look at a packet trace,
|
|
I simply love this dataplane implementation!
|
|
|
|
### Step 2: P-routers
|
|
|
|
In Step 1 I've shown that `vpp1-3` did send the MPLS packet to `vpp1-2`, but I haven't configured
|
|
anything there yet, and because I didn't enable MPLS, each of these beautiful packets is brutally
|
|
sent to the bit-bucket (also called _dpo-drop_):
|
|
|
|
```
|
|
vpp1-2# show err
|
|
Count Node Reason Severity
|
|
132 mpls-input MPLS input packets decapsulated info
|
|
132 mpls-input MPLS not enabled error
|
|
```
|
|
|
|
The purpose of a _P-router_ is to switch labels along the _Label-Switched-Path_. So let's manually
|
|
create the following to tell this `vpp1-2` router what to do when it receives an MPLS frame with
|
|
label 100:
|
|
|
|
```
vpp1-2# mpls table add 0
vpp1-2# set interface mpls GigabitEthernet10/0/0 enable
vpp1-2# set interface mpls GigabitEthernet10/0/1 enable
vpp1-2# mpls local-label add 100 eos via 192.168.11.8 GigabitEthernet10/0/0 out-labels 100
```
|
|
|
|
Remember, above I explained that the _P-Router_ has a simple job? It really does! All I'm doing here
is telling VPP that if it receives an MPLS packet with label 100 on any MPLS-enabled interface (notably
Gi10/0/1, on which it is currently receiving MPLS packets from `vpp1-3`), it should send the MPLS packet
out on Gi10/0/0 to neighbor 192.168.11.8 after imposing label 100.
|
|
|
|
If I've done a good job, I should be able to see this packet traversing the P-Router in a packet
|
|
trace:
|
|
|
|
```
|
|
20:42:51:151144: dpdk-input
|
|
GigabitEthernet10/0/1 rx queue 0
|
|
buffer 0x4c7d8b: current data 0, length 102, buffer-pool 0, ref-count 1, trace handle 0x0
|
|
ext-hdr-valid
|
|
PKT MBUF: port 1, nb_segs 1, pkt_len 102
|
|
buf_len 2176, data_len 102, ol_flags 0x0, data_off 128, phys_addr 0x1d1f6340
|
|
packet_type 0x0 l2_len 0 l3_len 0 outer_l2_len 0 outer_l3_len 0
|
|
rss 0x0 fdir.hi 0x0 fdir.lo 0x0
|
|
MPLS: 52:54:00:13:10:00 -> 52:54:00:12:10:01
|
|
label 100 exp 0, s 1, ttl 64
|
|
|
|
20:42:51:151161: ethernet-input
|
|
frame: flags 0x1, hw-if-index 2, sw-if-index 2
|
|
MPLS: 52:54:00:13:10:00 -> 52:54:00:12:10:01
|
|
|
|
20:42:51:151171: mpls-input
|
|
MPLS: next mpls-lookup[1] label 100 ttl 64 exp 0
|
|
|
|
20:42:51:151174: mpls-lookup
|
|
MPLS: next [6], lookup fib index 0, LB index 74 hash 0 label 100 eos 1
|
|
|
|
20:42:51:151177: mpls-label-imposition-pipe
|
|
mpls-header:[100:63:0:eos]
|
|
|
|
20:42:51:151179: mpls-output
|
|
adj-idx 28 : mpls via 192.168.11.8 GigabitEthernet10/0/0: mtu:9000 next:2 flags:[] 5254001110015254001210008847 flow hash: 0x00000000
|
|
|
|
20:42:51:151181: GigabitEthernet10/0/0-output
|
|
GigabitEthernet10/0/0 flags 0x0018000d
|
|
MPLS: 52:54:00:12:10:00 -> 52:54:00:11:10:01
|
|
label 100 exp 0, s 1, ttl 63
|
|
|
|
20:42:51:151184: GigabitEthernet10/0/0-tx
|
|
GigabitEthernet10/0/0 tx queue 0
|
|
buffer 0x4c7d8b: current data 0, length 102, buffer-pool 0, ref-count 1, trace handle 0x0
|
|
ext-hdr-valid
|
|
l2-hdr-offset 0 l3-hdr-offset 14
|
|
PKT MBUF: port 1, nb_segs 1, pkt_len 102
|
|
buf_len 2176, data_len 102, ol_flags 0x0, data_off 128, phys_addr 0x1d1f6340
|
|
packet_type 0x0 l2_len 0 l3_len 0 outer_l2_len 0 outer_l3_len 0
|
|
rss 0x0 fdir.hi 0x0 fdir.lo 0x0
|
|
MPLS: 52:54:00:12:10:00 -> 52:54:00:11:10:01
|
|
label 100 exp 0, s 1, ttl 63
|
|
```
|
|
|
|
In order, the following nodes are traversed:
|
|
1. ***dpdk-input***: received the frame from the network interface Gi10/0/1
|
|
1. ***ethernet-input***: the frame was an ethernet frame, and VPP determines based on the
|
|
ethertype (0x8847) that it is an MPLS frame
|
|
1. ***mpls-input***: inspects the MPLS labelstack and sees the outermost label (the only one on
|
|
this frame) with a value of 100.
|
|
1. ***mpls-lookup***: looks up in the MPLS FIB what to do with packets which are End-Of-Stack or
|
|
_EOS_ (ie. with the S-bit set to 1), and are labeled 100. At this point VPP could make a different
|
|
choice if there is 1 label (as in this case), or a stack of multiple labels (Not-End-of-Stack or
|
|
_NEOS_, ie. with the S-bit set to 0).
|
|
1. ***mpls-label-imposition-pipe***: reads from the FIB that the outer label needs to be SWAPped
|
|
to a new _out-label_ (also with value 100). Because it's the same label, this is a no-op. However,
|
|
since this router is forwarding the MPLS packet, it will decrement the TTL to 63.
|
|
1. ***mpls-output***: VPP then uses the rest of the FIB information to determine the L3 nexthop is
|
|
192.168.11.8 on Gi10/0/0.
|
|
1. ***Gi10/0/0-output***: uses the L2FIB adjacency table to determine that the L2 nexthop is MAC
|
|
address 52:54:00:11:10:01 (`vpp1-1.e1`). If there is no L2 adjacency, this would be a good time for
|
|
VPP to send an ARP request to resolve the IP-to-MAC and store it in the L2FIB.
|
|
1. ***Gi10/0/0-tx***: hands off the frame to DPDK for marshalling on the wire.
|
|
|
|
If you counted with me, you'll see that this flow in the _P-Router_ also has eight nodes. However,
while the IPv4 FIB can and will be north of one million entries in a longest-prefix match radix trie
(which is computationally expensive), the MPLS FIB contains far fewer entries, which are organized as
a literal key lookup in a hash table. In addition, unlike with IPv4 routing, the packet that is being
transported does not get its TTL decremented, so no IPv4 checksum has to be recalculated. MPLS
switching is _much_ cheaper than IPv4 routing!
|
|
|
|
Now that our packets are switched from `vpp1-2` to `vpp1-1` (which is also a _P-Router_), I'll just
|
|
rinse and repeat there, using the L3 adjacency pointing at `vpp1-0.e1` (192.168.11.6 on Gi10/0/0):
|
|
|
|
```
vpp1-1# mpls table add 0
vpp1-1# set interface mpls GigabitEthernet10/0/0 enable
vpp1-1# set interface mpls GigabitEthernet10/0/1 enable
vpp1-1# mpls local-label add 100 eos via 192.168.11.6 GigabitEthernet10/0/0 out-labels 100
```
|
|
|
|
Did I do this correctly? One way to check is by taking a look at which packets are seen on the Open
|
|
vSwitch mirror ports:
|
|
|
|
|
|
```
|
|
root@tap1-0:~# tcpdump -eni enp16s0f0
|
|
13:42:47.724107 52:54:00:20:10:03 > 52:54:00:13:10:02, ethertype 802.1Q (0x8100), length 102: vlan 33
|
|
p 0, ethertype IPv4 (0x0800), 10.0.1.0 > 10.0.1.1: ICMP echo request, id 8399, seq 3238, length 64
|
|
|
|
13:42:47.724769 52:54:00:13:10:00 > 52:54:00:12:10:01, ethertype 802.1Q (0x8100), length 106: vlan 22
|
|
p 0, ethertype MPLS unicast (0x8847), MPLS (label 100, exp 0, [S], ttl 64)
|
|
10.0.1.0 > 10.0.1.1: ICMP echo request, id 8399, seq 3238, length 64
|
|
|
|
13:42:47.725038 52:54:00:12:10:00 > 52:54:00:11:10:01, ethertype 802.1Q (0x8100), length 106: vlan 21
|
|
p 0, ethertype MPLS unicast (0x8847), MPLS (label 100, exp 0, [S], ttl 63)
|
|
10.0.1.0 > 10.0.1.1: ICMP echo request, id 8399, seq 3238, length 64
|
|
|
|
13:42:47.726155 52:54:00:11:10:00 > 52:54:00:10:10:01, ethertype 802.1Q (0x8100), length 106: vlan 20
|
|
p 0, ethertype MPLS unicast (0x8847), MPLS (label 100, exp 0, [S], ttl 62)
|
|
10.0.1.0 > 10.0.1.1: ICMP echo request, id 8399, seq 3238, length 64
|
|
```
|
|
|
|
Nice!! I confirm that the ICMP packet first travels over VLAN 33 (from `host1-0` to `vpp1-3`), and then
|
|
the MPLS packets travel from `vpp1-3`, through `vpp1-2`, through `vpp1-1` and towards `vpp1-0` over VLAN
|
|
22, 21 and 20 respectively.
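
The hop-by-hop behavior seen in this capture can be replayed with a small Python model of the two _P-Routers_. The table mirrors the `mpls local-label add 100 eos ... out-labels 100` entries on `vpp1-2` and `vpp1-1`, and the VLAN numbers are the ones from the tcpdump output; the data structure and the `walk` function are hypothetical and only serve as an illustration.

```python
# (router, in-label) -> (out-label, VLAN of the outgoing link, next router)
LFIB = {
    ("vpp1-2", 100): (100, 21, "vpp1-1"),
    ("vpp1-1", 100): (100, 20, "vpp1-0"),
}


def walk(router: str, label: int, ttl: int) -> tuple:
    """Follow the label-switched path until a router has no entry for the label."""
    hops = []
    while (router, label) in LFIB:
        out_label, vlan, next_router = LFIB[(router, label)]
        ttl -= 1  # every P-Router decrements the MPLS TTL
        hops.append((vlan, out_label, ttl))
        router, label = next_router, out_label
    return router, hops


# vpp1-3 pushes label 100 with TTL 64 and sends it to vpp1-2 on VLAN 22.
last_router, hops = walk("vpp1-2", label=100, ttl=64)
for vlan, label, ttl in hops:
    print(f"vlan {vlan}: label {label}, ttl {ttl}")
print("path ends at", last_router)
```

This prints label 100 with TTL 63 on VLAN 21 and TTL 62 on VLAN 20, ending at `vpp1-0`, which matches the last two packets in the capture.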
|
|
|
|
### Step 3: PE Egress
|
|
|
|
Seeing as I haven't done anything with `vpp1-0` yet, now the MPLS packets all get dropped there. But not
|
|
for much longer, as I'm now ready to tell `vpp1-0` what to do with those packets:
|
|
|
|
```
vpp1-0# mpls table add 0
vpp1-0# set interface mpls GigabitEthernet10/0/0 enable
vpp1-0# set interface mpls GigabitEthernet10/0/1 enable
vpp1-0# mpls local-label add 100 eos via ip4-lookup-in-table 0
vpp1-0# ip route add 10.0.1.1/32 via 192.0.2.2
```
|
|
|
|
The difference between the _P-Routers_ in Step 2 and this _PE-Router_ is the operation provided in
the MPLS FIB. When an MPLS packet with _label_ value 100 is received, instead of forwarding it out of
another interface (which is what the _P-Router_ would do), I tell VPP here to pop the MPLS label,
and expect to find an IPv4 packet, which I'm asking it to route by looking up an IPv4 next hop in the
(IPv4) FIB table 0.

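To make the contrast concrete, here is a minimal Python sketch of the two kinds of MPLS FIB entry described above. It is a toy model and not how VPP stores its FIB: the names `Swap`, `PopAndLookup` and `handle_label` are made up for illustration, the IPv4 lookup is an exact match rather than a longest-prefix match, and the P-router entry borrows the label 103 values that `vpp1-1` gets further down in this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Swap:
    """P-router behaviour: replace the label and forward to a next hop."""
    out_label: int
    next_hop: str
    interface: str


@dataclass(frozen=True)
class PopAndLookup:
    """PE egress behaviour: remove the label, then route the inner IPv4 packet."""
    ip4_table: int


def handle_label(mpls_fib, ip4_fib, label, dst_ip):
    """Describe what happens to one labelled packet (hypothetical helper)."""
    entry = mpls_fib.get(label)
    if entry is None:
        return "drop: no MPLS FIB entry"
    if isinstance(entry, Swap):
        return (f"swap to {entry.out_label}, "
                f"send to {entry.next_hop} on {entry.interface}")
    # PopAndLookup: the label is gone, so the inner destination decides.
    # Exact-match only; a real IPv4 FIB does longest-prefix matching.
    next_hop = ip4_fib.get(entry.ip4_table, {}).get(dst_ip)
    if next_hop is None:
        return "pop, then drop: no IPv4 route"
    return f"pop, IPv4 lookup in table {entry.ip4_table}, send to {next_hop}"


p_router_fib = {103: Swap(103, "192.168.11.9", "GigabitEthernet10/0/1")}
pe_router_fib = {100: PopAndLookup(ip4_table=0)}
pe_ip4_fib = {0: {"10.0.1.1": "192.0.2.2"}}

print(handle_label(p_router_fib, {}, 103, "10.0.1.0"))
print(handle_label(pe_router_fib, pe_ip4_fib, 100, "10.0.1.1"))
print(handle_label({}, {}, 100, "10.0.1.1"))  # vpp1-0 before Step 3
```

The last call mirrors the situation before this step: without an entry for label 100, the packet has nowhere to go and is dropped.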
All that's left for me to do is add a regular static route for 10.0.1.1/32 via 192.0.2.2 (which is
the address on interface `host1-1.enp16s0f0`). If my thinking cap is still working, I should now see
packets emitted from `vpp1-0` on Gi10/0/3:

```
vpp1-0# trace add dpdk-input 10
vpp1-0# show trace

21:34:39:370589: dpdk-input
  GigabitEthernet10/0/1 rx queue 0
  buffer 0x4c4a34: current data 0, length 102, buffer-pool 0, ref-count 1, trace handle 0x0
  ext-hdr-valid
  PKT MBUF: port 1, nb_segs 1, pkt_len 102
  buf_len 2176, data_len 102, ol_flags 0x0, data_off 128, phys_addr 0x2ff28d80
  packet_type 0x0 l2_len 0 l3_len 0 outer_l2_len 0 outer_l3_len 0
  rss 0x0 fdir.hi 0x0 fdir.lo 0x0
  MPLS: 52:54:00:11:10:00 -> 52:54:00:10:10:01
  label 100 exp 0, s 1, ttl 62

21:34:39:370672: ethernet-input
  frame: flags 0x1, hw-if-index 2, sw-if-index 2
  MPLS: 52:54:00:11:10:00 -> 52:54:00:10:10:01

21:34:39:370702: mpls-input
  MPLS: next mpls-lookup[1] label 100 ttl 62 exp 0

21:34:39:370704: mpls-lookup
  MPLS: next [6], lookup fib index 0, LB index 83 hash 0 label 100 eos 1

21:34:39:370706: ip4-mpls-label-disposition-pipe
  rpf-id:-1 ip4, pipe

21:34:39:370708: lookup-ip4-dst
  fib-index:0 addr:10.0.1.1 load-balance:82

21:34:39:370710: ip4-rewrite
  tx_sw_if_index 4 dpo-idx 32 : ipv4 via 192.0.2.2 GigabitEthernet10/0/3: mtu:9000 next:9 flags:[] 5254002110005254001010030800 flow hash: 0x00000000
    00000000: 5254002110005254001010030800450000543dec40003e01e8bc0a0001000a00
    00000020: 01010800173d231c01a0fce65864000000009ce80b00000000001011

21:34:39:370735: GigabitEthernet10/0/3-output
  GigabitEthernet10/0/3 flags 0x0418000d
  IP4: 52:54:00:10:10:03 -> 52:54:00:21:10:00
  ICMP: 10.0.1.0 -> 10.0.1.1
  tos 0x00, ttl 62, length 84, checksum 0xe8bc dscp CS0 ecn NON_ECN
  fragment id 0x3dec, flags DONT_FRAGMENT
  ICMP echo_request checksum 0x173d id 8988

21:34:39:370739: GigabitEthernet10/0/3-tx
  GigabitEthernet10/0/3 tx queue 0
  buffer 0x4c4a34: current data 4, length 98, buffer-pool 0, ref-count 1, trace handle 0x0
  ext-hdr-valid
  l2-hdr-offset 0 l3-hdr-offset 14 loop-counter 1
  PKT MBUF: port 1, nb_segs 1, pkt_len 98
  buf_len 2176, data_len 98, ol_flags 0x0, data_off 132, phys_addr 0x2ff28d80
  packet_type 0x0 l2_len 0 l3_len 0 outer_l2_len 0 outer_l3_len 0
  rss 0x0 fdir.hi 0x0 fdir.lo 0x0
  IP4: 52:54:00:10:10:03 -> 52:54:00:21:10:00
  ICMP: 10.0.1.0 -> 10.0.1.1
  tos 0x00, ttl 62, length 84, checksum 0xe8bc dscp CS0 ecn NON_ECN
  fragment id 0x3dec, flags DONT_FRAGMENT
  ICMP echo_request checksum 0x173d id 8988
```

Alright, another one of those huge blobs of information about a single packet traversing the VPP
dataplane, but it's the last one for this article, I promise! In order:

1. ***dpdk-input***: DPDK reads the frame arriving from `vpp1-1` on Gi10/0/1 and determines
   that this is an ethernet frame
1. ***ethernet-input***: Based on the ethertype 0x8847, it knows that this ethernet frame is an
   MPLS packet
1. ***mpls-input***: The MPLS _labelstack_ has one label, value 100, with (obviously) the
   EndOfStack _S-bit_ set to 1; I can also see the (MPLS) TTL is 62 here, because it has traversed three
   routers (`vpp1-3` TTL=64, `vpp1-2` TTL=63, and `vpp1-1` TTL=62)
1. ***mpls-lookup***: The lookup of local _label_ 100 informs VPP that it should switch to IPv4
   processing and handle the packet as such
1. ***ip4-mpls-label-disposition-pipe***: The MPLS label is removed, revealing an IPv4 packet as
   the inner payload of the MPLS datagram
1. ***lookup-ip4-dst***: VPP can now do a regular IPv4 forwarding table lookup for 10.0.1.1 which
   informs it that it should forward the packet via 192.0.2.2 which is directly connected to Gi10/0/3
1. ***ip4-rewrite***: The IPv4 TTL is decremented and the IP header checksum recomputed; the trace
   also shows the ethernet header that the adjacency for 192.0.2.2 prepends, with 52:54:00:21:10:00 as
   the ethernet nexthop
1. ***Gi10/0/3-output***: The finished frame, now addressed to 52:54:00:21:10:00, is queued for
   transmission on Gi10/0/3
1. ***Gi10/0/3-tx***: The packet is now handed off to DPDK to put on the wire, destined to
   `host1-1.enp16s0f0`

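The label, exp, S-bit and TTL values that `mpls-input` prints all come out of a single 32-bit label stack entry that sits between the ethernet header and the IPv4 packet. As a small sketch, assuming the RFC 3032 layout (20 bits label, 3 bits exp, 1 bit S, 8 bits TTL), this is how the entry seen on `vpp1-0` could be packed and unpacked. The function names `encode_lse` and `decode_lse` are mine and not part of VPP.

```python
import struct


def encode_lse(label: int, exp: int, s: int, ttl: int) -> bytes:
    """Pack one MPLS label stack entry into 4 bytes, network byte order."""
    word = (label << 12) | (exp << 9) | (s << 8) | ttl
    return struct.pack("!I", word)


def decode_lse(raw: bytes) -> dict:
    """Unpack the first 4 bytes of raw as an MPLS label stack entry."""
    (word,) = struct.unpack("!I", raw[:4])
    return {
        "label": word >> 12,         # top 20 bits
        "exp": (word >> 9) & 0x7,    # 3 bits, traffic class
        "s": (word >> 8) & 0x1,      # 1 = bottom of stack
        "ttl": word & 0xFF,          # low 8 bits
    }


# The entry that arrives on vpp1-0: label 100, exp 0, S=1, TTL 62
raw = encode_lse(100, 0, 1, 62)
print(raw.hex())        # 0006413e
print(decode_lse(raw))
```

Because the S-bit is 1, there are no further labels, which is why the `eos` entry for label 100 matches and the next bytes can be handed to IPv4 processing.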
That means I should be able to see it on `host1-1`, right? If you, too, are dying to know, check this out:

```
root@host1-1:~# tcpdump -ni enp16s0f0 icmp
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on enp16s0f0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
14:25:53.776486 IP 10.0.1.0 > 10.0.1.1: ICMP echo request, id 8988, seq 1249, length 64
14:25:53.776522 IP 10.0.1.1 > 10.0.1.0: ICMP echo reply, id 8988, seq 1249, length 64
14:25:54.799829 IP 10.0.1.0 > 10.0.1.1: ICMP echo request, id 8988, seq 1250, length 64
14:25:54.799866 IP 10.0.1.1 > 10.0.1.0: ICMP echo reply, id 8988, seq 1250, length 64
```

"Jiggle jiggle, wiggle wiggle!", as I do a premature congratulatory dance on the chair in my lab! I
created a _label-switched-path_ using VPP as MPLS provider-edge and provider routers, to move
this ICMP echo packet all the way from `host1-0` to `host1-1`, but there's absolutely nothing to
suggest that the resulting ICMP echo-reply can make it back from `host1-1` to `host1-0`, because
_LSPs_ are unidirectional. The final step for me to do is create an _LSP_ back in the other
direction:

```
vpp1-0# ip route add 10.0.1.0/32 via 192.168.11.7 GigabitEthernet10/0/1 out-labels 103
vpp1-1# mpls local-label add 103 eos via 192.168.11.9 GigabitEthernet10/0/1 out-labels 103
vpp1-2# mpls local-label add 103 eos via 192.168.11.11 GigabitEthernet10/0/1 out-labels 103
vpp1-3# mpls local-label add 103 eos via ip4-lookup-in-table 0
vpp1-3# ip route add 10.0.1.0/32 via 192.0.2.0
```

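The five commands above can be read as one small table per router, and walking a packet through those tables shows why the return path needs entries of its own. The following is a rough Python model of the westbound _LSP_ under a few assumptions of mine: the next-hop addresses are replaced by the router I believe they belong to, the ingress router imposes TTL 64 (which is what the captures in this article show), and `walk` is a hypothetical helper rather than anything VPP provides.

```python
# (router, destination) -> (label to impose, next router)
INGRESS = {("vpp1-0", "10.0.1.0"): (103, "vpp1-1")}

# (router, incoming label) -> (outgoing label, next router)
# An outgoing label of None means: pop and route the inner packet as IPv4.
LFIB = {
    ("vpp1-1", 103): (103, "vpp1-2"),
    ("vpp1-2", 103): (103, "vpp1-3"),
    ("vpp1-3", 103): (None, None),
}


def walk(ingress, lfib, router, dst, ttl=64):
    """Return the (sender, label, ttl) tuples one packet puts on the wire."""
    if (router, dst) not in ingress:
        return []  # no labelled route: nothing enters the LSP
    label, router_next = ingress[(router, dst)]
    wire = [(router, label, ttl)]
    router = router_next
    while True:
        entry = lfib.get((router, label))
        if entry is None:
            wire.append((router, "drop", None))  # unknown label
            return wire
        out_label, next_router = entry
        if out_label is None:
            return wire  # egress PE: label popped, IPv4 takes over
        ttl -= 1
        wire.append((router, out_label, ttl))
        label, router = out_label, next_router


print(walk(INGRESS, LFIB, "vpp1-0", "10.0.1.0"))
print(walk(INGRESS, {}, "vpp1-0", "10.0.1.0"))  # P-routers not configured yet
```

With the full table, the walk yields label 103 with TTL 64, 63 and 62. With an empty table the packet is dropped at the first hop, which is the same fate the eastbound packets met at `vpp1-0` before Step 3.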
And with that, the ping I started at the beginning of this article shoots to life:

```
root@host1-0:~# ping -I 10.0.1.0 10.0.1.1
PING 10.0.1.1 (10.0.1.1) from 10.0.1.0 : 56(84) bytes of data.
64 bytes from 10.0.1.1: icmp_seq=7644 ttl=62 time=6.28 ms
64 bytes from 10.0.1.1: icmp_seq=7645 ttl=62 time=7.45 ms
64 bytes from 10.0.1.1: icmp_seq=7646 ttl=62 time=7.01 ms
64 bytes from 10.0.1.1: icmp_seq=7647 ttl=62 time=5.76 ms
64 bytes from 10.0.1.1: icmp_seq=7648 ttl=62 time=5.88 ms
64 bytes from 10.0.1.1: icmp_seq=7649 ttl=62 time=9.23 ms
```

I will leave you with this packet dump from the Open vSwitch mirror, showing the entire flow of one
ICMP packet through the network:

```
root@tap1-0:~# tcpdump -c 10 -eni enp16s0f0
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on enp16s0f0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
14:41:07.526861 52:54:00:20:10:03 > 52:54:00:13:10:02, ethertype 802.1Q (0x8100), length 102: vlan 33
    p 0, ethertype IPv4 (0x0800), 10.0.1.0 > 10.0.1.1: ICMP echo request, id 51470, seq 20, length 64
14:41:07.528103 52:54:00:13:10:00 > 52:54:00:12:10:01, ethertype 802.1Q (0x8100), length 106: vlan 22
    p 0, ethertype MPLS unicast (0x8847), MPLS (label 100, exp 0, [S], ttl 64)
    10.0.1.0 > 10.0.1.1: ICMP echo request, id 51470, seq 20, length 64
14:41:07.529342 52:54:00:12:10:00 > 52:54:00:11:10:01, ethertype 802.1Q (0x8100), length 106: vlan 21
    p 0, ethertype MPLS unicast (0x8847), MPLS (label 100, exp 0, [S], ttl 63)
    10.0.1.0 > 10.0.1.1: ICMP echo request, id 51470, seq 20, length 64
14:41:07.530421 52:54:00:11:10:00 > 52:54:00:10:10:01, ethertype 802.1Q (0x8100), length 106: vlan 20
    p 0, ethertype MPLS unicast (0x8847), MPLS (label 100, exp 0, [S], ttl 62)
    10.0.1.0 > 10.0.1.1: ICMP echo request, id 51470, seq 20, length 64
14:41:07.531160 52:54:00:10:10:03 > 52:54:00:21:10:00, ethertype 802.1Q (0x8100), length 102: vlan 40
    p 0, ethertype IPv4 (0x0800), 10.0.1.0 > 10.0.1.1: ICMP echo request, id 51470, seq 20, length 64

14:41:07.531455 52:54:00:21:10:00 > 52:54:00:10:10:03, ethertype 802.1Q (0x8100), length 102: vlan 40
    p 0, ethertype IPv4 (0x0800), 10.0.1.1 > 10.0.1.0: ICMP echo reply, id 51470, seq 20, length 64
14:41:07.532245 52:54:00:10:10:01 > 52:54:00:11:10:00, ethertype 802.1Q (0x8100), length 106: vlan 20
    p 0, ethertype MPLS unicast (0x8847), MPLS (label 103, exp 0, [S], ttl 64)
    10.0.1.1 > 10.0.1.0: ICMP echo reply, id 51470, seq 20, length 64
14:41:07.532732 52:54:00:11:10:01 > 52:54:00:12:10:00, ethertype 802.1Q (0x8100), length 106: vlan 21
    p 0, ethertype MPLS unicast (0x8847), MPLS (label 103, exp 0, [S], ttl 63)
    10.0.1.1 > 10.0.1.0: ICMP echo reply, id 51470, seq 20, length 64
14:41:07.533923 52:54:00:12:10:01 > 52:54:00:13:10:00, ethertype 802.1Q (0x8100), length 106: vlan 22
    p 0, ethertype MPLS unicast (0x8847), MPLS (label 103, exp 0, [S], ttl 62)
    10.0.1.1 > 10.0.1.0: ICMP echo reply, id 51470, seq 20, length 64
14:41:07.535040 52:54:00:13:10:02 > 52:54:00:20:10:03, ethertype 802.1Q (0x8100), length 102: vlan 33
    p 0, ethertype IPv4 (0x0800), 10.0.1.1 > 10.0.1.0: ICMP echo reply, id 51470, seq 20, length 64
10 packets captured
10 packets received by filter
```

You can see all of the attributes I demonstrated in this article in one go: ingress ICMP packet on
VLAN 33, encapsulation with label 100, S=1 and the TTL decrementing as the MPLS packet traverses
eastwards through the string of VPP routers on VLANs 22, 21 and 20, ultimately being sent out on
VLAN 40. There, the ICMP echo request packet is responded to, and we can trace the ICMP response as
it makes its way back westwards through the MPLS network using label 103, ultimately re-appearing on
VLAN 33.

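The frame lengths in these captures are a handy sanity check, because one MPLS label costs exactly four bytes. Here is a back-of-the-envelope sketch in Python, assuming tcpdump reports lengths without the ethernet FCS and that the VLAN tag is only present on the Open vSwitch mirror (the VPP trace, which sees untagged frames, reported 102 bytes on receive and 98 on transmit). The constant and function names are mine.

```python
ETHERNET_HEADER = 14   # dst MAC + src MAC + ethertype
DOT1Q_TAG = 4          # VLAN tag, as seen on the mirror port
MPLS_LABEL = 4         # one label stack entry
IPV4_PACKET = 84       # 20 IPv4 header + 8 ICMP header + 56 payload


def frame_length(labels: int = 0, vlan_tagged: bool = True) -> int:
    """Length of an ethernet frame carrying the 84-byte ping packet."""
    vlan = DOT1Q_TAG if vlan_tagged else 0
    return ETHERNET_HEADER + vlan + labels * MPLS_LABEL + IPV4_PACKET


print(frame_length(labels=0))                     # 102: plain IPv4 on VLAN 33 and 40
print(frame_length(labels=1))                     # 106: MPLS on VLAN 22, 21 and 20
print(frame_length(labels=1, vlan_tagged=False))  # 102: rx length in the VPP trace
print(frame_length(labels=0, vlan_tagged=False))  # 98: tx length in the VPP trace
```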
There you have it. This is a fun story on _Multiprotocol Label Switching (MPLS)_ bringing a packet from
a _Label-Edge-Router (LER)_ through several _Label-Switch-Routers (LSRs)_ over a statically
configured _Label-Switched-Path (LSP)_. I feel like I can now more confidently use these terms
without sounding silly.

## What's next

The first mission is accomplished. I've taken a good look at forwarding IPv4 in the VPP dataplane as
MPLS packets, en- and decapsulating the traffic using _PE-Routers_ and forwarding the
traffic using intermediary _P-Routers_. MPLS switching is cheaper than IPv4/IPv6 routing, but it can
also open up a bunch of possibilities for advanced service offerings, such as my coveted _Martini
Tunnels_ which transport ethernet frames point-to-point over an MPLS backbone. That will be the topic
of an upcoming article, and I will also join forces with [@vifino](https://chaos.social/@vifino) who is adding
Linux Controlplane functionality to program the MPLS FIB using Netlink -- such that things like 'ip'
and 'FRR' can discover and share label information using a Label Distribution Protocol or _LDP_.