ipng.ch/content/articles/2022-12-09-oem-switch-2.md

---
date: "2022-12-09T11:56:54Z"
title: 'Review: S5648X-2Q4Z Switch - Part 2: MPLS'
aliases:
- /s/articles/2022/12/09/oem-switch-2.html
---

After receiving an e-mail from a newer [[China based OEM](https://starry-networks.com)], I had a chat with their
founder and learned that the combination of switch silicon and software may be a good match for IPng Networks.

I got pretty enthusiastic when this new vendor claimed VxLAN, GENEVE, MPLS and GRE at 56 ports and
line rate, on a really affordable budget ($4'200,- for the 56 port; and $1'650,- for the 26 port
switch). This reseller is using a less known silicon vendor called
[[Centec](https://www.centec.com/silicon)], who have a lineup of ethernet silicon. In this device,
the CTC8096 (GoldenGate) is used for cost effective high density 10GbE/40GbE applications paired
with 4x100GbE uplink capability. This is Centec's fourth generation, so CTC8096 inherits the feature
set from L2/L3 switching to advanced data center and metro Ethernet features with innovative
enhancement. The switch chip provides up to 96x10GbE ports, or 24x40GbE, or 80x10GbE + 4x100GbE
ports, inheriting from its predecessors a variety of features, including L2, L3, MPLS, VXLAN, MPLS
SR, and OAM/APS. Highlights features include Telemetry, Programmability, Security and traffic
management, and Network time synchronization.

{{< image width="450px" float="left" src="/assets/oem-switch/S5624X-front.png" alt="S5624X Front" >}}

{{< image width="450px" float="right" src="/assets/oem-switch/S5648X-front.png" alt="S5648X Front" >}}

<br /><br />

After discussing basic L2, L3 and Overlay functionality in my [[previous post]({{< ref "2022-12-05-oem-switch-1" >}})], I left somewhat of a cliffhanger alluding to all this fancy MPLS and
VPLS stuff. Honestly, I needed a bit more time to play around with the featureset and clarify a few things.
I'm now ready to assert that this stuff is really possible on this switch, and if this tickles your fancy, by
all means read on :)

## Detailed findings

### Hardware

{{< image width="400px" float="right" src="/assets/oem-switch/s5648x-front-opencase.png" alt="Front" >}}

The switch comes well packaged with two removable 400W Gold powersupplies from _Compuware
Technology_ which output 12V/33A and +5V/3A as well as four removable PWM controlled fans from
_Protechnic_. The switch chip is a Centec
[[CTC8096](https://www.centec.com/silicon/Ethernet-Switch-Silicon/CTC8096)] which is a competent silicon unit that can offer
48x10, 2x40 and 4x100G, and its smaller sibling carries the newer
[[CTC7132](https://www.centec.com/silicon/Ethernet-Switch-Silicon/CTC7132)] from 2019, which brings
24x10 and 2x100G connectivity. While the firmware seems slightly different in denomination, the large
one shows `NetworkOS-e580-v7.4.4.r.bin` as the firmware, and the smaller one shows
`uImage-v7.0.4.40.bin`, I get the impression that the latter is a compiled down version of the
former to work with the newer chipset.

In my [[previous post]({{< ref "2022-12-05-oem-switch-1" >}})], I showed L2, L3 and VxLAN, GENEVE
and NvGRE capabilities of this switch to be line rate. But the hardware also supports MPLS, so I
figured I'd complete the Overlay series by exploring VxLAN, and the MPLS, EoMPLS (L2VPN, Martini
style), and VPLS functionality of these units.

### Topology

![Front](/assets/oem-switch/topology.svg){: style="width:500px; float: right; margin-left: 1em;
margin-bottom: 1em;"}

In the [[IPng Networks LAB]({{< ref "2022-10-14-lab-1" >}})], I build the following topology using
the loadtester, packet analyzer, and switches:

*  **msw-top**: S5624-2Z-EI switch
*  **msw-core**: S5648X-2Q4ZA switch
*  **msw-bottom**: S5624-2Z-EI switch
*  All switches connect to:
   *   each other with 100G DACs (right, black)
   *   T-Rex machine with 4x10G (left, rainbow)
*  Each switch gets a mgmt IPv4 and IPv6

This is the same topology in the previous post, and it gives me lots of wiggle room to patch anything
to anything as I build point to point MPLS tunnels, VPLS clouds and eVPN overlays. Although I will
also load/stress test these configurations, this post is more about the higher level configuration
work that goes into building such an MPLS enabled telco network.

### MPLS

Why even bother, if we have these fancy new IP based transports that I [[wrote about]({{< ref "2022-12-05-oem-switch-1" >}})] last week? I mentioned that the industry is _moving on_ from MPLS
to a set of more flexible IP based solutions like VxLAN and GENEVE, as they certainly offer lots of
benefits in deployment (notably as overlays on top of existing IP networks).

Here's one plausible answer: You may have come across an architectural network design concept known
as [[BGP Free Core](https://bgphelp.com/2017/02/12/bgp-free-core/)] and operating this way gives
very little room for outages to occur in the L2 (Ethernet and MPLS) transport network, because it's
relatively simple in design and implementation. Some advantages worth mentioning:

*  Transport devices do not need to be capable of supporting a large number of IPv4/IPv6 routes, either
   in the RIB or FIB, allowing them to be much cheaper.
*  As there is no eBGP, transport devices will not be impacted by BGP-related issues, such as high CPU
   utilization during massive BGP re-convergence.
*  Also, without eBGP, some of the attack vectors in ISPs (loopback DDoS or ARP storms on public
   internet exchange, to take two common examples) can be eliminated.  If a new BGP security
   vulnerability were to be discovered, transport devices aren't impacted.
*  Operator errors (the #1 reason for outages in our industry) associated with BGP configuration and
   the use of large RIBs (eg. leaking into IGP, flapping transit sessions, etc) can be eradicated.
*  New transport services such as MPLS point to point virtual leased lines, SR-MPLS, VPLS clouds, and
   eVPN can all be introduced without modifying the routing core.

If deployed correctly, this type of transport-only network can be kept entirely isolated from the Internet,
making DDoS and hacking attacks against transport elements impossible, and it also opens up possibilities
for relatively safe sharing of infrastructure resources between ISPs (think of things like dark fibers
between locations, rackspace, power, cross connects).

For smaller clubs (like IPng Networks), being able to share a 100G wave with others, significantly reduces
price per Megabit! So if you're in Zurich, Switzerland, or Europe and find this an interesting avenue to
expand your reach in a co-op style environment, [[reach out](/s/contact)] to us, any time!

#### MPLS + LDP Configuration

OK, let's talk bits and bytes. Table stakes functionality is of course MPLS switching and label distribution,
which is performed with LDP, described in [[RFC3036](https://www.rfc-editor.org/rfc/rfc3036.html)].
Enabling these features is relatively straight forward:

```
msw-top# show run int loop0
interface loopback0
 ip address 172.20.0.2/32
 ipv6 address 2001:678:d78:400::2/128
 ipv6 router ospf 8298 area 0

msw-top# show run int eth-0-25
interface eth-0-25
 description Core: msw-bottom eth-0-25
 speed 100G
 no switchport
 mtu 9216
 label-switching
 ip address 172.20.0.12/31
 ipv6 address 2001:678:d78:400::3:1/112
 ip ospf network point-to-point
 ip ospf cost 104
 ipv6 ospf network point-to-point
 ipv6 ospf cost 106
 ipv6 router ospf 8298 area 0
 enable-ldp

msw-top# show run router ospf
router ospf 8298
 network 172.20.0.0/24 area 0

msw-top# show run router ipv6 ospf
router ipv6 ospf 8298
 router-id 172.20.0.2

msw-top# show run router ldp
router ldp
 router-id 172.20.0.2
 transport-address 172.20.0.2
```

This seems like a mouthful, but really not too complicated. From the top, I create a loopback
interface with an IPv4 (/32) and IPv6 (/128) address. Then, on the 100G transport interfaces, I
specify an IPv4 (/31, let's not be wasteful, take a look at [[RFC
3021](https://www.rfc-editor.org/rfc/rfc3021.html)]) and IPv6 (/112) transit network, after which I
add the interface to OSPF and OSPFv3.

The main two things to note in the interface definition is the use of `label-switching` which enables
MPLS on the interface, and `enable-ldp` which makes it periodically multicast LDP discovery
packets. If another device is also doing that, an LDP _adjacency_ is formed using a TCP session.
The two devices then exchange MPLS label tables, so that they learn from each other how to switch
MPLS packets across the network.

LDP _signalling_ kind of looks like this on the wire:
```
14:21:43.741089 IP 172.20.0.12.646 > 224.0.0.2.646: LDP, Label-Space-ID: 172.20.0.2:0, pdu-length: 30
14:21:44.331613 IP 172.20.0.13.646 > 224.0.0.2.646: LDP, Label-Space-ID: 172.20.0.1:0, pdu-length: 30

14:21:44.332773 IP 172.20.0.2.36475 > 172.20.0.1.646: Flags [S],seq 195175, win 27528,
  options [mss 9176,sackOK,TS val 104349486 ecr 0,nop,wscale 7], length 0
14:21:44.333700 IP 172.20.0.1.646 > 172.20.0.2.36475: Flags [S.], seq 466968, ack 195176, win 18328,
  options [mss 9176,sackOK,TS val 104335979 ecr 104349486,nop,wscale 7], length 0
14:21:44.334313 IP 172.20.0.2.36475 > 172.20.0.1.646: Flags [.], ack 1, win 216,
  options [nop,nop,TS val 104349486 ecr 104335979], length 0
```

The first two packets here are the routers announcing to [[well known
multicast](https://en.wikipedia.org/wiki/Multicast_address)] address for _all-routers_ (224.0.0.2),
and well known port 646 (for LDP), in a packet called a _Hello Message_. The router with
address 172.20.0.12 is the one we just configured (`msw-top`), and the one with address 172.20.0.13 is
the other side (`msw-bottom`). In these _Hello messages_, the router informs multicast listeners
where they should connect (called the _IPv4 transport address_), in the case of `msw-top`, it's
172.20.0.2.

Now that they've noticed one anothers willingness to form an adjacency, a TCP connection is
initiated from our router's loopback address (specified by `transport-address` in the LDP
configuration), towards the loopback that was learned from the _Hello Message_ in the multicast
packet earlier. A TCP three way handshake follows, in which the routers also tell each other their
MTU (by means of the MSS field set to 9176, which is 9216 minus 20 bytes [[IPv4
header](https://en.wikipedia.org/wiki/Internet_Protocol_version_4)] and 20 bytes [[TCP
header](https://en.wikipedia.org/wiki/Transmission_Control_Protocol)]). The adjacency forms and both
routers exchange label information (in things called a _Label Mapping Message_). Once done
exchanging this info, `msw-top` can now switch MPLS packets across its two 100G interfaces.

Zooming back out from what happened on the wire with the LDP _signalling_, I can take a look at the
`msw-top` switch: besides the adjacency that I described in detail above, another one has formed over
the IPv4 transit network between `msw-top` and `msw-core` (refer to the topology diagram to see what
connects where).  As this is a layer3 network, icky things like spanning tree and forwarding loops
are no longer an issue. Any switch can forward MPLS packets to any neighbor in this topology, preference
on the used path is informed with OSPF costs for the IPv4 interfaces (because LDP is using IPv4 here).

```
msw-top# show ldp adjacency
IP Address                 Intf Name    Holdtime   LDP-Identifier
172.20.0.10                   eth-0-26    15         172.20.0.1:0
172.20.0.13                   eth-0-25    15         172.20.0.0:0

msw-top# show ldp session
Peer IP Address           IF Name    My Role    State      KeepAlive
172.20.0.0                eth-0-25   Active    OPERATIONAL   30
172.20.0.1                eth-0-26   Active    OPERATIONAL   30
```

#### MPLS pseudowire

The easiest form (and possibly most widely used one), is to create a point to point ethernet link
betwen an interface on one switch, through the MPLS network, and into another switch's interface on
the other side. Think of this as a really long network cable. Ethernet frames are encapsulated into
an MPLS frame, and passed through the network though some sort of tunnel, called a _pseudowire_.

There are many names of this tunneling technique. Folks refer to them as PWs (PseudoWires), VLLs
(Virtual Leased Lines), Carrier Ethernet, or Metro Ethernet. Luckily, these are almost always
interoperable, because under the covers, the vendors are implementing these MPLS cross connect
circuits using [[Martini Tunnels](https://datatracker.ietf.org/doc/html/draft-martini-l2circuit-trans-mpls-00)]
which were formalized in [[RFC 4447](https://datatracker.ietf.org/doc/html/rfc4447)].

The way Martini tunnels work is by creating an extension in LDP signalling. An MPLS label-switched-path is
annotated as being of a certain type, carrying a 32 bit _pseudowire ID_, which is ignored by all
intermediate routers (they will just switch the MPLS packet onto the next hop), but the last router
will inspect the MPLS packet and find which _pseudowire ID_ it belongs to, and look up in its local
table what to do with it (mostly just unwrap the MPLS packet, and marshall the resulting ethernet
frame into an interface or tagged sub-interface).

Configuring the _pseudowire_ is really simple:

```
msw-top# configure terminal
interface eth-0-1
 mpls-l2-circuit pw-vll1 ethernet
!

mpls l2-circuit pw-vll1 829800 172.20.0.0 raw mtu 9000

msw-top# show ldp mpls-l2-circuit 829800
Transport  Client     VC     Trans    Local       Remote      Destination
VC ID      Binding    State  Type     VC Label    VC Label    Address
829800     eth-0-1    UP     Ethernet      32774       32773       172.20.0.0

```

After I've configured this on both `msw-top` and `msw-bottom`, using LDP signalling, a new LSP will be
set up which carries ethernet packets at up to 9000 bytes, encapsulated MPLS, over the network. To
show this in more detail, I'll take the two ethernet interfaces that are connected to `msw-top:eth-0-1`
and `msw-bottom:eth-0-1`, and move them in their own network namespace on the lab machine:

```
root@dut-lab:~# ip netns add top
root@dut-lab:~# ip netns add bottom
root@dut-lab:~# ip link set netns top enp66s0f0
root@dut-lab:~# ip link set netns bottom enp66s0f1
```

I can now enter the _top_ and _bottom_ namespaces, and play around with those interfaces, for example
I'll give them an IPv4 address and a sub-interface with dot1q tag 1234 and an IPv6 address:

```
root@dut-lab:~# nsenter --net=/var/run/netns/bottom
root@dut-lab:~# ip addr add 192.0.2.1/31 dev enp66s0f1
root@dut-lab:~# ip link add link enp66s0f1 name v1234 type vlan id 1234
root@dut-lab:~# ip addr add 2001:db8::2/64 dev v1234
root@dut-lab:~# ip link set v1234 up

root@dut-lab:~# nsenter --net=/var/run/netns/top
root@dut-lab:~# ip addr add 192.0.2.0/31 dev enp66s0f0
root@dut-lab:~# ip link add link enp66s0f0 name v1234 type vlan id 1234
root@dut-lab:~# ip addr add 2001:db8::1/64 dev v1234
root@dut-lab:~# ip link set v1234 up

root@dut-lab:~# ping -c 5 2001:db8::2
PING 2001:db8::2(2001:db8::2) 56 data bytes
64 bytes from 2001:db8::2: icmp_seq=1 ttl=64 time=0.158 ms
64 bytes from 2001:db8::2: icmp_seq=2 ttl=64 time=0.155 ms
64 bytes from 2001:db8::2: icmp_seq=3 ttl=64 time=0.162 ms
```

The `mpls-l2-circuit` that I created will transport the received ethernet frames between `enp66s0f0`
(in the _top_ namespace) and `enp66s0f1` (in the _bottom_ namespace), using MPLS encapsulation, and
giving the packets a stack of _two_ labels. The outer most label helps the switches determine where
to switch the MPLS packet (in other words, route it from `msw-top` to `msw-bottom`). Once the
destination is reached, the outer label is popped off the stack, to reveal the second label, the
purpose of which is to tell the `msw-bottom` switch what, preciesly, to do with this payload. The
switch will find that the second label instructs it to transmit the MPLS payload as an ethernet
frame out on port `eth-0-1`.

If I want to look at what happens on the wire with tcpdump(8), I can use the monitor port on
`msw-core` which mirrors all packets transiting through it. But, I don't get very far:

```
root@dut-lab:~# tcpdump -evni eno2 mpls
19:57:37.055854 00:1e:08:0d:6e:88 > 00:1e:08:26:ec:f3, ethertype MPLS unicast (0x8847), length 144:
MPLS (label 32768, exp 0, ttl 255) (label 32773, exp 0, [S], ttl 255)
        0x0000:  9c69 b461 7679 9c69 b461 7678 8100 04d2  .i.avy.i.avx....
        0x0010:  86dd 6003 4a42 0040 3a40 2001 0db8 0000  ..`.JB.@:@......
        0x0020:  0000 0000 0000 0000 0001 2001 0db8 0000  ................
        0x0030:  0000 0000 0000 0000 0002 8000 3553 9326  ............5S.&
        0x0040:  0001 2185 9363 0000 0000 e7d9 0000 0000  ..!..c..........
        0x0050:  0000 1011 1213 1415 1617 1819 1a1b 1c1d  ................
        0x0060:  1e1f 2021 2223 2425 2627 2829 2a2b 2c2d  ...!"#$%&'()*+,-
        0x0070:  2e2f 3031 3233 3435 3637                 ./01234567
19:57:37.055890 00:1e:08:26:ec:f3 > 00:1e:08:0d:6e:88, ethertype MPLS unicast (0x8847), length 140:
MPLS (label 32774, exp 0, [S], ttl 254)
        0x0000:  9c69 b461 7678 9c69 b461 7679 8100 04d2  .i.avx.i.avy....
        0x0010:  86dd 6009 4122 0040 3a40 2001 0db8 0000  ..`.A".@:@......
        0x0020:  0000 0000 0000 0000 0002 2001 0db8 0000  ................
        0x0030:  0000 0000 0000 0000 0001 8100 3453 9326  ............4S.&
        0x0040:  0001 2185 9363 0000 0000 e7d9 0000 0000  ..!..c..........
        0x0050:  0000 1011 1213 1415 1617 1819 1a1b 1c1d  ................
        0x0060:  1e1f 2021 2223 2425 2627 2829 2a2b 2c2d  ...!"#$%&'()*+,-
        0x0070:  2e2f 3031 3233 3435 3637                 ./01234567
```

For a brief moment, I stare closely at the first part of the hex dump, and I recognize two MAC addresses
`9c69.b461.7678` and `9c69.b461.7679` followed by what appears to be `0x8100` (the ethertype for
[[Dot1Q](https://en.wikipedia.org/wiki/IEEE_802.1Q)]) and then `0x04d2` (which is 1234 in decimal,
the VLAN tag I chose).

Clearly, the hexdump here is "just" an ethernet frame. So why doesn't tcpdump decode it? The answer is simple:
nothing in the MPLS packet tells me that the payload is actually ethernet. It could be anything, and
it's really up to the recipient of the packet with the label 32773 to determine what its payload means.
Luckily, Wireshark can be prompted to decode further based on which MPLS label is present. Using the
_Decode As..._ option, I can specify that data following label 32773 is _Ethernet PW (no CW)_, where
PW here means _pseudowire_ and CW means _controlword_. Et, voil&agrave;, the first packet reveals itself:

{{< image src="/assets/oem-switch/mpls-wireshark-1.png" alt="MPLS Frame #1 disected" >}}

#### Pseudowires on Sub Interfaces

One very common use case for me at IPng Networks is to work with excellent partners like
[[IP-Max](https://www.ip-max.net/)] who provide Internet Exchange transport, for example from DE-CIX
or SwissIX, to the customer premises. IP-Max uses Cisco's ASR9k routers, an absolutely beautiful piece
of technology [[ref]({{< ref "2022-02-21-asr9006" >}})], and with those you can terminate a _L2VPN_ in
any sub-interface.

Let's configure something similar. I take one port on `msw-top`, and branch that out into three
remote locations, in this case `msw-bottom` port 1, 2 and 3. I will be terminating all three _pseudowires_
on the same endpoint, but obviously this could also be one port that goes to three internet exchanges,
say SwissIX, DE-CIX and FranceIX, on three different endpoints.

The configuration for both switches will look like this:

```
msw-top# configure terminal
interface eth-0-1
 switchport mode trunk
 switchport trunk native vlan 5
 switchport trunk allowed vlan add 6-8
 mpls-l2-circuit pw-vlan10 vlan 10
 mpls-l2-circuit pw-vlan20 vlan 20
 mpls-l2-circuit pw-vlan30 vlan 30

mpls l2-circuit pw-vlan10 829810 172.20.0.0 raw mtu 9000
mpls l2-circuit pw-vlan20 829820 172.20.0.0 raw mtu 9000
mpls l2-circuit pw-vlan30 829830 172.20.0.0 raw mtu 9000

msw-bottom# configure terminal
interface eth-0-1
 mpls-l2-circuit pw-vlan10 ethernet
interface eth-0-2
 mpls-l2-circuit pw-vlan20 ethernet
interface eth-0-3
 mpls-l2-circuit pw-vlan30 ethernet

mpls l2-circuit pw-vlan10 829810 172.20.0.2 raw mtu 9000
mpls l2-circuit pw-vlan20 829820 172.20.0.2 raw mtu 9000
mpls l2-circuit pw-vlan30 829830 172.20.0.2 raw mtu 9000
```

Previously, I configured the port in _ethernet_ mode, which takes all frames and forwards them into
the MPLS tunnel. In this case, I'm using _vlan_ mode, specifying a VLAN tag that, when frames arrive
on the port matching it, will selectively be put into a pseudowire. As an added benefit, this allows
me to still use the port as a regular switchport, in the snippet above it will take untagged frames
and assign them to VLAN 5, allow tagged frames with dot1q VLAN tag 6, 7 or 8, and handle them as any
normal switch would. VLAN tag 10, however, is directed into the pseudowire called _pw-vlan10_, and
the other two tags similarly get put into their own `l2-circuit`. Using LDP signalling, the _pw-id_
(829810, 829820, and 829830) determines which label is assigned. On the way back, that label
allows the switch to correlate the ethernet frame with the correct port and transmit the it with the
configured VLAN tag.

To show this from an end-user point of view, let's take a look at the Linux server connected to these
switches. I'll put one port in a namespace called _top_, and three other ports in a network namespace
called _bottom_, and then proceed to give them a little bit of config:

```
root@dut-lab:~# ip link set netns top dev enp66s0f0
root@dut-lab:~# ip link set netns bottom dev enp66s0f1
root@dut-lab:~# ip link set netns bottom dev enp66s0f2
root@dut-lab:~# ip link set netns bottom dev enp4s0f1

root@dut-lab:~# nsenter --net=/var/run/netns/top
root@dut-lab:~# ip link add link enp66s0f0 name v10 type vlan id 10
root@dut-lab:~# ip link add link enp66s0f0 name v20 type vlan id 20
root@dut-lab:~# ip link add link enp66s0f0 name v30 type vlan id 30
root@dut-lab:~# ip addr add 192.0.2.0/31 dev v10
root@dut-lab:~# ip addr add 192.0.2.2/31 dev v20
root@dut-lab:~# ip addr add 192.0.2.4/31 dev v30

root@dut-lab:~# nsenter --net=/var/run/netns/bottom
root@dut-lab:~# ip addr add 192.0.2.1/31 dev enp66s0f1
root@dut-lab:~# ip addr add 192.0.2.3/31 dev enp66s0f2
root@dut-lab:~# ip addr add 192.0.2.5/31 dev enp4s0f1
root@dut-lab:~# ping 192.0.2.4
PING 192.0.2.4 (192.0.2.4) 56(84) bytes of data.
64 bytes from 192.0.2.4: icmp_seq=1 ttl=64 time=0.153 ms
64 bytes from 192.0.2.4: icmp_seq=2 ttl=64 time=0.209 ms
```

To unpack this a little bit, in the first block I assign the interfaces to their respective
namespace. Then, for the interface connected to the `msw-top` switch, I create three dot1q
sub-interfaces, corresponding to the pseudowires I created. Note: untagged traffic out of
`enp66s0f0` will simply be picked up by the switch and assigned VLAN 5 (and I'm also allowed to send
VLAN tags 6, 7 and 8, which will all be handled locally).

But, VLAN 10, 20 and 30 will be moved through the MPLS network and pop out on the `msw-bottom`
switch, where they are each assigned a unique port, represented by `enp66s0f1`, `enp66s0f2` and
`enp4s0f1` connected to the bottom switch.

When I finally ping 192.0.2.4, that ICMP packet goes out on `enp4s0f1`, which enters
`msw-bottom:eth-0-3`, it gets assigned the pseudowire name _pw-vlan30_, which corresponds to the
_pw-id_ 829830, then it travels over the MPLS network, arriving at `msw-bottom` carrying a label
that tells that switch that it belongs to its local _pw-id_ 829830 which corresponds to name
_pw-vlan30_ and is assigned VLAN tag 30 on port `eth-0-1`. Phew, I made it. It actually makes sense
when you think about it!

#### VPLS

The _pseudowires_ that I described in the previous section are simply ethernet cross connects
spanning over an MPLS network. They are inherently point-to-point, much like a physical Ethernet
cable is. Sometimes, it makes more sense to take a local port and create what is called a _Virtual
Private LAN Service_ (VPLS), described in [[RFC4762]](https://www.rfc-editor.org/rfc/rfc4762.html),
where packets into this port are capable of being sent to any number of other ports on any number of
other switches, while using MPLS as transport.

By means of example, let's say a telco offers me one port in Amsterdam, one in Zurich and one in
Frankfurt. A VPLS instance would create an emulated LAN segment between these locations, in other
words a Layer 2 broadcast domain that is fully capable of learning and forwarding on Ethernet MAC
addresses but the ports are dedicated to me, and they are isolated from other customers. The telco
has essentially created a three-port switch for me, but at the same time, that telco can create
any number of VPLS services, each one unique to their individual customers. It's a pretty powerful
concept.

In principle, a VPLS consists of two parts:

1.   A full mesh of simple MPLS point-to-point tunnels from each participating switch to
     each other one. These are just _pseudowires_ with a given _pw-id_, just like I showed before.
1.   The _pseudowires_ are then tied together in a form of bridge domain, and learning is applied to
     MAC addresses that appear behind each port, signalling that these are available behind the port.

Configuration on the switch looks like this:

```
msw-top# configure terminal
interface eth-0-1
 mpls-vpls v-ipng ethernet
interface eth-0-2
 mpls-vpls v-ipng ethernet
interface eth-0-3
 mpls-vpls v-ipng ethernet
interface eth-0-4
 mpls-vpls v-ipng ethernet
!
mpls vpls v-ipng 829801
 vpls-peer 172.20.0.0 raw
 vpls-peer 172.20.0.1 raw
```

The first set of commands add each individual interface into the VPLS instance by binding it to a
name, in this case _v-ipng_. Then, the VPLS neighbors are specified, by offering a _pw-id_ (829801)
which is used to construct a _pseudowire_ to the two peers. The first, 172.20.0.0 is `msw-bottom`, and
the other, 172.20.0.1 is `msw-core`. Each switch that participates in the VPLS for _v-ipng_ will signal
LSPs to each of its peers, and MAC learning will be enabled just as if each of these _pseudowires_
were a regular switchport.

Once I configure this pattern on all three switches, effectively interfaces `eth-0-1 - 4` are now
bound together as a virtual switch with a unique broadcast domain dedicated to instance _v-ipng_.
I've created a fully transparent 12-port switch, which means that what-ever traffic I generate, will
be encapsulated in MPLS and sent through the MPLS network towards its destination port.

Let's take a look at the `msw-core` switch to see how this looks like:

```
msw-core# show ldp vpls
VPLS-ID     Peer Address      State  Type       Label-Sent   Label-Rcvd  Cw
829801      172.20.0.0        Up     ethernet   32774        32773        0
829801      172.20.0.2        Up     ethernet   32776        32774        0

msw-core# show mpls vpls mesh
VPLS-ID    Peer Addr/name         In-Label   Out-Intf   Out-Label  Type     St   Evpn Type2 Sr-tunid
829801     172.20.0.0/-           32777      eth-0-50   32775      RAW      Up   N    N     -
829801     172.20.0.2/-           32778      eth-0-49   32776      RAW      Up   N    N     -

msw-core# show mpls vpls detail
Virtual Private LAN Service Instance: v-ipng, ID: 829801
 Group ID: 0, Configured MTU: NULL
 Description: none
 AC interface :
     Name            TYPE          Vlan
     eth-0-1         Ethernet      ALL
     eth-0-2         Ethernet      ALL
     eth-0-3         Ethernet      ALL
     eth-0-4         Ethernet      ALL
 Mesh Peers :
  Peer            TYPE State C-Word  Tunnel name          LSP name
  172.20.0.0      RAW  UP    Disable N/A                  N/A
  172.20.0.2      RAW  UP    Disable N/A                  N/A
 Vpls-mac-learning enable
 Discard broadcast disabled
 Discard unknown-unicast disabled
 Discard unknown-multicast disabled
```

Putting this to the test, I decide to run a loadtest saturating 12x 10G of traffic through this
spiffy 12-port virtual switch. I randomly assign ports on the loadtester to the 12 ports in the
_v-ipng_ VPLS, and then I start full line rate load with 128 byte packets. Considering I'm using
twelve TenGig ports, I would expect 12x8.43 or roughly 101Mpps flowing, and indeed, the loadtests
demonstrate this mark nicely:

{{< image src="/assets/oem-switch/vpls-trex.png" alt="VPLS T-Rex" >}}

**Important**: The screenshot above shows the first four ports on the T-Rex interface only, but there
are actually _twelve ports_ participating in this loadtest. In the top right corner, the total
throughput is correctly represented. The switches are handling 120Gbps of L1, 103.5Gbps of L2 (which
is expected at 128b frames, as there is a little bit of ethernet overhead for each frame), which is
a whopping 101Mpps, which is exactly what I would expect.

And the chassis doesn't even get warm.

### Conclusions

It's just super cool to see a switch like this work as expected. I did not manage to overload it at
all, in my [[previous article]({{< ref "2022-12-05-oem-switch-1" >}})], I showed VxLAN, GENEVE and
NvGRE overlays at line rate. Here, I can see that MPLS with all of its Martini bells and whistles,
and as well the more advanced VPLS, are keeping up like a champ. I think at least for initial
configuration and throughput on all MPLS features I tested, both the small 24x10 + 2x100G switch,
and the larger 48x10 + 2x40 + 4x100G switch, are keeping up just fine.

A duration test will have to show if the configuration and switch fabric are stable _over time_, but
I am hopeful that Centec is hitting the exact sweet spot for me on the MPLS transport front.

Yes, yes yes. I did as well promise to take a look at eVPN functionality (this is another form of
L3VPN which uses iBGP to share which MAC addresses live behind which VxLAN ports). This post has
been fun, but also quite long (4300 words!) so I'll follow up in a future article on the eVPN
capabilities of the Centec switches.