---
date: "2022-12-09T11:56:54Z"
title: 'Review: S5648X-2Q4Z Switch - Part 2: MPLS'
aliases:
- /s/articles/2022/12/09/oem-switch-2.html
---

After receiving an e-mail from a newer [[China based OEM](https://starry-networks.com)], I had a chat with their founder and learned that the combination of switch silicon and software may be a good match for IPng Networks. I got pretty enthusiastic when this new vendor claimed VxLAN, GENEVE, MPLS and GRE at 56 ports and line rate, on a really affordable budget ($4'200,- for the 56 port switch, and $1'650,- for the 26 port switch). This reseller is using a lesser known silicon vendor called [[Centec](https://www.centec.com/silicon)], who have a lineup of ethernet silicon. In this device, the CTC8096 (GoldenGate) is used for cost effective high density 10GbE/40GbE applications paired with 4x100GbE uplink capability. This is Centec's fourth generation, so the CTC8096 inherits a feature set that ranges from L2/L3 switching to advanced data center and metro Ethernet features with innovative enhancements. The switch chip provides up to 96x10GbE ports, or 24x40GbE, or 80x10GbE + 4x100GbE ports, inheriting from its predecessors a variety of features, including L2, L3, MPLS, VXLAN, MPLS SR, and OAM/APS. Highlight features include telemetry, programmability, security, traffic management, and network time synchronization.

{{< image width="450px" float="left" src="/assets/oem-switch/S5624X-front.png" alt="S5624X Front" >}}

{{< image width="450px" float="right" src="/assets/oem-switch/S5648X-front.png" alt="S5648X Front" >}}

After discussing basic L2, L3 and Overlay functionality in my [[previous post]({{< ref "2022-12-05-oem-switch-1" >}})], I left somewhat of a cliffhanger alluding to all this fancy MPLS and VPLS stuff. Honestly, I needed a bit more time to play around with the featureset and clarify a few things. I'm now ready to assert that this stuff is really possible on this switch, and if this tickles your fancy, by all means read on :)

## Detailed findings

### Hardware

{{< image width="400px" float="right" src="/assets/oem-switch/s5648x-front-opencase.png" alt="Front" >}}

The switch comes well packaged with two removable 400W Gold power supplies from _Compuware Technology_, which output 12V/33A and +5V/3A, as well as four removable PWM controlled fans from _Protechnic_. The switch chip is a Centec [[CTC8096](https://www.centec.com/silicon/Ethernet-Switch-Silicon/CTC8096)], a competent piece of silicon that offers 48x10G, 2x40G and 4x100G, while its smaller sibling carries the newer [[CTC7132](https://www.centec.com/silicon/Ethernet-Switch-Silicon/CTC7132)] from 2019, which brings 24x10G and 2x100G connectivity. The firmware is named slightly differently on each: the large one shows `NetworkOS-e580-v7.4.4.r.bin`, while the smaller one shows `uImage-v7.0.4.40.bin`; I get the impression that the latter is a compiled down version of the former, adapted to the newer chipset.

In my [[previous post]({{< ref "2022-12-05-oem-switch-1" >}})], I showed the L2, L3, VxLAN, GENEVE and NvGRE capabilities of this switch to be line rate. But the hardware also supports MPLS, so I figured I'd complete the overlay series by exploring the MPLS, EoMPLS (L2VPN, Martini style), and VPLS functionality of these units.

### Topology

![Front](/assets/oem-switch/topology.svg){: style="width:500px; float: right; margin-left: 1em; margin-bottom: 1em;"}

In the [[IPng Networks LAB]({{< ref "2022-10-14-lab-1" >}})], I build the following topology using the loadtester, packet analyzer, and switches:

* **msw-top**: S5624-2Z-EI switch
* **msw-core**: S5648X-2Q4ZA switch
* **msw-bottom**: S5624-2Z-EI switch
* All switches connect to:
  * each other with 100G DACs (right, black)
  * the T-Rex machine with 4x10G (left, rainbow)
* Each switch gets a mgmt IPv4 and IPv6

This is the same topology as in the previous post, and it gives me lots of wiggle room to patch anything to anything as I build point to point MPLS tunnels, VPLS clouds and eVPN overlays. Although I will also load/stress test these configurations, this post is more about the higher level configuration work that goes into building such an MPLS enabled telco network.

### MPLS

Why even bother, if we have these fancy new IP based transports that I [[wrote about]({{< ref "2022-12-05-oem-switch-1" >}})] last week? I mentioned that the industry is _moving on_ from MPLS to a set of more flexible IP based solutions like VxLAN and GENEVE, as they certainly offer lots of benefits in deployment (notably as overlays on top of existing IP networks). Here's one plausible answer: you may have come across an architectural network design concept known as [[BGP Free Core](https://bgphelp.com/2017/02/12/bgp-free-core/)], and operating this way gives very little room for outages to occur in the L2 (Ethernet and MPLS) transport network, because it's relatively simple in design and implementation.
Some advantages worth mentioning:

* Transport devices do not need to be capable of supporting a large number of IPv4/IPv6 routes, either in the RIB or FIB, allowing them to be much cheaper.
* As there is no eBGP, transport devices will not be impacted by BGP-related issues, such as high CPU utilization during massive BGP re-convergence.
* Also, without eBGP, some of the attack vectors in ISPs (loopback DDoS or ARP storms on public internet exchanges, to take two common examples) can be eliminated. If a new BGP security vulnerability were to be discovered, transport devices aren't impacted.
* Operator errors (the #1 reason for outages in our industry) associated with BGP configuration and the use of large RIBs (eg. leaking into IGP, flapping transit sessions, etc) can be eradicated.
* New transport services such as MPLS point to point virtual leased lines, SR-MPLS, VPLS clouds, and eVPN can all be introduced without modifying the routing core.

If deployed correctly, this type of transport-only network can be kept entirely isolated from the Internet, making DDoS and hacking attacks against transport elements impossible, and it also opens up possibilities for relatively safe sharing of infrastructure resources between ISPs (think of things like dark fibers between locations, rackspace, power, and cross connects). For smaller clubs (like IPng Networks), being able to share a 100G wave with others significantly reduces the price per megabit! So if you're in Zurich, Switzerland, or Europe and find this an interesting avenue to expand your reach in a co-op style environment, [[reach out](/s/contact)] to us, any time!

#### MPLS + LDP Configuration

OK, let's talk bits and bytes. Table stakes functionality is of course MPLS switching and label distribution, which is performed with LDP, described in [[RFC3036](https://www.rfc-editor.org/rfc/rfc3036.html)]. Enabling these features is relatively straightforward:

```
msw-top# show run int loop0
interface loopback0
 ip address 172.20.0.2/32
 ipv6 address 2001:678:d78:400::2/128
 ipv6 router ospf 8298 area 0

msw-top# show run int eth-0-25
interface eth-0-25
 description Core: msw-bottom eth-0-25
 speed 100G
 no switchport
 mtu 9216
 label-switching
 ip address 172.20.0.12/31
 ipv6 address 2001:678:d78:400::3:1/112
 ip ospf network point-to-point
 ip ospf cost 104
 ipv6 ospf network point-to-point
 ipv6 ospf cost 106
 ipv6 router ospf 8298 area 0
 enable-ldp

msw-top# show run router ospf
router ospf 8298
 network 172.20.0.0/24 area 0

msw-top# show run router ipv6 ospf
router ipv6 ospf 8298
 router-id 172.20.0.2

msw-top# show run router ldp
router ldp
 router-id 172.20.0.2
 transport-address 172.20.0.2
```

This seems like a mouthful, but it's really not too complicated. From the top, I create a loopback interface with an IPv4 (/32) and IPv6 (/128) address. Then, on the 100G transport interfaces, I specify an IPv4 (/31, let's not be wasteful, take a look at [[RFC 3021](https://www.rfc-editor.org/rfc/rfc3021.html)]) and IPv6 (/112) transit network, after which I add the interface to OSPF and OSPFv3. The two main things to note in the interface definition are the use of `label-switching`, which enables MPLS on the interface, and `enable-ldp`, which makes it periodically multicast LDP discovery packets. If another device is also doing that, an LDP _adjacency_ is formed using a TCP session. The two devices then exchange MPLS label tables, so that they learn from each other how to switch MPLS packets across the network.
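Before looking at the switch's view of things, it's worth knowing that this whole exchange is easy to observe from the outside. Here's a minimal sketch, assuming a capture point that can see the transit network (I'm using `eno2`, the monitor port that also shows up later in this article): LDP discovery uses UDP to 224.0.0.2 and the session uses TCP, both on well known port 646, so a single port filter catches everything:

```
# Watch LDP discovery (UDP Hellos to 224.0.0.2) and session setup
# (TCP three-way handshake and label exchange) in one go; both use
# the well known LDP port 646.
tcpdump -evni eno2 'port 646'
```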
LDP _signalling_ kind of looks like this on the wire:

```
14:21:43.741089 IP 172.20.0.12.646 > 224.0.0.2.646: LDP, Label-Space-ID: 172.20.0.2:0, pdu-length: 30
14:21:44.331613 IP 172.20.0.13.646 > 224.0.0.2.646: LDP, Label-Space-ID: 172.20.0.1:0, pdu-length: 30
14:21:44.332773 IP 172.20.0.2.36475 > 172.20.0.1.646: Flags [S], seq 195175, win 27528, options [mss 9176,sackOK,TS val 104349486 ecr 0,nop,wscale 7], length 0
14:21:44.333700 IP 172.20.0.1.646 > 172.20.0.2.36475: Flags [S.], seq 466968, ack 195176, win 18328, options [mss 9176,sackOK,TS val 104335979 ecr 104349486,nop,wscale 7], length 0
14:21:44.334313 IP 172.20.0.2.36475 > 172.20.0.1.646: Flags [.], ack 1, win 216, options [nop,nop,TS val 104349486 ecr 104335979], length 0
```

The first two packets here are the routers announcing themselves to the [[well known multicast](https://en.wikipedia.org/wiki/Multicast_address)] address for _all-routers_ (224.0.0.2), on well known port 646 (for LDP), in a packet called a _Hello Message_. The router with address 172.20.0.12 is the one we just configured (`msw-top`), and the one with address 172.20.0.13 is the other side (`msw-bottom`). In these _Hello Messages_, the router informs multicast listeners where they should connect (called the _IPv4 transport address_); in the case of `msw-top`, it's 172.20.0.2.

Now that they've noticed one another's willingness to form an adjacency, a TCP connection is initiated from our router's loopback address (specified by `transport-address` in the LDP configuration), towards the loopback that was learned from the _Hello Message_ in the multicast packet earlier. A TCP three way handshake follows, in which the routers also tell each other their MTU (by means of the MSS field set to 9176, which is 9216 minus 20 bytes of [[IPv4 header](https://en.wikipedia.org/wiki/Internet_Protocol_version_4)] and 20 bytes of [[TCP header](https://en.wikipedia.org/wiki/Transmission_Control_Protocol)]). The adjacency forms and both routers exchange label information (in things called a _Label Mapping Message_). Once done exchanging this info, `msw-top` can now switch MPLS packets across its two 100G interfaces.

Zooming back out from what happened on the wire with the LDP _signalling_, I can take a look at the `msw-top` switch: besides the adjacency that I described in detail above, another one has formed over the IPv4 transit network between `msw-top` and `msw-core` (refer to the topology diagram to see what connects where). As this is a layer3 network, icky things like spanning tree and forwarding loops are no longer an issue. Any switch can forward MPLS packets to any neighbor in this topology; preference on the used path is informed by OSPF costs on the IPv4 interfaces (because LDP is using IPv4 here).

```
msw-top# show ldp adjacency
IP Address       Intf Name    Holdtime    LDP-Identifier
172.20.0.10      eth-0-26     15          172.20.0.1:0
172.20.0.13      eth-0-25     15          172.20.0.0:0

msw-top# show ldp session
Peer IP Address    IF Name     My Role    State          KeepAlive
172.20.0.0         eth-0-25    Active     OPERATIONAL    30
172.20.0.1         eth-0-26    Active     OPERATIONAL    30
```

#### MPLS pseudowire

The easiest form (and possibly the most widely used one) is to create a point to point ethernet link between an interface on one switch, through the MPLS network, and into another switch's interface on the other side. Think of this as a really long network cable. Ethernet frames are encapsulated into an MPLS frame, and passed through the network through some sort of tunnel, called a _pseudowire_. There are many names for this tunneling technique.
Folks refer to them as PWs (PseudoWires), VLLs (Virtual Leased Lines), Carrier Ethernet, or Metro Ethernet. Luckily, these are almost always interoperable, because under the covers, the vendors are implementing these MPLS cross connect circuits using [[Martini Tunnels](https://datatracker.ietf.org/doc/html/draft-martini-l2circuit-trans-mpls-00)], which were formalized in [[RFC 4447](https://datatracker.ietf.org/doc/html/rfc4447)]. The way Martini tunnels work is by creating an extension in LDP signalling. An MPLS label-switched-path is annotated as being of a certain type, carrying a 32 bit _pseudowire ID_, which is ignored by all intermediate routers (they will just switch the MPLS packet onto the next hop), but the last router will inspect the MPLS packet, find which _pseudowire ID_ it belongs to, and look up in its local table what to do with it (mostly just unwrap the MPLS packet, and marshall the resulting ethernet frame into an interface or tagged sub-interface).

Configuring the _pseudowire_ is really simple:

```
msw-top# configure terminal
interface eth-0-1
 mpls-l2-circuit pw-vll1 ethernet
!
mpls l2-circuit pw-vll1 829800 172.20.0.0 raw mtu 9000

msw-top# show ldp mpls-l2-circuit 829800
Transport   Client     VC      Trans      Local      Remote     Destination
VC ID       Binding    State   Type       VC Label   VC Label   Address
829800      eth-0-1    UP      Ethernet   32774      32773      172.20.0.0
```

After I've configured this on both `msw-top` and `msw-bottom`, using LDP signalling, a new LSP will be set up which carries ethernet packets of up to 9000 bytes, encapsulated in MPLS, over the network. To show this in more detail, I'll take the two ethernet interfaces that are connected to `msw-top:eth-0-1` and `msw-bottom:eth-0-1`, and move them into their own network namespaces on the lab machine:

```
root@dut-lab:~# ip netns add top
root@dut-lab:~# ip netns add bottom
root@dut-lab:~# ip link set netns top enp66s0f0
root@dut-lab:~# ip link set netns bottom enp66s0f1
```

I can now enter the _top_ and _bottom_ namespaces, and play around with those interfaces. For example, I'll give them an IPv4 address and a sub-interface with dot1q tag 1234 carrying an IPv6 address:

```
root@dut-lab:~# nsenter --net=/var/run/netns/bottom
root@dut-lab:~# ip addr add 192.0.2.1/31 dev enp66s0f1
root@dut-lab:~# ip link add link enp66s0f1 name v1234 type vlan id 1234
root@dut-lab:~# ip addr add 2001:db8::2/64 dev v1234
root@dut-lab:~# ip link set v1234 up

root@dut-lab:~# nsenter --net=/var/run/netns/top
root@dut-lab:~# ip addr add 192.0.2.0/31 dev enp66s0f0
root@dut-lab:~# ip link add link enp66s0f0 name v1234 type vlan id 1234
root@dut-lab:~# ip addr add 2001:db8::1/64 dev v1234
root@dut-lab:~# ip link set v1234 up

root@dut-lab:~# ping -c 5 2001:db8::2
PING 2001:db8::2(2001:db8::2) 56 data bytes
64 bytes from 2001:db8::2: icmp_seq=1 ttl=64 time=0.158 ms
64 bytes from 2001:db8::2: icmp_seq=2 ttl=64 time=0.155 ms
64 bytes from 2001:db8::2: icmp_seq=3 ttl=64 time=0.162 ms
```

The `mpls-l2-circuit` that I created will transport the received ethernet frames between `enp66s0f0` (in the _top_ namespace) and `enp66s0f1` (in the _bottom_ namespace), using MPLS encapsulation, and giving the packets a stack of _two_ labels. The outermost label helps the switches determine where to switch the MPLS packet (in other words, route it from `msw-top` to `msw-bottom`). Once the destination is reached, the outer label is popped off the stack, to reveal the second label, the purpose of which is to tell the `msw-bottom` switch what, precisely, to do with this payload.
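As an aside, the MTU of 9000 that was signalled for the pseudowire can be checked end to end from the namespaces. This is a sketch under the assumption that the NICs still carry their default 1500 byte MTU and need raising first, and that the circuit's 9000 refers to the frame payload; the arithmetic is 9000 minus 20 bytes of IPv4 header minus 8 bytes of ICMP header, leaving 8972 bytes of ping payload:

```
# Raise the host MTU to match the pseudowire's signalled MTU of 9000
# (assumption: the NICs default to 1500).
ip -n top link set enp66s0f0 mtu 9000
ip -n bottom link set enp66s0f1 mtu 9000

# Do-not-fragment ping at exactly the limit:
# 8972 payload + 8 ICMP + 20 IPv4 = 9000 bytes.
ip netns exec top ping -M do -s 8972 -c 3 192.0.2.1
# One byte more (-s 8973) should fail locally with 'Message too long'.
```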
Back to the label stack: the switch will find that the second label instructs it to transmit the MPLS payload as an ethernet frame out on port `eth-0-1`. If I want to look at what happens on the wire with tcpdump(8), I can use the monitor port on `msw-core`, which mirrors all packets transiting through it. But, I don't get very far:

```
root@dut-lab:~# tcpdump -evni eno2 mpls
19:57:37.055854 00:1e:08:0d:6e:88 > 00:1e:08:26:ec:f3, ethertype MPLS unicast (0x8847), length 144:
    MPLS (label 32768, exp 0, ttl 255) (label 32773, exp 0, [S], ttl 255)
    0x0000:  9c69 b461 7679 9c69 b461 7678 8100 04d2  .i.avy.i.avx....
    0x0010:  86dd 6003 4a42 0040 3a40 2001 0db8 0000  ..`.JB.@:@......
    0x0020:  0000 0000 0000 0000 0001 2001 0db8 0000  ................
    0x0030:  0000 0000 0000 0000 0002 8000 3553 9326  ............5S.&
    0x0040:  0001 2185 9363 0000 0000 e7d9 0000 0000  ..!..c..........
    0x0050:  0000 1011 1213 1415 1617 1819 1a1b 1c1d  ................
    0x0060:  1e1f 2021 2223 2425 2627 2829 2a2b 2c2d  ...!"#$%&'()*+,-
    0x0070:  2e2f 3031 3233 3435 3637                 ./01234567
19:57:37.055890 00:1e:08:26:ec:f3 > 00:1e:08:0d:6e:88, ethertype MPLS unicast (0x8847), length 140:
    MPLS (label 32774, exp 0, [S], ttl 254)
    0x0000:  9c69 b461 7678 9c69 b461 7679 8100 04d2  .i.avx.i.avy....
    0x0010:  86dd 6009 4122 0040 3a40 2001 0db8 0000  ..`.A".@:@......
    0x0020:  0000 0000 0000 0000 0002 2001 0db8 0000  ................
    0x0030:  0000 0000 0000 0000 0001 8100 3453 9326  ............4S.&
    0x0040:  0001 2185 9363 0000 0000 e7d9 0000 0000  ..!..c..........
    0x0050:  0000 1011 1213 1415 1617 1819 1a1b 1c1d  ................
    0x0060:  1e1f 2021 2223 2425 2627 2829 2a2b 2c2d  ...!"#$%&'()*+,-
    0x0070:  2e2f 3031 3233 3435 3637                 ./01234567
```

For a brief moment, I stare closely at the first part of the hex dump, and I recognize two MAC addresses `9c69.b461.7678` and `9c69.b461.7679`, followed by what appears to be `0x8100` (the ethertype for [[Dot1Q](https://en.wikipedia.org/wiki/IEEE_802.1Q)]) and then `0x04d2` (which is 1234 in decimal, the VLAN tag I chose). Clearly, the hexdump here is "just" an ethernet frame. So why doesn't tcpdump decode it? The answer is simple: nothing in the MPLS packet tells me that the payload is actually ethernet. It could be anything, and it's really up to the recipient of the packet with the label 32773 to determine what its payload means. Luckily, Wireshark can be prompted to decode further based on which MPLS label is present. Using the _Decode As..._ option, I can specify that data following label 32773 is _Ethernet PW (no CW)_, where PW here means _pseudowire_ and CW means _control word_. Et voilà, the first packet reveals itself:

{{< image src="/assets/oem-switch/mpls-wireshark-1.png" alt="MPLS Frame #1 dissected" >}}

#### Pseudowires on Sub Interfaces

One very common use case for me at IPng Networks is to work with excellent partners like [[IP-Max](https://www.ip-max.net/)] who provide Internet Exchange transport, for example from DE-CIX or SwissIX, to the customer premises. IP-Max uses Cisco's ASR9k routers, an absolutely beautiful piece of technology [[ref]({{< ref "2022-02-21-asr9006" >}})], and with those you can terminate a _L2VPN_ in any sub-interface. Let's configure something similar. I take one port on `msw-top`, and branch that out into three remote locations, in this case `msw-bottom` ports 1, 2 and 3. I will be terminating all three _pseudowires_ on the same endpoint, but obviously this could also be one port that goes to three internet exchanges, say SwissIX, DE-CIX and FranceIX, on three different endpoints.
The configuration for both switches will look like this:

```
msw-top# configure terminal
interface eth-0-1
 switchport mode trunk
 switchport trunk native vlan 5
 switchport trunk allowed vlan add 6-8
 mpls-l2-circuit pw-vlan10 vlan 10
 mpls-l2-circuit pw-vlan20 vlan 20
 mpls-l2-circuit pw-vlan30 vlan 30

mpls l2-circuit pw-vlan10 829810 172.20.0.0 raw mtu 9000
mpls l2-circuit pw-vlan20 829820 172.20.0.0 raw mtu 9000
mpls l2-circuit pw-vlan30 829830 172.20.0.0 raw mtu 9000

msw-bottom# configure terminal
interface eth-0-1
 mpls-l2-circuit pw-vlan10 ethernet
interface eth-0-2
 mpls-l2-circuit pw-vlan20 ethernet
interface eth-0-3
 mpls-l2-circuit pw-vlan30 ethernet

mpls l2-circuit pw-vlan10 829810 172.20.0.2 raw mtu 9000
mpls l2-circuit pw-vlan20 829820 172.20.0.2 raw mtu 9000
mpls l2-circuit pw-vlan30 829830 172.20.0.2 raw mtu 9000
```

Previously, I configured the port in _ethernet_ mode, which takes all frames and forwards them into the MPLS tunnel. In this case, I'm using _vlan_ mode, specifying a VLAN tag; frames arriving on the port matching that tag will selectively be put into a pseudowire. As an added benefit, this allows me to still use the port as a regular switchport: in the snippet above it will take untagged frames and assign them to VLAN 5, allow tagged frames with dot1q VLAN tag 6, 7 or 8, and handle them as any normal switch would. VLAN tag 10, however, is directed into the pseudowire called _pw-vlan10_, and the other two tags similarly get put into their own `l2-circuit`. Using LDP signalling, the _pw-id_ (829810, 829820, and 829830) determines which label is assigned. On the way back, that label allows the switch to correlate the ethernet frame with the correct port, and transmit it with the configured VLAN tag.

To show this from an end-user point of view, let's take a look at the Linux server connected to these switches. I'll put one port in a namespace called _top_, and three other ports in a network namespace called _bottom_, and then proceed to give them a little bit of config:

```
root@dut-lab:~# ip link set netns top dev enp66s0f0
root@dut-lab:~# ip link set netns bottom dev enp66s0f1
root@dut-lab:~# ip link set netns bottom dev enp66s0f2
root@dut-lab:~# ip link set netns bottom dev enp4s0f1

root@dut-lab:~# nsenter --net=/var/run/netns/top
root@dut-lab:~# ip link add link enp66s0f0 name v10 type vlan id 10
root@dut-lab:~# ip link add link enp66s0f0 name v20 type vlan id 20
root@dut-lab:~# ip link add link enp66s0f0 name v30 type vlan id 30
root@dut-lab:~# ip addr add 192.0.2.0/31 dev v10
root@dut-lab:~# ip addr add 192.0.2.2/31 dev v20
root@dut-lab:~# ip addr add 192.0.2.4/31 dev v30

root@dut-lab:~# nsenter --net=/var/run/netns/bottom
root@dut-lab:~# ip addr add 192.0.2.1/31 dev enp66s0f1
root@dut-lab:~# ip addr add 192.0.2.3/31 dev enp66s0f2
root@dut-lab:~# ip addr add 192.0.2.5/31 dev enp4s0f1

root@dut-lab:~# ping 192.0.2.4
PING 192.0.2.4 (192.0.2.4) 56(84) bytes of data.
64 bytes from 192.0.2.4: icmp_seq=1 ttl=64 time=0.153 ms
64 bytes from 192.0.2.4: icmp_seq=2 ttl=64 time=0.209 ms
```

To unpack this a little bit: in the first block I assign the interfaces to their respective namespaces. Then, for the interface connected to the `msw-top` switch, I create three dot1q sub-interfaces, corresponding to the pseudowires I created. Note: untagged traffic out of `enp66s0f0` will simply be picked up by the switch and assigned VLAN 5 (and I'm also allowed to send VLAN tags 6, 7 and 8, which will all be handled locally).
But VLAN 10, 20 and 30 will be moved through the MPLS network and pop out on the `msw-bottom` switch, where they are each assigned a unique port, represented by `enp66s0f1`, `enp66s0f2` and `enp4s0f1`, connected to the bottom switch. When I finally ping 192.0.2.4, that ICMP packet goes out on `enp4s0f1`, which enters `msw-bottom:eth-0-3`, where it gets assigned the pseudowire name _pw-vlan30_, which corresponds to the _pw-id_ 829830. It then travels over the MPLS network, arriving at `msw-top` carrying a label that tells that switch that it belongs to its local _pw-id_ 829830, which corresponds to the name _pw-vlan30_, and is assigned VLAN tag 30 on port `eth-0-1`. Phew, I made it. It actually makes sense when you think about it!

#### VPLS

The _pseudowires_ that I described in the previous section are simply ethernet cross connects spanning over an MPLS network. They are inherently point-to-point, much like a physical Ethernet cable is. Sometimes, it makes more sense to take a local port and create what is called a _Virtual Private LAN Service_ (VPLS), described in [[RFC4762](https://www.rfc-editor.org/rfc/rfc4762.html)], where packets into this port are capable of being sent to any number of other ports on any number of other switches, while using MPLS as transport.

By means of example, let's say a telco offers me one port in Amsterdam, one in Zurich and one in Frankfurt. A VPLS instance would create an emulated LAN segment between these locations, in other words a Layer 2 broadcast domain that is fully capable of learning and forwarding on Ethernet MAC addresses, but the ports are dedicated to me, and they are isolated from other customers. The telco has essentially created a three-port switch for me, but at the same time, that telco can create any number of VPLS services, each one unique to their individual customers. It's a pretty powerful concept.

In principle, a VPLS consists of two parts:

1. A full mesh of simple MPLS point-to-point tunnels from each participating switch to each other one. These are just _pseudowires_ with a given _pw-id_, just like I showed before.
1. The _pseudowires_ are then tied together in a form of bridge domain, and learning is applied to MAC addresses that appear behind each port, signalling that these are available behind the port.

Configuration on the switch looks like this:

```
msw-top# configure terminal
interface eth-0-1
 mpls-vpls v-ipng ethernet
interface eth-0-2
 mpls-vpls v-ipng ethernet
interface eth-0-3
 mpls-vpls v-ipng ethernet
interface eth-0-4
 mpls-vpls v-ipng ethernet
!
mpls vpls v-ipng 829801
 vpls-peer 172.20.0.0 raw
 vpls-peer 172.20.0.1 raw
```

The first set of commands adds each individual interface into the VPLS instance by binding it to a name, in this case _v-ipng_. Then, the VPLS neighbors are specified, by offering a _pw-id_ (829801) which is used to construct a _pseudowire_ to the two peers. The first, 172.20.0.0, is `msw-bottom`, and the other, 172.20.0.1, is `msw-core`. Each switch that participates in the VPLS for _v-ipng_ will signal LSPs to each of its peers, and MAC learning will be enabled just as if each of these _pseudowires_ were a regular switchport. Once I configure this pattern on all three switches, effectively interfaces `eth-0-1 - 4` are now bound together as a virtual switch with a unique broadcast domain dedicated to instance _v-ipng_. I've created a fully transparent 12-port switch, which means that whatever traffic I generate will be encapsulated in MPLS and sent through the MPLS network towards its destination port.
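Seen from the servers, this virtual switch should behave exactly like one LAN segment. As a quick sanity check before loadtesting, here's a sketch in the same namespace style as before; the port names `pX` and `pY` are hypothetical stand-ins for two loadtester ports patched into, say, `msw-top:eth-0-1` and `msw-bottom:eth-0-2`, both members of _v-ipng_:

```
# Two hypothetical ports, each patched into a different switch
# participating in the v-ipng VPLS.
ip netns add vpls-a; ip link set netns vpls-a dev pX
ip netns add vpls-b; ip link set netns vpls-b dev pY

# Same subnet on both sides: if the VPLS really is one broadcast
# domain, ARP resolution and ICMP flow as if they shared one switch.
ip -n vpls-a addr add 198.51.100.1/24 dev pX
ip -n vpls-b addr add 198.51.100.2/24 dev pY
ip -n vpls-a link set pX up
ip -n vpls-b link set pY up
ip netns exec vpls-a ping -c 3 198.51.100.2
```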
Let's take a look at the `msw-core` switch to see what this looks like:

```
msw-core# show ldp vpls
VPLS-ID    Peer Address    State    Type        Label-Sent    Label-Rcvd    Cw
829801     172.20.0.0      Up       ethernet    32774         32773         0
829801     172.20.0.2      Up       ethernet    32776         32774         0

msw-core# show mpls vpls mesh
VPLS-ID   Peer Addr/name   In-Label   Out-Intf   Out-Label   Type   St   Evpn   Type2   Sr-tunid
829801    172.20.0.0/-     32777      eth-0-50   32775       RAW    Up   N      N       -
829801    172.20.0.2/-     32778      eth-0-49   32776       RAW    Up   N      N       -

msw-core# show mpls vpls detail
Virtual Private LAN Service Instance: v-ipng, ID: 829801
 Group ID: 0, Configured MTU: NULL
 Description: none
 AC interface :
   Name         TYPE        Vlan
   eth-0-1      Ethernet    ALL
   eth-0-2      Ethernet    ALL
   eth-0-3      Ethernet    ALL
   eth-0-4      Ethernet    ALL
 Mesh Peers :
   Peer            TYPE    State    C-Word     Tunnel name    LSP name
   172.20.0.0      RAW     UP       Disable    N/A            N/A
   172.20.0.2      RAW     UP       Disable    N/A            N/A
 Vpls-mac-learning enable
 Discard broadcast disabled
 Discard unknown-unicast disabled
 Discard unknown-multicast disabled
```

Putting this to the test, I decide to run a loadtest saturating 12x 10G of traffic through this spiffy 12-port virtual switch. I randomly assign ports on the loadtester to the 12 ports in the _v-ipng_ VPLS, and then I start a full line rate load with 128 byte packets. Considering I'm using twelve TenGig ports, I would expect 12x8.43Mpps or roughly 101Mpps flowing, and indeed, the loadtests demonstrate this mark nicely:

{{< image src="/assets/oem-switch/vpls-trex.png" alt="VPLS T-Rex" >}}

**Important**: The screenshot above shows only the first four ports on the T-Rex interface, but there are actually _twelve ports_ participating in this loadtest. In the top right corner, the total throughput is correctly represented. The switches are handling 120Gbps of L1 and 103.5Gbps of L2 (which is expected at 128b frames, as there is a little bit of ethernet overhead for each frame), which is a whopping 101Mpps: exactly what I would expect. And the chassis doesn't even get warm.

### Conclusions

It's just super cool to see a switch like this work as expected. I did not manage to overload it at all. In my [[previous article]({{< ref "2022-12-05-oem-switch-1" >}})], I showed VxLAN, GENEVE and NvGRE overlays at line rate; here, I can see that MPLS with all of its Martini bells and whistles, as well as the more advanced VPLS, are keeping up like champs. I think at least for initial configuration and throughput on all MPLS features I tested, both the small 24x10G + 2x100G switch and the larger 48x10G + 2x40G + 4x100G switch are keeping up just fine. A duration test will have to show if the configuration and switch fabric are stable _over time_, but I am hopeful that Centec is hitting the exact sweet spot for me on the MPLS transport front.

Yes, yes, yes. I did also promise to take a look at eVPN functionality (this is another form of L3VPN, which uses iBGP to share which MAC addresses live behind which VxLAN ports). This post has been fun, but also quite long (4300 words!), so I'll follow up in a future article on the eVPN capabilities of the Centec switches.