---
date: "2025-04-09T07:51:23Z"
title: 'FrysIX eVPN: think different'
---

{{< image float="right" src="/assets/frys-ix/frysix-logo-small.png" alt="FrysIX Logo" width="12em" >}}

# Introduction

Somewhere in the far north of the Netherlands, the country where I was born, a town called Jubbega
is the home of the Frysian Internet Exchange called [[Frys-IX](https://frys-ix.net/)]. Back in 2021,
a buddy of mine, Arend, said that he was planning on renting a rack at the NIKHEF facility, one of
the most densely populated facilities in western Europe. He was looking for a few launching
customers and I was definitely in the market for a presence in Amsterdam. I even wrote about it on
my [[bucketlist]({{< ref 2021-07-26-bucketlist.md >}})]. Arend and his IT company
[[ERITAP](https://www.eritap.com/)] took delivery of that rack in May of 2021, and this is when the
internet exchange with _Frysian roots_ was born.

In the years from 2021 until now, Arend and I have been operating the exchange with reasonable
success. It grew from a handful of folks in that first rack to some 250 participating ISPs today,
with about ten switches in six datacenters across the Amsterdam metro area. It's shifting a cool
800Gbit of traffic or so. It's dope, and very rewarding to be able to contribute to this community!

## Frys-IX is growing

We have several members with a 2x100G LAG and even though all inter-datacenter links are either dark
fiber or WDM, we're starting to feel the growing pains as we set our sights on the next step of growth.
You see, when FrysIX did 13.37Gbit of traffic, Arend organized a barbecue. When it did 133.7Gbit of
traffic, Arend organized an even bigger barbecue. Obviously, the next step is 1337Gbit and joining
the infamous [[One TeraBit Club](https://github.com/tking/OneTeraBitClub)]. Thomas: we're on our
way!

It became clear that we will not be able to keep a dependable peering platform if FrysIX remains a
single L2 broadcast domain, and it also became clear that concatenating multiple 100G ports would be
operationally expensive (think of all the dark fiber or WDM waves!) and brittle (think of LACP and
balancing traffic over those ports). We need to modernize in order to stay ahead of the growth
curve.

## Hello Nokia

{{< image float="right" src="/assets/frys-ix/nokia-7220-d4.png" alt="Nokia 7220-D4" width="20em" >}}

The Nokia 7220 Interconnect Router (7220 IXR) for data center fabric provides fixed-configuration,
high-capacity platforms that let you bring unmatched scale, flexibility and operational simplicity
to your data center networks and peering network environments. These devices are built around the
Broadcom _Trident_ chipset; in the case of the "D4" platform, this is a Trident4 with 28x100G and
8x400G ports. Whoot!

{{< image float="right" src="/assets/frys-ix/IXR-7220-D3.jpg" alt="Nokia 7220-D3" width="20em" >}}

What I find particularly awesome about the Trident series is their speed (total bandwidth of
12.8Tbps _per router_), low power use (without optics, the IXR-7220-D4 consumes about 150W) and
a plethora of advanced capabilities like L2/L3 filtering, IPv4, IPv6 and MPLS routing, and modern
approaches to scale-out networking such as VXLAN-based EVPN. At the FrysIX barbecue in September of
2024, FrysIX was gifted a rather powerful IXR-7220-D3 router, shown in the picture to the right.
That's a 32x100G router.

ERITAP has bought two (new in box) IXR-7220-D4 (8x400G, 28x100G) routers, and has also acquired two
IXR-7220-D2 (48x25G, 8x100G) routers. So in total, FrysIX is now the proud owner of five of these
beautiful Nokia devices. If you haven't yet, you should definitely read about these versatile
routers on the [[Nokia](https://onestore.nokia.com/asset/207599)] website, and about the
_merchant silicon_ switch chips in use on the
[[Broadcom](https://www.broadcom.com/products/ethernet-connectivity/switching/strataxgs/bcm56880-series)]
website.

### eVPN: A small rant

{{< image float="right" src="/assets/frys-ix/FrysIX_ Topology (concept).svg" alt="Topology Concept" width="50%" >}}

First, I need to get something off my chest. Consider a topology for an internet exchange platform,
taking into account the available equipment, rackspace, power, and cross connects. Somehow, almost
every design or reference architecture I can find on the Internet assumes folks want to build a
[[Clos network](https://en.wikipedia.org/wiki/Clos_network)], which has a topology consisting of leaf
and spine switches. The _spine_ switches have a different set of features than the _leaf_ ones;
notably, they don't have to do provider edge functionality like VXLAN encapsulation and decapsulation.
Almost all of these designs show how one might build a leaf-spine network for hyperscale.

**Critique 1**: my 'spine' (IXR-7220-D4 routers) must also act as provider edge. Practically speaking,
in the picture above I have these beautiful Nokia IXR-7220-D4 switches, using two 400G ports to
connect between the facilities, and six 100G ports to connect the smaller breakout switches. Using
them only as spines would leave a _massive_ amount of capacity unused: 22x 100G and 6x400G ports, to
be exact.

**Critique 2**: the 'leaf' devices (either IXR-7220-D2 routers or Arista switches) can't realistically
connect to both 'spines'. Our devices are spread out over two (and in practice, more like six)
datacenters, and it's prohibitively expensive to get 100G waves or dark fiber to create a full mesh.
It's much more economical to create a star topology that minimizes cross-datacenter fiber spans.

**Critique 3**: Most of these 'spine-leaf' reference architectures assume that the interior gateway
protocol is eBGP in what they call the _underlay_, and on top of that, some secondary eBGP that's
called the _overlay_. Frankly, such a design makes my head spin a little bit. These designs assume
hundreds of switches, in which case making use of one AS number per switch could make sense (as iBGP
needs either a 'full mesh', or external route reflectors).

**Critique 4**: These reference designs also assume that all fiber is local and that, while
links can fail, it will be relatively rare to _drain_ a link. However, in cross-datacenter networks,
draining links for maintenance is very common, for example if the dark fiber provider needs to
perform maintenance. With these eBGP-over-eBGP connections, traffic engineering is more difficult
than simply raising the OSPF (or IS-IS) cost of a link to reroute traffic.

Setting aside eVPN for a second, if I were to build an IP transport network, like I did when I built
[[IPng Site Local]({{< ref 2023-03-11-mpls-core.md >}})], I would use a much more intuitive
and simple (I would even dare say elegant) design:

1. Take a classic IGP like [[OSPF](https://en.wikipedia.org/wiki/Open_Shortest_Path_First)], or
perhaps [[IS-IS](https://en.wikipedia.org/wiki/IS-IS)]. There is no benefit, to me at least, in using
BGP as an IGP.
1. I would give each of the links between the switches an IPv4 /31 and enable link-local, and give
each switch a loopback address with a /32 IPv4 and a /128 IPv6.
1. If I had multiple links between two given switches, I would probably just use ECMP if my devices
supported it, and fall back to an LACP-signaled bundle-ethernet otherwise.
1. If I needed BGP (and for eVPN, this need exists), taking the ISP mindset (as opposed
to the datacenter fabric mindset), I would simply install iBGP against two or three route
reflectors, and exchange routing information within the same single AS number.

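To make the addressing part of that recipe concrete, here's a minimal Python sketch using the standard library `ipaddress` module. It hands out a /32 loopback per router and a /31 per point-to-point link (IPv4 only, for brevity). The pools 198.19.16.0/24 and 198.19.17.0/24 match what I use in the lab below, but the `plan()` helper and the router and link lists are hypothetical, just for illustration.

```python
import ipaddress

# Assumed pools, matching the addresses used in the lab below.
LOOPBACKS = ipaddress.ip_network("198.19.16.0/24")
TRANSIT = ipaddress.ip_network("198.19.17.0/24")

def plan(routers, links):
    """Hypothetical helper: a /32 loopback per router, a /31 per link.

    routers is a list of names; links is a list of (a, b) name pairs.
    Returns (loopbacks, link_addresses) as dicts of ip_interface objects.
    """
    loopbacks = {
        name: ipaddress.ip_interface(f"{LOOPBACKS[i]}/32")
        for i, name in enumerate(routers)
    }
    link_addresses = {}
    # A /31 has exactly two addresses, one for each end (RFC 3021).
    for (a, b), net in zip(links, TRANSIT.subnets(new_prefix=31)):
        link_addresses[(a, b)] = (
            ipaddress.ip_interface(f"{net[0]}/31"),
            ipaddress.ip_interface(f"{net[1]}/31"),
        )
    return loopbacks, link_addresses

lo, ptp = plan(["equinix", "nikhef"], [("equinix", "nikhef")])
for name, addr in lo.items():
    print(f"{name:8} lo0 {addr}")
for (a, b), (x, y) in ptp.items():
    print(f"{a} {x} <-> {b} {y}")
```
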
### eVPN: A demo topology

{{< image float="right" src="/assets/frys-ix/Nokia Arista VXLAN.svg" alt="Demo topology" width="50%" >}}

So, that's exactly how I'm going to approach the FrysIX eVPN design: OSPF for the underlay and iBGP
for the overlay! I have a feeling that some folks will despise me for being contrarian, but you can
leave your comments below, and don't forget to like-and-subscribe :-)

Arend builds this topology for me in Jubbega - also known as FrysIX HQ. He takes the two
400G-capable switches and connects them. Then he takes an Arista DCS-7060CX switch (which is eVPN
capable, with 32x100G ports, based on the Broadcom Tomahawk3 chipset), and a smaller Nokia
IXR-7220-D2 (with 48x25G and 8x100G ports, based on the Trident3 chipset). He wires all of this up
to look like the picture on the right.

#### Underlay: Nokia's SR Linux

We boot up the lab, verify that all the optics and links are up, and connect the management ports to
an OOB network that I can remotely log in to. This is the first time that either of us has worked on
Nokia, but I find it reasonably intuitive once I get a few tips and tricks from Niek.

```
[pim@nikhef ~]$ sr_cli
--{ running }--[ ]--
A:pim@nikhef# enter candidate
--{ candidate shared default }--[ ]--
A:pim@nikhef# set / interface lo0 admin-state enable
A:pim@nikhef# set / interface lo0 subinterface 0 admin-state enable
A:pim@nikhef# set / interface lo0 subinterface 0 ipv4 admin-state enable
A:pim@nikhef# set / interface lo0 subinterface 0 ipv4 address 198.19.16.1/32
A:pim@nikhef# commit stay
```

There, my first config snippet! This creates a _loopback_ interface and, similar to JunOS, a
_subinterface_ (which Juniper calls a _unit_) which enables IPv4 and gives it a /32 address. In SR
Linux, any interface has to be associated with a _network-instance_; think of those as routing
domains or VRFs. There's a conveniently named _default_ network-instance, to which I'll add this
loopback and the point-to-point interface between the two 400G routers:

```
A:pim@nikhef# info flat interface ethernet-1/29
set / interface ethernet-1/29 admin-state enable
set / interface ethernet-1/29 subinterface 0 admin-state enable
set / interface ethernet-1/29 subinterface 0 ip-mtu 9190
set / interface ethernet-1/29 subinterface 0 ipv4 admin-state enable
set / interface ethernet-1/29 subinterface 0 ipv4 address 198.19.17.1/31
set / interface ethernet-1/29 subinterface 0 ipv6 admin-state enable

A:pim@nikhef# set / network-instance default type default
A:pim@nikhef# set / network-instance default admin-state enable
A:pim@nikhef# set / network-instance default interface ethernet-1/29.0
A:pim@nikhef# set / network-instance default interface lo0.0
A:pim@nikhef# commit stay
```

Cool. Assuming I now also do this on the other IXR-7220-D4 router, called _equinix_ (which gets the
loopback address 198.19.16.0/32 and the point-to-point address 198.19.17.0/31 on the 400G interface),
I should be able to do my first jumbo frame ping:

```
A:pim@equinix# ping network-instance default 198.19.17.1 -s 9162 -M do
Using network instance default
PING 198.19.17.1 (198.19.17.1) 9162(9190) bytes of data.
9170 bytes from 198.19.17.1: icmp_seq=1 ttl=64 time=0.466 ms
9170 bytes from 198.19.17.1: icmp_seq=2 ttl=64 time=0.477 ms
9170 bytes from 198.19.17.1: icmp_seq=3 ttl=64 time=0.547 ms
```

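The numbers in that ping add up nicely against the `ip-mtu 9190` on `ethernet-1/29.0`: `-s` sets the ICMP payload, to which an 8-byte ICMP header and a 20-byte IPv4 header are added (assuming no IP options), and `-M do` sets the Don't Fragment bit so that an oversized packet fails instead of being fragmented. A quick sketch of the arithmetic, where `max_ping_payload()` is just an illustrative helper:

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
ICMP_HEADER = 8   # bytes, ICMP echo header

def max_ping_payload(ip_mtu: int) -> int:
    """Largest `ping -s` value that fits in one unfragmented IPv4 packet."""
    return ip_mtu - IPV4_HEADER - ICMP_HEADER

payload = max_ping_payload(9190)
print(payload)                              # 9162: the value passed to -s
print(payload + ICMP_HEADER + IPV4_HEADER)  # 9190: the "(9190)" total
print(payload + ICMP_HEADER)                # 9170: "9170 bytes from ..."
```
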
#### Underlay: SR Linux OSPF

OK, let's get these two Nokia routers to speak OSPF, so that they can reach each other's loopbacks.
It's really easy:

```
A:pim@nikhef# / network-instance default protocols ospf instance default
--{ candidate shared default }--[ network-instance default protocols ospf instance default ]--
A:pim@nikhef# set admin-state enable
A:pim@nikhef# set version ospf-v2
A:pim@nikhef# set router-id 198.19.16.1
A:pim@nikhef# set area 0.0.0.0 interface ethernet-1/29.0 interface-type point-to-point
A:pim@nikhef# set area 0.0.0.0 interface lo0.0 passive true
A:pim@nikhef# commit stay
```

Similar to JunOS, I can descend into a configuration scope: the first line goes into the
_network-instance_ called `default`, then the _protocols_ called `ospf`, and then the _instance_
called `default`. Subsequent `set` commands operate at this scope. Once I commit this configuration
(on the _nikhef_ router and also the _equinix_ router, with its own unique router-id), OSPF shoots
to life immediately:

```
A:pim@nikhef# show network-instance default protocols ospf neighbor
=========================================================================================
Net-Inst default OSPFv2 Instance default Neighbors
=========================================================================================
+---------------------------------------------------------------------------------------+
| Interface-Name Rtr Id State Pri RetxQ Time Before Dead |
+=======================================================================================+
| ethernet-1/29.0 198.19.16.0 full 1 0 36 |
+---------------------------------------------------------------------------------------+
-----------------------------------------------------------------------------------------
No. of Neighbors: 1
=========================================================================================

A:pim@nikhef# show network-instance default route-table all | more
IPv4 unicast route table of network instance default
+------------------+-----+------------+--------------+--------+----------+--------+------+-------------+-----------------+
| Prefix           | ID  | Route Type | Route Owner  | Active | Origin   | Metric | Pref | Next-hop    | Next-hop        |
|                  |     |            |              |        | Network  |        |      | (Type)      | Interface       |
|                  |     |            |              |        | Instance |        |      |             |                 |
+==================+=====+============+==============+========+==========+========+======+=============+=================+
| 198.19.16.0/32   | 0   | ospfv2     | ospf_mgr     | True   | default  | 1      | 10   | 198.19.17.0 | ethernet-1/29.0 |
|                  |     |            |              |        |          |        |      | (direct)    |                 |
| 198.19.16.1/32   | 7   | host       | net_inst_mgr | True   | default  | 0      | 0    | None        | None            |
| 198.19.17.0/31   | 6   | local      | net_inst_mgr | True   | default  | 0      | 0    | 198.19.17.1 | ethernet-1/29.0 |
|                  |     |            |              |        |          |        |      | (direct)    |                 |
| 198.19.17.1/32   | 6   | host       | net_inst_mgr | True   | default  | 0      | 0    | None        | None            |
+==================+=====+============+==============+========+==========+========+======+=============+=================+

A:pim@nikhef# ping network-instance default 198.19.16.0
Using network instance default
PING 198.19.16.0 (198.19.16.0) 56(84) bytes of data.
64 bytes from 198.19.16.0: icmp_seq=1 ttl=64 time=0.484 ms
64 bytes from 198.19.16.0: icmp_seq=2 ttl=64 time=0.663 ms
```

Delicious! OSPF has learned the loopback, and it is now reachable. As with most things, going from 0
to 1 (in this case: understanding how SR Linux works at all) is the most difficult part. Then going
from 1 to 2 is critical (in this case: making two routers interact with OSPF), but from there on,
going from 2 to N is easy. In my case, enabling several other point-to-point /31 transit networks on
the _nikhef_ router, using ethernet-1/1.0 through ethernet-1/4.0 with the correct MTU, and turning on
OSPF for these makes the whole network shoot to life. Slick!

#### Underlay: Arista

I'll point out that one of the devices in this topology is an Arista. We have several of these ready
for deployment at FrysIX. They are a lot more affordable and easy to find on the second hand /
refurbished market. These switches come with 32x100G ports, and are really good at packet slinging
because they're based on the Broadcom _Tomahawk_ chipset. They pack a few fewer features than the
_Trident_ chipset that powers the Nokia, but they happen to have all the features we need to run our
internet exchange. So I turn my attention to the Arista in the topology. I am much more
comfortable configuring the whole thing here, as it's not my first time touching these devices:

```
arista-leaf#show run int loop0
interface Loopback0
   ip address 198.19.16.2/32
   ip ospf area 0.0.0.0
arista-leaf#show run int Ethernet32/1
interface Ethernet32/1
   description Core: Connected to nikhef:ethernet-1/2
   load-interval 1
   mtu 9190
   no switchport
   ip address 198.19.17.5/31
   ip ospf cost 1000
   ip ospf network point-to-point
   ip ospf area 0.0.0.0
arista-leaf#show run section router ospf
router ospf 65500
   router-id 198.19.16.2
   redistribute connected
   network 198.19.0.0/16 area 0.0.0.0
   max-lsa 12000
```

I complete the configuration for the other two core ports on this Arista: port Eth31/1 also connects
to the _nikhef_ IXR-7220-D4 and I give it a high cost of 1000, while Eth30/1 connects only 1x100G to
the _nokia-leaf_ IXR-7220-D2 with a cost of 10.
It's nice to see OSPF in action: there are two equal-cost (but expensive) OSPF paths via
router-id 198.19.16.1 (nikhef), and there's one lower-cost path via router-id 198.19.16.3
(nokia-leaf). The traceroute nicely shows the scenic route (arista-leaf -> nokia-leaf -> nikhef ->
equinix).

```
arista-leaf#show ip ospf nei
Neighbor ID     Instance VRF      Pri State       Dead Time   Address         Interface
198.19.16.1     65500    default  1   FULL        00:00:36    198.19.17.4     Ethernet32/1
198.19.16.3     65500    default  1   FULL        00:00:31    198.19.17.11    Ethernet30/1
198.19.16.1     65500    default  1   FULL        00:00:35    198.19.17.2     Ethernet31/1

arista-leaf#traceroute 198.19.16.0
traceroute to 198.19.16.0 (198.19.16.0), 30 hops max, 60 byte packets
 1  198.19.17.11 (198.19.17.11)  0.220 ms  0.150 ms  0.206 ms
 2  198.19.17.6 (198.19.17.6)  0.169 ms  0.107 ms  0.099 ms
 3  198.19.16.0 (198.19.16.0)  0.434 ms  0.346 ms  0.303 ms
```

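That path choice is OSPF's shortest-path-first computation at work. Here's a toy Dijkstra in Python over this lab. The costs of 1000 (Eth32/1 and Eth31/1) and 10 (Eth30/1) come from the Arista configuration above; the costs on the nokia-leaf to nikhef and nikhef to equinix links aren't shown in this article, so the value of 10 for those is an assumption, as is the link list itself (inferred from the traceroute). The sketch also ignores ECMP and simply keeps one of the two parallel links.

```python
import heapq

# (a, b, cost) per link. The 1000 and 10 costs come from the Arista config;
# the two costs marked ASSUMED are not shown in this article.
LINKS = [
    ("arista-leaf", "nikhef", 1000),    # Ethernet32/1
    ("arista-leaf", "nikhef", 1000),    # Ethernet31/1
    ("arista-leaf", "nokia-leaf", 10),  # Ethernet30/1
    ("nokia-leaf", "nikhef", 10),       # ASSUMED
    ("nikhef", "equinix", 10),          # ASSUMED (the 400G link)
]

def build_graph(links):
    """Undirected adjacency map, keeping the cheapest of parallel links."""
    graph = {}
    for a, b, cost in links:
        for x, y in ((a, b), (b, a)):
            nbrs = graph.setdefault(x, {})
            nbrs[y] = min(cost, nbrs.get(y, cost))
    return graph

def shortest_path(graph, src, dst):
    """Plain Dijkstra: returns (total_cost, [hops]) or None if unreachable."""
    queue = [(0, src, [src])]
    done = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == dst:
            return cost, path
        if node in done:
            continue
        done.add(node)
        for nbr, link_cost in graph[node].items():
            if nbr not in done:
                heapq.heappush(queue, (cost + link_cost, nbr, path + [nbr]))
    return None

print(shortest_path(build_graph(LINKS), "arista-leaf", "equinix"))
# (30, ['arista-leaf', 'nokia-leaf', 'nikhef', 'equinix'])
```
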
So far, so good! The _underlay_ is up, every router can reach every other router on its loopback,
and all OSPF adjacencies are formed. I'll leave the 2x100G between _nikhef_ and _arista-leaf_ at
high cost for now.

#### Overlay EVPN: SR Linux

The big-picture idea here is to use iBGP with the same AS number, and because there are two main
facilities (NIKHEF and Equinix), make each of those bigger IXR-7220-D4 routers act as
route-reflectors for the others. It means that they will have an iBGP session amongst themselves
(198.19.16.0 <-> 198.19.16.1) and otherwise accept iBGP sessions from any IP address in the
198.19.16.0/24 subnet. This way, I don't have to configure any more than strictly necessary on the
core routers. Any new router can just plug in, form an OSPF adjacency, and connect to both core
routers. I proceed to configure BGP on the Nokias like this:

```
A:pim@nikhef# / network-instance default protocols bgp
A:pim@nikhef# set admin-state enable
A:pim@nikhef# set autonomous-system 65500
A:pim@nikhef# set router-id 198.19.16.1
A:pim@nikhef# set dynamic-neighbors accept match 198.19.16.0/24 peer-group overlay
A:pim@nikhef# set afi-safi evpn admin-state enable
A:pim@nikhef# set preference ibgp 170
A:pim@nikhef# set route-advertisement rapid-withdrawal true
A:pim@nikhef# set route-advertisement wait-for-fib-install false
A:pim@nikhef# set group overlay peer-as 65500
A:pim@nikhef# set group overlay afi-safi evpn admin-state enable
A:pim@nikhef# set group overlay afi-safi ipv4-unicast admin-state disable
A:pim@nikhef# set group overlay afi-safi ipv6-unicast admin-state disable
A:pim@nikhef# set group overlay local-as as-number 65500
A:pim@nikhef# set group overlay route-reflector client true
A:pim@nikhef# set group overlay transport local-address 198.19.16.1
A:pim@nikhef# set neighbor 198.19.16.0 admin-state enable
A:pim@nikhef# set neighbor 198.19.16.0 peer-group overlay
A:pim@nikhef# commit stay
```

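The `dynamic-neighbors accept match` line is what lets a new router join without touching the core. Conceptually, an incoming BGP connection from a source address inside the configured prefix is accepted and inherits the `overlay` peer-group settings, while an explicitly configured neighbor shows up as static (the `S` and `D` flags in the output below). Here's a hypothetical Python model of that decision; it is not SR Linux code, and the table and function names are made up for illustration:

```python
import ipaddress
from typing import Optional, Tuple

# Hypothetical model of nikhef's BGP configuration above.
STATIC_NEIGHBORS = {ipaddress.ip_address("198.19.16.0"): "overlay"}
DYNAMIC_ACCEPT = [(ipaddress.ip_network("198.19.16.0/24"), "overlay")]

def classify_peer(source: str) -> Optional[Tuple[str, str]]:
    """Return (peer_group, 'static' or 'dynamic') for an incoming session, or None."""
    addr = ipaddress.ip_address(source)
    if addr in STATIC_NEIGHBORS:          # explicitly configured neighbor
        return STATIC_NEIGHBORS[addr], "static"
    for prefix, group in DYNAMIC_ACCEPT:  # first matching accept prefix
        if addr in prefix:
            return group, "dynamic"
    return None  # no match: in this model, the connection is refused

for peer in ("198.19.16.0", "198.19.16.2", "198.19.16.3", "192.0.2.1"):
    print(peer, classify_peer(peer))
```
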
I can see that iBGP sessions establish between all the devices:

```
A:pim@nikhef# show network-instance default protocols bgp neighbor
---------------------------------------------------------------------------------------------------------------------------
BGP neighbor summary for network-instance "default"
Flags: S static, D dynamic, L discovered by LLDP, B BFD enabled, - disabled, * slow
---------------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------------------------------
+-------------+-------------+----------+-------+----------+-------------+---------------+------------+--------------------+
| Net-Inst    | Peer        | Group    | Flags | Peer-AS  | State       | Uptime        | AFI/SAFI   | [Rx/Active/Tx]     |
+=============+=============+==========+=======+==========+=============+===============+============+====================+
| default     | 198.19.16.0 | overlay  | S     | 65500    | established | 0d:0h:2m:32s  | evpn       | [0/0/0]            |
| default     | 198.19.16.2 | overlay  | D     | 65500    | established | 0d:0h:2m:27s  | evpn       | [0/0/0]            |
| default     | 198.19.16.3 | overlay  | D     | 65500    | established | 0d:0h:2m:41s  | evpn       | [0/0/0]            |
+-------------+-------------+----------+-------+----------+-------------+---------------+------------+--------------------+
---------------------------------------------------------------------------------------------------------------------------
Summary:
1 configured neighbors, 1 configured sessions are established, 0 disabled peers
2 dynamic peers
```

A few things to note here: there is one _configured_ neighbor (this is the other IXR-7220-D4 router),
and there are two _dynamic_ peers, which are the Arista and the smaller IXR-7220-D2 router. The only
address family that they are exchanging information for is the _evpn_ family, and no prefixes have
been learned or sent yet (that's the `[0/0/0]` designation in the last column).

#### Overlay EVPN: Arista

The Arista is also remarkably straightforward to configure. Here, I'll simply enable the iBGP
sessions as follows:

```
arista-leaf#show run section bgp
router bgp 65500
   neighbor evpn peer group
   neighbor evpn remote-as 65500
   neighbor evpn update-source Loopback0
   neighbor evpn ebgp-multihop 3
   neighbor evpn send-community extended
   neighbor evpn maximum-routes 12000 warning-only
   neighbor 198.19.16.0 peer group evpn
   neighbor 198.19.16.1 peer group evpn
   !
   address-family evpn
      neighbor evpn activate

arista-leaf#show bgp summary
BGP summary information for VRF default
Router identifier 198.19.16.2, local AS number 65500
Neighbor    AS          Session State AFI/SAFI                AFI/SAFI State NLRI Rcd   NLRI Acc
----------- ----------- ------------- ----------------------- -------------- ---------- ----------
198.19.16.0 65500       Established   IPv4 Unicast            Advertised     0          0
198.19.16.0 65500       Established   L2VPN EVPN              Negotiated     0          0
198.19.16.1 65500       Established   IPv4 Unicast            Advertised     0          0
198.19.16.1 65500       Established   L2VPN EVPN              Negotiated     0          0
```

On this leaf node, I'll have redundant iBGP sessions with the two core nodes. Since those core
nodes are peering amongst themselves, and are configured as route-reflectors, this is all I need. No
matter how many additional Arista (or Nokia) devices I add to the network, all they'll have to do is
enable OSPF (so they can reach 198.19.16.0 and .1) and turn on iBGP sessions with both core routers.
Voila!

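The reason this is enough is route reflection (RFC 4456): a reflector may pass routes learned from one iBGP peer on to other iBGP peers, which plain iBGP forbids, and it tags them with ORIGINATOR_ID and CLUSTER_LIST attributes to prevent loops. Those show up as `Or-ID` and `C-LST` in the Arista output further down. Here's a simplified Python model, assuming the cluster-id equals the reflector's router-id (which matches the `C-LST` values seen later) and that every peer in the `overlay` group is a client, as `route-reflector client true` suggests. It ignores eBGP and best-path selection, and all names are hypothetical:

```python
from dataclasses import dataclass, replace
from typing import Dict, Optional, Tuple

@dataclass(frozen=True)
class Route:
    nlri: str                              # e.g. an EVPN mac-ip or imet route
    originator_id: Optional[str] = None    # ORIGINATOR_ID attribute
    cluster_list: Tuple[str, ...] = ()     # CLUSTER_LIST attribute

def reflect(route: Route, learned_from: str, from_client: bool,
            cluster_id: str, peers: Dict[str, bool]) -> Dict[str, Route]:
    """Simplified RFC 4456 reflection on one route reflector.

    peers maps each iBGP peer's router-id to True if it is an RR client.
    Returns the rewritten route for every peer it would be sent to.
    """
    if cluster_id in route.cluster_list:
        return {}  # we reflected this before: a loop, so drop it
    reflected = replace(
        route,
        originator_id=route.originator_id or learned_from,  # set only once
        cluster_list=(cluster_id,) + route.cluster_list,    # prepend ours
    )
    out = {}
    for peer, is_client in peers.items():
        if peer == learned_from:
            continue
        # From a client: reflect to everyone. From a non-client: clients only.
        if from_client or is_client:
            out[peer] = reflected
    return out

# nikhef (198.19.16.1), where every peer in the 'overlay' group is a client.
nikhef_peers = {"198.19.16.0": True, "198.19.16.2": True, "198.19.16.3": True}
mac = Route("mac-ip e43a.6e5f.0c59")
sent = reflect(mac, "198.19.16.3", True, "198.19.16.1", nikhef_peers)
print(sent["198.19.16.2"])
```
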
#### VXLAN EVPN: SR Linux

Nokia documentation informs me that SR Linux uses a special interface called _system0_ to source its
VXLAN traffic from, and that this interface has to be added to the _default_ network-instance. So
it's a matter of defining that interface and associating a VXLAN interface with it, like so:

```
A:pim@nikhef# set / interface system0 admin-state enable
A:pim@nikhef# set / interface system0 subinterface 0 admin-state enable
A:pim@nikhef# set / interface system0 subinterface 0 ipv4 admin-state enable
A:pim@nikhef# set / interface system0 subinterface 0 ipv4 address 198.19.18.1/32
A:pim@nikhef# set / network-instance default interface system0.0
A:pim@nikhef# set / tunnel-interface vxlan1 vxlan-interface 2604 type bridged
A:pim@nikhef# set / tunnel-interface vxlan1 vxlan-interface 2604 ingress vni 2604
A:pim@nikhef# set / tunnel-interface vxlan1 vxlan-interface 2604 egress source-ip use-system-ipv4-address
A:pim@nikhef# commit stay
```

This creates the plumbing for a VXLAN sub-interface called `vxlan1.2604` which will accept/send
traffic using VNI 2604 (this happens to be the VLAN id we use at FrysIX for our production Peering
LAN), and it'll use the `system0.0` address to source that traffic from.

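For reference, the VNI travels in the 8-byte VXLAN header from RFC 7348, inside UDP (destination port 4789, which the Arista config below also sets). Here's a small sketch that packs that header for VNI 2604 and works out how large an inner IP packet fits through the 9190-byte `ip-mtu` of the underlay, assuming an IPv4 underlay and untagged inner frames; the helper names are hypothetical:

```python
import struct

VXLAN_FLAG_I = 0x08  # "I" flag: the VNI field is valid (RFC 7348)

def vxlan_header(vni: int) -> bytes:
    """8-byte VXLAN header: flags(8) reserved(24) VNI(24) reserved(8)."""
    if not 0 <= vni < 2**24:
        raise ValueError("VNI must fit in 24 bits")
    return struct.pack("!II", VXLAN_FLAG_I << 24, vni << 8)

# Bytes added in front of the inner IP packet, assuming an IPv4 underlay
# and an untagged inner Ethernet frame.
OUTER_IPV4, OUTER_UDP, VXLAN_HDR, INNER_ETH = 20, 8, 8, 14

def inner_ip_mtu(underlay_ip_mtu: int) -> int:
    """Largest inner IP packet that fits without fragmenting the underlay."""
    return underlay_ip_mtu - OUTER_IPV4 - OUTER_UDP - VXLAN_HDR - INNER_ETH

print(vxlan_header(2604).hex())  # 08000000000a2c00
print(inner_ip_mtu(9190))        # 9140
```
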
The second part is to create what SR Linux calls a MAC-VRF and put an interface in it:

```
A:pim@nikhef# set / interface ethernet-1/9 admin-state enable
A:pim@nikhef# set / interface ethernet-1/9 breakout-mode num-breakout-ports 4
A:pim@nikhef# set / interface ethernet-1/9 breakout-mode breakout-port-speed 10G
A:pim@nikhef# set / interface ethernet-1/9/3 admin-state enable
A:pim@nikhef# set / interface ethernet-1/9/3 vlan-tagging true
A:pim@nikhef# set / interface ethernet-1/9/3 subinterface 0 type bridged
A:pim@nikhef# set / interface ethernet-1/9/3 subinterface 0 admin-state enable
A:pim@nikhef# set / interface ethernet-1/9/3 subinterface 0 vlan encap untagged

A:pim@nikhef# / network-instance peeringlan
A:pim@nikhef# set type mac-vrf
A:pim@nikhef# set admin-state enable
A:pim@nikhef# set interface ethernet-1/9/3.0
A:pim@nikhef# set vxlan-interface vxlan1.2604
A:pim@nikhef# set protocols bgp-evpn bgp-instance 1 admin-state enable
A:pim@nikhef# set protocols bgp-evpn bgp-instance 1 vxlan-interface vxlan1.2604
A:pim@nikhef# set protocols bgp-evpn bgp-instance 1 evi 2604
A:pim@nikhef# set protocols bgp-vpn bgp-instance 1 route-distinguisher rd 65500:2604
A:pim@nikhef# set protocols bgp-vpn bgp-instance 1 route-target export-rt target:65500:2604
A:pim@nikhef# set protocols bgp-vpn bgp-instance 1 route-target import-rt target:65500:2604
A:pim@nikhef# commit stay
```

In the first block here, Arend took what is a 100G port called `ethernet-1/9` and split it into four
breakout ports. Rather than 25G, Arend forced the breakout port speed to 10G because he used a
40G-to-4x10G DAC, and it happens that the third lane is plugged into the Debian machine. So on
`ethernet-1/9/3` I'll create a sub-interface, make it type _bridged_ (which I've also done on
`vxlan1.2604`!) and allow any untagged traffic to enter it.

{{< image width="5em" float="left" src="/assets/shared/brain.png" alt="brain" >}}

If you, like me, are used to either VPP or IOS/XR, this type of sub-interface stuff should feel very
natural to you. I've written about the sub-interface logic of Cisco's IOS/XR and of VPP in a
previous [[article]({{< ref 2022-02-14-vpp-vlan-gym.md >}})] which my buddy Fred lovingly calls
_VLAN Gymnastics_ because the ports are just so damn flexible. Worth a read!

The second block creates a new _network-instance_ which I'll name `peeringlan`. It associates
the newly created untagged sub-interface `ethernet-1/9/3.0` with the VXLAN interface, and starts a
protocol for eVPN instructing traffic in and out of this network-instance to use EVI 2604 on the
VXLAN interface, and to signal all learned MAC addresses using the configured route-distinguisher
and import/export route-targets. For simplicity I've just used the same value for each: 65500:2604.

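On the wire, `65500:2604` ends up in two different 8-byte encodings: as a type 0 route-distinguisher (RFC 4364: a 2-byte type, a 2-byte AS number and a 4-byte assigned number) that makes each EVPN route unique, and as a two-octet-AS route-target extended community (RFC 4360: type 0x00, sub-type 0x02) that decides which MAC-VRFs import the route. Here's a small sketch of both encodings, assuming a 2-byte AS number like 65500; the function names are hypothetical:

```python
import struct

def rd_type0(asn: int, number: int) -> bytes:
    """Type 0 route distinguisher: type=0 (2 bytes), ASN (2), assigned number (4)."""
    return struct.pack("!HHI", 0, asn, number)

def rt_two_octet_as(asn: int, number: int) -> bytes:
    """Route-target extended community: type 0x00, sub-type 0x02, ASN (2), value (4)."""
    return struct.pack("!BBHI", 0x00, 0x02, asn, number)

print(rd_type0(65500, 2604).hex())         # 0000ffdc00000a2c
print(rt_two_octet_as(65500, 2604).hex())  # 0002ffdc00000a2c
```
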
I go on to add an interface to the `peeringlan` _network-instance_ on the other two Nokia
routers: `ethernet-1/9/3.0` on the _equinix_ router and `ethernet-1/9.0` on the _nokia-leaf_ router.
Each of these goes to a 10Gbps port on a Debian machine.

#### VXLAN EVPN: Arista

At this point I'm feeling pretty bullish about the whole project. Arista does not make it very
difficult for me to configure L2 EVPN (which is also called a MAC-VRF here):

```
arista-leaf#conf t
vlan 2604
   name v-peeringlan
interface Ethernet9/3
   speed forced 10000full
   switchport access vlan 2604

interface Loopback1
   ip address 198.19.18.2/32
interface Vxlan1
   vxlan source-interface Loopback1
   vxlan udp-port 4789
   vxlan vlan 2604 vni 2604
```

After creating VLAN 2604 and making port Eth9/3 an access port in that VLAN, I'll add a loopback
for the VTEP called `Loopback1`, and a VXLAN interface that uses it to source its traffic. Here,
I'll associate local VLAN 2604 with `Vxlan1` and its VNI 2604, to match up with how I configured the
Nokias previously.

Finally, it's a matter of tying these together by announcing the MAC addresses into the EVPN iBGP
sessions:

```
arista-leaf#conf t
router bgp 65500
   vlan 2604
      rd 65500:2604
      route-target both 65500:2604
      redistribute learned
   !
```

### Results

To validate the configurations, I learn a cool trick from my buddy Andy on the SR Linux Discord
server. In EOS, I can ask it to check for any obvious mistakes in two places:

```
arista-leaf#show vxlan config-sanity detail
Category                           Result   Detail
---------------------------------- -------- --------------------------------------------------
Local VTEP Configuration Check     OK
Loopback IP Address                OK
VLAN-VNI Map                       OK
Flood List                         OK
Routing                            OK
VNI VRF ACL                        OK
Decap VRF-VNI Map                  OK
VRF-VNI Dynamic VLAN               OK
Remote VTEP Configuration Check    OK
Remote VTEP                        OK
Platform Dependent Check           OK
VXLAN Bridging                     OK
VXLAN Routing                      OK       VXLAN Routing not enabled
CVX Configuration Check            OK
CVX Server                         OK       Not in controller client mode
MLAG Configuration Check           OK       Run 'show mlag config-sanity' to verify MLAG config
Peer VTEP IP                       OK       MLAG peer is not connected
MLAG VTEP IP                       OK
Peer VLAN-VNI                      OK
Virtual VTEP IP                    OK
MLAG Inactive State                OK

arista-leaf#show bgp evpn sanity detail
Category Check                Status Detail
-------- -------------------- ------ ------
General  Send community       OK
General  Multi-agent mode     OK
General  Neighbor established OK
L2       MAC-VRF route-target OK
         import and export
L2       MAC-VRF              OK
         route-distinguisher
L2       MAC-VRF redistribute OK
L2       MAC-VRF overlapping  OK
         VLAN
L2       Suppressed MAC       OK
VXLAN    VLAN to VNI map for  OK
         MAC-VRF
VXLAN    VRF to VNI map for   OK
         IP-VRF
```

#### Results: Arista view

Inspecting the MAC addresses learned from all four of the client ports on the Debian machine is
easy:

```
arista-leaf#show bgp evpn summary
BGP summary information for VRF default
Router identifier 198.19.16.2, local AS number 65500
Neighbor Status Codes: m - Under maintenance
Neighbor    V AS    MsgRcvd MsgSent InQ OutQ Up/Down  State PfxRcd PfxAcc
198.19.16.0 4 65500 3311    3867    0   0    18:06:28 Estab 7      7
198.19.16.1 4 65500 3308    3873    0   0    18:06:28 Estab 7      7

arista-leaf#show bgp evpn vni 2604 next-hop 198.19.18.3
BGP routing table information for VRF default
Router identifier 198.19.16.2, local AS number 65500
Route status codes: * - valid, > - active, S - Stale, E - ECMP head, e - ECMP
                    c - Contributing to ECMP, % - Pending BGP convergence
Origin codes: i - IGP, e - EGP, ? - incomplete
AS Path Attributes: Or-ID - Originator ID, C-LST - Cluster List, LL Nexthop - Link Local Nexthop

          Network                Next Hop              Metric  LocPref Weight  Path
 * >Ec    RD: 65500:2604 mac-ip e43a.6e5f.0c59
                                 198.19.18.3           -       100     0       i Or-ID: 198.19.16.3 C-LST: 198.19.16.1
 * ec     RD: 65500:2604 mac-ip e43a.6e5f.0c59
                                 198.19.18.3           -       100     0       i Or-ID: 198.19.16.3 C-LST: 198.19.16.0
 * >Ec    RD: 65500:2604 imet 198.19.18.3
                                 198.19.18.3           -       100     0       i Or-ID: 198.19.16.3 C-LST: 198.19.16.1
 * ec     RD: 65500:2604 imet 198.19.18.3
                                 198.19.18.3           -       100     0       i Or-ID: 198.19.16.3 C-LST: 198.19.16.0
```

There's a lot to unpack here! Within the _route-distinguisher_ 65500:2604 that I configured on all
the routers, the Arista learns one MAC address behind next-hop 198.19.18.3 (the VTEP of the
_nokia-leaf_ router), and it learns it over both of its iBGP sessions. The route was originated by
198.19.16.3 (the loopback of the nokia-leaf router) and reflected by both route reflectors, as the
cluster list (`C-LST`) shows: the _active_ path comes from iBGP speaker 198.19.16.1 (_nikhef_), and
a second, ECMP path comes from 198.19.16.0 (_equinix_).

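As a sanity check on reading that output, here is a small Python sketch that groups the paths above
by route and reports which route reflector each copy came from. The parsing rules are my own
assumption, based only on the two-lines-per-path layout of this one capture; it is not a documented
EOS output format, and for real tooling EOS can render show commands as JSON (`| json`), which is a
much sturdier input.

```python
import re
from collections import defaultdict

# Two lines per path, as in the capture above: the route key, then next hop and
# path attributes. This layout is assumed from that single capture only.
SAMPLE = """\
 * >Ec    RD: 65500:2604 mac-ip e43a.6e5f.0c59
                                 198.19.18.3           -       100     0       i Or-ID: 198.19.16.3 C-LST: 198.19.16.1
 *  ec    RD: 65500:2604 mac-ip e43a.6e5f.0c59
                                 198.19.18.3           -       100     0       i Or-ID: 198.19.16.3 C-LST: 198.19.16.0
 * >Ec    RD: 65500:2604 imet 198.19.18.3
                                 198.19.18.3           -       100     0       i Or-ID: 198.19.16.3 C-LST: 198.19.16.1
 *  ec    RD: 65500:2604 imet 198.19.18.3
                                 198.19.18.3           -       100     0       i Or-ID: 198.19.16.3 C-LST: 198.19.16.0
"""

# First line of a path: status codes ('>' marks the active path), RD and route key.
ROUTE_RE = re.compile(r"^\s*\*\s*(?P<active>>?)\S*\s+RD:\s+(?P<rd>\S+)\s+(?P<key>.+?)\s*$")
# Second line of a path: next hop, then originator ID and cluster list.
PATH_RE = re.compile(
    r"^\s+(?P<nexthop>\d+\.\d+\.\d+\.\d+).*Or-ID:\s+(?P<orig>\S+)\s+C-LST:\s+(?P<clst>\S+)"
)


def group_paths(text):
    """Map (rd, route key) -> list of paths, each a dict of path attributes."""
    routes = defaultdict(list)
    current = None
    for line in text.splitlines():
        m = ROUTE_RE.match(line)
        if m:
            current = (m["rd"], m["key"], m["active"] == ">")
            continue
        m = PATH_RE.match(line)
        if m and current:
            rd, key, active = current
            routes[(rd, key)].append({
                "nexthop": m["nexthop"],
                "originator": m["orig"],
                "cluster_list": m["clst"],
                "active": active,
            })
    return dict(routes)


if __name__ == "__main__":
    for (rd, key), paths in group_paths(SAMPLE).items():
        reflectors = sorted(p["cluster_list"] for p in paths)
        active = [p["cluster_list"] for p in paths if p["active"]]
        print(f"{rd} {key}: {len(paths)} paths via {reflectors}, active via {active}")
```

Running it prints one line per route, each with two paths (cluster lists 198.19.16.0 and
198.19.16.1) and the active one via 198.19.16.1: the "learned twice, once per reflector" pattern.
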
I can also see `imet` route entries, which Andy explained to me. These are EVPN type-3 _Inclusive
Multicast Ethernet Tag_ routes, with which a VTEP signals that it wants to receive BUM traffic
(broadcast, unknown unicast and multicast, like ARP requests or IPv6 neighbor discovery) flooded to
it. Every router participating in this L2VPN originates such an `imet` route, and I see each of them
twice as well (one copy from each iBGP session). This checks out.

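Assuming ingress replication (the usual choice without a multicast underlay), the `imet` routes are
what each VTEP uses to build its flood list: a BUM frame entering one VTEP is copied once to every
remote VTEP that advertised a type-3 route for that VNI. The Python sketch below is a simplified
model of that bookkeeping, not either vendor's implementation. The `ImetRoute` tuple and its values
are hypothetical, loosely modelled on the type-3 routes in this lab, and the local VTEP address
198.19.18.0 for _equinix_ is an assumption. It also shows why seeing every `imet` route twice is
harmless: the flood list is keyed on the originating VTEP, so copies from both route reflectors
collapse into one entry.

```python
from typing import NamedTuple


class ImetRoute(NamedTuple):
    """Simplified EVPN type-3 (IMET) route: who originated it, and via which peer."""
    rd: str
    vni: int
    originator_ip: str  # the VTEP that wants to receive flooded BUM traffic
    via_peer: str       # iBGP neighbor (e.g. a route reflector) it was learned from


def flood_list(routes, vni, local_vtep):
    """Return the remote VTEPs that get a copy of each BUM frame for this VNI.

    Copies of the same route learned from several route reflectors collapse
    into one entry, and the local VTEP never floods to itself.
    """
    return sorted({r.originator_ip for r in routes
                   if r.vni == vni and r.originator_ip != local_vtep})


# Hypothetical input, modelled on the type-3 routes seen on the equinix router.
ROUTES = [
    ImetRoute("65500:2604", 2604, "198.19.18.1", "198.19.16.1"),
    ImetRoute("65500:2604", 2604, "198.19.18.2", "198.19.16.1"),
    ImetRoute("65500:2604", 2604, "198.19.18.2", "198.19.16.2"),
    ImetRoute("65500:2604", 2604, "198.19.18.3", "198.19.16.1"),
    ImetRoute("65500:2604", 2604, "198.19.18.3", "198.19.16.3"),
]

if __name__ == "__main__":
    # The local VTEP address of equinix is assumed to be 198.19.18.0 here.
    print(flood_list(ROUTES, vni=2604, local_vtep="198.19.18.0"))
```

Five routes come in, but only three remote VTEPs end up on the flood list.
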
#### Results: SR Linux view

The Nokia IXR-7220-D4 router called _equinix_ has also learned a bunch of EVPN routing entries,
which I can inspect as follows:

```
A:pim@equinix# show network-instance default protocols bgp routes evpn route-type summary
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
Show report for the BGP route table of network-instance "default"
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
Status codes: u=used, *=valid, >=best, x=stale, b=backup
Origin codes: i=IGP, e=EGP, ?=incomplete
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
BGP Router ID: 198.19.16.0      AS: 65500      Local AS: 65500
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
Type 2 MAC-IP Advertisement Routes
+--------+---------------+--------+-------------------+------------+-------------+------+-------------+--------+--------------------------------+------------------+
| Status | Route-        | Tag-ID | MAC-address       | IP-address | neighbor    | Path-| Next-Hop    | Label  | ESI                            | MAC Mobility     |
|        | distinguisher |        |                   |            |             | id   |             |        |                                |                  |
+========+===============+========+===================+============+=============+======+=============+========+================================+==================+
| u*>    | 65500:2604    | 0      | E4:3A:6E:5F:0C:57 | 0.0.0.0    | 198.19.16.1 | 0    | 198.19.18.1 | 2604   | 00:00:00:00:00:00:00:00:00:00  | -                |
| *      | 65500:2604    | 0      | E4:3A:6E:5F:0C:58 | 0.0.0.0    | 198.19.16.1 | 0    | 198.19.18.2 | 2604   | 00:00:00:00:00:00:00:00:00:00  | -                |
| u*>    | 65500:2604    | 0      | E4:3A:6E:5F:0C:58 | 0.0.0.0    | 198.19.16.2 | 0    | 198.19.18.2 | 2604   | 00:00:00:00:00:00:00:00:00:00  | -                |
| *      | 65500:2604    | 0      | E4:3A:6E:5F:0C:59 | 0.0.0.0    | 198.19.16.1 | 0    | 198.19.18.3 | 2604   | 00:00:00:00:00:00:00:00:00:00  | -                |
| u*>    | 65500:2604    | 0      | E4:3A:6E:5F:0C:59 | 0.0.0.0    | 198.19.16.3 | 0    | 198.19.18.3 | 2604   | 00:00:00:00:00:00:00:00:00:00  | -                |
+--------+---------------+--------+-------------------+------------+-------------+------+-------------+--------+--------------------------------+------------------+
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
Type 3 Inclusive Multicast Ethernet Tag Routes
+--------+-----------------------------+--------+---------------------+-----------------+--------+-----------------------+
| Status | Route-distinguisher         | Tag-ID | Originator-IP       | neighbor        | Path-  | Next-Hop              |
|        |                             |        |                     |                 | id     |                       |
+========+=============================+========+=====================+=================+========+=======================+
| u*>    | 65500:2604                  | 0      | 198.19.18.1         | 198.19.16.1     | 0      | 198.19.18.1           |
| *      | 65500:2604                  | 0      | 198.19.18.2         | 198.19.16.1     | 0      | 198.19.18.2           |
| u*>    | 65500:2604                  | 0      | 198.19.18.2         | 198.19.16.2     | 0      | 198.19.18.2           |
| *      | 65500:2604                  | 0      | 198.19.18.3         | 198.19.16.1     | 0      | 198.19.18.3           |
| u*>    | 65500:2604                  | 0      | 198.19.18.3         | 198.19.16.3     | 0      | 198.19.18.3           |
+--------+-----------------------------+--------+---------------------+-----------------+--------+-----------------------+
--------------------------------------------------------------------------------------------------------------------------
0 Ethernet Auto-Discovery routes 0 used, 0 valid
5 MAC-IP Advertisement routes 3 used, 5 valid
5 Inclusive Multicast Ethernet Tag routes 3 used, 5 valid
0 Ethernet Segment routes 0 used, 0 valid
0 IP Prefix routes 0 used, 0 valid
0 Selective Multicast Ethernet Tag routes 0 used, 0 valid
0 Selective Multicast Membership Report Sync routes 0 used, 0 valid
0 Selective Multicast Leave Sync routes 0 used, 0 valid
--------------------------------------------------------------------------------------------------------------------------
```

I have to say, SR Linux is incredibly chatty! But I can see all the relevant bits and bobs here.
Each MAC-IP entry is accounted for: there is one MAC address behind each remote VTEP, namely the
nikhef switch (198.19.18.1), the Arista switch (198.19.18.2) and the nokia-leaf router
(198.19.18.3). The routes for the two leaves arrive both directly from the leaf and reflected via
nikhef (198.19.16.1), and the direct path is the one in use. I also see the IMET entries. One thing
to note: the SR Linux implementation fills the IP address of the type-2 routes with 0.0.0.0, while
the Arista (more correctly, in my opinion) leaves it unspecified. But everything looks great!

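Every system in this lab prints MAC addresses differently: the Arista uses dotted groups of four hex
digits (`e43a.6e5f.0c59`), SR Linux uses upper-case, colon-separated bytes (`E4:3A:6E:5F:0C:59`),
and the Linux `ip` tool uses lower-case, colon-separated bytes. When cross-checking these tables by
hand or in a script, it helps to normalize them first. A minimal Python sketch; the function name
`normalize_mac` is my own and not part of any vendor library:

```python
import re


def normalize_mac(mac: str) -> str:
    """Return a MAC address as lower-case, colon-separated bytes.

    Accepts the Arista dotted form (e43a.6e5f.0c59), colon- or dash-separated
    forms in either case (E4:3A:6E:5F:0C:59), or 12 bare hex digits.
    """
    digits = re.sub(r"[.:\-]", "", mac).lower()
    if not re.fullmatch(r"[0-9a-f]{12}", digits):
        raise ValueError(f"not a MAC address: {mac!r}")
    return ":".join(digits[i:i + 2] for i in range(0, 12, 2))


if __name__ == "__main__":
    arista = "e43a.6e5f.0c59"      # as printed by show bgp evpn on arista-leaf
    srlinux = "E4:3A:6E:5F:0C:59"  # as printed by SR Linux on equinix
    linux = "e4:3a:6e:5f:0c:59"    # as printed by ip nei on the Debian machine
    assert normalize_mac(arista) == normalize_mac(srlinux) == normalize_mac(linux)
    print(normalize_mac(arista))   # e4:3a:6e:5f:0c:59
```
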
#### Results: Debian view

There's one more thing to show, and that's the 'proof is in the pudding' moment. As I said, Arend
hooked up a Debian machine with an Intel X710-DA4 network card, which sports four 10G SFP+ ports.
This network card is a regular in my AS8298 network, as it has excellent DPDK support and can easily
push 40Mpps with VPP. IPng 🥰 Intel X710!

```
root@debian:~ # ip netns add nikhef
root@debian:~ # ip link set enp1s0f0 netns nikhef
root@debian:~ # ip netns exec nikhef ip link set enp1s0f0 up mtu 9000
root@debian:~ # ip netns exec nikhef ip addr add 192.0.2.10/24 dev enp1s0f0
root@debian:~ # ip netns exec nikhef ip addr add 2001:db8::10/64 dev enp1s0f0

root@debian:~ # ip netns add arista-leaf
root@debian:~ # ip link set enp1s0f1 netns arista-leaf
root@debian:~ # ip netns exec arista-leaf ip link set enp1s0f1 up mtu 9000
root@debian:~ # ip netns exec arista-leaf ip addr add 192.0.2.11/24 dev enp1s0f1
root@debian:~ # ip netns exec arista-leaf ip addr add 2001:db8::11/64 dev enp1s0f1

root@debian:~ # ip netns add nokia-leaf
root@debian:~ # ip link set enp1s0f2 netns nokia-leaf
root@debian:~ # ip netns exec nokia-leaf ip link set enp1s0f2 up mtu 9000
root@debian:~ # ip netns exec nokia-leaf ip addr add 192.0.2.12/24 dev enp1s0f2
root@debian:~ # ip netns exec nokia-leaf ip addr add 2001:db8::12/64 dev enp1s0f2

root@debian:~ # ip netns add equinix
root@debian:~ # ip link set enp1s0f3 netns equinix
root@debian:~ # ip netns exec equinix ip link set enp1s0f3 up mtu 9000
root@debian:~ # ip netns exec equinix ip addr add 192.0.2.13/24 dev enp1s0f3
root@debian:~ # ip netns exec equinix ip addr add 2001:db8::13/64 dev enp1s0f3

root@debian:~# ip netns exec nikhef fping -g 192.0.2.8/29
192.0.2.10 is alive
192.0.2.11 is alive
192.0.2.12 is alive
192.0.2.13 is alive

root@debian:~# ip netns exec arista-leaf fping 2001:db8::10 2001:db8::11 2001:db8::12 2001:db8::13
2001:db8::10 is alive
2001:db8::11 is alive
2001:db8::12 is alive
2001:db8::13 is alive

root@debian:~# ip netns exec equinix ip nei
192.0.2.10 dev enp1s0f3 lladdr e4:3a:6e:5f:0c:57 STALE
192.0.2.11 dev enp1s0f3 lladdr e4:3a:6e:5f:0c:58 STALE
192.0.2.12 dev enp1s0f3 lladdr e4:3a:6e:5f:0c:59 STALE
fe80::e63a:6eff:fe5f:c57 dev enp1s0f3 lladdr e4:3a:6e:5f:0c:57 STALE
fe80::e63a:6eff:fe5f:c58 dev enp1s0f3 lladdr e4:3a:6e:5f:0c:58 STALE
fe80::e63a:6eff:fe5f:c59 dev enp1s0f3 lladdr e4:3a:6e:5f:0c:59 STALE
2001:db8::10 dev enp1s0f3 lladdr e4:3a:6e:5f:0c:57 STALE
2001:db8::11 dev enp1s0f3 lladdr e4:3a:6e:5f:0c:58 STALE
2001:db8::12 dev enp1s0f3 lladdr e4:3a:6e:5f:0c:59 STALE
```

The Debian machine puts each port of the network card into its own network namespace, and gives it
both an IPv4 and an IPv6 address. I can then enter the `nikhef` network namespace, whose port is
connected to the IXR-7220-D4 router called _nikhef_, and ping all four endpoints over IPv4.
Similarly, I can enter the `arista-leaf` namespace and ping all four endpoints over IPv6. Finally, I
take a look at the IPv4 and IPv6 neighbor table in the namespace whose port is connected to the
_equinix_ router: all three remote MAC addresses are there. This proves end-to-end connectivity
across the EVPN VXLAN, and full interoperability between the two vendors.

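The IPv6 link-local addresses in that neighbor table are a nice extra cross-check, because they are
derived from the MAC address with the modified EUI-64 method from RFC 4291: flip the
universal/local bit of the first byte, insert `ff:fe` in the middle, and prepend `fe80::`. This
assumes the kernel generates EUI-64 based link-local addresses (the Linux default, as opposed to
stable-privacy addresses), which is consistent with the addresses shown above. A minimal Python
sketch:

```python
import ipaddress


def eui64_link_local(mac: str) -> ipaddress.IPv6Address:
    """Derive the IPv6 link-local address of a MAC using modified EUI-64 (RFC 4291)."""
    octets = [int(part, 16) for part in mac.split(":")]
    if len(octets) != 6:
        raise ValueError(f"expected six colon-separated bytes: {mac!r}")
    octets[0] ^= 0x02  # flip the universal/local bit
    interface_id = bytes(octets[:3] + [0xFF, 0xFE] + octets[3:])
    # fe80::/64 prefix followed by the 64-bit interface identifier
    return ipaddress.IPv6Address(bytes.fromhex("fe80000000000000") + interface_id)


if __name__ == "__main__":
    for mac in ("e4:3a:6e:5f:0c:57", "e4:3a:6e:5f:0c:58", "e4:3a:6e:5f:0c:59"):
        print(mac, "->", eui64_link_local(mac))
```

For `e4:3a:6e:5f:0c:57` this yields `fe80::e63a:6eff:fe5f:c57`, matching the first link-local entry
in the `ip nei` output.
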
Performance? We got that!

```
root@debian:~# ip netns exec equinix iperf3 -c 192.0.2.12
Connecting to host 192.0.2.12, port 5201
[  5] local 192.0.2.10 port 34598 connected to 192.0.2.12 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  1.15 GBytes  9.91 Gbits/sec   19   1.52 MBytes
[  5]   1.00-2.00   sec  1.15 GBytes  9.90 Gbits/sec    3   1.54 MBytes
[  5]   2.00-3.00   sec  1.15 GBytes  9.90 Gbits/sec    1   1.54 MBytes
[  5]   3.00-4.00   sec  1.15 GBytes  9.90 Gbits/sec    1   1.54 MBytes
[  5]   4.00-5.00   sec  1.15 GBytes  9.90 Gbits/sec    0   1.54 MBytes
[  5]   5.00-6.00   sec  1.15 GBytes  9.90 Gbits/sec    0   1.54 MBytes
[  5]   6.00-7.00   sec  1.15 GBytes  9.90 Gbits/sec    0   1.54 MBytes
[  5]   7.00-8.00   sec  1.15 GBytes  9.90 Gbits/sec    0   1.54 MBytes
[  5]   8.00-9.00   sec  1.15 GBytes  9.90 Gbits/sec    0   1.54 MBytes
[  5]   9.00-10.00  sec  1.15 GBytes  9.90 Gbits/sec    0   1.54 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  11.5 GBytes  9.90 Gbits/sec   24             sender
[  5]   0.00-10.00  sec  11.5 GBytes  9.90 Gbits/sec                  receiver

iperf Done.
```

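That is about as close to line rate as a single 10G port gets. As a back-of-the-envelope check, here
is the arithmetic in Python under my own assumptions (none of these were measured in the lab): a
9000 byte IP MTU on the access port, IPv4 and TCP headers of 20 bytes each, TCP timestamps enabled
(12 option bytes, the Linux default), and standard Ethernet overhead per frame (14 byte header,
4 byte FCS, 8 byte preamble and 12 byte inter-frame gap). The VXLAN encapsulation is only added on
the fabric links, so it does not eat into the access port; the sketch also computes the underlay IP
MTU needed to carry a full-sized inner frame, assuming an IPv4 underlay and an untagged inner frame.

```python
# Back-of-the-envelope TCP goodput over a 10G Ethernet access port.
# Assumptions (not measured): 9000 byte IP MTU, IPv4, TCP timestamps on.
LINE_RATE_GBPS = 10.0
IP_MTU = 9000

IPV4_HEADER = 20
TCP_HEADER = 20
TCP_TIMESTAMPS = 12          # TCP option bytes when timestamps are enabled
ETH_HEADER = 14              # destination MAC, source MAC, EtherType (untagged)
ETH_FCS = 4
PREAMBLE_SFD = 8
INTER_FRAME_GAP = 12

VXLAN_OVERHEAD = 8 + 8 + 20  # VXLAN header + UDP header + outer IPv4 header


def tcp_goodput_gbps(ip_mtu: int = IP_MTU) -> float:
    """Application payload rate when every frame on the wire is full-sized."""
    payload = ip_mtu - IPV4_HEADER - TCP_HEADER - TCP_TIMESTAMPS
    on_the_wire = ip_mtu + ETH_HEADER + ETH_FCS + PREAMBLE_SFD + INTER_FRAME_GAP
    return LINE_RATE_GBPS * payload / on_the_wire


def underlay_ip_mtu(ip_mtu: int = IP_MTU) -> int:
    """Smallest underlay IP MTU that carries a full inner Ethernet frame in VXLAN."""
    return ip_mtu + ETH_HEADER + VXLAN_OVERHEAD


if __name__ == "__main__":
    print(f"expected goodput: {tcp_goodput_gbps():.2f} Gbit/s")
    print(f"underlay IP MTU needed: {underlay_ip_mtu()} bytes")
```

With these numbers the estimate comes out at 9.90 Gbit/s, right where iperf3 landed, and the
underlay needs an IP MTU of at least 9050 bytes. Plugging in a 1500 byte MTU gives roughly
9.41 Gbit/s, the familiar figure for a standard-MTU 10G link.
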
## What's Next

There are a few improvements I can make before deploying this architecture to the internet exchange.
Notably:

* the functional equivalent of _port security_, that is to say only allowing one or two MAC
  addresses per member port. FrysIX has a strict one-port-one-member-one-MAC rule, and having port
  security will greatly improve our resilience.
* SR Linux can suppress ARP, _even on an L2 MAC-VRF_! This is relatively well known for IRB-based
  setups, but adding it to transparent bridge-domains is also possible in SR Linux
  [[ref](https://documentation.nokia.com/srlinux/22-6/SR_Linux_Book_Files/EVPN-VXLAN_Guide/services-evpn-vxlan-l2.html#configuring_evpn_learning_for_proxy_arp)],
  using the syntax `protocols bgp-evpn bgp-instance 1 routes bridge-table mac-ip advertise
  true`. This gleans the IP addresses from intercepted ARP requests, and reduces the need for
  BUM flooding.
* Andy informs me that Arista also has this feature. By setting `router l2-vpn` and
  `arp learning bridged`, the suppression of ARP requests and replies works in the same way. This
  greatly reduces cross-router BUM flooding. If DE-CIX can do it, so can FrysIX :)
* some automation: although configuring the MAC-VRF across Arista and SR Linux is definitely not
  as difficult as I thought, having some automation in place will avoid mistakes. It would suck if
  the IXP collapsed because I botched a link drain or PNI configuration!

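To reason about the first two items before touching any router config, here is a toy Python model
of the intended behavior: a per-port MAC limit (the one-port-one-member-one-MAC rule), and an ARP
suppression table that answers an ARP request locally when it already knows the IP-to-MAC binding,
and only floods when it does not. This is a hypothetical sketch of the concepts only; it is not how
SR Linux or EOS implement them, and all class and method names (`MemberPort`, `ArpSuppression`,
`admit`, `glean`, `handle_request`) are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class MemberPort:
    """Hypothetical member-facing port that learns at most `max_macs` source MACs."""
    name: str
    max_macs: int = 1
    learned: set = field(default_factory=set)

    def admit(self, src_mac: str) -> bool:
        """Return True if a frame from src_mac may enter the fabric."""
        if src_mac in self.learned:
            return True
        if len(self.learned) < self.max_macs:
            self.learned.add(src_mac)
            return True
        return False  # over the limit: drop the frame instead of learning the MAC


@dataclass
class ArpSuppression:
    """Hypothetical IP -> MAC table, gleaned locally or from EVPN MAC-IP routes."""
    bindings: dict = field(default_factory=dict)

    def glean(self, ip: str, mac: str) -> None:
        self.bindings[ip] = mac

    def handle_request(self, target_ip: str):
        """Answer from the table if possible, otherwise flood as BUM traffic."""
        mac = self.bindings.get(target_ip)
        if mac is not None:
            return ("reply", mac)
        return ("flood", None)


if __name__ == "__main__":
    port = MemberPort("member-1")
    print(port.admit("e4:3a:6e:5f:0c:57"))  # True: the first MAC is learned
    print(port.admit("e4:3a:6e:5f:0c:58"))  # False: a second MAC exceeds the limit

    arp = ArpSuppression()
    arp.glean("192.0.2.12", "e4:3a:6e:5f:0c:59")
    print(arp.handle_request("192.0.2.12"))  # ('reply', 'e4:3a:6e:5f:0c:59')
    print(arp.handle_request("192.0.2.99"))  # ('flood', None)
```
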
### Acknowledgements

I am relatively new to EVPN configurations, and wanted to give a shoutout to Andy Whitaker, who
jumped in very quickly when I asked a question on the SR Linux Discord. He was gracious with his
time and spent a few hours on a video call with me, explaining EVPN in great detail for both Arista
and SR Linux. In particular, I want to give him a big "Thank you!" for helping me understand
symmetric and asymmetric IRB in the context of multivendor EVPN. Andy is about to start a new job at
Nokia, and I wish him all the best. To my friends at Nokia: you caught a good one, Andy is pure
gold!

I also want to thank Niek for helping me take my first baby steps onto this platform and patiently
answering my nerdly questions about the platform, the switch chip, and the configuration philosophy.
Learning a new NOS is always a fun task, and it was made super fun because Niek spent an hour with
Arend and me on a video call, giving a bunch of operational tips and tricks along the way.

Finally, Arend and ERITAP are an absolute joy to work with. We took turns hacking on the lab, which
Arend made available for me while I am traveling to Mississippi this week. Thanks for the kWh and
OOB access, and for brainstorming the config with me!

### Reference configurations

Here are the configs for all machines in this demonstration:
[[nikhef](/assets/frys-ix/nikhef.conf)] | [[equinix](/assets/frys-ix/equinix.conf)] | [[nokia-leaf](/assets/frys-ix/nokia-leaf.conf)] | [[arista-leaf](/assets/frys-ix/arista-leaf.conf)]