Typo and readability

2025-04-10 01:16:19 -05:00
parent d0a7cdbe38
commit 3b7e576d20

````diff
@@ -76,7 +76,7 @@ notably they don't have to do provider edge functionality like VXLAN encap and d
 Almost all of these designs are showing how one might build a leaf-spine network for hyperscale.
 **Critique 1**: my 'spine' (IXR-7220-D4 routers) must also be provider edge. Practically speaking,
-in the picture above I have these beautiful Nokia IXR-7220-D4 switches, using two 400G ports to
+in the picture above I have these beautiful Nokia IXR-7220-D4 routers, using two 400G ports to
 connect between the facilities, and six 100G ports to connect the smaller breakout switches. That
 would leave a _massive_ amount of capacity unused: 22x 100G and 6x400G ports, to be exact.
````
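The unused-capacity figure is quick to verify. A minimal sketch, taking the port counts from the text (two 400G and six 100G ports in use, 6x400G and 22x100G spare), which together imply 8x400G and 28x100G ports per IXR-7220-D4:

```python
# Port counts taken from the text, keyed by port speed in Gbit/s.
used = {100: 6, 400: 2}     # six 100G breakout links, two 400G inter-facility links
unused = {100: 22, 400: 6}  # the spare ports quoted in the article


def capacity_gbps(ports: dict[int, int]) -> int:
    """Sum speed x count over a {speed_gbps: port_count} mapping."""
    return sum(speed * count for speed, count in ports.items())


total_ports = {speed: used[speed] + unused[speed] for speed in used}
print("ports per router:", total_ports)               # {100: 28, 400: 8}
print("in use:", capacity_gbps(used), "Gbit/s")       # 1400 Gbit/s
print("left idle:", capacity_gbps(unused), "Gbit/s")  # 4600 Gbit/s
```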
````diff
@@ -88,14 +88,15 @@ It's much more economical to create a star-topology that minimizes cross-datacen
 **Critique 3**: Most of these 'spine-leaf' reference architectures assume that the interior gateway
 protocol is eBGP in what they call the _underlay_, and on top of that, some secondary eBGP that's
 called the _overlay_. Frankly, such a design makes my head spin a little bit. These designs assume
-hundreds of switches, in which case making use of one AS number per switch could make sense (as iBGP
-needs either a 'full mesh', or external route reflectors).
+hundreds of switches, in which case making use of one AS number per switch could make sense, as iBGP
+needs either a 'full mesh', or external route reflectors.
 **Critique 4**: These reference designs also make an assumption that all fiber is local and while
-links can fail, it will be relatively rare to _drain_ a link. However, in cross-datacenter networks,
-draining links for maintenance is very common, for example if the dark fiber provider needs to
-perform maintenance. With these eBGP-over-eBGP connections, traffic engineering is more difficult
-than simply raising the OSPF (or IS-IS) cost of a link, to reroute traffic.
+optics and links can fail, it will be relatively rare to _drain_ a link. However, in
+cross-datacenter networks, draining links for maintenance is very common, for example if the dark
+fiber provider needs to perform repairs on a span that was damaged. With these eBGP-over-eBGP
+connections, traffic engineering is more difficult than simply raising the OSPF (or IS-IS) cost of a
+link, to reroute traffic.
 Setting aside eVPN for a second, if I were to build an IP transport network, like I did when I built
 [[IPng Site Local]({{< ref 2023-03-11-mpls-core.md >}})], I would use a much more intuitive
````
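Critique 3's point about iBGP scaling comes down to counting sessions. A minimal sketch, assuming every router is an iBGP speaker and, for the reflector case, two route reflectors that peer with each other while every other router peers with both of them (the layout the article later builds with the two IXR-7220-D4 routers):

```python
def full_mesh_sessions(n: int) -> int:
    """iBGP full mesh: every speaker peers with every other speaker, n*(n-1)/2."""
    return n * (n - 1) // 2


def dual_rr_sessions(n: int) -> int:
    """Two route reflectors peer with each other; each of the remaining
    n - 2 routers (the clients) peers with both reflectors."""
    if n < 2:
        raise ValueError("need at least the two route reflectors")
    return 1 + 2 * (n - 2)


for n in (4, 10, 100):
    print(f"{n:>3} routers: full mesh {full_mesh_sessions(n):>4}, two RRs {dual_rr_sessions(n):>3}")
```

At a handful of routers the difference is negligible (6 versus 5 sessions for four routers), but at 100 routers it is 4950 sessions versus 197, which is why hyperscale designs reach for something other than a full mesh.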
````diff
@@ -121,16 +122,16 @@ for the overlay! I have a feeling that some folks will dispise me for being cont
 leave your comments below, and don't forget to like-and-subscribe :-)
 Arend builds this topology for me in Jubbega - also known as FrysIX HQ. He takes the two
-400G-capable switches and connects them. Then he takes an Arista DCS-7060CX switch (which is eVPN
-capable, with 32x100G ports, based on the Broadcom Tomahawk3 chipset), and a smaller Nokia
-IXR-7220-D2 (with 48x25G and 8x100G ports, based on the Trident3 chipset). He wires all of this up
+400G-capable routers and connects them. Then he takes an Arista DCS-7060CX switch, which is eVPN
+capable, with 32x100G ports, based on the Broadcom Tomahawk3 chipset, and a smaller Nokia
+IXR-7220-D2 with 48x25G and 8x100G ports, based on the Trident3 chipset. He wires all of this up
 to look like the picture on the right.
 #### Underlay: Nokia's SR Linux
-We boot up the lab, verify that all the optics and links are up, and connect the management ports to
-an OOB network that I can remotely log in to. This is the first time that either of us work on
-Nokia, but I find it reasonably intuitive once I get a few tips and tricks from Niek.
+We boot up the equipment, verify that all the optics and links are up, and connect the management
+ports to an OOB network that I can remotely log in to. This is the first time that either of us work
+on Nokia, but I find it reasonably intuitive once I get a few tips and tricks from Niek.
 ```
 [pim@nikhef ~]$ sr_cli
````
````diff
@@ -181,7 +182,7 @@ PING 198.19.17.1 (198.19.17.1) 9162(9190) bytes of data.
 #### Underlay: SR Linux OSPF
-OK, let's get these two Nokia routers to speak OSPF, so that they can reach each others' loopbacks.
+OK, let's get these two Nokia routers to speak OSPF, so that they can reach each other's loopback.
 It's really easy:
 ```
````
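The hunk header above quotes a ping line in the format Linux (iputils) `ping` prints: `9162` is the ICMP payload that was requested, and `9190` in parentheses is the resulting IPv4 packet size, which is the number to hold against the interface IP MTU. The arithmetic, assuming an IPv4 header without options:

```python
ICMP_HEADER = 8   # bytes: ICMP echo request header
IPV4_HEADER = 20  # bytes: IPv4 header without options


def ipv4_ping_packet_size(payload: int) -> int:
    """Size of the IPv4 packet that a ping with `payload` data bytes puts on the wire."""
    return payload + ICMP_HEADER + IPV4_HEADER


print(ipv4_ping_packet_size(56))    # 84, the familiar default "56(84)"
print(ipv4_ping_packet_size(9162))  # 9190, as in the hunk header
```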
````diff
@@ -195,11 +196,11 @@ A:pim@nikhef# set area 0.0.0.0 interface lo0.0 passive true
 A:pim@nikhef# commit stay
 ```
-Similar to in JunOS, I can descend into a configuration scope (the first line goes into the
+Similar to in JunOS, I can descend into a configuration scope: the first line goes into the
 _network-instance_ called `default` and then the _protocols_ called `ospf`, and then the _instance_
 called `default`. Subsequent `set` commands operate at this scope. Once I commit this configuration
-(on the _nikhef_ router and also the _equinix_ router, with its own unique router-id), OSPF shoots
-to life immediately:
+(on the _nikhef_ router and also the _equinix_ router, with its own unique router-id), OSPF quickly
+shoots in action:
 ```
 A:pim@nikhef# show network-instance default protocols ospf neighbor
````
````diff
@@ -241,8 +242,8 @@ Delicious! OSPF has learned the loopback, and it is now reachable. As with most
 to 1 (in this case: understanding how SR Linux works at all) is the most difficult part. Then going
 from 1 to 2 is critical (in this case: making two routers interact with OSPF), but from there on,
 going from 2 to N is easy (in my case: enabling several other point-to-point /31 transit networks on
-the _nikhef_ router, using ethernet-1/1.0 through ethernet-1/4.0 with the correct MTU and turning on OSPF
-for these), makes the whole network shoot to life. Slick!
+the _nikhef_ router, using `ethernet-1/1.0` through `ethernet-1/4.0` with the correct MTU and
+turning on OSPF for these), makes the whole network shoot to life. Slick!
 #### Underlay: Arista
````
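The /31 transit networks mentioned above use RFC 3021 point-to-point addressing, where both addresses in the /31 are usable host addresses and no network or broadcast address is wasted. A minimal sketch with Python's standard `ipaddress` module, assuming a hypothetical transit block of 198.19.17.0/24 handed out in order to `ethernet-1/1.0` through `ethernet-1/4.0`; the real address plan is not shown here:

```python
import ipaddress

# Hypothetical addressing plan: the article does not show which /31s are used.
TRANSIT_BLOCK = ipaddress.ip_network("198.19.17.0/24")
INTERFACES = [f"ethernet-1/{i}.0" for i in range(1, 5)]  # ethernet-1/1.0 .. ethernet-1/4.0


def assign_p2p(block, interfaces):
    """Give each interface its own /31 (RFC 3021). Both addresses are usable:
    the first for the local end, the second for the remote end."""
    plan = {}
    for ifname, net in zip(interfaces, block.subnets(new_prefix=31)):
        local, remote = list(net)  # iterating a /31 yields exactly two addresses
        plan[ifname] = (net, local, remote)
    return plan


for ifname, (net, local, remote) in assign_p2p(TRANSIT_BLOCK, INTERFACES).items():
    print(f"{ifname:<16} {net}  local={local}  remote={remote}")
```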
````diff
@@ -277,13 +278,14 @@ router ospf 65500
 max-lsa 12000
 ```
-I complete the configuration for the other two core ports on this Arista, port Eth31/1 connects also
+I complete the configuration for the other two interfaces on this Arista, port Eth31/1 connects also
 to the _nikhef_ IXR-7220-D4 and I give it a high cost of 1000, while Eth30/1 connects only 1x100G to
 the _nokia-leaf_ IXR-7220-D2 with a cost of 10.
 It's nice to see that OSPF in action - there are two equal path (but high cost) OSPF paths via
 router-id 198.19.16.1 (nikhef), and there's one lower cost path via router-id 198.19.16.3
-(nokia-leaf). The traceroute nicely shows the scenic route (arista-leaf -> nokia-leaf -> nokia ->
-equinix).
+(_nokia-leaf_). The traceroute nicely shows the scenic route (arista-leaf -> nokia-leaf -> nokia ->
+equinix). Dope!
 ```
 arista-leaf#show ip ospf nei
 Neighbor ID Instance VRF Pri State Dead Time Address Interface
````
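Critique 4 argued that draining a link should be as simple as raising its IGP cost, and the Arista costs above show the mechanism. A minimal Dijkstra sketch (the shortest-path algorithm behind OSPF's SPF calculation) on a hypothetical version of the lab: the 1000 and 10 costs on the Arista's links come from the text, the two parallel 1000-cost links to _nikhef_ are collapsed into one, the `nokia` hop in the traceroute is assumed to be _nikhef_, and the two remaining costs are made up for illustration:

```python
import heapq

# Hypothetical lab graph: undirected links with OSPF-style costs.
LINKS = {
    ("arista-leaf", "nikhef"): 1000,   # from the text (high cost)
    ("arista-leaf", "nokia-leaf"): 10,  # from the text
    ("nokia-leaf", "nikhef"): 10,       # assumed
    ("nikhef", "equinix"): 10,          # assumed
}


def adjacency(links):
    """Build an undirected adjacency map {node: {peer: cost}}."""
    adj = {}
    for (a, b), cost in links.items():
        adj.setdefault(a, {})[b] = cost
        adj.setdefault(b, {})[a] = cost
    return adj


def shortest_path(links, src, dst):
    """Plain Dijkstra; returns (total_cost, path) or None if dst is unreachable."""
    adj = adjacency(links)
    queue = [(0, src, [src])]
    seen = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == dst:
            return cost, path
        if node in seen:
            continue
        seen.add(node)
        for peer, link_cost in adj[node].items():
            if peer not in seen:
                heapq.heappush(queue, (cost + link_cost, peer, path + [peer]))
    return None


print(shortest_path(LINKS, "arista-leaf", "equinix"))
# (30, ['arista-leaf', 'nokia-leaf', 'nikhef', 'equinix'])

# Drain the nokia-leaf uplink by raising its cost instead of shutting it down.
drained = dict(LINKS)
drained[("nokia-leaf", "nikhef")] = 65535  # maximum OSPF interface cost
print(shortest_path(drained, "arista-leaf", "equinix"))
# (1010, ['arista-leaf', 'nikhef', 'equinix'])
```

The drained link stays up and keeps its adjacency, so traffic moves away gracefully and comes back as soon as the cost is lowered again.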
````diff
@@ -304,13 +306,14 @@ high cost for now.
 #### Overlay EVPN: SR Linux
-The big-picture idea here is to use iBGP with the same AS number, and because there are two main
-facilities (NIKHEF and Equinix), make each of those bigger IXR-7220-D4 routers act as
+The big-picture idea here is to use iBGP with the same private AS number, and because there are two
+main facilities (NIKHEF and Equinix), make each of those bigger IXR-7220-D4 routers act as
 route-reflectors for others. It means that they will have an iBGP session amongst themselves
 (198.191.16.0 <-> 198.19.16.1) and otherwise accept iBGP sessions from any IP address in the
 198.19.16.0/24 subnet. This way, I don't have to configure any more than strictly necessary on the
 core routers. Any new router can just plug in, form an OSPF adjacency, and connect to both core
 routers. I proceed to configure BGP on the Nokia's like this:
 ```
 A:pim@nikhef# / network-instance default protocols bgp
 A:pim@nikhef# set admin-state enable
````
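The route-reflector plan above boils down to two rules: accept an iBGP session from anything in 198.19.16.0/24, and reflect routes following RFC 4456. A minimal sketch from the _nikhef_ router's point of view, assuming the two core routers treat each other as non-clients and every dynamic peer as a client; the Arista loopback 198.19.16.2 is hypothetical, and this is not how SR Linux implements it internally:

```python
import ipaddress

DYNAMIC_RANGE = ipaddress.ip_network("198.19.16.0/24")
LOCAL = ipaddress.ip_address("198.19.16.1")          # nikhef
NON_CLIENTS = {ipaddress.ip_address("198.19.16.0")}  # equinix, the configured neighbor


def accept_dynamic(peer: str) -> bool:
    """Accept an unsolicited iBGP session if it comes from the loopback range
    and is neither ourselves nor the statically configured core neighbor."""
    addr = ipaddress.ip_address(peer)
    return addr in DYNAMIC_RANGE and addr != LOCAL and addr not in NON_CLIENTS


def reflect_targets(learned_from: str, peers: list[str]) -> list[str]:
    """RFC 4456 reflection: a route learned from a client goes to every other
    peer; a route learned from a non-client goes to clients only."""
    src = ipaddress.ip_address(learned_from)
    from_client = src not in NON_CLIENTS
    targets = []
    for peer in peers:
        addr = ipaddress.ip_address(peer)
        if addr == src:
            continue  # never send a route back to the peer it came from
        if from_client or addr not in NON_CLIENTS:
            targets.append(peer)
    return targets


# equinix, arista-leaf (hypothetical loopback), nokia-leaf
peers = ["198.19.16.0", "198.19.16.2", "198.19.16.3"]
print(accept_dynamic("198.19.16.3"))          # True: inside the /24
print(accept_dynamic("198.51.100.7"))         # False: outside the /24
print(reflect_targets("198.19.16.3", peers))  # ['198.19.16.0', '198.19.16.2']
print(reflect_targets("198.19.16.0", peers))  # ['198.19.16.2', '198.19.16.3']
```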
````diff
@@ -358,7 +361,7 @@ Summary:
 A few things to note here - there one _configured_ neighbor (this is the other IXR-7220-D4 router),
 and two _dynamic_ peers, these are the Arista and the smaller IXR-7220-D2 router. The only address
 family that they are exchanging information for is the _evpn_ family, and no prefixes have been
-learned or sent yet (that's the `[0/0/0]` designation in the last column).
+learned or sent yet, shown by the `[0/0/0]` designation in the last column.
 #### Overlay EVPN: Arista
````
````diff
@@ -400,7 +403,7 @@ Voila!
 #### VXLAN EVPN: SR Linux
 Nokia documentation informs me that SR Linux uses a special interface called _system0_ to source its
-VXLAN traffic from, and add the interface to the _default_ network-instance. So it's a matter of
+VXLAN traffic from, and to add this interface to the _default_ network-instance. So it's a matter of
 defining that interface and associate a VXLAN interface with it, like so:
 ```
````
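For reference, this is what a VTEP sourcing its traffic from _system0_ wraps around every bridged frame. A minimal sketch of the 8-byte VXLAN header from RFC 7348, assuming (hypothetically) a VNI equal to the EVI 2604 used for the `peeringlan` instance and an untagged IPv4 underlay; the actual VNI configured on the routers is not shown here:

```python
import struct

FLAG_I = 0x08  # "I" flag in the first byte: the VNI field is valid


def vxlan_header(vni: int) -> bytes:
    """RFC 7348 header: flags (8 bits) + reserved (24), then VNI (24) + reserved (8)."""
    if not 0 <= vni < 2**24:
        raise ValueError("VNI must fit in 24 bits")
    return struct.pack("!II", FLAG_I << 24, vni << 8)


def parse_vni(header: bytes) -> int:
    """Inverse of vxlan_header(): check the I flag and extract the VNI."""
    flags_word, vni_word = struct.unpack("!II", header[:8])
    if not (flags_word >> 24) & FLAG_I:
        raise ValueError("I flag not set, VNI is not valid")
    return vni_word >> 8


# Bytes added around every bridged frame with an untagged IPv4 underlay:
# outer Ethernet (14) + outer IPv4 (20) + UDP (8, destination port 4789) + VXLAN (8).
OVERHEAD = 14 + 20 + 8 + len(vxlan_header(0))

hdr = vxlan_header(2604)  # hypothetical: VNI set equal to the EVI
print(hdr.hex(), parse_vni(hdr), OVERHEAD)  # 08000000000a2c00 2604 50
```

Those 50 bytes are why the underlay links need a larger MTU than the customer-facing ports.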
````diff
@@ -459,10 +462,11 @@ previous [[article]({{< ref 2022-02-14-vpp-vlan-gym.md >}})] which my buddy Fred
 _VLAN Gymnastics_ because the ports are just so damn flexible. Worth a read!
 The second block creates a new _network-instance_ which I'll name `peeringlan`, and it associates
-the newly crated untagged sub-interface `ethernet-1/9/3.0` with with the VXLAN interface, and starts a
+the newly crated untagged sub-interface `ethernet-1/9/3.0` with the VXLAN interface, and starts a
 protocol for eVPN instructing traffic in and out of this network-instance to use EVI 2604 on the
-VXLAN interface, and signalling of all MAC addresses learned to use route-distinguisher and
-import/export route-targets. For simplicity I've just used the same for each: 65500:2604.
+VXLAN sub-interface, and signalling of all MAC addresses learned to use the specified
+route-distinguisher and import/export route-targets. For simplicity I've just used the same for
+each: 65500:2604.
 I continue to add an interface to the `peeringlan` _network-instance_ on the other two Nokia
 routers: `ethernet-1/9/3.0` on the _equinix_ router and `ethernet-1/9.0` on the _nokia-leaf_ router.
````
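The route-distinguisher and the route-targets are both written as 65500:2604 above, but they are different things on the wire: the RD makes the route unique, while the RT is an extended community that decides which MAC-VRFs import it. A minimal sketch of the two 8-byte encodings, assuming the 2-byte-ASN formats (type 0 RD from RFC 4364, two-octet-AS-specific route target from RFC 4360), which fit a private AS like 65500:

```python
import struct


def parse_asn_colon(value: str) -> tuple[int, int]:
    """Split 'ASN:number' and check it fits the 2-byte-ASN / 4-byte-number layout."""
    asn, number = (int(part) for part in value.split(":"))
    if not (0 <= asn < 2**16 and 0 <= number < 2**32):
        raise ValueError(f"{value!r} does not fit a 2-byte ASN format")
    return asn, number


def route_distinguisher(value: str) -> bytes:
    """Type 0 RD: 2-byte type (0), 2-byte ASN, 4-byte assigned number."""
    asn, number = parse_asn_colon(value)
    return struct.pack("!HHI", 0, asn, number)


def route_target(value: str) -> bytes:
    """Route-target extended community: type 0x00 (two-octet AS, transitive),
    sub-type 0x02 (route target), 2-byte ASN, 4-byte local administrator."""
    asn, number = parse_asn_colon(value)
    return struct.pack("!BBHI", 0x00, 0x02, asn, number)


print(route_distinguisher("65500:2604").hex())  # 0000ffdc00000a2c
print(route_target("65500:2604").hex())         # 0002ffdc00000a2c
```

Reusing the same value for both is harmless in a lab like this; the two only differ in their leading type bytes.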
````diff
@@ -592,7 +596,7 @@ AS Path Attributes: Or-ID - Originator ID, C-LST - Cluster List, LL Nexthop - Li
 There's a lot to unpack here! The Arista is seeing that from the _route-distinguisher_ I configured
 on all the sessions, it is learning one MAC address on neighbor 198.19.18.3 (this is the VTEP for
 the _nokia-leaf_ router) from both iBGP sessions. The MAC address is learned from originator
-198.19.16.3 (the loopback of the nokia-leaf router), from two cluster members, the _active_ one on
+198.19.16.3 (the loopback of the _nokia-leaf_ router), from two cluster members, the active one on
 iBGP speaker 198.19.16.1 (_nikhef_) and a backup member on 198.19.16.0 (_equinix_).
 I can also see that there's a bunch of `imet` route entries, and Andy explained these to me. They are
````
````diff
@@ -650,9 +654,9 @@ Type 3 Inclusive Multicast Ethernet Tag Routes
 --------------------------------------------------------------------------------------------------------------------------
 ```
-I have to say, SR Linux is incredibly chatty! But, I can see all the relevant bits and bobs here.
-Each MAC-IP entry is accounted for, I can see several nexthops pointing at the nikhef switch, one
-pointing at the nokia-leaf router and one pointing at the Arista switch. I also see the IMET
+I have to say, SR Linux output is incredibly verbose! But, I can see all the relevant bits and bobs
+here. Each MAC-IP entry is accounted for, I can see several nexthops pointing at the nikhef switch,
+one pointing at the nokia-leaf router and one pointing at the Arista switch. I also see the `imet`
 entries. One thing to note -- the SR Linux implementation leaves the type-2 routes empty with a
 0.0.0.0 IPv4 address, while the Arista (in my opinion, more correctly) leaves them as NULL
 (unspecified). But, everything looks great!
````
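The two route types discussed above divide the work: type-2 (MAC/IP) routes populate the MAC table, while type-3 (`imet`) routes tell each VTEP where to copy broadcast, unknown-unicast and multicast (BUM) traffic. A toy model of that split, assuming ingress replication (the usual mode for VXLAN) and using hypothetical VTEP and MAC addresses except for the _nokia-leaf_ VTEP 198.19.18.3 from the text; neither vendor necessarily structures it this way:

```python
from dataclasses import dataclass


# Simplified EVPN routes (RFC 7432); real routes carry many more fields.
@dataclass(frozen=True)
class MacIpRoute:  # type 2: "this MAC lives behind that VTEP"
    mac: str
    vtep: str


@dataclass(frozen=True)
class ImetRoute:  # type 3: "send BUM traffic for this EVI to me"
    vtep: str


def build_tables(routes, local_vtep):
    """Derive a MAC table and an ingress-replication flood list from EVPN routes."""
    mac_table = {}
    flood_list = set()
    for route in routes:
        if route.vtep == local_vtep:
            continue  # ignore our own advertisements
        if isinstance(route, MacIpRoute):
            mac_table[route.mac] = route.vtep
        elif isinstance(route, ImetRoute):
            flood_list.add(route.vtep)
    return mac_table, sorted(flood_list)


def egress_vteps(dst_mac, mac_table, flood_list):
    """Known unicast goes to exactly one VTEP; BUM frames are copied to every
    VTEP in the flood list."""
    if dst_mac in mac_table:
        return [mac_table[dst_mac]]
    return list(flood_list)


# Hypothetical VTEPs and MACs; only 198.19.18.3 (nokia-leaf) appears in the text.
routes = [
    ImetRoute("198.19.18.0"),
    ImetRoute("198.19.18.1"),
    ImetRoute("198.19.18.3"),
    MacIpRoute("00:00:5e:00:53:03", "198.19.18.3"),
]
macs, flood = build_tables(routes, local_vtep="198.19.18.1")
print(egress_vteps("00:00:5e:00:53:03", macs, flood))  # ['198.19.18.3']
print(egress_vteps("ff:ff:ff:ff:ff:ff", macs, flood))  # ['198.19.18.0', '198.19.18.3']
```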
````diff
@@ -713,14 +717,14 @@ fe80::e63a:6eff:fe5f:c59 dev enp1s0f3 lladdr e4:3a:6e:5f:0c:59 STALE
 2001:db8::12 dev enp1s0f3 lladdr e4:3a:6e:5f:0c:59 STALE
 ```
-The Debian machine puts each network card into its own network namespace, and gives it both an IPv4
+The Debian machine puts each network card into its own network namespace, and gives them both an IPv4
 and an IPv6 address. I can then enter the `nikhef` network namespace, which has its NIC connected to
 the IXR-7220-D4 router called _nikhef_, and ping all four endpoints. Similarly, I can enter the
 `arista-leaf` namespace and ping6 all four endpoints. Finally, I take a look at the IPv6 and IPv4
-neighbor table on the network card that is connected to the Equinix router. All three MAC addresses are
-seen. This proves end to end connectivity across the EVPN VXLAN, and full interoperability.
-Performance? We got that!
+neighbor table on the network card that is connected to the _equinix_ router. All three MAC addresses are
+seen. This proves end to end connectivity across the EVPN VXLAN, and full interoperability. Booyah!
+Performance? We got that! I'm not worried as these Nokia routers are rated for 12.8Tbps of VXLAN....
 ```
 root@debian:~# ip netns exec equinix iperf3 -c 192.0.2.12
 Connecting to host 192.0.2.12, port 5201
````
````diff
@@ -757,9 +761,9 @@ Notably:
 using the syntax of `protocols bgp-evpn bgp-instance 1 routes bridge-table mac-ip advertise
 true`. This will glean the IP addresses based on intercepted ARP requests, and reduce the need for
 BUM flooding.
-* Andy informs me that Arista also has this feature. By setting 'router l2-vpn' and 'arp learning bridged',
-the suppression of ARP requests/replies also works in the same way. This greatly reduces cross
-router BUM flooding. If DE-CIX can do it, so can FrysIX :)
+* Andy informs me that Arista also has this feature. By setting `router l2-vpn` and `arp learning bridged`,
+the suppression of ARP requests/replies also works in the same way. This greatly reduces cross-router
+BUM flooding. If DE-CIX can do it, so can FrysIX :)
 * some automation - although configuring the MAC-VRF across Arista and SR Linux is definitely not
 as difficult as I thought, having some automation in place will avoid errors and mistakes. It
 would suck if the IXP collapsed because I botched a link drain or PNI configuration!
````