---
date: "2022-02-21T09:35:14Z"
title: 'Review: Cisco ASR9006/RSP440-SE'
aliases:
- /s/articles/2022/02/21/asr9006.html
---

## Introduction

{{< image width="180px" float="right" src="/assets/asr9006/ipmax.png" alt="IP-Max" >}}

If you've read up on my articles, you'll know that I have deployed a [European Ring]({{< ref "2021-02-27-network" >}}),
which was reformatted late last year into [AS8298]({{< ref "2021-10-24-as8298" >}}) and upgraded to run
[VPP Routers]({{< ref "2021-09-21-vpp-7" >}}) with 10G between each city. IPng Networks rents these 10G point-to-point
virtual leased lines between each of our locations. It's a really great network, and it performs so well because it's
built on an EoMPLS underlay provided by [IP-Max](https://ip-max.net/). They, in turn, run carrier grade hardware in the
form of Cisco ASR9k. In part, we're such a good match because my choice of [VPP](https://fd.io/) on the IPng
Networks routers fits very well with Fred's choice of [IOS/XR](https://en.wikipedia.org/wiki/Cisco_IOS_XR) on the
IP-Max routers.

And if you follow us on Twitter (I post as [@IPngNetworks](https://twitter.com/IPngNetworks/)), you may have seen a
recent post where I upgraded an aging [ASR9006](https://www.cisco.com/c/en/us/support/routers/asr-9006-router/model.html)
with a significantly larger [ASR9010](https://www.cisco.com/c/en/us/support/routers/asr-9010-router/model.html). The ASR9006
was initially deployed at Equinix Zurich ZH05 in Oberengstringen near Zurich, Switzerland in 2015, seven years ago.
It has hauled countless packets from Zurich to Paris, Frankfurt and Lausanne. When it was deployed, it came with an
A9K-RSP-4G route switch processor, which in 2019 was upgraded to the A9K-RSP-8G, and after so many hours^W years of
runtime it needed a replacement. Also, IP-Max was starting to run out of ports on the chassis, hence the upgrade.

{{< image width="300px" float="left" src="/assets/asr9006/staging.png" alt="IP-Max" >}}

If you're interested in the line-up, there's this epic reference guide from [Cisco Live!](https://www.cisco.com/c/en/us/td/docs/iosxr/asr9000/hardware-install/overview-reference/b-asr9k-overview-reference-guide/b-asr9k-overview-reference-guide_chapter_010.html#con_733653)
that shows a deep dive of the ASR9k architecture. The chassis and power supplies can host several generations of silicon,
and even mix-and-match generations. So IP-Max ordered a few new RSPs, and after deploying the ASR9010 at ZH05, we made
plans to redeploy this ASR9006 at NTT Zurich in Rümlang next to the airport, to replace an even older Cisco 7600
at that location. Seeing as we have to order XFP optics (IP-Max has some DWDM/CWDM links in service at NTT), we have
to park the chassis in and around Zurich. What better place to park it than in my lab? :-)

The IPng Networks laboratory is where I do most of my work on [VPP](https://fd.io/). The rack you see to the left here holds my coveted
Rhino and Hippo (two beefy AMD Ryzen 5950X machines with 100G network cards), and a few Dells that comprise my VPP
lab. There was not enough room, so I gave this little fridge a place just adjacent to the rack, connected with 10x 10Gbps
and serial and management ports.

I immediately had a little giggle when booting up the machine. It comes with 4x 3kW power supply slots (3 are installed),
and when booting the machine, I was happy that there was no debris laying on the side or back of the router, as its
fans create a veritable vortex of airflow. Also, overnight the temperature in my basement lab + office room rose a
few degrees. It's now nice and toasty in my office, no need for the heater in the winter. Yet the machine stays quite
cool at 26C intake, consuming 2.2kW _idle_, with each of the two route processors (RSP440) drawing 240 Watts, each of the
three 8x TenGigE blades drawing 575W, and the 40x GigE blade drawing a respectable 320 Watts.

```
RP/0/RSP0/CPU0:fridge(admin)#show environment power-supply
R/S/I           Power Supply    Voltage         Current
                    (W)           (V)             (A)
0/PS0/M1/*         741.1          54.9            13.5
0/PS0/M2/*         712.4          54.8            13.0
0/PS0/M3/*         765.8          55.1            13.9
--------------
Total:            2219.3
```

For reference, Rhino and Hippo draw approximately 265W each, but they come with 4x1G, 4x10G, 2x100G and forward ~300Mpps
when fully loaded. By the end of this article, I hope you'll see why this is a funny juxtaposition to me.

### Installing the ASR9006

The Cisco RSPs came to me new-in-refurbished-box. When booting, I had no idea what username/password was used for the
preinstall, and none of the standard passwords worked. So the first order of business is to take ownership of the
machine. I do this by putting both RSPs in _rommon_ (which is done by sending _Break_ after powercycling the machine --
my choice of _tio(1)_ has ***Ctrl-t b*** as the magic incantation). The first RSP (in slot 0) is then set to
`confreg 0x142` (which makes it ignore the startup configuration), while the other is kept in rommon so it doesn't boot
and take over the machine. After booting, I'm then presented with a root user setup dialog. I create a user `pim` with
some temporary password, set back the configuration register, and reload. When the RSP is about to boot, I release the
standby RSP to catch up, and voila: I'm _In like Flynn_.

Wiring this up, I connect Te0/0/0/0 to IPng's office switch on port sfp-sfpplus9, and I assign the router an IPv4 and IPv6
address. Then, I connect four TenGig ports to the lab switch, so that I can play around with loadtests a little bit.
After turning on LLDP, I can see the following physical view:

```
RP/0/RSP0/CPU0:fridge#show lldp neighbors
Sun Feb 20 19:14:21.775 UTC
Capability codes:
        (R) Router, (B) Bridge, (T) Telephone, (C) DOCSIS Cable Device
        (W) WLAN Access Point, (P) Repeater, (S) Station, (O) Other

Device ID       Local Intf        Hold-time  Capability  Port ID
xsw1-btl        Te0/0/0/0         120        B,R         bridge/sfp-sfpplus9
fsw0            Te0/1/0/0         41         P,B,R       TenGigabitEthernet 0/9
fsw0            Te0/1/0/1         41         P,B,R       TenGigabitEthernet 0/10
fsw0            Te0/2/0/0         41         P,B,R       TenGigabitEthernet 0/7
fsw0            Te0/2/0/1         41         P,B,R       TenGigabitEthernet 0/8

Total entries displayed: 5
```

First, I decide to hook up basic connectivity behind port Te0/0/0/0. I establish OSPF and OSPFv3, and this gives me
visibility to the route-reflectors at IPng's AS8298. Next, I also establish three IPv4 and three IPv6 iBGP sessions, so
the machine enters the Default Free Zone (also, _daaayum_, that table keeps on growing, at 903K IPv4 prefixes and
143K IPv6 prefixes).

```
RP/0/RSP0/CPU0:fridge#show ip ospf neighbor
Neighbor ID     Pri   State           Dead Time   Address         Interface
194.1.163.3     1     2WAY/DROTHER    00:00:35    194.1.163.66    TenGigE0/0/0/0.101
    Neighbor is up for 00:11:14
194.1.163.4     1     FULL/BDR        00:00:38    194.1.163.67    TenGigE0/0/0/0.101
    Neighbor is up for 00:11:11
194.1.163.87    1     FULL/DR         00:00:37    194.1.163.87    TenGigE0/0/0/0.101
    Neighbor is up for 00:11:12

RP/0/RSP0/CPU0:fridge#show ospfv3 neighbor
Neighbor ID     Pri   State           Dead Time   Interface ID    Interface
194.1.163.87    1     FULL/DR         00:00:35    2               TenGigE0/0/0/0.101
    Neighbor is up for 00:12:14
194.1.163.3     1     2WAY/DROTHER    00:00:33    16              TenGigE0/0/0/0.101
    Neighbor is up for 00:12:16
194.1.163.4     1     FULL/BDR        00:00:36    20              TenGigE0/0/0/0.101
    Neighbor is up for 00:12:12

RP/0/RSP0/CPU0:fridge#show bgp ipv4 uni sum
Process       RcvTblVer   bRIB/RIB   LabelVer  ImportVer  SendTblVer  StandbyVer
Speaker          915517     915517     915517     915517      915517      915517

Neighbor        Spk    AS MsgRcvd MsgSent   TblVer  InQ OutQ  Up/Down  St/PfxRcd
194.1.163.87      0  8298  172514       9   915517    0    0 00:04:47     903406
194.1.163.140     0  8298  171853       9   915517    0    0 00:04:56     903406
194.1.163.148     0  8298  176244       9   915517    0    0 00:04:49     903406

RP/0/RSP0/CPU0:fridge#show bgp ipv6 uni sum
Process       RcvTblVer   bRIB/RIB   LabelVer  ImportVer  SendTblVer  StandbyVer
Speaker          151597     151597     151597     151597      151597      151597

Neighbor        Spk    AS MsgRcvd MsgSent   TblVer  InQ OutQ  Up/Down  St/PfxRcd
2001:678:d78:3::87
                  0  8298   54763      10   151597    0    0 00:05:19     142542
2001:678:d78:6::140
                  0  8298   51350      10   151597    0    0 00:05:23     142542
2001:678:d78:7::148
                  0  8298   54572      10   151597    0    0 00:05:25     142542
```

One of the acceptance tests for new hardware at AS25091 IP-Max is to have it take a full table,
to verify that memory is present, accounted for, and working. These route switch processor boards come
with 12GB of ECC memory, and can keep up with routing table growth for a while to come. If/when they are
at the end of their useful life, they will be replaced with A9K-RSP-880s, which will also give us
access to 40G, 100G and 24x10G SFP+ line cards. At that point, the upgrade path is much easier, as
the chassis will already be installed. It's a matter of popping in new RSPs and replacing the line
cards one by one.

## Loadtesting the ASR9006/RSP440-SE

Now that this router has some basic connectivity, I'll do something that I always wanted to do: loadtest
an ASR9k! I have mad amounts of respect for Cisco's ASR9k series, but as we'll soon see, their stability
is their most redeeming quality, not their performance. Nowadays, many flashy 100G machines are around,
which do indeed have the performance, but not the stability! I've seen routers with an uptime of 7 years,
and BGP sessions and OSPF adjacencies with an uptime of 5+ years. It's just that I've not seen that type
of stability beyond Cisco and maybe Juniper. So if you want _Rock Solid Internet_, this is definitely
the way to go.

I have written a word or two on how VPP (an open source dataplane very similar to these industrial machines)
works. A great example is my recent [VPP VLAN Gymnastics]({{< ref "2022-02-14-vpp-vlan-gym" >}}) article.
There's a lot I can learn from comparing the performance between VPP and the Cisco ASR9k, so I will focus
on the following set of practical questions:

1. See if unidirectional versus bidirectional traffic impacts performance.
1. See if there is a performance penalty for using _Bundle-Ether_ (LACP controlled link aggregation).
1. Of course, replay my standard issue 1514b large packets, internet mix (_imix_) packets, small 64b packets
   from random source/destination addresses (ie. multiple flows); and finally the killer test of small 64b
   packets from a static source/destination address (ie. single flow).

This is in total 2 (uni/bi) x 2 (LAG/plain) x 4 (packet mix) or 16 loadtest runs, for three forwarding types ...

1. See performance of L2VPN (Point-to-Point), similar to what VPP would call "l2 xconnect". I'll create an
   L2 crossconnect between ports Te0/1/0/0 and Te0/2/0/0; this is the simplest form computationally: it
   forwards any frame received on the first interface directly out on the second interface.
1. Take a look at performance of L2VPN (Bridge Domain), what VPP would call "bridge-domain". I'll create a
   Bridge Domain between ports Te0/1/0/0 and Te0/2/0/0; this includes layer2 learning and FIB, and can tie
   together any number of interfaces into a layer2 broadcast domain.
1. And of course, table stakes: see performance of IPv4 forwarding, with Te0/1/0/0 as 100.64.0.1/30 and
   Te0/2/0/0 as 100.64.1.1/30, and setting a static route for 48.0.0.0/8 and 16.0.0.0/8 back to the loadtester.

... making a grand total of 48 loadtests. I have my work cut out for me! So I boot up Rhino, which has a
Mellanox ConnectX5-Ex (PCIe v4.0 x16) network card sporting two 100G interfaces, and it can easily keep up
with this 2x10G single interface, and 2x20G LAG, even with 64 byte packets. I am continually amazed that
a full line rate loadtest of small 64 byte packets at a rate of 40Gbps boils down to 59.52Mpps!
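
Those packets-per-second figures follow directly from the 20 bytes of per-frame L1 overhead on ethernet (preamble, start-of-frame delimiter and inter-frame gap). A quick back-of-napkin check in Python, reproducing the line rate numbers that keep coming back throughout this article:

```
# Line rate in packets/sec for a given frame size: every frame carries an
# extra 20 bytes on the wire (7B preamble + 1B SFD + 12B inter-frame gap).
def linerate_pps(link_bps: float, frame_bytes: int) -> float:
    return link_bps / ((frame_bytes + 20) * 8)

for size in (64, 256, 1514):
    print(f"{size:>5}b @ 10G: {linerate_pps(10e9, size) / 1e6:5.2f} Mpps")
print(f"   64b @ 40G: {linerate_pps(40e9, 64) / 1e6:5.2f} Mpps")

# ->   64b @ 10G: 14.88 Mpps   256b @ 10G: 4.53 Mpps   1514b @ 10G: 0.81 Mpps
# ->   64b @ 40G: 59.52 Mpps
```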

For each loadtest, I ramp up the traffic using a [T-Rex loadtester]({{< ref "2021-02-27-coloclue-loadtest" >}})
that I wrote. It starts with a low-pps warmup of 30s, then it ramps up from 0% to a certain line rate
(in this case, 10Gbps L1 for the single TenGig tests, or 20Gbps L1 for the LACP tests), with a
rampup duration of 120s, and finally it holds at that rate for 30s.
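
The real runs drive T-Rex; purely as a sketch of the shape of one run (warmup, linear ramp, hold) and not the actual tooling, the schedule looks something like this in Python, with the rate expressed as a percentage of the target line rate:

```
# A minimal sketch of the warmup/rampup/hold schedule described above. The
# percentages would be handed to the loadtester as its traffic multiplier.
def ramp_schedule(target_pct, warmup_s=30, rampup_s=120, hold_s=30, step_s=5):
    """Yield (elapsed_seconds, rate_percent) pairs for one loadtest run."""
    t = 0
    while t < warmup_s:                         # low-pps warmup phase
        yield t, 1.0
        t += step_s
    while t < warmup_s + rampup_s:              # linear ramp from 0% to target
        yield t, target_pct * (t - warmup_s) / rampup_s
        t += step_s
    while t <= warmup_s + rampup_s + hold_s:    # hold at the target rate
        yield t, target_pct
        t += step_s

if __name__ == "__main__":
    # 100% corresponds to e.g. 10Gbps L1 on the single TenGig tests.
    for elapsed, pct in ramp_schedule(100.0):
        print(f"t={elapsed:3d}s  offered rate {pct:5.1f}%")
```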

The following sections describe the methodology and the configuration statements on the ASR9k, with a quick
table of results per test, and a longer set of thoughts all the way at the bottom of this document. So I
encourage you not to skip ahead. Instead, read on and learn a bit (as I did!) from the configuration itself.

**The question to answer**: Can this beasty mini-fridge sustain line rate? Let's go take a look!

## Test 1 - 2x 10G

In this test, I configure a very simple physical environment (this is a good time to take another look at the LLDP table
above). The Cisco is connected with 4x 10G to the switch, Rhino and Hippo are connected with 2x 100G to the switch,
and I have a Dell connected as well with 2x 10G to the switch (this can be very useful to take a look at what's going
on on the wire). The switch is an FS S5860-48SC (with 48x 10G SFP+ ports and 8x 100G QSFP ports), which is a piece of
kit that I highly recommend, by the way.

Its configuration:

```
interface TenGigabitEthernet 0/1
 description Infra: Dell R720xd hvn0:enp5s0f0
 no switchport
 mtu 9216
!
interface TenGigabitEthernet 0/2
 description Infra: Dell R720xd hvn0:enp5s0f1
 no switchport
 mtu 9216
!
interface TenGigabitEthernet 0/7
 description Cust: Fridge Te0/2/0/0
 mtu 9216
 switchport access vlan 20
!
interface TenGigabitEthernet 0/9
 description Cust: Fridge Te0/1/0/0
 mtu 9216
 switchport access vlan 10
!
interface HundredGigabitEthernet 0/53
 description Cust: Rhino HundredGigabitEthernet15/0/1
 mtu 9216
 switchport access vlan 10
!
interface HundredGigabitEthernet 0/54
 description Cust: Rhino HundredGigabitEthernet15/0/0
 mtu 9216
 switchport access vlan 20
!
monitor session 1 destination interface TenGigabitEthernet 0/1
monitor session 1 source vlan 10 rx
monitor session 2 destination interface TenGigabitEthernet 0/2
monitor session 2 source vlan 20 rx
```

What this does is connect Rhino's Hu15/0/1 and the Fridge's Te0/1/0/0 in VLAN 10, and send a read-only copy of all
traffic to the Dell's enp5s0f0 interface. Similarly, Rhino's Hu15/0/0 and the Fridge's Te0/2/0/0 sit in VLAN 20, with a
copy of the traffic going to the Dell's enp5s0f1 interface. I can now run `tcpdump` on the Dell to see what's going back
and forth.

In case you're curious: the monitor destinations on the Te0/1 and Te0/2 ports will saturate if both machines are transmitting at
a combined rate of over 10Gbps. If this is the case, the traffic that doesn't fit is simply dropped from the monitor
port, but it's of course forwarded correctly between the original Hu0/53 and Te0/9 ports. In other words: the monitor
session has no performance penalty. It's merely a convenience to be able to take a look on ports where `tcpdump` is
not easily available (ie. both VPP as well as the ASR9k in this case!)
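
As an aside: instead of `tcpdump`, a few lines of Python with scapy do the same job of peeking at the mirrored traffic on the Dell. This is just a convenience sketch, assuming scapy is installed and using the interface name from the switch config above; it needs to run as root:

```
# Print a one-line summary for a handful of frames mirrored into enp5s0f0.
from scapy.all import sniff

sniff(iface="enp5s0f0", count=10, store=False, prn=lambda pkt: print(pkt.summary()))
```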

### Test 1.1: 10G L2 Cross Connect

This is a simple matter of virtually patching one interface into the other: I choose the first port on blades 1 and 2, and
tie them together in a `p2p` cross connect. In my [VLAN Gymnastics]({{< ref "2022-02-14-vpp-vlan-gym" >}}) post, I
called this an `l2 xconnect`, and although the configuration statements are a bit different, the purpose and expected
semantics are identical:

```
interface TenGigE0/1/0/0
 l2transport
 !
!
interface TenGigE0/2/0/0
 l2transport
 !
!
l2vpn
 xconnect group loadtest
  p2p xc01
   interface TenGigE0/1/0/0
   interface TenGigE0/2/0/0
  !
 !
```

The results of this loadtest look promising - although I can already see that the port will not sustain
line rate at 64 byte packets, which I find somewhat surprising. Both when using multiple flows (ie. random
source and destination IP addresses), as well as when using a single flow (repeating the same src/dst packet),
the machine tops out at around 20.3 Mpps, which is 68% of line rate (29.76 Mpps). Fascinating!

Loadtest   | Unidirectional (pps) | L1 Unidirectional (bps) | Bidirectional (pps) | L1 Bidirectional (bps)
---------- | ---------- | ------------ | ----------- | ------------
1514b      | 810 kpps   | 9.94 Gbps    | 1.61 Mpps   | 19.77 Gbps
imix       | 3.25 Mpps  | 9.94 Gbps    | 6.46 Mpps   | 19.78 Gbps
64b Multi  | 14.66 Mpps | 9.86 Gbps    | 20.3 Mpps   | 13.64 Gbps
64b Single | 14.28 Mpps | 9.60 Gbps    | 20.3 Mpps   | 13.62 Gbps

### Test 1.2: 10G L2 Bridge Domain

I then keep the two physical interfaces in `l2transport` mode, but change the type of l2vpn into a
`bridge-domain`, which I described in my [VLAN Gymnastics]({{< ref "2022-02-14-vpp-vlan-gym" >}}) post
as well. VPP and Cisco IOS/XR semantics look very similar indeed; they differ really only in the way
in which the configuration is expressed:

```
interface TenGigE0/1/0/0
 l2transport
 !
!
interface TenGigE0/2/0/0
 l2transport
 !
!
l2vpn
 xconnect group loadtest
 !
 bridge group loadtest
  bridge-domain bd01
   interface TenGigE0/1/0/0
   !
   interface TenGigE0/2/0/0
   !
  !
 !
!
```

Here, I find that performance in one direction is line rate, and with 64b packets ever so slightly better
than in the L2 crossconnect test above. In both directions though, the router struggles to attain line rate
with small packets, delivering 64% (or 19.0 Mpps) of the total offered 29.76 Mpps back to the loadtester.

Loadtest   | Unidirectional (pps) | L1 Unidirectional (bps) | Bidirectional (pps) | L1 Bidirectional (bps)
---------- | ---------- | ------------ | ----------- | ------------
1514b      | 807 kpps   | 9.91 Gbps    | 1.63 Mpps   | 19.96 Gbps
imix       | 3.24 Mpps  | 9.92 Gbps    | 6.47 Mpps   | 19.81 Gbps
64b Multi  | 14.82 Mpps | 9.96 Gbps    | 19.0 Mpps   | 12.79 Gbps
64b Single | 14.86 Mpps | 9.98 Gbps    | 19.0 Mpps   | 12.81 Gbps

I would say that in practice, the performance of a bridge-domain is comparable to that of an L2XC.

### Test 1.3: 10G L3 IPv4 Routing

This is the most straightforward test: the T-Rex loadtester in this case is sourcing traffic from
100.64.0.2 on its first interface, and 100.64.1.2 on its second interface. It will send ARP requests for the
nexthops (100.64.0.1 and 100.64.1.1, the Cisco), but the Cisco will not maintain an ARP table entry for the
loadtester, so I have to add static ARP entries for it. Otherwise, this is a simple test, which stress
tests the IPv4 forwarding path:

```
interface TenGigE0/1/0/0
 ipv4 address 100.64.0.1 255.255.255.252
!
interface TenGigE0/2/0/0
 ipv4 address 100.64.1.1 255.255.255.252
!
router static
 address-family ipv4 unicast
  16.0.0.0/8 100.64.1.2
  48.0.0.0/8 100.64.0.2
 !
!
arp vrf default 100.64.0.2 043f.72c3.d048 ARPA
arp vrf default 100.64.1.2 043f.72c3.d049 ARPA
!
```

Alright, so the cracks definitely show on this loadtest. The performance of small routed packets is quite
poor, weighing in at 35% of line rate in the unidirectional test, and 43% in the bidirectional test. It seems
that the ASR9k (at least in this hardware profile of `l3xl`) is not happy forwarding small packets at line rate,
and the routing performance is indeed significantly lower than the L2VPN performance. That's good to know!

Loadtest   | Unidirectional (pps) | L1 Unidirectional (bps) | Bidirectional (pps) | L1 Bidirectional (bps)
---------- | ---------- | ------------ | ----------- | ------------
1514b      | 815 kpps   | 10.0 Gbps    | 1.63 Mpps   | 19.98 Gbps
imix       | 3.27 Mpps  | 9.99 Gbps    | 6.52 Mpps   | 19.96 Gbps
64b Multi  | 5.14 Mpps  | 3.45 Gbps    | 12.3 Mpps   | 8.28 Gbps
64b Single | 5.25 Mpps  | 3.53 Gbps    | 12.6 Mpps   | 8.51 Gbps

## Test 2 - LACP 2x 20G

Link aggregation ([ref](https://en.wikipedia.org/wiki/Link_aggregation)) means combining or aggregating multiple
network connections in parallel by any of several methods, in order to increase throughput beyond what a single
connection could sustain, to provide redundancy in case one of the links should fail, or both. A link aggregation
group (LAG) is the combined collection of physical ports. Other umbrella terms used to describe the concept include
_trunking_, _bundling_, _bonding_, _channeling_ or _teaming_. Bundling ports together on a Cisco IOS/XR platform
like the ASR9k is done by creating a _Bundle-Ether_ or _BE_ interface. For reference, the same concept in VPP is called
a _BondEthernet_, and in Linux it'll often be referred to as simply a _bond_. They all refer to the same concept.

One thing that immediately comes to mind when thinking about LAGs is: how will the member port be selected for
outgoing traffic? A sensible approach is to hash on the L2 source and/or destination address (ie. the ethernet
hosts on either side of the LAG), but in the case of a router, and certainly in this loadtest, there is only
one MAC address on either side of the LAG. So a different hashing algorithm has to be chosen, preferably one based on
the source and/or destination _L3_ (IPv4 or IPv6) address. Luckily, both the FS switch and the Cisco ASR9006
support this.
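
Conceptually, member selection boils down to hashing the chosen header fields and taking the result modulo the number of members. A minimal Python sketch of the idea (the real hardware uses its own hash function; the member names are simply the two ports of _BE1_ in this setup):

```
# Hash-based member selection on a two-port LAG, keyed on L3 addresses.
import ipaddress
import zlib

MEMBERS = ["Te0/1/0/0", "Te0/1/0/1"]

def pick_member(src: str, dst: str) -> str:
    key = ipaddress.ip_address(src).packed + ipaddress.ip_address(dst).packed
    return MEMBERS[zlib.crc32(key) % len(MEMBERS)]

# A single flow always hashes to the same member port ...
print(pick_member("16.0.0.1", "48.0.0.1"))
print(pick_member("16.0.0.1", "48.0.0.1"))       # identical result, every time

# ... while many flows spread out over both members.
for i in range(4):
    print(pick_member(f"16.0.0.{i}", f"48.0.0.{i}"))
```

This is also exactly why the single-flow loadtests further down can never use more than one member of a LAG.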

First I'll reconfigure the switch, and then reconfigure the router to use the newly created 2x 20G LAG ports.

```
interface TenGigabitEthernet 0/7
 description Cust: Fridge Te0/2/0/0
 port-group 2 mode active
!
interface TenGigabitEthernet 0/8
 description Cust: Fridge Te0/2/0/1
 port-group 2 mode active
!
interface TenGigabitEthernet 0/9
 description Cust: Fridge Te0/1/0/0
 port-group 1 mode active
!
interface TenGigabitEthernet 0/10
 description Cust: Fridge Te0/1/0/1
 port-group 1 mode active
!
interface AggregatePort 1
 mtu 9216
 aggregateport load-balance dst-ip
 switchport access vlan 10
!
interface AggregatePort 2
 mtu 9216
 aggregateport load-balance dst-ip
 switchport access vlan 20
!
```

And after the Cisco is converted to use _Bundle-Ether_ as well, the link status looks like this:

```
fsw0#show int ag1
...
Aggregate Port Informations:
        Aggregate Number: 1
        Name: "AggregatePort 1"
        Members: (count=2)
        Lower Limit: 1
        TenGigabitEthernet 0/9    Link Status: Up   Lacp Status: bndl
        TenGigabitEthernet 0/10   Link Status: Up   Lacp Status: bndl
        Load Balance by: Destination IP

fsw0#show int usage up
Interface                        Bandwidth   Average Usage    Output Usage     Input Usage
-------------------------------- ----------- ---------------- ---------------- ----------------
TenGigabitEthernet 0/1           10000 Mbit  0.0000018300%    0.0000013100%    0.0000023500%
TenGigabitEthernet 0/2           10000 Mbit  0.0000003450%    0.0000004700%    0.0000002200%
TenGigabitEthernet 0/7           10000 Mbit  0.0000012350%    0.0000022900%    0.0000001800%
TenGigabitEthernet 0/8           10000 Mbit  0.0000011450%    0.0000021800%    0.0000001100%
TenGigabitEthernet 0/9           10000 Mbit  0.0000011350%    0.0000022300%    0.0000000400%
TenGigabitEthernet 0/10          10000 Mbit  0.0000016700%    0.0000022500%    0.0000010900%
HundredGigabitEthernet 0/53      100000 Mbit 0.00000011900%   0.00000023800%   0.00000000000%
HundredGigabitEthernet 0/54      100000 Mbit 0.00000012500%   0.00000025000%   0.00000000000%
AggregatePort 1                  20000 Mbit  0.0000014600%    0.0000023400%    0.0000005799%
AggregatePort 2                  20000 Mbit  0.0000019575%    0.0000023950%    0.0000015200%
```

It's clear that both `AggregatePort` interfaces have 20Gbps of capacity and are using an L3
loadbalancing policy. Cool beans!

If you recall my loadtest theory in, for example, my [Netgate 6100 review]({{< ref "2021-11-26-netgate-6100" >}}),
it can sometimes be useful to run a single-flow loadtest, in which the source and destination
IP:Port stay the same. As I'll demonstrate, it's not only relevant for PC based routers like ones built
on VPP; it can also be very relevant for silicon vendors and high-end routers!

### Test 2.1 - 2x 20G LAG L2 Cross Connect

I scratched my head for a little while (and with a little while I mean more like an hour or so!), because usually
I come across _Bundle-Ether_ interfaces which have hashing turned on in the interface stanza, but in my
first loadtest run I did not see any traffic on the second member port. I then found out that for L2VPN services
I need the `l2vpn load-balancing flow src-dst-ip` setting applied rather than the interface setting:

```
interface Bundle-Ether1
 description LAG1
 l2transport
 !
!
interface TenGigE0/1/0/0
 bundle id 1 mode active
!
interface TenGigE0/1/0/1
 bundle id 1 mode active
!
interface Bundle-Ether2
 description LAG2
 l2transport
 !
!
interface TenGigE0/2/0/0
 bundle id 2 mode active
!
interface TenGigE0/2/0/1
 bundle id 2 mode active
!
l2vpn
 load-balancing flow src-dst-ip
 xconnect group loadtest
  p2p xc01
   interface Bundle-Ether1
   interface Bundle-Ether2
  !
 !
!
```

Overall, the router performs as well as can be expected. In the single-flow 64 byte test, however, because
the hashing over the available members in the LAG is done on L3 information, the router is forced to always
choose the same member and effectively performs at 10G throughput, so it'll get a pass from me on the 64b
single test. In the multi-flow test, I can see that it does indeed forward over both LAG members, however
it reaches only 34.9 Mpps, which is 59% of line rate.

Loadtest   | Unidirectional (pps) | L1 Unidirectional (bps) | Bidirectional (pps) | L1 Bidirectional (bps)
---------- | ---------- | ------------ | ----------- | ------------
1514b      | 1.61 Mpps  | 19.8 Gbps    | 3.23 Mpps   | 39.64 Gbps
imix       | 6.40 Mpps  | 19.8 Gbps    | 12.8 Mpps   | 39.53 Gbps
64b Multi  | 29.44 Mpps | 19.8 Gbps    | 34.9 Mpps   | 23.48 Gbps
64b Single | 14.86 Mpps | 9.99 Gbps    | 29.8 Mpps   | 20.0 Gbps

### Test 2.2 - 2x 20G LAG Bridge Domain

Just like with Test 1.2 above, I can now transform this service from a Cross Connect into a fully formed
L2 bridge, by simply putting the two _Bundle-Ether_ interfaces in a _bridge-domain_ together, again
being careful to apply the L3 load-balancing policy on the `l2vpn` scope rather than the `interface`
scope:

```
l2vpn
 load-balancing flow src-dst-ip
 no xconnect group loadtest
 bridge group loadtest
  bridge-domain bd01
   interface Bundle-Ether1
   !
   interface Bundle-Ether2
   !
  !
 !
!
```

The results for this test show that L2XC is indeed computationally cheaper than _bridge-domain_ work. With
imix and 1514b packets, the router is fine and forwards 20G and 40G respectively. When the bridge is slammed
with 64 byte packets, its performance reaches only 65% with multiple flows in the unidirectional test, and 47%
in the bidirectional loadtest. I found the performance difference with the L2 crossconnect above remarkable.

The single-flow loadtest cannot meaningfully stress both members of the LAG due to the src/dst being identical:
the best I can expect here is 10G performance, regardless of how many LAG members there are.

Loadtest   | Unidirectional (pps) | L1 Unidirectional (bps) | Bidirectional (pps) | L1 Bidirectional (bps)
---------- | ---------- | ------------ | ----------- | ------------
1514b      | 1.61 Mpps  | 19.8 Gbps    | 3.22 Mpps   | 39.56 Gbps
imix       | 6.39 Mpps  | 19.8 Gbps    | 12.8 Mpps   | 39.58 Gbps
64b Multi  | 20.12 Mpps | 13.5 Gbps    | 28.2 Mpps   | 18.93 Gbps
64b Single | 9.49 Mpps  | 6.38 Gbps    | 19.0 Mpps   | 12.78 Gbps

### Test 2.3 - 2x 20G LAG L3 IPv4 Routing

And finally I turn my attention to the usual suspect: IPv4 routing. Here, I simply remove the `l2vpn`
stanza altogether, and remember to put the load-balancing policy on the _Bundle-Ether_ interfaces.
This ensures that upon transmission, both members of the LAG are used. That is, if and only if the
IP src/dst addresses differ, which is the case in most, but not all, of my loadtests :-)

```
no l2vpn

interface Bundle-Ether1
 description LAG1
 ipv4 address 100.64.1.1 255.255.255.252
 bundle load-balancing hash src-ip
!
interface TenGigE0/1/0/0
 bundle id 1 mode active
!
interface TenGigE0/1/0/1
 bundle id 1 mode active
!
interface Bundle-Ether2
 description LAG2
 ipv4 address 100.64.0.1 255.255.255.252
 bundle load-balancing hash src-ip
!
interface TenGigE0/2/0/0
 bundle id 2 mode active
!
interface TenGigE0/2/0/1
 bundle id 2 mode active
!
```

The LAG is fine at forwarding IPv4 traffic at 1514b and imix - full line rate, and 40Gbps of traffic is
passed in the bidirectional test. With the 64b frames though, the forwarding performance is not line rate
but rather 84% of line rate in one direction, and 76% of line rate in the bidirectional test.

And once again, the single-flow loadtest cannot make use of more than one member port in the LAG, so it
will be constrained to 10G throughput -- that said, it performs at only 42.6% of line rate there.

Loadtest   | Unidirectional (pps) | L1 Unidirectional (bps) | Bidirectional (pps) | L1 Bidirectional (bps)
---------- | ---------- | ------------ | ----------- | ------------
1514b      | 1.63 Mpps  | 20.0 Gbps    | 3.25 Mpps   | 39.92 Gbps
imix       | 6.51 Mpps  | 19.9 Gbps    | 13.04 Mpps  | 39.91 Gbps
64b Multi  | 12.52 Mpps | 8.41 Gbps    | 22.49 Mpps  | 15.11 Gbps
64b Single | 6.49 Mpps  | 4.36 Gbps    | 11.62 Mpps  | 7.81 Gbps

## Bonus - ASR9k linear scaling

{{< image width="300px" float="right" src="/assets/asr9006/loaded.png" alt="ASR9k Loaded" >}}

As I've shown above, the loadtests often topped out at well under line rate for tests with small packet sizes, but I
can also see that the LAG tests offered higher performance, although not quite double that of single ports. I can't
help but wonder: is this perhaps ***a per-port limit*** rather than a router-wide limit?

To answer this question, I decide to pull out all the stops and populate the ASR9k with as many XFPs as I have in my
stash, which is 9 pieces. One (Te0/0/0/0) still goes to the uplink, because the machine should be carrying IGP and full
BGP tables at all times; this leaves me with 8x 10G XFPs, which I decide is a nice opportunity to combine all three
scenarios in one test:

1. Test 1.1 with Te0/1/0/2 cross connected to Te0/2/0/2, with a loadtest at 20Gbps.
1. Test 1.2 with Te0/1/0/3 in a bridge-domain with Te0/2/0/3, also with a loadtest at 20Gbps.
1. Test 2.3 with Te0/1/0/0+Te0/2/0/0 on one end, and Te0/1/0/1+Te0/2/0/1 on the other end, with an IPv4
   loadtest at 40Gbps.

### 64 byte packets

It would be unfair to use single-flow on the LAG, considering the hashing is on L3 source and/or destination IPv4
addresses, so really only one member port would be used. To avoid this pitfall, I run that one with `vm=var2`. On the
other two tests, however, I do run the most stringent traffic pattern: single-flow loadtests. So off I go, firing
up ***three T-Rex*** instances.

First, the 10G L2 Cross Connect test (approximately 17.7Mpps):

```
Tx bps L2  | 7.64 Gbps  | 7.64 Gbps  | 15.27 Gbps
Tx bps L1  | 10.02 Gbps | 10.02 Gbps | 20.05 Gbps
Tx pps     | 14.92 Mpps | 14.92 Mpps | 29.83 Mpps
Line Util. | 100.24 %   | 100.24 %   |
---        |            |            |
Rx bps     | 4.52 Gbps  | 4.52 Gbps  | 9.05 Gbps
Rx pps     | 8.84 Mpps  | 8.84 Mpps  | 17.67 Mpps
```

Then, the 10G Bridge Domain test (approximately 17.0Mpps):

```
Tx bps L2  | 7.61 Gbps  | 7.61 Gbps  | 15.22 Gbps
Tx bps L1  | 9.99 Gbps  | 9.99 Gbps  | 19.97 Gbps
Tx pps     | 14.86 Mpps | 14.86 Mpps | 29.72 Mpps
Line Util. | 99.87 %    | 99.87 %    |
---        |            |            |
Rx bps     | 4.36 Gbps  | 4.36 Gbps  | 8.72 Gbps
Rx pps     | 8.51 Mpps  | 8.51 Mpps  | 17.02 Mpps
```

Finally, the 20G LAG IPv4 forwarding test (approximately 24.4Mpps), noting that the _Line Util._ here is that of the 100G
loadtester ports, so 20% is expected:

```
Tx bps L2  | 15.22 Gbps | 15.23 Gbps | 30.45 Gbps
Tx bps L1  | 19.97 Gbps | 19.99 Gbps | 39.96 Gbps
Tx pps     | 29.72 Mpps | 29.74 Mpps | 59.46 Mpps
Line Util. | 19.97 %    | 19.99 %    |
---        |            |            |
Rx bps     | 5.68 Gbps  | 6.82 Gbps  | 12.51 Gbps
Rx pps     | 11.1 Mpps  | 13.33 Mpps | 24.43 Mpps
```

To summarize: in the above tests I am pumping 80Gbit (which is 8x 10Gbit at full line rate with 64 byte packets, in
other words 119Mpps) into the machine, and it's returning 30.28Gbps (or 59.2Mpps, roughly 38% of the offered L1
bandwidth) of that traffic back to the loadtesters. Features: yes; linerate: nope!

### 256 byte packets

Seeing the lowest performance of the router coming in at 8.5Mpps (or 57% of linerate), it stands to reason
that sending 256 byte packets will stay under the observed per-port packets/sec limits, so I decide to restart
the loadtesters with 256b packets. The expected ethernet frame is now 256 bytes + 20 bytes of overhead, or 2208 bits
on the wire, so ~4.53Mpps fit into a 10G link. Immediately, all ports go up to full capacity. As seen from
the Cisco's commandline:

```
RP/0/RSP0/CPU0:fridge#show interfaces | utility egrep 'output.*packets/sec' | exclude 0 packets
Mon Feb 21 22:14:02.250 UTC
  5 minute output rate 18390237000 bits/sec, 9075919 packets/sec
  5 minute output rate 18391127000 bits/sec, 9056714 packets/sec
  5 minute output rate 9278278000 bits/sec, 4547012 packets/sec
  5 minute output rate 9242023000 bits/sec, 4528937 packets/sec
  5 minute output rate 9287749000 bits/sec, 4563507 packets/sec
  5 minute output rate 9273688000 bits/sec, 4537368 packets/sec
  5 minute output rate 9237466000 bits/sec, 4519367 packets/sec
  5 minute output rate 9289136000 bits/sec, 4562365 packets/sec
  5 minute output rate 9290096000 bits/sec, 4554872 packets/sec
```

The first two lines there are the _Bundle-Ether_ interfaces _BE1_ and _BE2_, and the others are the TenGigE
ports. You can see that each one is forwarding the expected 4.53Mpps, and this lines up perfectly with T-Rex,
which is sending 10Gbps of L1 and 9.28Gbps of L2 (the difference here is the ethernet overhead of 20 bytes
per frame, or 4.53 * 160 bits = 724Mbps), and it's receiving all of that traffic back on the other side, which
is good.
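
A quick cross-check of that arithmetic in Python, using the same 20 bytes of per-frame L1 overhead as before:

```
# 256 byte frames on a 10G port: packet rate, L2 bit rate, and the L1/L2 gap.
FRAME = 256      # ethernet frame size (L2) in bytes
OVERHEAD = 20    # preamble + inter-frame gap, per frame
LINK = 10e9      # 10G port

pps = LINK / ((FRAME + OVERHEAD) * 8)
print(f"per-port rate : {pps / 1e6:.2f} Mpps")                  # ~4.53 Mpps
print(f"L2 rate       : {pps * FRAME * 8 / 1e9:.2f} Gbps")      # ~9.28 Gbps
print(f"L1 - L2 delta : {pps * OVERHEAD * 8 / 1e6:.0f} Mbps")   # ~725 Mbps (the article
                                                                # rounds 4.53 Mpps x 160 bits to 724 Mbps)
```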

This clearly demonstrates the hypothesis that the machine is ***per-port pps-bound***.

So the conclusion is that the A9K-RSP440-SE will typically forward only about 8Mpps on a single TenGigE port, and
about 13Mpps on a two-member LAG. However, it will do this _for every port_, and with at least 8x 10G ports saturated,
it remained fully responsive, OSPF and iBGP adjacencies stayed up, and ping times on the regular (Te0/0/0/0)
uplink port were smooth.

## Results

### 1514b and imix: OK!

{{< image width="1200px" src="/assets/asr9006/results-imix.png" alt="ASR9k Results - imix" >}}

Let me start by showing a side-by-side comparison of the imix tests in all scenarios in the graph above. The
graph for the 1514b tests looks very similar, differing only in the left Y axis: imix is a 3.2Mpps stream, while
1514b saturates the 10G port already at 810Kpps. But obviously, the router can do this just fine; even when used
on 8 ports, it doesn't mind at all. As I later learned, any traffic mix of 256b packets or larger, which works out
to at most 4.5Mpps per port, forwards fine in any configuration.

### 64b: Not so much :)

{{< image width="1200px" src="/assets/asr9006/results-64b.png" alt="ASR9k Results - 64b" >}}

{{< image width="1200px" src="/assets/asr9006/results-lacp-64b.png" alt="ASR9k Results - LACP 64b" >}}

These graphs show the throughput of the ASR9006 with a pair of A9K-RSP440-SE route switch processors. They
are rated at 440Gbps per slot, but their packets/sec rates are significantly lower than line rate. The top
graph shows the tests with 10G ports, and the bottom graph shows the same tests but with 2x 10G ports in a
_Bundle-Ether_ LAG.

In an ideal situation, each test would follow the loadtester up to completion, and there would be no horizontal
lines breaking out partway through. As I showed, some of the loadtests really performed poorly in terms of
packets/sec forwarded. Understandably, the 20G LAG with a single flow can only utilize one member port (which is
logical), but it then managed to push through only 6Mpps or so. Other tests did better, but overall I must say the
results were lower than I had expected.

### That juxtaposition

At the very top of this article I alluded to what I think is a cool juxtaposition. On the one hand, we have these
beasty ASR9k routers, running idle at 2.2kW for 24x 10G and 40x 1G ports (as is the case for the IP-Max router that
I took out for a spin here). They are large (10U of rackspace), heavy (40kg loaded) and expensive (who cares about
list price, the street price is easily $10'000,- apiece).

On the other hand, we have these PC based machines with Vector Packet Processing, operating at as low as 19W for 2x10G,
2x1G and 4x2.5G ports (like the [Netgate 6100]({{< ref "2021-11-26-netgate-6100" >}})) and offering roughly equal
performance per port, except having to drop only $700,- apiece. The VPP machines come with ~infinite RAM; even a
16GB machine will run much larger routing tables, including full BGP and so on - there is no (need for) TCAM, and yet
routing performance scales out with CPU cores and larger CPU instruction/data caches. Looking at my Ryzen 5950X based
Hippo/Rhino VPP machines, they *can* sustain line rate 64b packets on their 10G ports, due to each CPU core being able
to process around 22.3Mpps, and the machine having 15 usable CPU cores. Intel and Mellanox 100G network cards are
affordable; the whole machine with 2x100G, 4x10G and 4x1G will set me back about $3'000,- in 1U, and runs 265 Watts
when fully loaded.

See an extended rationale with backing data in my [FOSDEM'22 talk](/media/fosdem22/index.html).

## Conclusion

I set out to answer three questions in this article, and I'm ready to opine now:

1. Unidirectional vs bidirectional: there is an impact - bidirectional tests (stressing both ingress and egress
   of each individual router port) have lower performance, notably with packets smaller than 256b.
1. LACP performance penalty: there is an impact - the 64b multiflow loadtests on a LAG obtained 59%, 47% and 42%
   of line rate (for Tests 2.1-2.3), while on single ports they obtained 68%, 64% and 43% (for Tests 1.1-1.3). So
   while aggregate throughput grows with the LACP _Bundle-Ether_ ports, individual port throughput is reduced.
1. The router performs at line rate for 1514b, imix, and really anything beyond 256b packets. However, it does _not_
   sustain line rate at 64b packets. Some tests passed with a unidirectional loadtest, but all tests failed
   with bidirectional loadtests.

After all of these tests, I have to say I am ***still a huge fan*** of the ASR9k. I had kind of expected that it
would perform at line rate for any/all of my tests, but the theme became clear after a few - the ports will only
forward between 8Mpps and 11Mpps (out of the needed 14.88Mpps), but _every_ port will do that, which means
the machine will still scale up significantly in practice. But for business internet, colocation, and non-residential
purposes, I would argue that routing _stability_ is most important, and with regards to performance, I would argue
that _aggregate bandwidth_ is more important than pure _packets/sec_ performance. Finally, the ASR in Cisco ASR9k stands
for _Advanced Services Router_, and being able to mix-and-match MPLS, L2VPN, bridges, encapsulation and tunneling, and
still have an expectation of 8-10Mpps per 10G port, is absolutely reasonable. The ASR9k is a very competent machine.

### Loadtest data

I've dropped all loadtest data [here](/assets/asr9006/asr9006-loadtest.tar.gz), and if you'd like to play around with
the data, take a look at the HTML files in [this directory](/assets/asr9006/); they were built with Michal's
[trex-loadtest-viz](https://github.com/wejn/trex-loadtest-viz/) scripts.

## Acknowledgements

I wanted to give a shout-out to Fred and the crew at IP-Max for allowing me to play with their router during
these loadtests. I'll be configuring it to replace their router at NTT in March, so if you have a connection
to SwissIX via IP-Max, you will be notified ahead of time as we plan the maintenance window.

We call these things Fridges in the IP-Max world, because they emit so much cool air when they start :) The
ASR9001 is the microfridge, this ASR9006 is the minifridge, and the ASR9010 is the regular fridge.