---
date: "2023-02-12T09:51:23Z"
title: 'Review: Compulab Fitlet2'
---

{{< image width="400px" float="right" src="/assets/fitlet2/Fitlet2-stock.png" alt="Fitlet" >}}

A while ago, in June 2021, we were discussing home routers that can keep up with 1G+ internet
connections in the [CommunityRack](https://www.communityrack.org) Telegram channel. Of course,
at IPng Networks we are fond of the Supermicro Xeon D1518 [[ref]({% post_url 2021-09-21-vpp-7 %})],
which has a bunch of 10Gbit X522 and 1Gbit i350 and i210 Intel NICs, but it does come at a certain
price.

For smaller applications, the PC Engines APU6 [[ref]({% post_url 2021-07-19-pcengines-apu6 %})] is
kind of cool and definitely more affordable. But, in this chat, Patrick offered an alternative:
the [[Fitlet2](https://fit-iot.com/web/products/fitlet2/)], a small, passively cooled,
and expandable IoT-esque machine.

Fast forward 18 months, and Patrick decided to sell off his units, so I bought one off of him
and decided to loadtest it. Considering the price tag (the unit I will be testing ships for
around $400) and its ability to take 1G SFP fiber optics, it may be a pretty cool one!

# Executive Summary

**TL/DR: Definitely a cool VPP router, 3x 1Gbit line rate, A-, would buy again**

With some care on the VPP configuration (notably RX/TX descriptors), this unit can handle L2XC at
(almost) line rate in both directions (2.94Mpps out of a theoretical 2.97Mpps), with one VPP worker
thread, which is not just good, it's _Good Enough™_, and at that point there is still plenty of
headroom on the CPU, as the Atom E3950 has 4 cores.

In IPv4 routing, using two VPP worker threads and 2 RX/TX queues on each NIC, the machine keeps up
with 64 byte traffic in both directions (i.e. 2.97Mpps), again with compute power to spare, and while
using only two out of four CPU cores on the Atom E3950.

For a $400,- machine that draws close to 11 Watts fully loaded, and sporting 8GB of memory (at a max
of 16GB), this Fitlet2 is a gem: it will easily keep up with 3x 1Gbit in a production environment,
while carrying multiple full BGP tables (900K IPv4 and 170K IPv6), with room to spare. _It's a classy
little machine!_

## Detailed findings

{{< image width="250px" float="right" src="/assets/fitlet2/Fitlet2-BottomOpen.png" alt="Fitlet2 Open" >}}

The first thing that I noticed when it arrived is how small it is! The design of the Fitlet2 has a
motherboard with a non-removable Atom E3950 CPU running at 1.6GHz, from the _Goldmont_ series. This
is a notoriously slow, budget CPU which comes with 4C/4T; each CPU thread has 24kB of L1
and 1MB of L2 cache, and there is no L3 cache on this CPU at all. That would mean performance in
applications like VPP (which try to leverage these caches) will be poorer -- the main question on
my mind is: does the CPU have enough __oompff__ to keep up with the 1G network cards? I'll want this
CPU to be able to handle roughly 4.5Mpps in total, in order for the Fitlet2 to count itself amongst
the _wirespeed_ routers.

Looking further, the Fitlet2 has one HDMI and one MiniDP port, two USB2 and two USB3 ports, two Intel
i211 NICs with RJ45 ports (these are 1Gbit). There's a helpful MicroSD slot, two LEDs and an audio
in- and output 3.5mm jack. The power button does worry me a little bit: I feel like just brushing
against it may turn the machine off. I do appreciate the cooling situation - the top finned plate
mates with the CPU on the top of the motherboard, and the bottom bracket holds a sizable aluminium
cooling block which further helps dissipate heat, without needing any active cooling. The Fitlet
folks claim this machine can run in environments anywhere between -50C and +112C, which I won't be
doing :)

{{< image width="400px" float="right" src="/assets/fitlet2/Fitlet2+FACET.png" alt="Fitlet2" >}}

Inside, there's a single DDR3 SODIMM slot for memory (the one I have came with 8GB at 1600MT/s) and
a custom, albeit open specification, expansion board called a __FACET-Card__, which stands for
**F**unction **A**nd **C**onnectivity **E**xtension **T**-Card, well okay then! The __FACET__ card
in this little machine sports one extra Intel i210-IS NIC, an M.2 slot for an SSD, and an M.2 E-key
slot for a WiFi card. The NIC is a 1Gbit SFP capable device. You can see its optic cage on the _FACET_
card above, next to the yellow CMOS / clock battery.

The whole thing is fed by a 12V power brick delivering 2A, and a nice touch is that the barrel
connector has a plastic bracket that locks it into the chassis by turning it 90 degrees, so it won't
flap around in the breeze and detach. I wish other embedded PCs would ship with those, as I've been
fumbling around in 19" racks that are, let me say, less tightly cable organized, and may or may not
have disconnected the CHIX routeserver at some point in the past. Sorry, Max :)

For the curious, here's a list of interesting details: [[lspci](/assets/fitlet2/lspci.txt)] -
[[dmidecode](/assets/fitlet2/dmidecode.txt)] -
[[likwid-topology](/assets/fitlet2/likwid-topology.txt)] - [[dmesg](/assets/fitlet2/dmesg.txt)].

## Preparing the Fitlet2

First, I grab a USB key and install Debian _Bullseye_ (11.5) on it, using the UEFI installer. After
booting, I carry through the instructions from my [[VPP Production]({% post_url 2021-09-21-vpp-7 %})]
post. Notably, I create the `dataplane` namespace, run an SSH and SNMP agent there, and set
`isolcpus=1-3` on the kernel commandline so that I can give three worker threads to VPP, but I start
off giving it only one (1) worker thread, because this way I can take a look at what the performance
of a single CPU is, before scaling out to the three (3) threads that this CPU can offer -- a sketch of
the corresponding `startup.conf` stanza follows the list below. I also take the defaults for DPDK,
notably allowing the DPDK poll-mode-drivers to take their proposed defaults:

* **GigabitEthernet1/0/0**: Intel Corporation I211 Gigabit Network Connection (rev 03)
  > rx: queues 1 (max 2), desc 512 (min 32 max 4096 align 8) <br />
  > tx: queues 2 (max 2), desc 512 (min 32 max 4096 align 8)
* **GigabitEthernet3/0/0**: Intel Corporation I210 Gigabit Fiber Network Connection (rev 03)
  > rx: queues 1 (max 4), desc 512 (min 32 max 4096 align 8) <br />
  > tx: queues 2 (max 4), desc 512 (min 32 max 4096 align 8)

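For reference, the CPU pinning in `startup.conf` looks roughly like the sketch below -- this is not
the literal file from this machine (that one follows the layout from the VPP Production post
referenced above), but it shows the gist of running a single worker on an isolated core:

```
cpu {
  ## Keep the main thread on core 0, and run (for now) a single worker
  ## on the first isolated core; this matches isolcpus=1-3 above.
  main-core 0
  corelist-workers 1
}
```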
I observe that the i211 NIC allows for a maximum of two (2) RX/TX queues, while the (older!) i210
will allow for four (4) of them. Another thing that I see here is that there are two (2) TX
queues active, but I only have one worker thread, so what gives? This is because there is always a
main thread in addition to the worker thread(s), and it could be that the main thread needs to /
wants to send traffic out on an interface, so it always attaches to a TX queue of its own.

When exploring new hardware, I find it useful to take a look at the output of a few tactical `show`
commands on the CLI, such as:

**1. What CPU is in this machine?**

```
vpp# show cpu
Model name:                Intel(R) Atom(TM) Processor E3950 @ 1.60GHz
Microarch model (family):  [0x6] Goldmont ([0x5c] Apollo Lake) stepping 0x9
Flags:                     sse3 pclmulqdq ssse3 sse41 sse42 rdrand pqe rdseed aes sha invariant_tsc
Base frequency:            1.59 GHz
```

**2. Which devices are on the PCI bus, at which PCIe speed, and with which driver?**

```
vpp# show pci
Address       Sock  VID:PID    Link Speed   Driver           Product Name  Vital Product Data
0000:01:00.0  0     8086:1539  2.5 GT/s x1  uio_pci_generic
0000:02:00.0  0     8086:1539  2.5 GT/s x1  igb
0000:03:00.0  0     8086:1536  2.5 GT/s x1  uio_pci_generic
```

__Note__: The device at slot `02:00.0` is the second onboard RJ45 i211 NIC. I have used this one
to log in to the Fitlet2 and more easily kill/restart VPP and so on, but I could of course just as
well give it to VPP, in which case I'd have three gigabit interfaces to play with!

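If I ever want to hand that second i211 to VPP as well, it would just be a matter of adding its PCI
address to the `dpdk` stanza in `startup.conf` -- a quick sketch, using the address from the
`show pci` output above:

```
dpdk {
  ## The second onboard i211, currently left to the kernel (igb driver).
  dev 0000:02:00.0
}
```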
**3. What details are known for the physical NICs?**

```
vpp# show hardware GigabitEthernet1/0/0
GigabitEthernet1/0/0               1     up   GigabitEthernet1/0/0
  Link speed: 1 Gbps
  RX Queues:
    queue thread         mode
    0     vpp_wk_0 (1)   polling
  TX Queues:
    TX Hash: [name: hash-eth-l34 priority: 50 description: Hash ethernet L34 headers]
    queue shared thread(s)
    0     no     0
    1     no     1
  Ethernet address 00:01:c0:2a:eb:a8
  Intel e1000
    carrier up full duplex max-frame-size 2048
    flags: admin-up maybe-multiseg tx-offload intel-phdr-cksum rx-ip4-cksum int-supported
    rx: queues 1 (max 2), desc 512 (min 32 max 4096 align 8)
    tx: queues 2 (max 2), desc 512 (min 32 max 4096 align 8)
    pci: device 8086:1539 subsystem 8086:0000 address 0000:01:00.00 numa 0
    max rx packet len: 16383
    promiscuous: unicast off all-multicast on
    vlan offload: strip off filter off qinq off
    rx offload avail:  vlan-strip ipv4-cksum udp-cksum tcp-cksum vlan-filter
                       vlan-extend scatter keep-crc rss-hash
    rx offload active: ipv4-cksum scatter
    tx offload avail:  vlan-insert ipv4-cksum udp-cksum tcp-cksum sctp-cksum
                       tcp-tso multi-segs
    tx offload active: ipv4-cksum udp-cksum tcp-cksum multi-segs
    rss avail:         ipv4-tcp ipv4-udp ipv4 ipv6-tcp-ex ipv6-udp-ex ipv6-tcp
                       ipv6-udp ipv6-ex ipv6
    rss active:        none
    tx burst function: (not available)
    rx burst function: (not available)
```

### Configuring VPP

After this exploratory exercise, I have learned enough about the hardware to be able to take the
Fitlet2 out for a spin. To configure the VPP instance, I turn to
[[vppcfg](https://github.com/pimvanpelt/vppcfg)], which can take a YAML configuration file
describing the desired VPP configuration, and apply it safely to the running dataplane using the VPP
API. I've written a few more posts on how it does that, notably on its [[syntax]({% post_url
2022-03-27-vppcfg-1 %})] and its [[planner]({% post_url 2022-04-02-vppcfg-2 %})]. A complete
configuration guide on vppcfg can be found
[[here](https://github.com/pimvanpelt/vppcfg/blob/main/docs/config-guide.md)].

```
pim@fitlet:~$ sudo dpkg -i {lib,}vpp*23.06*deb
pim@fitlet:~$ sudo apt install python3-pip
pim@fitlet:~$ sudo pip install vppcfg-0.0.3-py3-none-any.whl
```

### Methodology

#### Method 1: Single CPU Thread Saturation

First I will take VPP out for a spin by creating an L2 Cross Connect where any ethernet frame
received on `Gi1/0/0` will be directly transmitted as-is on `Gi3/0/0` and vice versa. This is a
relatively cheap operation for VPP, as it will not have to do any routing table lookups. The
configuration looks like this:

```
pim@fitlet:~$ cat << EOF > l2xc.yaml
interfaces:
  GigabitEthernet1/0/0:
    mtu: 1500
    l2xc: GigabitEthernet3/0/0
  GigabitEthernet3/0/0:
    mtu: 1500
    l2xc: GigabitEthernet1/0/0
EOF
pim@fitlet:~$ vppcfg plan -c l2xc.yaml
[INFO ] root.main: Loading configfile l2xc.yaml
[INFO ] vppcfg.config.valid_config: Configuration validated successfully
[INFO ] root.main: Configuration is valid
[INFO ] vppcfg.vppapi.connect: VPP version is 23.06-rc0~35-gaf4046134
comment { vppcfg sync: 10 CLI statement(s) follow }
set interface l2 xconnect GigabitEthernet1/0/0 GigabitEthernet3/0/0
set interface l2 tag-rewrite GigabitEthernet1/0/0 disable
set interface l2 xconnect GigabitEthernet3/0/0 GigabitEthernet1/0/0
set interface l2 tag-rewrite GigabitEthernet3/0/0 disable
set interface mtu 1500 GigabitEthernet1/0/0
set interface mtu 1500 GigabitEthernet3/0/0
set interface mtu packet 1500 GigabitEthernet1/0/0
set interface mtu packet 1500 GigabitEthernet3/0/0
set interface state GigabitEthernet1/0/0 up
set interface state GigabitEthernet3/0/0 up
[INFO ] vppcfg.reconciler.write: Wrote 11 lines to (stdout)
[INFO ] root.main: Planning succeeded
```

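The planner writes its output to stdout here, and in a moment I'll paste those commands into the VPP
CLI by hand. For larger configs, the plan can also be written to a file and fed to VPP in one go --
roughly like the sketch below, although I'm quoting the output flag from memory, so check
`vppcfg plan --help` for the exact spelling:

```
pim@fitlet:~$ vppcfg plan -c l2xc.yaml -o l2xc.exec
pim@fitlet:~$ vppctl exec /home/pim/l2xc.exec
```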
{{< image width="500px" float="right" src="/assets/fitlet2/l2xc-demo1.png" alt="Fitlet2 L2XC First Try" >}}
|
||||
|
||||
After I paste these commands on the CLI, I start T-Rex in L2 stateless mode, and start T-Rex, I can
|
||||
generate some activity by starting the `bench` profile on port 0 with packets of 64 bytes in size
|
||||
and with varying IPv4 source and destination addresses _and_ ports:
|
||||
|
||||
```
|
||||
tui>start -f stl/bench.py -m 1.48mpps -p 0
|
||||
-t size=64,vm=var2
|
||||
```
|
||||
|
||||
Let me explain a few hilights from the picture to the right. When starting this profile, I
|
||||
specified 1.48Mpps, which is the maximum amount of packets/second that can be generated on a 1Gbit
|
||||
link when using 64 byte frames (the smallest permissible ethernet frames). I do this because the
|
||||
loadtester comes with 10Gbit (and 100Gbit) ports, but the Fitlet2 has only 1Gbit ports. Then, I see
|
||||
that port0 is indeed transmitting (**Tx pps**) 1.48 Mpps, shown in dark blue. This is about 992 Mbps
|
||||
on the wire (the **Tx bps L1**), but due to the overhead of ethernet (each 64 byte ethernet frame
|
||||
needs an additional 20 bytes [[details](https://en.wikipedia.org/wiki/Ethernet_frame)]), so the **Tx
|
||||
bps L2** is about `64/84 * 992.35 = 756.08` Mbps, which lines up.
|
||||
|
||||
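As an aside, the 1.48Mpps figure itself is easy to derive: those 20 bytes of overhead are the 7 byte
preamble, the 1 byte start-of-frame delimiter and the 12 byte inter-frame gap that accompany every
frame on the wire, so:

```
(64 + 20) bytes * 8 bits/byte = 672 bits per frame on the wire
1'000'000'000 bits/sec / 672 bits = 1'488'095 frames/sec, or roughly 1.48Mpps
```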
Then, after the Fitlet2 tries its best to forward those frames from its receiving Gi1/0/0 port onto
its transmitting port Gi3/0/0, they are received again by T-Rex on port 1. Here, I can see that the
**Rx pps** is 1.29 Mpps, with an **Rx bps** of 660.49 Mbps (which is the L2 counter), and in bright
red at the top I see the **drop_rate** is about 95.59 Mbps. In other words, the Fitlet2 is _not
keeping up_.

But, after I take a look at the runtime statistics, I see that the CPU isn't very busy at all:

```
vpp# show run
...
Thread 1 vpp_wk_0 (lcore 1)
Time 23.8, 10 sec internal node vector rate 4.30 loops/sec 1638976.68
vector rates in 1.2908e6, out 1.2908e6, drop 0.0000e0, punt 0.0000e0
Name                         State    Calls     Vectors   Suspends  Clocks  Vectors/Call
GigabitEthernet3/0/0-output  active   6323688   27119700  0         9.14e1  4.29
GigabitEthernet3/0/0-tx      active   6323688   27119700  0         1.79e2  4.29
dpdk-input                   polling  44406936  27119701  0         5.35e2  .61
ethernet-input               active   6323689   27119701  0         1.42e2  4.29
l2-input                     active   6323689   27119701  0         9.94e1  4.29
l2-output                    active   6323689   27119701  0         9.77e1  4.29
```
Very interesting! Notice the line above that says `vector rates in .. out ..`: the worker
thread is receiving only 1.29Mpps, and it is managing to send all of them out as well. When a VPP
worker is busy, each DPDK call will yield many packets, up to 256 in one call, which means the
number of "vectors per call" will rise. Here, I see that DPDK returns an average of
only 0.61 packets each time it polls the NIC, and each time a bunch of those packets is sent off
into the VPP graph, there is an average of 4.29 packets per loop. If the CPU were the bottleneck, it
would look more like 256 in the Vectors/Call column -- so the **bottleneck must be in the NIC**.

Remember above, when I showed the `show hardware` command output? There's a clue in there. The
Fitlet2 has two onboard i211 NICs and one i210 NIC on the _FACET_ card. Despite the lower number,
the i210 is a bit more advanced
[[datasheet](/assets/fitlet2/i210_ethernet_controller_datasheet-257785.pdf)]. If I reverse the
direction of flow (so receiving on the i210 Gi3/0/0, and transmitting on the i211 Gi1/0/0), things
look a fair bit better:

```
vpp# show run
...
Thread 1 vpp_wk_0 (lcore 1)
Time 12.6, 10 sec internal node vector rate 4.02 loops/sec 853956.73
vector rates in 1.4799e6, out 1.4799e6, drop 0.0000e0, punt 0.0000e0
Name                         State    Calls     Vectors   Suspends  Clocks  Vectors/Call
GigabitEthernet1/0/0-output  active   4642964   18652932  0         9.34e1  4.02
GigabitEthernet1/0/0-tx      active   4642964   18652420  0         1.73e2  4.02
dpdk-input                   polling  12200880  18652933  0         3.27e2  1.53
ethernet-input               active   4642965   18652933  0         1.54e2  4.02
l2-input                     active   4642964   18652933  0         1.04e2  4.02
l2-output                    active   4642964   18652933  0         1.01e2  4.02
```

Hey, would you look at that! The line up top here shows a vector rate in of 1.4799e6 (which is
1.48Mpps), and the outbound rate is the same number. In this configuration as well, the DPDK node
isn't even reading that many packets per poll, and the graph traversal carries on average 4.02
packets per run, which means that this CPU can do in excess of 1.48Mpps on one (1) CPU thread. Slick!

So what _is_ the maximum throughput per CPU thread? To show this, I will saturate both ports with
line rate traffic, and see what makes it through the other side. After instructing T-Rex to
perform the following profile:

```
tui>start -f stl/bench.py -m 1.48mpps -p 0 1 \
     -t size=64,vm=var2
```

T-Rex will faithfully start to send traffic on both ports and expect the same amount back from the
Fitlet2 (the _Device Under Test_ or _DUT_). I can see that from T-Rex port 1->0 all traffic makes
its way back, but from port 0->1 there is a little bit of loss (for the 1.48Mpps sent, only 1.43Mpps
is returned). This is the same phenomenon that I explained above -- the i211 NIC is not quite as
good at eating packets as the i210 NIC is.

Even when doing this though, the (still) single threaded VPP is keeping up just fine, CPU-wise:

```
vpp# show run
...
Thread 1 vpp_wk_0 (lcore 1)
Time 13.4, 10 sec internal node vector rate 13.59 loops/sec 122820.33
vector rates in 2.9599e6, out 2.8834e6, drop 0.0000e0, punt 0.0000e0
Name                         State    Calls     Vectors   Suspends  Clocks  Vectors/Call
GigabitEthernet1/0/0-output  active   1822674   19826616  0         3.69e1  10.88
GigabitEthernet1/0/0-tx      active   1822674   19597360  0         1.51e2  10.75
GigabitEthernet3/0/0-output  active   1823770   19826612  0         4.79e1  10.87
GigabitEthernet3/0/0-tx      active   1823770   19029508  0         1.56e2  10.43
dpdk-input                   polling  1827320   39653228  0         1.62e2  21.70
ethernet-input               active   3646444   39653228  0         7.67e1  10.87
l2-input                     active   1825356   39653228  0         4.96e1  21.72
l2-output                    active   1825356   39653228  0         4.58e1  21.72
```

Here we can see 2.96Mpps received (_vector rates in_) while only 2.88Mpps are transmitted (_vector
rates out_). First off, this lines up perfectly with the reporting of T-Rex in the screenshot above,
and it also shows that one direction loses more packets than the other. We're dropping some 80kpps,
but where did they go? Looking at the statistics counters, which include any packets which had
errors in processing, we learn more:

```
vpp# show err
Count        Node                     Reason                             Severity
3109141488   l2-output                L2 output packets                  error
3109141488   l2-input                 L2 input packets                   error
9936649      GigabitEthernet1/0/0-tx  Tx packet drops (dpdk tx failure)  error
32120469     GigabitEthernet3/0/0-tx  Tx packet drops (dpdk tx failure)  error
```

{{< image width="500px" float="right" src="/assets/fitlet2/l2xc-demo2.png" alt="Fitlet2 L2XC Second Try" >}}

Aha! From previous experience I know that when DPDK signals packet drops due to 'tx failure',
this is often because it's trying to hand off the packet to the NIC, which has a ring buffer to
collect packets while the hardware transmits them onto the wire, and this NIC has run out of slots,
which means the packet has to be dropped and a kitten gets hurt. But, I can raise the number of
RX and TX slots by setting them in VPP's `startup.conf` file:

```
dpdk {
  dev default {
    num-rx-desc 512   ## default
    num-tx-desc 1024
  }
  no-multi-seg
}
```

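After restarting VPP with the larger TX ring, a quick sanity check that the new descriptor count took
effect is to glance at the interface details again -- the `tx: queues` line from `show hardware`
should now report `desc 1024`, for example:

```
pim@fitlet:~$ vppctl show hardware GigabitEthernet3/0/0 | grep 'tx: queues'
```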
And with that simple tweak, I've succeeded in configuring the Fitlet2 so that it is capable of
receiving and transmitting 64 byte packets in both directions at (almost) line rate, with **one CPU
thread**.

#### Method 2: Rampup using trex-loadtest.py

For this test, I decide to put the Fitlet2 into L3 mode (up until now it was set up in _L2 Cross
Connect_ mode). To do this, I give the interfaces an IPv4 address and set a route for the loadtest
traffic (which will be coming from `16.0.0.0/8` and going to `48.0.0.0/8`). I will once again look
to `vppcfg` to do this, because manipulating YAML files like this allows me to easily and reliably
swap back and forth, letting `vppcfg` do the mundane chore of figuring out what commands to type, in
which order, safely.

From my existing L2XC dataplane configuration, I switch to L3 like so:

```
pim@fitlet:~$ cat << EOF > l3.yaml
interfaces:
  GigabitEthernet1/0/0:
    mtu: 1500
    lcp: e1-0-0
    addresses: [ 100.64.10.1/30 ]
  GigabitEthernet3/0/0:
    mtu: 1500
    lcp: e3-0-0
    addresses: [ 100.64.10.5/30 ]
EOF
pim@fitlet:~$ vppcfg plan -c l3.yaml
[INFO ] root.main: Loading configfile l3.yaml
[INFO ] vppcfg.config.valid_config: Configuration validated successfully
[INFO ] root.main: Configuration is valid
[INFO ] vppcfg.vppapi.connect: VPP version is 23.06-rc0~35-gaf4046134
comment { vppcfg prune: 2 CLI statement(s) follow }
set interface l3 GigabitEthernet1/0/0
set interface l3 GigabitEthernet3/0/0
comment { vppcfg create: 2 CLI statement(s) follow }
lcp create GigabitEthernet1/0/0 host-if e1-0-0
lcp create GigabitEthernet3/0/0 host-if e3-0-0
comment { vppcfg sync: 2 CLI statement(s) follow }
set interface ip address GigabitEthernet1/0/0 100.64.10.1/30
set interface ip address GigabitEthernet3/0/0 100.64.10.5/30
[INFO ] vppcfg.reconciler.write: Wrote 9 lines to (stdout)
[INFO ] root.main: Planning succeeded
```

One small note -- `vppcfg` cannot set routes, and this is by design as the Linux Control Plane is
meant to take care of that. I can either set routes using `ip` in the `dataplane` network namespace,
like so:

```
pim@fitlet:~$ sudo nsenter --net=/var/run/netns/dataplane
root@fitlet:/home/pim# ip route add 16.0.0.0/8 via 100.64.10.2
root@fitlet:/home/pim# ip route add 48.0.0.0/8 via 100.64.10.6
```

Or, alternatively, I can set them directly on VPP in the CLI, interestingly with identical syntax:
```
pim@fitlet:~$ vppctl
vpp# ip route add 16.0.0.0/8 via 100.64.10.2
vpp# ip route add 48.0.0.0/8 via 100.64.10.6
```

The loadtester will run a bunch of profiles (1514b, _imix_, 64b with multiple flows, and 64b with
only one flow), either in unidirectional or bidirectional mode, which gives me a wealth of data to
share:

Loadtest             | 1514b    | imix     | Multi 64b | Single 64b
-------------------- | -------- | -------- | --------- | ----------
***Bidirectional***  | [81.7k (100%)](/assets/fitlet2/fitlet2.bench-var2-1514b-bidirectional.html) | [327k (100%)](/assets/fitlet2/fitlet2.bench-var2-imix-bidirectional.html) | [1.48M (100%)](/assets/fitlet2/fitlet2.bench-var2-bidirectional.html) | [1.43M (98.8%)](/assets/fitlet2/fitlet2.bench-bidirectional.html)
***Unidirectional*** | [73.2k (89.6%)](/assets/fitlet2/fitlet2.bench-var2-1514b-unidirectional.html) | [255k (78.2%)](/assets/fitlet2/fitlet2.bench-var2-imix-unidirectional.html) | [1.18M (79.4%)](/assets/fitlet2/fitlet2.bench-var2-unidirectional.html) | [1.23M (82.7%)](/assets/fitlet2/fitlet2.bench-bidirectional.html)

## Caveats

While all results of the loadtests are navigable [[here](/assets/fitlet2/fitlet2.html)], I will
cherry-pick one interesting bundle showing the results of _all_ (bi- and unidirectional) tests:

{{< image src="/assets/fitlet2/loadtest.png" alt="Fitlet2 All Loadtests" >}}

I have to admit I was a bit stumped by the unidirectional loadtests - these
are pushing traffic into the i211 (onboard RJ45) NIC, and out of the i210
(_FACET_ SFP) NIC. What I found super weird (and can't really explain) is
that the _unidirectional_ load, which in the end serves half the packets/sec,
is __lower__ than the _bidirectional_ load, which was almost perfect, dropping
only a little bit of traffic at the very end. A picture says a thousand words -
so here's a graph of all the loadtests, which you can also find by clicking on
the links in the table.
## Appendix

### Generating the data

The JSON files that are emitted by my loadtester script can be fed directly into Michal's
[visualizer](https://github.com/wejn/trex-loadtest-viz) to plot interactive graphs (which I've
done for the table above):

```
DEVICE=Fitlet2

## Loadtest

SERVER=${SERVER:=hvn0.lab.ipng.ch}
TARGET=${TARGET:=l3}
RATE=${RATE:=10} ## % of line
DURATION=${DURATION:=600}
OFFSET=${OFFSET:=10}
PROFILE=${PROFILE:="ipng"}

for DIR in unidirectional bidirectional; do
  for SIZE in 1514 imix 64; do
    [ $DIR == "unidirectional" ] && FLAGS="-u " || FLAGS=""

    ## Multiple Flows
    ./trex-loadtest -s ${SERVER} ${FLAGS} -p ${PROFILE}.py -t "offset=${OFFSET},vm=var2,size=${SIZE}" \
      -rd ${DURATION} -rt ${RATE} -o ${DEVICE}-${TARGET}-${PROFILE}-var2-${SIZE}-${DIR}.json

    [ $SIZE -eq 64 ] && {
      ## Specialcase: Single Flow
      ./trex-loadtest -s ${SERVER} ${FLAGS} -p ${PROFILE}.py -t "offset=${OFFSET},size=${SIZE}" \
        -rd ${DURATION} -rt ${RATE} -o ${DEVICE}-${TARGET}-${PROFILE}-${SIZE}-${DIR}.json
    }
  done
done

## Graphs

ruby graph.rb -t "${DEVICE} All Loadtests" ${DEVICE}*.json -o ${DEVICE}.html
ruby graph.rb -t "${DEVICE} Unidirectional Loadtests" ${DEVICE}*unidir*.json \
  -o ${DEVICE}.unidirectional.html
ruby graph.rb -t "${DEVICE} Bidirectional Loadtests" ${DEVICE}*bidir*.json \
  -o ${DEVICE}.bidirectional.html

for i in ${PROFILE}-var2-1514 ${PROFILE}-var2-imix ${PROFILE}-var2-64 ${PROFILE}-64; do
  ruby graph.rb -t "${DEVICE} Unidirectional Loadtests" ${DEVICE}*-${i}*unidirectional.json \
    -o ${DEVICE}.$i-unidirectional.html
  ruby graph.rb -t "${DEVICE} Bidirectional Loadtests" ${DEVICE}*-${i}*bidirectional.json \
    -o ${DEVICE}.$i-bidirectional.html
done
```