---
date: "2023-02-12T09:51:23Z"
title: 'Review: Compulab Fitlet2'
aliases:
- /s/articles/2023/02/12/fitlet2.html
---
{{< image width="400px" float="right" src="/assets/fitlet2/Fitlet2-stock.png" alt="Fitlet" >}}
A while ago, in June 2021, we were discussing home routers that can keep up with 1G+ internet
connections in the [CommunityRack](https://www.communityrack.org) Telegram channel. Of course
at IPng Networks we are fond of the Supermicro Xeon D1518 [[ref]({{< ref "2021-09-21-vpp-7" >}})],
which has a bunch of 10Gbit X522 and 1Gbit i350 and i210 intel NICs, but it does come at a certain
price.

For smaller applications, the PC Engines APU6 [[ref]({{< ref "2021-07-19-pcengines-apu6" >}})] is
kind of cool and definitely more affordable. But, in this chat, Patrick offered an alternative,
the [[Fitlet2](https://fit-iot.com/web/products/fitlet2/)], which is a small, passively cooled,
and expandable IoT-esque machine.

Fast forward 18 months, and Patrick decided to sell off his units, so I bought one off of him
and decided to loadtest it. Considering the price tag (the unit I will be testing ships for
around $400) and its ability to use (1G/SFP) fiber optics, it may be a pretty cool one!
# Executive Summary
**TL/DR: Definitely a cool VPP router, 3x 1Gbit line rate, A- would buy again**

With some care on the VPP configuration (notably RX/TX descriptors), this unit can handle L2XC at
(almost) line rate in both directions (2.94Mpps out of a theoretical 2.97Mpps) with one VPP worker
thread, which is not just good, it's _Good Enough&trade;_, and there is still plenty of headroom on
the CPU, as the Atom E3950 has 4 cores.

In IPv4 routing, using two VPP worker threads and 2 RX/TX queues on each NIC, the machine keeps up
with 64 byte traffic in both directions (i.e. 2.97Mpps), again with compute power to spare, while
using only two out of four CPU cores on the Atom E3950.

For a $400,- machine that draws close to 11 Watts fully loaded, and sporting 8GB of RAM (at a max of
16GB), this Fitlet2 is a gem: it will easily keep up with 3x 1Gbit in a production environment, while
carrying multiple full BGP tables (900K IPv4 and 170K IPv6), with room to spare. _It's a classy little
machine!_
## Detailed findings
{{< image width="250px" float="right" src="/assets/fitlet2/Fitlet2-BottomOpen.png" alt="Fitlet2 Open" >}}
The first thing that I noticed when it arrived is how small it is! The design of the Fitlet2 has a
motherboard with a non-removable Atom E3950 CPU running at 1.6GHz, from the _Goldmont_ series. This
is a notoriously slow, budget-oriented CPU with 4 cores / 4 threads; each CPU thread has 24kB of L1
and 1MB of L2 cache, and there is no L3 cache on this CPU at all. That would mean performance in
applications like VPP (which try to leverage these caches) will be poorer -- the main question on
my mind is: does the CPU have enough __oompff__ to keep up with the 1G network cards? I'll want this
CPU to be able to handle roughly 4.5Mpps in total (three 1Gbit ports at 64 byte line rate), in order
for the Fitlet2 to count itself amongst the _wirespeed_ routers.
Looking further, the Fitlet2 has one HDMI and one MiniDP port, two USB2 and two USB3 ports, and two
Intel i211 NICs with RJ45 ports (these are 1Gbit). There's a helpful MicroSD slot, two LEDs and an
audio in- and output 3.5mm jack. The power button does worry me a little bit: I feel like just
brushing against it may turn the machine off. I do appreciate the cooling situation - the top finned plate
mates with the CPU on the top of the motherboard, and the bottom bracket holds a sizable aluminium
cooling block which further helps dissipate heat, without needing any active cooling. The Fitlet
folks claim this machine can run in environments anywhere between -50C and +112C, which I won't be
doing :)
{{< image width="400px" float="right" src="/assets/fitlet2/Fitlet2+FACET.png" alt="Fitlet2" >}}
Inside, there's a single DDR3 SODIMM slot for memory (the one I have came with 8GB at 1600MT/s) and
a custom, albeit open-specification, expansion board called a __FACET-Card__, which stands for
**F**unction **A**nd **C**onnectivity **E**xtension **T**-Card, well okay then! The __FACET__ card
in this little machine sports one extra Intel i210-IS NIC, an M.2 slot for an SSD, and an M.2 E-key
slot for a WiFi module. The NIC is a 1Gbit SFP capable device. You can see its optic cage on the
_FACET_ card above, next to the yellow CMOS / Clock battery.
The whole thing is fed by a 12V power brick delivering 2A, and a nice touch is that the barrel
connector has a plastic bracket that locks it into the chassis by turning it 90 degrees, so it won't
flap around in the breeze and detach. I wish other embedded PCs would ship with those, as I've been
fumbling around in 19" racks that are, let me say, less tightly cable organized, and may or may not
have disconnected the CHIX routeserver at some point in the past. Sorry, Max :)
For the curious, here's a list of interesting details: [[lspci](/assets/fitlet2/lspci.txt)] -
[[dmidecode](/assets/fitlet2/dmidecode.txt)] -
[[likwid-topology](/assets/fitlet2/likwid-topology.txt)] - [[dmesg](/assets/fitlet2/dmesg.txt)].
## Preparing the Fitlet2
First, I grab a USB key and install Debian _Bullseye_ (11.5) on it, using the UEFI installer. After
booting, I carry through the instructions on my [[VPP Production]({{< ref "2021-09-21-vpp-7" >}})]
post. Notably, I create the `dataplane` namespace, run an SSH and SNMP agent there, and set
`isolcpus=1-3` on the kernel commandline so that I can give three worker threads to VPP. I start off
giving it only one (1) worker thread, because this way I can take a look at the performance of a
single CPU thread before scaling out to the three (3) threads that this CPU can offer. I also take
the DPDK defaults, letting the poll-mode drivers pick their proposed queue and descriptor counts:
* **GigabitEthernet1/0/0**: Intel Corporation I211 Gigabit Network Connection (rev 03)
> rx: queues 1 (max 2), desc 512 (min 32 max 4096 align 8) <br />
> tx: queues 2 (max 2), desc 512 (min 32 max 4096 align 8)
* **GigabitEthernet3/0/0**: Intel Corporation I210 Gigabit Fiber Network Connection (rev 03)
> rx: queues 1 (max 4), desc 512 (min 32 max 4096 align 8) <br />
> tx: queues 2 (max 4), desc 512 (min 32 max 4096 align 8)

I observe that the i211 NIC allows for a maximum of two (2) RX/TX queues, while the (older!) i210
will allow for four (4) of them. Another thing that I see here is that there are two (2) TX queues
active, even though I only have one worker thread, so what gives? This is because there is always a
main thread in addition to the worker thread(s), and the main thread may need or want to send
traffic out on an interface, so it always attaches to a TX queue in addition to the worker thread(s).
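For the record, a minimal sketch of the `startup.conf` CPU pinning that matches this setup (one main
thread on core 0, one worker on an isolated core; the exact core numbers simply mirror the
`isolcpus=1-3` choice above and are illustrative):

```
cpu {
  main-core 0
  corelist-workers 1    ## later, 1-3 to scale out to three workers
}
```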
When exploring new hardware, I find it useful to take a look at the output of a few tactical `show`
commands on the CLI, such as:
**1. What CPU is in this machine?**
```
vpp# show cpu
Model name: Intel(R) Atom(TM) Processor E3950 @ 1.60GHz
Microarch model (family): [0x6] Goldmont ([0x5c] Apollo Lake) stepping 0x9
Flags: sse3 pclmulqdq ssse3 sse41 sse42 rdrand pqe rdseed aes sha invariant_tsc
Base frequency: 1.59 GHz
```
**2. Which devices on the PCI bus, PCIe speed details, and driver?**
```
vpp# show pci
Address       Sock  VID:PID    Link Speed    Driver           Product Name    Vital Product Data
0000:01:00.0    0   8086:1539  2.5 GT/s x1   uio_pci_generic
0000:02:00.0    0   8086:1539  2.5 GT/s x1   igb
0000:03:00.0    0   8086:1536  2.5 GT/s x1   uio_pci_generic
```
__Note__: This device at slot `02:00.0` is the second onboard RJ45 i211 NIC. I have used this one
to log in to the Fitlet2 and more easily kill/restart VPP and so on, but I could of course just as
well give it to VPP, in which case I'd have three gigabit interfaces to play with!
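If I did decide to hand that second i211 to VPP as well, the usual way is to whitelist the NICs
explicitly in the `dpdk` stanza of `startup.conf` -- a sketch using the PCI addresses from the
`show pci` output above (this assumes the kernel's `igb` driver no longer claims `02:00.0`):

```
dpdk {
  dev 0000:01:00.0    ## onboard i211, Gi1/0/0
  dev 0000:02:00.0    ## second onboard i211, currently my management port
  dev 0000:03:00.0    ## FACET i210 (SFP), Gi3/0/0
}
```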
**3. What details are known for the physical NICs?**
```
vpp# show hardware GigabitEthernet1/0/0
GigabitEthernet1/0/0               1     up   GigabitEthernet1/0/0
  Link speed: 1 Gbps
  RX Queues:
    queue thread         mode
    0     vpp_wk_0 (1)   polling
  TX Queues:
    TX Hash: [name: hash-eth-l34 priority: 50 description: Hash ethernet L34 headers]
    queue shared thread(s)
    0     no     0
    1     no     1
  Ethernet address 00:01:c0:2a:eb:a8
  Intel e1000
    carrier up full duplex max-frame-size 2048
    flags: admin-up maybe-multiseg tx-offload intel-phdr-cksum rx-ip4-cksum int-supported
    rx: queues 1 (max 2), desc 512 (min 32 max 4096 align 8)
    tx: queues 2 (max 2), desc 512 (min 32 max 4096 align 8)
    pci: device 8086:1539 subsystem 8086:0000 address 0000:01:00.00 numa 0
    max rx packet len: 16383
    promiscuous: unicast off  all-multicast on
    vlan offload: strip off  filter off  qinq off
    rx offload avail:  vlan-strip ipv4-cksum udp-cksum tcp-cksum vlan-filter
                       vlan-extend scatter keep-crc rss-hash
    rx offload active: ipv4-cksum scatter
    tx offload avail:  vlan-insert ipv4-cksum udp-cksum tcp-cksum sctp-cksum
                       tcp-tso multi-segs
    tx offload active: ipv4-cksum udp-cksum tcp-cksum multi-segs
    rss avail:         ipv4-tcp ipv4-udp ipv4 ipv6-tcp-ex ipv6-udp-ex ipv6-tcp
                       ipv6-udp ipv6-ex ipv6
    rss active:        none
    tx burst function: (not available)
    rx burst function: (not available)
```
### Configuring VPP
After this exploratory exercise, I have learned enough about the hardware to be able to take the
Fitlet2 out for a spin. To configure the VPP instance, I turn to
[[vppcfg](https://github.com/pimvanpelt/vppcfg)], which can take a YAML configuration file
describing the desired VPP configuration, and apply it safely to the running dataplane using the VPP
API. I've written a few more posts on how it does that, notably on its [[syntax]({{< ref "2022-03-27-vppcfg-1" >}})]
and its [[planner]({{< ref "2022-04-02-vppcfg-2" >}})]. A complete
configuration guide on vppcfg can be found
[[here](https://github.com/pimvanpelt/vppcfg/blob/main/docs/config-guide.md)].
```
pim@fitlet:~$ sudo dpkg -i {lib,}vpp*23.06*deb
pim@fitlet:~$ sudo apt install python3-pip
pim@fitlet:~$ sudo pip install vppcfg-0.0.3-py3-none-any.whl
```
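As a quick sanity check that the tooling is wired up, I can ask the running VPP for its version and
have `vppcfg` validate a YAML file offline -- a sketch, with a placeholder filename (the `check`
subcommand is described in the vppcfg config guide and does not need a running dataplane):

```
pim@fitlet:~$ vppctl show version
pim@fitlet:~$ vppcfg check -c example.yaml
```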
### Methodology
#### Method 1: Single CPU Thread Saturation
First I will take VPP out for a spin by creating an L2 Cross Connect where any ethernet frame
received on `Gi1/0/0` will be directly transmitted as-is on `Gi3/0/0` and vice versa. This is a
relatively cheap operation for VPP, as it will not have to do any routing table lookups. The
configuration looks like this:
```
pim@fitlet:~$ cat << EOF > l2xc.yaml
interfaces:
  GigabitEthernet1/0/0:
    mtu: 1500
    l2xc: GigabitEthernet3/0/0
  GigabitEthernet3/0/0:
    mtu: 1500
    l2xc: GigabitEthernet1/0/0
EOF
pim@fitlet:~$ vppcfg plan -c l2xc.yaml
[INFO ] root.main: Loading configfile l2xc.yaml
[INFO ] vppcfg.config.valid_config: Configuration validated successfully
[INFO ] root.main: Configuration is valid
[INFO ] vppcfg.vppapi.connect: VPP version is 23.06-rc0~35-gaf4046134
comment { vppcfg sync: 10 CLI statement(s) follow }
set interface l2 xconnect GigabitEthernet1/0/0 GigabitEthernet3/0/0
set interface l2 tag-rewrite GigabitEthernet1/0/0 disable
set interface l2 xconnect GigabitEthernet3/0/0 GigabitEthernet1/0/0
set interface l2 tag-rewrite GigabitEthernet3/0/0 disable
set interface mtu 1500 GigabitEthernet1/0/0
set interface mtu 1500 GigabitEthernet3/0/0
set interface mtu packet 1500 GigabitEthernet1/0/0
set interface mtu packet 1500 GigabitEthernet3/0/0
set interface state GigabitEthernet1/0/0 up
set interface state GigabitEthernet3/0/0 up
[INFO ] vppcfg.reconciler.write: Wrote 11 lines to (stdout)
[INFO ] root.main: Planning succeeded
```
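Pasting ten lines is fine, but for larger configurations it is handy that `vppcfg plan` can, if I
remember its flags correctly, also write the commands to a file which `vppctl` then executes in one
go (double-check with `vppcfg plan --help`):

```
pim@fitlet:~$ vppcfg plan -c l2xc.yaml -o /tmp/l2xc.exec
pim@fitlet:~$ vppctl exec /tmp/l2xc.exec
```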
{{< image width="500px" float="right" src="/assets/fitlet2/l2xc-demo1.png" alt="Fitlet2 L2XC First Try" >}}
After I paste these commands on the CLI, I start T-Rex in L2 stateless mode and generate some
activity by starting the `bench` profile on port 0 with packets of 64 bytes in size and with varying
IPv4 source and destination addresses _and_ ports:
```
tui>start -f stl/bench.py -m 1.48mpps -p 0 \
         -t size=64,vm=var2
```
Let me explain a few highlights from the picture to the right. When starting this profile, I
specified 1.48Mpps, which is the maximum number of packets/second that can be generated on a 1Gbit
link when using 64 byte frames (the smallest permissible ethernet frames). I do this because the
loadtester comes with 10Gbit (and 100Gbit) ports, but the Fitlet2 has only 1Gbit ports. Then, I see
that port0 is indeed transmitting (**Tx pps**) 1.48 Mpps, shown in dark blue. This is about 992 Mbps
on the wire (the **Tx bps L1**), but each 64 byte ethernet frame needs an additional 20 bytes on the
wire [[details](https://en.wikipedia.org/wiki/Ethernet_frame)], so the **Tx bps L2** is about
`64/84 * 992.35 = 756.08` Mbps, which lines up.
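As a sanity check on that 1.48Mpps figure: a 64 byte frame occupies 64 + 20 = 84 bytes (672 bits) on
the wire, so a 1Gbit link divides out to roughly 1.488 million frames per second per direction --
and three such ports is where the ~4.5Mpps target from earlier comes from. A quick
back-of-the-envelope:

```
pim@fitlet:~$ echo $(( 1000000000 / ((64+20)*8) ))
1488095
```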
Then, after the Fitlet2 tries its best to forward those from its receiving Gi1/0/0 port onto its
transmitting port Gi3/0/0, they are received again by T-Rex on port 1. Here, I can see that the **Rx
pps** is 1.29 Mpps, with an **Rx bps** of 660.49 Mbps (which is the L2 counter), and in bright red
at the top I see the **drop_rate** is about 95.59 Mbps. In other words, the Fitlet2 is _not keeping
up_.
But, after I take a look at the runtime statistics, I see that the CPU isn't very busy at all:
```
vpp# show run
...
Thread 1 vpp_wk_0 (lcore 1)
Time 23.8, 10 sec internal node vector rate 4.30 loops/sec 1638976.68
  vector rates in 1.2908e6, out 1.2908e6, drop 0.0000e0, punt 0.0000e0
             Name                 State         Calls        Vectors      Suspends       Clocks     Vectors/Call
GigabitEthernet3/0/0-output       active      6323688       27119700            0        9.14e1           4.29
GigabitEthernet3/0/0-tx           active      6323688       27119700            0        1.79e2           4.29
dpdk-input                       polling     44406936       27119701            0        5.35e2            .61
ethernet-input                    active      6323689       27119701            0        1.42e2           4.29
l2-input                          active      6323689       27119701            0        9.94e1           4.29
l2-output                         active      6323689       27119701            0        9.77e1           4.29
```
Very interesting! The `vector rates in .. out ..` line says that the thread is receiving only
1.29Mpps, and it is managing to send all of them out as well. When a VPP worker is busy, each DPDK
call will yield many packets, up to 256 in one call, which means the number of "vectors per call"
will rise. Here, I see that DPDK is returning an average of only 0.61 packets each time it polls the
NIC, and each time a bunch of packets is sent off into the VPP graph, there is an average of 4.29
packets per loop. If the CPU were the bottleneck, it would look more like 256 in the Vectors/Call
column -- so the **bottleneck must be in the NIC**.
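To double check that the drops really happen at the NIC rather than in VPP, the DPDK extended stats
are worth a look; they show up under `show hardware-interfaces` at a higher verbosity. A sketch --
the exact counter names (for example `rx_missed_errors`) vary per driver, so treat this as
illustrative:

```
vpp# clear errors
vpp# show hardware-interfaces GigabitEthernet1/0/0 detail
```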
Remember above, when I showed the `show hardware` command output? There's a clue in there. The
Fitlet2 has two onboard i211 NICs and one i210 NIC on the _FACET_ card. Despite the lower number,
the i210 is a bit more advanced
[[datasheet](/assets/fitlet2/i210_ethernet_controller_datasheet-257785.pdf)]. If I reverse the
direction of flow (so receiving on the i210 Gi3/0/0, and transmitting on the i211 Gi1/0/0), things
look a fair bit better:
```
vpp# show run
...
Thread 1 vpp_wk_0 (lcore 1)
Time 12.6, 10 sec internal node vector rate 4.02 loops/sec 853956.73
  vector rates in 1.4799e6, out 1.4799e6, drop 0.0000e0, punt 0.0000e0
             Name                 State         Calls        Vectors      Suspends       Clocks     Vectors/Call
GigabitEthernet1/0/0-output       active      4642964       18652932            0        9.34e1           4.02
GigabitEthernet1/0/0-tx           active      4642964       18652420            0        1.73e2           4.02
dpdk-input                       polling     12200880       18652933            0        3.27e2           1.53
ethernet-input                    active      4642965       18652933            0        1.54e2           4.02
l2-input                          active      4642964       18652933            0        1.04e2           4.02
l2-output                         active      4642964       18652933            0        1.01e2           4.02
```
Hey, would you look at that! The line up top here shows vector rates in of 1.4799e6 (which is
1.48Mpps), and outbound is the same number. In this configuration as well, the DPDK node isn't
reading very many packets per poll, and the graph is traversed with an average of 4.02 packets per
run, which means that this CPU can do in excess of 1.48Mpps on one (1) CPU thread. Slick!
So what _is_ the maximum throughput per CPU thread? To show this, I will saturate both ports with
line rate traffic, and see what makes it through the other side. After instructing the T-Rex to
perform the following profile:
```
tui>start -f stl/bench.py -m 1.48mpps -p 0 1 \
-t size=64,vm=var2
```
T-Rex will faithfully start to send traffic on both ports and expect the same amount back from the
Fitlet2 (the _Device Under Test_ or _DUT_). I can see that from T-Rex port 1->0 all traffic makes
its way back, but from port 0->1 there is a little bit of loss (for the 1.48Mpps sent, only 1.43Mpps
is returned). This is the same phenomenon that I explained above -- the i211 NIC is not quite as
good at eating packets as the i210 NIC is.
Even when doing this though, the (still) single threaded VPP is keeping up just fine, CPU wise:
```
vpp# show run
...
Thread 1 vpp_wk_0 (lcore 1)
Time 13.4, 10 sec internal node vector rate 13.59 loops/sec 122820.33
  vector rates in 2.9599e6, out 2.8834e6, drop 0.0000e0, punt 0.0000e0
             Name                 State         Calls        Vectors      Suspends       Clocks     Vectors/Call
GigabitEthernet1/0/0-output       active      1822674       19826616            0        3.69e1          10.88
GigabitEthernet1/0/0-tx           active      1822674       19597360            0        1.51e2          10.75
GigabitEthernet3/0/0-output       active      1823770       19826612            0        4.79e1          10.87
GigabitEthernet3/0/0-tx           active      1823770       19029508            0        1.56e2          10.43
dpdk-input                       polling      1827320       39653228            0        1.62e2          21.70
ethernet-input                    active      3646444       39653228            0        7.67e1          10.87
l2-input                          active      1825356       39653228            0        4.96e1          21.72
l2-output                         active      1825356       39653228            0        4.58e1          21.72
```
Here we can see 2.96Mpps received (_vector rates in_) while only 2.88Mpps are transmitted (_vector
rates out_). First off, this lines up perfectly with the reporting of T-Rex in the screenshot above,
and it also shows that one direction loses more packets than the other. We're dropping some 80kpps,
but where did they go? Looking at the statistics counters, which include any packets which had
errors in processing, we learn more:
```
vpp# show err
     Count                    Node                         Reason                    Severity
3109141488               l2-output                L2 output packets                    error
3109141488               l2-input                 L2 input packets                     error
   9936649               GigabitEthernet1/0/0-tx  Tx packet drops (dpdk tx failure)    error
  32120469               GigabitEthernet3/0/0-tx  Tx packet drops (dpdk tx failure)    error
```
{{< image width="500px" float="right" src="/assets/fitlet2/l2xc-demo2.png" alt="Fitlet2 L2XC Second Try" >}}
Aha! From previous experience I know that when DPDK signals packet drops due to 'tx failure', it is
often because it's trying to hand off the packet to the NIC, which has a ring buffer to collect
packets while the hardware transmits them onto the wire, and this NIC has run out of slots, which
means the packet has to be dropped and a kitten gets hurt. But, I can raise the number of RX and TX
descriptors by setting them in VPP's `startup.conf` file:
```
dpdk {
  dev default {
    num-rx-desc 512   ## default
    num-tx-desc 1024
  }
  no-multi-seg
}
```
And with that simple tweak, I've succeeded in configuring the Fitlet2 in a way that it is capable of
receiving and transmitting 64 byte packets in both directions at (almost) line rate, with **one CPU
thread**.
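After editing `startup.conf` and restarting VPP (assuming the stock systemd service), it is worth
confirming that the new descriptor counts actually took effect; the same `rx:`/`tx:` lines from the
`show hardware` output earlier should now read something like this:

```
pim@fitlet:~$ sudo systemctl restart vpp
pim@fitlet:~$ vppctl show hardware GigabitEthernet1/0/0 | grep desc
    rx: queues 1 (max 2), desc 512 (min 32 max 4096 align 8)
    tx: queues 2 (max 2), desc 1024 (min 32 max 4096 align 8)
```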
#### Method 2: Rampup using trex-loadtest.py
For this test, I decide to put the Fitlet2 into L3 mode (up until now it was set up in _L2 Cross
Connect_ mode). To do this, I give the interfaces an IPv4 address and set a route for the loadtest
traffic (which will be coming from `16.0.0.0/8` and going to `48.0.0.0/8`). I will once again look
to `vppcfg` to do this, because manipulating YAML files like this allows me to easily and reliably
swap back and forth, letting `vppcfg` do the mundane chore of figuring out which commands to type, in
which order, safely.
From my existing L2XC dataplane configuration, I switch to L3 like so:
```
pim@fitlet:~$ cat << EOF > l3.yaml
interfaces:
  GigabitEthernet1/0/0:
    mtu: 1500
    lcp: e1-0-0
    addresses: [ 100.64.10.1/30 ]
  GigabitEthernet3/0/0:
    mtu: 1500
    lcp: e3-0-0
    addresses: [ 100.64.10.5/30 ]
EOF
pim@fitlet:~$ vppcfg plan -c l3.yaml
[INFO ] root.main: Loading configfile l3.yaml
[INFO ] vppcfg.config.valid_config: Configuration validated successfully
[INFO ] root.main: Configuration is valid
[INFO ] vppcfg.vppapi.connect: VPP version is 23.06-rc0~35-gaf4046134
comment { vppcfg prune: 2 CLI statement(s) follow }
set interface l3 GigabitEthernet1/0/0
set interface l3 GigabitEthernet3/0/0
comment { vppcfg create: 2 CLI statement(s) follow }
lcp create GigabitEthernet1/0/0 host-if e1-0-0
lcp create GigabitEthernet3/0/0 host-if e3-0-0
comment { vppcfg sync: 2 CLI statement(s) follow }
set interface ip address GigabitEthernet1/0/0 100.64.10.1/30
set interface ip address GigabitEthernet3/0/0 100.64.10.5/30
[INFO ] vppcfg.reconciler.write: Wrote 9 lines to (stdout)
[INFO ] root.main: Planning succeeded
```
One small note -- `vppcfg` cannot set routes, and this is by design as the Linux Control Plane is
meant to take care of that. I can either set routes using `ip` in the `dataplane` network namespace,
like so:
```
pim@fitlet:~$ sudo nsenter --net=/var/run/netns/dataplane
root@fitlet:/home/pim# ip route add 16.0.0.0/8 via 100.64.10.2
root@fitlet:/home/pim# ip route add 48.0.0.0/8 via 100.64.10.6
```
Or, alternatively, I can set them directly on VPP in the CLI, interestingly with identical syntax:
```
pim@fitlet:~$ vppctl
vpp# ip route add 16.0.0.0/8 via 100.64.10.2
vpp# ip route add 48.0.0.0/8 via 100.64.10.6
```
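Either way, a quick look in the FIB confirms that the dataplane actually has these routes before I
point the loadtester at it:

```
vpp# show ip fib 16.0.0.0/8
vpp# show ip fib 48.0.0.0/8
```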
The loadtester will run a bunch of profiles (1514b, _imix_, 64b with multiple flows, and 64b with
only one flow), either in unidirectional or bidirectional mode, which gives me a wealth of data to
share:
Loadtest | 1514b | imix | Multi 64b | Single 64b
-------------------- | -------- | -------- | --------- | ----------
***Bidirectional*** | [81.7k (100%)](/assets/fitlet2/fitlet2.bench-var2-1514b-bidirectional.html) | [327k (100%)](/assets/fitlet2/fitlet2.bench-var2-imix-bidirectional.html) | [1.48M (100%)](/assets/fitlet2/fitlet2.bench-var2-bidirectional.html) | [1.43M (98.8%)](/assets/fitlet2/fitlet2.bench-bidirectional.html)
***Unidirectional*** | [73.2k (89.6%)](/assets/fitlet2/fitlet2.bench-var2-1514b-unidirectional.html) | [255k (78.2%)](/assets/fitlet2/fitlet2.bench-var2-imix-unidirectional.html) | [1.18M (79.4%)](/assets/fitlet2/fitlet2.bench-var2-unidirectional.html) | [1.23M (82.7%)](/assets/fitlet2/fitlet2.bench-bidirectional.html)
## Caveats
While all results of the loadtests are navigable [[here](/assets/fitlet2/fitlet2.html)], I will cherrypick
one interesting bundle showing the results of _all_ (bi- and unidirectional) tests:
{{< image src="/assets/fitlet2/loadtest.png" alt="Fitlet2 All Loadtests" >}}
I have to admit I was a bit stumped with the unidirectional loadtests - these
are pushing traffic into the i211 (onboard RJ45) NIC, and out of the i210
(_FACET_ SFP) NIC. What I found super weird (and can't really explain) is
that the _unidirectional_ load, which in the end serves half the packets/sec,
is __lower__ than the _bidirectional_ load, which was almost perfect, dropping
only a little bit of traffic at the very end. A picture says a thousand words -
so here's a graph of all the loadtests, which you can also find by clicking on
the links in the table.
## Appendix
### Generating the data
The JSON files that are emitted by my loadtester script can be fed directly into Michal's
[visualizer](https://github.com/wejn/trex-loadtest-viz) to plot interactive graphs (which I've
done for the table above):
```
DEVICE=Fitlet2

## Loadtest
SERVER=${SERVER:=hvn0.lab.ipng.ch}
TARGET=${TARGET:=l3}
RATE=${RATE:=10}        ## % of line
DURATION=${DURATION:=600}
OFFSET=${OFFSET:=10}
PROFILE=${PROFILE:="ipng"}

for DIR in unidirectional bidirectional; do
  for SIZE in 1514 imix 64; do
    FLAGS=""        ## reset, so -u does not leak into the bidirectional runs
    [ "$DIR" == "unidirectional" ] && FLAGS="-u "
    ## Multiple Flows
    ./trex-loadtest -s ${SERVER} ${FLAGS} -p ${PROFILE}.py -t "offset=${OFFSET},vm=var2,size=${SIZE}" \
      -rd ${DURATION} -rt ${RATE} -o ${DEVICE}-${TARGET}-${PROFILE}-var2-${SIZE}-${DIR}.json
    [ "$SIZE" == "64" ] && {
      ## Specialcase: Single Flow
      ./trex-loadtest -s ${SERVER} ${FLAGS} -p ${PROFILE}.py -t "offset=${OFFSET},size=${SIZE}" \
        -rd ${DURATION} -rt ${RATE} -o ${DEVICE}-${TARGET}-${PROFILE}-${SIZE}-${DIR}.json
    }
  done
done

## Graphs
ruby graph.rb -t "${DEVICE} All Loadtests" ${DEVICE}*.json -o ${DEVICE}.html
ruby graph.rb -t "${DEVICE} Unidirectional Loadtests" ${DEVICE}*unidir*.json \
  -o ${DEVICE}.unidirectional.html
ruby graph.rb -t "${DEVICE} Bidirectional Loadtests" ${DEVICE}*bidir*.json \
  -o ${DEVICE}.bidirectional.html
for i in ${PROFILE}-var2-1514 ${PROFILE}-var2-imix ${PROFILE}-var2-64 ${PROFILE}-64; do
  ruby graph.rb -t "${DEVICE} Unidirectional Loadtests" ${DEVICE}*-${i}*unidirectional.json \
    -o ${DEVICE}.$i-unidirectional.html
  ruby graph.rb -t "${DEVICE} Bidirectional Loadtests" ${DEVICE}*-${i}*bidirectional.json \
    -o ${DEVICE}.$i-bidirectional.html
done
```