Some readability fixes

This commit is contained in:
2025-02-09 18:30:38 +01:00
parent bcbb119b20
commit 4ac8c47127


@ -38,10 +38,10 @@ and is homed on the sFlow.org website [[ref](https://sflow.org/sflow_version_5.t
Switching ASIC in the dataplane (seen at the bottom of the diagram to the left) is asked to copy
1-in-N packets to local sFlow Agent.

**Sampling**: The agent will copy the first N bytes (typically 128) of the packet into a sample. As
the ASIC knows which interface the packet was received on, the `inIfIndex` will be added. After a
routing decision is made, the nexthop and its L2 address and interface become known. The ASIC might
annotate the sample with this `outIfIndex` and `DstMAC` metadata as well.
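The 1-in-N selection itself is typically not strictly periodic: to avoid synchronizing with patterns in the traffic, agents draw a random skip with mean N between samples. A toy Python model of that idea (my own sketch, not the plugin's code):

```python
import random

def make_sampler(rate: int):
    """1-in-N sampler using a random countdown ('skip') with mean `rate`,
    so sampling stays unbiased even for strongly periodic traffic."""
    skip = random.randint(1, 2 * rate - 1)  # uniform, mean == rate

    def should_sample() -> bool:
        nonlocal skip
        skip -= 1
        if skip <= 0:
            skip = random.randint(1, 2 * rate - 1)  # re-arm the countdown
            return True
        return False

    return should_sample
```

Over many packets the sampled fraction converges to 1/N.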
**Drop Monitoring**: There's one rather clever insight that sFlow gives: what if the packet _was
not_ routed or switched, but rather discarded? For this, sFlow is able to describe the reason for
@ -53,27 +53,28 @@ to overstate how important it is to have this so-called _drop monitoring_, as op
hours and hours figuring out _why_ packets are lost in their network or datacenter switching fabric.
**Metadata**: The agent may have other metadata as well, such as which prefix was the source and
destination of the packet, what additional RIB information is available (AS path, BGP communities,
and so on). This may be added to the sample record as well.
**Counters**: Since sFlow is sampling 1:N packets, the system can estimate total traffic in a
reasonably accurate way. Peter and Sonia wrote a succinct
[[paper](https://sflow.org/packetSamplingBasics/)] about the math, so I won't get into that here.
Mostly because I am but a software engineer, not a statistician... :) However, I will say this: if a
fraction of the traffic is sampled but the _Agent_ knows how many bytes and packets were forwarded
in total, it can provide an overview with a quantifiable accuracy. This is why the _Agent_ will
periodically get the interface counters from the ASIC.
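To make that concrete, here is a small Python sketch (my own illustration of the paper's rule of thumb, not code from the plugin) that scales sampled counts back up to totals and bounds the error at 95% confidence:

```python
import math

def estimate_total(num_samples: int, sampling_rate: int, sampled_bytes: int):
    """Extrapolate 1:N samples to totals, with a ~95%-confidence error bound."""
    est_packets = num_samples * sampling_rate
    est_bytes = sampled_bytes * sampling_rate
    # Rule of thumb from the sFlow sampling paper: with c samples taken,
    # the relative error at 95% confidence is at most 196 * sqrt(1/c) percent.
    pct_error = 196.0 * math.sqrt(1.0 / num_samples)
    return est_packets, est_bytes, pct_error

# e.g. 10'000 samples taken at 1:1000 -> ~10M packets, within ~1.96%
```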
**Collector**: One or more samples can be concatenated into UDP messages that go from the _sFlow
Agent_ to a central _sFlow Collector_. The heavy lifting in analysis is done upstream from the
switch or router, which is great for performance. Many thousands or even tens of thousands of
agents can forward their samples and interface counters to a single central collector, which in turn
can be used to draw up a near real time picture of the state of traffic through even the largest of
ISP networks or datacenter switch fabrics.
In sFlow parlance [[VPP](https://fd.io/)] and its companion
[[hsflowd](https://github.com/sflow/host-sflow)] together form an _Agent_ (it sends the UDP packets
over the network), and for example the commandline tool `sflowtool` could be a _Collector_ (it
receives the UDP packets).
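To illustrate just how thin the transport is: a _Collector_ at its most minimal is a UDP listener on the registered sFlow port 6343. A toy sketch (a real collector like `sflowtool` then XDR-decodes each datagram):

```python
import socket

SFLOW_PORT = 6343  # IANA-registered port for sFlow collectors

def run_collector(bind_addr: str = "0.0.0.0", max_datagrams: int = 10) -> None:
    """Print the source agent and size of incoming sFlow datagrams."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((bind_addr, SFLOW_PORT))
    for _ in range(max_datagrams):
        data, (agent, _port) = sock.recvfrom(9216)  # room for jumbo frames
        print(f"{agent}: sFlow datagram of {len(data)} bytes")
```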
## Recap: sFlow in VPP
@ -81,13 +82,12 @@ First, I have some pretty good news to report - our work on this plugin was
[[merged](https://gerrit.fd.io/r/c/vpp/+/41680)] and will be included in the VPP 25.02 release in a
few weeks! Last weekend, I gave a lightning talk at
[[FOSDEM](https://fosdem.org/2025/schedule/event/fosdem-2025-4196-vpp-monitoring-100gbps-with-sflow/)]
in Brussels, Belgium, and caught up with a lot of community members and network- and software
engineers. I had a great time.
In trying to keep the amount of code as small as possible, and therefore the probability of bugs
that might impact VPP's dataplane stability low, the architecture of the end to end solution
consists of three distinct parts, each with their own risk and performance profile:
{{< image float="left" src="/assets/sflow/sflow-vpp-overview.png" alt="sFlow VPP Overview" width="18em" >}}
@ -104,7 +104,7 @@ get their fair share of samples into the Agent's hands.
processing time _away_ from the dataplane. This _sflow-process_ does two things. Firstly, it
consumes samples from the per-worker FIFO queues (both forwarded packets in green, and dropped ones
in red). Secondly, it keeps track of time and every few seconds (20 by default, but this is
configurable), it'll grab all interface counters from those interfaces for which I have sFlow
turned on. VPP produces _Netlink_ messages and sends them to the kernel.
**host-sflow**: The third component is external to VPP: `hsflowd` subscribes to the _Netlink_
@ -132,7 +132,7 @@ turns on sampling at a given rate on physical devices, also known as _hardware-i
the open source component [[host-sflow](https://github.com/sflow/host-sflow/releases)] can be
configured as of release v2.1.11-5 [[ref](https://github.com/sflow/host-sflow/tree/v2.1.11-5)].
I will show how to configure VPP in three ways:
***1. VPP Configuration via CLI***
@ -148,22 +148,26 @@ vpp0-0# sflow enable GigabitEthernet10/0/3
```
The first three commands set the global defaults - in my case I'm going to be sampling at 1:100
which is unusually high frequency. A production setup may take 1-in-_linkspeed-in-megabits_ so for a
1Gbps device 1:1'000 is appropriate. For 100GE, something between 1:10'000 and 1:100'000 is more
appropriate, depending on link load. The second command sets the interface stats polling interval.
The default is to gather these statistics every 20 seconds, but I set it to 10s here.
Next, I instruct the plugin how many bytes of the sampled ethernet frame should be taken. Common
values are 64 and 128. I want enough data to see the headers, like MPLS label(s), Dot1Q tag(s), IP
header and TCP/UDP/ICMP header, but the contents of the payload are rarely interesting for
statistics purposes.

Finally, I can turn on the sFlow plugin on an interface with the `sflow enable-disable` CLI. In VPP,
an idiomatic way to turn on and off things is to have an enabler/disabler. It feels a bit clunky
maybe to write `sflow enable $iface disable` but it makes more logical sense if you parse that as
"enable-disable" with the default being the "enable" operation, and the alternate being the
"disable" operation.
***2. VPP Configuration via API***
I implemented a few API calls for the most common operations. Here's a snippet that shows the same
calls as from the CLI above, but using these Python API calls:
```python
from vpp_papi import VPPApiClient, VPPApiJSONFiles
@ -209,13 +213,14 @@ This short program toys around a bit with the sFlow API. I first set the samplin
the current value. Then I set the polling interval to 10s and retrieve the current value again.
Finally, I set the header bytes to 128, and retrieve the value again.
Enabling and disabling sFlow on interfaces shows the idiom I mentioned before - the API being an
`enable_disable()` call of sorts, and typically taking a boolean argument if the operator wants to
enable (the default), or disable sFlow on the interface. Getting the list of enabled interfaces can
be done with the `sflow_interface_dump()` call, which returns a list of `sflow_interface_details`
messages.
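As a sketch of how that idiom looks in practice (assuming the call is named `sflow_enable_disable` and takes `hw_if_index` and `enable_disable` fields - check the plugin's `.api` definitions for the authoritative signatures):

```python
# 'vpp' is a connected VPPApiClient instance, as in the snippet above.
def set_sflow(vpp, ifname: str, enable: bool = True) -> bool:
    """Enable or disable sFlow on one interface, looked up by name."""
    for intf in vpp.api.sw_interface_dump():
        if intf.interface_name == ifname:
            rv = vpp.api.sflow_enable_disable(hw_if_index=intf.sw_if_index,
                                              enable_disable=enable)
            return rv.retval == 0
    return False  # no such interface

def sflow_interfaces(vpp) -> list:
    """Return the hw_if_index of every interface with sFlow enabled."""
    return [i.hw_if_index for i in vpp.api.sflow_interface_dump()]
```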
I demonstrated VPP's Python API and how it works in a fair amount of detail in a [[previous
article]({{< ref 2024-01-27-vpp-papi >}})], in case this type of stuff interests you.
***3. VPPCfg YAML Configuration***
@ -667,9 +672,9 @@ booyah!
One question I get a lot about this plugin is: what is the performance impact when using
sFlow? I spent a considerable amount of time tinkering with this, and together with Neil bringing
the plugin to what we both agree is the most efficient use of CPU. We could have gone a bit further,
but that would require somewhat intrusive changes to VPP's internals and as _North of the Border_
would say: what we have isn't just good, it's good enough!
I've built a small testbed based on two Dell R730 machines. On the left, I have a Debian machine
running Cisco T-Rex using four quad-tengig network cards, the classic Intel i710-DA4. On the right,
@ -754,7 +759,7 @@ hippo# mpls local-label add 21 eos via 100.64.4.0 TenGigabitEthernet130/0/0 out-
```
Here, the MPLS configuration implements a simple P-router, where incoming MPLS packets with label 16
will be sent back to T-Rex on Te3/0/1 to the specified IPv4 nexthop (for which I already know the
MAC address), and with label 16 removed and new label 17 imposed, in other words a SWAP operation.
***3. L2XC***
@ -812,7 +817,7 @@ know that MPLS is a little bit more expensive computationally than IPv4, and tha
total capacity is 10.11Mpps when sFlow is turned off.
**Overhead**: If I turn on sFlow on the interface, VPP will insert the _sflow-node_ into the
forwarding graph between `device-input` and `ethernet-input`. It means that the sFlow node will see
_every single_ packet, and it will have to move all of these into the next node, which costs about
9.5 CPU cycles per packet. The regression on L2XC is 3.8% but I have to note that VPP was not CPU
bound on the L2XC so it used some CPU cycles which were still available, before regressing
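Those numbers can be sanity-checked with simple arithmetic. Assuming a worker core around 2.5 GHz handling the full load (the clock speed is my assumption, it is not stated above):

```python
CLOCK_HZ = 2.5e9      # assumed worker clock -- not stated in the article
PPS = 10.11e6         # measured L2XC capacity with sFlow off
SFLOW_CYCLES = 9.5    # per-packet cost of the sflow-node

cycle_budget = CLOCK_HZ / PPS              # ~247 cycles available per packet
overhead_pct = 100 * SFLOW_CYCLES / cycle_budget
print(round(cycle_budget), round(overhead_pct, 1))
```

which lands right around the observed 3.8% regression.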