Some readability fixes

This commit is contained in:
2025-02-09 18:30:38 +01:00
parent bcbb119b20
commit 4ac8c47127

@ -38,10 +38,10 @@ and is homed on the sFlow.org website [[ref](https://sflow.org/sflow_version_5.t
The switching ASIC in the dataplane (seen at the bottom of the diagram to the left) is asked to copy
1-in-N packets to the local sFlow Agent.
**Sampling**: The agent will copy the first N bytes (typically 128) of the packet into a sample. As
the ASIC knows which interface the packet was received on, the `inIfIndex` will be added. After a
routing decision is made, the nexthop and its L2 address and interface become known. The ASIC might
annotate the sample with this `outIfIndex` and `DstMAC` metadata as well.
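To make the mechanics concrete, here's a small hypothetical Python sketch of what 1-in-N sampling
with header truncation boils down to. The field names (`in_if_index`, `out_if_index`, `dst_mac`)
and the random-skip approach are my own illustration, not the plugin's actual data structures:

```python
import random

SAMPLING_RATE = 1000   # 1-in-N: on average one sample per 1000 packets
HEADER_BYTES = 128     # how much of the frame to keep in the sample

def maybe_sample(frame: bytes, in_if_index: int, out_if_index: int, dst_mac: str):
    """Return an (illustrative) sample record for roughly 1-in-N frames, else None."""
    if random.randrange(SAMPLING_RATE) != 0:
        return None
    return {
        "header": frame[:HEADER_BYTES],   # truncated copy of the packet
        "frame_length": len(frame),       # original length, needed for scaling later
        "in_if_index": in_if_index,
        "out_if_index": out_if_index,
        "dst_mac": dst_mac,
    }
```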
**Drop Monitoring**: There's one rather clever insight that sFlow gives: what if the packet _was
not_ routed or switched, but rather discarded? For this, sFlow is able to describe the reason for
@ -53,27 +53,28 @@ to overstate how important it is to have this so-called _drop monitoring_, as op
hours and hours figuring out _why_ packets are lost in their network or datacenter switching fabric.
**Metadata**: The agent may have other metadata as well, such as which prefix was the source and
destination of the packet, what additional RIB information is available (AS path, BGP communities,
and so on). This may be added to the sample record as well.
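As an illustration only (the exact record layout is defined by the sFlow v5 spec, not by this
sketch), such routing metadata could be attached to a sample roughly like this:

```python
def annotate_with_rib(sample: dict, rib_entry: dict) -> dict:
    """Attach routing metadata to a sample record.

    The keys are loosely modeled on sFlow's extended router/gateway structures;
    treat them as illustrative, not as the wire format."""
    sample["src_prefix"] = rib_entry.get("src_prefix")    # e.g. "192.0.2.0/24"
    sample["dst_prefix"] = rib_entry.get("dst_prefix")
    sample["as_path"] = rib_entry.get("as_path", [])      # e.g. [65001, 65010]
    sample["communities"] = rib_entry.get("communities", [])
    return sample
```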
**Counters**: Since sFlow is sampling 1:N packets, the system can estimate total traffic in a
reasonably accurate way. Peter and Sonia wrote a succinct
[[paper](https://sflow.org/packetSamplingBasics/)] about the math, so I won't get into that here.
Mostly because I am but a software engineer, not a statistician... :) However, I will say this: if a
fraction of the traffic is sampled but the _Agent_ knows how many bytes and packets were forwarded
in total, it can provide an overview with a quantifiable accuracy. This is why the _Agent_ will
periodically get the interface counters from the ASIC.
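For a feel of the numbers, here's a back-of-the-envelope sketch: scaling the sampled counts by the
sampling rate gives an estimate of the total, and the error bound is the rule of thumb I recall
from the sFlow packet-sampling paper (roughly 196·√(1/c) percent at 95% confidence for c samples).
The figures themselves are made up:

```python
import math

sampling_rate = 1000           # 1-in-1000
samples_seen = 25_000          # flow samples received for some traffic class
bytes_in_samples = 19_000_000  # sum of frame_length over those samples

# Scale up by the sampling rate to estimate totals.
est_packets = samples_seen * sampling_rate
est_bytes = bytes_in_samples * sampling_rate

# Rule-of-thumb 95% confidence bound from the sFlow sampling paper.
pct_error = 196 * math.sqrt(1 / samples_seen)

print(f"~{est_packets:,} packets, ~{est_bytes:,} bytes, within about {pct_error:.1f}%")
```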
**Collector**: One or more samples can be concatenated into UDP messages that go from the _sFlow
Agent_ to a central _sFlow Collector_. The heavy lifting in analysis is done upstream from the
switch or router, which is great for performance. Many thousands or even tens of thousands of
agents can forward their samples and interface counters to a single central collector, which in turn
can be used to draw up a near real time picture of the state of traffic through even the largest of
ISP networks or datacenter switch fabrics.
In sFlow parlance [[VPP](https://fd.io/)] and its companion
[[hsflowd](https://github.com/sflow/host-sflow)] together form an _Agent_ (it sends the UDP packets
over the network), and for example the commandline tool `sflowtool` could be a _Collector_ (it
receives the UDP packets).
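To see what a _Collector_ minimally has to do, here's a tiny Python sketch that listens on UDP port
6343 (the customary sFlow port) and counts incoming datagrams. A real collector like `sflowtool` of
course decodes the XDR-encoded sFlow v5 payload; this sketch deliberately does not:

```python
import socket

# Bind to the customary sFlow collector port.
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(("0.0.0.0", 6343))

datagrams = 0
while True:
    payload, (src_ip, _) = sock.recvfrom(65535)
    datagrams += 1
    # A real collector would XDR-decode the sFlow v5 payload here.
    print(f"datagram {datagrams}: {len(payload)} bytes from {src_ip}")
```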
## Recap: sFlow in VPP
@ -81,13 +82,12 @@ First, I have some pretty good news to report - our work on this plugin was
[[merged](https://gerrit.fd.io/r/c/vpp/+/41680)] and will be included in the VPP 25.02 release in a
few weeks! Last weekend, I gave a lightning talk at
[[FOSDEM](https://fosdem.org/2025/schedule/event/fosdem-2025-4196-vpp-monitoring-100gbps-with-sflow/)]
in Brussels, Belgium, and caught up with a lot of community members and network- and software
engineers. I had a great time.
In trying to keep the amount of code as small as possible, and therefore the probability of bugs
that might impact VPP's dataplane stability low, the architecture of the end-to-end solution
consists of three distinct parts, each with their own risk and performance profile:
{{< image float="left" src="/assets/sflow/sflow-vpp-overview.png" alt="sFlow VPP Overview" width="18em" >}}
@ -104,7 +104,7 @@ get their fair share of samples into the Agent's hands.
processing time _away_ from the dataplane. This _sflow-process_ does two things. Firstly, it
consumes samples from the per-worker FIFO queues (both forwarded packets in green, and dropped ones
in red). Secondly, it keeps track of time and every few seconds (20 by default, but this is
configurable), it'll grab all interface counters from those interfaces for which I have sFlow
turned on. VPP produces _Netlink_ messages and sends them to the kernel.
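Conceptually, that _sflow-process_ is a small event loop. The Python sketch below is my own
illustration of its shape (drain the per-worker FIFOs, and on a timer, poll counters); it is not
the actual C implementation inside the plugin, and the helper functions are placeholders:

```python
import queue
import time

POLL_INTERVAL = 20.0   # seconds; 20 by default, configurable

def emit_sample(sample) -> None:
    """Placeholder: in VPP this heads towards the kernel and on to hsflowd."""
    print("sample:", sample)

def emit_counters(sw_if_index: int) -> None:
    """Placeholder: in VPP this becomes a Netlink message carrying interface counters."""
    print("counters for interface", sw_if_index)

def sflow_process(worker_fifos: list[queue.Queue], enabled_interfaces: list[int]) -> None:
    """Illustrative main loop: drain per-worker sample FIFOs, poll counters on a timer."""
    next_poll = time.monotonic() + POLL_INTERVAL
    while True:
        for fifo in worker_fifos:                 # forwarded and dropped samples
            while not fifo.empty():
                emit_sample(fifo.get_nowait())
        if time.monotonic() >= next_poll:         # every POLL_INTERVAL seconds
            for sw_if_index in enabled_interfaces:
                emit_counters(sw_if_index)
            next_poll += POLL_INTERVAL
        time.sleep(0.01)
```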
**host-sflow**: The third component is external to VPP: `hsflowd` subscribes to the _Netlink_
@ -132,7 +132,7 @@ turns on sampling at a given rate on physical devices, also known as _hardware-i
the open source component [[host-sflow](https://github.com/sflow/host-sflow/releases)] can be
configured as of release v2.1.11-5 [[ref](https://github.com/sflow/host-sflow/tree/v2.1.11-5)].
I will show how to configure VPP in three ways:
***1. VPP Configuration via CLI***
@ -148,22 +148,26 @@ vpp0-0# sflow enable GigabitEthernet10/0/3
```
The first three commands set the global defaults - in my case I'm going to be sampling at 1:100
which is unusually high frequency. A production setup may take 1-in-_linkspeed-in-megabits_ so for a
1Gbps device 1:1'000 is appropriate. For 100GE, something between 1:10'000 and 1:100'000 is more
appropriate, depending on link load. The second command sets the interface stats polling interval.
The default is to gather these statistics every 20 seconds, but I set it to 10s here.

Next, I instruct the plugin how many bytes of the sampled ethernet frame should be taken. Common
values are 64 and 128. I want enough data to see the headers, like MPLS label(s), Dot1Q tag(s), IP
header and TCP/UDP/ICMP header, but the contents of the payload are rarely interesting for
statistics purposes.

Finally, I can turn on the sFlow plugin on an interface with the `sflow enable-disable` CLI. In VPP,
an idiomatic way to turn on and off things is to have an enabler/disabler. It feels a bit clunky
maybe to write `sflow enable $iface disable` but it makes more logical sense if you parse that as
"enable-disable" with the default being the "enable" operation, and the alternate being the
"disable" operation.
***2. VPP Configuration via API***
I implemented a few API calls for the most common operations. Here's a snippet that shows the same
calls as from the CLI above, but using these Python API calls:
```python
from vpp_papi import VPPApiClient, VPPApiJSONFiles
@ -209,13 +213,14 @@ This short program toys around a bit with the sFlow API. I first set the samplin
the current value. Then I set the polling interval to 10s and retrieve the current value again.
Finally, I set the header bytes to 128, and retrieve the value again.
Enabling and disabling sFlow on interfaces shows the idiom I mentioned before - the API being an
`enable_disable()` call of sorts, typically taking a boolean argument indicating whether the
operator wants to enable (the default) or disable sFlow on the interface. Getting the list of
enabled interfaces can be done with the `sflow_interface_dump()` call, which returns a list of
`sflow_interface_details` messages.
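For instance, listing the sFlow-enabled interfaces with `vpp_papi` might look like the sketch
below. The connection boilerplate follows the pattern from the article referenced next; the API
socket path and the exact fields of the `sflow_interface_details` reply (I'm assuming
`sw_if_index`) depend on your VPP build, so treat it as a starting point rather than gospel:

```python
from vpp_papi import VPPApiClient, VPPApiJSONFiles

# Locate the API JSON definitions and connect; adjust paths for your installation.
apidir = VPPApiJSONFiles.find_api_dir([])
apifiles = VPPApiJSONFiles.find_api_files(apidir)
vpp = VPPApiClient(apifiles=apifiles, server_address="/run/vpp/api.sock")
vpp.connect("sflow-lister")

# sflow_interface_dump() returns one sflow_interface_details per enabled interface.
for details in vpp.api.sflow_interface_dump():
    print("sFlow is enabled on sw_if_index", details.sw_if_index)

vpp.disconnect()
```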
I demonstrated VPP's Python API and how it works in a fair amount of detail in a [[previous
article]({{< ref 2024-01-27-vpp-papi >}})], in case this type of stuff interests you.
***3. VPPCfg YAML Configuration***
@ -667,9 +672,9 @@ booyah!
One question I get a lot about this plugin is: what is the performance impact when using
sFlow? I spent a considerable amount of time tinkering with this, and together with Neil brought
the plugin to what we both agree is the most efficient use of CPU. We could have gone a bit further,
but that would require somewhat intrusive changes to VPP's internals and as _North of the Border_
(and the Simpsons!) would say: what we have isn't just good, it's good enough!
I've built a small testbed based on two Dell R730 machines. On the left, I have a Debian machine
running Cisco T-Rex using four quad-tengig network cards, the classic Intel X710-DA4. On the right,
@ -754,7 +759,7 @@ hippo# mpls local-label add 21 eos via 100.64.4.0 TenGigabitEthernet130/0/0 out-
```
Here, the MPLS configuration implements a simple P-router, where incoming MPLS packets with label 16
will be sent back to T-Rex on Te3/0/1 to the specified IPv4 nexthop (for which I already know the
MAC address), and with label 16 removed and new label 17 imposed, in other words a SWAP operation.
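For reference, the kind of labeled test packet T-Rex sends for this case can be sketched with
Scapy; the MAC and IP addresses below are placeholders and the streams in my actual T-Rex profile
differ:

```python
from scapy.all import Ether, IP, UDP
from scapy.contrib.mpls import MPLS

# A labeled frame as the P-router sees it arrive: outer label 16, bottom-of-stack.
pkt = (
    Ether(dst="00:00:00:00:00:01")              # placeholder router MAC
    / MPLS(label=16, s=1, ttl=64)
    / IP(src="16.0.0.1", dst="48.0.0.1")        # placeholder addresses
    / UDP(sport=1024, dport=12)
    / ("x" * 64)
)
pkt.show()
# After the SWAP, the router re-sends the frame with label 17 towards the nexthop.
```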
***3. L2XC***
@ -812,7 +817,7 @@ know that MPLS is a little bit more expensive computationally than IPv4, and tha
total capacity is 10.11Mpps when sFlow is turned off.
**Overhead**: If I turn on sFlow on the interface, VPP will insert the _sflow-node_ into the
forwarding graph between `device-input` and `ethernet-input`. It means that the sFlow node will see
_every single_ packet, and it will have to move all of these into the next node, which costs about
9.5 CPU cycles per packet. The regression on L2XC is 3.8% but I have to note that VPP was not CPU
bound on the L2XC so it used some CPU cycles which were still available, before regressing