The switching ASIC in the dataplane (seen at the bottom of the diagram to the left) is asked to
copy 1-in-N packets to the local sFlow Agent.

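The 1-in-N part is the whole trick. Conceptually, the sampler does something like this little
Python sketch (illustrative only, not the actual ASIC logic):

```python
import random

def maybe_sample(packet: bytes, N: int = 100) -> bytes | None:
    """Return a sample for roughly 1-in-N packets, else None.

    Real ASICs typically count down a (randomized) skip counter rather than
    drawing a random number per packet, but the effect is the same.
    """
    if random.randrange(N) == 0:
        return packet[:128]  # only the first bytes; the headers are what matter
    return None
```
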
**Sampling**: The agent will copy the first N bytes (typically 128) of the packet into a sample. As
the ASIC knows which interface the packet was received on, the `inIfIndex` will be added. After a
routing decision is made, the nexthop and its L2 address and interface become known. The ASIC might
annotate the sample with this `outIfIndex` and `DstMAC` metadata as well.

**Drop Monitoring**: There's one rather clever insight that sFlow gives: what if the packet _was
not_ routed or switched, but rather discarded? For this, sFlow is able to describe the reason for
the drop. It is hard to overstate how important it is to have this so-called _drop monitoring_, as
operators can spend hours and hours figuring out _why_ packets are lost in their network or
datacenter switching fabric.

**Metadata**: The agent may have other metadata as well, such as which prefix was the source and
destination of the packet, what additional RIB information is available (AS path, BGP communities,
and so on). This may be added to the sample record as well.

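Putting these pieces together, a sample record can be thought of as the following sketch. The field
names here are mine and purely illustrative; the real record layout lives in the sFlow v5
specification:

```python
from dataclasses import dataclass, field

@dataclass
class FlowSample:
    # Illustrative only -- see the sFlow v5 spec for the actual structures.
    sampling_rate: int        # the 1-in-N rate in effect when this sample was taken
    in_if_index: int          # input interface, known at reception time
    out_if_index: int | None  # output interface, known after the forwarding decision
    dst_mac: str | None       # rewritten destination MAC, known after the FIB lookup
    drop_reason: int | None   # set instead of out_if_index when the packet was discarded
    header: bytes = b""       # first N bytes (typically 128) of the packet
    extended: dict = field(default_factory=dict)  # e.g. prefixes, AS path, communities
```
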
**Counters**: Since sFlow is sampling 1:N packets, the system can estimate total traffic in a
reasonably accurate way. Peter and Sonia wrote a succinct
[[paper](https://sflow.org/packetSamplingBasics/)] about the math, so I won't get into that here.
Mostly because I am but a software engineer, not a statistician... :) However, I will say this: if a
fraction of the traffic is sampled but the _Agent_ knows how many bytes and packets were forwarded
in total, it can provide an overview with a quantifiable accuracy. This is why the _Agent_ will
periodically get the interface counters from the ASIC.

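To make that concrete: scaling the sample count back up by the sampling rate gives the estimate,
and the paper gives (as I read it) a simple 95% confidence bound on the error. A quick sketch:

```python
from math import sqrt

def estimate_total_packets(num_samples: int, sampling_rate: int) -> float:
    """Scale 1-in-N samples back up to an estimated packet total."""
    return num_samples * sampling_rate

def pct_error_95(num_samples: int) -> float:
    """95% confidence error bound from the packet sampling paper:
    error <= 196 * sqrt(1 / num_samples) percent."""
    return 196.0 * sqrt(1.0 / num_samples)

# Example: 10'000 samples at 1:1'000 suggest ~10M packets, within about 2%.
print(estimate_total_packets(10_000, 1_000), pct_error_95(10_000))
```
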
**Collector**: One or more samples can be concatenated into UDP messages that go from the _sFlow
Agent_ to a central _sFlow Collector_. The heavy lifting in analysis is done upstream from the
switch or router, which is great for performance. Many thousands or even tens of thousands of
agents can forward their samples and interface counters to a single central collector, which in turn
can be used to draw up a near real-time picture of the state of traffic through even the largest of
ISP networks or datacenter switch fabrics.

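As a tiny illustration of the Agent to Collector direction, here's a toy collector that merely
counts sFlow datagrams arriving on UDP port 6343, the standard sFlow port. Actually decoding them
is what `sflowtool` is for:

```python
import socket

# A toy sFlow collector: receive UDP datagrams on the standard sFlow port
# and report how many arrive from each agent. No decoding is attempted.
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(("0.0.0.0", 6343))

counts: dict[str, int] = {}
while True:
    datagram, (agent, _port) = sock.recvfrom(65535)
    counts[agent] = counts.get(agent, 0) + 1
    print(f"{agent}: {counts[agent]} datagrams, last one {len(datagram)} bytes")
```
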
In sFlow parlance, [[VPP](https://fd.io/)] and its companion
[[hsflowd](https://github.com/sflow/host-sflow)] together form an _Agent_ (it sends the UDP packets
over the network), and for example the commandline tool `sflowtool` could be a _Collector_ (it
receives the UDP packets).

## Recap: sFlow in VPP

First, I have some pretty good news to report - our work on this plugin was
[[merged](https://gerrit.fd.io/r/c/vpp/+/41680)] and will be included in the VPP 25.02 release in a
few weeks! Last weekend, I gave a lightning talk at
[[FOSDEM](https://fosdem.org/2025/schedule/event/fosdem-2025-4196-vpp-monitoring-100gbps-with-sflow/)]
in Brussels, Belgium, and caught up with a lot of community members and network- and software
engineers. I had a great time.

In trying to keep the amount of code as small as possible, and therefore the probability of bugs
that might impact VPP's dataplane stability low, the architecture of the end to end solution
consists of three distinct parts, each with their own risk and performance profile:

{{< image float="left" src="/assets/sflow/sflow-vpp-overview.png" alt="sFlow VPP Overview" width="18em" >}}

processing time _away_ from the dataplane. This _sflow-process_ does two things. Firstly, it
consumes samples from the per-worker FIFO queues (both forwarded packets in green, and dropped ones
in red). Secondly, it keeps track of time and every few seconds (20 by default, but this is
configurable), it'll grab all interface counters from those interfaces for which I have sFlow
turned on. VPP produces _Netlink_ messages and sends them to the kernel.

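In pseudocode, that loop looks something like this sketch (illustrative Python, not the actual C;
the netlink and counter helpers here are stand-ins):

```python
import queue
import time

POLL_INTERVAL = 20.0  # seconds; the default, and configurable

def send_netlink(msg: dict) -> None:
    print("netlink:", msg)  # stand-in for the actual netlink send to the kernel

def read_counters(ifname: str) -> dict:
    return {"if": ifname, "rx": 0, "tx": 0}  # stand-in for reading interface counters

def sflow_process(worker_fifos: list[queue.Queue], interfaces: list[str]) -> None:
    """Illustrative main loop: drain the per-worker sample FIFOs, and every
    POLL_INTERVAL seconds ship the interface counters as well."""
    next_poll = time.monotonic() + POLL_INTERVAL
    while True:
        for fifo in worker_fifos:
            while True:
                try:
                    send_netlink({"sample": fifo.get_nowait()})
                except queue.Empty:
                    break
        if time.monotonic() >= next_poll:
            for ifname in interfaces:
                send_netlink({"counters": read_counters(ifname)})
            next_poll += POLL_INTERVAL
        time.sleep(0.001)  # don't spin
```
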
**host-sflow**: The third component is external to VPP: `hsflowd` subscribes to the _Netlink_

the open source component [[host-sflow](https://github.com/sflow/host-sflow/releases)] can be
configured as of release v2.1.11-5 [[ref](https://github.com/sflow/host-sflow/tree/v2.1.11-5)].

I will show how to configure VPP in three ways:

***1. VPP Configuration via CLI***

```
vpp0-0# sflow enable GigabitEthernet10/0/3
```

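For context, the full sequence I have in mind looks roughly like this (a sketch from memory, so
take the exact `sampling-rate`, `polling-interval` and `header-bytes` spellings with a grain of
salt):

```
vpp0-0# sflow sampling-rate 100
vpp0-0# sflow polling-interval 10
vpp0-0# sflow header-bytes 128
vpp0-0# sflow enable GigabitEthernet10/0/3
```
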
The first three commands set the global defaults - in my case I'm going to be sampling at 1:100,
which is an unusually high frequency. A production setup may take 1-in-_linkspeed-in-megabits_, so
for a 1Gbps device 1:1'000 is appropriate. For 100GE, something between 1:10'000 and 1:100'000 is
more appropriate, depending on link load. The second command sets the interface stats polling
interval. The default is to gather these statistics every 20 seconds, but I set it to 10s here.

Next, I instruct the plugin how many bytes of the sampled ethernet frame should be taken. Common
values are 64 and 128. I want enough data to see the headers, like MPLS label(s), Dot1Q tag(s), IP
header and TCP/UDP/ICMP header, but the contents of the payload are rarely interesting for
statistics purposes.

Finally, I can turn on the sFlow plugin on an interface with the `sflow enable-disable` CLI. In VPP,
an idiomatic way to turn things on and off is to have an enabler/disabler. It feels a bit clunky
maybe to write `sflow enable $iface disable`, but it makes more logical sense if you parse that as
"enable-disable" with the default being the "enable" operation, and the alternate being the
"disable" operation.

***2. VPP Configuration via API***

I implemented a few API calls for the most common operations. Here's a snippet that shows the same
calls as from the CLI above, but using these Python API calls:

```python
from vpp_papi import VPPApiClient, VPPApiJSONFiles
```

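The full program is a bit longer; as a sketch, it could look like the following. Take the exact
`sflow_*` message and field names with a grain of salt - they are assumptions modeled on the
set/get and enable/disable idioms described below:

```python
from vpp_papi import VPPApiClient, VPPApiJSONFiles

# Sketch only: connect to VPP and exercise the sFlow API. The sflow_*
# message and field names are assumptions, per the idioms in this article.
apidir = VPPApiJSONFiles.find_api_dir([])
vpp = VPPApiClient(apifiles=VPPApiJSONFiles.find_api_files(api_dir=apidir))
vpp.connect("sflow-demo")

vpp.api.sflow_sampling_rate_set(sampling_N=100)   # 1:100 sampling
print(vpp.api.sflow_sampling_rate_get())
vpp.api.sflow_polling_interval_set(polling_S=10)  # counters every 10s
print(vpp.api.sflow_polling_interval_get())
vpp.api.sflow_header_bytes_set(header_B=128)      # first 128 bytes per sample
print(vpp.api.sflow_header_bytes_get())

vpp.disconnect()
```
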
This short program toys around a bit with the sFlow API. I first set the sampling rate and retrieve
the current value. Then I set the polling interval to 10s and retrieve the current value again.
Finally, I set the header bytes to 128, and retrieve the value again.

Enabling and disabling sFlow on interfaces shows the idiom I mentioned before - the API being an
`enable_disable()` call of sorts, typically taking a boolean argument indicating whether the
operator wants to enable (the default) or disable sFlow on the interface. Getting the list of
enabled interfaces can be done with the `sflow_interface_dump()` call, which returns a list of
`sflow_interface_details` messages.

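Continuing with the `vpp` session from the sketch above, that could look like so (again, the exact
field names such as `hw_if_index` are assumptions):

```python
# Enable sFlow on an interface; per the idiom above, the boolean selects
# enable (the default) or disable. Field names here are assumptions.
vpp.api.sflow_enable_disable(hw_if_index=1, enable_disable=True)

# Which interfaces have sFlow enabled right now?
for details in vpp.api.sflow_interface_dump():
    print(details)

# And turn it off again.
vpp.api.sflow_enable_disable(hw_if_index=1, enable_disable=False)
```
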
I demonstrated VPP's Python API and how it works in a fair amount of detail in a [[previous
article]({{< ref 2024-01-27-vpp-papi >}})], in case this type of stuff interests you.

***3. VPPCfg YAML Configuration***

One question I get a lot about this plugin is: what is the performance impact when using sFlow? I
spent a considerable amount of time tinkering with this and, together with Neil, bringing the
plugin to what we both agree is the most efficient use of CPU. We could have gone a bit further,
but that would require somewhat intrusive changes to VPP's internals, and as _North of the Border_
(and the Simpsons!) would say: what we have isn't just good, it's good enough!

I've built a small testbed based on two Dell R730 machines. On the left, I have a Debian machine
running Cisco T-Rex using four quad-tengig network cards, the classic Intel X710-DA4. On the right,

```
hippo# mpls local-label add 21 eos via 100.64.4.0 TenGigabitEthernet130/0/0 out-
```

Here, the MPLS configuration implements a simple P-router, where incoming MPLS packets with label 16
will be sent back to T-Rex on Te3/0/1 to the specified IPv4 nexthop (for which I already know the
MAC address), and with label 16 removed and new label 17 imposed, in other words a SWAP operation.

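As a mental model of that SWAP operation, a tiny sketch (illustrative only; the nexthop address
below is made up):

```python
# Illustrative MPLS P-router behavior: swap the top label and forward.
# The entry mirrors the text above: label 16 in, label 17 out on Te3/0/1.
mpls_fib = {
    16: {"out_label": 17, "out_if": "Te3/0/1", "nexthop": "100.64.1.1"},
}

def p_router_swap(top_label: int):
    entry = mpls_fib.get(top_label)
    if entry is None:
        return None  # no local label: drop (and, with sFlow, report the drop)
    # SWAP: pop the incoming label, push the outgoing one, send to the nexthop.
    return entry["out_label"], entry["out_if"], entry["nexthop"]

print(p_router_swap(16))  # -> (17, 'Te3/0/1', '100.64.1.1')
```
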
***3. L2XC***

know that MPLS is a little bit more expensive computationally than IPv4, and that total capacity is
10.11Mpps when sFlow is turned off.

**Overhead**: If I turn on sFlow on the interface, VPP will insert the _sflow-node_ into the
forwarding graph between `device-input` and `ethernet-input`. It means that the sFlow node will see
_every single_ packet, and it will have to move all of these into the next node, which costs about
9.5 CPU cycles per packet. The regression on L2XC is 3.8%, but I have to note that VPP was not CPU
bound on the L2XC so it used some CPU cycles which were still available, before regressing

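A quick back-of-the-envelope to sanity-check that number (the worker clock speed is an assumption
about the test machine):

```python
cycles_per_packet = 9.5  # cost of the sflow-node passthrough, per packet
pps = 10.11e6            # L2XC capacity with sFlow turned off
clock_hz = 2.2e9         # assumed worker clock; depends on the actual CPU

extra_cycles_per_sec = cycles_per_packet * pps
print(f"{extra_cycles_per_sec / clock_hz:.1%} of one worker core")  # ~4.4%
```
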