---
date: "2023-04-09T11:01:14Z"
title: VPP - Monitoring
aliases:
- /s/articles/2023/04/09/vpp-stats.html
---

{{< image width="200px" float="right" src="/assets/vpp/fdio-color.svg" alt="VPP" >}}
|
|
|
|
# About this series
|
|
|
|
Ever since I first saw VPP - the Vector Packet Processor - I have been deeply impressed with its
performance and versatility. For those of us who have used Cisco IOS/XR devices, like the classic
_ASR_ (aggregation services router), VPP will look and feel quite familiar, as many of the
approaches are shared between the two.

I've been working on the Linux Control Plane [[ref](https://git.ipng.ch/ipng/lcpng)], which you
can read all about in my series on VPP back in 2021:

[{: style="width:300px; float: right; margin-left: 1em;"}](https://video.ipng.ch/w/erc9sAofrSZ22qjPwmv6H4)

* [[Part 1]({{< ref "2021-08-12-vpp-1" >}})]: Punting traffic through TUN/TAP interfaces into Linux
* [[Part 2]({{< ref "2021-08-13-vpp-2" >}})]: Mirroring VPP interface configuration into Linux
* [[Part 3]({{< ref "2021-08-15-vpp-3" >}})]: Automatically creating sub-interfaces in Linux
* [[Part 4]({{< ref "2021-08-25-vpp-4" >}})]: Synchronize link state, MTU and addresses to Linux
* [[Part 5]({{< ref "2021-09-02-vpp-5" >}})]: Netlink Listener, synchronizing state from Linux to VPP
* [[Part 6]({{< ref "2021-09-10-vpp-6" >}})]: Observability with LibreNMS and VPP SNMP Agent
* [[Part 7]({{< ref "2021-09-21-vpp-7" >}})]: Productionizing and reference Supermicro fleet at IPng

With this, I can make a regular server running Linux use VPP as kind of a software ASIC for super
fast forwarding, filtering, NAT, and so on, while keeping control of the interface state (links,
addresses and routes) itself. With Linux CP, running software like [FRR](https://frrouting.org/) or
[Bird](https://bird.network.cz/) on top of VPP and achieving >150Mpps and >180Gbps forwarding
rates is easily within reach. If you find that hard to believe, check out [[my DENOG14
talk](https://video.ipng.ch/w/erc9sAofrSZ22qjPwmv6H4)] or click the thumbnail above. I am
continuously surprised at the performance per watt, and the performance per Swiss Franc spent.

## Monitoring VPP

Of course, it's important to be able to see what routers are _doing_ in production. For the longest
time, the _de facto_ standard for monitoring in the networking industry has been the Simple Network
Management Protocol (SNMP), described in [[RFC 1157](https://www.rfc-editor.org/rfc/rfc1157)]. But
there's another way, using a metrics and time series system called _Borgmon_, originally designed by
Google [[ref](https://sre.google/sre-book/practical-alerting/)] but popularized by SoundCloud in an
open source interpretation called **Prometheus** [[ref](https://prometheus.io/)]. IPng Networks ♥ Prometheus.

I'm a really huge fan of Prometheus and its graphical frontend Grafana, as you can see with my work on
Mastodon in [[this article]({{< ref "2022-11-27-mastodon-3" >}})]. Join me on
[[ublog.tech](https://ublog.tech)] if you haven't joined the Fediverse yet. It's well monitored!

### SNMP

SNMP defines an extensible model by which parts of the OID (object identifier) tree can be delegated
to another process, and the main SNMP daemon will call out to it using the _AgentX_ protocol,
described in [[RFC 2741](https://datatracker.ietf.org/doc/html/rfc2741)]. In a nutshell, this
allows an external program to connect to the main SNMP daemon, register an interest in certain OIDs,
and get called whenever the SNMPd is queried for them.

{{< image width="400px" float="right" src="/assets/vpp-stats/librenms.png" alt="LibreNMS" >}}
|
|
|
|
The flow is pretty simple (see section 6.2 of the RFC, and the sketch below); the Agent (client):
1. opens a TCP or Unix domain socket to the SNMPd
1. sends an Open PDU, which the server will either accept or reject
1. (optionally) can send a Ping PDU, to which the server will respond
1. registers an interest with a Register PDU

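In practice, a library handles this handshake. Purely as an illustration -- a minimal sketch using
net-snmp's C subagent API, not the agent IPng runs in production -- the whole flow boils down to:

```
#include <net-snmp/net-snmp-config.h>
#include <net-snmp/net-snmp-includes.h>
#include <net-snmp/agent/net-snmp-agent-includes.h>

int main (void) {
  /* Run as an AgentX subagent: net-snmp opens the socket to the master
   * SNMPd and sends the Open and Register PDUs on our behalf. */
  netsnmp_ds_set_boolean (NETSNMP_DS_APPLICATION_ID, NETSNMP_DS_AGENT_ROLE, 1);
  init_agent ("vppSubagent");

  /* OID handlers would be registered here, for example with
   * netsnmp_create_handler_registration() -- omitted for brevity. */

  init_snmp ("vppSubagent");
  while (1)
    agent_check_and_process (1); /* block, serving Get/GetNext/GetBulk */
  return 0;
}
```
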
It then waits and gets called by the SNMPd with Get PDUs (to retrieve one single value), GetNext
PDUs (to enable snmpwalk), and GetBulk PDUs (to retrieve a whole subsection of the MIB), all of
which are answered by a Response PDU.

Using parts of a Python AgentX library written by GitHub user hosthvo
[[ref](https://github.com/hosthvo/pyagentx)], I tried my hand at writing one of these AgentX
clients. The resulting source code is on [[GitHub](https://git.ipng.ch/ipng/vpp-snmp-agent)].
That's the one that has been running in production ever since I started running VPP routers at IPng
Networks AS8298. After the _AgentX_ exposes the dataplane interfaces and their statistics into
_SNMP_, an open source monitoring tool such as LibreNMS [[ref](https://librenms.org/)] can discover
the routers and draw pretty graphs, as well as detect when interfaces go down, or are overloaded,
and so on. That's pretty slick.

### VPP Stats Segment in Go

But if I may offer some critique on my own approach, SNMP monitoring is _very_ 1990s. I'm
continuously surprised that our industry is still clinging to this archaic approach. VPP offers
_a lot_ of observability: its statistics segment is chock full of interesting counters and gauges
that can be really helpful to understand how the dataplane performs. If there are errors or a
bottleneck develops in the router, going over `show runtime` or `show errors` can be a life saver.
Let's take another look at that Stats Segment (the one that the SNMP AgentX connects to in order to
query it for packets/byte counters and interface names).

You can think of the Stats Segment as a directory hierarchy where each file represents a type of
counter. VPP comes with a small helper tool called VPP Stats FS, which uses a FUSE based read-only
filesystem to expose those counters in an intuitive way, so let's take a look:

```
pim@hippo:~/src/vpp/extras/vpp_stats_fs$ sudo systemctl start vpp
pim@hippo:~/src/vpp/extras/vpp_stats_fs$ sudo make start
pim@hippo:~/src/vpp/extras/vpp_stats_fs$ mount | grep stats
rawBridge on /run/vpp/stats_fs_dir type fuse.rawBridge (rw,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other)

pim@hippo:/run/vpp/stats_fs_dir$ ls -la
drwxr-xr-x 0 root root 0 Apr 9 14:07 bfd
drwxr-xr-x 0 root root 0 Apr 9 14:07 buffer-pools
drwxr-xr-x 0 root root 0 Apr 9 14:07 err
drwxr-xr-x 0 root root 0 Apr 9 14:07 if
drwxr-xr-x 0 root root 0 Apr 9 14:07 interfaces
drwxr-xr-x 0 root root 0 Apr 9 14:07 mem
drwxr-xr-x 0 root root 0 Apr 9 14:07 net
drwxr-xr-x 0 root root 0 Apr 9 14:07 node
drwxr-xr-x 0 root root 0 Apr 9 14:07 nodes
drwxr-xr-x 0 root root 0 Apr 9 14:07 sys

pim@hippo:/run/vpp/stats_fs_dir$ cat sys/boottime
1681042046.00
pim@hippo:/run/vpp/stats_fs_dir$ date +%s
1681042058
pim@hippo:~/src/vpp/extras/vpp_stats_fs$ sudo make stop
```

There's lots of really interesting stuff in here - for example in the `/sys` hierarchy we can see a
`boottime` file, and from there I can determine the uptime of the process. Further, the `/mem`
hierarchy shows the current memory usage for each of the _main_, _api_ and _stats_ segment heaps.
And of course, in the `/interfaces` hierarchy we can see all the usual packets and bytes counters
for any interface created in the dataplane.

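Using the values from the session above, subtracting `sys/boottime` from the current time gives the
dataplane's uptime - a quick shell check (my example, not part of VPP Stats FS itself):

```
pim@hippo:/run/vpp/stats_fs_dir$ echo "$(date +%s) - $(cat sys/boottime)" | bc
12.00
```
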
### VPP Stats Segment in C

I wish I were good at Go, but I never really took to the language. I'm pretty good at Python, but
sorting through the stats segment isn't super quick, as I've already noticed in the Python3 based
[[VPP SNMP Agent](https://git.ipng.ch/ipng/vpp-snmp-agent)]. I'm probably the world's least
terrible C programmer, so maybe I can take a look at the VPP Stats Client and make sense of it.
Luckily, there's an example already in `src/vpp/app/vpp_get_stats.c`, and it reveals the following
pattern (sketched in code right after this list):

1. assemble a vector of regular expression patterns in the hierarchy, or just `^/` to start
1. get a handle to the stats segment with `stat_segment_ls()` using the pattern(s)
1. use the handle to dump the stats segment into a vector with `stat_segment_dump()`
1. iterate over the returned stats structure; each element has a type and a given name:
   * ***STAT_DIR_TYPE_SCALAR_INDEX***: a single floating point double
   * ***STAT_DIR_TYPE_COUNTER_VECTOR_SIMPLE***: a single uint64 counter
   * ***STAT_DIR_TYPE_COUNTER_VECTOR_COMBINED***: two uint64 counters (packets and bytes)
1. free the used stats structure with `stat_segment_data_free()`

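Modeled on `vpp_get_stats.c`, a minimal client following this pattern could look like the sketch
below. Note the connect step to the stats socket, which the list above glosses over:

```
#include <stdio.h>
#include <vpp-api/client/stat_client.h>
#include <vppinfra/vec.h>

int main (int argc, char **argv) {
  u8 **patterns = 0;
  vec_add1 (patterns, (u8 *) "^/");

  /* 0. connect to the stats segment's unix domain socket */
  if (stat_segment_connect (STAT_SEGMENT_SOCKET_FILE) < 0)
    return 1;

  /* 1+2. resolve the patterns into a handle (a vector of indices) */
  u32 *dir = stat_segment_ls (patterns);

  /* 3. dump the stats into a vector of {name, type, counter}-tuples */
  stat_segment_data_t *res = stat_segment_dump (dir);

  /* 4. iterate -- per-type decoding is shown in the snippets below */
  for (int i = 0; i < vec_len (res); i++)
    printf ("%s (type %d)\n", res[i].name, res[i].type);

  /* 5. free the structure and disconnect */
  stat_segment_data_free (res);
  stat_segment_disconnect ();
  return 0;
}
```
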
The simple and combined stats turn out to be associative arrays, the outer of which notes the
_thread_ and the inner of which refers to the _index_. As such, a statistic of type
***VECTOR_SIMPLE*** can be decoded like so:

```
if (res[i].type == STAT_DIR_TYPE_COUNTER_VECTOR_SIMPLE)
  for (k = 0; k < vec_len (res[i].simple_counter_vec); k++)      /* k: thread */
    for (j = 0; j < vec_len (res[i].simple_counter_vec[k]); j++) /* j: index */
      printf ("[%d @ %d]: %llu packets %s\n", j, k, res[i].simple_counter_vec[k][j], res[i].name);
```

The statistic of type ***VECTOR_COMBINED*** is very similar, except the union type there is a
`combined_counter_vec[k][j]` which has a member `.packets` and a member called `.bytes`. The
simplest form, ***SCALAR_INDEX***, is just a single floating point number attached to the name.

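For completeness, the combined counter loop would be a sketch along the same lines, with the same
thread/index nesting:

```
if (res[i].type == STAT_DIR_TYPE_COUNTER_VECTOR_COMBINED)
  for (k = 0; k < vec_len (res[i].combined_counter_vec); k++)      /* k: thread */
    for (j = 0; j < vec_len (res[i].combined_counter_vec[k]); j++) /* j: index */
      printf ("[%d @ %d]: %llu packets, %llu bytes %s\n", j, k,
              res[i].combined_counter_vec[k][j].packets,
              res[i].combined_counter_vec[k][j].bytes, res[i].name);
```
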
In principle, this should be really easy to sift through and decode. Now that I've figured that
out, let me dump a bunch of stats with the `vpp_get_stats` tool that comes with vanilla VPP:

```
pim@chrma0:~$ vpp_get_stats dump /interfaces/TenGig.*40121 | grep -v ': 0'
[0 @ 2]: 67057 packets /interfaces/TenGigabitEthernet81_0_0.40121/drops
[0 @ 2]: 76125287 packets /interfaces/TenGigabitEthernet81_0_0.40121/ip4
[0 @ 2]: 1793946 packets /interfaces/TenGigabitEthernet81_0_0.40121/ip6
[0 @ 2]: 77919629 packets, 66184628769 bytes /interfaces/TenGigabitEthernet81_0_0.40121/rx
[0 @ 0]: 7 packets, 610 bytes /interfaces/TenGigabitEthernet81_0_0.40121/tx
[0 @ 1]: 26687 packets, 18771919 bytes /interfaces/TenGigabitEthernet81_0_0.40121/tx
[0 @ 2]: 6448944 packets, 3663975508 bytes /interfaces/TenGigabitEthernet81_0_0.40121/tx
[0 @ 3]: 138924 packets, 20599785 bytes /interfaces/TenGigabitEthernet81_0_0.40121/tx
[0 @ 4]: 130720342 packets, 57436383614 bytes /interfaces/TenGigabitEthernet81_0_0.40121/tx
```

I can see both types of counter at play here. Let me explain the first line: it is saying that the
counter of name `/interfaces/TenGigabitEthernet81_0_0.40121/drops`, at counter index 0, CPU thread
2, has a simple counter with value 67057. Taking the last line, this is a combined counter type with
name `/interfaces/TenGigabitEthernet81_0_0.40121/tx` at index 0; all five CPU threads (the main
thread and four worker threads) have sent traffic into this interface, and the counters for each,
in packets and bytes, are given.

For readability's sake, my `grep -v` above doesn't print any counter that is 0. For example,
interface `Te81/0/0` has only one receive queue, and it's bound to thread 2. The other threads will
not receive any packets for it, consequently their `rx` counters stay zero:

```
pim@chrma0:~/src/vpp$ vpp_get_stats dump /interfaces/TenGig.*40121 | grep rx$
[0 @ 0]: 0 packets, 0 bytes /interfaces/TenGigabitEthernet81_0_0.40121/rx
[0 @ 1]: 0 packets, 0 bytes /interfaces/TenGigabitEthernet81_0_0.40121/rx
[0 @ 2]: 80720186 packets, 68458816253 bytes /interfaces/TenGigabitEthernet81_0_0.40121/rx
[0 @ 3]: 0 packets, 0 bytes /interfaces/TenGigabitEthernet81_0_0.40121/rx
[0 @ 4]: 0 packets, 0 bytes /interfaces/TenGigabitEthernet81_0_0.40121/rx
```

### Hierarchy: Pattern Matching

I quickly discover a pattern in most of these names: they start with a scope, say `/interfaces`,
then have a path entry for the interface name, and finally a specific counter (`/rx` or `/mpls`).
This is also true for the `/nodes` hierarchy, in which all VPP's graph nodes have a set of counters:

```
pim@chrma0:~$ vpp_get_stats dump /nodes/ip4-lookup | grep -v ': 0'
[0 @ 1]: 11365675493301 packets /nodes/ip4-lookup/clocks
[0 @ 2]: 3256664129799 packets /nodes/ip4-lookup/clocks
[0 @ 3]: 28364098623954 packets /nodes/ip4-lookup/clocks
[0 @ 4]: 30198798628761 packets /nodes/ip4-lookup/clocks
[0 @ 1]: 80870763789 packets /nodes/ip4-lookup/vectors
[0 @ 2]: 17392446654 packets /nodes/ip4-lookup/vectors
[0 @ 3]: 259363625369 packets /nodes/ip4-lookup/vectors
[0 @ 4]: 298176625181 packets /nodes/ip4-lookup/vectors
[0 @ 1]: 49730112811 packets /nodes/ip4-lookup/calls
[0 @ 2]: 13035172295 packets /nodes/ip4-lookup/calls
[0 @ 3]: 109088424231 packets /nodes/ip4-lookup/calls
[0 @ 4]: 119789874274 packets /nodes/ip4-lookup/calls
```

If you've ever seen the output of `show runtime`, it looks like this:

```
vpp# show runtime
Thread 1 vpp_wk_0 (lcore 28)
Time 3377500.2, 10 sec internal node vector rate 1.46 loops/sec 3301017.05
  vector rates in 2.7440e6, out 2.7210e6, drop 3.6025e1, punt 7.2243e-5
             Name                 State         Calls          Vectors        Suspends       Clocks    Vectors/Call
...
ip4-lookup                       active     49732141978    80873724903              0        1.41e2         1.63
```

Hey look! On thread 1, which is called `vpp_wk_0` and is running on logical CPU core #28, there are
a bunch of VPP graph nodes that are all keeping stats of what they've been doing, and you can see
here that the following numbers line up between `show runtime` and the VPP Stats dumper:

* ***Name***: This is the name of the VPP graph node, in this case `ip4-lookup`, which performs an
  IPv4 FIB lookup to figure out what the L3 nexthop is of a given IPv4 packet we're trying to route.
* ***Calls***: How often did we invoke this graph node: 49.7 billion times so far.
* ***Vectors***: How many packets did we push through: 80.87 billion, humble brag.
* ***Clocks***: This one is a bit different -- you can see the cumulative clock cycles spent by
  this CPU thread in the stats dump: 11365675493301 divided by 80870763789 packets is 140.54 CPU
  cycles per packet. It's a cool interview question: "How many CPU cycles does it take to do an
  IPv4 routing table lookup?" You now know the answer :-)
* ***Vectors/Call***: This is a measure of how busy the node is (did it run for only one packet,
  or for many packets?). On average, when the worker thread gave the `ip4-lookup` node some work to
  do, there have been a total of 80873724903 packets handled in 49732141978 calls, so 1.626
  packets per call. If you're ever handling 256 packets per call (the most VPP will allow per call),
  your router will be sobbing.

### Prometheus Metrics

Prometheus has metrics which carry a name, and zero or more labels. The Prometheus query language
can then use these labels to do aggregation, division, averages, and so on. As a practical example,
above I looked at interface stats and saw that the Rx/Tx numbers were counted one per thread. If
we'd like the total on the interface, it would be great if we could `sum without (thread,index)`,
which will have the effect of adding all of these numbers together.

For the monotonically increasing counter numbers (like the total vectors/calls/clocks per node), we
can take the running _rate_ of change, showing the time spent over the last minute or so. This way,
spikes in traffic will clearly correlate both with a spike in packets/sec or bytes/sec on the
interface, but also with a higher number of _vectors/call_, and correspondingly typically a lower
number of _clocks/vector_, as VPP gets more efficient when it can re-use the CPU's instruction and
data cache to do repeat work on multiple packets.

I decide to massage the statistic names a little bit, transforming them into the basic format:
`prefix_suffix{label="X",index="A",thread="B"} value`

A few examples:
* The single counter that looks like `[6 @ 0]: 994403888 packets /mem/main heap` becomes:
  * `mem{heap="main heap",index="6",thread="0"} 994403888`
* The combined counter `[0 @ 1]: 79582338270 packets, 16265349667188 bytes /interfaces/Te1_0_2/rx`
  becomes:
  * `interfaces_rx_packets{interface="Te1_0_2",index="0",thread="1"} 79582338270`
  * `interfaces_rx_bytes{interface="Te1_0_2",index="0",thread="1"} 16265349667188`
* The node information running on, say thread 4, becomes:
  * `nodes_clocks{node="ip4-lookup",index="0",thread="4"} 30198798628761`
  * `nodes_vectors{node="ip4-lookup",index="0",thread="4"} 298176625181`
  * `nodes_calls{node="ip4-lookup",index="0",thread="4"} 119789874274`
  * `nodes_suspends{node="ip4-lookup",index="0",thread="4"} 0`

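With these names and labels in place, the aggregation I mentioned earlier becomes a one-liner. For
example, the total bits/sec received on each interface, summed over all threads and indices, could
be queried as (a sketch, using the metric names above):

```
sum without (thread, index) (rate(interfaces_rx_bytes[60s])) * 8
```
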
### VPP Exporter

I wish I had things like `split()` and `re.match()` in C (well, I guess I do have POSIX regular
expressions...), but it's all a little bit more low level. Based on my basic loop that opens the
stats segment, registers its desired patterns, and then retrieves a vector of {name, type,
counter}-tuples, I decide to do a little bit of non-intrusive string tokenization first:

```
static int tokenize (const char *str, char delimiter, char **tokens, int *lengths) {
  char *p = (char *) str;
  char *savep = p;
  int i = 0;

  /* Walk the string; at every delimiter, emit a {ptr, len}-tuple pointing
   * into the original (unmodified) string. */
  while (*p) {
    if (*p == delimiter) {
      tokens[i] = (char *) savep;
      lengths[i] = (int) (p - savep);
      i++; p++; savep = p;
    } else p++;
  }
  /* Emit the trailing token after the last delimiter. */
  tokens[i] = (char *) savep;
  lengths[i] = (int) (p - savep);
  return i + 1; /* the number of tokens, including the final one */
}

/* The call site */
char *tokens[10];
int lengths[10];
int num_tokens = tokenize (res[i].name, '/', tokens, lengths);
```

The tokenizer takes an array of N pointers to the resulting tokens, and their lengths. This sets it
apart from `strtok()` and friends, because those will overwrite the occurrences of the delimiter in
the input string with `\0`, and as such cannot take a `const char *str` as input. This one leaves
the string alone, and will return the tokens as {ptr, len}-tuples, including how many
tokens it found.

One thing I'll probably regret is that there's no bounds checking on the number of tokens -- if a
name ever has more than 10 of them, I'll be in trouble. But for now, the depth of the hierarchy is
only 3, so I should be fine. Besides, I got into a fight with ChatGPT after it declared a romantic
interest in my cat, so it won't write code for me anymore :-(

But using this simple tokenizer, and knowledge of the structure of well known hierarchy paths, the
rest of the exporter is quickly in hand. Some variables don't have a label (for example
`/sys/boottime`), but those that do will see that field transposed from the directory path
(say, `/mem/main heap/free`) into the label, as I showed above.

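To make that transposition concrete, here's a hedged sketch building on the `tokenize()` helper
above -- `emit_mem_metric()` is a hypothetical illustration, not code from the actual exporter:

```
/* Hypothetical sketch: turn "/mem/main heap/free" plus a scalar value into a
 * Prometheus line like: mem_free{heap="main heap",index="6",thread="0"} 994403888 */
static void emit_mem_metric (const char *name, int index, int thread, double value) {
  char *tokens[10];
  int lengths[10];
  int n = tokenize (name, '/', tokens, lengths);

  /* tokens[0] is empty (leading '/'), tokens[1]="mem",
   * tokens[2]="main heap", tokens[3]="free" */
  if (n == 4 && lengths[1] == 3 && !strncmp (tokens[1], "mem", 3))
    printf ("mem_%.*s{heap=\"%.*s\",index=\"%d\",thread=\"%d\"} %.0f\n",
            lengths[3], tokens[3], lengths[2], tokens[2], index, thread, value);
}
```
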
### Results

{{< image width="400px" float="right" src="/assets/vpp-stats/grafana1.png" alt="Grafana 1" >}}

With this VPP Prometheus Exporter, I can now hook the VPP routers up to Prometheus and Grafana.
Aggregations in Grafana are super easy and scalable, due to the conversion of the static paths into
dynamically created labels on the Prometheus metric names.

Drawing a graph of the running time spent by each individual VPP graph node might look something
like this:

```
sum without (index, thread)(rate(nodes_clocks[60s]))
/
sum without (index, thread)(rate(nodes_vectors[60s]))
```

The plot to the right shows a system under a loadtest that ramps up from 0% to 100% of line rate,
and the traces are the cumulative time spent in each node (on a logarithmic scale). The top purple
line represents `dpdk-input`. When a VPP dataplane is idle, the worker threads will be repeatedly
polling DPDK to ask it if it has something to do, spending 100% of their time being told "there is
nothing for you to do". But, once load starts appearing, the other nodes start spending CPU time,
for example the chain of IPv4 forwarding is `ethernet-input`, `ip4-input`, `ip4-lookup`, followed by
`ip4-rewrite`, and ultimately the packet is transmitted on some other interface. When the system is
lightly loaded, the `ethernet-input` node for example will spend 1100 or so CPU cycles per packet,
but when the machine is under higher load, the time spent will decrease to as low as 22 CPU cycles
per packet. This is true for almost all of the nodes - VPP gets relatively _more efficient_ under
load.

{{< image width="400px" float="right" src="/assets/vpp-stats/grafana2.png" alt="Grafana 2" >}}
|
|
|
|
Another cool graph, one that I won't be able to see when using only LibreNMS and SNMP polling, is
how busy the router is. In VPP, each dispatch of the worker loop will poll DPDK and dispatch the
packets through the directed graph of nodes that I showed above. But how many packets can be moved
through the graph per dispatch? The largest number of packets that VPP will ever offer into a call
of the nodes is 256. Typically an unloaded machine will have an average number of Vectors/Call of
around 1.00. When the worker thread is loaded, it may sit at around 130-150 Vectors/Call. If it's
saturated, it will quickly shoot up to 256.

As a good approximation, Vectors/Call normalized to 100% will be an indication of how busy the
dataplane is. In the picture above, between 10:30 and 11:00 my test router was pushing about 180Gbps
of traffic, but with large packets, so its total vectors/call was modest (roughly 35-40), which you
can see as all threads there running in the ~25% load range. Then at 11:00 a few threads got
hotter, and one of them completely saturated; the traffic being forwarded by that CPU thread was
suffering _packetlo_, even though the others were absolutely fine... forwarding 150Mpps on a
10-year-old Dell R720!

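Expressed in PromQL, that normalized per-thread load percentage might look something like the sketch
below (my own approximation, averaging vectors/call over all nodes on each thread):

```
100 / 256 *
  sum without (node, index) (rate(nodes_vectors[60s]))
/ sum without (node, index) (rate(nodes_calls[60s]))
```
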
### What's Next

Together with the graph above, I can also see how many CPU cycles are spent in which
type of operation. For example, encapsulation of GENEVE or VxLAN is not _free_, although it's also
not very expensive. If I know how many CPU cycles are available - roughly the clock speed of the
CPU threads, in our case Xeon D-1518 (2.2GHz) or Xeon E5-2683 v4 (3GHz) CPUs - I can pretty
accurately calculate what a given mix of traffic and features is going to cost, and how many
packets/sec our routers at IPng will be able to forward. Spoiler alert: it's way more than currently
needed. Our Supermicros can handle roughly 35Mpps each, and considering a regular mixture of
internet traffic (called _imix_) is about 3Mpps per 10G, I will have room to spare for the time
being.

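A back-of-the-envelope sketch of such a calculation (the ~700 cycles/packet total here is a
hypothetical number for a full IPv4 forwarding path, not a measurement from this article):

```
  2.2e9 cycles/sec (one Xeon D-1518 worker thread)
/ ~700 cycles/packet (dpdk-input + ethernet-input + ip4-input
                      + ip4-lookup + ip4-rewrite + interface tx)
= ~3.1 Mpps per worker thread
```
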
This is super valuable information for folks running VPP in production.
I haven't put the finishing touches on the VPP Prometheus Exporter yet: for example, there are no
commandline flags, and it doesn't listen on any port other than 9482 (the same one that the toy
exporter in `src/vpp/app/vpp_prometheus_export.c` ships with
[[ref](https://github.com/prometheus/prometheus/wiki/Default-port-allocations)]). My Grafana
dashboard is also not fully completed yet. I hope to get that done in April, and publish both the
exporter and the dashboard on GitHub. Stay tuned!