Compare commits: 26397d69c6...main (28 commits)

fdb77838b8
6d3f4ac206
baa3e78045
0972cf4aa1
4f81d377a0
153048eda4
4aa5745d06
7d3f617966
8918821413
9783c7d39c
af68c1ec3b
0baadb5089
3b7e576d20
d0a7cdbe38
ed087f3fc6
51e6c0e1c2
8a991bee47
d9e2f407e7
01820776af
d5d4f7ff55
2a61bdc028
c2b8eef4f4
533cca0108
4ac8c47127
bcbb119b20
ce6e6cde22
610835925b
16ac42bad9
@@ -8,9 +8,9 @@ steps:
       - git lfs install
       - git lfs pull
   - name: build
-    image: git.ipng.ch/ipng/drone-hugo:release-0.134.3
+    image: git.ipng.ch/ipng/drone-hugo:release-0.145.1
     settings:
-      hugo_version: 0.134.3
+      hugo_version: 0.145.0
       extended: true
   - name: rsync
     image: drillster/drone-rsync
@@ -89,7 +89,7 @@ lcp lcp-sync off
 ```

 The prep work for the rest of the interface syncer starts with this
-[[commit](https://github.com/pimvanpelt/lcpng/commit/2d00de080bd26d80ce69441b1043de37e0326e0a)], and
+[[commit](https://git.ipng.ch/ipng/lcpng/commit/2d00de080bd26d80ce69441b1043de37e0326e0a)], and
 for the rest of this blog post, the behavior will be in the 'on' position.

 ### Change interface: state
@@ -120,7 +120,7 @@ the state it was. I did notice that you can't bring up a sub-interface if its pa
 is down, which I found counterintuitive, but that's neither here nor there.

 All of this is to say that we have to be careful when copying state forward, because as
-this [[commit](https://github.com/pimvanpelt/lcpng/commit/7c15c84f6c4739860a85c599779c199cb9efef03)]
+this [[commit](https://git.ipng.ch/ipng/lcpng/commit/7c15c84f6c4739860a85c599779c199cb9efef03)]
 shows, issuing `set int state ... up` on an interface won't touch its sub-interfaces in VPP, but
 the subsequent netlink message to bring the _LIP_ for that interface up **will** update the
 children, thus desynchronising Linux and VPP: Linux will have the interface **and all its
@@ -128,7 +128,7 @@ sub-interfaces** up unconditionally; VPP will have the interface up and its sub-
 whatever state they were before.

 To address this, a second
-[[commit](https://github.com/pimvanpelt/lcpng/commit/a3dc56c01461bdffcac8193ead654ae79225220f)] was
+[[commit](https://git.ipng.ch/ipng/lcpng/commit/a3dc56c01461bdffcac8193ead654ae79225220f)] was
 needed. I'm not too sure I want to keep this behavior, but for now, it results in an intuitive
 end-state, which is that all interface states are exactly the same between Linux and VPP.

@@ -157,7 +157,7 @@ DBGvpp# set int state TenGigabitEthernet3/0/0 up
 ### Change interface: MTU

 Finally, a straightforward
-[[commit](https://github.com/pimvanpelt/lcpng/commit/39bfa1615fd1cafe5df6d8fc9d34528e8d3906e2)], or
+[[commit](https://git.ipng.ch/ipng/lcpng/commit/39bfa1615fd1cafe5df6d8fc9d34528e8d3906e2)], or
 so I thought. When the MTU changes in VPP (with `set interface mtu packet N <int>`), there is a
 callback that can be registered which copies this into the _LIP_. I did notice a specific corner
 case: in VPP, a sub-interface can have a larger MTU than its parent. In Linux, this cannot happen,
@@ -179,7 +179,7 @@ higher than that, perhaps logging an error explaining why. This means two things
 1. Any change in VPP of a parent MTU should ensure all children are clamped to at most that.

 I addressed the issue in this
-[[commit](https://github.com/pimvanpelt/lcpng/commit/79a395b3c9f0dae9a23e6fbf10c5f284b1facb85)].
+[[commit](https://git.ipng.ch/ipng/lcpng/commit/79a395b3c9f0dae9a23e6fbf10c5f284b1facb85)].

 ### Change interface: IP Addresses

@@ -199,7 +199,7 @@ VPP into the companion Linux devices:
 _LIP_ with `lcp_itf_set_interface_addr()`.

 This means with this
-[[commit](https://github.com/pimvanpelt/lcpng/commit/f7e1bb951d648a63dfa27d04ded0b6261b9e39fe)], at
+[[commit](https://git.ipng.ch/ipng/lcpng/commit/f7e1bb951d648a63dfa27d04ded0b6261b9e39fe)], at
 any time a new _LIP_ is created, the IPv4 and IPv6 addresses on the VPP interface are fully copied
 over by the third change, while at runtime, new addresses can be set/removed as well by the first
 and second change.
@@ -100,7 +100,7 @@ linux-cp {

 Based on this config, I set the startup default in `lcp_set_lcp_auto_subint()`, but I realize that
 an administrator may want to turn it on/off at runtime, too, so I add a CLI getter/setter that
-interacts with the flag in this [[commit](https://github.com/pimvanpelt/lcpng/commit/d23aab2d95aabcf24efb9f7aecaf15b513633ab7)]:
+interacts with the flag in this [[commit](https://git.ipng.ch/ipng/lcpng/commit/d23aab2d95aabcf24efb9f7aecaf15b513633ab7)]:

 ```
 DBGvpp# show lcp
@@ -116,11 +116,11 @@ lcp lcp-sync off
 ```

 The prep work for the rest of the interface syncer starts with this
-[[commit](https://github.com/pimvanpelt/lcpng/commit/2d00de080bd26d80ce69441b1043de37e0326e0a)], and
+[[commit](https://git.ipng.ch/ipng/lcpng/commit/2d00de080bd26d80ce69441b1043de37e0326e0a)], and
 for the rest of this blog post, the behavior will be in the 'on' position.

 The code for the configuration toggle is in this
-[[commit](https://github.com/pimvanpelt/lcpng/commit/934446dcd97f51c82ddf133ad45b61b3aae14b2d)].
+[[commit](https://git.ipng.ch/ipng/lcpng/commit/934446dcd97f51c82ddf133ad45b61b3aae14b2d)].

 ### Auto create/delete sub-interfaces

@@ -145,7 +145,7 @@ I noticed that interface deletion had a bug (one that I fell victim to as well:
 remove the netlink device in the correct network namespace), which I fixed.

 The code for the auto create/delete and the bugfix is in this
-[[commit](https://github.com/pimvanpelt/lcpng/commit/934446dcd97f51c82ddf133ad45b61b3aae14b2d)].
+[[commit](https://git.ipng.ch/ipng/lcpng/commit/934446dcd97f51c82ddf133ad45b61b3aae14b2d)].

 ### Further Work

@@ -154,7 +154,7 @@ For now, `lcp_nl_dispatch()` just throws the message away after logging it with
 a function that will come in very useful as I start to explore all the different Netlink message types.

 The code that forms the basis of our Netlink Listener lives in [[this
-commit](https://github.com/pimvanpelt/lcpng/commit/c4e3043ea143d703915239b2390c55f7b6a9b0b1)] and
+commit](https://git.ipng.ch/ipng/lcpng/commit/c4e3043ea143d703915239b2390c55f7b6a9b0b1)] and
 specifically, here I want to call out that I was not the primary author; I worked off of Matt and Neale's
 awesome work in this pending [Gerrit](https://gerrit.fd.io/r/c/vpp/+/31122).

@@ -182,7 +182,7 @@ Linux interface VPP is not aware of. But, if I can find the _LIP_, I can convert
 add or remove the ip4/ip6 neighbor adjacency.

 The code for this first Netlink message handler lives in this
-[[commit](https://github.com/pimvanpelt/lcpng/commit/30bab1d3f9ab06670fbef2c7c6a658e7b77f7738)]. An
+[[commit](https://git.ipng.ch/ipng/lcpng/commit/30bab1d3f9ab06670fbef2c7c6a658e7b77f7738)]. An
 ironic insight is that after writing the code, I don't think any of it will be necessary, because
 the interface plugin will already copy ARP and IPv6 ND packets back and forth and itself update its
 neighbor adjacency tables; but I'm leaving the code in for now.
@@ -197,7 +197,7 @@ it or remove it, and if there are no link-local addresses left, disable IPv6 on
 There are also a few multicast routes to add (notably 224.0.0.0/24 and ff00::/8, all-local-subnet).

 The code for IP address handling is in this
-[[commit]](https://github.com/pimvanpelt/lcpng/commit/87742b4f541d389e745f0297d134e34f17b5b485), but
+[[commit]](https://git.ipng.ch/ipng/lcpng/commit/87742b4f541d389e745f0297d134e34f17b5b485), but
 when I took it out for a spin, I noticed something curious, looking at the log lines that are
 generated for the following sequence:

@@ -236,7 +236,7 @@ interface and directly connected route addition/deletion is slightly different i
 So, I decide to take a little shortcut -- if an addition returns "already there", or a deletion returns
 "no such entry", I'll just consider it a successful addition and deletion respectively, saving my eyes
 from being screamed at by this red error message. I changed that in this
-[[commit](https://github.com/pimvanpelt/lcpng/commit/d63fbd8a9a612d038aa385e79a57198785d409ca)],
+[[commit](https://git.ipng.ch/ipng/lcpng/commit/d63fbd8a9a612d038aa385e79a57198785d409ca)],
 turning this situation into a friendly green notice instead.

 ### Netlink: Link (existing)
@@ -267,7 +267,7 @@ To avoid this loop, I temporarily turn off `lcp-sync` just before handling a bat
 turn it back to its original state when I'm done with that.

 The code for add/del of existing links is in this
-[[commit](https://github.com/pimvanpelt/lcpng/commit/e604dd34784e029b41a47baa3179296d15b0632e)].
+[[commit](https://git.ipng.ch/ipng/lcpng/commit/e604dd34784e029b41a47baa3179296d15b0632e)].

 ### Netlink: Link (new)

@@ -276,7 +276,7 @@ doesn't have a _LIP_ for, but specifically describes a VLAN interface? Well, th
 is trying to create a new sub-interface. And supporting that operation would be super cool, so let's go!

 Using the earlier placeholder hint in `lcp_nl_link_add()` (see the previous
-[[commit](https://github.com/pimvanpelt/lcpng/commit/e604dd34784e029b41a47baa3179296d15b0632e)]),
+[[commit](https://git.ipng.ch/ipng/lcpng/commit/e604dd34784e029b41a47baa3179296d15b0632e)]),
 I know that I've gotten a NEWLINK request but the Linux ifindex doesn't have a _LIP_. This could be
 because the interface is entirely foreign to VPP, for example somebody created a dummy interface or
 a VLAN sub-interface on one:
@@ -331,7 +331,7 @@ a boring `<phy>.<subid>` name.

 Alright, without further ado, the code for the main innovation here, the implementation of
 `lcp_nl_link_add_vlan()`, is in this
-[[commit](https://github.com/pimvanpelt/lcpng/commit/45f408865688eb7ea0cdbf23aa6f8a973be49d1a)].
+[[commit](https://git.ipng.ch/ipng/lcpng/commit/45f408865688eb7ea0cdbf23aa6f8a973be49d1a)].

 ## Results

@@ -118,7 +118,7 @@ or Virtual Routing/Forwarding domains). So first, I need to add these:

 All of this code was heavily inspired by the pending [[Gerrit](https://gerrit.fd.io/r/c/vpp/+/31122)]
 but a few finishing touches were added, and wrapped up in this
-[[commit](https://github.com/pimvanpelt/lcpng/commit/7a76498277edc43beaa680e91e3a0c1787319106)].
+[[commit](https://git.ipng.ch/ipng/lcpng/commit/7a76498277edc43beaa680e91e3a0c1787319106)].

 ### Deletion

@@ -459,7 +459,7 @@ it as 'unreachable' rather than deleting it. These are *additions* which have a
 but with an interface index of 1 (which, in Netlink, is 'lo'). This makes VPP intermittently crash, so I
 currently commented this out, while I gain better understanding. Result: blackhole/unreachable/prohibit
 specials cannot be set using the plugin. Beware!
-(disabled in this [[commit](https://github.com/pimvanpelt/lcpng/commit/7c864ed099821f62c5be8cbe9ed3f4dd34000a42)]).
+(disabled in this [[commit](https://git.ipng.ch/ipng/lcpng/commit/7c864ed099821f62c5be8cbe9ed3f4dd34000a42)]).

 ## Credits

@@ -88,7 +88,7 @@ stat['/if/rx-miss'][:, 1].sum() - returns the sum of packet counters for
 ```

 Alright, so let's grab that file and refactor it into a small library for me to use; I do
-this in [[this commit](https://github.com/pimvanpelt/vpp-snmp-agent/commit/51eee915bf0f6267911da596b41a4475feaf212e)].
+this in [[this commit](https://git.ipng.ch/ipng/vpp-snmp-agent/commit/51eee915bf0f6267911da596b41a4475feaf212e)].

 ### VPP's API

@@ -159,7 +159,7 @@ idx=19 name=tap4 mac=02:fe:17:06:fc:af mtu=9000 flags=3

 So I added a little abstraction with some error handling and one main function
 to return interfaces as a Python dictionary of those `sw_interface_details`
-tuples in [[this commit](https://github.com/pimvanpelt/vpp-snmp-agent/commit/51eee915bf0f6267911da596b41a4475feaf212e)].
+tuples in [[this commit](https://git.ipng.ch/ipng/vpp-snmp-agent/commit/51eee915bf0f6267911da596b41a4475feaf212e)].

 ### AgentX

@@ -207,9 +207,9 @@ once asked with `GetPDU` or `GetNextPDU` requests, by issuing a corresponding `R
 to the SNMP server -- it takes care of all the rest!

 The resulting code is in [[this
-commit](https://github.com/pimvanpelt/vpp-snmp-agent/commit/8c9c1e2b4aa1d40a981f17581f92bba133dd2c29)]
+commit](https://git.ipng.ch/ipng/vpp-snmp-agent/commit/8c9c1e2b4aa1d40a981f17581f92bba133dd2c29)]
 but you can also check out the whole thing on
-[[Github](https://github.com/pimvanpelt/vpp-snmp-agent)].
+[[Github](https://git.ipng.ch/ipng/vpp-snmp-agent)].

 ### Building

@@ -480,7 +480,7 @@ is to say, those packets which were destined to any IP address configured on the
 plane. Any traffic going _through_ VPP will never be seen by Linux! So, I'll have to be
 clever and count this traffic by polling VPP instead. This was the topic of my previous
 [VPP Part 6]({{< ref "2021-09-10-vpp-6" >}}) about the SNMP Agent. All of that code
-was released to [Github](https://github.com/pimvanpelt/vpp-snmp-agent), notably there's
+was released to [Github](https://git.ipng.ch/ipng/vpp-snmp-agent), notably there's
 a hint there for an `snmpd-dataplane.service` and a `vpp-snmp-agent.service`, including
 the compiled binary that reads from VPP and feeds this to SNMP.

@@ -62,7 +62,7 @@ plugins:
 or route, or the system receiving ARP or IPv6 neighbor request/reply from neighbors), and applying
 these events to the VPP dataplane.

-I've published the code on [Github](https://github.com/pimvanpelt/lcpng/) and I am targeting a release
+I've published the code on [Github](https://git.ipng.ch/ipng/lcpng/) and I am targeting a release
 in upstream VPP, hoping to make the upcoming 22.02 release in February 2022. I have a lot of ground to
 cover, but I will note that the plugin has been running in production in [AS8298]({{< ref "2021-02-27-network" >}})
 since Sep'21 and no crashes related to LinuxCP have been observed.
@@ -195,7 +195,7 @@ So grab a cup of tea, while we let Rhino stretch its legs, ehh, CPUs ...
 pim@rhino:~$ mkdir -p ~/src
 pim@rhino:~$ cd ~/src
 pim@rhino:~/src$ sudo apt install libmnl-dev
-pim@rhino:~/src$ git clone https://github.com/pimvanpelt/lcpng.git
+pim@rhino:~/src$ git clone https://git.ipng.ch/ipng/lcpng.git
 pim@rhino:~/src$ git clone https://gerrit.fd.io/r/vpp
 pim@rhino:~/src$ ln -s ~/src/lcpng ~/src/vpp/src/plugins/lcpng
 pim@rhino:~/src$ cd ~/src/vpp
@@ -33,7 +33,7 @@ In this first post, let's take a look at tablestakes: writing a YAML specificati
 configuration elements of VPP, and then ensures that the YAML file is both syntactically as well as
 semantically correct.

-**Note**: Code is on [my Github](https://github.com/pimvanpelt/vppcfg), but it's not quite ready for
+**Note**: Code is on [my Github](https://git.ipng.ch/ipng/vppcfg), but it's not quite ready for
 prime-time yet. Take a look, and engage with us on GitHub (pull requests preferred over issues themselves)
 or reach out by [contacting us](/s/contact/).

@@ -348,7 +348,7 @@ to mess up my (or your!) VPP router by feeding it garbage, so the lions' share o
 has been to assert the YAML file is both syntactically and semantically valid.


-In the meantime, you can take a look at my code on [GitHub](https://github.com/pimvanpelt/vppcfg), but to
+In the meantime, you can take a look at my code on [GitHub](https://git.ipng.ch/ipng/vppcfg), but to
 whet your appetite, here's a hefty configuration that demonstrates all implemented types:

 ```
@@ -32,7 +32,7 @@ the configuration to the dataplane. Welcome to `vppcfg`!
 In this second post of the series, I want to talk a little bit about how planning a path from a running
 configuration to a desired new configuration might look.

-**Note**: Code is on [my Github](https://github.com/pimvanpelt/vppcfg), but it's not quite ready for
+**Note**: Code is on [my Github](https://git.ipng.ch/ipng/vppcfg), but it's not quite ready for
 prime-time yet. Take a look, and engage with us on GitHub (pull requests preferred over issues themselves)
 or reach out by [contacting us](/s/contact/).

@@ -171,12 +171,12 @@ GigabitEthernet1/0/0 1 up GigabitEthernet1/0/0

 After this exploratory exercise, I have learned enough about the hardware to be able to take the
 Fitlet2 out for a spin. To configure the VPP instance, I turn to
-[[vppcfg](https://github.com/pimvanpelt/vppcfg)], which can take a YAML configuration file
+[[vppcfg](https://git.ipng.ch/ipng/vppcfg)], which can take a YAML configuration file
 describing the desired VPP configuration, and apply it safely to the running dataplane using the VPP
 API. I've written a few more posts on how it does that, notably on its [[syntax]({{< ref "2022-03-27-vppcfg-1" >}})]
 and its [[planner]({{< ref "2022-04-02-vppcfg-2" >}})]. A complete
 configuration guide on vppcfg can be found
-[[here](https://github.com/pimvanpelt/vppcfg/blob/main/docs/config-guide.md)].
+[[here](https://git.ipng.ch/ipng/vppcfg/blob/main/docs/config-guide.md)].

 ```
 pim@fitlet:~$ sudo dpkg -i {lib,}vpp*23.06*deb
@@ -185,7 +185,7 @@ forgetful chipmunk-sized brain!), so here, I'll only recap what's already writte

 **1. BUILD:** For the first step, the build is straightforward, and yields a VPP instance based on
 `vpp-ext-deps_23.06-1` at version `23.06-rc0~71-g182d2b466`, which contains my
-[[LCPng](https://github.com/pimvanpelt/lcpng.git)] plugin. I then copy the packages to the router.
+[[LCPng](https://git.ipng.ch/ipng/lcpng.git)] plugin. I then copy the packages to the router.
 The router has an E-2286G CPU @ 4.00GHz with 6 cores and 6 hyperthreads. There's a really handy tool
 called `likwid-topology` that can show how the L1, L2 and L3 cache lines up with respect to CPU
 cores. Here I learn that CPU (0+6) and (1+7) share L1 and L2 cache -- so I can conclude that 0-5 are
@@ -351,7 +351,7 @@ in `vppcfg`:

 * When I create the initial `--novpp` config, there's a bug in `vppcfg` where I incorrectly
   reference a dataplane object which I haven't initialized (because with `--novpp` the tool
   will not contact the dataplane at all. That one was easy to fix, which I did in [[this
-  commit](https://github.com/pimvanpelt/vppcfg/commit/0a0413927a0be6ed3a292a8c336deab8b86f5eee)]).
+  commit](https://git.ipng.ch/ipng/vppcfg/commit/0a0413927a0be6ed3a292a8c336deab8b86f5eee)]).

 After that small detour, I can now proceed to configure the dataplane by offering the resulting
 VPP commands, like so:
@@ -573,7 +573,7 @@ see is that which is destined to the controlplane (eg, to one of the IPv4 or IPv
 multicast/broadcast groups that they are participating in), so things like tcpdump or SNMP won't
 really work.

-However, due to my [[vpp-snmp-agent](https://github.com/pimvanpelt/vpp-snmp-agent.git)], which is
+However, due to my [[vpp-snmp-agent](https://git.ipng.ch/ipng/vpp-snmp-agent.git)], which is
 feeding as an AgentX behind an snmpd that in turn is running in the `dataplane` namespace, SNMP scrapes
 work as they did before, albeit with a few different interface names.

@@ -14,7 +14,7 @@ performance and versatility. For those of us who have used Cisco IOS/XR devices,
 _ASR_ (aggregation service router), VPP will look and feel quite familiar as many of the approaches
 are shared between the two.

-I've been working on the Linux Control Plane [[ref](https://github.com/pimvanpelt/lcpng)], which you
+I've been working on the Linux Control Plane [[ref](https://git.ipng.ch/ipng/lcpng)], which you
 can read all about in my series on VPP back in 2021:

 [{: style="width:300px; float: right; margin-left: 1em;"}](https://video.ipng.ch/w/erc9sAofrSZ22qjPwmv6H4)
@@ -70,7 +70,7 @@ answered by a Response PDU.

 Using parts of a Python Agentx library written by GitHub user hosthvo
 [[ref](https://github.com/hosthvo/pyagentx)], I tried my hand at writing one of these AgentX's.
-The resulting source code is on [[GitHub](https://github.com/pimvanpelt/vpp-snmp-agent)]. That's the
+The resulting source code is on [[GitHub](https://git.ipng.ch/ipng/vpp-snmp-agent)]. That's the
 one that's running in production ever since I started running VPP routers at IPng Networks AS8298.
 After the _AgentX_ exposes the dataplane interfaces and their statistics into _SNMP_, an open source
 monitoring tool such as LibreNMS [[ref](https://librenms.org/)] can discover the routers and draw
@@ -126,7 +126,7 @@ for any interface created in the dataplane.

 I wish I were good at Go, but I never really took to the language. I'm pretty good at Python, but
 sorting through the stats segment isn't super quick as I've already noticed in the Python3 based
-[[VPP SNMP Agent](https://github.com/pimvanpelt/vpp-snmp-agent)]. I'm probably the world's least
+[[VPP SNMP Agent](https://git.ipng.ch/ipng/vpp-snmp-agent)]. I'm probably the world's least
 terrible C programmer, so maybe I can take a look at the VPP Stats Client and make sense of it. Luckily,
 there's an example already in `src/vpp/app/vpp_get_stats.c` and it reveals the following pattern:

@@ -19,7 +19,7 @@ same time keep an IPng Site Local network with IPv4 and IPv6 that is separate fr
 based on hardware/silicon based forwarding at line rate and high availability. You can read all
 about my Centec MPLS shenanigans in [[this article]({{< ref "2023-03-11-mpls-core" >}})].

-Ever since the release of the Linux Control Plane [[ref](https://github.com/pimvanpelt/lcpng)]
+Ever since the release of the Linux Control Plane [[ref](https://git.ipng.ch/ipng/lcpng)]
 plugin in VPP, folks have asked "What about MPLS?" -- I have never really felt the need to go down this
 rabbit hole, because I figured that in this day and age, higher level IP protocols that do tunneling
 are just as performant, and a little bit less of an 'art' to get right. For example, the Centec
@@ -459,6 +459,6 @@ and VPP, and the overall implementation before attempting to use in production.
 we got at least some of this right, but testing and runtime experience will tell.

 I will be silently porting the change into my own copy of the Linux Controlplane called lcpng on
-[[GitHub](https://github.com/pimvanpelt/lcpng.git)]. If you'd like to test this - reach out to the VPP
+[[GitHub](https://git.ipng.ch/ipng/lcpng.git)]. If you'd like to test this - reach out to the VPP
 Developer [[mailinglist](mailto:vpp-dev@lists.fd.io)] any time!

@@ -385,5 +385,5 @@ and VPP, and the overall implementation before attempting to use in production.
 we got at least some of this right, but testing and runtime experience will tell.

 I will be silently porting the change into my own copy of the Linux Controlplane called lcpng on
-[[GitHub](https://github.com/pimvanpelt/lcpng.git)]. If you'd like to test this - reach out to the VPP
+[[GitHub](https://git.ipng.ch/ipng/lcpng.git)]. If you'd like to test this - reach out to the VPP
 Developer [[mailinglist](mailto:vpp-dev@lists.fd.io)] any time!

@@ -304,7 +304,7 @@ Gateway, just to show a few of the more advanced features of VPP. For me, this t
 line of thinking: classifiers. This extract/match/act pattern can be used in policers, ACLs and
 arbitrary traffic redirection through VPP's directed graph (eg. selecting a next node for
 processing). I'm going to deep-dive into this classifier behavior in an upcoming article, and see
-how I might add this to [[vppcfg](https://github.com/pimvanpelt/vppcfg.git)], because I think it
+how I might add this to [[vppcfg](https://git.ipng.ch/ipng/vppcfg.git)], because I think it
 would be super powerful to abstract away the rather complex underlying API into something a little
 bit more ... user friendly. Stay tuned! :)

@@ -359,7 +359,7 @@ does not have an IPv4 address. Except -- I'm bending the rules a little bit by d
 There's an internal function `ip4_sw_interface_enable_disable()` which is called to enable IPv4
 processing on an interface once the first IPv4 address is added. So my first fix is to force this to
 be enabled for any interface that is exposed via Linux Control Plane, notably in `lcp_itf_pair_create()`
-[[here](https://github.com/pimvanpelt/lcpng/blob/main/lcpng_interface.c#L777)].
+[[here](https://git.ipng.ch/ipng/lcpng/blob/main/lcpng_interface.c#L777)].

 This approach is partially effective:

@@ -500,7 +500,7 @@ which is unnumbered. Because I don't know for sure if everybody would find this
 I make sure to guard the behavior behind a backwards compatible configuration option.

 If you're curious, please take a look at the change in my [[GitHub
-repo](https://github.com/pimvanpelt/lcpng/commit/a960d64a87849d312b32d9432ffb722672c14878)], in
+repo](https://git.ipng.ch/ipng/lcpng/commit/a960d64a87849d312b32d9432ffb722672c14878)], in
 which I:
 1. add a new configuration option, `lcp-sync-unnumbered`, which defaults to `on`. That would be
    what the plugin would do in the normal case: copy forward these borrowed IP addresses to Linux.
@@ -147,7 +147,7 @@ With all of that, I am ready to demonstrate two working solutions now. I first c
 Ondrej's [[commit](https://gitlab.nic.cz/labs/bird/-/commit/280daed57d061eb1ebc89013637c683fe23465e8)].
 Then, I compile VPP with my pending [[gerrit](https://gerrit.fd.io/r/c/vpp/+/40482)]. Finally,
 to demonstrate how `update_loopback_addr()` might work, I compile `lcpng` with my previous
-[[commit](https://github.com/pimvanpelt/lcpng/commit/a960d64a87849d312b32d9432ffb722672c14878)],
+[[commit](https://git.ipng.ch/ipng/lcpng/commit/a960d64a87849d312b32d9432ffb722672c14878)],
 which allows me to inhibit copying forward addresses from VPP to Linux, when using _unnumbered_
 interfaces.

@@ -1,8 +1,9 @@
 ---
 date: "2024-04-27T10:52:11Z"
-title: FreeIX - Remote
+title: "FreeIX Remote - Part 1"
 aliases:
 - /s/articles/2024/04/27/freeix-1.html
+- /s/articles/2024/04/27/freeix-remote/
 ---

 # Introduction
@@ -250,10 +250,10 @@ remove the IPv4 and IPv6 addresses from the <span style='color:red;font-weight:b
 routers in Brüttisellen. They are directly connected, and if anything goes wrong, I can walk
 over and rescue them. Sounds like a safe way to start!

-I quickly add the ability for [[vppcfg](https://github.com/pimvanpelt/vppcfg)] to configure
+I quickly add the ability for [[vppcfg](https://git.ipng.ch/ipng/vppcfg)] to configure
 _unnumbered_ interfaces. In VPP, these are interfaces that don't have an IPv4 or IPv6 address of
 their own, but they borrow one from another interface. If you're curious, you can take a look at the
-[[User Guide](https://github.com/pimvanpelt/vppcfg/blob/main/docs/config-guide.md#interfaces)] on
+[[User Guide](https://git.ipng.ch/ipng/vppcfg/blob/main/docs/config-guide.md#interfaces)] on
 GitHub.

 Looking at their `vppcfg` files, the change is actually very easy, taking as an example the
@@ -291,7 +291,7 @@ interface.

 In the article, you'll see that discussed as _Solution 2_, and it includes a bit of rationale why I
 find this better. I implemented it in this
-[[commit](https://github.com/pimvanpelt/lcpng/commit/a960d64a87849d312b32d9432ffb722672c14878)], in
+[[commit](https://git.ipng.ch/ipng/lcpng/commit/a960d64a87849d312b32d9432ffb722672c14878)], in
 case you're curious, and the commandline keyword is `lcp lcp-sync-unnumbered off` (the default is
 _on_).

@@ -1,6 +1,6 @@
 ---
 date: "2024-10-21T10:52:11Z"
-title: "FreeIX - Remote, part 2"
+title: "FreeIX Remote - Part 2"
 ---

 {{< image width="18em" float="right" src="/assets/freeix/freeix-artist-rendering.png" alt="FreeIX, Artists Rendering" >}}
@@ -8,7 +8,7 @@ title: "FreeIX - Remote, part 2"
 # Introduction

 A few months ago, I wrote about [[an idea]({{< ref 2024-04-27-freeix-1.md >}})] to help boost the
-value of small Internet Exchange Points (_IXPs). When such an exchange doesn't have many members,
+value of small Internet Exchange Points (_IXPs_). When such an exchange doesn't have many members,
 then the operational costs of connecting to it (cross connects, router ports, finding peers, etc)
 are not very favorable.

content/articles/2025-02-08-sflow-3.md (new file, 857 lines)
@@ -0,0 +1,857 @@
---
date: "2025-02-08T07:51:23Z"
title: 'VPP with sFlow - Part 3'
---

# Introduction

{{< image float="right" src="/assets/sflow/sflow.gif" alt="sFlow Logo" width="12em" >}}

In the second half of last year, I picked up a project together with Neil McKee of
[[inMon](https://inmon.com/)], the caretakers of [[sFlow](https://sflow.org)]: an industry standard
technology for monitoring high speed networks. `sFlow` gives complete visibility into the
use of networks, enabling performance optimization, accounting/billing for usage, and defense against
security threats.

The open source software dataplane [[VPP](https://fd.io)] is a perfect match for sampling, as it
forwards packets at very high rates using underlying libraries like [[DPDK](https://dpdk.org/)] and
[[RDMA](https://en.wikipedia.org/wiki/Remote_direct_memory_access)]. A clever design choice in the
so-called _Host sFlow Daemon_ [[host-sflow](https://github.com/sflow/host-sflow)] allows for
a small portion of code to _grab_ the samples, for example in a merchant silicon ASIC or FPGA, but
also in the VPP software dataplane. The agent then _transmits_ these samples using a Linux kernel
feature called [[PSAMPLE](https://github.com/torvalds/linux/blob/master/net/psample/psample.c)].
This greatly reduces the complexity of code to be implemented in the forwarding path, while at the
same time bringing consistency to the `sFlow` delivery pipeline by (re)using the `hsflowd` business
logic for the more complex state keeping, packet marshalling and transmission from the _Agent_ to a
central _Collector_.

In this third article, I wanted to spend some time discussing how samples make their way out of the
VPP dataplane, and into higher level tools.
## Recap: sFlow

{{< image float="left" src="/assets/sflow/sflow-overview.png" alt="sFlow Overview" width="14em" >}}

sFlow describes a method for Monitoring Traffic in Switched/Routed Networks, originally described in
[[RFC3176](https://datatracker.ietf.org/doc/html/rfc3176)]. The current specification is version 5
and is homed on the sFlow.org website [[ref](https://sflow.org/sflow_version_5.txt)]. Typically, a
Switching ASIC in the dataplane (seen at the bottom of the diagram to the left) is asked to copy
1-in-N packets to the local sFlow Agent.

**Sampling**: The agent will copy the first N bytes (typically 128) of the packet into a sample. As
the ASIC knows which interface the packet was received on, the `inIfIndex` will be added. After a
routing decision is made, the nexthop and its L2 address and interface become known. The ASIC might
annotate the sample with this `outIfIndex` and `DstMAC` metadata as well.

**Drop Monitoring**: There's one rather clever insight that sFlow gives: what if the packet _was
not_ routed or switched, but rather discarded? For this, sFlow is able to describe the reason for
the drop. For example, the ASIC receive queue could have been overfull, or it did not find a
destination to forward the packet to (no FIB entry), perhaps it was instructed by an ACL to drop the
packet or maybe even tried to transmit the packet but the physical datalink layer had to abandon the
transmission for whatever reason (link down, TX queue full, link saturation, and so on). It's hard
to overstate how important it is to have this so-called _drop monitoring_, as operators often spend
hours and hours figuring out _why_ packets are lost in their network or datacenter switching fabric.

**Metadata**: The agent may have other metadata as well, such as which prefix was the source and
destination of the packet, what additional RIB information is available (AS path, BGP communities,
and so on). This may be added to the sample record as well.

**Counters**: Since sFlow is sampling 1:N packets, the system can estimate total traffic in a
reasonably accurate way. Peter and Sonia wrote a succinct
[[paper](https://sflow.org/packetSamplingBasics/)] about the math, so I won't get into that here.
Mostly because I am but a software engineer, not a statistician... :) However, I will say this: if a
fraction of the traffic is sampled but the _Agent_ knows how many bytes and packets were forwarded
in total, it can provide an overview with a quantifiable accuracy. This is why the _Agent_ will
periodically get the interface counters from the ASIC.
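To make that concrete, here's a little back-of-the-envelope sketch in Python (my own illustration, not part of the plugin; the error bound is the one quoted in that paper, and I treat it as an assumption of this sketch):

```python
import math

def estimate_packets(samples: int, sampling_rate: int) -> tuple[int, float]:
    """Scale 1:N samples up to a packet estimate, with a rough 95% error bound.

    Assumption: the sFlow packet-sampling paper's bound of
    pct_error <= 196 * sqrt(1 / samples) at 95% confidence,
    independent of total traffic volume.
    """
    estimated = samples * sampling_rate
    pct_error = 196.0 * math.sqrt(1.0 / samples) if samples else float("inf")
    return estimated, pct_error

# 10'000 samples taken at 1:1'000 -> ~10M packets, within about +/- 2%
print(estimate_packets(10_000, 1_000))
```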
**Collector**: One or more samples can be concatenated into UDP messages that go from the _sFlow
Agent_ to a central _sFlow Collector_. The heavy lifting in analysis is done upstream from the
switch or router, which is great for performance. Many thousands or even tens of thousands of
agents can forward their samples and interface counters to a single central collector, which in turn
can be used to draw up a near real time picture of the state of traffic through even the largest of
ISP networks or datacenter switch fabrics.

In sFlow parlance [[VPP](https://fd.io/)] and its companion
[[hsflowd](https://github.com/sflow/host-sflow)] together form an _Agent_ (it sends the UDP packets
over the network), and for example the commandline tool `sflowtool` could be a _Collector_ (it
receives the UDP packets).

## Recap: sFlow in VPP

First, I have some pretty good news to report - our work on this plugin was
[[merged](https://gerrit.fd.io/r/c/vpp/+/41680)] and will be included in the VPP 25.02 release in a
few weeks! Last weekend, I gave a lightning talk at
[[FOSDEM](https://fosdem.org/2025/schedule/event/fosdem-2025-4196-vpp-monitoring-100gbps-with-sflow/)]
in Brussels, Belgium, and caught up with a lot of community members and network- and software
engineers. I had a great time.

In trying to keep the amount of code as small as possible, and therefore the probability of bugs that
might impact VPP's dataplane stability low, the architecture of the end to end solution consists of
three distinct parts, each with their own risk and performance profile:

{{< image float="left" src="/assets/sflow/sflow-vpp-overview.png" alt="sFlow VPP Overview" width="18em" >}}

**1. sFlow worker node**: Its job is to do what the ASIC does in the hardware case. As VPP moves
packets from `device-input` to the `ethernet-input` nodes in its forwarding graph, the sFlow plugin
will inspect 1-in-N, taking a sample for further processing. Here, we don't try to be clever, simply
copy the `inIfIndex` and the first bytes of the ethernet frame, and append them to a
[[FIFO](https://en.wikipedia.org/wiki/FIFO_(computing_and_electronics))] queue. If too many samples
arrive, samples are dropped at the tail, and a counter incremented. This way, I can tell when the
dataplane is congested. Bounded FIFOs also provide fairness: it allows for each VPP worker thread to
get their fair share of samples into the Agent's hands.
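The plugin implements this in C inside VPP, but the tail-drop idea fits in a few lines of Python pseudo-implementation (my own sketch, not the plugin's actual code):

```python
from collections import deque

class SampleFifo:
    """Bounded FIFO: producers append at the tail; when full, the new
    sample is dropped (tail drop) and a counter is incremented."""

    def __init__(self, maxlen: int = 1024) -> None:
        self.q: deque = deque()
        self.maxlen = maxlen
        self.tail_drops = 0

    def push(self, sample: bytes) -> bool:
        if len(self.q) >= self.maxlen:
            self.tail_drops += 1   # dataplane congested: drop, never block
            return False
        self.q.append(sample)
        return True

    def pop(self):  # drained by the main-thread sflow-process
        return self.q.popleft() if self.q else None
```

One such FIFO per worker thread is what gives the fairness property: a busy worker can only ever fill its own queue.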
**2. sFlow main process**: There's a function running on the _main thread_, which shifts further
processing time _away_ from the dataplane. This _sflow-process_ does two things. Firstly, it
consumes samples from the per-worker FIFO queues (both forwarded packets in green, and dropped ones
in red). Secondly, it keeps track of time and every few seconds (20 by default, but this is
configurable), it'll grab all interface counters from those interfaces for which I have sFlow
turned on. VPP produces _Netlink_ messages and sends them to the kernel.
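Sketched in the same Python pseudo-code (again my own illustration, pairing with the `SampleFifo` sketch above; `send_psample`, `poll_counters` and `send_counters` are hypothetical stand-ins for the plugin's Netlink emission), the main-thread loop is roughly:

```python
import time

def sflow_process(fifos, send_psample, poll_counters, send_counters,
                  polling_interval_s: int = 20) -> None:
    """Drain per-worker FIFOs continuously; ship interface counters on a timer."""
    next_poll = time.monotonic() + polling_interval_s
    while True:
        for fifo in fifos:                  # one bounded FIFO per VPP worker
            while (sample := fifo.pop()) is not None:
                send_psample(sample)        # -> PSAMPLE netlink message
        if time.monotonic() >= next_poll:
            send_counters(poll_counters())  # -> USERSOCK netlink message
            next_poll += polling_interval_s
        time.sleep(0.01)                    # the real code is event driven
```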
**3. Host sFlow daemon**: The third component is external to VPP: `hsflowd` subscribes to the _Netlink_
messages. It goes without saying that `hsflowd` is a battle-hardened implementation running on
hundreds of different silicon and software defined networking stacks. The PSAMPLE stuff is easy,
this module already exists. But Neil implemented a _mod_vpp_ which can grab interface names and their
`ifIndex`, and counter statistics. VPP emits this data as _Netlink_ `USERSOCK` messages alongside
the PSAMPLEs.

By the way, I've written about _Netlink_ before when discussing the [[Linux Control Plane]({{< ref
2021-08-25-vpp-4 >}})] plugin. It's a mechanism for programs running in userspace to share
information with the kernel. In the Linux kernel, packets can be sampled as well, and sent from
kernel to userspace using a _PSAMPLE_ Netlink channel. However, the pattern is that of a message
producer/subscriber relationship and nothing precludes one userspace process (`vpp`) being the
producer while another userspace process (`hsflowd`) acts as the consumer!

Assuming the sFlow plugin in VPP produces samples and counters properly, `hsflowd` will do the rest,
giving correctness and upstream interoperability pretty much for free. That's slick!

### VPP: sFlow Configuration

The solution that I offer is based on two moving parts. First, the VPP plugin configuration, which
turns on sampling at a given rate on physical devices, also known as _hardware-interfaces_. Second,
the open source component [[host-sflow](https://github.com/sflow/host-sflow/releases)] can be
configured as of release v2.11-5 [[ref](https://github.com/sflow/host-sflow/tree/v2.1.11-5)].

I will show how to configure VPP in three ways:

***1. VPP Configuration via CLI***

```
pim@vpp0-0:~$ vppctl
vpp0-0# sflow sampling-rate 100
vpp0-0# sflow polling-interval 10
vpp0-0# sflow header-bytes 128
vpp0-0# sflow enable GigabitEthernet10/0/0
vpp0-0# sflow enable GigabitEthernet10/0/0 disable
vpp0-0# sflow enable GigabitEthernet10/0/2
vpp0-0# sflow enable GigabitEthernet10/0/3
```
The first three commands set the global defaults - in my case I'm going to be sampling at 1:100
which is an unusually high rate. A production setup may take 1-in-_linkspeed-in-megabits_ so for a
1Gbps device 1:1'000 is appropriate. For 100GE, something between 1:10'000 and 1:100'000 is more
appropriate, depending on link load. The second command sets the interface stats polling interval.
The default is to gather these statistics every 20 seconds, but I set it to 10s here.

Next, I tell the plugin how many bytes of the sampled ethernet frame should be taken. Common
values are 64 and 128, but it doesn't have to be a power of two. I want enough data to see the
headers, like MPLS label(s), Dot1Q tag(s), IP header and TCP/UDP/ICMP header, but the contents of
the payload are rarely interesting for statistics purposes.

Finally, I can turn on the sFlow plugin on an interface with the `sflow enable-disable` CLI. In VPP,
an idiomatic way to turn on and off things is to have an enabler/disabler. It feels a bit clunky
maybe to write `sflow enable $iface disable` but it makes more logical sense if you parse that as
"enable-disable" with the default being the "enable" operation, and the alternate being the
"disable" operation.

***2. VPP Configuration via API***

I implemented a few API methods for the most common operations. Here's a snippet that obtains the
same config as what I typed on the CLI above, but using these Python API calls:

```python
from vpp_papi import VPPApiClient, VPPApiJSONFiles
import sys

vpp_api_dir = VPPApiJSONFiles.find_api_dir([])
vpp_api_files = VPPApiJSONFiles.find_api_files(api_dir=vpp_api_dir)
vpp = VPPApiClient(apifiles=vpp_api_files, server_address="/run/vpp/api.sock")
vpp.connect("sflow-api-client")
print(vpp.api.show_version().version)
# Output: 25.06-rc0~14-g9b1c16039

vpp.api.sflow_sampling_rate_set(sampling_N=100)
print(vpp.api.sflow_sampling_rate_get())
# Output: sflow_sampling_rate_get_reply(_0=655, context=3, sampling_N=100)

vpp.api.sflow_polling_interval_set(polling_S=10)
print(vpp.api.sflow_polling_interval_get())
# Output: sflow_polling_interval_get_reply(_0=661, context=5, polling_S=10)

vpp.api.sflow_header_bytes_set(header_B=128)
print(vpp.api.sflow_header_bytes_get())
# Output: sflow_header_bytes_get_reply(_0=665, context=7, header_B=128)

vpp.api.sflow_enable_disable(hw_if_index=1, enable_disable=True)
vpp.api.sflow_enable_disable(hw_if_index=2, enable_disable=True)
print(vpp.api.sflow_interface_dump())
# Output: [ sflow_interface_details(_0=667, context=8, hw_if_index=1),
#           sflow_interface_details(_0=667, context=8, hw_if_index=2) ]

print(vpp.api.sflow_interface_dump(hw_if_index=2))
# Output: [ sflow_interface_details(_0=667, context=9, hw_if_index=2) ]

print(vpp.api.sflow_interface_dump(hw_if_index=1234)) ## Invalid hw_if_index
# Output: []

vpp.api.sflow_enable_disable(hw_if_index=1, enable_disable=False)
print(vpp.api.sflow_interface_dump())
# Output: [ sflow_interface_details(_0=667, context=10, hw_if_index=2) ]
```

This short program toys around a bit with the sFlow API. I first set the sampling to 1:100 and get
the current value. Then I set the polling interval to 10s and retrieve the current value again.
Finally, I set the header bytes to 128, and retrieve the value again.

Enabling and disabling sFlow on interfaces shows the idiom I mentioned before - the API being an
`*_enable_disable()` call of sorts, and typically taking a boolean argument if the operator wants to
enable (the default), or disable sFlow on the interface. Getting the list of enabled interfaces can
be done with the `sflow_interface_dump()` call, which returns a list of `sflow_interface_details`
messages.

I demonstrated VPP's Python API and how it works in a fair amount of detail in a [[previous
article]({{< ref 2024-01-27-vpp-papi >}})], in case this type of stuff interests you.

***3. VPPCfg YAML Configuration***

Writing on the CLI and calling the API is good and all, but many users of VPP have noticed that it
does not have any form of configuration persistence, and that's deliberate. VPP's goal is to be a
programmable dataplane, and it has explicitly left the programming and configuration as an exercise for
integrators. I have written a Python project that takes a YAML file as input and uses it to
configure (and reconfigure, on the fly) the dataplane automatically, called
[[VPPcfg](https://git.ipng.ch/ipng/vppcfg.git)]. Previously, I wrote some implementation thoughts
on its [[datamodel]({{< ref 2022-03-27-vppcfg-1 >}})] and its [[operations]({{< ref 2022-04-02-vppcfg-2
>}})] so I won't repeat that here. Instead, I will just show the configuration:

```
pim@vpp0-0:~$ cat << EOF > vppcfg.yaml
interfaces:
  GigabitEthernet10/0/0:
    sflow: true
  GigabitEthernet10/0/1:
    sflow: true
  GigabitEthernet10/0/2:
    sflow: true
  GigabitEthernet10/0/3:
    sflow: true

sflow:
  sampling-rate: 100
  polling-interval: 10
  header-bytes: 128
EOF
pim@vpp0-0:~$ vppcfg plan -c vppcfg.yaml -o /etc/vpp/config/vppcfg.vpp
[INFO ] root.main: Loading configfile vppcfg.yaml
[INFO ] vppcfg.config.valid_config: Configuration validated successfully
[INFO ] root.main: Configuration is valid
[INFO ] vppcfg.reconciler.write: Wrote 13 lines to /etc/vpp/config/vppcfg.vpp
[INFO ] root.main: Planning succeeded
pim@vpp0-0:~$ vppctl exec /etc/vpp/config/vppcfg.vpp
```

The nifty thing about `vppcfg` is that if I were to change, say, the sampling-rate (setting it to
1000) and disable sFlow on an interface, say Gi10/0/0, I can re-run the `vppcfg plan` and `vppcfg
apply` stages and the VPP dataplane will be reprogrammed to reflect the newly declared configuration.
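For example, the edited YAML could look like the fragment below (my sketch; I'm assuming here that the schema accepts `sflow: false`, and simply removing the key would presumably work as well):

```
interfaces:
  GigabitEthernet10/0/0:
    sflow: false       # was: true
  GigabitEthernet10/0/1:
    sflow: true
  GigabitEthernet10/0/2:
    sflow: true
  GigabitEthernet10/0/3:
    sflow: true

sflow:
  sampling-rate: 1000  # was: 100
  polling-interval: 10
  header-bytes: 128
```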
### hsflowd: Configuration

When sFlow is enabled, VPP will start to emit _Netlink_ messages of type PSAMPLE with packet samples
and of type USERSOCK with the custom messages containing interface names and counters. These latter
custom messages have to be decoded, which is done by the _mod_vpp_ module in `hsflowd`, starting
from release v2.11-5 [[ref](https://github.com/sflow/host-sflow/tree/v2.1.11-5)].

Here's a minimalist configuration:

```
pim@vpp0-0:~$ cat /etc/hsflowd.conf
sflow {
  collector { ip=127.0.0.1 udpport=16343 }
  collector { ip=192.0.2.1 namespace=dataplane }
  psample { group=1 }
  vpp { osIndex=off }
}
```

{{< image width="5em" float="left" src="/assets/shared/warning.png" alt="Warning" >}}

There are two important details that can be confusing at first: \
**1.** kernel network namespaces \
**2.** interface index namespaces

#### hsflowd: Network namespace

Network namespaces virtualize Linux's network stack. Upon creation, a network namespace contains only
a loopback interface, and subsequently interfaces can be moved between namespaces. Each network
namespace will have its own set of IP addresses, its own routing table, socket listing, connection
tracking table, firewall, and other network-related resources. When started by systemd, `hsflowd`
and VPP will normally both run in the _default_ network namespace.

Given this, I can conclude that when the sFlow plugin opens a Netlink channel, it will
naturally do this in the network namespace that its VPP process is running in (the _default_
namespace, normally). It is therefore important that the recipient of these Netlink messages,
notably `hsflowd`, runs in the ***same*** namespace as VPP. It's totally fine to run them together in
a different namespace (eg. a container in Kubernetes or Docker), as long as they can see each other.

It might pose a problem if the network connectivity lives in a different namespace than the default
one. One common example (that I heavily rely on at IPng!) is to create Linux Control Plane interface
pairs, _LIPs_, in a dataplane namespace. The main reason for doing this is to allow something like
FRR or Bird to completely govern the routing table in the kernel and keep it in-sync with the FIB in
VPP. In such a _dataplane_ network namespace, typically every interface is owned by VPP.

Luckily, `hsflowd` can attach to one (default) namespace to get the PSAMPLEs, but create a socket in
a _different_ (dataplane) namespace to send packets to a collector. This explains the second
_collector_ entry in the config-file above. Here, `hsflowd` will send UDP packets to 192.0.2.1:6343
from within the (VPP) dataplane namespace, and to 127.0.0.1:16343 in the default namespace.
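That namespace-hopping trick is implemented inside `hsflowd` itself, but the underlying mechanic is easy to demonstrate. A minimal Python sketch (my own illustration, assuming root privileges and an iproute2-created namespace under `/var/run/netns/`): enter the target namespace with `setns(2)`, create the socket there, and hop back - a socket stays pinned to the namespace it was created in:

```python
import ctypes, os, socket

libc = ctypes.CDLL("libc.so.6", use_errno=True)
CLONE_NEWNET = 0x40000000

def udp_socket_in_netns(netns: str) -> socket.socket:
    """Create a UDP socket inside /var/run/netns/<netns>, then return home."""
    own_ns = os.open("/proc/self/ns/net", os.O_RDONLY)
    target_ns = os.open(f"/var/run/netns/{netns}", os.O_RDONLY)
    try:
        if libc.setns(target_ns, CLONE_NEWNET) != 0:
            raise OSError(ctypes.get_errno(), "setns() into target failed")
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    finally:
        libc.setns(own_ns, CLONE_NEWNET)  # sockets keep their birth namespace
        os.close(target_ns)
        os.close(own_ns)
    return sock

# e.g.: udp_socket_in_netns("dataplane").sendto(b"...", ("192.0.2.1", 6343))
```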
#### hsflowd: osIndex

I hope the previous section made some sense, because this one will be a tad more esoteric. When
creating a network namespace, each interface will get its own uint32 interface index that identifies
it, and such an ID is typically called an `ifIndex`. It's important to note that the same number can
(and will!) occur multiple times, once for each namespace. Let me give you an example:

```
pim@summer:~$ ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN ...
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq master ipng-sl state UP ...
    link/ether 00:22:19:6a:46:2e brd ff:ff:ff:ff:ff:ff
    altname enp1s0f0
3: eno2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 900 qdisc mq master ipng-sl state DOWN ...
    link/ether 00:22:19:6a:46:30 brd ff:ff:ff:ff:ff:ff
    altname enp1s0f1

pim@summer:~$ ip netns exec dataplane ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN ...
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: loop0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9216 qdisc mq state UP ...
    link/ether de:ad:00:00:00:00 brd ff:ff:ff:ff:ff:ff
3: xe1-0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9216 qdisc mq state UP ...
    link/ether 00:1b:21:bd:c7:18 brd ff:ff:ff:ff:ff:ff
```

I want to draw your attention to the number at the beginning of the line. In the _default_
namespace, `ifIndex=3` corresponds to `ifName=eno2` (which has no link, it's marked `DOWN`). But in
the _dataplane_ namespace, that index corresponds to a completely different interface called
`ifName=xe1-0` (which is link `UP`).

Now, let me show you the interfaces in VPP:

```
pim@summer:~$ vppctl show int | egrep 'Name|loop0|tap0|Gigabit'
              Name               Idx    State  MTU (L3/IP4/IP6/MPLS)
GigabitEthernet4/0/0              1      up          9000/0/0/0
GigabitEthernet4/0/1              2     down         9000/0/0/0
GigabitEthernet4/0/2              3     down         9000/0/0/0
GigabitEthernet4/0/3              4     down         9000/0/0/0
TenGigabitEthernet5/0/0           5      up          9216/0/0/0
TenGigabitEthernet5/0/1           6      up          9216/0/0/0
loop0                             7      up          9216/0/0/0
tap0                              19     up          9216/0/0/0
```

Here, I want you to look at the second column `Idx`, which shows what VPP calls the _sw_if_index_
(the software interface index, as opposed to hardware index). Here, `ifIndex=3` corresponds to
`ifName=GigabitEthernet4/0/2`, which is neither `eno2` nor `xe1-0`. Oh my, yet _another_ namespace!

It turns out that there are three (relevant) types of namespaces at play here:
1. ***Linux network*** namespace; here using `dataplane` and `default`, each with their own unique
   (and overlapping) numbering.
1. ***VPP hardware*** interface namespace, also called PHYs (for physical interfaces). When VPP
   first attaches to or creates network interfaces like the ones from DPDK or RDMA, these will
   create an _hw_if_index_ in a list.
1. ***VPP software*** interface namespace. All interfaces (including hardware ones!) will
   receive a _sw_if_index_ in VPP. A good example is sub-interfaces: if I create a sub-int on
   GigabitEthernet4/0/2, it will NOT get a hardware index, but it _will_ get the next available
   software index (in this example, `sw_if_index=7`).

In Linux CP, I can see a mapping from one to the other, just look at this:

```
pim@summer:~$ vppctl show lcp
lcp default netns dataplane
lcp lcp-auto-subint off
lcp lcp-sync on
lcp lcp-sync-unnumbered on
itf-pair: [0] loop0 tap0 loop0 2 type tap netns dataplane
itf-pair: [1] TenGigabitEthernet5/0/0 tap1 xe1-0 3 type tap netns dataplane
itf-pair: [2] TenGigabitEthernet5/0/1 tap2 xe1-1 4 type tap netns dataplane
itf-pair: [3] TenGigabitEthernet5/0/0.20 tap1.20 xe1-0.20 5 type tap netns dataplane
```

Those `itf-pair` describe our _LIPs_, and they have the coordinates to three things: 1) the VPP
software interface (VPP `ifName=loop0` with `sw_if_index=7`), which 2) Linux CP will mirror into the
Linux kernel using a TAP device (VPP `ifName=tap0` with `sw_if_index=19`). That TAP has one leg in
VPP (`tap0`), and another in 3) Linux (with `ifName=loop0` and `ifIndex=2` in namespace `dataplane`).

> So the tuple that fully describes a _LIP_ is `{7, 19, 'dataplane', 2}`

Climbing back out of that rabbit hole, I am now finally ready to explain the feature. When sFlow in
VPP takes its sample, it will be doing this on a PHY, that is a given interface with a specific
_hw_if_index_. When it polls the counters, it'll do it for that specific _hw_if_index_. It now has a
choice: should it share with the world the representation of *its* namespace, or should it try to be
smarter? If LinuxCP is enabled, this interface will likely have a representation in Linux. So the
plugin will first resolve the _sw_if_index_ belonging to that PHY, and using that, try to look up a
_LIP_ with it. If it finds one, it'll know both the namespace in which it lives as well as the
osIndex in that namespace. If it doesn't find a _LIP_, it will at least have the _sw_if_index_ at
hand, so it'll annotate the USERSOCK counter messages with this information instead.
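In pseudo-Python, that decision tree boils down to something like this (my sketch of the logic as described, not the plugin's actual C implementation; the index values in the usage line are made up, loosely modeled on the `{7, 19, 'dataplane', 2}` tuple above):

```python
def counter_identity(hw_if_index: int, hw_to_sw: dict, lips: dict) -> dict:
    """Pick the interface identity to ship with a USERSOCK counter message.

    hw_to_sw: VPP hw_if_index (PHY) -> sw_if_index
    lips:     sw_if_index -> (netns, os_ifindex), one entry per LinuxCP pair
    """
    sw_if_index = hw_to_sw[hw_if_index]
    lip = lips.get(sw_if_index)
    if lip is not None:                  # a LIP exists: expose the Linux view
        netns, os_ifindex = lip
        return {"sw_if_index": sw_if_index, "netns": netns, "osIndex": os_ifindex}
    return {"sw_if_index": sw_if_index}  # no LIP: fall back to VPP numbering

print(counter_identity(1, {1: 7}, {7: ("dataplane", 2)}))
```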
|
||||
Now, `hsflowd` has a choice to make: does it share the Linux representation and hide VPP as an
|
||||
implementation detail? Or does it share the VPP dataplane _sw_if_index_? There are use cases
|
||||
relevant to both, so the decision was to let the operator decide, by setting `osIndex` either `on`
|
||||
(use Linux ifIndex) or `off` (use VPP _sw_if_index_).
|
||||
|
||||
### hsflowd: Host Counters
|
||||
|
||||
Now that I understand the configuration parts of VPP and `hsflowd`, I decide to configure everything
|
||||
but without enabling sFlow on on any interfaces yet in VPP. Once I start the daemon, I can see that
|
||||
it sends an UDP packet every 30 seconds to the configured _collector_:
|
||||
|
||||
```
pim@vpp0-0:~$ sudo tcpdump -s 9000 -i lo -n
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on lo, link-type EN10MB (Ethernet), snapshot length 9000 bytes
15:34:19.695042 IP 127.0.0.1.48753 > 127.0.0.1.6343: sFlowv5,
  IPv4 agent 198.19.5.16, agent-id 100000, length 716
```

The `tcpdump` I have on my Debian bookworm machines doesn't know how to decode the contents of these
sFlow packets. Actually, neither does Wireshark. I've attached a file of these mysterious packets
[[sflow-host.pcap](/assets/sflow/sflow-host.pcap)] in case you want to take a look.
Neil however gives me a tip. A full message decoder and otherwise handy Swiss army knife lives in
[[sflowtool](https://github.com/sflow/sflowtool)].

I can offer this pcap file to `sflowtool`, or let it just listen on the UDP port directly, and
it'll tell me what it finds:

```
pim@vpp0-0:~$ sflowtool -p 6343
startDatagram =================================
datagramSourceIP 127.0.0.1
datagramSize 716
unixSecondsUTC 1739112018
localtime 2025-02-09T15:40:18+0100
datagramVersion 5
agentSubId 100000
agent 198.19.5.16
packetSequenceNo 57
sysUpTime 987398
samplesInPacket 1
startSample ----------------------
sampleType_tag 0:4
sampleType COUNTERSSAMPLE
sampleSequenceNo 33
sourceId 2:1
counterBlock_tag 0:2001
adaptor_0_ifIndex 2
adaptor_0_MACs 1
adaptor_0_MAC_0 525400f00100
counterBlock_tag 0:2010
udpInDatagrams 123904
udpNoPorts 23132459
udpInErrors 0
udpOutDatagrams 46480629
udpRcvbufErrors 0
udpSndbufErrors 0
udpInCsumErrors 0
counterBlock_tag 0:2009
tcpRtoAlgorithm 1
tcpRtoMin 200
tcpRtoMax 120000
tcpMaxConn 4294967295
tcpActiveOpens 0
tcpPassiveOpens 30
tcpAttemptFails 0
tcpEstabResets 0
tcpCurrEstab 1
tcpInSegs 89120
tcpOutSegs 86961
tcpRetransSegs 59
tcpInErrs 0
tcpOutRsts 4
tcpInCsumErrors 0
counterBlock_tag 0:2008
icmpInMsgs 23129314
icmpInErrors 32
icmpInDestUnreachs 0
icmpInTimeExcds 23129282
icmpInParamProbs 0
icmpInSrcQuenchs 0
icmpInRedirects 0
icmpInEchos 0
icmpInEchoReps 32
icmpInTimestamps 0
icmpInAddrMasks 0
icmpInAddrMaskReps 0
icmpOutMsgs 0
icmpOutErrors 0
icmpOutDestUnreachs 23132467
icmpOutTimeExcds 0
icmpOutParamProbs 23132467
icmpOutSrcQuenchs 0
icmpOutRedirects 0
icmpOutEchos 0
icmpOutEchoReps 0
icmpOutTimestamps 0
icmpOutTimestampReps 0
icmpOutAddrMasks 0
icmpOutAddrMaskReps 0
counterBlock_tag 0:2007
ipForwarding 2
ipDefaultTTL 64
ipInReceives 46590552
ipInHdrErrors 0
ipInAddrErrors 0
ipForwDatagrams 0
ipInUnknownProtos 0
ipInDiscards 0
ipInDelivers 46402357
ipOutRequests 69613096
ipOutDiscards 0
ipOutNoRoutes 80
ipReasmTimeout 0
ipReasmReqds 0
ipReasmOKs 0
ipReasmFails 0
ipFragOKs 0
ipFragFails 0
ipFragCreates 0
counterBlock_tag 0:2005
disk_total 6253608960
disk_free 2719039488
disk_partition_max_used 56.52
disk_reads 11512
disk_bytes_read 626214912
disk_read_time 48469
disk_writes 1058955
disk_bytes_written 8924332032
disk_write_time 7954804
counterBlock_tag 0:2004
mem_total 8326963200
mem_free 5063872512
mem_shared 0
mem_buffers 86425600
mem_cached 827752448
swap_total 0
swap_free 0
page_in 306365
page_out 4357584
swap_in 0
swap_out 0
counterBlock_tag 0:2003
cpu_load_one 0.030
cpu_load_five 0.050
cpu_load_fifteen 0.040
cpu_proc_run 1
cpu_proc_total 138
cpu_num 2
cpu_speed 1699
cpu_uptime 1699306
cpu_user 64269210
cpu_nice 1810
cpu_system 34690140
cpu_idle 3234293560
cpu_wio 3568580
cpuintr 0
cpu_sintr 5687680
cpuinterrupts 1596621688
cpu_contexts 3246142972
cpu_steal 329520
cpu_guest 0
cpu_guest_nice 0
counterBlock_tag 0:2006
nio_bytes_in 250283
nio_pkts_in 2931
nio_errs_in 0
nio_drops_in 0
nio_bytes_out 370244
nio_pkts_out 1640
nio_errs_out 0
nio_drops_out 0
counterBlock_tag 0:2000
hostname vpp0-0
UUID ec933791-d6af-7a93-3b8d-aab1a46d6faa
machine_type 3
os_name 2
os_release 6.1.0-26-amd64
endSample ----------------------
endDatagram =================================
```

If you thought: "What an obnoxiously long paste!", then my slightly RSI-induced mouse-hand might
agree with you. But it is really cool to see that every 30 seconds, the _collector_ will receive
this form of heartbeat from the _agent_. There are a lot of vital signs in this packet, including
some non-obvious but interesting stats like CPU load, memory, disk use and disk IO, and kernel
version information. It's super dope!

### hsflowd: Interface Counters

Next, I'll enable sFlow in VPP on all four interfaces (Gi10/0/0-Gi10/0/3), set the sampling rate to
something very high (1 in 100M), and the interface polling-interval to every 10 seconds. And indeed,
every ten seconds or so I get a few packets, which I captured in
[[sflow-interface.pcap](/assets/sflow/sflow-interface.pcap)]. Most of the packets contain only one
counter record, while some contain more than one (in the PCAP, packet #9 has two). If I update the
polling-interval to every second, I can see that most of the packets have all four counters.

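For those following along at home, enabling this on the VPP side looks roughly like the snippet
below. I'm writing the sflow plugin's CLI from memory, so treat the exact command names as an
assumption and consult `vppctl help` on your build:

```
pim@vpp0-0:~$ vppctl sflow sampling-rate 100000000
pim@vpp0-0:~$ vppctl sflow polling-interval 10
pim@vpp0-0:~$ vppctl sflow enable-disable GigabitEthernet10/0/0
pim@vpp0-0:~$ vppctl sflow enable-disable GigabitEthernet10/0/1
pim@vpp0-0:~$ vppctl sflow enable-disable GigabitEthernet10/0/2
pim@vpp0-0:~$ vppctl sflow enable-disable GigabitEthernet10/0/3
```
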
Those interface counters, as decoded by `sflowtool`, look like this:

```
pim@vpp0-0:~$ sflowtool -r sflow-interface.pcap | \
  awk '/startSample/ { on=1 } { if (on) { print $0 } } /endSample/ { on=0 }'
startSample ----------------------
sampleType_tag 0:4
sampleType COUNTERSSAMPLE
sampleSequenceNo 745
sourceId 0:3
counterBlock_tag 0:1005
ifName GigabitEthernet10/0/2
counterBlock_tag 0:1
ifIndex 3
networkType 6
ifSpeed 0
ifDirection 1
ifStatus 3
ifInOctets 858282015
ifInUcastPkts 780540
ifInMulticastPkts 0
ifInBroadcastPkts 0
ifInDiscards 0
ifInErrors 0
ifInUnknownProtos 0
ifOutOctets 1246716016
ifOutUcastPkts 975772
ifOutMulticastPkts 0
ifOutBroadcastPkts 0
ifOutDiscards 127
ifOutErrors 28
ifPromiscuousMode 0
endSample ----------------------
```

What I find particularly cool about it is that sFlow provides an automatic mapping between the
`ifName=GigabitEthernet10/0/2` (tag 0:1005) and an object (tag 0:1) which contains the `ifIndex=3`,
along with lots of packet and octet counters in both the ingress and egress direction. This is
super useful for upstream _collectors_, as they can now find the hostname, agent name and address,
and the correlation between interface names and their indexes. Noice!

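As a quick illustration of how a collector might consume this, here's a small Python sketch that
walks `sflowtool`'s line-oriented output and builds exactly that ifIndex-to-ifName map. The pairing
logic (remembering the last seen `ifName` until the matching `ifIndex` arrives) is my own assumption
about how to stitch the two counter blocks together:

```python
import subprocess

def ifindex_map(pcap_path):
    """Build an ifIndex -> ifName map from sflowtool's line-oriented output."""
    out = subprocess.run(["sflowtool", "-r", pcap_path],
                         capture_output=True, text=True).stdout
    mapping, last_name = {}, None
    for line in out.splitlines():
        key, _, value = line.partition(" ")
        if key == "ifName":                    # from counter block tag 0:1005
            last_name = value
        elif key == "ifIndex" and last_name:   # from counter block tag 0:1
            mapping[int(value)] = last_name
            last_name = None
    return mapping

print(ifindex_map("sflow-interface.pcap"))  # {3: 'GigabitEthernet10/0/2', ...}
```
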
#### hsflowd: Packet Samples

Now it's time to ratchet up the packet sampling, so I move it from 1:100M to 1:1000, while keeping
the interface polling-interval at 10 seconds, and I ask VPP to sample 64 bytes of each packet that it
inspects. On either side of my pet VPP instance, I start an `iperf3` run to generate some traffic. I
now see a healthy stream of sFlow packets coming in on port 6343. They still contain a host counter
every 30 seconds or so, and every 10 seconds a set of interface counters comes by, but mostly
these UDP packets are showing me samples. I've captured a few minutes of these in
[[sflow-all.pcap](/assets/sflow/sflow-all.pcap)].
Although Wireshark doesn't know how to interpret the sFlow counter messages, it _does_ know how to
interpret the sFlow sample messages, and it reveals one of them like this:

{{< image width="100%" src="/assets/sflow/sflow-wireshark.png" alt="sFlow Wireshark" >}}

Let me take a look at the picture from top to bottom. First, the outer header (from 127.0.0.1:48753
to 127.0.0.1:6343) is the sFlow agent sending to the collector. The agent identifies itself as
having IPv4 address 198.19.5.16 with ID 100000 and an uptime of 1h52m. Then, it says it's going to
send 9 samples, the first of which says it's from ifIndex=2 and at a sampling rate of 1:1000. It
then shows that sample, saying that the frame length is 1518 bytes, and the first 64 bytes of those
are sampled. Finally, the first sampled packet starts at the blue line. It shows the SrcMAC and
DstMAC, and that it was a TCP packet from 192.168.10.17:51028 to 192.168.10.33:5201 - my running
`iperf3`, booyah!

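Because each sample carries its own sampling rate, a collector can extrapolate the actual traffic
from the samples alone. A back-of-the-envelope sketch, where the frame length and rate come from
the screenshot above and the samples-per-second figure is a hypothetical input:

```python
sampling_rate   = 1000   # 1:1000, from the sample record above
frame_length    = 1518   # bytes on the wire for the sampled frame
samples_per_sec = 100    # hypothetical: samples observed for one interface

est_pps = samples_per_sec * sampling_rate  # each sample stands for ~1000 packets
est_bps = est_pps * frame_length * 8
print(f"estimated {est_pps} pps, {est_bps / 1e9:.2f} Gbit/s")  # 100000 pps, 1.21 Gbit/s
```
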
### VPP: sFlow Performance

{{< image float="right" src="/assets/sflow/sflow-lab.png" alt="sFlow Lab" width="20em" >}}

One question I get a lot about this plugin is: what is the performance impact when using
sFlow? I spent a considerable amount of time tinkering with this, and together with Neil brought
the plugin to what we both agree is the most efficient use of CPU. We could have gone a bit further,
but that would require somewhat intrusive changes to VPP's internals, and as _North of the Border_
(and the Simpsons!) would say: what we have isn't just good, it's good enough!

I've built a small testbed based on two Dell R730 machines. On the left, I have a Debian machine
running Cisco T-Rex using four quad-tengig network cards, the classic Intel X710-DA4. On the right,
I have my VPP machine called _Hippo_ (because it's always hungry for packets), with the same
hardware. I'll build two halves. On the top NIC (Te3/0/0-3 in VPP), I will install IPv4 and MPLS
forwarding on the purple circuit, and a simple Layer2 cross connect on the cyan circuit. On all four
interfaces, I will enable sFlow. Then, I will mirror this configuration on the bottom NIC
(Te130/0/0-3) in the red and green circuits, for which I will leave sFlow turned off.

To help you reproduce my results, and under the assumption that this is your jam, here's the
configuration for all of the kit:

***0. Cisco T-Rex***
```
pim@trex:~ $ cat /srv/trex/8x10.yaml
- version: 2
  interfaces: [ '06:00.0', '06:00.1', '83:00.0', '83:00.1', '87:00.0', '87:00.1', '85:00.0', '85:00.1' ]
  port_info:
    - src_mac:  00:1b:21:06:00:00
      dest_mac: 9c:69:b4:61:a1:dc   # Connected to Hippo Te3/0/0, purple
    - src_mac:  00:1b:21:06:00:01
      dest_mac: 9c:69:b4:61:a1:dd   # Connected to Hippo Te3/0/1, purple
    - src_mac:  00:1b:21:83:00:00
      dest_mac: 00:1b:21:83:00:01   # L2XC via Hippo Te3/0/2, cyan
    - src_mac:  00:1b:21:83:00:01
      dest_mac: 00:1b:21:83:00:00   # L2XC via Hippo Te3/0/3, cyan

    - src_mac:  00:1b:21:87:00:00
      dest_mac: 9c:69:b4:61:75:d0   # Connected to Hippo Te130/0/0, red
    - src_mac:  00:1b:21:87:00:01
      dest_mac: 9c:69:b4:61:75:d1   # Connected to Hippo Te130/0/1, red
    - src_mac:  9c:69:b4:85:00:00
      dest_mac: 9c:69:b4:85:00:01   # L2XC via Hippo Te130/0/2, green
    - src_mac:  9c:69:b4:85:00:01
      dest_mac: 9c:69:b4:85:00:00   # L2XC via Hippo Te130/0/3, green
pim@trex:~ $ sudo t-rex-64 -i -c 4 --cfg /srv/trex/8x10.yaml
```

When constructing the T-Rex configuration, I specifically set the destination MAC address for L3
circuits (the purple and red ones) using Hippo's interface MAC address, which I can find with
`vppctl show hardware-interfaces`. This way, T-Rex does not have to ARP for the VPP endpoint. On
L2XC circuits (the cyan and green ones), VPP does not concern itself with the MAC addressing at
all. It puts its interface in _promiscuous_ mode, and simply writes out any ethernet frame received,
directly to the egress interface.

***1. IPv4***
```
hippo# set int state TenGigabitEthernet3/0/0 up
hippo# set int state TenGigabitEthernet3/0/1 up
hippo# set int state TenGigabitEthernet130/0/0 up
hippo# set int state TenGigabitEthernet130/0/1 up
hippo# set int ip address TenGigabitEthernet3/0/0 100.64.0.1/31
hippo# set int ip address TenGigabitEthernet3/0/1 100.64.1.1/31
hippo# set int ip address TenGigabitEthernet130/0/0 100.64.4.1/31
hippo# set int ip address TenGigabitEthernet130/0/1 100.64.5.1/31
hippo# ip route add 16.0.0.0/24 via 100.64.0.0
hippo# ip route add 48.0.0.0/24 via 100.64.1.0
hippo# ip route add 16.0.2.0/24 via 100.64.4.0
hippo# ip route add 48.0.2.0/24 via 100.64.5.0
hippo# ip neighbor TenGigabitEthernet3/0/0 100.64.0.0 00:1b:21:06:00:00 static
hippo# ip neighbor TenGigabitEthernet3/0/1 100.64.1.0 00:1b:21:06:00:01 static
hippo# ip neighbor TenGigabitEthernet130/0/0 100.64.4.0 00:1b:21:87:00:00 static
hippo# ip neighbor TenGigabitEthernet130/0/1 100.64.5.0 00:1b:21:87:00:01 static
```

By the way, one note on this last piece: I'm setting static IPv4 neighbors so that Cisco T-Rex
as well as VPP do not have to use ARP to resolve each other. You'll see above that the T-Rex
configuration also uses MAC addresses exclusively. Setting the `ip neighbor` like this allows VPP
to know where to send return traffic.

***2. MPLS***
```
hippo# mpls table add 0
hippo# set interface mpls TenGigabitEthernet3/0/0 enable
hippo# set interface mpls TenGigabitEthernet3/0/1 enable
hippo# set interface mpls TenGigabitEthernet130/0/0 enable
hippo# set interface mpls TenGigabitEthernet130/0/1 enable
hippo# mpls local-label add 16 eos via 100.64.1.0 TenGigabitEthernet3/0/1 out-labels 17
hippo# mpls local-label add 17 eos via 100.64.0.0 TenGigabitEthernet3/0/0 out-labels 16
hippo# mpls local-label add 20 eos via 100.64.5.0 TenGigabitEthernet130/0/1 out-labels 21
hippo# mpls local-label add 21 eos via 100.64.4.0 TenGigabitEthernet130/0/0 out-labels 20
```

Here, the MPLS configuration implements a simple P-router, where incoming MPLS packets with label 16
will be sent back to T-Rex on Te3/0/1 to the specified IPv4 nexthop (for which I already know the
MAC address), and with label 16 removed and new label 17 imposed, in other words a SWAP operation.

***3. L2XC***
```
hippo# set int state TenGigabitEthernet3/0/2 up
hippo# set int state TenGigabitEthernet3/0/3 up
hippo# set int state TenGigabitEthernet130/0/2 up
hippo# set int state TenGigabitEthernet130/0/3 up
hippo# set int l2 xconnect TenGigabitEthernet3/0/2 TenGigabitEthernet3/0/3
hippo# set int l2 xconnect TenGigabitEthernet3/0/3 TenGigabitEthernet3/0/2
hippo# set int l2 xconnect TenGigabitEthernet130/0/2 TenGigabitEthernet130/0/3
hippo# set int l2 xconnect TenGigabitEthernet130/0/3 TenGigabitEthernet130/0/2
```

I've added a layer2 cross connect as well because it's computationally very cheap for VPP to receive
an L2 (ethernet) datagram, and immediately transmit it on another interface. There's no FIB lookup
and not even an L2 nexthop lookup involved; VPP is just shoveling ethernet packets in-and-out as
fast as it can!

Here's what a loadtest looks like when sending 80Gbps at 192b packets on all eight interfaces:

{{< image src="/assets/sflow/sflow-lab-trex.png" alt="sFlow T-Rex" width="100%" >}}

The leftmost ports p0 <-> p1 are sending IPv4+MPLS, while ports p2 <-> p3 are sending ethernet back
and forth. All four of them have sFlow enabled, at a sampling rate of 1:10'000, the default. These
four ports are my experiment, to show the CPU use of sFlow. Then, ports p4 <-> p5 and p6 <-> p7
respectively have sFlow turned off but with the same configuration. They are my control, showing
the CPU use without sFlow.

**First conclusion**: This stuff works a treat. There is absolutely no impact on throughput at
80Gbps with 47.6Mpps either _with_, or _without_ sFlow turned on. That's wonderful news, as it shows
that the dataplane has more CPU available than is needed for any combination of functionality.

But what _is_ the limit? For this, I'll take a deeper look at the runtime statistics, comparing the
CPU time spent and the maximum throughput achievable on a single VPP worker, thus using a single CPU
thread on this Hippo machine that has 44 cores and 44 hyperthreads. I switch the loadtester to emit
64 byte ethernet packets, the smallest I'm allowed to send.

| Loadtest | no sFlow | 1:1'000'000 | 1:10'000 | 1:1'000 | 1:100 |
|-------------|-----------|-----------|-----------|-----------|-----------|
| L2XC | 14.88Mpps | 14.32Mpps | 14.31Mpps | 14.27Mpps | 14.15Mpps |
| IPv4 | 10.89Mpps | 9.88Mpps | 9.88Mpps | 9.84Mpps | 9.73Mpps |
| MPLS | 10.11Mpps | 9.52Mpps | 9.52Mpps | 9.51Mpps | 9.45Mpps |
| ***sFlow Packets*** / 10sec | N/A | 337.42M total | 337.39M total | 336.48M total | 333.64M total |
| .. Sampled | | 328 | 33.8k | 336k | 3.34M |
| .. Sent | | 328 | 33.8k | 336k | 1.53M |
| .. Dropped | | 0 | 0 | 0 | 1.81M |

Here I can make a few important observations.

**Baseline**: One worker (thus, one CPU thread) can sustain 14.88Mpps of L2XC when sFlow is turned
off, which implies that it has a little bit of CPU left over to do other work, if needed. With IPv4,
I can see that the throughput is actually CPU limited: 10.89Mpps can be handled by one worker. I
know that MPLS is a little bit more expensive computationally than IPv4, and that checks out. The
total capacity is 10.11Mpps for one worker, when sFlow is turned off.

**Overhead**: When I turn on sFlow on the interface, VPP will insert the _sflow-node_ into the
forwarding graph between `device-input` and `ethernet-input`. It means that the sFlow node will see
_every single_ packet, and it will have to move all of these into the next node, which costs about
9.5 CPU cycles per packet. The regression on L2XC is 3.8%, but I have to note that VPP was not CPU
bound on the L2XC path, so it could use CPU cycles that were still available before throughput
regressed. There is an immediate regression of 9.3% on IPv4 and 5.9% on MPLS, only to shuffle the
packets through the graph.

**Sampling Cost**: When then dialing up the sampling rate, the further regression is not _that_
terrible. Between 1:1'000'000 and 1:10'000, there's barely a noticeable difference. Even in the
worst case of 1:100, the regression is from 14.32Mpps to 14.15Mpps for L2XC, only 1.2%. The
regressions for L2XC, IPv4 and MPLS are all very modest, at 1.2% (L2XC), 1.6% (IPv4) and 0.8% (MPLS).
Of course, by using multiple hardware receive queues and multiple RX workers per interface, the cost
can be kept well in hand.

**Overload Protection**: At 1:1'000 and an effective rate of 33.65Mpps across all ports, I correctly
observe 336k samples taken, and sent to PSAMPLE. At 1:100 however, there are 3.34M samples, but
they do not all fit through the FIFO, so the plugin is dropping samples to protect the downstream
`sflow-main` thread and `hsflowd`. I can see that here, 1.81M samples have been dropped, while 1.53M
samples made it through. By the way, this means VPP is happily sending a whopping 153K samples/sec
to the collector!

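The arithmetic checks out, too. A quick sanity check of the 1:100 column, using only the numbers
from the table above:

```python
# Numbers from the 1:100 column of the table (measured over 10 seconds).
total_pkts = 333.64e6             # packets seen by the sflow node
sampled    = total_pkts / 100     # 1:100 sampling
sent, dropped = 1.53e6, 1.81e6    # through the FIFO vs. shed for protection

print(f"{sampled / 1e6:.2f}M sampled")              # 3.34M sampled
print(f"{(sent + dropped) / 1e6:.2f}M accounted")   # 3.34M = sent + dropped
print(f"{sent / 10 / 1e3:.0f}k samples/sec sent")   # 153k samples/sec
```
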
## What's Next

Now that I've seen the UDP packets from our agent to a collector on the wire, and also how
incredibly efficient the sFlow sampling implementation turned out, I'm super motivated to
continue the journey with higher level collector receivers like ntopng, sflow-rt or Akvorado. In an
upcoming article, I'll describe how I rolled out Akvorado at IPng, and what types of changes would
make the user experience even better (or simpler to understand, at least).

### Acknowledgements

I'd like to thank Neil McKee from inMon for his dedication to getting things right, including the
finer details such as logging, error handling, API specifications, and documentation. He has been a
true pleasure to work with and learn from. Also, thank you to the VPP developer community, notably
Benoit, Florin, Damjan, Dave and Matt, for helping with the review and getting this thing merged in
time for the 25.02 release.

793 content/articles/2025-04-09-frysix-evpn.md Normal file
@ -0,0 +1,793 @@

---
date: "2025-04-09T07:51:23Z"
title: 'FrysIX eVPN: think different'
---

{{< image float="right" src="/assets/frys-ix/frysix-logo-small.png" alt="FrysIX Logo" width="12em" >}}

# Introduction

Somewhere in the far north of the Netherlands, the country where I was born, a town called Jubbega
is the home of the Frysian Internet Exchange called [[Frys-IX](https://frys-ix.net/)]. Back in 2021,
a buddy of mine, Arend, said that he was planning on renting a rack at the NIKHEF facility, one of
the most densely populated facilities in western Europe. He was looking for a few launching
customers and I was definitely in the market for a presence in Amsterdam. I even wrote about it on
my [[bucketlist]({{< ref 2021-07-26-bucketlist.md >}})]. Arend and his IT company
[[ERITAP](https://www.eritap.com/)] took delivery of that rack in May of 2021, and this is when the
internet exchange with _Frysian roots_ was born.

In the years from 2021 until now, Arend and I have been operating the exchange with reasonable
success. It grew from a handful of folks in that first rack, to now some 250 participating ISPs
with about ten switches in six datacenters across the Amsterdam metro area. It's shifting a cool
800Gbit of traffic or so. It's dope, and very rewarding to be able to contribute to this community!

## Frys-IX is growing

We have several members with a 2x100G LAG and even though all inter-datacenter links are either dark
fiber or WDM, we're starting to feel the growing pains as we set our sights on the next step of
growth. You see, when FrysIX did 13.37Gbit of traffic, Arend organized a barbecue. When it did
133.7Gbit of traffic, Arend organized an even bigger barbecue. Obviously, the next step is 1337Gbit
and joining the infamous [[One TeraBit Club](https://github.com/tking/OneTeraBitClub)]. Thomas:
we're on our way!

It became clear that we will not be able to keep a dependable peering platform if FrysIX remains a
single L2 broadcast domain, and it also became clear that concatenating multiple 100G ports would be
operationally expensive (think of all the dark fiber or WDM waves!), and brittle (think of LACP and
balancing traffic over those ports). We need to modernize in order to stay ahead of the growth
curve.

## Hello Nokia

{{< image float="right" src="/assets/frys-ix/nokia-7220-d4.png" alt="Nokia 7220-D4" width="20em" >}}

The Nokia 7220 Interconnect Router (7220 IXR) for data center fabric provides fixed-configuration,
high-capacity platforms that let you bring unmatched scale, flexibility and operational simplicity
to your data center networks and peering network environments. These devices are built around the
Broadcom _Trident_ chipset, in the case of the "D4" platform, this is a Trident4 with 28x100G and
8x400G ports. Whoot!

{{< image float="right" src="/assets/frys-ix/IXR-7220-D3.jpg" alt="Nokia 7220-D3" width="20em" >}}

What I find particularly awesome about the Trident series is their speed (total bandwidth of
12.8Tbps _per router_), low power use (without optics, the IXR-7220-D4 consumes about 150W) and
a plethora of advanced capabilities like L2/L3 filtering, IPv4, IPv6 and MPLS routing, and modern
approaches to scale-out networking such as VXLAN based EVPN. At the FrysIX barbecue in September of
2024, FrysIX was gifted a rather powerful IXR-7220-D3 router, shown in the picture to the right.
That's a 32x100G router.

ERITAP has bought two (new in box) IXR-7220-D4 (8x400G,28x100G) routers, and has also acquired two
IXR-7220-D2 (48x25G,8x100G) routers. So in total, FrysIX is now the proud owner of five of these
beautiful Nokia devices. If you haven't yet, you should definitely read about these versatile
routers on the [[Nokia](https://onestore.nokia.com/asset/207599)] website, and some details of the
_merchant silicon_ switch chips in use on the
[[Broadcom](https://www.broadcom.com/products/ethernet-connectivity/switching/strataxgs/bcm56880-series)]
website.

### eVPN: A small rant

{{< image float="right" src="/assets/frys-ix/FrysIX_ Topology (concept).svg" alt="Topology Concept" width="50%" >}}

First, I need to get something off my chest. Consider a topology for an internet exchange platform,
taking into account the available equipment, rackspace, power, and cross connects. Somehow, almost
every design or reference architecture I can find on the Internet assumes folks want to build a
[[Clos network](https://en.wikipedia.org/wiki/Clos_network)], which has a topology consisting of leaf
and spine switches. The _spine_ switches have a different set of features than the _leaf_ ones,
notably they don't have to do provider edge functionality like VXLAN encap and decapsulation.
Almost all of these designs are showing how one might build a leaf-spine network for hyperscale.

**Critique 1**: my 'spine' (IXR-7220-D4 routers) must also be provider edge. Practically speaking,
in the picture above I have these beautiful Nokia IXR-7220-D4 routers, using two 400G ports to
connect between the facilities, and six 100G ports to connect the smaller breakout switches. That
would leave a _massive_ amount of capacity unused: 22x 100G and 6x400G ports, to be exact.

**Critique 2**: all 'leaf' (either IXR-7220-D2 routers or Arista switches) can't realistically
connect to both 'spines'. Our devices are spread out over two (and in practice, more like six)
datacenters, and it's prohibitively expensive to get 100G waves or dark fiber to create a full mesh.
It's much more economical to create a star-topology that minimizes cross-datacenter fiber spans.

**Critique 3**: Most of these 'spine-leaf' reference architectures assume that the interior gateway
protocol is eBGP in what they call the _underlay_, and on top of that, some secondary eBGP that's
called the _overlay_. Frankly, such a design makes my head spin a little bit. These designs assume
hundreds of switches, in which case making use of one AS number per switch could make sense, as iBGP
needs either a 'full mesh', or external route reflectors.

**Critique 4**: These reference designs also make an assumption that all fiber is local and while
optics and links can fail, it will be relatively rare to _drain_ a link. However, in
cross-datacenter networks, draining links for maintenance is very common, for example if the dark
fiber provider needs to perform repairs on a span that was damaged. With these eBGP-over-eBGP
connections, traffic engineering is more difficult than simply raising the OSPF (or IS-IS) cost of a
link, to reroute traffic.

Setting aside eVPN for a second, if I were to build an IP transport network, like I did when I built
[[IPng Site Local]({{< ref 2023-03-11-mpls-core.md >}})], I would use a much more intuitive
and simple (I would even dare say elegant) design:

1. Take a classic IGP like [[OSPF](https://en.wikipedia.org/wiki/Open_Shortest_Path_First)], or
   perhaps [[IS-IS](https://en.wikipedia.org/wiki/IS-IS)]. There is no benefit, to me at least, to use
   BGP as an IGP.
1. I would give each of the links between the switches an IPv4 /31 and enable link-local, and give
   each switch a loopback address with a /32 IPv4 and a /128 IPv6.
1. If I had multiple links between two given switches, I would probably just use ECMP if my devices
   supported it, and fall back to a LACP signaled bundle-ethernet otherwise.
1. If I were to need to use BGP (and for eVPN, this need exists), taking the ISP mindset (as opposed
   to the datacenter fabric mindset), I would simply install iBGP against two or three route
   reflectors, and exchange routing information within the same single AS number.

### eVPN: A demo topology

{{< image float="right" src="/assets/frys-ix/Nokia Arista VXLAN.svg" alt="Demo topology" width="50%" >}}

So, that's exactly how I'm going to approach the FrysIX eVPN design: OSPF for the underlay and iBGP
for the overlay! I have a feeling that some folks will despise me for being contrarian, but you can
leave your comments below, and don't forget to like-and-subscribe :-)

Arend builds this topology for me in Jubbega - also known as FrysIX HQ. He takes the two
400G-capable routers and connects them. Then he takes an Arista DCS-7060CX switch, which is eVPN
capable, with 32x100G ports, based on the Broadcom Tomahawk chipset, and a smaller Nokia
IXR-7220-D2 with 48x25G and 8x100G ports, based on the Trident3 chipset. He wires all of this up
to look like the picture on the right.

#### Underlay: Nokia's SR Linux

We boot up the equipment, verify that all the optics and links are up, and connect the management
ports to an OOB network that I can remotely log in to. This is the first time that either of us has
worked on Nokia, but I find it reasonably intuitive once I get a few tips and tricks from Niek.

```
[pim@nikhef ~]$ sr_cli
--{ running }--[ ]--
A:pim@nikhef# enter candidate
--{ candidate shared default }--[ ]--
A:pim@nikhef# set / interface lo0 admin-state enable
A:pim@nikhef# set / interface lo0 subinterface 0 admin-state enable
A:pim@nikhef# set / interface lo0 subinterface 0 ipv4 admin-state enable
A:pim@nikhef# set / interface lo0 subinterface 0 ipv4 address 198.19.16.1/32
A:pim@nikhef# commit stay
```

There, my first config snippet! This creates a _loopback_ interface, and similar to JunOS, a
_subinterface_ (which Juniper calls a _unit_) which enables IPv4 and gives it a /32 address. In SR
Linux, any interface has to be associated with a _network-instance_, think of those as routing
domains or VRFs. There's a conveniently named _default_ network-instance, which I'll add this and
the point-to-point interface between the two 400G routers to:

```
A:pim@nikhef# info flat interface ethernet-1/29
set / interface ethernet-1/29 admin-state enable
set / interface ethernet-1/29 subinterface 0 admin-state enable
set / interface ethernet-1/29 subinterface 0 ip-mtu 9190
set / interface ethernet-1/29 subinterface 0 ipv4 admin-state enable
set / interface ethernet-1/29 subinterface 0 ipv4 address 198.19.17.1/31
set / interface ethernet-1/29 subinterface 0 ipv6 admin-state enable

A:pim@nikhef# set / network-instance default type default
A:pim@nikhef# set / network-instance default admin-state enable
A:pim@nikhef# set / network-instance default interface ethernet-1/29.0
A:pim@nikhef# set / network-instance default interface lo0.0
A:pim@nikhef# commit stay
```

Cool. Assuming I now also do this on the other IXR-7220-D4 router, called _equinix_ (which gets the
loopback address 198.19.16.0/32 and the point-to-point on the 400G interface of 198.19.17.0/31), I
should be able to do my first jumboframe ping (9162 bytes of ICMP payload + 8 bytes of ICMP header +
20 bytes of IPv4 header = 9190 bytes, exactly the configured MTU):

```
A:pim@equinix# ping network-instance default 198.19.17.1 -s 9162 -M do
Using network instance default
PING 198.19.17.1 (198.19.17.1) 9162(9190) bytes of data.
9170 bytes from 198.19.17.1: icmp_seq=1 ttl=64 time=0.466 ms
9170 bytes from 198.19.17.1: icmp_seq=2 ttl=64 time=0.477 ms
9170 bytes from 198.19.17.1: icmp_seq=3 ttl=64 time=0.547 ms
```

#### Underlay: SR Linux OSPF

OK, let's get these two Nokia routers to speak OSPF, so that they can reach each other's loopback.
It's really easy:

```
A:pim@nikhef# / network-instance default protocols ospf instance default
--{ candidate shared default }--[ network-instance default protocols ospf instance default ]--
A:pim@nikhef# set admin-state enable
A:pim@nikhef# set version ospf-v2
A:pim@nikhef# set router-id 198.19.16.1
A:pim@nikhef# set area 0.0.0.0 interface ethernet-1/29.0 interface-type point-to-point
A:pim@nikhef# set area 0.0.0.0 interface lo0.0 passive true
A:pim@nikhef# commit stay
```

Similar to JunOS, I can descend into a configuration scope: the first line goes into the
_network-instance_ called `default` and then the _protocols_ called `ospf`, and then the _instance_
called `default`. Subsequent `set` commands operate at this scope. Once I commit this configuration
(on the _nikhef_ router and also the _equinix_ router, with its own unique router-id), OSPF quickly
springs into action:

```
A:pim@nikhef# show network-instance default protocols ospf neighbor
=========================================================================================
Net-Inst default OSPFv2 Instance default Neighbors
=========================================================================================
+---------------------------------------------------------------------------------------+
| Interface-Name Rtr Id State Pri RetxQ Time Before Dead |
+=======================================================================================+
| ethernet-1/29.0 198.19.16.0 full 1 0 36 |
+---------------------------------------------------------------------------------------+
-----------------------------------------------------------------------------------------
No. of Neighbors: 1
=========================================================================================

A:pim@nikhef# show network-instance default route-table all | more
IPv4 unicast route table of network instance default
+------------------+-----+------------+--------------+--------+----------+--------+------+-------------+-----------------+
| Prefix | ID | Route Type | Route Owner | Active | Origin | Metric | Pref | Next-hop | Next-hop |
| | | | | | Network | | | (Type) | Interface |
| | | | | | Instance | | | | |
+==================+=====+============+==============+========+==========+========+======+=============+=================+
| 198.19.16.0/32 | 0 | ospfv2 | ospf_mgr | True | default | 1 | 10 | 198.19.17.0 | ethernet-1/29.0 |
| | | | | | | | | (direct) | |
| 198.19.16.1/32 | 7 | host | net_inst_mgr | True | default | 0 | 0 | None | None |
| 198.19.17.0/31 | 6 | local | net_inst_mgr | True | default | 0 | 0 | 198.19.17.1 | ethernet-1/29.0 |
| | | | | | | | | (direct) | |
| 198.19.17.1/32 | 6 | host | net_inst_mgr | True | default | 0 | 0 | None | None |
+==================+=====+============+==============+========+==========+========+======+=============+=================+

A:pim@nikhef# ping network-instance default 198.19.16.0
Using network instance default
PING 198.19.16.0 (198.19.16.0) 56(84) bytes of data.
64 bytes from 198.19.16.0: icmp_seq=1 ttl=64 time=0.484 ms
64 bytes from 198.19.16.0: icmp_seq=2 ttl=64 time=0.663 ms
```

Delicious! OSPF has learned the loopback, and it is now reachable. As with most things, going from 0
to 1 (in this case: understanding how SR Linux works at all) is the most difficult part. Then going
from 1 to 2 is critical (in this case: making two routers interact with OSPF), but from there on,
going from 2 to N is easy. In my case, enabling several other point-to-point /31 transit networks on
the _nikhef_ router (using `ethernet-1/1.0` through `ethernet-1/4.0` with the correct MTU, and
turning on OSPF for these) makes the whole network shoot to life. Slick!

#### Underlay: Arista

I'll point out that one of the devices in this topology is an Arista. We have several of these ready
for deployment at FrysIX. They are a lot more affordable and easy to find on the second hand /
refurbished market. These switches come with 32x100G ports, and are really good at packet slinging
because they're based on the Broadcom _Tomahawk_ chipset. They pack a few fewer features than the
_Trident_ chipset that powers the Nokia, but they happen to have all the features we need to run our
internet exchange. So I turn my attention to the Arista in the topology. I am much more
comfortable configuring the whole thing here, as it's not my first time touching these devices:

```
arista-leaf#show run int loop0
interface Loopback0
   ip address 198.19.16.2/32
   ip ospf area 0.0.0.0
arista-leaf#show run int Ethernet32/1
interface Ethernet32/1
   description Core: Connected to nikhef:ethernet-1/2
   load-interval 1
   mtu 9190
   no switchport
   ip address 198.19.17.5/31
   ip ospf cost 1000
   ip ospf network point-to-point
   ip ospf area 0.0.0.0
arista-leaf#show run section router ospf
router ospf 65500
   router-id 198.19.16.2
   redistribute connected
   network 198.19.0.0/16 area 0.0.0.0
   max-lsa 12000
```

I complete the configuration for the other two interfaces on this Arista: port Eth31/1 also connects
to the _nikhef_ IXR-7220-D4 and I give it a high cost of 1000, while Eth30/1 connects 1x100G to
the _nokia-leaf_ IXR-7220-D2 with a cost of 10.
It's nice to see OSPF in action - there are two equal (but high cost) OSPF paths via
router-id 198.19.16.1 (_nikhef_), and there's one lower cost path via router-id 198.19.16.3
(_nokia-leaf_). The traceroute nicely shows the scenic route (arista-leaf -> nokia-leaf -> nikhef ->
equinix). Dope!

```
arista-leaf#show ip ospf nei
Neighbor ID Instance VRF Pri State Dead Time Address Interface
198.19.16.1 65500 default 1 FULL 00:00:36 198.19.17.4 Ethernet32/1
198.19.16.3 65500 default 1 FULL 00:00:31 198.19.17.11 Ethernet30/1
198.19.16.1 65500 default 1 FULL 00:00:35 198.19.17.2 Ethernet31/1

arista-leaf#traceroute 198.19.16.0
traceroute to 198.19.16.0 (198.19.16.0), 30 hops max, 60 byte packets
 1 198.19.17.11 (198.19.17.11) 0.220 ms 0.150 ms 0.206 ms
 2 198.19.17.6 (198.19.17.6) 0.169 ms 0.107 ms 0.099 ms
 3 198.19.16.0 (198.19.16.0) 0.434 ms 0.346 ms 0.303 ms
```

So far, so good! The _underlay_ is up, every router can reach every other router on its loopback,
and all OSPF adjacencies are formed. I'll leave the 2x100G between _nikhef_ and _arista-leaf_ at
high cost for now.

#### Overlay EVPN: SR Linux

The big-picture idea here is to use iBGP with the same private AS number, and because there are two
main facilities (NIKHEF and Equinix), make each of those bigger IXR-7220-D4 routers act as
route-reflectors for others. It means that they will have an iBGP session amongst themselves
(198.19.16.0 <-> 198.19.16.1) and otherwise accept iBGP sessions from any IP address in the
198.19.16.0/24 subnet. This way, I don't have to configure any more than strictly necessary on the
core routers. Any new router can just plug in, form an OSPF adjacency, and connect to both core
routers. I proceed to configure BGP on the Nokias like this:

```
A:pim@nikhef# / network-instance default protocols bgp
A:pim@nikhef# set admin-state enable
A:pim@nikhef# set autonomous-system 65500
A:pim@nikhef# set router-id 198.19.16.1
A:pim@nikhef# set dynamic-neighbors accept match 198.19.16.0/24 peer-group overlay
A:pim@nikhef# set afi-safi evpn admin-state enable
A:pim@nikhef# set preference ibgp 170
A:pim@nikhef# set route-advertisement rapid-withdrawal true
A:pim@nikhef# set route-advertisement wait-for-fib-install false
A:pim@nikhef# set group overlay peer-as 65500
A:pim@nikhef# set group overlay afi-safi evpn admin-state enable
A:pim@nikhef# set group overlay afi-safi ipv4-unicast admin-state disable
A:pim@nikhef# set group overlay afi-safi ipv6-unicast admin-state disable
A:pim@nikhef# set group overlay local-as as-number 65500
A:pim@nikhef# set group overlay route-reflector client true
A:pim@nikhef# set group overlay transport local-address 198.19.16.1
A:pim@nikhef# set neighbor 198.19.16.0 admin-state enable
A:pim@nikhef# set neighbor 198.19.16.0 peer-group overlay
A:pim@nikhef# commit stay
```

I can see that iBGP sessions establish between all the devices:

```
A:pim@nikhef# show network-instance default protocols bgp neighbor
---------------------------------------------------------------------------------------------------------------------------
BGP neighbor summary for network-instance "default"
Flags: S static, D dynamic, L discovered by LLDP, B BFD enabled, - disabled, * slow
---------------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------------------------------
+-------------+-------------+----------+-------+----------+-------------+---------------+------------+--------------------+
| Net-Inst | Peer | Group | Flags | Peer-AS | State | Uptime | AFI/SAFI | [Rx/Active/Tx] |
+=============+=============+==========+=======+==========+=============+===============+============+====================+
| default | 198.19.16.0 | overlay | S | 65500 | established | 0d:0h:2m:32s | evpn | [0/0/0] |
| default | 198.19.16.2 | overlay | D | 65500 | established | 0d:0h:2m:27s | evpn | [0/0/0] |
| default | 198.19.16.3 | overlay | D | 65500 | established | 0d:0h:2m:41s | evpn | [0/0/0] |
+-------------+-------------+----------+-------+----------+-------------+---------------+------------+--------------------+
---------------------------------------------------------------------------------------------------------------------------
Summary:
1 configured neighbors, 1 configured sessions are established, 0 disabled peers
2 dynamic peers
```

A few things to note here - there is one _configured_ neighbor (this is the other IXR-7220-D4
router), and two _dynamic_ peers: these are the Arista and the smaller IXR-7220-D2 router. The only
address family that they are exchanging information for is the _evpn_ family, and no prefixes have
been learned or sent yet, shown by the `[0/0/0]` designation in the last column.

#### Overlay EVPN: Arista

The Arista is also remarkably straightforward to configure. Here, I'll simply enable the iBGP
session as follows:

```
arista-leaf#show run section bgp
router bgp 65500
   neighbor evpn peer group
   neighbor evpn remote-as 65500
   neighbor evpn update-source Loopback0
   neighbor evpn ebgp-multihop 3
   neighbor evpn send-community extended
   neighbor evpn maximum-routes 12000 warning-only
   neighbor 198.19.16.0 peer group evpn
   neighbor 198.19.16.1 peer group evpn
   !
   address-family evpn
      neighbor evpn activate

arista-leaf#show bgp summary
BGP summary information for VRF default
Router identifier 198.19.16.2, local AS number 65500
Neighbor    AS    Session State AFI/SAFI     AFI/SAFI State NLRI Rcd NLRI Acc
----------- ----------- ------------- ----------------------- -------------- ---------- ----------
198.19.16.0 65500 Established   IPv4 Unicast Advertised     0 0
198.19.16.0 65500 Established   L2VPN EVPN   Negotiated     0 0
198.19.16.1 65500 Established   IPv4 Unicast Advertised     0 0
198.19.16.1 65500 Established   L2VPN EVPN   Negotiated     0 0
```

On this leaf node, I'll have a redundant iBGP session with the two core nodes. Since those core
nodes are peering amongst themselves, and are configured as route-reflectors, this is all I need. No
matter how many additional Arista (or Nokia) devices I add to the network, all they'll have to do is
enable OSPF (so they can reach 198.19.16.0 and .1) and turn on iBGP sessions with both core routers.
Voila!

#### VXLAN EVPN: SR Linux

Nokia documentation informs me that SR Linux uses a special interface called _system0_ to source its
VXLAN traffic from, and to add this interface to the _default_ network-instance. So it's a matter of
defining that interface and associating a VXLAN interface with it, like so:

```
A:pim@nikhef# set / interface system0 admin-state enable
A:pim@nikhef# set / interface system0 subinterface 0 admin-state enable
A:pim@nikhef# set / interface system0 subinterface 0 ipv4 admin-state enable
A:pim@nikhef# set / interface system0 subinterface 0 ipv4 address 198.19.18.1/32
A:pim@nikhef# set / network-instance default interface system0.0
A:pim@nikhef# set / tunnel-interface vxlan1 vxlan-interface 2604 type bridged
A:pim@nikhef# set / tunnel-interface vxlan1 vxlan-interface 2604 ingress vni 2604
A:pim@nikhef# set / tunnel-interface vxlan1 vxlan-interface 2604 egress source-ip use-system-ipv4-address
A:pim@nikhef# commit stay
```

This creates the plumbing for a VXLAN sub-interface called `vxlan1.2604` which will accept/send
traffic using VNI 2604 (this happens to be the VLAN id we use at FrysIX for our production Peering
LAN), and it'll use the `system0.0` address to source that traffic from.

The second part is to create what SR Linux calls a MAC-VRF and put some interface(s) in it:

```
A:pim@nikhef# set / interface ethernet-1/9 admin-state enable
A:pim@nikhef# set / interface ethernet-1/9 breakout-mode num-breakout-ports 4
A:pim@nikhef# set / interface ethernet-1/9 breakout-mode breakout-port-speed 10G
A:pim@nikhef# set / interface ethernet-1/9/3 admin-state enable
A:pim@nikhef# set / interface ethernet-1/9/3 vlan-tagging true
A:pim@nikhef# set / interface ethernet-1/9/3 subinterface 0 type bridged
A:pim@nikhef# set / interface ethernet-1/9/3 subinterface 0 admin-state enable
A:pim@nikhef# set / interface ethernet-1/9/3 subinterface 0 vlan encap untagged

A:pim@nikhef# / network-instance peeringlan
A:pim@nikhef# set type mac-vrf
A:pim@nikhef# set admin-state enable
A:pim@nikhef# set interface ethernet-1/9/3.0
A:pim@nikhef# set vxlan-interface vxlan1.2604
A:pim@nikhef# set protocols bgp-evpn bgp-instance 1 admin-state enable
A:pim@nikhef# set protocols bgp-evpn bgp-instance 1 vxlan-interface vxlan1.2604
A:pim@nikhef# set protocols bgp-evpn bgp-instance 1 evi 2604
A:pim@nikhef# set protocols bgp-vpn bgp-instance 1 route-distinguisher rd 65500:2604
A:pim@nikhef# set protocols bgp-vpn bgp-instance 1 route-target export-rt target:65500:2604
A:pim@nikhef# set protocols bgp-vpn bgp-instance 1 route-target import-rt target:65500:2604
A:pim@nikhef# commit stay
```

In the first block here, Arend took what is a 100G port called `ethernet-1/9` and split it into four
breakout ports. Arend forced the port speed to 10G because he has taken a 40G-4x10G DAC, and it
happens that the third lane is plugged into the Debian machine. So on `ethernet-1/9/3` I'll create a
sub-interface, make it type _bridged_ (which I've also done on `vxlan1.2604`!) and allow any
untagged traffic to enter it.

{{< image width="5em" float="left" src="/assets/shared/brain.png" alt="brain" >}}

If you, like me, are used to either VPP or IOS/XR, this type of sub-interface stuff should feel very
natural to you. I've written about the sub-interfaces logic on Cisco's IOS/XR and VPP approach in a
previous [[article]({{< ref 2022-02-14-vpp-vlan-gym.md >}})] which my buddy Fred lovingly calls
_VLAN Gymnastics_ because the ports are just so damn flexible. Worth a read!

The second block creates a new _network-instance_ which I'll name `peeringlan`. It associates
the newly created untagged sub-interface `ethernet-1/9/3.0` with the VXLAN interface, and starts an
eVPN protocol instance, instructing traffic in and out of this network-instance to use EVI 2604 on
the VXLAN sub-interface, and signalling all learned MAC addresses with the specified
route-distinguisher and import/export route-targets. For simplicity I've just used the same for
each: 65500:2604.

I continue by adding an interface to the `peeringlan` _network-instance_ on the other two Nokia
routers, along the lines of the sketch below: `ethernet-1/9/3.0` on the _equinix_ router and
`ethernet-1/9.0` on the _nokia-leaf_ router. Each of these goes to a 10Gbps port on a Debian
machine.

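For the _nokia-leaf_ router, that looks roughly like this; a sketch following the exact same pattern
as on _nikhef_, assuming its `vxlan1.2604` tunnel-interface was created the same way (only the
member interface differs, the rd/rt values are unchanged):

```
A:pim@nokia-leaf# set / interface ethernet-1/9 admin-state enable
A:pim@nokia-leaf# set / interface ethernet-1/9 subinterface 0 type bridged
A:pim@nokia-leaf# set / interface ethernet-1/9 subinterface 0 admin-state enable

A:pim@nokia-leaf# / network-instance peeringlan
A:pim@nokia-leaf# set type mac-vrf
A:pim@nokia-leaf# set admin-state enable
A:pim@nokia-leaf# set interface ethernet-1/9.0
A:pim@nokia-leaf# set vxlan-interface vxlan1.2604
A:pim@nokia-leaf# set protocols bgp-evpn bgp-instance 1 admin-state enable
A:pim@nokia-leaf# set protocols bgp-evpn bgp-instance 1 vxlan-interface vxlan1.2604
A:pim@nokia-leaf# set protocols bgp-evpn bgp-instance 1 evi 2604
A:pim@nokia-leaf# set protocols bgp-vpn bgp-instance 1 route-distinguisher rd 65500:2604
A:pim@nokia-leaf# set protocols bgp-vpn bgp-instance 1 route-target export-rt target:65500:2604
A:pim@nokia-leaf# set protocols bgp-vpn bgp-instance 1 route-target import-rt target:65500:2604
A:pim@nokia-leaf# commit stay
```
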
#### VXLAN EVPN: Arista

At this point I'm feeling pretty bullish about the whole project. Arista does not make it very
difficult for me to configure it for L2 EVPN (which is called MAC-VRF here also):

```
arista-leaf#conf t
vlan 2604
   name v-peeringlan
interface Ethernet9/3
   speed forced 10000full
   switchport access vlan 2604

interface Loopback1
   ip address 198.19.18.2/32
interface Vxlan1
   vxlan source-interface Loopback1
   vxlan udp-port 4789
   vxlan vlan 2604 vni 2604
```

After creating VLAN 2604 and making port Eth9/3 an access port in that VLAN, I'll add a VTEP endpoint
called `Loopback1`, and a VXLAN interface that uses that to source its traffic. Here, I'll associate
local VLAN 2604 with the `Vxlan1` and its VNI 2604, to match up with how I configured the Nokias
previously.

Finally, it's a matter of tying these together by announcing the MAC addresses into the EVPN iBGP
sessions:

```
arista-leaf#conf t
router bgp 65500
   vlan 2604
      rd 65500:2604
      route-target both 65500:2604
      redistribute learned
   !
```

### Results

To validate the configurations, I learn a cool trick from my buddy Andy on the SR Linux discord
server. In EOS, I can ask it to check for any obvious mistakes in two places:

```
arista-leaf#show vxlan config-sanity detail
Category                            Result   Detail
----------------------------------  -------- --------------------------------------------------
Local VTEP Configuration Check      OK
  Loopback IP Address               OK
  VLAN-VNI Map                      OK
  Flood List                        OK
  Routing                           OK
  VNI VRF ACL                       OK
  Decap VRF-VNI Map                 OK
  VRF-VNI Dynamic VLAN              OK
Remote VTEP Configuration Check     OK
  Remote VTEP                       OK
Platform Dependent Check            OK
  VXLAN Bridging                    OK
  VXLAN Routing                     OK       VXLAN Routing not enabled
CVX Configuration Check             OK
  CVX Server                        OK       Not in controller client mode
MLAG Configuration Check            OK       Run 'show mlag config-sanity' to verify MLAG config
  Peer VTEP IP                      OK       MLAG peer is not connected
  MLAG VTEP IP                      OK
  Peer VLAN-VNI                     OK
  Virtual VTEP IP                   OK
  MLAG Inactive State               OK

arista-leaf#show bgp evpn sanity detail
Category Check                Status Detail
-------- -------------------- ------ ------
General  Send community       OK
General  Multi-agent mode     OK
General  Neighbor established OK
L2       MAC-VRF route-target OK
         import and export
L2       MAC-VRF              OK
         route-distinguisher
L2       MAC-VRF redistribute OK
L2       MAC-VRF overlapping  OK
         VLAN
L2       Suppressed MAC       OK
VXLAN    VLAN to VNI map for  OK
         MAC-VRF
VXLAN    VRF to VNI map for   OK
         IP-VRF
```

#### Results: Arista view

Inspecting the MAC addresses learned from all four of the client ports on the Debian machine is
easy:

```
arista-leaf#show bgp evpn summary
BGP summary information for VRF default
Router identifier 198.19.16.2, local AS number 65500
Neighbor Status Codes: m - Under maintenance
  Neighbor    V AS    MsgRcvd MsgSent InQ OutQ Up/Down  State  PfxRcd PfxAcc
  198.19.16.0 4 65500    3311    3867   0    0 18:06:28 Estab       7      7
  198.19.16.1 4 65500    3308    3873   0    0 18:06:28 Estab       7      7

arista-leaf#show bgp evpn vni 2604 next-hop 198.19.18.3
BGP routing table information for VRF default
Router identifier 198.19.16.2, local AS number 65500
Route status codes: * - valid, > - active, S - Stale, E - ECMP head, e - ECMP
                    c - Contributing to ECMP, % - Pending BGP convergence
Origin codes: i - IGP, e - EGP, ? - incomplete
AS Path Attributes: Or-ID - Originator ID, C-LST - Cluster List, LL Nexthop - Link Local Nexthop

          Network                Next Hop            Metric  LocPref Weight  Path
 * >Ec    RD: 65500:2604 mac-ip e43a.6e5f.0c59
                                 198.19.18.3         -       100     0       i Or-ID: 198.19.16.3 C-LST: 198.19.16.1
 *  ec    RD: 65500:2604 mac-ip e43a.6e5f.0c59
                                 198.19.18.3         -       100     0       i Or-ID: 198.19.16.3 C-LST: 198.19.16.0
 * >Ec    RD: 65500:2604 imet 198.19.18.3
                                 198.19.18.3         -       100     0       i Or-ID: 198.19.16.3 C-LST: 198.19.16.1
 *  ec    RD: 65500:2604 imet 198.19.18.3
                                 198.19.18.3         -       100     0       i Or-ID: 198.19.16.3 C-LST: 198.19.16.0
```

There's a lot to unpack here! The Arista is seeing that, for the _route-distinguisher_ I configured
on all the sessions, it is learning one MAC address on neighbor 198.19.18.3 (this is the VTEP for
the _nokia-leaf_ router) from both iBGP sessions. The MAC address is learned from originator
198.19.16.3 (the loopback of the _nokia-leaf_ router), from two cluster members: the active one on
iBGP speaker 198.19.16.1 (_nikhef_) and a backup member on 198.19.16.0 (_equinix_).

I can also see that there's a bunch of `imet` route entries, and Andy explained these to me. They are
a signal from a VTEP participant that they are interested in seeing multicast traffic (like neighbor
discovery or ARP requests) flooded to them. Every router participating in this L2VPN will raise such
an `imet` route, which I'll see in duplicates as well (one from each iBGP session). This checks out.

#### Results: SR Linux view
|
||||
|
||||
The Nokia IXR-7220-D4 router called _equinix_ has also learned a bunch of EVPN routing entries,
|
||||
which I can inspect as follows:
|
||||
|
||||
```
|
||||
A:pim@equinix# show network-instance default protocols bgp routes evpn route-type summary
|
||||
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
|
||||
Show report for the BGP route table of network-instance "default"
|
||||
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
|
||||
Status codes: u=used, *=valid, >=best, x=stale, b=backup
|
||||
Origin codes: i=IGP, e=EGP, ?=incomplete
|
||||
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
|
||||
BGP Router ID: 198.19.16.0 AS: 65500 Local AS: 65500
|
||||
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
|
||||
Type 2 MAC-IP Advertisement Routes
|
||||
+--------+---------------+--------+-------------------+------------+-------------+------+-------------+--------+--------------------------------+------------------+
|
||||
| Status | Route- | Tag-ID | MAC-address | IP-address | neighbor | Path-| Next-Hop | Label | ESI | MAC Mobility |
|
||||
| | distinguisher | | | | | id | | | | |
|
||||
+========+===============+========+===================+============+=============+======+=============+========+================================+==================+
|
||||
| u*> | 65500:2604 | 0 | E4:3A:6E:5F:0C:57 | 0.0.0.0 | 198.19.16.1 | 0 | 198.19.18.1 | 2604 | 00:00:00:00:00:00:00:00:00:00 | - |
|
||||
| * | 65500:2604 | 0 | E4:3A:6E:5F:0C:58 | 0.0.0.0 | 198.19.16.1 | 0 | 198.19.18.2 | 2604 | 00:00:00:00:00:00:00:00:00:00 | - |
|
||||
| u*> | 65500:2604 | 0 | E4:3A:6E:5F:0C:58 | 0.0.0.0 | 198.19.16.2 | 0 | 198.19.18.2 | 2604 | 00:00:00:00:00:00:00:00:00:00 | - |
|
||||
| * | 65500:2604 | 0 | E4:3A:6E:5F:0C:59 | 0.0.0.0 | 198.19.16.1 | 0 | 198.19.18.3 | 2604 | 00:00:00:00:00:00:00:00:00:00 | - |
|
||||
| u*> | 65500:2604 | 0 | E4:3A:6E:5F:0C:59 | 0.0.0.0 | 198.19.16.3 | 0 | 198.19.18.3 | 2604 | 00:00:00:00:00:00:00:00:00:00 | - |
|
||||
+--------+---------------+--------+-------------------+------------+-------------+------+-------------+--------+--------------------------------+------------------+
|
||||
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
|
||||
Type 3 Inclusive Multicast Ethernet Tag Routes
|
||||
+--------+-----------------------------+--------+---------------------+-----------------+--------+-----------------------+
|
||||
| Status | Route-distinguisher | Tag-ID | Originator-IP | neighbor | Path- | Next-Hop |
|
||||
| | | | | | id | |
|
||||
+========+=============================+========+=====================+=================+========+=======================+
|
||||
| u*> | 65500:2604 | 0 | 198.19.18.1 | 198.19.16.1 | 0 | 198.19.18.1 |
|
||||
| * | 65500:2604 | 0 | 198.19.18.2 | 198.19.16.1 | 0 | 198.19.18.2 |
|
||||
| u*> | 65500:2604 | 0 | 198.19.18.2 | 198.19.16.2 | 0 | 198.19.18.2 |
|
||||
| * | 65500:2604 | 0 | 198.19.18.3 | 198.19.16.1 | 0 | 198.19.18.3 |
|
||||
| u*> | 65500:2604 | 0 | 198.19.18.3 | 198.19.16.3 | 0 | 198.19.18.3 |
|
||||
+--------+-----------------------------+--------+---------------------+-----------------+--------+-----------------------+
|
||||
--------------------------------------------------------------------------------------------------------------------------
|
||||
0 Ethernet Auto-Discovery routes 0 used, 0 valid
|
||||
5 MAC-IP Advertisement routes 3 used, 5 valid
|
||||
5 Inclusive Multicast Ethernet Tag routes 3 used, 5 valid
|
||||
0 Ethernet Segment routes 0 used, 0 valid
|
||||
0 IP Prefix routes 0 used, 0 valid
|
||||
0 Selective Multicast Ethernet Tag routes 0 used, 0 valid
|
||||
0 Selective Multicast Membership Report Sync routes 0 used, 0 valid
|
||||
0 Selective Multicast Leave Sync routes 0 used, 0 valid
|
||||
--------------------------------------------------------------------------------------------------------------------------
|
||||
```
|
||||
|
||||
I have to say, SR Linux output is incredibly verbose! But, I can see all the relevant bits and bobs
|
||||
here. Each MAC-IP entry is accounted for, I can see several nexthops pointing at the nikhef switch,
|
||||
one pointing at the nokia-leaf router and one pointing at the Arista switch. I also see the `imet`
|
||||
entries. One thing to note -- the SR Linux implementation renders the empty IP field of type-2
|
||||
routes as a 0.0.0.0 IPv4 address, while the Arista (in my opinion, more correctly) leaves it as NULL
|
||||
(unspecified). But, everything looks great!
|
||||
|
||||
#### Results: Debian view
|
||||
|
||||
There's one more thing to show, and that's kind of the 'proof is in the pudding' moment. As I said,
|
||||
Arend hooked up a Debian machine with an Intel X710-DA4 network card, which sports 4x10G SFP+
|
||||
connections. This network card is a regular in my AS8298 network, as it has excellent DPDK support
|
||||
and can easily pump 40Mpps with VPP. IPng 🥰 Intel X710!
|
||||
|
||||
```
|
||||
root@debian:~ # ip netns add nikhef
|
||||
root@debian:~ # ip link set enp1s0f0 netns nikhef
|
||||
root@debian:~ # ip netns exec nikhef ip link set enp1s0f0 up mtu 9000
|
||||
root@debian:~ # ip netns exec nikhef ip addr add 192.0.2.10/24 dev enp1s0f0
|
||||
root@debian:~ # ip netns exec nikhef ip addr add 2001:db8::10/64 dev enp1s0f0
|
||||
|
||||
root@debian:~ # ip netns add arista-leaf
|
||||
root@debian:~ # ip link set enp1s0f1 netns arista-leaf
|
||||
root@debian:~ # ip netns exec arista-leaf ip link set enp1s0f1 up mtu 9000
|
||||
root@debian:~ # ip netns exec arista-leaf ip addr add 192.0.2.11/24 dev enp1s0f1
|
||||
root@debian:~ # ip netns exec arista-leaf ip addr add 2001:db8::11/64 dev enp1s0f1
|
||||
|
||||
root@debian:~ # ip netns add nokia-leaf
|
||||
root@debian:~ # ip link set enp1s0f2 netns nokia-leaf
|
||||
root@debian:~ # ip netns exec nokia-leaf ip link set enp1s0f2 up mtu 9000
|
||||
root@debian:~ # ip netns exec nokia-leaf ip addr add 192.0.2.12/24 dev enp1s0f2
|
||||
root@debian:~ # ip netns exec nokia-leaf ip addr add 2001:db8::12/64 dev enp1s0f2
|
||||
|
||||
root@debian:~ # ip netns add equinix
|
||||
root@debian:~ # ip link set enp1s0f3 netns equinix
|
||||
root@debian:~ # ip netns exec equinix ip link set enp1s0f3 up mtu 9000
|
||||
root@debian:~ # ip netns exec equinix ip addr add 192.0.2.13/24 dev enp1s0f3
|
||||
root@debian:~ # ip netns exec equinix ip addr add 2001:db8::13/64 dev enp1s0f3
|
||||
|
||||
root@debian:~# ip netns exec nikhef fping -g 192.0.2.8/29
|
||||
192.0.2.10 is alive
|
||||
192.0.2.11 is alive
|
||||
192.0.2.12 is alive
|
||||
192.0.2.13 is alive
|
||||
|
||||
root@debian:~# ip netns exec arista-leaf fping 2001:db8::10 2001:db8::11 2001:db8::12 2001:db8::13
|
||||
2001:db8::10 is alive
|
||||
2001:db8::11 is alive
|
||||
2001:db8::12 is alive
|
||||
2001:db8::13 is alive
|
||||
|
||||
root@debian:~# ip netns exec equinix ip nei
|
||||
192.0.2.10 dev enp1s0f3 lladdr e4:3a:6e:5f:0c:57 STALE
|
||||
192.0.2.11 dev enp1s0f3 lladdr e4:3a:6e:5f:0c:58 STALE
|
||||
192.0.2.12 dev enp1s0f3 lladdr e4:3a:6e:5f:0c:59 STALE
|
||||
fe80::e63a:6eff:fe5f:c57 dev enp1s0f3 lladdr e4:3a:6e:5f:0c:57 STALE
|
||||
fe80::e63a:6eff:fe5f:c58 dev enp1s0f3 lladdr e4:3a:6e:5f:0c:58 STALE
|
||||
fe80::e63a:6eff:fe5f:c59 dev enp1s0f3 lladdr e4:3a:6e:5f:0c:59 STALE
|
||||
2001:db8::10 dev enp1s0f3 lladdr e4:3a:6e:5f:0c:57 STALE
|
||||
2001:db8::11 dev enp1s0f3 lladdr e4:3a:6e:5f:0c:58 STALE
|
||||
2001:db8::12 dev enp1s0f3 lladdr e4:3a:6e:5f:0c:59 STALE
|
||||
```
|
||||
|
||||
The Debian machine puts each network card into its own network namespace, and gives them both an IPv4
|
||||
and an IPv6 address. I can then enter the `nikhef` network namespace, which has its NIC connected to
|
||||
the IXR-7220-D4 router called _nikhef_, and ping all four endpoints. Similarly, I can enter the
|
||||
`arista-leaf` namespace and ping6 all four endpoints. Finally, I take a look at the IPv6 and IPv4
|
||||
neighbor table on the network card that is connected to the _equinix_ router. All three MAC addresses are
|
||||
seen. This proves end-to-end connectivity across the EVPN VXLAN, and full interoperability. Booyah!
|
||||
|
||||
Performance? We got that! I'm not worried as these Nokia routers are rated for 12.8Tbps of VXLAN....
|
||||
```
|
||||
root@debian:~# ip netns exec equinix iperf3 -c 192.0.2.12
|
||||
Connecting to host 192.0.2.12, port 5201
|
||||
[ 5] local 192.0.2.10 port 34598 connected to 192.0.2.12 port 5201
|
||||
[ ID] Interval Transfer Bitrate Retr Cwnd
|
||||
[ 5] 0.00-1.00 sec 1.15 GBytes 9.91 Gbits/sec 19 1.52 MBytes
|
||||
[ 5] 1.00-2.00 sec 1.15 GBytes 9.90 Gbits/sec 3 1.54 MBytes
|
||||
[ 5] 2.00-3.00 sec 1.15 GBytes 9.90 Gbits/sec 1 1.54 MBytes
|
||||
[ 5] 3.00-4.00 sec 1.15 GBytes 9.90 Gbits/sec 1 1.54 MBytes
|
||||
[ 5] 4.00-5.00 sec 1.15 GBytes 9.90 Gbits/sec 0 1.54 MBytes
|
||||
[ 5] 5.00-6.00 sec 1.15 GBytes 9.90 Gbits/sec 0 1.54 MBytes
|
||||
[ 5] 6.00-7.00 sec 1.15 GBytes 9.90 Gbits/sec 0 1.54 MBytes
|
||||
[ 5] 7.00-8.00 sec 1.15 GBytes 9.90 Gbits/sec 0 1.54 MBytes
|
||||
[ 5] 8.00-9.00 sec 1.15 GBytes 9.90 Gbits/sec 0 1.54 MBytes
|
||||
[ 5] 9.00-10.00 sec 1.15 GBytes 9.90 Gbits/sec 0 1.54 MBytes
|
||||
- - - - - - - - - - - - - - - - - - - - - - - - -
|
||||
[ ID] Interval Transfer Bitrate Retr
|
||||
[ 5] 0.00-10.00 sec 11.5 GBytes 9.90 Gbits/sec 24 sender
|
||||
[ 5] 0.00-10.00 sec 11.5 GBytes 9.90 Gbits/sec receiver
|
||||
|
||||
iperf Done.
|
||||
```
|
||||
|
||||
## What's Next
|
||||
|
||||
There's a few improvements I can make before deploying this architecture to the internet exchange.
|
||||
Notably:
|
||||
* the functional equivalent of _port security_, that is to say only allowing one or two MAC
|
||||
addresses per member port. FrysIX has a strict one-port-one-member-one-MAC rule, and having port
|
||||
security will greatly improve our resilience.
|
||||
* SR Linux has the ability to suppress ARP, _even on L2 MAC-VRF_! It's relatively well known for
|
||||
IRB based setups, but adding this to transparent bridge-domains is possible in Nokia
|
||||
[[ref](https://documentation.nokia.com/srlinux/22-6/SR_Linux_Book_Files/EVPN-VXLAN_Guide/services-evpn-vxlan-l2.html#configuring_evpn_learning_for_proxy_arp)],
|
||||
using the syntax of `protocols bgp-evpn bgp-instance 1 routes bridge-table mac-ip advertise
|
||||
true`. This will glean the IP addresses based on intercepted ARP requests, and reduce the need for
|
||||
BUM flooding.
|
||||
* Andy informs me that Arista also has this feature. By setting `router l2-vpn` and `arp learning bridged`,
|
||||
the suppression of ARP requests/replies also works in the same way. This greatly reduces cross-router
|
||||
BUM flooding. If DE-CIX can do it, so can FrysIX :)
|
||||
* some automation - although configuring the MAC-VRF across Arista and SR Linux is definitely not
|
||||
as difficult as I thought, having some automation in place will avoid errors and mistakes. It
|
||||
would suck if the IXP collapsed because I botched a link drain or PNI configuration!
|
||||
|
||||
### Acknowledgements
|
||||
|
||||
I am relatively new to EVPN configurations, and wanted to give a shoutout to Andy Whitaker who
|
||||
jumped in very quickly when I asked a question on the SR Linux Discord. He was gracious with his
|
||||
time and spent a few hours on a video call with me, explaining EVPN in great detail both for Arista
|
||||
as well as SR Linux. In particular, I want to give a big "Thank you!" for helping me understand
|
||||
symmetric and asymmetric IRB in the context of multivendor EVPN. Andy is about to start a new job at
|
||||
Nokia, and I wish him all the best. To my friends at Nokia: you caught a good one, Andy is pure
|
||||
gold!
|
||||
|
||||
I also want to thank Niek for helping me take my first baby steps onto this platform and patiently
|
||||
answering my nerdy questions about the platform, the switch chip, and the configuration philosophy.
|
||||
Learning a new NOS is always a fun task, and it was made super fun because Niek spent an hour with
|
||||
Arend and me on a video call, giving a bunch of operational tips and tricks along the way.
|
||||
|
||||
Finally, Arend and ERITAP are an absolute joy to work with. We took turns hacking on the lab, which
|
||||
Arend made available for me while I am traveling to Mississippi this week. Thanks for the kWh and
|
||||
OOB access, and for brainstorming the config with me!
|
||||
|
||||
### Reference configurations
|
||||
|
||||
Here's the configs for all machines in this demonstration:
|
||||
[[nikhef](/assets/frys-ix/nikhef.conf)] | [[equinix](/assets/frys-ix/equinix.conf)] | [[nokia-leaf](/assets/frys-ix/nokia-leaf.conf)] | [[arista-leaf](/assets/frys-ix/arista-leaf.conf)]
|
464
content/articles/2025-05-03-containerlab-1.md
Normal file
464
content/articles/2025-05-03-containerlab-1.md
Normal file
@ -0,0 +1,464 @@
|
||||
---
|
||||
date: "2025-05-03T15:07:23Z"
|
||||
title: 'VPP in Containerlab - Part 1'
|
||||
---
|
||||
|
||||
{{< image float="right" src="/assets/containerlab/containerlab.svg" alt="Containerlab Logo" width="12em" >}}
|
||||
|
||||
# Introduction
|
||||
|
||||
From time to time the subject of containerized VPP instances comes up. At IPng, I run the routers in
|
||||
AS8298 on bare metal (Supermicro and Dell hardware), as it allows me to maximize performance.
|
||||
However, VPP is quite friendly in virtualization. Notably, it runs really well on virtual machines
|
||||
like Qemu/KVM or VMWare. I can pass through PCI devices directly to the guest, and use CPU pinning to
|
||||
allow the guest virtual machine access to the underlying physical hardware. In such a mode, VPP
|
||||
performs almost the same as on bare metal. But did you know that VPP can also run in Docker?
|
||||
|
||||
The other day I joined the [[ZANOG'25](https://nog.net.za/event1/zanog25/)] in Durban, South Africa.
|
||||
One of the presenters was Nardus le Roux of Nokia, and he showed off a project called
|
||||
[[Containerlab](https://containerlab.dev/)], which provides a CLI for orchestrating and managing
|
||||
container-based networking labs. It starts the containers, builds virtual wiring between them to
|
||||
create lab topologies of the user's choice, and manages the lab lifecycle.
|
||||
|
||||
Quite regularly I am asked 'when will you add VPP to Containerlab?', but at ZANOG I made a promise
|
||||
to actually add it. Here I go, on a journey to integrate VPP into Containerlab!
|
||||
|
||||
## Containerized VPP
|
||||
|
||||
The folks at [[Tigera](https://www.tigera.io/project-calico/)] maintain a project called _Calico_,
|
||||
which accelerates Kubernetes CNI (Container Network Interface) by using [[FD.io](https://fd.io)]
|
||||
VPP. Since the origins of Kubernetes are to run containers in a Docker environment, it stands to
|
||||
reason that it should be possible to run a containerized VPP. I start by reading up on how they
|
||||
create their Docker image, and I learn a lot.
|
||||
|
||||
### Docker Build
|
||||
|
||||
Considering IPng runs bare metal Debian (currently Bookworm) machines, my Docker image will be based
|
||||
on `debian:bookworm` as well. The build starts off quite modest:
|
||||
|
||||
```
|
||||
pim@summer:~$ mkdir -p src/vpp-containerlab
|
||||
pim@summer:~/src/vpp-containerlab$ cat << EOF > Dockerfile.bookworm
|
||||
FROM debian:bookworm
|
||||
ARG DEBIAN_FRONTEND=noninteractive
|
||||
ARG VPP_INSTALL_SKIP_SYSCTL=true
|
||||
ARG REPO=release
|
||||
RUN apt-get update && apt-get -y install curl procps && apt-get clean
|
||||
|
||||
# Install VPP
|
||||
RUN curl -s https://packagecloud.io/install/repositories/fdio/${REPO}/script.deb.sh | bash
|
||||
RUN apt-get update && apt-get -y install vpp vpp-plugin-core && apt-get clean
|
||||
|
||||
CMD ["/usr/bin/vpp","-c","/etc/vpp/startup.conf"]
|
||||
EOF
|
||||
pim@summer:~/src/vpp-containerlab$ docker build -f Dockerfile.bookworm . -t pimvanpelt/vpp-containerlab
|
||||
```
|
||||
|
||||
One gotcha - when I install the upstream VPP Debian packages, they generate a `sysctl` file which the
|
||||
postinst script then tries to apply. However, I can't set sysctls in the container, so the build fails. I take a look
|
||||
at the VPP source code and find `src/pkg/debian/vpp.postinst` which helpfully contains a means to
|
||||
override setting the sysctls, using an environment variable called `VPP_INSTALL_SKIP_SYSCTL`.
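For reference, the guard in that postinst works along these lines (a paraphrased sketch of the
idea, not the verbatim upstream script; the sysctl file name is my assumption):

```
# Paraphrased sketch of the vpp.postinst guard; file name assumed.
# Only apply the shipped sysctl settings when the override is unset.
if [ -z "$VPP_INSTALL_SKIP_SYSCTL" ]; then
    sysctl -p /etc/sysctl.d/80-vpp.conf
fi
```

With `ARG VPP_INSTALL_SKIP_SYSCTL=true` in the Dockerfile above, the build sails right past it.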
|
||||
|
||||
### Running VPP in Docker
|
||||
|
||||
With the Docker image built, I need to tweak the VPP startup configuration a little bit, to allow it
|
||||
to run well in a Docker environment. There are a few things I make note of:
|
||||
1. We may not have huge pages on the host machine, so I'll set all the page sizes to the
|
||||
linux-default 4kB rather than 2MB or 1GB hugepages. This creates a performance regression, but
|
||||
in the case of Containerlab, we're not here to build high performance stuff, but rather users
|
||||
will be doing functional testing.
|
||||
1. DPDK requires either UIO or VFIO kernel drivers, so that it can bind its so-called _poll mode
|
||||
driver_ to the network cards. It also requires huge pages. Since my first version will be
|
||||
using only virtual ethernet interfaces, I'll disable DPDK and VFIO altogether.
|
||||
1. VPP can run any number of CPU worker threads. In its simplest form, I can also run it with only
|
||||
one thread. Of course, this will not be a high performance setup, but since I'm already not
|
||||
using hugepages, I'll use only 1 thread.
|
||||
|
||||
The VPP `startup.conf` configuration file I came up with:
|
||||
|
||||
```
|
||||
pim@summer:~/src/vpp-containerlab$ cat << EOF > clab-startup.conf
|
||||
unix {
|
||||
interactive
|
||||
log /var/log/vpp/vpp.log
|
||||
full-coredump
|
||||
cli-listen /run/vpp/cli.sock
|
||||
cli-prompt vpp-clab#
|
||||
cli-no-pager
|
||||
poll-sleep-usec 100
|
||||
}
|
||||
|
||||
api-trace {
|
||||
on
|
||||
}
|
||||
|
||||
memory {
|
||||
main-heap-size 512M
|
||||
main-heap-page-size 4k
|
||||
}
|
||||
buffers {
|
||||
buffers-per-numa 16000
|
||||
default data-size 2048
|
||||
page-size 4k
|
||||
}
|
||||
|
||||
statseg {
|
||||
size 64M
|
||||
page-size 4k
|
||||
per-node-counters on
|
||||
}
|
||||
|
||||
plugins {
|
||||
plugin default { enable }
|
||||
plugin dpdk_plugin.so { disable }
|
||||
}
|
||||
EOF
|
||||
```
|
||||
|
||||
Just a couple of notes for those who are running VPP in production. Each of the `*-page-size` config
|
||||
settings take the normal Linux pagesize of 4kB, which effectively keeps VPP from using any
|
||||
hugepages. Then, I'll specifically disable the DPDK plugin, although I didn't install it in the
|
||||
Dockerfile build, as it lives in its own dedicated Debian package called `vpp-plugin-dpdk`. Finally,
|
||||
I'll make VPP use less CPU by telling it to sleep for 100 microseconds between each poll iteration.
|
||||
In production environments, VPP will use 100% of the CPUs it's assigned, but in this lab, it will
|
||||
not be quite as hungry. By the way, even in this sleepy mode, it'll still easily handle a gigabit
|
||||
of traffic!
|
||||
|
||||
Now, VPP wants to run as root and it needs a few host features, notably tuntap devices and vhost,
|
||||
and a few capabilities, notably NET_ADMIN, SYS_NICE and SYS_PTRACE. I take a look at the
|
||||
[[manpage](https://man7.org/linux/man-pages/man7/capabilities.7.html)]:
|
||||
* ***CAP_SYS_NICE***: allows to set real-time scheduling, CPU affinity, I/O scheduling class, and
|
||||
to migrate and move memory pages.
|
||||
* ***CAP_NET_ADMIN***: allows to perform various network-related operations like interface
|
||||
configs, routing tables, nested network namespaces, multicast, set promiscuous mode, and so on.
|
||||
* ***CAP_SYS_PTRACE***: allows to trace arbitrary processes using `ptrace(2)`, and a few related
|
||||
kernel system calls.
|
||||
|
||||
Being a networking dataplane implementation, VPP wants to be able to tinker with network devices.
|
||||
This is not typically allowed in Docker containers, although the Docker developers did make some
|
||||
concessions for those containers that need just that little bit more access. They described it in
|
||||
their
|
||||
[[docs](https://docs.docker.com/engine/containers/run/#runtime-privilege-and-linux-capabilities)] as
|
||||
follows:
|
||||
|
||||
| The --privileged flag gives all capabilities to the container. When the operator executes docker
|
||||
| run --privileged, Docker enables access to all devices on the host, and reconfigures AppArmor or
|
||||
| SELinux to allow the container nearly all the same access to the host as processes running outside
|
||||
| containers on the host. Use this flag with caution. For more information about the --privileged
|
||||
| flag, see the docker run reference.
|
||||
|
||||
{{< image width="4em" float="left" src="/assets/shared/warning.png" alt="Warning" >}}
|
||||
At this moment, I feel I should point out that running a Docker container with the `--privileged` flag
|
||||
set does give it _a lot_ of privileges. A container with `--privileged` is not a securely sandboxed
|
||||
process. Containers in this mode can get a root shell on the host and take control over the system.
|
||||
|
||||
With that little fineprint warning out of the way, I am going to Yolo like a boss:
|
||||
|
||||
```
|
||||
pim@summer:~/src/vpp-containerlab$ docker run --name clab-pim \
|
||||
--cap-add=NET_ADMIN --cap-add=SYS_NICE --cap-add=SYS_PTRACE \
|
||||
--device=/dev/net/tun:/dev/net/tun --device=/dev/vhost-net:/dev/vhost-net \
|
||||
--privileged -v $(pwd)/clab-startup.conf:/etc/vpp/startup.conf:ro \
|
||||
docker.io/pimvanpelt/vpp-containerlab
|
||||
clab-pim
|
||||
```
|
||||
|
||||
### Configuring VPP in Docker
|
||||
|
||||
And with that, the Docker container is running! I post a screenshot on
|
||||
[[Mastodon](https://ublog.tech/@IPngNetworks/114392852468494211)] and my buddy John responds with a
|
||||
polite but firm insistence that I explain myself. Here you go, buddy :)
|
||||
|
||||
In another terminal, I can play around with this VPP instance a little bit:
|
||||
```
|
||||
pim@summer:~$ docker exec -it clab-pim bash
|
||||
root@d57c3716eee9:/# ip -br l
|
||||
lo UNKNOWN 00:00:00:00:00:00 <LOOPBACK,UP,LOWER_UP>
|
||||
eth0@if530566 UP 02:42:ac:11:00:02 <BROADCAST,MULTICAST,UP,LOWER_UP>
|
||||
|
||||
root@d57c3716eee9:/# ps auxw
|
||||
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
|
||||
root 1 2.2 0.2 17498852 160300 ? Rs 15:11 0:00 /usr/bin/vpp -c /etc/vpp/startup.conf
|
||||
root 10 0.0 0.0 4192 3388 pts/0 Ss 15:11 0:00 bash
|
||||
root 18 0.0 0.0 8104 4056 pts/0 R+ 15:12 0:00 ps auxw
|
||||
|
||||
root@d57c3716eee9:/# vppctl
|
||||
_______ _ _ _____ ___
|
||||
__/ __/ _ \ (_)__ | | / / _ \/ _ \
|
||||
_/ _// // / / / _ \ | |/ / ___/ ___/
|
||||
/_/ /____(_)_/\___/ |___/_/ /_/
|
||||
|
||||
vpp-clab# show version
|
||||
vpp v25.02-release built by root on d5cd2c304b7f at 2025-02-26T13:58:32
|
||||
vpp-clab# show interfaces
|
||||
Name Idx State MTU (L3/IP4/IP6/MPLS) Counter Count
|
||||
local0 0 down 0/0/0/0
|
||||
```
|
||||
|
||||
Slick! I can see that the container has an `eth0` device, which Docker has connected to the main
|
||||
bridged network. For now, there's only one process running, pid 1 proudly shows VPP (as in Docker,
|
||||
the `CMD` field simply replaces `init`). Later on, I can imagine running a few more daemons like
|
||||
SSH and so on, but for now, I'm happy.
|
||||
|
||||
Looking at VPP itself, it has no network interfaces yet, except for the default `local0` interface.
|
||||
|
||||
### Adding Interfaces in Docker
|
||||
|
||||
But if I don't have DPDK, how will I add interfaces? Enter `veth(4)`. From the
|
||||
[[manpage](https://man7.org/linux/man-pages/man4/veth.4.html)], I learn that veth devices are
|
||||
virtual Ethernet devices. They can act as tunnels between network namespaces to create a bridge to
|
||||
a physical network device in another namespace, but can also be used as standalone network devices.
|
||||
veth devices are always created in interconnected pairs.
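To make that concrete, here's what creating such a pair by hand looks like (a standalone
illustration with made-up names, separate from the Docker setup below):

```
# Create an interconnected veth pair and bring both ends up; frames
# sent into veth-a pop out of veth-b, and vice versa.
ip link add veth-a type veth peer name veth-b
ip link set veth-a up
ip link set veth-b up
```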
|
||||
|
||||
Of course, Docker users will recognize this. It's like bread and butter for containers to
|
||||
communicate with one another - and with the host they're running on. I can simply create a Docker
|
||||
network and attach one half of it to a running container, like so:
|
||||
|
||||
```
|
||||
pim@summer:~$ docker network create --driver=bridge clab-network \
|
||||
--subnet 192.0.2.0/24 --ipv6 --subnet 2001:db8::/64
|
||||
5711b95c6c32ac0ed185a54f39e5af4b499677171ff3d00f99497034e09320d2
|
||||
pim@summer:~$ docker network connect clab-network clab-pim --ip '' --ip6 ''
|
||||
```
|
||||
|
||||
The first command here creates a new network called `clab-network` in Docker. As a result, a new
|
||||
bridge called `br-5711b95c6c32` shows up on the host. The bridge name is chosen from the UUID of the
|
||||
Docker object. Seeing as I added an IPv4 and IPv6 subnet to the bridge, it gets configured with the
|
||||
first address in both:
|
||||
|
||||
```
|
||||
pim@summer:~/src/vpp-containerlab$ brctl show br-5711b95c6c32
|
||||
bridge name bridge id STP enabled interfaces
|
||||
br-5711b95c6c32 8000.0242099728c6 no veth021e363
|
||||
|
||||
|
||||
pim@summer:~/src/vpp-containerlab$ ip -br a show dev br-5711b95c6c32
|
||||
br-5711b95c6c32 UP 192.0.2.1/24 2001:db8::1/64 fe80::42:9ff:fe97:28c6/64 fe80::1/64
|
||||
```
|
||||
|
||||
The second command creates a `veth` pair, and puts one half of it in the bridge, and this interface
|
||||
is called `veth021e363` above. The other half of it pops up as `eth1` in the Docker container:
|
||||
|
||||
```
|
||||
pim@summer:~/src/vpp-containerlab$ docker exec -it clab-pim bash
|
||||
root@d57c3716eee9:/# ip -br l
|
||||
lo UNKNOWN 00:00:00:00:00:00 <LOOPBACK,UP,LOWER_UP>
|
||||
eth0@if530566 UP 02:42:ac:11:00:02 <BROADCAST,MULTICAST,UP,LOWER_UP>
|
||||
eth1@if530577 UP 02:42:c0:00:02:02 <BROADCAST,MULTICAST,UP,LOWER_UP>
|
||||
```
|
||||
|
||||
One of the many awesome features of VPP is its ability to attach to these `veth` devices by means of
|
||||
its `af-packet` driver, by reusing the same MAC address (in this case `02:42:c0:00:02:02`). I first
|
||||
take a look at the linux [[manpage](https://man7.org/linux/man-pages/man7/packet.7.html)] for it,
|
||||
and then read up on the VPP
|
||||
[[documentation](https://fd.io/docs/vpp/v2101/gettingstarted/progressivevpp/interface)] on the
|
||||
topic.
|
||||
|
||||
|
||||
However, my attention is drawn to Docker assigning an IPv4 and IPv6 address to the container:
|
||||
```
|
||||
root@d57c3716eee9:/# ip -br a
|
||||
lo UNKNOWN 127.0.0.1/8 ::1/128
|
||||
eth0@if530566 UP 172.17.0.2/16
|
||||
eth1@if530577 UP 192.0.2.2/24 2001:db8::2/64 fe80::42:c0ff:fe00:202/64
|
||||
root@d57c3716eee9:/# ip addr del 192.0.2.2/24 dev eth1
|
||||
root@d57c3716eee9:/# ip addr del 2001:db8::2/64 dev eth1
|
||||
```
|
||||
|
||||
I decide to remove them from here, as in the end, `eth1` will be owned by VPP so _it_ should be
|
||||
setting the IPv4 and IPv6 addresses. For the life of me, I don't see how I can prevent Docker from
|
||||
assigning IPv4 and IPv6 addresses to this container ... and the
|
||||
[[docs](https://docs.docker.com/engine/network/)] seem to be off as well, as they suggest I can pass
|
||||
a flag `--ipv4=False` but that flag doesn't exist, at least not on my Bookworm Docker variant. I
|
||||
make a mental note to discuss this with the folks in the Containerlab community.
|
||||
|
||||
|
||||
Anyway, armed with this knowledge I can bind the container-side half of the veth pair, called `eth1`, to VPP, like
|
||||
so:
|
||||
|
||||
```
|
||||
root@d57c3716eee9:/# vppctl
|
||||
_______ _ _ _____ ___
|
||||
__/ __/ _ \ (_)__ | | / / _ \/ _ \
|
||||
_/ _// // / / / _ \ | |/ / ___/ ___/
|
||||
/_/ /____(_)_/\___/ |___/_/ /_/
|
||||
|
||||
vpp-clab# create host-interface name eth1 hw-addr 02:42:c0:00:02:02
|
||||
vpp-clab# set interface name host-eth1 eth1
|
||||
vpp-clab# set interface mtu 1500 eth1
|
||||
vpp-clab# set interface ip address eth1 192.0.2.2/24
|
||||
vpp-clab# set interface ip address eth1 2001:db8::2/64
|
||||
vpp-clab# set interface state eth1 up
|
||||
vpp-clab# show int addr
|
||||
eth1 (up):
|
||||
L3 192.0.2.2/24
|
||||
L3 2001:db8::2/64
|
||||
local0 (dn):
|
||||
```
|
||||
|
||||
## Results
|
||||
|
||||
After all this work, I've successfully created a Docker image based on Debian Bookworm and VPP 25.02
|
||||
(the current stable release version), started a container with it, added a network bridge in Docker,
|
||||
which binds the host `summer` to the container. Proof, as they say, is in the ping-pudding:
|
||||
|
||||
```
|
||||
pim@summer:~/src/vpp-containerlab$ ping -c5 2001:db8::2
|
||||
PING 2001:db8::2(2001:db8::2) 56 data bytes
|
||||
64 bytes from 2001:db8::2: icmp_seq=1 ttl=64 time=0.113 ms
|
||||
64 bytes from 2001:db8::2: icmp_seq=2 ttl=64 time=0.056 ms
|
||||
64 bytes from 2001:db8::2: icmp_seq=3 ttl=64 time=0.202 ms
|
||||
64 bytes from 2001:db8::2: icmp_seq=4 ttl=64 time=0.102 ms
|
||||
64 bytes from 2001:db8::2: icmp_seq=5 ttl=64 time=0.100 ms
|
||||
|
||||
--- 2001:db8::2 ping statistics ---
|
||||
5 packets transmitted, 5 received, 0% packet loss, time 4098ms
|
||||
rtt min/avg/max/mdev = 0.056/0.114/0.202/0.047 ms
|
||||
pim@summer:~/src/vpp-containerlab$ ping -c5 192.0.2.2
|
||||
PING 192.0.2.2 (192.0.2.2) 56(84) bytes of data.
|
||||
64 bytes from 192.0.2.2: icmp_seq=1 ttl=64 time=0.043 ms
|
||||
64 bytes from 192.0.2.2: icmp_seq=2 ttl=64 time=0.032 ms
|
||||
64 bytes from 192.0.2.2: icmp_seq=3 ttl=64 time=0.019 ms
|
||||
64 bytes from 192.0.2.2: icmp_seq=4 ttl=64 time=0.041 ms
|
||||
64 bytes from 192.0.2.2: icmp_seq=5 ttl=64 time=0.027 ms
|
||||
|
||||
--- 192.0.2.2 ping statistics ---
|
||||
5 packets transmitted, 5 received, 0% packet loss, time 4063ms
|
||||
rtt min/avg/max/mdev = 0.019/0.032/0.043/0.008 ms
|
||||
```
|
||||
|
||||
And in case that simple ping-test wasn't enough to get you excited, here's a packet trace from VPP
|
||||
itself, while I'm performing this ping:
|
||||
|
||||
```
|
||||
vpp-clab# trace add af-packet-input 100
|
||||
vpp-clab# wait 3
|
||||
vpp-clab# show trace
|
||||
------------------- Start of thread 0 vpp_main -------------------
|
||||
Packet 1
|
||||
|
||||
00:07:03:979275: af-packet-input
|
||||
af_packet: hw_if_index 1 rx-queue 0 next-index 4
|
||||
block 47:
|
||||
address 0x7fbf23b7d000 version 2 seq_num 48 pkt_num 0
|
||||
tpacket3_hdr:
|
||||
status 0x20000001 len 98 snaplen 98 mac 92 net 106
|
||||
sec 0x68164381 nsec 0x258e7659 vlan 0 vlan_tpid 0
|
||||
vnet-hdr:
|
||||
flags 0x00 gso_type 0x00 hdr_len 0
|
||||
gso_size 0 csum_start 0 csum_offset 0
|
||||
00:07:03:979293: ethernet-input
|
||||
IP4: 02:42:09:97:28:c6 -> 02:42:c0:00:02:02
|
||||
00:07:03:979306: ip4-input
|
||||
ICMP: 192.0.2.1 -> 192.0.2.2
|
||||
tos 0x00, ttl 64, length 84, checksum 0x5e92 dscp CS0 ecn NON_ECN
|
||||
fragment id 0x5813, flags DONT_FRAGMENT
|
||||
ICMP echo_request checksum 0xc16 id 21197
|
||||
00:07:03:979315: ip4-lookup
|
||||
fib 0 dpo-idx 9 flow hash: 0x00000000
|
||||
ICMP: 192.0.2.1 -> 192.0.2.2
|
||||
tos 0x00, ttl 64, length 84, checksum 0x5e92 dscp CS0 ecn NON_ECN
|
||||
fragment id 0x5813, flags DONT_FRAGMENT
|
||||
ICMP echo_request checksum 0xc16 id 21197
|
||||
00:07:03:979322: ip4-receive
|
||||
fib:0 adj:9 flow:0x00000000
|
||||
ICMP: 192.0.2.1 -> 192.0.2.2
|
||||
tos 0x00, ttl 64, length 84, checksum 0x5e92 dscp CS0 ecn NON_ECN
|
||||
fragment id 0x5813, flags DONT_FRAGMENT
|
||||
ICMP echo_request checksum 0xc16 id 21197
|
||||
00:07:03:979323: ip4-icmp-input
|
||||
ICMP: 192.0.2.1 -> 192.0.2.2
|
||||
tos 0x00, ttl 64, length 84, checksum 0x5e92 dscp CS0 ecn NON_ECN
|
||||
fragment id 0x5813, flags DONT_FRAGMENT
|
||||
ICMP echo_request checksum 0xc16 id 21197
|
||||
00:07:03:979323: ip4-icmp-echo-request
|
||||
ICMP: 192.0.2.1 -> 192.0.2.2
|
||||
tos 0x00, ttl 64, length 84, checksum 0x5e92 dscp CS0 ecn NON_ECN
|
||||
fragment id 0x5813, flags DONT_FRAGMENT
|
||||
ICMP echo_request checksum 0xc16 id 21197
|
||||
00:07:03:979326: ip4-load-balance
|
||||
fib 0 dpo-idx 5 flow hash: 0x00000000
|
||||
ICMP: 192.0.2.2 -> 192.0.2.1
|
||||
tos 0x00, ttl 64, length 84, checksum 0x88e1 dscp CS0 ecn NON_ECN
|
||||
fragment id 0x2dc4, flags DONT_FRAGMENT
|
||||
ICMP echo_reply checksum 0x1416 id 21197
|
||||
00:07:03:979325: ip4-rewrite
|
||||
tx_sw_if_index 1 dpo-idx 5 : ipv4 via 192.0.2.1 eth1: mtu:1500 next:3 flags:[] 0242099728c60242c00002020800 flow hash: 0x00000000
|
||||
00000000: 0242099728c60242c00002020800450000542dc44000400188e1c0000202c000
|
||||
00000020: 02010000141652cd00018143166800000000399d0900000000001011
|
||||
00:07:03:979326: eth1-output
|
||||
eth1 flags 0x02180005
|
||||
IP4: 02:42:c0:00:02:02 -> 02:42:09:97:28:c6
|
||||
ICMP: 192.0.2.2 -> 192.0.2.1
|
||||
tos 0x00, ttl 64, length 84, checksum 0x88e1 dscp CS0 ecn NON_ECN
|
||||
fragment id 0x2dc4, flags DONT_FRAGMENT
|
||||
ICMP echo_reply checksum 0x1416 id 21197
|
||||
00:07:03:979327: eth1-tx
|
||||
af_packet: hw_if_index 1 tx-queue 0
|
||||
tpacket3_hdr:
|
||||
status 0x1 len 108 snaplen 108 mac 0 net 0
|
||||
sec 0x0 nsec 0x0 vlan 0 vlan_tpid 0
|
||||
vnet-hdr:
|
||||
flags 0x00 gso_type 0x00 hdr_len 0
|
||||
gso_size 0 csum_start 0 csum_offset 0
|
||||
buffer 0xf97c4:
|
||||
current data 0, length 98, buffer-pool 0, ref-count 1, trace handle 0x0
|
||||
local l2-hdr-offset 0 l3-hdr-offset 14
|
||||
IP4: 02:42:c0:00:02:02 -> 02:42:09:97:28:c6
|
||||
ICMP: 192.0.2.2 -> 192.0.2.1
|
||||
tos 0x00, ttl 64, length 84, checksum 0x88e1 dscp CS0 ecn NON_ECN
|
||||
fragment id 0x2dc4, flags DONT_FRAGMENT
|
||||
ICMP echo_reply checksum 0x1416 id 21197
|
||||
```
|
||||
|
||||
Well, that's a mouthful, isn't it! Here, I get to show you VPP in action. After receiving the
|
||||
packet on its `af-packet-input` node from 192.0.2.1 (Summer, who is pinging us) to 192.0.2.2 (the
|
||||
VPP container), the packet traverses the dataplane graph. It goes through `ethernet-input`, then
|
||||
`ip4-input`, which sees it's destined to an IPv4 address configured, so the packet is handed to
|
||||
`ip4-receive`. That one sees that the IP protocol is ICMP, so it hands the packet to
|
||||
`ip4-icmp-input` which notices that the packet is an ICMP echo request, so off to
|
||||
`ip4-icmp-echo-request` our little packet goes. The ICMP plugin in VPP now answers by
|
||||
`ip4-rewrite`'ing the packet, sending the return to 192.0.2.1 at MAC address `02:42:09:97:28:c6`
|
||||
(this is Summer, the host doing the pinging!), after which the newly created ICMP echo-reply is
|
||||
handed to `eth1-output` which marshals it back into the kernel's AF_PACKET interface using
|
||||
`eth1-tx`.
|
||||
|
||||
Boom. I could not be more pleased.
|
||||
|
||||
## What's Next
|
||||
|
||||
This was a nice exercise for me! I'm going this direction because the
|
||||
[[Containerlab](https://containerlab.dev)] framework will start containers with given NOS images,
|
||||
not too dissimilar from the one I just made, and then attach `veth` pairs between the containers.
|
||||
I started dabbling with a [[pull-request](https://github.com/srl-labs/containerlab/pull/2571)], but
|
||||
I got stuck with a part of the Containerlab code that pre-deploys config files into the containers.
|
||||
You see, I will need to generate two files:
|
||||
|
||||
1. A `startup.conf` file that is specific to the containerlab Docker container. I'd like them to
|
||||
each set their own hostname so that the CLI has a unique prompt. I can do this by setting `unix
|
||||
{ cli-prompt {{ .ShortName }}# }` in the template renderer.
|
||||
1. Containerlab will know all of the veth pairs that are planned to be created into each VPP
|
||||
container. I'll need it to then write a little snippet of config that does the `create
|
||||
host-interface` spiel, to attach these `veth` pairs to the VPP dataplane.
|
||||
|
||||
I reached out to Roman from Nokia, who is one of the authors and current maintainer of Containerlab.
|
||||
Roman was keen to help out, and seeing as he knows the Containerlab stuff well, and I know the VPP
|
||||
stuff well, this is a reasonable partnership! Soon, he and I plan to have a bare-bones setup that
|
||||
will connect a few VPP containers together with an SR Linux node in a lab. Stand by!
|
||||
|
||||
Once we have that, there's still quite some work for me to do. Notably:
|
||||
* Configuration persistence. `clab` allows you to save the running config. For that, I'll need to
|
||||
introduce [[vppcfg](https://git.ipng.ch/ipng/vppcfg)] and a means to invoke it when
|
||||
the lab operator wants to save their config, and then reconfigure VPP when the container
|
||||
restarts.
|
||||
* I'll need to have a few files from `clab` shared with the host, notably the `startup.conf` and
|
||||
`vppcfg.yaml`, as well as some manual pre- and post-flight configuration for the more esoteric
|
||||
stuff. Building the plumbing for this is a TODO for now.
|
||||
|
||||
## Acknowledgements
|
||||
|
||||
I wanted to give a shout-out to Nardus le Roux who inspired me to contribute this Containerlab VPP
|
||||
node type, and to Roman Dodin for his help getting the Containerlab parts squared away when I got a
|
||||
little bit stuck.
|
||||
|
||||
First order of business: get it to ping at all ... it'll go faster from there on out :)
|
373
content/articles/2025-05-04-containerlab-2.md
Normal file
373
content/articles/2025-05-04-containerlab-2.md
Normal file
@ -0,0 +1,373 @@
|
||||
---
|
||||
date: "2025-05-04T15:07:23Z"
|
||||
title: 'VPP in Containerlab - Part 2'
|
||||
params:
|
||||
asciinema: true
|
||||
---
|
||||
|
||||
{{< image float="right" src="/assets/containerlab/containerlab.svg" alt="Containerlab Logo" width="12em" >}}
|
||||
|
||||
# Introduction
|
||||
|
||||
From time to time the subject of containerized VPP instances comes up. At IPng, I run the routers in
|
||||
AS8298 on bare metal (Supermicro and Dell hardware), as it allows me to maximize performance.
|
||||
However, VPP is quite friendly in virtualization. Notably, it runs really well on virtual machines
|
||||
like Qemu/KVM or VMWare. I can pass through PCI devices directly to the guest, and use CPU pinning to
|
||||
allow the guest virtual machine access to the underlying physical hardware. In such a mode, VPP
|
||||
performs almost the same as on bare metal. But did you know that VPP can also run in Docker?
|
||||
|
||||
The other day I joined the [[ZANOG'25](https://nog.net.za/event1/zanog25/)] in Durban, South Africa.
|
||||
One of the presenters was Nardus le Roux of Nokia, and he showed off a project called
|
||||
[[Containerlab](https://containerlab.dev/)], which provides a CLI for orchestrating and managing
|
||||
container-based networking labs. It starts the containers, builds virtual wiring between them to
|
||||
create lab topologies of users' choice and manages the lab lifecycle.
|
||||
|
||||
Quite regularly I am asked 'when will you add VPP to Containerlab?', but at ZANOG I made a promise
|
||||
to actually add it. In my previous [[article]({{< ref 2025-05-03-containerlab-1.md >}})], I took
|
||||
a good look at VPP as a dockerized container. In this article, I'll explore how to make such a
|
||||
container run in Containerlab!
|
||||
|
||||
## Completing the Docker container
|
||||
|
||||
Just having VPP running by itself in a container is not super useful (although it _is_ cool!). I
|
||||
decide first to add a few bits and bobs that will come in handy in the `Dockerfile`:
|
||||
|
||||
```
|
||||
FROM debian:bookworm
|
||||
ARG DEBIAN_FRONTEND=noninteractive
|
||||
ARG VPP_INSTALL_SKIP_SYSCTL=true
|
||||
ARG REPO=release
|
||||
EXPOSE 22/tcp
|
||||
RUN apt-get update && apt-get -y install curl procps tcpdump iproute2 iptables \
|
||||
iputils-ping net-tools git python3 python3-pip vim-tiny openssh-server bird2 \
|
||||
mtr-tiny traceroute && apt-get clean
|
||||
|
||||
# Install VPP
|
||||
RUN mkdir -p /var/log/vpp /root/.ssh/
|
||||
RUN curl -s https://packagecloud.io/install/repositories/fdio/${REPO}/script.deb.sh | bash
|
||||
RUN apt-get update && apt-get -y install vpp vpp-plugin-core && apt-get clean
|
||||
|
||||
# Build vppcfg
|
||||
RUN pip install --break-system-packages build netaddr yamale argparse pyyaml ipaddress
|
||||
RUN git clone https://git.ipng.ch/ipng/vppcfg.git && cd vppcfg && python3 -m build && \
|
||||
pip install --break-system-packages dist/vppcfg-*-py3-none-any.whl
|
||||
|
||||
# Config files
|
||||
COPY files/etc/vpp/* /etc/vpp/
|
||||
COPY files/etc/bird/* /etc/bird/
|
||||
COPY files/init-container.sh /sbin/
|
||||
RUN chmod 755 /sbin/init-container.sh
|
||||
CMD ["/sbin/init-container.sh"]
|
||||
```
|
||||
|
||||
A few notable additions:
|
||||
* ***vppcfg*** is a handy utility I wrote and discussed in a previous [[article]({{< ref
|
||||
2022-04-02-vppcfg-2 >}})]. Its purpose is to take a YAML file that describes the configuration of
|
||||
the dataplane (like which interfaces, sub-interfaces, MTU, IP addresses and so on), and then
|
||||
apply this safely to a running dataplane. You can check it out in my
|
||||
[[vppcfg](https://git.ipng.ch/ipng/vppcfg)] git repository.
|
||||
* ***openssh-server*** will come in handy to log in to the container, in addition to the already
|
||||
available `docker exec`.
|
||||
* ***bird2*** which will be my controlplane of choice. At a future date, I might also add FRR,
|
||||
which may be a good alternative for some. VPP works well with both. You can check out Bird on
|
||||
the nic.cz [[website](https://bird.network.cz/?get_doc&f=bird.html&v=20)].
|
||||
|
||||
I'll add a couple of default config files for Bird and VPP, and replace the CMD with a generic
|
||||
`/sbin/init-container.sh` in which I can do any late binding stuff before launching VPP.
|
||||
|
||||
### Initializing the Container
|
||||
|
||||
#### VPP Containerlab: NetNS
|
||||
|
||||
VPP's Linux Control Plane plugin wants to run in its own network namespace. So the first order of
|
||||
business of `/sbin/init-container.sh` is to create it:
|
||||
|
||||
```
|
||||
NETNS=${NETNS:="dataplane"}
|
||||
|
||||
echo "Creating dataplane namespace"
|
||||
/usr/bin/mkdir -p /etc/netns/$NETNS
|
||||
/usr/bin/touch /etc/netns/$NETNS/resolv.conf
|
||||
/usr/sbin/ip netns add $NETNS
|
||||
```
|
||||
|
||||
#### VPP Containerlab: SSH
|
||||
|
||||
Then, I'll set the root password (which is `vpp` by the way), and start an SSH daemon which allows
|
||||
for password-less logins:
|
||||
|
||||
```
|
||||
echo "Starting SSH, with credentials root:vpp"
|
||||
sed -i -e 's,^#PermitRootLogin prohibit-password,PermitRootLogin yes,' /etc/ssh/sshd_config
|
||||
sed -i -e 's,^root:.*,root:$y$j9T$kG8pyZEVmwLXEtXekQCRK.$9iJxq/bEx5buni1hrC8VmvkDHRy7ZMsw9wYvwrzexID:20211::::::,' /etc/shadow
|
||||
/etc/init.d/ssh start
|
||||
```
|
||||
|
||||
#### VPP Containerlab: Bird2
|
||||
|
||||
I can already predict that Bird2 won't be the only option for a controlplane, even though I'm a huge
|
||||
fan of it. Therefore, I'll make it configurable to leave the door open for other controlplane
|
||||
implementations in the future:
|
||||
|
||||
```
|
||||
BIRD_ENABLED=${BIRD_ENABLED:="true"}
|
||||
|
||||
if [ "$BIRD_ENABLED" == "true" ]; then
|
||||
echo "Starting Bird in $NETNS"
|
||||
mkdir -p /run/bird /var/log/bird
|
||||
chown bird:bird /var/log/bird
|
||||
ROUTERID=$(ip -br a show eth0 | awk '{ print $3 }' | cut -f1 -d/)
|
||||
sed -i -e "s,.*router id .*,router id $ROUTERID; # Set by container-init.sh," /etc/bird/bird.conf
|
||||
/usr/bin/nsenter --net=/var/run/netns/$NETNS /usr/sbin/bird -u bird -g bird
|
||||
fi
|
||||
```
|
||||
|
||||
I am reminded that Bird won't start if it cannot determine its _router id_. When I start it in the
|
||||
`dataplane` namespace, it will immediately exit, because there will be no IP addresses configured
|
||||
yet. But luckily, it logs its complaint and it's easily addressed. I decide to take the management
|
||||
IPv4 address from `eth0` and write that into the `bird.conf` file, which otherwise does some basic
|
||||
initialization that I described in a previous [[article]({{< ref 2021-09-02-vpp-5 >}})], so I'll
|
||||
skip that here. However, I do include an empty file called `/etc/bird/bird-local.conf` for users to
|
||||
further configure Bird2.
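The wiring for that is a single directive at the bottom of the main config (a sketch, assuming
Bird2's stock include mechanism):

```
# At the end of /etc/bird/bird.conf: pull in any user-provided config.
include "/etc/bird/bird-local.conf";
```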
|
||||
|
||||
#### VPP Containerlab: Binding veth pairs
|
||||
|
||||
When Containerlab starts the VPP container, it'll offer it a set of `veth` ports that connect this
|
||||
container to other nodes in the lab. This is done by the `links` list in the topology file
|
||||
[[ref](https://containerlab.dev/manual/network/)]. It's my goal to take all of the interfaces
|
||||
that are of type `veth`, and generate a little snippet to grab them and bind them into VPP while
|
||||
setting their MTU to 9216 to allow for jumbo frames:
|
||||
|
||||
```
|
||||
CLAB_VPP_FILE=${CLAB_VPP_FILE:=/etc/vpp/clab.vpp}
|
||||
|
||||
echo "Generating $CLAB_VPP_FILE"
|
||||
: > $CLAB_VPP_FILE
|
||||
MTU=9216
|
||||
for IFNAME in $(ip -br link show type veth | cut -f1 -d@ | grep -v '^eth0$' | sort); do
|
||||
MAC=$(ip -br link show dev $IFNAME | awk '{ print $3 }')
|
||||
echo " * $IFNAME hw-addr $MAC mtu $MTU"
|
||||
ip link set $IFNAME up mtu $MTU
|
||||
cat << EOF >> $CLAB_VPP_FILE
|
||||
create host-interface name $IFNAME hw-addr $MAC
|
||||
set interface name host-$IFNAME $IFNAME
|
||||
set interface mtu $MTU $IFNAME
|
||||
set interface state $IFNAME up
|
||||
|
||||
EOF
|
||||
done
|
||||
```
|
||||
|
||||
{{< image width="5em" float="left" src="/assets/shared/warning.png" alt="Warning" >}}
|
||||
|
||||
One thing I realized is that VPP will assign a random MAC address on its copy of the `veth` port,
|
||||
which is not great. I'll explicitly configure it with the same MAC address as the `veth` interface
|
||||
itself, otherwise I'd have to put the interface into promiscuous mode.
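For completeness: had I not reused the MAC, the fallback would have been one extra line in the
generated snippet, something along these lines (using VPP's promiscuous mode CLI):

```
set interface promiscuous on eth1
```

Reusing the MAC is the nicer fix, since promiscuous mode would make the interface accept all
frames rather than just its own.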
|
||||
|
||||
#### VPP Containerlab: VPPcfg
|
||||
|
||||
I'm almost ready, but I have one more detail. The user will be able to offer a
|
||||
[[vppcfg](https://git.ipng.ch/ipng/vppcfg)] YAML file to configure the interfaces and so on. If such
|
||||
a file exists, I'll apply it to the dataplane upon startup:
|
||||
|
||||
```
|
||||
VPPCFG_VPP_FILE=${VPPCFG_VPP_FILE:=/etc/vpp/vppcfg.vpp}
|
||||
|
||||
echo "Generating $VPPCFG_VPP_FILE"
|
||||
: > $VPPCFG_VPP_FILE
|
||||
if [ -r /etc/vpp/vppcfg.yaml ]; then
|
||||
vppcfg plan --novpp -c /etc/vpp/vppcfg.yaml -o $VPPCFG_VPP_FILE
|
||||
fi
|
||||
```
|
||||
|
||||
Once the VPP process starts, it'll execute `/etc/vpp/bootstrap.vpp`, which in turn executes these
|
||||
newly generated `/etc/vpp/clab.vpp` to grab the `veth` interfaces, and then `/etc/vpp/vppcfg.vpp` to
|
||||
further configure the dataplane. Easy peasy!
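In other words, the bootstrap file is nothing more than two chained `exec` statements; a minimal
sketch of what I'd expect `/etc/vpp/bootstrap.vpp` to look like:

```
exec /etc/vpp/clab.vpp
exec /etc/vpp/vppcfg.vpp
```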
|
||||
|
||||
### Adding VPP to Containerlab
|
||||
|
||||
Roman points out a previous integration for the 6WIND VSR in
|
||||
[[PR#2540](https://github.com/srl-labs/containerlab/pull/2540)]. This serves as a useful guide to
|
||||
get me started. I fork the repo, create a branch so that Roman can also add a few commits, and
|
||||
together we start hacking in [[PR#2571](https://github.com/srl-labs/containerlab/pull/2571)].
|
||||
|
||||
First, I add the documentation skeleton in `docs/manual/kinds/fdio_vpp.md`, which links in from a
|
||||
few other places, and will be where the end-user facing documentation will live. That's about half
|
||||
the contributed LOC, right there!
|
||||
|
||||
Next, I'll create a Go module in `nodes/fdio_vpp/fdio_vpp.go` which doesn't do much other than
|
||||
creating the `struct`, and its required `Register` and `Init` functions. The `Init` function ensures
|
||||
the right capabilities are set in Docker, and the right devices are bound for the container.
|
||||
|
||||
I notice that Containerlab rewrites the Dockerfile `CMD` string and prepends an `if-wait.sh` script
|
||||
to it. This is because when Containerlab starts the container, it'll still be busy adding these
|
||||
`link` interfaces to it, and if a container starts too quickly, it may not see all the interfaces.
|
||||
So, Containerlab informs the container using an environment variable called `CLAB_INTFS`, and this
|
||||
script simply sleeps for a while until that exact number of interfaces is present. Ok, cool beans.
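The mechanism is simple enough to sketch (my paraphrase of the idea, not the verbatim
Containerlab script):

```
# Paraphrased if-wait idea: block until the number of interfaces that
# Containerlab promised (passed in $CLAB_INTFS) have actually appeared,
# then hand off to the original CMD.
while [ "$(ip -br link | grep -c '^eth')" -lt "$CLAB_INTFS" ]; do
    sleep 1
done
exec "$@"
```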
|
||||
|
||||
Roman helps me a bit with Go templating. You see, I think it'll be slick to have the CLI prompt for
|
||||
the VPP containers to reflect their hostname, because normally, VPP will assign `vpp# `. I add the
|
||||
template in `nodes/fdio_vpp/vpp_startup_config.go.tpl` and it only has one variable expansion: `unix
|
||||
{ cli-prompt {{ .ShortName }}# }`. But I totally think it's worth it, because when running many VPP
|
||||
containers in the lab, it could otherwise get confusing.
|
||||
|
||||
Roman also shows me a trick in the function `PostDeploy()`, which will write the user's SSH pubkeys
|
||||
to `/root/.ssh/authorized_keys`. This allows users to log in without having to use password
|
||||
authentication.
|
||||
|
||||
Collectively, we decide to punt on the `SaveConfig` function until we're a bit further along. I have
|
||||
an idea how this would work, basically along the lines of calling `vppcfg dump` and bind-mounting
|
||||
that file into the lab directory somewhere. This way, upon restarting, the YAML file can be re-read
|
||||
and the dataplane initialized. But it'll be for another day.
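To sketch the idea (nothing here is implemented yet, and the exact `vppcfg` invocation is an
assumption on my part):

```
# Hypothetical SaveConfig flow: dump the running dataplane to YAML on a
# bind-mounted path, so the next boot re-applies it via 'vppcfg plan'.
docker exec clab-vpp1 vppcfg dump -o /etc/vpp/vppcfg.yaml
```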
|
||||
|
||||
After the main module is finished, all I have to do is add it to `clab/register.go` and that's just
|
||||
about it. In about 170 lines of code, 50 lines of Go template, and 170 lines of Markdown, this
|
||||
contribution is about ready to ship!
|
||||
|
||||
### Containerlab: Demo
|
||||
|
||||
After I finish writing the documentation, I decide to include a demo with a quickstart to help folks
|
||||
along. A simple lab showing two VPP instances and two Alpine Linux clients can be found on
|
||||
[[git.ipng.ch/ipng/vpp-containerlab](https://git.ipng.ch/ipng/vpp-containerlab)]. Simply check out the
|
||||
repo and start the lab, like so:
|
||||
|
||||
```
|
||||
$ git clone https://git.ipng.ch/ipng/vpp-containerlab.git
|
||||
$ cd vpp-containerlab
|
||||
$ containerlab deploy --topo vpp.clab.yml
|
||||
```
|
||||
|
||||
#### Containerlab: configs
|
||||
|
||||
The file `vpp.clab.yml` contains an example topology consisting of two VPP instances, each connected to
|
||||
one Alpine Linux container, in the following topology:
|
||||
|
||||
{{< image src="/assets/containerlab/learn-vpp.png" alt="Containerlab Topo" width="100%" >}}
|
||||
|
||||
Two relevant files for each VPP router are included in this
|
||||
[[repository](https://git.ipng.ch/ipng/vpp-containerlab)]:
|
||||
1. `config/vpp*/vppcfg.yaml` configures the dataplane interfaces, including a loopback address.
|
||||
1. `config/vpp*/bird-local.conf` configures the controlplane to enable BFD and OSPF.
|
||||
|
||||
To illustrate these files, let me take a closer look at node `vpp1`. Its VPP dataplane
|
||||
configuration looks like this:
|
||||
```
|
||||
pim@summer:~/src/vpp-containerlab$ cat config/vpp1/vppcfg.yaml
|
||||
interfaces:
|
||||
eth1:
|
||||
description: 'To client1'
|
||||
mtu: 1500
|
||||
lcp: eth1
|
||||
addresses: [ 10.82.98.65/28, 2001:db8:8298:101::1/64 ]
|
||||
eth2:
|
||||
description: 'To vpp2'
|
||||
mtu: 9216
|
||||
lcp: eth2
|
||||
addresses: [ 10.82.98.16/31, 2001:db8:8298:1::1/64 ]
|
||||
loopbacks:
|
||||
loop0:
|
||||
description: 'vpp1'
|
||||
lcp: loop0
|
||||
addresses: [ 10.82.98.0/32, 2001:db8:8298::/128 ]
|
||||
```
|
||||
|
||||
Then, I enable BFD, OSPF and OSPFv3 on `eth2` and `loop0` on both of the VPP routers:
|
||||
```
|
||||
pim@summer:~/src/vpp-containerlab$ cat config/vpp1/bird-local.conf
|
||||
protocol bfd bfd1 {
|
||||
interface "eth2" { interval 100 ms; multiplier 30; };
|
||||
}
|
||||
|
||||
protocol ospf v2 ospf4 {
|
||||
ipv4 { import all; export all; };
|
||||
area 0 {
|
||||
interface "loop0" { stub yes; };
|
||||
interface "eth2" { type pointopoint; cost 10; bfd on; };
|
||||
};
|
||||
}
|
||||
|
||||
protocol ospf v3 ospf6 {
|
||||
ipv6 { import all; export all; };
|
||||
area 0 {
|
||||
interface "loop0" { stub yes; };
|
||||
interface "eth2" { type pointopoint; cost 10; bfd on; };
|
||||
};
|
||||
}
|
||||
```
|
||||
|
||||
#### Containerlab: playtime!
|
||||
|
||||
Once the lab comes up, I can SSH to the VPP containers (`vpp1` and `vpp2`) which have my SSH pubkeys
|
||||
installed thanks to Roman's work. Barring that, I could still log in as user `root` using
|
||||
password `vpp`. VPP runs its own network namespace called `dataplane`, which is very similar to SR
|
||||
Linux's default `network-instance`. I can join that namespace to take a closer look:
|
||||
|
||||
```
|
||||
pim@summer:~/src/vpp-containerlab$ ssh root@vpp1
|
||||
root@vpp1:~# nsenter --net=/var/run/netns/dataplane
|
||||
root@vpp1:~# ip -br a
|
||||
lo DOWN
|
||||
loop0 UP 10.82.98.0/32 2001:db8:8298::/128 fe80::dcad:ff:fe00:0/64
|
||||
eth1 UNKNOWN 10.82.98.65/28 2001:db8:8298:101::1/64 fe80::a8c1:abff:fe77:acb9/64
|
||||
eth2 UNKNOWN 10.82.98.16/31 2001:db8:8298:1::1/64 fe80::a8c1:abff:fef0:7125/64
|
||||
|
||||
root@vpp1:~# ping 10.82.98.1
|
||||
PING 10.82.98.1 (10.82.98.1) 56(84) bytes of data.
|
||||
64 bytes from 10.82.98.1: icmp_seq=1 ttl=64 time=9.53 ms
|
||||
64 bytes from 10.82.98.1: icmp_seq=2 ttl=64 time=15.9 ms
|
||||
^C
|
||||
--- 10.82.98.1 ping statistics ---
|
||||
2 packets transmitted, 2 received, 0% packet loss, time 1002ms
|
||||
rtt min/avg/max/mdev = 9.530/12.735/15.941/3.205 ms
|
||||
```
|
||||
|
||||
From `vpp1`, I can tell that Bird2's OSPF adjacency has formed, because I can ping the `loop0`
|
||||
address of the `vpp2` router on 10.82.98.1. Nice! The two client nodes are running a minimalistic Alpine
|
||||
Linux container, which doesn't ship with SSH by default. But of course I can still enter the
|
||||
containers using `docker exec`, like so:
|
||||
|
||||
```
|
||||
pim@summer:~/src/vpp-containerlab$ docker exec -it client1 sh
|
||||
/ # ip addr show dev eth1
|
||||
531235: eth1@if531234: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 9500 qdisc noqueue state UP
|
||||
link/ether 00:c1:ab:00:00:01 brd ff:ff:ff:ff:ff:ff
|
||||
inet 10.82.98.66/28 scope global eth1
|
||||
valid_lft forever preferred_lft forever
|
||||
inet6 2001:db8:8298:101::2/64 scope global
|
||||
valid_lft forever preferred_lft forever
|
||||
inet6 fe80::2c1:abff:fe00:1/64 scope link
|
||||
valid_lft forever preferred_lft forever
|
||||
/ # traceroute 10.82.98.82
|
||||
traceroute to 10.82.98.82 (10.82.98.82), 30 hops max, 46 byte packets
|
||||
1 10.82.98.65 (10.82.98.65) 5.906 ms 7.086 ms 7.868 ms
|
||||
2 10.82.98.17 (10.82.98.17) 24.007 ms 23.349 ms 15.933 ms
|
||||
3 10.82.98.82 (10.82.98.82) 39.978 ms 31.127 ms 31.854 ms
|
||||
|
||||
/ # traceroute 2001:db8:8298:102::2
|
||||
traceroute to 2001:db8:8298:102::2 (2001:db8:8298:102::2), 30 hops max, 72 byte packets
|
||||
1 2001:db8:8298:101::1 (2001:db8:8298:101::1) 0.701 ms 7.144 ms 7.900 ms
|
||||
2 2001:db8:8298:1::2 (2001:db8:8298:1::2) 23.909 ms 22.943 ms 23.893 ms
|
||||
3 2001:db8:8298:102::2 (2001:db8:8298:102::2) 31.964 ms 30.814 ms 32.000 ms
|
||||
```
|
||||
|
||||
From the vantage point of `client1`, the first hop represents the `vpp1` node, which forwards to
|
||||
`vpp2`, which finally forwards to `client2`, which shows that both VPP routers are passing traffic.
|
||||
Dope!
|
||||
|
||||
## Results
|
||||
|
||||
After all of this deep-diving, all that's left is for me to demonstrate the Containerlab by means of
|
||||
this little screencast [[asciinema](/assets/containerlab/vpp-containerlab.cast)]. I hope you enjoy
|
||||
it as much as I enjoyed creating it:
|
||||
|
||||
{{< asciinema src="/assets/containerlab/vpp-containerlab.cast" >}}
|
||||
|
||||
## Acknowledgements
|
||||
|
||||
I wanted to give a shout-out to Roman Dodin for his help getting the Containerlab parts squared away
|
||||
when I got a little bit stuck. He took the time to explain the internals and idioms of the Containerlab
|
||||
project, which really saved me a tonne of time. He also pair-programmed the
|
||||
[[PR#2471](https://github.com/srl-labs/containerlab/pull/2571)] with me over the span of two
|
||||
evenings.
|
||||
|
||||
Collaborative open source rocks!
|
@ -34,3 +34,5 @@ taxonomies:

permalinks:
  articles: "/s/articles/:year/:month/:day/:slug"

ignoreLogs: [ "warning-goldmark-raw-html" ]

1 static/assets/containerlab/containerlab.svg Normal file
File diff suppressed because one or more lines are too long
After Width: | Height: | Size: 21 KiB

BIN static/assets/containerlab/learn-vpp.png (Stored with Git LFS) Normal file
Binary file not shown.

1270 static/assets/containerlab/vpp-containerlab.cast Normal file
File diff suppressed because it is too large

1 static/assets/frys-ix/FrysIX_ Topology (concept).svg Normal file
File diff suppressed because one or more lines are too long
After Width: | Height: | Size: 90 KiB

BIN static/assets/frys-ix/IXR-7220-D3.jpg (Stored with Git LFS) Normal file
Binary file not shown.

1 static/assets/frys-ix/Nokia Arista VXLAN.svg Normal file
File diff suppressed because one or more lines are too long
After Width: | Height: | Size: 166 KiB

169 static/assets/frys-ix/arista-leaf.conf Normal file
@ -0,0 +1,169 @@
no aaa root
!
hardware counter feature vtep decap
hardware counter feature vtep encap
!
service routing protocols model multi-agent
!
hostname arista-leaf
!
router l2-vpn
   arp learning bridged
!
spanning-tree mode mstp
!
system l1
   unsupported speed action error
   unsupported error-correction action error
!
vlan 2604
   name v-peeringlan
!
interface Ethernet1/1
!
interface Ethernet2/1
!
interface Ethernet3/1
!
interface Ethernet4/1
!
interface Ethernet5/1
!
interface Ethernet6/1
!
interface Ethernet7/1
!
interface Ethernet8/1
!
interface Ethernet9/1
   shutdown
   speed forced 10000full
!
interface Ethernet9/2
   shutdown
!
interface Ethernet9/3
   speed forced 10000full
   switchport access vlan 2604
!
interface Ethernet9/4
   shutdown
!
interface Ethernet10/1
!
interface Ethernet10/2
   shutdown
!
interface Ethernet10/4
   shutdown
!
interface Ethernet11/1
!
interface Ethernet12/1
!
interface Ethernet13/1
!
interface Ethernet14/1
!
interface Ethernet15/1
!
interface Ethernet16/1
!
interface Ethernet17/1
!
interface Ethernet18/1
!
interface Ethernet19/1
!
interface Ethernet20/1
!
interface Ethernet21/1
!
interface Ethernet22/1
!
interface Ethernet23/1
!
interface Ethernet24/1
!
interface Ethernet25/1
!
interface Ethernet26/1
!
interface Ethernet27/1
!
interface Ethernet28/1
!
interface Ethernet29/1
   no switchport
!
interface Ethernet30/1
   load-interval 1
   mtu 9190
   no switchport
   ip address 198.19.17.10/31
   ip ospf cost 10
   ip ospf network point-to-point
   ip ospf area 0.0.0.0
!
interface Ethernet31/1
   load-interval 1
   mtu 9190
   no switchport
   ip address 198.19.17.3/31
   ip ospf cost 1000
   ip ospf network point-to-point
   ip ospf area 0.0.0.0
!
interface Ethernet32/1
   load-interval 1
   mtu 9190
   no switchport
   ip address 198.19.17.5/31
   ip ospf cost 1000
   ip ospf network point-to-point
   ip ospf area 0.0.0.0
!
interface Loopback0
   ip address 198.19.16.2/32
   ip ospf area 0.0.0.0
!
interface Loopback1
   ip address 198.19.18.2/32
!
interface Management1
   ip address dhcp
!
interface Vxlan1
   vxlan source-interface Loopback1
   vxlan udp-port 4789
   vxlan vlan 2604 vni 2604
!
ip routing
!
ip route 0.0.0.0/0 Management1 10.75.8.1
!
router bgp 65500
   neighbor evpn peer group
   neighbor evpn remote-as 65500
   neighbor evpn update-source Loopback0
   neighbor evpn ebgp-multihop 3
   neighbor evpn send-community extended
   neighbor evpn maximum-routes 12000 warning-only
   neighbor 198.19.16.0 peer group evpn
   neighbor 198.19.16.1 peer group evpn
   !
   vlan 2604
      rd 65500:2604
      route-target both 65500:2604
      redistribute learned
   !
   address-family evpn
      neighbor evpn activate
!
router ospf 65500
   router-id 198.19.16.2
   redistribute connected
   network 198.19.0.0/16 area 0.0.0.0
   max-lsa 12000
!
end
90 static/assets/frys-ix/equinix.conf Normal file
@ -0,0 +1,90 @@
set / interface ethernet-1/1 admin-state disable
set / interface ethernet-1/9 admin-state enable
set / interface ethernet-1/9 breakout-mode num-breakout-ports 4
set / interface ethernet-1/9 breakout-mode breakout-port-speed 10G
set / interface ethernet-1/9/3 admin-state enable
set / interface ethernet-1/9/3 vlan-tagging true
set / interface ethernet-1/9/3 subinterface 0 type bridged
set / interface ethernet-1/9/3 subinterface 0 admin-state enable
set / interface ethernet-1/9/3 subinterface 0 vlan encap untagged
set / interface ethernet-1/29 admin-state enable
set / interface ethernet-1/29 subinterface 0 type routed
set / interface ethernet-1/29 subinterface 0 admin-state enable
set / interface ethernet-1/29 subinterface 0 ip-mtu 9190
set / interface ethernet-1/29 subinterface 0 ipv4 admin-state enable
set / interface ethernet-1/29 subinterface 0 ipv4 address 198.19.17.0/31
set / interface ethernet-1/29 subinterface 0 ipv6 admin-state enable
set / interface lo0 admin-state enable
set / interface lo0 subinterface 0 admin-state enable
set / interface lo0 subinterface 0 ipv4 admin-state enable
set / interface lo0 subinterface 0 ipv4 address 198.19.16.0/32
set / interface mgmt0 admin-state enable
set / interface mgmt0 subinterface 0 admin-state enable
set / interface mgmt0 subinterface 0 ipv4 admin-state enable
set / interface mgmt0 subinterface 0 ipv4 dhcp-client
set / interface mgmt0 subinterface 0 ipv6 admin-state enable
set / interface mgmt0 subinterface 0 ipv6 dhcp-client
set / interface system0 admin-state enable
set / interface system0 subinterface 0 admin-state enable
set / interface system0 subinterface 0 ipv4 admin-state enable
set / interface system0 subinterface 0 ipv4 address 198.19.18.0/32
set / network-instance default type default
set / network-instance default admin-state enable
set / network-instance default description "fabric: dc2 role: spine"
set / network-instance default router-id 198.19.16.0
set / network-instance default ip-forwarding receive-ipv4-check false
set / network-instance default interface ethernet-1/29.0
set / network-instance default interface lo0.0
set / network-instance default interface system0.0
set / network-instance default protocols bgp admin-state enable
set / network-instance default protocols bgp autonomous-system 65500
set / network-instance default protocols bgp router-id 198.19.16.0
set / network-instance default protocols bgp dynamic-neighbors accept match 198.19.16.0/24 peer-group overlay
set / network-instance default protocols bgp afi-safi evpn admin-state enable
set / network-instance default protocols bgp preference ibgp 170
set / network-instance default protocols bgp route-advertisement rapid-withdrawal true
set / network-instance default protocols bgp route-advertisement wait-for-fib-install false
set / network-instance default protocols bgp group overlay peer-as 65500
set / network-instance default protocols bgp group overlay afi-safi evpn admin-state enable
set / network-instance default protocols bgp group overlay afi-safi ipv4-unicast admin-state disable
set / network-instance default protocols bgp group overlay local-as as-number 65500
set / network-instance default protocols bgp group overlay route-reflector client true
set / network-instance default protocols bgp group overlay transport local-address 198.19.16.0
set / network-instance default protocols bgp neighbor 198.19.16.1 admin-state enable
set / network-instance default protocols bgp neighbor 198.19.16.1 peer-group overlay
set / network-instance default protocols ospf instance default admin-state enable
set / network-instance default protocols ospf instance default version ospf-v2
set / network-instance default protocols ospf instance default router-id 198.19.16.0
set / network-instance default protocols ospf instance default export-policy ospf
set / network-instance default protocols ospf instance default area 0.0.0.0 interface ethernet-1/29.0 interface-type point-to-point
set / network-instance default protocols ospf instance default area 0.0.0.0 interface lo0.0
set / network-instance default protocols ospf instance default area 0.0.0.0 interface system0.0
set / network-instance mgmt type ip-vrf
set / network-instance mgmt admin-state enable
set / network-instance mgmt description "Management network instance"
set / network-instance mgmt interface mgmt0.0
set / network-instance mgmt protocols linux import-routes true
set / network-instance mgmt protocols linux export-routes true
set / network-instance mgmt protocols linux export-neighbors true
set / network-instance peeringlan type mac-vrf
set / network-instance peeringlan admin-state enable
set / network-instance peeringlan interface ethernet-1/9/3.0
set / network-instance peeringlan vxlan-interface vxlan1.2604
set / network-instance peeringlan protocols bgp-evpn bgp-instance 1 admin-state enable
set / network-instance peeringlan protocols bgp-evpn bgp-instance 1 vxlan-interface vxlan1.2604
set / network-instance peeringlan protocols bgp-evpn bgp-instance 1 evi 2604
set / network-instance peeringlan protocols bgp-evpn bgp-instance 1 routes bridge-table mac-ip advertise true
set / network-instance peeringlan protocols bgp-vpn bgp-instance 1 route-distinguisher rd 65500:2604
set / network-instance peeringlan protocols bgp-vpn bgp-instance 1 route-target export-rt target:65500:2604
set / network-instance peeringlan protocols bgp-vpn bgp-instance 1 route-target import-rt target:65500:2604
set / network-instance peeringlan bridge-table proxy-arp admin-state enable
set / network-instance peeringlan bridge-table proxy-arp dynamic-learning admin-state enable
set / network-instance peeringlan bridge-table proxy-arp dynamic-learning age-time 600
set / network-instance peeringlan bridge-table proxy-arp dynamic-learning send-refresh 180
set / routing-policy policy ospf statement 100 match protocol host
set / routing-policy policy ospf statement 100 action policy-result accept
set / routing-policy policy ospf statement 200 match protocol ospfv2
set / routing-policy policy ospf statement 200 action policy-result accept
set / tunnel-interface vxlan1 vxlan-interface 2604 type bridged
set / tunnel-interface vxlan1 vxlan-interface 2604 ingress vni 2604
set / tunnel-interface vxlan1 vxlan-interface 2604 egress source-ip use-system-ipv4-address
BIN static/assets/frys-ix/frysix-logo-small.png (Stored with Git LFS) Normal file
Binary file not shown.

132 static/assets/frys-ix/nikhef.conf Normal file
@ -0,0 +1,132 @@
set / interface ethernet-1/1 admin-state enable
set / interface ethernet-1/1 ethernet forward-error-correction fec-option rs-528
set / interface ethernet-1/1 subinterface 0 type routed
set / interface ethernet-1/1 subinterface 0 admin-state enable
set / interface ethernet-1/1 subinterface 0 ip-mtu 9190
set / interface ethernet-1/1 subinterface 0 ipv4 admin-state enable
set / interface ethernet-1/1 subinterface 0 ipv4 address 198.19.17.2/31
set / interface ethernet-1/1 subinterface 0 ipv6 admin-state enable
set / interface ethernet-1/2 admin-state enable
set / interface ethernet-1/2 ethernet forward-error-correction fec-option rs-528
set / interface ethernet-1/2 subinterface 0 type routed
set / interface ethernet-1/2 subinterface 0 admin-state enable
set / interface ethernet-1/2 subinterface 0 ip-mtu 9190
set / interface ethernet-1/2 subinterface 0 ipv4 admin-state enable
set / interface ethernet-1/2 subinterface 0 ipv4 address 198.19.17.4/31
set / interface ethernet-1/2 subinterface 0 ipv6 admin-state enable
set / interface ethernet-1/3 admin-state enable
set / interface ethernet-1/3 ethernet forward-error-correction fec-option rs-528
set / interface ethernet-1/3 subinterface 0 type routed
set / interface ethernet-1/3 subinterface 0 admin-state enable
set / interface ethernet-1/3 subinterface 0 ip-mtu 9190
set / interface ethernet-1/3 subinterface 0 ipv4 admin-state enable
set / interface ethernet-1/3 subinterface 0 ipv4 address 198.19.17.6/31
set / interface ethernet-1/3 subinterface 0 ipv6 admin-state enable
set / interface ethernet-1/4 admin-state enable
set / interface ethernet-1/4 ethernet forward-error-correction fec-option rs-528
set / interface ethernet-1/4 subinterface 0 type routed
set / interface ethernet-1/4 subinterface 0 admin-state enable
set / interface ethernet-1/4 subinterface 0 ip-mtu 9190
set / interface ethernet-1/4 subinterface 0 ipv4 admin-state enable
set / interface ethernet-1/4 subinterface 0 ipv4 address 198.19.17.8/31
set / interface ethernet-1/4 subinterface 0 ipv6 admin-state enable
set / interface ethernet-1/9 admin-state enable
set / interface ethernet-1/9 breakout-mode num-breakout-ports 4
set / interface ethernet-1/9 breakout-mode breakout-port-speed 10G
set / interface ethernet-1/9/1 admin-state disable
set / interface ethernet-1/9/2 admin-state disable
set / interface ethernet-1/9/3 admin-state enable
set / interface ethernet-1/9/3 vlan-tagging true
set / interface ethernet-1/9/3 subinterface 0 type bridged
set / interface ethernet-1/9/3 subinterface 0 admin-state enable
set / interface ethernet-1/9/3 subinterface 0 vlan encap untagged
set / interface ethernet-1/9/4 admin-state disable
set / interface ethernet-1/29 admin-state enable
set / interface ethernet-1/29 subinterface 0 type routed
set / interface ethernet-1/29 subinterface 0 admin-state enable
set / interface ethernet-1/29 subinterface 0 ip-mtu 9190
set / interface ethernet-1/29 subinterface 0 ipv4 admin-state enable
set / interface ethernet-1/29 subinterface 0 ipv4 address 198.19.17.1/31
set / interface ethernet-1/29 subinterface 0 ipv6 admin-state enable
set / interface lo0 admin-state enable
set / interface lo0 subinterface 0 admin-state enable
set / interface lo0 subinterface 0 ipv4 admin-state enable
set / interface lo0 subinterface 0 ipv4 address 198.19.16.1/32
set / interface mgmt0 admin-state enable
set / interface mgmt0 subinterface 0 admin-state enable
set / interface mgmt0 subinterface 0 ipv4 admin-state enable
set / interface mgmt0 subinterface 0 ipv4 dhcp-client
set / interface mgmt0 subinterface 0 ipv6 admin-state enable
set / interface mgmt0 subinterface 0 ipv6 dhcp-client
set / interface system0 admin-state enable
set / interface system0 subinterface 0 admin-state enable
set / interface system0 subinterface 0 ipv4 admin-state enable
set / interface system0 subinterface 0 ipv4 address 198.19.18.1/32
set / network-instance default type default
set / network-instance default admin-state enable
set / network-instance default description "fabric: dc1 role: spine"
set / network-instance default router-id 198.19.16.1
set / network-instance default ip-forwarding receive-ipv4-check false
set / network-instance default interface ethernet-1/1.0
set / network-instance default interface ethernet-1/2.0
set / network-instance default interface ethernet-1/29.0
set / network-instance default interface ethernet-1/3.0
set / network-instance default interface ethernet-1/4.0
set / network-instance default interface lo0.0
set / network-instance default interface system0.0
set / network-instance default protocols bgp admin-state enable
set / network-instance default protocols bgp autonomous-system 65500
set / network-instance default protocols bgp router-id 198.19.16.1
set / network-instance default protocols bgp dynamic-neighbors accept match 198.19.16.0/24 peer-group overlay
set / network-instance default protocols bgp afi-safi evpn admin-state enable
set / network-instance default protocols bgp preference ibgp 170
set / network-instance default protocols bgp route-advertisement rapid-withdrawal true
set / network-instance default protocols bgp route-advertisement wait-for-fib-install false
set / network-instance default protocols bgp group overlay peer-as 65500
set / network-instance default protocols bgp group overlay afi-safi evpn admin-state enable
set / network-instance default protocols bgp group overlay afi-safi ipv4-unicast admin-state disable
set / network-instance default protocols bgp group overlay local-as as-number 65500
set / network-instance default protocols bgp group overlay route-reflector client true
set / network-instance default protocols bgp group overlay transport local-address 198.19.16.1
set / network-instance default protocols bgp neighbor 198.19.16.0 admin-state enable
set / network-instance default protocols bgp neighbor 198.19.16.0 peer-group overlay
set / network-instance default protocols ospf instance default admin-state enable
set / network-instance default protocols ospf instance default version ospf-v2
set / network-instance default protocols ospf instance default router-id 198.19.16.1
set / network-instance default protocols ospf instance default export-policy ospf
set / network-instance default protocols ospf instance default area 0.0.0.0 interface ethernet-1/1.0 interface-type point-to-point
set / network-instance default protocols ospf instance default area 0.0.0.0 interface ethernet-1/2.0 interface-type point-to-point
set / network-instance default protocols ospf instance default area 0.0.0.0 interface ethernet-1/3.0 interface-type point-to-point
set / network-instance default protocols ospf instance default area 0.0.0.0 interface ethernet-1/4.0 interface-type point-to-point
set / network-instance default protocols ospf instance default area 0.0.0.0 interface ethernet-1/29.0 interface-type point-to-point
set / network-instance default protocols ospf instance default area 0.0.0.0 interface lo0.0 passive true
set / network-instance default protocols ospf instance default area 0.0.0.0 interface system0.0
set / network-instance mgmt type ip-vrf
set / network-instance mgmt admin-state enable
set / network-instance mgmt description "Management network instance"
set / network-instance mgmt interface mgmt0.0
set / network-instance mgmt protocols linux import-routes true
set / network-instance mgmt protocols linux export-routes true
set / network-instance mgmt protocols linux export-neighbors true
set / network-instance peeringlan type mac-vrf
set / network-instance peeringlan admin-state enable
set / network-instance peeringlan interface ethernet-1/9/3.0
set / network-instance peeringlan vxlan-interface vxlan1.2604
set / network-instance peeringlan protocols bgp-evpn bgp-instance 1 admin-state enable
set / network-instance peeringlan protocols bgp-evpn bgp-instance 1 vxlan-interface vxlan1.2604
set / network-instance peeringlan protocols bgp-evpn bgp-instance 1 evi 2604
set / network-instance peeringlan protocols bgp-evpn bgp-instance 1 routes bridge-table mac-ip advertise true
set / network-instance peeringlan protocols bgp-vpn bgp-instance 1 route-distinguisher rd 65500:2604
set / network-instance peeringlan protocols bgp-vpn bgp-instance 1 route-target export-rt target:65500:2604
set / network-instance peeringlan protocols bgp-vpn bgp-instance 1 route-target import-rt target:65500:2604
set / network-instance peeringlan bridge-table proxy-arp admin-state enable
set / network-instance peeringlan bridge-table proxy-arp dynamic-learning admin-state enable
set / network-instance peeringlan bridge-table proxy-arp dynamic-learning age-time 600
set / network-instance peeringlan bridge-table proxy-arp dynamic-learning send-refresh 180
set / routing-policy policy ospf statement 100 match protocol host
set / routing-policy policy ospf statement 100 action policy-result accept
set / routing-policy policy ospf statement 200 match protocol ospfv2
set / routing-policy policy ospf statement 200 action policy-result accept
set / tunnel-interface vxlan1 vxlan-interface 2604 type bridged
set / tunnel-interface vxlan1 vxlan-interface 2604 ingress vni 2604
set / tunnel-interface vxlan1 vxlan-interface 2604 egress source-ip use-system-ipv4-address
BIN static/assets/frys-ix/nokia-7220-d2.png (Stored with Git LFS) Normal file
Binary file not shown.

BIN static/assets/frys-ix/nokia-7220-d4.png (Stored with Git LFS) Normal file
Binary file not shown.

105 static/assets/frys-ix/nokia-leaf.conf Normal file
@ -0,0 +1,105 @@
set / interface ethernet-1/9 admin-state enable
set / interface ethernet-1/9 vlan-tagging true
set / interface ethernet-1/9 ethernet port-speed 10G
set / interface ethernet-1/9 subinterface 0 type bridged
set / interface ethernet-1/9 subinterface 0 admin-state enable
set / interface ethernet-1/9 subinterface 0 vlan encap untagged
set / interface ethernet-1/53 admin-state enable
set / interface ethernet-1/53 ethernet forward-error-correction fec-option rs-528
set / interface ethernet-1/53 subinterface 0 admin-state enable
set / interface ethernet-1/53 subinterface 0 ip-mtu 9190
set / interface ethernet-1/53 subinterface 0 ipv4 admin-state enable
set / interface ethernet-1/53 subinterface 0 ipv4 address 198.19.17.11/31
set / interface ethernet-1/53 subinterface 0 ipv6 admin-state enable
set / interface ethernet-1/55 admin-state enable
set / interface ethernet-1/55 ethernet forward-error-correction fec-option rs-528
set / interface ethernet-1/55 subinterface 0 admin-state enable
set / interface ethernet-1/55 subinterface 0 ip-mtu 9190
set / interface ethernet-1/55 subinterface 0 ipv4 admin-state enable
set / interface ethernet-1/55 subinterface 0 ipv4 address 198.19.17.7/31
set / interface ethernet-1/55 subinterface 0 ipv6 admin-state enable
set / interface ethernet-1/56 admin-state enable
set / interface ethernet-1/56 ethernet forward-error-correction fec-option rs-528
set / interface ethernet-1/56 subinterface 0 admin-state enable
set / interface ethernet-1/56 subinterface 0 ip-mtu 9190
set / interface ethernet-1/56 subinterface 0 ipv4 admin-state enable
set / interface ethernet-1/56 subinterface 0 ipv4 address 198.19.17.9/31
set / interface ethernet-1/56 subinterface 0 ipv6 admin-state enable
set / interface lo0 admin-state enable
set / interface lo0 subinterface 0 admin-state enable
set / interface lo0 subinterface 0 ipv4 admin-state enable
set / interface lo0 subinterface 0 ipv4 address 198.19.16.3/32
set / interface mgmt0 admin-state enable
set / interface mgmt0 subinterface 0 admin-state enable
set / interface mgmt0 subinterface 0 ipv4 admin-state enable
set / interface mgmt0 subinterface 0 ipv4 dhcp-client
set / interface mgmt0 subinterface 0 ipv6 admin-state enable
set / interface mgmt0 subinterface 0 ipv6 dhcp-client
set / interface system0 admin-state enable
set / interface system0 subinterface 0 admin-state enable
set / interface system0 subinterface 0 ipv4 admin-state enable
set / interface system0 subinterface 0 ipv4 address 198.19.18.3/32
set / network-instance default type default
set / network-instance default admin-state enable
set / network-instance default description "fabric: dc1 role: leaf"
set / network-instance default router-id 198.19.16.3
set / network-instance default ip-forwarding receive-ipv4-check false
set / network-instance default interface ethernet-1/53.0
set / network-instance default interface ethernet-1/55.0
set / network-instance default interface ethernet-1/56.0
set / network-instance default interface lo0.0
set / network-instance default interface system0.0
set / network-instance default protocols bgp admin-state enable
set / network-instance default protocols bgp autonomous-system 65500
set / network-instance default protocols bgp router-id 198.19.16.3
set / network-instance default protocols bgp afi-safi evpn admin-state enable
set / network-instance default protocols bgp preference ibgp 170
set / network-instance default protocols bgp route-advertisement rapid-withdrawal true
set / network-instance default protocols bgp route-advertisement wait-for-fib-install false
set / network-instance default protocols bgp group overlay peer-as 65500
set / network-instance default protocols bgp group overlay afi-safi evpn admin-state enable
set / network-instance default protocols bgp group overlay afi-safi ipv4-unicast admin-state disable
set / network-instance default protocols bgp group overlay local-as as-number 65500
set / network-instance default protocols bgp group overlay transport local-address 198.19.16.3
set / network-instance default protocols bgp neighbor 198.19.16.0 admin-state enable
set / network-instance default protocols bgp neighbor 198.19.16.0 peer-group overlay
set / network-instance default protocols bgp neighbor 198.19.16.1 admin-state enable
set / network-instance default protocols bgp neighbor 198.19.16.1 peer-group overlay
set / network-instance default protocols ospf instance default admin-state enable
set / network-instance default protocols ospf instance default version ospf-v2
set / network-instance default protocols ospf instance default router-id 198.19.16.3
set / network-instance default protocols ospf instance default export-policy ospf
set / network-instance default protocols ospf instance default area 0.0.0.0 interface ethernet-1/53.0 interface-type point-to-point
set / network-instance default protocols ospf instance default area 0.0.0.0 interface ethernet-1/55.0 interface-type point-to-point
set / network-instance default protocols ospf instance default area 0.0.0.0 interface ethernet-1/56.0 interface-type point-to-point
set / network-instance default protocols ospf instance default area 0.0.0.0 interface lo0.0 passive true
set / network-instance default protocols ospf instance default area 0.0.0.0 interface system0.0
set / network-instance mgmt type ip-vrf
set / network-instance mgmt admin-state enable
set / network-instance mgmt description "Management network instance"
set / network-instance mgmt interface mgmt0.0
set / network-instance mgmt protocols linux import-routes true
set / network-instance mgmt protocols linux export-routes true
set / network-instance mgmt protocols linux export-neighbors true
set / network-instance peeringlan type mac-vrf
set / network-instance peeringlan admin-state enable
set / network-instance peeringlan interface ethernet-1/9.0
set / network-instance peeringlan vxlan-interface vxlan1.2604
set / network-instance peeringlan protocols bgp-evpn bgp-instance 1 admin-state enable
set / network-instance peeringlan protocols bgp-evpn bgp-instance 1 vxlan-interface vxlan1.2604
set / network-instance peeringlan protocols bgp-evpn bgp-instance 1 evi 2604
set / network-instance peeringlan protocols bgp-evpn bgp-instance 1 routes bridge-table mac-ip advertise true
set / network-instance peeringlan protocols bgp-vpn bgp-instance 1 route-distinguisher rd 65500:2604
set / network-instance peeringlan protocols bgp-vpn bgp-instance 1 route-target export-rt target:65500:2604
set / network-instance peeringlan protocols bgp-vpn bgp-instance 1 route-target import-rt target:65500:2604
set / network-instance peeringlan bridge-table proxy-arp admin-state enable
set / network-instance peeringlan bridge-table proxy-arp dynamic-learning admin-state enable
set / network-instance peeringlan bridge-table proxy-arp dynamic-learning age-time 600
set / network-instance peeringlan bridge-table proxy-arp dynamic-learning send-refresh 180
set / routing-policy policy ospf statement 100 match protocol host
set / routing-policy policy ospf statement 100 action policy-result accept
set / routing-policy policy ospf statement 200 match protocol ospfv2
set / routing-policy policy ospf statement 200 action policy-result accept
set / tunnel-interface vxlan1 vxlan-interface 2604 type bridged
set / tunnel-interface vxlan1 vxlan-interface 2604 ingress vni 2604
set / tunnel-interface vxlan1 vxlan-interface 2604 egress source-ip use-system-ipv4-address
BIN static/assets/sflow/sflow-all.pcap Normal file
Binary file not shown.

BIN static/assets/sflow/sflow-host.pcap Normal file
Binary file not shown.

BIN static/assets/sflow/sflow-interface.pcap Normal file
Binary file not shown.

BIN static/assets/sflow/sflow-lab-trex.png (Stored with Git LFS) Normal file
Binary file not shown.

BIN static/assets/sflow/sflow-lab.png (Stored with Git LFS) Normal file
Binary file not shown.

BIN static/assets/sflow/sflow-overview.png (Stored with Git LFS) Normal file
Binary file not shown.

BIN static/assets/sflow/sflow-vpp-overview.png (Stored with Git LFS) Normal file
Binary file not shown.

BIN static/assets/sflow/sflow-wireshark.png (Stored with Git LFS) Normal file
Binary file not shown.
@ -3,9 +3,9 @@

# OpenBSD bastion
ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIAXMfDOJtI3JztcPJ1DZMXzILZzMilMvodvMIfqqa1qr pim+openbsd@ipng.ch

# Macbook M2 Air (Secretive)
ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBOqcEzDb0ZmHl3s++rnxjOcoAeZKy5EkVU6WdChXLj8SuthjCinOTSMXy7k0PnxWejSST1KHxJ3nBbvpboGMwH8= pim+m2air@ipng.ch

# Mac Studio (Secretive)
ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBMtJZgTDWxBEbQ2vPYtOw4L0s4VRKUUjpu6aFPVx3CpqrjLpyJIxzBWTfb/VnEp95VfgM8IUAYYM8w7xoLd7QZc= pim+studio@ipng.ch
ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBMtJZgTDWxBEbQ2vPYtOw4L0s4VRKUUjpu6aFPVx3CpqrjLpyJIxzBWTfb/VnEp95VfgM8IUAYYM8w7xoLd7QZc= pim+jessica+secretive@ipng.ch

# Macbook Air M4 (Secretive)
ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBASymGKXfKkfsYbo7UDrIBxl1F6X7LmVPQ3XOFOKp8tLI6zLyCYs5zgRNs/qksHOgKUK+fE/TzJ4XJsuSbYNMB0= pim+tammy+secretive@ipng.ch
@ -17,6 +17,7 @@ $text-very-light: #767676;

$medium-light-text: #4f4a5f;
$code-background: #f3f3f3;
$codeblock-background: #f6f8fa;
$codeblock-text: #99a;
$code-text: #f8f8f2;
$ipng-orange: #f46524;
$ipng-darkorange: #8c1919;

@ -142,7 +143,7 @@ pre {

  code {
    background-color: transparent;
    color: #444;
    color: $codeblock-text;
  }
}