Add a proposal article for eVPN/VxLAN in VPP
content/articles/2025-07-12-vpp-evpn-1.md
---
date: "2025-07-12T10:07:23Z"
title: 'VPP and eVPN/VxLAN - Part 1'
---

{{< image width="6em" float="right" src="/assets/vpp/fdio-color.svg" alt="VPP" >}}

# Introduction

You know what would be really cool? If VPP could be an eVPN/VxLAN speaker! Sometimes I feel like I'm
the very last person on the planet to learn about something cool. My latest "A-Ha!"-moment was when
I was configuring the eVPN fabric for [[Frys-IX](https://frys-ix.net/)], and I wrote up an article
about it [[here]({{< ref 2025-04-09-frysix-evpn >}})] back in April.

I can build the equivalent of Virtual Private Wires (VPWS), also called L2VPN or Virtual Leased
Lines, and these are straightforward because they typically only have two endpoints. A "regular"
VxLAN tunnel which is L2 cross connected with another interface already does that just fine. Take a
look at an article on [[L2 Gymnastics]({{< ref 2022-01-12-vpp-l2 >}})] for that. But the real kicker
is that I can also create multi-site L2 domains like Virtual Private LAN Services (VPLS), also
called Virtual Private Ethernet, L2VPN or Ethernet LAN Service (E-LAN). And *that* is a whole other
level of awesome.

## Recap: VPP today

### VPP: VxLAN

The current VPP VxLAN tunnel plugin does point-to-point tunnels; that is, they are configured with a
source address, destination address, destination port and VNI. As I mentioned, a point-to-point
ethernet transport is configured very easily:

```
vpp0# create vxlan tunnel src 192.0.2.1 dst 192.0.2.254 vni 8298 instance 0
vpp0# set int l2 xconnect vxlan_tunnel0 HundredGigabitEthernet10/0/0
vpp0# set int l2 xconnect HundredGigabitEthernet10/0/0 vxlan_tunnel0
vpp0# set int state vxlan_tunnel0 up
vpp0# set int state HundredGigabitEthernet10/0/0 up

vpp1# create vxlan tunnel src 192.0.2.254 dst 192.0.2.1 vni 8298 instance 0
vpp1# set int l2 xconnect vxlan_tunnel0 HundredGigabitEthernet10/0/1
vpp1# set int l2 xconnect HundredGigabitEthernet10/0/1 vxlan_tunnel0
vpp1# set int state vxlan_tunnel0 up
vpp1# set int state HundredGigabitEthernet10/0/1 up
```

And with that, `vpp0:Hu10/0/0` is cross connected with `vpp1:Hu10/0/1` and ethernet flows between
the two.

### VPP: Bridge Domains

Now consider a VPLS with five different routers. It's possible to create a bridge-domain and add
some local ports and four other VxLAN tunnels:

```
vpp0# create bridge-domain 8298
vpp0# set int l2 bridge HundredGigabitEthernet10/0/1 8298
vpp0# create vxlan tunnel src 192.0.2.1 dst 192.0.2.2 vni 8298 instance 0
vpp0# create vxlan tunnel src 192.0.2.1 dst 192.0.2.3 vni 8298 instance 1
vpp0# create vxlan tunnel src 192.0.2.1 dst 192.0.2.4 vni 8298 instance 2
vpp0# create vxlan tunnel src 192.0.2.1 dst 192.0.2.5 vni 8298 instance 3
vpp0# set int l2 bridge vxlan_tunnel0 8298
vpp0# set int l2 bridge vxlan_tunnel1 8298
vpp0# set int l2 bridge vxlan_tunnel2 8298
vpp0# set int l2 bridge vxlan_tunnel3 8298
```

I have to replicate this configuration to all other `vpp1`-`vpp4` routers. While it does work, it's
really not very practical. When other VPP instances get added to a VPLS, every other router will
have to have a new VxLAN tunnel created and added to its local bridge domain. With 1000s of VPLS
instances on 100s of routers, this would yield ~100'000 VxLAN tunnels on every router, yikes!

Such a configuration reminds me in a way of iBGP in a large network: the naive approach is to have a
full mesh of all routers speaking to all other routers, but that quickly becomes a maintenance
headache. The canonical solution for this is to create iBGP _Route Reflectors_ to which every router
connects, and their job is to redistribute routing information between the fleet of routers. This
turns the iBGP problem from an O(N^2) to an O(N) problem: all 1'000 routers connect to, say, three
regional route reflectors for a total of 3'000 BGP connections, which is much better than the
~500'000 BGP sessions (or ~1'000'000 session endpoints to configure) in the naive approach.

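To put numbers on the scaling argument, here is a minimal back-of-the-envelope sketch in Python. The function names are mine, and it assumes the worst case in which every VPLS instance spans every router. Note that a full mesh counted once per session is N·(N-1)/2; counting both endpoints of every session doubles that.

```python
def full_mesh_tunnels_per_router(vpls_instances: int, routers: int) -> int:
    """Static VxLAN tunnels one router needs if every VPLS spans all routers."""
    return vpls_instances * (routers - 1)


def ibgp_full_mesh_sessions(routers: int) -> int:
    """Distinct iBGP sessions in a full mesh, counting each session once."""
    return routers * (routers - 1) // 2


def ibgp_route_reflector_sessions(routers: int, reflectors: int) -> int:
    """Sessions when every router peers only with each route reflector."""
    return routers * reflectors


print(full_mesh_tunnels_per_router(1000, 100))   # 99000
print(ibgp_full_mesh_sessions(1000))             # 499500
print(ibgp_route_reflector_sessions(1000, 3))    # 3000
```

The tunnel count grows with both the number of VPLS instances and the number of routers, which is exactly the part a dynamic, BGP-driven approach would take off my hands.
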
## Recap: eVPN Moving parts

The reason why I got so enthusiastic when I was playing with Arista and Nokia's eVPN stuff, is
because it requires very little dataplane configuration, and a relatively intuitive controlplane
configuration:

1.  **Dataplane**: For each L2 broadcast domain (be it a L2XC or a Bridge Domain), really all I
    need is a single VxLAN interface with a given VNI, which should be able to send encapsulated
    ethernet frames to one or more other speakers in the same domain.
1.  **Controlplane**: I will need to learn MAC addresses locally, and inform some BGP eVPN
    implementation of who-lives-where. Other VxLAN speakers learn of the MAC addresses I own, and
    will send me encapsulated ethernet for those addresses.
1.  **Dataplane**: For unknown layer2 destinations, like _Broadcast_, _Unknown Unicast_, and
    _Multicast_ (BUM) traffic, I will want to keep track of which other VxLAN speakers these
    packets should be flooded to. I make note that this is not that different to flooding the
    packets to local interfaces, except here it'd be flooding them to remote VxLAN endpoints.
1.  **Controlplane**: Flooding L2 traffic across wide area networks is typically considered icky,
    so a few tricks might be optionally deployed. Since the controlplane already knows which MAC
    lives where, it may as well also make note of any local IPv4 ARP and IPv6 neighbor discovery
    replies and teach its peers which IPv4/IPv6 addresses live where: a distributed neighbor table.

{{< image width="6em" float="left" src="/assets/shared/brain.png" alt="brain" >}}

For the controlplane parts, [[FRRouting](https://frrouting.org/)] has a working implementation for
L2 (MAC-VRF) and L3 (IP-VRF). My favorite, [[Bird](https://bird.network.cz/)], is slowly catching
up, and has a few of these control plane parts set up (mostly MAC-VRF). Commercial vendors like
Arista, Nokia, Juniper, Cisco are ready to go. If we want VPP to inter-operate, we may need to make
a few changes.

## VPP: Changes needed

### Dynamic VxLAN

I propose two changes to the VxLAN plugin, or perhaps, a new plugin that changes the behavior so that
we don't have to break any performance or functional promises to existing users. This new VxLAN
interface behavior changes in the following ways:

1.  Each VxLAN interface has a local L2FIB attached to it; the keys are MAC addresses and the
    values are remote VTEPs. In its simplest form, the values would be just IPv4 or IPv6 addresses,
    because I can re-use the VNI and port information from the tunnel definition itself.

1.  Each VxLAN interface has a local flood-list attached to it. This list contains remote VTEPs
    that I am supposed to send 'flood' packets to. Similar to the Bridge Domain, when packets are
    marked for flooding, I will need to prepare and replicate them, sending them to each VTEP.


1.  A set of APIs will be needed to manipulate these:
    *   ***Interface***: I will need to have an interface create, delete and list call, which will
        be able to maintain the interfaces, their metadata like source address, source/destination
        port, VNI and such.
    *   ***L2FIB***: I will need to add, replace, delete, and list which MAC addresses go where.
        With such a table, each time a packet is handled for a given Dynamic VxLAN interface, the
        dst_addr can be written into the packet.
    *   ***Flooding***: For those packets that are not unicast (BUM), I will need to be able to add,
        remove and list which VTEPs should receive this packet.

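To make the two tables a bit more tangible, here is a minimal Python model of what I have in mind. It is a sketch only: the class and method names are hypothetical and this is not VPP code. It assumes the L2FIB maps a MAC address to a single VTEP address, and that VTEPs must share the address family of the interface's source address.

```python
import ipaddress
from dataclasses import dataclass, field


@dataclass
class DynamicVxlanInterface:
    """Hypothetical model of one dynamic VxLAN interface (not VPP code)."""
    src: str
    vni: int
    dst_port: int = 4789
    l2fib: dict = field(default_factory=dict)   # MAC -> remote VTEP address
    flood: list = field(default_factory=list)   # remote VTEPs for BUM traffic

    def _normalize(self, vtep: str) -> str:
        # The underlay address family must match the interface source address.
        addr = ipaddress.ip_address(vtep)
        if addr.version != ipaddress.ip_address(self.src).version:
            raise ValueError(f"{vtep} does not match address family of {self.src}")
        return str(addr)

    def l2fib_add(self, mac: str, vtep: str) -> None:
        self.l2fib[mac.lower()] = self._normalize(vtep)   # add or replace

    def l2fib_del(self, mac: str) -> None:
        self.l2fib.pop(mac.lower(), None)

    def flood_add(self, vtep: str) -> None:
        vtep = self._normalize(vtep)
        if vtep not in self.flood:
            self.flood.append(vtep)

    def flood_del(self, vtep: str) -> None:
        vtep = self._normalize(vtep)
        if vtep in self.flood:
            self.flood.remove(vtep)

    def destinations(self, dst_mac: str) -> list:
        """VTEPs to send a frame to: one on an L2FIB hit, else the flood list."""
        vtep = self.l2fib.get(dst_mac.lower())
        return [vtep] if vtep is not None else list(self.flood)


ifc = DynamicVxlanInterface(src="2001:db8::1", vni=8298)
ifc.l2fib_add("00:01:02:82:98:02", "2001:db8::2")
for vtep in ("2001:db8::2", "2001:db8::3", "2001:db8::4"):
    ifc.flood_add(vtep)
print(ifc.destinations("00:01:02:82:98:02"))   # ['2001:db8::2']
print(ifc.destinations("ff:ff:ff:ff:ff:ff"))   # all three flood VTEPs
```

Falling back to the flood list on an L2FIB miss is my assumption for unknown unicast; a real dataplane would make that decision per packet in the encap node.
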
It would be pretty dope if the configuration looked something like this:
```
vpp0# create evpn-vxlan src <v46address> dst-port <port> vni <vni> instance <id>
vpp0# evpn-vxlan l2fib interface <iface> mac <mac> dst <v46address> [del]
vpp0# evpn-vxlan flood interface <iface> dst <v46address> [del]
```

The VxLAN underlay transport can be either IPv4 or IPv6. Of course, the L2FIB and Flood
destinations must match the address family of the evpn-vxlan interface they are added to. A
practical example might be:

```
vpp0# create evpn-vxlan src 2001:db8::1 dst-port 4789 vni 8298 instance 0
vpp0# evpn-vxlan l2fib interface evpn-vxlan0 mac 00:01:02:82:98:02 dst 2001:db8::2
vpp0# evpn-vxlan l2fib interface evpn-vxlan0 mac 00:01:02:82:98:03 dst 2001:db8::3
vpp0# evpn-vxlan flood interface evpn-vxlan0 dst 2001:db8::2
vpp0# evpn-vxlan flood interface evpn-vxlan0 dst 2001:db8::3
vpp0# evpn-vxlan flood interface evpn-vxlan0 dst 2001:db8::4
```

By the way, while this _could_ be a new plugin, it could also just be added to the existing VxLAN
plugin. One way in which we might do this is to allow a normal vxlan tunnel to be created with a
destination address of either 0.0.0.0 for IPv4 or :: for IPv6. That would signal 'dynamic'
tunneling, upon which the L2FIB and Flood lists are used. It would slow down each VxLAN packet by
the time it takes to call `ip46_address_is_zero()`, which is only a handful of clocks.

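The 'zero destination means dynamic' idea can be sketched in a few lines of Python. This is only an illustration of the decision, with hypothetical function names; the real check would be `ip46_address_is_zero()` in the encap path.

```python
import ipaddress


def is_dynamic(tunnel_dst: str) -> bool:
    """A tunnel whose destination is 0.0.0.0 or :: would be treated as dynamic."""
    return ipaddress.ip_address(tunnel_dst).is_unspecified


def outer_destinations(tunnel_dst: str, dst_mac: str, l2fib: dict, flood_list: list) -> list:
    """Pick the outer IP destination(s) for one inner ethernet frame."""
    if not is_dynamic(tunnel_dst):
        return [tunnel_dst]                # classic point-to-point behavior
    vtep = l2fib.get(dst_mac)
    return [vtep] if vtep else list(flood_list)


l2fib = {"00:01:02:82:98:02": "2001:db8::2"}
flood = ["2001:db8::2", "2001:db8::3"]
print(outer_destinations("2001:db8::ff", "00:01:02:82:98:02", l2fib, flood))
print(outer_destinations("::", "00:01:02:82:98:02", l2fib, flood))
print(outer_destinations("::", "00:01:02:82:98:99", l2fib, flood))
```

Existing tunnels with a real destination take the first branch and never touch the new tables, which is the whole point of doing it this way.
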
### Bridge Domain

{{< image width="6em" float="left" src="/assets/shared/warning.png" alt="Warning" >}}

It's important to understand that L2 learning is **required** for eVPN to work: each router needs to
be able to tell the iBGP eVPN session which MAC addresses should be forwarded to it. This rules out
the simple case of L2XC because there, no learning is performed. The corollary is that a
bridge-domain is required for any form of eVPN.

The L2 code in VPP already does most of what I'd need. It maintains an L2FIB in `vnet/l2/l2_fib.c`,
which is keyed by bridge-id and MAC address, and its values are a 64 bit structure that points
essentially to a sw_if_index output. The L2FIB of the eVPN needs a bit more information though,
notably an `ip46_address_t` to know which VTEP to send to. It's tempting to add this extra data
to the bridge domain code. I would recommend against it, because other implementations, for example
MPLS, GENEVE or Carrier Pigeon IP may need more than just the destination address. Even the VxLAN
implementation I'm thinking about might want to be able to override other things like the
destination port for a given VTEP, or even the VNI. Putting all of this stuff in the bridge-domain
code will just clutter it, for all users, not just those users who might want eVPN.

Similarly, one might argue it is tempting to re-use/extend the behavior in `vnet/l2/l2_flood.c`,
because if it's already replicating BUM traffic, why not replicate it many times over the flood list
for any member interface that happens to be a dynamic VxLAN interface? This would be a bad idea
for a few reasons. Firstly, it is not guaranteed that the VxLAN plugin is loaded, and in
doing this, I would leak internal details of VxLAN into the bridge-domain code. Secondly, the
`l2_flood.c` code would potentially get messy if other types were added (like the MPLS and GENEVE
above).

A reasonable approach is to mark such BUM frames once in the existing L2 code and, when handing the
replicated packet into the VxLAN node, to see the `is_bum` marker and once again replicate -- in the
vxlan plugin -- these packets to the VTEPs in our local flood-list. Although a bit more work, this
approach only requires a tiny amount of work in the `l2_flood.c` code (the marking), and will keep
all the logic tucked away where it is relevant, derisking the VPP vnet codebase.

Fundamentally, I think the cleanest design is to keep the dynamic VxLAN interface fully
self-contained, and it would therefore maintain its own L2FIB and Flooding logic. The only thing I
would add to the L2 codebase is some form of BUM marker to allow for efficient flooding.

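The two-stage replication described above can be modelled in Python as follows. This is a sketch under assumptions: `Frame`, `bridge_flood` and `vxlan_output` are hypothetical names, `is_bum` stands in for whatever marker ends up in the buffer metadata, and dropping an unmarked frame on an L2FIB miss is my own choice for the example.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    dst_mac: str
    payload: bytes
    is_bum: bool = False   # hypothetical marker set by the bridge flood path


def bridge_flood(frame: Frame, member_ports: list, ingress_port: str) -> list:
    """Stage 1: bridge-domain flood, one marked copy per member port except ingress."""
    return [(port, Frame(frame.dst_mac, frame.payload, is_bum=True))
            for port in member_ports if port != ingress_port]


def vxlan_output(frame: Frame, l2fib: dict, flood_list: list) -> list:
    """Stage 2: in the VxLAN node, marked frames are replicated once per VTEP."""
    if frame.is_bum:
        return [(vtep, frame) for vtep in flood_list]
    vtep = l2fib.get(frame.dst_mac)
    return [(vtep, frame)] if vtep else []   # assumption: unmarked miss is dropped


broadcast = Frame("ff:ff:ff:ff:ff:ff", b"hello")
copies = bridge_flood(broadcast, ["tap1", "vxlan_tunnel0"], ingress_port="tap1")
for port, copy in copies:
    if port == "vxlan_tunnel0":
        for vtep, _ in vxlan_output(copy, {}, ["192.0.2.2", "192.0.2.3"]):
            print("encapsulate towards", vtep)
```

The bridge only ever makes one copy per member interface and knows nothing about VTEPs; the fan-out to remote endpoints lives entirely in the second function.
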
### Control Plane

There are a few things the control plane has to do. Some external agent, like FRR or Bird, will be
receiving a few types of eVPN messages. The ones I'm interested in are:

*   ***Type 2***: MAC/IP Advertisement Route
    - On the way in, these should be fed to the VxLAN L2FIB belonging to the bridge-domain.
    - On the way out, learned addresses should be advertised to peers.
    - Regarding IPv4/IPv6 addresses, that is the ARP / ND tables: we can talk about those later.
*   ***Type 3***: Inclusive Multicast Ethernet Tag Route
    - On the way in, these will populate the VxLAN Flood list belonging to the bridge-domain.
    - On the way out, each bridge-domain should advertise itself as IMET to peers.
*   ***Type 5***: IP Prefix Route
    - Similar to IP information in Type 2, we can talk about those later once L3VPN/eVPN is
      needed.

The 'on the way in' stuff can be easily done with the proposed APIs in the Dynamic VxLAN plugin.
Adding, removing, and listing L2FIB and Flood entries is easy as far as VPP is concerned. It's just
that the controlplane implementation needs to somehow _feed_ the API, so an external program may be
needed, or alternatively the Linux Control Plane netlink plugin might be used to consume this
information.

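As a rough idea of what such an external feeder could do for the 'on the way in' direction, here is a small Python sketch that turns received Type 2 and Type 3 routes into the CLI lines proposed earlier in this article. Everything here is hypothetical: the event tuples are made up for illustration, and the `evpn-vxlan` commands are the proposed syntax, not something VPP accepts today.

```python
def type2_to_cli(iface: str, mac: str, vtep: str, withdraw: bool = False) -> str:
    """EVPN Type 2 (MAC/IP Advertisement) -> proposed L2FIB command."""
    cmd = f"evpn-vxlan l2fib interface {iface} mac {mac} dst {vtep}"
    return cmd + " del" if withdraw else cmd


def type3_to_cli(iface: str, vtep: str, withdraw: bool = False) -> str:
    """EVPN Type 3 (Inclusive Multicast Ethernet Tag) -> proposed flood command."""
    cmd = f"evpn-vxlan flood interface {iface} dst {vtep}"
    return cmd + " del" if withdraw else cmd


def routes_to_cli(iface: str, events: list) -> list:
    """Translate (route_type, withdraw, fields) events; other types are skipped."""
    lines = []
    for route_type, withdraw, fields in events:
        if route_type == 2:
            lines.append(type2_to_cli(iface, fields["mac"], fields["vtep"], withdraw))
        elif route_type == 3:
            lines.append(type3_to_cli(iface, fields["vtep"], withdraw))
    return lines


events = [
    (3, False, {"vtep": "2001:db8::2"}),
    (2, False, {"mac": "00:01:02:82:98:02", "vtep": "2001:db8::2"}),
    (2, True, {"mac": "00:01:02:82:98:02", "vtep": "2001:db8::2"}),
    (5, False, {"prefix": "2001:db8:1::/64"}),   # Type 5 is ignored for now
]
for line in routes_to_cli("evpn-vxlan0", events):
    print(line)
```

A real agent would call the binary API rather than print CLI, but the mapping from route type to table stays the same.
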
The 'on the way out' stuff is a bit trickier. I will need to listen to creation of new broadcast
domains and associate them with the right IMET announcements, and for each MAC address learned, pick
it up and advertise it into eVPN. Later, once ARP and ND proxying is a thing, I'll have to
revisit the bridge-domain feature to do IPv4 ARP and IPv6 Neighbor Discovery, and replace it with
some code that populates the IPv4/IPv6 parts of the Type 2 messages on the way out, and similarly on
the way in, populates an L3 neighbor cache for the bridge domain, so ARP and ND replies can be
synthesized based on what we've learned in eVPN.

# Demonstration

### VPP: Current VxLAN

I'll build a small demo environment on Summer to show how the interaction of VxLAN and Bridge
Domain works today:

```
create tap host-if-name dummy0 host-mtu-size 9216 host-ip4-addr 192.0.2.1/24
set int state tap0 up
set int ip address tap0 192.0.2.1/24
set ip neighbor tap0 192.0.2.254 01:02:03:82:98:fe static
set ip neighbor tap0 192.0.2.2 01:02:03:82:98:02 static
set ip neighbor tap0 192.0.2.3 01:02:03:82:98:03 static

create vxlan tunnel src 192.0.2.1 dst 192.0.2.254 vni 8298
set int state vxlan_tunnel0 up

create tap host-if-name vpptap0 host-mtu-size 9216 hw-addr 02:fe:64:dc:1b:82
set int state tap1 up

create bridge-domain 8298
set int l2 bridge tap1 8298
set int l2 bridge vxlan_tunnel0 8298
```

I've created a tap device called `dummy0` and given it an IPv4 address. Normally, I would use some
DPDK or RDMA interface like `TenGigabitEthernet10/0/0`. Then I'll populate some static ARP entries.
Again, normally this would just be 'use normal routing'. However, for the purposes of this
demonstration, it helps to use a TAP device, as any packets I make VPP send to 192.0.2.254 and
so on can be captured with `tcpdump` in Linux in addition to `trace add` in VPP.

Then, I create a VxLAN tunnel with a default destination of 192.0.2.254 and the given VNI.
Next, I create a TAP interface called `vpptap0` with the given MAC address.
Finally, I bind these two interfaces together in a bridge-domain.

I proceed to write a small Scapy program:

```python
#!/usr/bin/env python3

from scapy.all import Ether, IP, UDP, Raw, sendp

# Parentheses let the layer-stacking expression span multiple lines.
pkt = (Ether(dst="01:02:03:04:05:02", src="02:fe:64:dc:1b:82", type=0x0800)
       / IP(src="192.168.1.1", dst="192.168.1.2")
       / UDP(sport=8298, dport=7) / Raw(load=b"ping"))
print(pkt)
sendp(pkt, iface="vpptap0")

pkt = (Ether(dst="01:02:03:04:05:03", src="02:fe:64:dc:1b:82", type=0x0800)
       / IP(src="192.168.1.1", dst="192.168.1.3")
       / UDP(sport=8298, dport=7) / Raw(load=b"ping"))
print(pkt)
sendp(pkt, iface="vpptap0")
```

What will happen is, the Scapy program will emit these frames into device `vpptap0` which is in
bridge-domain 8298. The bridge will learn our src MAC `02:fe:64:dc:1b:82`, and look up the dst MAC
`01:02:03:04:05:02`, and because there hasn't been traffic yet, it'll flood to all member ports, one
of which is the VxLAN tunnel. VxLAN will then encapsulate the packets to the other side of the
tunnel.

```
pim@summer:~$ sudo ./vxlan-test.py
Ether / IP / UDP 192.168.1.1:8298 > 192.168.1.2:echo / Raw
Ether / IP / UDP 192.168.1.1:8298 > 192.168.1.3:echo / Raw

pim@summer:~$ sudo tcpdump -evni dummy0
10:50:35.310620 02:fe:72:52:38:53 > 01:02:03:82:98:fe, ethertype IPv4 (0x0800), length 96:
(tos 0x0, ttl 253, id 0, offset 0, flags [none], proto UDP (17), length 82)
192.0.2.1.6345 > 192.0.2.254.4789: VXLAN, flags [I] (0x08), vni 8298
02:fe:64:dc:1b:82 > 01:02:03:04:05:02, ethertype IPv4 (0x0800), length 46:
(tos 0x0, ttl 64, id 1, offset 0, flags [none], proto UDP (17), length 32)
192.168.1.1.8298 > 192.168.1.2.7: UDP, length 4
10:50:35.362552 02:fe:72:52:38:53 > 01:02:03:82:98:fe, ethertype IPv4 (0x0800), length 96:
(tos 0x0, ttl 253, id 0, offset 0, flags [none], proto UDP (17), length 82)
192.0.2.1.23916 > 192.0.2.254.4789: VXLAN, flags [I] (0x08), vni 8298
02:fe:64:dc:1b:82 > 01:02:03:04:05:03, ethertype IPv4 (0x0800), length 46:
(tos 0x0, ttl 64, id 1, offset 0, flags [none], proto UDP (17), length 32)
192.168.1.1.8298 > 192.168.1.3.7: UDP, length 4
```

I want to point out that nothing, so far, has anything to do with my gerrit change, all of this
works with upstream VPP just fine. I can see two VxLAN encapsulated packets, both destined to
`192.0.2.254:4789`. Cool.

### Dynamic VPP VxLAN

I wrote a prototype for a Dynamic VxLAN tunnel in [[43433](https://gerrit.fd.io/r/c/vpp/+/43433)].
The good news is, this works. The bad news is, I think I'll want to discuss my proposal (this
article) with the community before going further down a potential rabbit hole.

With my gerrit patched in, I can do the following:

```
vpp# vxlan l2fib vxlan_tunnel0 mac 01:02:03:04:05:02 dst 192.0.2.2
Added VXLAN dynamic destination for 01:02:03:04:05:02 on vxlan_tunnel0 dst 192.0.2.2
vpp# vxlan l2fib vxlan_tunnel0 mac 01:02:03:04:05:03 dst 192.0.2.3
Added VXLAN dynamic destination for 01:02:03:04:05:03 on vxlan_tunnel0 dst 192.0.2.3

vpp# show vxlan l2fib
VXLAN Dynamic L2FIB entries:
MAC                Interface      Destination  Port  VNI
01:02:03:04:05:02  vxlan_tunnel0  192.0.2.2    4789  8298
01:02:03:04:05:03  vxlan_tunnel0  192.0.2.3    4789  8298
Dynamic L2FIB entries: 2
```

I've instructed the VxLAN tunnel to change the tunnel destination based on the destination MAC.

I run the script and tcpdump again:

```
pim@summer:~$ sudo tcpdump -evni dummy0
11:16:53.834619 02:fe:fe:ae:0d:a3 > 01:02:03:82:98:fe, ethertype IPv4 (0x0800), length 96:
(tos 0x0, ttl 253, id 0, offset 0, flags [none], proto UDP (17), length 82, bad cksum 3945 (->3997)!)
192.0.2.1.6345 > 192.0.2.2.4789: VXLAN, flags [I] (0x08), vni 8298
02:fe:64:dc:1b:82 > 01:02:03:04:05:02, ethertype IPv4 (0x0800), length 46:
(tos 0x0, ttl 64, id 1, offset 0, flags [none], proto UDP (17), length 32)
192.168.1.1.8298 > 192.168.1.2.7: UDP, length 4
11:16:53.882554 02:fe:fe:ae:0d:a3 > 01:02:03:82:98:fe, ethertype IPv4 (0x0800), length 96:
(tos 0x0, ttl 253, id 0, offset 0, flags [none], proto UDP (17), length 82, bad cksum 3944 (->3996)!)
192.0.2.1.23916 > 192.0.2.3.4789: VXLAN, flags [I] (0x08), vni 8298
02:fe:64:dc:1b:82 > 01:02:03:04:05:03, ethertype IPv4 (0x0800), length 46:
(tos 0x0, ttl 64, id 1, offset 0, flags [none], proto UDP (17), length 32)
192.168.1.1.8298 > 192.168.1.3.7: UDP, length 4
```

Two important notes: Firstly, this works! For the MAC address ending in `:02`, the packet is sent to
`192.0.2.2` instead of the default of `192.0.2.254`. Same for the `:03` MAC which now goes to
`192.0.2.3`. Nice! But secondly, the IPv4 header of the VxLAN packets was changed, and tcpdump
reports a bad checksum, so there needs to be a call to `ip4_header_checksum()` inserted somewhere.
That's an easy fix.

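As a sanity check on what the checksum should be, the outer IPv4 header can be rebuilt from the fields tcpdump prints and run through the standard ones' complement sum. This is a small standalone Python sketch, not the VPP fix itself; the helper names are mine and the header assumes no IP options (IHL of 5).

```python
import ipaddress
import struct


def ipv4_header_checksum(header: bytes) -> int:
    """Ones' complement of the 16-bit ones' complement sum (checksum field zeroed)."""
    total = sum(struct.unpack(f"!{len(header) // 2}H", header))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def outer_header(dst: str) -> bytes:
    """Outer IPv4 header as seen in the tcpdump above, with the checksum set to 0."""
    return struct.pack(
        "!BBHHHBBH4s4s",
        0x45, 0x00, 82,      # version/IHL, TOS, total length
        0, 0,                # id, flags/fragment offset
        253, 17, 0,          # TTL, protocol UDP, checksum placeholder
        ipaddress.IPv4Address("192.0.2.1").packed,
        ipaddress.IPv4Address(dst).packed,
    )


for dst in ("192.0.2.2", "192.0.2.3"):
    print(dst, hex(ipv4_header_checksum(outer_header(dst))))
# 192.0.2.2 0x3997
# 192.0.2.3 0x3996
```

These are the same values tcpdump suggests after the `->` in its `bad cksum` complaint, so recomputing the header checksum after rewriting the destination address should be all that is missing.
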
# What's next

I want to discuss a few things, perhaps at an upcoming VPP Community meeting. Notably:
1.  Is the VPP Developer community supportive of adding eVPN support? Does anybody want to help
    write it with me?
1.  Is changing the existing VxLAN plugin appropriate, or should I make a new plugin which adds
    dynamic endpoints, L2FIB and Flood lists for BUM traffic?
1.  Is it acceptable for me to add a BUM marker in `l2_flood.c` so that I can reuse all the logic
    from bridge-domain flooding as I extend to also do VTEP flooding?
1.  (perhaps later) VxLAN is the canonical underlay, but is there an appetite to extend also to,
    say, GENEVE or MPLS?
1.  (perhaps later) What's a good way to tie in a controlplane like FRRouting or Bird2 into the
    dataplane (perhaps using a sidecar controller, or perhaps using Linux CP Netlink messages)?