This is super complicated work, taken mostly verbatim from the upstream linux-cp Gerrit, with due credit to mgsmith@netgate.com and neale@graphiant.com.

First, add the main handlers lcp_nl_route_add() and lcp_nl_route_del(), and introduce two FIB sources: one for manual routes, one for dynamic routes. See lcp_nl_proto_fib_source() for details.

Add a bunch of helpers that translate Netlink message data into VPP primitives:
- lcp_nl_mk_addr46() converts a Netlink nl_addr to a VPP ip46_address_t.
- lcp_nl_mk_route_prefix() converts a Netlink rtnl_route to a VPP fib_prefix_t.
- lcp_nl_mk_route_mprefix() converts a Netlink rtnl_route to a VPP mfib_prefix_t.
- lcp_nl_proto_fib_source() selects the most appropriate fib_src by looking at the rt_proto (see /etc/iproute2/rt_protos for a hint). Anything RTPROT_STATIC or better is 'fib_src', while anything above that becomes 'fib_src_dynamic' (see the sketch after these lists).
- lcp_nl_mk_route_entry_flags() generates fib_entry_flag_t from the Netlink route type, table and proto metadata.
- lcp_nl_route_path_parse() converts a Netlink rtnl_nexthop to a VPP fib_route_path_t and adds it to a growing list of paths.
- lcp_nl_route_path_add_special() adds a blackhole/unreach/prohibit route to the list of paths, for the special case where there is not yet a path for the destination.

Now we're ready to insert FIB entries:
- lcp_nl_table_find() selects the matching table-id,protocol (v4/v6) from a hash of tables.
- lcp_nl_table_add_or_lock() creates a table if that table-id,protocol (v4/v6) hasn't been used yet; otherwise it increments a reference counter so we know how many FIB entries we have in this table. Then it returns the table.
- lcp_nl_table_unlock() decreases the refcount on a table and, if no more prefixes are in the table, removes it from VPP.
- lcp_nl_route_del() removes a route from the given table-id/protocol. It does this by applying rtnl_route_foreach_nexthop() to the list of Netlink nexthops, converting them into VPP paths in a lcp_nl_route_path_parse_t structure. If the route is for unreachable/blackhole/prohibit in Linux, it adds that path too. Then it removes the VPP paths from the FIB and reduces the refcount, or removes the table if it's empty, using lcp_nl_table_unlock().
- lcp_nl_route_add() skips routes that are not relevant for VPP: those in table 255 are 'local' routes already set up by ip[46]_address_add(), some other route types are invalid, and link-local IPv6 and IPv6 multicast are also skipped. It then constructs a lcp_nl_route_path_parse_t by walking the Netlink nexthops, optionally adding a special path (in case the route was for unreachable/blackhole/prohibit in Linux -- those won't have a nexthop). Finally, it inserts the VPP paths found in the Netlink message into the FIB or the multicast FIB, respectively.
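To make the shape of these helpers concrete, here is a minimal sketch of the address conversion and the FIB-source selection, assuming libnl-3 and two fib_source_t values registered elsewhere by the plugin; the actual plugin code may differ in detail:

#include <sys/socket.h>
#include <string.h>
#include <netlink/addr.h>
#include <linux/rtnetlink.h>
#include <vnet/fib/fib_table.h>

/* Registered at plugin init (not shown): one FIB source for static/manual
 * routes, one for dynamically learned routes. */
static fib_source_t fib_src;
static fib_source_t fib_src_dynamic;

/* Copy a libnl address into a VPP ip46_address_t. */
static void
lcp_nl_mk_addr46 (struct nl_addr *rna, ip46_address_t *ia)
{
  ip46_address_reset (ia);
  if (nl_addr_get_family (rna) == AF_INET)
    memcpy (&ia->ip4, nl_addr_get_binary_addr (rna), nl_addr_get_len (rna));
  else
    memcpy (&ia->ip6, nl_addr_get_binary_addr (rna), nl_addr_get_len (rna));
}

/* RTPROT_STATIC or better (kernel, boot, static) counts as manual;
 * everything above it (bird, bgp, ospf, ...) counts as dynamic. */
static fib_source_t
lcp_nl_proto_fib_source (u8 rt_proto)
{
  return (rt_proto <= RTPROT_STATIC) ? fib_src : fib_src_dynamic;
}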
And with that, Bird shoots to life. Both IPv4 and IPv6 OSPF interior gateway protocol and BGP full tables can be consumed, on my bench in about 9 seconds:
- A batch of 2048 Netlink messages is handled in 9-11ms, so we can do approximately 200K messages/sec at peak. This consumes about 50% CPU due to the yielding logic in lcp_nl_process(): the 'case NL_EVENT_READ' block adds a cooldown period of LCP_NL_PROCESS_WAIT milliseconds between batches (a rough sketch of this loop appears at the end of this section).
- With 3 route reflectors and 2 full BGP peers, at peak I could see 309K messages left in the producer queue.
- All IPv4 and IPv6 prefixes made their way into the FIB:

pim@hippo:~/src/lcpng$ echo -n "IPv6: "; vppctl sh ip6 fib summary | awk '$1~/[0-9]+/ { total += $2 } END { print total }'
IPv6: 132506
pim@hippo:~/src/lcpng$ echo -n "IPv4: "; vppctl sh ip fib summary | awk '$1~/[0-9]+/ { total += $2 } END { print total }'
IPv4: 869966

- Compared to Bird2's view:

pim@hippo:~/src/lcpng$ birdc show route count
BIRD 2.0.7 ready.
3477845 of 3477845 routes for 869942 networks in table master4
527887 of 527887 routes for 132484 networks in table master6
Total: 4005732 of 4005732 routes for 1002426 networks in 2 tables

- Flipping one of the full feeds to another, forcing a reconvergence of every prefix in the FIB, took about 8 seconds, peaking at 242K messages in the queue, again with an average consumption of 2048 messages per 9-10ms.
- All of this was done while iperf'ing 6Gbps to and from the controlplane.
---
Because handling a full BGP table is O(1M) messages, I will have to make some changes to the logging:
- all neigh/route messages become DBG/INFO at best;
- all addr/link messages become INFO/NOTICE at best;
- when we overflow time/msgs, turn process_msgs into a WARN, otherwise keep it at INFO so as not to spam.

In lcpng_interface.c:
- Log NOTICE for pair_add() and pair_del() calls;
- Log NOTICE for set_interface_addr() calls.

With this approach, setting the logging level of the linux-cp/nl plugin to 'notice' hits the sweet spot: the things the operator has ~explicitly done show up, while implicit actions (BGP route adds/dels, ARP/ND) stay below the NOTICE level.
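To illustrate the batching and yielding described above, here is a rough sketch of the lcp_nl_process() event loop as a standard VPP process node; the LCP_NL_PROCESS_WAIT value, the NL_EVENT_READ constant's value and the lcp_nl_process_msgs() stub are placeholders, not the plugin's actual definitions:

#include <vlib/vlib.h>

#define LCP_NL_PROCESS_WAIT 10.0 /* cooldown in ms between batches; value here is illustrative */

enum
{
  NL_EVENT_READ = 1, /* signalled by the netlink socket input handler */
};

static void
lcp_nl_process_msgs (void)
{
  /* Placeholder: drain up to one batch (e.g. 2048 messages) from the
   * producer queue and dispatch each message to its handler. */
}

static uword
lcp_nl_process (vlib_main_t *vm, vlib_node_runtime_t *rt, vlib_frame_t *f)
{
  uword event_type, *event_data = 0;

  while (1)
    {
      vlib_process_wait_for_event (vm);
      event_type = vlib_process_get_events (vm, &event_data);

      switch (event_type)
        {
        case NL_EVENT_READ:
          /* Handle one batch of queued Netlink messages ... */
          lcp_nl_process_msgs ();
          /* ... then yield for LCP_NL_PROCESS_WAIT ms so the rest of VPP
           * keeps getting CPU time; this is why consumption tops out at
           * roughly 50% of one core. */
          vlib_process_suspend (vm, LCP_NL_PROCESS_WAIT * 1e-3);
          break;
        default:
          break;
        }
      vec_reset_length (event_data);
    }
  return 0;
}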
Introduction
This plugin is a temporary! copy of VPP's src/plugins/linux-cp/ plugin, originally by the following authors:
- Signed-off-by: Neale Ranns nranns@cisco.com
- Signed-off-by: Matthew Smith mgsmith@netgate.com
- Signed-off-by: Jon Loeliger jdl@netgate.com
- Signed-off-by: Pim van Pelt pim@ipng.nl
- Signed-off-by: Neale Ranns neale@graphiant.com
See previous work:
My work is intended to be re-submitted for review as a cleanup/rewrite of the existing Linux CP interface mirror and netlink syncer.
Follow along on my blog for my findings while I work towards a completed plugin that can copy VPP configuration into Linux interfaces, and copy Linux configuration changes into VPP (i.e. a fully bidirectional pipe between Linux and VPP).
When the code is complete, this plugin should be able to work seamlessly with a higher-level controlplane like FRR or Bird, for example as a BGP/OSPF speaking ISP router.
WARNING!!
The only reason that this code is here, is so that I can make some progress iterating on the Linux CP plugin, and share my findings with some interested folks. The goal is NOT to use this plugin anywhere other than a bench. I intend to contribute the plugin back upstream as soon as it's peer reviewed!
Pull Requests and Issues will be immediately closed without warning
VPP's code lives at fd.io, and this copy is shared only for convenience purposes.
Functionality
The following functionality is supported by the plugin. The VPP->Linux column shows changes in VPP that are copied into the Linux environment; the Linux->VPP column shows changes in Linux that are copied into VPP.
Function | VPP -> Linux | Linux -> VPP |
---|---|---|
Up/Down Link | ✅ | ✅ |
Change MTU | ✅ | ✅ |
Change MAC | ❌ 1) | ✅ |
Add/Del IP4/IP6 Address | ✅ | ✅ |
Route | ❌ 2) | ✅ |
Add/Del Tunnel | ❌ | ❌ |
Add/Del Phy | ✅ | 🟠 |
Add/Del .1q | ✅ | ✅ |
Add/Del .1ad | ✅ | ✅ |
Add/Del QinQ | ✅ | ✅ |
Add/Del QinAD | ✅ | ✅ |
Add/Del BondEthernet | ✅ | 🟠 |
Legend: ✅=supported; 🟠=maybe; ❌=infeasible.
1) There is no callback or macro to register an interest in MAC address changes in VPP.
2) There is no callback or macro to register an interest in FIB changes in VPP.
Building
First, ensure that you can build and run 'vanilla' VPP by using the instructions. Then check out this plugin out-of-tree and symlink it in.
mkdir ~/src
cd ~/src
git clone https://github.com/pimvanpelt/lcpng.git
ln -s ~/src/lcpng ~/src/vpp/src/plugins/lcpng
Running
Ensure this plugin is enabled and the original linux-cp plugin is disabled, that logging goes to stderr (in the debug variant of VPP), and that the features are dis/enabled, by providing the following startup.conf:
plugins {
path ~/src/vpp/build-root/install-vpp_debug-native/vpp/lib/vpp_plugins
plugin lcpng_if_plugin.so { enable }
plugin lcpng_nl_plugin.so { enable }
plugin linux_cp_plugin.so { disable }
}
logging {
default-log-level info
default-syslog-log-level crit
## Set per-class configuration
class linux-cp/if { rate-limit 10000 level debug syslog-level debug }
class linux-cp/nl { rate-limit 10000 level debug syslog-level debug }
}
lcpng {
default netns dataplane
lcp-sync
lcp-auto-subint
}
Then, simply make build and make run VPP, which will load the plugin:
pim@hippo:~/src/vpp$ make run
snort [debug ]: initialized
snort [debug ]: snort listener /run/vpp/snort.sock
linux-cp/if [debug ]: interface_add: [1] sw TenGigabitEthernet3/0/0 is_sub 0 lcp-auto-subint 1
linux-cp/if [debug ]: mtu_change: sw TenGigabitEthernet3/0/0 0
linux-cp/if [debug ]: interface_add: [2] sw TenGigabitEthernet3/0/1 is_sub 0 lcp-auto-subint 1
linux-cp/if [debug ]: mtu_change: sw TenGigabitEthernet3/0/1 0
linux-cp/if [debug ]: interface_add: [3] sw TenGigabitEthernet3/0/2 is_sub 0 lcp-auto-subint 1
linux-cp/if [debug ]: mtu_change: sw TenGigabitEthernet3/0/2 0
linux-cp/if [debug ]: interface_add: [4] sw TenGigabitEthernet3/0/3 is_sub 0 lcp-auto-subint 1
linux-cp/if [debug ]: mtu_change: sw TenGigabitEthernet3/0/3 0
linux-cp/if [debug ]: interface_add: [5] sw TwentyFiveGigabitEthernete/0/0 is_sub 0 lcp-auto-subint 1
linux-cp/if [debug ]: mtu_change: sw TwentyFiveGigabitEthernete/0/0 0
linux-cp/if [debug ]: interface_add: [6] sw TwentyFiveGigabitEthernete/0/1 is_sub 0 lcp-auto-subint 1
linux-cp/if [debug ]: mtu_change: sw TwentyFiveGigabitEthernete/0/1 0
_______ _ _ _____ ___
__/ __/ _ \ (_)__ | | / / _ \/ _ \
_/ _// // / / / _ \ | |/ / ___/ ___/
/_/ /____(_)_/\___/ |___/_/ /_/
DBGvpp#
Pinging BondEthernet
10.1.1.2 : xmt/rcv/%loss = 30000/29833/0%, min/avg/max = 0.11/0.50/10.6
10.1.2.2 : xmt/rcv/%loss = 30000/29856/0%, min/avg/max = 0.10/0.50/10.8
10.1.3.2 : xmt/rcv/%loss = 30000/29851/0%, min/avg/max = 0.10/0.51/10.7
10.1.4.2 : xmt/rcv/%loss = 30000/29848/0%, min/avg/max = 0.12/0.51/10.8
10.1.5.2 : xmt/rcv/%loss = 30000/29841/0%, min/avg/max = 0.11/0.51/11.7
10.0.1.2 : xmt/rcv/%loss = 30000/30000/0%, min/avg/max = 0.09/0.21/40.4
10.0.2.2 : xmt/rcv/%loss = 30000/30000/0%, min/avg/max = 0.10/0.21/30.4
10.0.3.2 : xmt/rcv/%loss = 30000/30000/0%, min/avg/max = 0.10/0.19/20.4
10.0.4.2 : xmt/rcv/%loss = 30000/30000/0%, min/avg/max = 0.10/0.18/10.3
10.0.5.2 : xmt/rcv/%loss = 30000/30000/0%, min/avg/max = 0.10/0.19/8.50
2001:db8:1:1::2 : xmt/rcv/%loss = 30000/29853/0%, min/avg/max = 0.12/0.52/10.7
2001:db8:1:2::2 : xmt/rcv/%loss = 30000/29870/0%, min/avg/max = 0.08/0.56/10.9
2001:db8:1:3::2 : xmt/rcv/%loss = 30000/29857/0%, min/avg/max = 0.11/0.52/11.1
2001:db8:1:4::2 : xmt/rcv/%loss = 30000/29866/0%, min/avg/max = 0.11/0.56/10.9
2001:db8:1:5::2 : xmt/rcv/%loss = 30000/29864/0%, min/avg/max = 0.10/0.57/11.1
2001:db8:0:1::2 : xmt/rcv/%loss = 30000/30000/0%, min/avg/max = 0.10/0.23/8.33
2001:db8:0:2::2 : xmt/rcv/%loss = 30000/30000/0%, min/avg/max = 0.10/0.21/8.27
2001:db8:0:3::2 : xmt/rcv/%loss = 30000/30000/0%, min/avg/max = 0.10/0.20/8.20
2001:db8:0:4::2 : xmt/rcv/%loss = 30000/29999/0%, min/avg/max = 0.11/0.19/8.49
2001:db8:0:5::2 : xmt/rcv/%loss = 30000/29999/0%, min/avg/max = 0.10/0.19/8.46