11 Commits

SHA1 Message Date
153628a3de Basic MPLS support.
1) Imports ENCAP_MPLS labels from IPv4/IPv6 routes.
Note that this requires libnl 3.6.0 or newer.

In previous versions of this patch, the fib_path_ext_t had a path ID
of -1. After a long investigation, this turned out to be caused by the
route weight being set to 0; a comment in the code explains the details.
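A minimal sketch of the fix, assuming the nexthop walk happens in
lcp_nl_route_path_parse() using libnl's rtnl_route_nh_get_weight();
this is illustrative, not the literal diff:

/* A Netlink nexthop weight of 0 (the default for single-path routes)
 * must not be passed into the FIB, or the fib_path_ext_t ends up with
 * a path ID of -1. Clamp it to 1. */
fib_route_path_t path = { 0 };
path.frp_weight = rtnl_route_nh_get_weight (nh);
if (0 == path.frp_weight)
  path.frp_weight = 1;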

2) Handles MPLS routes.
MPLS routes were wrongly added as IPv4 routes before.

POP and SWAP are now both supported.
All the routes are installed as NON-EOS and EOS routes,
as the Linux kernel does not differentiate.
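A hedged sketch of the dual install, using VPP's MPLS FIB types
(FOR_EACH_MPLS_EOS_BIT, fib_prefix_t with fp_label/fp_eos); the
label, fib_index, fib_src and paths variables are illustrative:

/* Linux keeps one MPLS route per label; VPP keys its MPLS FIB on the
 * EOS bit as well, so install the same paths under both EOS bits. */
mpls_eos_bit_t eos;
FOR_EACH_MPLS_EOS_BIT (eos)
{
  fib_prefix_t pfx = {
    .fp_len = 21, /* 20 label bits + EOS bit */
    .fp_proto = FIB_PROTOCOL_MPLS,
    .fp_label = label,
    .fp_eos = eos,
    .fp_payload_proto = DPO_PROTO_MPLS,
  };
  fib_table_entry_update (fib_index, &pfx, fib_src, FIB_ENTRY_FLAG_NONE,
                          paths);
}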

EOS POP, as used in penultimate-hop popping (PHP), uses the next-hop
address family to determine the resulting address family.

This patch is sufficient for P (provider core) setups. PE (provider
edge) setups with implicit null should also function okay, as long as a
separate label gets programmed per address family.

PE setups with explicit null will also forward packets,
but punting is a bit odd and needs MPLS input enabled on the LCP host
device.

Make sure to enable MPLS in VPP first.

3) Propagate MPLS input state to LCP Pair and Linux.
Since the Linux kernel uses the MPLS routes itself,
the LCP pair tap needs MPLS enabled to allow host-originated packets.

This also syncs the Linux `net.mpls.conf.<host_if>.input` sysctl to
allow punted packets to have MPLS labels, mostly explicit nulls.

For that to work, load the mpls kernel modules.
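What the sysctl sync boils down to is writing the per-interface MPLS
input knob under /proc. A self-contained sketch (the plugin's actual
plumbing goes through its own helpers):

#include <stdio.h>

/* Enable MPLS input on a Linux interface by writing the sysctl
 * net.mpls.conf.<ifname>.input. Fails if the mpls_router kernel
 * module is not loaded or the interface does not exist. */
static int
mpls_input_enable (const char *ifname)
{
  char path[256];
  FILE *f;

  snprintf (path, sizeof (path), "/proc/sys/net/mpls/conf/%s/input",
            ifname);
  if (!(f = fopen (path, "w")))
    return -1;
  fputs ("1", f);
  fclose (f);
  return 0;
}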

4) Cross connect MPLS packets from Linux directly to interface-output

This is a port of https://gerrit.fd.io/r/c/vpp/+/38702
2023-05-30 22:14:35 +02:00
529a11bb78 Fix ADD/REPLACE semantics for IPv4 and IPv6 routes: if NLM_F_REPLACE is set, call fib_table_entry_update(); otherwise call fib_table_entry_path_add2() 2023-05-21 09:11:05 +02:00
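A sketch of the branch in the commit above (hedged; the prefix, paths
and flags are assumed to have been parsed from the Netlink message
beforehand):

/* NLM_F_REPLACE on the nlmsghdr decides between replacing the whole
 * FIB entry and appending one more path to it. */
if (nlh->nlmsg_flags & NLM_F_REPLACE)
  fib_table_entry_update (fib_index, &pfx, fib_src, entry_flags, paths);
else
  fib_table_entry_path_add2 (fib_index, &pfx, fib_src, entry_flags, paths);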
815a6e0dce Run VPP's checkstyle to reformat the code 2023-01-11 16:21:40 +00:00
e53d4376ab cleanup: Clean logging, consistent capitalization, nouns, and macro names 2023-01-11 16:18:18 +00:00
e2ac348759 'ns' is now a vector, don't memcpy it, but vec_dup() instead 2021-09-08 21:38:45 +00:00
6d86d2b075 Allow larger fraction of CPU to be used by netlink
Instead of doing BATCH_DELAY_MS of work and BATCH_DELAY_MS of sleep, add
a BATCH_WORK_MS (40) and lower BATCH_DELAY_MS (10), so we'll spend up to
80% of the time consuming netlink messages and yield the remaining 20%.

Also raise the total batch size to 8K because on my test machine
we run 2K in 13ms or 8K in ~50ms.
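Sketched out (macro names approximate), the consumer side becomes:

#define NL_BATCH_SIZE     8192 /* up to 8K messages per batch        */
#define NL_BATCH_WORK_MS    40 /* consume netlink for up to 40 msec  */
#define NL_BATCH_DELAY_MS   10 /* then sleep 10 msec: an 80/20 split */

/* in the process node main loop: */
lcp_nl_process_msgs ();                              /* bounded by both ^^ */
vlib_process_suspend (vm, 1e-3 * NL_BATCH_DELAY_MS); /* yield to the graph */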
2021-08-29 17:47:33 +02:00
7a76498277 Add NEWROUTE/DELROUTE handler
This is super complicated work, taken mostly verbatim from the upstream
linux-cp Gerrit, with due credit to mgsmith@netgate.com and
neale@grafiant.com.

First, add the main handlers lcp_nl_route_add() and lcp_nl_route_del().

Introduce two FIB sources: one for manual routes, one for dynamic
routes.  See lcp_nl_proto_fib_source() for details.

Add a bunch of helpers that translate Netlink message data into VPP
primitives:
- lcp_nl_mk_addr46()
    converts a Netlink nl_addr to a VPP ip46_address_t.

- lcp_nl_mk_route_prefix()
    converts a Netlink rtnl_route to a VPP fib_prefix_t.

- lcp_nl_mk_route_mprefix()
    converts a Netlink rtnl_route to a VPP mfib_prefix_t.

- lcp_nl_proto_fib_source()
    selects the most appropriate fib_src by looking at the rt_proto
    (see /etc/iproute2/rt_protos for a hint). Anything RTPROT_STATIC or
    better is 'fib_src', while anything above that becomes
    fib_src_dynamic (see the sketch after this list).

- lcp_nl_mk_route_entry_flags()
    generates fib_entry_flag_t from the Netlink route type,
    table and proto metadata.

- lcp_nl_route_path_parse()
    converts a Netlink rtnl_nexthop to VPP fib_route_path_t and adds
    that to a growing list of paths.

- lcp_nl_route_path_add_special()
    adds a blackhole/unreach/prohibit route to the list of paths, in
    the special case where there is not yet a path for the destination.
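A sketch of the fib_src selector mentioned above; the lcp_nl_main
struct holding the two registered fib_source_t handles is an assumed
name:

#include <linux/rtnetlink.h> /* RTPROT_STATIC == 4 */

static fib_source_t
lcp_nl_proto_fib_source (uint8_t rt_proto)
{
  lcp_nl_main_t *nm = &lcp_nl_main; /* assumed plugin main struct */

  /* RTPROT_STATIC or lower means operator-installed (kernel, boot,
   * static); anything higher comes from a routing daemon, e.g. Bird. */
  return (rt_proto <= RTPROT_STATIC) ? nm->fib_src : nm->fib_src_dynamic;
}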

Now we're ready to insert FIB entries:
- lcp_nl_table_find()
    selects the matching table-id,protocol(v4/v6) from a hash of tables.

- lcp_nl_table_add_or_lock()
    if a table-id,protocol(v4/v6) pair hasn't been used yet, create one;
    otherwise increment a table reference counter so we know how many
    FIB entries we have in this table. Then, return it (a sketch of the
    refcounting follows after this list).

- lcp_nl_table_unlock()
    Decrease the refcount on a table, and if no more prefixes are in
    the table, remove it from VPP.

- lcp_nl_route_del()
    Remove a route from the given table-id/protocol. Do this by applying
    rtnl_route_foreach_nexthop() to the list of Netlink nexthops,
    converting them into VPP paths in a lcp_nl_route_path_parse_t
    structure. If the route is for unreachable/blackhole/prohibit in
    Linux, add that path too.
    Then, remove the VPP paths from the FIB, and reduce the refcnt or
    remove the table if it's empty, using lcp_nl_table_unlock().

- lcp_nl_route_add()
    Not all routes are relevant for VPP. Those in table 255 are 'local'
    routes, already set up by ip[46]_address_add(), and some other route
    types are invalid; skip those. Link-local IPv6 and IPv6 multicast are
    also skipped. Then, construct a lcp_nl_route_path_parse_t by walking
    the Netlink nexthops, and optionally add a special path (in case the
    route was for unreachable/blackhole/prohibit in Linux -- those won't
    have a nexthop).
    Then, insert the VPP paths found in the Netlink message into the FIB
    or the multicast FIB, respectively.
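A sketch of the table refcounting described above; lcp_nl_table_find(),
lcp_nl_table_create() and lcp_nl_table_delete() are hypothetical
stand-ins for the hash bookkeeping, and fib_src is the FIB source
registered by the plugin:

typedef struct lcp_nl_table_t_
{
  uint32_t nlt_id;          /* Linux table id */
  fib_protocol_t nlt_proto; /* v4/v6 */
  uint32_t nlt_fib_index;   /* VPP FIB index */
  uint32_t nlt_refs;        /* number of prefixes referencing this table */
} lcp_nl_table_t;

static uint32_t
lcp_nl_table_add_or_lock (uint32_t id, fib_protocol_t fproto)
{
  lcp_nl_table_t *t;

  if (!(t = lcp_nl_table_find (id, fproto))) /* hypothetical helper */
    {
      /* first prefix in this table: create the VPP FIB as well */
      t = lcp_nl_table_create (id, fproto); /* hypothetical helper */
      t->nlt_fib_index = fib_table_find_or_create_and_lock (fproto, id,
                                                            fib_src);
    }
  t->nlt_refs++;
  return t->nlt_fib_index;
}

static void
lcp_nl_table_unlock (lcp_nl_table_t *t)
{
  if (--t->nlt_refs == 0)
    {
      /* last prefix is gone: drop the VPP FIB and forget the table */
      fib_table_unlock (t->nlt_fib_index, t->nlt_proto, fib_src);
      lcp_nl_table_delete (t); /* hypothetical helper */
    }
}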

And with that, Bird springs to life. Both IPv4 and IPv6 OSPF interior
gateway protocol and BGP full tables can be consumed, on my bench in
about 9 seconds:
- A batch of 2048 Netlink messages is handled in 9-11ms, so we can do
  approx 200K messages/sec at peak. This will consume 50% CPU, due to
  the yielding logic in lcp_nl_process(): see the 'case NL_EVENT_READ'
  block, which adds a cooldown period of LCP_NL_PROCESS_WAIT
  milliseconds between batches.
- With 3 route reflectors and 2 full BGP peers, at peak I could see
  309K messages left in the producer queue.

- All IPv4 and IPv6 prefixes made their way into the FIB
pim@hippo:~/src/lcpng$ echo -n "IPv6: "; vppctl sh ip6 fib summary | awk '$1~/[0-9]+/ { total += $2 } END { print total }'
IPv6: 132506
pim@hippo:~/src/lcpng$ echo -n "IPv4: "; vppctl sh ip fib summary | awk '$1~/[0-9]+/ { total += $2 } END { print total }'
IPv4: 869966

- Compared to Bird2's view:
pim@hippo:~/src/lcpng$ birdc show route count
BIRD 2.0.7 ready.
3477845 of 3477845 routes for 869942 networks in table master4
527887 of 527887 routes for 132484 networks in table master6
Total: 4005732 of 4005732 routes for 1002426 networks in 2 tables

- Flipping one of the full feeds to another, forcing a reconvergence
  of every prefix in the FIB took about 8 seconds, peaking at 242K
  messages in the queue, with again an average consumption of 2048
  messages per 9-10ms.

- All of this was done while iperf'ing 6Gbps to and from the
  controlplane.
---

Because handling a full BGP table is O(1M) messages, I will have to make
some changes in the logging:
- all neigh/route messages become DBG/INFO at best
- all addr/link messages become INFO/NOTICE at best
- when we overflow the time or message budget, log process_msgs as a
  WARN, otherwise keep it at INFO so as not to spam.

In lcpng_interface.c:
- Log NOTICE for pair_add() and pair_del() calls;
- Log NOTICE for set_interface_addr() calls.

With this approach, setting the logging level of the linux-cp/nl plugin
to 'notice' hits the sweet spot: things that the operator has (more or
less) explicitly done are visible, while implicit actions (BGP route
adds/dels, ARP/ND) stay below the NOTICE level.
2021-08-29 14:57:21 +02:00
a3a5f68926 Add newlink/delink processing.
- Can up/down a link.
- Can set MAC on a link, if it's a phy.
- Can set MTU on a link.
- Can delete link (including phy).

Because link state and MTU changes tend to go around in circles (from
netlink -> vpp; and then, with lcp-sync on, from vpp -> netlink as well),
we'll temporarily turn off lcp-sync, if it's enabled, while we consume a
batch of netlink messages (see the sketch below).
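In outline, with hypothetical accessor names for the lcp-sync flag:

/* Pause lcp-sync while consuming a batch, so state we just learned
 * from netlink is not immediately echoed back from VPP into netlink. */
int sync_was_on = lcp_sync ();  /* accessor names approximate */
lcp_set_sync (0 /* disable */);
lcp_nl_process_msgs ();
lcp_set_sync (sync_was_on);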

TODO (in the next commit), the whole nine yards of creating interfaces
in VPP based on NEWLINK VLANs that come in. Conceptually not too
difficult: if NEWLINK doesn't have a LIP associated with it, but it's a
VLAN, and the parent of the VLAN is a link which _does_ have a LIP, then
we can create the subint in VPP in the correct way.
2021-08-24 18:18:23 +02:00
87742b4f54 Add netlink address add/del
Straightforward addition/removal of IPv4 and IPv6 addresses on
interfaces. One thing I noticed (not a concern, but an unfortunate
issue) shows up in the following sequence:

ip addr add 10.0.1.1/30 dev e0
debug      linux-cp/nl    addr_add: netlink route/addr: add idx 1488 family inet local 10.0.1.1/30 flags 0x0080 (permanent)
warn       linux-cp/nl    dispatch: ignored route/route: add family inet type 2 proto 2 table 255 dst 10.0.1.1 nexthops { idx 1488 }
warn       linux-cp/nl    dispatch: ignored route/route: add family inet type 1 proto 2 table 254 dst 10.0.1.0/30 nexthops { idx 1488 }
warn       linux-cp/nl    dispatch: ignored route/route: add family inet type 3 proto 2 table 255 dst 10.0.1.0 nexthops { idx 1488 }
warn       linux-cp/nl    dispatch: ignored route/route: add family inet type 3 proto 2 table 255 dst 10.0.1.3 nexthops { idx 1488 }

ping 10.0.1.2
debug      linux-cp/nl    neigh_add: netlink route/neigh: add idx 1488 family inet lladdr 68:05:ca:32:45:94 dst 10.0.1.2 state 0x0002 (reachable) flags 0x0000
notice     linux-cp/nl    neigh_add: Added 10.0.1.2 lladdr 68:05:ca:32:45:94 iface TenGigabitEthernet3/0/0

ip addr del 10.0.1.1/30 dev e0
debug      linux-cp/nl    addr_del: netlink route/addr: del idx 1488 family inet local 10.0.1.1/30 flags 0x0080 (permanent)
notice     linux-cp/nl    addr_del: Deleted 10.0.1.1/30 iface TenGigabitEthernet3/0/0
warn       linux-cp/nl    dispatch: ignored route/route: del family inet type 1 proto 2 table 254 dst 10.0.1.0/30 nexthops { idx 1488 }
warn       linux-cp/nl    dispatch: ignored route/route: del family inet type 3 proto 2 table 255 dst 10.0.1.3 nexthops { idx 1488 }
warn       linux-cp/nl    dispatch: ignored route/route: del family inet type 3 proto 2 table 255 dst 10.0.1.0 nexthops { idx 1488 }
warn       linux-cp/nl    dispatch: ignored route/route: del family inet type 2 proto 2 table 255 dst 10.0.1.1 nexthops { idx 1488 }
debug      linux-cp/nl    neigh_del: netlink route/neigh: del idx 1488 family inet lladdr 68:05:ca:32:45:94 dst 10.0.1.2 state 0x0002 (reachable) flags 0x0000
error      linux-cp/nl    neigh_del: Failed 10.0.1.2 iface TenGigabitEthernet3/0/0

It is this very last message that's a bit of a concern -- the ping
brought the lladdr into the neighbor cache; and the subsequent address
deletion first removed the address, then all the typical local routes
(the connected, the broadcast, the network, and the self/local); but
then also explicitly deleted the neighbor. That is correct behavior
for Linux, except that VPP already invalidates the neighbor cache and
adds/removes the connected routes itself, see for example
ip/ip4_forward.c L826-L830 and L583.

I predict more of these false positive 'errors' like the one on neigh_del()
because interface/route addition/deletion is slightly different in VPP
than in Linux. If so, I may have to reclassify these errors as warnings.
2021-08-24 00:26:06 +02:00
30bab1d3f9 Our first Netlink syncer!
Add lcpng_nl_sync.c that will house these functions. Their purpose is to
take state learned from netlink messages, and apply that state to VPP.

Some rearranging/plumbing was necessary to get logging to be visible in
this new source file.

Then, we add lcp_nl_neigh_add() and _del() which look up the LIP, convert
the lladdr and ip address from Netlink into VPP variants, and then add or
remove the ip4/ip6 neighbor adjacency.
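A sketch of the add path; the LIP lookup and the address-conversion
helper lcp_nl_mk_ip_addr() are assumed names from this plugin, and the
ip_neighbor_add() signature is approximate:

static void
lcp_nl_neigh_add (struct rtnl_neigh *rn)
{
  lcp_itf_pair_t *lip;
  ip_address_t nh;
  mac_address_t mac;

  /* look up the LIP from the Netlink ifindex */
  lip = lcp_itf_pair_get (
    lcp_itf_pair_find_by_vif (rtnl_neigh_get_ifindex (rn)));
  if (!lip)
    return;

  /* convert the libnl lladdr and dst address into VPP variants */
  mac_address_from_bytes (
    &mac, nl_addr_get_binary_addr (rtnl_neigh_get_lladdr (rn)));
  lcp_nl_mk_ip_addr (rtnl_neigh_get_dst (rn), &nh); /* assumed helper */

  ip_neighbor_add (&nh, &mac, lip->lip_phy_sw_if_index,
                   IP_NEIGHBOR_FLAG_DYNAMIC, NULL /* stats_index */);
}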
2021-08-24 00:05:29 +02:00
c4e3043ea1 Add skeleton of Linux CP Netlink Listener
Register lcp_nl_init() which adds interface pair add/del callbacks.

lcp_nl_pair_add_cb: Initiate netlink listener for the first interface in
  its netns. If subsequent adds are in other netns, issue a warning. Keep
  a refcount.
lcp_nl_pair_del_cb: Remove the listener when the last interface pair is
  removed.

Socket is opened, file is added to VPP's epoll, with lcp_nl_read_cb()
and lcp_nl_error_cb() callbacks installed.
- lcp_nl_read_cb() calls lcp_nl_callback() which pushes netlink messages
  onto a queue and issues NL_EVENT_READ event, any socket read error
  issues NL_EVENT_READ_ERR event.
- lcp_nl_error_cb() simply issues NL_EVENT_READ_ERR event.
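The producer side, sketched (the queue and node-index fields on the
plugin main struct are assumed names):

static int
lcp_nl_callback (struct nl_msg *msg, void *arg)
{
  lcp_nl_main_t *nm = &lcp_nl_main;

  /* keep a reference; the process node consumes and frees it later */
  nlmsg_get (msg);
  vec_add1 (nm->nl_msg_queue, msg);

  /* wake the process node to start (or continue) draining the queue */
  vlib_process_signal_event (vlib_get_main (), nm->nl_process_node_index,
                             NL_EVENT_READ, 0);
  return 0;
}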

Then, initialize a process node called lcp_nl_process(), which handles:
- NL_EVENT_READ and call lcp_nl_process_msgs()
  - if messages are left in the queue, reschedule consumption after M
    msecs. This allows new netlink messages to continuously be read from
    the kernel, even if we have lots of messages to consume.
- NL_EVENT_READ_ERR and close/reopens the netlink socket.

lcp_nl_process_msgs() processes up to N messages and/or for up to M
msecs, whichever comes first. For each, calling lcp_nl_dispatch().
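In outline (names approximate), with the time budget checked as we go:

static int
lcp_nl_process_msgs (vlib_main_t *vm)
{
  lcp_nl_main_t *nm = &lcp_nl_main;
  f64 deadline = vlib_time_now (vm) + 1e-3 * LCP_NL_BATCH_WORK_MS;
  int n = 0;

  /* process up to N messages and/or for up to M msecs */
  while (n < LCP_NL_BATCH_SIZE && vec_len (nm->nl_msg_queue) > 0)
    {
      lcp_nl_dispatch (nm->nl_msg_queue[0]);
      vec_delete (nm->nl_msg_queue, 1, 0);
      n++;
      if (vlib_time_now (vm) > deadline)
        break; /* out of budget */
    }
  return vec_len (nm->nl_msg_queue); /* >0 makes the caller reschedule */
}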

lcp_nl_dispatch() ultimately just throws the message away after
logging it with format_nl_object().
2021-08-23 23:04:50 +02:00