lcpng

ipng/lcpng

Author	SHA1	Message	Date
Pim van Pelt	25b2999485	Backport https://gerrit.fd.io/r/c/vpp/+/40441	2024-04-01 23:07:56 +02:00
Pim van Pelt	a960d64a87	Add the ability to skip address sync on unnumbered interfaces	2024-03-05 22:59:55 +01:00
Pim van Pelt	c8dc522fe9	Avoid creating a duplicate LIP	2024-03-05 22:59:55 +01:00
Pim van Pelt	becf8fa759	Fix crash - vec_free() instead of free()	2024-02-09 00:14:18 +01:00
jankincai	ff3ce656c1	Fix delete segment error.	2023-08-11 14:31:25 +08:00
jankincai	6dbd2c1d7b	Fixed the deletion segment error	2023-08-11 14:09:13 +08:00
Adrian Pistol	153628a3de	Basic MPLS support. 1) Imports ENCAP_MPLS labels from IPv4/IPv6 routes. Note that this requires libnl 3.6.0 or newer. In previous patches, the fib_path_ext_t had a path ID of -1. After a long investigation, it turned out to be caused by route weight being set to 0. There is a comment explaining more details. 2) Handles MPLS routes. MPLS routes were wrongly added as IPv4 routes before. POP and SWAP are now both supported. All the routes are installed as NON-EOS and EOS routes, as the Linux kernel does not differentiate. EOS POP used in PHP uses the next-hop address family to determine the resulting address family. This patch is sufficient for P setups. PE setups with implicit null should also function okay, as long as a separate label gets programmed per address family. PE setups with explicit null will also forward packets, but punting is a bit odd and needs MPLS input enabled on the LCP host device. Make sure to enable MPLS in VPP first. 3) Propagate MPLS input state to LCP Pair and Linux. Since the Linux kernel uses the MPLS routes itself, the LCP pair tap needs MPLS enabled to allow host originated packets. This also syncs the Linux `net.mpls.conf.<host_if>.input` sysctl to allow punted packets to have MPLS labels, mostly explicit nulls. For that to work, load the mpls kernel modules. 4) Cross connect MPLS packets from Linux directly to interface-output. This is a port of https://gerrit.fd.io/r/c/vpp/+/38702	2023-05-30 22:14:35 +02:00
Adrian Pistol	8fc5631ef6	Run clang-format on all files.	2023-05-30 21:28:35 +02:00
Pim van Pelt	bc429011e8	Fix logging for https://gerrit.fd.io/r/c/vpp/+/38602	2023-04-12 10:53:39 +02:00
Pim van Pelt	bd3265b77f	backport https://gerrit.fd.io/r/c/vpp/+/38602	2023-04-12 10:52:40 +02:00
Pim van Pelt	9017d1bd3c	Move back to multiple rx queues; and copy in the upstream change to autoassign a tap index ID, as the hw_if_index one could already be taken (by tap create)	2023-04-06 23:36:36 +02:00
Pim van Pelt	6fe37f6a8d	Revert change in `e035203162` -- set RX/TX queues to 1 -- while hunting down a race condition	2023-02-05 11:34:12 +00:00
Pim van Pelt	815a6e0dce	Run VPP's checkstyle to reformat the code	2023-01-11 16:21:40 +00:00
Pim van Pelt	e53d4376ab	cleanup: Clean logging, consistent capitalization, nouns, and macro names	2023-01-11 16:18:18 +00:00
Pim van Pelt	6faf206370	merge	2023-01-11 12:12:15 +00:00
Pim van Pelt	263ff9d02c	Initialize var in case of error return, per upstream a01be735f2	2023-01-11 13:08:54 +01:00
Pim van Pelt	e035203162	Fix memory leak on failed tap creation per upstream 37b5cccb; Also, sync the RX/TX queues to be the same as upstream linux-cp	2023-01-11 11:13:40 +01:00
Pim van Pelt	a4006ec5da	Backport https://gerrit.fd.io/r/c/vpp/+/37026	2022-10-04 13:13:57 +00:00
Pim van Pelt	a0ad5cc5cf	Merge Gerrit 36176 from upstream	2022-06-03 20:33:48 +00:00
Pim van Pelt	786530701d	Backport https://gerrit.fd.io/r/c/vpp/+/35719	2022-03-31 23:23:36 +00:00
Pim van Pelt	0bc3e01c9b	Backport https://gerrit.fd.io/r/c/vpp/+/35528	2022-03-08 14:45:35 +00:00
Pim van Pelt	8a1c031ea9	backport https://gerrit.fd.io/r/c/vpp/+/35411	2022-02-22 20:42:11 +00:00
Pim van Pelt	5f3eb62be9	Reduce resource usage on virtio polling, also avoid potential (undiagnosed) dataplane lockup when running multiple threads: just poll with one RX queue per TAP	2022-02-20 19:19:58 +00:00
Pim van Pelt	ccd4b393e9	Roll back carrier set -- it works fine for phy's that are carrier-up, but crashes if they are carrier-down. 0: /home/pim/src/vpp/src/vnet/interface_funcs.h:46 (vnet_get_hw_interface) assertion `! pool_is_free (vnm->interface_main.hw_interfaces, _e)' fails at /home/pim/src/vpp/src/vppinfra/error.c:143 ns=0x7fff98774e80 "dataplane", host_sw_if_indexp=0x0) at /home/pim/src/vpp/src/plugins/lcpng/lcpng_interface.c:998 at /home/pim/src/vpp/src/plugins/lcpng/lcpng_if_cli.c:96 parent_command_index=371) at /home/pim/src/vpp/src/vlib/cli.c:591 parent_command_index=0) at /home/pim/src/vpp/src/vlib/cli.c:548 at /home/pim/src/vpp/src/vlib/cli.c:694	2021-12-24 21:12:24 +00:00
Pim van Pelt	ddd3ad372a	Only set carrier up when hw is up	2021-12-24 21:04:15 +00:00
Pim van Pelt	22e907555d	Initialize the TAP carrier based on the (hardware) phy	2021-12-24 20:04:08 +00:00
Pim van Pelt	65fa49f30b	Fix crash if netns is not set at startup	2021-12-19 21:46:00 +00:00
Pim van Pelt	cd86f17454	Copy forward neale's improvement from upstream gerrit 33948	2021-11-29 22:26:44 +00:00
Pim van Pelt	cdf07cce34	Merge review feedback from mgsmith on upstream gerrit 33709 ps8..10	2021-11-29 20:19:34 +00:00
Pim van Pelt	4ed9d02693	Fix non NULL terminated strings (namespace and hostname are vectors)	2021-09-09 20:04:15 +00:00
Pim van Pelt	043fecb0e0	Only find the parent tap if it's necessary (i.e. it doesn't already exist in vif_index); change by mgsmith@	2021-09-08 21:15:42 +00:00
Pim van Pelt	45cb9b4afc	Cleanup interface sync - move tap_set_carrier() upstream to lcp_itf_set_link_state() - refuse to set admin-up on sub-int if parent is down - no need to switch namespaces, lcp_itf_set_link_state() already does - in change_mtu and change_admin_state, if the interface is a sub, we only have to sync that one interface. Otherwise, walk the parent interface and all sub-ints with lcp_itf_pair_sync_state_hw() and make note of this in the (DBG) log	2021-09-08 20:53:02 +00:00
Pim van Pelt	8b3356cd86	if sub_interface fails to create, return error and don't continue (fixes a crash)	2021-09-08 19:50:27 +00:00
Pim van Pelt	3c806d586d	Accommodate Netgate's usecase, they create the linux netlink device first, and then call the pair_create; in that case, linux_parent_if_index already exists; simplify the call path here, h/t mgsmith@	2021-09-08 18:41:35 +00:00
Pim van Pelt	36f1ebfdae	Move check for parent_if_index up; copy review change from mgsmith@	2021-09-07 22:10:54 +00:00
Pim van Pelt	fdab236755	backport fix from mgsmith@ in VPP main repo	2021-09-07 21:13:55 +00:00
Pim van Pelt	15efc4efc2	Copy review notes from mgsmith@ from https://gerrit.fd.io/r/c/vpp/+/33481/12..13	2021-08-30 20:37:05 +00:00
Pim van Pelt	98a84d0fa7	Turn lip_namespace into a vector	2021-08-30 20:31:53 +00:00
Pim van Pelt	7a76498277	Add NEWROUTE/DELROUTE handler This is super complicated work, taken mostly verbatim from the upstream linux-cp Gerrit, with due credit mgsmith@netgate.com neale@grafiant.com First, add main handler lcp_nl_route_add() and lcp_nl_route_del() Introduce two FIB sources: one for manual routes, one for dynamic routes. See lcp_nl_proto_fib_source() for details. Add a bunch of helpers that translate Netlink message data into VPP primitives: - lcp_nl_mk_addr46() converts a Netlink nl_addr to a VPP ip46_address_t. - lcp_nl_mk_route_prefix() converts a Netlink rtnl_route to a VPP fib_prefix_t. - lcp_nl_mk_route_mprefix() converts a Netlink rtnl_route to a VPP mfib_prefix_t. - lcp_nl_proto_fib_source() selects the most appropriate fib_src by looking at the rt_proto (see /etc/iproute2/rt_protos for a hint). Anything RTPROT_STATIC or better is 'fib_src', while anything above that becomes fib_src_dynamic. - lcp_nl_mk_route_entry_flags() generates fib_entry_flag_t from the Netlink route type, table and proto metadata. - lcp_nl_route_path_parse() converts a Netlink rtnl_nexthop to VPP fib_route_path_t and adds that to a growing list of paths. - lcp_nl_route_path_add_special() adds a blackhole/unreach/prohibit route to the list of paths, in the special-case there is not yet a path for the destination. Now we're ready to insert FIB entries: - lcp_nl_table_find() selects the matching table-id,protocol(v4/v6) from a hash of tables. - lcp_nl_table_add_or_lock() if a table-id,protocol(v4/v6) hasn't been used yet, create one, otherwise increment a table reference counter so we know how many FIB entries we have in this table. Then, return it. - lcp_nl_table_unlock() Decrease the refcount on a table, and if no more prefixes are in the table, remove it from VPP. - lcp_nl_route_del() Remove a route from the given table-id/protocol. 
Do this by applying rtnl_route_foreach_nexthop() to the list of Netlink nexthops, converting them into VPP paths in a lcp_nl_route_path_parse_t structure. If the route is for unreachable/blackhole/prohibit in Linux, add that path too. Then, remove the VPP paths from the FIB and reduce refcnt or remove the table if it's empty using table_unlock(). - lcp_nl_route_add() Not all routes are relevant for VPP. Those in table 255 are 'local' routes, already set up by ip[46]_address_add(), and some other route types are invalid, skip those. Link-local IPv6 and IPv6 multicast are also skipped. Then, construct lcp_nl_route_path_parse_t by walking the Netlink nexthops, and optionally add a special (in case the route was for unreachable/blackhole/prohibit in Linux -- those won't have a nexthop). Then, insert the VPP paths found in the Netlink message into the FIB or the multicast FIB, respectively. And with that, Bird shoots to life. Both IPv4 and IPv6 OSPF interior gateway protocol and BGP full tables can be consumed, on my bench in about 9 seconds: - A batch of 2048 Netlink messages is handled in 9-11ms, so we can do approx 200K messages/sec at peak (and this will consume 50% CPU due to the yielding logic in lcp_nl_process(); see the 'case NL_EVENT_READ' block that adds a cooldown period of LCP_NL_PROCESS_WAIT milliseconds between batches). - With 3 route reflectors and 2 full BGP peers, at peak I could see 309K messages left in the producer queue. - All IPv4 and IPv6 prefixes made their way into the FIB pim@hippo:~/src/lcpng$ echo -n "IPv6: "; vppctl sh ip6 fib summary | awk '$1~/[0-9]+/ { total += $2 } END { print total }' IPv6: 132506 pim@hippo:~/src/lcpng$ echo -n "IPv4: "; vppctl sh ip fib summary | awk '$1~/[0-9]+/ { total += $2 } END { print total }' IPv4: 869966 - Compared to Bird2's view: pim@hippo:~/src/lcpng$ birdc show route count BIRD 2.0.7 ready. 
3477845 of 3477845 routes for 869942 networks in table master4 527887 of 527887 routes for 132484 networks in table master6 Total: 4005732 of 4005732 routes for 1002426 networks in 2 tables - Flipping one of the full feeds to another, forcing a reconvergence of every prefix in the FIB took about 8 seconds, peaking at 242K messages in the queue, with again an average consumption of 2048 messages per 9-10ms. - All of this was done while iperf'ing 6Gbps to and from the controlplane. --- Because handling a full BGP table is O(1M) messages, I will have to make some changes in the logging: - all neigh/route messages become DBG/INFO at best - all addr/link messages become INFO/NOTICE at best - when we overflow time/msgs, turn process_msgs into a WARN, otherwise keep it at INFO so as not to spam. In lcpng_interface.c: - Log NOTICE for pair_add() and pair_del() call; - Log NOTICE for set_interface_addr() call; With this approach, setting the logging level of the linux-cp/nl plugin to 'notice' hits the sweet spot: it shows things that the operator has ~explicitly done, leaving implicit actions (BGP route adds/dels, ARP/ND) to stay below the NOTICE level.	2021-08-29 14:57:21 +02:00
Pim van Pelt	6d2ce1cd83	Consolidate MTU and Link changes into one function.	2021-08-24 17:50:44 +02:00
Pim van Pelt	fee9776e87	Cleanup -- remove unused pair_add_sub()	2021-08-15 19:47:46 +02:00
Pim van Pelt	2d00de080b	Protect VPP -> Linux state propagation behind flag Introduce lcp_main.lcp_sync, which determines if state changes made to interfaces in VPP do or don't propagate into Linux. - Add a startup.conf directive 'lcp-sync' to enable at startup time. - Add CLI .short_help = "lcp lcp-sync [on|enable|off|disable]", - Show the current value in "show lcp". Gate changes in mtu, state and address on lcp_lcp_sync(). When the operator issues 'lcp lcp-sync on', it is prudent to do a one-off sync of all interface attributes from VPP into Linux. For this, add a lcp_itf_pair_sync_state_all() function.	2021-08-15 16:07:53 +02:00
Pim van Pelt	d23aab2d95	Add CLI for lcp-auto-subint In preparation of another feature 'netlink-auto-subint', rename lcp_main's field to "lcp_auto_subint". Add CLI .short_help = "lcp lcp-auto-subint [on|enable|off|disable]" Show status of the field on "lcp show" output.	2021-08-15 14:45:04 +02:00
Pim van Pelt	c2adb3262d	Fix bug in find_outer_vlan() I was looking at the hw interface list, which makes sense for ethernet devices but not for other devices, notably BondEthernets. In addition, creating two separate interfaces with the same outer (for example Gi3/0/0 dot1ad 2345 inner-dot1q 1000 AND the same on Gi3/0/1) would yield an erratic match and a crash. Switch to walking the sw interface list instead, and search for the sup_sw_if_index that has the desired outer. Result: BondEthernet0.{1234,1235,1236,1237} can be created and are functional.	2021-08-14 19:37:37 +02:00
Pim van Pelt	934446dcd9	Add automatic LCP creation Update the pair_config() parser to follow suit. When the configuration 'lcp-auto-subint' is set, and the interface at hand is a subinterface, in lcp_itf_interface_add_del(): - if it's a deletion and we're a sub-int, and we have a LIP: delete it. - if it's a creation and we're a sub-int, and our parent has a LIP, create one. Fix a few logging consistency issues (pair_del), and in pair_delete_by_index() ensure that the right namespace is selected. Due to this quirk with lip->lip_namespace not wanting to be a vec_dup() string, rewrite them all to be strdup/free instead.	2021-08-14 17:20:31 +02:00
Pim van Pelt	b3d8e75706	Start LCP auto-creation This first preparation moves lcp_itf_phy_add() to lcpng_if_sync.c and renames it lcp_itf_interface_add_del(). It does all the pre-flight checks to validate that a new device, given by sw_if_index, can have a LIP created: - must be a sub-int - must have a sw_sup_if_index, which itself has a LIP However, I realize that I cannot create an interface from within an interface add callback, so I'll have to schedule the child LIP to be created by a process, after the callback returns. I'll do that in the next commit.	2021-08-14 14:28:16 +02:00
Pim van Pelt	10f10d534c	Ensure the plugin works well with namespaces I've made a few cosmetic adjustments: - introduce debug, info, notice, warn and err loggers - remove redundant logging, and set correct (conservative) log levels - turn the sync-state error into a warning And a little debt paydown: - refactor sync-state into its own function, to be called instead of all the spot fixes elsewhere. It's going to be the case that sync-state is "the reconciliation function". - Fix a bug in lip->lip_namespace copy: vec_dup() there doesn't seem to work for me, use strdup() instead and make a mental note to revisit. The plugin now works with 'lcpng default netns dataplane' in its startup.conf; and with 'lcp default netns dataplane' as its first command. A few of these fixes should go upstream, too, which I'll do next.	2021-08-14 09:40:43 +02:00
Pim van Pelt	aab11196cf	Sanitize some overly verbose logging ERR->{DBG,INFO}	2021-08-13 21:08:03 +02:00
Pim van Pelt	72f55fd901	Netlink namespaces! I have been very careless in using the correct network namespace when manipulating LCP host devices. Around any/every netlink write operation, we must first clib_setns() into the correct namespace. So, wrap every call of vnet_netlink_*() in all places. For consistency, use the convention 'curr_ns_fd' (for the one we are coming from) and 'vif_ns_fd' (to signal the one that the netlink VIF is in). Be careful as well to enter and exit everywhere without losing file descriptors.	2021-08-13 20:58:28 +02:00
Pim van Pelt	f7e1bb951d	Sync IPv4 and IPv6 addresses from VPP to LCP There are three ways in which IP addresses will want to be copied from VPP into the companion Linux devices: 1) set interface ip address ... adds an IPv4 or IPv6 address - this is handled by lcp_itf_ip[46]_add_del_interface_addr() which is a callback installed in lcp_itf_pair_init() 2) set interface ip address del ... removes them - also handled by lcp_itf_ip[46]_add_del_interface_addr() but curiously there is no upstream vnet_netlink_del_ip[46]_addr() so I wrote them inline here - I will try to get them upstreamed, as they appear to be obvious companions in vnet/device/netlink.h 3) Upon LIP creation, it could be that there are L3 addresses already on the VPP interface. If so, set them with lcp_itf_set_interface_addr() This means that now, at any time a new LIP is created, its state from VPP is fully copied over (MTU, Link state, IPv4/IPv6 addresses)! At runtime, new addresses can be set/removed as well.	2021-08-13 16:50:32 +02:00
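
The NEWROUTE/DELROUTE commit (7a76498277) above describes two small mechanisms: choosing a FIB source from the Netlink rt_proto, and reference-counting tables keyed on (table-id, protocol). The following is a minimal standalone sketch of that logic in plain C, not the plugin's code. Every `sketch_*` / `SKETCH_*` name is hypothetical, the fixed-size array stands in for the hash of tables the commit mentions, and the RTPROT values are copied from <linux/rtnetlink.h>. "RTPROT_STATIC or better" is assumed here to mean a numeric value less than or equal to RTPROT_STATIC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Route protocol values as defined in <linux/rtnetlink.h>. */
#define SKETCH_RTPROT_KERNEL 2
#define SKETCH_RTPROT_STATIC 4
#define SKETCH_RTPROT_BIRD   12

/* Hypothetical stand-ins for the two FIB sources named in the commit. */
typedef enum
{
  SKETCH_FIB_SRC_STATIC,  /* 'fib_src': manual routes */
  SKETCH_FIB_SRC_DYNAMIC, /* 'fib_src_dynamic': routing daemons */
} sketch_fib_src_t;

/* Assumption: "RTPROT_STATIC or better" means rt_proto <= RTPROT_STATIC. */
sketch_fib_src_t
sketch_proto_fib_source (uint8_t rt_proto)
{
  return (rt_proto <= SKETCH_RTPROT_STATIC) ? SKETCH_FIB_SRC_STATIC :
					      SKETCH_FIB_SRC_DYNAMIC;
}

#define SKETCH_MAX_TABLES 16

typedef struct
{
  uint32_t table_id;
  int is_ip6;
  uint32_t refcnt; /* number of routes using the table; 0 means slot free */
} sketch_table_t;

static sketch_table_t sketch_tables[SKETCH_MAX_TABLES];

/* Find an in-use table matching (table_id, protocol), or NULL. */
sketch_table_t *
sketch_table_find (uint32_t table_id, int is_ip6)
{
  for (size_t i = 0; i < SKETCH_MAX_TABLES; i++)
    if (sketch_tables[i].refcnt > 0 &&
	sketch_tables[i].table_id == table_id &&
	sketch_tables[i].is_ip6 == is_ip6)
      return &sketch_tables[i];
  return NULL;
}

/* Create the table on first use, otherwise take another reference. */
sketch_table_t *
sketch_table_add_or_lock (uint32_t table_id, int is_ip6)
{
  sketch_table_t *t = sketch_table_find (table_id, is_ip6);
  if (t)
    {
      t->refcnt++;
      return t;
    }
  for (size_t i = 0; i < SKETCH_MAX_TABLES; i++)
    if (sketch_tables[i].refcnt == 0)
      {
	sketch_tables[i].table_id = table_id;
	sketch_tables[i].is_ip6 = is_ip6;
	sketch_tables[i].refcnt = 1;
	return &sketch_tables[i];
      }
  return NULL; /* no free slot in this fixed-size sketch */
}

/* Drop one reference; returns 1 when the last route left and the slot
 * was released, 0 otherwise. */
int
sketch_table_unlock (sketch_table_t *t)
{
  assert (t != NULL && t->refcnt > 0);
  t->refcnt--;
  return t->refcnt == 0;
}
```

In this sketch a route add would call sketch_table_add_or_lock() once per installed prefix and a route delete would call sketch_table_unlock(), so the table slot is released when its last prefix goes away, which is the behaviour the commit message describes for lcp_nl_table_unlock().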

67 Commits