lcpng

ipng/lcpng

Author	SHA1	Message	Date
Pim van Pelt	76c8b53f41	fix silly crash in logging	2021-08-29 15:03:39 +02:00
Pim van Pelt	7a76498277	Add NEWROUTE/DELROUTE handler This is super complicated work, taken mostly verbatim from the upstream linux-cp Gerrit, with due credit mgsmith@netgate.com neale@grafiant.com First, add main handler lcp_nl_route_add() and lcp_nl_route_del() Introduce two FIB sources: one for manual routes, one for dynamic routes. See lcp_nl_proto_fib_source() for details. Add a bunch of helpers that translate Netlink message data into VPP primitives: - lcp_nl_mk_addr46() converts a Netlink nl_addr to a VPP ip46_address_t. - lcp_nl_mk_route_prefix() converts a Netlink rtnl_route to a VPP fib_prefix_t. - lcp_nl_mk_route_mprefix() converts a Netlink rtnl_route to a VPP mfib_prefix_t. - lcp_nl_proto_fib_source() selects the most appropriate fib_src by looking at the rt_proto (see /etc/iproute2/rt_protos for a hint). Anything RTPROT_STATIC or better is 'fib_src', while anything above that becomes fib_src_dynamic. - lcp_nl_mk_route_entry_flags() generates fib_entry_flag_t from the Netlink route type, table and proto metadata. - lcp_nl_route_path_parse() converts a Netlink rtnl_nexthop to VPP fib_route_path_t and adds that to a growing list of paths. - lcp_nl_route_path_add_special() adds a blackhole/unreach/prohibit route to the list of paths, in the special case that there is not yet a path for the destination. Now we're ready to insert FIB entries: - lcp_nl_table_find() selects the matching table-id,protocol(v4/v6) from a hash of tables. - lcp_nl_table_add_or_lock() if a table-id,protocol(v4/v6) hasn't been used yet, create one, otherwise increment a table reference counter so we know how many FIB entries we have in this table. Then, return it. - lcp_nl_table_unlock() Decrease the refcount on a table, and if no more prefixes are in the table, remove it from VPP. - lcp_nl_route_del() Remove a route from the given table-id/protocol. 
Do this by applying rtnl_route_foreach_nexthop() to the list of Netlink nexthops, converting them into VPP paths in a lcp_nl_route_path_parse_t structure. If the route is for unreachable/blackhole/prohibit in Linux, add that path too. Then, remove the VPP paths from the FIB and reduce refcnt or remove the table if it's empty using table_unlock(). - lcp_nl_route_add() Not all routes are relevant for VPP. Those in table 255 are 'local' routes, already set up by ip[46]_address_add(), and some other route types are invalid, skip those. Link-local IPv6 and IPv6 multicast are also skipped. Then, construct lcp_nl_route_path_parse_t by walking the Netlink nexthops, and optionally add a special (in case the route was for unreachable/blackhole/prohibit in Linux -- those won't have a nexthop). Then, insert the VPP paths found in the Netlink message into the FIB or the multicast FIB, respectively. And with that, Bird shoots to life. Both IPv4 and IPv6 OSPF interior gateway protocol and BGP full tables can be consumed, on my bench in about 9 seconds: - A batch of 2048 Netlink messages is handled in 9-11ms, so we can do approx 200K messages/sec at peak (and this will consume 50% CPU due to the yielding logic in lcp_nl_process(); see the 'case NL_EVENT_READ' block that adds a cooldown period of LCP_NL_PROCESS_WAIT milliseconds between batches). - With 3 route reflectors and 2 full BGP peers, at peak I could see 309K messages left in the producer queue. - All IPv4 and IPv6 prefixes made their way into the FIB pim@hippo:~/src/lcpng$ echo -n "IPv6: "; vppctl sh ip6 fib summary | awk '$1~/[0-9]+/ { total += $2 } END { print total }' IPv6: 132506 pim@hippo:~/src/lcpng$ echo -n "IPv4: "; vppctl sh ip fib summary | awk '$1~/[0-9]+/ { total += $2 } END { print total }' IPv4: 869966 - Compared to Bird2's view: pim@hippo:~/src/lcpng$ birdc show route count BIRD 2.0.7 ready. 
3477845 of 3477845 routes for 869942 networks in table master4 527887 of 527887 routes for 132484 networks in table master6 Total: 4005732 of 4005732 routes for 1002426 networks in 2 tables - Flipping one of the full feeds to another, forcing a reconvergence of every prefix in the FIB took about 8 seconds, peaking at 242K messages in the queue, with again an average consumption of 2048 messages per 9-10ms. - All of this was done while iperf'ing 6Gbps to and from the controlplane. --- Because handling a full BGP table is O(1M) messages, I will have to make some changes in the logging: - all neigh/route messages become DBG/INFO at best - all addr/link messages become INFO/NOTICE at best - when we overflow time/msgs, turn process_msgs into a WARN, otherwise keep it at INFO so as not to spam. In lcpng_interface.c: - Log NOTICE for pair_add() and pair_del() calls; - Log NOTICE for set_interface_addr() calls; With this approach, setting the logging level of the linux-cp/nl plugin to 'notice' hits the sweet spot: it logs things that the operator has (more or less) explicitly done, while implicit actions (BGP route adds/dels, ARP/ND) stay below the NOTICE level.	2021-08-29 14:57:21 +02:00
Pim van Pelt	47fd53be42	Add NEWROUTE/DELROUTE handler This is super complicated work, taken mostly verbatim from the upstream linux-cp Gerrit, with due credit mgsmith@netgate.com neale@grafiant.com First, add main handler lcp_nl_route_add() and lcp_nl_route_del() Introduce two FIB sources: one for manual routes, one for dynamic routes. See lcp_nl_proto_fib_source() for details. Add a bunch of helpers that translate Netlink message data into VPP primitives: - lcp_nl_mk_addr46() converts a Netlink nl_addr to a VPP ip46_address_t. - lcp_nl_mk_route_prefix() converts a Netlink rtnl_route to a VPP fib_prefix_t. - lcp_nl_mk_route_mprefix() converts a Netlink rtnl_route to a VPP mfib_prefix_t. - lcp_nl_proto_fib_source() selects the most appropriate fib_src by looking at the rt_proto (see /etc/iproute2/rt_protos for a hint). Anything RTPROT_STATIC or better is 'fib_src', while anything above that becomes fib_src_dynamic. - lcp_nl_mk_route_entry_flags() generates fib_entry_flag_t from the Netlink route type, table and proto metadata. - lcp_nl_route_path_parse() converts a Netlink rtnl_nexthop to VPP fib_route_path_t and adds that to a growing list of paths. - lcp_nl_route_path_add_special() adds a blackhole/unreach/prohibit route to the list of paths, in the special case that there is not yet a path for the destination. Now we're ready to insert FIB entries: - lcp_nl_table_find() selects the matching table-id,protocol(v4/v6) from a hash of tables. - lcp_nl_table_add_or_lock() if a table-id,protocol(v4/v6) hasn't been used yet, create one, otherwise increment a table reference counter so we know how many FIB entries we have in this table. Then, return it. - lcp_nl_table_unlock() Decrease the refcount on a table, and if no more prefixes are in the table, remove it from VPP. - lcp_nl_route_del() Remove a route from the given table-id/protocol. 
Do this by applying rtnl_route_foreach_nexthop() to the list of Netlink nexthops, converting them into VPP paths in a lcp_nl_route_path_parse_t structure. If the route is for unreachable/blackhole/prohibit in Linux, add that path too. Then, remove the VPP paths from the FIB and reduce refcnt or remove the table if it's empty using table_unlock(). - lcp_nl_route_add() Not all routes are relevant for VPP. Those in table 255 are 'local' routes, already set up by ip[46]_address_add(), and some other route types are invalid, skip those. Link-local IPv6 and IPv6 multicast are also skipped. Then, construct lcp_nl_route_path_parse_t by walking the Netlink nexthops, and optionally add a special (in case the route was for unreachable/blackhole/prohibit in Linux -- those won't have a nexthop). Then, insert the VPP paths found in the Netlink message into the FIB or the multicast FIB, respectively. And with that, Bird shoots to life. Both IPv4 and IPv6 OSPF interior gateway protocol and BGP full tables can be consumed, on my bench in about 9 seconds: - A batch of 2048 Netlink messages is handled in 9-11ms, so we can do approx 200K messages/sec at peak (and this will consume 50% CPU due to the yielding logic in lcp_nl_process(); see the 'case NL_EVENT_READ' block that adds a cooldown period of LCP_NL_PROCESS_WAIT milliseconds between batches). - With 3 route reflectors and 2 full BGP peers, at peak I could see 309K messages left in the producer queue. - All IPv4 and IPv6 prefixes made their way into the FIB pim@hippo:~/src/lcpng$ echo -n "IPv6: "; vppctl sh ip6 fib summary | awk '$1~/[0-9]+/ { total += $2 } END { print total }' IPv6: 132506 pim@hippo:~/src/lcpng$ echo -n "IPv4: "; vppctl sh ip fib summary | awk '$1~/[0-9]+/ { total += $2 } END { print total }' IPv4: 869966 - Compared to Bird2's view: pim@hippo:~/src/lcpng$ birdc show route count BIRD 2.0.7 ready. 
3477845 of 3477845 routes for 869942 networks in table master4 527887 of 527887 routes for 132484 networks in table master6 Total: 4005732 of 4005732 routes for 1002426 networks in 2 tables - Flipping one of the full feeds to another, forcing a reconvergence of every prefix in the FIB took about 8 seconds, peaking at 242K messages in the queue, with again an average consumption of 2048 messages per 9-10ms. - All of this was done while iperf'ing 6Gbps to and from the controlplane. --- Because handling a full BGP table is O(1M) messages, I will have to make some changes in the logging: - all neigh/route messages become DBG/INFO at best - all addr/link messages become INFO/NOTICE at best - when we overflow time/msgs, turn process_msgs into a WARN, otherwise keep it at INFO so as not to spam. In lcpng_interface.c: - Log NOTICE for pair_add() and pair_del() calls; - Log NOTICE for set_interface_addr() calls; With this approach, setting the logging level of the linux-cp/nl plugin to 'notice' hits the sweet spot: it logs things that the operator has (more or less) explicitly done, while implicit actions (BGP route adds/dels, ARP/ND) stay below the NOTICE level.	2021-08-29 13:52:11 +02:00
Pim van Pelt	873c1e3591	Capture a breadcrumb about BondEthernet0 packetlo	2021-08-29 12:59:08 +02:00
Pim van Pelt	a474d09a9e	Fix crash when vlib_buffer_copy() fails - also sent upstream in https://gerrit.fd.io/r/c/vpp/+/33606	2021-08-26 16:43:13 +02:00
Pim van Pelt	45f4088656	Add ability to create sub-ints from Linux Using the earlier placeholder hint in lcp_nl_link_add(), I know that I've gotten a NEWLINK request but the linux ifindex doesn't have a LIP. This could be because the interface is entirely foreign to VPP, for example somebody created a dummy interface or a VLAN subint on one: ip link add dum0 type dummy ip link add link dum0 name dum0.10 type vlan id 10 Or, I'm actually trying to create a VLAN subint, like these: ip link add link e0 name e0.1234 type vlan id 1234 ip link add link e0.1234 name e0.1235 type vlan id 1000 ip link add link e0 name e0.1236 type vlan id 2345 proto 802.1ad ip link add link e0.1236 name e0.1237 type vlan id 1000 None of these NEWLINK callbacks, represented by vif (linux interface id) will have a corresponding LIP. So, I try to create one by calling lcp_nl_link_add_vlan(). Here, I look up the parent index ('dum0' or 'e0' in the first examples), the former of which also doesn't have a LIP, so I bail. If it does have one, I still have two choices: 1) the LIP is a phy (ie TenGigabitEthernet3/0/0) and this is a regular tagged interface; or 2) the LIP is itself a subint (ie TenGigabitEthernet3/0/0.1234) and what I'm asking for is a QinQ or QinAD sub-interface. So I look up the phy LIP as well. I now have all the ingredients I need to create the VPP sub-interfaces with the correct inner-dot1q and outer dot1q or dot1ad. Of course, I don't really know what subinterface ID to use. It's appealing to "just" use the vlan, but that's not helpful if the outer tag and the inner tag are the same. So I write a helper function vnet_sw_interface_get_available_subid() whose job it is to return an unused subid for the phy -- starting from 1. I then create the phy sub-interface and the tap sub-interface, tying them together into a new LIP. During these interface creations, I want to make sure that if lcp-auto-subint is on, we disable that. I don't want VPP racing to create LIPs for the sub-ints right now. 
Before I return (either in error state or upon success), I put back the lcp-auto-subint to what it was before. If I manage to create the LIP, huzzah. I return it to the caller so it can continue setting link/mac/mtu etc.	2021-08-24 23:40:14 +02:00
Pim van Pelt	04b4aebc97	Merge branch 'main' of github.com:pimvanpelt/lcpng into main	2021-08-24 18:18:51 +02:00
Pim van Pelt	a3a5f68926	Add newlink/dellink processing. - Can up/down a link. - Can set MAC on a link, if it's a phy. - Can set MTU on a link. - Can delete link (including phy). Because link state and mtu changes tend to go around in circles (from netlink -> vpp; and then with lcp-sync on, as well from vpp -> netlink) when we consume a batch of netlink messages, we'll temporarily turn off lcp-sync if it's enabled. TODO (in the next commit): the whole nine yards of creating interfaces in VPP based on NEWLINK vlans that come in. Conceptually not too difficult: if NEWLINK doesn't have a LIP associated with it, but it's a VLAN, and the parent of the VLAN is a link which _does_ have a LIP, then we can create the subint in VPP in the correct way.	2021-08-24 18:18:23 +02:00
Pim van Pelt	e604dd3478	Add newlink/dellink processing. - Can up/down a link. - Can set MAC on a link, if it's a phy. - Can set MTU on a link. - Can delete link (including phy). Because link state and mtu changes tend to go around in circles (from netlink -> vpp; and then with lcp-sync on, as well from vpp -> netlink) when we consume a batch of netlink messages, we'll temporarily turn off lcp-sync if it's enabled. TODO (in the next commit): the whole nine yards of creating interfaces in VPP based on NEWLINK vlans that come in. Conceptually not too difficult: if NEWLINK doesn't have a LIP associated with it, but it's a VLAN, and the parent of the VLAN is a link which _does_ have a LIP, then we can create the subint in VPP in the correct way.	2021-08-24 18:11:51 +02:00
Pim van Pelt	6d2ce1cd83	Consolidate MTU and Link changes into one function.	2021-08-24 17:50:44 +02:00
Pim van Pelt	c02656de22	Add thread barriers on non-mp-safe calls	2021-08-24 01:13:16 +02:00
Pim van Pelt	d63fbd8a9a	Allow NO_SUCH_ENTRY to count as successful 'removal' of neighbors. When an address is removed, VPP will invalidate the neighbor cache. This change allows the subsequent gratuitous neigh deletion from Linux to be harmless.	2021-08-24 00:52:26 +02:00
Pim van Pelt	87742b4f54	Add netlink address add/del Straight forward addition/removal of IPv4 and IPv6 addresses on interfaces. One thing I noticed, which isn't a concern but an unfortunate issue, looking at the following sequence: ip addr add 10.0.1.1/30 dev e0 debug linux-cp/nl addr_add: netlink route/addr: add idx 1488 family inet local 10.0.1.1/30 flags 0x0080 (permanent) warn linux-cp/nl dispatch: ignored route/route: add family inet type 2 proto 2 table 255 dst 10.0.1.1 nexthops { idx 1488 } warn linux-cp/nl dispatch: ignored route/route: add family inet type 1 proto 2 table 254 dst 10.0.1.0/30 nexthops { idx 1488 } warn linux-cp/nl dispatch: ignored route/route: add family inet type 3 proto 2 table 255 dst 10.0.1.0 nexthops { idx 1488 } warn linux-cp/nl dispatch: ignored route/route: add family inet type 3 proto 2 table 255 dst 10.0.1.3 nexthops { idx 1488 } ping 10.0.1.2 debug linux-cp/nl neigh_add: netlink route/neigh: add idx 1488 family inet lladdr 68:05:ca:32:45:94 dst 10.0.1.2 state 0x0002 (reachable) flags 0x0000 notice linux-cp/nl neigh_add: Added 10.0.1.2 lladdr 68:05:ca:32:45:94 iface TenGigabitEthernet3/0/0 ip addr del 10.0.1.1/30 dev e0 debug linux-cp/nl addr_del: netlink route/addr: del idx 1488 family inet local 10.0.1.1/30 flags 0x0080 (permanent) notice linux-cp/nl addr_del: Deleted 10.0.1.1/30 iface TenGigabitEthernet3/0/0 warn linux-cp/nl dispatch: ignored route/route: del family inet type 1 proto 2 table 254 dst 10.0.1.0/30 nexthops { idx 1488 } warn linux-cp/nl dispatch: ignored route/route: del family inet type 3 proto 2 table 255 dst 10.0.1.3 nexthops { idx 1488 } warn linux-cp/nl dispatch: ignored route/route: del family inet type 3 proto 2 table 255 dst 10.0.1.0 nexthops { idx 1488 } warn linux-cp/nl dispatch: ignored route/route: del family inet type 2 proto 2 table 255 dst 10.0.1.1 nexthops { idx 1488 } debug linux-cp/nl neigh_del: netlink route/neigh: del idx 1488 family inet lladdr 68:05:ca:32:45:94 dst 10.0.1.2 state 0x0002 
(reachable) flags 0x0000 error linux-cp/nl neigh_del: Failed 10.0.1.2 iface TenGigabitEthernet3/0/0 It is this very last message that's a bit of a concern -- the ping brought the lladdr into the neighbor cache; and the subsequent address deletion first removed the address, then all the typical local routes (the connected, the broadcast, the network, and the self/local); but then as well explicitly deleted the neighbor, which is correct behavior for Linux, except that VPP already invalidates the neighbor cache and adds/removes the connected routes for example in ip/ip4_forward.c L826-L830 and L583. I predict more of these false positive 'errors' like the one on neigh_del() because interface/route addition/deletion is slightly different in VPP than in Linux. I may have to reclassify the errors as warnings otherwise.	2021-08-24 00:26:06 +02:00
Pim van Pelt	30bab1d3f9	Our first Netlink syncer! Add lcpng_nl_sync.c that will house these functions. Their purpose is to take state learned from netlink messages, and apply that state to VPP. Some rearranging/plumbing was necessary to get logging to be visible in this new source file. Then, we add lcp_nl_neigh_add() and _del() which look up the LIP, convert the lladdr and ip address from Netlink into VPP variants, and then add or remove the ip4/ip6 neighbor adjacency.	2021-08-24 00:05:29 +02:00
Pim van Pelt	c4e3043ea1	Add skeleton of Linux CP Netlink Listener Register lcp_nl_init() which adds interface pair add/del callbacks. lcb_nl_pair_add_cb: Initiate netlink listener for first interface in its netns. If subsequent adds are in other netns, issue a warning. Keep refcount. lcb_nl_pair_del_cb: Remove listener when the last interface pair is removed. Socket is opened, file is added to VPP's epoll, with lcp_nl_read_cb() and lcp_nl_error_cb() callbacks installed. - lcp_nl_read_cb() calls lcp_nl_callback() which pushes netlink messages onto a queue and issues an NL_EVENT_READ event; any socket read error issues an NL_EVENT_READ_ERR event. - lcp_nl_error_cb() simply issues an NL_EVENT_READ_ERR event. Then, initialize a process node called lcp_nl_process(), which handles: - NL_EVENT_READ, which calls lcp_nl_process_msgs() - if messages are left in the queue, reschedule consumption after M msecs. This allows new netlink messages to continuously be read from the kernel, even if we have lots of messages to consume. - NL_EVENT_READ_ERR, which closes and reopens the netlink socket. lcp_nl_process_msgs() processes up to N messages and/or for up to M msecs, whichever comes first. For each, calling lcp_nl_dispatch(). lcp_nl_dispatch() ultimately just throws the message away after logging it with format_nl_object()	2021-08-23 23:04:50 +02:00
Pim van Pelt	f2ab33341f	Update README.md	2021-08-22 21:15:18 +02:00
Pim van Pelt	4705791743	Update README.md	2021-08-22 21:11:18 +02:00
Pim van Pelt	ea1568b7cc	Update README.md Add a functionality table with the current state of affairs.	2021-08-22 21:06:46 +02:00
Pim van Pelt	caba21ba99	Add README.md and LICENSE	2021-08-22 15:46:19 +02:00
Pim van Pelt	fee9776e87	Cleanup -- remove unused pair_add_sub()	2021-08-15 19:47:46 +02:00
Pim van Pelt	2d00de080b	Protect VPP -> Linux state propagation behind flag Introduce lcp_main.lcp_sync, which determines if state changes made to interfaces in VPP do or don't propagate into Linux. - Add a startup.conf directive 'lcp-sync' to enable at startup time. - Add CLI .short_help = "lcp lcp-sync [on|enable|off|disable]", - Show the current value in "show lcp". Gate changes in mtu, state and address on lcp_lcp_sync(). When the operator issues 'lcp lcp-sync on', it is prudent to do a one-off sync of all interface attributes from VPP into Linux. For this, add a lcp_itf_pair_sync_state_all() function.	2021-08-15 16:07:53 +02:00
Pim van Pelt	d23aab2d95	Add CLI for lcp-auto-subint In preparation of another feature 'netlink-auto-subint', rename lcp_main's field to "lcp_auto_subint". Add CLI .short_help = "lcp lcp-auto-subint [on|enable|off|disable]" Show status of the field on "lcp show" output.	2021-08-15 14:45:04 +02:00
Pim van Pelt	c2adb3262d	Fix bug in find_outer_vlan() I was looking at the hw interface list, which makes sense for ethernet devices but not for other devices, notably BondEthernets. In addition, creating two separate interfaces with the same outer (for example Gi3/0/0 dot1ad 2345 inner-dot1q 1000 AND the same on Gi3/0/1) would yield an erratic match and a crash. Switch to walking the sw interface list instead, and search for the sup_sw_if_index that has the desired outer. Result: BondEthernet0.{1234,1235,1236,1237} can be created and are functional.	2021-08-14 19:37:37 +02:00
Pim van Pelt	934446dcd9	Add automatic LCP creation Update the pair_config() parser to follow suit. When the configuration 'lcp-auto-subint' is set, and the interface at hand is a subinterface, in lcp_itf_interface_add_del(): - if it's a deletion and we're a sub-int, and we have a LIP: delete it. - if it's a creation and we're a sub-int, and our parent has a LIP, create one. Fix a few logging consistency issues (pair_del), and in pair_delete_by_index() ensure that the right namespace is selected. Due to this quirk with lip->lip_namespace not wanting to be a vec_dup() string, rewrite them all to be strdup/free instead.	2021-08-14 17:20:31 +02:00
Pim van Pelt	b3d8e75706	Start LCP auto-creation This first preparation moves lcp_itf_phy_add() to lcpng_if_sync.c and renames it lcp_itf_interface_add_del(). It does all the pre-flight checks to validate that a new device, given by sw_if_index, can have a LIP created: - must be a sub-int - must have a sw_sup_if_index, which itself has a LIP However, I realize that I cannot create an interface from within an interface add callback, so I'll have to schedule the child LIP to be created by a process, after the callback returns. I'll do that in the next commit.	2021-08-14 14:28:16 +02:00
Pim van Pelt	20e282655b	fixstyle	2021-08-14 11:27:06 +02:00
Pim van Pelt	10f10d534c	Ensure the plugin works well with namespaces I've made a few cosmetic adjustments: - introduce debug, info, notice, warn and err loggers - remove redundant logging, and set correct (conservative) log levels - turn the sync-state error into a warning And a little debt paydown: - refactor sync-state into its own function, to be called instead of all the spot fixes elsewhere. It's going to be the case that sync-state is "the reconciliation function". - Fix a bug in lip->lip_namespace copy: vec_dup() there doesn't seem to work for me, use strdup() instead and make a mental note to revisit. The plugin now works with 'lcpng default netns dataplane' in its startup.conf; and with 'lcp default netns dataplane' as its first command. A few of these fixes should go upstream, too, which I'll do next.	2021-08-14 09:40:43 +02:00
Pim van Pelt	aab11196cf	Sanitize some overly verbose logging ERR->{DBG,INFO}	2021-08-13 21:08:03 +02:00
Pim van Pelt	72f55fd901	Netlink namespaces! I have been very careless in using the correct network namespace when manipulating LCP host devices. Around any/every netlink write operation, we must first clib_setns() into the correct namespace. So, wrap every call of vnet_netlink_*() in all places. For consistency, use the convention 'curr_ns_fd' (for the one we are coming from) and 'vif_ns_fd' (to signal the one that the netlink VIF is in). Be careful as well to enter and exit everywhere without losing file descriptors.	2021-08-13 20:58:28 +02:00
Pim van Pelt	79a395b3c9	Clamp sub-int MTU to parent's MTU In VPP, a sub-interface can have a larger MTU than its parent. In Linux, this cannot happen, so the following is problematic: ``` DBGvpp# create sub TenGigabitEthernet3/0/0 10 DBGvpp# set int mtu packet 1500 TenGigabitEthernet3/0/0 DBGvpp# set int mtu packet 9000 TenGigabitEthernet3/0/0.10 694: e0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UNKNOWN mode DEFAULT group default qlen 1000 695: e0.10@e0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000 ``` The best way to ensure this works is to clamp the sub-int to a maximum MTU of that of its parent, and override the user request to change the VPP sub-int to anything higher than that, perhaps logging an error explaining why. This means two things: 1. Any change in VPP of a child MTU to a value larger than its parent's should be reverted. 2. Any change in VPP of a parent MTU should ensure all children are clamped to at most that.	2021-08-13 19:59:55 +02:00
Pim van Pelt	f7e1bb951d	Sync IPv4 and IPv6 addresses from VPP to LCP There are three ways in which IP addresses will want to be copied from VPP into the companion Linux devices: 1) set interface ip address ... adds an IPv4 or IPv6 address - this is handled by lcp_itf_ip[46]_add_del_interface_addr() which is a callback installed in lcp_itf_pair_init() 2) set interface ip address del ... removes them - also handled by lcp_itf_ip[46]_add_del_interface_addr() but curiously there is no upstream vnet_netlink_del_ip[46]_addr() so I wrote them inline here - I will try to get them upstreamed, as they appear to be obvious companions in vnet/device/netlink.h 3) Upon LIP creation, it could be that there are L3 addresses already on the VPP interface. If so, set them with lcp_itf_set_interface_addr() This means that now, at any time a new LIP is created, its state from VPP is fully copied over (MTU, Link state, IPv4/IPv6 addresses)! At runtime, new addresses can be set/removed as well.	2021-08-13 16:50:32 +02:00
Pim van Pelt	39bfa1615f	Add sync of VPP mtu changes into Linux interfaces This is a straightforward copy of the VPP L3 (packet) MTU into the Linux host interface, for any change on an interface/sub-int for which a LIP is defined. As with Linux itself, no care is taken to ensure that the parent interface (e0) has a higher MTU than the child interface (e0.10). DBGvpp# create sub TenGigabitEthernet3/0/0 10 DBGvpp# lcp create TenGigabitEthernet3/0/0 host-if e0 DBGvpp# lcp create TenGigabitEthernet3/0/0.10 host-if e0.10 602: e0: <BROADCAST,MULTICAST> mtu 9000 qdisc mq state DOWN mode DEFAULT group default qlen 1000 603: e0.10@e0: <BROADCAST,MULTICAST,M-DOWN> mtu 9000 qdisc noop state DOWN mode DEFAULT group default qlen 1000 DBGvpp# set interface mtu packet 9216 TenGigabitEthernet3/0/0 602: e0: <BROADCAST,MULTICAST> mtu 9216 qdisc mq state DOWN mode DEFAULT group default qlen 1000 603: e0.10@e0: <BROADCAST,MULTICAST,M-DOWN> mtu 9000 qdisc noop state DOWN mode DEFAULT group default qlen 1000 DBGvpp# set interface mtu packet 9050 TenGigabitEthernet3/0/0 602: e0: <BROADCAST,MULTICAST> mtu 9050 qdisc mq state DOWN mode DEFAULT group default qlen 1000 603: e0.10@e0: <BROADCAST,MULTICAST,M-DOWN> mtu 9000 qdisc noop state DOWN mode DEFAULT group default qlen 1000 DBGvpp# set interface mtu packet 9050 TenGigabitEthernet3/0/0.10 602: e0: <BROADCAST,MULTICAST> mtu 9050 qdisc mq state DOWN mode DEFAULT group default qlen 1000 603: e0.10@e0: <BROADCAST,MULTICAST,M-DOWN> mtu 9050 qdisc noop state DOWN mode DEFAULT group default qlen 1000 DBGvpp# set interface mtu packet 1500 TenGigabitEthernet3/0/0.10 602: e0: <BROADCAST,MULTICAST> mtu 9050 qdisc mq state DOWN mode DEFAULT group default qlen 1000 603: e0.10@e0: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000 DBGvpp# set interface mtu packet 1500 TenGigabitEthernet3/0/0 602: e0: <BROADCAST,MULTICAST> mtu 1500 qdisc mq state DOWN mode DEFAULT group default qlen 1000 603: e0.10@e0: 
<BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000	2021-08-13 15:02:31 +02:00
Pim van Pelt	d8c988273e	fixstyle, now with upstream .clang-* files	2021-08-13 14:37:12 +02:00
Pim van Pelt	cc61e7694d	fixstyle	2021-08-13 14:36:25 +02:00
Pim van Pelt	a3dc56c014	Force a sync of interfaces when VPP phy changes state When Linux changes link on a master interface, all of its children also change. This is not true in VPP, where bringing down a phy while its sub-ints are up will not change link on the sub-ints. We are forced to undo that change by walking the sub-interfaces of a phy and syncing their state back into linux. For simplicity, just walk all interfaces, as others will be a no-op. The approach may have to be revisited once netlink messages are consumed, to ensure there is not an oscillation where netlink sets a link, which forces all links to be reset, generating more netlink messages, etc. Care should be taken when netlink consumption comes into play! DBGvpp# create sub TenGigabitEthernet3/0/0 1 DBGvpp# lcp create TenGigabitEthernet3/0/0 host-if e0 DBGvpp# lcp create TenGigabitEthernet3/0/0.1 host-if e0.1 593: e0: <BROADCAST,MULTICAST> mtu 9000 qdisc mq state DOWN mode DEFAULT group default qlen 1000 594: e0.1@e0: <BROADCAST,MULTICAST,M-DOWN> mtu 9000 qdisc noop state DOWN mode DEFAULT group default qlen 1000 DBGvpp# set int state TenGigabitEthernet3/0/0 up 593: e0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UNKNOWN mode DEFAULT group default qlen 1000 594: e0.1@e0: <BROADCAST,MULTICAST> mtu 9000 qdisc noqueue state DOWN mode DEFAULT group default qlen 1000 DBGvpp# set int state TenGigabitEthernet3/0/0.1 up 593: e0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UNKNOWN mode DEFAULT group default qlen 1000 594: e0.1@e0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UP mode DEFAULT group default qlen 1000 DBGvpp# set int state TenGigabitEthernet3/0/0 down 593: e0: <BROADCAST,MULTICAST> mtu 9000 qdisc mq state DOWN mode DEFAULT group default qlen 1000 594: e0.1@e0: <BROADCAST,MULTICAST,M-DOWN> mtu 9000 qdisc noqueue state DOWN mode DEFAULT group default qlen 1000 DBGvpp# set int state TenGigabitEthernet3/0/0 up DBGvpp# set int state 
TenGigabitEthernet3/0/0.1 down 593: e0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP mode DEFAULT group default qlen 1000 594: e0.1@e0: <BROADCAST,MULTICAST> mtu 9000 qdisc noqueue state DOWN mode DEFAULT group default qlen 1000 DBGvpp# set int state TenGigabitEthernet3/0/0 down 593: e0: <BROADCAST,MULTICAST> mtu 9000 qdisc mq state DOWN mode DEFAULT group default qlen 1000 594: e0.1@e0: <BROADCAST,MULTICAST,M-DOWN> mtu 9000 qdisc noqueue state DOWN mode DEFAULT group default qlen 1000	2021-08-13 14:33:03 +02:00
Pim van Pelt	7c15c84f6c	Sync interface state from VPP to LCP This is the first in a series of functions that aims to copy forward interface changes in the VPP dataplane into the linux interfaces. Capture link state changes (set interface state ...) and apply them to Linux. There's an important dissonance here: - When Linux sets a parent interface up, all children also go up. ip link set enp66s0f1 down ip link add link enp66s0f1 name foo type vlan id 1234 ip link set foo down ip link | grep enp66s0f1 9: enp66s0f1: <BROADCAST,MULTICAST> mtu 9000 qdisc mq state DOWN mode DEFAULT group default qlen 1000 61: foo@enp66s0f1: <BROADCAST,MULTICAST,M-DOWN> mtu 9000 qdisc noop state DOWN mode DEFAULT group default qlen 1000 ip link set enp66s0f1 up ip link | grep s0f1 9: enp66s0f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP mode DEFAULT group default qlen 1000 61: foo@enp66s0f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UP mode DEFAULT group default qlen 1000 In VPP this is not so: there, each individual interface and sub-interface stands on its own. I think the proper fix here is to walk all sub-interfaces when a phy changes, and force a sync of those from VPP to LCP as well. I'll do that in a followup commit so it's easier to roll back.	2021-08-13 13:27:20 +02:00
Pim van Pelt	97b9894dce	Restore the plugin to its original state When I started in my copy, I removed a bunch of code and options that I felt were distracting. I also renamed lots of elements like 'linux-cp' and 'Linux CP' and 'Linux-CP' to just be 'lcpng'. Now, rename all of this back, and make it ready for upstreaming. The only diffs between my repo and upstream now are the includes and the lcpng_interface.[ch] code changes, which is good.	2021-08-12 21:26:34 +02:00
Pim van Pelt	a6e71359c5	Add the ability to create QinQ and QinAD In this situation, Linux and VPP really diverge. In VPP, any sub-interface can carry arbitrary configuration, they can be dot1q, dot1ad, with or without an inner dot1q. So the following is valid in VPP: vppctl create sub TenGigabitEthernet3/0/0 10 dot1ad 100 inner-dot1q 200 exact-match In Linux however, double tagged interfaces have to be created as a chain of two interfaces, first with the outer and then with the inner tag. So there is no equivalent of the above command in Linux, where we must: ip link add link e0 name e0.100 type vlan id 100 proto 802.1ad ip link add link e0.100 name e0.100.200 type vlan id 200 proto 802.1q So in order to create Q-in-Q sub-interfaces, for Linux their intermediary parent must exist, while in VPP this is not true. I have to make a compromise, so I'll be a bit more explicit and allow this type of LCP to be created under these conditions: * A sub-int exists with the intermediary (in this case, `dot1ad 100 exact-match`) * That sub-int itself has an LCP, with a Linux interface device that we can spawn the inner-dot1q 200 interface off of Creation of qinq and qinad interfaces becomes thus: vppctl create sub TenGigabitEthernet3/0/0 10 dot1ad 100 exact-match vppctl create sub TenGigabitEthernet3/0/0 11 dot1ad 100 inner-dot1q 200 exact-match vppctl lcpng create TenGigabitEthernet3/0/0 host-if e0 vppctl lcpng create TenGigabitEthernet3/0/0.10 host-if e0.10 vppctl lcpng create TenGigabitEthernet3/0/0.11 host-if e0.11 And the resulting situation in Linux: pim@hippo:~/src/lcpng$ ip link \| grep e0 397: e0: <BROADCAST,MULTICAST> mtu 9000 qdisc mq state DOWN mode DEFAULT group default qlen 1000 398: e0.10@e0: <BROADCAST,MULTICAST,M-DOWN> mtu 9000 qdisc noop state DOWN mode DEFAULT group default qlen 1000 399: e0.11@e0.10: <BROADCAST,MULTICAST,M-DOWN> mtu 9000 qdisc noop state DOWN mode DEFAULT group default qlen 1000	2021-08-12 20:06:40 +02:00
Pim van Pelt	7c3e1eaf3f	Fix two crashes and one potential crash If there are no LCPs defined yet, and I try to create an LCP for a sub-int, there is a NULL deref in lcp_itf_pair_get() because the pool hasn't been initialized yet. Fix this by returning NULL if the pool isn't initialized, and by catching the invalid `lip` and returning an error. While I was here, I took a look at a few possible crashes: - if pair_create() is called with an invalid phy_sw_if_index, catch this and return an error. - if pair_add_sub() is called but the parent cannot be found, catch this and return an error.	2021-08-12 16:38:10 +02:00
Pim van Pelt	d43b583c7b	Initialize the TAP host state from the VPP state DBGvpp# set interface state TenGigabitEthernet3/0/0 down DBGvpp# lcpng create TenGigabitEthernet3/0/0 host-if e0 DBGvpp# set interface state TenGigabitEthernet3/0/1 up DBGvpp# lcpng create TenGigabitEthernet3/0/1 host-if e1 Yields: pim@hippo:~/src/lcpng$ ip link show e0 304: e0: <BROADCAST,MULTICAST> mtu 9000 qdisc mq state DOWN mode DEFAULT group default qlen 1000 link/ether 68:05:ca:32:46:14 brd ff:ff:ff:ff:ff:ff pim@hippo:~/src/lcpng$ ip link show e1 305: e1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UNKNOWN mode DEFAULT group default qlen 1000 link/ether 68:05:ca:32:46:15 brd ff:ff:ff:ff:ff:ff	2021-08-12 15:22:35 +02:00
Pim van Pelt	de5c01280e	Refuse to create an LCP for sub-interfaces that do not have exact-match set	2021-08-12 13:01:00 +02:00
Pim van Pelt	954385fd86	Add the ability to set the VLAN protocol on the netlink interfaces -- this will be needed when the outer tag of the VPP interface is dot1ad rather than dot1q. Note: this change took forever to create! I was stumped by the set_protocol() call succeeding while the add_link() call returned 'Protocol mismatch'; it turned out to be simply a missing ntohs() wrapper on the protocol, pff. Golden hint on https://github.com/thom311/libnl/blob/master/lib/route/link/vlan.c#L454	2021-08-11 18:43:48 +02:00
Pim van Pelt	3c1ccbfcc6	Add a bunch of useful logging. This is to debug why creating QinQ subinterfaces is failing. Consider the following: vppctl create sub TenGigabitEthernet3/0/0 1234 vppctl create sub TenGigabitEthernet3/0/0 1235 dot1q 1234 inner-dot1q 1000 vppctl lcpng create TenGigabitEthernet3/0/0 host-if e0 vppctl lcpng create TenGigabitEthernet3/0/0.1234 host-if e0.1234 vppctl lcpng create TenGigabitEthernet3/0/0.1235 host-if e0.1235 Creating the sub-interfaces works, the first is a normal .1q with tag 1234. Creating its LCP works well, the parent is looked up, a netlink VLAN link is created as a child, and it gets tag 1234. Now the second one: it's also operating on parent Te3/0/0 which is looked up, but now a netlink VLAN link is created as a child, again with dot1q 1234: this interface already exists, so that's a no-go and an error is thrown. -- Thoughts on a fix for this: I think the fix is probably retrieving the correct lip, ie not the lip of the phy parent interface (e0) but the lip of the pair that has the outer vlan 1234 already (e0.1234), and then asking netlink to create a child interface with vlan 1235 on that e0.1234, rather than the phy e0. Although creating a dot1ad.dot1q or a dot1q.dot1q interface in VPP is strictly valid, we will not be able to succeed without the intermediate interface in the Linux model, so we return an error in that case.	2021-08-10 22:03:40 +02:00
Pim van Pelt	cc0be8341f	Add some rambling notes on VPP and LCP and linux interface naming	2021-08-10 20:49:55 +02:00
Pim van Pelt	eea724445f	Catch null on non-existent interface	2021-08-09 00:19:38 +02:00
Pim van Pelt	915e3598ac	Add a bunch more interface phy_add logging to help auto-create/delete LCP in a followup commit	2021-08-09 00:16:03 +02:00
Pim van Pelt	d11c4fccbf	Remove lcp_if_process() Remove the functionality that allows for a configuration of pairs in the startup.conf -- I intend to create interfaces for each/any phy, sub-interface and tunnel interface that is created. Instead of injecting events and having a process listening for them, simply use the callback to create LCP interfaces. This change removes the old code and sets a helpful logging entry on line 870, which is where future automatic creations and removals of LCP taps will occur. This way, three desirable usability properties are obtained: 1) on startup, all physical interfaces will be copied into LCP 2) on sub-interface or tunnel creation (or phy insertion), new interfaces will be created. 3) Deletions and removals will allow for auto-cleanup of the LCP.	2021-08-08 23:08:43 +02:00
Pim van Pelt	f408a55cde	Add a note (a mental note, really) on why we won't do netlink VPP interface creations	2021-08-08 23:03:48 +02:00
Pim van Pelt	69d7cb73cc	Copy L3 VPP interface MTU if available Still use a sensible default of 9216, but if the L3 packet size is set on the VPP interface, copy it forward (just as we do in the 'host' interface of the TAP itself, ie the interface created in the linux namespace). Now they will all line up initially.	2021-08-08 21:53:04 +02:00
Pim van Pelt	4b79f042bf	Remove hardcoded 9216 with ETHERNET_MAX_PACKET_BYTES	2021-08-08 21:40:56 +02:00
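
The conditions from commits a6e71359c5 and de5c01280e (a sub-interface needs exact-match, and a double-tagged sub-interface needs an intermediary sub-interface that already has an LCP) can be illustrated with a small model. This is only a sketch in Python, not the plugin's C code: the `SubInt` class, the `find_linux_parent()` helper and the dictionary of LCP pairs are hypothetical names invented for this illustration, and the requirement that the phy itself has an LCP is an assumption drawn from the parent lookup described in the log.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class SubInt:
    """Hypothetical, simplified view of a VPP sub-interface (not a VPP struct)."""
    name: str                        # e.g. "TenGigabitEthernet3/0/0.11"
    phy: str                         # name of the parent phy
    outer_proto: str                 # "dot1q" or "dot1ad"
    outer_vlan: int
    inner_vlan: Optional[int] = None
    exact_match: bool = False


def find_linux_parent(sub: SubInt, subints: List[SubInt],
                      lcps: Dict[str, str]) -> str:
    """Return the Linux host-if under which the VLAN child should be created.

    `lcps` maps a VPP interface name to its Linux host-if name.
    Raises ValueError when the conditions from the commit log are not met.
    """
    if not sub.exact_match:
        raise ValueError(f"{sub.name}: exact-match is required")
    if sub.phy not in lcps:
        # Assumption: the phy must have an LCP before any of its sub-ints.
        raise ValueError(f"{sub.name}: phy {sub.phy} has no LCP")
    if sub.inner_vlan is None:
        # Single-tagged: the Linux parent is the phy's host interface.
        return lcps[sub.phy]
    # Double-tagged: look for the single-tagged intermediary with the same
    # outer tag, which must itself already have an LCP.
    for cand in subints:
        is_intermediary = (
            cand.phy == sub.phy
            and cand.inner_vlan is None
            and cand.exact_match
            and cand.outer_proto == sub.outer_proto
            and cand.outer_vlan == sub.outer_vlan
        )
        if is_intermediary and cand.name in lcps:
            return lcps[cand.name]
    raise ValueError(f"{sub.name}: no intermediary sub-int with an LCP")


# The example from commit a6e71359c5.
PHY = "TenGigabitEthernet3/0/0"
OUTER = SubInt(PHY + ".10", PHY, "dot1ad", 100, exact_match=True)
INNER = SubInt(PHY + ".11", PHY, "dot1ad", 100, inner_vlan=200, exact_match=True)
SUBINTS = [OUTER, INNER]
LCPS = {PHY: "e0", OUTER.name: "e0.10"}

print(find_linux_parent(OUTER, SUBINTS, LCPS))  # e0
print(find_linux_parent(INNER, SUBINTS, LCPS))  # e0.10
```

In this model the inner interface resolves to `e0.10`, which matches the `e0.11@e0.10` line in the `ip link` output of that commit. Without an LCP on the `.10` sub-interface, the lookup fails, which corresponds to the error returned in the dot1q.dot1q case described in commit 3c1ccbfcc6.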


105 Commits