lcpng

ipng/lcpng

Author	SHA1	Message	Date
Pim van Pelt	ccd4b393e9	Roll back carrier set -- it works fine for phy's that are carrier-up, but crashes if they are carrier-down. 0: /home/pim/src/vpp/src/vnet/interface_funcs.h:46 (vnet_get_hw_interface) assertion `! pool_is_free (vnm->interface_main.hw_interfaces, _e)' fails at /home/pim/src/vpp/src/vppinfra/error.c:143 ns=0x7fff98774e80 "dataplane", host_sw_if_indexp=0x0) at /home/pim/src/vpp/src/plugins/lcpng/lcpng_interface.c:998 at /home/pim/src/vpp/src/plugins/lcpng/lcpng_if_cli.c:96 parent_command_index=371) at /home/pim/src/vpp/src/vlib/cli.c:591 parent_command_index=0) at /home/pim/src/vpp/src/vlib/cli.c:548 at /home/pim/src/vpp/src/vlib/cli.c:694	2021-12-24 21:12:24 +00:00
Pim van Pelt	ddd3ad372a	Only set carrier up when hw is up	2021-12-24 21:04:15 +00:00
Pim van Pelt	1bbe17d586	Merge branch 'main' of github.com:pimvanpelt/lcpng into main	2021-12-24 20:05:57 +00:00
Pim van Pelt	b659de9266	When creating a sub-int from Linux, ensure that the tap subint is set admin-up	2021-12-24 20:05:39 +00:00
Pim van Pelt	fffb1e892a	After fixing the feflags/frpflags bug, install specials again	2021-12-24 20:05:08 +00:00
Pim van Pelt	22e907555d	Initialize the TAP carrier based on the (hardware) phy	2021-12-24 20:04:08 +00:00
Pim van Pelt	37300abf84	Update README.md Add a hint about libmnl	2021-12-20 12:46:16 +01:00
Pim van Pelt	65fa49f30b	Fix crash if netns is not set at startup	2021-12-19 21:46:00 +00:00
Pim van Pelt	d36f34b91d	Fix type issue with route_path flags	2021-12-19 21:32:05 +00:00
Pim van Pelt	cd86f17454	Copy forward neale's improvement from upstream gerrit 33948	2021-11-29 22:26:44 +00:00
Pim van Pelt	a8879bfc54	Merge branch 'main' of github.com:pimvanpelt/lcpng into main	2021-11-29 20:19:46 +00:00
Pim van Pelt	cdf07cce34	Merge review feedback from mgsmith on upstream gerrit 33709 ps8..10	2021-11-29 20:19:34 +00:00
Pim van Pelt	cc2d6908a2	Merge review feedback from mgsmith on upstream gerrit 33709 ps8..10	2021-11-29 20:04:54 +00:00
Pim van Pelt	6caa5e8386	Followup of upstream 8e2b1b129815d3e631aa425ed37899c78ea24e65 addition of MFIB_ENTRY_FLAG_NONE	2021-11-07 18:21:58 +00:00
Pim van Pelt	852a590cf6	Update README.md	2021-10-08 07:56:54 +02:00
Pim van Pelt	bd8d8b40d6	Merge pull request #2 from theraphim/main Don't crash when adding second interface to default namespace.	2021-10-08 07:52:19 +02:00
Paul Komkoff	b72707d560	Don't crash when adding second interface to default namespace. Namespace members used to be char[], now they are vectors. Using strlen on a default value for these vectors results in a segmentation fault. Use vec_cmp instead. While at it, fix the output format to be %v.	2021-10-08 01:27:45 +04:00
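The crash above comes from calling strlen() on a vector that is not NUL-terminated, or is unset. The fix compares by length instead. Below is a minimal, standalone C sketch of that length-aware comparison. `vec_t`, `vec_length()` and `vec_compare()` are hypothetical stand-ins written in the spirit of vppinfra's `vec_cmp()`; they are not its actual implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a VPP vector: a length plus bytes that are
 * NOT NUL-terminated. A NULL vector means "unset" (length 0). */
typedef struct {
  size_t len;
  const char *data;
} vec_t;

static size_t vec_length(const vec_t *v) { return v ? v->len : 0; }

/* Length-aware comparison: never calls strlen(), so an unset (NULL) vector
 * and a non-terminated buffer are both handled safely. */
static int vec_compare(const vec_t *a, const vec_t *b) {
  size_t la = vec_length(a), lb = vec_length(b);
  size_t n = la < lb ? la : lb;
  /* n > 0 implies both vectors are non-NULL */
  int c = n ? memcmp(a->data, b->data, n) : 0;
  if (c != 0)
    return c;
  /* equal prefix: the shorter vector sorts first */
  return (la > lb) - (la < lb);
}
```

An unset namespace compares as the empty vector, so comparing it against any other namespace returns a value instead of faulting.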
Pim van Pelt	92e3835cbe	Merge pull request #1 from jin13417/main Update README.md	2021-09-18 17:01:57 +02:00
jinshaohui	311dba085a	Update README.md I think ln -s source_dir. link_dir, this is wrong.	2021-09-18 22:00:25 +08:00
Pim van Pelt	4ed9d02693	Fix non NULL terminated strings (namespace and hostname are vectors)	2021-09-09 20:04:15 +00:00
Pim van Pelt	e2ac348759	'ns' is now a vector, don't memcpy it, but vec_dup() instead	2021-09-08 21:38:45 +00:00
Pim van Pelt	043fecb0e0	Only find the parent tap if it's necessary (i.e. it doesn't already exist in vif_index); change by mgsmith@	2021-09-08 21:15:42 +00:00
Pim van Pelt	2c390ae512	Also set TAP carrier on netlink messages	2021-09-08 21:14:25 +00:00
Pim van Pelt	45cb9b4afc	Cleanup interface sync - move tap_set_carrier() upstream to lcp_itf_set_link_state() - refuse to set admin-up on sub-int if parent is down - no need to switch namespaces, lcp_itf_set_link_state() already does - in change_mtu and change_admin_state, if the interface is a sub, we only have to sync that one interface. Otherwise, walk the parent interface and all sub-ints with lcp_itf_pair_sync_state_hw() and make note of this in the (DBG) log	2021-09-08 20:53:02 +00:00
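The two sync rules in this commit can be sketched in C: refuse admin-up on a sub-int whose parent is down, and sync only the sub-int or the parent plus all of its sub-ints. The sketch below assumes a hypothetical flat array of interfaces indexed by `sw_if_index`. `itf_t`, `may_set_admin_up()` and `sync_scope()` are illustrative names, not lcpng functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified interface record; itfs[i].sw_if_index == i. */
typedef struct {
  int sw_if_index;
  int parent_sw_if_index; /* -1 for a phy, else the phy's index */
  bool admin_up;
} itf_t;

/* Refuse admin-up on a sub-interface whose parent is down. */
static bool may_set_admin_up(const itf_t *itfs, const itf_t *itf) {
  if (itf->parent_sw_if_index < 0)
    return true; /* a phy has no parent to check */
  return itfs[itf->parent_sw_if_index].admin_up;
}

/* Count the interfaces a state change must sync: a sub-int syncs only
 * itself; a phy syncs itself plus every sub-int whose parent it is. */
static size_t sync_scope(const itf_t *itfs, size_t n, const itf_t *changed) {
  if (changed->parent_sw_if_index >= 0)
    return 1;
  size_t count = 1;
  for (size_t i = 0; i < n; i++)
    if (itfs[i].parent_sw_if_index == changed->sw_if_index)
      count++;
  return count;
}
```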
Pim van Pelt	ba4d9d1a3c	Add lcp_itf_pair_sync_state_hw() and only walk relevant int+subints, not all interfaces	2021-09-08 19:50:50 +00:00
Pim van Pelt	8b3356cd86	if sub_interface fails to create, return error and don't continue (fixes a crash)	2021-09-08 19:50:27 +00:00
Pim van Pelt	3c806d586d	Accommodate Netgate's usecase, they create the linux netlink device first, and then call the pair_create; in that case, linux_parent_if_index already exists; simplify the call path here, h/t mgsmith@	2021-09-08 18:41:35 +00:00
Pim van Pelt	36f1ebfdae	Move check for parent_if_index up copy review change from mgsmith@	2021-09-07 22:10:54 +00:00
Pim van Pelt	fdab236755	backport fix from mgsmith@ in VPP main repo	2021-09-07 21:13:55 +00:00
Pim van Pelt	15efc4efc2	Copy review notes from mgsmith@ from https://gerrit.fd.io/r/c/vpp/+/33481/12..13	2021-08-30 20:37:05 +00:00
Pim van Pelt	98a84d0fa7	Turn lip_namespace into a vector	2021-08-30 20:31:53 +00:00
Pim van Pelt	aa9158e1a2	Add logline in case setting rx/tx size fails	2021-08-29 22:02:57 +00:00
Pim van Pelt	9f7286d285	Update INFO logline	2021-08-29 19:10:03 +00:00
Pim van Pelt	fe5b52504f	Restore insertion of connected routes	2021-08-29 18:50:28 +00:00
Pim van Pelt	82aa1bbc74	Merge from upstream	2021-08-29 17:52:22 +02:00
Pim van Pelt	6d86d2b075	Allow larger fraction of CPU to be used by netlink Instead of doing BATCH_DELAY_MS work and BATCH_DELAY_MS sleep, add a BATCH_WORK_MS (40) and lower BATCH_DELAY_MS (10), so we'll work 80% of the time, and consume netlink messages 20% of the time. Also raise the total batch size to 8K because on my test machine we run 2K in 13ms or 8K in ~50ms.	2021-08-29 17:47:33 +02:00
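The numbers in this commit can be checked with a small C sketch. With 40 ms of work and 10 ms of delay per cycle, the work phase takes 40 / (40 + 10) = 80% of the time. Scaling the measured 2K messages in 13 ms linearly gives 52 ms for 8K, consistent with the "~50ms" observed. The macro names follow the commit message, but the helper functions are hypothetical and not the plugin's code.

```c
#include <assert.h>

/* Values quoted in the commit message; standalone illustration only. */
#define BATCH_WORK_MS 40
#define BATCH_DELAY_MS 10

/* Percentage of each work+delay cycle spent in the work phase. */
static int work_percent(int work_ms, int delay_ms) {
  return 100 * work_ms / (work_ms + delay_ms);
}

/* Linear estimate of batch time from one measured (msgs, ms) sample. */
static double batch_ms(double sample_msgs, double sample_ms, double batch) {
  return sample_ms * batch / sample_msgs;
}
```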
Pim van Pelt	7c864ed099	Temporary fix Stop adding paths with add_special(); there is a scenario with Bird2 that makes this crash: - Assume a VPP which has its fib fully synced - Kill VPP - Bird will see network devices removed, and mark all routes 'unreach' - Start VPP - Bird will see the devices come back, and issue netlink messages for each route that is unreach - these become add_special() because they have no nexthop and are of type UNREACHABLE - adding these to the FIB sometimes crashes in dpo handling To avoid this, no longer add_special() -- as a caveat, manually inserted routes to unreach/blackhole will not be explicitly added, however most will be caught by fib-entry for default-route (which is a 'drop'). This behavior should be fixed, but it's at the moment not obvious to me how and I'd prefer this behavior over SIGABRT/SIGSEGV deeper in the code.	2021-08-29 17:07:40 +02:00
Pim van Pelt	8a57300b4c	Tidy up locking This is a little bit of a performance hit (consuming 2K msgs was 11ms, is now 18ms) but putting the barrier locks inline is fragile and will eventually cause an issue. As with Matt's pending plugin, sync and release the barrier lock around the entire handler, rather than in-line. Contrary to Matt's implementation, I am also going to lock route_add() and route_del() because without the locking, I get spurious crashes.	2021-08-29 16:59:18 +02:00
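The locking change in this commit takes the barrier once around the whole handler instead of inline. A hedged C sketch of that pattern follows. `barrier_sync()` and `barrier_release()` are hypothetical stand-ins for VPP's `vlib_worker_thread_barrier_sync()`/`vlib_worker_thread_barrier_release()` and only track nesting depth here. `dispatch_locked()` and `route_add_handler()` are illustrative names.

```c
#include <assert.h>

/* Hypothetical barrier stand-ins: they only count nesting depth so the
 * pattern can be exercised outside VPP. */
static int barrier_depth = 0;
static void barrier_sync(void) { barrier_depth++; }
static void barrier_release(void) {
  assert(barrier_depth > 0);
  barrier_depth--;
}

typedef void (*nl_handler_fn)(void *msg);

/* Take the barrier once around the whole handler instead of placing
 * sync/release pairs inside it: any early return inside fn still comes
 * back here, so the release cannot be skipped. */
static void dispatch_locked(nl_handler_fn fn, void *msg) {
  barrier_sync();
  fn(msg);
  barrier_release();
}

/* Example handler: records whether it ran while the barrier was held. */
static void route_add_handler(void *msg) { *(int *)msg = (barrier_depth == 1); }
```

The trade-off matches the commit: the barrier is held for longer (a measurable slowdown), but no code path inside a handler can forget to release it.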
Pim van Pelt	76c8b53f41	fix silly crash in logging	2021-08-29 15:03:39 +02:00
Pim van Pelt	7a76498277	Add NEWROUTE/DELROUTE handler This is super complicated work, taken mostly verbatim from the upstream linux-cp Gerrit, with due credit mgsmith@netgate.com neale@grafiant.com First, add main handler lcp_nl_route_add() and lcp_nl_route_del() Introduce two FIB sources: one for manual routes, one for dynamic routes. See lcp_nl_proto_fib_source() for details. Add a bunch of helpers that translate Netlink message data into VPP primitives: - lcp_nl_mk_addr46() converts a Netlink nl_addr to a VPP ip46_address_t. - lcp_nl_mk_route_prefix() converts a Netlink rtnl_route to a VPP fib_prefix_t. - lcp_nl_mk_route_mprefix() converts a Netlink rtnl_route to a VPP mfib_prefix_t. - lcp_nl_proto_fib_source() selects the most appropriate fib_src by looking at the rt_proto (see /etc/iproute2/rt_protos for a hint). Anything RTPROT_STATIC or better is 'fib_src', while anything above that becomes fib_src_dynamic. - lcp_nl_mk_route_entry_flags() generates fib_entry_flag_t from the Netlink route type, table and proto metadata. - lcp_nl_route_path_parse() converts a Netlink rtnl_nexthop to VPP fib_route_path_t and adds that to a growing list of paths. - lcp_nl_route_path_add_special() adds a blackhole/unreach/prohibit route to the list of paths, in the special case where there is not yet a path for the destination. Now we're ready to insert FIB entries: - lcp_nl_table_find() selects the matching table-id,protocol(v4/v6) from a hash of tables. - lcp_nl_table_add_or_lock() if a table-id,protocol(v4/v6) hasn't been used yet, create one, otherwise increment a table reference counter so we know how many FIB entries we have in this table. Then, return it. - lcp_nl_table_unlock() Decrease the refcount on a table, and if no more prefixes are in the table, remove it from VPP. - lcp_nl_route_del() Remove a route from the given table-id/protocol. 
Do this by applying rtnl_route_foreach_nexthop() to the list of Netlink nexthops, converting them into VPP paths in an lcp_nl_route_path_parse_t structure. If the route is for unreachable/blackhole/prohibit in Linux, add that path too. Then, remove the VPP paths from the FIB and reduce refcnt or remove the table if it's empty using table_unlock(). - lcp_nl_route_add() Not all routes are relevant for VPP. Those in table 255 are 'local' routes, already set up by ip[46]_address_add(), and some other route types are invalid, skip those. Link-local IPv6 and IPv6 multicast are also skipped. Then, construct lcp_nl_route_path_parse_t by walking the Netlink nexthops, and optionally add a special (in case the route was for unreachable/blackhole/prohibit in Linux -- those won't have a nexthop). Then, insert the VPP paths found in the Netlink message into the FIB or the multicast FIB, respectively. And with that, Bird shoots to life. Both IPv4 and IPv6 OSPF interior gateway protocol and BGP full tables can be consumed, on my bench in about 9 seconds: - A batch of 2048 Netlink messages is handled in 9-11ms, so we can do approx 200K messages/sec at peak (and this will consume 50% CPU due to the yielding logic in lcp_nl_process(); see the 'case NL_EVENT_READ' block that adds a cooldown period of LCP_NL_PROCESS_WAIT milliseconds between batches). - With 3 route reflectors and 2 full BGP peers, at peak I could see 309K messages left in the producer queue. - All IPv4 and IPv6 prefixes made their way into the FIB pim@hippo:~/src/lcpng$ echo -n "IPv6: "; vppctl sh ip6 fib summary | awk '$1~/[0-9]+/ { total += $2 } END { print total }' IPv6: 132506 pim@hippo:~/src/lcpng$ echo -n "IPv4: "; vppctl sh ip fib summary | awk '$1~/[0-9]+/ { total += $2 } END { print total }' IPv4: 869966 - Compared to Bird2's view: pim@hippo:~/src/lcpng$ birdc show route count BIRD 2.0.7 ready. 
3477845 of 3477845 routes for 869942 networks in table master4 527887 of 527887 routes for 132484 networks in table master6 Total: 4005732 of 4005732 routes for 1002426 networks in 2 tables - Flipping one of the full feeds to another, forcing a reconvergence of every prefix in the FIB took about 8 seconds, peaking at 242K messages in the queue, with again an average consumption of 2048 messages per 9-10ms. - All of this was done while iperf'ing 6Gbps to and from the controlplane. --- Because handling full BGP table is O(1M) messages, I will have to make some changes in the logging: - all neigh/route messages become DBG/INFO at best - all addr/link messages become INFO/NOTICE at best - when we overflow time/msgs, turn process_msgs into a WARN, otherwise keep it at INFO so as not to spam. In lcpng_interface.c: - Log NOTICE for pair_add() and pair_del() call; - Log NOTICE for set_interface_addr() call; With this approach, setting the logging level of the linux-cp/nl plugin to 'notice' hits the sweet spot: with things that the operator has ~explicitly done, leaving implicit actions (BGP route adds/dels, ARP/ND) to stay below the NOTICE level.	2021-08-29 14:57:21 +02:00
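Two decisions in this handler can be sketched in C: choosing the FIB source from `rt_proto`, and filtering out routes VPP does not need. "RTPROT_STATIC or better" is read here as numerically `<= RTPROT_STATIC`, so routing daemons such as BIRD (`RTPROT_BIRD`) land in the dynamic source. The constants are copied from `<linux/rtnetlink.h>` and defined locally for portability. The enum and function names are hypothetical. The filter omits the "invalid route type" check the commit also mentions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values from <linux/rtnetlink.h>, defined locally to stay portable. */
#define RTPROT_STATIC 4    /* route installed by the administrator */
#define RTPROT_BIRD 12     /* route installed by BIRD */
#define RT_TABLE_LOCAL 255 /* kernel 'local' table */

/* Hypothetical labels for the two FIB sources described above. */
typedef enum { SRC_MANUAL, SRC_DYNAMIC } fib_src_kind_t;

/* RTPROT_STATIC or better (numerically <= 4) -> manual source;
 * anything above it (routing daemons) -> dynamic source. */
static fib_src_kind_t proto_fib_source(uint8_t rt_proto) {
  return rt_proto <= RTPROT_STATIC ? SRC_MANUAL : SRC_DYNAMIC;
}

/* Skip the local table (255), IPv6 link-local (fe80::/10) and
 * IPv6 multicast (ff00::/8) destinations. */
static bool route_is_relevant(uint32_t table, bool is_ip6, const uint8_t dst[16]) {
  if (table == RT_TABLE_LOCAL)
    return false;
  if (is_ip6) {
    if (dst[0] == 0xfe && (dst[1] & 0xc0) == 0x80)
      return false; /* fe80::/10 */
    if (dst[0] == 0xff)
      return false; /* ff00::/8 */
  }
  return true;
}
```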
Pim van Pelt	47fd53be42	Add NEWROUTE/DELROUTE handler This is super complicated work, taken mostly verbatim from the upstream linux-cp Gerrit, with due credit mgsmith@netgate.com neale@grafiant.com First, add main handler lcp_nl_route_add() and lcp_nl_route_del() Introduce two FIB sources: one for manual routes, one for dynamic routes. See lcp_nl_proto_fib_source() for details. Add a bunch of helpers that translate Netlink message data into VPP primitives: - lcp_nl_mk_addr46() converts a Netlink nl_addr to a VPP ip46_address_t. - lcp_nl_mk_route_prefix() converts a Netlink rtnl_route to a VPP fib_prefix_t. - lcp_nl_mk_route_mprefix() converts a Netlink rtnl_route to a VPP mfib_prefix_t. - lcp_nl_proto_fib_source() selects the most appropriate fib_src by looking at the rt_proto (see /etc/iproute2/rt_protos for a hint). Anything RTPROT_STATIC or better is 'fib_src', while anything above that becomes fib_src_dynamic. - lcp_nl_mk_route_entry_flags() generates fib_entry_flag_t from the Netlink route type, table and proto metadata. - lcp_nl_route_path_parse() converts a Netlink rtnl_nexthop to VPP fib_route_path_t and adds that to a growing list of paths. - lcp_nl_route_path_add_special() adds a blackhole/unreach/prohibit route to the list of paths, in the special case where there is not yet a path for the destination. Now we're ready to insert FIB entries: - lcp_nl_table_find() selects the matching table-id,protocol(v4/v6) from a hash of tables. - lcp_nl_table_add_or_lock() if a table-id,protocol(v4/v6) hasn't been used yet, create one, otherwise increment a table reference counter so we know how many FIB entries we have in this table. Then, return it. - lcp_nl_table_unlock() Decrease the refcount on a table, and if no more prefixes are in the table, remove it from VPP. - lcp_nl_route_del() Remove a route from the given table-id/protocol. 
Do this by applying rtnl_route_foreach_nexthop() to the list of Netlink nexthops, converting them into VPP paths in an lcp_nl_route_path_parse_t structure. If the route is for unreachable/blackhole/prohibit in Linux, add that path too. Then, remove the VPP paths from the FIB and reduce refcnt or remove the table if it's empty using table_unlock(). - lcp_nl_route_add() Not all routes are relevant for VPP. Those in table 255 are 'local' routes, already set up by ip[46]_address_add(), and some other route types are invalid, skip those. Link-local IPv6 and IPv6 multicast are also skipped. Then, construct lcp_nl_route_path_parse_t by walking the Netlink nexthops, and optionally add a special (in case the route was for unreachable/blackhole/prohibit in Linux -- those won't have a nexthop). Then, insert the VPP paths found in the Netlink message into the FIB or the multicast FIB, respectively. And with that, Bird shoots to life. Both IPv4 and IPv6 OSPF interior gateway protocol and BGP full tables can be consumed, on my bench in about 9 seconds: - A batch of 2048 Netlink messages is handled in 9-11ms, so we can do approx 200K messages/sec at peak (and this will consume 50% CPU due to the yielding logic in lcp_nl_process(); see the 'case NL_EVENT_READ' block that adds a cooldown period of LCP_NL_PROCESS_WAIT milliseconds between batches). - With 3 route reflectors and 2 full BGP peers, at peak I could see 309K messages left in the producer queue. - All IPv4 and IPv6 prefixes made their way into the FIB pim@hippo:~/src/lcpng$ echo -n "IPv6: "; vppctl sh ip6 fib summary | awk '$1~/[0-9]+/ { total += $2 } END { print total }' IPv6: 132506 pim@hippo:~/src/lcpng$ echo -n "IPv4: "; vppctl sh ip fib summary | awk '$1~/[0-9]+/ { total += $2 } END { print total }' IPv4: 869966 - Compared to Bird2's view: pim@hippo:~/src/lcpng$ birdc show route count BIRD 2.0.7 ready. 
3477845 of 3477845 routes for 869942 networks in table master4 527887 of 527887 routes for 132484 networks in table master6 Total: 4005732 of 4005732 routes for 1002426 networks in 2 tables - Flipping one of the full feeds to another, forcing a reconvergence of every prefix in the FIB took about 8 seconds, peaking at 242K messages in the queue, with again an average consumption of 2048 messages per 9-10ms. - All of this was done while iperf'ing 6Gbps to and from the controlplane. --- Because handling full BGP table is O(1M) messages, I will have to make some changes in the logging: - all neigh/route messages become DBG/INFO at best - all addr/link messages become INFO/NOTICE at best - when we overflow time/msgs, turn process_msgs into a WARN, otherwise keep it at INFO so as not to spam. In lcpng_interface.c: - Log NOTICE for pair_add() and pair_del() call; - Log NOTICE for set_interface_addr() call; With this approach, setting the logging level of the linux-cp/nl plugin to 'notice' hits the sweet spot: with things that the operator has ~explicitly done, leaving implicit actions (BGP route adds/dels, ARP/ND) to stay below the NOTICE level.	2021-08-29 13:52:11 +02:00
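The table lifecycle described for lcp_nl_table_find(), lcp_nl_table_add_or_lock() and lcp_nl_table_unlock() is reference counting keyed by (table-id, protocol). Here is a minimal C sketch of that idea. It assumes a small fixed array instead of the hash the commit mentions, and it models "remove it from VPP" as freeing the slot. All names below are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_TABLES 16 /* hypothetical registry size */

typedef struct {
  uint32_t table_id;
  bool is_ip6;
  unsigned refcnt; /* number of prefixes this table currently holds */
  bool in_use;
} nl_table_t;

static nl_table_t tables[MAX_TABLES];

static nl_table_t *table_find(uint32_t table_id, bool is_ip6) {
  for (int i = 0; i < MAX_TABLES; i++)
    if (tables[i].in_use && tables[i].table_id == table_id &&
        tables[i].is_ip6 == is_ip6)
      return &tables[i];
  return NULL;
}

/* Create the table on first use, otherwise bump its reference count. */
static nl_table_t *table_add_or_lock(uint32_t table_id, bool is_ip6) {
  nl_table_t *t = table_find(table_id, is_ip6);
  if (!t) {
    for (int i = 0; i < MAX_TABLES && !t; i++)
      if (!tables[i].in_use)
        t = &tables[i];
    if (!t)
      return NULL; /* registry full */
    *t = (nl_table_t){.table_id = table_id, .is_ip6 = is_ip6, .in_use = true};
  }
  t->refcnt++;
  return t;
}

/* Drop one reference; remove the table once it holds no prefixes. */
static void table_unlock(nl_table_t *t) {
  assert(t && t->refcnt > 0);
  if (--t->refcnt == 0)
    t->in_use = false;
}
```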
Pim van Pelt	873c1e3591	Capture a breadcrumb about BondEthernet0 packetlo	2021-08-29 12:59:08 +02:00
Pim van Pelt	a474d09a9e	Fix crash when vlib_buffer_copy() fails - also sent upstream in https://gerrit.fd.io/r/c/vpp/+/33606	2021-08-26 16:43:13 +02:00
Pim van Pelt	45f4088656	Add ability to create subints from Linux Using the earlier placeholder hint in lcp_nl_link_add(), I know that I've gotten a NEWLINK request but the linux ifindex doesn't have a LIP. This could be because the interface is entirely foreign to VPP, for example somebody created a dummy interface or a VLAN subint on one: ip link add dum0 type dummy ip link add link dum0 name dum0.10 type vlan id 10 Or, I'm actually trying to create a VLAN subint, like these: ip link add link e0 name e0.1234 type vlan id 1234 ip link add link e0.1234 name e0.1235 type vlan id 1000 ip link add link e0 name e0.1236 type vlan id 2345 proto 802.1ad ip link add link e0.1236 name e0.1237 type vlan id 1000 None of these NEWLINK callbacks, represented by vif (linux interface id) will have a corresponding LIP. So, I try to create one by calling lcp_nl_link_add_vlan(). Here, I look up the parent index ('dum0' or 'e0' in the first examples), the former of which also doesn't have a LIP, so I bail. If it does, I still have two choices: 1) the LIP is a phy (i.e. TenGigabitEthernet3/0/0) and this is a regular tagged interface; or 2) the LIP is itself a subint (i.e. TenGigabitEthernet3/0/0.1234) and what I'm asking for is a QinQ or QinAD sub-interface. So I also look up the phy LIP. We now have all the ingredients I need to create the VPP sub-interfaces with the correct inner-dot1q and outer dot1q or dot1ad. Of course, I don't really know what subinterface ID to use. It's appealing to "just" use the vlan, but that's not helpful if the outer tag and the inner tag are the same. So I write a helper function vnet_sw_interface_get_available_subid() whose job it is to return an unused subid for the phy -- starting from 1. I then create the phy sub-interface and the tap sub-interface, tying them together into a new LIP. During these interface creations, I want to make sure that if lcp-auto-subint is on, we disable that. I don't want VPP racing to create LIPs for the sub-ints right now. 
Before I return (either in error state or upon success), I put back the lcp-auto-subint to what it was before. If I manage to create the LIP, huzzah. I return it to the caller so it can continue setting link/mac/mtu etc.	2021-08-24 23:40:14 +02:00
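The commit describes vnet_sw_interface_get_available_subid() only as returning an unused subid for the phy, starting from 1. The C sketch below shows one straightforward way to do that over a plain list of used IDs. `first_free_subid()` is a hypothetical helper, not the real function, which works against VPP's interface tables.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Return the lowest sub-interface ID >= 1 that does not appear in
 * used[0..n_used-1]. With n_used IDs taken, one of 1..n_used+1 is free,
 * so the loop always terminates. */
static uint32_t first_free_subid(const uint32_t *used, size_t n_used) {
  for (uint32_t id = 1;; id++) {
    bool taken = false;
    for (size_t i = 0; i < n_used && !taken; i++)
      taken = (used[i] == id);
    if (!taken)
      return id;
  }
}
```

Picking the first free ID, rather than reusing the VLAN tag as the subid, avoids a collision when the outer and inner tags are equal. The commit gives exactly that as its reason.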
Pim van Pelt	04b4aebc97	Merge branch 'main' of github.com:pimvanpelt/lcpng into main	2021-08-24 18:18:51 +02:00
Pim van Pelt	a3a5f68926	Add newlink/delink processing. - Can up/down a link. - Can set MAC on a link, if it's a phy. - Can set MTU on a link. - Can delete link (including phy). Because link state and mtu changes tend to go around in circles (from netlink -> vpp; and then with lcp-sync on, as well from vpp -> netlink) when we consume a batch of netlink messages, we'll temporarily turn off lcp-sync if it's enabled. TODO (in the next commit), the whole nine yards of creating interfaces in VPP based on NEWLINK vlans that come in. Conceptually not too difficult: if NEWLINK doesn't have a LIP associated with it, but it's a VLAN, and the parent of the VLAN is a link which _does_ have a LIP, then we can create the subint in VPP in the correct way.	2021-08-24 18:18:23 +02:00
Pim van Pelt	e604dd3478	Add newlink/delink processing. - Can up/down a link. - Can set MAC on a link, if it's a phy. - Can set MTU on a link. - Can delete link (including phy). Because link state and mtu changes tend to go around in circles (from netlink -> vpp; and then with lcp-sync on, as well from vpp -> netlink) when we consume a batch of netlink messages, we'll temporarily turn off lcp-sync if it's enabled. TODO (in the next commit), the whole nine yards of creating interfaces in VPP based on NEWLINK vlans that come in. Conceptually not too difficult: if NEWLINK doesn't have a LIP associated with it, but it's a VLAN, and the parent of the VLAN is a link which _does_ have a LIP, then we can create the subint in VPP in the correct way.	2021-08-24 18:11:51 +02:00
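The loop-breaking idea is to turn lcp-sync off while a netlink batch is consumed, then put back the operator's setting. A minimal C sketch of that save/disable/restore pattern follows. `lcp_sync`, `on_vpp_link_change()` and `process_batch()` are hypothetical names, and the counter merely stands in for "echo the change back to Linux".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag standing in for the plugin's lcp-sync setting. */
static bool lcp_sync = true;
static int vpp_to_linux_updates = 0;

/* VPP-side change hook: with lcp-sync on, the change is echoed back to
 * Linux, which would emit another netlink message, and so on. */
static void on_vpp_link_change(void) {
  if (lcp_sync)
    vpp_to_linux_updates++;
}

/* Apply a batch of netlink-originated changes with lcp-sync temporarily
 * off, then restore whatever value was configured before. */
static void process_batch(int n_msgs) {
  bool saved = lcp_sync;
  lcp_sync = false;
  for (int i = 0; i < n_msgs; i++)
    on_vpp_link_change();
  lcp_sync = saved;
}
```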
Pim van Pelt	6d2ce1cd83	Consolidate MTU and Link changes into one function.	2021-08-24 17:50:44 +02:00
Pim van Pelt	c02656de22	Add thread barriers on non-mp-safe calls	2021-08-24 01:13:16 +02:00
Pim van Pelt	d63fbd8a9a	Allow NO_SUCH_ENTRY to count as successful 'removal' of neighbors. When an address is removed, VPP will invalidate the neighbor cache. This change allows the subsequent gratuitous neigh deletion from Linux to be harmless.	2021-08-24 00:52:26 +02:00
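The rule in this commit amounts to a small predicate: a neighbor delete that finds nothing still counts as success, because VPP has already flushed the entry. A hedged C sketch follows. The `rv_t` codes are hypothetical; VPP's real return codes are its `VNET_API_ERROR_*` values.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical return codes used only for this illustration. */
typedef enum { RV_OK = 0, RV_NO_SUCH_ENTRY, RV_INVALID_VALUE } rv_t;

/* Treat "entry already gone" as a successful removal: the neighbor was
 * flushed when its address was removed, so Linux's later delete is a no-op. */
static bool neigh_del_succeeded(rv_t rv) {
  return rv == RV_OK || rv == RV_NO_SUCH_ENTRY;
}
```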

93 Commits