Add NEWROUTE/DELROUTE handler

This is super complicated work, taken mostly verbatim from the upstream
linux-cp Gerrit, with due credit mgsmith@netgate.com neale@grafiant.com

First, add main handler lcp_nl_route_add() and lcp_nl_route_del()

Introduce two FIB sources: one for manual routes, one for dynamic
routes.  See lcp_nl_proto_fib_source() fo details.

Add a bunch of helpers that translate Netlink message data into VPP
primitives:
- lcp_nl_mk_addr46()
    converts a Netlink nl_addr to a VPP ip46_address_t.

- lcp_nl_mk_route_prefix()
    converts a Netlink rtnl_route to a VPP fib_prefix_t.

- lcp_nl_mk_route_mprefix()
     converts a Netlink rtnl_route to a VPP mfib_prefix_t.

- lcp_nl_proto_fib_source()
    selects the most appropciate fib_src by looking at the rt_proto
    (see /etc/iproute2/rt_protos for a hint). Anything RTPROT_STATIC or
    better is 'fib_src', while anything above that becomes fib_src_dynamic.

- lcp_nl_mk_route_entry_flags()
    generates fib_entry_flag_t from the Netlink route type,
    table and proto metadata.

- lcp_nl_route_path_parse()
    converts a Netlink rtnl_nexthop to VPP fib_route_path_t and adds
    that to a growing list of paths.

- lcp_nl_route_path_add_special()
    adds a blackhole/unreach/prohibit route to the list of paths, in
    the special-case there is not yet a path for the destination.

Now we're ready to insert FIB entries:
- lcp_nl_table_find()
    selects the matching table-id,protocol(v4/v6) from a hash of tables.

- lcp_nl_table_add_or_lock()
    if at table-id,protocol(v4/v6) hasn't been used yet, create one,
    otherwise increment a table reference counter so we know how many
    FIB entries we have in this table. Then, return it.

- lcp_nl_table_unlock()
    Decrease the refcount on a table, and if no more prefixes are in
    the table, remove it from VPP.

- lcp_nl_route_del()
    Remove a route from the given table-id/protocol. Do this by applying
    rtnl_route_foreach_nexthop() to the list of Netlink nexthops,
    converting them into VPP paths in a lcp_nl_route_path_parse_t
    structure. If the route is for unreachable/blackhole/prohibit in
    Linux, add that path too.
    Then, remove the VPP paths from the FIB and reduce refcnt or
    remove the table if it's empty using table_unlock().

- lcp_nl_route_add()
    Not all routes are relevant for VPP. Those in table 255 are 'local'
    routes, already set up by ip[46]_address_add(), and some other route
    types are invalid, skip those. Link-local IPv6 and IPv6 multicast is
    also skipped. Then, construct lcp_nl_route_path_parse_t by walking
    the Netlink nexthops, and optionally add a special (in case the
    route was for unreachable/blackhole/prohibit in Linux -- those won't
    have a nexthop).
    Then, insert the VPP paths found in the Netlink message into the FIB
    or the multicast FIB, respectively.

And with that, Bird shoots to life. Both IPv4 and IPv6 OSPF interior
gateway protocol and BGP full tables can be consumed, on my bench in
about 9 seconds:
- A batch of 2048 Netlink messages is handled in 9-11ms, so we can do
  approx 200K messages/sec at peak (and this will consume 50% CPU due
  to the yielding logic in lcp_nl_process() (see the 'case
  NL_EVENT_READ' block that adds a cooldown period of
  LCP_NL_PROCESS_WAIT milliseconds between batches.
- With 3 route reflectors and 2 full BGP peers, at peak I could see
  309K messages left in the producer queue.

- All IPv4 and IPv6 prefixes made their way into the FIB
pim@hippo:~/src/lcpng$ echo -n "IPv6: "; vppctl sh ip6 fib summary | awk '$1~/[0-9]+/ { total += $2 } END { print total }'
IPv6: 132506
pim@hippo:~/src/lcpng$ echo -n "IPv4: "; vppctl sh ip fib summary | awk '$1~/[0-9]+/ { total += $2 } END { print total }'
IPv4: 869966

- Compared to Bird2's view:
pim@hippo:~/src/lcpng$ birdc show route count
BIRD 2.0.7 ready.
3477845 of 3477845 routes for 869942 networks in table master4
527887 of 527887 routes for 132484 networks in table master6
Total: 4005732 of 4005732 routes for 1002426 networks in 2 tables

- Flipping one of the full feeds to another, forcing a reconvergence
  of every prefix in the FIB took about 8 seconds, peaking at 242K
  messages in the queue, with again an average consumption of 2048
  messages per 9-10ms.

- All of this was done while iperf'ing 6Gbps to and from the
  controlplane.
---

Because handling full BGP table is O(1M) messages, I will have to make
some changes in the logging:
- all neigh/route messages become DBG/INFO at best
- all addr/link messages become INFO/NOTICE at best
- when we overflow time/msgs, turn process_msgs into a WARN, otherwise
  keep it at INFO so as not to spam.

In lcpng_interface.c:
- Log NOTICE for pair_add() and pair_del() call;
- Log NOTICE for set_interface_addr() call;

With this approach, setting the logging level of the linux-cp/nl plugin
to 'notice' hits the sweet spot: with things that the operator has
~explicitly done, leaving implicit actions (BGP route adds/dels, ARP/ND)
to stay below the NOTICE level.
This commit is contained in:
Pim van Pelt
2021-08-29 13:03:58 +02:00
parent 873c1e3591
commit 7a76498277
5 changed files with 540 additions and 53 deletions

View File

@ -227,15 +227,25 @@ lcp_itf_pair_add (u32 host_sw_if_index, u32 phy_sw_if_index, u8 *host_name,
lipi = lcp_itf_pair_find_by_phy (phy_sw_if_index);
LCP_ITF_PAIR_INFO ("pair_add: host:%U phy:%U, host_if:%v vif:%d ns:%s",
if (lipi != INDEX_INVALID)
return VNET_API_ERROR_VALUE_EXIST;
if (host_sw_if_index == ~0) {
LCP_ITF_PAIR_ERR ("pair_add: Cannot add LIP - invalid host");
return VNET_API_ERROR_INVALID_SW_IF_INDEX;
}
if (phy_sw_if_index == ~0) {
LCP_ITF_PAIR_ERR ("pair_add: Cannot add LIP - invalid phy");
return VNET_API_ERROR_INVALID_SW_IF_INDEX;
}
LCP_ITF_PAIR_NOTICE ("pair_add: Adding LIP for host:%U phy:%U, host_if:%v vif:%d ns:%s",
format_vnet_sw_if_index_name, vnet_get_main (),
host_sw_if_index, format_vnet_sw_if_index_name,
vnet_get_main (), phy_sw_if_index, host_name, host_index,
ns);
if (lipi != INDEX_INVALID)
return VNET_API_ERROR_VALUE_EXIST;
/*
* Create a new pair.
*/
@ -260,9 +270,6 @@ lcp_itf_pair_add (u32 host_sw_if_index, u32 phy_sw_if_index, u8 *host_name,
if (ns && ns[0] != 0)
lip->lip_namespace = (u8 *) strdup ((const char *) ns);
if (lip->lip_host_sw_if_index == ~0)
return 0;
/*
* First use of this host interface.
* Enable the x-connect feature on the host to send
@ -406,7 +413,7 @@ lcp_itf_pair_del (u32 phy_sw_if_index)
lip = lcp_itf_pair_get (lipi);
LCP_ITF_PAIR_INFO (
LCP_ITF_PAIR_NOTICE (
"pair_del: host:%U phy:%U host_if:%s vif:%d ns:%s",
format_vnet_sw_if_index_name, vnet_get_main (), lip->lip_host_sw_if_index,
format_vnet_sw_if_index_name, vnet_get_main (), lip->lip_phy_sw_if_index,
@ -673,7 +680,7 @@ lcp_itf_set_interface_addr (const lcp_itf_pair_t *lip)
foreach_ip_interface_address (
lm4, ia, lip->lip_phy_sw_if_index, 1 /* honor unnumbered */, ({
ip4_address_t *r4 = ip_interface_address_get_address (lm4, ia);
LCP_ITF_PAIR_INFO ("set_interface_addr: %U add ip4 %U/%d",
LCP_ITF_PAIR_NOTICE ("set_interface_addr: %U add ip4 %U/%d",
format_lcp_itf_pair, lip, format_ip4_address, r4,
ia->address_length);
vnet_netlink_add_ip4_addr (lip->lip_vif_index, r4, ia->address_length);
@ -683,7 +690,7 @@ lcp_itf_set_interface_addr (const lcp_itf_pair_t *lip)
foreach_ip_interface_address (
lm6, ia, lip->lip_phy_sw_if_index, 1 /* honor unnumbered */, ({
ip6_address_t *r6 = ip_interface_address_get_address (lm6, ia);
LCP_ITF_PAIR_INFO ("set_interface_addr: %U add ip6 %U/%d",
LCP_ITF_PAIR_NOTICE ("set_interface_addr: %U add ip6 %U/%d",
format_lcp_itf_pair, lip, format_ip6_address, r6,
ia->address_length);
vnet_netlink_add_ip6_addr (lip->lip_vif_index, r6, ia->address_length);
@ -842,7 +849,7 @@ lcp_itf_pair_create (u32 phy_sw_if_index, u8 *host_if_name,
* - if this is an inner VLAN, find the pair from the outer sub-int, which must exist.
*/
if (inner_vlan) {
LCP_ITF_PAIR_INFO ("pair_create: trying to create dot1%s %d inner-dot1q %d on %U",
LCP_ITF_PAIR_DBG ("pair_create: trying to create dot1%s %d inner-dot1q %d on %U",
sw->sub.eth.flags.dot1ad?"ad":"q", outer_vlan, inner_vlan,
format_vnet_sw_if_index_name, vnet_get_main (), hw->sw_if_index);
vlan=inner_vlan;
@ -858,7 +865,7 @@ lcp_itf_pair_create (u32 phy_sw_if_index, u8 *host_if_name,
return VNET_API_ERROR_INVALID_SW_IF_INDEX;
}
} else {
LCP_ITF_PAIR_INFO ("pair_create: trying to create dot1%s %d on %U",
LCP_ITF_PAIR_DBG ("pair_create: trying to create dot1%s %d on %U",
sw->sub.eth.flags.dot1ad?"ad":"q", outer_vlan,
format_vnet_sw_if_index_name, vnet_get_main (), hw->sw_if_index);
vlan=outer_vlan;