Temporary fix

Stop adding paths with add_special(); there is a scenario with Bird2
that makes this crash:
- Assume a VPP which has its fib fully synced
- Kill VPP
- Bird will see network devices remove, and mark all routes 'unreach'
- Start VPP
- Bird will see the devices come back, and issue netlink messages for
  each route that is unreach
- these become add_special() because they have no nexthop and are
  of type UNREACHABLE
- adding these to the FIB sometimes crashes in dpo handling

To avoid this, no longer add_special() -- as a caveat, manually inserted
routes to unreach/blackhole will not be explicitly added, however most
will be caught by fib-entry for default-route (which is a 'drop'). This
behavior should be fixed, but it's at the moment not obvious to me how
and I'd prefer this behavior over SIGABORT/SIGSEGV deeper in the code.
This commit is contained in:
Pim van Pelt
2021-08-29 17:07:40 +02:00
parent 8a57300b4c
commit 7c864ed099

View File

@ -470,13 +470,21 @@ lcp_nl_route_add (struct rtnl_route *rr)
lcp_nl_mk_route_prefix (rr, &pfx);
entry_flags = lcp_nl_mk_route_entry_flags (rtype, table_id, rproto);
/* Connected is already inserted by ip[46]_add_del_interface_address() */
if (entry_flags & FIB_ENTRY_FLAG_CONNECTED)
{
NL_DBG ("route_add: skip connected table %d prefix %U flags %U",
rtnl_route_get_table (rr), format_fib_prefix, &pfx,
format_fib_entry_flags, entry_flags);
return;
}
/* link local IPv6 */
if (FIB_PROTOCOL_IP6 == pfx.fp_proto &&
(ip6_address_is_multicast (&pfx.fp_addr.ip6) ||
ip6_address_is_link_local_unicast (&pfx.fp_addr.ip6)))
{
NL_DBG ("route_add: skip table %d prefix %U flags %U",
NL_DBG ("route_add: skip linklocal table %d prefix %U flags %U",
rtnl_route_get_table (rr), format_fib_prefix, &pfx,
format_fib_entry_flags, entry_flags);
return;
@ -489,8 +497,11 @@ lcp_nl_route_add (struct rtnl_route *rr)
};
rtnl_route_foreach_nexthop (rr, lcp_nl_route_path_parse, &np);
lcp_nl_route_path_add_special (rr, &np);
// TODO(pim) - figure out why we have spurious crashes when
// adding a route w/ nexthops {} or nexthops { idx 1 } on an
// empty FIB.
//
// lcp_nl_route_path_add_special (rr, &np);
if (0 != vec_len (np.paths))
{
@ -530,9 +541,16 @@ lcp_nl_route_add (struct rtnl_route *rr)
}
}
else
NL_ERROR ("route_add: no paths table %d prefix %U flags %U",
rtnl_route_get_table (rr), format_fib_prefix, &pfx,
format_fib_entry_flags, entry_flags);
// TODO(pim) - while the above add_special() is commented out, any
// route inserted tiwh unreach/prohibit/blackhole, for example when VPP
// is restarted, Bird will flip all routes in the RIB to 'unreach', and
// when it rediscovers devices and their connecteds, it will send a whole
// bunch of them back. Ignore for now. When add_special() is fixed, this
// should become a WARN again.
NL_INFO ("route_add: no paths table %d prefix %U flags %U netlink %U",
rtnl_route_get_table (rr), format_fib_prefix, &pfx,
format_fib_entry_flags, entry_flags, format_nl_object, rr);
vec_free (np.paths);
}