30 Commits

Author SHA1 Message Date
cb78074e46 Fix logline tag 2024-03-23 10:04:10 +01:00
3ecbb0199e Backport gerrit 40142 from upstream 2024-02-09 11:08:07 +01:00
153628a3de Basic MPLS support.
1) Imports ENCAP_MPLS labels from IPv4/IPv6 routes.
Note that this requires libnl 3.6.0 or newer.

In previous patches, the fib_path_ext_t had a path ID of -1.
After a long investigation, this turned out to be caused by the route
weight being set to 0. A comment in the code explains the details.
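A minimal sketch of the kind of guard this implies. It assumes that VPP treats a path weight of 0 as 1 internally, so a path handed over with weight 0 no longer compares equal to the path VPP stored, and a later lookup (such as the one for a fib_path_ext_t) comes back empty. The helper name and the 8-bit clamp are hypothetical; this is not the plugin's actual fix.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: turn a Netlink nexthop weight into a VPP path weight.
 * Assumption: VPP rewrites weight 0 to 1 internally, so passing 0 through
 * makes the path we asked for differ from the path VPP stored, and lookups
 * by route path (e.g. for a fib_path_ext_t) fail, yielding path ID -1.
 * Also assumed: the VPP-side weight is an 8-bit field, hence the clamp. */
static uint8_t
normalize_path_weight (uint32_t nl_weight)
{
  if (nl_weight == 0)
    return 1; /* no usable weight given: use the neutral weight 1 */
  if (nl_weight > UINT8_MAX)
    return UINT8_MAX;
  return (uint8_t) nl_weight;
}
```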

2) Handles MPLS routes.
MPLS routes were wrongly added as IPv4 routes before.

POP and SWAP are now both supported.
All routes are installed as both non-EOS and EOS routes,
as the Linux kernel does not differentiate between them.

The EOS POP used for PHP (penultimate hop popping) uses the next-hop
address family to determine the resulting address family.
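The expansion of one Linux MPLS route into a non-EOS and an EOS entry can be sketched as follows. All types here are hypothetical and model only the concepts from the text (incoming label, EOS bit, what remains on the packet after the label operation); they are not VPP's fib_prefix_t or fib_route_path_t.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types mirroring the concepts above, not VPP structs. */
typedef enum { LABEL_OP_POP, LABEL_OP_SWAP } label_op_t;
typedef enum { PAYLOAD_MPLS, PAYLOAD_IP4, PAYLOAD_IP6 } payload_proto_t;
typedef enum { NH_FAMILY_INET, NH_FAMILY_INET6 } nh_family_t;

typedef struct
{
  unsigned label;          /* incoming label */
  int eos;                 /* 1 = end-of-stack entry, 0 = non-EOS entry */
  payload_proto_t payload; /* what the packet carries after the operation */
} mpls_entry_sketch_t;

/* Linux keeps a single route per incoming label, so each label is expanded
 * into two entries: non-EOS and EOS. Returns the number of entries written. */
static size_t
expand_mpls_route (unsigned label, label_op_t op, nh_family_t nh_family,
		   mpls_entry_sketch_t out[2])
{
  /* Non-EOS: another label follows, so the payload is still MPLS. */
  out[0] = (mpls_entry_sketch_t){ label, 0, PAYLOAD_MPLS };

  /* EOS: a SWAP still leaves a label on the packet. A POP (e.g. for PHP)
   * removes the bottom label, so the payload's address family is taken
   * from the next-hop address family. */
  payload_proto_t eos_payload = PAYLOAD_MPLS;
  if (op == LABEL_OP_POP)
    eos_payload = (nh_family == NH_FAMILY_INET6) ? PAYLOAD_IP6 : PAYLOAD_IP4;
  out[1] = (mpls_entry_sketch_t){ label, 1, eos_payload };
  return 2;
}
```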

This patch is sufficient for P setups.
PE setups with implicit null should also work, as long as a
separate label is programmed per address family.

PE setups with explicit null will also forward packets,
but punting is a bit odd: it requires MPLS input to be enabled on the
LCP host device.

Make sure to enable MPLS in VPP first.

3) Propagate MPLS input state to LCP Pair and Linux.
Since the Linux kernel uses the MPLS routes itself,
the LCP pair tap needs MPLS enabled to allow host-originated packets.

This also syncs the Linux `net.mpls.conf.<host_if>.input` sysctl to
allow punted packets to have MPLS labels, mostly explicit nulls.

For that to work, load the mpls kernel modules.
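A minimal sketch of setting that sysctl from C by writing its procfs file. The helper names are hypothetical. The write fails cleanly when the MPLS kernel modules are not loaded, because the /proc/sys/net/mpls directory does not exist then.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build the procfs path behind net.mpls.conf.<host_if>.input.
 * Returns 0 on success, -1 if the buffer is too small. */
static int
mpls_input_sysctl_path (const char *host_if, char *buf, size_t len)
{
  int n = snprintf (buf, len, "/proc/sys/net/mpls/conf/%s/input", host_if);
  return (n < 0 || (size_t) n >= len) ? -1 : 0;
}

/* Write "1" or "0" to the sysctl. Fails (returns -1) if the interface or
 * the MPLS sysctl tree does not exist, or without sufficient privileges. */
static int
mpls_input_sysctl_set (const char *host_if, int enable)
{
  char path[256];
  if (mpls_input_sysctl_path (host_if, path, sizeof (path)) != 0)
    return -1;
  FILE *f = fopen (path, "w");
  if (!f)
    return -1;
  int ok = fputs (enable ? "1" : "0", f) >= 0;
  if (fclose (f) != 0) /* the kernel may only report errors on flush */
    ok = 0;
  return ok ? 0 : -1;
}
```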

4) Cross-connect MPLS packets from Linux directly to interface-output

This is a port of https://gerrit.fd.io/r/c/vpp/+/38702
2023-05-30 22:14:35 +02:00
8fc5631ef6 Run clang-format on all files. 2023-05-30 21:28:35 +02:00
30a1fe2a3f Backport https://gerrit.fd.io/r/c/vpp/+/38633 2023-05-26 09:54:13 +02:00
529a11bb78 Fix ADD/REPLACE semantic for IPv4 and IPv6 routes: if NLM_F_REPLACE is set, call fib_table_entry_update(); otherwise call fib_table_entry_path_add2() 2023-05-21 09:11:05 +02:00
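The dispatch boils down to a single flag test on the Netlink message header. A minimal sketch, with the two VPP calls reduced to hypothetical enum values; NLM_F_REPLACE is redefined with its <linux/netlink.h> value only so the snippet compiles on its own.

```c
#include <assert.h>

/* Value as in <linux/netlink.h>, repeated so the sketch is self-contained. */
#ifndef NLM_F_REPLACE
#define NLM_F_REPLACE 0x100
#endif

typedef enum
{
  ROUTE_ACTION_UPDATE,   /* stands in for fib_table_entry_update()    */
  ROUTE_ACTION_PATH_ADD, /* stands in for fib_table_entry_path_add2() */
} route_action_t;

/* RTM_NEWROUTE with NLM_F_REPLACE replaces the entry's path set;
 * without it, the new paths are added to whatever is already there. */
static route_action_t
route_add_action (unsigned nlmsg_flags)
{
  return (nlmsg_flags & NLM_F_REPLACE) ? ROUTE_ACTION_UPDATE
				       : ROUTE_ACTION_PATH_ADD;
}
```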
5e9f9ef218 Reduce route add/del message from INFO to DBG 2023-01-14 16:52:57 +00:00
815a6e0dce Run VPP's checkstyle to reformat the code 2023-01-11 16:21:40 +00:00
e53d4376ab cleanup: Clean logging, consistent capitalization, nouns, and macro names 2023-01-11 16:18:18 +00:00
6faf206370 merge 2023-01-11 12:12:15 +00:00
b7fd36bda4 Add entry flags per upstream f0781829d 2023-01-11 13:08:03 +01:00
623973dc1f Backport https://gerrit.fd.io/r/c/vpp/+/36961 2022-10-04 13:11:53 +00:00
a68a6e89e5 Backport https://gerrit.fd.io/r/c/vpp/+/35519 2022-03-08 13:55:28 +00:00
a27fdb9911 Backport https://gerrit.fd.io/r/c/vpp/+/35523 2022-03-08 13:42:56 +00:00
b659de9266 When creating a sub-int from Linux, ensure that the tap subint is set admin-up 2021-12-24 20:05:39 +00:00
fffb1e892a After fixing the feflags/frpflags bug, install specials again 2021-12-24 20:05:08 +00:00
d36f34b91d Fix type issue with route_path flags 2021-12-19 21:32:05 +00:00
cdf07cce34 Merge review feedback from mgsmith on upstream gerrit 33709 ps8..10 2021-11-29 20:19:34 +00:00
6caa5e8386 Followup of upstream 8e2b1b129815d3e631aa425ed37899c78ea24e65 addition of MFIB_ENTRY_FLAG_NONE 2021-11-07 18:21:58 +00:00
2c390ae512 Also set TAP carrier on netlink messages 2021-09-08 21:14:25 +00:00
fe5b52504f Restore insertion of connected routes 2021-08-29 18:50:28 +00:00
7c864ed099 Temporary fix
Stop adding paths with add_special(); there is a scenario with Bird2
that makes this crash:
- Assume a VPP which has its fib fully synced
- Kill VPP
- Bird will see the network devices disappear, and mark all routes 'unreach'
- Start VPP
- Bird will see the devices come back, and issue netlink messages for
  each route that is unreach
- these become add_special() because they have no nexthop and are
  of type UNREACHABLE
- adding these to the FIB sometimes crashes in dpo handling

To avoid this, no longer call add_special(). As a caveat, manually inserted
unreach/blackhole routes will not be explicitly added; however, most of
them will be caught by the fib-entry for the default route (which is a
'drop'). This behavior should be fixed, but at the moment it's not obvious
to me how, and I prefer it over a SIGABRT/SIGSEGV deeper in the code.
2021-08-29 17:07:40 +02:00
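A minimal sketch of the resulting "should this route be programmed?" decision, combining the filters of lcp_nl_route_add() (table 255, IPv6 link-local and multicast) with this commit's rule of no longer adding specials for routes that have no nexthop. The struct and function names are hypothetical, the constants carry their Linux values, and the check for invalid route types is left out because the text does not list them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Linux values of RT_TABLE_LOCAL and AF_INET6, redefined locally. */
#define SK_RT_TABLE_LOCAL 255
#define SK_AF_INET6 10

/* Hypothetical, reduced view of an RTM_NEWROUTE message. */
typedef struct
{
  int family;          /* address family of the destination */
  uint8_t dst[16];     /* destination address; IPv4 uses the first 4 bytes */
  unsigned table;      /* Linux routing table ID */
  unsigned n_nexthops; /* number of nexthops in the message */
} nl_route_sketch_t;

static bool
ip6_is_link_local (const uint8_t *a) /* fe80::/10 */
{
  return a[0] == 0xfe && (a[1] & 0xc0) == 0x80;
}

static bool
ip6_is_multicast (const uint8_t *a) /* ff00::/8 */
{
  return a[0] == 0xff;
}

static bool
route_should_install (const nl_route_sketch_t *r)
{
  /* 'local' routes are already set up by ip[46]_address_add(). */
  if (r->table == SK_RT_TABLE_LOCAL)
    return false;
  if (r->family == SK_AF_INET6
      && (ip6_is_link_local (r->dst) || ip6_is_multicast (r->dst)))
    return false;
  /* unreach/blackhole/prohibit routes carry no nexthop; rather than
   * add_special(), leave them to the default route, which drops. */
  if (r->n_nexthops == 0)
    return false;
  return true;
}
```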
8a57300b4c Tidy up locking
This is a little bit of a performance hit (consuming 2K msgs took 11ms, and now takes 18ms),
but putting the barrier locks inline is fragile and will eventually
cause an issue. As with Matt's pending plugin, sync and release the
barrier lock around the entire handler, rather than inline.

Contrary to Matt's implementation, I am also going to lock route_add()
and route_del() because without the locking, I get spurious crashes.
2021-08-29 16:59:18 +02:00
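The locking pattern can be sketched with a pthread mutex standing in for VPP's worker barrier (in VPP, vlib_worker_thread_barrier_sync() and vlib_worker_thread_barrier_release()). The message type and handler are hypothetical; the point is that the dispatcher, not each handler, owns the sync/release pair, so no handler has to get inline locking right.

```c
#include <assert.h>
#include <pthread.h>

/* Stand-in for the worker barrier: a plain mutex plus a depth counter. */
static pthread_mutex_t barrier = PTHREAD_MUTEX_INITIALIZER;
static int barrier_depth; /* how many times the barrier is currently held */

static void
barrier_sync (void)
{
  pthread_mutex_lock (&barrier);
  barrier_depth++;
}

static void
barrier_release (void)
{
  barrier_depth--;
  pthread_mutex_unlock (&barrier);
}

typedef struct
{
  int kind; /* hypothetical message type */
} nl_msg_sketch_t;

static int handled;

/* Handlers (route_add(), route_del(), ...) no longer lock themselves;
 * they simply rely on being called with the barrier held. */
static void
handle_route_add (const nl_msg_sketch_t *m)
{
  (void) m;
  assert (barrier_depth == 1);
  handled++;
}

/* The dispatcher holds the barrier around the entire handler. */
static void
dispatch_locked (const nl_msg_sketch_t *m)
{
  barrier_sync ();
  handle_route_add (m);
  barrier_release ();
}
```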
7a76498277 Add NEWROUTE/DELROUTE handler
This is super complicated work, taken mostly verbatim from the upstream
linux-cp Gerrit, with due credit to mgsmith@netgate.com and neale@grafiant.com.

First, add the main handlers lcp_nl_route_add() and lcp_nl_route_del().

Introduce two FIB sources: one for manual routes, one for dynamic
routes. See lcp_nl_proto_fib_source() for details.

Add a bunch of helpers that translate Netlink message data into VPP
primitives:
- lcp_nl_mk_addr46()
    converts a Netlink nl_addr to a VPP ip46_address_t.

- lcp_nl_mk_route_prefix()
    converts a Netlink rtnl_route to a VPP fib_prefix_t.

- lcp_nl_mk_route_mprefix()
    converts a Netlink rtnl_route to a VPP mfib_prefix_t.

- lcp_nl_proto_fib_source()
    selects the most appropriate fib_src by looking at the rt_proto
    (see /etc/iproute2/rt_protos for a hint). Anything RTPROT_STATIC or
    better is 'fib_src', while anything above that becomes fib_src_dynamic.

- lcp_nl_mk_route_entry_flags()
    generates fib_entry_flag_t from the Netlink route type,
    table and proto metadata.

- lcp_nl_route_path_parse()
    converts a Netlink rtnl_nexthop to VPP fib_route_path_t and adds
    that to a growing list of paths.

- lcp_nl_route_path_add_special()
    adds a blackhole/unreach/prohibit route to the list of paths, in
    the special case where there is not yet a path for the destination.
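The rt_proto rule of lcp_nl_proto_fib_source() can be sketched as a single comparison. The protocol numbers are the Linux ones (RTPROT_STATIC is 4, RTPROT_BIRD is 12); the enum naming the two FIB sources is hypothetical.

```c
#include <assert.h>

/* Protocol values as in <linux/rtnetlink.h> (cf. /etc/iproute2/rt_protos). */
#define SK_RTPROT_BOOT 3
#define SK_RTPROT_STATIC 4
#define SK_RTPROT_BIRD 12

/* Hypothetical labels for the two FIB sources the plugin registers. */
typedef enum { FIB_SRC_MANUAL, FIB_SRC_DYNAMIC } fib_src_sketch_t;

/* "RTPROT_STATIC or better" (numerically <= RTPROT_STATIC) maps to the
 * manual source; routing daemons use higher values and map to dynamic. */
static fib_src_sketch_t
proto_fib_source (unsigned rt_proto)
{
  return rt_proto <= SK_RTPROT_STATIC ? FIB_SRC_MANUAL : FIB_SRC_DYNAMIC;
}
```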

Now we're ready to insert FIB entries:
- lcp_nl_table_find()
    selects the matching table-id,protocol(v4/v6) from a hash of tables.

- lcp_nl_table_add_or_lock()
    if a table-id,protocol(v4/v6) pair hasn't been used yet, create one;
    otherwise increment a table reference counter so we know how many
    FIB entries we have in this table. Then, return it.

- lcp_nl_table_unlock()
    Decrease the refcount on a table, and if no more prefixes are in
    the table, remove it from VPP.

- lcp_nl_route_del()
    Remove a route from the given table-id/protocol. Do this by applying
    rtnl_route_foreach_nexthop() to the list of Netlink nexthops,
    converting them into VPP paths in an lcp_nl_route_path_parse_t
    structure. If the route is for unreachable/blackhole/prohibit in
    Linux, add that path too.
    Then, remove the VPP paths from the FIB and reduce refcnt or
    remove the table if it's empty using table_unlock().

- lcp_nl_route_add()
    Not all routes are relevant for VPP. Those in table 255 are 'local'
    routes, already set up by ip[46]_address_add(), and some other route
    types are invalid; skip those. Link-local IPv6 and IPv6 multicast are
    also skipped. Then, construct lcp_nl_route_path_parse_t by walking
    the Netlink nexthops, and optionally add a special (in case the
    route was for unreachable/blackhole/prohibit in Linux; those won't
    have a nexthop).
    Then, insert the VPP paths found in the Netlink message into the FIB
    or the multicast FIB, respectively.
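A minimal sketch of the table reference counting described for lcp_nl_table_add_or_lock() and lcp_nl_table_unlock(), using a small fixed array instead of the plugin's hash. All names are hypothetical; the comments mark where VPP would actually create or delete the FIB.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_TABLES 16 /* hypothetical fixed capacity */

typedef struct
{
  bool in_use;
  unsigned table_id; /* Linux table ID */
  int is_ip6;        /* protocol: 0 = IPv4, 1 = IPv6 */
  unsigned refcnt;   /* number of FIB entries using this table */
} nl_table_sketch_t;

static nl_table_sketch_t tables[MAX_TABLES];

static nl_table_sketch_t *
table_find (unsigned table_id, int is_ip6)
{
  for (size_t i = 0; i < MAX_TABLES; i++)
    if (tables[i].in_use && tables[i].table_id == table_id
	&& tables[i].is_ip6 == is_ip6)
      return &tables[i];
  return NULL;
}

/* Create the table on first use, then take a reference for the new route. */
static nl_table_sketch_t *
table_add_or_lock (unsigned table_id, int is_ip6)
{
  nl_table_sketch_t *t = table_find (table_id, is_ip6);
  for (size_t i = 0; i < MAX_TABLES && !t; i++)
    if (!tables[i].in_use)
      {
	t = &tables[i];
	*t = (nl_table_sketch_t){ true, table_id, is_ip6, 0 };
	/* VPP would create the FIB table here */
      }
  if (!t)
    return NULL; /* out of slots */
  t->refcnt++;
  return t;
}

/* Drop one reference; remove the table once no entries use it. */
static void
table_unlock (nl_table_sketch_t *t)
{
  assert (t->in_use && t->refcnt > 0);
  if (--t->refcnt == 0)
    t->in_use = false; /* VPP would delete the FIB table here */
}
```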

And with that, Bird shoots to life. Both IPv4 and IPv6 OSPF (interior
gateway protocol) and full BGP tables can be consumed; on my bench this
takes about 9 seconds:
- A batch of 2048 Netlink messages is handled in 9-11ms, so we can do
  approx 200K messages/sec at peak (this will consume 50% CPU due to
  the yielding logic in lcp_nl_process(): see the 'case NL_EVENT_READ'
  block, which adds a cooldown period of LCP_NL_PROCESS_WAIT
  milliseconds between batches).
- With 3 route reflectors and 2 full BGP peers, at peak I could see
  309K messages left in the producer queue.

- All IPv4 and IPv6 prefixes made their way into the FIB
pim@hippo:~/src/lcpng$ echo -n "IPv6: "; vppctl sh ip6 fib summary | awk '$1~/[0-9]+/ { total += $2 } END { print total }'
IPv6: 132506
pim@hippo:~/src/lcpng$ echo -n "IPv4: "; vppctl sh ip fib summary | awk '$1~/[0-9]+/ { total += $2 } END { print total }'
IPv4: 869966

- Compared to Bird2's view:
pim@hippo:~/src/lcpng$ birdc show route count
BIRD 2.0.7 ready.
3477845 of 3477845 routes for 869942 networks in table master4
527887 of 527887 routes for 132484 networks in table master6
Total: 4005732 of 4005732 routes for 1002426 networks in 2 tables

- Flipping one of the full feeds to another, forcing a reconvergence
  of every prefix in the FIB, took about 8 seconds, peaking at 242K
  messages in the queue, again with an average consumption of 2048
  messages per 9-10ms.

- All of this was done while iperf'ing 6Gbps to and from the
  controlplane.
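A quick consistency check on these numbers, assuming roughly one Netlink message per installed prefix (an approximation: the FIB counts also include connected and local entries, and convergence can generate extra messages):

```latex
\begin{align*}
\text{peak rate} &\approx \frac{2048\ \text{messages}}{10\ \text{ms}} = 204{,}800\ \text{messages/s} \\
\text{average rate} &\approx \frac{869{,}966 + 132{,}506\ \text{prefixes}}{9\ \text{s}}
  = \frac{1{,}002{,}472}{9\ \text{s}} \approx 111{,}000\ \text{messages/s} \\
\frac{\text{average}}{\text{peak}} &\approx \frac{111{,}000}{204{,}800} \approx 0.54
\end{align*}
```

An average of about half the peak batch rate is in line with the roughly 50% duty cycle caused by the cooldown between batches.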
---

Because handling a full BGP table means O(1M) messages, I will have to make
some changes to the logging:
- all neigh/route messages become DBG/INFO at best
- all addr/link messages become INFO/NOTICE at best
- when we overflow time/msgs, turn process_msgs into a WARN, otherwise
  keep it at INFO so as not to spam.
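The logging policy above can be sketched as a small mapping. The levels use syslog's numeric ordering (lower is more severe), the enums are hypothetical, and the plugin itself logs through VPP's own log levels.

```c
#include <assert.h>

/* Syslog-style ordering: lower value = more severe. */
typedef enum { LVL_WARN = 4, LVL_NOTICE = 5, LVL_INFO = 6, LVL_DBG = 7 } lvl_t;
typedef enum { MSG_NEIGH, MSG_ROUTE, MSG_ADDR, MSG_LINK } msg_kind_t;

/* Most prominent level at which a message of this kind is logged. */
static lvl_t
msg_log_level (msg_kind_t kind)
{
  switch (kind)
    {
    case MSG_NEIGH:
    case MSG_ROUTE:
      return LVL_INFO; /* O(1M) of these with full tables: keep them quiet */
    case MSG_ADDR:
    case MSG_LINK:
      return LVL_NOTICE; /* usually something the operator did */
    }
  return LVL_DBG;
}

/* process_msgs summary: WARN only if a batch overflowed its time/msg budget. */
static lvl_t
batch_log_level (int overflowed)
{
  return overflowed ? LVL_WARN : LVL_INFO;
}
```

With the threshold at NOTICE, anything numerically above LVL_NOTICE (route and neighbor messages) is filtered out, while link and address changes and overflow warnings still show up.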

In lcpng_interface.c:
- Log NOTICE for pair_add() and pair_del() calls;
- Log NOTICE for set_interface_addr() calls;

With this approach, setting the logging level of the linux-cp/nl plugin
to 'notice' hits the sweet spot: things that the operator has (more or
less) explicitly done are logged, while implicit actions (BGP route
adds/dels, ARP/ND) stay below the NOTICE level.
2021-08-29 14:57:21 +02:00
45f4088656 Add ability to create subints from Linux
Using the earlier placeholder hint in lcp_nl_link_add(), I know when
I've received a NEWLINK request for a Linux ifindex that doesn't have a LIP.

This could be because the interface is entirely foreign to VPP, for
example somebody created a dummy interface or a VLAN subint on one:
ip link add dum0 type dummy
ip link add link dum0 name dum0.10 type vlan id 10

Or, I'm actually trying to create a VLAN subint, like these:
ip link add link e0 name e0.1234 type vlan id 1234
ip link add link e0.1234 name e0.1235 type vlan id 1000
ip link add link e0 name e0.1236 type vlan id 2345 proto 802.1ad
ip link add link e0.1236 name e0.1237 type vlan id 1000

None of these NEWLINK callbacks, each identified by a vif (Linux interface
index), will have a corresponding LIP. So, I try to create one by calling
lcp_nl_link_add_vlan().

Here, I look up the parent index ('dum0' or 'e0' in the examples above).
The former doesn't have a LIP either, so I bail. If the parent does have
a LIP, there are still two cases:
1) the LIP is a phy (e.g. TenGigabitEthernet3/0/0) and this is a regular
   tagged interface; or
2) the LIP is itself a subint (e.g. TenGigabitEthernet3/0/0.1234) and
   what I'm asking for is a QinQ or QinAD sub-interface.

So I also look up the phy LIP. Now I have all the ingredients I
need to create the VPP sub-interfaces with the correct inner dot1q
and outer dot1q or dot1ad tags.
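The two cases can be sketched as a function that derives the new sub-interface's tags from its parent. The struct and function are hypothetical, model only the VLAN tags, and refuse a third level of tagging.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, tags-only view of an interface. */
typedef struct
{
  bool is_subint;       /* false: a phy such as TenGigabitEthernet3/0/0 */
  unsigned outer_vlan;  /* 0 if untagged */
  unsigned inner_vlan;  /* 0 if single-tagged */
  bool outer_is_dot1ad; /* outer tag is 802.1ad rather than 802.1q */
} tags_sketch_t;

/* Derive the tags of a new VLAN 'vlan' (802.1ad if 'is_dot1ad') created on
 * top of 'parent'. Returns false if it would need a third tag. */
static bool
subint_tags (const tags_sketch_t *parent, unsigned vlan, bool is_dot1ad,
	     tags_sketch_t *out)
{
  if (!parent->is_subint)
    {
      /* Case 1: parent is a phy, so this is a regular tagged interface. */
      *out = (tags_sketch_t){ true, vlan, 0, is_dot1ad };
      return true;
    }
  if (parent->inner_vlan != 0)
    return false; /* parent is already QinQ/QinAD */

  /* Case 2: parent is a single-tagged subint, so this is QinQ or QinAD:
   * the parent's tag (and its ethertype) becomes the outer tag. */
  *out = (tags_sketch_t){ true, parent->outer_vlan, vlan,
			  parent->outer_is_dot1ad };
  return true;
}
```

For the examples above, e0.1236 (vlan 2345 proto 802.1ad on e0) gets an 802.1ad outer tag 2345, and e0.1237 on top of it gets outer 2345 and inner 1000, i.e. a QinAD sub-interface.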

Of course, I don't really know what subinterface ID to use. It's
appealing to "just" use the VLAN ID, but that's not helpful if the outer
tag and the inner tag are the same. So I write a helper function
vnet_sw_interface_get_available_subid() whose job is to return an
unused subid for the phy, starting from 1.
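A minimal sketch of such a helper, written as a free function over an array of already-used sub-IDs rather than over VPP's interface pool; the name get_available_subid() and its signature are hypothetical. With n IDs in use, at least one of 1..n+1 must be free, so the search is bounded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Return the lowest sub-ID >= 1 that does not appear in 'used'. */
static uint32_t
get_available_subid (const uint32_t *used, size_t n_used)
{
  /* Pigeonhole: with n_used IDs taken, one of 1..n_used+1 is free. */
  for (size_t id = 1; id <= n_used + 1; id++)
    {
      bool taken = false;
      for (size_t i = 0; i < n_used && !taken; i++)
	taken = (used[i] == id);
      if (!taken)
	return (uint32_t) id;
    }
  return 0; /* not reached */
}
```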

I then create the phy sub-interface and the tap sub-interface,
tying them together into a new LIP. During these interface creations,
if lcp-auto-subint is on, I disable it, because I don't want VPP racing
to create LIPs for the sub-ints right now. Before I return (either in an
error state or upon success), I restore lcp-auto-subint to its previous
value.
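The save/disable/restore pattern, with the restore on every exit path, can be sketched with a single cleanup label. The global flag and the two creation steps are hypothetical stand-ins; fail_step only exists to exercise the error paths.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the lcp-auto-subint setting. */
static bool auto_subint = true;

/* Hypothetical creation steps; return 0 on success. */
static int
create_phy_subint (int fail_step)
{
  return fail_step == 1 ? -1 : 0;
}

static int
create_tap_subint (int fail_step)
{
  return fail_step == 2 ? -1 : 0;
}

/* Create both sub-interfaces with auto-subint temporarily disabled, and
 * restore the previous setting whether we succeed or fail. */
static int
create_subint_pair (int fail_step)
{
  bool saved = auto_subint;
  int rv;

  auto_subint = false; /* keep VPP from racing us to create LIPs */
  if ((rv = create_phy_subint (fail_step)) != 0)
    goto done;
  if ((rv = create_tap_subint (fail_step)) != 0)
    goto done;
  /* ... tie both into a new LIP here ... */

done:
  auto_subint = saved;
  return rv;
}
```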

If I manage to create the LIP, huzzah. I return it to the caller so it
can continue setting link/mac/mtu etc.
2021-08-24 23:40:14 +02:00
a3a5f68926 Add newlink/delink processing.
- Can up/down a link.
- Can set MAC on a link, if it's a phy.
- Can set MTU on a link.
- Can delete link (including phy).

Because link state and MTU changes tend to go around in circles (from
netlink -> VPP, and then, with lcp-sync on, also from VPP -> netlink),
we temporarily turn off lcp-sync, if it's enabled, while consuming a
batch of netlink messages.

TODO (in the next commit): the whole nine yards of creating interfaces
in VPP based on incoming NEWLINK VLANs. Conceptually not too
difficult: if a NEWLINK doesn't have a LIP associated with it, but it's a
VLAN, and the parent of the VLAN is a link which _does_ have a LIP, then
we can create the subint in VPP in the correct way.
2021-08-24 18:18:23 +02:00
c02656de22 Add thread barriers on non-mp-safe calls 2021-08-24 01:13:16 +02:00
d63fbd8a9a Allow NO_SUCH_ENTRY to count as successful 'removal' of neighbors. When an address is removed, VPP will invalidate the neighbor cache. This change makes the subsequent gratuitous neigh deletion from Linux harmless. 2021-08-24 00:52:26 +02:00
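A minimal sketch of that rule, with hypothetical return codes standing in for VPP's VNET_API_ERROR_* values:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical return codes, not VPP's actual numbering. */
enum
{
  RV_OK = 0,
  RV_NO_SUCH_ENTRY = -1,
  RV_INVALID_VALUE = -2,
};

/* Should a neighbor removal result be reported as an error? When an
 * address is removed, VPP has already flushed the neighbor, so the
 * RTM_DELNEIGH that Linux sends afterwards finds nothing to remove. */
static bool
neigh_del_is_error (int rv)
{
  return rv != RV_OK && rv != RV_NO_SUCH_ENTRY;
}
```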
87742b4f54 Add netlink address add/del
Straightforward addition/removal of IPv4 and IPv6 addresses on
interfaces. One thing I noticed, which isn't a real concern but is
unfortunate, shows up in the following sequence:

ip addr add 10.0.1.1/30 dev e0
debug      linux-cp/nl    addr_add: netlink route/addr: add idx 1488 family inet local 10.0.1.1/30 flags 0x0080 (permanent)
warn       linux-cp/nl    dispatch: ignored route/route: add family inet type 2 proto 2 table 255 dst 10.0.1.1 nexthops { idx 1488 }
warn       linux-cp/nl    dispatch: ignored route/route: add family inet type 1 proto 2 table 254 dst 10.0.1.0/30 nexthops { idx 1488 }
warn       linux-cp/nl    dispatch: ignored route/route: add family inet type 3 proto 2 table 255 dst 10.0.1.0 nexthops { idx 1488 }
warn       linux-cp/nl    dispatch: ignored route/route: add family inet type 3 proto 2 table 255 dst 10.0.1.3 nexthops { idx 1488 }

ping 10.0.1.2
debug      linux-cp/nl    neigh_add: netlink route/neigh: add idx 1488 family inet lladdr 68:05:ca:32:45:94 dst 10.0.1.2 state 0x0002 (reachable) flags 0x0000
notice     linux-cp/nl    neigh_add: Added 10.0.1.2 lladdr 68:05:ca:32:45:94 iface TenGigabitEthernet3/0/0

ip addr del 10.0.1.1/30 dev e0
debug      linux-cp/nl    addr_del: netlink route/addr: del idx 1488 family inet local 10.0.1.1/30 flags 0x0080 (permanent)
notice     linux-cp/nl    addr_del: Deleted 10.0.1.1/30 iface TenGigabitEthernet3/0/0
warn       linux-cp/nl    dispatch: ignored route/route: del family inet type 1 proto 2 table 254 dst 10.0.1.0/30 nexthops { idx 1488 }
warn       linux-cp/nl    dispatch: ignored route/route: del family inet type 3 proto 2 table 255 dst 10.0.1.3 nexthops { idx 1488 }
warn       linux-cp/nl    dispatch: ignored route/route: del family inet type 3 proto 2 table 255 dst 10.0.1.0 nexthops { idx 1488 }
warn       linux-cp/nl    dispatch: ignored route/route: del family inet type 2 proto 2 table 255 dst 10.0.1.1 nexthops { idx 1488 }
debug      linux-cp/nl    neigh_del: netlink route/neigh: del idx 1488 family inet lladdr 68:05:ca:32:45:94 dst 10.0.1.2 state 0x0002 (reachable) flags 0x0000
error      linux-cp/nl    neigh_del: Failed 10.0.1.2 iface TenGigabitEthernet3/0/0

It is this very last message that's a bit of a concern: the ping
brought the lladdr into the neighbor cache, and the subsequent address
deletion first removed the address, then all the typical local routes
(the connected, the broadcast, the network, and the self/local), but
then also explicitly deleted the neighbor. That is correct behavior
for Linux, except that VPP already invalidates the neighbor cache and
adds/removes the connected routes, for example in ip/ip4_forward.c L826-L830
and L583.

I predict more false-positive 'errors' like the one on neigh_del(),
because interface/route addition/deletion works slightly differently in
VPP than in Linux. If so, I may have to reclassify these errors as warnings.
2021-08-24 00:26:06 +02:00
30bab1d3f9 Our first Netlink syncer!
Add lcpng_nl_sync.c that will house these functions. Their purpose is to
take state learned from netlink messages, and apply that state to VPP.

Some rearranging/plumbing was necessary to get logging to be visible in
this new source file.

Then, we add lcp_nl_neigh_add() and _del() which look up the LIP, convert
the lladdr and ip address from Netlink into VPP variants, and then add or
remove the ip4/ip6 neighbor adjacency.
2021-08-24 00:05:29 +02:00
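A minimal sketch of the conversions lcp_nl_neigh_add() and _del() need, assuming VPP's ip46_address_t layout in which an IPv4 address occupies the last 4 of 16 bytes. The struct and function names are hypothetical; in the plugin the raw bytes and their length would come from libnl's nl_addr_get_binary_addr() and nl_addr_get_len().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 16-byte address mirroring ip46_address_t's layout. */
typedef struct
{
  uint8_t as_u8[16];
} ip46_sketch_t;

/* Convert raw network-order address bytes into the 16-byte form.
 * Returns 0 on success, -1 if the length does not match the family. */
static int
mk_addr46 (int is_ip6, const uint8_t *bytes, size_t len, ip46_sketch_t *out)
{
  memset (out, 0, sizeof (*out));
  if (is_ip6)
    {
      if (len != 16)
	return -1;
      memcpy (out->as_u8, bytes, 16);
    }
  else
    {
      if (len != 4)
	return -1;
      memcpy (out->as_u8 + 12, bytes, 4); /* IPv4 lives in the last 4 bytes */
    }
  return 0;
}

/* Copy a 6-byte Ethernet lladdr; any other length is rejected. */
static int
mk_mac (const uint8_t *lladdr, size_t len, uint8_t mac[6])
{
  if (len != 6)
    return -1;
  memcpy (mac, lladdr, 6);
  return 0;
}
```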