Commit Graph

68 Commits

Author SHA1 Message Date
61386a1c63 backport gerrit 40379 from upstream 2024-06-27 17:43:54 +02:00
25b2999485 Backport https://gerrit.fd.io/r/c/vpp/+/40441 2024-04-01 23:07:56 +02:00
a960d64a87 Add the ability to skip address sync on unnumbered interfaces 2024-03-05 22:59:55 +01:00
c8dc522fe9 Avoid creating a duplicate LIP 2024-03-05 22:59:55 +01:00
becf8fa759 Fix crash - vec_free() instead of free() 2024-02-09 00:14:18 +01:00
ff3ce656c1 Fix delete segment error. 2023-08-11 14:31:25 +08:00
6dbd2c1d7b Fixed the deletion segment error 2023-08-11 14:09:13 +08:00
153628a3de Basic MPLS support.
1) Imports ENCAP_MPLS labels from IPv4/IPv6 routes.
Note that this requires libnl 3.6.0 or newer.

In previous patches, the fib_path_ext_t had a path ID of -1.
After a long investigation, it turned out to be caused by route weight
being set to 0. There is a comment explaining more details.

2) Handles MPLS routes.
MPLS routes were wrongly added as IPv4 routes before.

POP and SWAP are now both supported.
All the routes are installed as NON-EOS and EOS routes,
as the Linux kernel does not differentiate.

EOS POP used in PHP uses the next-hop address family
to determine the resulting address family.

This patch is sufficient for P setups.
PE setups with implicit null should also function okay, as long as a
separate label gets programmed per address family.

PE setups with explicit null will also forward packets,
but punting is a bit odd and needs MPLS input enabled on the LCP host
device.

Make sure to enable MPLS in VPP first.

3) Propagate MPLS input state to LCP Pair and Linux.
Since the Linux kernel uses the MPLS routes itself,
the LCP pair tap needs MPLS enabled to allow host originated packets.

This also syncs the Linux `net.mpls.conf.<host_if>.input` sysctl to
allow punted packets to have MPLS labels, mostly explicit nulls.

For that to work, load the mpls kernel modules.

4) Cross connect MPLS packets from Linux directly to interface-output

This is a port of https://gerrit.fd.io/r/c/vpp/+/38702
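
A minimal sketch of the sysctl sync described in 3), assuming the usual mapping of `net.mpls.conf.<host_if>.input` onto a file under /proc/sys. The function names and the `e0` interface name are hypothetical and are not taken from the plugin.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build the procfs path behind the sysctl net.mpls.conf.<host_if>.input.
 * Returns 0 on success, -1 if the name is empty or the buffer is too small.
 * (Hypothetical helper, for illustration only.) */
int
mpls_input_sysctl_path (const char *host_if, char *buf, size_t len)
{
  assert (host_if != NULL && buf != NULL);
  if (strlen (host_if) == 0)
    return -1;
  int n = snprintf (buf, len, "/proc/sys/net/mpls/conf/%s/input", host_if);
  return (n < 0 || (size_t) n >= len) ? -1 : 0;
}

/* Write "1" or "0" to the sysctl. Returns -1 if the file cannot be opened,
 * for example when the MPLS kernel modules are not loaded or the caller
 * lacks privileges. */
int
mpls_input_sysctl_set (const char *host_if, int enable)
{
  char path[128];
  if (mpls_input_sysctl_path (host_if, path, sizeof (path)) < 0)
    return -1;
  FILE *f = fopen (path, "w");
  if (f == NULL)
    return -1;
  int rc = (fputs (enable ? "1" : "0", f) == EOF) ? -1 : 0;
  if (fclose (f) == EOF)
    rc = -1;
  return rc;
}
```

Calling `mpls_input_sysctl_set ("e0", 1)` as root would then have the same effect as setting the sysctl by hand for a host device named `e0`.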
2023-05-30 22:14:35 +02:00
8fc5631ef6 Run clang-format on all files. 2023-05-30 21:28:35 +02:00
bc429011e8 Fix logging for https://gerrit.fd.io/r/c/vpp/+/38602 2023-04-12 10:53:39 +02:00
bd3265b77f backport https://gerrit.fd.io/r/c/vpp/+/38602 2023-04-12 10:52:40 +02:00
9017d1bd3c Move back to multiple rx queues; and copy in the upstream change to autoassign a tap index ID, as the hw_if_index one could already be taken (by tap create) 2023-04-06 23:36:36 +02:00
6fe37f6a8d Revert change in e035203162 -- set RX/TX queues to 1 -- while hunting down a race condition 2023-02-05 11:34:12 +00:00
815a6e0dce Run VPP's checkstyle to reformat the code 2023-01-11 16:21:40 +00:00
e53d4376ab cleanup: Clean logging, consistent capitalization, nouns, and macro names 2023-01-11 16:18:18 +00:00
6faf206370 merge 2023-01-11 12:12:15 +00:00
263ff9d02c Initialize var in case of error return, per upstream a01be735f2 2023-01-11 13:08:54 +01:00
e035203162 Fix memory leak on failed tap creation per upstream 37b5cccb; Also, sync the RX/TX queues to be the same as upstream linux-cp 2023-01-11 11:13:40 +01:00
a4006ec5da Backport https://gerrit.fd.io/r/c/vpp/+/37026 2022-10-04 13:13:57 +00:00
a0ad5cc5cf Merge Gerrit 36176 from upstream 2022-06-03 20:33:48 +00:00
786530701d Backport https://gerrit.fd.io/r/c/vpp/+/35719 2022-03-31 23:23:36 +00:00
0bc3e01c9b Backport https://gerrit.fd.io/r/c/vpp/+/35528 2022-03-08 14:45:35 +00:00
8a1c031ea9 backport https://gerrit.fd.io/r/c/vpp/+/35411 2022-02-22 20:42:11 +00:00
5f3eb62be9 Reduce resource usage on virtio polling, also avoid potential (undiagnosed) dataplane lockup when running multiple threads: just poll with one RX queue per TAP 2022-02-20 19:19:58 +00:00
ccd4b393e9 Roll back carrier set
-- it works fine for phy's that are carrier-up, but crashes if they are
   carrier-down.

0: /home/pim/src/vpp/src/vnet/interface_funcs.h:46 (vnet_get_hw_interface) assertion `! pool_is_free (vnm->interface_main.hw_interfaces, _e)' fails

    at /home/pim/src/vpp/src/vppinfra/error.c:143
    ns=0x7fff98774e80 "dataplane", host_sw_if_indexp=0x0) at /home/pim/src/vpp/src/plugins/lcpng/lcpng_interface.c:998
    at /home/pim/src/vpp/src/plugins/lcpng/lcpng_if_cli.c:96
    parent_command_index=371) at /home/pim/src/vpp/src/vlib/cli.c:591
    parent_command_index=0) at /home/pim/src/vpp/src/vlib/cli.c:548
    at /home/pim/src/vpp/src/vlib/cli.c:694
2021-12-24 21:12:24 +00:00
ddd3ad372a Only set carrier up when hw is up 2021-12-24 21:04:15 +00:00
22e907555d Initialize the TAP carrier based on the (hardware) phy 2021-12-24 20:04:08 +00:00
65fa49f30b Fix crash if netns is not set at startup 2021-12-19 21:46:00 +00:00
cd86f17454 Copy forward neale's improvement from upstream gerrit 33948 2021-11-29 22:26:44 +00:00
cdf07cce34 Merge review feedback from mgsmith on upstream gerrit 33709 ps8..10 2021-11-29 20:19:34 +00:00
4ed9d02693 Fix non NULL terminated strings (namespace and hostname are vectors) 2021-09-09 20:04:15 +00:00
043fecb0e0 Only find the parent tap if it's necessary (i.e. it doesn't already exist in vif_index); change by mgsmith@ 2021-09-08 21:15:42 +00:00
45cb9b4afc Cleanup interface sync
- move tap_set_carrier() upstream to lcp_itf_set_link_state()
- refuse to set admin-up on sub-int if parent is down
- no need to switch namespaces, lcp_itf_set_link_state() already does
- in change_mtu and change_admin_state, if the interface is a sub,
  we only have to sync that one interface. Otherwise, walk the parent
  interface and all sub-ints with lcp_itf_pair_sync_state_hw() and
  make note of this in the (DBG) log
2021-09-08 20:53:02 +00:00
8b3356cd86 if sub_interface fails to create, return error and don't continue (fixes a crash) 2021-09-08 19:50:27 +00:00
3c806d586d Accommodate Netgate's usecase, they create the linux netlink device first, and then call the pair_create; in that case, linux_parent_if_index already exists; simplify the call path here, h/t mgsmith@ 2021-09-08 18:41:35 +00:00
36f1ebfdae Move check for parent_if_index up
copy review change from mgsmith@
2021-09-07 22:10:54 +00:00
fdab236755 backport fix from mgsmith@ in VPP main repo 2021-09-07 21:13:55 +00:00
15efc4efc2 Copy review notes from mgsmith@ from https://gerrit.fd.io/r/c/vpp/+/33481/12..13 2021-08-30 20:37:05 +00:00
98a84d0fa7 Turn lip_namespace into a vector 2021-08-30 20:31:53 +00:00
7a76498277 Add NEWROUTE/DELROUTE handler
This is super complicated work, taken mostly verbatim from the upstream
linux-cp Gerrit, with due credit to mgsmith@netgate.com and neale@grafiant.com

First, add main handler lcp_nl_route_add() and lcp_nl_route_del()

Introduce two FIB sources: one for manual routes, one for dynamic
routes.  See lcp_nl_proto_fib_source() for details.

Add a bunch of helpers that translate Netlink message data into VPP
primitives:
- lcp_nl_mk_addr46()
    converts a Netlink nl_addr to a VPP ip46_address_t.

- lcp_nl_mk_route_prefix()
    converts a Netlink rtnl_route to a VPP fib_prefix_t.

- lcp_nl_mk_route_mprefix()
     converts a Netlink rtnl_route to a VPP mfib_prefix_t.

- lcp_nl_proto_fib_source()
    selects the most appropriate fib_src by looking at the rt_proto
    (see /etc/iproute2/rt_protos for a hint). Anything RTPROT_STATIC or
    better is 'fib_src', while anything above that becomes fib_src_dynamic.

- lcp_nl_mk_route_entry_flags()
    generates fib_entry_flag_t from the Netlink route type,
    table and proto metadata.

- lcp_nl_route_path_parse()
    converts a Netlink rtnl_nexthop to VPP fib_route_path_t and adds
    that to a growing list of paths.

- lcp_nl_route_path_add_special()
    adds a blackhole/unreach/prohibit route to the list of paths, in
    the special-case there is not yet a path for the destination.
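
As an illustration of the lcp_nl_proto_fib_source() rule above, here is a minimal sketch. It assumes "RTPROT_STATIC or better" means a numeric rt_proto value at or below RTPROT_STATIC. The `EX_`/`ex_` names are hypothetical stand-ins; the numeric values mirror /etc/iproute2/rt_protos.

```c
/* Routing protocol IDs as listed in /etc/iproute2/rt_protos.
 * Prefixed with EX_ so they do not clash with <linux/rtnetlink.h>. */
enum
{
  EX_RTPROT_KERNEL = 2,
  EX_RTPROT_BOOT = 3,
  EX_RTPROT_STATIC = 4,
  EX_RTPROT_BIRD = 12,
  EX_RTPROT_BGP = 186,
  EX_RTPROT_OSPF = 188,
};

/* Hypothetical stand-ins for the two FIB sources the plugin registers. */
typedef enum
{
  EX_FIB_SRC,	      /* manual routes (kernel, boot, static) */
  EX_FIB_SRC_DYNAMIC, /* routes installed by a routing daemon */
} ex_fib_src_t;

/* Anything RTPROT_STATIC or lower is treated as a manual route;
 * anything above it is treated as dynamic. */
ex_fib_src_t
ex_proto_fib_source (unsigned char rt_proto)
{
  return (rt_proto <= EX_RTPROT_STATIC) ? EX_FIB_SRC : EX_FIB_SRC_DYNAMIC;
}
```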

Now we're ready to insert FIB entries:
- lcp_nl_table_find()
    selects the matching table-id,protocol(v4/v6) from a hash of tables.

- lcp_nl_table_add_or_lock()
    if a table-id,protocol(v4/v6) hasn't been used yet, create one,
    otherwise increment a table reference counter so we know how many
    FIB entries we have in this table. Then, return it.

- lcp_nl_table_unlock()
    Decrease the refcount on a table, and if no more prefixes are in
    the table, remove it from VPP.

- lcp_nl_route_del()
    Remove a route from the given table-id/protocol. Do this by applying
    rtnl_route_foreach_nexthop() to the list of Netlink nexthops,
    converting them into VPP paths in a lcp_nl_route_path_parse_t
    structure. If the route is for unreachable/blackhole/prohibit in
    Linux, add that path too.
    Then, remove the VPP paths from the FIB and reduce refcnt or
    remove the table if it's empty using table_unlock().

- lcp_nl_route_add()
    Not all routes are relevant for VPP. Those in table 255 are 'local'
    routes, already set up by ip[46]_address_add(), and some other route
    types are invalid; skip those. Link-local IPv6 and IPv6 multicast are
    also skipped. Then, construct lcp_nl_route_path_parse_t by walking
    the Netlink nexthops, and optionally add a special (in case the
    route was for unreachable/blackhole/prohibit in Linux -- those won't
    have a nexthop).
    Then, insert the VPP paths found in the Netlink message into the FIB
    or the multicast FIB, respectively.
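
The table bookkeeping described above (find, add_or_lock, unlock) can be sketched as plain reference counting. This is an assumption-laden illustration: a small fixed array stands in for the hash of tables, and every `ex_` name is hypothetical rather than the plugin's actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EX_MAX_TABLES 16

typedef struct
{
  uint32_t table_id;
  int is_ip6;	   /* 0 = IPv4, 1 = IPv6 */
  uint32_t refcnt; /* number of FIB entries using this table */
  int in_use;
} ex_table_t;

/* Stand-in for the hash of tables keyed on (table-id, protocol). */
static ex_table_t ex_tables[EX_MAX_TABLES];

ex_table_t *
ex_table_find (uint32_t table_id, int is_ip6)
{
  for (size_t i = 0; i < EX_MAX_TABLES; i++)
    if (ex_tables[i].in_use && ex_tables[i].table_id == table_id &&
	ex_tables[i].is_ip6 == is_ip6)
      return &ex_tables[i];
  return NULL;
}

/* Create the table on first use, otherwise take another reference.
 * Returns NULL if no free slot is left. */
ex_table_t *
ex_table_add_or_lock (uint32_t table_id, int is_ip6)
{
  ex_table_t *t = ex_table_find (table_id, is_ip6);
  if (t != NULL)
    {
      t->refcnt++;
      return t;
    }
  for (size_t i = 0; i < EX_MAX_TABLES; i++)
    if (!ex_tables[i].in_use)
      {
	ex_tables[i] = (ex_table_t){ table_id, is_ip6, 1, 1 };
	return &ex_tables[i];
      }
  return NULL;
}

/* Drop one reference; free the slot when the last prefix is gone. */
void
ex_table_unlock (ex_table_t *t)
{
  assert (t != NULL && t->in_use && t->refcnt > 0);
  if (--t->refcnt == 0)
    t->in_use = 0;
}
```

In this sketch a route add pairs with one `ex_table_add_or_lock()` and a route delete with one `ex_table_unlock()`, so a table disappears exactly when its last prefix does.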

And with that, Bird shoots to life. Both IPv4 and IPv6 OSPF interior
gateway protocol and BGP full tables can be consumed, on my bench in
about 9 seconds:
- A batch of 2048 Netlink messages is handled in 9-11ms, so we can do
  approx 200K messages/sec at peak. This will consume 50% CPU due
  to the yielding logic in lcp_nl_process() (see the 'case
  NL_EVENT_READ' block, which adds a cooldown period of
  LCP_NL_PROCESS_WAIT milliseconds between batches).
- With 3 route reflectors and 2 full BGP peers, at peak I could see
  309K messages left in the producer queue.

- All IPv4 and IPv6 prefixes made their way into the FIB
pim@hippo:~/src/lcpng$ echo -n "IPv6: "; vppctl sh ip6 fib summary | awk '$1~/[0-9]+/ { total += $2 } END { print total }'
IPv6: 132506
pim@hippo:~/src/lcpng$ echo -n "IPv4: "; vppctl sh ip fib summary | awk '$1~/[0-9]+/ { total += $2 } END { print total }'
IPv4: 869966

- Compared to Bird2's view:
pim@hippo:~/src/lcpng$ birdc show route count
BIRD 2.0.7 ready.
3477845 of 3477845 routes for 869942 networks in table master4
527887 of 527887 routes for 132484 networks in table master6
Total: 4005732 of 4005732 routes for 1002426 networks in 2 tables

- Flipping one of the full feeds to another, forcing a reconvergence
  of every prefix in the FIB took about 8 seconds, peaking at 242K
  messages in the queue, with again an average consumption of 2048
  messages per 9-10ms.

- All of this was done while iperf'ing 6Gbps to and from the
  controlplane.
---

Because handling a full BGP table is O(1M) messages, I will have to make
some changes in the logging:
- all neigh/route messages become DBG/INFO at best
- all addr/link messages become INFO/NOTICE at best
- when we overflow time/msgs, turn process_msgs into a WARN, otherwise
  keep it at INFO so as not to spam.

In lcpng_interface.c:
- Log NOTICE for pair_add() and pair_del() call;
- Log NOTICE for set_interface_addr() call;

With this approach, setting the logging level of the linux-cp/nl plugin
to 'notice' hits the sweet spot: things that the operator has
~explicitly done are logged, while implicit actions (BGP route
adds/dels, ARP/ND) stay below the NOTICE level.
2021-08-29 14:57:21 +02:00
6d2ce1cd83 Consolidate MTU and Link changes into one function. 2021-08-24 17:50:44 +02:00
fee9776e87 Cleanup -- remove unused pair_add_sub() 2021-08-15 19:47:46 +02:00
2d00de080b Protect VPP -> Linux state propagation behind flag
Introduce lcp_main.lcp_sync, which determines if state changes made
to interfaces in VPP do or don't propagate into Linux.

- Add a startup.conf directive 'lcp-sync' to enable at startup time.
- Add CLI.short_help = "lcp lcp-sync [on|enable|off|disable]",
- Show the current value in "show lcp".

Gate changes in mtu, state and address on lcp_lcp_sync().

When the operator issues 'lcp lcp-sync on', it is prudent to do a
one-off sync of all interface attributes from VPP into Linux.
For this, add a lcp_itf_pair_sync_state_all() function.
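
A minimal sketch of the gating described above. It assumes the one-off full sync runs only on the off-to-on transition, which the message does not spell out. The struct, counters and `ex_` function names are hypothetical; they only model lcp_main.lcp_sync and lcp_itf_pair_sync_state_all().

```c
#include <stdbool.h>

/* Hypothetical model of the relevant part of lcp_main. */
typedef struct
{
  bool lcp_sync;       /* propagate VPP -> Linux changes? */
  unsigned full_syncs; /* how often the one-off full sync ran */
  unsigned propagated; /* how many single changes were pushed to Linux */
} ex_lcp_main_t;

/* Stand-in for lcp_itf_pair_sync_state_all(): here it only counts calls. */
void
ex_sync_state_all (ex_lcp_main_t *lm)
{
  lm->full_syncs++;
}

/* Handler for 'lcp lcp-sync [on|enable|off|disable]'. */
void
ex_lcp_set_sync (ex_lcp_main_t *lm, bool enable)
{
  bool was_enabled = lm->lcp_sync;
  lm->lcp_sync = enable;
  /* Assumption: catch up once when sync is switched on. */
  if (enable && !was_enabled)
    ex_sync_state_all (lm);
}

/* Called on an MTU, state or address change in VPP.
 * Returns true if the change was propagated into Linux. */
bool
ex_on_interface_change (ex_lcp_main_t *lm)
{
  if (!lm->lcp_sync)
    return false;
  lm->propagated++;
  return true;
}
```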
2021-08-15 16:07:53 +02:00
d23aab2d95 Add CLI for lcp-auto-subint
In preparation for another feature 'netlink-auto-subint', rename
lcp_main's field to "lcp_auto_subint".

Add CLI .short_help = "lcp lcp-auto-subint [on|enable|off|disable]"

Show status of the field on "lcp show" output.
2021-08-15 14:45:04 +02:00
c2adb3262d Fix bug in find_outer_vlan()
I was looking at the hw interface list, which makes sense for ethernet
devices but not for other devices, notably BondEthernets.

In addition, creating two separate interfaces with the same outer
(for example Gi3/0/0 dot1ad 2345 inner-dot1q 1000 AND the same on
Gi3/0/1) would yield an erratic match and a crash.

Switch to walking the sw interface list instead, and search for
the sup_sw_if_index that has the desired outer. Result:

BondEthernet0.{1234,1235,1236,1237} can be created and are functional.
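
A sketch of the sw interface walk, under the assumption that the lookup wants the single-tagged sub-interface with the given outer tag on one specific parent. The struct layout and `ex_` names are hypothetical and much simpler than VPP's vnet_sw_interface_t.

```c
#include <stddef.h>
#include <stdint.h>

#define EX_NO_INDEX UINT32_MAX

/* Hypothetical, simplified view of a software interface. */
typedef struct
{
  uint32_t sw_if_index;
  uint32_t sup_sw_if_index; /* parent; equals sw_if_index for the parent */
  uint16_t outer_vlan;	    /* 0 = untagged */
  uint16_t inner_vlan;	    /* 0 = no inner tag */
} ex_sw_interface_t;

/* Walk all sw interfaces and return the single-tagged sub-interface that
 * carries outer_vlan on the given parent, or EX_NO_INDEX if none exists.
 * Matching on the parent keeps the same outer tag on two different
 * parents apart. */
uint32_t
ex_find_outer_vlan (const ex_sw_interface_t *ifs, size_t n,
		    uint32_t sup_sw_if_index, uint16_t outer_vlan)
{
  for (size_t i = 0; i < n; i++)
    {
      if (ifs[i].sw_if_index == ifs[i].sup_sw_if_index)
	continue; /* skip parents themselves */
      if (ifs[i].sup_sw_if_index == sup_sw_if_index &&
	  ifs[i].outer_vlan == outer_vlan && ifs[i].inner_vlan == 0)
	return ifs[i].sw_if_index;
    }
  return EX_NO_INDEX;
}
```

Because the walk never consults the hardware interface list, the same code path would serve a BondEthernet parent as well as a physical one.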
2021-08-14 19:37:37 +02:00
934446dcd9 Add automatic LCP creation
Update the pair_config() parser to follow suit.

When the configuration 'lcp-auto-subint' is set, and the interface at
hand is a subinterface, in lcp_itf_interface_add_del():

- if it's a deletion and we're a sub-int, and we have a LIP: delete it.
- if it's a creation and we're a sub-int, and our parent has a LIP, create one.
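
The two rules above can be sketched as a pure decision function. The enum and function names are hypothetical, and the sketch only decides what to do; it does not show how or when the plugin actually creates or deletes the LIP.

```c
#include <stdbool.h>

typedef enum
{
  EX_LIP_NONE,	 /* leave things alone */
  EX_LIP_CREATE, /* create a LIP for the new sub-int */
  EX_LIP_DELETE, /* delete the sub-int's LIP */
} ex_lip_action_t;

/* Decide what an interface add/del callback should do.
 *   auto_subint    - is 'lcp-auto-subint' set?
 *   is_add         - creation (true) or deletion (false)
 *   is_subint      - is the interface at hand a sub-interface?
 *   has_lip        - does the interface itself have a LIP?
 *   parent_has_lip - does its parent have a LIP? */
ex_lip_action_t
ex_auto_subint_action (bool auto_subint, bool is_add, bool is_subint,
		       bool has_lip, bool parent_has_lip)
{
  if (!auto_subint || !is_subint)
    return EX_LIP_NONE;
  if (!is_add)
    return has_lip ? EX_LIP_DELETE : EX_LIP_NONE;
  return parent_has_lip ? EX_LIP_CREATE : EX_LIP_NONE;
}
```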

Fix a few logging consistency issues (pair_del), and in
pair_delete_by_index() ensure that the right namespace is selected.

Due to this quirk with lip->lip_namespace not wanting to be a vec_dup()
string, rewrite them all to be strdup/free instead.
2021-08-14 17:20:31 +02:00
b3d8e75706 Start LCP auto-creation
This first preparation moves lcp_itf_phy_add() to lcpng_if_sync.c
and renames it lcp_itf_interface_add_del().

It does all the pre-flight checks to validate that a new device, given
by sw_if_index, can have a LIP created:
- must be a sub-int
- must have a sw_sup_if_index, which itself has a LIP

However, I realize that I cannot create an interface from within an
interface add callback, so I'll have to schedule the child LIP to be
created by a process, after the callback returns.

I'll do that in the next commit.
2021-08-14 14:28:16 +02:00
10f10d534c Ensure the plugin works well with namespaces
I've made a few cosmetic adjustments:
- introduce debug, info, notice, warn and err loggers
- remove redundant logging, and set correct (conservative) log levels
- turn the sync-state error into a warning

And a little debt paydown:
- refactor sync-state into its own function, to be called instead of
  all the spot fixes elsewhere. It's going to be the case that
  sync-state is "the reconciliation function".
- Fix a bug in lip->lip_namespace copy: vec_dup() there doesn't seem
  to work for me, use strdup() instead and make a mental note to
  revisit.

The plugin now works with 'lcpng default netns dataplane' in its
startup.conf; and with 'lcp default netns dataplane' as its first
command. A few of these fixes should go upstream, too, which I'll
do next.
2021-08-14 09:40:43 +02:00
aab11196cf Sanitize some overly verbose logging ERR->{DBG,INFO} 2021-08-13 21:08:03 +02:00
72f55fd901 Netlink namespaces!
I have been very careless in using the correct network namespace when
manipulating LCP host devices. Around any/every netlink write operation,
we must first clib_setns() into the correct namespace. So, wrap every
call of vnet_netlink_*() in all places.

For consistency, use the convention 'curr_ns_fd' (for the one we are
coming from) and 'vif_ns_fd' (to signal the one that the netlink VIF
is in).

Be careful as well to enter and exit everywhere without losing file
descriptors.
2021-08-13 20:58:28 +02:00