Commit Graph

119 Commits

Author SHA1 Message Date
ba4d9d1a3c Add lcp_itf_pair_sync_state_hw() and only walk relevant int+subints, not all interfaces 2021-09-08 19:50:50 +00:00
8b3356cd86 if sub_interface fails to create, return error and don't continue (fixes a crash) 2021-09-08 19:50:27 +00:00
3c806d586d Accommodate Netgate's usecase, they create the linux netlink device first, and then call the pair_create; in that case, linux_parent_if_index already exists; simplify the call path here, h/t mgsmith@ 2021-09-08 18:41:35 +00:00
36f1ebfdae Move check for parent_if_index up
copy review change from mgsmith@
2021-09-07 22:10:54 +00:00
fdab236755 backport fix from mgsmith@ in VPP main repo 2021-09-07 21:13:55 +00:00
15efc4efc2 Copy review notes from mgsmith@ from https://gerrit.fd.io/r/c/vpp/+/33481/12..13 2021-08-30 20:37:05 +00:00
98a84d0fa7 Turn lip_namespace into a vector 2021-08-30 20:31:53 +00:00
aa9158e1a2 Add logline in case setting rx/tx size fails 2021-08-29 22:02:57 +00:00
9f7286d285 Update INFO logline 2021-08-29 19:10:03 +00:00
fe5b52504f Restore insertion of connected routes 2021-08-29 18:50:28 +00:00
82aa1bbc74 Merge from upstream 2021-08-29 17:52:22 +02:00
6d86d2b075 Allow larger fraction of CPU to be used by netlink
Instead of doing BATCH_DELAY_MS work and BATCH_DELAY_MS sleep, add
a BATCH_WORK_MS (40) and lower BATCH_DELAY_MS (10), so we'll work
80% of the time, and consume netlink messages 20% of the time.

Also raise the total batch size to 8K because on my test machine
we run 2K in 13ms or 8K in ~50ms.
2021-08-29 17:47:33 +02:00
7c864ed099 Temporary fix
Stop adding paths with add_special(); there is a scenario with Bird2
that makes this crash:
- Assume a VPP which has its fib fully synced
- Kill VPP
- Bird will see network devices remove, and mark all routes 'unreach'
- Start VPP
- Bird will see the devices come back, and issue netlink messages for
  each route that is unreach
- these become add_special() because they have no nexthop and are
  of type UNREACHABLE
- adding these to the FIB sometimes crashes in dpo handling

To avoid this, no longer add_special() -- as a caveat, manually inserted
routes to unreach/blackhole will not be explicitly added, however most
will be caught by fib-entry for default-route (which is a 'drop'). This
behavior should be fixed, but it's at the moment not obvious to me how
and I'd prefer this behavior over SIGABORT/SIGSEGV deeper in the code.
2021-08-29 17:07:40 +02:00
8a57300b4c Tidy up locking
This is a little bit of a performance hit (consuming 2K msgs was 11ms, is now 18ms)
but putting the barrier locks inline is fragile and will eventually
cause an issue. As with Matt's pending plugin, sync and release the
barrier lock around the entire handler, rather than in-line.

Contrary to Matt's implementation, I am also going to lock route_add()
and route_del() because without the locking, I get spurious crashes.
2021-08-29 16:59:18 +02:00
76c8b53f41 fix silly crash in logging 2021-08-29 15:03:39 +02:00
7a76498277 Add NEWROUTE/DELROUTE handler
This is super complicated work, taken mostly verbatim from the upstream
linux-cp Gerrit, with due credit mgsmith@netgate.com neale@grafiant.com

First, add main handler lcp_nl_route_add() and lcp_nl_route_del()

Introduce two FIB sources: one for manual routes, one for dynamic
routes.  See lcp_nl_proto_fib_source() fo details.

Add a bunch of helpers that translate Netlink message data into VPP
primitives:
- lcp_nl_mk_addr46()
    converts a Netlink nl_addr to a VPP ip46_address_t.

- lcp_nl_mk_route_prefix()
    converts a Netlink rtnl_route to a VPP fib_prefix_t.

- lcp_nl_mk_route_mprefix()
     converts a Netlink rtnl_route to a VPP mfib_prefix_t.

- lcp_nl_proto_fib_source()
    selects the most appropciate fib_src by looking at the rt_proto
    (see /etc/iproute2/rt_protos for a hint). Anything RTPROT_STATIC or
    better is 'fib_src', while anything above that becomes fib_src_dynamic.

- lcp_nl_mk_route_entry_flags()
    generates fib_entry_flag_t from the Netlink route type,
    table and proto metadata.

- lcp_nl_route_path_parse()
    converts a Netlink rtnl_nexthop to VPP fib_route_path_t and adds
    that to a growing list of paths.

- lcp_nl_route_path_add_special()
    adds a blackhole/unreach/prohibit route to the list of paths, in
    the special-case there is not yet a path for the destination.

Now we're ready to insert FIB entries:
- lcp_nl_table_find()
    selects the matching table-id,protocol(v4/v6) from a hash of tables.

- lcp_nl_table_add_or_lock()
    if at table-id,protocol(v4/v6) hasn't been used yet, create one,
    otherwise increment a table reference counter so we know how many
    FIB entries we have in this table. Then, return it.

- lcp_nl_table_unlock()
    Decrease the refcount on a table, and if no more prefixes are in
    the table, remove it from VPP.

- lcp_nl_route_del()
    Remove a route from the given table-id/protocol. Do this by applying
    rtnl_route_foreach_nexthop() to the list of Netlink nexthops,
    converting them into VPP paths in a lcp_nl_route_path_parse_t
    structure. If the route is for unreachable/blackhole/prohibit in
    Linux, add that path too.
    Then, remove the VPP paths from the FIB and reduce refcnt or
    remove the table if it's empty using table_unlock().

- lcp_nl_route_add()
    Not all routes are relevant for VPP. Those in table 255 are 'local'
    routes, already set up by ip[46]_address_add(), and some other route
    types are invalid, skip those. Link-local IPv6 and IPv6 multicast is
    also skipped. Then, construct lcp_nl_route_path_parse_t by walking
    the Netlink nexthops, and optionally add a special (in case the
    route was for unreachable/blackhole/prohibit in Linux -- those won't
    have a nexthop).
    Then, insert the VPP paths found in the Netlink message into the FIB
    or the multicast FIB, respectively.

And with that, Bird shoots to life. Both IPv4 and IPv6 OSPF interior
gateway protocol and BGP full tables can be consumed, on my bench in
about 9 seconds:
- A batch of 2048 Netlink messages is handled in 9-11ms, so we can do
  approx 200K messages/sec at peak (and this will consume 50% CPU due
  to the yielding logic in lcp_nl_process() (see the 'case
  NL_EVENT_READ' block that adds a cooldown period of
  LCP_NL_PROCESS_WAIT milliseconds between batches.
- With 3 route reflectors and 2 full BGP peers, at peak I could see
  309K messages left in the producer queue.

- All IPv4 and IPv6 prefixes made their way into the FIB
pim@hippo:~/src/lcpng$ echo -n "IPv6: "; vppctl sh ip6 fib summary | awk '$1~/[0-9]+/ { total += $2 } END { print total }'
IPv6: 132506
pim@hippo:~/src/lcpng$ echo -n "IPv4: "; vppctl sh ip fib summary | awk '$1~/[0-9]+/ { total += $2 } END { print total }'
IPv4: 869966

- Compared to Bird2's view:
pim@hippo:~/src/lcpng$ birdc show route count
BIRD 2.0.7 ready.
3477845 of 3477845 routes for 869942 networks in table master4
527887 of 527887 routes for 132484 networks in table master6
Total: 4005732 of 4005732 routes for 1002426 networks in 2 tables

- Flipping one of the full feeds to another, forcing a reconvergence
  of every prefix in the FIB took about 8 seconds, peaking at 242K
  messages in the queue, with again an average consumption of 2048
  messages per 9-10ms.

- All of this was done while iperf'ing 6Gbps to and from the
  controlplane.
---

Because handling full BGP table is O(1M) messages, I will have to make
some changes in the logging:
- all neigh/route messages become DBG/INFO at best
- all addr/link messages become INFO/NOTICE at best
- when we overflow time/msgs, turn process_msgs into a WARN, otherwise
  keep it at INFO so as not to spam.

In lcpng_interface.c:
- Log NOTICE for pair_add() and pair_del() call;
- Log NOTICE for set_interface_addr() call;

With this approach, setting the logging level of the linux-cp/nl plugin
to 'notice' hits the sweet spot: with things that the operator has
~explicitly done, leaving implicit actions (BGP route adds/dels, ARP/ND)
to stay below the NOTICE level.
2021-08-29 14:57:21 +02:00
47fd53be42 Add NEWROUTE/DELROUTE handler
This is super complicated work, taken mostly verbatim from the upstream
linux-cp Gerrit, with due credit mgsmith@netgate.com neale@grafiant.com

First, add main handler lcp_nl_route_add() and lcp_nl_route_del()

Introduce two FIB sources: one for manual routes, one for dynamic
routes.  See lcp_nl_proto_fib_source() fo details.

Add a bunch of helpers that translate Netlink message data into VPP
primitives:
- lcp_nl_mk_addr46()
    converts a Netlink nl_addr to a VPP ip46_address_t.

- lcp_nl_mk_route_prefix()
    converts a Netlink rtnl_route to a VPP fib_prefix_t.

- lcp_nl_mk_route_mprefix()
     converts a Netlink rtnl_route to a VPP mfib_prefix_t.

- lcp_nl_proto_fib_source()
    selects the most appropciate fib_src by looking at the rt_proto
    (see /etc/iproute2/rt_protos for a hint). Anything RTPROT_STATIC or
    better is 'fib_src', while anything above that becomes fib_src_dynamic.

- lcp_nl_mk_route_entry_flags()
    generates fib_entry_flag_t from the Netlink route type,
    table and proto metadata.

- lcp_nl_route_path_parse()
    converts a Netlink rtnl_nexthop to VPP fib_route_path_t and adds
    that to a growing list of paths.

- lcp_nl_route_path_add_special()
    adds a blackhole/unreach/prohibit route to the list of paths, in
    the special-case there is not yet a path for the destination.

Now we're ready to insert FIB entries:
- lcp_nl_table_find()
    selects the matching table-id,protocol(v4/v6) from a hash of tables.

- lcp_nl_table_add_or_lock()
    if at table-id,protocol(v4/v6) hasn't been used yet, create one,
    otherwise increment a table reference counter so we know how many
    FIB entries we have in this table. Then, return it.

- lcp_nl_table_unlock()
    Decrease the refcount on a table, and if no more prefixes are in
    the table, remove it from VPP.

- lcp_nl_route_del()
    Remove a route from the given table-id/protocol. Do this by applying
    rtnl_route_foreach_nexthop() to the list of Netlink nexthops,
    converting them into VPP paths in a lcp_nl_route_path_parse_t
    structure. If the route is for unreachable/blackhole/prohibit in
    Linux, add that path too.
    Then, remove the VPP paths from the FIB and reduce refcnt or
    remove the table if it's empty using table_unlock().

- lcp_nl_route_add()
    Not all routes are relevant for VPP. Those in table 255 are 'local'
    routes, already set up by ip[46]_address_add(), and some other route
    types are invalid, skip those. Link-local IPv6 and IPv6 multicast is
    also skipped. Then, construct lcp_nl_route_path_parse_t by walking
    the Netlink nexthops, and optionally add a special (in case the
    route was for unreachable/blackhole/prohibit in Linux -- those won't
    have a nexthop).
    Then, insert the VPP paths found in the Netlink message into the FIB
    or the multicast FIB, respectively.

And with that, Bird shoots to life. Both IPv4 and IPv6 OSPF interior
gateway protocol and BGP full tables can be consumed, on my bench in
about 9 seconds:
- A batch of 2048 Netlink messages is handled in 9-11ms, so we can do
  approx 200K messages/sec at peak (and this will consume 50% CPU due
  to the yielding logic in lcp_nl_process() (see the 'case
  NL_EVENT_READ' block that adds a cooldown period of
  LCP_NL_PROCESS_WAIT milliseconds between batches.
- With 3 route reflectors and 2 full BGP peers, at peak I could see
  309K messages left in the producer queue.

- All IPv4 and IPv6 prefixes made their way into the FIB
pim@hippo:~/src/lcpng$ echo -n "IPv6: "; vppctl sh ip6 fib summary | awk '$1~/[0-9]+/ { total += $2 } END { print total }'
IPv6: 132506
pim@hippo:~/src/lcpng$ echo -n "IPv4: "; vppctl sh ip fib summary | awk '$1~/[0-9]+/ { total += $2 } END { print total }'
IPv4: 869966

- Compared to Bird2's view:
pim@hippo:~/src/lcpng$ birdc show route count
BIRD 2.0.7 ready.
3477845 of 3477845 routes for 869942 networks in table master4
527887 of 527887 routes for 132484 networks in table master6
Total: 4005732 of 4005732 routes for 1002426 networks in 2 tables

- Flipping one of the full feeds to another, forcing a reconvergence
  of every prefix in the FIB took about 8 seconds, peaking at 242K
  messages in the queue, with again an average consumption of 2048
  messages per 9-10ms.

- All of this was done while iperf'ing 6Gbps to and from the
  controlplane.
---

Because handling full BGP table is O(1M) messages, I will have to make
some changes in the logging:
- all neigh/route messages become DBG/INFO at best
- all addr/link messages become INFO/NOTICE at best
- when we overflow time/msgs, turn process_msgs into a WARN, otherwise
  keep it at INFO so as not to spam.

In lcpng_interface.c:
- Log NOTICE for pair_add() and pair_del() call;
- Log NOTICE for set_interface_addr() call;

With this approach, setting the logging level of the linux-cp/nl plugin
to 'notice' hits the sweet spot: with things that the operator has
~explicitly done, leaving implicit actions (BGP route adds/dels, ARP/ND)
to stay below the NOTICE level.
2021-08-29 13:52:11 +02:00
873c1e3591 Capture a breadcrum about BondEthernet0 packetlo 2021-08-29 12:59:08 +02:00
a474d09a9e Fix crash when vlib_buffer_copy() fails - also sent upstream in https://gerrit.fd.io/r/c/vpp/+/33606 2021-08-26 16:43:13 +02:00
45f4088656 Add ability to create subint's from Linux
Using the earlier placeholder hint in lcp_nl_link_add(), I know that
I've gotten a NEWLINK request but the linux ifindex doesn't have a LIP.

This could be because the interface is entirely foreign to VPP, for
example somebody created a dummy interface or a VLAN subint on one:
ip link add dum0 type dummy
ip link add link dum0 name dum0.10 type vlan id 10

Or, I'm actually trying to create a VLAN subint, like these:
ip link add link e0 name e0.1234 type vlan id 1234
ip link add link e0.1234 name e0.1235 type vlan id 1000
ip link add link e0 name e0.1236 type vlan id 2345 proto 802.1ad
ip link add link e0.1236 name e0.1237 type vlan id 1000

None of these NEWLINK callbacks, represented by vif (linux interface
id) will have a corresponding LIP. So, I try to create one by calling
lcp_nl_link_add_vlan().

Here, I lookup the parent index ('dum0' or 'e0' in the first examples),
the former of which also doesn't have a LIP, so I bail. If it does,
I still have two choices:
1) the LIP is a phy (ie TenGigabitEthernet3/0/0) and this is a regular
   tagged interface; or
2) the LIP is itself a subint (ie TenGigabitEthernet3/0/0.1234) and
   what I'm asking for is a QinQ or QinAD sub-interface.

So I look up as well the phy LIP. We now have all the ingredients I
need to create the VPP sub-interfaces with the correct inner-dot1q
and outer dot1q or dot1ad.

Of course, I don't really know what subinterface ID to use. It's
appealing to "just" use the vlan, but that's not helpful if the outer
tag and the inner tag are the same. So I write a helper function
vnet_sw_interface_get_available_subid() whose job it is to return an
unused subid for the phy -- starting from 1.

I then create the phy sub-interface and the tap sub-interface,
tying them together into a new LIP. During these interface creations, I
want to make sure that if lcp-auto-subint is on, we disable that. I
don't want VPP racing to create LIPs for the sub-ints right now. Before
I return (either in error state or upon success), I put back the
lcp-auto-subint to what it was before.

If I manage to create the LIP, huzzah. I return it to the caller so it
can continue setting link/mac/mtu etc.
2021-08-24 23:40:14 +02:00
04b4aebc97 Merge branch 'main' of github.com:pimvanpelt/lcpng into main 2021-08-24 18:18:51 +02:00
a3a5f68926 Add newlink/delink processing.
- Can up/down a link.
- Can set MAC on a link, if it's a phy.
- Can set MTU on a link.
- Can delete link (including phy).

Because link state and mtu changes tend to go around in circles (from
netlink -> vpp; and then with lcp-sync on, as well from vpp -> netlink)
when we consume a batch of netlink messages, we'll temporarily turn off
lcp-sync if it's enabled.

TODO (in the next commit), the whole nine yards of creating interfaces
in VPP based on NEWLINK vlans that come in. Conceptualy not too
difficult: if NEWLINK doesn't have a LIP associated with it, but it's a
VLAN, and the parent of the VLAN is a link which _does_ have a LIP, then
we can create the subint in VPP in the correct way.
2021-08-24 18:18:23 +02:00
e604dd3478 Add newlink/delink processing.
- Can up/down a link.
- Can set MAC on a link, if it's a phy.
- Can set MTU on a link.
- Can delete link (including phy).

Because link state and mtu changes tend to go around in circles (from
netlink -> vpp; and then with lcp-sync on, as well from vpp -> netlink)
when we consume a batch of netlink messages, we'll temporarily turn off
lcp-sync if it's enabled.

TODO (in the next commit), the whole nine yards of creating interfaces
in VPP based on NEWLINK vlans that come in. Conceptualy not too
difficult: if NEWLINK doesn't have a LIP associated with it, but it's a
VLAN, and the parent of the VLAN is a link which _does_ have a LIP, then
we can create the subint in VPP in the correct way.
2021-08-24 18:11:51 +02:00
6d2ce1cd83 Consolidate MTU and Link changes into one function. 2021-08-24 17:50:44 +02:00
c02656de22 Add thread barriers on non-mp-safe calls 2021-08-24 01:13:16 +02:00
d63fbd8a9a Allow NO_SUCH_ENTRY to count as successful 'removal' of neighbors. When an address is removed, VPP will invalidate the neighbor cache. This change allows the subsequent gratutious neigh deletion from Linux to be harmless. 2021-08-24 00:52:26 +02:00
87742b4f54 Add netlink address add/del
Straight forward addition/removal of IPv4 and IPv6 addresses on
interfaces. One thing I noticed, which isn't a concern but an
unfortunate issue, looking at the following sequence:

ip addr add 10.0.1.1/30 dev e0
debug      linux-cp/nl    addr_add: netlink route/addr: add idx 1488 family inet local 10.0.1.1/30 flags 0x0080 (permanent)
warn       linux-cp/nl    dispatch: ignored route/route: add family inet type 2 proto 2 table 255 dst 10.0.1.1 nexthops { idx 1488 }
warn       linux-cp/nl    dispatch: ignored route/route: add family inet type 1 proto 2 table 254 dst 10.0.1.0/30 nexthops { idx 1488 }
warn       linux-cp/nl    dispatch: ignored route/route: add family inet type 3 proto 2 table 255 dst 10.0.1.0 nexthops { idx 1488 }
warn       linux-cp/nl    dispatch: ignored route/route: add family inet type 3 proto 2 table 255 dst 10.0.1.3 nexthops { idx 1488 }

ping 10.0.1.2
debug      linux-cp/nl    neigh_add: netlink route/neigh: add idx 1488 family inet lladdr 68:05:ca:32:45:94 dst 10.0.1.2 state 0x0002 (reachable) flags 0x0000
notice     linux-cp/nl    neigh_add: Added 10.0.1.2 lladdr 68:05:ca:32:45:94 iface TenGigabitEthernet3/0/0

ip addr del 10.0.1.1/30 dev e0
debug      linux-cp/nl    addr_del: netlink route/addr: del idx 1488 family inet local 10.0.1.1/30 flags 0x0080 (permanent)
notice     linux-cp/nl    addr_del: Deleted 10.0.1.1/30 iface TenGigabitEthernet3/0/0
warn       linux-cp/nl    dispatch: ignored route/route: del family inet type 1 proto 2 table 254 dst 10.0.1.0/30 nexthops { idx 1488 }
warn       linux-cp/nl    dispatch: ignored route/route: del family inet type 3 proto 2 table 255 dst 10.0.1.3 nexthops { idx 1488 }
warn       linux-cp/nl    dispatch: ignored route/route: del family inet type 3 proto 2 table 255 dst 10.0.1.0 nexthops { idx 1488 }
warn       linux-cp/nl    dispatch: ignored route/route: del family inet type 2 proto 2 table 255 dst 10.0.1.1 nexthops { idx 1488 }
debug      linux-cp/nl    neigh_del: netlink route/neigh: del idx 1488 family inet lladdr 68:05:ca:32:45:94 dst 10.0.1.2 state 0x0002 (reachable) flags 0x0000
error      linux-cp/nl    neigh_del: Failed 10.0.1.2 iface TenGigabitEthernet3/0/0

It is this very last message that's a bit of a concern -- the ping
brought the lladdr into the neighbor cache; and the subsequent address
deletion first removed the address, then all the typical local routes
(the connected, the broadcast, the network, and the self/local); but
then as well explicitly deleted the neighbor, which is correct behavior
for Linux, except that VPP already invalidates the neighbor cache and
adds/removes the connected routes for example in ip/ip4_forward.c L826-L830
and L583.

I predict more of these false positive 'errors' like the one on neigh_del()
beacuse interface/route addition/deletion is slightly different in VPP than
in Linux. I may have to reclassify the errors as warnings otherwise.
2021-08-24 00:26:06 +02:00
30bab1d3f9 Our first Netlink syncer!
Add lcpng_nl_sync.c that will house these functions. Their purpose is to
take state learned from netlink messages, and apply that state to VPP.

Some rearranging/plumbing was necessary to get logging to be visible in
this new source file.

Then, we add lcp_nl_neigh_add() and _del() which look up the LIP, convert
the lladdr and ip address from Netlink into VPP variants, and then add or
remove the ip4/ip6 neighbor adjacency.
2021-08-24 00:05:29 +02:00
c4e3043ea1 Add skeleton of Linux CP Netlink Listener
Register lcp_nl_init() which adds interface pair add/del callbacks.

lcb_nl_pair_add_cb: Initiate netlink listener for first interface in its
  netns. If subsequent adds are in other netns, issue a warning. Keep
  refcount.
lcb_nl_pair_del_cb: Remove listener when the last interface pair is
  removed.

Socket is opened, file is added to VPP's epoll, with lcp_nl_read_cb()
and lcp_nl_error_cb() callbacks installed.
- lcp_nl_read_cb() calls lcp_nl_callback() which pushes netlink messages
  onto a queue and issues NL_EVENT_READ event, any socket read error
  issues NL_EVENT_READ_ERR event.
- lcp_nl_error_cb() simply issues NL_EVENT_READ_ERR event.

Then, initialize a process node called lcp_nl_process(), which handles:
- NL_EVENT_READ and call lcp_nl_process_msgs()
  - if messages are left in the queue, reschedule consumption after M
    msecs. This allows new netlink messages to continuously be read from
    the kernel, even if we have lots of messages to consume.
- NL_EVENT_READ_ERR and close/reopens the netlink socket.

lcp_nl_process_msgs() processes up to N messages and/or for up to M
msecs, whichever comes first. For each, calling lcp_nl_dispatch().

lcp_nl_dispatch() ultimately just throws the message away after
logging it with format_nl_object()
2021-08-23 23:04:50 +02:00
f2ab33341f Update README.md 2021-08-22 21:15:18 +02:00
4705791743 Update README.md 2021-08-22 21:11:18 +02:00
ea1568b7cc Update README.md
Add a functionality table with the current state of affairs.
2021-08-22 21:06:46 +02:00
caba21ba99 Add README.md and LICENSE 2021-08-22 15:46:19 +02:00
fee9776e87 Cleanup -- remove unused pair_add_sub() 2021-08-15 19:47:46 +02:00
2d00de080b Protect VPP -> Linux state propagation behind flag
Introduce lcp_main.lcp_sync, which determines if state changes made
to interfaces in VPP do or don't propagate into Linux.

- Add a startup.conf directive 'lcp-sync' to enable at startup time.
- Add CLI.short_help = "lcp lcp-sync [on|enable|off|disable]",
- Show the current value in "show lcp".

Gate changes in mtu, state and address on lcp_lcp_sync().

When the operator issues 'lcp lcp-sync on', it is prudent to do a
one-off sync of all interface attributes from VPP into Linux.
For this, add a lcp_itf_pair_sync_state_all() function.
2021-08-15 16:07:53 +02:00
d23aab2d95 Add CLI for lcp-auto-subint
In preparation of another feature 'netlink-auto-subint', rename
lcp_main's field to "lcp_auto_subint".

Add CLI .short_help = "lcp lcp-auto-subint [on|enable|off|disable]"

Show status of the field on "lcp show" output.
2021-08-15 14:45:04 +02:00
c2adb3262d Fix bug in find_outer_vlan()
I was looking at the hw interface list, which makes sense for ethernet
devices but not for other devices, notably BondEthernets.

In addition, creating two separate interfaces with the same outer
(for example Gi3/0/0 dot1ad 2345 inner-dot1q 1000 AND the same on
Gi3/0/1) would yield an erratic match and a crash.

Switch to walking the sw interface list instead, and search for
the sup_sw_if_index that has the desired outer. Result:

BondEthernet0.{1234,1235,1236,1237} can be created and are functional.
2021-08-14 19:37:37 +02:00
934446dcd9 Add automatic LCP creation
Update the pair_config() parser to follow suite.

When the configuration 'lcp-auto-subint' is set, and the interface at
hand is a subinterface, in lcp_itf_interface_add_del():

- if it's a deletion and we're a sub-int, and we have a LIP: delete it.
- if it's a creation and we're a sub-int, and our parent has a LIP, create one.

Fix a few logging consistency issues (pair_del), and in
pair_delete_by_index() ensure that the right namespace is selected.

Due to this quirk with lip->lip_namespace not wanting to be a vec_dup()
string, rewrite them all to be strdup/free instead.
2021-08-14 17:20:31 +02:00
b3d8e75706 Start LCP auto-creation
This first preparation moves lcp_itf_phy_add() to lcpng_if_sync.c
and renames it lcp_itf_interface_add_del().

It does all the pre-flight checks to validate that a new device, given
by sw_if_index, can have a LIP created:
- must be a sub-int
- must have a sw_sup_if_index, which itself has a LIP

However, I realize that I cannot create an interface from within an
interface add callback, so I'll have to schedule the child LIP to be
created by a process, after the callback returns.

I'll do that in the next commit.
2021-08-14 14:28:16 +02:00
20e282655b fixstyle 2021-08-14 11:27:06 +02:00
10f10d534c Ensure the plugin works well with namespaces
I've made a few cosmetic adjustments:
- introduce debug, info, notice, warn and err loggers
- remove redundant logging, and set correct (conservative) log levels
- turn the sync-state error into a warning

And a little debt paydown:
- refactor sync-state into its own function, to be called instead of
  all the spot fixes elsewhere. It's going to be the case that
  sync-state is "the reconsiliation function".
- Fix a bug in lip->lip_namespace copy: vec_dup() there doesn't seem
  to work for me, use strdup() instead and make a mental note to
  reviist.

The plugin now works with 'lcpng default netns dataplane' in its
startup.conf; and with 'lcp default netns dataplane' as its first
command. A few of these fixes should go upstream, too, which I'll
do next.
2021-08-14 09:40:43 +02:00
aab11196cf Sanitize some overly verbose logging ERR->{DBG,INFO} 2021-08-13 21:08:03 +02:00
72f55fd901 Netlink namespaces!
I have been very careless in using the correct network namespace when
manipulating LCP host devices. Around any/every netlink write operation,
we must first clib_setns() into the correct namespace. So, wrap every
call of vnet_netlink_*() in all places.

For consistency, use the convention 'curr_ns_fd' (for the one we are
coming from) and 'vif_ns_fd' (to signal the one that the netlink VIF
is in).

Be careful as well to enter and exit everywhere without losing file
descriptors.
2021-08-13 20:58:28 +02:00
79a395b3c9 Clamp sub-int MTU to parent's MTU
In VPP, a sub-interface can have a larger MTU than its parent.
In Linux, this cannot happen, so the following is problematic:

```
DBGvpp# create sub TenGigabitEthernet3/0/0 10
DBGvpp# set int mtu packet 1500 TenGigabitEthernet3/0/0
DBGvpp# set int mtu packet 9000 TenGigabitEthernet3/0/0.10
694: e0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UNKNOWN mode DEFAULT group default qlen 1000
695: e0.10@e0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
```

The best way to ensure this works is to clamp the sub-int to a maximum MTU
of that of its parent, and override the user request to change the VPP
sub-int to anything higher than that, perhaps logging an error explaining
why. This means two things:

1.  Any change in VPP of a child MTU to larger than its parent, should be
    reverted.
1   Any change in VPP of a parent MTU should ensure all children are clamped
    to at most that.
2021-08-13 19:59:55 +02:00
f7e1bb951d Sync IPv4 and IPv6 addresses from VPP to LCP
There are three ways in which IP addresses will want to be copied
from VPP into the companion Linux devices:

1) set interface ip address ... adds an IPv4 or IPv6 address
  - this is handled by lcp_itf_ip[46]_add_del_interface_addr() which
    is a callback installed in lcp_itf_pair_init()
2) set interface ip address del ... removes them
  - also handled by lcp_itf_ip[46]_add_del_interface_addr() but
    curiously there is no upstream vnet_netlink_del_ip[46]_addr() so
    I wrote them inline here - I will try to get them upstreamed, as
    they appear to be obvious companions in vnet/device/netlink.h
3) Upon LIP creation, it could be that there are L3 addresses already
   on the VPP interface. If so, set them with lcp_itf_set_interface_addr()

This means that now, at any time a new LIP is created, its state from
VPP is fully copied over (MTU, Link state, IPv4/IPv6 addresses)!

At runtime, new addresses can be set/removed as well.
2021-08-13 16:50:32 +02:00
39bfa1615f Add sync of VPP mtu changes into Linux interfaces
This is a straight foward copy of the VPP L3 (packet) MTU into the Linux
host interface, for any change on an interface/sub-int for which a LIP is
defined.

As with Linux itself, no care is taken to ensure that the parent interface
(e0) has a higher MTU than the child interface (e0.10).

DBGvpp# create sub TenGigabitEthernet3/0/0 10
DBGvpp# lcp create TenGigabitEthernet3/0/0 host-if e0
DBGvpp# lcp create TenGigabitEthernet3/0/0.10 host-if e0.10
602: e0: <BROADCAST,MULTICAST> mtu 9000 qdisc mq state DOWN mode DEFAULT group default qlen 1000
603: e0.10@e0: <BROADCAST,MULTICAST,M-DOWN> mtu 9000 qdisc noop state DOWN mode DEFAULT group default qlen 1000

DBGvpp# set interface mtu packet 9216 TenGigabitEthernet3/0/0
602: e0: <BROADCAST,MULTICAST> mtu 9216 qdisc mq state DOWN mode DEFAULT group default qlen 1000
603: e0.10@e0: <BROADCAST,MULTICAST,M-DOWN> mtu 9000 qdisc noop state DOWN mode DEFAULT group default qlen 1000

DBGvpp# set interface mtu packet 9050 TenGigabitEthernet3/0/0
602: e0: <BROADCAST,MULTICAST> mtu 9050 qdisc mq state DOWN mode DEFAULT group default qlen 1000
603: e0.10@e0: <BROADCAST,MULTICAST,M-DOWN> mtu 9000 qdisc noop state DOWN mode DEFAULT group default qlen 1000

DBGvpp# set interface mtu packet 9050 TenGigabitEthernet3/0/0.10
602: e0: <BROADCAST,MULTICAST> mtu 9050 qdisc mq state DOWN mode DEFAULT group default qlen 1000
603: e0.10@e0: <BROADCAST,MULTICAST,M-DOWN> mtu 9050 qdisc noop state DOWN mode DEFAULT group default qlen 1000

DBGvpp# set interface mtu packet 1500 TenGigabitEthernet3/0/0.10
602: e0: <BROADCAST,MULTICAST> mtu 9050 qdisc mq state DOWN mode DEFAULT group default qlen 1000
603: e0.10@e0: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000

DBGvpp# set interface mtu packet 1500 TenGigabitEthernet3/0/0
602: e0: <BROADCAST,MULTICAST> mtu 1500 qdisc mq state DOWN mode DEFAULT group default qlen 1000
603: e0.10@e0: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
2021-08-13 15:02:31 +02:00
d8c988273e fixstyle, now with upstream .clang-* files 2021-08-13 14:37:12 +02:00
cc61e7694d fixstyle 2021-08-13 14:36:25 +02:00
a3dc56c014 Force a sync of interfaces when VPP phy changes state
When Linux changes link on a master interface, all of its children also change.
This is not true in VPP, where bringing down a phy while its sub-ints are up
will not change link on the sub-ints.

We are forced to undo that change by walking the sub-interfaces of a phy and
syncing their state back into linux.

For simplicity, just walk all interfaces, as others will be a no-op.

The approach may have to be revisited once netlink messages are
consumed, to ensure there is not an oscillation where netlink sets a
link, which forces all links to be reset, generating more netlink
messages, etc. Care should be taken when netlink consumption comes into
play!

DBGvpp# create sub TenGigabitEthernet3/0/0 1
DBGvpp# lcp create TenGigabitEthernet3/0/0 host-if e0
DBGvpp# lcp create TenGigabitEthernet3/0/0.1 host-if e0.1
593: e0: <BROADCAST,MULTICAST> mtu 9000 qdisc mq state DOWN mode DEFAULT group default qlen 1000
594: e0.1@e0: <BROADCAST,MULTICAST,M-DOWN> mtu 9000 qdisc noop state DOWN mode DEFAULT group default qlen 1000

DBGvpp# set int state TenGigabitEthernet3/0/0  up
593: e0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UNKNOWN mode DEFAULT group default qlen 1000
594: e0.1@e0: <BROADCAST,MULTICAST> mtu 9000 qdisc noqueue state DOWN mode DEFAULT group default qlen 1000

DBGvpp# set int state TenGigabitEthernet3/0/0.1 up
593: e0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UNKNOWN mode DEFAULT group default qlen 1000
594: e0.1@e0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UP mode DEFAULT group default qlen 1000

DBGvpp# set int state TenGigabitEthernet3/0/0 down
593: e0: <BROADCAST,MULTICAST> mtu 9000 qdisc mq state DOWN mode DEFAULT group default qlen 1000
594: e0.1@e0: <BROADCAST,MULTICAST,M-DOWN> mtu 9000 qdisc noqueue state DOWN mode DEFAULT group default qlen 1000

DBGvpp# set int state TenGigabitEthernet3/0/0 up
DBGvpp# set int state TenGigabitEthernet3/0/0.1 down
593: e0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP mode DEFAULT group default qlen 1000
594: e0.1@e0: <BROADCAST,MULTICAST> mtu 9000 qdisc noqueue state DOWN mode DEFAULT group default qlen 1000

DBGvpp# set int state TenGigabitEthernet3/0/0 down
593: e0: <BROADCAST,MULTICAST> mtu 9000 qdisc mq state DOWN mode DEFAULT group default qlen 1000
594: e0.1@e0: <BROADCAST,MULTICAST,M-DOWN> mtu 9000 qdisc noqueue state DOWN mode DEFAULT group default qlen 1000
2021-08-13 14:33:03 +02:00
7c15c84f6c Sync interface state from VPP to LCP
This is the first in a series of functions that aims to copy forward
interface changes in the VPP dataplane into the linux interfaces.

Capture link state changes (set interface state ...) and apply them
to Linux.

There's an important dissonance here:
- When Linux sets a parent interface up, all children also go up.

ip link set enp66s0f1 down
ip link add link enp66s0f1 name foo type vlan id 1234
ip link set foo down
ip link | grep enp66s0f1
9: enp66s0f1: <BROADCAST,MULTICAST> mtu 9000 qdisc mq state DOWN mode DEFAULT group default qlen 1000
61: foo@enp66s0f1: <BROADCAST,MULTICAST,M-DOWN> mtu 9000 qdisc noop state DOWN mode DEFAULT group default qlen 1000

ip link set enp66s0f1 up
ip link | grep s0f1
9: enp66s0f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP mode DEFAULT group default qlen 1000
61: foo@enp66s0f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UP mode DEFAULT group default qlen 1000

While in VPP this is not so, there each individual interface and
sub-interface stands for itself. I think the proper fix here is to walk
all sub-interfaces when a phy changes, and force a sync of those from
VPP to LCP as well. I'll do that in a followup commit so it's easier to
roll back.
2021-08-13 13:27:20 +02:00