Using the earlier placeholder hint in lcp_nl_link_add(), I know that
I've gotten a NEWLINK request for a Linux ifindex that doesn't have a
LIP. This could be because the interface is entirely foreign to VPP, for
example because somebody created a dummy interface or a VLAN subint on
one:
    ip link add dum0 type dummy
    ip link add link dum0 name dum0.10 type vlan id 10
Or it could be because I'm actually trying to create a VLAN subint on an
interface that VPP does know about, like these:
    ip link add link e0 name e0.1234 type vlan id 1234
    ip link add link e0.1234 name e0.1235 type vlan id 1000
    ip link add link e0 name e0.1236 type vlan id 2345 proto 802.1ad
    ip link add link e0.1236 name e0.1237 type vlan id 1000
In none of these NEWLINK callbacks will the interface, identified by vif
(the Linux interface index), have a corresponding LIP. So I try to
create one by calling lcp_nl_link_add_vlan().
Here, I look up the parent index ('dum0' or 'e0' in the first examples).
If the parent doesn't have a LIP either, as is the case for 'dum0', I
bail. If it does, I still have two choices:
1) the LIP is a phy (e.g. TenGigabitEthernet3/0/0) and this is a regular
tagged interface; or
2) the LIP is itself a subint (e.g. TenGigabitEthernet3/0/0.1234) and
what I'm asking for is a QinQ or QinAD sub-interface.
So I also look up the phy LIP. I now have all the ingredients I need to
create the VPP sub-interfaces with the correct inner dot1q and outer
dot1q or dot1ad tags.
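The two cases can be sketched as a small decision function. This is a
simplified illustration, not the plugin's code: the struct and function
names are hypothetical, and it assumes the inner tag is always dot1q and
that only the outer tag may be dot1ad.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of the parent LIP (not a VPP struct). */
typedef struct {
  bool is_sub;          /* parent is itself a sub-interface */
  uint16_t outer_vlan;  /* parent's own tag, only valid if is_sub */
  bool outer_is_dot1ad; /* parent's tag is 802.1ad instead of 802.1q */
} parent_lip_sketch_t;

/* Tags for the sub-interface that is about to be created. */
typedef struct {
  uint16_t outer_vlan;
  uint16_t inner_vlan;  /* 0 means "no inner tag" */
  bool outer_is_dot1ad;
} subint_tags_sketch_t;

static subint_tags_sketch_t
derive_tags_sketch(const parent_lip_sketch_t *parent, uint16_t vlan,
                   bool vlan_is_dot1ad)
{
  subint_tags_sketch_t t = { 0 };

  assert(vlan >= 1 && vlan <= 4094);
  if (!parent->is_sub) {
    /* Case 1: parent is a phy, so this is a regular single-tagged subint. */
    t.outer_vlan = vlan;
    t.outer_is_dot1ad = vlan_is_dot1ad;
  } else {
    /* Case 2: parent is a subint, so its tag becomes the outer tag and
     * the new VLAN becomes the inner dot1q tag (QinQ or QinAD). */
    t.outer_vlan = parent->outer_vlan;
    t.outer_is_dot1ad = parent->outer_is_dot1ad;
    t.inner_vlan = vlan;
  }
  return t;
}
```

For the e0.1237 example above, the parent e0.1236 is a dot1ad subint with
tag 2345, so the result would be outer 2345 (dot1ad) and inner 1000.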
Of course, I don't really know what subinterface ID to use. It's
appealing to "just" use the vlan, but that's not helpful if the outer
tag and the inner tag are the same. So I write a helper function
vnet_sw_interface_get_available_subid() whose job it is to return an
unused subid for the phy -- starting from 1.
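A minimal sketch of the idea behind such a helper, assuming the subids
already in use on the phy are available as a plain array. The function
name, the array input and the max_id parameter are all illustrative and
not how vnet_sw_interface_get_available_subid() is actually written.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Return the lowest subid >= 1 that is not in used[], or 0 if every
 * subid up to max_id is taken. With n_used entries in use, the lowest
 * free subid can never be larger than n_used + 1. */
static uint32_t
first_available_subid_sketch(const uint32_t *used, size_t n_used,
                             uint32_t max_id)
{
  assert(used != NULL || n_used == 0);
  for (size_t cand = 1; cand <= n_used + 1; cand++) {
    bool taken = false;

    if (cand > max_id)
      break;
    for (size_t i = 0; i < n_used; i++) {
      if (used[i] == cand) {
        taken = true;
        break;
      }
    }
    if (!taken)
      return (uint32_t) cand;
  }
  return 0; /* 0 is never a valid answer, so it doubles as "none" */
}
```

The quadratic scan is only there to keep the sketch short; the point is
that the subid is picked independently of the VLAN tags.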
I then create the phy sub-interface and the tap sub-interface,
tying them together into a new LIP. During these interface creations, I
make sure that lcp-auto-subint is disabled if it was on: I don't want
VPP racing to create LIPs for the sub-ints right now. Before I return
(either in error state or upon success), I restore lcp-auto-subint to
what it was before.
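The save/disable/restore pattern looks roughly like this. It is a sketch
with made-up names: auto_subint stands in for the lcp-auto-subint
setting, and create_subint_sketch() stands in for the real interface
creation calls.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the lcp-auto-subint setting (hypothetical global). */
static bool auto_subint = true;

/* Hypothetical creation step; it must only run with auto_subint off. */
static int
create_subint_sketch(bool fail)
{
  assert(!auto_subint);
  return fail ? -1 : 0;
}

static int
link_add_vlan_sketch(bool phy_fails, bool tap_fails)
{
  bool old = auto_subint; /* remember what the operator configured */
  int rv;

  auto_subint = false;
  rv = create_subint_sketch(phy_fails); /* phy sub-interface */
  if (rv != 0)
    goto out;
  rv = create_subint_sketch(tap_fails); /* tap sub-interface */
out:
  auto_subint = old; /* restored on the error and the success path */
  return rv;
}
```

Funnelling every return through a single exit label is what guarantees
that the setting is put back regardless of which step failed.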
If I manage to create the LIP, huzzah. I return it to the caller so it
can continue setting link/mac/mtu etc.
- Can up/down a link.
- Can set MAC on a link, if it's a phy.
- Can set MTU on a link.
- Can delete link (including phy).
Because link state and MTU changes tend to go around in circles (from
netlink -> VPP, and then, with lcp-sync on, back from VPP -> netlink),
we temporarily turn off lcp-sync, if it's enabled, while we consume a
batch of netlink messages.
TODO (in the next commit): the whole nine yards of creating interfaces
in VPP based on NEWLINK VLANs that come in. Conceptually not too
difficult: if NEWLINK doesn't have a LIP associated with it, but it's a
VLAN, and the parent of the VLAN is a link which _does_ have a LIP, then
we can create the subint in VPP in the correct way.
Straightforward addition/removal of IPv4 and IPv6 addresses on
interfaces. One thing I noticed, which isn't a major concern but is
unfortunate, shows up in the following sequence:
ip addr add 10.0.1.1/30 dev e0
debug linux-cp/nl addr_add: netlink route/addr: add idx 1488 family inet local 10.0.1.1/30 flags 0x0080 (permanent)
warn linux-cp/nl dispatch: ignored route/route: add family inet type 2 proto 2 table 255 dst 10.0.1.1 nexthops { idx 1488 }
warn linux-cp/nl dispatch: ignored route/route: add family inet type 1 proto 2 table 254 dst 10.0.1.0/30 nexthops { idx 1488 }
warn linux-cp/nl dispatch: ignored route/route: add family inet type 3 proto 2 table 255 dst 10.0.1.0 nexthops { idx 1488 }
warn linux-cp/nl dispatch: ignored route/route: add family inet type 3 proto 2 table 255 dst 10.0.1.3 nexthops { idx 1488 }
ping 10.0.1.2
debug linux-cp/nl neigh_add: netlink route/neigh: add idx 1488 family inet lladdr 68:05:ca:32:45:94 dst 10.0.1.2 state 0x0002 (reachable) flags 0x0000
notice linux-cp/nl neigh_add: Added 10.0.1.2 lladdr 68:05:ca:32:45:94 iface TenGigabitEthernet3/0/0
ip addr del 10.0.1.1/30 dev e0
debug linux-cp/nl addr_del: netlink route/addr: del idx 1488 family inet local 10.0.1.1/30 flags 0x0080 (permanent)
notice linux-cp/nl addr_del: Deleted 10.0.1.1/30 iface TenGigabitEthernet3/0/0
warn linux-cp/nl dispatch: ignored route/route: del family inet type 1 proto 2 table 254 dst 10.0.1.0/30 nexthops { idx 1488 }
warn linux-cp/nl dispatch: ignored route/route: del family inet type 3 proto 2 table 255 dst 10.0.1.3 nexthops { idx 1488 }
warn linux-cp/nl dispatch: ignored route/route: del family inet type 3 proto 2 table 255 dst 10.0.1.0 nexthops { idx 1488 }
warn linux-cp/nl dispatch: ignored route/route: del family inet type 2 proto 2 table 255 dst 10.0.1.1 nexthops { idx 1488 }
debug linux-cp/nl neigh_del: netlink route/neigh: del idx 1488 family inet lladdr 68:05:ca:32:45:94 dst 10.0.1.2 state 0x0002 (reachable) flags 0x0000
error linux-cp/nl neigh_del: Failed 10.0.1.2 iface TenGigabitEthernet3/0/0
It is this very last message that's a bit of a concern. The ping
brought the lladdr into the neighbor cache, and the subsequent address
deletion first removed the address, then all the typical local routes
(the connected, the broadcast, the network, and the self/local), and
then explicitly deleted the neighbor as well. That is correct behavior
for Linux, but VPP already invalidates the neighbor cache and
adds/removes the connected routes by itself, for example in
ip/ip4_forward.c L826-L830 and L583. By the time the netlink neigh_del
arrives, the neighbor is already gone from VPP, so the delete fails.
I predict more of these false positive 'errors' like the one on neigh_del()
because interface/route addition/deletion is slightly different in VPP than
in Linux. I may have to reclassify these errors as warnings.
Add lcpng_nl_sync.c that will house these functions. Their purpose is to
take state learned from netlink messages, and apply that state to VPP.
Some rearranging/plumbing was necessary to get logging to be visible in
this new source file.
Then, we add lcp_nl_neigh_add() and _del() which look up the LIP, convert
the lladdr and ip address from Netlink into VPP variants, and then add or
remove the ip4/ip6 neighbor adjacency.
Register lcp_nl_init() which adds interface pair add/del callbacks.
lcp_nl_pair_add_cb: Initiate the netlink listener for the first
interface in its netns. If subsequent adds are in other netns, issue a
warning. Keep a refcount.
lcp_nl_pair_del_cb: Remove the listener when the last interface pair is
removed.
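The refcounting in these two callbacks can be sketched as follows. All
names here are hypothetical, and one detail is an assumption on my part:
pairs in a foreign netns only trigger a warning and are not counted, so
they never keep the listener alive.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical listener state; the real plugin keeps more than this. */
typedef struct {
  int refcount;   /* number of pairs using the listener */
  bool listening; /* stand-in for "netlink socket is open" */
  char netns[64]; /* netns of the first pair, which wins */
  int warnings;   /* stand-in for issuing a log warning */
} nl_listener_sketch_t;

static void
pair_add_sketch(nl_listener_sketch_t *nl, const char *netns)
{
  assert(netns != NULL);
  if (nl->refcount == 0) {
    /* First pair: remember its netns and start listening there. */
    snprintf(nl->netns, sizeof(nl->netns), "%s", netns);
    nl->listening = true;
  } else if (strcmp(nl->netns, netns) != 0) {
    /* Assumption: a pair in another netns is warned about, not counted. */
    nl->warnings++;
    return;
  }
  nl->refcount++;
}

static void
pair_del_sketch(nl_listener_sketch_t *nl, const char *netns)
{
  assert(netns != NULL);
  if (nl->refcount == 0 || strcmp(nl->netns, netns) != 0)
    return;
  if (--nl->refcount == 0)
    nl->listening = false; /* last pair gone: remove the listener */
}
```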
The socket is opened and its file descriptor is added to VPP's epoll,
with lcp_nl_read_cb() and lcp_nl_error_cb() callbacks installed.
- lcp_nl_read_cb() calls lcp_nl_callback(), which pushes netlink messages
onto a queue and issues an NL_EVENT_READ event. Any socket read error
issues an NL_EVENT_READ_ERR event.
- lcp_nl_error_cb() simply issues an NL_EVENT_READ_ERR event.
Then, initialize a process node called lcp_nl_process(), which handles:
- NL_EVENT_READ, by calling lcp_nl_process_msgs()
  - if messages are left in the queue, reschedule consumption after M
    msecs. This allows new netlink messages to continuously be read from
    the kernel, even if we have lots of messages to consume.
- NL_EVENT_READ_ERR, by closing and reopening the netlink socket.
lcp_nl_process_msgs() processes up to N messages and/or for up to M
msecs, whichever comes first. It calls lcp_nl_dispatch() for each
message.
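The "N messages or M msecs" budget could look something like the sketch
below. The names are hypothetical, the queue is reduced to a counter of
pending messages, and the clock is passed in as a function pointer so
that a fake millisecond clock can be used for demonstration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t (*now_ms_fn_t)(void);

/* Fake clock for demonstration: advances 1 ms on every reading. */
static uint64_t fake_ms = 0;

static uint64_t
fake_now_ms(void)
{
  return fake_ms++;
}

/* Consume messages until the queue is empty, max_msgs were handled or
 * max_ms have elapsed. Returns how many messages were consumed; the
 * caller reschedules itself if *n_pending is still non-zero. */
static size_t
process_msgs_sketch(size_t *n_pending, size_t max_msgs, uint64_t max_ms,
                    now_ms_fn_t now_ms)
{
  uint64_t start;
  size_t n = 0;

  assert(n_pending != NULL && now_ms != NULL);
  start = now_ms();
  while (*n_pending > 0 && n < max_msgs && (now_ms() - start) < max_ms) {
    /* A real implementation would pop and dispatch one message here. */
    (*n_pending)--;
    n++;
  }
  return n;
}
```

Checking the time budget on every iteration costs a clock read per
message, but it keeps a single slow batch from starving the socket
reader.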
lcp_nl_dispatch() ultimately just throws the message away after
logging it with format_nl_object().