Dataplane reconcile fixes; LB counters cleanup; SPA scope cookie
Checker / reload:
- Reload's update-in-place branch now mirrors b.Address onto the
runtime health.Backend. Without this, GetBackend kept returning
the pre-reload address indefinitely after a config edit that
touched addresses but not healthcheck settings — the VPP sync
path reads cfg.Backends directly so the dataplane moved on
while the gRPC and SPA view stayed wedged on the old IPv4/IPv6.
Sync (internal/vpp/lbsync.go):
- reconcileVIP now detects encap mismatch in addition to
src-ip-sticky mismatch and takes the full tear-down / re-add
path via a new shared recreateVIP helper. Triggered when every
backend flips address family (gre4 <-> gre6) and the existing
VIP can no longer accept new ASes — previously the sync wedged
with 'Invalid address family' until a full maglevd restart.
- setASWeight is issued whenever the state machine requests
flush (a.Flush=true), not only on the weight-value transition
edge. Fixes the case where a backend reached StateDisabled
after its effective weight had already been drained to 0 by
pool failover — the sticky-cache entries pointing at it were
previously never cleared.
maglev-frontend:
- signal.Ignore(SIGHUP) so a controlling-terminal disconnect
doesn't kill the daemon.
- debian/vpp-maglev.service grants CAP_SYS_ADMIN in addition to
CAP_NET_RAW so setns(CLONE_NEWNET) can join the healthcheck
netns. Comment documents the 'operation not permitted' symptom
and notes the knob can be dropped if the deployment doesn't use
the 'netns:' healthcheck option.
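For reference, a sketch of what the relevant unit lines look like — the directive names are standard systemd, but the exact directives used in this repo's debian/vpp-maglev.service are assumed, not quoted:

```ini
[Service]
# CAP_NET_RAW: raw sockets for healthcheck probes.
# CAP_SYS_ADMIN: required by setns(fd, CLONE_NEWNET) — without it the
# daemon fails with 'operation not permitted' when entering the
# healthcheck netns. Can be dropped if the 'netns:' healthcheck
# option is unused in this deployment.
AmbientCapabilities=CAP_NET_RAW CAP_SYS_ADMIN
CapabilityBoundingSet=CAP_NET_RAW CAP_SYS_ADMIN
```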
LB plugin counters (internal/vpp/lbstats.go + friends):
- Fix the VIP counter regex: the LB plugin registers
vlib_simple_counter_main_t names without a leading '/'
(vlib_validate_simple_counter in counter.c:50 uses cm->name
verbatim; only entries that set cm->stat_segment_name get a
slash). first/next/untracked/no-server now read through as
live values instead of zero.
- Drop the per-backend FIB counter block end-to-end (proto,
grpcapi, metrics, vpp.Client, lbstats, maglevc). Traced from
lb/node.c:558 into ip{4,6}_forward.h:141 — the LB plugin
forwards by writing adj_index[VLIB_TX] directly and bypassing
ip{4,6}_lookup_inline, which is the only path that increments
lbm_to_counters. The backend's FIB load_balance stats_index
literally never ticks for LB-forwarded traffic, so the column
was always zero and misleading. docs/implementation/TODO
records the full investigation and the recommended upstream
path (new lb_as_stats_dump API message) for when we're ready
to carry that VPP patch.
- maglevc show vpp lb counters: plain-text tabular headers.
label() wraps strings in ANSI escapes (~11 bytes of overhead),
but tabwriter counts those escape characters as part of the cell
width, not the rendered width — so mixing label()'d header cells
with plain data cells drifts column alignment on every row.
The color.go comment now spells
out the constraint: label() only works when column N is
wrapped identically in every row (key-value layouts are fine,
multi-column tables with header-only labelling are not).
SPA:
- stores/scope.ts is cookie-backed (maglev_scope, 1 year,
SameSite=Lax). App.tsx hydrates from the cookie then validates
against the fetched snapshots: a cookie referencing a maglevd
that no longer exists falls through to snaps[0] instead of
leaving the user on a ghost selection.
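The hydration rule restated as a small Go function for illustration — the real code is TypeScript and the names here are hypothetical:

```go
package main

import "fmt"

// chooseScope keeps the cookie-persisted selection only if it still
// names a live maglevd in the fetched snapshots; a stale cookie falls
// through to the first snapshot instead of a ghost selection.
func chooseScope(cookie string, snaps []string) string {
	for _, s := range snaps {
		if s == cookie {
			return s // cookie still points at a real maglevd
		}
	}
	if len(snaps) > 0 {
		return snaps[0] // stale cookie: fall back to snaps[0]
	}
	return "" // nothing fetched yet
}

func main() {
	fmt.Println(chooseScope("lb-b", []string{"lb-a", "lb-b"}))
	fmt.Println(chooseScope("decommissioned", []string{"lb-a"}))
}
```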
- components/Flash.tsx wraps props.value in createMemo. Solid's
on() fires its callback on every dep notification, not on
value change — source is right in solid-js/dist/solid.js:460,
no equality check. Without the memo, flipping scope between
two 'connected' maglevds (or any other cross-store reactive
re-eval that doesn't actually change the concrete string)
replays the animation every time. createMemo's default ===
dedupe fixes it in one place for every Flash consumer,
superseding the local createMemo workaround we'd added in
BackendRow earlier.
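A Go analogy of the mechanism (the actual fix is TypeScript): a naive signal notifies subscribers on every write, while a deduped one compares the incoming value with == first — the same default equality check createMemo puts in front of Flash's props.value.

```go
package main

import "fmt"

// signal is a toy observable. fires counts subscriber notifications,
// standing in for the flash animation being (re)played.
type signal struct {
	val    string
	dedupe bool
	fires  int
}

func (s *signal) set(v string) {
	if s.dedupe && v == s.val {
		return // same concrete string: suppress the notification
	}
	s.val = v
	s.fires++
}

func main() {
	raw := &signal{}               // like on() with no equality check
	memo := &signal{dedupe: true}  // like createMemo's default ===
	for i := 0; i < 3; i++ {
		// cross-store re-evals that produce the same string
		raw.set("connected")
		memo.set("connected")
	}
	fmt.Println(raw.fires, memo.fires)
}
```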
@@ -66,11 +66,6 @@ type Client struct {
     // lbStatsLoop. Published as an immutable slice via atomic.Pointer so
     // Prometheus scrapes (metrics.Collector.Collect) don't take any lock.
     lbStatsSnap atomic.Pointer[[]metrics.VIPStatEntry]
-
-    // backendRouteSnap is the most recent per-backend FIB stats snapshot
-    // captured by lbStatsLoop. Same atomic-pointer publish pattern as
-    // lbStatsSnap; see logBackendRouteStats in fibstats.go.
-    backendRouteSnap atomic.Pointer[[]metrics.BackendRouteStat]
 }
 
 // SetStateSource attaches a live config + health state source. When set, the
@@ -241,18 +236,6 @@ func (c *Client) VIPStats() []metrics.VIPStatEntry {
     return *p
 }
 
-// BackendRouteStats satisfies metrics.VPPSource. It returns the latest
-// snapshot of per-backend FIB combined counters (/net/route/to) captured
-// by lbStatsLoop. Returns nil until the first scrape completes, or after
-// a disconnect.
-func (c *Client) BackendRouteStats() []metrics.BackendRouteStat {
-    p := c.backendRouteSnap.Load()
-    if p == nil {
-        return nil
-    }
-    return *p
-}
-
 // VPPInfo satisfies metrics.VPPSource. It returns a copy of the cached
 // connection info as a metrics-local struct so the metrics package doesn't
 // need to import internal/vpp. Second return is false when VPP is not
@@ -309,7 +292,6 @@ func (c *Client) disconnect() {
     c.lastLBConf = nil // force re-push of lb_conf on reconnect
     c.mu.Unlock()
     c.lbStatsSnap.Store(nil)
-    c.backendRouteSnap.Store(nil)
 
     safeDisconnectAPI(apiConn)
     safeDisconnectStats(statsConn)
@@ -6,7 +6,6 @@ import (
     "context"
     "fmt"
     "log/slog"
-    "sort"
     "time"
 
     "go.fd.io/govpp/adapter"
@@ -20,25 +19,30 @@ import (
 const lbStatsInterval = 5 * time.Second
 
 // LB VIP counter names as they appear in the VPP stats segment. These
-// come from lb_foreach_vip_counter in src/plugins/lb/lb.h — each entry is
-// registered with only .name set, so the stats segment exposes them at
-// the top level (spaces and all). Replace if the VPP plugin renames them.
+// come from lb_foreach_vip_counter in src/plugins/lb/lb.h — the plugin
+// passes them through vlib_simple_counter_main_t with only .name set
+// (.stat_segment_name unset), so vlib_validate_simple_counter /
+// vlib_stats_add_counter_vector register them under the name verbatim
+// with no leading slash. Contrast with /net/route/to below, which IS
+// registered with stat_segment_name="/net/route/to" in
+// src/vnet/dpo/load_balance.c. Replace if the VPP plugin renames them
+// or starts setting stat_segment_name.
 const (
-    lbStatNextPacket   = "/packet from existing sessions"
-    lbStatFirstPacket  = "/first session packet"
-    lbStatUntrackedPkt = "/untracked packet"
-    lbStatNoServer     = "/no server configured"
+    lbStatNextPacket   = "packet from existing sessions"
+    lbStatFirstPacket  = "first session packet"
+    lbStatUntrackedPkt = "untracked packet"
+    lbStatNoServer     = "no server configured"
 )
 
 // lbStatPatterns is the full list of anchored regexes passed to DumpStats
 // for one scrape cycle: the four LB-plugin SimpleCounters plus the FIB
-// CombinedCounter for per-route packet+byte totals. Doing it in a single
+// CombinedCounter for per-VIP packet+byte totals. Doing it in a single
 // DumpStats avoids walking the stats segment twice.
 var lbStatPatterns = []string{
-    `^/packet from existing sessions$`,
-    `^/first session packet$`,
-    `^/untracked packet$`,
-    `^/no server configured$`,
+    `^packet from existing sessions$`,
+    `^first session packet$`,
+    `^untracked packet$`,
+    `^no server configured$`,
     `^/net/route/to$`,
 }
 
@@ -63,9 +67,24 @@ func (c *Client) lbStatsLoop(ctx context.Context) {
 }
 
 // scrapeLBStats runs one full scrape cycle: discover VIPs via cli_inband,
-// look up FIB stats_indices for every VIP and every backend via
-// ip_route_lookup, dump all five stats-segment paths in one DumpStats
-// call, and reduce the counters into the two published snapshots.
+// look up the FIB stats_index for every VIP via ip_route_lookup, dump
+// all five stats-segment paths in one DumpStats call, and reduce the
+// counters into the per-VIP snapshot.
+//
+// NOTE: there is no per-backend counter here. The LB plugin's forwarding
+// node (src/plugins/lb/node.c) sets adj_index[VLIB_TX] to the AS's DPO
+// directly and enqueues into the lb*-gre* encap node, bypassing
+// ip6_lookup_inline / ip4_lookup_inline entirely. Since
+// lbm_to_counters is only incremented in those lookup paths (see
+// ip6_forward.h / ip4_forward.h), the backend's FIB-entry
+// load_balance stats_index never ticks for LB-forwarded traffic —
+// /net/route/to at that index would always report zero. Per-backend
+// packet/byte counters would need a new counter inside the LB plugin
+// itself (e.g. a vlib_combined_counter_main_t keyed by AS index
+// incremented in lb4-gre4 / lb6-gre6 / etc.). Until that lands
+// upstream we simply don't expose per-backend rates; the VIP-level
+// counters tell the dataplane story on the granularity VPP actually
+// provides today.
 func (c *Client) scrapeLBStats() error {
     if !c.IsConnected() {
         return nil
@@ -100,37 +119,6 @@ func (c *Client) scrapeLBStats() error {
         vipStatsIdx[k] = idx
     }
 
-    // Resolve FIB stats_indices for every backend in the running config.
-    type backendLookup struct {
-        name, addr string
-        index      uint32
-    }
-    var backends []backendLookup
-    if src := c.getStateSource(); src != nil {
-        if cfg := src.Config(); cfg != nil {
-            names := make([]string, 0, len(cfg.Backends))
-            for name := range cfg.Backends {
-                names = append(names, name)
-            }
-            sort.Strings(names) // stable snapshot order
-            for _, name := range names {
-                b := cfg.Backends[name]
-                if b.Address == nil {
-                    continue
-                }
-                idx, err := fibStatsIndex(ch, b.Address)
-                if err != nil {
-                    slog.Debug("vpp-backend-route-lookup-failed",
-                        "backend", name, "address", b.Address.String(), "err", err)
-                    continue
-                }
-                backends = append(backends, backendLookup{
-                    name: name, addr: b.Address.String(), index: idx,
-                })
-            }
-        }
-    }
-
     c.mu.Lock()
     sc := c.statsClient
     c.mu.Unlock()
@@ -178,26 +166,6 @@ func (c *Client) scrapeLBStats() error {
         )
     }
     c.lbStatsSnap.Store(&vipOut)
-
-    // ---- backend snapshot ----
-    backendOut := make([]metrics.BackendRouteStat, 0, len(backends))
-    for _, l := range backends {
-        pkts, byts := reduceCombinedCounter(routeTo, int(l.index))
-        entry := metrics.BackendRouteStat{
-            Backend: l.name,
-            Address: l.addr,
-            Packets: pkts,
-            Bytes:   byts,
-        }
-        backendOut = append(backendOut, entry)
-        slog.Debug("vpp-backend-route-stats",
-            "backend", entry.Backend,
-            "address", entry.Address,
-            "packets", entry.Packets,
-            "bytes", entry.Bytes,
-        )
-    }
-    c.backendRouteSnap.Store(&backendOut)
     return nil
 }
 
@@ -251,27 +251,25 @@ func reconcileVIP(ch *loggedChannel, d desiredVIP, cur *LBVIP, curSticky bool, f
     }
 
     if curSticky != d.SrcIPSticky {
-        slog.Info("vpp-lb-sync-vip-recreate",
-            "vip", d.Prefix.IP.String(),
-            "protocol", protocolName(d.Protocol),
-            "port", d.Port,
-            "reason", "src-ip-sticky-changed",
-            "from", curSticky,
-            "to", d.SrcIPSticky)
-        if err := removeVIP(ch, *cur, st); err != nil {
-            return err
-        }
-        if err := addVIP(ch, d); err != nil {
-            return err
-        }
-        st.vipAdd++
-        for _, as := range d.ASes {
-            if err := addAS(ch, d.Prefix, d.Protocol, d.Port, as); err != nil {
-                return err
-            }
-            st.asAdd++
-        }
-        return nil
+        return recreateVIP(ch, d, *cur, st, "src-ip-sticky-changed",
+            "from", curSticky, "to", d.SrcIPSticky)
     }
 
+    // Encap mismatch: every backend flipped address family (e.g. all
+    // IPv6 → all IPv4 after a config edit). VPP's encap is a VIP-level
+    // attribute set at lb_add_del_vip time with no mutation API, so
+    // adding new-family ASes under the old encap wedges the
+    // reconciler: packets would be wrapped for the wrong family and
+    // the new backends never see traffic. The only recovery is a VIP
+    // recreate. The old-family ASes end up orphaned in VPP's pool and
+    // are GC'd on the plugin's own ~40s "USED-flag=false" schedule;
+    // the next regular sync tick confirms steady state. A recreate
+    // does tear down existing flows to the VIP, which is why we gate
+    // on an explicit encap difference rather than reconciling every
+    // sync cycle.
+    if desiredEncap := encapString(d.Encap); desiredEncap != cur.Encap {
+        return recreateVIP(ch, d, *cur, st, "encap-changed",
+            "from", cur.Encap, "to", desiredEncap)
+    }
+
     // VIP exists in both — reconcile ASes.
@@ -301,20 +299,38 @@ func reconcileVIP(ch *loggedChannel, d desiredVIP, cur *LBVIP, curSticky bool, f
         st.asAdd++
         continue
     }
-    if c.Weight != a.Weight {
-        // Flush only on the transition from serving traffic (cur > 0) to
-        // zero, and only when the desired state explicitly asks for it
-        // (i.e. the backend was disabled, not merely drained). Steady-
-        // state syncs where weight doesn't change never re-flush.
-        flush := a.Flush && c.Weight > 0 && a.Weight == 0
-        // Caller-forced flush: used by SetFrontendPoolBackendWeight
-        // with flush=true to explicitly drop live sessions for a
-        // single backend. The address match is exact — no other
-        // AS's weight change is affected, even if several happen
-        // in the same reconcile pass.
-        if flushAddress != "" && addr == flushAddress {
-            flush = true
-        }
+    // setASWeight is issued whenever the weight changes OR whenever
+    // the state machine asks for a flush (a.Flush=true, currently
+    // emitted only for StateDisabled). The a.Flush path has to
+    // fire even on a no-op weight change, because a backend can
+    // reach StateDisabled via a pool-failover that already drained
+    // its VPP weight to 0 on an earlier tick — at that moment
+    // c.Weight == a.Weight == 0, and a gate keyed solely on the
+    // weight diff would silently drop the flush intent and leave
+    // stale sticky-cache entries pointing at the now-disabled AS
+    // (see the "disable nlams0 after fallback deactivation" trace
+    // in the bug investigation).
+    //
+    // Firing unconditionally on a.Flush is idempotent at VPP's
+    // side: lb_as_set_weight with an unchanged weight is a no-op
+    // on the Maglev table (lb_vip_update_new_flow_table rebuilds
+    // the same table), and a redundant lb_flush_vip_as is bounded
+    // — it walks each per-worker sticky_ht once. The trade-off is
+    // that disabled backends re-issue the flush on every periodic
+    // SyncLBStateAll tick, and any sticky entries that happened to
+    // land in the meantime get cleared; both are acceptable for
+    // the "correctness over churn" semantics we want here.
+    weightChanged := c.Weight != a.Weight
+    flush := a.Flush
+    // Caller-forced flush: used by SetFrontendPoolBackendWeight
+    // with flush=true to explicitly drop live sessions for a
+    // single backend. The address match is exact — no other
+    // AS's weight change is affected, even if several happen
+    // in the same reconcile pass.
+    if flushAddress != "" && addr == flushAddress {
+        flush = true
+    }
+    if weightChanged || flush {
         if err := setASWeight(ch, d.Prefix, d.Protocol, d.Port, a, c.Weight, flush); err != nil {
             return err
         }
@@ -324,6 +340,37 @@ func reconcileVIP(ch *loggedChannel, d desiredVIP, cur *LBVIP, curSticky bool, f
     return nil
 }
 
+// recreateVIP tears down an existing VIP and rebuilds it with the
+// desired configuration and ASes. Used when a VIP attribute that VPP
+// can't mutate in place has changed — today src_ip_sticky and the
+// encap family. reason is logged as an operator-facing explanation;
+// extra is appended to the slog call as additional fields (typically
+// "from", <oldvalue>, "to", <newvalue>).
+func recreateVIP(ch *loggedChannel, d desiredVIP, cur LBVIP, st *syncStats, reason string, extra ...any) error {
+    logAttrs := []any{
+        "vip", d.Prefix.IP.String(),
+        "protocol", protocolName(d.Protocol),
+        "port", d.Port,
+        "reason", reason,
+    }
+    logAttrs = append(logAttrs, extra...)
+    slog.Info("vpp-lb-sync-vip-recreate", logAttrs...)
+    if err := removeVIP(ch, cur, st); err != nil {
+        return err
+    }
+    if err := addVIP(ch, d); err != nil {
+        return err
+    }
+    st.vipAdd++
+    for _, as := range d.ASes {
+        if err := addAS(ch, d.Prefix, d.Protocol, d.Port, as); err != nil {
+            return err
+        }
+        st.asAdd++
+    }
+    return nil
+}
+
 // removeVIP flushes all ASes from a VIP and then deletes the VIP itself.
 func removeVIP(ch *loggedChannel, v LBVIP, st *syncStats) error {
     for _, as := range v.ASes {