VPP LB counters, src-ip-sticky, and frontend state aggregation

New feature: per-VIP / per-backend runtime counters
  * New GetVPPLBCounters RPC serving an in-process snapshot refreshed
    by a 5s scrape loop (internal/vpp/lbstats.go). Each cycle pulls
    the LB plugin's four SimpleCounters (next, first, untracked,
    no-server) plus the FIB /net/route/to CombinedCounter for every
    VIP and every backend host prefix via a single DumpStats call.
  * FIB stats-index discovery via ip_route_lookup (internal/vpp/
    fibstats.go); per-worker reduction happens in the collector.
  * Prometheus collector exports vip_packets_total (kind label),
    vip_route_{packets,bytes}_total, and backend_route_{packets,
    bytes}_total. Metrics source interface extended with VIPStats /
    BackendRouteStats; vpp.Client publishes snapshots via
    atomic.Pointer and clears them on disconnect (sketched after this
    list).
  * New 'show vpp lb counters' CLI command. The 'show vpp lbstate'
    and 'sync vpp lbstate' commands are restructured under 'show
    vpp lb {state,counters}' / 'sync vpp lb state' to make room
    for the new verb.
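
  The snapshot handover mentioned above can be sketched roughly as
  follows. This is illustrative only: the type and method names
  (lbSnapshot, publish, clear, Snapshot) are stand-ins, not the actual
  internal/vpp API.

    package vpp

    import (
        "sync/atomic"
        "time"
    )

    // lbSnapshot is one immutable scrape result; each cycle builds a
    // fresh one and swaps it in wholesale, so readers never see a
    // half-updated view.
    type lbSnapshot struct {
        TakenAt time.Time
        // per-VIP LB-plugin packet counters, keyed by VIP and then by
        // kind (next, first, untracked, no-server)
        VIPPackets map[string]map[string]uint64
    }

    // Client stands in for vpp.Client; a nil pointer means "no data yet".
    type Client struct {
        lbStats atomic.Pointer[lbSnapshot]
    }

    // publish swaps in the latest snapshot after a scrape cycle.
    func (c *Client) publish(s *lbSnapshot) { c.lbStats.Store(s) }

    // clear drops the snapshot on disconnect so stale counters are not
    // served to the RPC handler or the Prometheus collector.
    func (c *Client) clear() { c.lbStats.Store(nil) }

    // Snapshot is what readers call; they get either a complete
    // snapshot or nil, never a partially updated one.
    func (c *Client) Snapshot() *lbSnapshot { return c.lbStats.Load() }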

New feature: src-ip-sticky frontends
  * New frontend YAML key 'src-ip-sticky' (bool). Plumbed through
    config.Frontend, desiredVIP, and the lb_add_del_vip_v2 call.
  * Reflected in gRPC FrontendInfo.src_ip_sticky and VPPLBVIP.
    src_ip_sticky, and shown in 'show vpp lb state' output.
  * Scraped back from VPP by parsing 'show lb vips verbose' through
    cli_inband — lb_vip_details does not expose the flag. The same
    scrape also recovers the LB pool index for each VIP, which the
    stats-segment counters are keyed on. This is a documented
    temporary workaround until VPP ships an lb_vip_v2_dump.
  * src_ip_sticky cannot be mutated on a live VIP, so a flipped flag
    triggers a tear-down-and-recreate in reconcileVIP (ASes deleted
    with flush, VIP deleted, then re-added). The flip is logged.
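
  A minimal sketch of that recreate path, assuming illustrative
  stand-in names (VIPSpec, deleteASesWithFlush, deleteVIP, addVIP);
  the real reconcileVIP in internal/vpp works on the actual VIP types
  and issues binary-API calls instead of the stubs below:

    package vpp

    import "log/slog"

    // VIPSpec is a stand-in for the desired/observed VIP state.
    type VIPSpec struct {
        Prefix      string
        SrcIPSticky bool
    }

    // Client is a stand-in for the VPP API client; the stubs below
    // elide the actual binary-API calls.
    type Client struct{}

    func (c *Client) deleteASesWithFlush(prefix string) error { return nil }
    func (c *Client) deleteVIP(prefix string) error           { return nil }
    func (c *Client) addVIP(v VIPSpec) error                  { return nil }

    // reconcileVIP applies the desired spec. src_ip_sticky cannot be
    // changed on a live VIP, so a flipped flag tears the VIP down
    // (ASes first, with flush) and re-adds it with the new setting.
    func (c *Client) reconcileVIP(want, have VIPSpec) error {
        if want.SrcIPSticky != have.SrcIPSticky {
            slog.Info("src-ip-sticky changed, recreating VIP",
                "vip", want.Prefix, "sticky", want.SrcIPSticky)
            if err := c.deleteASesWithFlush(have.Prefix); err != nil {
                return err
            }
            if err := c.deleteVIP(have.Prefix); err != nil {
                return err
            }
        }
        return c.addVIP(want)
    }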

New feature: frontend state aggregation and events
  * New health.FrontendState (unknown/up/down) and FrontendTransition
    types. A frontend is 'up' iff at least one backend has a nonzero
    effective weight, 'unknown' iff no backend has real state yet,
    and 'down' otherwise (see the sketch following this list).
  * Checker tracks per-frontend aggregate state, recomputing after
    each backend transition and emitting a frontend-transition Event
    on change. Reload drops entries for removed frontends.
  * checker.Event gains an optional FrontendTransition pointer;
    backend- vs. frontend-transition events are demultiplexed on
    that field.
  * WatchEvents now sends an initial snapshot of frontend state on
    connect (mirroring the existing backend snapshot), subscribes
    once to the checker stream, and fans out to backend/frontend
    handlers based on the client's filter flags. The proto
    FrontendEvent message grows name + transition fields.
  * New Checker.FrontendState accessor.
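
  The aggregation rule from the first bullet, sketched with simplified
  inputs. This is a sketch only: the real health.ComputeFrontendState
  works from the frontend config and per-backend health states rather
  than the flattened maps used here.

    package health

    // FrontendState mirrors the unknown/up/down aggregate described
    // above (simplified re-declaration for this sketch).
    type FrontendState int

    const (
        FrontendStateUnknown FrontendState = iota
        FrontendStateUp
        FrontendStateDown
    )

    // aggregate applies the rule: up if any backend carries a nonzero
    // effective weight, unknown if no backend has reported a real
    // state yet, down otherwise.
    func aggregate(effWeight map[string]uint32, known map[string]bool) FrontendState {
        anyKnown := false
        for backend, w := range effWeight {
            if w > 0 {
                return FrontendStateUp
            }
            if known[backend] {
                anyKnown = true
            }
        }
        if !anyKnown {
            return FrontendStateUnknown
        }
        return FrontendStateDown
    }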

Refactor: pure health helpers
  * Moved the priority-failover selector and the (pool idx, active
    pool, state, cfg weight) → (vpp weight, flush) mapping out of
    internal/vpp/lbsync.go into a new internal/health/weights.go so
    the checker can reuse them for frontend-state computation
    without importing internal/vpp.
  * New functions: health.ActivePoolIndex, BackendEffectiveWeight,
    EffectiveWeights, ComputeFrontendState. lbsync.go now calls
    these directly; vpp.EffectiveWeights is a thin wrapper over
    health.EffectiveWeights retained for the gRPC observability
    path. Fully unit-tested in internal/health/weights_test.go.
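
  The priority-failover idea behind health.ActivePoolIndex, sketched
  under the assumption that pools are tried in priority order and the
  first pool with at least one healthy backend wins; the real helper's
  signature and tie-breaking may differ.

    package health

    // activePoolIndex returns the index of the first pool (in priority
    // order) containing at least one healthy backend, or -1 when no
    // pool qualifies and the frontend has no usable backends.
    func activePoolIndex(pools [][]string, healthy map[string]bool) int {
        for i, backends := range pools {
            for _, b := range backends {
                if healthy[b] {
                    return i
                }
            }
        }
        return -1
    }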

maglevc polish
  * --color default is now mode-aware: on in the interactive shell,
    off in one-shot mode so piped output is script-safe. Explicit
    --color=true/false still overrides.
  * New stripHostMask helper drops /32 and /128 from VIP display;
    non-host prefixes pass through unchanged (see the sketch after
    this list).
  * Counter table column order fixed (first before next) and
    packets/bytes columns renamed to fib-packets/fib-bytes to
    clarify they come from the FIB, not the LB plugin.
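
  The stripHostMask behaviour, sketched here with net/netip (the
  concrete implementation in maglevc may differ): host prefixes lose
  their mask for display, everything else passes through.

    package main

    import (
        "fmt"
        "net/netip"
    )

    // stripHostMask drops the /32 or /128 mask from a host prefix for
    // display; any other prefix (or an unparsable string) is returned
    // unchanged.
    func stripHostMask(s string) string {
        p, err := netip.ParsePrefix(s)
        if err != nil {
            return s
        }
        if p.Bits() == p.Addr().BitLen() { // /32 for IPv4, /128 for IPv6
            return p.Addr().String()
        }
        return s
    }

    func main() {
        fmt.Println(stripHostMask("192.0.2.10/32"))   // 192.0.2.10
        fmt.Println(stripHostMask("2001:db8::1/128")) // 2001:db8::1
        fmt.Println(stripHostMask("2001:db8::/32"))   // unchanged: not a host prefix
    }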

Docs
  * config-guide: document src-ip-sticky, including the VIP
    recreate-on-change caveat.
  * user-guide, maglevc.1, maglevd.8: updated command tree, new
    counters command, color defaults, and the src-ip-sticky field.

@@ -23,13 +23,26 @@ type BackendSnapshot struct {
 	Config config.Backend
 }
 
-// Event is emitted on every backend state transition, once per frontend that
-// references the backend.
+// Event is emitted on every state transition the checker observes. There are
+// two kinds, distinguished by which of BackendName or FrontendTransition is
+// populated:
+//
+//   - Backend transition: FrontendName is the frontend that references the
+//     backend (one event per frontend per backend transition), BackendName
+//     and Backend are set, and Transition carries the health.Transition.
+//     FrontendTransition is nil.
+//   - Frontend transition: FrontendName is the frontend whose aggregate state
+//     changed, FrontendTransition is non-nil. BackendName and Backend are
+//     empty, Transition is the zero value.
+//
+// Consumers dispatch on FrontendTransition != nil.
 type Event struct {
 	FrontendName string
 	BackendName string
 	Backend net.IP
 	Transition health.Transition
+
+	FrontendTransition *health.FrontendTransition
 }
 
 type worker struct {
@@ -49,6 +62,13 @@ type Checker struct {
 	mu sync.RWMutex
 	workers map[string]*worker // keyed by backend name
 
+	// frontendStates tracks the aggregated state of every configured frontend
+	// (unknown/up/down). Updated whenever a backend transition happens; a
+	// change emits a frontend-transition Event. The zero value for a missing
+	// key is FrontendStateUnknown, so initial-reference accesses behave
+	// correctly even without explicit seeding.
+	frontendStates map[string]health.FrontendState
+
 	subsMu sync.Mutex
 	nextID int
 	subs map[int]chan Event
@@ -58,10 +78,11 @@ type Checker struct {
 
 // New creates a Checker. Call Run to start probing.
 func New(cfg *config.Config) *Checker {
 	return &Checker{
-		cfg:     cfg,
-		workers: make(map[string]*worker),
-		subs:    make(map[int]chan Event),
-		eventCh: make(chan Event, 256),
+		cfg:            cfg,
+		workers:        make(map[string]*worker),
+		frontendStates: make(map[string]health.FrontendState),
+		subs:           make(map[int]chan Event),
+		eventCh:        make(chan Event, 256),
 	}
 }
@@ -131,6 +152,13 @@ func (c *Checker) Reload(ctx context.Context, cfg *config.Config) error {
 		c.emitForBackend(name, c.workers[name].backend.Address, c.workers[name].backend.Transitions[0], cfg.Frontends)
 	}
 
+	// Drop frontendStates entries for frontends no longer in config.
+	for feName := range c.frontendStates {
+		if _, ok := cfg.Frontends[feName]; !ok {
+			delete(c.frontendStates, feName)
+		}
+	}
+
 	c.cfg = cfg
 	return nil
 }
@@ -174,6 +202,18 @@ func (c *Checker) BackendState(name string) (health.State, bool) {
 	return w.backend.State, true
 }
 
+// FrontendState returns the current aggregate state of a frontend (unknown,
+// up, or down). Returns (FrontendStateUnknown, false) when the frontend is
+// not known to the checker.
+func (c *Checker) FrontendState(name string) (health.FrontendState, bool) {
+	c.mu.RLock()
+	defer c.mu.RUnlock()
+	if _, ok := c.cfg.Frontends[name]; !ok {
+		return health.FrontendStateUnknown, false
+	}
+	return c.frontendStates[name], true
+}
+
 // ListFrontends returns the names of all configured frontends.
 func (c *Checker) ListFrontends() []string {
 	c.mu.RLock()
@@ -575,24 +615,60 @@ func (c *Checker) runProbe(ctx context.Context, name string, pos, total int) {
 	}
 }
 
-// emitForBackend emits one Event per frontend that references backendName
-// (in any pool), using the provided frontends map. Must be called with c.mu held.
+// emitForBackend emits one backend-transition Event per frontend that
+// references backendName (in any pool), using the provided frontends map.
+// After emitting the backend event for a frontend, it also re-computes that
+// frontend's aggregate state and emits a frontend-transition Event if the
+// state has changed. Must be called with c.mu held.
 func (c *Checker) emitForBackend(backendName string, addr net.IP, t health.Transition, frontends map[string]config.Frontend) {
 	for feName, fe := range frontends {
-		emitted := false
-		for _, pool := range fe.Pools {
-			if emitted {
-				break
-			}
-			for name := range pool.Backends {
-				if name == backendName {
-					c.emit(Event{FrontendName: feName, BackendName: backendName, Backend: addr, Transition: t})
-					emitted = true
-					break
-				}
-			}
-		}
+		if !frontendReferencesBackend(fe, backendName) {
+			continue
+		}
+		c.emit(Event{FrontendName: feName, BackendName: backendName, Backend: addr, Transition: t})
+		c.updateFrontendState(feName, fe)
 	}
 }
 
+// frontendReferencesBackend reports whether fe has the named backend in any
+// of its pools.
+func frontendReferencesBackend(fe config.Frontend, backendName string) bool {
+	for _, pool := range fe.Pools {
+		if _, ok := pool.Backends[backendName]; ok {
+			return true
+		}
+	}
+	return false
+}
+
+// updateFrontendState recomputes the aggregate state of fe, compares against
+// the last known state, and emits a frontend-transition Event on change.
+// Must be called with c.mu held. The current state is read from the worker
+// map — so the caller (who already holds c.mu) sees a consistent view.
+func (c *Checker) updateFrontendState(feName string, fe config.Frontend) {
+	states := make(map[string]health.State)
+	for _, pool := range fe.Pools {
+		for bName := range pool.Backends {
+			if w, ok := c.workers[bName]; ok {
+				states[bName] = w.backend.State
+			} else {
+				states[bName] = health.StateUnknown
+			}
+		}
+	}
+	newState := health.ComputeFrontendState(fe, states)
+	old := c.frontendStates[feName] // zero value (Unknown) on first access
+	if old == newState {
+		return
+	}
+	c.frontendStates[feName] = newState
+	ft := health.FrontendTransition{From: old, To: newState, At: time.Now()}
+	slog.Info("frontend-transition",
+		"frontend", feName,
+		"from", old.String(),
+		"to", newState.String(),
+	)
+	c.emit(Event{FrontendName: feName, FrontendTransition: &ft})
+}
+
 // emit sends an event to the internal fan-out channel (non-blocking).