VPP LB counters, src-ip-sticky, and frontend state aggregation
New feature: per-VIP / per-backend runtime counters
* New GetVPPLBCounters RPC serving an in-process snapshot refreshed
by a 5s scrape loop (internal/vpp/lbstats.go). Each cycle pulls
the LB plugin's four SimpleCounters (next, first, untracked,
no-server) plus the FIB /net/route/to CombinedCounter for every
VIP and every backend host prefix via a single DumpStats call.
* FIB stats-index discovery via ip_route_lookup (internal/vpp/
fibstats.go); per-worker reduction happens in the collector.
* Prometheus collector exports vip_packets_total (kind label),
vip_route_{packets,bytes}_total, and backend_route_{packets,
bytes}_total. Metrics source interface extended with VIPStats /
BackendRouteStats; vpp.Client publishes snapshots via
atomic.Pointer and clears them on disconnect (sketched after
this list).
* New 'show vpp lb counters' CLI command. The 'show vpp lbstate'
and 'sync vpp lbstate' commands are restructured under 'show
vpp lb {state,counters}' / 'sync vpp lb state' to make room
for the new verb.
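The snapshot-publication pattern, as a minimal Go sketch (Snapshot,
Client.stats, and runStatsLoop are illustrative names, not the real
identifiers in internal/vpp/lbstats.go):

    package vpp // sketch only

    import (
        "context"
        "sync/atomic"
        "time"
    )

    // Snapshot stands in for the per-VIP / per-backend counter snapshot.
    type Snapshot struct {
        TakenAt time.Time
        // ... per-VIP and per-backend counters ...
    }

    type Client struct {
        stats atomic.Pointer[Snapshot] // nil until the first scrape succeeds
    }

    // runStatsLoop refreshes the snapshot every 5s until ctx is cancelled.
    func (c *Client) runStatsLoop(ctx context.Context, scrape func() (*Snapshot, error)) {
        t := time.NewTicker(5 * time.Second)
        defer t.Stop()
        for {
            select {
            case <-ctx.Done():
                c.stats.Store(nil) // clear the snapshot on disconnect
                return
            case <-t.C:
                if snap, err := scrape(); err == nil {
                    c.stats.Store(snap) // readers always see a complete snapshot
                }
            }
        }
    }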
New feature: src-ip-sticky frontends
* New frontend YAML key 'src-ip-sticky' (bool). Plumbed through
config.Frontend, desiredVIP, and the lb_add_del_vip_v2 call
(example after this list).
* Reflected in gRPC FrontendInfo.src_ip_sticky and VPPLBVIP.
src_ip_sticky, and shown in 'show vpp lb state' output.
* Scraped back from VPP by parsing 'show lb vips verbose' through
cli_inband — lb_vip_details does not expose the flag. The same
scrape also recovers the LB pool index for each VIP, which the
stats-segment counters are keyed on. This is a documented
temporary workaround until VPP ships an lb_vip_v2_dump.
* src_ip_sticky cannot be mutated on a live VIP, so a flipped flag
triggers a tear-down-and-recreate in reconcileVIP (ASes deleted
with flush, VIP deleted, then re-added). Flip is logged.
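For illustration, a frontend with the flag set might look like this
in YAML (the keys around src-ip-sticky are an assumed shape, not the
exact schema):

    frontends:
      web:
        vip: 192.0.2.10/32
        src-ip-sticky: true   # flipping this later tears down and recreates the VIP
        pools:
          - backends:
              be1: {}         # pool/backend schema assumed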
New feature: frontend state aggregation and events
* New health.FrontendState (unknown/up/down) and FrontendTransition
types. A frontend is 'up' iff at least one backend has a nonzero
effective weight, 'unknown' iff no backend has real state yet,
and 'down' otherwise (sketched after this list).
* Checker tracks per-frontend aggregate state, recomputing after
each backend transition and emitting a frontend-transition Event
on change. Reload drops entries for removed frontends.
* checker.Event gains an optional FrontendTransition pointer;
backend- vs. frontend-transition events are demultiplexed on
that field.
* WatchEvents now sends an initial snapshot of frontend state on
connect (mirroring the existing backend snapshot), subscribes
once to the checker stream, and fans out to backend/frontend
handlers based on the client's filter flags. The proto
FrontendEvent message grows name + transition fields.
* New Checker.FrontendState accessor.
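The aggregation rule as a Go sketch (package health; the
EffectiveWeights signature is an assumption, cf. the refactor section
below):

    // ComputeFrontendState: 'up' iff any backend has a nonzero effective
    // weight, 'unknown' iff no backend has real state yet, 'down' otherwise.
    func ComputeFrontendState(fe config.Frontend, states map[string]State) FrontendState {
        anyKnown := false
        for _, st := range states {
            if st != StateUnknown {
                anyKnown = true
                break
            }
        }
        if !anyKnown {
            return FrontendStateUnknown
        }
        for _, w := range EffectiveWeights(fe, states) {
            if w > 0 {
                return FrontendStateUp
            }
        }
        return FrontendStateDown
    }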
Refactor: pure health helpers
* Moved the priority-failover selector and the (pool idx, active
pool, state, cfg weight) → (vpp weight, flush) mapping out of
internal/vpp/lbsync.go into a new internal/health/weights.go so
the checker can reuse them for frontend-state computation
without importing internal/vpp.
* New functions: health.ActivePoolIndex, BackendEffectiveWeight,
EffectiveWeights, ComputeFrontendState. lbsync.go now calls
these directly; vpp.EffectiveWeights is a thin wrapper over
health.EffectiveWeights retained for the gRPC observability
path. Fully unit-tested in internal/health/weights_test.go.
A sketch of the helpers follows this list.
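A sketch of the moved helpers (signatures and the exact flush rule are
assumptions; see internal/health/weights.go for the real versions):

    // ActivePoolIndex returns the first pool, in priority order, with at
    // least one healthy backend. Falling back to pool 0 is an assumption.
    func ActivePoolIndex(fe config.Frontend, states map[string]State) int {
        for i, pool := range fe.Pools {
            for name := range pool.Backends {
                if states[name] == StateUp {
                    return i
                }
            }
        }
        return 0
    }

    // BackendEffectiveWeight maps (pool idx, active pool, state, cfg weight)
    // to (vpp weight, flush). Flushing on 'down' is an assumed detail: it
    // drops sticky flows so they re-hash away from the dead backend.
    func BackendEffectiveWeight(poolIdx, active int, st State, cfgWeight uint32) (uint32, bool) {
        if poolIdx == active && st == StateUp {
            return cfgWeight, false
        }
        return 0, st == StateDown
    }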
maglevc polish
* --color default is now mode-aware: on in the interactive shell,
off in one-shot mode so piped output is script-safe. Explicit
--color=true/false still overrides.
* New stripHostMask helper drops /32 and /128 from VIP display;
non-host prefixes pass through unchanged (sketched after this list).
* Counter table column order fixed (first before next) and
packets/bytes columns renamed to fib-packets/fib-bytes to
clarify they come from the FIB, not the LB plugin.
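The mask-stripping behavior, sketched with net/netip (the real
helper's signature may differ):

    import "net/netip"

    // stripHostMask drops a /32 or /128 host mask for display; non-host
    // prefixes and non-prefix strings pass through unchanged.
    func stripHostMask(s string) string {
        p, err := netip.ParsePrefix(s)
        if err != nil {
            return s
        }
        if p.IsSingleIP() { // /32 for IPv4, /128 for IPv6
            return p.Addr().String()
        }
        return s
    }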
Docs
* config-guide: document src-ip-sticky, including the VIP
recreate-on-change caveat.
* user-guide, maglevc.1, maglevd.8: updated command tree, new
counters command, color defaults, and the src-ip-sticky field.
@@ -23,13 +23,26 @@ type BackendSnapshot struct {
 	Config config.Backend
 }
 
-// Event is emitted on every backend state transition, once per frontend that
-// references the backend.
+// Event is emitted on every state transition the checker observes. There are
+// two kinds, distinguished by which of BackendName or FrontendTransition is
+// populated:
+//
+// - Backend transition: FrontendName is the frontend that references the
+//   backend (one event per frontend per backend transition), BackendName
+//   and Backend are set, and Transition carries the health.Transition.
+//   FrontendTransition is nil.
+// - Frontend transition: FrontendName is the frontend whose aggregate state
+//   changed, FrontendTransition is non-nil. BackendName and Backend are
+//   empty, Transition is the zero value.
+//
+// Consumers dispatch on FrontendTransition != nil.
 type Event struct {
 	FrontendName string
 	BackendName  string
 	Backend      net.IP
 	Transition   health.Transition
+
+	FrontendTransition *health.FrontendTransition
 }
 
 type worker struct {
@@ -49,6 +62,13 @@ type Checker struct {
 	mu      sync.RWMutex
 	workers map[string]*worker // keyed by backend name
 
+	// frontendStates tracks the aggregated state of every configured frontend
+	// (unknown/up/down). Updated whenever a backend transition happens; a
+	// change emits a frontend-transition Event. The zero value for a missing
+	// key is FrontendStateUnknown, so initial-reference accesses behave
+	// correctly even without explicit seeding.
+	frontendStates map[string]health.FrontendState
+
 	subsMu sync.Mutex
 	nextID int
 	subs   map[int]chan Event
@@ -58,10 +78,11 @@ type Checker struct {
 // New creates a Checker. Call Run to start probing.
 func New(cfg *config.Config) *Checker {
 	return &Checker{
-		cfg:     cfg,
-		workers: make(map[string]*worker),
-		subs:    make(map[int]chan Event),
-		eventCh: make(chan Event, 256),
+		cfg:            cfg,
+		workers:        make(map[string]*worker),
+		frontendStates: make(map[string]health.FrontendState),
+		subs:           make(map[int]chan Event),
+		eventCh:        make(chan Event, 256),
 	}
 }
 
@@ -131,6 +152,13 @@ func (c *Checker) Reload(ctx context.Context, cfg *config.Config) error {
 		c.emitForBackend(name, c.workers[name].backend.Address, c.workers[name].backend.Transitions[0], cfg.Frontends)
 	}
 
+	// Drop frontendStates entries for frontends no longer in config.
+	for feName := range c.frontendStates {
+		if _, ok := cfg.Frontends[feName]; !ok {
+			delete(c.frontendStates, feName)
+		}
+	}
+
 	c.cfg = cfg
 	return nil
 }
@@ -174,6 +202,18 @@ func (c *Checker) BackendState(name string) (health.State, bool) {
 	return w.backend.State, true
 }
 
+// FrontendState returns the current aggregate state of a frontend (unknown,
+// up, or down). Returns (FrontendStateUnknown, false) when the frontend is
+// not known to the checker.
+func (c *Checker) FrontendState(name string) (health.FrontendState, bool) {
+	c.mu.RLock()
+	defer c.mu.RUnlock()
+	if _, ok := c.cfg.Frontends[name]; !ok {
+		return health.FrontendStateUnknown, false
+	}
+	return c.frontendStates[name], true
+}
+
 // ListFrontends returns the names of all configured frontends.
 func (c *Checker) ListFrontends() []string {
 	c.mu.RLock()
@@ -575,24 +615,60 @@ func (c *Checker) runProbe(ctx context.Context, name string, pos, total int) {
 	}
 }
 
-// emitForBackend emits one Event per frontend that references backendName
-// (in any pool), using the provided frontends map. Must be called with c.mu held.
+// emitForBackend emits one backend-transition Event per frontend that
+// references backendName (in any pool), using the provided frontends map.
+// After emitting the backend event for a frontend, it also re-computes that
+// frontend's aggregate state and emits a frontend-transition Event if the
+// state has changed. Must be called with c.mu held.
 func (c *Checker) emitForBackend(backendName string, addr net.IP, t health.Transition, frontends map[string]config.Frontend) {
 	for feName, fe := range frontends {
-		emitted := false
-		for _, pool := range fe.Pools {
-			if emitted {
-				break
-			}
-			for name := range pool.Backends {
-				if name == backendName {
-					c.emit(Event{FrontendName: feName, BackendName: backendName, Backend: addr, Transition: t})
-					emitted = true
-					break
-				}
-			}
-		}
+		if !frontendReferencesBackend(fe, backendName) {
+			continue
+		}
+		c.emit(Event{FrontendName: feName, BackendName: backendName, Backend: addr, Transition: t})
+		c.updateFrontendState(feName, fe)
 	}
 }
 
+// frontendReferencesBackend reports whether fe has the named backend in any
+// of its pools.
+func frontendReferencesBackend(fe config.Frontend, backendName string) bool {
+	for _, pool := range fe.Pools {
+		if _, ok := pool.Backends[backendName]; ok {
+			return true
+		}
+	}
+	return false
+}
+
+// updateFrontendState recomputes the aggregate state of fe, compares against
+// the last known state, and emits a frontend-transition Event on change.
+// Must be called with c.mu held. The current state is read from the worker
+// map — so the caller (who already holds c.mu) sees a consistent view.
+func (c *Checker) updateFrontendState(feName string, fe config.Frontend) {
+	states := make(map[string]health.State)
+	for _, pool := range fe.Pools {
+		for bName := range pool.Backends {
+			if w, ok := c.workers[bName]; ok {
+				states[bName] = w.backend.State
+			} else {
+				states[bName] = health.StateUnknown
+			}
+		}
+	}
+	newState := health.ComputeFrontendState(fe, states)
+	old := c.frontendStates[feName] // zero value (Unknown) on first access
+	if old == newState {
+		return
+	}
+	c.frontendStates[feName] = newState
+	ft := health.FrontendTransition{From: old, To: newState, At: time.Now()}
+	slog.Info("frontend-transition",
+		"frontend", feName,
+		"from", old.String(),
+		"to", newState.String(),
+	)
+	c.emit(Event{FrontendName: feName, FrontendTransition: &ft})
}
+
 // emit sends an event to the internal fan-out channel (non-blocking).