VPP LB counters, src-ip-sticky, and frontend state aggregation

New feature: per-VIP / per-backend runtime counters
  * New GetVPPLBCounters RPC serving an in-process snapshot refreshed
    by a 5s scrape loop (internal/vpp/lbstats.go). Each cycle pulls
    the LB plugin's four SimpleCounters (next, first, untracked,
    no-server) plus the FIB /net/route/to CombinedCounter for every
    VIP and every backend host prefix via a single DumpStats call.
  * FIB stats-index discovery via ip_route_lookup
    (internal/vpp/fibstats.go); per-worker reduction happens in the
    collector.
  * Prometheus collector exports vip_packets_total (kind label),
    vip_route_{packets,bytes}_total, and
    backend_route_{packets,bytes}_total. Metrics source interface
    extended with VIPStats / BackendRouteStats; vpp.Client publishes
    snapshots via atomic.Pointer and clears them on disconnect.
  * New 'show vpp lb counters' CLI command. The 'show vpp lbstate'
    and 'sync vpp lbstate' commands are restructured under 'show
    vpp lb {state,counters}' / 'sync vpp lb state' to make room
    for the new verb.
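
A minimal sketch of the two mechanisms named above: reducing per-worker
counter slots to one total, and publishing an immutable snapshot through
atomic.Pointer that is cleared on disconnect. The types and function names
here (CombinedCounter, Snapshot, snapshotStore, sumWorkers) are hypothetical
and are not taken from internal/vpp.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// CombinedCounter is a hypothetical packets/bytes pair.
type CombinedCounter struct {
	Packets uint64
	Bytes   uint64
}

// sumWorkers adds up slot idx across all workers. Workers whose slice is
// too short for idx contribute nothing.
func sumWorkers(perWorker [][]CombinedCounter, idx int) CombinedCounter {
	var total CombinedCounter
	for _, w := range perWorker {
		if idx < 0 || idx >= len(w) {
			continue
		}
		total.Packets += w[idx].Packets
		total.Bytes += w[idx].Bytes
	}
	return total
}

// Snapshot is a hypothetical immutable view handed to readers.
type Snapshot struct {
	VIPRoute map[string]CombinedCounter
}

// snapshotStore lets one scrape loop publish while many readers load
// without taking a lock.
type snapshotStore struct {
	cur atomic.Pointer[Snapshot]
}

func (s *snapshotStore) Publish(snap *Snapshot) { s.cur.Store(snap) }

// Clear drops the snapshot, e.g. when the dataplane connection is lost.
func (s *snapshotStore) Clear() { s.cur.Store(nil) }

// Load returns the current snapshot and whether one is available.
func (s *snapshotStore) Load() (*Snapshot, bool) {
	p := s.cur.Load()
	return p, p != nil
}

func main() {
	perWorker := [][]CombinedCounter{
		{{Packets: 1, Bytes: 100}},
		{{Packets: 3, Bytes: 300}},
	}
	var store snapshotStore
	store.Publish(&Snapshot{VIPRoute: map[string]CombinedCounter{
		"192.0.2.1/32": sumWorkers(perWorker, 0),
	}})
	if snap, ok := store.Load(); ok {
		fmt.Println(snap.VIPRoute["192.0.2.1/32"])
	}
	store.Clear()
}
```

Readers never see a half-written snapshot because each cycle builds a fresh
value and swaps the pointer in one step.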

New feature: src-ip-sticky frontends
  * New frontend YAML key 'src-ip-sticky' (bool). Plumbed through
    config.Frontend, desiredVIP, and the lb_add_del_vip_v2 call.
  * Reflected in gRPC FrontendInfo.src_ip_sticky and
    VPPLBVIP.src_ip_sticky, and shown in 'show vpp lb state' output.
  * Scraped back from VPP by parsing 'show lb vips verbose' through
    cli_inband — lb_vip_details does not expose the flag. The same
    scrape also recovers the LB pool index for each VIP, which the
    stats-segment counters are keyed on. This is a documented
    temporary workaround until VPP ships an lb_vip_v2_dump.
  * src_ip_sticky cannot be mutated on a live VIP, so a flipped flag
    triggers a tear-down-and-recreate in reconcileVIP (ASes deleted
    with flush, VIP deleted, then re-added). Flip is logged.
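
The recreate-on-flip ordering can be sketched as a pure planning function.
This is an illustration only: the vip type, planVIP, and the step strings are
hypothetical, and the real reconcileVIP issues VPP API calls rather than
returning strings.

```go
package main

import "fmt"

// vip is a hypothetical view of a VIP as seen in VPP or in the config.
type vip struct {
	Prefix      string
	SrcIPSticky bool
	Backends    []string
}

// planVIP returns the ordered steps needed for the sticky flag only.
// Weight and membership reconciliation is deliberately left out.
func planVIP(current *vip, desired vip) []string {
	if current == nil {
		return []string{"add-vip " + desired.Prefix}
	}
	if current.SrcIPSticky == desired.SrcIPSticky {
		return nil // flag unchanged, no recreate needed
	}
	// The flag cannot change on a live VIP: remove every application
	// server with flush, delete the VIP, then add it back.
	var steps []string
	for _, as := range current.Backends {
		steps = append(steps, "del-as "+as+" flush")
	}
	steps = append(steps, "del-vip "+current.Prefix, "add-vip "+desired.Prefix)
	return steps
}

func main() {
	cur := &vip{Prefix: "192.0.2.1/32", Backends: []string{"10.0.0.1", "10.0.0.2"}}
	want := vip{Prefix: "192.0.2.1/32", SrcIPSticky: true}
	for _, step := range planVIP(cur, want) {
		fmt.Println(step)
	}
}
```

In this sketch the application servers are assumed to be re-attached by the
normal membership sync that follows the re-add.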

New feature: frontend state aggregation and events
  * New health.FrontendState (unknown/up/down) and FrontendTransition
    types. A frontend is 'up' iff at least one backend has a nonzero
    effective weight, 'unknown' iff no backend has real state yet,
    and 'down' otherwise.
  * Checker tracks per-frontend aggregate state, recomputing after
    each backend transition and emitting a frontend-transition Event
    on change. Reload drops entries for removed frontends.
  * checker.Event gains an optional FrontendTransition pointer;
    backend- vs. frontend-transition events are demultiplexed on
    that field.
  * WatchEvents now sends an initial snapshot of frontend state on
    connect (mirroring the existing backend snapshot), subscribes
    once to the checker stream, and fans out to backend/frontend
    handlers based on the client's filter flags. The proto
    FrontendEvent message grows name + transition fields.
  * New Checker.FrontendState accessor.
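
The aggregation rule and the emit-on-change behaviour described above can be
sketched as follows. All names are hypothetical stand-ins for the health and
checker types. The sketch assumes that a frontend with no backends counts as
'unknown' and that a backend without real state never carries a nonzero
effective weight.

```go
package main

import "fmt"

type frontendState int

const (
	stateUnknown frontendState = iota
	stateUp
	stateDown
)

func (s frontendState) String() string {
	switch s {
	case stateUp:
		return "up"
	case stateDown:
		return "down"
	default:
		return "unknown"
	}
}

// backendView is a hypothetical per-backend input to the aggregation.
type backendView struct {
	HasState        bool // backend has been probed at least once
	EffectiveWeight int  // weight after pool failover
}

// computeFrontendState: up if any effective weight is nonzero, unknown
// if no backend has real state yet, down otherwise.
func computeFrontendState(backends []backendView) frontendState {
	known := false
	for _, b := range backends {
		if b.EffectiveWeight > 0 {
			return stateUp
		}
		known = known || b.HasState
	}
	if !known {
		return stateUnknown
	}
	return stateDown
}

type transition struct {
	Frontend string
	From, To frontendState
}

// tracker remembers the last aggregate state per frontend.
type tracker struct {
	states map[string]frontendState
}

// update recomputes the state and returns a transition only on change.
func (t *tracker) update(name string, backends []backendView) *transition {
	if t.states == nil {
		t.states = make(map[string]frontendState)
	}
	next := computeFrontendState(backends)
	prev := t.states[name] // a missing entry reads as stateUnknown
	if prev == next {
		return nil
	}
	t.states[name] = next
	return &transition{Frontend: name, From: prev, To: next}
}

func main() {
	var tk tracker
	up := []backendView{{HasState: true, EffectiveWeight: 100}}
	if tr := tk.update("web", up); tr != nil {
		fmt.Printf("%s: %s -> %s\n", tr.Frontend, tr.From, tr.To)
	}
}
```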

Refactor: pure health helpers
  * Moved the priority-failover selector and the (pool idx, active
    pool, state, cfg weight) → (vpp weight, flush) mapping out of
    internal/vpp/lbsync.go into a new internal/health/weights.go so
    the checker can reuse them for frontend-state computation
    without importing internal/vpp.
  * New functions: health.ActivePoolIndex, BackendEffectiveWeight,
    EffectiveWeights, ComputeFrontendState. lbsync.go now calls
    these directly; vpp.EffectiveWeights is a thin wrapper over
    health.EffectiveWeights retained for the gRPC observability
    path. Fully unit-tested in internal/health/weights_test.go.
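
To illustrate the shape of these pure helpers, here is a sketch of a
priority-failover selector and a per-backend weight mapping. The semantics
are assumptions made for the example, not the documented behaviour of
health.ActivePoolIndex or health.BackendEffectiveWeight: the active pool is
taken to be the first pool with an up backend of nonzero weight, only up
backends in that pool keep their configured weight, and flush is requested
only for backends that are known to be down.

```go
package main

import "fmt"

type backendState int

const (
	backendUnknown backendState = iota
	backendUp
	backendDown
)

// backend is a hypothetical config-plus-health record.
type backend struct {
	Name   string
	State  backendState
	Weight int
}

// activePoolIndex returns the index of the first pool (in priority
// order) that can carry traffic, or -1 if none can.
func activePoolIndex(pools [][]backend) int {
	for i, pool := range pools {
		for _, b := range pool {
			if b.State == backendUp && b.Weight > 0 {
				return i
			}
		}
	}
	return -1
}

// backendEffectiveWeight maps (pool idx, active pool, state, cfg weight)
// to (dataplane weight, flush) under the assumptions stated above.
func backendEffectiveWeight(poolIdx, activePool int, state backendState, cfgWeight int) (weight int, flush bool) {
	if state == backendDown {
		return 0, true // assumed: drop existing flows to a dead backend
	}
	if state == backendUp && poolIdx == activePool {
		return cfgWeight, false
	}
	return 0, false // standby or unknown: no new flows, no flush
}

func main() {
	pools := [][]backend{
		{{Name: "a", State: backendDown, Weight: 100}},
		{{Name: "b", State: backendUp, Weight: 50}},
	}
	active := activePoolIndex(pools)
	for i, pool := range pools {
		for _, b := range pool {
			w, flush := backendEffectiveWeight(i, active, b.State, b.Weight)
			fmt.Printf("%s pool=%d weight=%d flush=%v\n", b.Name, i, w, flush)
		}
	}
}
```

Keeping both functions free of dataplane types is what lets a health checker
and a dataplane sync share them without an import cycle.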

maglevc polish
  * --color default is now mode-aware: on in the interactive shell,
    off in one-shot mode so piped output is script-safe. Explicit
    --color=true/false still overrides.
  * New stripHostMask helper drops /32 and /128 from VIP display;
    non-host prefixes pass through unchanged.
  * Counter table column order fixed (first before next) and
    packets/bytes columns renamed to fib-packets/fib-bytes to
    clarify they come from the FIB, not the LB plugin.
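
A sketch of how the first two items could look. The function bodies are
illustrative assumptions rather than the maglevc source, and resolveColor is
a hypothetical name. Parsing the prefix instead of trimming a string suffix
keeps an IPv6 /32 such as 2001:db8::/32 from being mistaken for a host route.

```go
package main

import (
	"fmt"
	"net/netip"
)

// stripHostMask drops the mask from host prefixes (/32 for IPv4, /128
// for IPv6). Anything else, including unparsable input, is returned
// unchanged.
func stripHostMask(s string) string {
	p, err := netip.ParsePrefix(s)
	if err != nil {
		return s
	}
	if p.Bits() == p.Addr().BitLen() {
		return p.Addr().String()
	}
	return s
}

// resolveColor picks the color setting. explicit is nil when the user
// did not pass --color; otherwise the explicit value wins.
func resolveColor(explicit *bool, interactive bool) bool {
	if explicit != nil {
		return *explicit
	}
	return interactive // on in the shell, off in one-shot mode
}

func main() {
	for _, s := range []string{"192.0.2.1/32", "2001:db8::1/128", "2001:db8::/32"} {
		fmt.Println(stripHostMask(s))
	}
	fmt.Println(resolveColor(nil, true), resolveColor(nil, false))
}
```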

Docs
  * config-guide: document src-ip-sticky, including the VIP
    recreate-on-change caveat.
  * user-guide, maglevc.1, maglevd.8: updated command tree, new
    counters command, color defaults, and the src-ip-sticky field.
2026-04-12 15:59:02 +02:00
parent d5fbf5c640
commit fb62532fd5
25 changed files with 2163 additions and 549 deletions


@@ -100,11 +100,12 @@ maglevc [--server host:port] [--color[=bool]] [command...]
 | Flag | Default | Description |
 |---|---|---|
 | `--server` | `localhost:9090` | Address of the `maglevd` gRPC server. |
-| `--color` | `true` | Colorize static field labels in output (dark blue ANSI). Pass `--color=false` to disable, e.g. when piping. |
+| `--color` | mode-aware | Colorize static field labels (dark blue ANSI). Defaults to `true` in the interactive shell and `false` in one-shot mode, so output piped into scripts stays free of escape codes. Pass `--color=true` or `--color=false` explicitly to override either default. |
 When `command` arguments are supplied the command is executed and `maglevc`
-exits. When no arguments are given an interactive shell is started and the
-build version is printed on entry.
+exits; in this mode ANSI color is off by default so the output is script-safe.
+When no arguments are given an interactive shell is started, the build version
+is printed on entry, and color is on by default.
 ### Commands
@@ -112,9 +113,9 @@ build version is printed on entry.
 show version Print build version, commit hash, and build date.
 show frontends [<name>] Without name: list all frontend names.
-With name: show address, protocol, port, description,
-and pools. Each pool lists its backends with two
-weight columns:
+With name: show address, protocol, port, src-ip-sticky,
+description, and pools. Each pool lists its backends
+with two weight columns:
 weight — configured weight from the YAML
 effective — state-aware weight after pool failover
 (what gets programmed into VPP)
@@ -131,12 +132,20 @@ show healthchecks [<name>] Without name: list all health-check names.
 show vpp info Show VPP version, build date, PID, uptime, and when
 maglevd connected. Returns an error if VPP is not
 connected.
-show vpp lbstate Show the VPP load-balancer plugin state: global
+show vpp lb state Show the VPP load-balancer plugin state: global
 configuration, configured VIPs, and their attached
 application servers (address, weight, bucket count).
 Returns an error if VPP is not connected.
+show vpp lb counters Show per-VIP and per-backend packet/byte counters
+from the VPP stats segment, refreshed roughly every
+five seconds by maglevd. Each VIP row reports the LB
+plugin counters (next, first, untracked, no-server)
+and the FIB packets/bytes at the VIP's host prefix.
+Each backend row reports FIB packets/bytes at the
+backend's /32 or /128 prefix. Use Prometheus for
+live rates; this command shows absolute values.
-sync vpp lbstate [<name>] Reconcile the VPP load-balancer dataplane from the
+sync vpp lb state [<name>] Reconcile the VPP load-balancer dataplane from the
 running config. Without a name: runs a full sync —
 creates missing VIPs, removes stale VIPs, and adjusts
 application-server membership and weights across all