VPP LB counters, src-ip-sticky, and frontend state aggregation

New feature: per-VIP / per-backend runtime counters
  * New GetVPPLBCounters RPC serving an in-process snapshot refreshed
    by a 5s scrape loop (internal/vpp/lbstats.go). Each cycle pulls
    the LB plugin's four SimpleCounters (next, first, untracked,
    no-server) plus the FIB /net/route/to CombinedCounter for every
    VIP and every backend host prefix via a single DumpStats call.
  * FIB stats-index discovery via ip_route_lookup (internal/vpp/
    fibstats.go); per-worker reduction happens in the collector.
  * Prometheus collector exports vip_packets_total (kind label),
    vip_route_{packets,bytes}_total, and backend_route_{packets,
    bytes}_total. Metrics source interface extended with VIPStats /
    BackendRouteStats; vpp.Client publishes snapshots via
    atomic.Pointer and clears them on disconnect.
  * New 'show vpp lb counters' CLI command. The 'show vpp lbstate'
    and 'sync vpp lbstate' commands move under 'show vpp lb
    {state,counters}' and 'sync vpp lb state' to make room for the
    new subcommand.
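
The per-worker reduction and the atomic.Pointer snapshot described above
can be sketched as follows. This is a minimal illustration, not the code
in internal/vpp/lbstats.go: the [worker][index] counter layout is an
assumption about the stats segment, and the names simpleCounter,
combinedCounter, snapshot, publish and clearSnapshot are hypothetical.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// simpleCounter is an assumed layout: one row per worker thread,
// one column per object (VIP pool) index.
type simpleCounter [][]uint64

// combined is a packets/bytes pair, as carried by a combined counter.
type combined struct{ Packets, Bytes uint64 }

// combinedCounter uses the same assumed [worker][index] layout.
type combinedCounter [][]combined

// sumSimple adds the value at idx across all workers. Workers whose
// row is too short for idx contribute nothing.
func sumSimple(c simpleCounter, idx int) uint64 {
	var total uint64
	for _, worker := range c {
		if idx >= 0 && idx < len(worker) {
			total += worker[idx]
		}
	}
	return total
}

// sumCombined adds packets and bytes at idx across all workers.
func sumCombined(c combinedCounter, idx int) combined {
	var total combined
	for _, worker := range c {
		if idx >= 0 && idx < len(worker) {
			total.Packets += worker[idx].Packets
			total.Bytes += worker[idx].Bytes
		}
	}
	return total
}

// snapshot is a hypothetical immutable result of one scrape cycle.
type snapshot struct {
	FirstPackets uint64
	Route        combined
}

// current holds the latest snapshot; nil means "no data".
var current atomic.Pointer[snapshot]

// publish makes a finished snapshot visible to readers.
func publish(s *snapshot) { current.Store(s) }

// clearSnapshot drops the snapshot, e.g. when VPP disconnects.
func clearSnapshot() { current.Store(nil) }

func main() {
	first := simpleCounter{{1, 2}, {10, 20}} // two workers, two VIPs
	route := combinedCounter{{{3, 300}}, {{4, 400}}}

	publish(&snapshot{
		FirstPackets: sumSimple(first, 1),
		Route:        sumCombined(route, 0),
	})
	if s := current.Load(); s != nil {
		fmt.Println(s.FirstPackets, s.Route.Packets, s.Route.Bytes)
	}
	clearSnapshot()
	fmt.Println(current.Load() == nil)
}
```

Readers only ever see a complete snapshot or nil, so the RPC handler and
the Prometheus collector need no lock around the scrape results.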

New feature: src-ip-sticky frontends
  * New frontend YAML key 'src-ip-sticky' (bool). Plumbed through
    config.Frontend, desiredVIP, and the lb_add_del_vip_v2 call.
  * Reflected in gRPC FrontendInfo.src_ip_sticky and VPPLBVIP.
    src_ip_sticky, and shown in 'show vpp lb state' output.
  * Scraped back from VPP by parsing 'show lb vips verbose' through
    cli_inband, because lb_vip_details does not expose the flag. The
    same scrape also recovers the LB pool index for each VIP, which
    the stats-segment counters are keyed on. This is a documented
    temporary workaround until VPP ships an lb_vip_v2_dump.
  * src_ip_sticky cannot be mutated on a live VIP, so a flipped flag
    triggers a tear-down-and-recreate in reconcileVIP (ASes deleted
    with flush, VIP deleted, then re-added). Flip is logged.
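
The tear-down-and-recreate order above can be sketched as a pure
planning function. This is an illustration only: vipState,
planStickyChange and the step strings are hypothetical and are not the
reconcileVIP implementation or real VPP API calls.

```go
package main

import "fmt"

// vipState is a hypothetical, simplified view of one VIP.
type vipState struct {
	Prefix      string
	SrcIPSticky bool
	Backends    []string // application server addresses
}

// planStickyChange returns the ordered steps needed when the
// src-ip-sticky flag differs between the live VIP and the desired
// one. It returns nil when the flag is unchanged, because then the
// VIP can stay in place.
func planStickyChange(current, desired vipState) []string {
	if current.SrcIPSticky == desired.SrcIPSticky {
		return nil
	}
	steps := make([]string, 0, len(current.Backends)+2)
	// 1. Remove every application server, flushing its flows.
	for _, as := range current.Backends {
		steps = append(steps, "del-as "+as+" flush")
	}
	// 2. Remove the VIP itself.
	steps = append(steps, "del-vip "+current.Prefix)
	// 3. Re-add the VIP with the new flag value.
	steps = append(steps,
		fmt.Sprintf("add-vip %s src-ip-sticky=%t", desired.Prefix, desired.SrcIPSticky))
	return steps
}

func main() {
	live := vipState{Prefix: "192.0.2.1/32", Backends: []string{"10.0.0.1", "10.0.0.2"}}
	want := live
	want.SrcIPSticky = true
	for _, s := range planStickyChange(live, want) {
		fmt.Println(s)
	}
}
```

Keeping the plan separate from its execution makes the ordering easy to
unit-test without a running dataplane.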

New feature: frontend state aggregation and events
  * New health.FrontendState (unknown/up/down) and FrontendTransition
    types. A frontend is 'up' iff at least one backend has a nonzero
    effective weight, 'unknown' iff no backend has real state yet,
    and 'down' otherwise.
  * Checker tracks per-frontend aggregate state, recomputing after
    each backend transition and emitting a frontend-transition Event
    on change. Reload drops entries for removed frontends.
  * checker.Event gains an optional FrontendTransition pointer;
    backend- vs. frontend-transition events are demultiplexed on
    that field.
  * WatchEvents now sends an initial snapshot of frontend state on
    connect (mirroring the existing backend snapshot), subscribes
    once to the checker stream, and fans out to backend/frontend
    handlers based on the client's filter flags. The proto
    FrontendEvent message grows name + transition fields.
  * New Checker.FrontendState accessor.
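
The up/unknown/down rule and the emit-on-change behaviour above can be
sketched as follows. This is an illustration under assumptions: the
backendView input, the tracker type and the evaluation order (up is
checked first, then unknown) are hypothetical, and the real
health.ComputeFrontendState signature may differ.

```go
package main

import "fmt"

// FrontendState mirrors the three states named in the commit message.
type FrontendState int

const (
	StateUnknown FrontendState = iota
	StateUp
	StateDown
)

func (s FrontendState) String() string {
	return [...]string{"unknown", "up", "down"}[s]
}

// backendView is a hypothetical per-backend input.
type backendView struct {
	HasState        bool // backend has reported real health state
	EffectiveWeight int  // weight after failover logic is applied
}

// computeFrontendState: up if any backend has nonzero effective
// weight, unknown if no backend has real state yet, down otherwise.
func computeFrontendState(bs []backendView) FrontendState {
	known := false
	for _, b := range bs {
		if b.EffectiveWeight > 0 {
			return StateUp
		}
		known = known || b.HasState
	}
	if !known {
		return StateUnknown
	}
	return StateDown
}

// FrontendTransition and Event are simplified stand-ins; a non-nil
// Frontend pointer marks a frontend-transition event.
type FrontendTransition struct {
	Name     string
	From, To FrontendState
}

type Event struct {
	Frontend *FrontendTransition
}

// tracker remembers the last aggregate state per frontend.
type tracker struct{ last map[string]FrontendState }

func newTracker() *tracker { return &tracker{last: map[string]FrontendState{}} }

// update recomputes the state and returns an Event only on change.
func (t *tracker) update(name string, bs []backendView) *Event {
	next := computeFrontendState(bs)
	prev := t.last[name] // zero value is StateUnknown
	if prev == next {
		return nil
	}
	t.last[name] = next
	return &Event{Frontend: &FrontendTransition{Name: name, From: prev, To: next}}
}

func main() {
	t := newTracker()
	if ev := t.update("web", []backendView{{HasState: true, EffectiveWeight: 100}}); ev != nil {
		fmt.Printf("%s: %s -> %s\n", ev.Frontend.Name, ev.Frontend.From, ev.Frontend.To)
	}
	if ev := t.update("web", []backendView{{HasState: true}}); ev != nil {
		fmt.Printf("%s: %s -> %s\n", ev.Frontend.Name, ev.Frontend.From, ev.Frontend.To)
	}
}
```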

Refactor: pure health helpers
  * Moved the priority-failover selector and the (pool idx, active
    pool, state, cfg weight) → (vpp weight, flush) mapping out of
    internal/vpp/lbsync.go into a new internal/health/weights.go so
    the checker can reuse them for frontend-state computation
    without importing internal/vpp.
  * New functions: health.ActivePoolIndex, BackendEffectiveWeight,
    EffectiveWeights, ComputeFrontendState. lbsync.go now calls
    these directly; vpp.EffectiveWeights is a thin wrapper over
    health.EffectiveWeights retained for the gRPC observability
    path. Fully unit-tested in internal/health/weights_test.go.
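
A priority-failover selector and weight mapping of the kind described
above might look like the sketch below. The rules are assumptions made
for illustration, not the contents of internal/health/weights.go: the
active pool is taken to be the lowest-index pool with at least one up
backend, a standby backend gets weight 0 without a flush, and a down
backend gets weight 0 with a flush. The type and function names are
hypothetical.

```go
package main

import "fmt"

// backend is a hypothetical input: pool index, health, and the
// weight from the configuration.
type backend struct {
	Pool   int
	Up     bool
	Weight int
}

// activePoolIndex returns the lowest pool index that has at least
// one up backend, or -1 when nothing is up (assumed rule).
func activePoolIndex(bs []backend) int {
	active := -1
	for _, b := range bs {
		if b.Up && (active == -1 || b.Pool < active) {
			active = b.Pool
		}
	}
	return active
}

// effectiveWeight maps (pool, active pool, state, cfg weight) to
// (dataplane weight, flush). Assumed rules:
//   - down backend:    weight 0, flush existing flows
//   - standby backend: weight 0, keep existing flows
//   - active and up:   configured weight, no flush
func effectiveWeight(b backend, active int) (weight int, flush bool) {
	if !b.Up {
		return 0, true
	}
	if b.Pool != active {
		return 0, false
	}
	return b.Weight, false
}

func main() {
	bs := []backend{
		{Pool: 0, Up: false, Weight: 100},
		{Pool: 1, Up: true, Weight: 50},
	}
	active := activePoolIndex(bs)
	for _, b := range bs {
		w, flush := effectiveWeight(b, active)
		fmt.Printf("pool=%d up=%t -> weight=%d flush=%t\n", b.Pool, b.Up, w, flush)
	}
}
```

Because both functions are pure, the same helpers can serve the
dataplane sync and a frontend-state computation without either side
importing the other's package.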

maglevc polish
  * --color default is now mode-aware: on in the interactive shell,
    off in one-shot mode so piped output is script-safe. Explicit
    --color=true/false still overrides.
  * New stripHostMask helper drops /32 and /128 from VIP display;
    non-host prefixes pass through unchanged.
  * Counter table column order fixed (first before next) and
    packets/bytes columns renamed to fib-packets/fib-bytes to
    clarify they come from the FIB, not the LB plugin.
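
A suffix check on "/32" cannot tell an IPv4 host route from an IPv6 /32
network such as 2001:db8::/32. A parse-based variant, sketched here
with the standard net/netip package, avoids that edge case. The name
stripHostMaskStrict is hypothetical and this is not the helper shipped
in maglevc.

```go
package main

import (
	"fmt"
	"net/netip"
	"strings"
)

// stripHostMaskStrict drops the mask only when the prefix parses and
// covers exactly one address (/32 for IPv4, /128 for IPv6). Anything
// else, including unparseable input, is returned unchanged.
func stripHostMaskStrict(prefix string) string {
	p, err := netip.ParsePrefix(prefix)
	if err != nil || !p.IsSingleIP() {
		return prefix
	}
	// Cut the original text rather than re-rendering the address so
	// the caller's spelling of the address is preserved.
	return prefix[:strings.LastIndexByte(prefix, '/')]
}

func main() {
	for _, s := range []string{
		"192.0.2.1/32",
		"2001:db8::1/128",
		"2001:db8::/32",
		"10.0.0.0/24",
		"not-a-prefix",
	} {
		fmt.Printf("%-18s -> %s\n", s, stripHostMaskStrict(s))
	}
}
```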

Docs
  * config-guide: document src-ip-sticky, including the VIP
    recreate-on-change caveat.
  * user-guide, maglevc.1, maglevd.8: updated command tree, new
    counters command, color defaults, and the src-ip-sticky field.
2026-04-12 15:59:02 +02:00
parent d5fbf5c640
commit fb62532fd5
25 changed files with 2163 additions and 549 deletions


@@ -71,13 +71,19 @@ func buildTree() *Node {
 		Children: []*Node{showHealthCheckName},
 	}
-	// show vpp info / lbstate
+	// show vpp info / lb state / lb counters
 	showVPPInfo := &Node{Word: "info", Help: "Show VPP version, uptime, and connection status", Run: runShowVPPInfo}
-	showVPPLBState := &Node{Word: "lbstate", Help: "Show VPP load-balancer state (VIPs and application servers)", Run: runShowVPPLBState}
+	showVPPLBState := &Node{Word: "state", Help: "Show VPP load-balancer state (VIPs and application servers)", Run: runShowVPPLBState}
+	showVPPLBCounters := &Node{Word: "counters", Help: "Show VPP per-VIP and per-backend packet/byte counters (refreshed every ~5s server-side)", Run: runShowVPPLBCounters}
+	showVPPLB := &Node{
+		Word: "lb",
+		Help: "VPP load-balancer information",
+		Children: []*Node{showVPPLBState, showVPPLBCounters},
+	}
 	showVPP := &Node{
 		Word: "vpp",
 		Help: "VPP dataplane information",
-		Children: []*Node{showVPPInfo, showVPPLBState},
+		Children: []*Node{showVPPInfo, showVPPLB},
 	}
 	show.Children = []*Node{
@@ -175,7 +181,7 @@ func buildTree() *Node {
 		Children: []*Node{configCheck, configReload},
 	}
-	// sync vpp lbstate [<name>]
+	// sync vpp lb state [<name>]
 	//
 	// Without a name: run SyncLBStateAll (may remove stale VIPs).
 	// With a name: run SyncLBStateVIP(name) for just that frontend (no removals).
@@ -186,15 +192,20 @@ func buildTree() *Node {
 		Run: runSyncVPPLBState,
 	}
 	syncVPPLBState := &Node{
-		Word: "lbstate",
+		Word: "state",
 		Help: "Sync the VPP load-balancer dataplane from the running config",
 		Run: runSyncVPPLBState,
 		Children: []*Node{syncVPPLBStateName},
 	}
+	syncVPPLB := &Node{
+		Word: "lb",
+		Help: "VPP load-balancer sync commands",
+		Children: []*Node{syncVPPLBState},
+	}
 	syncVPP := &Node{
 		Word: "vpp",
 		Help: "VPP dataplane sync commands",
-		Children: []*Node{syncVPPLBState},
+		Children: []*Node{syncVPPLB},
 	}
 	syncNode := &Node{
 		Word: "sync",
@@ -295,10 +306,11 @@ func runShowVPPLBState(ctx context.Context, client grpcapi.MaglevClient, _ []str
 	for _, v := range state.Vips {
 		fmt.Println()
 		w := tabwriter.NewWriter(os.Stdout, 0, 0, 2, ' ', 0)
-		fmt.Fprintf(w, "%s\t%s\n", label("vip"), v.Prefix)
+		fmt.Fprintf(w, "%s\t%s\n", label("vip"), stripHostMask(v.Prefix))
 		fmt.Fprintf(w, " %s\t%s\n", label("protocol"), protoString(v.Protocol))
 		fmt.Fprintf(w, " %s\t%d\n", label("port"), v.Port)
 		fmt.Fprintf(w, " %s\t%s\n", label("encap"), v.Encap)
+		fmt.Fprintf(w, " %s\t%t\n", label("src-ip-sticky"), v.SrcIpSticky)
 		fmt.Fprintf(w, " %s\t%d\n", label("flow-table-length"), v.FlowTableLength)
 		fmt.Fprintf(w, " %s\t%d\n", label("application-servers"), len(v.ApplicationServers))
 		if err := w.Flush(); err != nil {
@@ -314,6 +326,80 @@ func runShowVPPLBState(ctx context.Context, client grpcapi.MaglevClient, _ []str
 	return nil
 }
+// runShowVPPLBCounters prints the per-VIP and per-backend runtime counters
+// captured by maglevd's 5s scrape loop. Values are up to 5 seconds stale;
+// Prometheus is the right tool if you need live rates.
+func runShowVPPLBCounters(ctx context.Context, client grpcapi.MaglevClient, _ []string) error {
+	ctx, cancel := context.WithTimeout(ctx, callTimeout)
+	defer cancel()
+	resp, err := client.GetVPPLBCounters(ctx, &grpcapi.GetVPPLBCountersRequest{})
+	if err != nil {
+		return err
+	}
+	if len(resp.Vips) == 0 && len(resp.Backends) == 0 {
+		fmt.Println("(no counters — VPP disconnected or scrape pending)")
+		return nil
+	}
+	// ---- frontend-counters ----
+	fmt.Println(label("frontend-counters"))
+	if len(resp.Vips) == 0 {
+		fmt.Println(" (none)")
+	} else {
+		w := tabwriter.NewWriter(os.Stdout, 0, 0, 2, ' ', 0)
+		fmt.Fprintf(w, " %s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n",
+			label("vip"), label("proto"), label("port"),
+			label("first"), label("next"),
+			label("untracked"), label("no-server"),
+			label("fib-packets"), label("fib-bytes"),
+		)
+		for _, v := range resp.Vips {
+			fmt.Fprintf(w, " %s\t%s\t%d\t%d\t%d\t%d\t%d\t%d\t%d\n",
+				stripHostMask(v.Prefix), v.Protocol, v.Port,
+				v.FirstPacket, v.NextPacket,
+				v.UntrackedPacket, v.NoServer,
+				v.Packets, v.Bytes,
+			)
+		}
+		if err := w.Flush(); err != nil {
+			return err
+		}
+	}
+	fmt.Println()
+	// ---- backend-counters ----
+	fmt.Println(label("backend-counters"))
+	if len(resp.Backends) == 0 {
+		fmt.Println(" (none)")
+		return nil
+	}
+	w := tabwriter.NewWriter(os.Stdout, 0, 0, 2, ' ', 0)
+	fmt.Fprintf(w, " %s\t%s\t%s\t%s\n",
+		label("backend"), label("address"),
+		label("fib-packets"), label("fib-bytes"),
+	)
+	for _, b := range resp.Backends {
+		fmt.Fprintf(w, " %s\t%s\t%d\t%d\n",
+			b.Backend, b.Address, b.Packets, b.Bytes,
+		)
+	}
+	return w.Flush()
+}
+// stripHostMask trims "/32" (IPv4) or "/128" (IPv6) from a VIP's CIDR
+// string. maglevd only programs host-prefix VIPs so the mask is always
+// one of these two values and carries no information for a human reader.
+// Non-host prefixes and unparseable strings are returned unchanged so
+// future changes don't silently lose data.
+func stripHostMask(prefix string) string {
+	if strings.HasSuffix(prefix, "/32") || strings.HasSuffix(prefix, "/128") {
+		return prefix[:strings.LastIndexByte(prefix, '/')]
+	}
+	return prefix
+}
 // protoString renders an IP protocol number as a name (tcp, udp, any, or numeric).
 func protoString(p uint32) string {
 	switch p {
@@ -385,6 +471,7 @@ func runShowFrontend(ctx context.Context, client grpcapi.MaglevClient, args []st
 	fmt.Fprintf(w, "%s\t%s\n", label("address"), info.Address)
 	fmt.Fprintf(w, "%s\t%s\n", label("protocol"), info.Protocol)
 	fmt.Fprintf(w, "%s\t%d\n", label("port"), info.Port)
+	fmt.Fprintf(w, "%s\t%t\n", label("src-ip-sticky"), info.SrcIpSticky)
 	if info.Description != "" {
 		fmt.Fprintf(w, "%s\t%s\n", label("description"), info.Description)
 	}


@@ -25,9 +25,18 @@ func main() {
 func run() error {
 	serverAddr := flag.String("server", "localhost:9090", "maglev server address")
-	color := flag.Bool("color", true, "colorize static labels in output")
+	color := flag.Bool("color", true, "colorize static labels in output (defaults to false in one-shot mode)")
 	flag.Parse()
-	colorEnabled = *color
+	// Detect whether -color was explicitly set so we can pick a
+	// mode-aware default: color is useful in the interactive shell but
+	// noise (ANSI escapes) when piping one-shot output into scripts.
+	colorExplicit := false
+	flag.Visit(func(f *flag.Flag) {
+		if f.Name == "color" {
+			colorExplicit = true
+		}
+	})
 	conn, err := grpc.NewClient(*serverAddr,
 		grpc.WithTransportCredentials(insecure.NewCredentials()))
@@ -41,13 +50,25 @@ func run() error {
 	args := flag.Args()
 	if len(args) == 0 {
-		// Interactive shell: announce version on startup.
+		// Interactive shell: color defaults to true.
+		if colorExplicit {
+			colorEnabled = *color
+		} else {
+			colorEnabled = true
+		}
 		fmt.Printf("maglevc %s (commit %s, built %s)\n",
 			buildinfo.Version(), buildinfo.Commit(), buildinfo.Date())
 		return runShell(ctx, client)
 	}
-	// One-shot command from CLI arguments.
+	// One-shot command from CLI arguments: color defaults to false so
+	// output is script-safe. Operators wanting color can still pass
+	// -color=true explicitly.
+	if colorExplicit {
+		colorEnabled = *color
+	} else {
+		colorEnabled = false
+	}
 	root := buildTree()
 	tokens := splitTokens(strings.Join(args, " "))
 	return dispatch(ctx, root, client, tokens)
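
The two if/else blocks in this hunk apply the same rule: an explicit
-color wins, otherwise the default follows the mode. A minimal sketch
of that rule as a single helper is shown below. resolveColor and
colorFor are hypothetical names, and the sketch uses a private
flag.FlagSet rather than the global flag set used by maglevc.

```go
package main

import (
	"flag"
	"fmt"
)

// resolveColor returns the explicit flag value when the user set it,
// and otherwise defaults to on for the interactive shell and off for
// one-shot commands.
func resolveColor(explicit, value, interactive bool) bool {
	if explicit {
		return value
	}
	return interactive
}

// colorFor parses args the way a CLI entry point might and reports
// the resulting color setting. No positional arguments is treated as
// interactive mode.
func colorFor(args []string) (bool, error) {
	fs := flag.NewFlagSet("maglevc", flag.ContinueOnError)
	color := fs.Bool("color", true, "colorize static labels in output")
	if err := fs.Parse(args); err != nil {
		return false, err
	}
	// Visit only calls back for flags that were actually set, which
	// is how "explicitly passed" is told apart from "defaulted".
	explicit := false
	fs.Visit(func(f *flag.Flag) {
		if f.Name == "color" {
			explicit = true
		}
	})
	return resolveColor(explicit, *color, fs.NArg() == 0), nil
}

func main() {
	for _, args := range [][]string{
		nil,
		{"show", "version"},
		{"-color=true", "show", "version"},
		{"-color=false"},
	} {
		on, err := colorFor(args)
		fmt.Println(args, on, err)
	}
}
```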


@@ -29,10 +29,11 @@ func TestExpandPathsRoot(t *testing.T) {
 		"watch events <opt>",
 		"config check",
 		"show vpp info",
-		"show vpp lbstate",
+		"show vpp lb state",
+		"show vpp lb counters",
 		"config reload",
-		"sync vpp lbstate",
-		"sync vpp lbstate <name>",
+		"sync vpp lb state",
+		"sync vpp lb state <name>",
 		"quit",
 		"exit",
 	}
@@ -63,9 +64,10 @@ func TestExpandPathsShow(t *testing.T) {
 		}
 	}
 	// version, frontends, frontends <name>, backends, backends <name>,
-	// healthchecks, healthchecks <name>, vpp info, vpp lb = 9 lines
-	if len(lines) != 9 {
-		t.Errorf("expected exactly 9 show subcommands, got %d", len(lines))
+	// healthchecks, healthchecks <name>, vpp info, vpp lb state,
+	// vpp lb counters = 10 lines
+	if len(lines) != 10 {
+		t.Errorf("expected exactly 10 show subcommands, got %d", len(lines))
 	}
 }
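
The tests above count one line per runnable command path, which is why
nesting 'state' and 'counters' under a new 'lb' node changes the
expected totals. A minimal sketch of such a path walk follows. The Node
type here is a simplified stand-in (the real Run takes a context, a
client and arguments), and listPaths is a hypothetical name rather than
the expansion function under test.

```go
package main

import (
	"fmt"
	"strings"
)

// Node is a simplified command-tree node: a word, an optional
// handler, and child nodes.
type Node struct {
	Word     string
	Run      func() error
	Children []*Node
}

// listPaths returns the space-joined word path of every node that
// has a handler, in depth-first order. Pure grouping nodes such as
// "lb" produce no line of their own.
func listPaths(n *Node, prefix string) []string {
	path := strings.TrimSpace(prefix + " " + n.Word)
	var out []string
	if n.Run != nil {
		out = append(out, path)
	}
	for _, c := range n.Children {
		out = append(out, listPaths(c, path)...)
	}
	return out
}

func main() {
	noop := func() error { return nil }
	vpp := &Node{Word: "vpp", Children: []*Node{
		{Word: "info", Run: noop},
		{Word: "lb", Children: []*Node{
			{Word: "state", Run: noop},
			{Word: "counters", Run: noop},
		}},
	}}
	for _, p := range listPaths(vpp, "show") {
		fmt.Println(p)
	}
}
```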