VPP LB counters, src-ip-sticky, and frontend state aggregation

New feature: per-VIP / per-backend runtime counters
* New GetVPPLBCounters RPC serving an in-process snapshot refreshed
  by a 5s scrape loop (internal/vpp/lbstats.go). Each cycle pulls
  the LB plugin's four SimpleCounters (next, first, untracked,
  no-server) plus the FIB /net/route/to CombinedCounter for every
  VIP and every backend host prefix via a single DumpStats call.
* FIB stats-index discovery via ip_route_lookup
  (internal/vpp/fibstats.go); per-worker reduction happens in the
  collector.
* Prometheus collector exports vip_packets_total (kind label),
  vip_route_{packets,bytes}_total, and
  backend_route_{packets,bytes}_total. Metrics source interface
  extended with VIPStats / BackendRouteStats; vpp.Client publishes
  snapshots via atomic.Pointer and clears them on disconnect.
* New 'show vpp lb counters' CLI command. The 'show vpp lbstate'
  and 'sync vpp lbstate' commands are restructured under
  'show vpp lb {state,counters}' / 'sync vpp lb state' to make room
  for the new verb.
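
The per-worker reduction and the atomic.Pointer snapshot handoff described above can be sketched roughly as follows. This is a minimal illustration only: the Snapshot type, sumWorkers, publish, and clearSnapshot are hypothetical names, and the real stats-segment types used by maglevd are not shown in this commit.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Snapshot is a hypothetical immutable view of reduced counters keyed by
// VIP prefix. The real snapshot type is an assumption here.
type Snapshot struct {
	NextPacket map[string]uint64
}

// sumWorkers reduces a per-worker counter matrix (perWorker[worker][index])
// to one total for a single index. Workers whose slice does not contain the
// index contribute nothing.
func sumWorkers(perWorker [][]uint64, index int) uint64 {
	var total uint64
	for _, w := range perWorker {
		if index >= 0 && index < len(w) {
			total += w[index]
		}
	}
	return total
}

// current holds the latest snapshot; a nil pointer means "no data".
var current atomic.Pointer[Snapshot]

// publish swaps in a freshly built snapshot for concurrent readers.
func publish(s *Snapshot) { current.Store(s) }

// clearSnapshot models the "clear on disconnect" behaviour.
func clearSnapshot() { current.Store(nil) }

func main() {
	perWorker := [][]uint64{{10, 1}, {5, 2}, {7}}
	publish(&Snapshot{NextPacket: map[string]uint64{
		"192.0.2.1/32": sumWorkers(perWorker, 0),
	}})
	fmt.Println(current.Load().NextPacket["192.0.2.1/32"]) // prints 22
	clearSnapshot()
	fmt.Println(current.Load() == nil) // prints true
}
```

Readers only ever see a complete snapshot or nil, so the RPC handler and the Prometheus collector need no lock as long as a published snapshot is never mutated afterwards.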
New feature: src-ip-sticky frontends
* New frontend YAML key 'src-ip-sticky' (bool). Plumbed through
  config.Frontend, desiredVIP, and the lb_add_del_vip_v2 call.
* Reflected in gRPC FrontendInfo.src_ip_sticky and
  VPPLBVIP.src_ip_sticky, and shown in 'show vpp lb state' output.
* Scraped back from VPP by parsing 'show lb vips verbose' through
  cli_inband, because lb_vip_details does not expose the flag. The
  same scrape also recovers the LB pool index for each VIP, which
  the stats-segment counters are keyed on. This is a documented
  temporary workaround until VPP ships an lb_vip_v2_dump.
* src_ip_sticky cannot be mutated on a live VIP, so a flipped flag
  triggers a tear-down-and-recreate in reconcileVIP (ASes deleted
  with flush, VIP deleted, then re-added). The flip is logged.
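
The tear-down-and-recreate ordering can be illustrated with a small planning function. This is a sketch under assumptions, not the reconcileVIP code: the VIP struct, recreateSteps, and the step strings are hypothetical, and re-attaching the application servers afterwards is assumed to be handled by the normal reconcile path.

```go
package main

import "fmt"

// VIP is a hypothetical, reduced view of a load-balancer VIP.
type VIP struct {
	Prefix      string
	SrcIPSticky bool
	Backends    []string // application server addresses
}

// recreateSteps returns the ordered operations needed when the sticky flag
// of the VIP currently in VPP differs from the desired one. It returns nil
// when the flags already match, so no recreate is needed.
func recreateSteps(current, desired VIP) []string {
	if current.SrcIPSticky == desired.SrcIPSticky {
		return nil
	}
	var steps []string
	// 1. Delete every application server with flush.
	for _, as := range current.Backends {
		steps = append(steps, "del-as-flush "+as)
	}
	// 2. Delete the VIP itself.
	steps = append(steps, "del-vip "+current.Prefix)
	// 3. Re-add the VIP with the new flag value.
	steps = append(steps, fmt.Sprintf("add-vip %s sticky=%t", desired.Prefix, desired.SrcIPSticky))
	return steps
}

func main() {
	cur := VIP{Prefix: "192.0.2.1/32", SrcIPSticky: false, Backends: []string{"10.0.0.1", "10.0.0.2"}}
	want := cur
	want.SrcIPSticky = true
	for _, s := range recreateSteps(cur, want) {
		fmt.Println(s)
	}
}
```

Because the recreate flushes existing flows, the config-guide caveat about changing the flag on a live VIP applies to any such plan.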
New feature: frontend state aggregation and events
* New health.FrontendState (unknown/up/down) and FrontendTransition
  types. A frontend is 'up' iff at least one backend has a nonzero
  effective weight, 'unknown' iff no backend has real state yet,
  and 'down' otherwise.
* Checker tracks per-frontend aggregate state, recomputing after
  each backend transition and emitting a frontend-transition Event
  on change. Reload drops entries for removed frontends.
* checker.Event gains an optional FrontendTransition pointer;
  backend- vs. frontend-transition events are demultiplexed on
  that field.
* WatchEvents now sends an initial snapshot of frontend state on
  connect (mirroring the existing backend snapshot), subscribes
  once to the checker stream, and fans out to backend/frontend
  handlers based on the client's filter flags. The proto
  FrontendEvent message grows name + transition fields.
* New Checker.FrontendState accessor.
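
The aggregation rule and the emit-on-change behaviour can be sketched as below. The types here (backend, tracker) and their fields are hypothetical stand-ins, not the health or checker package API; in particular, treating a frontend with no backends as 'unknown' is simply how this sketch reads the rule.

```go
package main

import "fmt"

// FrontendState mirrors the three states named in the commit message.
type FrontendState int

const (
	FrontendUnknown FrontendState = iota
	FrontendUp
	FrontendDown
)

func (s FrontendState) String() string {
	switch s {
	case FrontendUp:
		return "up"
	case FrontendDown:
		return "down"
	default:
		return "unknown"
	}
}

// backend is a hypothetical input: HasState is true once a health check has
// produced a real result, EffectiveWeight is the weight after failover logic.
type backend struct {
	HasState        bool
	EffectiveWeight int
}

// computeFrontendState applies: up if any backend has a nonzero effective
// weight, unknown if no backend has real state yet, down otherwise.
func computeFrontendState(backends []backend) FrontendState {
	anyState := false
	for _, b := range backends {
		if b.EffectiveWeight != 0 {
			return FrontendUp
		}
		if b.HasState {
			anyState = true
		}
	}
	if !anyState {
		return FrontendUnknown
	}
	return FrontendDown
}

// tracker remembers the last aggregate state per frontend name.
type tracker struct{ last map[string]FrontendState }

// update stores the new state and reports the previous one and whether it
// changed; a caller would emit a transition event only when changed is true.
func (t *tracker) update(name string, s FrontendState) (prev FrontendState, changed bool) {
	prev = t.last[name] // missing entries read as FrontendUnknown
	t.last[name] = s
	return prev, prev != s
}

func main() {
	t := &tracker{last: map[string]FrontendState{}}
	s := computeFrontendState([]backend{{HasState: true}, {HasState: true, EffectiveWeight: 100}})
	if prev, changed := t.update("web", s); changed {
		fmt.Printf("frontend web: %s -> %s\n", prev, s) // prints "frontend web: unknown -> up"
	}
}
```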
Refactor: pure health helpers
* Moved the priority-failover selector and the (pool idx, active
  pool, state, cfg weight) → (vpp weight, flush) mapping out of
  internal/vpp/lbsync.go into a new internal/health/weights.go so
  the checker can reuse them for frontend-state computation
  without importing internal/vpp.
* New functions: health.ActivePoolIndex, BackendEffectiveWeight,
  EffectiveWeights, ComputeFrontendState. lbsync.go now calls
  these directly; vpp.EffectiveWeights is a thin wrapper over
  health.EffectiveWeights retained for the gRPC observability
  path. Fully unit-tested in internal/health/weights_test.go.
maglevc polish
* --color default is now mode-aware: on in the interactive shell,
  off in one-shot mode so piped output is script-safe. Explicit
  --color=true/false still overrides.
* New stripHostMask helper drops /32 and /128 from VIP display;
  non-host prefixes pass through unchanged.
* Counter table column order fixed (first before next) and
  packets/bytes columns renamed to fib-packets/fib-bytes to
  clarify they come from the FIB, not the LB plugin.
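
One detail of the suffix-based stripHostMask in this commit: it also strips "/32" from an IPv6 prefix such as 2001:db8::/32, which is not a host prefix. That is harmless while maglevd only programs host-prefix VIPs. A stricter variant, sketched here with the standard net/netip package, compares the mask length with the address family's bit length; the name stripHostMaskStrict is hypothetical and not part of the commit.

```go
package main

import (
	"fmt"
	"net/netip"
	"strings"
)

// stripHostMaskStrict removes the mask only when the prefix covers exactly
// one address (/32 for IPv4, /128 for IPv6). Anything else, including
// unparseable input, is returned unchanged.
func stripHostMaskStrict(prefix string) string {
	p, err := netip.ParsePrefix(prefix)
	if err != nil {
		return prefix
	}
	if p.Bits() != p.Addr().BitLen() {
		return prefix
	}
	// ParsePrefix succeeded, so the string is known to contain a '/'.
	return prefix[:strings.LastIndexByte(prefix, '/')]
}

func main() {
	for _, s := range []string{"192.0.2.1/32", "2001:db8::1/128", "2001:db8::/32", "192.0.2.0/24", "not-a-prefix"} {
		fmt.Printf("%s => %s\n", s, stripHostMaskStrict(s))
	}
}
```

Slicing the original string rather than re-rendering the parsed address keeps the display identical to what the server sent.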
Docs
* config-guide: document src-ip-sticky, including the VIP
  recreate-on-change caveat.
* user-guide, maglevc.1, maglevd.8: updated command tree, new
  counters command, color defaults, and the src-ip-sticky field.
@@ -71,13 +71,19 @@ func buildTree() *Node {
 		Children: []*Node{showHealthCheckName},
 	}
 
-	// show vpp info / lbstate
+	// show vpp info / lb state / lb counters
 	showVPPInfo := &Node{Word: "info", Help: "Show VPP version, uptime, and connection status", Run: runShowVPPInfo}
-	showVPPLBState := &Node{Word: "lbstate", Help: "Show VPP load-balancer state (VIPs and application servers)", Run: runShowVPPLBState}
+	showVPPLBState := &Node{Word: "state", Help: "Show VPP load-balancer state (VIPs and application servers)", Run: runShowVPPLBState}
+	showVPPLBCounters := &Node{Word: "counters", Help: "Show VPP per-VIP and per-backend packet/byte counters (refreshed every ~5s server-side)", Run: runShowVPPLBCounters}
+	showVPPLB := &Node{
+		Word: "lb",
+		Help: "VPP load-balancer information",
+		Children: []*Node{showVPPLBState, showVPPLBCounters},
+	}
 	showVPP := &Node{
 		Word: "vpp",
 		Help: "VPP dataplane information",
-		Children: []*Node{showVPPInfo, showVPPLBState},
+		Children: []*Node{showVPPInfo, showVPPLB},
 	}
 
 	show.Children = []*Node{
@@ -175,7 +181,7 @@ func buildTree() *Node {
 		Children: []*Node{configCheck, configReload},
 	}
 
-	// sync vpp lbstate [<name>]
+	// sync vpp lb state [<name>]
 	//
 	// Without a name: run SyncLBStateAll (may remove stale VIPs).
 	// With a name: run SyncLBStateVIP(name) for just that frontend (no removals).
@@ -186,15 +192,20 @@ func buildTree() *Node {
 		Run: runSyncVPPLBState,
 	}
 	syncVPPLBState := &Node{
-		Word: "lbstate",
+		Word: "state",
 		Help: "Sync the VPP load-balancer dataplane from the running config",
 		Run: runSyncVPPLBState,
 		Children: []*Node{syncVPPLBStateName},
 	}
+	syncVPPLB := &Node{
+		Word: "lb",
+		Help: "VPP load-balancer sync commands",
+		Children: []*Node{syncVPPLBState},
+	}
 	syncVPP := &Node{
 		Word: "vpp",
 		Help: "VPP dataplane sync commands",
-		Children: []*Node{syncVPPLBState},
+		Children: []*Node{syncVPPLB},
 	}
 	syncNode := &Node{
 		Word: "sync",
@@ -295,10 +306,11 @@ func runShowVPPLBState(ctx context.Context, client grpcapi.MaglevClient, _ []str
 	for _, v := range state.Vips {
 		fmt.Println()
 		w := tabwriter.NewWriter(os.Stdout, 0, 0, 2, ' ', 0)
-		fmt.Fprintf(w, "%s\t%s\n", label("vip"), v.Prefix)
+		fmt.Fprintf(w, "%s\t%s\n", label("vip"), stripHostMask(v.Prefix))
 		fmt.Fprintf(w, " %s\t%s\n", label("protocol"), protoString(v.Protocol))
 		fmt.Fprintf(w, " %s\t%d\n", label("port"), v.Port)
 		fmt.Fprintf(w, " %s\t%s\n", label("encap"), v.Encap)
+		fmt.Fprintf(w, " %s\t%t\n", label("src-ip-sticky"), v.SrcIpSticky)
 		fmt.Fprintf(w, " %s\t%d\n", label("flow-table-length"), v.FlowTableLength)
 		fmt.Fprintf(w, " %s\t%d\n", label("application-servers"), len(v.ApplicationServers))
 		if err := w.Flush(); err != nil {
@@ -314,6 +326,80 @@ func runShowVPPLBState(ctx context.Context, client grpcapi.MaglevClient, _ []str
 	return nil
 }
 
+// runShowVPPLBCounters prints the per-VIP and per-backend runtime counters
+// captured by maglevd's 5s scrape loop. Values are up to 5 seconds stale;
+// Prometheus is the right tool if you need live rates.
+func runShowVPPLBCounters(ctx context.Context, client grpcapi.MaglevClient, _ []string) error {
+	ctx, cancel := context.WithTimeout(ctx, callTimeout)
+	defer cancel()
+	resp, err := client.GetVPPLBCounters(ctx, &grpcapi.GetVPPLBCountersRequest{})
+	if err != nil {
+		return err
+	}
+
+	if len(resp.Vips) == 0 && len(resp.Backends) == 0 {
+		fmt.Println("(no counters — VPP disconnected or scrape pending)")
+		return nil
+	}
+
+	// ---- frontend-counters ----
+	fmt.Println(label("frontend-counters"))
+	if len(resp.Vips) == 0 {
+		fmt.Println(" (none)")
+	} else {
+		w := tabwriter.NewWriter(os.Stdout, 0, 0, 2, ' ', 0)
+		fmt.Fprintf(w, " %s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n",
+			label("vip"), label("proto"), label("port"),
+			label("first"), label("next"),
+			label("untracked"), label("no-server"),
+			label("fib-packets"), label("fib-bytes"),
+		)
+		for _, v := range resp.Vips {
+			fmt.Fprintf(w, " %s\t%s\t%d\t%d\t%d\t%d\t%d\t%d\t%d\n",
+				stripHostMask(v.Prefix), v.Protocol, v.Port,
+				v.FirstPacket, v.NextPacket,
+				v.UntrackedPacket, v.NoServer,
+				v.Packets, v.Bytes,
+			)
+		}
+		if err := w.Flush(); err != nil {
+			return err
+		}
+	}
+
+	fmt.Println()
+
+	// ---- backend-counters ----
+	fmt.Println(label("backend-counters"))
+	if len(resp.Backends) == 0 {
+		fmt.Println(" (none)")
+		return nil
+	}
+	w := tabwriter.NewWriter(os.Stdout, 0, 0, 2, ' ', 0)
+	fmt.Fprintf(w, " %s\t%s\t%s\t%s\n",
+		label("backend"), label("address"),
+		label("fib-packets"), label("fib-bytes"),
+	)
+	for _, b := range resp.Backends {
+		fmt.Fprintf(w, " %s\t%s\t%d\t%d\n",
+			b.Backend, b.Address, b.Packets, b.Bytes,
+		)
+	}
+	return w.Flush()
+}
+
+// stripHostMask trims "/32" (IPv4) or "/128" (IPv6) from a VIP's CIDR
+// string. maglevd only programs host-prefix VIPs so the mask is always
+// one of these two values and carries no information for a human reader.
+// Non-host prefixes and unparseable strings are returned unchanged so
+// future changes don't silently lose data.
+func stripHostMask(prefix string) string {
+	if strings.HasSuffix(prefix, "/32") || strings.HasSuffix(prefix, "/128") {
+		return prefix[:strings.LastIndexByte(prefix, '/')]
+	}
+	return prefix
+}
+
 // protoString renders an IP protocol number as a name (tcp, udp, any, or numeric).
 func protoString(p uint32) string {
 	switch p {
@@ -385,6 +471,7 @@ func runShowFrontend(ctx context.Context, client grpcapi.MaglevClient, args []st
 	fmt.Fprintf(w, "%s\t%s\n", label("address"), info.Address)
 	fmt.Fprintf(w, "%s\t%s\n", label("protocol"), info.Protocol)
 	fmt.Fprintf(w, "%s\t%d\n", label("port"), info.Port)
+	fmt.Fprintf(w, "%s\t%t\n", label("src-ip-sticky"), info.SrcIpSticky)
 	if info.Description != "" {
 		fmt.Fprintf(w, "%s\t%s\n", label("description"), info.Description)
 	}