VPP LB counters, src-ip-sticky, and frontend state aggregation
New feature: per-VIP / per-backend runtime counters
* New GetVPPLBCounters RPC serving an in-process snapshot refreshed
by a 5s scrape loop (internal/vpp/lbstats.go). Each cycle pulls
the LB plugin's four SimpleCounters (next, first, untracked,
no-server) plus the FIB /net/route/to CombinedCounter for every
VIP and every backend host prefix via a single DumpStats call.
* FIB stats-index discovery via ip_route_lookup (internal/vpp/
fibstats.go); per-worker reduction happens in the collector.
* Prometheus collector exports vip_packets_total (kind label),
vip_route_{packets,bytes}_total, and backend_route_{packets,
bytes}_total. Metrics source interface extended with VIPStats /
BackendRouteStats; vpp.Client publishes snapshots via
atomic.Pointer and clears them on disconnect.
* New 'show vpp lb counters' CLI command. The 'show vpp lbstate'
and 'sync vpp lbstate' commands are restructured under 'show
vpp lb {state,counters}' / 'sync vpp lb state' to make room
for the new verb.
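The snapshot hand-off between the scrape loop and readers can be sketched as follows. This is a minimal illustration of the atomic.Pointer publish/clear pattern described above, not the actual internal/vpp types; `Snapshot`, `publish`, and `Counters` are illustrative names.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Snapshot is a hypothetical stand-in for the counter snapshot the
// 5s scrape loop produces each cycle.
type Snapshot struct {
	VIPPackets map[string]uint64
}

// Client sketches the pattern: the scrape loop stores a fresh
// snapshot, readers load it lock-free, and disconnect clears it so
// stale counters are never served.
type Client struct {
	snap atomic.Pointer[Snapshot]
}

func (c *Client) publish(s *Snapshot) { c.snap.Store(s) }
func (c *Client) clear()              { c.snap.Store(nil) }

// Counters returns the latest snapshot, or nil if VPP is
// disconnected or the first scrape hasn't completed yet.
func (c *Client) Counters() *Snapshot { return c.snap.Load() }

func main() {
	c := &Client{}
	c.publish(&Snapshot{VIPPackets: map[string]uint64{"192.0.2.1": 42}})
	fmt.Println(c.Counters().VIPPackets["192.0.2.1"]) // 42
	c.clear()
	fmt.Println(c.Counters() == nil) // true
}
```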
New feature: src-ip-sticky frontends
* New frontend YAML key 'src-ip-sticky' (bool). Plumbed through
config.Frontend, desiredVIP, and the lb_add_del_vip_v2 call.
* Reflected in gRPC FrontendInfo.src_ip_sticky and VPPLBVIP.
src_ip_sticky, and shown in 'show vpp lb state' output.
* Scraped back from VPP by parsing 'show lb vips verbose' through
cli_inband — lb_vip_details does not expose the flag. The same
scrape also recovers the LB pool index for each VIP, which the
stats-segment counters are keyed on. This is a documented
temporary workaround until VPP ships an lb_vip_v2_dump.
* src_ip_sticky cannot be mutated on a live VIP, so a flipped flag
triggers a tear-down-and-recreate in reconcileVIP (ASes deleted
with flush, VIP deleted, then re-added). Flip is logged.
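The recreate decision reduces to a flag comparison; a minimal sketch (the `vip` struct and `needsRecreate` name are illustrative, not the real desiredVIP/reconcileVIP types):

```go
package main

import "fmt"

// vip is a simplified stand-in for the desired/actual VIP
// attributes compared during reconciliation.
type vip struct {
	prefix      string
	srcIPSticky bool
}

// needsRecreate mirrors the rule above: src_ip_sticky cannot be
// changed on a live VIP, so a flipped flag forces delete-and-re-add
// rather than an in-place update.
func needsRecreate(desired, actual vip) bool {
	return desired.srcIPSticky != actual.srcIPSticky
}

func main() {
	d := vip{prefix: "192.0.2.1/32", srcIPSticky: true}
	a := vip{prefix: "192.0.2.1/32", srcIPSticky: false}
	if needsRecreate(d, a) {
		// The real code deletes the ASes with flush, deletes the
		// VIP, re-adds it, and logs the flip.
		fmt.Println("recreate", d.prefix)
	}
}
```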
New feature: frontend state aggregation and events
* New health.FrontendState (unknown/up/down) and FrontendTransition
types. A frontend is 'up' iff at least one backend has a nonzero
effective weight, 'unknown' iff no backend has real state yet,
and 'down' otherwise.
* Checker tracks per-frontend aggregate state, recomputing after
each backend transition and emitting a frontend-transition Event
on change. Reload drops entries for removed frontends.
* checker.Event gains an optional FrontendTransition pointer;
backend- vs. frontend-transition events are demultiplexed on
that field.
* WatchEvents now sends an initial snapshot of frontend state on
connect (mirroring the existing backend snapshot), subscribes
once to the checker stream, and fans out to backend/frontend
handlers based on the client's filter flags. The proto
FrontendEvent message grows name + transition fields.
* New Checker.FrontendState accessor.
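The aggregation rule can be stated as a small pure function. This sketch uses simplified stand-in types (`backend` with a known-state bit and an effective weight); the real health.ComputeFrontendState operates on the checker's own structures.

```go
package main

import "fmt"

type FrontendState int

const (
	StateUnknown FrontendState = iota
	StateUp
	StateDown
)

func (s FrontendState) String() string {
	return [...]string{"unknown", "up", "down"}[s]
}

// backend pairs a "has real health state yet" bit with the
// effective weight the checker would program (0 = not serving).
type backend struct {
	known           bool
	effectiveWeight uint32
}

// computeFrontendState applies the rule quoted above: 'up' iff at
// least one backend has a nonzero effective weight, 'unknown' iff
// no backend has real state yet, 'down' otherwise.
func computeFrontendState(backends []backend) FrontendState {
	anyKnown := false
	for _, b := range backends {
		if b.effectiveWeight > 0 {
			return StateUp
		}
		if b.known {
			anyKnown = true
		}
	}
	if !anyKnown {
		return StateUnknown
	}
	return StateDown
}

func main() {
	fmt.Println(computeFrontendState(nil))                             // unknown
	fmt.Println(computeFrontendState([]backend{{true, 0}, {true, 5}})) // up
	fmt.Println(computeFrontendState([]backend{{true, 0}}))            // down
}
```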
Refactor: pure health helpers
* Moved the priority-failover selector and the (pool idx, active
pool, state, cfg weight) → (vpp weight, flush) mapping out of
internal/vpp/lbsync.go into a new internal/health/weights.go so
the checker can reuse them for frontend-state computation
without importing internal/vpp.
* New functions: health.ActivePoolIndex, BackendEffectiveWeight,
EffectiveWeights, ComputeFrontendState. lbsync.go now calls
these directly; vpp.EffectiveWeights is a thin wrapper over
health.EffectiveWeights retained for the gRPC observability
path. Fully unit-tested in internal/health/weights_test.go.
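One plausible shape for the priority-failover selector, sketched under assumptions: pools are ordered highest priority first, and the selector picks the first pool with any usable backend. The `pool` type and the exact signature of health.ActivePoolIndex here are guesses for illustration.

```go
package main

import "fmt"

// pool is an illustrative stand-in; the real code in
// internal/health/weights.go works on the checker's own types.
type pool struct {
	backendsUp int
}

// activePoolIndex returns the index of the first (highest-priority)
// pool with at least one usable backend, or -1 if every pool is
// empty — the failover then falls through to lower priorities.
func activePoolIndex(pools []pool) int {
	for i, p := range pools {
		if p.backendsUp > 0 {
			return i
		}
	}
	return -1
}

func main() {
	fmt.Println(activePoolIndex([]pool{{0}, {2}, {1}})) // 1
	fmt.Println(activePoolIndex([]pool{{0}}))           // -1
}
```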
maglevc polish
* --color default is now mode-aware: on in the interactive shell,
off in one-shot mode so piped output is script-safe. Explicit
--color=true/false still overrides.
* New stripHostMask helper drops /32 and /128 from VIP display;
non-host prefixes pass through unchanged.
* Counter table column order fixed (first before next) and
packets/bytes columns renamed to fib-packets/fib-bytes to
clarify they come from the FIB, not the LB plugin.
Docs
* config-guide: document src-ip-sticky, including the VIP
recreate-on-change caveat.
* user-guide, maglevc.1, maglevd.8: updated command tree, new
counters command, color defaults, and the src-ip-sticky field.
@@ -71,13 +71,19 @@ func buildTree() *Node {
 		Children: []*Node{showHealthCheckName},
 	}
 
-	// show vpp info / lbstate
+	// show vpp info / lb state / lb counters
 	showVPPInfo := &Node{Word: "info", Help: "Show VPP version, uptime, and connection status", Run: runShowVPPInfo}
-	showVPPLBState := &Node{Word: "lbstate", Help: "Show VPP load-balancer state (VIPs and application servers)", Run: runShowVPPLBState}
+	showVPPLBState := &Node{Word: "state", Help: "Show VPP load-balancer state (VIPs and application servers)", Run: runShowVPPLBState}
+	showVPPLBCounters := &Node{Word: "counters", Help: "Show VPP per-VIP and per-backend packet/byte counters (refreshed every ~5s server-side)", Run: runShowVPPLBCounters}
+	showVPPLB := &Node{
+		Word:     "lb",
+		Help:     "VPP load-balancer information",
+		Children: []*Node{showVPPLBState, showVPPLBCounters},
+	}
 	showVPP := &Node{
 		Word:     "vpp",
 		Help:     "VPP dataplane information",
-		Children: []*Node{showVPPInfo, showVPPLBState},
+		Children: []*Node{showVPPInfo, showVPPLB},
 	}
 
 	show.Children = []*Node{
@@ -175,7 +181,7 @@ func buildTree() *Node {
 		Children: []*Node{configCheck, configReload},
 	}
 
-	// sync vpp lbstate [<name>]
+	// sync vpp lb state [<name>]
 	//
 	// Without a name: run SyncLBStateAll (may remove stale VIPs).
 	// With a name: run SyncLBStateVIP(name) for just that frontend (no removals).
@@ -186,15 +192,20 @@ func buildTree() *Node {
 		Run: runSyncVPPLBState,
 	}
 	syncVPPLBState := &Node{
-		Word:     "lbstate",
+		Word:     "state",
 		Help:     "Sync the VPP load-balancer dataplane from the running config",
 		Run:      runSyncVPPLBState,
 		Children: []*Node{syncVPPLBStateName},
 	}
+	syncVPPLB := &Node{
+		Word:     "lb",
+		Help:     "VPP load-balancer sync commands",
+		Children: []*Node{syncVPPLBState},
+	}
 	syncVPP := &Node{
 		Word: "vpp",
 		Help: "VPP dataplane sync commands",
-		Children: []*Node{syncVPPLBState},
+		Children: []*Node{syncVPPLB},
 	}
 	syncNode := &Node{
 		Word: "sync",
@@ -295,10 +306,11 @@ func runShowVPPLBState(ctx context.Context, client grpcapi.MaglevClient, _ []str
 	for _, v := range state.Vips {
 		fmt.Println()
 		w := tabwriter.NewWriter(os.Stdout, 0, 0, 2, ' ', 0)
-		fmt.Fprintf(w, "%s\t%s\n", label("vip"), v.Prefix)
+		fmt.Fprintf(w, "%s\t%s\n", label("vip"), stripHostMask(v.Prefix))
 		fmt.Fprintf(w, " %s\t%s\n", label("protocol"), protoString(v.Protocol))
 		fmt.Fprintf(w, " %s\t%d\n", label("port"), v.Port)
 		fmt.Fprintf(w, " %s\t%s\n", label("encap"), v.Encap)
+		fmt.Fprintf(w, " %s\t%t\n", label("src-ip-sticky"), v.SrcIpSticky)
 		fmt.Fprintf(w, " %s\t%d\n", label("flow-table-length"), v.FlowTableLength)
 		fmt.Fprintf(w, " %s\t%d\n", label("application-servers"), len(v.ApplicationServers))
 		if err := w.Flush(); err != nil {
@@ -314,6 +326,80 @@ func runShowVPPLBState(ctx context.Context, client grpcapi.MaglevClient, _ []str
 	return nil
 }
 
+// runShowVPPLBCounters prints the per-VIP and per-backend runtime counters
+// captured by maglevd's 5s scrape loop. Values are up to 5 seconds stale;
+// Prometheus is the right tool if you need live rates.
+func runShowVPPLBCounters(ctx context.Context, client grpcapi.MaglevClient, _ []string) error {
+	ctx, cancel := context.WithTimeout(ctx, callTimeout)
+	defer cancel()
+	resp, err := client.GetVPPLBCounters(ctx, &grpcapi.GetVPPLBCountersRequest{})
+	if err != nil {
+		return err
+	}
+
+	if len(resp.Vips) == 0 && len(resp.Backends) == 0 {
+		fmt.Println("(no counters — VPP disconnected or scrape pending)")
+		return nil
+	}
+
+	// ---- frontend-counters ----
+	fmt.Println(label("frontend-counters"))
+	if len(resp.Vips) == 0 {
+		fmt.Println(" (none)")
+	} else {
+		w := tabwriter.NewWriter(os.Stdout, 0, 0, 2, ' ', 0)
+		fmt.Fprintf(w, " %s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n",
+			label("vip"), label("proto"), label("port"),
+			label("first"), label("next"),
+			label("untracked"), label("no-server"),
+			label("fib-packets"), label("fib-bytes"),
+		)
+		for _, v := range resp.Vips {
+			fmt.Fprintf(w, " %s\t%s\t%d\t%d\t%d\t%d\t%d\t%d\t%d\n",
+				stripHostMask(v.Prefix), v.Protocol, v.Port,
+				v.FirstPacket, v.NextPacket,
+				v.UntrackedPacket, v.NoServer,
+				v.Packets, v.Bytes,
+			)
+		}
+		if err := w.Flush(); err != nil {
+			return err
+		}
+	}
+
+	fmt.Println()
+
+	// ---- backend-counters ----
+	fmt.Println(label("backend-counters"))
+	if len(resp.Backends) == 0 {
+		fmt.Println(" (none)")
+		return nil
+	}
+	w := tabwriter.NewWriter(os.Stdout, 0, 0, 2, ' ', 0)
+	fmt.Fprintf(w, " %s\t%s\t%s\t%s\n",
+		label("backend"), label("address"),
+		label("fib-packets"), label("fib-bytes"),
+	)
+	for _, b := range resp.Backends {
+		fmt.Fprintf(w, " %s\t%s\t%d\t%d\n",
+			b.Backend, b.Address, b.Packets, b.Bytes,
+		)
+	}
+	return w.Flush()
+}
+
+// stripHostMask trims "/32" (IPv4) or "/128" (IPv6) from a VIP's CIDR
+// string. maglevd only programs host-prefix VIPs so the mask is always
+// one of these two values and carries no information for a human reader.
+// Non-host prefixes and unparseable strings are returned unchanged so
+// future changes don't silently lose data.
+func stripHostMask(prefix string) string {
+	if strings.HasSuffix(prefix, "/32") || strings.HasSuffix(prefix, "/128") {
+		return prefix[:strings.LastIndexByte(prefix, '/')]
+	}
+	return prefix
+}
+
 // protoString renders an IP protocol number as a name (tcp, udp, any, or numeric).
 func protoString(p uint32) string {
 	switch p {
@@ -385,6 +471,7 @@ func runShowFrontend(ctx context.Context, client grpcapi.MaglevClient, args []st
 	fmt.Fprintf(w, "%s\t%s\n", label("address"), info.Address)
 	fmt.Fprintf(w, "%s\t%s\n", label("protocol"), info.Protocol)
 	fmt.Fprintf(w, "%s\t%d\n", label("port"), info.Port)
+	fmt.Fprintf(w, "%s\t%t\n", label("src-ip-sticky"), info.SrcIpSticky)
 	if info.Description != "" {
 		fmt.Fprintf(w, "%s\t%s\n", label("description"), info.Description)
 	}
 
@@ -25,9 +25,18 @@ func main() {
 
 func run() error {
 	serverAddr := flag.String("server", "localhost:9090", "maglev server address")
-	color := flag.Bool("color", true, "colorize static labels in output")
+	color := flag.Bool("color", true, "colorize static labels in output (defaults to false in one-shot mode)")
 	flag.Parse()
-	colorEnabled = *color
+
+	// Detect whether -color was explicitly set so we can pick a
+	// mode-aware default: color is useful in the interactive shell but
+	// noise (ANSI escapes) when piping one-shot output into scripts.
+	colorExplicit := false
+	flag.Visit(func(f *flag.Flag) {
+		if f.Name == "color" {
+			colorExplicit = true
+		}
+	})
 
 	conn, err := grpc.NewClient(*serverAddr,
 		grpc.WithTransportCredentials(insecure.NewCredentials()))
@@ -41,13 +50,25 @@ func run() error {
 
 	args := flag.Args()
 	if len(args) == 0 {
-		// Interactive shell: announce version on startup.
+		// Interactive shell: color defaults to true.
+		if colorExplicit {
+			colorEnabled = *color
+		} else {
+			colorEnabled = true
+		}
 		fmt.Printf("maglevc %s (commit %s, built %s)\n",
 			buildinfo.Version(), buildinfo.Commit(), buildinfo.Date())
 		return runShell(ctx, client)
 	}
 
-	// One-shot command from CLI arguments.
+	// One-shot command from CLI arguments: color defaults to false so
+	// output is script-safe. Operators wanting color can still pass
+	// -color=true explicitly.
+	if colorExplicit {
+		colorEnabled = *color
+	} else {
+		colorEnabled = false
+	}
 	root := buildTree()
 	tokens := splitTokens(strings.Join(args, " "))
 	return dispatch(ctx, root, client, tokens)
@@ -29,10 +29,11 @@ func TestExpandPathsRoot(t *testing.T) {
 		"watch events <opt>",
 		"config check",
 		"show vpp info",
-		"show vpp lbstate",
+		"show vpp lb state",
+		"show vpp lb counters",
 		"config reload",
-		"sync vpp lbstate",
-		"sync vpp lbstate <name>",
+		"sync vpp lb state",
+		"sync vpp lb state <name>",
 		"quit",
 		"exit",
 	}
@@ -63,9 +64,10 @@ func TestExpandPathsShow(t *testing.T) {
 		}
 	}
 	// version, frontends, frontends <name>, backends, backends <name>,
-	// healthchecks, healthchecks <name>, vpp info, vpp lb = 9 lines
-	if len(lines) != 9 {
-		t.Errorf("expected exactly 9 show subcommands, got %d", len(lines))
+	// healthchecks, healthchecks <name>, vpp info, vpp lb state,
+	// vpp lb counters = 10 lines
+	if len(lines) != 10 {
+		t.Errorf("expected exactly 10 show subcommands, got %d", len(lines))
 	}
 }