VPP LB counters, src-ip-sticky, and frontend state aggregation

New feature: per-VIP / per-backend runtime counters
  * New GetVPPLBCounters RPC serving an in-process snapshot refreshed
    by a 5s scrape loop (internal/vpp/lbstats.go). Each cycle pulls
    the LB plugin's four SimpleCounters (next, first, untracked,
    no-server) plus the FIB /net/route/to CombinedCounter for every
    VIP and every backend host prefix via a single DumpStats call.
  * FIB stats-index discovery via ip_route_lookup
    (internal/vpp/fibstats.go); per-worker reduction happens in the
    collector.
  * Prometheus collector exports vip_packets_total (kind label),
    vip_route_{packets,bytes}_total, and
    backend_route_{packets,bytes}_total. Metrics source interface
    extended with VIPStats / BackendRouteStats; vpp.Client publishes
    snapshots via atomic.Pointer and clears them on disconnect.
  * New 'show vpp lb counters' CLI command. The old 'show vpp lbstate'
    and 'sync vpp lbstate' commands are restructured as 'show vpp lb
    {state,counters}' and 'sync vpp lb state' to make room for the
    new subcommand.
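
  Illustrative only (not the actual internal/vpp/lbstats.go code; the
  type and field names here are invented): a minimal Go sketch of the
  snapshot pattern described above, a ticker-driven scrape loop that
  publishes the latest counters through an atomic.Pointer and clears
  them on shutdown, so RPC readers never block the scraper.

      package main

      import (
          "context"
          "fmt"
          "sync/atomic"
          "time"
      )

      // vipCounters is a hypothetical stand-in for the per-VIP counter row
      // carried in the real snapshot (LB plugin counters plus FIB packets/bytes).
      type vipCounters struct {
          Prefix               string
          First, Next          uint64
          Untracked, NoServer  uint64
          FibPackets, FibBytes uint64
      }

      // counterCache holds the most recent scrape result.
      type counterCache struct {
          snap atomic.Pointer[[]vipCounters]
      }

      // run refreshes the snapshot every interval until ctx is cancelled.
      // scrape stands in for the DumpStats call against the VPP stats segment.
      func (c *counterCache) run(ctx context.Context, interval time.Duration, scrape func() []vipCounters) {
          t := time.NewTicker(interval)
          defer t.Stop()
          for {
              select {
              case <-ctx.Done():
                  c.snap.Store(nil) // cleared on disconnect/shutdown
                  return
              case <-t.C:
                  s := scrape()
                  c.snap.Store(&s)
              }
          }
      }

      // VIPStats returns the latest snapshot, or nil when no scrape has
      // completed yet (the RPC handler turns that into an empty response).
      func (c *counterCache) VIPStats() []vipCounters {
          if p := c.snap.Load(); p != nil {
              return *p
          }
          return nil
      }

      func main() {
          ctx, cancel := context.WithTimeout(context.Background(), 3*time.Second)
          defer cancel()
          var cache counterCache
          go cache.run(ctx, time.Second, func() []vipCounters {
              return []vipCounters{{Prefix: "192.0.2.1/32", Next: 42, FibPackets: 100, FibBytes: 6400}}
          })
          time.Sleep(1500 * time.Millisecond)
          fmt.Println(cache.VIPStats())
      }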

New feature: src-ip-sticky frontends
  * New frontend YAML key 'src-ip-sticky' (bool). Plumbed through
    config.Frontend, desiredVIP, and the lb_add_del_vip_v2 call.
  * Reflected in gRPC FrontendInfo.src_ip_sticky and
    VPPLBVIP.src_ip_sticky, and shown in 'show vpp lb state' output.
  * Scraped back from VPP by parsing 'show lb vips verbose' through
    cli_inband — lb_vip_details does not expose the flag. The same
    scrape also recovers the LB pool index for each VIP, which the
    stats-segment counters are keyed on. This is a documented
    temporary workaround until VPP ships an lb_vip_v2_dump.
  * src_ip_sticky cannot be mutated on a live VIP, so a flipped flag
    triggers a tear-down-and-recreate in reconcileVIP (ASes deleted
    with flush, VIP deleted, then re-added). Flip is logged.
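
  Illustrative sketch (hypothetical types, not the real reconcileVIP)
  of the recreate decision: since VPP cannot mutate src_ip_sticky in
  place, a flipped flag forces a delete-and-re-add of the VIP.

      package main

      import "fmt"

      // vipSpec is a hypothetical, reduced view of a VIP; the real reconcileVIP
      // compares many more fields (encap, port, flow-table length, ...).
      type vipSpec struct {
          Prefix      string
          SrcIPSticky bool
      }

      // needsRecreate reports whether the VIP has to be torn down and re-added:
      // VPP has no API to mutate src_ip_sticky on an existing VIP, so a flipped
      // flag cannot be reconciled in place.
      func needsRecreate(desired, actual vipSpec) bool {
          return desired.SrcIPSticky != actual.SrcIPSticky
      }

      func main() {
          desired := vipSpec{Prefix: "2001:db8::1/128", SrcIPSticky: true}
          actual := vipSpec{Prefix: "2001:db8::1/128", SrcIPSticky: false}
          if needsRecreate(desired, actual) {
              // Order matters: delete all application servers with flush first,
              // then delete the VIP, then re-add it with the new flag.
              fmt.Println("recreate VIP:", desired.Prefix)
          }
      }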

New feature: frontend state aggregation and events
  * New health.FrontendState (unknown/up/down) and FrontendTransition
    types. A frontend is 'up' iff at least one backend has a nonzero
    effective weight, 'unknown' iff no backend has real state yet,
    and 'down' otherwise.
  * Checker tracks per-frontend aggregate state, recomputing after
    each backend transition and emitting a frontend-transition Event
    on change. Reload drops entries for removed frontends.
  * checker.Event gains an optional FrontendTransition pointer;
    backend- vs. frontend-transition events are demultiplexed on
    that field.
  * WatchEvents now sends an initial snapshot of frontend state on
    connect (mirroring the existing backend snapshot), subscribes
    once to the checker stream, and fans out to backend/frontend
    handlers based on the client's filter flags. The proto
    FrontendEvent message grows name + transition fields.
  * New Checker.FrontendState accessor.
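
  Illustrative sketch (local stand-in types, not the real checker
  package) of the demultiplexing rule: a consumer of the single event
  stream treats an Event with a non-nil FrontendTransition as a
  frontend transition and everything else as a backend transition.

      package main

      import (
          "fmt"
          "time"
      )

      // Stand-ins for checker.Event and health.FrontendTransition, reduced
      // to the fields needed to show the dispatch rule.
      type frontendTransition struct {
          From, To string
          At       time.Time
      }

      type event struct {
          FrontendName       string
          BackendName        string
          FrontendTransition *frontendTransition
      }

      // dispatch demultiplexes the checker stream: a non-nil FrontendTransition
      // marks a frontend-transition event, anything else is a backend transition.
      func dispatch(ev event) {
          if ev.FrontendTransition != nil {
              fmt.Printf("frontend %s: %s -> %s\n",
                  ev.FrontendName, ev.FrontendTransition.From, ev.FrontendTransition.To)
              return
          }
          fmt.Printf("backend %s (via frontend %s) changed state\n",
              ev.BackendName, ev.FrontendName)
      }

      func main() {
          dispatch(event{FrontendName: "imap", BackendName: "mail1"})
          dispatch(event{
              FrontendName:       "imap",
              FrontendTransition: &frontendTransition{From: "unknown", To: "up", At: time.Now()},
          })
      }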

Refactor: pure health helpers
  * Moved the priority-failover selector and the (pool idx, active
    pool, state, cfg weight) → (vpp weight, flush) mapping out of
    internal/vpp/lbsync.go into a new internal/health/weights.go so
    the checker can reuse them for frontend-state computation
    without importing internal/vpp.
  * New functions: health.ActivePoolIndex, BackendEffectiveWeight,
    EffectiveWeights, ComputeFrontendState. lbsync.go now calls
    these directly; vpp.EffectiveWeights is a thin wrapper over
    health.EffectiveWeights retained for the gRPC observability
    path. Fully unit-tested in internal/health/weights_test.go.
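
  Usage sketch of the new helpers, written as it would sit inside the
  module next to internal/health/weights_test.go. The signatures match
  the weights.go diff below; the frontend and backend names are
  invented for illustration.

      package health_test

      import (
          "fmt"

          "git.ipng.ch/ipng/vpp-maglev/internal/config"
          "git.ipng.ch/ipng/vpp-maglev/internal/health"
      )

      func ExampleEffectiveWeights() {
          fe := config.Frontend{Pools: []config.Pool{
              {Name: "primary", Backends: map[string]config.PoolBackend{
                  "mail1": {Weight: 100}, "mail2": {Weight: 50},
              }},
              {Name: "fallback", Backends: map[string]config.PoolBackend{
                  "mail3": {Weight: 100},
              }},
          }}
          states := map[string]health.State{
              "mail1": health.StateDown,
              "mail2": health.StateDown,
              "mail3": health.StateUp,
          }
          // The primary pool has no up backend, so the fallback pool is active
          // and mail3 gets its configured weight; the frontend is therefore up.
          fmt.Println(health.ActivePoolIndex(fe, states))     // 1
          fmt.Println(health.EffectiveWeights(fe, states)[1]) // map[mail3:100]
          fmt.Println(health.ComputeFrontendState(fe, states)) // up
      }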

maglevc polish
  * --color default is now mode-aware: on in the interactive shell,
    off in one-shot mode so piped output is script-safe. Explicit
    --color=true/false still overrides.
  * New stripHostMask helper drops /32 and /128 from VIP display;
    non-host prefixes pass through unchanged.
  * Counter table column order fixed (first before next) and
    packets/bytes columns renamed to fib-packets/fib-bytes to
    clarify they come from the FIB, not the LB plugin.
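
  Condensed standalone illustration of the flag.Visit pattern used for
  the mode-aware --color default (the actual wiring is in the
  cmd/maglevc diff below); flag names and messages here are only for
  the demo.

      package main

      import (
          "flag"
          "fmt"
      )

      func main() {
          color := flag.Bool("color", true, "colorize output")
          flag.Parse()

          // flag.Visit walks only flags that were set on the command line,
          // which is how an explicit -color=... is told apart from the default.
          explicit := false
          flag.Visit(func(f *flag.Flag) {
              if f.Name == "color" {
                  explicit = true
              }
          })

          interactive := flag.NArg() == 0 // no command given: interactive shell
          enabled := *color
          if !explicit {
              enabled = interactive // mode-aware default: shell on, one-shot off
          }
          fmt.Println("color enabled:", enabled)
      }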

Docs
  * config-guide: document src-ip-sticky, including the VIP
    recreate-on-change caveat.
  * user-guide, maglevc.1, maglevd.8: updated command tree, new
    counters command, color defaults, and the src-ip-sticky field.

commit fb62532fd5
parent d5fbf5c640
2026-04-12 15:59:02 +02:00
25 changed files with 2163 additions and 549 deletions

View File

@@ -71,13 +71,19 @@ func buildTree() *Node {
Children: []*Node{showHealthCheckName}, Children: []*Node{showHealthCheckName},
} }
// show vpp info / lbstate // show vpp info / lb state / lb counters
showVPPInfo := &Node{Word: "info", Help: "Show VPP version, uptime, and connection status", Run: runShowVPPInfo} showVPPInfo := &Node{Word: "info", Help: "Show VPP version, uptime, and connection status", Run: runShowVPPInfo}
showVPPLBState := &Node{Word: "lbstate", Help: "Show VPP load-balancer state (VIPs and application servers)", Run: runShowVPPLBState} showVPPLBState := &Node{Word: "state", Help: "Show VPP load-balancer state (VIPs and application servers)", Run: runShowVPPLBState}
showVPPLBCounters := &Node{Word: "counters", Help: "Show VPP per-VIP and per-backend packet/byte counters (refreshed every ~5s server-side)", Run: runShowVPPLBCounters}
showVPPLB := &Node{
Word: "lb",
Help: "VPP load-balancer information",
Children: []*Node{showVPPLBState, showVPPLBCounters},
}
showVPP := &Node{ showVPP := &Node{
Word: "vpp", Word: "vpp",
Help: "VPP dataplane information", Help: "VPP dataplane information",
Children: []*Node{showVPPInfo, showVPPLBState}, Children: []*Node{showVPPInfo, showVPPLB},
} }
show.Children = []*Node{ show.Children = []*Node{
@@ -175,7 +181,7 @@ func buildTree() *Node {
Children: []*Node{configCheck, configReload}, Children: []*Node{configCheck, configReload},
} }
// sync vpp lbstate [<name>] // sync vpp lb state [<name>]
// //
// Without a name: run SyncLBStateAll (may remove stale VIPs). // Without a name: run SyncLBStateAll (may remove stale VIPs).
// With a name: run SyncLBStateVIP(name) for just that frontend (no removals). // With a name: run SyncLBStateVIP(name) for just that frontend (no removals).
@@ -186,15 +192,20 @@ func buildTree() *Node {
Run: runSyncVPPLBState, Run: runSyncVPPLBState,
} }
syncVPPLBState := &Node{ syncVPPLBState := &Node{
Word: "lbstate", Word: "state",
Help: "Sync the VPP load-balancer dataplane from the running config", Help: "Sync the VPP load-balancer dataplane from the running config",
Run: runSyncVPPLBState, Run: runSyncVPPLBState,
Children: []*Node{syncVPPLBStateName}, Children: []*Node{syncVPPLBStateName},
} }
syncVPPLB := &Node{
Word: "lb",
Help: "VPP load-balancer sync commands",
Children: []*Node{syncVPPLBState},
}
syncVPP := &Node{ syncVPP := &Node{
Word: "vpp", Word: "vpp",
Help: "VPP dataplane sync commands", Help: "VPP dataplane sync commands",
Children: []*Node{syncVPPLBState}, Children: []*Node{syncVPPLB},
} }
syncNode := &Node{ syncNode := &Node{
Word: "sync", Word: "sync",
@@ -295,10 +306,11 @@ func runShowVPPLBState(ctx context.Context, client grpcapi.MaglevClient, _ []str
for _, v := range state.Vips { for _, v := range state.Vips {
fmt.Println() fmt.Println()
w := tabwriter.NewWriter(os.Stdout, 0, 0, 2, ' ', 0) w := tabwriter.NewWriter(os.Stdout, 0, 0, 2, ' ', 0)
fmt.Fprintf(w, "%s\t%s\n", label("vip"), v.Prefix) fmt.Fprintf(w, "%s\t%s\n", label("vip"), stripHostMask(v.Prefix))
fmt.Fprintf(w, " %s\t%s\n", label("protocol"), protoString(v.Protocol)) fmt.Fprintf(w, " %s\t%s\n", label("protocol"), protoString(v.Protocol))
fmt.Fprintf(w, " %s\t%d\n", label("port"), v.Port) fmt.Fprintf(w, " %s\t%d\n", label("port"), v.Port)
fmt.Fprintf(w, " %s\t%s\n", label("encap"), v.Encap) fmt.Fprintf(w, " %s\t%s\n", label("encap"), v.Encap)
fmt.Fprintf(w, " %s\t%t\n", label("src-ip-sticky"), v.SrcIpSticky)
fmt.Fprintf(w, " %s\t%d\n", label("flow-table-length"), v.FlowTableLength) fmt.Fprintf(w, " %s\t%d\n", label("flow-table-length"), v.FlowTableLength)
fmt.Fprintf(w, " %s\t%d\n", label("application-servers"), len(v.ApplicationServers)) fmt.Fprintf(w, " %s\t%d\n", label("application-servers"), len(v.ApplicationServers))
if err := w.Flush(); err != nil { if err := w.Flush(); err != nil {
@@ -314,6 +326,80 @@ func runShowVPPLBState(ctx context.Context, client grpcapi.MaglevClient, _ []str
return nil return nil
} }
// runShowVPPLBCounters prints the per-VIP and per-backend runtime counters
// captured by maglevd's 5s scrape loop. Values are up to 5 seconds stale;
// Prometheus is the right tool if you need live rates.
func runShowVPPLBCounters(ctx context.Context, client grpcapi.MaglevClient, _ []string) error {
ctx, cancel := context.WithTimeout(ctx, callTimeout)
defer cancel()
resp, err := client.GetVPPLBCounters(ctx, &grpcapi.GetVPPLBCountersRequest{})
if err != nil {
return err
}
if len(resp.Vips) == 0 && len(resp.Backends) == 0 {
fmt.Println("(no counters — VPP disconnected or scrape pending)")
return nil
}
// ---- frontend-counters ----
fmt.Println(label("frontend-counters"))
if len(resp.Vips) == 0 {
fmt.Println(" (none)")
} else {
w := tabwriter.NewWriter(os.Stdout, 0, 0, 2, ' ', 0)
fmt.Fprintf(w, " %s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n",
label("vip"), label("proto"), label("port"),
label("first"), label("next"),
label("untracked"), label("no-server"),
label("fib-packets"), label("fib-bytes"),
)
for _, v := range resp.Vips {
fmt.Fprintf(w, " %s\t%s\t%d\t%d\t%d\t%d\t%d\t%d\t%d\n",
stripHostMask(v.Prefix), v.Protocol, v.Port,
v.FirstPacket, v.NextPacket,
v.UntrackedPacket, v.NoServer,
v.Packets, v.Bytes,
)
}
if err := w.Flush(); err != nil {
return err
}
}
fmt.Println()
// ---- backend-counters ----
fmt.Println(label("backend-counters"))
if len(resp.Backends) == 0 {
fmt.Println(" (none)")
return nil
}
w := tabwriter.NewWriter(os.Stdout, 0, 0, 2, ' ', 0)
fmt.Fprintf(w, " %s\t%s\t%s\t%s\n",
label("backend"), label("address"),
label("fib-packets"), label("fib-bytes"),
)
for _, b := range resp.Backends {
fmt.Fprintf(w, " %s\t%s\t%d\t%d\n",
b.Backend, b.Address, b.Packets, b.Bytes,
)
}
return w.Flush()
}
// stripHostMask trims "/32" (IPv4) or "/128" (IPv6) from a VIP's CIDR
// string. maglevd only programs host-prefix VIPs so the mask is always
// one of these two values and carries no information for a human reader.
// Non-host prefixes and unparseable strings are returned unchanged so
// future changes don't silently lose data.
func stripHostMask(prefix string) string {
if strings.HasSuffix(prefix, "/32") || strings.HasSuffix(prefix, "/128") {
return prefix[:strings.LastIndexByte(prefix, '/')]
}
return prefix
}
// protoString renders an IP protocol number as a name (tcp, udp, any, or numeric). // protoString renders an IP protocol number as a name (tcp, udp, any, or numeric).
func protoString(p uint32) string { func protoString(p uint32) string {
switch p { switch p {
@@ -385,6 +471,7 @@ func runShowFrontend(ctx context.Context, client grpcapi.MaglevClient, args []st
fmt.Fprintf(w, "%s\t%s\n", label("address"), info.Address) fmt.Fprintf(w, "%s\t%s\n", label("address"), info.Address)
fmt.Fprintf(w, "%s\t%s\n", label("protocol"), info.Protocol) fmt.Fprintf(w, "%s\t%s\n", label("protocol"), info.Protocol)
fmt.Fprintf(w, "%s\t%d\n", label("port"), info.Port) fmt.Fprintf(w, "%s\t%d\n", label("port"), info.Port)
fmt.Fprintf(w, "%s\t%t\n", label("src-ip-sticky"), info.SrcIpSticky)
if info.Description != "" { if info.Description != "" {
fmt.Fprintf(w, "%s\t%s\n", label("description"), info.Description) fmt.Fprintf(w, "%s\t%s\n", label("description"), info.Description)
} }

View File

@@ -25,9 +25,18 @@ func main() {
func run() error { func run() error {
serverAddr := flag.String("server", "localhost:9090", "maglev server address") serverAddr := flag.String("server", "localhost:9090", "maglev server address")
color := flag.Bool("color", true, "colorize static labels in output") color := flag.Bool("color", true, "colorize static labels in output (defaults to false in one-shot mode)")
flag.Parse() flag.Parse()
colorEnabled = *color
// Detect whether -color was explicitly set so we can pick a
// mode-aware default: color is useful in the interactive shell but
// noise (ANSI escapes) when piping one-shot output into scripts.
colorExplicit := false
flag.Visit(func(f *flag.Flag) {
if f.Name == "color" {
colorExplicit = true
}
})
conn, err := grpc.NewClient(*serverAddr, conn, err := grpc.NewClient(*serverAddr,
grpc.WithTransportCredentials(insecure.NewCredentials())) grpc.WithTransportCredentials(insecure.NewCredentials()))
@@ -41,13 +50,25 @@ func run() error {
args := flag.Args() args := flag.Args()
if len(args) == 0 { if len(args) == 0 {
// Interactive shell: announce version on startup. // Interactive shell: color defaults to true.
if colorExplicit {
colorEnabled = *color
} else {
colorEnabled = true
}
fmt.Printf("maglevc %s (commit %s, built %s)\n", fmt.Printf("maglevc %s (commit %s, built %s)\n",
buildinfo.Version(), buildinfo.Commit(), buildinfo.Date()) buildinfo.Version(), buildinfo.Commit(), buildinfo.Date())
return runShell(ctx, client) return runShell(ctx, client)
} }
// One-shot command from CLI arguments. // One-shot command from CLI arguments: color defaults to false so
// output is script-safe. Operators wanting color can still pass
// -color=true explicitly.
if colorExplicit {
colorEnabled = *color
} else {
colorEnabled = false
}
root := buildTree() root := buildTree()
tokens := splitTokens(strings.Join(args, " ")) tokens := splitTokens(strings.Join(args, " "))
return dispatch(ctx, root, client, tokens) return dispatch(ctx, root, client, tokens)

View File

@@ -29,10 +29,11 @@ func TestExpandPathsRoot(t *testing.T) {
"watch events <opt>", "watch events <opt>",
"config check", "config check",
"show vpp info", "show vpp info",
"show vpp lbstate", "show vpp lb state",
"show vpp lb counters",
"config reload", "config reload",
"sync vpp lbstate", "sync vpp lb state",
"sync vpp lbstate <name>", "sync vpp lb state <name>",
"quit", "quit",
"exit", "exit",
} }
@@ -63,9 +64,10 @@ func TestExpandPathsShow(t *testing.T) {
} }
} }
// version, frontends, frontends <name>, backends, backends <name>, // version, frontends, frontends <name>, backends, backends <name>,
// healthchecks, healthchecks <name>, vpp info, vpp lb = 9 lines // healthchecks, healthchecks <name>, vpp info, vpp lb state,
if len(lines) != 9 { // vpp lb counters = 10 lines
t.Errorf("expected exactly 9 show subcommands, got %d", len(lines)) if len(lines) != 10 {
t.Errorf("expected exactly 10 show subcommands, got %d", len(lines))
} }
} }

View File

@@ -288,6 +288,15 @@ ordered list of backend pools. The gRPC API exposes frontends by name.
cascades across further tiers. See [healthchecks.md](healthchecks.md#pool-failover) cascades across further tiers. See [healthchecks.md](healthchecks.md#pool-failover)
for the full failover semantics. All backends across all pools in a frontend must for the full failover semantics. All backends across all pools in a frontend must
have addresses of the same address family (all IPv4 or all IPv6). have addresses of the same address family (all IPv4 or all IPv6).
* ***src-ip-sticky***: Boolean, default `false`. When `true`, the VPP load-balancer
programs this VIP with source-IP-based stickiness — all flows from the same client
source IP hash to the same backend (subject to the Maglev consistent-hash bucket
assignment). Use this for protocols that require session affinity at the L3 level,
or when clients open many short flows that should land on one backend. Changing this
field in a running config and reloading causes maglevd to tear down the VIP (all
application servers are deleted with flush, then the VIP itself is deleted) and
recreate it with the new value; VPP has no API to mutate `src_ip_sticky` on an
existing VIP, and existing flow state cannot be preserved across the flip.
Each pool has: Each pool has:
@@ -324,6 +333,7 @@ frontends:
address: 2001:db8::1 address: 2001:db8::1
protocol: tcp protocol: tcp
port: 993 port: 993
src-ip-sticky: true
pools: pools:
- name: primary - name: primary
backends: backends:

View File

@@ -12,15 +12,21 @@ is an interactive CLI client for
.BR maglevd (8). .BR maglevd (8).
Without arguments it opens a readline shell with tab completion and Without arguments it opens a readline shell with tab completion and
inline help. inline help.
A command may also be passed directly on the command line for one\-shot use, A command may also be passed directly on the command line for one\-shot
which is useful for scripting (use use, which is useful for scripting; in that mode ANSI color is
.B \-color=false disabled by default so the output is script\-safe. Pass
to suppress ANSI codes). .B \-color=true
explicitly if you want color in one\-shot mode.
.PP .PP
When the shell starts it prints the build version and connects to the When the shell starts it prints the build version and connects to the
.B maglevd .B maglevd
gRPC server specified by gRPC server specified by
.BR \-server . .BR \-server .
All static tokens support tab completion; dynamic names (frontend,
backend, health\-check names) are completed by querying the server.
Type
.B ?
at any point to list completions without advancing the input.
.SH OPTIONS .SH OPTIONS
.TP .TP
.BI \-server " addr" .BI \-server " addr"
@@ -31,81 +37,59 @@ gRPC server.
.IR localhost:9090 ) .IR localhost:9090 )
.TP .TP
.BR \-color [=\fIbool\fR] .BR \-color [=\fIbool\fR]
Colorize static field labels in output using ANSI dark blue. Colorize static field labels in output using ANSI dark blue. The
(default: true) default is mode\-aware: enabled (true) in the interactive shell, and
Pass disabled (false) in one\-shot mode so that output piped into scripts
or files stays free of escape codes. Pass
.B \-color=true
or
.B \-color=false .B \-color=false
to disable, e.g.\& when piping output. explicitly to override the default for either mode.
.SH COMMANDS
Commands are entered at the
.B maglevc>
prompt or passed as arguments on the command line.
All static tokens support tab completion; dynamic names (frontend, backend,
health\-check names) are completed by querying the server.
Type
.B ?
at any point to list completions without advancing the input.
.SS Show commands
.TP
.B show version
Print build version, commit hash, and build date.
.TP
.B show frontends
List all configured frontends.
.TP
.BI "show frontend " name
Show address, protocol, port, description, and pools (with weights and
disabled\-backend notation) for the named frontend.
.TP
.B show backends
List all active backends.
.TP
.BI "show backend " name
Show address, current health state (with duration), enabled flag,
health\-check name, and recent state transitions with timestamps.
.TP
.B show healthchecks
List all configured health checks.
.TP
.BI "show healthcheck " name
Show the full configuration of the named health check.
.SS Set commands
.TP
.BI "set backend " "name " pause
Pause health checking for a backend, freezing its state.
.TP
.BI "set backend " "name " resume
Resume health checking for a backend; state resets to
.BR unknown .
.SS Shell commands
.TP
.BR quit ", " exit
Exit the interactive shell.
.SH COMPLETION
In interactive mode, press
.B Tab
to complete the current token.
If more than one completion is possible, all candidates are listed.
Type
.B ?
anywhere on the line to list candidates at that position without consuming
the character or advancing the cursor.
.SH EXAMPLES .SH EXAMPLES
One\-shot query (no color, suitable for scripts): Open the interactive shell (no command on the command line). Tab
completes the current token; typing
.B ?
lists candidates at the cursor.
.B quit
or
.B exit
(or Ctrl\-D) leaves the shell:
.PP .PP
.RS .RS
.EX .EX
maglevc \-color=false show backends $ maglevc
maglevc> show frontends
\&...
maglevc> quit
.EE .EE
.RE .RE
.PP .PP
Interactive session: One\-shot query passed on the command line (color is off by default
in this mode so the output is script\-safe):
.PP .PP
.RS .RS
.EX .EX
maglevc \-server 10.0.0.1:9090 $ maglevc show frontends
.EE .EE
.RE .RE
.PP
Query VPP version and connection status, forcing color on:
.PP
.RS
.EX
$ maglevc \-color=true show vpp info
.EE
.RE
.SH "FULL DOCUMENTATION"
This manpage documents only the invocation of
.BR maglevc .
For the complete command reference — every
.BR show ", " set ", " sync ", " config ", and " watch
command, with examples and operational notes — see the user guide at:
.PP
.RS
https://git.ipng.ch/ipng/vpp-maglev/docs/user-guide.md
.RE
.SH SEE ALSO .SH SEE ALSO
.BR maglevd (8) .BR maglevd (8)
.SH AUTHOR .SH AUTHOR

View File

@@ -1,6 +1,6 @@
.TH MAGLEVD 8 "April 2026" "vpp\-maglev" "System Administration" .TH MAGLEVD 8 "April 2026" "vpp\-maglev" "System Administration"
.SH NAME .SH NAME
maglevd \- Maglev health\-checker daemon maglevd \- Maglev health\-checker daemon and VPP load\-balancer controller
.SH SYNOPSIS .SH SYNOPSIS
.B maglevd .B maglevd
[\fB\-config\fR \fIfile\fR] [\fB\-config\fR \fIfile\fR]
@@ -10,23 +10,44 @@ maglevd \- Maglev health\-checker daemon
[\fB\-version\fR] [\fB\-version\fR]
.SH DESCRIPTION .SH DESCRIPTION
.B maglevd .B maglevd
is a health\-checker daemon that monitors backends (HTTP, TCP, ICMP) and has two responsibilities:
exposes their aggregated state via a gRPC API. .IP \(bu 2
Configuration is loaded from a YAML file. It monitors backends with active health checks (HTTP, TCP, ICMP) and
A running daemon reloads its configuration when it receives aggregates the results into a state machine per backend. Probe
.BR SIGHUP . intervals adapt automatically: a faster interval is used while a
backend is degraded, and a slower one once it has been continuously
down. Operators can also
.B pause/resume
or
.B disable/enable
a backend out of band.
.IP \(bu 2
It programs the VPP dataplane via the
.B lb
(load\-balancer) plugin. Each frontend in the config becomes a VPP
load\-balancer VIP; each healthy backend becomes an application server
under its frontend's VIPs. State transitions drive
application\-server weight updates over the VPP binary API so that
unhealthy backends stop receiving new flows. Drift between the
running config and VPP's view is reconciled periodically
.RB ( maglev.vpp.lb.sync-interval ,
default 30s), on
.B SIGHUP
reloads, and on operator request via
.BR maglevc .
.PP .PP
Backends are tracked with a rise/fall counter model. The aggregated backend state, VPP dataplane state, and per\-VIP /
Each backend cycles through the states per\-backend stats\-segment counters are exposed via a gRPC API (and
.BR unknown , scraped into Prometheus when the
.BR up , .B /metrics
.BR down , endpoint is enabled).
and See
.B paused .BR maglevc (1)
(operator\-set). for the interactive CLI client.
Health\-check intervals adapt automatically: a faster interval is used when .PP
a backend is not fully healthy, and a slower interval when it has been Configuration is loaded from a YAML file. A running daemon reloads
continuously down. its configuration when it receives
.BR SIGHUP .
.SH OPTIONS .SH OPTIONS
Each flag may also be supplied via an environment variable (shown in Each flag may also be supplied via an environment variable (shown in
parentheses); the flag takes precedence. parentheses); the flag takes precedence.
@@ -63,12 +84,16 @@ Print version, commit hash, and build date, then exit.
.SH SIGNALS .SH SIGNALS
.TP .TP
.B SIGHUP .B SIGHUP
Reload the configuration file without restarting. Reload the configuration file without restarting. New backends are
New backends are added, removed backends are stopped, and unchanged added, removed backends are stopped, and unchanged backend workers
backend workers are left running. keep running. After the reload, the VPP dataplane is reconciled so
added/removed frontends and application servers are programmed
immediately.
.TP .TP
.BR SIGTERM ", " SIGINT .BR SIGTERM ", " SIGINT
Gracefully shut down: drain active gRPC streams, then exit. Gracefully shut down: drain active gRPC streams, then exit. VPP
dataplane state is left in place so that existing VIPs continue to
forward traffic during a restart.
.SH FILES .SH FILES
.TP .TP
.I /etc/vpp-maglev/maglev.yaml .I /etc/vpp-maglev/maglev.yaml
@@ -77,20 +102,34 @@ Default configuration file (YAML).
.I /etc/default/vpp-maglev .I /etc/default/vpp-maglev
Environment file sourced by the systemd unit before starting Environment file sourced by the systemd unit before starting
.BR maglevd . .BR maglevd .
.SH CONFIGURATION .TP
The configuration file uses YAML and has four top\-level sections under the .I /run/vpp/api.sock
.B maglev VPP binary\-API socket
key: .BR maglevd
.BR healthchecker , connects to when programming the
.BR healthchecks , .B lb
.BR backends , plugin.
and .TP
.BR frontends . .I /run/vpp/stats.sock
VPP stats\-segment socket used to scrape per\-VIP and per\-backend
packet/byte counters.
.SH "FULL DOCUMENTATION"
This manpage documents only the invocation of
.BR maglevd .
For the configuration file reference (including the
.B maglev.vpp.lb
section controlling the VPP integration), the health\-check state
machine, and the full operational guide, see:
.PP .PP
See the example at .RS
.I /etc/vpp-maglev/maglev.yaml https://git.ipng.ch/ipng/vpp-maglev/docs/config-guide.md
and the full reference in the project documentation. .br
https://git.ipng.ch/ipng/vpp-maglev/docs/healthchecks.md
.br
https://git.ipng.ch/ipng/vpp-maglev/docs/user-guide.md
.RE
.SH SEE ALSO .SH SEE ALSO
.BR maglevc (1) .BR maglevc (1),
.BR vpp (8)
.SH AUTHOR .SH AUTHOR
Pim van Pelt <pim@ipng.ch> Pim van Pelt <pim@ipng.ch>

View File

@@ -100,11 +100,12 @@ maglevc [--server host:port] [--color[=bool]] [command...]
| Flag | Default | Description | | Flag | Default | Description |
|---|---|---| |---|---|---|
| `--server` | `localhost:9090` | Address of the `maglevd` gRPC server. | | `--server` | `localhost:9090` | Address of the `maglevd` gRPC server. |
| `--color` | `true` | Colorize static field labels in output (dark blue ANSI). Pass `--color=false` to disable, e.g. when piping. | | `--color` | mode-aware | Colorize static field labels (dark blue ANSI). Defaults to `true` in the interactive shell and `false` in one-shot mode, so output piped into scripts stays free of escape codes. Pass `--color=true` or `--color=false` explicitly to override either default. |
When `command` arguments are supplied the command is executed and `maglevc` When `command` arguments are supplied the command is executed and `maglevc`
exits. When no arguments are given an interactive shell is started and the exits; in this mode ANSI color is off by default so the output is script-safe.
build version is printed on entry. When no arguments are given an interactive shell is started, the build version
is printed on entry, and color is on by default.
### Commands ### Commands
@@ -112,9 +113,9 @@ build version is printed on entry.
show version Print build version, commit hash, and build date. show version Print build version, commit hash, and build date.
show frontends [<name>] Without name: list all frontend names. show frontends [<name>] Without name: list all frontend names.
With name: show address, protocol, port, description, With name: show address, protocol, port, src-ip-sticky,
and pools. Each pool lists its backends with two description, and pools. Each pool lists its backends
weight columns: with two weight columns:
weight — configured weight from the YAML weight — configured weight from the YAML
effective — state-aware weight after pool failover effective — state-aware weight after pool failover
(what gets programmed into VPP) (what gets programmed into VPP)
@@ -131,12 +132,20 @@ show healthchecks [<name>] Without name: list all health-check names.
show vpp info Show VPP version, build date, PID, uptime, and when show vpp info Show VPP version, build date, PID, uptime, and when
maglevd connected. Returns an error if VPP is not maglevd connected. Returns an error if VPP is not
connected. connected.
show vpp lbstate Show the VPP load-balancer plugin state: global show vpp lb state Show the VPP load-balancer plugin state: global
configuration, configured VIPs, and their attached configuration, configured VIPs, and their attached
application servers (address, weight, bucket count). application servers (address, weight, bucket count).
Returns an error if VPP is not connected. Returns an error if VPP is not connected.
show vpp lb counters Show per-VIP and per-backend packet/byte counters
from the VPP stats segment, refreshed roughly every
five seconds by maglevd. Each VIP row reports the LB
plugin counters (next, first, untracked, no-server)
and the FIB packets/bytes at the VIP's host prefix.
Each backend row reports FIB packets/bytes at the
backend's /32 or /128 prefix. Use Prometheus for
live rates; this command shows absolute values.
sync vpp lbstate [<name>] Reconcile the VPP load-balancer dataplane from the sync vpp lb state [<name>] Reconcile the VPP load-balancer dataplane from the
running config. Without a name: runs a full sync — running config. Without a name: runs a full sync —
creates missing VIPs, removes stale VIPs, and adjusts creates missing VIPs, removes stale VIPs, and adjusts
application-server membership and weights across all application-server membership and weights across all

View File

@@ -23,13 +23,26 @@ type BackendSnapshot struct {
Config config.Backend Config config.Backend
} }
// Event is emitted on every backend state transition, once per frontend that // Event is emitted on every state transition the checker observes. There are
// references the backend. // two kinds, distinguished by which of BackendName or FrontendTransition is
// populated:
//
// - Backend transition: FrontendName is the frontend that references the
// backend (one event per frontend per backend transition), BackendName
// and Backend are set, and Transition carries the health.Transition.
// FrontendTransition is nil.
// - Frontend transition: FrontendName is the frontend whose aggregate state
// changed, FrontendTransition is non-nil. BackendName and Backend are
// empty, Transition is the zero value.
//
// Consumers dispatch on FrontendTransition != nil.
type Event struct { type Event struct {
FrontendName string FrontendName string
BackendName string BackendName string
Backend net.IP Backend net.IP
Transition health.Transition Transition health.Transition
FrontendTransition *health.FrontendTransition
} }
type worker struct { type worker struct {
@@ -49,6 +62,13 @@ type Checker struct {
mu sync.RWMutex mu sync.RWMutex
workers map[string]*worker // keyed by backend name workers map[string]*worker // keyed by backend name
// frontendStates tracks the aggregated state of every configured frontend
// (unknown/up/down). Updated whenever a backend transition happens; a
// change emits a frontend-transition Event. The zero value for a missing
// key is FrontendStateUnknown, so initial-reference accesses behave
// correctly even without explicit seeding.
frontendStates map[string]health.FrontendState
subsMu sync.Mutex subsMu sync.Mutex
nextID int nextID int
subs map[int]chan Event subs map[int]chan Event
@@ -60,6 +80,7 @@ func New(cfg *config.Config) *Checker {
return &Checker{ return &Checker{
cfg: cfg, cfg: cfg,
workers: make(map[string]*worker), workers: make(map[string]*worker),
frontendStates: make(map[string]health.FrontendState),
subs: make(map[int]chan Event), subs: make(map[int]chan Event),
eventCh: make(chan Event, 256), eventCh: make(chan Event, 256),
} }
@@ -131,6 +152,13 @@ func (c *Checker) Reload(ctx context.Context, cfg *config.Config) error {
c.emitForBackend(name, c.workers[name].backend.Address, c.workers[name].backend.Transitions[0], cfg.Frontends) c.emitForBackend(name, c.workers[name].backend.Address, c.workers[name].backend.Transitions[0], cfg.Frontends)
} }
// Drop frontendStates entries for frontends no longer in config.
for feName := range c.frontendStates {
if _, ok := cfg.Frontends[feName]; !ok {
delete(c.frontendStates, feName)
}
}
c.cfg = cfg c.cfg = cfg
return nil return nil
} }
@@ -174,6 +202,18 @@ func (c *Checker) BackendState(name string) (health.State, bool) {
return w.backend.State, true return w.backend.State, true
} }
// FrontendState returns the current aggregate state of a frontend (unknown,
// up, or down). Returns (FrontendStateUnknown, false) when the frontend is
// not known to the checker.
func (c *Checker) FrontendState(name string) (health.FrontendState, bool) {
c.mu.RLock()
defer c.mu.RUnlock()
if _, ok := c.cfg.Frontends[name]; !ok {
return health.FrontendStateUnknown, false
}
return c.frontendStates[name], true
}
// ListFrontends returns the names of all configured frontends. // ListFrontends returns the names of all configured frontends.
func (c *Checker) ListFrontends() []string { func (c *Checker) ListFrontends() []string {
c.mu.RLock() c.mu.RLock()
@@ -575,24 +615,60 @@ func (c *Checker) runProbe(ctx context.Context, name string, pos, total int) {
} }
} }
// emitForBackend emits one Event per frontend that references backendName // emitForBackend emits one backend-transition Event per frontend that
// (in any pool), using the provided frontends map. Must be called with c.mu held. // references backendName (in any pool), using the provided frontends map.
// After emitting the backend event for a frontend, it also re-computes that
// frontend's aggregate state and emits a frontend-transition Event if the
// state has changed. Must be called with c.mu held.
func (c *Checker) emitForBackend(backendName string, addr net.IP, t health.Transition, frontends map[string]config.Frontend) { func (c *Checker) emitForBackend(backendName string, addr net.IP, t health.Transition, frontends map[string]config.Frontend) {
for feName, fe := range frontends { for feName, fe := range frontends {
emitted := false if !frontendReferencesBackend(fe, backendName) {
for _, pool := range fe.Pools { continue
if emitted {
break
} }
for name := range pool.Backends {
if name == backendName {
c.emit(Event{FrontendName: feName, BackendName: backendName, Backend: addr, Transition: t}) c.emit(Event{FrontendName: feName, BackendName: backendName, Backend: addr, Transition: t})
emitted = true c.updateFrontendState(feName, fe)
break }
}
// frontendReferencesBackend reports whether fe has the named backend in any
// of its pools.
func frontendReferencesBackend(fe config.Frontend, backendName string) bool {
for _, pool := range fe.Pools {
if _, ok := pool.Backends[backendName]; ok {
return true
}
}
return false
}
// updateFrontendState recomputes the aggregate state of fe, compares against
// the last known state, and emits a frontend-transition Event on change.
// Must be called with c.mu held. The current state is read from the worker
// map — so the caller (who already holds c.mu) sees a consistent view.
func (c *Checker) updateFrontendState(feName string, fe config.Frontend) {
states := make(map[string]health.State)
for _, pool := range fe.Pools {
for bName := range pool.Backends {
if w, ok := c.workers[bName]; ok {
states[bName] = w.backend.State
} else {
states[bName] = health.StateUnknown
} }
} }
} }
newState := health.ComputeFrontendState(fe, states)
old := c.frontendStates[feName] // zero value (Unknown) on first access
if old == newState {
return
} }
c.frontendStates[feName] = newState
ft := health.FrontendTransition{From: old, To: newState, At: time.Now()}
slog.Info("frontend-transition",
"frontend", feName,
"from", old.String(),
"to", newState.String(),
)
c.emit(Event{FrontendName: feName, FrontendTransition: &ft})
} }
// emit sends an event to the internal fan-out channel (non-blocking). // emit sends an event to the internal fan-out channel (non-blocking).

View File

@@ -119,6 +119,7 @@ type Frontend struct {
Protocol string // "tcp", "udp", or "" (all traffic) Protocol string // "tcp", "udp", or "" (all traffic)
Port uint16 // 0 means omitted (all ports) Port uint16 // 0 means omitted (all ports)
Pools []Pool // ordered tiers; first pool with any up backend is active Pools []Pool // ordered tiers; first pool with any up backend is active
SrcIPSticky bool // when true, VPP LB uses src-IP-based hashing for this VIP
} }
// ---- raw YAML types -------------------------------------------------------- // ---- raw YAML types --------------------------------------------------------
@@ -199,6 +200,7 @@ type rawFrontend struct {
Protocol string `yaml:"protocol"` Protocol string `yaml:"protocol"`
Port uint16 `yaml:"port"` Port uint16 `yaml:"port"`
Pools []rawPool `yaml:"pools"` Pools []rawPool `yaml:"pools"`
SrcIPSticky bool `yaml:"src-ip-sticky"`
} }
// ---- Check / Load ---------------------------------------------------------- // ---- Check / Load ----------------------------------------------------------
@@ -519,6 +521,7 @@ func convertFrontend(name string, r *rawFrontend, backends map[string]Backend) (
Description: r.Description, Description: r.Description,
Protocol: r.Protocol, Protocol: r.Protocol,
Port: r.Port, Port: r.Port,
SrcIPSticky: r.SrcIPSticky,
} }
ip := net.ParseIP(r.Address) ip := net.ParseIP(r.Address)

File diff suppressed because it is too large

View File

@@ -36,6 +36,7 @@ const (
Maglev_GetVPPInfo_FullMethodName = "/maglev.Maglev/GetVPPInfo" Maglev_GetVPPInfo_FullMethodName = "/maglev.Maglev/GetVPPInfo"
Maglev_GetVPPLBState_FullMethodName = "/maglev.Maglev/GetVPPLBState" Maglev_GetVPPLBState_FullMethodName = "/maglev.Maglev/GetVPPLBState"
Maglev_SyncVPPLBState_FullMethodName = "/maglev.Maglev/SyncVPPLBState" Maglev_SyncVPPLBState_FullMethodName = "/maglev.Maglev/SyncVPPLBState"
Maglev_GetVPPLBCounters_FullMethodName = "/maglev.Maglev/GetVPPLBCounters"
) )
// MaglevClient is the client API for Maglev service. // MaglevClient is the client API for Maglev service.
@@ -61,6 +62,7 @@ type MaglevClient interface {
GetVPPInfo(ctx context.Context, in *GetVPPInfoRequest, opts ...grpc.CallOption) (*VPPInfo, error) GetVPPInfo(ctx context.Context, in *GetVPPInfoRequest, opts ...grpc.CallOption) (*VPPInfo, error)
GetVPPLBState(ctx context.Context, in *GetVPPLBStateRequest, opts ...grpc.CallOption) (*VPPLBState, error) GetVPPLBState(ctx context.Context, in *GetVPPLBStateRequest, opts ...grpc.CallOption) (*VPPLBState, error)
SyncVPPLBState(ctx context.Context, in *SyncVPPLBStateRequest, opts ...grpc.CallOption) (*SyncVPPLBStateResponse, error) SyncVPPLBState(ctx context.Context, in *SyncVPPLBStateRequest, opts ...grpc.CallOption) (*SyncVPPLBStateResponse, error)
GetVPPLBCounters(ctx context.Context, in *GetVPPLBCountersRequest, opts ...grpc.CallOption) (*VPPLBCounters, error)
} }
type maglevClient struct { type maglevClient struct {
@@ -250,6 +252,16 @@ func (c *maglevClient) SyncVPPLBState(ctx context.Context, in *SyncVPPLBStateReq
return out, nil return out, nil
} }
func (c *maglevClient) GetVPPLBCounters(ctx context.Context, in *GetVPPLBCountersRequest, opts ...grpc.CallOption) (*VPPLBCounters, error) {
cOpts := append([]grpc.CallOption{grpc.StaticMethod()}, opts...)
out := new(VPPLBCounters)
err := c.cc.Invoke(ctx, Maglev_GetVPPLBCounters_FullMethodName, in, out, cOpts...)
if err != nil {
return nil, err
}
return out, nil
}
// MaglevServer is the server API for Maglev service. // MaglevServer is the server API for Maglev service.
// All implementations must embed UnimplementedMaglevServer // All implementations must embed UnimplementedMaglevServer
// for forward compatibility. // for forward compatibility.
@@ -273,6 +285,7 @@ type MaglevServer interface {
GetVPPInfo(context.Context, *GetVPPInfoRequest) (*VPPInfo, error) GetVPPInfo(context.Context, *GetVPPInfoRequest) (*VPPInfo, error)
GetVPPLBState(context.Context, *GetVPPLBStateRequest) (*VPPLBState, error) GetVPPLBState(context.Context, *GetVPPLBStateRequest) (*VPPLBState, error)
SyncVPPLBState(context.Context, *SyncVPPLBStateRequest) (*SyncVPPLBStateResponse, error) SyncVPPLBState(context.Context, *SyncVPPLBStateRequest) (*SyncVPPLBStateResponse, error)
GetVPPLBCounters(context.Context, *GetVPPLBCountersRequest) (*VPPLBCounters, error)
mustEmbedUnimplementedMaglevServer() mustEmbedUnimplementedMaglevServer()
} }
@@ -334,6 +347,9 @@ func (UnimplementedMaglevServer) GetVPPLBState(context.Context, *GetVPPLBStateRe
func (UnimplementedMaglevServer) SyncVPPLBState(context.Context, *SyncVPPLBStateRequest) (*SyncVPPLBStateResponse, error) { func (UnimplementedMaglevServer) SyncVPPLBState(context.Context, *SyncVPPLBStateRequest) (*SyncVPPLBStateResponse, error) {
return nil, status.Error(codes.Unimplemented, "method SyncVPPLBState not implemented") return nil, status.Error(codes.Unimplemented, "method SyncVPPLBState not implemented")
} }
func (UnimplementedMaglevServer) GetVPPLBCounters(context.Context, *GetVPPLBCountersRequest) (*VPPLBCounters, error) {
return nil, status.Error(codes.Unimplemented, "method GetVPPLBCounters not implemented")
}
func (UnimplementedMaglevServer) mustEmbedUnimplementedMaglevServer() {} func (UnimplementedMaglevServer) mustEmbedUnimplementedMaglevServer() {}
func (UnimplementedMaglevServer) testEmbeddedByValue() {} func (UnimplementedMaglevServer) testEmbeddedByValue() {}
@@ -654,6 +670,24 @@ func _Maglev_SyncVPPLBState_Handler(srv interface{}, ctx context.Context, dec fu
return interceptor(ctx, in, info, handler) return interceptor(ctx, in, info, handler)
} }
func _Maglev_GetVPPLBCounters_Handler(srv interface{}, ctx context.Context, dec func(interface{}) error, interceptor grpc.UnaryServerInterceptor) (interface{}, error) {
in := new(GetVPPLBCountersRequest)
if err := dec(in); err != nil {
return nil, err
}
if interceptor == nil {
return srv.(MaglevServer).GetVPPLBCounters(ctx, in)
}
info := &grpc.UnaryServerInfo{
Server: srv,
FullMethod: Maglev_GetVPPLBCounters_FullMethodName,
}
handler := func(ctx context.Context, req interface{}) (interface{}, error) {
return srv.(MaglevServer).GetVPPLBCounters(ctx, req.(*GetVPPLBCountersRequest))
}
return interceptor(ctx, in, info, handler)
}
// Maglev_ServiceDesc is the grpc.ServiceDesc for Maglev service. // Maglev_ServiceDesc is the grpc.ServiceDesc for Maglev service.
// It's only intended for direct use with grpc.RegisterService, // It's only intended for direct use with grpc.RegisterService,
// and not to be introspected or modified (even as a copy) // and not to be introspected or modified (even as a copy)
@@ -725,6 +759,10 @@ var Maglev_ServiceDesc = grpc.ServiceDesc{
MethodName: "SyncVPPLBState", MethodName: "SyncVPPLBState",
Handler: _Maglev_SyncVPPLBState_Handler, Handler: _Maglev_SyncVPPLBState_Handler,
}, },
{
MethodName: "GetVPPLBCounters",
Handler: _Maglev_GetVPPLBCounters_Handler,
},
}, },
Streams: []grpc.StreamDesc{ Streams: []grpc.StreamDesc{
{ {

View File

@@ -7,6 +7,7 @@ import (
"errors" "errors"
"log/slog" "log/slog"
"net" "net"
"sort"
"google.golang.org/grpc/codes" "google.golang.org/grpc/codes"
"google.golang.org/grpc/status" "google.golang.org/grpc/status"
@@ -128,14 +129,13 @@ func (s *Server) GetHealthCheck(_ context.Context, req *GetHealthCheckRequest) (
} }
// WatchEvents streams events to the client. On connect, the current state of // WatchEvents streams events to the client. On connect, the current state of
// all backends is sent as synthetic BackendEvents. Afterwards, live events are // every backend and/or frontend is sent as a synthetic event. Afterwards,
// forwarded based on the filter flags in req. An unset (nil) flag defaults to // live events are forwarded based on the filter flags in req. An unset (nil)
// true (subscribe). An empty log_level defaults to "info". // flag defaults to true (subscribe). An empty log_level defaults to "info".
func (s *Server) WatchEvents(req *WatchRequest, stream Maglev_WatchEventsServer) error { func (s *Server) WatchEvents(req *WatchRequest, stream Maglev_WatchEventsServer) error {
wantLog := req.Log == nil || *req.Log wantLog := req.Log == nil || *req.Log
wantBackend := req.Backend == nil || *req.Backend wantBackend := req.Backend == nil || *req.Backend
wantFrontend := req.Frontend == nil || *req.Frontend wantFrontend := req.Frontend == nil || *req.Frontend
_ = wantFrontend // no frontend events emitted yet
logLevel := slog.LevelInfo logLevel := slog.LevelInfo
if req.LogLevel != "" { if req.LogLevel != "" {
@@ -152,8 +152,20 @@ func (s *Server) WatchEvents(req *WatchRequest, stream Maglev_WatchEventsServer)
defer unsub() defer unsub()
} }
// Subscribe to backend events; send initial state snapshot first. // Subscribe to the checker event stream once; we demultiplex backend
var backendCh <-chan checker.Event // and frontend events in the select below. Skip the subscription if
// neither kind is wanted.
var eventCh <-chan checker.Event
if wantBackend || wantFrontend {
var unsub func()
eventCh, unsub = s.checker.Subscribe()
defer unsub()
}
// Send initial state snapshot: one synthetic event per existing backend
// (if wanted), and one per existing frontend (if wanted). Clients that
// connect mid-flight see the current state immediately instead of
// waiting for the next transition.
if wantBackend { if wantBackend {
for _, name := range s.checker.ListBackends() { for _, name := range s.checker.ListBackends() {
snap, ok := s.checker.GetBackend(name) snap, ok := s.checker.GetBackend(name)
@@ -172,9 +184,25 @@ func (s *Server) WatchEvents(req *WatchRequest, stream Maglev_WatchEventsServer)
return err return err
} }
} }
var unsub func() }
backendCh, unsub = s.checker.Subscribe() if wantFrontend {
defer unsub() for _, name := range s.checker.ListFrontends() {
fs, ok := s.checker.FrontendState(name)
if !ok {
continue
}
ev := &Event{Event: &Event_Frontend{Frontend: &FrontendEvent{
FrontendName: name,
Transition: &TransitionRecord{
From: fs.String(),
To: fs.String(),
AtUnixNs: 0,
},
}}}
if err := stream.Send(ev); err != nil {
return err
}
}
} }
for { for {
@@ -190,10 +218,29 @@ func (s *Server) WatchEvents(req *WatchRequest, stream Maglev_WatchEventsServer)
if err := stream.Send(&Event{Event: &Event_Log{Log: le}}); err != nil { if err := stream.Send(&Event{Event: &Event_Log{Log: le}}); err != nil {
return err return err
} }
case e, ok := <-backendCh: case e, ok := <-eventCh:
if !ok { if !ok {
return nil return nil
} }
if e.FrontendTransition != nil {
if !wantFrontend {
continue
}
if err := stream.Send(&Event{Event: &Event_Frontend{Frontend: &FrontendEvent{
FrontendName: e.FrontendName,
Transition: &TransitionRecord{
From: e.FrontendTransition.From.String(),
To: e.FrontendTransition.To.String(),
AtUnixNs: e.FrontendTransition.At.UnixNano(),
},
}}}); err != nil {
return err
}
continue
}
if !wantBackend {
continue
}
if err := stream.Send(&Event{Event: &Event_Backend{Backend: &BackendEvent{ if err := stream.Send(&Event{Event: &Event_Backend{Backend: &BackendEvent{
BackendName: e.BackendName, BackendName: e.BackendName,
Transition: transitionToProto(e.Transition), Transition: transitionToProto(e.Transition),
@@ -302,6 +349,51 @@ func (s *Server) GetVPPLBState(_ context.Context, _ *GetVPPLBStateRequest) (*VPP
return lbStateToProto(state), nil return lbStateToProto(state), nil
} }
// GetVPPLBCounters returns the most recent per-VIP and per-backend counter
// snapshot captured by the client's 5s scrape loop. The call is served
// from an in-process cache and does not hit VPP. An empty response is
// returned when VPP is disconnected or no scrape has completed yet.
func (s *Server) GetVPPLBCounters(_ context.Context, _ *GetVPPLBCountersRequest) (*VPPLBCounters, error) {
if s.vppClient == nil {
return nil, status.Error(codes.Unavailable, "VPP integration is disabled")
}
out := &VPPLBCounters{}
for _, v := range s.vppClient.VIPStats() {
out.Vips = append(out.Vips, &VPPLBVIPCounters{
Prefix: v.Prefix,
Protocol: v.Protocol,
Port: uint32(v.Port),
NextPacket: v.NextPkt,
FirstPacket: v.FirstPkt,
UntrackedPacket: v.Untracked,
NoServer: v.NoServer,
Packets: v.Packets,
Bytes: v.Bytes,
})
}
sort.Slice(out.Vips, func(i, j int) bool {
if out.Vips[i].Prefix != out.Vips[j].Prefix {
return out.Vips[i].Prefix < out.Vips[j].Prefix
}
if out.Vips[i].Protocol != out.Vips[j].Protocol {
return out.Vips[i].Protocol < out.Vips[j].Protocol
}
return out.Vips[i].Port < out.Vips[j].Port
})
for _, b := range s.vppClient.BackendRouteStats() {
out.Backends = append(out.Backends, &VPPLBBackendCounters{
Backend: b.Backend,
Address: b.Address,
Packets: b.Packets,
Bytes: b.Bytes,
})
}
sort.Slice(out.Backends, func(i, j int) bool {
return out.Backends[i].Backend < out.Backends[j].Backend
})
return out, nil
}
// SyncVPPLBState runs the LB reconciler. With frontend_name unset it does a // SyncVPPLBState runs the LB reconciler. With frontend_name unset it does a
// full sync (SyncLBStateAll), which may remove stale VIPs. With frontend_name // full sync (SyncLBStateAll), which may remove stale VIPs. With frontend_name
// set it does a single-VIP sync (SyncLBStateVIP) that only adds/updates. // set it does a single-VIP sync (SyncLBStateVIP) that only adds/updates.
@@ -342,6 +434,7 @@ func lbStateToProto(s *vpp.LBState) *VPPLBState {
Port: uint32(v.Port), Port: uint32(v.Port),
Encap: v.Encap, Encap: v.Encap,
FlowTableLength: uint32(v.FlowTableLength), FlowTableLength: uint32(v.FlowTableLength),
SrcIpSticky: v.SrcIPSticky,
} }
for _, a := range v.ASes { for _, a := range v.ASes {
var ts int64 var ts int64
@@ -393,6 +486,7 @@ func frontendToProto(name string, fe config.Frontend, src vpp.StateSource) *Fron
Port: uint32(fe.Port), Port: uint32(fe.Port),
Description: fe.Description, Description: fe.Description,
Pools: pools, Pools: pools,
SrcIpSticky: fe.SrcIPSticky,
} }
} }

View File

@@ -353,9 +353,11 @@ func TestWatchEventsServerShutdown(t *testing.T) {
if err != nil { if err != nil {
t.Fatalf("WatchEvents: %v", err) t.Fatalf("WatchEvents: %v", err)
} }
// Drain the initial synthetic backend event. // Drain the initial synthetic snapshots (one per backend, one per frontend).
for i := 0; i < 2; i++ {
if _, err := stream.Recv(); err != nil { if _, err := stream.Recv(); err != nil {
t.Fatalf("initial Recv: %v", err) t.Fatalf("initial Recv %d: %v", i, err)
}
} }
// Cancel the server context; the stream must terminate. // Cancel the server context; the stream must terminate.

View File

@@ -64,6 +64,43 @@ type Transition struct {
Result ProbeResult Result ProbeResult
} }
// FrontendState is the aggregated state of a frontend derived from the
// effective weights of its member backends. Frontends do not have their
// own rise/fall counters: they're purely a reduction over backend state.
//
// - unknown: no backends, or every referenced backend is in StateUnknown
// (the checker has no probe data yet).
// - up: at least one backend has effective weight > 0 — the VIP has
// something to serve.
// - down: backends exist with real state, but none have effective
// weight > 0 — the VIP has nothing to serve.
type FrontendState int
const (
FrontendStateUnknown FrontendState = iota
FrontendStateUp
FrontendStateDown
)
func (s FrontendState) String() string {
switch s {
case FrontendStateUnknown:
return "unknown"
case FrontendStateUp:
return "up"
case FrontendStateDown:
return "down"
}
return "unknown"
}
// FrontendTransition records a frontend state change event.
type FrontendTransition struct {
From FrontendState
To FrontendState
At time.Time
}
// HealthCounter is HAProxy's single-integer rise/fall model. // HealthCounter is HAProxy's single-integer rise/fall model.
// //
// Health ∈ [0, Rise+Fall-1]. Server is UP when Health >= Rise, DOWN when // Health ∈ [0, Rise+Fall-1]. Server is UP when Health >= Rise, DOWN when

internal/health/weights.go (new file, 118 lines)
View File

@@ -0,0 +1,118 @@
// Copyright (c) 2026, Pim van Pelt <pim@ipng.ch>
package health
import (
"git.ipng.ch/ipng/vpp-maglev/internal/config"
)
// ActivePoolIndex returns the priority-failover pool index for fe given
// the current backend states. The active pool is the first pool that
// contains at least one backend in StateUp — pool[0] is the primary,
// pool[1] the first fallback, and so on. Returns 0 when no pool has
// any up backend, in which case every backend maps to weight 0 and the
// return value is unobservable.
func ActivePoolIndex(fe config.Frontend, states map[string]State) int {
for i, pool := range fe.Pools {
for bName := range pool.Backends {
if states[bName] == StateUp {
return i
}
}
}
return 0
}
// BackendEffectiveWeight is the pure mapping from (pool index, active pool,
// backend state, config weight) to the desired VPP AS weight and flush hint.
// This is the single source of truth for the state → dataplane rule.
//
// A backend gets its configured weight iff it is up AND belongs to the
// currently-active pool. Every other case yields weight 0. Only StateDisabled
// produces flush=true (immediate session teardown).
//
// state in active pool not in active pool flush
// -------- -------------- ------------------- -----
// unknown 0 0 no
// up configured 0 (standby) no
// down 0 0 no
// paused 0 0 no
// disabled 0 0 yes
func BackendEffectiveWeight(poolIdx, activePool int, state State, cfgWeight int) (weight uint8, flush bool) {
switch state {
case StateUp:
if poolIdx == activePool {
return clampWeight(cfgWeight), false
}
return 0, false
case StateDisabled:
return 0, true
default:
return 0, false
}
}
// EffectiveWeights computes per-pool per-backend effective weights for fe,
// given a snapshot of backend states. Result layout: weights[poolIdx][backendName].
func EffectiveWeights(fe config.Frontend, states map[string]State) map[int]map[string]uint8 {
activePool := ActivePoolIndex(fe, states)
out := make(map[int]map[string]uint8, len(fe.Pools))
for poolIdx, pool := range fe.Pools {
out[poolIdx] = make(map[string]uint8, len(pool.Backends))
for bName, pb := range pool.Backends {
w, _ := BackendEffectiveWeight(poolIdx, activePool, states[bName], pb.Weight)
out[poolIdx][bName] = w
}
}
return out
}
// ComputeFrontendState derives the FrontendState for fe from a snapshot of
// backend states. Rules:
//
// - no backends → unknown
// - every referenced backend is in StateUnknown → unknown
// - any backend has effective weight > 0 → up
// - otherwise → down
func ComputeFrontendState(fe config.Frontend, states map[string]State) FrontendState {
// Unique set of backends referenced by this frontend (a single backend
// may appear in multiple pools; we count it once).
seen := make(map[string]struct{})
for _, pool := range fe.Pools {
for bName := range pool.Backends {
seen[bName] = struct{}{}
}
}
if len(seen) == 0 {
return FrontendStateUnknown
}
allUnknown := true
for bName := range seen {
if states[bName] != StateUnknown {
allUnknown = false
break
}
}
if allUnknown {
return FrontendStateUnknown
}
ew := EffectiveWeights(fe, states)
for _, poolMap := range ew {
for _, w := range poolMap {
if w > 0 {
return FrontendStateUp
}
}
}
return FrontendStateDown
}
func clampWeight(w int) uint8 {
if w < 0 {
return 0
}
if w > 100 {
return 100
}
return uint8(w)
}

View File

@@ -0,0 +1,232 @@
// Copyright (c) 2026, Pim van Pelt <pim@ipng.ch>
package health
import (
"testing"
"git.ipng.ch/ipng/vpp-maglev/internal/config"
)
// TestBackendEffectiveWeight locks down the state → (weight, flush) truth
// table. This is the single source of truth for how maglevd decides what
// to program into VPP for each backend state. If this test needs updating
// the behavior has deliberately changed.
func TestBackendEffectiveWeight(t *testing.T) {
cases := []struct {
name string
poolIdx int
activePool int
state State
cfgWeight int
wantWeight uint8
wantFlush bool
}{
{"up active w100", 0, 0, StateUp, 100, 100, false},
{"up active w50", 0, 0, StateUp, 50, 50, false},
{"up active w0", 0, 0, StateUp, 0, 0, false},
{"up active clamp-high", 0, 0, StateUp, 150, 100, false},
{"up active clamp-low", 0, 0, StateUp, -5, 0, false},
{"up standby pool0 active=1", 0, 1, StateUp, 100, 0, false},
{"up standby pool1 active=0", 1, 0, StateUp, 100, 0, false},
{"up standby pool2 active=0", 2, 0, StateUp, 100, 0, false},
{"up failover pool1 active=1", 1, 1, StateUp, 100, 100, false},
{"unknown pool0 active=0", 0, 0, StateUnknown, 100, 0, false},
{"unknown pool1 active=0", 1, 0, StateUnknown, 100, 0, false},
{"down pool0 active=0", 0, 0, StateDown, 100, 0, false},
{"down pool1 active=1", 1, 1, StateDown, 100, 0, false},
{"paused pool0 active=0", 0, 0, StatePaused, 100, 0, false},
{"disabled pool0 active=0", 0, 0, StateDisabled, 100, 0, true},
{"disabled pool1 active=1", 1, 1, StateDisabled, 100, 0, true},
}
for _, tc := range cases {
t.Run(tc.name, func(t *testing.T) {
w, f := BackendEffectiveWeight(tc.poolIdx, tc.activePool, tc.state, tc.cfgWeight)
if w != tc.wantWeight {
t.Errorf("weight: got %d, want %d", w, tc.wantWeight)
}
if f != tc.wantFlush {
t.Errorf("flush: got %v, want %v", f, tc.wantFlush)
}
})
}
}
// TestActivePoolIndex locks down the priority-failover selector: the first
// pool containing at least one up backend is the active pool. Default 0.
func TestActivePoolIndex(t *testing.T) {
mkFE := func(pools ...[]string) config.Frontend {
out := make([]config.Pool, len(pools))
for i, p := range pools {
out[i] = config.Pool{Name: "p", Backends: map[string]config.PoolBackend{}}
for _, name := range p {
out[i].Backends[name] = config.PoolBackend{Weight: 100}
}
}
return config.Frontend{Pools: out}
}
cases := []struct {
name string
fe config.Frontend
states map[string]State
want int
}{
{
name: "pool0 has up, pool1 standby",
fe: mkFE([]string{"a", "b"}, []string{"c", "d"}),
states: map[string]State{"a": StateUp, "b": StateDown, "c": StateUp, "d": StateUp},
want: 0,
},
{
name: "pool0 all down, pool1 has up → failover",
fe: mkFE([]string{"a", "b"}, []string{"c", "d"}),
states: map[string]State{"a": StateDown, "b": StateDown, "c": StateUp, "d": StateUp},
want: 1,
},
{
name: "pool0 all disabled, pool1 has up → failover",
fe: mkFE([]string{"a", "b"}, []string{"c"}),
states: map[string]State{"a": StateDisabled, "b": StateDisabled, "c": StateUp},
want: 1,
},
{
name: "pool0 all paused, pool1 has up → failover",
fe: mkFE([]string{"a"}, []string{"c"}),
states: map[string]State{"a": StatePaused, "c": StateUp},
want: 1,
},
{
name: "pool0 all unknown (startup), pool1 up → pool1",
fe: mkFE([]string{"a"}, []string{"c"}),
states: map[string]State{"a": StateUnknown, "c": StateUp},
want: 1,
},
{
name: "nothing up anywhere → default 0",
fe: mkFE([]string{"a"}, []string{"c"}),
states: map[string]State{"a": StateDown, "c": StateDown},
want: 0,
},
{
name: "1 up in pool0 is enough",
fe: mkFE([]string{"a", "b", "c"}, []string{"d"}),
states: map[string]State{"a": StateDown, "b": StateDown, "c": StateUp, "d": StateUp},
want: 0,
},
{
name: "three tiers, pool0 and pool1 both empty → pool2",
fe: mkFE([]string{"a"}, []string{"b"}, []string{"c"}),
states: map[string]State{"a": StateDown, "b": StateDown, "c": StateUp},
want: 2,
},
}
for _, tc := range cases {
t.Run(tc.name, func(t *testing.T) {
got := ActivePoolIndex(tc.fe, tc.states)
if got != tc.want {
t.Errorf("got pool %d, want pool %d", got, tc.want)
}
})
}
}
// TestComputeFrontendState locks down the reduction rule: frontends are
// up iff any backend has effective weight > 0, unknown iff all backends
// are still in StateUnknown (or there are no backends), and down otherwise.
func TestComputeFrontendState(t *testing.T) {
mkFE := func(pools ...[]string) config.Frontend {
out := make([]config.Pool, len(pools))
for i, p := range pools {
out[i] = config.Pool{Name: "p", Backends: map[string]config.PoolBackend{}}
for _, name := range p {
out[i].Backends[name] = config.PoolBackend{Weight: 100}
}
}
return config.Frontend{Pools: out}
}
cases := []struct {
name string
fe config.Frontend
states map[string]State
want FrontendState
}{
{
name: "no backends → unknown",
fe: config.Frontend{Pools: []config.Pool{{Name: "primary", Backends: map[string]config.PoolBackend{}}}},
want: FrontendStateUnknown,
},
{
name: "all unknown (startup) → unknown",
fe: mkFE([]string{"a", "b"}),
states: map[string]State{"a": StateUnknown, "b": StateUnknown},
want: FrontendStateUnknown,
},
{
name: "one up in primary → up",
fe: mkFE([]string{"a", "b"}),
states: map[string]State{"a": StateUp, "b": StateDown},
want: FrontendStateUp,
},
{
name: "all down → down",
fe: mkFE([]string{"a", "b"}),
states: map[string]State{"a": StateDown, "b": StateDown},
want: FrontendStateDown,
},
{
name: "all disabled → down",
fe: mkFE([]string{"a", "b"}),
states: map[string]State{"a": StateDisabled, "b": StateDisabled},
want: FrontendStateDown,
},
{
name: "all paused → down",
fe: mkFE([]string{"a"}),
states: map[string]State{"a": StatePaused},
want: FrontendStateDown,
},
{
name: "primary down, secondary up → up (failover)",
fe: mkFE([]string{"a"}, []string{"b"}),
states: map[string]State{"a": StateDown, "b": StateUp},
want: FrontendStateUp,
},
{
name: "primary up, secondary down → up (secondary standby ignored)",
fe: mkFE([]string{"a"}, []string{"b"}),
states: map[string]State{"a": StateUp, "b": StateDown},
want: FrontendStateUp,
},
{
name: "primary unknown, secondary unknown → unknown",
fe: mkFE([]string{"a"}, []string{"b"}),
states: map[string]State{"a": StateUnknown, "b": StateUnknown},
want: FrontendStateUnknown,
},
{
name: "primary down, secondary unknown → down",
fe: mkFE([]string{"a"}, []string{"b"}),
states: map[string]State{"a": StateDown, "b": StateUnknown},
want: FrontendStateDown,
},
}
for _, tc := range cases {
t.Run(tc.name, func(t *testing.T) {
got := ComputeFrontendState(tc.fe, tc.states)
if got != tc.want {
t.Errorf("got %s, want %s", got, tc.want)
}
})
}
}


@@ -45,12 +45,53 @@ type VPPInfo struct {
ConnectedSince time.Time
}
// VIPStatEntry is a point-in-time snapshot of the per-VIP counters that
// VPP exposes via the stats segment: four SimpleCounters from the LB
// plugin (packets only) plus the FIB CombinedCounter at /net/route/to
// for the VIP's own host prefix (packets + bytes). Values are summed
// across worker threads. The labelling (prefix/protocol/port) matches
// the gRPC VPPLBVIP representation so a Prometheus time series
// corresponds 1:1 to a maglev frontend VIP.
type VIPStatEntry struct {
Prefix string // CIDR string, e.g. "192.0.2.1/32"
Protocol string // "tcp", "udp", "any"
Port uint16
// LB plugin SimpleCounters (packets only)
NextPkt uint64 // /packet from existing sessions
FirstPkt uint64 // /first session packet
Untracked uint64 // /untracked packet
NoServer uint64 // /no server configured
// FIB CombinedCounter from /net/route/to at the VIP prefix
Packets uint64
Bytes uint64
}
// BackendRouteStat is a point-in-time snapshot of the FIB combined counter
// (/net/route/to) for a single backend's host prefix. Values are summed
// across worker threads. Labels match the backend's identity so a time
// series corresponds 1:1 to a maglev backend entry.
type BackendRouteStat struct {
Backend string // backend name from the config
Address string // backend IP address as a string (e.g. "192.0.2.10")
Packets uint64
Bytes uint64
}
// VPPSource provides read-only access to the VPP client's state. vpp.Client
// is adapted to this interface via a small shim in the collector so the
// metrics package stays decoupled from the vpp package's concrete types.
type VPPSource interface {
IsConnected() bool
VPPInfo() (VPPInfo, bool)
// VIPStats returns the most recent snapshot of per-VIP stats-segment
// counters, as captured by the LB stats loop. Returns nil when VPP is
// disconnected or no scrape has happened yet.
VIPStats() []VIPStatEntry
// BackendRouteStats returns the most recent snapshot of per-backend
// FIB combined counters (/net/route/to), as captured by the LB stats
// loop. Returns nil when VPP is disconnected, no scrape has happened
// yet, or the route lookup for every backend failed.
BackendRouteStats() []BackendRouteStat
}
// ---- inline metrics (updated per probe) ------------------------------------
@@ -118,6 +159,12 @@ type Collector struct {
vppUptimeSeconds *prometheus.Desc
vppConnectedFor *prometheus.Desc
vppInfo *prometheus.Desc
vipPackets *prometheus.Desc // per-VIP LB counters from stats segment
vipRoutePkts *prometheus.Desc // per-VIP FIB combined counter: packets
vipRouteByts *prometheus.Desc // per-VIP FIB combined counter: bytes
backendRoutePkts *prometheus.Desc // per-backend FIB combined counter: packets
backendRouteByts *prometheus.Desc // per-backend FIB combined counter: bytes
}
// NewCollector creates a Collector backed by the given StateSource. vpp may
@@ -167,6 +214,31 @@ func NewCollector(src StateSource, vpp VPPSource) *Collector {
"Static VPP build information. Always 1; metadata is conveyed via labels.",
[]string{"version", "build_date", "pid"}, nil,
),
vipPackets: prometheus.NewDesc(
"maglev_vpp_vip_packets_total",
"Per-VIP packet counters from the VPP LB plugin stats segment, summed across workers. kind ∈ {next, first, untracked, no_server}.",
[]string{"prefix", "protocol", "port", "kind"}, nil,
),
vipRoutePkts: prometheus.NewDesc(
"maglev_vpp_vip_route_packets_total",
"Packets forwarded by VPP's FIB toward each VIP's host prefix (from /net/route/to), summed across workers.",
[]string{"prefix", "protocol", "port"}, nil,
),
vipRouteByts: prometheus.NewDesc(
"maglev_vpp_vip_route_bytes_total",
"Bytes forwarded by VPP's FIB toward each VIP's host prefix (from /net/route/to), summed across workers.",
[]string{"prefix", "protocol", "port"}, nil,
),
backendRoutePkts: prometheus.NewDesc(
"maglev_vpp_backend_route_packets_total",
"Packets forwarded by VPP's FIB toward each backend's host prefix (from /net/route/to), summed across workers.",
[]string{"backend", "address"}, nil,
),
backendRouteByts: prometheus.NewDesc(
"maglev_vpp_backend_route_bytes_total",
"Bytes forwarded by VPP's FIB toward each backend's host prefix (from /net/route/to), summed across workers.",
[]string{"backend", "address"}, nil,
),
}
}
@@ -180,6 +252,11 @@ func (c *Collector) Describe(ch chan<- *prometheus.Desc) {
ch <- c.vppUptimeSeconds
ch <- c.vppConnectedFor
ch <- c.vppInfo
ch <- c.vipPackets
ch <- c.vipRoutePkts
ch <- c.vipRouteByts
ch <- c.backendRoutePkts
ch <- c.backendRouteByts
}
// Collect implements prometheus.Collector.
@@ -271,6 +348,27 @@ func (c *Collector) Collect(ch chan<- prometheus.Metric) {
c.vppInfo, prometheus.GaugeValue, 1.0,
info.Version, info.BuildDate, fmt.Sprintf("%d", info.PID),
)
// Per-VIP packet counters, read from the snapshot updated by the LB
// stats loop in internal/vpp. CounterValue so rate()/increase() work
// as expected; VPP counter resets (e.g. VIP recreate) are handled by
// Prometheus's built-in counter-reset detection.
for _, v := range c.vpp.VIPStats() {
port := fmt.Sprintf("%d", v.Port)
ch <- prometheus.MustNewConstMetric(c.vipPackets, prometheus.CounterValue, float64(v.NextPkt), v.Prefix, v.Protocol, port, "next")
ch <- prometheus.MustNewConstMetric(c.vipPackets, prometheus.CounterValue, float64(v.FirstPkt), v.Prefix, v.Protocol, port, "first")
ch <- prometheus.MustNewConstMetric(c.vipPackets, prometheus.CounterValue, float64(v.Untracked), v.Prefix, v.Protocol, port, "untracked")
ch <- prometheus.MustNewConstMetric(c.vipPackets, prometheus.CounterValue, float64(v.NoServer), v.Prefix, v.Protocol, port, "no_server")
ch <- prometheus.MustNewConstMetric(c.vipRoutePkts, prometheus.CounterValue, float64(v.Packets), v.Prefix, v.Protocol, port)
ch <- prometheus.MustNewConstMetric(c.vipRouteByts, prometheus.CounterValue, float64(v.Bytes), v.Prefix, v.Protocol, port)
}
// Per-backend FIB counters from /net/route/to. Same CounterValue
// semantics as above.
for _, b := range c.vpp.BackendRouteStats() {
ch <- prometheus.MustNewConstMetric(c.backendRoutePkts, prometheus.CounterValue, float64(b.Packets), b.Backend, b.Address)
ch <- prometheus.MustNewConstMetric(c.backendRouteByts, prometheus.CounterValue, float64(b.Bytes), b.Backend, b.Address)
}
}
// Register registers all metrics with the given registry. vpp may be nil
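For collector tests, or for wiring the collector before VPP has ever connected, a stub that satisfies VPPSource is enough. A minimal sketch (hypothetical; the counter values are made up), written inside the metrics package:

  type stubVPPSource struct{}

  func (stubVPPSource) IsConnected() bool        { return true }
  func (stubVPPSource) VPPInfo() (VPPInfo, bool) { return VPPInfo{}, false }
  func (stubVPPSource) VIPStats() []VIPStatEntry {
      return []VIPStatEntry{{Prefix: "192.0.2.1/32", Protocol: "tcp", Port: 443,
          FirstPkt: 12, NextPkt: 3400, Packets: 3412, Bytes: 2560000}}
  }
  func (stubVPPSource) BackendRouteStats() []BackendRouteStat {
      return []BackendRouteStat{{Backend: "web1", Address: "192.0.2.10", Packets: 1706, Bytes: 1280000}}
  }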


@@ -9,6 +9,7 @@ import (
"context" "context"
"log/slog" "log/slog"
"sync" "sync"
"sync/atomic"
"time" "time"
"go.fd.io/govpp/adapter" "go.fd.io/govpp/adapter"
@@ -60,6 +61,16 @@ type Client struct {
info Info // populated on successful connect
stateSrc StateSource // optional; enables periodic LB sync
lastLBConf *lb.LbConf // cached last-pushed lb_conf (dedup)
// lbStatsSnap is the most recent per-VIP stats snapshot captured by
// lbStatsLoop. Published as an immutable slice via atomic.Pointer so
// Prometheus scrapes (metrics.Collector.Collect) don't take any lock.
lbStatsSnap atomic.Pointer[[]metrics.VIPStatEntry]
// backendRouteSnap is the most recent per-backend FIB stats snapshot
// captured by lbStatsLoop. Same atomic-pointer publish pattern as
// lbStatsSnap; see logBackendRouteStats in fibstats.go.
backendRouteSnap atomic.Pointer[[]metrics.BackendRouteStat]
}
// SetStateSource attaches a live config + health state source. When set, the
@@ -140,10 +151,11 @@ func (c *Client) Run(ctx context.Context) {
}
}
// Start the LB sync and stats loops for as long as the connection
// is up. Both exit when connCtx is cancelled.
connCtx, connCancel := context.WithCancel(ctx)
go c.lbSyncLoop(connCtx)
go c.lbStatsLoop(connCtx)
// Hold the connection, pinging periodically to detect VPP restarts.
c.monitor(ctx)
@@ -217,6 +229,30 @@ func (c *Client) GetInfo() (Info, error) {
return c.info, nil
}
// VIPStats satisfies metrics.VPPSource. It returns the latest snapshot of
// per-VIP LB stats-segment counters captured by lbStatsLoop. Returns nil
// until the first scrape completes, or after a disconnect (the pointer is
// cleared when the connection drops).
func (c *Client) VIPStats() []metrics.VIPStatEntry {
p := c.lbStatsSnap.Load()
if p == nil {
return nil
}
return *p
}
// BackendRouteStats satisfies metrics.VPPSource. It returns the latest
// snapshot of per-backend FIB combined counters (/net/route/to) captured
// by lbStatsLoop. Returns nil until the first scrape completes, or after
// a disconnect.
func (c *Client) BackendRouteStats() []metrics.BackendRouteStat {
p := c.backendRouteSnap.Load()
if p == nil {
return nil
}
return *p
}
// VPPInfo satisfies metrics.VPPSource. It returns a copy of the cached
// connection info as a metrics-local struct so the metrics package doesn't
// need to import internal/vpp. Second return is false when VPP is not
@@ -272,6 +308,8 @@ func (c *Client) disconnect() {
c.info = Info{}
c.lastLBConf = nil // force re-push of lb_conf on reconnect
c.mu.Unlock()
c.lbStatsSnap.Store(nil)
c.backendRouteSnap.Store(nil)
safeDisconnectAPI(apiConn)
safeDisconnectStats(statsConn)

internal/vpp/fibstats.go

@@ -0,0 +1,87 @@
// Copyright (c) 2026, Pim van Pelt <pim@ipng.ch>
package vpp
import (
"fmt"
"net"
"go.fd.io/govpp/adapter"
"go.fd.io/govpp/binapi/ip"
"go.fd.io/govpp/binapi/ip_types"
)
// routeToStatPath is the VPP stats-segment path exposing the per-FIB-entry
// "route-to" combined counter (packets + bytes), indexed by the load-
// balance index of each FIB entry. See lbm_to_counters in
// src/vnet/dpo/load_balance.c.
const routeToStatPath = "/net/route/to"
// fibStatsIndex returns the FIB entry's stats_index (load_balance index)
// for the host prefix of addr. Uses exact=0 (longest-match) so a covering
// route is returned if there is no host-prefix entry — note this means
// two maglev entities sharing a covering route will report identical
// /net/route/to counters.
func fibStatsIndex(ch *loggedChannel, addr net.IP) (uint32, error) {
var prefix ip_types.Prefix
if v4 := addr.To4(); v4 != nil {
prefix.Address.Af = ip_types.ADDRESS_IP4
copy(prefix.Address.Un.XXX_UnionData[:4], v4)
prefix.Len = 32
} else {
prefix.Address.Af = ip_types.ADDRESS_IP6
copy(prefix.Address.Un.XXX_UnionData[:], addr.To16())
prefix.Len = 128
}
req := &ip.IPRouteLookup{
TableID: 0,
Exact: 0,
Prefix: prefix,
}
reply := &ip.IPRouteLookupReply{}
if err := ch.SendRequest(req).ReceiveReply(reply); err != nil {
return 0, fmt.Errorf("ip_route_lookup: %w", err)
}
if reply.Retval != 0 {
return 0, fmt.Errorf("ip_route_lookup: retval=%d", reply.Retval)
}
return reply.Route.StatsIndex, nil
}
// findCombinedCounter returns the CombinedCounterStat matching name, or
// nil if not found or the wrong type.
func findCombinedCounter(entries []adapter.StatEntry, name string) adapter.CombinedCounterStat {
for _, e := range entries {
if string(e.Name) != name {
continue
}
if s, ok := e.Data.(adapter.CombinedCounterStat); ok {
return s
}
}
return nil
}
// reduceCombinedCounter sums the (packets, bytes) CombinedCounter across
// workers at column i, tolerating short per-worker vectors.
func reduceCombinedCounter(s adapter.CombinedCounterStat, i int) (pkts, byts uint64) {
for _, thread := range s {
if i >= 0 && i < len(thread) {
pkts += thread[i][0]
byts += thread[i][1]
}
}
return pkts, byts
}
// vipKeyToIP extracts the VIP address from a vipKey's CIDR string. The
// second return is the prefix length. Used by the scrape path to feed
// a VIP prefix into fibStatsIndex.
func vipKeyToIP(k vipKey) (net.IP, int, error) {
ip, ipnet, err := net.ParseCIDR(k.prefix)
if err != nil {
return nil, 0, err
}
ones, _ := ipnet.Mask.Size()
return ip, ones, nil
}


@@ -30,6 +30,10 @@ type LBVIP struct {
Dscp uint8
TargetPort uint16
FlowTableLength uint16
// SrcIPSticky is scraped from `show lb vips verbose` via cli_inband;
// VPP's lb_vip_details does not carry this flag. Populated by
// GetLBStateAll and GetLBStateVIP; see queryLBSticky in lbsync.go.
SrcIPSticky bool
ASes []LBAS
}
@@ -70,12 +74,17 @@ func (c *Client) GetLBStateAll() (*LBState, error) {
if err != nil {
return nil, err
}
stickyMap, err := queryLBSticky(ch)
if err != nil {
return nil, err
}
for i := range vips {
ases, err := dumpASesForVIP(ch, vips[i].Protocol, vips[i].Port)
if err != nil {
return nil, err
}
vips[i].ASes = ases
vips[i].SrcIPSticky = stickyMap[makeVIPKey(vips[i].Prefix, vips[i].Protocol, vips[i].Port)]
}
state.VIPs = vips
return state, nil
@@ -90,7 +99,16 @@ func (c *Client) GetLBStateVIP(prefix *net.IPNet, protocol uint8, port uint16) (
return nil, err
}
defer ch.Close()
vip, err := lookupVIP(ch, prefix, protocol, port)
if err != nil || vip == nil {
return vip, err
}
stickyMap, err := queryLBSticky(ch)
if err != nil {
return nil, err
}
vip.SrcIPSticky = stickyMap[makeVIPKey(vip.Prefix, vip.Protocol, vip.Port)]
return vip, nil
}
// ---- low-level helpers (used by both Get and Sync paths) -------------------

internal/vpp/lbstats.go

@@ -0,0 +1,229 @@
// Copyright (c) 2026, Pim van Pelt <pim@ipng.ch>
package vpp
import (
"context"
"fmt"
"log/slog"
"sort"
"time"
"go.fd.io/govpp/adapter"
"git.ipng.ch/ipng/vpp-maglev/internal/metrics"
)
// lbStatsInterval is how often lbStatsLoop scrapes per-VIP and per-backend
// counters from the VPP stats segment. Hard-coded for now; the scrape
// feeds both slog.Debug lines and the Prometheus collector.
const lbStatsInterval = 5 * time.Second
// LB VIP counter names as they appear in the VPP stats segment. These
// come from lb_foreach_vip_counter in src/plugins/lb/lb.h — each entry is
// registered with only .name set, so the stats segment exposes them at
// the top level (spaces and all). Replace if the VPP plugin renames them.
const (
lbStatNextPacket = "/packet from existing sessions"
lbStatFirstPacket = "/first session packet"
lbStatUntrackedPkt = "/untracked packet"
lbStatNoServer = "/no server configured"
)
// lbStatPatterns is the full list of anchored regexes passed to DumpStats
// for one scrape cycle: the four LB-plugin SimpleCounters plus the FIB
// CombinedCounter for per-route packet+byte totals. Doing it in a single
// DumpStats avoids walking the stats segment twice.
var lbStatPatterns = []string{
`^/packet from existing sessions$`,
`^/first session packet$`,
`^/untracked packet$`,
`^/no server configured$`,
`^/net/route/to$`,
}
// lbStatsLoop periodically scrapes the LB plugin's per-VIP counters and
// the FIB's /net/route/to combined counter for both VIPs and backends,
// publishes the results to the atomic snapshots read by Prometheus, and
// emits one slog.Debug line per VIP and per backend. Exits when ctx is
// cancelled.
func (c *Client) lbStatsLoop(ctx context.Context) {
ticker := time.NewTicker(lbStatsInterval)
defer ticker.Stop()
for {
select {
case <-ctx.Done():
return
case <-ticker.C:
}
if err := c.scrapeLBStats(); err != nil {
slog.Debug("vpp-lb-stats-error", "err", err)
}
}
}
// scrapeLBStats runs one full scrape cycle: discover VIPs via cli_inband,
// look up FIB stats_indices for every VIP and every backend via
// ip_route_lookup, dump all five stats-segment paths in one DumpStats
// call, and reduce the counters into the two published snapshots.
func (c *Client) scrapeLBStats() error {
if !c.IsConnected() {
return nil
}
ch, err := c.apiChannel()
if err != nil {
return err
}
defer ch.Close()
snap, err := queryLBVIPSnapshot(ch)
if err != nil {
return fmt.Errorf("query vip snapshot: %w", err)
}
// Resolve FIB stats_indices for every VIP. The LB plugin installs a
// host-prefix FIB entry per VIP (lb.c:990), so the exact=0 lookup
// lands on it in the common case. See fibStatsIndex for the exact=0
// caveat when a covering route shadows the host prefix.
vipStatsIdx := make(map[vipKey]uint32, len(snap))
for k := range snap {
addr, _, err := vipKeyToIP(k)
if err != nil {
continue
}
idx, err := fibStatsIndex(ch, addr)
if err != nil {
slog.Debug("vpp-vip-route-lookup-failed",
"prefix", k.prefix, "err", err)
continue
}
vipStatsIdx[k] = idx
}
// Resolve FIB stats_indices for every backend in the running config.
type backendLookup struct {
name, addr string
index uint32
}
var backends []backendLookup
if src := c.getStateSource(); src != nil {
if cfg := src.Config(); cfg != nil {
names := make([]string, 0, len(cfg.Backends))
for name := range cfg.Backends {
names = append(names, name)
}
sort.Strings(names) // stable snapshot order
for _, name := range names {
b := cfg.Backends[name]
if b.Address == nil {
continue
}
idx, err := fibStatsIndex(ch, b.Address)
if err != nil {
slog.Debug("vpp-backend-route-lookup-failed",
"backend", name, "address", b.Address.String(), "err", err)
continue
}
backends = append(backends, backendLookup{
name: name, addr: b.Address.String(), index: idx,
})
}
}
}
c.mu.Lock()
sc := c.statsClient
c.mu.Unlock()
if sc == nil {
return nil
}
entries, err := sc.DumpStats(lbStatPatterns...)
if err != nil {
return fmt.Errorf("dump stats: %w", err)
}
nextPkt := findSimpleCounter(entries, lbStatNextPacket)
firstPkt := findSimpleCounter(entries, lbStatFirstPacket)
untracked := findSimpleCounter(entries, lbStatUntrackedPkt)
noServer := findSimpleCounter(entries, lbStatNoServer)
routeTo := findCombinedCounter(entries, routeToStatPath)
// ---- VIP snapshot ----
vipOut := make([]metrics.VIPStatEntry, 0, len(snap))
for key, info := range snap {
lbIdx := int(info.index)
entry := metrics.VIPStatEntry{
Prefix: key.prefix,
Protocol: protocolName(key.protocol),
Port: key.port,
NextPkt: reduceSimpleCounter(nextPkt, lbIdx),
FirstPkt: reduceSimpleCounter(firstPkt, lbIdx),
Untracked: reduceSimpleCounter(untracked, lbIdx),
NoServer: reduceSimpleCounter(noServer, lbIdx),
}
if fibIdx, ok := vipStatsIdx[key]; ok {
entry.Packets, entry.Bytes = reduceCombinedCounter(routeTo, int(fibIdx))
}
vipOut = append(vipOut, entry)
slog.Debug("vpp-vip-stats",
"prefix", entry.Prefix,
"protocol", entry.Protocol,
"port", entry.Port,
"next-packet", entry.NextPkt,
"first-packet", entry.FirstPkt,
"untracked", entry.Untracked,
"no-server", entry.NoServer,
"packets", entry.Packets,
"bytes", entry.Bytes,
)
}
c.lbStatsSnap.Store(&vipOut)
// ---- backend snapshot ----
backendOut := make([]metrics.BackendRouteStat, 0, len(backends))
for _, l := range backends {
pkts, byts := reduceCombinedCounter(routeTo, int(l.index))
entry := metrics.BackendRouteStat{
Backend: l.name,
Address: l.addr,
Packets: pkts,
Bytes: byts,
}
backendOut = append(backendOut, entry)
slog.Debug("vpp-backend-route-stats",
"backend", entry.Backend,
"address", entry.Address,
"packets", entry.Packets,
"bytes", entry.Bytes,
)
}
c.backendRouteSnap.Store(&backendOut)
return nil
}
// findSimpleCounter returns the SimpleCounterStat matching name, or nil if
// not found. Stats segment names are byte slices, so we compare as string.
func findSimpleCounter(entries []adapter.StatEntry, name string) adapter.SimpleCounterStat {
for _, e := range entries {
if string(e.Name) != name {
continue
}
if s, ok := e.Data.(adapter.SimpleCounterStat); ok {
return s
}
}
return nil
}
// reduceSimpleCounter sums per-worker values at column i. It tolerates a
// short per-worker vector (which can happen right after a VIP is added,
// before a worker has observed it) by skipping out-of-range rows.
func reduceSimpleCounter(s adapter.SimpleCounterStat, i int) uint64 {
var sum uint64
for _, thread := range s {
if i >= 0 && i < len(thread) {
sum += uint64(thread[i])
}
}
return sum
}
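A worked example of the reduction (a sketch inside package vpp; the values are made up), showing both the column sum and the short-row tolerance:

  s := adapter.SimpleCounterStat{
      {0, 10, 7}, // worker 0, indexed by LB pool index
      {0, 5},     // worker 1: short vector, index 2 not present yet
  }
  reduceSimpleCounter(s, 1) // == 15 (10 from worker 0 + 5 from worker 1)
  reduceSimpleCounter(s, 2) // == 7  (worker 1's short row is skipped)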


@@ -7,6 +7,11 @@ import (
"fmt" "fmt"
"log/slog" "log/slog"
"net" "net"
"regexp"
"strconv"
"strings"
"go.fd.io/govpp/binapi/vlib"
"git.ipng.ch/ipng/vpp-maglev/internal/config" "git.ipng.ch/ipng/vpp-maglev/internal/config"
"git.ipng.ch/ipng/vpp-maglev/internal/health" "git.ipng.ch/ipng/vpp-maglev/internal/health"
@@ -32,6 +37,7 @@ type desiredVIP struct {
Prefix *net.IPNet
Protocol uint8 // 6=TCP, 17=UDP, 255=any
Port uint16
SrcIPSticky bool // lb_add_del_vip_v2.src_ip_sticky
ASes map[string]desiredAS // keyed by AS IP string
}
@@ -133,10 +139,12 @@ func (c *Client) SyncLBStateAll(cfg *config.Config) error {
for k, d := range desByKey {
cur, existing := curByKey[k]
var curPtr *LBVIP
var curSticky bool
if existing {
curPtr = &cur
curSticky = cur.SrcIPSticky
}
if err := reconcileVIP(ch, d, curPtr, curSticky, &st); err != nil {
return err
}
}
@@ -189,8 +197,13 @@ func (c *Client) SyncLBStateVIP(cfg *config.Config, feName string) error {
"protocol", protocolName(d.Protocol), "protocol", protocolName(d.Protocol),
"port", d.Port) "port", d.Port)
var curSticky bool
if cur != nil {
curSticky = cur.SrcIPSticky
}
var st syncStats var st syncStats
if err := reconcileVIP(ch, d, cur, &st); err != nil { if err := reconcileVIP(ch, d, cur, curSticky, &st); err != nil {
return err return err
} }
recordSyncStats("vip", &st) recordSyncStats("vip", &st)
@@ -207,7 +220,14 @@ func (c *Client) SyncLBStateVIP(cfg *config.Config, feName string) error {
// reconcileVIP brings one VIP's state in VPP into alignment with the desired
// state. If cur is nil the VIP is added from scratch; otherwise ASes are
// added, removed, and reweighted individually. Stats are accumulated into st.
//
// curSticky is the src_ip_sticky flag VPP currently has programmed for this
// VIP, as scraped by queryLBSticky. Callers only consult it when cur != nil,
// and the scrape reads the same live lb_main.vips pool as lb_vip_dump, so a
// matching entry is always present. When the flag differs from the desired
// value, the VIP is torn down (ASes del+flushed, VIP deleted) and recreated
// — VPP has no API to mutate src_ip_sticky on an existing VIP.
func reconcileVIP(ch *loggedChannel, d desiredVIP, cur *LBVIP, curSticky bool, st *syncStats) error {
if cur == nil {
if err := addVIP(ch, d); err != nil {
return err
@@ -222,6 +242,30 @@ func reconcileVIP(ch *loggedChannel, d desiredVIP, cur *LBVIP, st *syncStats) er
return nil
}
if curSticky != d.SrcIPSticky {
slog.Info("vpp-lbsync-vip-recreate",
"prefix", d.Prefix.String(),
"protocol", protocolName(d.Protocol),
"port", d.Port,
"reason", "src-ip-sticky-changed",
"from", curSticky,
"to", d.SrcIPSticky)
if err := removeVIP(ch, *cur, st); err != nil {
return err
}
if err := addVIP(ch, d); err != nil {
return err
}
st.vipAdd++
for _, as := range d.ASes {
if err := addAS(ch, d.Prefix, d.Protocol, d.Port, as); err != nil {
return err
}
st.asAdd++
}
return nil
}
// VIP exists in both — reconcile ASes.
curASes := make(map[string]LBAS, len(cur.ASes))
for _, a := range cur.ASes {
@@ -309,22 +353,12 @@ func desiredFromFrontend(cfg *config.Config, fe config.Frontend, src StateSource
Prefix: &net.IPNet{IP: fe.Address, Mask: net.CIDRMask(bits, bits)},
Protocol: protocolFromConfig(fe.Protocol),
Port: fe.Port,
SrcIPSticky: fe.SrcIPSticky,
ASes: make(map[string]desiredAS),
}
states := snapshotStates(fe, src)
activePool := health.ActivePoolIndex(fe, states)
for poolIdx, pool := range fe.Pools {
for bName, pb := range pool.Backends {
@@ -337,12 +371,13 @@ func desiredFromFrontend(cfg *config.Config, fe config.Frontend, src StateSource
// weight=0 — they must not be deleted, otherwise a subsequent
// enable has to re-add them and existing flow-table state (if
// any) is lost. The state machine drives what weight to set
// via health.BackendEffectiveWeight; we never filter on
// b.Enabled here.
addr := b.Address.String()
if _, already := d.ASes[addr]; already {
continue
}
w, flush := health.BackendEffectiveWeight(poolIdx, activePool, states[bName], pb.Weight)
d.ASes[addr] = desiredAS{
Address: b.Address,
Weight: w,
@@ -354,14 +389,17 @@ func desiredFromFrontend(cfg *config.Config, fe config.Frontend, src StateSource
}
// EffectiveWeights returns the current effective VPP weight for every backend
// in every pool of fe, keyed by poolIdx and backend name. Intended for
// observability (e.g. the GetFrontend gRPC handler) — the sync path and the
// checker's frontend-state logic use health.EffectiveWeights directly.
func EffectiveWeights(fe config.Frontend, src StateSource) map[int]map[string]uint8 {
return health.EffectiveWeights(fe, snapshotStates(fe, src))
}
// snapshotStates builds a state map for every backend referenced by fe. It
// takes one read-lock per backend via the StateSource interface, so the
// caller gets a consistent view to feed into the pure health helpers.
func snapshotStates(fe config.Frontend, src StateSource) map[string]health.State {
states := make(map[string]health.State)
for _, pool := range fe.Pools {
for bName := range pool.Backends {
@@ -372,75 +410,7 @@ func EffectiveWeights(fe config.Frontend, src StateSource) map[int]map[string]ui
}
}
}
return states
}
// ---- API call helpers ------------------------------------------------------
@@ -461,6 +431,7 @@ func addVIP(ch *loggedChannel, d desiredVIP) error {
Encap: encap,
Type: lb_types.LB_API_SRV_TYPE_CLUSTERIP,
NewFlowsTableLength: defaultFlowsTableLength,
SrcIPSticky: d.SrcIPSticky,
IsDel: false,
}
reply := &lb.LbAddDelVipV2Reply{}
@@ -474,7 +445,8 @@ func addVIP(ch *loggedChannel, d desiredVIP) error {
"prefix", d.Prefix.String(), "prefix", d.Prefix.String(),
"protocol", protocolName(d.Protocol), "protocol", protocolName(d.Protocol),
"port", d.Port, "port", d.Port,
"encap", encapName(encap)) "encap", encapName(encap),
"src-ip-sticky", d.SrcIPSticky)
return nil return nil
} }
@@ -574,6 +546,136 @@ func setASWeight(ch *loggedChannel, prefix *net.IPNet, protocol uint8, port uint
return nil
}
// ---- VIP snapshot scrape ---------------------------------------------------
// TEMPORARY WORKAROUND: VPP's lb_vip_dump / lb_vip_details message does not
// return the src_ip_sticky flag, and lb_vip_details also doesn't expose the
// LB plugin's pool index — which is what the stats segment uses to key
// per-VIP counters. We work around both by running `show lb vips verbose`
// via the cli_inband API and parsing the human-readable output. This is
// slow and fragile (the format is not a stable API) and must be replaced
// once VPP ships an lb_vip_v2_dump that includes these fields.
// lbVIPSnapshot holds the per-VIP facts that the scrape recovers. `index`
// is the LB pool index (`vip - lbm->vips` in lb.c) used as the second
// dimension of the vip_counters SimpleCounterStat in the stats segment.
type lbVIPSnapshot struct {
index uint32
sticky bool
}
// queryLBVIPSnapshot runs `show lb vips verbose` and parses it into a map
// keyed by vipKey. The returned error is non-nil only when the cli_inband
// RPC itself fails.
func queryLBVIPSnapshot(ch *loggedChannel) (map[vipKey]lbVIPSnapshot, error) {
req := &vlib.CliInband{Cmd: "show lb vips verbose"}
reply := &vlib.CliInbandReply{}
if err := ch.SendRequest(req).ReceiveReply(reply); err != nil {
return nil, fmt.Errorf("cli_inband %q: %w", req.Cmd, err)
}
if reply.Retval != 0 {
return nil, fmt.Errorf("cli_inband %q: retval=%d", req.Cmd, reply.Retval)
}
return parseLBVIPSnapshot(reply.Reply), nil
}
// queryLBSticky is a thin projection of queryLBVIPSnapshot for callers that
// only care about the src_ip_sticky flag (the sync path).
func queryLBSticky(ch *loggedChannel) (map[vipKey]bool, error) {
snap, err := queryLBVIPSnapshot(ch)
if err != nil {
return nil, err
}
out := make(map[vipKey]bool, len(snap))
for k, v := range snap {
out[k] = v.sticky
}
return out, nil
}
// lbVIPHeaderRe matches the first line of each VIP block in the output of
// `show lb vips verbose`. VPP produces lines like:
//
// ip4-gre4 [1] 192.0.2.1/32 src_ip_sticky
// ip6-gre6 [2] 2001:db8::1/128
//
// Capture groups: 1 = pool index, 2 = prefix (CIDR), 3 = " src_ip_sticky" when present.
var lbVIPHeaderRe = regexp.MustCompile(
`\b(?:ip4-gre4|ip6-gre6|ip4-gre6|ip6-gre4|ip4-l3dsr|ip4-nat4|ip6-nat6)\s+\[(\d+)\]\s+(\S+)(\s+src_ip_sticky)?`,
)
// lbVIPProtoRe matches the `protocol:<n> port:<n>` sub-line that appears
// under each non-all-port VIP block.
var lbVIPProtoRe = regexp.MustCompile(`protocol:(\d+)\s+port:(\d+)`)
// parseLBVIPSnapshot turns the reply text of `show lb vips verbose` into a
// map of vipKey → lbVIPSnapshot. It walks lines sequentially: each header
// line starts a new VIP, and the following `protocol:<p> port:<port>` line
// (if any) refines the key. All-port VIPs have no protocol/port sub-line
// and are recorded with protocol=255 and port=0 — the same key format used
// by the rest of the sync path (see protocolFromConfig).
func parseLBVIPSnapshot(text string) map[vipKey]lbVIPSnapshot {
out := make(map[vipKey]lbVIPSnapshot)
var havePending bool
var curPrefix string
var curIndex uint32
var curSticky bool
var curProto uint8 = 255
var curPort uint16
flush := func() {
if !havePending || curPrefix == "" {
return
}
out[vipKey{prefix: curPrefix, protocol: curProto, port: curPort}] = lbVIPSnapshot{
index: curIndex,
sticky: curSticky,
}
}
for _, line := range strings.Split(text, "\n") {
if m := lbVIPHeaderRe.FindStringSubmatch(line); m != nil {
flush()
havePending = true
if idx, err := strconv.ParseUint(m[1], 10, 32); err == nil {
curIndex = uint32(idx)
} else {
curIndex = 0
}
curPrefix = canonicalCIDR(m[2])
curSticky = m[3] != ""
curProto = 255
curPort = 0
continue
}
if !havePending {
continue
}
if m := lbVIPProtoRe.FindStringSubmatch(line); m != nil {
if p, err := strconv.Atoi(m[1]); err == nil {
curProto = uint8(p)
}
if p, err := strconv.Atoi(m[2]); err == nil {
curPort = uint16(p)
}
}
}
flush()
return out
}
// canonicalCIDR parses a CIDR string and returns it in Go's canonical
// net.IPNet.String() form so it matches the keys produced by makeVIPKey.
// Returns the input unchanged if parsing fails — the header regex should
// have validated it, but a mismatch shouldn't panic the sync path.
func canonicalCIDR(s string) string {
_, ipnet, err := net.ParseCIDR(s)
if err != nil {
return s
}
return ipnet.String()
}
// ---- utility ---------------------------------------------------------------
func makeVIPKey(prefix *net.IPNet, protocol uint8, port uint16) vipKey {
@@ -618,13 +720,3 @@ func encapName(e lb_types.LbEncapType) string {
}
return fmt.Sprintf("%d", e)
}


@@ -10,141 +10,45 @@ import (
"git.ipng.ch/ipng/vpp-maglev/internal/health" "git.ipng.ch/ipng/vpp-maglev/internal/health"
) )
// TestAsFromBackend locks down the state → (weight, flush) truth table. // TestParseLBVIPSnapshot pins the parser for `show lb vips verbose` output.
// This is the single source of truth for how maglevd decides what to // The text below is a synthetic sample that mirrors format_lb_vip_detailed
// program into VPP for each backend state. If this test needs updating // in src/plugins/lb/lb.c: a header line per VIP optionally carrying the
// the behavior has deliberately changed. // src_ip_sticky token, followed by a protocol:/port: sub-line for non all-
func TestAsFromBackend(t *testing.T) { // port VIPs. If VPP changes this format the test will fail loudly — the
cases := []struct { // scrape is a temporary workaround until lb_vip_v2_dump exists.
name string func TestParseLBVIPSnapshot(t *testing.T) {
poolIdx int text := ` ip4-gre4 [1] 192.0.2.1/32 src_ip_sticky
activePool int new_size:1024
state health.State protocol:6 port:80
cfgWeight int counters:
wantWeight uint8 ip4-gre4 [2] 192.0.2.2/32
wantFlush bool new_size:1024
}{ protocol:17 port:53
// up in active pool → configured weight, no flush ip6-gre6 [3] 2001:db8::1/128 src_ip_sticky
{"up active w100", 0, 0, health.StateUp, 100, 100, false}, new_size:1024
{"up active w50", 0, 0, health.StateUp, 50, 50, false}, protocol:6 port:443
{"up active w0", 0, 0, health.StateUp, 0, 0, false}, ip4-gre4 [4] 192.0.2.3/32
{"up active clamp-high", 0, 0, health.StateUp, 150, 100, false}, new_size:1024
{"up active clamp-low", 0, 0, health.StateUp, -5, 0, false}, `
got := parseLBVIPSnapshot(text)
// up in non-active pool → standby (weight 0), no flush want := map[vipKey]lbVIPSnapshot{
{"up standby pool0 active=1", 0, 1, health.StateUp, 100, 0, false}, {prefix: "192.0.2.1/32", protocol: 6, port: 80}: {index: 1, sticky: true},
{"up standby pool1 active=0", 1, 0, health.StateUp, 100, 0, false}, {prefix: "192.0.2.2/32", protocol: 17, port: 53}: {index: 2, sticky: false},
{"up standby pool2 active=0", 2, 0, health.StateUp, 100, 0, false}, {prefix: "2001:db8::1/128", protocol: 6, port: 443}: {index: 3, sticky: true},
{prefix: "192.0.2.3/32", protocol: 255, port: 0}: {index: 4, sticky: false}, // all-port VIP
// up in secondary, promoted because pool[1] is now active
{"up failover pool1 active=1", 1, 1, health.StateUp, 100, 100, false},
// unknown → off, drain
{"unknown pool0 active=0", 0, 0, health.StateUnknown, 100, 0, false},
{"unknown pool1 active=0", 1, 0, health.StateUnknown, 100, 0, false},
// down → off, drain (probe might be wrong)
{"down pool0 active=0", 0, 0, health.StateDown, 100, 0, false},
{"down pool1 active=1", 1, 1, health.StateDown, 100, 0, false},
// paused → off, drain (graceful maintenance)
{"paused pool0 active=0", 0, 0, health.StatePaused, 100, 0, false},
// disabled → off, flush (hard stop)
{"disabled pool0 active=0", 0, 0, health.StateDisabled, 100, 0, true},
{"disabled pool1 active=1", 1, 1, health.StateDisabled, 100, 0, true},
} }
if len(got) != len(want) {
for _, tc := range cases { t.Errorf("got %d entries, want %d: %#v", len(got), len(want), got)
t.Run(tc.name, func(t *testing.T) {
w, f := asFromBackend(tc.poolIdx, tc.activePool, tc.state, tc.cfgWeight)
if w != tc.wantWeight {
t.Errorf("weight: got %d, want %d", w, tc.wantWeight)
} }
if f != tc.wantFlush { for k, v := range want {
t.Errorf("flush: got %v, want %v", f, tc.wantFlush) g, ok := got[k]
if !ok {
t.Errorf("missing key %+v", k)
continue
} }
}) if g != v {
t.Errorf("key %+v: got %+v, want %+v", k, g, v)
} }
}
// TestActivePoolIndex locks down the priority-failover selector: the first
// pool containing at least one up backend is the active pool. Default 0.
func TestActivePoolIndex(t *testing.T) {
mkFE := func(pools ...[]string) config.Frontend {
out := make([]config.Pool, len(pools))
for i, p := range pools {
out[i] = config.Pool{Name: "p", Backends: map[string]config.PoolBackend{}}
for _, name := range p {
out[i].Backends[name] = config.PoolBackend{Weight: 100}
}
}
return config.Frontend{Pools: out}
}
cases := []struct {
name string
fe config.Frontend
states map[string]health.State
want int
}{
{
name: "pool0 has up, pool1 standby",
fe: mkFE([]string{"a", "b"}, []string{"c", "d"}),
states: map[string]health.State{"a": health.StateUp, "b": health.StateDown, "c": health.StateUp, "d": health.StateUp},
want: 0,
},
{
name: "pool0 all down, pool1 has up → failover",
fe: mkFE([]string{"a", "b"}, []string{"c", "d"}),
states: map[string]health.State{"a": health.StateDown, "b": health.StateDown, "c": health.StateUp, "d": health.StateUp},
want: 1,
},
{
name: "pool0 all disabled, pool1 has up → failover",
fe: mkFE([]string{"a", "b"}, []string{"c"}),
states: map[string]health.State{"a": health.StateDisabled, "b": health.StateDisabled, "c": health.StateUp},
want: 1,
},
{
name: "pool0 all paused, pool1 has up → failover",
fe: mkFE([]string{"a"}, []string{"c"}),
states: map[string]health.State{"a": health.StatePaused, "c": health.StateUp},
want: 1,
},
{
name: "pool0 all unknown (startup), pool1 up → pool1",
fe: mkFE([]string{"a"}, []string{"c"}),
states: map[string]health.State{"a": health.StateUnknown, "c": health.StateUp},
want: 1,
},
{
name: "nothing up anywhere → default 0",
fe: mkFE([]string{"a"}, []string{"c"}),
states: map[string]health.State{"a": health.StateDown, "c": health.StateDown},
want: 0,
},
{
name: "1 up in pool0 is enough",
fe: mkFE([]string{"a", "b", "c"}, []string{"d"}),
states: map[string]health.State{"a": health.StateDown, "b": health.StateDown, "c": health.StateUp, "d": health.StateUp},
want: 0,
},
{
name: "three tiers, pool0 and pool1 both empty → pool2",
fe: mkFE([]string{"a"}, []string{"b"}, []string{"c"}),
states: map[string]health.State{"a": health.StateDown, "b": health.StateDown, "c": health.StateUp},
want: 2,
},
}
for _, tc := range cases {
t.Run(tc.name, func(t *testing.T) {
got := activePoolIndex(tc.fe, tc.states)
if got != tc.want {
t.Errorf("got pool %d, want pool %d", got, tc.want)
}
})
} }
} }
@@ -161,8 +65,10 @@ func (f *fakeStateSource) BackendState(name string) (health.State, bool) {
}
// TestDesiredFromFrontendFailover is the end-to-end integration test for
// priority-failover in the VPP sync path: given a frontend with two pools,
// the desired weights flip between pools based on which has any up backends.
// This exercises vpp.desiredFromFrontend which wraps the pure helpers in
// the health package; those helpers are unit-tested separately in health.
func TestDesiredFromFrontendFailover(t *testing.T) {
ip := func(s string) net.IP { return net.ParseIP(s).To4() }
cfg := &config.Config{


@@ -63,11 +63,17 @@ func (r *Reconciler) Run(ctx context.Context) {
}
}
// handle reconciles one event. Operates only on backend-transition events
// that carry a frontend name (the checker emits one event per frontend that
// references the backend, so a backend shared across multiple frontends
// produces multiple events and all relevant VIPs are reconciled).
// Frontend-transition events are observational only — the dataplane work
// they would imply has already been done by the backend-transition event
// that triggered them.
func (r *Reconciler) handle(ev checker.Event) {
if ev.FrontendTransition != nil {
return // frontend-only event; no dataplane work
}
if ev.FrontendName == "" {
return
}


@@ -23,6 +23,7 @@ service Maglev {
rpc GetVPPInfo(GetVPPInfoRequest) returns (VPPInfo);
rpc GetVPPLBState(GetVPPLBStateRequest) returns (VPPLBState);
rpc SyncVPPLBState(SyncVPPLBStateRequest) returns (SyncVPPLBStateResponse);
rpc GetVPPLBCounters(GetVPPLBCountersRequest) returns (VPPLBCounters);
}
// ---- requests ---------------------------------------------------------------
@@ -108,6 +109,7 @@ message VPPLBVIP {
string encap = 4; // gre4|gre6|l3dsr|nat4|nat6
uint32 flow_table_length = 5;
repeated VPPLBAS application_servers = 6;
bool src_ip_sticky = 7; // source-IP based sticky session (scraped via cli_inband)
}
message VPPLBState {
@@ -126,6 +128,44 @@ message SyncVPPLBStateRequest {
message SyncVPPLBStateResponse {}
// ---- VPP load-balancer runtime counters ------------------------------------
// GetVPPLBCountersRequest asks maglevd for the most recent per-VIP and
// per-backend counter snapshot. The data is served from an in-process
// cache that is refreshed every ~5 seconds server-side; the call itself
// is cheap and does not hit VPP.
message GetVPPLBCountersRequest {}
// VPPLBVIPCounters is the point-in-time counter row for a single VIP.
// The four lb_* fields are the LB plugin's SimpleCounters (packets only);
// packets / bytes come from the VPP FIB's combined counter at the VIP's
// host prefix (/net/route/to).
message VPPLBVIPCounters {
string prefix = 1; // CIDR, e.g. 192.0.2.1/32
string protocol = 2; // tcp | udp | any
uint32 port = 3;
uint64 next_packet = 4; // "/packet from existing sessions"
uint64 first_packet = 5; // "/first session packet"
uint64 untracked_packet = 6; // "/untracked packet"
uint64 no_server = 7; // "/no server configured"
uint64 packets = 8; // /net/route/to (FIB, summed across workers)
uint64 bytes = 9; // /net/route/to (FIB, summed across workers)
}
// VPPLBBackendCounters is the FIB combined counter for a single backend's
// host prefix, summed across worker threads.
message VPPLBBackendCounters {
string backend = 1; // backend name from config
string address = 2; // backend IP address
uint64 packets = 3;
uint64 bytes = 4;
}
message VPPLBCounters {
repeated VPPLBVIPCounters vips = 1;
repeated VPPLBBackendCounters backends = 2;
}
message SetWeightRequest {
string frontend = 1;
string pool = 2;
@@ -166,6 +206,7 @@ message FrontendInfo {
uint32 port = 4;
repeated PoolInfo pools = 5;
string description = 6;
bool src_ip_sticky = 7; // VPP LB uses src-IP-based stickiness for this VIP
}
message ListBackendsResponse {
@@ -245,8 +286,12 @@ message BackendEvent {
TransitionRecord transition = 2;
}
// FrontendEvent is emitted when a frontend's aggregate state changes.
// Frontends have three states: unknown, up, down. See docs/healthchecks.md.
message FrontendEvent {
string frontend_name = 1;
TransitionRecord transition = 2;
}
// Event is the envelope returned by WatchEvents.
message Event {