vpp-maglev/docs/config-guide.md
Pim van Pelt fb62532fd5 VPP LB counters, src-ip-sticky, and frontend state aggregation
New feature: per-VIP / per-backend runtime counters
  * New GetVPPLBCounters RPC serving an in-process snapshot refreshed
    by a 5s scrape loop (internal/vpp/lbstats.go). Each cycle pulls
    the LB plugin's four SimpleCounters (next, first, untracked,
    no-server) plus the FIB /net/route/to CombinedCounter for every
    VIP and every backend host prefix via a single DumpStats call.
  * FIB stats-index discovery via ip_route_lookup (internal/vpp/
    fibstats.go); per-worker reduction happens in the collector.
  * Prometheus collector exports vip_packets_total (kind label),
    vip_route_{packets,bytes}_total, and backend_route_{packets,
    bytes}_total. Metrics source interface extended with VIPStats /
    BackendRouteStats; vpp.Client publishes snapshots via
    atomic.Pointer and clears them on disconnect.
  * New 'show vpp lb counters' CLI command. The 'show vpp lbstate'
    and 'sync vpp lbstate' commands are restructured under 'show
    vpp lb {state,counters}' / 'sync vpp lb state' to make room
    for the new verb.

New feature: src-ip-sticky frontends
  * New frontend YAML key 'src-ip-sticky' (bool). Plumbed through
    config.Frontend, desiredVIP, and the lb_add_del_vip_v2 call.
  * Reflected in gRPC FrontendInfo.src_ip_sticky and VPPLBVIP.
    src_ip_sticky, and shown in 'show vpp lb state' output.
  * Scraped back from VPP by parsing 'show lb vips verbose' through
    cli_inband — lb_vip_details does not expose the flag. The same
    scrape also recovers the LB pool index for each VIP, which the
    stats-segment counters are keyed on. This is a documented
    temporary workaround until VPP ships an lb_vip_v2_dump.
  * src_ip_sticky cannot be mutated on a live VIP, so a flipped flag
    triggers a tear-down-and-recreate in reconcileVIP (ASes deleted
    with flush, VIP deleted, then re-added). Flip is logged.

New feature: frontend state aggregation and events
  * New health.FrontendState (unknown/up/down) and FrontendTransition
    types. A frontend is 'up' iff at least one backend has a nonzero
    effective weight, 'unknown' iff no backend has real state yet,
    and 'down' otherwise.
  * Checker tracks per-frontend aggregate state, recomputing after
    each backend transition and emitting a frontend-transition Event
    on change. Reload drops entries for removed frontends.
  * checker.Event gains an optional FrontendTransition pointer;
    backend- vs. frontend-transition events are demultiplexed on
    that field.
  * WatchEvents now sends an initial snapshot of frontend state on
    connect (mirroring the existing backend snapshot), subscribes
    once to the checker stream, and fans out to backend/frontend
    handlers based on the client's filter flags. The proto
    FrontendEvent message grows name + transition fields.
  * New Checker.FrontendState accessor.

Refactor: pure health helpers
  * Moved the priority-failover selector and the (pool idx, active
    pool, state, cfg weight) → (vpp weight, flush) mapping out of
    internal/vpp/lbsync.go into a new internal/health/weights.go so
    the checker can reuse them for frontend-state computation
    without importing internal/vpp.
  * New functions: health.ActivePoolIndex, BackendEffectiveWeight,
    EffectiveWeights, ComputeFrontendState. lbsync.go now calls
    these directly; vpp.EffectiveWeights is a thin wrapper over
    health.EffectiveWeights retained for the gRPC observability
    path. Fully unit-tested in internal/health/weights_test.go.

maglevc polish
  * --color default is now mode-aware: on in the interactive shell,
    off in one-shot mode so piped output is script-safe. Explicit
    --color=true/false still overrides.
  * New stripHostMask helper drops /32 and /128 from VIP display;
    non-host prefixes pass through unchanged.
  * Counter table column order fixed (first before next) and
    packets/bytes columns renamed to fib-packets/fib-bytes to
    clarify they come from the FIB, not the LB plugin.

Docs
  * config-guide: document src-ip-sticky, including the VIP
    recreate-on-change caveat.
  * user-guide, maglevc.1, maglevd.8: updated command tree, new
    counters command, color defaults, and the src-ip-sticky field.
2026-04-12 16:07:39 +02:00


# maglevd Configuration Guide
## Overview
`maglevd` consumes a YAML configuration file of a specific format. Validation is performed
in two stages:
1. **Structural parsing**: the YAML is unmarshalled into typed Go structs. Unknown fields and
type mismatches are rejected immediately.
1. **Semantic validation**: cross-field and cross-object rules are enforced, for example
ensuring that every backend referenced by a frontend exists, that address families are
consistent within a frontend, and that IP source addresses are the correct family.
If you want to get started quickly, take a look at the [example config](../debian/maglev.yaml).
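The semantic stage can be pictured with a small sketch. The `Backend` and `Frontend` types and the `validateFrontend` function below are hypothetical simplifications for illustration, not the daemon's actual types; the sketch covers only two of the rules named above (referenced backends must exist, and all backends in a frontend must share one address family).

```go
package main

import (
	"fmt"
	"net/netip"
)

// Backend and Frontend are hypothetical, reduced stand-ins for the typed
// structs produced by the structural parsing stage.
type Backend struct {
	Address string
}

type Frontend struct {
	Backends []string // backend names referenced across all pools
}

// validateFrontend checks that every referenced backend exists and that
// all referenced backends share a single address family.
func validateFrontend(name string, fe Frontend, backends map[string]Backend) error {
	family := ""
	for _, ref := range fe.Backends {
		be, ok := backends[ref]
		if !ok {
			return fmt.Errorf("frontend %q: unknown backend %q", name, ref)
		}
		addr, err := netip.ParseAddr(be.Address)
		if err != nil {
			return fmt.Errorf("backend %q: %w", ref, err)
		}
		f := "ipv6"
		if addr.Is4() {
			f = "ipv4"
		}
		if family != "" && f != family {
			return fmt.Errorf("frontend %q: backend %q is %s, expected %s", name, ref, f, family)
		}
		family = f
	}
	return nil
}

func main() {
	backends := map[string]Backend{
		"nginx0-ams": {Address: "198.51.100.10"},
		"mail0-ams":  {Address: "2001:db8::10"},
	}
	fe := Frontend{Backends: []string{"nginx0-ams", "mail0-ams"}}
	// Mixed families are rejected with a descriptive error.
	fmt.Println(validateFrontend("mixed", fe, backends))
}
```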
## Basic structure
The YAML configuration file has the following top-level structure:
```yaml
maglev:
  healthchecker:
    [ Global health checker settings ]
  vpp:
    lb:
      [ VPP load-balancer integration settings ]
  healthchecks:
    my-check:
      [ Health check definition ]
  backends:
    my-backend:
      [ Backend definition ]
  frontends:
    my-frontend:
      [ Frontend (VIP) definition ]
```
All five sections live under the top-level `maglev:` key. The `healthchecks`, `backends`,
and `frontends` sections are maps keyed by an arbitrary name of your choosing. Names must be
unique within their section and are case-sensitive. The `vpp` section is required when
`maglevd` has a working VPP connection — its `lb.ipv4-src-address` and `lb.ipv6-src-address`
fields are mandatory and `maglevd` will refuse to start without them.
---
## healthchecker
Global settings for the health checker engine.
* ***transition-history***: An integer >= 1 that controls how many state transitions are
retained per backend for display via the gRPC API. Defaults to `5`.
* ***netns***: The name of a Linux network namespace in which probes are executed. When
empty or omitted, probes run in the current (default) network namespace. Useful when
backends are reachable only through a dedicated dataplane namespace.
Example:
```yaml
maglev:
  healthchecker:
    transition-history: 10
    netns: dataplane
```
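To illustrate what `transition-history` bounds, here is a minimal sketch of a per-backend history that keeps only the most recent entries. The `history` type is hypothetical and the transitions are plain strings for brevity; the daemon's real data structure may differ.

```go
package main

import "fmt"

// history is a hypothetical bounded log: it keeps at most limit entries
// and discards the oldest ones first.
type history struct {
	limit   int
	entries []string
}

// add appends a transition and trims the log back to the limit.
func (h *history) add(transition string) {
	h.entries = append(h.entries, transition)
	if len(h.entries) > h.limit {
		h.entries = h.entries[len(h.entries)-h.limit:]
	}
}

func main() {
	h := &history{limit: 2}
	for _, t := range []string{"unknown->up", "up->down", "down->up"} {
		h.add(t)
	}
	fmt.Println(h.entries) // only the two most recent transitions remain
}
```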
---
## vpp
Settings controlling the integration with a locally running VPP instance. The
`vpp` section is a map with a single sub-section, `lb`. Both `lb.ipv4-src-address`
and `lb.ipv6-src-address` are **required**: `maglevd --check` exits with a
semantic error and the daemon refuses to start when either is missing, because
VPP's GRE encap needs a source address and every VIP `maglevd` programs uses GRE.
* ***lb.ipv4-src-address***: Required. The IPv4 source address VPP uses when
encapsulating IPv4 traffic into GRE4 tunnels to application servers. Must
be a valid IPv4 address. No default.
* ***lb.ipv6-src-address***: Required. The IPv6 source address VPP uses when
encapsulating IPv6 traffic into GRE6 tunnels. Must be a valid IPv6 address.
No default.
* ***lb.sync-interval***: A positive Go duration (e.g. `30s`, `1m`) controlling
how often `maglevd` reconciles the VPP load-balancer dataplane against its
running configuration. On startup, an immediate full sync runs; subsequent
syncs fire at this interval as long as the VPP connection is up. Defaults
to `30s`. The purpose is to catch drift — for example, a VIP added to VPP
by hand — and bring VPP back in line with the maglev config.
* ***lb.sticky-buckets-per-core***: The number of buckets per worker thread in
the established-flow table. Must be a power of 2. Defaults to `65536` (64k).
* ***lb.flow-timeout***: Idle time after which an established flow is removed
from the table. Must be a whole number of seconds between `1s` and `120s`
inclusive. Defaults to `40s`.
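The two source addresses, `sticky-buckets-per-core`, and `flow-timeout` are
pushed to VPP via `lb_conf` when `maglevd` connects to VPP and again after every
config reload (whenever they change); `sync-interval` only controls `maglevd`'s
own reconcile loop. A log line `vpp-lb-conf-set` records the effective values.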
Example:
```yaml
maglev:
  vpp:
    lb:
      sync-interval: 60s
      ipv4-src-address: 10.0.0.1
      ipv6-src-address: 2001:db8::1
      sticky-buckets-per-core: 65536
      flow-timeout: 40s
```
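The constraints on `sticky-buckets-per-core` and `flow-timeout` are easy to get wrong, so here is a minimal sketch of how they could be checked. `validateLB` is a hypothetical helper written for this guide; it is not taken from the `maglevd` source and its error messages are invented.

```go
package main

import (
	"fmt"
	"time"
)

// validateLB checks two of the lb constraints described above:
// sticky-buckets-per-core must be a power of 2, and flow-timeout must be
// a whole number of seconds between 1s and 120s inclusive.
func validateLB(stickyBuckets uint32, flowTimeout string) error {
	// A power of two has exactly one bit set.
	if stickyBuckets == 0 || stickyBuckets&(stickyBuckets-1) != 0 {
		return fmt.Errorf("sticky-buckets-per-core %d is not a power of 2", stickyBuckets)
	}
	d, err := time.ParseDuration(flowTimeout)
	if err != nil {
		return fmt.Errorf("flow-timeout: %w", err)
	}
	if d%time.Second != 0 {
		return fmt.Errorf("flow-timeout %s is not a whole number of seconds", d)
	}
	if d < time.Second || d > 120*time.Second {
		return fmt.Errorf("flow-timeout %s is outside the range 1s to 120s", d)
	}
	return nil
}

func main() {
	for _, timeout := range []string{"40s", "1500ms", "3m"} {
		fmt.Println(timeout, validateLB(65536, timeout))
	}
}
```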
---
## healthchecks
A named map of health check definitions. Each health check describes *how* to probe a backend.
Backends reference health checks by name. The same health check can be reused across any number
of backends; each backend is probed exactly once regardless of how many frontends reference it.
Common fields (all types):
* ***type***: Required. One of `icmp`, `tcp`, `http`, or `https`.
* ***port***: The destination port to probe. Required for `tcp`, `http`, and `https`.
Must be omitted for `icmp`.
* ***probe-ipv4-src***: An optional IPv4 source address used when probing IPv4 backends.
Must be an IPv4 address. When omitted, the OS chooses the source address.
* ***probe-ipv6-src***: An optional IPv6 source address used when probing IPv6 backends.
Must be an IPv6 address. When omitted, the OS chooses the source address.
* ***interval***: Required. A positive Go duration string (e.g. `2s`, `500ms`) controlling
how often a probe is sent when the backend is fully healthy (counter at maximum).
* ***fast-interval***: Optional. A positive duration used instead of `interval` while the
backend's health counter is degraded (between down and up) or in `unknown` state. When
omitted, `interval` is used.
* ***down-interval***: Optional. A positive duration used instead of `interval` while the
backend is fully down (counter at zero). When omitted, `interval` is used. Setting this to
a longer value reduces probe traffic to backends that are known to be offline.
* ***timeout***: Required. A positive duration after which an in-flight probe is abandoned
and counted as a failure.
* ***rise***: The number of consecutive successes required to transition from down to up.
Defaults to `2`. Must be >= 1.
* ***fall***: The number of consecutive failures required to transition from up to down.
Defaults to `3`. Must be >= 1.
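The relationship between `interval`, `fast-interval`, and `down-interval` can be summarised in a few lines of code. This is a sketch under assumptions: the `Phase` enum and the `Check` struct are hypothetical names, and a zero duration stands for "omitted".

```go
package main

import (
	"fmt"
	"time"
)

// Phase is a hypothetical summary of where the health counter sits.
type Phase int

const (
	Unknown  Phase = iota // no probe result yet
	Down                  // counter at zero
	Degraded              // counter between down and up
	Up                    // counter at maximum
)

// Check holds the three interval settings; zero means "not set".
type Check struct {
	Interval     time.Duration
	FastInterval time.Duration
	DownInterval time.Duration
}

// nextInterval picks the delay before the next probe, falling back to
// Interval whenever the more specific setting was omitted.
func nextInterval(c Check, p Phase) time.Duration {
	switch {
	case (p == Unknown || p == Degraded) && c.FastInterval > 0:
		return c.FastInterval
	case p == Down && c.DownInterval > 0:
		return c.DownInterval
	}
	return c.Interval
}

func main() {
	c := Check{
		Interval:     2 * time.Second,
		FastInterval: 500 * time.Millisecond,
		DownInterval: 30 * time.Second,
	}
	for _, p := range []Phase{Unknown, Down, Degraded, Up} {
		fmt.Println(p, nextInterval(c, p))
	}
}
```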
### type: icmp
Sends an ICMP echo request (ping) to the backend address. Requires `CAP_NET_RAW`. No `port`
may be specified. No `params` block is used.
```yaml
healthchecks:
  ping:
    type: icmp
    probe-ipv4-src: 10.0.0.1
    probe-ipv6-src: 2001:db8::1
    interval: 2s
    timeout: 1s
    rise: 2
    fall: 3
```
### type: tcp
Opens a TCP connection to the backend and immediately closes it upon success. Use `params` to
optionally wrap the connection in TLS.
* ***params.ssl***: A boolean. When `true`, a TLS handshake is performed after the TCP
connection is established. Defaults to `false`.
* ***params.server-name***: The TLS SNI hostname sent during the handshake. When omitted,
the backend IP address is used.
* ***params.insecure-skip-verify***: A boolean. When `true`, the TLS certificate presented
by the server is not verified. Defaults to `false`.
```yaml
healthchecks:
  imaps-check:
    type: tcp
    port: 993
    params:
      ssl: true
      server-name: imaps.example.com
    interval: 5s
    timeout: 3s
    rise: 2
    fall: 3
```
### type: http / https
Opens a TCP (or TLS for `https`) connection, sends an HTTP request, and evaluates the response
code. An optional regexp can additionally match against the response body.
* ***params.path***: Required. The HTTP request path, e.g. `/healthz`.
* ***params.host***: The `Host` header value sent in the request. When omitted, the backend
IP address is used.
* ***params.response-code***: The expected HTTP response code. Can be a single value (`"200"`)
or an inclusive range (`"200-299"`). Defaults to `"200"`.
* ***params.response-regexp***: An optional Go regular expression matched against the response
body. If specified, the body must match for the probe to succeed.
* ***params.server-name***: The TLS SNI hostname (`https` only). Defaults to the value of
`params.host` if not set.
* ***params.insecure-skip-verify***: A boolean. Skip TLS certificate verification (`https`
only). Defaults to `false`.
```yaml
healthchecks:
  nginx-http:
    type: http
    port: 80
    params:
      path: /healthz
      host: nginx.example.com
      response-code: "200-204"
    interval: 2s
    fast-interval: 500ms
    down-interval: 30s
    timeout: 3s
    rise: 2
    fall: 3
  nginx-https:
    type: https
    port: 443
    params:
      path: /healthz
      host: nginx.example.com
      server-name: nginx.example.com
      insecure-skip-verify: false
    interval: 5s
    timeout: 3s
```
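As an illustration of how `response-code` and `response-regexp` combine, the following sketch evaluates a probe result. `matchCode` and `probeOK` are hypothetical helpers, and the sketch assumes the regexp is an unanchored search over the body (the behaviour of Go's `regexp.MatchString`); the daemon's own matching may differ in detail.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// matchCode reports whether code satisfies spec, where spec is a single
// value ("200") or an inclusive range ("200-299"). Empty means "200".
func matchCode(spec string, code int) (bool, error) {
	if spec == "" {
		spec = "200"
	}
	lo, hi, isRange := strings.Cut(spec, "-")
	first, err := strconv.Atoi(lo)
	if err != nil {
		return false, fmt.Errorf("bad response-code %q: %w", spec, err)
	}
	last := first
	if isRange {
		if last, err = strconv.Atoi(hi); err != nil {
			return false, fmt.Errorf("bad response-code %q: %w", spec, err)
		}
	}
	return code >= first && code <= last, nil
}

// probeOK succeeds when the status code matches and, if a pattern is
// given, the body matches it as well.
func probeOK(spec, pattern string, code int, body string) (bool, error) {
	ok, err := matchCode(spec, code)
	if err != nil || !ok {
		return false, err
	}
	if pattern == "" {
		return true, nil
	}
	return regexp.MatchString(pattern, body)
}

func main() {
	fmt.Println(probeOK("200-204", "", 204, ""))
	fmt.Println(probeOK("200", "ok$", 200, "status: ok"))
	fmt.Println(probeOK("200", "ok$", 200, "status: failing"))
}
```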
---
## backends
A named map of individual backend servers. Each backend has a single IP address and optionally
references a health check by name. Backends are probed exactly once, even if they appear in
multiple frontends.
* ***address***: Required. The IPv4 or IPv6 address of this backend server.
* ***healthcheck***: The name of a health check defined in the `healthchecks` section.
When empty or omitted, the backend is static: no probing is performed and the backend
enters `StateUp` immediately on startup (via a synthetic pass, rise/fall forced to 1/1).
This is useful for backends that are always available or managed by other means. See
[healthchecks.md](healthchecks.md) for details on the static-backend behavior.
* ***enabled***: A boolean controlling whether this backend participates in any frontend.
When `false`, the backend is excluded entirely and no probe goroutine is started.
Defaults to `true`.
Examples:
```yaml
backends:
  nginx0-ams:
    address: 198.51.100.10
    healthcheck: nginx-http
  nginx0-lon:
    address: 198.51.100.11
    healthcheck: nginx-http
  nginx0-draining:
    address: 198.51.100.12
    healthcheck: nginx-http
    enabled: false
  static-backend:
    address: 198.51.100.20
    # no healthcheck: assumed always healthy
```
---
## frontends
A named map of virtual IPs (VIPs). Each frontend ties together a listener address with an
ordered list of backend pools. The gRPC API exposes frontends by name.
* ***description***: An optional free-text string for documentation purposes.
* ***address***: Required. The IPv4 or IPv6 address of the VIP.
* ***protocol***: The IP protocol, either `tcp` or `udp`. When omitted, the frontend matches
all traffic to the VIP address regardless of protocol. If `port` is specified, `protocol`
must also be set.
* ***port***: The destination port of the VIP, an integer between 1 and 65535. Requires
`protocol` to be set. When omitted, the frontend matches all ports. Note that the
frontend port is independent of the healthcheck port: a frontend on port 443 may use
a healthcheck that probes port 80.
* ***pools***: Required. A non-empty ordered list of pool objects. Pools express priority:
the first pool is preferred; subsequent pools act as fallbacks. When every backend in
pool[0] leaves `StateUp` (down, paused, disabled, or not yet probed), pool[1] is
automatically promoted — its up backends take over serving traffic. The promotion
cascades across further tiers. See [healthchecks.md](healthchecks.md#pool-failover)
for the full failover semantics. All backends across all pools in a frontend must
have addresses of the same address family (all IPv4 or all IPv6).
* ***src-ip-sticky***: Boolean, default `false`. When `true`, the VPP load-balancer
programs this VIP with source-IP-based stickiness — all flows from the same client
source IP hash to the same backend (subject to the Maglev consistent-hash bucket
assignment). Use this for protocols that require session affinity at the L3 level,
or when clients open many short flows that should land on one backend. Changing this
field in a running config and reloading causes maglevd to tear down the VIP (all
application servers are deleted with flush, then the VIP itself is deleted) and
recreate it with the new value; VPP has no API to mutate `src_ip_sticky` on an
existing VIP, and existing flow state cannot be preserved across the flip.
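The recreate-on-change behaviour of `src-ip-sticky` amounts to an ordered list of steps. The sketch below is purely illustrative: the `VIP` type, the `plan` function, and the step strings are hypothetical and do not correspond to actual VPP API calls or `maglevd` log output.

```go
package main

import "fmt"

// VIP is a hypothetical, reduced view of a programmed frontend.
type VIP struct {
	Prefix      string
	SrcIPSticky bool
	Backends    []string
}

// plan returns the ordered steps needed when the src-ip-sticky flag
// differs between the current and desired VIP. Other kinds of drift are
// deliberately not modelled.
func plan(current, desired VIP) []string {
	if current.SrcIPSticky == desired.SrcIPSticky {
		return nil
	}
	var steps []string
	// Application servers go first, each deleted with flush.
	for _, as := range current.Backends {
		steps = append(steps, "delete AS "+as+" (flush)")
	}
	steps = append(steps, "delete VIP "+current.Prefix)
	steps = append(steps, fmt.Sprintf("add VIP %s src-ip-sticky=%t", desired.Prefix, desired.SrcIPSticky))
	for _, as := range desired.Backends {
		steps = append(steps, "add AS "+as)
	}
	return steps
}

func main() {
	cur := VIP{Prefix: "2001:db8::1/128", Backends: []string{"maildrop0-ams", "maildrop0-lon"}}
	want := cur
	want.SrcIPSticky = true
	for _, s := range plan(cur, want) {
		fmt.Println(s)
	}
}
```

Because the application servers are flushed, established flows to this VIP are interrupted during the flip, which is why the change is best scheduled deliberately.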
Each pool has:
* ***name***: Required. A non-empty string identifying the pool (e.g. `primary`, `fallback`).
* ***backends***: A map of backend names to per-pool backend options. Every name must refer
to an existing entry in the `backends` section.
Per-pool backend options:
* ***weight***: An integer between 0 and 100 (inclusive) expressing the relative weight of
this backend within the pool. `0` keeps the backend in the pool but assigns it no traffic.
Defaults to `100`. Weight is per-pool, not global — the same backend can appear with
different weights in different frontends.
Examples:
```yaml
frontends:
  nginx-v4-http:
    description: "IPv4 HTTP VIP with fallback"
    address: 198.51.100.1
    protocol: tcp
    port: 80
    pools:
      - name: primary
        backends:
          nginx0-ams: { weight: 10 }
          nginx0-lon: {}
      - name: fallback
        backends:
          nginx0-fra: {}
  maildrop-imaps:
    description: "IMAPS VIP"
    address: 2001:db8::1
    protocol: tcp
    port: 993
    src-ip-sticky: true
    pools:
      - name: primary
        backends:
          maildrop0-ams: {}
          maildrop0-lon: {}
```
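Pool priority and per-pool weights can be sketched as follows. This is a simplified model under assumptions: the types and the `activePool` and `effectiveWeights` functions are hypothetical, a backend is either up or not, and paused or disabled states are folded into "not up". The full semantics live in [healthchecks.md](healthchecks.md#pool-failover).

```go
package main

import "fmt"

// PoolBackend is a hypothetical view of one backend inside a pool.
type PoolBackend struct {
	Name   string
	Weight int  // configured per-pool weight, 0..100
	Up     bool // whether the backend is currently up
}

type Pool struct {
	Name     string
	Backends []PoolBackend
}

// activePool returns the index of the first pool that has at least one
// up backend, or -1 when no pool qualifies.
func activePool(pools []Pool) int {
	for i, p := range pools {
		for _, b := range p.Backends {
			if b.Up {
				return i
			}
		}
	}
	return -1
}

// effectiveWeights gives up backends in the active pool their configured
// weight and every other backend a weight of zero.
func effectiveWeights(pools []Pool) map[string]int {
	active := activePool(pools)
	out := make(map[string]int)
	for i, p := range pools {
		for _, b := range p.Backends {
			w := 0
			if i == active && b.Up {
				w = b.Weight
			}
			out[b.Name] = w
		}
	}
	return out
}

func main() {
	pools := []Pool{
		{Name: "primary", Backends: []PoolBackend{
			{Name: "nginx0-ams", Weight: 10, Up: false},
			{Name: "nginx0-lon", Weight: 100, Up: false},
		}},
		{Name: "fallback", Backends: []PoolBackend{
			{Name: "nginx0-fra", Weight: 100, Up: true},
		}},
	}
	// With the whole primary pool down, the fallback pool is promoted.
	fmt.Println(activePool(pools), effectiveWeights(pools))
}
```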
---
For a detailed description of the health state machine, probe intervals, and all transition events,
see [healthchecks.md](healthchecks.md). For a guide to using the maglev daemon and client,
see [user-guide.md](user-guide.md).