vpp-maglev/docs/config-guide.md
Pim van Pelt d3c5c86037 VPP load-balancer dataplane integration: state, sync, and global conf
This commit wires maglevd through to VPP's LB plugin end-to-end, using
locally-generated GoVPP bindings for the newer v2 API messages.

VPP binapi (vendored)
- New package internal/vpp/binapi/ containing lb, lb_types, ip_types, and
  interface_types, generated from a local VPP build (~/src/vpp) via a new
  'make vpp-binapi' target. GoVPP v0.12.0 upstream lacks the v2 messages we
  need (lb_conf_get, lb_add_del_vip_v2, lb_add_del_as_v2, lb_as_v2_dump,
  lb_as_set_weight), so we commit the generated output in-tree.
- All generated files go through our loggedChannel wrapper; every VPP API
  send/receive is recorded at DEBUG via slog (vpp-api-send / vpp-api-recv /
  vpp-api-send-multi / vpp-api-recv-multi) so the full wire-level trail is
  auditable. NewAPIChannel is unexported — callers must use c.apiChannel().

Read path: GetLBState{All,VIP}
- GetLBStateAll returns a full snapshot (global conf + every VIP with its
  attached application servers).
- GetLBStateVIP looks up a single VIP by (prefix, protocol, port) and
  returns (nil, nil) when the VIP doesn't exist in VPP. This is the
  efficient path for targeted updates on a busy LB.
- Helpers factored out: getLBConf, dumpAllVIPs, dumpASesForVIP, lookupVIP,
  vipFromDetails.

Write path: SyncLBState{All,VIP}
- SyncLBStateAll reconciles every configured frontend with VPP: creates
  missing VIPs, removes stale ones (with AS flush), and reconciles AS
  membership and weights within VIPs that exist on both sides.
- SyncLBStateVIP targets a single frontend by name. Never removes VIPs.
  Returns ErrFrontendNotFound (wrapped with the name) when the frontend
  isn't in config, so callers can use errors.Is.
- Shared reconcileVIP helper does the per-VIP AS diff; removeVIP is used
  only by the full-sync pass.
- LbAddDelVipV2 requests always set NewFlowsTableLength=1024. The .api
  default=1024 annotation is only applied by VAT/CLI parsers, not wire-
  level marshalling — sending 0 caused VPP to vec_validate with mask
  0xFFFFFFFF and OOM-panic.
- Pool semantics: backends in the primary (first) pool of a frontend get
  their configured weight; backends in secondary pools get weight 0. All
  backends are installed so higher layers can flip weights on failover
  without add/remove churn.
- Every individual change emits a DEBUG slog (vpp-lbsync-vip-add/del,
  vpp-lbsync-as-add/del, vpp-lbsync-as-weight). Start/done INFO logs
  carry a scope=all|vip label plus aggregate counts.

Global conf push: SetLBConf
- New SetLBConf(cfg) sends lb_conf with ipv4-src, ipv6-src, sticky-buckets,
  and flow-timeout. Called automatically on VPP (re)connect and after
  every config reload (via doReloadConfig). Results are cached on the
  Client so redundant pushes are silently skipped — only actual changes
  produce a vpp-lb-conf-set INFO log line.
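The caching amounts to comparing against the last pushed value; a sketch with hypothetical type and field names (only the skip-if-unchanged behaviour comes from this commit):

```go
package main

import "fmt"

// lbConf mirrors the four lb_conf fields; it is comparable, so a
// plain == suffices to detect a redundant push.
type lbConf struct {
	IP4Src, IP6Src string
	StickyBuckets  uint32
	FlowTimeout    uint32
}

type client struct {
	lastConf *lbConf
	pushes   int
}

// setLBConf pushes cfg only when it differs from the cached copy.
func (c *client) setLBConf(cfg lbConf) {
	if c.lastConf != nil && *c.lastConf == cfg {
		return // unchanged: no lb_conf message, no log line
	}
	c.pushes++ // real code would send lb_conf and log vpp-lb-conf-set
	cp := cfg
	c.lastConf = &cp
}

func main() {
	c := &client{}
	cfg := lbConf{"10.0.0.1", "2001:db8::1", 65536, 40}
	c.setLBConf(cfg)
	c.setLBConf(cfg) // silently skipped
	fmt.Println(c.pushes)
}
```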

Periodic drift reconciliation
- vpp.Client.lbSyncLoop runs in a goroutine tied to each VPP connection's
  lifetime. Its first tick is immediate (startup and post-reconnect
  sync quickly); subsequent ticks fire every vpp.lb.sync-interval from
  config (default 30s). Purpose: catch drift if something/someone
  modifies VPP state by hand. The loop uses a ConfigSource interface
  (satisfied by checker.Checker via its new Config() accessor) to avoid
  an import cycle with the checker package.

Config schema additions (maglev.vpp.lb)
- sync-interval: positive Go duration, default 30s.
- ipv4-src-address: REQUIRED. Used as the outer source for GRE4 encap
  to application servers. Missing this is a hard semantic error —
  maglevd --check exits 2 and the daemon refuses to start. VPP GRE
  needs a source address and every VIP we program uses GRE, so there
  is no meaningful config without it.
- ipv6-src-address: REQUIRED. Same treatment as ipv4-src-address.
- sticky-buckets-per-core: default 65536, must be a power of 2.
- flow-timeout: default 40s, must be a whole number of seconds in [1s, 120s].
- VPP validation runs at the end of convert() so structural errors in
  healthchecks/backends/frontends surface first — operators fix those,
  then get the VPP-specific requirements.

gRPC API
- New GetVPPLBState RPC returning VPPLBState: global conf + VIPs with
  ASes. Mirrors the read-path but strips fields irrelevant to our
  GRE-only deployment (srv_type, dscp, target_port).
- New SyncVPPLBState RPC with optional frontend_name. Unset → full sync
  (may remove stale VIPs). Set → single-VIP sync (never removes).
  Returns codes.NotFound for unknown frontends, codes.Unavailable when
  VPP integration is disabled or disconnected.

maglevc (CLI)
- New 'show vpp lbstate' command displaying the LB plugin state. VPP fields
  irrelevant to our GRE-only dataplane are suppressed. Per-AS lines use
  a key-value format ("address X  weight Y  flow-table-buckets Z")
  instead of a tabwriter column, which avoids the ANSI-color alignment
  issue we hit with mixed label/data rows.
- New 'sync vpp lbstate [<name>]' command. Without a name, triggers a
  full reconciliation; with a name, targets one frontend.
- Previous 'show vpp lb' renamed to 'show vpp lbstate' for consistency
  with the new sync command.

Test fixtures
- validConfig and all ad-hoc config_test.go fixtures that reach the end
  of convert() now include the two required vpp.lb src addresses.
- tests/01-maglevd/maglevd-lab/maglev.yaml gains a vpp.lb section so the
  robot integration tests can still load the config.
- cmd/maglevc/tree_test.go gains expected paths for the new commands.

Docs
- config-guide.md: new 'vpp' section in the basic structure, detailed
  vpp.lb field reference, noting ipv4/ipv6 src addresses as REQUIRED
  (hard error) with no defaults; example config updated.
- user-guide.md: documented 'show vpp info', 'show vpp lbstate',
  'sync vpp lbstate [<name>]', new --vpp-api-addr and --vpp-stats-addr
  flags, the vpp-lb-conf-set log line, and corrected the pause/resume
  description to reflect that pause cancels the probe goroutine.
- debian/maglev.yaml: example config gains a vpp.lb block with src
  addresses and commented optional overrides.
2026-04-12 10:58:44 +02:00


# maglevd Configuration Guide
## Overview
`maglevd` consumes a YAML configuration file of a specific format. Validation is performed
in two stages:
1. **Structural parsing**: the YAML is unmarshalled into typed Go structs. Unknown fields and
type mismatches are rejected immediately.
1. **Semantic validation**: cross-field and cross-object rules are enforced, for example
ensuring that every backend referenced by a frontend exists, that address families are
consistent within a frontend, and that IP source addresses are the correct family.
If you want to get started quickly, take a look at the [example config](../debian/maglev.yaml).
## Basic structure
The YAML configuration file has the following top-level structure:
```yaml
maglev:
  healthchecker:
    [ Global health checker settings ]
  vpp:
    lb:
      [ VPP load-balancer integration settings ]
  healthchecks:
    my-check:
      [ Health check definition ]
  backends:
    my-backend:
      [ Backend definition ]
  frontends:
    my-frontend:
      [ Frontend (VIP) definition ]
```
All five sections live under the top-level `maglev:` key. The `healthchecks`, `backends`,
and `frontends` sections are maps keyed by an arbitrary name of your choosing. Names must be
unique within their section and are case-sensitive. The `vpp` section is required: its
`lb.ipv4-src-address` and `lb.ipv6-src-address` fields are mandatory, and `maglevd` will
refuse to start without them.
---
## healthchecker
Global settings for the health checker engine.
* ***transition-history***: An integer >= 1 that controls how many state transitions are
retained per backend for display via the gRPC API. Defaults to `5`.
* ***netns***: The name of a Linux network namespace in which probes are executed. When
empty or omitted, probes run in the current (default) network namespace. Useful when
backends are reachable only through a dedicated dataplane namespace.
Example:
```yaml
maglev:
  healthchecker:
    transition-history: 10
    netns: dataplane
```
---
## vpp
Settings controlling the integration with a locally running VPP instance. The
`vpp` section is a map with a single sub-section, `lb`. Both `lb.ipv4-src-address`
and `lb.ipv6-src-address` are **required**: `maglevd --check` exits with a
semantic error and the daemon refuses to start when either is missing, because
VPP's GRE encap needs a source address and every VIP `maglevd` programs uses GRE.
* ***lb.ipv4-src-address***: Required. The IPv4 source address VPP uses when
encapsulating IPv4 traffic into GRE4 tunnels to application servers. Must
be a valid IPv4 address. No default.
* ***lb.ipv6-src-address***: Required. The IPv6 source address VPP uses when
encapsulating IPv6 traffic into GRE6 tunnels. Must be a valid IPv6 address.
No default.
* ***lb.sync-interval***: A positive Go duration (e.g. `30s`, `1m`) controlling
how often `maglevd` reconciles the VPP load-balancer dataplane against its
running configuration. On startup, an immediate full sync runs; subsequent
syncs fire at this interval as long as the VPP connection is up. Defaults
to `30s`. The purpose is to catch drift — for example, a VIP added to VPP
by hand — and bring VPP back in line with the maglev config.
* ***lb.sticky-buckets-per-core***: The number of buckets per worker thread in
the established-flow table. Must be a power of 2. Defaults to `65536` (64k).
* ***lb.flow-timeout***: Idle time after which an established flow is removed
from the table. Must be a whole number of seconds between `1s` and `120s`
inclusive. Defaults to `40s`.
The two source addresses, `lb.sticky-buckets-per-core`, and `lb.flow-timeout` are pushed
to VPP via `lb_conf` when `maglevd` connects to VPP, and again after every config reload
whenever they change. A `vpp-lb-conf-set` log line records the effective values.
Example:
```yaml
maglev:
  vpp:
    lb:
      sync-interval: 60s
      ipv4-src-address: 10.0.0.1
      ipv6-src-address: 2001:db8::1
      sticky-buckets-per-core: 65536
      flow-timeout: 40s
```
---
## healthchecks
A named map of health check definitions. Each health check describes *how* to probe a backend.
Backends reference health checks by name. The same health check can be reused across any number
of backends; each backend is probed exactly once regardless of how many frontends reference it.
Common fields (all types):
* ***type***: Required. One of `icmp`, `tcp`, `http`, or `https`.
* ***port***: The destination port to probe. Required for `tcp`, `http`, and `https`.
Must be omitted for `icmp`.
* ***probe-ipv4-src***: An optional IPv4 source address used when probing IPv4 backends.
Must be an IPv4 address. When omitted, the OS chooses the source address.
* ***probe-ipv6-src***: An optional IPv6 source address used when probing IPv6 backends.
Must be an IPv6 address. When omitted, the OS chooses the source address.
* ***interval***: Required. A positive Go duration string (e.g. `2s`, `500ms`) controlling
how often a probe is sent when the backend is fully healthy (counter at maximum).
* ***fast-interval***: Optional. A positive duration used instead of `interval` while the
backend's health counter is degraded (between down and up) or in `unknown` state. When
omitted, `interval` is used.
* ***down-interval***: Optional. A positive duration used instead of `interval` while the
backend is fully down (counter at zero). When omitted, `interval` is used. Setting this to
a longer value reduces probe traffic to backends that are known to be offline.
* ***timeout***: Required. A positive duration after which an in-flight probe is abandoned
and counted as a failure.
* ***rise***: The number of consecutive successes required to transition from down to up.
Defaults to `2`. Must be >= 1.
* ***fall***: The number of consecutive failures required to transition from up to down.
Defaults to `3`. Must be >= 1.
### type: icmp
Sends an ICMP echo request (ping) to the backend address. Requires `CAP_NET_RAW`. No `port`
may be specified. No `params` block is used.
```yaml
healthchecks:
  ping:
    type: icmp
    probe-ipv4-src: 10.0.0.1
    probe-ipv6-src: 2001:db8::1
    interval: 2s
    timeout: 1s
    rise: 2
    fall: 3
```
### type: tcp
Opens a TCP connection to the backend and immediately closes it upon success. Use `params` to
optionally wrap the connection in TLS.
* ***params.ssl***: A boolean. When `true`, a TLS handshake is performed after the TCP
connection is established. Defaults to `false`.
* ***params.server-name***: The TLS SNI hostname sent during the handshake. When omitted,
the backend IP address is used.
* ***params.insecure-skip-verify***: A boolean. When `true`, the TLS certificate presented
by the server is not verified. Defaults to `false`.
```yaml
healthchecks:
  imaps-check:
    type: tcp
    port: 993
    params:
      ssl: true
      server-name: imaps.example.com
    interval: 5s
    timeout: 3s
    rise: 2
    fall: 3
```
### type: http / https
Opens a TCP (or TLS for `https`) connection, sends an HTTP request, and evaluates the response
code. An optional regexp can additionally match against the response body.
* ***params.path***: Required. The HTTP request path, e.g. `/healthz`.
* ***params.host***: The `Host` header value sent in the request. When omitted, the backend
IP address is used.
* ***params.response-code***: The expected HTTP response code. Can be a single value (`"200"`)
or an inclusive range (`"200-299"`). Defaults to `"200"`.
* ***params.response-regexp***: An optional Go regular expression matched against the response
body. If specified, the body must match for the probe to succeed.
* ***params.server-name***: The TLS SNI hostname (`https` only). Defaults to the value of
`params.host` if not set.
* ***params.insecure-skip-verify***: A boolean. Skip TLS certificate verification (`https`
only). Defaults to `false`.
```yaml
healthchecks:
  nginx-http:
    type: http
    port: 80
    params:
      path: /healthz
      host: nginx.example.com
      response-code: "200-204"
    interval: 2s
    fast-interval: 500ms
    down-interval: 30s
    timeout: 3s
    rise: 2
    fall: 3
  nginx-https:
    type: https
    port: 443
    params:
      path: /healthz
      host: nginx.example.com
      server-name: nginx.example.com
      insecure-skip-verify: false
    interval: 5s
    timeout: 3s
```
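The `response-code` check amounts to parsing either a single value or an inclusive range. A sketch of that parsing (not the actual `maglevd` code; `codeMatches` is a hypothetical name):

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// codeMatches reports whether code satisfies spec, where spec is
// either a single value ("200") or an inclusive range ("200-299").
func codeMatches(spec string, code int) (bool, error) {
	lo, hi, found := strings.Cut(spec, "-")
	if !found {
		hi = lo // single value: range of one
	}
	a, err := strconv.Atoi(lo)
	if err != nil {
		return false, err
	}
	b, err := strconv.Atoi(hi)
	if err != nil {
		return false, err
	}
	return code >= a && code <= b, nil
}

func main() {
	ok, _ := codeMatches("200-204", 204)
	fmt.Println(ok) // true
	ok, _ = codeMatches("200", 301)
	fmt.Println(ok) // false
}
```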
---
## backends
A named map of individual backend servers. Each backend has a single IP address and optionally
references a health check by name. Backends are probed exactly once, even if they appear in
multiple frontends.
* ***address***: Required. The IPv4 or IPv6 address of this backend server.
* ***healthcheck***: The name of a health check defined in the `healthchecks` section.
When empty or omitted, no probing is performed and the backend is assumed permanently
healthy. This is useful for backends that are always available or managed by other means.
* ***enabled***: A boolean controlling whether this backend participates in any frontend.
When `false`, the backend is excluded entirely and no probe goroutine is started.
Defaults to `true`.
Examples:
```yaml
backends:
  nginx0-ams:
    address: 198.51.100.10
    healthcheck: nginx-http
  nginx0-lon:
    address: 198.51.100.11
    healthcheck: nginx-http
  nginx0-draining:
    address: 198.51.100.12
    healthcheck: nginx-http
    enabled: false
  static-backend:
    address: 198.51.100.20
    # no healthcheck: assumed always healthy
```
---
## frontends
A named map of virtual IPs (VIPs). Each frontend ties together a listener address with an
ordered list of backend pools. The gRPC API exposes frontends by name.
* ***description***: An optional free-text string for documentation purposes.
* ***address***: Required. The IPv4 or IPv6 address of the VIP.
* ***protocol***: The IP protocol, either `tcp` or `udp`. When omitted, the frontend matches
all traffic to the VIP address regardless of protocol. If `port` is specified, `protocol`
must also be set.
* ***port***: The destination port of the VIP, an integer between 1 and 65535. Requires
`protocol` to be set. When omitted, the frontend matches all ports. Note that the
frontend port is independent of the healthcheck port: a frontend on port 443 may use
a healthcheck that probes port 80.
* ***pools***: Required. A non-empty ordered list of pool objects. Pools express priority:
the first pool is preferred; subsequent pools act as fallbacks. All backends across all
pools in a frontend must have addresses of the same address family (all IPv4 or all IPv6).
Each pool has:
* ***name***: Required. A non-empty string identifying the pool (e.g. `primary`, `fallback`).
* ***backends***: A map of backend names to per-pool backend options. Every name must refer
to an existing entry in the `backends` section.
Per-pool backend options:
* ***weight***: An integer between 0 and 100 (inclusive) expressing the relative weight of
this backend within the pool. `0` keeps the backend in the pool but assigns it no traffic.
Defaults to `100`. Weight is per-pool, not global — the same backend can appear with
different weights in different frontends.
Examples:
```yaml
frontends:
  nginx-v4-http:
    description: "IPv4 HTTP VIP with fallback"
    address: 198.51.100.1
    protocol: tcp
    port: 80
    pools:
      - name: primary
        backends:
          nginx0-ams: { weight: 10 }
          nginx0-lon: {}
      - name: fallback
        backends:
          nginx0-fra: {}
  maildrop-imaps:
    description: "IMAPS VIP"
    address: 2001:db8::1
    protocol: tcp
    port: 993
    pools:
      - name: primary
        backends:
          maildrop0-ams: {}
          maildrop0-lon: {}
```
---
For a detailed description of the health state machine, probe intervals, and all transition events,
see [healthchecks.md](healthchecks.md). For a user guide on how to use the maglev daemon and client,
see the [user-guide.md](user-guide.md).