This commit wires maglevd through to VPP's LB plugin end-to-end, using
locally-generated GoVPP bindings for the newer v2 API messages.

VPP binapi (vendored)
- New package internal/vpp/binapi/ containing lb, lb_types, ip_types, and
interface_types, generated from a local VPP build (~/src/vpp) via a new
'make vpp-binapi' target. GoVPP v0.12.0 upstream lacks the v2 messages we
need (lb_conf_get, lb_add_del_vip_v2, lb_add_del_as_v2, lb_as_v2_dump,
lb_as_set_weight), so we commit the generated output in-tree.
- Every VPP API call made with the generated bindings is routed through our
loggedChannel wrapper; each send/receive is recorded at DEBUG via slog
(vpp-api-send / vpp-api-recv / vpp-api-send-multi / vpp-api-recv-multi) so
the full wire-level trail is auditable. NewAPIChannel is unexported; callers
must use c.apiChannel().

Read path: GetLBState{All,VIP}
- GetLBStateAll returns a full snapshot (global conf + every VIP with its
attached application servers).
- GetLBStateVIP looks up a single VIP by (prefix, protocol, port) and
returns (nil, nil) when the VIP doesn't exist in VPP. This is the
efficient path for targeted updates on a busy LB.
- Helpers factored out: getLBConf, dumpAllVIPs, dumpASesForVIP, lookupVIP,
vipFromDetails.

Write path: SyncLBState{All,VIP}
- SyncLBStateAll reconciles every configured frontend with VPP: creates
missing VIPs, removes stale ones (with AS flush), and reconciles AS
membership and weights within VIPs that exist on both sides.
- SyncLBStateVIP targets a single frontend by name. Never removes VIPs.
Returns ErrFrontendNotFound (wrapped with the name) when the frontend
isn't in config, so callers can use errors.Is.
- Shared reconcileVIP helper does the per-VIP AS diff; removeVIP is used
only by the full-sync pass.
- LbAddDelVipV2 requests always set NewFlowsTableLength=1024. The .api
default=1024 annotation is only applied by VAT/CLI parsers, not wire-
level marshalling — sending 0 caused VPP to vec_validate with mask
0xFFFFFFFF and OOM-panic.
- Pool semantics: backends in the primary (first) pool of a frontend get
their configured weight; backends in secondary pools get weight 0. All
backends are installed so higher layers can flip weights on failover
without add/remove churn.
- Every individual change emits a DEBUG slog (vpp-lbsync-vip-add/del,
vpp-lbsync-as-add/del, vpp-lbsync-as-weight). Start/done INFO logs
carry a scope=all|vip label plus aggregate counts.

Global conf push: SetLBConf
- New SetLBConf(cfg) sends lb_conf with ipv4-src, ipv6-src, sticky-buckets,
and flow-timeout. Called automatically on VPP (re)connect and after
every config reload (via doReloadConfig). Results are cached on the
Client so redundant pushes are silently skipped — only actual changes
produce a vpp-lb-conf-set INFO log line.

Periodic drift reconciliation
- vpp.Client.lbSyncLoop runs in a goroutine tied to each VPP connection's
lifetime. Its first tick is immediate (startup and post-reconnect
sync quickly); subsequent ticks fire every vpp.lb.sync-interval from
config (default 30s). Purpose: catch drift if something/someone
modifies VPP state by hand. The loop uses a ConfigSource interface
(satisfied by checker.Checker via its new Config() accessor) to avoid
an import cycle with the checker package.

Config schema additions (maglev.vpp.lb)
- sync-interval: positive Go duration, default 30s.
- ipv4-src-address: REQUIRED. Used as the outer source for GRE4 encap
to application servers. Missing this is a hard semantic error —
maglevd --check exits 2 and the daemon refuses to start. VPP GRE
needs a source address and every VIP we program uses GRE, so there
is no meaningful config without it.
- ipv6-src-address: REQUIRED. Same treatment as ipv4-src-address.
- sticky-buckets-per-core: default 65536, must be a power of 2.
- flow-timeout: default 40s, must be a whole number of seconds in [1s, 120s].
- VPP validation runs at the end of convert() so structural errors in
healthchecks/backends/frontends surface first — operators fix those,
then get the VPP-specific requirements.

gRPC API
- New GetVPPLBState RPC returning VPPLBState: global conf + VIPs with
ASes. Mirrors the read-path but strips fields irrelevant to our
GRE-only deployment (srv_type, dscp, target_port).
- New SyncVPPLBState RPC with optional frontend_name. Unset → full sync
(may remove stale VIPs). Set → single-VIP sync (never removes).
Returns codes.NotFound for unknown frontends, codes.Unavailable when
VPP integration is disabled or disconnected.

maglevc (CLI)
- New 'show vpp lbstate' command displaying the LB plugin state. VPP-only
fields irrelevant to the GRE dataplane are suppressed. Per-AS lines use
a key-value format ("address X weight Y flow-table-buckets Z")
instead of a tabwriter column, which avoids the ANSI-color alignment
issue we hit with mixed label/data rows.
- New 'sync vpp lbstate [<name>]' command. Without a name, triggers a
full reconciliation; with a name, targets one frontend.
- Previous 'show vpp lb' renamed to 'show vpp lbstate' for consistency
with the new sync command.

Test fixtures
- validConfig and all ad-hoc config_test.go fixtures that reach the end
of convert() now include the two required vpp.lb src addresses.
- tests/01-maglevd/maglevd-lab/maglev.yaml gains a vpp.lb section so the
robot integration tests can still load the config.
- cmd/maglevc/tree_test.go gains expected paths for the new commands.

Docs
- config-guide.md: new 'vpp' section in the basic structure, detailed
vpp.lb field reference, noting ipv4/ipv6 src addresses as REQUIRED
(hard error) with no defaults; example config updated.
- user-guide.md: documented 'show vpp info', 'show vpp lbstate',
'sync vpp lbstate [<name>]', new --vpp-api-addr and --vpp-stats-addr
flags, the vpp-lb-conf-set log line, and corrected the pause/resume
description to reflect that pause cancels the probe goroutine.
- debian/maglev.yaml: example config gains a vpp.lb block with src
addresses and commented optional overrides.

# maglevd Configuration Guide

## Overview

`maglevd` consumes a YAML configuration file of a specific format. Validation is performed
in two stages:

1. **Structural parsing**: the YAML is unmarshalled into typed Go structs. Unknown fields and
   type mismatches are rejected immediately.
1. **Semantic validation**: cross-field and cross-object rules are enforced, for example
   ensuring that every backend referenced by a frontend exists, that address families are
   consistent within a frontend, and that IP source addresses are the correct family.

If you want to get started quickly, take a look at the [example config](../debian/maglev.yaml).

## Basic structure

The YAML configuration file has the following top-level structure:

```yaml
maglev:
  healthchecker:
    [ Global health checker settings ]

  vpp:
    lb:
      [ VPP load-balancer integration settings ]

  healthchecks:
    my-check:
      [ Health check definition ]

  backends:
    my-backend:
      [ Backend definition ]

  frontends:
    my-frontend:
      [ Frontend (VIP) definition ]
```

All five sections live under the top-level `maglev:` key. The `healthchecks`, `backends`,
and `frontends` sections are maps keyed by an arbitrary name of your choosing. Names must be
unique within their section and are case-sensitive. The `vpp` section is required: its
`lb.ipv4-src-address` and `lb.ipv6-src-address` fields are mandatory, and `maglevd` refuses
to start (and `maglevd --check` fails) without them.

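The YAML is unmarshalled into typed Go structs, so the layout above could map onto types roughly like the ones below. This is a minimal sketch, not `maglevd`'s actual config package. All type and field names are hypothetical, and only the `yaml` tag strings come from the keys documented in this guide. Most sections are trimmed to a few fields. Pointer fields are one assumed way to tell an omitted value, which takes the default, from an explicit one.

```go
package main

// Config is a hypothetical root type; everything lives under `maglev:`.
type Config struct {
	Maglev Maglev `yaml:"maglev"`
}

// Maglev holds the five top-level sections described in this guide.
type Maglev struct {
	HealthChecker HealthCheckerConf          `yaml:"healthchecker"`
	VPP           VPPConf                    `yaml:"vpp"`
	HealthChecks  map[string]HealthCheckConf `yaml:"healthchecks"`
	Backends      map[string]BackendConf     `yaml:"backends"`
	Frontends     map[string]FrontendConf    `yaml:"frontends"`
}

type HealthCheckerConf struct {
	TransitionHistory int    `yaml:"transition-history"`
	Netns             string `yaml:"netns"`
}

type VPPConf struct {
	LB LBConf `yaml:"lb"`
}

type LBConf struct {
	SyncInterval         string `yaml:"sync-interval"` // a Go duration string
	IPv4SrcAddress       string `yaml:"ipv4-src-address"`
	IPv6SrcAddress       string `yaml:"ipv6-src-address"`
	StickyBucketsPerCore int    `yaml:"sticky-buckets-per-core"`
	FlowTimeout          string `yaml:"flow-timeout"`
}

type HealthCheckConf struct {
	Type     string `yaml:"type"`
	Port     int    `yaml:"port"`
	Interval string `yaml:"interval"`
	Timeout  string `yaml:"timeout"`
}

type BackendConf struct {
	Address     string `yaml:"address"`
	HealthCheck string `yaml:"healthcheck"`
	Enabled     *bool  `yaml:"enabled"` // nil when omitted; the default is true
}

type PoolBackendConf struct {
	Weight *int `yaml:"weight"` // nil when omitted; the default is 100
}

type PoolConf struct {
	Name     string                     `yaml:"name"`
	Backends map[string]PoolBackendConf `yaml:"backends"`
}

type FrontendConf struct {
	Description string     `yaml:"description"`
	Address     string     `yaml:"address"`
	Protocol    string     `yaml:"protocol"`
	Port        int        `yaml:"port"`
	Pools       []PoolConf `yaml:"pools"` // ordered: the first pool is preferred
}
```
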

---

## healthchecker

Global settings for the health checker engine.

* ***transition-history***: An integer >= 1 that controls how many state transitions are
  retained per backend for display via the gRPC API. Defaults to `5`.
* ***netns***: The name of a Linux network namespace in which probes are executed. When
  empty or omitted, probes run in the current (default) network namespace. Useful when
  backends are reachable only through a dedicated dataplane namespace.

Example:

```yaml
maglev:
  healthchecker:
    transition-history: 10
    netns: dataplane
```

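`transition-history` bounds how many transitions are kept per backend. A minimal Go sketch of such a bounded history, keeping only the most recent N entries, is shown below. The type and function names are hypothetical, and transitions are modelled as plain strings. This guide does not describe `maglevd`'s own history type.

```go
package main

import "errors"

// transitionHistory keeps the most recent transitions for one backend,
// oldest first, never holding more than max entries.
type transitionHistory struct {
	max   int
	items []string
}

// newTransitionHistory applies the documented rules: nil (omitted) means
// the default of 5, and an explicit value must be >= 1.
func newTransitionHistory(n *int) (*transitionHistory, error) {
	size := 5
	if n != nil {
		if *n < 1 {
			return nil, errors.New("transition-history must be >= 1")
		}
		size = *n
	}
	return &transitionHistory{max: size}, nil
}

// add records a transition, evicting the oldest one once the history is full.
func (h *transitionHistory) add(t string) {
	if len(h.items) == h.max {
		copy(h.items, h.items[1:])
		h.items[len(h.items)-1] = t
		return
	}
	h.items = append(h.items, t)
}
```
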

---

## vpp

Settings controlling the integration with a locally running VPP instance. The
`vpp` section is a map with a single sub-section, `lb`. Both `lb.ipv4-src-address`
and `lb.ipv6-src-address` are **required**. When either is missing, `maglevd --check`
exits with status 2 (a semantic error) and the daemon refuses to start. The reason is
that VPP's GRE encap needs a source address, and every VIP `maglevd` programs uses GRE.

* ***lb.ipv4-src-address***: Required. The outer IPv4 source address VPP uses for GRE4
  encapsulation towards IPv4 application servers. Must be a valid IPv4 address.
  No default.
* ***lb.ipv6-src-address***: Required. The outer IPv6 source address VPP uses for GRE6
  encapsulation towards IPv6 application servers. Must be a valid IPv6 address.
  No default.
* ***lb.sync-interval***: A positive Go duration (e.g. `30s`, `1m`) controlling
  how often `maglevd` reconciles the VPP load-balancer dataplane against its
  running configuration. On startup and after every VPP reconnect, a full sync runs
  immediately. Subsequent syncs fire at this interval for as long as the VPP
  connection is up. Defaults to `30s`. The purpose is to catch drift (for example,
  a VIP added to VPP by hand) and bring VPP back in line with the maglev config.
* ***lb.sticky-buckets-per-core***: The number of buckets per worker thread in
  the established-flow table. Must be a power of 2. Defaults to `65536` (64k).
* ***lb.flow-timeout***: Idle time after which an established flow is removed
  from the table. Must be a whole number of seconds between `1s` and `120s`
  inclusive. Defaults to `40s`.

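The two source addresses, `sticky-buckets-per-core`, and `flow-timeout` are pushed to VPP
via `lb_conf` when `maglevd` connects (or reconnects) to VPP, and again after every config
reload. `sync-interval` is only used by `maglevd` itself. Pushes that would not change
anything are skipped, so the `vpp-lb-conf-set` log line, which records the effective values,
appears only when they actually change.
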
Example:

```yaml
maglev:
  vpp:
    lb:
      sync-interval: 60s
      ipv4-src-address: 10.0.0.1
      ipv6-src-address: 2001:db8::1
      sticky-buckets-per-core: 65536
      flow-timeout: 40s
```

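The `vpp.lb` rules above can be condensed into one validation function, and the example configuration passes it unchanged. This is a minimal sketch, not `maglevd`'s implementation. `lbSettings` and `validateLB` are hypothetical names. An empty or zero field is treated as omitted, so an explicit `0` cannot be told apart from a missing value. Addresses are parsed with Go's `net/netip` and durations with `time.ParseDuration`.

```go
package main

import (
	"errors"
	"fmt"
	"net/netip"
	"time"
)

// lbSettings is a hypothetical decoded form of maglev.vpp.lb.
// Empty strings and zero values stand for "omitted".
type lbSettings struct {
	SyncInterval         string
	IPv4SrcAddress       string
	IPv6SrcAddress       string
	StickyBucketsPerCore int
	FlowTimeout          string
}

// validateLB applies the documented defaults and rules, returning the
// completed settings or the first error found.
func validateLB(s lbSettings) (lbSettings, error) {
	// Both GRE source addresses are mandatory and have no default.
	if s.IPv4SrcAddress == "" || s.IPv6SrcAddress == "" {
		return s, errors.New("vpp.lb: ipv4-src-address and ipv6-src-address are required")
	}
	if a, err := netip.ParseAddr(s.IPv4SrcAddress); err != nil || !a.Is4() {
		return s, fmt.Errorf("vpp.lb.ipv4-src-address %q: not an IPv4 address", s.IPv4SrcAddress)
	}
	if a, err := netip.ParseAddr(s.IPv6SrcAddress); err != nil || !a.Is6() {
		return s, fmt.Errorf("vpp.lb.ipv6-src-address %q: not an IPv6 address", s.IPv6SrcAddress)
	}

	if s.SyncInterval == "" {
		s.SyncInterval = "30s"
	}
	if d, err := time.ParseDuration(s.SyncInterval); err != nil || d <= 0 {
		return s, fmt.Errorf("vpp.lb.sync-interval %q: must be a positive duration", s.SyncInterval)
	}

	if s.StickyBucketsPerCore == 0 {
		s.StickyBucketsPerCore = 65536
	}
	// A positive n is a power of two exactly when it has a single bit set.
	if n := s.StickyBucketsPerCore; n <= 0 || n&(n-1) != 0 {
		return s, fmt.Errorf("vpp.lb.sticky-buckets-per-core %d: must be a power of 2", n)
	}

	if s.FlowTimeout == "" {
		s.FlowTimeout = "40s"
	}
	ft, err := time.ParseDuration(s.FlowTimeout)
	if err != nil || ft%time.Second != 0 || ft < time.Second || ft > 120*time.Second {
		return s, fmt.Errorf("vpp.lb.flow-timeout %q: must be whole seconds in [1s, 120s]", s.FlowTimeout)
	}
	return s, nil
}
```
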

---

## healthchecks

A named map of health check definitions. Each health check describes *how* to probe a backend.
Backends reference health checks by name. The same health check can be reused across any number
of backends; each backend is probed exactly once regardless of how many frontends reference it.

Common fields (all types):

* ***type***: Required. One of `icmp`, `tcp`, `http`, or `https`.
* ***port***: The destination port to probe. Required for `tcp`, `http`, and `https`.
  Must be omitted for `icmp`.
* ***probe-ipv4-src***: An optional IPv4 source address used when probing IPv4 backends.
  Must be an IPv4 address. When omitted, the OS chooses the source address.
* ***probe-ipv6-src***: An optional IPv6 source address used when probing IPv6 backends.
  Must be an IPv6 address. When omitted, the OS chooses the source address.
* ***interval***: Required. A positive Go duration string (e.g. `2s`, `500ms`) controlling
  how often a probe is sent when the backend is fully healthy (counter at maximum).
* ***fast-interval***: Optional. A positive duration used instead of `interval` while the
  backend's health counter is degraded (between down and up) or in `unknown` state. When
  omitted, `interval` is used.
* ***down-interval***: Optional. A positive duration used instead of `interval` while the
  backend is fully down (counter at zero). When omitted, `interval` is used. Setting this to
  a longer value reduces probe traffic to backends that are known to be offline.
* ***timeout***: Required. A positive duration after which an in-flight probe is abandoned
  and counted as a failure.
* ***rise***: The number of consecutive successes required to transition from down to up.
  Defaults to `2`. Must be >= 1.
* ***fall***: The number of consecutive failures required to transition from up to down.
  Defaults to `3`. Must be >= 1.

### type: icmp

Sends an ICMP echo request (ping) to the backend address. Requires `CAP_NET_RAW`. No `port`
may be specified. No `params` block is used.

```yaml
healthchecks:
  ping:
    type: icmp
    probe-ipv4-src: 10.0.0.1
    probe-ipv6-src: 2001:db8::1
    interval: 2s
    timeout: 1s
    rise: 2
    fall: 3
```

### type: tcp

Opens a TCP connection to the backend and immediately closes it upon success. Use `params` to
optionally wrap the connection in TLS.

* ***params.ssl***: A boolean. When `true`, a TLS handshake is performed after the TCP
  connection is established. Defaults to `false`.
* ***params.server-name***: The TLS SNI hostname sent during the handshake. When omitted,
  the backend IP address is used.
* ***params.insecure-skip-verify***: A boolean. When `true`, the TLS certificate presented
  by the server is not verified. Defaults to `false`.

```yaml
healthchecks:
  imaps-check:
    type: tcp
    port: 993
    params:
      ssl: true
      server-name: imaps.example.com
    interval: 5s
    timeout: 3s
    rise: 2
    fall: 3
```

### type: http / https

Opens a TCP (or TLS for `https`) connection, sends an HTTP request, and evaluates the response
code. An optional regexp can additionally match against the response body.

* ***params.path***: Required. The HTTP request path, e.g. `/healthz`.
* ***params.host***: The `Host` header value sent in the request. When omitted, the backend
  IP address is used.
* ***params.response-code***: The expected HTTP response code. Can be a single value (`"200"`)
  or an inclusive range (`"200-299"`). Defaults to `"200"`.
* ***params.response-regexp***: An optional Go regular expression matched against the response
  body. If specified, the body must match for the probe to succeed.
* ***params.server-name***: The TLS SNI hostname (`https` only). Defaults to the value of
  `params.host` if not set.
* ***params.insecure-skip-verify***: A boolean. Skip TLS certificate verification (`https`
  only). Defaults to `false`.

```yaml
healthchecks:
  nginx-http:
    type: http
    port: 80
    params:
      path: /healthz
      host: nginx.example.com
      response-code: "200-204"
    interval: 2s
    fast-interval: 500ms
    down-interval: 30s
    timeout: 3s
    rise: 2
    fall: 3

  nginx-https:
    type: https
    port: 443
    params:
      path: /healthz
      host: nginx.example.com
      server-name: nginx.example.com
      insecure-skip-verify: false
    interval: 5s
    timeout: 3s
```


---

## backends

A named map of individual backend servers. Each backend has a single IP address and optionally
references a health check by name. Backends are probed exactly once, even if they appear in
multiple frontends.

* ***address***: Required. The IPv4 or IPv6 address of this backend server.
* ***healthcheck***: The name of a health check defined in the `healthchecks` section.
  When empty or omitted, no probing is performed and the backend is assumed permanently
  healthy. This is useful for backends that are always available or managed by other means.
* ***enabled***: A boolean controlling whether this backend participates in any frontend.
  When `false`, the backend is excluded entirely and no probe goroutine is started.
  Defaults to `true`.

Examples:

```yaml
backends:
  nginx0-ams:
    address: 198.51.100.10
    healthcheck: nginx-http
  nginx0-lon:
    address: 198.51.100.11
    healthcheck: nginx-http
  nginx0-draining:
    address: 198.51.100.12
    healthcheck: nginx-http
    enabled: false
  static-backend:
    address: 198.51.100.20
    # no healthcheck: assumed always healthy
```

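The three fields above decide which backends get a probe. The sketch below uses the example backends as input. `backend` and `probePlan` are hypothetical names, and `Enabled` is a pointer so that an omitted field can take the default of `true`. The sketch ignores whether a backend is actually referenced by a frontend.

```go
package main

import "sort"

type backend struct {
	Address     string
	HealthCheck string
	Enabled     *bool // nil when omitted; the documented default is true
}

func (b backend) enabled() bool { return b.Enabled == nil || *b.Enabled }

// probePlan splits backends into those that get a probe goroutine and those
// assumed permanently healthy; disabled backends appear in neither list.
func probePlan(backends map[string]backend) (probed, assumedHealthy []string) {
	for name, b := range backends {
		switch {
		case !b.enabled():
			// Excluded entirely: no probe, no frontend participation.
		case b.HealthCheck == "":
			assumedHealthy = append(assumedHealthy, name)
		default:
			probed = append(probed, name)
		}
	}
	// Map iteration order is random; sort for stable output.
	sort.Strings(probed)
	sort.Strings(assumedHealthy)
	return probed, assumedHealthy
}
```
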

---

## frontends

A named map of virtual IPs (VIPs). Each frontend ties together a listener address with an
ordered list of backend pools. The gRPC API exposes frontends by name.

* ***description***: An optional free-text string for documentation purposes.
* ***address***: Required. The IPv4 or IPv6 address of the VIP.
* ***protocol***: The IP protocol, either `tcp` or `udp`. When omitted, the frontend matches
  all traffic to the VIP address regardless of protocol. If `port` is specified, `protocol`
  must also be set.
* ***port***: The destination port of the VIP, an integer between 1 and 65535. Requires
  `protocol` to be set. When omitted, the frontend matches all ports. Note that the
  frontend port is independent of the healthcheck port: a frontend on port 443 may use
  a healthcheck that probes port 80.
* ***pools***: Required. A non-empty ordered list of pool objects. Pools express priority:
  the first pool is preferred; subsequent pools act as fallbacks. All backends across all
  pools in a frontend must have addresses of the same address family (all IPv4 or all IPv6).

Each pool has:

* ***name***: Required. A non-empty string identifying the pool (e.g. `primary`, `fallback`).
* ***backends***: A map of backend names to per-pool backend options. Every name must refer
  to an existing entry in the `backends` section.

Per-pool backend options:

* ***weight***: An integer between 0 and 100 (inclusive) expressing the relative weight of
  this backend within the pool. `0` keeps the backend in the pool but assigns it no traffic.
  Defaults to `100`. Weight belongs to the pool membership, not to the backend itself: the
  same backend can appear with different weights in different pools or frontends.

Examples:

```yaml
frontends:
  nginx-v4-http:
    description: "IPv4 HTTP VIP with fallback"
    address: 198.51.100.1
    protocol: tcp
    port: 80
    pools:
      - name: primary
        backends:
          nginx0-ams: { weight: 10 }
          nginx0-lon: {}
      - name: fallback
        backends:
          nginx0-fra: {}

  maildrop-imaps:
    description: "IMAPS VIP"
    address: 2001:db8::1
    protocol: tcp
    port: 993
    pools:
      - name: primary
        backends:
          maildrop0-ams: {}
          maildrop0-lon: {}
```

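When frontends are programmed into VPP's load-balancer, every backend of every pool is installed as an application server, but only the first pool carries traffic. Backends in the primary pool get their configured weight, and backends in later pools get weight 0. A failover can then flip weights without adding or removing servers. The hypothetical Go sketch below applies that mapping to the `nginx-v4-http` example, with the `weight` default of 100 filled in. The names are invented. When a backend appears in more than one pool, the sketch assumes the earliest pool's weight wins.

```go
package main

// poolSpec is a hypothetical decoded pool; weights already default to 100.
type poolSpec struct {
	Name     string
	Backends map[string]int
}

// vppWeights returns the weight each backend would be installed with.
func vppWeights(pools []poolSpec) map[string]int {
	weights := make(map[string]int)
	for i, p := range pools {
		for name, w := range p.Backends {
			if _, done := weights[name]; done {
				continue // assumption: the earliest pool listing a backend wins
			}
			if i == 0 {
				weights[name] = w // primary pool: configured weight
			} else {
				weights[name] = 0 // fallback pools: installed but idle
			}
		}
	}
	return weights
}
```
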

---

For a detailed description of the health state machine, probe intervals, and all transition events,
see [healthchecks.md](healthchecks.md). For a user guide on how to use the maglev daemon and client,
see [user-guide.md](user-guide.md).