Makefile:
- New install-deps umbrella target split into three sub-targets:
    install-deps-apt: Debian/Trixie-packaged build deps (nodejs, npm,
      protobuf-compiler, git, make, dpkg-dev, ca-certificates, curl, tar).
      Uses sudo when not already root.
    install-deps-go: ensures a Go toolchain >= GO_VERSION (go.mod floor,
      default 1.25.0). Short-circuits when the system Go is already recent
      enough; otherwise downloads the upstream tarball from go.dev/dl/
      into /usr/local/go. Trixie only ships 1.24 so this step is
      load-bearing.
    install-deps-go-tools: go install protoc-gen-go, protoc-gen-go-grpc,
      and golangci-lint/v2/cmd/golangci-lint. Then asserts the installed
      golangci-lint version parses as >= GOLANGCI_LINT_VERSION (default
      1.64.0, the floor that supports Go 1.25 syntax) to catch stale
      binaries in $GOPATH/bin before they silently run against Go 1.25
      code.
- Parser bug fixed: golangci-lint v1.x prints "has version v1.64.8" but
v2.x dropped the 'v' prefix and prints "has version 2.11.4". The
original sed regex required the 'v' and returned an empty match on
v2.x, making the assertion explode with "could not parse version
output". Fixed by switching to extended regex (sed -En) with 'v?' so
both forms parse cleanly.
- GO_VERSION and GOLANGCI_LINT_VERSION exposed as Makefile variables
so operators can override on the command line, e.g.
make install-deps GO_VERSION=1.25.5 GOLANGCI_LINT_VERSION=2.0.0
- .PHONY extended with the four new target names.
Docs:
- README.md: capability note rewritten to cover CAP_NET_RAW (ICMP) and
the new CAP_SYS_ADMIN requirement when healthchecker.netns is set,
plus a paragraph explaining that the Debian systemd unit grants both
automatically. Docker example gained a second variant that shows the
additional --cap-add SYS_ADMIN and /var/run/netns bind mount for
netns-scoped deployments. Also notes that maglevd-frontend ignores
SIGHUP so controlling-terminal disconnects don't kill it.
- docs/user-guide.md: Capabilities section rewritten as a bulleted
list covering both caps, with the EPERM error string and three
different ways to grant them (systemd unit, setcap, systemd-run);
'show vpp lb counters' command description updated to explain that
per-backend packet counts are no longer shown (LB plugin's
forwarding node bypasses ip{4,6}_lookup_inline, so /net/route/to at
the backend's FIB entry never ticks for LB-forwarded traffic); new
~75-line "What the SPA shows" subsection covering the scope
selector + maglev_scope cookie, the per-maglevd frontend cards, the
health-cascade icon table (ok / bug-buckets / primary-drained /
degraded / unknown), the lb buckets column semantics, the
maglev_zippy_open cookie, the admin-mode lifecycle dialogs with
their plain-English consequence text, and the debug panel.
- docs/config-guide.md: healthchecker.netns field gains a capability-
requirement note spelling out setns(CLONE_NEWNET), the EPERM
symptom string, and the /var/run/netns/ readability requirement.
- docs/healthchecks.md: new "Jitter" subsection explaining the +/-10%
scaling on every computed interval, and a "Probe timing while a
probe is in flight" subsection that explains why fast-interval alone
doesn't give fast fault detection against hanging backends (the
probe loop is synchronous, so each iteration is timeout +
fast-interval; the advice is to lower timeout, not fast-interval).
- docs/maglevd.8: description paragraph corrected (dropped the
per-backend stats claim and added a short note pointing at the LB
plugin forwarding-path bypass); new CAPABILITIES section between
SIGNALS and FILES covering both CAP_NET_RAW and CAP_SYS_ADMIN with
the drop-in-override hint.
- docs/maglevd-frontend.8: new SIGNALS section documenting the
explicit SIGHUP ignore (so a controlling-terminal disconnect doesn't
kill the daemon); description extended with paragraphs on the two
persistence cookies (maglev_scope, maglev_zippy_open) and on the
health-cascade icon + lb buckets column.
- docs/maglevc.1: left untouched — intentionally minimal and delegates
to docs/user-guide.md.
Lint (26 issues across 12 files, all errcheck / ineffassign / S1021):
- cmd/frontend/handlers.go: _, _ = fmt.Fprintf(...) for the SSE retry
hint and resync control-event writes.
- cmd/maglevc/commands.go: bulk-prefix every fmt.Fprintf(w, ...) with
_, _ =; also merged 'var watchEventsOptSlot *Node; ... = &Node{...}'
into a single := declaration (staticcheck S1021) — the self-
referencing pattern still works because the Children back-ref is
assigned on the next statement, not inside the struct literal.
- cmd/maglevc/complete.go: _, _ = fmt.Fprintf(ql.rl.Stderr(), ...)
for the banner and help writes; removed the ineffectual
'partial = ""' assignment (nothing downstream reads partial after
that branch, so setting it was dead code flagged by ineffassign).
- cmd/maglevc/shell.go: defer func() { _ = rl.Close() }() for the
readline instance; _, _ = fmt.Fprintf(rl.Stderr(), ...) for error
display in the REPL loop.
- cmd/maglevc/main.go: defer func() { _ = conn.Close() }() for the
gRPC client connection.
- internal/grpcapi/server_test.go: _ = conn.Close() in the test
teardown closure.
- internal/prober/http.go: _ = c.Close() in the TLS-handshake-failed
path; defer func() { _ = conn.Close() }() and defer func() { _ =
resp.Body.Close() }() for the two deferred cleanups.
- internal/prober/http_test.go: defer func() { _ = resp.Body.Close()
}() plus three _, _ = fmt.Fprint(w, ...) in the httptest.Server
handlers and _, _ = fmt.Sscanf(...) when parsing the test listener's
port.
- internal/prober/icmp.go: defer func() { _ = pc.Close() }() for the
ICMP packet conn.
- internal/prober/netns.go: defer func() { _ = origNs.Close() }(),
defer func() { _ = netns.Set(origNs) }(), defer func() { _ =
targetNs.Close() }() — also dropped a stray //nolint:errcheck that
was no longer needed once the closure wrapping handled the discard.
- internal/prober/tcp.go: _ = conn.Close() in the L4-only path,
_ = tlsConn.Close() in the failed and succeeded handshake branches,
_ = tlsConn.SetDeadline(...) (also dropped a //nolint:errcheck
previously covering it).
Iterative 'make lint' runs were needed because golangci-lint v2.x
caps same-linter reports per pass, so the first pass reported 21,
then 4, then 3, then 1, then 0. Final pass: 0 issues. make test is
green across every package, and make build produces all three
binaries cleanly.
# maglevd Configuration Guide

## Overview

`maglevd` consumes a YAML configuration file of a specific format. Validation is performed
in two stages:

1. **Structural parsing**: the YAML is unmarshalled into typed Go structs. Unknown fields and
   type mismatches are rejected immediately.
1. **Semantic validation**: cross-field and cross-object rules are enforced, for example
   ensuring that every backend referenced by a frontend exists, that address families are
   consistent within a frontend, and that IP source addresses are the correct family.

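
To make the difference between the two stages concrete, here is a hedged sketch of a file that should pass structural parsing (every key is a documented field with a plausible type) but be rejected by semantic validation. The names and addresses are invented for illustration, sections not shown are assumed to be optional, and the exact error text `maglevd` prints is not reproduced here.

```yaml
maglev:
  vpp:
    lb:
      ipv4-src-address: 10.0.0.1
      ipv6-src-address: 2001:db8::1

  backends:
    web-v4:
      address: 198.51.100.10
    web-v6:
      address: 2001:db8::10

  frontends:
    broken-vip:
      address: 198.51.100.1
      protocol: tcp
      port: 80
      pools:
        - name: primary
          backends:
            web-v4: {}
            web-v6: {}        # mixes IPv4 and IPv6 backends in one frontend
            web-missing: {}   # no such entry in the backends section
```
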
If you want to get started quickly, take a look at the [example config](../debian/maglev.yaml).

## Basic structure

The YAML configuration file has the following top-level structure:

```yaml
maglev:
  healthchecker:
    [ Global health checker settings ]

  vpp:
    lb:
      [ VPP load-balancer integration settings ]

  healthchecks:
    my-check:
      [ Health check definition ]

  backends:
    my-backend:
      [ Backend definition ]

  frontends:
    my-frontend:
      [ Frontend (VIP) definition ]
```

All five sections live under the top-level `maglev:` key. The `healthchecks`, `backends`,
and `frontends` sections are maps keyed by an arbitrary name of your choosing. Names must be
unique within their section and are case-sensitive. The `vpp` section is required when
`maglevd` has a working VPP connection: its `lb.ipv4-src-address` and `lb.ipv6-src-address`
fields are mandatory and `maglevd` will refuse to start without them.

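
As a rough illustration of how the placeholders in the skeleton might be filled in, the sketch below wires one health check, one backend, and one frontend together using only fields documented in the sections that follow. All names (`web-check`, `web0`, `web-vip`) and addresses are hypothetical and come from documentation ranges; this is not the shipped example config.

```yaml
maglev:
  healthchecker:
    transition-history: 5        # the documented default, written out explicitly

  vpp:
    lb:
      ipv4-src-address: 10.0.0.1
      ipv6-src-address: 2001:db8::1

  healthchecks:
    web-check:
      type: http
      port: 80
      params:
        path: /healthz
      interval: 2s
      timeout: 1s

  backends:
    web0:
      address: 198.51.100.10
      healthcheck: web-check     # must match a key under healthchecks

  frontends:
    web-vip:
      address: 198.51.100.1
      protocol: tcp
      port: 80
      pools:
        - name: primary
          backends:
            web0: {}             # must match a key under backends
```
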
---

## healthchecker

Global settings for the health checker engine.

* ***transition-history***: An integer >= 1 that controls how many state transitions are
  retained per backend for display via the gRPC API. Defaults to `5`.
* ***netns***: The name of a Linux network namespace in which probes are executed. When
  empty or omitted, probes run in the current (default) network namespace. Useful when
  backends are reachable only through a dedicated dataplane namespace.

  **Capability requirement**: setting this field makes `maglevd` call
  `setns(CLONE_NEWNET)` on the probe thread before each probe, which the
  kernel only permits to processes holding `CAP_SYS_ADMIN` in the target
  namespace's user namespace (`setns(2)`). The Debian systemd unit
  (`vpp-maglev.service`) already grants this capability; if you run
  `maglevd` by hand under a non-root user make sure the binary has
  `CAP_SYS_ADMIN` via `setcap cap_net_raw,cap_sys_admin=eip
  /usr/sbin/maglevd` or equivalent, otherwise every probe fails with
  `enter netns "<name>": operation not permitted` and all backends
  transition to `down` on their first probe.

  Also make sure the named namespace is mounted under `/var/run/netns/`
  (which is where `ip netns add` puts it) and that it is readable by
  the user `maglevd` runs as; the default mode from `ip netns add` is
  `0644`, which is fine for any user.

Example:
```yaml
maglev:
  healthchecker:
    transition-history: 10
    netns: dataplane
```

---

## vpp

Settings controlling the integration with a locally running VPP instance. The
`vpp` section is a map with a single sub-section, `lb`. Both `lb.ipv4-src-address`
and `lb.ipv6-src-address` are **required**: `maglevd --check` exits with a
semantic error and the daemon refuses to start when either is missing, because
VPP's GRE encap needs a source address and every VIP `maglevd` programs uses GRE.

* ***lb.ipv4-src-address***: Required. The IPv4 source address VPP uses when
  encapsulating IPv4 traffic into GRE4 tunnels to application servers. Must
  be a valid IPv4 address. No default.
* ***lb.ipv6-src-address***: Required. The IPv6 source address VPP uses when
  encapsulating IPv6 traffic into GRE6 tunnels. Must be a valid IPv6 address.
  No default.
* ***lb.sync-interval***: A positive Go duration (e.g. `30s`, `1m`) controlling
  how often `maglevd` reconciles the VPP load-balancer dataplane against its
  running configuration. On startup, an immediate full sync runs; subsequent
  syncs fire at this interval as long as the VPP connection is up. Defaults
  to `30s`. The purpose is to catch drift (for example, a VIP added to VPP
  by hand) and bring VPP back in line with the maglev config.
* ***lb.sticky-buckets-per-core***: The number of buckets per worker thread in
  the established-flow table. Must be a power of 2. Defaults to `65536` (64k).
* ***lb.flow-timeout***: Idle time after which an established flow is removed
  from the table. Must be a whole number of seconds between `1s` and `120s`
  inclusive. Defaults to `40s`.

Four of these values (the two source addresses, `sticky-buckets-per-core`, and
`flow-timeout`) are pushed to VPP via `lb_conf` when `maglevd` connects to
VPP and again after every config reload (whenever they change); `sync-interval`
is consumed by `maglevd` itself and is not part of `lb_conf`. A log line
`vpp-lb-conf-set` records the effective values.

Example:
```yaml
maglev:
  vpp:
    lb:
      sync-interval: 60s
      ipv4-src-address: 10.0.0.1
      ipv6-src-address: 2001:db8::1
      sticky-buckets-per-core: 65536
      flow-timeout: 40s
```

---

## healthchecks

A named map of health check definitions. Each health check describes *how* to probe a backend.
Backends reference health checks by name. The same health check can be reused across any number
of backends; each backend is probed exactly once regardless of how many frontends reference it.

Common fields (all types):

* ***type***: Required. One of `icmp`, `tcp`, `http`, or `https`.
* ***port***: The destination port to probe. Required for `tcp`, `http`, and `https`.
  Must be omitted for `icmp`.
* ***probe-ipv4-src***: An optional IPv4 source address used when probing IPv4 backends.
  Must be an IPv4 address. When omitted, the OS chooses the source address.
* ***probe-ipv6-src***: An optional IPv6 source address used when probing IPv6 backends.
  Must be an IPv6 address. When omitted, the OS chooses the source address.
* ***interval***: Required. A positive Go duration string (e.g. `2s`, `500ms`) controlling
  how often a probe is sent when the backend is fully healthy (counter at maximum).
* ***fast-interval***: Optional. A positive duration used instead of `interval` while the
  backend's health counter is degraded (between down and up) or in `unknown` state. When
  omitted, `interval` is used.
* ***down-interval***: Optional. A positive duration used instead of `interval` while the
  backend is fully down (counter at zero). When omitted, `interval` is used. Setting this to
  a longer value reduces probe traffic to backends that are known to be offline.
* ***timeout***: Required. A positive duration after which an in-flight probe is abandoned
  and counted as a failure.
* ***rise***: The number of consecutive successes required to transition from down to up.
  Defaults to `2`. Must be >= 1.
* ***fall***: The number of consecutive failures required to transition from up to down.
  Defaults to `3`. Must be >= 1.

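
The annotated sketch below shows how the timing fields above relate to backend state; the check name `web-check` and all values are illustrative only. Because the probe loop is synchronous, a backend that hangs (rather than refusing the connection) is re-probed roughly every `timeout` + `fast-interval`, which is about 1.5s with these values before jitter, so lowering `timeout` is what speeds up fault detection. See [healthchecks.md](healthchecks.md) for the authoritative description.

```yaml
healthchecks:
  web-check:
    type: tcp
    port: 80
    interval: 2s          # cadence while the backend is fully healthy
    fast-interval: 500ms  # cadence while degraded or in unknown state
    down-interval: 30s    # cadence while fully down
    timeout: 1s           # an in-flight probe is abandoned and counted as a failure after this
    rise: 2               # consecutive successes needed to go from down to up
    fall: 3               # consecutive failures needed to go from up to down
```
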
### type: icmp

Sends an ICMP echo request (ping) to the backend address. Requires `CAP_NET_RAW`. No `port`
may be specified. No `params` block is used.

```yaml
healthchecks:
  ping:
    type: icmp
    probe-ipv4-src: 10.0.0.1
    probe-ipv6-src: 2001:db8::1
    interval: 2s
    timeout: 1s
    rise: 2
    fall: 3
```

### type: tcp

Opens a TCP connection to the backend and immediately closes it upon success. Use `params` to
optionally wrap the connection in TLS.

* ***params.ssl***: A boolean. When `true`, a TLS handshake is performed after the TCP
  connection is established. Defaults to `false`.
* ***params.server-name***: The TLS SNI hostname sent during the handshake. When omitted,
  the backend IP address is used.
* ***params.insecure-skip-verify***: A boolean. When `true`, the TLS certificate presented
  by the server is not verified. Defaults to `false`.

```yaml
healthchecks:
  imaps-check:
    type: tcp
    port: 993
    params:
      ssl: true
      server-name: imaps.example.com
    interval: 5s
    timeout: 3s
    rise: 2
    fall: 3
```

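
For contrast with the TLS example, the sketch below shows two other plausible `tcp` checks: a plain connect with no TLS, and a TLS handshake against a server with a certificate that cannot be verified. It assumes the `params` block can be left out entirely when no TLS is wanted (consistent with `params.ssl` defaulting to `false`); the check names and ports are made up.

```yaml
healthchecks:
  smtp-check:
    type: tcp
    port: 25                       # no params block: plain TCP connect, no TLS
    interval: 5s
    timeout: 3s

  internal-tls-check:
    type: tcp
    port: 8443
    params:
      ssl: true
      insecure-skip-verify: true   # handshake is performed, certificate is not verified
    interval: 5s
    timeout: 3s
```
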
### type: http / https

Opens a TCP (or TLS for `https`) connection, sends an HTTP request, and evaluates the response
code. An optional regexp can additionally match against the response body.

* ***params.path***: Required. The HTTP request path, e.g. `/healthz`.
* ***params.host***: The `Host` header value sent in the request. When omitted, the backend
  IP address is used.
* ***params.response-code***: The expected HTTP response code. Can be a single value (`"200"`)
  or an inclusive range (`"200-299"`). Defaults to `"200"`.
* ***params.response-regexp***: An optional Go regular expression matched against the response
  body. If specified, the body must match for the probe to succeed.
* ***params.server-name***: The TLS SNI hostname (`https` only). Defaults to the value of
  `params.host` if not set.
* ***params.insecure-skip-verify***: A boolean. Skip TLS certificate verification (`https`
  only). Defaults to `false`.

```yaml
healthchecks:
  nginx-http:
    type: http
    port: 80
    params:
      path: /healthz
      host: nginx.example.com
      response-code: "200-204"
    interval: 2s
    fast-interval: 500ms
    down-interval: 30s
    timeout: 3s
    rise: 2
    fall: 3

  nginx-https:
    type: https
    port: 443
    params:
      path: /healthz
      host: nginx.example.com
      server-name: nginx.example.com
      insecure-skip-verify: false
    interval: 5s
    timeout: 3s
```

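
Neither example above uses `params.response-regexp`, so here is a hedged sketch of a check that requires both a `200` status and a body that matches a pattern. The check name, port, and the JSON shape being matched are assumptions for illustration. The pattern is single-quoted so that YAML passes the backslash through to the Go regexp engine unchanged.

```yaml
healthchecks:
  api-health:
    type: http
    port: 8080
    params:
      path: /healthz
      response-code: "200"
      response-regexp: '"status":\s*"ok"'   # body must match for the probe to succeed
    interval: 2s
    timeout: 1s
```
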
---

## backends

A named map of individual backend servers. Each backend has a single IP address and optionally
references a health check by name. Backends are probed exactly once, even if they appear in
multiple frontends.

* ***address***: Required. The IPv4 or IPv6 address of this backend server.
* ***healthcheck***: The name of a health check defined in the `healthchecks` section.
  When empty or omitted, the backend is static: no probing is performed and the backend
  enters `StateUp` immediately on startup (via a synthetic pass, rise/fall forced to 1/1).
  This is useful for backends that are always available or managed by other means. See
  [healthchecks.md](healthchecks.md) for details on the static-backend behavior.
* ***enabled***: A boolean controlling whether this backend participates in any frontend.
  When `false`, the backend is excluded entirely and no probe goroutine is started.
  Defaults to `true`.

Examples:
```yaml
backends:
  nginx0-ams:
    address: 198.51.100.10
    healthcheck: nginx-http
  nginx0-lon:
    address: 198.51.100.11
    healthcheck: nginx-http
  nginx0-draining:
    address: 198.51.100.12
    healthcheck: nginx-http
    enabled: false
  static-backend:
    address: 198.51.100.20
    # no healthcheck: assumed always healthy
```

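
The examples above are IPv4 only. As a sketch of how one health check can serve backends of both address families, the fragment below reuses the `ping` check from the `icmp` section for an IPv4 and an IPv6 backend; which probe source address applies follows from the `probe-ipv4-src` and `probe-ipv6-src` descriptions. The backend names and addresses are hypothetical.

```yaml
healthchecks:
  ping:
    type: icmp
    probe-ipv4-src: 10.0.0.1
    probe-ipv6-src: 2001:db8::1
    interval: 2s
    timeout: 1s

backends:
  web0-v4:
    address: 198.51.100.10
    healthcheck: ping          # IPv4 backend: probed from probe-ipv4-src
  web0-v6:
    address: 2001:db8::10
    healthcheck: ping          # IPv6 backend: probed from probe-ipv6-src
```
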
---

## frontends

A named map of virtual IPs (VIPs). Each frontend ties together a listener address with an
ordered list of backend pools. The gRPC API exposes frontends by name.

* ***description***: An optional free-text string for documentation purposes.
* ***address***: Required. The IPv4 or IPv6 address of the VIP.
* ***protocol***: The IP protocol, either `tcp` or `udp`. When omitted, the frontend matches
  all traffic to the VIP address regardless of protocol. If `port` is specified, `protocol`
  must also be set.
* ***port***: The destination port of the VIP, an integer between 1 and 65535. Requires
  `protocol` to be set. When omitted, the frontend matches all ports. Note that the
  frontend port is independent of the healthcheck port: a frontend on port 443 may use
  a healthcheck that probes port 80.
* ***pools***: Required. A non-empty ordered list of pool objects. Pools express priority:
  the first pool is preferred; subsequent pools act as fallbacks. When every backend in
  pool[0] leaves `StateUp` (down, paused, disabled, or not yet probed), pool[1] is
  automatically promoted: its up backends take over serving traffic. The promotion
  cascades across further tiers. See [healthchecks.md](healthchecks.md#pool-failover)
  for the full failover semantics. All backends across all pools in a frontend must
  have addresses of the same address family (all IPv4 or all IPv6).
* ***src-ip-sticky***: Boolean, default `false`. When `true`, the VPP load-balancer
  programs this VIP with source-IP-based stickiness: all flows from the same client
  source IP hash to the same backend (subject to the Maglev consistent-hash bucket
  assignment). Use this for protocols that require session affinity at the L3 level,
  or when clients open many short flows that should land on one backend. Changing this
  field in a running config and reloading causes maglevd to tear down the VIP (all
  application servers are deleted with flush, then the VIP itself is deleted) and
  recreate it with the new value; VPP has no API to mutate `src_ip_sticky` on an
  existing VIP, and existing flow state cannot be preserved across the flip.

Each pool has:

* ***name***: Required. A non-empty string identifying the pool (e.g. `primary`, `fallback`).
* ***backends***: A map of backend names to per-pool backend options. Every name must refer
  to an existing entry in the `backends` section.

Per-pool backend options:

* ***weight***: An integer between 0 and 100 (inclusive) expressing the relative weight of
  this backend within the pool. `0` keeps the backend in the pool but assigns it no traffic.
  Defaults to `100`. Weight is per-pool, not global: the same backend can appear with
  different weights in different frontends.

Examples:
```yaml
frontends:
  nginx-v4-http:
    description: "IPv4 HTTP VIP with fallback"
    address: 198.51.100.1
    protocol: tcp
    port: 80
    pools:
      - name: primary
        backends:
          nginx0-ams: { weight: 10 }
          nginx0-lon: {}
      - name: fallback
        backends:
          nginx0-fra: {}

  maildrop-imaps:
    description: "IMAPS VIP"
    address: 2001:db8::1
    protocol: tcp
    port: 993
    src-ip-sticky: true
    pools:
      - name: primary
        backends:
          maildrop0-ams: {}
          maildrop0-lon: {}
```

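
Two cases described above do not appear in the examples: a VIP with no `protocol` or `port`, and a frontend with more than two pool tiers. The sketch below illustrates both, reusing backend names from the `backends` examples; the frontend names, addresses, and port are hypothetical.

```yaml
frontends:
  any-traffic-vip:
    description: "Matches every protocol and port on the VIP"
    address: 198.51.100.2            # no protocol or port: all traffic to this address
    pools:
      - name: primary
        backends:
          nginx0-ams: {}             # weight defaults to 100
          nginx0-lon: { weight: 0 }  # kept in the pool but assigned no traffic

  tiered-vip:
    description: "Three priority tiers"
    address: 198.51.100.3
    protocol: tcp
    port: 8080
    pools:
      - name: primary
        backends:
          nginx0-ams: {}
      - name: secondary              # takes over when no primary backend is up
        backends:
          nginx0-lon: {}
      - name: last-resort            # promotion cascades to further tiers
        backends:
          static-backend: {}
```
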
---

For a detailed description of the health state machine, probe intervals, and all transition events,
see [healthchecks.md](healthchecks.md). For a user guide on how to use the maglev daemon and client,
see [user-guide.md](user-guide.md).