New feature: per-VIP / per-backend runtime counters

* New GetVPPLBCounters RPC serving an in-process snapshot refreshed
  by a 5s scrape loop (internal/vpp/lbstats.go). Each cycle pulls
  the LB plugin's four SimpleCounters (next, first, untracked,
  no-server) plus the FIB /net/route/to CombinedCounter for every
  VIP and every backend host prefix via a single DumpStats call.
* FIB stats-index discovery via ip_route_lookup
  (internal/vpp/fibstats.go); per-worker reduction happens in the
  collector.
* Prometheus collector exports vip_packets_total (kind label),
  vip_route_{packets,bytes}_total, and
  backend_route_{packets,bytes}_total. Metrics source interface
  extended with VIPStats / BackendRouteStats; vpp.Client publishes
  snapshots via atomic.Pointer and clears them on disconnect.
* New 'show vpp lb counters' CLI command. The 'show vpp lbstate'
  and 'sync vpp lbstate' commands are restructured under 'show
  vpp lb {state,counters}' / 'sync vpp lb state' to make room
  for the new verb.

New feature: src-ip-sticky frontends

* New frontend YAML key 'src-ip-sticky' (bool). Plumbed through
  config.Frontend, desiredVIP, and the lb_add_del_vip_v2 call.
* Reflected in gRPC FrontendInfo.src_ip_sticky and
  VPPLBVIP.src_ip_sticky, and shown in 'show vpp lb state' output.
* Scraped back from VPP by parsing 'show lb vips verbose' through
  cli_inband, because lb_vip_details does not expose the flag. The
  same scrape also recovers the LB pool index for each VIP, which
  the stats-segment counters are keyed on. This is a documented
  temporary workaround until VPP ships an lb_vip_v2_dump.
* src_ip_sticky cannot be mutated on a live VIP, so a flipped flag
  triggers a tear-down-and-recreate in reconcileVIP (ASes deleted
  with flush, VIP deleted, then re-added). The flip is logged.

New feature: frontend state aggregation and events

* New health.FrontendState (unknown/up/down) and FrontendTransition
  types. A frontend is 'up' iff at least one backend has a nonzero
  effective weight, 'unknown' iff no backend has real state yet,
  and 'down' otherwise.
* Checker tracks per-frontend aggregate state, recomputing after
  each backend transition and emitting a frontend-transition Event
  on change. Reload drops entries for removed frontends.
* checker.Event gains an optional FrontendTransition pointer;
  backend- vs. frontend-transition events are demultiplexed on
  that field.
* WatchEvents now sends an initial snapshot of frontend state on
  connect (mirroring the existing backend snapshot), subscribes
  once to the checker stream, and fans out to backend/frontend
  handlers based on the client's filter flags. The proto
  FrontendEvent message grows name + transition fields.
* New Checker.FrontendState accessor.

Refactor: pure health helpers

* Moved the priority-failover selector and the (pool idx, active
  pool, state, cfg weight) → (vpp weight, flush) mapping out of
  internal/vpp/lbsync.go into a new internal/health/weights.go so
  the checker can reuse them for frontend-state computation
  without importing internal/vpp.
* New functions: health.ActivePoolIndex, BackendEffectiveWeight,
  EffectiveWeights, ComputeFrontendState. lbsync.go now calls
  these directly; vpp.EffectiveWeights is a thin wrapper over
  health.EffectiveWeights retained for the gRPC observability
  path. Fully unit-tested in internal/health/weights_test.go.

maglevc polish

* --color default is now mode-aware: on in the interactive shell,
  off in one-shot mode so piped output is script-safe. Explicit
  --color=true/false still overrides.
* New stripHostMask helper drops /32 and /128 from VIP display;
  non-host prefixes pass through unchanged.
* Counter table column order fixed (first before next) and
  packets/bytes columns renamed to fib-packets/fib-bytes to
  clarify they come from the FIB, not the LB plugin.

Docs

* config-guide: document src-ip-sticky, including the VIP
  recreate-on-change caveat.
* user-guide, maglevc.1, maglevd.8: updated command tree, new
  counters command, color defaults, and the src-ip-sticky field.

350 lines
13 KiB
Markdown
# maglevd Configuration Guide

## Overview

`maglevd` consumes a YAML configuration file of a specific format. Validation is performed
in two stages:

1. **Structural parsing**: the YAML is unmarshalled into typed Go structs. Unknown fields and
   type mismatches are rejected immediately.
1. **Semantic validation**: cross-field and cross-object rules are enforced, for example
   ensuring that every backend referenced by a frontend exists, that address families are
   consistent within a frontend, and that IP source addresses are the correct family.

If you want to get started quickly, take a look at the [example config](../debian/maglev.yaml).

## Basic structure

The YAML configuration file has the following top-level structure:

```yaml
maglev:
  healthchecker:
    [ Global health checker settings ]

  vpp:
    lb:
      [ VPP load-balancer integration settings ]

  healthchecks:
    my-check:
      [ Health check definition ]

  backends:
    my-backend:
      [ Backend definition ]

  frontends:
    my-frontend:
      [ Frontend (VIP) definition ]
```

All five sections live under the top-level `maglev:` key. The `healthchecks`, `backends`,
and `frontends` sections are maps keyed by an arbitrary name of your choosing. Names must be
unique within their section and are case-sensitive. The `vpp` section is required when
`maglevd` has a working VPP connection: its `lb.ipv4-src-address` and `lb.ipv6-src-address`
fields are mandatory and `maglevd` will refuse to start without them.

---

## healthchecker

Global settings for the health checker engine.

* ***transition-history***: An integer >= 1 that controls how many state transitions are
  retained per backend for display via the gRPC API. Defaults to `5`.
* ***netns***: The name of a Linux network namespace in which probes are executed. When
  empty or omitted, probes run in the current (default) network namespace. Useful when
  backends are reachable only through a dedicated dataplane namespace.

Example:
```yaml
maglev:
  healthchecker:
    transition-history: 10
    netns: dataplane
```

---

## vpp

Settings controlling the integration with a locally running VPP instance. The
`vpp` section is a map with a single sub-section, `lb`. Both `lb.ipv4-src-address`
and `lb.ipv6-src-address` are **required**: `maglevd --check` exits with a
semantic error and the daemon refuses to start when either is missing, because
VPP's GRE encap needs a source address and every VIP `maglevd` programs uses GRE.

* ***lb.ipv4-src-address***: Required. The IPv4 source address VPP uses when
  encapsulating IPv4 traffic into GRE4 tunnels to application servers. Must
  be a valid IPv4 address. No default.
* ***lb.ipv6-src-address***: Required. The IPv6 source address VPP uses when
  encapsulating IPv6 traffic into GRE6 tunnels. Must be a valid IPv6 address.
  No default.
* ***lb.sync-interval***: A positive Go duration (e.g. `30s`, `1m`) controlling
  how often `maglevd` reconciles the VPP load-balancer dataplane against its
  running configuration. On startup, an immediate full sync runs; subsequent
  syncs fire at this interval as long as the VPP connection is up. Defaults
  to `30s`. The purpose is to catch drift (for example, a VIP added to VPP
  by hand) and bring VPP back in line with the maglev config.
* ***lb.sticky-buckets-per-core***: The number of buckets per worker thread in
  the established-flow table. Must be a power of 2. Defaults to `65536` (64k).
* ***lb.flow-timeout***: Idle time after which an established flow is removed
  from the table. Must be a whole number of seconds between `1s` and `120s`
  inclusive. Defaults to `40s`.

Four of these values (the two source addresses, `sticky-buckets-per-core`, and
`flow-timeout`) are pushed to VPP via `lb_conf` when `maglevd` connects to
VPP and again after every config reload (whenever they change). A log line
`vpp-lb-conf-set` records the effective values.

Example:
```yaml
maglev:
  vpp:
    lb:
      sync-interval: 60s
      ipv4-src-address: 10.0.0.1
      ipv6-src-address: 2001:db8::1
      sticky-buckets-per-core: 65536
      flow-timeout: 40s
```

---

## healthchecks

A named map of health check definitions. Each health check describes *how* to probe a backend.
Backends reference health checks by name. The same health check can be reused across any number
of backends; each backend is probed exactly once regardless of how many frontends reference it.

Common fields (all types):

* ***type***: Required. One of `icmp`, `tcp`, `http`, or `https`.
* ***port***: The destination port to probe. Required for `tcp`, `http`, and `https`.
  Must be omitted for `icmp`.
* ***probe-ipv4-src***: An optional IPv4 source address used when probing IPv4 backends.
  Must be an IPv4 address. When omitted, the OS chooses the source address.
* ***probe-ipv6-src***: An optional IPv6 source address used when probing IPv6 backends.
  Must be an IPv6 address. When omitted, the OS chooses the source address.
* ***interval***: Required. A positive Go duration string (e.g. `2s`, `500ms`) controlling
  how often a probe is sent when the backend is fully healthy (counter at maximum).
* ***fast-interval***: Optional. A positive duration used instead of `interval` while the
  backend's health counter is degraded (between down and up) or in `unknown` state. When
  omitted, `interval` is used.
* ***down-interval***: Optional. A positive duration used instead of `interval` while the
  backend is fully down (counter at zero). When omitted, `interval` is used. Setting this to
  a longer value reduces probe traffic to backends that are known to be offline.
* ***timeout***: Required. A positive duration after which an in-flight probe is abandoned
  and counted as a failure.
* ***rise***: The number of consecutive successes required to transition from down to up.
  Defaults to `2`. Must be >= 1.
* ***fall***: The number of consecutive failures required to transition from up to down.
  Defaults to `3`. Must be >= 1.

### type: icmp

Sends an ICMP echo request (ping) to the backend address. Requires `CAP_NET_RAW`. No `port`
may be specified. No `params` block is used.

```yaml
healthchecks:
  ping:
    type: icmp
    probe-ipv4-src: 10.0.0.1
    probe-ipv6-src: 2001:db8::1
    interval: 2s
    timeout: 1s
    rise: 2
    fall: 3
```

### type: tcp

Opens a TCP connection to the backend and immediately closes it upon success. Use `params` to
optionally wrap the connection in TLS.

* ***params.ssl***: A boolean. When `true`, a TLS handshake is performed after the TCP
  connection is established. Defaults to `false`.
* ***params.server-name***: The TLS SNI hostname sent during the handshake. When omitted,
  the backend IP address is used.
* ***params.insecure-skip-verify***: A boolean. When `true`, the TLS certificate presented
  by the server is not verified. Defaults to `false`.

```yaml
healthchecks:
  imaps-check:
    type: tcp
    port: 993
    params:
      ssl: true
      server-name: imaps.example.com
    interval: 5s
    timeout: 3s
    rise: 2
    fall: 3
```

### type: http / https

Opens a TCP (or TLS for `https`) connection, sends an HTTP request, and evaluates the response
code. An optional regexp can additionally match against the response body.

* ***params.path***: Required. The HTTP request path, e.g. `/healthz`.
* ***params.host***: The `Host` header value sent in the request. When omitted, the backend
  IP address is used.
* ***params.response-code***: The expected HTTP response code. Can be a single value (`"200"`)
  or an inclusive range (`"200-299"`). Defaults to `"200"`.
* ***params.response-regexp***: An optional Go regular expression matched against the response
  body. If specified, the body must match for the probe to succeed.
* ***params.server-name***: The TLS SNI hostname (`https` only). Defaults to the value of
  `params.host` if not set.
* ***params.insecure-skip-verify***: A boolean. Skip TLS certificate verification (`https`
  only). Defaults to `false`.

```yaml
healthchecks:
  nginx-http:
    type: http
    port: 80
    params:
      path: /healthz
      host: nginx.example.com
      response-code: "200-204"
    interval: 2s
    fast-interval: 500ms
    down-interval: 30s
    timeout: 3s
    rise: 2
    fall: 3

  nginx-https:
    type: https
    port: 443
    params:
      path: /healthz
      host: nginx.example.com
      server-name: nginx.example.com
      insecure-skip-verify: false
    interval: 5s
    timeout: 3s
```

---

## backends

A named map of individual backend servers. Each backend has a single IP address and optionally
references a health check by name. Backends are probed exactly once, even if they appear in
multiple frontends.

* ***address***: Required. The IPv4 or IPv6 address of this backend server.
* ***healthcheck***: The name of a health check defined in the `healthchecks` section.
  When empty or omitted, the backend is static: no probing is performed and the backend
  enters `StateUp` immediately on startup (via a synthetic pass, rise/fall forced to 1/1).
  This is useful for backends that are always available or managed by other means. See
  [healthchecks.md](healthchecks.md) for details on the static-backend behavior.
* ***enabled***: A boolean controlling whether this backend participates in any frontend.
  When `false`, the backend is excluded entirely and no probe goroutine is started.
  Defaults to `true`.

Examples:
```yaml
backends:
  nginx0-ams:
    address: 198.51.100.10
    healthcheck: nginx-http
  nginx0-lon:
    address: 198.51.100.11
    healthcheck: nginx-http
  nginx0-draining:
    address: 198.51.100.12
    healthcheck: nginx-http
    enabled: false
  static-backend:
    address: 198.51.100.20
    # no healthcheck: assumed always healthy
```

---

## frontends

A named map of virtual IPs (VIPs). Each frontend ties together a listener address with an
ordered list of backend pools. The gRPC API exposes frontends by name.

* ***description***: An optional free-text string for documentation purposes.
* ***address***: Required. The IPv4 or IPv6 address of the VIP.
* ***protocol***: The IP protocol, either `tcp` or `udp`. When omitted, the frontend matches
  all traffic to the VIP address regardless of protocol. If `port` is specified, `protocol`
  must also be set.
* ***port***: The destination port of the VIP, an integer between 1 and 65535. Requires
  `protocol` to be set. When omitted, the frontend matches all ports. Note that the
  frontend port is independent of the healthcheck port: a frontend on port 443 may use
  a healthcheck that probes port 80.
* ***pools***: Required. A non-empty ordered list of pool objects. Pools express priority:
  the first pool is preferred; subsequent pools act as fallbacks. When every backend in
  pool[0] leaves `StateUp` (down, paused, disabled, or not yet probed), pool[1] is
  automatically promoted and its up backends take over serving traffic. The promotion
  cascades across further tiers. See [healthchecks.md](healthchecks.md#pool-failover)
  for the full failover semantics. All backends across all pools in a frontend must
  have addresses of the same address family (all IPv4 or all IPv6).
* ***src-ip-sticky***: Boolean, default `false`. When `true`, the VIP is programmed into
  the VPP load-balancer with source-IP-based stickiness: all flows from the same client
  source IP hash to the same backend (subject to the Maglev consistent-hash bucket
  assignment). Use this for protocols that require session affinity at the L3 level,
  or when clients open many short flows that should land on one backend. Changing this
  field in a running config and reloading causes `maglevd` to tear down the VIP (all
  application servers are deleted with flush, then the VIP itself is deleted) and
  recreate it with the new value; VPP has no API to mutate `src_ip_sticky` on an
  existing VIP, and existing flow state cannot be preserved across the flip.

Each pool has:

* ***name***: Required. A non-empty string identifying the pool (e.g. `primary`, `fallback`).
* ***backends***: A map of backend names to per-pool backend options. Every name must refer
  to an existing entry in the `backends` section.

Per-pool backend options:

* ***weight***: An integer between 0 and 100 (inclusive) expressing the relative weight of
  this backend within the pool. `0` keeps the backend in the pool but assigns it no traffic.
  Defaults to `100`. Weight is per-pool, not global: the same backend can appear with
  different weights in different frontends.

Examples:
```yaml
frontends:
  nginx-v4-http:
    description: "IPv4 HTTP VIP with fallback"
    address: 198.51.100.1
    protocol: tcp
    port: 80
    pools:
      - name: primary
        backends:
          nginx0-ams: { weight: 10 }
          nginx0-lon: {}
      - name: fallback
        backends:
          nginx0-fra: {}

  maildrop-imaps:
    description: "IMAPS VIP"
    address: 2001:db8::1
    protocol: tcp
    port: 993
    src-ip-sticky: true
    pools:
      - name: primary
        backends:
          maildrop0-ams: {}
          maildrop0-lon: {}
```

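The pool priority rule can be illustrated with a short Go sketch: the active pool is the first one that still has a backend in `StateUp`, and only up backends of that pool receive their configured weight. This is a simplified illustration and not the code in `maglevd`; the `member` type and the `activePool` and `effectiveWeights` functions are hypothetical, the sketch assumes each backend name appears in only one pool, and it ignores states such as paused or disabled beyond treating them as "not up". See [healthchecks.md](healthchecks.md#pool-failover) for the authoritative semantics.

```go
package main

import "fmt"

// member is a hypothetical, simplified view of one backend in a pool.
type member struct {
	name   string
	up     bool // true only while the backend is in StateUp
	weight int  // configured per-pool weight, 0..100
}

// activePool returns the index of the first pool that has at least one
// backend in StateUp, or -1 when no pool qualifies.
func activePool(pools [][]member) int {
	for i, pool := range pools {
		for _, m := range pool {
			if m.up {
				return i
			}
		}
	}
	return -1
}

// effectiveWeights maps each backend name to the weight it would be
// given: the configured weight for up backends in the active pool,
// zero for everything else.
func effectiveWeights(pools [][]member) map[string]int {
	active := activePool(pools)
	out := make(map[string]int)
	for i, pool := range pools {
		for _, m := range pool {
			w := 0
			if i == active && m.up {
				w = m.weight
			}
			out[m.name] = w
		}
	}
	return out
}

func main() {
	// The nginx-v4-http frontend from the example, with the whole
	// primary pool down so that the fallback pool is promoted.
	pools := [][]member{
		{{name: "nginx0-ams", up: false, weight: 10}, {name: "nginx0-lon", up: false, weight: 100}},
		{{name: "nginx0-fra", up: true, weight: 100}},
	}
	fmt.Println("active pool index:", activePool(pools))
	fmt.Println("effective weights:", effectiveWeights(pools))
}
```
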
---

For a detailed description of the health state machine, probe intervals, and all transition events,
see [healthchecks.md](healthchecks.md). For a user guide on how to use the maglev daemon and client,
see the [user-guide.md](user-guide.md).