Pim van Pelt d3c5c86037 VPP load-balancer dataplane integration: state, sync, and global conf

This commit wires maglevd through to VPP's LB plugin end-to-end, using
locally-generated GoVPP bindings for the newer v2 API messages.

VPP binapi (vendored)
- New package internal/vpp/binapi/ containing lb, lb_types, ip_types, and
  interface_types, generated from a local VPP build (~/src/vpp) via a new
  'make vpp-binapi' target. GoVPP v0.12.0 upstream lacks the v2 messages we
  need (lb_conf_get, lb_add_del_vip_v2, lb_add_del_as_v2, lb_as_v2_dump,
  lb_as_set_weight), so we commit the generated output in-tree.
- All generated files go through our loggedChannel wrapper; every VPP API
  send/receive is recorded at DEBUG via slog (vpp-api-send / vpp-api-recv /
  vpp-api-send-multi / vpp-api-recv-multi) so the full wire-level trail is
  auditable. NewAPIChannel is unexported — callers must use c.apiChannel().

Read path: GetLBState{All,VIP}
- GetLBStateAll returns a full snapshot (global conf + every VIP with its
  attached application servers).
- GetLBStateVIP looks up a single VIP by (prefix, protocol, port) and
  returns (nil, nil) when the VIP doesn't exist in VPP. This is the
  efficient path for targeted updates on a busy LB.
- Helpers factored out: getLBConf, dumpAllVIPs, dumpASesForVIP, lookupVIP,
  vipFromDetails.

Write path: SyncLBState{All,VIP}
- SyncLBStateAll reconciles every configured frontend with VPP: creates
  missing VIPs, removes stale ones (with AS flush), and reconciles AS
  membership and weights within VIPs that exist on both sides.
- SyncLBStateVIP targets a single frontend by name. Never removes VIPs.
  Returns ErrFrontendNotFound (wrapped with the name) when the frontend
  isn't in config, so callers can use errors.Is.
- Shared reconcileVIP helper does the per-VIP AS diff; removeVIP is used
  only by the full-sync pass.
- LbAddDelVipV2 requests always set NewFlowsTableLength=1024. The .api
  default=1024 annotation is only applied by VAT/CLI parsers, not wire-
  level marshalling — sending 0 caused VPP to vec_validate with mask
  0xFFFFFFFF and OOM-panic.
- Pool semantics: backends in the primary (first) pool of a frontend get
  their configured weight; backends in secondary pools get weight 0. All
  backends are installed so higher layers can flip weights on failover
  without add/remove churn.
- Every individual change emits a DEBUG slog (vpp-lbsync-vip-add/del,
  vpp-lbsync-as-add/del, vpp-lbsync-as-weight). Start/done INFO logs
  carry a scope=all|vip label plus aggregate counts.

Global conf push: SetLBConf
- New SetLBConf(cfg) sends lb_conf with ipv4-src, ipv6-src, sticky-buckets,
  and flow-timeout. Called automatically on VPP (re)connect and after
  every config reload (via doReloadConfig). Results are cached on the
  Client so redundant pushes are silently skipped — only actual changes
  produce a vpp-lb-conf-set INFO log line.

Periodic drift reconciliation
- vpp.Client.lbSyncLoop runs in a goroutine tied to each VPP connection's
  lifetime. Its first tick is immediate (startup and post-reconnect
  sync quickly); subsequent ticks fire every vpp.lb.sync-interval from
  config (default 30s). Purpose: catch drift if something/someone
  modifies VPP state by hand. The loop uses a ConfigSource interface
  (satisfied by checker.Checker via its new Config() accessor) to avoid
  an import cycle with the checker package.

Config schema additions (maglev.vpp.lb)
- sync-interval: positive Go duration, default 30s.
- ipv4-src-address: REQUIRED. Used as the outer source for GRE4 encap
  to application servers. Missing this is a hard semantic error —
  maglevd --check exits 2 and the daemon refuses to start. VPP GRE
  needs a source address and every VIP we program uses GRE, so there
  is no meaningful config without it.
- ipv6-src-address: REQUIRED. Same treatment as ipv4-src-address.
- sticky-buckets-per-core: default 65536, must be a power of 2.
- flow-timeout: default 40s, must be a whole number of seconds in [1s, 120s].
- VPP validation runs at the end of convert() so structural errors in
  healthchecks/backends/frontends surface first — operators fix those,
  then get the VPP-specific requirements.

gRPC API
- New GetVPPLBState RPC returning VPPLBState: global conf + VIPs with
  ASes. Mirrors the read-path but strips fields irrelevant to our
  GRE-only deployment (srv_type, dscp, target_port).
- New SyncVPPLBState RPC with optional frontend_name. Unset → full sync
  (may remove stale VIPs). Set → single-VIP sync (never removes).
  Returns codes.NotFound for unknown frontends, codes.Unavailable when
  VPP integration is disabled or disconnected.

maglevc (CLI)
- New 'show vpp lbstate' command displaying the LB plugin state. VPP-only
  fields irrelevant to our GRE-only dataplane are suppressed. Per-AS lines use
  a key-value format ("address X  weight Y  flow-table-buckets Z")
  instead of a tabwriter column, which avoids the ANSI-color alignment
  issue we hit with mixed label/data rows.
- New 'sync vpp lbstate [<name>]' command. Without a name, triggers a
  full reconciliation; with a name, targets one frontend.
- Previous 'show vpp lb' renamed to 'show vpp lbstate' for consistency
  with the new sync command.

Test fixtures
- validConfig and all ad-hoc config_test.go fixtures that reach the end
  of convert() now include the two required vpp.lb src addresses.
- tests/01-maglevd/maglevd-lab/maglev.yaml gains a vpp.lb section so the
  robot integration tests can still load the config.
- cmd/maglevc/tree_test.go gains expected paths for the new commands.

Docs
- config-guide.md: new 'vpp' section in the basic structure, detailed
  vpp.lb field reference, noting ipv4/ipv6 src addresses as REQUIRED
  (hard error) with no defaults; example config updated.
- user-guide.md: documented 'show vpp info', 'show vpp lbstate',
  'sync vpp lbstate [<name>]', new --vpp-api-addr and --vpp-stats-addr
  flags, the vpp-lb-conf-set log line, and corrected the pause/resume
  description to reflect that pause cancels the probe goroutine.
- debian/maglev.yaml: example config gains a vpp.lb block with src
  addresses and commented optional overrides.

2026-04-12 10:58:44 +02:00

maglevd Configuration Guide

Overview

maglevd consumes a YAML configuration file of a specific format. Validation is performed in two stages:

Structural parsing: the YAML is unmarshalled into typed Go structs. Unknown fields and type mismatches are rejected immediately.
Semantic validation: cross-field and cross-object rules are enforced, for example ensuring that every backend referenced by a frontend exists, that address families are consistent within a frontend, and that IP source addresses are the correct family.

If you want to get started quickly, take a look at the example config.

Basic structure

The YAML configuration file has the following top-level structure:

maglev:
  healthchecker:
    [ Global health checker settings ]

  vpp:
    lb:
      [ VPP load-balancer integration settings ]

  healthchecks:
    my-check:
      [ Health check definition ]

  backends:
    my-backend:
      [ Backend definition ]

  frontends:
    my-frontend:
      [ Frontend (VIP) definition ]

All five sections live under the top-level maglev: key. The healthchecks, backends, and frontends sections are maps keyed by an arbitrary name of your choosing. Names must be unique within their section and are case-sensitive. The vpp section is required when maglevd has a working VPP connection — its lb.ipv4-src-address and lb.ipv6-src-address fields are mandatory and maglevd will refuse to start without them.

healthchecker

Global settings for the health checker engine.

transition-history: An integer >= 1 that controls how many state transitions are retained per backend for display via the gRPC API. Defaults to 5.
netns: The name of a Linux network namespace in which probes are executed. When empty or omitted, probes run in the current (default) network namespace. Useful when backends are reachable only through a dedicated dataplane namespace.

Example:

maglev:
  healthchecker:
    transition-history: 10
    netns: dataplane
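
The effect of transition-history can be pictured as a bounded, per-backend list that drops its oldest entry once the limit is reached. The following is a minimal sketch under that assumption; the Transition and History types and their methods are hypothetical names chosen for illustration, not maglevd's actual implementation.

```go
package main

import "fmt"

// Transition is a hypothetical record of one backend state change.
type Transition struct {
	From, To string
}

// History keeps at most limit transitions, dropping the oldest first.
type History struct {
	limit int
	items []Transition
}

// NewHistory returns a History retaining limit entries. A limit below 1
// is treated here as "omitted" and replaced by the documented default
// of 5 (an assumption; real validation rejects values below 1).
func NewHistory(limit int) *History {
	if limit < 1 {
		limit = 5
	}
	return &History{limit: limit}
}

// Add appends a transition and trims the list to the configured limit.
func (h *History) Add(t Transition) {
	h.items = append(h.items, t)
	if len(h.items) > h.limit {
		h.items = h.items[len(h.items)-h.limit:]
	}
}

// Items returns the retained transitions, oldest first.
func (h *History) Items() []Transition { return h.items }

func main() {
	h := NewHistory(2)
	h.Add(Transition{"unknown", "up"})
	h.Add(Transition{"up", "down"})
	h.Add(Transition{"down", "up"})
	// Only the two most recent transitions remain.
	for _, t := range h.Items() {
		fmt.Printf("%s -> %s\n", t.From, t.To)
	}
}
```

With a limit of 2, the third Add evicts the first transition, so a larger transition-history trades a little memory per backend for a longer trail in the API output.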

vpp

Settings controlling the integration with a locally running VPP instance. The vpp section is a map with a single sub-section, lb. Both lb.ipv4-src-address and lb.ipv6-src-address are required — maglevd --check exits with a semantic error and the daemon refuses to start when either is missing, because VPP's GRE encap needs a source address and every VIP maglevd programs uses GRE.

lb.ipv4-src-address: Required. The IPv4 source address VPP uses when encapsulating IPv4 traffic into GRE4 tunnels to application servers. Must be a valid IPv4 address. No default.
lb.ipv6-src-address: Required. The IPv6 source address VPP uses when encapsulating IPv6 traffic into GRE6 tunnels. Must be a valid IPv6 address. No default.
lb.sync-interval: A positive Go duration (e.g. 30s, 1m) controlling how often maglevd reconciles the VPP load-balancer dataplane against its running configuration. On startup, an immediate full sync runs; subsequent syncs fire at this interval as long as the VPP connection is up. Defaults to 30s. The purpose is to catch drift — for example, a VIP added to VPP by hand — and bring VPP back in line with the maglev config.
lb.sticky-buckets-per-core: The number of buckets per worker thread in the established-flow table. Must be a power of 2. Defaults to 65536 (64k).
lb.flow-timeout: Idle time after which an established flow is removed from the table. Must be a whole number of seconds between 1s and 120s inclusive. Defaults to 40s.

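Four of these values (the two source addresses, sticky-buckets-per-core, and flow-timeout) are pushed to VPP via lb_conf when maglevd connects or reconnects to VPP and again after every config reload; sync-interval is used only by maglevd itself. Pushes that would not change anything are skipped, so the vpp-lb-conf-set log line, which records the effective values, appears only when a value actually changed.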

Example:

maglev:
  vpp:
    lb:
      sync-interval: 60s
      ipv4-src-address: 10.0.0.1
      ipv6-src-address: 2001:db8::1
      sticky-buckets-per-core: 65536
      flow-timeout: 40s
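
The constraints on the vpp.lb fields described above can be expressed as a small validation routine. This is a sketch only: the LBConfig struct, its field names, and the convention that a zero value means "omitted" are assumptions made for illustration and are not taken from the maglevd source.

```go
package main

import (
	"errors"
	"fmt"
	"net/netip"
	"time"
)

// LBConfig is a hypothetical mirror of the maglev.vpp.lb section.
type LBConfig struct {
	SyncInterval         time.Duration
	IPv4SrcAddress       string
	IPv6SrcAddress       string
	StickyBucketsPerCore uint32
	FlowTimeout          time.Duration
}

// applyDefaults fills in the documented defaults for omitted (zero) fields.
func (c *LBConfig) applyDefaults() {
	if c.SyncInterval == 0 {
		c.SyncInterval = 30 * time.Second
	}
	if c.StickyBucketsPerCore == 0 {
		c.StickyBucketsPerCore = 65536
	}
	if c.FlowTimeout == 0 {
		c.FlowTimeout = 40 * time.Second
	}
}

// Validate enforces the rules from the field reference.
func (c LBConfig) Validate() error {
	a4, err := netip.ParseAddr(c.IPv4SrcAddress)
	if err != nil || !a4.Is4() {
		return errors.New("lb.ipv4-src-address: required, must be an IPv4 address")
	}
	a6, err := netip.ParseAddr(c.IPv6SrcAddress)
	if err != nil || !a6.Is6() || a6.Is4In6() {
		return errors.New("lb.ipv6-src-address: required, must be an IPv6 address")
	}
	if c.SyncInterval <= 0 {
		return errors.New("lb.sync-interval: must be a positive duration")
	}
	// A power of two has exactly one bit set, so n&(n-1) is zero.
	if n := c.StickyBucketsPerCore; n == 0 || n&(n-1) != 0 {
		return errors.New("lb.sticky-buckets-per-core: must be a power of 2")
	}
	if t := c.FlowTimeout; t%time.Second != 0 || t < time.Second || t > 120*time.Second {
		return errors.New("lb.flow-timeout: must be whole seconds in [1s, 120s]")
	}
	return nil
}

func main() {
	cfg := LBConfig{IPv4SrcAddress: "10.0.0.1", IPv6SrcAddress: "2001:db8::1"}
	cfg.applyDefaults()
	fmt.Println("valid config:", cfg.Validate())

	cfg.FlowTimeout = 1500 * time.Millisecond
	fmt.Println("fractional timeout:", cfg.Validate())
}
```

Checking the source addresses first mirrors the emphasis in the text that they have no defaults, but the order of checks is otherwise a design choice of this sketch.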

healthchecks

A named map of health check definitions. Each health check describes how to probe a backend. Backends reference health checks by name. The same health check can be reused across any number of backends; each backend is probed exactly once regardless of how many frontends reference it.

Common fields (all types):

type: Required. One of icmp, tcp, http, or https.
port: The destination port to probe. Required for tcp, http, and https. Must be omitted for icmp.
probe-ipv4-src: An optional IPv4 source address used when probing IPv4 backends. Must be an IPv4 address. When omitted, the OS chooses the source address.
probe-ipv6-src: An optional IPv6 source address used when probing IPv6 backends. Must be an IPv6 address. When omitted, the OS chooses the source address.
interval: Required. A positive Go duration string (e.g. 2s, 500ms) controlling how often a probe is sent when the backend is fully healthy (counter at maximum).
fast-interval: Optional. A positive duration used instead of interval while the backend's health counter is degraded (between down and up) or the backend is in the unknown state. When omitted, interval is used.
down-interval: Optional. A positive duration used instead of interval while the backend is fully down (counter at zero). When omitted, interval is used. Setting this to a longer value reduces probe traffic to backends that are known to be offline.
timeout: Required. A positive duration after which an in-flight probe is abandoned and counted as a failure.
rise: The number of consecutive successes required to transition from down to up. Defaults to 2. Must be >= 1.
fall: The number of consecutive failures required to transition from up to down. Defaults to 3. Must be >= 1.
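
How interval, fast-interval, and down-interval interact can be sketched as a single selection function. The code below assumes a health counter that runs from 0 (fully down) to some maximum (fully healthy); how that maximum is derived from rise and fall is not specified here, and the Intervals type and Next method are hypothetical names.

```go
package main

import (
	"fmt"
	"time"
)

// Intervals holds the three probe intervals of one health check.
// Fast and Down are optional; zero means "omitted".
type Intervals struct {
	Interval time.Duration
	Fast     time.Duration
	Down     time.Duration
}

// Next picks the delay before the next probe. counter is assumed to run
// from 0 (fully down) to top (fully healthy); unknown marks a backend
// whose state has not been established yet.
func (iv Intervals) Next(counter, top int, unknown bool) time.Duration {
	switch {
	case unknown || (counter > 0 && counter < top):
		// Degraded or unknown: probe faster if fast-interval is set.
		if iv.Fast > 0 {
			return iv.Fast
		}
	case counter <= 0:
		// Fully down: back off if down-interval is set.
		if iv.Down > 0 {
			return iv.Down
		}
	}
	// Fully healthy, or the optional interval was omitted.
	return iv.Interval
}

func main() {
	iv := Intervals{Interval: 2 * time.Second, Fast: 500 * time.Millisecond, Down: 30 * time.Second}
	fmt.Println("healthy: ", iv.Next(4, 4, false))
	fmt.Println("degraded:", iv.Next(2, 4, false))
	fmt.Println("down:    ", iv.Next(0, 4, false))
	fmt.Println("unknown: ", iv.Next(0, 4, true))
}
```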

type: icmp

Sends an ICMP echo request (ping) to the backend address. Requires CAP_NET_RAW. No port may be specified. No params block is used.

healthchecks:
  ping:
    type: icmp
    probe-ipv4-src: 10.0.0.1
    probe-ipv6-src: 2001:db8::1
    interval: 2s
    timeout: 1s
    rise: 2
    fall: 3

type: tcp

Opens a TCP connection to the backend and immediately closes it upon success. Use params to optionally wrap the connection in TLS.

params.ssl: A boolean. When true, a TLS handshake is performed after the TCP connection is established. Defaults to false.
params.server-name: The TLS SNI hostname sent during the handshake. When omitted, the backend IP address is used.
params.insecure-skip-verify: A boolean. When true, the TLS certificate presented by the server is not verified. Defaults to false.

healthchecks:
  imaps-check:
    type: tcp
    port: 993
    params:
      ssl: true
      server-name: imaps.example.com
    interval: 5s
    timeout: 3s
    rise: 2
    fall: 3
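
For illustration, a probe of this kind could look roughly like the sketch below, written against Go's standard net and crypto/tls packages. The TCPParams struct, the probeTCP function, and the demo helper are hypothetical; maglevd's real probe code may differ in structure and error handling.

```go
package main

import (
	"context"
	"crypto/tls"
	"errors"
	"fmt"
	"net"
	"time"
)

// TCPParams is a hypothetical mirror of the params block of a tcp check.
type TCPParams struct {
	SSL                bool
	ServerName         string
	InsecureSkipVerify bool
}

// probeTCP connects to addr ("ip:port"), optionally performs a TLS
// handshake, and closes the connection. A nil error counts as success.
func probeTCP(ctx context.Context, addr string, p TCPParams, timeout time.Duration) error {
	ctx, cancel := context.WithTimeout(ctx, timeout)
	defer cancel()

	var d net.Dialer
	conn, err := d.DialContext(ctx, "tcp", addr)
	if err != nil {
		return err
	}
	defer conn.Close()
	if !p.SSL {
		return nil
	}

	// Fall back to the backend IP address when server-name is omitted.
	name := p.ServerName
	if name == "" {
		host, _, err := net.SplitHostPort(addr)
		if err != nil {
			return err
		}
		name = host
	}
	tc := tls.Client(conn, &tls.Config{
		ServerName:         name,
		InsecureSkipVerify: p.InsecureSkipVerify,
	})
	return tc.HandshakeContext(ctx)
}

// demo probes a throwaway local listener, then probes it again after it
// has been closed, and reports whether both outcomes were as expected.
func demo() error {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return err
	}
	addr := ln.Addr().String()
	go func() {
		for {
			c, err := ln.Accept()
			if err != nil {
				return
			}
			c.Close()
		}
	}()
	if err := probeTCP(context.Background(), addr, TCPParams{}, time.Second); err != nil {
		return fmt.Errorf("probe of live listener failed: %w", err)
	}
	ln.Close()
	if err := probeTCP(context.Background(), addr, TCPParams{}, time.Second); err == nil {
		return errors.New("probe of closed listener unexpectedly succeeded")
	}
	return nil
}

func main() {
	if err := demo(); err != nil {
		fmt.Println("demo failed:", err)
		return
	}
	fmt.Println("demo passed")
}
```

A single context deadline covers both the connect and the handshake here, which matches the idea of one timeout per probe.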

type: http / https

Opens a TCP (or TLS for https) connection, sends an HTTP request, and evaluates the response code. An optional regexp can additionally match against the response body.

params.path: Required. The HTTP request path, e.g. /healthz.
params.host: The Host header value sent in the request. When omitted, the backend IP address is used.
params.response-code: The expected HTTP response code. Can be a single value ("200") or an inclusive range ("200-299"). Defaults to "200".
params.response-regexp: An optional Go regular expression matched against the response body. If specified, the body must match for the probe to succeed.
params.server-name: The TLS SNI hostname (https only). Defaults to the value of params.host if not set.
params.insecure-skip-verify: A boolean. Skip TLS certificate verification (https only). Defaults to false.

healthchecks:
  nginx-http:
    type: http
    port: 80
    params:
      path: /healthz
      host: nginx.example.com
      response-code: "200-204"
    interval: 2s
    fast-interval: 500ms
    down-interval: 30s
    timeout: 3s
    rise: 2
    fall: 3

  nginx-https:
    type: https
    port: 443
    params:
      path: /healthz
      host: nginx.example.com
      server-name: nginx.example.com
      insecure-skip-verify: false
    interval: 5s
    timeout: 3s

backends

A named map of individual backend servers. Each backend has a single IP address and optionally references a health check by name. Backends are probed exactly once, even if they appear in multiple frontends.

address: Required. The IPv4 or IPv6 address of this backend server.
healthcheck: The name of a health check defined in the healthchecks section. When empty or omitted, no probing is performed and the backend is assumed permanently healthy. This is useful for backends that are always available or managed by other means.
enabled: A boolean controlling whether this backend participates in any frontend. When false, the backend is excluded entirely and no probe goroutine is started. Defaults to true.

Examples:

backends:
  nginx0-ams:
    address: 198.51.100.10
    healthcheck: nginx-http
  nginx0-lon:
    address: 198.51.100.11
    healthcheck: nginx-http
  nginx0-draining:
    address: 198.51.100.12
    healthcheck: nginx-http
    enabled: false
  static-backend:
    address: 198.51.100.20
    # no healthcheck: assumed always healthy

frontends

A named map of virtual IPs (VIPs). Each frontend ties together a listener address with an ordered list of backend pools. The gRPC API exposes frontends by name.

description: An optional free-text string for documentation purposes.
address: Required. The IPv4 or IPv6 address of the VIP.
protocol: The IP protocol, either tcp or udp. When omitted, the frontend matches all traffic to the VIP address regardless of protocol. If port is specified, protocol must also be set.
port: The destination port of the VIP, an integer between 1 and 65535. Requires protocol to be set. When omitted, the frontend matches all ports. Note that the frontend port is independent of the healthcheck port: a frontend on port 443 may use a healthcheck that probes port 80.
pools: Required. A non-empty ordered list of pool objects. Pools express priority: the first pool is preferred; subsequent pools act as fallbacks. All backends across all pools in a frontend must have addresses of the same address family (all IPv4 or all IPv6).

Each pool has:

name: Required. A non-empty string identifying the pool (e.g. primary, fallback).
backends: A map of backend names to per-pool backend options. Every name must refer to an existing entry in the backends section.

Per-pool backend options:

weight: An integer between 0 and 100 (inclusive) expressing the relative weight of this backend within the pool. 0 keeps the backend in the pool but assigns it no traffic. Defaults to 100. Weight is per-pool, not global: the same backend can appear with different weights in different pools and frontends.
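
The frontend rules above (port requires protocol, non-empty pools, existing backends, one address family, weight range) lend themselves to a semantic validation pass. The sketch below shows one way to write it; the Frontend, Pool, and PoolBackend types and the ValidateFrontend function are hypothetical and simplified, with backends passed in as a plain name-to-address map.

```go
package main

import (
	"errors"
	"fmt"
	"net/netip"
)

// PoolBackend holds per-pool options; a nil Weight means "omitted" (100).
type PoolBackend struct {
	Weight *int
}

// Pool is one priority level of a frontend.
type Pool struct {
	Name     string
	Backends map[string]PoolBackend
}

// Frontend is a hypothetical, simplified frontend definition.
// Port 0 and Protocol "" both mean "omitted".
type Frontend struct {
	Address  string
	Protocol string
	Port     int
	Pools    []Pool
}

// ValidateFrontend checks f against the rules in the text. backends maps
// backend names to their addresses.
func ValidateFrontend(f Frontend, backends map[string]string) error {
	if _, err := netip.ParseAddr(f.Address); err != nil {
		return fmt.Errorf("address: %w", err)
	}
	switch f.Protocol {
	case "", "tcp", "udp":
	default:
		return fmt.Errorf("protocol %q: must be tcp or udp", f.Protocol)
	}
	if f.Port != 0 {
		if f.Protocol == "" {
			return errors.New("port: requires protocol to be set")
		}
		if f.Port < 1 || f.Port > 65535 {
			return fmt.Errorf("port %d: must be between 1 and 65535", f.Port)
		}
	}
	if len(f.Pools) == 0 {
		return errors.New("pools: must not be empty")
	}
	var first netip.Addr // zero value until the first backend is seen
	for _, p := range f.Pools {
		if p.Name == "" {
			return errors.New("pool: name must not be empty")
		}
		for name, opt := range p.Backends {
			raw, ok := backends[name]
			if !ok {
				return fmt.Errorf("pool %q: unknown backend %q", p.Name, name)
			}
			addr, err := netip.ParseAddr(raw)
			if err != nil {
				return fmt.Errorf("backend %q: %w", name, err)
			}
			if opt.Weight != nil && (*opt.Weight < 0 || *opt.Weight > 100) {
				return fmt.Errorf("backend %q: weight must be between 0 and 100", name)
			}
			if !first.IsValid() {
				first = addr
			} else if first.Is4() != addr.Is4() {
				return fmt.Errorf("backend %q: mixed address families", name)
			}
		}
	}
	return nil
}

func main() {
	backends := map[string]string{
		"nginx0-ams": "198.51.100.10",
		"nginx0-lon": "198.51.100.11",
	}
	f := Frontend{
		Address:  "192.0.2.1",
		Protocol: "tcp",
		Port:     443,
		Pools: []Pool{
			{Name: "primary", Backends: map[string]PoolBackend{"nginx0-ams": {}}},
			{Name: "fallback", Backends: map[string]PoolBackend{"nginx0-lon": {}}},
		},
	}
	fmt.Println("valid frontend:", ValidateFrontend(f, backends))

	f.Protocol = ""
	fmt.Println("port without protocol:", ValidateFrontend(f, backends))
}
```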