This commit wires maglevd through to VPP's LB plugin end-to-end, using
locally-generated GoVPP bindings for the newer v2 API messages.
VPP binapi (vendored)
- New package internal/vpp/binapi/ containing lb, lb_types, ip_types, and
interface_types, generated from a local VPP build (~/src/vpp) via a new
'make vpp-binapi' target. GoVPP v0.12.0 upstream lacks the v2 messages we
need (lb_conf_get, lb_add_del_vip_v2, lb_add_del_as_v2, lb_as_v2_dump,
lb_as_set_weight), so we commit the generated output in-tree.
- All generated files go through our loggedChannel wrapper; every VPP API
send/receive is recorded at DEBUG via slog (vpp-api-send / vpp-api-recv /
vpp-api-send-multi / vpp-api-recv-multi) so the full wire-level trail is
auditable. NewAPIChannel is unexported — callers must use c.apiChannel().
Read path: GetLBState{All,VIP}
- GetLBStateAll returns a full snapshot (global conf + every VIP with its
attached application servers).
- GetLBStateVIP looks up a single VIP by (prefix, protocol, port) and
returns (nil, nil) when the VIP doesn't exist in VPP. This is the
efficient path for targeted updates on a busy LB.
- Helpers factored out: getLBConf, dumpAllVIPs, dumpASesForVIP, lookupVIP,
vipFromDetails.
Write path: SyncLBState{All,VIP}
- SyncLBStateAll reconciles every configured frontend with VPP: creates
missing VIPs, removes stale ones (with AS flush), and reconciles AS
membership and weights within VIPs that exist on both sides.
- SyncLBStateVIP targets a single frontend by name. Never removes VIPs.
Returns ErrFrontendNotFound (wrapped with the name) when the frontend
isn't in config, so callers can use errors.Is.
- Shared reconcileVIP helper does the per-VIP AS diff; removeVIP is used
only by the full-sync pass.
- LbAddDelVipV2 requests always set NewFlowsTableLength=1024. The .api
default=1024 annotation is only applied by VAT/CLI parsers, not wire-
level marshalling — sending 0 caused VPP to vec_validate with mask
0xFFFFFFFF and OOM-panic.
- Pool semantics: backends in the primary (first) pool of a frontend get
their configured weight; backends in secondary pools get weight 0. All
backends are installed so higher layers can flip weights on failover
without add/remove churn.
- Every individual change emits a DEBUG slog (vpp-lbsync-vip-add/del,
vpp-lbsync-as-add/del, vpp-lbsync-as-weight). Start/done INFO logs
carry a scope=all|vip label plus aggregate counts.
Global conf push: SetLBConf
- New SetLBConf(cfg) sends lb_conf with ipv4-src, ipv6-src, sticky-buckets,
and flow-timeout. Called automatically on VPP (re)connect and after
every config reload (via doReloadConfig). Results are cached on the
Client so redundant pushes are silently skipped — only actual changes
produce a vpp-lb-conf-set INFO log line.
Periodic drift reconciliation
- vpp.Client.lbSyncLoop runs in a goroutine tied to each VPP connection's
lifetime. Its first tick is immediate (startup and post-reconnect
sync quickly); subsequent ticks fire every vpp.lb.sync-interval from
config (default 30s). Purpose: catch drift if something/someone
modifies VPP state by hand. The loop uses a ConfigSource interface
(satisfied by checker.Checker via its new Config() accessor) to avoid
an import cycle with the checker package.
Config schema additions (maglev.vpp.lb)
- sync-interval: positive Go duration, default 30s.
- ipv4-src-address: REQUIRED. Used as the outer source for GRE4 encap
to application servers. Missing this is a hard semantic error —
maglevd --check exits 2 and the daemon refuses to start. VPP GRE
needs a source address and every VIP we program uses GRE, so there
is no meaningful config without it.
- ipv6-src-address: REQUIRED. Same treatment as ipv4-src-address.
- sticky-buckets-per-core: default 65536, must be a power of 2.
- flow-timeout: default 40s, must be a whole number of seconds in [1s, 120s].
- VPP validation runs at the end of convert() so structural errors in
healthchecks/backends/frontends surface first — operators fix those,
then get the VPP-specific requirements.
gRPC API
- New GetVPPLBState RPC returning VPPLBState: global conf + VIPs with
ASes. Mirrors the read-path but strips fields irrelevant to our
GRE-only deployment (srv_type, dscp, target_port).
- New SyncVPPLBState RPC with optional frontend_name. Unset → full sync
(may remove stale VIPs). Set → single-VIP sync (never removes).
Returns codes.NotFound for unknown frontends, codes.Unavailable when
VPP integration is disabled or disconnected.
maglevc (CLI)
- New 'show vpp lbstate' command displaying the LB plugin state. VPP-only
fields the dataplane irrelevant to GRE are suppressed. Per-AS lines use
a key-value format ("address X weight Y flow-table-buckets Z")
instead of a tabwriter column, which avoids the ANSI-color alignment
issue we hit with mixed label/data rows.
- New 'sync vpp lbstate [<name>]' command. Without a name, triggers a
full reconciliation; with a name, targets one frontend.
- Previous 'show vpp lb' renamed to 'show vpp lbstate' for consistency
with the new sync command.
Test fixtures
- validConfig and all ad-hoc config_test.go fixtures that reach the end
of convert() now include the two required vpp.lb src addresses.
- tests/01-maglevd/maglevd-lab/maglev.yaml gains a vpp.lb section so the
robot integration tests can still load the config.
- cmd/maglevc/tree_test.go gains expected paths for the new commands.
Docs
- config-guide.md: new 'vpp' section in the basic structure, detailed
vpp.lb field reference, noting ipv4/ipv6 src addresses as REQUIRED
(hard error) with no defaults; example config updated.
- user-guide.md: documented 'show vpp info', 'show vpp lbstate',
'sync vpp lbstate [<name>]', new --vpp-api-addr and --vpp-stats-addr
flags, the vpp-lb-conf-set log line, and corrected the pause/resume
description to reflect that pause cancels the probe goroutine.
- debian/maglev.yaml: example config gains a vpp.lb block with src
addresses and commented optional overrides.
Add StateDisabled for operator-initiated disable, keeping StateRemoved
for backends that disappear during a config reload. Previously both
used StateRemoved, which was confusing: "removed" implies the backend
no longer exists in config, but a disabled backend is still present
and can be re-enabled on the fly.
- health: add StateDisabled with String() "disabled", Disable() method
with probe code "disabled". Record() rejects probes in all three
inactive states (paused, disabled, removed).
- checker: DisableBackend calls backend.Disable() instead of Remove().
- docs: healthchecks.md rewritten for pause (goroutine cancelled, not
just results discarded), and separate disabled/removed state rows.
user-guide.md updated to match.
- Makefile: add fixstyle target (gofmt -w .).
Pause semantics
- PauseBackend now cancels the probe goroutine so no HTTP/TCP/ICMP
traffic is sent while the backend is paused. Previously the goroutine
kept running and results were silently discarded.
- ResumeBackend launches a fresh probe goroutine on the existing worker,
preserving transition history. The backend re-enters unknown state.
Integration tests (tests/01-maglevd/)
- Containerlab topology with 3 nginx:alpine backends on a dedicated
management network (172.20.30.0/24) with static IPs.
- maglevd config with 200ms HTTP health-check interval for fast test
convergence (rise=2, fall=2).
- 8 test cases: deploy lab, start maglevd, all backends reach up,
nginx logs confirm probes arriving, pause stops probes (probe count
stable), resume restarts probes, disable stops probes, enable
restarts probes.
VPP dataplane test (tests/02-vpp-lb/)
- Rewrite 01-e2e-lab.robot to match the actual single-VPP topology:
test client-to-server ping through VPP bridge domains and verify
nginx is serving on all app servers. The previous version referenced
a non-existent topology file and tested OSPF/BFD between two VPP
nodes that don't exist in this lab.
Build infrastructure
- Add 'make robot-test' target with TEST= for suite selection.
- Add tests/.venv target for Robot Framework virtualenv.
- Make IMAGE optional in rf-run.sh.
- Add .gitignore entries for test output, venv, logs, and clab state.
- Replaced flat `backends: [...]` list on frontends with an ordered `pools:`
list; each pool has a name and a map of backends with per-pool weights (0–100,
default 100). Pools express priority: first pool with a healthy backend wins.
- Removed global backend weight (was on the backend, now lives in the pool).
- Config validation enforces non-empty pools, non-empty pool names, weight
range, and consistent address families across all pools of a frontend.
- Added `PoolBackendInfo { name, weight }` and changed `PoolInfo.backends` from
`repeated string` to `repeated PoolBackendInfo` so weights are visible over
the API.
- Full interactive shell with readline, tab completion, and `?` inline help.
- Command tree parser (Walk) handles fixed keywords and dynamic slot nodes;
prefix matching with exact-match priority.
- Commands: `show version/frontends/frontend/backends/backend/healthchecks/
healthcheck`, `set backend <name> pause|resume`, `quit`/`exit`.
- `show frontend` output is hierarchical (pools → backends) with per-backend
weights and `[disabled]` notation; pool section uses fixed-width formatting
so ANSI color codes don't corrupt tabwriter alignment.
- `-color` flag (default true) wraps static field labels in dark-blue ANSI;
works correctly with tabwriter because all labels carry identical-length
escape sequences.
- `cmd/version.go` package holds `version`, `commit`, `date` vars set at build
time via `-ldflags -X`.
- `make build` / `make build-amd64` / `make build-arm64` all inject
`VERSION=0.1.1`, `COMMIT_HASH` (from `git rev-parse --short HEAD`), and
`DATE` (UTC ISO-8601).
- `maglevc` prints version on interactive startup and exposes `show version`.
- `maglevd` logs version/commit/date at startup; `-version` flag prints and exits.
- `doHTTPProbe` was building a `https://` target URL even though TLS was already
applied to the connection inside `inNetns`. `http.Transport` then wrapped the
connection in a second TLS layer, producing "http: server gave HTTP response
to HTTPS client". Fixed by always using `http://` in the target URL.
- Added `TestHTTPSProbe` using `httptest.NewTLSServer` to cover the full path.
- New `docs/user-guide.md`: maglevd flags/signals, maglevc commands, shell
completion, and command-tree parser walkthrough.
- New `docs/healthchecks.md`: state machine, rise/fall model, probe intervals,
all transition events with log examples.
- Updated `docs/config-guide.md`: pools design, removed global weight from
backends, updated all examples.
- Updated `README.md`: packaging table, build paths, corrected binary locations
(`/usr/sbin/maglevd`), config filename (`.yaml`).
- `debian/` directory contains `control.in`, `maglevd.service`, `default.maglev`,
`maglev.yaml` (example config), `conffiles`, `postinst`, `prerm`.
- `debian/build-deb.sh` stages a package tree and calls `dpkg-deb`; emits
`build/vpp-maglev_<version>~<commit>_<arch>.deb`.
- Cross-compiles for amd64 and arm64 in one `make pkg-deb` invocation.
- `maglevd` installed to `/usr/sbin/`, `maglevc` to `/usr/bin/`.
- Service reads `MAGLEV_CONFIG` from `/etc/default/maglev`
(default: `/etc/maglev/maglev.yaml`).
- Man pages `maglevd(8)` and `maglevc(1)` live in `docs/` and are gzip'd into
the package.
- All build output goes to `build/<arch>/`; `build/` is gitignored.