VPP client (internal/vpp/)
- New package managing connections to both VPP API and stats sockets,
treated as a unit: if either drops, both are torn down and
re-established together.
- Run() loop: connect, fetch the version via vpe.ShowVersion, read
/sys/boottime from the stats segment, log vpp-connect, then monitor
with a control_ping every 10s. On failure, disconnect both sockets
and retry after 5s.
- Registers as client name "vpp-maglev" (visible in VPP's
"show api clients").
- Flags: --vpp-api-addr (default /run/vpp/api.sock) and
--vpp-stats-addr (default /run/vpp/stats.sock). An empty
--vpp-api-addr disables VPP integration entirely.
gRPC / proto
- Add GetVPPInfo RPC returning VPPInfo: version, build_date,
build_directory, pid, boottime_ns, connecttime_ns. Both times are
Unix timestamps in nanoseconds; the client computes durations
locally for display.
- Returns codes.Unavailable if VPP is disabled or not connected.
maglevc
- Add 'show vpp info' command displaying version, build-date,
build-dir, vpp-pid, vpp-boottime (with duration), and connected
time (with duration).
Add StateDisabled for operator-initiated disable, keeping StateRemoved
for backends that disappear during a config reload. Previously both
used StateRemoved, which was confusing: "removed" implies the backend
no longer exists in config, but a disabled backend is still present
and can be re-enabled on the fly.
- health: add StateDisabled (String() "disabled") and a Disable()
method that records probe code "disabled". Record() rejects probes
in all three inactive states (paused, disabled, removed).
- checker: DisableBackend calls backend.Disable() instead of Remove().
- docs: healthchecks.md rewritten to describe pause as cancelling the
probe goroutine (not just discarding results), with separate rows for
the disabled and removed states. user-guide.md updated to match.
- Makefile: add fixstyle target (gofmt -w .).
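The disabled/removed split can be sketched as a small Go enum. Only the state names disabled, paused, removed, unknown and up come from these notes; StateDown, the constant ordering, the Backend struct and the "ok" probe code are assumptions for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// State is a sketch of the backend health states (ordering assumed).
type State int

const (
	StateUnknown State = iota
	StateUp
	StateDown // assumed; not named in these notes
	StatePaused
	StateDisabled // operator-initiated, backend still in config
	StateRemoved  // backend vanished on config reload
)

func (s State) String() string {
	switch s {
	case StateUp:
		return "up"
	case StateDown:
		return "down"
	case StatePaused:
		return "paused"
	case StateDisabled:
		return "disabled"
	case StateRemoved:
		return "removed"
	default:
		return "unknown"
	}
}

// inactive reports whether probe results must be ignored.
func (s State) inactive() bool {
	return s == StatePaused || s == StateDisabled || s == StateRemoved
}

var errInactive = errors.New("backend inactive: probe rejected")

// Backend is a hypothetical minimal backend holding its last probe code.
type Backend struct {
	State State
	Code  string
}

// Disable marks the backend disabled with probe code "disabled".
func (b *Backend) Disable() {
	b.State = StateDisabled
	b.Code = "disabled"
}

// Record stores a probe result unless the backend is inactive.
func (b *Backend) Record(code string) error {
	if b.State.inactive() {
		return errInactive
	}
	b.Code = code
	return nil
}

func main() {
	b := &Backend{State: StateUp}
	b.Disable()
	fmt.Println(b.State, b.Code, b.Record("ok"))
}
```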
Prometheus metrics (internal/metrics/, cmd/maglevd/)
- New --metrics-addr flag (default :9091, env MAGLEV_METRICS_ADDR)
serving /metrics via promhttp.
- Gauge metrics scraped on demand via a custom prometheus.Collector:
maglev_backend_state, maglev_backend_health, maglev_backend_enabled,
maglev_frontend_pool_backend_weight.
- Inline counter/histogram metrics updated per probe:
maglev_probe_total (by backend, type, result, code),
maglev_probe_duration_seconds (by backend, type),
maglev_backend_transitions_total (by backend, from, to).
- StateSource interface in metrics package breaks the import cycle
with checker; checker.Checker satisfies it via GetBackendInfo.
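The import-cycle fix follows the usual Go pattern of declaring the interface on the consumer side. A minimal stdlib-only sketch, assuming a hypothetical BackendInfo shape and a 0/1 encoding for health and enabled; in the real collector, Collect(chan<- prometheus.Metric) would emit each value with prometheus.MustNewConstMetric rather than returning a slice:

```go
package main

import "fmt"

// BackendInfo is a hypothetical snapshot of one backend.
type BackendInfo struct {
	Name    string
	Healthy bool
	Enabled bool
}

// StateSource is declared in the metrics package, so metrics never
// imports checker; checker.Checker satisfies it implicitly.
type StateSource interface {
	GetBackendInfo() []BackendInfo
}

// sample stands in for one gauge value with a single "backend" label.
type sample struct {
	Metric  string
	Backend string
	Value   float64
}

func b2f(b bool) float64 {
	if b {
		return 1
	}
	return 0
}

// collect reads current state at scrape time rather than caching it.
func collect(src StateSource) []sample {
	var out []sample
	for _, b := range src.GetBackendInfo() {
		out = append(out,
			sample{"maglev_backend_health", b.Name, b2f(b.Healthy)},
			sample{"maglev_backend_enabled", b.Name, b2f(b.Enabled)},
		)
	}
	return out
}

// fakeChecker plays the role of checker.Checker in this sketch.
type fakeChecker []BackendInfo

func (f fakeChecker) GetBackendInfo() []BackendInfo { return f }

// Compile-time check that fakeChecker satisfies StateSource.
var _ StateSource = fakeChecker(nil)

func main() {
	src := fakeChecker{{Name: "b1", Healthy: true, Enabled: true}, {Name: "b2", Enabled: true}}
	for _, s := range collect(src) {
		fmt.Printf("%s{backend=%q} %g\n", s.Metric, s.Backend, s.Value)
	}
}
```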
Integration tests
- Run maglevd inside a containerlab node (debian:trixie-slim with
build/ bind-mounted) instead of on the host. This avoids port
collisions with a maglevd already running on the host.
- maglevc commands run via docker exec into the maglevd container.
- Add 6 Prometheus test cases: endpoint reachable, all backends
report state=up, probe counters non-zero, duration histogram
populated, pool weights correct, transition counters present.
Pause semantics
- PauseBackend now cancels the probe goroutine so no HTTP/TCP/ICMP
traffic is sent while the backend is paused. Previously the goroutine
kept running and results were silently discarded.
- ResumeBackend launches a fresh probe goroutine on the existing worker,
preserving transition history. The backend re-enters the unknown state.
Integration tests (tests/01-maglevd/)
- Containerlab topology with 3 nginx:alpine backends on a dedicated
management network (172.20.30.0/24) with static IPs.
- maglevd config with a 200ms HTTP health-check interval (rise=2,
fall=2) for fast test convergence.
- 8 test cases: deploy lab, start maglevd, all backends reach up,
nginx logs confirm probes arriving, pause stops probes (probe count
stable), resume restarts probes, disable stops probes, enable
restarts probes.
VPP dataplane test (tests/02-vpp-lb/)
- Rewrite 01-e2e-lab.robot to match the actual single-VPP topology:
test client-to-server ping through VPP bridge domains and verify
nginx is serving on all app servers. The previous version referenced
a non-existent topology file and tested OSPF/BFD between two VPP
nodes that don't exist in this lab.
Build infrastructure
- Add 'make robot-test' target with TEST= for suite selection.
- Add tests/.venv target for Robot Framework virtualenv.
- Make IMAGE optional in rf-run.sh.
- Add .gitignore entries for test output, venv, logs, and clab state.