4347bb9b0517310624ebc95e69da7ecf21b6cf98
9 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
25e9d79aba |
Frontend: live clocks, admin mode, backend actions; packaging polish
Builds on the maglev-frontend component introduced in
|
||
|
|
284b4cc9a4 |
New maglev-frontend component; promote LB sync events to INFO
Introduces maglev-frontend, a responsive, real-time web dashboard for one
or more running maglevd instances. Source lives at cmd/frontend/; the
built binary is maglev-frontend. It is a single Go process with the
SolidJS SPA embedded via //go:embed — no runtime file dependencies.
Architecture
- One persistent gRPC connection per configured maglevd (-server A,B,C).
Each connection runs three background loops: a WatchEvents stream
subscribed at log_level=debug for live events, a 30s refresh loop as
a safety net for drift, and a 5s health loop that surfaces connection
drops quickly.
- In-process pub/sub broker with a 30s / 2000-event replay ring using
<epoch>-<seq> monotonic IDs. Short browser reconnects (nginx idle,
wifi flap, laptop wake) silently replay buffered events via the
EventSource Last-Event-ID header; longer outages or frontend restarts
fall through to a "resync" event that triggers a full state refetch.
- HTTP surface: /view/ (SPA), /view/api/state, /view/api/state/{name},
/view/api/maglevds, /view/api/version, /view/api/events (SSE),
/healthz, and an /admin/* placeholder returning 501 for a future
basic-auth mutation surface.
- SSE handler follows the full operational checklist: retry hint, 15s
: ping heartbeat, Flush after every write, r.Context().Done() teardown,
X-Accel-Buffering: no, and no gzip.
SolidJS SPA (cmd/frontend/web/, Vite + TypeScript)
- solid-js/store for a reactive per-maglevd state tree; reducers apply
backend transitions, maglevd-status flips, and resync refetches.
- Scope selector tabs for multi-maglevd support, per-maglevd frontend
cards with pool tables showing state, configured weight, effective
weight, and last-transition age.
- ProbeHeartbeat component turns a middle-dot into ❤️ on probe-start and
back on probe-done, driven by real log events; fixed-size wrapper so
the emoji swap doesn't jiggle the row.
- Flash wrapper animates any primitive on change (1s yellow fade via
Web Animations API, skipped on first mount). Wired into the state
badge, configured weight, and effective weight columns.
- DebugPanel: chronological rolling event tail with tail-style auto-
scroll, pause/resume, and scope/firehose filter. Syntactic highlight
for vpp-lb-sync-* events with fixed-order attribute formatting.
- Live effective_weight updates: vpp-lb-sync-as-added/removed/weight-
updated log events are routed through a reducer that walks the
snapshot's pool rows and sets effective_weight on every match
without waiting for the 30s refresh.
- Header shows build version + commit with build date in a tooltip,
fetched once from /view/api/version on mount.
- Prettier wired in as the web-side fixstyle; make fixstyle now tidies
both Go and web in one shot via a new fixstyle-web target.
Per-mutation VPP LB sync logging
- Promotes the addVIP/delVIP/addAS/delAS/setASWeight helpers from
slog.Debug to slog.Info and renames them from vpp-lbsync-* to
vpp-lb-sync-{vip-added,vip-removed,as-added,as-removed,as-weight-
updated}. Matching rename for vpp-lb-sync-start / -done / -error /
-vip-recreate. The Prometheus metric name (maglev_vpp_lbsync_total)
is left alone to preserve dashboards.
- setASWeight now takes the prior weight so the event can emit
from=X to=Y and the UI can show the delta.
- The vip field in every event is the bare address (no /32 or /128
mask), matching the CLI output style.
- Any listener on the gRPC WatchEvents stream — CLI watch events or
maglev-frontend — now sees every VIP/AS dataplane change in real
time without needing to raise the log level.
Build and tooling
- Makefile: maglev-frontend added to BINARIES; build / build-amd64 /
build-arm64 emit the binary alongside maglevd and maglevc. A new
maglev-frontend-web target rebuilds the SolidJS bundle via npm.
- web/dist/ is tracked so a bare `go build` keeps working for Go-only
contributors and CI.
- .gitignore skips cmd/frontend/web/node_modules/.
Stability fixes
- maglevd's WatchEvents synthetic replay events (from==to, at_unix_ns=0)
were corrupting the frontend's LastTransition cache with at=0,
rendering as "20555d ago" in the browser. Client now skips synthetic
events: the cache comes from refreshAll and doesn't need them.
- Frontends, Backends, and HealthChecks are now served in the order
returned by the corresponding List* RPC instead of Go map iteration
order, so reloads and refreshes keep the SPA stable.
|
||
|
|
d3c5c86037 |
VPP load-balancer dataplane integration: state, sync, and global conf
This commit wires maglevd through to VPP's LB plugin end-to-end, using
locally-generated GoVPP bindings for the newer v2 API messages.
VPP binapi (vendored)
- New package internal/vpp/binapi/ containing lb, lb_types, ip_types, and
interface_types, generated from a local VPP build (~/src/vpp) via a new
'make vpp-binapi' target. GoVPP v0.12.0 upstream lacks the v2 messages we
need (lb_conf_get, lb_add_del_vip_v2, lb_add_del_as_v2, lb_as_v2_dump,
lb_as_set_weight), so we commit the generated output in-tree.
- All generated files go through our loggedChannel wrapper; every VPP API
send/receive is recorded at DEBUG via slog (vpp-api-send / vpp-api-recv /
vpp-api-send-multi / vpp-api-recv-multi) so the full wire-level trail is
auditable. NewAPIChannel is unexported — callers must use c.apiChannel().
Read path: GetLBState{All,VIP}
- GetLBStateAll returns a full snapshot (global conf + every VIP with its
attached application servers).
- GetLBStateVIP looks up a single VIP by (prefix, protocol, port) and
returns (nil, nil) when the VIP doesn't exist in VPP. This is the
efficient path for targeted updates on a busy LB.
- Helpers factored out: getLBConf, dumpAllVIPs, dumpASesForVIP, lookupVIP,
vipFromDetails.
Write path: SyncLBState{All,VIP}
- SyncLBStateAll reconciles every configured frontend with VPP: creates
missing VIPs, removes stale ones (with AS flush), and reconciles AS
membership and weights within VIPs that exist on both sides.
- SyncLBStateVIP targets a single frontend by name. Never removes VIPs.
Returns ErrFrontendNotFound (wrapped with the name) when the frontend
isn't in config, so callers can use errors.Is.
- Shared reconcileVIP helper does the per-VIP AS diff; removeVIP is used
only by the full-sync pass.
- LbAddDelVipV2 requests always set NewFlowsTableLength=1024. The .api
default=1024 annotation is only applied by VAT/CLI parsers, not wire-
level marshalling — sending 0 caused VPP to vec_validate with mask
0xFFFFFFFF and OOM-panic.
- Pool semantics: backends in the primary (first) pool of a frontend get
their configured weight; backends in secondary pools get weight 0. All
backends are installed so higher layers can flip weights on failover
without add/remove churn.
- Every individual change emits a DEBUG slog (vpp-lbsync-vip-add/del,
vpp-lbsync-as-add/del, vpp-lbsync-as-weight). Start/done INFO logs
carry a scope=all|vip label plus aggregate counts.
Global conf push: SetLBConf
- New SetLBConf(cfg) sends lb_conf with ipv4-src, ipv6-src, sticky-buckets,
and flow-timeout. Called automatically on VPP (re)connect and after
every config reload (via doReloadConfig). Results are cached on the
Client so redundant pushes are silently skipped — only actual changes
produce a vpp-lb-conf-set INFO log line.
Periodic drift reconciliation
- vpp.Client.lbSyncLoop runs in a goroutine tied to each VPP connection's
lifetime. Its first tick is immediate (startup and post-reconnect
sync quickly); subsequent ticks fire every vpp.lb.sync-interval from
config (default 30s). Purpose: catch drift if something/someone
modifies VPP state by hand. The loop uses a ConfigSource interface
(satisfied by checker.Checker via its new Config() accessor) to avoid
an import cycle with the checker package.
Config schema additions (maglev.vpp.lb)
- sync-interval: positive Go duration, default 30s.
- ipv4-src-address: REQUIRED. Used as the outer source for GRE4 encap
to application servers. Missing this is a hard semantic error —
maglevd --check exits 2 and the daemon refuses to start. VPP GRE
needs a source address and every VIP we program uses GRE, so there
is no meaningful config without it.
- ipv6-src-address: REQUIRED. Same treatment as ipv4-src-address.
- sticky-buckets-per-core: default 65536, must be a power of 2.
- flow-timeout: default 40s, must be a whole number of seconds in [1s, 120s].
- VPP validation runs at the end of convert() so structural errors in
healthchecks/backends/frontends surface first — operators fix those,
then get the VPP-specific requirements.
gRPC API
- New GetVPPLBState RPC returning VPPLBState: global conf + VIPs with
ASes. Mirrors the read-path but strips fields irrelevant to our
GRE-only deployment (srv_type, dscp, target_port).
- New SyncVPPLBState RPC with optional frontend_name. Unset → full sync
(may remove stale VIPs). Set → single-VIP sync (never removes).
Returns codes.NotFound for unknown frontends, codes.Unavailable when
VPP integration is disabled or disconnected.
maglevc (CLI)
- New 'show vpp lbstate' command displaying the LB plugin state. VPP-only
fields the dataplane irrelevant to GRE are suppressed. Per-AS lines use
a key-value format ("address X weight Y flow-table-buckets Z")
instead of a tabwriter column, which avoids the ANSI-color alignment
issue we hit with mixed label/data rows.
- New 'sync vpp lbstate [<name>]' command. Without a name, triggers a
full reconciliation; with a name, targets one frontend.
- Previous 'show vpp lb' renamed to 'show vpp lbstate' for consistency
with the new sync command.
Test fixtures
- validConfig and all ad-hoc config_test.go fixtures that reach the end
of convert() now include the two required vpp.lb src addresses.
- tests/01-maglevd/maglevd-lab/maglev.yaml gains a vpp.lb section so the
robot integration tests can still load the config.
- cmd/maglevc/tree_test.go gains expected paths for the new commands.
Docs
- config-guide.md: new 'vpp' section in the basic structure, detailed
vpp.lb field reference, noting ipv4/ipv6 src addresses as REQUIRED
(hard error) with no defaults; example config updated.
- user-guide.md: documented 'show vpp info', 'show vpp lbstate',
'sync vpp lbstate [<name>]', new --vpp-api-addr and --vpp-stats-addr
flags, the vpp-lb-conf-set log line, and corrected the pause/resume
description to reflect that pause cancels the probe goroutine.
- debian/maglev.yaml: example config gains a vpp.lb block with src
addresses and commented optional overrides.
|
||
|
|
1815675fb6 |
Distinguish disabled from removed backend state; add make fixstyle
Add StateDisabled for operator-initiated disable, keeping StateRemoved for backends that disappear during a config reload. Previously both used StateRemoved, which was confusing: "removed" implies the backend no longer exists in config, but a disabled backend is still present and can be re-enabled on the fly. - health: add StateDisabled with String() "disabled", Disable() method with probe code "disabled". Record() rejects probes in all three inactive states (paused, disabled, removed). - checker: DisableBackend calls backend.Disable() instead of Remove(). - docs: healthchecks.md rewritten for pause (goroutine cancelled, not just results discarded), and separate disabled/removed state rows. user-guide.md updated to match. - Makefile: add fixstyle target (gofmt -w .). |
||
|
|
8bde00eb61 |
Fix pause to cancel probe goroutine; add Robot Framework integration tests
Pause semantics - PauseBackend now cancels the probe goroutine so no HTTP/TCP/ICMP traffic is sent while the backend is paused. Previously the goroutine kept running and results were silently discarded. - ResumeBackend launches a fresh probe goroutine on the existing worker, preserving transition history. The backend re-enters unknown state. Integration tests (tests/01-maglevd/) - Containerlab topology with 3 nginx:alpine backends on a dedicated management network (172.20.30.0/24) with static IPs. - maglevd config with 200ms HTTP health-check interval for fast test convergence (rise=2, fall=2). - 8 test cases: deploy lab, start maglevd, all backends reach up, nginx logs confirm probes arriving, pause stops probes (probe count stable), resume restarts probes, disable stops probes, enable restarts probes. VPP dataplane test (tests/02-vpp-lb/) - Rewrite 01-e2e-lab.robot to match the actual single-VPP topology: test client-to-server ping through VPP bridge domains and verify nginx is serving on all app servers. The previous version referenced a non-existent topology file and tested OSPF/BFD between two VPP nodes that don't exist in this lab. Build infrastructure - Add 'make robot-test' target with TEST= for suite selection. - Add tests/.venv target for Robot Framework virtualenv. - Make IMAGE optional in rf-run.sh. - Add .gitignore entries for test output, venv, logs, and clab state. |
||
|
|
d612086a5f |
Pools, CLI, versioning, Debian packaging, HTTPS fix
- Replaced flat `backends: [...]` list on frontends with an ordered `pools:`
list; each pool has a name and a map of backends with per-pool weights (0–100,
default 100). Pools express priority: first pool with a healthy backend wins.
- Removed global backend weight (was on the backend, now lives in the pool).
- Config validation enforces non-empty pools, non-empty pool names, weight
range, and consistent address families across all pools of a frontend.
- Added `PoolBackendInfo { name, weight }` and changed `PoolInfo.backends` from
`repeated string` to `repeated PoolBackendInfo` so weights are visible over
the API.
- Full interactive shell with readline, tab completion, and `?` inline help.
- Command tree parser (Walk) handles fixed keywords and dynamic slot nodes;
prefix matching with exact-match priority.
- Commands: `show version/frontends/frontend/backends/backend/healthchecks/
healthcheck`, `set backend <name> pause|resume`, `quit`/`exit`.
- `show frontend` output is hierarchical (pools → backends) with per-backend
weights and `[disabled]` notation; pool section uses fixed-width formatting
so ANSI color codes don't corrupt tabwriter alignment.
- `-color` flag (default true) wraps static field labels in dark-blue ANSI;
works correctly with tabwriter because all labels carry identical-length
escape sequences.
- `cmd/version.go` package holds `version`, `commit`, `date` vars set at build
time via `-ldflags -X`.
- `make build` / `make build-amd64` / `make build-arm64` all inject
`VERSION=0.1.1`, `COMMIT_HASH` (from `git rev-parse --short HEAD`), and
`DATE` (UTC ISO-8601).
- `maglevc` prints version on interactive startup and exposes `show version`.
- `maglevd` logs version/commit/date at startup; `-version` flag prints and exits.
- `doHTTPProbe` was building a `https://` target URL even though TLS was already
applied to the connection inside `inNetns`. `http.Transport` then wrapped the
connection in a second TLS layer, producing "http: server gave HTTP response
to HTTPS client". Fixed by always using `http://` in the target URL.
- Added `TestHTTPSProbe` using `httptest.NewTLSServer` to cover the full path.
- New `docs/user-guide.md`: maglevd flags/signals, maglevc commands, shell
completion, and command-tree parser walkthrough.
- New `docs/healthchecks.md`: state machine, rise/fall model, probe intervals,
all transition events with log examples.
- Updated `docs/config-guide.md`: pools design, removed global weight from
backends, updated all examples.
- Updated `README.md`: packaging table, build paths, corrected binary locations
(`/usr/sbin/maglevd`), config filename (`.yaml`).
- `debian/` directory contains `control.in`, `maglevd.service`, `default.maglev`,
`maglev.yaml` (example config), `conffiles`, `postinst`, `prerm`.
- `debian/build-deb.sh` stages a package tree and calls `dpkg-deb`; emits
`build/vpp-maglev_<version>~<commit>_<arch>.deb`.
- Cross-compiles for amd64 and arm64 in one `make pkg-deb` invocation.
- `maglevd` installed to `/usr/sbin/`, `maglevc` to `/usr/bin/`.
- Service reads `MAGLEV_CONFIG` from `/etc/default/maglev`
(default: `/etc/maglev/maglev.yaml`).
- Man pages `maglevd(8)` and `maglevc(1)` live in `docs/` and are gzip'd into
the package.
- All build output goes to `build/<arch>/`; `build/` is gitignored.
|
||
|
|
46e78ec36f | First stab at maglevc | ||
|
|
040d6f5853 | Revision: Rename to 'maglevd'; Refactor config structure | ||
|
|
b84b3274b1 | Initial revisin of healthchecker, inspired by HAProxy |