Commit Graph

5 Commits

Author SHA1 Message Date
Pim van Pelt
4347bb9b05 Bug fixes, config validation, SPA tightening, set-weight UI
This session covers three distinct arcs: correctness bug fixes in the
VPP sync path and frontend reducers, new config validation, and a
large polish pass on the web frontend (tighter layout, backend kebab
dialogs, live grouped-table, live config-reload re-sync).

 - encap for a VIP is now derived from the backend address family,
   not the VIP's. A v6 VIP with v4 backends is programmed as IP6_GRE4
   (not the buggy IP6_GRE6), matching the VPP LB plugin's
   requirement that encap reflects the tunnel inner family. desiredVIP
   gained an Encap field populated in desiredFromFrontend.
 - ActivePoolIndex now requires at least one backend in a pool to be
   BOTH in StateUp AND pb.Weight>0 before the pool counts as active.
   Previously a primary pool with every backend manually zeroed would
   still win over a fallback with weight=100, so fallback traffic
   never materialized. New TestActivePoolIndexWeightedFailover table
   pins the rule in five subcases.
 - SyncLBStateVIP gained a flushAddress parameter threaded through
   reconcileVIP; it forces flush=true on the setASWeight call for a
   specific backend regardless of the usual 0→N heuristic. Wires up
   the explicit [flush] knob the CLI exposes.

 - convertFrontend already enforced that backends within one frontend
   share a family. New cross-frontend pass validateVIPFamilyConsistency
   rejects configs where two frontends share a VIP address but carry
   backends in different families — VPP's LB plugin requires every
   VIP on a prefix to have the same encap type, so such a config
   would fail at lb_add_del_vip_v2 time with VNET_API_ERROR_INVALID
   _ARGUMENT (-73). Catching it at config load turns a silent
   runtime failure into a clear startup error.
 - Two new TestValidationErrors cases pin the behavior: mismatched
   families reject, same-family frontends on one VIP address allowed.

 - Proto adds `bool flush = 5` to SetWeightRequest. The RPC now
   drives a VIP sync immediately after mutating config (fixing the
   latent "weight change only takes effect at the next 30s periodic
   reconcile" gap), passing flushAddress = backend IP when req.Flush
   is true.
 - maglevc grows an optional [flush] token: `set frontend F pool P
   backend B weight N [flush]`. Implementation uses two Run closures
   (runSetFrontendPoolBackendWeight and -Flush) because the tree
   walker only puts slot tokens in args — literal keywords like
   `flush` advance the node but don't appear in the arg list.
 - docs/user-guide.md updated with the [flush] optional and a
   three-paragraph explainer of the graceful-drain vs. flush
   semantics at the VPP level.

 - checker.ListFrontends now sorts alphabetically to match the
   existing sort in ListBackends / ListHealthChecks — RPC responses
   no longer shuffle VIPs per call. cmd/frontend/client.go also
   sorts defensively in refreshAll so an old maglevd build renders
   alphabetically too.
 - backendFromProto was returning out.Transitions[n-1] as the
   LastTransition, but maglevd stores (and the proto carries)
   transitions newest-first, so [n-1] was actually the oldest.
   Reverse on read, which normalizes the client's Transitions slice
   to oldest-first and makes [n-1] genuinely the newest. LastTransition
   now points at the actual latest transition record.
 - applyBackendTransition (Go and TS) derives Enabled = state!="disabled"
   so the two fields stay in lockstep — closed a drift window where
   a recently re-enabled backend still rendered with a stuck
   [disabled] tag. The tag was later removed entirely since state
   and enabled carry the same information.

 - Layout tightened substantially: "FRONTENDS" panel header removed,
   zippy-summary and zippy-body paddings cut, backend-table row
   padding dropped to 2px, per-pool <h3> removed. Pools now live in
   a single consolidated table per frontend with a dedicated "pool"
   column that shows the pool name only on the first row of each
   group — classic grouped-table layout, maximally dense.
 - Description moved inline into the Zippy summary as muted italic
   text, freeing a vertical line per frontend card.
 - formatVIPAddress() helper renders IPv6 VIPs as [addr]:port and
   IPv4 as addr:port, matching RFC 3986 authority syntax.
 - Pools with effective_weight=0 on every backend (standby
   fallbacks, fully-drained primaries) render at opacity 0.35 on
   their non-actions cells; the kebab column stays at full contrast
   because its menu is still fully functional on standby backends.
 - Config-reload propagation: a maglevd config-reload-done log
   event triggers triggerConfigResync() on the frontend side —
   refreshAll() runs off the event-dispatch goroutine, then a
   BrowserEvent{Type:"resync"} is published through the broker.
   writeEvent emits type="resync" as a named SSE frame so the
   SPA's existing addEventListener("resync") handler picks it up
   and calls fetchAllState → replaceAll.
 - recomputeEffectiveWeights in stores/state.ts mirrors the
   server-side health.EffectiveWeights logic so the SPA keeps
   pool.effective_weight correct the moment a backend transitions,
   without waiting for the 30s refresh. Fixed a nasty bug where
   applyBackendEffectiveWeight wrote VIP-scoped vpp-lb-sync-as-*
   event weights into every frontend sharing the backend,
   corrupting frontends with different per-pool configured weights.
   The old log-event reducer was removed; applyConfiguredWeight is
   the narrower replacement used by the kebab set-weight flow.
 - applyBackendTransition calls recomputeEffectiveWeights after
   state updates so pool-failover transitions (primary ⇌ fallback)
   reflect instantly in the UI.

 - Confirmation dialogs via a new Modal primitive
   (Portal-mounted to document.body, escape/click-outside close,
   click-outside debounced on mousedown so mid-row-text-selection
   drags don't dismiss).
 - pause/resume/enable/disable each show a Modal with a consequence
   paragraph explaining what hits live traffic ("will keep existing
   flows", "will flush VPP's flow table", etc.). The disable commit
   button is styled btn-danger red.
 - set-weight action shows a Modal with a range slider (0-100,
   seeded from the current configured weight, accent-colored live
   numeric readout via <output>) plus a flush checkbox and a live-
   swapping note/warn paragraph describing what will happen. On
   commit, the SPA also updates its local store via
   applyConfiguredWeight so the operator sees the new weight
   immediately without waiting for the next refresh.

 - ProbeHeartbeat is now state-aware: ▶ (play) at rest for up/
   down/unknown backends, ⏸ (pause) for paused, ⏹ (stop) for
   disabled/removed, ❤️ (heart) during an in-flight probe.
 - Drop the probe-done event listener — fast probes (<10ms)
   could fire probe-done in the same render tick as probe-start
   and the heart would never visibly paint. Each probe-start now
   runs a fixed 400ms scale-pop animation on a timer; subsequent
   probe-start events reset the timer, so fast cadences produce a
   continuous heart pulse.
 - Fixed wrapper box (16x14 px, overflow hidden) so the row
   doesn't jiggle when the glyph swaps between the narrow ▶/⏸/⏹
   text glyphs and the wider ❤️ emoji.

 - Brand wordmark changed from "maglev" to "vpp-maglev" and wrapped
   in an <a> linking to https://git.ipng.ch/ipng/vpp-maglev. Logo
   link changed to https://ipng.ch/. Both open in a new tab with
   rel="noopener".
 - .gitignore fix: `frontend`, `maglevc`, `maglevd` were matching
   ANY file or directory with those names anywhere in the tree,
   silently ignoring cmd/frontend and friends. Anchored with
   leading slashes so only repo-root build artifacts match.
2026-04-12 23:06:42 +02:00
Pim van Pelt
284b4cc9a4 New maglev-frontend component; promote LB sync events to INFO
Introduces maglev-frontend, a responsive, real-time web dashboard for one
or more running maglevd instances. Source lives at cmd/frontend/; the
built binary is maglev-frontend. It is a single Go process with the
SolidJS SPA embedded via //go:embed — no runtime file dependencies.

Architecture
 - One persistent gRPC connection per configured maglevd (-server A,B,C).
   Each connection runs three background loops: a WatchEvents stream
   subscribed at log_level=debug for live events, a 30s refresh loop as
   a safety net for drift, and a 5s health loop that surfaces connection
   drops quickly.
 - In-process pub/sub broker with a 30s / 2000-event replay ring using
   <epoch>-<seq> monotonic IDs. Short browser reconnects (nginx idle,
   wifi flap, laptop wake) silently replay buffered events via the
   EventSource Last-Event-ID header; longer outages or frontend restarts
   fall through to a "resync" event that triggers a full state refetch.
 - HTTP surface: /view/ (SPA), /view/api/state, /view/api/state/{name},
   /view/api/maglevds, /view/api/version, /view/api/events (SSE),
   /healthz, and an /admin/* placeholder returning 501 for a future
   basic-auth mutation surface.
 - SSE handler follows the full operational checklist: retry hint, 15s
   : ping heartbeat, Flush after every write, r.Context().Done() teardown,
   X-Accel-Buffering: no, and no gzip.

SolidJS SPA (cmd/frontend/web/, Vite + TypeScript)
 - solid-js/store for a reactive per-maglevd state tree; reducers apply
   backend transitions, maglevd-status flips, and resync refetches.
 - Scope selector tabs for multi-maglevd support, per-maglevd frontend
   cards with pool tables showing state, configured weight, effective
   weight, and last-transition age.
 - ProbeHeartbeat component turns a middle-dot into ❤️ on probe-start and
   back on probe-done, driven by real log events; fixed-size wrapper so
   the emoji swap doesn't jiggle the row.
 - Flash wrapper animates any primitive on change (1s yellow fade via
   Web Animations API, skipped on first mount). Wired into the state
   badge, configured weight, and effective weight columns.
 - DebugPanel: chronological rolling event tail with tail-style auto-
   scroll, pause/resume, and scope/firehose filter. Syntactic highlight
   for vpp-lb-sync-* events with fixed-order attribute formatting.
 - Live effective_weight updates: vpp-lb-sync-as-added/removed/weight-
   updated log events are routed through a reducer that walks the
   snapshot's pool rows and sets effective_weight on every match
   without waiting for the 30s refresh.
 - Header shows build version + commit with build date in a tooltip,
   fetched once from /view/api/version on mount.
 - Prettier wired in as the web-side fixstyle; make fixstyle now tidies
   both Go and web in one shot via a new fixstyle-web target.

Per-mutation VPP LB sync logging
 - Promotes the addVIP/delVIP/addAS/delAS/setASWeight helpers from
   slog.Debug to slog.Info and renames them from vpp-lbsync-* to
   vpp-lb-sync-{vip-added,vip-removed,as-added,as-removed,as-weight-
   updated}. Matching rename for vpp-lb-sync-start / -done / -error /
   -vip-recreate. The Prometheus metric name (maglev_vpp_lbsync_total)
   is left alone to preserve dashboards.
 - setASWeight now takes the prior weight so the event can emit
   from=X to=Y and the UI can show the delta.
 - The vip field in every event is the bare address (no /32 or /128
   mask), matching the CLI output style.
 - Any listener on the gRPC WatchEvents stream — CLI watch events or
   maglev-frontend — now sees every VIP/AS dataplane change in real
   time without needing to raise the log level.

Build and tooling
 - Makefile: maglev-frontend added to BINARIES; build / build-amd64 /
   build-arm64 emit the binary alongside maglevd and maglevc. A new
   maglev-frontend-web target rebuilds the SolidJS bundle via npm.
 - web/dist/ is tracked so a bare `go build` keeps working for Go-only
   contributors and CI.
 - .gitignore skips cmd/frontend/web/node_modules/.

Stability fixes
 - maglevd's WatchEvents synthetic replay events (from==to, at_unix_ns=0)
   were corrupting the frontend's LastTransition cache with at=0,
   rendering as "20555d ago" in the browser. Client now skips synthetic
   events: the cache comes from refreshAll and doesn't need them.
 - Frontends, Backends, and HealthChecks are now served in the order
   returned by the corresponding List* RPC instead of Go map iteration
   order, so reloads and refreshes keep the SPA stable.
2026-04-12 17:48:31 +02:00
Pim van Pelt
8bde00eb61 Fix pause to cancel probe goroutine; add Robot Framework integration tests
Pause semantics
- PauseBackend now cancels the probe goroutine so no HTTP/TCP/ICMP
  traffic is sent while the backend is paused. Previously the goroutine
  kept running and results were silently discarded.
- ResumeBackend launches a fresh probe goroutine on the existing worker,
  preserving transition history. The backend re-enters unknown state.

Integration tests (tests/01-maglevd/)
- Containerlab topology with 3 nginx:alpine backends on a dedicated
  management network (172.20.30.0/24) with static IPs.
- maglevd config with 200ms HTTP health-check interval for fast test
  convergence (rise=2, fall=2).
- 8 test cases: deploy lab, start maglevd, all backends reach up,
  nginx logs confirm probes arriving, pause stops probes (probe count
  stable), resume restarts probes, disable stops probes, enable
  restarts probes.

VPP dataplane test (tests/02-vpp-lb/)
- Rewrite 01-e2e-lab.robot to match the actual single-VPP topology:
  test client-to-server ping through VPP bridge domains and verify
  nginx is serving on all app servers. The previous version referenced
  a non-existent topology file and tested OSPF/BFD between two VPP
  nodes that don't exist in this lab.

Build infrastructure
- Add 'make robot-test' target with TEST= for suite selection.
- Add tests/.venv target for Robot Framework virtualenv.
- Make IMAGE optional in rf-run.sh.
- Add .gitignore entries for test output, venv, logs, and clab state.
2026-04-11 20:19:36 +02:00
Pim van Pelt
d612086a5f Pools, CLI, versioning, Debian packaging, HTTPS fix
- Replaced flat `backends: [...]` list on frontends with an ordered `pools:`
  list; each pool has a name and a map of backends with per-pool weights (0–100,
  default 100). Pools express priority: first pool with a healthy backend wins.
- Removed global backend weight (was on the backend, now lives in the pool).
- Config validation enforces non-empty pools, non-empty pool names, weight
  range, and consistent address families across all pools of a frontend.

- Added `PoolBackendInfo { name, weight }` and changed `PoolInfo.backends` from
  `repeated string` to `repeated PoolBackendInfo` so weights are visible over
  the API.

- Full interactive shell with readline, tab completion, and `?` inline help.
- Command tree parser (Walk) handles fixed keywords and dynamic slot nodes;
  prefix matching with exact-match priority.
- Commands: `show version/frontends/frontend/backends/backend/healthchecks/
  healthcheck`, `set backend <name> pause|resume`, `quit`/`exit`.
- `show frontend` output is hierarchical (pools → backends) with per-backend
  weights and `[disabled]` notation; pool section uses fixed-width formatting
  so ANSI color codes don't corrupt tabwriter alignment.
- `-color` flag (default true) wraps static field labels in dark-blue ANSI;
  works correctly with tabwriter because all labels carry identical-length
  escape sequences.

- `cmd/version.go` package holds `version`, `commit`, `date` vars set at build
  time via `-ldflags -X`.
- `make build` / `make build-amd64` / `make build-arm64` all inject
  `VERSION=0.1.1`, `COMMIT_HASH` (from `git rev-parse --short HEAD`), and
  `DATE` (UTC ISO-8601).
- `maglevc` prints version on interactive startup and exposes `show version`.
- `maglevd` logs version/commit/date at startup; `-version` flag prints and exits.

- `doHTTPProbe` was building a `https://` target URL even though TLS was already
  applied to the connection inside `inNetns`. `http.Transport` then wrapped the
  connection in a second TLS layer, producing "http: server gave HTTP response
  to HTTPS client". Fixed by always using `http://` in the target URL.
- Added `TestHTTPSProbe` using `httptest.NewTLSServer` to cover the full path.

- New `docs/user-guide.md`: maglevd flags/signals, maglevc commands, shell
  completion, and command-tree parser walkthrough.
- New `docs/healthchecks.md`: state machine, rise/fall model, probe intervals,
  all transition events with log examples.
- Updated `docs/config-guide.md`: pools design, removed global weight from
  backends, updated all examples.
- Updated `README.md`: packaging table, build paths, corrected binary locations
  (`/usr/sbin/maglevd`), config filename (`.yaml`).

- `debian/` directory contains `control.in`, `maglevd.service`, `default.maglev`,
  `maglev.yaml` (example config), `conffiles`, `postinst`, `prerm`.
- `debian/build-deb.sh` stages a package tree and calls `dpkg-deb`; emits
  `build/vpp-maglev_<version>~<commit>_<arch>.deb`.
- Cross-compiles for amd64 and arm64 in one `make pkg-deb` invocation.
- `maglevd` installed to `/usr/sbin/`, `maglevc` to `/usr/bin/`.
- Service reads `MAGLEV_CONFIG` from `/etc/default/maglev`
  (default: `/etc/maglev/maglev.yaml`).
- Man pages `maglevd(8)` and `maglevc(1)` live in `docs/` and are gzip'd into
  the package.
- All build output goes to `build/<arch>/`; `build/` is gitignored.
2026-04-11 12:18:17 +02:00
Pim van Pelt
b84b3274b1 Initial revisin of healthchecker, inspired by HAProxy 2026-04-10 17:30:44 +02:00