diff --git a/README.md b/README.md index 1d9022e..72ec682 100644 --- a/README.md +++ b/README.md @@ -2,7 +2,7 @@ Health checker, gRPC control plane, CLI, and web dashboard for the VPP `lb` (load-balancer) plugin. Runs as a set of three binaries under one -Debian package: +Debian package, plus an out-of-band tester built alongside: - **`maglevd`** — the long-running health-checker daemon. Probes backends (HTTP, TCP, ICMP), tracks their aggregate state, programs the VPP @@ -14,6 +14,12 @@ Debian package: SolidJS Single-Page-App; connects to one or more maglevds over gRPC and serves a live HTTP view (read-only `/view/` and optional basic-auth `/admin/` with mutating commands). +- **`maglevt`** — optional out-of-band VIP probe TUI. Reads a + `maglev.yaml` and hits each frontend on a live HTTP path, reporting + latency and a configurable response-header tally so operators can see + failover as it happens. Does not talk gRPC; useful for validating a + `maglevd` restart end-to-end from a client perspective. Built by + `make` but not installed by the Debian package. ## Build and install @@ -94,6 +100,9 @@ deployments. ## Documentation +- [docs/design.md](docs/design.md) — architecture, components, and + numbered functional / non-functional requirements. Start here if + you want the big picture before diving into the code. - A minimal configuration file in [debian/maglev.yaml](debian/maglev.yaml) shows every knob. - [docs/user-guide.md](docs/user-guide.md) — flags, signals, and diff --git a/docs/design.md b/docs/design.md new file mode 100644 index 0000000..22c43b3 --- /dev/null +++ b/docs/design.md @@ -0,0 +1,1076 @@ +# vpp-maglev Design Document + +## Metadata + +| | | +| --- | --- | +| **Status** | Retrofit — describes shipped behavior as of `v0.9.5` | +| **Author** | Pim van Pelt `` | +| **Last updated** | 2026-04-15 | +| **Audience** | Operators and contributors who will read the source tree next | + +The key words **MUST**, **MUST NOT**, **SHOULD**, **SHOULD NOT**, and +**MAY** are used as described in +[RFC 2119](https://datatracker.ietf.org/doc/html/rfc2119), and are +reserved in this document for requirements that are actually enforced +in code or by an external dependency. Plain-language descriptions of +what the system or an operator can do are written in lowercase — +"can", "will", "does" — and should not be read as normative. + +## Summary + +`vpp-maglev` is a control plane for the VPP `lb` (Maglev load +balancer) plugin. A single daemon — `maglevd` — probes a fleet of +backends, maintains an authoritative view of their health, and +programs the VPP dataplane so that traffic hashed to a given VIP +lands only on healthy backends. Operators drive the system through +`maglevc` (an interactive CLI) or `maglevd-frontend` (a read-only +web dashboard with an optional authenticated admin surface). A small +companion binary, `maglevt`, validates VIPs from outside the control +plane by sending live HTTP probes and reporting failover behavior. + +## Background + +VPP's `lb` plugin implements Maglev consistent hashing inside the +dataplane: a VIP is backed by a pool of Application Servers (ASes), +each with an integer weight in `[0, 100]`, and incoming flows are +hashed onto a bucket ring so that weight changes disturb as few +existing flows as possible. The plugin knows nothing about backend +health; if an AS dies while it holds buckets, traffic to those +buckets is black-holed until something external tells `lb` to remove +or re-weight the AS. + +`vpp-maglev` is that external thing. 
Before `vpp-maglev`, operators +maintained VIP configurations by hand and reacted to incidents with +`vppctl`. The project replaces that loop with a daemon that owns the +health story, reconciles it with the dataplane, and exposes the +result through a uniform gRPC API so that CLIs, dashboards, and +scripts all read the same source of truth. + +## Goals and Non-Goals + +### Product Goals + +1. **Accurate backend health.** Detect that a backend is up, + degraded, or down quickly enough to keep user-visible error rates + low, and avoid flapping under transient faults. +2. **Correct VPP state.** The set of VIPs and per-AS weights in VPP + converges to the configured intent, filtered by current health, + for every supported failure mode. +3. **Restart neutrality.** Restarting `maglevd` with VPP already up + MUST NOT cause traffic to be black-holed while health probes warm + up. +4. **Operator control.** A human can pause, drain, or weight-shift + a backend in seconds without editing config files. +5. **Uniform observability.** Every state transition, VPP API call, + and probe result is emitted as a structured log, a Prometheus + metric, or a streaming event — ideally all three. +6. **One source of truth.** Every other component (CLI, web + frontend, scripts) reads `maglevd` through one typed interface. + There is no secondary control plane. + +### Non-Goals + +- `vpp-maglev` is not a VPP installer or packaging layer. It assumes + VPP is already running with the `lb` plugin loaded. +- It does not implement its own dataplane fast path. All forwarding + stays in VPP; `maglevd` only programs the plugin. +- It is not a generic service mesh. There is no L7 routing, cert + issuance, service discovery, or east-west policy — only VIPs, + pools, and backends. +- It is not a config store. Configuration is a YAML file on disk; + the gRPC API can check and reload it but cannot author it. +- It does not secure its own transport. gRPC runs insecure by + default; TLS, mTLS, or firewalls are the operator's + responsibility. + +## Requirements + +Each requirement carries a unique identifier (`FR-X.Y` or `NFR-X.Y`) +so that later sections can cite it. + +### Functional Requirements + +**FR-1 Health checking** + +- **FR-1.1** The system supports ICMP, TCP, HTTP, and HTTPS health + checks, each with its own protocol-specific success criteria. +- **FR-1.2** Each health check MUST apply HAProxy rise/fall + semantics with operator-configurable thresholds. +- **FR-1.3** A health check MAY declare distinct `interval`, + `fast-interval`, and `down-interval` values so that recovery from + a degraded or down state is faster than steady-state polling. +- **FR-1.4** Each probe attempt is bounded by a configurable + per-probe timeout, independent of the scheduling interval. +- **FR-1.5** If the configuration sets `healthchecker.netns`, every + probe MUST execute inside the named Linux network namespace. +- **FR-1.6** The first probe result against a newly-created backend + forces an immediate transition out of `Unknown`, without waiting + for `rise` or `fall` consecutive results. +- **FR-1.7** A backend MAY omit its `healthcheck` reference to + declare itself **static**. A static backend is not probed and is + treated as permanently Up; it still participates in pool failover + and still honors operator Pause and Disable overrides. + +**FR-2 Aggregation and pool failover** + +- **FR-2.1** A frontend MAY reference one or more named pools. 
+ Each referenced pool MUST contain at least one + `(backend, configured-weight)` tuple; an empty pool is a + configuration error and is rejected at load time. +- **FR-2.2** At any time, exactly one pool — the first, in + configuration order, that contains a healthy backend with + non-zero configured weight — is active; backends in other pools + contribute zero effective weight. +- **FR-2.3** The effective weight of a `(frontend, pool, backend)` + tuple is the configured weight when the backend is Up **and** the + pool is active, and zero in every other case. +- **FR-2.4** A frontend's aggregate state is Up when at least one + backend has non-zero effective weight, Unknown when every + referenced backend is still Unknown (or the frontend references + no backends), and Down otherwise. + +**FR-3 Operator control** + +- **FR-3.1** Operators can pause and resume individual backends at + runtime. Pausing stops the probe worker, freezes the rise/fall + counter, and drives effective weight to zero in **every** pool + and **every** frontend that references the backend. Existing + flows are not torn down; this is a soft drain. +- **FR-3.2** Operators can disable and re-enable individual + backends at runtime. Disabling drives effective weight to zero in + **every** pool and **every** frontend that references the + backend, and MUST cause existing flows to be torn down on the + next VPP sync. +- **FR-3.3** Operators can set the configured weight of a specific + `(frontend, pool, backend)` tuple at runtime. +- **FR-3.4** Operator overrides (Pause, Disable) and operator + weight mutations survive a configuration **reload** (`SIGHUP`) + as long as the underlying backend and tuple still exist in the + new configuration. +- **FR-3.5** Operator overrides and operator weight mutations do + **not** survive a `maglevd` **restart**. After a restart, the + YAML configuration file is authoritative for every backend and + every tuple: paused backends come back unpaused, disabled + backends come back enabled, mutated weights revert to the + configured value. Operators who need persistent changes must + edit the config file. + +**FR-4 VPP reconciliation** + +- **FR-4.1** For every backend state transition that changes an + effective weight, `maglevd` pushes the resulting AS state into + VPP for every affected VIP. +- **FR-4.2** `maglevd` runs a periodic full reconciliation on a + configurable cadence (default thirty seconds) as a safety net + against missed events and VPP restarts. +- **FR-4.3** Weight-to-zero is communicated to VPP as a graceful + drain by default; transitions to Disabled and transitions to + Down while `flush-on-down` is true MUST tear existing flows down + on the next sync. +- **FR-4.4** `maglevd` tolerates VPP disconnects by auto-reconnecting + and resuming reconciliation once the connection is + re-established. + +**FR-5 Configuration** + +- **FR-5.1** Configuration is loaded from a single YAML file + specified at startup and referenced by all later operations. +- **FR-5.2** Configuration validation distinguishes **parse + errors** (malformed YAML) from **semantic errors** (structural + invariants) and MUST report each with its own exit code from + `--check`: 0 (OK), 1 (parse), 2 (semantic). +- **FR-5.3** `maglevd` reloads its configuration on `SIGHUP` + without restarting the process, without restarting unchanged + probe workers, and without losing operator overrides (see + FR-3.4). 
+- **FR-5.4** A parse or semantic error encountered during reload + MUST leave the running configuration in place. +- **FR-5.5** The same validation and reload semantics are also + reachable through gRPC (`CheckConfig`, `ReloadConfig`). + +**FR-6 Observability** + +- **FR-6.1** All logs are emitted as structured JSON on stdout at + a configurable level. +- **FR-6.2** `maglevd` exposes Prometheus metrics for probe + outcomes, probe latency, backend state transitions, VPP API + traffic, and VPP LB sync mutations. +- **FR-6.3** A streaming gRPC API multiplexes log entries, backend + transitions, and frontend aggregate transitions to any number of + subscribers with per-subscriber filters. +- **FR-6.4** Per-VIP packet counters from VPP's stats segment are + surfaced through both the gRPC API and the Prometheus surface. + +**FR-7 Clients and peripheral tools** + +- **FR-7.1** An interactive CLI (`maglevc`) provides a + tab-completing shell and a one-shot command mode, both backed + by the same command tree. +- **FR-7.2** A web frontend (`maglevd-frontend`) can multiplex more + than one `maglevd` in a single process and present their + combined state. +- **FR-7.3** The web frontend partitions its HTTP surface into a + public read-only path (`/view/`) and an authenticated mutating + path (`/admin/`). If credentials are not configured, `/admin/` + MUST NOT be advertised (the path returns 404). +- **FR-7.4** An out-of-band tester (`maglevt`) probes configured + VIPs from outside the control plane, measures latency, and + tallies a configurable response header. + +### Non-Functional Requirements + +**NFR-1 Availability and reliability** + +- **NFR-1.1** A `maglevd` outage MUST NOT stop the dataplane. + While `maglevd` is absent, VPP continues to forward traffic + with its last-programmed state. +- **NFR-1.2** Restarting `maglevd` with VPP up MUST NOT black-hole + new flows during the probe warm-up window; this is enforced by + the startup warmup state machine described under `maglevd`. +- **NFR-1.3** The warmup clock is tied to process start and MUST + NOT be reset by VPP reconnects or configuration reloads. +- **NFR-1.4** A `maglevd`-side reload with a broken file MUST NOT + interrupt any running probe. + +**NFR-2 Determinism and correctness** + +- **NFR-2.1** Two `maglevd` instances given the same configuration + and the same backend state MUST issue the same sequence of + `lb_as_add_del` calls to VPP, so that VPP's bucket assignment is + stable across process swaps. This is the job of the + deterministic AS ordering rule. +- **NFR-2.2** Configuration reload MUST be atomic: either every + change in the new file takes effect, or none of them do. +- **NFR-2.3** Probe scheduling SHOULD apply bounded jitter so + that, after a daemon restart or a configuration reload, probes + do not phase-lock to the wall clock. +- **NFR-2.4** Operator mutations, event-driven syncs, and + periodic full syncs against VPP MUST be serialized with respect + to one another; they MUST NOT interleave. + +**NFR-3 Performance and scalability** + +- **NFR-3.1** Probing N backends costs roughly N goroutines doing + mostly idle waits; there is no central probe scheduler. +- **NFR-3.2** Event fan-out, transition history, and + per-subscriber event queues MUST all be bounded; no structure + grows without limit under sustained load. +- **NFR-3.3** VPP stats snapshots are published as an atomic + pointer so that Prometheus scrapes and gRPC counter reads are + wait-free. 
+- **NFR-3.4** A gRPC subscriber that cannot keep up MUST be + dropped rather than blocking the central fan-out. + +**NFR-4 Security** + +- **NFR-4.1** `maglevd` runs with only the Linux capabilities it + actually needs: `CAP_NET_RAW` only when ICMP probes are in use, + `CAP_SYS_ADMIN` only when `healthchecker.netns` is set. +- **NFR-4.2** gRPC transport security is explicitly out of scope; + the daemon runs insecure by default and deployments SHOULD + front it with a firewall, a trusted network, or a + TLS-terminating sidecar. +- **NFR-4.3** The web frontend's mutating surface MUST be hidden + entirely (HTTP 404) when either of its basic-auth environment + variables is unset. + +**NFR-5 Operability** + +- **NFR-5.1** Every CLI flag on every binary SHOULD have an + environment-variable equivalent so that the binaries can be + driven purely through env in container deployments. +- **NFR-5.2** `maglevd --check` MUST provide a stable exit-code + contract (0 / 1 / 2) for use by packaging scripts and + `ExecStartPre` handlers. +- **NFR-5.3** Dashboards can track state in real time through the + streaming event interface rather than by tight polling. +- **NFR-5.4** `maglevc` and `maglevd-frontend` MUST NOT maintain + any authoritative state of their own; all truth lives in + `maglevd`. + +## Architecture Overview + +### Process Model + +The system ships as three independent executables plus one optional +companion tester: + +- **`maglevd`** — the long-running daemon. Hosts both the health + checker and the VPP control plane. +- **`maglevc`** — short-lived CLI client. +- **`maglevd-frontend`** — long-running web dashboard (optional). +- **`maglevt`** — short-lived out-of-band probe TUI (optional). + +VPP itself is a fourth moving part, but it is an external +dependency, not part of the `vpp-maglev` codebase. + +### Data Flow + +Configuration flows **in** from a YAML file on disk (read by +`maglevd`) and from runtime mutations issued over gRPC by `maglevc` +or `maglevd-frontend`. Health state flows **out** of `maglevd` in +three directions: into VPP (as AS weight changes), into Prometheus +(as metrics), and into gRPC clients (as streaming events and +snapshot reads). Traffic counters flow **back in** from VPP's stats +segment and are surfaced through the same gRPC and Prometheus +channels. No component writes to VPP except `maglevd`. No component +serves `maglevd`'s state except `maglevd` itself. + +## Components + +### maglevd + +`maglevd` is the entire control plane. It is a single Go process +that bundles three internal concerns — a fleet of probe workers, a +VPP reconciler, and a gRPC server — around one shared, versioned +view of `(config, backend state, frontend state)`. + +#### Responsibilities + +- Load and validate configuration; accept reloads on `SIGHUP` + (FR-5.3, FR-5.4). +- Run one health-check worker per backend defined in config + (NFR-3.1). +- Maintain each backend's rise/fall counter and derive its state + (FR-1.2, FR-1.6). +- Aggregate backend state into per-frontend state, honoring + pool-based failover and per-backend operator overrides + (FR-2.x, FR-3.x). +- Connect to VPP's binary API and stats socket, reconnecting + automatically on disconnect (FR-4.4). +- Compute a desired VPP `lb` state from current configuration and + health, and drive VPP to match it (FR-4.1, FR-4.2). +- Expose the whole picture through a gRPC service and a Prometheus + `/metrics` endpoint (FR-6.x). 
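+
+As a mental model for how the configuration objects described in the
+following subsections reference one another (backends name a health
+check, pools list backends with per-reference weights, frontends list
+pools in priority order), a minimal configuration has roughly the
+shape below. The key spellings are illustrative only; the shipped
+`debian/maglev.yaml` remains the authoritative reference for the
+schema.
+
+```yaml
+# Illustrative shape only; see debian/maglev.yaml for the real keys.
+healthchecks:
+  http-ok:
+    type: http
+    interval: 3s
+    timeout: 1s
+    rise: 3
+    fall: 2
+backends:
+  web1:
+    address: 192.0.2.10
+    healthcheck: http-ok  # omit to declare a static backend (FR-1.7)
+  web2:
+    address: 192.0.2.11
+    healthcheck: http-ok
+pools:
+  primary:
+    - backend: web1
+      weight: 100
+    - backend: web2
+      weight: 100
+frontends:
+  www:
+    vip: 203.0.113.1/32
+    protocol: tcp
+    port: 80
+    pools: [primary]      # first pool with a healthy backend is active
+```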
+ +#### Probe Types and Intervals + +Four probe types are supported (FR-1.1): + +- **ICMP** — sends an echo request, expects a matching reply. This + probe type MUST have access to a raw socket, which requires + `CAP_NET_RAW` (NFR-4.1). +- **TCP** — establishes a TCP connection and immediately closes + it. No payload is exchanged. +- **HTTP** — issues a request against a configured path, matches + the response code against a configured numeric range, and + optionally matches the response body against a regular + expression. +- **HTTPS** — HTTP over TLS with configurable SNI and an option to + skip certificate verification. + +Each health check configures three candidate intervals (FR-1.3): +the nominal `interval`, an optional faster `fast-interval` used +while the counter is in its degraded zone, and an optional slower +`down-interval` used while the backend is fully down. If an +optional interval is not set, the nominal interval is used. Every +scheduled sleep receives bounded random jitter; this is the +mechanism that satisfies NFR-2.3. + +Each probe also has a `timeout` (FR-1.4). The probe-level timeout +bounds a single attempt; the interval bounds the time between the +**start** of consecutive attempts, with the actual probe latency +deducted from the next sleep so that slow probes do not push the +schedule later and later. + +If the configuration sets `healthchecker.netns`, every probe of +every type MUST run inside that Linux network namespace (FR-1.5). +Entering a netns requires `CAP_SYS_ADMIN`; without it, probes will +fail and the backend will go down. This is a deliberate deployment +choice, not a bug — see the security subsection below. + +#### Rise/Fall State Machine + +Each backend carries a single integer counter in the closed range +`[0, rise + fall − 1]`. A backend is considered **Up** when the +counter is at or above `rise`, and **Down** otherwise. A successful +probe increments the counter, saturating at the maximum; a failing +probe decrements it, saturating at zero. This is the HAProxy +hysteresis model adapted to a single scalar (FR-1.2). + +Four additional states overlay the rise/fall logic: + +- **Unknown** — the backend has not yet produced any probe result + since `maglevd` started (or since it was re-added by a reload). + An Unknown backend contributes zero effective weight and the + transition to Up or Down is taken on the *first* result rather + than after `rise` or `fall` consecutive results (FR-1.6). This + asymmetric rule lets fresh daemons discover the world quickly + while still requiring hysteresis for steady-state flaps. +- **Paused** — operator override (FR-3.1). The probe worker is + stopped and the counter is frozen. Effective weight is zero in + every pool and every frontend that references the backend, but + existing flows are not torn down; this is a soft drain. +- **Disabled** — operator override (FR-3.2). The probe worker is + stopped and effective weight is zero in every pool and every + frontend that references the backend. Unlike Paused, Disabled + causes existing flows to be torn down on the next VPP sync + (FR-4.3). +- **Removed** — the backend was deleted by a configuration reload. + Its final transition is emitted on the event stream and then + all references are dropped. + +Backends declared **static** (no `healthcheck` reference in +config, FR-1.7) bypass the rise/fall machinery entirely. They are +not probed, their counter is not maintained, and they enter Up on +startup via a single synthetic pass. 
They still participate in +pool-failover weight computation like any other backend and still +honor operator Pause and Disable overrides. + +Operator overrides and operator weight mutations are held in +process memory only. They survive a `SIGHUP` reload (FR-3.4) but +do **not** survive a daemon restart (FR-3.5): when `maglevd` +starts, the YAML file is the sole source of truth, and any +earlier runtime mutation is gone. Operators who need durable +changes must commit them to the configuration file. + +#### Aggregation to Frontend State + +A frontend references one or more named pools. Each referenced +pool contains one or more backends with a per-reference configured +weight in `[0, 100]` (FR-2.1). The effective weight that `maglevd` +computes for a given `(frontend, pool, backend)` tuple is +(FR-2.3): + +- The configured weight, if the backend is Up **and** the + backend's pool is the active pool (see below). +- Zero in every other case. + +The active pool is the first pool, in configuration order, that +contains at least one Up backend whose configured weight is +non-zero (FR-2.2). If no pool is active (e.g. all backends are +Down), every backend contributes zero weight and the frontend's +aggregate state is Down. A frontend with no backends at all, or +with every referenced backend still in Unknown, is itself Unknown. +A frontend with at least one non-zero effective weight is Up +(FR-2.4). + +Whether effective weight zero also flushes existing flows depends +on the cause (FR-4.3): + +- Up in a non-active pool: weight zero, **no** flush (standby + pool). +- Down while `flush-on-down` is true: weight zero, flush. +- Disabled: weight zero, flush, always. +- Paused or Unknown: weight zero, no flush. + +#### VPP Reconciliation + +`maglevd` treats VPP's LB configuration as a desired-state +reconciliation target. The desired state is a pure function of +`(current config, current backend state)`; the observed state is +read back from VPP through the `lb` plugin's binary API. A sync +operation diffs the two and issues the minimal set of +`lb_vip_add_del`, `lb_as_add_del`, and `lb_as_set_weight` messages +to make them match. + +Two triggers drive a sync: + +1. **Event-driven, single VIP** (FR-4.1). When the health checker + emits a backend transition, the reconciler recomputes desired + state for every frontend that references that backend and + syncs those VIPs. This is the primary path for convergence + during incidents. +2. **Periodic, full** (FR-4.2). A background loop runs a full + sync on a configurable interval (default thirty seconds). + This is the safety net that closes gaps left by missed events, + VPP restarts, or bugs in the event path. + +For determinism (NFR-2.1), whenever a sync operation iterates +over ASes it does so in a total order defined by the numeric +representation of the AS address, with IPv4 addresses ordered +before IPv6. Two `maglevd` instances given the same input MUST +therefore issue the same `lb_as_add_del` sequence, which in turn +means VPP produces the same bucket-to-AS assignment regardless of +which instance is driving. + +Operator mutations, event-driven syncs, and periodic full syncs +are serialized through a single mutex at the VPP-call boundary +(NFR-2.4); they never interleave. 
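+
+As an illustration of the deterministic ordering rule (NFR-2.1), the
+sketch below (assumed types, not the shipped code) sorts AS addresses
+IPv4-first and then numerically; any comparator with this property is
+enough to make the iteration order, and therefore the
+`lb_as_add_del` sequence, reproducible across instances:
+
+```go
+// Sketch of the AS ordering rule (NFR-2.1): IPv4 sorts strictly
+// before IPv6, then addresses compare by numeric representation.
+package main
+
+import (
+    "bytes"
+    "fmt"
+    "net/netip"
+    "sort"
+)
+
+func sortASes(ases []netip.Addr) {
+    sort.Slice(ases, func(i, j int) bool {
+        a, b := ases[i], ases[j]
+        if a.Is4() != b.Is4() {
+            return a.Is4() // IPv4 before IPv6
+        }
+        ab, bb := a.As16(), b.As16()
+        return bytes.Compare(ab[:], bb[:]) < 0
+    })
+}
+
+func main() {
+    ases := []netip.Addr{
+        netip.MustParseAddr("2001:db8::10"),
+        netip.MustParseAddr("192.0.2.11"),
+        netip.MustParseAddr("192.0.2.10"),
+    }
+    sortASes(ases)
+    fmt.Println(ases) // [192.0.2.10 192.0.2.11 2001:db8::10]
+}
+```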
+ +#### Startup Warmup and Restart Neutrality + +A naive sync loop would, on restart, immediately synthesize a +desired state in which every backend is Unknown, map every +backend through the effective-weight rules to zero, and push +"zero weight everywhere" into VPP before a single probe had +completed. The result would be a multi-second black hole on +every `maglevd` restart. NFR-1.2 forbids this, and the warmup +state machine is how it is enforced. + +The warmup has three phases, keyed off two configurable delays +`startup-min-delay` (default five seconds) and `startup-max-delay` +(default thirty seconds): + +1. **Hands-off.** From process start to `startup-min-delay`, the + reconciler MUST NOT write anything to VPP at all. Event-driven + syncs are suppressed; the periodic full sync is suppressed. +2. **Per-VIP release.** From `startup-min-delay` to + `startup-max-delay`, a VIP becomes eligible for sync the moment + every backend it references has produced at least one probe + result (i.e. none are Unknown). Eligible VIPs are released + individually so that healthy VIPs converge as fast as their + slowest backend, without being held back by unrelated slow + VIPs. +3. **Watchdog.** At `startup-max-delay`, any VIPs still held are + released unconditionally by a final full sync. This bounds the + worst-case blackout to `startup-max-delay` rather than "as long + as the slowest backend takes". + +The warmup clock is tied to process start, not to VPP reconnect +or configuration reload (NFR-1.3). Reconnecting to a flapping VPP +does not re-enter warmup, and `SIGHUP` does not re-enter warmup. + +Setting both delays to zero disables the warmup entirely, which +is useful for tests but SHOULD NOT be done in production. + +#### Configuration and Reload + +Configuration lives in a single YAML file (FR-5.1), typically +`/etc/vpp-maglev/maglev.yaml`. It is validated in two distinct +phases (FR-5.2): a **parse** phase that catches YAML errors, and +a **semantic** phase that enforces structural invariants such as: + +- Every frontend whose VIPs share an address MUST use backends of + the same address family (IPv4 or IPv6), because VPP picks an + encap type per VIP and mixing families on one VIP is not + supported. +- Every backend referenced by a frontend MUST exist. +- Every referenced health check MUST exist. +- Every pool referenced by a frontend MUST contain at least one + backend (FR-2.1). +- VPP LB knobs MUST satisfy plugin constraints: `flow-timeout` + in `[1s, 120s]`, `sticky-buckets-per-core` a power of two, + `sync-interval` strictly positive, `startup-max-delay` not less + than `startup-min-delay`. +- `transition-history` MUST be at least one. + +`maglevd --check` runs both phases and exits with code 0 on +success, 1 on parse errors, and 2 on semantic errors (NFR-5.2). +This exit code contract is what packaging scripts and systemd +`ExecStartPre` rely on. + +On `SIGHUP` the same two-phase validation runs against the file +on disk. If either phase fails, `maglevd` MUST log the error and +leave the running configuration untouched (FR-5.4, NFR-1.4). On +success, the delta is applied atomically (NFR-2.2): new backends +spawn workers, removed backends have their workers stopped and +emit a terminal `Removed` event, changed backends restart their +workers, and metadata-only changes (address, weight, enable flag) +are updated in place without restarting anything. 
Operator +overrides (Pause, Disable) survive reloads (FR-3.4) but — to +repeat the point from FR-3.5 — do **not** survive a daemon +restart. + +#### Lifecycle, Signals, and Security + +`maglevd` handles three signals: + +- **`SIGHUP`** triggers a configuration reload as described + above. +- **`SIGTERM`** and **`SIGINT`** initiate a graceful shutdown: + the gRPC server drains, stream subscribers are released, probe + workers are cancelled, and the VPP connection is closed. VPP's + last-programmed state is not torn down; traffic continues to + flow (NFR-1.1). + +`maglevd` requires two Linux capabilities, each tied to a +specific feature (NFR-4.1): + +- **`CAP_NET_RAW`** is required if and only if any configured + health check is of type ICMP. Without it, raw-socket creation + will fail and all ICMP probes will error out. +- **`CAP_SYS_ADMIN`** is required if and only if + `healthchecker.netns` is set. The kernel's `setns(CLONE_NEWNET)` + call requires it; without it, every probe will fail on + namespace entry. + +The shipped Debian unit grants both capabilities through +`AmbientCapabilities` and `CapabilityBoundingSet`, which is why +the package "just works" out of the box. Hand-run invocations +SHOULD set capabilities explicitly (e.g. via `setcap`) rather +than running as root. + +`maglevd` does not secure its own gRPC listener (NFR-4.2). +Operators SHOULD bind the listener to loopback, to a +control-plane VRF, or behind a firewall, depending on their +threat model. The design deliberately pushes transport security +out of the binary on the theory that every deployment already +has an answer for it. + +#### Interfaces + +**Presents.** + +- **A gRPC service on a TCP listener** (default `:9090`). This + is the *only* programmatic interface to `maglevd`. Every other + component talks to `maglevd` through this interface and no + other. The service has read-only methods (`List*`, `Get*`, + `WatchEvents`, `CheckConfig`), mutating methods + (`PauseBackend`, `ResumeBackend`, `EnableBackend`, + `DisableBackend`, `SetFrontendPoolBackendWeight`, + `ReloadConfig`, `SyncVPPLBState`), and a single streaming + method (`WatchEvents`) that multiplexes log entries and state + transitions to any number of subscribers with per-subscriber + filters (FR-6.3). gRPC reflection is enabled by default so + that ad-hoc tooling can introspect the service. +- **A Prometheus `/metrics` HTTP endpoint** on a separate + listener (default `:9091`) (FR-6.2). Counters are updated + inline as probes run and VPP calls complete; gauges are + computed on each scrape from the current checker and VPP + state, so there is no sampling lag. +- **Structured JSON logs on stdout**, via `log/slog`, at a + configurable level (FR-6.1). Key events — daemon start, config + load, VPP connect/disconnect, backend transitions, LB sync + mutations, warmup milestones — are logged at `info` or higher + so that a default-level deployment has enough to post-mortem + an incident. +- **Process exit codes** from `--check`: 0, 1, or 2 as described + above (NFR-5.2). These form a small but load-bearing interface + to packaging and systemd. + +**Consumes.** + +- **A YAML configuration file** on disk, passed via `--config` + or `MAGLEV_CONFIG`. This is the declarative source of truth + for intent; everything the operator mutates at runtime is a + delta on top of it, and every runtime delta is lost on a + daemon restart (FR-3.5). +- **VPP's binary API socket** (default `/run/vpp/api.sock`). 
+ The connection auto-reconnects on drop (FR-4.4), and while + disconnected, the reconciler silently queues no work — the + next periodic sync closes any gap. +- **VPP's stats segment socket** (default `/run/vpp/stats.sock`). + Read periodically (five-second cadence) for per-VIP packet + and byte counters (FR-6.4). Readers are non-blocking + (NFR-3.3); a stale snapshot is always available. +- **The Linux kernel's namespace subsystem**, when + `healthchecker.netns` is set. Requires `CAP_SYS_ADMIN`. +- **Raw sockets**, for ICMP probes. Requires `CAP_NET_RAW`. + +### VPP Dataplane + +The VPP dataplane is not part of the `vpp-maglev` codebase, but +it is the component every other piece revolves around, and its +contract with `maglevd` defines what `maglevd` is allowed to do. + +#### Responsibilities + +VPP's `lb` plugin implements Maglev consistent hashing in the +forwarding fast path. It owns: + +- **Global configuration** — an IPv4 source address and an IPv6 + source address used as the outer header for GRE-encapsulated + traffic to ASes, the number of sticky buckets per worker core, + and a per-flow idle timeout. +- **A set of VIPs**, each identified by an address prefix, an IP + protocol, and a port. A VIP carries an encap type (GRE4 or + GRE6, picked by the family of the AS addresses) and a flag + for source-IP sticky hashing. +- **A set of ASes per VIP**, each identified by address, with an + integer weight in `[0, 100]`, a `used`/`flushed` state, and a + bucket count derived from the Maglev ring. + +It does **not** own: health, configuration intent, operator +overrides, transition history, or metrics. Those belong to +`maglevd`. + +#### Interfaces + +**Presents.** + +- **A binary API** (GoVPP-style message exchange) for reading + and mutating VIP and AS state. `maglevd` is the sole user. +- **A stats segment** with per-VIP counters from the LB plugin + (existing-flow, first-flow, untracked, no-server) and + per-prefix FIB counters. The LB plugin bypasses the FIB for + forwarded packets, so per-backend traffic counters are not + available; this is a known limitation that operators consuming + metrics need to understand. +- **The forwarded-traffic fast path itself**, which is the whole + reason this project exists. + +**Consumes.** + +- `maglevd`'s binary-API writes — nothing else. There is no + third party programming `lb` state in a working deployment. + +### maglevc + +`maglevc` is the interactive and scripting CLI. It is a +short-lived client with no persistent state and no background +work (NFR-5.4). + +#### Responsibilities + +- Provide a human-readable tab-completing shell for `maglevd` + (FR-7.1). +- Dispatch one-shot commands for scripts and automation. +- Render state snapshots (frontends, backends, health checks, + VPP LB state, VPP counters) with optional ANSI color. +- Stream events in real time (`watch events`) with filters. + +#### Interaction Model + +With no positional arguments, `maglevc` starts a readline-based +REPL with a nested command tree: `show`, `set`, `watch`, +`config`, plus the usual `help`, `exit`, `quit`. Tab completion +is built from the same command tree the dispatcher uses, so +completion can never drift from the actual command set. With +positional arguments, `maglevc` executes one command against the +server and exits — in this mode color is off by default so that +pipes and logs stay clean, but `--color=true` can be set +explicitly. + +#### Interfaces + +**Presents.** + +- **An interactive TTY shell** and a **one-shot command mode**. 
+ Humans and scripts are the only consumers; there is no API, + no socket, no file output. + +**Consumes.** + +- **`maglevd`'s gRPC service**, over insecure credentials by + default. `maglevc` MUST NOT talk to VPP directly, MUST NOT + read the config file directly, and MUST NOT maintain any + state of its own across invocations (NFR-5.4). Everything it + shows and everything it mutates goes through the gRPC API. + +### maglevd-frontend + +`maglevd-frontend` is an optional web dashboard (FR-7.2). Unlike +`maglevc`, it is a long-running process: it holds open gRPC +streams, caches snapshots, and serves HTTP. + +#### Responsibilities + +- Connect to one or more `maglevd` servers simultaneously. +- Maintain a cached view of each server's state: frontends, + backends, health checks, VPP LB state, and VPP counters. +- Serve a SolidJS single-page application and a JSON API to + browsers. +- Stream live updates to browsers so that dashboards update + without polling (NFR-5.3). +- Expose an optional authenticated mutation surface (FR-7.3). + +#### Multi-Server Multiplexing + +A single `maglevd-frontend` process accepts a comma-separated +list of gRPC server addresses. For each one, it runs an +independent pool of goroutines: one to stream events, one to +refresh list-oriented data on a roughly one-second cadence, one +to refresh per-health-check detail, and one (debounced on +incoming events) to refresh VPP LB state and counters. Failures +on one server MUST NOT block the others, and the served JSON +state always reports per-server connection status so that the +SPA can mark partially-available views. + +All per-server event streams publish into a single shared event +broker with a bounded replay buffer (capped both in time and in +event count, satisfying NFR-3.2). The broker assigns each event +a monotonic `epoch-seq` identifier so that browsers reconnecting +a dropped Server-Sent-Events stream can resume from where they +left off without a full refresh — and so that a broker restart, +which reshuffles the epoch, forces a full refresh rather than +silently handing out ambiguous IDs. + +#### Read-Only and Admin Surfaces + +The HTTP surface is partitioned into two paths (FR-7.3): + +- **`/view/`** serves the SPA and a read-only JSON API. It is + always publicly accessible: there is no auth, and there are + no mutation endpoints under it at all. The design intent is + that `/view/` can be exposed to a broader audience (NOC, + management UIs, screens on walls) without risk. +- **`/admin/`** serves the SPA entry point and the mutating + JSON API behind HTTP basic auth. Credentials come from + `MAGLEV_FRONTEND_USER` and `MAGLEV_FRONTEND_PASSWORD`. If + either is unset or empty, the `/admin/` path MUST return 404 + (NFR-4.3) — the admin surface is not merely locked, it is + not advertised. This makes accidental exposure self-limiting: + forgetting to set the env vars disables admin rather than + leaving it open. + +Both surfaces talk to the same underlying cache; the difference +is only what endpoints exist. + +#### Interfaces + +**Presents.** + +- **An HTTP listener** (default `:8080`) serving: + - `/view/` — the SolidJS SPA (embedded in the binary). + - `/view/api/*` — read-only JSON endpoints for version, + server list, aggregated state, and per-server state. + - `/view/api/events` — an SSE stream bridged from the + internal event broker, with `Last-Event-ID` replay. + - `/admin/` — the SPA entry point, gated on basic auth. 
+ - `/admin/api/*` — mutating JSON endpoints that translate + to gRPC mutations against the appropriate `maglevd`. + - `/healthz` — a liveness probe. + +**Consumes.** + +- **One or more `maglevd` gRPC services.** As with `maglevc`, + this is the *only* way `maglevd-frontend` reaches into the + system. It MUST NOT read the YAML config file and MUST NOT + talk to VPP directly (NFR-5.4). +- **Two environment variables**, `MAGLEV_FRONTEND_USER` and + `MAGLEV_FRONTEND_PASSWORD`, for the optional admin surface. + +### maglevt + +`maglevt` is a small out-of-band probe TUI (FR-7.4). It is not +part of the control loop at all; it is a validation tool that +an operator runs on a laptop, a jump host, or a monitoring box +to see VIPs the way a client sees them. + +#### Responsibilities + +- Read one or more `maglev.yaml` files and enumerate TCP-style + VIPs from the `frontends` section. +- Probe each VIP at a configurable interval with a real HTTP or + HTTPS request against a configurable path. +- Measure latency (min/max/average and a handful of + percentiles) and success rate over a rolling window. +- Tally the value of a configurable response header (by + default, `X-IPng-Frontend`) so that operators can see which + backend actually served each request. Because keep-alives are + disabled by default, this tally reflects fresh Maglev hashing + decisions rather than a pinned connection. + +#### Scope Boundary + +`maglevt` is intentionally decoupled from `maglevd`. It does +not talk gRPC, it does not read the VPP stats segment, and it +does not know or care whether the target VIPs are actually +served by the `vpp-maglev` control plane at all — it simply +probes addresses. This makes it useful in at least three +scenarios: validating a `maglevd` restart end-to-end from a +client perspective, debugging pool failover by watching the +header tally reshuffle, and sanity-checking that a given VIP is +reachable across deployments when the gRPC control plane is +unavailable or out of reach. + +#### Interfaces + +**Presents.** + +- **A full-screen TUI** built on Bubble Tea, with a + deterministic grid layout and a few interactive toggles (e.g. + reverse-DNS lookup). There is no machine-readable output; if + you need metrics, use Prometheus on `maglevd`. + +**Consumes.** + +- **One or more YAML configuration files**, which it parses + with the same library `maglevd` uses. Only the subset of the + schema describing frontends is actually consumed; unknown + fields are ignored. Duplicate VIPs discovered across files + are de-duplicated by `(scheme, address, port)` so that + multi-file deployments don't double-probe. +- **The outbound network**, directly. No special capabilities + are required — `maglevt` is a plain HTTP client. + +## Operational Concerns + +### Configuration Reload Semantics + +Reload is triggered by `SIGHUP` to `maglevd`, or by the +`ReloadConfig` gRPC method. Both paths run the same validation +as `--check`. A reload MUST NOT partially apply (NFR-2.2): +either every change in the new file takes effect, or none of +them do. A reload MUST NOT restart unchanged probe workers; the +probe state machine is preserved precisely because operators +use reloads as a routine operation and expect backends whose +health-check definitions did not change to simply keep running. + +Operator overrides (Pause, Disable) survive a reload as long as +the backend still exists in the new config (FR-3.4). 
A backend +that disappears from the new config transitions to `Removed` +and its worker is stopped; if it reappears in a later reload it +starts again in `Unknown` with a fresh counter. + +A daemon **restart** is different from a reload. On restart, +the YAML configuration is the sole source of truth: every +runtime override is gone, every runtime weight mutation is gone +(FR-3.5). Operators who need an override to persist across +restarts must commit the intended state to the config file. + +### Failure Modes + +- **VPP restart.** `maglevd` detects the disconnect, enters a + reconnect loop, and on reconnect reads VPP's version and + current state (FR-4.4). The warmup clock is not reset by VPP + reconnects (NFR-1.3) — a flapping VPP does not cause + `maglevd` to go hands-off every time. The next periodic full + sync pushes the current desired state into the freshly + restarted plugin. +- **`maglevd` restart with VPP up.** Handled by the warmup + state machine (NFR-1.2): new flows see the last-programmed + weights until probes catch up, not zeros. +- **`maglevd` restart with VPP also down.** VPP comes back + first, `maglevd` comes back second, warmup gates pushing + anything until probes converge. This is the worst-case path, + bounded by `startup-max-delay`. +- **Configuration reload with a broken file.** The reload is + rejected; the running configuration is retained; an error + is logged (FR-5.4). No probes are interrupted (NFR-1.4). +- **Probe namespace disappears.** Entering the namespace fails, + the probe is counted as a failure, and the backend + eventually transitions Down under normal rise/fall rules. + There is no special-case handling; this is by design, because + an operator removing the netns while `maglevd` is running is + an operational error that SHOULD manifest as a visible Down, + not as silent success. +- **gRPC subscriber too slow.** Per-subscriber event queues + are bounded (NFR-3.2). A subscriber that cannot keep up MUST + be dropped rather than backing up the central fan-out + (NFR-3.4). +- **Mid-flight weight mutation during sync.** Operator weight + changes and reconciler sync both route through the same + state-protected code path, so mutations are serialized rather + than interleaved with VPP writes (NFR-2.4). + +### Observability + +**Structured logging** (FR-6.1). All logs are slog-formatted +JSON written to stdout. The default level is `info`, which is +sized to produce one or two lines per incident rather than per +probe. The `debug` level dumps every probe attempt and every +VPP binary-API message, and is intended for post-mortem +investigation. + +**Prometheus metrics** (FR-6.2, FR-6.4). `maglevd` exposes four +classes of metric: inline counters for probe outcomes, +probe-latency histograms, backend state-transition counters, +and VPP API and LB sync counters; and on-demand gauges for +current backend state, rise/fall counter values, configured +weights, VPP connection status, VPP uptime, VPP info labels, +and per-VIP LB plugin counters. Gauges are sampled from live +state on every scrape, so there is no sampling staleness. + +**Streaming events** (FR-6.3). The gRPC `WatchEvents` method +multiplexes three event families into one stream: log events +(the same structured logs the daemon writes to stdout), backend +transitions (one per affected frontend, since a single backend +may participate in multiple frontends), and frontend aggregate +transitions (Up/Down/Unknown flips at the frontend level). +Clients MAY filter by event family and by minimum log level. 
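+
+Internally, delivery to each subscriber is bounded: a subscriber that
+cannot drain its queue is dropped rather than allowed to stall the
+stream (NFR-3.2, NFR-3.4). A minimal sketch of that non-blocking
+fan-out pattern (assumed shape, not the shipped code):
+
+```go
+// Sketch of bounded fan-out with drop-on-slow (NFR-3.4). Locking is
+// elided for brevity; a full subscriber queue removes the subscriber
+// instead of blocking the central event channel.
+package main
+
+import "fmt"
+
+type event struct{ msg string }
+
+type broker struct{ subs map[chan event]struct{} }
+
+func (b *broker) publish(ev event) {
+    for ch := range b.subs {
+        select {
+        case ch <- ev: // room in this subscriber's bounded queue
+        default: // slow subscriber: drop it, never block the fan-out
+            delete(b.subs, ch)
+            close(ch)
+        }
+    }
+}
+
+func main() {
+    b := &broker{subs: map[chan event]struct{}{}}
+    sub := make(chan event, 16) // bounded per-subscriber queue
+    b.subs[sub] = struct{}{}
+    b.publish(event{msg: "backend web1: Up -> Down"})
+    fmt.Println((<-sub).msg)
+}
+```
+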
+The web frontend consumes this stream and re-publishes it to +browsers over SSE, with an epoch-seq replay buffer layered on +top. + +### Security and Capabilities + +`maglevd` needs `CAP_NET_RAW` for ICMP probes and +`CAP_SYS_ADMIN` for netns entry (NFR-4.1). Neither is optional +for the feature that needs it, and neither is required +otherwise; operators who use neither feature MAY run `maglevd` +as an unprivileged user with no capabilities at all. + +`maglevd-frontend` needs no special capabilities — it is a +plain HTTP client of `maglevd` and a plain HTTP server for +browsers. It does handle user credentials (basic auth), which +are read from the environment and held in process memory; +operators SHOULD terminate the frontend behind a TLS reverse +proxy if it is exposed beyond a trusted network. + +`maglevc` and `maglevt` need no special capabilities. + +All gRPC traffic runs insecure by default (NFR-4.2). Securing +transport is an operational decision, not a build-time one; +deployments that require mTLS SHOULD terminate gRPC at a +sidecar or colocate control and data plane on a trusted +segment. + +### Concurrency Model + +The concurrency model inside `maglevd` is deliberately simple +and local: + +- Each backend owns exactly one probe worker goroutine + (NFR-3.1). Workers do not share state with each other. +- All events — transitions and log records — travel through a + single central channel which is then fanned out to bounded + per-subscriber queues (NFR-3.2). The fan-out is the only + place where multiple subscribers can observe the same event. +- The configuration pointer is swapped atomically on reload + (NFR-2.2); readers take a read lock for the duration of a + single access, so the live config is always internally + consistent even mid-reload. +- The VPP stats snapshot is published as an atomic pointer + (NFR-3.3), so Prometheus scrapes and gRPC reads of counters + are wait-free. +- Reconciliation holds a mutex around VPP calls, which + serializes operator mutations, event-driven syncs, and + periodic full syncs against each other (NFR-2.4). This is + intentional: the order in which VPP sees mutations matters + for determinism, and serializing them is cheap at the scale + of control-plane events. + +Deadlock avoidance is structural rather than audited: +dependencies between subsystems are one-way. The checker does +not call into VPP; the reconciler reads checker state and calls +VPP; VPP never calls back. `maglevd-frontend` and `maglevc` +only read from `maglevd` over gRPC. There is no cycle in the +wait-for graph. + +## Alternatives Considered + +This is a retrofit of a shipped system, so the alternatives +here are the ones the code actively rejects, not speculative +designs. + +- **Several probe schedulers sharing one goroutine pool.** + Rejected in favor of one goroutine per backend. The + per-backend model is trivially correct, has no shared state, + and scales linearly with backend count at a cost of a few + kilobytes per backend. +- **`maglevd-frontend` as a sidecar per `maglevd`.** Rejected + in favor of one frontend speaking to many daemons. A single + dashboard pane across a fleet is the common operator + request; pushing multi-server logic into the frontend keeps + the daemon simple. +- **Operator actions expressed as config edits plus SIGHUP.** + Rejected in favor of direct gRPC mutations. 
Pausing a + backend during an incident should not require editing a + file, and the effect should survive subsequent reloads + (FR-3.4) — though, by deliberate design, not a daemon + restart (FR-3.5). +- **Persisting operator overrides across daemon restarts.** + Rejected in favor of making the YAML config file the sole + source of truth on startup (FR-3.5). Persisting runtime + overrides would require an on-disk side store and a clear + policy for what happens when the side store and the config + file disagree; keeping the daemon stateless on startup is + simpler and harder to get wrong. +- **Synchronous full sync after every transition.** Rejected + in favor of event-driven single-VIP syncs with a periodic + full sync as a safety net (FR-4.1, FR-4.2). Full syncs are + cheap but not free, and the blast radius of a transient bug + in the desired-state computation is smaller when + per-transition work only touches one VIP. +- **Letting `maglevt` read `maglevd`'s gRPC.** Rejected in + favor of probing the YAML file directly so that `maglevt` + remains useful when `maglevd` itself is the thing being + investigated. + +## Open Questions + +- **Mutual TLS for gRPC.** Currently insecure by default. A + future version may wire in standard mTLS support once a + credential-management story is picked. +- **Per-AS traffic counters.** The VPP `lb` plugin bypasses + the FIB and therefore does not produce per-AS traffic + counters. Surfacing real per-backend byte/packet counts + would require a VPP-side change. +- **High-availability of the control plane.** Two `maglevd` + instances on the same VPP would interleave writes harmlessly + thanks to determinism (NFR-2.1), but there is no leader + election and no formal story about which instance owns which + VIPs. Today, operators run a single `maglevd` per VPP host. diff --git a/docs/user-guide.md b/docs/user-guide.md index 8a33c86..34118f1 100644 --- a/docs/user-guide.md +++ b/docs/user-guide.md @@ -535,3 +535,61 @@ Nginx, HAProxy, or any proxy in front of `maglevd-frontend` must: the live-stream property. See `maglevd-frontend(8)` for the full reference. + +--- + +## maglevt + +`maglevt` is an optional out-of-band VIP probe TUI. It reads one or +more `maglev.yaml` files, enumerates the configured TCP/HTTP frontends, +and probes each one on a configurable HTTP path at a configurable +interval. It does not talk gRPC and does not depend on a running +`maglevd` — it's a purely client-side view of the VIPs, driven entirely +from the config file on disk. + +It's useful for a handful of things in particular: + +- Validating a `maglevd` restart end-to-end from a client perspective: + the probe tally keeps running regardless of what the control plane + is doing, so a brief blip or a missed failover is visible directly. +- Debugging pool failover: with keep-alives off, every probe opens a + fresh TCP connection and is reshuffled by VPP's Maglev hash, so the + response-header tally visibly reshuffles the moment a standby pool + takes over. +- Sanity-checking VIP reachability across multi-site deployments, + especially when the gRPC control plane isn't reachable from the + machine you're debugging on. + +`maglevt` is built by `make` alongside the other binaries but is not +shipped in the Debian package; run it from the `build/` tree or copy +it onto the host by hand. + +### Flags + +| Flag | Environment variable | Default | Description | +|---|---|---|---| +| `--config` | — | `/etc/vpp-maglev/maglev.yaml` | Path to a `maglev.yaml` file. 
Repeatable; also accepts a comma-separated list. Frontends are unioned across files and de-duplicated by `(address, protocol, port)`. | +| `--interval` | — | `100ms` | Probe interval per VIP, with ±10% jitter applied per probe to avoid phase-locking. | +| `--timeout` | — | `2s` | Per-request timeout. | +| `--host` | — | (VIP address) | Override for the HTTP `Host` header. Defaults to the VIP address literal. | +| `--uri` / `--path` | — | `/.well-known/ipng/healthz` | HTTP request path used in the GET. `--path` is an alias for `--uri`. | +| `--header` | — | `X-IPng-Frontend` | Response header whose value is extracted and tallied, so you can see which backend served each request. | +| `--insecure` | — | `true` | Skip TLS verification for HTTPS frontends. | +| `--keepalive` / `-k` | — | `false` | Enable HTTP keep-alives. Off by default so every probe opens a fresh connection — required for failover visibility, because a pinned keep-alive would mask a Maglev reshuffle. | +| `--filter` | — | — | Regular expression; only probe frontends whose name matches. | +| `--version` | — | — | Print version, commit hash, and build date, then exit. | + +### UI + +The TUI is built with Bubble Tea and shows a deterministic grid — +one tile per `(scheme, address, port)` VIP, IPv6 before IPv4 and +HTTPS before HTTP, so the layout is stable across runs and across +machines. Each tile carries a rolling latency summary (min, max, +average, plus a few percentiles), running success and failure +counts, and a tally of the configured response-header values seen +from that VIP. Press `d` to toggle reverse-DNS resolution on the +addresses shown in the tile headers; press `q` or `Ctrl-C` to +exit. + +There is no machine-readable output. If you need metrics, scrape +Prometheus on `maglevd` instead.
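+
+### Example
+
+A typical run from the build tree might look like the following
+(paths and values are illustrative; every flag is listed in the
+table above):
+
+```
+./build/maglevt --config /etc/vpp-maglev/maglev.yaml \
+    --interval 250ms --filter '^www'
+```
+
+This probes each frontend whose name matches `^www` four times per
+second against the default health path and tallies the default
+`X-IPng-Frontend` response header per VIP.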