Pim van Pelt 1191b3d994 Frontend aggregate state: SPA-side derive + checker fixes
The web UI showed the wrong up/down state for frontends whose pool
composition had been touched by a mix of runtime disable/enable and
weight changes: a frontend with every backend at effective_weight=0
would still display "up", while a sibling frontend with a serving
fallback backend would display "down". Two independent bugs, each
fixed on its own layer.

On the fast path (healthCheckEqual returns true), Reload did
`w.entry = b`, blindly replacing the runtime worker entry with the
fresh YAML record. YAML's default for Enabled is true, so any
backend the operator had runtime-disabled would have its Enabled
flag silently reset while the worker's backend.State stayed at
StateDisabled. Subsequent EnableBackend calls then early-returned
on `if w.entry.Enabled` and never transitioned the state machine
— the CLI reported "enabled, state is 'disabled'" and the backend
was permanently stuck.

Fix: preserve w.entry.Enabled across the fast-path replacement.

    runtimeEnabled := w.entry.Enabled
    w.entry = b
    w.entry.Enabled = runtimeEnabled

Runtime operator state now outlives config reloads. On the worker-
restart path (different health check) the new worker is
structurally fresh and the YAML's Enabled is still authoritative.
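The preserved-flag behavior can be sketched end to end. This is a minimal, runnable illustration with hypothetical stand-in types, not maglevd's real worker or YAML record:

```go
package main

import "fmt"

type backendConfig struct {
	Name    string
	Enabled bool // YAML default: true
}

type worker struct{ entry backendConfig }

// reloadFastPath replaces the worker's entry with the fresh YAML
// record while carrying the runtime Enabled flag across, so an
// operator's runtime-disable survives a config reload.
func (w *worker) reloadFastPath(b backendConfig) {
	runtimeEnabled := w.entry.Enabled
	w.entry = b
	w.entry.Enabled = runtimeEnabled
}

func main() {
	// A backend the operator runtime-disabled:
	w := &worker{entry: backendConfig{Name: "nginx0", Enabled: false}}
	// Reload fast path delivers a fresh YAML record with Enabled=true:
	w.reloadFastPath(backendConfig{Name: "nginx0", Enabled: true})
	fmt.Println(w.entry.Enabled) // still false: runtime state preserved
}
```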

Both DisableBackend and EnableBackend used `w.entry.Enabled` as
their idempotency check, which meant a stuck `Enabled=true,
State=disabled` combo couldn't be repaired even after the Reload
fix (that fix only prevents new corruption; state that was
already bad survives the upgrade). Switched both methods to key
on `w.backend.State`:

 - DisableBackend: if state == StateDisabled, sync the flag but
   don't emit a redundant transition; otherwise do the full
   state transition + flag flip + worker cancel.
 - EnableBackend: if state != StateDisabled, sync the flag but
   don't emit a redundant transition; otherwise do the full
   transition + flag flip + probe-goroutine restart.

Either method will now unstick any inconsistency between the
flag and the state machine: future drift from a panic, a new
code path we haven't thought of, or backends already stuck from
before this commit are all repaired on the next enable/disable
call.
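A minimal sketch of the state-keyed check, assuming illustrative types rather than maglevd's real worker (the probe restart and event emission are elided):

```go
package main

import "fmt"

type State int

const (
	StateUp State = iota
	StateDown
	StateDisabled
)

type entry struct{ Enabled bool }

type worker struct {
	entry entry
	state State
}

// enableBackend keys on the state machine, not the Enabled flag,
// so a stuck Enabled=true/State=disabled combo gets repaired.
func (w *worker) enableBackend() (transitioned bool) {
	if w.state != StateDisabled {
		w.entry.Enabled = true // sync the flag, no redundant transition
		return false
	}
	w.state = StateDown // full transition; probe-goroutine restart elided
	w.entry.Enabled = true
	return true
}

func main() {
	// A backend stuck in the inconsistent combo from before the fix:
	w := &worker{entry: entry{Enabled: true}, state: StateDisabled}
	fmt.Println(w.enableBackend()) // true: transition happened despite Enabled=true
	fmt.Println(w.enableBackend()) // false: second call is an idempotent no-op
}
```

Keying on the flag (`if w.entry.Enabled { return }`) would have early-returned on the first call and left the backend disabled forever.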

Changing a backend's weight can flip a frontend between up and
down (e.g. zeroing the last non-zero-weighted backend in the
active pool), but SetFrontendPoolBackendWeight never called
updateFrontendState, so the checker's cached frontend state
would drift from reality until the next genuine backend
transition happened to trigger a recompute. The symptom was
"show frontends nginx-ip4-http" reporting up even with every
effective_weight=0.

Fix: call c.updateFrontendState(frontendName, fe) after the
weight mutation, under the same lock. The recompute emits a
FrontendEvent transition if the aggregate flipped, so any
WatchEvents consumer picks up the change live.
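A sketch of the shape of that fix, with a hypothetical checker (field and method names illustrative) holding the lock across both the weight mutation and the recompute so no observer sees the new weight paired with the stale aggregate:

```go
package main

import (
	"fmt"
	"sync"
)

type checker struct {
	mu      sync.Mutex
	weights map[string]int // backend -> effective weight
	feState string         // cached aggregate frontend state
}

// updateFrontendState recomputes the aggregate: up if any
// effective weight > 0, down otherwise. A real checker would
// also emit a FrontendEvent when the aggregate flips.
func (c *checker) updateFrontendState() {
	state := "down"
	for _, w := range c.weights {
		if w > 0 {
			state = "up"
			break
		}
	}
	c.feState = state
}

func (c *checker) setBackendWeight(backend string, weight int) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.weights[backend] = weight
	c.updateFrontendState() // the missing call the fix adds, under the same lock
}

func main() {
	c := &checker{weights: map[string]int{"nginx0": 1}, feState: "up"}
	c.setBackendWeight("nginx0", 0) // zero the last non-zero backend
	fmt.Println(c.feState)          // down, not a stale "up"
}
```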

In stores/state.ts, recomputeEffectiveWeights is renamed to
recomputeDerivedState and extended: it now also writes fe.state
using the same rule as health.ComputeFrontendState, namely
unknown if there are no backends or all are unknown, up if any
effective weight > 0, down otherwise. It is called from every
mutation path (replaceAll, replaceSnapshot,
applyBackendTransition, applyConfiguredWeight), so the SPA is
authoritative for *display* state and doesn't inherit any
staleness the server's cached frontendStates map might have.
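The aggregate rule itself is small. A standalone sketch of it in Go (illustrative types; not the actual health.ComputeFrontendState signature):

```go
package main

import "fmt"

type backend struct {
	effectiveWeight int
	known           bool // health state has resolved at least once
}

// computeFrontendState applies the aggregate rule: unknown if no
// backends or all unknown, up if any effective weight > 0, down
// otherwise.
func computeFrontendState(backends []backend) string {
	if len(backends) == 0 {
		return "unknown"
	}
	allUnknown := true
	anyWeight := false
	for _, b := range backends {
		if b.known {
			allUnknown = false
		}
		if b.effectiveWeight > 0 {
			anyWeight = true
		}
	}
	switch {
	case allUnknown:
		return "unknown"
	case anyWeight:
		return "up"
	default:
		return "down"
	}
}

func main() {
	fmt.Println(computeFrontendState(nil))                             // unknown
	fmt.Println(computeFrontendState([]backend{{1, true}}))            // up
	fmt.Println(computeFrontendState([]backend{{0, true}, {0, true}})) // down
}
```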

applyFrontendTransition is now a no-op for the state field: the
server's `to` value is no longer trusted, because
recomputeDerivedState walks the local backends array on every
update and produces a fresh, correct answer. The reducer is kept
as a named function so that sse.ts's dispatch table still has a
landing spot for "frontend" events (they still feed the
DebugPanel via pushEvent); the empty body is deliberate, not a
bug, and a comment at the top spells that out.

maglevd

Health checker and gRPC control plane for VPP Maglev load balancing.

Build and Install

make          # builds build/<arch>/maglevd and build/<arch>/maglevc
make test     # runs all tests
make pkg-deb  # builds a Debian package for arm64 and amd64

Requires Go 1.25+ and (for make proto) protoc with protoc-gen-go and protoc-gen-go-grpc.

Produces vpp-maglev_<version>_amd64.deb and vpp-maglev_<version>_arm64.deb in the build/ directory by cross-compiling with GOOS=linux GOARCH=<arch>. Requires dpkg-deb (available on any Debian/Ubuntu host). The installed binaries report the exact git commit via maglevd --version (and similarly for maglevc / maglev-frontend).

Running

After installing, the unit is enabled but not started automatically:

# edit /etc/vpp-maglev/maglev.yaml, then:
systemctl enable --now vpp-maglev

Or run the server and client by hand:

maglevd --config /etc/vpp-maglev/maglev.yaml --grpc-addr :9090
maglevd --version                        # print version and exit

maglevc --server localhost:9090          # interactive shell
maglevc show frontends                   # one-shot
maglevc -color=false show backends       # one-shot, no ANSI color
maglevc set backend nginx0-ams pause

Send SIGHUP to maglevd to reload config without restarting. maglevd requires CAP_NET_RAW for ICMP health checks.

A minimal configuration file ships in debian/maglev.yaml. See docs/user-guide.md for flags, signals, and maglevc usage. See docs/config-guide.md for the full configuration reference. See docs/healthchecks.md for health state machine details.

Docker

docker build -t maglevd .
docker run --cap-add NET_RAW -v /etc/vpp-maglev:/etc/vpp-maglev maglevd