Frontend aggregate state: SPA-side derive + checker fixes

The web UI showed the wrong up/down state for frontends whose pool
composition had been touched by a mix of runtime disable/enable and
weight changes: a frontend with every backend at effective_weight=0
would still display "up", while a sibling frontend with a serving
fallback backend would display "down". Two independent bugs, each
fixed on its own layer.

On the fast path (healthCheckEqual returns true), Reload did
`w.entry = b`, blindly replacing the runtime worker entry with the
fresh YAML record. YAML's default for Enabled is true, so any
backend the operator had runtime-disabled would have its Enabled
flag silently reset while the worker's backend.State stayed at
StateDisabled. Subsequent EnableBackend calls then early-returned
on `if w.entry.Enabled` and never transitioned the state machine
— the CLI reported "enabled, state is 'disabled'" and the backend
was permanently stuck.

Fix: preserve w.entry.Enabled across the fast-path replacement.

    // Fast path: health check unchanged, so keep the worker but
    // carry the operator's runtime Enabled flag across the swap.
    runtimeEnabled := w.entry.Enabled
    w.entry = b
    w.entry.Enabled = runtimeEnabled

Runtime operator state now outlives config reloads. On the worker-
restart path (different health check) the new worker is
structurally fresh and the YAML's Enabled is still authoritative.

Both DisableBackend and EnableBackend used `w.entry.Enabled` as
their idempotency check, which meant a stuck `Enabled=true,
State=disabled` combination couldn't be repaired even after the
Reload fix: bad state created before the upgrade simply survived
it. Both methods now key on `w.backend.State` instead:

 - DisableBackend: if state == StateDisabled, sync the flag but
   don't emit a redundant transition; otherwise do the full
   state transition + flag flip + worker cancel.
 - EnableBackend: if state != StateDisabled, sync the flag but
   don't emit a redundant transition; otherwise do the full
   transition + flag flip + probe-goroutine restart.

Either method now unsticks any inconsistency between the flag and
the state machine: future drift from a panic or from a code path
we haven't thought of, and backends already stuck before this
commit, are all repaired on the next enable/disable call.
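A minimal Go sketch of the new idempotency shape; every type here is a stand-in, and only the flag/state split and the StateDisabled check are taken from the commit:

```go
package main

import "fmt"

// Stub state machine; in the real code these live on w.entry and
// w.backend, and cancelProbe/startProbe are the worker cancel and
// probe-goroutine restart mentioned in the text.
type State int

const (
	StateUnknown State = iota
	StateDisabled
)

type worker struct {
	entryEnabled bool  // stands in for w.entry.Enabled
	state        State // stands in for w.backend.State
}

func (w *worker) transition(to State) { w.state = to }
func (w *worker) cancelProbe()        {}
func (w *worker) startProbe()         {}

// DisableBackend keys idempotency on the state machine, not the
// flag: a stuck Enabled=true/State=disabled combo gets its flag
// synced instead of short-circuiting on the flag.
func (w *worker) DisableBackend() {
	if w.state == StateDisabled {
		w.entryEnabled = false // sync flag, no redundant transition
		return
	}
	w.transition(StateDisabled)
	w.entryEnabled = false
	w.cancelProbe()
}

// EnableBackend is the mirror image: any non-disabled state means
// "already enabled", so only the flag is synced.
func (w *worker) EnableBackend() {
	if w.state != StateDisabled {
		w.entryEnabled = true
		return
	}
	w.transition(StateUnknown)
	w.entryEnabled = true
	w.startProbe()
}

func main() {
	// The stuck combo from before the fix: flag enabled, state disabled.
	w := &worker{entryEnabled: true, state: StateDisabled}
	w.EnableBackend() // keyed on state, so this now transitions
	fmt.Println(w.state == StateUnknown, w.entryEnabled) // true true
}
```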

Changing a backend's weight can flip a frontend between up and
down (e.g. zeroing the last non-zero-weighted backend in the
active pool), but SetFrontendPoolBackendWeight never called
updateFrontendState, so the checker's cached frontend state
would drift from reality until the next genuine backend
transition happened to trigger a recompute. The symptom was
"show frontends nginx-ip4-http" reporting up even with every
effective_weight=0.

Fix: call c.updateFrontendState(frontendName, fe) after the
weight mutation, under the same lock. The recompute emits a
FrontendEvent transition if the aggregate flipped, so any
WatchEvents consumer picks up the change live.
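A runnable sketch of the shape of this fix, with heavily simplified stand-ins: the real checker tracks pools and backend health, but here a frontend is just a weight map and an event slice stands in for the WatchEvents stream. Only the "recompute the cached aggregate under the same lock as the mutation" structure is the point.

```go
package main

import (
	"fmt"
	"sync"
)

type checker struct {
	mu             sync.Mutex
	weights        map[string]map[string]int // frontend -> backend -> weight
	frontendStates map[string]string         // the cache that used to drift
	events         []string                  // stand-in for WatchEvents
}

// updateFrontendState recomputes the cached aggregate and emits a
// FrontendEvent-style record only if it actually flipped.
func (c *checker) updateFrontendState(frontend string) {
	to := "down"
	for _, w := range c.weights[frontend] {
		if w > 0 {
			to = "up"
			break
		}
	}
	if c.frontendStates[frontend] != to {
		c.frontendStates[frontend] = to
		c.events = append(c.events, frontend+" -> "+to)
	}
}

// SetFrontendPoolBackendWeight recomputes under the same lock as the
// weight mutation, so the cache can no longer lag until the next
// genuine backend transition.
func (c *checker) SetFrontendPoolBackendWeight(frontend, backend string, weight int) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.weights[frontend][backend] = weight
	c.updateFrontendState(frontend)
}

func main() {
	c := &checker{
		weights:        map[string]map[string]int{"fe": {"b1": 10}},
		frontendStates: map[string]string{"fe": "up"},
	}
	// Zero the last weighted backend: the aggregate flips immediately.
	c.SetFrontendPoolBackendWeight("fe", "b1", 0)
	fmt.Println(c.frontendStates["fe"], c.events) // down [fe -> down]
}
```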

In stores/state.ts, recomputeEffectiveWeights is renamed to
recomputeDerivedState and extended to also write fe.state, using
the same rule as health.ComputeFrontendState: unknown if there are
no backends or all are still unknown, up if any effective weight
> 0, down otherwise. It is called from every mutation path
(replaceAll, replaceSnapshot, applyBackendTransition,
applyConfiguredWeight), so the SPA is authoritative for *display*
state and never inherits staleness from the server's cached
frontendStates map.
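The three-way rule can be sketched in Go against stand-in types (field and function names here are illustrative, not the real health package API):

```go
package main

import "fmt"

// Stand-in shapes for the snapshot the derive walks.
type poolBackend struct {
	Name            string
	EffectiveWeight int
}
type pool struct{ Backends []poolBackend }

// frontendState applies the rule described above: unknown if no
// backends are referenced or every referenced backend is still
// unknown; up if any backend has a positive effective weight;
// otherwise down.
func frontendState(pools []pool, stateOf map[string]string) string {
	anyEffective, seenAny, allUnknown := false, false, true
	seen := map[string]bool{}
	for _, p := range pools {
		for _, b := range p.Backends {
			if b.EffectiveWeight > 0 {
				anyEffective = true
			}
			if !seen[b.Name] {
				seen[b.Name] = true
				seenAny = true
				if stateOf[b.Name] != "unknown" {
					allUnknown = false
				}
			}
		}
	}
	switch {
	case !seenAny || allUnknown:
		return "unknown"
	case anyEffective:
		return "up"
	default:
		return "down"
	}
}

func main() {
	// The bug's symptom: every backend up but zero-weighted is down, not up.
	zeroed := []pool{{Backends: []poolBackend{{Name: "a", EffectiveWeight: 0}}}}
	fmt.Println(frontendState(zeroed, map[string]string{"a": "up"})) // down
	fmt.Println(frontendState(nil, nil))                             // unknown
}
```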

applyFrontendTransition is now a no-op for the state field —
the server's `to` value is no longer trusted because
recomputeDerivedState walks the local backends array on every
update and produces a fresh, correct answer. The reducer is kept
as a named function so sse.ts's dispatch table still has a
landing spot for "frontend" events (they still feed the
DebugPanel via pushEvent); the empty body is deliberate, not a
bug — a comment at the top spells it out.
commit 1191b3d994
parent 4347bb9b05
2026-04-12 23:50:22 +02:00
5 changed files with 125 additions and 34 deletions

File diff suppressed because one or more lines are too long


```diff
@@ -4,7 +4,7 @@
     <meta charset="UTF-8" />
     <meta name="viewport" content="width=device-width, initial-scale=1.0" />
     <title>maglev</title>
-    <script type="module" crossorigin src="/view/assets/index-BBNMNdtq.js"></script>
+    <script type="module" crossorigin src="/view/assets/index-C-XMkBf5.js"></script>
     <link rel="stylesheet" crossorigin href="/view/assets/index-CxDuAfMR.css">
   </head>
   <body>
```

stores/state.ts:

```diff
@@ -8,22 +8,36 @@ import type {
 } from "../types";
 import { tick } from "./tick";
-// recomputeEffectiveWeights mirrors the server-side
-// health.EffectiveWeights / ActivePoolIndex logic so the SPA can keep
-// pool.effective_weight correct the moment a backend transitions,
-// without waiting for the 30s refresh. Walking every frontend is cheap
-// — O(frontends × pools × backends-per-pool) with tiny constants —
-// and it's strictly a function of the backend state map, so there's no
-// risk of drift vs. the server as long as the rule stays the same.
+// recomputeDerivedState mirrors the server-side
+// health.EffectiveWeights / ActivePoolIndex / ComputeFrontendState
+// logic so the SPA can keep pool.effective_weight AND the
+// per-frontend aggregate state correct the moment any backend
+// transitions or any configured weight changes, without waiting for
+// the 30s refresh. Walking every frontend is cheap — O(frontends ×
+// pools × backends-per-pool) with tiny constants — and it's
+// strictly a function of the backend state map + configured
+// weights, so there's no risk of drift vs. the server as long as
+// the rules stay identical. The SPA is the authoritative source of
+// truth for *display* state: the server's cached frontendStates
+// field can be stale (e.g. after a SetFrontendPoolBackendWeight
+// call that doesn't re-run updateFrontendState, or after a long-
+// lived WatchEvents stream where a past transition corrupted the
+// client's cache) and the SPA recomputes from its own live
+// backends array to avoid inheriting any staleness.
 //
-// Rule: a backend gets its configured pool weight iff it is up AND
-// belongs to the currently-active pool; everything else is 0. The
-// active pool is the first pool containing a backend that is both
-// up AND has a non-zero configured weight — a pool whose up backends
-// are all weight=0 contributes no serving capacity and gets skipped
-// over in priority failover. Kept in lock-step with
-// internal/health/weights.go.
-function recomputeEffectiveWeights(snap: StateSnapshot) {
+// Effective weight rule: a backend gets its configured pool weight
+// iff it is up AND belongs to the currently-active pool; everything
+// else is 0. The active pool is the first pool containing a backend
+// that is both up AND has a non-zero configured weight — a pool
+// whose up backends are all weight=0 contributes no serving
+// capacity and gets skipped over in priority failover. Kept in
+// lock-step with internal/health/weights.go ActivePoolIndex.
+//
+// Frontend state rule: unknown if no backends or every referenced
+// backend is still in StateUnknown; up if any backend in any pool
+// has effective_weight > 0; otherwise down. Kept in lock-step with
+// internal/health/weights.go ComputeFrontendState.
+function recomputeDerivedState(snap: StateSnapshot) {
   const stateOf: Record<string, string> = {};
   for (const b of snap.backends) stateOf[b.name] = b.state;
   for (const fe of snap.frontends) {
@@ -41,12 +55,29 @@ function recomputeEffectiveWeights(snap: StateSnapshot) {
         break;
       }
     }
+    let anyEffective = false;
+    let seenAny = false;
+    let allUnknown = true;
+    const seen = new Set<string>();
     for (let i = 0; i < fe.pools.length; i++) {
       for (const pb of fe.pools[i].backends) {
         const st = stateOf[pb.name];
         pb.effective_weight = st === "up" && i === activePool ? pb.weight : 0;
+        if (pb.effective_weight > 0) anyEffective = true;
+        if (!seen.has(pb.name)) {
+          seen.add(pb.name);
+          seenAny = true;
+          if (st !== "unknown") allUnknown = false;
+        }
       }
     }
+    if (!seenAny || allUnknown) {
+      fe.state = "unknown";
+    } else if (anyEffective) {
+      fe.state = "up";
+    } else {
+      fe.state = "down";
+    }
   }
 }
@@ -61,6 +92,14 @@ const [state, setState] = createStore<FrontendState>({ byName: {} });
 export { state };
 export function replaceSnapshot(snap: StateSnapshot) {
+  // Recompute effective weights + aggregate frontend state locally
+  // from the snapshot's backends array, rather than trusting the
+  // server's state field verbatim. The server can be stale (the
+  // checker's frontendStates map is only updated on backend
+  // transitions, not on weight changes), so deriving from our own
+  // backend data is the only way to guarantee the display stays
+  // consistent with reality.
+  recomputeDerivedState(snap);
   setState(
     produce((s) => {
       s.byName[snap.maglevd.name] = snap;
@@ -70,7 +109,10 @@ export function replaceSnapshot(snap: StateSnapshot) {
 export function replaceAll(snaps: StateSnapshot[]) {
   const byName: Record<string, StateSnapshot> = {};
-  for (const s of snaps) byName[s.maglevd.name] = s;
+  for (const s of snaps) {
+    recomputeDerivedState(s);
+    byName[s.maglevd.name] = s;
+  }
   setState({ byName });
 }
@@ -96,25 +138,26 @@ export function applyBackendTransition(maglevd: string, p: BackendEventPayload)
       }
       // A backend state change can shift which pool is active and
       // therefore which pool-memberships get non-zero effective
-      // weights. Recompute for every frontend — not just the one
+      // weights, and in turn can flip the frontend's aggregate
+      // state. Recompute for every frontend — not just the one
       // pointed at by this backend — because pool-failover is a
       // per-frontend computation and the same backend can appear in
      // multiple frontends with different pool placements.
-      recomputeEffectiveWeights(snap);
+      recomputeDerivedState(snap);
     }),
   );
 }
-export function applyFrontendTransition(maglevd: string, p: FrontendEventPayload) {
-  setState(
-    produce((s) => {
-      const snap = s.byName[maglevd];
-      if (!snap) return;
-      const fe = snap.frontends.find((x) => x.name === p.frontend);
-      if (!fe) return;
-      fe.state = p.transition.to;
-    }),
-  );
+// Frontend-transition events arrive from the server's checker, but
+// the SPA no longer trusts their `to` field — recomputeDerivedState
+// walks the local backends array on every backend event and every
+// hydration to produce an up-to-date frontend state that the server
+// can't make stale. Kept as a named reducer so sse.ts's dispatch
+// table still has a landing spot for "frontend" events (they also
+// flow into the DebugPanel via pushEvent); the body is deliberately
+// empty — not a bug.
+export function applyFrontendTransition(_maglevd: string, _p: FrontendEventPayload) {
+  // no-op — state is derived client-side, see recomputeDerivedState
 }
 export function applyVPPStatus(maglevd: string, state: string) {
@@ -165,7 +208,7 @@ export function applyConfiguredWeight(
       const pb = p.backends.find((x) => x.name === backend);
       if (!pb) return;
       pb.weight = weight;
-      recomputeEffectiveWeights(snap);
+      recomputeDerivedState(snap);
     }),
   );
 }
```