Frontend aggregate state: SPA-side derive + checker fixes

The web UI showed the wrong up/down state for frontends whose pool
composition had been touched by a mix of runtime disable/enable and
weight changes: a frontend with every backend at effective_weight=0
would still display "up", while a sibling frontend with a serving
fallback backend would display "down". Two independent bugs, each
fixed on its own layer.

On the fast path (healthCheckEqual returns true), Reload did
`w.entry = b`, blindly replacing the runtime worker entry with the
fresh YAML record. YAML's default for Enabled is true, so any
backend the operator had runtime-disabled would have its Enabled
flag silently reset while the worker's backend.State stayed at
StateDisabled. Subsequent EnableBackend calls then early-returned
on `if w.entry.Enabled` and never transitioned the state machine
— the CLI reported "enabled, state is 'disabled'" and the backend
was permanently stuck.

Fix: preserve w.entry.Enabled across the fast-path replacement:

    runtimeEnabled := w.entry.Enabled
    w.entry = b
    w.entry.Enabled = runtimeEnabled

Runtime operator state now outlives config reloads. On the
worker-restart path (different health check) the new worker is
structurally fresh and the YAML's Enabled is still authoritative.
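
The pitfall can be shown with a toy model in TypeScript (illustrative
only — the real fix is the three Go lines above, and `Entry`,
`reloadBuggy`, and `reloadFixed` are hypothetical names, not the
checker's API): wholesale replacement of a record silently resets any
runtime-only override back to the config default unless it is
explicitly carried over.

```typescript
// Toy model of the fast-path entry swap. `Entry` stands in for the
// worker's config record; `enabled` defaults to true when parsed from
// config, so a blind replacement clobbers a runtime disable.
interface Entry {
  address: string;
  enabled: boolean; // config default: true
}

// Buggy shape: the runtime-disabled flag is lost on reload.
function reloadBuggy(_current: Entry, fromConfig: Entry): Entry {
  return { ...fromConfig };
}

// Fixed shape: the runtime flag outlives the reload.
function reloadFixed(current: Entry, fromConfig: Entry): Entry {
  return { ...fromConfig, enabled: current.enabled };
}
```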

DisableBackend and EnableBackend both used `w.entry.Enabled` as
their idempotency check, which meant a stuck
`Enabled=true, State=disabled` combo couldn't be repaired even
after the Reload fix (the existing bad state would have survived
the upgrade). Both methods now key on `w.backend.State` instead:

- DisableBackend: if state == StateDisabled, sync the flag but
  don't emit a redundant transition; otherwise do the full
  state transition + flag flip + worker cancel.
- EnableBackend: if state != StateDisabled, sync the flag but
  don't emit a redundant transition; otherwise do the full
  transition + flag flip + probe-goroutine restart.

Either method will now unstick any inconsistency between the
flag and the state machine — future drift from a panic, a new
code path we haven't thought of, or backends already stuck
before this commit are all repaired on the next enable/disable
call.
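
The new idempotency shape can be sketched as follows (TypeScript toy
model of the Go methods — names simplified, locking, config sync, and
event emission elided): keying on the state machine means a drifted
flag is corrected no matter which direction it drifted.

```typescript
type BackendState = "unknown" | "up" | "down" | "disabled";

interface Worker {
  enabled: boolean;    // operator-facing flag (w.entry.Enabled)
  state: BackendState; // state machine (w.backend.State)
}

// enableBackend keyed on state, not the flag: a stuck
// { enabled: true, state: "disabled" } combo still gets the full
// transition, and a drifted flag is synced when no transition is due.
function enableBackend(w: Worker): Worker {
  if (w.state !== "disabled") {
    // Not disabled at the state level — just make the flag match.
    return { ...w, enabled: true };
  }
  // Full transition: flag flip + re-enter "unknown" (probe restart elided).
  return { enabled: true, state: "unknown" };
}
```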

Changing a backend's weight can flip a frontend between up and
down (e.g. zeroing the last non-zero-weighted backend in the
active pool), but SetFrontendPoolBackendWeight never called
updateFrontendState, so the checker's cached frontend state
would drift from reality until the next genuine backend
transition happened to trigger a recompute. The symptom:
"show frontends nginx-ip4-http" reported up even with every
effective_weight=0.

Fix: call c.updateFrontendState(frontendName, fe) after the
weight mutation, under the same lock. The recompute emits a
FrontendEvent transition if the aggregate flipped, so any
WatchEvents consumer picks up the change live.

In stores/state.ts, recomputeEffectiveWeights is renamed to
recomputeDerivedState and extended: it now also writes fe.state
using the same rule as health.ComputeFrontendState — unknown if
there are no backends or all are unknown, up if any effective
weight > 0, down otherwise. It is called from every mutation path
(replaceAll, replaceSnapshot, applyBackendTransition,
applyConfiguredWeight), so the SPA is authoritative for *display*
state and doesn't inherit any staleness the server's cached
frontendStates map might have.
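
The derivation rule as a standalone sketch (TypeScript; types are
simplified from the real StateSnapshot shape, and the active-pool walk
is assumed to have already filled effective_weight):

```typescript
type FrontendStateValue = "unknown" | "up" | "down";

interface PoolBackend {
  name: string;
  effective_weight: number;
}

// Mirrors the rule described above: unknown if no backends or all
// referenced backends are still unknown; up if any effective weight
// is positive; down otherwise. Backends shared across pools are
// deduplicated by name for the unknown check.
function deriveFrontendState(
  pools: PoolBackend[][],
  stateOf: Record<string, string>,
): FrontendStateValue {
  let seenAny = false;
  let allUnknown = true;
  let anyEffective = false;
  const seen = new Set<string>();
  for (const pool of pools) {
    for (const pb of pool) {
      if (pb.effective_weight > 0) anyEffective = true;
      if (!seen.has(pb.name)) {
        seen.add(pb.name);
        seenAny = true;
        if (stateOf[pb.name] !== "unknown") allUnknown = false;
      }
    }
  }
  if (!seenAny || allUnknown) return "unknown";
  return anyEffective ? "up" : "down";
}
```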

applyFrontendTransition is now a no-op for the state field — the
server's `to` value is no longer trusted, because
recomputeDerivedState walks the local backends array on every
update and produces a fresh, correct answer. The reducer is kept
as a named function so sse.ts's dispatch table still has a
landing spot for "frontend" events (they still feed the
DebugPanel via pushEvent); the empty body is deliberate, not a
bug — a comment at the top spells it out.

Vendored build output:

cmd/frontend/web/dist/assets/index-C-XMkBf5.js (new file; diff
suppressed because one or more lines are too long)

cmd/frontend/web/dist/index.html:
@@ -4,7 +4,7 @@
     <meta charset="UTF-8" />
     <meta name="viewport" content="width=device-width, initial-scale=1.0" />
     <title>maglev</title>
-    <script type="module" crossorigin src="/view/assets/index-BBNMNdtq.js"></script>
+    <script type="module" crossorigin src="/view/assets/index-C-XMkBf5.js"></script>
     <link rel="stylesheet" crossorigin href="/view/assets/index-CxDuAfMR.css">
   </head>
   <body>
stores/state.ts:
@@ -8,22 +8,36 @@ import type {
 } from "../types";
 import { tick } from "./tick";
 
-// recomputeEffectiveWeights mirrors the server-side
-// health.EffectiveWeights / ActivePoolIndex logic so the SPA can keep
-// pool.effective_weight correct the moment a backend transitions,
-// without waiting for the 30s refresh. Walking every frontend is cheap
-// — O(frontends × pools × backends-per-pool) with tiny constants —
-// and it's strictly a function of the backend state map, so there's no
-// risk of drift vs. the server as long as the rule stays the same.
+// recomputeDerivedState mirrors the server-side
+// health.EffectiveWeights / ActivePoolIndex / ComputeFrontendState
+// logic so the SPA can keep pool.effective_weight AND the
+// per-frontend aggregate state correct the moment any backend
+// transitions or any configured weight changes, without waiting for
+// the 30s refresh. Walking every frontend is cheap — O(frontends ×
+// pools × backends-per-pool) with tiny constants — and it's
+// strictly a function of the backend state map + configured
+// weights, so there's no risk of drift vs. the server as long as
+// the rules stay identical. The SPA is the authoritative source of
+// truth for *display* state: the server's cached frontendStates
+// field can be stale (e.g. after a SetFrontendPoolBackendWeight
+// call that doesn't re-run updateFrontendState, or after a long-
+// lived WatchEvents stream where a past transition corrupted the
+// client's cache) and the SPA recomputes from its own live
+// backends array to avoid inheriting any staleness.
 //
-// Rule: a backend gets its configured pool weight iff it is up AND
-// belongs to the currently-active pool; everything else is 0. The
-// active pool is the first pool containing a backend that is both
-// up AND has a non-zero configured weight — a pool whose up backends
-// are all weight=0 contributes no serving capacity and gets skipped
-// over in priority failover. Kept in lock-step with
-// internal/health/weights.go.
-function recomputeEffectiveWeights(snap: StateSnapshot) {
+// Effective weight rule: a backend gets its configured pool weight
+// iff it is up AND belongs to the currently-active pool; everything
+// else is 0. The active pool is the first pool containing a backend
+// that is both up AND has a non-zero configured weight — a pool
+// whose up backends are all weight=0 contributes no serving
+// capacity and gets skipped over in priority failover. Kept in
+// lock-step with internal/health/weights.go ActivePoolIndex.
+//
+// Frontend state rule: unknown if no backends or every referenced
+// backend is still in StateUnknown; up if any backend in any pool
+// has effective_weight > 0; otherwise down. Kept in lock-step with
+// internal/health/weights.go ComputeFrontendState.
+function recomputeDerivedState(snap: StateSnapshot) {
   const stateOf: Record<string, string> = {};
   for (const b of snap.backends) stateOf[b.name] = b.state;
   for (const fe of snap.frontends) {
@@ -41,13 +55,30 @@ function recomputeEffectiveWeights(snap: StateSnapshot) {
         break;
       }
     }
+    let anyEffective = false;
+    let seenAny = false;
+    let allUnknown = true;
+    const seen = new Set<string>();
     for (let i = 0; i < fe.pools.length; i++) {
       for (const pb of fe.pools[i].backends) {
         const st = stateOf[pb.name];
         pb.effective_weight = st === "up" && i === activePool ? pb.weight : 0;
+        if (pb.effective_weight > 0) anyEffective = true;
+        if (!seen.has(pb.name)) {
+          seen.add(pb.name);
+          seenAny = true;
+          if (st !== "unknown") allUnknown = false;
+        }
       }
     }
+    if (!seenAny || allUnknown) {
+      fe.state = "unknown";
+    } else if (anyEffective) {
+      fe.state = "up";
+    } else {
+      fe.state = "down";
+    }
   }
 }
 
 // FrontendState keys snapshots by maglevd name. A single store drives the
@@ -61,6 +92,14 @@ const [state, setState] = createStore<FrontendState>({ byName: {} });
 export { state };
 
 export function replaceSnapshot(snap: StateSnapshot) {
+  // Recompute effective weights + aggregate frontend state locally
+  // from the snapshot's backends array, rather than trusting the
+  // server's state field verbatim. The server can be stale (the
+  // checker's frontendStates map is only updated on backend
+  // transitions, not on weight changes), so deriving from our own
+  // backend data is the only way to guarantee the display stays
+  // consistent with reality.
+  recomputeDerivedState(snap);
   setState(
     produce((s) => {
       s.byName[snap.maglevd.name] = snap;
@@ -70,7 +109,10 @@ export function replaceSnapshot(snap: StateSnapshot) {
 
 export function replaceAll(snaps: StateSnapshot[]) {
   const byName: Record<string, StateSnapshot> = {};
-  for (const s of snaps) byName[s.maglevd.name] = s;
+  for (const s of snaps) {
+    recomputeDerivedState(s);
+    byName[s.maglevd.name] = s;
+  }
   setState({ byName });
 }
 
@@ -96,25 +138,26 @@ export function applyBackendTransition(maglevd: string, p: BackendEventPayload)
       }
       // A backend state change can shift which pool is active and
       // therefore which pool-memberships get non-zero effective
-      // weights. Recompute for every frontend — not just the one
+      // weights, and in turn can flip the frontend's aggregate
+      // state. Recompute for every frontend — not just the one
       // pointed at by this backend — because pool-failover is a
       // per-frontend computation and the same backend can appear in
       // multiple frontends with different pool placements.
-      recomputeEffectiveWeights(snap);
+      recomputeDerivedState(snap);
     }),
   );
 }
 
-export function applyFrontendTransition(maglevd: string, p: FrontendEventPayload) {
-  setState(
-    produce((s) => {
-      const snap = s.byName[maglevd];
-      if (!snap) return;
-      const fe = snap.frontends.find((x) => x.name === p.frontend);
-      if (!fe) return;
-      fe.state = p.transition.to;
-    }),
-  );
+// Frontend-transition events arrive from the server's checker, but
+// the SPA no longer trusts their `to` field — recomputeDerivedState
+// walks the local backends array on every backend event and every
+// hydration to produce an up-to-date frontend state that the server
+// can't make stale. Kept as a named reducer so sse.ts's dispatch
+// table still has a landing spot for "frontend" events (they also
+// flow into the DebugPanel via pushEvent); the body is deliberately
+// empty — not a bug.
+export function applyFrontendTransition(_maglevd: string, _p: FrontendEventPayload) {
+  // no-op — state is derived client-side, see recomputeDerivedState
 }
 
 export function applyVPPStatus(maglevd: string, state: string) {
@@ -165,7 +208,7 @@ export function applyConfiguredWeight(
       const pb = p.backends.find((x) => x.name === backend);
       if (!pb) return;
       pb.weight = weight;
-      recomputeEffectiveWeights(snap);
+      recomputeDerivedState(snap);
     }),
   );
 }
@@ -138,8 +138,22 @@ func (c *Checker) Reload(ctx context.Context, cfg *config.Config) error {
 		hc := cfg.HealthChecks[b.HealthCheck]
 		if w, ok := c.workers[name]; ok {
 			if healthCheckEqual(w.hc, hc) {
-				// Update entry metadata (weight, etc.) in place without restart.
+				// Update entry metadata (address, healthcheck name)
+				// in place without restart. Preserve the runtime
+				// Enabled flag — the operator's
+				// PauseBackend/DisableBackend/EnableBackend state
+				// must outlive config reloads so an operator who
+				// disabled a backend and then reloaded config
+				// (e.g. to adjust weights on an unrelated
+				// frontend) doesn't find their disabled backend
+				// silently re-enabled while its worker state
+				// remains stuck at StateDisabled. The YAML's
+				// Enabled field is still authoritative on the
+				// worker-restart path below (where the backend
+				// is structurally new to this worker instance).
+				runtimeEnabled := w.entry.Enabled
 				w.entry = b
+				w.entry.Enabled = runtimeEnabled
 				continue
 			}
 			slog.Info("backend-restart", "backend", name)
@@ -237,6 +251,13 @@ func (c *Checker) GetFrontend(name string) (config.Frontend, bool) {
 // SetFrontendPoolBackendWeight updates the weight of a backend within a named
 // pool of a frontend. Returns the updated FrontendInfo and a descriptive error
 // if the frontend, pool, or backend is not found or the weight is out of range.
+//
+// After mutating the weight, updateFrontendState is re-run for the affected
+// frontend so the aggregate state reflects the new effective weights. A
+// weight change can flip a frontend between UP and DOWN (e.g. zeroing the
+// last non-zero-weighted backend in the active pool), and without this
+// call the checker's cached frontend state would drift from reality until
+// the next genuine backend transition happens to trigger a recompute.
 func (c *Checker) SetFrontendPoolBackendWeight(frontendName, poolName, backendName string, weight int) (config.Frontend, error) {
 	if weight < 0 || weight > 100 {
 		return config.Frontend{}, fmt.Errorf("weight %d out of range [0, 100]", weight)
@@ -259,6 +280,7 @@ func (c *Checker) SetFrontendPoolBackendWeight(frontendName, poolName, backendNa
 			fe.Pools[i].Backends[backendName] = pb
 			c.cfg.Frontends[frontendName] = fe
 			slog.Info("frontend-pool-weight", "frontend", frontendName, "pool", poolName, "backend", backendName, "weight", weight)
+			c.updateFrontendState(frontendName, fe)
 			return fe, nil
 		}
 		return config.Frontend{}, fmt.Errorf("pool %q not found in frontend %q", poolName, frontendName)
@@ -410,6 +432,13 @@ func (c *Checker) ResumeBackend(name string) (BackendSnapshot, error) {
 // DisableBackend stops health checking for a backend and removes it from active
 // rotation. The worker entry is kept in the map so the backend remains visible
 // via GetBackend and can be re-enabled with EnableBackend.
+//
+// Preconditions are keyed on w.backend.State rather than w.entry.Enabled so
+// that any drift between the two fields (e.g. a past Reload that reset the
+// flag without transitioning state) is self-healing: if the state is not
+// already disabled we always do the full transition and bring the flag in
+// line, and if the state is already disabled we fix up the flag without a
+// no-op transition.
 func (c *Checker) DisableBackend(name string) (BackendSnapshot, bool) {
 	c.mu.Lock()
 	defer c.mu.Unlock()
@@ -417,7 +446,14 @@ func (c *Checker) DisableBackend(name string) (BackendSnapshot, bool) {
 	if !ok {
 		return BackendSnapshot{}, false
 	}
-	if !w.entry.Enabled {
+	if w.backend.State == health.StateDisabled {
+		// Already disabled at the state level; make sure the flag
+		// reflects reality without emitting a redundant transition.
+		w.entry.Enabled = false
+		if b, ok := c.cfg.Backends[name]; ok {
+			b.Enabled = false
+			c.cfg.Backends[name] = b
+		}
 		return BackendSnapshot{Health: w.backend, Config: w.entry}, true
 	}
 	maxHistory := c.cfg.HealthChecker.TransitionHistory
@@ -439,6 +475,12 @@ func (c *Checker) DisableBackend(name string) (BackendSnapshot, bool) {
 // EnableBackend re-enables a previously disabled backend. The existing
 // Backend struct is reused — its transition history is preserved — and a
 // fresh probe goroutine is launched. The backend re-enters StateUnknown.
+//
+// Preconditions are keyed on w.backend.State rather than w.entry.Enabled, so
+// drift between the two (most commonly caused by a Reload that reset the
+// flag while the worker state was still disabled) doesn't wedge the backend
+// — we always do the full transition when the state is disabled, and skip
+// it (while syncing the flag) when it's not.
 func (c *Checker) EnableBackend(name string) (BackendSnapshot, bool) {
 	c.mu.Lock()
 	defer c.mu.Unlock()
@@ -446,7 +488,13 @@ func (c *Checker) EnableBackend(name string) (BackendSnapshot, bool) {
 	if !ok {
 		return BackendSnapshot{}, false
 	}
-	if w.entry.Enabled {
+	if w.backend.State != health.StateDisabled {
+		// Not in the disabled state — just make the flag match.
+		w.entry.Enabled = true
+		if b, ok := c.cfg.Backends[name]; ok {
+			b.Enabled = true
+			c.cfg.Backends[name] = b
+		}
 		return BackendSnapshot{Health: w.backend, Config: w.entry}, true
 	}
 	w.entry.Enabled = true