Distinguish disabled from removed backend state; add make fixstyle

Add StateDisabled for operator-initiated disable, keeping StateRemoved
for backends that disappear during a config reload. Previously both
used StateRemoved, which was confusing: "removed" implies the backend
no longer exists in config, but a disabled backend is still present
and can be re-enabled on the fly.

- health: add StateDisabled (String() returns "disabled") and a Disable()
  method that uses probe code "disabled". Record() rejects probes in all
  three inactive states (paused, disabled, removed).
- checker: DisableBackend calls backend.Disable() instead of Remove().
- docs: healthchecks.md now describes pause as cancelling the probe
  goroutine (not just discarding results) and lists disabled and removed
  as separate state rows. user-guide.md updated to match.
- Makefile: add fixstyle target (gofmt -w .).
2026-04-11 21:04:17 +02:00
parent 4ab3096c8b
commit 1815675fb6
10 changed files with 68 additions and 30 deletions
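
The state split described in the commit message can be sketched roughly as follows. This is an illustrative sketch only, not the code from this commit: the `State` and `Backend` types, the `inactive` helper, and the simplified `Record` (which ignores rise/fall counters and flips state on a single result) are assumptions made for the example. Only the names `StateDisabled`, `StateRemoved`, `Disable()`, `Remove()`, `Record()`, and the string "disabled" come from the message.

```go
package main

import "fmt"

// State is a hypothetical stand-in for the health package's state type.
type State int

const (
	StateUnknown State = iota
	StateUp
	StateDown
	StatePaused
	StateDisabled // operator-initiated; backend still exists in config
	StateRemoved  // backend disappeared during a config reload
)

func (s State) String() string {
	switch s {
	case StateUnknown:
		return "unknown"
	case StateUp:
		return "up"
	case StateDown:
		return "down"
	case StatePaused:
		return "paused"
	case StateDisabled:
		return "disabled"
	case StateRemoved:
		return "removed"
	}
	return "invalid"
}

// inactive reports whether probe results must be rejected in this state.
func (s State) inactive() bool {
	return s == StatePaused || s == StateDisabled || s == StateRemoved
}

// Backend is a hypothetical, heavily simplified backend record.
type Backend struct {
	state State
}

// Record applies one probe result and reports whether it was accepted.
// Assumption: no rise/fall counting, one result decides the state.
func (b *Backend) Record(pass bool) bool {
	if b.state.inactive() {
		return false
	}
	if pass {
		b.state = StateUp
	} else {
		b.state = StateDown
	}
	return true
}

func (b *Backend) Disable() { b.state = StateDisabled }
func (b *Backend) Remove()  { b.state = StateRemoved }

func main() {
	b := &Backend{}
	b.Record(true)
	b.Disable()
	fmt.Println(b.state, b.Record(true)) // prints "disabled false"
}
```

Keeping the two states distinct lets a caller tell "present but switched off" apart from "gone from config" without consulting the configuration itself.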


@@ -2,7 +2,7 @@
`maglevd` probes each backend independently of how many frontends reference it.
Every backend runs exactly one probe goroutine. State changes are broadcast as
gRPC events to all connected `WatchBackendEvents` subscribers.
gRPC events to all connected `WatchEvents` subscribers.
---
@@ -10,11 +10,12 @@ gRPC events to all connected `WatchBackendEvents` subscribers.
| State | Meaning |
|---|---|
| `unknown` | Initial state; also entered after a resume or backend restart. |
| `unknown` | Initial state; also entered after a resume or enable. |
| `up` | Backend is healthy and eligible to receive traffic. |
| `down` | Backend has failed enough consecutive probes to be considered offline. |
| `paused` | Health checking suspended by an operator. Probes fire but results are discarded. |
| `removed` | Backend was removed from configuration. No further probes are accepted. |
| `paused` | Health checking stopped by an operator. No probes are sent. |
| `disabled` | Backend disabled by an operator. No probes are sent. |
| `removed` | Backend removed from configuration by a reload. No probes are sent. |
---
@@ -41,9 +42,9 @@ without bouncing between up and down.
### Expedited unknown resolution
When a backend enters `unknown` state (new, restarted, or resumed) its counter
is pre-loaded to `rise 1`. This means a single probe result is enough to
resolve the state:
When a backend enters `unknown` state (new, restarted, resumed, or re-enabled)
its counter is pre-loaded to `rise 1`. This means a single probe result is
enough to resolve the state:
- **1 pass** → `up`
- **1 fail** → `down` (also via the special unknown shortcut below)
@@ -92,7 +93,7 @@ that are known to be offline.
## Transition events
Every state change is logged as `backend-transition` and emitted as a gRPC
`BackendEvent` to all active `WatchBackendEvents` streams.
`BackendEvent` to all active `WatchEvents` streams.
### Backend added (config load or reload)
@@ -100,7 +101,7 @@ Every state change is logged as `backend-transition` and emitted as a gRPC
unknown → unknown (code: start)
```
The counter is pre-loaded to `rise 1`. The first probe fires immediately at
The counter is pre-loaded to `rise 1`. The first probe fires after
`fast-interval` (or `interval` if not configured). One pass produces `unknown →
up`; one fail produces `unknown → down`.
@@ -127,8 +128,9 @@ If multiple backends start together they are staggered across the first
<any> → paused (operator action)
```
The counter is reset to 0. Probes continue to fire on their normal schedule but
all results are discarded. The backend stays `paused` until explicitly resumed.
The counter is reset to 0. The probe goroutine is cancelled — no further
probes are sent and no traffic reaches the backend while it is paused. The
backend stays `paused` until explicitly resumed.
### Resume
@@ -136,9 +138,31 @@ all results are discarded. The backend stays `paused` until explicitly resumed.
paused → unknown (operator action)
```
The counter is reset to `rise 1`. The probe goroutine is woken immediately
(no wait for the next scheduled probe). One subsequent pass produces `unknown →
up`; one fail produces `unknown → down`.
The counter is reset to `rise 1`. A fresh probe goroutine is started,
which fires its first probe after `fast-interval` (or `interval` if not
configured). One pass produces `unknown → up`; one fail produces `unknown →
down`.
### Disable
```
<any> → disabled (operator action)
```
The probe goroutine is cancelled and the backend is marked `enabled: false`.
No further probes are sent. The backend remains visible via the gRPC API (state
`disabled`) and can be re-enabled without a config reload.
### Enable
```
disabled → unknown (operator action, via fresh goroutine)
```
A new probe goroutine is started and the backend re-enters `unknown` with the
counter pre-loaded to `rise 1`. The `enabled` flag is set back to `true`.
The first probe fires after `fast-interval` and resolves state as described
under *Backend added*.
### Backend removed (config reload)
@@ -175,4 +199,4 @@ All state changes produce a structured log line at `INFO` level:
Probe-driven transitions also carry `code` and `detail` fields from the probe
result (e.g. `L4CON`, `L7STS`, `connection refused`). Operator-driven
transitions (pause, resume) carry empty code and detail.
transitions (pause, resume, disable, enable) carry empty code and detail.


@@ -87,7 +87,7 @@ set backend <name> pause Suspend health checking for a backend, freezing
set backend <name> resume Resume health checking; backend re-enters unknown state
and is probed immediately.
set backend <name> disable Stop probing entirely and remove the backend from rotation.
The backend remains visible (state: removed) and can be
The backend remains visible (state: disabled) and can be
re-enabled without reloading configuration.
set backend <name> enable Re-enable a disabled backend. A fresh probe goroutine is
started and the backend re-enters unknown state.