Distinguish disabled from removed backend state; add make fixstyle

Add StateDisabled for operator-initiated disable, keeping StateRemoved for backends that disappear during a config reload. Previously both used StateRemoved, which was confusing: "removed" implies the backend no longer exists in config, but a disabled backend is still present and can be re-enabled on the fly.

- health: add StateDisabled with String() "disabled", Disable() method with probe code "disabled". Record() rejects probes in all three inactive states (paused, disabled, removed).
- checker: DisableBackend calls backend.Disable() instead of Remove().
- docs: healthchecks.md rewritten for pause (goroutine cancelled, not just results discarded), and separate disabled/removed state rows. user-guide.md updated to match.
- Makefile: add fixstyle target (gofmt -w .).
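
For readers without the Go sources at hand, the health change described above might look roughly like the sketch below. Only `StateDisabled`, `StateRemoved`, `String()`, `Disable()`, `Record()` and the probe code "disabled" come from this commit message. The `State` type name, the other constants, the `Backend` struct, its fields, and the boolean return of `Record` are assumptions made for illustration, and the rise/fall counters are left out.

```go
package main

import "fmt"

// State is a hypothetical enum type; the real declaration may differ.
type State int

const (
	StateUnknown State = iota
	StateUp
	StateDown
	StatePaused
	StateDisabled // operator-initiated; backend is still in config
	StateRemoved  // backend disappeared during a config reload
)

// String returns the lowercase names used in the docs' state table.
func (s State) String() string {
	switch s {
	case StateUnknown:
		return "unknown"
	case StateUp:
		return "up"
	case StateDown:
		return "down"
	case StatePaused:
		return "paused"
	case StateDisabled:
		return "disabled"
	case StateRemoved:
		return "removed"
	}
	return "invalid"
}

// Backend is a hypothetical, stripped-down stand-in for the health record.
type Backend struct {
	state State
	code  string // code attached to the last transition
}

// Disable marks the backend as disabled by an operator.
func (b *Backend) Disable() {
	b.state = StateDisabled
	b.code = "disabled"
}

// Record applies one probe result and reports whether it was accepted.
// Results are rejected in all three inactive states. Counters are omitted
// here, so a single result decides between up and down (an assumption).
func (b *Backend) Record(pass bool) bool {
	switch b.state {
	case StatePaused, StateDisabled, StateRemoved:
		return false
	}
	if pass {
		b.state = StateUp
	} else {
		b.state = StateDown
	}
	return true
}

func main() {
	b := &Backend{state: StateUp}
	b.Disable()
	// A late probe result arriving after Disable is dropped.
	fmt.Println(b.state, b.code, b.Record(true))
}
```

Keeping `StateDisabled` separate from `StateRemoved` lets a caller tell "still configured, can be re-enabled" apart from "gone after reload" without consulting the config.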
@@ -2,7 +2,7 @@

 `maglevd` probes each backend independently of how many frontends reference it.
 Every backend runs exactly one probe goroutine. State changes are broadcast as
-gRPC events to all connected `WatchBackendEvents` subscribers.
+gRPC events to all connected `WatchEvents` subscribers.

 ---

@@ -10,11 +10,12 @@ gRPC events to all connected `WatchBackendEvents` subscribers.

 | State | Meaning |
 |---|---|
-| `unknown` | Initial state; also entered after a resume or backend restart. |
+| `unknown` | Initial state; also entered after a resume or enable. |
 | `up` | Backend is healthy and eligible to receive traffic. |
 | `down` | Backend has failed enough consecutive probes to be considered offline. |
-| `paused` | Health checking suspended by an operator. Probes fire but results are discarded. |
-| `removed` | Backend was removed from configuration. No further probes are accepted. |
+| `paused` | Health checking stopped by an operator. No probes are sent. |
+| `disabled` | Backend disabled by an operator. No probes are sent. |
+| `removed` | Backend removed from configuration by a reload. No probes are sent. |

 ---

@@ -41,9 +42,9 @@ without bouncing between up and down.

 ### Expedited unknown resolution

-When a backend enters `unknown` state (new, restarted, or resumed) its counter
-is pre-loaded to `rise − 1`. This means a single probe result is enough to
-resolve the state:
+When a backend enters `unknown` state (new, restarted, resumed, or re-enabled)
+its counter is pre-loaded to `rise − 1`. This means a single probe result is
+enough to resolve the state:

 - **1 pass** → `up`
 - **1 fail** → `down` (also via the special unknown shortcut below)
@@ -92,7 +93,7 @@ that are known to be offline.
 ## Transition events

 Every state change is logged as `backend-transition` and emitted as a gRPC
-`BackendEvent` to all active `WatchBackendEvents` streams.
+`BackendEvent` to all active `WatchEvents` streams.

 ### Backend added (config load or reload)

@@ -100,7 +101,7 @@ Every state change is logged as `backend-transition` and emitted as a gRPC
 unknown → unknown (code: start)
 ```

-The counter is pre-loaded to `rise − 1`. The first probe fires immediately at
+The counter is pre-loaded to `rise − 1`. The first probe fires after
 `fast-interval` (or `interval` if not configured). One pass produces `unknown →
 up`; one fail produces `unknown → down`.

@@ -127,8 +128,9 @@ If multiple backends start together they are staggered across the first
 <any> → paused (operator action)
 ```

-The counter is reset to 0. Probes continue to fire on their normal schedule but
-all results are discarded. The backend stays `paused` until explicitly resumed.
+The counter is reset to 0. The probe goroutine is cancelled — no further
+probes are sent and no traffic reaches the backend while it is paused. The
+backend stays `paused` until explicitly resumed.

 ### Resume

@@ -136,9 +138,31 @@ all results are discarded. The backend stays `paused` until explicitly resumed.
 paused → unknown (operator action)
 ```

-The counter is reset to `rise − 1`. The probe goroutine is woken immediately
-(no wait for the next scheduled probe). One subsequent pass produces `unknown →
-up`; one fail produces `unknown → down`.
+The counter is reset to `rise − 1`. A fresh probe goroutine is started,
+which fires its first probe after `fast-interval` (or `interval` if not
+configured). One pass produces `unknown → up`; one fail produces `unknown →
+down`.
+
+### Disable
+
+```
+<any> → disabled (operator action)
+```
+
+The probe goroutine is cancelled and the backend is marked `enabled: false`.
+No further probes are sent. The backend remains visible via the gRPC API (state
+`disabled`) and can be re-enabled without a config reload.
+
+### Enable
+
+```
+disabled → unknown (operator action, via fresh goroutine)
+```
+
+A new probe goroutine is started and the backend re-enters `unknown` with the
+counter pre-loaded to `rise − 1`. The `enabled` flag is set back to `true`.
+The first probe fires after `fast-interval` and resolves state as described
+under *Backend added*.

 ### Backend removed (config reload)

@@ -175,4 +199,4 @@ All state changes produce a structured log line at `INFO` level:

 Probe-driven transitions also carry `code` and `detail` fields from the probe
 result (e.g. `L4CON`, `L7STS`, `connection refused`). Operator-driven
-transitions (pause, resume) carry empty code and detail.
+transitions (pause, resume, disable, enable) carry empty code and detail.
@@ -87,7 +87,7 @@ set backend <name> pause Suspend health checking for a backend, freezing
 set backend <name> resume Resume health checking; backend re-enters unknown state
 and is probed immediately.
 set backend <name> disable Stop probing entirely and remove the backend from rotation.
-The backend remains visible (state: removed) and can be
+The backend remains visible (state: disabled) and can be
 re-enabled without reloading configuration.
 set backend <name> enable Re-enable a disabled backend. A fresh probe goroutine is
 started and the backend re-enters unknown state.
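
The disable and enable behaviour documented in these hunks (cancel the probe goroutine, later start a fresh one in `unknown` state) could be sketched as follows. This is a minimal illustration and not the checker's actual code: the `Prober` type, its fields and its methods are hypothetical names, and the probe loop is reduced to waiting for cancellation.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
)

// Prober is a hypothetical stand-in for the checker's per-backend bookkeeping.
type Prober struct {
	mu     sync.Mutex
	state  string
	cancel context.CancelFunc // nil when no probe goroutine is running
	done   chan struct{}      // closed when the goroutine has exited
}

// startLocked launches a fresh probe goroutine and sets the state to
// "unknown". The caller must hold p.mu.
func (p *Prober) startLocked() {
	ctx, cancel := context.WithCancel(context.Background())
	done := make(chan struct{})
	p.state, p.cancel, p.done = "unknown", cancel, done
	go func() {
		defer close(done)
		<-ctx.Done() // a real loop would also send probes on a timer
	}()
}

// NewProber returns a backend in "unknown" state with its goroutine running.
func NewProber() *Prober {
	p := &Prober{}
	p.mu.Lock()
	p.startLocked()
	p.mu.Unlock()
	return p
}

// Disable cancels the probe goroutine, waits for it to exit, and marks the
// backend "disabled". It is valid from any state.
func (p *Prober) Disable() {
	p.mu.Lock()
	defer p.mu.Unlock()
	if p.cancel != nil {
		p.cancel()
		<-p.done
		p.cancel, p.done = nil, nil
	}
	p.state = "disabled"
}

// Enable starts a fresh goroutine, but only for a disabled backend.
func (p *Prober) Enable() error {
	p.mu.Lock()
	defer p.mu.Unlock()
	if p.state != "disabled" {
		return errors.New("backend is not disabled")
	}
	p.startLocked()
	return nil
}

// State returns the current state name.
func (p *Prober) State() string {
	p.mu.Lock()
	defer p.mu.Unlock()
	return p.state
}

// Running reports whether a probe goroutine currently exists.
func (p *Prober) Running() bool {
	p.mu.Lock()
	defer p.mu.Unlock()
	return p.cancel != nil
}

func main() {
	p := NewProber()
	p.Disable()
	fmt.Println(p.State(), p.Running())
	if err := p.Enable(); err != nil {
		fmt.Println("enable failed:", err)
	}
	fmt.Println(p.State(), p.Running())
	p.Disable() // stop the goroutine before exiting
}
```

Cancelling the goroutine instead of discarding its results means that a disabled backend sends no probe traffic at all, which matches the "No probes are sent" wording in the state table.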