Distinguish disabled from removed backend state; add make fixstyle
Add StateDisabled for operator-initiated disable, keeping StateRemoved for backends that disappear during a config reload. Previously both used StateRemoved, which was confusing: "removed" implies the backend no longer exists in config, but a disabled backend is still present and can be re-enabled on the fly.

- health: add StateDisabled with String() "disabled", Disable() method with probe code "disabled". Record() rejects probes in all three inactive states (paused, disabled, removed).
- checker: DisableBackend calls backend.Disable() instead of Remove().
- docs: healthchecks.md rewritten for pause (goroutine cancelled, not just results discarded), and separate disabled/removed state rows. user-guide.md updated to match.
- Makefile: add fixstyle target (gofmt -w .).
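
For readers skimming the diff, the split between disabled and removed can be sketched in a few lines of Go. This is a simplified illustration, not the code in this commit: the real `Record` takes a `ProbeResult` and `maxHistory` and applies rise/fall counting, and the real `Disable`/`Remove` return a `Transition`. The `inactive` helper and the `Enable` method on `Backend` are hypothetical names introduced here only to show that a disabled backend can come back while a removed one cannot.

```go
package main

import "fmt"

// State mirrors the health states named in this commit (simplified copy).
type State int

const (
	StateUnknown State = iota
	StateUp
	StateDown
	StatePaused
	StateDisabled // operator disabled the backend; it is still in config
	StateRemoved  // backend dropped from config by a reload
)

func (s State) String() string {
	switch s {
	case StateUnknown:
		return "unknown"
	case StateUp:
		return "up"
	case StateDown:
		return "down"
	case StatePaused:
		return "paused"
	case StateDisabled:
		return "disabled"
	case StateRemoved:
		return "removed"
	default:
		return "invalid"
	}
}

// Backend is a stripped-down stand-in for the real backend type.
type Backend struct {
	Name  string
	State State
}

// inactive is a hypothetical helper: true for the three states in which
// probe results are rejected.
func (b *Backend) inactive() bool {
	return b.State == StatePaused || b.State == StateDisabled || b.State == StateRemoved
}

// Record applies one probe result and reports whether the state changed.
// Assumption: rise/fall counting is omitted, so one result decides the state.
func (b *Backend) Record(ok bool) bool {
	if b.inactive() {
		return false
	}
	next := StateDown
	if ok {
		next = StateUp
	}
	changed := b.State != next
	b.State = next
	return changed
}

// Disable marks an operator-initiated disable.
func (b *Backend) Disable() { b.State = StateDisabled }

// Remove marks a backend that vanished during a config reload.
func (b *Backend) Remove() { b.State = StateRemoved }

// Enable is hypothetical: only a disabled backend may re-enter unknown.
func (b *Backend) Enable() {
	if b.State == StateDisabled {
		b.State = StateUnknown
	}
}

func main() {
	b := &Backend{Name: "nginx2"}
	b.Record(true)
	fmt.Println(b.State) // up

	b.Disable()
	fmt.Println(b.Record(true), b.State) // false disabled

	b.Enable()
	fmt.Println(b.State) // unknown

	b.Remove()
	b.Enable() // no effect: a removed backend is gone from config
	fmt.Println(b.State) // removed
}
```

With a single shared state, the last two cases would be indistinguishable to a gRPC client or a metrics scraper, which is the confusion the commit message describes.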
5
Makefile
@@ -14,7 +14,7 @@ LDFLAGS := -X '$(MODULE)/cmd.version=$(VERSION)' \
|
|||||||
|
|
||||||
TEST ?= tests/
|
TEST ?= tests/
|
||||||
|
|
||||||
.PHONY: all build build-amd64 build-arm64 test proto lint pkg-deb robot-test clean
|
.PHONY: all build build-amd64 build-arm64 test proto lint fixstyle pkg-deb robot-test clean
|
||||||
|
|
||||||
all: build
|
all: build
|
||||||
|
|
||||||
@@ -48,6 +48,9 @@ $(GEN_FILES): $(PROTO_FILE)
|
|||||||
--go-grpc_out=. --go-grpc_opt=module=$(MODULE) \
|
--go-grpc_out=. --go-grpc_opt=module=$(MODULE) \
|
||||||
$(PROTO_FILE)
|
$(PROTO_FILE)
|
||||||
|
|
||||||
|
fixstyle:
|
||||||
|
gofmt -w .
|
||||||
|
|
||||||
lint:
|
lint:
|
||||||
golangci-lint run ./...
|
golangci-lint run ./...
|
||||||
|
|
||||||
|
|||||||
@@ -2,7 +2,7 @@
|
|||||||
|
|
||||||
`maglevd` probes each backend independently of how many frontends reference it.
|
`maglevd` probes each backend independently of how many frontends reference it.
|
||||||
Every backend runs exactly one probe goroutine. State changes are broadcast as
|
Every backend runs exactly one probe goroutine. State changes are broadcast as
|
||||||
gRPC events to all connected `WatchBackendEvents` subscribers.
|
gRPC events to all connected `WatchEvents` subscribers.
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
@@ -10,11 +10,12 @@ gRPC events to all connected `WatchBackendEvents` subscribers.
|
|||||||
|
|
||||||
| State | Meaning |
|
| State | Meaning |
|
||||||
|---|---|
|
|---|---|
|
||||||
| `unknown` | Initial state; also entered after a resume or backend restart. |
|
| `unknown` | Initial state; also entered after a resume or enable. |
|
||||||
| `up` | Backend is healthy and eligible to receive traffic. |
|
| `up` | Backend is healthy and eligible to receive traffic. |
|
||||||
| `down` | Backend has failed enough consecutive probes to be considered offline. |
|
| `down` | Backend has failed enough consecutive probes to be considered offline. |
|
||||||
| `paused` | Health checking suspended by an operator. Probes fire but results are discarded. |
|
| `paused` | Health checking stopped by an operator. No probes are sent. |
|
||||||
| `removed` | Backend was removed from configuration. No further probes are accepted. |
|
| `disabled` | Backend disabled by an operator. No probes are sent. |
|
||||||
|
| `removed` | Backend removed from configuration by a reload. No probes are sent. |
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
@@ -41,9 +42,9 @@ without bouncing between up and down.
|
|||||||
|
|
||||||
### Expedited unknown resolution
|
### Expedited unknown resolution
|
||||||
|
|
||||||
When a backend enters `unknown` state (new, restarted, or resumed) its counter
|
When a backend enters `unknown` state (new, restarted, resumed, or re-enabled)
|
||||||
is pre-loaded to `rise − 1`. This means a single probe result is enough to
|
its counter is pre-loaded to `rise − 1`. This means a single probe result is
|
||||||
resolve the state:
|
enough to resolve the state:
|
||||||
|
|
||||||
- **1 pass** → `up`
|
- **1 pass** → `up`
|
||||||
- **1 fail** → `down` (also via the special unknown shortcut below)
|
- **1 fail** → `down` (also via the special unknown shortcut below)
|
||||||
@@ -92,7 +93,7 @@ that are known to be offline.
|
|||||||
## Transition events
|
## Transition events
|
||||||
|
|
||||||
Every state change is logged as `backend-transition` and emitted as a gRPC
|
Every state change is logged as `backend-transition` and emitted as a gRPC
|
||||||
`BackendEvent` to all active `WatchBackendEvents` streams.
|
`BackendEvent` to all active `WatchEvents` streams.
|
||||||
|
|
||||||
### Backend added (config load or reload)
|
### Backend added (config load or reload)
|
||||||
|
|
||||||
@@ -100,7 +101,7 @@ Every state change is logged as `backend-transition` and emitted as a gRPC
|
|||||||
unknown → unknown (code: start)
|
unknown → unknown (code: start)
|
||||||
```
|
```
|
||||||
|
|
||||||
The counter is pre-loaded to `rise − 1`. The first probe fires immediately at
|
The counter is pre-loaded to `rise − 1`. The first probe fires after
|
||||||
`fast-interval` (or `interval` if not configured). One pass produces `unknown →
|
`fast-interval` (or `interval` if not configured). One pass produces `unknown →
|
||||||
up`; one fail produces `unknown → down`.
|
up`; one fail produces `unknown → down`.
|
||||||
|
|
||||||
@@ -127,8 +128,9 @@ If multiple backends start together they are staggered across the first
|
|||||||
<any> → paused (operator action)
|
<any> → paused (operator action)
|
||||||
```
|
```
|
||||||
|
|
||||||
The counter is reset to 0. Probes continue to fire on their normal schedule but
|
The counter is reset to 0. The probe goroutine is cancelled — no further
|
||||||
all results are discarded. The backend stays `paused` until explicitly resumed.
|
probes are sent and no traffic reaches the backend while it is paused. The
|
||||||
|
backend stays `paused` until explicitly resumed.
|
||||||
|
|
||||||
### Resume
|
### Resume
|
||||||
|
|
||||||
@@ -136,9 +138,31 @@ all results are discarded. The backend stays `paused` until explicitly resumed.
|
|||||||
paused → unknown (operator action)
|
paused → unknown (operator action)
|
||||||
```
|
```
|
||||||
|
|
||||||
The counter is reset to `rise − 1`. The probe goroutine is woken immediately
|
The counter is reset to `rise − 1`. A fresh probe goroutine is started,
|
||||||
(no wait for the next scheduled probe). One subsequent pass produces `unknown →
|
which fires its first probe after `fast-interval` (or `interval` if not
|
||||||
up`; one fail produces `unknown → down`.
|
configured). One pass produces `unknown → up`; one fail produces `unknown →
|
||||||
|
down`.
|
||||||
|
|
||||||
|
### Disable
|
||||||
|
|
||||||
|
```
|
||||||
|
<any> → disabled (operator action)
|
||||||
|
```
|
||||||
|
|
||||||
|
The probe goroutine is cancelled and the backend is marked `enabled: false`.
|
||||||
|
No further probes are sent. The backend remains visible via the gRPC API (state
|
||||||
|
`disabled`) and can be re-enabled without a config reload.
|
||||||
|
|
||||||
|
### Enable
|
||||||
|
|
||||||
|
```
|
||||||
|
disabled → unknown (operator action, via fresh goroutine)
|
||||||
|
```
|
||||||
|
|
||||||
|
A new probe goroutine is started and the backend re-enters `unknown` with the
|
||||||
|
counter pre-loaded to `rise − 1`. The `enabled` flag is set back to `true`.
|
||||||
|
The first probe fires after `fast-interval` and resolves state as described
|
||||||
|
under *Backend added*.
|
||||||
|
|
||||||
### Backend removed (config reload)
|
### Backend removed (config reload)
|
||||||
|
|
||||||
@@ -175,4 +199,4 @@ All state changes produce a structured log line at `INFO` level:
|
|||||||
|
|
||||||
Probe-driven transitions also carry `code` and `detail` fields from the probe
|
Probe-driven transitions also carry `code` and `detail` fields from the probe
|
||||||
result (e.g. `L4CON`, `L7STS`, `connection refused`). Operator-driven
|
result (e.g. `L4CON`, `L7STS`, `connection refused`). Operator-driven
|
||||||
transitions (pause, resume) carry empty code and detail.
|
transitions (pause, resume, disable, enable) carry empty code and detail.
|
||||||
|
|||||||
@@ -87,7 +87,7 @@ set backend <name> pause Suspend health checking for a backend, freezing
|
|||||||
set backend <name> resume Resume health checking; backend re-enters unknown state
|
set backend <name> resume Resume health checking; backend re-enters unknown state
|
||||||
and is probed immediately.
|
and is probed immediately.
|
||||||
set backend <name> disable Stop probing entirely and remove the backend from rotation.
|
set backend <name> disable Stop probing entirely and remove the backend from rotation.
|
||||||
The backend remains visible (state: removed) and can be
|
The backend remains visible (state: disabled) and can be
|
||||||
re-enabled without reloading configuration.
|
re-enabled without reloading configuration.
|
||||||
set backend <name> enable Re-enable a disabled backend. A fresh probe goroutine is
|
set backend <name> enable Re-enable a disabled backend. A fresh probe goroutine is
|
||||||
started and the backend re-enters unknown state.
|
started and the backend re-enters unknown state.
|
||||||
|
|||||||
@@ -350,7 +350,7 @@ func (c *Checker) DisableBackend(name string) (BackendSnapshot, bool) {
|
|||||||
return BackendSnapshot{Health: w.backend, Config: w.entry}, true
|
return BackendSnapshot{Health: w.backend, Config: w.entry}, true
|
||||||
}
|
}
|
||||||
maxHistory := c.cfg.HealthChecker.TransitionHistory
|
maxHistory := c.cfg.HealthChecker.TransitionHistory
|
||||||
t := w.backend.Remove(maxHistory)
|
t := w.backend.Disable(maxHistory)
|
||||||
slog.Info("backend-disable", "backend", name)
|
slog.Info("backend-disable", "backend", name)
|
||||||
c.emitForBackend(name, w.backend.Address, t, c.cfg.Frontends)
|
c.emitForBackend(name, w.backend.Address, t, c.cfg.Frontends)
|
||||||
w.cancel()
|
w.cancel()
|
||||||
|
|||||||
@@ -310,8 +310,8 @@ func TestEnableDisable(t *testing.T) {
|
|||||||
if !ok {
|
if !ok {
|
||||||
t.Fatal("DisableBackend: not found")
|
t.Fatal("DisableBackend: not found")
|
||||||
}
|
}
|
||||||
if b.Health.State != health.StateRemoved {
|
if b.Health.State != health.StateDisabled {
|
||||||
t.Errorf("after disable: state=%s, want removed", b.Health.State)
|
t.Errorf("after disable: state=%s, want disabled", b.Health.State)
|
||||||
}
|
}
|
||||||
if b.Config.Enabled {
|
if b.Config.Enabled {
|
||||||
t.Error("after disable: Enabled should be false")
|
t.Error("after disable: Enabled should be false")
|
||||||
|
|||||||
@@ -268,8 +268,8 @@ func TestEnableDisableBackend(t *testing.T) {
|
|||||||
if err != nil {
|
if err != nil {
|
||||||
t.Fatalf("DisableBackend: %v", err)
|
t.Fatalf("DisableBackend: %v", err)
|
||||||
}
|
}
|
||||||
if info.State != "removed" {
|
if info.State != "disabled" {
|
||||||
t.Errorf("after disable: got %q, want removed", info.State)
|
t.Errorf("after disable: got %q, want disabled", info.State)
|
||||||
}
|
}
|
||||||
if info.Enabled {
|
if info.Enabled {
|
||||||
t.Error("after disable: Enabled should be false")
|
t.Error("after disable: Enabled should be false")
|
||||||
|
|||||||
@@ -30,10 +30,11 @@ type State int
|
|||||||
|
|
||||||
const (
|
const (
|
||||||
StateUnknown State = iota // initial state before first probe
|
StateUnknown State = iota // initial state before first probe
|
||||||
StateUp
|
StateUp // backend is healthy
|
||||||
StateDown
|
StateDown // backend has failed enough probes
|
||||||
StatePaused
|
StatePaused // operator paused health checking
|
||||||
StateRemoved // backend was removed from configuration
|
StateDisabled // operator disabled the backend
|
||||||
|
StateRemoved // backend removed from configuration by reload
|
||||||
)
|
)
|
||||||
|
|
||||||
func (s State) String() string {
|
func (s State) String() string {
|
||||||
@@ -46,6 +47,8 @@ func (s State) String() string {
|
|||||||
return "down"
|
return "down"
|
||||||
case StatePaused:
|
case StatePaused:
|
||||||
return "paused"
|
return "paused"
|
||||||
|
case StateDisabled:
|
||||||
|
return "disabled"
|
||||||
case StateRemoved:
|
case StateRemoved:
|
||||||
return "removed"
|
return "removed"
|
||||||
default:
|
default:
|
||||||
@@ -123,7 +126,7 @@ func New(name string, addr net.IP, rise, fall int) *Backend {
|
|||||||
// failure means the backend is not yet confirmed reachable), and to StateUp
|
// failure means the backend is not yet confirmed reachable), and to StateUp
|
||||||
// once the counter reaches Rise consecutive passes.
|
// once the counter reaches Rise consecutive passes.
|
||||||
func (b *Backend) Record(r ProbeResult, maxHistory int) bool {
|
func (b *Backend) Record(r ProbeResult, maxHistory int) bool {
|
||||||
if b.State == StatePaused || b.State == StateRemoved {
|
if b.State == StatePaused || b.State == StateDisabled || b.State == StateRemoved {
|
||||||
return false
|
return false
|
||||||
}
|
}
|
||||||
if r.OK {
|
if r.OK {
|
||||||
@@ -196,6 +199,13 @@ func (b *Backend) Start(maxHistory int) Transition {
|
|||||||
return b.Transitions[0]
|
return b.Transitions[0]
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// Disable transitions the backend to StateDisabled. Returns the transition.
|
||||||
|
// After this call no further probe results are accepted.
|
||||||
|
func (b *Backend) Disable(maxHistory int) Transition {
|
||||||
|
b.transition(StateDisabled, ProbeResult{Code: "disabled"}, maxHistory)
|
||||||
|
return b.Transitions[0]
|
||||||
|
}
|
||||||
|
|
||||||
// Remove transitions the backend to StateRemoved. Returns the transition.
|
// Remove transitions the backend to StateRemoved. Returns the transition.
|
||||||
// After this call no further probe results are accepted.
|
// After this call no further probe results are accepted.
|
||||||
func (b *Backend) Remove(maxHistory int) Transition {
|
func (b *Backend) Remove(maxHistory int) Transition {
|
||||||
|
|||||||
@@ -333,6 +333,7 @@ func TestStateString(t *testing.T) {
|
|||||||
{StateUp, "up"},
|
{StateUp, "up"},
|
||||||
{StateDown, "down"},
|
{StateDown, "down"},
|
||||||
{StatePaused, "paused"},
|
{StatePaused, "paused"},
|
||||||
|
{StateDisabled, "disabled"},
|
||||||
{StateRemoved, "removed"},
|
{StateRemoved, "removed"},
|
||||||
}
|
}
|
||||||
for _, c := range cases {
|
for _, c := range cases {
|
||||||
|
|||||||
@@ -112,6 +112,7 @@ func (c *Collector) Collect(ch chan<- prometheus.Metric) {
|
|||||||
health.StateUp,
|
health.StateUp,
|
||||||
health.StateDown,
|
health.StateDown,
|
||||||
health.StatePaused,
|
health.StatePaused,
|
||||||
|
health.StateDisabled,
|
||||||
health.StateRemoved,
|
health.StateRemoved,
|
||||||
}
|
}
|
||||||
|
|
||||||
@@ -173,4 +174,3 @@ func Register(reg prometheus.Registerer, src StateSource) *Collector {
|
|||||||
reg.MustRegister(TransitionTotal)
|
reg.MustRegister(TransitionTotal)
|
||||||
return coll
|
return coll
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|||||||
@@ -59,7 +59,7 @@ Resume backend restarts probing
|
|||||||
|
|
||||||
Disable backend stops probing
|
Disable backend stops probing
|
||||||
Maglevc set backend nginx2 disable
|
Maglevc set backend nginx2 disable
|
||||||
Backend Should Have State nginx2 removed
|
Backend Should Have State nginx2 disabled
|
||||||
Backend Should Be Disabled nginx2
|
Backend Should Be Disabled nginx2
|
||||||
Sleep 1s
|
Sleep 1s
|
||||||
${before} = Get Probe Count nginx2
|
${before} = Get Probe Count nginx2
|
||||||
|
|||||||
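
The "expedited unknown resolution" rule in the healthchecks.md hunk (counter pre-loaded to `rise − 1`, so one probe settles the state) can be illustrated with a small counter model. This is a sketch under assumptions, not the project's implementation: the `passes`/`fails` fields, the `EnterUnknown` method, and the string-valued `State` method are hypothetical, and the real code tracks a single counter inside `Record`.

```go
package main

import "fmt"

// Backend is a hypothetical model of rise/fall counting.
type Backend struct {
	Rise, Fall    int
	known, up     bool
	passes, fails int
}

// EnterUnknown models a new, resumed, or re-enabled backend: the pass
// counter is pre-loaded to rise-1 so a single pass reaches the threshold.
func (b *Backend) EnterUnknown() {
	b.known = false
	b.passes = b.Rise - 1
	b.fails = 0
}

// State renders the current state as a string.
func (b *Backend) State() string {
	switch {
	case !b.known:
		return "unknown"
	case b.up:
		return "up"
	default:
		return "down"
	}
}

// Record applies one probe result and returns the resulting state.
func (b *Backend) Record(ok bool) string {
	if ok {
		b.fails = 0
		b.passes++
		if b.passes >= b.Rise {
			b.known, b.up = true, true
		}
	} else {
		b.passes = 0
		b.fails++
		// Unknown shortcut: one failure is enough while state is unknown.
		if !b.known || b.fails >= b.Fall {
			b.known, b.up = true, false
		}
	}
	return b.State()
}

func main() {
	b := &Backend{Rise: 3, Fall: 2}

	b.EnterUnknown()
	fmt.Println(b.Record(true)) // up: one pass from unknown

	fmt.Println(b.Record(false)) // up: one failure is below fall
	fmt.Println(b.Record(false)) // down: fall consecutive failures

	b.EnterUnknown()
	fmt.Println(b.Record(false)) // down: unknown shortcut

	b.Record(true)
	b.Record(true)
	fmt.Println(b.Record(true)) // up: rise consecutive passes from down
}
```

The pre-load only shortens the path out of `unknown`; once the state is known, the full `rise` and `fall` thresholds apply again.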