vpp-maglev/internal/checker/checker.go
Pim van Pelt 0049c2ae73 VPP reconciler: event-driven sync, pool failover, bug fixes
This commit wires the checker's state machine through to the VPP dataplane:
every backend state transition flows through a single code path that
recomputes the effective per-backend weight (with pool failover) and pushes
the result to VPP. Along the way several latent bugs in the state machine
and the sync path were fixed.

internal/vpp/reconciler.go (new)
- New Reconciler type subscribes to checker.Checker events and, on every
  transition, calls Client.SyncLBStateVIP for the affected frontend (a
  sketch follows this list). This is the ONLY place in the codebase where
  backend state changes cause VPP calls — the "single path" discipline
  requested during design.
- Defines an EventSource interface (checker.Checker satisfies it) so the
  dependency direction stays vpp → checker; the checker never imports vpp.
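
The event loop, roughly — the Reconciler fields and the SyncLBStateVIP
signature are assumptions here; Subscribe and Event.FrontendName are real:

    func (r *Reconciler) Run(ctx context.Context) {
        events, cancel := r.src.Subscribe() // src is the EventSource
        defer cancel()
        for {
            select {
            case <-ctx.Done():
                return
            case e := <-events:
                // Single path: every transition resyncs the affected frontend.
                if err := r.client.SyncLBStateVIP(e.FrontendName); err != nil {
                    slog.Warn("reconcile-failed", "frontend", e.FrontendName, "err", err)
                }
            }
        }
    }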

internal/vpp/client.go
- Renamed ConfigSource → StateSource. The interface now has two methods:
  Config() and BackendState(name) (sketched after this list) — the
  reconciler and the desired-state builder both need live health state
  to compute effective weights.
- SetConfigSource → SetStateSource; internal cfgSrc field → stateSrc.
- New getStateSource() helper for internal locked access.
- lbSyncLoop still uses the state source for its periodic drift
  reconciliation; it's fully idempotent and runs the same code path as
  event-driven syncs.
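
For reference, the interface in full (both signatures match the methods
on checker.Checker below):

    type StateSource interface {
        Config() *config.Config
        BackendState(name string) (health.State, bool)
    }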

internal/vpp/lbsync.go
- desiredAS grows a Flush bool so the mapping function can signal "on
  transition to weight 0, flush existing flow-table entries".
- asFromBackend is now the single source of truth for the state →
  (weight, flush) rule. Documented with a full truth table. Takes an
  activePool parameter so it can distinguish "up in active pool" from
  "up but standby".
- activePoolIndex(fe, states) implements priority failover: returns the
  index of the first pool containing any StateUp backend (see the sketch
  after this list). pool[0] wins when at least one member is up; pool[1]
  takes over when pool[0] has no up members; and so on. Defaults to 0
  (unobservable, since all backends map to weight 0 when nothing is up).
- desiredFromFrontend snapshots backend states once, computes activePool,
  then walks every backend through asFromBackend. No more filtering on
  b.Enabled — disabled backends stay in the desired set so they keep
  their AS entry in VPP with weight=0. The previous filter caused delAS
  on disable, which destroyed the entry and broke enable afterwards.
- EffectiveWeights(fe, src) exported helper that returns the per-pool
  per-backend weight map for one frontend. Used by the gRPC GetFrontend
  handler and robot tests to observe failover without touching VPP.
- reconcileVIP computes flush at the weight-change call site:
    flush = desired.Flush && cur.Weight > 0 && desired.Weight == 0
  This ensures only the *transition* to disabled flushes sessions —
  steady-state syncs with already-zero weight skip the call entirely.
- setASWeight now plumbs IsFlush into lb_as_set_weight.
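
The failover rule is small enough to sketch in full; the parameter types
here are assumptions, the selection logic is as described above:

    func activePoolIndex(fe config.Frontend, states map[string]health.State) int {
        for i, pool := range fe.Pools {
            for name := range pool.Backends {
                if states[name] == health.StateUp {
                    return i // first pool with any up member wins
                }
            }
        }
        return 0 // nothing up anywhere; all backends get weight 0 regardless
    }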

internal/vpp/lbsync_test.go (new)
- TestAsFromBackend: 15 cases locking down the truth table, including
  failover scenarios (up in standby pool, up promoted in pool[1]).
- TestActivePoolIndex: 8 cases covering pool[0]-has-up, pool[0]-all-down,
  all-disabled, all-paused, all-unknown, nothing-up-anywhere, and
  three-tier failover.
- TestDesiredFromFrontendFailover: 5 end-to-end scenarios wiring a fake
  StateSource through desiredFromFrontend and asserting the final
  per-IP weight map. Exercises the complete pipeline without VPP.

internal/checker/checker.go
- Added BackendState(name) (health.State, bool) — one-line method that
  satisfies vpp.StateSource. The checker is otherwise unchanged.
- EnableBackend rewritten to reuse the existing worker (parallel to
  ResumeBackend). The old code called startWorker which constructed a
  brand-new Backend via health.New, throwing away the transition
  history; the resulting 'backend-transition' log showed the bogus
  from=unknown,to=unknown. Now uses w.backend.Enable() to record a
  proper disabled→unknown transition and launches a fresh goroutine.
- Static (no-healthcheck) backends now fire their synthetic 'always up'
  pass on the first iteration of runProbe instead of sleeping 30s
  first. Previously static backends sat in StateUnknown for 30s after
  startup — useless for deterministic testing and surprising for
  operators. The fix is a simple first-iteration flag.

internal/health/state.go
- New Enable(maxHistory) method parallel to Disable. Transitions the
  backend from whatever state it's in (typically StateDisabled) to
  StateUnknown, resets the health counter to rise-1 so the expedited
  resolution kicks in on the first probe result, and emits a transition
  with code 'enabled'.
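
A minimal sketch of the method, assuming hypothetical internals on
health.Backend (healthy, rise, and pushTransition are guessed names):

    func (b *Backend) Enable(maxHistory int) Transition {
        t := Transition{From: b.State, To: StateUnknown, Code: "enabled"}
        b.State = StateUnknown
        b.pushTransition(t, maxHistory) // prepends, capped at maxHistory
        b.healthy = b.rise - 1          // expedited: one passing probe resolves to up
        return t
    }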

proto/maglev.proto
- PoolBackendInfo gains effective_weight: the state-aware weight that
  would be programmed into VPP (distinct from the configured weight in
  the YAML). Exposed via GetFrontend.

internal/grpcapi/server.go
- frontendToProto takes a vpp.StateSource, computes effective weights
  via vpp.EffectiveWeights, and populates PoolBackendInfo.EffectiveWeight.
- GetFrontend and SetFrontendPoolBackendWeight updated to pass the
  checker in.
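
The wiring is one lookup per backend; the return shape of
EffectiveWeights assumed here (pool name -> backend name -> weight) is
a guess:

    weights := vpp.EffectiveWeights(fe, src)
    info.EffectiveWeight = int32(weights[pool.Name][backendName])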

cmd/maglevc/commands.go
- 'show frontends <name>' now renders every pool backend row as
    <name>  weight <cfg>  effective <eff>  [disabled]?
  so both values are always visible. The VPP-style key/value format
  avoids the ANSI-alignment pitfall we hit earlier and makes the output
  regex-parseable for robot tests.
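
For a frontend with a healthy primary and a disabled fallback, the
output would look something like this (names hypothetical):

    static-primary    weight 100  effective 100
    static-fallback   weight 100  effective 0    disabled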

cmd/maglevd/main.go
- Construct and start the Reconciler alongside the VPP client. Two
  extra lines, no other changes to startup.
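
Roughly (constructor name assumed):

    rec := vpp.NewReconciler(vppClient, chk)
    go rec.Run(ctx)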

tests/01-maglevd/maglevd-lab/maglev.yaml
- Two new static backends (static-primary, static-fallback) and a new
  failover-vip frontend with one backend per pool. No healthcheck, so
  the state machine resolves them to 'up' immediately via the synthetic
  pass. Used by the failover robot tests.

tests/01-maglevd/01-healthcheck.robot
- Three new test cases exercising pool failover end-to-end:
  1. primary up, secondary standby (initial state)
  2. disable primary → fallback takes over (effective weight flips)
  3. enable primary → fallback steps back
  All run without VPP: they scrape 'maglevc show frontends <name>' and
  regex-match the effective weight in the output. Deterministic and
  fast (~2s total) because the static backends don't probe.
- Two helper keywords: Static Backend Should Be Up and
  Effective Weight Should Be.

Net result: 16/16 robot tests pass. Backend state transitions now
flow through a single documented path (checker event → reconciler →
SyncLBStateVIP → desiredFromFrontend → asFromBackend → reconcileVIP →
setASWeight), and the pool failover / enable-after-disable / static-
backend-startup bugs are all fixed.
2026-04-12 12:40:09 +02:00


// Copyright (c) 2026, Pim van Pelt <pim@ipng.ch>

package checker

import (
	"context"
	"fmt"
	"log/slog"
	"net"
	"sort"
	"sync"
	"time"

	"git.ipng.ch/ipng/vpp-maglev/internal/config"
	"git.ipng.ch/ipng/vpp-maglev/internal/health"
	"git.ipng.ch/ipng/vpp-maglev/internal/metrics"
	"git.ipng.ch/ipng/vpp-maglev/internal/prober"
)

// BackendSnapshot combines the live health state with the config entry for a backend.
type BackendSnapshot struct {
	Health *health.Backend
	Config config.Backend
}

// Event is emitted on every backend state transition, once per frontend that
// references the backend.
type Event struct {
	FrontendName string
	BackendName  string
	Backend      net.IP
	Transition   health.Transition
}

type worker struct {
	backend *health.Backend
	hc      config.HealthCheck
	entry   config.Backend
	cancel  context.CancelFunc
	wakeCh  chan struct{} // closed/signalled to interrupt probe sleep on resume
}

// Checker orchestrates health probing for all backends.
// Each backend is probed exactly once, regardless of how many frontends
// reference it.
type Checker struct {
	runCtx  context.Context // set in Run; used by EnableBackend to start new goroutines
	cfg     *config.Config
	mu      sync.RWMutex
	workers map[string]*worker // keyed by backend name
	subsMu  sync.Mutex
	nextID  int
	subs    map[int]chan Event
	eventCh chan Event
}

// New creates a Checker. Call Run to start probing.
func New(cfg *config.Config) *Checker {
	return &Checker{
		cfg:     cfg,
		workers: make(map[string]*worker),
		subs:    make(map[int]chan Event),
		eventCh: make(chan Event, 256),
	}
}

// Run starts all probe goroutines and blocks until ctx is cancelled.
func (c *Checker) Run(ctx context.Context) error {
	go c.fanOut(ctx)
	c.mu.Lock()
	c.runCtx = ctx // safe: held under mu before any EnableBackend call can read it
	names := activeBackendNames(c.cfg)
	maxHistory := c.cfg.HealthChecker.TransitionHistory
	for i, name := range names {
		b := c.cfg.Backends[name]
		hc := c.cfg.HealthChecks[b.HealthCheck]
		c.startWorker(ctx, name, b, hc, i, len(names), maxHistory)
	}
	c.mu.Unlock()
	<-ctx.Done()
	return nil
}

// Reload applies a new config without restarting the process.
// New backends are added, removed backends are stopped, changed backends are
// restarted. Backends whose healthcheck config is unchanged continue
// uninterrupted, even if the set of frontends referencing them changes.
func (c *Checker) Reload(ctx context.Context, cfg *config.Config) error {
	c.mu.Lock()
	defer c.mu.Unlock()
	maxHistory := cfg.HealthChecker.TransitionHistory
	desired := map[string]struct{}{}
	for _, name := range activeBackendNames(cfg) {
		desired[name] = struct{}{}
	}
	// Stop workers no longer needed; emit a removed event using the old frontends.
	for name, w := range c.workers {
		if _, ok := desired[name]; !ok {
			slog.Info("backend-stop", "backend", name)
			t := w.backend.Remove(maxHistory)
			c.emitForBackend(name, w.backend.Address, t, c.cfg.Frontends)
			w.cancel()
			delete(c.workers, name)
		}
	}
	// Add new or restart changed workers; emit an unknown event using the new frontends.
	names := activeBackendNames(cfg)
	for i, name := range names {
		b := cfg.Backends[name]
		hc := cfg.HealthChecks[b.HealthCheck]
		if w, ok := c.workers[name]; ok {
			if healthCheckEqual(w.hc, hc) {
				// Update entry metadata (weight, etc.) in place without restart.
				w.entry = b
				continue
			}
			slog.Info("backend-restart", "backend", name)
			w.cancel()
			c.startWorker(ctx, name, b, hc, i, len(names), maxHistory)
		} else {
			slog.Info("backend-start", "backend", name)
			c.startWorker(ctx, name, b, hc, i, len(names), maxHistory)
		}
		c.emitForBackend(name, c.workers[name].backend.Address, c.workers[name].backend.Transitions[0], cfg.Frontends)
	}
	c.cfg = cfg
	return nil
}

// Subscribe returns a channel that receives Events for every state transition.
// Call the returned cancel function to unsubscribe.
func (c *Checker) Subscribe() (<-chan Event, func()) {
	c.subsMu.Lock()
	defer c.subsMu.Unlock()
	id := c.nextID
	c.nextID++
	ch := make(chan Event, 64)
	c.subs[id] = ch
	return ch, func() {
		c.subsMu.Lock()
		defer c.subsMu.Unlock()
		delete(c.subs, id)
		close(ch)
	}
}

// Config returns the live config pointer held by the checker. Callers must
// treat the returned value as read-only. The pointer is swapped on Reload,
// so callers that cache it across reloads may see stale data.
func (c *Checker) Config() *config.Config {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return c.cfg
}

// BackendState returns the current health state of a backend. Returns
// (StateUnknown, false) when the backend has no worker. Satisfies
// vpp.StateSource.
func (c *Checker) BackendState(name string) (health.State, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	w, ok := c.workers[name]
	if !ok {
		return health.StateUnknown, false
	}
	return w.backend.State, true
}

// ListFrontends returns the names of all configured frontends.
func (c *Checker) ListFrontends() []string {
	c.mu.RLock()
	defer c.mu.RUnlock()
	names := make([]string, 0, len(c.cfg.Frontends))
	for name := range c.cfg.Frontends {
		names = append(names, name)
	}
	return names
}

// GetFrontend returns the frontend config for the given name.
func (c *Checker) GetFrontend(name string) (config.Frontend, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	v, ok := c.cfg.Frontends[name]
	return v, ok
}

// SetFrontendPoolBackendWeight updates the weight of a backend within a named
// pool of a frontend. Returns the updated FrontendInfo and a descriptive error
// if the frontend, pool, or backend is not found or the weight is out of range.
func (c *Checker) SetFrontendPoolBackendWeight(frontendName, poolName, backendName string, weight int) (config.Frontend, error) {
	if weight < 0 || weight > 100 {
		return config.Frontend{}, fmt.Errorf("weight %d out of range [0, 100]", weight)
	}
	c.mu.Lock()
	defer c.mu.Unlock()
	fe, ok := c.cfg.Frontends[frontendName]
	if !ok {
		return config.Frontend{}, fmt.Errorf("frontend %q not found", frontendName)
	}
	for i, pool := range fe.Pools {
		if pool.Name != poolName {
			continue
		}
		pb, ok := pool.Backends[backendName]
		if !ok {
			return config.Frontend{}, fmt.Errorf("backend %q not found in pool %q", backendName, poolName)
		}
		pb.Weight = weight
		fe.Pools[i].Backends[backendName] = pb
		c.cfg.Frontends[frontendName] = fe
		slog.Info("frontend-pool-weight", "frontend", frontendName, "pool", poolName, "backend", backendName, "weight", weight)
		return fe, nil
	}
	return config.Frontend{}, fmt.Errorf("pool %q not found in frontend %q", poolName, frontendName)
}

// ListHealthChecks returns the names of all configured health checks, sorted.
func (c *Checker) ListHealthChecks() []string {
	c.mu.RLock()
	defer c.mu.RUnlock()
	names := make([]string, 0, len(c.cfg.HealthChecks))
	for name := range c.cfg.HealthChecks {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

// GetHealthCheck returns the config for a health check by name.
func (c *Checker) GetHealthCheck(name string) (config.HealthCheck, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	hc, ok := c.cfg.HealthChecks[name]
	return hc, ok
}

// ListBackends returns the names of all active backends.
func (c *Checker) ListBackends() []string {
	c.mu.RLock()
	defer c.mu.RUnlock()
	names := make([]string, 0, len(c.workers))
	for name := range c.workers {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

// ListFrontendBackends returns the backend health states for all backends of a frontend.
func (c *Checker) ListFrontendBackends(frontendName string) []*health.Backend {
	c.mu.RLock()
	defer c.mu.RUnlock()
	fe, ok := c.cfg.Frontends[frontendName]
	if !ok {
		return nil
	}
	var out []*health.Backend
	seen := map[string]struct{}{}
	for _, pool := range fe.Pools {
		for name := range pool.Backends {
			if _, already := seen[name]; already {
				continue
			}
			seen[name] = struct{}{}
			if w, ok := c.workers[name]; ok {
				out = append(out, w.backend)
			}
		}
	}
	return out
}

// GetBackend returns a snapshot of the health state and config for a backend by name.
func (c *Checker) GetBackend(name string) (BackendSnapshot, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	w, ok := c.workers[name]
	if !ok {
		return BackendSnapshot{}, false
	}
	return BackendSnapshot{Health: w.backend, Config: w.entry}, true
}

// GetBackendInfo returns the health state and key config fields for a backend.
// Satisfies metrics.StateSource.
func (c *Checker) GetBackendInfo(name string) (metrics.BackendInfo, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	w, ok := c.workers[name]
	if !ok {
		return metrics.BackendInfo{}, false
	}
	return metrics.BackendInfo{
		Health:  w.backend,
		Enabled: w.entry.Enabled,
		HCName:  w.entry.HealthCheck,
	}, true
}

// PauseBackend pauses health checking for a backend by name. The probe
// goroutine is cancelled so no further traffic is sent to the backend. The
// backend's state is set to paused and remains frozen until ResumeBackend is
// called (which starts a fresh probe goroutine).
// Returns an error if the backend is not found or is disabled.
func (c *Checker) PauseBackend(name string) (BackendSnapshot, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	w, ok := c.workers[name]
	if !ok {
		return BackendSnapshot{}, fmt.Errorf("backend %q not found", name)
	}
	if !w.entry.Enabled {
		return BackendSnapshot{}, fmt.Errorf("backend %q is disabled; enable it first", name)
	}
	maxHistory := c.cfg.HealthChecker.TransitionHistory
	if w.backend.Pause(maxHistory) {
		t := w.backend.Transitions[0]
		slog.Info("backend-transition", "backend", name,
			"from", t.From.String(),
			"to", t.To.String(),
		)
		c.emitForBackend(name, w.backend.Address, t, c.cfg.Frontends)
	}
	w.cancel()
	return BackendSnapshot{Health: w.backend, Config: w.entry}, nil
}

// ResumeBackend resumes health checking for a backend by name. A fresh probe
// goroutine is started and the backend re-enters StateUnknown. The existing
// transition history is preserved.
// Returns an error if the backend is not found or is disabled.
func (c *Checker) ResumeBackend(name string) (BackendSnapshot, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	w, ok := c.workers[name]
	if !ok {
		return BackendSnapshot{}, fmt.Errorf("backend %q not found", name)
	}
	if !w.entry.Enabled {
		return BackendSnapshot{}, fmt.Errorf("backend %q is disabled; enable it first", name)
	}
	maxHistory := c.cfg.HealthChecker.TransitionHistory
	if w.backend.Resume(maxHistory) {
		t := w.backend.Transitions[0]
		slog.Info("backend-transition", "backend", name,
			"from", t.From.String(),
			"to", t.To.String(),
		)
		c.emitForBackend(name, w.backend.Address, t, c.cfg.Frontends)
	}
	// Launch a fresh probe goroutine with a new cancellable context,
	// keeping the existing worker and its transition history.
	wCtx, cancel := context.WithCancel(c.runCtx)
	w.cancel = cancel
	w.wakeCh = make(chan struct{}, 1)
	go c.runProbe(wCtx, name, 0, 1)
	return BackendSnapshot{Health: w.backend, Config: w.entry}, nil
}

// DisableBackend stops health checking for a backend and removes it from active
// rotation. The worker entry is kept in the map so the backend remains visible
// via GetBackend and can be re-enabled with EnableBackend.
func (c *Checker) DisableBackend(name string) (BackendSnapshot, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	w, ok := c.workers[name]
	if !ok {
		return BackendSnapshot{}, false
	}
	if !w.entry.Enabled {
		return BackendSnapshot{Health: w.backend, Config: w.entry}, true
	}
	maxHistory := c.cfg.HealthChecker.TransitionHistory
	t := w.backend.Disable(maxHistory)
	slog.Info("backend-transition", "backend", name,
		"from", t.From.String(),
		"to", t.To.String(),
	)
	c.emitForBackend(name, w.backend.Address, t, c.cfg.Frontends)
	w.cancel()
	w.entry.Enabled = false
	if b, ok := c.cfg.Backends[name]; ok {
		b.Enabled = false
		c.cfg.Backends[name] = b
	}
	return BackendSnapshot{Health: w.backend, Config: w.entry}, true
}

// EnableBackend re-enables a previously disabled backend. The existing
// Backend struct is reused — its transition history is preserved — and a
// fresh probe goroutine is launched. The backend re-enters StateUnknown.
func (c *Checker) EnableBackend(name string) (BackendSnapshot, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	w, ok := c.workers[name]
	if !ok {
		return BackendSnapshot{}, false
	}
	if w.entry.Enabled {
		return BackendSnapshot{Health: w.backend, Config: w.entry}, true
	}
	w.entry.Enabled = true
	if b, ok := c.cfg.Backends[name]; ok {
		b.Enabled = true
		c.cfg.Backends[name] = b
	}
	maxHistory := c.cfg.HealthChecker.TransitionHistory
	t := w.backend.Enable(maxHistory)
	slog.Info("backend-transition", "backend", name,
		"from", t.From.String(),
		"to", t.To.String(),
	)
	c.emitForBackend(name, w.backend.Address, t, c.cfg.Frontends)
	// Launch a fresh probe goroutine with a new cancellable context,
	// keeping the existing worker and its transition history.
	wCtx, cancel := context.WithCancel(c.runCtx)
	w.cancel = cancel
	w.wakeCh = make(chan struct{}, 1)
	go c.runProbe(wCtx, name, 0, 1)
	return BackendSnapshot{Health: w.backend, Config: w.entry}, true
}

// ---- internal --------------------------------------------------------------

// startWorker creates a Backend and launches a probe goroutine.
// Must be called with c.mu held.
func (c *Checker) startWorker(ctx context.Context, name string, entry config.Backend, hc config.HealthCheck, pos, total, maxHistory int) {
	rise, fall := hc.Rise, hc.Fall
	if entry.HealthCheck == "" {
		// No healthcheck: one synthetic pass drives the backend to Up immediately.
		rise, fall = 1, 1
	}
	wCtx, cancel := context.WithCancel(ctx)
	w := &worker{
		backend: health.New(name, entry.Address, rise, fall),
		hc:      hc,
		entry:   entry,
		cancel:  cancel,
		wakeCh:  make(chan struct{}, 1),
	}
	w.backend.Start(maxHistory)
	c.workers[name] = w
	go c.runProbe(wCtx, name, pos, total)
}

// runProbe is the per-backend probe loop.
func (c *Checker) runProbe(ctx context.Context, name string, pos, total int) {
	c.mu.RLock()
	w, ok := c.workers[name]
	if !ok {
		c.mu.RUnlock()
		return
	}
	initialDelay := staggerDelay(w.hc.Interval, pos, total)
	c.mu.RUnlock()
	if initialDelay > 0 {
		select {
		case <-ctx.Done():
			return
		case <-time.After(initialDelay):
		}
	}
	first := true
	for {
		c.mu.RLock()
		w, ok := c.workers[name]
		if !ok {
			c.mu.RUnlock()
			return
		}
		hc := w.hc
		entry := w.entry
		maxHistory := c.cfg.HealthChecker.TransitionHistory
		netns := c.cfg.HealthChecker.Netns
		wakeCh := w.wakeCh
		var sleepFor time.Duration
		if entry.HealthCheck == "" {
			// Static (no-healthcheck) backends: the first iteration fires
			// the synthetic pass immediately so the backend reaches "up"
			// without delay; subsequent iterations idle at 30s since there's
			// nothing to do anyway.
			if first {
				sleepFor = 0
			} else {
				sleepFor = 30 * time.Second
			}
		} else {
			sleepFor = w.backend.NextInterval(hc.Interval, hc.FastInterval, hc.DownInterval)
		}
		c.mu.RUnlock()
		select {
		case <-ctx.Done():
			return
		case <-time.After(sleepFor):
		case <-wakeCh:
		}
		first = false
		var result health.ProbeResult
		if entry.HealthCheck == "" {
			// No healthcheck configured: synthesise a passing result so the
			// backend is assumed healthy without any network activity.
			result = health.ProbeResult{OK: true, Layer: health.LayerL7, Code: "L7OK"}
		} else {
			var probeSrc net.IP
			if entry.Address.To4() != nil {
				probeSrc = hc.ProbeIPv4Src
			} else {
				probeSrc = hc.ProbeIPv6Src
			}
			pcfg := prober.ProbeConfig{
				Target:           entry.Address,
				Port:             hc.Port,
				ProbeSrc:         probeSrc,
				HealthCheckNetns: netns,
				Timeout:          hc.Timeout,
				HTTP:             hc.HTTP,
				TCP:              hc.TCP,
			}
			probeCtx, cancel := context.WithTimeout(ctx, hc.Timeout)
			slog.Debug("probe-start", "backend", name, "type", hc.Type)
			start := time.Now()
			result = prober.ForType(hc.Type)(probeCtx, pcfg)
			elapsed := time.Since(start)
			cancel()
			slog.Debug("probe-done",
				"backend", name,
				"type", hc.Type,
				"ok", result.OK,
				"code", result.Code,
				"detail", result.Detail,
				"elapsed", elapsed.Round(time.Millisecond).String(),
			)
			res := "success"
			if !result.OK {
				res = "failure"
			}
			metrics.ProbeTotal.WithLabelValues(name, hc.Type, res, result.Code).Inc()
			metrics.ProbeDuration.WithLabelValues(name, hc.Type).Observe(elapsed.Seconds())
		}
		c.mu.Lock()
		w, exists := c.workers[name]
		if !exists {
			c.mu.Unlock()
			return
		}
		if w.backend.Record(result, maxHistory) {
			t := w.backend.Transitions[0]
			addr := w.backend.Address
			slog.Info("backend-transition",
				"backend", name,
				"from", t.From.String(),
				"to", t.To.String(),
				"code", result.Code,
				"detail", result.Detail,
			)
			metrics.TransitionTotal.WithLabelValues(name, t.From.String(), t.To.String()).Inc()
			c.emitForBackend(name, addr, t, c.cfg.Frontends)
		}
		c.mu.Unlock()
	}
}

// emitForBackend emits one Event per frontend that references backendName
// (in any pool), using the provided frontends map. Must be called with c.mu held.
func (c *Checker) emitForBackend(backendName string, addr net.IP, t health.Transition, frontends map[string]config.Frontend) {
	for feName, fe := range frontends {
		emitted := false
		for _, pool := range fe.Pools {
			if emitted {
				break
			}
			for name := range pool.Backends {
				if name == backendName {
					c.emit(Event{FrontendName: feName, BackendName: backendName, Backend: addr, Transition: t})
					emitted = true
					break
				}
			}
		}
	}
}

// emit sends an event to the internal fan-out channel (non-blocking).
// Must be called with c.mu held.
func (c *Checker) emit(e Event) {
	select {
	case c.eventCh <- e:
	default:
		slog.Warn("event-drop", "frontend", e.FrontendName, "backend", e.BackendName)
	}
}

// fanOut reads from eventCh and distributes to all subscribers.
func (c *Checker) fanOut(ctx context.Context) {
	for {
		select {
		case <-ctx.Done():
			return
		case e := <-c.eventCh:
			c.subsMu.Lock()
			for _, ch := range c.subs {
				select {
				case ch <- e:
				default:
					// Slow subscriber — drop rather than block.
				}
			}
			c.subsMu.Unlock()
		}
	}
}

// healthCheckEqual returns true if two HealthCheck configs are functionally identical.
func healthCheckEqual(a, b config.HealthCheck) bool {
	if a.Type != b.Type ||
		a.Interval != b.Interval ||
		a.FastInterval != b.FastInterval ||
		a.DownInterval != b.DownInterval ||
		a.Timeout != b.Timeout ||
		a.Rise != b.Rise ||
		a.Fall != b.Fall {
		return false
	}
	return httpParamsEqual(a.HTTP, b.HTTP) && tcpParamsEqual(a.TCP, b.TCP)
}

func httpParamsEqual(a, b *config.HTTPParams) bool {
	if a == nil && b == nil {
		return true
	}
	if a == nil || b == nil {
		return false
	}
	aRe, bRe := "", ""
	if a.ResponseRegexp != nil {
		aRe = a.ResponseRegexp.String()
	}
	if b.ResponseRegexp != nil {
		bRe = b.ResponseRegexp.String()
	}
	return a.Path == b.Path &&
		a.Host == b.Host &&
		a.ResponseCodeMin == b.ResponseCodeMin &&
		a.ResponseCodeMax == b.ResponseCodeMax &&
		aRe == bRe &&
		a.ServerName == b.ServerName &&
		a.InsecureSkipVerify == b.InsecureSkipVerify
}

func tcpParamsEqual(a, b *config.TCPParams) bool {
	if a == nil && b == nil {
		return true
	}
	if a == nil || b == nil {
		return false
	}
	return a.SSL == b.SSL &&
		a.ServerName == b.ServerName &&
		a.InsecureSkipVerify == b.InsecureSkipVerify
}

// activeBackendNames returns a sorted, deduplicated list of backend names that
// are referenced by at least one frontend pool and have Enabled: true.
func activeBackendNames(cfg *config.Config) []string {
	seen := map[string]struct{}{}
	for _, fe := range cfg.Frontends {
		for _, pool := range fe.Pools {
			for name := range pool.Backends {
				if b, ok := cfg.Backends[name]; ok && b.Enabled {
					seen[name] = struct{}{}
				}
			}
		}
	}
	names := make([]string, 0, len(seen))
	for name := range seen {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

// staggerDelay computes the initial probe delay for position pos out of total.
func staggerDelay(interval time.Duration, pos, total int) time.Duration {
	if total <= 1 {
		return 0
	}
	return time.Duration(int64(interval) * int64(pos) / int64(total))
}