VPP LB counters, src-ip-sticky, and frontend state aggregation

New feature: per-VIP / per-backend runtime counters
  * New GetVPPLBCounters RPC serving an in-process snapshot refreshed
    by a 5s scrape loop (internal/vpp/lbstats.go). Each cycle pulls
    the LB plugin's four SimpleCounters (next, first, untracked,
    no-server) plus the FIB /net/route/to CombinedCounter for every
    VIP and every backend host prefix via a single DumpStats call.
  * FIB stats-index discovery via ip_route_lookup (internal/vpp/
    fibstats.go); per-worker reduction happens in the collector.
  * Prometheus collector exports vip_packets_total (kind label),
    vip_route_{packets,bytes}_total, and backend_route_{packets,
    bytes}_total. Metrics source interface extended with VIPStats /
    BackendRouteStats; vpp.Client publishes snapshots via
    atomic.Pointer and clears them on disconnect.
  * New 'show vpp lb counters' CLI command. The 'show vpp lbstate'
    and 'sync vpp lbstate' commands are restructured under 'show
    vpp lb {state,counters}' / 'sync vpp lb state' to make room
    for the new verb.
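
The snapshot-publishing side of this can be sketched as follows. This is a minimal illustration of the atomic.Pointer pattern the bullet describes (publish on each scrape, clear on disconnect, lock-free reads); the Snapshot fields and method names here are illustrative stand-ins, not the real internal/vpp types.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Snapshot is a hypothetical stand-in for the per-VIP counter snapshot the
// 5s scrape loop produces; the real type carries more fields.
type Snapshot struct {
	VIPPackets map[string]uint64
}

// Client mirrors the publish-on-scrape / clear-on-disconnect pattern:
// readers never block the scrape loop, and a nil load means "no data".
type Client struct {
	stats atomic.Pointer[Snapshot]
}

func (c *Client) publish(s *Snapshot) { c.stats.Store(s) }
func (c *Client) clear()              { c.stats.Store(nil) }

// VIPStats returns the latest snapshot, or nil when VPP is disconnected
// or no scrape has completed yet.
func (c *Client) VIPStats() *Snapshot { return c.stats.Load() }

func main() {
	var c Client
	fmt.Println(c.VIPStats() == nil) // no scrape yet
	c.publish(&Snapshot{VIPPackets: map[string]uint64{"192.0.2.1/32": 42}})
	fmt.Println(c.VIPStats().VIPPackets["192.0.2.1/32"])
	c.clear()
	fmt.Println(c.VIPStats() == nil) // cleared on disconnect
}
```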

New feature: src-ip-sticky frontends
  * New frontend YAML key 'src-ip-sticky' (bool). Plumbed through
    config.Frontend, desiredVIP, and the lb_add_del_vip_v2 call.
  * Reflected in gRPC FrontendInfo.src_ip_sticky and VPPLBVIP.
    src_ip_sticky, and shown in 'show vpp lb state' output.
  * Scraped back from VPP by parsing 'show lb vips verbose' through
    cli_inband — lb_vip_details does not expose the flag. The same
    scrape also recovers the LB pool index for each VIP, which the
    stats-segment counters are keyed on. This is a documented
    temporary workaround until VPP ships an lb_vip_v2_dump.
  * src_ip_sticky cannot be mutated on a live VIP, so a flipped flag
    triggers a tear-down-and-recreate in reconcileVIP (ASes deleted
    with flush, VIP deleted, then re-added). Flip is logged.
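
A config fragment showing the new key might look like this. Only `src-ip-sticky` is introduced by this commit; the surrounding schema (the `frontends` map and the `address`/`protocol`/`port`/`pools`/`weight` keys) is inferred from the field names visible in the diff below and may not match the actual YAML layout exactly.

```yaml
frontends:
  www:
    address: 192.0.2.1
    protocol: tcp
    port: 443
    # src-IP-based hashing: all flows from one client IP land on the same
    # backend. Flipping this on a live VIP triggers delete-and-recreate.
    src-ip-sticky: true
    pools:
      - name: primary
        backends:
          web1: { weight: 100 }
          web2: { weight: 100 }
```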

New feature: frontend state aggregation and events
  * New health.FrontendState (unknown/up/down) and FrontendTransition
    types. A frontend is 'up' iff at least one backend has a nonzero
    effective weight, 'unknown' iff no backend has real state yet,
    and 'down' otherwise.
  * Checker tracks per-frontend aggregate state, recomputing after
    each backend transition and emitting a frontend-transition Event
    on change. Reload drops entries for removed frontends.
  * checker.Event gains an optional FrontendTransition pointer;
    backend- vs. frontend-transition events are demultiplexed on
    that field.
  * WatchEvents now sends an initial snapshot of frontend state on
    connect (mirroring the existing backend snapshot), subscribes
    once to the checker stream, and fans out to backend/frontend
    handlers based on the client's filter flags. The proto
    FrontendEvent message grows name + transition fields.
  * New Checker.FrontendState accessor.
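
The demultiplexing rule above can be sketched as a consumer-side dispatch. The stand-in types here are trimmed to the fields the rule depends on (the real checker.Event and health.FrontendTransition carry more); the dispatch test itself — non-nil FrontendTransition means frontend event — is exactly what the diff below implements.

```go
package main

import "fmt"

// Minimal stand-ins for health.FrontendTransition and checker.Event.
type FrontendTransition struct{ From, To string }

type Event struct {
	FrontendName       string
	BackendName        string
	FrontendTransition *FrontendTransition
}

// dispatch demultiplexes on FrontendTransition != nil: set means a
// frontend aggregate-state change, nil means a backend transition.
func dispatch(e Event) string {
	if e.FrontendTransition != nil {
		return fmt.Sprintf("frontend %s: %s -> %s",
			e.FrontendName, e.FrontendTransition.From, e.FrontendTransition.To)
	}
	return fmt.Sprintf("backend %s on frontend %s", e.BackendName, e.FrontendName)
}

func main() {
	fmt.Println(dispatch(Event{FrontendName: "www", BackendName: "web1"}))
	fmt.Println(dispatch(Event{
		FrontendName:       "www",
		FrontendTransition: &FrontendTransition{From: "unknown", To: "up"},
	}))
}
```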

Refactor: pure health helpers
  * Moved the priority-failover selector and the (pool idx, active
    pool, state, cfg weight) → (vpp weight, flush) mapping out of
    internal/vpp/lbsync.go into a new internal/health/weights.go so
    the checker can reuse them for frontend-state computation
    without importing internal/vpp.
  * New functions: health.ActivePoolIndex, BackendEffectiveWeight,
    EffectiveWeights, ComputeFrontendState. lbsync.go now calls
    these directly; vpp.EffectiveWeights is a thin wrapper over
    health.EffectiveWeights retained for the gRPC observability
    path. Fully unit-tested in internal/health/weights_test.go.

maglevc polish
  * --color default is now mode-aware: on in the interactive shell,
    off in one-shot mode so piped output is script-safe. Explicit
    --color=true/false still overrides.
  * New stripHostMask helper drops /32 and /128 from VIP display;
    non-host prefixes pass through unchanged.
  * Counter table column order fixed (first before next) and
    packets/bytes columns renamed to fib-packets/fib-bytes to
    clarify they come from the FIB, not the LB plugin.
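
A plausible shape for the stripHostMask helper, assuming the behavior stated above (drop /32 for IPv4 and /128 for IPv6, pass everything else through); the actual implementation may differ, e.g. by parsing with net/netip instead of string matching.

```go
package main

import (
	"fmt"
	"strings"
)

// stripHostMask drops a /32 (IPv4) or /128 (IPv6) host-mask suffix from a
// prefix string for display; non-host prefixes pass through unchanged.
// The ":" check distinguishes IPv6 so e.g. "2001:db8::/32" is untouched.
func stripHostMask(prefix string) string {
	if s, ok := strings.CutSuffix(prefix, "/32"); ok && !strings.Contains(s, ":") {
		return s
	}
	if s, ok := strings.CutSuffix(prefix, "/128"); ok && strings.Contains(s, ":") {
		return s
	}
	return prefix
}

func main() {
	fmt.Println(stripHostMask("192.0.2.1/32"))    // host mask dropped
	fmt.Println(stripHostMask("2001:db8::1/128")) // host mask dropped
	fmt.Println(stripHostMask("192.0.2.0/24"))    // unchanged
}
```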

Docs
  * config-guide: document src-ip-sticky, including the VIP
    recreate-on-change caveat.
  * user-guide, maglevc.1, maglevd.8: updated command tree, new
    counters command, color defaults, and the src-ip-sticky field.
2026-04-12 15:59:02 +02:00
parent d5fbf5c640
commit fb62532fd5
25 changed files with 2163 additions and 549 deletions


@@ -23,13 +23,26 @@ type BackendSnapshot struct {
Config config.Backend
}
// Event is emitted on every backend state transition, once per frontend that
// references the backend.
// Event is emitted on every state transition the checker observes. There are
// two kinds, distinguished by which of BackendName or FrontendTransition is
// populated:
//
// - Backend transition: FrontendName is the frontend that references the
// backend (one event per frontend per backend transition), BackendName
// and Backend are set, and Transition carries the health.Transition.
// FrontendTransition is nil.
// - Frontend transition: FrontendName is the frontend whose aggregate state
// changed, FrontendTransition is non-nil. BackendName and Backend are
// empty, Transition is the zero value.
//
// Consumers dispatch on FrontendTransition != nil.
type Event struct {
FrontendName string
BackendName string
Backend net.IP
Transition health.Transition
FrontendTransition *health.FrontendTransition
}
type worker struct {
@@ -49,6 +62,13 @@ type Checker struct {
mu sync.RWMutex
workers map[string]*worker // keyed by backend name
// frontendStates tracks the aggregated state of every configured frontend
// (unknown/up/down). Updated whenever a backend transition happens; a
// change emits a frontend-transition Event. The zero value for a missing
// key is FrontendStateUnknown, so initial-reference accesses behave
// correctly even without explicit seeding.
frontendStates map[string]health.FrontendState
subsMu sync.Mutex
nextID int
subs map[int]chan Event
@@ -58,10 +78,11 @@ type Checker struct {
// New creates a Checker. Call Run to start probing.
func New(cfg *config.Config) *Checker {
return &Checker{
cfg: cfg,
workers: make(map[string]*worker),
subs: make(map[int]chan Event),
eventCh: make(chan Event, 256),
cfg: cfg,
workers: make(map[string]*worker),
frontendStates: make(map[string]health.FrontendState),
subs: make(map[int]chan Event),
eventCh: make(chan Event, 256),
}
}
@@ -131,6 +152,13 @@ func (c *Checker) Reload(ctx context.Context, cfg *config.Config) error {
c.emitForBackend(name, c.workers[name].backend.Address, c.workers[name].backend.Transitions[0], cfg.Frontends)
}
// Drop frontendStates entries for frontends no longer in config.
for feName := range c.frontendStates {
if _, ok := cfg.Frontends[feName]; !ok {
delete(c.frontendStates, feName)
}
}
c.cfg = cfg
return nil
}
@@ -174,6 +202,18 @@ func (c *Checker) BackendState(name string) (health.State, bool) {
return w.backend.State, true
}
// FrontendState returns the current aggregate state of a frontend (unknown,
// up, or down). Returns (FrontendStateUnknown, false) when the frontend is
// not known to the checker.
func (c *Checker) FrontendState(name string) (health.FrontendState, bool) {
c.mu.RLock()
defer c.mu.RUnlock()
if _, ok := c.cfg.Frontends[name]; !ok {
return health.FrontendStateUnknown, false
}
return c.frontendStates[name], true
}
// ListFrontends returns the names of all configured frontends.
func (c *Checker) ListFrontends() []string {
c.mu.RLock()
@@ -575,24 +615,60 @@ func (c *Checker) runProbe(ctx context.Context, name string, pos, total int) {
}
}
// emitForBackend emits one Event per frontend that references backendName
// (in any pool), using the provided frontends map. Must be called with c.mu held.
// emitForBackend emits one backend-transition Event per frontend that
// references backendName (in any pool), using the provided frontends map.
// After emitting the backend event for a frontend, it also re-computes that
// frontend's aggregate state and emits a frontend-transition Event if the
// state has changed. Must be called with c.mu held.
func (c *Checker) emitForBackend(backendName string, addr net.IP, t health.Transition, frontends map[string]config.Frontend) {
for feName, fe := range frontends {
emitted := false
for _, pool := range fe.Pools {
if emitted {
break
}
for name := range pool.Backends {
if name == backendName {
c.emit(Event{FrontendName: feName, BackendName: backendName, Backend: addr, Transition: t})
emitted = true
break
}
if !frontendReferencesBackend(fe, backendName) {
continue
}
c.emit(Event{FrontendName: feName, BackendName: backendName, Backend: addr, Transition: t})
c.updateFrontendState(feName, fe)
}
}
// frontendReferencesBackend reports whether fe has the named backend in any
// of its pools.
func frontendReferencesBackend(fe config.Frontend, backendName string) bool {
for _, pool := range fe.Pools {
if _, ok := pool.Backends[backendName]; ok {
return true
}
}
return false
}
// updateFrontendState recomputes the aggregate state of fe, compares against
// the last known state, and emits a frontend-transition Event on change.
// Must be called with c.mu held. The current state is read from the worker
// map — so the caller (who already holds c.mu) sees a consistent view.
func (c *Checker) updateFrontendState(feName string, fe config.Frontend) {
states := make(map[string]health.State)
for _, pool := range fe.Pools {
for bName := range pool.Backends {
if w, ok := c.workers[bName]; ok {
states[bName] = w.backend.State
} else {
states[bName] = health.StateUnknown
}
}
}
newState := health.ComputeFrontendState(fe, states)
old := c.frontendStates[feName] // zero value (Unknown) on first access
if old == newState {
return
}
c.frontendStates[feName] = newState
ft := health.FrontendTransition{From: old, To: newState, At: time.Now()}
slog.Info("frontend-transition",
"frontend", feName,
"from", old.String(),
"to", newState.String(),
)
c.emit(Event{FrontendName: feName, FrontendTransition: &ft})
}
// emit sends an event to the internal fan-out channel (non-blocking).


@@ -119,6 +119,7 @@ type Frontend struct {
Protocol string // "tcp", "udp", or "" (all traffic)
Port uint16 // 0 means omitted (all ports)
Pools []Pool // ordered tiers; first pool with any up backend is active
SrcIPSticky bool // when true, VPP LB uses src-IP-based hashing for this VIP
}
// ---- raw YAML types --------------------------------------------------------
@@ -199,6 +200,7 @@ type rawFrontend struct {
Protocol string `yaml:"protocol"`
Port uint16 `yaml:"port"`
Pools []rawPool `yaml:"pools"`
SrcIPSticky bool `yaml:"src-ip-sticky"`
}
// ---- Check / Load ----------------------------------------------------------
@@ -519,6 +521,7 @@ func convertFrontend(name string, r *rawFrontend, backends map[string]Backend) (
Description: r.Description,
Protocol: r.Protocol,
Port: r.Port,
SrcIPSticky: r.SrcIPSticky,
}
ip := net.ParseIP(r.Address)

File diff suppressed because it is too large.


@@ -36,6 +36,7 @@ const (
Maglev_GetVPPInfo_FullMethodName = "/maglev.Maglev/GetVPPInfo"
Maglev_GetVPPLBState_FullMethodName = "/maglev.Maglev/GetVPPLBState"
Maglev_SyncVPPLBState_FullMethodName = "/maglev.Maglev/SyncVPPLBState"
Maglev_GetVPPLBCounters_FullMethodName = "/maglev.Maglev/GetVPPLBCounters"
)
// MaglevClient is the client API for Maglev service.
@@ -61,6 +62,7 @@ type MaglevClient interface {
GetVPPInfo(ctx context.Context, in *GetVPPInfoRequest, opts ...grpc.CallOption) (*VPPInfo, error)
GetVPPLBState(ctx context.Context, in *GetVPPLBStateRequest, opts ...grpc.CallOption) (*VPPLBState, error)
SyncVPPLBState(ctx context.Context, in *SyncVPPLBStateRequest, opts ...grpc.CallOption) (*SyncVPPLBStateResponse, error)
GetVPPLBCounters(ctx context.Context, in *GetVPPLBCountersRequest, opts ...grpc.CallOption) (*VPPLBCounters, error)
}
type maglevClient struct {
@@ -250,6 +252,16 @@ func (c *maglevClient) SyncVPPLBState(ctx context.Context, in *SyncVPPLBStateReq
return out, nil
}
func (c *maglevClient) GetVPPLBCounters(ctx context.Context, in *GetVPPLBCountersRequest, opts ...grpc.CallOption) (*VPPLBCounters, error) {
cOpts := append([]grpc.CallOption{grpc.StaticMethod()}, opts...)
out := new(VPPLBCounters)
err := c.cc.Invoke(ctx, Maglev_GetVPPLBCounters_FullMethodName, in, out, cOpts...)
if err != nil {
return nil, err
}
return out, nil
}
// MaglevServer is the server API for Maglev service.
// All implementations must embed UnimplementedMaglevServer
// for forward compatibility.
@@ -273,6 +285,7 @@ type MaglevServer interface {
GetVPPInfo(context.Context, *GetVPPInfoRequest) (*VPPInfo, error)
GetVPPLBState(context.Context, *GetVPPLBStateRequest) (*VPPLBState, error)
SyncVPPLBState(context.Context, *SyncVPPLBStateRequest) (*SyncVPPLBStateResponse, error)
GetVPPLBCounters(context.Context, *GetVPPLBCountersRequest) (*VPPLBCounters, error)
mustEmbedUnimplementedMaglevServer()
}
@@ -334,6 +347,9 @@ func (UnimplementedMaglevServer) GetVPPLBState(context.Context, *GetVPPLBStateRe
func (UnimplementedMaglevServer) SyncVPPLBState(context.Context, *SyncVPPLBStateRequest) (*SyncVPPLBStateResponse, error) {
return nil, status.Error(codes.Unimplemented, "method SyncVPPLBState not implemented")
}
func (UnimplementedMaglevServer) GetVPPLBCounters(context.Context, *GetVPPLBCountersRequest) (*VPPLBCounters, error) {
return nil, status.Error(codes.Unimplemented, "method GetVPPLBCounters not implemented")
}
func (UnimplementedMaglevServer) mustEmbedUnimplementedMaglevServer() {}
func (UnimplementedMaglevServer) testEmbeddedByValue() {}
@@ -654,6 +670,24 @@ func _Maglev_SyncVPPLBState_Handler(srv interface{}, ctx context.Context, dec fu
return interceptor(ctx, in, info, handler)
}
func _Maglev_GetVPPLBCounters_Handler(srv interface{}, ctx context.Context, dec func(interface{}) error, interceptor grpc.UnaryServerInterceptor) (interface{}, error) {
in := new(GetVPPLBCountersRequest)
if err := dec(in); err != nil {
return nil, err
}
if interceptor == nil {
return srv.(MaglevServer).GetVPPLBCounters(ctx, in)
}
info := &grpc.UnaryServerInfo{
Server: srv,
FullMethod: Maglev_GetVPPLBCounters_FullMethodName,
}
handler := func(ctx context.Context, req interface{}) (interface{}, error) {
return srv.(MaglevServer).GetVPPLBCounters(ctx, req.(*GetVPPLBCountersRequest))
}
return interceptor(ctx, in, info, handler)
}
// Maglev_ServiceDesc is the grpc.ServiceDesc for Maglev service.
// It's only intended for direct use with grpc.RegisterService,
// and not to be introspected or modified (even as a copy)
@@ -725,6 +759,10 @@ var Maglev_ServiceDesc = grpc.ServiceDesc{
MethodName: "SyncVPPLBState",
Handler: _Maglev_SyncVPPLBState_Handler,
},
{
MethodName: "GetVPPLBCounters",
Handler: _Maglev_GetVPPLBCounters_Handler,
},
},
Streams: []grpc.StreamDesc{
{


@@ -7,6 +7,7 @@ import (
"errors"
"log/slog"
"net"
"sort"
"google.golang.org/grpc/codes"
"google.golang.org/grpc/status"
@@ -128,14 +129,13 @@ func (s *Server) GetHealthCheck(_ context.Context, req *GetHealthCheckRequest) (
}
// WatchEvents streams events to the client. On connect, the current state of
// all backends is sent as synthetic BackendEvents. Afterwards, live events are
// forwarded based on the filter flags in req. An unset (nil) flag defaults to
// true (subscribe). An empty log_level defaults to "info".
// every backend and/or frontend is sent as a synthetic event. Afterwards,
// live events are forwarded based on the filter flags in req. An unset (nil)
// flag defaults to true (subscribe). An empty log_level defaults to "info".
func (s *Server) WatchEvents(req *WatchRequest, stream Maglev_WatchEventsServer) error {
wantLog := req.Log == nil || *req.Log
wantBackend := req.Backend == nil || *req.Backend
wantFrontend := req.Frontend == nil || *req.Frontend
_ = wantFrontend // no frontend events emitted yet
logLevel := slog.LevelInfo
if req.LogLevel != "" {
@@ -152,8 +152,20 @@ func (s *Server) WatchEvents(req *WatchRequest, stream Maglev_WatchEventsServer)
defer unsub()
}
// Subscribe to backend events; send initial state snapshot first.
var backendCh <-chan checker.Event
// Subscribe to the checker event stream once; we demultiplex backend
// and frontend events in the select below. Skip the subscription if
// neither kind is wanted.
var eventCh <-chan checker.Event
if wantBackend || wantFrontend {
var unsub func()
eventCh, unsub = s.checker.Subscribe()
defer unsub()
}
// Send initial state snapshot: one synthetic event per existing backend
// (if wanted), and one per existing frontend (if wanted). Clients that
// connect mid-flight see the current state immediately instead of
// waiting for the next transition.
if wantBackend {
for _, name := range s.checker.ListBackends() {
snap, ok := s.checker.GetBackend(name)
@@ -172,9 +184,25 @@ func (s *Server) WatchEvents(req *WatchRequest, stream Maglev_WatchEventsServer)
return err
}
}
var unsub func()
backendCh, unsub = s.checker.Subscribe()
defer unsub()
}
if wantFrontend {
for _, name := range s.checker.ListFrontends() {
fs, ok := s.checker.FrontendState(name)
if !ok {
continue
}
ev := &Event{Event: &Event_Frontend{Frontend: &FrontendEvent{
FrontendName: name,
Transition: &TransitionRecord{
From: fs.String(),
To: fs.String(),
AtUnixNs: 0,
},
}}}
if err := stream.Send(ev); err != nil {
return err
}
}
}
for {
@@ -190,10 +218,29 @@ func (s *Server) WatchEvents(req *WatchRequest, stream Maglev_WatchEventsServer)
if err := stream.Send(&Event{Event: &Event_Log{Log: le}}); err != nil {
return err
}
case e, ok := <-backendCh:
case e, ok := <-eventCh:
if !ok {
return nil
}
if e.FrontendTransition != nil {
if !wantFrontend {
continue
}
if err := stream.Send(&Event{Event: &Event_Frontend{Frontend: &FrontendEvent{
FrontendName: e.FrontendName,
Transition: &TransitionRecord{
From: e.FrontendTransition.From.String(),
To: e.FrontendTransition.To.String(),
AtUnixNs: e.FrontendTransition.At.UnixNano(),
},
}}}); err != nil {
return err
}
continue
}
if !wantBackend {
continue
}
if err := stream.Send(&Event{Event: &Event_Backend{Backend: &BackendEvent{
BackendName: e.BackendName,
Transition: transitionToProto(e.Transition),
@@ -302,6 +349,51 @@ func (s *Server) GetVPPLBState(_ context.Context, _ *GetVPPLBStateRequest) (*VPP
return lbStateToProto(state), nil
}
// GetVPPLBCounters returns the most recent per-VIP and per-backend counter
// snapshot captured by the client's 5s scrape loop. The call is served
// from an in-process cache and does not hit VPP. An empty response is
// returned when VPP is disconnected or no scrape has completed yet.
func (s *Server) GetVPPLBCounters(_ context.Context, _ *GetVPPLBCountersRequest) (*VPPLBCounters, error) {
if s.vppClient == nil {
return nil, status.Error(codes.Unavailable, "VPP integration is disabled")
}
out := &VPPLBCounters{}
for _, v := range s.vppClient.VIPStats() {
out.Vips = append(out.Vips, &VPPLBVIPCounters{
Prefix: v.Prefix,
Protocol: v.Protocol,
Port: uint32(v.Port),
NextPacket: v.NextPkt,
FirstPacket: v.FirstPkt,
UntrackedPacket: v.Untracked,
NoServer: v.NoServer,
Packets: v.Packets,
Bytes: v.Bytes,
})
}
sort.Slice(out.Vips, func(i, j int) bool {
if out.Vips[i].Prefix != out.Vips[j].Prefix {
return out.Vips[i].Prefix < out.Vips[j].Prefix
}
if out.Vips[i].Protocol != out.Vips[j].Protocol {
return out.Vips[i].Protocol < out.Vips[j].Protocol
}
return out.Vips[i].Port < out.Vips[j].Port
})
for _, b := range s.vppClient.BackendRouteStats() {
out.Backends = append(out.Backends, &VPPLBBackendCounters{
Backend: b.Backend,
Address: b.Address,
Packets: b.Packets,
Bytes: b.Bytes,
})
}
sort.Slice(out.Backends, func(i, j int) bool {
return out.Backends[i].Backend < out.Backends[j].Backend
})
return out, nil
}
// SyncVPPLBState runs the LB reconciler. With frontend_name unset it does a
// full sync (SyncLBStateAll), which may remove stale VIPs. With frontend_name
// set it does a single-VIP sync (SyncLBStateVIP) that only adds/updates.
@@ -342,6 +434,7 @@ func lbStateToProto(s *vpp.LBState) *VPPLBState {
Port: uint32(v.Port),
Encap: v.Encap,
FlowTableLength: uint32(v.FlowTableLength),
SrcIpSticky: v.SrcIPSticky,
}
for _, a := range v.ASes {
var ts int64
@@ -393,6 +486,7 @@ func frontendToProto(name string, fe config.Frontend, src vpp.StateSource) *Fron
Port: uint32(fe.Port),
Description: fe.Description,
Pools: pools,
SrcIpSticky: fe.SrcIPSticky,
}
}


@@ -353,9 +353,11 @@ func TestWatchEventsServerShutdown(t *testing.T) {
if err != nil {
t.Fatalf("WatchEvents: %v", err)
}
// Drain the initial synthetic backend event.
if _, err := stream.Recv(); err != nil {
t.Fatalf("initial Recv: %v", err)
// Drain the initial synthetic snapshots (one per backend, one per frontend).
for i := 0; i < 2; i++ {
if _, err := stream.Recv(); err != nil {
t.Fatalf("initial Recv %d: %v", i, err)
}
}
// Cancel the server context; the stream must terminate.


@@ -64,6 +64,43 @@ type Transition struct {
Result ProbeResult
}
// FrontendState is the aggregated state of a frontend derived from the
// effective weights of its member backends. Frontends do not have their
// own rise/fall counters: they're purely a reduction over backend state.
//
// - unknown: no backends, or every referenced backend is in StateUnknown
// (the checker has no probe data yet).
// - up: at least one backend has effective weight > 0 — the VIP has
// something to serve.
// - down: backends exist with real state, but none have effective
// weight > 0 — the VIP has nothing to serve.
type FrontendState int
const (
FrontendStateUnknown FrontendState = iota
FrontendStateUp
FrontendStateDown
)
func (s FrontendState) String() string {
switch s {
case FrontendStateUnknown:
return "unknown"
case FrontendStateUp:
return "up"
case FrontendStateDown:
return "down"
}
return "unknown"
}
// FrontendTransition records a frontend state change event.
type FrontendTransition struct {
From FrontendState
To FrontendState
At time.Time
}
// HealthCounter is HAProxy's single-integer rise/fall model.
//
// Health ∈ [0, Rise+Fall-1]. Server is UP when Health >= Rise, DOWN when

internal/health/weights.go (new file, 118 lines)

@@ -0,0 +1,118 @@
// Copyright (c) 2026, Pim van Pelt <pim@ipng.ch>
package health
import (
"git.ipng.ch/ipng/vpp-maglev/internal/config"
)
// ActivePoolIndex returns the priority-failover pool index for fe given
// the current backend states. The active pool is the first pool that
// contains at least one backend in StateUp — pool[0] is the primary,
// pool[1] the first fallback, and so on. Returns 0 when no pool has
// any up backend, in which case every backend maps to weight 0 and the
// return value is unobservable.
func ActivePoolIndex(fe config.Frontend, states map[string]State) int {
for i, pool := range fe.Pools {
for bName := range pool.Backends {
if states[bName] == StateUp {
return i
}
}
}
return 0
}
// BackendEffectiveWeight is the pure mapping from (pool index, active pool,
// backend state, config weight) to the desired VPP AS weight and flush hint.
// This is the single source of truth for the state → dataplane rule.
//
// A backend gets its configured weight iff it is up AND belongs to the
// currently-active pool. Every other case yields weight 0. Only StateDisabled
// produces flush=true (immediate session teardown).
//
// state in active pool not in active pool flush
// -------- -------------- ------------------- -----
// unknown 0 0 no
// up configured 0 (standby) no
// down 0 0 no
// paused 0 0 no
// disabled 0 0 yes
func BackendEffectiveWeight(poolIdx, activePool int, state State, cfgWeight int) (weight uint8, flush bool) {
switch state {
case StateUp:
if poolIdx == activePool {
return clampWeight(cfgWeight), false
}
return 0, false
case StateDisabled:
return 0, true
default:
return 0, false
}
}
// EffectiveWeights computes per-pool per-backend effective weights for fe,
// given a snapshot of backend states. Result layout: weights[poolIdx][backendName].
func EffectiveWeights(fe config.Frontend, states map[string]State) map[int]map[string]uint8 {
activePool := ActivePoolIndex(fe, states)
out := make(map[int]map[string]uint8, len(fe.Pools))
for poolIdx, pool := range fe.Pools {
out[poolIdx] = make(map[string]uint8, len(pool.Backends))
for bName, pb := range pool.Backends {
w, _ := BackendEffectiveWeight(poolIdx, activePool, states[bName], pb.Weight)
out[poolIdx][bName] = w
}
}
return out
}
// ComputeFrontendState derives the FrontendState for fe from a snapshot of
// backend states. Rules:
//
// - no backends → unknown
// - every referenced backend is in StateUnknown → unknown
// - any backend has effective weight > 0 → up
// - otherwise → down
func ComputeFrontendState(fe config.Frontend, states map[string]State) FrontendState {
// Unique set of backends referenced by this frontend (a single backend
// may appear in multiple pools; we count it once).
seen := make(map[string]struct{})
for _, pool := range fe.Pools {
for bName := range pool.Backends {
seen[bName] = struct{}{}
}
}
if len(seen) == 0 {
return FrontendStateUnknown
}
allUnknown := true
for bName := range seen {
if states[bName] != StateUnknown {
allUnknown = false
break
}
}
if allUnknown {
return FrontendStateUnknown
}
ew := EffectiveWeights(fe, states)
for _, poolMap := range ew {
for _, w := range poolMap {
if w > 0 {
return FrontendStateUp
}
}
}
return FrontendStateDown
}
func clampWeight(w int) uint8 {
if w < 0 {
return 0
}
if w > 100 {
return 100
}
return uint8(w)
}


@@ -0,0 +1,232 @@
// Copyright (c) 2026, Pim van Pelt <pim@ipng.ch>
package health
import (
"testing"
"git.ipng.ch/ipng/vpp-maglev/internal/config"
)
// TestBackendEffectiveWeight locks down the state → (weight, flush) truth
// table. This is the single source of truth for how maglevd decides what
// to program into VPP for each backend state. If this test needs updating
// the behavior has deliberately changed.
func TestBackendEffectiveWeight(t *testing.T) {
cases := []struct {
name string
poolIdx int
activePool int
state State
cfgWeight int
wantWeight uint8
wantFlush bool
}{
{"up active w100", 0, 0, StateUp, 100, 100, false},
{"up active w50", 0, 0, StateUp, 50, 50, false},
{"up active w0", 0, 0, StateUp, 0, 0, false},
{"up active clamp-high", 0, 0, StateUp, 150, 100, false},
{"up active clamp-low", 0, 0, StateUp, -5, 0, false},
{"up standby pool0 active=1", 0, 1, StateUp, 100, 0, false},
{"up standby pool1 active=0", 1, 0, StateUp, 100, 0, false},
{"up standby pool2 active=0", 2, 0, StateUp, 100, 0, false},
{"up failover pool1 active=1", 1, 1, StateUp, 100, 100, false},
{"unknown pool0 active=0", 0, 0, StateUnknown, 100, 0, false},
{"unknown pool1 active=0", 1, 0, StateUnknown, 100, 0, false},
{"down pool0 active=0", 0, 0, StateDown, 100, 0, false},
{"down pool1 active=1", 1, 1, StateDown, 100, 0, false},
{"paused pool0 active=0", 0, 0, StatePaused, 100, 0, false},
{"disabled pool0 active=0", 0, 0, StateDisabled, 100, 0, true},
{"disabled pool1 active=1", 1, 1, StateDisabled, 100, 0, true},
}
for _, tc := range cases {
t.Run(tc.name, func(t *testing.T) {
w, f := BackendEffectiveWeight(tc.poolIdx, tc.activePool, tc.state, tc.cfgWeight)
if w != tc.wantWeight {
t.Errorf("weight: got %d, want %d", w, tc.wantWeight)
}
if f != tc.wantFlush {
t.Errorf("flush: got %v, want %v", f, tc.wantFlush)
}
})
}
}
// TestActivePoolIndex locks down the priority-failover selector: the first
// pool containing at least one up backend is the active pool. Default 0.
func TestActivePoolIndex(t *testing.T) {
mkFE := func(pools ...[]string) config.Frontend {
out := make([]config.Pool, len(pools))
for i, p := range pools {
out[i] = config.Pool{Name: "p", Backends: map[string]config.PoolBackend{}}
for _, name := range p {
out[i].Backends[name] = config.PoolBackend{Weight: 100}
}
}
return config.Frontend{Pools: out}
}
cases := []struct {
name string
fe config.Frontend
states map[string]State
want int
}{
{
name: "pool0 has up, pool1 standby",
fe: mkFE([]string{"a", "b"}, []string{"c", "d"}),
states: map[string]State{"a": StateUp, "b": StateDown, "c": StateUp, "d": StateUp},
want: 0,
},
{
name: "pool0 all down, pool1 has up → failover",
fe: mkFE([]string{"a", "b"}, []string{"c", "d"}),
states: map[string]State{"a": StateDown, "b": StateDown, "c": StateUp, "d": StateUp},
want: 1,
},
{
name: "pool0 all disabled, pool1 has up → failover",
fe: mkFE([]string{"a", "b"}, []string{"c"}),
states: map[string]State{"a": StateDisabled, "b": StateDisabled, "c": StateUp},
want: 1,
},
{
name: "pool0 all paused, pool1 has up → failover",
fe: mkFE([]string{"a"}, []string{"c"}),
states: map[string]State{"a": StatePaused, "c": StateUp},
want: 1,
},
{
name: "pool0 all unknown (startup), pool1 up → pool1",
fe: mkFE([]string{"a"}, []string{"c"}),
states: map[string]State{"a": StateUnknown, "c": StateUp},
want: 1,
},
{
name: "nothing up anywhere → default 0",
fe: mkFE([]string{"a"}, []string{"c"}),
states: map[string]State{"a": StateDown, "c": StateDown},
want: 0,
},
{
name: "1 up in pool0 is enough",
fe: mkFE([]string{"a", "b", "c"}, []string{"d"}),
states: map[string]State{"a": StateDown, "b": StateDown, "c": StateUp, "d": StateUp},
want: 0,
},
{
name: "three tiers, pool0 and pool1 both empty → pool2",
fe: mkFE([]string{"a"}, []string{"b"}, []string{"c"}),
states: map[string]State{"a": StateDown, "b": StateDown, "c": StateUp},
want: 2,
},
}
for _, tc := range cases {
t.Run(tc.name, func(t *testing.T) {
got := ActivePoolIndex(tc.fe, tc.states)
if got != tc.want {
t.Errorf("got pool %d, want pool %d", got, tc.want)
}
})
}
}
// TestComputeFrontendState locks down the reduction rule: frontends are
// up iff any backend has effective weight > 0, unknown iff all backends
// are still in StateUnknown (or there are no backends), and down otherwise.
func TestComputeFrontendState(t *testing.T) {
mkFE := func(pools ...[]string) config.Frontend {
out := make([]config.Pool, len(pools))
for i, p := range pools {
out[i] = config.Pool{Name: "p", Backends: map[string]config.PoolBackend{}}
for _, name := range p {
out[i].Backends[name] = config.PoolBackend{Weight: 100}
}
}
return config.Frontend{Pools: out}
}
cases := []struct {
name string
fe config.Frontend
states map[string]State
want FrontendState
}{
{
name: "no backends → unknown",
fe: config.Frontend{Pools: []config.Pool{{Name: "primary", Backends: map[string]config.PoolBackend{}}}},
want: FrontendStateUnknown,
},
{
name: "all unknown (startup) → unknown",
fe: mkFE([]string{"a", "b"}),
states: map[string]State{"a": StateUnknown, "b": StateUnknown},
want: FrontendStateUnknown,
},
{
name: "one up in primary → up",
fe: mkFE([]string{"a", "b"}),
states: map[string]State{"a": StateUp, "b": StateDown},
want: FrontendStateUp,
},
{
name: "all down → down",
fe: mkFE([]string{"a", "b"}),
states: map[string]State{"a": StateDown, "b": StateDown},
want: FrontendStateDown,
},
{
name: "all disabled → down",
fe: mkFE([]string{"a", "b"}),
states: map[string]State{"a": StateDisabled, "b": StateDisabled},
want: FrontendStateDown,
},
{
name: "all paused → down",
fe: mkFE([]string{"a"}),
states: map[string]State{"a": StatePaused},
want: FrontendStateDown,
},
{
name: "primary down, secondary up → up (failover)",
fe: mkFE([]string{"a"}, []string{"b"}),
states: map[string]State{"a": StateDown, "b": StateUp},
want: FrontendStateUp,
},
{
name: "primary up, secondary down → up (secondary standby ignored)",
fe: mkFE([]string{"a"}, []string{"b"}),
states: map[string]State{"a": StateUp, "b": StateDown},
want: FrontendStateUp,
},
{
name: "primary unknown, secondary unknown → unknown",
fe: mkFE([]string{"a"}, []string{"b"}),
states: map[string]State{"a": StateUnknown, "b": StateUnknown},
want: FrontendStateUnknown,
},
{
name: "primary down, secondary unknown → down",
fe: mkFE([]string{"a"}, []string{"b"}),
states: map[string]State{"a": StateDown, "b": StateUnknown},
want: FrontendStateDown,
},
}
for _, tc := range cases {
t.Run(tc.name, func(t *testing.T) {
got := ComputeFrontendState(tc.fe, tc.states)
if got != tc.want {
t.Errorf("got %s, want %s", got, tc.want)
}
})
}
}


@@ -45,12 +45,53 @@ type VPPInfo struct {
ConnectedSince time.Time
}
// VIPStatEntry is a point-in-time snapshot of the per-VIP counters that
// VPP exposes via the stats segment: four SimpleCounters from the LB
// plugin (packets only) plus the FIB CombinedCounter at /net/route/to
// for the VIP's own host prefix (packets + bytes). Values are summed
// across worker threads. The labelling (prefix/protocol/port) matches
// the gRPC VPPLBVIP representation so a Prometheus time series
// corresponds 1:1 to a maglev frontend VIP.
type VIPStatEntry struct {
Prefix string // CIDR string, e.g. "192.0.2.1/32"
Protocol string // "tcp", "udp", "any"
Port uint16
// LB plugin SimpleCounters (packets only)
NextPkt uint64 // /packet from existing sessions
FirstPkt uint64 // /first session packet
Untracked uint64 // /untracked packet
NoServer uint64 // /no server configured
// FIB CombinedCounter from /net/route/to at the VIP prefix
Packets uint64
Bytes uint64
}
// BackendRouteStat is a point-in-time snapshot of the FIB combined counter
// (/net/route/to) for a single backend's host prefix. Values are summed
// across worker threads. Labels match the backend's identity so a time
// series corresponds 1:1 to a maglev backend entry.
type BackendRouteStat struct {
Backend string // backend name from the config
Address string // backend IP address as a string (e.g. "192.0.2.10")
Packets uint64
Bytes uint64
}
// VPPSource provides read-only access to the VPP client's state. vpp.Client
// is adapted to this interface via a small shim in the collector so the
// metrics package stays decoupled from the vpp package's concrete types.
type VPPSource interface {
IsConnected() bool
VPPInfo() (VPPInfo, bool)
// VIPStats returns the most recent snapshot of per-VIP stats-segment
// counters, as captured by the LB stats loop. Returns nil when VPP is
// disconnected or no scrape has happened yet.
VIPStats() []VIPStatEntry
// BackendRouteStats returns the most recent snapshot of per-backend
// FIB combined counters (/net/route/to), as captured by the LB stats
// loop. Returns nil when VPP is disconnected, no scrape has happened
// yet, or the route lookup for every backend failed.
BackendRouteStats() []BackendRouteStat
}
// ---- inline metrics (updated per probe) ------------------------------------
@@ -118,6 +159,12 @@ type Collector struct {
vppUptimeSeconds *prometheus.Desc
vppConnectedFor *prometheus.Desc
vppInfo *prometheus.Desc
vipPackets *prometheus.Desc // per-VIP LB counters from stats segment
vipRoutePkts *prometheus.Desc // per-VIP FIB combined counter: packets
vipRouteByts *prometheus.Desc // per-VIP FIB combined counter: bytes
backendRoutePkts *prometheus.Desc // per-backend FIB combined counter: packets
backendRouteByts *prometheus.Desc // per-backend FIB combined counter: bytes
}
// NewCollector creates a Collector backed by the given StateSource. vpp may
@@ -167,6 +214,31 @@ func NewCollector(src StateSource, vpp VPPSource) *Collector {
"Static VPP build information. Always 1; metadata is conveyed via labels.",
[]string{"version", "build_date", "pid"}, nil,
),
vipPackets: prometheus.NewDesc(
"maglev_vpp_vip_packets_total",
"Per-VIP packet counters from the VPP LB plugin stats segment, summed across workers. kind ∈ {next, first, untracked, no_server}.",
[]string{"prefix", "protocol", "port", "kind"}, nil,
),
vipRoutePkts: prometheus.NewDesc(
"maglev_vpp_vip_route_packets_total",
"Packets forwarded by VPP's FIB toward each VIP's host prefix (from /net/route/to), summed across workers.",
[]string{"prefix", "protocol", "port"}, nil,
),
vipRouteByts: prometheus.NewDesc(
"maglev_vpp_vip_route_bytes_total",
"Bytes forwarded by VPP's FIB toward each VIP's host prefix (from /net/route/to), summed across workers.",
[]string{"prefix", "protocol", "port"}, nil,
),
backendRoutePkts: prometheus.NewDesc(
"maglev_vpp_backend_route_packets_total",
"Packets forwarded by VPP's FIB toward each backend's host prefix (from /net/route/to), summed across workers.",
[]string{"backend", "address"}, nil,
),
backendRouteByts: prometheus.NewDesc(
"maglev_vpp_backend_route_bytes_total",
"Bytes forwarded by VPP's FIB toward each backend's host prefix (from /net/route/to), summed across workers.",
[]string{"backend", "address"}, nil,
),
}
}
@@ -180,6 +252,11 @@ func (c *Collector) Describe(ch chan<- *prometheus.Desc) {
ch <- c.vppUptimeSeconds
ch <- c.vppConnectedFor
ch <- c.vppInfo
ch <- c.vipPackets
ch <- c.vipRoutePkts
ch <- c.vipRouteByts
ch <- c.backendRoutePkts
ch <- c.backendRouteByts
}
// Collect implements prometheus.Collector.
@@ -271,6 +348,27 @@ func (c *Collector) Collect(ch chan<- prometheus.Metric) {
c.vppInfo, prometheus.GaugeValue, 1.0,
info.Version, info.BuildDate, fmt.Sprintf("%d", info.PID),
)
// Per-VIP packet counters, read from the snapshot updated by the LB
// stats loop in internal/vpp. CounterValue so rate()/increase() work
// as expected; VPP counter resets (e.g. VIP recreate) are handled by
// Prometheus's built-in counter-reset detection.
for _, v := range c.vpp.VIPStats() {
port := fmt.Sprintf("%d", v.Port)
ch <- prometheus.MustNewConstMetric(c.vipPackets, prometheus.CounterValue, float64(v.NextPkt), v.Prefix, v.Protocol, port, "next")
ch <- prometheus.MustNewConstMetric(c.vipPackets, prometheus.CounterValue, float64(v.FirstPkt), v.Prefix, v.Protocol, port, "first")
ch <- prometheus.MustNewConstMetric(c.vipPackets, prometheus.CounterValue, float64(v.Untracked), v.Prefix, v.Protocol, port, "untracked")
ch <- prometheus.MustNewConstMetric(c.vipPackets, prometheus.CounterValue, float64(v.NoServer), v.Prefix, v.Protocol, port, "no_server")
ch <- prometheus.MustNewConstMetric(c.vipRoutePkts, prometheus.CounterValue, float64(v.Packets), v.Prefix, v.Protocol, port)
ch <- prometheus.MustNewConstMetric(c.vipRouteByts, prometheus.CounterValue, float64(v.Bytes), v.Prefix, v.Protocol, port)
}
// Per-backend FIB counters from /net/route/to. Same CounterValue
// semantics as above.
for _, b := range c.vpp.BackendRouteStats() {
ch <- prometheus.MustNewConstMetric(c.backendRoutePkts, prometheus.CounterValue, float64(b.Packets), b.Backend, b.Address)
ch <- prometheus.MustNewConstMetric(c.backendRouteByts, prometheus.CounterValue, float64(b.Bytes), b.Backend, b.Address)
}
}
// Register registers all metrics with the given registry. vpp may be nil


@@ -9,6 +9,7 @@ import (
"context"
"log/slog"
"sync"
"sync/atomic"
"time"
"go.fd.io/govpp/adapter"
@@ -60,6 +61,16 @@ type Client struct {
info Info // populated on successful connect
stateSrc StateSource // optional; enables periodic LB sync
lastLBConf *lb.LbConf // cached last-pushed lb_conf (dedup)
// lbStatsSnap is the most recent per-VIP stats snapshot captured by
// lbStatsLoop. Published as an immutable slice via atomic.Pointer so
// Prometheus scrapes (metrics.Collector.Collect) don't take any lock.
lbStatsSnap atomic.Pointer[[]metrics.VIPStatEntry]
// backendRouteSnap is the most recent per-backend FIB stats snapshot
// captured by lbStatsLoop. Same atomic-pointer publish pattern as
// lbStatsSnap; see logBackendRouteStats in fibstats.go.
backendRouteSnap atomic.Pointer[[]metrics.BackendRouteStat]
}
// SetStateSource attaches a live config + health state source. When set, the
@@ -140,10 +151,11 @@ func (c *Client) Run(ctx context.Context) {
}
}
// Start the LB sync loop for as long as the connection is up.
// It exits when connCtx is cancelled (on disconnect or shutdown).
// Start the LB sync and stats loops for as long as the connection
// is up. Both exit when connCtx is cancelled.
connCtx, connCancel := context.WithCancel(ctx)
go c.lbSyncLoop(connCtx)
go c.lbStatsLoop(connCtx)
// Hold the connection, pinging periodically to detect VPP restarts.
c.monitor(ctx)
@@ -217,6 +229,30 @@ func (c *Client) GetInfo() (Info, error) {
return c.info, nil
}
// VIPStats satisfies metrics.VPPSource. It returns the latest snapshot of
// per-VIP LB stats-segment counters captured by lbStatsLoop. Returns nil
// until the first scrape completes, or after a disconnect (the pointer is
// cleared when the connection drops).
func (c *Client) VIPStats() []metrics.VIPStatEntry {
p := c.lbStatsSnap.Load()
if p == nil {
return nil
}
return *p
}
// BackendRouteStats satisfies metrics.VPPSource. It returns the latest
// snapshot of per-backend FIB combined counters (/net/route/to) captured
// by lbStatsLoop. Returns nil until the first scrape completes, or after
// a disconnect.
func (c *Client) BackendRouteStats() []metrics.BackendRouteStat {
p := c.backendRouteSnap.Load()
if p == nil {
return nil
}
return *p
}
// VPPInfo satisfies metrics.VPPSource. It returns a copy of the cached
// connection info as a metrics-local struct so the metrics package doesn't
// need to import internal/vpp. Second return is false when VPP is not
@@ -272,6 +308,8 @@ func (c *Client) disconnect() {
c.info = Info{}
c.lastLBConf = nil // force re-push of lb_conf on reconnect
c.mu.Unlock()
c.lbStatsSnap.Store(nil)
c.backendRouteSnap.Store(nil)
safeDisconnectAPI(apiConn)
safeDisconnectStats(statsConn)

internal/vpp/fibstats.go (new file)

@@ -0,0 +1,87 @@
// Copyright (c) 2026, Pim van Pelt <pim@ipng.ch>
package vpp
import (
"fmt"
"net"
"go.fd.io/govpp/adapter"
"go.fd.io/govpp/binapi/ip"
"go.fd.io/govpp/binapi/ip_types"
)
// routeToStatPath is the VPP stats-segment path exposing the per-FIB-entry
// "route-to" combined counter (packets + bytes), indexed by the load-
// balance index of each FIB entry. See lbm_to_counters in
// src/vnet/dpo/load_balance.c.
const routeToStatPath = "/net/route/to"
// fibStatsIndex returns the FIB entry's stats_index (load_balance index)
// for the host prefix of addr. Uses exact=0 (longest-match) so a covering
// route is returned if there is no host-prefix entry — note this means
// two maglev entities sharing a covering route will report identical
// /net/route/to counters.
func fibStatsIndex(ch *loggedChannel, addr net.IP) (uint32, error) {
var prefix ip_types.Prefix
if v4 := addr.To4(); v4 != nil {
prefix.Address.Af = ip_types.ADDRESS_IP4
copy(prefix.Address.Un.XXX_UnionData[:4], v4)
prefix.Len = 32
} else {
prefix.Address.Af = ip_types.ADDRESS_IP6
copy(prefix.Address.Un.XXX_UnionData[:], addr.To16())
prefix.Len = 128
}
req := &ip.IPRouteLookup{
TableID: 0,
Exact: 0,
Prefix: prefix,
}
reply := &ip.IPRouteLookupReply{}
if err := ch.SendRequest(req).ReceiveReply(reply); err != nil {
return 0, fmt.Errorf("ip_route_lookup: %w", err)
}
if reply.Retval != 0 {
return 0, fmt.Errorf("ip_route_lookup: retval=%d", reply.Retval)
}
return reply.Route.StatsIndex, nil
}
// findCombinedCounter returns the CombinedCounterStat matching name, or
// nil if not found or the wrong type.
func findCombinedCounter(entries []adapter.StatEntry, name string) adapter.CombinedCounterStat {
for _, e := range entries {
if string(e.Name) != name {
continue
}
if s, ok := e.Data.(adapter.CombinedCounterStat); ok {
return s
}
}
return nil
}
// reduceCombinedCounter sums the (packets, bytes) CombinedCounter across
// workers at column i, tolerating short per-worker vectors.
func reduceCombinedCounter(s adapter.CombinedCounterStat, i int) (pkts, byts uint64) {
for _, thread := range s {
if i >= 0 && i < len(thread) {
pkts += thread[i][0]
byts += thread[i][1]
}
}
return pkts, byts
}
// vipKeyToIP extracts the VIP address from a vipKey's CIDR string. The
// second return is the prefix length. Used by the scrape path to feed
// a VIP prefix into fibStatsIndex.
func vipKeyToIP(k vipKey) (net.IP, int, error) {
ip, ipnet, err := net.ParseCIDR(k.prefix)
if err != nil {
return nil, 0, err
}
ones, _ := ipnet.Mask.Size()
return ip, ones, nil
}


@@ -30,7 +30,11 @@ type LBVIP struct {
Dscp uint8
TargetPort uint16
FlowTableLength uint16
ASes []LBAS
// SrcIPSticky is scraped from `show lb vips verbose` via cli_inband;
// VPP's lb_vip_details does not carry this flag. Populated by
// GetLBStateAll and GetLBStateVIP; see queryLBSticky in lbsync.go.
SrcIPSticky bool
ASes []LBAS
}
// LBAS mirrors VPP's lb_as_v2_details: one application server bound to a VIP.
@@ -70,12 +74,17 @@ func (c *Client) GetLBStateAll() (*LBState, error) {
if err != nil {
return nil, err
}
stickyMap, err := queryLBSticky(ch)
if err != nil {
return nil, err
}
for i := range vips {
ases, err := dumpASesForVIP(ch, vips[i].Protocol, vips[i].Port)
if err != nil {
return nil, err
}
vips[i].ASes = ases
vips[i].SrcIPSticky = stickyMap[makeVIPKey(vips[i].Prefix, vips[i].Protocol, vips[i].Port)]
}
state.VIPs = vips
return state, nil
@@ -90,7 +99,16 @@ func (c *Client) GetLBStateVIP(prefix *net.IPNet, protocol uint8, port uint16) (
return nil, err
}
defer ch.Close()
return lookupVIP(ch, prefix, protocol, port)
vip, err := lookupVIP(ch, prefix, protocol, port)
if err != nil || vip == nil {
return vip, err
}
stickyMap, err := queryLBSticky(ch)
if err != nil {
return nil, err
}
vip.SrcIPSticky = stickyMap[makeVIPKey(vip.Prefix, vip.Protocol, vip.Port)]
return vip, nil
}
// ---- low-level helpers (used by both Get and Sync paths) -------------------

internal/vpp/lbstats.go (new file)

@@ -0,0 +1,229 @@
// Copyright (c) 2026, Pim van Pelt <pim@ipng.ch>
package vpp
import (
"context"
"fmt"
"log/slog"
"sort"
"time"
"go.fd.io/govpp/adapter"
"git.ipng.ch/ipng/vpp-maglev/internal/metrics"
)
// lbStatsInterval is how often lbStatsLoop scrapes per-VIP and per-backend
// counters from the VPP stats segment. Hard-coded for now; the scrape
// feeds both slog.Debug lines and the Prometheus collector.
const lbStatsInterval = 5 * time.Second
// LB VIP counter names as they appear in the VPP stats segment. These
// come from lb_foreach_vip_counter in src/plugins/lb/lb.h — each entry is
// registered with only .name set, so the stats segment exposes them at
// the top level (spaces and all). Replace if the VPP plugin renames them.
const (
lbStatNextPacket = "/packet from existing sessions"
lbStatFirstPacket = "/first session packet"
lbStatUntrackedPkt = "/untracked packet"
lbStatNoServer = "/no server configured"
)
// lbStatPatterns is the full list of anchored regexes passed to DumpStats
// for one scrape cycle: the four LB-plugin SimpleCounters plus the FIB
// CombinedCounter for per-route packet+byte totals. Doing it in a single
// DumpStats avoids walking the stats segment twice.
var lbStatPatterns = []string{
`^/packet from existing sessions$`,
`^/first session packet$`,
`^/untracked packet$`,
`^/no server configured$`,
`^/net/route/to$`,
}
// lbStatsLoop periodically scrapes the LB plugin's per-VIP counters and
// the FIB's /net/route/to combined counter for both VIPs and backends,
// publishes the results to the atomic snapshots read by Prometheus, and
// emits one slog.Debug line per VIP and per backend. Exits when ctx is
// cancelled.
func (c *Client) lbStatsLoop(ctx context.Context) {
ticker := time.NewTicker(lbStatsInterval)
defer ticker.Stop()
for {
select {
case <-ctx.Done():
return
case <-ticker.C:
}
if err := c.scrapeLBStats(); err != nil {
slog.Debug("vpp-lb-stats-error", "err", err)
}
}
}
// scrapeLBStats runs one full scrape cycle: discover VIPs via cli_inband,
// look up FIB stats_indices for every VIP and every backend via
// ip_route_lookup, dump all five stats-segment paths in one DumpStats
// call, and reduce the counters into the two published snapshots.
func (c *Client) scrapeLBStats() error {
if !c.IsConnected() {
return nil
}
ch, err := c.apiChannel()
if err != nil {
return err
}
defer ch.Close()
snap, err := queryLBVIPSnapshot(ch)
if err != nil {
return fmt.Errorf("query vip snapshot: %w", err)
}
// Resolve FIB stats_indices for every VIP. The LB plugin installs a
// host-prefix FIB entry per VIP (lb.c:990), so the exact=0 lookup
// lands on it in the common case. See fibStatsIndex for the exact=0
// caveat when a covering route shadows the host prefix.
vipStatsIdx := make(map[vipKey]uint32, len(snap))
for k := range snap {
addr, _, err := vipKeyToIP(k)
if err != nil {
continue
}
idx, err := fibStatsIndex(ch, addr)
if err != nil {
slog.Debug("vpp-vip-route-lookup-failed",
"prefix", k.prefix, "err", err)
continue
}
vipStatsIdx[k] = idx
}
// Resolve FIB stats_indices for every backend in the running config.
type backendLookup struct {
name, addr string
index uint32
}
var backends []backendLookup
if src := c.getStateSource(); src != nil {
if cfg := src.Config(); cfg != nil {
names := make([]string, 0, len(cfg.Backends))
for name := range cfg.Backends {
names = append(names, name)
}
sort.Strings(names) // stable snapshot order
for _, name := range names {
b := cfg.Backends[name]
if b.Address == nil {
continue
}
idx, err := fibStatsIndex(ch, b.Address)
if err != nil {
slog.Debug("vpp-backend-route-lookup-failed",
"backend", name, "address", b.Address.String(), "err", err)
continue
}
backends = append(backends, backendLookup{
name: name, addr: b.Address.String(), index: idx,
})
}
}
}
c.mu.Lock()
sc := c.statsClient
c.mu.Unlock()
if sc == nil {
return nil
}
entries, err := sc.DumpStats(lbStatPatterns...)
if err != nil {
return fmt.Errorf("dump stats: %w", err)
}
nextPkt := findSimpleCounter(entries, lbStatNextPacket)
firstPkt := findSimpleCounter(entries, lbStatFirstPacket)
untracked := findSimpleCounter(entries, lbStatUntrackedPkt)
noServer := findSimpleCounter(entries, lbStatNoServer)
routeTo := findCombinedCounter(entries, routeToStatPath)
// ---- VIP snapshot ----
vipOut := make([]metrics.VIPStatEntry, 0, len(snap))
for key, info := range snap {
lbIdx := int(info.index)
entry := metrics.VIPStatEntry{
Prefix: key.prefix,
Protocol: protocolName(key.protocol),
Port: key.port,
NextPkt: reduceSimpleCounter(nextPkt, lbIdx),
FirstPkt: reduceSimpleCounter(firstPkt, lbIdx),
Untracked: reduceSimpleCounter(untracked, lbIdx),
NoServer: reduceSimpleCounter(noServer, lbIdx),
}
if fibIdx, ok := vipStatsIdx[key]; ok {
entry.Packets, entry.Bytes = reduceCombinedCounter(routeTo, int(fibIdx))
}
vipOut = append(vipOut, entry)
slog.Debug("vpp-vip-stats",
"prefix", entry.Prefix,
"protocol", entry.Protocol,
"port", entry.Port,
"next-packet", entry.NextPkt,
"first-packet", entry.FirstPkt,
"untracked", entry.Untracked,
"no-server", entry.NoServer,
"packets", entry.Packets,
"bytes", entry.Bytes,
)
}
c.lbStatsSnap.Store(&vipOut)
// ---- backend snapshot ----
backendOut := make([]metrics.BackendRouteStat, 0, len(backends))
for _, l := range backends {
pkts, byts := reduceCombinedCounter(routeTo, int(l.index))
entry := metrics.BackendRouteStat{
Backend: l.name,
Address: l.addr,
Packets: pkts,
Bytes: byts,
}
backendOut = append(backendOut, entry)
slog.Debug("vpp-backend-route-stats",
"backend", entry.Backend,
"address", entry.Address,
"packets", entry.Packets,
"bytes", entry.Bytes,
)
}
c.backendRouteSnap.Store(&backendOut)
return nil
}
// findSimpleCounter returns the SimpleCounterStat matching name, or nil if
// not found. Stats segment names are byte slices, so we compare as string.
func findSimpleCounter(entries []adapter.StatEntry, name string) adapter.SimpleCounterStat {
for _, e := range entries {
if string(e.Name) != name {
continue
}
if s, ok := e.Data.(adapter.SimpleCounterStat); ok {
return s
}
}
return nil
}
// reduceSimpleCounter sums per-worker values at column i. It tolerates a
// short per-worker vector (which can happen right after a VIP is added,
// before a worker has observed it) by skipping out-of-range rows.
func reduceSimpleCounter(s adapter.SimpleCounterStat, i int) uint64 {
var sum uint64
for _, thread := range s {
if i >= 0 && i < len(thread) {
sum += uint64(thread[i])
}
}
return sum
}


@@ -7,6 +7,11 @@ import (
"fmt"
"log/slog"
"net"
"regexp"
"strconv"
"strings"
"go.fd.io/govpp/binapi/vlib"
"git.ipng.ch/ipng/vpp-maglev/internal/config"
"git.ipng.ch/ipng/vpp-maglev/internal/health"
@@ -29,10 +34,11 @@ type vipKey struct {
// desiredVIP is the sync's view of one VIP derived from the maglev config.
type desiredVIP struct {
Prefix *net.IPNet
Protocol uint8 // 6=TCP, 17=UDP, 255=any
Port uint16
ASes map[string]desiredAS // keyed by AS IP string
Prefix *net.IPNet
Protocol uint8 // 6=TCP, 17=UDP, 255=any
Port uint16
SrcIPSticky bool // lb_add_del_vip_v2.src_ip_sticky
ASes map[string]desiredAS // keyed by AS IP string
}
// desiredAS is one application server to be installed under a VIP.
@@ -133,10 +139,12 @@ func (c *Client) SyncLBStateAll(cfg *config.Config) error {
for k, d := range desByKey {
cur, existing := curByKey[k]
var curPtr *LBVIP
var curSticky bool
if existing {
curPtr = &cur
curSticky = cur.SrcIPSticky
}
if err := reconcileVIP(ch, d, curPtr, &st); err != nil {
if err := reconcileVIP(ch, d, curPtr, curSticky, &st); err != nil {
return err
}
}
@@ -189,8 +197,13 @@ func (c *Client) SyncLBStateVIP(cfg *config.Config, feName string) error {
"protocol", protocolName(d.Protocol),
"port", d.Port)
var curSticky bool
if cur != nil {
curSticky = cur.SrcIPSticky
}
var st syncStats
if err := reconcileVIP(ch, d, cur, &st); err != nil {
if err := reconcileVIP(ch, d, cur, curSticky, &st); err != nil {
return err
}
recordSyncStats("vip", &st)
@@ -207,7 +220,14 @@ func (c *Client) SyncLBStateVIP(cfg *config.Config, feName string) error {
// reconcileVIP brings one VIP's state in VPP into alignment with the desired
// state. If cur is nil the VIP is added from scratch; otherwise ASes are
// added, removed, and reweighted individually. Stats are accumulated into st.
func reconcileVIP(ch *loggedChannel, d desiredVIP, cur *LBVIP, st *syncStats) error {
//
// curSticky is the src_ip_sticky flag VPP currently has programmed for this
// VIP, as scraped by queryLBSticky. Callers only consult it when cur != nil,
// and the scrape reads the same live lb_main.vips pool as lb_vip_dump, so a
// matching entry is always present. When the flag differs from the desired
// value, the VIP is torn down (ASes del+flushed, VIP deleted) and recreated
// — VPP has no API to mutate src_ip_sticky on an existing VIP.
func reconcileVIP(ch *loggedChannel, d desiredVIP, cur *LBVIP, curSticky bool, st *syncStats) error {
if cur == nil {
if err := addVIP(ch, d); err != nil {
return err
@@ -222,6 +242,30 @@ func reconcileVIP(ch *loggedChannel, d desiredVIP, cur *LBVIP, st *syncStats) er
return nil
}
if curSticky != d.SrcIPSticky {
slog.Info("vpp-lbsync-vip-recreate",
"prefix", d.Prefix.String(),
"protocol", protocolName(d.Protocol),
"port", d.Port,
"reason", "src-ip-sticky-changed",
"from", curSticky,
"to", d.SrcIPSticky)
if err := removeVIP(ch, *cur, st); err != nil {
return err
}
if err := addVIP(ch, d); err != nil {
return err
}
st.vipAdd++
for _, as := range d.ASes {
if err := addAS(ch, d.Prefix, d.Protocol, d.Port, as); err != nil {
return err
}
st.asAdd++
}
return nil
}
// VIP exists in both — reconcile ASes.
curASes := make(map[string]LBAS, len(cur.ASes))
for _, a := range cur.ASes {
@@ -306,25 +350,15 @@ func desiredFromFrontend(cfg *config.Config, fe config.Frontend, src StateSource
bits = 128
}
d := desiredVIP{
Prefix: &net.IPNet{IP: fe.Address, Mask: net.CIDRMask(bits, bits)},
Protocol: protocolFromConfig(fe.Protocol),
Port: fe.Port,
ASes: make(map[string]desiredAS),
Prefix: &net.IPNet{IP: fe.Address, Mask: net.CIDRMask(bits, bits)},
Protocol: protocolFromConfig(fe.Protocol),
Port: fe.Port,
SrcIPSticky: fe.SrcIPSticky,
ASes: make(map[string]desiredAS),
}
// Snapshot backend states once so the active-pool computation and the
// per-backend weight assignment see a consistent view.
states := make(map[string]health.State)
for _, pool := range fe.Pools {
for bName := range pool.Backends {
if s, ok := src.BackendState(bName); ok {
states[bName] = s
} else {
states[bName] = health.StateUnknown
}
}
}
activePool := activePoolIndex(fe, states)
states := snapshotStates(fe, src)
activePool := health.ActivePoolIndex(fe, states)
for poolIdx, pool := range fe.Pools {
for bName, pb := range pool.Backends {
@@ -337,12 +371,13 @@ func desiredFromFrontend(cfg *config.Config, fe config.Frontend, src StateSource
// weight=0 — they must not be deleted, otherwise a subsequent
// enable has to re-add them and existing flow-table state (if
// any) is lost. The state machine drives what weight to set
// via asFromBackend; we never filter on b.Enabled here.
// via health.BackendEffectiveWeight; we never filter on
// b.Enabled here.
addr := b.Address.String()
if _, already := d.ASes[addr]; already {
continue
}
w, flush := asFromBackend(poolIdx, activePool, states[bName], pb.Weight)
w, flush := health.BackendEffectiveWeight(poolIdx, activePool, states[bName], pb.Weight)
d.ASes[addr] = desiredAS{
Address: b.Address,
Weight: w,
@@ -354,14 +389,17 @@ func desiredFromFrontend(cfg *config.Config, fe config.Frontend, src StateSource
}
// EffectiveWeights returns the current effective VPP weight for every backend
// in every pool of fe, keyed by poolIdx and backend name. It runs the same
// failover + state-aware weight calculation that the sync path uses, but
// produces a plain map instead of desiredVIP — intended for observability
// (e.g. the GetFrontend gRPC handler) and for robot-testing the failover
// logic without needing a running VPP instance.
//
// The returned map layout is: result[poolIdx][backendName] = effective weight.
// in every pool of fe, keyed by poolIdx and backend name. Intended for
// observability (e.g. the GetFrontend gRPC handler) — the sync path and the
// checker's frontend-state logic use health.EffectiveWeights directly.
func EffectiveWeights(fe config.Frontend, src StateSource) map[int]map[string]uint8 {
return health.EffectiveWeights(fe, snapshotStates(fe, src))
}
// snapshotStates builds a state map for every backend referenced by fe. It
// takes one read-lock per backend via the StateSource interface, so the
// caller gets a consistent view to feed into the pure health helpers.
func snapshotStates(fe config.Frontend, src StateSource) map[string]health.State {
states := make(map[string]health.State)
for _, pool := range fe.Pools {
for bName := range pool.Backends {
@@ -372,75 +410,7 @@ func EffectiveWeights(fe config.Frontend, src StateSource) map[int]map[string]ui
}
}
}
activePool := activePoolIndex(fe, states)
out := make(map[int]map[string]uint8, len(fe.Pools))
for poolIdx, pool := range fe.Pools {
out[poolIdx] = make(map[string]uint8, len(pool.Backends))
for bName, pb := range pool.Backends {
w, _ := asFromBackend(poolIdx, activePool, states[bName], pb.Weight)
out[poolIdx][bName] = w
}
}
return out
}
// activePoolIndex returns the index of the first pool in fe that contains at
// least one backend currently in StateUp. This is the priority-failover
// selector: pool[0] is the primary, pool[1] is the first fallback, and so on.
// As long as pool[0] has any up backend, it stays active. When every pool[0]
// backend leaves StateUp (down, paused, disabled, unknown), pool[1] is
// promoted — and so on for further fallback tiers. When no pool has any up
// backend, returns 0 (the return value is unobservable in that case since
// every backend maps to weight 0 regardless of the active pool).
func activePoolIndex(fe config.Frontend, states map[string]health.State) int {
for i, pool := range fe.Pools {
for bName := range pool.Backends {
if states[bName] == health.StateUp {
return i
}
}
}
return 0
}
// asFromBackend is the pure mapping from (pool index, active pool, backend
// state, config weight) to the desired VPP AS weight and flush hint. This is
// the single source of truth for the state → dataplane rule — every LB change
// flows through this function.
//
// A backend gets its configured weight iff it is up AND belongs to the
// currently-active pool. Every other case yields weight 0. The only
// state that produces flush=true is disabled.
//
// state in active pool not in active pool flush
// -------- -------------- ------------------- -----
// unknown 0 0 no
// up configured 0 (standby) no
// down 0 0 no
// paused 0 0 no
// disabled 0 0 yes
// removed handled separately (AS deleted via delAS)
//
// Flush semantics: flush=true means "if the AS currently has a non-zero
// weight in VPP, drop its existing flow-table entries when setting weight
// to 0". The reconciler only acts on flush when transitioning (current
// weight > 0), so steady-state syncs never re-flush. Failover demotion
// (e.g. pool[1] up→standby when pool[0] recovers) does NOT flush — we
// let those sessions drain naturally.
func asFromBackend(poolIdx, activePool int, state health.State, cfgWeight int) (weight uint8, flush bool) {
switch state {
case health.StateUp:
if poolIdx == activePool {
return clampWeight(cfgWeight), false
}
return 0, false
case health.StateDisabled:
return 0, true
default:
// unknown, down, paused: off, drain existing flows naturally.
return 0, false
}
return states
}
// ---- API call helpers ------------------------------------------------------
@@ -461,6 +431,7 @@ func addVIP(ch *loggedChannel, d desiredVIP) error {
Encap: encap,
Type: lb_types.LB_API_SRV_TYPE_CLUSTERIP,
NewFlowsTableLength: defaultFlowsTableLength,
SrcIPSticky: d.SrcIPSticky,
IsDel: false,
}
reply := &lb.LbAddDelVipV2Reply{}
@@ -474,7 +445,8 @@ func addVIP(ch *loggedChannel, d desiredVIP) error {
"prefix", d.Prefix.String(),
"protocol", protocolName(d.Protocol),
"port", d.Port,
"encap", encapName(encap))
"encap", encapName(encap),
"src-ip-sticky", d.SrcIPSticky)
return nil
}
@@ -574,6 +546,136 @@ func setASWeight(ch *loggedChannel, prefix *net.IPNet, protocol uint8, port uint
return nil
}
// ---- VIP snapshot scrape ---------------------------------------------------
// TEMPORARY WORKAROUND: VPP's lb_vip_dump / lb_vip_details message does not
// return the src_ip_sticky flag, and lb_vip_details also doesn't expose the
// LB plugin's pool index — which is what the stats segment uses to key
// per-VIP counters. We work around both by running `show lb vips verbose`
// via the cli_inband API and parsing the human-readable output. This is
// slow and fragile (the format is not a stable API) and must be replaced
// once VPP ships an lb_vip_v2_dump that includes these fields.
// lbVIPSnapshot holds the per-VIP facts that the scrape recovers. `index`
// is the LB pool index (`vip - lbm->vips` in lb.c) used as the second
// dimension of the vip_counters SimpleCounterStat in the stats segment.
type lbVIPSnapshot struct {
index uint32
sticky bool
}
// queryLBVIPSnapshot runs `show lb vips verbose` and parses it into a map
// keyed by vipKey. The returned error is non-nil only when the cli_inband
// RPC itself fails.
func queryLBVIPSnapshot(ch *loggedChannel) (map[vipKey]lbVIPSnapshot, error) {
req := &vlib.CliInband{Cmd: "show lb vips verbose"}
reply := &vlib.CliInbandReply{}
if err := ch.SendRequest(req).ReceiveReply(reply); err != nil {
return nil, fmt.Errorf("cli_inband %q: %w", req.Cmd, err)
}
if reply.Retval != 0 {
return nil, fmt.Errorf("cli_inband %q: retval=%d", req.Cmd, reply.Retval)
}
return parseLBVIPSnapshot(reply.Reply), nil
}
// queryLBSticky is a thin projection of queryLBVIPSnapshot for callers that
// only care about the src_ip_sticky flag (the sync path).
func queryLBSticky(ch *loggedChannel) (map[vipKey]bool, error) {
snap, err := queryLBVIPSnapshot(ch)
if err != nil {
return nil, err
}
out := make(map[vipKey]bool, len(snap))
for k, v := range snap {
out[k] = v.sticky
}
return out, nil
}
// lbVIPHeaderRe matches the first line of each VIP block in the output of
// `show lb vips verbose`. VPP produces lines like:
//
// ip4-gre4 [1] 192.0.2.1/32 src_ip_sticky
// ip6-gre6 [2] 2001:db8::1/128
//
// Capture groups: 1 = pool index, 2 = prefix (CIDR), 3 = " src_ip_sticky" when present.
var lbVIPHeaderRe = regexp.MustCompile(
`\b(?:ip4-gre4|ip6-gre6|ip4-gre6|ip6-gre4|ip4-l3dsr|ip4-nat4|ip6-nat6)\s+\[(\d+)\]\s+(\S+)(\s+src_ip_sticky)?`,
)
// lbVIPProtoRe matches the `protocol:<n> port:<n>` sub-line that appears
// under each non-all-port VIP block.
var lbVIPProtoRe = regexp.MustCompile(`protocol:(\d+)\s+port:(\d+)`)
// parseLBVIPSnapshot turns the reply text of `show lb vips verbose` into a
// map of vipKey → lbVIPSnapshot. It walks lines sequentially: each header
// line starts a new VIP, and the following `protocol:<p> port:<port>` line
// (if any) refines the key. All-port VIPs have no protocol/port sub-line
// and are recorded with protocol=255 and port=0 — the same key format used
// by the rest of the sync path (see protocolFromConfig).
func parseLBVIPSnapshot(text string) map[vipKey]lbVIPSnapshot {
out := make(map[vipKey]lbVIPSnapshot)
var havePending bool
var curPrefix string
var curIndex uint32
var curSticky bool
var curProto uint8 = 255
var curPort uint16
flush := func() {
if !havePending || curPrefix == "" {
return
}
out[vipKey{prefix: curPrefix, protocol: curProto, port: curPort}] = lbVIPSnapshot{
index: curIndex,
sticky: curSticky,
}
}
for _, line := range strings.Split(text, "\n") {
if m := lbVIPHeaderRe.FindStringSubmatch(line); m != nil {
flush()
havePending = true
if idx, err := strconv.ParseUint(m[1], 10, 32); err == nil {
curIndex = uint32(idx)
} else {
curIndex = 0
}
curPrefix = canonicalCIDR(m[2])
curSticky = m[3] != ""
curProto = 255
curPort = 0
continue
}
if !havePending {
continue
}
if m := lbVIPProtoRe.FindStringSubmatch(line); m != nil {
if p, err := strconv.Atoi(m[1]); err == nil {
curProto = uint8(p)
}
if p, err := strconv.Atoi(m[2]); err == nil {
curPort = uint16(p)
}
}
}
flush()
return out
}
// canonicalCIDR parses a CIDR string and returns it in Go's canonical
// net.IPNet.String() form so it matches the keys produced by makeVIPKey.
// Returns the input unchanged if parsing fails — the header regex only
// guarantees a non-whitespace token, and a malformed prefix must not
// panic the sync path.
func canonicalCIDR(s string) string {
_, ipnet, err := net.ParseCIDR(s)
if err != nil {
return s
}
return ipnet.String()
}
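A quick standalone sketch of what this canonicalization buys (the function body is copied verbatim from above): net.ParseCIDR masks host bits and IPNet.String() prints IPv6 in compressed lowercase form, so string keys compare equal however VPP formatted the prefix.

```go
package main

import (
	"fmt"
	"net"
)

// canonicalCIDR as defined in this file: normalize via net.ParseCIDR so
// string keys compare equal regardless of how the prefix was printed.
func canonicalCIDR(s string) string {
	_, ipnet, err := net.ParseCIDR(s)
	if err != nil {
		return s
	}
	return ipnet.String()
}

func main() {
	fmt.Println(canonicalCIDR("2001:0DB8::1/128")) // compressed, lowercased: 2001:db8::1/128
	fmt.Println(canonicalCIDR("192.0.2.1/24"))     // host bits masked: 192.0.2.0/24
	fmt.Println(canonicalCIDR("not-a-cidr"))       // parse failure passes through unchanged
}
```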
// ---- utility ---------------------------------------------------------------
func makeVIPKey(prefix *net.IPNet, protocol uint8, port uint16) vipKey {
@@ -618,13 +720,3 @@ func encapName(e lb_types.LbEncapType) string {
}
return fmt.Sprintf("%d", e)
}
func clampWeight(w int) uint8 {
if w < 0 {
return 0
}
if w > 100 {
return 100
}
return uint8(w)
}


@@ -10,141 +10,45 @@ import (
"git.ipng.ch/ipng/vpp-maglev/internal/health"
)
// TestParseLBVIPSnapshot pins the parser for `show lb vips verbose` output.
// The text below is a synthetic sample that mirrors format_lb_vip_detailed
// in src/plugins/lb/lb.c: a header line per VIP optionally carrying the
// src_ip_sticky token, followed by a protocol:/port: sub-line for
// non-all-port VIPs. If VPP changes this format the test will fail loudly —
// the scrape is a temporary workaround until lb_vip_v2_dump exists.
func TestParseLBVIPSnapshot(t *testing.T) {
text := ` ip4-gre4 [1] 192.0.2.1/32 src_ip_sticky
new_size:1024
protocol:6 port:80
counters:
ip4-gre4 [2] 192.0.2.2/32
new_size:1024
protocol:17 port:53
ip6-gre6 [3] 2001:db8::1/128 src_ip_sticky
new_size:1024
protocol:6 port:443
ip4-gre4 [4] 192.0.2.3/32
new_size:1024
`
got := parseLBVIPSnapshot(text)
want := map[vipKey]lbVIPSnapshot{
{prefix: "192.0.2.1/32", protocol: 6, port: 80}: {index: 1, sticky: true},
{prefix: "192.0.2.2/32", protocol: 17, port: 53}: {index: 2, sticky: false},
{prefix: "2001:db8::1/128", protocol: 6, port: 443}: {index: 3, sticky: true},
{prefix: "192.0.2.3/32", protocol: 255, port: 0}: {index: 4, sticky: false}, // all-port VIP
}
if len(got) != len(want) {
t.Errorf("got %d entries, want %d: %#v", len(got), len(want), got)
}
for k, v := range want {
g, ok := got[k]
if !ok {
t.Errorf("missing key %+v", k)
continue
}
if g != v {
t.Errorf("key %+v: got %+v, want %+v", k, g, v)
}
}
}
@@ -161,8 +65,10 @@ func (f *fakeStateSource) BackendState(name string) (health.State, bool) {
}
// TestDesiredFromFrontendFailover is the end-to-end integration test for
// priority-failover in the VPP sync path: given a frontend with two pools,
// the desired weights flip between pools based on which has any up backends.
// This exercises vpp.desiredFromFrontend which wraps the pure helpers in
// the health package; those helpers are unit-tested separately in health.
func TestDesiredFromFrontendFailover(t *testing.T) {
ip := func(s string) net.IP { return net.ParseIP(s).To4() }
cfg := &config.Config{


@@ -63,11 +63,17 @@ func (r *Reconciler) Run(ctx context.Context) {
}
}
// handle reconciles one event. Operates only on backend-transition events
// that carry a frontend name (the checker emits one event per frontend that
// references the backend, so a backend shared across multiple frontends
// produces multiple events and all relevant VIPs are reconciled).
// Frontend-transition events are observational only — the dataplane work
// they would imply has already been done by the backend-transition event
// that triggered them.
func (r *Reconciler) handle(ev checker.Event) {
if ev.FrontendTransition != nil {
return // frontend-only event; no dataplane work
}
if ev.FrontendName == "" {
return
}