VPP LB counters, src-ip-sticky, and frontend state aggregation

New feature: per-VIP / per-backend runtime counters
  * New GetVPPLBCounters RPC serving an in-process snapshot refreshed
    by a 5s scrape loop (internal/vpp/lbstats.go). Each cycle pulls
    the LB plugin's four SimpleCounters (next, first, untracked,
    no-server) plus the FIB /net/route/to CombinedCounter for every
    VIP and every backend host prefix via a single DumpStats call.
  * FIB stats-index discovery via ip_route_lookup
    (internal/vpp/fibstats.go); per-worker reduction happens in the
    collector.
  * Prometheus collector exports vip_packets_total (kind label),
    vip_route_{packets,bytes}_total, and
    backend_route_{packets,bytes}_total. Metrics source interface
    extended with VIPStats / BackendRouteStats; vpp.Client publishes
    snapshots via atomic.Pointer and clears them on disconnect (see
    the sketch after this list).
  * New 'show vpp lb counters' CLI command. The 'show vpp lbstate'
    and 'sync vpp lbstate' commands are restructured under 'show
    vpp lb {state,counters}' / 'sync vpp lb state' to make room
    for the new verb.
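
  A minimal sketch of the snapshot-publication pattern from the bullet
  above. Only atomic.Pointer and the publish/clear-on-disconnect
  behavior come from the changelog; type, field, and method names here
  are illustrative, not the shipped API:

      package vpp

      import "sync/atomic"

      // Counters is one scrape cycle's snapshot (layout illustrative).
      type Counters struct {
          VIPPackets map[uint32]map[string]uint64 // pool index -> kind -> packets
          RouteBytes map[string]uint64            // prefix -> FIB bytes
      }

      type Client struct {
          stats atomic.Pointer[Counters]
      }

      // The scrape loop builds a complete snapshot, then publishes it
      // with a single Store so readers never see a half-updated view.
      func (c *Client) publish(s *Counters) { c.stats.Store(s) }

      // On disconnect the pointer is cleared so stale counters are not
      // served as if they were fresh.
      func (c *Client) clear() { c.stats.Store(nil) }

      // Readers (GetVPPLBCounters, the Prometheus collector) take the
      // latest snapshot; nil means "no data yet / disconnected".
      func (c *Client) Snapshot() *Counters { return c.stats.Load() }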

New feature: src-ip-sticky frontends
  * New frontend YAML key 'src-ip-sticky' (bool). Plumbed through
    config.Frontend, desiredVIP, and the lb_add_del_vip_v2 call (see
    the sketch after this list).
  * Reflected in gRPC FrontendInfo.src_ip_sticky and
    VPPLBVIP.src_ip_sticky, and shown in 'show vpp lb state' output.
  * Scraped back from VPP by parsing 'show lb vips verbose' through
    cli_inband — lb_vip_details does not expose the flag. The same
    scrape also recovers the LB pool index for each VIP, which the
    stats-segment counters are keyed on. This is a documented
    temporary workaround until VPP ships an lb_vip_v2_dump.
  * src_ip_sticky cannot be mutated on a live VIP, so a flipped flag
    triggers a tear-down-and-recreate in reconcileVIP (ASes deleted
    with flush, VIP deleted, then re-added). Flip is logged.
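
  A sketch of the config plumbing (the struct and field names appear
  in the diff below; the yaml tag is an assumption matching the
  documented key):

      package config

      type Frontend struct {
          // ... existing fields (address, protocol, port, pools) elided ...

          // SrcIPSticky maps the 'src-ip-sticky' YAML key. Flipping it
          // on a live frontend makes the next sync tear the VIP down
          // and recreate it, dropping its flow-table state.
          SrcIPSticky bool `yaml:"src-ip-sticky"`
      }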

New feature: frontend state aggregation and events
  * New health.FrontendState (unknown/up/down) and FrontendTransition
    types. A frontend is 'up' iff at least one backend has a nonzero
    effective weight, 'unknown' iff no backend has real state yet,
    and 'down' otherwise (see the sketch after this list).
  * Checker tracks per-frontend aggregate state, recomputing after
    each backend transition and emitting a frontend-transition Event
    on change. Reload drops entries for removed frontends.
  * checker.Event gains an optional FrontendTransition pointer;
    backend- vs. frontend-transition events are demultiplexed on
    that field.
  * WatchEvents now sends an initial snapshot of frontend state on
    connect (mirroring the existing backend snapshot), subscribes
    once to the checker stream, and fans out to backend/frontend
    handlers based on the client's filter flags. The proto
    FrontendEvent message grows name + transition fields.
  * New Checker.FrontendState accessor.
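
  A sketch of the aggregation rule from the first bullet. The shipped
  helper is health.ComputeFrontendState; its exact signature and the
  constant names below are assumptions:

      package health

      type FrontendState int

      const (
          FrontendUnknown FrontendState = iota
          FrontendUp
          FrontendDown
      )

      // ComputeFrontendState: up iff any backend carries a nonzero
      // effective weight, unknown iff no backend has real state yet,
      // down otherwise. Signature assumed for illustration.
      func ComputeFrontendState(weights map[int]map[string]uint8, states map[string]State) FrontendState {
          for _, pool := range weights {
              for _, w := range pool {
                  if w > 0 {
                      return FrontendUp
                  }
              }
          }
          for _, s := range states {
              if s != StateUnknown {
                  return FrontendDown
              }
          }
          return FrontendUnknown
      }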

Refactor: pure health helpers
  * Moved the priority-failover selector and the (pool idx, active
    pool, state, cfg weight) → (vpp weight, flush) mapping out of
    internal/vpp/lbsync.go into a new internal/health/weights.go so
    the checker can reuse them for frontend-state computation
    without importing internal/vpp.
  * New functions: health.ActivePoolIndex, BackendEffectiveWeight,
    EffectiveWeights, ComputeFrontendState. lbsync.go now calls
    these directly; vpp.EffectiveWeights is a thin wrapper over
    health.EffectiveWeights retained for the gRPC observability
    path. Fully unit-tested in internal/health/weights_test.go.
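
  A sketch of what one table-driven case in weights_test.go could look
  like, with expectations lifted from the state table in the removed
  asFromBackend comment (the StateDown and StatePaused constant names
  are assumed):

      package health

      import "testing"

      func TestBackendEffectiveWeight(t *testing.T) {
          cases := []struct {
              name            string
              poolIdx, active int
              state           State
              cfgWeight       int
              wantW           uint8
              wantFlush       bool
          }{
              {"up-in-active-pool", 0, 0, StateUp, 10, 10, false},
              {"up-in-standby-pool", 1, 0, StateUp, 10, 0, false},
              {"unknown", 0, 0, StateUnknown, 10, 0, false},
              {"down", 0, 0, StateDown, 10, 0, false},
              {"paused", 0, 0, StatePaused, 10, 0, false},
              {"disabled-flushes", 0, 0, StateDisabled, 10, 0, true},
          }
          for _, c := range cases {
              w, flush := BackendEffectiveWeight(c.poolIdx, c.active, c.state, c.cfgWeight)
              if w != c.wantW || flush != c.wantFlush {
                  t.Errorf("%s: got (%d, %v), want (%d, %v)",
                      c.name, w, flush, c.wantW, c.wantFlush)
              }
          }
      }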

maglevc polish
  * --color default is now mode-aware: on in the interactive shell,
    off in one-shot mode so piped output is script-safe. Explicit
    --color=true/false still overrides.
  * New stripHostMask helper drops /32 and /128 from VIP display;
    non-host prefixes pass through unchanged (sketch after this list).
  * Counter table column order fixed (first before next) and
    packets/bytes columns renamed to fib-packets/fib-bytes to
    clarify they come from the FIB, not the LB plugin.
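
  A sketch of stripHostMask's contract. The shipped body may differ;
  note that a plain suffix check would wrongly strip an IPv6 /32,
  hence the mask-size comparison:

      package main

      import (
          "fmt"
          "net"
      )

      // stripHostMask drops a host-length mask (/32 for IPv4, /128 for
      // IPv6) from a CIDR string for display; anything else, including
      // unparsable input, passes through unchanged.
      func stripHostMask(prefix string) string {
          ip, ipnet, err := net.ParseCIDR(prefix)
          if err != nil {
              return prefix
          }
          if ones, bits := ipnet.Mask.Size(); ones == bits {
              return ip.String()
          }
          return prefix
      }

      func main() {
          fmt.Println(stripHostMask("192.0.2.1/32"))    // 192.0.2.1
          fmt.Println(stripHostMask("2001:db8::1/128")) // 2001:db8::1
          fmt.Println(stripHostMask("2001:db8::/32"))   // unchanged: not host-length for v6
      }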

Docs
  * config-guide: document src-ip-sticky, including the VIP
    recreate-on-change caveat.
  * user-guide, maglevc.1, maglevd.8: updated command tree, new
    counters command, color defaults, and the src-ip-sticky field.
commit fb62532fd5 (parent d5fbf5c640)
2026-04-12 15:59:02 +02:00
25 changed files with 2163 additions and 549 deletions

internal/vpp/lbsync.go

@@ -7,6 +7,11 @@ import (
"fmt"
"log/slog"
"net"
"regexp"
"strconv"
"strings"
"go.fd.io/govpp/binapi/vlib"
"git.ipng.ch/ipng/vpp-maglev/internal/config"
"git.ipng.ch/ipng/vpp-maglev/internal/health"
@@ -29,10 +34,11 @@ type vipKey struct {
// desiredVIP is the sync's view of one VIP derived from the maglev config.
type desiredVIP struct {
-Prefix *net.IPNet
-Protocol uint8 // 6=TCP, 17=UDP, 255=any
-Port uint16
-ASes map[string]desiredAS // keyed by AS IP string
+Prefix *net.IPNet
+Protocol uint8 // 6=TCP, 17=UDP, 255=any
+Port uint16
+SrcIPSticky bool // lb_add_del_vip_v2.src_ip_sticky
+ASes map[string]desiredAS // keyed by AS IP string
}
// desiredAS is one application server to be installed under a VIP.
@@ -133,10 +139,12 @@ func (c *Client) SyncLBStateAll(cfg *config.Config) error {
for k, d := range desByKey {
cur, existing := curByKey[k]
var curPtr *LBVIP
+var curSticky bool
if existing {
curPtr = &cur
+curSticky = cur.SrcIPSticky
}
-if err := reconcileVIP(ch, d, curPtr, &st); err != nil {
+if err := reconcileVIP(ch, d, curPtr, curSticky, &st); err != nil {
return err
}
}
@@ -189,8 +197,13 @@ func (c *Client) SyncLBStateVIP(cfg *config.Config, feName string) error {
"protocol", protocolName(d.Protocol),
"port", d.Port)
+var curSticky bool
+if cur != nil {
+curSticky = cur.SrcIPSticky
+}
var st syncStats
-if err := reconcileVIP(ch, d, cur, &st); err != nil {
+if err := reconcileVIP(ch, d, cur, curSticky, &st); err != nil {
return err
}
recordSyncStats("vip", &st)
@@ -207,7 +220,14 @@ func (c *Client) SyncLBStateVIP(cfg *config.Config, feName string) error {
// reconcileVIP brings one VIP's state in VPP into alignment with the desired
// state. If cur is nil the VIP is added from scratch; otherwise ASes are
// added, removed, and reweighted individually. Stats are accumulated into st.
-func reconcileVIP(ch *loggedChannel, d desiredVIP, cur *LBVIP, st *syncStats) error {
+//
+// curSticky is the src_ip_sticky flag VPP currently has programmed for this
+// VIP, as scraped by queryLBSticky. Callers only consult it when cur != nil,
+// and the scrape reads the same live lb_main.vips pool as lb_vip_dump, so a
+// matching entry is always present. When the flag differs from the desired
+// value, the VIP is torn down (ASes del+flushed, VIP deleted) and recreated
+// — VPP has no API to mutate src_ip_sticky on an existing VIP.
+func reconcileVIP(ch *loggedChannel, d desiredVIP, cur *LBVIP, curSticky bool, st *syncStats) error {
if cur == nil {
if err := addVIP(ch, d); err != nil {
return err
@@ -222,6 +242,30 @@ func reconcileVIP(ch *loggedChannel, d desiredVIP, cur *LBVIP, st *syncStats) er
return nil
}
if curSticky != d.SrcIPSticky {
slog.Info("vpp-lbsync-vip-recreate",
"prefix", d.Prefix.String(),
"protocol", protocolName(d.Protocol),
"port", d.Port,
"reason", "src-ip-sticky-changed",
"from", curSticky,
"to", d.SrcIPSticky)
if err := removeVIP(ch, *cur, st); err != nil {
return err
}
if err := addVIP(ch, d); err != nil {
return err
}
st.vipAdd++
for _, as := range d.ASes {
if err := addAS(ch, d.Prefix, d.Protocol, d.Port, as); err != nil {
return err
}
st.asAdd++
}
return nil
}
// VIP exists in both — reconcile ASes.
curASes := make(map[string]LBAS, len(cur.ASes))
for _, a := range cur.ASes {
@@ -306,25 +350,15 @@ func desiredFromFrontend(cfg *config.Config, fe config.Frontend, src StateSource
bits = 128
}
d := desiredVIP{
-Prefix: &net.IPNet{IP: fe.Address, Mask: net.CIDRMask(bits, bits)},
-Protocol: protocolFromConfig(fe.Protocol),
-Port: fe.Port,
-ASes: make(map[string]desiredAS),
+Prefix: &net.IPNet{IP: fe.Address, Mask: net.CIDRMask(bits, bits)},
+Protocol: protocolFromConfig(fe.Protocol),
+Port: fe.Port,
+SrcIPSticky: fe.SrcIPSticky,
+ASes: make(map[string]desiredAS),
}
// Snapshot backend states once so the active-pool computation and the
// per-backend weight assignment see a consistent view.
-states := make(map[string]health.State)
-for _, pool := range fe.Pools {
-for bName := range pool.Backends {
-if s, ok := src.BackendState(bName); ok {
-states[bName] = s
-} else {
-states[bName] = health.StateUnknown
-}
-}
-}
-activePool := activePoolIndex(fe, states)
+states := snapshotStates(fe, src)
+activePool := health.ActivePoolIndex(fe, states)
for poolIdx, pool := range fe.Pools {
for bName, pb := range pool.Backends {
@@ -337,12 +371,13 @@ func desiredFromFrontend(cfg *config.Config, fe config.Frontend, src StateSource
// weight=0 — they must not be deleted, otherwise a subsequent
// enable has to re-add them and existing flow-table state (if
// any) is lost. The state machine drives what weight to set
-// via asFromBackend; we never filter on b.Enabled here.
+// via health.BackendEffectiveWeight; we never filter on
+// b.Enabled here.
addr := b.Address.String()
if _, already := d.ASes[addr]; already {
continue
}
-w, flush := asFromBackend(poolIdx, activePool, states[bName], pb.Weight)
+w, flush := health.BackendEffectiveWeight(poolIdx, activePool, states[bName], pb.Weight)
d.ASes[addr] = desiredAS{
Address: b.Address,
Weight: w,
@@ -354,14 +389,17 @@ func desiredFromFrontend(cfg *config.Config, fe config.Frontend, src StateSource
}
// EffectiveWeights returns the current effective VPP weight for every backend
-// in every pool of fe, keyed by poolIdx and backend name. It runs the same
-// failover + state-aware weight calculation that the sync path uses, but
-// produces a plain map instead of desiredVIP — intended for observability
-// (e.g. the GetFrontend gRPC handler) and for robot-testing the failover
-// logic without needing a running VPP instance.
-//
-// The returned map layout is: result[poolIdx][backendName] = effective weight.
+// in every pool of fe, keyed by poolIdx and backend name. Intended for
+// observability (e.g. the GetFrontend gRPC handler) — the sync path and the
+// checker's frontend-state logic use health.EffectiveWeights directly.
func EffectiveWeights(fe config.Frontend, src StateSource) map[int]map[string]uint8 {
+return health.EffectiveWeights(fe, snapshotStates(fe, src))
+}
+// snapshotStates builds a state map for every backend referenced by fe. It
+// takes one read-lock per backend via the StateSource interface, so the
+// caller gets a consistent view to feed into the pure health helpers.
+func snapshotStates(fe config.Frontend, src StateSource) map[string]health.State {
states := make(map[string]health.State)
for _, pool := range fe.Pools {
for bName := range pool.Backends {
@@ -372,75 +410,7 @@ func EffectiveWeights(fe config.Frontend, src StateSource) map[int]map[string]ui
}
}
}
-activePool := activePoolIndex(fe, states)
-out := make(map[int]map[string]uint8, len(fe.Pools))
-for poolIdx, pool := range fe.Pools {
-out[poolIdx] = make(map[string]uint8, len(pool.Backends))
-for bName, pb := range pool.Backends {
-w, _ := asFromBackend(poolIdx, activePool, states[bName], pb.Weight)
-out[poolIdx][bName] = w
-}
-}
-return out
-}
-// activePoolIndex returns the index of the first pool in fe that contains at
-// least one backend currently in StateUp. This is the priority-failover
-// selector: pool[0] is the primary, pool[1] is the first fallback, and so on.
-// As long as pool[0] has any up backend, it stays active. When every pool[0]
-// backend leaves StateUp (down, paused, disabled, unknown), pool[1] is
-// promoted — and so on for further fallback tiers. When no pool has any up
-// backend, returns 0 (the return value is unobservable in that case since
-// every backend maps to weight 0 regardless of the active pool).
-func activePoolIndex(fe config.Frontend, states map[string]health.State) int {
-for i, pool := range fe.Pools {
-for bName := range pool.Backends {
-if states[bName] == health.StateUp {
-return i
-}
-}
-}
-return 0
-}
-// asFromBackend is the pure mapping from (pool index, active pool, backend
-// state, config weight) to the desired VPP AS weight and flush hint. This is
-// the single source of truth for the state → dataplane rule — every LB change
-// flows through this function.
-//
-// A backend gets its configured weight iff it is up AND belongs to the
-// currently-active pool. Every other case yields weight 0. The only
-// state that produces flush=true is disabled.
-//
-// state      in active pool   not in active pool   flush
-// --------   --------------   ------------------   -----
-// unknown    0                0                    no
-// up         configured       0 (standby)          no
-// down       0                0                    no
-// paused     0                0                    no
-// disabled   0                0                    yes
-// removed    handled separately (AS deleted via delAS)
-//
-// Flush semantics: flush=true means "if the AS currently has a non-zero
-// weight in VPP, drop its existing flow-table entries when setting weight
-// to 0". The reconciler only acts on flush when transitioning (current
-// weight > 0), so steady-state syncs never re-flush. Failover demotion
-// (e.g. pool[1] up→standby when pool[0] recovers) does NOT flush — we
-// let those sessions drain naturally.
-func asFromBackend(poolIdx, activePool int, state health.State, cfgWeight int) (weight uint8, flush bool) {
-switch state {
-case health.StateUp:
-if poolIdx == activePool {
-return clampWeight(cfgWeight), false
-}
-return 0, false
-case health.StateDisabled:
-return 0, true
-default:
-// unknown, down, paused: off, drain existing flows naturally.
-return 0, false
-}
+return states
}
// ---- API call helpers ------------------------------------------------------
@@ -461,6 +431,7 @@ func addVIP(ch *loggedChannel, d desiredVIP) error {
Encap: encap,
Type: lb_types.LB_API_SRV_TYPE_CLUSTERIP,
NewFlowsTableLength: defaultFlowsTableLength,
SrcIPSticky: d.SrcIPSticky,
IsDel: false,
}
reply := &lb.LbAddDelVipV2Reply{}
@@ -474,7 +445,8 @@ func addVIP(ch *loggedChannel, d desiredVIP) error {
"prefix", d.Prefix.String(),
"protocol", protocolName(d.Protocol),
"port", d.Port,
"encap", encapName(encap))
"encap", encapName(encap),
"src-ip-sticky", d.SrcIPSticky)
return nil
}
@@ -574,6 +546,136 @@ func setASWeight(ch *loggedChannel, prefix *net.IPNet, protocol uint8, port uint
return nil
}
// ---- VIP snapshot scrape ---------------------------------------------------
// TEMPORARY WORKAROUND: VPP's lb_vip_dump / lb_vip_details message does not
// return the src_ip_sticky flag, and lb_vip_details also doesn't expose the
// LB plugin's pool index — which is what the stats segment uses to key
// per-VIP counters. We work around both by running `show lb vips verbose`
// via the cli_inband API and parsing the human-readable output. This is
// slow and fragile (the format is not a stable API) and must be replaced
// once VPP ships an lb_vip_v2_dump that includes these fields.
// lbVIPSnapshot holds the per-VIP facts that the scrape recovers. `index`
// is the LB pool index (`vip - lbm->vips` in lb.c) used as the second
// dimension of the vip_counters SimpleCounterStat in the stats segment.
type lbVIPSnapshot struct {
index uint32
sticky bool
}
// queryLBVIPSnapshot runs `show lb vips verbose` and parses it into a map
// keyed by vipKey. The returned error is non-nil only when the cli_inband
// RPC itself fails.
func queryLBVIPSnapshot(ch *loggedChannel) (map[vipKey]lbVIPSnapshot, error) {
req := &vlib.CliInband{Cmd: "show lb vips verbose"}
reply := &vlib.CliInbandReply{}
if err := ch.SendRequest(req).ReceiveReply(reply); err != nil {
return nil, fmt.Errorf("cli_inband %q: %w", req.Cmd, err)
}
if reply.Retval != 0 {
return nil, fmt.Errorf("cli_inband %q: retval=%d", req.Cmd, reply.Retval)
}
return parseLBVIPSnapshot(reply.Reply), nil
}
// queryLBSticky is a thin projection of queryLBVIPSnapshot for callers that
// only care about the src_ip_sticky flag (the sync path).
func queryLBSticky(ch *loggedChannel) (map[vipKey]bool, error) {
snap, err := queryLBVIPSnapshot(ch)
if err != nil {
return nil, err
}
out := make(map[vipKey]bool, len(snap))
for k, v := range snap {
out[k] = v.sticky
}
return out, nil
}
// lbVIPHeaderRe matches the first line of each VIP block in the output of
// `show lb vips verbose`. VPP produces lines like:
//
// ip4-gre4 [1] 192.0.2.1/32 src_ip_sticky
// ip6-gre6 [2] 2001:db8::1/128
//
// Capture groups: 1 = pool index, 2 = prefix (CIDR), 3 = " src_ip_sticky" when present.
var lbVIPHeaderRe = regexp.MustCompile(
`\b(?:ip4-gre4|ip6-gre6|ip4-gre6|ip6-gre4|ip4-l3dsr|ip4-nat4|ip6-nat6)\s+\[(\d+)\]\s+(\S+)(\s+src_ip_sticky)?`,
)
// lbVIPProtoRe matches the `protocol:<n> port:<n>` sub-line that appears
// under each non-all-port VIP block.
var lbVIPProtoRe = regexp.MustCompile(`protocol:(\d+)\s+port:(\d+)`)
// parseLBVIPSnapshot turns the reply text of `show lb vips verbose` into a
// map of vipKey → lbVIPSnapshot. It walks lines sequentially: each header
// line starts a new VIP, and the following `protocol:<p> port:<port>` line
// (if any) refines the key. All-port VIPs have no protocol/port sub-line
// and are recorded with protocol=255 and port=0 — the same key format used
// by the rest of the sync path (see protocolFromConfig).
func parseLBVIPSnapshot(text string) map[vipKey]lbVIPSnapshot {
out := make(map[vipKey]lbVIPSnapshot)
var havePending bool
var curPrefix string
var curIndex uint32
var curSticky bool
var curProto uint8 = 255
var curPort uint16
flush := func() {
if !havePending || curPrefix == "" {
return
}
out[vipKey{prefix: curPrefix, protocol: curProto, port: curPort}] = lbVIPSnapshot{
index: curIndex,
sticky: curSticky,
}
}
for _, line := range strings.Split(text, "\n") {
if m := lbVIPHeaderRe.FindStringSubmatch(line); m != nil {
flush()
havePending = true
if idx, err := strconv.ParseUint(m[1], 10, 32); err == nil {
curIndex = uint32(idx)
} else {
curIndex = 0
}
curPrefix = canonicalCIDR(m[2])
curSticky = m[3] != ""
curProto = 255
curPort = 0
continue
}
if !havePending {
continue
}
if m := lbVIPProtoRe.FindStringSubmatch(line); m != nil {
if p, err := strconv.Atoi(m[1]); err == nil {
curProto = uint8(p)
}
if p, err := strconv.Atoi(m[2]); err == nil {
curPort = uint16(p)
}
}
}
flush()
return out
}
// canonicalCIDR parses a CIDR string and returns it in Go's canonical
// net.IPNet.String() form so it matches the keys produced by makeVIPKey.
// Returns the input unchanged if parsing fails — the header regex should
// have validated it, but a mismatch shouldn't panic the sync path.
func canonicalCIDR(s string) string {
_, ipnet, err := net.ParseCIDR(s)
if err != nil {
return s
}
return ipnet.String()
}
// ---- utility ---------------------------------------------------------------
func makeVIPKey(prefix *net.IPNet, protocol uint8, port uint16) vipKey {
@@ -618,13 +720,3 @@ func encapName(e lb_types.LbEncapType) string {
}
return fmt.Sprintf("%d", e)
}
-func clampWeight(w int) uint8 {
-if w < 0 {
-return 0
-}
-if w > 100 {
-return 100
-}
-return uint8(w)
-}