---
date: 2026-05-08T06:35:14Z
title: 'VPP with Maglev Loadbalancing - Part 2'
---
{{< image width="200px" float="right" src="/assets/vpp/fdio-color.svg" alt="VPP" >}}
## About this series
Ever since I first saw VPP - the Vector Packet Processor - I have been deeply impressed with its performance and versatility. For those of us who have used Cisco IOS/XR devices, like the classic ASR (aggregation service router), VPP will look and feel quite familiar as many of the approaches are shared between the two.
In a [[previous article]({{< ref "2026-04-30-vpp-maglev" >}})], I looked into the Maglev algorithm and how it is implemented in VPP. I fixed a couple of bugs in the API and added features to set weights for application server backends. In this article, I am going to describe an approach to a control plane for VPP's Maglev plugin.
## Introduction
For the VPP Maglev plugin to be truly useful, some automation has to govern its use of backends: which ones get how much traffic, which ones are unhealthy and need to be drained, and so on. Ideally, this control loop is fully automatic: when backends go missing either because they are down themselves or because the datacenter they are in decides to take the day off, it would be nice if the load balancer notices this and avoids sending traffic there. However, the VPP Maglev plugin does not offer any of these smarts. The plugin is a pure dataplane component that can sling packets to backends at very high rates, and all the rest is left as an exercise for the reader.
## VPP Maglev: Controlplane
The core problem is that VPP's `lb` plugin is pure dataplane. It holds a table of VIPs, each with a
set of application servers and their weights using the feature I added. It then hashes new flows
deterministically onto those servers. That is cool, but it is all it does. If a backend stops
responding, VPP does not know and does not care - it will keep sending traffic to that address until
someone or something tells it otherwise. The result is a black hole: clients trying to establish new
connections time out while waiting for a backend that will never respond.
Before I decided to write `vpp-maglev`, the fix for missing/down backends was manual: watch your
monitoring dashboards, notice when a backend is down, SSH into the machine running VPP, and use
`vppctl lb as ... del flush` to remove the dead backend. That works, but it obviously requires a
human in the loop and introduces a window of failure between the backend going down and the operator
reacting. For a production load balancer that is supposed to be invisible to users, this is not good
enough.
What IPng needs, at a high level, is a controlplane that can:
- Continuously probe each backend and maintain an accurate view of its health.
- Translate health state changes into VPP API calls immediately, without human intervention.
- Handle edge cases gracefully: what happens when `maglevd` itself restarts? When VPP restarts?
  When a backend is briefly playing Flappy Bird?
- Expose all of this state through a uniform API so that CLIs, dashboards, and monitoring scripts
  can all read from (and write to) the same source of truth.
To address my needs, I decided to write `vpp-maglev`, which ships as four binaries: `maglevd` (the
controlplane daemon), `maglevc` (a CLI for it), `maglevd-frontend` (a web dashboard for it), and
`maglevt` (an out-of-band test utility). The rest of this article goes through each one in detail.
## Design Principles
Before blindly writing code, I wrote down a few of the constraints I wanted to hold true. Wait, a design you say? Well, yes! And this design turned out to drive most of the architectural decisions:
**One source of truth.** Every component - CLI, web dashboard, alerting scripts - reads from
`maglevd` through one typed gRPC interface. There is no secondary control plane. The CLI and the
web dashboard show exactly the same state as each other because they both ask the same
controlplane daemon.
**Restart neutrality.** Restarting `maglevd` while VPP is serving live traffic must not cause user
interruption or traffic blackholing. A naive implementation would initialize an empty LB state upon
startup, because at that point the `vpp-maglev` daemon sees every backend in an initial unknown
state. I need to make sure I design for things like controlplane upgrades from the get-go, so they
are safe.
**Diff-based reconciliation.** I want to create a VPP sync that computes a desired state from the config and current observed health, then diffs it against what VPP already has, issuing only the minimum set of API calls to converge. This is not too dissimilar from the approach I took in [[vppcfg]({{< ref 2022-03-27-vppcfg-1 >}})], in that running the sync multiple times needs to produce the same outcome as running it once.
**Structured observability from the start.** Every state change needs to be accounted for in a structured JSON log, a Prometheus counter increment, and a streaming gRPC event. All three, every time. I find it very frustrating to debug production systems that have ad hoc log messages and no metrics, and if there's one thing a lifetime career of being an SRE has taught me, it is to set the observability bar high early.
## Health Checker: maglevd
`maglevd` is the long-running daemon at the center of everything. It needs some initial state
configuration present on the machine, so that cold restarts do not have to phone home to get a
running config. My first decision is to let it read a YAML configuration file that describes three
named collections: `healthchecks`, `backends` that reference the healthchecks, and `frontends`
that reference the backends.
The configuration structure maps directly onto the internal runtime model, sort of like this:
```yaml
maglev:
  healthchecks:
    http-check:
      type: http
      port: 80
      params:
        path: /.well-known/ipng/healthz
        response-code: "200-204"
      interval: 5s
  backends:
    nginx0-ams:
      address: 192.0.2.10
      healthcheck: http-check
    nginx1-ams:
      address: 192.0.2.11
      healthcheck: http-check
    nginx0-fra:
      address: 192.0.2.12
      healthcheck: http-check
  frontends:
    http-vip:
      address: 192.0.2.1
      protocol: tcp
      port: 80
      pools:
        - name: primary
          backends:
            nginx0-ams: { weight: 100 }
            nginx1-ams: { weight: 10 }
        - name: fallback
          backends:
            nginx0-fra: {}
```
A `healthcheck` defines how to probe - the protocol, port, success criteria, timing parameters,
and so on. A `backend` is a named IP address bound to a healthcheck. A `frontend` is a VIP
address with one or more named pools, where each pool is an ordered list of (backend, weight)
tuples. At runtime, each backend gets exactly one probe goroutine, regardless of how many
frontends reference it, which greatly cuts down on probe traffic.
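As a sketch, the YAML above could deserialize into Go types like these. The type and field names here are my assumptions for illustration, not `maglevd`'s actual structures:

```go
package main

import "fmt"

// Hypothetical Go types mirroring the maglev.yaml structure above; the
// real maglevd type and field names may differ.
type Config struct {
	Maglev struct {
		Healthchecks map[string]Healthcheck `yaml:"healthchecks"`
		Backends     map[string]Backend     `yaml:"backends"`
		Frontends    map[string]Frontend    `yaml:"frontends"`
	} `yaml:"maglev"`
}

type Healthcheck struct {
	Type     string            `yaml:"type"`     // icmp, tcp, http or https
	Port     int               `yaml:"port"`
	Params   map[string]string `yaml:"params"`   // e.g. path, response-code
	Interval string            `yaml:"interval"` // e.g. "5s"
}

type Backend struct {
	Address     string `yaml:"address"`
	Healthcheck string `yaml:"healthcheck"` // references a named healthcheck
}

type Frontend struct {
	Address  string `yaml:"address"`
	Protocol string `yaml:"protocol"`
	Port     int    `yaml:"port"`
	Pools    []Pool `yaml:"pools"` // ordered: earlier pools have priority
}

type Pool struct {
	Name     string            `yaml:"name"`
	Backends map[string]Member `yaml:"backends"`
}

type Member struct {
	Weight int `yaml:"weight"` // zero when omitted ({}); the daemon presumably applies a default
}

func main() {
	var cfg Config
	cfg.Maglev.Backends = map[string]Backend{
		"nginx0-ams": {Address: "192.0.2.10", Healthcheck: "http-check"},
	}
	fmt.Println(cfg.Maglev.Backends["nginx0-ams"].Address)
}
```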
Probes run on the configured schedule and their results flow through a state machine. State
changes emit events that the reconciler picks up and translates into VPP API calls and gRPC
streaming events for subscribed clients. The frontend's aggregate state, be it up, down, or
unknown, is derived from the effective weights of its backends and needs to be updated on every
backend transition.
Go's `slog` (structured logging) package emits machine-consumable JSON directly:
```json
{"level":"INFO","msg":"backend-transition","backend":"nginx0-ams","from":"down","to":"up","code":"L7OK","detail":""}
{"level":"INFO","msg":"frontend-transition","frontend":"http-vip","from":"down","to":"up"}
```
I don't really have to think about all of this state checking stuff from scratch. There are a few really good loadbalancers out there already! One of them is HAProxy, which I used a very long time ago. It features a really good health checking approach, the principles of which I am grateful to borrow for my own project.
### HAProxy: Learning from its Health Counter
The state machine is driven by a single integer borrowed from HAProxy's health model: given a `rise`
threshold and a `fall` threshold, define a counter `health` in the range `[0, rise + fall - 1]`. The
backend is considered up when `health >= rise` and down when `health < rise`.
On each probe, a pass increments the counter (ceiling at the maximum); a failure decrements it
(floor at zero). This gives hysteresis: a fully healthy backend sitting at the counter maximum needs
`fall` consecutive failures before it transitions to down, and a backend at zero needs `rise`
consecutive passes to come back up. A flapping backend that alternates between passing and failing
stays in the middle zone without bouncing between states - which is exactly what I want, because it
avoids a storm of VPP API calls from a noisy backend.
In pseudocode, here's what that simple yet elegant approach looks like:

```go
type HealthCounter struct {
	Health int // current counter value, in [0, Rise+Fall-1]
	Rise   int
	Fall   int
}

func (h *HealthCounter) Max() int   { return h.Rise + h.Fall - 1 }
func (h *HealthCounter) IsUp() bool { return h.Health >= h.Rise }

// RecordPass reports whether this probe transitioned the backend to up.
func (h *HealthCounter) RecordPass() bool {
	wasUp := h.IsUp()
	if h.Health < h.Max() {
		h.Health++
	}
	return !wasUp && h.IsUp()
}

// RecordFail reports whether this probe transitioned the backend to down.
func (h *HealthCounter) RecordFail() bool {
	wasUp := h.IsUp()
	if h.Health > 0 {
		h.Health--
	}
	return wasUp && !h.IsUp()
}
```
Taking an example of `rise=2, fall=3`, the health counter spans `[0, 4]`. The state boundary
sits between the 'down' side (health of 0 or 1) and the 'up' side (health of 2, 3 or 4). A fully
healthy backend sitting at health counter 4 will need three consecutive failures to go down:
4->3->2->1, with the transition firing on the third failure.
When a backend enters the `unknown` state, for example when the `vpp-maglev` daemon just started, or
after a backend was briefly paused or disabled, I try to be a bit more clever than HAProxy (famous
last words, I'm sure) by pre-setting the health counter to `rise - 1`. This means the very first
probe resolves the state immediately: one pass produces an `unknown` to `up` transition, and one
fail produces an `unknown` to `down` transition. The shortcut means that a backend which cannot pass
even its very first probe is immediately marked down and receives no traffic; I see no reason to
wait for its health to fall all the way down to 0.
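To make that concrete, here is a self-contained, runnable sketch of the counter including the unknown-state seeding; the transition-reporting semantics follow the pseudocode above:

```go
package main

import "fmt"

// Self-contained sketch of the HAProxy-style counter described above.
type HealthCounter struct {
	Health, Rise, Fall int
}

func (h *HealthCounter) Max() int   { return h.Rise + h.Fall - 1 }
func (h *HealthCounter) IsUp() bool { return h.Health >= h.Rise }

func (h *HealthCounter) RecordPass() bool { // true on a down->up flip
	wasUp := h.IsUp()
	if h.Health < h.Max() {
		h.Health++
	}
	return !wasUp && h.IsUp()
}

func (h *HealthCounter) RecordFail() bool { // true on an up->down flip
	wasUp := h.IsUp()
	if h.Health > 0 {
		h.Health--
	}
	return wasUp && !h.IsUp()
}

// NewUnknown seeds a counter at rise-1, so the very first probe resolves
// the unknown state: one pass flips it up, one failure keeps it down.
func NewUnknown(rise, fall int) *HealthCounter {
	return &HealthCounter{Health: rise - 1, Rise: rise, Fall: fall}
}

func main() {
	h := NewUnknown(2, 3)
	fmt.Println(h.RecordPass()) // true: a single pass resolves unknown -> up
	h.RecordPass()
	h.RecordPass() // saturates at Max() = 4
	h.RecordFail()
	h.RecordFail()
	fmt.Println(h.RecordFail()) // true: third consecutive failure flips to down
}
```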
**Probe types.** `maglevd` starts off its life supporting four probe types:
- `icmp` - sends an ICMP echo request and waits for a reply, for which I do not need to run the
  daemon with root privileges; instead I can assign `CAP_NET_RAW` for this purpose. This healthcheck
  type is useful for checking basic reachability without opening a TCP connection. Borrowing again
  from HAProxy, this can result in probe codes: `L4OK` on reply, `L4TOUT` on timeout, `L4CON` on
  send error.
- `tcp` - opens a TCP connection to the configured port and closes it cleanly. This healthcheck can
  optionally wrap the connection in TLS with parameter `ssl: true`, with optional server name and
  `insecure-skip-verify` to allow for self-signed certificates. The resulting probe codes are
  `L4OK` on connect, `L4CON` on refused, `L4TOUT` on timeout, and `L6OK`/`L6CON`/`L6TOUT` for TLS.
- `http` - opens a TCP connection, sends an HTTP/1.1 `GET` request to the configured path with an
  optional `Host` header, and validates the response code against a configured range (e.g.
  `"200-204"`). This healthcheck can optionally validate the body against a regular expression,
  making it similar to how Nagios does its checks. The probe return codes are: `L7OK` on success,
  `L7STS` on unexpected status code, `L7RSP` on bad response, and `L7TOUT` on timeout.
- `https` - a special case of the `http` healthcheck type, but using TLS. It supports the SNI
  `server-name` override and `insecure-skip-verify` as well, for backends with self-signed
  certificates.
One other thing I noticed while reading the HAProxy docs is that its probe timing is not fixed, but
depends on the counter state. A fully healthy backend (counter at maximum) is probed at the
configured `interval`. A degraded or unknown backend is probed at the faster `fast-interval`, to
be able to mark it either up or down more quickly. And a fully down backend is probed at the slower
`down-interval`. The result is that a recovering backend is re-evaluated quickly while one
that has been offline for a long time generates less probe traffic.
I add one additional detail (which I've learned the hard way when operating very large loadbalancer
pools with thousands of backends), namely jitter: every computed interval (fast, down or normal)
is scaled by a uniformly-random factor of ±10% so that all probe goroutines do not phase-lock to the
same wall-clock tick after a restart, and do not hit the backend at exactly the same time either.
Good for vpp-maglev and good for the backends. We can all win, sometimes :)
**Pool failover.** I've found it can be useful, mostly in smaller deployments like IPng's mail and webserver cluster, to have primary traffic stay local to the Maglev loadbalancer (e.g. a VPP Maglev instance in Amsterdam will select nginx backends in Amsterdam, not Paris or Zurich), but if they are all down, to then fall back to further-away backends in a different city.
This is how I came to the decision to give a frontend one or more pools, which act as
priority tiers. The idea is that the active pool is the first one that contains at least
one backend in `up` state. Backends in inactive pools have their weight effectively forced to zero
and will therefore receive no traffic. If all backends in the primary pool are down, the
weights of the next-best pool are re-evaluated, and when the backends in the primary pool
recover, demotion of the standby pool can be graceful thanks to the `lb as ... weight` feature I
added to VPP: existing flows to standby backends are left to drain naturally. Only an operator
disable call will trigger an immediate flow-table flush.
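The active-pool selection can be sketched in a few lines of Go. This is a hypothetical minimal model, with illustrative names rather than `maglevd`'s actual types:

```go
package main

import "fmt"

// Hypothetical minimal model of pool failover: the active pool is the
// first one with at least one up backend; all other pools get
// effective weight zero.
type PoolMember struct {
	Backend string
	Weight  int
}

type Pool struct {
	Name    string
	Members []PoolMember
}

// effectiveWeights returns backend -> programmed weight, given which
// backends are currently up.
func effectiveWeights(pools []Pool, up map[string]bool) map[string]int {
	out := map[string]int{}
	active := -1
	for i, p := range pools { // pools are ordered by priority
		for _, m := range p.Members {
			if up[m.Backend] {
				active = i
				break
			}
		}
		if active >= 0 {
			break
		}
	}
	for i, p := range pools {
		for _, m := range p.Members {
			w := 0
			if i == active && up[m.Backend] {
				w = m.Weight
			}
			out[m.Backend] = w
		}
	}
	return out
}

func main() {
	pools := []Pool{
		{Name: "primary", Members: []PoolMember{{"nginx0-ams", 100}, {"nginx1-ams", 10}}},
		{Name: "fallback", Members: []PoolMember{{"nginx0-fra", 100}}},
	}
	// Primary has an up backend, so the fallback pool is forced to zero.
	fmt.Println(effectiveWeights(pools, map[string]bool{"nginx0-ams": true}))
	// Primary fully down: the fallback pool takes over.
	fmt.Println(effectiveWeights(pools, map[string]bool{"nginx0-fra": true}))
}
```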
## Controlplane API: gRPC Endpoint
I want all client-visible functionality to be exposed through a single gRPC service. Read-only questions like 'how many frontends are there?' or 'what is the current health state of backend X?', but also state-changing operations like 'set frontend F's backend B to weight W', need to be simple RPCs.
The most powerful RPC I add is called `WatchEvents`. This one returns a streaming response, and a
client can initiate a `WatchRequest` which specifies which event types to include. The `vpp-maglev`
daemon then pushes events as they happen - there is no polling. The event envelope is a protobuf `oneof`:
```proto
message Event {
  oneof event {
    LogEvent log = 1;           // structured log record with key/value attrs
    BackendEvent backend = 2;   // backend state transition
    FrontendEvent frontend = 3; // frontend aggregate state change
  }
}
```
Using this approach allows the maglev daemon to send useful information to downstream consumers like
a CLI or WebUI in a simple yet extensible way. I imagine a CLI command like `watch events`, or a web
dashboard that shows health checks and state transitions in realtime. Those will be super useful:
state changes can be observed within milliseconds without any busy-waiting or polling.
In the process of writing `vpp-maglev`, I learned about gRPC server reflection, which I've enabled
by default, so I can poke at the API without having the `.proto` file, for example using `grpcurl`
on the commandline:
```bash
pim@summer:~$ grpcurl -plaintext localhost:9090 list
pim@summer:~$ grpcurl -plaintext localhost:9090 maglev.Maglev/ListFrontends
pim@summer:~$ grpcurl -plaintext -d '{"name":"http-vip"}' localhost:9090 maglev.Maglev/GetFrontend
```
## Dataplane API: VPP Plugin Programming
There are two parts to programming the VPP dataplane state. First, a reconciler reacts to individual backend state transitions, and then a VPP LB Sync module computes a minimal set of API calls to make the dataplane reflect the backend state as seen by the controlplane daemon.
I tried to keep the reconciler as simple as possible. It only subscribes to the healthchecker's
event channel and, for every backend transition, calls `SyncLBStateVIP` for the affected frontend.
To catch drift in the VPP dataplane, for example if VPP restarted or if we re-connected to VPP, a
periodic `SyncLBStateAll` also runs and sweeps up any changes. Drift should not occur in general
operation, though; the periodic sync is a belt-and-suspenders type of thing.
This isolated `SyncLBState*` stuff is also a future hook for divorcing the healthchecker and the LB
reconciler into two different binaries: think of a datacenter with 100 maglev frontends and 1000
local backends. In such a scenario, having three (N+2) healthcheckers should be sufficient; there is
no need to have every maglev check every backend!
Otherwise, the reconciler carries no state of its own. I put all the logic in `SyncLBStateVIP`,
which computes the full desired state from the config and current health, diffs it against what VPP
has, and issues only the necessary Binary API calls to bring the two in sync.
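The diff step itself is conceptually simple. A sketch under assumed types, where desired and actual state are maps from application-server address to weight, and `Call` is a hypothetical stand-in for the real Binary API messages:

```go
package main

import "fmt"

// Call is a hypothetical stand-in for a VPP Binary API operation.
type Call struct {
	Op      string // "add", "update" or "del"
	Address string
	Weight  int
}

// diff computes the minimal set of calls to make actual match desired.
func diff(desired, actual map[string]int) []Call {
	var calls []Call
	for addr, w := range desired {
		cur, ok := actual[addr]
		if !ok {
			calls = append(calls, Call{"add", addr, w})
		} else if cur != w {
			calls = append(calls, Call{"update", addr, w})
		}
	}
	for addr := range actual {
		if _, ok := desired[addr]; !ok {
			calls = append(calls, Call{"del", addr, 0})
		}
	}
	return calls
}

func main() {
	desired := map[string]int{"192.0.2.10": 100, "192.0.2.11": 10}
	actual := map[string]int{"192.0.2.10": 100, "192.0.2.12": 50}
	for _, c := range diff(desired, actual) {
		fmt.Println(c.Op, c.Address, c.Weight)
	}
	// Diffing a state against itself yields no calls: the sync is idempotent.
	fmt.Println(len(diff(desired, desired)))
}
```

Because the diff of identical states is empty, running the sync twice produces the same outcome as running it once, which is exactly the idempotency property described under Design Principles.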
## Dataplane API: Startup Warmup
{{< image width="7em" float="left" src="/assets/shared/warning.png" alt="Warning" >}}
During one of my tests, I noticed that after restarting `maglevd`, it completely wipes the VPP
loadbalancer VIPs. In hindsight this makes total sense: when the healthchecker starts, all
backends are in `unknown` state, which causes the weights to be zero until the backends transition
to the `up` state. This causes thrashing in the dataplane, which is not what I intended. I think for
a bit and decide how I'm going to prevent that. My solution is a two-phase startup warmup controlled
by `startup-min-delay` (default 5s) and `startup-max-delay` (default 30s):
**Phase 1: hands-off window.** For the first `startup-min-delay` seconds after `maglevd` starts,
neither the reconciler nor the periodic sync loop can touch VPP at all. Probes run, the checker
accumulates state, but transitions are suppressed at the dataplane. VPP continues serving whatever
it was programmed with before the restart.
**Phase 2: per-VIP release.** Between `startup-min-delay` and `startup-max-delay`, each VIP is
released as soon as every backend it references has reached a non-unknown state. A background
poll running every 250 milliseconds checks for releasable VIPs, and the reconciler also checks
on every received transition. Whichever wins the race performs a single `SyncLBStateVIP` for that
VIP. It is free to live its life.
**Watchdog.** At `startup-max-delay`, any VIP whose backends are still unknown is swept by a
final `SyncLBStateAll`. Those stragglers are programmed with weight zero: something is still wrong
with them, but this is an unlikely situation, and one of those belt-and-suspenders things again.
## Controlplane CLI: maglevc
`maglevc` connects to a running `maglevd` over gRPC and either executes a single command or drops
into an interactive shell. The same command tree is available in both modes:
```bash
pim@summer:~$ maglevc show frontends
pim@summer:~$ maglevc show backends nginx0-nlams0
pim@summer:~$ maglevc --color=false show vpp lb state
pim@summer:~$ maglevc --server chbtl2.net.ipng.ch:9090 watch events log level debug backend
```
In interactive mode, the prompt is `maglev> `. I put real effort into the shell experience because
this is the tool I reach for constantly when I want to interact with the system. I'm inspired by
Bird and try to mimic its look and feel, which will come in handy as IPng Networks uses Bird in
our routing controlplane. Having these tools all look and feel the same really helps, especially
when fecal matter hits the fast-spinning cooling device.
### Command Tree and Completion
The CLI is built around a tree of command nodes. Each node carries a short description used for
inline help, a list of fixed keyword children, and optionally a live-completion function that
fetches candidates from the runtime state when the tab key is pressed. For backend names, the
completion function calls `ListBackends` with a one-second timeout; for frontend names,
`ListFrontends`; and so on. Unambiguous prefixes complete in place; multiple matches are listed so
I know what to type next. I saw this trick first in the SR Linux command-line interface, and I like
the in-line completion logic a lot. As the Dutch would say, 'Beter goed gestolen dan slecht bedacht'.
Prefix matching means I never have to type the full command. `sh ba nginx0` is equivalent to
`show backends nginx0`, and `sh vpp l s` expands to `show vpp lb state`. This was important to me
because I am often working in a hurry and do not want to type long commands.
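The matching logic at each level of the tree can be sketched as follows (illustrative names, not the actual `maglevc` code):

```go
package main

import (
	"fmt"
	"strings"
)

// matchPrefix resolves a token against the keywords at one level of a
// command tree: an exact or unambiguous prefix match returns the full
// keyword; an ambiguous token returns the list of candidates instead.
func matchPrefix(keywords []string, tok string) (string, []string) {
	var hits []string
	for _, k := range keywords {
		if k == tok {
			return k, nil // exact match always wins
		}
		if strings.HasPrefix(k, tok) {
			hits = append(hits, k)
		}
	}
	if len(hits) == 1 {
		return hits[0], nil // unambiguous prefix completes in place
	}
	return "", hits // ambiguous (or no) match: list the candidates
}

func main() {
	kws := []string{"show", "set", "watch", "config"}
	fmt.Println(matchPrefix(kws, "sh")) // unambiguous: completes to show
	fmt.Println(matchPrefix(kws, "s"))  // ambiguous between show and set
	fmt.Println(matchPrefix(kws, "w"))  // unambiguous: completes to watch
}
```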
Inline help via `?` will print the available completions for the current cursor position with
a short description next to each keyword. The `?` character is not consumed - the input line is
unchanged after the help display, which is identical to how Bird handles `?`.
Color mode defaults to on in the interactive shell and off in one-shot mode, so piped output is
always clean. You can override either default with `--color=true` or `--color=false`. This is of
course not necessary, but sometimes it is helpful to see the difference between static tokens and
variable nouns in the output. I like it, anyway :)
### Viewing State
The most frequently used commands are the `show` family. `show backends <name>` shows the current
state, the enabled flag, the healthcheck, and the recent transition history with timestamps:
```
maglev> show backends nginx0-chplo0
name          nginx0-chplo0
address       2001:678:d78:7::2:0
state         up for 5d19h23m35s
enabled       true
healthcheck   nginx
transitions   down → up           2026-04-24 18:19:51.608   5d19h23m35s ago
              up → down           2026-04-23 22:14:48.311   6d15h28m39s ago
              unknown → up        2026-04-22 09:44:31.664   8d3h58m55s ago
              disabled → unknown  2026-04-22 09:44:30.628   8d3h58m56s ago
              up → disabled       2026-04-22 09:41:54.495   8d4h1m33s ago
```
`show frontends <name>` shows both the configured weight and the effective weight for every backend
in every pool. The effective weight is what was actually programmed into VPP after the pool failover
logic:
```
maglev> show frontends nginx-ip6-https
name           nginx-ip6-https
address        2001:678:d78::1:0:1
protocol       tcp
port           443
src-ip-sticky  false
flush-on-down  true
description    IPv6 HTTPS VIP - nginx backends
pools
  name      primary
  backends  nginx0-chlzn0  weight 100  effective 100
            nginx0-chplo0  weight 100  effective 0  [disabled]
  name      secondary
  backends  nginx0-nlams0  weight 100  effective 0
            nginx0-frggh0  weight 100  effective 0
```
Here, I brought `nginx0-chplo0` down, so its effective weight is zero; the two instances
`nginx0-nlams0` and `nginx0-frggh0` are in the secondary pool, which is inactive because the primary
pool still has `nginx0-chlzn0` up and serving (all) the traffic.
### VPP State - A Separate Concern
One design decision I am happy with is keeping the maglevd view of the world (frontend and backend
state, health counters, effective weights) completely separate from the VPP view (what is actually
programmed in the dataplane). Both are visible through maglevc, but through different commands:
```
maglev> show frontends        # maglevd's view: pools, backends, effective weights
maglev> show vpp lb state     # VPP's view: VIPs, AS addresses, bucket counts
maglev> show vpp lb counters  # VPP's view: per-VIP packet/byte counters
```
The `show vpp lb state` command shows the VPP load-balancer state as the plugin sees it: each VIP
with its application servers, their VPP-side weights, and how many of the 1024 Maglev hash buckets
are assigned to each AS. This is invaluable for confirming that a sync operation actually reached
VPP, and for debugging bucket distribution across backends with different weights.
### Operator Actions
The `set` commands drive mutations. `set backend <name> pause` stops the probe goroutine and drives
the effective weight to zero; `set backend <name> disable` does the same but also flushes existing
flows. `set backend <name> resume` and `set backend <name> enable` restart probing and recompute
effective weights when the backend is ready to serve again.
Weight changes are immediate:
```
maglev> set frontend nginx-ip6-https pool primary backend nginx0-chplo0 weight 0
maglev> set frontend nginx-ip6-https pool primary backend nginx0-chplo0 weight 0 flush
maglev> set backend nginx0-chplo0 disable
```
The first command gracefully drains nginx0-chplo0 from the pool primary in frontend
nginx-ip6-https. When setting the weight to zero, new flows go elsewhere but existing ones finish.
The second flushes existing flows immediately. The third command then marks the backend as disabled,
which will remove it from serving in all pools it's a member of. This is useful when performing
maintenance on a backend, and it's the command I ran in the 'show frontend' output above.
Arguably the coolest idea, `maglevc watch events`, streams everything in real time. Combined with
`log level debug`, it shows every probe attempt and every VPP API call as they happen:
```
maglev> watch events log level debug backend
{"backend":{"backendName":"nginx0-chlzn0","transition":{"from":"up","to":"up"}}}
{"backend":{"backendName":"nginx0-chplo0","transition":{"from":"up","to":"up"}}}
{"backend":{"backendName":"nginx0-frggh0","transition":{"from":"up","to":"up"}}}
{"backend":{"backendName":"nginx0-nlams0","transition":{"from":"up","to":"up"}}}
{"log":{"atUnixNs":"1777558154335278835","level":"DEBUG","msg":"probe-start",
  "attrs":[{"key":"backend","value":"nginx0-chplo0"},{"key":"type","value":"https"}]}}
{"log":{"atUnixNs":"1777558154371619020","level":"DEBUG","msg":"probe-done",
  "attrs":[{"key":"backend","value":"nginx0-chplo0"},{"key":"type","value":"https"},
  {"key":"ok","value":"true"},{"key":"code","value":"L7OK"},{"key":"detail"},
  {"key":"elapsed","value":"36ms"}]}}
```
And finally, I mimic Bird's "reconfigure" with a set of two primitives, `config check` and
`config reload`, which let me validate and apply configuration changes without restarting the
daemon. With that, the maglev daemon, the main brains of the operation, is feature complete.
## Test Utility: maglevt
Once `maglevd` is running and `maglevc` shows everything healthy, the natural next question is: does
it actually work end-to-end? A healthcheck passing means the backend can accept a TCP connection
or return an HTTP 200, but it does not tell me whether a client hitting the VIP actually reaches the
right backend, or whether failover is visible at the application level.
I wanted a tool that could sit outside the control plane entirely - not talking gRPC, not reading
`maglevd` state - but just hitting the VIPs directly as a real client would, tallying which backend
served each request. The obvious approach is to configure each backend to include its own hostname
in an HTTP response header. On my nginx servers I add a header `X-IPng-Frontend` which returns the
local `$hostname` variable. Then a probe tool that reads `X-IPng-Frontend` from each response can
show the live distribution across backends, and a failover is immediately visible as a
redistribution of the tally.
That idea turns into `maglevt`, which reads one or more `maglev.yaml` files, enumerates the
HTTP/HTTPS frontends, and probes each VIP at a configurable interval (default 100ms per VIP, with
±10% jitter to prevent phase-locking). Each probe opens a fresh TCP connection - keep-alives are off
by default - so every request is independently hashed by VPP's Maglev algorithm. The tally
reshuffles the moment a backend goes down or a standby pool activates.
The UI is a terminal dashboard built with [Bubble Tea], a Go TUI library. Each VIP gets a tile showing a rolling latency summary (min, max, average, p95), running success and failure counts, the response header tally, and a list of errors, like so:
{{< image width="100%" src="/assets/vpp-maglev/maglevt.png" alt="VPP Maglev TUI client" >}}
There's a lot to see in this screenshot, so let me unpack it. I'm running maglevt on a machine at
AS12859, BIT in the Netherlands, called nlede01.paphosting.net. It's reaching the VIPs that are
announced in Amsterdam, the Netherlands (vip0.l.ipng.ch) and Lille, France (vip1.l.ipng.ch), and
it is doing so with both IPv4 and IPv6, and it is doing so on port 80 and 443, which yields eight
targets. The webservers are configured to respond with an empty HTTP 204 response, and I've replayed
about 1Mio requests to each VIP. A few of these failed, which was mostly me playing around with
backend drains/flushes, hostile shutdowns (rebooting an nginx), and VIP failover scenarios. Then,
each VIP shows its last 100 probes in terms of latency, latency tail, and success rate.
In the second section, the tool shows how many times a response carried a given value of the HTTP header. The greyed-out ones are values that have not been seen in the last five seconds, the white ones are current: it shows that this client is consistently hashed to one frontend at a time (because each row has exactly one bright white entry), as this test is using HTTP keepalive.
In the bottom section, a list of recent events is shown - mostly moments when the latency ceiling was hit. These 'spikes' are written in bright yellow; if things like timeouts occur, they are written in bright red.
{{< image width="4em" float="left" src="/assets/vpp-maglev/Claude_AI.svg" alt="Claude Code" >}}
I have to be honest here: before this project I had never written a Terminal UI in my life. The Bubble Tea documentation is good but the model - a pure functional message-passing loop - took me a while to internalize. I ended up leaning on Claude quite a bit to get the layout right, especially the live-updating cells and the latency histogram accumulation.
What I found was that I could describe what I wanted in plain language and the code that came back was usually correct and idiomatic. I then spent time reading and understanding the code before committing it. I learned a lot about how Go handles terminal output and about the Elm architecture that Bubble Tea is based on - much faster than I would have on my own. Having an AI collaborator that writes correct code does not mean I can stop learning; if anything, having working code in front of me makes the learning faster!
## Frontend: GUI maglevd-frontend
Now that I'm in "yes, I vibe"-admission-mode, there's another type of component I've rarely if ever
worked on: web frontends! `maglevd-frontend` is a single Go binary with a
[SolidJS] single-page app embedded at build time via `//go:embed` - no
runtime file dependencies, no Node.js required after the build. Simple and standalone.
One design goal I set early was to be able to observe all my load balancer instances from a single
dashboard. `maglevd-frontend` connects to one or more `maglevd` instances, specified via the
`--server` flag at startup.
At the top of the page, I add a scope selector: one pill per configured `maglevd`, colored green when
the frontend's gRPC channel to that instance is alive and red when it cannot connect. Clicking a pill
switches the entire view to that site's frontends. I notice that reloading the page resets the
selection, so I add a cookie to persist it across page reloads.
## Frontend: Live Event Streaming
I learn about Server-Sent Events (SSE): `maglevd-frontend` subscribes to `WatchEvents` on each
configured `maglevd` and translates the gRPC stream into SSE events on the `/view/api/events`
endpoint. The browser's `EventSource` API reconnects automatically on disconnect, and the server
maintains a 30-second / 2000-event ring buffer so that a page reload replays recent events using
`Last-Event-ID`. I'm pleased with the result: a dashboard that stays current in real time with no
polling, and visible catch-up after a brief disconnect, like a laptop lid close.
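The replay mechanism can be sketched as a small ring buffer. The behavior is my assumption of how such a buffer works (the real one is also bounded by age); names are illustrative:

```go
package main

import "fmt"

// Ring sketches a Last-Event-ID replay buffer: events get monotonically
// increasing IDs, and a reconnecting EventSource client sends the last
// ID it saw so the server can replay what it missed.
type Ring struct {
	events []string // bounded; the real daemon also drops events older than 30s
	nextID int
}

func (r *Ring) Append(data string) int {
	id := r.nextID
	r.nextID++
	r.events = append(r.events, data)
	if len(r.events) > 2000 {
		r.events = r.events[1:] // drop the oldest event
	}
	return id
}

// Since returns SSE frames for every event newer than lastID.
func (r *Ring) Since(lastID int) []string {
	var frames []string
	first := r.nextID - len(r.events) // ID of the oldest retained event
	for i, e := range r.events {
		id := first + i
		if id > lastID {
			frames = append(frames, fmt.Sprintf("id: %d\ndata: %s\n\n", id, e))
		}
	}
	return frames
}

func main() {
	r := &Ring{}
	r.Append(`{"backend":"nginx0-ams","to":"up"}`)   // id 0
	r.Append(`{"frontend":"http-vip","to":"up"}`)    // id 1
	for _, f := range r.Since(0) {                   // client last saw event 0
		fmt.Print(f)                                 // replays only event 1
	}
}
```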
When a backend transitions from up to down, the badge in the frontend card updates within
milliseconds. A pool failover - where the primary pool empties and the fallback pool activates -
appears as a cascade of state changes followed by a re-rendering of the effective weight column. The
LB buckets column (showing VPP's actual hash table allocation for each AS) is refreshed via a
debounced GetVPPLBState scrape on every transition, at most once per second per maglevd. And
looking at this frontend, it may be clear to you why I designed the backend to have a subscribable
event stream:
{{< image width="100%" src="/assets/vpp-maglev/maglev-frontend.png" alt="VPP Maglev Frontend" >}}
The tech stack for the Single Page App is [SolidJS], a super cool reactive framework that compiles away its virtual DOM and produces small, fast bundles. I chose it over React partly because I was curious about it and partly because the bundle size matters when you are embedding the whole thing in a Go binary. The event store is a simple Solid signal that the SSE handler updates; every component that cares re-renders automatically without explicit subscription management. It's slick and much easier to use than I had initially thought!
## Frontend: Admin Surface
When both `MAGLEV_FRONTEND_USER` and `MAGLEV_FRONTEND_PASSWORD` environment variables are set, the
admin surface is activated at `/admin/`. I make sure that without credentials, `/admin/` returns
404: the admin path is then not just unprotected, it is entirely absent. Security matters,
at least a little bit, even if the frontend will not be exposed to the Internet.
In admin mode, every backend row grows a ⋮ (kebab) menu with pause, resume, enable,
disable, and set weight entries. Lifecycle actions open a confirmation dialog that spells out the
dataplane consequence: disable explicitly warns that it will drop live sessions via the flow-table
flush. The weight dialog has a 0-100 slider and a flush existing flows checkbox - unchecked is the
graceful drain, checked is the immediate session-drop path.
Also in admin mode, a Debug panel at the bottom of the page tails every event the SPA has seen
across all maglevd instances: backend and frontend transitions, log lines, VPP LB sync events, and
connection status flips, all formatted for scanning. A scope filter narrows the tail to the current
maglevd; an all maglevds checkbox enables firehose mode; a pause button freezes the tail so
you can read back.
## Results
I rolled this out at IPng Networks a few weeks ago, and it's been running rock solid ever since.
I took four VPP machines, connected them to the core routers, and started to announce two VIPs,
each announced in two cities. `vip0` is announced from Zurich (Switzerland) and Amsterdam (the
Netherlands), and `vip1` is announced from Lucerne (Switzerland) and Lille (France). I've moved over
most websites, as I find putting skin in the game is important:
```bash
pim@summer:~$ host ipng.ch
ipng.ch has address 194.1.163.31
ipng.ch has address 194.126.235.31
ipng.ch has IPv6 address 2001:678:d78::1:0:1
ipng.ch has IPv6 address 2a0b:dd80::1:0:1
```
The only service I'm a bit apprehensive about - even though I don't think I need to be - is the [Static CT Logs], which do about 2.5kqps of reads and 400qps of writes at the moment. The plan is to let this marinate for a few weeks, and then move the read-path and later on, also the write-path to this construction.
## What's Next
Using Maglev has a few significant benefits. Most importantly, I can drain (or weather an outage of) any nginx frontend within seconds, and there is no more DNS propagation delay. Another key property is that the loadbalanced VIPs themselves are now completely mobile, and anycasted. I can drain a VPP loadbalancer by simply removing its announcement of the VIPs, and anycast routing will seamlessly move the traffic to another live replica. This also insulates IPng against site, datacenter, and machine failures, as rerouting happens within only a few seconds.
However, there are also a few smaller downsides. Notably, this setup is more complex than merely having "the webserver": there are now half a dozen webservers, and potentially half a dozen places where traffic can enter the system, which poses a challenge for observability. In an upcoming article, I'll spend some time thinking through how to make that as easy as possible, with Prometheus and Grafana dashboards, as well as a clever trick to see which Maglev loadbalancer sent which request to which IPng nginx frontend. If this type of thing is interesting to you, stay tuned!