The core problem is that VPP's `lb` plugin is pure dataplane. It holds a table of VIPs, each with a
set of application servers and, using the feature I added, their weights. It then hashes new flows
deterministically onto those servers. That is cool, but it is all it does. If a backend stops
responding, VPP does not know and does not care - it will keep sending traffic to that address until
someone or something tells it otherwise. The result is a black hole: clients trying to establish new
connections time out while waiting for a backend that will never respond.

Before blindly writing code, I wrote down a few of the constraints I wanted to hold true. Wait, a
design, you say? Well, yes! And this design turned out to drive most of the architectural decisions:

**One source of truth.** Every component - CLI, web dashboard, alerting scripts - reads `maglevd`
through one typed gRPC interface. There is no secondary control plane. The CLI and the web dashboard
show exactly the same state as each other because they both ask the same controlplane daemon.

A **healthcheck** defines how to probe - the protocol, port, success criteria, timing parameters,
and so on. A **backend** is a named IP address bound to a healthcheck. A **frontend** is a VIP
address with one or more named **pools**, where each pool is an ordered list of `(backend, weight)`
tuples. At runtime, each backend gets exactly one probe (which Go lets me use goroutines for),

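To make that data model concrete, here is a minimal Go sketch of how these objects could be
represented - the type and field names are my own illustration, not necessarily how maglevd defines
them:

```go
// A Go sketch of the configuration model described above; type and field names
// are my own illustration, not necessarily how maglevd defines them.
package config

import "time"

// Healthcheck defines how to probe: protocol, port, success criteria, timing.
type Healthcheck struct {
	Protocol string        `yaml:"protocol"` // icmp, tcp, http or https
	Port     int           `yaml:"port"`
	Interval time.Duration `yaml:"interval"`
	Timeout  time.Duration `yaml:"timeout"`
	Rise     int           `yaml:"rise"` // consecutive passes to become up
	Fall     int           `yaml:"fall"` // consecutive failures to become down
}

// Backend is a named IP address bound to a healthcheck.
type Backend struct {
	Address     string `yaml:"address"`
	Healthcheck string `yaml:"healthcheck"`
}

// PoolEntry is one (backend, weight) tuple; a pool is an ordered list of them.
type PoolEntry struct {
	Backend string `yaml:"backend"`
	Weight  uint32 `yaml:"weight"`
}

// Frontend is a VIP address with one or more named pools.
type Frontend struct {
	VIP   string                 `yaml:"vip"`
	Pools map[string][]PoolEntry `yaml:"pools"`
}
```
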
On each probe, a pass increments the counter (ceiling at maximum); a failure decrements it (floor
at zero). This gives **hysteresis**: a backend sitting at the rise boundary needs `fall`
consecutive failures before it transitions to down, and a fully-down backend needs `rise`
consecutive passes to come back up. A flapping backend that alternates between passing and failing
stays in the degraded zone without bouncing between states - which is exactly what I want, to
avoid a storm of VPP API calls from a noisy backend.

In _pseudocode_, here's what that simple yet elegant approach looks like:

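A rough Go rendering of that rise/fall counter - my own sketch, with thresholds that interpret the
description above rather than quote maglevd's actual code - could look like this:

```go
// My own sketch of the rise/fall hysteresis counter described above; the
// thresholds are an interpretation, not maglevd's actual implementation.
package health

type State int

const (
	Unknown State = iota
	Up
	Down
)

type Counter struct {
	rise, fall int // consecutive passes / failures needed to transition
	score      int // clamped to [0, rise+fall]
	state      State
}

func NewCounter(rise, fall int) *Counter {
	return &Counter{rise: rise, fall: fall, state: Unknown}
}

// Observe records one probe result and returns the (possibly unchanged) state.
func (c *Counter) Observe(pass bool) State {
	ceiling := c.rise + c.fall
	if pass {
		if c.score < ceiling {
			c.score++ // ceiling at maximum
		}
	} else if c.score > 0 {
		c.score-- // floor at zero
	}

	switch {
	case c.score >= c.rise:
		c.state = Up // a fully-down backend needs `rise` passes to get here
	case c.score <= c.rise-c.fall || c.score == 0:
		c.state = Down // from the rise boundary, `fall` failures land here
	default:
		// In between, the previous state sticks: a flapping backend stays put
		// instead of generating a storm of VPP API calls.
	}
	return c.state
}
```
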
**Probe types.** `maglevd` starts off its life supporting four probe types (a sketch of the `http`
probe follows the list):

- **`icmp`** - sends an ICMP echo request and waits for a reply, for which I do not need to run the
daemon with root privileges; instead, I can assign `CAP_NET_RAW` for this purpose. This healthcheck
type is useful for checking basic reachability without opening a TCP connection. Borrowing again
from HAProxy, this can result in probe codes: `L4OK` on reply, `L4TOUT` on timeout, `L4CON` on send
error.
- **`tcp`** - opens a TCP connection to the configured port and closes it cleanly. This healthcheck
can optionally wrap the connection in TLS with parameter `ssl: true`, with optional server name and
`insecure-skip-verify` to allow for self-signed certificates. The resulting probe codes are `L4OK`
on connect, `L4CON` on refused, `L4TOUT` on timeout, `L6OK`/`L6CON`/`L6TOUT` for TLS.
- **`http`** - opens a TCP connection, sends an HTTP/1.1 `GET` request to the configured path with
an optional `Host` header, and validates the response code against a configured range (e.g.
`"200-204"`). This healthcheck can optionally validate the body against a regular expression, making
it similar to how Nagios does its checks. The probe return codes are: `L7OK` on success, `L7STS` on
unexpected status code, `L7RSP` on bad response, and `L7TOUT` on timeout.
- **`https`** - this is a special case of the `http` healthcheck type, but using TLS. It supports
the use of SNI `server-name` override and `insecure-skip-verify` as well for backends with
self-signed certificates.

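As promised above, here is a hedged Go sketch of what the `http` probe could look like. The
function name, parameters and the mapping of errors onto probe codes are illustrative; the success
range is assumed to be parsed into `okMin`/`okMax` already, and connection-level errors are
simplified:

```go
// A sketch of the `http` probe; names and error handling are illustrative,
// not maglevd's actual code.
package probe

import (
	"context"
	"errors"
	"fmt"
	"io"
	"net/http"
	"regexp"
	"time"
)

func ProbeHTTP(addr, path, host string, okMin, okMax int, bodyRE *regexp.Regexp, timeout time.Duration) string {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()

	req, err := http.NewRequestWithContext(ctx, http.MethodGet, fmt.Sprintf("http://%s%s", addr, path), nil)
	if err != nil {
		return "L7RSP"
	}
	if host != "" {
		req.Host = host // optional Host header override
	}

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		if errors.Is(err, context.DeadlineExceeded) {
			return "L7TOUT" // timed out
		}
		return "L7RSP" // treat other failures as a bad response in this sketch
	}
	defer resp.Body.Close()

	if resp.StatusCode < okMin || resp.StatusCode > okMax {
		return "L7STS" // unexpected status code
	}
	if bodyRE != nil {
		body, err := io.ReadAll(resp.Body)
		if err != nil || !bodyRE.Match(body) {
			return "L7RSP" // bad or non-matching response body
		}
	}
	return "L7OK"
}
```
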
The most powerful RPC I add is called `WatchEvents`. This one returns a streaming response, and a
client can initiate a `WatchRequest` which specifies which event types to include. The `vpp-maglev`
daemon then pushes events as they happen - there is no polling. The event envelope is a protobuf
`oneof`.

I tried to keep the reconciler as simple as possible. It only subscribes to the healthchecker's
event channel and, for every backend transition, calls `SyncLBStateVIP` for the affected frontend. To
catch drift in the VPP dataplane, for example if VPP restarted or if we re-connected to VPP, a
periodic `SyncLBStateAll` also runs and sweeps up any changes. This should not occur in general
operation, though; it's a belt-and-suspenders type of thing.

This isolated `SyncLBState*` stuff is also a future hook for divorcing the healthchecker and the LB
reconciler into two different binaries: think of a datacenter with 100 maglev frontends and 1000
backends - it would be wasteful to have every maglev check every backend!

Otherwise, the reconciler carries no state of its own. I put all the logic in `SyncLBStateVIP`,
which computes the full desired state from the config and current health, diffs it against what VPP
has, and issues only the necessary Binary API calls to bring the two in sync.

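A sketch of that diff-and-apply shape in Go - the `VPPClient` interface and its method names are
placeholders for illustration, not the real binary API bindings maglevd uses:

```go
// A sketch of the diff-and-apply shape described above; VPPClient and its
// method names are placeholders, not the real VPP binary API bindings.
package reconciler

type VPPClient interface {
	GetBackends(vip string) (map[string]uint32, error) // address -> weight as VPP sees it
	SetBackendWeight(vip, addr string, weight uint32) error
	RemoveBackend(vip, addr string) error
}

// SyncVIP computes the desired backend set for one VIP and issues only the
// calls needed to make VPP match it.
func SyncVIP(vpp VPPClient, vip string, desired map[string]uint32) error {
	actual, err := vpp.GetBackends(vip)
	if err != nil {
		return err
	}
	// Add or adjust backends that are missing or carry the wrong weight.
	for addr, want := range desired {
		if got, ok := actual[addr]; !ok || got != want {
			if err := vpp.SetBackendWeight(vip, addr, want); err != nil {
				return err
			}
		}
	}
	// Remove backends VPP still has but the desired state no longer contains.
	for addr := range actual {
		if _, ok := desired[addr]; !ok {
			if err := vpp.RemoveBackend(vip, addr); err != nil {
				return err
			}
		}
	}
	return nil
}
```
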
### Dataplane API: Startup Warmup

During one of my tests, I noticed that after restarting maglevd, it completely wipes the VPP
loadbalancer VIPs. In hindsight this makes total sense because when the healthchecker starts, all
backends are in `unknown` state, which causes the weights to be zero until the backends transition
to the `up` state. This causes thrashing in the dataplane, which is not what I intended. I think for
a bit and decide how I'm going to prevent that. My solution is a two-phase startup warmup controlled
by `startup-min-delay` (default 5s) and `startup-max-delay` (default 30s):

**Phase 1: hands-off window.** For the first `startup-min-delay` seconds after maglevd starts,
neither the reconciler nor the periodic sync loop can touch VPP at all. Probes run, the checker
accumulates state, but nothing is programmed into the dataplane yet.

Whichever wins the race performs a single `SyncLBStateVIP` for that VIP. It is free to live its
life.

**Watchdog.** At `startup-max-delay`, any VIP whose backends are still `unknown` is swept by a
final `SyncLBStateAll`. Those stragglers are programmed with weight zero: something is still wrong
with them, but this is an unlikely situation, and one of those belt-and-suspenders things again.

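A minimal sketch of that two-phase gate, under the assumption that it boils down to a time-based
check plus a one-shot watchdog timer; the real interplay with per-VIP readiness is more involved:

```go
// A minimal sketch of the startup warmup gate: a hands-off window before
// startup-min-delay, and a watchdog that fires at startup-max-delay. This is
// an assumption about the shape of the logic, not maglevd's actual code.
package warmup

import "time"

type Gate struct {
	started  time.Time
	minDelay time.Duration // hands-off window: no VPP writes at all
	maxDelay time.Duration // watchdog: force a full sync at this point
}

func New(minDelay, maxDelay time.Duration) *Gate {
	return &Gate{started: time.Now(), minDelay: minDelay, maxDelay: maxDelay}
}

// MayTouchVPP reports whether the reconciler is allowed to program VPP yet.
func (g *Gate) MayTouchVPP() bool {
	return time.Since(g.started) >= g.minDelay
}

// Watchdog returns a timer channel that fires once at startup-max-delay, at
// which point the caller runs a final SyncLBStateAll to sweep up stragglers.
func (g *Gate) Watchdog() <-chan time.Time {
	return time.After(g.maxDelay - time.Since(g.started))
}
```
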
## Controlplane CLI: `maglevc`

In interactive mode, the prompt is `maglev> `. I put real effort into the shell experience because
this is the tool I reach for constantly when I want to interact with the system. I'm inspired by
Bird and try to mimic its look and feel, which will come in handy as IPng Networks uses Bird in
our routing controlplane. Having these tools all look and feel the same really helps, especially
when fecal matter hits the fast-spinning cooling device.

I saw this trick first in the SR Linux command-line interface, and I like its in-line completion
logic a lot. As the Dutch would say, 'Beter goed gestolen dan slecht bedacht' - better stolen well
than invented poorly.

**Prefix matching** means I never have to type the full command. `sh ba nginx0` is equivalent to
`show backends nginx0`, and `sh vpp l s` expands to `show vpp lb state`. This was important to me
because I am often working in a hurry and do not want to type long commands.

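The prefix-matching idea is simple enough to sketch in a few lines of Go; the helper name and the
notion of a per-position keyword list are invented for illustration:

```go
// A sketch of prefix matching: each input token may be any unambiguous prefix
// of a keyword valid at that position. Keyword tables here are illustrative.
package cli

import "strings"

// expand resolves one abbreviated token against the keywords valid at this
// position; it returns the full keyword, or "" if the prefix is ambiguous or
// matches nothing.
func expand(token string, keywords []string) string {
	var match string
	for _, kw := range keywords {
		if kw == token {
			return kw // exact match always wins
		}
		if strings.HasPrefix(kw, token) {
			if match != "" {
				return "" // ambiguous: two keywords share this prefix
			}
			match = kw
		}
	}
	return match
}
```

Applied token by token against the keywords valid at each position, `sh vpp l s` resolves to
`show vpp lb state`, as in the example above, so long as each prefix stays unambiguous.
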
**Inline help** via `?` will print the available completions for the current cursor position with
a short description next to each keyword. The `?` character is not consumed - the input line is
unchanged after the help display, which is identical to how Bird consumes `?` characters.

**Color mode** defaults to on in the interactive shell and off in one-shot mode, so piped output is
free of color escape codes.

    protocol tcp
    port 443
    src-ip-sticky false
    flush-on-down true
    description IPv6 HTTPS VIP - nginx backends
    pools
    name primary
    backends nginx0-chlzn0 weight 100 effective 100

Here, I brought `nginx0-chplo0` down so its effective weight is zero; the two inactive backends
`nginx0-nlams0` and `nginx0-frggh0` are in the secondary pool, which is inactive because the primary
pool still has `nginx0-chlzn0` up and serving (all) the traffic.

### VPP State - A Separate Concern

One design decision I am happy with is keeping the `maglevd` view of the world (frontend and backend
state, health counters, effective weights) completely separate from the VPP view (what is actually
programmed in VPP).

The second flushes existing flows immediately. The third command then marks the backend as down,
which will remove it from serving in all pools it's a member of. This is useful when performing
maintenance on a backend, and it's the command I ran in the 'show frontend' output above.

Arguably the coolest idea, `maglevc watch events`, streams everything in real time. Combined with
`log level debug`, it shows every probe attempt and every VPP API call as they happen:

```
maglev> watch events log level debug backend
{"key":"elapsed","value":"36ms"}]}}
```

And finally, I mimic Bird's "reconfigure" with a set of two primitives, `config check` and `config
reload`, which let me validate and apply configuration changes without restarting the daemon. With
that, the maglev daemon, the main brains of the operation, is feature complete.

Once `maglevd` is running and `maglevc` shows everything healthy, the natural next question is: does
it actually work end-to-end? A healthcheck passing means the backend can accept a TCP connection
or return an HTTP 200, but it does not tell me whether a client hitting the VIP actually reaches the
right backend, or whether failover is visible at the application level.

I wanted a tool that could sit outside the control plane entirely - not talking gRPC, not reading
`maglevd` state - but just hitting the VIPs directly as a real client would, tallying which backend
served each request. The obvious approach is to configure each backend to include its own hostname
in an HTTP response header. On my nginx servers I add a header `X-IPng-Frontend` which returns the
local `$hostname` variable. Then a probe tool that reads `X-IPng-Frontend` from each response can
show the live distribution across backends, and a failover is immediately visible as a
redistribution of the tally.

That idea turns into `maglevt`, which reads one or more `maglev.yaml` files, enumerates the
HTTP/HTTPS frontends, and probes each VIP at a configurable interval (default 100ms per VIP, with
+/-10% jitter to prevent phase-locking). Each probe opens a fresh TCP connection - keep-alives are
off by default - so every request is independently hashed by VPP's Maglev algorithm. The tally
reshuffles the moment a backend goes down or a standby pool activates.

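A stripped-down sketch of that probing idea in Go - one VIP, no jitter, keep-alives disabled so
every request opens a fresh connection (and therefore gets a fresh Maglev hash), and a tally keyed
on the `X-IPng-Frontend` header. The real tool fans this out across all enumerated VIPs:

```go
// A toy version of the maglevt probe loop; maglevt itself does much more
// (per-VIP fan-out, jitter, latency histograms, the TUI).
package main

import (
	"fmt"
	"net/http"
	"time"
)

func main() {
	client := &http.Client{
		Timeout:   2 * time.Second,
		Transport: &http.Transport{DisableKeepAlives: true}, // fresh TCP conn per request
	}
	tally := map[string]int{}

	for range time.Tick(100 * time.Millisecond) { // default interval per VIP
		resp, err := client.Get("https://vip0.l.ipng.ch/")
		if err != nil {
			tally["error"]++
			continue
		}
		backend := resp.Header.Get("X-IPng-Frontend") // which nginx served this?
		resp.Body.Close()
		tally[backend]++
		fmt.Println(tally)
	}
}
```
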
{{< image width="100%" src="/assets/vpp-maglev/maglevt.png" alt="VPP Maglev TUI client" >}}

There's a lot to see in this screenshot, so let me unpack it. I'm running `maglevt` on a machine at
AS12859, BIT in the Netherlands, called `nlede01.paphosting.net`. It's reaching the VIPs that are
announced in Amsterdam, the Netherlands (`vip0.l.ipng.ch`) and Lille, France (`vip1.l.ipng.ch`), and
it is doing so with both IPv4 and IPv6, on both ports 80 and 443, which yields eight
targets. The webservers are configured to respond with an empty HTTP 204 response, and I've replayed
about 1 million requests to each VIP. A few of these failed, which was mostly me playing around with
backend drains/flushes, hostile shutdowns (rebooting an nginx), and VIP failovers. In the first
section, each VIP shows its last 100 probes in terms of latency, latency tail, and success rate.

In the second section, the tool is just showing how many times a response had a certain HTTP header
in it. The greyed out ones are values which have not been seen in five seconds; the white ones are
current. It shows that I'm consistently hashing this client to one frontend at a time (because each
row has exactly one bright white entry). This test is using HTTP keepalive.

In the bottom section, a list of recent events is shown - mostly moments when the latency ceiling
is hit. These 'spikes' are written in bright yellow; if things like timeouts occur, they are written
in bright red.

{{< image width="4em" float="left" src="/assets/vpp-maglev/Claude_AI.svg" alt="Claude Code" >}}

I have to be honest here: before this project I had never written a Terminal UI in my life. The
Bubble Tea documentation is good but the model - a pure functional message-passing loop - took me
a while to internalize. I ended up leaning on Claude quite a bit to get the layout right, especially
the live-updating cells and the latency histogram accumulation.

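For readers who have not seen Bubble Tea, here is a toy example of the Elm-style loop it is built
around - a model, a pure `Update` that reacts to messages, and a `View` that renders it. This is
not maglevt's actual code:

```go
// A minimal Bubble Tea program: model -> Update(msg) -> View(), repeated.
package main

import (
	"fmt"
	"os"

	tea "github.com/charmbracelet/bubbletea"
)

type model struct{ probes int }

func (m model) Init() tea.Cmd { return nil }

// Update is a pure function: it takes the current model and a message, and
// returns the next model (plus an optional command).
func (m model) Update(msg tea.Msg) (tea.Model, tea.Cmd) {
	switch msg := msg.(type) {
	case tea.KeyMsg:
		switch msg.String() {
		case "q", "ctrl+c":
			return m, tea.Quit
		default:
			m.probes++ // any other key: pretend a probe completed
		}
	}
	return m, nil
}

func (m model) View() string {
	return fmt.Sprintf("probes seen: %d (press q to quit)\n", m.probes)
}

func main() {
	if _, err := tea.NewProgram(model{}).Run(); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```
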
What I found was that I could describe what I wanted in plain language and the code that came back
was usually correct and idiomatic. I then spent time reading and understanding the code before
committing it. I learned a lot about how Go handles terminal output and about the Elm architecture
that Bubble Tea is based on - much faster than I would have on my own. Having an AI collaborator
that writes correct code does not mean I can stop learning; if anything, having working code in
front of me makes the learning faster!

## Frontend: GUI `maglevd-frontend`

Now that I'm in "yes, I vibe"-admission-mode, there's another type of component I've rarely if ever
worked on: web frontends! `maglevd-frontend` is a single Go binary with a
[[SolidJS](https://www.solidjs.com/)] single-page app embedded at build time via `//go:embed` - no
runtime file dependencies, no Node.js required after the build. Simple and standalone.

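The single-binary trick is standard Go; a minimal sketch, assuming the built SPA lands in a `dist/`
directory, looks like this:

```go
// A sketch of embedding a built SPA into the binary with //go:embed and
// serving it from memory; the "dist" directory name is an assumption.
package main

import (
	"embed"
	"io/fs"
	"log"
	"net/http"
)

//go:embed dist
var spa embed.FS

func main() {
	// Strip the "dist/" prefix so index.html is served at "/".
	sub, err := fs.Sub(spa, "dist")
	if err != nil {
		log.Fatal(err)
	}
	http.Handle("/", http.FileServer(http.FS(sub)))
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```
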
One design goal I set early was to be able to observe all my load balancer instances from a single
place.

The dashboard maintains a 30-second / 2000-event ring buffer, so a page reload replays recent
events without polling, and a brief disconnect - like a laptop lid close - shows a visible catch-up.

When a backend transitions from `up` to `down`, the badge in the frontend card updates within
milliseconds. A pool failover - where the primary pool empties and the fallback pool activates -
appears as a cascade of state changes followed by a re-rendering of the effective weight column. The
LB buckets column (showing VPP's actual hash table allocation for each AS) is refreshed via a
debounced `GetVPPLBState` scrape on every transition, at most once per second per `maglevd`. And
all of it is driven by the same event stream:

{{< image width="100%" src="/assets/vpp-maglev/maglev-frontend.png" alt="VPP Maglev Frontend" >}}

The tech stack for the Single Page App is [[SolidJS](https://www.solidjs.com/)], a super cool reactive
framework that compiles away its virtual DOM and produces small, fast bundles. I chose it over React
partly because I was curious about it and partly because the bundle size matters when you are
embedding the whole thing in a Go binary. The event store is a simple Solid signal that the SSE
handler updates; every component that cares re-renders automatically without explicit subscription
management. It's slick and much easier to use than I had initially thought!

### Frontend: Admin Surface

When both `MAGLEV_FRONTEND_USER` and `MAGLEV_FRONTEND_PASSWORD` environment variables are set, the
admin surface is activated at `/admin/`. I make sure that without credentials, `/admin/` returns
404. In this case, the admin path is not just unprotected, it is entirely absent. Security matters,
at least a little bit, even if the frontend will not be exposed onto the Internet.

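A sketch of that "absent unless configured" idea: only register the `/admin/` routes when both
environment variables are present, so everything else falls through to a 404. The handler wiring
and the use of HTTP basic auth here are my assumptions, not necessarily how `maglevd-frontend`
does it:

```go
// Only register /admin/ when credentials are configured; otherwise the mux
// has no such route and requests 404. Basic auth is an assumption here.
package frontend

import (
	"crypto/subtle"
	"net/http"
	"os"
)

func registerAdmin(mux *http.ServeMux, admin http.Handler) {
	user := os.Getenv("MAGLEV_FRONTEND_USER")
	pass := os.Getenv("MAGLEV_FRONTEND_PASSWORD")
	if user == "" || pass == "" {
		return // no credentials: /admin/ is never registered, so it 404s
	}
	mux.Handle("/admin/", http.StripPrefix("/admin", basicAuth(user, pass, admin)))
}

func basicAuth(user, pass string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		u, p, ok := r.BasicAuth()
		if !ok ||
			subtle.ConstantTimeCompare([]byte(u), []byte(user)) != 1 ||
			subtle.ConstantTimeCompare([]byte(p), []byte(pass)) != 1 {
			w.Header().Set("WWW-Authenticate", `Basic realm="maglevd-frontend"`)
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}
```
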
In admin mode, every backend row grows a `⋮` (kebab) menu with `pause`, `resume`, `enable`,
`disable`, and `set weight` entries. Lifecycle actions open a confirmation dialog that spells out the
dataplane consequence: `disable` explicitly warns that it will drop live sessions via the flow-table
flush. The weight dialog has a 0-100 slider and a `flush existing flows` checkbox - unchecked is the
graceful drain, checked is the immediate session-drop path.

Also in admin mode, a **Debug panel** at the bottom of the page tails every event the SPA has seen

## What's Next

Using Maglev has a few significant benefits. Most importantly, I can drain (or weather an outage of)
any nginx frontend within seconds, and there is no more DNS propagation delay. Another key property
is that the loadbalanced VIPs themselves are now completely mobile, and anycasted. I can drain a VPP
loadbalancer by simply removing its announcement of the VIPs, and anycast routing will seamlessly
move the traffic to another live replica. This immunizes IPng from site / datacenter / machine
failures.

Instead of having "the webserver", there are now half a dozen webservers, and potentially half a
dozen places where traffic can enter the system, which poses a challenge with observability. In an
upcoming article, I'll spend some time thinking through how to make it as easy as possible, with
Prometheus and Grafana dashboards, as well as a clever trick to be able to see which Maglev
loadbalancer sent which request to which IPng nginx Frontend. If this type of thing is interesting
to you, stay tuned!