Add Prometheus metrics endpoint; containerize integration tests

Prometheus metrics (internal/metrics/, cmd/maglevd/)
- New --metrics-addr flag (default :9091, env MAGLEV_METRICS_ADDR)
  serving /metrics via promhttp.
- Gauge metrics scraped on demand via a custom prometheus.Collector:
  maglev_backend_state, maglev_backend_health, maglev_backend_enabled,
  maglev_frontend_pool_backend_weight.
- Inline counter/histogram metrics updated per probe:
  maglev_probe_total (by backend, type, result, code),
  maglev_probe_duration_seconds (by backend, type),
  maglev_backend_transitions_total (by backend, from, to).
- StateSource interface in the metrics package avoids an import cycle
  with checker; checker.Checker satisfies it via GetBackendInfo.
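
A minimal sketch of the consumer-side interface idea described above,
using only the standard library. The BackendInfo fields, the
GetBackendInfo signature, and the fakeChecker and collectStates names
are assumptions for illustration; the real types in internal/metrics/
and checker are not shown here. The point is that the metrics side
declares the small interface it needs, so it never has to import
checker, and gauge values are computed at scrape time rather than
stored.

```go
package main

import "fmt"

// BackendInfo is a hypothetical snapshot of one backend. The real struct
// returned by GetBackendInfo is not part of this excerpt.
type BackendInfo struct {
	Name    string
	State   string
	Enabled bool
}

// StateSource is declared where it is consumed (the metrics side), so the
// metrics code does not need to import the checker package.
// The method signature is an assumption.
type StateSource interface {
	GetBackendInfo() []BackendInfo
}

// fakeChecker stands in for checker.Checker in this sketch.
type fakeChecker struct{ backends []BackendInfo }

func (c *fakeChecker) GetBackendInfo() []BackendInfo { return c.backends }

// collectStates derives gauge values on demand, roughly what a custom
// prometheus.Collector would do inside its Collect method.
func collectStates(src StateSource) map[string]float64 {
	out := make(map[string]float64)
	for _, b := range src.GetBackendInfo() {
		v := 0.0
		if b.State == "up" {
			v = 1
		}
		out[b.Name] = v
	}
	return out
}

func main() {
	var src StateSource = &fakeChecker{backends: []BackendInfo{
		{Name: "nginx1", State: "up", Enabled: true},
		{Name: "nginx2", State: "down", Enabled: true},
	}}
	states := collectStates(src)
	fmt.Println("nginx1:", states["nginx1"], "nginx2:", states["nginx2"])
}
```

Because the interface lives with its consumer, any type with a matching
method satisfies it implicitly, which also makes the collector easy to
test with a fake like the one above.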

Integration tests
- Run maglevd inside a containerlab node (debian:trixie-slim with
  build/ bind-mounted) instead of on the host. Eliminates port
  collisions with any host maglevd.
- maglevc commands run via docker exec into the maglevd container.
- Add 6 Prometheus test cases: endpoint reachable, all backends
  report state=up, probe counters non-zero, duration histogram
  populated, pool weights correct, transition counters present.
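
The counter checks above boil down to matching a sample line in the
text exposition format and requiring a value that starts with a
non-zero digit. A rough Go equivalent of that check might look like the
following; hasNonZeroSample is a hypothetical helper, and the sample
label values (code="200", type="http") are made up for the example.

```go
package main

import (
	"fmt"
	"regexp"
)

// hasNonZeroSample reports whether body contains a sample of the given
// metric for the given backend whose value starts with a digit 1-9.
// It is a hypothetical helper that mirrors the regexp-based test cases.
func hasNonZeroSample(body, metric, backend string) bool {
	pattern := fmt.Sprintf(`(?m)^%s\{[^}]*backend="%s"[^}]*\}\s+[1-9]`,
		regexp.QuoteMeta(metric), regexp.QuoteMeta(backend))
	return regexp.MustCompile(pattern).MatchString(body)
}

func main() {
	// Illustrative scrape output, not captured from a real maglevd.
	body := `maglev_probe_total{backend="nginx1",code="200",result="success",type="http"} 12
maglev_probe_total{backend="nginx2",code="200",result="success",type="http"} 0
`
	fmt.Println(hasNonZeroSample(body, "maglev_probe_total", "nginx1"))
	fmt.Println(hasNonZeroSample(body, "maglev_probe_total", "nginx2"))
}
```

Restricting the label match to `[^}]*` keeps the pattern from running
past the closing brace of one sample into the next line.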
2026-04-11 20:50:59 +02:00
parent 8bde00eb61
commit 4ab3096c8b
9 changed files with 311 additions and 18 deletions

@@ -1,6 +1,5 @@
*** Settings ***
Library OperatingSystem
Library Process
Resource ../common.robot
Suite Setup Setup Suite
@@ -10,26 +9,20 @@ Suite Teardown Cleanup Suite
*** Variables ***
${lab-name} maglevd-test
${lab-file} maglevd-lab/maglevd.clab.yml
${config-file} maglevd-lab/maglev.yaml
${runtime} docker
${GRPC_PORT} 9091
${MAGLEVD_NODE} clab-maglevd-test-maglevd
${METRICS_URL} http://172.20.30.2:9091/metrics
*** Test Cases ***
Deploy maglevd-test lab
[Documentation] Deploy the containerlab topology. The maglevd node starts
... automatically as PID 1 via start.sh and begins probing the nginx
... backends immediately.
${rc} ${output} = Run And Return Rc And Output
... ${CLAB_BIN} --runtime ${runtime} deploy -t ${CURDIR}/${lab-file}
Log ${output}
Should Be Equal As Integers ${rc} 0
Start maglevd
${handle} = Start Process ${MAGLEVD}
... --config ${CURDIR}/${config-file}
... --grpc-addr :${GRPC_PORT}
... --log-level debug
... alias=maglevd stdout=${EXECDIR}/tests/out/maglevd.log
... stderr=STDOUT
Set Suite Variable ${MAGLEVD_HANDLE} ${handle}
Sleep 3s Wait for nginx containers and probes to converge
All backends reach up state
@@ -86,22 +79,55 @@ Enable backend restarts probing
... Backend Should Be Up nginx2
Prometheus endpoint is reachable
${rc} ${output} = Run And Return Rc And Output
... curl -sf ${METRICS_URL}
Log ${output}
Should Be Equal As Integers ${rc} 0
Should Contain ${output} maglev_backend_state
Prometheus reports all backends up
${output} = Scrape Metrics
# Each backend should have state="up" = 1.
Should Contain ${output} maglev_backend_state{address="172.20.30.11",backend="nginx1",healthcheck="http-check",state="up"} 1
Should Contain ${output} maglev_backend_state{address="172.20.30.12",backend="nginx2",healthcheck="http-check",state="up"} 1
Should Contain ${output} maglev_backend_state{address="172.20.30.13",backend="nginx3",healthcheck="http-check",state="up"} 1
Prometheus reports probe counters
${output} = Scrape Metrics
Should Match Regexp ${output} maglev_probe_total\\{backend="nginx1".*result="success".*\\}\\s+[1-9]
Should Match Regexp ${output} maglev_probe_total\\{backend="nginx2".*result="success".*\\}\\s+[1-9]
Should Match Regexp ${output} maglev_probe_total\\{backend="nginx3".*result="success".*\\}\\s+[1-9]
Prometheus reports probe duration histogram
${output} = Scrape Metrics
Should Match Regexp ${output} maglev_probe_duration_seconds_count\\{backend="nginx1".*\\}\\s+[1-9]
Prometheus reports pool weights
${output} = Scrape Metrics
Should Contain ${output} maglev_frontend_pool_backend_weight{backend="nginx1",frontend="http-vip",pool="primary"} 100
Should Contain ${output} maglev_frontend_pool_backend_weight{backend="nginx3",frontend="http-vip",pool="fallback"} 100
Prometheus reports transition counters
${output} = Scrape Metrics
# All backends transitioned unknown → up during startup.
Should Match Regexp ${output} maglev_backend_transitions_total\\{backend="nginx1",from="unknown",to="up"\\}\\s+[1-9]
*** Keywords ***
Setup Suite
${arch} = Run go env GOARCH
Set Suite Variable ${ARCH} ${arch}
Set Suite Variable ${MAGLEVD} ${EXECDIR}/build/${ARCH}/maglevd
Set Suite Variable ${MAGLEVC} ${EXECDIR}/build/${ARCH}/maglevc
Cleanup Suite
Run Keyword And Ignore Error Terminate Process maglevd kill=true
Run docker logs ${MAGLEVD_NODE} > ${EXECDIR}/tests/out/maglevd.log 2>&1
Run ${CLAB_BIN} --runtime ${runtime} destroy -t ${CURDIR}/${lab-file} --cleanup
Maglevc
[Documentation] Run a maglevc command and return its output.
[Documentation] Run a maglevc command inside the maglevd container.
[Arguments] ${cmd}
${rc} ${output} = Run And Return Rc And Output
... ${MAGLEVC} --server\=localhost:${GRPC_PORT} --color\=false ${cmd}
... docker exec ${MAGLEVD_NODE} /opt/maglev/build/${ARCH}/maglevc --color\=false ${cmd}
Log ${output}
Should Be Equal As Integers ${rc} 0
RETURN ${output}
@@ -133,3 +159,10 @@ Probe Count Should Be Positive
${count} = Get Probe Count ${name}
Should Be True ${count} > 0
... No health-check requests found in nginx logs for ${name}
Scrape Metrics
[Documentation] Fetch the Prometheus /metrics endpoint from the maglevd container.
${rc} ${output} = Run And Return Rc And Output
... curl -sf ${METRICS_URL}
Should Be Equal As Integers ${rc} 0
RETURN ${output}

@@ -6,6 +6,15 @@ mgmt:
topology:
nodes:
maglevd:
kind: linux
image: debian:trixie-slim
mgmt-ipv4: 172.20.30.2
binds:
- ../../../build:/opt/maglev/build:ro
- ./maglev.yaml:/etc/maglev/maglev.yaml:ro
- ./start.sh:/start.sh:ro
cmd: /start.sh
nginx1:
kind: linux
image: nginx:alpine

@@ -0,0 +1,7 @@
#!/bin/sh
ARCH=$(uname -m)
case "$ARCH" in
x86_64) ARCH=amd64 ;;
aarch64) ARCH=arm64 ;;
esac
exec /opt/maglev/build/${ARCH}/maglevd --config /etc/maglev/maglev.yaml --log-level debug