Add ngx_http_ipng_stats_module: per-VIP, per-device traffic counters

Full implementation of the nginx dynamic module with:

- SO_BINDTODEVICE-based per-interface traffic attribution
- Per-worker lock-free counters flushed to shared memory
- Prometheus text and JSON scrape endpoint at configurable location
- UDP-only global logtail (ipng_stats_logtail) for fire-and-forget access log streaming
- $ipng_source_tag nginx variable for use in log_format/map
- Histogram buckets, EWMA rate gauges, zone meta-metrics
- Debian packaging (libnginx-mod-http-ipng-stats)
- Robot Framework end-to-end tests via containerlab
- SPDX Apache-2.0 headers on all source files
2026-04-16 17:36:42 +02:00
parent c05bcf6aa6
commit 5a7e2f77f1
25 changed files with 4016 additions and 102 deletions
--- a/docs/user-guide.md
+++ b/docs/user-guide.md
@@ -0,0 +1,384 @@
+<!-- SPDX-License-Identifier: Apache-2.0 -->
+# nginx-ipng-stats-plugin — User Guide
+
+This document walks an operator through installing the plugin, deploying it on a single nginx host serving traffic that arrives on
+distinct interfaces (GRE tunnels, VLANs, bonded links, or plain ethernet), verifying that counters are flowing, and hooking up the
+scrape endpoint to Prometheus and other consumers.
+
+It covers (NFR-7.1):
+
+1. Installing the Debian package.
+2. Setting up interfaces for per-device attribution (GRE tunnel example).
+3. Writing a minimal nginx configuration.
+4. Verifying with `curl`.
+5. Scraping from Prometheus.
+6. Setting up a global logtail access log.
+7. Integrating with scrape consumers.
+
+For a directive-by-directive reference, read [`config-guide.md`](config-guide.md) alongside this guide.
+
+## 1. Install the package
+
+On Debian Trixie (and newer), the module is distributed as `libnginx-mod-http-ipng-stats`. The package depends on the stock `nginx`
+package and loads cleanly into it without recompiling nginx itself.
+
+```
+sudo apt install ./libnginx-mod-http-ipng-stats_0.1.0-1_amd64.deb
+```
+
+The package will:
+
+- Drop `ngx_http_ipng_stats_module.so` into `/usr/lib/nginx/modules/`.
+- Place a `load_module` stanza in `/etc/nginx/modules-available/50-mod-http-ipng-stats.conf`.
+- Symlink it into `/etc/nginx/modules-enabled/` so nginx picks it up on the next reload.
+- Run `nginx -t` and, if the test fails, remove the `modules-enabled` symlink and print a warning — so a broken upgrade never leaves
+  you with an nginx that cannot start.
+
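+Confirm the module is loaded. A separately packaged dynamic module does not show up in `nginx -V`, which only echoes nginx's own
+build-time configure arguments, so check the effective configuration instead (`nginx -T` fails if the module cannot be loaded):
+
+```
+sudo nginx -T 2>&1 | grep ngx_http_ipng_stats_module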
+```
+
+## 2. Set up interfaces for per-device attribution
+
+The plugin attributes traffic by watching which interface the request came in on, using `SO_BINDTODEVICE` on per-interface listening
+sockets. For this to work, each traffic source that should be tracked separately MUST arrive on its own interface.
+
+This works with any kind of Linux interface — GRE tunnels, VLANs, VXLANs, bonded links, or plain ethernet. This guide uses GRE
+tunnels as the example, but the module does not care about the interface type.
+
+This guide doesn't prescribe a specific networking layer — use whatever your host already uses (`systemd-networkd`, Netplan,
+`/etc/network/interfaces`, or a hand-rolled script). The requirements are:
+
+- Each traffic source that should be separately attributed gets its own interface on the nginx host.
+- Interfaces follow a consistent naming pattern. For GRE tunnels we recommend `gre-<tag>`, e.g. `gre-mg1`, `gre-mg2`.
+- The VIPs are bound to a local dummy or loopback interface so the kernel accepts packets destined for them.
+
+For example, with `systemd-networkd`, a GRE tunnel to a remote peer at `2001:db8::1` from this host at `2001:db8::100` looks like:
+
+```
+# /etc/systemd/network/10-gre-mg1.netdev
+[NetDev]
+Name=gre-mg1
+Kind=ip6gre
+
+[Tunnel]
+Local=2001:db8::100
+Remote=2001:db8::1
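+TTL=64
+# Create the tunnel without needing a Tunnel= entry in the underlay's .network file.
+Independent=yes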
+```
+
+```
+# /etc/systemd/network/10-gre-mg1.network
+[Match]
+Name=gre-mg1
+
+[Network]
+LinkLocalAddressing=no
+```
+
+Repeat for each additional tunnel. A trimmed-down variant of this scheme is what IPng uses in production.
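+
+If you manage more than a handful of tunnels, generating the unit files avoids copy-paste drift. The Python sketch below is a
+hypothetical helper (`render_tunnel_units` is not shipped with the package) that renders the `.netdev`/`.network` pair shown above for
+one tag and endpoint pair. It assumes `Independent=yes` so each tunnel can come up without a `Tunnel=` reference in the underlay's
+`.network` file, and it enforces the kernel's 15-character limit on interface names.
+
+```python
+def render_tunnel_units(tag: str, local: str, remote: str, ttl: int = 64) -> dict[str, str]:
+    """Return {filename: contents} for one ip6gre tunnel named gre-<tag>."""
+    name = f"gre-{tag}"
+    if len(name) > 15:  # IFNAMSIZ is 16 bytes including the trailing NUL
+        raise ValueError(f"interface name {name!r} is longer than 15 characters")
+    netdev = (
+        "[NetDev]\n"
+        f"Name={name}\n"
+        "Kind=ip6gre\n"
+        "\n"
+        "[Tunnel]\n"
+        f"Local={local}\n"
+        f"Remote={remote}\n"
+        f"TTL={ttl}\n"
+        "Independent=yes\n"  # assumption: no underlay .network lists this tunnel
+    )
+    network = (
+        "[Match]\n"
+        f"Name={name}\n"
+        "\n"
+        "[Network]\n"
+        "LinkLocalAddressing=no\n"
+    )
+    return {f"10-{name}.netdev": netdev, f"10-{name}.network": network}
+
+
+if __name__ == "__main__":
+    # Hypothetical remote peers, one tunnel per traffic source.
+    peers = {"mg1": "2001:db8::1", "mg2": "2001:db8::2"}
+    for tag, remote in peers.items():
+        for filename, body in render_tunnel_units(tag, "2001:db8::100", remote).items():
+            print(f"# {filename}\n{body}")
+```
+
+Write each body to `/etc/systemd/network/<filename>` and run `networkctl reload` to pick up the new units.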
+
+Verify the interfaces exist and carry traffic:
+
+```
+ip -6 tunnel show | grep gre-mg
+ip -6 -s link show gre-mg1
+```
+
+## 3. Write the nginx configuration
+
+The plugin needs three things in `nginx.conf`:
+
+1. A shared-memory zone for counters (`ipng_stats_zone`).
+2. A set of `listen` directives — a wildcard fallback plus one device-bound listener per attributed interface.
+3. A scrape location serving the `ipng_stats` handler.
+
+A minimal working configuration looks like this:
+
+```nginx
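+# Omit this line if nginx.conf already includes /etc/nginx/modules-enabled/*.conf
+# (the Debian default): step 1 installed a load_module stanza there, and nginx
+# refuses to load the same module twice.
+load_module modules/ngx_http_ipng_stats_module.so;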
+
+events {
+    worker_connections 4096;
+}
+
+http {
+    ipng_stats_zone ipng:4m;
+    ipng_stats_flush_interval 1s;
+    ipng_stats_default_source direct;
+
+    # A normal vhost. The fallback listen lines serve direct web traffic;
+    # the included file adds one device-bound listen per attributed interface.
+    server {
+        listen 80;
+        listen [::]:80;
+        include /etc/nginx/ipng-stats/listens.conf;
+
+        server_name _;
+        root /var/www/html;
+    }
+
+    # A second server block exposing the scrape endpoint on a locked-down port.
+    server {
+        listen 127.0.0.1:9113;
+        listen [::1]:9113;
+
+        location = /.well-known/ipng/statsz {
+            ipng_stats;
+            allow 127.0.0.1;
+            allow ::1;
+            allow 2001:db8::/48;   # your scrape consumers
+            deny all;
+        }
+    }
+}
+```
+
+And `/etc/nginx/ipng-stats/listens.conf` — the hand-maintained include file — is two lines per attributed interface (one per address
+family):
+
+```nginx
+listen 80      device=gre-mg1 ipng_source_tag=mg1;
+listen [::]:80 device=gre-mg1 ipng_source_tag=mg1;
+listen 80      device=gre-mg2 ipng_source_tag=mg2;
+listen [::]:80 device=gre-mg2 ipng_source_tag=mg2;
+listen 80      device=gre-mg3 ipng_source_tag=mg3;
+listen [::]:80 device=gre-mg3 ipng_source_tag=mg3;
+listen 80      device=gre-mg4 ipng_source_tag=mg4;
+listen [::]:80 device=gre-mg4 ipng_source_tag=mg4;
+```
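+
+Because `listens.conf` is mechanical (two lines per interface), it is easy to generate. A minimal Python sketch, assuming the
+`gre-<tag>` naming recommended in step 2 and that each tag is simply the interface name minus the `gre-` prefix; `render_listens` is a
+hypothetical helper, not part of the package. Requiring the hyphen also skips fallback devices such as `gre0` that the kernel creates
+when the GRE modules load.
+
+```python
+import os
+
+PREFIX = "gre-"
+
+
+def render_listens(interfaces: list[str], port: int = 80) -> str:
+    """Emit one IPv4 and one IPv6 device-bound listen per gre-<tag> interface."""
+    lines = []
+    for dev in sorted(interfaces):
+        if not dev.startswith(PREFIX):
+            continue  # not attributed: its traffic falls through to the wildcard listens
+        tag = dev[len(PREFIX):]
+        lines.append(f"listen {port} device={dev} ipng_source_tag={tag};")
+        lines.append(f"listen [::]:{port} device={dev} ipng_source_tag={tag};")
+    return "".join(line + "\n" for line in lines)
+
+
+if __name__ == "__main__":
+    # On Linux, /sys/class/net has one entry per network interface.
+    print(render_listens(os.listdir("/sys/class/net")), end="")
+```
+
+Redirect the output to `/etc/nginx/ipng-stats/listens.conf`, then run `sudo nginx -t` before reloading.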
+
+Test and reload:
+
+```
+sudo nginx -t
+sudo nginx -s reload
+```
+
+If `nginx -t` complains about an unknown `listen` parameter (`device=` or `ipng_source_tag=`), the module isn't loaded — check step 1.
+
+### Why wildcard listens?
+
+You do not need to enumerate VIPs in `listen`. A wildcard `listen 80 device=gre-mg1 ipng_source_tag=mg1;` accepts connections to any
+local address, provided they arrive on the `gre-mg1` interface, and nginx picks the vhost for each request by `server_name` / `Host:`
+header. Adding a new VIP is a `server_name` change; adding a new interface is an append to `listens.conf`.
+
+### Why both a wildcard and device-bound listens?
+
+The fallback `listen 80;` / `listen [::]:80;` catches traffic arriving on any interface that isn't one of your attributed interfaces —
+for example, real clients hitting your host directly over `eth0`. The kernel's TCP socket lookup prefers the most-specific
+(device-matching) listener, so a SYN on `gre-mg1` always lands on the `mg1` socket, and a SYN on `eth0` always lands on the fallback.
+No races, no stealing. Direct traffic is counted under the tag set by `ipng_stats_default_source` (`direct` by default).
+
+## 4. Verify with curl
+
+Generate some traffic (or wait for real traffic), then scrape the endpoint locally:
+
+```
+curl -s http://127.0.0.1:9113/.well-known/ipng/statsz
+```
+
+Default output is Prometheus text format:
+
+```
+# HELP nginx_ipng_requests_total Total HTTP requests, per (source_tag, vip, code).
+# TYPE nginx_ipng_requests_total counter
+nginx_ipng_requests_total{source_tag="mg1",vip="192.0.2.10",code="200"} 12345
+nginx_ipng_requests_total{source_tag="mg1",vip="192.0.2.10",code="404"} 17
+nginx_ipng_requests_total{source_tag="mg2",vip="192.0.2.10",code="200"} 9876
+nginx_ipng_requests_total{source_tag="direct",vip="192.0.2.10",code="200"} 42
+# HELP nginx_ipng_bytes_in_total Request bytes received, per (source_tag, vip, code).
+# TYPE nginx_ipng_bytes_in_total counter
+nginx_ipng_bytes_in_total{source_tag="mg1",vip="192.0.2.10",code="200"} 9876543
+# ... and so on
+```
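+
+When checking attribution by hand it can help to aggregate the scrape output. The Python sketch below sums
+`nginx_ipng_requests_total` per `source_tag`. It handles only the simple sample lines shown above (quoted label values without
+escapes, no timestamps); for anything beyond a quick check, use a proper Prometheus exposition-format parser.
+
+```python
+import re
+import urllib.request
+from collections import defaultdict
+
+# metric_name{label="value",...} value   (no label escaping, no timestamp)
+SAMPLE = re.compile(r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$')
+LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')
+
+
+def sum_by_label(text: str, metric: str, label: str) -> dict[str, float]:
+    """Sum every sample of `metric`, grouped by the value of `label`."""
+    totals: dict[str, float] = defaultdict(float)
+    for line in text.splitlines():
+        if not line or line.startswith("#"):
+            continue  # blank lines and HELP/TYPE comments
+        m = SAMPLE.match(line)
+        if m is None or m.group("name") != metric:
+            continue
+        labels = dict(LABEL.findall(m.group("labels")))
+        totals[labels.get(label, "")] += float(m.group("value"))
+    return dict(totals)
+
+
+if __name__ == "__main__":
+    with urllib.request.urlopen("http://127.0.0.1:9113/.well-known/ipng/statsz", timeout=5) as resp:
+        body = resp.read().decode("utf-8")
+    print(sum_by_label(body, "nginx_ipng_requests_total", "source_tag"))
+```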
+
+For JSON output instead, set the `Accept` header:
+
+```
+curl -s -H 'Accept: application/json' http://127.0.0.1:9113/.well-known/ipng/statsz | jq .
+```
+
+To filter server-side by source tag, optionally narrowed to a single VIP:
+
+```
+curl -s 'http://127.0.0.1:9113/.well-known/ipng/statsz?source_tag=mg1'
+curl -s 'http://127.0.0.1:9113/.well-known/ipng/statsz?source_tag=mg1&vip=192.0.2.10'
+```
+
+If you see `source_tag="direct"` entries with non-zero counts and you expected all traffic to come in via attributed interfaces,
+something is routing around them — typically an interface that isn't in `listens.conf`, or an interface that's down.
+
+## 5. Scrape from Prometheus
+
+The same endpoint serves Prometheus text by default. Add a scrape job:
+
+```yaml
+# /etc/prometheus/prometheus.yml
+scrape_configs:
+  - job_name: nginx-ipng
+    scrape_interval: 15s
+    static_configs:
+      - targets:
+          - 'nginx-backend-1.example.com:9113'
+          - 'nginx-backend-2.example.com:9113'
+    metrics_path: /.well-known/ipng/statsz
+```
+
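+The example configuration binds the scrape server to loopback only. For a remote Prometheus, change its `listen` lines to an address
+the Prometheus server can reach and add the Prometheus server's address (not the backends') to the `allow` rules, or front the plugin
+with a TLS-terminating reverse proxy. The module does not ship its own auth; the nginx `allow`/`deny` ACL is your access control.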
+
+Typical PromQL queries:
+
+```
+# Requests per second per source, per VIP:
+sum by (source_tag, vip) (rate(nginx_ipng_requests_total[1m]))
+
+# 5xx error rate per VIP, aggregated across all sources:
+sum by (vip) (rate(nginx_ipng_requests_total{code=~"5.."}[5m]))
+  /
+sum by (vip) (rate(nginx_ipng_requests_total[5m]))
+
+# p95 request duration per (source_tag, vip):
+histogram_quantile(0.95,
+    sum by (source_tag, vip, le) (rate(nginx_ipng_request_duration_seconds_bucket[5m])))
+```
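+
+`histogram_quantile` does not return an observed latency: it finds the bucket that contains the requested rank and interpolates
+linearly inside it, so the estimate is only as fine as the bucket boundaries. The Python sketch below follows Prometheus's documented
+approach for classic cumulative `le` buckets on a single series; the boundaries and counts in the usage note are made up, not the
+module's actual buckets.
+
+```python
+import math
+
+
+def bucket_quantile(q: float, buckets: list[tuple[float, float]]) -> float:
+    """Estimate quantile q from cumulative (le, count) buckets sorted by le, ending with +Inf."""
+    if not 0 < q <= 1:
+        raise ValueError("q must be in (0, 1]")
+    if len(buckets) < 2 or not math.isinf(buckets[-1][0]):
+        raise ValueError("need at least one finite bucket plus the +Inf bucket")
+    total = buckets[-1][1]
+    if total == 0:
+        return math.nan  # no observations in the window
+    rank = q * total
+    # First finite bucket whose cumulative count reaches the rank.
+    for i, (upper, count) in enumerate(buckets[:-1]):
+        if count >= rank:
+            break
+    else:
+        # Rank falls in the +Inf bucket: report the highest finite upper bound.
+        return buckets[-2][0]
+    lower, below = (buckets[i - 1][0], buckets[i - 1][1]) if i > 0 else (0.0, 0.0)
+    # Assume observations are spread uniformly inside the chosen bucket.
+    return lower + (upper - lower) * (rank - below) / (count - below)
+```
+
+For example, with buckets `[(0.005, 10), (0.01, 50), (0.025, 90), (0.05, 98), (0.1, 100), (math.inf, 100)]` the p95 rank (95) falls
+in the 25 to 50 ms bucket, and `bucket_quantile(0.95, ...)` returns about 0.0406 s even if no request took exactly that long.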
+
+## 6. Set up a global logtail access log
+
+Operators who want a single unified access log covering all traffic, regardless of which `server` block handled the request, can put
+`access_log` at the `http` level, but nginx drops that inherited log in any `server` or `location` that declares its own `access_log`
+(including `access_log off`). The `ipng_stats_logtail` directive sidesteps this: one line at the `http` level registers a global
+log-phase writer that fires unconditionally for every request (FR-8.1).
+
+The logtail sends each buffer flush as a single UDP datagram to a `host:port`. Zero disk I/O, no backpressure, no blocking if the
+receiver is down. This makes it ideal for fire-and-forget analytics pipelines where delivery guarantees are unnecessary and disk writes
+would add unwanted I/O pressure. For file-based access logging, use nginx's built-in `access_log` directive.
+
+### Define the log format
+
+Add a `log_format` declaration inside the `http { ... }` block, **before** the `ipng_stats_logtail` directive that references it:
+
+```nginx
+log_format logtail '$host\t$remote_addr\t$ipng_source_tag\t$server_addr\t'
+                   '$request_method\t$request_uri\t$status\t$body_bytes_sent\t'
+                   '$request_time';
+```
+
+Any nginx variable is usable here, including `$ipng_source_tag` (the device attribution tag, FR-6.1) and `$server_addr` (the VIP
+that received the request).
+
+### Configuration
+
+```nginx
+http {
+    ipng_stats_zone ipng:4m;
+
+    log_format logtail '$host\t$remote_addr\t$ipng_source_tag\t$server_addr\t'
+                       '$request_method\t$request_uri\t$status\t$body_bytes_sent\t'
+                       '$request_time';
+
+    ipng_stats_logtail logtail udp://127.0.0.1:9514 buffer=16k flush=1s;
+
+    server { ... }
+}
+```
+
+- **`logtail`** (first argument) — the `log_format` name.
+- **`udp://127.0.0.1:9514`** — destination as a `udp://host:port` URI. `host` must be a literal IPv4 address (no hostnames, no IPv6
+  in v0.1).
+- **`buffer=16k`** — per-worker write buffer. Lines are held in memory until the buffer fills, the flush timer fires, or the worker
+  exits. Default is `64k`; minimum is `1k` (FR-8.3).
+- **`flush=1s`** — maximum age of buffered data before it is sent. Default is `1s`; minimum is `100ms` (FR-8.3).
+
+Each buffer flush becomes a single `sendto()` on a per-worker `SOCK_DGRAM` socket. When the flush timer fires (or the buffer fills),
+the entire buffered payload is sent as one datagram — no file open, no `write()`, no `fsync()`. If no receiver is listening, the kernel
+drops the datagram silently and the worker carries on. This is by design: the logtail exists for non-critical analytics pipes where
+lost datagrams are acceptable and disk I/O is not.
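+
+The flush rules can be modeled in a few lines. The Python sketch below is a hypothetical model of the behavior described in this
+section, not the module's C code: it flushes when the next line would overflow `buffer=` or when the oldest buffered line reaches the
+`flush=` age, sends each flush with one `sendto()`, and ignores send errors. How the module handles a single line larger than
+`buffer=` is not covered here; this model sends such a line on its own.
+
+```python
+import socket
+import time
+from typing import Callable, Optional
+
+
+class LogtailBuffer:
+    """Model of one worker's logtail buffer: one sendto() per flush, no retries."""
+
+    def __init__(self, dest: tuple[str, int], size: int = 64 * 1024, flush_after: float = 1.0,
+                 send: Optional[Callable[[bytes], object]] = None):
+        self.size = size                  # buffer=
+        self.flush_after = flush_after    # flush=, in seconds
+        self.buf = bytearray()
+        self.oldest: Optional[float] = None  # monotonic time of the oldest buffered line
+        if send is None:
+            sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
+            send = lambda payload: sock.sendto(payload, dest)
+        self.send = send
+
+    def append(self, line: bytes, now: Optional[float] = None) -> None:
+        now = time.monotonic() if now is None else now
+        if self.buf and len(self.buf) + len(line) > self.size:
+            self.flush()  # the new line would overflow: ship what is buffered first
+        if not self.buf:
+            self.oldest = now
+        self.buf += line
+        if len(self.buf) >= self.size:
+            self.flush()  # exactly full, or one oversized line: send it now
+        self.tick(now)
+
+    def tick(self, now: Optional[float] = None) -> None:
+        """Call periodically; stands in for the module's flush timer."""
+        now = time.monotonic() if now is None else now
+        if self.buf and now - self.oldest >= self.flush_after:
+            self.flush()
+
+    def flush(self) -> None:
+        if not self.buf:
+            return
+        try:
+            self.send(bytes(self.buf))
+        except OSError:
+            pass  # fire-and-forget: a failed send just loses this datagram
+        self.buf.clear()
+        self.oldest = None
+```
+
+With `size=16`, three 6-byte lines produce one 12-byte datagram when the third line arrives; the third line goes out on its own once it
+has aged `flush_after` seconds.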
+
+**Constraints (v0.1):**
+
+- `host` must be a literal IPv4 address. Hostnames and IPv6 are not supported yet.
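+- Each flush is a single datagram, so `buffer=` bounds the datagram size. An IPv4 UDP datagram carries at most 65,507 bytes of
+  payload, slightly less than the `64k` (65,536-byte) default, so keep `buffer=` below that ceiling; the `16k` used above is well
+  inside it. On routed paths, datagrams larger than the path MTU are typically fragmented, and losing any fragment loses the whole
+  datagram.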
+- There is no acknowledgment, retry, or sequence number. If the receiver is down, the data is gone.
+
+**Starting a receiver** is trivial:
+
+```bash
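+# Quick one-shot inspection. OpenBSD nc locks onto the first sender in UDP listen
+# mode, so with several nginx workers it shows only one worker's datagrams.
+nc -u -l 127.0.0.1 9514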
+```
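+
+Unlike `nc`, a plain unconnected UDP socket accepts datagrams from every worker. A minimal Python receiver sketch, assuming the
+`127.0.0.1:9514` destination configured above; each datagram is one flushed buffer and may carry many newline-terminated lines.
+
+```python
+import socket
+from typing import Iterator, Optional
+
+
+def open_receiver(host: str = "127.0.0.1", port: int = 9514) -> socket.socket:
+    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
+    sock.bind((host, port))  # no connect(): that would filter to a single sender
+    return sock
+
+
+def receive_lines(sock: socket.socket, max_datagrams: Optional[int] = None) -> Iterator[str]:
+    """Yield log lines; each datagram is one flushed buffer holding many lines."""
+    received = 0
+    while max_datagrams is None or received < max_datagrams:
+        payload, _peer = sock.recvfrom(65535)  # large enough for any UDP payload
+        received += 1
+        text = payload.decode("utf-8", errors="replace")
+        yield from (line for line in text.split("\n") if line)
+
+
+if __name__ == "__main__":
+    for line in receive_lines(open_receiver()):
+        print(line)
+```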
+
+For a production-ready logtail consumer, see [`nginx-logtail`](https://git.ipng.ch/ipng/nginx-logtail), which receives the UDP
+datagram stream and processes it into structured log output.
+
+A typical received log line (with the format above, tab-separated) looks like:
+
+```
+example.com	203.0.113.42	mg1	192.0.2.10	GET	/index.html	200	4321	0.003
+```
+
+The third field (`mg1`) comes from `$ipng_source_tag` — free per-device attribution in every log line.
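+
+Because the format is tab-separated with a fixed field order, a consumer can map each line back to named fields. The Python sketch
+below assumes exactly the nine-field `logtail` format defined earlier; the `LogtailRecord` field names are chosen here for
+illustration and must be kept in step with your `log_format` by hand.
+
+```python
+from typing import NamedTuple
+
+
+class LogtailRecord(NamedTuple):
+    host: str
+    remote_addr: str
+    source_tag: str      # $ipng_source_tag
+    vip: str             # $server_addr
+    method: str
+    uri: str
+    status: int
+    body_bytes_sent: int
+    request_time: float  # seconds
+
+
+def parse_line(line: str) -> LogtailRecord:
+    parts = line.rstrip("\n").split("\t")
+    if len(parts) != len(LogtailRecord._fields):
+        raise ValueError(f"expected {len(LogtailRecord._fields)} fields, got {len(parts)}")
+    host, addr, tag, vip, method, uri, status, sent, rtime = parts
+    return LogtailRecord(host, addr, tag, vip, method, uri, int(status), int(sent), float(rtime))
+```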
+
+### Why this complements per-server `access_log`
+
+With nginx's own `access_log`, a log declared at the `http` level is inherited only by `server` and `location` blocks that declare no
+`access_log` of their own. A vhost that sets its own `access_log` (or `access_log off`) silently drops out of the global log, so
+operators end up repeating the shared `access_log` line in every block that overrides it. `ipng_stats_logtail` is installed at the
+module's log-phase hook, which nginx calls for every request with no per-server configuration required.
+
+See [`config-guide.md`](config-guide.md#ipng_stats_logtail-format_name-udphostport-buffersize-flushduration) for the full directive
+reference and FR-8 for the requirements behind this feature.
+
+## 7. Integrate with scrape consumers
+
+The scrape endpoint (`ipng_stats;`) serves both Prometheus text and JSON from a single location. Any HTTP client that can issue a GET
+request can consume it. Two integration patterns are common:
+
+### Prometheus
+
+See section 5 above. Prometheus scrapes the endpoint at a configured interval and stores the time series. This is the simplest
+integration and covers most monitoring and alerting use cases.
+
+### Custom consumers
+
+The `?source_tag=<tag>` query parameter lets a consumer filter the scrape response to only the traffic attributed to a specific source.
+This is useful when multiple consumers share the same nginx backends: each consumer requests only its own tag. The filter is a
+convenience, not access control; any client the `allow` rules admit can still fetch the unfiltered response.
+
+The JSON output (`Accept: application/json`) includes a top-level `schema` field for versioning, making it straightforward to parse
+from any language.
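+
+A minimal Python consumer sketch, using only the standard library. It assumes the endpoint and query parameters shown in section 4;
+beyond the top-level `schema` field this guide does not describe the JSON layout, so the sketch checks for that field and leaves the
+rest to the consumer. `scrape_request` and `fetch` are hypothetical names.
+
+```python
+import json
+import urllib.parse
+import urllib.request
+from typing import Optional
+
+STATSZ = "http://127.0.0.1:9113/.well-known/ipng/statsz"
+
+
+def scrape_request(base: str, source_tag: Optional[str] = None, vip: Optional[str] = None) -> urllib.request.Request:
+    """Build a JSON scrape request, filtered server-side by tag and/or VIP."""
+    params = {k: v for k, v in (("source_tag", source_tag), ("vip", vip)) if v is not None}
+    url = f"{base}?{urllib.parse.urlencode(params)}" if params else base
+    return urllib.request.Request(url, headers={"Accept": "application/json"})
+
+
+def fetch(req: urllib.request.Request, timeout: float = 5.0) -> dict:
+    with urllib.request.urlopen(req, timeout=timeout) as resp:
+        doc = json.load(resp)
+    # This guide documents only the top-level "schema" field; refuse anything without it.
+    if not isinstance(doc, dict) or "schema" not in doc:
+        raise ValueError("response has no top-level 'schema' field")
+    return doc
+
+
+if __name__ == "__main__":
+    doc = fetch(scrape_request(STATSZ, source_tag="mg1"))
+    print("schema:", doc["schema"])
+```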
+
+Once wired, a consumer can derive from the scrape data:
+
+- Live QPS per backend (from the EWMA gauges).
+- Status-code mix per backend (from the counter families).
+- p50/p95 latency per backend (from the duration histogram).
+- Traffic volume per backend (from the bytes counters).
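+
+Counters are only meaningful as differences between scrapes. A minimal Python sketch of turning two samples of a counter series (for
+example one `(source_tag, vip, code)` series of `nginx_ipng_requests_total`) into a per-second rate, treating a decrease as a counter
+reset the way Prometheus's `rate()` does. It omits Prometheus's extrapolation to the window edges, and the function names are
+hypothetical.
+
+```python
+def counter_rate(prev: float, curr: float, prev_ts: float, curr_ts: float) -> float:
+    """Per-second rate between two samples of a monotonically increasing counter."""
+    dt = curr_ts - prev_ts
+    if dt <= 0:
+        raise ValueError("samples must be in strictly increasing time order")
+    # A decrease means the counter restarted from zero (for example, a new
+    # shared-memory zone), so the whole current value is new since the reset.
+    increase = curr - prev if curr >= prev else curr
+    return increase / dt
+
+
+def series_rates(prev: dict, curr: dict, dt: float) -> dict:
+    """Rates for every series present in both scrapes, taken dt seconds apart."""
+    return {key: counter_rate(prev[key], value, 0.0, dt) for key, value in curr.items() if key in prev}
+
+
+if __name__ == "__main__":
+    first = {("mg1", "192.0.2.10", "200"): 12345.0}
+    second = {("mg1", "192.0.2.10", "200"): 12645.0}
+    print(series_rates(first, second, 15.0))
+```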
+
+For an example of this pattern in a GRE tunnel fleet, see [`vpp-maglev`](https://git.ipng.ch/ipng/vpp-maglev), whose frontend scrapes
+each nginx backend filtered by source tag to show per-backend traffic alongside health state.
+
+## Troubleshooting
+
+**`nginx -t` reports "unknown listen parameter: device=" or "unknown listen parameter: ipng_source_tag=".** The module isn't loaded.
+Check `/etc/nginx/modules-enabled/` for the `50-mod-http-ipng-stats.conf` symlink and re-run `nginx -t`.
+
+**All traffic is attributed to `direct` even though device-bound interfaces exist.** The interface names don't match the `device=`
+values in `listens.conf`, or the interfaces aren't up. Run `ip -br link` and confirm the interface names match.
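+
+To check both causes at once, compare the `device=` values in `listens.conf` with the interfaces the kernel knows about. A Python
+sketch, assuming the include-file path used in this guide; the helper names are hypothetical. It reads the administrative `IFF_UP`
+flag from `/sys/class/net/<dev>/flags` rather than `operstate`, because tunnels often report an `operstate` of `unknown` even when up.
+
+```python
+import os
+import re
+
+DEVICE = re.compile(r"\bdevice=([^\s;]+)")
+IFF_UP = 0x1  # from <linux/if.h>
+
+
+def configured_devices(conf_text: str) -> set[str]:
+    return set(DEVICE.findall(conf_text))
+
+
+def missing_devices(conf_text: str, present: set[str]) -> set[str]:
+    """device= values that name no existing interface."""
+    return configured_devices(conf_text) - present
+
+
+def is_admin_up(dev: str) -> bool:
+    with open(f"/sys/class/net/{dev}/flags") as f:
+        return bool(int(f.read().strip(), 16) & IFF_UP)
+
+
+if __name__ == "__main__":
+    with open("/etc/nginx/ipng-stats/listens.conf") as f:
+        text = f.read()
+    present = set(os.listdir("/sys/class/net"))
+    for dev in sorted(missing_devices(text, present)):
+        print(f"missing: {dev}")
+    for dev in sorted(configured_devices(text) & present):
+        if not is_admin_up(dev):
+            print(f"down: {dev}")
+```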
+
+**Counters reset after every reload.** They should survive `nginx -s reload`. If they don't, check that the `ipng_stats_zone` name and
+size in `nginx.conf` are stable across reloads: nginx reuses a shared-memory zone only when both match, so renaming or resizing the
+zone forces a new segment.
+
+**`nginx_ipng_zone_full_events_total` is non-zero.** The shared-memory zone is too small for your VIP count. Increase the size in
+`ipng_stats_zone ipng:<size>` (the `4m` used in this guide is enough for on the order of hundreds of VIPs with the full status-code set).
+
+**`curl http://127.0.0.1:9113/.well-known/ipng/statsz` returns "403 Forbidden".** The `allow`/`deny` ACL is blocking your source
+address. Either add your address to the `allow` list or scrape from a host that is already on it.
+
+## Where to go next
+
+- [`config-guide.md`](config-guide.md) — every directive and `listen` parameter with contexts, allowed values, and defaults.
+- [`design.md`](design.md) — full design document, including the attribution model, hot-path cost analysis, and failure modes.