Add ngx_http_ipng_stats_module: per-VIP, per-device traffic counters

Full implementation of the nginx dynamic module with:
- SO_BINDTODEVICE-based per-interface traffic attribution
- Per-worker lock-free counters flushed to shared memory
- Prometheus text and JSON scrape endpoint at configurable location
- UDP-only global logtail (ipng_stats_logtail) for fire-and-forget
  access log streaming
- $ipng_source_tag nginx variable for use in log_format/map
- Histogram buckets, EWMA rate gauges, zone meta-metrics
- Debian packaging (libnginx-mod-http-ipng-stats)
- Robot Framework end-to-end tests via containerlab
- SPDX Apache-2.0 headers on all source files
commit 5a7e2f77f1 (parent c05bcf6aa6), 2026-04-16 17:36:42 +02:00
25 changed files with 4016 additions and 102 deletions

docs/config-guide.md (new file)

@@ -0,0 +1,290 @@
<!-- SPDX-License-Identifier: Apache-2.0 -->
# nginx-ipng-stats-plugin — Configuration Reference
This document enumerates every directive and `listen` parameter introduced by `ngx_http_ipng_stats_module`, the nginx contexts in which
each is legal, the allowed values, and the default (NFR-7.2). For an end-to-end walkthrough read [`user-guide.md`](user-guide.md); for
the reasoning behind the design read [`design.md`](design.md).
## `listen` parameters
These extend the stock nginx `listen` directive. They are parsed by the module and stripped from `cf->args` before the original handler
is invoked, so they compose with every standard `listen` parameter (`ssl`, `http2`, `default_server`, `reuseport`, etc.).
### `device=<ifname>`
**Context:** `listen` directive (wherever `listen` itself is legal — typically inside `server { ... }`).
**Value:** a Linux interface name, e.g. `gre-mg1`, `eth0`. Maximum `IFNAMSIZ - 1` characters (15 on current kernels).
**Default:** not set (plain listen).
**Effect:** the resulting listening socket has `SO_BINDTODEVICE` applied at init-module time, making the kernel accept only connections
whose ingress interface is `<ifname>`. Combined with a wildcard listen address (`80`, `[::]:80`) this is the mechanism by which the
plugin attributes traffic to a specific ingress interface.
The `setsockopt(SO_BINDTODEVICE)` call runs in the nginx master process while it still holds its initial privileges — workers never
call it, and no additional Linux capability is required beyond what stock nginx already has (NFR-6.1).
See FR-1.1, FR-1.5, FR-1.6.
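For illustration (the interface name is an example), a device-bound listener typically sits alongside a wildcard fallback:
```nginx
# Accept only connections whose ingress interface is gre-mg1:
listen 80 device=gre-mg1;
listen [::]:80 device=gre-mg1;

# Fallback for every other interface, exactly as stock nginx behaves:
listen 80;
listen [::]:80;
```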
### `ipng_source_tag=<tag>`
**Context:** `listen` directive.
**Value:** a short opaque string identifying the traffic source. No length limit is enforced, but keep it ≤ 32 characters
for readable metric output.
**Default:** when `ipng_source_tag=` is absent but `device=X` is set, the tag defaults to the interface name `X` (FR-1.4). When both
are absent, the tag defaults to the value of `ipng_stats_default_source` at the enclosing `http` level.
**Effect:** every counter recorded on this listener carries `source_tag=<tag>` as a Prometheus label and as the outer key in the JSON
output. Scrape consumers can use this tag to filter the response to only the traffic they delivered. To obtain the VIP address in
nginx config (e.g. in `log_format` or `map`), use nginx's built-in `$server_addr` variable.
See FR-1.2, FR-1.3, FR-1.4.
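A sketch of the tag-resolution rules (tags and interface names are examples):
```nginx
listen 80 device=gre-mg1 ipng_source_tag=mg1;  # tag "mg1" (explicit)
listen 80 device=gre-mg2;                      # tag "gre-mg2" (defaults to the device name, FR-1.4)
listen 80;                                     # tag from ipng_stats_default_source (FR-1.3)
```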
## `http`-level directives
All plugin-wide settings live in the `http { ... }` block. They cannot be overridden in inner contexts.
### `ipng_stats_zone <name>:<size>`
**Context:** `http`.
**Value:** `<name>` is a string identifier for the shared-memory zone; `<size>` is an nginx size spec with `k` or `m` suffix.
**Default:** none — the directive is mandatory if the module is loaded.
**Effect:** allocates a shared-memory zone of `<size>` bytes to hold the counter hash table. The `<name>` must be stable across
`nginx -s reload` — renaming it forces a fresh segment, which is the one situation where counters reset without a master restart.
**Sizing guidance:** the dominant factor in zone size is the roughly 60 keys per `(source, vip)` pair (one per observed status code). A host serving
50 VIPs behind 4 source interfaces uses `4 × 50 × 60 ≈ 12000` keys, each a few hundred bytes; the `4m` zone used throughout these docs fits that comfortably.
If the zone fills, the module drops new keys and increments `nginx_ipng_zone_full_events_total` — resize and reload.
See FR-5.1, NFR-3.1.
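Applying that guidance (the zone name `ipng` follows the examples elsewhere in these docs):
```nginx
# 4 sources × 50 VIPs × ~60 status codes ≈ 12000 keys at a few hundred
# bytes each; a 4m zone leaves comfortable headroom.
ipng_stats_zone ipng:4m;
```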
### `ipng_stats_flush_interval <duration>`
**Context:** `http`.
**Value:** an nginx duration string, e.g. `500ms`, `1s`, `2s`.
**Default:** `1s`.
**Minimum:** `100ms`.
**Effect:** sets the cadence of the per-worker flush timer that moves private counter deltas into the shared-memory zone. Lower values
reduce the window of data loss if a worker crashes; higher values reduce the number of atomic adds on the shared zone. The default
is sized so that a scrape interval of 5–15 s sees effectively no lag.
See FR-4.2, FR-5.2.
### `ipng_stats_default_source <tag>`
**Context:** `http`.
**Value:** a short string; see `ipng_source_tag=` above for conventions.
**Default:** `direct`.
**Effect:** sets the tag applied to listening sockets that have neither `device=` nor `ipng_source_tag=`. A host serving a mix of device-attributed
and direct web traffic will see direct traffic under this tag in the scrape output. Rename it to `public`, `localnet`, or anything else
that reads better for your deployment.
See FR-1.3, FR-5.3.
### `ipng_stats_buckets <ms> <ms> <ms> ...`
**Context:** `http`.
**Value:** two or more positive integers, strictly increasing, representing histogram bucket upper bounds in milliseconds.
**Default:** `1 5 10 25 50 100 250 500 1000 2500 5000 10000`, plus an implicit `+Inf` bucket.
**Effect:** overrides the default histogram bucket boundaries for both `request_duration` and `upstream_response_time` histograms. The
same set applies to every `(source, vip)` key in the module (v0.1 does not support per-key override; see
[`design.md`](design.md#decisions-deferred-post-v01)).
See FR-2.3, FR-5.4.
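For example, a deployment focused on sub-100 ms latency might tighten the low end (values illustrative; they must be strictly increasing):
```nginx
ipng_stats_buckets 1 2 5 10 20 50 100 250 1000;
```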
### `ipng_stats on | off`
**Context:** `http`, `server`, `location`.
**Value:** boolean (`on` or `off`).
**Default:** `on` at the `http` level when the module is loaded.
**Effect:** opts a context into or out of counting. Cost of a disabled context is one branch in the log-phase handler. A location
serving the `ipng_stats` scrape handler is automatically excluded from counting regardless of this directive — scraping the scrape
endpoint does not inflate its own counters.
See FR-5.5.
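A sketch of opting a probe endpoint out of counting (the path is illustrative):
```nginx
server {
    listen 80;
    # Counting is inherited as "on" from the http level.

    location = /healthz {
        ipng_stats off;   # keep load-balancer health probes out of the counters
        return 200 "ok\n";
    }
}
```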
### `ipng_stats_logtail <format_name> udp://<host>:<port> [buffer=<size>] [flush=<duration>]`
**Context:** `http`.
**Value:** `<format_name>` is the name of an existing `log_format` defined earlier in the same `http` block. The destination MUST be a
`udp://host:port` URI. `buffer=<size>` is an optional nginx size spec (default `64k`, minimum `1k`). `flush=<duration>` is an optional
nginx duration string (default `1s`, minimum `100ms`).
**Default:** not set — the directive is optional. When absent, no global logtail output is written.
**Effect:** registers a global log-phase writer that fires unconditionally for every request, regardless of `server` or `location`
context. The named `log_format` is looked up from nginx's log module at configuration time; nginx's standard variable-expansion
machinery renders each line, so any variable usable in a regular `log_format` — including `$ipng_source_tag` and `$server_addr` — is
available here.
Each worker maintains a private in-memory write buffer of `buffer=<size>` bytes. Each buffer flush is transmitted as a single
`sendto()` call on a per-worker `SOCK_DGRAM` socket that is opened at worker init and closed at worker exit. The address is resolved
once at configuration time — there is no DNS lookup at flush time. The buffer is flushed when:
- the buffer is full (immediate flush, no lines are dropped);
- the `flush=<duration>` timer fires (periodic flush); or
- the worker exits during a graceful reload or shutdown (final flush).
This covers all request traffic with a single directive at the `http` level, eliminating the need to repeat `access_log` in every
`server` block. It is particularly useful when the format includes `$ipng_source_tag` and `$server_addr`, giving per-device attribution
in every log line at no extra configuration cost.
File-based access logging is intentionally not supported by this directive — use nginx's built-in `access_log` directive for that.
```nginx
log_format logtail '$host\t$remote_addr\t$ipng_source_tag\t$server_addr\t'
                   '$request_method\t$request_uri\t$status\t$body_bytes_sent\t'
                   '$request_time';

ipng_stats_logtail logtail udp://127.0.0.1:9514 buffer=16k flush=1s;
```
**Constraints and behavior:**
- `host` MUST be a literal IPv4 address. Hostnames and IPv6 addresses are not supported in v0.1.
- Each flush emits a single UDP datagram. At the default `buffer=64k` size, datagram payloads comfortably fit within the ~64 KB
loopback MTU. Operators using very large buffers on non-loopback paths should be aware of path MTU limits.
- If no receiver is listening, the kernel silently discards the datagram. The worker receives no error and is not blocked. This is
intentional: the logtail is a fire-and-forget analytics transport — zero disk I/O and no backpressure are the point.
- There is no acknowledgment, no retry, and no sequence number. Datagrams lost in transit or because the receiver is down are
permanently lost.
**Receiver side:** any UDP server works. Two minimal examples:
```bash
# Quick inspection with netcat:
nc -u -l 127.0.0.1 9514
```
```go
// Production Go receiver snippet: each ReadFrom returns one datagram,
// i.e. one flushed batch of newline-separated log lines.
conn, _ := net.ListenPacket("udp", ":9514")
defer conn.Close()
buf := make([]byte, 65536)
for {
	n, _, _ := conn.ReadFrom(buf)
	process(buf[:n]) // application-specific handling of one batch
}
```
See FR-8.1, FR-8.2, FR-8.3, FR-8.4.
### `ipng_stats;` (scrape handler)
**Context:** `location`.
**Value:** no argument. Placed on its own line inside a `location` block.
**Default:** not set.
**Effect:** turns the enclosing location into the module's scrape handler. No other content handler (`proxy_pass`, `root`, `return`,
`fastcgi_pass`, ...) may be combined with `ipng_stats;` in the same location. The handler honors:
- `Accept:` header — `application/json` for JSON, anything else for Prometheus text.
- `?source_tag=<tag>` — filter output to only counters whose `source_tag` dimension equals the tag. Exact match, case-sensitive.
- `?vip=<address>` — filter output to only counters whose `vip` dimension equals the canonicalized address.
Filters MAY be combined; their effect is the intersection.
**Security:** the module does not ship authentication. Place an `allow`/`deny` ACL in the same `location` block (or its enclosing
`server`) to control access (NFR-6.2).
See FR-3.1, FR-3.2, FR-3.3, FR-3.4, FR-3.5.
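For example (the port and path follow the deployment examples in the user guide):
```bash
# Prometheus text exposition (the default):
curl -s http://127.0.0.1:9113/.well-known/ipng/statsz

# JSON via Accept negotiation:
curl -s -H 'Accept: application/json' http://127.0.0.1:9113/.well-known/ipng/statsz

# Both filters combined; the result is the intersection:
curl -s 'http://127.0.0.1:9113/.well-known/ipng/statsz?source_tag=mg1&vip=192.0.2.10'
```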
## Metric names
For Prometheus, the module exports under the `nginx_ipng_` prefix.
| metric | type | labels | meaning |
| --- | --- | --- | --- |
| `nginx_ipng_requests_total` | counter | `source_tag`, `vip`, `code` | Request count per `(source, vip, status_code)`. |
| `nginx_ipng_bytes_in_total` | counter | `source_tag`, `vip`, `code` | Request bytes received (request line + headers + body). |
| `nginx_ipng_bytes_out_total` | counter | `source_tag`, `vip`, `code` | Response bytes sent (status line + headers + body). |
| `nginx_ipng_request_duration_seconds_bucket` | histogram bucket | `source_tag`, `vip`, `le` | Request duration histogram (Prometheus shape). |
| `nginx_ipng_request_duration_seconds_sum` | histogram sum | `source_tag`, `vip` | Sum of observed durations in seconds. |
| `nginx_ipng_request_duration_seconds_count` | histogram count | `source_tag`, `vip` | Count of observations. |
| `nginx_ipng_upstream_response_seconds_bucket` | histogram bucket | `source_tag`, `vip`, `le` | Upstream response time histogram. |
| `nginx_ipng_upstream_response_seconds_sum` | histogram sum | `source_tag`, `vip` | Sum of observed upstream response times in seconds. |
| `nginx_ipng_upstream_response_seconds_count` | histogram count | `source_tag`, `vip` | Count of observations. |
| `nginx_ipng_rate_1s` | gauge | `source_tag`, `vip` | EWMA requests/sec, 1-second decay. |
| `nginx_ipng_rate_10s` | gauge | `source_tag`, `vip` | EWMA requests/sec, 10-second decay. |
| `nginx_ipng_rate_60s` | gauge | `source_tag`, `vip` | EWMA requests/sec, 60-second decay. |
| `nginx_ipng_zone_bytes_used` | gauge | — | Shared-memory zone bytes currently allocated. |
| `nginx_ipng_zone_bytes_total` | gauge | — | Shared-memory zone capacity in bytes. |
| `nginx_ipng_zone_full_events_total` | counter | — | Number of key insertions dropped because the zone was full. |
| `nginx_ipng_flushes_total` | counter | `worker` | Number of per-worker flush ticks executed. |
| `nginx_ipng_flush_duration_seconds` | histogram | `worker` | Histogram of flush durations. |
| `nginx_ipng_scrape_duration_seconds` | histogram | — | Histogram of scrape handler runtimes. |
See FR-2.*, FR-3.7.
## JSON output shape
```json
{
  "schema": 1,
  "by_source": {
    "mg1": {
      "vips": {
        "192.0.2.10": {
          "rate_1s": 42.3,
          "rate_10s": 40.1,
          "rate_60s": 39.8,
          "codes": {
            "200": { "requests": 12345, "bytes_in": 9876543, "bytes_out": 54321098 },
            "404": { "requests": 17, "bytes_in": 2048, "bytes_out": 9216 }
          },
          "request_duration_ms": {
            "buckets": { "1": 10, "5": 40, "10": 120, "25": 350, "50": 870, "100": 2100,
                         "250": 3400, "500": 4000, "1000": 4100, "2500": 4120,
                         "5000": 4123, "10000": 4124, "+Inf": 4124 },
            "sum_ms": 87654,
            "count": 4124
          },
          "upstream_response_ms": { "...": "..." }
        }
      }
    }
  },
  "meta": {
    "zone_bytes_used": 131072,
    "zone_bytes_total": 4194304,
    "zone_full_events": 0
  }
}
```
The top-level `schema` field is versioned — breaking changes bump it, additive changes don't. Consumers SHOULD check `schema`
before parsing.
See FR-3.6.
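A minimal consumer sketch in Go that gates parsing on `schema` (the endpoint URL and the field subset are illustrative):
```go
package main

import (
	"encoding/json"
	"log"
	"net/http"
)

// Subset of the scrape document this consumer cares about.
type scrape struct {
	Schema int `json:"schema"`
	Meta   struct {
		ZoneBytesUsed  int64 `json:"zone_bytes_used"`
		ZoneBytesTotal int64 `json:"zone_bytes_total"`
	} `json:"meta"`
}

func main() {
	req, err := http.NewRequest("GET", "http://127.0.0.1:9113/.well-known/ipng/statsz", nil)
	if err != nil {
		log.Fatal(err)
	}
	req.Header.Set("Accept", "application/json") // content negotiation (FR-3.2)

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	var s scrape
	if err := json.NewDecoder(resp.Body).Decode(&s); err != nil {
		log.Fatal(err)
	}
	if s.Schema != 1 {
		log.Fatalf("unsupported schema version %d", s.Schema)
	}
	log.Printf("zone: %d of %d bytes used", s.Meta.ZoneBytesUsed, s.Meta.ZoneBytesTotal)
}
```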
## Context summary
| knob | `http` | `server` | `location` | `listen` |
| --- | --- | --- | --- | --- |
| `ipng_stats_zone` | ✅ | — | — | — |
| `ipng_stats_flush_interval` | ✅ | — | — | — |
| `ipng_stats_default_source` | ✅ | — | — | — |
| `ipng_stats_buckets` | ✅ | — | — | — |
| `ipng_stats_logtail` | ✅ | — | — | — |
| `ipng_stats on\|off` | ✅ | ✅ | ✅ | — |
| `ipng_stats;` (handler) | — | — | ✅ | — |
| `device=<ifname>` | — | — | — | ✅ |
| `ipng_source_tag=<tag>` | — | — | — | ✅ |

docs/design.md

@@ -1,4 +1,5 @@
# nginx-vpp-maglev-plugin Design Document
<!-- SPDX-License-Identifier: Apache-2.0 -->
# nginx-ipng-stats-plugin Design Document
## Metadata
@@ -7,7 +8,7 @@
| **Status** | Draft — describes intended behavior for `v0.1.0` |
| **Author** | Pim van Pelt `<pim@ipng.ch>` |
| **Last updated** | 2026-04-16 |
| **Audience** | Operators and contributors building the nginx-side observability half of `vpp-maglev` |
| **Audience** | Operators and contributors deploying per-device, per-VIP traffic observability on nginx |
The key words **MUST**, **MUST NOT**, **SHOULD**, **SHOULD NOT**, and **MAY** are used as described in
[RFC 2119](https://datatracker.ietf.org/doc/html/rfc2119), and are reserved in this document for requirements that are intended to be
@@ -16,60 +17,52 @@ lowercase — "can", "will", "does" — and should not be read as normative.
## Summary
`nginx-vpp-maglev-plugin` is a dynamic nginx module and its surrounding Debian packaging. Loaded into stock upstream nginx, the module records
per-VIP traffic counters — requests, status codes, bytes, latency — and attributes them to the specific `vpp-maglev` instance whose GRE
tunnel delivered each connection. A small HTTP scrape endpoint exposes the counters as both Prometheus text and JSON so that
`maglevd-frontend`, Prometheus, and ad-hoc `curl` sessions can all read the same data. The module is the nginx-side answer to the open
question in [`vpp-maglev/docs/design.md`](../../vpp-maglev/docs/design.md) about per-backend traffic counters: VPP's `lb` plugin bypasses
the FIB and cannot produce them, so the backends report what they see.
`nginx-ipng-stats-plugin` is a dynamic nginx module and its surrounding Debian packaging. Loaded into stock upstream nginx, the module
records per-VIP traffic counters — requests, status codes, bytes, latency — and attributes them to the specific interface on which each
connection arrived. A small HTTP scrape endpoint exposes the counters as both Prometheus text and JSON so that Prometheus, custom
dashboards, and ad-hoc `curl` sessions can all read the same data.
## Background
`vpp-maglev` programs VPP's `lb` plugin so that traffic hashed to a VIP lands on a pool of healthy Application Servers (ASes). For the
deployment this module targets, every AS is an nginx instance receiving GRE-encapsulated traffic from one or more `maglevd` daemons,
decapsulating it, and terminating or proxying HTTP and HTTPS as it would for any other inbound client.
Any deployment where traffic arrives on distinct Linux interfaces — GRE tunnels, VLANs, VXLANs, bonded links, or plain ethernet — can
benefit from per-interface traffic visibility. The nginx instances that serve the traffic already observe everything an operator wants to
see — they are the authoritative source for request rate, response code mix, bytes moved, and latency distributions. A small in-process
module emits those numbers on an HTTP endpoint, and consumers scrape the data filtered by source tag.
The design document for `vpp-maglev` identifies **per-AS traffic counters** as an explicit open question: VPP's `lb` fast path bypasses
the FIB, so VPP exposes per-VIP counters in the stats segment but not per-backend ones. An operator looking at the `maglevd-frontend`
status page for a frontend with four backends can see the frontend's aggregate packet rate but not which backend is carrying how much of
it, which errors are concentrated on which backend, or whether one backend's p95 latency is drifting.
This project closes that gap from the opposite end. The nginx instances that serve the traffic already observe everything an operator
wants to see — they are the authoritative source for request rate, response code mix, bytes moved, and latency distributions. A small
in-process module emits those numbers on an HTTP endpoint, and `maglevd-frontend` fans out to the backends of each frontend and aggregates
the result into the existing status page.
One motivating use case is [`vpp-maglev`](https://git.ipng.ch/ipng/vpp-maglev), where each load-balancer instance terminates a GRE
tunnel on the nginx host. The module attributes traffic per tunnel, letting the frontend show per-backend counters that VPP's fast path
cannot provide. But the module is not coupled to that use case — it works with any interface type and any consumer.
## Goals and Non-Goals
### Product Goals
1. **Per-VIP, per-maglev traffic visibility.** For each VIP, the module records request count, status-code distribution, bytes in and out,
and request-duration histograms, split by which `maglevd` instance delivered the traffic.
1. **Per-VIP, per-device traffic visibility.** For each VIP, the module records request count, status-code distribution, bytes in and
out, and request-duration histograms, split by which interface delivered the traffic.
2. **Negligible hot-path cost.** At steady state, a request traversing an nginx worker with the module loaded pays at most a handful of
non-atomic integer increments and a histogram bucket update. No locks, no allocations, no system calls.
3. **Two readers, one endpoint.** A single HTTP location serves both Prometheus text and JSON, so a site running Prometheus and a site
using only the `maglevd-frontend` UI can both consume the module without extra configuration.
using a custom consumer can both consume the module without extra configuration.
4. **Packaging as a dynamic module.** The module builds with nginx's `--with-compat` ABI and ships as a Debian package that loads into
stock upstream nginx without recompiling nginx itself.
5. **Composable with normal nginx use.** A host running the module as a maglev backend **and** serving unrelated direct web traffic on the
same ports MUST remain a correct nginx deployment. The module MUST NOT change the semantics of any existing directive; it only adds new
parameters and directives that are no-ops when unused.
5. **Composable with normal nginx use.** A host running the module with device-bound listeners **and** serving unrelated direct web
traffic on the same ports MUST remain a correct nginx deployment. The module MUST NOT change the semantics of any existing directive;
it only adds new parameters and directives that are no-ops when unused.
6. **Graceful reload.** An `nginx -s reload` MUST NOT reset counters, lose history, or drop in-flight connections from the module's point
of view.
### Non-Goals
- The module is **not** a generic nginx metrics exporter. It does not aim to replace `nginx-module-vts`, `ngx_http_stub_status`, or
`nginx-lua-prometheus`. Its metric set is deliberately narrow and shaped by the `maglevd-frontend` status page.
`nginx-lua-prometheus`. Its metric set is deliberately narrow: per-VIP, per-device counters, histograms, and rate gauges.
- The module does **not** terminate TLS, rewrite headers, or alter the request in any way. It is observation-only.
- The module does **not** talk to `maglevd` directly. It does not initiate gRPC, it does not read maglev configuration, and it does not
know which maglev instance owns which VIP. The attribution tag it emits is a string supplied by the operator in the `listen` directive;
nothing more.
- The module does **not** talk to any external daemon. It does not initiate gRPC or read any external configuration. The attribution tag
it emits is a string supplied by the operator in the `listen` directive; nothing more.
- The module does **not** provide per-client-IP, per-path, or per-User-Agent counters. Those dimensions explode cardinality and belong in
access logs and existing log-analysis tools.
- The module does **not** provide persistent storage. Counters live in shared memory for the lifetime of the nginx master process; on
restart they start at zero. Consumers who need historical retention SHOULD scrape it from Prometheus.
- The module does **not** own the GRE tunnels, the VIP addresses, or the `SO_BINDTODEVICE` privilege. Tunnel creation, VIP binding, and
- The module does **not** own the interfaces, the VIP addresses, or the `SO_BINDTODEVICE` privilege. Interface creation, VIP binding, and
nginx master privileges are the operator's responsibility.
## Requirements
@@ -83,11 +76,11 @@ Each requirement carries a unique identifier (`FR-X.Y` or `NFR-X.Y`) so that lat
- **FR-1.1** The module MUST support a new parameter on the nginx `listen` directive, `device=<ifname>`, which causes the resulting
listening socket to be created with `SO_BINDTODEVICE` set to the named interface. A listen directive without `device=` MUST create a
plain listening socket as stock nginx does.
- **FR-1.2** The module MUST support a new parameter on the nginx `listen` directive, `source=<tag>`, which attaches a short string tag to
- **FR-1.2** The module MUST support a new parameter on the nginx `listen` directive, `ipng_source_tag=<tag>`, which attaches a short string tag to
the listening socket. The tag is the dimension the scrape endpoint exports for every counter that came in on that listener.
- **FR-1.3** A listening socket with neither `device=` nor `source=` MUST be tagged with the configured default source string (see
- **FR-1.3** A listening socket with neither `device=` nor `ipng_source_tag=` MUST be tagged with the configured default source string (see
`ipng_stats_default_source`, FR-5.3). The default default is the literal string `direct`.
- **FR-1.4** A listening socket with `device=X` but no `source=` MUST be tagged with the interface name `X`.
- **FR-1.4** A listening socket with `device=X` but no `ipng_source_tag=` MUST be tagged with the interface name `X`.
- **FR-1.5** Two `listen` directives that share `address:port` but differ in `device=` MUST coexist, and the kernel's TCP socket lookup
rules MUST be relied on to dispatch each SYN to the most specific match. The module MUST NOT attempt to duplicate this logic in
userspace.
@@ -122,14 +115,14 @@ Each requirement carries a unique identifier (`FR-X.Y` or `NFR-X.Y`) so that lat
- **FR-3.2** The `ipng_stats` handler MUST support content negotiation via the `Accept` request header:
- `Accept: application/json` → JSON output.
- `Accept: text/plain` (or anything else, including absent) → Prometheus text exposition format.
- **FR-3.3** The handler MUST support a `source=<tag>` query parameter that filters the output to only counters whose source dimension
- **FR-3.3** The handler MUST support a `source_tag=<tag>` query parameter that filters the output to only counters whose source dimension
equals the supplied tag. The comparison is exact-match and case-sensitive.
- **FR-3.4** The handler MUST support a `vip=<address>` query parameter that filters the output to only counters whose VIP dimension
equals the supplied address. The comparison uses the canonicalized form of FR-2.5.
- **FR-3.5** Both filters MAY be supplied together; their effect is the intersection.
- **FR-3.6** The JSON schema MUST be documented in `docs/scrape-api.md` and MUST version via a top-level `schema` field so that breaking
changes can be made additively without bricking existing consumers.
- **FR-3.7** The Prometheus text output MUST use stable metric names prefixed with `nginx_ipng_` and MUST label every series with `source`
- **FR-3.7** The Prometheus text output MUST use stable metric names prefixed with `nginx_ipng_` and MUST label every series with `source_tag`
and `vip`. Counter metrics additionally carry a `code` label.
**FR-4 Hot path and flush**
@@ -148,28 +141,60 @@ Each requirement carries a unique identifier (`FR-X.Y` or `NFR-X.Y`) so that lat
default); `size` is a size with suffix (`k`, `m`). The directive is mandatory if the module is loaded.
- **FR-5.2** `ipng_stats_flush_interval <duration>` at the `http` level sets the worker flush cadence. Default `1s`. Minimum `100ms`.
- **FR-5.3** `ipng_stats_default_source <tag>` at the `http` level sets the tag applied to listening sockets that have neither `device=`
nor `source=`. Default `direct`.
nor `ipng_source_tag=`. Default `direct`.
- **FR-5.4** `ipng_stats_buckets <ms ms ms ...>` at the `http` level overrides the default histogram bucket boundaries. Values MUST be
strictly increasing positive integers.
- **FR-5.5** `ipng_stats on|off` at the `http`, `server`, or `location` level opts a context into or out of counting. Default `on` at the
`http` level when the module is loaded. A location serving the `ipng_stats` handler MUST NOT have itself counted (the module
automatically sets `off` for the scrape location).
**FR-6 Packaging**
**FR-6 Variables**
- **FR-6.1** The module MUST build as a dynamic module using nginx's `--with-compat --add-dynamic-module=...` flow, against the nginx-dev
- **FR-6.1** The module MUST register an nginx variable `$ipng_source_tag` that resolves to the source tag of the listening socket that
accepted the current connection. For device-bound listeners this is the `ipng_source_tag=` value (or the `device=` name if
`ipng_source_tag=` was not set); for wildcard fallback listeners this is the value of `ipng_stats_default_source`. The variable is
usable in `log_format`, `map`, `add_header`, `if`, and any other nginx context that accepts variables.
- **FR-6.2** `$ipng_source_tag` MUST be available unconditionally when the module is loaded, even if `ipng_stats_zone` is not
configured. It does not depend on the counter subsystem; it only depends on the listen-parameter parsing. Operators who need the VIP
address should use nginx's built-in `$server_addr` variable.
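As a non-normative sketch of FR-6.1 in use (header and variable names are illustrative):
```nginx
# http level: derive a boolean from the attribution tag.
map $ipng_source_tag $from_attributed_source {
    direct  0;
    default 1;
}

# server/location level: expose the tag on responses.
add_header X-IPng-Source $ipng_source_tag always;
```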
**FR-7 Packaging**
- **FR-7.1** The module MUST build as a dynamic module using nginx's `--with-compat --add-dynamic-module=...` flow, against the nginx-dev
headers of the target Debian release, so that the resulting `.so` loads into stock upstream nginx on that release without rebuilding
nginx itself.
- **FR-6.2** The module MUST ship as a Debian package named `libnginx-mod-http-ipng-stats`, following the `libnginx-mod-http-*` naming
- **FR-7.2** The module MUST ship as a Debian package named `libnginx-mod-http-ipng-stats`, following the `libnginx-mod-http-*` naming
convention used by existing third-party nginx modules packaged for Debian.
- **FR-6.3** The package MUST install:
- **FR-7.3** The package MUST install:
- `/usr/lib/nginx/modules/ngx_http_ipng_stats_module.so`
- `/etc/nginx/modules-available/50-mod-http-ipng-stats.conf` containing the `load_module` directive.
- A symlink `/etc/nginx/modules-enabled/50-mod-http-ipng-stats.conf → ../modules-available/50-mod-http-ipng-stats.conf` created in the
package's postinst.
- **FR-6.4** The package postinst MUST run `nginx -t` after installing the module. If the test fails, postinst MUST remove the
- **FR-7.4** The package postinst MUST run `nginx -t` after installing the module. If the test fails, postinst MUST remove the
`modules-enabled` symlink and report a non-fatal warning so that a broken upgrade does not leave the operator's nginx unable to start.
**FR-8 Logtail**
- **FR-8.1** The module MUST support an `ipng_stats_logtail <format_name> udp://host:port [buffer=<size>] [flush=<duration>]` directive
at the `http` level that registers a global log-phase writer which fires unconditionally for every request, regardless of which
`server` or `location` block handled it. One directive at the `http` level is sufficient to cover all vhosts — operators MUST NOT be
required to repeat an `access_log` directive in every `server` block to achieve a single global access log.
- **FR-8.2** The `<format_name>` argument MUST be the name of an existing nginx `log_format` defined in the same `http` block before
this directive. The module MUST look up the compiled log format from nginx's log module at configuration time and use it to render each
log line at request time. The module MUST NOT define its own format language; all `$variable` expansion is handled by nginx's standard
log-format machinery, including `$ipng_source_tag` and `$server_addr`.
- **FR-8.3** Each worker MUST buffer log lines in a per-worker memory buffer before transmitting them as UDP datagrams. The buffer size
is controlled by the optional `buffer=<size>` parameter (default `64k`, minimum `1k`). The buffer MUST be flushed when it is full,
when the optional `flush=<duration>` timer fires (default `1s`, minimum `100ms`), or when the worker exits. This ensures that a
graceful `nginx -s reload` or a clean worker shutdown transmits all buffered log entries.
- **FR-8.4** The destination argument of `ipng_stats_logtail` MUST be a `udp://host:port` URI, where `host` is a literal IPv4 address
(no hostnames, no IPv6 in v0.1). Each buffer flush is transmitted as a single `sendto()` call on a per-worker `SOCK_DGRAM` socket
opened at worker init and closed at worker exit. If no receiver is listening on the target address and port, the kernel silently
discards the datagram — no error is returned, no disk I/O occurs, and the worker is never blocked. Lost datagrams when no receiver is
present are intentional; the UDP transport is designed for fire-and-forget analytics pipelines where delivery guarantees are
unnecessary and zero disk I/O is preferred over persistence. File-based access logging is not supported by this directive — operators
should use nginx's built-in `access_log` for that purpose.
### Non-Functional Requirements
**NFR-1 Correctness under concurrency**
@@ -242,7 +267,7 @@ Each requirement carries a unique identifier (`FR-X.Y` or `NFR-X.Y`) so that lat
- **NFR-7.1** The repository MUST ship a `docs/user-guide.md` that walks an operator through installing the Debian package, loading the
module, configuring a minimal end-to-end deployment (GRE tunnels, VIPs, `listen` lines, scrape endpoint), verifying that counters are
flowing, and integrating the scrape endpoint with both `maglevd-frontend` and a standalone Prometheus scraper. The user guide is the
flowing, and integrating the scrape endpoint with Prometheus and other consumers. The user guide is the
document an operator reads once to get from a freshly-installed package to a working, observable deployment.
- **NFR-7.2** The repository MUST ship a `docs/config-guide.md` that enumerates every directive and `listen` parameter introduced by the
module, together with the nginx configuration contexts (`http`, `server`, `location`, or `listen`) in which each is legal, the allowed
@@ -265,26 +290,31 @@ There is no daemon, no socket the module listens on, no control plane. Everythin
Requests enter nginx through one of two listener classes:
1. **Device-bound listeners** (`listen ... device=X source=Y`) accept only connections whose ingress interface is `X`. Each is tagged
1. **Device-bound listeners** (`listen ... device=X ipng_source_tag=Y`) accept only connections whose ingress interface is `X`. Each is tagged
with a source string `Y`.
2. **Wildcard fallback listeners** (`listen 80;`, `listen [::]:80;`) accept everything that didn't match a more specific listener. They
are tagged with the configured default source (FR-1.3).
During request processing nginx behaves exactly as it would without the module: no handler runs early, no header is rewritten. At log
phase, the module's log-phase handler increments the worker-local counter table keyed by `(source, vip, status_code)`.
phase, the module's log-phase handler runs two independent responsibilities:
1. **Counter update** — increments the worker-local counter table keyed by `(source, vip, status_code)`.
2. **Logtail write** — if `ipng_stats_logtail` is configured (FR-8), renders the named `log_format` for this request and appends the
resulting line to the per-worker write buffer. The buffer is flushed as a UDP datagram on a timer, when full, or on worker exit
(FR-8.3, FR-8.4). This runs for every request regardless of `server` or `location` context.
A per-worker timer, firing at the configured flush interval (FR-5.2), walks the dirty keys in the worker-local table and applies their
deltas to the shared-memory zone via atomic adds.
deltas to the shared-memory zone via atomic adds. The same timer triggers a logtail buffer flush if the flush duration has elapsed (FR-8.3).
The scrape handler, when invoked at `GET /ipng-stats` (or whatever location the operator chose), reads the shared-memory zone directly
The scrape handler, when invoked at `GET /.well-known/ipng/statsz` (or whatever location the operator chose), reads the shared-memory zone directly
and formats the output per the requested content type.
`maglevd-frontend` fetches the scrape endpoint of each backend in its configured fleet at roughly the same cadence it already uses for
maglevd state. It filters server-side via `?source=<its own tag>` so that it only sees the traffic it delivered. The aggregated view is
rendered alongside the existing maglev status page.
Scrape consumers fetch the endpoint at their configured cadence, optionally filtering via `?source_tag=<tag>` so that each consumer only
sees the traffic it delivered.
No component in this project writes to anything outside nginx's own memory. In particular, the module does not touch the file system,
does not emit log lines on the request path, and does not speak to any upstream.
Aside from the logtail output (FR-8) — which sends UDP datagrams to a configured receiver — no component in this
project writes to anything outside nginx's own memory. The module does not emit log lines on the request path for the counter subsystem
and does not speak to any upstream.
## Components
@@ -295,7 +325,7 @@ dynamic-module ABI.
#### Responsibilities
- Parse new `listen` parameters `device=` and `source=` and attach their values to each listening socket's config (FR-1.1, FR-1.2).
- Parse new `listen` parameters `device=` and `ipng_source_tag=` and attach their values to each listening socket's config (FR-1.1, FR-1.2).
- Call `setsockopt(SO_BINDTODEVICE)` in the master process at bind time for listeners that set `device=` (FR-1.1, NFR-6.1).
- Maintain per-worker private counter tables keyed by `(source_id, vip_id, status_code)` (FR-2.1, NFR-1.1).
- Run a per-worker flush timer that moves deltas into the shared-memory zone atomically (FR-4.2, NFR-1.2).
@@ -305,22 +335,22 @@ dynamic-module ABI.
#### Attribution Model
The module's single novel idea is that per-maglev attribution is done by the Linux kernel's TCP socket lookup, not by any userspace
inspection. Each `maglevd` instance terminates its GRE tunnel on a dedicated interface on the nginx host; the operator writes one
`listen ... device=<ifname> source=<tag>` line per `(family, tunnel)` pair. The kernel binds that listening socket with `SO_BINDTODEVICE`,
which causes it to match only connections whose ingress interface is that tunnel. A wildcard `listen 80;` and `listen [::]:80;` pair
provides the fallback for traffic arriving on any other interface — typically normal web traffic, not from maglev.
The module's single novel idea is that per-device attribution is done by the Linux kernel's TCP socket lookup, not by any userspace
inspection. Each traffic source that should be tracked separately terminates on a dedicated interface on the nginx host; the operator
writes one `listen ... device=<ifname> ipng_source_tag=<tag>` line per `(family, interface)` pair. The kernel binds that listening socket
with `SO_BINDTODEVICE`, which causes it to match only connections whose ingress interface is that device. A wildcard `listen 80;` and
`listen [::]:80;` pair provides the fallback for traffic arriving on any other interface — typically normal web traffic.
The kernel's TCP listener lookup prefers a more-specific (device-matching) listener over a less-specific (wildcard) one, so the fallback
and the device-bound listeners coexist without conflicts. The module does not need to duplicate this logic and does not try to.
Because the `device=` binding uses a wildcard address, the module does not need to know the set of VIPs served through each tunnel.
Because the `device=` binding uses a wildcard address, the module does not need to know the set of VIPs served through each interface.
Adding a VIP (binding an address to `lo` and writing a new `server_name` block) does not require touching the `listen` lines. Adding a
new maglev instance (a new GRE tunnel) does. This is the correct split: VIPs are vhost-level concerns and change often; maglev instances
are fleet-level concerns and change rarely.
new attributed interface does. This is the correct split: VIPs are vhost-level concerns and change often; interfaces are
infrastructure-level concerns and change rarely.
The design assumes GRE tunnels used as `device=` sources carry **only** maglev-originated traffic. Any other traffic arriving on such an
interface is silently misattributed to that maglev's source tag. This is a deployment invariant, not a defect.
The design assumes interfaces used as `device=` sources carry **only** traffic from the expected source. Any other traffic arriving on
such an interface is silently misattributed to that interface's source tag. This is a deployment invariant, not a defect.
#### Counter Data Model
@@ -344,7 +374,7 @@ endpoint can recover the original strings without re-parsing configuration.
String interning is capacity-bounded: the zone is sized by the operator, and once capacity is exhausted new keys are dropped with a
counter bump and an infrequent log line (NFR-3.1). In practice, the number of distinct VIPs on a single nginx host is small (tens, maybe
low hundreds), and the number of distinct source tags is the number of maglev instances (single digits). The dominant factor is
low hundreds), and the number of distinct source tags is the number of attributed interfaces (single digits). The dominant factor is
`status_code`; ~60 keys per VIP is a typical steady state.
#### Hot Path
@@ -408,7 +438,7 @@ The worker never walks the entire table — only dirty slots — so idle VIPs co
The `ipng_stats` handler is a leaf content handler. It:
1. Parses `?source=` and `?vip=` into exact-match filters.
1. Parses `?source_tag=` and `?vip=` into exact-match filters.
2. Parses `Accept:` to pick output format.
3. Walks the shared-memory zone under a shared lock (readers hold the read side of a rwlock; flushes and interners hold the write side
briefly).
@@ -423,10 +453,10 @@ fixed-size buffer per chain link and requests new links only when full.
- **One nginx content handler**, `ipng_stats`, usable in any `location` block. Serves Prometheus text and JSON, filtered by optional
query parameters.
- **Two new `listen` parameters**, `device=` and `source=`, usable anywhere a `listen` directive is used.
- **Two new `listen` parameters**, `device=` and `ipng_source_tag=`, usable anywhere a `listen` directive is used.
- **Five new `http`-level directives**: `ipng_stats_zone`, `ipng_stats_flush_interval`, `ipng_stats_default_source`,
`ipng_stats_buckets`, `ipng_stats` (on/off).
- **A Prometheus metric family** prefixed `nginx_ipng_*`, labelled `source`, `vip`, and (for request counters) `code`.
- **A Prometheus metric family** prefixed `nginx_ipng_*`, labelled `source_tag`, `vip`, and (for request counters) `code`.
**Consumes.**
@@ -442,10 +472,10 @@ Debian is the target and upstream nginx on Debian is the platform.
#### Responsibilities
- Build the module against the target release's nginx-dev headers with `--with-compat` (NFR-5.1, NFR-5.3).
- Install the compiled `.so` into `/usr/lib/nginx/modules` (FR-6.3).
- Install the compiled `.so` into `/usr/lib/nginx/modules` (FR-7.3).
- Drop a `load_module` stanza into `/etc/nginx/modules-available/` and enable it by default via a symlink in `modules-enabled/`
(FR-6.3).
- Sanity-check the resulting config with `nginx -t` in the postinst and back out cleanly if it fails (FR-6.4).
(FR-7.3).
- Sanity-check the resulting config with `nginx -t` in the postinst and back out cleanly if it fails (FR-7.4).
#### Build
@@ -476,12 +506,13 @@ No nginx binary is produced, shipped, or touched. The package is strictly additi
A typical deployment on a single nginx host looks like:
- One GRE tunnel per maglev instance, terminated on the nginx host by the operator's networking layer (systemd-networkd, Netplan, or a
hand-rolled interface config). Interface names follow a consistent pattern, typically `gre-<tag>` — e.g. `gre-mg1`, `gre-mg2`.
- VIPs bound to a local dummy or loopback interface so the kernel accepts inner packets destined for them.
- A hand-maintained `listen` include file with one device-bound listen per `(family, tunnel)` pair, reused across vhosts.
- One interface per traffic source that should be separately attributed (e.g. GRE tunnels, VLANs), set up by the operator's networking
layer (systemd-networkd, Netplan, or a hand-rolled interface config). Interface names follow a consistent pattern, typically
`gre-<tag>` — e.g. `gre-mg1`, `gre-mg2`.
- VIPs bound to a local dummy or loopback interface so the kernel accepts packets destined for them.
- A hand-maintained `listen` include file with one device-bound listen per `(family, interface)` pair, reused across vhosts.
- Fallback `listen 80;` and `listen [::]:80;` in whichever server blocks serve direct web traffic.
- A single scrape location, e.g. `location = /ipng-stats`, served from a locked-down server block that only allows the maglev fleet and
- A single scrape location, e.g. `location = /.well-known/ipng/statsz`, served from a locked-down server block that only allows scrape consumers and
the local Prometheus scraper.
### Configuration
@@ -497,7 +528,7 @@ http {
server {
listen 80;
listen [::]:80;
include /etc/nginx/ipng-maglev/listens.conf;
include /etc/nginx/ipng-stats/listens.conf;
server_name _;
# ... normal vhost content
@@ -505,17 +536,17 @@ http {
server {
listen 127.0.0.1:9113;
location = /ipng-stats {
location = /.well-known/ipng/statsz {
ipng_stats;
allow 127.0.0.1;
allow 2001:db8::/48; # maglev fleet
allow 2001:db8::/48; # scrape consumers
deny all;
}
}
}
```
`listens.conf` is eight lines (two families × four maglevs) and stable across vhost changes.
`listens.conf` is two lines per attributed interface (two address families each) and stable across vhost changes.
### Nginx Reload Semantics
@@ -550,15 +581,15 @@ some other endpoint.
- **nginx master crash / package upgrade.** The shared zone is torn down with the old master. When the new master starts, the zone is
recreated empty. Counters start from zero. Consumers that need history SHOULD read from Prometheus, which retains history across
restarts.
- **Device disappears.** If an operator removes a GRE tunnel without removing its `listen` line, nginx's bind will fail on the next
- **Device disappears.** If an operator removes an interface without removing its `listen` line, nginx's bind will fail on the next
reload and the reload will error cleanly. The module does not hide this; a failing `nginx -t` is the right answer.
- **Traffic on a wildcard listener that should have been device-bound.** The traffic is counted under `direct` (or the configured
default). This is detectable: if the operator expects zero traffic under `direct` and the dashboard shows non-zero, a maglev instance
is probably missing from the `listen` include.
default). This is detectable: if the operator expects zero traffic under `direct` and the dashboard shows non-zero, an interface is
probably missing from the `listen` include.
- **Slow scrape on a large zone.** Scrape cost is linear in the number of keys (NFR-2.3). On a host with a very large VIP count, the
operator SHOULD increase the flush interval, lower the scrape frequency, or both. The module does not cap scrape runtime.
- **Maglev frontend is down.** The module is unaffected; its counters continue to increment and the Prometheus scrape continues to work.
When the frontend comes back, it resumes fetching. No state is lost.
- **Scrape consumer is down.** The module is unaffected; its counters continue to increment and the Prometheus scrape continues to work.
When the consumer comes back, it resumes fetching. No state is lost.
### Security
@@ -586,18 +617,16 @@ some other endpoint.
decapsulation; the outer and inner conntrack entries are independent and mark does not cross. Even if tagging worked, `SO_MARK` on an
accepted socket does not reflect incoming packet or conntrack mark without a per-packet `libnetfilter_conntrack` lookup, which is too
heavy for a log-phase handler.
- **Attribution via multiple GRE tunnels and CONNMARK.** Rejected as strictly worse than `SO_BINDTODEVICE`: it still requires per-maglev
- **Attribution via multiple GRE tunnels and CONNMARK.** Rejected as strictly worse than `SO_BINDTODEVICE`: it still requires per-source
tunnels, still needs nginx to read the mark (hard), and adds a netfilter dependency. `SO_BINDTODEVICE` solves the same problem with
kernel primitives nginx already knows about.
- **Attribution via eBPF `SO_REUSEPORT` programs.** Rejected as dramatic overkill for a problem the kernel already solves for free via
socket-lookup specificity.
- **Per-VIP enumeration in `listen` directives.** Rejected in favor of wildcard `listen 80 device=gre-mg1;`. The wildcard form works
- **Per-VIP enumeration in `listen` directives.** Rejected in favor of wildcard `listen 80 device=gre-mg1 ipng_source_tag=mg1;`. The wildcard form works
because nginx routes by `server_name` post-accept, so the `listen` only needs to express `(port, device)` and does not need the VIP
address. This makes the generated include file size independent of the VIP count.
- **Pushing counters from the module into `maglevd` over gRPC.** Rejected. It inverts the wait-for graph (maglevd's design doc is
careful to keep the daemon free of callbacks from the backends), it complicates restart neutrality, and it adds a gRPC client to a C
module. Pull-based scrape keeps maglevd out of the traffic-metrics business, matches the doc's philosophy, and lets the frontend use
its existing per-server goroutine model.
- **Pushing counters to an external daemon over gRPC.** Rejected. It complicates restart neutrality and adds a gRPC client dependency to
a C module. Pull-based scrape is simpler: consumers fetch when they want, and the module has no outbound connections.
- **Shipping separate JSON and Prometheus handlers.** Rejected. Content negotiation on one handler is simpler to configure and serves
both audiences from one ACL.
@@ -609,5 +638,5 @@ some other endpoint.
- **TLS handshake metrics.** The module reports `request_duration` from the start of the HTTP request, not from TCP accept. For
TLS-terminating frontends a handshake-time fraction is invisible. Adding a `tls_handshake_duration` histogram is deferred until
operators ask for it.
- **`maglevd-frontend` fetch cadence.** Whichever cadence the frontend adopts for traffic counters — the existing ~one-second refresh,
or an SSE bridge layered on top — the plugin supports it. The choice is on the frontend side.
- **Consumer fetch cadence.** Whichever cadence a consumer adopts for traffic counters — a one-second refresh, a longer Prometheus
scrape interval, or an SSE bridge layered on top — the plugin supports it. The choice is on the consumer side.

docs/user-guide.md (new file)

@@ -0,0 +1,384 @@
<!-- SPDX-License-Identifier: Apache-2.0 -->
# nginx-ipng-stats-plugin — User Guide
This document walks an operator through installing the plugin, deploying it on a single nginx host serving traffic that arrives on
distinct interfaces (GRE tunnels, VLANs, bonded links, or plain ethernet), verifying that counters are flowing, and hooking up the
scrape endpoint to Prometheus and other consumers.
It covers (NFR-7.1):
1. Installing the Debian package.
2. Setting up interfaces for per-device attribution (GRE tunnel example).
3. Writing a minimal nginx configuration.
4. Verifying with `curl`.
5. Scraping from Prometheus.
6. Setting up a global logtail access log.
7. Integrating with scrape consumers.
For a directive-by-directive reference, read [`config-guide.md`](config-guide.md) alongside this guide.
## 1. Install the package
On Debian Trixie (and newer), the module is distributed as `libnginx-mod-http-ipng-stats`. The package depends on the stock `nginx`
package and loads cleanly into it without recompiling nginx itself.
```
sudo apt install ./libnginx-mod-http-ipng-stats_0.1.0-1_amd64.deb
```
The package will:
- Drop `ngx_http_ipng_stats_module.so` into `/usr/lib/nginx/modules/`.
- Place a `load_module` stanza in `/etc/nginx/modules-available/50-mod-http-ipng-stats.conf`.
- Symlink it into `/etc/nginx/modules-enabled/` so nginx picks it up on the next reload.
- Run `nginx -t` and, if the test fails, remove the `modules-enabled` symlink and print a warning — so a broken upgrade never leaves
you with an nginx that cannot start.
Confirm the module is loaded:
```
nginx -V 2>&1 | grep -o ngx_http_ipng_stats_module
```
## 2. Set up interfaces for per-device attribution
The plugin attributes traffic by watching which interface the request came in on, using `SO_BINDTODEVICE` on per-interface listening
sockets. For this to work, each traffic source that should be tracked separately MUST arrive on its own interface.
This works with any kind of Linux interface — GRE tunnels, VLANs, VXLANs, bonded links, or plain ethernet. This guide uses GRE
tunnels as the example, but the module does not care about the interface type.
This guide doesn't prescribe a specific networking layer — use whatever your host already uses (`systemd-networkd`, Netplan,
`/etc/network/interfaces`, or a hand-rolled script). The only hard requirement is:
- Each traffic source that should be separately attributed gets its own interface on the nginx host.
- Interfaces follow a consistent naming pattern. For GRE tunnels we recommend `gre-<tag>`, e.g. `gre-mg1`, `gre-mg2`.
- The VIPs are bound to a local dummy or loopback interface so the kernel accepts packets destined for them.
For example, with `systemd-networkd`, a GRE tunnel to a remote peer at `2001:db8::1` from this host at `2001:db8::100` looks like:
```
# /etc/systemd/network/10-gre-mg1.netdev
[NetDev]
Name=gre-mg1
Kind=ip6gre
[Tunnel]
Local=2001:db8::100
Remote=2001:db8::1
TTL=64
```
```
# /etc/systemd/network/10-gre-mg1.network
[Match]
Name=gre-mg1
[Network]
LinkLocalAddressing=no
```
Repeat for each additional tunnel. A trimmed-down variant of this scheme is what IPng uses in production.
Verify the interfaces exist and carry traffic:
```
ip -6 tunnel show | grep gre-mg
ip -6 -s link show gre-mg1
```
## 3. Write the nginx configuration
The plugin needs three things in `nginx.conf`:
1. A shared-memory zone for counters (`ipng_stats_zone`).
2. A set of `listen` directives — a wildcard fallback plus one device-bound listener per attributed interface.
3. A scrape location serving the `ipng_stats` handler.
A minimal working configuration looks like this:
```nginx
load_module modules/ngx_http_ipng_stats_module.so;

events {
    worker_connections 4096;
}

http {
    ipng_stats_zone           ipng:4m;
    ipng_stats_flush_interval 1s;
    ipng_stats_default_source direct;

    # A normal vhost. The fallback listen lines serve direct web traffic;
    # the included file adds one device-bound listen per attributed interface.
    server {
        listen 80;
        listen [::]:80;
        include /etc/nginx/ipng-stats/listens.conf;

        server_name _;
        root /var/www/html;
    }

    # A second server block exposing the scrape endpoint on a locked-down port.
    server {
        listen 127.0.0.1:9113;
        listen [::1]:9113;

        location = /.well-known/ipng/statsz {
            ipng_stats;
            allow 127.0.0.1;
            allow ::1;
            allow 2001:db8::/48;   # your scrape consumers
            deny all;
        }
    }
}
```
And `/etc/nginx/ipng-stats/listens.conf` — the hand-maintained include file — is two lines per attributed interface (one per address
family):
```nginx
listen 80      device=gre-mg1 ipng_source_tag=mg1;
listen [::]:80 device=gre-mg1 ipng_source_tag=mg1;
listen 80      device=gre-mg2 ipng_source_tag=mg2;
listen [::]:80 device=gre-mg2 ipng_source_tag=mg2;
listen 80      device=gre-mg3 ipng_source_tag=mg3;
listen [::]:80 device=gre-mg3 ipng_source_tag=mg3;
listen 80      device=gre-mg4 ipng_source_tag=mg4;
listen [::]:80 device=gre-mg4 ipng_source_tag=mg4;
```
Test and reload:
```
sudo nginx -t
sudo nginx -s reload
```
If `nginx -t` complains about an unknown `listen` parameter (`device=` or `ipng_source_tag=`), the module isn't loaded — check step 1.
### Why wildcard listens?
You do not need to enumerate VIPs in `listen`. A wildcard `listen 80 device=gre-mg1 ipng_source_tag=mg1;` accepts any local address
served through the `gre-mg1` interface, and nginx routes per-request to the right vhost by `server_name` / `Host:` header. Adding a new
VIP is a `server_name` change; adding a new interface is an append to `listens.conf`.
### Why both a wildcard and device-bound listens?
The fallback `listen 80;` / `listen [::]:80;` catches traffic arriving on any interface that isn't one of your attributed interfaces —
for example, real clients hitting your host directly over `eth0`. The kernel's TCP socket lookup prefers the most-specific
(device-matching) listener, so a SYN on `gre-mg1` always lands on the `mg1` socket, and a SYN on `eth0` always lands on the fallback.
No races, no stealing. Direct traffic is counted under the tag set by `ipng_stats_default_source` (`direct` by default).
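One way to see the coexisting sockets: `ss` from iproute2 prints the bound device after a `%` in the local address column (output illustrative):
```bash
$ sudo ss -ltn 'sport = :80'
State   Recv-Q  Send-Q  Local Address:Port    Peer Address:Port
LISTEN  0       511     0.0.0.0%gre-mg1:80    0.0.0.0:*
LISTEN  0       511     0.0.0.0:80            0.0.0.0:*
```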
## 4. Verify with curl
Generate some traffic (or wait for real traffic), then scrape the endpoint locally:
```
curl -s http://127.0.0.1:9113/.well-known/ipng/statsz
```
Default output is Prometheus text format:
```
# HELP nginx_ipng_requests_total Total HTTP requests, per (source_tag, vip, code).
# TYPE nginx_ipng_requests_total counter
nginx_ipng_requests_total{source_tag="mg1",vip="192.0.2.10",code="200"} 12345
nginx_ipng_requests_total{source_tag="mg1",vip="192.0.2.10",code="404"} 17
nginx_ipng_requests_total{source_tag="mg2",vip="192.0.2.10",code="200"} 9876
nginx_ipng_requests_total{source_tag="direct",vip="192.0.2.10",code="200"} 42
# HELP nginx_ipng_bytes_in_total Request bytes received, per (source_tag, vip, code).
# TYPE nginx_ipng_bytes_in_total counter
nginx_ipng_bytes_in_total{source_tag="mg1",vip="192.0.2.10",code="200"} 9876543
# ... and so on
```
For JSON output instead, set the `Accept` header:
```
curl -s -H 'Accept: application/json' http://127.0.0.1:9113/.well-known/ipng/statsz | jq .
```
To filter server-side to a single source tag:
```
curl -s 'http://127.0.0.1:9113/.well-known/ipng/statsz?source_tag=mg1'
curl -s 'http://127.0.0.1:9113/.well-known/ipng/statsz?source_tag=mg1&vip=192.0.2.10'
```
If you see `source_tag="direct"` entries with non-zero counts and you expected all traffic to come in via attributed interfaces,
something is routing around them — typically an interface that isn't in `listens.conf`, or an interface that's down.
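One way to eyeball what landed under `direct` (assumes `jq` is installed; the field paths follow the JSON shape documented in [`config-guide.md`](config-guide.md)):
```bash
curl -s -H 'Accept: application/json' \
  'http://127.0.0.1:9113/.well-known/ipng/statsz?source_tag=direct' \
  | jq '.by_source.direct.vips | map_values(.codes)'
```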
## 5. Scrape from Prometheus
The same endpoint serves Prometheus text by default. Add a scrape job:
```yaml
# /etc/prometheus/prometheus.yml
scrape_configs:
  - job_name: nginx-ipng
    scrape_interval: 15s
    static_configs:
      - targets:
          - 'nginx-backend-1.example.com:9113'
          - 'nginx-backend-2.example.com:9113'
    metrics_path: /.well-known/ipng/statsz
```
You'll want to add `nginx-backend-*` to your `allow` rules in the scrape server block, or front the plugin with a TLS-terminating
reverse proxy. The module does not ship its own auth; the nginx `allow`/`deny` ACL is your access control.
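For example, assuming your Prometheus servers live in `198.51.100.0/24` (a placeholder prefix; substitute your scraper subnet), extend
the scrape location's `allow` list:
```nginx
location = /.well-known/ipng/statsz {
    ipng_stats;
    allow 127.0.0.1;
    allow ::1;
    allow 198.51.100.0/24;   # Prometheus scrapers (placeholder prefix)
    deny all;
}
```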
Typical PromQL queries:
```
# Requests per second per source, per VIP:
sum by (source_tag, vip) (rate(nginx_ipng_requests_total[1m]))
# 5xx error rate per VIP, aggregated across all sources:
sum by (vip) (rate(nginx_ipng_requests_total{code=~"5.."}[5m]))
/
sum by (vip) (rate(nginx_ipng_requests_total[5m]))
# p95 request duration per (source_tag, vip):
histogram_quantile(0.95,
sum by (source_tag, vip, le) (rate(nginx_ipng_request_duration_seconds_bucket[5m])))
```
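The same expressions drop straight into alerting rules; a hypothetical rule that fires on a sustained 5xx ratio above 5% on any VIP:
```yaml
groups:
  - name: nginx-ipng
    rules:
      - alert: IpngVipHigh5xxRatio
        expr: |
          sum by (vip) (rate(nginx_ipng_requests_total{code=~"5.."}[5m]))
            /
          sum by (vip) (rate(nginx_ipng_requests_total[5m])) > 0.05
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: 'VIP {{ $labels.vip }} is serving more than 5% errors'
```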
## 6. Set up a global logtail access log
Operators who want a single unified access log covering all traffic — regardless of which `server` block handled the request — normally
have to audit every `server {}` block (any per-server `access_log`, including `access_log off`, silently replaces the inherited
`http`-level one) and still rely on a catch-all virtual host for unmatched requests. The `ipng_stats_logtail` directive removes that
bookkeeping: one line at the `http` level registers a global log-phase writer that fires unconditionally for every request (FR-8.1).
The logtail sends each buffer flush as a single UDP datagram to a `host:port`. Zero disk I/O, no backpressure, no blocking if the
receiver is down. This makes it ideal for fire-and-forget analytics pipelines where delivery guarantees are unnecessary and disk writes
would add unwanted I/O pressure. For file-based access logging, use nginx's built-in `access_log` directive.
### Define the log format
Add a `log_format` declaration inside the `http { ... }` block, **before** the `ipng_stats_logtail` directive that references it:
```nginx
log_format logtail '$host\t$remote_addr\t$ipng_source_tag\t$server_addr\t'
'$request_method\t$request_uri\t$status\t$body_bytes_sent\t'
'$request_time';
```
Any nginx variable is usable here, including `$ipng_source_tag` (the device attribution tag, FR-6.1) and `$server_addr` (the VIP
that received the request).
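The variable also works outside `log_format`; for instance, a hypothetical debugging aid that echoes the attribution tag back to the
client in a response header:
```nginx
add_header X-IPng-Source $ipng_source_tag always;
```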
### Configuration
```nginx
http {
ipng_stats_zone ipng:4m;
log_format logtail '$host\t$remote_addr\t$ipng_source_tag\t$server_addr\t'
'$request_method\t$request_uri\t$status\t$body_bytes_sent\t'
'$request_time';
ipng_stats_logtail logtail udp://127.0.0.1:9514 buffer=16k flush=1s;
server { ... }
}
```
- **`logtail`** (first argument) — the `log_format` name.
- **`udp://127.0.0.1:9514`** — destination as a `udp://host:port` URI. `host` must be a literal IPv4 address (no hostnames, no IPv6
in v0.1).
- **`buffer=16k`** — per-worker write buffer. Lines are held in memory until the buffer fills, the flush timer fires, or the worker
exits. Default is `64k`; minimum is `1k` (FR-8.3).
- **`flush=1s`** — maximum age of buffered data before it is sent. Default is `1s`; minimum is `100ms` (FR-8.3).
Each buffer flush becomes a single `sendto()` on a per-worker `SOCK_DGRAM` socket. When the flush timer fires (or the buffer fills),
the entire buffered payload is sent as one datagram — no file open, no `write()`, no `fsync()`. If no receiver is listening, the kernel
drops the datagram silently and the worker carries on. This is by design: the logtail exists for non-critical analytics pipes where
lost datagrams are acceptable and disk I/O is not.
**Constraints (v0.1):**
- `host` must be a literal IPv4 address. Hostnames and IPv6 are not supported yet.
- Large `buffer=` values produce large datagrams. UDP caps a single datagram at just under 64 KB, well above typical configured buffer
  sizes; on the loopback interface that datagram travels whole, while on routed paths it is subject to IP fragmentation and path MTU.
- There is no acknowledgment, retry, or sequence number. If the receiver is down, the data is gone.
**Starting a receiver** is trivial:
```bash
# Quick one-shot inspection:
nc -u -l 127.0.0.1 9514
```
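Be aware that some `nc` variants latch onto the first UDP peer and then ignore other senders; since each nginx worker flushes from its
own socket (and hence its own source port), a connectionless receiver can be more convenient. A sketch, assuming `socat` is installed:
```bash
# Receive datagrams from any peer and print them to stdout:
socat -u UDP-RECV:9514 STDOUT
```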
For a production-ready logtail consumer, see [`nginx-logtail`](https://git.ipng.ch/ipng/nginx-logtail), which receives the UDP
datagram stream and processes it into structured log output.
A typical received log line (with the format above, tab-separated) looks like:
```
example.com 203.0.113.42 mg1 192.0.2.10 GET /index.html 200 4321 0.003
```
The third field (`mg1`) comes from `$ipng_source_tag` — free per-device attribution in every log line.
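Because the format is tab-separated, quick aggregations are one-liners; for example, a running request count per source tag (field 3 in
the format above):
```bash
nc -u -l 127.0.0.1 9514 | awk -F'\t' '{ n[$3]++; print $3, n[$3] }'
```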
### Why this complements per-server `access_log`
An `http`-level `access_log` is inherited by every `server {}` block, but any block that declares its own `access_log` (or
`access_log off`) replaces the inherited one. This is error-prone: adding a vhost with its own access log silently removes that vhost's
traffic from the unified log. `ipng_stats_logtail` is installed at the module's log-phase hook, which nginx calls for every request,
regardless of any per-server `access_log` configuration.
See [`config-guide.md`](config-guide.md#ipng_stats_logtail-format_name-udphostport-buffersize-flushduration) for the full directive
reference and FR-8 for the requirements behind this feature.
## 7. Integrate with scrape consumers
The scrape endpoint (`ipng_stats;`) serves both Prometheus text and JSON from a single location. Any HTTP client that can issue a GET
request can consume it. Two integration patterns are common:
### Prometheus
See section 5 above. Prometheus scrapes the endpoint at a configured interval and stores the time series. This is the simplest
integration and covers most monitoring and alerting use cases.
### Custom consumers
The `?source_tag=<tag>` query parameter lets a consumer filter the scrape response to only the traffic attributed to a specific source.
This is useful when multiple consumers share the same nginx backends — each consumer scrapes with its own tag and never sees the
others' traffic.
The JSON output (`Accept: application/json`) includes a top-level `schema` field for versioning, making it straightforward to parse
from any language.
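A consumer should check that field before committing to a parse; for example:
```
curl -s -H 'Accept: application/json' \
  http://127.0.0.1:9113/.well-known/ipng/statsz | jq .schema
```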
Once wired, a consumer can derive from the scrape data:
- Live QPS per backend (from the EWMA gauges).
- Status-code mix per backend (from the counter families).
- p50/p95 latency per backend (from the duration histogram).
- Traffic volume per backend (from the bytes counters).
For an example of this pattern in a GRE tunnel fleet, see [`vpp-maglev`](https://git.ipng.ch/ipng/vpp-maglev), whose frontend scrapes
each nginx backend filtered by source tag to show per-backend traffic alongside health state.
## Troubleshooting
**`nginx -t` reports "unknown listen parameter: device=" or "unknown listen parameter: ipng_source_tag=".** The module isn't loaded.
Check `/etc/nginx/modules-enabled/` for the `50-mod-http-ipng-stats.conf` symlink and re-run `nginx -t`.
**All traffic is attributed to `direct` even though device-bound interfaces exist.** The interface names don't match the `device=`
values in `listens.conf`, or the interfaces aren't up. Run `ip -br link` and confirm the interface names match.
**Counters reset after every reload.** They should survive `nginx -s reload`. If they don't, check that the `ipng_stats_zone` name in
`nginx.conf` is stable across reloads — renaming the zone forces a new shared-memory segment.
**`nginx_ipng_zone_full_events_total` is non-zero.** The shared-memory zone is too small for your VIP count. Increase the size in
`ipng_stats_zone ipng:<size>` (default 4 MB is enough for ~hundreds of VIPs with the full status-code set).
**`curl http://127.0.0.1:9113/.well-known/ipng/statsz` returns "403 Forbidden".** The `allow`/`deny` ACL is blocking your source address. Either add
yourself or scrape from a host already in the allow list.
## Where to go next
- [`config-guide.md`](config-guide.md) — every directive and `listen` parameter with contexts, allowed values, and defaults.
- [`design.md`](design.md) — full design document, including the attribution model, hot-path cost analysis, and failure modes.