<!-- SPDX-License-Identifier: Apache-2.0 -->

# nginx-ipng-stats-plugin: Configuration Reference

This document enumerates every directive and `listen` parameter introduced by `ngx_http_ipng_stats_module`, the nginx contexts in which
each is legal, the allowed values, and the default (NFR-7.2). For an end-to-end walkthrough read [`user-guide.md`](user-guide.md); for
the reasoning behind the design read [`design.md`](design.md).

## `listen` parameters

These extend the stock nginx `listen` directive. They are parsed by the module and stripped from `cf->args` before the original handler
is invoked, so they compose with every standard `listen` parameter (`ssl`, `http2`, `default_server`, `reuseport`, etc.).

### `device=<ifname>`

**Context:** `listen` directive (wherever `listen` itself is legal, typically inside `server { ... }`).

**Value:** a Linux interface name, e.g. `gre-mg1`, `eth0`. Maximum `IFNAMSIZ - 1` characters (15 on current kernels).

**Default:** not set (plain listen).

**Effect:** the resulting listening socket has `SO_BINDTODEVICE` applied at init-module time, making the kernel accept only connections
whose ingress interface is `<ifname>`. Combined with a wildcard listen address (`80`, `[::]:80`) this is the mechanism by which the
plugin attributes traffic to a specific ingress interface.

The `setsockopt(SO_BINDTODEVICE)` call runs in the nginx master process while it still holds its initial privileges. Workers never
call it, and no additional Linux capability is required beyond what stock nginx already has (NFR-6.1).

See FR-1.1, FR-1.5, FR-1.6.

### `ipng_source_tag=<tag>`

**Context:** `listen` directive.

**Value:** a short opaque string identifying the traffic source. No length limit is enforced, but keep it ≤ 32 characters
for readable metric output.

**Default:** when `ipng_source_tag=` is absent but `device=X` is set, the tag defaults to the interface name `X` (FR-1.4). When both
are absent, the tag defaults to the value of `ipng_stats_default_source` at the enclosing `http` level.

**Effect:** every counter recorded on this listener carries `source_tag=<tag>` as a Prometheus label and as the outer key in the JSON
output. Scrape consumers can use this tag to filter the response to only the traffic they delivered. To obtain the VIP address in
nginx config (e.g. in `log_format` or `map`), use nginx's built-in `$server_addr` variable.

See FR-1.2, FR-1.3, FR-1.4.

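The defaulting rules for the source tag can be read as a three-step precedence. The following Go sketch only restates those documented rules; `resolveSourceTag` and its parameter names are hypothetical and are not taken from the module's C source.

```go
package main

import "fmt"

// resolveSourceTag restates the documented precedence: an explicit
// ipng_source_tag= wins, then the device= interface name, then the
// http-level ipng_stats_default_source value. Empty string means "not set".
func resolveSourceTag(explicitTag, device, defaultSource string) string {
	if explicitTag != "" {
		return explicitTag
	}
	if device != "" {
		return device
	}
	return defaultSource
}

func main() {
	fmt.Println(resolveSourceTag("mg1", "gre-mg1", "direct")) // explicit tag wins
	fmt.Println(resolveSourceTag("", "gre-mg1", "direct"))    // falls back to the device name
	fmt.Println(resolveSourceTag("", "", "direct"))           // falls back to the default source
}
```

The three calls print `mg1`, `gre-mg1`, and `direct` in that order.
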
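## `http`-level directives

Plugin-wide settings live in the `http { ... }` block and cannot be overridden in inner contexts. The two exceptions, both documented
below, are `ipng_stats on | off`, which is also legal in `server` and `location`, and the `ipng_stats;` scrape handler, which is legal
only in `location`.
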
### `ipng_stats_zone <name>:<size>`

**Context:** `http`.

**Value:** `<name>` is a string identifier for the shared-memory zone; `<size>` is an nginx size spec with `k` or `m` suffix.

**Default:** none. The directive is mandatory if the module is loaded.

**Effect:** allocates a shared-memory zone of `<size>` bytes to hold the counter hash table. The `<name>` must be stable across
`nginx -s reload`: renaming it forces a fresh segment, which is the one situation where counters reset without a master restart.

**Sizing guidance:** the dominant factor in zone size is the key count, roughly 60 keys per `(source, vip)` pair (one per observed
status code). A host serving 50 VIPs behind 4 source interfaces uses `4 × 50 × 60 ≈ 12000` keys, each a few hundred bytes. A `4m` zone
fits that as long as entries average under about 350 bytes (`4194304 / 12000 ≈ 349`, ignoring allocator overhead).
If the zone fills, the module drops new keys and increments `nginx_ipng_zone_full_events_total`; resize and reload.

See FR-5.1, NFR-3.1.

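The sizing arithmetic can be checked with a few lines of Go. This is a rough estimate only: the 256 bytes per key is an assumed figure standing in for "a few hundred bytes", and `estimateZoneBytes` is a hypothetical helper, not part of the module.

```go
package main

import "fmt"

// estimateZoneBytes multiplies out sources x VIPs x keys-per-pair to get the
// key count, then multiplies by an assumed per-key footprint in bytes.
func estimateZoneBytes(sources, vips, keysPerPair, bytesPerKey int) (keys, bytes int) {
	keys = sources * vips * keysPerPair
	return keys, keys * bytesPerKey
}

func main() {
	const zoneBytes = 4 << 20 // capacity of a 4m zone
	// 256 bytes per key is an assumption for illustration.
	keys, need := estimateZoneBytes(4, 50, 60, 256)
	fmt.Printf("keys=%d need=%d fits=%v\n", keys, need, need <= zoneBytes)
}
```

With these inputs the estimate is 12000 keys and 3072000 bytes, which is below the 4194304 bytes of a `4m` zone. Compare the estimate against `nginx_ipng_zone_bytes_used` on a running host before settling on a size.
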
### `ipng_stats_flush_interval <duration>`

**Context:** `http`.

**Value:** an nginx duration string, e.g. `500ms`, `1s`, `2s`.

**Default:** `1s`.

**Minimum:** `100ms`.

**Effect:** sets the cadence of the per-worker flush timer that moves private counter deltas into the shared-memory zone. Lower values
reduce the window of data loss if a worker crashes; higher values reduce the number of atomic adds on the shared zone. The default
is sized so that a scrape interval of 5–15 s sees effectively no lag.

See FR-4.2, FR-5.2.

### `ipng_stats_default_source <tag>`

**Context:** `http`.

**Value:** a short string; see `ipng_source_tag=` above for conventions.

**Default:** `direct`.

**Effect:** sets the tag applied to listening sockets that have neither `device=` nor `ipng_source_tag=`. A host serving a mix of device-attributed
and direct web traffic will see direct traffic under this tag in the scrape output. Rename it to `public`, `localnet`, or anything else
that reads better for your deployment.

See FR-1.3, FR-5.3.

### `ipng_stats_buckets <ms> <ms> <ms> ...`

**Context:** `http`.

**Value:** two or more positive integers, strictly increasing, representing histogram bucket upper bounds in milliseconds.

**Default:** `1 5 10 25 50 100 250 500 1000 2500 5000 10000`, plus an implicit `+Inf` bucket.

**Effect:** overrides the default histogram bucket boundaries for both `request_duration` and `upstream_response_time` histograms. The
same set applies to every `(source, vip)` key in the module (v0.1 does not support per-key override; see
[`design.md`](design.md#decisions-deferred-post-v01)).

See FR-2.3, FR-5.4.

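To make the value rule and the implicit `+Inf` bucket concrete, here is a small Go sketch. It assumes Prometheus-style cumulative buckets, where an observation counts toward every bucket whose upper bound is at or above it. `validateBuckets` and `observe` are hypothetical names; the module's actual storage layout may differ.

```go
package main

import (
	"errors"
	"fmt"
)

// validateBuckets applies the documented rule: two or more positive,
// strictly increasing upper bounds in milliseconds.
func validateBuckets(bounds []int) error {
	if len(bounds) < 2 {
		return errors.New("need at least two bucket bounds")
	}
	prev := 0
	for _, b := range bounds {
		if b <= prev {
			return fmt.Errorf("bound %d must be positive and greater than the previous bound", b)
		}
		prev = b
	}
	return nil
}

// observe increments every cumulative bucket whose upper bound is >= ms.
// counts has len(bounds)+1 entries; the last one is the implicit +Inf bucket.
func observe(bounds []int, counts []uint64, ms int) {
	for i, b := range bounds {
		if ms <= b {
			counts[i]++
		}
	}
	counts[len(bounds)]++ // +Inf counts every observation
}

func main() {
	bounds := []int{1, 5, 10, 25, 50, 100, 250, 500, 1000, 2500, 5000, 10000}
	if err := validateBuckets(bounds); err != nil {
		panic(err)
	}
	counts := make([]uint64, len(bounds)+1)
	observe(bounds, counts, 7)     // counted in le=10 and every larger bucket
	observe(bounds, counts, 20000) // counted only in +Inf
	fmt.Println(counts)
}
```

After the two observations the `le=1` and `le=5` buckets stay at 0, `le=10` through `le=10000` hold 1, and `+Inf` holds 2.
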
### `ipng_stats on | off`

**Context:** `http`, `server`, `location`.

**Value:** boolean (`on` or `off`).

**Default:** `on` at the `http` level when the module is loaded.

**Effect:** opts a context into or out of counting. Cost of a disabled context is one branch in the log-phase handler. A location
serving the `ipng_stats` scrape handler is automatically excluded from counting regardless of this directive, so scraping the scrape
endpoint does not inflate its own counters.

See FR-5.5.

### `ipng_stats_logtail <format_name> udp://<host>:<port> [buffer=<size>] [flush=<duration>]`

**Context:** `http`.

**Value:** `<format_name>` is the name of an existing `log_format` defined earlier in the same `http` block. The destination MUST be a
`udp://host:port` URI. `buffer=<size>` is an optional nginx size spec (default `64k`, minimum `1k`). `flush=<duration>` is an optional
nginx duration string (default `1s`, minimum `100ms`).

**Default:** not set. The directive is optional; when absent, no global logtail output is written.

**Effect:** registers a global log-phase writer that fires unconditionally for every request, regardless of `server` or `location`
context. The named `log_format` is looked up from nginx's log module at configuration time; nginx's standard variable-expansion
machinery renders each line, so any variable usable in a regular `log_format` (including `$ipng_source_tag` and `$server_addr`) is
available here.

Each worker maintains a private in-memory write buffer of `buffer=<size>` bytes. Each buffer flush is transmitted as a single
`sendto()` call on a per-worker `SOCK_DGRAM` socket that is opened at worker init and closed at worker exit. The address is resolved
once at configuration time, so there is no DNS lookup at flush time. The buffer is flushed when:

- the buffer is full (immediate flush, no lines are dropped);
- the `flush=<duration>` timer fires (periodic flush); or
- the worker exits during a graceful reload or shutdown (final flush).

This covers all request traffic with a single directive at the `http` level, eliminating the need to repeat `access_log` in every
`server` block. It is particularly useful when the format includes `$ipng_source_tag` and `$server_addr`, giving per-device attribution
in every log line at no extra configuration cost.

File-based access logging is intentionally not supported by this directive; use nginx's built-in `access_log` directive for that.

```nginx
log_format logtail '$host\t$remote_addr\t$ipng_source_tag\t$server_addr\t'
                   '$request_method\t$request_uri\t$status\t$body_bytes_sent\t'
                   '$request_time';
ipng_stats_logtail logtail udp://127.0.0.1:9514 buffer=16k flush=1s;
```

**Constraints and behavior:**

- `host` MUST be a literal IPv4 address. Hostnames and IPv6 addresses are not supported in v0.1.
- Each flush emits a single UDP datagram. At the default `buffer=64k` size, datagram payloads comfortably fit within the ~64 KB
  loopback MTU. Operators using very large buffers on non-loopback paths should be aware of path MTU limits.
- If no receiver is listening, the kernel silently discards the datagram. The worker receives no error and is not blocked. This is
  intentional: the logtail is a fire-and-forget analytics transport, and zero disk I/O and no backpressure are the point.
- There is no acknowledgment, no retry, and no sequence number. Datagrams lost in transit or because the receiver is down are
  permanently lost.

**Receiver side:** any UDP server works. Two minimal examples follow: a quick inspection with netcat, and a Go receive loop.

```bash
# Quick inspection with netcat:
nc -u -l 127.0.0.1 9514
```

```go
// Go receive loop (place inside main; needs the "log" and "net" imports).
// process is a placeholder for your own handler.
conn, err := net.ListenPacket("udp", ":9514")
if err != nil {
	log.Fatal(err)
}
buf := make([]byte, 65536) // large enough for any single UDP datagram
for {
	n, _, err := conn.ReadFrom(buf)
	if err != nil {
		log.Fatal(err)
	}
	process(buf[:n])
}
```

See FR-8.1, FR-8.2, FR-8.3, FR-8.4.

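A receiver usually needs to split each datagram back into log lines and fields. The sketch below assumes the nine-field, tab-separated `logtail` format from the example configuration and assumes that lines within a datagram are separated by `\n`; neither the `LogLine` type nor `parseDatagram` exists in the module, and the sample values are made up.

```go
package main

import (
	"fmt"
	"strings"
)

// LogLine holds the nine tab-separated fields of the example logtail format,
// in the order they appear in the log_format string.
type LogLine struct {
	Host, RemoteAddr, SourceTag, ServerAddr     string
	Method, URI, Status, BodyBytes, RequestTime string
}

// parseDatagram splits one datagram into lines and each line into fields.
// Assumption: lines are newline-separated. Lines that do not have exactly
// nine fields (including the empty tail after a final newline) are skipped.
func parseDatagram(datagram []byte) []LogLine {
	var out []LogLine
	for _, line := range strings.Split(string(datagram), "\n") {
		f := strings.Split(line, "\t")
		if len(f) != 9 {
			continue
		}
		out = append(out, LogLine{f[0], f[1], f[2], f[3], f[4], f[5], f[6], f[7], f[8]})
	}
	return out
}

func main() {
	// Illustrative datagram holding a single log line.
	sample := []byte("example.org\t198.51.100.7\tmg1\t192.0.2.10\tGET\t/\t200\t512\t0.004\n")
	for _, l := range parseDatagram(sample) {
		fmt.Println(l.SourceTag, l.ServerAddr, l.Status)
	}
}
```

Skipping malformed lines rather than failing matches the fire-and-forget nature of the transport: a truncated or foreign datagram costs one line, not the receiver.
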
### `ipng_stats;` (scrape handler)

**Context:** `location`.

**Value:** no argument. Placed on its own line inside a `location` block.

**Default:** not set.

**Effect:** turns the enclosing location into the module's scrape handler. No other content handler (`proxy_pass`, `root`, `return`,
`fastcgi_pass`, ...) may be combined with `ipng_stats;` in the same location. The handler honors:

- `Accept:` header: `application/json` for JSON, anything else for Prometheus text.
- `?source_tag=<tag>`: filter output to only counters whose `source_tag` dimension equals the tag. Exact match, case-sensitive.
- `?vip=<address>`: filter output to only counters whose `vip` dimension equals the canonicalized address.

Filters MAY be combined; their effect is the intersection.

**Security:** the module does not ship authentication. Place an `allow`/`deny` ACL in the same `location` block (or its enclosing
`server`) to control access (NFR-6.2).

See FR-3.1, FR-3.2, FR-3.3, FR-3.4, FR-3.5.

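From the consumer side, the handler's inputs are just two query parameters and one header. The Go sketch below builds such a request with the standard library; the endpoint URL `http://127.0.0.1:8080/stats` is an assumption (the location path is whatever your configuration uses), and `newScrapeRequest` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
)

// newScrapeRequest builds a GET request for the scrape location. Empty
// sourceTag or vip means "no filter"; wantJSON selects JSON via the Accept
// header, otherwise the handler is expected to return Prometheus text.
func newScrapeRequest(endpoint, sourceTag, vip string, wantJSON bool) (*http.Request, error) {
	u, err := url.Parse(endpoint)
	if err != nil {
		return nil, err
	}
	q := u.Query()
	if sourceTag != "" {
		q.Set("source_tag", sourceTag)
	}
	if vip != "" {
		q.Set("vip", vip)
	}
	u.RawQuery = q.Encode()
	req, err := http.NewRequest(http.MethodGet, u.String(), nil)
	if err != nil {
		return nil, err
	}
	if wantJSON {
		req.Header.Set("Accept", "application/json")
	}
	return req, nil
}

func main() {
	// The endpoint below is illustrative only.
	req, err := newScrapeRequest("http://127.0.0.1:8080/stats", "mg1", "192.0.2.10", true)
	if err != nil {
		panic(err)
	}
	fmt.Println(req.URL.String(), req.Header.Get("Accept"))
}
```

Pass the returned request to `http.DefaultClient.Do` (or a client with a timeout) to perform the scrape.
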
## Metric names

For Prometheus, the module exports under the `nginx_ipng_` prefix.

| metric | type | labels | meaning |
| --- | --- | --- | --- |
| `nginx_ipng_requests_total` | counter | `source_tag`, `vip`, `code` | Request count per `(source, vip, status_code)`. |
| `nginx_ipng_bytes_in_total` | counter | `source_tag`, `vip`, `code` | Request bytes received (request line + headers + body). |
| `nginx_ipng_bytes_out_total` | counter | `source_tag`, `vip`, `code` | Response bytes sent (status line + headers + body). |
| `nginx_ipng_request_duration_seconds_bucket` | histogram bucket | `source_tag`, `vip`, `le` | Request duration histogram (Prometheus shape). |
| `nginx_ipng_request_duration_seconds_sum` | histogram sum | `source_tag`, `vip` | Sum of observed durations in seconds. |
| `nginx_ipng_request_duration_seconds_count` | histogram count | `source_tag`, `vip` | Count of observations. |
| `nginx_ipng_upstream_response_seconds_bucket` | histogram bucket | `source_tag`, `vip`, `le` | Upstream response time histogram. |
| `nginx_ipng_upstream_response_seconds_sum` | histogram sum | `source_tag`, `vip` | Sum of observed upstream response times in seconds. |
| `nginx_ipng_upstream_response_seconds_count` | histogram count | `source_tag`, `vip` | Count of upstream observations. |
| `nginx_ipng_rate_1s` | gauge | `source_tag`, `vip` | EWMA requests/sec, 1-second decay. |
| `nginx_ipng_rate_10s` | gauge | `source_tag`, `vip` | EWMA requests/sec, 10-second decay. |
| `nginx_ipng_rate_60s` | gauge | `source_tag`, `vip` | EWMA requests/sec, 60-second decay. |
| `nginx_ipng_zone_bytes_used` | gauge | (none) | Shared-memory zone bytes currently allocated. |
| `nginx_ipng_zone_bytes_total` | gauge | (none) | Shared-memory zone capacity in bytes. |
| `nginx_ipng_zone_full_events_total` | counter | (none) | Number of key insertions dropped because the zone was full. |
| `nginx_ipng_flushes_total` | counter | `worker` | Number of per-worker flush ticks executed. |
| `nginx_ipng_flush_duration_seconds` | histogram | `worker` | Histogram of flush durations. |
| `nginx_ipng_scrape_duration_seconds` | histogram | (none) | Histogram of scrape handler runtimes. |

See FR-2.*, FR-3.7.

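"Prometheus shape" for the `_bucket` series means the counts are cumulative: each `le` bucket includes everything in the smaller buckets. A consumer that wants the number of observations inside each individual bucket can subtract neighbors, as in this Go sketch. `perBucket` is a hypothetical helper and the input values are illustrative.

```go
package main

import "fmt"

// perBucket converts cumulative le counts (ordered by ascending upper bound)
// into the number of observations that fall inside each individual bucket.
func perBucket(cumulative []uint64) []uint64 {
	out := make([]uint64, len(cumulative))
	var prev uint64
	for i, c := range cumulative {
		out[i] = c - prev
		prev = c
	}
	return out
}

func main() {
	// Illustrative cumulative counts for le = 1, 5, 10, 25 ms.
	fmt.Println(perBucket([]uint64{10, 40, 120, 350})) // [10 30 80 230]
}
```
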
## JSON output shape

```json
{
  "schema": 1,
  "by_source": {
    "mg1": {
      "vips": {
        "192.0.2.10": {
          "rate_1s": 42.3,
          "rate_10s": 40.1,
          "rate_60s": 39.8,
          "codes": {
            "200": { "requests": 12345, "bytes_in": 9876543, "bytes_out": 54321098 },
            "404": { "requests": 17, "bytes_in": 2048, "bytes_out": 9216 }
          },
          "request_duration_ms": {
            "buckets": { "1": 10, "5": 40, "10": 120, "25": 350, "50": 870, "100": 2100,
                         "250": 3400, "500": 4000, "1000": 4100, "2500": 4120,
                         "5000": 4123, "10000": 4124, "+Inf": 4124 },
            "sum_ms": 87654,
            "count": 4124
          },
          "upstream_response_ms": { "...": "..." }
        }
      }
    }
  },
  "meta": {
    "zone_bytes_used": 131072,
    "zone_bytes_total": 4194304,
    "zone_full_events": 0
  }
}
```

The top-level `schema` field is versioned: breaking changes bump it, additive changes don't. Consumers SHOULD check `schema`
before parsing.

See FR-3.6.

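A consumer can enforce the `schema` check while decoding. The Go sketch below models only a subset of the fields shown above (rates, per-code counters) and rejects any schema other than 1; the type names and `decodeScrape` are hypothetical, and unknown fields are ignored so that additive changes keep working.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type codeStats struct {
	Requests uint64 `json:"requests"`
	BytesIn  uint64 `json:"bytes_in"`
	BytesOut uint64 `json:"bytes_out"`
}

type vipStats struct {
	Rate1s float64              `json:"rate_1s"`
	Codes  map[string]codeStats `json:"codes"`
}

type scrape struct {
	Schema   int `json:"schema"`
	BySource map[string]struct {
		VIPs map[string]vipStats `json:"vips"`
	} `json:"by_source"`
}

// decodeScrape parses a JSON scrape body and rejects schema versions other
// than 1, since a bumped schema signals a breaking change.
func decodeScrape(body []byte) (*scrape, error) {
	var s scrape
	if err := json.Unmarshal(body, &s); err != nil {
		return nil, err
	}
	if s.Schema != 1 {
		return nil, fmt.Errorf("unsupported schema %d", s.Schema)
	}
	return &s, nil
}

func main() {
	body := []byte(`{"schema":1,"by_source":{"mg1":{"vips":{"192.0.2.10":{"rate_1s":42.3,` +
		`"codes":{"200":{"requests":12345,"bytes_in":9876543,"bytes_out":54321098}}}}}}}`)
	s, err := decodeScrape(body)
	if err != nil {
		panic(err)
	}
	fmt.Println(s.BySource["mg1"].VIPs["192.0.2.10"].Codes["200"].Requests) // 12345
}
```
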
## Context summary

| knob | `http` | `server` | `location` | `listen` |
| --- | --- | --- | --- | --- |
| `ipng_stats_zone` | ✅ | - | - | - |
| `ipng_stats_flush_interval` | ✅ | - | - | - |
| `ipng_stats_default_source` | ✅ | - | - | - |
| `ipng_stats_buckets` | ✅ | - | - | - |
| `ipng_stats_logtail` | ✅ | - | - | - |
| `ipng_stats on\|off` | ✅ | ✅ | ✅ | - |
| `ipng_stats;` (handler) | - | - | ✅ | - |
| `device=<ifname>` | - | - | - | ✅ |
| `ipng_source_tag=<tag>` | - | - | - | ✅ |