<!-- SPDX-License-Identifier: Apache-2.0 -->
# nginx-ipng-stats-plugin — Configuration Reference

This document enumerates every directive and `listen` parameter introduced by `ngx_http_ipng_stats_module`, the nginx contexts in which
each is legal, the allowed values, and the default (NFR-7.2). For an end-to-end walkthrough read [`user-guide.md`](user-guide.md); for
the reasoning behind the design read [`design.md`](design.md).

## `listen` parameters

These extend the stock nginx `listen` directive. They are parsed by the module and stripped from `cf->args` before the original handler
is invoked, so they compose with every standard `listen` parameter (`ssl`, `http2`, `default_server`, `reuseport`, etc.).

### `device=<ifname>`

**Context:** `listen` directive (wherever `listen` itself is legal, typically inside `server { ... }`).

**Value:** a Linux interface name, e.g. `gre-mg1`, `eth0`. Maximum `IFNAMSIZ - 1` characters (15 on current kernels).

**Default:** not set (plain listen).

**Effect:** records a binding between `<ifname>` and the listen's source tag; the interface name is resolved to an ifindex at
init-module time. The module enables `IP_PKTINFO` / `IPV6_RECVPKTINFO` on every HTTP listening socket at init-module time, so the kernel
stashes the ingress `in_pktinfo` / `in6_pktinfo` for each connection during the TCP handshake. At request time the log handler reads it
back from the accepted socket (`getsockopt(IP_PKTOPTIONS)`) and attributes the request to the binding whose `(ifindex, address family)`
matches. Because bindings are keyed on `(device, address family)`, the same device can carry different tags for its IPv4 and IPv6
listens (e.g. `tag2-v4` / `tag2-v6`).

The listening socket itself stays a plain wildcard: no `SO_BINDTODEVICE`, no extra sockets. Replies, including the SYN-ACK, therefore
follow the normal routing table instead of being pinned to the ingress interface. This is what makes DSR / maglev deployments work, where
the SYN arrives through a GRE tunnel but the return path has to leave via the default route.

No additional Linux capability is required beyond what stock nginx already has (NFR-6.1).

See FR-1.1, FR-1.5, FR-1.6.
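
The request-time attribution step reduces to a lookup keyed on `(ifindex, address family)`. The Go sketch below models only that step, under stated assumptions: `bindingKey`, `resolveTag`, and the ifindex values are hypothetical, reading the pktinfo from the socket is out of scope, and falling back to a caller-supplied tag for unmatched connections is an assumption rather than documented module behavior.

```go
package main

import "fmt"

// family distinguishes IPv4 from IPv6 bindings (hypothetical type).
type family int

const (
	inet4 family = 4
	inet6 family = 6
)

// bindingKey mirrors the (ifindex, address family) pair that
// attribution matches on.
type bindingKey struct {
	ifindex int
	fam     family
}

// resolveTag returns the source tag for a connection whose ingress
// ifindex and family were recovered from the kernel's pktinfo data.
// ASSUMPTION: a connection matching no device binding gets fallback.
func resolveTag(bindings map[bindingKey]string, ifindex int, fam family, fallback string) string {
	if tag, ok := bindings[bindingKey{ifindex, fam}]; ok {
		return tag
	}
	return fallback
}

func main() {
	// Hypothetical ifindexes: 7 = gre-mg1, 8 = gre-mg2.
	bindings := map[bindingKey]string{
		{7, inet4}: "gre-mg1", // device= without a tag: defaults to the name
		{8, inet4}: "tag2-v4",
		{8, inet6}: "tag2-v6",
	}
	fmt.Println(resolveTag(bindings, 8, inet6, "direct")) // tag2-v6
	fmt.Println(resolveTag(bindings, 3, inet4, "direct")) // direct
}
```

Keying on the pair rather than on the ifindex alone is what lets one device carry separate IPv4 and IPv6 tags.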

### `ipng_source_tag=<tag>`

**Context:** `listen` directive.

**Value:** a short opaque string identifying the traffic source. No length limit is enforced, but keep it ≤ 32 characters
for readable metric output.

**Default:** when `ipng_source_tag=` is absent but `device=X` is set, the tag defaults to the interface name `X` (FR-1.4). When both
are absent, the tag defaults to the value of `ipng_stats_default_source` at the enclosing `http` level.

**Effect:** every counter recorded on this listener carries `source_tag=<tag>`, both as a Prometheus label and as the `source_tag` field
of each JSON record. Scrape consumers can pass `?source_tag=<tag>` to the scrape handler to restrict the response to the traffic they
delivered. Inside nginx config (e.g. in `log_format` or `map`), the tag is available as `$ipng_source_tag`; for the VIP address, use
nginx's built-in `$server_addr` variable.

See FR-1.2, FR-1.3, FR-1.4.
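
The defaulting rules above form a fixed three-step precedence. A minimal Go sketch of that order; `effectiveTag` is a hypothetical helper, not the module's code, and an empty string stands for an absent parameter.

```go
package main

import "fmt"

// effectiveTag applies the documented precedence for a listen's source
// tag: an explicit ipng_source_tag= wins, then the device= interface
// name, then the http-level ipng_stats_default_source value.
func effectiveTag(explicitTag, device, defaultSource string) string {
	switch {
	case explicitTag != "":
		return explicitTag
	case device != "":
		return device
	default:
		return defaultSource
	}
}

func main() {
	fmt.Println(effectiveTag("mg1", "gre-mg1", "direct")) // mg1
	fmt.Println(effectiveTag("", "gre-mg1", "direct"))    // gre-mg1
	fmt.Println(effectiveTag("", "", "direct"))           // direct
}
```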

## `http`-level directives

Plugin-wide settings live in the `http { ... }` block and cannot be overridden in inner contexts. Two directives documented in this
section are exceptions: `ipng_stats on | off` is also legal in `server` and `location`, and the `ipng_stats;` scrape handler is legal only
in `location` (see the context summary at the end).

### `ipng_stats_zone <name>:<size>`

**Context:** `http`.

**Value:** `<name>` is a string identifier for the shared-memory zone; `<size>` is an nginx size spec with a `k` or `m` suffix.

**Default:** none; the directive is mandatory if the module is loaded.

**Effect:** allocates a shared-memory zone of `<size>` bytes to hold the counter hash table. Keep both `<name>` and `<size>` stable across
`nginx -s reload`: nginx only reuses a shared zone across a reload when its name and size are unchanged, so renaming or resizing it
forces a fresh segment. That is the one situation where counters reset without a master restart.

**Sizing guidance:** the dominant factor in zone size is the number of `(source, vip)` pairs. Status codes are collapsed into at most six
class lanes per pair (`1xx` through `5xx` plus `unknown`; see [Metric names](#metric-names)), so a host serving 50 VIPs behind 4 source
interfaces tracks `4 × 50 = 200` pairs and at most `200 × 6 = 1200` class lanes. A `4m` zone comfortably fits that.

If the zone fills, the module drops new keys and increments `nginx_ipng_zone_full_events_total`. Resize the zone and reload; as described
above, this starts a fresh segment and resets the counters.

See FR-5.1, NFR-3.1.
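
Rather than guessing per-lane costs, one rough way to size a zone is to measure a running host and extrapolate. The Go sketch below is a heuristic under stated assumptions: `projectZoneBytes` and the readings are hypothetical, and it relies only on `nginx_ipng_zone_bytes_used` from the metric table plus a count of distinct `(source_tag, vip)` pairs in the same scrape.

```go
package main

import "fmt"

// projectZoneBytes extrapolates zone usage from a running host.
// bytesUsed is a reading of nginx_ipng_zone_bytes_used and pairs (> 0)
// is the number of distinct (source_tag, vip) pairs in the same scrape.
// Fixed zone overhead is folded into the per-pair cost, so projecting to
// more pairs than were observed errs on the high side.
func projectZoneBytes(bytesUsed, pairs, sources, vips int) int {
	perPair := (bytesUsed + pairs - 1) / pairs // round up
	return perPair * sources * vips
}

func main() {
	// Hypothetical readings: 20000 bytes used by 10 observed pairs.
	need := projectZoneBytes(20000, 10, 4, 50)
	zone := 4 << 20 // a 4m zone, as in the sizing example
	fmt.Println(need, need <= zone) // 400000 true
}
```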

### `ipng_stats_flush_interval <duration>`

**Context:** `http`.

**Value:** an nginx duration string, e.g. `500ms`, `1s`, `2s`.

**Default:** `1s`.

**Minimum:** `100ms`.

**Effect:** sets the cadence of the per-worker flush timer that moves private counter deltas into the shared-memory zone. Lower values
reduce the window of data loss if a worker crashes; higher values reduce the number of atomic adds on the shared zone. The default
is sized so that a scrape interval of 5–15 s sees effectively no lag.

See FR-4.2, FR-5.2.
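
A Go sketch of the private-delta pattern this interval controls, with deterministic stand-ins for nginx workers. `zoneSlot` and `worker` are hypothetical types, `sync/atomic` (Go 1.19+) plays the role of the atomic adds on the shared zone, and the sketch calls `flush` directly where the module uses its per-worker timer. Anything recorded since the last flush is what a crash would lose, and each flush costs one atomic add per touched counter.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// zoneSlot stands in for one counter in the shared-memory zone.
type zoneSlot struct{ requests atomic.Uint64 }

// worker accumulates a private delta that only this worker touches.
type worker struct {
	pending uint64
	slot    *zoneSlot
}

// record is the hot-path increment: a plain add, no atomics.
func (w *worker) record() { w.pending++ }

// flush publishes the accumulated delta with one atomic add and resets it.
func (w *worker) flush() {
	if w.pending == 0 {
		return
	}
	w.slot.requests.Add(w.pending)
	w.pending = 0
}

func main() {
	slot := &zoneSlot{}
	a, b := &worker{slot: slot}, &worker{slot: slot}
	for i := 0; i < 1000; i++ {
		a.record()
	}
	b.record()
	fmt.Println(slot.requests.Load()) // 0: nothing flushed yet
	a.flush()
	b.flush()
	fmt.Println(slot.requests.Load()) // 1001
}
```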

### `ipng_stats_default_source <tag>`

**Context:** `http`.

**Value:** a short string; see `ipng_source_tag=` above for conventions.

**Default:** `direct`.

**Effect:** sets the tag applied to listening sockets that have neither `device=` nor `ipng_source_tag=`. On a host serving a mix of
device-attributed and direct web traffic, the direct traffic appears under this tag in the scrape output. Rename it to `public`,
`localnet`, or anything else that reads better for your deployment.

See FR-1.3, FR-5.3.

### `ipng_stats_buckets <ms> <ms> <ms> ...`

**Context:** `http`.

**Value:** two or more positive integers, strictly increasing, representing histogram bucket upper bounds in milliseconds.

**Default:** `1 5 10 25 50 100 250 500 1000 2500 5000 10000`, plus an implicit `+Inf` bucket.

**Effect:** overrides the default bucket boundaries for both the request-duration and upstream-response histograms
(`nginx_ipng_request_duration_seconds` and `nginx_ipng_upstream_response_seconds`; `request_duration_ms` and `upstream_response_ms` in
JSON). The same set applies to every `(source, vip)` key in the module (v0.1 does not support per-key override; see
[`design.md`](design.md#decisions-deferred-post-v01)).

See FR-2.3, FR-5.4.
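
A Go sketch of the validation rule above, plus how one observation lands in Prometheus-style cumulative `le` buckets. `validateBuckets` and `observe` are hypothetical names, and the module's internal storage layout may differ from these cumulative counts.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// validateBuckets enforces the documented rule: two or more positive
// integers, strictly increasing.
func validateBuckets(bounds []int) error {
	if len(bounds) < 2 {
		return errors.New("need at least two bucket bounds")
	}
	for i, b := range bounds {
		if b <= 0 {
			return fmt.Errorf("bound %d is not positive", b)
		}
		if i > 0 && b <= bounds[i-1] {
			return fmt.Errorf("bound %d does not exceed %d", b, bounds[i-1])
		}
	}
	return nil
}

// observe adds one observation (in ms) to cumulative counts: counts[i]
// covers "le bounds[i]" and counts[len(bounds)] is the +Inf bucket.
func observe(counts []uint64, bounds []int, ms int) {
	first := sort.SearchInts(bounds, ms) // index of first bound >= ms
	for i := first; i < len(counts); i++ {
		counts[i]++
	}
}

func main() {
	bounds := []int{1, 5, 10, 25, 50, 100, 250, 500, 1000, 2500, 5000, 10000}
	if err := validateBuckets(bounds); err != nil {
		panic(err)
	}
	counts := make([]uint64, len(bounds)+1) // +1 for the implicit +Inf bucket
	for _, ms := range []int{3, 5, 42, 20000} {
		observe(counts, bounds, ms)
	}
	fmt.Println(counts[1], counts[4], counts[len(bounds)]) // 2 3 4 (le=5, le=50, +Inf)
	fmt.Println(validateBuckets([]int{10, 5}))              // bound 5 does not exceed 10
}
```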

### `ipng_stats_byte_buckets <size> <size> ...`

**Context:** `http`.

**Value:** two or more strictly increasing sizes (nginx size spec: `100`, `1k`, `1m`, ...) representing byte-size histogram upper
bounds. In nginx size syntax the suffixes are binary: `1k` is 1024 bytes and `1m` is 1048576 bytes.

**Default:** `100 1000 10000 100000 1000000 10000000`, plus an implicit `+Inf` bucket.

**Effect:** overrides the default bucket boundaries for the `nginx_ipng_bytes_in` and `nginx_ipng_bytes_out` histograms. Pick values
that match your traffic mix. These bucket bounds only shape the histogram output; the per-`(source, vip, class)` byte counters
(`nginx_ipng_bytes_in_total`, `nginx_ipng_bytes_out_total`) are exact regardless.

See FR-2.3.
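
Because the default bounds are powers of ten, rewriting them with suffixes would shift them (`1k` is 1024, not 1000). A Go sketch of nginx-style size parsing restricted to the forms this document uses; `parseSize` is hypothetical and mirrors nginx's binary-suffix semantics, not the module's actual parser.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseSize mimics nginx's size syntax for the forms used in this
// document: a plain byte count, or a number with a k/K (x1024) or
// m/M (x1024*1024) suffix.
func parseSize(in string) (int64, error) {
	digits, scale := in, int64(1)
	switch {
	case strings.HasSuffix(in, "k"), strings.HasSuffix(in, "K"):
		digits, scale = in[:len(in)-1], 1024
	case strings.HasSuffix(in, "m"), strings.HasSuffix(in, "M"):
		digits, scale = in[:len(in)-1], 1024*1024
	}
	n, err := strconv.ParseInt(digits, 10, 64)
	if err != nil || n < 0 {
		return 0, fmt.Errorf("invalid size %q", in)
	}
	return n * scale, nil
}

func main() {
	for _, s := range []string{"1000", "1k", "1m"} {
		n, err := parseSize(s)
		if err != nil {
			panic(err)
		}
		fmt.Println(s, "=", n, "bytes") // 1000, 1024, 1048576
	}
}
```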

### `ipng_stats on | off`

**Context:** `http`, `server`, `location`.

**Value:** boolean (`on` or `off`).

**Default:** `on` at the `http` level when the module is loaded.

**Effect:** opts a context into or out of counting. A disabled context costs one branch in the log-phase handler. A location
serving the `ipng_stats;` scrape handler is automatically excluded from counting regardless of this directive, so scraping the scrape
endpoint does not inflate its own counters.

See FR-5.5.
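
A sketch of how the effective setting could be resolved, assuming the usual nginx merge behavior in which an inner context inherits the enclosing value unless it sets its own. `countingEnabled` is a hypothetical helper, and `nil` stands for a context where the directive is absent.

```go
package main

import "fmt"

// countingEnabled resolves `ipng_stats on | off` for one request. The
// innermost explicit setting wins and the http-level default is on.
// Scrape-handler locations are never counted, whatever the directive says.
func countingEnabled(location, server, http *bool, isScrapeLocation bool) bool {
	if isScrapeLocation {
		return false
	}
	for _, setting := range []*bool{location, server, http} {
		if setting != nil {
			return *setting
		}
	}
	return true
}

func main() {
	on, off := true, false
	fmt.Println(countingEnabled(nil, &off, nil, false)) // false: server turns it off
	fmt.Println(countingEnabled(&on, &off, nil, false)) // true: location re-enables
	fmt.Println(countingEnabled(nil, nil, nil, true))   // false: scrape location
}
```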

### `ipng_stats_logtail <format_name> udp://<host>:<port> [buffer=<size>] [flush=<duration>] [if=<$variable>]`

**Context:** `http`.

**Value:** `<format_name>` is the name of an existing `log_format` defined earlier in the same `http` block. The destination MUST be a
`udp://host:port` URI. `buffer=<size>` is an optional nginx size spec (default `64k`, minimum `1k`). `flush=<duration>` is an optional
nginx duration string (default `1s`, minimum `100ms`). `if=<$variable>` is an optional condition variable: when set, the log line is
emitted only if the variable evaluates to a non-empty value other than `"0"`.

**Default:** not set; the directive is optional. When absent, no global logtail output is written.

**Effect:** registers a global log-phase writer that fires for every request (unless suppressed by `if=`), regardless
of `server` or `location` context. The named `log_format` is looked up from nginx's log module at configuration time; nginx's standard
variable-expansion machinery renders each line, so any variable usable in a regular `log_format`, including `$ipng_source_tag` and
`$server_addr`, is available here.

Each worker maintains a private in-memory write buffer of `buffer=<size>` bytes. Each buffer flush is transmitted as a single
`sendto()` call on a per-worker `SOCK_DGRAM` socket that is opened at worker init and closed at worker exit. The destination address
is parsed once at configuration time (it must be a literal IPv4 address; see the constraints below), so no name resolution happens at
flush time. The buffer is flushed when:

- the buffer is full (it is flushed immediately, so no lines are dropped);
- the `flush=<duration>` timer fires (periodic flush); or
- the worker exits during a graceful reload or shutdown (final flush).
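
The flush rules above amount to a small batching buffer. A Go sketch, assuming a `send` callback in place of the per-worker `sendto()`; the `logBuffer` type and its methods are hypothetical, and how the module treats a single line longer than the whole buffer is not documented here (this sketch just sends it on its own).

```go
package main

import "fmt"

// logBuffer batches rendered log lines and hands each batch to send as
// one datagram. send must not retain the slice: the buffer is reused.
type logBuffer struct {
	buf  []byte
	max  int
	send func([]byte)
}

// add appends one line, flushing first if the line would not fit, so
// no line is dropped or split across datagrams.
func (b *logBuffer) add(line string) {
	if len(b.buf)+len(line) > b.max {
		b.flush()
	}
	b.buf = append(b.buf, line...)
}

// flush sends the pending batch, if any. Besides the full-buffer case,
// the flush= timer and worker exit would call this.
func (b *logBuffer) flush() {
	if len(b.buf) == 0 {
		return
	}
	b.send(b.buf)
	b.buf = b.buf[:0]
}

func main() {
	var datagrams []string
	b := &logBuffer{max: 24, send: func(p []byte) {
		datagrams = append(datagrams, string(p)) // copy out before reuse
	}}
	for _, line := range []string{"GET /a 200\n", "GET /b 404\n", "GET /c 200\n"} {
		b.add(line) // each line is 11 bytes; two fit in 24
	}
	b.flush() // e.g. the flush= timer firing
	fmt.Println(len(datagrams)) // 2
	fmt.Printf("%q\n", datagrams)
}
```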

This covers all request traffic with a single directive at the `http` level, eliminating the need to repeat `access_log` in every
`server` block. It is particularly useful when the format includes `$ipng_source_tag` and `$server_addr`, giving per-device attribution
in every log line at no extra configuration cost.

File-based access logging is intentionally not supported by this directive; use nginx's built-in `access_log` directive for that.

```nginx
log_format ipng_stats_logtail '$host\t$remote_addr\t$request_method\t$request_uri\t'
                              '$status\t$body_bytes_sent\t'
                              '$ipng_source_tag\t$server_addr\t$scheme';

ipng_stats_logtail ipng_stats_logtail udp://127.0.0.1:9514 buffer=16k flush=1s;
```

#### Conditional logging with `if=`

The `if=$variable` parameter suppresses log lines for requests where the variable is empty, not found, or `"0"`. This uses the same
semantics as nginx's built-in `access_log ... if=` and works well with `map` blocks:

```nginx
# Suppress health checks from the logtail stream:
map $request_uri $logtail_enabled {
    ~^/\.well-known/ipng/healthz  0;
    default                       1;
}

ipng_stats_logtail ipng_stats_logtail udp://127.0.0.1:9514 if=$logtail_enabled;
```

The `map` is compiled at configuration time and evaluated lazily, only when the variable is read. Exact-string keys live in a hash
table and cost a single hash probe; regex keys such as the `~^/...` pattern above are tried in order when no exact key matches. The
condition is checked before the log format is rendered, so filtered requests skip the format rendering entirely.
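
The `if=` rule itself is a one-line predicate. A Go sketch; `shouldLog` is hypothetical, and `found` models whether nginx could evaluate the variable at all.

```go
package main

import "fmt"

// shouldLog mirrors the documented if= rule: suppress when the variable
// is not found, empty, or "0"; log for any other value.
func shouldLog(value string, found bool) bool {
	return found && value != "" && value != "0"
}

func main() {
	fmt.Println(shouldLog("1", true))  // true
	fmt.Println(shouldLog("0", true))  // false: e.g. the healthz map entry
	fmt.Println(shouldLog("", true))   // false
	fmt.Println(shouldLog("", false))  // false: variable not found
}
```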

**Constraints and behavior:**

- `host` MUST be a literal IPv4 address. Hostnames and IPv6 addresses are not supported in v0.1.
- Each flush emits a single UDP datagram. One IPv4 UDP datagram carries at most 65,507 bytes of payload (65,535 minus the 20-byte IP
header and the 8-byte UDP header), while `64k` in nginx size syntax is 65,536 bytes, so a default buffer that fills past 65,507 bytes
does not fit in one datagram. On non-loopback paths, datagrams larger than the path MTU are IP-fragmented, and losing any one fragment
loses the whole datagram; size `buffer=` with that in mind.
- If no receiver is listening, the kernel silently discards the datagram. The worker receives no error and is not blocked. This is
intentional: the logtail is a fire-and-forget analytics transport, and zero disk I/O with no backpressure is the point.
- There is no acknowledgment, no retry, and no sequence number. Datagrams lost in transit or because the receiver is down are
permanently lost.
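
A Go sketch of the destination rules above, plus the datagram-size arithmetic from the second bullet. `checkLogtail` and `fullBufferFits` are hypothetical helpers: the first is not the module's configuration parser, and the second only does arithmetic; it says nothing about how the module handles a flush larger than one datagram.

```go
package main

import (
	"fmt"
	"net/netip"
	"net/url"
	"strconv"
)

// maxIPv4UDPPayload is 65,535 minus the 20-byte IPv4 header and the
// 8-byte UDP header.
const maxIPv4UDPPayload = 65535 - 20 - 8

// checkLogtail validates udp://host:port with a literal IPv4 host.
func checkLogtail(dest string) (netip.AddrPort, error) {
	u, err := url.Parse(dest)
	if err != nil || u.Scheme != "udp" {
		return netip.AddrPort{}, fmt.Errorf("%q: want udp://host:port", dest)
	}
	addr, err := netip.ParseAddr(u.Hostname())
	if err != nil || !addr.Is4() { // rejects hostnames and IPv6
		return netip.AddrPort{}, fmt.Errorf("%q: host must be a literal IPv4 address", dest)
	}
	port, err := strconv.ParseUint(u.Port(), 10, 16)
	if err != nil || port == 0 {
		return netip.AddrPort{}, fmt.Errorf("%q: invalid port", dest)
	}
	return netip.AddrPortFrom(addr, uint16(port)), nil
}

// fullBufferFits reports whether a completely full buffer of n bytes
// would fit in one IPv4 UDP datagram.
func fullBufferFits(n int) bool { return n <= maxIPv4UDPPayload }

func main() {
	for _, d := range []string{"udp://127.0.0.1:9514", "udp://localhost:9514", "udp://[::1]:9514"} {
		ap, err := checkLogtail(d)
		fmt.Println(ap, err)
	}
	fmt.Println(fullBufferFits(16*1024), fullBufferFits(64*1024)) // true false
}
```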

**Receiver side:** any UDP server works. Two minimal examples, a netcat one-liner and a Go program:

```bash
# Quick inspection with netcat:
nc -u -l 127.0.0.1 9514
```

```go
package main

import (
	"log"
	"net"
	"os"
)

func main() {
	conn, err := net.ListenPacket("udp", "127.0.0.1:9514")
	if err != nil {
		log.Fatal(err)
	}
	buf := make([]byte, 65536) // large enough for any single UDP datagram
	for {
		n, _, err := conn.ReadFrom(buf)
		if err != nil {
			log.Fatal(err)
		}
		os.Stdout.Write(buf[:n]) // one flushed batch of log lines; replace with real processing
	}
}
```

See FR-8.1, FR-8.2, FR-8.3, FR-8.4.

### `ipng_stats;` (scrape handler)

**Context:** `location`.

**Value:** no argument. Placed on its own line inside a `location` block.

**Default:** not set.

**Effect:** turns the enclosing location into the module's scrape handler. No other content handler (`proxy_pass`, `root`, `return`,
`fastcgi_pass`, ...) may be combined with `ipng_stats;` in the same location. The handler honors:

- `Accept:` header: `application/json` selects JSON; anything else gets Prometheus text.
- `?source_tag=<tag>`: restricts output to counters whose `source_tag` dimension equals the tag (exact, case-sensitive match).
- `?vip=<address>`: restricts output to counters whose `vip` dimension equals the given address after canonicalization.

Filters MAY be combined; their effect is the intersection.

**Security:** the module does not ship authentication. Place an `allow`/`deny` ACL in the same `location` block (or its enclosing
`server`) to control access (NFR-6.2).

See FR-3.1, FR-3.2, FR-3.3, FR-3.4, FR-3.5.

## Metric names

Prometheus metrics are exported under the `nginx_ipng_` prefix.

The `code` label is a class bucket: one of `1xx`, `2xx`, `3xx`, `4xx`, `5xx`, or `unknown` (for codes outside `[100, 599]`). This
keeps per-`(source, vip)` counter cardinality bounded at six lanes regardless of how many distinct three-digit responses nginx serves.
Histogram series do not carry `code`; they aggregate across all classes for a given `(source, vip)`. Operators who need a full
per-three-digit-code breakdown should enable `ipng_stats_logtail` and derive it from the access-log stream off the hot path.

| metric | type | labels | meaning |
| --- | --- | --- | --- |
| `nginx_ipng_requests_total` | counter | `source_tag`, `vip`, `code` | Request count per `(source, vip, class)`. |
| `nginx_ipng_bytes_in_total` | counter | `source_tag`, `vip`, `code` | Request bytes received (request line + headers + body). |
| `nginx_ipng_bytes_out_total` | counter | `source_tag`, `vip`, `code` | Response bytes sent (status line + headers + body). |
| `nginx_ipng_latency_total` | counter | `source_tag`, `vip`, `code` | Sum of request durations, in seconds. Divide by `nginx_ipng_requests_total` for mean latency per class. |
| `nginx_ipng_request_duration_seconds_bucket` | histogram bucket | `source_tag`, `vip`, `le` | Request duration histogram, aggregated across classes. |
| `nginx_ipng_request_duration_seconds_sum` | histogram sum | `source_tag`, `vip` | Sum of observed durations in seconds. |
| `nginx_ipng_request_duration_seconds_count` | histogram count | `source_tag`, `vip` | Count of observations. |
| `nginx_ipng_upstream_response_seconds_bucket` | histogram bucket | `source_tag`, `vip`, `le` | Upstream response time histogram. |
| `nginx_ipng_upstream_response_seconds_sum` | histogram sum | `source_tag`, `vip` | Sum of observed upstream response times in seconds. |
| `nginx_ipng_upstream_response_seconds_count` | histogram count | `source_tag`, `vip` | Count of observations. |
| `nginx_ipng_bytes_in_bucket` | histogram bucket | `source_tag`, `vip`, `le` | Request-size histogram (bytes). |
| `nginx_ipng_bytes_in_sum` | histogram sum | `source_tag`, `vip` | Sum of request bytes (equals `nginx_ipng_bytes_in_total` summed over classes). |
| `nginx_ipng_bytes_in_count` | histogram count | `source_tag`, `vip` | Count of observations. |
| `nginx_ipng_bytes_out_bucket` | histogram bucket | `source_tag`, `vip`, `le` | Response-size histogram (bytes). |
| `nginx_ipng_bytes_out_sum` | histogram sum | `source_tag`, `vip` | Sum of response bytes. |
| `nginx_ipng_bytes_out_count` | histogram count | `source_tag`, `vip` | Count of observations. |
| `nginx_ipng_rate_1s` | gauge | `source_tag`, `vip` | EWMA requests/sec, 1-second decay. |
| `nginx_ipng_rate_10s` | gauge | `source_tag`, `vip` | EWMA requests/sec, 10-second decay. |
| `nginx_ipng_rate_60s` | gauge | `source_tag`, `vip` | EWMA requests/sec, 60-second decay. |
| `nginx_ipng_zone_bytes_used` | gauge | — | Shared-memory zone bytes currently allocated. |
| `nginx_ipng_zone_bytes_total` | gauge | — | Shared-memory zone capacity in bytes. |
| `nginx_ipng_zone_full_events_total` | counter | — | Number of key insertions dropped because the zone was full. |
| `nginx_ipng_flushes_total` | counter | `worker` | Number of per-worker flush ticks executed. |
| `nginx_ipng_flush_duration_seconds` | histogram | `worker` | Histogram of flush durations. |
| `nginx_ipng_scrape_duration_seconds` | histogram | — | Histogram of scrape handler runtimes. |

See FR-2.*, FR-3.7.
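
The class mapping and the mean-latency division suggested by the table, as a Go sketch. `statusClass` and `meanLatency` are hypothetical helpers that mirror the documented rules; they are not exported by the module.

```go
package main

import "fmt"

// statusClass maps an HTTP status code to the documented `code` label.
func statusClass(code int) string {
	if code < 100 || code > 599 {
		return "unknown"
	}
	return fmt.Sprintf("%dxx", code/100)
}

// meanLatency divides latency_total by requests_total for one class,
// returning 0 when there were no requests.
func meanLatency(latencySeconds float64, requests uint64) float64 {
	if requests == 0 {
		return 0
	}
	return latencySeconds / float64(requests)
}

func main() {
	for _, c := range []int{101, 204, 301, 404, 599, 600, 0} {
		fmt.Print(statusClass(c), " ") // 1xx 2xx 3xx 4xx 5xx unknown unknown
	}
	fmt.Println()
	fmt.Println(meanLatency(87.654, 12345)) // seconds per request
}
```

With the 2xx numbers from the JSON example (87,654 ms over 12,345 requests) the mean works out to about 7.1 ms. In PromQL the same ratio is usually taken over `rate()` of both counters so that it reflects a recent window rather than the process lifetime.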

## JSON output shape

```json
{
  "schema": 2,
  "records": [
    {
      "source_tag": "mg1",
      "vip": "192.0.2.10",
      "classes": {
        "2xx": { "requests": 12345, "bytes_in": 9876543, "bytes_out": 54321098,
                 "latency_ms": 87654, "upstream_latency_ms": 61234 },
        "4xx": { "requests": 17, "bytes_in": 2048, "bytes_out": 9216,
                 "latency_ms": 102, "upstream_latency_ms": 0 }
      },
      "request_duration_ms": {
        "sum": 87756, "count": 12362,
        "buckets": { "1": 10, "5": 40, "10": 120, "+Inf": 12362 }
      },
      "upstream_response_ms": { "sum": 61234, "count": 12345, "buckets": { "...": "..." } },
      "bytes_in": { "count": 12362, "buckets": { "100": 200, "1000": 9000, "+Inf": 12362 } },
      "bytes_out": { "count": 12362, "buckets": { "...": "..." } }
    }
  ]
}
```

The top-level `schema` field is a version number: breaking changes bump it; additive changes do not. Schema `2` collapses status codes
to class buckets and moves histograms out of the per-class records to a per-`(source, vip)` record. Consumers SHOULD check `schema`
before parsing.

See FR-3.6.
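
A Go consumer sketch that honors the SHOULD above by checking `schema` before decoding the rest. The struct and function names are hypothetical, and only the per-class counters are modeled; `encoding/json` simply ignores the histogram fields.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// classStats is one entry of a record's "classes" object.
type classStats struct {
	Requests          uint64 `json:"requests"`
	BytesIn           uint64 `json:"bytes_in"`
	BytesOut          uint64 `json:"bytes_out"`
	LatencyMs         uint64 `json:"latency_ms"`
	UpstreamLatencyMs uint64 `json:"upstream_latency_ms"`
}

type record struct {
	SourceTag string                `json:"source_tag"`
	VIP       string                `json:"vip"`
	Classes   map[string]classStats `json:"classes"`
}

// decode reads only the schema first, so an unknown layout is rejected
// before any attempt to decode it.
func decode(body []byte) ([]record, error) {
	var probe struct {
		Schema int `json:"schema"`
	}
	if err := json.Unmarshal(body, &probe); err != nil {
		return nil, err
	}
	if probe.Schema != 2 {
		return nil, fmt.Errorf("unsupported schema %d", probe.Schema)
	}
	var s struct {
		Records []record `json:"records"`
	}
	if err := json.Unmarshal(body, &s); err != nil {
		return nil, err
	}
	return s.Records, nil
}

func main() {
	body := []byte(`{"schema": 2, "records": [{"source_tag": "mg1", "vip": "192.0.2.10",
		"classes": {"2xx": {"requests": 12345}, "4xx": {"requests": 17}}}]}`)
	recs, err := decode(body)
	if err != nil {
		panic(err)
	}
	var total uint64
	for _, c := range recs[0].Classes {
		total += c.Requests
	}
	fmt.Println(recs[0].SourceTag, total) // mg1 12362
}
```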

## Context summary

| knob | `http` | `server` | `location` | `listen` |
| --- | --- | --- | --- | --- |
| `ipng_stats_zone` | ✅ | — | — | — |
| `ipng_stats_flush_interval` | ✅ | — | — | — |
| `ipng_stats_default_source` | ✅ | — | — | — |
| `ipng_stats_buckets` | ✅ | — | — | — |
| `ipng_stats_byte_buckets` | ✅ | — | — | — |
| `ipng_stats_logtail` | ✅ | — | — | — |
| `ipng_stats on\|off` | ✅ | ✅ | ✅ | — |
| `ipng_stats;` (handler) | — | — | ✅ | — |
| `device=<ifname>` | — | — | — | ✅ |
| `ipng_source_tag=<tag>` | — | — | — | ✅ |