Reduce scrape cardinality: class codes, per-(source,vip) histograms, byte histograms

Collapses the status-code dimension of the counter key into six class
lanes (1xx..5xx/unknown) so per-(source,vip) counter cardinality no
longer grows with the number of distinct three-digit responses nginx
serves. Histogram series drop the code label entirely and aggregate
across classes. Adds nginx_ipng_latency_total with a code class label
so average latency per class can still be computed off the scrape.
Adds nginx_ipng_bytes_{in,out} histograms with configurable boundaries
via the new ipng_stats_byte_buckets directive. Bumps JSON schema to 2.

Operators who need full three-digit-code resolution should consume the
ipng_stats_logtail stream off-host; the stats zone intentionally trades
that resolution for a bounded scrape size.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 14:36:16 +02:00
parent 87050bcf13
commit b3ad74cbde
5 changed files with 817 additions and 312 deletions


@@ -108,6 +108,21 @@ same set applies to every `(source, vip)` key in the module (v0.1 does not suppo
See FR-2.3, FR-5.4.
### `ipng_stats_byte_buckets <size> <size> ...`
**Context:** `http`.
**Value:** two or more strictly increasing sizes (nginx size spec: `100`, `1k`, `1m`, ...) representing byte-size histogram upper
bounds.
**Default:** `100 1000 10000 100000 1000000 10000000`, plus an implicit `+Inf` bucket.
**Effect:** overrides the default bucket boundaries for the `nginx_ipng_bytes_in` and `nginx_ipng_bytes_out` histograms. Pick values
that match your traffic mix — these bucket bounds feed the scrape output only, not the per-`(source, vip, class)` byte counters, which
are exact.
See FR-2.3.
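
For illustration, the histogram directives from this section might sit together like this in an `http {}` block. The sizes are examples, and the zone name `ipng` follows the `ipng_stats_zone ipng:<size>` form used elsewhere in this document:

```nginx
http {
    # Shared-memory zone for the counters (name:size).
    ipng_stats_zone ipng:4m;

    # Duration histogram bounds, in milliseconds (these are the defaults).
    ipng_stats_buckets 1 5 10 25 50 100 250 500 1000 2500 5000 10000;

    # Byte-size histogram bounds for the bytes_in/bytes_out histograms;
    # nginx size suffixes (k, m) are accepted. These are the defaults.
    ipng_stats_byte_buckets 100 1k 10k 100k 1m 10m;
}
```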
### `ipng_stats on | off`
**Context:** `http`, `server`, `location`.
@@ -231,17 +246,29 @@ See FR-3.1, FR-3.2, FR-3.3, FR-3.4, FR-3.5.
For Prometheus, the module exports under the `nginx_ipng_` prefix.
The `code` label is a class bucket — one of `1xx`, `2xx`, `3xx`, `4xx`, `5xx`, or `unknown` (for codes outside `[100, 599]`). This
keeps per-`(source, vip)` counter cardinality bounded at six lanes regardless of how many distinct three-digit responses nginx serves.
Histogram series do not carry `code` — they aggregate across all classes for a given `(source, vip)`. Operators who need a full
per-three-digit-code breakdown should enable `ipng_stats_logtail` and derive it from the access-log stream off the hot path.
| metric | type | labels | meaning |
| --- | --- | --- | --- |
| `nginx_ipng_requests_total` | counter | `source_tag`, `vip`, `code` | Request count per `(source, vip, class)`. |
| `nginx_ipng_bytes_in_total` | counter | `source_tag`, `vip`, `code` | Request bytes received (request line + headers + body). |
| `nginx_ipng_bytes_out_total` | counter | `source_tag`, `vip`, `code` | Response bytes sent (status line + headers + body). |
| `nginx_ipng_latency_total` | counter | `source_tag`, `vip`, `code` | Sum of request durations, in seconds. Divide by `_requests_total` for mean latency per class. |
| `nginx_ipng_request_duration_seconds_bucket` | histogram bucket | `source_tag`, `vip`, `le` | Request duration histogram, aggregated across classes. |
| `nginx_ipng_request_duration_seconds_sum` | histogram sum | `source_tag`, `vip` | Sum of observed durations in seconds. |
| `nginx_ipng_request_duration_seconds_count` | histogram count | `source_tag`, `vip` | Count of observations. |
| `nginx_ipng_upstream_response_seconds_bucket` | histogram bucket | `source_tag`, `vip`, `le` | Upstream response time histogram. |
| `nginx_ipng_upstream_response_seconds_sum` | histogram sum | `source_tag`, `vip` | Sum of observed upstream response times, in seconds. |
| `nginx_ipng_upstream_response_seconds_count` | histogram count | `source_tag`, `vip` | Count of observations. |
| `nginx_ipng_bytes_in_bucket` | histogram bucket | `source_tag`, `vip`, `le` | Request-size histogram (bytes). |
| `nginx_ipng_bytes_in_sum` | histogram sum | `source_tag`, `vip` | Sum of request bytes (equals `bytes_in_total` summed over classes). |
| `nginx_ipng_bytes_in_count` | histogram count | `source_tag`, `vip` | Observations. |
| `nginx_ipng_bytes_out_bucket` | histogram bucket | `source_tag`, `vip`, `le` | Response-size histogram (bytes). |
| `nginx_ipng_bytes_out_sum` | histogram sum | `source_tag`, `vip` | Sum of response bytes. |
| `nginx_ipng_bytes_out_count` | histogram count | `source_tag`, `vip` | Observations. |
| `nginx_ipng_rate_1s` | gauge | `source_tag`, `vip` | EWMA requests/sec, 1-second decay. |
| `nginx_ipng_rate_10s` | gauge | `source_tag`, `vip` | EWMA requests/sec, 10-second decay. |
| `nginx_ipng_rate_60s` | gauge | `source_tag`, `vip` | EWMA requests/sec, 60-second decay. |
@@ -258,39 +285,31 @@ See FR-2.*, FR-3.7.
```json
{
  "schema": 2,
  "records": [
    {
      "source_tag": "mg1",
      "vip": "192.0.2.10",
      "classes": {
        "2xx": { "requests": 12345, "bytes_in": 9876543, "bytes_out": 54321098,
                 "latency_ms": 87654, "upstream_latency_ms": 61234 },
        "4xx": { "requests": 17, "bytes_in": 2048, "bytes_out": 9216,
                 "latency_ms": 102, "upstream_latency_ms": 0 }
      },
      "request_duration_ms": {
        "sum": 87756, "count": 12362,
        "buckets": { "1": 10, "5": 40, "10": 120, "+Inf": 12362 }
      },
      "upstream_response_ms": { "sum": 61234, "count": 12345, "buckets": { "...": "..." } },
      "bytes_in": { "count": 12362, "buckets": { "100": 200, "1000": 9000, "+Inf": 12362 } },
      "bytes_out": { "count": 12362, "buckets": { "...": "..." } }
    }
  ]
}
```
The top-level `schema` field is versioned — breaking changes bump it, additive changes don't. Schema `2` collapses status codes to
class buckets and moves histograms out of the per-class records to a per-`(source, vip)` record. Consumers SHOULD check `schema`
before parsing.
See FR-3.6.
@@ -303,6 +322,7 @@ See FR-3.6.
| `ipng_stats_flush_interval` | ✅ | — | — | — |
| `ipng_stats_default_source` | ✅ | — | — | — |
| `ipng_stats_buckets` | ✅ | — | — | — |
| `ipng_stats_byte_buckets` | ✅ | — | — | — |
| `ipng_stats_logtail` | ✅ | — | — | — |
| `ipng_stats on\|off` | ✅ | ✅ | ✅ | — |
| `ipng_stats;` (handler) | — | — | ✅ | — |


@@ -90,14 +90,18 @@ Each requirement carries a unique identifier (`FR-X.Y` or `NFR-X.Y`) so that lat
**FR-2 Counters**
- **FR-2.1** The module MUST maintain, for every observed `(source, vip, status_class)` tuple, the following counters: total requests,
total bytes received (sum of request bytes including request line, headers, and body), total bytes sent (sum of response bytes
including status line, headers, and body), and sum of request durations in milliseconds (exported as `nginx_ipng_latency_total`).
The module MUST additionally maintain, per `(source, vip)` pair (no `code` label), fixed-bucket histograms of request duration in
milliseconds and of request/response sizes in bytes.
- **FR-2.2** When an upstream is used to serve the request, the module MUST additionally maintain a fixed-bucket histogram of upstream
response time in milliseconds, keyed by the same `(source, vip)` pair.
- **FR-2.3** The duration histogram bucket boundaries MUST be fixed at module initialization and MUST be the same for every `(source,
vip)` key. The default boundaries are `{1, 5, 10, 25, 50, 100, 250, 500, 1000, 2500, 5000, 10000}` milliseconds plus an implicit
`+Inf` bucket. Operators MAY override the boundaries via the `ipng_stats_buckets` directive at the `http` level. The byte-size
histograms (request and response bodies) use independent bounds defaulting to `{100, 1000, 10000, 100000, 1000000, 10000000}` bytes;
`ipng_stats_byte_buckets` overrides them.
- **FR-2.4** The module MUST additionally maintain, per `(source, vip)` pair, exponentially-weighted moving averages for instantaneous
request rate with decay windows of 1 second, 10 seconds, and 60 seconds. EWMAs are updated from the periodic flush tick (see FR-4.2),
not from the request path.
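
Since the bucket bounds are fixed and shared (FR-2.3), placing an observation reduces to a binary search over the bounds array — the `O(log B)` step NFR-2.1 refers to. A minimal sketch; `hist_lane` and the bounds-array name are hypothetical, not taken from the module source:

```c
#include <stddef.h>

/* Default duration bounds from FR-2.3; lane 12 is the implicit +Inf bucket. */
static const unsigned long long ipng_default_ms_bounds[12] = {
    1, 5, 10, 25, 50, 100, 250, 500, 1000, 2500, 5000, 10000
};

/* Return the lane index for value v given n strictly increasing upper
 * bounds: the smallest i with v <= bounds[i], or n (+Inf) if none.
 * O(log n) comparisons, no allocations. */
static unsigned
hist_lane(const unsigned long long *bounds, unsigned n, unsigned long long v)
{
    unsigned lo = 0, hi = n;

    while (lo < hi) {
        unsigned mid = lo + (hi - lo) / 2;

        if (v <= bounds[mid]) {
            hi = mid;            /* v fits this bucket or an earlier one */
        } else {
            lo = mid + 1;
        }
    }
    return lo;                   /* 0..n inclusive */
}
```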
@@ -105,8 +109,11 @@ Each requirement carries a unique identifier (`FR-X.Y` or `NFR-X.Y`) so that lat
IPv4, RFC 5952 lowercase-compressed form for IPv6). IPv6 zone identifiers (scope-ids), if any, MUST be stripped during canonicalization;
link-local VIPs (which are not expected in practice) are attributed under their scope-less textual form. Port is not part of the key;
a VIP that listens on both 80 and 443 MUST be aggregated.
- **FR-2.6** The `status_code` dimension MUST be bucketed into a single class label: `1xx`, `2xx`, `3xx`, `4xx`, `5xx`, or `unknown` for
codes outside `[100, 599]`. This bounds per-`(source, vip)` cardinality to six lanes regardless of how many distinct three-digit
codes are observed. Operators who need a full per-code breakdown SHOULD enable `ipng_stats_logtail` (FR-8) and derive the per-code
view from the access-log stream off the hot path; the stats zone intentionally trades that resolution for a much smaller scrape
response.
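
The clamping rule is small enough to show inline. A plausible reconstruction of the class mapping from this spec — the module's actual `ipng_status_to_class` may differ:

```c
/* Class index per FR-2.6: 0 = "unknown" for codes outside [100, 599],
 * 1..5 for 1xx..5xx. Out-of-range codes clamp to 0 instead of
 * allocating a new counter key (NFR-3.3). */
static unsigned
ipng_status_to_class(unsigned status)
{
    if (status < 100 || status > 599) {
        return 0;
    }
    return status / 100;
}
```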
**FR-3 Scrape endpoint**
@@ -122,8 +129,10 @@ Each requirement carries a unique identifier (`FR-X.Y` or `NFR-X.Y`) so that lat
- **FR-3.5** Both filters MAY be supplied together; their effect is the intersection.
- **FR-3.6** The JSON schema MUST be documented in `docs/scrape-api.md` and MUST version via a top-level `schema` field so that breaking
changes can be made additively without bricking existing consumers.
- **FR-3.7** The Prometheus text output MUST use stable metric names prefixed with `nginx_ipng_` and MUST label every series with
`source_tag` and `vip`. Counter metrics (`nginx_ipng_requests_total`, `nginx_ipng_bytes_{in,out}_total`, `nginx_ipng_latency_total`)
additionally carry a `code` label with a class value (`1xx`..`5xx`/`unknown`). Histogram series (duration, upstream response,
request/response byte size) MUST NOT carry a `code` label — they aggregate across all classes for a given `(source, vip)` pair.

**FR-4 Hot path and flush**
@@ -220,7 +229,7 @@ Each requirement carries a unique identifier (`FR-X.Y` or `NFR-X.Y`) so that lat
(cached on the connection struct), a constant-time status-code index computation, a constant number of integer increments, and an
`O(log B)` histogram binary search where `B` is the number of buckets. No syscalls, no allocations, no locks.
- **NFR-2.2** The per-flush cost per worker MUST be bounded by `O(K)` atomic adds, where `K` is the number of distinct
`(source, vip, class)` keys touched by that worker since the last flush. Keys untouched during an interval MUST NOT be visited.
- **NFR-2.3** The scrape cost MUST be bounded by `O(K_total)` reads from the shared zone plus `O(K_total)` string format operations,
where `K_total` is the number of distinct keys in the zone.
@@ -231,9 +240,9 @@ Each requirement carries a unique identifier (`FR-X.Y` or `NFR-X.Y`) so that lat
level no more than once per minute per worker.
- **NFR-3.2** The per-worker private counter table MUST be bounded by the same total key count the shared zone admits. A worker MUST NOT
accumulate private state that exceeds the shared-zone capacity.
- **NFR-3.3** Status codes are collapsed to six classes (`1xx`..`5xx`/`unknown`) at counter-update time (FR-2.6), bounding per-`(source,
vip)` counter cardinality at six lanes regardless of how many three-digit codes are observed. Any code outside `[100, 599]` falls
into `code="unknown"`. Per-code resolution is available via `ipng_stats_logtail` (FR-8), which operates off the hot path.

**NFR-4 Reload neutrality**
@@ -332,7 +341,7 @@ dynamic-module ABI.
- Parse new `listen` parameters `device=` and `ipng_source_tag=` and attach their values to each listening socket's config (FR-1.1, FR-1.2).
- Call `setsockopt(SO_BINDTODEVICE)` in the master process at bind time for listeners that set `device=` (FR-1.1, NFR-6.1).
- Maintain per-worker private counter tables keyed by `(source_id, vip_id, status_class)` (FR-2.1, NFR-1.1).
- Run a per-worker flush timer that moves deltas into the shared-memory zone atomically (FR-4.2, NFR-1.2).
- Update EWMAs at flush time (FR-2.4).
- Serve the scrape endpoint with content negotiation and optional filters (FR-3).
@@ -359,17 +368,22 @@ such an interface is silently misattributed to that interface's source tag. This
#### Counter Data Model
Counters are stored as a flat hash table in a shared-memory zone. The key is the tuple `(source_id, vip_id, status_class)` where
`source_id` and `vip_id` are small integers assigned at first observation and `status_class` is one of six values (`0=unknown`,
`1..5` for `1xx`..`5xx`). The value is a fixed-size record containing:
- `requests` (u64)
- `bytes_in` (u64)
- `bytes_out` (u64)
- `duration_sum_ms` (u64) — exported as `nginx_ipng_latency_total` (per class)
- `upstream_sum_ms` (u64)
- `duration_hist` — `B+1` u64 lanes (one per bucket plus the `+Inf` bucket)
- `upstream_hist` — same shape, only updated when an upstream served the request
- `bytes_in_hist`, `bytes_out_hist` — `Bb+1` u64 lanes over the byte-size bucket bounds
Histogram lanes are kept per `(source, vip, class)` in storage, then summed across classes at scrape time to produce one
`_bucket`/`_sum`/`_count` series per `(source, vip)` — the Prometheus exposition never carries a `code` label on histogram series
(FR-3.7).
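
That scrape-time fold can be sketched as follows — the names and the fixed lane count are illustrative; in the module the lane count follows the configured bucket set:

```c
#define NCLASS 6    /* unknown, 1xx..5xx */
#define NLANES 13   /* 12 bounds + the +Inf lane (illustrative) */

/* Sum per-class histogram lanes into the single per-(source, vip)
 * series emitted in the Prometheus exposition (FR-3.7). */
static void
fold_classes(const unsigned long long lanes[NCLASS][NLANES],
             unsigned long long out[NLANES])
{
    for (unsigned b = 0; b < NLANES; b++) {
        out[b] = 0;
        for (unsigned c = 0; c < NCLASS; c++) {
            out[b] += lanes[c][b];
        }
    }
}
```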
A parallel table keyed by `(source_id, vip_id)` — one row per VIP — holds the EWMAs for instantaneous rate. EWMAs are floats but updated
only from the flush tick, so there is no float contention on the request path.
@@ -379,8 +393,9 @@ endpoint can recover the original strings without re-parsing configuration.
String interning is capacity-bounded: the zone is sized by the operator, and once capacity is exhausted new keys are dropped with a
counter bump and an infrequent log line (NFR-3.1). In practice, the number of distinct VIPs on a single nginx host is small (tens, maybe
low hundreds), and the number of distinct source tags is the number of attributed interfaces (single digits). Because status codes are
collapsed to six classes (FR-2.6), the `status_class` dimension contributes at most 6× the `(source, vip)` count — a ~10× reduction
from the per-three-digit-code model considered and discarded.

#### Hot Path
@@ -393,7 +408,7 @@ ipng_stats_log_handler(ngx_http_request_t *r)
    ipng_listen_ctx_t  *lctx;
    ipng_counter_t     *counter;
    ngx_msec_int_t      elapsed_ms;
    ngx_uint_t          class_idx;

    if (!ipng_stats_enabled(r)) {
        return NGX_OK;
@@ -403,8 +418,8 @@ ipng_stats_log_handler(ngx_http_request_t *r)
    /* lctx contains source_id and the cached VIP id,
       or resolves VIP lazily on first seen address */
    class_idx = ipng_status_to_class(r->headers_out.status);  /* 0..5 */
    counter = ipng_worker_slot(lctx, r->connection->local_sockaddr, class_idx);

    counter->requests++;
    counter->bytes_in += r->request_length;
@@ -425,7 +440,7 @@ ipng_stats_log_handler(ngx_http_request_t *r)
```
Nothing here touches shared memory. `ipng_worker_slot` resolves a private table slot using a small per-worker hash keyed by
`(source_id, vip_id, class_idx)`. VIP lookup is cached on the connection so that keep-alive requests reuse the resolved ID.
#### Flush Timer
@@ -459,9 +474,10 @@ fixed-size buffer per chain link and requests new links only when full.
- **One nginx content handler**, `ipng_stats`, usable in any `location` block. Serves Prometheus text and JSON, filtered by optional
query parameters.
- **Two new `listen` parameters**, `device=` and `ipng_source_tag=`, usable anywhere a `listen` directive is used.
- **Six new `http`-level directives**: `ipng_stats_zone`, `ipng_stats_flush_interval`, `ipng_stats_default_source`,
`ipng_stats_buckets`, `ipng_stats_byte_buckets`, `ipng_stats` (on/off).
- **A Prometheus metric family** prefixed `nginx_ipng_*`, labelled `source_tag`, `vip`, and (for counter metrics) a `code` class label
(`1xx`..`5xx`/`unknown`).

**Consumes.**


@@ -183,16 +183,20 @@ curl -s http://127.0.0.1:9113/.well-known/ipng/statsz
Default output is Prometheus text format:
```
# HELP nginx_ipng_requests_total Total HTTP requests.
# TYPE nginx_ipng_requests_total counter
nginx_ipng_requests_total{source_tag="mg1",vip="192.0.2.10",code="2xx"} 12345
nginx_ipng_requests_total{source_tag="mg1",vip="192.0.2.10",code="4xx"} 17
nginx_ipng_requests_total{source_tag="mg2",vip="192.0.2.10",code="2xx"} 9876
nginx_ipng_requests_total{source_tag="direct",vip="192.0.2.10",code="2xx"} 42
# HELP nginx_ipng_bytes_in_total Request bytes received.
# TYPE nginx_ipng_bytes_in_total counter
nginx_ipng_bytes_in_total{source_tag="mg1",vip="192.0.2.10",code="2xx"} 9876543
# ... and so on
# Histogram series (request_duration, upstream_response, bytes_in, bytes_out)
# do NOT carry a `code` label — they aggregate across classes per (source, vip).
nginx_ipng_request_duration_seconds_bucket{source_tag="mg1",vip="192.0.2.10",le="0.050"} 11200
```

For JSON output instead, set the `Accept` header:
@@ -237,7 +241,7 @@ Typical PromQL queries:
sum by (source_tag, vip) (rate(nginx_ipng_requests_total[1m]))

# 5xx error rate per VIP, aggregated across all sources:
sum by (vip) (rate(nginx_ipng_requests_total{code="5xx"}[5m]))
/
sum by (vip) (rate(nginx_ipng_requests_total[5m]))
@@ -252,6 +256,11 @@ Operators who want a single unified access log covering all traffic — regardle
have to repeat `access_log` in every `server {}` block or rely on a catch-all virtual host. The `ipng_stats_logtail` directive removes
that requirement: one line at the `http` level registers a global log-phase writer that fires unconditionally for every request (FR-8.1).
The logtail is also the recommended escape hatch when you need richer cardinality than the stats zone exposes. The Prometheus counters
deliberately collapse HTTP status codes into six class lanes (`1xx`..`5xx`/`unknown`) to keep scrape size bounded. Operators who need
per-three-digit-code, per-path, per-user-agent, or any other high-cardinality breakdown should ship the logtail stream to an off-path
analytics receiver and compute those views there — that work happens in a different process and never touches the nginx hot path.
The logtail sends each buffer flush as a single UDP datagram to a `host:port`. Zero disk I/O, no backpressure, no blocking if the
receiver is down. This makes it ideal for fire-and-forget analytics pipelines where delivery guarantees are unnecessary and disk writes
would add unwanted I/O pressure. For file-based access logging, use nginx's built-in `access_log` directive.
@@ -374,9 +383,10 @@ from any language.
Once wired, a consumer can derive from the scrape data:
- Live QPS per backend (from the EWMA gauges).
- Status-class mix per backend (the six-lane `1xx`..`5xx`/`unknown` counter families). Full three-digit codes are not exported by the
scrape endpoint; route the logtail stream off-host and aggregate there if you need per-code breakdowns.
- p50/p95 latency per backend (from the duration histogram, aggregated across classes).
- Traffic volume per backend (from the bytes counters and the new bytes histograms).

For an example of this pattern in a GRE tunnel fleet, see [`vpp-maglev`](https://git.ipng.ch/ipng/vpp-maglev), whose frontend scrapes
each nginx backend filtered by source tag to show per-backend traffic alongside health state.
@@ -393,7 +403,8 @@ values in `listens.conf`, or the interfaces aren't up. Run `ip -br link` and con
`nginx.conf` is stable across reloads — renaming the zone forces a new shared-memory segment.

**`nginx_ipng_zone_full_events_total` is non-zero.** The shared-memory zone is too small for your VIP count. Increase the size in
`ipng_stats_zone ipng:<size>` (the default 4 MB covers ~hundreds of VIPs; with the code dimension bucketed to six classes, one 4 MB
zone holds a very large deployment).

**`curl http://127.0.0.1:9113/.well-known/ipng/statsz` returns "403 Forbidden".** The `allow`/`deny` ACL is blocking your source address. Either add
yourself or scrape from a host already in the allow list.

File diff suppressed because it is too large


@@ -71,14 +71,14 @@ Direct traffic tagged
# --- Status code tracking ---
Per-class code counters
    [Documentation]    4xx and 2xx appear as class-bucketed code= labels.
    Docker Exec Ignore Rc    ${CLIENT1}    curl -s http://10.0.1.1:8080/notfound
    Docker Exec Ignore Rc    ${CLIENT1}    curl -s http://10.0.1.1:8080/notfound
    Wait For Flush
    ${output} =    Scrape With Filter    source_tag=cl1
    Should Contain    ${output}    code="4xx"
    Should Contain    ${output}    code="2xx"
# --- Duration histogram ---