Reduce scrape cardinality: class codes, per-(source,vip) histograms, byte histograms

Collapses the status-code dimension of the counter key into six class
lanes (1xx..5xx/unknown) so per-(source,vip) counter cardinality no
longer grows with the number of distinct three-digit responses nginx
serves. Histogram series drop the code label entirely and aggregate
across classes. Adds nginx_ipng_latency_total with a code class label
so average latency per class can still be computed off the scrape.
Adds nginx_ipng_bytes_{in,out} histograms with configurable boundaries
via the new ipng_stats_byte_buckets directive. Bumps JSON schema to 2.

Operators who need full three-digit-code resolution should consume the
ipng_stats_logtail stream off-host; the stats zone intentionally trades
that resolution for a bounded scrape size.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 14:36:16 +02:00
parent 87050bcf13
commit b3ad74cbde
5 changed files with 817 additions and 312 deletions


@@ -90,14 +90,18 @@ Each requirement carries a unique identifier (`FR-X.Y` or `NFR-X.Y`) so that lat
**FR-2 Counters**
- **FR-2.1** The module MUST maintain, for every observed `(source, vip, status_code)` tuple, the following counters: total requests,
- **FR-2.1** The module MUST maintain, for every observed `(source, vip, status_class)` tuple, the following counters: total requests,
total bytes received (sum of request bytes including request line, headers, and body), total bytes sent (sum of response bytes
including status line, headers, and body), and a fixed-bucket histogram of request duration in milliseconds.
including status line, headers, and body), and sum of request durations in milliseconds (exported as `nginx_ipng_latency_total`).
The module MUST additionally maintain, per `(source, vip)` pair (no `code` label), fixed-bucket histograms of request duration in
milliseconds and of request/response sizes in bytes.
- **FR-2.2** When an upstream is used to serve the request, the module MUST additionally maintain a fixed-bucket histogram of upstream
response time in milliseconds, keyed by the same `(source, vip)` pair.
- **FR-2.3** The histogram bucket boundaries MUST be fixed at module initialization and MUST be the same for every `(source, vip)` key.
The default boundaries are `{1, 5, 10, 25, 50, 100, 250, 500, 1000, 2500, 5000, 10000}` milliseconds plus an implicit `+Inf` bucket.
Operators MAY override the boundaries via the `ipng_stats_buckets` directive at the `http` level.
- **FR-2.3** The duration histogram bucket boundaries MUST be fixed at module initialization and MUST be the same for every `(source,
vip)` key. The default boundaries are `{1, 5, 10, 25, 50, 100, 250, 500, 1000, 2500, 5000, 10000}` milliseconds plus an implicit
`+Inf` bucket. Operators MAY override the boundaries via the `ipng_stats_buckets` directive at the `http` level. The byte-size
histograms (request and response bodies) use independent bounds defaulting to `{100, 1000, 10000, 100000, 1000000, 10000000}` bytes;
`ipng_stats_byte_buckets` overrides them.
- **FR-2.4** The module MUST additionally maintain, per `(source, vip)` pair, exponentially-weighted moving averages for instantaneous
request rate with decay windows of 1 second, 10 seconds, and 60 seconds. EWMAs are updated from the periodic flush tick (see FR-4.2),
not from the request path.
@@ -105,8 +109,11 @@ Each requirement carries a unique identifier (`FR-X.Y` or `NFR-X.Y`) so that lat
IPv4, RFC 5952 lowercase-compressed form for IPv6). IPv6 zone identifiers (scope-ids), if any, MUST be stripped during canonicalization;
link-local VIPs (which are not expected in practice) are attributed under their scope-less textual form. Port is not part of the key;
a VIP that listens on both 80 and 443 MUST be aggregated.
- **FR-2.6** The `status_code` dimension MUST be the full three-digit HTTP status code as recorded by nginx at log phase. The module MUST
NOT bucket codes into classes (2xx/3xx/4xx/5xx); bucketing is the consumer's job.
- **FR-2.6** The `status_code` dimension MUST be bucketed into a single class label: `1xx`, `2xx`, `3xx`, `4xx`, `5xx`, or `unknown` for
codes outside `[100, 599]`. This bounds per-`(source, vip)` cardinality to six lanes regardless of how many distinct three-digit
codes are observed. Operators who need a full per-code breakdown SHOULD enable `ipng_stats_logtail` (FR-8) and derive the per-code
view from the access-log stream off the hot path; the stats zone intentionally trades that resolution for a much smaller scrape
response.
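The class mapping that FR-2.6 describes can be sketched as below. This is an illustration only, not the module's actual code: the function name `ipng_status_to_class_sketch`, the enum, and the label table are hypothetical, and the lane numbering (`0=unknown`, `1..5` for `1xx`..`5xx`) is taken from the counter data model.

```c
#include <stdint.h>

/* Hypothetical lane numbering: 0 = unknown, 1..5 = 1xx..5xx. */
enum { IPNG_CLASS_UNKNOWN = 0, IPNG_CLASS_COUNT = 6 };

/* Label values emitted as code="..." on counter series (assumed table). */
static const char *const ipng_class_labels[IPNG_CLASS_COUNT] = {
    "unknown", "1xx", "2xx", "3xx", "4xx", "5xx"
};

/* Collapse a three-digit status into one of six class lanes.
 * Anything outside [100, 599] lands in the unknown lane. */
static unsigned
ipng_status_to_class_sketch(unsigned status)
{
    if (status < 100 || status > 599) {
        return IPNG_CLASS_UNKNOWN;
    }
    return status / 100;   /* 100..199 -> 1, ..., 500..599 -> 5 */
}
```

Because the result is always in `0..5`, it can index a fixed six-entry array directly, which is what keeps the per-`(source, vip)` lane count constant.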
**FR-3 Scrape endpoint**
@@ -122,8 +129,10 @@ Each requirement carries a unique identifier (`FR-X.Y` or `NFR-X.Y`) so that lat
- **FR-3.5** Both filters MAY be supplied together; their effect is the intersection.
- **FR-3.6** The JSON schema MUST be documented in `docs/scrape-api.md` and MUST version via a top-level `schema` field so that breaking
changes can be made additively without bricking existing consumers.
- **FR-3.7** The Prometheus text output MUST use stable metric names prefixed with `nginx_ipng_` and MUST label every series with `source_tag`
and `vip`. Counter metrics additionally carry a `code` label.
- **FR-3.7** The Prometheus text output MUST use stable metric names prefixed with `nginx_ipng_` and MUST label every series with
`source_tag` and `vip`. Counter metrics (`nginx_ipng_requests_total`, `nginx_ipng_bytes_{in,out}_total`, `nginx_ipng_latency_total`)
additionally carry a `code` label with a class value (`1xx`..`5xx`/`unknown`). Histogram series (duration, upstream response,
request/response byte size) MUST NOT carry a `code` label — they aggregate across all classes for a given `(source, vip)` pair.
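To make the labelling rule in FR-3.7 concrete, the following sketch formats one counter sample in Prometheus text exposition format. The helper name `format_counter_line` and its signature are assumptions made for illustration; the metric and label names come from the requirement text. Label-value escaping is deliberately omitted, so the sketch assumes values contain no quotes, backslashes, or newlines.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Format a single counter sample with source_tag, vip, and code labels.
 * Returns the snprintf result: the number of bytes that would be written,
 * excluding the terminating NUL. Hypothetical helper, not the module's API. */
static int
format_counter_line(char *buf, size_t len, const char *metric,
                    const char *source_tag, const char *vip,
                    const char *code_class, uint64_t value)
{
    return snprintf(buf, len,
                    "%s{source_tag=\"%s\",vip=\"%s\",code=\"%s\"} %" PRIu64 "\n",
                    metric, source_tag, vip, code_class, value);
}
```

A histogram line would be built the same way but without the `code` label, since histogram series aggregate across classes.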
**FR-4 Hot path and flush**
@@ -220,7 +229,7 @@ Each requirement carries a unique identifier (`FR-X.Y` or `NFR-X.Y`) so that lat
(cached on the connection struct), a constant-time status-code index computation, a constant number of integer increments, and an
`O(log B)` histogram binary search where `B` is the number of buckets. No syscalls, no allocations, no locks.
- **NFR-2.2** The per-flush cost per worker MUST be bounded by `O(K)` atomic adds, where `K` is the number of distinct
`(source, vip, code)` keys touched by that worker since the last flush. Keys untouched during an interval MUST NOT be visited.
`(source, vip, class)` keys touched by that worker since the last flush. Keys untouched during an interval MUST NOT be visited.
- **NFR-2.3** The scrape cost MUST be bounded by `O(K_total)` reads from the shared zone plus `O(K_total)` string format operations,
where `K_total` is the number of distinct keys in the zone.
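The `O(log B)` histogram lookup named in NFR-2.1 could look like the sketch below. It assumes Prometheus-style `le` (less-than-or-equal) bucket semantics and sorted ascending bounds; the function name `hist_bucket_index` is hypothetical. Index `nbounds` stands for the implicit `+Inf` bucket.

```c
#include <stddef.h>
#include <stdint.h>

/* Return the index of the first bound with value <= bounds[i].
 * If the value exceeds every bound, return nbounds (the +Inf lane).
 * bounds must be sorted in ascending order. */
static size_t
hist_bucket_index(const uint64_t *bounds, size_t nbounds, uint64_t value)
{
    size_t lo = 0, hi = nbounds;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (value <= bounds[mid]) {
            hi = mid;        /* bounds[mid] qualifies; look for an earlier one */
        } else {
            lo = mid + 1;    /* value is above bounds[mid] */
        }
    }
    return lo;
}
```

With the twelve default duration bounds this takes at most four comparisons per request and needs no allocation or locking.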
@@ -231,9 +240,9 @@ Each requirement carries a unique identifier (`FR-X.Y` or `NFR-X.Y`) so that lat
level no more than once per minute per worker.
- **NFR-3.2** The per-worker private counter table MUST be bounded by the same total key count the shared zone admits. A worker MUST NOT
accumulate private state that exceeds the shared-zone capacity.
- **NFR-3.3** The set of distinct status codes observed is small (typically ≤ 60) and MUST NOT be allowed to explode due to non-standard
responses; the module MUST clamp any observed code `< 100` or `>= 600` into a single bucket labeled `code="unknown"` rather than
allocating a new key.
- **NFR-3.3** Status codes are collapsed to six classes (`1xx`..`5xx`/`unknown`) at counter-update time (FR-2.6), bounding per-`(source,
vip)` counter cardinality at six lanes regardless of how many three-digit codes are observed. Any code outside `[100, 599]` falls
into `code="unknown"`. Per-code resolution is available via `ipng_stats_logtail` (FR-8), which operates off the hot path.
**NFR-4 Reload neutrality**
@@ -332,7 +341,7 @@ dynamic-module ABI.
- Parse new `listen` parameters `device=` and `ipng_source_tag=` and attach their values to each listening socket's config (FR-1.1, FR-1.2).
- Call `setsockopt(SO_BINDTODEVICE)` in the master process at bind time for listeners that set `device=` (FR-1.1, NFR-6.1).
- Maintain per-worker private counter tables keyed by `(source_id, vip_id, status_code)` (FR-2.1, NFR-1.1).
- Maintain per-worker private counter tables keyed by `(source_id, vip_id, status_class)` (FR-2.1, NFR-1.1).
- Run a per-worker flush timer that moves deltas into the shared-memory zone atomically (FR-4.2, NFR-1.2).
- Update EWMAs at flush time (FR-2.4).
- Serve the scrape endpoint with content negotiation and optional filters (FR-3).
@@ -359,17 +368,22 @@ such an interface is silently misattributed to that interface's source tag. This
#### Counter Data Model
Counters are stored as a flat hash table in a shared-memory zone. The key is the tuple `(source_id, vip_id, status_code)` where
`source_id` and `vip_id` are small integers assigned at first observation and reused thereafter. The value is a fixed-size record
containing:
Counters are stored as a flat hash table in a shared-memory zone. The key is the tuple `(source_id, vip_id, status_class)` where
`source_id` and `vip_id` are small integers assigned at first observation and `status_class` is one of six values (`0=unknown`,
`1..5` for `1xx`..`5xx`). The value is a fixed-size record containing:
- `requests` (u64)
- `bytes_in` (u64)
- `bytes_out` (u64)
- `duration_hist` — `B+1` u64 lanes (one per bucket plus the `+Inf` bucket)
- `duration_sum_ms` (u64)
- `upstream_hist` — same shape, only updated when an upstream served the request
- `duration_sum_ms` (u64) — exported as `nginx_ipng_latency_total` (per class)
- `upstream_sum_ms` (u64)
- `duration_hist` — `B+1` u64 lanes (one per bucket plus the `+Inf` bucket)
- `upstream_hist` — same shape, only updated when an upstream served the request
- `bytes_in_hist`, `bytes_out_hist` — `Bb+1` u64 lanes over the byte-size bucket bounds
Histogram lanes are kept per `(source, vip, class)` in storage, then summed across classes at scrape time to produce one
`_bucket`/`_sum`/`_count` series per `(source, vip)` — the Prometheus exposition never carries a `code` label on histogram series
(FR-3.7).
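The scrape-time summation across classes might be sketched as follows. The record layout is reduced to the two duration fields, and `class_record_t`, `NCLASS`, `NBOUNDS`, and `merge_duration_hist` are hypothetical names. The sketch also assumes that stored lanes are non-cumulative (each observation increments exactly one lane), so the cumulative `_bucket` values Prometheus expects are produced here.

```c
#include <stddef.h>
#include <stdint.h>

#define NCLASS  6    /* unknown, 1xx..5xx */
#define NBOUNDS 12   /* default duration bounds; lane NBOUNDS is +Inf */

/* Reduced per-class record: only the duration histogram and its sum. */
typedef struct {
    uint64_t duration_hist[NBOUNDS + 1];
    uint64_t duration_sum_ms;
} class_record_t;

/* Fold the six class records of one (source, vip) pair into a single
 * cumulative histogram plus _sum and _count values. */
static void
merge_duration_hist(const class_record_t recs[NCLASS],
                    uint64_t cumulative[NBOUNDS + 1],
                    uint64_t *sum_ms, uint64_t *count)
{
    uint64_t running = 0;

    for (size_t b = 0; b <= NBOUNDS; b++) {
        for (size_t c = 0; c < NCLASS; c++) {
            running += recs[c].duration_hist[b];
        }
        cumulative[b] = running;   /* observations with le <= bound b */
    }

    *sum_ms = 0;
    for (size_t c = 0; c < NCLASS; c++) {
        *sum_ms += recs[c].duration_sum_ms;
    }
    *count = running;              /* the +Inf lane equals the total count */
}
```

Keeping the lanes per class in storage means the request path never needs a second key lookup; the cost of merging is paid only at scrape time.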
A parallel table keyed by `(source_id, vip_id)` — one row per VIP — holds the EWMAs for instantaneous rate. EWMAs are floats but updated
only from the flush tick, so there is no float contention on the request path.
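A flush-tick EWMA update over the three windows could be sketched like this. The source does not state the smoothing formula, so the coefficient `alpha = dt / (window + dt)` (a first-order low-pass discretization) is an assumption, as are the names `ewma_row_t` and `ewma_update`.

```c
#include <stddef.h>
#include <stdint.h>

#define EWMA_WINDOWS 3

/* Decay windows in seconds, as listed in FR-2.4. */
static const double ewma_window_s[EWMA_WINDOWS] = { 1.0, 10.0, 60.0 };

/* One row per (source, vip): smoothed requests-per-second per window. */
typedef struct {
    double rate[EWMA_WINDOWS];
} ewma_row_t;

/* Fold the requests seen since the previous flush into each average.
 * dt_s is the elapsed time since that flush, in seconds. */
static void
ewma_update(ewma_row_t *row, uint64_t delta_requests, double dt_s)
{
    if (dt_s <= 0.0) {
        return;   /* no elapsed time, nothing to fold in */
    }

    double inst = (double) delta_requests / dt_s;   /* instantaneous rate */

    for (size_t i = 0; i < EWMA_WINDOWS; i++) {
        /* Assumed coefficient: short windows track inst closely,
         * long windows move slowly. */
        double alpha = dt_s / (ewma_window_s[i] + dt_s);
        row->rate[i] += alpha * (inst - row->rate[i]);
    }
}
```

Since only the flush timer calls this, the floating-point work stays entirely off the request path.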
@@ -379,8 +393,9 @@ endpoint can recover the original strings without re-parsing configuration.
String interning is capacity-bounded: the zone is sized by the operator, and once capacity is exhausted new keys are dropped with a
counter bump and an infrequent log line (NFR-3.1). In practice, the number of distinct VIPs on a single nginx host is small (tens, maybe
low hundreds), and the number of distinct source tags is the number of attributed interfaces (single digits). The dominant factor is
`status_code`; ~60 keys per VIP is a typical steady state.
low hundreds), and the number of distinct source tags is the number of attributed interfaces (single digits). Because status codes are
collapsed to six classes (FR-2.6), the `status_class` dimension contributes at most 6× the `(source, vip)` count — a ~10× reduction
from the per-three-digit-code model considered and discarded.
#### Hot Path
@@ -393,7 +408,7 @@ ipng_stats_log_handler(ngx_http_request_t *r)
ipng_listen_ctx_t *lctx;
ipng_counter_t *counter;
ngx_msec_int_t elapsed_ms;
ngx_uint_t code_idx;
ngx_uint_t class_idx;
if (!ipng_stats_enabled(r)) {
return NGX_OK;
@@ -403,8 +418,8 @@ ipng_stats_log_handler(ngx_http_request_t *r)
/* lctx contains source_id and the cached VIP id,
or resolves VIP lazily on first seen address */
code_idx = ipng_status_to_index(r->headers_out.status);
counter = ipng_worker_slot(lctx, r->connection->local_sockaddr, code_idx);
class_idx = ipng_status_to_class(r->headers_out.status); /* 0..5 */
counter = ipng_worker_slot(lctx, r->connection->local_sockaddr, class_idx);
counter->requests++;
counter->bytes_in += r->request_length;
@@ -425,7 +440,7 @@ ipng_stats_log_handler(ngx_http_request_t *r)
```
Nothing here touches shared memory. `ipng_worker_slot` resolves a private table slot using a small per-worker hash keyed by
`(source_id, vip_id, code_idx)`. VIP lookup is cached on the connection so that keep-alive requests reuse the resolved ID.
`(source_id, vip_id, class_idx)`. VIP lookup is cached on the connection so that keep-alive requests reuse the resolved ID.
#### Flush Timer
@@ -459,9 +474,10 @@ fixed-size buffer per chain link and requests new links only when full.
- **One nginx content handler**, `ipng_stats`, usable in any `location` block. Serves Prometheus text and JSON, filtered by optional
query parameters.
- **Two new `listen` parameters**, `device=` and `ipng_source_tag=`, usable anywhere a `listen` directive is used.
- **Five new `http`-level directives**: `ipng_stats_zone`, `ipng_stats_flush_interval`, `ipng_stats_default_source`,
`ipng_stats_buckets`, `ipng_stats` (on/off).
- **A Prometheus metric family** prefixed `nginx_ipng_*`, labelled `source_tag`, `vip`, and (for request counters) `code`.
- **Six new `http`-level directives**: `ipng_stats_zone`, `ipng_stats_flush_interval`, `ipng_stats_default_source`,
`ipng_stats_buckets`, `ipng_stats_byte_buckets`, `ipng_stats` (on/off).
- **A Prometheus metric family** prefixed `nginx_ipng_*`, labelled `source_tag`, `vip`, and (for counter metrics) a `code` class label
(`1xx`..`5xx`/`unknown`).
**Consumes.**