nginx-ipng-stats-plugin/docs/config-guide.md
Pim van Pelt b3ad74cbde Reduce scrape cardinality: class codes, per-(source,vip) histograms, byte histograms
Collapses the status-code dimension of the counter key into six class
lanes (1xx..5xx/unknown) so per-(source,vip) counter cardinality no
longer grows with the number of distinct three-digit responses nginx
serves. Histogram series drop the code label entirely and aggregate
across classes. Adds nginx_ipng_latency_total with a code class label
so average latency per class can still be computed off the scrape.
Adds nginx_ipng_bytes_{in,out} histograms with configurable boundaries
via the new ipng_stats_byte_buckets directive. Bumps JSON schema to 2.

Operators who need full three-digit-code resolution should consume the
ipng_stats_logtail stream off-host; the stats zone intentionally trades
that resolution for a bounded scrape size.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 14:36:16 +02:00


nginx-ipng-stats-plugin — Configuration Reference

This document enumerates every directive and listen parameter introduced by ngx_http_ipng_stats_module, the nginx contexts in which each is legal, the allowed values, and the default (NFR-7.2). For an end-to-end walkthrough read user-guide.md; for the reasoning behind the design read design.md.

listen parameters

These extend the stock nginx listen directive. They are parsed by the module and stripped from cf->args before the original handler is invoked, so they compose with every standard listen parameter (ssl, http2, default_server, reuseport, etc.).

device=<ifname>

Context: listen directive (wherever listen itself is legal — typically inside server { ... }).

Value: a Linux interface name, e.g. gre-mg1, eth0. Maximum IFNAMSIZ - 1 characters (15 on current kernels).

Default: not set (plain listen).

Effect: the resulting listening socket has SO_BINDTODEVICE applied at init-module time, making the kernel accept only connections whose ingress interface is <ifname>. Combined with a wildcard listen address (80, [::]:80) this is the mechanism by which the plugin attributes traffic to a specific ingress interface.

The setsockopt(SO_BINDTODEVICE) call runs in the nginx master process while it still holds its initial privileges — workers never call it, and no additional Linux capability is required beyond what stock nginx already has (NFR-6.1).

See FR-1.1, FR-1.5, FR-1.6.

ipng_source_tag=<tag>

Context: listen directive.

Value: a short opaque string identifying the traffic source. No length limit is enforced, but keep it ≤ 32 characters for readable metric output.

Default: when ipng_source_tag= is absent but device=X is set, the tag defaults to the interface name X (FR-1.4). When both are absent, the tag defaults to the value of ipng_stats_default_source at the enclosing http level.

Effect: every counter recorded on this listener carries source_tag=<tag> as a Prometheus label and as the outer key in the JSON output. Scrape consumers can use this tag to filter the response to only the traffic they delivered. To obtain the VIP address in nginx config (e.g. in log_format or map), use nginx's built-in $server_addr variable.

See FR-1.2, FR-1.3, FR-1.4.

http-level directives

All plugin-wide settings live in the http { ... } block. They cannot be overridden in inner contexts.

ipng_stats_zone <name>:<size>

Context: http.

Value: <name> is a string identifier for the shared-memory zone; <size> is an nginx size spec with k or m suffix.

Default: none — the directive is mandatory if the module is loaded.

Effect: allocates a shared-memory zone of <size> bytes to hold the counter hash table. The <name> must be stable across nginx -s reload — renaming it forces a fresh segment, which is the one situation where counters reset without a master restart.

Sizing guidance: the dominant factor in zone size is the number of (source, vip) pairs, each of which holds at most six counter keys (one per status class: 1xx through 5xx, plus unknown). A host serving 50 VIPs behind 4 source interfaces uses at most 4 × 50 × 6 = 1200 keys, each a few hundred bytes. A 4m zone comfortably fits that. If the zone fills, the module drops new keys and increments nginx_ipng_zone_full_events_total; resize and reload.

See FR-5.1, NFR-3.1.

ipng_stats_flush_interval <duration>

Context: http.

Value: an nginx duration string, e.g. 500ms, 1s, 2s.

Default: 1s.

Minimum: 100ms.

Effect: sets the cadence of the per-worker flush timer that moves private counter deltas into the shared-memory zone. Lower values reduce the window of data loss if a worker crashes; higher values reduce the number of atomic adds on the shared zone. The default is sized so that a scrape interval of 5 to 15 s sees effectively no lag.

See FR-4.2, FR-5.2.

ipng_stats_default_source <tag>

Context: http.

Value: a short string; see ipng_source_tag= above for conventions.

Default: direct.

Effect: sets the tag applied to listening sockets that have neither device= nor ipng_source_tag=. A host serving a mix of device-attributed and direct web traffic will see direct traffic under this tag in the scrape output. Rename it to public, localnet, or anything else that reads better for your deployment.

See FR-1.3, FR-5.3.

ipng_stats_buckets <ms> <ms> <ms> ...

Context: http.

Value: two or more positive integers, strictly increasing, representing histogram bucket upper bounds in milliseconds.

Default: 1 5 10 25 50 100 250 500 1000 2500 5000 10000, plus an implicit +Inf bucket.

Effect: overrides the default histogram bucket boundaries for both request_duration and upstream_response_time histograms. The same set applies to every (source, vip) key in the module (v0.1 does not support per-key override; see design.md).
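When the directive is generated by a template or deployment tool, the Value rule above (two or more positive integers, strictly increasing) can be checked before nginx ever parses the config. The following is a minimal sketch under the assumption that the bounds are already available as integers; validateBuckets is a hypothetical helper for illustration and is not part of the module.

```go
package main

import (
	"errors"
	"fmt"
)

// validateBuckets checks the documented rule for ipng_stats_buckets:
// two or more positive integers (milliseconds), strictly increasing.
// Hypothetical helper; the module performs its own validation.
func validateBuckets(ms []int) error {
	if len(ms) < 2 {
		return errors.New("need at least two bucket bounds")
	}
	prev := 0 // starting at 0 forces the first bound to be positive
	for _, b := range ms {
		if b <= prev {
			return fmt.Errorf("bound %d must be greater than %d", b, prev)
		}
		prev = b
	}
	return nil
}

func main() {
	// The documented default set passes; a decreasing set does not.
	fmt.Println(validateBuckets([]int{1, 5, 10, 25, 50, 100, 250, 500, 1000, 2500, 5000, 10000}))
	fmt.Println(validateBuckets([]int{10, 5}))
}
```

The implicit +Inf bucket is deliberately not part of the input, since the module adds it on its own.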

See FR-2.3, FR-5.4.

ipng_stats_byte_buckets <size> <size> ...

Context: http.

Value: two or more strictly increasing sizes (nginx size spec: 100, 1k, 1m, ...) representing byte-size histogram upper bounds.

Default: 100 1000 10000 100000 1000000 10000000, plus an implicit +Inf bucket.

Effect: overrides the default bucket boundaries for the nginx_ipng_bytes_in and nginx_ipng_bytes_out histograms. Pick values that match your traffic mix — these bucket bounds feed the scrape output only, not the per-(source, vip, class) byte counters, which are exact.

See FR-2.3.

ipng_stats on | off

Context: http, server, location.

Value: boolean (on or off).

Default: on at the http level when the module is loaded.

Effect: opts a context into or out of counting. Cost of a disabled context is one branch in the log-phase handler. A location serving the ipng_stats scrape handler is automatically excluded from counting regardless of this directive — scraping the scrape endpoint does not inflate its own counters.

See FR-5.5.

ipng_stats_logtail <format_name> udp://<host>:<port> [buffer=<size>] [flush=<duration>] [if=<$variable>]

Context: http.

Value: <format_name> is the name of an existing log_format defined earlier in the same http block. The destination MUST be a udp://host:port URI. buffer=<size> is an optional nginx size spec (default 64k, minimum 1k). flush=<duration> is an optional nginx duration string (default 1s, minimum 100ms). if=<$variable> is an optional condition variable — when set, the log line is only emitted if the variable evaluates to a non-empty value other than "0".

Default: not set — the directive is optional. When absent, no global logtail output is written.

Effect: registers a global log-phase writer that fires for every request, regardless of server or location context, unless the request is suppressed by if=. The named log_format is looked up from nginx's log module at configuration time; nginx's standard variable-expansion machinery renders each line, so any variable usable in a regular log_format (including $ipng_source_tag and $server_addr) is available here.

Each worker maintains a private in-memory write buffer of buffer=<size> bytes. Each buffer flush is transmitted as a single sendto() call on a per-worker SOCK_DGRAM socket that is opened at worker init and closed at worker exit. The address is resolved once at configuration time — there is no DNS lookup at flush time. The buffer is flushed when:

  • the buffer is full (immediate flush, no lines are dropped);
  • the flush=<duration> timer fires (periodic flush); or
  • the worker exits during a graceful reload or shutdown (final flush).

This covers all request traffic with a single directive at the http level, eliminating the need to repeat access_log in every server block. It is particularly useful when the format includes $ipng_source_tag and $server_addr, giving per-device attribution in every log line at no extra configuration cost.

File-based access logging is intentionally not supported by this directive — use nginx's built-in access_log directive for that.

log_format ipng_stats_logtail '$host\t$remote_addr\t$request_method\t$request_uri\t'
                              '$status\t$body_bytes_sent\t'
                              '$ipng_source_tag\t$server_addr\t$scheme';
ipng_stats_logtail ipng_stats_logtail udp://127.0.0.1:9514 buffer=16k flush=1s;

Conditional logging with if=

The if=$variable parameter suppresses log lines for requests where the variable is empty, not found, or "0". This uses the same semantics as nginx's built-in access_log ... if= and works well with map blocks:

# Suppress health checks from the logtail stream:
map $request_uri $logtail_enabled {
    ~^/\.well-known/ipng/healthz  0;
    default                       1;
}

ipng_stats_logtail ipng_stats_logtail udp://127.0.0.1:9514 if=$logtail_enabled;

The map compiles to a hash table at configuration time; at request time it costs a single hash probe, evaluated lazily only when the variable is read. The condition is checked before the log format is rendered, so filtered requests skip the format rendering entirely.
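For receivers or test harnesses that need to mirror this rule outside nginx, the pass-or-suppress decision reduces to a one-line predicate. The sketch below is illustrative only: logtailPasses is a hypothetical name, and this is not the module's code.

```go
package main

import "fmt"

// logtailPasses reports whether a request would be logged for a given
// value of the if= variable: any non-empty value other than "0" passes.
// Hypothetical helper mirroring the documented semantics.
func logtailPasses(value string) bool {
	return value != "" && value != "0"
}

func main() {
	for _, v := range []string{"1", "0", "", "yes"} {
		fmt.Printf("%q -> %v\n", v, logtailPasses(v))
	}
}
```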

Constraints and behavior:

  • host MUST be a literal IPv4 address. Hostnames and IPv6 addresses are not supported in v0.1.
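  • Each flush emits a single UDP datagram. At the default buffer=64k size, datagram payloads comfortably fit within the ~64 KB loopback MTU. On non-loopback paths, a datagram larger than the path MTU (typically 1500 bytes on Ethernet) is fragmented at the IP layer, and losing any one fragment loses the whole datagram, so operators sending off-host may prefer a smaller buffer.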
  • If no receiver is listening, the kernel silently discards the datagram. The worker receives no error and is not blocked. This is intentional: the logtail is a fire-and-forget analytics transport — zero disk I/O and no backpressure are the point.
  • There is no acknowledgment, no retry, and no sequence number. Datagrams lost in transit or because the receiver is down are permanently lost.

Receiver side: any UDP server works. Two minimal examples:

# Quick inspection with netcat:
nc -u -l 127.0.0.1 9514

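// Go receiver snippet (needs imports "log" and "net"; process is the caller's own handler):
conn, err := net.ListenPacket("udp", ":9514")
if err != nil {
    log.Fatal(err)
}
buf := make([]byte, 65536) // large enough for any single UDP datagram
for {
    n, _, err := conn.ReadFrom(buf)
    if err != nil {
        log.Fatal(err)
    }
    process(buf[:n]) // one datagram carries one flushed buffer of log lines
}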
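
The process call in the receiver is left undefined. As a rough sketch of what it might do, the program below splits one datagram into lines and each line into the nine tab-separated fields of the example log_format shown earlier in this section. It assumes that lines inside a datagram are newline-terminated, which this guide does not state explicitly; LogLine and parseDatagram are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

// LogLine mirrors the nine fields of the example log_format, in order.
// Hypothetical type; adjust it if your log_format differs.
type LogLine struct {
	Host, RemoteAddr, Method, URI, Status, BodyBytes, SourceTag, VIP, Scheme string
}

// parseDatagram splits one datagram into lines (assumed to be separated
// by "\n") and each line into tab-separated fields. Lines that do not
// have exactly nine fields, including the empty trailing line, are skipped.
func parseDatagram(payload []byte) []LogLine {
	var out []LogLine
	for _, line := range strings.Split(string(payload), "\n") {
		f := strings.Split(line, "\t")
		if len(f) != 9 {
			continue
		}
		out = append(out, LogLine{f[0], f[1], f[2], f[3], f[4], f[5], f[6], f[7], f[8]})
	}
	return out
}

func main() {
	sample := []byte("example.com\t198.51.100.7\tGET\t/\t200\t512\tmg1\t192.0.2.10\thttps\n")
	for _, l := range parseDatagram(sample) {
		fmt.Println(l.SourceTag, l.VIP, l.Status)
	}
}
```

Fields are kept as strings on purpose: converting status or byte counts to integers is left to the consumer, which can then decide how to treat malformed values.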

See FR-8.1, FR-8.2, FR-8.3, FR-8.4.

ipng_stats; (scrape handler)

Context: location.

Value: no argument. Placed on its own line inside a location block.

Default: not set.

Effect: turns the enclosing location into the module's scrape handler. No other content handler (proxy_pass, root, return, fastcgi_pass, ...) may be combined with ipng_stats; in the same location. The handler honors:

  • Accept: header — application/json for JSON, anything else for Prometheus text.
  • ?source_tag=<tag> — filter output to only counters whose source_tag dimension equals the tag. Exact match, case-sensitive.
  • ?vip=<address> — filter output to only counters whose vip dimension equals the canonicalized address.

Filters MAY be combined; their effect is the intersection.
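
A scrape client can combine the Accept header and both filters in one request. The following sketch only builds the request and does not send it; the endpoint URL http://127.0.0.1:8080/stats is an assumption made for illustration (the location path is whatever your config assigns), and newScrapeRequest is a hypothetical helper.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
)

// newScrapeRequest builds a GET request asking for JSON output,
// optionally filtered by source_tag and vip. Empty filters are omitted.
func newScrapeRequest(endpoint, sourceTag, vip string) (*http.Request, error) {
	u, err := url.Parse(endpoint)
	if err != nil {
		return nil, err
	}
	q := url.Values{}
	if sourceTag != "" {
		q.Set("source_tag", sourceTag)
	}
	if vip != "" {
		q.Set("vip", vip)
	}
	u.RawQuery = q.Encode() // Encode percent-escapes values and sorts keys
	req, err := http.NewRequest(http.MethodGet, u.String(), nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Accept", "application/json")
	return req, nil
}

func main() {
	// Assumed endpoint; replace with the location that holds ipng_stats;.
	req, err := newScrapeRequest("http://127.0.0.1:8080/stats", "mg1", "192.0.2.10")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(req.Method, req.URL.String(), req.Header.Get("Accept"))
}
```

Passing the request to http.DefaultClient.Do would perform the scrape; omitting the Accept header yields Prometheus text instead of JSON, as described in the list above.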

Security: the module does not ship authentication. Place an allow/deny ACL in the same location block (or its enclosing server) to control access (NFR-6.2).

See FR-3.1, FR-3.2, FR-3.3, FR-3.4, FR-3.5.

Metric names

For Prometheus, the module exports under the nginx_ipng_ prefix.

The code label is a class bucket — one of 1xx, 2xx, 3xx, 4xx, 5xx, or unknown (for codes outside [100, 599]). This keeps per-(source, vip) counter cardinality bounded at six lanes regardless of how many distinct three-digit responses nginx serves. Histogram series do not carry code — they aggregate across all classes for a given (source, vip). Operators who need a full per-three-digit-code breakdown should enable ipng_stats_logtail and derive it from the access-log stream off the hot path.
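
An off-host logtail consumer that wants its per-code counts to line up with the scrape output could apply the same bucketing to each status it sees. This is a minimal sketch of the rule as described above; statusClass is a hypothetical helper, not the module's implementation.

```go
package main

import "fmt"

// statusClass maps a three-digit status code to one of the six class
// lanes: 1xx, 2xx, 3xx, 4xx, 5xx, or unknown for codes outside [100, 599].
func statusClass(code int) string {
	if code < 100 || code > 599 {
		return "unknown"
	}
	return fmt.Sprintf("%dxx", code/100)
}

func main() {
	for _, c := range []int{200, 404, 503, 0, 600} {
		fmt.Println(c, statusClass(c))
	}
}
```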

| metric | type | labels | meaning |
|---|---|---|---|
| nginx_ipng_requests_total | counter | source_tag, vip, code | Request count per (source, vip, class). |
| nginx_ipng_bytes_in_total | counter | source_tag, vip, code | Request bytes received (request line + headers + body). |
| nginx_ipng_bytes_out_total | counter | source_tag, vip, code | Response bytes sent (status line + headers + body). |
| nginx_ipng_latency_total | counter | source_tag, vip, code | Sum of request durations, in seconds. Divide by _requests_total for mean latency per class. |
| nginx_ipng_request_duration_seconds_bucket | histogram bucket | source_tag, vip, le | Request duration histogram, aggregated across classes. |
| nginx_ipng_request_duration_seconds_sum | histogram sum | source_tag, vip | Sum of observed durations in seconds. |
| nginx_ipng_request_duration_seconds_count | histogram count | source_tag, vip | Count of observations. |
| nginx_ipng_upstream_response_seconds_bucket | histogram bucket | source_tag, vip, le | Upstream response time histogram. |
| nginx_ipng_upstream_response_seconds_sum | histogram sum | source_tag, vip | |
| nginx_ipng_upstream_response_seconds_count | histogram count | source_tag, vip | |
| nginx_ipng_bytes_in_bucket | histogram bucket | source_tag, vip, le | Request-size histogram (bytes). |
| nginx_ipng_bytes_in_sum | histogram sum | source_tag, vip | Sum of request bytes (equals bytes_in_total summed over classes). |
| nginx_ipng_bytes_in_count | histogram count | source_tag, vip | Observations. |
| nginx_ipng_bytes_out_bucket | histogram bucket | source_tag, vip, le | Response-size histogram (bytes). |
| nginx_ipng_bytes_out_sum | histogram sum | source_tag, vip | Sum of response bytes. |
| nginx_ipng_bytes_out_count | histogram count | source_tag, vip | Observations. |
| nginx_ipng_rate_1s | gauge | source_tag, vip | EWMA requests/sec, 1-second decay. |
| nginx_ipng_rate_10s | gauge | source_tag, vip | EWMA requests/sec, 10-second decay. |
| nginx_ipng_rate_60s | gauge | source_tag, vip | EWMA requests/sec, 60-second decay. |
| nginx_ipng_zone_bytes_used | gauge | | Shared-memory zone bytes currently allocated. |
| nginx_ipng_zone_bytes_total | gauge | | Shared-memory zone capacity in bytes. |
| nginx_ipng_zone_full_events_total | counter | | Number of key insertions dropped because the zone was full. |
| nginx_ipng_flushes_total | counter | worker | Number of per-worker flush ticks executed. |
| nginx_ipng_flush_duration_seconds | histogram | worker | Histogram of flush durations. |
| nginx_ipng_scrape_duration_seconds | histogram | | Histogram of scrape handler runtimes. |

See FR-2.*, FR-3.7.

JSON output shape

{
  "schema": 2,
  "records": [
    {
      "source_tag": "mg1",
      "vip": "192.0.2.10",
      "classes": {
        "2xx": { "requests": 12345, "bytes_in": 9876543, "bytes_out": 54321098,
                 "latency_ms": 87654, "upstream_latency_ms": 61234 },
        "4xx": { "requests": 17, "bytes_in": 2048, "bytes_out": 9216,
                 "latency_ms": 102, "upstream_latency_ms": 0 }
      },
      "request_duration_ms": {
        "sum": 87756, "count": 12362,
        "buckets": { "1": 10, "5": 40, "10": 120, "+Inf": 12362 }
      },
      "upstream_response_ms": { "sum": 61234, "count": 12345, "buckets": { "...": "..." } },
      "bytes_in":  { "count": 12362, "buckets": { "100": 200, "1000": 9000, "+Inf": 12362 } },
      "bytes_out": { "count": 12362, "buckets": { "...": "..." } }
    }
  ]
}

The top-level schema field is versioned — breaking changes bump it, additive changes don't. Schema 2 collapses status codes to class buckets and moves histograms out of the per-class records to a per-(source, vip) record. Consumers SHOULD check schema before parsing.
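
A consumer can enforce the schema check before touching any records. The sketch below decodes only the per-class counters and ignores the histogram fields; the struct names are hypothetical and the field tags follow the example document above, so treat it as a starting point, not a complete client.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// classCounters holds a subset of the per-class fields from the example.
type classCounters struct {
	Requests  uint64 `json:"requests"`
	BytesIn   uint64 `json:"bytes_in"`
	BytesOut  uint64 `json:"bytes_out"`
	LatencyMs uint64 `json:"latency_ms"`
}

type record struct {
	SourceTag string                   `json:"source_tag"`
	VIP       string                   `json:"vip"`
	Classes   map[string]classCounters `json:"classes"`
}

type scrape struct {
	Schema  int      `json:"schema"`
	Records []record `json:"records"`
}

// decodeScrape parses a JSON scrape body and rejects any schema other
// than 2, the version this sketch was written against.
func decodeScrape(body []byte) (*scrape, error) {
	var s scrape
	if err := json.Unmarshal(body, &s); err != nil {
		return nil, err
	}
	if s.Schema != 2 {
		return nil, fmt.Errorf("unsupported schema %d, want 2", s.Schema)
	}
	return &s, nil
}

func main() {
	body := []byte(`{"schema":2,"records":[{"source_tag":"mg1","vip":"192.0.2.10",
	  "classes":{"2xx":{"requests":12345,"bytes_in":9876543,"bytes_out":54321098,"latency_ms":87654}}}]}`)
	s, err := decodeScrape(body)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, r := range s.Records {
		for class, c := range r.Classes {
			fmt.Println(r.SourceTag, r.VIP, class, c.Requests)
		}
	}
}
```

Unknown JSON fields are ignored by encoding/json, so additive changes that do not bump schema keep decoding without modification.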

See FR-3.6.

Context summary

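| knob | http | server | location | listen |
|---|---|---|---|---|
| ipng_stats_zone | ✓ | | | |
| ipng_stats_flush_interval | ✓ | | | |
| ipng_stats_default_source | ✓ | | | |
| ipng_stats_buckets | ✓ | | | |
| ipng_stats_byte_buckets | ✓ | | | |
| ipng_stats_logtail | ✓ | | | |
| ipng_stats on\|off | ✓ | ✓ | ✓ | |
| ipng_stats; (handler) | | | ✓ | |
| device=<ifname> | | | | ✓ |
| ipng_source_tag=<tag> | | | | ✓ |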