<!-- SPDX-License-Identifier: Apache-2.0 -->

# nginx-ipng-stats-plugin — User Guide

This document walks an operator through installing the plugin, deploying it on a single nginx host serving traffic that arrives on distinct interfaces (GRE tunnels, VLANs, bonded links, or plain ethernet), verifying that counters are flowing, and hooking up the scrape endpoint to Prometheus and other consumers.

It covers (NFR-7.1):

1. Installing the Debian package.
2. Setting up interfaces for per-device attribution (GRE tunnel example).
3. Writing a minimal nginx configuration.
4. Verifying with `curl`.
5. Scraping from Prometheus.
6. Setting up a global logtail access log.
7. Integrating with scrape consumers.

For a directive-by-directive reference, read [`config-guide.md`](config-guide.md) alongside this guide.

## 1. Install the package

On Debian Trixie (and newer), the module is distributed as `libnginx-mod-http-ipng-stats`. The package depends on the stock `nginx` package and loads cleanly into it without recompiling nginx itself.

```
sudo apt install ./libnginx-mod-http-ipng-stats_*_amd64.deb
```

The package will:

- Drop `ngx_http_ipng_stats_module.so` into `/usr/lib/nginx/modules/`.
- Place a `load_module` stanza in `/etc/nginx/modules-available/50-mod-http-ipng-stats.conf`.
- Symlink it into `/etc/nginx/modules-enabled/` so nginx picks it up on the next reload.
- Run `nginx -t` and, if the test fails, remove the `modules-enabled` symlink and print a warning — so a broken upgrade never leaves you with an nginx that cannot start.

Confirm the module is loaded. A dynamically loaded module does not appear in `nginx -V`, which only lists compile-time configure arguments, so grep the full configuration dump instead:

```
nginx -T 2>&1 | grep ipng_stats
```
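
If the grep comes back empty, verify the artifacts from the list above landed where this guide expects them (paths as stated by the package description):

```
ls -l /usr/lib/nginx/modules/ngx_http_ipng_stats_module.so \
      /etc/nginx/modules-enabled/50-mod-http-ipng-stats.conf
```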

## 2. Set up interfaces for per-device attribution

The plugin attributes traffic by watching which interface the request came in on. It enables `IP_PKTINFO` / `IPV6_RECVPKTINFO` on each listening socket and reads the ingress `ifindex` per accepted connection, so the listening sockets remain plain wildcards and outgoing packets follow the normal routing table — this is what makes DSR / maglev deployments work, where the SYN arrives via a GRE tunnel but the SYN-ACK must leave via the default route. For the attribution itself to work, each traffic source that should be tracked separately MUST arrive on its own interface.

This works with any kind of Linux interface — GRE tunnels, VLANs, VXLANs, bonded links, or plain ethernet. This guide uses GRE tunnels as the example, but the module does not care about the interface type.

This guide doesn't prescribe a specific networking layer — use whatever your host already uses (`systemd-networkd`, Netplan, `/etc/network/interfaces`, or a hand-rolled script). The only hard requirements are:

- Each traffic source that should be separately attributed gets its own interface on the nginx host.
- Interfaces follow a consistent naming pattern. For GRE tunnels we recommend `gre-<tag>`, e.g. `gre-mg1`, `gre-mg2`.
- The VIPs are bound to a local dummy or loopback interface so the kernel accepts packets destined for them.

For example, with `systemd-networkd`, a GRE tunnel to a remote peer at `2001:db8::1` from this host at `2001:db8::100` looks like:

```
# /etc/systemd/network/10-gre-mg1.netdev
[NetDev]
Name=gre-mg1
Kind=ip6gre

[Tunnel]
Local=2001:db8::100
Remote=2001:db8::1
TTL=64
```

```
# /etc/systemd/network/10-gre-mg1.network
[Match]
Name=gre-mg1

[Network]
LinkLocalAddressing=no
```

Repeat for each additional tunnel (a scripted sketch follows below). A trimmed-down variant of this scheme is what IPng uses in production.
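
For more than a handful of peers, generating the unit files from a peer list keeps the naming consistent. A minimal sketch, assuming the `gre-<tag>` naming above; the peer map and the `Local=` address are illustrative placeholders for your own topology:

```bash
#!/bin/bash
# Run as root. Hypothetical peer map: tag -> remote tunnel endpoint.
declare -A peers=( [mg1]=2001:db8::1 [mg2]=2001:db8::2 )

for tag in "${!peers[@]}"; do
  cat > "/etc/systemd/network/10-gre-${tag}.netdev" <<EOF
[NetDev]
Name=gre-${tag}
Kind=ip6gre

[Tunnel]
Local=2001:db8::100
Remote=${peers[$tag]}
TTL=64
EOF
  cat > "/etc/systemd/network/10-gre-${tag}.network" <<EOF
[Match]
Name=gre-${tag}

[Network]
LinkLocalAddressing=no
EOF
done
networkctl reload   # have systemd-networkd pick up the new units
```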

Verify the interfaces exist and carry traffic:

```
ip -6 tunnel show | grep gre-mg
ip -6 -s link show gre-mg1
```
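
Attribution is keyed on the kernel ifindex behind each name (see the rescan notes under Troubleshooting). To see which ifindex an interface currently has, take the first field of the one-line `ip -o link` output:

```
ip -o link show gre-mg1 | cut -d: -f1
```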

## 3. Write the nginx configuration

The plugin needs three things in `nginx.conf`:

1. A shared-memory zone for counters (`ipng_stats_zone`).
2. One device-bound `listen` directive per attributed (interface, address family) pair.
3. A scrape location serving the `ipng_stats` handler.

A minimal working configuration looks like this:

```nginx
load_module modules/ngx_http_ipng_stats_module.so;

events {
    worker_connections 4096;
}

http {
    ipng_stats_zone ipng:4m;
    ipng_stats_flush_interval 1s;
    ipng_stats_default_source direct;

    # Attributed vhost. Wildcard listens below register one binding
    # per (device, family); all collapse to a single kernel socket
    # under the IP_PKTINFO attribution model.
    server {
        include /etc/nginx/ipng-stats/listens.conf;

        server_name _;
        root /var/www/html;
    }

    # Direct (un-attributed) traffic on a separate port — the listen has no
    # device=, so requests get the `ipng_stats_default_source` tag.
    server {
        listen 198.51.100.1:8081 default_server;
        listen [2001:db8::1]:8081 default_server;

        server_name _;
        root /var/www/html;
    }

    # A separate server block exposing the scrape endpoint on a locked-down port.
    server {
        listen 127.0.0.1:9113;
        listen [::1]:9113;

        location = /.well-known/ipng/statsz {
            ipng_stats;
            allow 127.0.0.1;
            allow ::1;
            allow 2001:db8::/48;  # your scrape consumers
            deny all;
        }
    }
}
```

And `/etc/nginx/ipng-stats/listens.conf` — the hand-maintained include file — is two lines per attributed interface (one per address family):

```nginx
listen 80 device=gre-mg1 ipng_source_tag=mg1;
listen [::]:80 device=gre-mg1 ipng_source_tag=mg1;
listen 80 device=gre-mg2 ipng_source_tag=mg2;
listen [::]:80 device=gre-mg2 ipng_source_tag=mg2;
listen 80 device=gre-mg3 ipng_source_tag=mg3;
listen [::]:80 device=gre-mg3 ipng_source_tag=mg3;
listen 80 device=gre-mg4 ipng_source_tag=mg4;
listen [::]:80 device=gre-mg4 ipng_source_tag=mg4;
```
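
The file is mechanical enough to generate rather than hand-edit. A sketch, assuming the `gre-mg<N>` naming used throughout this guide:

```bash
for i in 1 2 3 4; do
  printf 'listen 80 device=gre-mg%d ipng_source_tag=mg%d;\n' "$i" "$i"
  printf 'listen [::]:80 device=gre-mg%d ipng_source_tag=mg%d;\n' "$i" "$i"
done | sudo tee /etc/nginx/ipng-stats/listens.conf
```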

Test and reload:

```
sudo nginx -t
sudo nginx -s reload
```

If `nginx -t` complains about an unknown `listen` parameter (`device=` or `ipng_source_tag=`), the module isn't loaded — check step 1.

### Why wildcard listens?

You do not need to enumerate VIPs in `listen`. A wildcard `listen 80 device=gre-mg1 ipng_source_tag=mg1;` accepts any local address served through the `gre-mg1` interface, and nginx routes per-request to the right vhost by `server_name` / `Host:` header. Adding a new VIP is a `server_name` change; adding a new interface is an append to `listens.conf`.

### Sharing a single port across address families and devices

Under the `IP_PKTINFO` attribution model, all listens at a given sockaddr collapse to a single wildcard kernel socket at runtime — the kernel stamps every accepted connection with its ingress ifindex, and the module looks that up in the table of `device=` bindings registered by the listen wrapper. Multiple device-tagged wildcard listens on port 80 are therefore not "multiple sockets"; they're one wildcard socket plus N entries in the attribution table.

A device can reuse one tag across address families or split into per-family tags — whichever reads better in the scrape output:

```nginx
listen 80 device=gre-mg1 ipng_source_tag=mg1;
listen [::]:80 device=gre-mg1 ipng_source_tag=mg1;     # same tag across families
listen 80 device=gre-mg2 ipng_source_tag=mg2-v4;
listen [::]:80 device=gre-mg2 ipng_source_tag=mg2-v6;  # per-family tags
```

A plain `listen 80;` can sit alongside device-tagged listens in the same server block; the wrapper treats the first occurrence at a given `(server, sockaddr)` pair as the one that registers the kernel socket and lets subsequent device-tagged siblings register bindings without tripping nginx's duplicate-listen check. Traffic arriving on an interface that has no binding falls back to `ipng_stats_default_source` (`direct` by default). Keeping "direct" traffic on its own port — e.g. `listen 198.51.100.1:8081;` — remains a fine pattern when you want a hard split, but it's no longer required.

### Shared includes with `reuseport` (or other socket-level options)

Socket-level `listen` options — `reuseport`, `bind`, `backlog=`, `rcvbuf=`, `sndbuf=`, `setfib=`, `fastopen=`, `accept_filter=`, `deferred`, `ipv6only=`, `so_keepalive=` — belong to the one kernel socket that backs a given sockaddr, not to a particular `server { ... }` block. Stock nginx enforces this by accepting them on at most the *first* listen per sockaddr and emitting `duplicate listen options for <addr>` on any subsequent repeat. That rule collides with the common deployment pattern of a single `listens.conf` included from every vhost, because each vhost's `include` re-submits the same options.

The wrapper resolves this transparently. When a sockaddr recurs under a different `server` block than the one that first registered it, the wrapper strips socket-level options from the incoming `cf->args` before delegating to nginx's core listen handler. The first `server` block owns the options on the kernel socket (including `reuseport`, which triggers per-worker socket cloning); later blocks merge cleanly via `ngx_http_add_server` and inherit the same socket. The wrapper logs one `[notice] ipng_stats: stripped socket options from duplicate listen on <addr>` per stripped listen — informational, not an error. So this include works unchanged across as many vhosts as you like:

```nginx
listen 443 ssl reuseport device=gre-mg1 ipng_source_tag=mg1;
listen [::]:443 ssl reuseport device=gre-mg1 ipng_source_tag=mg1;
```

`reuseport` noticeably helps worker load-balancing on busy hosts: without it, a single shared listening socket forces workers to compete for accepts, and traffic routinely concentrates on one or two workers. HTTP/2 and long-lived keepalive connections can still skew CPU toward whichever worker holds a few heavy clients — `reuseport` does not reshuffle existing connections — but new-connection distribution across workers becomes kernel-hashed, not first-ready-wins.
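
To confirm that per-worker socket cloning took effect, count the listening sockets; with `reuseport` and N workers you should see N entries for the sockaddr (this is standard `ss` behavior, not module output):

```
ss -ltn 'sport = :443'
```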

## 4. Verify with curl

Generate some traffic (or wait for real traffic), then scrape the endpoint locally:

```
curl -s http://127.0.0.1:9113/.well-known/ipng/statsz
```

Default output is Prometheus text format:

```
# HELP nginx_ipng_requests_total Total HTTP requests.
# TYPE nginx_ipng_requests_total counter
nginx_ipng_requests_total{source_tag="mg1",vip="192.0.2.10",code="2xx"} 12345
nginx_ipng_requests_total{source_tag="mg1",vip="192.0.2.10",code="4xx"} 17
nginx_ipng_requests_total{source_tag="mg2",vip="192.0.2.10",code="2xx"} 9876
nginx_ipng_requests_total{source_tag="direct",vip="192.0.2.10",code="2xx"} 42
# HELP nginx_ipng_bytes_in_total Request bytes received.
# TYPE nginx_ipng_bytes_in_total counter
nginx_ipng_bytes_in_total{source_tag="mg1",vip="192.0.2.10",code="2xx"} 9876543
# ... and so on

# Histogram series (request_duration, upstream_response, bytes_in, bytes_out)
# do NOT carry a `code` label — they aggregate across classes per (source, vip).
nginx_ipng_request_duration_seconds_bucket{source_tag="mg1",vip="192.0.2.10",le="0.050"} 11200

# In-flight gauges per (source, vip). These are point-in-time request counts,
# not rates: `active` = requests observed at POST_READ that haven't finalized
# yet; `reading` = in pre-response phases (rewrite/access/content); `writing`
# = past header send. reading + writing = active at any instant.
nginx_ipng_active{source_tag="mg1",vip="192.0.2.10"} 3
nginx_ipng_reading{source_tag="mg1",vip="192.0.2.10"} 1
nginx_ipng_writing{source_tag="mg1",vip="192.0.2.10"} 2
```

For JSON output instead, set the `Accept` header:

```
curl -s -H 'Accept: application/json' http://127.0.0.1:9113/.well-known/ipng/statsz | jq .
```

To filter server-side to a single source tag:

```
curl -s 'http://127.0.0.1:9113/.well-known/ipng/statsz?source_tag=mg1'
curl -s 'http://127.0.0.1:9113/.well-known/ipng/statsz?source_tag=mg1&vip=192.0.2.10'
```
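
A quick end-to-end smoke test is to snapshot a counter, issue one request, and confirm the counter moved. A sketch, assuming the example VIP `192.0.2.10` from the output above answers on `/`:

```bash
count_2xx() {
  curl -s http://127.0.0.1:9113/.well-known/ipng/statsz \
    | awk '/^nginx_ipng_requests_total.*code="2xx"/ { sum += $2 } END { print sum+0 }'
}
before=$(count_2xx)
curl -s -o /dev/null http://192.0.2.10/   # one request against the VIP
after=$(count_2xx)
echo "2xx delta: $((after - before))"     # expect at least 1
```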

If you see `source_tag="direct"` entries with non-zero counts and you expected all traffic to come in via attributed interfaces, something is routing around them — typically an interface that isn't in `listens.conf`, or an interface that's down.

## 5. Scrape from Prometheus

The same endpoint serves Prometheus text by default. Add a scrape job:

```yaml
# /etc/prometheus/prometheus.yml
scrape_configs:
  - job_name: nginx-ipng
    scrape_interval: 15s
    static_configs:
      - targets:
          - 'nginx-backend-1.example.com:9113'
          - 'nginx-backend-2.example.com:9113'
    metrics_path: /.well-known/ipng/statsz
```

You'll want the `allow` rules in each backend's scrape server block to admit your Prometheus host, or front the plugin with a TLS-terminating reverse proxy. The module does not ship its own auth; the nginx `allow`/`deny` ACL is your access control.

Typical PromQL queries:

```
# Requests per second per source, per VIP:
sum by (source_tag, vip) (rate(nginx_ipng_requests_total[1m]))

# 5xx error rate per VIP, aggregated across all sources:
sum by (vip) (rate(nginx_ipng_requests_total{code="5xx"}[5m]))
/
sum by (vip) (rate(nginx_ipng_requests_total[5m]))

# p95 request duration per (source_tag, vip):
histogram_quantile(0.95,
  sum by (source_tag, vip, le) (rate(nginx_ipng_request_duration_seconds_bucket[5m])))

# In-flight concurrency per (source_tag, vip). Gauges are exported as-is;
# use max_over_time for load-shedding alerts or avg_over_time for capacity
# planning:
max_over_time(nginx_ipng_active[5m])
```
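
These queries drop straight into alerting rules as well. An illustrative rule file — the path and the 5% threshold are placeholders to tune, not recommendations:

```yaml
# /etc/prometheus/rules/nginx-ipng.yml (hypothetical path)
groups:
  - name: nginx-ipng
    rules:
      - alert: NginxIpng5xxRateHigh
        expr: |
          sum by (vip) (rate(nginx_ipng_requests_total{code="5xx"}[5m]))
            / sum by (vip) (rate(nginx_ipng_requests_total[5m])) > 0.05
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: '5xx ratio above 5% on VIP {{ $labels.vip }}'
```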

## 6. Set up a global logtail access log

Operators who want a single unified access log covering all traffic — regardless of which `server` block handled the request — normally have to repeat `access_log` in every `server {}` block or rely on a catch-all virtual host. The `ipng_stats_logtail` directive removes that requirement: one line at the `http` level registers a global log-phase writer that fires unconditionally for every request (FR-8.1).

The logtail is also the recommended escape hatch when you need richer cardinality than the stats zone exposes. The Prometheus counters deliberately collapse HTTP status codes into six class lanes (`1xx`..`5xx`/`unknown`) to keep scrape size bounded. Operators who need per-three-digit-code, per-path, per-user-agent, or any other high-cardinality breakdown should ship the logtail stream to an off-path analytics receiver and compute those views there — that work happens in a different process and never touches the nginx hot path.

The logtail sends each buffer flush as a single UDP datagram to a `host:port`. Zero disk I/O, no backpressure, no blocking if the receiver is down. This makes it ideal for fire-and-forget analytics pipelines where delivery guarantees are unnecessary and disk writes would add unwanted I/O pressure. For file-based access logging, use nginx's built-in `access_log` directive.

### Define the log format

Add a `log_format` declaration inside the `http { ... }` block, **before** the `ipng_stats_logtail` directive that references it:

```nginx
log_format ipng_stats_logtail '$host\t$remote_addr\t$request_method\t$request_uri\t'
                              '$status\t$body_bytes_sent\t'
                              '$ipng_source_tag\t$server_addr\t$scheme';
```

Any nginx variable is usable here, including `$ipng_source_tag` (the device attribution tag, FR-6.1), `$server_addr` (the VIP that received the request), and `$scheme` (`http` or `https` — useful since `$server_addr` alone doesn't distinguish ports).

### Configuration

```nginx
http {
    ipng_stats_zone ipng:4m;

    log_format ipng_stats_logtail '$host\t$remote_addr\t$request_method\t$request_uri\t'
                                  '$status\t$body_bytes_sent\t'
                                  '$ipng_source_tag\t$server_addr\t$scheme';

    ipng_stats_logtail ipng_stats_logtail udp://127.0.0.1:9514 buffer=16k flush=1s;

    server { ... }
}
```

- **`ipng_stats_logtail`** (first argument) — the `log_format` name.
- **`udp://127.0.0.1:9514`** — destination as a `udp://host:port` URI. `host` must be a literal IPv4 address (no hostnames, no IPv6 in v0.1).
- **`buffer=16k`** — per-worker write buffer. Lines are held in memory until the buffer fills, the flush timer fires, or the worker exits. Default is `64k`; minimum is `1k` (FR-8.3).
- **`flush=1s`** — maximum age of buffered data before it is sent. Default is `1s`; minimum is `100ms` (FR-8.3).

Each buffer flush becomes a single `sendto()` on a per-worker `SOCK_DGRAM` socket. When the flush timer fires (or the buffer fills), the entire buffered payload is sent as one datagram — no file open, no `write()`, no `fsync()`. If no receiver is listening, the kernel drops the datagram silently and the worker carries on. This is by design: the logtail exists for non-critical analytics pipes where lost datagrams are acceptable and disk I/O is not.

**Constraints (v0.1):**

- `host` must be a literal IPv4 address. Hostnames and IPv6 are not supported yet.
- Large `buffer=` values produce large datagrams. On the loopback interface the practical ceiling is ~64 KB, well above typical configured buffer sizes. On routed paths, path MTU applies.
- There is no acknowledgment, retry, or sequence number. If the receiver is down, the data is gone.

### Filtering with `if=`

High-frequency requests like health checks can be suppressed from the logtail stream using the `if=$variable` parameter. Use a `map` block to define which requests should be logged:

```nginx
map $request_uri $logtail_enabled {
    ~^/\.well-known/ipng/healthz  0;
    default                       1;
}

ipng_stats_logtail ipng_stats_logtail udp://127.0.0.1:9514 buffer=16k flush=1s if=$logtail_enabled;
```

Filtered requests are still counted by the stats module — only the logtail output is suppressed. The condition is checked before the log format is rendered, so filtered requests have zero logtail overhead. Multiple conditions can be combined using nested `map` blocks.
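
One hypothetical combination — suppress both the health-check path above and a probe user agent; the intermediate variable names are illustrative:

```nginx
map $request_uri $logtail_uri_ok {
    ~^/\.well-known/ipng/healthz  0;
    default                       1;
}

# Feed the first map's result plus a second condition into the final switch.
map "$logtail_uri_ok:$http_user_agent" $logtail_enabled {
    "~^0:"            0;  # suppressed by URI
    "~^1:kube-probe"  0;  # suppressed by user agent
    default           1;
}
```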

See [`config-guide.md`](config-guide.md#conditional-logging-with-if) for the full semantics.

**Starting a receiver** is trivial:

```bash
# Quick one-shot inspection:
nc -u -l 127.0.0.1 9514
```
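
For a longer-running capture than `nc` gives you, any UDP listener works; one option among many, using socat (assuming it is installed):

```bash
# Keep receiving datagrams and append them to a file:
socat -u UDP-RECV:9514,bind=127.0.0.1 - >> /tmp/logtail.tsv
```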

For a production-ready logtail consumer, see [`nginx-logtail`](https://git.ipng.ch/ipng/nginx-logtail), which receives the UDP datagram stream and processes it into structured log output.

A typical received log line (with the format above, tab-separated) looks like:

```
example.com 203.0.113.42 GET /index.html 200 4321 mg1 192.0.2.10 https
```

The `mg1` field comes from `$ipng_source_tag` and `https` from `$scheme` — free per-device attribution and protocol visibility in every log line.

### Why this complements per-server `access_log`

A conventional nginx access log requires the operator to repeat `access_log /path/to/file logtail;` in every `server {}` block that should be captured. This is error-prone: adding a new vhost and forgetting the directive means that vhost's traffic is silently absent from the log. `ipng_stats_logtail` is installed at the module's log-phase hook, which nginx calls for every request with no per-server configuration required.

See [`config-guide.md`](config-guide.md#ipng_stats_logtail-format_name-udphostport-buffersize-flushduration) for the full directive reference and FR-8 for the requirements behind this feature.

## 7. Integrate with scrape consumers

The scrape endpoint (`ipng_stats;`) serves both Prometheus text and JSON from a single location. Any HTTP client that can issue a GET request can consume it. Two integration patterns are common:

### Prometheus

See section 5 above. Prometheus scrapes the endpoint at a configured interval and stores the time series. This is the simplest integration and covers most monitoring and alerting use cases.

### Custom consumers

The `?source_tag=<tag>` query parameter lets a consumer filter the scrape response to only the traffic attributed to a specific source. This is useful when multiple consumers share the same nginx backends — each consumer scrapes with its own tag and never sees the others' traffic.

The JSON output (`Accept: application/json`) includes a top-level `schema` field for versioning, making it straightforward to parse from any language.
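
A minimal consumer sketch with `curl` and `jq`. It checks only the documented top-level `schema` field before handing the payload on; the rest of the field layout depends on the schema version, so nothing else is assumed here:

```bash
resp=$(curl -s -H 'Accept: application/json' \
  'http://127.0.0.1:9113/.well-known/ipng/statsz?source_tag=mg1')
echo "$resp" | jq -e '.schema' >/dev/null \
  || { echo 'unexpected payload: no schema field' >&2; exit 1; }
echo "$resp" | jq .schema   # dispatch on this before parsing further
```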

Once wired, a consumer can derive from the scrape data:

- Live QPS per backend (from the EWMA gauges).
- Status-class mix per backend (the six-lane `1xx`..`5xx`/`unknown` counter families). Full three-digit codes are not exported by the scrape endpoint; route the logtail stream off-host and aggregate there if you need per-code breakdowns.
- p50/p95 latency per backend (from the duration histogram, aggregated across classes).
- Traffic volume per backend (from the bytes counters and the new bytes histograms).

For an example of this pattern in a GRE tunnel fleet, see [`vpp-maglev`](https://git.ipng.ch/ipng/vpp-maglev), whose frontend scrapes each nginx backend filtered by source tag to show per-backend traffic alongside health state.

## Troubleshooting

**`nginx -t` reports "unknown listen parameter: device=" or "unknown listen parameter: ipng_source_tag=".** The module isn't loaded. Check `/etc/nginx/modules-enabled/` for the `50-mod-http-ipng-stats.conf` symlink and re-run `nginx -t`.

**All traffic is attributed to `direct` even though device-bound interfaces exist.** The interface names don't match the `device=` values in `listens.conf`, or the interfaces aren't up. Run `ip -br link` and confirm the interface names match.

**Counters reset after every reload.** They should survive `nginx -s reload`. If they don't, check that the `ipng_stats_zone` name in `nginx.conf` is stable across reloads — renaming the zone forces a new shared-memory segment.

**`nginx_ipng_zone_full_events_total` is non-zero.** The shared-memory zone is too small for your VIP count. Increase the size in `ipng_stats_zone ipng:<size>` (the default 4 MB is enough for hundreds of VIPs — the code dimension is bucketed to six classes, so one 4 MB zone holds a very large deployment).

**`nginx_ipng_ifindex_misses_total` is climbing.** A connection arrived on an interface whose ifindex isn't in the binding table. Two common causes: (a) a configured interface was torn down and recreated (e.g. a GRE tunnel reprovision) and now has a fresh ifindex — the per-worker rescan timer (`ipng_stats_rescan_interval`, default `60s`) will pick it up on the next tick; (b) traffic legitimately arrives on an interface that no `device=` binding claims — either add the binding or accept that it lands under the default source. If the counter keeps rising between rescans, shorten `ipng_stats_rescan_interval` or trigger `nginx -s reload` to re-resolve immediately.

**`curl http://127.0.0.1:9113/.well-known/ipng/statsz` returns "403 Forbidden".** The `allow`/`deny` ACL is blocking your source address. Either add yourself to the allow list or scrape from a host already on it.

## Where to go next

- [`config-guide.md`](config-guide.md) — every directive and `listen` parameter with contexts, allowed values, and defaults.
- [`design.md`](design.md) — full design document, including the attribution model, hot-path cost analysis, and failure modes.