PRE-RELEASE v0.7.0
Self-heal device= → ifindex attribution and expose plugin meta counters in the scrape.

ipng_stats_rescan_interval (default 60s, 0 to disable) runs a per-worker timer that re-resolves every binding via if_nametoindex, so interface teardown/recreate (e.g. a GRE tunnel reprovision) picks up the new ifindex without requiring an nginx reload. nginx_ipng_ifindex_misses_total increments whenever a cmsg-reported ingress ifindex doesn't match any binding, making stale mappings observable.

Also expose the existing zone_full_events and flushes_total shared-memory counters, which were tracked but never emitted. JSON output gains a top-level "meta" object; schema stays at 2 (additive change).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -81,6 +81,22 @@ is sized so that a scrape interval of 5–15 s sees effectively no lag.
See FR-4.2, FR-5.2.
### `ipng_stats_rescan_interval <duration>`
**Context:** `http`.
**Value:** an nginx duration string (e.g. `30s`, `60s`, `5m`) or `0` to disable.
**Default:** `60s`.
**Minimum:** `1s` (when non-zero).
**Effect:** sets the cadence of a per-worker timer that re-resolves every `device=<ifname>` binding via `if_nametoindex(3)`. This self-heals the attribution table when a configured interface is torn down and recreated (e.g. a GRE tunnel reprovision): it gets a fresh kernel ifindex, which the next rescan picks up. Between the kernel change and the next tick, arriving traffic falls through to the default source and increments `nginx_ipng_ifindex_misses_total`; watch that counter to size this interval. Set to `0` to disable and rely solely on `nginx -s reload` (which always re-runs `if_nametoindex` for every binding in the new cycle).
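
For illustration, one rescan tick reduces to a loop over the binding table. The following is a minimal sketch, not the module's code: the `binding` struct, `rescan_bindings` helper, and the toy `main` resolving `lo` are all hypothetical.

```c
#include <net/if.h>    /* if_nametoindex(3) */
#include <stddef.h>
#include <stdio.h>

/* Hypothetical binding record: one per `device=<ifname>` configuration. */
struct binding {
    const char  *ifname;   /* configured interface name, e.g. "gre-cust1" */
    unsigned int ifindex;  /* kernel ifindex the attribution table keys on */
};

/* One rescan tick: re-resolve every binding by name. An interface that was
 * torn down and recreated keeps its name but gets a fresh kernel ifindex,
 * so the lookup result changes and the table self-heals. A return of 0
 * means the name currently resolves to nothing (interface down or absent);
 * the old index is kept so a later tick can still repair it. */
void rescan_bindings(struct binding *b, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        unsigned int idx = if_nametoindex(b[i].ifname);
        if (idx != 0 && idx != b[i].ifindex) {
            b[i].ifindex = idx;  /* pick up the new kernel ifindex */
        }
    }
}

int main(void)
{
    struct binding b[] = { { "lo", 0 } };
    rescan_bindings(b, 1);
    printf("lo -> ifindex %u\n", b[0].ifindex);
    return 0;
}
```

Resolving by name on every tick is what makes the repair automatic: the name is the stable configuration key, and the stored ifindex is only a cache of the kernel's current answer.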
### `ipng_stats_default_source <tag>`
**Context:** `http`.
@@ -276,7 +292,8 @@ per-three-digit-code breakdown should enable `ipng_stats_logtail` and derive it
| `nginx_ipng_zone_bytes_used` | gauge | — | Shared-memory zone bytes currently allocated. |
| `nginx_ipng_zone_bytes_total` | gauge | — | Shared-memory zone capacity in bytes. |
| `nginx_ipng_zone_full_events_total` | counter | — | Number of key insertions dropped because the zone was full. |
| `nginx_ipng_flushes_total` | counter | — | Per-worker flushes into the shared zone, summed across workers. |
| `nginx_ipng_ifindex_misses_total` | counter | — | Connections whose ingress ifindex did not match any configured `device=` binding. |
| `nginx_ipng_flush_duration_seconds` | histogram | `worker` | Histogram of flush durations. |
| `nginx_ipng_scrape_duration_seconds` | histogram | — | Histogram of scrape handler runtimes. |
@@ -287,6 +304,11 @@ See FR-2.*, FR-3.7.
```json
{
  "schema": 2,
  "meta": {
    "ifindex_misses": 0,
    "zone_full_events": 0,
    "flushes_total": 1234
  },
  "records": [
    {
      "source_tag": "mg1",
```
@@ -321,6 +343,7 @@ See FR-3.6.
| --- | --- | --- | --- | --- |
| `ipng_stats_zone` | ✅ | — | — | — |
| `ipng_stats_flush_interval` | ✅ | — | — | — |
| `ipng_stats_rescan_interval` | ✅ | — | — | — |
| `ipng_stats_default_source` | ✅ | — | — | — |
| `ipng_stats_buckets` | ✅ | — | — | — |
| `ipng_stats_byte_buckets` | ✅ | — | — | — |
@@ -434,6 +434,13 @@ values in `listens.conf`, or the interfaces aren't up. Run `ip -br link` and con
`ipng_stats_zone ipng:<size>` (default 4 MB is enough for ~hundreds of VIPs; the code dimension is bucketed to six classes, so one 4 MB zone holds a very large deployment).
**`nginx_ipng_ifindex_misses_total` is climbing.** A connection arrived on an interface whose ifindex isn't in the binding table. Two common causes: (a) a configured interface was torn down and recreated (e.g. a GRE tunnel reprovision) and now has a fresh ifindex; the per-worker rescan timer (`ipng_stats_rescan_interval`, default `60s`) will pick it up on the next tick. (b) Traffic legitimately arrives on an interface that no `device=` binding claims; either add the binding or accept that it lands under the default source. If the counter keeps rising between rescans, shorten `ipng_stats_rescan_interval` or trigger `nginx -s reload` to re-resolve immediately.
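
For reference, a "cmsg-reported ingress ifindex" is the standard `IP_PKTINFO` ancillary datum. The sketch below is a generic illustration of that mechanism on a plain UDP socket, not the module's own plumbing; port `9999` and the `ingress_ifindex` helper are made up for the example.

```c
#define _GNU_SOURCE
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Walk the control messages of a received datagram and pull out the
 * kernel ifindex it arrived on. Returns 0 if no pktinfo cmsg is present. */
static unsigned int ingress_ifindex(struct msghdr *msg)
{
    for (struct cmsghdr *c = CMSG_FIRSTHDR(msg); c; c = CMSG_NXTHDR(msg, c)) {
        if (c->cmsg_level == IPPROTO_IP && c->cmsg_type == IP_PKTINFO) {
            struct in_pktinfo pi;
            memcpy(&pi, CMSG_DATA(c), sizeof(pi));
            return (unsigned int) pi.ipi_ifindex;
        }
    }
    return 0;
}

int main(void)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    int on = 1;
    struct sockaddr_in sa = { .sin_family = AF_INET, .sin_port = htons(9999) };
    char data[2048], cbuf[CMSG_SPACE(sizeof(struct in_pktinfo))];
    struct iovec iov = { .iov_base = data, .iov_len = sizeof(data) };
    struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1,
                          .msg_control = cbuf, .msg_controllen = sizeof(cbuf) };

    /* Ask the kernel to attach pktinfo (including ifindex) to each datagram. */
    setsockopt(fd, IPPROTO_IP, IP_PKTINFO, &on, sizeof(on));
    bind(fd, (struct sockaddr *) &sa, sizeof(sa));
    if (recvmsg(fd, &msg, 0) >= 0)   /* blocks until one datagram arrives */
        printf("ingress ifindex: %u\n", ingress_ifindex(&msg));
    return 0;
}
```

An index recovered this way that matches no `device=` binding is what bumps `nginx_ipng_ifindex_misses_total`.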
**`curl http://127.0.0.1:9113/.well-known/ipng/statsz` returns "403 Forbidden".** The `allow`/`deny` ACL is blocking your source address. Either add yourself or scrape from a host already in the allow list.