Commit Graph

23 Commits

Author SHA1 Message Date
Pim van Pelt
d501e3c2c3 Merge remote-tracking branch 'origin/main'
# Conflicts:
#	Makefile
#	debian/changelog
#	tests/01-module/01-e2e.robot
2026-04-23 09:28:56 +02:00
Pim van Pelt
4b1b412fb7 PRE-RELEASE v0.8.2 2026-04-23 09:27:53 +02:00
Pim van pelt
db551cfa08 PRE-RELEASE v0.8.1
Drop the flaky "Shared-listen-include across multiple server blocks"
robot test. Its final assertion looked for the
"ipng_stats: stripped socket options from duplicate listen" NOTICE
in `nginx -t 2>&1`, but that message is emitted via
ngx_conf_log_error at NGX_LOG_NOTICE and lands in the configured
error_log destination (the lab routes it into docker logs) rather
than the subprocess's stderr. The stripping behaviour itself still
works — it just isn't observable from `nginx -t` output in this
harness.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 10:10:35 +02:00
Pim van pelt
2b7a2c136a PRE-RELEASE v0.8.0
Bundles the discard-body accounting fix for nginx_ipng_bytes_in_total
and the robot-test/install-deps iteration-speed work.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 10:02:12 +02:00
Pim van pelt
05d405aba5 Count discarded POST bodies against bytes_in
Counter coverage for nginx_ipng_bytes_in_total was missing most of
the wire for rate-limited POST workloads. Observed on a CT-log-style
box: the plugin was reporting ~300 KB/s inbound while btop showed
6+ MB/s on the NIC, a ~20x gap. Root cause is in nginx itself: when
a request is rejected before a handler reads the body (limit_req
403, auth_request denial, 413 on fixed Content-Length, early
`return 4xx;`), nginx routes the body through
ngx_http_discard_request_body / HTTP/2 skip_data. Both paths drop
the bytes without ever incrementing r->request_length, so
$request_length (and therefore the plugin's bytes_in counter)
reflects only the request line and headers.

Log handler now adds r->headers_in.content_length_n to bin_sz when
r->discard_body is set and the client advertised a Content-Length.
That's a tight upper bound on the body bytes actually received —
abusive clients like the CT-log hammer case typically send the
full declared body before nginx's RST propagates. Normal 200 POSTs
are unchanged (they go through the buffered body path, which does
update r->request_length, and r->discard_body stays 0).

docs/nginx-issues.md catalogs the full analysis — which nginx paths
update r->request_length and which don't, how 413 differs between
fixed-CL and chunked and HTTP/2, why r->connection->sent is safe to
accumulate (it's reset per-request on keepalive and per-stream on
HTTP/2), and what the residual gap looks like after this fix
(HTTP/2 framing overhead, TLS/TCP, chunked requests with no
Content-Length header). Option B (per-connection recv wrapping) is
sketched there for later if the residual matters.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 09:57:45 +02:00
Pim van pelt
96f77baacd Make robot-test reuse artifacts and iterate quickly
Pull the .deb + ASan-nginx rebuild out of `make robot-test` — a full
dpkg-buildpackage + nginx recompile before every test run was turning
a 15-second test loop into a multi-minute one, which hurts when
iterating on a flaky suite. robot-test now fails fast with an
actionable message if either artifact is missing:

  Bootstrap once:  make pkg-deb build-asan
  Then iterate:    make robot-test       # reuses both

install-deps grew to cover what a truly minimal Debian box needs —
`build-essential`, `ca-certificates`, and an explicit check that
`deb-src` is enabled (required by `apt source nginx`, which both
fetch-nginx-src and build-asan rely on). `nginx-dev` transitively
brings in the nginx build-deps (libpcre2-dev, libssl-dev, libxslt1-dev,
libgeoip-dev, libperl-dev, libexpat-dev, libgd-dev, zlib1g-dev,
debhelper-compat, po-debconf) so those stay off the explicit list.

debian/rules' override_dh_clean now pre-clears
build/nginx-asan/{fastcgi,proxy,scgi,uwsgi,client_body}_temp before
running dh_clean. Those dirs get chowned to "nobody" when the 02-asan
robot suite bind-mounts build/nginx-asan/ RW into its container and
nginx master startup creates them — subsequent pkg-deb runs were
dying with EACCES from dh_clean's find traversal. rm -rf only needs
write access to the parent (which we have), so this is safe.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 09:56:53 +02:00
Pim van Pelt
7ed77f5b22 Strip socket options on cross-cscf repeat listens (v0.7.2)
Make the shared-listen-include pattern work with `reuseport` and the
other socket-level listen options. Nginx core enforces at-most-once
per sockaddr on options that set lsopt.set=1 (reuseport, bind,
backlog=, rcvbuf=, sndbuf=, setfib=, fastopen=, accept_filter=,
deferred, ipv6only=, so_keepalive=) and emits "duplicate listen
options for <addr>" otherwise. That rule collides with a single
listens.conf included from every vhost — each vhost's include
re-submits the same options.

The listen wrapper now detects the cross-cscf case, strips those
options from cf->args before delegating to the core handler, and
logs one notice per stripped listen. The first cscf owns the
options on the kernel socket; later cscfs merge cleanly via
ngx_http_add_server. Protocol-level flags (ssl, http2, quic,
proxy_protocol) pass through untouched since nginx OR-merges those
across cscfs.

This unblocks `reuseport` for deployments that want better
new-connection spread across workers.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 16:23:58 +02:00
Pim van Pelt
badb684431 Allow plain and device-tagged listens to share a sockaddr (v0.7.1)
The previous wrapper skipped nginx's duplicate-listen check only
for listens that carried device=, so a `listen 80;` next to a
`listen 80 device=eth0 ...;` in the same server block was
rejected at config time. Under SO_BINDTODEVICE that restriction
tracked a real kernel constraint (device-tagged listens created
separate sockets, a bare listen alongside them was genuinely
ambiguous). Under the IP_PKTINFO model introduced in 450391a
the constraint no longer exists — all same-sockaddr listens
collapse to one wildcard kernel socket and attribution is a
per-connection cmsg readback — but the wrapper kept enforcing
the old rule by accident.

Extend the (cscf, sockaddr) dedup in the listen wrapper to
cover plain listens too: the first occurrence at a given
(server, sockaddr) pair calls nginx's handler and registers the
kernel socket, and every subsequent sibling — plain or
device-tagged — is accepted without tripping nginx's
duplicate-listen check. Device-tagged siblings additionally
push a binding into the attribution table as before; plain
siblings contribute only the seen-list entry. No code path
exercised by the existing 22 e2e tests changes behavior.

Update FR-1.5, the user-guide "shared port" section, the
module's top-of-function comments, and the test nginx.conf
comment to describe the relaxed rule. Bump VERSION and add a
debian/changelog entry for 0.7.1.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 09:17:44 +02:00
Pim van Pelt
8e0b1cdde9 PRE-RELEASE v0.7.0
Self-heal device= → ifindex attribution and expose plugin meta
counters in the scrape.

ipng_stats_rescan_interval (default 60s, 0 to disable) runs a
per-worker timer that re-resolves every binding via if_nametoindex,
so interface teardown/recreate (e.g. GRE tunnel reprovision) picks
up the new ifindex without requiring an nginx reload.

nginx_ipng_ifindex_misses_total increments whenever a cmsg-reported
ingress ifindex doesn't match any binding — making stale mappings
observable. Also expose the existing zone_full_events and
flushes_total shared-memory counters, which were tracked but never
emitted. JSON output gains a top-level "meta" object; schema stays
at 2 (additive change).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 19:37:15 +02:00
Pim van Pelt
59f3deef66 PRE-RELEASE v0.6.0
Describe the ipng_stats_logtail UDP feature in debian/control alongside
the per-VIP / per-device counter description, so the package metadata
reflects what the module actually ships.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 15:31:31 +02:00
Pim van Pelt
055cf9f830 Update docs and header comments for the IP_PKTINFO attribution model
The SO_BINDTODEVICE → IP_PKTINFO switch in the previous commit
was a semantic change: the module no longer touches outgoing
routing at all, and several places in the docs and the module's
top-of-file comment still described the old mechanism.

- README.md and debian/control now describe attribution as
  reading the ingress ifindex per connection from the kernel's
  IP_PKTINFO / IPV6_PKTINFO cmsg, and explicitly call out that
  the DSR / maglev return-path constraint is what makes the
  change necessary.
- docs/design.md FR-1.1 / FR-1.5 / FR-1.6 are rewritten to
  forbid SO_BINDTODEVICE and to describe the cmsg-based lookup.
  NFR-6.1 notes these are ordinary unprivileged socket options.
  The "Components" / "Composes With" sections and the
  "Alternatives Considered" entry are brought in line — and a
  new entry records SO_BINDTODEVICE as a rejected alternative
  with the exact failure mode seen on an IPng production box.
- docs/config-guide.md already carried the new description;
  unchanged here.
- src/ngx_http_ipng_stats_module.c's top-level block comment is
  rewritten to match; the section header above init_module goes
  from "rebind listen sockets with SO_BINDTODEVICE" to "enable
  IP_PKTINFO on listen sockets, resolve ifindexes".

Three SO_BINDTODEVICE mentions deliberately remain in the source
and one in the design doc's alternatives table — all of them
explain that the module *avoids* the option, which is itself
load-bearing documentation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 15:16:10 +02:00
Pim van Pelt
31c2ac2d65 PRE-RELEASE v0.5.0
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 15:00:46 +02:00
Pim van Pelt
450391af6b Switch per-device attribution from SO_BINDTODEVICE to IP_PKTINFO
SO_BINDTODEVICE pins both ingress *and* egress to the bound
interface — the kernel uses the listening socket's device
binding when choosing the output interface for the SYN-ACK,
which is sent before accept() returns and therefore can't be
fixed up in userspace. That's fatal for maglev / DSR
deployments where the SYN arrives through a GRE tunnel but the
return path has to leave via the default route; the SYN-ACK
goes out the GRE and is dropped by the uplink, so every new
connection times out.

Rework the listen plumbing so the module never touches
SO_BINDTODEVICE. init_module now enables IP_PKTINFO and
IPV6_RECVPKTINFO on every HTTP listening socket and resolves
each configured `device=` name to an ifindex. At request time
resolve_source calls getsockopt(IP_PKTOPTIONS) on the accepted
fd to read the per-connection in(6)_pktinfo cmsg the kernel
stashed during the handshake, then matches (ifindex, family)
against the bindings table. The listening sockets remain plain
wildcards, so the return path follows the normal routing table
and DSR works.

The wrapper also no longer clones or rebinds sockets: it still
dedups per (cscf, sockaddr) so multiple device-tagged listens
in a single server block coexist, and dedups bindings on
(device, family) so the same device can carry different tags
for v4 and v6 (e.g. tag2-v4 / tag2-v6) but not pointlessly
duplicate when a listen include is shared across server blocks.

Drive-by fixes to unblock `make pkg-deb` after a prior
`make build-asan`:
- debian/rules overrides dh_clean to exclude build/, since
  nginx-asan's install creates nobody:0700 temp dirs dh_clean
  can't traverse.
- Makefile's build-asan removes those unused runtime temp dirs
  so the tree is clean afterwards.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 15:00:46 +02:00
Pim van Pelt
cf7a538ee6 PRE-RELEASE v0.4.0
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 13:45:12 +02:00
Pim van Pelt
1f144f4c19 Fix shared-listen-include pattern across multiple server blocks
v0.3.0's listen wrapper treated every listen beyond the first at a
given sockaddr as a skip-core "duplicate", which was correct for
two `listen 80 device=X/Y;` lines in one server block but broke on
the real deployment pattern where every server-*.conf pulls in the
same `include listens.conf;`. Symptoms:

  * every server block after the first ended up with no listen
    directive processed, so nginx assigned them the default
    `*:80`, producing a flood of "conflicting server name"
    warnings and attaching every server block to an unrelated
    wildcard bind;
  * the bindings list grew linearly with the number of server
    blocks, so init_module tried to create (server_blocks) ×
    (devices × families) listening sockets and hit EMFILE.

Replace the single dedup with two independent checks:

  * listens_seen is a (cscf, sockaddr) ledger. The core listen
    handler is invoked at most once per (server block, sockaddr),
    matching nginx's own duplicate check so server-block N just
    attaches its cscf to the existing address via
    ngx_http_add_server.

  * `bind` is added only for the first global occurrence of each
    sockaddr; subsequent cscfs inherit opt.set/opt.bind from the
    first, which is what keeps nginx's "duplicate listen options"
    check happy across server blocks.

  * bindings dedup on (sockaddr, device) globally, so init_module
    creates one socket per unique pair regardless of how many
    server blocks reference it.

Add a regression test at tests/01-module/ that wires three server
blocks to the same ipng-listens.inc and asserts that nginx -t is
clean and exactly four sockets are bound on port 8080.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 13:15:04 +02:00
Pim van Pelt
ef821e577b PRE-RELEASE v0.3.0
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 11:49:49 +02:00
Pim van Pelt
df05bae8a3 Support multiple device-pinned listens sharing a single port
Nginx's config-level duplicate-listen check rejected the
documented pattern of `listen 80 device=X ipng_source_tag=A;
listen 80 device=Y ipng_source_tag=B;` with "a duplicate listen
0.0.0.0:80", and even when the dedup was bypassed the kernel
refused the second bind() because the first socket was already
holding the port without SO_BINDTODEVICE.

The listen wrapper now detects same-sockaddr duplicates before
the core handler sees them and records them with `needs_clone=1`.
In init_module, phase 1 clones an ngx_listening_t for each such
duplicate, phase 3 closes every inherited naked fd, and phase 4
rebinds every target with SO_REUSEADDR + SO_REUSEPORT +
SO_BINDTODEVICE set before bind(). SO_REUSEPORT keeps
`nginx -s reload` from colliding with the still-bound sockets
held by old workers during graceful drain; IPV6_V6ONLY matches
nginx's default so the IPv6 listen doesn't claim the IPv4
wildcard and collide with sibling IPv4-specific listens.

Restructure 01-module to cover the pattern end-to-end: four
device-pinned listens on port 8080 (eth1 shares tag `tag1`
across v4 and v6; eth2 splits into `tag2-v4` / `tag2-v6`),
clients and server both get IPv6 addresses, and a new
"Per-(device, family) request count accuracy" case proves that
10 requests on each of the four combinations yields tag1=20,
tag2-v4=10, tag2-v6=10. Mgmt/direct traffic moves to port 9180
so it no longer clashes with the shared-port wildcards.

Document the constraint in docs/user-guide.md: all listens on
a given port must carry `device=`, and direct traffic belongs
on a separate port.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 11:45:40 +02:00
Pim van Pelt
fdef2a552b Harden scrape rendering and add AddressSanitizer test suite
Move all heap allocation out of the slab-mutex critical section in
render_prom/render_json: snapshot cardinality under a brief lock,
allocate aggs/snaps/string tables outside the lock, then re-acquire
only to deep-copy strings and walk the LRU into the pre-allocated
buffers. A worker crash during output buffer allocation can no
longer leave the shared-memory zone locked, and a corrupt cardinality
count is caught by a 10k sanity cap rather than causing a runaway
ngx_pcalloc.

Add build-asan and tests/02-asan/: a full sanitizer-instrumented
nginx + module built via apt-source, and a 2-node containerlab
Robot suite that drives reload storms, concurrent scrape-during-reload,
and intern-table growth, failing if AddressSanitizer or UBSan
reports anything on stderr. The two Robot suites now check for
their required build artifacts up front so `make robot-test` no
longer rebuilds them on every invocation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 10:58:51 +02:00
Pim van Pelt
cdcbb07c9a Release v0.2.0: single-source the version, wildcard it in docs
Introduces a VERSION variable in the top-level Makefile as the
authoritative source for the module's reported version. A new
version-header target writes src/version.h only when the content
would change, so no-op rebuilds don't rewrite the file. The C source
#includes that header in place of a hardcoded #define; the
user-guide's install example is wildcarded
(libnginx-mod-http-ipng-stats_*_amd64.deb) so it doesn't drift.

The design doc still references v0.2.0 by name — operators read it as
a point-in-time description, not a moving target.

debian/changelog keeps its own 0.2.0-1 entry because dpkg reads the
package version from there directly; the e2e test is updated to match
the JSON schema bump to 2.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 14:45:15 +02:00
Pim van Pelt
b3ad74cbde Reduce scrape cardinality: class codes, per-(source,vip) histograms, byte histograms
Collapses the status-code dimension of the counter key into six class
lanes (1xx..5xx/unknown) so per-(source,vip) counter cardinality no
longer grows with the number of distinct three-digit responses nginx
serves. Histogram series drop the code label entirely and aggregate
across classes. Adds nginx_ipng_latency_total with a code class label
so average latency per class can still be computed off the scrape.
Adds nginx_ipng_bytes_{in,out} histograms with configurable boundaries
via the new ipng_stats_byte_buckets directive. Bumps JSON schema to 2.

Operators who need full three-digit-code resolution should consume the
ipng_stats_logtail stream off-host; the stats zone intentionally trades
that resolution for a bounded scrape size.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 14:36:16 +02:00
Pim van Pelt
87050bcf13 Add logtail if=$variable filtering and update log format examples
- ipng_stats_logtail now accepts an optional if=$variable parameter
  that suppresses log lines when the variable is empty or "0",
  following the same semantics as nginx's access_log if=. The
  condition is checked before format rendering for zero overhead on
  filtered requests. Filtered requests are still counted by stats.
- Log format examples updated to include $scheme for http/https
  visibility, and renamed to ipng_stats_logtail to match production.
- Robot test added for the if= filter (19 tests, 19 pass).
- FR-8.5 added to design doc for the if= semantics.
2026-04-16 18:49:22 +02:00
Pim van Pelt
5a7e2f77f1 Add ngx_http_ipng_stats_module: per-VIP, per-device traffic counters
Full implementation of the nginx dynamic module with:
- SO_BINDTODEVICE-based per-interface traffic attribution
- Per-worker lock-free counters flushed to shared memory
- Prometheus text and JSON scrape endpoint at configurable location
- UDP-only global logtail (ipng_stats_logtail) for fire-and-forget
  access log streaming
- $ipng_source_tag nginx variable for use in log_format/map
- Histogram buckets, EWMA rate gauges, zone meta-metrics
- Debian packaging (libnginx-mod-http-ipng-stats)
- Robot Framework end-to-end tests via containerlab
- SPDX Apache-2.0 headers on all source files
2026-04-16 17:41:23 +02:00
Pim van Pelt
c05bcf6aa6 Add designdoc and AP2.0 license for this nginx module 2026-04-16 02:12:56 +02:00