Make the shared-listen-include pattern work with `reuseport` and the
other socket-level listen options. Nginx core enforces at-most-once
per sockaddr on options that set lsopt.set=1 (reuseport, bind,
backlog=, rcvbuf=, sndbuf=, setfib=, fastopen=, accept_filter=,
deferred, ipv6only=, so_keepalive=) and emits "duplicate listen
options for <addr>" otherwise. That rule collides with a single
listens.conf included from every vhost, because each vhost's include
re-submits the same options.
The listen wrapper now detects the cross-cscf case, strips those
options from cf->args before delegating to the core handler, and
logs one notice per stripped listen. The first cscf owns the
options on the kernel socket; later cscfs merge cleanly via
ngx_http_add_server. Protocol-level flags (ssl, http2, quic,
proxy_protocol) pass through untouched since nginx OR-merges those
across cscfs.
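
A minimal sketch of the stripping step, assuming plain C strings in
place of the ngx_str_t elements that cf->args actually holds. The
helper names (is_socket_level, strip_socket_level_opts) are
hypothetical and are not taken from the module; the option list is the
one given above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Socket-level listen options named in the commit message. Entries that
 * end in '=' are matched as prefixes; the others must match exactly. */
static const char *const socket_level_opts[] = {
    "reuseport", "bind", "deferred",
    "backlog=", "rcvbuf=", "sndbuf=", "setfib=", "fastopen=",
    "accept_filter=", "ipv6only=", "so_keepalive=",
};

static int is_socket_level(const char *arg)
{
    size_t n = sizeof(socket_level_opts) / sizeof(socket_level_opts[0]);

    for (size_t i = 0; i < n; i++) {
        const char *opt = socket_level_opts[i];
        size_t len = strlen(opt);

        if (opt[len - 1] == '=') {
            if (strncmp(arg, opt, len) == 0) {
                return 1;
            }
        } else if (strcmp(arg, opt) == 0) {
            return 1;
        }
    }
    return 0;
}

/* Compacts args[] in place, dropping socket-level options from index 2
 * onward (args[0] is "listen", args[1] is the address). Protocol-level
 * flags such as ssl or http2 are kept. Returns the new argument count. */
size_t strip_socket_level_opts(const char **args, size_t nargs)
{
    assert(args != NULL);
    size_t out = 0;

    for (size_t i = 0; i < nargs; i++) {
        if (i >= 2 && is_socket_level(args[i])) {
            continue;
        }
        args[out++] = args[i];
    }
    return out;
}
```

In this sketch the caller would apply the filter only in the
cross-cscf case, so the first cscf still submits the full option set.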
This unblocks `reuseport` for deployments that want better
new-connection spread across workers.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous wrapper skipped nginx's duplicate-listen check only
for listens that carried device=, so a `listen 80;` next to a
`listen 80 device=eth0 ...;` in the same server block was
rejected at config time. Under SO_BINDTODEVICE that restriction
tracked a real kernel constraint: device-tagged listens created
separate sockets, so a bare listen alongside them was genuinely
ambiguous. Under the IP_PKTINFO model introduced in 450391a
the constraint no longer exists (all same-sockaddr listens
collapse to one wildcard kernel socket and attribution is a
per-connection cmsg readback), but the wrapper kept enforcing
the old rule by accident.
Extend the (cscf, sockaddr) dedup in the listen wrapper to
cover plain listens too: the first occurrence at a given
(server, sockaddr) pair calls nginx's handler and registers the
kernel socket, and every subsequent sibling — plain or
device-tagged — is accepted without tripping nginx's
duplicate-listen check. Device-tagged siblings additionally
push a binding into the attribution table as before; plain
siblings contribute only the seen-list entry. No code path
exercised by the existing 22 e2e tests changes behavior.
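
The (server, sockaddr) seen-list can be pictured roughly as below.
This is a sketch under stated assumptions, not the module's code: the
struct layout, the fixed capacity, the textual sockaddr key, and the
name seen_check_and_add are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SEEN_MAX 64   /* hypothetical fixed capacity */

/* One (server block, sockaddr) pair already handed to nginx's handler. */
struct seen_entry {
    const void *cscf;       /* identity of the server block */
    char        addr[64];   /* textual sockaddr, e.g. "0.0.0.0:80" */
};

struct seen_list {
    struct seen_entry e[SEEN_MAX];
    size_t            n;
};

/* Returns 1 for the first occurrence of (cscf, addr): the caller should
 * delegate to the core listen handler. Returns 0 for a later sibling:
 * the caller should skip the core handler. Returns -1 if the entry
 * cannot be stored. */
int seen_check_and_add(struct seen_list *s, const void *cscf, const char *addr)
{
    assert(s != NULL && addr != NULL);

    for (size_t i = 0; i < s->n; i++) {
        if (s->e[i].cscf == cscf && strcmp(s->e[i].addr, addr) == 0) {
            return 0;
        }
    }
    if (s->n == SEEN_MAX || strlen(addr) >= sizeof(s->e[0].addr)) {
        return -1;
    }
    s->e[s->n].cscf = cscf;
    strcpy(s->e[s->n].addr, addr);
    s->n++;
    return 1;
}
```

Plain and device-tagged siblings take the same path here; only a
device-tagged sibling would go on to push a binding afterwards.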
Update FR-1.5, the user-guide "shared port" section, the
module's top-of-function comments, and the test nginx.conf
comment to describe the relaxed rule. Bump VERSION and add a
debian/changelog entry for 0.7.1.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Self-heal device= → ifindex attribution and expose plugin meta
counters in the scrape.
ipng_stats_rescan_interval (default 60s, 0 to disable) runs a
per-worker timer that re-resolves every binding via if_nametoindex,
so interface teardown/recreate (e.g. GRE tunnel reprovision) picks
up the new ifindex without requiring an nginx reload.
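
One pass of such a rescan might look like the sketch below. Only
if_nametoindex is a real call; the binding struct and rescan_bindings
are hypothetical, and the timer that would invoke it every
ipng_stats_rescan_interval is left out. Storing 0 for a missing
interface is an assumption of the sketch.

```c
#include <assert.h>
#include <net/if.h>
#include <stddef.h>

/* Hypothetical binding record: a configured device= name and the
 * ifindex it last resolved to (0 means "currently unresolved"). */
struct binding {
    char         name[IF_NAMESIZE];
    unsigned int ifindex;
};

/* Re-resolves every binding via if_nametoindex and returns how many
 * entries changed. if_nametoindex returns 0 when the interface does
 * not exist, so a torn-down device stops matching until it reappears. */
size_t rescan_bindings(struct binding *b, size_t n)
{
    assert(b != NULL || n == 0);
    size_t changed = 0;

    for (size_t i = 0; i < n; i++) {
        unsigned int idx = if_nametoindex(b[i].name);

        if (idx != b[i].ifindex) {
            b[i].ifindex = idx;
            changed++;
        }
    }
    return changed;
}
```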
nginx_ipng_ifindex_misses_total increments whenever a cmsg-reported
ingress ifindex doesn't match any binding, which makes stale mappings
observable. Also expose the existing zone_full_events and
flushes_total shared-memory counters, which were tracked but never
emitted. JSON output gains a top-level "meta" object; schema stays
at 2 (additive change).
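
The miss counter can be sketched as a lookup that counts its own
failures. The struct, the function name attribute_ingress, and the
plain uint64_t counter are assumptions made for illustration; the
source does not say how the module stores either the table or the
counter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/socket.h>

/* Hypothetical resolved binding: ifindex and address family map to a tag. */
struct binding {
    unsigned int ifindex;
    int          family;   /* AF_INET or AF_INET6 */
    const char  *tag;
};

/* Returns the tag for (ifindex, family), or NULL when nothing matches.
 * Every failed lookup bumps *misses, which is the value a counter such
 * as nginx_ipng_ifindex_misses_total would report. */
const char *attribute_ingress(const struct binding *b, size_t n,
                              unsigned int ifindex, int family,
                              uint64_t *misses)
{
    assert(misses != NULL);

    for (size_t i = 0; i < n; i++) {
        if (b[i].ifindex == ifindex && b[i].family == family) {
            return b[i].tag;
        }
    }
    (*misses)++;
    return NULL;
}
```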
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Describe the ipng_stats_logtail UDP feature in debian/control alongside
the per-VIP / per-device counter description, so the package metadata
reflects what the module actually ships.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The SO_BINDTODEVICE → IP_PKTINFO switch in the previous commit
was a semantic change: the module no longer touches outgoing
routing at all, and several places in the docs and the module's
top-of-file comment still described the old mechanism.
- README.md and debian/control now describe attribution as
  reading the ingress ifindex per connection from the kernel's
  IP_PKTINFO / IPV6_PKTINFO cmsg, and explicitly call out that
  the DSR / maglev return-path constraint is what makes the
  change necessary.
- docs/design.md FR-1.1 / FR-1.5 / FR-1.6 are rewritten to
  forbid SO_BINDTODEVICE and to describe the cmsg-based lookup.
  NFR-6.1 notes these are ordinary unprivileged socket options.
  The "Components" / "Composes With" sections and the
  "Alternatives Considered" entry are brought in line, and a
  new entry records SO_BINDTODEVICE as a rejected alternative
  with the exact failure mode seen on an IPng production box.
- docs/config-guide.md already carried the new description;
  unchanged here.
- src/ngx_http_ipng_stats_module.c's top-level block comment is
  rewritten to match; the section header above init_module goes
  from "rebind listen sockets with SO_BINDTODEVICE" to "enable
  IP_PKTINFO on listen sockets, resolve ifindexes".
Three SO_BINDTODEVICE mentions deliberately remain in the source
and one in the design doc's alternatives table — all of them
explain that the module *avoids* the option, which is itself
load-bearing documentation.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
SO_BINDTODEVICE pins both ingress *and* egress to the bound
interface — the kernel uses the listening socket's device
binding when choosing the output interface for the SYN-ACK,
which is sent before accept() returns and therefore can't be
fixed up in userspace. That's fatal for maglev / DSR
deployments where the SYN arrives through a GRE tunnel but the
return path has to leave via the default route; the SYN-ACK
goes out the GRE and is dropped by the uplink, so every new
connection times out.
Rework the listen plumbing so the module never touches
SO_BINDTODEVICE. init_module now enables IP_PKTINFO and
IPV6_RECVPKTINFO on every HTTP listening socket and resolves
each configured `device=` name to an ifindex. At request time
resolve_source calls getsockopt(IP_PKTOPTIONS) on the accepted
fd to read the per-connection in(6)_pktinfo cmsg the kernel
stashed during the handshake, then matches (ifindex, family)
against the bindings table. The listening sockets remain plain
wildcards, so the return path follows the normal routing table
and DSR works.
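
The readback can be sketched as follows for the IPv4 case on Linux.
The function names pktinfo_ifindex and accepted_fd_ifindex are
hypothetical, and the module's real resolve_source may be structured
differently. The sketch assumes IP_PKTINFO was enabled on the
listening socket, and it handles only struct in_pktinfo; the IPv6
cmsg (IPV6_PKTINFO carrying struct in6_pktinfo) would be parsed the
same way.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* for struct in_pktinfo on glibc */
#endif
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Walks a control-message buffer and returns the ifindex from the first
 * IP_PKTINFO cmsg, or 0 if the buffer holds none. */
unsigned int pktinfo_ifindex(void *buf, size_t len)
{
    assert(buf != NULL || len == 0);

    struct msghdr msg;
    memset(&msg, 0, sizeof(msg));
    msg.msg_control = buf;
    msg.msg_controllen = len;

    for (struct cmsghdr *c = CMSG_FIRSTHDR(&msg); c != NULL;
         c = CMSG_NXTHDR(&msg, c)) {
        if (c->cmsg_level == IPPROTO_IP && c->cmsg_type == IP_PKTINFO
            && c->cmsg_len >= CMSG_LEN(sizeof(struct in_pktinfo))) {
            struct in_pktinfo pi;
            memcpy(&pi, CMSG_DATA(c), sizeof(pi));
            return (unsigned int) pi.ipi_ifindex;
        }
    }
    return 0;
}

/* Asks the kernel for the cmsgs stashed on an accepted TCP socket and
 * extracts the ingress ifindex. Returns 0 on any failure. */
unsigned int accepted_fd_ifindex(int fd)
{
    union { char buf[256]; struct cmsghdr align; } u;
    socklen_t len = sizeof(u.buf);

    if (getsockopt(fd, IPPROTO_IP, IP_PKTOPTIONS, u.buf, &len) != 0) {
        return 0;
    }
    return pktinfo_ifindex(u.buf, (size_t) len);
}
```

Because the lookup happens on the accepted fd, the listening socket
itself never needs a device binding.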
The wrapper also no longer clones or rebinds sockets: it still
dedups per (cscf, sockaddr) so multiple device-tagged listens
in a single server block coexist, and dedups bindings on
(device, family) so the same device can carry different tags
for v4 and v6 (e.g. tag2-v4 / tag2-v6) but not pointlessly
duplicate when a listen include is shared across server blocks.
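
The (device, family) dedup on bindings can be sketched like this. The
table layout, capacity, and the name bindings_add are hypothetical;
keeping the first tag when the same pair is submitted again is an
assumption of the sketch, not something the source states.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

#define BINDINGS_MAX 32   /* hypothetical fixed capacity */

struct binding {
    char device[16];
    int  family;          /* AF_INET or AF_INET6 */
    char tag[32];
};

struct bindings {
    struct binding b[BINDINGS_MAX];
    size_t         n;
};

/* Adds a binding keyed on (device, family). Returns 1 if added, 0 if
 * that pair is already present (for example when a shared listen
 * include is parsed again in another server block), -1 if it cannot
 * be stored. */
int bindings_add(struct bindings *t, const char *device, int family,
                 const char *tag)
{
    assert(t != NULL && device != NULL && tag != NULL);

    for (size_t i = 0; i < t->n; i++) {
        if (t->b[i].family == family
            && strcmp(t->b[i].device, device) == 0) {
            return 0;
        }
    }
    if (t->n == BINDINGS_MAX
        || strlen(device) >= sizeof(t->b[0].device)
        || strlen(tag) >= sizeof(t->b[0].tag)) {
        return -1;
    }
    struct binding *nb = &t->b[t->n++];
    strcpy(nb->device, device);
    nb->family = family;
    strcpy(nb->tag, tag);
    return 1;
}
```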
Drive-by fixes to unblock `make pkg-deb` after a prior
`make build-asan`:
- debian/rules overrides dh_clean to exclude build/, since
  nginx-asan's install creates nobody:0700 temp dirs that
  dh_clean can't traverse.
- Makefile's build-asan removes those unused runtime temp dirs
  so the tree is clean afterwards.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Introduce a VERSION variable in the top-level Makefile as the
authoritative source for the module's reported version. A new
version-header target writes src/version.h only when the content
would change, so no-op rebuilds don't rewrite the file. The C source
#includes that header in place of a hardcoded #define; the
user-guide's install example is wildcarded
(libnginx-mod-http-ipng-stats_*_amd64.deb) so it doesn't drift.
The design doc still references v0.2.0 by name — operators read it as
a point-in-time description, not a moving target.
debian/changelog keeps its own 0.2.0-1 entry because dpkg reads the
package version from there directly; the e2e test is updated to match
the JSON schema bump to 2.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Full implementation of the nginx dynamic module with:
- SO_BINDTODEVICE-based per-interface traffic attribution
- Per-worker lock-free counters flushed to shared memory
- Prometheus text and JSON scrape endpoint at configurable location
- UDP-only global logtail (ipng_stats_logtail) for fire-and-forget
access log streaming
- $ipng_source_tag nginx variable for use in log_format/map
- Histogram buckets, EWMA rate gauges, zone meta-metrics
- Debian packaging (libnginx-mod-http-ipng-stats)
- Robot Framework end-to-end tests via containerlab
- SPDX Apache-2.0 headers on all source files