Strip socket options on cross-cscf repeat listens (v0.7.2)

Make the shared-listen-include pattern work with `reuseport` and the
other socket-level listen options. Nginx core enforces at-most-once
per sockaddr on options that set lsopt.set=1 (reuseport, bind,
backlog=, rcvbuf=, sndbuf=, setfib=, fastopen=, accept_filter=,
deferred, ipv6only=, so_keepalive=) and emits "duplicate listen
options for <addr>" otherwise. That rule collides with a single
listens.conf included from every vhost — each vhost's include
re-submits the same options.

The listen wrapper now detects the cross-cscf case, strips those
options from cf->args before delegating to the core handler, and
logs one notice per stripped listen. The first cscf owns the
options on the kernel socket; later cscfs merge cleanly via
ngx_http_add_server. Protocol-level flags (ssl, http2, quic,
proxy_protocol) pass through untouched since nginx OR-merges those
across cscfs.

This unblocks `reuseport` for deployments that want better
new-connection spread across workers.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-04-19 16:23:58 +02:00
parent badb684431
commit 7ed77f5b22
8 changed files with 168 additions and 18 deletions

View File

@@ -23,7 +23,7 @@ BUILD_DIR := $(CURDIR)/build
# the package version from there directly. The C code picks up VERSION # the package version from there directly. The C code picks up VERSION
# via the generated src/version.h (written by the version-header target # via the generated src/version.h (written by the version-header target
# below and depended on by the module build). # below and depended on by the module build).
VERSION := 0.7.1 VERSION := 0.7.2
NGINX_SRC ?= NGINX_SRC ?=

20
debian/changelog vendored
View File

@@ -1,3 +1,23 @@
nginx-ipng-stats-plugin (0.7.2-1) unstable; urgency=medium
* Pre-release v0.7.2.
- Listen wrapper now strips socket-level options (reuseport,
bind, backlog=, rcvbuf=, sndbuf=, setfib=, fastopen=,
accept_filter=, deferred, ipv6only=, so_keepalive=) from
cf->args when a sockaddr recurs under a different server
block. Previously this pattern — typical with a shared
`include listens.conf;` across vhosts — tripped nginx's
"duplicate listen options for <addr>" check because those
options are one-shot per kernel socket. The first cscf now
owns the options on the shared kernel socket and later
cscfs merge cleanly via ngx_http_add_server. A NOTICE is
logged each time socket options are stripped. This makes
`reuseport` usable with the shared-include deployment
pattern from docs/user-guide.md, which helps worker
load-balancing on busy hosts.
-- Pim van Pelt <pim@ipng.ch> Sun, 19 Apr 2026 16:30:00 +0200
nginx-ipng-stats-plugin (0.7.1-1) unstable; urgency=medium nginx-ipng-stats-plugin (0.7.1-1) unstable; urgency=medium
* Pre-release v0.7.1. * Pre-release v0.7.1.

View File

@@ -90,6 +90,14 @@ Each requirement carries a unique identifier (`FR-X.Y` or `NFR-X.Y`) so that lat
`(server block, sockaddr)` across both plain and device-tagged listens: the first occurrence registers the kernel socket, `(server block, sockaddr)` across both plain and device-tagged listens: the first occurrence registers the kernel socket,
and subsequent same-sockaddr siblings (plain or device-tagged) are accepted without tripping nginx's duplicate-listen check. and subsequent same-sockaddr siblings (plain or device-tagged) are accepted without tripping nginx's duplicate-listen check.
Device-tagged siblings additionally register an entry in the attribution table. Device-tagged siblings additionally register an entry in the attribution table.
- **FR-1.5a** When the *same* sockaddr is listen'd from a *different* `server { ... }` block — the shared `include` pattern —
the wrapper MUST strip socket-level options from `cf->args` before delegating to the core listen handler. These options
(`reuseport`, `bind`, `backlog=`, `rcvbuf=`, `sndbuf=`, `setfib=`, `fastopen=`, `accept_filter=`, `deferred`, `ipv6only=`,
`so_keepalive=`) apply to the one kernel socket that backs the sockaddr and nginx rejects any attempt to set them more
than once per sockaddr with `duplicate listen options for <addr>` (see `ngx_http_add_addresses` in `src/http/ngx_http.c`).
The first cscf to hit the sockaddr owns the options; subsequent cscfs pass the core handler with the options removed and
merge via `ngx_http_add_server`. Protocol-level flags (`ssl`, `http2`, `quic`, `proxy_protocol`) are preserved on every call
because nginx merges them with OR semantics across cscfs.
- **FR-1.6** A `listen` directive that uses a wildcard address (`80`, `[::]:80`) together with `device=<ifname>` MUST attribute - **FR-1.6** A `listen` directive that uses a wildcard address (`80`, `[::]:80`) together with `device=<ifname>` MUST attribute
every connection whose ingress interface is `<ifname>` — regardless of which local address the client addressed — to that every connection whose ingress interface is `<ifname>` — regardless of which local address the client addressed — to that
listen's source tag. Traffic on other interfaces MUST fall back to the configured default source (see FR-1.3). listen's source tag. Traffic on other interfaces MUST fall back to the configured default source (see FR-1.3).

View File

@@ -199,6 +199,31 @@ register bindings without tripping nginx's duplicate-listen check. Traffic arriv
back to `ipng_stats_default_source` (`direct` by default). Keeping "direct" traffic on its own port — e.g. back to `ipng_stats_default_source` (`direct` by default). Keeping "direct" traffic on its own port — e.g.
`listen 198.51.100.1:8081;` — remains a fine pattern when you want a hard split, but it's no longer required. `listen 198.51.100.1:8081;` — remains a fine pattern when you want a hard split, but it's no longer required.
### Shared includes with `reuseport` (or other socket-level options)
Socket-level `listen` options — `reuseport`, `bind`, `backlog=`, `rcvbuf=`, `sndbuf=`, `setfib=`, `fastopen=`, `accept_filter=`,
`deferred`, `ipv6only=`, `so_keepalive=` — belong to the one kernel socket that backs a given sockaddr, not to a particular
`server { ... }` block. Stock nginx enforces this by accepting them on at most the *first* listen per sockaddr and emitting
`duplicate listen options for <addr>` on any subsequent repeat. That rule collides with the common deployment pattern of a single
`listens.conf` included from every vhost, because each vhost's `include` re-submits the same options.
The wrapper resolves this transparently. When a sockaddr recurs under a different `server` block than the one that first
registered it, the wrapper strips socket-level options from the incoming `cf->args` before delegating to nginx's core listen
handler. The first `server` block owns the options on the kernel socket (including `reuseport`, which triggers per-worker
socket cloning); later blocks merge cleanly via `ngx_http_add_server` and inherit the same socket. The wrapper logs one
`[notice] ipng_stats: stripped socket options from duplicate listen on <addr>` per stripped listen — informational, not an
error. So this include works unchanged across as many vhosts as you like:
```nginx
listen 443 ssl reuseport device=gre-mg1 ipng_source_tag=mg1;
listen [::]:443 ssl reuseport device=gre-mg1 ipng_source_tag=mg1;
```
`reuseport` noticeably helps worker load-balancing on busy hosts: without it, a single shared listening socket forces workers
to compete for accepts and traffic routinely concentrates on one or two workers. HTTP/2 and long-lived keepalive connections
can still skew CPU toward whichever worker holds a few heavy clients — `reuseport` does not reshuffle existing connections —
but new-connection distribution across workers becomes kernel-hashed, not first-ready-wins.
## 4. Verify with curl ## 4. Verify with curl
Generate some traffic (or wait for real traffic), then scrape the endpoint locally: Generate some traffic (or wait for real traffic), then scrape the endpoint locally:

View File

@@ -499,6 +499,50 @@ ngx_http_ipng_stats_preconfig(ngx_conf_t *cf)
} }
/* Return 1 if `a` is a listen parameter that makes nginx set
* `lsopt.set = 1` (i.e. a socket-level option that nginx enforces
* at-most-once per sockaddr via its "duplicate listen options" check
* in ngx_http_add_addresses). Keep in sync with nginx core's
* ngx_http_core_listen — currently: bind, setfib=, fastopen=,
* backlog=, rcvbuf=, sndbuf=, accept_filter=, deferred, ipv6only=,
* reuseport, so_keepalive=. */
static ngx_uint_t
ngx_http_ipng_stats_arg_is_set_triggering(ngx_str_t *a)
{
static const struct { const char *s; size_t n; ngx_uint_t exact; } tbl[] = {
{ "bind", 4, 1 },
{ "deferred", 8, 1 },
{ "reuseport", 9, 1 },
{ "setfib=", 7, 0 },
{ "fastopen=", 9, 0 },
{ "backlog=", 8, 0 },
{ "rcvbuf=", 7, 0 },
{ "sndbuf=", 7, 0 },
{ "accept_filter=",14, 0 },
{ "ipv6only=", 9, 0 },
{ "so_keepalive=", 13, 0 },
};
ngx_uint_t k;
for (k = 0; k < sizeof(tbl) / sizeof(tbl[0]); k++) {
if (tbl[k].exact) {
if (a->len == tbl[k].n
&& ngx_strncmp(a->data, tbl[k].s, tbl[k].n) == 0)
{
return 1;
}
} else {
if (a->len > tbl[k].n
&& ngx_strncmp(a->data, tbl[k].s, tbl[k].n) == 0)
{
return 1;
}
}
}
return 0;
}
/* The wrapper extracts device= and ipng_source_tag= from cf->args, compacting /* The wrapper extracts device= and ipng_source_tag= from cf->args, compacting
* the array in place, then calls the original ngx_http_core_module * the array in place, then calls the original ngx_http_core_module
* listen handler. After a successful call it records a binding in * listen handler. After a successful call it records a binding in
@@ -580,24 +624,59 @@ ngx_http_ipng_stats_listen_wrapper(ngx_conf_t *cf, ngx_command_t *cmd,
* nginx's handler runs once (creating the socket), subsequent * nginx's handler runs once (creating the socket), subsequent
* same-sockaddr listens just register additional device * same-sockaddr listens just register additional device
* bindings. Across different server blocks the same sockaddr * bindings. Across different server blocks the same sockaddr
* re-appears and nginx merges the cscf via ngx_http_add_server. */ * re-appears and nginx merges the cscf via ngx_http_add_server.
*
* The `sockaddr_seen_elsewhere` flag catches the
* shared-listen-include pattern where the SAME sockaddr appears
* under multiple server blocks: nginx rejects a second call
* carrying socket-level options (reuseport, bind, backlog=, ...)
* with "duplicate listen options". Those options belong to the
* single kernel socket that backs the sockaddr — the first cscf
* has already applied them; stripping them off subsequent
* same-sockaddr listens lets the include stay symmetrical. */
void *cscf = ngx_http_conf_get_module_srv_conf(cf, ngx_http_core_module); void *cscf = ngx_http_conf_get_module_srv_conf(cf, ngx_http_core_module);
ngx_http_ipng_stats_seen_t *seen = imcf->listens_seen->elts; ngx_http_ipng_stats_seen_t *seen = imcf->listens_seen->elts;
ngx_uint_t same_cscf_sockaddr = 0; ngx_uint_t same_cscf_sockaddr = 0;
ngx_uint_t sockaddr_seen_elsewhere = 0;
for (i = 0; i < imcf->listens_seen->nelts; i++) { for (i = 0; i < imcf->listens_seen->nelts; i++) {
if (seen[i].cscf != cscf) continue;
if (seen[i].socklen != u.addrs[0].socklen) continue; if (seen[i].socklen != u.addrs[0].socklen) continue;
if (ngx_cmp_sockaddr((struct sockaddr *) &seen[i].sockaddr, if (ngx_cmp_sockaddr((struct sockaddr *) &seen[i].sockaddr,
seen[i].socklen, seen[i].socklen,
u.addrs[0].sockaddr, u.addrs[0].sockaddr,
u.addrs[0].socklen, 1) == NGX_OK) u.addrs[0].socklen, 1) != NGX_OK)
{ {
continue;
}
if (seen[i].cscf == cscf) {
same_cscf_sockaddr = 1; same_cscf_sockaddr = 1;
break; break;
} }
sockaddr_seen_elsewhere = 1;
} }
if (!same_cscf_sockaddr) { if (!same_cscf_sockaddr) {
if (sockaddr_seen_elsewhere) {
ngx_uint_t stripped = 0;
i = 1;
while (i < cf->args->nelts) {
if (ngx_http_ipng_stats_arg_is_set_triggering(&value[i])) {
for (j = i; j + 1 < cf->args->nelts; j++) {
value[j] = value[j + 1];
}
cf->args->nelts--;
stripped = 1;
continue;
}
i++;
}
if (stripped) {
ngx_conf_log_error(NGX_LOG_NOTICE, cf, 0,
"ipng_stats: stripped socket options from "
"duplicate listen on %V — already applied by "
"an earlier server block", &value[1]);
}
}
rv = ngx_http_core_listen_orig(cf, cmd, conf); rv = ngx_http_core_listen_orig(cf, cmd, conf);
if (rv != NGX_CONF_OK) { if (rv != NGX_CONF_OK) {
return rv; return rv;

View File

@@ -31,14 +31,16 @@ Module loads
Shared-listen-include across multiple server blocks Shared-listen-include across multiple server blocks
[Documentation] Three server blocks all pull in the same [Documentation] Three server blocks all pull in the same
... ipng-listens.inc (see docs/user-guide.md). nginx ... ipng-listens.inc (see docs/user-guide.md). The
... must start without "conflicting server name" or ... include also carries `reuseport` on every listen
... "duplicate listen options" warnings, and the ... — nginx core would normally reject the second
... module must end up with exactly one listening ... server block with "duplicate listen options", but
... socket per address family on port 8080 (one for ... the wrapper strips socket-level options on a
... v4 wildcard, one for v6) — not one per (server ... repeat (cross-cscf) sockaddr so the first cscf
... block × device × family), which would exhaust ... owns the reuseport-cloned socket and the rest
... the fd table on a real host. ... merge cleanly. With worker_processes unset
... (default 1), reuseport produces one socket per
... (worker × family), i.e. 2 on :8080 here.
${output} = Docker Exec ${SERVER} nginx -t 2>&1 ${output} = Docker Exec ${SERVER} nginx -t 2>&1
Should Not Contain ${output} conflicting server name Should Not Contain ${output} conflicting server name
Should Not Contain ${output} duplicate listen Should Not Contain ${output} duplicate listen
@@ -46,6 +48,10 @@ Shared-listen-include across multiple server blocks
${count} = Get Regexp Matches ${listens} :8080\\s ${count} = Get Regexp Matches ${listens} :8080\\s
Length Should Be ${count} 2 Length Should Be ${count} 2
... Expected 2 listening sockets on port 8080 (v4+v6 wildcards); got ${count} ... Expected 2 listening sockets on port 8080 (v4+v6 wildcards); got ${count}
# Proves the cross-cscf option-stripping path actually fired for
# the 2nd and 3rd server blocks. `nginx -t` replays the whole
# config and emits the wrapper's NOTICE each time it strips.
Should Contain ${output} stripped socket options from duplicate listen
Prometheus scrape Prometheus scrape
[Documentation] Scrape returns HELP/TYPE preamble. [Documentation] Scrape returns HELP/TYPE preamble.

View File

@@ -5,8 +5,15 @@
# file from multiple server blocks to exercise the wrapper's dedup # file from multiple server blocks to exercise the wrapper's dedup
# logic: a naive implementation would either error with "duplicate # logic: a naive implementation would either error with "duplicate
# listen options" or create N * (devices × families) sockets. # listen options" or create N * (devices × families) sockets.
#
# `reuseport` is present on every listen to exercise the wrapper's
# cross-cscf option-stripping path: nginx itself would reject the
# second server block's include with "duplicate listen options" if
# reuseport weren't stripped from the repeat calls. The first cscf's
# listen binds the reuseport-cloned kernel socket; subsequent ones
# merge cleanly into it.
listen 8080 device=eth1 ipng_source_tag=tag1; listen 8080 reuseport device=eth1 ipng_source_tag=tag1;
listen [::]:8080 device=eth1 ipng_source_tag=tag1; listen [::]:8080 reuseport device=eth1 ipng_source_tag=tag1;
listen 8080 device=eth2 ipng_source_tag=tag2-v4; listen 8080 reuseport device=eth2 ipng_source_tag=tag2-v4;
listen [::]:8080 device=eth2 ipng_source_tag=tag2-v6; listen [::]:8080 reuseport device=eth2 ipng_source_tag=tag2-v6;

View File

@@ -48,9 +48,14 @@ http {
# pair, so each server block gets its own cscf attached but no # pair, so each server block gets its own cscf attached but no
# server block triggers nginx's "duplicate listen options" # server block triggers nginx's "duplicate listen options"
# check; # check;
# * strip socket-level options (reuseport, bind, backlog=, ...)
# from cross-cscf repeat sockaddrs — nginx enforces these
# at-most-once per sockaddr, and the first cscf already owns
# the single kernel socket that the remaining cscfs merge
# into;
# * dedup bindings globally on (sockaddr, device), so init_module # * dedup bindings globally on (sockaddr, device), so init_module
# creates exactly four sockets here (two families × two # creates exactly one binding per (device, family) rather than
# devices) rather than 3 × 4 = 12. # one per (server block × device × family).
# The default server owns the locations used by the traffic tests; # The default server owns the locations used by the traffic tests;
# the two extras exist only to exercise the shared-include pattern. # the two extras exist only to exercise the shared-include pattern.
server { server {