Strip socket options on cross-cscf repeat listens (v0.7.2)
Make the shared-listen-include pattern work with `reuseport` and the other socket-level listen options.

Nginx core enforces at-most-once per sockaddr on options that set lsopt.set=1 (reuseport, bind, backlog=, rcvbuf=, sndbuf=, setfib=, fastopen=, accept_filter=, deferred, ipv6only=, so_keepalive=) and emits "duplicate listen options for <addr>" otherwise. That rule collides with a single listens.conf included from every vhost: each vhost's include re-submits the same options.

The listen wrapper now detects the cross-cscf case, strips those options from cf->args before delegating to the core handler, and logs one notice per stripped listen. The first cscf owns the options on the kernel socket; later cscfs merge cleanly via ngx_http_add_server. Protocol-level flags (ssl, http2, quic, proxy_protocol) pass through untouched since nginx OR-merges those across cscfs.

This unblocks `reuseport` for deployments that want better new-connection spread across workers.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
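The stripping step described in the message can be sketched in isolation. This is a minimal standalone illustration, not the module's code: `arg_t` is a simplified stand-in for nginx's `ngx_str_t`, and the names `arg_is_socket_option` and `strip_socket_options` are hypothetical. It also uses a single-pass two-index compaction where the commit shifts the tail of the array on every hit; the resulting argument list should be the same.

```c
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for ngx_str_t (assumption: length-counted, not
 * NUL-terminated, as in nginx). */
typedef struct {
    size_t      len;
    const char *data;
} arg_t;

/* Return 1 if `a` is one of the socket-level listen options named in the
 * commit message. Bare flags must match exactly; "name=" options must be
 * followed by at least one character of value. */
static int
arg_is_socket_option(const arg_t *a)
{
    static const struct { const char *s; int exact; } tbl[] = {
        { "bind", 1 },     { "deferred", 1 }, { "reuseport", 1 },
        { "setfib=", 0 },  { "fastopen=", 0 }, { "backlog=", 0 },
        { "rcvbuf=", 0 },  { "sndbuf=", 0 },   { "accept_filter=", 0 },
        { "ipv6only=", 0 }, { "so_keepalive=", 0 },
    };
    size_t k, n;

    for (k = 0; k < sizeof(tbl) / sizeof(tbl[0]); k++) {
        n = strlen(tbl[k].s);
        if (tbl[k].exact ? (a->len != n) : (a->len <= n)) {
            continue;
        }
        if (memcmp(a->data, tbl[k].s, n) == 0) {
            return 1;
        }
    }
    return 0;
}

/* Remove socket-level options from args[1 .. *nelts-1] in place, keeping
 * the order of the survivors. args[0] is the directive name and is never
 * touched. Returns the number of arguments removed. */
static size_t
strip_socket_options(arg_t *args, size_t *nelts)
{
    size_t i, out, removed;

    if (*nelts < 2) {
        return 0;
    }

    out = 1;
    for (i = 1; i < *nelts; i++) {
        if (arg_is_socket_option(&args[i])) {
            continue;               /* drop: first server block owns it */
        }
        args[out++] = args[i];      /* keep: address, ssl, device=, ... */
    }

    removed = *nelts - out;
    *nelts = out;
    return removed;
}
```

Applied to the arguments of `listen 443 ssl reuseport backlog=1024 device=gre-mg1`, the sketch would leave `listen 443 ssl device=gre-mg1`, which is the shape a later server block hands to the core handler.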
Makefile (2 lines changed)
@@ -23,7 +23,7 @@ BUILD_DIR := $(CURDIR)/build
 # the package version from there directly. The C code picks up VERSION
 # via the generated src/version.h (written by the version-header target
 # below and depended on by the module build).
-VERSION := 0.7.1
+VERSION := 0.7.2
 
 NGINX_SRC ?=
 
debian/changelog (vendored, 20 lines changed)
@@ -1,3 +1,23 @@
+nginx-ipng-stats-plugin (0.7.2-1) unstable; urgency=medium
+
+  * Pre-release v0.7.2.
+    - Listen wrapper now strips socket-level options (reuseport,
+      bind, backlog=, rcvbuf=, sndbuf=, setfib=, fastopen=,
+      accept_filter=, deferred, ipv6only=, so_keepalive=) from
+      cf->args when a sockaddr recurs under a different server
+      block. Previously this pattern — typical with a shared
+      `include listens.conf;` across vhosts — tripped nginx's
+      "duplicate listen options for <addr>" check because those
+      options are one-shot per kernel socket. The first cscf now
+      owns the options on the shared kernel socket and later
+      cscfs merge cleanly via ngx_http_add_server. A NOTICE is
+      logged each time socket options are stripped. This makes
+      `reuseport` usable with the shared-include deployment
+      pattern from docs/user-guide.md, which helps worker
+      load-balancing on busy hosts.
+
+ -- Pim van Pelt <pim@ipng.ch>  Sun, 19 Apr 2026 16:30:00 +0200
+
 nginx-ipng-stats-plugin (0.7.1-1) unstable; urgency=medium
 
   * Pre-release v0.7.1.
@@ -90,6 +90,14 @@ Each requirement carries a unique identifier (`FR-X.Y` or `NFR-X.Y`) so that lat
   `(server block, sockaddr)` across both plain and device-tagged listens: the first occurrence registers the kernel socket,
   and subsequent same-sockaddr siblings (plain or device-tagged) are accepted without tripping nginx's duplicate-listen check.
   Device-tagged siblings additionally register an entry in the attribution table.
+- **FR-1.5a** When the *same* sockaddr is listen'd from a *different* `server { ... }` block — the shared `include` pattern —
+  the wrapper MUST strip socket-level options from `cf->args` before delegating to the core listen handler. These options
+  (`reuseport`, `bind`, `backlog=`, `rcvbuf=`, `sndbuf=`, `setfib=`, `fastopen=`, `accept_filter=`, `deferred`, `ipv6only=`,
+  `so_keepalive=`) apply to the one kernel socket that backs the sockaddr and nginx rejects any attempt to set them more
+  than once per sockaddr with `duplicate listen options for <addr>` (see `ngx_http_add_addresses` in `src/http/ngx_http.c`).
+  The first cscf to hit the sockaddr owns the options; subsequent cscfs pass the core handler with the options removed and
+  merge via `ngx_http_add_server`. Protocol-level flags (`ssl`, `http2`, `quic`, `proxy_protocol`) are preserved on every call
+  because nginx merges them with OR semantics across cscfs.
 - **FR-1.6** A `listen` directive that uses a wildcard address (`80`, `[::]:80`) together with `device=<ifname>` MUST attribute
   every connection whose ingress interface is `<ifname>` — regardless of which local address the client addressed — to that
   listen's source tag. Traffic on other interfaces MUST fall back to the configured default source (see FR-1.3).
@@ -199,6 +199,31 @@ register bindings without tripping nginx's duplicate-listen check. Traffic arriv
 back to `ipng_stats_default_source` (`direct` by default). Keeping "direct" traffic on its own port — e.g.
 `listen 198.51.100.1:8081;` — remains a fine pattern when you want a hard split, but it's no longer required.
 
+### Shared includes with `reuseport` (or other socket-level options)
+
+Socket-level `listen` options — `reuseport`, `bind`, `backlog=`, `rcvbuf=`, `sndbuf=`, `setfib=`, `fastopen=`, `accept_filter=`,
+`deferred`, `ipv6only=`, `so_keepalive=` — belong to the one kernel socket that backs a given sockaddr, not to a particular
+`server { ... }` block. Stock nginx enforces this by accepting them on at most the *first* listen per sockaddr and emitting
+`duplicate listen options for <addr>` on any subsequent repeat. That rule collides with the common deployment pattern of a single
+`listens.conf` included from every vhost, because each vhost's `include` re-submits the same options.
+
+The wrapper resolves this transparently. When a sockaddr recurs under a different `server` block than the one that first
+registered it, the wrapper strips socket-level options from the incoming `cf->args` before delegating to nginx's core listen
+handler. The first `server` block owns the options on the kernel socket (including `reuseport`, which triggers per-worker
+socket cloning); later blocks merge cleanly via `ngx_http_add_server` and inherit the same socket. The wrapper logs one
+`[notice] ipng_stats: stripped socket options from duplicate listen on <addr>` per stripped listen — informational, not an
+error. So this include works unchanged across as many vhosts as you like:
+
+```nginx
+listen 443 ssl reuseport device=gre-mg1 ipng_source_tag=mg1;
+listen [::]:443 ssl reuseport device=gre-mg1 ipng_source_tag=mg1;
+```
+
+`reuseport` noticeably helps worker load-balancing on busy hosts: without it, a single shared listening socket forces workers
+to compete for accepts and traffic routinely concentrates on one or two workers. HTTP/2 and long-lived keepalive connections
+can still skew CPU toward whichever worker holds a few heavy clients — `reuseport` does not reshuffle existing connections —
+but new-connection distribution across workers becomes kernel-hashed, not first-ready-wins.
+
 ## 4. Verify with curl
 
 Generate some traffic (or wait for real traffic), then scrape the endpoint locally:
@@ -499,6 +499,50 @@ ngx_http_ipng_stats_preconfig(ngx_conf_t *cf)
 }
 
 
+/* Return 1 if `a` is a listen parameter that makes nginx set
+ * `lsopt.set = 1` (i.e. a socket-level option that nginx enforces
+ * at-most-once per sockaddr via its "duplicate listen options" check
+ * in ngx_http_add_addresses). Keep in sync with nginx core's
+ * ngx_http_core_listen — currently: bind, setfib=, fastopen=,
+ * backlog=, rcvbuf=, sndbuf=, accept_filter=, deferred, ipv6only=,
+ * reuseport, so_keepalive=. */
+static ngx_uint_t
+ngx_http_ipng_stats_arg_is_set_triggering(ngx_str_t *a)
+{
+    static const struct { const char *s; size_t n; ngx_uint_t exact; } tbl[] = {
+        { "bind", 4, 1 },
+        { "deferred", 8, 1 },
+        { "reuseport", 9, 1 },
+        { "setfib=", 7, 0 },
+        { "fastopen=", 9, 0 },
+        { "backlog=", 8, 0 },
+        { "rcvbuf=", 7, 0 },
+        { "sndbuf=", 7, 0 },
+        { "accept_filter=",14, 0 },
+        { "ipv6only=", 9, 0 },
+        { "so_keepalive=", 13, 0 },
+    };
+    ngx_uint_t k;
+
+    for (k = 0; k < sizeof(tbl) / sizeof(tbl[0]); k++) {
+        if (tbl[k].exact) {
+            if (a->len == tbl[k].n
+                && ngx_strncmp(a->data, tbl[k].s, tbl[k].n) == 0)
+            {
+                return 1;
+            }
+        } else {
+            if (a->len > tbl[k].n
+                && ngx_strncmp(a->data, tbl[k].s, tbl[k].n) == 0)
+            {
+                return 1;
+            }
+        }
+    }
+    return 0;
+}
+
+
 /* The wrapper extracts device= and ipng_source_tag= from cf->args, compacting
  * the array in place, then calls the original ngx_http_core_module
  * listen handler. After a successful call it records a binding in
@@ -580,24 +624,59 @@ ngx_http_ipng_stats_listen_wrapper(ngx_conf_t *cf, ngx_command_t *cmd,
      * nginx's handler runs once (creating the socket), subsequent
      * same-sockaddr listens just register additional device
      * bindings. Across different server blocks the same sockaddr
-     * re-appears and nginx merges the cscf via ngx_http_add_server. */
+     * re-appears and nginx merges the cscf via ngx_http_add_server.
+     *
+     * The `sockaddr_seen_elsewhere` flag catches the
+     * shared-listen-include pattern where the SAME sockaddr appears
+     * under multiple server blocks: nginx rejects a second call
+     * carrying socket-level options (reuseport, bind, backlog=, ...)
+     * with "duplicate listen options". Those options belong to the
+     * single kernel socket that backs the sockaddr — the first cscf
+     * has already applied them; stripping them off subsequent
+     * same-sockaddr listens lets the include stay symmetrical. */
     void *cscf = ngx_http_conf_get_module_srv_conf(cf, ngx_http_core_module);
     ngx_http_ipng_stats_seen_t *seen = imcf->listens_seen->elts;
     ngx_uint_t same_cscf_sockaddr = 0;
+    ngx_uint_t sockaddr_seen_elsewhere = 0;
     for (i = 0; i < imcf->listens_seen->nelts; i++) {
-        if (seen[i].cscf != cscf) continue;
         if (seen[i].socklen != u.addrs[0].socklen) continue;
         if (ngx_cmp_sockaddr((struct sockaddr *) &seen[i].sockaddr,
                              seen[i].socklen,
                              u.addrs[0].sockaddr,
-                             u.addrs[0].socklen, 1) == NGX_OK)
+                             u.addrs[0].socklen, 1) != NGX_OK)
         {
+            continue;
+        }
+        if (seen[i].cscf == cscf) {
             same_cscf_sockaddr = 1;
             break;
         }
+        sockaddr_seen_elsewhere = 1;
     }
 
     if (!same_cscf_sockaddr) {
+        if (sockaddr_seen_elsewhere) {
+            ngx_uint_t stripped = 0;
+            i = 1;
+            while (i < cf->args->nelts) {
+                if (ngx_http_ipng_stats_arg_is_set_triggering(&value[i])) {
+                    for (j = i; j + 1 < cf->args->nelts; j++) {
+                        value[j] = value[j + 1];
+                    }
+                    cf->args->nelts--;
+                    stripped = 1;
+                    continue;
+                }
+                i++;
+            }
+            if (stripped) {
+                ngx_conf_log_error(NGX_LOG_NOTICE, cf, 0,
+                                   "ipng_stats: stripped socket options from "
+                                   "duplicate listen on %V — already applied by "
+                                   "an earlier server block", &value[1]);
+            }
+        }
+
         rv = ngx_http_core_listen_orig(cf, cmd, conf);
         if (rv != NGX_CONF_OK) {
             return rv;
@@ -31,14 +31,16 @@ Module loads
 
 Shared-listen-include across multiple server blocks
     [Documentation]    Three server blocks all pull in the same
-    ...    ipng-listens.inc (see docs/user-guide.md). nginx
-    ...    must start without "conflicting server name" or
-    ...    "duplicate listen options" warnings, and the
-    ...    module must end up with exactly one listening
-    ...    socket per address family on port 8080 (one for
-    ...    v4 wildcard, one for v6) — not one per (server
-    ...    block × device × family), which would exhaust
-    ...    the fd table on a real host.
+    ...    ipng-listens.inc (see docs/user-guide.md). The
+    ...    include also carries `reuseport` on every listen
+    ...    — nginx core would normally reject the second
+    ...    server block with "duplicate listen options", but
+    ...    the wrapper strips socket-level options on a
+    ...    repeat (cross-cscf) sockaddr so the first cscf
+    ...    owns the reuseport-cloned socket and the rest
+    ...    merge cleanly. With worker_processes unset
+    ...    (default 1), reuseport produces one socket per
+    ...    (worker × family), i.e. 2 on :8080 here.
     ${output} =    Docker Exec    ${SERVER}    nginx -t 2>&1
     Should Not Contain    ${output}    conflicting server name
     Should Not Contain    ${output}    duplicate listen
@@ -46,6 +48,10 @@ Shared-listen-include across multiple server blocks
     ${count} =    Get Regexp Matches    ${listens}    :8080\\s
     Length Should Be    ${count}    2
     ...    Expected 2 listening sockets on port 8080 (v4+v6 wildcards); got ${count}
+    # Proves the cross-cscf option-stripping path actually fired for
+    # the 2nd and 3rd server blocks. `nginx -t` replays the whole
+    # config and emits the wrapper's NOTICE each time it strips.
+    Should Contain    ${output}    stripped socket options from duplicate listen
 
 Prometheus scrape
     [Documentation]    Scrape returns HELP/TYPE preamble.
@@ -5,8 +5,15 @@
 # file from multiple server blocks to exercise the wrapper's dedup
 # logic: a naive implementation would either error with "duplicate
 # listen options" or create N * (devices × families) sockets.
+#
+# `reuseport` is present on every listen to exercise the wrapper's
+# cross-cscf option-stripping path: nginx itself would reject the
+# second server block's include with "duplicate listen options" if
+# reuseport weren't stripped from the repeat calls. The first cscf's
+# listen binds the reuseport-cloned kernel socket; subsequent ones
+# merge cleanly into it.
 
-listen 8080 device=eth1 ipng_source_tag=tag1;
-listen [::]:8080 device=eth1 ipng_source_tag=tag1;
-listen 8080 device=eth2 ipng_source_tag=tag2-v4;
-listen [::]:8080 device=eth2 ipng_source_tag=tag2-v6;
+listen 8080 reuseport device=eth1 ipng_source_tag=tag1;
+listen [::]:8080 reuseport device=eth1 ipng_source_tag=tag1;
+listen 8080 reuseport device=eth2 ipng_source_tag=tag2-v4;
+listen [::]:8080 reuseport device=eth2 ipng_source_tag=tag2-v6;
@@ -48,9 +48,14 @@ http {
     #     pair, so each server block gets its own cscf attached but no
     #     server block triggers nginx's "duplicate listen options"
     #     check;
+    #   * strip socket-level options (reuseport, bind, backlog=, ...)
+    #     from cross-cscf repeat sockaddrs — nginx enforces these
+    #     at-most-once per sockaddr, and the first cscf already owns
+    #     the single kernel socket that the remaining cscfs merge
+    #     into;
     #   * dedup bindings globally on (sockaddr, device), so init_module
-    #     creates exactly four sockets here (two families × two
-    #     devices) rather than 3 × 4 = 12.
+    #     creates exactly one binding per (device, family) rather than
+    #     one per (server block × device × family).
     # The default server owns the locations used by the traffic tests;
     # the two extras exist only to exercise the shared-include pattern.
     server {