Strip socket options on cross-cscf repeat listens (v0.7.2)
Make the shared-listen-include pattern work with `reuseport` and the other socket-level listen options. Nginx core enforces at-most-once per sockaddr on options that set lsopt.set=1 (reuseport, bind, backlog=, rcvbuf=, sndbuf=, setfib=, fastopen=, accept_filter=, deferred, ipv6only=, so_keepalive=) and emits "duplicate listen options for <addr>" otherwise. That rule collides with a single listens.conf included from every vhost — each vhost's include re-submits the same options. The listen wrapper now detects the cross-cscf case, strips those options from cf->args before delegating to the core handler, and logs one notice per stripped listen. The first cscf owns the options on the kernel socket; later cscfs merge cleanly via ngx_http_add_server. Protocol-level flags (ssl, http2, quic, proxy_protocol) pass through untouched since nginx OR-merges those across cscfs. This unblocks `reuseport` for deployments that want better new-connection spread across workers. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
2
Makefile
2
Makefile
@@ -23,7 +23,7 @@ BUILD_DIR := $(CURDIR)/build
|
||||
# the package version from there directly. The C code picks up VERSION
|
||||
# via the generated src/version.h (written by the version-header target
|
||||
# below and depended on by the module build).
|
||||
VERSION := 0.7.1
|
||||
VERSION := 0.7.2
|
||||
|
||||
NGINX_SRC ?=
|
||||
|
||||
|
||||
20
debian/changelog
vendored
20
debian/changelog
vendored
@@ -1,3 +1,23 @@
|
||||
nginx-ipng-stats-plugin (0.7.2-1) unstable; urgency=medium
|
||||
|
||||
* Pre-release v0.7.2.
|
||||
- Listen wrapper now strips socket-level options (reuseport,
|
||||
bind, backlog=, rcvbuf=, sndbuf=, setfib=, fastopen=,
|
||||
accept_filter=, deferred, ipv6only=, so_keepalive=) from
|
||||
cf->args when a sockaddr recurs under a different server
|
||||
block. Previously this pattern — typical with a shared
|
||||
`include listens.conf;` across vhosts — tripped nginx's
|
||||
"duplicate listen options for <addr>" check because those
|
||||
options are one-shot per kernel socket. The first cscf now
|
||||
owns the options on the shared kernel socket and later
|
||||
cscfs merge cleanly via ngx_http_add_server. A NOTICE is
|
||||
logged each time socket options are stripped. This makes
|
||||
`reuseport` usable with the shared-include deployment
|
||||
pattern from docs/user-guide.md, which helps worker
|
||||
load-balancing on busy hosts.
|
||||
|
||||
-- Pim van Pelt <pim@ipng.ch> Sun, 19 Apr 2026 16:30:00 +0200
|
||||
|
||||
nginx-ipng-stats-plugin (0.7.1-1) unstable; urgency=medium
|
||||
|
||||
* Pre-release v0.7.1.
|
||||
|
||||
@@ -90,6 +90,14 @@ Each requirement carries a unique identifier (`FR-X.Y` or `NFR-X.Y`) so that lat
|
||||
`(server block, sockaddr)` across both plain and device-tagged listens: the first occurrence registers the kernel socket,
|
||||
and subsequent same-sockaddr siblings (plain or device-tagged) are accepted without tripping nginx's duplicate-listen check.
|
||||
Device-tagged siblings additionally register an entry in the attribution table.
|
||||
- **FR-1.5a** When the *same* sockaddr is listen'd from a *different* `server { ... }` block — the shared `include` pattern —
|
||||
the wrapper MUST strip socket-level options from `cf->args` before delegating to the core listen handler. These options
|
||||
(`reuseport`, `bind`, `backlog=`, `rcvbuf=`, `sndbuf=`, `setfib=`, `fastopen=`, `accept_filter=`, `deferred`, `ipv6only=`,
|
||||
`so_keepalive=`) apply to the one kernel socket that backs the sockaddr and nginx rejects any attempt to set them more
|
||||
than once per sockaddr with `duplicate listen options for <addr>` (see `ngx_http_add_addresses` in `src/http/ngx_http.c`).
|
||||
The first cscf to hit the sockaddr owns the options; subsequent cscfs pass the core handler with the options removed and
|
||||
merge via `ngx_http_add_server`. Protocol-level flags (`ssl`, `http2`, `quic`, `proxy_protocol`) are preserved on every call
|
||||
because nginx merges them with OR semantics across cscfs.
|
||||
- **FR-1.6** A `listen` directive that uses a wildcard address (`80`, `[::]:80`) together with `device=<ifname>` MUST attribute
|
||||
every connection whose ingress interface is `<ifname>` — regardless of which local address the client addressed — to that
|
||||
listen's source tag. Traffic on other interfaces MUST fall back to the configured default source (see FR-1.3).
|
||||
|
||||
@@ -199,6 +199,31 @@ register bindings without tripping nginx's duplicate-listen check. Traffic arriv
|
||||
back to `ipng_stats_default_source` (`direct` by default). Keeping "direct" traffic on its own port — e.g.
|
||||
`listen 198.51.100.1:8081;` — remains a fine pattern when you want a hard split, but it's no longer required.
|
||||
|
||||
### Shared includes with `reuseport` (or other socket-level options)
|
||||
|
||||
Socket-level `listen` options — `reuseport`, `bind`, `backlog=`, `rcvbuf=`, `sndbuf=`, `setfib=`, `fastopen=`, `accept_filter=`,
|
||||
`deferred`, `ipv6only=`, `so_keepalive=` — belong to the one kernel socket that backs a given sockaddr, not to a particular
|
||||
`server { ... }` block. Stock nginx enforces this by accepting them on at most the *first* listen per sockaddr and emitting
|
||||
`duplicate listen options for <addr>` on any subsequent repeat. That rule collides with the common deployment pattern of a single
|
||||
`listens.conf` included from every vhost, because each vhost's `include` re-submits the same options.
|
||||
|
||||
The wrapper resolves this transparently. When a sockaddr recurs under a different `server` block than the one that first
|
||||
registered it, the wrapper strips socket-level options from the incoming `cf->args` before delegating to nginx's core listen
|
||||
handler. The first `server` block owns the options on the kernel socket (including `reuseport`, which triggers per-worker
|
||||
socket cloning); later blocks merge cleanly via `ngx_http_add_server` and inherit the same socket. The wrapper logs one
|
||||
`[notice] ipng_stats: stripped socket options from duplicate listen on <addr>` per stripped listen — informational, not an
|
||||
error. So this include works unchanged across as many vhosts as you like:
|
||||
|
||||
```nginx
|
||||
listen 443 ssl reuseport device=gre-mg1 ipng_source_tag=mg1;
|
||||
listen [::]:443 ssl reuseport device=gre-mg1 ipng_source_tag=mg1;
|
||||
```
|
||||
|
||||
`reuseport` noticeably helps worker load-balancing on busy hosts: without it, a single shared listening socket forces workers
|
||||
to compete for accepts and traffic routinely concentrates on one or two workers. HTTP/2 and long-lived keepalive connections
|
||||
can still skew CPU toward whichever worker holds a few heavy clients — `reuseport` does not reshuffle existing connections —
|
||||
but new-connection distribution across workers becomes kernel-hashed, not first-ready-wins.
|
||||
|
||||
## 4. Verify with curl
|
||||
|
||||
Generate some traffic (or wait for real traffic), then scrape the endpoint locally:
|
||||
|
||||
@@ -499,6 +499,50 @@ ngx_http_ipng_stats_preconfig(ngx_conf_t *cf)
|
||||
}
|
||||
|
||||
|
||||
/* Return 1 if `a` is a listen parameter that makes nginx set
|
||||
* `lsopt.set = 1` (i.e. a socket-level option that nginx enforces
|
||||
* at-most-once per sockaddr via its "duplicate listen options" check
|
||||
* in ngx_http_add_addresses). Keep in sync with nginx core's
|
||||
* ngx_http_core_listen — currently: bind, setfib=, fastopen=,
|
||||
* backlog=, rcvbuf=, sndbuf=, accept_filter=, deferred, ipv6only=,
|
||||
* reuseport, so_keepalive=. */
|
||||
static ngx_uint_t
|
||||
ngx_http_ipng_stats_arg_is_set_triggering(ngx_str_t *a)
|
||||
{
|
||||
static const struct { const char *s; size_t n; ngx_uint_t exact; } tbl[] = {
|
||||
{ "bind", 4, 1 },
|
||||
{ "deferred", 8, 1 },
|
||||
{ "reuseport", 9, 1 },
|
||||
{ "setfib=", 7, 0 },
|
||||
{ "fastopen=", 9, 0 },
|
||||
{ "backlog=", 8, 0 },
|
||||
{ "rcvbuf=", 7, 0 },
|
||||
{ "sndbuf=", 7, 0 },
|
||||
{ "accept_filter=",14, 0 },
|
||||
{ "ipv6only=", 9, 0 },
|
||||
{ "so_keepalive=", 13, 0 },
|
||||
};
|
||||
ngx_uint_t k;
|
||||
|
||||
for (k = 0; k < sizeof(tbl) / sizeof(tbl[0]); k++) {
|
||||
if (tbl[k].exact) {
|
||||
if (a->len == tbl[k].n
|
||||
&& ngx_strncmp(a->data, tbl[k].s, tbl[k].n) == 0)
|
||||
{
|
||||
return 1;
|
||||
}
|
||||
} else {
|
||||
if (a->len > tbl[k].n
|
||||
&& ngx_strncmp(a->data, tbl[k].s, tbl[k].n) == 0)
|
||||
{
|
||||
return 1;
|
||||
}
|
||||
}
|
||||
}
|
||||
return 0;
|
||||
}
|
||||
|
||||
|
||||
/* The wrapper extracts device= and ipng_source_tag= from cf->args, compacting
|
||||
* the array in place, then calls the original ngx_http_core_module
|
||||
* listen handler. After a successful call it records a binding in
|
||||
@@ -580,24 +624,59 @@ ngx_http_ipng_stats_listen_wrapper(ngx_conf_t *cf, ngx_command_t *cmd,
|
||||
* nginx's handler runs once (creating the socket), subsequent
|
||||
* same-sockaddr listens just register additional device
|
||||
* bindings. Across different server blocks the same sockaddr
|
||||
* re-appears and nginx merges the cscf via ngx_http_add_server. */
|
||||
* re-appears and nginx merges the cscf via ngx_http_add_server.
|
||||
*
|
||||
* The `sockaddr_seen_elsewhere` flag catches the
|
||||
* shared-listen-include pattern where the SAME sockaddr appears
|
||||
* under multiple server blocks: nginx rejects a second call
|
||||
* carrying socket-level options (reuseport, bind, backlog=, ...)
|
||||
* with "duplicate listen options". Those options belong to the
|
||||
* single kernel socket that backs the sockaddr — the first cscf
|
||||
* has already applied them; stripping them off subsequent
|
||||
* same-sockaddr listens lets the include stay symmetrical. */
|
||||
void *cscf = ngx_http_conf_get_module_srv_conf(cf, ngx_http_core_module);
|
||||
ngx_http_ipng_stats_seen_t *seen = imcf->listens_seen->elts;
|
||||
ngx_uint_t same_cscf_sockaddr = 0;
|
||||
ngx_uint_t sockaddr_seen_elsewhere = 0;
|
||||
for (i = 0; i < imcf->listens_seen->nelts; i++) {
|
||||
if (seen[i].cscf != cscf) continue;
|
||||
if (seen[i].socklen != u.addrs[0].socklen) continue;
|
||||
if (ngx_cmp_sockaddr((struct sockaddr *) &seen[i].sockaddr,
|
||||
seen[i].socklen,
|
||||
u.addrs[0].sockaddr,
|
||||
u.addrs[0].socklen, 1) == NGX_OK)
|
||||
u.addrs[0].socklen, 1) != NGX_OK)
|
||||
{
|
||||
continue;
|
||||
}
|
||||
if (seen[i].cscf == cscf) {
|
||||
same_cscf_sockaddr = 1;
|
||||
break;
|
||||
}
|
||||
sockaddr_seen_elsewhere = 1;
|
||||
}
|
||||
|
||||
if (!same_cscf_sockaddr) {
|
||||
if (sockaddr_seen_elsewhere) {
|
||||
ngx_uint_t stripped = 0;
|
||||
i = 1;
|
||||
while (i < cf->args->nelts) {
|
||||
if (ngx_http_ipng_stats_arg_is_set_triggering(&value[i])) {
|
||||
for (j = i; j + 1 < cf->args->nelts; j++) {
|
||||
value[j] = value[j + 1];
|
||||
}
|
||||
cf->args->nelts--;
|
||||
stripped = 1;
|
||||
continue;
|
||||
}
|
||||
i++;
|
||||
}
|
||||
if (stripped) {
|
||||
ngx_conf_log_error(NGX_LOG_NOTICE, cf, 0,
|
||||
"ipng_stats: stripped socket options from "
|
||||
"duplicate listen on %V — already applied by "
|
||||
"an earlier server block", &value[1]);
|
||||
}
|
||||
}
|
||||
|
||||
rv = ngx_http_core_listen_orig(cf, cmd, conf);
|
||||
if (rv != NGX_CONF_OK) {
|
||||
return rv;
|
||||
|
||||
@@ -31,14 +31,16 @@ Module loads
|
||||
|
||||
Shared-listen-include across multiple server blocks
|
||||
[Documentation] Three server blocks all pull in the same
|
||||
... ipng-listens.inc (see docs/user-guide.md). nginx
|
||||
... must start without "conflicting server name" or
|
||||
... "duplicate listen options" warnings, and the
|
||||
... module must end up with exactly one listening
|
||||
... socket per address family on port 8080 (one for
|
||||
... v4 wildcard, one for v6) — not one per (server
|
||||
... block × device × family), which would exhaust
|
||||
... the fd table on a real host.
|
||||
... ipng-listens.inc (see docs/user-guide.md). The
|
||||
... include also carries `reuseport` on every listen
|
||||
... — nginx core would normally reject the second
|
||||
... server block with "duplicate listen options", but
|
||||
... the wrapper strips socket-level options on a
|
||||
... repeat (cross-cscf) sockaddr so the first cscf
|
||||
... owns the reuseport-cloned socket and the rest
|
||||
... merge cleanly. With worker_processes unset
|
||||
... (default 1), reuseport produces one socket per
|
||||
... (worker × family), i.e. 2 on :8080 here.
|
||||
${output} = Docker Exec ${SERVER} nginx -t 2>&1
|
||||
Should Not Contain ${output} conflicting server name
|
||||
Should Not Contain ${output} duplicate listen
|
||||
@@ -46,6 +48,10 @@ Shared-listen-include across multiple server blocks
|
||||
${count} = Get Regexp Matches ${listens} :8080\\s
|
||||
Length Should Be ${count} 2
|
||||
... Expected 2 listening sockets on port 8080 (v4+v6 wildcards); got ${count}
|
||||
# Proves the cross-cscf option-stripping path actually fired for
|
||||
# the 2nd and 3rd server blocks. `nginx -t` replays the whole
|
||||
# config and emits the wrapper's NOTICE each time it strips.
|
||||
Should Contain ${output} stripped socket options from duplicate listen
|
||||
|
||||
Prometheus scrape
|
||||
[Documentation] Scrape returns HELP/TYPE preamble.
|
||||
|
||||
@@ -5,8 +5,15 @@
|
||||
# file from multiple server blocks to exercise the wrapper's dedup
|
||||
# logic: a naive implementation would either error with "duplicate
|
||||
# listen options" or create N * (devices × families) sockets.
|
||||
#
|
||||
# `reuseport` is present on every listen to exercise the wrapper's
|
||||
# cross-cscf option-stripping path: nginx itself would reject the
|
||||
# second server block's include with "duplicate listen options" if
|
||||
# reuseport weren't stripped from the repeat calls. The first cscf's
|
||||
# listen binds the reuseport-cloned kernel socket; subsequent ones
|
||||
# merge cleanly into it.
|
||||
|
||||
listen 8080 device=eth1 ipng_source_tag=tag1;
|
||||
listen [::]:8080 device=eth1 ipng_source_tag=tag1;
|
||||
listen 8080 device=eth2 ipng_source_tag=tag2-v4;
|
||||
listen [::]:8080 device=eth2 ipng_source_tag=tag2-v6;
|
||||
listen 8080 reuseport device=eth1 ipng_source_tag=tag1;
|
||||
listen [::]:8080 reuseport device=eth1 ipng_source_tag=tag1;
|
||||
listen 8080 reuseport device=eth2 ipng_source_tag=tag2-v4;
|
||||
listen [::]:8080 reuseport device=eth2 ipng_source_tag=tag2-v6;
|
||||
|
||||
@@ -48,9 +48,14 @@ http {
|
||||
# pair, so each server block gets its own cscf attached but no
|
||||
# server block triggers nginx's "duplicate listen options"
|
||||
# check;
|
||||
# * strip socket-level options (reuseport, bind, backlog=, ...)
|
||||
# from cross-cscf repeat sockaddrs — nginx enforces these
|
||||
# at-most-once per sockaddr, and the first cscf already owns
|
||||
# the single kernel socket that the remaining cscfs merge
|
||||
# into;
|
||||
# * dedup bindings globally on (sockaddr, device), so init_module
|
||||
# creates exactly four sockets here (two families × two
|
||||
# devices) rather than 3 × 4 = 12.
|
||||
# creates exactly one binding per (device, family) rather than
|
||||
# one per (server block × device × family).
|
||||
# The default server owns the locations used by the traffic tests;
|
||||
# the two extras exist only to exercise the shared-include pattern.
|
||||
server {
|
||||
|
||||
Reference in New Issue
Block a user