Strip socket options on cross-cscf repeat listens (v0.7.2)

Make the shared-listen-include pattern work with `reuseport` and the
other socket-level listen options. Nginx core enforces at-most-once
per sockaddr on options that set lsopt.set=1 (reuseport, bind,
backlog=, rcvbuf=, sndbuf=, setfib=, fastopen=, accept_filter=,
deferred, ipv6only=, so_keepalive=) and emits "duplicate listen
options for <addr>" otherwise. That rule collides with a single
listens.conf included from every vhost — each vhost's include
re-submits the same options.

The listen wrapper now detects the cross-cscf case, strips those
options from cf->args before delegating to the core handler, and
logs one notice per stripped listen. The first cscf owns the
options on the kernel socket; later cscfs merge cleanly via
ngx_http_add_server. Protocol-level flags (ssl, http2, quic,
proxy_protocol) pass through untouched since nginx OR-merges those
across cscfs.

This unblocks `reuseport` for deployments that want better
new-connection spread across workers.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 16:23:58 +02:00
parent badb684431
commit 7ed77f5b22
8 changed files with 168 additions and 18 deletions

@@ -90,6 +90,14 @@ Each requirement carries a unique identifier (`FR-X.Y` or `NFR-X.Y`) so that lat
`(server block, sockaddr)` across both plain and device-tagged listens: the first occurrence registers the kernel socket,
and subsequent same-sockaddr siblings (plain or device-tagged) are accepted without tripping nginx's duplicate-listen check.
Device-tagged siblings additionally register an entry in the attribution table.
- **FR-1.5a** When the *same* sockaddr appears in a `listen` directive in a *different* `server { ... }` block (the shared `include` pattern),
the wrapper MUST strip socket-level options from `cf->args` before delegating to the core listen handler. These options
(`reuseport`, `bind`, `backlog=`, `rcvbuf=`, `sndbuf=`, `setfib=`, `fastopen=`, `accept_filter=`, `deferred`, `ipv6only=`,
`so_keepalive=`) apply to the one kernel socket that backs the sockaddr and nginx rejects any attempt to set them more
than once per sockaddr with `duplicate listen options for <addr>` (see `ngx_http_add_addresses` in `src/http/ngx_http.c`).
The first cscf to hit the sockaddr owns the options; subsequent cscfs reach the core handler with the options removed and
merge via `ngx_http_add_server`. Protocol-level flags (`ssl`, `http2`, `quic`, `proxy_protocol`) are preserved on every call
because nginx merges them with OR semantics across cscfs.
- **FR-1.6** A `listen` directive that uses a wildcard address (`80`, `[::]:80`) together with `device=<ifname>` MUST attribute
every connection whose ingress interface is `<ifname>` — regardless of which local address the client addressed — to that
listen's source tag. Traffic on other interfaces MUST fall back to the configured default source (see FR-1.3).

@@ -199,6 +199,31 @@ register bindings without tripping nginx's duplicate-listen check. Traffic arriv
back to `ipng_stats_default_source` (`direct` by default). Keeping "direct" traffic on its own port — e.g.
`listen 198.51.100.1:8081;` — remains a fine pattern when you want a hard split, but it's no longer required.
### Shared includes with `reuseport` (or other socket-level options)

Socket-level `listen` options (`reuseport`, `bind`, `backlog=`, `rcvbuf=`, `sndbuf=`, `setfib=`, `fastopen=`, `accept_filter=`,
`deferred`, `ipv6only=`, `so_keepalive=`) belong to the one kernel socket that backs a given sockaddr, not to a particular
`server { ... }` block. Stock nginx enforces this by accepting them on at most one `listen` per sockaddr and emitting
`duplicate listen options for <addr>` on any repeat. That rule collides with the common deployment pattern of a single
`listens.conf` included from every vhost, because each vhost's `include` re-submits the same options.

The wrapper resolves this transparently. When a sockaddr recurs under a different `server` block than the one that first
registered it, the wrapper strips socket-level options from the incoming `cf->args` before delegating to nginx's core listen
handler. The first `server` block owns the options on the kernel socket (including `reuseport`, which triggers per-worker
socket cloning); later blocks merge cleanly via `ngx_http_add_server` and inherit the same socket. The wrapper logs one
`[notice] ipng_stats: stripped socket options from duplicate listen on <addr>` per stripped listen; this is informational,
not an error. So this include works unchanged across as many vhosts as you like:
```nginx
listen 443 ssl reuseport device=gre-mg1 ipng_source_tag=mg1;
listen [::]:443 ssl reuseport device=gre-mg1 ipng_source_tag=mg1;
```
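
The stripping rule can be modelled in a few lines. The sketch below is Python, not the module's C source, and rests on several assumptions: the function names, the `owners` dict, and the use of the first `listen` argument as the sockaddr key are purely illustrative (the real wrapper operates on `cf->args` and a parsed sockaddr). It covers only the cross-`server` case; repeats inside one `server` block and the module's own `device=` / `ipng_source_tag=` parameters are not modelled.

```python
# Illustrative model only: names and data structures here are hypothetical.

# Socket-level options as listed in the text above.
SOCKET_LEVEL_FLAGS = {"reuseport", "bind", "deferred"}
SOCKET_LEVEL_PREFIXES = (
    "backlog=", "rcvbuf=", "sndbuf=", "setfib=", "fastopen=",
    "accept_filter=", "ipv6only=", "so_keepalive=",
)


def is_socket_level(arg):
    """True if a listen argument configures the kernel socket itself."""
    return arg in SOCKET_LEVEL_FLAGS or arg.startswith(SOCKET_LEVEL_PREFIXES)


def strip_repeat_listen(args, owners, server_id):
    """Return (args_to_delegate, stripped) for one listen directive.

    args      -- listen arguments, address first (assumed to identify the sockaddr)
    owners    -- dict mapping sockaddr -> id of the server block that registered it first
    server_id -- any hashable id for the current server block
    """
    addr = args[0]
    owner = owners.setdefault(addr, server_id)
    if owner == server_id:
        # First owner (or a repeat within the same block): pass through unchanged.
        return list(args), False
    kept = [a for a in args if not is_socket_level(a)]
    return kept, len(kept) != len(args)


if __name__ == "__main__":
    owners = {}
    shared_include = ["443", "ssl", "reuseport"]
    for vhost in ("vhost-a", "vhost-b", "vhost-c"):
        delegated, stripped = strip_repeat_listen(shared_include, owners, vhost)
        print(vhost, delegated, "stripped" if stripped else "unchanged")
```

In this model the protocol-level `ssl` flag survives on every call, while `reuseport` reaches the core handler only from the first `server` block, which matches the behaviour described above.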
`reuseport` noticeably helps worker load-balancing on busy hosts: without it, a single shared listening socket forces workers
to compete for accepts and traffic routinely concentrates on one or two workers. HTTP/2 and long-lived keepalive connections
can still skew CPU toward whichever worker holds a few heavy clients — `reuseport` does not reshuffle existing connections —
but new-connection distribution across workers becomes kernel-hashed, not first-ready-wins.
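
For readers unfamiliar with the kernel mechanism behind `reuseport`, the following is a minimal sketch of what it permits: several listening sockets bound to the same address and port, each with its own accept queue. It assumes Linux (or another platform exposing `SO_REUSEPORT`) and uses a hypothetical helper name; it demonstrates the socket option only, not how nginx sets up its workers.

```python
import socket


def open_reuseport_listeners(count, host="127.0.0.1"):
    """Open `count` TCP listeners that all share one port via SO_REUSEPORT."""
    listeners = []
    port = 0  # let the kernel choose a free port for the first socket
    for _ in range(count):
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        # Every socket in the group must set SO_REUSEPORT before bind().
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
        s.bind((host, port))
        s.listen(16)
        port = s.getsockname()[1]  # later sockets reuse the same port
        listeners.append(s)
    return listeners


if __name__ == "__main__":
    socks = open_reuseport_listeners(4)
    print("listeners sharing port:", {s.getsockname()[1] for s in socks})
    for s in socks:
        s.close()
```

Without the `setsockopt` call, the second `bind()` to the same port would fail with "address already in use"; with it, the kernel spreads incoming connections across the sockets in the group.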
## 4. Verify with curl

Generate some traffic (or wait for real traffic), then scrape the endpoint locally: