Strip socket options on cross-cscf repeat listens (v0.7.2)
Make the shared-listen-include pattern work with `reuseport` and the other socket-level listen options.

Nginx core enforces at-most-once per sockaddr on options that set lsopt.set=1 (reuseport, bind, backlog=, rcvbuf=, sndbuf=, setfib=, fastopen=, accept_filter=, deferred, ipv6only=, so_keepalive=) and emits "duplicate listen options for <addr>" otherwise. That rule collides with a single listens.conf included from every vhost: each vhost's include re-submits the same options.

The listen wrapper now detects the cross-cscf case, strips those options from cf->args before delegating to the core handler, and logs one notice per stripped listen. The first cscf owns the options on the kernel socket; later cscfs merge cleanly via ngx_http_add_server. Protocol-level flags (ssl, http2, quic, proxy_protocol) pass through untouched since nginx OR-merges those across cscfs.

This unblocks `reuseport` for deployments that want better new-connection spread across workers.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -90,6 +90,14 @@ Each requirement carries a unique identifier (`FR-X.Y` or `NFR-X.Y`) so that lat
`(server block, sockaddr)` across both plain and device-tagged listens: the first occurrence registers the kernel socket,
and subsequent same-sockaddr siblings (plain or device-tagged) are accepted without tripping nginx's duplicate-listen check.
Device-tagged siblings additionally register an entry in the attribution table.
- **FR-1.5a** When the *same* sockaddr appears in a `listen` directive of a *different* `server { ... }` block (the shared
`include` pattern), the wrapper MUST strip socket-level options from `cf->args` before delegating to the core listen handler.
These options (`reuseport`, `bind`, `backlog=`, `rcvbuf=`, `sndbuf=`, `setfib=`, `fastopen=`, `accept_filter=`, `deferred`,
`ipv6only=`, `so_keepalive=`) apply to the one kernel socket that backs the sockaddr, and nginx rejects any attempt to set them
more than once per sockaddr with `duplicate listen options for <addr>` (see `ngx_http_add_addresses` in `src/http/ngx_http.c`).
The first cscf to hit the sockaddr owns the options; subsequent cscfs reach the core handler with the options removed and
merge via `ngx_http_add_server`. Protocol-level flags (`ssl`, `http2`, `quic`, `proxy_protocol`) are preserved on every call
because nginx merges them with OR semantics across cscfs.
- **FR-1.6** A `listen` directive that uses a wildcard address (`80`, `[::]:80`) together with `device=<ifname>` MUST attribute
every connection whose ingress interface is `<ifname>`, regardless of which local address the client addressed, to that
listen's source tag. Traffic on other interfaces MUST fall back to the configured default source (see FR-1.3).
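The stripping rule in FR-1.5a can be pictured as a filter over the directive's arguments. The following is a minimal sketch under stated assumptions, not the module's actual code: it uses plain NUL-terminated C strings in place of nginx's `ngx_str_t` elements in `cf->args`, and the names `socket_opts`, `is_socket_opt`, and `strip_socket_opts` are hypothetical. Options that take a value are matched by prefix, bare flags by exact match, and everything else (protocol-level flags, module parameters) is kept in its original order.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Socket-level listen options listed in FR-1.5a. Entries ending in '='
 * take a value and are matched by prefix; the others must match exactly. */
static const char *const socket_opts[] = {
    "reuseport", "bind", "deferred",
    "backlog=", "rcvbuf=", "sndbuf=", "setfib=", "fastopen=",
    "accept_filter=", "ipv6only=", "so_keepalive=",
};

/* Return 1 if arg is a socket-level option, 0 otherwise. (Hypothetical helper.) */
int is_socket_opt(const char *arg)
{
    size_t i, n;

    for (i = 0; i < sizeof(socket_opts) / sizeof(socket_opts[0]); i++) {
        n = strlen(socket_opts[i]);
        if (socket_opts[i][n - 1] == '=') {
            if (strncmp(arg, socket_opts[i], n) == 0) {
                return 1;
            }
        } else if (strcmp(arg, socket_opts[i]) == 0) {
            return 1;
        }
    }
    return 0;
}

/* Compact argv in place, dropping socket-level options.
 * argv[0] ("listen") and argv[1] (the address) are always kept.
 * Returns the new argument count. (Hypothetical helper.) */
size_t strip_socket_opts(const char **argv, size_t argc)
{
    size_t in, out = 0;

    assert(argv != NULL);
    for (in = 0; in < argc; in++) {
        if (in < 2 || !is_socket_opt(argv[in])) {
            argv[out++] = argv[in];
        }
    }
    return out;
}
```

Under these assumptions, passing `listen 443 ssl reuseport backlog=511` through `strip_socket_opts` would leave `listen 443 ssl`, which is what a later `server` block would hand to the core handler.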
@@ -199,6 +199,31 @@ register bindings without tripping nginx's duplicate-listen check. Traffic arriv
back to `ipng_stats_default_source` (`direct` by default). Keeping "direct" traffic on its own port (e.g.
`listen 198.51.100.1:8081;`) remains a fine pattern when you want a hard split, but it's no longer required.

### Shared includes with `reuseport` (or other socket-level options)

Socket-level `listen` options (`reuseport`, `bind`, `backlog=`, `rcvbuf=`, `sndbuf=`, `setfib=`, `fastopen=`, `accept_filter=`,
`deferred`, `ipv6only=`, `so_keepalive=`) belong to the one kernel socket that backs a given sockaddr, not to a particular
`server { ... }` block. Stock nginx enforces this by accepting them on at most one `listen` per sockaddr and emitting
`duplicate listen options for <addr>` on any repeat. That rule collides with the common deployment pattern of a single
`listens.conf` included from every vhost, because each vhost's `include` re-submits the same options.

The wrapper resolves this transparently. When a sockaddr recurs under a different `server` block than the one that first
registered it, the wrapper strips socket-level options from the incoming `cf->args` before delegating to nginx's core listen
handler. The first `server` block owns the options on the kernel socket (including `reuseport`, which triggers per-worker
socket cloning); later blocks merge cleanly via `ngx_http_add_server` and inherit the same socket. The wrapper logs one
`[notice] ipng_stats: stripped socket options from duplicate listen on <addr>` per stripped listen; this is informational,
not an error. So this include works unchanged across as many vhosts as you like:

```nginx
listen 443 ssl reuseport device=gre-mg1 ipng_source_tag=mg1;
listen [::]:443 ssl reuseport device=gre-mg1 ipng_source_tag=mg1;
```

`reuseport` noticeably helps worker load-balancing on busy hosts: without it, a single shared listening socket forces workers
to compete for accepts and traffic routinely concentrates on one or two workers. HTTP/2 and long-lived keepalive connections
can still skew CPU toward whichever worker holds a few heavy clients (`reuseport` does not reshuffle existing connections),
but new-connection distribution across workers becomes kernel-hashed instead of first-ready-wins.

## 4. Verify with curl

Generate some traffic (or wait for real traffic), then scrape the endpoint locally: