Support multiple device-pinned listens sharing a single port

Nginx's config-level duplicate-listen check rejected the
documented pattern of `listen 80 device=X ipng_source_tag=A;
listen 80 device=Y ipng_source_tag=B;` with "a duplicate listen
0.0.0.0:80", and even when the dedup was bypassed the kernel
refused the second bind() because the first socket was already
holding the port without SO_BINDTODEVICE.

The listen wrapper now detects same-sockaddr duplicates before
the core handler sees them and records them with `needs_clone=1`.
In init_module, phase 1 clones an ngx_listening_t for each such
duplicate, phase 3 closes every inherited naked fd, and phase 4
rebinds every target with SO_REUSEADDR + SO_REUSEPORT +
SO_BINDTODEVICE set before bind(). SO_REUSEPORT keeps
`nginx -s reload` from colliding with the still-bound sockets
held by old workers during graceful drain; IPv6 sockets also
get IPV6_V6ONLY, matching nginx's default, so the IPv6 listen
doesn't claim the IPv4 wildcard and collide with sibling
IPv4-specific listens.

Restructure 01-module to cover the pattern end-to-end: four
device-pinned listens on port 8080 (eth1 shares tag `tag1`
across v4 and v6; eth2 splits into `tag2-v4` / `tag2-v6`),
clients and server both get IPv6 addresses, and a new
"Per-(device, family) request count accuracy" case proves that
10 requests on each of the four combinations yield tag1=20,
tag2-v4=10, tag2-v6=10. Mgmt/direct traffic moves to port 9180
so it no longer clashes with the shared-port wildcards.

Document the constraint in docs/user-guide.md: all listens on
a given port must carry `device=`, and direct traffic belongs
on a separate port.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 11:45:40 +02:00
parent fdef2a552b
commit df05bae8a3
7 changed files with 400 additions and 153 deletions


@@ -92,7 +92,7 @@ ip -6 -s link show gre-mg1
The plugin needs three things in `nginx.conf`:
1. A shared-memory zone for counters (`ipng_stats_zone`).
2. A set of `listen` directives — a wildcard fallback plus one device-bound listener per attributed interface.
2. One device-bound `listen` directive per attributed (interface, address family) pair.
3. A scrape location serving the `ipng_stats` handler.
A minimal working configuration looks like this:
@@ -109,17 +109,25 @@ http {
ipng_stats_flush_interval 1s;
ipng_stats_default_source direct;
# A normal vhost. The fallback listen lines serve direct web traffic;
# the included file adds one device-bound listen per attributed interface.
# Attributed vhost. Every listen on this port must be device-tagged —
# see "All listens on a shared port must be device-tagged" below.
server {
listen 80;
listen [::]:80;
include /etc/nginx/ipng-stats/listens.conf;
server_name _;
root /var/www/html;
}
# Direct (un-attributed) traffic on a separate port — the listen has no
# device=, so requests get the `ipng_stats_default_source` tag.
server {
listen 198.51.100.1:8081 default_server;
listen [2001:db8::1]:8081 default_server;
server_name _;
root /var/www/html;
}
# A second server block exposing the scrape endpoint on a locked-down port.
server {
listen 127.0.0.1:9113;
@@ -165,12 +173,30 @@ You do not need to enumerate VIPs in `listen`. A wildcard `listen 80 device=gre-
served through the `gre-mg1` interface, and nginx routes per-request to the right vhost by `server_name` / `Host:` header. Adding a new
VIP is a `server_name` change; adding a new interface is an append to `listens.conf`.
### Why both a wildcard and device-bound listens?
### All listens on a shared port must be device-tagged
The fallback `listen 80;` / `listen [::]:80;` catches traffic arriving on any interface that isn't one of your attributed interfaces —
for example, real clients hitting your host directly over `eth0`. The kernel's TCP socket lookup prefers the most-specific
(device-matching) listener, so a SYN on `gre-mg1` always lands on the `mg1` socket, and a SYN on `eth0` always lands on the fallback.
No races, no stealing. Direct traffic is counted under the tag set by `ipng_stats_default_source` (`direct` by default).
If you use multiple `listen` directives on the same port (e.g. port 80), **every one of them must carry `device=<ifname>`**. Mixing a
device-pinned listen with a plain `listen 80;` or with an address-specific `listen 192.0.2.1:80;` on the same port is **not
supported** and nginx will fail to start. This is a kernel-level limitation: a device-pinned socket sets `SO_BINDTODEVICE` before
`bind(2)`, while a plain wildcard socket sets no device filter — Linux refuses to hold both on the same `(addr, port)` tuple, so
the second bind fails with `EADDRINUSE` regardless of what the nginx config-level dedup might do.
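
The ordering described above (socket options first, `bind(2)` second) can be sketched in Python. This is an illustration only, not the module's C code: `listener_sockopts` and `open_device_listener` are hypothetical names, the option set is the one listed in the commit message, and opening the socket may need `CAP_NET_RAW` on older kernels.

```python
import socket


def listener_sockopts(device, family):
    """Return the (level, option, value) tuples to apply before bind(), in order."""
    opts = [
        (socket.SOL_SOCKET, socket.SO_REUSEADDR, 1),
        (socket.SOL_SOCKET, socket.SO_REUSEPORT, 1),
        # Device filter: only traffic arriving on `device` matches this socket.
        (socket.SOL_SOCKET, socket.SO_BINDTODEVICE, device.encode()),
    ]
    if family == socket.AF_INET6:
        # Stop the IPv6 wildcard from also claiming the IPv4 wildcard.
        opts.append((socket.IPPROTO_IPV6, socket.IPV6_V6ONLY, 1))
    return opts


def open_device_listener(device, port, family=socket.AF_INET):
    """Create a wildcard TCP listener pinned to `device` (hypothetical helper)."""
    sock = socket.socket(family, socket.SOCK_STREAM)
    try:
        # Every option is set before bind(), so the kernel sees the device
        # filter at the moment it checks for address conflicts.
        for level, option, value in listener_sockopts(device, family):
            sock.setsockopt(level, option, value)
        sock.bind(("::" if family == socket.AF_INET6 else "0.0.0.0", port))
        sock.listen()
    except OSError:
        sock.close()
        raise
    return sock
```
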
For "direct" traffic — clients hitting the host on a non-attributed interface — use a **separate port** on the direct interface
(e.g. `listen 198.51.100.1:8081;`). That listen then has no `device=`, so it falls back to the tag set by
`ipng_stats_default_source` (`direct` by default).
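
One way to catch a mixed port before nginx refuses to start is a small pre-flight check over the config text. The sketch below is a hypothetical helper, not part of the plugin; it assumes every `listen` directive sits on its own line and spells out its port explicitly.

```python
import re
from collections import defaultdict

# Captures the port in "listen 80", "listen [::]:80" or "listen 192.0.2.1:80",
# plus the remaining parameters up to the terminating semicolon.
_LISTEN_RE = re.compile(r"^\s*listen\s+(?:\S*:)?(\d+)\b([^;]*);")


def find_mixed_ports(conf_text):
    """Return ports that carry both device-tagged and untagged listens."""
    by_port = defaultdict(lambda: {"device": 0, "plain": 0})
    for line in conf_text.splitlines():
        match = _LISTEN_RE.match(line)
        if not match:
            continue
        port, params = int(match.group(1)), match.group(2)
        kind = "device" if "device=" in params else "plain"
        by_port[port][kind] += 1
    return sorted(
        port for port, count in by_port.items()
        if count["device"] and count["plain"]
    )
```

Run against the concatenated `nginx.conf` and included files, an empty result means no port mixes the two kinds of listen.
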
### Sharing a single port across address families and devices
Within the device-tagged set, you can share one port across devices and address families: as long as each listen
has a distinct (`device=`, address family) pair, the kernel keeps them apart, and within one device you can either reuse a single tag or split by family.
For example:
```nginx
listen 80 device=gre-mg1 ipng_source_tag=mg1;
listen [::]:80 device=gre-mg1 ipng_source_tag=mg1; # same tag across families
listen 80 device=gre-mg2 ipng_source_tag=mg2-v4;
listen [::]:80 device=gre-mg2 ipng_source_tag=mg2-v6; # per-family tags
```
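
If the interface list changes often, `listens.conf` can be generated rather than edited by hand. The following is a minimal sketch under that assumption; `render_listens` is a hypothetical helper, and it emits exactly one IPv4 and one IPv6 wildcard listen per device in the format shown above.

```python
def render_listens(port, interfaces):
    """Render device-bound listen lines.

    `interfaces` is an iterable of (device, v4_tag, v6_tag) tuples; pass the
    same tag twice to share one tag across both address families.
    """
    lines = []
    for device, v4_tag, v6_tag in interfaces:
        lines.append(f"listen {port} device={device} ipng_source_tag={v4_tag};")
        lines.append(f"listen [::]:{port} device={device} ipng_source_tag={v6_tag};")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_listens(80, [
        ("gre-mg1", "mg1", "mg1"),        # same tag across families
        ("gre-mg2", "mg2-v4", "mg2-v6"),  # per-family tags
    ]), end="")
```
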
## 4. Verify with curl