Support multiple device-pinned listens sharing a single port

Nginx's config-level duplicate-listen check rejected the documented pattern of `listen 80 device=X ipng_source_tag=A; listen 80 device=Y ipng_source_tag=B;` with "a duplicate listen 0.0.0.0:80", and even when the dedup was bypassed the kernel refused the second bind() because the first socket was already holding the port without SO_BINDTODEVICE.

The listen wrapper now detects same-sockaddr duplicates before the core handler sees them and records them with `needs_clone=1`. In init_module, phase 1 clones an ngx_listening_t for each such duplicate, phase 3 closes every inherited naked fd, and phase 4 rebinds every target with SO_REUSEADDR + SO_REUSEPORT + SO_BINDTODEVICE set before bind(). SO_REUSEPORT keeps `nginx -s reload` from colliding with the still-bound sockets held by old workers during graceful drain; IPV6_V6ONLY matches nginx's default so the IPv6 listen doesn't claim the IPv4 wildcard and collide with sibling IPv4-specific listens.

Restructure 01-module to cover the pattern end-to-end: four device-pinned listens on port 8080 (eth1 shares tag `tag1` across v4 and v6; eth2 splits into `tag2-v4` / `tag2-v6`), clients and server both get IPv6 addresses, and a new "Per-(device, family) request count accuracy" case proves that 10 requests on each of the four combinations yields tag1=20, tag2-v4=10, tag2-v6=10. Mgmt/direct traffic moves to port 9180 so it no longer clashes with the shared-port wildcards.

Document the constraint in docs/user-guide.md: all listens on a given port must carry `device=`, and direct traffic belongs on a separate port.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
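
Reviewer sketch: the option ordering described for phase 4 can be illustrated in Python. This is not the module's C code; `listen_sockopts` and `open_listener` are hypothetical names, and the fallback value 25 for `SO_BINDTODEVICE` is an assumption that holds on common Linux architectures. Binding to a device may also need elevated privileges on older kernels.

```python
import socket

# Assumption: 25 is the Linux value, used only if this Python build
# does not export the constant.
SO_BINDTODEVICE = getattr(socket, "SO_BINDTODEVICE", 25)


def listen_sockopts(family, device=None):
    """Return the (level, option, value) triples to set before bind()."""
    opts = [
        (socket.SOL_SOCKET, socket.SO_REUSEADDR, 1),
        # Lets a new process bind while old workers still hold the port.
        (socket.SOL_SOCKET, socket.SO_REUSEPORT, 1),
    ]
    if family == socket.AF_INET6:
        # Keep the IPv6 socket from also claiming the IPv4 wildcard.
        opts.append((socket.IPPROTO_IPV6, socket.IPV6_V6ONLY, 1))
    if device is not None:
        opts.append((socket.SOL_SOCKET, SO_BINDTODEVICE, device.encode()))
    return opts


def open_listener(family, addr, port, device=None, backlog=128):
    """Create a TCP listener with every option applied before bind()."""
    sock = socket.socket(family, socket.SOCK_STREAM)
    try:
        for level, name, value in listen_sockopts(family, device):
            sock.setsockopt(level, name, value)
        sock.bind((addr, port))
        sock.listen(backlog)
    except OSError:
        sock.close()
        raise
    return sock
```

Under these assumptions, something like `open_listener(socket.AF_INET, "0.0.0.0", 8080, device="eth1")` followed by the same call with `device="eth2"` mirrors the shared-port pattern; keeping the option list in a separate pure function makes the ordering easy to inspect without opening sockets.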
@@ -92,7 +92,7 @@ ip -6 -s link show gre-mg1
 The plugin needs three things in `nginx.conf`:

 1. A shared-memory zone for counters (`ipng_stats_zone`).
-2. A set of `listen` directives — a wildcard fallback plus one device-bound listener per attributed interface.
+2. One device-bound `listen` directive per attributed (interface, address family) pair.
 3. A scrape location serving the `ipng_stats` handler.

 A minimal working configuration looks like this:
@@ -109,17 +109,25 @@ http {
 ipng_stats_flush_interval 1s;
 ipng_stats_default_source direct;

-# A normal vhost. The fallback listen lines serve direct web traffic;
-# the included file adds one device-bound listen per attributed interface.
+# Attributed vhost. Every listen on this port must be device-tagged —
+# see "All listens on a shared port must be device-tagged" below.
 server {
-listen 80;
-listen [::]:80;
 include /etc/nginx/ipng-stats/listens.conf;

 server_name _;
 root /var/www/html;
 }

+# Direct (un-attributed) traffic on a separate port — the listen has no
+# device=, so requests get the `ipng_stats_default_source` tag.
+server {
+listen 198.51.100.1:8081 default_server;
+listen [2001:db8::1]:8081 default_server;
+
+server_name _;
+root /var/www/html;
+}
+
 # A second server block exposing the scrape endpoint on a locked-down port.
 server {
 listen 127.0.0.1:9113;
@@ -165,12 +173,30 @@ You do not need to enumerate VIPs in `listen`. A wildcard `listen 80 device=gre-
 served through the `gre-mg1` interface, and nginx routes per-request to the right vhost by `server_name` / `Host:` header. Adding a new
 VIP is a `server_name` change; adding a new interface is an append to `listens.conf`.

-### Why both a wildcard and device-bound listens?
+### All listens on a shared port must be device-tagged

-The fallback `listen 80;` / `listen [::]:80;` catches traffic arriving on any interface that isn't one of your attributed interfaces —
-for example, real clients hitting your host directly over `eth0`. The kernel's TCP socket lookup prefers the most-specific
-(device-matching) listener, so a SYN on `gre-mg1` always lands on the `mg1` socket, and a SYN on `eth0` always lands on the fallback.
-No races, no stealing. Direct traffic is counted under the tag set by `ipng_stats_default_source` (`direct` by default).
+If you use multiple `listen` directives on the same port (e.g. port 80), **every one of them must carry `device=<ifname>`**. Mixing a
+device-pinned listen with a plain `listen 80;` or with an address-specific `listen 192.0.2.1:80;` on the same port is **not
+supported** and nginx will fail to start. This is a kernel-level limitation: a device-pinned socket sets `SO_BINDTODEVICE` before
+`bind(2)`, while a plain wildcard socket sets no device filter — Linux refuses to hold both on the same `(addr, port)` tuple, so
+the second bind fails with `EADDRINUSE` regardless of what the nginx config-level dedup might do.
+
+For "direct" traffic — clients hitting the host on a non-attributed interface — use a **separate port** on the direct interface
+(e.g. `listen 198.51.100.1:8081;`). That listen then has no `device=`, so it falls back to the tag set by
+`ipng_stats_default_source` (`direct` by default).
+
+### Sharing a single port across address families and devices
+
+Within the device-tagged set, you're free to share port numbers freely across devices and address families: as long as each listen
+has a distinct `device=`, the kernel keeps them apart, and within one device you can either reuse a single tag or split by family.
+For example:
+
+```nginx
+listen 80 device=gre-mg1 ipng_source_tag=mg1;
+listen [::]:80 device=gre-mg1 ipng_source_tag=mg1; # same tag across families
+listen 80 device=gre-mg2 ipng_source_tag=mg2-v4;
+listen [::]:80 device=gre-mg2 ipng_source_tag=mg2-v6; # per-family tags
+```
+
 ## 4. Verify with curl

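
Reviewer sketch: the "every listen on a shared port must carry `device=`" rule documented in this hunk could be checked before a reload with a small lint pass. The helper below is hypothetical and not part of the plugin or of nginx. It assumes one `listen` directive per line, does not follow `include`, and pools all server blocks together.

```python
import re
from collections import defaultdict

# Matches "listen <address> [params...];" at the start of a line.
LISTEN_RE = re.compile(r"^\s*listen\s+(\S+)([^;#]*);")


def port_of(address):
    """Extract the port from "80", "[::]:80" or "198.51.100.1:8081"."""
    tail = address.rsplit(":", 1)[-1]
    # nginx uses port 80 when only an address is given.
    return int(tail) if tail.isdigit() else 80


def mixed_ports(config_text):
    """Return sorted ports that mix device= listens with untagged ones."""
    kinds = defaultdict(set)
    for line in config_text.splitlines():
        match = LISTEN_RE.match(line)
        if not match:
            continue
        address, params = match.groups()
        if address.startswith("unix:"):
            continue  # UNIX-domain sockets have no port
        kinds[port_of(address)].add("device=" in params)
    return sorted(p for p, seen in kinds.items() if seen == {True, False})
```

An empty result means no port mixes tagged and untagged listens; any port it returns is one where, per the text above, nginx would be expected to fail at startup.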