nginx-ipng-stats-plugin/docs/user-guide.md
Pim van Pelt df05bae8a3 Support multiple device-pinned listens sharing a single port
Nginx's config-level duplicate-listen check rejected the
documented pattern of `listen 80 device=X ipng_source_tag=A;
listen 80 device=Y ipng_source_tag=B;` with "a duplicate listen
0.0.0.0:80", and even when the dedup was bypassed the kernel
refused the second bind() because the first socket was already
holding the port without SO_BINDTODEVICE.

The listen wrapper now detects same-sockaddr duplicates before
the core handler sees them and records them with `needs_clone=1`.
In init_module, phase 1 clones an ngx_listening_t for each such
duplicate, phase 3 closes every inherited naked fd, and phase 4
rebinds every target with SO_REUSEADDR + SO_REUSEPORT +
SO_BINDTODEVICE set before bind(). SO_REUSEPORT keeps
`nginx -s reload` from colliding with the still-bound sockets
held by old workers during graceful drain; IPV6_V6ONLY matches
nginx's default so the IPv6 listen doesn't claim the IPv4
wildcard and collide with sibling IPv4-specific listens.

Restructure 01-module to cover the pattern end-to-end: four
device-pinned listens on port 8080 (eth1 shares tag `tag1`
across v4 and v6; eth2 splits into `tag2-v4` / `tag2-v6`),
clients and server both get IPv6 addresses, and a new
"Per-(device, family) request count accuracy" case proves that
10 requests on each of the four combinations yields tag1=20,
tag2-v4=10, tag2-v6=10. Mgmt/direct traffic moves to port 9180
so it no longer clashes with the shared-port wildcards.

Document the constraint in docs/user-guide.md: all listens on
a given port must carry `device=`, and direct traffic belongs
on a separate port.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 11:45:40 +02:00


nginx-ipng-stats-plugin — User Guide

This document walks an operator through installing the plugin, deploying it on a single nginx host serving traffic that arrives on distinct interfaces (GRE tunnels, VLANs, bonded links, or plain ethernet), verifying that counters are flowing, and hooking up the scrape endpoint to Prometheus and other consumers.

It covers (NFR-7.1):

  1. Installing the Debian package.
  2. Setting up interfaces for per-device attribution (GRE tunnel example).
  3. Writing a minimal nginx configuration.
  4. Verifying with curl.
  5. Scraping from Prometheus.
  6. Setting up a global logtail access log.
  7. Integrating with scrape consumers.

For a directive-by-directive reference, read config-guide.md alongside this guide.

1. Install the package

On Debian Trixie (and newer), the module is distributed as libnginx-mod-http-ipng-stats. The package depends on the stock nginx package and loads cleanly into it without recompiling nginx itself.

sudo apt install ./libnginx-mod-http-ipng-stats_*_amd64.deb

The package will:

  • Drop ngx_http_ipng_stats_module.so into /usr/lib/nginx/modules/.
  • Place a load_module stanza in /etc/nginx/modules-available/50-mod-http-ipng-stats.conf.
  • Symlink it into /etc/nginx/modules-enabled/ so nginx picks it up on the next reload.
  • Run nginx -t and, if the test fails, remove the modules-enabled symlink and print a warning — so a broken upgrade never leaves you with an nginx that cannot start.

Confirm the module is loaded:

nginx -V 2>&1 | grep -o ngx_http_ipng_stats_module

2. Set up interfaces for per-device attribution

The plugin attributes traffic by watching which interface the request came in on, using SO_BINDTODEVICE on per-interface listening sockets. For this to work, each traffic source that should be tracked separately MUST arrive on its own interface.

This works with any kind of Linux interface — GRE tunnels, VLANs, VXLANs, bonded links, or plain ethernet. This guide uses GRE tunnels as the example, but the module does not care about the interface type.

This guide doesn't prescribe a specific networking layer — use whatever your host already uses (systemd-networkd, Netplan, /etc/network/interfaces, or a hand-rolled script). The only hard requirement is:

  • Each traffic source that should be separately attributed gets its own interface on the nginx host.
  • Interfaces follow a consistent naming pattern. For GRE tunnels we recommend gre-<tag>, e.g. gre-mg1, gre-mg2.
  • The VIPs are bound to a local dummy or loopback interface so the kernel accepts packets destined for them.

For example, with systemd-networkd, a GRE tunnel to a remote peer at 2001:db8::1 from this host at 2001:db8::100 looks like:

# /etc/systemd/network/10-gre-mg1.netdev
[NetDev]
Name=gre-mg1
Kind=ip6gre

[Tunnel]
Local=2001:db8::100
Remote=2001:db8::1
TTL=64

# /etc/systemd/network/10-gre-mg1.network
[Match]
Name=gre-mg1

[Network]
LinkLocalAddressing=no

Repeat for each additional tunnel. A trimmed-down variant of this scheme is what IPng uses in production.

Verify the interfaces exist and carry traffic:

ip -6 tunnel show | grep gre-mg
ip -6 -s link show gre-mg1

3. Write the nginx configuration

The plugin needs three things in nginx.conf:

  1. A shared-memory zone for counters (ipng_stats_zone).
  2. One device-bound listen directive per attributed (interface, address family) pair.
  3. A scrape location serving the ipng_stats handler.

A minimal working configuration looks like this:

load_module modules/ngx_http_ipng_stats_module.so;

events {
    worker_connections 4096;
}

http {
    ipng_stats_zone ipng:4m;
    ipng_stats_flush_interval 1s;
    ipng_stats_default_source direct;

    # Attributed vhost. Every listen on this port must be device-tagged —
    # see "All listens on a shared port must be device-tagged" below.
    server {
        include /etc/nginx/ipng-stats/listens.conf;

        server_name _;
        root /var/www/html;
    }

    # Direct (un-attributed) traffic on a separate port — the listen has no
    # device=, so requests get the `ipng_stats_default_source` tag.
    server {
        listen 198.51.100.1:8081 default_server;
        listen [2001:db8::1]:8081 default_server;

        server_name _;
        root /var/www/html;
    }

    # A second server block exposing the scrape endpoint on a locked-down port.
    server {
        listen 127.0.0.1:9113;
        listen [::1]:9113;

        location = /.well-known/ipng/statsz {
            ipng_stats;
            allow 127.0.0.1;
            allow ::1;
            allow 2001:db8::/48;   # your scrape consumers
            deny all;
        }
    }
}

And /etc/nginx/ipng-stats/listens.conf — the hand-maintained include file — is two lines per attributed interface (one per address family):

listen 80      device=gre-mg1 ipng_source_tag=mg1;
listen [::]:80 device=gre-mg1 ipng_source_tag=mg1;
listen 80      device=gre-mg2 ipng_source_tag=mg2;
listen [::]:80 device=gre-mg2 ipng_source_tag=mg2;
listen 80      device=gre-mg3 ipng_source_tag=mg3;
listen [::]:80 device=gre-mg3 ipng_source_tag=mg3;
listen 80      device=gre-mg4 ipng_source_tag=mg4;
listen [::]:80 device=gre-mg4 ipng_source_tag=mg4;

Test and reload:

sudo nginx -t
sudo nginx -s reload

If nginx -t complains about an unknown listen parameter (device= or ipng_source_tag=), the module isn't loaded — check step 1.

Why wildcard listens?

You do not need to enumerate VIPs in listen. A wildcard listen 80 device=gre-mg1 ipng_source_tag=mg1; accepts any local address served through the gre-mg1 interface, and nginx routes per-request to the right vhost by server_name / Host: header. Adding a new VIP is a server_name change; adding a new interface is an append to listens.conf.

All listens on a shared port must be device-tagged

If you use multiple listen directives on the same port (e.g. port 80), every one of them must carry device=<ifname>. Mixing a device-pinned listen with a plain listen 80; or with an address-specific listen 192.0.2.1:80; on the same port is not supported and nginx will fail to start. This is a kernel-level limitation: a device-pinned socket sets SO_BINDTODEVICE before bind(2), while a plain wildcard socket sets no device filter — Linux refuses to hold both on the same (addr, port) tuple, so the second bind fails with EADDRINUSE regardless of what the nginx config-level dedup might do.
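
The kernel behavior behind this rule can be demonstrated with two plain sockets. The Python sketch below (assuming Linux) shows only the port-contention half, since SO_BINDTODEVICE itself requires elevated privileges: a second plain bind() to an occupied port fails with EADDRINUSE, while setting SO_REUSEPORT on both sockets lets them coexist, which is the same mechanism the module uses to survive nginx -s reload.

```python
import errno
import socket

def second_bind_result(reuseport: bool) -> str:
    """Bind two TCP sockets to the same (addr, port) and report
    what happens to the second bind()."""
    a = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    b = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        if reuseport:
            # SO_REUSEPORT must be set on both sockets before bind()
            # (Linux-specific; both sockets share the same euid here).
            for s in (a, b):
                s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
        a.bind(("127.0.0.1", 0))          # kernel picks a free port
        port = a.getsockname()[1]
        try:
            b.bind(("127.0.0.1", port))   # contend for the same port
            return "ok"
        except OSError as e:
            return errno.errorcode[e.errno]
    finally:
        a.close()
        b.close()

print(second_bind_result(False))  # EADDRINUSE: plain sockets collide
print(second_bind_result(True))   # ok: SO_REUSEPORT lets both bind
```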

For "direct" traffic — clients hitting the host on a non-attributed interface — use a separate port on the direct interface (e.g. listen 198.51.100.1:8081;). That listen then has no device=, so it falls back to the tag set by ipng_stats_default_source (direct by default).

Sharing a single port across address families and devices

Within the device-tagged set, you can share a port freely across devices and address families: as long as each listen has a distinct device=, the kernel keeps the sockets apart, and within one device you can either reuse a single tag or split by family. For example:

listen 80        device=gre-mg1 ipng_source_tag=mg1;
listen [::]:80   device=gre-mg1 ipng_source_tag=mg1;        # same tag across families
listen 80        device=gre-mg2 ipng_source_tag=mg2-v4;
listen [::]:80   device=gre-mg2 ipng_source_tag=mg2-v6;     # per-family tags

4. Verify with curl

Generate some traffic (or wait for real traffic), then scrape the endpoint locally:

curl -s http://127.0.0.1:9113/.well-known/ipng/statsz

Default output is Prometheus text format:

# HELP nginx_ipng_requests_total Total HTTP requests.
# TYPE nginx_ipng_requests_total counter
nginx_ipng_requests_total{source_tag="mg1",vip="192.0.2.10",code="2xx"} 12345
nginx_ipng_requests_total{source_tag="mg1",vip="192.0.2.10",code="4xx"} 17
nginx_ipng_requests_total{source_tag="mg2",vip="192.0.2.10",code="2xx"} 9876
nginx_ipng_requests_total{source_tag="direct",vip="192.0.2.10",code="2xx"} 42
# HELP nginx_ipng_bytes_in_total Request bytes received.
# TYPE nginx_ipng_bytes_in_total counter
nginx_ipng_bytes_in_total{source_tag="mg1",vip="192.0.2.10",code="2xx"} 9876543
# ... and so on

# Histogram series (request_duration, upstream_response, bytes_in, bytes_out)
# do NOT carry a `code` label — they aggregate across classes per (source, vip).
nginx_ipng_request_duration_seconds_bucket{source_tag="mg1",vip="192.0.2.10",le="0.050"} 11200
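
If a script rather than Prometheus consumes the text format, a minimal parser suffices for this module's output. The sketch below assumes label values never contain commas or escaped quotes (which holds for the source tags, VIPs, and code classes shown above) and that no timestamps are appended:

```python
def parse_prom(text: str) -> list[tuple[str, dict, float]]:
    """Parse Prometheus text exposition lines into
    (metric_name, labels, value) tuples, skipping comments."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name_labels, value = line.rsplit(" ", 1)   # no timestamps expected
        name, _, rest = name_labels.partition("{")
        labels = {}
        if rest:
            # Naive label split: assumes no commas or escaped
            # quotes inside label values.
            for pair in rest.rstrip("}").split(","):
                key, val = pair.split("=", 1)
                labels[key] = val.strip('"')
        samples.append((name, labels, float(value)))
    return samples

line = 'nginx_ipng_requests_total{source_tag="mg1",vip="192.0.2.10",code="2xx"} 12345'
print(parse_prom(line))
# [('nginx_ipng_requests_total',
#   {'source_tag': 'mg1', 'vip': '192.0.2.10', 'code': '2xx'}, 12345.0)]
```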

For JSON output instead, set the Accept header:

curl -s -H 'Accept: application/json' http://127.0.0.1:9113/.well-known/ipng/statsz | jq .

To filter server-side to a single source tag:

curl -s 'http://127.0.0.1:9113/.well-known/ipng/statsz?source_tag=mg1'
curl -s 'http://127.0.0.1:9113/.well-known/ipng/statsz?source_tag=mg1&vip=192.0.2.10'

If you see source_tag="direct" entries with non-zero counts and you expected all traffic to come in via attributed interfaces, something is routing around them — typically an interface that isn't in listens.conf, or an interface that's down.

5. Scrape from Prometheus

The same endpoint serves Prometheus text by default. Add a scrape job:

# /etc/prometheus/prometheus.yml
scrape_configs:
  - job_name: nginx-ipng
    scrape_interval: 15s
    static_configs:
      - targets:
          - 'nginx-backend-1.example.com:9113'
          - 'nginx-backend-2.example.com:9113'
    metrics_path: /.well-known/ipng/statsz

You'll want to add nginx-backend-* to your allow rules in the scrape server block, or front the plugin with a TLS-terminating reverse proxy. The module does not ship its own auth; the nginx allow/deny ACL is your access control.

Typical PromQL queries:

# Requests per second per source, per VIP:
sum by (source_tag, vip) (rate(nginx_ipng_requests_total[1m]))

# 5xx error rate per VIP, aggregated across all sources:
sum by (vip) (rate(nginx_ipng_requests_total{code="5xx"}[5m]))
  /
sum by (vip) (rate(nginx_ipng_requests_total[5m]))

# p95 request duration per (source_tag, vip):
histogram_quantile(0.95,
    sum by (source_tag, vip, le) (rate(nginx_ipng_request_duration_seconds_bucket[5m])))

6. Set up a global logtail access log

Operators who want a single unified access log covering all traffic — regardless of which server block handled the request — normally have to repeat access_log in every server {} block or rely on a catch-all virtual host. The ipng_stats_logtail directive removes that requirement: one line at the http level registers a global log-phase writer that fires unconditionally for every request (FR-8.1).

The logtail is also the recommended escape hatch when you need richer cardinality than the stats zone exposes. The Prometheus counters deliberately collapse HTTP status codes into six class lanes (1xx..5xx/unknown) to keep scrape size bounded. Operators who need per-three-digit-code, per-path, per-user-agent, or any other high-cardinality breakdown should ship the logtail stream to an off-path analytics receiver and compute those views there — that work happens in a different process and never touches the nginx hot path.

The logtail sends each buffer flush as a single UDP datagram to a host:port. Zero disk I/O, no backpressure, no blocking if the receiver is down. This makes it ideal for fire-and-forget analytics pipelines where delivery guarantees are unnecessary and disk writes would add unwanted I/O pressure. For file-based access logging, use nginx's built-in access_log directive.

Define the log format

Add a log_format declaration inside the http { ... } block, before the ipng_stats_logtail directive that references it:

log_format ipng_stats_logtail '$host\t$remote_addr\t$request_method\t$request_uri\t'
                              '$status\t$body_bytes_sent\t'
                              '$ipng_source_tag\t$server_addr\t$scheme';

Any nginx variable is usable here, including $ipng_source_tag (the device attribution tag, FR-6.1), $server_addr (the VIP that received the request), and $scheme (http or https — useful since $server_addr alone doesn't distinguish ports).

Configuration

http {
    ipng_stats_zone ipng:4m;

    log_format ipng_stats_logtail '$host\t$remote_addr\t$request_method\t$request_uri\t'
                                  '$status\t$body_bytes_sent\t'
                                  '$ipng_source_tag\t$server_addr\t$scheme';

    ipng_stats_logtail ipng_stats_logtail udp://127.0.0.1:9514 buffer=16k flush=1s;

    server { ... }
}

  • ipng_stats_logtail (first argument) — the log_format name.
  • udp://127.0.0.1:9514 — destination as a udp://host:port URI. host must be a literal IPv4 address (no hostnames, no IPv6 in v0.1).
  • buffer=16k — per-worker write buffer. Lines are held in memory until the buffer fills, the flush timer fires, or the worker exits. Default is 64k; minimum is 1k (FR-8.3).
  • flush=1s — maximum age of buffered data before it is sent. Default is 1s; minimum is 100ms (FR-8.3).

Each buffer flush becomes a single sendto() on a per-worker SOCK_DGRAM socket. When the flush timer fires (or the buffer fills), the entire buffered payload is sent as one datagram — no file open, no write(), no fsync(). If no receiver is listening, the kernel drops the datagram silently and the worker carries on. This is by design: the logtail exists for non-critical analytics pipes where lost datagrams are acceptable and disk I/O is not.

Constraints (v0.1):

  • host must be a literal IPv4 address. Hostnames and IPv6 are not supported yet.
  • Large buffer= values produce large datagrams. On the loopback interface the practical ceiling is ~64 KB, well above typical configured buffer sizes. On routed paths, path MTU applies.
  • There is no acknowledgment, retry, or sequence number. If the receiver is down, the data is gone.

Filtering with if=

High-frequency requests like health checks can be suppressed from the logtail stream using the if=$variable parameter. Use a map block to define which requests should be logged:

map $request_uri $logtail_enabled {
    ~^/\.well-known/ipng/healthz  0;
    default                       1;
}

ipng_stats_logtail ipng_stats_logtail udp://127.0.0.1:9514 buffer=16k flush=1s if=$logtail_enabled;

Filtered requests are still counted by the stats module — only the logtail output is suppressed. The condition is checked before the log format is rendered, so filtered requests have zero logtail overhead. Multiple conditions can be combined using nested map blocks.
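
As one possible sketch of that combination (the kube-probe pattern is purely illustrative), feed the verdict of the first map into the source string of a second:

```nginx
map $request_uri $logtail_uri_ok {
    ~^/\.well-known/ipng/healthz  0;
    default                       1;
}

# Combine the URI verdict with a user-agent check; $logtail_enabled
# is 1 only when both maps pass.
map "$logtail_uri_ok:$http_user_agent" $logtail_enabled {
    ~^0:            0;   # URI filter already said no
    ~kube-probe     0;   # also suppress kubelet probes (example pattern)
    default         1;
}
```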

See config-guide.md for the full semantics.

Starting a receiver is trivial:

# Quick one-shot inspection:
nc -u -l 127.0.0.1 9514

For a production-ready logtail consumer, see nginx-logtail, which receives the UDP datagram stream and processes it into structured log output.
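
For something slightly more structured than nc, a short Python receiver can split each flush datagram back into individual log lines. This is a sketch assuming the tab-separated format defined above:

```python
import socket

def parse_flush(datagram: bytes) -> list[list[str]]:
    """One flush = one datagram; it carries one or more
    newline-terminated, tab-separated log lines."""
    return [line.split("\t")
            for line in datagram.decode("utf-8", "replace").splitlines()]

def serve(host: str = "127.0.0.1", port: int = 9514) -> None:
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    while True:
        payload, _ = sock.recvfrom(65535)   # covers the ~64 KB ceiling
        for fields in parse_flush(payload):
            print(fields)

# serve()   # blocks forever; run under a supervisor in practice
```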

A typical received log line (with the format above, tab-separated) looks like:

example.com	203.0.113.42	GET	/index.html	200	4321	mg1	192.0.2.10	https

The mg1 field comes from $ipng_source_tag and https from $scheme — free per-device attribution and protocol visibility in every log line.

Why this complements per-server access_log

A conventional nginx access log requires the operator to repeat access_log /path/to/file logtail; in every server {} block that should be captured. This is error-prone: adding a new vhost and forgetting the directive means that vhost's traffic is silently absent from the log. ipng_stats_logtail is installed at the module's log-phase hook, which nginx calls for every request with no per-server configuration required.

See config-guide.md for the full directive reference and FR-8 for the requirements behind this feature.

7. Integrate with scrape consumers

The scrape endpoint (ipng_stats;) serves both Prometheus text and JSON from a single location. Any HTTP client that can issue a GET request can consume it. Two integration patterns are common:

Prometheus

See section 5 above. Prometheus scrapes the endpoint at a configured interval and stores the time series. This is the simplest integration and covers most monitoring and alerting use cases.

Custom consumers

The ?source_tag=<tag> query parameter lets a consumer filter the scrape response to only the traffic attributed to a specific source. This is useful when multiple consumers share the same nginx backends — each consumer scrapes with its own tag and never sees the others' traffic.

The JSON output (Accept: application/json) includes a top-level schema field for versioning, making it straightforward to parse from any language.

Once wired, a consumer can derive from the scrape data:

  • Live QPS per backend (from the EWMA gauges).
  • Status-class mix per backend (the six-lane 1xx..5xx/unknown counter families). Full three-digit codes are not exported by the scrape endpoint; route the logtail stream off-host and aggregate there if you need per-code breakdowns.
  • p50/p95 latency per backend (from the duration histogram, aggregated across classes).
  • Traffic volume per backend (from the bytes counters and the new bytes histograms).

For an example of this pattern in a GRE tunnel fleet, see vpp-maglev, whose frontend scrapes each nginx backend filtered by source tag to show per-backend traffic alongside health state.

Troubleshooting

nginx -t reports "unknown listen parameter: device=" or "unknown listen parameter: ipng_source_tag=". The module isn't loaded. Check /etc/nginx/modules-enabled/ for the 50-mod-http-ipng-stats.conf symlink and re-run nginx -t.

All traffic is attributed to direct even though device-bound interfaces exist. The interface names don't match the device= values in listens.conf, or the interfaces aren't up. Run ip -br link and confirm the interface names match.

Counters reset after every reload. They should survive nginx -s reload. If they don't, check that the ipng_stats_zone name in nginx.conf is stable across reloads — renaming the zone forces a new shared-memory segment.

nginx_ipng_zone_full_events_total is non-zero. The shared-memory zone is too small for your VIP count. Increase the size in ipng_stats_zone ipng:<size> (default 4 MB is enough for ~hundreds of VIPs — the code dimension is bucketed to six classes, so one 4 MB zone holds a very large deployment).

curl http://127.0.0.1:9113/.well-known/ipng/statsz returns "403 Forbidden". The allow/deny ACL is blocking your source address. Either add yourself or scrape from a host already in the allow list.

Where to go next

  • config-guide.md — every directive and listen parameter with contexts, allowed values, and defaults.
  • design.md — full design document, including the attribution model, hot-path cost analysis, and failure modes.