Counter coverage for nginx_ipng_bytes_in_total was missing most of the wire for rate-limited POST workloads. Observed on a CT-log-style box: the plugin was reporting ~300 KB/s inbound while btop showed 6+ MB/s on the NIC, a ~20x gap. Root cause is in nginx itself: when a request is rejected before a handler reads the body (limit_req 403, auth_request denial, 413 on fixed Content-Length, early `return 4xx;`), nginx routes the body through ngx_http_discard_request_body / HTTP/2 skip_data. Both paths drop the bytes without ever incrementing r->request_length, so $request_length (and therefore the plugin's bytes_in counter) reflects only the request line and headers. Log handler now adds r->headers_in.content_length_n to bin_sz when r->discard_body is set and the client advertised a Content-Length. That's a tight upper bound on the body bytes actually received — abusive clients like the CT-log hammer case typically send the full declared body before nginx's RST propagates. Normal 200 POSTs are unchanged (they go through the buffered body path, which does update r->request_length, and r->discard_body stays 0). docs/nginx-issues.md catalogs the full analysis — which nginx paths update r->request_length and which don't, how 413 differs between fixed-CL and chunked and HTTP/2, why r->connection->sent is safe to accumulate (it's reset per-request on keepalive and per-stream on HTTP/2), and what the residual gap looks like after this fix (HTTP/2 framing overhead, TLS/TCP, chunked requests with no Content-Length header). Option B (per-connection recv wrapping) is sketched there for later if the residual matters. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
123 lines
7.2 KiB
Markdown
123 lines
7.2 KiB
Markdown
# Upstream nginx quirks that affect this plugin
|
||
|
||
This file catalogs behaviors of nginx itself (not bugs in this plugin) that
|
||
have a visible effect on the counters the plugin exposes. It exists so a
|
||
future reader can page the context back in quickly instead of re-deriving it
|
||
from the nginx source.
|
||
|
||
All `path:line` citations below are into the **nginx source tree** (checked
|
||
against tip as of 2026-04), not into this repository.
|
||
|
||
## TL;DR
|
||
|
||
- `r->connection->sent` is **per-request** (reset between HTTP/1.1 keepalive
|
||
requests, per-stream on HTTP/2). Safe to `+=` in the log phase.
|
||
- `r->request_length` is **not** a faithful count of bytes received from the
|
||
client. It is only updated on specific code paths inside nginx, and the
|
||
discard-body path — which is taken on every request that nginx rejects
|
||
before a handler reads the body — bypasses it entirely. On a workload with
|
||
lots of early rejections (rate-limit 403, auth_request denial, 413 with
|
||
fixed Content-Length, `return 4xx;` on POST), `$request_length` and
|
||
therefore `nginx_ipng_bytes_in_total` under-count the wire traffic
|
||
substantially.
|
||
|
||
## `r->request_length` — where it is, and is not, updated
|
||
|
||
| Path | Updates `request_length`? | Source |
|
||
| ----------------------------------------------------- | ------------------------- | --------------------------------------------------- |
|
||
| Buffered body (`ngx_http_do_read_client_request_body`) | yes | `ngx_http_request_body.c:397` (after every `recv`), preread at `:120` |
|
||
| Unbuffered body (`ngx_http_read_unbuffered_request_body`) | yes (HTTP/1 — delegates to the buffered path) | `ngx_http_request_body.c:265` |
|
||
| Discard body (`ngx_http_discard_request_body` → `ngx_http_read_discarded_request_body`) | **NO** | `ngx_http_request_body.c:782-841` — reads into a stack buffer, never touches `request_length` |
|
||
| Chunked HTTP/1 | yes (indirect) | Wire bytes counted at `:397` before the chunked parser runs |
|
||
| HTTP/2 DATA payload | yes (payload only) | `ngx_http_v2_filter_request_body` at `v2/ngx_http_v2.c:4201` |
|
||
| HTTP/2 DATA via `stream->skip_data` | **NO** | `v2/ngx_http_v2.c:1077-1082` — skipped stream data is silently dropped |
|
||
| HTTP/2 frame headers, SETTINGS, WINDOW_UPDATE, PING, PRIORITY, RST_STREAM, GOAWAY, padding | **NO** | Not counted anywhere |
|
||
| HTTP/2 HEADERS / CONTINUATION | yes (decoded size only) | `v2/ngx_http_v2.c:1333, 1968` |
|
||
| HTTP/3 DATA | yes (payload only) | `v3/ngx_http_v3_request.c:1559, 1614, 1660, 1668` |
|
||
| Pipelined-request carry-over in `header_in` | **subtracted** back out | `ngx_http_request_body.c:541` (`r->request_length -= n`) |
|
||
| TLS handshake / record framing | never visible to nginx | Below `c->recv` |
|
||
|
||
### 413 `client_max_body_size` specifically
|
||
|
||
- **Fixed Content-Length:** `ngx_http_core_module.c:1006-1018` detects the
|
||
overrun right after headers are parsed and calls
|
||
`ngx_http_discard_request_body()` before finalizing 413. That path does
|
||
not bump `request_length`, so `$request_length` ends up reflecting only
|
||
the request line + headers. Body bytes that actually streamed in before
|
||
nginx closed are invisible.
|
||
- **Chunked HTTP/1:** `ngx_http_request_body.c:1142-1154` trips 413 inside
|
||
the body filter after the wire bytes have already been counted at `:397`,
|
||
so they are included — but only up to what nginx had `recv`ed before
|
||
returning.
|
||
- **HTTP/2 chunked:** `v2/ngx_http_v2.c:4216-4223` returns 413 after the
|
||
`r->request_length +=` at `:4201`, so partial-body bytes do count.
|
||
|
||
### The common "rate-limit 403" pattern
|
||
|
||
`limit_req` rejects in PREACCESS. On rejection, nginx finalizes with 403,
|
||
which calls `ngx_http_discard_request_body()`. On HTTP/1 this sends the
|
||
body into the stack-buffer discard reader (not counted). On HTTP/2 it sets
|
||
the stream's `skip_data` flag at `v2/ngx_http_v2.c:1077-1082`, so the
|
||
client's DATA frames are parsed and their payload silently dropped (also
|
||
not counted). Either way, `$request_length` reflects headers only.
|
||
|
||
This is the dominant source of under-counting on rate-limited abusive
|
||
workloads. Observed example: a Certificate Transparency client hammering
|
||
`POST /ct/v1/add-chain` at ~460 req/s, each carrying a ~10 KB cert chain
|
||
in the body, all hitting 403. 460 × 10 KB = ~4.6 MB/s of inbound bytes
|
||
completely absent from `$request_length`.
|
||
|
||
## `r->connection->sent` — this one is fine
|
||
|
||
- HTTP/1.1 keepalive: reset at `ngx_http_request.c:3493` (in
|
||
`ngx_http_keepalive_handler`, which begins at `:3356`) when a new request
|
||
arrives on an idle keepalive connection, and at
|
||
`ngx_http_request.c:3226` (in `ngx_http_set_keepalive`, which begins at
|
||
`:3128`) for the pipelined-next-request case.
|
||
- HTTP/2: each request has a fake per-stream connection `fc` initialized
|
||
with `fc->sent = 0` at `v2/ngx_http_v2.c:3039-3044`, so
|
||
`r->connection->sent` is per-stream. Framing overhead on the real TCP
|
||
connection (HEADERS, SETTINGS, WINDOW_UPDATE, PING, GOAWAY) is invisible
|
||
to `r->connection->sent` — that is consistent with how
|
||
`$bytes_sent` behaves and is fine for per-request accounting.
|
||
- `ngx_http_log_module.c:880` (`ngx_http_log_bytes_sent`) just prints
|
||
`r->connection->sent`, confirming this is the same value `$bytes_sent`
|
||
exposes.
|
||
|
||
The plugin's `slot->bytes_out += r->connection->sent` in the log handler is
|
||
therefore correct.
|
||
|
||
## What the plugin does about it
|
||
|
||
### Option A (implemented): Content-Length fallback on discard
|
||
|
||
In the log handler, when `r->discard_body` is set and
|
||
`r->headers_in.content_length_n > 0`, the plugin adds `content_length_n` to
|
||
`bin_sz` instead of trusting `r->request_length`.
|
||
|
||
- **Covers** the rate-limit-403 / auth-denial / 413-fixed-CL / early
|
||
`return 4xx;` cases on any request where the client advertised a
|
||
Content-Length (HTTP/1 or HTTP/2).
|
||
- **Does not cover** chunked requests with no Content-Length header (those
|
||
already count whatever bytes nginx `recv`ed before closing; the
|
||
post-close bytes are gone in any accounting scheme).
|
||
- **Over-counts** when a client announces a Content-Length then hangs up
|
||
partway through (nginx never received the remainder). In practice,
|
||
abusive clients like the CT log hammer case send the full body before
|
||
nginx's RST propagates, so the over-count is small.
|
||
- **Under-counts** HTTP/2 framing overhead (frame headers, control frames).
|
||
That overhead is small (~1% at typical DATA frame sizes) and can be
|
||
ignored for a first cut.
|
||
|
||
See `src/ngx_http_ipng_stats_module.c` — search for `discard_body`.
|
||
|
||
### Option B (not implemented): per-connection recv wrapping
|
||
|
||
Replace `c->recv` on each accepted HTTP connection with a wrapper that
|
||
increments a per-connection byte counter, then delegates to the original.
|
||
At log phase, use `(counter_at_log - counter_at_request_start)` as the
|
||
authoritative `bytes_in`. Captures discarded bodies, HTTP/2 framing, HTTP/2
|
||
`skip_data` bodies, partial uploads, and chunked — everything that crossed
|
||
nginx's `c->recv`. TLS/TCP overhead remains invisible (below the recv
|
||
abstraction). Revisit if the A-residual gap turns out to matter.
|