Count discarded POST bodies against bytes_in

Counter coverage for nginx_ipng_bytes_in_total was missing most of
the wire for rate-limited POST workloads. Observed on a CT-log-style
box: the plugin was reporting ~300 KB/s inbound while btop showed
6+ MB/s on the NIC, a ~20x gap. Root cause is in nginx itself: when
a request is rejected before a handler reads the body (limit_req
403, auth_request denial, 413 on fixed Content-Length, early
`return 4xx;`), nginx routes the body through
ngx_http_discard_request_body / HTTP/2 skip_data. Both paths drop
the bytes without ever incrementing r->request_length, so
$request_length (and therefore the plugin's bytes_in counter)
reflects only the request line and headers.

Log handler now adds r->headers_in.content_length_n to bin_sz when
r->discard_body is set and the client advertised a Content-Length.
That's a tight upper bound on the body bytes actually received —
abusive clients like the CT-log hammer case typically send the
full declared body before nginx's RST propagates. Normal 200 POSTs
are unchanged (they go through the buffered body path, which does
update r->request_length, and r->discard_body stays 0).

docs/nginx-issues.md catalogs the full analysis — which nginx paths
update r->request_length and which don't, how 413 differs between
fixed-CL and chunked and HTTP/2, why r->connection->sent is safe to
accumulate (it's reset per-request on keepalive and per-stream on
HTTP/2), and what the residual gap looks like after this fix
(HTTP/2 framing overhead, TLS/TCP, chunked requests with no
Content-Length header). Option B (per-connection recv wrapping) is
sketched there for later if the residual matters.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-04-22 09:57:45 +02:00
parent 96f77baacd
commit 05d405aba5
2 changed files with 133 additions and 0 deletions

122
docs/nginx-issues.md Normal file
View File

@@ -0,0 +1,122 @@
# Upstream nginx quirks that affect this plugin
This file catalogs behaviors of nginx itself (not bugs in this plugin) that
have a visible effect on the counters the plugin exposes. It exists so a
future reader can page the context back in quickly instead of re-deriving it
from the nginx source.
All `path:line` citations below are into the **nginx source tree** (checked
against tip as of 2026-04), not into this repository.
## TL;DR
- `r->connection->sent` is **per-request** (reset between HTTP/1.1 keepalive
requests, per-stream on HTTP/2). Safe to `+=` in the log phase.
- `r->request_length` is **not** a faithful count of bytes received from the
client. It is only updated on specific code paths inside nginx, and the
discard-body path — which is taken on every request that nginx rejects
before a handler reads the body — bypasses it entirely. On a workload with
lots of early rejections (rate-limit 403, auth_request denial, 413 with
fixed Content-Length, `return 4xx;` on POST), `$request_length` and
therefore `nginx_ipng_bytes_in_total` under-count the wire traffic
substantially.
## `r->request_length` — where it is, and is not, updated
| Path | Updates `request_length`? | Source |
| ----------------------------------------------------- | ------------------------- | --------------------------------------------------- |
| Buffered body (`ngx_http_do_read_client_request_body`) | yes | `ngx_http_request_body.c:397` (after every `recv`), preread at `:120` |
| Unbuffered body (`ngx_http_read_unbuffered_request_body`) | yes (HTTP/1 — delegates to the buffered path) | `ngx_http_request_body.c:265` |
| Discard body (`ngx_http_discard_request_body``ngx_http_read_discarded_request_body`) | **NO** | `ngx_http_request_body.c:782-841` — reads into a stack buffer, never touches `request_length` |
| Chunked HTTP/1 | yes (indirect) | Wire bytes counted at `:397` before the chunked parser runs |
| HTTP/2 DATA payload | yes (payload only) | `ngx_http_v2_filter_request_body` at `v2/ngx_http_v2.c:4201` |
| HTTP/2 DATA via `stream->skip_data` | **NO** | `v2/ngx_http_v2.c:1077-1082` — skipped stream data is silently dropped |
| HTTP/2 frame headers, SETTINGS, WINDOW_UPDATE, PING, PRIORITY, RST_STREAM, GOAWAY, padding | **NO** | Not counted anywhere |
| HTTP/2 HEADERS / CONTINUATION | yes (decoded size only) | `v2/ngx_http_v2.c:1333, 1968` |
| HTTP/3 DATA | yes (payload only) | `v3/ngx_http_v3_request.c:1559, 1614, 1660, 1668` |
| Pipelined-request carry-over in `header_in` | **subtracted** back out | `ngx_http_request_body.c:541` (`r->request_length -= n`) |
| TLS handshake / record framing | never visible to nginx | Below `c->recv` |
### 413 `client_max_body_size` specifically
- **Fixed Content-Length:** `ngx_http_core_module.c:1006-1018` detects the
overrun right after headers are parsed and calls
`ngx_http_discard_request_body()` before finalizing 413. That path does
not bump `request_length`, so `$request_length` ends up reflecting only
the request line + headers. Body bytes that actually streamed in before
nginx closed are invisible.
- **Chunked HTTP/1:** `ngx_http_request_body.c:1142-1154` trips 413 inside
the body filter after the wire bytes have already been counted at `:397`,
so they are included — but only up to what nginx had `recv`ed before
returning.
- **HTTP/2 chunked:** `v2/ngx_http_v2.c:4216-4223` returns 413 after the
`r->request_length +=` at `:4201`, so partial-body bytes do count.
### The common "rate-limit 403" pattern
`limit_req` rejects in PREACCESS. On rejection, nginx finalizes with 403,
which calls `ngx_http_discard_request_body()`. On HTTP/1 this sends the
body into the stack-buffer discard reader (not counted). On HTTP/2 it sets
the stream's `skip_data` flag at `v2/ngx_http_v2.c:1077-1082`, so the
client's DATA frames are parsed and their payload silently dropped (also
not counted). Either way, `$request_length` reflects headers only.
This is the dominant source of under-counting on rate-limited abusive
workloads. Observed example: a Certificate Transparency client hammering
`POST /ct/v1/add-chain` at ~460 req/s, each carrying a ~10 KB cert chain
in the body, all hitting 403. 460 × 10 KB = ~4.6 MB/s of inbound bytes
completely absent from `$request_length`.
## `r->connection->sent` — this one is fine
- HTTP/1.1 keepalive: reset at `ngx_http_request.c:3493` (in
`ngx_http_keepalive_handler`, which begins at `:3356`) when a new request
arrives on an idle keepalive connection, and at
`ngx_http_request.c:3226` (in `ngx_http_set_keepalive`, which begins at
`:3128`) for the pipelined-next-request case.
- HTTP/2: each request has a fake per-stream connection `fc` initialized
with `fc->sent = 0` at `v2/ngx_http_v2.c:3039-3044`, so
`r->connection->sent` is per-stream. Framing overhead on the real TCP
connection (HEADERS, SETTINGS, WINDOW_UPDATE, PING, GOAWAY) is invisible
to `r->connection->sent` — that is consistent with how
`$bytes_sent` behaves and is fine for per-request accounting.
- `ngx_http_log_module.c:880` (`ngx_http_log_bytes_sent`) just prints
`r->connection->sent`, confirming this is the same value `$bytes_sent`
exposes.
The plugin's `slot->bytes_out += r->connection->sent` in the log handler is
therefore correct.
## What the plugin does about it
### Option A (implemented): Content-Length fallback on discard
In the log handler, when `r->discard_body` is set and
`r->headers_in.content_length_n > 0`, the plugin adds `content_length_n` to
`bin_sz` instead of trusting `r->request_length`.
- **Covers** the rate-limit-403 / auth-denial / 413-fixed-CL / early
`return 4xx;` cases on any request where the client advertised a
Content-Length (HTTP/1 or HTTP/2).
- **Does not cover** chunked requests with no Content-Length header (those
already count whatever bytes nginx `recv`ed before closing; the
post-close bytes are gone in any accounting scheme).
- **Over-counts** when a client announces a Content-Length then hangs up
partway through (nginx never received the remainder). In practice,
abusive clients like the CT log hammer case send the full body before
nginx's RST propagates, so the over-count is small.
- **Under-counts** HTTP/2 framing overhead (frame headers, control frames).
That overhead is small (~1% at typical DATA frame sizes) and can be
ignored for a first cut.
See `src/ngx_http_ipng_stats_module.c` — search for `discard_body`.
### Option B (not implemented): per-connection recv wrapping
Replace `c->recv` on each accepted HTTP connection with a wrapper that
increments a per-connection byte counter, then delegates to the original.
At log phase, use `(counter_at_log - counter_at_request_start)` as the
authoritative `bytes_in`. Captures discarded bodies, HTTP/2 framing, HTTP/2
`skip_data` bodies, partial uploads, and chunked — everything that crossed
nginx's `c->recv`. TLS/TCP overhead remains invisible (below the recv
abstraction). Revisit if the A-residual gap turns out to matter.

View File

@@ -2065,6 +2065,17 @@ ngx_http_ipng_stats_log_handler(ngx_http_request_t *r)
bin_sz = r->request_length > 0 ? (uint64_t) r->request_length : 0;
bout_sz = (uint64_t) r->connection->sent;
/* When nginx rejects a request before any handler reads the body
* (rate-limit 403, auth_request denial, 413 on fixed Content-Length,
* early `return 4xx;` on POST, ...), the body goes through
* ngx_http_discard_request_body / HTTP/2 skip_data, which do NOT
* update r->request_length. If the client advertised a
* Content-Length, fall back to that as a tight upper bound on the
* bytes we actually received. See docs/nginx-issues.md. */
if (r->discard_body && r->headers_in.content_length_n > 0) {
bin_sz += (uint64_t) r->headers_in.content_length_n;
}
slot->requests += 1;
slot->bytes_in += bin_sz;
slot->bytes_out += bout_sz;