Count discarded POST bodies against bytes_in
Counter coverage for nginx_ipng_bytes_in_total was missing most of the wire for rate-limited POST workloads. Observed on a CT-log-style box: the plugin was reporting ~300 KB/s inbound while btop showed 6+ MB/s on the NIC, a ~20x gap. Root cause is in nginx itself: when a request is rejected before a handler reads the body (limit_req 403, auth_request denial, 413 on fixed Content-Length, early `return 4xx;`), nginx routes the body through ngx_http_discard_request_body / HTTP/2 skip_data. Both paths drop the bytes without ever incrementing r->request_length, so $request_length (and therefore the plugin's bytes_in counter) reflects only the request line and headers. Log handler now adds r->headers_in.content_length_n to bin_sz when r->discard_body is set and the client advertised a Content-Length. That's a tight upper bound on the body bytes actually received — abusive clients like the CT-log hammer case typically send the full declared body before nginx's RST propagates. Normal 200 POSTs are unchanged (they go through the buffered body path, which does update r->request_length, and r->discard_body stays 0). docs/nginx-issues.md catalogs the full analysis — which nginx paths update r->request_length and which don't, how 413 differs between fixed-CL and chunked and HTTP/2, why r->connection->sent is safe to accumulate (it's reset per-request on keepalive and per-stream on HTTP/2), and what the residual gap looks like after this fix (HTTP/2 framing overhead, TLS/TCP, chunked requests with no Content-Length header). Option B (per-connection recv wrapping) is sketched there for later if the residual matters. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
122
docs/nginx-issues.md
Normal file
122
docs/nginx-issues.md
Normal file
@@ -0,0 +1,122 @@
|
|||||||
|
# Upstream nginx quirks that affect this plugin
|
||||||
|
|
||||||
|
This file catalogs behaviors of nginx itself (not bugs in this plugin) that
|
||||||
|
have a visible effect on the counters the plugin exposes. It exists so a
|
||||||
|
future reader can page the context back in quickly instead of re-deriving it
|
||||||
|
from the nginx source.
|
||||||
|
|
||||||
|
All `path:line` citations below are into the **nginx source tree** (checked
|
||||||
|
against tip as of 2026-04), not into this repository.
|
||||||
|
|
||||||
|
## TL;DR
|
||||||
|
|
||||||
|
- `r->connection->sent` is **per-request** (reset between HTTP/1.1 keepalive
|
||||||
|
requests, per-stream on HTTP/2). Safe to `+=` in the log phase.
|
||||||
|
- `r->request_length` is **not** a faithful count of bytes received from the
|
||||||
|
client. It is only updated on specific code paths inside nginx, and the
|
||||||
|
discard-body path — which is taken on every request that nginx rejects
|
||||||
|
before a handler reads the body — bypasses it entirely. On a workload with
|
||||||
|
lots of early rejections (rate-limit 403, auth_request denial, 413 with
|
||||||
|
fixed Content-Length, `return 4xx;` on POST), `$request_length` and
|
||||||
|
therefore `nginx_ipng_bytes_in_total` under-count the wire traffic
|
||||||
|
substantially.
|
||||||
|
|
||||||
|
## `r->request_length` — where it is, and is not, updated
|
||||||
|
|
||||||
|
| Path | Updates `request_length`? | Source |
|
||||||
|
| ----------------------------------------------------- | ------------------------- | --------------------------------------------------- |
|
||||||
|
| Buffered body (`ngx_http_do_read_client_request_body`) | yes | `ngx_http_request_body.c:397` (after every `recv`), preread at `:120` |
|
||||||
|
| Unbuffered body (`ngx_http_read_unbuffered_request_body`) | yes (HTTP/1 — delegates to the buffered path) | `ngx_http_request_body.c:265` |
|
||||||
|
| Discard body (`ngx_http_discard_request_body` → `ngx_http_read_discarded_request_body`) | **NO** | `ngx_http_request_body.c:782-841` — reads into a stack buffer, never touches `request_length` |
|
||||||
|
| Chunked HTTP/1 | yes (indirect) | Wire bytes counted at `:397` before the chunked parser runs |
|
||||||
|
| HTTP/2 DATA payload | yes (payload only) | `ngx_http_v2_filter_request_body` at `v2/ngx_http_v2.c:4201` |
|
||||||
|
| HTTP/2 DATA via `stream->skip_data` | **NO** | `v2/ngx_http_v2.c:1077-1082` — skipped stream data is silently dropped |
|
||||||
|
| HTTP/2 frame headers, SETTINGS, WINDOW_UPDATE, PING, PRIORITY, RST_STREAM, GOAWAY, padding | **NO** | Not counted anywhere |
|
||||||
|
| HTTP/2 HEADERS / CONTINUATION | yes (decoded size only) | `v2/ngx_http_v2.c:1333, 1968` |
|
||||||
|
| HTTP/3 DATA | yes (payload only) | `v3/ngx_http_v3_request.c:1559, 1614, 1660, 1668` |
|
||||||
|
| Pipelined-request carry-over in `header_in` | **subtracted** back out | `ngx_http_request_body.c:541` (`r->request_length -= n`) |
|
||||||
|
| TLS handshake / record framing | never visible to nginx | Below `c->recv` |
|
||||||
|
|
||||||
|
### 413 `client_max_body_size` specifically
|
||||||
|
|
||||||
|
- **Fixed Content-Length:** `ngx_http_core_module.c:1006-1018` detects the
|
||||||
|
overrun right after headers are parsed and calls
|
||||||
|
`ngx_http_discard_request_body()` before finalizing 413. That path does
|
||||||
|
not bump `request_length`, so `$request_length` ends up reflecting only
|
||||||
|
the request line + headers. Body bytes that actually streamed in before
|
||||||
|
nginx closed are invisible.
|
||||||
|
- **Chunked HTTP/1:** `ngx_http_request_body.c:1142-1154` trips 413 inside
|
||||||
|
the body filter after the wire bytes have already been counted at `:397`,
|
||||||
|
so they are included — but only up to what nginx had `recv`ed before
|
||||||
|
returning.
|
||||||
|
- **HTTP/2 chunked:** `v2/ngx_http_v2.c:4216-4223` returns 413 after the
|
||||||
|
`r->request_length +=` at `:4201`, so partial-body bytes do count.
|
||||||
|
|
||||||
|
### The common "rate-limit 403" pattern
|
||||||
|
|
||||||
|
`limit_req` rejects in PREACCESS. On rejection, nginx finalizes with 403,
|
||||||
|
which calls `ngx_http_discard_request_body()`. On HTTP/1 this sends the
|
||||||
|
body into the stack-buffer discard reader (not counted). On HTTP/2 it sets
|
||||||
|
the stream's `skip_data` flag at `v2/ngx_http_v2.c:1077-1082`, so the
|
||||||
|
client's DATA frames are parsed and their payload silently dropped (also
|
||||||
|
not counted). Either way, `$request_length` reflects headers only.
|
||||||
|
|
||||||
|
This is the dominant source of under-counting on rate-limited abusive
|
||||||
|
workloads. Observed example: a Certificate Transparency client hammering
|
||||||
|
`POST /ct/v1/add-chain` at ~460 req/s, each carrying a ~10 KB cert chain
|
||||||
|
in the body, all hitting 403. 460 × 10 KB = ~4.6 MB/s of inbound bytes
|
||||||
|
completely absent from `$request_length`.
|
||||||
|
|
||||||
|
## `r->connection->sent` — this one is fine
|
||||||
|
|
||||||
|
- HTTP/1.1 keepalive: reset at `ngx_http_request.c:3493` (in
|
||||||
|
`ngx_http_keepalive_handler`, which begins at `:3356`) when a new request
|
||||||
|
arrives on an idle keepalive connection, and at
|
||||||
|
`ngx_http_request.c:3226` (in `ngx_http_set_keepalive`, which begins at
|
||||||
|
`:3128`) for the pipelined-next-request case.
|
||||||
|
- HTTP/2: each request has a fake per-stream connection `fc` initialized
|
||||||
|
with `fc->sent = 0` at `v2/ngx_http_v2.c:3039-3044`, so
|
||||||
|
`r->connection->sent` is per-stream. Framing overhead on the real TCP
|
||||||
|
connection (HEADERS, SETTINGS, WINDOW_UPDATE, PING, GOAWAY) is invisible
|
||||||
|
to `r->connection->sent` — that is consistent with how
|
||||||
|
`$bytes_sent` behaves and is fine for per-request accounting.
|
||||||
|
- `ngx_http_log_module.c:880` (`ngx_http_log_bytes_sent`) just prints
|
||||||
|
`r->connection->sent`, confirming this is the same value `$bytes_sent`
|
||||||
|
exposes.
|
||||||
|
|
||||||
|
The plugin's `slot->bytes_out += r->connection->sent` in the log handler is
|
||||||
|
therefore correct.
|
||||||
|
|
||||||
|
## What the plugin does about it
|
||||||
|
|
||||||
|
### Option A (implemented): Content-Length fallback on discard
|
||||||
|
|
||||||
|
In the log handler, when `r->discard_body` is set and
|
||||||
|
`r->headers_in.content_length_n > 0`, the plugin adds `content_length_n` to
|
||||||
|
`bin_sz` instead of trusting `r->request_length`.
|
||||||
|
|
||||||
|
- **Covers** the rate-limit-403 / auth-denial / 413-fixed-CL / early
|
||||||
|
`return 4xx;` cases on any request where the client advertised a
|
||||||
|
Content-Length (HTTP/1 or HTTP/2).
|
||||||
|
- **Does not cover** chunked requests with no Content-Length header (those
|
||||||
|
already count whatever bytes nginx `recv`ed before closing; the
|
||||||
|
post-close bytes are gone in any accounting scheme).
|
||||||
|
- **Over-counts** when a client announces a Content-Length then hangs up
|
||||||
|
partway through (nginx never received the remainder). In practice,
|
||||||
|
abusive clients like the CT log hammer case send the full body before
|
||||||
|
nginx's RST propagates, so the over-count is small.
|
||||||
|
- **Under-counts** HTTP/2 framing overhead (frame headers, control frames).
|
||||||
|
That overhead is small (~1% at typical DATA frame sizes) and can be
|
||||||
|
ignored for a first cut.
|
||||||
|
|
||||||
|
See `src/ngx_http_ipng_stats_module.c` — search for `discard_body`.
|
||||||
|
|
||||||
|
### Option B (not implemented): per-connection recv wrapping
|
||||||
|
|
||||||
|
Replace `c->recv` on each accepted HTTP connection with a wrapper that
|
||||||
|
increments a per-connection byte counter, then delegates to the original.
|
||||||
|
At log phase, use `(counter_at_log - counter_at_request_start)` as the
|
||||||
|
authoritative `bytes_in`. Captures discarded bodies, HTTP/2 framing, HTTP/2
|
||||||
|
`skip_data` bodies, partial uploads, and chunked — everything that crossed
|
||||||
|
nginx's `c->recv`. TLS/TCP overhead remains invisible (below the recv
|
||||||
|
abstraction). Revisit if the A-residual gap turns out to matter.
|
||||||
@@ -2065,6 +2065,17 @@ ngx_http_ipng_stats_log_handler(ngx_http_request_t *r)
|
|||||||
bin_sz = r->request_length > 0 ? (uint64_t) r->request_length : 0;
|
bin_sz = r->request_length > 0 ? (uint64_t) r->request_length : 0;
|
||||||
bout_sz = (uint64_t) r->connection->sent;
|
bout_sz = (uint64_t) r->connection->sent;
|
||||||
|
|
||||||
|
/* When nginx rejects a request before any handler reads the body
|
||||||
|
* (rate-limit 403, auth_request denial, 413 on fixed Content-Length,
|
||||||
|
* early `return 4xx;` on POST, ...), the body goes through
|
||||||
|
* ngx_http_discard_request_body / HTTP/2 skip_data, which do NOT
|
||||||
|
* update r->request_length. If the client advertised a
|
||||||
|
* Content-Length, fall back to that as a tight upper bound on the
|
||||||
|
* bytes we actually received. See docs/nginx-issues.md. */
|
||||||
|
if (r->discard_body && r->headers_in.content_length_n > 0) {
|
||||||
|
bin_sz += (uint64_t) r->headers_in.content_length_n;
|
||||||
|
}
|
||||||
|
|
||||||
slot->requests += 1;
|
slot->requests += 1;
|
||||||
slot->bytes_in += bin_sz;
|
slot->bytes_in += bin_sz;
|
||||||
slot->bytes_out += bout_sz;
|
slot->bytes_out += bout_sz;
|
||||||
|
|||||||
Reference in New Issue
Block a user