nginx-logtail/docs/design.md
Pim van Pelt 143aad9063 PRE-RELEASE 0.9.1: Makefile, Debian packaging, versioned UDP
Build and release tooling:
- Makefile with help as default; targets: build/build-amd64/build-arm64,
  test, lint, proto, pkg-deb, docker, docker-push, clean, plus
  install-deps (+ three sub-targets for apt / Go toolchain / Go tools).
- internal/version package; -ldflags -X injects Version/Commit/Date into
  every binary. -version flag on all four binaries (nginx-logtail version
  for the CLI).
- Dockerfile takes VERSION/COMMIT/DATE build-args and forwards them.
- .deb output lands in build/; .gitignore ignores /build/.

Debian package:
- debian/build-deb.sh packages all four static binaries into a single
  nginx-logtail_<ver>_<arch>.deb using dpkg-deb.
- Binary layout: /usr/sbin/nginx-logtail-{collector,aggregator,frontend}
  and /usr/bin/nginx-logtail.
- nginx-logtail(8) manpage.
- Three systemd units (collector, aggregator, frontend) shipped under
  /lib/systemd/system/. Installed but never enabled or started — the
  operator opts in per host.
- Collector runs as _logtail:www-data (log access); aggregator and
  frontend as _logtail:_logtail. postinst creates the system user/group
  idempotently.
- Single shared env file /etc/default/nginx-logtail rendered from a
  template at first install with %HOSTNAME% substituted. Sensible
  defaults for every COLLECTOR_*, AGGREGATOR_*, FRONTEND_* variable;
  plus COLLECTOR_ARGS / AGGREGATOR_ARGS / FRONTEND_ARGS escape hatches
  appended to ExecStart. Not a dpkg conffile: operator edits survive
  upgrades and dpkg --purge removes it.

Versioned UDP wire format:
- ParseUDPLine dispatches on a leading "v<N>\t" tag; v1 routes to the
  existing 12-field parser. Unknown/missing versions fail closed so
  future v2 parsers can land before emitters are upgraded.
- Tests updated; design.md FR-2.2 rewritten to make the version tag
  normative.

Docs:
- README.md gains a Quick Start (Debian / Docker Compose / from source).
- user-guide.md rewritten around Installation and Configuration: full
  env-var table, UDP-only default explained, precise file/UDP log_format
  layouts, note that operators can emit "0" for unknown $is_tor / $asn.
- Drilldown cycle, frontend filter table, and CLI --group-by list all
  include source_tag. UDP counters documented in the Prometheus section.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 10:35:08 +02:00


nginx-logtail Design Document

Metadata

Status        Describes intended behavior as of v0.2.0
Author        Pim van Pelt <pim@ipng.ch>
Last updated  2026-04-17
Audience      Operators and contributors running real-time traffic analysis and DDoS detection across a fleet of nginx hosts

The key words MUST, MUST NOT, SHOULD, SHOULD NOT, and MAY are used as described in RFC 2119, and are reserved in this document for requirements that are intended to be enforced in code or by an external dependency. Plain-language descriptions of what the system or an operator can do are written in lowercase — "can", "will", "does" — and should not be read as normative.

Summary

nginx-logtail is a four-binary Go system for real-time analysis of nginx traffic across a fleet of hosts. Each nginx host runs a collector that ingests logs (from files via fsnotify, from a UDP socket, or both) and maintains in-memory ranked top-K counters across multiple time windows. A central aggregator subscribes to the collectors' snapshot streams and serves a merged view. An HTTP frontend renders a drilldown dashboard (server-rendered HTML, zero JavaScript). A CLI offers the same queries as a shell companion. All four programs speak a single gRPC service (LogtailService), so the frontend and CLI work against any collector or the aggregator interchangeably.

Background

Operators running tens of nginx hosts behind a load balancer need a live, drilldown view of request traffic for DDoS detection and traffic analysis. Questions the system answers include:

  • Which client prefix is causing the most HTTP 429s right now?
  • Which website is getting the most 503s over the last 24 hours?
  • Which nginx machine is the busiest?
  • Is there a DDoS in progress, and from where?

Existing log-analysis pipelines (ELK, Loki, ClickHouse, etc.) answer questions like these but require infrastructure that is disproportionate for the target workload. A handful of nginx hosts, each doing ~10 K req/s at peak, can be served by a per-minute top-K structure in ~1 GB of RAM per host, with <250 ms query latency across the whole fleet and no storage tier.

A companion project, nginx-ipng-stats-plugin, adds per-device attribution in nginx itself and can emit a logtail-format access log as UDP datagrams. nginx-logtail was extended in v0.2.0 to ingest that stream natively, so operators can run it either from on-disk log files, from the UDP feed, or both on the same host.

Goals and Non-Goals

Product Goals

  1. Live top-K per (website, client_prefix, URI, status, is_tor, asn, source_tag). For every combination of these dimensions the system maintains an integer count, ranked so that the top entries are readily available across 1 m, 5 m, 15 m, 60 m, 6 h, and 24 h windows.
  2. Sub-second query latency. TopN and Trend queries MUST return from the collector and from the aggregator in well under one second at the target scale (10 hosts, 10 K req/s each).
  3. Bounded memory. The collector MUST stay within a 1 GB steady-state memory budget regardless of input cardinality, including during high-cardinality DDoS attacks.
  4. Two ingest paths, one data model. On-disk log files (fsnotify-tailed, logrotate-aware) and UDP datagrams (from nginx-ipng-stats-plugin) MUST both feed the same in-memory structure, with a single log format per path and no operator-visible difference downstream.
  5. No external storage, no TLS, no CGO. The entire system runs as four static Go binaries on a trusted internal network. Operators who need retention beyond the ring buffers SHOULD scrape Prometheus.
  6. One service contract. Collectors and the aggregator implement the same gRPC LogtailService. Frontend and CLI MUST work against either interchangeably, with the collector returning "itself" from ListTargets and the aggregator returning its configured collector set.

Non-Goals

  • The system does not parse arbitrary nginx log_format strings. Two fixed tab-separated formats are supported: a file format and a UDP format (see FR-2). Operators who need general parsing should use Vector, Fluent Bit, or Promtail.
  • The system does not store raw log lines. Counts are aggregated at ingest; the original log lines are not kept in memory or on disk. The project does not replace an access log.
  • The system does not persist counters across restarts. Ring buffers are in-memory only. On aggregator restart, historical state is reconstructed by calling DumpSnapshots on each collector (FR-4.3). On collector restart the rings start empty and refill as new traffic arrives.
  • The system does not provide per-URI request timing distributions. Latency histograms exist only in the collector's Prometheus exposition (per host), not in the top-K data model.
  • The system does not ship TLS or authentication for its gRPC endpoints. Operators who expose it beyond a trusted network are expected to terminate TLS in a front proxy.
  • The system is not a general-purpose metric store. The Prometheus exporter on the collector exposes a deliberately narrow set: per-host request counter, per-host body-size and request-time histograms, and per-source_tag rollup counters.

Requirements

Each requirement carries a unique identifier (FR-X.Y or NFR-X.Y) so that later sections can cite it.

Functional Requirements

FR-1 Counter data model

  • FR-1.1 The canonical unit of counting MUST be a 7-tuple (website, client_prefix, http_request_uri, http_response, is_tor, asn, ipng_source_tag) mapped to a 64-bit integer request count. The data model contains no other fields: no timing, no byte counts, no method (those live only in the Prometheus exposition, FR-8).
  • FR-1.2 website MUST be the nginx $host value.
  • FR-1.3 client_prefix MUST be the client IP truncated to a configurable prefix length, formatted as CIDR. Default /24 for IPv4 and /48 for IPv6 (flags -v4prefix, -v6prefix). Truncation happens at ingest; the original address is not retained.
  • FR-1.4 http_request_uri MUST be the $request_uri path only — the query string (from the first ? onward) MUST be stripped at ingest. This is the dominant cardinality-reduction measure; DDoS traffic with attacker-generated query strings cannot grow the working set.
  • FR-1.5 http_response MUST be the HTTP status code as recorded by nginx.
  • FR-1.6 is_tor MUST be a boolean, populated by the operator in the log format (typically via a lookup against a TOR exit-node list). For the file format, lines without this field default to false for backward compatibility.
  • FR-1.7 asn MUST be an int32 decimal value sourced from MaxMind GeoIP2 (or equivalent). For the file format, lines without this field default to 0.
  • FR-1.8 ipng_source_tag MUST be a short string identifying which attribution tag the request arrived under. For records from on-disk log files, the collector MUST assign the tag "direct" (mirroring nginx-ipng-stats-plugin's default-source convention). For records from the UDP stream, the tag is taken from the log line as emitted by the plugin.
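
    The FR-1.3 truncation rule can be sketched with the standard library's net/netip. truncatePrefix is a hypothetical helper for illustration, not the collector's actual code:

```go
package main

import (
	"fmt"
	"net/netip"
)

// truncatePrefix illustrates FR-1.3: the client address is truncated to
// a configurable prefix length (/24 for IPv4, /48 for IPv6 by default)
// and formatted as CIDR. The original address is not retained.
func truncatePrefix(addr string, v4bits, v6bits int) (string, error) {
	ip, err := netip.ParseAddr(addr)
	if err != nil {
		return "", err
	}
	bits := v6bits
	if ip.Is4() {
		bits = v4bits
	}
	p, err := ip.Prefix(bits) // zeroes the host bits
	if err != nil {
		return "", err
	}
	return p.String(), nil
}

func main() {
	for _, a := range []string{"198.51.100.77", "2001:db8:1:2:3::4"} {
		cidr, _ := truncatePrefix(a, 24, 48)
		fmt.Println(cidr)
	}
}
```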

FR-2 Log formats

  • FR-2.1 File format. The collector MUST accept nginx access logs in the following tab-separated layout, with the last two fields (is_tor, asn) optional for backward compatibility:

    log_format logtail '$host\t$remote_addr\t$msec\t$request_method\t$request_uri\t$status\t$body_bytes_sent\t$request_time\t$is_tor\t$asn';
    
    #  Field              Ingested into
    0  $host              website
    1  $remote_addr       client_prefix (truncated)
    2  $msec              (discarded)
    3  $request_method    Prom method label
    4  $request_uri       http_request_uri
    5  $status            http_response
    6  $body_bytes_sent   Prom body histogram
    7  $request_time      Prom duration histogram
    8  $is_tor            is_tor (optional)
    9  $asn               asn (optional)
  • FR-2.2 UDP format. The collector MUST accept datagrams in a versioned tab-separated layout, as emitted by nginx-ipng-stats-plugin's ipng_stats_logtail directive. Every datagram MUST begin with a literal version tag (v<N>\t) so the collector can route each packet to the appropriate parser. Only v1 is defined in this revision; unknown versions MUST be counted as parse failures and dropped.

    log_format ipng_stats_logtail 'v1\t$host\t$remote_addr\t$request_method\t$request_uri\t$status\t$body_bytes_sent\t$request_time\t$is_tor\t$asn\t$ipng_source_tag\t$server_addr\t$scheme';
    

    The v1 payload MUST have exactly 12 tab-separated fields after the v1 tag (13 fields total). $server_addr and $scheme MUST be parsed but dropped; they are reserved for future use. Malformed datagrams (wrong version, wrong field count, bad IP) MUST be counted (FR-8.5) and silently dropped.

  • FR-2.3 The file tailer MUST set source_tag="direct" on every record it parses. The UDP listener MUST propagate $ipng_source_tag verbatim. This is the only difference in downstream processing between the two ingest paths.
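
    The FR-2.2 dispatch amounts to cutting off the leading version tag and failing closed on anything but v1. A minimal sketch — parseUDPLine is an illustrative name, and the real parser also validates IPs and numeric fields:

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

var errBadVersion = errors.New("unknown or missing version tag")

// parseUDPLine sketches the FR-2.2 dispatch: the leading "v<N>\t" tag
// routes the payload to a per-version parser; anything else fails closed
// so future v2 parsers can land before emitters are upgraded.
func parseUDPLine(line string) ([]string, error) {
	tag, rest, ok := strings.Cut(line, "\t")
	if !ok || tag != "v1" {
		return nil, errBadVersion // counted as a parse failure, dropped
	}
	fields := strings.Split(rest, "\t")
	if len(fields) != 12 { // v1: exactly 12 fields after the tag
		return nil, fmt.Errorf("v1: want 12 fields, got %d", len(fields))
	}
	return fields, nil
}

func main() {
	line := "v1\texample.org\t203.0.113.9\tGET\t/a?q=1\t200\t512\t0.004\t0\t65001\tdirect\t192.0.2.1\thttps"
	fields, err := parseUDPLine(line)
	fmt.Println(len(fields), err)
}
```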

FR-3 Ring buffers and time windows

  • FR-3.1 Each collector and the aggregator MUST maintain two tiered ring buffers:

    Tier    Bucket size  Buckets  Top-K/bucket  Covers
    Fine    1 min        60       50 000        1 h
    Coarse  5 min        288      5 000         24 h
  • FR-3.2 The Window enum MUST map queries to tiers as follows:

    Window  Tier    Buckets summed
    1 m     fine    1
    5 m     fine    5
    15 m    fine    15
    60 m    fine    60
    6 h     coarse  72
    24 h    coarse  288
  • FR-3.3 Every minute, the collector MUST snapshot its live map into the fine ring (top-50 000, sorted desc) and reset the live map. Every fifth fine tick, the collector MUST merge the most recent five fine snapshots into one coarse snapshot (top-5 000). The fine/coarse merge MUST be pinned to the 1-minute and 5-minute boundaries of the local clock so sparklines align across collectors.

  • FR-3.4 Querying MUST always read from the rings, never from the live map. A sub-minute request MUST return an empty top-1 result rather than surfacing partially-accumulated data; this keeps per-minute results monotonic.
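
    The FR-3.2 window-to-tier mapping is a fixed lookup; a sketch in Go (all identifiers here are illustrative, not the collector's actual names):

```go
package main

import "fmt"

// tier mirrors the two ring tiers of FR-3.1.
type tier int

const (
	fine   tier = iota // 1-minute buckets, 60 deep
	coarse             // 5-minute buckets, 288 deep
)

// windowPlan maps a query window (in minutes) to the tier it reads
// from and how many buckets are summed, per the FR-3.2 table.
func windowPlan(minutes int) (t tier, buckets int, ok bool) {
	switch minutes {
	case 1, 5, 15, 60:
		return fine, minutes, true // one fine bucket per minute
	case 360:
		return coarse, 72, true // 6 h = 72 × 5 min
	case 1440:
		return coarse, 288, true // 24 h = 288 × 5 min
	}
	return 0, 0, false
}

func main() {
	t, n, _ := windowPlan(360)
	fmt.Println(t, n)
}
```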

FR-4 Push-based streaming and aggregation

  • FR-4.1 The collector MUST expose a server-streaming RPC StreamSnapshots(SnapshotRequest) → stream Snapshot that emits one fine (1-min) snapshot per minute rotation. Subscribers MUST receive the same snapshot independently (per-subscriber buffered fan-out, bounded buffer, drop on full).
  • FR-4.2 The aggregator MUST subscribe to every configured collector via StreamSnapshots and merge snapshots into a single ring-buffer cache. The merge strategy MUST be delta-based: on each new snapshot from collector X, the aggregator MUST subtract X's previous contribution and add the new entries, giving O(snapshot_size) per update (not O(N_collectors × size)).
  • FR-4.3 Each collector MUST expose a server-streaming DumpSnapshots(DumpSnapshotsRequest) → stream Snapshot that streams all fine buckets (with is_coarse=false) followed by all coarse buckets (with is_coarse=true). On startup, the aggregator MUST call DumpSnapshots against every collector once (concurrently, after its own gRPC server is already listening), merge the per-timestamp entries the same way the live path does, and load the result into its cache via a single atomic replacement. Collectors that return Unimplemented MUST be skipped without blocking live streaming from the others.
  • FR-4.4 The aggregator MUST reconnect to each collector independently with exponential backoff (100 ms → cap 30 s). After three consecutive connection failures the aggregator MUST zero the degraded collector's contribution (subtract its last-known snapshot and delete its entry). When the collector recovers and sends a new snapshot, its contribution MUST automatically be reintegrated.
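
    The FR-4.2 delta merge can be sketched as a cache keyed twice — once fleet-wide, once per collector — so that replacing one collector's contribution costs O(snapshot size). All names here are illustrative:

```go
package main

import "fmt"

// cache sketches the aggregator's merged view (FR-4.2): merged holds
// fleet-wide counts, last remembers each collector's previous snapshot.
type cache struct {
	merged map[string]int64            // tuple label -> fleet-wide count
	last   map[string]map[string]int64 // collector -> its last snapshot
}

func newCache() *cache {
	return &cache{merged: map[string]int64{}, last: map[string]map[string]int64{}}
}

// apply subtracts src's previous contribution, then adds the new one.
// Passing a nil snapshot zeroes the collector, as FR-4.4 requires for
// degraded collectors.
func (c *cache) apply(src string, snap map[string]int64) {
	for label, n := range c.last[src] {
		if c.merged[label] -= n; c.merged[label] == 0 {
			delete(c.merged, label)
		}
	}
	for label, n := range snap {
		c.merged[label] += n
	}
	c.last[src] = snap
}

func main() {
	c := newCache()
	c.apply("host-a", map[string]int64{"example.org": 10})
	c.apply("host-b", map[string]int64{"example.org": 5})
	c.apply("host-a", map[string]int64{"example.org": 7}) // replaces the 10
	fmt.Println(c.merged["example.org"])
}
```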

FR-5 Query service (LogtailService)

  • FR-5.1 Collector and aggregator MUST implement the same gRPC LogtailService:

    service LogtailService {
      rpc TopN(TopNRequest)                   returns (TopNResponse);
      rpc Trend(TrendRequest)                 returns (TrendResponse);
      rpc StreamSnapshots(SnapshotRequest)    returns (stream Snapshot);
      rpc ListTargets(ListTargetsRequest)     returns (ListTargetsResponse);
      rpc DumpSnapshots(DumpSnapshotsRequest) returns (stream Snapshot);
    }
    
  • FR-5.2 Filter MUST support exact, inequality, and RE2-regex constraints on the dimensions of FR-1. Status and ASN accept the six-operator expression language (=, !=, >, >=, <, <=). Website and URI accept regex match and regex exclusion. TOR filtering uses a three-state enum (ANY/YES/NO). Source-tag filtering is exact match only.

  • FR-5.3 GroupBy MUST cover every dimension of FR-1 except is_tor (which is boolean and rarely useful as a group-by target): WEBSITE, CLIENT_PREFIX, REQUEST_URI, HTTP_RESPONSE, ASN_NUMBER, SOURCE_TAG.

  • FR-5.4 ListTargets MUST return, from the aggregator, every configured collector with its display name and gRPC address; from a collector, a single entry describing itself with an empty addr (meaning "this endpoint").

  • FR-5.5 All queries MUST be answered from the local ring buffers. The aggregator MUST NOT fan out to collectors at query time.
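
    The six-operator expression language of FR-5.2 reduces to a small comparison switch applied to http_response and asn; a sketch (matchNum is an illustrative name, not the service's actual identifier):

```go
package main

import "fmt"

// matchNum evaluates one FR-5.2 numeric constraint: does the observed
// value `have` satisfy `op want`?
func matchNum(op string, have, want int32) bool {
	switch op {
	case "=":
		return have == want
	case "!=":
		return have != want
	case ">":
		return have > want
	case ">=":
		return have >= want
	case "<":
		return have < want
	case "<=":
		return have <= want
	}
	return false // unknown operator: match nothing
}

func main() {
	fmt.Println(matchNum(">=", 503, 400)) // "status>=400" matches a 503
}
```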

FR-6 HTTP frontend

  • FR-6.1 The frontend MUST render a server-rendered HTML dashboard with no JavaScript, using inline SVG for sparklines and <meta http-equiv="refresh"> for auto-refresh. It MUST work in text-mode browsers (w3m, lynx) and under curl.
  • FR-6.2 All filter, group-by, and window state MUST live in the URL query string so that URLs are shareable and bookmarkable. No server-side session.
  • FR-6.3 The frontend MUST provide a drilldown affordance: clicking a row MUST add that row's value as a filter and advance the group-by dimension through the cycle website → prefix → uri → status → asn → source_tag → website.
  • FR-6.4 The frontend MUST issue TopN, Trend, and ListTargets concurrently with a 5 s deadline. Trend failure MUST suppress the sparkline but not the table. ListTargets failure MUST hide the source picker but not the rest of the page.
  • FR-6.5 Appending &raw=1 to any URL MUST return the TopN result as JSON, so the dashboard can be scripted without the CLI.
  • FR-6.6 The frontend MUST accept a q= parameter holding a mini filter expression (status>=400 AND website~=gouda.*). On submission it MUST parse the expression and redirect to the canonical URL with the individual f_* params populated; parse errors MUST render inline without losing the current filter state.
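
    The FR-6.3 drilldown cycle is a fixed rotation over the group-by dimensions; a sketch (identifiers illustrative, not the frontend's actual code):

```go
package main

import "fmt"

// cycle mirrors the FR-6.3 drilldown order:
// website → prefix → uri → status → asn → source_tag → website.
var cycle = []string{"website", "prefix", "uri", "status", "asn", "source_tag"}

// nextGroupBy returns the dimension the dashboard advances to after a
// row click adds the current dimension's value as a filter.
func nextGroupBy(cur string) string {
	for i, g := range cycle {
		if g == cur {
			return cycle[(i+1)%len(cycle)]
		}
	}
	return cycle[0] // unknown input: restart the cycle
}

func main() {
	fmt.Println(nextGroupBy("source_tag")) // wraps back to website
}
```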

FR-7 CLI

  • FR-7.1 The CLI MUST provide four subcommands: topn, trend, stream, targets. Each subcommand MUST accept --target host:port[,host:port...] and fan out concurrently, printing results in order with per-target headers (omitted for single-target invocations, so output pipes cleanly into jq).
  • FR-7.2 The CLI MUST expose every Filter dimension as a dedicated flag and default to a human-readable table. --json MUST switch to newline-delimited JSON for stream and to a single JSON array for topn/trend.
  • FR-7.3 stream MUST reconnect automatically on error with a 5 s backoff and run until interrupted.

FR-8 Prometheus exposition (collector only)

  • FR-8.1 The collector MUST expose a Prometheus /metrics endpoint on -prom-listen (default :9100). Setting the flag to the empty string MUST disable it entirely.
  • FR-8.2 The collector MUST expose a per-request counter nginx_http_requests_total{host, method, status} capped at promCounterCap = 250 000 distinct label sets. When the cap is reached, further new label sets MUST be dropped (existing series keep incrementing) until the map is rolled over.
  • FR-8.3 The collector MUST expose per-host histograms nginx_http_response_body_bytes{host, le} (body-size distribution) and nginx_http_request_duration_seconds{host, le} (request-time distribution). The duration histogram MUST NOT be split by source_tag — its bucket count would multiply without operational benefit.
  • FR-8.4 The collector MUST expose two parallel roll-ups labeled by source_tag only (not cross-producted with host): nginx_http_requests_by_source_total{source_tag} and nginx_http_response_body_bytes_by_source{source_tag, le}. These are separate metric names to avoid inconsistent label sets under a single name.
  • FR-8.5 The collector MUST expose three counters that let operators distinguish UDP parse failures from back-pressure drops: logtail_udp_packets_received_total (datagrams off the socket), logtail_udp_loglines_success_total (parsed OK), and logtail_udp_loglines_consumed_total (forwarded to the store — i.e. not dropped).
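
    The FR-8.2 cap behaviour — existing series keep counting, new label sets are refused once the cap is hit — can be sketched as follows (an illustrative sketch, not the collector's actual exporter):

```go
package main

import "fmt"

// cappedCounter sketches FR-8.2: at most `cap` distinct label sets.
type cappedCounter struct {
	cap    int
	series map[string]uint64
}

// inc increments the series for `labels`, refusing to create a new
// series once the cap is reached; existing series keep incrementing.
func (c *cappedCounter) inc(labels string) bool {
	if _, ok := c.series[labels]; !ok && len(c.series) >= c.cap {
		return false // new label set dropped
	}
	c.series[labels]++
	return true
}

func main() {
	c := &cappedCounter{cap: 2, series: map[string]uint64{}}
	c.inc(`host="a",status="200"`)
	c.inc(`host="a",status="404"`)
	ok := c.inc(`host="b",status="200"`) // over cap: dropped
	fmt.Println(ok, c.series[`host="a",status="200"`])
}
```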

Non-Functional Requirements

NFR-1 Correctness under concurrency

  • NFR-1.1 The collector MUST run a single goroutine ("the store goroutine") that owns the live map and the ring-buffer write path. Other goroutines MUST NOT write to these structures. The file tailer and the UDP listener MUST communicate with the store goroutine through a bounded channel.
  • NFR-1.2 Readers (query RPCs and subscriber fan-out) MUST take an RLock on the rings. Writers MUST take a Lock only for the moment the slice header of the new snapshot is installed; serialisation and network I/O MUST happen outside the lock.
  • NFR-1.3 DumpSnapshots MUST copy ring headers and filled counts under RLock only, then release the lock before streaming. The minute-rotation write path MUST never observe a lock held for longer than a microsecond-scale slice copy.
  • NFR-1.4 A query that races with a rotation MUST observe a monotonically non-decreasing total for a fixed filter over a fixed window; it MUST NOT observe a partially-rotated state that would cause a total to decrease compared to a prior reading.
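
    The NFR-1.2/1.3 locking discipline — an O(1) slice-header install under the write lock, header copies under RLock, everything expensive outside both — can be sketched as (names illustrative):

```go
package main

import (
	"fmt"
	"sync"
)

type entry struct {
	label string
	count int64
}

// ring sketches the NFR-1.2 discipline: the write lock is held only for
// the instant the new snapshot's slice header is installed.
type ring struct {
	mu      sync.RWMutex
	buckets [][]entry
	next    int
}

// install is called once per minute rotation; the snapshot was built
// entirely outside the lock.
func (r *ring) install(snap []entry) {
	r.mu.Lock()
	r.buckets[r.next] = snap // O(1) slice-header write
	r.next = (r.next + 1) % len(r.buckets)
	r.mu.Unlock()
}

// view copies slice headers under RLock, then releases the lock before
// the caller serialises or streams anything (NFR-1.3).
func (r *ring) view() [][]entry {
	r.mu.RLock()
	defer r.mu.RUnlock()
	out := make([][]entry, len(r.buckets))
	copy(out, r.buckets)
	return out
}

func main() {
	r := &ring{buckets: make([][]entry, 3)}
	r.install([]entry{{"example.org", 42}})
	fmt.Println(len(r.view()))
}
```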

NFR-2 Memory bounds

  • NFR-2.1 The collector's live map MUST be hard-capped at 100 000 entries. Once the cap is reached, updates to existing keys MUST still be applied, but new keys MUST be dropped until the next minute rotation resets the map. This bounds memory under high-cardinality attacks.
  • NFR-2.2 Fine-ring snapshots MUST be capped at top-50 000 entries; coarse-ring snapshots at top-5 000. The full memory budget for a collector is therefore approximately 845 MB (live map ~19 MB + fine ring ~558 MB + coarse ring ~268 MB).
  • NFR-2.3 The aggregator MUST apply the same tier caps as the collector. Its steady-state memory is roughly equivalent to one collector regardless of the number of collectors subscribed.
  • NFR-2.4 The Prometheus counter map (FR-8.2) MUST be capped at promCounterCap = 250 000 entries. The per-host and per-source histograms MUST NOT be capped explicitly — they grow only with the distinct host count, which is bounded by the operator's vhost configuration.

NFR-3 Performance

  • NFR-3.1 ParseLine and ParseUDPLine MUST use strings.Split / strings.SplitN (no regex), so that per-line cost stays around 50 ns on commodity hardware.
  • NFR-3.2 TopN and Trend queries across the full 24-hour coarse ring MUST complete in well under 250 ms at the 50 000-entry fine cap, for fully-specified filters.
  • NFR-3.3 The collector's input channel MUST be sized to absorb approximately 20 s of peak load (e.g. 200 000 at 10 K lines/s) so that transient pauses in the store goroutine do not back up the tailer or the UDP listener.
  • NFR-3.4 When either the tailer or the UDP listener cannot enqueue a parsed record because the channel is full, the record MUST be dropped rather than blocking the ingest goroutine. UDP drops MUST be visible via the counters in FR-8.5; file-path drops are implicit (the tailer falls behind the file).
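
    The NFR-3.4 drop-rather-than-block rule is a select with a default arm; a sketch (tryEnqueue is an illustrative name):

```go
package main

import "fmt"

// tryEnqueue sketches NFR-3.4: a full channel drops the record instead
// of blocking the ingest goroutine; the caller accounts for the drop
// (visible for UDP via the FR-8.5 counters).
func tryEnqueue(ch chan<- string, rec string) bool {
	select {
	case ch <- rec:
		return true
	default:
		return false // channel full: drop
	}
}

func main() {
	ch := make(chan string, 2)
	dropped := 0
	for _, r := range []string{"a", "b", "c"} {
		if !tryEnqueue(ch, r) {
			dropped++
		}
	}
	fmt.Println(dropped) // the third record cannot be enqueued
}
```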

NFR-4 Fault tolerance and recovery

  • NFR-4.1 The file tailer MUST tolerate logrotate automatically. On RENAME/REMOVE events it MUST drain the old file descriptor to EOF, close it, and retry opening the original path with exponential backoff until the new file appears. It MUST NOT require a SIGHUP or a restart.
  • NFR-4.2 The aggregator MUST NOT block frontend queries during backfill. Its gRPC server MUST start listening first; backfill (FR-4.3) MUST run in a background goroutine.
  • NFR-4.3 A collector restart MUST NOT affect peer collectors or the aggregator's ability to continue serving the surviving collectors' data. When the restarted collector reconnects, its stream MUST resume without operator action.
  • NFR-4.4 An aggregator restart MUST recover its ring-buffer contents from all collectors via DumpSnapshots; live streaming MUST resume in parallel with backfill so that no minute is lost even during a restart.

NFR-5 Observability of the system itself

  • NFR-5.1 The collector MUST expose operator-facing log lines on stdout covering: file discovery, logrotate reopen events, UDP listener bind, subscriber connect/disconnect, and fatal configuration errors. The collector MUST NOT log anything on the per-request hot path.
  • NFR-5.2 The aggregator MUST log each collector's connect, disconnect, degraded transition, and recovery. Backfill MUST log a per-collector line with bucket counts, entry counts, and wall-clock duration.
  • NFR-5.3 The Prometheus exporter MUST be the primary out-of-band health signal. Counters FR-8.5 plus the per-host request counter (FR-8.2) give an operator a full view of ingest health without needing to read the logs.

NFR-6 Security

  • NFR-6.1 gRPC traffic MUST be cleartext HTTP/2. Operators who expose the endpoints beyond a trusted network are expected to terminate TLS in a front proxy.
  • NFR-6.2 The collector MUST bind its UDP listener to 127.0.0.1 by default (configurable via -logtail-bind), so that merely setting -logtail-port does not expose the socket to the public Internet.
  • NFR-6.3 The system MUST NOT record per-request personally-identifying data beyond what nginx already logs. Client IPs are truncated at ingest (FR-1.3); URIs lose their query strings (FR-1.4).

NFR-7 Documentation and packaging

  • NFR-7.1 The repository MUST ship docs/user-guide.md that walks an operator through nginx log format configuration, running each of the four binaries (flags, systemd examples, Docker Compose), and integrating the Prometheus exporter. It MUST contain enough examples that a new operator can stand up a single-host deployment end-to-end without reading the source.
  • NFR-7.2 The repository MUST ship docs/design.md (this document) covering the normative requirements and the architectural rationale.
  • NFR-7.3 All four binaries MUST build as static Go binaries with CGO_ENABLED=0 -trimpath -ldflags="-s -w" and MUST ship together in a single scratch-based Docker image. No OS, no shell, no runtime dependencies.

Architecture Overview

Process Model

The project ships four binaries:

  • collector — runs on every nginx host. Ingests logs from files and/or UDP, maintains the live map and tiered rings, serves LogtailService on port 9090, and exposes Prometheus on port 9100.
  • aggregator — runs centrally. Subscribes to every collector, merges snapshots, serves the same LogtailService on port 9091.
  • frontend — runs centrally, alongside the aggregator. HTTP server on port 8080, rendering HTML against the aggregator (or any other LogtailService endpoint).
  • cli — runs wherever the operator is. Talks to any LogtailService. No daemon.

Because all four binaries speak one service, the aggregator is optional for a single-host deployment: the frontend and CLI can point directly at a collector.

Data Flow

             ┌──────────────┐ files  ┌───────────────┐
   nginx ──▶ │ access.log   │───────▶│ file tailer   │
             │ (file mode)  │        │ (fsnotify)    │──┐
             └──────────────┘        └───────────────┘  │
                                                        │
             ┌──────────────┐  UDP   ┌───────────────┐  │
   nginx-ipng │ ipng_stats_  │───────▶│ udp listener  │──┼──▶ LogRecord ──▶ ┌──────────┐
   -stats-    │ logtail      │        │ (127.0.0.1)   │  │    channel (200K)│  store   │
  plugin     └──────────────┘        └───────────────┘  │                  │ goroutine│
                                                        │                  └─────┬────┘
                                                        ▼                        │
                                               Prom exporter                     │
                                                                                 ▼
                                                                         ┌─────────────┐
                                                                         │ live map    │
                                                                         │ (≤100 K)    │
                                                                         └──────┬──────┘
                                                                                │ every 1 m
                                                                                ▼
                                                                         ┌─────────────┐
                                                                         │ fine ring   │
                                                                         │ 60×50 K     │────┐
                                                                         └──────┬──────┘    │
                                                                                │ every 5 m │
                                                                                ▼           │
                                                                         ┌─────────────┐    │
                                                                         │ coarse ring │    │
                                                                         │ 288×5 K     │    │
                                                                         └─────────────┘    │
                                                                                            │
                                                     ┌──────────────────────────────────────┘
                                                     │ StreamSnapshots (push)
                                                     ▼
                                               aggregator ──▶ merged cache ──▶ frontend / CLI

Requests enter nginx, which writes either to a log file (file mode), to a UDP socket via the ipng_stats_logtail directive (UDP mode), or both. The collector has two ingest goroutines that parse each line into a LogRecord and enqueue it on a shared 200 K channel. A single store goroutine consumes the channel, updating the live map and maintaining the tiered rings. A once-per-minute timer rotates the live map into the fine ring and (every fifth tick) into the coarse ring, and fans the fresh snapshot out to every StreamSnapshots subscriber. The aggregator is one such subscriber.

Query RPCs (TopN, Trend) MUST read only from the rings and MUST NOT read from the live map. The aggregator's cache is itself a ring built from the merged-view snapshots; it is updated on the same 1-minute cadence regardless of how many collectors are connected.

Components

Program 1 — Collector (cmd/collector)

Responsibilities

  • Tail on-disk log files via a single fsnotify.Watcher, handle logrotate, and re-scan glob patterns periodically to pick up new files (FR-2.1, NFR-4.1).
  • Listen on an optional UDP socket for ipng_stats_logtail datagrams (FR-2.2).
  • Parse each log line into a LogRecord (FR-1).
  • Maintain the live map, fine ring, coarse ring, and subscriber fan-out under a single-writer goroutine (FR-3, NFR-1).
  • Serve LogtailService on -listen (FR-5).
  • Expose Prometheus metrics on -prom-listen (FR-8).

Key data types

  • LogRecord — ten fields (website, client_prefix, URI, status, is_tor, asn, method, body_bytes_sent, request_time, source_tag). Produced by ParseLine or ParseUDPLine and consumed by the store goroutine.
  • Tuple6 (historical name; carries seven fields now) — the aggregation key. NUL-separated when encoded as a map key for snapshots. The code name is intentionally stable so downstream tests and consumers are not churned.
  • Snapshot(timestamp, []Entry) where Entry = (label, count) and label is an encoded Tuple6.

Presents

  • LogtailService on TCP (default :9090).
  • A Prometheus /metrics handler on TCP (default :9100).

Consumes

  • One or more on-disk log files matched by --logs and/or --logs-file globs.
  • Optionally, a UDP socket on --logtail-bind:--logtail-port (default 127.0.0.1, disabled when port is 0).

Program 2 — Aggregator (cmd/aggregator)

Responsibilities

  • Dial every configured collector and subscribe via StreamSnapshots (FR-4.2).
  • Merge incoming snapshots into a single cache using delta-based subtraction, so a collector's contribution is updated in place rather than accumulated (FR-4.2).
  • At startup, call DumpSnapshots on each collector once, merge the per-timestamp entries, and load the result into the cache atomically (FR-4.3).
  • Handle collector outages with exponential-backoff reconnect and degraded-collector zeroing (FR-4.4).
  • Serve the same LogtailService as the collector (FR-5).
  • Maintain a TargetRegistry that maps collector addresses to display names (updated from the source field of incoming snapshots).

Presents

  • LogtailService on TCP (default :9091).

Consumes

  • The StreamSnapshots and DumpSnapshots RPCs on every configured collector (--collectors).

Program 3 — Frontend (cmd/frontend)

Responsibilities

  • Render the drilldown dashboard server-side with no JavaScript (FR-6.1).
  • Parse URL query string into filter / group-by / window state (FR-6.2).
  • Issue TopN, Trend, and ListTargets concurrently with a 5 s deadline (FR-6.4).
  • Render inline SVG sparklines from TrendResponse (FR-6.1).
  • Support the mini filter-expression language (FR-6.6) and the raw=1 JSON output (FR-6.5).
  • Expose a source-picker row populated from ListTargets.

Presents

  • An HTTP dashboard on TCP (default :8080).

Consumes

  • Any LogtailService endpoint (--target, default localhost:9091 — the aggregator).

Program 4 — CLI (cmd/cli)

Responsibilities

  • Dispatch to topn, trend, stream, or targets (FR-7.1).
  • Parse shared and per-subcommand flags, build a Filter proto from them, and fan out to every --target concurrently (FR-7.2).
  • Print human-readable tables by default; switch to JSON with --json (FR-7.2).
  • Reconnect automatically in stream mode (FR-7.3).

Presents

  • Exit status 0 on success, non-zero on RPC error (except stream, which runs until interrupted).

Consumes

  • Any LogtailService endpoint.

Protobuf service (proto/logtail.proto)

One proto file defines every shared type. Tuple6 is encoded as a NUL-separated label string inside TopNEntry, and the Snapshot message carries both fine (1-minute) and coarse (5-minute) ring contents. GroupBy and Window are enums; Filter carries optional exact-match fields, regex fields, and the StatusOp comparison enum, which is used for both http_response and asn_number.
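
The NUL-separated label encoding is simple enough to sketch directly; the field order shown here is illustrative, since the wire format fixes the real order:

```go
package main

import (
	"fmt"
	"strings"
)

// encodeLabel joins tuple fields with NUL, which cannot appear in any field.
func encodeLabel(fields ...string) string {
	return strings.Join(fields, "\x00")
}

// decodeLabel splits a label back into its fields.
func decodeLabel(label string) []string {
	return strings.Split(label, "\x00")
}

func main() {
	label := encodeLabel("example.com", "192.0.2.0/24", "/index.html", "200")
	fmt.Println(len(decodeLabel(label))) // 4
}
```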

Operational Concerns

Deployment Topology

A typical deployment is:

  • Per nginx host: one collector systemd unit, pointed at /var/log/nginx/*.log and/or listening on 127.0.0.1:9514 for the nginx-ipng-stats-plugin UDP stream. Exposes :9090 (gRPC) and :9100 (Prometheus).
  • Central: one aggregator systemd unit on e.g. agg:9091, subscribed to all collectors; and one frontend systemd unit on agg:8080, pointed at the aggregator. Operators reach the dashboard via http://agg:8080/. Alternatively, the Docker Compose file in the repo root runs the aggregator and frontend together.
  • Operator laptop: nginx-logtail CLI invocations, pointed at the aggregator for fleet-wide questions or at a specific collector for a single-host drilldown.

Configuration

All four binaries are configured via flags with matching environment variables. The canonical reference is docs/user-guide.md. Representative settings:

  • collector: --logs /var/log/nginx/*.log, --logtail-port 9514, --source $(hostname), --prom-listen :9100.
  • aggregator: --collectors nginx1:9090,nginx2:9090, --listen :9091.
  • frontend: --target agg:9091, --listen :8080.
  • cli: no persistent configuration; every invocation carries --target.
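
On Debian installs, the same settings land in the shared env file /etc/default/nginx-logtail, with per-binary *_ARGS variables as escape hatches appended to ExecStart. A sketch of what such a file might look like (the exact variable names are defined by the shipped template, and these values are illustrative):

```
# /etc/default/nginx-logtail: shared env file for all three units.
# Variable names here are illustrative; see the installed template.
COLLECTOR_LOGS="/var/log/nginx/*.log"
COLLECTOR_SOURCE="nginx1"        # rendered from %HOSTNAME% at first install
COLLECTOR_ARGS=""                # extra flags appended to ExecStart
AGGREGATOR_COLLECTORS="nginx1:9090,nginx2:9090"
AGGREGATOR_ARGS=""
FRONTEND_TARGET="localhost:9091"
FRONTEND_ARGS=""
```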

Reload and Restart Semantics

  • Collector restart. The live map and both rings start empty. The file tailer resumes at EOF of each watched file (no historical replay). The fine ring refills within an hour; the coarse ring within 24 hours.
  • Aggregator restart. Backfill reconstructs the cache from all collectors' DumpSnapshots streams. The gRPC server is listening before backfill begins (NFR-4.2), so the frontend is never blocked during restart — it just sees an incomplete cache for the few seconds backfill takes.
  • Collector outage. The aggregator reconnects with backoff; after three consecutive failures the collector's contribution is zeroed (FR-4.4) so the merged view does not show stale counts. On recovery the zeroing is reversed by the next snapshot.
  • nginx logrotate. The collector drains the old fd, closes it, and reopens the original path. No operator action required (NFR-4.1).
  • nginx-ipng-stats-plugin reload. The plugin's UDP socket is per-worker; a reload simply causes new workers to open fresh sockets to the same address. The collector sees a brief gap and resumes.

Observability of the System Itself

Primary channel is the collector's Prometheus endpoint (FR-8). Beyond the per-host request counter and the per-source roll-ups, three UDP counters give direct visibility into the UDP ingest path:

  • logtail_udp_packets_received_total — what arrived.
  • logtail_udp_loglines_success_total — what parsed cleanly.
  • logtail_udp_loglines_consumed_total — what made it to the store (i.e. was not dropped by a full channel).

The difference received - success is the parse-failure rate; success - consumed is the back-pressure drop rate. Operators should alert when either is non-zero.
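
As a sketch, those two conditions map directly onto Prometheus alerting rules (rule names, windows, and thresholds below are illustrative):

```yaml
groups:
  - name: nginx-logtail
    rules:
      - alert: LogtailUDPParseFailures
        expr: >
          rate(logtail_udp_packets_received_total[5m])
          - rate(logtail_udp_loglines_success_total[5m]) > 0
        for: 5m
      - alert: LogtailUDPBackpressureDrops
        expr: >
          rate(logtail_udp_loglines_success_total[5m])
          - rate(logtail_udp_loglines_consumed_total[5m]) > 0
        for: 5m
```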

Each binary logs human-readable lines on stdout for connect/disconnect events, logrotate reopen, backfill timing, and degraded transitions. No per-request logging.

Failure Modes

  • High-cardinality DDoS. The live map hits 100 000 entries and stops accepting new keys until the next rotation (NFR-2.1). Existing top-K entries keep accumulating, so the attacker's dominant prefixes / URIs remain visible. The cap resets every minute.
  • Collector crash. In-flight live-map state for the current minute is lost. The next collector start resumes tailing; the aggregator zeroes the degraded collector's contribution after a few seconds and reintegrates it when snapshots resume.
  • Aggregator crash. No collector is affected. The operator restarts the aggregator; backfill reconstructs the cache.
  • Frontend crash. Stateless. Operator restarts.
  • UDP datagram loss. Any datagram dropped in-kernel (socket buffer full, network drop) does not register as a parse failure; it is simply invisible. Operators should size SO_RCVBUF appropriately; the collector already requests 4 MiB.
  • Malformed log lines. File format: lines with <8 tab-separated fields are silently skipped; an invalid IP also drops the line. UDP: packets without a recognised v<N>\t prefix, or with the wrong field count for the claimed version, or with a bad IP, are counted as received-but-not-success and dropped.
  • Clock skew between collectors. Trend sparklines derived from merged data assume collectors are roughly NTP-synced. Per-bucket alignment is to the local minute / 5-minute boundary of each collector.
  • gRPC traffic over untrusted links. The system does not ship TLS; operators should front the gRPC ports with a TLS-terminating proxy or an IPsec tunnel.
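
The fail-closed UDP version dispatch in the malformed-lines bullet can be sketched as follows; parseV1 stands in for the real 12-field parser, and whether the version tag counts toward the field total is illustrative here:

```go
package main

import (
	"fmt"
	"strings"
)

// parseUDPLine dispatches on a leading "v<N>\t" tag. Unknown or missing
// versions fail closed: the line counts as received-but-not-success.
func parseUDPLine(line string) (fields []string, ok bool) {
	tag, rest, found := strings.Cut(line, "\t")
	if !found || tag != "v1" { // fail closed on unknown/missing versions
		return nil, false
	}
	return parseV1(rest)
}

// parseV1 checks the field count for the claimed version before accepting.
func parseV1(rest string) ([]string, bool) {
	f := strings.Split(rest, "\t")
	if len(f) != 12 { // wrong field count for v1: drop
		return nil, false
	}
	return f, true
}

func main() {
	_, ok := parseUDPLine("v2\tanything")
	fmt.Println(ok) // false: future versions fail closed until a parser lands
}
```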

Security

  • No TLS, no auth. Deliberate (NFR-6.1). Deploy on a trusted network or behind a TLS proxy.
  • UDP bind. Default 127.0.0.1 so merely turning on the listener does not expose a public socket (NFR-6.2).
  • Client-IP truncation. Client addresses are truncated at ingest; the system never stores full client IPs (NFR-6.3, FR-1.3).
  • Query-string stripping. URIs lose their query strings at ingest. A user who cares about ?q= parameters must reconfigure nginx's log format, and then accept the cardinality consequence.

Alternatives Considered

  • Log shipping to ClickHouse / ELK. Rejected as the default: adds a storage tier to a problem that fits in a per-host 1 GB ring, for the target fleet size. A future ClickHouse export from the aggregator is viable and would be additive (deferred).
  • Raw request logging to Kafka. Rejected: preserves every request at much higher cost for no visibility benefit; the operator wants top-K ranking, not a replay log. If raw logging is desired, nginx's own access log is the right tool.
  • Promtail / Grafana Loki. Rejected as the primary interface. Loki is excellent for free-text log search but weak for fast ranked aggregations over dozens of dimensions; the drilldown interaction the operator wants fits poorly into LogQL.
  • In-process Lua aggregator on each nginx. Considered for the collector tier. Rejected: shipping counters to a central view still requires a process outside nginx; keeping the ingest path out of the nginx worker avoids a class of latency regressions.
  • Pull-based collector polling (aggregator polls collectors every second). Rejected in favor of push. Polling multiplies query latency and makes the aggregator's cache stale by the poll interval. Push-stream with delta merge keeps the cache within seconds of real time.
  • One metric name for both per-host and per-source_tag roll-ups. Rejected for Prometheus hygiene. Mixing different label sets under one metric name breaks aggregation rules; separate metric names (_by_source) are clearer and easier to query.
  • Cross-product of host × source_tag for every counter and histogram. Rejected. With ~20 tags and ~50 hosts the cardinality explodes quickly on the duration histogram without operational benefit. The duration histogram stays per-host; requests and body size get a parallel _by_source rollup.
  • Writing every snapshot to disk for restart recovery. Rejected in favor of DumpSnapshots RPC backfill. Disk-backed persistence would multiply operational surface (rotation, fsck, permissions) for a feature that needs to survive only an aggregator restart.

Decisions Deferred Post-v0.2

  • ClickHouse export from aggregator. 1-minute pre-aggregated rows pushed into a SummingMergeTree table for 7-day / 30-day windows. Frontend would route longer windows to ClickHouse while shorter windows stay on the in-memory rings. Strictly additive; no interface changes. Deferred until a concrete retention requirement lands.
  • TLS on gRPC endpoints. The argument for shipping TLS changes if/when the aggregator is deployed across an untrusted network segment. Until then, a front proxy is the right shape.
  • Ring-buffer sizing on a per-collector basis. Today every collector ships the same 60×50 K / 288×5 K dimensions. A low-traffic collector can afford smaller rings; a hot one might want larger. Deferred — the uniform default is operationally simpler.
  • Authenticated Prometheus scraping. The endpoint is currently open on :9100. If a future deployment puts the scraper on a less-trusted path, scrape-side auth (bearer token, TLS client cert) is the right add-on.
  • Coarse tier beyond 24 h. Extending to 7 days in-memory would cost ~70 MB per collector but add 2016 buckets to iterate on a W24H+ query. Deferred until the operator wants a 7-day drilldown without ClickHouse.