Update README.md
2026-03-14 20:50:10 +01:00
parent c092561af2
commit 2962590a74

README.md

SPECIFICATION
This project contains four programs:
1) A **collector** that tails any number of nginx log files and maintains an in-memory structure of
`{website, client_prefix, http_request_uri, http_response}` counts across all files. It answers
TopN and Trend queries via gRPC and pushes minute snapshots to the aggregator via server-streaming.
Runs on each nginx machine in the cluster. No UI — gRPC interface only.
2) An **aggregator** that subscribes to the snapshot stream from all collectors, merges their data
into a unified in-memory cache, and exposes the same gRPC interface. Answers questions like "what
is the busiest website globally", "which client prefix is causing the most HTTP 503s", and shows
trending information useful for DDoS detection. Runs on a central machine.
3) An **HTTP frontend** companion to the aggregator that renders a drilldown dashboard. Operators
can restrict by `http_response=429`, then by `website=www.example.com`, and so on. Works with
either a collector or aggregator as its backend. Zero JavaScript — server-rendered HTML with inline
SVG sparklines and meta-refresh.
4) A **CLI** for shell-based debugging. Sends `topn`, `trend`, and `stream` queries to any
collector or aggregator, fans out to multiple targets in parallel, and outputs human-readable
tables or newline-delimited JSON.
Programs are written in Go. No CGO, no external runtime dependencies.
---
DESIGN

```
nginx-logtail/
├── proto/
│   └── logtail.proto         # shared protobuf definitions
├── internal/
│   └── store/
│       └── store.go          # shared types: Tuple4, Entry, Snapshot, ring helpers
└── cmd/
    ├── collector/
    │   ├── main.go
    │   ├── tailer.go         # MultiTailer: tail N files via one shared fsnotify watcher
    │   ├── parser.go         # tab-separated logtail log_format parser (~50 ns/line)
    │   ├── store.go          # bounded top-K in-memory store + tiered ring buffers
    │   └── server.go         # gRPC server: TopN, Trend, StreamSnapshots
    ├── aggregator/
    │   ├── main.go
    │   ├── subscriber.go     # one goroutine per collector; StreamSnapshots with backoff
    │   ├── merger.go         # delta-merge: O(snapshot_size) per update
    │   ├── cache.go          # tick-based ring buffer cache served to clients
    │   └── server.go         # gRPC server (same surface as collector)
    ├── frontend/
    │   ├── main.go
    │   ├── handler.go        # URL param parsing, concurrent TopN+Trend, template exec
    │   ├── client.go         # gRPC dial helper
    │   ├── sparkline.go      # TrendPoints → inline SVG polyline
    │   ├── format.go         # fmtCount (space thousands separator)
    │   └── templates/
    │       ├── base.html     # outer HTML shell, inline CSS, meta-refresh
    │       └── index.html    # window tabs, group-by tabs, breadcrumb, table, footer
    └── cli/
        ├── main.go           # subcommand dispatch and usage
        ├── flags.go          # shared flags, parseTargets, buildFilter, parseWindow
        ├── client.go         # gRPC dial helper
        ├── format.go         # printTable, fmtCount, fmtTime, targetHeader
        ├── cmd_topn.go       # topn: concurrent fan-out, table + JSON output
        ├── cmd_trend.go      # trend: concurrent fan-out, table + JSON output
        └── cmd_stream.go     # stream: multiplexed streams, auto-reconnect
```
## Data Model
Two ring buffers at different resolutions cover all query windows up to 24 hours.
Supported query windows and which tier they read from:
| Window | Tier | Buckets summed |
|--------|--------|----------------|
| 1 min | fine | last 1 |
| 5 min | fine | last 5 |
| 15 min | fine | last 15 |
Entry size: ~30 B website + ~15 B prefix + ~50 B URI + 3 B status + 8 B count +
overhead ≈ **~186 bytes per entry**.
| Structure                 | Entries      | Size        |
|---------------------------|--------------|-------------|
| Live map (capped)         | 100 000      | ~19 MB      |
| Fine ring (60 × 1-min)    | 60 × 50 000  | ~558 MB     |
| Coarse ring (288 × 5-min) | 288 × 5 000  | ~268 MB     |
| **Total**                 |              | **~845 MB** |
The live map is **hard-capped at 100 K entries**. Once full, only updates to existing keys are
accepted; new keys are silently dropped.
```
message TopNRequest { Filter filter = 1; GroupBy group_by = 2; int32 n = 3; Window window = 4; }
message TopNEntry { string label = 1; int64 count = 2; }
message TopNResponse { repeated TopNEntry entries = 1; string source = 2; }
// Trend: one total count per minute (or 5-min) bucket, for sparklines
message TrendRequest { Filter filter = 1; Window window = 4; }
message TrendPoint { int64 timestamp_unix = 1; int64 count = 2; }
message TrendResponse { repeated TrendPoint points = 1; string source = 2; }
// Streaming: collector pushes a fine snapshot after every minute rotation
message SnapshotRequest {}
message Snapshot {
string source = 1;
service LogtailService {
  rpc TopN(TopNRequest) returns (TopNResponse);
  rpc Trend(TrendRequest) returns (TrendResponse);
rpc StreamSnapshots(SnapshotRequest) returns (stream Snapshot);
}
// Both collector and aggregator implement LogtailService.
// The aggregator's StreamSnapshots re-streams the merged view.
```
## Program 1 — Collector
### tailer.go
- **`MultiTailer`**: one shared `fsnotify.Watcher` for all files regardless of count — avoids
the inotify instance limit when tailing hundreds of files.
- On `WRITE` event: read all new lines from that file's `bufio.Reader`.
- On `RENAME`/`REMOVE` (logrotate): drain old fd to EOF, close, start retry-open goroutine with
exponential backoff. Sends the new `*os.File` back via a channel to keep map access single-threaded.
- Emits `LogRecord` structs on a shared buffered channel (capacity 200 K — absorbs ~20 s of peak).
- Accepts paths via `--logs` (comma-separated or glob) and `--logs-file` (one path/glob per line).
### parser.go
- Parses the fixed **logtail** nginx log format — tab-separated, fixed field order, no quoting:
```
log_format logtail '$host\t$remote_addr\t$msec\t$request_method\t$request_uri\t$status\t$body_bytes_sent\t$request_time';
```
Example line:
```
www.example.com 1.2.3.4 1741954800.123 GET /api/v1/search 200 1452 0.043
```
Field positions (0-indexed):

| # | Field              | Used for         |
|---|--------------------|------------------|
| 0 | `$host`            | website          |
| 1 | `$remote_addr`     | client_prefix    |
| 2 | `$msec`            | (discarded)      |
| 3 | `$request_method`  | (discarded)      |
| 4 | `$request_uri`     | http_request_uri |
| 5 | `$status`          | http_response    |
| 6 | `$body_bytes_sent` | (discarded)      |
| 7 | `$request_time`    | (discarded)      |
- `strings.SplitN(line, "\t", 8)` — ~50 ns/line. No regex.
- `$request_uri`: query string discarded at first `?`.
- `$remote_addr`: truncated to /24 (IPv4) or /48 (IPv6); prefix lengths configurable via flags.
- Lines with fewer than 8 fields are silently skipped.
### store.go
- **Single aggregator goroutine** reads from the channel and updates the live map — no locking on
the hot path. At 10 K lines/s the goroutine uses <1% CPU.
- Live map: `map[Tuple4]int64`, hard-capped at 100 K entries (new keys dropped when full).
- Fine ring: `[60]Snapshot` circular array. Coarse ring: `[288]Snapshot` circular array.
Each Snapshot is `[]TopNEntry` sorted desc by count (already sorted, merge is a heap pass).
- **Minute ticker**: heap-selects top-50K entries, writes snapshot to fine ring, resets live map.
- Every 5 fine ticks: merge last 5 fine snapshots → top-5K → write to coarse ring.
- **TopN query**: RLock ring, sum bucket range, apply filter, group by dimension, heap-select top N.
- **Trend query**: per-bucket filtered sum, returns one `TrendPoint` per bucket.
- **Subscriber fan-out**: per-subscriber buffered channel; `Subscribe`/`Unsubscribe` for streaming.
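The heap-select used by both the minute ticker and the TopN path can be sketched as a single-pass top-K over a counts map. Plain string keys stand in for the 4-tuple here; this is an illustration of the O(N log K) selection, not the project's code:

```go
package main

import (
	"container/heap"
	"fmt"
)

// TopNEntry matches the proto message.
type TopNEntry struct {
	Label string
	Count int64
}

// entryHeap is a min-heap by Count, keeping the K largest entries seen.
type entryHeap []TopNEntry

func (h entryHeap) Len() int            { return len(h) }
func (h entryHeap) Less(i, j int) bool  { return h[i].Count < h[j].Count }
func (h entryHeap) Swap(i, j int)       { h[i], h[j] = h[j], h[i] }
func (h *entryHeap) Push(x interface{}) { *h = append(*h, x.(TopNEntry)) }
func (h *entryHeap) Pop() interface{} {
	old := *h
	x := old[len(old)-1]
	*h = old[:len(old)-1]
	return x
}

// topK heap-selects the K highest counts in one pass: O(N log K).
func topK(counts map[string]int64, k int) []TopNEntry {
	h := &entryHeap{}
	for label, c := range counts {
		if h.Len() < k {
			heap.Push(h, TopNEntry{label, c})
		} else if c > (*h)[0].Count {
			(*h)[0] = TopNEntry{label, c} // evict current minimum
			heap.Fix(h, 0)
		}
	}
	// Pop ascending, fill the result descending by count.
	out := make([]TopNEntry, h.Len())
	for i := len(out) - 1; i >= 0; i-- {
		out[i] = heap.Pop(h).(TopNEntry)
	}
	return out
}

func main() {
	m := map[string]int64{"a.com": 10, "b.com": 30, "c.com": 20, "d.com": 5}
	fmt.Println(topK(m, 2))
}
```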
### server.go
- gRPC server on configurable port (default `:9090`).
- `TopN` and `Trend`: unary, answered from the ring buffer under RLock.
- `StreamSnapshots`: registers a subscriber channel; loops `Recv` on it; 30 s keepalive ticker.
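The per-subscriber buffered-channel fan-out might look like the following sketch. `hub`, `Broadcast`, and the drop-on-full policy are illustrative assumptions (gRPC plumbing omitted), not the project's exact code:

```go
package main

import (
	"fmt"
	"sync"
)

// Snapshot stands in for the proto Snapshot message.
type Snapshot struct {
	Source string
	Count  int
}

// hub holds one buffered channel per streaming subscriber.
type hub struct {
	mu   sync.Mutex
	subs map[chan Snapshot]struct{}
}

func newHub() *hub { return &hub{subs: make(map[chan Snapshot]struct{})} }

func (h *hub) Subscribe() chan Snapshot {
	ch := make(chan Snapshot, 4) // buffer absorbs a briefly slow receiver
	h.mu.Lock()
	h.subs[ch] = struct{}{}
	h.mu.Unlock()
	return ch
}

func (h *hub) Unsubscribe(ch chan Snapshot) {
	h.mu.Lock()
	delete(h.subs, ch)
	h.mu.Unlock()
	close(ch)
}

// Broadcast delivers the minute snapshot to every subscriber, dropping
// it for full buffers rather than blocking the rotation goroutine.
func (h *hub) Broadcast(s Snapshot) {
	h.mu.Lock()
	defer h.mu.Unlock()
	for ch := range h.subs {
		select {
		case ch <- s:
		default: // slow consumer: drop rather than stall the store
		}
	}
}

func main() {
	h := newHub()
	ch := h.Subscribe()
	h.Broadcast(Snapshot{Source: "nginx1", Count: 42})
	fmt.Println(<-ch)
	h.Unsubscribe(ch)
}
```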
## Program 2 — Aggregator
### subscriber.go
- One goroutine per collector. Dials, calls `StreamSnapshots`, forwards each `Snapshot` to the
merger.
- Reconnects with exponential backoff (100 ms → doubles → cap 30 s).
- After 3 consecutive failures: calls `merger.Zero(addr)` to remove that collector's contribution
from the merged view (prevents stale counts accumulating during outages).
- Resets failure count on first successful `Recv`; logs recovery.
### merger.go
- Maintains one `map[Tuple4]int64` per collector (latest snapshot only — no ring buffer here,
the aggregator's cache serves that role).
- **Delta strategy**: on each new snapshot from collector X, subtract X's previous entries from
`merged`, add the new entries, store new map. O(snapshot_size) per update — not
O(N_collectors × snapshot_size).
- `Zero(addr)`: subtracts the collector's last-known contribution and deletes its entry — called
when a collector is marked degraded.
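A dependency-free sketch of the delta merge and `Zero`. `Key` stands in for `Tuple4` and the method names follow the bullets above; the real merger also handles locking:

```go
package main

import "fmt"

// Key stands in for Tuple4.
type Key string

// Merger holds the merged view plus each collector's last snapshot, so
// an update costs O(len(snapshot)) instead of re-summing all collectors.
type Merger struct {
	merged map[Key]int64
	last   map[string]map[Key]int64 // collector addr → its last snapshot
}

func NewMerger() *Merger {
	return &Merger{merged: map[Key]int64{}, last: map[string]map[Key]int64{}}
}

// Update applies the delta for one collector: subtract its previous
// contribution, add the new one, remember the new snapshot.
func (m *Merger) Update(addr string, snap map[Key]int64) {
	m.Zero(addr)
	for k, c := range snap {
		m.merged[k] += c
	}
	m.last[addr] = snap
}

// Zero removes a collector's contribution; called when it is marked
// degraded, so stale counts don't linger in the merged view.
func (m *Merger) Zero(addr string) {
	for k, c := range m.last[addr] {
		if m.merged[k] -= c; m.merged[k] == 0 {
			delete(m.merged, k)
		}
	}
	delete(m.last, addr)
}

func main() {
	m := NewMerger()
	m.Update("nginx1:9090", map[Key]int64{"a.com": 10})
	m.Update("nginx2:9090", map[Key]int64{"a.com": 5})
	m.Update("nginx1:9090", map[Key]int64{"a.com": 7}) // delta: -10, +7
	fmt.Println(m.merged["a.com"])
	m.Zero("nginx2:9090")
	fmt.Println(m.merged["a.com"])
}
```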
### cache.go
- Tracks per-collector entry counts for "busiest nginx" queries (answered by treating
  `source` as an additional group-by dimension).
- **Tick-based rotation** (1-min ticker, not snapshot-triggered): keeps the aggregator ring aligned
to the same 1-minute cadence as collectors regardless of how many collectors are connected.
- Same tiered ring structure as the collector store; populated from `merger.TopK()` each tick.
- `QueryTopN`, `QueryTrend`, `Subscribe`/`Unsubscribe` — identical interface to collector store.
### server.go
- Implements `LogtailService` backed by the cache (not live fan-out).
- `StreamSnapshots` re-streams merged fine snapshots; usable by a second-tier aggregator or
monitoring system.
## Program 3 — Frontend
### handler.go
- All filter state in the **URL query string**: `w` (window), `by` (group_by), `f_website`,
`f_prefix`, `f_uri`, `f_status`, `n`, `target`. No server-side session — URLs are shareable
and bookmarkable; multiple operators see independent views.
- `TopN` and `Trend` RPCs issued **concurrently** (both with a 5 s deadline); page renders with
whatever completes. Trend failure suppresses the sparkline without erroring the page.
- **Drilldown**: clicking a table row adds the current dimension's filter and advances `by` through
`website → prefix → uri → status → website` (cycles).
- **`raw=1`**: returns the TopN result as JSON — same URL, no CLI needed for scripting.
- **`target=` override**: per-request gRPC endpoint override for comparing sources.
- Error pages render at HTTP 502 with the window/group-by tabs still functional.
### sparkline.go
- `renderSparkline([]*pb.TrendPoint) template.HTML` — fixed `viewBox="0 0 300 60"` SVG,
Y-scaled to max count, rendered as `<polyline>`. Returns `""` for fewer than 2 points or
all-zero data.
### templates/
- `base.html`: outer shell, inline CSS (~40 lines), conditional `<meta http-equiv="refresh">`.
- `index.html`: window tabs, group-by tabs, filter breadcrumb with `×` remove links, sparkline,
TopN table with `<meter>` bars (% relative to rank-1), footer with source and refresh info.
- No external CSS, no web fonts, no JavaScript. Renders in w3m/lynx.
## Program 4 — CLI
A CLI (`cmd/cli/`) for shell-based debugging and scripted top-K queries. Talks to any collector
or aggregator via gRPC. Outputs a human-readable table by default, or JSON with `--json`.
### Subcommands
```
logtail-cli topn [flags] ranked label → count table (exits after one response)
logtail-cli trend [flags] per-bucket time series (exits after one response)
logtail-cli stream [flags] live snapshot feed (runs until Ctrl-C, auto-reconnects)
```
### Flags
**Shared** (all subcommands):
| Flag        | Default          | Description                                      |
|-------------|------------------|--------------------------------------------------|
| `--target`  | `localhost:9090` | Comma-separated `host:port` list; fan-out to all |
| `--json`    | false            | Emit newline-delimited JSON instead of a table   |
| `--website` | —                | Filter: website                                  |
| `--prefix`  | —                | Filter: client prefix                            |
| `--uri`     | —                | Filter: request URI                              |
| `--status`  | —                | Filter: HTTP status code                         |
### Output format
**`topn`** — single JSON object, exits after one response:
```json
{
  "target": "agg:9091", "window": "5m", "group_by": "prefix",
  "filter": {"status": 429, "website": "www.example.com"},
  "queried_at": "2026-03-14T12:00:00Z",
  "entries": [
    {"rank": 1, "label": "1.2.3.0/24", "count": 8471},
    {"rank": 2, "label": "5.6.7.0/24", "count": 3201}
  ]
}
```
**`trend`** — single JSON object, exits after one response:
```json
{
  "target": "agg:9091", "window": "24h", "filter": {"status": 503},
  "queried_at": "2026-03-14T12:00:00Z",
  "points": [
    {"time": "2026-03-14T11:00:00Z", "count": 45},
    {"time": "2026-03-14T11:05:00Z", "count": 120}
  ]
}
```
**`stream`** — NDJSON (one JSON object per line, unbounded), suitable for `| jq -c 'select(...)'`:
```json
{"source": "nginx3:9090", "bucket_time": "2026-03-14T12:01:00Z", "entry_count": 42318, "top5": [{"label": "www.example.com", "count": 18000}, ...]}
```

### Multi-target fan-out

`--target` accepts a comma-separated list. All targets are queried concurrently; results are
printed in order with a per-target header. Single-target output omits the header for clean
pipe-to-`jq` use.
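The fan-out pattern (query all targets concurrently, print in a stable order with per-target headers) can be sketched as follows; `fanOut` and the stub `query` function are illustrative, not the project's API:

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"sync"
)

// fanOut queries every target concurrently and returns results keyed by
// target, so the caller can print in a stable order.
func fanOut(targets []string, query func(target string) string) map[string]string {
	var (
		mu  sync.Mutex
		wg  sync.WaitGroup
		out = make(map[string]string, len(targets))
	)
	for _, t := range targets {
		wg.Add(1)
		go func(t string) {
			defer wg.Done()
			res := query(t) // stand-in for the gRPC call
			mu.Lock()
			out[t] = res
			mu.Unlock()
		}(t)
	}
	wg.Wait()
	return out
}

func main() {
	targets := strings.Split("nginx1:9090,nginx2:9090", ",")
	out := fanOut(targets, func(t string) string { return "ok from " + t })
	sort.Strings(targets)
	for _, t := range targets {
		if len(targets) > 1 {
			fmt.Printf("== %s ==\n", t) // header omitted for a single target
		}
		fmt.Println(out[t])
	}
}
```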
### Output

Default: human-readable table with space-separated thousands (`18 432`).
`--json`: one JSON object per target (NDJSON for `stream`).

### Example usage

```bash
# Who is hammering us with 429s right now?
logtail-cli topn --target agg:9091 --window 1m --by prefix --status 429 --n 20 --json | jq '.entries[]'

# Which website has the most 503s over the last 24h?
logtail-cli topn --target agg:9091 --window 24h --by website --status 503

# Trend of all traffic to one site over 6h (for a quick graph)
logtail-cli trend --target agg:9091 --window 6h --website api.example.com --json | jq '.points[] | [.time, .count]'

# Watch live snapshots from one collector, filter for high-volume buckets
logtail-cli stream --target nginx3:9090 --json | jq -c 'select(.entry_count > 10000)'
```
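The space-separated thousands formatting used in the table output is simple to sketch; `fmtCount` here is illustrative, not necessarily the project's implementation:

```go
package main

import "fmt"

// fmtCount renders a count with space-separated thousands: 18432 → "18 432".
func fmtCount(n int64) string {
	if n < 0 {
		return "-" + fmtCount(-n)
	}
	s := fmt.Sprintf("%d", n)
	var out []byte
	for i := 0; i < len(s); i++ {
		if i > 0 && (len(s)-i)%3 == 0 {
			out = append(out, ' ') // group boundary every 3 digits from the right
		}
		out = append(out, s[i])
	}
	return string(out)
}

func main() {
	fmt.Println(fmtCount(18432), fmtCount(845), fmtCount(1400000))
}
```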
### Implementation notes
- Uses the standard `flag` package with manual subcommand dispatch; no external CLI framework.
- Shared helpers (`flags.go`, `client.go`, `format.go`) keep the per-subcommand files small.
- `stream` reconnects automatically on error (5 s backoff). All other subcommands exit
  immediately with a non-zero code on gRPC error, so they compose cleanly in shell scripts.
## Key Design Decisions
| Decision | Rationale |
|----------|-----------|
| Single aggregator goroutine in collector | Eliminates all map lock contention on the 10 K/s hot path |
| Hard cap live map at 100 K entries | Bounds memory regardless of DDoS cardinality explosion |
| Ring buffer of sorted snapshots (not raw maps) | TopN queries avoid re-sorting; merge is a single heap pass |
| Push-based streaming (collector → aggregator) | Aggregator cache always fresh; query latency is cache-read only |
| Delta merge in aggregator | O(snapshot_size) per update, not O(N_collectors × size) |
| Tick-based cache rotation in aggregator | Ring stays on the same 1-min cadence regardless of collector count |
| Degraded collector zeroing | Stale counts from failed collectors don't accumulate in the merged view |
| Same `LogtailService` for collector and aggregator | CLI and frontend work with either; no special-casing |
| `internal/store` shared package | ~200 lines of ring-buffer logic shared between collector and aggregator |
| Filter state in URL, not session cookie | Multiple concurrent operators; shareable/bookmarkable URLs |
| Query strings stripped at ingest | Major cardinality reduction; prevents URI explosion under attack |
| No persistent storage | Simplicity; acceptable for ops dashboards (restart = lose history) |
| Trusted internal network, no TLS | Reduces operational complexity; add a TLS proxy if needed later |
| Server-side SVG sparklines, meta-refresh | Zero JS dependencies; works in terminal browsers and curl |
| CLI default: human-readable table | Operator-friendly by default; `--json` opt-in for scripting |
| CLI multi-target fan-out | Compare a collector vs. aggregator, or two collectors, in one command |
| CLI uses stdlib `flag`, no framework | Four subcommands don't justify a dependency |