Update README.md
2026-03-14 20:50:10 +01:00
parent c092561af2
commit 2962590a74

README.md

SPECIFICATION
This project contains four programs:
1) A **collector** that tails any number of nginx log files and maintains an in-memory structure of
`{website, client_prefix, http_request_uri, http_response}` counts across all files. It answers
TopN and Trend queries via gRPC and pushes minute snapshots to the aggregator via server-streaming.
Runs on each nginx machine in the cluster. No UI — gRPC interface only.
2) An **aggregator** that subscribes to the snapshot stream from all collectors, merges their data
into a unified in-memory cache, and exposes the same gRPC interface. Answers questions like "what
is the busiest website globally", "which client prefix is causing the most HTTP 503s", and shows
trending information useful for DDoS detection. Runs on a central machine.
3) An **HTTP frontend** companion to the aggregator that renders a drilldown dashboard. Operators
can restrict by `http_response=429`, then by `website=www.example.com`, and so on. Works with
either a collector or aggregator as its backend. Zero JavaScript — server-rendered HTML with inline
SVG sparklines and meta-refresh.
4) A **CLI** for shell-based debugging. Sends `topn`, `trend`, and `stream` queries to any
collector or aggregator, fans out to multiple targets in parallel, and outputs human-readable
tables or newline-delimited JSON.
Programs are written in Go. No CGO, no external runtime dependencies.
---
DESIGN

```
nginx-logtail/
├── proto/
│   └── logtail.proto         # shared protobuf definitions
├── internal/
│   └── store/
│       └── store.go          # shared types: Tuple4, Entry, Snapshot, ring helpers
└── cmd/
    ├── collector/
    │   ├── main.go
    │   ├── tailer.go         # MultiTailer: tail N files via one shared fsnotify watcher
    │   ├── parser.go         # tab-separated logtail log_format parser (~50 ns/line)
    │   ├── store.go          # bounded top-K in-memory store + tiered ring buffers
    │   └── server.go         # gRPC server: TopN, Trend, StreamSnapshots
    ├── aggregator/
    │   ├── main.go
    │   ├── subscriber.go     # one goroutine per collector; StreamSnapshots with backoff
    │   ├── merger.go         # delta-merge: O(snapshot_size) per update
    │   ├── cache.go          # tick-based ring buffer cache served to clients
    │   └── server.go         # gRPC server (same surface as collector)
    ├── frontend/
    │   ├── main.go
    │   ├── handler.go        # URL param parsing, concurrent TopN+Trend, template exec
    │   ├── client.go         # gRPC dial helper
    │   ├── sparkline.go      # TrendPoints → inline SVG polyline
    │   ├── format.go         # fmtCount (space thousands separator)
    │   └── templates/
    │       ├── base.html     # outer HTML shell, inline CSS, meta-refresh
    │       └── index.html    # window tabs, group-by tabs, breadcrumb, table, footer
    └── cli/
        ├── main.go           # subcommand dispatch and usage
        ├── flags.go          # shared flags, parseTargets, buildFilter, parseWindow
        ├── client.go         # gRPC dial helper
        ├── format.go         # printTable, fmtCount, fmtTime, targetHeader
        ├── cmd_topn.go       # topn: concurrent fan-out, table + JSON output
        ├── cmd_trend.go      # trend: concurrent fan-out, table + JSON output
        └── cmd_stream.go     # stream: multiplexed streams, auto-reconnect
```
## Data Model
Two ring buffers at different resolutions cover all query windows up to 24 hours.
Supported query windows and which tier they read from:
| Window | Tier | Buckets summed |
|--------|--------|----------------|
| 1 min | fine | last 1 |
| 5 min | fine | last 5 |
| 15 min | fine | last 15 |
Entry size: ~30 B website + ~15 B prefix + ~50 B URI + 3 B status + 8 B count +
overhead ≈ **~186 bytes per entry**.
| Structure                 | Entries      | Size        |
|---------------------------|--------------|-------------|
| Live map (capped)         | 100 000      | ~19 MB      |
| Fine ring (60 × 1-min)    | 60 × 50 000  | ~558 MB     |
| Coarse ring (288 × 5-min) | 288 × 5 000  | ~268 MB     |
| **Total**                 |              | **~845 MB** |
The live map is **hard-capped at 100 K entries**. Once full, only updates to existing keys are
accepted; new keys are silently dropped.
```
message TopNRequest { Filter filter = 1; GroupBy group_by = 2; int32 n = 3; Window window = 4; }
message TopNEntry { string label = 1; int64 count = 2; }
message TopNResponse { repeated TopNEntry entries = 1; string source = 2; }
// Trend: one total count per minute (or 5-min) bucket, for sparklines
message TrendRequest { Filter filter = 1; Window window = 4; }
message TrendPoint { int64 timestamp_unix = 1; int64 count = 2; }
message TrendResponse { repeated TrendPoint points = 1; string source = 2; }
// Streaming: collector pushes a fine snapshot after every minute rotation
message SnapshotRequest {}
message Snapshot {
string source = 1;
service LogtailService {
  rpc TopN(TopNRequest) returns (TopNResponse);
  rpc Trend(TrendRequest) returns (TrendResponse);
rpc StreamSnapshots(SnapshotRequest) returns (stream Snapshot);
}
// Both collector and aggregator implement LogtailService.
// The aggregator's StreamSnapshots re-streams the merged view.
```
## Program 1 — Collector
### tailer.go
- **`MultiTailer`**: one shared `fsnotify.Watcher` for all files regardless of count — avoids
the inotify instance limit when tailing hundreds of files.
- On `WRITE` event: read all new lines from that file's `bufio.Reader`.
- On `RENAME`/`REMOVE` (logrotate): drain old fd to EOF, close, start retry-open goroutine with
exponential backoff. Sends the new `*os.File` back via a channel to keep map access single-threaded.
- Emits `LogRecord` structs on a shared buffered channel (capacity 200 K — absorbs ~20 s of peak).
- Accepts paths via `--logs` (comma-separated or glob) and `--logs-file` (one path/glob per line).
### parser.go
- Parses the fixed **logtail** nginx log format — tab-separated, fixed field order, no quoting:
```
log_format logtail '$host\t$remote_addr\t$msec\t$request_method\t$request_uri\t$status\t$body_bytes_sent\t$request_time';
```
Example line:
```
www.example.com 1.2.3.4 1741954800.123 GET /api/v1/search 200 1452 0.043
```
Field positions (0-indexed):

| # | Field              | Used for         |
|---|--------------------|------------------|
| 0 | `$host`            | website          |
| 1 | `$remote_addr`     | client_prefix    |
| 2 | `$msec`            | (discarded)      |
| 3 | `$request_method`  | (discarded)      |
| 4 | `$request_uri`     | http_request_uri |
| 5 | `$status`          | http_response    |
| 6 | `$body_bytes_sent` | (discarded)      |
| 7 | `$request_time`    | (discarded)      |
- `strings.SplitN(line, "\t", 8)` — ~50 ns/line. No regex.
- `$request_uri`: query string discarded at first `?`.
- `$remote_addr`: truncated to /24 (IPv4) or /48 (IPv6); prefix lengths configurable via flags.
- Lines with fewer than 8 fields are silently skipped.
### store.go
- **Single aggregator goroutine** reads from the channel and updates the live map — no locking on
the hot path. At 10 K lines/s the goroutine uses <1% CPU.
- Live map: `map[Tuple4]int64`, hard-capped at 100 K entries (new keys dropped when full).
- Fine ring: `[60]Snapshot` circular array. Coarse ring: `[288]Snapshot` circular array.
Each Snapshot is `[]TopNEntry` sorted desc by count (already sorted, merge is a heap pass).
- **Minute ticker**: heap-selects top-50K entries, writes snapshot to fine ring, resets live map.
- Every 5 fine ticks: merge last 5 fine snapshots → top-5K → write to coarse ring.
- **TopN query**: RLock ring, sum bucket range, apply filter, group by dimension, heap-select top N.
- **Trend query**: per-bucket filtered sum, returns one `TrendPoint` per bucket.
- **Subscriber fan-out**: per-subscriber buffered channel; `Subscribe`/`Unsubscribe` for streaming.
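The heap-select used by both the minute ticker and the TopN path can be sketched as a single-pass top-K over a counts map. Plain string keys stand in for the 4-tuple here; this is an illustration of the O(N log K) selection, not the project's code:

```go
package main

import (
	"container/heap"
	"fmt"
)

// TopNEntry matches the proto message.
type TopNEntry struct {
	Label string
	Count int64
}

// entryHeap is a min-heap by Count, keeping the K largest entries seen.
type entryHeap []TopNEntry

func (h entryHeap) Len() int            { return len(h) }
func (h entryHeap) Less(i, j int) bool  { return h[i].Count < h[j].Count }
func (h entryHeap) Swap(i, j int)       { h[i], h[j] = h[j], h[i] }
func (h *entryHeap) Push(x interface{}) { *h = append(*h, x.(TopNEntry)) }
func (h *entryHeap) Pop() interface{} {
	old := *h
	x := old[len(old)-1]
	*h = old[:len(old)-1]
	return x
}

// topK heap-selects the K highest counts in one pass: O(N log K).
func topK(counts map[string]int64, k int) []TopNEntry {
	h := &entryHeap{}
	for label, c := range counts {
		if h.Len() < k {
			heap.Push(h, TopNEntry{label, c})
		} else if c > (*h)[0].Count {
			(*h)[0] = TopNEntry{label, c} // evict current minimum
			heap.Fix(h, 0)
		}
	}
	// Pop ascending, fill the result descending by count.
	out := make([]TopNEntry, h.Len())
	for i := len(out) - 1; i >= 0; i-- {
		out[i] = heap.Pop(h).(TopNEntry)
	}
	return out
}

func main() {
	m := map[string]int64{"a.com": 10, "b.com": 30, "c.com": 20, "d.com": 5}
	fmt.Println(topK(m, 2))
}
```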
### server.go
- gRPC server on configurable port (default `:9090`).
- `TopN` and `Trend`: unary, answered from the ring buffer under RLock.
- `StreamSnapshots`: registers a subscriber channel; loops `Recv` on it; 30 s keepalive ticker.
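The per-subscriber buffered-channel fan-out might look like the following sketch. `hub`, `Broadcast`, and the drop-on-full policy are illustrative assumptions (gRPC plumbing omitted), not the project's exact code:

```go
package main

import (
	"fmt"
	"sync"
)

// Snapshot stands in for the proto Snapshot message.
type Snapshot struct {
	Source string
	Count  int
}

// hub holds one buffered channel per streaming subscriber.
type hub struct {
	mu   sync.Mutex
	subs map[chan Snapshot]struct{}
}

func newHub() *hub { return &hub{subs: make(map[chan Snapshot]struct{})} }

func (h *hub) Subscribe() chan Snapshot {
	ch := make(chan Snapshot, 4) // buffer absorbs a briefly slow receiver
	h.mu.Lock()
	h.subs[ch] = struct{}{}
	h.mu.Unlock()
	return ch
}

func (h *hub) Unsubscribe(ch chan Snapshot) {
	h.mu.Lock()
	delete(h.subs, ch)
	h.mu.Unlock()
	close(ch)
}

// Broadcast delivers the minute snapshot to every subscriber, dropping
// it for full buffers rather than blocking the rotation goroutine.
func (h *hub) Broadcast(s Snapshot) {
	h.mu.Lock()
	defer h.mu.Unlock()
	for ch := range h.subs {
		select {
		case ch <- s:
		default: // slow consumer: drop rather than stall the store
		}
	}
}

func main() {
	h := newHub()
	ch := h.Subscribe()
	h.Broadcast(Snapshot{Source: "nginx1", Count: 42})
	fmt.Println(<-ch)
	h.Unsubscribe(ch)
}
```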
## Program 2 — Aggregator
### subscriber.go
- One goroutine per collector. Dials, calls `StreamSnapshots`, forwards each `Snapshot` to the
merger.
- Reconnects with exponential backoff (100 ms → doubles → cap 30 s).
- After 3 consecutive failures: calls `merger.Zero(addr)` to remove that collector's contribution
from the merged view (prevents stale counts accumulating during outages).
- Resets failure count on first successful `Recv`; logs recovery.
### merger.go
- Maintains one `map[Tuple4]int64` per collector (latest snapshot only — no ring buffer here,
the aggregator's cache serves that role).
- **Delta strategy**: on each new snapshot from collector X, subtract X's previous entries from
`merged`, add the new entries, store new map. O(snapshot_size) per update — not
O(N_collectors × snapshot_size).
- `Zero(addr)`: subtracts the collector's last-known contribution and deletes its entry — called
when a collector is marked degraded.
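A dependency-free sketch of the delta merge and `Zero`. `Key` stands in for `Tuple4` and the method names follow the bullets above; the real merger also handles locking:

```go
package main

import "fmt"

// Key stands in for Tuple4.
type Key string

// Merger holds the merged view plus each collector's last snapshot, so
// an update costs O(len(snapshot)) instead of re-summing all collectors.
type Merger struct {
	merged map[Key]int64
	last   map[string]map[Key]int64 // collector addr → its last snapshot
}

func NewMerger() *Merger {
	return &Merger{merged: map[Key]int64{}, last: map[string]map[Key]int64{}}
}

// Update applies the delta for one collector: subtract its previous
// contribution, add the new one, remember the new snapshot.
func (m *Merger) Update(addr string, snap map[Key]int64) {
	m.Zero(addr)
	for k, c := range snap {
		m.merged[k] += c
	}
	m.last[addr] = snap
}

// Zero removes a collector's contribution; called when it is marked
// degraded, so stale counts don't linger in the merged view.
func (m *Merger) Zero(addr string) {
	for k, c := range m.last[addr] {
		if m.merged[k] -= c; m.merged[k] == 0 {
			delete(m.merged, k)
		}
	}
	delete(m.last, addr)
}

func main() {
	m := NewMerger()
	m.Update("nginx1:9090", map[Key]int64{"a.com": 10})
	m.Update("nginx2:9090", map[Key]int64{"a.com": 5})
	m.Update("nginx1:9090", map[Key]int64{"a.com": 7}) // delta: -10, +7
	fmt.Println(m.merged["a.com"])
	m.Zero("nginx2:9090")
	fmt.Println(m.merged["a.com"])
}
```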
### cache.go
- Tracks per-collector entry counts for "busiest nginx" queries (answered by treating
  `source` as an additional group-by dimension).
- **Tick-based rotation** (1-min ticker, not snapshot-triggered): keeps the aggregator ring aligned
to the same 1-minute cadence as collectors regardless of how many collectors are connected.
- Same tiered ring structure as the collector store; populated from `merger.TopK()` each tick.
- `QueryTopN`, `QueryTrend`, `Subscribe`/`Unsubscribe` — identical interface to collector store.
### server.go
- Implements `LogtailService` backed by the cache (not live fan-out).
- `StreamSnapshots` re-streams merged fine snapshots; usable by a second-tier aggregator or
monitoring system.
## Program 3 — Frontend
### handler.go
- All filter state in the **URL query string**: `w` (window), `by` (group_by), `f_website`,
`f_prefix`, `f_uri`, `f_status`, `n`, `target`. No server-side session — URLs are shareable
and bookmarkable; multiple operators see independent views.
- `TopN` and `Trend` RPCs issued **concurrently** (both with a 5 s deadline); page renders with
whatever completes. Trend failure suppresses the sparkline without erroring the page.
- **Drilldown**: clicking a table row adds the current dimension's filter and advances `by` through
`website → prefix → uri → status → website` (cycles).
- **`raw=1`**: returns the TopN result as JSON — same URL, no CLI needed for scripting.
- **`target=` override**: per-request gRPC endpoint override for comparing sources.
- Error pages render at HTTP 502 with the window/group-by tabs still functional.
### sparkline.go
- `renderSparkline([]*pb.TrendPoint) template.HTML` — fixed `viewBox="0 0 300 60"` SVG,
Y-scaled to max count, rendered as `<polyline>`. Returns `""` for fewer than 2 points or
all-zero data.
### templates/
- `base.html`: outer shell, inline CSS (~40 lines), conditional `<meta http-equiv="refresh">`.
- `index.html`: window tabs, group-by tabs, filter breadcrumb with `×` remove links, sparkline,
TopN table with `<meter>` bars (% relative to rank-1), footer with source and refresh info.
- No external CSS, no web fonts, no JavaScript. Renders in w3m/lynx.
## Program 4 — CLI
A CLI (`cmd/cli/`) for shell-based debugging and scripted top-K queries. Talks to any collector
or aggregator via gRPC. Outputs a human-readable table by default, or JSON with `--json`.
### Subcommands
```
logtail-cli topn [flags] ranked label → count table (exits after one response)
logtail-cli trend [flags] per-bucket time series (exits after one response)
logtail-cli stream [flags] live snapshot feed (runs until Ctrl-C, auto-reconnects)
```
### Flags
**Shared** (all subcommands):
| Flag        | Default          | Description                                      |
|-------------|------------------|--------------------------------------------------|
| `--target`  | `localhost:9090` | Comma-separated `host:port` list; fan-out to all |
| `--json`    | false            | Emit newline-delimited JSON instead of a table   |
| `--website` | —                | Filter: website                                  |
| `--prefix`  | —                | Filter: client prefix                            |
| `--uri`     | —                | Filter: request URI                              |
| `--status`  | —                | Filter: HTTP status code                         |
### Output format
**`topn`** — single JSON object, exits after one response:
```json
{
  "target": "agg:9091", "window": "5m", "group_by": "prefix",
  "filter": {"status": 429, "website": "www.example.com"},
  "queried_at": "2026-03-14T12:00:00Z",
  "entries": [
    {"rank": 1, "label": "1.2.3.0/24", "count": 8471},
    {"rank": 2, "label": "5.6.7.0/24", "count": 3201}
  ]
}
```
**`trend`** — single JSON object, exits after one response:
```json
{
  "target": "agg:9091", "window": "24h", "filter": {"status": 503},
  "queried_at": "2026-03-14T12:00:00Z",
  "points": [
    {"time": "2026-03-14T11:00:00Z", "count": 45},
    {"time": "2026-03-14T11:05:00Z", "count": 120}
  ]
}
```
**`stream`** — NDJSON (one JSON object per line, unbounded), suitable for `| jq -c 'select(...)'`:
```json
{"source": "nginx3:9090", "bucket_time": "2026-03-14T12:01:00Z", "entry_count": 42318, "top5": [{"label": "www.example.com", "count": 18000}, ...]}
```

### Multi-target fan-out

`--target` accepts a comma-separated list. All targets are queried concurrently; results are
printed in order with a per-target header. Single-target output omits the header for clean
pipe-to-`jq` use.
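The fan-out pattern (query all targets concurrently, print in a stable order with per-target headers) can be sketched as follows; `fanOut` and the stub `query` function are illustrative, not the project's API:

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"sync"
)

// fanOut queries every target concurrently and returns results keyed by
// target, so the caller can print in a stable order.
func fanOut(targets []string, query func(target string) string) map[string]string {
	var (
		mu  sync.Mutex
		wg  sync.WaitGroup
		out = make(map[string]string, len(targets))
	)
	for _, t := range targets {
		wg.Add(1)
		go func(t string) {
			defer wg.Done()
			res := query(t) // stand-in for the gRPC call
			mu.Lock()
			out[t] = res
			mu.Unlock()
		}(t)
	}
	wg.Wait()
	return out
}

func main() {
	targets := strings.Split("nginx1:9090,nginx2:9090", ",")
	out := fanOut(targets, func(t string) string { return "ok from " + t })
	sort.Strings(targets)
	for _, t := range targets {
		if len(targets) > 1 {
			fmt.Printf("== %s ==\n", t) // header omitted for a single target
		}
		fmt.Println(out[t])
	}
}
```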
### Output

Default: human-readable table with space-separated thousands (`18 432`).
`--json`: one JSON object per target (NDJSON for `stream`).

### Example usage

```bash
# Who is hammering us with 429s right now?
logtail-cli topn --target agg:9091 --window 1m --by prefix --status 429 --n 20 --json | jq '.entries[]'

# Which website has the most 503s over the last 24h?
logtail-cli topn --target agg:9091 --window 24h --by website --status 503

# Trend of all traffic to one site over 6h (for a quick graph)
logtail-cli trend --target agg:9091 --window 6h --website api.example.com --json | jq '.points[] | [.time, .count]'

# Watch live snapshots from one collector, filter for high-volume buckets
logtail-cli stream --target nginx3:9090 --json | jq -c 'select(.entry_count > 10000)'
```
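The space-separated thousands formatting used in the table output is simple to sketch; `fmtCount` here is illustrative, not necessarily the project's implementation:

```go
package main

import "fmt"

// fmtCount renders a count with space-separated thousands: 18432 → "18 432".
func fmtCount(n int64) string {
	if n < 0 {
		return "-" + fmtCount(-n)
	}
	s := fmt.Sprintf("%d", n)
	var out []byte
	for i := 0; i < len(s); i++ {
		if i > 0 && (len(s)-i)%3 == 0 {
			out = append(out, ' ') // group boundary every 3 digits from the right
		}
		out = append(out, s[i])
	}
	return string(out)
}

func main() {
	fmt.Println(fmtCount(18432), fmtCount(845), fmtCount(1400000))
}
```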
### Implementation notes
- Uses the standard `flag` package with manual subcommand dispatch; no external CLI framework.
- Shared helpers (`flags.go`, `client.go`, `format.go`) keep the per-subcommand files small.
- `stream` reconnects automatically on error (5 s backoff). All other subcommands exit
  immediately with a non-zero code on gRPC error, so they compose cleanly in shell scripts.
## Key Design Decisions
| Decision | Rationale |
|----------|-----------|
| Single aggregator goroutine in collector | Eliminates all map lock contention on the 10 K/s hot path |
| Hard cap live map at 100 K entries | Bounds memory regardless of DDoS cardinality explosion |
| Ring buffer of sorted snapshots (not raw maps) | TopN queries avoid re-sorting; merge is a single heap pass |
| Push-based streaming (collector → aggregator) | Aggregator cache always fresh; query latency is cache-read only |
| Delta merge in aggregator | O(snapshot_size) per update, not O(N_collectors × size) |
| Tick-based cache rotation in aggregator | Ring stays on the same 1-min cadence regardless of collector count |
| Degraded collector zeroing | Stale counts from failed collectors don't accumulate in the merged view |
| Same `LogtailService` for collector and aggregator | CLI and frontend work with either; no special-casing |
| `internal/store` shared package | ~200 lines of ring-buffer logic shared between collector and aggregator |
| Filter state in URL, not session cookie | Multiple concurrent operators; shareable/bookmarkable URLs |
| Query strings stripped at ingest | Major cardinality reduction; prevents URI explosion under attack |
| No persistent storage | Simplicity; acceptable for ops dashboards (restart = lose history) |
| Trusted internal network, no TLS | Reduces operational complexity; add a TLS proxy if needed later |
| Server-side SVG sparklines, meta-refresh | Zero JS dependencies; works in terminal browsers and curl |
| CLI default: human-readable table | Operator-friendly by default; `--json` opt-in for scripting |
| CLI multi-target fan-out | Compare a collector vs. aggregator, or two collectors, in one command |
| CLI uses stdlib `flag`, no framework | Four subcommands don't justify a dependency |