# SPECIFICATION

This project contains three programs:

1. **Collector** — tails any number of nginx logfiles and keeps a data structure of `{website, client_prefix, http_request_uri, http_response}` across all logfiles in memory. It is queryable and can give top-N clients by website and by http_request; in other words, I can see "who is causing the most HTTP 429" or "what is the busiest website". This program pre-aggregates the logs into a queryable structure. It runs on any number (10 or so) of nginx machines in a cluster. There is no UI here, only a gRPC interface.

2. **Aggregator** — queries the collectors and shows global stats and trending information. It must show globally aggregated information from the collectors: "what is the busiest nginx" in addition to "what is the busiest website" or "which client_prefix or http_request_uri is causing the most HTTP 503s". It runs on a central machine and can show trending information, useful for DDoS detection. The aggregator is an RPC client of the collectors, and itself presents a gRPC interface.

3. **Frontend** — an HTTP companion to the aggregator that can query either a collector or the aggregator and answer user queries in a drilldown fashion, e.g. "restrict to http_response=429", then "restrict to website=www.example.com", and so on. This is an interactive rollup UI that helps operators see which websites are performing well and which are performing poorly (e.g. excessive requests, excessive HTTP response errors, DDoS).

Programs are written in Go with a modern, responsive interactive interface.
---

# DESIGN

## Directory Layout

```
nginx-logtail/
├── proto/
│   └── logtail.proto        # shared protobuf definitions
└── cmd/
    ├── collector/
    │   ├── main.go
    │   ├── tailer.go        # tail multiple log files via fsnotify, handle logrotate
    │   ├── parser.go        # tab-separated logtail log_format parser
    │   ├── store.go         # bounded top-K in-memory store + tiered ring buffers
    │   └── server.go        # gRPC server with server-streaming StreamSnapshots
    ├── aggregator/
    │   ├── main.go
    │   ├── subscriber.go    # opens streaming RPC to each collector, merges into cache
    │   ├── merger.go        # merge/sum TopN entries across sources
    │   ├── cache.go         # merged snapshot + tiered ring buffer served to frontend
    │   └── server.go        # gRPC server (same surface as collector)
    ├── frontend/
    │   ├── main.go
    │   ├── handler.go       # HTTP handlers, filter state in URL query string
    │   ├── client.go        # gRPC client to aggregator (or collector)
    │   └── templates/       # server-rendered HTML + inline SVG sparklines
    └── cli/
        └── main.go          # topn / trend / stream subcommands, JSON output
```

## Data Model

The core unit is a **count keyed by four dimensions**:

| Field | Description | Example |
|--------------------|--------------------------------------------------|-------------------|
| `website` | nginx `$host` | `www.example.com` |
| `client_prefix` | client IP truncated to /24 (IPv4) or /48 (IPv6) | `1.2.3.0/24` |
| `http_request_uri` | `$request_uri` path only — query string stripped | `/api/v1/search` |
| `http_response` | HTTP status code | `429` |

## Time Windows & Tiered Ring Buffers

Two ring buffers at different resolutions cover all query windows up to 24 hours:

| Tier | Bucket size | Buckets | Top-K/bucket | Covers | Roll-up trigger |
|--------|-------------|---------|--------------|--------|--------------------|
| Fine | 1 min | 60 | 50 000 | 1 h | every minute |
| Coarse | 5 min | 288 | 5 000 | 24 h | every 5 fine ticks |

Supported query windows and which tier they read from:

| Window | Tier | Buckets summed |
|--------|--------|----------------|
| 1 min | fine | last 1 |
| 5 min | fine | last 5 |
| 15 min | fine | last 15 |
| 60 min | fine | all 60 |
| 6 h | coarse | last 72 |
| 24 h | coarse | all 288 |

Every minute: snapshot the live map → top-50K → append to the fine ring, reset the live map.
Every 5 minutes: merge the last 5 fine snapshots → top-5K → append to the coarse ring.

## Memory Budget (Collector, target ≤ 1 GB)

Entry size: ~30 B website + ~15 B prefix + ~50 B URI + 3 B status + 8 B count + ~80 B Go map overhead ≈ **~186 bytes per entry**.

| Structure | Entries | Size |
|---------------------------|--------------|-------------|
| Live map (capped) | 100 000 | ~19 MB |
| Fine ring (60 × 1-min) | 60 × 50 000 | ~558 MB |
| Coarse ring (288 × 5-min) | 288 × 5 000 | ~268 MB |
| **Total** | | **~845 MB** |

The live map is **hard-capped at 100 K entries**. Once full, only updates to existing keys are accepted; new keys are dropped until the next rotation resets the map. This keeps memory bounded regardless of attack cardinality.

## Future Work — ClickHouse Export (post-MVP)

> **Do not implement until the end-to-end MVP is running.**

The aggregator will optionally write 1-minute pre-aggregated rows to ClickHouse for 7d/30d historical views. Schema sketch:

```sql
CREATE TABLE logtail (
    ts            DateTime,
    website       LowCardinality(String),
    client_prefix String,
    request_uri   LowCardinality(String),
    status        UInt16,
    count         UInt64
) ENGINE = SummingMergeTree(count)
PARTITION BY toYYYYMMDD(ts)
ORDER BY (ts, website, status, client_prefix, request_uri);
```

The frontend routes `window=7d|30d` queries to ClickHouse; all shorter windows continue to use the in-memory cache. Kafka is not needed — the aggregator writes directly. This is purely additive and does not change any existing interface.
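A minimal sketch of the four-dimension key and one tier of the ring buffer described above — the type and method names (`Tuple4`, `Ring`, `Append`, `Last`) are illustrative, not part of the spec:

```go
package main

import "fmt"

// Tuple4 is the four-dimension count key; a comparable struct works as a Go map key.
type Tuple4 struct {
	Website string
	Prefix  string
	URI     string
	Status  int
}

// TopNEntry mirrors the protobuf message: a label and its count.
type TopNEntry struct {
	Label string
	Count int64
}

// Ring is a fixed-size circular buffer of per-bucket snapshots.
type Ring struct {
	slots [][]TopNEntry
	head  int // index of the most recent bucket
	n     int // how many slots have been filled so far
}

func NewRing(size int) *Ring { return &Ring{slots: make([][]TopNEntry, size)} }

// Append overwrites the oldest slot with a new bucket snapshot.
func (r *Ring) Append(snap []TopNEntry) {
	r.head = (r.head + 1) % len(r.slots)
	r.slots[r.head] = snap
	if r.n < len(r.slots) {
		r.n++
	}
}

// Last returns up to k most recent snapshots, newest first —
// this is the "sum the last N buckets" read path for window queries.
func (r *Ring) Last(k int) [][]TopNEntry {
	if k > r.n {
		k = r.n
	}
	out := make([][]TopNEntry, 0, k)
	for i := 0; i < k; i++ {
		out = append(out, r.slots[(r.head-i+len(r.slots))%len(r.slots)])
	}
	return out
}

func main() {
	fine := NewRing(60) // 60 × 1-minute buckets = 1 h
	fine.Append([]TopNEntry{{"www.example.com", 120}})
	fine.Append([]TopNEntry{{"www.example.com", 95}})
	fmt.Println(len(fine.Last(5))) // only 2 buckets filled so far
}
```

A "5 min" query would sum `Last(5)` of the fine ring; a "6 h" query would sum `Last(72)` of a coarse `NewRing(288)`.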
## Protobuf API (`proto/logtail.proto`)

```protobuf
message Filter {
  optional string website = 1;
  optional string client_prefix = 2;
  optional string http_request_uri = 3;
  optional int32 http_response = 4;
}

enum GroupBy { WEBSITE = 0; CLIENT_PREFIX = 1; REQUEST_URI = 2; HTTP_RESPONSE = 3; }
enum Window  { W1M = 0; W5M = 1; W15M = 2; W60M = 3; W6H = 4; W24H = 5; }

message TopNRequest {
  Filter filter = 1;
  GroupBy group_by = 2;
  int32 n = 3;
  Window window = 4;
}
message TopNEntry { string label = 1; int64 count = 2; }
message TopNResponse {
  repeated TopNEntry entries = 1;
  string source = 2;
}

// Trend: one total count per minute bucket, for sparklines
message TrendRequest {
  Filter filter = 1;
  Window window = 4;
}
message TrendPoint { int64 timestamp_unix = 1; int64 count = 2; }
message TrendResponse { repeated TrendPoint points = 1; }

// Streaming: collector pushes a snapshot after every minute rotation
message SnapshotRequest {}
message Snapshot {
  string source = 1;
  int64 timestamp = 2;
  repeated TopNEntry entries = 3;  // full top-50K for this bucket
}

service LogtailService {
  rpc TopN(TopNRequest) returns (TopNResponse);
  rpc Trend(TrendRequest) returns (TrendResponse);
  rpc StreamSnapshots(SnapshotRequest) returns (stream Snapshot);
}

// Both collector and aggregator implement LogtailService.
// Aggregator's StreamSnapshots fans out to all collectors and merges.
```

## Program 1 — Collector

### tailer.go

- One goroutine per log file. Opens the file, seeks to EOF.
- Uses **fsnotify** (inotify on Linux) to detect writes. On a `WRITE` event: read all new lines.
- On a `RENAME`/`REMOVE` event (logrotate): drain to EOF of the old fd, then **re-open** the original path (with retry backoff) and resume from position 0. No lines are lost between drain and reopen.
- Emits `LogRecord` structs on a shared buffered channel (size 200 K — absorbs ~20 s of peak load).
### parser.go

- Parses the fixed **logtail** nginx log format — tab-separated, fixed field order, no quoting:

```nginx
log_format logtail '$host\t$remote_addr\t$msec\t$request_method\t$request_uri\t$status\t$body_bytes_sent\t$request_time';
```

Example line:

```
www.example.com 1.2.3.4 1741954800.123 GET /api/v1/search 200 1452 0.043
```

Field positions (0-indexed):

| # | Field | Used for |
|---|--------------------|------------------|
| 0 | `$host` | website |
| 1 | `$remote_addr` | client_prefix |
| 2 | `$msec` | (discarded) |
| 3 | `$request_method` | (discarded) |
| 4 | `$request_uri` | http_request_uri |
| 5 | `$status` | http_response |
| 6 | `$body_bytes_sent` | (discarded) |
| 7 | `$request_time` | (discarded) |

- At runtime: `strings.SplitN(line, "\t", 8)` — a single call, ~50 ns/line. No regex, no state machine.
- `$request_uri`: query string discarded at the first `?`.
- `$remote_addr`: truncated to /24 (IPv4) or /48 (IPv6); prefix lengths configurable.
- Lines with fewer than 8 fields are silently skipped (malformed / truncated write).

### store.go

- A **single aggregator goroutine** reads from the channel and updates the live map — no locking on the hot path. At 10 K lines/s the goroutine uses <1% CPU.
- Live map: `map[Tuple4]int64`, hard-capped at 100 K entries (new keys dropped when full).
- **Minute ticker**: a goroutine heap-selects the top-50K entries from the live map, writes the snapshot into the fine ring buffer slot, clears the live map, and advances the fine ring head.
- Every 5 fine ticks: merge the last 5 fine snapshots → heap-select top-5K → write to the coarse ring.
- Fine ring: `[60]Snapshot` circular array. Coarse ring: `[288]Snapshot` circular array. Each Snapshot is a `[]TopNEntry` sorted descending by count (already sorted, so a merge is a single heap pass).
- **TopN query path**: RLock the relevant ring, sum the bucket range, group by dimension, apply the filter, heap-select the top N. Worst case: 288 × 5K = 1.4 M iterations — completes in <20 ms.
- **Trend query path**: for each bucket in range, sum the counts of entries matching the filter and emit one `TrendPoint`. O(buckets × K), but the result is tiny (max 288 points).

### server.go

- gRPC server on a configurable port (default `:9090`).
- `TopN` and `Trend`: read-only calls into the store, answered directly.
- `StreamSnapshots`: on each minute rotation the store signals a broadcast channel; the streaming handler wakes, reads the latest snapshot from the ring, and sends it to all connected aggregators. Uses `sync.Cond` or fan-out via per-subscriber buffered channels.

## Program 2 — Aggregator

### subscriber.go

- On startup: dials each collector, calls `StreamSnapshots`, and receives `Snapshot` messages.
- Each incoming snapshot is handed to **merger.go**. Reconnects with exponential backoff on stream error. Marks a collector as degraded after 3 failed reconnects; clears the flag on success.

### merger.go

- Maintains one `map[Tuple4]int64` per collector (latest snapshot only — no ring buffer here; the aggregator's cache serves that role).
- On each new snapshot from a collector: replace that collector's map, then rebuild the merged view by summing across all collector maps. Store the merged result into cache.go's ring buffer.

### cache.go

- Same ring-buffer structure as the collector store (60 slots), populated by the merger.
- `TopN` and `Trend` queries are answered from this cache — no live fan-out needed at query time, satisfying the 250 ms SLA with headroom.
- Also tracks per-collector entry counts for "busiest nginx" queries (answered by treating `source` as an additional group-by dimension).

### server.go

- Implements the same `LogtailService` proto as the collector.
- `StreamSnapshots` on the aggregator re-streams merged snapshots to any downstream consumer (e.g. a second-tier aggregator, or monitoring).
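The merger.go rebuild step — replace one collector's map, then sum across all collectors — can be sketched as follows; `Tuple4`, `Merger`, and `Update` are illustrative names, not the actual implementation:

```go
package main

import "fmt"

// Tuple4 is the four-dimension count key.
type Tuple4 struct {
	Website, Prefix, URI string
	Status               int
}

// Merger keeps the latest snapshot per collector and rebuilds the merged
// view on every update, as described for merger.go above.
type Merger struct {
	perCollector map[string]map[Tuple4]int64 // source -> latest snapshot
}

func NewMerger() *Merger {
	return &Merger{perCollector: make(map[string]map[Tuple4]int64)}
}

// Update replaces one collector's snapshot and returns the merged view.
// Rebuilding from scratch is O(total entries): at 10 collectors × 50 K
// entries that is ~500 K map additions per minute — cheap.
func (m *Merger) Update(source string, snap map[Tuple4]int64) map[Tuple4]int64 {
	m.perCollector[source] = snap
	merged := make(map[Tuple4]int64)
	for _, s := range m.perCollector {
		for k, v := range s {
			merged[k] += v
		}
	}
	return merged
}

func main() {
	m := NewMerger()
	key := Tuple4{"www.example.com", "1.2.3.0/24", "/api/v1/search", 429}
	m.Update("nginx1:9090", map[Tuple4]int64{key: 100})
	merged := m.Update("nginx2:9090", map[Tuple4]int64{key: 50})
	fmt.Println(merged[key]) // 150
}
```

Replacing (rather than adding to) a collector's map is what keeps re-delivered snapshots from double-counting.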
## Program 3 — Frontend

### handler.go

- Filter state lives entirely in the **URL query string** (no server-side session needed; multiple operators see independent views without shared state). Parameters: `w` (window), `by` (group_by), `f_website`, `f_prefix`, `f_uri`, `f_status`.
- Main page: renders a ranked table. Clicking a row appends that dimension to the URL filter and redirects. A breadcrumb shows the active filters; each token is a link that removes it.
- **Auto-refresh**: a `<meta http-equiv="refresh">` tag — simple, reliable, no JS required.
- A `?raw=1` flag returns JSON for scripting/curl use.

### templates/

- Base layout with filter breadcrumb and window selector tabs (1m / 5m / 15m / 60m / 6h / 24h).
- Table partial: columns are label, count, % of total, and an inline bar scaled to the count.
- Sparkline partial: inline SVG polyline built from `TrendResponse.points` — 60 points, scaled to the bucket's max, rendered server-side. No JS, no external assets.

## Program 4 — CLI

A single binary (`cmd/cli/main.go`) for shell-based debugging and programmatic top-K queries. Talks to any collector or aggregator via gRPC. All output is JSON.
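The single-binary, stdlib-only shape can be sketched with one `flag.FlagSet` per subcommand and a manual switch; the RPC calls are stubbed and all names here are illustrative:

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// run dispatches the three subcommands manually — no CLI framework.
// It returns the process exit code so errors compose in shell scripts.
func run(args []string) int {
	if len(args) < 1 {
		fmt.Fprintln(os.Stderr, "usage: cli topn|trend|stream [flags]")
		return 2
	}
	switch args[0] {
	case "topn":
		fs := flag.NewFlagSet("topn", flag.ContinueOnError)
		target := fs.String("target", "localhost:9090", "gRPC address")
		n := fs.Int("n", 10, "number of top entries")
		if err := fs.Parse(args[1:]); err != nil {
			return 2
		}
		// Real code would dial *target, call TopN, and marshal the
		// response; here we just emit a stub JSON object.
		fmt.Printf(`{"target":%q,"n":%d}`+"\n", *target, *n)
		return 0
	case "trend", "stream":
		// Same shape: own FlagSet, parse, call the RPC, print JSON.
		fmt.Println("{}")
		return 0
	default:
		fmt.Fprintf(os.Stderr, "unknown subcommand %q\n", args[0])
		return 2
	}
}

func main() {
	os.Exit(run(os.Args[1:]))
}
```

Keeping dispatch in a `run(args) int` helper (rather than in `main` directly) makes the exit-code behaviour unit-testable.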
### Subcommands

```
cli topn   --target HOST:PORT [filter flags] [--by DIM] [--window W] [--n N] [--pretty]
cli trend  --target HOST:PORT [filter flags] [--window W] [--pretty]
cli stream --target HOST:PORT [--pretty]
```

### Flags

| Flag | Default | Description |
|------------|------------------|----------------------------------------------------------|
| `--target` | `localhost:9090` | gRPC address of collector or aggregator |
| `--by` | `website` | Group-by dimension: `website`, `prefix`, `uri`, `status` |
| `--window` | `5m` | Time window: `1m` `5m` `15m` `60m` `6h` `24h` |
| `--n` | `10` | Number of top entries to return |
| `--website`| — | Filter: restrict to this website |
| `--prefix` | — | Filter: restrict to this client prefix |
| `--uri` | — | Filter: restrict to this request URI |
| `--status` | — | Filter: restrict to this HTTP status code |
| `--pretty` | false | Indent JSON output |

### Output format

**`topn`** — single JSON object, exits after one response:

```json
{
  "target": "agg:9091",
  "window": "5m",
  "group_by": "prefix",
  "filter": {"status": 429, "website": "www.example.com"},
  "queried_at": "2026-03-14T12:00:00Z",
  "entries": [
    {"rank": 1, "label": "1.2.3.0/24", "count": 8471},
    {"rank": 2, "label": "5.6.7.0/24", "count": 3201}
  ]
}
```

**`trend`** — single JSON object, exits after one response:

```json
{
  "target": "agg:9091",
  "window": "24h",
  "filter": {"status": 503},
  "queried_at": "2026-03-14T12:00:00Z",
  "points": [
    {"time": "2026-03-14T11:00:00Z", "count": 45},
    {"time": "2026-03-14T11:05:00Z", "count": 120}
  ]
}
```

**`stream`** — NDJSON (one JSON object per line, unbounded), suitable for `| jq -c 'select(...)'`:

```json
{"source": "nginx3:9090", "bucket_time": "2026-03-14T12:01:00Z", "entry_count": 42318, "top5": [{"label": "www.example.com", "count": 18000}, ...]}
```

### Example usage

```bash
# Who is hammering us with 429s right now?
cli topn --target agg:9091 --window 1m --by prefix --status 429 --n 20 | jq '.entries[]'

# Which website has the most 503s over the last 24h?
cli topn --target agg:9091 --window 24h --by website --status 503

# Trend of all traffic to one site over 6h (for a quick graph)
cli trend --target agg:9091 --window 6h --website api.example.com | jq '.points[] | [.time, .count]'

# Watch live snapshots from one collector, filter for high-volume buckets
cli stream --target nginx3:9090 | jq -c 'select(.entry_count > 10000)'
```

### Implementation notes

- A single `main.go` using the standard `flag` package with manual subcommand dispatch — no external CLI framework needed for three subcommands.
- Shares no code with the other binaries; duplicates the gRPC client setup locally (it's three lines). Avoids creating a shared internal package for something this small.
- Non-zero exit code on any gRPC error, so it composes cleanly in shell scripts.

## Key Design Decisions

| Decision | Rationale |
|----------|-----------|
| Single aggregator goroutine in collector | Eliminates all map lock contention on the 10 K/s hot path |
| Hard cap live map at 100 K entries | Bounds memory regardless of DDoS cardinality explosion |
| Ring buffer of sorted snapshots (not raw maps) | TopN queries avoid re-sorting; merge is a single heap pass |
| Push-based streaming (collector → aggregator) | Aggregator cache is always fresh; query latency is cache-read only |
| Same `LogtailService` for collector and aggregator | Frontend works with either; useful for single-box and debugging |
| Filter state in URL, not session cookie | Supports multiple concurrent operators; shareable/bookmarkable URLs |
| Query strings stripped at ingest | Major cardinality reduction; prevents URI explosion under attack |
| No persistent storage | Simplicity; acceptable for ops dashboards (restart = lose history) |
| Trusted internal network, no TLS | Reduces operational complexity; add a TLS proxy if needed later |
| Server-side SVG sparklines, meta-refresh | Zero JS dependencies; works in terminal browsers and curl |
| CLI outputs JSON, NDJSON for streaming | Composable with jq; non-zero exit on error for shell scripts |
| CLI uses stdlib `flag`, no framework | Three subcommands don't justify a dependency; single file |