# Collector v0 — Implementation Plan ✓ COMPLETE

Module path: `git.ipng.ch/ipng/nginx-logtail`

**Scope:** A working collector that tails files, aggregates into memory, and serves `TopN`, `Trend`, and `StreamSnapshots` over gRPC. Full vertical slice, no optimisation passes yet.

---

## Step 1 — Repo scaffolding

- `go mod init git.ipng.ch/ipng/nginx-logtail`
- `.gitignore`
- Install deps: `google.golang.org/grpc`, `google.golang.org/protobuf`, `github.com/fsnotify/fsnotify`

## Step 2 — Proto (`proto/logtail.proto`)

Write the full proto file as specified in README.md DESIGN § Protobuf API. Generate Go stubs with `protoc`. Commit generated files. This defines the contract everything else builds on.

## Step 3 — Parser (`cmd/collector/parser.go`)

- `LogRecord` struct: `Website`, `ClientPrefix`, `URI`, `Status string`
- `ParseLine(line string) (LogRecord, bool)` — `SplitN` on tab, discard query string at `?`, return `false` for lines with fewer than 8 fields
- `TruncateIP(addr string, v4bits, v6bits int) string` — handle IPv4 and IPv6
- Unit-tested with table-driven tests: normal line, short line, IPv6, query string stripping, /24 and /48 truncation

## Step 4 — Store (`cmd/collector/store.go`)

Implement in order, each piece testable independently:

1. **`Tuple4` and live map** — `map[Tuple4]int64`, cap enforcement at 100K, `Ingest(r LogRecord)`
2. **Fine ring buffer** — `[60]Snapshot` circular array, `rotate()` heap-selects top-50K from live map, appends to ring, resets live map
3. **Coarse ring buffer** — `[288]Snapshot`, populated every 5 fine rotations by merging the last 5 fine snapshots into a top-5K snapshot
4. **`QueryTopN(filter, groupBy, n, window)`** — RLock, sum bucket range, group by dimension, apply filter, heap-select top N
5. **`QueryTrend(filter, window)`** — per-bucket count sum, returns one point per bucket
6. **`Store.Run(ch <-chan LogRecord)`** — single goroutine: read channel → `Ingest`, minute ticker → `rotate()`
7. **Snapshot broadcast** — per-subscriber buffered channel fan-out; `Subscribe() <-chan Snapshot` / `Unsubscribe(ch)`

## Step 5 — Tailer (`cmd/collector/tailer.go`)

- `Tailer` struct: path, fsnotify watcher, output channel
- On start: open file, seek to EOF, register fsnotify watch
- On `fsnotify.Write`: `bufio.Scanner` reads all new lines, sends `LogRecord` to channel
- On `fsnotify.Rename` / `Remove`: drain to EOF, close fd, retry open with 100 ms backoff (up to 5 s), resume from position 0 — no lines lost between drain and reopen
- `Tailer.Run(ctx context.Context)` — blocks until context cancelled

## Step 6 — gRPC server (`cmd/collector/server.go`)

- `Server` wraps `*Store`, implements `LogtailServiceServer`
- `TopN`: `store.QueryTopN` → marshal to proto response
- `Trend`: `store.QueryTrend` → marshal to proto response
- `StreamSnapshots`: `store.Subscribe()`, loop sending snapshots until client disconnects or context done, then `store.Unsubscribe(ch)`

## Step 7 — Main (`cmd/collector/main.go`)

Flags:

- `--listen` default `:9090`
- `--logs` comma-separated log file paths
- `--source` name for this collector instance (default: hostname)
- `--v4prefix` default `24`
- `--v6prefix` default `48`

Wire-up: create channel → start `store.Run` goroutine → start one `Tailer` goroutine per log path → start gRPC server → `signal.NotifyContext` for clean shutdown on SIGINT/SIGTERM.

## Step 8 — Smoke test

- Generate fake log lines at 10K/s (small Go script or shell one-liner)
- Run collector against them
- Use `grpcurl` to call `TopN` and verify results
- Check `runtime.MemStats` to confirm memory stays well under 1 GB

---

## Deferred (not in v0)

- `cmd/cli`, `cmd/aggregator`, `cmd/frontend`
- ClickHouse export
- TLS / auth
- Prometheus metrics endpoint

---

## Implementation notes

### Deviation from plan: MultiTailer

Step 5 planned one `Tailer` struct per file. During implementation this was changed to a single `MultiTailer` with one shared `fsnotify.Watcher`.
Reason: one watcher per file creates one inotify instance per file; the kernel default limit is 128 instances per user, which would be hit with 100s of log files. The `MultiTailer` uses a single instance and routes events by path via a `map[string]*fileState`.

### Deviation from plan: IPv6 /48 semantics

The design doc said "truncate to /48". `/48` keeps the first three full 16-bit groups intact (e.g. `2001:db8:cafe::1` → `2001:db8:cafe::/48`). An early test expected `2001:db8:ca00::/48` (truncating mid-group), which was wrong. The code is correct; the test was fixed.

---

## Test results

Run with: `go test ./cmd/collector/ -v -count=1 -timeout 120s`

| Test | What it covers |
|--------------------------------|----------------------------------------------------|
| `TestParseLine` (7 cases) | Tab parsing, query string stripping, bad lines |
| `TestTruncateIP` | IPv4 /24 and IPv6 /48 masking |
| `TestIngestAndRotate` | Live map → fine ring rotation |
| `TestLiveMapCap` | Hard cap at 100 K entries, no panic beyond cap |
| `TestQueryTopN` | Ranked results from ring buffer |
| `TestQueryTopNWithFilter` | Filter by HTTP status code |
| `TestQueryTrend` | Per-bucket counts, oldest-first ordering |
| `TestCoarseRingPopulated` | 5 fine ticks → 1 coarse bucket, count aggregation |
| `TestSubscribeBroadcast` | Fan-out channel delivery after rotation |
| `TestTopKOrdering` | Heap select returns correct top-K descending |
| `TestMultiTailerReadsLines` | Live file write → LogRecord received on channel |
| `TestMultiTailerMultipleFiles` | 5 files, one watcher, all lines received |
| `TestMultiTailerLogRotation` | RENAME → drain → retry → new file tailed correctly |
| `TestExpandGlobs` | Glob pattern expands to matching files only |
| `TestExpandGlobsDeduplication` | Same file via path + glob deduplicated to one |
| `TestMemoryBudget` | Full ring fill stays within 1 GB heap |
| `TestGRPCEndToEnd` | Real gRPC server: TopN, filtered TopN, Trend, StreamSnapshots |

**Total: 17 tests, all passing.**
---

## Benchmark results

Run with: `go test ./cmd/collector/ -bench=. -benchtime=3s`

Hardware: 12th Gen Intel Core i7-12700T

| Benchmark | ns/op | throughput | headroom vs 10K/s |
|----------------------|-------|-----------------|-------------------|
| `BenchmarkParseLine` | 418 | ~2.4M lines/s | 240× |
| `BenchmarkIngest` | 152 | ~6.5M records/s | 650× |

Both the parser and the store ingestion goroutine have several hundred times more capacity than the 10 000 lines/second peak requirement. The bottleneck at scale will be fsnotify event delivery and kernel I/O, not the Go code.