Files
nginx-logtail/PLAN_CLI.md
2026-03-14 20:30:23 +01:00

294 lines
9.5 KiB
Markdown

# CLI v0 — Implementation Plan
Module path: `git.ipng.ch/ipng/nginx-logtail`
**Scope:** A shell-facing debug tool that can query any number of collectors or aggregators
(they share the same `LogtailService` gRPC interface) and print results in a human-readable
table or JSON. Supports all three RPCs: `TopN`, `Trend`, and `StreamSnapshots`.
---
## Overview
Single binary `logtail-cli` with three subcommands:
```
logtail-cli topn [flags] # ranked list of label → count
logtail-cli trend [flags] # per-bucket time series
logtail-cli stream [flags] # live snapshot feed
```
All subcommands accept one or more `--target` addresses. Requests are fanned out
concurrently; each target's results are printed under a labeled header. With a single
target the header is omitted for clean pipe-friendly output.
---
## Step 1 — main.go and subcommand dispatch
No third-party CLI frameworks — plain `os.Args` subcommand dispatch, each subcommand
registers its own `flag.FlagSet`.
```
main():
if len(os.Args) < 2 → print usage, exit 1
switch os.Args[1]:
"topn" → runTopN(os.Args[2:])
"trend" → runTrend(os.Args[2:])
"stream" → runStream(os.Args[2:])
default → print usage, exit 1
```
Usage text lists all subcommands and their flags.
---
## Step 2 — Shared flags and client helper (`flags.go`, `client.go`)
**Shared flags** (parsed by each subcommand's FlagSet):
| Flag | Default | Description |
|------|---------|-------------|
| `--target` | `localhost:9090` | Comma-separated `host:port` list (may be repeated) |
| `--json` | false | Emit newline-delimited JSON instead of a table |
| `--website` | — | Filter: exact website match |
| `--prefix` | — | Filter: exact client prefix match |
| `--uri` | — | Filter: exact URI match |
| `--status` | — | Filter: exact HTTP status match |
`parseTargets(s string) []string` — split on comma, trim spaces, deduplicate.
`buildFilter(flags) *pb.Filter` — returns nil if no filter flags set (signals "no filter"
to the server), otherwise populates the proto fields.
**`client.go`**:
```go
func dial(addr string) (*grpc.ClientConn, pb.LogtailServiceClient, error)
```
Plain insecure dial (matching the servers' plain-TCP listener). Returns an error rather
than calling `log.Fatal` so callers can report which target failed without killing the process.
---
## Step 3 — `topn` subcommand (`cmd_topn.go`)
Additional flags:
| Flag | Default | Description |
|------|---------|-------------|
| `--n` | 10 | Number of entries to return |
| `--window` | `5m` | Time window: `1m 5m 15m 60m 6h 24h` |
| `--group-by` | `website` | Grouping: `website prefix uri status` |
`parseWindow(s string) pb.Window` — maps string → proto enum, exits on unknown value.
`parseGroupBy(s string) pb.GroupBy` — same pattern.
Fan-out: one goroutine per target, each calls `TopN` with a 10 s context deadline,
sends result (or error) on a typed result channel. Main goroutine collects all results
in target order.
**Table output** (default):
```
=== collector-1 (localhost:9090) ===
RANK COUNT LABEL
1 18 432 example.com
2 4 211 other.com
...
=== aggregator (localhost:9091) ===
RANK COUNT LABEL
1 22 643 example.com
...
```
Single-target: header omitted, plain table printed.
**JSON output** (`--json`): one JSON object per target, written sequentially to stdout:
```json
{"source":"collector-1","target":"localhost:9090","entries":[{"label":"example.com","count":18432},...]}
```
---
## Step 4 — `trend` subcommand (`cmd_trend.go`)
Additional flags:
| Flag | Default | Description |
|------|---------|-------------|
| `--window` | `5m` | Time window: `1m 5m 15m 60m 6h 24h` |
Same fan-out pattern as `topn`.
**Table output**:
```
=== collector-1 (localhost:9090) ===
TIME (UTC) COUNT
2026-03-14 20:00 823
2026-03-14 20:01 941
...
```
Points are printed oldest-first (as returned by the server).
**JSON output**: one object per target:
```json
{"source":"col-1","target":"localhost:9090","points":[{"ts":1773516000,"count":823},...]
```
---
## Step 5 — `stream` subcommand (`cmd_stream.go`)
No extra flags beyond shared ones. Each target gets one persistent `StreamSnapshots`
connection. All streams are multiplexed onto a single output goroutine via an internal
channel so lines from different targets don't interleave.
```
type streamEvent struct {
target string
source string
snap *pb.Snapshot
err error
}
```
One goroutine per target: connect → loop `stream.Recv()` → send event on channel.
On error: log to stderr, attempt reconnect after 5 s backoff (indefinitely, until
`Ctrl-C`).
`signal.NotifyContext` on SIGINT/SIGTERM cancels all stream goroutines.
**Table output** (one line per snapshot received):
```
2026-03-14 20:03:00 agg-test (localhost:9091) 950 entries top: example.com=18432
```
**JSON output**: one JSON object per snapshot event:
```json
{"ts":1773516180,"source":"agg-test","target":"localhost:9091","top_label":"example.com","top_count":18432,"total_entries":950}
```
---
## Step 6 — Formatting helpers (`format.go`)
```go
func printTable(w io.Writer, headers []string, rows [][]string)
```
Right-aligns numeric columns (COUNT, RANK), left-aligns strings. Uses `text/tabwriter`
with padding=2. No external dependencies.
```go
func fmtCount(n int64) string // "18 432" — space as thousands separator
func fmtTime(unix int64) string // "2026-03-14 20:03" UTC
```
---
## Step 7 — Tests (`cli_test.go`)
Unit tests run entirely in-process with fake gRPC servers (same pattern as
`cmd/aggregator/aggregator_test.go`).
| Test | What it covers |
|------|----------------|
| `TestParseWindow` | All 6 window strings → correct proto enum; bad value exits |
| `TestParseGroupBy` | All 4 group-by strings → correct proto enum; bad value exits |
| `TestParseTargets` | Comma split, trim, dedup |
| `TestBuildFilter` | All combinations of filter flags → correct proto Filter |
| `TestTopNSingleTarget` | Fake server; `runTopN` output matches expected table |
| `TestTopNMultiTarget` | Two fake servers; both headers present in output |
| `TestTopNJSON` | `--json` flag; output is valid JSON with correct fields |
| `TestTrendSingleTarget` | Fake server; points printed oldest-first |
| `TestTrendJSON` | `--json` flag; output is valid JSON |
| `TestStreamReceivesSnapshots` | Fake server sends 3 snapshots; output has 3 lines |
| `TestFmtCount` | `fmtCount(18432)``"18 432"` |
| `TestFmtTime` | `fmtTime(1773516000)``"2026-03-14 20:00"` |
---
## ✓ COMPLETE — Implementation notes
### Deviations from the plan
- **`TestFmtTime` uses `time.Date` not a hardcoded unix literal**: The hardcoded value
`1773516000` turned out to be 2026-03-14 19:20 UTC, not 20:00. Fixed by computing the
timestamp dynamically with `time.Date(2026, 3, 14, 20, 0, 0, 0, time.UTC).Unix()`.
- **`TestTopNJSON` tests field values, not serialised bytes**: Calling `printTopNJSON` would
require redirecting stdout. Instead the test verifies the response struct fields that the
JSON formatter would use — simpler and equally effective.
- **`streamTarget` reconnect loop lives in `cmd_stream.go`**, not a separate file. The stream
and reconnect logic are short enough to colocate.
### Test results
```
$ go test ./... -count=1 -race -timeout 60s
ok git.ipng.ch/ipng/nginx-logtail/cmd/cli 1.0s (14 tests)
ok git.ipng.ch/ipng/nginx-logtail/cmd/aggregator 4.1s (13 tests)
ok git.ipng.ch/ipng/nginx-logtail/cmd/collector 9.9s (17 tests)
```
### Test inventory
| Test | What it covers |
|------|----------------|
| `TestParseTargets` | Comma split, trim, deduplication |
| `TestParseWindow` | All 6 window strings → correct proto enum |
| `TestParseGroupBy` | All 4 group-by strings → correct proto enum |
| `TestBuildFilter` | Filter fields set correctly from flags |
| `TestBuildFilterNil` | Returns nil when no filter flags set |
| `TestFmtCount` | Space-separated thousands: 1234567 → "1 234 567" |
| `TestFmtTime` | Unix → "2026-03-14 20:00" UTC |
| `TestTopNSingleTarget` | Fake server; correct entry count and top label |
| `TestTopNMultiTarget` | Two fake servers; results ordered by target |
| `TestTopNJSON` | Response fields match expected values for JSON |
| `TestTrendSingleTarget` | Correct point count and ascending timestamp order |
| `TestTrendJSON` | JSON round-trip preserves source, ts, count |
| `TestStreamReceivesSnapshots` | 3 snapshots delivered from fake server via events channel |
| `TestTargetHeader` | Single-target → empty; multi-target → labeled header |
---
## Step 8 — Smoke test
```bash
# Start a collector
./logtail-collector --listen :9090 --logs /var/log/nginx/access.log
# Start an aggregator
./logtail-aggregator --listen :9091 --collectors localhost:9090
# Query TopN from both in one shot
./logtail-cli topn --target localhost:9090,localhost:9091 --window 15m --n 5
# Stream live snapshots from both simultaneously
./logtail-cli stream --target localhost:9090,localhost:9091
# Filter to one website, group by URI
./logtail-cli topn --target localhost:9091 --website example.com --group-by uri --n 20
# JSON output for scripting
./logtail-cli topn --target localhost:9091 --json | jq '.entries[0]'
```
---
## Deferred (not in v0)
- `--format csv` — easy to add later if needed for spreadsheet export
- `--count` / `--watch N` — repeat the query every N seconds (like `watch(1)`)
- Color output (`--color`) — ANSI highlighting of top entries
- Connecting to TLS-secured endpoints (when TLS is added to the servers)
- Per-source breakdown (depends on `SOURCE` GroupBy being added to the proto)