# CLI v0 — Implementation Plan

Module path: `git.ipng.ch/ipng/nginx-logtail`

**Scope:** A shell-facing debug tool that can query any number of collectors or aggregators
(they share the same `LogtailService` gRPC interface) and print results as a human-readable
table or as JSON. Supports all three RPCs: `TopN`, `Trend`, and `StreamSnapshots`.

---

## Overview

Single binary `logtail-cli` with three subcommands:

```
logtail-cli topn   [flags]   # ranked list of label → count
logtail-cli trend  [flags]   # per-bucket time series
logtail-cli stream [flags]   # live snapshot feed
```

All subcommands accept one or more `--target` addresses. Requests are fanned out
concurrently, and each target's results are printed under a labeled header. With a single
target the header is omitted, which keeps the output clean for pipes.

---

## Step 1 — main.go and subcommand dispatch

No third-party CLI frameworks: plain `os.Args` subcommand dispatch, where each subcommand
registers its own `flag.FlagSet`.

```
main():
    if len(os.Args) < 2 → print usage, exit 1
    switch os.Args[1]:
        "topn"   → runTopN(os.Args[2:])
        "trend"  → runTrend(os.Args[2:])
        "stream" → runStream(os.Args[2:])
        default  → print usage, exit 1
```

The usage text lists all subcommands and their flags.

---

## Step 2 — Shared flags and client helper (`flags.go`, `client.go`)

**Shared flags** (parsed by each subcommand's FlagSet):

| Flag | Default | Description |
|------|---------|-------------|
| `--target` | `localhost:9090` | Comma-separated `host:port` list (may be repeated) |
| `--json` | false | Emit newline-delimited JSON instead of a table |
| `--website` | — | Filter: exact website match |
| `--prefix` | — | Filter: exact client prefix match |
| `--uri` | — | Filter: exact URI match |
| `--status` | — | Filter: exact HTTP status match |

`parseTargets(s string) []string` — split on comma, trim spaces, deduplicate.

`buildFilter(flags) *pb.Filter` — returns nil if no filter flags are set (signalling "no
filter" to the server), otherwise populates the proto fields.

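A sketch of both helpers. `Filter` and `filterFlags` are hypothetical stand-ins (the real `pb.Filter` field names may differ), and treating `--status` 0 as "not set" is an assumption. This version of `parseTargets` also drops empty entries, so a trailing comma is harmless, and keeps first-seen order so output follows the order the targets were given.

```go
package main

import "strings"

// Filter is a hypothetical stand-in for pb.Filter.
type Filter struct {
	Website string
	Prefix  string
	URI     string
	Status  int32
}

// filterFlags holds the raw --website, --prefix, --uri and --status values.
// The zero value of each field means "flag not set".
type filterFlags struct {
	website, prefix, uri string
	status               int
}

// parseTargets splits on commas, trims whitespace, and drops empty and
// duplicate entries while preserving first-seen order.
func parseTargets(s string) []string {
	seen := make(map[string]bool)
	var out []string
	for _, t := range strings.Split(s, ",") {
		t = strings.TrimSpace(t)
		if t == "" || seen[t] {
			continue
		}
		seen[t] = true
		out = append(out, t)
	}
	return out
}

// buildFilter returns nil when no filter flag is set, which the server
// interprets as "no filter".
func buildFilter(f filterFlags) *Filter {
	if f == (filterFlags{}) {
		return nil
	}
	return &Filter{Website: f.website, Prefix: f.prefix, URI: f.uri, Status: int32(f.status)}
}
```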
**`client.go`**:

```go
func dial(addr string) (*grpc.ClientConn, pb.LogtailServiceClient, error)
```

Plain insecure dial (matching the servers' plain-TCP listener). Returns an error rather
than calling `log.Fatal` so callers can report which target failed without killing the process.

---

## Step 3 — `topn` subcommand (`cmd_topn.go`)

Additional flags:

| Flag | Default | Description |
|------|---------|-------------|
| `--n` | 10 | Number of entries to return |
| `--window` | `5m` | Time window: `1m 5m 15m 60m 6h 24h` |
| `--group-by` | `website` | Grouping: `website prefix uri status` |

`parseWindow(s string) pb.Window` — maps string → proto enum, exits on unknown value.

`parseGroupBy(s string) pb.GroupBy` — same pattern.

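A minimal sketch of the lookup-table approach, assuming a local `Window` type standing in for the generated `pb.Window` enum (the constant names are hypothetical). Splitting the lookup from the exit keeps the mapping testable, because the bad-value path of a function that calls `os.Exit` cannot easily be exercised in-process. `parseGroupBy` would follow the same pattern with four entries.

```go
package main

import (
	"fmt"
	"os"
)

// Window is a hypothetical stand-in for pb.Window.
type Window int32

const (
	Window1m Window = iota + 1
	Window5m
	Window15m
	Window60m
	Window6h
	Window24h
)

var windowNames = map[string]Window{
	"1m": Window1m, "5m": Window5m, "15m": Window15m,
	"60m": Window60m, "6h": Window6h, "24h": Window24h,
}

// lookupWindow is the testable core: it reports whether s is a known window.
func lookupWindow(s string) (Window, bool) {
	w, ok := windowNames[s]
	return w, ok
}

// parseWindow matches the planned behaviour: print an error and exit on an
// unknown value.
func parseWindow(s string) Window {
	w, ok := lookupWindow(s)
	if !ok {
		fmt.Fprintf(os.Stderr, "unknown --window %q (want 1m 5m 15m 60m 6h 24h)\n", s)
		os.Exit(1)
	}
	return w
}
```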
Fan-out: one goroutine per target, each calling `TopN` with a 10 s context deadline and
sending its result (or error) on a typed result channel. The main goroutine collects all
results and prints them in target order.

**Table output** (default):

```
=== collector-1 (localhost:9090) ===
RANK   COUNT  LABEL
   1  18 432  example.com
   2   4 211  other.com
...

=== aggregator (localhost:9091) ===
RANK   COUNT  LABEL
   1  22 643  example.com
...
```

Single-target: header omitted, plain table printed.

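The header rule fits in a small helper, roughly what `TestTargetHeader` exercises. This sketch assumes the helper is called `targetHeader` and receives the source name reported by the server plus the number of targets queried; both are assumptions rather than the implemented API.

```go
package main

import "fmt"

// targetHeader returns the "=== source (target) ===" banner, or "" when only
// one target was queried so single-target output stays pipe-friendly.
func targetHeader(source, target string, numTargets int) string {
	if numTargets <= 1 {
		return ""
	}
	return fmt.Sprintf("=== %s (%s) ===", source, target)
}
```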
**JSON output** (`--json`): one JSON object per target, written sequentially to stdout:

```json
{"source":"collector-1","target":"localhost:9090","entries":[{"label":"example.com","count":18432},...]}
```

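One way to produce these lines is a pair of tagged structs plus `encoding/json`. The type and function names are hypothetical. `json.Marshal` emits struct fields in declaration order, so the keys come out in the order shown in the example above.

```go
package main

import "encoding/json"

// topNEntry and topNDoc are hypothetical names for the JSON shape above.
type topNEntry struct {
	Label string `json:"label"`
	Count int64  `json:"count"`
}

type topNDoc struct {
	Source  string      `json:"source"`
	Target  string      `json:"target"`
	Entries []topNEntry `json:"entries"` // a nil slice would encode as null
}

// topNLine renders one target's result as a single line of JSON. Marshal
// adds no trailing newline, so the caller prints each line with fmt.Println.
func topNLine(d topNDoc) (string, error) {
	b, err := json.Marshal(d)
	return string(b), err
}
```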
---

## Step 4 — `trend` subcommand (`cmd_trend.go`)

Additional flags:

| Flag | Default | Description |
|------|---------|-------------|
| `--window` | `5m` | Time window: `1m 5m 15m 60m 6h 24h` |

Same fan-out pattern as `topn`.

**Table output**:

```
=== collector-1 (localhost:9090) ===
TIME (UTC)        COUNT
2026-03-14 20:00    823
2026-03-14 20:01    941
...
```

Points are printed oldest-first (as returned by the server).

**JSON output**: one object per target:

```json
{"source":"collector-1","target":"localhost:9090","points":[{"ts":1773518400,"count":823},...]}
```

---

## Step 5 — `stream` subcommand (`cmd_stream.go`)

No extra flags beyond the shared ones. Each target gets one persistent `StreamSnapshots`
connection. All streams are multiplexed onto a single output goroutine via an internal
channel, so lines from different targets don't interleave.

```go
type streamEvent struct {
	target string
	source string
	snap   *pb.Snapshot
	err    error
}
```

One goroutine per target: connect → loop `stream.Recv()` → send event on channel.
On error: log to stderr, then attempt to reconnect after a 5 s backoff (indefinitely, until
`Ctrl-C`).

`signal.NotifyContext` on SIGINT/SIGTERM cancels all stream goroutines.

**Table output** (one line per snapshot received):

```
2026-03-14 20:03:00 agg-test (localhost:9091) 950 entries top: example.com=18432
```

**JSON output**: one JSON object per snapshot event:

```json
{"ts":1773518580,"source":"agg-test","target":"localhost:9091","top_label":"example.com","top_count":18432,"total_entries":950}
```

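Both output forms can be derived from one flattened record. This is a sketch with hypothetical names (`snapshotLine`, `text`, `jsonLine`), assuming the top label, top count, and entry count have already been extracted from the `pb.Snapshot`.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// snapshotLine is a hypothetical flattened view of one snapshot event.
type snapshotLine struct {
	TS           int64  `json:"ts"`
	Source       string `json:"source"`
	Target       string `json:"target"`
	TopLabel     string `json:"top_label"`
	TopCount     int64  `json:"top_count"`
	TotalEntries int    `json:"total_entries"`
}

// text renders the table form, with the timestamp in UTC to the second.
func (s snapshotLine) text() string {
	return fmt.Sprintf("%s %s (%s) %d entries top: %s=%d",
		time.Unix(s.TS, 0).UTC().Format("2006-01-02 15:04:05"),
		s.Source, s.Target, s.TotalEntries, s.TopLabel, s.TopCount)
}

// jsonLine renders the --json form as one line without a trailing newline.
func (s snapshotLine) jsonLine() (string, error) {
	b, err := json.Marshal(s)
	return string(b), err
}
```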
---

## Step 6 — Formatting helpers (`format.go`)

```go
func printTable(w io.Writer, headers []string, rows [][]string)
```

Right-aligns numeric columns (COUNT, RANK) and left-aligns strings. Uses `text/tabwriter`
with padding=2. No external dependencies.

```go
func fmtCount(n int64) string   // "18 432" — space as thousands separator
func fmtTime(unix int64) string // "2026-03-14 20:03" UTC
```

---

## Step 7 — Tests (`cli_test.go`)

Unit tests run entirely in-process with fake gRPC servers (same pattern as
`cmd/aggregator/aggregator_test.go`).

| Test | What it covers |
|------|----------------|
| `TestParseWindow` | All 6 window strings → correct proto enum; bad value exits |
| `TestParseGroupBy` | All 4 group-by strings → correct proto enum; bad value exits |
| `TestParseTargets` | Comma split, trim, dedup |
| `TestBuildFilter` | All combinations of filter flags → correct proto Filter |
| `TestTopNSingleTarget` | Fake server; `runTopN` output matches expected table |
| `TestTopNMultiTarget` | Two fake servers; both headers present in output |
| `TestTopNJSON` | `--json` flag; output is valid JSON with correct fields |
| `TestTrendSingleTarget` | Fake server; points printed oldest-first |
| `TestTrendJSON` | `--json` flag; output is valid JSON |
| `TestStreamReceivesSnapshots` | Fake server sends 3 snapshots; output has 3 lines |
| `TestFmtCount` | `fmtCount(18432)` → `"18 432"` |
| `TestFmtTime` | `fmtTime(1773516000)` → `"2026-03-14 20:00"` |

---

## ✓ COMPLETE — Implementation notes

### Deviations from the plan

- **`TestFmtTime` uses `time.Date` not a hardcoded unix literal**: The hardcoded value
  `1773516000` turned out to be 2026-03-14 19:20 UTC, not 20:00. Fixed by computing the
  timestamp dynamically with `time.Date(2026, 3, 14, 20, 0, 0, 0, time.UTC).Unix()`.
- **`TestTopNJSON` tests field values, not serialised bytes**: Calling `printTopNJSON` would
  require redirecting stdout. Instead the test verifies the response struct fields that the
  JSON formatter would use, which is simpler and equally effective.
- **`streamTarget` reconnect loop lives in `cmd_stream.go`**, not a separate file. The stream
  and reconnect logic are short enough to colocate.

### Test results

```
$ go test ./... -count=1 -race -timeout 60s
ok  git.ipng.ch/ipng/nginx-logtail/cmd/cli         1.0s  (14 tests)
ok  git.ipng.ch/ipng/nginx-logtail/cmd/aggregator  4.1s  (13 tests)
ok  git.ipng.ch/ipng/nginx-logtail/cmd/collector   9.9s  (17 tests)
```

### Test inventory

| Test | What it covers |
|------|----------------|
| `TestParseTargets` | Comma split, trim, deduplication |
| `TestParseWindow` | All 6 window strings → correct proto enum |
| `TestParseGroupBy` | All 4 group-by strings → correct proto enum |
| `TestBuildFilter` | Filter fields set correctly from flags |
| `TestBuildFilterNil` | Returns nil when no filter flags set |
| `TestFmtCount` | Space-separated thousands: 1234567 → "1 234 567" |
| `TestFmtTime` | Unix → "2026-03-14 20:00" UTC |
| `TestTopNSingleTarget` | Fake server; correct entry count and top label |
| `TestTopNMultiTarget` | Two fake servers; results ordered by target |
| `TestTopNJSON` | Response fields match expected values for JSON |
| `TestTrendSingleTarget` | Correct point count and ascending timestamp order |
| `TestTrendJSON` | JSON round-trip preserves source, ts, count |
| `TestStreamReceivesSnapshots` | 3 snapshots delivered from fake server via events channel |
| `TestTargetHeader` | Single-target → empty; multi-target → labeled header |

---

## Step 8 — Smoke test

```bash
# Start a collector
./logtail-collector --listen :9090 --logs /var/log/nginx/access.log

# Start an aggregator
./logtail-aggregator --listen :9091 --collectors localhost:9090

# Query TopN from both in one shot
./logtail-cli topn --target localhost:9090,localhost:9091 --window 15m --n 5

# Stream live snapshots from both simultaneously
./logtail-cli stream --target localhost:9090,localhost:9091

# Filter to one website, group by URI
./logtail-cli topn --target localhost:9091 --website example.com --group-by uri --n 20

# JSON output for scripting
./logtail-cli topn --target localhost:9091 --json | jq '.entries[0]'
```

---

## Deferred (not in v0)

- `--format csv`: easy to add later if needed for spreadsheet export
- `--count` / `--watch N`: repeat the query every N seconds (like `watch(1)`)
- Color output (`--color`): ANSI highlighting of top entries
- Connecting to TLS-secured endpoints (when TLS is added to the servers)
- Per-source breakdown (depends on `SOURCE` GroupBy being added to the proto)