nginx-logtail/PLAN_CLI.md (2026-03-14)

CLI v0 — Implementation Plan

Module path: git.ipng.ch/ipng/nginx-logtail

Scope: A shell-facing debug tool that can query any number of collectors or aggregators (they share the same LogtailService gRPC interface) and print results in a human-readable table or JSON. Supports all three RPCs: TopN, Trend, and StreamSnapshots.


Overview

Single binary logtail-cli with three subcommands:

logtail-cli topn    [flags]   # ranked list of label → count
logtail-cli trend   [flags]   # per-bucket time series
logtail-cli stream  [flags]   # live snapshot feed

All subcommands accept one or more --target addresses. Requests are fanned out concurrently; each target's results are printed under a labeled header. With a single target the header is omitted for clean pipe-friendly output.


Step 1 — main.go and subcommand dispatch

No third-party CLI frameworks — plain os.Args subcommand dispatch, each subcommand registers its own flag.FlagSet.

main():
  if len(os.Args) < 2 → print usage, exit 1
  switch os.Args[1]:
    "topn"   → runTopN(os.Args[2:])
    "trend"  → runTrend(os.Args[2:])
    "stream" → runStream(os.Args[2:])
    default  → print usage, exit 1

Usage text lists all subcommands and their flags.
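The dispatch above can be sketched as plain Go. Returning an exit code from a helper (rather than calling os.Exit inside the switch) keeps it testable; the subcommand stubs below are placeholders for the real cmd_*.go implementations:

```go
package main

import (
	"fmt"
	"os"
)

// Stubs for illustration; the real implementations live in
// cmd_topn.go, cmd_trend.go and cmd_stream.go.
func runTopN(args []string)   {}
func runTrend(args []string)  {}
func runStream(args []string) {}

// dispatch returns the process exit code so the switch is testable;
// main would simply call os.Exit(dispatch(os.Args)).
func dispatch(args []string) int {
	if len(args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: logtail-cli <topn|trend|stream> [flags]")
		return 1
	}
	switch args[1] {
	case "topn":
		runTopN(args[2:])
	case "trend":
		runTrend(args[2:])
	case "stream":
		runStream(args[2:])
	default:
		fmt.Fprintln(os.Stderr, "usage: logtail-cli <topn|trend|stream> [flags]")
		return 1
	}
	return 0
}
```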


Step 2 — Shared flags and client helper (flags.go, client.go)

Shared flags (parsed by each subcommand's FlagSet):

Flag       Default          Description
--target   localhost:9090   Comma-separated host:port list (may be repeated)
--json     false            Emit newline-delimited JSON instead of a table
--website  (unset)          Filter: exact website match
--prefix   (unset)          Filter: exact client prefix match
--uri      (unset)          Filter: exact URI match
--status   (unset)          Filter: exact HTTP status match

parseTargets(s string) []string — split on comma, trim spaces, deduplicate.
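A minimal sketch of parseTargets (split, trim, drop empties, dedupe while preserving first-seen order):

```go
package main

import "strings"

// parseTargets splits a comma-separated target list, trims whitespace,
// drops empty entries, and removes duplicates while preserving
// first-seen order.
func parseTargets(s string) []string {
	seen := make(map[string]bool)
	var out []string
	for _, t := range strings.Split(s, ",") {
		t = strings.TrimSpace(t)
		if t == "" || seen[t] {
			continue
		}
		seen[t] = true
		out = append(out, t)
	}
	return out
}
```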

buildFilter(flags) *pb.Filter — returns nil if no filter flags set (signals "no filter" to the server), otherwise populates the proto fields.

client.go:

func dial(addr string) (*grpc.ClientConn, pb.LogtailServiceClient, error)

Plain insecure dial (matching the servers' plain-TCP listener). Returns an error rather than calling log.Fatal so callers can report which target failed without killing the process.


Step 3 — topn subcommand (cmd_topn.go)

Additional flags:

Flag        Default   Description
--n         10        Number of entries to return
--window    5m        Time window: 1m 5m 15m 60m 6h 24h
--group-by  website   Grouping: website prefix uri status

parseWindow(s string) pb.Window — maps string → proto enum, exits on unknown value. parseGroupBy(s string) pb.GroupBy — same pattern.
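The string→enum mapping can be a simple lookup table. A sketch of parseWindow, with a local Window type standing in for the generated pb.Window enum (the real constant names come from the proto and are an assumption here); parseGroupBy follows the same shape:

```go
package main

import (
	"fmt"
	"os"
)

// Window stands in for the proto enum pb.Window; the actual generated
// names and values are assumptions in this sketch.
type Window int32

const (
	Window1m Window = iota
	Window5m
	Window15m
	Window60m
	Window6h
	Window24h
)

var windowByName = map[string]Window{
	"1m": Window1m, "5m": Window5m, "15m": Window15m,
	"60m": Window60m, "6h": Window6h, "24h": Window24h,
}

// parseWindow maps a --window string to its enum value and exits on
// unknown input, matching the plan's "exits on unknown value" behaviour.
func parseWindow(s string) Window {
	w, ok := windowByName[s]
	if !ok {
		fmt.Fprintf(os.Stderr, "unknown window %q (want 1m 5m 15m 60m 6h 24h)\n", s)
		os.Exit(1)
	}
	return w
}
```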

Fan-out: one goroutine per target, each calls TopN with a 10 s context deadline, sends result (or error) on a typed result channel. Main goroutine collects all results in target order.

Table output (default):

=== collector-1 (localhost:9090) ===
RANK  COUNT   LABEL
   1  18 432  example.com
   2   4 211  other.com
   ...

=== aggregator (localhost:9091) ===
RANK  COUNT   LABEL
   1  22 643  example.com
   ...

Single-target: header omitted, plain table printed.

JSON output (--json): one JSON object per target, written sequentially to stdout:

{"source":"collector-1","target":"localhost:9090","entries":[{"label":"example.com","count":18432},...]}

Step 4 — trend subcommand (cmd_trend.go)

Additional flags:

Flag      Default   Description
--window  5m        Time window: 1m 5m 15m 60m 6h 24h

Same fan-out pattern as topn.

Table output:

=== collector-1 (localhost:9090) ===
TIME (UTC)        COUNT
2026-03-14 20:00    823
2026-03-14 20:01    941
...

Points are printed oldest-first (as returned by the server).

JSON output: one object per target:

{"source":"col-1","target":"localhost:9090","points":[{"ts":1773516000,"count":823},...]

Step 5 — stream subcommand (cmd_stream.go)

No extra flags beyond shared ones. Each target gets one persistent StreamSnapshots connection. All streams are multiplexed onto a single output goroutine via an internal channel so lines from different targets don't interleave.

type streamEvent struct {
    target string
    source string
    snap   *pb.Snapshot
    err    error
}

One goroutine per target: connect → loop stream.Recv() → send event on channel. On error: log to stderr, attempt reconnect after 5 s backoff (indefinitely, until Ctrl-C).

signal.NotifyContext on SIGINT/SIGTERM cancels all stream goroutines.

Table output (one line per snapshot received):

2026-03-14 20:03:00  agg-test (localhost:9091)  950 entries  top: example.com=18432

JSON output: one JSON object per snapshot event:

{"ts":1773516180,"source":"agg-test","target":"localhost:9091","top_label":"example.com","top_count":18432,"total_entries":950}

Step 6 — Formatting helpers (format.go)

func printTable(w io.Writer, headers []string, rows [][]string)

Right-aligns numeric columns (COUNT, RANK), left-aligns strings. Uses text/tabwriter with padding=2. No external dependencies.

func fmtCount(n int64) string  // "18 432" — space as thousands separator
func fmtTime(unix int64) string // "2026-03-14 20:03" UTC

Step 7 — Tests (cli_test.go)

Unit tests run entirely in-process with fake gRPC servers (same pattern as cmd/aggregator/aggregator_test.go).

Test                         What it covers
TestParseWindow              All 6 window strings → correct proto enum; bad value exits
TestParseGroupBy             All 4 group-by strings → correct proto enum; bad value exits
TestParseTargets             Comma split, trim, dedup
TestBuildFilter              All combinations of filter flags → correct proto Filter
TestTopNSingleTarget         Fake server; runTopN output matches expected table
TestTopNMultiTarget          Two fake servers; both headers present in output
TestTopNJSON                 --json flag; output is valid JSON with correct fields
TestTrendSingleTarget        Fake server; points printed oldest-first
TestTrendJSON                --json flag; output is valid JSON
TestStreamReceivesSnapshots  Fake server sends 3 snapshots; output has 3 lines
TestFmtCount                 fmtCount(18432) → "18 432"
TestFmtTime                  fmtTime(1773516000) → "2026-03-14 20:00"

✓ COMPLETE — Implementation notes

Deviations from the plan

  • TestFmtTime uses time.Date not a hardcoded unix literal: The hardcoded value 1773516000 turned out to be 2026-03-14 19:20 UTC, not 20:00. Fixed by computing the timestamp dynamically with time.Date(2026, 3, 14, 20, 0, 0, 0, time.UTC).Unix().
  • TestTopNJSON tests field values, not serialised bytes: Calling printTopNJSON would require redirecting stdout. Instead the test verifies the response struct fields that the JSON formatter would use — simpler and equally effective.
  • streamTarget reconnect loop lives in cmd_stream.go, not a separate file. The stream and reconnect logic are short enough to colocate.

Test results

$ go test ./... -count=1 -race -timeout 60s
ok  git.ipng.ch/ipng/nginx-logtail/cmd/cli         1.0s   (14 tests)
ok  git.ipng.ch/ipng/nginx-logtail/cmd/aggregator  4.1s   (13 tests)
ok  git.ipng.ch/ipng/nginx-logtail/cmd/collector   9.9s   (17 tests)

Test inventory

Test                         What it covers
TestParseTargets             Comma split, trim, deduplication
TestParseWindow              All 6 window strings → correct proto enum
TestParseGroupBy             All 4 group-by strings → correct proto enum
TestBuildFilter              Filter fields set correctly from flags
TestBuildFilterNil           Returns nil when no filter flags set
TestFmtCount                 Space-separated thousands: 1234567 → "1 234 567"
TestFmtTime                  Unix → "2026-03-14 20:00" UTC
TestTopNSingleTarget         Fake server; correct entry count and top label
TestTopNMultiTarget          Two fake servers; results ordered by target
TestTopNJSON                 Response fields match expected values for JSON
TestTrendSingleTarget        Correct point count and ascending timestamp order
TestTrendJSON                JSON round-trip preserves source, ts, count
TestStreamReceivesSnapshots  3 snapshots delivered from fake server via events channel
TestTargetHeader             Single-target → empty; multi-target → labeled header

Step 8 — Smoke test

# Start a collector
./logtail-collector --listen :9090 --logs /var/log/nginx/access.log

# Start an aggregator
./logtail-aggregator --listen :9091 --collectors localhost:9090

# Query TopN from both in one shot
./logtail-cli topn --target localhost:9090,localhost:9091 --window 15m --n 5

# Stream live snapshots from both simultaneously
./logtail-cli stream --target localhost:9090,localhost:9091

# Filter to one website, group by URI
./logtail-cli topn --target localhost:9091 --website example.com --group-by uri --n 20

# JSON output for scripting
./logtail-cli topn --target localhost:9091 --json | jq '.entries[0]'

Deferred (not in v0)

  • --format csv — easy to add later if needed for spreadsheet export
  • --count / --watch N — repeat the query every N seconds (like watch(1))
  • Color output (--color) — ANSI highlighting of top entries
  • Connecting to TLS-secured endpoints (when TLS is added to the servers)
  • Per-source breakdown (depends on SOURCE GroupBy being added to the proto)