Update docs with collector, aggregator, CLI and frontend
@@ -2,7 +2,7 @@

 ## Overview

-nginx-logtail is a three-component system for real-time traffic analysis across a cluster of nginx
+nginx-logtail is a four-component system for real-time traffic analysis across a cluster of nginx
 machines. It answers questions like:

 - Which client prefix is causing the most HTTP 429s right now?
@@ -166,8 +166,10 @@ WantedBy=multi-user.target

 ## Aggregator

-Runs on a central machine. Connects to all collectors via gRPC streaming, merges their snapshots
-into a unified view, and serves the same gRPC interface as the collector.
+Runs on a central machine. Subscribes to the `StreamSnapshots` push stream from every configured
+collector, merges their snapshots into a unified in-memory cache, and serves the same gRPC
+interface as the collector. The frontend and CLI query the aggregator exactly as they would query
+a single collector.
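As a rough illustration of the merge step described above, the sketch below sums per-label counts from the latest snapshot of each collector. The name `merge_snapshots` and the plain-dict snapshot shape are assumptions made for this example; the real aggregator works on gRPC snapshot messages whose layout is not shown in this diff.

```python
from collections import Counter


def merge_snapshots(snapshots: dict[str, dict[str, int]]) -> Counter:
    """Sum label -> count maps across the latest snapshot of every collector."""
    merged = Counter()
    for counts in snapshots.values():
        merged.update(counts)  # Counter.update adds counts instead of replacing them
    return merged


# Hypothetical latest snapshots, keyed by collector address.
latest = {
    "nginx1:9090": {"example.com": 10_000, "other.com": 1_200},
    "nginx2:9090": {"example.com": 8_432, "other.com": 3_011},
}
merged = merge_snapshots(latest)
top = merged.most_common(1)  # the single highest-ranked (label, count) pair
```

Keeping one snapshot per collector, rather than only the running sum, is what makes it possible to drop a single collector's contribution later without rebuilding the whole view.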

 ### Flags

@@ -177,100 +179,216 @@ into a unified view, and serves the same gRPC interface as the collector.
 | `--collectors` | — | Comma-separated `host:port` addresses of collectors |
 | `--source` | hostname | Name for this aggregator in query responses |

+`--collectors` is required; the aggregator exits immediately if it is not set.
+
 ### Example

 ```bash
 ./aggregator \
   --collectors nginx1:9090,nginx2:9090,nginx3:9090 \
-  --listen :9091
+  --listen :9091 \
+  --source agg-prod
 ```

-The aggregator tolerates collector failures — if one collector is unreachable, results from the
-remaining collectors are returned with a warning. It reconnects automatically with backoff.
+### Fault tolerance
+
+The aggregator reconnects to each collector independently with exponential backoff (100 ms →
+doubles → cap 30 s). After 3 consecutive failures to a collector it marks that collector
+**degraded**: its last-known contribution is subtracted from the merged view so stale counts
+do not accumulate. When the collector recovers and sends a new snapshot, it is automatically
+reintegrated. The remaining collectors continue serving queries throughout.
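The reconnect policy above (start at 100 ms, double, cap at 30 s, degrade after 3 consecutive failures) can be sketched as follows. This is a minimal illustration under stated assumptions: `backoff_delays_ms` and `CollectorState` are hypothetical names, and the real aggregator may track state differently.

```python
from itertools import islice


def backoff_delays_ms(initial=100, factor=2, cap=30_000):
    """Yield reconnect delays in milliseconds: 100, 200, 400, ... capped at 30 000."""
    delay = initial
    while True:
        yield min(delay, cap)
        delay = min(delay * factor, cap)


class CollectorState:
    """Tracks consecutive failures for one collector (hypothetical helper)."""

    DEGRADED_AFTER = 3  # consecutive failures before the collector is marked degraded

    def __init__(self):
        self.failures = 0
        self.degraded = False

    def on_failure(self):
        self.failures += 1
        if self.failures >= self.DEGRADED_AFTER:
            self.degraded = True  # caller would subtract the last-known contribution here

    def on_snapshot(self):
        # A fresh snapshot resets the failure streak and reintegrates the collector.
        self.failures = 0
        self.degraded = False


delays = list(islice(backoff_delays_ms(), 10))

state = CollectorState()
for _ in range(3):
    state.on_failure()
degraded_after_three = state.degraded
state.on_snapshot()
```

With these parameters the cap is reached on the tenth attempt, so a collector that stays down is retried every 30 s from then on.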
+
+### Memory
+
+The aggregator's merged cache uses the same tiered ring-buffer structure as the collector
+(60 × 1-min fine, 288 × 5-min coarse) but holds at most top-50 000 entries per fine bucket
+and top-5 000 per coarse bucket across all collectors combined. Memory footprint is roughly
+the same as one collector (~845 MB worst case).
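To make the tier sizes concrete: 60 one-minute buckets cover 1 hour and 288 five-minute buckets cover 24 hours. The sketch below shows one way such a tiered ring buffer with top-K trimming could look. It is only an illustration; `TieredCache`, `trim`, and the rule that five fine buckets roll up into one coarse bucket are assumptions, not details confirmed by this diff.

```python
import heapq
from collections import Counter, deque

FINE_SLOTS, COARSE_SLOTS = 60, 288   # 60 x 1 min = 1 h, 288 x 5 min = 24 h
FINE_TOP, COARSE_TOP = 50_000, 5_000


def trim(counts, limit):
    """Keep only the `limit` largest entries of a label -> count mapping."""
    return dict(heapq.nlargest(limit, counts.items(), key=lambda kv: kv[1]))


class TieredCache:
    def __init__(self, fine_top=FINE_TOP, coarse_top=COARSE_TOP):
        # deque(maxlen=...) behaves as a ring: the oldest bucket is evicted on append.
        self.fine = deque(maxlen=FINE_SLOTS)
        self.coarse = deque(maxlen=COARSE_SLOTS)
        self.fine_top, self.coarse_top = fine_top, coarse_top
        self._minutes = 0

    def push_minute(self, counts):
        self.fine.append(trim(counts, self.fine_top))
        self._minutes += 1
        if self._minutes % 5 == 0:
            # Assumed roll-up: merge the five newest fine buckets into one coarse bucket.
            rolled = Counter()
            for bucket in list(self.fine)[-5:]:
                rolled.update(bucket)
            self.coarse.append(trim(rolled, self.coarse_top))


cache = TieredCache(fine_top=2, coarse_top=1)  # tiny limits so the trimming is visible
for _ in range(65):
    cache.push_minute({"a": 3, "b": 2, "c": 1})
```

Because both rings have a fixed number of slots and each slot a fixed maximum number of entries, the worst-case footprint is bounded regardless of how many collectors feed the cache.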
+
+### Systemd unit example
+
+```ini
+[Unit]
+Description=nginx-logtail aggregator
+After=network.target
+
+[Service]
+ExecStart=/usr/local/bin/aggregator \
+  --collectors nginx1:9090,nginx2:9090,nginx3:9090 \
+  --listen :9091 \
+  --source %H
+Restart=on-failure
+RestartSec=5
+
+[Install]
+WantedBy=multi-user.target
+```

 ---

 ## Frontend

 HTTP dashboard. Connects to the aggregator (or directly to a single collector for debugging).
 Zero JavaScript — server-rendered HTML with inline SVG sparklines.

 ### Flags

-| Flag | Default | Description |
-|-------------|--------------|---------------------------------------|
-| `--listen` | `:8080` | HTTP listen address |
-| `--target` | `localhost:9091` | gRPC address of aggregator or collector |
+| Flag | Default | Description |
+|-------------|-------------------|--------------------------------------------------|
+| `--listen` | `:8080` | HTTP listen address |
+| `--target` | `localhost:9091` | Default gRPC endpoint (aggregator or collector) |
+| `--n` | `25` | Default number of table rows |
+| `--refresh` | `30` | Auto-refresh interval in seconds; `0` to disable |

 ### Usage

 Navigate to `http://your-host:8080`. The dashboard shows a ranked table of the top entries for
 the selected dimension and time window.

-**Filter controls:**
-- Click any row to add that value as a filter (e.g. click a website to restrict to it)
-- The filter breadcrumb at the top shows all active filters; click any token to remove it
-- Use the window tabs to switch between 1m / 5m / 15m / 60m / 6h / 24h
-- The page auto-refreshes every 30 seconds
+**Window tabs** — switch between `1m / 5m / 15m / 60m / 6h / 24h`. Only the window changes;
+all active filters are preserved.

-**Dimension selector:** switch between grouping by Website, Client Prefix, Request URI, or HTTP
-Status using the tabs at the top of the table.
+**Dimension tabs** — switch between grouping by `website / prefix / uri / status`.

-**Sparkline:** the trend chart shows total request count per bucket for the selected window and
-active filters. Useful for spotting sudden spikes.
+**Drilldown** — click any table row to add that value as a filter and advance to the next
+dimension in the hierarchy:

-**URL sharing:** all filter state is in the URL query string — copy the URL to share a specific
-view with another operator.
+```
+website → client prefix → request URI → HTTP status → website (cycles)
+```
+
+Example: click `example.com` in the website view to see which client prefixes are hitting it;
+click a prefix there to see which URIs it is requesting; and so on.
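The drilldown cycle can be expressed as a small pure function: add the clicked value as a filter on the current dimension, then move to the next dimension, wrapping around after `status`. The sketch below is an illustration only; `drill` and the dict-based filter state are assumed names, not the frontend's actual code.

```python
DIMENSIONS = ["website", "prefix", "uri", "status"]


def drill(filters: dict[str, str], by: str, clicked: str) -> tuple[dict[str, str], str]:
    """Return (new_filters, next_dimension) after clicking `clicked` in the `by` view."""
    new_filters = {**filters, by: clicked}  # copy so the caller's state is untouched
    next_by = DIMENSIONS[(DIMENSIONS.index(by) + 1) % len(DIMENSIONS)]
    return new_filters, next_by


step1 = drill({}, "website", "example.com")
step2 = drill(step1[0], step1[1], "203.0.113.0/24")
wrapped = drill({}, "status", "429")
```

Here `203.0.113.0/24` is just a documentation-range placeholder for a client prefix.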
+
+**Breadcrumb strip** — shows all active filters above the table. Click `×` next to any token
+to remove just that filter, keeping the others.
+
+**Sparkline** — inline SVG trend chart showing total request count per time bucket for the
+current filter state. Useful for spotting sudden spikes or sustained DDoS ramps.
+
+**URL sharing** — all filter state is in the URL query string (`w`, `by`, `f_website`,
+`f_prefix`, `f_uri`, `f_status`, `n`). Copy the URL to share an exact view with another
+operator, or bookmark a recurring query.
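Since the whole view is encoded in the query string, a shareable link can be built from the parameter names listed above (`w`, `by`, `f_*`, `n`). The helper below is a hypothetical sketch using only the Python standard library; `share_url` and the host `frontend:8080` are assumptions for illustration.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def share_url(base, window, by, filters, n=None):
    """Build a dashboard URL from window, grouping dimension, and active filters."""
    params = {"w": window, "by": by}
    params.update({f"f_{dim}": value for dim, value in filters.items()})
    if n is not None:
        params["n"] = n
    return f"{base}/?{urlencode(params)}"  # urlencode also percent-escapes values


url = share_url("http://frontend:8080", "1m", "prefix", {"status": "429"})
decoded = parse_qs(urlsplit(url).query)  # round-trip back into a dict of lists
```

Values containing reserved characters, such as a request URI with `/` or `?`, are percent-encoded by `urlencode`, so the link stays valid when pasted elsewhere.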
+
+**JSON output** — append `&raw=1` to any URL to receive the TopN result as JSON instead of
+HTML. Useful for scripting without the CLI binary:
+
+```bash
+curl -s 'http://frontend:8080/?f_status=429&by=prefix&w=1m&raw=1' | jq '.entries[0]'
+```
+
+**Target override** — append `?target=host:port` to point the frontend at a different gRPC
+endpoint for that request (useful for comparing a single collector against the aggregator):
+
+```bash
+http://frontend:8080/?target=nginx3:9090&w=5m
+```

 ---

 ## CLI

-A shell companion for one-off queries and debugging. Outputs JSON; pipe to `jq` for filtering.
+A shell companion for one-off queries and debugging. Works with any `LogtailService` endpoint —
+collector or aggregator. Accepts multiple targets, fans out concurrently, and labels each result.
+Default output is a human-readable table; add `--json` for machine-readable NDJSON.
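The multi-target behaviour (split the comma-separated list, query every target concurrently, label each result) can be pictured with a thread pool. This is a sketch under assumptions: `fan_out` is a hypothetical name and `fake_query` stands in for the real gRPC call, which is not shown in this diff.

```python
from concurrent.futures import ThreadPoolExecutor


def fan_out(targets, query):
    """Run `query(target)` for all targets concurrently; results keep input order."""
    with ThreadPoolExecutor(max_workers=max(1, len(targets))) as pool:
        results = list(pool.map(query, targets))  # map preserves the order of `targets`
    return list(zip(targets, results))


def fake_query(target):
    # Hypothetical stand-in for a TopN request against one endpoint.
    return {"source": target.split(":")[0], "entries": []}


labeled = fan_out("nginx1:9090,nginx2:9090".split(","), fake_query)
```

Pairing each result with its target is what allows the output to carry a per-target label even when responses arrive out of order.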

 ### Subcommands

 ```
-cli topn --target HOST:PORT [filters] [--by DIM] [--window W] [--n N] [--pretty]
-cli trend --target HOST:PORT [filters] [--window W] [--pretty]
-cli stream --target HOST:PORT [--pretty]
+logtail-cli topn    [flags]   ranked label → count table
+logtail-cli trend   [flags]   per-bucket time series
+logtail-cli stream  [flags]   live snapshot feed (runs until Ctrl-C)
 ```

-### Common flags
+### Shared flags (all subcommands)

 | Flag | Default | Description |
 |---------------|------------------|----------------------------------------------------------|
-| `--target` | `localhost:9090` | gRPC address of collector or aggregator |
-| `--by` | `website` | Dimension: `website` `prefix` `uri` `status` |
-| `--window` | `5m` | Window: `1m` `5m` `15m` `60m` `6h` `24h` |
-| `--n` | `10` | Number of results |
+| `--target` | `localhost:9090` | Comma-separated `host:port` list; queries fan out to all |
+| `--json` | false | Emit newline-delimited JSON instead of a table |
 | `--website` | — | Filter to this website |
 | `--prefix` | — | Filter to this client prefix |
 | `--uri` | — | Filter to this request URI |
-| `--status` | — | Filter to this HTTP status code |
-| `--pretty` | false | Pretty-print JSON |
+| `--status` | — | Filter to this HTTP status code (integer) |
+
+### `topn` flags
+
+| Flag | Default | Description |
+|---------------|------------|----------------------------------------------------------|
+| `--n` | `10` | Number of entries |
+| `--window` | `5m` | `1m` `5m` `15m` `60m` `6h` `24h` |
+| `--group-by` | `website` | `website` `prefix` `uri` `status` |
+
+### `trend` flags
+
+| Flag | Default | Description |
+|---------------|------------|----------------------------------------------------------|
+| `--window` | `5m` | `1m` `5m` `15m` `60m` `6h` `24h` |
+
+### Output format
+
+**Table** (default — single target, no header):
+```
+RANK   COUNT  LABEL
+   1  18 432  example.com
+   2   4 211  other.com
+```
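The counts in the table are grouped in threes with a space (`18 432`). A minimal way to produce that layout is sketched below; `format_count` and `render_table` are hypothetical names, and the column widths are a guess because the original alignment is not recoverable from this page.

```python
def format_count(n: int) -> str:
    """Group digits in threes separated by spaces, e.g. 18432 -> '18 432'."""
    return f"{n:,}".replace(",", " ")


def render_table(entries: list[tuple[str, int]]) -> str:
    """Render (label, count) pairs as a ranked, right-aligned text table."""
    rows = [f"{'RANK':>4}  {'COUNT':>9}  LABEL"]
    for rank, (label, count) in enumerate(entries, start=1):
        rows.append(f"{rank:>4}  {format_count(count):>9}  {label}")
    return "\n".join(rows)


table = render_table([("example.com", 18432), ("other.com", 4211)])
```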
+
+**Multi-target** — each target gets a labeled section:
+```
+=== col-1 (nginx1:9090) ===
+RANK   COUNT  LABEL
+   1  10 000  example.com
+
+=== agg-prod (agg:9091) ===
+RANK   COUNT  LABEL
+   1  18 432  example.com
+```
+
+**JSON** (`--json`) — one object per target, suitable for `jq`:
+```json
+{"source":"agg-prod","target":"agg:9091","entries":[{"label":"example.com","count":18432},...]}
+```
+
+**`stream` JSON** — one object per snapshot received (NDJSON), runs until interrupted:
+```json
+{"ts":1773516180,"source":"col-1","target":"nginx1:9090","total_entries":823,"top_label":"example.com","top_count":10000}
+```
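Because the `stream` output is newline-delimited JSON, it can be consumed one line at a time without buffering the whole feed. The sketch below parses lines shaped like the sample above and keeps snapshots whose `top_count` crosses a threshold. `parse_ndjson` and the threshold of 5000 are assumptions for illustration; in practice the lines would come from the CLI's standard output.

```python
import json


def parse_ndjson(lines):
    """Yield one decoded object per non-empty line."""
    for line in lines:
        line = line.strip()
        if line:  # tolerate blank lines between records
            yield json.loads(line)


# Sample input copied from the documented `stream` JSON shape.
sample = [
    '{"ts":1773516180,"source":"col-1","target":"nginx1:9090",'
    '"total_entries":823,"top_label":"example.com","top_count":10000}',
    "",
]
busy = [snap for snap in parse_ndjson(sample) if snap["top_count"] >= 5000]
```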

 ### Examples

 ```bash
 # Top 20 client prefixes sending 429s right now
-cli topn --target agg:9091 --window 1m --by prefix --status 429 --n 20 | jq '.entries[]'
+logtail-cli topn --target agg:9091 --window 1m --group-by prefix --status 429 --n 20

-# Which website has the most 503s in the last 24h?
-cli topn --target agg:9091 --window 24h --by website --status 503
+# Same query, pipe to jq for scripting
+logtail-cli topn --target agg:9091 --window 1m --group-by prefix --status 429 --n 20 \
+  --json | jq '.entries[0]'

-# Trend of 429s on one site over 6h — pipe to a quick graph
-cli trend --target agg:9091 --window 6h --website api.example.com \
-  | jq '[.points[] | {t: .time, n: .count}]'
+# Which website has the most 503s over the last 24h?
+logtail-cli topn --target agg:9091 --window 24h --group-by website --status 503

-# Watch live snapshots from one collector; alert on large entry counts
-cli stream --target nginx3:9090 | jq -c 'select(.entry_count > 50000)'
+# Drill: top URIs on one website over the last 60 minutes
+logtail-cli topn --target agg:9091 --window 60m --group-by uri --website api.example.com

-# Query a single collector directly (bypass aggregator)
-cli topn --target nginx1:9090 --window 5m --by prefix --pretty
+# Compare two collectors side by side in one command
+logtail-cli topn --target nginx1:9090,nginx2:9090 --window 5m
+
+# Query both a collector and the aggregator at once
+logtail-cli topn --target nginx3:9090,agg:9091 --window 5m --group-by prefix
+
+# Trend of total traffic over 6h (for a quick sparkline in the terminal)
+logtail-cli trend --target agg:9091 --window 6h --json | jq '[.points[] | .count]'
+
+# Watch live merged snapshots from the aggregator
+logtail-cli stream --target agg:9091
+
+# Watch two collectors simultaneously; each snapshot is labeled by source
+logtail-cli stream --target nginx1:9090,nginx2:9090
 ```

-The `stream` subcommand emits one JSON object per line (NDJSON) and runs until interrupted.
-Exit code is non-zero on any gRPC error.
+The `stream` subcommand reconnects automatically after errors (5 s backoff) and runs until
+interrupted with Ctrl-C. The `topn` and `trend` subcommands exit immediately after one response.

 ---