# nginx-logtail User Guide

## Overview

nginx-logtail is a four-component system for real-time traffic analysis across a cluster of nginx machines. It answers questions like:

- Which client prefix is causing the most HTTP 429s right now?
- Which website is getting the most 503s over the last 24 hours?
- Which nginx machine is the busiest?
- Is there a DDoS in progress, and from where?

Components:

| Binary        | Runs on          | Role                                               |
|---------------|------------------|----------------------------------------------------|
| `collector`   | each nginx host  | Tails log files and/or UDP datagrams, aggregates in memory, serves gRPC |
| `aggregator`  | central host     | Merges all collectors, serves unified gRPC         |
| `frontend`    | central host     | HTTP dashboard with drilldown UI                   |
| `cli`         | operator laptop  | Shell queries against collector or aggregator      |

Every binary accepts `-version` (or `nginx-logtail version` for the CLI) and prints its version, git commit, and build date.

---

## Installation

Three flavors. `make help` lists every target; `make install-deps` sets up a fresh build box (apt deps, Go toolchain, `protoc-gen-go`, `golangci-lint`).
### Debian package

```bash
make pkg-deb                              # produces nginx-logtail__{amd64,arm64}.deb
sudo dpkg -i nginx-logtail_*_amd64.deb
```

The package installs:

| Path                                                          | Contents                                          |
|---------------------------------------------------------------|---------------------------------------------------|
| `/usr/sbin/nginx-logtail-{collector,aggregator,frontend}`     | Service binaries                                  |
| `/usr/bin/nginx-logtail`                                      | CLI                                               |
| `/lib/systemd/system/nginx-logtail-*.service`                 | Three systemd units                               |
| `/usr/share/man/man8/nginx-logtail.8.gz`                      | Manpage (`man 8 nginx-logtail`)                   |
| `/usr/share/nginx-logtail/default.template`                   | Defaults template                                 |
| `/etc/default/nginx-logtail`                                  | **Generated on first install** from the template  |

The postinst creates a system user/group `_logtail` if absent and renders the template into `/etc/default/nginx-logtail` with the short hostname substituted. **None of the services are enabled or started automatically** — installing the package is safe on any host. Operators opt in per service:

```bash
sudo systemctl enable --now nginx-logtail-collector.service    # on each nginx host
sudo systemctl enable --now nginx-logtail-aggregator.service   # on the central host
sudo systemctl enable --now nginx-logtail-frontend.service     # on the central host
```

The collector runs as `_logtail:www-data` so it can read nginx access logs that are group-readable by `www-data`; aggregator and frontend run as `_logtail:_logtail`.

### Docker / Docker Compose

The repo's `docker-compose.yml` runs the aggregator and frontend together from a single image that contains all four binaries.

```bash
make docker          # builds git.ipng.ch/ipng/nginx-logtail:v + :latest, native arch
make docker-push     # multi-arch (amd64+arm64) buildx push
AGGREGATOR_COLLECTORS=nginx1:9090,nginx2:9090 docker compose up -d
# frontend on :8080, aggregator gRPC on :9091
```

Each container explicitly selects its binary via `command: ["/usr/local/bin/"]`.
### From source

```bash
git clone https://git.ipng.ch/ipng/nginx-logtail
cd nginx-logtail
make build           # -> build//{collector,aggregator,frontend,cli}
make test
./build/*/cli version
```

Requires Go ≥ 1.24 (see `go.mod`). No CGO, no external runtime dependencies.

---

## Configuration

### /etc/default/nginx-logtail

The Debian package ships one shared environment file read by all three systemd units via `EnvironmentFile=-/etc/default/nginx-logtail`. It enumerates every flag the three daemons accept as a `COLLECTOR_*`, `AGGREGATOR_*`, or `FRONTEND_*` env var. Defaults on first install are sensible for a single-host deployment:

| Variable                   | First-install default        | Purpose                                           |
|----------------------------|------------------------------|---------------------------------------------------|
| `COLLECTOR_LISTEN`         | `:9090`                      | gRPC listen address                               |
| `COLLECTOR_PROM_LISTEN`    | `:9100`                      | Prometheus metrics; set `""` to disable           |
| `COLLECTOR_LOGS`           | *(empty — UDP-only)*         | Comma-sep log paths/globs                         |
| `COLLECTOR_LOGS_FILE`      | *(empty)*                    | File with one path/glob per line                  |
| `COLLECTOR_SOURCE`         | `$(hostname -s)` at install  | Display name in query responses                   |
| `COLLECTOR_V4PREFIX`       | `24`                         | IPv4 bucket prefix                                |
| `COLLECTOR_V6PREFIX`       | `48`                         | IPv6 bucket prefix                                |
| `COLLECTOR_SCAN_INTERVAL`  | `10s`                        | Log-glob rescan cadence                           |
| `COLLECTOR_LOGTAIL_PORT`   | `9514`                       | UDP port for `ipng_stats_logtail` (0 disables)    |
| `COLLECTOR_LOGTAIL_BIND`   | `127.0.0.1`                  | UDP bind address                                  |
| `AGGREGATOR_LISTEN`        | `:9091`                      | gRPC listen address                               |
| `AGGREGATOR_COLLECTORS`    | `localhost:9090`             | Comma-sep collectors (mandatory)                  |
| `AGGREGATOR_SOURCE`        | `$(hostname -s)` at install  | Display name                                      |
| `FRONTEND_LISTEN`          | `:8080`                      | HTTP dashboard address                            |
| `FRONTEND_TARGET`          | `localhost:9091`             | Default gRPC endpoint                             |
| `FRONTEND_N`               | `25`                         | Default table row count                           |
| `FRONTEND_REFRESH`         | `30`                         | Meta-refresh seconds; `0` disables                |

At least one of `COLLECTOR_LOGS`, `COLLECTOR_LOGS_FILE`, or `COLLECTOR_LOGTAIL_PORT > 0` must be set, otherwise the collector refuses to start. The shipped default (`COLLECTOR_LOGS=` empty plus `COLLECTOR_LOGTAIL_PORT=9514`) makes the collector UDP-only — no file tailer goroutine is launched when no log patterns are supplied.

Three escape-hatch variables — `COLLECTOR_ARGS`, `AGGREGATOR_ARGS`, `FRONTEND_ARGS` — are appended verbatim to each unit's `ExecStart` argv. Use them for flags without an env-var form, or for temporary overrides, without editing the unit.

The file is **not a dpkg conffile**: postinst writes it only when absent, so operator edits survive upgrades, and `dpkg --purge` removes it.

### nginx — log format

Both ingest paths (file and UDP) use the same versioned tab-separated format. Every line MUST begin with a literal `v1\t` or `v2\t` prefix; lines without a recognised prefix are dropped. Two versions are defined; you can mix them across a fleet during a rollout (the collector parses both).

#### v2 (recommended)

v2 carries four operationally important fields v1 lacks: `$bytes_sent` (full wire bytes, replaces `$body_bytes_sent`), `$request_length` (request size including headers), `$upstream_response_time`, and `$upstream_status`. Together they let dashboards split end-to-end latency into upstream vs. nginx overhead, attribute errors to the upstream vs. the edge, and report ingress bandwidth.
```nginx
http {
    log_format ipng_stats_logtail
        'v2\t$host\t$remote_addr\t$request_method\t$request_uri\t$status\t'
        '$bytes_sent\t$request_length\t$request_time\t$upstream_response_time\t$upstream_status\t'
        '$is_tor\t$asn\t$ipng_source_tag\t$server_addr\t$scheme';

    # File ingest:
    server {
        access_log /var/log/nginx/access.log ipng_stats_logtail;
    }

    # UDP ingest (nginx-ipng-stats-plugin):
    ipng_stats_logtail ipng_stats_logtail udp://127.0.0.1:9514 buffer=64k flush=1s;
}
```

| # | Field                     | Ingested into                                     |
|---|---------------------------|---------------------------------------------------|
| 0 | `v2`                      | version tag                                       |
| 1 | `$host`                   | `website`                                         |
| 2 | `$remote_addr`            | `client_prefix` (truncated)                       |
| 3 | `$request_method`         | Prom `method` label                               |
| 4 | `$request_uri`            | `http_request_uri` (query stripped)               |
| 5 | `$status`                 | `http_response`                                   |
| 6 | `$bytes_sent`             | Prom `nginx_http_bytes_sent`                      |
| 7 | `$request_length`         | Prom `nginx_http_request_bytes`                   |
| 8 | `$request_time`           | Prom `nginx_http_request_duration_seconds`        |
| 9 | `$upstream_response_time` | Prom `nginx_http_upstream_duration_seconds`       |
| 10| `$upstream_status`        | Prom `nginx_http_upstream_requests_total`         |
| 11| `$is_tor`                 | `is_tor`                                          |
| 12| `$asn`                    | `asn`                                             |
| 13| `$ipng_source_tag`        | `source_tag`                                      |
| 14| `$server_addr`            | *(parsed and discarded)*                          |
| 15| `$scheme`                 | *(parsed and discarded)*                          |

For requests served without an upstream (static files, redirects, errors), nginx emits literal `-` for `$upstream_response_time` and `$upstream_status`; the parser treats those as "no upstream" and skips the upstream metrics rather than counting them as zeros. When nginx retries across multiple upstreams, both fields are comma-separated and the parser keeps the last value (the upstream that ultimately served the response).

#### v1 (legacy)

v1 is preserved unchanged so existing emitters can be upgraded after the collector.
Layout:

```nginx
log_format ipng_stats_logtail
    'v1\t$host\t$remote_addr\t$request_method\t$request_uri\t$status\t'
    '$body_bytes_sent\t$request_time\t$is_tor\t$asn\t$ipng_source_tag\t$server_addr\t$scheme';
```

12 tab-separated payload fields after the `v1` prefix. v1 fills `nginx_http_bytes_sent` from `$body_bytes_sent`; v2 fills it from `$bytes_sent`. Operators will see a small step up in that metric (header overhead, typically a few hundred bytes per response) when emitters move to v2.

#### Required values

`$is_tor` is `1` if the client IP is a TOR exit node and `0` otherwise (typically populated via a Lua script or `$geoip2_data_*`). `$asn` is the client AS number as a decimal integer (e.g. MaxMind GeoIP2's `$geoip2_data_autonomous_system_number`). Operators without TOR or GeoIP data MUST emit literal `0` for both — a literal `0` in `$is_tor` parses as `false`; a literal `0` in `$asn` is ASN `0`, filterable at query time with `--asn '!=0'`.

`$ipng_source_tag` is provided by [`nginx-ipng-stats-plugin`](https://git.ipng.ch/ipng/nginx-ipng-stats-plugin). Operators not running the plugin SHOULD declare a constant via `set $ipng_source_tag direct;` in their `server` block — there is no synthesised fallback in the collector.

#### Pointing the collector at logs

For file ingest, set `COLLECTOR_LOGS` to comma-separated paths or glob patterns. Make sure the files are group-readable by `www-data` (the collector's primary group in the systemd unit).

For UDP ingest, the plugin's `ipng_stats_logtail udp://127.0.0.1:9514` line above is sufficient. Both paths can feed the same collector simultaneously and converge on the same aggregation pipeline.

Malformed lines (wrong version, wrong field count, bad IP) are silently dropped; for UDP they show up as `logtail_udp_packets_received_total` minus `logtail_udp_loglines_success_total`.

---

## Collector

Runs on each nginx machine.
Ingests logs from files (via `fsnotify`) and/or UDP datagrams (from `nginx-ipng-stats-plugin`), maintains in-memory top-K counters across six time windows, and exposes a gRPC interface for the aggregator (and directly for the CLI).

### Flags

| Flag              | Default       | Description                                                       |
|-------------------|---------------|-------------------------------------------------------------------|
| `--listen`        | `:9090`       | gRPC listen address                                               |
| `--prom-listen`   | `:9100`       | Prometheus metrics address; empty string to disable               |
| `--logs`          | —             | Comma-separated log file paths or glob patterns                   |
| `--logs-file`     | —             | File containing one log path/glob per line                        |
| `--source`        | hostname      | Name for this collector in query responses                        |
| `--v4prefix`      | `24`          | IPv4 prefix length for client bucketing                           |
| `--v6prefix`      | `48`          | IPv6 prefix length for client bucketing                           |
| `--scan-interval` | `10s`         | How often to rescan glob patterns for new/removed files           |
| `--logtail-port`  | `0` (off)     | UDP port receiving `ipng_stats_logtail` datagrams                 |
| `--logtail-bind`  | `127.0.0.1`   | UDP bind address                                                  |
| `--version`       | —             | Print version, commit, build date and exit                        |

At least one of `--logs`, `--logs-file`, or `--logtail-port > 0` is required; otherwise the collector refuses to start.

### Examples

```bash
# UDP-only (nginx-ipng-stats-plugin feed)
./collector --logtail-port 9514

# Single file
./collector --logs /var/log/nginx/access.log

# Multiple files via glob (one inotify instance regardless of count)
./collector --logs "/var/log/nginx/*/access.log"

# Files and UDP at the same time
./collector --logs "/var/log/nginx/*.log" --logtail-port 9514

# Many files via a config file
./collector --logs-file /etc/nginx-logtail/logs.conf

# Custom prefix lengths and listen address
./collector \
  --logs "/var/log/nginx/*.log" \
  --listen :9091 \
  --source nginx3.prod \
  --v4prefix 24 \
  --v6prefix 48
```

### logs-file format

One path or glob pattern per line. Lines starting with `#` are ignored.

```
# /etc/nginx-logtail/logs.conf
/var/log/nginx/access.log
/var/log/nginx/*/access.log
/var/log/nginx/api.example.com.access.log
```

### Log rotation

The collector handles logrotate automatically. On `RENAME`/`REMOVE` events it drains the old file descriptor to EOF (so no lines are lost), then retries opening the original path with backoff until the new file appears. No restart or SIGHUP required.

### Prometheus metrics

The collector exposes a Prometheus-compatible `/metrics` endpoint on `--prom-listen` (default `:9100`). Set `--prom-listen ""` to disable it entirely.

**Per-{host,source_tag} series** (both v1 and v2):

- `nginx_http_requests_total{host, method, status}` — counter. Map capped at 250 000 distinct label sets; new entries beyond the cap are dropped until the map is rolled over.
- `nginx_http_bytes_sent_{bucket,count,sum}{host, source_tag, le}` — histogram of response size. v1 fills from `$body_bytes_sent`; v2 fills from `$bytes_sent`. Buckets (bytes): `256, 1024, 4096, 16384, 65536, 262144, 1048576, +Inf`.
- `nginx_http_request_duration_seconds_{bucket,count,sum}{host, source_tag, le}` — histogram of `$request_time`. Buckets (seconds): `0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10, +Inf`.

**v2-only series** (populated only when v2 emitters are running, and the upstream histograms only when nginx involved an upstream):

- `nginx_http_request_bytes_{bucket,count,sum}{host, source_tag, le}` — histogram of `$request_length` (ingress, headers + body). Same byte buckets as `bytes_sent`.
- `nginx_http_upstream_duration_seconds_{bucket,count,sum}{host, source_tag, le}` — histogram of `$upstream_response_time`. Same time buckets as `request_duration`. Lets you split end-to-end latency into upstream vs. nginx overhead.
- `nginx_http_upstream_requests_total{host, source_tag, status_class}` — counter incremented once per upstream-served request, classed by `$upstream_status` (`2xx`/`3xx`/`4xx`/`5xx`/`other`). Lets you spot upstream errors masked at the edge (e.g. nginx 502 because origin 504).

**Source-tag rollup** (fleet-wide attribution health, intentionally not crossed with host):

- `nginx_http_requests_by_source_total{source_tag, status_class}` — counter classed by `$status`. Use it to spot per-source error spikes without exploding cardinality.

**UDP ingest counters** — lets operators distinguish parse failures from back-pressure drops:

- `logtail_udp_packets_received_total` — datagrams read off the socket.
- `logtail_udp_loglines_success_total` — log lines parsed OK.
- `logtail_udp_loglines_consumed_total` — log lines forwarded to the store (not dropped).

Note the unit mismatch: `packets_*` counts datagrams, `loglines_*` counts log lines. The nginx plugin batches many log lines into a single UDP datagram (default `buffer=64k flush=1s`), so `loglines_success ≫ packets_received` is normal — operators should see roughly `loglines_success / packets_received ≈ avg lines per batch`. `loglines_success - loglines_consumed` is the back-pressure drop rate (channel full). A large gap between `packets_received * expected_lines_per_packet` and `loglines_success` indicates parse failures.

**Prometheus scrape config:**

```yaml
scrape_configs:
  - job_name: nginx_logtail
    static_configs:
      - targets:
          - nginx1:9100
          - nginx2:9100
          - nginx3:9100
```

Or with service discovery — the collector has no special requirements beyond a reachable TCP port.
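
The arithmetic behind the UDP ingest counters can be sketched in a few lines of Go. This is only an illustration: the `udpHealth` type and `deriveUDPHealth` function are hypothetical names, not part of nginx-logtail, and the inputs are assumed to be counter values (or deltas over one scrape interval) that you have already read from `/metrics`.

```go
package main

import "fmt"

// udpHealth holds two values derived from the UDP ingest counters.
// The type and its field names are illustrative assumptions.
type udpHealth struct {
	LinesPerPacket float64 // loglines_success / packets_received
	Dropped        uint64  // loglines_success - loglines_consumed
}

// deriveUDPHealth applies the two formulas from the guide to raw counter
// values. It guards against division by zero and counter underflow.
func deriveUDPHealth(packets, success, consumed uint64) udpHealth {
	var h udpHealth
	if packets > 0 {
		h.LinesPerPacket = float64(success) / float64(packets)
	}
	if success > consumed {
		h.Dropped = success - consumed
	}
	return h
}

func main() {
	// Made-up sample values: 1000 datagrams, 42000 parsed lines,
	// 41500 of which reached the store.
	h := deriveUDPHealth(1000, 42000, 41500)
	fmt.Printf("avg lines per packet: %.1f, back-pressure drops: %d\n",
		h.LinesPerPacket, h.Dropped)
}
```

A sudden fall in lines per packet at a steady request rate would hint at parse failures, while a growing `Dropped` value points at back-pressure.
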

**Example queries:**

```promql
# Request rate per host over last 5 minutes
sum by (host) (rate(nginx_http_requests_total[5m]))

# 5xx error rate fraction per host
sum by (host) (rate(nginx_http_requests_total{status=~"5.."}[5m]))
  / sum by (host) (rate(nginx_http_requests_total[5m]))

# 95th percentile response time per host
histogram_quantile(0.95,
  sum by (host, le) (rate(nginx_http_request_duration_seconds_bucket[5m]))
)

# 95th percentile response time per source_tag (drill in further as needed)
histogram_quantile(0.95,
  sum by (source_tag, le) (rate(nginx_http_request_duration_seconds_bucket[5m]))
)

# Median response size per host
histogram_quantile(0.50,
  sum by (host, le) (rate(nginx_http_bytes_sent_bucket[5m]))
)

# v2-only: upstream P95, split out from nginx overhead
histogram_quantile(0.95,
  sum by (host, le) (rate(nginx_http_upstream_duration_seconds_bucket[5m]))
)

# v2-only: upstream 5xx rate per source_tag
sum by (source_tag) (rate(nginx_http_upstream_requests_total{status_class="5xx"}[5m]))
```

### Memory usage

The collector is designed to stay well under 1 GB:

| Structure                   | Max entries | Approx size |
|-----------------------------|-------------|-------------|
| Live map (current minute)   | 100 000     | ~19 MB      |
| Fine ring (60 × 1-min)      | 60 × 50 000 | ~558 MB     |
| Coarse ring (288 × 5-min)   | 288 × 5 000 | ~268 MB     |
| **Total**                   |             | **~845 MB** |

When the live map reaches 100 000 distinct 6-tuples, new keys are dropped for the rest of that minute. Existing keys continue to accumulate counts. The cap resets at each minute rotation.

### Time windows

Data is served from two tiered ring buffers:

| Window | Source ring | Resolution |
|--------|-------------|------------|
| 1 min  | fine        | 1 minute   |
| 5 min  | fine        | 1 minute   |
| 15 min | fine        | 1 minute   |
| 60 min | fine        | 1 minute   |
| 6 h    | coarse      | 5 minutes  |
| 24 h   | coarse      | 5 minutes  |

History is lost on restart — the collector resumes tailing immediately but all ring buffers start empty.
The fine ring fills in 1 hour; the coarse ring fills in 24 hours.

### Running under systemd

The Debian package ships `nginx-logtail-collector.service` ready to run under the `_logtail` system user with `Group=www-data` (for log-file access). Every flag comes from `/etc/default/nginx-logtail`. To operate it:

```bash
sudo $EDITOR /etc/default/nginx-logtail    # set COLLECTOR_LOGS / COLLECTOR_LOGTAIL_PORT
sudo systemctl enable --now nginx-logtail-collector.service
sudo systemctl status nginx-logtail-collector.service
sudo journalctl -u nginx-logtail-collector.service -f
```

If you run from source without the package, compose a unit from the packaged template at `debian/nginx-logtail-collector.service`.

---

## Aggregator

Runs on a central machine. Subscribes to the `StreamSnapshots` push stream from every configured collector, merges their snapshots into a unified in-memory cache, and serves the same gRPC interface as the collector. The frontend and CLI query the aggregator exactly as they would query a single collector.

### Flags

| Flag           | Default   | Description                                            |
|----------------|-----------|--------------------------------------------------------|
| `--listen`     | `:9091`   | gRPC listen address                                    |
| `--collectors` | —         | Comma-separated `host:port` addresses of collectors    |
| `--source`     | hostname  | Name for this aggregator in query responses            |

`--collectors` is required; the aggregator exits immediately if it is not set.

### Example

```bash
./aggregator \
  --collectors nginx1:9090,nginx2:9090,nginx3:9090 \
  --listen :9091 \
  --source agg-prod
```

### Fault tolerance

The aggregator reconnects to each collector independently with exponential backoff (100 ms → doubles → cap 30 s). After 3 consecutive failures to a collector it marks that collector **degraded**: its last-known contribution is subtracted from the merged view so stale counts do not accumulate. When the collector recovers and sends a new snapshot, it is automatically reintegrated.
The remaining collectors continue serving queries throughout.

### Memory

The aggregator's merged cache uses the same tiered ring-buffer structure as the collector (60 × 1-min fine, 288 × 5-min coarse) but holds at most top-50 000 entries per fine bucket and top-5 000 per coarse bucket across all collectors combined. Memory footprint is roughly the same as one collector (~845 MB worst case).

### Systemd unit example

```ini
[Unit]
Description=nginx-logtail aggregator
After=network.target

[Service]
ExecStart=/usr/local/bin/aggregator \
  --collectors nginx1:9090,nginx2:9090,nginx3:9090 \
  --listen :9091 \
  --source %H
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
```

---

## Frontend

HTTP dashboard. Connects to the aggregator (or directly to a single collector for debugging). Zero JavaScript — server-rendered HTML with inline SVG sparklines.

### Flags

| Flag        | Default           | Description                                      |
|-------------|-------------------|--------------------------------------------------|
| `--listen`  | `:8080`           | HTTP listen address                              |
| `--target`  | `localhost:9091`  | Default gRPC endpoint (aggregator or collector)  |
| `--n`       | `25`              | Default number of table rows                     |
| `--refresh` | `30`              | Auto-refresh interval in seconds; `0` to disable |

### Usage

Navigate to `http://your-host:8080`. The dashboard shows a ranked table of the top entries for the selected dimension and time window.

**Window tabs** — switch between `1m / 5m / 15m / 60m / 6h / 24h`. Only the window changes; all active filters are preserved.

**Dimension tabs** — switch between grouping by `website / asn / prefix / status / uri / source`.

**Drilldown** — click any table row to add that value as a filter and advance to the next dimension in the hierarchy:

```
website → client prefix → request URI → HTTP status → ASN → source_tag → website (cycles)
```

Example: click `example.com` in the website view to see which client prefixes are hitting it; click a prefix there to see which URIs it is requesting; and so on.
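
The drilldown cycle amounts to a lookup of "next dimension, wrapping at the end". A minimal Go sketch of that idea follows; the `drillOrder` slice and `nextDimension` function are hypothetical, and the short dimension names are assumptions borrowed from the CLI's `--group-by` values rather than from the frontend source.

```go
package main

import "fmt"

// drillOrder mirrors the drilldown hierarchy described in the guide.
// The identifiers used for each dimension are assumptions.
var drillOrder = []string{"website", "prefix", "uri", "status", "asn", "source_tag"}

// nextDimension returns the dimension that follows cur, wrapping around
// to the first one after the last. Unknown input falls back to the first.
func nextDimension(cur string) string {
	for i, d := range drillOrder {
		if d == cur {
			return drillOrder[(i+1)%len(drillOrder)]
		}
	}
	return drillOrder[0]
}

func main() {
	fmt.Println(nextDimension("website"))    // prefix
	fmt.Println(nextDimension("source_tag")) // website
}
```
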

**Breadcrumb strip** — shows all active filters above the table. Click `×` next to any token to remove just that filter, keeping the others.

**Sparkline** — inline SVG trend chart showing total request count per time bucket for the current filter state. Useful for spotting sudden spikes or sustained DDoS ramps.

**Filter expression box** — a text input above the table accepts a mini filter language that lets you type expressions directly without editing the URL:

```
status>=400
status>=400 AND website~=gouda.*
status>=400 AND website~=gouda.* AND uri~="^/api/"
website=example.com AND prefix=1.2.3.0/24
```

Supported fields and operators:

| Field     | Operators           | Example                           |
|-----------|---------------------|-----------------------------------|
| `status`  | `=` `!=` `>` `>=` `<` `<=` | `status>=400`              |
| `website` | `=` `~=`            | `website~=gouda.*`                |
| `uri`     | `=` `~=`            | `uri~=^/api/`                     |
| `prefix`  | `=`                 | `prefix=1.2.3.0/24`               |
| `is_tor`  | `=` `!=`            | `is_tor=1`, `is_tor!=0`           |
| `asn`     | `=` `!=` `>` `>=` `<` `<=` | `asn=8298`, `asn>=1000`    |
| `source_tag` | `=`              | `source_tag=direct`, `source_tag=cdn` |

`is_tor=1` and `is_tor!=0` are equivalent (TOR traffic only). `is_tor=0` and `is_tor!=1` are equivalent (non-TOR traffic only).

`asn` accepts the same comparison expressions as `status`. Use `asn=8298` to match a single AS, `asn>=64512` to match everything from the start of the 16-bit private-use range (64512 to 65534) upward, which also includes all 32-bit ASNs, or `asn!=0` to exclude unresolved entries.

`~=` means RE2 regex match. Values with spaces or quotes may be wrapped in double or single quotes: `uri~="^/search\?q="`.

The box pre-fills with the current active filter (including filters set by drilldown clicks), so you can see and extend what is applied. Submitting redirects to a clean URL with the individual filter params; `× clear` removes all filters at once.

On a parse error the page re-renders with the error shown below the input and the current data and filters unchanged.
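
To make the comparison operators for `status` (and, by the same rule, `asn`) concrete, here is a minimal Go sketch of parsing and evaluating a single expression such as `>=400`. It is not the frontend's actual parser: `statusExpr`, `parseStatusExpr`, and `match` are hypothetical names, and treating a bare number as equality is an assumption based on the examples `200` and `!=200`.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// statusExpr is a parsed comparison such as ">=400".
// The type and function names are illustrative, not from nginx-logtail.
type statusExpr struct {
	op  string
	val int
}

// parseStatusExpr accepts "200", "=200", "!=200", ">=400", ">400", "<=499", "<500".
func parseStatusExpr(s string) (statusExpr, error) {
	s = strings.TrimSpace(s)
	// Two-character operators are tried first so ">=" is not read as ">".
	for _, op := range []string{"!=", ">=", "<=", ">", "<", "="} {
		if strings.HasPrefix(s, op) {
			n, err := strconv.Atoi(strings.TrimSpace(s[len(op):]))
			if err != nil {
				return statusExpr{}, fmt.Errorf("bad status value in %q: %w", s, err)
			}
			return statusExpr{op: op, val: n}, nil
		}
	}
	// No operator: assume a bare number means equality.
	n, err := strconv.Atoi(s)
	if err != nil {
		return statusExpr{}, fmt.Errorf("bad status expression %q: %w", s, err)
	}
	return statusExpr{op: "=", val: n}, nil
}

// match reports whether status satisfies the expression.
func (e statusExpr) match(status int) bool {
	switch e.op {
	case "!=":
		return status != e.val
	case ">=":
		return status >= e.val
	case "<=":
		return status <= e.val
	case ">":
		return status > e.val
	case "<":
		return status < e.val
	default:
		return status == e.val
	}
}

func main() {
	e, err := parseStatusExpr(">=400")
	if err != nil {
		panic(err)
	}
	fmt.Println(e.match(404), e.match(200)) // true false
}
```

Returning an error instead of guessing fits the behaviour described above, where a parse error leaves the current data and filters unchanged.
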

**Status expressions** — the `f_status` URL param (and `status` in the expression box) accepts comparison expressions: `200`, `!=200`, `>=400`, `<500`, etc.

**Regex filters** — `f_website_re` and `f_uri_re` URL params (and `~=` in the expression box) accept RE2 regular expressions. The breadcrumb strip shows them as `website~=gouda.*` and `uri~=^/api/` with the usual `×` remove link.

**URL sharing** — all filter state is in the URL query string (`w`, `by`, `f_website`, `f_prefix`, `f_uri`, `f_status`, `f_website_re`, `f_uri_re`, `f_is_tor`, `f_asn`, `f_source_tag`, `n`). Copy the URL to share an exact view with another operator, or bookmark a recurring query.

**JSON output** — append `&raw=1` to any URL to receive the TopN result as JSON instead of HTML. Useful for scripting without the CLI binary:

```bash
# All 429s by prefix
curl -s 'http://frontend:8080/?f_status=429&by=prefix&w=1m&raw=1' | jq '.entries[0]'

# All errors (>=400) on gouda hosts
curl -s 'http://frontend:8080/?f_status=%3E%3D400&f_website_re=gouda.*&by=uri&w=5m&raw=1'
```

**Target override** — append `?target=host:port` to point the frontend at a different gRPC endpoint for that request (useful for comparing a single collector against the aggregator):

```bash
http://frontend:8080/?target=nginx3:9090&w=5m
```

**Source picker** — when the frontend is pointed at an aggregator, a `source:` tab row appears below the dimension tabs listing each individual collector alongside an **all** tab (the default merged view). Clicking a collector tab switches the frontend to query that collector directly for the current request, letting you answer "which nginx machine is the busiest?" without leaving the dashboard. The picker is hidden when querying a collector directly (it has no sub-sources to list).

---

## CLI

A shell companion for one-off queries and debugging. Works with any `LogtailService` endpoint — collector or aggregator. Accepts multiple targets, fans out concurrently, and labels each result.
Default output is a human-readable table; add `--json` for machine-readable NDJSON.

### Subcommands

```
logtail-cli topn    [flags]   ranked label → count table
logtail-cli trend   [flags]   per-bucket time series
logtail-cli stream  [flags]   live snapshot feed (runs until Ctrl-C)
logtail-cli targets [flags]   list targets known to the queried endpoint
```

### Shared flags (all subcommands)

| Flag          | Default          | Description                                              |
|---------------|------------------|----------------------------------------------------------|
| `--target`    | `localhost:9090` | Comma-separated `host:port` list; queries fan out to all |
| `--json`      | false            | Emit newline-delimited JSON instead of a table           |
| `--website`   | —                | Filter to this website                                   |
| `--prefix`    | —                | Filter to this client prefix                             |
| `--uri`       | —                | Filter to this request URI                               |
| `--status`    | —                | Filter: HTTP status expression (`200`, `!=200`, `>=400`, `<500`, …) |
| `--website-re`| —                | Filter: RE2 regex against website                        |
| `--uri-re`    | —                | Filter: RE2 regex against request URI                    |
| `--is-tor`    | —                | Filter: `1` or `!=0` = TOR only; `0` or `!=1` = non-TOR only |
| `--asn`       | —                | Filter: ASN expression (`12345`, `!=65000`, `>=1000`, `<64512`, …) |
| `--source-tag`| —                | Filter: exact `ipng_source_tag` (e.g. `direct`, `cdn`)   |

### `topn` flags

| Flag          | Default    | Description                                                           |
|---------------|------------|-----------------------------------------------------------------------|
| `--n`         | `10`       | Number of entries                                                     |
| `--window`    | `5m`       | `1m` `5m` `15m` `60m` `6h` `24h`                                      |
| `--group-by`  | `website`  | `website` `prefix` `uri` `status` `asn` `source_tag`                  |

### `trend` flags

| Flag          | Default    | Description                                              |
|---------------|------------|----------------------------------------------------------|
| `--window`    | `5m`       | `1m` `5m` `15m` `60m` `6h` `24h`                         |

### Output format

**Table** (default — single target, no header):

```
RANK  COUNT    LABEL
   1  18 432   example.com
   2  4 211    other.com
```

**Multi-target** — each target gets a labeled section:

```
=== col-1 (nginx1:9090) ===
RANK  COUNT    LABEL
   1  10 000   example.com

=== agg-prod (agg:9091) ===
RANK  COUNT    LABEL
   1  18 432   example.com
```

**JSON** (`--json`) — a single JSON array with one object per target, suitable for `jq`:

```json
[{"source":"agg-prod","target":"agg:9091","entries":[{"label":"example.com","count":18432},...]}]
```

**`stream` JSON** — one object per snapshot received (NDJSON), runs until interrupted:

```json
{"ts":1773516180,"source":"col-1","target":"nginx1:9090","total_entries":823,"top_label":"example.com","top_count":10000}
```

### `targets` subcommand

Lists the targets (collectors) known to the queried endpoint. When querying an aggregator, returns all configured collectors with their display names and addresses. When querying a collector, returns the collector itself (address shown as `(self)`).

```bash
# List collectors behind the aggregator
logtail-cli targets --target agg:9091

# Machine-readable output
logtail-cli targets --target agg:9091 --json
```

Table output example:

```
nginx1.prod    nginx1:9090
nginx2.prod    nginx2:9090
nginx3.prod    (self)
```

JSON output (`--json`) — one object per target:

```json
{"query_target":"agg:9091","name":"nginx1.prod","addr":"nginx1:9090"}
```

### Examples

```bash
# Top 20 client prefixes sending 429s right now
logtail-cli topn --target agg:9091 --window 1m --group-by prefix --status 429 --n 20

# Same query, pipe to jq for scripting
logtail-cli topn --target agg:9091 --window 1m --group-by prefix --status 429 --n 20 \
  --json | jq '.[0].entries[0]'

# Which website has the most errors (4xx or 5xx) over the last 24h?
logtail-cli topn --target agg:9091 --window 24h --group-by website --status '>=400'

# Which client prefixes are NOT getting 200s? (anything non-success)
logtail-cli topn --target agg:9091 --window 5m --group-by prefix --status '!=200'

# Drill: top URIs on one website over the last 60 minutes
logtail-cli topn --target agg:9091 --window 60m --group-by uri --website api.example.com

# Filter by website regex: all gouda hosts
logtail-cli topn --target agg:9091 --window 5m --website-re 'gouda.*'

# Filter by URI regex: all /api/ paths
logtail-cli topn --target agg:9091 --window 5m --group-by uri --uri-re '^/api/'

# Show only TOR traffic — which websites are TOR clients hitting?
logtail-cli topn --target agg:9091 --window 5m --is-tor 1

# Show non-TOR traffic only — exclude exit nodes from the view
logtail-cli topn --target agg:9091 --window 5m --is-tor 0

# Top ASNs by request count over the last 5 minutes
logtail-cli topn --target agg:9091 --window 5m --group-by asn

# Which ASNs are generating the most 429s?
logtail-cli topn --target agg:9091 --window 5m --group-by asn --status 429

# Filter to traffic from a specific ASN
logtail-cli topn --target agg:9091 --window 5m --asn 8298

# Filter to ASNs 64512 and above (16-bit private-use range plus all 32-bit ASNs)
logtail-cli topn --target agg:9091 --window 5m --group-by prefix --asn '>=64512'

# Exclude unresolved entries (ASN 0) and show top source ASNs
logtail-cli topn --target agg:9091 --window 5m --group-by asn --asn '!=0'

# Compare two collectors side by side in one command
logtail-cli topn --target nginx1:9090,nginx2:9090 --window 5m

# Query both a collector and the aggregator at once
logtail-cli topn --target nginx3:9090,agg:9091 --window 5m --group-by prefix

# Trend of total traffic over 6h (for a quick sparkline in the terminal)
logtail-cli trend --target agg:9091 --window 6h --json | jq '.[0].points | [.[].count]'

# Watch live merged snapshots from the aggregator
logtail-cli stream --target agg:9091

# Watch two collectors simultaneously; each snapshot is labeled by source
logtail-cli stream --target nginx1:9090,nginx2:9090
```

The `stream` subcommand reconnects automatically after errors (5 s backoff) and runs until interrupted with Ctrl-C. The `topn` and `trend` subcommands exit immediately after one response.

---

## Operational notes

**No persistence.** All data is in-memory. A collector restart loses ring buffer history but resumes tailing the log file from the current position immediately.

**No TLS.** Designed for trusted internal networks. If you need encryption in transit, put a TLS-terminating proxy (e.g. stunnel, nginx stream) in front of the gRPC port.

**inotify limits.** The collector uses a single inotify instance regardless of how many files it tails.
If you tail files across many different directories, check `/proc/sys/fs/inotify/max_user_watches` (8192 on older kernels; newer kernels may derive a larger default from available memory) and increase it if needed:

```bash
echo 65536 | sudo tee /proc/sys/fs/inotify/max_user_watches
```

Writing to `/proc` changes the running kernel only, so the value is lost at reboot unless it is also set through sysctl configuration.

**High-cardinality attacks.** If a DDoS sends traffic from thousands of unique /24 prefixes with unique URIs, the live map will hit its 100 000 entry cap and drop new keys for the rest of that minute. The top-K entries already tracked continue accumulating counts. This is by design: the cap prevents memory exhaustion under attack conditions.

**Clock skew.** Trend sparklines are based on the collector's local clock. If collectors have significant clock skew, trend buckets from different collectors may not align precisely in the aggregator. NTP sync is recommended.
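
The cap behaviour described under **High-cardinality attacks** can be sketched as a counter map that refuses new keys once it is full but keeps counting keys it already tracks. The Go below is an illustration under stated assumptions, not the collector's implementation: `cappedCounter`, `add`, and `rotate` are hypothetical names, a plain string stands in for the real multi-field key, and locking is omitted for brevity.

```go
package main

import "fmt"

// cappedCounter counts observations per key, up to a fixed number of
// distinct keys. All names here are illustrative assumptions.
type cappedCounter struct {
	limit  int
	counts map[string]uint64
}

func newCappedCounter(limit int) *cappedCounter {
	return &cappedCounter{limit: limit, counts: make(map[string]uint64)}
}

// add increments key. It reports false when key is new and the cap has
// been reached, in which case the observation is dropped.
func (c *cappedCounter) add(key string) bool {
	if _, ok := c.counts[key]; !ok && len(c.counts) >= c.limit {
		return false
	}
	c.counts[key]++
	return true
}

// rotate hands back the finished window and starts an empty one,
// which also resets the cap.
func (c *cappedCounter) rotate() map[string]uint64 {
	old := c.counts
	c.counts = make(map[string]uint64)
	return old
}

func main() {
	c := newCappedCounter(2) // tiny cap so the effect is visible
	for _, k := range []string{"a", "b", "c", "a"} {
		fmt.Println(k, c.add(k))
	}
}
```

With a cap of 2, the third distinct key is rejected while the repeat of `a` is still counted, which is the same trade-off the live map makes at 100 000 entries.
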