PRE-RELEASE 0.9.1: Makefile, Debian packaging, versioned UDP

Build and release tooling:
- Makefile with help as default; targets: build/build-amd64/build-arm64,
  test, lint, proto, pkg-deb, docker, docker-push, clean, plus
  install-deps (+ three sub-targets for apt / Go toolchain / Go tools).
- internal/version package; -ldflags -X injects Version/Commit/Date into
  every binary. -version flag on all four binaries (nginx-logtail version
  for the CLI).
- Dockerfile takes VERSION/COMMIT/DATE build-args and forwards them.
- .deb output lands in build/; .gitignore ignores /build/.

Debian package:
- debian/build-deb.sh packages all four static binaries into a single
  nginx-logtail_<ver>_<arch>.deb using dpkg-deb.
- Binary layout: /usr/sbin/nginx-logtail-{collector,aggregator,frontend}
  and /usr/bin/nginx-logtail.
- nginx-logtail(8) manpage.
- Three systemd units (collector, aggregator, frontend) shipped under
  /lib/systemd/system/. Installed but never enabled or started — the
  operator opts in per host.
- Collector runs as _logtail:www-data (log access); aggregator and
  frontend as _logtail:_logtail. postinst creates the system user/group
  idempotently.
- Single shared env file /etc/default/nginx-logtail rendered from a
  template at first install with %HOSTNAME% substituted. Sensible
  defaults for every COLLECTOR_*, AGGREGATOR_*, FRONTEND_* variable;
  plus COLLECTOR_ARGS / AGGREGATOR_ARGS / FRONTEND_ARGS escape hatches
  appended to ExecStart. Not a dpkg conffile: operator edits survive
  upgrades and dpkg --purge removes it.

Versioned UDP wire format:
- ParseUDPLine dispatches on a leading "v<N>\t" tag; v1 routes to the
  existing 12-field parser. Unknown/missing versions fail closed so
  future v2 parsers can land before emitters are upgraded.
- Tests updated; design.md FR-2.2 rewritten to make the version tag
  normative.

Docs:
- README.md gains a Quick Start (Debian / Docker Compose / from source).
- user-guide.md rewritten around Installation and Configuration: full
  env-var table, UDP-only default explained, precise file/UDP log_format
  layouts, note that operators can emit "0" for unknown $is_tor / $asn.
- Drilldown cycle, frontend filter table, and CLI --group-by list all
  include source_tag. UDP counters documented in the Prometheus section.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 10:35:08 +02:00
parent 577ed3dad5
commit 143aad9063
23 changed files with 1214 additions and 114 deletions


@@ -14,16 +14,126 @@ Components:
| Binary | Runs on | Role |
|---------------|------------------|----------------------------------------------------|
| `collector` | each nginx host | Tails log files and/or UDP datagrams, aggregates in memory, serves gRPC |
| `aggregator` | central host | Merges all collectors, serves unified gRPC |
| `frontend` | central host | HTTP dashboard with drilldown UI |
| `cli` | operator laptop | Shell queries against collector or aggregator |

Every binary accepts `-version` and prints its version, git commit, and build date; the CLI
uses a `version` subcommand instead (`nginx-logtail version`).
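
The commit injects `Version`, `Commit`, and `Date` with `-ldflags -X`. The single-file sketch below shows the general Go pattern. It is not the project's code: the real variables live in the `internal/version` package, so the real `-X` path differs from the `main.Version` used here, and the `"dev"` / `"unknown"` defaults are assumptions.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// Stamped at link time, for example:
//
//	go build -ldflags "-X main.Version=0.9.1 -X main.Commit=$(git rev-parse --short HEAD) -X main.Date=$(date -u +%F)" .
//
// -X can only set package-level string variables, not constants.
// The defaults are what an unstamped development build reports (assumed values).
var (
	Version = "dev"
	Commit  = "unknown"
	Date    = "unknown"
)

// versionString formats the stamped values for the -version flag.
func versionString() string {
	return fmt.Sprintf("%s (commit %s, built %s)", Version, Commit, Date)
}

func main() {
	showVersion := flag.Bool("version", false, "print version, commit and build date, then exit")
	flag.Parse()
	if *showVersion {
		fmt.Println(versionString())
		os.Exit(0)
	}
	fmt.Println("starting", versionString()) // normal start-up would continue here
}
```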

---
## Installation

There are three installation flavors: the Debian package, Docker / Docker Compose, and a build
from source. `make help` lists every target; `make install-deps` sets up a fresh build box
(apt dependencies, Go toolchain, `protoc-gen-go`, `golangci-lint`).
### Debian package
```bash
make pkg-deb # writes build/nginx-logtail_<ver>_{amd64,arm64}.deb
sudo dpkg -i build/nginx-logtail_*_amd64.deb
```
The package installs:
| Path | Contents |
|---------------------------------------------------------------|---------------------------------------------------|
| `/usr/sbin/nginx-logtail-{collector,aggregator,frontend}` | Service binaries |
| `/usr/bin/nginx-logtail` | CLI |
| `/lib/systemd/system/nginx-logtail-*.service` | Three systemd units |
| `/usr/share/man/man8/nginx-logtail.8.gz` | Manpage (`man 8 nginx-logtail`) |
| `/usr/share/nginx-logtail/default.template` | Defaults template |
| `/etc/default/nginx-logtail` | **Generated on first install** from the template |

The postinst creates a system user/group `_logtail` if absent and renders the template into
`/etc/default/nginx-logtail` with the short hostname substituted. **None of the services are
enabled or started automatically**, so installing the package is safe on any host. Operators
opt in per service:
```bash
sudo systemctl enable --now nginx-logtail-collector.service # on each nginx host
sudo systemctl enable --now nginx-logtail-aggregator.service # on the central host
sudo systemctl enable --now nginx-logtail-frontend.service # on the central host
```
The collector runs as `_logtail:www-data` so it can read nginx access logs that are
group-readable by `www-data`; aggregator and frontend run as `_logtail:_logtail`.
### Docker / Docker Compose
The repo's `docker-compose.yml` runs the aggregator and frontend together from a single image
that contains all four binaries.
```bash
make docker # builds git.ipng.ch/ipng/nginx-logtail:v<ver> + :latest, native arch
make docker-push # multi-arch (amd64+arm64) buildx push
AGGREGATOR_COLLECTORS=nginx1:9090,nginx2:9090 docker compose up -d
# frontend on :8080, aggregator gRPC on :9091
```
Each container explicitly selects its binary via `command: ["/usr/local/bin/<binary>"]`.
### From source
```bash
git clone https://git.ipng.ch/ipng/nginx-logtail
cd nginx-logtail
make build # -> build/<arch>/{collector,aggregator,frontend,cli}
make test
./build/*/cli version
```
Requires Go ≥ 1.24 (see `go.mod`). No CGO, no external runtime dependencies.

---
## Configuration
### /etc/default/nginx-logtail
The Debian package ships one shared environment file read by all three systemd units via
`EnvironmentFile=-/etc/default/nginx-logtail`. It enumerates every flag the three daemons
accept as a `COLLECTOR_*`, `AGGREGATOR_*`, or `FRONTEND_*` env var. Defaults on first install
are sensible for a single-host deployment:
| Variable | First-install default | Purpose |
|----------------------------|------------------------------|---------------------------------------------------|
| `COLLECTOR_LISTEN` | `:9090` | gRPC listen address |
| `COLLECTOR_PROM_LISTEN` | `:9100` | Prometheus metrics; set `""` to disable |
| `COLLECTOR_LOGS` | *(empty — UDP-only)* | Comma-sep log paths/globs |
| `COLLECTOR_LOGS_FILE` | *(empty)* | File with one path/glob per line |
| `COLLECTOR_SOURCE` | `$(hostname -s)` at install | Display name in query responses |
| `COLLECTOR_V4PREFIX` | `24` | IPv4 bucket prefix |
| `COLLECTOR_V6PREFIX` | `48` | IPv6 bucket prefix |
| `COLLECTOR_SCAN_INTERVAL` | `10s` | Log-glob rescan cadence |
| `COLLECTOR_LOGTAIL_PORT` | `9514` | UDP port for `ipng_stats_logtail` (0 disables) |
| `COLLECTOR_LOGTAIL_BIND` | `127.0.0.1` | UDP bind address |
| `AGGREGATOR_LISTEN` | `:9091` | gRPC listen address |
| `AGGREGATOR_COLLECTORS` | `localhost:9090` | Comma-sep collectors (mandatory) |
| `AGGREGATOR_SOURCE` | `$(hostname -s)` at install | Display name |
| `FRONTEND_LISTEN` | `:8080` | HTTP dashboard address |
| `FRONTEND_TARGET` | `localhost:9091` | Default gRPC endpoint |
| `FRONTEND_N` | `25` | Default table row count |
| `FRONTEND_REFRESH` | `30` | Meta-refresh seconds; `0` disables |

At least one of `COLLECTOR_LOGS`, `COLLECTOR_LOGS_FILE`, or `COLLECTOR_LOGTAIL_PORT > 0` must
be set; otherwise the collector refuses to start. The shipped default (empty `COLLECTOR_LOGS`
plus `COLLECTOR_LOGTAIL_PORT=9514`) makes the collector UDP-only: no file-tailer goroutine is
launched when no log patterns are supplied.
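
The start-up rule can be expressed as a small check. This is a sketch, not the collector's source: `collectorConfig`, `validateSources`, and `wantsFileTailer` are hypothetical names, and only the rule itself comes from this guide (at least one ingest source, and no file tailer without log patterns).

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// collectorConfig mirrors the three ingest-related settings (hypothetical type).
type collectorConfig struct {
	Logs        string // COLLECTOR_LOGS / --logs: comma-separated paths or globs
	LogsFile    string // COLLECTOR_LOGS_FILE / --logs-file
	LogtailPort int    // COLLECTOR_LOGTAIL_PORT / --logtail-port; 0 disables UDP
}

var errNoSource = errors.New("no ingest source: set --logs, --logs-file, or --logtail-port > 0")

// validateSources rejects a configuration with no ingest source at all.
func validateSources(c collectorConfig) error {
	if c.Logs == "" && c.LogsFile == "" && c.LogtailPort <= 0 {
		return errNoSource
	}
	return nil
}

// wantsFileTailer reports whether a file tailer should be started;
// the UDP-only default (no log patterns) skips it.
func wantsFileTailer(c collectorConfig) bool {
	return c.Logs != "" || c.LogsFile != ""
}

func main() {
	cfg := collectorConfig{LogtailPort: 9514} // the shipped Debian default
	if err := validateSources(cfg); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("file tailer:", wantsFileTailer(cfg)) // prints "file tailer: false"
}
```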

Three escape-hatch variables, `COLLECTOR_ARGS`, `AGGREGATOR_ARGS`, and `FRONTEND_ARGS`, are
appended verbatim to the `ExecStart` argv of the matching unit. Use them for flags without an
env-var form, or for temporary overrides, without editing the unit.

The file is **not a dpkg conffile**: postinst writes it only when absent, so operator edits
survive upgrades, and `dpkg --purge` removes it.
### nginx — file-based ingest
Add the `logtail` format and attach it to whichever `server` blocks you want tracked:
```nginx
http {
@@ -37,64 +147,128 @@ http {
}
```
The format is tab-separated with a fixed field order and ten fields. The collector strips the
query string from the URI at ingest time, so only the path is tracked. The precise layout:

| # | Field | Ingested into |
|---|-------------------|--------------------------|
| 0 | `$host` | `website` |
| 1 | `$remote_addr` | `client_prefix` (truncated) |
| 2 | `$msec` | *(discarded)* |
| 3 | `$request_method` | Prom `method` label |
| 4 | `$request_uri` | `http_request_uri` (query stripped) |
| 5 | `$status` | `http_response` |
| 6 | `$body_bytes_sent`| Prom body histogram |
| 7 | `$request_time` | Prom duration histogram |
| 8 | `$is_tor` | `is_tor` (optional) |
| 9 | `$asn` | `asn` (optional) |

`$is_tor` is `1` if the client IP is a TOR exit node and `0` otherwise (typically populated
via a Lua script or `$geoip2_data_*`). `$asn` is the client AS number as a decimal integer
(e.g. MaxMind GeoIP2's `$geoip2_data_autonomous_system_number`).

**If either is unknown, emit `0`.** A literal `0` in `$is_tor` parses as `false`; a literal
`0` in `$asn` parses as ASN `0`, which you can exclude at query time with `--asn '!=0'` in the
CLI or the `asn!=0` filter expression in the frontend. Operators who don't have TOR or GeoIP
data can simply emit `0` for both columns and everything works.

Both fields are also **positionally optional** for backward compatibility: older 8-field
lines are accepted and default to `false` / `0`. Records from the file tailer are always
tagged `source_tag="direct"`.

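
To make the field positions and the optional trailing columns concrete, here is a sketch of a parser for this layout. It illustrates the rules above under assumptions and is not the collector's parser: `fileRecord` and `parseFileLine` are hypothetical, only the value `1` is treated as TOR, and prefix truncation and Prometheus bookkeeping are left out.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// fileRecord holds the fields kept from one file-format line (hypothetical type).
type fileRecord struct {
	Website, ClientIP, Method, URI, Status, SourceTag string
	IsTor                                             bool
	ASN                                               uint32
}

// parseFileLine accepts 8 mandatory fields plus the positionally optional
// $is_tor (index 8) and $asn (index 9); missing ones default to false / 0.
func parseFileLine(line string) (fileRecord, error) {
	f := strings.Split(line, "\t")
	if len(f) < 8 || len(f) > 10 {
		return fileRecord{}, fmt.Errorf("want 8-10 fields, got %d", len(f))
	}
	uri, _, _ := strings.Cut(f[4], "?") // keep the path, drop the query string
	r := fileRecord{
		Website: f[0], ClientIP: f[1], Method: f[3], URI: uri, Status: f[5],
		SourceTag: "direct", // file-tailer records are always tagged "direct"
	}
	if len(f) >= 9 {
		r.IsTor = f[8] == "1"
	}
	if len(f) == 10 {
		asn, err := strconv.ParseUint(f[9], 10, 32)
		if err != nil {
			return fileRecord{}, fmt.Errorf("bad asn %q: %w", f[9], err)
		}
		r.ASN = uint32(asn)
	}
	return r, nil
}

func main() {
	line := "example.com\t192.0.2.7\t1713340508.123\tGET\t/search?q=x\t200\t5120\t0.042\t0\t8298"
	r, err := parseFileLine(line)
	fmt.Println(r.URI, r.ASN, r.IsTor, r.SourceTag, err) // /search 8298 false direct <nil>
}
```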
Then point the collector at the log files via `COLLECTOR_LOGS` — comma-separated paths or
glob patterns. Make sure the files are group-readable by `www-data` (the collector's primary
group in the systemd unit).
### nginx — UDP ingest (`nginx-ipng-stats-plugin`)
If the nginx host runs [`nginx-ipng-stats-plugin`](https://git.ipng.ch/ipng/nginx-ipng-stats-plugin),
the plugin's `ipng_stats_logtail` directive sends one UDP datagram per request directly to
the collector; no log file is involved. The wire format is **versioned**: every datagram starts
with a literal `v1\t` prefix, which the collector uses to route each packet to the matching
parser. New parser versions (v2, v3, …) can therefore ship in the collector before emitters
are upgraded.
```nginx
http {
log_format ipng_stats_logtail
'v1\t$host\t$remote_addr\t$request_method\t$request_uri\t$status\t$body_bytes_sent\t$request_time\t$is_tor\t$asn\t$ipng_source_tag\t$server_addr\t$scheme';
ipng_stats_logtail ipng_stats_logtail udp://127.0.0.1:9514 buffer=64k flush=1s;
}
```
Precise v1 layout — 13 tab-separated fields total (version prefix + 12 payload fields):
| # | Field | Ingested into |
|---|-------------------|------------------------------|
| 0 | `v1` | version tag |
| 1 | `$host` | `website` |
| 2 | `$remote_addr` | `client_prefix` (truncated) |
| 3 | `$request_method` | Prom `method` label |
| 4 | `$request_uri` | `http_request_uri` (query stripped) |
| 5 | `$status` | `http_response` |
| 6 | `$body_bytes_sent`| Prom body histogram |
| 7 | `$request_time` | Prom duration histogram |
| 8 | `$is_tor` | `is_tor` |
| 9 | `$asn` | `asn` |
| 10| `$ipng_source_tag`| `source_tag` |
| 11| `$server_addr` | *(parsed and discarded)* |
| 12| `$scheme` | *(parsed and discarded)* |

Compared to the file format, the version tag is added, `$msec` is dropped, and three fields are
appended: `$ipng_source_tag` (propagated into the data model as `source_tag`), plus
`$server_addr` and `$scheme` (parsed but reserved for future use).

**Unknown `$is_tor` / `$asn`: emit `0`.** This is the same convention as the file format:
operators without TOR or GeoIP data can emit `0` for both columns and everything works. A
literal `0` in `$is_tor` is `false`; a literal `0` in `$asn` is ASN `0`, filterable at query time.

All 13 fields are required for v1. Malformed packets (wrong version, wrong field count, bad IP)
are silently dropped; their count is `logtail_udp_packets_received_total` minus
`logtail_udp_loglines_success_total`. Both paths (file and UDP) can feed the same collector
simultaneously and converge on the same aggregation pipeline.

---
## Collector
Runs on each nginx machine. Ingests logs from files (via `fsnotify`) and/or UDP datagrams
(from `nginx-ipng-stats-plugin`), maintains in-memory top-K counters across six time
windows, and exposes a gRPC interface for the aggregator (and directly for the CLI).
### Flags
| Flag | Default | Description |
|-------------------|---------------|-------------------------------------------------------------------|
| `--listen` | `:9090` | gRPC listen address |
| `--prom-listen` | `:9100` | Prometheus metrics address; empty string to disable |
| `--logs` | — | Comma-separated log file paths or glob patterns |
| `--logs-file` | — | File containing one log path/glob per line |
| `--source` | hostname | Name for this collector in query responses |
| `--v4prefix` | `24` | IPv4 prefix length for client bucketing |
| `--v6prefix` | `48` | IPv6 prefix length for client bucketing |
| `--scan-interval` | `10s` | How often to rescan glob patterns for new/removed files |
| `--logtail-port` | `0` (off) | UDP port receiving `ipng_stats_logtail` datagrams |
| `--logtail-bind` | `127.0.0.1` | UDP bind address |
| `--version` | — | Print version, commit, build date and exit |

At least one of `--logs`, `--logs-file`, or `--logtail-port > 0` is required; otherwise the
collector refuses to start.
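
Client addresses are bucketed to `--v4prefix` / `--v6prefix` bits before they become `client_prefix`. One way to do that with Go's standard `net/netip` package is sketched below. `clientPrefix` is a hypothetical helper, and unmapping IPv4-mapped IPv6 addresses first is an assumption about how such inputs should be treated.

```go
package main

import (
	"fmt"
	"net/netip"
)

// clientPrefix truncates addr to v4bits (IPv4) or v6bits (IPv6) and returns
// the resulting network in CIDR form, e.g. "192.0.2.0/24".
func clientPrefix(addr string, v4bits, v6bits int) (string, error) {
	ip, err := netip.ParseAddr(addr)
	if err != nil {
		return "", err
	}
	ip = ip.Unmap() // treat ::ffff:192.0.2.7 as plain IPv4
	bits := v6bits
	if ip.Is4() {
		bits = v4bits
	}
	p, err := ip.Prefix(bits) // masks off the host bits
	if err != nil {
		return "", err
	}
	return p.String(), nil
}

func main() {
	for _, a := range []string{"192.0.2.77", "2001:db8:1234:5678::1"} {
		p, err := clientPrefix(a, 24, 48) // the documented defaults
		fmt.Println(p, err)
	}
}
```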
### Examples
```bash
# UDP-only (nginx-ipng-stats-plugin feed)
./collector --logtail-port 9514
# Single file
./collector --logs /var/log/nginx/access.log
# Multiple files via glob (one inotify instance regardless of count)
./collector --logs "/var/log/nginx/*/access.log"
# Files and UDP at the same time
./collector --logs "/var/log/nginx/*.log" --logtail-port 9514
# Many files via a config file
./collector --logs-file /etc/nginx-logtail/logs.conf
@@ -129,30 +303,30 @@ the new file appears. No restart or SIGHUP required.
The collector exposes a Prometheus-compatible `/metrics` endpoint on `--prom-listen` (default
`:9100`). Set `--prom-listen ""` to disable it entirely.

**Per-host series:**
- `nginx_http_requests_total{host, method, status}` — counter. Map capped at 250 000 distinct
label sets; new entries beyond the cap are dropped until the map is rolled over.
- `nginx_http_response_body_bytes_{bucket,count,sum}{host, le}` — histogram of
`$body_bytes_sent`. Buckets (bytes): `256, 1024, 4096, 16384, 65536, 262144, 1048576, +Inf`.
- `nginx_http_request_duration_seconds_{bucket,count,sum}{host, le}` — histogram of
`$request_time`. Buckets (seconds): `0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5,
10, +Inf`. Not split by `source_tag` (duration histogram stays per-host to avoid cardinality
blow-up).
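
The cap on `nginx_http_requests_total` bounds memory by refusing new label sets once the limit is reached, while existing series keep counting. The Go sketch below illustrates that behaviour. `cappedCounter` and its `Roll` method are hypothetical, this guide does not say when the real map is rolled over, and a real collector would also need locking for concurrent use.

```go
package main

import "fmt"

// labelKey is one {host, method, status} label set.
type labelKey struct{ Host, Method, Status string }

// cappedCounter counts requests per label set but refuses to create new
// series once maxSeries distinct keys exist. Not safe for concurrent use.
type cappedCounter struct {
	maxSeries int
	counts    map[labelKey]uint64
	dropped   uint64 // observations discarded because the cap was hit
}

func newCappedCounter(maxSeries int) *cappedCounter {
	return &cappedCounter{maxSeries: maxSeries, counts: make(map[labelKey]uint64)}
}

// Inc increments an existing series, or creates it if there is still room.
func (c *cappedCounter) Inc(k labelKey) {
	if _, ok := c.counts[k]; !ok && len(c.counts) >= c.maxSeries {
		c.dropped++
		return
	}
	c.counts[k]++
}

// Roll clears all series so that new label sets are admitted again.
func (c *cappedCounter) Roll() {
	c.counts = make(map[labelKey]uint64)
}

func main() {
	c := newCappedCounter(250_000) // the cap documented above
	c.Inc(labelKey{"example.com", "GET", "200"})
	fmt.Println(len(c.counts), c.dropped) // 1 0
}
```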

**Per-`source_tag` roll-ups** (parallel series, not a cross-product with `host`):
- `nginx_http_requests_by_source_total{source_tag}` — counter.
- `nginx_http_response_body_bytes_by_source_{bucket,count,sum}{source_tag, le}` — histogram.

**UDP ingest counters**, which let operators distinguish parse failures from back-pressure drops:
- `logtail_udp_packets_received_total` — datagrams read off the socket.
- `logtail_udp_loglines_success_total` — parsed OK.
- `logtail_udp_loglines_consumed_total` — forwarded to the store (not dropped).

`received - success` counts parse failures and `success - consumed` counts back-pressure drops.
Alert when either difference increases.
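
The counter arithmetic can be illustrated with two scrapes. In Prometheus itself you would compare `increase()` of the counters over a window; the Go sketch below does the same subtraction by hand. `udpCounters` and `deltas` are hypothetical names, and counter resets after a collector restart are deliberately not handled.

```go
package main

import "fmt"

// udpCounters holds one scrape of the three UDP ingest counters.
type udpCounters struct {
	Received, Success, Consumed float64
}

// deltas returns the parse failures and back-pressure drops that occurred
// between two scrapes of the same collector (no reset handling).
func deltas(prev, cur udpCounters) (parseFailures, backPressureDrops float64) {
	received := cur.Received - prev.Received
	success := cur.Success - prev.Success
	consumed := cur.Consumed - prev.Consumed
	return received - success, success - consumed
}

func main() {
	prev := udpCounters{Received: 1000, Success: 990, Consumed: 990}
	cur := udpCounters{Received: 2000, Success: 1985, Consumed: 1980}
	pf, bp := deltas(prev, cur)
	fmt.Println(pf, bp) // 5 5
}
```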
**Prometheus scrape config:**
@@ -221,25 +395,22 @@ Data is served from two tiered ring buffers:
History is lost on restart — the collector resumes tailing immediately but all ring buffers start
empty. The fine ring fills in 1 hour; the coarse ring fills in 24 hours.
### Running under systemd

The Debian package ships `nginx-logtail-collector.service` ready to run under the `_logtail`
system user with `Group=www-data` (for log-file access). All flags come from
`/etc/default/nginx-logtail`. To operate it:
```bash
sudo $EDITOR /etc/default/nginx-logtail # set COLLECTOR_LOGS / COLLECTOR_LOGTAIL_PORT
sudo systemctl enable --now nginx-logtail-collector.service
sudo systemctl status nginx-logtail-collector.service
sudo journalctl -u nginx-logtail-collector.service -f
```
If you run from source without the package, base your own unit on
`debian/nginx-logtail-collector.service` from the repository, adjusting paths as needed for
your installation.

---
## Aggregator
@@ -326,13 +497,13 @@ the selected dimension and time window.
**Window tabs** — switch between `1m / 5m / 15m / 60m / 6h / 24h`. Only the window changes;
all active filters are preserved.
**Dimension tabs** — switch between grouping by `website / asn / prefix / status / uri / source`.
**Drilldown** — click any table row to add that value as a filter and advance to the next
dimension in the hierarchy:
```
website → client prefix → request URI → HTTP status → ASN → source_tag → website (cycles)
```
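
The drilldown order is a fixed cycle, so moving to the next dimension is a lookup with wrap-around. The Go sketch below is only an illustration. The dimension names reuse the CLI `--group-by` values (the frontend's internal identifiers may differ), and restarting an unknown dimension at `website` is an assumption.

```go
package main

import "fmt"

// drilldownOrder is the documented drilldown cycle.
var drilldownOrder = []string{"website", "prefix", "uri", "status", "asn", "source_tag"}

// nextDimension returns the dimension after cur, wrapping from source_tag
// back to website. Unknown dimensions restart at website (assumption).
func nextDimension(cur string) string {
	for i, d := range drilldownOrder {
		if d == cur {
			return drilldownOrder[(i+1)%len(drilldownOrder)]
		}
	}
	return drilldownOrder[0]
}

func main() {
	d := "website"
	for i := 0; i < len(drilldownOrder); i++ {
		next := nextDimension(d)
		fmt.Printf("%s -> %s\n", d, next)
		d = next
	}
}
```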
Example: click `example.com` in the website view to see which client prefixes are hitting it;
@@ -364,6 +535,7 @@ Supported fields and operators:
| `prefix` | `=` | `prefix=1.2.3.0/24` |
| `is_tor` | `=` `!=` | `is_tor=1`, `is_tor!=0` |
| `asn` | `=` `!=` `>` `>=` `<` `<=` | `asn=8298`, `asn>=1000` |
| `source_tag` | `=` | `source_tag=direct`, `source_tag=cdn` |

`is_tor=1` and `is_tor!=0` are equivalent (TOR traffic only). `is_tor=0` and `is_tor!=1` are
equivalent (non-TOR traffic only).
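
ASN filters combine an optional comparison operator with a decimal number, as in the `asn` row above and the CLI's `--asn` expressions (`12345`, `!=65000`, `>=1000`, `<64512`). The Go sketch below parses the CLI form (operator followed by number). `asnFilter` and `parseASNFilter` are hypothetical, and the real grammar may accept more than this.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// asnFilter is a parsed ASN expression (hypothetical type).
type asnFilter struct {
	op  string
	val uint64
}

// parseASNFilter accepts an optional operator prefix followed by a decimal
// ASN; a bare number means "=". Two-character operators are tried first so
// that ">=" is not mistaken for ">".
func parseASNFilter(expr string) (asnFilter, error) {
	expr = strings.TrimSpace(expr)
	op := "="
	for _, cand := range []string{"!=", ">=", "<=", "=", ">", "<"} {
		if strings.HasPrefix(expr, cand) {
			op, expr = cand, expr[len(cand):]
			break
		}
	}
	v, err := strconv.ParseUint(expr, 10, 32)
	if err != nil {
		return asnFilter{}, fmt.Errorf("bad ASN expression: %w", err)
	}
	return asnFilter{op: op, val: v}, nil
}

// match reports whether asn satisfies the filter.
func (f asnFilter) match(asn uint64) bool {
	switch f.op {
	case "=":
		return asn == f.val
	case "!=":
		return asn != f.val
	case ">":
		return asn > f.val
	case ">=":
		return asn >= f.val
	case "<":
		return asn < f.val
	default: // "<="
		return asn <= f.val
	}
}

func main() {
	f, err := parseASNFilter(">=1000")
	if err != nil {
		panic(err)
	}
	fmt.Println(f.match(8298), f.match(0)) // true false
}
```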
@@ -389,8 +561,9 @@ accept RE2 regular expressions. The breadcrumb strip shows them as `website~=gou
`uri~=^/api/` with the usual `×` remove link.
**URL sharing** — all filter state is in the URL query string (`w`, `by`, `f_website`,
`f_prefix`, `f_uri`, `f_status`, `f_website_re`, `f_uri_re`, `f_is_tor`, `f_asn`,
`f_source_tag`, `n`). Copy the URL to share an exact view with another operator, or bookmark
a recurring query.
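
Because every filter lives in the query string, a shareable link can also be assembled programmatically. The Go sketch below uses `net/url`. The parameter names (`w`, `by`, `f_website`, `f_source_tag`, `n`) come from the list above, while the base URL, the `shareURL` helper, and the example values are assumptions.

```go
package main

import (
	"fmt"
	"net/url"
)

// shareURL builds a frontend link that reproduces a view from filter parameters.
func shareURL(base string, params map[string]string) (string, error) {
	u, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	q := url.Values{}
	for k, v := range params {
		q.Set(k, v)
	}
	u.RawQuery = q.Encode() // Encode sorts keys, so the link is stable
	return u.String(), nil
}

func main() {
	link, err := shareURL("http://frontend.example:8080/", map[string]string{
		"w": "5m", "by": "prefix", "f_website": "example.com", "f_source_tag": "direct", "n": "25",
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(link)
}
```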
**JSON output** — append `&raw=1` to any URL to receive the TopN result as JSON instead of
HTML. Useful for scripting without the CLI binary:
@@ -447,14 +620,15 @@ logtail-cli targets [flags] list targets known to the queried endpoint
| `--uri-re` | — | Filter: RE2 regex against request URI |
| `--is-tor` | — | Filter: `1` or `!=0` = TOR only; `0` or `!=1` = non-TOR only |
| `--asn` | — | Filter: ASN expression (`12345`, `!=65000`, `>=1000`, `<64512`, …) |
| `--source-tag`| — | Filter: exact `ipng_source_tag` (e.g. `direct`, `cdn`) |
### `topn` flags
| Flag | Default | Description |
|---------------|------------|-----------------------------------------------------------------------|
| `--n` | `10` | Number of entries |
| `--window` | `5m` | `1m` `5m` `15m` `60m` `6h` `24h` |
| `--group-by` | `website` | `website` `prefix` `uri` `status` `asn` `source_tag` |
### `trend` flags