PRE-RELEASE 0.9.1: Makefile, Debian packaging, versioned UDP

Build and release tooling:
- Makefile with help as default; targets: build/build-amd64/build-arm64,
  test, lint, proto, pkg-deb, docker, docker-push, clean, plus
  install-deps (+ three sub-targets for apt / Go toolchain / Go tools).
- internal/version package; -ldflags -X injects Version/Commit/Date into
  every binary. -version flag on all four binaries (nginx-logtail version
  for the CLI).
- Dockerfile takes VERSION/COMMIT/DATE build-args and forwards them.
- .deb output lands in build/; .gitignore ignores /build/.

Debian package:
- debian/build-deb.sh packages all four static binaries into a single
  nginx-logtail_<ver>_<arch>.deb using dpkg-deb.
- Binary layout: /usr/sbin/nginx-logtail-{collector,aggregator,frontend}
  and /usr/bin/nginx-logtail.
- nginx-logtail(8) manpage.
- Three systemd units (collector, aggregator, frontend) shipped under
  /lib/systemd/system/. Installed but never enabled or started — the
  operator opts in per host.
- Collector runs as _logtail:www-data (log access); aggregator and
  frontend as _logtail:_logtail. postinst creates the system user/group
  idempotently.
- Single shared env file /etc/default/nginx-logtail rendered from a
  template at first install with %HOSTNAME% substituted. Sensible
  defaults for every COLLECTOR_*, AGGREGATOR_*, FRONTEND_* variable;
  plus COLLECTOR_ARGS / AGGREGATOR_ARGS / FRONTEND_ARGS escape hatches
  appended to ExecStart. Not a dpkg conffile: operator edits survive
  upgrades and dpkg --purge removes it.

Versioned UDP wire format:
- ParseUDPLine dispatches on a leading "v<N>\t" tag; v1 routes to the
  existing 12-field parser. Unknown/missing versions fail closed so
  future v2 parsers can land before emitters are upgraded.
- Tests updated; design.md FR-2.2 rewritten to make the version tag
  normative.

Docs:
- README.md gains a Quick Start (Debian / Docker Compose / from source).
- user-guide.md rewritten around Installation and Configuration: full
  env-var table, UDP-only default explained, precise file/UDP log_format
  layouts, note that operators can emit "0" for unknown \$is_tor / \$asn.
- Drilldown cycle, frontend filter table, and CLI --group-by list all
  include source_tag. UDP counters documented in the Prometheus section.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-04-17 10:35:08 +02:00
parent 577ed3dad5
commit 143aad9063
23 changed files with 1214 additions and 114 deletions

View File

@@ -127,15 +127,18 @@ Each requirement carries a unique identifier (`FR-X.Y` or `NFR-X.Y`) so that lat
| 8 | `$is_tor` | `is_tor` (optional) |
| 9 | `$asn` | `asn` (optional) |
- **FR-2.2 UDP format.** The collector MUST accept datagrams in the following tab-separated layout, as emitted by
`nginx-ipng-stats-plugin`'s `ipng_stats_logtail` directive:
- **FR-2.2 UDP format.** The collector MUST accept datagrams in a versioned tab-separated layout, as emitted by
`nginx-ipng-stats-plugin`'s `ipng_stats_logtail` directive. Every datagram MUST begin with a literal version tag
(`v<N>\t`) so the collector can route each packet to the appropriate parser. Only `v1` is defined in this revision;
unknown versions MUST be counted as parse failures and dropped.
```nginx
log_format ipng_stats_logtail '$host\t$remote_addr\t$request_method\t$request_uri\t$status\t$body_bytes_sent\t$request_time\t$is_tor\t$asn\t$ipng_source_tag\t$server_addr\t$scheme';
log_format ipng_stats_logtail 'v1\t$host\t$remote_addr\t$request_method\t$request_uri\t$status\t$body_bytes_sent\t$request_time\t$is_tor\t$asn\t$ipng_source_tag\t$server_addr\t$scheme';
```
Exactly 12 tab-separated fields are required. `$server_addr` and `$scheme` MUST be parsed but dropped; they are reserved for
future use. Malformed datagrams MUST be counted (FR-8.5) and silently dropped.
The v1 payload MUST have exactly 12 tab-separated fields after the `v1` tag (13 fields total). `$server_addr` and
`$scheme` MUST be parsed but dropped; they are reserved for future use. Malformed datagrams (wrong version, wrong
field count, bad IP) MUST be counted (FR-8.5) and silently dropped.
- **FR-2.3** The file tailer MUST set `source_tag="direct"` on every record it parses. The UDP listener MUST propagate
`$ipng_source_tag` verbatim. This is the only difference in downstream processing between the two ingest paths.
@@ -556,7 +559,8 @@ transitions. No per-request logging.
- **UDP datagram loss.** Any datagram dropped in-kernel (socket buffer full, network drop) does not register as a parse failure; it
is simply invisible. Operators should size `SO_RCVBUF` appropriately; the collector already requests 4 MiB.
- **Malformed log lines.** File format: lines with <8 tab-separated fields are silently skipped; an invalid IP also drops the line.
UDP: packets without exactly 12 fields are counted as received-but-not-success and dropped.
UDP: packets without a recognised `v<N>\t` prefix, or with the wrong field count for the claimed version, or with a bad IP, are
counted as received-but-not-success and dropped.
- **Clock skew between collectors.** Trend sparklines derived from merged data assume collectors are roughly NTP-synced. Per-bucket
alignment is to the local minute / 5-minute boundary of each collector.
- **gRPC traffic over untrusted links.** The system does not ship TLS; operators should front the gRPC ports with a TLS-terminating