Add is_tor plumbing from collector->aggregator->frontend/cli
README.md
@@ -38,7 +38,7 @@ nginx-logtail/
 │ └── logtail_grpc.pb.go # generated: service stubs
 ├── internal/
 │ └── store/
-│ └── store.go # shared types: Tuple4, Entry, Snapshot, ring helpers
+│ └── store.go # shared types: Tuple5, Entry, Snapshot, ring helpers
 └── cmd/
 ├── collector/
 │ ├── main.go
@@ -76,7 +76,7 @@ nginx-logtail/

 ## Data Model

-The core unit is a **count keyed by four dimensions**:
+The core unit is a **count keyed by five dimensions**:

 | Field | Description | Example |
 |-------------------|------------------------------------------------------|-------------------|
@@ -84,6 +84,7 @@ The core unit is a **count keyed by four dimensions**:
 | `client_prefix` | client IP truncated to /24 IPv4 or /48 IPv6 | `1.2.3.0/24` |
 | `http_request_uri`| `$request_uri` path only — query string stripped | `/api/v1/search` |
 | `http_response` | HTTP status code | `429` |
+| `is_tor` | whether the client IP is a TOR exit node | `1` |
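As a rough illustration of what a five-dimension key could look like in Go, the sketch below uses a comparable struct as a map key. The field names, the field types, and the `record` helper are assumptions for illustration; the actual `Tuple5` definition in `internal/store` is not shown in this diff and may differ.

```go
package main

import "fmt"

// Tuple5 is a hypothetical layout for the five-dimension key.
// The real type lives in internal/store and is not part of this diff.
type Tuple5 struct {
	Website      string
	ClientPrefix string
	URI          string
	Status       string
	IsTor        bool
}

// record increments the count for one observed request tuple.
func record(live map[Tuple5]int64, k Tuple5) {
	live[k]++
}

func main() {
	live := make(map[Tuple5]int64)
	k := Tuple5{
		Website:      "example.org",
		ClientPrefix: "1.2.3.0/24",
		URI:          "/api/v1/search",
		Status:       "429",
		IsTor:        true,
	}
	record(live, k)
	record(live, k)

	// The same request tuple from a non-TOR client is a distinct key.
	k.IsTor = false
	record(live, k)

	fmt.Println(len(live), "distinct keys")
}
```

Because every field is comparable, the struct can be used directly as a map key, so adding `is_tor` splits each former four-dimension key into at most two entries.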

 ## Time Windows & Tiered Ring Buffers

@@ -110,8 +111,8 @@ Every 5 minutes: merge last 5 fine snapshots → top-5K → append to coarse rin

 ## Memory Budget (Collector, target ≤ 1 GB)

-Entry size: ~30 B website + ~15 B prefix + ~50 B URI + 3 B status + 8 B count + ~80 B Go map
-overhead ≈ **~186 bytes per entry**.
+Entry size: ~30 B website + ~15 B prefix + ~50 B URI + 3 B status + 1 B is_tor + 8 B count + ~80 B Go map
+overhead ≈ **~187 bytes per entry**.
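The per-entry estimate can be checked with a few lines of arithmetic. This is a sketch only: the constants are the approximate figures quoted in the README, the function names are hypothetical, and megabytes are assumed to mean 10^6 bytes. The 100 K entry count is the live-map cap mentioned elsewhere in the README.

```go
package main

import "fmt"

// Approximate per-field sizes in bytes, taken from the README estimate.
const (
	websiteB  = 30
	prefixB   = 15
	uriB      = 50
	statusB   = 3
	isTorB    = 1 // new in this commit
	countB    = 8
	overheadB = 80 // Go map overhead
)

// entryBytes returns the estimated size of one map entry.
func entryBytes() int {
	return websiteB + prefixB + uriB + statusB + isTorB + countB + overheadB
}

// totalMB returns the estimated footprint of n entries in MB (10^6 bytes).
func totalMB(n int) float64 {
	return float64(n*entryBytes()) / 1e6
}

func main() {
	fmt.Println(entryBytes(), "bytes per entry")
	fmt.Printf("%.1f MB for a full live map\n", totalMB(100_000))
}
```

The extra byte raises the estimate from 186 to 187 bytes per entry, which is well under a one percent change to the overall budget.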

 | Structure | Entries | Size |
 |-------------------------|-------------|-------------|
@@ -151,7 +152,8 @@ and does not change any existing interface.
 ## Protobuf API (`proto/logtail.proto`)

 ```protobuf
-enum StatusOp { EQ = 0; NE = 1; GT = 2; GE = 3; LT = 4; LE = 5; }
+enum TorFilter { TOR_ANY = 0; TOR_YES = 1; TOR_NO = 2; }
+enum StatusOp { EQ = 0; NE = 1; GT = 2; GE = 3; LT = 4; LE = 5; }

 message Filter {
 optional string website = 1;
@@ -161,6 +163,7 @@ message Filter {
 StatusOp status_op = 5; // comparison operator for http_response
 optional string website_regex = 6; // RE2 regex against website
 optional string uri_regex = 7; // RE2 regex against http_request_uri
+TorFilter tor = 8; // TOR_ANY (default) / TOR_YES / TOR_NO
 }

 enum GroupBy { WEBSITE = 0; CLIENT_PREFIX = 1; REQUEST_URI = 2; HTTP_RESPONSE = 3; }
@@ -217,7 +220,7 @@ service LogtailService {
 - Parses the fixed **logtail** nginx log format — tab-separated, fixed field order, no quoting:

 ```nginx
-log_format logtail '$host\t$remote_addr\t$msec\t$request_method\t$request_uri\t$status\t$body_bytes_sent\t$request_time';
+log_format logtail '$host\t$remote_addr\t$msec\t$request_method\t$request_uri\t$status\t$body_bytes_sent\t$request_time\t$is_tor';
 ```

 | # | Field | Used for |
@@ -230,16 +233,19 @@ service LogtailService {
 | 5 | `$status` | http_response |
 | 6 | `$body_bytes_sent`| (discarded) |
 | 7 | `$request_time` | (discarded) |
+| 8 | `$is_tor` | is_tor |

-- `strings.SplitN(line, "\t", 8)` — ~50 ns/line. No regex.
+- `strings.SplitN(line, "\t", 9)` — ~50 ns/line. No regex.
 - `$request_uri`: query string discarded at first `?`.
 - `$remote_addr`: truncated to /24 (IPv4) or /48 (IPv6); prefix lengths configurable via flags.
+- `$is_tor`: `1` if the client IP is a TOR exit node, `0` otherwise. Field is optional — lines
+ with exactly 8 fields (old format) are accepted and default to `is_tor=false`.
 - Lines with fewer than 8 fields are silently skipped.
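The parsing rules in the bullets above could be sketched as follows. The `parsed` struct and the `parseLine` function are hypothetical names, not the collector's actual code. The sketch assumes the trailing newline has already been stripped, and it leaves out the prefix truncation of `$remote_addr`.

```go
package main

import (
	"fmt"
	"strings"
)

// parsed holds the fields kept from one log line (hypothetical struct).
type parsed struct {
	Host, Addr, URI, Status string
	IsTor                   bool
}

// parseLine splits a tab-separated logtail line into at most 9 fields.
// ok is false for lines with fewer than 8 fields, which the caller skips.
func parseLine(line string) (p parsed, ok bool) {
	f := strings.SplitN(line, "\t", 9)
	if len(f) < 8 {
		return parsed{}, false
	}
	// Discard the query string at the first '?'.
	uri := f[4]
	if i := strings.IndexByte(uri, '?'); i >= 0 {
		uri = uri[:i]
	}
	p = parsed{Host: f[0], Addr: f[1], URI: uri, Status: f[5]}
	// Old-format lines have exactly 8 fields; IsTor then stays false.
	if len(f) == 9 {
		p.IsTor = f[8] == "1"
	}
	return p, true
}

func main() {
	old := "example.org\t1.2.3.4\t1700000000.000\tGET\t/api/v1/search?q=x\t429\t512\t0.004"
	p, ok := parseLine(old + "\t1")
	fmt.Println(p.URI, p.IsTor, ok)
	p, ok = parseLine(old)
	fmt.Println(p.URI, p.IsTor, ok)
}
```

Treating anything other than a literal `1` in the ninth field as non-TOR is a choice made for this sketch; the README only specifies the values `1` and `0`.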

 ### store.go
 - **Single aggregator goroutine** reads from the channel and updates the live map — no locking on
 the hot path. At 10 K lines/s the goroutine uses <1% CPU.
-- Live map: `map[Tuple4]int64`, hard-capped at 100 K entries (new keys dropped when full).
+- Live map: `map[Tuple5]int64`, hard-capped at 100 K entries (new keys dropped when full).
 - **Minute ticker**: heap-selects top-50K entries, writes snapshot to fine ring, resets live map.
 - Every 5 fine ticks: merge last 5 fine snapshots → top-5K → write to coarse ring.
 - **TopN query**: RLock ring, sum bucket range, apply filter, group by dimension, heap-select top N.
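For the "apply filter" step of the TopN query, the new `TorFilter` enum might be evaluated per entry roughly like this. The Go type and constants below are local stand-ins that mirror the proto enum values, not the generated protobuf identifiers, and `matchTor` is a hypothetical helper.

```go
package main

import "fmt"

// TorFilter mirrors the proto enum; a local stand-in, not the generated type.
type TorFilter int32

const (
	TorAny TorFilter = iota // TOR_ANY = 0 (default, no restriction)
	TorYes                  // TOR_YES = 1
	TorNo                   // TOR_NO  = 2
)

// matchTor reports whether an entry with the given is_tor value passes the filter.
func matchTor(f TorFilter, isTor bool) bool {
	switch f {
	case TorYes:
		return isTor
	case TorNo:
		return !isTor
	default: // TorAny, and any unknown value, does not restrict
		return true
	}
}

func main() {
	fmt.Println(matchTor(TorAny, true), matchTor(TorYes, false), matchTor(TorNo, false))
}
```

Making `TOR_ANY` the zero value means a `Filter` message that never sets `tor` behaves as it did before this commit.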
@@ -291,7 +297,7 @@ service LogtailService {

 ### handler.go
 - All filter state in the **URL query string**: `w` (window), `by` (group_by), `f_website`,
- `f_prefix`, `f_uri`, `f_status`, `f_website_re`, `f_uri_re`, `n`, `target`. No server-side
+ `f_prefix`, `f_uri`, `f_status`, `f_website_re`, `f_uri_re`, `f_is_tor`, `n`, `target`. No server-side
 session — URLs are shareable and bookmarkable; multiple operators see independent views.
 - **Filter expression box**: a `q=` parameter carries a mini filter language
 (`status>=400 AND website~=gouda.* AND uri~=^/api/`). On submission the handler parses it
@@ -340,16 +346,17 @@ logtail-cli targets [flags] list targets known to the queried endpoint

 **Shared** (all subcommands):

-| Flag | Default | Description |
-|--------------|------------------|----------------------------------------------------------|
-| `--target` | `localhost:9090` | Comma-separated `host:port` list; fan-out to all |
-| `--json` | false | Emit newline-delimited JSON instead of a table |
-| `--website` | — | Filter: website |
-| `--prefix` | — | Filter: client prefix |
-| `--uri` | — | Filter: request URI |
-| `--status` | — | Filter: HTTP status expression (`200`, `!=200`, `>=400`, `<500`, …) |
-| `--website-re`| — | Filter: RE2 regex against website |
-| `--uri-re` | — | Filter: RE2 regex against request URI |
+| Flag | Default | Description |
+|---------------|------------------|----------------------------------------------------------|
+| `--target` | `localhost:9090` | Comma-separated `host:port` list; fan-out to all |
+| `--json` | false | Emit newline-delimited JSON instead of a table |
+| `--website` | — | Filter: website |
+| `--prefix` | — | Filter: client prefix |
+| `--uri` | — | Filter: request URI |
+| `--status` | — | Filter: HTTP status expression (`200`, `!=200`, `>=400`, `<500`, …) |
+| `--website-re`| — | Filter: RE2 regex against website |
+| `--uri-re` | — | Filter: RE2 regex against request URI |
+| `--is-tor` | — | Filter: TOR traffic (`1` or `!=0` = TOR only; `0` or `!=1` = non-TOR only) |
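One way the `--is-tor` expressions in the table could map onto the proto `TorFilter` values is sketched below. `parseIsTor` and the Go constants are hypothetical names, and treating an unset (empty) flag as `TOR_ANY` is an assumption based on the flag having no default.

```go
package main

import (
	"fmt"
	"strings"
)

// TorFilter mirrors the proto enum; a local stand-in, not the generated type.
type TorFilter int32

const (
	TorAny TorFilter = iota // TOR_ANY = 0
	TorYes                  // TOR_YES = 1
	TorNo                   // TOR_NO  = 2
)

// parseIsTor maps an --is-tor expression onto a TorFilter.
// An empty string (flag not set) is assumed to mean "no restriction".
func parseIsTor(expr string) (TorFilter, error) {
	switch strings.TrimSpace(expr) {
	case "":
		return TorAny, nil
	case "1", "!=0":
		return TorYes, nil
	case "0", "!=1":
		return TorNo, nil
	}
	return TorAny, fmt.Errorf("invalid --is-tor value %q", expr)
}

func main() {
	for _, e := range []string{"1", "!=0", "0", "!=1", "", "maybe"} {
		f, err := parseIsTor(e)
		fmt.Printf("%q -> %d, err=%v\n", e, f, err)
	}
}
```

Accepting the `!=` forms keeps the flag consistent with the expression syntax already used by `--status`.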

 **`topn` only**: `--n 10`, `--window 5m`, `--group-by website`

@@ -381,7 +388,7 @@ with a non-zero code on gRPC error.
 | Tick-based cache rotation in aggregator | Ring stays on the same 1-min cadence regardless of collector count |
 | Degraded collector zeroing | Stale counts from failed collectors don't accumulate in the merged view |
 | Same `LogtailService` for collector and aggregator | CLI and frontend work with either; no special-casing |
-| `internal/store` shared package | ~200 lines of ring-buffer logic shared between collector and aggregator |
+| `internal/store` shared package | ring-buffer, `Tuple5` encoding, and filter logic shared between collector and aggregator |
 | Filter state in URL, not session cookie | Multiple concurrent operators; shareable/bookmarkable URLs |
 | Query strings stripped at ingest | Major cardinality reduction; prevents URI explosion under attack |
 | No persistent storage | Simplicity; acceptable for ops dashboards (restart = lose history) |
@@ -393,4 +400,4 @@ with a non-zero code on gRPC error.
 | Status filter as expression string (`!=200`, `>=400`) | Operator-friendly; parsed once at query boundary, encoded as `(int32, StatusOp)` in proto |
 | Regex filters compiled once per query (`CompiledFilter`) | Up to 288 × 5 000 per-entry calls — compiling per-entry would dominate query latency |
 | Filter expression box (`q=`) redirects to canonical URL | Filter state stays in individual `f_*` params; URLs remain shareable and bookmarkable |
-| `ListTargets` + frontend source picker (no Tuple5) | "Which nginx is busiest?" answered by switching `target=` to a collector; no data model changes, no extra memory |
+| `ListTargets` + frontend source picker | "Which nginx is busiest?" answered by switching `target=` to a collector; no data model changes, no extra memory |