Add is_tor plumbing from collector->aggregator->frontend/cli
@@ -27,7 +27,7 @@ Add the `logtail` log format to your `nginx.conf` and apply it to each `server`

 ```nginx
 http {
-log_format logtail '$host\t$remote_addr\t$msec\t$request_method\t$request_uri\t$status\t$body_bytes_sent\t$request_time';
+log_format logtail '$host\t$remote_addr\t$msec\t$request_method\t$request_uri\t$status\t$body_bytes_sent\t$request_time\t$is_tor';

 server {
 access_log /var/log/nginx/access.log logtail;
@@ -38,7 +38,10 @@ http {
 ```

 The format is tab-separated with fixed field positions. Query strings are stripped from the URI
-by the collector at ingest time — only the path is tracked.
+by the collector at ingest time — only the path is tracked. `$is_tor` must be set to `1` when
+the client IP is a TOR exit node and `0` otherwise (this is typically populated by a custom nginx
+variable or a Lua script that checks the IP against a TOR exit list). The field is optional for
+backward compatibility — log lines without it are accepted and treated as `is_tor=0`.

 ---

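The optional ninth field described in this hunk can be illustrated with a small parser. This is only a sketch in Python, not the collector's actual code: the `LogRecord` type and `parse_logtail_line` function are hypothetical names, and the field order is taken from the `log_format logtail` line in the diff.

```python
from dataclasses import dataclass


@dataclass
class LogRecord:
    """One parsed logtail line (hypothetical type, for illustration only)."""
    host: str
    remote_addr: str
    msec: float
    method: str
    path: str
    status: int
    body_bytes_sent: int
    request_time: float
    is_tor: bool


def parse_logtail_line(line: str) -> LogRecord:
    """Parse a tab-separated logtail line with 8 fields, or 9 with is_tor."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) not in (8, 9):
        raise ValueError(f"expected 8 or 9 fields, got {len(fields)}")
    host, remote_addr, msec, method, uri, status, sent, rtime = fields[:8]
    # Backward compatibility: a missing ninth field is treated as is_tor=0.
    is_tor = len(fields) == 9 and fields[8] == "1"
    # Query strings are stripped at ingest time; only the path is tracked.
    path = uri.split("?", 1)[0]
    return LogRecord(host, remote_addr, float(msec), method, path,
                     int(status), int(sent), float(rtime), is_tor)
```

A line written with the old eight-field format parses to a record with `is_tor` set to `False`, so old and new log files can be read by the same code path.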
@@ -64,14 +67,15 @@ windows, and exposes a gRPC interface for the aggregator (and directly for the C

 ### Flags

-| Flag | Default | Description |
-|----------------|--------------|-----------------------------------------------------------|
-| `--listen` | `:9090` | gRPC listen address |
-| `--logs` | — | Comma-separated log file paths or glob patterns |
-| `--logs-file` | — | File containing one log path/glob per line |
-| `--source` | hostname | Name for this collector in query responses |
-| `--v4prefix` | `24` | IPv4 prefix length for client bucketing (e.g. /24 → /23) |
-| `--v6prefix` | `48` | IPv6 prefix length for client bucketing |
+| Flag | Default | Description |
+|-------------------|--------------|-----------------------------------------------------------|
+| `--listen` | `:9090` | gRPC listen address |
+| `--logs` | — | Comma-separated log file paths or glob patterns |
+| `--logs-file` | — | File containing one log path/glob per line |
+| `--source` | hostname | Name for this collector in query responses |
+| `--v4prefix` | `24` | IPv4 prefix length for client bucketing (e.g. /24 → /23) |
+| `--v6prefix` | `48` | IPv6 prefix length for client bucketing |
+| `--scan-interval` | `10s` | How often to rescan glob patterns for new/removed files |

 At least one of `--logs` or `--logs-file` is required.

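To make the `--v4prefix` and `--v6prefix` rows more concrete, here is a rough sketch of prefix bucketing using Python's standard `ipaddress` module. The function name `bucket_prefix` is hypothetical, and the collector may implement bucketing differently; the defaults mirror the table above.

```python
import ipaddress


def bucket_prefix(addr: str, v4prefix: int = 24, v6prefix: int = 48) -> str:
    """Map a client address to the network it is bucketed under."""
    ip = ipaddress.ip_address(addr)
    length = v4prefix if ip.version == 4 else v6prefix
    # strict=False masks off the host bits instead of raising an error.
    return str(ipaddress.ip_network(f"{ip}/{length}", strict=False))
```

With the defaults, `1.2.3.77` falls into `1.2.3.0/24`. Lowering `v4prefix` to 23 merges neighbouring /24 networks, so the same address falls into `1.2.2.0/23`.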
@@ -124,7 +128,7 @@ The collector is designed to stay well under 1 GB:
 | Coarse ring (288 × 5-min) | 288 × 5 000 | ~268 MB |
 | **Total** | | **~845 MB** |

-When the live map reaches 100 000 distinct 4-tuples, new keys are dropped for the rest of that
+When the live map reaches 100 000 distinct 5-tuples, new keys are dropped for the rest of that
 minute. Existing keys continue to accumulate counts. The cap resets at each minute rotation.

 ### Time windows
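The live-map cap changed in this hunk (new keys dropped once the limit is hit, existing keys still counted, reset on rotation) can be sketched as follows. This is an illustrative Python model only: `LiveMap` is a hypothetical class, and the assumption that the fifth tuple element is `is_tor` is inferred from the commit title rather than stated in the diff.

```python
MAX_KEYS = 100_000  # cap quoted in the README text


class LiveMap:
    """Per-minute counter map with a hard cap on distinct keys (sketch)."""

    def __init__(self, max_keys: int = MAX_KEYS):
        self.max_keys = max_keys
        self.counts = {}

    def add(self, key: tuple) -> bool:
        """Count one request; return False if the key was dropped."""
        if key in self.counts:
            # Existing keys keep accumulating even when the map is full.
            self.counts[key] += 1
            return True
        if len(self.counts) >= self.max_keys:
            # Map is full: new keys are dropped for the rest of the minute.
            return False
        self.counts[key] = 1
        return True

    def rotate(self) -> dict:
        """Hand back the finished minute and start empty, resetting the cap."""
        finished, self.counts = self.counts, {}
        return finished
```

Under this model, adding `is_tor` to the key means the same website, prefix, URI and status can occupy two slots (one per `is_tor` value), which is worth keeping in mind when reading the memory table.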
@@ -284,6 +288,10 @@ Supported fields and operators:
 | `website` | `=` `~=` | `website~=gouda.*` |
 | `uri` | `=` `~=` | `uri~=^/api/` |
 | `prefix` | `=` | `prefix=1.2.3.0/24` |
+| `is_tor` | `=` `!=` | `is_tor=1`, `is_tor!=0` |
+
+`is_tor=1` and `is_tor!=0` are equivalent (TOR traffic only). `is_tor=0` and `is_tor!=1` are
+equivalent (non-TOR traffic only).

 `~=` means RE2 regex match. Values with spaces or quotes may be wrapped in double or single
 quotes: `uri~="^/search\?q="`.
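The equivalence between the four `is_tor` expressions described above can be checked with a short sketch. The function `parse_is_tor_filter` is a hypothetical helper written for illustration; it is not taken from the project's filter parser and handles only the `is_tor` field.

```python
def parse_is_tor_filter(expr: str) -> bool:
    """Return True for a TOR-only filter, False for a non-TOR-only filter."""
    expr = expr.strip()
    # Check "!=" first, because "is_tor!=" does not start with "is_tor=".
    if expr.startswith("is_tor!="):
        op, value = "!=", expr[len("is_tor!="):]
    elif expr.startswith("is_tor="):
        op, value = "=", expr[len("is_tor="):]
    else:
        raise ValueError(f"not an is_tor expression: {expr!r}")
    if value not in ("0", "1"):
        raise ValueError(f"is_tor value must be 0 or 1, got {value!r}")
    wanted = value == "1"
    # "!=" simply negates the requested value.
    return wanted if op == "=" else not wanted
```

Both `is_tor=1` and `is_tor!=0` reduce to `True` here, and both `is_tor=0` and `is_tor!=1` reduce to `False`, matching the two equivalence classes in the README text.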
@@ -303,8 +311,8 @@ accept RE2 regular expressions. The breadcrumb strip shows them as `website~=gou
 `uri~=^/api/` with the usual `×` remove link.

 **URL sharing** — all filter state is in the URL query string (`w`, `by`, `f_website`,
-`f_prefix`, `f_uri`, `f_status`, `f_website_re`, `f_uri_re`, `n`). Copy the URL to share an
-exact view with another operator, or bookmark a recurring query.
+`f_prefix`, `f_uri`, `f_status`, `f_website_re`, `f_uri_re`, `f_is_tor`, `n`). Copy the URL to
+share an exact view with another operator, or bookmark a recurring query.

 **JSON output** — append `&raw=1` to any URL to receive the TopN result as JSON instead of
 HTML. Useful for scripting without the CLI binary:
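As a reviewer-side illustration of the new `f_is_tor` query parameter, the sketch below builds a shareable URL with Python's standard `urllib.parse`. Only the parameter names (`w`, `by`, `f_is_tor`, `raw`) come from the README text; the base URL, the helper name `share_url`, and the value formats (`5m`, `website`, `1`) are assumptions and may not match what the frontend actually accepts.

```python
from urllib.parse import urlencode


def share_url(base: str, filters: dict, raw: bool = False) -> str:
    """Build a frontend URL carrying the filter state in the query string."""
    params = dict(filters)
    if raw:
        # raw=1 asks for the TopN result as JSON instead of HTML.
        params["raw"] = "1"
    return f"{base}?{urlencode(params)}"


# Hypothetical frontend address and assumed value formats.
url = share_url(
    "http://logtail.example:8080/",
    {"w": "5m", "by": "website", "f_is_tor": "1"},
    raw=True,
)
```

Because all filter state lives in the query string, a URL built this way should describe the same view for every operator who opens it.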
@@ -359,6 +367,7 @@ logtail-cli targets [flags] list targets known to the queried endpoint
 | `--status` | — | Filter: HTTP status expression (`200`, `!=200`, `>=400`, `<500`, …) |
 | `--website-re`| — | Filter: RE2 regex against website |
 | `--uri-re` | — | Filter: RE2 regex against request URI |
+| `--is-tor` | — | Filter: `1` or `!=0` = TOR only; `0` or `!=1` = non-TOR only |

 ### `topn` flags

@@ -455,6 +464,12 @@ logtail-cli topn --target agg:9091 --window 5m --website-re 'gouda.*'
 # Filter by URI regex: all /api/ paths
 logtail-cli topn --target agg:9091 --window 5m --group-by uri --uri-re '^/api/'

+# Show only TOR traffic — which websites are TOR clients hitting?
+logtail-cli topn --target agg:9091 --window 5m --is-tor 1
+
+# Show non-TOR traffic only — exclude exit nodes from the view
+logtail-cli topn --target agg:9091 --window 5m --is-tor 0
+
 # Compare two collectors side by side in one command
 logtail-cli topn --target nginx1:9090,nginx2:9090 --window 5m
