Add aggregator backfill, pulling fine+coarse buckets from collectors
README.md | 33
@@ -264,6 +264,7 @@ message Snapshot {
  string source = 1;
  int64 timestamp = 2;
  repeated TopNEntry entries = 3; // full top-50K for this bucket
  bool is_coarse = 4; // true for 5-min coarse buckets (DumpSnapshots only)
}

// Target discovery: list the collectors behind the queried endpoint
@@ -274,15 +275,22 @@ message TargetInfo {
}
message ListTargetsResponse { repeated TargetInfo targets = 1; }

// Backfill: dump full ring buffer contents for aggregator restart recovery
message DumpSnapshotsRequest {}
// Response reuses Snapshot; is_coarse distinguishes fine (1-min) from coarse (5-min) buckets.
// Stream closes after all historical data is sent (unlike StreamSnapshots which stays open).

service LogtailService {
  rpc TopN(TopNRequest) returns (TopNResponse);
  rpc Trend(TrendRequest) returns (TrendResponse);
  rpc StreamSnapshots(SnapshotRequest) returns (stream Snapshot);
  rpc ListTargets(ListTargetsRequest) returns (ListTargetsResponse);
  rpc DumpSnapshots(DumpSnapshotsRequest) returns (stream Snapshot);
}
// Both collector and aggregator implement LogtailService.
// The aggregator's StreamSnapshots re-streams the merged view.
// ListTargets: aggregator returns all configured collectors; collector returns itself.
// DumpSnapshots: collector only; aggregator calls this on startup to backfill its ring.
```

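From the caller's side, `DumpSnapshots` is a finite server stream: read until the server closes it, splitting fine from coarse buckets on `is_coarse`. A minimal sketch of that loop; the `Snapshot` struct, `snapshotStream` interface, and `fakeStream` here are stand-ins for the gRPC-generated types, not the project's actual code:

```go
package main

import (
	"fmt"
	"io"
)

// Snapshot mirrors the proto fields used here (illustrative, not generated code).
type Snapshot struct {
	Source    string
	Timestamp int64
	IsCoarse  bool
}

// snapshotStream is the subset of the generated client-stream interface we rely on.
type snapshotStream interface {
	Recv() (*Snapshot, error)
}

// drainDump reads until the server closes the stream (io.EOF), splitting
// fine (1-min) from coarse (5-min) buckets on the is_coarse flag.
func drainDump(st snapshotStream) ([]*Snapshot, []*Snapshot, error) {
	var fine, coarse []*Snapshot
	for {
		s, err := st.Recv()
		if err == io.EOF {
			return fine, coarse, nil // stream closed: all historical data received
		}
		if err != nil {
			return nil, nil, err
		}
		if s.IsCoarse {
			coarse = append(coarse, s)
		} else {
			fine = append(fine, s)
		}
	}
}

// fakeStream replays a fixed slice, then io.EOF, standing in for a real stream.
type fakeStream struct{ buf []*Snapshot }

func (f *fakeStream) Recv() (*Snapshot, error) {
	if len(f.buf) == 0 {
		return nil, io.EOF
	}
	s := f.buf[0]
	f.buf = f.buf[1:]
	return s, nil
}

func main() {
	st := &fakeStream{buf: []*Snapshot{
		{Timestamp: 60}, {Timestamp: 120}, {Timestamp: 300, IsCoarse: true},
	}}
	fine, coarse, _ := drainDump(st)
	fmt.Println(len(fine), len(coarse)) // 2 1
}
```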
## Program 1 — Collector

@@ -334,11 +342,16 @@ service LogtailService {
- **TopN query**: RLock ring, sum bucket range, apply filter, group by dimension, heap-select top N.
- **Trend query**: per-bucket filtered sum, returns one `TrendPoint` per bucket.
- **Subscriber fan-out**: per-subscriber buffered channel; `Subscribe`/`Unsubscribe` for streaming.
- **`DumpRings()`**: acquires `RLock`, copies both ring arrays and their head/filled pointers (just slice headers — microseconds), releases lock, then returns chronologically-ordered fine and coarse snapshot slices. The lock is never held during serialisation or network I/O.

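The `DumpRings()` copy-under-lock pattern can be sketched as follows. This is a one-ring sketch with illustrative names (the real store has both a fine and a coarse ring); since ring elements are pointers to snapshots that are never mutated after being written, copying the slice is cheap:

```go
package main

import (
	"fmt"
	"sync"
)

type Snapshot struct{ Timestamp int64 }

// tieredStore holds one ring of pointers to immutable snapshots.
type tieredStore struct {
	mu     sync.RWMutex
	ring   []*Snapshot // fixed-size circular buffer
	head   int         // next write slot
	filled int         // number of valid entries, <= len(ring)
}

func (s *tieredStore) add(snap *Snapshot) {
	s.mu.Lock()
	s.ring[s.head] = snap
	s.head = (s.head + 1) % len(s.ring)
	if s.filled < len(s.ring) {
		s.filled++
	}
	s.mu.Unlock()
}

// DumpRings copies the buffer (pointer elements, so cheap) and the counters
// under RLock, releases the lock, then reorders chronologically. No lock is
// held during the reordering, serialisation, or network I/O that follows.
func (s *tieredStore) DumpRings() []*Snapshot {
	s.mu.RLock()
	ring := append([]*Snapshot(nil), s.ring...)
	head, filled := s.head, s.filled
	s.mu.RUnlock()

	// The oldest entry sits filled slots behind head; walk forward from it.
	out := make([]*Snapshot, 0, filled)
	start := (head - filled + len(ring)) % len(ring)
	for i := 0; i < filled; i++ {
		out = append(out, ring[(start+i)%len(ring)])
	}
	return out
}

func main() {
	st := &tieredStore{ring: make([]*Snapshot, 4)}
	for t := int64(60); t <= 360; t += 60 { // six 1-min buckets into a 4-slot ring
		st.add(&Snapshot{Timestamp: t})
	}
	for _, s := range st.DumpRings() {
		fmt.Print(s.Timestamp, " ") // 180 240 300 360
	}
	fmt.Println()
}
```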
### server.go
- gRPC server on a configurable port (default `:9090`).
- `TopN` and `Trend`: unary, answered from the ring buffer under RLock.
- `StreamSnapshots`: registers a subscriber channel, loops receiving on it and forwarding to the client stream, with a 30 s keepalive ticker.
- `DumpSnapshots`: calls `DumpRings()`, streams all fine buckets (`is_coarse=false`) then all coarse buckets (`is_coarse=true`), then closes the stream. No lock is held during streaming.

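The server side of `DumpSnapshots` reduces to: send fine, send coarse, return (returning nil from a gRPC streaming handler closes the stream). A minimal sketch; `sendStream` and `captureStream` are illustrative stand-ins for the generated server-stream type:

```go
package main

import "fmt"

type Snapshot struct {
	Timestamp int64
	IsCoarse  bool
}

// sendStream is the subset of the generated server-stream interface used here.
type sendStream interface{ Send(*Snapshot) error }

// dumpSnapshots streams fine buckets first, then coarse, then returns nil,
// which closes the stream on the wire. DumpRings has already copied the data,
// so no store lock is held while Send blocks on the network.
func dumpSnapshots(fine, coarse []*Snapshot, st sendStream) error {
	for _, s := range fine {
		s.IsCoarse = false
		if err := st.Send(s); err != nil {
			return err
		}
	}
	for _, s := range coarse {
		s.IsCoarse = true
		if err := st.Send(s); err != nil {
			return err
		}
	}
	return nil // handler returns: client sees io.EOF
}

// captureStream records sent snapshots in place of a real network stream.
type captureStream struct{ got []*Snapshot }

func (c *captureStream) Send(s *Snapshot) error { c.got = append(c.got, s); return nil }

func main() {
	cs := &captureStream{}
	_ = dumpSnapshots(
		[]*Snapshot{{Timestamp: 60}, {Timestamp: 120}},
		[]*Snapshot{{Timestamp: 300}},
		cs,
	)
	fmt.Println(len(cs.got), cs.got[2].IsCoarse) // 3 true
}
```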
## Program 2 — Aggregator

@@ -362,6 +375,23 @@ service LogtailService {
to the same 1-minute cadence as collectors regardless of how many collectors are connected.
- Same tiered ring structure as the collector store; populated from `merger.TopK()` each tick.
- `QueryTopN`, `QueryTrend`, `Subscribe`/`Unsubscribe` — identical interface to the collector store.
- **`LoadHistorical(fine, coarse []Snapshot)`**: writes pre-merged backfill snapshots directly into the ring arrays under `mu.Lock()`, sets the head and filled counters, then returns. Safe to call concurrently with queries. The live ticker continues from the updated head after this returns.

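A minimal sketch of the `LoadHistorical` write path, assuming the same ring layout as above (fine ring only; the real method takes coarse snapshots too, and the struct names are illustrative):

```go
package main

import (
	"fmt"
	"sync"
)

type Snapshot struct{ Timestamp int64 }

type tieredStore struct {
	mu     sync.RWMutex
	ring   []*Snapshot
	head   int
	filled int
}

// LoadHistorical writes pre-merged, chronologically ordered backfill
// snapshots straight into the ring under the write lock, then sets head and
// filled so the live ticker continues from the next free slot.
func (s *tieredStore) LoadHistorical(fine []*Snapshot) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if len(fine) > len(s.ring) { // keep only the newest ring-size buckets
		fine = fine[len(fine)-len(s.ring):]
	}
	for i, snap := range fine {
		s.ring[i] = snap
	}
	s.head = len(fine) % len(s.ring)
	s.filled = len(fine)
}

func main() {
	st := &tieredStore{ring: make([]*Snapshot, 4)}
	st.LoadHistorical([]*Snapshot{{60}, {120}, {180}})
	fmt.Println(st.head, st.filled) // 3 3
}
```

Because the write lock is an `RWMutex`, concurrent `QueryTopN`/`QueryTrend` readers simply block for the duration of the bulk write and then see the fully loaded ring.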
### backfill.go
- **`Backfill(ctx, collectorAddrs, cache)`**: called once at aggregator startup (in a goroutine, after the gRPC server is already listening, so the frontend is never blocked).
- Dials all collectors concurrently and calls `DumpSnapshots` on each.
- Accumulates entries per timestamp in `map[unix-second]map[label]count`; multiple collectors' contributions for the same bucket are summed — the same delta-merge semantics as the live path.
- Sorts timestamps chronologically, runs `TopKFromMap` per bucket, and caps to the ring size.
- Calls `cache.LoadHistorical` once with the merged results.
- **Graceful degradation**: if a collector returns `Unimplemented` (an old binary without `DumpSnapshots`), logs an informational message and skips it — live streaming still starts normally. Any other error is logged with timing and also skipped. Partial backfill (some collectors succeed, some fail) is supported.
- Logs per-collector stats: bucket counts, total entry counts, and wall-clock duration.

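The per-timestamp accumulation and chronological ordering steps above can be sketched as plain map/sort code (function names here are illustrative, not the project's actual helpers):

```go
package main

import (
	"fmt"
	"sort"
)

// mergeDumps accumulates map[unixSecond]map[label]count across collectors.
// Contributions from multiple collectors to the same bucket and label are
// summed, matching the live delta-merge semantics.
func mergeDumps(dumps ...map[int64]map[string]int64) map[int64]map[string]int64 {
	merged := map[int64]map[string]int64{}
	for _, d := range dumps {
		for ts, labels := range d {
			if merged[ts] == nil {
				merged[ts] = map[string]int64{}
			}
			for label, n := range labels {
				merged[ts][label] += n
			}
		}
	}
	return merged
}

// sortedBuckets returns the merged timestamps in chronological order, ready
// for per-bucket top-K selection and loading into the ring.
func sortedBuckets(m map[int64]map[string]int64) []int64 {
	ts := make([]int64, 0, len(m))
	for t := range m {
		ts = append(ts, t)
	}
	sort.Slice(ts, func(i, j int) bool { return ts[i] < ts[j] })
	return ts
}

func main() {
	// Two collectors report overlapping buckets; the shared bucket/label sums.
	a := map[int64]map[string]int64{60: {"/api": 3}, 120: {"/api": 1}}
	b := map[int64]map[string]int64{60: {"/api": 2, "/img": 5}}
	m := mergeDumps(a, b)
	fmt.Println(sortedBuckets(m), m[60]["/api"]) // [60 120] 5
}
```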
### registry.go
- **`TargetRegistry`**: `sync.RWMutex`-protected `map[addr → name]`. Initialised with the
@@ -489,3 +519,6 @@ with a non-zero code on gRPC error.
| Regex filters compiled once per query (`CompiledFilter`) | Up to 288 × 5 000 per-entry calls — compiling per-entry would dominate query latency |
| Filter expression box (`q=`) redirects to canonical URL | Filter state stays in individual `f_*` params; URLs remain shareable and bookmarkable |
| `ListTargets` + frontend source picker | "Which nginx is busiest?" answered by switching `target=` to a collector; no data model changes, no extra memory |
| Backfill via `DumpSnapshots` on restart | Aggregator recovers full 24h ring from collectors on restart; gRPC server starts first so frontend is never blocked during backfill |
| `DumpRings()` copies under lock, streams without lock | Lock held for microseconds (slice-header copy only); network I/O happens outside the lock so minute rotation is never delayed |
| Backfill merges per-timestamp across collectors | Correct cross-collector sums per bucket, same semantics as live delta-merge; collectors that don't support `DumpSnapshots` are skipped gracefully |

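The compile-once decision in the table is the standard `regexp` pattern: compile at query setup, then call only `MatchString` in the per-entry loop. A minimal sketch; the lowercase `compiledFilter` shape here is illustrative of what a `CompiledFilter` might look like:

```go
package main

import (
	"fmt"
	"regexp"
)

// compiledFilter holds a regex compiled once per query. With up to
// 288 buckets × 5 000 entries per query, compiling inside the per-entry
// loop would dominate query latency.
type compiledFilter struct{ re *regexp.Regexp }

func newCompiledFilter(expr string) (*compiledFilter, error) {
	re, err := regexp.Compile(expr)
	if err != nil {
		return nil, err
	}
	return &compiledFilter{re: re}, nil
}

// Match is the only call made in the hot per-entry loop.
func (f *compiledFilter) Match(label string) bool { return f.re.MatchString(label) }

func main() {
	f, _ := newCompiledFilter(`^/api/`)
	entries := []string{"/api/v1/users", "/static/app.js", "/api/health"}
	n := 0
	for _, e := range entries {
		if f.Match(e) {
			n++
		}
	}
	fmt.Println(n) // 2
}
```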