ctool/docs/design.md
Pim van Pelt e18a89dcf0 Add Debian packaging, Makefile, manpages, tests, and design doc
Introduces a static-binary build and Debian package (amd64/arm64) with
version/commit/date stamped via -ldflags. Ships section-1 manpages for
ctool, ctfetch, and ctail. Adds a `version` subcommand reachable as
`ctool version`, `ctool -version`, `ctool --version`, `ctool fetch
version`, `ctool tail version`, and via the ctfetch/ctail symlinks. Adds
tests covering the dispatcher, fetch/tail argument parsing, and the
formatter/helper functions. Adds a retrofit design document modelled on
the vpp-maglev one, with FRs and NFRs for each tool and the dispatcher.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 22:21:32 +02:00

ctool Design Document

Metadata

Status Retrofit — describes shipped behavior as of v0.1.0
Author Pim van Pelt <pim@ipng.ch>
Last updated 2026-04-21
Audience Operators and contributors who will read the source tree next

The key words MUST, MUST NOT, SHOULD, SHOULD NOT, and MAY are used as described in RFC 2119, and are reserved in this document for requirements that are actually enforced in code or by an external dependency. Plain-language descriptions of what the system or an operator can do are written in lowercase — "can", "will", "does" — and should not be read as normative.

Summary

ctool is a small collection of command-line tools for working with Static CT API logs. It ships as a single Go binary that dispatches to one of several subcommands. Today there are two: fetch, which decodes one or more entries from a log tile as structured JSON, and tail, which follows a log's checkpoint and prints a one-liner per new certificate. The binary is also installed under two busybox-style symlinks, ctfetch and ctail, so that scripts can call a subcommand directly without the outer ctool prefix.

Background

The Static CT API defines a stateless on-disk layout for Certificate Transparency logs: each leaf lives in a 256-entry gzip-compressed data tile fetched by path, and a Merkle tree over those leaves is published as a parallel set of hash tiles. Because the layout is static, a log is just a tree of files on a web server — anyone with an HTTP client and a decoder can read one. The CT ecosystem is shifting from the old RFC 6962 "logs are a database" model to this static-tile model; operators running or watching such logs need ad-hoc tools to poke at tiles by hand and to watch a log grow in real time.

ctool is those tools, gathered under one roof so that packaging, versioning, and future shared utilities live in a single place. The project deliberately stays small: each subcommand is a thin wrapper over filippo.io/sunlight and Go's standard crypto/x509, with a small shared helper package for the fiddly parts (issuer fetching, SCT decoding, CT log list enrichment).

Goals and Non-Goals

Product Goals

  1. Operator utility first. The tools exist to make Static CT logs inspectable by a human with a shell. Scriptable output (JSON for fetch, fixed-column lines for tail) is a first-class concern.
  2. No state. Every invocation is independent; there are no databases, no lock files, no background processes.
  3. Portable packaging. A single statically-linked binary that works on any Linux amd64/arm64 host without touching system libraries.
  4. Extensible dispatcher. ctool will grow new subcommands over time. Adding one MUST NOT require a new binary, a new Debian package, or any changes outside the cmd/ctool tree.

Non-Goals

  • ctool is not a CT log itself. It does not host, sign, or distribute tiles.
  • It is not a CT monitor in the Google-Chrome-policy sense. Detecting misissuance is left to whatever the operator feeds ctool tail's output into.
  • It is not a general x509 pretty-printer. The decoded fields are the ones CT operators typically care about (SCTs, issuer chain, validity range); a general "dump every extension" mode is out of scope.
  • It does not secure its own outbound traffic. Plain HTTP is accepted when the operator passes an http:// URL; it is the operator's responsibility to prefer HTTPS for production logs.
  • It is not a long-running service. tail runs in a loop, but it is a foreground process — there is no daemon, no systemd unit, no PID file.

Requirements

Each requirement carries a unique identifier (FR-X.Y or NFR-X.Y) so that later sections can cite it.

Functional Requirements

FR-1 Dispatcher (ctool)

  • FR-1.1 A single binary MUST dispatch its first positional argument to a named subcommand. Unknown subcommands MUST exit non-zero with a usage banner on stderr.
  • FR-1.2 The binary MUST recognize busybox-style invocation: when os.Args[0]'s basename is ctfetch or ctail (with or without a platform-native extension such as .exe), the corresponding subcommand is invoked directly and the remaining argv is passed through unchanged.
  • FR-1.3 The dispatcher MUST expose a version subcommand that prints the build version, git commit hash, and build date on stdout and exits zero. The same information MUST also be reachable as ctool -version, ctool --version, ctool fetch version, and ctool tail version, so that "find out what you have" works regardless of which form the operator guesses first.
  • FR-1.4 Usage banners emitted by a subcommand MUST refer to the tool by its effective invocation name — ctfetch ... when invoked through the symlink, ctool fetch ... when invoked through the outer dispatcher — so that copy/paste from an error message yields a command that works.
  • FR-1.5 Adding a new subcommand MUST require only: a new cmd_<name>.go file in cmd/ctool/, a new case in the dispatcher's switch, and (optionally) a new busybox symlink in the package layout. No changes to the Makefile's build graph and no new binaries are required.

FR-2 fetch subcommand

  • FR-2.1 ctool fetch MUST operate in two modes, distinguished by whether the second positional argument parses as a signed decimal integer:
    • Leaf-index mode: <log-url> <leaf-index> [modifiers] fetches the data tile containing leaf-index and decodes the single entry at that position.
    • Tile-dump mode: <tile-url-or-file> [modifiers] fetches or reads one tile and decodes every entry in it.
  • FR-2.2 Tile-dump mode MUST auto-detect hash tiles versus data tiles from the tile contents. For hash tiles, the output is the list of 32-byte SHA-256 node hashes and the +sct, +issuer, +ctlog, and +all modifiers MUST be rejected as an error.
  • FR-2.3 The output format MUST be pretty-printed JSON on stdout with a trailing newline.
  • FR-2.4 The following positional modifiers MUST be accepted, and any other +-prefixed token MUST be rejected as an unknown argument:
    • +sct — decode embedded SCTs on final certificates.
    • +issuer — fetch and parse the issuer certificate from the log's /issuer/<fingerprint> endpoint.
    • +ctlog — enrich each SCT with operator and state data from the CT log list.
    • +all — shorthand for all three of the above.
  • FR-2.5 When +issuer is used in tile-dump mode, the subcommand MUST derive the log root from the tile URL by stripping the /tile/... path component. When the tile is a local file there is no URL to derive the root from, so the operator MUST provide --monitoring-url.
  • FR-2.6 Partial tile suffixes (.p/N) MUST be tried first when the path advertises one; on HTTP 404 the full tile MUST be fetched by stripping the suffix.
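
The mode selection in FR-2.1 and the modifier handling in FR-2.4 can be sketched as follows. This is a minimal illustration, not ctool's actual parser: parseFetchArgs, request, and options are hypothetical names, flag handling (--monitoring-url, --logs-list-url) is left out, and rejecting a negative leaf index is an assumption.

```go
package main

import (
	"fmt"
	"strconv"
)

// options records which enrichment modifiers were requested (FR-2.4).
type options struct {
	SCT, Issuer, CTLog bool
}

// request is the parsed form of fetch's positional arguments.
// All names here are hypothetical, chosen for illustration only.
type request struct {
	Target    string // log URL (leaf-index mode) or tile URL/file (tile-dump mode)
	LeafIndex int64  // only meaningful when LeafMode is true
	LeafMode  bool
	Opts      options
}

func parseFetchArgs(args []string) (request, error) {
	if len(args) == 0 {
		return request{}, fmt.Errorf("missing log URL or tile path")
	}
	req := request{Target: args[0]}
	rest := args[1:]
	// Leaf-index mode is selected when the second positional argument
	// parses as a signed decimal integer (FR-2.1).
	if len(rest) > 0 {
		if n, err := strconv.ParseInt(rest[0], 10, 64); err == nil {
			if n < 0 { // assumption: negative indexes are an error
				return request{}, fmt.Errorf("negative leaf index %d", n)
			}
			req.LeafIndex, req.LeafMode = n, true
			rest = rest[1:]
		}
	}
	for _, a := range rest {
		switch a {
		case "+sct":
			req.Opts.SCT = true
		case "+issuer":
			req.Opts.Issuer = true
		case "+ctlog":
			req.Opts.CTLog = true
		case "+all":
			req.Opts = options{SCT: true, Issuer: true, CTLog: true}
		default:
			return request{}, fmt.Errorf("unknown argument %q", a)
		}
	}
	return req, nil
}

func main() {
	for _, argv := range [][]string{
		{"https://log.example/", "42", "+all"},
		{"tile.bin", "+sct"},
		{"tile.bin", "+bogus"},
	} {
		req, err := parseFetchArgs(argv)
		fmt.Printf("%v -> %+v, err=%v\n", argv, req, err)
	}
}
```

A token such as +sct never parses as an integer, so modifiers cannot be mistaken for a leaf index.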

FR-3 tail subcommand

  • FR-3.1 ctool tail MUST poll the log's /checkpoint endpoint at a configurable interval (default 15s, minimum 1s), and for each completed data tile that appears after the current cursor, MUST decode every entry and print a one-line summary to stdout.
  • FR-3.2 By default, the cursor starts at the current tree tip so that only new entries are printed. --from-leaf N MUST override this so that operators can replay from a known index (including --from-leaf 0 for a full backfill).
  • FR-3.3 The one-liner format MUST be fixed-column and space-separated: leaf index (right-aligned, nine columns), type (cert or pre ), validity range (YYYY-MM-DD..YYYY-MM-DD), issuer label (up to 40 chars, truncated with ...), subject name.
  • FR-3.4 The subject name MUST be the first DNS SAN; falling back to the certificate's subject common name; falling back to the literal (unknown) if neither is present.
  • FR-3.5 The issuer label MUST prepend the issuer organisation to the issuer CN when the CN's first word is not already contained in the organisation name (e.g. R13 is shown as Let's Encrypt R13, but Let's Encrypt Authority X3 is shown verbatim).
  • FR-3.6 Status and error messages MUST go to stderr; one-liners MUST go to stdout, so that the output stream is safe to pipe into grep, awk, or a log shipper without interleaving diagnostic noise.
  • FR-3.7 A tile MUST NOT be fetched until the checkpoint confirms it is complete (256 entries), to avoid unnecessary 404s at the tree tip.
  • FR-3.8 A configurable minimum delay between outgoing HTTP requests MUST be enforced (default 2s, minimum 100ms), to stay a polite distance below any per-IP rate limit a log operator might enforce.

FR-4 Packaging and versioning

  • FR-4.1 The build MUST produce statically linked binaries for linux/amd64 and linux/arm64. CGO_ENABLED=0 MUST be the default so that the resulting binary has no libc dependency and runs on any Linux host of the matching architecture.
  • FR-4.2 The version, commit hash, and build date MUST be injected at link time via -ldflags -X, so that ctool version on a shipped binary identifies the exact build without requiring access to the source tree. The defaults baked into the source MUST keep go run / go build without ldflags usable.
  • FR-4.3 The Debian package MUST install ctool under /usr/bin, MUST provide /usr/bin/ctfetch and /usr/bin/ctail as symlinks to it, and MUST install the three section-1 manpages (ctool.1, ctfetch.1, ctail.1) gzipped under /usr/share/man/man1.

Non-Functional Requirements

NFR-1 Availability and reliability

  • NFR-1.1 A failing HTTP request MUST NOT crash tail. The error MUST be logged to stderr and the next poll MUST happen on schedule.
  • NFR-1.2 A malformed tile MUST NOT be silently skipped. If a tile cannot be read, the subcommand MUST return an error that names the offending leaf index.

NFR-2 Determinism and correctness

  • NFR-2.1 Given the same tile input, fetch MUST produce byte-identical JSON output. Timestamps, random IDs, and map iteration order MUST NOT leak into the output.
  • NFR-2.2 tail's polling interval timer MUST start when the checkpoint is fetched, not when the previous loop finished, so that time spent fetching data tiles counts against the interval and the next poll stays on schedule.

NFR-3 Performance and scalability

  • NFR-3.1 fetch MUST fetch at most one tile per invocation. Leaf-index mode derives the one tile that contains the requested index; tile-dump mode operates on exactly the tile the operator named.
  • NFR-3.2 Caches (CT log list, issuer certificates) MUST be scoped to a single invocation. No on-disk cache, no cross-invocation state.
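
The "one tile per invocation" arithmetic behind NFR-3.1 can be illustrated with a stand-alone sketch. ctool itself delegates path derivation to filippo.io/sunlight; the functions below (locate, dataTilePath, fullTilePath) are hypothetical and only mirror the tile path encoding as I understand it from the Static CT API layout: three-digit groups, with every group except the last prefixed by x.

```go
package main

import (
	"fmt"
	"strings"
)

const tileWidth = 256 // entries in a full data tile

// locate returns the data tile that holds a leaf and the leaf's offset in it.
func locate(leaf uint64) (tile uint64, offset int) {
	return leaf / tileWidth, int(leaf % tileWidth)
}

// dataTilePath renders the relative path of a data tile. A width between
// 1 and 255 selects the partial tile form with a ".p/<width>" suffix.
func dataTilePath(tile uint64, width int) string {
	// Split the index into 3-digit groups, least significant group first,
	// then prepend the higher groups with an "x" prefix.
	parts := []string{fmt.Sprintf("%03d", tile%1000)}
	for tile >= 1000 {
		tile /= 1000
		parts = append([]string{fmt.Sprintf("x%03d", tile%1000)}, parts...)
	}
	p := "tile/data/" + strings.Join(parts, "/")
	if width > 0 && width < tileWidth {
		p += fmt.Sprintf(".p/%d", width)
	}
	return p
}

// fullTilePath strips a ".p/N" suffix, which is the fallback described
// in FR-2.6 for when the partial tile returns HTTP 404.
func fullTilePath(p string) string {
	if i := strings.LastIndex(p, ".p/"); i >= 0 {
		return p[:i]
	}
	return p
}

func main() {
	tile, off := locate(513)
	fmt.Println(tile, off, dataTilePath(tile, 0))
	partial := dataTilePath(1234067, 112)
	fmt.Println(partial, "->", fullTilePath(partial))
}
```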

NFR-4 Security

  • NFR-4.1 Tiles MUST be decompressed with a bounded expansion ratio (currently 100×) to prevent a malicious log from driving the client out of memory via a zip bomb.
  • NFR-4.2 The tools MUST NOT require any Linux capabilities or root privileges. They are plain HTTP clients and run unprivileged.
  • NFR-4.3 ctool MUST NOT write anywhere on the filesystem except when the operator explicitly asks it to (e.g. by redirecting stdout). There is no cache directory, no log file, no state directory.

NFR-5 Operability

  • NFR-5.1 Every CLI flag SHOULD have a sensible default so that the most common invocation is a single URL plus maybe a leaf index.
  • NFR-5.2 The JSON output of fetch SHOULD be stable across patch releases within a minor version. Field additions are allowed; renames and removals are not.
  • NFR-5.3 Subcommand usage banners SHOULD include at least one concrete example against mon.ct.ipng.ch, which is the reference log we test against.

Architecture Overview

Process Model

ctool is a short-lived foreground process. There are no daemons, no sockets, no persistent state. Each invocation:

  1. Parses argv and dispatches to a subcommand.
  2. Makes some number of HTTP requests (or reads a local file).
  3. Writes JSON or one-liners to stdout.
  4. Exits.

tail is the one exception to step 3 being a single write: it loops, making one checkpoint request per interval and zero or more tile requests per loop iteration. It still exits on Ctrl-C or on an unrecoverable error; there is no restart policy because there is no supervisor.

Data Flow

Configuration flows in as command-line flags and positional arguments (there is no config file, no environment variables, no /etc/ anything). Data flows in from the Static CT log's HTTP surface: the /checkpoint endpoint, the /tile/... tree, and the /issuer/<fp> endpoint. The CT log list JSON (default: gstatic.com/ct/log_list/v3/all_logs_list.json) is consumed optionally when +ctlog is requested. Data flows out as JSON or fixed-column lines on stdout, and as status/error messages on stderr.

Components

ctool (dispatcher)

ctool is the outer binary: a Go main package under cmd/ctool/ whose job is to decide which subcommand to run and hand argv off to it. It is deliberately tiny — today about forty lines of code — because every new subcommand adds a case to its switch and nothing else.

Responsibilities

  • Examine os.Args[0] and, if it matches the basename of a known busybox symlink (ctfetch, ctail), route directly to that subcommand (FR-1.2).
  • Otherwise, examine os.Args[1] and route to the named subcommand; on an unknown name, print usage and exit non-zero (FR-1.1).
  • Handle the version / -version / --version top-level aliases by printing the build version and exiting zero (FR-1.3).
  • Provide a single cmdName() helper that subcommands use when building usage banners, so that every banner refers to the effective invocation name (FR-1.4).
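
A minimal sketch of this routing, under stated assumptions: route is a hypothetical pure function standing in for the dispatcher's switch, and it returns the effective invocation name that a cmdName()-style helper would report. The per-subcommand version aliases (ctool fetch version, ctool tail version) are not modelled here.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// route decides which subcommand to run from a full argv.
// It returns the subcommand, the arguments to hand to it, the effective
// invocation name for usage banners, and whether routing succeeded.
func route(argv []string) (sub string, args []string, name string, ok bool) {
	if len(argv) == 0 {
		return "", nil, "ctool", false
	}
	// Busybox-style invocation: look at the basename of argv[0],
	// tolerating a platform-native ".exe" extension (FR-1.2).
	base := strings.TrimSuffix(filepath.Base(argv[0]), ".exe")
	switch base {
	case "ctfetch":
		return "fetch", argv[1:], "ctfetch", true
	case "ctail":
		return "tail", argv[1:], "ctail", true
	}
	if len(argv) < 2 {
		return "", nil, base, false
	}
	switch argv[1] {
	case "fetch", "tail":
		return argv[1], argv[2:], base + " " + argv[1], true
	case "version", "-version", "--version":
		return "version", nil, base, true
	}
	return "", nil, base, false // caller prints usage on stderr, exits non-zero
}

func main() {
	for _, argv := range [][]string{
		{"/usr/bin/ctfetch", "https://log.example/", "5"},
		{"ctool", "tail", "https://log.example/"},
		{"ctool", "--version"},
		{"ctool", "bogus"},
	} {
		sub, args, name, ok := route(argv)
		fmt.Printf("%v -> sub=%q args=%v name=%q ok=%v\n", argv, sub, args, name, ok)
	}
}
```

Keeping the decision in a pure function makes the dispatcher easy to test without spawning processes or touching os.Args.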

Extension Model

Adding a new subcommand foo is a three-step change:

  1. Add cmd/ctool/cmd_foo.go with a package-level runFoo(args []string) function.
  2. Add case "foo": runFoo(os.Args[2:]) to main()'s switch and a matching line to usage().
  3. (Optional, if busybox-style invocation is wanted) Add case "ctfoo": runFoo(os.Args[1:]) to the symlink switch, add the symlink to debian/build-deb.sh, and add a ctfoo(1) manpage.

No Makefile change is required (FR-1.5) because the build target globs over ./cmd/ctool/. No new Debian package is required; the existing ctool package gets the new subcommand for free, and optionally a new symlink.

version is specifically not a subcommand file — it lives in the dispatcher because every subcommand reuses the same printVersion() helper for its own version alias (ctool fetch version, ctool tail version).

Interfaces

Presents.

  • A command-line interface driven by os.Args. The exit-code contract is 0 on success, 1 on unknown subcommand or subcommand-internal fatal error, and whatever the subcommand chooses otherwise.
  • stdout and stderr streams. Structured output (JSON, one-liners) goes to stdout; banners, progress messages, and errors go to stderr (FR-1.1, FR-3.6).

Consumes.

  • os.Args — the sole input to the dispatcher.

fetch

fetch (reachable as ctool fetch or ctfetch) decodes one or more entries from a Static CT log tile.

Responsibilities

  • Distinguish leaf-index mode from tile-dump mode by whether args[1] parses as a decimal integer (FR-2.1).
  • Fetch the relevant tile (or read it from disk), decompress it with a bounded ratio (NFR-4.1), and decode each leaf.
  • Optionally enrich each entry with decoded SCTs, the issuer certificate, and CT-log-list metadata per the +sct, +issuer, +ctlog, and +all modifiers (FR-2.4).
  • In tile-dump mode, auto-detect hash tiles versus data tiles and refuse the cert-oriented modifiers on hash tiles (FR-2.2).
  • Emit pretty-printed JSON on stdout (FR-2.3).

Tile Fetching

The tile URL is derived from the requested leaf index by the TilePath function of filippo.io/sunlight; fetch tries the partial tile suffix first (.p/N) and falls back to the full tile on a 404 (FR-2.6). Both paths return gzipped bytes, which are then decompressed through an io.LimitReader that caps expansion at 100× the input size (NFR-4.1). Tile-dump mode on a local file skips the URL derivation and simply reads the file.

Enrichment

The optional modifiers trigger network calls outside the main tile fetch:

  • +sct parses the SCT list extension from the DER cert in-memory; no extra network calls.
  • +issuer fetches /issuer/<fp> from the log root for each referenced issuer fingerprint. Results are cached for the lifetime of the invocation (NFR-3.2).
  • +ctlog fetches the CT log list JSON once per invocation and looks up each SCT's log ID against it.

+all is syntactic sugar for all three.

Interfaces

Presents.

  • Pretty-printed JSON on stdout, with a trailing newline. The top-level object is either a single Entry (leaf-index mode), a DumpResult with an entries array (data tile), or a DumpResult with a hash_tile field (hash tile).

Consumes.

  • Positional arguments (log URL, leaf index, modifiers) and two flags (--logs-list-url, --monitoring-url) from argv.
  • The Static CT log's HTTP surface — checkpoint is not read by fetch, but the tile tree and the issuer endpoint are.
  • The outbound network directly. No proxy handling, no capability requirements (NFR-4.2).

tail

tail (reachable as ctool tail or ctail) follows a Static CT log's checkpoint and prints a one-line summary per new entry.

Responsibilities

  • Poll the log's /checkpoint on a configurable interval and track the current tree size (FR-3.1).
  • For each completed data tile whose entries have not yet been printed, fetch it, decompress it, decode each leaf, and print a fixed-column one-liner to stdout (FR-3.1, FR-3.3).
  • Throttle outbound HTTP requests with a configurable minimum inter-request delay (FR-3.8).
  • Respect a configurable starting leaf index so that operators can replay from any point in the log (FR-3.2).

Poll Loop

Each iteration fetches the checkpoint first and records the timestamp; then walks forward from the cursor tile by tile, stopping at the first tile that the checkpoint does not yet confirm complete (FR-3.7). The next poll sleeps until checkpoint_time + interval, so that the time spent fetching data tiles counts against the interval and consecutive polls stay on a stable cadence (NFR-2.2).

A tile fetch failure is logged to stderr and the iteration is retried on the next poll (NFR-1.1); the cursor does not advance past a leaf that was never printed, so no entries are lost even across transient log outages.

One-Liner Format

Columns are fixed-width and separated by spaces. The format string is documented in docs/ctail.md and reproduced here:

%9d %-4s  %-21s  %-40s %s

Fields: leaf index (decimal), type (cert or pre ), validity range (YYYY-MM-DD..YYYY-MM-DD), issuer label (truncated to 40 with ...), subject name. The truncation rule and the issuer-with-org prepending rule are both tested (FR-3.3, FR-3.4, FR-3.5).

Interfaces

Presents.

  • Fixed-column one-liners on stdout, one per log entry.
  • Status and error messages on stderr (FR-3.6).

Consumes.

  • Positional <log-url> plus flags --interval, --from-leaf, --rate-limit, --user-agent.
  • The log's HTTP surface (/checkpoint and the tile tree).
  • The outbound network directly.

Shared Helpers (internal/utils)

internal/utils holds the code that both fetch and tail call into but neither one should own: tile fetching with partial fallback, bounded gzip decompression, SCT decoding, issuer fetching with per-invocation caching, and CT-log-list enrichment. The package is deliberately imported only from cmd/ctool/; nothing outside the binary depends on it.

The helpers are not tested in isolation today beyond what the binary-level tests exercise; if the package grows substantially, per-package unit tests SHOULD be added before it becomes difficult to reason about.

Operational Concerns

Packaging

The repository ships a single Debian package, ctool, built for amd64 and arm64. The package contains /usr/bin/ctool plus two symlinks (ctfetch, ctail) and three manpages under /usr/share/man/man1. There are no conffiles, no maintainer scripts, and no systemd units; the package is a pure binary-plus-docs bundle (FR-4.3).

Binaries are produced with CGO_ENABLED=0 (FR-4.1). The effect is that the binary has no libc dependency and runs on any Linux amd64/arm64 host regardless of glibc version, at the cost of Go's net package using the pure-Go resolver instead of getaddrinfo (i.e. NSS plugin modules are never loaded). ctool only needs ordinary DNS lookups for log hostnames, so this is fine.

Versioning

Versioning follows Semantic Versioning. The authoritative version string lives in the Makefile's VERSION variable; on each build, the Makefile stamps main.version, main.commit, and main.date with -ldflags -X (FR-4.2). The commit and date are derived from git rev-parse --short HEAD and an ISO-8601 UTC timestamp at build time. ctool version prints all three; the Debian package's Version: field carries only the release version.
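
Link-time stamping of this kind typically looks like the sketch below. The variable names match the main.version, main.commit, and main.date symbols named above; the default values, the versionString helper, and the exact output format are assumptions made for illustration.

```go
package main

import "fmt"

// Defaults keep `go run` and a plain `go build` usable (FR-4.2).
// A release build would override them at link time, for example:
//
//	go build -ldflags "-X main.version=0.1.0 \
//	  -X main.commit=$(git rev-parse --short HEAD) \
//	  -X main.date=$(date -u +%Y-%m-%dT%H:%M:%SZ)" ./cmd/ctool/
//
// -X only works on package-level string variables, which is why these
// are vars rather than constants.
var (
	version = "dev"
	commit  = "unknown"
	date    = "unknown"
)

// versionString formats the three stamped values for display.
func versionString(name string) string {
	return fmt.Sprintf("%s %s (commit %s, built %s)", name, version, commit, date)
}

func main() {
	fmt.Println(versionString("ctool"))
}
```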

Failure Modes

  • Log returns 404 on a tile we expected. Treat as a transient error; tail logs to stderr and retries on the next poll (NFR-1.1). fetch exits non-zero with the HTTP status in the error message.
  • Log returns a gzip bomb. The decompress helper's LimitReader caps expansion at 100× the input size (NFR-4.1); an attempt to exceed it surfaces as a decode error, not an out-of-memory crash.
  • Checkpoint is malformed. tail logs a "checkpoint error" on stderr and sleeps for one interval before retrying.
  • CT log list is unreachable. +ctlog degrades: a warning is printed to stderr and entries go out without the ctlog enrichment rather than failing the whole invocation.
  • Operator passes a hash tile URL with +sct / +issuer / +ctlog. The subcommand detects the tile type and exits with an error explaining that cert-oriented modifiers do not apply to hash tiles (FR-2.2).

Observability

ctool has no metrics and no structured logging surface of its own. Observability in practice is whatever the operator builds around the output streams:

  • Pipe ctool tail into a log shipper or awk to watch for specific issuers or subjects.
  • Pipe ctool fetch ... +all into jq for ad-hoc inspection.
  • Exit codes (0 / non-zero) are the only machine-readable signal of success or failure.

If a future operational need calls for structured logs or metrics (e.g. a ctool watch subcommand that is meant to be run under systemd), it can be added without disturbing fetch or tail.

Security

The tools run unprivileged (NFR-4.2) and write nothing to disk outside stdout (NFR-4.3). They do not handle credentials, do not parse user-supplied certificates for signature verification, and do not act as a TLS server.

TLS verification on outbound requests uses Go's default roots; there is no flag to disable it. Operators probing a log over plain HTTP accept the risk of in-flight modification.

Alternatives Considered

  • Separate ctfetch and ctail binaries, no dispatcher. Rejected in favor of one binary with symlinks. A single binary is simpler to package, ship, and version, and scales better as new subcommands land (FR-1.5). Operators who prefer the separate-binary UX still get it through the ctfetch / ctail symlinks.
  • A full CLI framework (cobra, urfave/cli). Rejected because the dispatcher is under fifty lines and the subcommand parsers use the standard flag package. A framework would add a dependency and a build surface without improving the operator experience at this scale. If the subcommand count grows past ~five, this decision should be revisited.
  • Embedding a local cache for issuer certificates across invocations. Rejected for now (NFR-3.2). Caching introduces a state directory, a cache-invalidation policy, and a new failure mode (stale issuer); since fetch is typically run ad-hoc rather than in a tight loop, the per-invocation cache is sufficient.
  • Making tail a daemon with a systemd unit. Rejected in favor of a plain foreground loop. Operators who want a daemon wrap it in systemd-run, tmux, or their orchestration tool of choice; the tool itself stays stateless and exit-code honest.

Open Questions

  • Output schema stability. fetch's JSON output is currently whatever internal/utils emits. NFR-5.2 commits to additive compatibility across patch releases, but a minor-version bump may still reshape fields. A more explicit schema document (or a --schema-version flag) would help downstream consumers.
  • More subcommands. The dispatcher is built for growth. Candidate ideas include ctool verify (checkpoint signature + consistency proof) and ctool stats (issuer/validity distribution over a time window). These are out of scope for v0.1 but the FR-1.x requirements exist precisely so the additions stay cheap.
  • Proxy and retry policy. There is no HTTP retry and no proxy handling today. A log that 502s intermittently will surface those as visible errors on stderr (NFR-1.1), but the cursor still stalls until the log recovers. A configurable retry/backoff policy may be worth adding once we see a deployment hit this in practice.