instance, which sets it apart from TesseraCT which can only handle one log per instance. Each log
defines a `submissionprefix` which `sunlight` will use to accept writes, and a `monitoringprefix`
which `skylight` will use for reads.
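To make this concrete, here's a minimal sketch of such a per-log stanza. Only the field names
`submissionprefix` and `monitoringprefix` come from this article; the hostnames are made up and
the exact layout is best checked against the Sunlight documentation:

```
logs:
  - name: ctlog-test.lab.ipng.ch
    submissionprefix: https://submit.lab.ipng.ch
    monitoringprefix: https://monitor.lab.ipng.ch
```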
I stumble across a small issue - I haven't created multiple DNS hostnames for the test machine. So I
decide to use a different port for one versus the other. The write path will use TLS on port 1443
while Skylight will point to a normal HTTP port 1080. And considering I don't have a certificate for
`*.lab.ipng.ch`, I will use a self-signed one instead:
```
pim@ctlog-test:/etc/sunlight$ openssl genrsa -out ca.key 2048
pim@ctlog-test:/etc/sunlight$ openssl req -newkey rsa:2048 -nodes -keyout sunlight-key.pem -out sunlight.csr
pim@ctlog-test:/etc/sunlight# openssl x509 -req -extfile \
<(printf "subjectAltName=DNS:ctlog-test.lab.ipng.ch,DNS:ctlog-test.lab.ipng.ch") -days 365 \
-in sunlight.csr -CA ca.crt -CAkey ca.key -CAcreateserial -out sunlight.pem
ln -s sunlight.pem skylight.pem
ln -s sunlight-key.pem skylight-key.pem
```
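To double-check that the SAN made it into the certificate, something like this should do; the
output shown is what I'd expect rather than a capture from the lab:

```
pim@ctlog-test:/etc/sunlight$ openssl x509 -in sunlight.pem -noout -ext subjectAltName
X509v3 Subject Alternative Name:
    DNS:ctlog-test.lab.ipng.ch, DNS:ctlog-test.lab.ipng.ch
```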
This little snippet yields `sunlight.pem` (the certificate) and `sunlight-key.pem` (the private
key), and symlinks them to `skylight.pem` and `skylight-key.pem` for simplicity. With these in hand,
I can start the rest of the show. First I will prepare the NVMe storage with a few datasets in
which Sunlight will store its data:
```
pim@ctlog-test:~$ sudo zfs create ssd-vol0/sunlight-test
pim@ctlog-test:~$ sudo zfs create ssd-vol0/sunlight-test/shared
pim@ctlog-test:~$ sudo zfs create ssd-vol0/sunlight-test/logs
pim@ctlog-test:~$ sudo zfs create ssd-vol0/sunlight-test/logs/sunlight-test
pim@ctlog-test:~$ sudo chown -R pim:pim /ssd-vol0/sunlight-test
```
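A quick sanity check that the datasets exist and are mounted where the `chown` above expects them
(output indicative):

```
pim@ctlog-test:~$ zfs list -r ssd-vol0/sunlight-test -o name,mountpoint
NAME                                        MOUNTPOINT
ssd-vol0/sunlight-test                      /ssd-vol0/sunlight-test
ssd-vol0/sunlight-test/shared               /ssd-vol0/sunlight-test/shared
ssd-vol0/sunlight-test/logs                 /ssd-vol0/sunlight-test/logs
ssd-vol0/sunlight-test/logs/sunlight-test   /ssd-vol0/sunlight-test/logs/sunlight-test
```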
Then I'll create the Sunlight configuration:
```
pim@ctlog-test:/etc/sunlight$ sunlight-keygen -f sunlight-test.seed.bin
Log ID: IPngJcHCHWi+s37vfFqpY9ouk+if78wAY2kl/sh3c8E=
ECDSA public key:
-----BEGIN PUBLIC KEY-----
MFkwEwYHKoZIzj0CAQYIKoZIzj0DAQcDQgAE6Hg60YncYt/V69kLmg4LlTO9RmHR
...
-----END PUBLIC KEY-----
Ed25519 public key:
-----BEGIN PUBLIC KEY-----
...
-----END PUBLIC KEY-----
```
The first block creates key material for the log, and I get a fun surprise: the Log ID starts
precisely with the string IPng... what are the odds that that would happen!? I should tell Antonis
about this, it's dope!
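For the curious: the Log ID is, if I'm not mistaken, just the base64 of the SHA-256 hash over the
log's DER-encoded public key (per RFC 6962), so an `IPng` prefix really is luck of the draw.
Assuming the ECDSA public key from above were saved as `ecdsa-pub.pem` (a hypothetical filename),
this should reproduce it:

```
pim@ctlog-test:/etc/sunlight$ openssl pkey -pubin -in ecdsa-pub.pem -outform DER \
    | openssl dgst -sha256 -binary | base64
IPngJcHCHWi+s37vfFqpY9ouk+if78wAY2kl/sh3c8E=
```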
As a safety precaution, Sunlight requires the operator to make the `checkpoints.db` by hand, which
I'll also do:
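The actual command is elided in this excerpt; from memory, the Sunlight README has the operator
create it with `sqlite3` along these lines (the path and table schema are my recollection, so
double-check upstream):

```
pim@ctlog-test:/etc/sunlight$ sqlite3 /ssd-vol0/sunlight-test/shared/checkpoints.db \
    "CREATE TABLE checkpoints (logID BLOB PRIMARY KEY, body TEXT)"
```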
When learning about [[Tessera]({{< ref 2025-07-26-ctlog-1 >}})], I already kind of came to the
conclusion that, for our case at IPng at least, running the fully cloud-native version with S3
storage and MySQL database gave both poorer performance and more operational complexity. But
I find it interesting to compare behavior and performance, so I'll start by creating a Sunlight log
using backing MinIO SSD storage.
I'll first create the bucket and a user account to access it:
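The `mc` commands themselves are elided in this excerpt, but with MinIO they would be along these
lines; the bucket name, username and policy here are my guesses:

```
pim@ctlog-test:~$ mc mb ssd/sunlight-test
pim@ctlog-test:~$ mc admin user add ssd sunlight-test <secret-key>
pim@ctlog-test:~$ mc admin policy attach ssd readwrite --user sunlight-test
```

The S3-flavored config `sunlight-s3.yaml` and the daemon start are likewise elided here; the first
probe of the new log then looked like this: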
```
pim@ctlog-test:~$ curl -k https://ctlog-test.lab.ipng.ch:1443/log.v3.json
404 page not found
```
I'm starting to think that using a non-standard listen port won't work, or more precisely, adding
a port in the `monitoringprefix` won't work. I notice that the logname is called
`ctlog-test.lab.ipng.ch:1443`, which I don't think is supposed to have a port number in it. So
instead, I make Sunlight `listen` on port 443 and omit the port in the `submissionprefix`, and give
it and its companion Skylight the needed privileges to bind the privileged port like so:
```
pim@ctlog-test:~$ sudo setcap 'cap_net_bind_service=+ep' /usr/local/bin/sunlight
pim@ctlog-test:~$ sudo setcap 'cap_net_bind_service=+ep' /usr/local/bin/skylight
pim@ctlog-test:~$ sunlight -testcert -c /etc/sunlight/sunlight-s3.yaml
```
{{< image width="60%" src="/assets/ctlog/sunlight-test-s3.png" alt="Sunlight testlog / S3" >}}
And with that, Sunlight reports for duty and the links work. Hoi!
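As a quick smoke test, the earlier probe, now without the port in the URL, should return the log
metadata rather than a 404 (hypothetical re-run):

```
pim@ctlog-test:~$ curl -k https://ctlog-test.lab.ipng.ch/log.v3.json
```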
#### Sunlight: Loadtesting S3
paths, and I've created a snakeoil self-signed cert. CT Hammer does not accept that,
so I need to make a tiny change to the Hammer:
```
pim@ctlog-test:~/src/tesseract$ git diff
diff --git a/internal/hammer/hammer.go b/internal/hammer/hammer.go
index 3828fbd..1dfd895 100644
--- a/internal/hammer/hammer.go
```

```
pim@ctlog-test:/etc/sunlight$ T=0; O=0; while :; do \
25196 1 seconds 87 certs
```
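The body of that watcher loop is cut off in this excerpt; it is essentially a one-second poll of
the log's checkpoint, printing the tree size delta. A sketch of what it likely looks like - the
`/checkpoint` path comes from the static-ct monitoring API, the rest is my reconstruction:

```
pim@ctlog-test:/etc/sunlight$ T=0; O=0; while :; do \
    T=$(curl -ksS https://ctlog-test.lab.ipng.ch/checkpoint | sed -n 2p); \
    echo "$T 1 seconds $((T-O)) certs"; O=$T; sleep 1; \
  done
```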
On the first commandline I'll start the loadtest at 100 writes/sec with the standard duplication
probability of 10%, which allows me to test Sunlight's ability to avoid writing duplicates. This
means I should see the tree grow at about 90/s on average. Check. I raise the write-load to
500/s:
```
39421 1 seconds 443 certs
41194 1 seconds 448 certs
```
.. and to 1'000/s:
```
57941 1 seconds 945 certs
58886 1 seconds 970 certs
W0810 14:55:29.660710 1398779 analysis.go:134] (1 x) failed to create request: f
W0810 14:55:30.496603 1398779 analysis.go:124] (1 x) failed to create request: write leaf was not OK. Status code: 500. Body: "failed to read body: read tcp 127.0.1.1:443->127.0.0.1:44908: i/o timeout\n"
```
I raise the Hammer load to 5'000/sec (which means 4'500/s unique certs and 500 duplicates), and find
the committed writes/sec maxing out at around 4'200/s:
```
879637 1 seconds 4213 certs
883850 1 seconds 4207 certs
W0810 15:00:05.496459 1398779 analysis.go:124] (1 x) failed to create request: f
W0810 15:00:07.187181 1398779 analysis.go:124] (1 x) failed to create request: failed to write leaf: Post "https://ctlog-test.lab.ipng.ch/ct/v1/add-chain": EOF
```
At this load of 4'200/s, MinIO is not very impressed. Remember in the [[other article]({{< ref
2025-07-26-ctlog-1 >}})] I loadtested it to about 7'500 ops/sec and the statistics below are about
50 ops/sec (2'800/min). I conclude that MinIO is, in fact, bored of this whole activity:
```
pim@ctlog-test:/etc/sunlight$ mc admin trace --stats ssd
s3.PutObject 37602 (70.3%) 1982.2 6.2ms 785µs 86.7ms 6.1ms 86
s3.GetObject 15918 (29.7%) 839.1 996µs 670µs 51.3ms 912µs 51.2ms ↑46B ↓3.0K ↑38K ↓2.4M 0
```
Sunlight still keeps its certificate cache on local disk. At a rate of 4'200/s, the ZFS pool has a
write rate of about 105MB/s with about 877 ZFS writes per second.
```
pim@ctlog-test:/etc/sunlight$ zpool iostat -v ssd-vol0 10
capacity operations bandwidth
pool alloc free read write read write
-------------------------- ----- ----- ----- ----- ----- -----
```

A few interesting observations:
* The write rate to ZFS is significantly higher with Sunlight than TesseraCT (about 8:1). This is
  likely explained by the sqlite3 database living on ZFS here, while TesseraCT uses MariaDB
  running on a different filesystem.
* The MinIO usage is a lot lighter. As I reduce the load to 1'000/s, as was the case in the
  TesseraCT test, I can see the ratio of Get:Put was 93:4 in TesseraCT, while it's 70:30 here.
  TesseraCT was also consuming more IOPS, running at about 10.5k requests/minute, while Sunlight is
  significantly calmer at 2.8k requests/minute (almost 4x less!)
* The burst capacity of Sunlight is a fair bit higher than TesseraCT, likely due to its more
efficient use of S3 backends.
***Conclusion***: Sunlight S3+MinIO can handle 1'000/s reliably, and can spike to 4'200/s with only
a few errors.
#### Sunlight: Loadtesting POSIX
When I took a closer look at TesseraCT a few weeks ago, it struck me that while a cloud-native
setup with S3 storage would allow for a cool way to enable storage scaling and read-path
redundancy, by creating synchronously replicated buckets, it does come at a significant
operational overhead and complexity. My main concern is the amount of different moving parts, and
Sunlight really has one very appealing property: it can run entirely on one machine without the need
for any other moving parts - even the SQL database is linked in. That's pretty slick.
```
pim@ctlog-test:/etc/sunlight$ cat << EOF > sunlight.yaml
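# (Body elided in this excerpt. Per the surrounding text, this is the POSIX
# variant of the earlier sunlight-s3.yaml: it listens on :443, points at the
# checkpoints.db created above, and uses the local ZFS datasets under
# /ssd-vol0/sunlight-test instead of the MinIO bucket.)
EOF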
```

```
sunlight_sqlite_update_duration_seconds{quantile="0.99"} 0.014922489
```
I'm seeing here that at a load of 100/s (with 90/s of unique certificates), the 99th percentile
add-chain latency is 207ms, which makes sense because the `period` configuration field is set to
200ms. The filesystem operations (discard, fetch, upload) are _de minimis_ and the sequencing
duration is at 109ms. Excellent!
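These quantiles come from Sunlight's Prometheus metrics. Assuming the usual `/metrics` endpoint - I
haven't double-checked the exact path Sunlight exposes - they can be pulled with something like:

```
pim@ctlog-test:/etc/sunlight$ curl -ksS https://ctlog-test.lab.ipng.ch/metrics \
    | grep sunlight_sqlite_update_duration_seconds
```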
But can this thing go really fast? I do remember that the CT Hammer uses more CPU than TesseraCT,
and I've seen above, when running my 5'000/s loadtest, that that's about all the hammer can take on
a single Dell R630. So, as I did with the TesseraCT test, I'll use the MinIO SSD and MinIO Disk
machines to generate the load.
I boot them, so that I can hammer, or shall I say jackhammer away:
```
pim@ctlog-test:~/src/tesseract$ go run ./internal/hammer --origin=ctlog-test.lab.ipng.ch \
pim@minio-disk:~/src/tesseract$ go run ./internal/hammer --origin=ctlog-test.lab.ipng.ch \
--max_read_ops=0 --num_writers=5000 --max_write_ops=5000 --serial_offset=2000000
```
This will generate 15'000/s of load, which I note does bring Sunlight to its knees, although it does
remain stable (yaay!) with a somewhat more bursty checkpoint interval:
```
5504780 1 seconds 4039 certs
pim@ctlog-test:/etc/sunlight$ while :; do curl -ksS https://ctlog-test.lab.ipng.ch
```
This rate boils down to `(6576712-6008831)/120` or 4'700/s of written certs, which at a duplication
ratio of 10% means approximately 5'200/s of total accepted certs. At this rate, Sunlight is consuming
about 10.3 CPUs/s, while Skylight is at 0.1 CPUs/s and the CT Hammer is at 11.1 CPUs/s. Given the 40
threads on this machine, I am not saturating the CPU, but I'm curious as this rate is significantly
lower than TesseraCT. I briefly turn off the hammer on `ctlog-test` to allow Sunlight to monopolize
the entire machine. The CPU use does reduce to about 9.3 CPUs/s, suggesting that indeed, the
bottleneck is not strictly CPU:
{{< image width="90%" src="/assets/ctlog/btop-sunlight.png" alt="Sunlight btop" >}}
When using only two CT Hammers (on `minio-ssd.lab.ipng.ch` and `minio-disk.lab.ipng.ch`), the CPU
use on the `ctlog-test.lab.ipng.ch` machine definitely goes down (CT Hammer is kind of a CPU hog...),
but the resulting throughput doesn't change that much.
## Wrapup - Observations
From an operator's point of view, TesseraCT and Sunlight handle quite differently. Both are easily up
to the task of serving the current write-load (which is about 250/s).
* ***S3***: When using the S3 backend, TesseraCT became quite unhappy above 800/s while Sunlight
  went all the way up to 4'200/s and sent significantly fewer requests to MinIO (about 4x less),
  while showing good telemetry on the use of S3 backends.
* ***POSIX***: When using a normal filesystem, Sunlight seems to peak at 4'800/s while TesseraCT
* ...precaution - but also using ports in those prefixes (other than the standard 443) rendered
  Sunlight and Skylight unusable for me.
* Skylight only serves from a local directory; it does not have support for S3. For operators using
  S3, an alternative could be to use NGINX in the serving path, similar to TesseraCT. Skylight does
  have a few things to teach me though, notably on proper compression, content types and other
  headers.
* TesseraCT does not have a configuration file, and will run exactly one log per binary
  instance. It uses flags to construct the environment, and is much more forgiving of creative
  `origin` (log name), and submission- and monitoring URLs. It's happy to use regular 'http://'