A few readability edits

This commit is contained in:
2025-08-10 18:50:00 +02:00
parent f4ed332b18
commit 4f0188abeb


@@ -89,24 +89,28 @@ pim@ctlog-test:/etc/sunlight$ openssl req -newkey rsa:2048 -nodes -keyout sunlig
pim@ctlog-test:/etc/sunlight# openssl x509 -req -extfile \
<(printf "subjectAltName=DNS:ctlog-test.lab.ipng.ch,DNS:ctlog-test.lab.ipng.ch") -days 365 \
-in sunlight.csr -CA ca.crt -CAkey ca.key -CAcreateserial -out sunlight.pem
ln -s sunlight.pem skylight.pem
ln -s sunlight-key.pem skylight-key.pem
```
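A small optional sanity check (not part of the original transcript, filenames assumed to match the snippet above): the certificate and private key belong together exactly when their public-key digests are identical.

```shell
# Optional sanity check: both digests should print the same value if the
# certificate was issued for this private key (filenames as above).
openssl x509 -noout -pubkey -in sunlight.pem | openssl sha256
openssl pkey -pubout -in sunlight-key.pem | openssl sha256
```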
This little snippet yields `sunlight.pem` (the certificate) and `sunlight-key.pem` (the private
key), and symlinks them to `skylight.pem` and `skylight-key.pem` for simplicity. With these in hand,
I can start the rest of the show. First I will prepare the NVMe storage with a few datasets in
which Sunlight will store its data:
```
pim@ctlog-test:~$ sudo zfs create ssd-vol0/sunlight-test
pim@ctlog-test:~$ sudo zfs create ssd-vol0/sunlight-test/shared
pim@ctlog-test:~$ sudo zfs create ssd-vol0/sunlight-test/logs
pim@ctlog-test:~$ sudo zfs create ssd-vol0/sunlight-test/logs/sunlight-test
pim@ctlog-test:~$ sudo chown -R pim:pim /ssd-vol0/sunlight-test
```
Then I'll create the Sunlight configuration:
```
pim@ctlog-test:/etc/sunlight$ sunlight-keygen -f sunlight-test.seed.bin
Log ID: IPngJcHCHWi+s37vfFqpY9ouk+if78wAY2kl/sh3c8E=
ECDSA public key:
-----BEGIN PUBLIC KEY-----
MFkwEwYHKoZIzj0CAQYIKoZIzj0DAQcDQgAE6Hg60YncYt/V69kLmg4LlTO9RmHR
@@ -118,8 +122,9 @@ Ed25519 public key:
-----END PUBLIC KEY-----
```
The first block creates key material for the log, and I get a fun surprise: the Log ID starts
precisely with the string IPng... what are the odds that that would happen!? I should tell Antonis
about this, it's dope!
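For fun, the odds (my own back-of-the-envelope, not from the original post): the Log ID is base64, so each position is one of 64 symbols, and an exact 4-character prefix like `IPng` is a one-in-64⁴ event.

```shell
# Each base64 character is one of 64 symbols, so an exact 4-character
# prefix match has probability 1 in 64^4.
echo $(( 64 * 64 * 64 * 64 ))   # 16777216, i.e. roughly 1 in 17 million
```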
As a safety precaution, Sunlight requires the operator to make the `checkpoints.db` by hand, which
I'll also do:
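A sketch of what that step looks like, with the one-table schema taken from the Sunlight README and the path an assumption that it matches the config:

```shell
# Create the (initially empty) checkpoint database by hand; schema per
# the Sunlight README, path assumed to match the sunlight.yaml config.
sqlite3 /etc/sunlight/checkpoints.db \
  "CREATE TABLE checkpoints (logID BLOB PRIMARY KEY, body TEXT)"
```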
@@ -136,7 +141,7 @@ When learning about [[Tessera]({{< ref 2025-07-26-ctlog-1 >}})], I already kind
conclusion that, for our case at IPng at least, running the fully cloud-native version with S3
storage and MySQL database gave both poorer performance and more operational complexity. But
I find it interesting to compare behavior and performance, so I'll start by creating a Sunlight log
backed by MinIO SSD storage.
I'll first create the bucket and a user account to access it:
@@ -223,10 +228,11 @@ pim@ctlog-test:~$ curl -k https://ctlog-test.lab.ipng.ch:1443/log.v3.json
404 page not found
```
I'm starting to think that using a non-standard listen port won't work, or more precisely, adding
a port in the `monitoringprefix` won't work. I notice that the logname is called
`ctlog-test.lab.ipng.ch:1443`, which I don't think is supposed to have a port number in it. So instead,
I make Sunlight `listen` on port 443 and omit the port in the `submissionprefix`, and give it and
its companion Skylight the needed privileges to bind the privileged port like so:
```
pim@ctlog-test:~$ sudo setcap 'cap_net_bind_service=+ep' /usr/local/bin/sunlight
@@ -236,7 +242,7 @@ pim@ctlog-test:~$ sunlight -testcert -c /etc/sunlight/sunlight-s3.yaml
{{< image width="60%" src="/assets/ctlog/sunlight-test-s3.png" alt="Sunlight testlog / S3" >}}
And with that, Sunlight reports for duty and the links work. Hoi!
#### Sunlight: Loadtesting S3
@@ -246,7 +252,7 @@ paths, and I've created a snakeoil self-signed cert. CT Hammer does not accept t
so I need to make a tiny change to the Hammer:
```
pim@ctlog-test:~/src/tesseract$ git diff
diff --git a/internal/hammer/hammer.go b/internal/hammer/hammer.go
index 3828fbd..1dfd895 100644
--- a/internal/hammer/hammer.go
@@ -286,10 +292,10 @@ pim@ctlog-test:/etc/sunlight$ T=0; O=0; while :; do \
25196 1 seconds 87 certs
```
On the first commandline I'll start the loadtest at 100 writes/sec with the standard duplication
probability of 10%, which allows me to test Sunlight's ability to avoid writing duplicates. This
means I should see on average a growth of the tree at about 90/s. Check. I raise the write-load to
500/s:
```
39421 1 seconds 443 certs
@@ -299,7 +305,7 @@ on average a growth of 90/s. Check. I raise the load to 500/s:
41194 1 seconds 448 certs
```
.. and to 1'000/s:
```
57941 1 seconds 945 certs
58886 1 seconds 970 certs
@@ -314,8 +320,8 @@ W0810 14:55:29.660710 1398779 analysis.go:134] (1 x) failed to create request: f
W0810 14:55:30.496603 1398779 analysis.go:124] (1 x) failed to create request: write leaf was not OK. Status code: 500. Body: "failed to read body: read tcp 127.0.1.1:443->127.0.0.1:44908: i/o timeout\n"
```
I raise the Hammer load to 5'000/sec (which means 4'500/s unique certs and 500 duplicates), and find
that the committed writes max out at around 4'200/s:
```
879637 1 seconds 4213 certs
883850 1 seconds 4207 certs
@@ -332,9 +338,9 @@ W0810 15:00:05.496459 1398779 analysis.go:124] (1 x) failed to create request: f
W0810 15:00:07.187181 1398779 analysis.go:124] (1 x) failed to create request: failed to write leaf: Post "https://ctlog-test.lab.ipng.ch/ct/v1/add-chain": EOF
```
At this load of 4'200/s, MinIO is not very impressed. Remember in the [[other article]({{< ref
2025-07-26-ctlog-1 >}})] I loadtested it to about 7'500 ops/sec and the statistics below are about
50 ops/sec (2'800/min). I conclude that MinIO is, in fact, bored of this whole activity:
```
pim@ctlog-test:/etc/sunlight$ mc admin trace --stats ssd
@@ -348,11 +354,11 @@ s3.PutObject 37602 (70.3%) 1982.2 6.2ms 785µs 86.7ms 6.1ms 86
s3.GetObject 15918 (29.7%) 839.1 996µs 670µs 51.3ms 912µs 51.2ms ↑46B ↓3.0K ↑38K ↓2.4M 0
```
Sunlight still keeps its certificate cache on local disk. At a rate of 4'200/s, the ZFS pool has a
write rate of about 105MB/s with about 877 ZFS writes per second.
```
pim@ctlog-test:/etc/sunlight$ zpool iostat -v ssd-vol0 10
capacity operations bandwidth
pool alloc free read write read write
-------------------------- ----- ----- ----- ----- ----- -----
@@ -380,24 +386,24 @@ A few interesting observations:
* The write rate to ZFS is significantly higher with Sunlight than TesseraCT (about 8:1). This is
likely explained because the sqlite3 database lives on ZFS here, while TesseraCT uses MariaDB
running on a different filesystem.
* The MinIO usage is a lot lighter. As I reduce the load to 1'000/s, as was the case in the TesseraCT
test, I can see the ratio of Get:Put was 93:4 in TesseraCT, while it's 70:30 here. TesseraCT was
also consuming more IOPS, running at about 10.5k requests/minute, while Sunlight is
significantly calmer at 2.8k requests/minute (almost 4x less!)
* The burst capacity of Sunlight is a fair bit higher than TesseraCT, likely due to its more
efficient use of S3 backends.
***Conclusion***: Sunlight S3+MinIO can handle 1'000/s reliably, and can spike to 4'200/s with only
a few errors.
#### Sunlight: Loadtesting POSIX
When I took a closer look at TesseraCT a few weeks ago, it struck me that while a cloud-native
setup with S3 storage would allow for a cool way to enable storage scaling and
read-path redundancy, by creating synchronously replicated buckets, it does come at a significant
operational overhead and complexity. My main concern is the number of different moving parts, and
Sunlight really has one very appealing property: it can run entirely on one machine without the need
for any other moving parts - even the SQL database is linked in. That's pretty slick.
```
pim@ctlog-test:/etc/sunlight$ cat << EOF > sunlight.yaml
@@ -453,14 +459,16 @@ sunlight_sqlite_update_duration_seconds{quantile="0.99"} 0.014922489
```
I'm seeing here that at a load of 100/s (with 90/s of unique certificates), the 99th percentile
add-chain latency is 207ms, which makes sense because the `period` configuration field is set to
200ms. The filesystem operations (discard, fetch, upload) are _de minimis_ and the sequencing
duration is at 109ms. Excellent!
But can this thing go really fast? I do remember that the CT Hammer uses more CPU than TesseraCT,
and I've seen above, when running my 5'000/s loadtest, that that is about all the hammer can take on
a single Dell R630. So, as I did with the TesseraCT test, I'll use the MinIO SSD and MinIO Disk
machines to generate the load.
I boot them, so that I can hammer, or shall I say jackhammer away:
```
pim@ctlog-test:~/src/tesseract$ go run ./internal/hammer --origin=ctlog-test.lab.ipng.ch \
@@ -479,7 +487,8 @@ pim@minio-disk:~/src/tesseract$ go run ./internal/hammer --origin=ctlog-test.lab
--max_read_ops=0 --num_writers=5000 --max_write_ops=5000 --serial_offset=2000000
```
This will generate 15'000/s of load, which I note brings Sunlight to its knees, although it
remains stable (yaay!) with a somewhat more bursty checkpoint interval:
```
5504780 1 seconds 4039 certs
@@ -501,17 +510,17 @@ pim@ctlog-test:/etc/sunlight$ while :; do curl -ksS https://ctlog-test.lab.ipng.
```
This rate boils down to `(6576712-6008831)/120` or 4'700/s of written certs, which at a duplication
ratio of 10% means approximately 5'200/s of total accepted certs. At this rate, Sunlight is consuming
about 10.3 CPUs/s, while Skylight is at 0.1 CPUs/s and the CT Hammer is at 11.1 CPUs/s; given the 40
threads on this machine, I am not saturating the CPU, but I'm curious as this rate is significantly
lower than TesseraCT. I briefly turn off the hammer on `ctlog-test` to allow Sunlight to monopolize
the entire machine. The CPU use does reduce to about 9.3 CPUs/s suggesting that indeed, the bottleneck
is not strictly CPU:
{{< image width="90%" src="/assets/ctlog/btop-sunlight.png" alt="Sunlight btop" >}}
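The growth arithmetic above can be double-checked with a one-liner; the tree sizes and the 120-second window are the checkpoint samples quoted in the text, and the 10% duplication ratio is the hammer setting used throughout.

```shell
# Tree growth over the 120s window, and the implied total accept rate
# when ~10% of submissions are duplicates (accepted = written / 0.9).
WRITTEN=$(( (6576712 - 6008831) / 120 ))
ACCEPTED=$(( WRITTEN * 10 / 9 ))
echo "written=${WRITTEN}/s accepted=${ACCEPTED}/s"   # roughly 4'700/s and 5'200/s
```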
When using only two CT Hammers (on `minio-ssd.lab.ipng.ch` and `minio-disk.lab.ipng.ch`), the CPU
use on the `ctlog-test.lab.ipng.ch` machine definitely goes down (CT Hammer is kind of a CPU hog...),
but the resulting throughput doesn't change that much:
```
@@ -607,11 +616,11 @@ this setup.
## Wrapup - Observations
From an operator's point of view, TesseraCT and Sunlight handle quite differently. Both are easily up
to the task of serving the current write-load (which is about 250/s).
* ***S3***: When using the S3 backend, TesseraCT became quite unhappy above 800/s while Sunlight
went all the way up to 4'200/s and sent significantly fewer requests to MinIO (about 4x fewer),
while showing good telemetry on the use of S3 backends.
* ***POSIX***: When using a normal filesystem, Sunlight seems to peak at 4'800/s while TesseraCT