From 4f0188abeb14cf48dc7456dbdaae64542276f1b3 Mon Sep 17 00:00:00 2001 From: Pim van Pelt Date: Sun, 10 Aug 2025 18:50:00 +0200 Subject: [PATCH] A few readability edits --- content/articles/2025-08-10-ctlog-2.md | 109 +++++++++++++------------ 1 file changed, 59 insertions(+), 50 deletions(-) diff --git a/content/articles/2025-08-10-ctlog-2.md b/content/articles/2025-08-10-ctlog-2.md index 58afc35..657d6a5 100644 --- a/content/articles/2025-08-10-ctlog-2.md +++ b/content/articles/2025-08-10-ctlog-2.md @@ -89,24 +89,28 @@ pim@ctlog-test:/etc/sunlight$ openssl req -newkey rsa:2048 -nodes -keyout sunlig pim@ctlog-test:/etc/sunlight# openssl x509 -req -extfile \ <(printf "subjectAltName=DNS:ctlog-test.lab.ipng.ch,DNS:ctlog-test.lab.ipng.ch") -days 365 \ -in sunlight.csr -CA ca.crt -CAkey ca.key -CAcreateserial -out sunlight.pem +ln -s sunlight.pem skylight.pem +ln -s sunlight-key.pem skylight-key.pem ``` -This little snippet yields `sunlight.pem` (the certificate) and `sunlight-key.pem` (the private key). -With these in hand, I can start the rest of the show. First I will prepare the NVME storage with a -few datasets in which Sunlight will store its data: +This little snippet yields `sunlight.pem` (the certificate) and `sunlight-key.pem` (the private +key), and symlinks them to `skylight.pem` and `skylight-key.pem` for simplicity. With these in hand, +I can start the rest of the show. 
First I will prepare the NVME storage with a few datasets in +which Sunlight will store its data: ``` -pim@ctlog-test:/etc/sunlight$ sudo zfs create ssd-vol0/sunlight-test -pim@ctlog-test:/etc/sunlight$ sudo zfs create ssd-vol0/sunlight-test/shared -pim@ctlog-test:/etc/sunlight$ sudo zfs create ssd-vol0/sunlight-test/logs -pim@ctlog-test:/etc/sunlight$ sudo zfs create ssd-vol0/sunlight-test/logs/sunlight-test -pim@ctlog-test:/etc/sunlight$ sudo chown -R pim:pim /ssd-vol0/sunlight-test +pim@ctlog-test:~$ sudo zfs create ssd-vol0/sunlight-test +pim@ctlog-test:~$ sudo zfs create ssd-vol0/sunlight-test/shared +pim@ctlog-test:~$ sudo zfs create ssd-vol0/sunlight-test/logs +pim@ctlog-test:~$ sudo zfs create ssd-vol0/sunlight-test/logs/sunlight-test +pim@ctlog-test:~$ sudo chown -R pim:pim /ssd-vol0/sunlight-test ``` Then I'll create the Sunlight configuration: ``` pim@ctlog-test:/etc/sunlight$ sunlight-keygen -f sunlight-test.seed.bin +Log ID: IPngJcHCHWi+s37vfFqpY9ouk+if78wAY2kl/sh3c8E= ECDSA public key: -----BEGIN PUBLIC KEY----- MFkwEwYHKoZIzj0CAQYIKoZIzj0DAQcDQgAE6Hg60YncYt/V69kLmg4LlTO9RmHR @@ -118,8 +122,9 @@ Ed25519 public key: -----END PUBLIC KEY----- ``` -The first block creates key material for the log, and I get a huge surprise: the Log ID starts -precisely with the string `IPng`! What are the odds that that would happen. Slick! +The first block creates key material for the log, and I get a fun surprise: the Log ID starts +precisely with the string IPng... what are the odds that that would happen!? I should tell Antonis +about this, it's dope! As a safety precaution, Sunlight requires the operator to make the `checkpoints.db` by hand, which I'll also do: @@ -136,7 +141,7 @@ When learning about [[Tessera]({{< ref 2025-07-26-ctlog-1 >}})], I already kind conclusion that, for our case at IPng at least, running the fully cloud-native version with S3 storage and MySQL database, gave both poorer performance, but also more operational complexity. 
But I find it interesting to compare behavior and performance, so I'll start by creating a Sunlight log -using backing Minio SSD storage. +using backing MinIO SSD storage. I'll first create the bucket and a user account to access it: @@ -223,10 +228,11 @@ pim@ctlog-test:~$ curl -k https://ctlog-test.lab.ipng.ch:1443/log.v3.json 404 page not found ``` -I'm starting to think that using a non-standard listen port won't work. The logname is called +I'm starting to think that using a non-standard listen port won't work, or more precisely, adding +a port in the `monitoringprefix` won't work. I notice that the logname is called `ctlog-test.lab.ipng.ch:1443` which I don't think is supposed to have a portname in it. So instead, -I make Sunlight `listen` on port 443 and omit the port in the `submissionprefix`, and give it -privileges to bind the privileged port like so: +I make Sunlight `listen` on port 443 and omit the port in the `submissionprefix`, and give it and +its companion Skylight the needed privileges to bind the privileged port like so: ``` pim@ctlog-test:~$ sudo setcap 'cap_net_bind_service=+ep' /usr/local/bin/sunlight @@ -236,7 +242,7 @@ pim@ctlog-test:~$ sunlight -testcert -c /etc/sunlight/sunlight-s3.yaml {{< image width="60%" src="/assets/ctlog/sunlight-test-s3.png" alt="Sunlight testlog / S3" >}} -And with that, Sunlight reports for duty. Hoi! +And with that, Sunlight reports for duty and the links work. Hoi! #### Sunlight: Loadtesting S3 @@ -246,7 +252,7 @@ paths, and I've created a snakeoil self-signed cert. 
CT Hammer does not accept t
so I need to make a tiny change to the Hammer:

```
-pim@minio-ssd:~/src/tesseract$ git diff
+pim@ctlog-test:~/src/tesseract$ git diff
diff --git a/internal/hammer/hammer.go b/internal/hammer/hammer.go
index 3828fbd..1dfd895 100644
--- a/internal/hammer
+++ b/internal/hammer
@@ -286,10 +292,10 @@ pim@ctlog-test:/etc/sunlight$ T=0; O=0; while :; do \
25196 1 seconds 87 certs
```

-On the first commandline I'll start the loadtest at 100 writes/sec with a duplication probability of
-10%, which allows me to test Sunlights ability to avoid writing duplicates. This means I should see
-on average a growth of 90/s. Check. I raise the load to 500/s:
-
+On the first command line I'll start the loadtest at 100 writes/sec with the standard duplication
+probability of 10%, which allows me to test Sunlight's ability to avoid writing duplicates. This
+means I should see the tree grow at about 90/s on average. Check. I raise the write-load to
+500/s:
```
39421 1 seconds 443 certs
@@ -299,7 +305,7 @@ on average a growth of 90/s. Check. I raise the load to 500/s:
41194 1 seconds 448 certs
```

-.. and to 1000/s:
+.. and to 1'000/s:
```
57941 1 seconds 945 certs
58886 1 seconds 970 certs
@@ -314,8 +320,8 @@ W0810 14:55:29.660710 1398779 analysis.go:134] (1 x) failed to create request: f
W0810 14:55:30.496603 1398779 analysis.go:124] (1 x) failed to create request: write leaf was not OK. Status code: 500. 
Body: "failed to read body: read tcp 127.0.1.1:443->127.0.0.1:44908: i/o timeout\n" ``` -I raise the Hammer load to 5000/sec (which means 4500/s unique certs and 500 duplicates), and find -the max writes/sec to max out at around 4200/s: +I raise the Hammer load to 5'000/sec (which means 4'500/s unique certs and 500 duplicates), and find +the max committed writes/sec to max out at around 4'200/s: ``` 879637 1 seconds 4213 certs 883850 1 seconds 4207 certs @@ -332,9 +338,9 @@ W0810 15:00:05.496459 1398779 analysis.go:124] (1 x) failed to create request: f W0810 15:00:07.187181 1398779 analysis.go:124] (1 x) failed to create request: failed to write leaf: Post "https://ctlog-test.lab.ipng.ch/ct/v1/add-chain": EOF ``` -At this load of 5000/s, Minio is not very impressed. Remember in the [[other article]({{< ref -2025-07-26-ctlog-1 >}})] I loadtested it to about 7500 ops/sec and the statistics below are about -50 ops/sec (2800/min). I conclude that Minio is, in fact, bored of this whole activity: +At this load of 4'200/s, MinIO is not very impressed. Remember in the [[other article]({{< ref +2025-07-26-ctlog-1 >}})] I loadtested it to about 7'500 ops/sec and the statistics below are about +50 ops/sec (2'800/min). I conclude that MinIO is, in fact, bored of this whole activity: ``` pim@ctlog-test:/etc/sunlight$ mc admin trace --stats ssd @@ -348,11 +354,11 @@ s3.PutObject 37602 (70.3%) 1982.2 6.2ms 785µs 86.7ms 6.1ms 86 s3.GetObject 15918 (29.7%) 839.1 996µs 670µs 51.3ms 912µs 51.2ms ↑46B ↓3.0K ↑38K ↓2.4M 0 ``` -Sunlight still keeps it certificate cache on local disk. At a rate of 5000/s, the ZFS pool has a -write rate of about 105MB/s with about 877 ZFS writes for every 5000 certificates. +Sunlight still keeps its certificate cache on local disk. At a rate of 4'200/s, the ZFS pool has a +write rate of about 105MB/s with about 877 ZFS writes per second. 
``` -pim@ctlog-test:/etc/sunlight$ zpool iostat -v ssd-vol0 10 +pim@ctlog-test:/etc/sunlight$ zpool iostat -v ssd-vol0 10 capacity operations bandwidth pool alloc free read write read write -------------------------- ----- ----- ----- ----- ----- ----- @@ -380,24 +386,24 @@ A few interesting observations: * The write rate to ZFS is significantly higher with Sunlight than TesseraCT (about 8:1). This is likely explained because the sqlite3 database lives on ZFS here, while TesseraCT uses MariaDB running on a different filesystem. -* The Minio IO is a lot lighter. As I reduce the load to 1000/s, as was the case in the TesseraCT +* The MinIO usage is a lot lighter. As I reduce the load to 1'000/s, as was the case in the TesseraCT test, I can see the ratio of Get:Put was 93:4 in TesseraCT, while it's 70:30 here. TesseraCT as also consuming more IOPS, running at about 10.5k requests/minute, while Sunlight is - significantly calmer at 2.8k requets/minute (almost 4x less!) + significantly calmer at 2.8k requests/minute (almost 4x less!) * The burst capacity of Sunlight is a fair bit higher than TesseraCT, likely due to its more efficient use of S3 backends. -***Conclusion***: Sunlight S3+Minio can handle 1000/s reliably, and can spike to 5000/s with only +***Conclusion***: Sunlight S3+MinIO can handle 1'000/s reliably, and can spike to 4'200/s with only few errors. #### Sunlight: Loadtesting POSIX When I took a closer look at TesseraCT a few weeks ago, it struck me that while making a cloud-native setup, with S3 storage would allow for a cool way to enable storage scaling and -read-path redundancy, by creating replicated buckets, it does come at a significant operational -overhead and complexity. My main concern is the amount of different moving parts, and Sunlight -really has one very appealing property: it can run entirely on one machine without the need for any -other moving parts - even the SQL database is linked in. That's pretty slick. 
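As a quick sanity check on the "almost 4x" figure from the S3 observations above (TesseraCT at roughly 10'500 requests/minute against MinIO, Sunlight at about 2'800), the ratio is plain arithmetic:

```shell
# Ratio of S3 requests per minute observed during the 1'000/s runs:
#   TesseraCT: ~10'500 req/min, Sunlight: ~2'800 req/min.
awk 'BEGIN { printf "request ratio: %.2fx\n", 10500 / 2800 }'
```

which works out to 3.75x, so "almost 4x less" checks out.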
+read-path redundancy, by creating synchronously replicated buckets, it does come at a significant +operational overhead and complexity. My main concern is the amount of different moving parts, and +Sunlight really has one very appealing property: it can run entirely on one machine without the need +for any other moving parts - even the SQL database is linked in. That's pretty slick. ``` pim@ctlog-test:/etc/sunlight$ cat << EOF > sunlight.yaml @@ -453,14 +459,16 @@ sunlight_sqlite_update_duration_seconds{quantile="0.99"} 0.014922489 ``` I'm seeing here that at a load of 100/s (with 90/s of unique certificates), the 99th percentile -add-chain lataency is 207ms, which makes sense because the `period` configuration field is set to +add-chain latency is 207ms, which makes sense because the `period` configuration field is set to 200ms. The filesystem operations (discard, fetch, upload) are _de minimis_ and the sequencing duration is at 109ms. Excellent! -But can this thing go really fast? I am reminded that the CT Hammer uses more CPU than TesseraCT, -and I've seen it above also when running my 5000/s loadtest that's about all the hammer can take on -a single Dell R630. So, as I did with the TesseraCT test, I'll use the Minio SSD and Minio Disk -machines to generate the load. I boot them, so now I can hammer, or shall I say jackhammer away: +But can this thing go really fast? I do remember that the CT Hammer uses more CPU than TesseraCT, +and I've seen it above also when running my 5'000/s loadtest that's about all the hammer can take on +a single Dell R630. So, as I did with the TesseraCT test, I'll use the MinIO SSD and MinIO Disk +machines to generate the load. 
I boot them, so that I can hammer, or shall I say jackhammer away:

```
pim@ctlog-test:~/src/tesseract$ go run ./internal/hammer --origin=ctlog-test.lab.ipng.ch \
@@ -479,7 +487,8 @@ pim@minio-disk:~/src/tesseract$ go run ./internal/hammer --origin=ctlog-test.lab
--max_read_ops=0 --num_writers=5000 --max_write_ops=5000 --serial_offset=2000000
```

-This will generate 15000/s of load, which I note makes Sunlight a fair bit more bursty:
+This will generate 15'000/s of load, which I note does bring Sunlight to its knees, although it does
+remain stable (yaay!) with a somewhat more bursty checkpoint interval:

```
5504780 1 seconds 4039 certs
@@ -501,17 +510,17 @@ pim@ctlog-test:/etc/sunlight$ while :; do curl -ksS https://ctlog-test.lab.ipng.
```

This rate boils down to `(6576712-6008831)/120` or 4'700/s of written certs, which at a duplication
-ratio of 10% means approximately 5200/s of total accepted certs. This rate, Sunlight is consuming
+ratio of 10% means approximately 5'200/s of total accepted certs. At this rate, Sunlight is consuming
about 10.3 CPUs/s, while Skylight is at 0.1 CPUs/s and the CT Hammer is at 11.1 CPUs/s; Given the 40
threads on this machine, I am not saturating the CPU, but I'm curious as this rate is significantly
lower than TesseraCT. I briefly turn off the hammer on `ctlog-test` to allow Sunlight to monopolize
-the entire CPU. The CPU use lowers to about 9.3 CPUs/s showing that indeed, the bottleneck is not
-strictly CPU:
+the entire machine. 
The CPU use does reduce to about 9.3 CPUs/s, suggesting that indeed, the bottleneck
+is not strictly CPU:

{{< image width="90%" src="/assets/ctlog/btop-sunlight.png" alt="Sunlight btop" >}}

When using only two CT Hammers (on `minio-ssd.lab.ipng.ch` and `minio-disk.lab.ipng.ch`), the CPU
-use on the `ctlog-test.lab.ipng.ch` machine definitely goes down (CT Hammer is kind of a CPU hog..),
+use on the `ctlog-test.lab.ipng.ch` machine definitely goes down (CT Hammer is kind of a CPU hog...),
but the resulting throughput doesn't change that much:

```
@@ -607,11 +616,11 @@ this setup.

## Wrapup - Observations

-TesseraCT and Sunlight handle quite differently. Both are easily up to the task of serving the
-current write-load (which is about 250/s).
+From an operator's point of view, TesseraCT and Sunlight handle quite differently. Both are easily
+up to the task of serving the current write-load (which is about 250/s).

-* ***S3***: When using the S3 backend, TesseraCT became quite unhappy with 800/s while Sunlight
-  went all the way up to 5'000/s and sent significantly less requests to Minio (about 4x less),
+* ***S3***: When using the S3 backend, TesseraCT became quite unhappy above 800/s while Sunlight
+  went all the way up to 4'200/s and sent significantly fewer requests to MinIO (about 4x fewer),
  while showing good telemetry on the use of S3 backends.

* ***POSIX***: When using normal filesystem, Sunlight seems to peak at 4'800/s while TesseraCT