A few readability edits
@@ -89,24 +89,28 @@ pim@ctlog-test:/etc/sunlight$ openssl req -newkey rsa:2048 -nodes -keyout sunlig
pim@ctlog-test:/etc/sunlight# openssl x509 -req -extfile \
<(printf "subjectAltName=DNS:ctlog-test.lab.ipng.ch,DNS:ctlog-test.lab.ipng.ch") -days 365 \
-in sunlight.csr -CA ca.crt -CAkey ca.key -CAcreateserial -out sunlight.pem
ln -s sunlight.pem skylight.pem
ln -s sunlight-key.pem skylight-key.pem
```

This little snippet yields `sunlight.pem` (the certificate) and `sunlight-key.pem` (the private
key), and symlinks them to `skylight.pem` and `skylight-key.pem` for simplicity. With these in hand,
I can start the rest of the show. First I will prepare the NVMe storage with a few datasets in
which Sunlight will store its data:

```
pim@ctlog-test:~$ sudo zfs create ssd-vol0/sunlight-test
pim@ctlog-test:~$ sudo zfs create ssd-vol0/sunlight-test/shared
pim@ctlog-test:~$ sudo zfs create ssd-vol0/sunlight-test/logs
pim@ctlog-test:~$ sudo zfs create ssd-vol0/sunlight-test/logs/sunlight-test
pim@ctlog-test:~$ sudo chown -R pim:pim /ssd-vol0/sunlight-test
```

Then I'll create the Sunlight configuration:

```
pim@ctlog-test:/etc/sunlight$ sunlight-keygen -f sunlight-test.seed.bin
Log ID: IPngJcHCHWi+s37vfFqpY9ouk+if78wAY2kl/sh3c8E=
ECDSA public key:
-----BEGIN PUBLIC KEY-----
MFkwEwYHKoZIzj0CAQYIKoZIzj0DAQcDQgAE6Hg60YncYt/V69kLmg4LlTO9RmHR
@@ -118,8 +122,9 @@ Ed25519 public key:
-----END PUBLIC KEY-----
```

The first block creates key material for the log, and I get a fun surprise: the Log ID starts
precisely with the string `IPng`... what are the odds of that happening!? I should tell Antonis
about this, it's dope!

As a safety precaution, Sunlight requires the operator to make the `checkpoints.db` by hand, which
I'll also do:
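The transcript of that step falls outside this excerpt, but creating the database is a one-liner. A
sketch, assuming the schema documented in the Sunlight README, and using a scratch directory rather
than the article's `/ssd-vol0/sunlight-test` paths so it is safe to run anywhere:

```
# Sunlight refuses to start without a pre-existing checkpoint database.
# Schema as documented in the Sunlight README at the time of writing.
DB="$(mktemp -d)/checkpoints.db"
sqlite3 "$DB" 'CREATE TABLE checkpoints (logID BLOB PRIMARY KEY, body TEXT)'
sqlite3 "$DB" '.tables'    # prints: checkpoints
```

Check the README of the Sunlight version you deploy; the schema is part of its operator contract.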
@@ -136,7 +141,7 @@ When learning about [[Tessera]({{< ref 2025-07-26-ctlog-1 >}})], I already kind
conclusion that, for our case at IPng at least, running the fully cloud-native version with S3
storage and a MySQL database gave both poorer performance and more operational complexity. But I
find it interesting to compare behavior and performance, so I'll start by creating a Sunlight log
using backing MinIO SSD storage.

I'll first create the bucket and a user account to access it:

@@ -223,10 +228,11 @@ pim@ctlog-test:~$ curl -k https://ctlog-test.lab.ipng.ch:1443/log.v3.json
404 page not found
```

I'm starting to think that using a non-standard listen port won't work, or more precisely, adding
a port in the `monitoringprefix` won't work. I notice that the logname is called
`ctlog-test.lab.ipng.ch:1443`, which I don't think is supposed to have a port in it. So instead,
I make Sunlight `listen` on port 443 and omit the port in the `submissionprefix`, and give it and
its companion Skylight the needed privileges to bind the privileged port like so:

```
pim@ctlog-test:~$ sudo setcap 'cap_net_bind_service=+ep' /usr/local/bin/sunlight
@@ -236,7 +242,7 @@ pim@ctlog-test:~$ sunlight -testcert -c /etc/sunlight/sunlight-s3.yaml

{{< image width="60%" src="/assets/ctlog/sunlight-test-s3.png" alt="Sunlight testlog / S3" >}}

And with that, Sunlight reports for duty and the links work. Hoi!

#### Sunlight: Loadtesting S3

@@ -246,7 +252,7 @@ paths, and I've created a snakeoil self-signed cert. CT Hammer does not accept t
so I need to make a tiny change to the Hammer:

```
pim@ctlog-test:~/src/tesseract$ git diff
diff --git a/internal/hammer/hammer.go b/internal/hammer/hammer.go
index 3828fbd..1dfd895 100644
--- a/internal/hammer/hammer.go
@@ -286,10 +292,10 @@ pim@ctlog-test:/etc/sunlight$ T=0; O=0; while :; do \
25196 1 seconds 87 certs
```

On the first commandline I'll start the loadtest at 100 writes/sec with the standard duplication
probability of 10%, which allows me to test Sunlight's ability to avoid writing duplicates. This
means I should see on average a growth of the tree at about 90/s. Check. I raise the write-load to
500/s:

```
39421 1 seconds 443 certs
@@ -299,7 +305,7 @@ on average a growth of 90/s. Check. I raise the load to 500/s:
41194 1 seconds 448 certs
```

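The `N 1 seconds M certs` lines come from the watch loop visible in the hunk header above (`T=0;
O=0; while :; do ...`). A minimal re-creation of its bookkeeping, with the checkpoint polling
stubbed out as plain arguments; the `report` function name is mine, and the real loop fetches the
tree size from the log's checkpoint instead:

```
# Track tree growth: print "treesize elapsed seconds delta certs" for each
# successive (tree size, timestamp) sample, skipping the very first sample.
O=0; T=0
report() {        # $1 = current tree size, $2 = current epoch seconds
  [ "$O" -gt 0 ] && echo "$1 $(($2 - T)) seconds $(($1 - O)) certs"
  O=$1; T=$2
}
report 25109 100  # first sample only primes the state (no output)
report 25196 101  # prints: 25196 1 seconds 87 certs
```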
.. and to 1'000/s:
```
57941 1 seconds 945 certs
58886 1 seconds 970 certs
@@ -314,8 +320,8 @@ W0810 14:55:29.660710 1398779 analysis.go:134] (1 x) failed to create request: f
W0810 14:55:30.496603 1398779 analysis.go:124] (1 x) failed to create request: write leaf was not OK. Status code: 500. Body: "failed to read body: read tcp 127.0.1.1:443->127.0.0.1:44908: i/o timeout\n"
```

I raise the Hammer load to 5'000/sec (which means 4'500/s unique certs and 500/s duplicates), and
find that committed writes max out at around 4'200/s:
```
879637 1 seconds 4213 certs
883850 1 seconds 4207 certs
@@ -332,9 +338,9 @@ W0810 15:00:05.496459 1398779 analysis.go:124] (1 x) failed to create request: f
W0810 15:00:07.187181 1398779 analysis.go:124] (1 x) failed to create request: failed to write leaf: Post "https://ctlog-test.lab.ipng.ch/ct/v1/add-chain": EOF
```

At this load of 4'200/s, MinIO is not very impressed. Remember in the [[other article]({{< ref
2025-07-26-ctlog-1 >}})] I loadtested it to about 7'500 ops/sec and the statistics below are about
50 ops/sec (2'800/min). I conclude that MinIO is, in fact, bored of this whole activity:

```
pim@ctlog-test:/etc/sunlight$ mc admin trace --stats ssd
@@ -348,8 +354,8 @@ s3.PutObject 37602 (70.3%) 1982.2 6.2ms 785µs 86.7ms 6.1ms 86
s3.GetObject 15918 (29.7%) 839.1 996µs 670µs 51.3ms 912µs 51.2ms ↑46B ↓3.0K ↑38K ↓2.4M 0
```

Sunlight still keeps its certificate cache on local disk. At a rate of 4'200/s, the ZFS pool has a
write rate of about 105MB/s with about 877 ZFS writes per second.

```
pim@ctlog-test:/etc/sunlight$ zpool iostat -v ssd-vol0 10
@@ -380,24 +386,24 @@ A few interesting observations:
* The write rate to ZFS is significantly higher with Sunlight than TesseraCT (about 8:1). This is
  likely explained because the sqlite3 database lives on ZFS here, while TesseraCT uses MariaDB
  running on a different filesystem.
* The MinIO usage is a lot lighter. As I reduce the load to 1'000/s, as was the case in the
  TesseraCT test, I can see the ratio of Get:Put was 93:4 in TesseraCT, while it's 70:30 here.
  TesseraCT was also consuming more IOPS, running at about 10.5k requests/minute, while Sunlight is
  significantly calmer at 2.8k requests/minute (almost 4x less!)
* The burst capacity of Sunlight is a fair bit higher than TesseraCT, likely due to its more
  efficient use of S3 backends.

***Conclusion***: Sunlight S3+MinIO can handle 1'000/s reliably, and can spike to 4'200/s with only
a few errors.

#### Sunlight: Loadtesting POSIX

When I took a closer look at TesseraCT a few weeks ago, it struck me that while making a
cloud-native setup with S3 storage would allow for a cool way to enable storage scaling and
read-path redundancy, by creating synchronously replicated buckets, it does come at a significant
operational overhead and complexity. My main concern is the number of different moving parts, and
Sunlight really has one very appealing property: it can run entirely on one machine without the need
for any other moving parts - even the SQL database is linked in. That's pretty slick.

```
pim@ctlog-test:/etc/sunlight$ cat << EOF > sunlight.yaml
@@ -453,14 +459,16 @@ sunlight_sqlite_update_duration_seconds{quantile="0.99"} 0.014922489
```

I'm seeing here that at a load of 100/s (with 90/s of unique certificates), the 99th percentile
add-chain latency is 207ms, which makes sense because the `period` configuration field is set to
200ms. The filesystem operations (discard, fetch, upload) are _de minimis_ and the sequencing
duration is at 109ms. Excellent!

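Quantiles like these are easy to script out of the Prometheus text-format dump. A sketch against a
captured sample line from the metrics above; in a live setup the dump would presumably come from the
log's metrics endpoint rather than a shell variable:

```
# Extract the p99 of a summary metric from a Prometheus text-format dump.
# Sample line captured from the sunlight metrics output shown above.
metrics='sunlight_sqlite_update_duration_seconds{quantile="0.99"} 0.014922489'
p99=$(printf '%s\n' "$metrics" \
  | grep -F 'sunlight_sqlite_update_duration_seconds{quantile="0.99"}' \
  | awk '{print $2}')
echo "sqlite update p99: ${p99}s"    # prints: sqlite update p99: 0.014922489s
```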
But can this thing go really fast? I do remember that the CT Hammer uses more CPU than TesseraCT,
and I've seen above, when running my 5'000/s loadtest, that that's about all the hammer can take on
a single Dell R630. So, as I did with the TesseraCT test, I'll use the MinIO SSD and MinIO Disk
machines to generate the load.

I boot them, so that I can hammer, or shall I say jackhammer away:

```
pim@ctlog-test:~/src/tesseract$ go run ./internal/hammer --origin=ctlog-test.lab.ipng.ch \
@@ -479,7 +487,8 @@ pim@minio-disk:~/src/tesseract$ go run ./internal/hammer --origin=ctlog-test.lab
--max_read_ops=0 --num_writers=5000 --max_write_ops=5000 --serial_offset=2000000
```

This will generate 15'000/s of load, which I note does bring Sunlight to its knees, although it does
remain stable (yaay!) with a somewhat more bursty checkpoint interval:

```
5504780 1 seconds 4039 certs
@@ -501,17 +510,17 @@ pim@ctlog-test:/etc/sunlight$ while :; do curl -ksS https://ctlog-test.lab.ipng.
```

This rate boils down to `(6576712-6008831)/120` or 4'700/s of written certs, which at a duplication
ratio of 10% means approximately 5'200/s of total accepted certs. At this rate, Sunlight is
consuming about 10.3 CPUs/s, while Skylight is at 0.1 CPUs/s and the CT Hammer is at 11.1 CPUs/s.
Given the 40 threads on this machine, I am not saturating the CPU, but I'm curious as this rate is
significantly lower than TesseraCT. I briefly turn off the hammer on `ctlog-test` to allow Sunlight
to monopolize the entire machine. The CPU use does reduce to about 9.3 CPUs/s, suggesting that
indeed, the bottleneck is not strictly CPU:

{{< image width="90%" src="/assets/ctlog/btop-sunlight.png" alt="Sunlight btop" >}}

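A quick sanity check of that back-of-the-envelope arithmetic (integer math, so the numbers are
approximate):

```
# Verify the write rate over the 120-second sample quoted above.
start=6008831; end=6576712; interval=120
written=$(( (end - start) / interval ))   # unique certs committed per second
accepted=$(( written * 10 / 9 ))          # add back the ~10% duplicates
echo "written: ${written}/s, accepted: ~${accepted}/s"
# prints: written: 4732/s, accepted: ~5257/s
```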
When using only two CT Hammers (on `minio-ssd.lab.ipng.ch` and `minio-disk.lab.ipng.ch`), the CPU
use on the `ctlog-test.lab.ipng.ch` machine definitely goes down (CT Hammer is kind of a CPU hog...),
but the resulting throughput doesn't change that much:

```
@@ -607,11 +616,11 @@ this setup.

## Wrapup - Observations

From an operator's point of view, TesseraCT and Sunlight handle quite differently. Both are easily
up to the task of serving the current write-load (which is about 250/s).

* ***S3***: When using the S3 backend, TesseraCT became quite unhappy above 800/s while Sunlight
  went all the way up to 4'200/s and sent significantly fewer requests to MinIO (about 4x less),
  while showing good telemetry on the use of S3 backends.

* ***POSIX***: When using a normal filesystem, Sunlight seems to peak at 4'800/s while TesseraCT