Compare commits

44 commits: 4aa5745d06 ... main

512cfd75dc
8683d570a1
a1a98ad3c6
26ae98d977
619a1dfdf2
a9e978effb
825335cef9
a97115593c
3dd0d8a656
f137326339
51098ed43c
6b337e1167
bbf36f5a4e
b324d71b3f
2681861e4b
4f0188abeb
f4ed332b18
d9066aa241
c68799703b
c32d1779f8
eda80e7e66
d13da5608d
d47261a3b7
383a598fc7
8afa2ff944
fe1207ee78
6a59b7d7e6
bc2a9bb352
5d02b6466c
b6b419471d
85b41ba4e0
ebbb0f8e24
218ee84d5f
c476fa56fb
a76abc331f
44deb34685
ca46bcf6d5
5042f822ef
fdb77838b8
6d3f4ac206
baa3e78045
0972cf4aa1
4f81d377a0
153048eda4
@@ -8,9 +8,9 @@ steps:
   - git lfs install
   - git lfs pull
 - name: build
-  image: git.ipng.ch/ipng/drone-hugo:release-0.145.1
+  image: git.ipng.ch/ipng/drone-hugo:release-0.148.2
   settings:
-    hugo_version: 0.145.0
+    hugo_version: 0.148.2
     extended: true
 - name: rsync
   image: drillster/drone-rsync
@@ -26,7 +26,7 @@ steps:
     port: 22
     args: '-6u --delete-after'
     source: public/
-    target: /var/www/ipng.ch/
+    target: /nginx/sites/ipng.ch/
     recursive: true
     secrets: [ drone_sshkey ]

@@ -8,7 +8,7 @@ Historical context - todo, but notes for now

 1. started with stack.nl (when it was still stack.urc.tue.nl), 6bone and watching NASA multicast video in 1997.
 2. founded ipng.nl project, first IPv6 in NL that was usable outside of NREN.
-3. attacted attention of the first few IPv6 partitipants in Amsterdam, organized the AIAD - AMS-IX IPv6 Awareness Day
+3. attracted attention of the first few IPv6 participants in Amsterdam, organized the AIAD - AMS-IX IPv6 Awareness Day
 4. launched IPv6 at AMS-IX, first IXP prefix allocated 2001:768:1::/48
 > My Brilliant Idea Of The Day -- encode AS number in leetspeak: `::AS01:2859:1`, because who would've thought we would ever run out of 16 bit AS numbers :)
 5. IPng rearchitected to SixXS, and became a very large scale deployment of IPv6 tunnelbroker; our main central provisioning system moved around a few times between ISPs (Intouch, Concepts ICT, BIT, IP Man)

@@ -185,7 +185,7 @@ function is_coloclue_beacon()
 }
 ```

-Then, I ran the configuration again with one IPv4 beacon set on dcg-1, and still all the bird configs on both IPv4 and IPv6 for all routers parsed correctly, and the generated function on the dcg-1 IPv4 filters file was popupated:
+Then, I ran the configuration again with one IPv4 beacon set on dcg-1, and still all the bird configs on both IPv4 and IPv6 for all routers parsed correctly, and the generated function on the dcg-1 IPv4 filters file was populated:
 ```
 function is_coloclue_beacon()
 {

@@ -89,7 +89,7 @@ lcp lcp-sync off
 ```

 The prep work for the rest of the interface syncer starts with this
-[[commit](https://github.com/pimvanpelt/lcpng/commit/2d00de080bd26d80ce69441b1043de37e0326e0a)], and
+[[commit](https://git.ipng.ch/ipng/lcpng/commit/2d00de080bd26d80ce69441b1043de37e0326e0a)], and
 for the rest of this blog post, the behavior will be in the 'on' position.

 ### Change interface: state

@@ -120,7 +120,7 @@ the state it was. I did notice that you can't bring up a sub-interface if its parent
 is down, which I found counterintuitive, but that's neither here nor there.

 All of this is to say that we have to be careful when copying state forward, because as
-this [[commit](https://github.com/pimvanpelt/lcpng/commit/7c15c84f6c4739860a85c599779c199cb9efef03)]
+this [[commit](https://git.ipng.ch/ipng/lcpng/commit/7c15c84f6c4739860a85c599779c199cb9efef03)]
 shows, issuing `set int state ... up` on an interface, won't touch its sub-interfaces in VPP, but
 the subsequent netlink message to bring the _LIP_ for that interface up, **will** update the
 children, thus desynchronising Linux and VPP: Linux will have interface **and all its

@@ -128,7 +128,7 @@ sub-interfaces** up unconditionally; VPP will have the interface up and its sub-interfaces in
 whatever state they were before.

 To address this, a second
-[[commit](https://github.com/pimvanpelt/lcpng/commit/a3dc56c01461bdffcac8193ead654ae79225220f)] was
+[[commit](https://git.ipng.ch/ipng/lcpng/commit/a3dc56c01461bdffcac8193ead654ae79225220f)] was
 needed. I'm not too sure I want to keep this behavior, but for now, it results in an intuitive
 end-state, which is that all interfaces states are exactly the same between Linux and VPP.

@@ -157,7 +157,7 @@ DBGvpp# set int state TenGigabitEthernet3/0/0 up
 ### Change interface: MTU

 Finally, a straight forward
-[[commit](https://github.com/pimvanpelt/lcpng/commit/39bfa1615fd1cafe5df6d8fc9d34528e8d3906e2)], or
+[[commit](https://git.ipng.ch/ipng/lcpng/commit/39bfa1615fd1cafe5df6d8fc9d34528e8d3906e2)], or
 so I thought. When the MTU changes in VPP (with `set interface mtu packet N <int>`), there is
 callback that can be registered which copies this into the _LIP_. I did notice a specific corner
 case: In VPP, a sub-interface can have a larger MTU than its parent. In Linux, this cannot happen,

@@ -179,7 +179,7 @@ higher than that, perhaps logging an error explaining why. This means two things:
 1. Any change in VPP of a parent MTU should ensure all children are clamped to at most that.

 I addressed the issue in this
-[[commit](https://github.com/pimvanpelt/lcpng/commit/79a395b3c9f0dae9a23e6fbf10c5f284b1facb85)].
+[[commit](https://git.ipng.ch/ipng/lcpng/commit/79a395b3c9f0dae9a23e6fbf10c5f284b1facb85)].

 ### Change interface: IP Addresses

@@ -199,7 +199,7 @@ VPP into the companion Linux devices:
 _LIP_ with `lcp_itf_set_interface_addr()`.

 This means with this
-[[commit](https://github.com/pimvanpelt/lcpng/commit/f7e1bb951d648a63dfa27d04ded0b6261b9e39fe)], at
+[[commit](https://git.ipng.ch/ipng/lcpng/commit/f7e1bb951d648a63dfa27d04ded0b6261b9e39fe)], at
 any time a new _LIP_ is created, the IPv4 and IPv6 address on the VPP interface are fully copied
 over by the third change, while at runtime, new addresses can be set/removed as well by the first
 and second change.

@@ -100,7 +100,7 @@ linux-cp {

 Based on this config, I set the startup default in `lcp_set_lcp_auto_subint()`, but I realize that
 an administrator may want to turn it on/off at runtime, too, so I add a CLI getter/setter that
-interacts with the flag in this [[commit](https://github.com/pimvanpelt/lcpng/commit/d23aab2d95aabcf24efb9f7aecaf15b513633ab7)]:
+interacts with the flag in this [[commit](https://git.ipng.ch/ipng/lcpng/commit/d23aab2d95aabcf24efb9f7aecaf15b513633ab7)]:

 ```
 DBGvpp# show lcp

@@ -116,11 +116,11 @@ lcp lcp-sync off
 ```

 The prep work for the rest of the interface syncer starts with this
-[[commit](https://github.com/pimvanpelt/lcpng/commit/2d00de080bd26d80ce69441b1043de37e0326e0a)], and
+[[commit](https://git.ipng.ch/ipng/lcpng/commit/2d00de080bd26d80ce69441b1043de37e0326e0a)], and
 for the rest of this blog post, the behavior will be in the 'on' position.

 The code for the configuration toggle is in this
-[[commit](https://github.com/pimvanpelt/lcpng/commit/934446dcd97f51c82ddf133ad45b61b3aae14b2d)].
+[[commit](https://git.ipng.ch/ipng/lcpng/commit/934446dcd97f51c82ddf133ad45b61b3aae14b2d)].

 ### Auto create/delete sub-interfaces

@@ -145,7 +145,7 @@ I noticed that interface deletion had a bug (one that I fell victim to as well: it would not
 remove the netlink device in the correct network namespace), which I fixed.

 The code for the auto create/delete and the bugfix is in this
-[[commit](https://github.com/pimvanpelt/lcpng/commit/934446dcd97f51c82ddf133ad45b61b3aae14b2d)].
+[[commit](https://git.ipng.ch/ipng/lcpng/commit/934446dcd97f51c82ddf133ad45b61b3aae14b2d)].

 ### Further Work

@@ -154,7 +154,7 @@ For now, `lcp_nl_dispatch()` just throws the message away after logging it with
 a function that will come in very useful as I start to explore all the different Netlink message types.

 The code that forms the basis of our Netlink Listener lives in [[this
-commit](https://github.com/pimvanpelt/lcpng/commit/c4e3043ea143d703915239b2390c55f7b6a9b0b1)] and
+commit](https://git.ipng.ch/ipng/lcpng/commit/c4e3043ea143d703915239b2390c55f7b6a9b0b1)] and
 specifically, here I want to call out I was not the primary author, I worked off of Matt and Neale's
 awesome work in this pending [Gerrit](https://gerrit.fd.io/r/c/vpp/+/31122).

@@ -182,7 +182,7 @@ Linux interface VPP is not aware of. But, if I can find the _LIP_, I can convert
 add or remove the ip4/ip6 neighbor adjacency.

 The code for this first Netlink message handler lives in this
-[[commit](https://github.com/pimvanpelt/lcpng/commit/30bab1d3f9ab06670fbef2c7c6a658e7b77f7738)]. An
+[[commit](https://git.ipng.ch/ipng/lcpng/commit/30bab1d3f9ab06670fbef2c7c6a658e7b77f7738)]. An
 ironic insight is that after writing the code, I don't think any of it will be necessary, because
 the interface plugin will already copy ARP and IPv6 ND packets back and forth and itself update its
 neighbor adjacency tables; but I'm leaving the code in for now.

@@ -197,7 +197,7 @@ it or remove it, and if there are no link-local addresses left, disable IPv6 on
 There's also a few multicast routes to add (notably 224.0.0.0/24 and ff00::/8, all-local-subnet).

 The code for IP address handling is in this
-[[commit]](https://github.com/pimvanpelt/lcpng/commit/87742b4f541d389e745f0297d134e34f17b5b485), but
+[[commit]](https://git.ipng.ch/ipng/lcpng/commit/87742b4f541d389e745f0297d134e34f17b5b485), but
 when I took it out for a spin, I noticed something curious, looking at the log lines that are
 generated for the following sequence:

@@ -236,7 +236,7 @@ interface and directly connected route addition/deletion is slightly different i
 So, I decide to take a little shortcut -- if an addition returns "already there", or a deletion returns
 "no such entry", I'll just consider it a successful addition and deletion respectively, saving my eyes
 from being screamed at by this red error message. I changed that in this
-[[commit](https://github.com/pimvanpelt/lcpng/commit/d63fbd8a9a612d038aa385e79a57198785d409ca)],
+[[commit](https://git.ipng.ch/ipng/lcpng/commit/d63fbd8a9a612d038aa385e79a57198785d409ca)],
 turning this situation in a friendly green notice instead.

 ### Netlink: Link (existing)

@@ -267,7 +267,7 @@ To avoid this loop, I temporarily turn off `lcp-sync` just before handling a batch
 turn it back to its original state when I'm done with that.

 The code for all/del of existing links is in this
-[[commit](https://github.com/pimvanpelt/lcpng/commit/e604dd34784e029b41a47baa3179296d15b0632e)].
+[[commit](https://git.ipng.ch/ipng/lcpng/commit/e604dd34784e029b41a47baa3179296d15b0632e)].

 ### Netlink: Link (new)

@@ -276,7 +276,7 @@ doesn't have a _LIP_ for, but specifically describes a VLAN interface? Well, th
 is trying to create a new sub-interface. And supporting that operation would be super cool, so let's go!

 Using the earlier placeholder hint in `lcp_nl_link_add()` (see the previous
-[[commit](https://github.com/pimvanpelt/lcpng/commit/e604dd34784e029b41a47baa3179296d15b0632e)]),
+[[commit](https://git.ipng.ch/ipng/lcpng/commit/e604dd34784e029b41a47baa3179296d15b0632e)]),
 I know that I've gotten a NEWLINK request but the Linux ifindex doesn't have a _LIP_. This could be
 because the interface is entirely foreign to VPP, for example somebody created a dummy interface or
 a VLAN sub-interface on one:

@@ -331,7 +331,7 @@ a boring `<phy>.<subid>` name.

 Alright, without further ado, the code for the main innovation here, the implementation of
 `lcp_nl_link_add_vlan()`, is in this
-[[commit](https://github.com/pimvanpelt/lcpng/commit/45f408865688eb7ea0cdbf23aa6f8a973be49d1a)].
+[[commit](https://git.ipng.ch/ipng/lcpng/commit/45f408865688eb7ea0cdbf23aa6f8a973be49d1a)].

 ## Results

@@ -118,7 +118,7 @@ or Virtual Routing/Forwarding domains). So first, I need to add these:

 All of this code was heavily inspired by the pending [[Gerrit](https://gerrit.fd.io/r/c/vpp/+/31122)]
 but a few finishing touches were added, and wrapped up in this
-[[commit](https://github.com/pimvanpelt/lcpng/commit/7a76498277edc43beaa680e91e3a0c1787319106)].
+[[commit](https://git.ipng.ch/ipng/lcpng/commit/7a76498277edc43beaa680e91e3a0c1787319106)].

 ### Deletion

@@ -459,7 +459,7 @@ it as 'unreachable' rather than deleting it. These are *additions* which have a
 but with an interface index of 1 (which, in Netlink, is 'lo'). This makes VPP intermittently crash, so I
 currently commented this out, while I gain better understanding. Result: blackhole/unreachable/prohibit
 specials can not be set using the plugin. Beware!
-(disabled in this [[commit](https://github.com/pimvanpelt/lcpng/commit/7c864ed099821f62c5be8cbe9ed3f4dd34000a42)]).
+(disabled in this [[commit](https://git.ipng.ch/ipng/lcpng/commit/7c864ed099821f62c5be8cbe9ed3f4dd34000a42)]).

 ## Credits

@@ -88,7 +88,7 @@ stat['/if/rx-miss'][:, 1].sum() - returns the sum of packet counters for
 ```

 Alright, so let's grab that file and refactor it into a small library for me to use, I do
-this in [[this commit](https://github.com/pimvanpelt/vpp-snmp-agent/commit/51eee915bf0f6267911da596b41a4475feaf212e)].
+this in [[this commit](https://git.ipng.ch/ipng/vpp-snmp-agent/commit/51eee915bf0f6267911da596b41a4475feaf212e)].

 ### VPP's API

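For reference, the `stat['/if/rx-miss'][:, 1].sum()` pattern quoted in the hunk above comes from VPP's Python stats library. A minimal reader along those lines might look like this -- a sketch only, assuming the `vpp_papi.vpp_stats` module and the default stats socket path, not the actual vpp-snmp-agent code:

```
# Sketch only: poll VPP's shared-memory stats segment from Python.
# Assumes vpp_papi.vpp_stats and the default socket /run/vpp/stats.sock.
from vpp_papi.vpp_stats import VPPStats

stats = VPPStats(socketname='/run/vpp/stats.sock')
stats.connect()

# Per the quoted example: rows are worker threads, columns are sw_if_index,
# so [:, 1] selects sw_if_index 1 across all threads and .sum() totals them.
rx_miss_if1 = stats['/if/rx-miss'][:, 1].sum()
print(f'rx-miss on sw_if_index 1: {rx_miss_if1} packets')

stats.disconnect()
```
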
@@ -159,7 +159,7 @@ idx=19 name=tap4 mac=02:fe:17:06:fc:af mtu=9000 flags=3

 So I added a little abstration with some error handling and one main function
 to return interfaces as a Python dictionary of those `sw_interface_details`
-tuples in [[this commit](https://github.com/pimvanpelt/vpp-snmp-agent/commit/51eee915bf0f6267911da596b41a4475feaf212e)].
+tuples in [[this commit](https://git.ipng.ch/ipng/vpp-snmp-agent/commit/51eee915bf0f6267911da596b41a4475feaf212e)].

 ### AgentX

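As an illustration of what such an abstraction could look like, here is a sketch assuming `vpp_papi`'s `VPPApiClient`; the function and client names are illustrative, not the actual vpp-snmp-agent code:

```
# Sketch only: dump all interfaces over the VPP API and key the
# sw_interface_details replies by interface name.
from vpp_papi import VPPApiClient


def get_ifaces():
    vpp = VPPApiClient()         # default API socket / api-files; an assumption
    vpp.connect('snmp-example')  # client name is arbitrary
    try:
        # sw_interface_dump() yields one sw_interface_details reply per interface
        return {i.interface_name: i for i in vpp.api.sw_interface_dump()}
    finally:
        vpp.disconnect()


for name, details in get_ifaces().items():
    print(name, details.sw_if_index)
```
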
@@ -207,9 +207,9 @@ once asked with `GetPDU` or `GetNextPDU` requests, by issuing a corresponding `RegisterPDU`
 to the SNMP server -- it takes care of all the rest!

 The resulting code is in [[this
-commit](https://github.com/pimvanpelt/vpp-snmp-agent/commit/8c9c1e2b4aa1d40a981f17581f92bba133dd2c29)]
+commit](https://git.ipng.ch/ipng/vpp-snmp-agent/commit/8c9c1e2b4aa1d40a981f17581f92bba133dd2c29)]
 but you can also check out the whole thing on
-[[Github](https://github.com/pimvanpelt/vpp-snmp-agent)].
+[[Github](https://git.ipng.ch/ipng/vpp-snmp-agent)].

 ### Building

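To give an idea of that Updater/RegisterPDU flow, a toy agent using the `pyagentx` library by GitHub user hosthvo (referenced elsewhere in this compare) could be sketched like this; the OID and values are illustrative:

```
# Toy sketch using pyagentx (github.com/hosthvo/pyagentx): an Updater fills
# OID suffixes under a registered subtree, and the library handles the
# RegisterPDU / Response PDU exchange with snmpd for us.
import pyagentx


class IfDescr(pyagentx.Updater):
    def update(self):
        # OID suffixes below the registered root; values are made up here
        self.set_OCTETSTRING('1', 'GigabitEthernet3/0/0')
        self.set_OCTETSTRING('2', 'loop0')


class VPPAgent(pyagentx.Agent):
    def setup(self):
        # ifDescr subtree of IF-MIB, refreshed periodically by the library
        self.register('1.3.6.1.2.1.2.2.1.2', IfDescr)


pyagentx.setup_logging()
VPPAgent().start()
```
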
@@ -480,7 +480,7 @@ is to say, those packets which were destined to any IP address configured on the control
 plane. Any traffic going _through_ VPP will never be seen by Linux! So, I'll have to be
 clever and count this traffic by polling VPP instead. This was the topic of my previous
 [VPP Part 6]({{< ref "2021-09-10-vpp-6" >}}) about the SNMP Agent. All of that code
-was released to [Github](https://github.com/pimvanpelt/vpp-snmp-agent), notably there's
+was released to [Github](https://git.ipng.ch/ipng/vpp-snmp-agent), notably there's
 a hint there for an `snmpd-dataplane.service` and a `vpp-snmp-agent.service`, including
 the compiled binary that reads from VPP and feeds this to SNMP.

@@ -30,9 +30,9 @@ virtual machine running in Qemu/KVM into a working setup with both [Free Range R
 and [Bird](https://bird.network.cz/) installed side by side.

 **NOTE**: If you're just interested in the resulting image, here's the most pertinent information:
-> * ***vpp-proto.qcow2.lrz [[Download](https://ipng.ch/media/vpp-proto/vpp-proto-bookworm-20231015.qcow2.lrz)]***
-> * ***SHA256*** `bff03a80ccd1c0094d867d1eb1b669720a1838330c0a5a526439ecb1a2457309`
-> * ***Debian Bookworm (12.4)*** and ***VPP 24.02-rc0~46-ga16463610e***
+> * ***vpp-proto.qcow2.lrz*** [[Download](https://ipng.ch/media/vpp-proto/vpp-proto-bookworm-20250607.qcow2.lrz)]
+> * ***SHA256*** `a5fdf157c03f2d202dcccdf6ed97db49c8aa5fdb6b9ca83a1da958a8a24780ab`
+> * ***Debian Bookworm (12.11)*** and ***VPP 25.10-rc0~49-g90d92196***
 > * ***CPU*** Make sure the (virtualized) CPU supports AVX
 > * ***RAM*** The image needs at least 4GB of RAM, and the hypervisor should support hugepages and AVX
 > * ***Username***: `ipng` with ***password***: `ipng loves vpp` and is sudo-enabled

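Since both the image and its checksum changed, it's worth verifying the download. A quick standard-library check against the SHA256 from the updated NOTE (filename and digest are taken from the new lines above):

```
# Verify the downloaded image against the SHA256 listed above.
import hashlib

EXPECTED = 'a5fdf157c03f2d202dcccdf6ed97db49c8aa5fdb6b9ca83a1da958a8a24780ab'

h = hashlib.sha256()
with open('vpp-proto-bookworm-20250607.qcow2.lrz', 'rb') as f:
    for chunk in iter(lambda: f.read(1 << 20), b''):  # hash in 1 MiB chunks
        h.update(chunk)

print('OK' if h.hexdigest() == EXPECTED else 'checksum mismatch!')
```
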
@@ -62,7 +62,7 @@ plugins:
 or route, or the system receiving ARP or IPv6 neighbor request/reply from neighbors), and applying
 these events to the VPP dataplane.

-I've published the code on [Github](https://github.com/pimvanpelt/lcpng/) and I am targeting a release
+I've published the code on [Github](https://git.ipng.ch/ipng/lcpng/) and I am targeting a release
 in upstream VPP, hoping to make the upcoming 22.02 release in February 2022. I have a lot of ground to
 cover, but I will note that the plugin has been running in production in [AS8298]({{< ref "2021-02-27-network" >}})
 since Sep'21 and no crashes related to LinuxCP have been observed.

@@ -195,7 +195,7 @@ So grab a cup of tea, while we let Rhino stretch its legs, ehh, CPUs ...
 pim@rhino:~$ mkdir -p ~/src
 pim@rhino:~$ cd ~/src
 pim@rhino:~/src$ sudo apt install libmnl-dev
-pim@rhino:~/src$ git clone https://github.com/pimvanpelt/lcpng.git
+pim@rhino:~/src$ git clone https://git.ipng.ch/ipng/lcpng.git
 pim@rhino:~/src$ git clone https://gerrit.fd.io/r/vpp
 pim@rhino:~/src$ ln -s ~/src/lcpng ~/src/vpp/src/plugins/lcpng
 pim@rhino:~/src$ cd ~/src/vpp

@@ -33,7 +33,7 @@ In this first post, let's take a look at tablestakes: writing a YAML specification
 configuration elements of VPP, and then ensures that the YAML file is both syntactically as well as
 semantically correct.

-**Note**: Code is on [my Github](https://github.com/pimvanpelt/vppcfg), but it's not quite ready for
+**Note**: Code is on [my Github](https://git.ipng.ch/ipng/vppcfg), but it's not quite ready for
 prime-time yet. Take a look, and engage with us on GitHub (pull requests preferred over issues themselves)
 or reach out by [contacting us](/s/contact/).

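As a generic illustration of the "syntactic" half of that validation -- a schema check on the YAML shape, with the `yamale` library as a stand-in, not necessarily how vppcfg itself does it -- consider:

```
# Illustration only: syntactic validation of a vppcfg-style YAML snippet
# with the yamale schema library (a stand-in; semantic rules such as
# "a sub-interface MTU may not exceed its parent's" still need real code).
import yamale

schema = yamale.make_schema(content="""
interfaces: map(include('interface'), key=str())
---
interface:
  description: str(required=False)
  mtu: int(min=128, max=9216, required=False)
""")

data = yamale.make_data(content="""
interfaces:
  GigabitEthernet3/0/0:
    description: uplink
    mtu: 9000
""")

yamale.validate(schema, data)  # raises yamale.YamaleError on violations
print('syntactically valid')
```
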
@@ -348,7 +348,7 @@ to mess up my (or your!) VPP router by feeding it garbage, so the lions' share of the code
 has been to assert the YAML file is both syntactically and semantically valid.


-In the mean time, you can take a look at my code on [GitHub](https://github.com/pimvanpelt/vppcfg), but to
+In the mean time, you can take a look at my code on [GitHub](https://git.ipng.ch/ipng/vppcfg), but to
 whet your appetite, here's a hefty configuration that demonstrates all implemented types:

 ```

@@ -32,7 +32,7 @@ the configuration to the dataplane. Welcome to `vppcfg`!
 In this second post of the series, I want to talk a little bit about how planning a path from a running
 configuration to a desired new configuration might look like.

-**Note**: Code is on [my Github](https://github.com/pimvanpelt/vppcfg), but it's not quite ready for
+**Note**: Code is on [my Github](https://git.ipng.ch/ipng/vppcfg), but it's not quite ready for
 prime-time yet. Take a look, and engage with us on GitHub (pull requests preferred over issues themselves)
 or reach out by [contacting us](/s/contact/).

@@ -171,12 +171,12 @@ GigabitEthernet1/0/0 1 up GigabitEthernet1/0/0

 After this exploratory exercise, I have learned enough about the hardware to be able to take the
 Fitlet2 out for a spin. To configure the VPP instance, I turn to
-[[vppcfg](https://github.com/pimvanpelt/vppcfg)], which can take a YAML configuration file
+[[vppcfg](https://git.ipng.ch/ipng/vppcfg)], which can take a YAML configuration file
 describing the desired VPP configuration, and apply it safely to the running dataplane using the VPP
 API. I've written a few more posts on how it does that, notably on its [[syntax]({{< ref "2022-03-27-vppcfg-1" >}})]
 and its [[planner]({{< ref "2022-04-02-vppcfg-2" >}})]. A complete
 configuration guide on vppcfg can be found
-[[here](https://github.com/pimvanpelt/vppcfg/blob/main/docs/config-guide.md)].
+[[here](https://git.ipng.ch/ipng/vppcfg/blob/main/docs/config-guide.md)].

 ```
 pim@fitlet:~$ sudo dpkg -i {lib,}vpp*23.06*deb

@@ -185,7 +185,7 @@ forgetful chipmunk-sized brain!), so here, I'll only recap what's already written

 **1. BUILD:** For the first step, the build is straight forward, and yields a VPP instance based on
 `vpp-ext-deps_23.06-1` at version `23.06-rc0~71-g182d2b466`, which contains my
-[[LCPng](https://github.com/pimvanpelt/lcpng.git)] plugin. I then copy the packages to the router.
+[[LCPng](https://git.ipng.ch/ipng/lcpng.git)] plugin. I then copy the packages to the router.
 The router has an E-2286G CPU @ 4.00GHz with 6 cores and 6 hyperthreads. There's a really handy tool
 called `likwid-topology` that can show how the L1, L2 and L3 cache lines up with respect to CPU
 cores. Here I learn that CPU (0+6) and (1+7) share L1 and L2 cache -- so I can conclude that 0-5 are

@@ -351,7 +351,7 @@ in `vppcfg`:
 * When I create the initial `--novpp` config, there's a bug in `vppcfg` where I incorrectly
 reference a dataplane object which I haven't initialized (because with `--novpp` the tool
 will not contact the dataplane at all. That one was easy to fix, which I did in [[this
-commit](https://github.com/pimvanpelt/vppcfg/commit/0a0413927a0be6ed3a292a8c336deab8b86f5eee)]).
+commit](https://git.ipng.ch/ipng/vppcfg/commit/0a0413927a0be6ed3a292a8c336deab8b86f5eee)]).

 After that small detour, I can now proceed to configure the dataplane by offering the resulting
 VPP commands, like so:

@@ -573,7 +573,7 @@ see is that which is destined to the controlplane (eg, to one of the IPv4 or IPv6
 multicast/broadcast groups that they are participating in), so things like tcpdump or SNMP won't
 really work.

-However, due to my [[vpp-snmp-agent](https://github.com/pimvanpelt/vpp-snmp-agent.git)], which is
+However, due to my [[vpp-snmp-agent](https://git.ipng.ch/ipng/vpp-snmp-agent.git)], which is
 feeding as an AgentX behind an snmpd that in turn is running in the `dataplane` namespace, SNMP scrapes
 work as they did before, albeit with a few different interface names.

@@ -14,7 +14,7 @@ performance and versatility. For those of us who have used Cisco IOS/XR devices,
 _ASR_ (aggregation service router), VPP will look and feel quite familiar as many of the approaches
 are shared between the two.

-I've been working on the Linux Control Plane [[ref](https://github.com/pimvanpelt/lcpng)], which you
+I've been working on the Linux Control Plane [[ref](https://git.ipng.ch/ipng/lcpng)], which you
 can read all about in my series on VPP back in 2021:

 [{: style="width:300px; float: right; margin-left: 1em;"}](https://video.ipng.ch/w/erc9sAofrSZ22qjPwmv6H4)

@@ -70,7 +70,7 @@ answered by a Response PDU.

 Using parts of a Python Agentx library written by GitHub user hosthvo
 [[ref](https://github.com/hosthvo/pyagentx)], I tried my hands at writing one of these AgentX's.
-The resulting source code is on [[GitHub](https://github.com/pimvanpelt/vpp-snmp-agent)]. That's the
+The resulting source code is on [[GitHub](https://git.ipng.ch/ipng/vpp-snmp-agent)]. That's the
 one that's running in production ever since I started running VPP routers at IPng Networks AS8298.
 After the _AgentX_ exposes the dataplane interfaces and their statistics into _SNMP_, an open source
 monitoring tool such as LibreNMS [[ref](https://librenms.org/)] can discover the routers and draw

@@ -126,7 +126,7 @@ for any interface created in the dataplane.

 I wish I were good at Go, but I never really took to the language. I'm pretty good at Python, but
 sorting through the stats segment isn't super quick as I've already noticed in the Python3 based
-[[VPP SNMP Agent](https://github.com/pimvanpelt/vpp-snmp-agent)]. I'm probably the world's least
+[[VPP SNMP Agent](https://git.ipng.ch/ipng/vpp-snmp-agent)]. I'm probably the world's least
 terrible C programmer, so maybe I can take a look at the VPP Stats Client and make sense of it. Luckily,
 there's an example already in `src/vpp/app/vpp_get_stats.c` and it reveals the following pattern:

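The pattern in question boils down to: connect to the stats segment, list the directory by regex, then dump the matching counters. The same flow is also reachable from Python's stats client; a sketch, assuming the `vpp_papi.vpp_stats` API:

```
# Sketch only: the connect -> ls -> dump flow from Python instead of C.
# Assumes vpp_papi.vpp_stats; ls() takes a list of regex patterns.
from vpp_papi.vpp_stats import VPPStats

stats = VPPStats(socketname='/run/vpp/stats.sock')
stats.connect()

directory = stats.ls(['^/if/names$', '^/if/rx$', '^/if/tx$'])
for name, value in stats.dump(directory).items():
    print(name, value)

stats.disconnect()
```
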
@@ -19,7 +19,7 @@ same time keep an IPng Site Local network with IPv4 and IPv6 that is separate fr
 based on hardware/silicon based forwarding at line rate and high availability. You can read all
 about my Centec MPLS shenanigans in [[this article]({{< ref "2023-03-11-mpls-core" >}})].

-Ever since the release of the Linux Control Plane [[ref](https://github.com/pimvanpelt/lcpng)]
+Ever since the release of the Linux Control Plane [[ref](https://git.ipng.ch/ipng/lcpng)]
 plugin in VPP, folks have asked "What about MPLS?" -- I have never really felt the need to go this
 rabbit hole, because I figured that in this day and age, higher level IP protocols that do tunneling
 are just as performant, and a little bit less of an 'art' to get right. For example, the Centec

@@ -459,6 +459,6 @@ and VPP, and the overall implementation before attempting to use in production. I hope
 we got at least some of this right, but testing and runtime experience will tell.

 I will be silently porting the change into my own copy of the Linux Controlplane called lcpng on
-[[GitHub](https://github.com/pimvanpelt/lcpng.git)]. If you'd like to test this - reach out to the VPP
+[[GitHub](https://git.ipng.ch/ipng/lcpng.git)]. If you'd like to test this - reach out to the VPP
 Developer [[mailinglist](mailto:vpp-dev@lists.fd.io)] any time!

@@ -385,5 +385,5 @@ and VPP, and the overall implementation before attempting to use in production. I hope
 we got at least some of this right, but testing and runtime experience will tell.

 I will be silently porting the change into my own copy of the Linux Controlplane called lcpng on
-[[GitHub](https://github.com/pimvanpelt/lcpng.git)]. If you'd like to test this - reach out to the VPP
+[[GitHub](https://git.ipng.ch/ipng/lcpng.git)]. If you'd like to test this - reach out to the VPP
 Developer [[mailinglist](mailto:vpp-dev@lists.fd.io)] any time!

@@ -304,7 +304,7 @@ Gateway, just to show a few of the more advanced features of VPP. For me, this tickles another
 line of thinking: classifiers. This extract/match/act pattern can be used in policers, ACLs and
 arbitrary traffic redirection through VPP's directed graph (eg. selecting a next node for
 processing). I'm going to deep-dive into this classifier behavior in an upcoming article, and see
-how I might add this to [[vppcfg](https://github.com/pimvanpelt/vppcfg.git)], because I think it
+how I might add this to [[vppcfg](https://git.ipng.ch/ipng/vppcfg.git)], because I think it
 would be super powerful to abstract away the rather complex underlying API into something a little
 bit more ... user friendly. Stay tuned! :)

@@ -359,7 +359,7 @@ does not have an IPv4 address. Except -- I'm bending the rules a little bit by d
 There's an internal function `ip4_sw_interface_enable_disable()` which is called to enable IPv4
 processing on an interface once the first IPv4 address is added. So my first fix is to force this to
 be enabled for any interface that is exposed via Linux Control Plane, notably in `lcp_itf_pair_create()`
-[[here](https://github.com/pimvanpelt/lcpng/blob/main/lcpng_interface.c#L777)].
+[[here](https://git.ipng.ch/ipng/lcpng/blob/main/lcpng_interface.c#L777)].

 This approach is partially effective:

|
|||||||
I make sure to guard the behavior behind a backwards compatible configuration option.
|
I make sure to guard the behavior behind a backwards compatible configuration option.
|
||||||
|
|
||||||
If you're curious, please take a look at the change in my [[GitHub
|
If you're curious, please take a look at the change in my [[GitHub
|
||||||
repo](https://github.com/pimvanpelt/lcpng/commit/a960d64a87849d312b32d9432ffb722672c14878)], in
|
repo](https://git.ipng.ch/ipng/lcpng/commit/a960d64a87849d312b32d9432ffb722672c14878)], in
|
||||||
which I:
|
which I:
|
||||||
1. add a new configuration option, `lcp-sync-unnumbered`, which defaults to `on`. That would be
|
1. add a new configuration option, `lcp-sync-unnumbered`, which defaults to `on`. That would be
|
||||||
what the plugin would do in the normal case: copy forward these borrowed IP addresses to Linux.
|
what the plugin would do in the normal case: copy forward these borrowed IP addresses to Linux.
|
||||||
|
@@ -147,7 +147,7 @@ With all of that, I am ready to demonstrate two working solutions now. I first compile
 Ondrej's [[commit](https://gitlab.nic.cz/labs/bird/-/commit/280daed57d061eb1ebc89013637c683fe23465e8)].
 Then, I compile VPP with my pending [[gerrit](https://gerrit.fd.io/r/c/vpp/+/40482)]. Finally,
 to demonstrate how `update_loopback_addr()` might work, I compile `lcpng` with my previous
-[[commit](https://github.com/pimvanpelt/lcpng/commit/a960d64a87849d312b32d9432ffb722672c14878)],
+[[commit](https://git.ipng.ch/ipng/lcpng/commit/a960d64a87849d312b32d9432ffb722672c14878)],
 which allows me to inhibit copying forward addresses from VPP to Linux, when using _unnumbered_
 interfaces.

@@ -250,10 +250,10 @@ remove the IPv4 and IPv6 addresses from the <span style='color:red;font-weight:b
 routers in Brüttisellen. They are directly connected, and if anything goes wrong, I can walk
 over and rescue them. Sounds like a safe way to start!

-I quickly add the ability for [[vppcfg](https://github.com/pimvanpelt/vppcfg)] to configure
+I quickly add the ability for [[vppcfg](https://git.ipng.ch/ipng/vppcfg)] to configure
 _unnumbered_ interfaces. In VPP, these are interfaces that don't have an IPv4 or IPv6 address of
 their own, but they borrow one from another interface. If you're curious, you can take a look at the
-[[User Guide](https://github.com/pimvanpelt/vppcfg/blob/main/docs/config-guide.md#interfaces)] on
+[[User Guide](https://git.ipng.ch/ipng/vppcfg/blob/main/docs/config-guide.md#interfaces)] on
 GitHub.

 Looking at their `vppcfg` files, the change is actually very easy, taking as an example the

@@ -291,7 +291,7 @@ interface.

 In the article, you'll see that discussed as _Solution 2_, and it includes a bit of rationale why I
 find this better. I implemented it in this
-[[commit](https://github.com/pimvanpelt/lcpng/commit/a960d64a87849d312b32d9432ffb722672c14878)], in
+[[commit](https://git.ipng.ch/ipng/lcpng/commit/a960d64a87849d312b32d9432ffb722672c14878)], in
 case you're curious, and the commandline keyword is `lcp lcp-sync-unnumbered off` (the default is
 _on_).

content/articles/2024-09-03-asr9001.md — new file (238 lines added)

@@ -0,0 +1,238 @@
---
date: "2024-09-03T13:07:54Z"
title: Loadtest notes, ASR9001
draft: true
---

### L2 point-to-point (L2XC) config

```
interface TenGigE0/0/0/0
 mtu 9216
 load-interval 30
 l2transport
 !
!
interface TenGigE0/0/0/1
 mtu 9216
 load-interval 30
 l2transport
 !
!
interface TenGigE0/0/0/2
 mtu 9216
 load-interval 30
 l2transport
 !
!
interface TenGigE0/0/0/3
 mtu 9216
 load-interval 30
 l2transport
 !
!

...
l2vpn
 load-balancing flow src-dst-ip
 logging
  bridge-domain
  pseudowire
 !
 xconnect group LoadTest
  p2p pair0
   interface TenGigE0/0/2/0
   interface TenGigE0/0/2/1
  !
  p2p pair1
   interface TenGigE0/0/2/2
   interface TenGigE0/0/2/3
  !
...
```

### L2 Bridge-Domain

```
l2vpn
 bridge group LoadTestp
  bridge-domain bd0
   interface TenGigE0/0/0/0
   !
   interface TenGigE0/0/0/1
   !
  !
  bridge-domain bd1
   interface TenGigE0/0/0/2
   !
   interface TenGigE0/0/0/3
   !
  !
...
```

```
RP/0/RSP0/CPU0:micro-fridge#show l2vpn forwarding bridge-domain mac-address location 0/0/CPU0
Sat Aug 31 12:09:08.957 UTC
Mac Address    Type    Learned from/Filtered on   LC learned   Resync Age    Mapped to
--------------------------------------------------------------------------------
9c69.b461.fcf2 dynamic Te0/0/0/0                  0/0/CPU0     0d 0h 0m 14s  N/A
9c69.b461.fcf3 dynamic Te0/0/0/1                  0/0/CPU0     0d 0h 0m 2s   N/A
001b.2155.1f11 dynamic Te0/0/0/2                  0/0/CPU0     0d 0h 0m 0s   N/A
001b.2155.1f10 dynamic Te0/0/0/3                  0/0/CPU0     0d 0h 0m 15s  N/A
001b.21bc.47a4 dynamic Te0/0/1/0                  0/0/CPU0     0d 0h 0m 6s   N/A
001b.21bc.47a5 dynamic Te0/0/1/1                  0/0/CPU0     0d 0h 0m 21s  N/A
9c69.b461.ff41 dynamic Te0/0/1/2                  0/0/CPU0     0d 0h 0m 16s  N/A
9c69.b461.ff40 dynamic Te0/0/1/3                  0/0/CPU0     0d 0h 0m 10s  N/A
001b.2155.1d1d dynamic Te0/0/2/0                  0/0/CPU0     0d 0h 0m 9s   N/A
001b.2155.1d1c dynamic Te0/0/2/1                  0/0/CPU0     0d 0h 0m 16s  N/A
001b.2155.1e08 dynamic Te0/0/2/2                  0/0/CPU0     0d 0h 0m 4s   N/A
001b.2155.1e09 dynamic Te0/0/2/3                  0/0/CPU0     0d 0h 0m 11s  N/A
```

Interesting finding: after a bridge-domain overload occurs, forwarding pretty much stops.

```
Te0/0/0/0:
  30 second input rate 6931755000 bits/sec, 14441158 packets/sec
  30 second output rate 0 bits/sec, 0 packets/sec
Te0/0/0/1:
  30 second input rate 0 bits/sec, 0 packets/sec
  30 second output rate 19492000 bits/sec, 40609 packets/sec

Te0/0/0/2:
  30 second input rate 0 bits/sec, 0 packets/sec
  30 second output rate 19720000 bits/sec, 41084 packets/sec
Te0/0/0/3:
  30 second input rate 6931728000 bits/sec, 14441100 packets/sec
  30 second output rate 0 bits/sec, 0 packets/sec

... and so on

30 second input rate 6931558000 bits/sec, 14440748 packets/sec
30 second output rate 0 bits/sec, 0 packets/sec
30 second input rate 0 bits/sec, 0 packets/sec
30 second output rate 12627000 bits/sec, 26307 packets/sec
30 second input rate 0 bits/sec, 0 packets/sec
30 second output rate 12710000 bits/sec, 26479 packets/sec
30 second input rate 6931542000 bits/sec, 14440712 packets/sec
30 second output rate 0 bits/sec, 0 packets/sec
30 second input rate 0 bits/sec, 0 packets/sec
30 second output rate 19196000 bits/sec, 39992 packets/sec
30 second input rate 6931651000 bits/sec, 14440938 packets/sec
30 second output rate 0 bits/sec, 0 packets/sec
30 second input rate 6931658000 bits/sec, 14440958 packets/sec
30 second output rate 0 bits/sec, 0 packets/sec
30 second input rate 0 bits/sec, 0 packets/sec
30 second output rate 13167000 bits/sec, 27431 packets/sec
```

MPLS enabled test:

```
arp vrf default 100.64.0.2 001b.2155.1e08 ARPA
arp vrf default 100.64.1.2 001b.2155.1e09 ARPA
arp vrf default 100.64.2.2 001b.2155.1d1c ARPA
arp vrf default 100.64.3.2 001b.2155.1d1d ARPA
arp vrf default 100.64.4.2 001b.21bc.47a4 ARPA
arp vrf default 100.64.5.2 001b.21bc.47a5 ARPA
arp vrf default 100.64.6.2 9c69.b461.fcf2 ARPA
arp vrf default 100.64.7.2 9c69.b461.fcf3 ARPA
arp vrf default 100.64.8.2 001b.2155.1f10 ARPA
arp vrf default 100.64.9.2 001b.2155.1f11 ARPA
arp vrf default 100.64.10.2 9c69.b461.ff40 ARPA
arp vrf default 100.64.11.2 9c69.b461.ff41 ARPA

router static
 address-family ipv4 unicast
  0.0.0.0/0 198.19.5.1
  16.0.0.0/24 100.64.0.2
  16.0.1.0/24 100.64.2.2
  16.0.2.0/24 100.64.4.2
  16.0.3.0/24 100.64.6.2
  16.0.4.0/24 100.64.8.2
  16.0.5.0/24 100.64.10.2
  48.0.0.0/24 100.64.1.2
  48.0.1.0/24 100.64.3.2
  48.0.2.0/24 100.64.5.2
  48.0.3.0/24 100.64.7.2
  48.0.4.0/24 100.64.9.2
  48.0.5.0/24 100.64.11.2
 !
!

mpls static
 interface TenGigE0/0/0/0
 interface TenGigE0/0/0/1
 interface TenGigE0/0/0/2
 interface TenGigE0/0/0/3
 interface TenGigE0/0/1/0
 interface TenGigE0/0/1/1
 interface TenGigE0/0/1/2
 interface TenGigE0/0/1/3
 interface TenGigE0/0/2/0
 interface TenGigE0/0/2/1
 interface TenGigE0/0/2/2
 interface TenGigE0/0/2/3
 address-family ipv4 unicast
  local-label 16 allocate
   forward
    path 1 nexthop TenGigE0/0/2/3 100.64.1.2 out-label 17
   !
  !
  local-label 17 allocate
   forward
    path 1 nexthop TenGigE0/0/2/2 100.64.0.2 out-label 16
   !
  !
  local-label 18 allocate
   forward
    path 1 nexthop TenGigE0/0/2/0 100.64.3.2 out-label 19
   !
  !
  local-label 19 allocate
   forward
    path 1 nexthop TenGigE0/0/2/1 100.64.2.2 out-label 18
   !
  !
  local-label 20 allocate
   forward
    path 1 nexthop TenGigE0/0/1/1 100.64.5.2 out-label 21
   !
  !
  local-label 21 allocate
   forward
    path 1 nexthop TenGigE0/0/1/0 100.64.4.2 out-label 20
   !
  !
  local-label 22 allocate
   forward
    path 1 nexthop TenGigE0/0/0/1 100.64.7.2 out-label 23
   !
  !
  local-label 23 allocate
   forward
    path 1 nexthop TenGigE0/0/0/0 100.64.6.2 out-label 22
   !
  !
  local-label 24 allocate
   forward
    path 1 nexthop TenGigE0/0/0/2 100.64.9.2 out-label 25
   !
  !
  local-label 25 allocate
   forward
    path 1 nexthop TenGigE0/0/0/3 100.64.8.2 out-label 24
   !
  !
  local-label 26 allocate
   forward
    path 1 nexthop TenGigE0/0/1/2 100.64.11.2 out-label 27
   !
  !
  local-label 27 allocate
   forward
    path 1 nexthop TenGigE0/0/1/3 100.64.10.2 out-label 26
   !
  !
 !
!
```

@@ -230,7 +230,7 @@ does not have any form of configuration persistence and that's deliberate. VPP's
 programmable dataplane, and explicitly has left the programming and configuration as an exercise for
 integrators. I have written a Python project that takes a YAML file as input and uses it to
 configure (and reconfigure, on the fly) the dataplane automatically, called
-[[VPPcfg](https://github.com/pimvanpelt/vppcfg.git)]. Previously, I wrote some implementation thoughts
+[[VPPcfg](https://git.ipng.ch/ipng/vppcfg.git)]. Previously, I wrote some implementation thoughts
 on its [[datamodel]({{< ref 2022-03-27-vppcfg-1 >}})] and its [[operations]({{< ref 2022-04-02-vppcfg-2
 >}})] so I won't repeat that here. Instead, I will just show the configuration:

@@ -430,7 +430,7 @@ Boom. I could not be more pleased.
 This was a nice exercise for me! I'm going this direction because the
 [[Containerlab](https://containerlab.dev)] framework will start containers with given NOS images,
 not too dissimilar from the one I just made, and then attaches `veth` pairs between the containers.
-I started dabbling with a [[pull-request](https://github.com/srl-labs/containerlab/pull/2569)], but
+I started dabbling with a [[pull-request](https://github.com/srl-labs/containerlab/pull/2571)], but
 I got stuck with a part of the Containerlab code that pre-deploys config files into the containers.
 You see, I will need to generate two files:
@@ -448,7 +448,7 @@ will connect a few VPP containers together with an SR Linux node in a lab. Stand

 Once we have that, there's still quite some work for me to do. Notably:
 * Configuration persistence. `clab` allows you to save the running config. For that, I'll need to
-  introduce [[vppcfg](https://github.com/pimvanpelt/vppcfg.git)] and a means to invoke it when
+  introduce [[vppcfg](https://git.ipng.ch/ipng/vppcfg)] and a means to invoke it when
   the lab operator wants to save their config, and then reconfigure VPP when the container
   restarts.
 * I'll need to have a few files from `clab` shared with the host, notably the `startup.conf` and
content/articles/2025-05-04-containerlab-2.md (new file, 373 lines)
@@ -0,0 +1,373 @@
---
date: "2025-05-04T15:07:23Z"
title: 'VPP in Containerlab - Part 2'
params:
  asciinema: true
---

{{< image float="right" src="/assets/containerlab/containerlab.svg" alt="Containerlab Logo" width="12em" >}}

# Introduction

From time to time the subject of containerized VPP instances comes up. At IPng, I run the routers in
AS8298 on bare metal (Supermicro and Dell hardware), as it allows me to maximize performance.
However, VPP is quite friendly in virtualization. Notably, it runs really well on virtual machines
like Qemu/KVM or VMWare. I can pass through PCI devices directly to the guest, and use CPU pinning to
allow the guest virtual machine access to the underlying physical hardware. In such a mode, VPP
performs almost the same as on bare metal. But did you know that VPP can also run in Docker?

The other day I joined the [[ZANOG'25](https://nog.net.za/event1/zanog25/)] in Durban, South Africa.
One of the presenters was Nardus le Roux of Nokia, and he showed off a project called
[[Containerlab](https://containerlab.dev/)], which provides a CLI for orchestrating and managing
container-based networking labs. It starts the containers, builds virtual wiring between them to
create lab topologies of users' choice and manages the lab lifecycle.

Quite regularly I am asked 'when will you add VPP to Containerlab?', but at ZANOG I made a promise
to actually add it. In my previous [[article]({{< ref 2025-05-03-containerlab-1.md >}})], I took
a good look at VPP as a dockerized container. In this article, I'll explore how to make such a
container run in Containerlab!

## Completing the Docker container

Just having VPP running by itself in a container is not super useful (although it _is_ cool!). I
decide first to add a few bits and bobs that will come in handy in the `Dockerfile`:

```
FROM debian:bookworm
ARG DEBIAN_FRONTEND=noninteractive
ARG VPP_INSTALL_SKIP_SYSCTL=true
ARG REPO=release
EXPOSE 22/tcp
RUN apt-get update && apt-get -y install curl procps tcpdump iproute2 iptables \
    iputils-ping net-tools git python3 python3-pip vim-tiny openssh-server bird2 \
    mtr-tiny traceroute && apt-get clean

# Install VPP
RUN mkdir -p /var/log/vpp /root/.ssh/
RUN curl -s https://packagecloud.io/install/repositories/fdio/${REPO}/script.deb.sh | bash
RUN apt-get update && apt-get -y install vpp vpp-plugin-core && apt-get clean

# Build vppcfg
RUN pip install --break-system-packages build netaddr yamale argparse pyyaml ipaddress
RUN git clone https://git.ipng.ch/ipng/vppcfg.git && cd vppcfg && python3 -m build && \
    pip install --break-system-packages dist/vppcfg-*-py3-none-any.whl

# Config files
COPY files/etc/vpp/* /etc/vpp/
COPY files/etc/bird/* /etc/bird/
COPY files/init-container.sh /sbin/
RUN chmod 755 /sbin/init-container.sh
CMD ["/sbin/init-container.sh"]
```

A few notable additions:
* ***vppcfg*** is a handy utility I wrote and discussed in a previous [[article]({{< ref
  2022-04-02-vppcfg-2 >}})]. Its purpose is to take a YAML file that describes the configuration of
  the dataplane (like which interfaces, sub-interfaces, MTU, IP addresses and so on), and then
  apply this safely to a running dataplane. You can check it out in my
  [[vppcfg](https://git.ipng.ch/ipng/vppcfg)] git repository.
* ***openssh-server*** will come in handy to log in to the container, in addition to the already
  available `docker exec`.
* ***bird2*** which will be my controlplane of choice. At a future date, I might also add FRR,
  which may be a good alternative for some. VPP works well with both. You can check out Bird on
  the nic.cz [[website](https://bird.network.cz/?get_doc&f=bird.html&v=20)].

I'll add a couple of default config files for Bird and VPP, and replace the CMD with a generic
`/sbin/init-container.sh` in which I can do any late binding stuff before launching VPP.
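With the `Dockerfile` complete, building and test-driving the image is a one-liner. The image tag
and the `--privileged` flag below are my own illustration (the article doesn't prescribe them); VPP
does need elevated privileges in Docker, and this is the bluntest way to grant them for a quick test:

```
$ docker build --build-arg REPO=release -t fdio-vpp:bookworm .
$ docker run --rm -it --privileged fdio-vpp:bookworm
```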
### Initializing the Container

#### VPP Containerlab: NetNS

VPP's Linux Control Plane plugin wants to run in its own network namespace. So the first order of
business of `/sbin/init-container.sh` is to create it:

```
NETNS=${NETNS:="dataplane"}

echo "Creating dataplane namespace"
/usr/bin/mkdir -p /etc/netns/$NETNS
/usr/bin/touch /etc/netns/$NETNS/resolv.conf
/usr/sbin/ip netns add $NETNS
```

#### VPP Containerlab: SSH

Then, I'll set the root password (which is `vpp` by the way), and start an SSH daemon which allows
for password logins:

```
echo "Starting SSH, with credentials root:vpp"
sed -i -e 's,^#PermitRootLogin prohibit-password,PermitRootLogin yes,' /etc/ssh/sshd_config
sed -i -e 's,^root:.*,root:$y$j9T$kG8pyZEVmwLXEtXekQCRK.$9iJxq/bEx5buni1hrC8VmvkDHRy7ZMsw9wYvwrzexID:20211::::::,' /etc/shadow
/etc/init.d/ssh start
```

#### VPP Containerlab: Bird2

I can already predict that Bird2 won't be the only option for a controlplane, even though I'm a huge
fan of it. Therefore, I'll make it configurable to leave the door open for other controlplane
implementations in the future:

```
BIRD_ENABLED=${BIRD_ENABLED:="true"}

if [ "$BIRD_ENABLED" == "true" ]; then
  echo "Starting Bird in $NETNS"
  mkdir -p /run/bird /var/log/bird
  chown bird:bird /var/log/bird
  ROUTERID=$(ip -br a show eth0 | awk '{ print $3 }' | cut -f1 -d/)
  sed -i -e "s,.*router id .*,router id $ROUTERID; # Set by container-init.sh," /etc/bird/bird.conf
  /usr/bin/nsenter --net=/var/run/netns/$NETNS /usr/sbin/bird -u bird -g bird
fi
```

I am reminded that Bird won't start if it cannot determine its _router id_. When I start it in the
`dataplane` namespace, it will immediately exit, because there will be no IP addresses configured
yet. But luckily, it logs its complaint and it's easily addressed. I decide to take the management
IPv4 address from `eth0` and write that into the `bird.conf` file, which otherwise does some basic
initialization that I described in a previous [[article]({{< ref 2021-09-02-vpp-5 >}})], so I'll
skip that here. However, I do include an empty file called `/etc/bird/bird-local.conf` for users to
further configure Bird2.

#### VPP Containerlab: Binding veth pairs

When Containerlab starts the VPP container, it'll offer it a set of `veth` ports that connect this
container to other nodes in the lab. This is done by the `links` list in the topology file
[[ref](https://containerlab.dev/manual/network/)]. It's my goal to take all of the interfaces
that are of type `veth`, and generate a little snippet to grab them and bind them into VPP while
setting their MTU to 9216 to allow for jumbo frames:

```
CLAB_VPP_FILE=${CLAB_VPP_FILE:=/etc/vpp/clab.vpp}

echo "Generating $CLAB_VPP_FILE"
: > $CLAB_VPP_FILE
MTU=9216
for IFNAME in $(ip -br link show type veth | cut -f1 -d@ | grep -v '^eth0$' | sort); do
  MAC=$(ip -br link show dev $IFNAME | awk '{ print $3 }')
  echo " * $IFNAME hw-addr $MAC mtu $MTU"
  ip link set $IFNAME up mtu $MTU
  cat << EOF >> $CLAB_VPP_FILE
create host-interface name $IFNAME hw-addr $MAC
set interface name host-$IFNAME $IFNAME
set interface mtu $MTU $IFNAME
set interface state $IFNAME up

EOF
done
```

{{< image width="5em" float="left" src="/assets/shared/warning.png" alt="Warning" >}}

One thing I realized is that VPP will assign a random MAC address on its copy of the `veth` port,
which is not great. I'll explicitly configure it with the same MAC address as the `veth` interface
itself, otherwise I'd have to put the interface into promiscuous mode.

#### VPP Containerlab: VPPcfg

I'm almost ready, but I have one more detail. The user will be able to offer a
[[vppcfg](https://git.ipng.ch/ipng/vppcfg)] YAML file to configure the interfaces and so on. If such
a file exists, I'll apply it to the dataplane upon startup:

```
VPPCFG_VPP_FILE=${VPPCFG_VPP_FILE:=/etc/vpp/vppcfg.vpp}

echo "Generating $VPPCFG_VPP_FILE"
: > $VPPCFG_VPP_FILE
if [ -r /etc/vpp/vppcfg.yaml ]; then
  vppcfg plan --novpp -c /etc/vpp/vppcfg.yaml -o $VPPCFG_VPP_FILE
fi
```

Once the VPP process starts, it'll execute `/etc/vpp/bootstrap.vpp`, which in turn executes the
newly generated `/etc/vpp/clab.vpp` to grab the `veth` interfaces, and then `/etc/vpp/vppcfg.vpp` to
further configure the dataplane. Easy peasy!
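I don't show `/etc/vpp/bootstrap.vpp` itself in this article, but based on the description above, a
minimal version would simply chain the two generated snippets with VPP's `exec` command (a sketch,
assuming exactly this layout):

```
comment { bootstrap.vpp: source the two generated snippets, in order }
exec /etc/vpp/clab.vpp
exec /etc/vpp/vppcfg.vpp
```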
### Adding VPP to Containerlab

Roman points out a previous integration for the 6WIND VSR in
[[PR#2540](https://github.com/srl-labs/containerlab/pull/2540)]. This serves as a useful guide to
get me started. I fork the repo, create a branch so that Roman can also add a few commits, and
together we start hacking in [[PR#2571](https://github.com/srl-labs/containerlab/pull/2571)].

First, I add the documentation skeleton in `docs/manual/kinds/fdio_vpp.md`, which links in from a
few other places, and will be where the end-user facing documentation will live. That's about half
the contributed LOC, right there!

Next, I'll create a Go module in `nodes/fdio_vpp/fdio_vpp.go` which doesn't do much other than
creating the `struct`, and its required `Register` and `Init` functions. The `Init` function ensures
the right capabilities are set in Docker, and the right devices are bound for the container.

I notice that Containerlab rewrites the Dockerfile `CMD` string and prepends an `if-wait.sh` script
to it. This is because when Containerlab starts the container, it'll still be busy adding these
`link` interfaces to it, and if a container starts too quickly, it may not see all the interfaces.
So, Containerlab informs the container using an environment variable called `CLAB_INTFS`, and this
script simply sleeps for a while until exactly that number of interfaces is present. Ok, cool beans.
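In shell terms, that waiting logic boils down to something like this sketch of mine (Containerlab's
real `if-wait.sh` may differ in details, such as whether `eth0` counts toward the total):

```
#!/bin/sh
# Wait until the expected number of interfaces (besides loopback) exists,
# then hand over to the original CMD.
while [ "$(ls /sys/class/net | grep -cv '^lo$')" -lt "${CLAB_INTFS:-0}" ]; do
  sleep 1
done
exec "$@"
```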
Roman helps me a bit with Go templating. You see, I think it'll be slick to have the CLI prompt for
the VPP containers reflect their hostname, because normally, VPP will assign `vpp# `. I add the
template in `nodes/fdio_vpp/vpp_startup_config.go.tpl` and it only has one variable expansion: `unix
{ cli-prompt {{ .ShortName }}# }`. But I totally think it's worth it, because when running many VPP
containers in the lab, it could otherwise get confusing.

Roman also shows me a trick in the function `PostDeploy()`, which will write the user's SSH pubkeys
to `/root/.ssh/authorized_keys`. This allows users to log in without having to use password
authentication.

Collectively, we decide to punt on the `SaveConfig` function until we're a bit further along. I have
an idea how this would work, basically along the lines of calling `vppcfg dump` and bind-mounting
that file into the lab directory somewhere. This way, upon restarting, the YAML file can be re-read
and the dataplane initialized. But it'll be for another day.

After the main module is finished, all I have to do is add it to `clab/register.go` and that's just
about it. In about 170 lines of code, 50 lines of Go template, and 170 lines of Markdown, this
contribution is about ready to ship!

### Containerlab: Demo

After I finish writing the documentation, I decide to include a demo with a quickstart to help folks
along. A simple lab showing two VPP instances and two Alpine Linux clients can be found on
[[git.ipng.ch/ipng/vpp-containerlab](https://git.ipng.ch/ipng/vpp-containerlab)]. Simply check out the
repo and start the lab, like so:

```
$ git clone https://git.ipng.ch/ipng/vpp-containerlab.git
$ cd vpp-containerlab
$ containerlab deploy --topo vpp.clab.yml
```

#### Containerlab: configs

The file `vpp.clab.yml` contains an example topology consisting of two VPP instances, each connected
to one Alpine Linux container, in the following topology:

{{< image src="/assets/containerlab/learn-vpp.png" alt="Containerlab Topo" width="100%" >}}
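The topology file itself isn't reproduced in this article, but a sketch of its shape would look
something like this (the image names and interface numbering here are illustrative; the real file
lives in the repository):

```
name: vpp
topology:
  nodes:
    vpp1:
      kind: fdio_vpp
      image: git.ipng.ch/ipng/vpp-containerlab:latest
    vpp2:
      kind: fdio_vpp
      image: git.ipng.ch/ipng/vpp-containerlab:latest
    client1:
      kind: linux
      image: alpine:latest
    client2:
      kind: linux
      image: alpine:latest
  links:
    - endpoints: ["client1:eth1", "vpp1:eth1"]
    - endpoints: ["vpp1:eth2", "vpp2:eth2"]
    - endpoints: ["vpp2:eth1", "client2:eth1"]
```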
Two relevant files for each VPP router are included in this
[[repository](https://git.ipng.ch/ipng/vpp-containerlab)]:
1. `config/vpp*/vppcfg.yaml` configures the dataplane interfaces, including a loopback address.
1. `config/vpp*/bird-local.conf` configures the controlplane to enable BFD and OSPF.

To illustrate these files, let me take a closer look at node `vpp1`. Its VPP dataplane
configuration looks like this:
```
pim@summer:~/src/vpp-containerlab$ cat config/vpp1/vppcfg.yaml
interfaces:
  eth1:
    description: 'To client1'
    mtu: 1500
    lcp: eth1
    addresses: [ 10.82.98.65/28, 2001:db8:8298:101::1/64 ]
  eth2:
    description: 'To vpp2'
    mtu: 9216
    lcp: eth2
    addresses: [ 10.82.98.16/31, 2001:db8:8298:1::1/64 ]
loopbacks:
  loop0:
    description: 'vpp1'
    lcp: loop0
    addresses: [ 10.82.98.0/32, 2001:db8:8298::/128 ]
```

Then, I enable BFD, OSPF and OSPFv3 on `eth2` and `loop0` on both of the VPP routers:
```
pim@summer:~/src/vpp-containerlab$ cat config/vpp1/bird-local.conf
protocol bfd bfd1 {
  interface "eth2" { interval 100 ms; multiplier 30; };
}

protocol ospf v2 ospf4 {
  ipv4 { import all; export all; };
  area 0 {
    interface "loop0" { stub yes; };
    interface "eth2" { type pointopoint; cost 10; bfd on; };
  };
}

protocol ospf v3 ospf6 {
  ipv6 { import all; export all; };
  area 0 {
    interface "loop0" { stub yes; };
    interface "eth2" { type pointopoint; cost 10; bfd on; };
  };
}
```

#### Containerlab: playtime!

Once the lab comes up, I can SSH to the VPP containers (`vpp1` and `vpp2`) which have my SSH pubkeys
installed thanks to Roman's work. Barring that, I could still log in as user `root` using
password `vpp`. VPP runs its own network namespace called `dataplane`, which is very similar to SR
Linux's default `network-instance`. I can join that namespace to take a closer look:

```
pim@summer:~/src/vpp-containerlab$ ssh root@vpp1
root@vpp1:~# nsenter --net=/var/run/netns/dataplane
root@vpp1:~# ip -br a
lo        DOWN
loop0     UP       10.82.98.0/32 2001:db8:8298::/128 fe80::dcad:ff:fe00:0/64
eth1      UNKNOWN  10.82.98.65/28 2001:db8:8298:101::1/64 fe80::a8c1:abff:fe77:acb9/64
eth2      UNKNOWN  10.82.98.16/31 2001:db8:8298:1::1/64 fe80::a8c1:abff:fef0:7125/64

root@vpp1:~# ping 10.82.98.1
PING 10.82.98.1 (10.82.98.1) 56(84) bytes of data.
64 bytes from 10.82.98.1: icmp_seq=1 ttl=64 time=9.53 ms
64 bytes from 10.82.98.1: icmp_seq=2 ttl=64 time=15.9 ms
^C
--- 10.82.98.1 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1002ms
rtt min/avg/max/mdev = 9.530/12.735/15.941/3.205 ms
```

From `vpp1`, I can tell that Bird2's OSPF adjacency has formed, because I can ping the `loop0`
address of the `vpp2` router on 10.82.98.1. Nice! The two client nodes are running a minimalistic
Alpine Linux container, which doesn't ship with SSH by default. But of course I can still enter the
containers using `docker exec`, like so:

```
pim@summer:~/src/vpp-containerlab$ docker exec -it client1 sh
/ # ip addr show dev eth1
531235: eth1@if531234: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 9500 qdisc noqueue state UP
    link/ether 00:c1:ab:00:00:01 brd ff:ff:ff:ff:ff:ff
    inet 10.82.98.66/28 scope global eth1
       valid_lft forever preferred_lft forever
    inet6 2001:db8:8298:101::2/64 scope global
       valid_lft forever preferred_lft forever
    inet6 fe80::2c1:abff:fe00:1/64 scope link
       valid_lft forever preferred_lft forever
/ # traceroute 10.82.98.82
traceroute to 10.82.98.82 (10.82.98.82), 30 hops max, 46 byte packets
 1  10.82.98.65 (10.82.98.65)  5.906 ms  7.086 ms  7.868 ms
 2  10.82.98.17 (10.82.98.17)  24.007 ms  23.349 ms  15.933 ms
 3  10.82.98.82 (10.82.98.82)  39.978 ms  31.127 ms  31.854 ms

/ # traceroute 2001:db8:8298:102::2
traceroute to 2001:db8:8298:102::2 (2001:db8:8298:102::2), 30 hops max, 72 byte packets
 1  2001:db8:8298:101::1 (2001:db8:8298:101::1)  0.701 ms  7.144 ms  7.900 ms
 2  2001:db8:8298:1::2 (2001:db8:8298:1::2)  23.909 ms  22.943 ms  23.893 ms
 3  2001:db8:8298:102::2 (2001:db8:8298:102::2)  31.964 ms  30.814 ms  32.000 ms
```

From the vantage point of `client1`, the first hop represents the `vpp1` node, which forwards to
`vpp2`, which finally forwards to `client2`, which shows that both VPP routers are passing traffic.
Dope!

## Results

After all of this deep-diving, all that's left is for me to demonstrate the Containerlab by means of
this little screencast [[asciinema](/assets/containerlab/vpp-containerlab.cast)]. I hope you enjoy
it as much as I enjoyed creating it:

{{< asciinema src="/assets/containerlab/vpp-containerlab.cast" >}}

## Acknowledgements

I wanted to give a shout-out to Roman Dodin for his help getting the Containerlab parts squared away
when I got a little bit stuck. He took the time to explain the internals and idiom of the
Containerlab project, which really saved me a tonne of time. He also pair-programmed
[[PR#2571](https://github.com/srl-labs/containerlab/pull/2571)] with me over the span of two
evenings.

Collaborative open source rocks!
content/articles/2025-05-28-minio-1.md (new file, 713 lines)
@@ -0,0 +1,713 @@
---
date: "2025-05-28T22:07:23Z"
title: 'Case Study: Minio S3 - Part 1'
---

{{< image float="right" src="/assets/minio/minio-logo.png" alt="MinIO Logo" width="6em" >}}

# Introduction

Amazon Simple Storage Service (Amazon S3) is an object storage service offering industry-leading
scalability, data availability, security, and performance. Millions of customers of all sizes and
industries store, manage, analyze, and protect any amount of data for virtually any use case, such
as data lakes, cloud-native applications, and mobile apps. With cost-effective storage classes and
easy-to-use management features, you can optimize costs, organize and analyze data, and configure
fine-tuned access controls to meet specific business and compliance requirements.

Amazon's S3 became the _de facto_ standard object storage system, and there exist several fully open
source implementations of the protocol. One of them is MinIO: designed to allow enterprises to
consolidate all of their data on a single, private cloud namespace. Architected using the same
principles as the hyperscalers, AIStor delivers performance at scale at a fraction of the cost
compared to the public cloud.

IPng Networks is an Internet Service Provider, but I also dabble in self-hosting things, for
example [[PeerTube](https://video.ipng.ch/)], [[Mastodon](https://ublog.tech/)],
[[Immich](https://photos.ipng.ch/)], [[Pixelfed](https://pix.ublog.tech/)] and of course
[[Hugo](https://ipng.ch/)]. These services all have one thing in common: they tend to use lots of
storage when they grow. At IPng Networks, all hypervisors ship with enterprise SAS flash drives,
mostly 1.92TB and 3.84TB. Scaling up each of these services, and backing them up safely, can be
quite the headache.

This article is for the storage buffs. I'll set up a set of distributed MinIO nodes from scratch.

## Physical

{{< image float="right" src="/assets/minio/disks.png" alt="MinIO Disks" width="16em" >}}

I'll start with the basics. I still have a few Dell R720 servers lying around; they are getting a
bit older but still have 24 cores and 64GB of memory. First I need to get me some disks. I order
36 pcs of 16TB SATA enterprise disks, a mixture of Seagate EXOS and Toshiba MG series. I once
learned (the hard way) that buying a big stack of disks from one production run is a risk - so I'll
mix and match the drives.

Three trays of caddies and a melted credit card later, I have 576TB of SATA disks safely in hand.
Each machine will carry 192TB of raw storage. The nice thing about this chassis is that Dell can
ship them with 12x 3.5" SAS slots in the front, and 2x 2.5" SAS slots in the rear of the chassis.

So I'll install Debian Bookworm on one small 480G SSD in software RAID1.

### Cloning an install

I have three identical machines so in total I'll want six of these SSDs. I temporarily screw the
other five in 3.5" drive caddies and plug them into the first installed Dell, which I've called
`minio-proto`:

```
pim@minio-proto:~$ for i in b c d e f; do
  sudo dd if=/dev/sda of=/dev/sd${i} bs=512 count=1;
  sudo mdadm --manage /dev/md0 --add /dev/sd${i}1
done
pim@minio-proto:~$ sudo mdadm --manage /dev/md0 --grow -n 6
pim@minio-proto:~$ watch cat /proc/mdstat
pim@minio-proto:~$ for i in a b c d e f; do
  sudo grub-install /dev/sd$i
done
```

{{< image float="right" src="/assets/minio/rack.png" alt="MinIO Rack" width="16em" >}}

The first command takes my installed disk, `/dev/sda`, and copies the first sector over to the other
five. This will give them the same partition table. Next, I'll add the first partition of each disk
to the raidset. Then, I'll expand the raidset to have six members, after which the kernel starts a
recovery process that syncs the newly added partitions to `/dev/md0` (by copying from `/dev/sda` to
all other disks at once). Finally, I'll watch this exciting movie and grab a cup of tea.

Once the disks are fully copied, I'll shut down the machine and distribute the disks to their
respective Dell R720, two each. Once they boot they will all be identical. I'll need to make sure
their hostnames, and machine/host-id are unique, otherwise things like bridges will have overlapping
MAC addresses - ask me how I know:

```
pim@minio-proto:~$ sudo mdadm --manage /dev/md0 --grow -n 2
pim@minio-proto:~$ sudo rm /etc/ssh/ssh_host*
pim@minio-proto:~$ sudo hostname minio0-chbtl0
pim@minio-proto:~$ sudo dpkg-reconfigure openssh-server
pim@minio-proto:~$ sudo dd if=/dev/random of=/etc/hostid bs=4 count=1
pim@minio-proto:~$ sudo /usr/bin/dbus-uuidgen > /etc/machine-id
pim@minio-proto:~$ sudo reboot
```

After which I have three beautiful and unique machines:
* `minio0.chbtl0.net.ipng.ch`: which will go into my server rack at the IPng office.
* `minio0.ddln0.net.ipng.ch`: which will go to [[Daedalean]({{< ref
  2022-02-24-colo >}})], doing AI since before it was all about vibe coding.
* `minio0.chrma0.net.ipng.ch`: which will go to [[IP-Max](https://ip-max.net/)], one of the best
  ISPs on the planet. 🥰

## Deploying Minio

The user guide that MinIO provides
[[ref](https://min.io/docs/minio/linux/operations/installation.html)] is super good, arguably one of
the best documented open source projects I've ever seen. It shows me that I can do three types of
install. A 'Standalone' with one disk, a 'Standalone Multi-Drive', and a 'Distributed' deployment.
I decide to make three independent standalone multi-drive installs. This way, I have less shared
fate, and will be immune to network partitions (as these are going to be in three different
physical locations). I've also read about per-bucket _replication_, which will be an excellent way
to get geographical distribution and active/active instances to work together.

I feel good about the single-machine multi-drive decision. I follow the install guide
[[ref](https://min.io/docs/minio/linux/operations/install-deploy-manage/deploy-minio-single-node-multi-drive.html#minio-snmd)]
for this deployment type.

### IPng Frontends

At IPng I use a private IPv4/IPv6/MPLS network that is not connected to the internet. I call this
network [[IPng Site Local]({{< ref 2023-03-11-mpls-core.md >}})]. But how will users reach my Minio
install? I have four redundantly and geographically deployed frontends, two in the Netherlands and
two in Switzerland. I've described the frontend setup in a [[previous article]({{< ref
2023-03-17-ipng-frontends >}})] and the certificate management in [[this article]({{< ref
2023-03-24-lego-dns01 >}})].

I've decided to run the service on these three regionalized endpoints:
1. `s3.chbtl0.ipng.ch` which will back into `minio0.chbtl0.net.ipng.ch`
1. `s3.ddln0.ipng.ch` which will back into `minio0.ddln0.net.ipng.ch`
1. `s3.chrma0.ipng.ch` which will back into `minio0.chrma0.net.ipng.ch`

The first thing I take note of is that S3 buckets can be either addressed _by path_, in other words
something like `s3.chbtl0.ipng.ch/my-bucket/README.md`, but they can also be addressed by virtual
host, like so: `my-bucket.s3.chbtl0.ipng.ch/README.md`. A subtle difference, but from the docs I
understand that Minio needs to have control of the whole space under its main domain.

There's a small implication to this requirement -- the Web Console that ships with MinIO (eh, well,
maybe that's going to change, more on that later), will want to have its own domain-name, so I
choose something simple: `cons0-s3.chbtl0.ipng.ch` and so on. This way, somebody might still be able
to have a bucket name called `cons0` :)

#### Let's Encrypt Certificates

Alright, so I will be kneading nine domains into this new certificate, which I'll simply call
`s3.ipng.ch`. I configure it in Ansible:

```
certbot:
  certs:
    ...
    s3.ipng.ch:
      groups: [ 'nginx', 'minio' ]
      altnames:
        - 's3.chbtl0.ipng.ch'
        - 'cons0-s3.chbtl0.ipng.ch'
        - '*.s3.chbtl0.ipng.ch'
        - 's3.ddln0.ipng.ch'
        - 'cons0-s3.ddln0.ipng.ch'
        - '*.s3.ddln0.ipng.ch'
        - 's3.chrma0.ipng.ch'
        - 'cons0-s3.chrma0.ipng.ch'
        - '*.s3.chrma0.ipng.ch'
```

I run the `certbot` playbook and it does two things:
1. On the machines from group `nginx` and `minio`, it will ensure there exists a user `lego` with
   an SSH key and write permissions to `/etc/lego/`; this is where the automation will write (and
   update) the certificate keys.
1. On the `lego` machine, it'll create two files. One is the certificate requestor, and the other
   is a certificate distribution script that will copy the cert to the right machine(s) when it
   renews.

On the `lego` machine, I'll run the cert request for the first time:

```
lego@lego:~$ bin/certbot:s3.ipng.ch
lego@lego:~$ RENEWED_LINEAGE=/home/lego/acme-dns/live/s3.ipng.ch bin/certbot-distribute
```

The first script asks me to add the `_acme-challenge` DNS entries, which I'll do, for example on the
`s3.chbtl0.ipng.ch` instance (and similar for the `ddln0` and `chrma0` ones):

```
$ORIGIN chbtl0.ipng.ch.
_acme-challenge.s3        CNAME 51f16fd0-8eb6-455c-b5cd-96fad12ef8fd.auth.ipng.ch.
_acme-challenge.cons0-s3  CNAME 450477b8-74c9-4b9e-bbeb-de49c3f95379.auth.ipng.ch.
s3                        CNAME nginx0.ipng.ch.
*.s3                      CNAME nginx0.ipng.ch.
cons0-s3                  CNAME nginx0.ipng.ch.
```

I push and reload the `ipng.ch` zonefile with these changes, after which the certificate gets
requested and a cronjob added to check for renewals. The second script will copy the newly created
cert to all three `minio` machines, and all four `nginx` machines. From now on, every 90 days, a new
cert will be automatically generated and distributed. Slick!
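For the curious, the distribution half is conceptually just a loop over the machines in the cert's
Ansible groups. A simplified sketch of mine (the real `certbot-distribute` is generated by the
playbook, and also covers the four NGINX frontends):

```
#!/bin/sh
# certbot-distribute (sketch): copy the renewed certificate to each machine
# that uses it. RENEWED_LINEAGE is set by certbot upon renewal.
CERT=$(basename "$RENEWED_LINEAGE")    # eg. s3.ipng.ch
for host in minio0.chbtl0.net.ipng.ch minio0.ddln0.net.ipng.ch \
            minio0.chrma0.net.ipng.ch; do
  scp -rp "$RENEWED_LINEAGE/" "lego@${host}:/etc/lego/${CERT}/"
done
```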
#### NGINX Configs

With the LE wildcard certs in hand, I can create an NGINX frontend for these minio deployments.

First, a simple redirector service that punts people on port 80 to port 443:

```
server {
    listen [::]:80;
    listen 0.0.0.0:80;

    server_name cons0-s3.chbtl0.ipng.ch s3.chbtl0.ipng.ch *.s3.chbtl0.ipng.ch;
    access_log /var/log/nginx/s3.chbtl0.ipng.ch-access.log;
    include /etc/nginx/conf.d/ipng-headers.inc;

    location / {
        return 301 https://$server_name$request_uri;
    }
}
```

Next, the Minio API service itself which runs on port 9000, with a configuration snippet inspired by
the MinIO [[docs](https://min.io/docs/minio/linux/integrations/setup-nginx-proxy-with-minio.html)]:

```
server {
    listen [::]:443 ssl http2;
    listen 0.0.0.0:443 ssl http2;
    ssl_certificate /etc/certs/s3.ipng.ch/fullchain.pem;
    ssl_certificate_key /etc/certs/s3.ipng.ch/privkey.pem;
    include /etc/nginx/conf.d/options-ssl-nginx.inc;
    ssl_dhparam /etc/nginx/conf.d/ssl-dhparams.inc;

    server_name s3.chbtl0.ipng.ch *.s3.chbtl0.ipng.ch;
    access_log /var/log/nginx/s3.chbtl0.ipng.ch-access.log upstream;
    include /etc/nginx/conf.d/ipng-headers.inc;

    add_header Strict-Transport-Security "max-age=31536000; includeSubDomains" always;

    ignore_invalid_headers off;
    client_max_body_size 0;
    # Disable buffering
    proxy_buffering off;
    proxy_request_buffering off;

    location / {
        proxy_set_header Host $http_host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        proxy_connect_timeout 300;
        proxy_http_version 1.1;
        proxy_set_header Connection "";
        chunked_transfer_encoding off;

        proxy_pass http://minio0.chbtl0.net.ipng.ch:9000;
    }
}
```

Finally, the Minio Console service which runs on port 9090:

```
include /etc/nginx/conf.d/geo-ipng-trusted.inc;

server {
    listen [::]:443 ssl http2;
    listen 0.0.0.0:443 ssl http2;
    ssl_certificate /etc/certs/s3.ipng.ch/fullchain.pem;
    ssl_certificate_key /etc/certs/s3.ipng.ch/privkey.pem;
    include /etc/nginx/conf.d/options-ssl-nginx.inc;
    ssl_dhparam /etc/nginx/conf.d/ssl-dhparams.inc;

    server_name cons0-s3.chbtl0.ipng.ch;
    access_log /var/log/nginx/cons0-s3.chbtl0.ipng.ch-access.log upstream;
    include /etc/nginx/conf.d/ipng-headers.inc;

    add_header Strict-Transport-Security "max-age=31536000; includeSubDomains" always;

    ignore_invalid_headers off;
    client_max_body_size 0;
    # Disable buffering
    proxy_buffering off;
    proxy_request_buffering off;

    location / {
        if ($geo_ipng_trusted = 0) { rewrite ^ https://ipng.ch/ break; }
        proxy_set_header Host $http_host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_set_header X-NginX-Proxy true;

        real_ip_header X-Real-IP;
        proxy_connect_timeout 300;
        chunked_transfer_encoding off;

        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";

        proxy_pass http://minio0.chbtl0.net.ipng.ch:9090;
    }
}
```

This last one has an NGINX trick. It will only allow users in if they are in the map called
`geo_ipng_trusted`, which contains a set of IPv4 and IPv6 prefixes. Visitors who are not in this map
will receive an HTTP redirect back to the [[IPng.ch](https://ipng.ch/)] homepage instead.

I run the Ansible Playbook which contains the NGINX changes to all frontends, but of course nothing
runs yet, because I haven't yet started the MinIO backends.

### MinIO Backends

The first thing I need to do is get those disks mounted. MinIO likes using XFS, so I'll install that
and prepare the disks as follows:

```
pim@minio0-chbtl0:~$ sudo apt install xfsprogs
pim@minio0-chbtl0:~$ sudo modprobe xfs
pim@minio0-chbtl0:~$ echo xfs | sudo tee -a /etc/modules
pim@minio0-chbtl0:~$ sudo update-initramfs -k all -u
pim@minio0-chbtl0:~$ for i in a b c d e f g h i j k l; do sudo mkfs.xfs /dev/sd$i; done
pim@minio0-chbtl0:~$ blkid | awk 'BEGIN {i=1} /TYPE="xfs"/ {
    printf "%s /minio/disk%d xfs defaults 0 2\n",$2,i; i++;
  }' | sudo tee -a /etc/fstab
pim@minio0-chbtl0:~$ for i in `seq 1 12`; do sudo mkdir -p /minio/disk$i; done
pim@minio0-chbtl0:~$ sudo mount -t xfs -a
pim@minio0-chbtl0:~$ sudo chown -R minio-user: /minio/
```

From the top: I'll install `xfsprogs` which contains the things I need to manipulate XFS filesystems
in Debian. Then I'll install the `xfs` kernel module, and make sure it gets inserted upon subsequent
startup by adding it to `/etc/modules` and regenerating the initrd for the installed kernels.

Next, I'll format all twelve 16TB disks (which are `/dev/sda` - `/dev/sdl` on these machines), and
add their resulting blockdevice id's to `/etc/fstab` so they get persistently mounted on reboot.

Finally, I'll create their mountpoints, mount all XFS filesystems, and chown them to the user that
MinIO is running as. End result:

```
pim@minio0-chbtl0:~$ df -T
Filesystem     Type       1K-blocks      Used    Available Use% Mounted on
udev           devtmpfs    32950856         0     32950856   0% /dev
tmpfs          tmpfs        6595340      1508      6593832   1% /run
/dev/md0       ext4       114695308   5423976    103398948   5% /
tmpfs          tmpfs       32976680         0     32976680   0% /dev/shm
tmpfs          tmpfs           5120         4         5116   1% /run/lock
/dev/sda       xfs      15623792640 121505936  15502286704   1% /minio/disk1
/dev/sde       xfs      15623792640 121505968  15502286672   1% /minio/disk12
/dev/sdi       xfs      15623792640 121505968  15502286672   1% /minio/disk11
/dev/sdl       xfs      15623792640 121505904  15502286736   1% /minio/disk10
/dev/sdd       xfs      15623792640 121505936  15502286704   1% /minio/disk4
/dev/sdb       xfs      15623792640 121505968  15502286672   1% /minio/disk3
/dev/sdk       xfs      15623792640 121505936  15502286704   1% /minio/disk5
/dev/sdc       xfs      15623792640 121505936  15502286704   1% /minio/disk9
/dev/sdf       xfs      15623792640 121506000  15502286640   1% /minio/disk2
/dev/sdj       xfs      15623792640 121505968  15502286672   1% /minio/disk7
/dev/sdg       xfs      15623792640 121506000  15502286640   1% /minio/disk8
/dev/sdh       xfs      15623792640 121505968  15502286672   1% /minio/disk6
tmpfs          tmpfs        6595336         0      6595336   0% /run/user/0
```

MinIO likes to be configured using environment variables - and this is likely because it's a popular
thing to run in a containerized environment like Kubernetes. The maintainers also ship it as a
Debian package, which will read its environment from `/etc/default/minio`, and I'll prepare that
file as follows:

```
pim@minio0-chbtl0:~$ cat << EOF | sudo tee /etc/default/minio
MINIO_DOMAIN="s3.chbtl0.ipng.ch,minio0.chbtl0.net.ipng.ch"
MINIO_ROOT_USER="XXX"
MINIO_ROOT_PASSWORD="YYY"
MINIO_VOLUMES="/minio/disk{1...12}"
MINIO_OPTS="--console-address :9001"
EOF
pim@minio0-chbtl0:~$ sudo systemctl enable --now minio
pim@minio0-chbtl0:~$ sudo journalctl -u minio
May 31 10:44:11 minio0-chbtl0 minio[690420]: MinIO Object Storage Server
May 31 10:44:11 minio0-chbtl0 minio[690420]: Copyright: 2015-2025 MinIO, Inc.
May 31 10:44:11 minio0-chbtl0 minio[690420]: License: GNU AGPLv3 - https://www.gnu.org/licenses/agpl-3.0.html
May 31 10:44:11 minio0-chbtl0 minio[690420]: Version: RELEASE.2025-05-24T17-08-30Z (go1.24.3 linux/amd64)
May 31 10:44:11 minio0-chbtl0 minio[690420]: API: http://198.19.4.11:9000 http://127.0.0.1:9000
May 31 10:44:11 minio0-chbtl0 minio[690420]: WebUI: https://cons0-s3.chbtl0.ipng.ch/
May 31 10:44:11 minio0-chbtl0 minio[690420]: Docs: https://docs.min.io

pim@minio0-chbtl0:~$ sudo ipmitool sensor | grep Watts
Pwr Consumption  | 154.000  | Watts
```

Incidentally - I am pretty pleased with this 192TB disk tank, sporting 24 cores, 64GB memory and
2x10G network, casually hanging out at 154 Watts of power all up. Slick!

{{< image float="right" src="/assets/minio/minio-ec.svg" alt="MinIO Erasure Coding" width="22em" >}}

MinIO implements _erasure coding_ as a core component in providing availability and resiliency
during drive or node-level failure events. MinIO partitions each object into data and parity shards
and distributes those shards across a single so-called _erasure set_. Under the hood, it uses a
[[Reed-Solomon](https://en.wikipedia.org/wiki/Reed%E2%80%93Solomon_error_correction)] erasure coding
implementation and partitions the object for distribution. From the MinIO website, I'll borrow a
diagram to show how it looks on a single node like mine, to the right.

Anyway, MinIO detects 12 disks and installs an erasure set with 8 data disks and 4 parity disks,
which it calls `EC:4` encoding, also known in the industry as `RS8.4`.
Just like that, the thing shoots to life. Awesome!
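A quick sanity check on the numbers: a 16TB drive is roughly 14.5 TiB, so an erasure set with 8
data shards yields about 8 x 14.5 TiB = 116 TiB of usable capacity - exactly what `mc admin info`
reports below. And since any 8 of the 12 shards suffice to reconstruct an object, up to four drives
can fail before data becomes unavailable.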
### MinIO Client

On Summer, I'll install the MinIO Client called `mc`. This is easy because the maintainers ship a
Linux binary which I can just download. On OpenBSD, they don't do that. Not a problem though, on
Squanchy, Pencilvester and Glootie, I will just `go install` the client. Using the `mc` commandline,
I can call any of the S3 APIs on my new MinIO instance:

```
pim@summer:~$ set +o history
pim@summer:~$ mc alias set chbtl0 https://s3.chbtl0.ipng.ch/ <rootuser> <rootpass>
pim@summer:~$ set -o history
pim@summer:~$ mc admin info chbtl0/
●  s3.chbtl0.ipng.ch
   Uptime: 22 hours
   Version: 2025-05-24T17:08:30Z
   Network: 1/1 OK
   Drives: 12/12 OK
   Pool: 1

┌──────┬───────────────────────┬─────────────────────┬──────────────┐
│ Pool │ Drives Usage          │ Erasure stripe size │ Erasure sets │
│ 1st  │ 0.8% (total: 116 TiB) │ 12                  │ 1            │
└──────┴───────────────────────┴─────────────────────┴──────────────┘

95 GiB Used, 5 Buckets, 5,859 Objects, 318 Versions, 1 Delete Marker
12 drives online, 0 drives offline, EC:4
```

Cool beans. I think I should get rid of this root account though. I've installed those credentials
into the `/etc/default/minio` environment file, but I don't want to keep them out in the open. So
I'll make an account for myself and assign me reasonable privileges, called `consoleAdmin` in the
default install:

```
pim@summer:~$ set +o history
pim@summer:~$ mc admin user add chbtl0/ <someuser> <somepass>
pim@summer:~$ mc admin policy info chbtl0 consoleAdmin
pim@summer:~$ mc admin policy attach chbtl0 consoleAdmin --user=<someuser>
pim@summer:~$ mc alias set chbtl0 https://s3.chbtl0.ipng.ch/ <someuser> <somepass>
pim@summer:~$ set -o history
```

OK, I feel less gross now that I'm not operating as root on the MinIO deployment. Using my new
user-powers, let me set some metadata on my new minio server:

```
pim@summer:~$ mc admin config set chbtl0/ site name=chbtl0 region=switzerland
Successfully applied new settings.
Please restart your server 'mc admin service restart chbtl0/'.
pim@summer:~$ mc admin service restart chbtl0/
Service status: ▰▰▱ [DONE]
Summary:
┌───────────────┬─────────────────────────────┐
│ Servers:      │ 1 online, 0 offline, 0 hung │
│ Restart Time: │ 61.322886ms                 │
└───────────────┴─────────────────────────────┘
pim@summer:~$ mc admin config get chbtl0/ site
site name=chbtl0 region=switzerland
```

By the way, what's really cool about these open standards is that the Amazon `aws` client works
with MinIO, and `mc` also works with AWS!
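For example, pointing the `aws` CLI at this deployment is just a matter of overriding the endpoint
(a sketch; the credentials are the `<someuser>` ones created above, and the bucket name is a
placeholder):

```
pim@summer:~$ aws configure     # enter <someuser> / <somepass>
pim@summer:~$ aws --endpoint-url https://s3.chbtl0.ipng.ch/ s3 mb s3://test-bucket
pim@summer:~$ aws --endpoint-url https://s3.chbtl0.ipng.ch/ s3 ls
```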
### MinIO Console

Although I'm pretty good with APIs and command line tools, there's some benefit also in using a
Graphical User Interface. MinIO ships with one, but there was a bit of a kerfuffle in the MinIO
community. Unfortunately, these are pretty common -- Redis (an open source key/value storage system)
changed their offering abruptly. Terraform (an open source infrastructure-as-code tool) changed
their licensing at some point. Ansible (an open source machine management tool) changed their
offering also. MinIO developers decided to strip their console of ~all features recently. The gnarly
bits are discussed on
[[reddit](https://www.reddit.com/r/selfhosted/comments/1kva3pw/avoid_minio_developers_introduce_trojan_horse/)],
but suffice to say: the same thing that happened in literally 100% of the other cases, also happened
here. Somebody decided to simply fork the code from before it was changed.

Enter OpenMaxIO. A cringeworthy name, but it gets the job done. Reading up on the
[[GitHub](https://github.com/OpenMaxIO/openmaxio-object-browser/issues/5)], reviving the fully
working console is pretty straightforward -- that is, once somebody spent a few days figuring it
out. Thank you `icesvz` for this excellent pointer. With this, I can create a systemd service for
the console and start it:

```
pim@minio0-chbtl0:~$ cat << EOF | sudo tee -a /etc/default/minio
## NOTE(pim): For openmaxio console service
CONSOLE_MINIO_SERVER="http://localhost:9000"
MINIO_BROWSER_REDIRECT_URL="https://cons0-s3.chbtl0.ipng.ch/"
EOF
pim@minio0-chbtl0:~$ cat << EOF | sudo tee /lib/systemd/system/minio-console.service
[Unit]
Description=OpenMaxIO Console Service
Wants=network-online.target
After=network-online.target
AssertFileIsExecutable=/usr/local/bin/minio-console

[Service]
Type=simple

WorkingDirectory=/usr/local

User=minio-user
Group=minio-user
ProtectProc=invisible

EnvironmentFile=-/etc/default/minio
ExecStart=/usr/local/bin/minio-console server
Restart=always
LimitNOFILE=1048576
MemoryAccounting=no
TasksMax=infinity
TimeoutSec=infinity
OOMScoreAdjust=-1000
SendSIGKILL=no

[Install]
WantedBy=multi-user.target
EOF
pim@minio0-chbtl0:~$ sudo systemctl enable --now minio-console
pim@minio0-chbtl0:~$ sudo systemctl restart minio
```

The first snippet is an update to the MinIO configuration that instructs it to redirect users who
are not trying to use the API to the console endpoint on `cons0-s3.chbtl0.ipng.ch`, and then the
console-server needs to know where to find the API, which from its vantage point is running on
`localhost:9000`. Hello, beautiful fully featured console:

{{< image src="/assets/minio/console-1.png" alt="MinIO Console" >}}

### MinIO Prometheus

MinIO ships with a Prometheus metrics endpoint, and I notice on its console that it has a nice
metrics tab, which is fully greyed out. This is most likely because, well, I don't have a Prometheus
install here yet. I decide to keep the storage nodes self-contained and start a Prometheus server on
the local machine. I can always plumb that to IPng's Grafana instance later.

For now, I'll install Prometheus as follows:

```
pim@minio0-chbtl0:~$ cat << EOF | sudo tee -a /etc/default/minio
## NOTE(pim): Metrics for minio-console
MINIO_PROMETHEUS_AUTH_TYPE="public"
CONSOLE_PROMETHEUS_URL="http://localhost:19090/"
CONSOLE_PROMETHEUS_JOB_ID="minio-job"
EOF

pim@minio0-chbtl0:~$ sudo apt install prometheus
pim@minio0-chbtl0:~$ cat << EOF | sudo tee /etc/default/prometheus
ARGS="--web.listen-address='[::]:19090' --storage.tsdb.retention.size=16GB"
EOF
pim@minio0-chbtl0:~$ cat << EOF | sudo tee /etc/prometheus/prometheus.yml
global:
  scrape_interval: 60s

scrape_configs:
  - job_name: minio-job
    metrics_path: /minio/v2/metrics/cluster
    static_configs:
      - targets: ['localhost:9000']
        labels:
          cluster: minio0-chbtl0

  - job_name: minio-job-node
    metrics_path: /minio/v2/metrics/node
    static_configs:
      - targets: ['localhost:9000']
        labels:
          cluster: minio0-chbtl0

  - job_name: minio-job-bucket
    metrics_path: /minio/v2/metrics/bucket
    static_configs:
      - targets: ['localhost:9000']
        labels:
          cluster: minio0-chbtl0

  - job_name: minio-job-resource
    metrics_path: /minio/v2/metrics/resource
    static_configs:
      - targets: ['localhost:9000']
        labels:
          cluster: minio0-chbtl0

  - job_name: node
    static_configs:
      - targets: ['localhost:9100']
        labels:
          cluster: minio0-chbtl0
EOF
pim@minio0-chbtl0:~$ sudo systemctl restart minio prometheus
```

In the first snippet, I'll tell MinIO where it should find its Prometheus instance. Since the MinIO
console service is running on port 9090, and this is also the default port for Prometheus, I will
run Prometheus on port 19090 instead. From reading the MinIO docs, I can see that normally MinIO
will want Prometheus to authenticate to it before it'll allow the endpoints to be scraped. I'll turn
that off by making these public. On the IPng Frontends, I can always remove access to `/minio/v2`
and simply use the IPng Site Local access for local Prometheus scrapers instead.
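That frontend change would be a one-line addition to the NGINX API server block shown earlier;
something along these lines (my assumption, not a config taken from the article):

```
    # Keep the metrics endpoints off the public internet; they are scraped
    # over IPng Site Local only.
    location /minio/v2/ {
        return 403;
    }
```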
After telling Prometheus its runtime arguments (in `/etc/default/prometheus`) and its scraping
|
||||||
|
endpoints (in `/etc/prometheus/prometheus.yml`), I can restart minio and prometheus. A few minutes
|
||||||
|
later, I can see the _Metrics_ tab in the console come to life.
|
||||||
|
|
||||||
|
But now that I have this prometheus running on the MinIO node, I can also add it to IPng's Grafana
|
||||||
|
configuration, by adding a new data source on `minio0.chbtl0.net.ipng.ch:19090` and pointing the
|
||||||
|
default Grafana [[Dashboard](https://grafana.com/grafana/dashboards/13502-minio-dashboard/)] at it:
|
||||||
|
|
||||||
|
{{< image src="/assets/minio/console-2.png" alt="Grafana Dashboard" >}}
|
||||||
|
|
||||||
|
A two-for-one: I will both be able to see metrics directly in the console, but also I will be able
|
||||||
|
to hook up these per-node prometheus instances into IPng's alertmanager also, and I've read some
|
||||||
|
[[docs](https://min.io/docs/minio/linux/operations/monitoring/collect-minio-metrics-using-prometheus.html)]
|
||||||
|
on the concepts. I'm really liking the experience so far!
|
||||||
|
|
||||||
|
### MinIO Nagios

Prometheus is fancy and all, but at IPng Networks, I've been doing monitoring for a while now. As a
dinosaur, I still have an active [[Nagios](https://www.nagios.org/)] install, which autogenerates
all of its configuration using the Ansible repository I have. So for the new Ansible group called
`minio`, I will autogenerate the following snippet:

```
define command {
  command_name ipng_check_minio
  command_line $USER1$/check_http -E -H $HOSTALIAS$ -I $ARG1$ -p $ARG2$ -u $ARG3$ -r '$ARG4$'
}

define service {
  hostgroup_name        ipng:minio:ipv6
  service_description   minio6:api
  check_command         ipng_check_minio!$_HOSTADDRESS6$!9000!/minio/health/cluster!
  use                   ipng-service-fast
  notification_interval 0 ; set > 0 if you want to be renotified
}

define service {
  hostgroup_name        ipng:minio:ipv6
  service_description   minio6:prom
  check_command         ipng_check_minio!$_HOSTADDRESS6$!19090!/classic/targets!minio-job
  use                   ipng-service-fast
  notification_interval 0 ; set > 0 if you want to be renotified
}

define service {
  hostgroup_name        ipng:minio:ipv6
  service_description   minio6:console
  check_command         ipng_check_minio!$_HOSTADDRESS6$!9090!/!MinIO Console
  use                   ipng-service-fast
  notification_interval 0 ; set > 0 if you want to be renotified
}
```

I've shown the snippet for IPv6, but I also have three services defined for legacy IP in the
hostgroup `ipng:minio:ipv4`. The check command here uses `-I` for the IPv4 or IPv6 address to
talk to, `-p` for the port to consult, `-u` for the URI to hit and an optional `-r` for a regular
expression to expect in the output. For the Nagios aficionados out there: my Ansible `groups`
correspond one to one with autogenerated Nagios `hostgroups`. This allows me to add arbitrary checks
by group-type, like above in the `ipng:minio` group for IPv4 and IPv6.
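
To sanity-check such a service definition by hand, the same plugin can be invoked directly; a
sketch, with an illustrative plugin path and hostname:

```
pim@nagios:~$ /usr/lib/nagios/plugins/check_http -E -H minio0-chbtl0 \
    -I minio0.chbtl0.net.ipng.ch -p 19090 -u /classic/targets -r 'minio-job'
```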

In the MinIO [[docs](https://min.io/docs/minio/linux/operations/monitoring/healthcheck-probe.html)]
I read up on the Healthcheck API. I choose to monitor the _Cluster Write Quorum_ on my MinIO
deployments. For Prometheus, I decide to hit the `targets` endpoint and expect the `minio-job` to be
among them. Finally, for the MinIO Console, I expect to see a login screen with the words `MinIO
Console` in the returned page. I guessed right, because Nagios is all green:
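
The cluster health probe is also easy to eyeball with `curl`; per the MinIO docs, it returns a 200
status code as long as the cluster has write quorum:

```
pim@summer:~$ curl -sI https://s3.chbtl0.ipng.ch/minio/health/cluster | head -1
```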

{{< image src="/assets/minio/nagios.png" alt="Nagios Dashboard" >}}

## My First Bucket

The IPng website is a statically generated Hugo site, and whenever I submit a change to my Git
repo, a CI/CD runner (called [[Drone](https://www.drone.io/)]) picks up the change. It re-builds
the static website, and copies it to four redundant NGINX servers.

But IPng's website has amassed quite a bit of extra files (like VM images and VPP packages that I
publish), which are copied separately using a simple push script I have in my home directory. This
keeps all those big media files from cluttering the Git repository. I decide to move this stuff
into S3:

```
pim@summer:~/src/ipng-web-assets$ echo 'Gruezi World.' > ipng.ch/media/README.md
pim@summer:~/src/ipng-web-assets$ mc mb chbtl0/ipng-web-assets
pim@summer:~/src/ipng-web-assets$ mc mirror . chbtl0/ipng-web-assets/
...ch/media/README.md: 6.50 GiB / 6.50 GiB ┃▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓┃ 236.38 MiB/s 28s
pim@summer:~/src/ipng-web-assets$ mc anonymous set download chbtl0/ipng-web-assets/
```

OK, two things that immediately jump out at me. This stuff is **fast**: Summer is connected with a
2.5GbE network card, and she's running hard, copying the 6.5GB of data that are in these web assets
essentially at line rate. It doesn't really surprise me, because Summer is running off of Gen4 NVMe,
while MinIO has 12 spinning disks which can each write about 160MB/s or so sustained
[[ref](https://www.seagate.com/www-content/datasheets/pdfs/exos-x16-DS2011-1-1904US-en_US.pdf)],
with 24 CPUs to tend to the NIC (2x10G) and disks (2x SSD, 12x LFF). Should be plenty!

The second is that MinIO allows for buckets to be publicly shared in three ways: 1) read-only by
setting `download`; 2) write-only by setting `upload`; and 3) read-write by setting `public`.
I set `download` here, which means I should be able to fetch an asset publicly now:

```
pim@summer:~$ curl https://s3.chbtl0.ipng.ch/ipng-web-assets/ipng.ch/media/README.md
Gruezi World.
pim@summer:~$ curl https://ipng-web-assets.s3.chbtl0.ipng.ch/ipng.ch/media/README.md
Gruezi World.
```

The first `curl` here shows the path-based access, while the second one shows an equivalent
virtual-host based access. Both retrieve the file I just pushed via the public Internet. Whoot!
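
As an aside: the `chbtl0/` prefix in these `mc` commands is a client-side alias. If you're
following along, it would have been created with something like this (key and secret elided):

```
pim@summer:~$ mc alias set chbtl0 https://s3.chbtl0.ipng.ch <key> <secret>
```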

# What's Next

I'm going to be moving [[Restic](https://restic.net/)] backups from IPng's ZFS storage pool to this
S3 service over the next few days. I'll also migrate PeerTube and possibly Mastodon from NVMe-based
storage to replicated S3 buckets as well. Finally, the IPng website media that I mentioned above
should make for a nice followup article. Stay tuned!

475
content/articles/2025-06-01-minio-2.md
Normal file
@@ -0,0 +1,475 @@
---
date: "2025-06-01T10:07:23Z"
title: 'Case Study: Minio S3 - Part 2'
---

{{< image float="right" src="/assets/minio/minio-logo.png" alt="MinIO Logo" width="6em" >}}

# Introduction

Amazon Simple Storage Service (Amazon S3) is an object storage service offering industry-leading
scalability, data availability, security, and performance. Millions of customers of all sizes and
industries store, manage, analyze, and protect any amount of data for virtually any use case, such
as data lakes, cloud-native applications, and mobile apps. With cost-effective storage classes and
easy-to-use management features, you can optimize costs, organize and analyze data, and configure
fine-tuned access controls to meet specific business and compliance requirements.

Amazon's S3 became the _de facto_ standard object storage system, and there exist several fully open
source implementations of the protocol. One of them is MinIO: designed to allow enterprises to
consolidate all of their data on a single, private cloud namespace. Architected using the same
principles as the hyperscalers, AIStor delivers performance at scale at a fraction of the cost
compared to the public cloud.

IPng Networks is an Internet Service Provider, but I also dabble in self-hosting things, for
example [[PeerTube](https://video.ipng.ch/)], [[Mastodon](https://ublog.tech/)],
[[Immich](https://photos.ipng.ch/)], [[Pixelfed](https://pix.ublog.tech/)] and of course
[[Hugo](https://ipng.ch/)]. These services all have one thing in common: they tend to use lots of
storage when they grow. At IPng Networks, all hypervisors ship with enterprise SAS flash drives,
mostly 1.92TB and 3.84TB. Scaling up each of these services, and backing them up safely, can be
quite the headache.

In a [[previous article]({{< ref 2025-05-28-minio-1 >}})], I talked through the install of a
redundant set of three Minio machines. In this article, I'll start putting them to good use.

## Use Case: Restic

{{< image float="right" src="/assets/minio/restic-logo.png" alt="Restic Logo" width="12em" >}}

[[Restic](https://restic.org/)] is a modern backup program that can back up your files from multiple
host OSes, to many different storage types, easily, effectively, securely, verifiably and freely.
With a sales pitch like that, what's not to love? Actually, I am a long-time
[[BorgBackup](https://www.borgbackup.org/)] user, and I think I'll keep that running. However, for
resilience, and because I've heard only good things about Restic, I'll make a second backup of the
routers, hypervisors, and virtual machines using Restic.

Restic can use S3 buckets out of the box (incidentally, so can BorgBackup). To configure it, I use
a mixture of environment variables and flags. But first, let me create a bucket for the backups.

```
pim@glootie:~$ mc mb chbtl0/ipng-restic
pim@glootie:~$ mc admin user add chbtl0/ <key> <secret>
pim@glootie:~$ cat << EOF | tee ipng-restic-access.json
{
  "PolicyName": "ipng-restic-access",
  "Policy": {
    "Version": "2012-10-17",
    "Statement": [
      {
        "Effect": "Allow",
        "Action": [ "s3:DeleteObject", "s3:GetObject", "s3:ListBucket", "s3:PutObject" ],
        "Resource": [ "arn:aws:s3:::ipng-restic", "arn:aws:s3:::ipng-restic/*" ]
      }
    ]
  }
}
EOF
pim@glootie:~$ mc admin policy create chbtl0/ ipng-restic-access.json
pim@glootie:~$ mc admin policy attach chbtl0/ ipng-restic-access --user <key>
```

First, I'll create a bucket called `ipng-restic`. Then, I'll create a _user_ with a given secret
_key_. To protect the innocent, and my backups, I'll not disclose them. Next, I'll create an
IAM policy that allows Get/List/Put/Delete to be performed on the bucket and its contents, and
finally I'll attach this policy to the user I just created.
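
To double-check the result, `mc` can show both sides of that binding; a quick sketch (output
omitted):

```
pim@glootie:~$ mc admin policy info chbtl0/ ipng-restic-access
pim@glootie:~$ mc admin user info chbtl0/ <key>
```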

To run a Restic backup, I'll first have to create a so-called _repository_. The repository has a
location and a password, which Restic uses to encrypt the data. Because I'm using S3, I'll also need
to specify the key and secret:

```
root@glootie:~# RESTIC_PASSWORD="changeme"
root@glootie:~# RESTIC_REPOSITORY="s3:https://s3.chbtl0.ipng.ch/ipng-restic/$(hostname)/"
root@glootie:~# AWS_ACCESS_KEY_ID="<key>"
root@glootie:~# AWS_SECRET_ACCESS_KEY="<secret>"
root@glootie:~# export RESTIC_PASSWORD RESTIC_REPOSITORY AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY
root@glootie:~# restic init
created restic repository 807cf25e85 at s3:https://s3.chbtl0.ipng.ch/ipng-restic/glootie.ipng.ch/
```

Restic prints out the fingerprint of the repository it just created. Taking a
look on the MinIO install:

```
pim@glootie:~$ mc stat chbtl0/ipng-restic/glootie.ipng.ch/
Name      : config
Date      : 2025-06-01 12:01:43 UTC
Size      : 155 B
ETag      : 661a43f72c43080649712e45da14da3a
Type      : file
Metadata  :
  Content-Type: application/octet-stream

Name      : keys/
Date      : 2025-06-01 12:03:33 UTC
Type      : folder
```

Cool. Now I'm ready to make my first full backup:

```
root@glootie:~# ARGS="--exclude /proc --exclude /sys --exclude /dev --exclude /run"
root@glootie:~# ARGS="$ARGS --exclude-if-present .nobackup"
root@glootie:~# restic backup $ARGS /
...
processed 1141426 files, 131.111 GiB in 15:12
snapshot 34476c74 saved
```

Once the backup completes, the Restic authors advise me to also do a check of the repository, and to
prune it so that it keeps a finite amount of daily, weekly and monthly backups. My further journey
for Restic looks a bit like this:

```
root@glootie:~# restic check
using temporary cache in /tmp/restic-check-cache-2712250731
create exclusive lock for repository
load indexes
check all packs
check snapshots, trees and blobs
[0:04] 100.00%  1 / 1 snapshots

no errors were found

root@glootie:~# restic forget --prune --keep-daily 8 --keep-weekly 5 --keep-monthly 6
repository 34476c74 opened (version 2, compression level auto)
Applying Policy: keep 8 daily, 5 weekly, 6 monthly snapshots
keep 1 snapshots:
ID        Time                 Host             Tags        Reasons           Paths
----------------------------------------------------------------------------------
34476c74  2025-06-01 12:18:54  glootie.ipng.ch              daily snapshot    /
                                                            weekly snapshot
                                                            monthly snapshot
----------------------------------------------------------------------------------
1 snapshots
```

Right on! I proceed to update the Ansible configs at IPng to roll this out against the entire fleet
of 152 hosts at IPng Networks. I do this in a little tool called `bitcron`, which I wrote for a
previous company I worked at: [[BIT](https://bit.nl)] in the Netherlands. Bitcron allows me to
create relatively elegant cronjobs that can raise warnings, errors and fatal issues. If no issues
are found, an e-mail can be sent to a bitbucket address, but if warnings or errors are found, a
different _monitored_ address will be used. Bitcron is kind of cool, and I wrote it in 2001. Maybe
I'll write about it, for old time's sake. I wonder if the folks at BIT still use it?
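
The per-host job that Ansible templates out is essentially the three commands from above glued
together. A minimal sketch, leaving Bitcron's reporting aside; the environment file is a
hypothetical place to keep the variables exported earlier:

```
#!/usr/bin/env bash
# Hypothetical /etc/cron.daily/restic-backup, simplified for illustration.
set -euo pipefail
source /etc/restic/env   # assumed to export the RESTIC_* and AWS_* variables

restic backup --exclude /proc --exclude /sys --exclude /dev --exclude /run \
       --exclude-if-present .nobackup /
restic check
restic forget --prune --keep-daily 8 --keep-weekly 5 --keep-monthly 6
```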

## Use Case: NGINX

{{< image float="right" src="/assets/minio/nginx-logo.png" alt="NGINX Logo" width="11em" >}}

OK, with the first use case out of the way, I turn my attention to a second - in my opinion more
interesting - use case. In the [[previous article]({{< ref 2025-05-28-minio-1 >}})], I created a
public bucket called `ipng-web-assets` in which I stored 6.50GB of website data belonging to the
IPng website, and some material I posted when I was on my
[[Sabbatical](https://sabbatical.ipng.nl/)] last year.

### MinIO: Bucket Replication

First things first: redundancy. These web assets are currently pushed to all four NGINX machines,
and statically served. If I were to replace them with a single S3 bucket, I would create a single
point of failure, and that's _no bueno_!

Off I go, creating a replicated bucket using two MinIO instances (`chbtl0` and `ddln0`):

```
pim@glootie:~$ mc mb ddln0/ipng-web-assets
pim@glootie:~$ mc anonymous set download ddln0/ipng-web-assets
pim@glootie:~$ mc admin user add ddln0/ <replkey> <replsecret>
pim@glootie:~$ cat << EOF | tee ipng-web-assets-access.json
{
  "PolicyName": "ipng-web-assets-access",
  "Policy": {
    "Version": "2012-10-17",
    "Statement": [
      {
        "Effect": "Allow",
        "Action": [ "s3:DeleteObject", "s3:GetObject", "s3:ListBucket", "s3:PutObject" ],
        "Resource": [ "arn:aws:s3:::ipng-web-assets", "arn:aws:s3:::ipng-web-assets/*" ]
      }
    ]
  }
}
EOF
pim@glootie:~$ mc admin policy create ddln0/ ipng-web-assets-access.json
pim@glootie:~$ mc admin policy attach ddln0/ ipng-web-assets-access --user <replkey>
pim@glootie:~$ mc replicate add chbtl0/ipng-web-assets \
    --remote-bucket https://<key>:<secret>@s3.ddln0.ipng.ch/ipng-web-assets
```

What happens next is pure magic. I've told `chbtl0` that I want it to replicate all existing and
future changes to that bucket to its neighbor `ddln0`. Only minutes later, I check the replication
status, just to see that it's _already done_:

```
pim@glootie:~$ mc replicate status chbtl0/ipng-web-assets
Replication status since 1 hour
  s3.ddln0.ipng.ch
  Replicated:    142 objects (6.5 GiB)
  Queued:        ● 0 objects, 0 B (avg: 4 objects, 915 MiB; max: 0 objects, 0 B)
  Workers:       0 (avg: 0; max: 0)
  Transfer Rate: 15 kB/s (avg: 88 MB/s; max: 719 MB/s)
  Latency:       3ms (avg: 3ms; max: 7ms)
  Link:          ● online (total downtime: 0 milliseconds)
  Errors:        0 in last 1 minute; 0 in last 1hr; 0 since uptime
  Configured Max Bandwidth (Bps): 644 GB/s   Current Bandwidth (Bps): 975 B/s
pim@summer:~/src/ipng-web-assets$ mc ls ddln0/ipng-web-assets/
[2025-06-01 12:42:22 CEST]     0B ipng.ch/
[2025-06-01 12:42:22 CEST]     0B sabbatical.ipng.nl/
```

MinIO has pumped the data from bucket `ipng-web-assets` to the other machine at an average of 88MB/s
with a peak throughput of 719MB/s (probably for the larger VM images). And indeed, looking at the
remote machine, it is fully caught up within only a minute or so of the push, with a completely
fresh copy. Nice!
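
The replication rule itself can be inspected later, too; a quick way to see what is configured on
the source bucket (output omitted):

```
pim@glootie:~$ mc replicate ls chbtl0/ipng-web-assets
```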

### MinIO: Missing directory index

I take a look at what I just built, on the following URL:

* [https://ipng-web-assets.s3.ddln0.ipng.ch/sabbatical.ipng.nl/media/vdo/IMG_0406_0.mp4](https://ipng-web-assets.s3.ddln0.ipng.ch/sabbatical.ipng.nl/media/vdo/IMG_0406_0.mp4)

That checks out, and I can see the mess that was my room when I first went on sabbatical. By the
way, I totally cleaned it up, see
[[here](https://sabbatical.ipng.nl/blog/2024/08/01/thursday-basement-done/)] for proof. I can't,
however, see the directory listing:

```
pim@glootie:~$ curl https://ipng-web-assets.s3.ddln0.ipng.ch/sabbatical.ipng.nl/media/vdo/
<?xml version="1.0" encoding="UTF-8"?>
<Error>
  <Code>NoSuchKey</Code>
  <Message>The specified key does not exist.</Message>
  <Key>sabbatical.ipng.nl/media/vdo/</Key>
  <BucketName>ipng-web-assets</BucketName>
  <Resource>/sabbatical.ipng.nl/media/vdo/</Resource>
  <RequestId>1844EC0CFEBF3C5F</RequestId>
  <HostId>dd9025bab4ad464b049177c95eb6ebf374d3b3fd1af9251148b658df7ac2e3e8</HostId>
</Error>
```

That's unfortunate, because some of the IPng articles link to a directory full of files, which I'd
like to be shown so that my readers can navigate through the directories. Surely I'm not the first
to encounter this? And sure enough, I'm not: I find a
[[repo](https://github.com/glowinthedark/index-html-generator)] by user `glowinthedark`, who wrote a
little Python script that generates `index.html` files for their Caddy file server. I'll take me
some of that Python, thank you!
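
For flavour, the heart of such a generator is just a recursive directory walk; a minimal sketch in
Python of what it does (the real script, linked above, handles many more corner cases):

```python
#!/usr/bin/env python3
"""Write a simple index.html into every directory of a tree, recursively."""
import html
import sys
from pathlib import Path

def write_index(directory: Path) -> None:
    entries = sorted(p for p in directory.iterdir() if p.name != "index.html")
    items = "\n".join(
        f'<li><a href="{html.escape(p.name)}{"/" if p.is_dir() else ""}">'
        f"{html.escape(p.name)}</a></li>"
        for p in entries
    )
    (directory / "index.html").write_text(
        f"<!DOCTYPE html><html><body><ul>\n{items}\n</ul></body></html>\n"
    )
    for p in entries:              # recurse into subdirectories
        if p.is_dir():
            write_index(p)

if __name__ == "__main__":
    write_index(Path(sys.argv[1]))
```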

With the following little script, my setup is complete:

```
pim@glootie:~/src/ipng-web-assets$ cat push.sh
#!/usr/bin/env bash

echo "Generating index.html files ..."
for D in */media; do
  echo "* Directory $D"
  ./genindex.py -r $D
done
echo "Done (genindex)"
echo ""

echo "Mirroring directory to S3 Bucket"
mc mirror --remove --overwrite . chbtl0/ipng-web-assets/
echo "Done (mc mirror)"
echo ""
pim@glootie:~/src/ipng-web-assets$ ./push.sh
```

Only a few seconds after I run `./push.sh`, the replication is complete and I have two identical
copies of my media:

1. [https://ipng-web-assets.s3.chbtl0.ipng.ch/ipng.ch/media/](https://ipng-web-assets.s3.chbtl0.ipng.ch/ipng.ch/media/index.html)
1. [https://ipng-web-assets.s3.ddln0.ipng.ch/ipng.ch/media/](https://ipng-web-assets.s3.ddln0.ipng.ch/ipng.ch/media/index.html)

### NGINX: Proxy to Minio

Before moving to S3 storage, my NGINX frontends all kept a copy of the IPng media on local NVMe
disk. That's great for reliability, as each NGINX instance is completely hermetic and standalone.
However, it's not great for scaling: the current NGINX instances only have 16GB of local storage,
and I'd rather not have my static web asset data outgrow that filesystem. From before, I already had
an NGINX config that served the Hugo static data from `/var/www/ipng.ch/` and the `/media`
subdirectory from a different directory in `/var/www/ipng-web-assets/ipng.ch/media`.

Moving to a redundant S3 storage backend is straightforward:

```
upstream minio_ipng {
  least_conn;
  server minio0.chbtl0.net.ipng.ch:9000;
  server minio0.ddln0.net.ipng.ch:9000;
}

server {
  ...
  location / {
    root /var/www/ipng.ch/;
  }

  location /media {
    proxy_set_header Host $http_host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;

    proxy_connect_timeout 300;
    proxy_http_version 1.1;
    proxy_set_header Connection "";
    chunked_transfer_encoding off;

    rewrite (.*)/$ $1/index.html;

    proxy_pass http://minio_ipng/ipng-web-assets/ipng.ch/media;
  }
}
```

I want to make note of a few things:
1. The `upstream` definition here uses IPng Site Local entrypoints, considering the NGINX servers
   all have direct MTU=9000 access to the MinIO instances. I'll put both in there, in a
   round-robin configuration favoring the replica with _least connections_.
1. Deeplinking to directory names without the trailing `/index.html` would serve a 404 from the
   backend, so I'll intercept these and rewrite directory URLs to always include the `/index.html`.
1. The upstream endpoint used is _path-based_, that is to say it has the bucket name and website
   name included. This whole location used to be simply `root /var/www/ipng-web-assets/ipng.ch/media/`,
   so the mental change is quite small.

### NGINX: Caching

After deploying the S3 upstream on all IPng websites, I can delete the old
`/var/www/ipng-web-assets/` directory and reclaim about 7GB of diskspace. This gives me an idea ...

{{< image width="8em" float="left" src="/assets/shared/brain.png" alt="brain" >}}

On the one hand it's great that I will pull these assets from MinIO and all, but at the same time,
it's a tad inefficient to retrieve them from, say, Zurich to Amsterdam just to serve them onto the
Internet again. If at any time something on the IPng website goes viral, it'd be nice to be able to
serve them directly from the edge, right?

A webcache. What could _possibly_ go wrong :)

NGINX is really, really good at caching content. It has a powerful engine to store, scan, revalidate
and match any content and upstream headers. It's also very well documented, so I take a look at the
proxy module's documentation [[here](https://nginx.org/en/docs/http/ngx_http_proxy_module.html)] and
in particular a useful [[blog](https://blog.nginx.org/blog/nginx-caching-guide)] on their website.

The first thing I need to do is create what is called a _key zone_, which is a region of memory in
which URL keys are stored with some metadata. Having a copy of the keys in memory enables NGINX to
quickly determine if a request is a HIT or a MISS without having to go to disk, greatly speeding up
the check.

In `/etc/nginx/conf.d/ipng-cache.conf` I add the following NGINX cache:

```
proxy_cache_path /var/www/nginx-cache levels=1:2 keys_zone=ipng_cache:10m max_size=8g
                 inactive=24h use_temp_path=off;
```

With this statement, I'll create a 2-level subdirectory, and allocate 10MB of space for keys, which
should hold on the order of 100K entries. The maximum size I'll allow the cache to grow to is 8GB,
and I'll mark any object inactive if it's not been referenced for 24 hours. I learn that inactive is
different to expired content. If a cache element has expired, but NGINX can't reach the upstream
for a new copy, it can be configured to serve an inactive (stale) copy from the cache. That's dope,
as it serves as an extra layer of defence in case the network or all available S3 replicas take the
day off. I'll also ask NGINX to avoid writing objects first to a tmp directory and then moving them
into the `/var/www/nginx-cache` directory. These are recommendations I grab from the manual.

Within the `location` block I configured above, I'm now ready to enable this cache. I'll do that by
adding a few include files, which I'll reference in all sites that I want to have make use of this
cache.

First, to enable the cache, I write the following snippet:
```
pim@nginx0-nlams1:~$ cat /etc/nginx/conf.d/ipng-cache.inc
proxy_cache ipng_cache;
proxy_ignore_headers Cache-Control;
proxy_cache_valid any 1h;
proxy_cache_revalidate on;
proxy_cache_use_stale error timeout updating http_500 http_502 http_503 http_504;
proxy_cache_background_update on;
```

Then, I find it useful to emit a few debugging HTTP headers, and at the same time I see that Minio
emits a bunch of HTTP headers that may not be safe for me to propagate, so I pen two more snippets:

```
pim@nginx0-nlams1:~$ cat /etc/nginx/conf.d/ipng-strip-minio-headers.inc
proxy_hide_header x-minio-deployment-id;
proxy_hide_header x-amz-request-id;
proxy_hide_header x-amz-id-2;
proxy_hide_header x-amz-replication-status;
proxy_hide_header x-amz-version-id;

pim@nginx0-nlams1:~$ cat /etc/nginx/conf.d/ipng-add-upstream-headers.inc
add_header X-IPng-Frontend $hostname always;
add_header X-IPng-Upstream $upstream_addr always;
add_header X-IPng-Upstream-Status $upstream_status always;
add_header X-IPng-Cache-Status $upstream_cache_status;
```

With that, I am ready to enable caching of the IPng `/media` location:

```
location /media {
  ...
  include /etc/nginx/conf.d/ipng-strip-minio-headers.inc;
  include /etc/nginx/conf.d/ipng-add-upstream-headers.inc;
  include /etc/nginx/conf.d/ipng-cache.inc;
  ...
}
```

## Results

I run the Ansible playbook for the NGINX cluster and take a look at the replica at Coloclue in
Amsterdam, called `nginx0.nlams1.ipng.ch`. Notably, it'll have to retrieve the file from a MinIO
replica in Zurich (12ms away), so it's expected to take a little while.

The first attempt:

```
pim@nginx0-nlams1:~$ curl -v -o /dev/null --connect-to ipng.ch:443:localhost:443 \
    https://ipng.ch/media/vpp-proto/vpp-proto-bookworm.qcow2.lrz
...
< last-modified: Sun, 01 Jun 2025 12:37:52 GMT
< x-ipng-frontend: nginx0-nlams1
< x-ipng-cache-status: MISS
< x-ipng-upstream: [2001:678:d78:503::b]:9000
< x-ipng-upstream-status: 200

100  711M  100  711M    0     0  26.2M      0  0:00:27  0:00:27 --:--:-- 26.6M
```

OK, that's respectable: I've read the file at 26MB/s. Of course, I just turned on the cache, so
NGINX fetches the file from Zurich while handing it over to my `curl` here. It notifies me by means
of an HTTP header that the cache was a `MISS`, and then which upstream server it contacted to
retrieve the object.

But look at what happens the _second_ time I run the same command:

```
pim@nginx0-nlams1:~$ curl -v -o /dev/null --connect-to ipng.ch:443:localhost:443 \
    https://ipng.ch/media/vpp-proto/vpp-proto-bookworm.qcow2.lrz
< last-modified: Sun, 01 Jun 2025 12:37:52 GMT
< x-ipng-frontend: nginx0-nlams1
< x-ipng-cache-status: HIT

100  711M  100  711M    0     0   436M      0  0:00:01  0:00:01 --:--:--  437M
```

Holy moly! First I see that the object has the same _Last-Modified_ header, but I now also see that
the _Cache-Status_ was a `HIT`, and there is no mention of any upstream server. I do however see the
file come in at a whopping 437MB/s, which is 16x faster than over the network!! Nice work, NGINX!

{{< image float="right" src="/assets/minio/rack-2.png" alt="Rack-o-Minio" width="12em" >}}

# What's Next

I'm going to deploy the third MinIO replica in Rümlang once the disks arrive. I'll release the
~4TB of disk currently used in Restic backups for the fleet, and put that ZFS capacity to other use.
Now, creating services like PeerTube, Mastodon, Pixelfed, Loops, NextCloud and what-have-you will
become much easier for me. And with the per-bucket replication between MinIO deployments, I also
think this is a great way to auto-backup important data. First off, it'll be RS8.4 on the MinIO node
itself, and secondly, user data will be copied automatically to a neighboring facility.

I've convinced myself that S3 storage is a great service to operate, and that MinIO is awesome.

375
content/articles/2025-07-12-vpp-evpn-1.md
Normal file
@@ -0,0 +1,375 @@
---
date: "2025-07-12T08:07:23Z"
title: 'VPP and eVPN/VxLAN - Part 1'
---

{{< image width="6em" float="right" src="/assets/vpp/fdio-color.svg" alt="VPP" >}}

# Introduction

You know what would be really cool? If VPP could be an eVPN/VxLAN speaker! Sometimes I feel like I'm
the very last person on the planet to learn about something cool. My latest "A-Ha!"-moment was when
I was configuring the eVPN fabric for [[Frys-IX](https://frys-ix.net/)], and I wrote up an article
about it [[here]({{< ref 2025-04-09-frysix-evpn >}})] back in April.

I can build the equivalent of Virtual Private Wires (VPWS), also called L2VPN or Virtual Leased
Lines, and these are straightforward because they typically only have two endpoints. A "regular"
VxLAN tunnel which is L2 cross connected with another interface already does that just fine. Take a
look at an article on [[L2 Gymnastics]({{< ref 2022-01-12-vpp-l2 >}})] for that. But the real kicker
is that I can also create multi-site L2 domains like Virtual Private LAN Services (VPLS), also
called Virtual Private Ethernet, L2VPN or Ethernet LAN Service (E-LAN). And *that* is a whole other
level of awesome.

## Recap: VPP today

### VPP: VxLAN

The current VPP VxLAN tunnel plugin does point-to-point tunnels, that is, they are configured with a
source address, destination address, destination port and VNI. As I mentioned, a point-to-point
ethernet transport is configured very easily:

```
vpp0# create vxlan tunnel src 192.0.2.1 dst 192.0.2.254 vni 8298 instance 0
vpp0# set int l2 xconnect vxlan_tunnel0 HundredGigabitEthernet10/0/0
vpp0# set int l2 xconnect HundredGigabitEthernet10/0/0 vxlan_tunnel0
vpp0# set int state vxlan_tunnel0 up
vpp0# set int state HundredGigabitEthernet10/0/0 up

vpp1# create vxlan tunnel src 192.0.2.254 dst 192.0.2.1 vni 8298 instance 0
vpp1# set int l2 xconnect vxlan_tunnel0 HundredGigabitEthernet10/0/1
vpp1# set int l2 xconnect HundredGigabitEthernet10/0/1 vxlan_tunnel0
vpp1# set int state vxlan_tunnel0 up
vpp1# set int state HundredGigabitEthernet10/0/1 up
```

And with that, `vpp0:Hu10/0/0` is cross connected with `vpp1:Hu10/0/1` and ethernet flows between
the two.

### VPP: Bridge Domains

Now consider a VPLS with five different routers. It's possible to create a bridge-domain and add
some local ports plus four VxLAN tunnels, one to each of the other routers:

```
vpp0# create bridge-domain 8298
vpp0# set int l2 bridge HundredGigabitEthernet10/0/1 8298
vpp0# create vxlan tunnel src 192.0.2.1 dst 192.0.2.2 vni 8298 instance 0
vpp0# create vxlan tunnel src 192.0.2.1 dst 192.0.2.3 vni 8298 instance 1
vpp0# create vxlan tunnel src 192.0.2.1 dst 192.0.2.4 vni 8298 instance 2
vpp0# create vxlan tunnel src 192.0.2.1 dst 192.0.2.5 vni 8298 instance 3
vpp0# set int l2 bridge vxlan_tunnel0 8298
vpp0# set int l2 bridge vxlan_tunnel1 8298
vpp0# set int l2 bridge vxlan_tunnel2 8298
vpp0# set int l2 bridge vxlan_tunnel3 8298
```

To make this work, I will have to replicate this configuration to all other `vpp1`-`vpp4` routers.
While it does work, it's really not very practical. When other VPP instances get added to a VPLS,
every other router will have to have a new VxLAN tunnel created and added to its local bridge
domain. Consider 1000s of VPLS instances on 100s of routers: it would yield ~100'000 VxLAN tunnels
on every router, yikes!

Such a configuration reminds me in a way of iBGP in a large network: the naive approach is to have a
full mesh of all routers speaking to all other routers, but that quickly becomes a maintenance
headache. The canonical solution for this is to create iBGP _Route Reflectors_ to which every router
connects, and their job is to redistribute routing information between the fleet of routers. This
turns the iBGP problem from an O(N^2) into an O(N) problem: all 1'000 routers connect to, say, three
regional route reflectors for a total of 3'000 BGP connections, which is much better than ~1'000'000
BGP connections in the naive approach.

## Recap: eVPN Moving parts

The reason why I got so enthusiastic when I was playing with Arista and Nokia's eVPN stuff, is
because it requires very little dataplane configuration, and a relatively intuitive controlplane
configuration:

1. **Dataplane**: For each L2 broadcast domain (be it a L2XC or a Bridge Domain), really all I
   need is a single VxLAN interface with a given VNI, which should be able to send encapsulated
   ethernet frames to one or more other speakers in the same domain.
1. **Controlplane**: I will need to learn MAC addresses locally, and inform some BGP eVPN
   implementation of who-lives-where. Other VxLAN speakers learn of the MAC addresses I own, and
   will send me encapsulated ethernet for those addresses.
1. **Dataplane**: For unknown layer2 destinations, like _Broadcast_, _Unknown Unicast_, and
   _Multicast_ (BUM) traffic, I will want to keep track of which other VxLAN speakers these
   packets should be flooded to. I make note that this is not that different to flooding the
   packets to local interfaces, except here it'd be flooding them to remote VxLAN endpoints.
1. **ControlPlane**: Flooding L2 traffic across wide area networks is typically considered icky,
   so a few tricks might be optionally deployed. Since the controlplane already knows which MAC
   lives where, it may as well also make note of any local IPv4 ARP and IPv6 neighbor discovery
   replies and teach its peers which IPv4/IPv6 addresses live where: a distributed neighbor table.

{{< image width="6em" float="left" src="/assets/shared/brain.png" alt="brain" >}}

For the controlplane parts, [[FRRouting](https://frrouting.org/)] has a working implementation for
L2 (MAC-VRF) and L3 (IP-VRF). My favorite, [[Bird](https://bird.network.cz/)], is slowly catching
up, and has a few of these controlplane parts already working (mostly MAC-VRF). Commercial vendors
like Arista, Nokia, Juniper and Cisco are ready to go. If we want VPP to inter-operate, we may need
to make a few changes.

## VPP: Changes needed

### Dynamic VxLAN

I propose two changes to the VxLAN plugin, or perhaps a new plugin that changes the behavior, so
that we don't have to break any performance or functional promises to existing users. The new VxLAN
interface behavior changes in the following ways:

1. Each VxLAN interface has a local L2FIB attached to it; the keys are MAC addresses and the
   values are remote VTEPs. In its simplest form, the values would be just IPv4 or IPv6 addresses,
   because I can re-use the VNI and port information from the tunnel definition itself.

1. Each VxLAN interface has a local flood-list attached to it. This list contains the remote VTEPs
   that I am supposed to send 'flood' packets to. Similar to the Bridge Domain, when packets are
   marked for flooding, I will need to prepare and replicate them, sending one to each VTEP.

A set of APIs will be needed to manipulate these:
* ***Interface***: I will need to have an interface create, delete and list call, which will
  be able to maintain the interfaces and their metadata like source address, source/destination
  port, VNI and such.
* ***L2FIB***: I will need to add, replace, delete, and list which MAC addresses go where.
  With such a table, each time a packet is handled for a given Dynamic VxLAN interface, the
  dst_addr can be written into the packet.
* ***Flooding***: For those packets that are not unicast (BUM), I will need to be able to add,
  remove and list which VTEPs should receive this packet.

It would be pretty dope if the configuration looked something like this:
```
vpp# create evpn-vxlan src <v46address> dst-port <port> vni <vni> instance <id>
vpp# evpn-vxlan l2fib <iface> mac <mac> dst <v46address> [del]
vpp# evpn-vxlan flood <iface> dst <v46address> [del]
```

The VxLAN underlay transport can be either IPv4 or IPv6. Of course manipulating L2FIB or Flood
destinations must match the address family of an interface of type evpn-vxlan. A practical example
might be:

```
vpp# create evpn-vxlan src 2001:db8::1 dst-port 4789 vni 8298 instance 6
vpp# evpn-vxlan l2fib evpn-vxlan0 mac 00:01:02:82:98:02 dst 2001:db8::2
vpp# evpn-vxlan l2fib evpn-vxlan0 mac 00:01:02:82:98:03 dst 2001:db8::3
vpp# evpn-vxlan flood evpn-vxlan0 dst 2001:db8::2
vpp# evpn-vxlan flood evpn-vxlan0 dst 2001:db8::3
vpp# evpn-vxlan flood evpn-vxlan0 dst 2001:db8::4
```

By the way, while this _could_ be a new plugin, it could also just be added to the existing VxLAN
plugin. One way in which I might do this when creating a normal VxLAN tunnel is to allow for its
destination address to be either 0.0.0.0 for IPv4 or :: for IPv6. That would signal 'dynamic'
tunneling, upon which the L2FIB and Flood lists are used. It would slow down each VxLAN packet by
the time it takes to call `ip46_address_is_zero()`, which is only a handful of clocks.

### Bridge Domain

{{< image width="6em" float="left" src="/assets/shared/warning.png" alt="Warning" >}}

It's important to understand that L2 learning is **required** for eVPN to function. Each router
needs to be able to tell the iBGP eVPN session which MAC addresses should be forwarded to it. This
rules out the simple case of L2XC because there, no learning is performed. The corollary is that a
bridge-domain is required for any form of eVPN.

The L2 code in VPP already does most of what I'd need. It maintains an L2FIB in `vnet/l2/l2_fib.c`,
which is keyed by bridge-id and MAC address, and its values are a 64 bit structure that essentially
points to a `sw_if_index` output interface. The L2FIB of the eVPN needs a bit more information
though, notably an `ip46address` struct to know which VTEP to send to. It's tempting to add this
extra data to the bridge domain code. I would recommend against it, because other implementations,
for example MPLS, GENEVE or Carrier Pigeon IP, may need more than just the destination address. Even
the VxLAN implementation I'm thinking about might want to be able to override other things, like the
destination port for a given VTEP, or even the VNI. Putting all of this stuff in the bridge-domain
code will just clutter it for all users, not just those users who might want eVPN.

Similarly, one might argue it is tempting to re-use or extend the behavior in `vnet/l2/l2_flood.c`:
if it's already replicating BUM traffic, why not replicate it many times over the flood list
for any member interface that happens to be a dynamic VxLAN interface? This would be a bad idea
for a few reasons. Firstly, it is not guaranteed that the VxLAN plugin is loaded, and in
doing this, I would leak internal details of VxLAN into the bridge-domain code. Secondly, the
`l2_flood.c` code would get messy if other types were added (like the MPLS and GENEVE
above).

A reasonable approach is to mark such BUM frames once in the existing L2 code and, when handing the
replicated packet into the VxLAN node, to see the `is_bum` marker and once again replicate -- in the
VxLAN plugin -- these packets to the VTEPs in our local flood-list. Although a bit more work, this
approach only requires a tiny amount of change in the `l2_flood.c` code (the marking), and will keep
all the logic tucked away where it is relevant, derisking the VPP vnet codebase.

Fundamentally, I think the cleanest design is to keep the dynamic VxLAN interface fully
self-contained; it would therefore maintain its own L2FIB and flooding logic. The only thing I
would add to the L2 codebase is some form of BUM marker to allow for efficient flooding.

### Control Plane

There are a few things the control plane has to do. Some external agent, like FRR or Bird, will be
receiving a few types of eVPN messages. The ones I'm interested in are:

* ***Type 2***: MAC/IP Advertisement Route
  - On the way in, these should be fed to the VxLAN L2FIB belonging to the bridge-domain.
  - On the way out, learned addresses should be advertised to peers.
  - Regarding IPv4/IPv6 addresses, that is the ARP / ND tables: we can talk about those later.
* ***Type 3***: Inclusive Multicast Ethernet Tag Route
  - On the way in, these will populate the VxLAN Flood list belonging to the bridge-domain.
  - On the way out, each bridge-domain should advertise itself as IMET to peers.
* ***Type 5***: IP Prefix Route
  - Similar to the IP information in Type 2, we can talk about those later once L3VPN/eVPN is
    needed.

The 'on the way in' stuff can be easily done with my proposed APIs in the Dynamic VxLAN (or a new
eVPN VxLAN) plugin. Adding, removing and listing L2FIB and Flood lists is easy as far as VPP is
concerned. It's just that the controlplane implementation needs to somehow _feed_ the API, so an
external program may be needed, or alternatively the Linux Control Plane netlink plugin might be
used to consume this information.

The 'on the way out' stuff is a bit trickier. I will need to listen to the creation of new broadcast
domains and associate them with the right IMET announcements, and for each MAC address learned, pick
it up and advertise it into eVPN. Later, if ever ARP and ND proxying becomes important, I'll have to
revisit the bridge-domain feature to do IPv4 ARP and IPv6 Neighbor Discovery, and replace it with
some code that populates the IPv4/IPv6 parts of the Type 2 messages on the way out, and similarly,
on the way in, populates an L3 neighbor cache for the bridge domain, so that ARP and ND replies can
be synthesized based on what we've learned in eVPN.

# Demonstration

### VPP: Current VxLAN

I'll build a small demo environment on Summer to show how the interaction of VxLAN and Bridge
Domain works today:

```
vpp# create tap host-if-name dummy0 host-mtu-size 9216 host-ip4-addr 192.0.2.1/24
vpp# set int state tap0 up
vpp# set int ip address tap0 192.0.2.1/24
vpp# set ip neighbor tap0 192.0.2.254 01:02:03:82:98:fe static
vpp# set ip neighbor tap0 192.0.2.2 01:02:03:82:98:02 static
vpp# set ip neighbor tap0 192.0.2.3 01:02:03:82:98:03 static

vpp# create vxlan tunnel src 192.0.2.1 dst 192.0.2.254 vni 8298
vpp# set int state vxlan_tunnel0 up

vpp# create tap host-if-name vpptap0 host-mtu-size 9216 hw-addr 02:fe:64:dc:1b:82
vpp# set int state tap1 up

vpp# create bridge-domain 8298
vpp# set int l2 bridge tap1 8298
vpp# set int l2 bridge vxlan_tunnel0 8298
```

I've created a tap device called `dummy0` and gave it an IPv4 address. Normally, I would use some
DPDK or RDMA interface like `TenGigabitEthernet10/0/0`. Then I'll populate some static ARP entries.
Again, normally this would just be 'use normal routing'. However, for the purposes of this
demonstration, it helps to use a TAP device, as any packets VPP sends to 192.0.2.254 and friends
can be captured with `tcpdump` in Linux, in addition to `trace add` in VPP.

Then, I create a VxLAN tunnel with a default destination of 192.0.2.254 and the given VNI.
Next, I create a TAP interface called `vpptap0` with the given MAC address.
Finally, I bind these two interfaces together in a bridge-domain.

I proceed to write a small ScaPY program:

```python
#!/usr/bin/env python3

from scapy.all import Ether, IP, UDP, Raw, sendp

pkt = (Ether(dst="01:02:03:04:05:02", src="02:fe:64:dc:1b:82", type=0x0800)
       / IP(src="192.168.1.1", dst="192.168.1.2")
       / UDP(sport=8298, dport=7) / Raw(load=b"ping"))
print(pkt)
sendp(pkt, iface="vpptap0")

pkt = (Ether(dst="01:02:03:04:05:03", src="02:fe:64:dc:1b:82", type=0x0800)
       / IP(src="192.168.1.1", dst="192.168.1.3")
       / UDP(sport=8298, dport=7) / Raw(load=b"ping"))
print(pkt)
sendp(pkt, iface="vpptap0")
```

What will happen is that the ScaPY program emits these frames into device `vpptap0`, which is in
bridge-domain 8298. The bridge will learn our src MAC `02:fe:64:dc:1b:82` and look up the dst MAC
`01:02:03:04:05:02`, and because there hasn't been traffic yet, it'll flood to all member ports, one
of which is the VxLAN tunnel. VxLAN will then encapsulate the packets to the other side of the
tunnel.

```
pim@summer:~$ sudo ./vxlan-test.py
Ether / IP / UDP 192.168.1.1:8298 > 192.168.1.2:echo / Raw
Ether / IP / UDP 192.168.1.1:8298 > 192.168.1.3:echo / Raw

pim@summer:~$ sudo tcpdump -evni dummy0
10:50:35.310620 02:fe:72:52:38:53 > 01:02:03:82:98:fe, ethertype IPv4 (0x0800), length 96:
  (tos 0x0, ttl 253, id 0, offset 0, flags [none], proto UDP (17), length 82)
  192.0.2.1.6345 > 192.0.2.254.4789: VXLAN, flags [I] (0x08), vni 8298
    02:fe:64:dc:1b:82 > 01:02:03:04:05:02, ethertype IPv4 (0x0800), length 46:
    (tos 0x0, ttl 64, id 1, offset 0, flags [none], proto UDP (17), length 32)
    192.168.1.1.8298 > 192.168.1.2.7: UDP, length 4
10:50:35.362552 02:fe:72:52:38:53 > 01:02:03:82:98:fe, ethertype IPv4 (0x0800), length 96:
  (tos 0x0, ttl 253, id 0, offset 0, flags [none], proto UDP (17), length 82)
  192.0.2.1.23916 > 192.0.2.254.4789: VXLAN, flags [I] (0x08), vni 8298
    02:fe:64:dc:1b:82 > 01:02:03:04:05:03, ethertype IPv4 (0x0800), length 46:
    (tos 0x0, ttl 64, id 1, offset 0, flags [none], proto UDP (17), length 32)
    192.168.1.1.8298 > 192.168.1.3.7: UDP, length 4
```

I want to point out that nothing, so far, is special. All of this works with upstream VPP just fine.
I can see two VxLAN encapsulated packets, both destined to `192.0.2.254:4789`. Cool.

### Dynamic VPP VxLAN

I wrote a prototype for a Dynamic VxLAN tunnel in [[43433](https://gerrit.fd.io/r/c/vpp/+/43433)].
The good news is, this works. The bad news is, I think I'll want to discuss my proposal (this
article) with the community before going further down a potential rabbit hole.

With my gerrit patched in, I can do the following:

```
vpp# vxlan l2fib vxlan_tunnel0 mac 01:02:03:04:05:02 dst 192.0.2.2
Added VXLAN dynamic destination for 01:02:03:04:05:02 on vxlan_tunnel0 dst 192.0.2.2
vpp# vxlan l2fib vxlan_tunnel0 mac 01:02:03:04:05:03 dst 192.0.2.3
Added VXLAN dynamic destination for 01:02:03:04:05:03 on vxlan_tunnel0 dst 192.0.2.3

vpp# show vxlan l2fib
VXLAN Dynamic L2FIB entries:
 MAC                Interface       Destination   Port   VNI
 01:02:03:04:05:02  vxlan_tunnel0   192.0.2.2     4789   8298
 01:02:03:04:05:03  vxlan_tunnel0   192.0.2.3     4789   8298
Dynamic L2FIB entries: 2
```

I've instructed the VxLAN tunnel to change the tunnel destination based on the destination MAC.

I run the script and tcpdump again:

```
pim@summer:~$ sudo tcpdump -evni dummy0
11:16:53.834619 02:fe:fe:ae:0d:a3 > 01:02:03:82:98:fe, ethertype IPv4 (0x0800), length 96:
  (tos 0x0, ttl 253, id 0, offset 0, flags [none], proto UDP (17), length 82, bad cksum 3945 (->3997)!)
  192.0.2.1.6345 > 192.0.2.2.4789: VXLAN, flags [I] (0x08), vni 8298
    02:fe:64:dc:1b:82 > 01:02:03:04:05:02, ethertype IPv4 (0x0800), length 46:
    (tos 0x0, ttl 64, id 1, offset 0, flags [none], proto UDP (17), length 32)
    192.168.1.1.8298 > 192.168.1.2.7: UDP, length 4
11:16:53.882554 02:fe:fe:ae:0d:a3 > 01:02:03:82:98:fe, ethertype IPv4 (0x0800), length 96:
  (tos 0x0, ttl 253, id 0, offset 0, flags [none], proto UDP (17), length 82, bad cksum 3944 (->3996)!)
  192.0.2.1.23916 > 192.0.2.3.4789: VXLAN, flags [I] (0x08), vni 8298
    02:fe:64:dc:1b:82 > 01:02:03:04:05:03, ethertype IPv4 (0x0800), length 46:
    (tos 0x0, ttl 64, id 1, offset 0, flags [none], proto UDP (17), length 32)
    192.168.1.1.8298 > 192.168.1.3.7: UDP, length 4
```

Two important notes. Firstly, this works! For the MAC address ending in `:02`, the packet now goes
to `192.0.2.2` instead of the default `192.0.2.254`; same for the `:03` MAC, which now goes to
`192.0.2.3`. Nice! But secondly, the IPv4 header of the VxLAN packets was changed, so there needs to
be a call to `ip4_header_checksum()` inserted somewhere. That's an easy fix.

# What's next

I want to discuss a few things, perhaps at an upcoming VPP Community meeting. Notably:
1. Is the VPP developer community supportive of adding eVPN support? Does anybody want to help
   write it with me?
1. Is changing the existing VxLAN plugin appropriate, or should I make a new plugin which adds
   dynamic endpoints, an L2FIB and Flood lists for BUM traffic?
1. Is it acceptable for me to add a BUM marker in `l2_flood.c`, so that I can reuse all the logic
   from bridge-domain flooding as I extend it to also do VTEP flooding?
1. (perhaps later) VxLAN is the canonical underlay, but is there an appetite to extend also to,
   say, GENEVE or MPLS?
1. (perhaps later) What's a good way to tie a controlplane like FRRouting or Bird2 into the
   dataplane (perhaps using a sidecar controller, or perhaps using Linux CP Netlink messages)?

701
content/articles/2025-07-26-ctlog-1.md
Normal file
@@ -0,0 +1,701 @@
---
date: "2025-07-26T22:07:23Z"
title: 'Certificate Transparency - Part 1 - TesseraCT'
aliases:
- /s/articles/2025/07/26/certificate-transparency-part-1/
---

{{< image width="10em" float="right" src="/assets/ctlog/ctlog-logo-ipng.png" alt="ctlog logo" >}}
|
||||||
|
|
||||||
|
# Introduction
|
||||||
|
|
||||||
|

There once was a Dutch company called [[DigiNotar](https://en.wikipedia.org/wiki/DigiNotar)]; as the
name suggests, it was a form of _digital notary_, and they were in the business of issuing security
certificates. Unfortunately, in June of 2011, their IT infrastructure was compromised and
subsequently it issued hundreds of fraudulent SSL certificates, some of which were used for
man-in-the-middle attacks on Iranian Gmail users. Not cool.

Google launched a project called **Certificate Transparency**, because it was becoming more common
that the root of trust given to _Certification Authorities_ could no longer be unilaterally trusted.
These attacks showed that the lack of transparency in the way CAs operated was a significant risk to
the Web Public Key Infrastructure. It led to the creation of this ambitious
[[project](https://certificate.transparency.dev/)] to improve security online by bringing
accountability to the system that protects our online services with _SSL_ (Secure Socket Layer)
and _TLS_ (Transport Layer Security).

In 2013, [[RFC 6962](https://datatracker.ietf.org/doc/html/rfc6962)] was published by the IETF. It
describes an experimental protocol for publicly logging the existence of Transport Layer Security
(TLS) certificates as they are issued or observed, in a manner that allows anyone to audit
certificate authority (CA) activity and notice the issuance of suspect certificates as well as to
audit the certificate logs themselves. The intent is that eventually clients would refuse to honor
certificates that do not appear in a log, effectively forcing CAs to add all issued certificates to
the logs.

This series explores and documents how IPng Networks will be running two Static CT _Logs_ with two
different implementations. One will be [[Sunlight](https://sunlight.dev/)], and the other will be
[[TesseraCT](https://github.com/transparency-dev/tesseract)].

## Static Certificate Transparency

In this context, _Logs_ are network services that implement the protocol operations for submissions
and queries that are defined in a specification that builds on the previous RFC. A few years ago,
my buddy Antonis asked me if I would be willing to run a log, but operationally logs were very
complex and expensive to run. However, over the years, the concept of _Static Logs_ put running one
within reach. This [[Static CT API](https://github.com/C2SP/C2SP/blob/main/static-ct-api.md)] defines
a read-path HTTP static asset hierarchy (for monitoring) to be implemented alongside the write-path
RFC 6962 endpoints (for submission).

Aside from the different read endpoints, a log that implements the Static API is a regular CT log
that can work alongside RFC 6962 logs and that fulfills the same purpose. In particular, it requires
no modification to submitters and TLS clients.

If you only read one document about Static CT, read Filippo Valsorda's excellent
[[paper](https://filippo.io/a-different-CT-log)]. It describes a radically cheaper and easier to
operate [[Certificate Transparency](https://certificate.transparency.dev/)] log that is backed by a
consistent object storage, and can scale to 30x the current issuance rate for 2-10% of the costs
with no merge delay.

## Scalable, Cheap, Reliable: choose two

{{< image width="18em" float="right" src="/assets/ctlog/MPLS Backbone - CTLog.svg" alt="ctlog at ipng" >}}

In the diagram, I've drawn an overview of IPng's network. In {{< boldcolor color="red" >}}red{{< /boldcolor >}},
a European backbone network is provided by a [[BGP Free Core network]({{< ref 2022-12-09-oem-switch-2 >}})].
It operates a private IPv4, IPv6, and MPLS network, called _IPng Site Local_, which is not connected
to the internet. On top of that, IPng offers L2 and L3 services, for example using
[[VPP]({{< ref 2021-02-27-network >}})].
In {{< boldcolor color="lightgreen" >}}green{{< /boldcolor >}} I built a cluster of replicated
|
||||||
|
NGINX frontends. They connect into _IPng Site Local_ and can reach all hypervisors, VMs, and storage
|
||||||
|
systems. They also connect to the Internet with a single IPv4 and IPv6 address. One might say that
|
||||||
|
SSL is _added and removed here :-)_ [[ref](/assets/ctlog/nsa_slide.jpg)].
|
||||||
|
|
||||||
|
Then in {{< boldcolor color="orange" >}}orange{{< /boldcolor >}} I built a set of [[MinIO]({{< ref
|
||||||
|
2025-05-28-minio-1 >}})] S3 storage pools. Amongst others, I serve the static content from the IPng
|
||||||
|
website from these pools, providing fancy redundancy and caching. I wrote about its design in [[this
|
||||||
|
article]({{< ref 2025-06-01-minio-2 >}})].
|
||||||
|
|
||||||
|
Finally, I turn my attention to the {{< boldcolor color="blue" >}}blue{{< /boldcolor >}} which is
|
||||||
|
two hypervisors, one run by [[IPng](https://ipng.ch/)] and the other by [[Massar](https://massars.net/)]. Each
|
||||||
|
of them will be running one of the _Log_ implementations. IPng provides two large ZFS storage tanks
|
||||||
|
for offsite backup, in case a hypervisor decides to check out, and daily backups to an S3 bucket
|
||||||
|
using Restic.
|
||||||
|
|
||||||
|
Having explained all of this, I am well aware that end to end reliability will be coming from the
|
||||||
|
fact that there are many independent _Log_ operators, and folks wanting to validate certificates can
|
||||||
|
simply monitor many. If there is a gap in coverage, say due to any given _Log_'s downtime, this will
|
||||||
|
not necessarily be problematic. It does mean that I may have to suppress the SRE in me...
|
||||||
|
|
||||||
|

## MinIO

My first instinct is to leverage the distributed storage IPng has, but as I'll show in the rest of
this article, maybe a simpler, more elegant design could be superior, precisely because individual
log reliability is not _as important_ as having many available log _instances_ to choose from.

From operators in the field I understand that the world-wide generation of certificates is roughly
17M/day, which amounts to some 200-250qps of writes (17M / 86'400s ≈ 197/s). Antonis explains that
certs with a validity of 180 days or less will need two CT log entries, while certs with a validity
of more than 180 days will need three CT log entries. So the write rate is roughly 2.2x that, as an
upper bound.

My first thought is to see how fast my open source S3 machines can go, really. I'm curious also as
to the difference between SSD and spinning disks.

I boot two Dell R630s in the Lab. These machines have two Xeon E5-2640 v4 CPUs for a total of 20
cores and 40 threads, and 512GB of DDR4 memory. They also sport a SAS controller. In one machine I
place 6 pcs of 1.2TB SAS3 disks (HPE part number EG1200JEHMC), and in the second machine I place
6 pcs of 1.92TB enterprise SSDs (Samsung part number P1633N19).

I spin up a 6-device MinIO cluster on both and take them out for a spin using [[S3
Benchmark](https://github.com/wasabi-tech/s3-benchmark.git)] from Wasabi Tech.

```
pim@ctlog-test:~/src/s3-benchmark$ for dev in disk ssd; do \
  for t in 1 8 32; do \
    for z in 4M 1M 8k 4k; do \
      ./s3-benchmark -a $KEY -s $SECRET -u http://minio-$dev:9000 -t $t -z $z \
        | tee -a minio-results.txt; \
    done; \
  done; \
done
```

The loadtest above does a bunch of runs with varying parameters. First it tries to read and write
object sizes of 4MB, 1MB, 8kB and 4kB. Then it tries to do this with either 1 thread, 8 threads or
32 threads. Finally it tests both the disk-based variant as well as the SSD based one. The loadtest
runs from a third machine, so that the Dell R630 disk tanks can stay completely dedicated to their
task of running MinIO.
{{< image width="100%" src="/assets/ctlog/minio_8kb_performance.png" alt="MinIO 8kb disk vs SSD" >}}
|
||||||
|
|
||||||
|
The left-hand side graph feels pretty natural to me. With one thread, uploading 8kB objects will
|
||||||
|
quickly hit the IOPS rate of the disks, each of which have to participate in the write due to EC:3
|
||||||
|
encoding when using six disks, and it tops out at ~56 PUT/s. The single thread hitting SSDs will not
|
||||||
|
hit that limit, and has ~371 PUT/s which I found a bit underwhelming. But, when performing the
|
||||||
|
loadtest with either 8 or 32 write threads, the hard disks become only marginally faster (topping
|
||||||
|
out at 240 PUT/s), while the SSDs really start to shine, with 3850 PUT/s. Pretty good performance.
|
||||||
|
|
||||||
|

On the read-side, I am pleasantly surprised that there's not really that much of a difference
between disks and SSDs. This is likely because the host filesystem cache is playing a large role, so
the 1-thread performance is equivalent (765 GET/s for disks, 677 GET/s for SSDs), and the 32-thread
performance is also equivalent (at 7624 GET/s for disks with 7261 GET/s for SSDs). I do wonder why
the hard disks consistently outperform the SSDs with all the other variables (OS, MinIO version,
hardware) the same.

## Sidequest: SeaweedFS

Something that has long caught my attention is the way in which
[[SeaweedFS](https://github.com/seaweedfs/seaweedfs)] approaches blob storage. Many operators report
great success with small-file writes in SeaweedFS compared to MinIO and even AWS S3 storage.
This is because writes in SeaweedFS are not broken into erasure-sets, which would require every disk
to write a small part or checksum of the data; rather, files are replicated within the cluster in
their entirety on different disks, racks or datacenters. I won't bore you with the details of
SeaweedFS, but I'll tack on a docker [[compose file](/assets/ctlog/seaweedfs.docker-compose.yml)]
that I used at the end of this article, if you're curious.
{{< image width="100%" src="/assets/ctlog/size_comparison_8t.png" alt="MinIO vs SeaWeedFS" >}}
|
||||||
|
|
||||||
|
In the write-path, SeaweedFS dominates in all cases, due to its different way of achieving durable
|
||||||
|
storage (per-file replication in SeaweedFS versus all-disk erasure-sets in MinIO):
|
||||||
|
* 4k: 3,384 ops/sec vs MinIO's 111 ops/sec (30x faster!)
|
||||||
|
* 8k: 3,332 ops/sec vs MinIO's 111 ops/sec (30x faster!)
|
||||||
|
* 1M: 383 ops/sec vs MinIO's 44 ops/sec (9x faster)
|
||||||
|
* 4M: 104 ops/sec vs MinIO's 32 ops/sec (4x faster)
|
||||||
|
|
||||||
|

For the read-path, in GET operations MinIO is better at small objects, and really dominates at
large objects:
* 4k: 7,411 ops/sec vs SeaweedFS 5,014 ops/sec
* 8k: 7,666 ops/sec vs SeaweedFS 5,165 ops/sec
* 1M: 5,466 ops/sec vs SeaweedFS 2,212 ops/sec
* 4M: 3,084 ops/sec vs SeaweedFS 646 ops/sec

This makes me draw an interesting conclusion: seeing as CT Logs are read/write heavy (every couple
of seconds, the Merkle tree is recomputed, which is reasonably disk-intensive), SeaweedFS might be a
slightly better choice. IPng Networks has three MinIO deployments, but no SeaweedFS deployments. Yet.

# Tessera

[[Tessera](https://github.com/transparency-dev/tessera.git)] is a Go library for building tile-based
transparency logs (tlogs) [[ref](https://github.com/C2SP/C2SP/blob/main/tlog-tiles.md)]. It is the
logical successor to the approach that Google took when building and operating _Logs_ using its
predecessor called [[Trillian](https://github.com/google/trillian)]. The implementation and its APIs
bake-in current best-practices based on the lessons learned over the past decade of building and
operating transparency logs in production environments and at scale.

Tessera was introduced at the Transparency.Dev summit in October 2024. I first watched Al and Martin
[[introduce](https://www.youtube.com/watch?v=9j_8FbQ9qSc)] it at last year's summit. At a high
level, it wraps what used to be a whole Kubernetes cluster full of components into a single library
that can be used with cloud based services, like AWS S3 and an RDS database, or GCP's GCS storage
and a Spanner database. However, Google also made it easy to use a regular POSIX filesystem
implementation.

## TesseraCT

{{< image width="10em" float="right" src="/assets/ctlog/tesseract-logo.png" alt="tesseract logo" >}}

While Tessera is a library, a CT log implementation comes from its sibling GitHub repository called
[[TesseraCT](https://github.com/transparency-dev/tesseract)]. Because it leverages Tessera under the
hood, TesseraCT can run on GCP, AWS, POSIX-compliant filesystems, or on S3-compatible systems
alongside a MySQL database. In order to provide ecosystem agility and to control the growth of CT
Log sizes, new CT Logs must be temporally sharded, defining a certificate expiry range denoted in
the form of two dates: `[rangeBegin, rangeEnd)`. The certificate expiry range allows a Log to reject
otherwise valid logging submissions for certificates that expire before or after this defined range,
thus partitioning the set of publicly-trusted certificates that each Log will accept. I will be
expected to keep logs for an extended period of time, say 3-5 years.
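
To make the sharding concrete, here is a minimal Go sketch of that `[rangeBegin, rangeEnd)`
acceptance check. The helper name `inShard` is mine, for illustration only; it is not TesseraCT's
actual code:

```
import "time"

// inShard reports whether a certificate belongs in this temporal shard:
// its expiry (NotAfter) must fall within [rangeBegin, rangeEnd).
// Hypothetical helper, for illustration only.
func inShard(notAfter, rangeBegin, rangeEnd time.Time) bool {
	return !notAfter.Before(rangeBegin) && notAfter.Before(rangeEnd)
}
```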

It's time for me to figure out what this TesseraCT thing can do .. are you ready? Let's go!

### TesseraCT: S3 and SQL

TesseraCT comes with a few so-called _personalities_: each is an opinionated implementation of the
underlying storage infrastructure. The first personality I look at is the `aws` one in
`cmd/tesseract/aws`. I notice that this personality makes hard assumptions about the use of AWS,
which is unfortunate as the documentation says '.. or self-hosted S3 and MySQL database'. Notably,
the `aws` personality assumes AWS Secrets Manager in order to fetch its signing key. Before I can
be successful, I need to untangle that.

#### TesseraCT: AWS and Local Signer

First, I change `cmd/tesseract/aws/main.go` to add two new flags:

* ***-signer_public_key_file***: a path to the public key for checkpoints and SCT signer
* ***-signer_private_key_file***: a path to the private key for checkpoints and SCT signer

I then change the program so that when these flags are both set, a _NewLocalSigner_ is used instead
of a _NewSecretsManagerSigner_. Now all I have to do is implement the signer interface in a file
`local_signer.go`. There, the function _NewLocalSigner()_ will read the public and private PEM from
file, decode them, and create an _ECDSAWithSHA256Signer_ with them. A simple example to show what I
mean:

```
// NewLocalSigner creates a new signer that uses the ECDSA P-256 key pair from
// local disk files for signing digests. NOTE: all error handling is elided
// here for brevity; the unused return values are discarded with `_`.
func NewLocalSigner(publicKeyFile, privateKeyFile string) (*ECDSAWithSHA256Signer, error) {
	// Read and decode the public key
	publicKeyPEM, _ := os.ReadFile(publicKeyFile)
	publicPemBlock, _ := pem.Decode(publicKeyPEM)

	var publicKey crypto.PublicKey
	publicKey, _ = x509.ParsePKIXPublicKey(publicPemBlock.Bytes)
	ecdsaPublicKey, _ := publicKey.(*ecdsa.PublicKey)

	// Read and decode the private key
	privateKeyPEM, _ := os.ReadFile(privateKeyFile)
	privatePemBlock, _ := pem.Decode(privateKeyPEM)

	var ecdsaPrivateKey *ecdsa.PrivateKey
	ecdsaPrivateKey, _ = x509.ParseECPrivateKey(privatePemBlock.Bytes)

	// Verify the correctness of the signer key pair
	if !ecdsaPrivateKey.PublicKey.Equal(ecdsaPublicKey) {
		return nil, errors.New("signer key pair doesn't match")
	}

	return &ECDSAWithSHA256Signer{
		publicKey:  ecdsaPublicKey,
		privateKey: ecdsaPrivateKey,
	}, nil
}
```

In the snippet above I omitted all of the error handling, but the local signer logic itself is
hopefully clear. And with that, I am liberated from Amazon's Cloud offering and can run this thing
all by myself!
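
For completeness, here is a minimal sketch of how that wiring in `cmd/tesseract/aws/main.go` might
look. The two flag names are the ones I added above; the `newSecretsManagerSigner` wrapper stands in
for TesseraCT's existing AWS code path, and its exact signature is an assumption on my part:

```
package main

import (
	"context"
	"crypto"
	"flag"
)

var (
	signerPublicKeyFile  = flag.String("signer_public_key_file", "", "Path to the public key for checkpoints and SCT signer")
	signerPrivateKeyFile = flag.String("signer_private_key_file", "", "Path to the private key for checkpoints and SCT signer")
)

// newSigner returns a local signer when both key-file flags are set, and
// otherwise falls back to the AWS Secrets Manager path.
func newSigner(ctx context.Context) (crypto.Signer, error) {
	if *signerPublicKeyFile != "" && *signerPrivateKeyFile != "" {
		// Both key files given: sign locally, no AWS Secrets Manager needed.
		return NewLocalSigner(*signerPublicKeyFile, *signerPrivateKeyFile)
	}
	// Hypothetical wrapper around TesseraCT's existing Secrets Manager path.
	return newSecretsManagerSigner(ctx)
}
```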

#### TesseraCT: Running with S3, MySQL, and Local Signer

First, I need to create a suitable ECDSA key:

```
pim@ctlog-test:~$ openssl ecparam -name prime256v1 -genkey -noout -out /tmp/private_key.pem
pim@ctlog-test:~$ openssl ec -in /tmp/private_key.pem -pubout -out /tmp/public_key.pem
```

Then, I'll install the MySQL server and create the databases:

```
pim@ctlog-test:~$ sudo apt install default-mysql-server
pim@ctlog-test:~$ sudo mysql -u root

CREATE USER 'tesseract'@'localhost' IDENTIFIED BY '<db_passwd>';
CREATE DATABASE tesseract;
CREATE DATABASE tesseract_antispam;
GRANT ALL PRIVILEGES ON tesseract.* TO 'tesseract'@'localhost';
GRANT ALL PRIVILEGES ON tesseract_antispam.* TO 'tesseract'@'localhost';
```

Finally, I use the SSD MinIO lab-machine that I just loadtested to create an S3 bucket.

```
pim@ctlog-test:~$ mc mb minio-ssd/tesseract-test
pim@ctlog-test:~$ cat << EOF > /tmp/minio-access.json
{ "Version": "2012-10-17", "Statement": [ {
    "Effect": "Allow",
    "Action": [ "s3:ListBucket", "s3:PutObject", "s3:GetObject", "s3:DeleteObject" ],
    "Resource": [ "arn:aws:s3:::tesseract-test/*", "arn:aws:s3:::tesseract-test" ]
  } ]
}
EOF
pim@ctlog-test:~$ mc admin user add minio-ssd <user> <secret>
pim@ctlog-test:~$ mc admin policy create minio-ssd tesseract-test-access /tmp/minio-access.json
pim@ctlog-test:~$ mc admin policy attach minio-ssd tesseract-test-access --user <user>
pim@ctlog-test:~$ mc anonymous set public minio-ssd/tesseract-test
```
{{< image width="6em" float="left" src="/assets/shared/brain.png" alt="brain" >}}
|
||||||
|
|
||||||
|
After some fiddling, I understand that the AWS software development kit makes some assumptions that
|
||||||
|
you'll be using .. _quelle surprise_ .. AWS services. But you can also use local S3 services by
|
||||||
|
setting a few key environment variables. I had heard of the S3 access and secret key environment
|
||||||
|
variables before, but I now need to also use a different S3 endpoint. That little detour into the
|
||||||
|
codebase only took me .. several hours.
|
||||||
|
|
||||||
|

Armed with that knowledge, I can build and finally start my TesseraCT instance:

```
pim@ctlog-test:~/src/tesseract/cmd/tesseract/aws$ go build -o ~/aws .
pim@ctlog-test:~$ export AWS_DEFAULT_REGION="us-east-1"
pim@ctlog-test:~$ export AWS_ACCESS_KEY_ID="<user>"
pim@ctlog-test:~$ export AWS_SECRET_ACCESS_KEY="<secret>"
pim@ctlog-test:~$ export AWS_ENDPOINT_URL_S3="http://minio-ssd.lab.ipng.ch:9000/"
pim@ctlog-test:~$ ./aws --http_endpoint='[::]:6962' \
    --origin=ctlog-test.lab.ipng.ch/test-ecdsa \
    --bucket=tesseract-test \
    --db_host=ctlog-test.lab.ipng.ch \
    --db_user=tesseract \
    --db_password=<db_passwd> \
    --db_name=tesseract \
    --antispam_db_name=tesseract_antispam \
    --signer_public_key_file=/tmp/public_key.pem \
    --signer_private_key_file=/tmp/private_key.pem \
    --roots_pem_file=internal/hammer/testdata/test_root_ca_cert.pem

I0727 15:13:04.666056 337461 main.go:128] **** CT HTTP Server Starting ****
```

Hah! I think most of the command line flags and environment variables should make sense, but I was
struggling for a while with the `--roots_pem_file` and the `--origin` flags, so I phoned a friend
(Al Cutter, Googler extraordinaire and an expert in Tessera/CT). He explained to me that the Log is
actually an open endpoint to which anybody might POST data. However, to avoid folks abusing the log
infrastructure, each POST is expected to come from one of the certificate authorities listed in the
`--roots_pem_file`. OK, that makes sense.

Then, the `--origin` flag designates how my log calls itself. In the resulting `checkpoint` file, it
will enumerate a hash of the latest merged and published Merkle tree. In case a server serves
multiple logs, it uses the `--origin` flag to make the distinction which checksum belongs to which.

```
pim@ctlog-test:~/src/tesseract$ curl http://tesseract-test.minio-ssd.lab.ipng.ch:9000/checkpoint
ctlog-test.lab.ipng.ch/test-ecdsa
0
JGPitKWWI0aGuCfC2k1n/p9xdWAYPm5RZPNDXkCEVUU=

— ctlog-test.lab.ipng.ch/test-ecdsa L+IHdQAAAZhMCONUBAMARjBEAiA/nc9dig6U//vPg7SoTHjt9bxP5K+x3w4MYKpIRn4ULQIgUY5zijRK8qyuJGvZaItDEmP1gohCt+wI+sESBnhkuqo=
```

When creating the bucket above, I used `mc anonymous set public`, which made the S3 bucket
world-readable. I can now execute the whole read-path simply by hitting the S3 service. Check.
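
Since the whole read-path is static objects, monitoring a log needs nothing more than plain HTTP
GETs. As a small illustration (my own sketch, not a TesseraCT tool), this Go program fetches the
checkpoint above and prints the origin and tree size, using the note format shown earlier (origin
on the first line, tree size on the second); the signature line is not verified here:

```
package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
	"strconv"
	"strings"
)

func main() {
	resp, err := http.Get("http://tesseract-test.minio-ssd.lab.ipng.ch:9000/checkpoint")
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		log.Fatal(err)
	}
	// Line 0: origin; line 1: tree size; line 2: root hash; rest: signature.
	lines := strings.SplitN(string(body), "\n", 4)
	size, err := strconv.ParseInt(lines[1], 10, 64)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("origin=%s tree_size=%d\n", lines[0], size)
}
```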

#### TesseraCT: Loadtesting S3/MySQL

{{< image width="12em" float="right" src="/assets/ctlog/stop-hammer-time.jpg" alt="Stop, hammer time" >}}

The write path is a server on `[::]:6962`. I should be able to write log entries to it, but how?
Here's where I am grateful to find a tool in the TesseraCT GitHub repository called `hammer`. This
hammer sets up read and write traffic to a Static CT API log to test correctness and performance
under load. The traffic is sent according to the [[Static CT API](https://c2sp.org/static-ct-api)]
spec. Slick!

The tool starts a text-based UI in the terminal (my favorite! also what I use with the Cisco T-Rex
loadtester) that shows the current status and logs, and supports increasing/decreasing read and
write traffic. This TUI allows for a level of interactivity when probing a new configuration of a
log, in order to find any cliffs where performance degrades. For real load-testing applications,
especially headless runs as part of a CI pipeline, it is recommended to run the tool with
`-show_ui=false` in order to disable the UI.

I'm a bit lost in the somewhat terse
[[README.md](https://github.com/transparency-dev/tesseract/tree/main/internal/hammer)], but my buddy
Al comes to my rescue and explains the flags to me. First of all, the loadtester wants to hit the
same `--origin` that I configured the write-path to accept. In my case this is
`ctlog-test.lab.ipng.ch/test-ecdsa`. Then, it needs the public key for that _Log_, which I can find
in `/tmp/public_key.pem`. The key material there is in _DER_ (Distinguished Encoding Rules) form,
stored as a base64 encoded string. What followed was the most difficult part for me to understand,
as I was thinking the hammer would read some log from the internet somewhere and replay it locally.
Al explains that actually, the `hammer` tool synthetically creates all of these entries itself; it
regularly reads the `checkpoint` from the `--log_url` location, while it writes its certificates to
`--write_log_url`. The last few flags just inform the `hammer` how many read and write ops/sec it
should generate, and with that explanation my brain plays _tadaa.wav_ and I am ready to go.

```
pim@ctlog-test:~/src/tesseract$ go run ./internal/hammer \
    --origin=ctlog-test.lab.ipng.ch/test-ecdsa \
    --log_public_key=MFkwEwYHKoZIzj0CAQYIKoZIzj0DAQcDQgAEucHtDWe9GYNicPnuGWbEX8rJg/VnDcXs8z40KdoNidBKy6/ZXw2u+NW1XAUnGpXcZozxufsgOMhijsWb25r7jw== \
    --log_url=http://tesseract-test.minio-ssd.lab.ipng.ch:9000/ \
    --write_log_url=http://localhost:6962/ctlog-test.lab.ipng.ch/test-ecdsa/ \
    --max_read_ops=0 \
    --num_writers=5000 \
    --max_write_ops=100
```
{{< image width="30em" float="right" src="/assets/ctlog/ctlog-loadtest1.png" alt="S3/MySQL Loadtest 100qps" >}}
|
||||||
|
|
||||||
|
Cool! It seems that the loadtest is happily chugging along at 100qps. The log is consuming them in
|
||||||
|
the HTTP write-path by accepting POST requests to
|
||||||
|
`/ctlog-test.lab.ipng.ch/test-ecdsa/ct/v1/add-chain`, where hammer is offering them at a rate of
|
||||||
|
100qps, with a configured probability of duplicates set at 10%. What that means is that every now
|
||||||
|
and again, it'll repeat a previous request. The purpose of this is to stress test the so-called
|
||||||
|
`antispam` implementation. When `hammer` sends its requests, it signs them with a certificate that
|
||||||
|
was issued by the CA described in `internal/hammer/testdata/test_root_ca_cert.pem`, which is why
|
||||||
|
TesseraCT accepts them.
|
||||||
|
|
||||||
|

I raise the write load by using the '>' key a few times. I notice things are great at 500qps, which
is nice because that's double what we expect. But I start seeing a bit more noise at 600qps. When I
raise the write-rate to 1000qps, all hell breaks loose in the logs of the server (and similar logs
in the `hammer` loadtester):

```
W0727 15:54:33.419881 348475 handlers.go:168] ctlog-test.lab.ipng.ch/test-ecdsa: AddChain handler error: couldn't store the leaf: failed to fetch entry bundle at index 0: failed to fetch resource: getObject: failed to create reader for object "tile/data/000" in bucket "tesseract-test": operation error S3: GetObject, context deadline exceeded
W0727 15:55:02.727962 348475 aws.go:345] GarbageCollect failed: failed to delete one or more objects: failed to delete objects: operation error S3: DeleteObjects, https response error StatusCode: 400, RequestID: 1856202CA3C4B83F, HostID: dd9025bab4ad464b049177c95eb6ebf374d3b3fd1af9251148b658df7ac2e3e8, api error MalformedXML: The XML you provided was not well-formed or did not validate against our published schema.
E0727 15:55:10.448973 348475 append_lifecycle.go:293] followerStats: follower "AWS antispam" EntriesProcessed(): failed to read follow coordination info: Error 1040: Too many connections
```

I see on the MinIO instance that it's doing about 150/s of GETs and 15/s of PUTs, which is totally
reasonable:

```
pim@ctlog-test:~/src/tesseract$ mc admin trace --stats ssd
Duration: 6m9s ▰▱▱
RX Rate:↑ 34 MiB/m
TX Rate:↓ 2.3 GiB/m
RPM    : 10588.1
-------------
Call                      Count          RPM     Avg Time  Min Time  Max Time  Avg TTFB  Max TTFB  Avg Size      Rate /min
s3.GetObject              60558 (92.9%)  9837.2  4.3ms     708µs     48.1ms    3.9ms     47.8ms    ↑144B ↓246K   ↑1.4M ↓2.3G
s3.PutObject              2199 (3.4%)    357.2   5.3ms     2.4ms     32.7ms    5.3ms     32.7ms    ↑92K          ↑32M
s3.DeleteMultipleObjects  1212 (1.9%)    196.9   877µs     290µs     41.1ms    850µs     41.1ms    ↑230B ↓369B   ↑44K ↓71K
s3.ListObjectsV2          1212 (1.9%)    196.9   18.4ms    999µs     52.8ms    18.3ms    52.7ms    ↑131B ↓261B   ↑25K ↓50K
```

Another nice way to see what makes it through is this oneliner, which reads the `checkpoint` every
second, and once it changes, shows the delta in seconds and how many certs were written:

```
pim@ctlog-test:~/src/tesseract$ T=0; O=0; while :; do \
  N=$(curl -sS http://tesseract-test.minio-ssd.lab.ipng.ch:9000/checkpoint | grep -E '^[0-9]+$'); \
  if [ "$N" -eq "$O" ]; then \
    echo -n .; \
  else \
    echo " $T seconds $((N-O)) certs"; O=$N; T=0; echo -n $N\ ;
  fi; \
  T=$((T+1)); sleep 1; done
1012905 .... 5 seconds 2081 certs
1014986 .... 5 seconds 2126 certs
1017112 .... 5 seconds 1913 certs
1019025 .... 5 seconds 2588 certs
1021613 .... 5 seconds 2591 certs
1024204 .... 5 seconds 2197 certs
```

So I can see that the checkpoint is refreshed every 5 seconds and between 1913 and 2591 certs are
written each time. And indeed, at 400/s there are no errors or warnings at all. At this write rate,
TesseraCT is using about 2.9 CPUs/s, with MariaDB using 0.3 CPUs/s, but the hammer is using 6.0
CPUs/s. Overall, the machine is perfectly happy, serving for a few hours under this load test.

***Conclusion: a write-rate of 400/s should be safe with S3+MySQL***

### TesseraCT: POSIX

I have been playing with this idea of having a reliable read-path by making the S3 cluster
redundant, or by replicating the S3 bucket. But Al asks: why not use our experimental POSIX backend?
We discuss two very important benefits, but also two drawbacks:

* On the plus side:
  1. There is no need for S3 storage, reading and writing to a local ZFS raidz2 pool instead.
  1. There is no need for MySQL, as the POSIX implementation can use a local badger instance,
     also on the local filesystem.
* On the minus side:
  1. There is a SPOF in the read-path, as the single VM must handle both. The write-path always
     has a SPOF on the TesseraCT VM.
  1. Local storage is more expensive than S3 storage, and can be used only for the purposes of
     one application (and at best, shared with other VMs on the same hypervisor).

Come to think of it, this is maybe not such a bad tradeoff. I do kind of like having a single-VM
with a single-binary and no other moving parts. It greatly simplifies the architecture, and for the
read-path I can (and will) still use multiple upstream NGINX machines in IPng's network.

I consider myself nerd-sniped, and take a look at the POSIX variant. I have a few SAS3 solid state
drives (NetAPP part number X447_S1633800AMD), which I plug into the `ctlog-test` machine.

```
pim@ctlog-test:~$ sudo zpool create -o ashift=12 -o autotrim=on ssd-vol0 mirror \
    /dev/disk/by-id/wwn-0x5002538a0???????
pim@ctlog-test:~$ sudo zfs create ssd-vol0/tesseract-test
pim@ctlog-test:~$ sudo chown pim:pim /ssd-vol0/tesseract-test
pim@ctlog-test:~/src/tesseract$ go run ./cmd/experimental/posix --http_endpoint='[::]:6962' \
    --origin=ctlog-test.lab.ipng.ch/test-ecdsa \
    --private_key=/tmp/private_key.pem \
    --storage_dir=/ssd-vol0/tesseract-test \
    --roots_pem_file=internal/hammer/testdata/test_root_ca_cert.pem
badger 2025/07/27 16:29:15 INFO: All 0 tables opened in 0s
badger 2025/07/27 16:29:15 INFO: Discard stats nextEmptySlot: 0
badger 2025/07/27 16:29:15 INFO: Set nextTxnTs to 0
I0727 16:29:15.032845 363156 files.go:502] Initializing directory for POSIX log at "/ssd-vol0/tesseract-test" (this should only happen ONCE per log!)
I0727 16:29:15.034101 363156 main.go:97] **** CT HTTP Server Starting ****

pim@ctlog-test:~/src/tesseract$ cat /ssd-vol0/tesseract-test/checkpoint
ctlog-test.lab.ipng.ch/test-ecdsa
0
47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=

— ctlog-test.lab.ipng.ch/test-ecdsa L+IHdQAAAZhMSgC8BAMARzBFAiBjT5zdkniKlryqlUlx/gLHOtVK26zuWwrc4BlyTVzCWgIhAJ0GIrlrP7YGzRaHjzdB5tnS5rpP3LeOsPbpLateaiFc
```

Alright, I can see the log started and created an empty checkpoint file. Nice!

Before I can loadtest it, I will need to make the read-path visible. The `hammer` can read a
checkpoint from local `file:///` prefixes, but I'll have to serve them over the network eventually
anyway, so I create the following NGINX config for it:

```
server {
        listen 80 default_server backlog=4096;
        listen [::]:80 default_server backlog=4096;
        root /ssd-vol0/tesseract-test/;
        index index.html index.htm index.nginx-debian.html;

        server_name _;

        access_log /var/log/nginx/access.log combined buffer=512k flush=5s;

        location / {
                try_files $uri $uri/ =404;
                tcp_nopush on;
                sendfile on;
                tcp_nodelay on;
                keepalive_timeout 65;
                keepalive_requests 1000;
        }
}
```

Just a couple of small thoughts on this configuration. I'm using buffered access logs, to avoid
excessive disk writes in the read-path. Then, I'm using the kernel's `sendfile()`, which instructs
the kernel to serve the static objects directly, so that NGINX can move on. Further, I'll allow for
a long keepalive in HTTP/1.1, so that future requests can reuse the same TCP connection, and I'll
set the flags `tcp_nodelay` and `tcp_nopush` to just blast the data out without waiting.

Without much ado:

```
pim@ctlog-test:~/src/tesseract$ curl -sS ctlog-test.lab.ipng.ch/checkpoint
ctlog-test.lab.ipng.ch/test-ecdsa
0
47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=

— ctlog-test.lab.ipng.ch/test-ecdsa L+IHdQAAAZhMTfksBAMASDBGAiEAqADLH0P/SRVloF6G1ezlWG3Exf+sTzPIY5u6VjAKLqACIQCkJO2N0dZQuDHvkbnzL8Hd91oyU41bVqfD3vs5EwUouA==
```

#### TesseraCT: Loadtesting POSIX

The loadtesting is roughly the same. I start the `hammer` with the same 500qps of write rate, which
was roughly where the S3+MySQL variant topped out. My checkpoint tracker shows the following:

```
pim@ctlog-test:~/src/tesseract$ T=0; O=0; while :; do \
  N=$(curl -sS http://localhost/checkpoint | grep -E '^[0-9]+$'); \
  if [ "$N" -eq "$O" ]; then \
    echo -n .; \
  else \
    echo " $T seconds $((N-O)) certs"; O=$N; T=0; echo -n $N\ ;
  fi; \
  T=$((T+1)); sleep 1; done
59250 ......... 10 seconds 5244 certs
64494 ......... 10 seconds 5000 certs
69494 ......... 10 seconds 5000 certs
74494 ......... 10 seconds 5000 certs
79494 ......... 10 seconds 5256 certs
84750 ......... 10 seconds 5244 certs
89994 ......... 10 seconds 5256 certs
95250 ......... 10 seconds 5000 certs
100250 ......... 10 seconds 5000 certs
105250 ......... 10 seconds 5000 certs
```

I learn two things. First, the checkpoint interval in this `posix` variant is 10 seconds, compared
to the 5 seconds of the `aws` variant I tested before. I dive into the code, because there doesn't
seem to be a `--checkpoint_interval` flag. In the `tessera` library, I find
`DefaultCheckpointInterval`, which is set to 10 seconds.
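
Changing that is a one-line edit in my local checkout of the library; the constant name is real,
although hard-coding it like this (rather than adding a flag) is just my quick workaround:

```
// In my local checkout of the tessera library: how often a new
// checkpoint is published. Upstream default: 10 * time.Second.
const DefaultCheckpointInterval = 2 * time.Second
```

With that change built in, I restart the `posix` binary: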

```
238250 . 2 seconds 1000 certs
239250 . 2 seconds 1000 certs
240250 . 2 seconds 1000 certs
241250 . 2 seconds 1000 certs
242250 . 2 seconds 1000 certs
243250 . 2 seconds 1000 certs
244250 . 2 seconds 1000 certs
```
{{< image width="30em" float="right" src="/assets/ctlog/ctlog-loadtest2.png" alt="Posix Loadtest 5000qps" >}}
|
||||||
|
|
||||||
|
Very nice! Maybe I can write a few more certs? I restart the `hammer` with 5000/s, which somewhat to my
|
||||||
|
surprise, ends up serving!
|
||||||
|
|
||||||
|

```
642608 . 2 seconds 6155 certs
648763 . 2 seconds 10256 certs
659019 . 2 seconds 9237 certs
668256 . 2 seconds 8800 certs
677056 . 2 seconds 8729 certs
685785 . 2 seconds 8237 certs
694022 . 2 seconds 7487 certs
701509 . 2 seconds 8572 certs
710081 . 2 seconds 7413 certs
```

The throughput is highly variable though, seemingly between 3700/sec and 5100/sec, and I quickly
find out that the `hammer` is completely saturating the CPU on the machine, leaving very little room
for the `posix` TesseraCT to serve. I'm going to need more machines!

So I start a `hammer` loadtester on the two now-idle MinIO servers, and run them at about 6000qps
**each**, for a total of 12000 certs/sec. And my little `posix` binary is keeping up like a champ:

```
2987169 . 2 seconds 23040 certs
3010209 . 2 seconds 23040 certs
3033249 . 2 seconds 21760 certs
3055009 . 2 seconds 21504 certs
3076513 . 2 seconds 23808 certs
3100321 . 2 seconds 22528 certs
```

One thing is reasonably clear: the `posix` TesseraCT is CPU bound, not disk bound. The CPU is now
running at about 18.5 CPUs/s (with 20 cores), which is pretty much all this Dell has to offer. The
NetAPP enterprise solid state drives are not impressed:

```
pim@ctlog-test:~/src/tesseract$ zpool iostat -v ssd-vol0 10 100
                              capacity     operations     bandwidth
pool                        alloc   free   read  write   read  write
--------------------------  -----  -----  -----  -----  -----  -----
ssd-vol0                    11.4G   733G      0  3.13K      0   117M
  mirror-0                  11.4G   733G      0  3.13K      0   117M
    wwn-0x5002538a05302930      -      -      0  1.04K      0  39.1M
    wwn-0x5002538a053069f0      -      -      0  1.06K      0  39.1M
    wwn-0x5002538a06313ed0      -      -      0  1.02K      0  39.1M
--------------------------  -----  -----  -----  -----  -----  -----

pim@ctlog-test:~/src/tesseract$ zpool iostat -l ssd-vol0 10
            capacity     operations     bandwidth    total_wait     disk_wait    syncq_wait    asyncq_wait  scrub   trim
pool        alloc   free   read  write   read  write   read  write   read  write   read  write   read  write   wait   wait
----------  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----
ssd-vol0    14.0G   730G      0  1.48K      0  35.4M      -    2ms      -  535us      -    1us      -    3ms      -   50ms
ssd-vol0    14.0G   730G      0  1.12K      0  23.0M      -    1ms      -  733us      -    2us      -    1ms      -   44ms
ssd-vol0    14.1G   730G      0  1.42K      0  45.3M      -  508us      -  122us      -  914ns      -    2ms      -   41ms
ssd-vol0    14.2G   730G      0    678      0  21.0M      -  863us      -  144us      -    2us      -    2ms      -      -
```

## Results

OK, that kind of seals the deal for me. The write path needs about 250 certs/sec and I'm hammering
now with 12'000 certs/sec, with room to spare. But what about the read path? The cool thing about
the static log is that reads are entirely done by NGINX. The only file that isn't cacheable is
the `checkpoint` file, which gets updated every two seconds (or ten seconds with the default
`tessera` settings).

So I start yet another `hammer` whose job it is to read back from the static filesystem:

```
pim@ctlog-test:~/src/tesseract$ curl localhost/nginx_status; sleep 60; curl localhost/nginx_status
Active connections: 10556
server accepts handled requests
 25302 25302 1492918
Reading: 0 Writing: 1 Waiting: 10555
Active connections: 7791
server accepts handled requests
 25764 25764 1727631
Reading: 0 Writing: 1 Waiting: 7790
```

And I can see that it's keeping up quite nicely. In one minute, it handled (1727631-1492918) or
234713 requests, which is a cool 3911 requests/sec. All these read/write hammers are kind of
saturating the `ctlog-test` machine though:
{{< image width="100%" src="/assets/ctlog/ctlog-loadtest3.png" alt="Posix Loadtest 8000qps write, 4000qps read" >}}
|
||||||
|
|
||||||
|
But after a little bit of fiddling, I can assert my conclusion:
|
||||||
|
|
||||||
|
***Conclusion: a write-rate of 8'000/s alongside a read-rate of 4'000/s should be safe with POSIX***
|
||||||
|
|
||||||
|

## What's Next

I am going to offer such a machine in production together with Antonis Chariton and Jeroen Massar.
I plan to do a few additional things:

* Test Sunlight as well on the same hardware. It would be nice to see a comparison between write
  rates of the two implementations.
* Work with Al Cutter and the Transparency Dev team to close a few small gaps (like the
  `local_signer.go` and some Prometheus monitoring of the `posix` binary).
* Install and launch both under `*.ct.ipng.ch`, which in itself deserves its own report, showing
  how I intend to do log cycling and care/feeding, as well as report on the real production
  experience running these CT Logs.

666
content/articles/2025-08-10-ctlog-2.md
Normal file
@@ -0,0 +1,666 @@

---
date: "2025-08-10T12:07:23Z"
title: 'Certificate Transparency - Part 2 - Sunlight'
---

{{< image width="10em" float="right" src="/assets/ctlog/ctlog-logo-ipng.png" alt="ctlog logo" >}}

# Introduction

There once was a Dutch company called [[DigiNotar](https://en.wikipedia.org/wiki/DigiNotar)]; as the
name suggests, it was a form of _digital notary_, and they were in the business of issuing security
certificates. Unfortunately, in June of 2011, their IT infrastructure was compromised and
subsequently it issued hundreds of fraudulent SSL certificates, some of which were used for
man-in-the-middle attacks on Iranian Gmail users. Not cool.

Google launched a project called **Certificate Transparency**, because it was becoming more common
that the root of trust given to _Certification Authorities_ could no longer be unilaterally trusted.
These attacks showed that the lack of transparency in the way CAs operated was a significant risk to
the Web Public Key Infrastructure. It led to the creation of this ambitious
[[project](https://certificate.transparency.dev/)] to improve security online by bringing
accountability to the system that protects our online services with _SSL_ (Secure Socket Layer)
and _TLS_ (Transport Layer Security).

In 2013, [[RFC 6962](https://datatracker.ietf.org/doc/html/rfc6962)] was published by the IETF. It
describes an experimental protocol for publicly logging the existence of Transport Layer Security
(TLS) certificates as they are issued or observed, in a manner that allows anyone to audit
certificate authority (CA) activity and notice the issuance of suspect certificates as well as to
audit the certificate logs themselves. The intent is that eventually clients would refuse to honor
certificates that do not appear in a log, effectively forcing CAs to add all issued certificates to
the logs.

In a [[previous article]({{< ref 2025-07-26-ctlog-1 >}})], I took a deep dive into an upcoming
open source implementation of Static CT Logs made by Google. There is, however, a very competent
alternative called [[Sunlight](https://sunlight.dev/)], which deserves some attention to get to know
its look and feel, as well as its performance characteristics.

## Sunlight

I start by reading up on the project website, and learn:

> _Sunlight is a [[Certificate Transparency](https://certificate.transparency.dev/)] log implementation
> and monitoring API designed for scalability, ease of operation, and reduced cost. What started as
> the Sunlight API is now the [[Static CT API](https://c2sp.org/static-ct-api)] and is allowed by the
> CT log policies of the major browsers._
>
> _Sunlight was designed by Filippo Valsorda for the needs of the WebPKI community, through the
> feedback of many of its members, and in particular of the Sigsum, Google TrustFabric, and ISRG
> teams. It is partially based on the Go Checksum Database. Sunlight's development was sponsored by
> Let's Encrypt._

I have a chat with Filippo, and address the elephant in the room by asking him which of the two
implementations, TesseraCT or Sunlight, he thinks would be a good fit. One thing he says really
sticks with me: "The community needs _any_ static log operator, so if Google thinks TesseraCT is
ready, by all means use that. The diversity will do us good!".

To find out if one or the other is 'ready' is partly on the software, but importantly also on the
operator. So I carefully take Sunlight out of its cardboard box, and put it onto the same Dell R630s
that I used in my previous tests: two Xeon E5-2640 v4 CPUs for a total of 20 cores and 40 threads,
512GB of DDR4 memory, and a SAS controller. In one machine I place 6 pcs of 1.2TB SAS3 drives (HPE
part number EG1200JEHMC), and in the second machine I place 6 pcs of 1.92TB enterprise SSDs (Samsung
part number P1633N19).

### Sunlight: setup

I download the source from GitHub, which, one of these days, will have an IPv6 address. Building the
tools is easy enough; there are three main tools:
1. ***sunlight***: Which serves the write-path. Certification authorities add their certs here.
1. ***sunlight-keygen***: A helper tool to create the so-called `seed` file (key material) for a
   log.
1. ***skylight***: Which serves the read-path. `/checkpoint` and things like `/tile` and `/issuer`
   are served here in a spec-compliant way.

The YAML configuration file is straightforward, and can define and handle multiple logs in one
instance, which sets it apart from TesseraCT which can only handle one log per instance. There's a
`submissionprefix` which `sunlight` will use to accept writes, and a `monitoringprefix` which
`skylight` will use for reads.

I stumble across a small issue: I haven't created multiple DNS hostnames for the test machine. So I
decide to use a different port for one versus the other. The write path will use TLS on port 1443,
while the read path will use plain HTTP on port 1080. And considering I don't have a certificate for
`*.lab.ipng.ch`, I will use a self-signed one instead:

```
pim@ctlog-test:/etc/sunlight$ openssl genrsa -out ca.key 2048
pim@ctlog-test:/etc/sunlight$ openssl req -new -x509 -days 365 -key ca.key \
    -subj "/C=CH/ST=ZH/L=Bruttisellen/O=IPng Networks GmbH/CN=IPng Root CA" -out ca.crt
pim@ctlog-test:/etc/sunlight$ openssl req -newkey rsa:2048 -nodes -keyout sunlight-key.pem \
    -subj "/C=CH/ST=ZH/L=Bruttisellen/O=IPng Networks GmbH/CN=*.lab.ipng.ch" -out sunlight.csr
pim@ctlog-test:/etc/sunlight# openssl x509 -req -extfile \
    <(printf "subjectAltName=DNS:ctlog-test.lab.ipng.ch,DNS:ctlog-test.lab.ipng.ch") -days 365 \
    -in sunlight.csr -CA ca.crt -CAkey ca.key -CAcreateserial -out sunlight.pem
pim@ctlog-test:/etc/sunlight$ ln -s sunlight.pem skylight.pem
pim@ctlog-test:/etc/sunlight$ ln -s sunlight-key.pem skylight-key.pem
```

This little snippet yields `sunlight.pem` (the certificate) and `sunlight-key.pem` (the private
key), and symlinks them to `skylight.pem` and `skylight-key.pem` for simplicity. With these in hand,
I can start the rest of the show. First I will prepare the SSD storage with a few datasets in
which Sunlight will store its data:

```
pim@ctlog-test:~$ sudo zfs create ssd-vol0/sunlight-test
pim@ctlog-test:~$ sudo zfs create ssd-vol0/sunlight-test/shared
pim@ctlog-test:~$ sudo zfs create ssd-vol0/sunlight-test/logs
pim@ctlog-test:~$ sudo zfs create ssd-vol0/sunlight-test/logs/sunlight-test
pim@ctlog-test:~$ sudo chown -R pim:pim /ssd-vol0/sunlight-test
```

Then I'll create the log's key material:

```
pim@ctlog-test:/etc/sunlight$ sunlight-keygen -f sunlight-test.seed.bin
Log ID: IPngJcHCHWi+s37vfFqpY9ouk+if78wAY2kl/sh3c8E=
ECDSA public key:
-----BEGIN PUBLIC KEY-----
MFkwEwYHKoZIzj0CAQYIKoZIzj0DAQcDQgAE6Hg60YncYt/V69kLmg4LlTO9RmHR
wRllfa2cjURBJIKPpCUbgiiMX/jLQqmfzYrtveUws4SG8eT7+ICoa8xdAQ==
-----END PUBLIC KEY-----
Ed25519 public key:
-----BEGIN PUBLIC KEY-----
0pHg7KptAxmb4o67m9xNM1Ku3YH4bjjXbyIgXn2R2bk=
-----END PUBLIC KEY-----
```

The first block creates key material for the log, and I get a fun surprise: the Log ID starts
precisely with the string IPng... what are the odds of that happening!? I should tell Antonis
about this, it's dope!

As a safety precaution, Sunlight requires the operator to make the `checkpoints.db` by hand, which
I'll also do:

```
pim@ctlog-test:/etc/sunlight$ sqlite3 /ssd-vol0/sunlight-test/shared/checkpoints.db \
    "CREATE TABLE checkpoints (logID BLOB PRIMARY KEY, body TEXT)"
```

And with that, I'm ready to create my first log!

### Sunlight: Setting up S3

When learning about [[Tessera]({{< ref 2025-07-26-ctlog-1 >}})], I already kind of drew the
conclusion that, for our case at IPng at least, running the fully cloud-native version with S3
storage and a MySQL database gave both poorer performance and more operational complexity. But
I find it interesting to compare behavior and performance, so I'll start by creating a Sunlight log
backed by MinIO SSD storage.

I'll first create the bucket and a user account to access it:

```
pim@ctlog-test:~$ export AWS_ACCESS_KEY_ID="<some user>"
pim@ctlog-test:~$ export AWS_SECRET_ACCESS_KEY="<some password>"
pim@ctlog-test:~$ export S3_BUCKET=sunlight-test

pim@ctlog-test:~$ mc mb ssd/${S3_BUCKET}
pim@ctlog-test:~$ cat << EOF > /tmp/minio-access.json
{ "Version": "2012-10-17", "Statement": [ {
    "Effect": "Allow",
    "Action": [ "s3:ListBucket", "s3:PutObject", "s3:GetObject", "s3:DeleteObject" ],
    "Resource": [ "arn:aws:s3:::${S3_BUCKET}/*", "arn:aws:s3:::${S3_BUCKET}" ]
  } ]
}
EOF
pim@ctlog-test:~$ mc admin user add ssd ${AWS_ACCESS_KEY_ID} ${AWS_SECRET_ACCESS_KEY}
pim@ctlog-test:~$ mc admin policy create ssd ${S3_BUCKET}-access /tmp/minio-access.json
pim@ctlog-test:~$ mc admin policy attach ssd ${S3_BUCKET}-access --user ${AWS_ACCESS_KEY_ID}
pim@ctlog-test:~$ mc anonymous set public ssd/${S3_BUCKET}
```

After setting up the S3 environment, all I must do is wire it up to the Sunlight configuration
file:

```
pim@ctlog-test:/etc/sunlight$ cat << EOF > sunlight-s3.yaml
listen:
  - "[::]:1443"
checkpoints: /ssd-vol0/sunlight-test/shared/checkpoints.db
logs:
  - shortname: sunlight-test
    inception: 2025-08-10
    submissionprefix: https://ctlog-test.lab.ipng.ch:1443/
    monitoringprefix: http://sunlight-test.minio-ssd.lab.ipng.ch:9000/
    secret: /etc/sunlight/sunlight-test.seed.bin
    cache: /ssd-vol0/sunlight-test/logs/sunlight-test/cache.db
    s3region: eu-schweiz-1
    s3bucket: sunlight-test
    s3endpoint: http://minio-ssd.lab.ipng.ch:9000/
    roots: /etc/sunlight/roots.pem
    period: 200
    poolsize: 15000
    notafterstart: 2024-01-01T00:00:00Z
    notafterlimit: 2025-01-01T00:00:00Z
EOF
```

The one thing of note here is the `roots:` field, which points at a file containing the Root CA for
the TesseraCT loadtester which I'll be using. In production, Sunlight can grab the approved roots
from the so-called _Common CA Database_ or CCADB. But you can also specify all roots yourself using
the `roots` field, or additional roots on top of the `ccadbroots` field, using the `extraroots`
field. That's a handy trick! You can find more info on the [[CCADB](https://www.ccadb.org/)]
homepage.
I can then start Sunlight just like this:
|
||||||
|
|
||||||
|
```
|
||||||
|
pim@ctlog-test:/etc/sunlight$ sunlight -testcert -c /etc/sunlight/sunlight-s3.yaml {"time":"2025-08-10T13:49:36.091384532+02:00","level":"INFO","source":{"function":"main.main.func1","file":"/home/pim/src/sunlight/cmd/sunlight/sunlig
|
||||||
|
ht.go","line":341},"msg":"debug server listening","addr":{"IP":"127.0.0.1","Port":37477,"Zone":""}}
|
||||||
|
time=2025-08-10T13:49:36.091+02:00 level=INFO msg="debug server listening" addr=127.0.0.1:37477 {"time":"2025-08-10T13:49:36.100471647+02:00","level":"INFO","source":{"function":"main.main","file":"/home/pim/src/sunlight/cmd/sunlight/sunlight.go"
|
||||||
|
,"line":542},"msg":"today is the Inception date, creating log","log":"sunlight-test"} time=2025-08-10T13:49:36.100+02:00 level=INFO msg="today is the Inception date, creating log" log=sunlight-test
|
||||||
|
{"time":"2025-08-10T13:49:36.119529208+02:00","level":"INFO","source":{"function":"filippo.io/sunlight/internal/ctlog.CreateLog","file":"/home/pim/src
|
||||||
|
/sunlight/internal/ctlog/ctlog.go","line":159},"msg":"created log","log":"sunlight-test","timestamp":1754826576111,"logID":"IPngJcHCHWi+s37vfFqpY9ouk+if78wAY2kl/sh3c8E="}
|
||||||
|
time=2025-08-10T13:49:36.119+02:00 level=INFO msg="created log" log=sunlight-test timestamp=1754826576111 logID="IPngJcHCHWi+s37vfFqpY9ouk+if78wAY2kl/sh3c8E="
|
||||||
|
{"time":"2025-08-10T13:49:36.127702166+02:00","level":"WARN","source":{"function":"filippo.io/sunlight/internal/ctlog.LoadLog","file":"/home/pim/src/s
|
||||||
|
unlight/internal/ctlog/ctlog.go","line":296},"msg":"failed to parse previously trusted roots","log":"sunlight-test","roots":""} time=2025-08-10T13:49:36.127+02:00 level=WARN msg="failed to parse previously trusted roots" log=sunlight-test roots=""
|
||||||
|
{"time":"2025-08-10T13:49:36.127766452+02:00","level":"INFO","source":{"function":"filippo.io/sunlight/internal/ctlog.LoadLog","file":"/home/pim/src/sunlight/internal/ctlog/ctlog.go","line":301},"msg":"loaded log","log":"sunlight-test","logID":"IPngJcHCHWi+s37vfFqpY9ouk+if78wAY2kl/sh3c8E=","size":0,
|
||||||
|
"timestamp":1754826576111}
|
||||||
|
time=2025-08-10T13:49:36.127+02:00 level=INFO msg="loaded log" log=sunlight-test logID="IPngJcHCHWi+s37vfFqpY9ouk+if78wAY2kl/sh3c8E=" size=0 timestamp=1754826576111
|
||||||
|
{"time":"2025-08-10T13:49:36.540297532+02:00","level":"INFO","source":{"function":"filippo.io/sunlight/internal/ctlog.(*Log).sequencePool","file":"/home/pim/src/sunlight/internal/ctlog/ctlog.go","line":972},"msg":"sequenced pool","log":"sunlight-test","old_tree_size":0,"entries":0,"start":"2025-08-1
|
||||||
|
0T13:49:36.534500633+02:00","tree_size":0,"tiles":0,"timestamp":1754826576534,"elapsed":5788099}
|
||||||
|
time=2025-08-10T13:49:36.540+02:00 level=INFO msg="sequenced pool" log=sunlight-test old_tree_size=0 entries=0 start=2025-08-10T13:49:36.534+02:00 tree_size=0 tiles=0 timestamp=1754826576534 elapsed=5.788099ms
|
||||||
|
...
|
||||||
|
```

Although that looks pretty good, I see that something is not quite right. When Sunlight comes up, it
shares a few links with me, in the `get-roots` and `json` fields on the homepage, but neither of
them work:

```
pim@ctlog-test:~$ curl -k https://ctlog-test.lab.ipng.ch:1443/ct/v1/get-roots
404 page not found
pim@ctlog-test:~$ curl -k https://ctlog-test.lab.ipng.ch:1443/log.v3.json
404 page not found
```

I'm starting to think that using a non-standard listen port won't work, or more precisely, adding
a port in the `monitoringprefix` won't work. I notice that the logname is called
`ctlog-test.lab.ipng.ch:1443`, which I don't think is supposed to have a port number in it. So instead,
I make Sunlight `listen` on port 443, omit the port in the `submissionprefix`, and give it and
its companion Skylight the needed privileges to bind the privileged port like so:

```
pim@ctlog-test:~$ sudo setcap 'cap_net_bind_service=+ep' /usr/local/bin/sunlight
pim@ctlog-test:~$ sudo setcap 'cap_net_bind_service=+ep' /usr/local/bin/skylight
pim@ctlog-test:~$ sunlight -testcert -c /etc/sunlight/sunlight-s3.yaml
```

{{< image width="60%" src="/assets/ctlog/sunlight-test-s3.png" alt="Sunlight testlog / S3" >}}

And with that, Sunlight reports for duty and the links work. Hoi!

#### Sunlight: Loadtesting S3

I have some good experience loadtesting from the [[TesseraCT article]({{< ref 2025-07-26-ctlog-1
>}})]. One important difference is that Sunlight wants to use SSL for the submission and monitoring
paths, and I've created a snakeoil self-signed cert. CT Hammer does not accept that out of the box,
so I need to make a tiny change to the Hammer:

```
pim@ctlog-test:~/src/tesseract$ git diff
diff --git a/internal/hammer/hammer.go b/internal/hammer/hammer.go
index 3828fbd..1dfd895 100644
--- a/internal/hammer/hammer.go
+++ b/internal/hammer/hammer.go
@@ -104,6 +104,9 @@ func main() {
 			MaxIdleConns:        *numWriters + *numReadersFull + *numReadersRandom,
 			MaxIdleConnsPerHost: *numWriters + *numReadersFull + *numReadersRandom,
 			DisableKeepAlives:   false,
+			TLSClientConfig: &tls.Config{
+				InsecureSkipVerify: true,
+			},
 		},
 		Timeout: *httpTimeout,
 	}
```

With that small bit of insecurity out of the way (the diff also needs a `crypto/tls` import, if the
file doesn't have one already), Sunlight makes it otherwise pretty easy for me to construct the CT
Hammer commandline:

```
pim@ctlog-test:~/src/tesseract$ go run ./internal/hammer --origin=ctlog-test.lab.ipng.ch \
  --log_public_key=MFkwEwYHKoZIzj0CAQYIKoZIzj0DAQcDQgAE6Hg60YncYt/V69kLmg4LlTO9RmHRwRllfa2cjURBJIKPpCUbgiiMX/jLQqmfzYrtveUws4SG8eT7+ICoa8xdAQ== \
  --log_url=http://sunlight-test.minio-ssd.lab.ipng.ch:9000/ --write_log_url=https://ctlog-test.lab.ipng.ch/ \
  --max_read_ops=0 --num_writers=5000 --max_write_ops=100

pim@ctlog-test:/etc/sunlight$ T=0; O=0; while :; do \
    N=$(curl -sS http://sunlight-test.minio-ssd.lab.ipng.ch:9000/checkpoint | grep -E '^[0-9]+$'); \
    if [ "$N" -eq "$O" ]; then \
      echo -n .; \
    else \
      echo " $T seconds $((N-O)) certs"; O=$N; T=0; echo -n $N\ ; \
    fi; \
    T=$((T+1)); sleep 1; done
24915 1 seconds 96 certs
25011 1 seconds 92 certs
25103 1 seconds 93 certs
25196 1 seconds 87 certs
```

On the first commandline I'll start the loadtest at 100 writes/sec with the standard duplication
probability of 10%, which allows me to test Sunlight's ability to avoid writing duplicates. This
means I should see the tree grow at about 90/s on average. Check. I raise the write-load to
500/s:

```
39421 1 seconds 443 certs
39864 1 seconds 442 certs
40306 1 seconds 441 certs
40747 1 seconds 447 certs
41194 1 seconds 448 certs
```

.. and to 1'000/s:
```
57941 1 seconds 945 certs
58886 1 seconds 970 certs
59856 1 seconds 948 certs
60804 1 seconds 965 certs
61769 1 seconds 955 certs
```

After a few minutes I see a few errors from CT Hammer:
```
W0810 14:55:29.660710 1398779 analysis.go:134] (1 x) failed to create request: failed to write leaf: Post "https://ctlog-test.lab.ipng.ch/ct/v1/add-chain": EOF
W0810 14:55:30.496603 1398779 analysis.go:124] (1 x) failed to create request: write leaf was not OK. Status code: 500. Body: "failed to read body: read tcp 127.0.1.1:443->127.0.0.1:44908: i/o timeout\n"
```

I raise the Hammer load to 5'000/sec (which means 4'500/s unique certs and 500 duplicates), and find
the committed writes/sec maxing out at around 4'200/s:
```
879637 1 seconds 4213 certs
883850 1 seconds 4207 certs
888057 1 seconds 4211 certs
892268 1 seconds 4249 certs
896517 1 seconds 4216 certs
```

The error rate is a steady stream of errors like the one before:
```
W0810 14:59:48.499274 1398779 analysis.go:124] (1 x) failed to create request: failed to write leaf: Post "https://ctlog-test.lab.ipng.ch/ct/v1/add-chain": EOF
W0810 14:59:49.034194 1398779 analysis.go:124] (1 x) failed to create request: failed to write leaf: Post "https://ctlog-test.lab.ipng.ch/ct/v1/add-chain": EOF
W0810 15:00:05.496459 1398779 analysis.go:124] (1 x) failed to create request: failed to write leaf: Post "https://ctlog-test.lab.ipng.ch/ct/v1/add-chain": EOF
W0810 15:00:07.187181 1398779 analysis.go:124] (1 x) failed to create request: failed to write leaf: Post "https://ctlog-test.lab.ipng.ch/ct/v1/add-chain": EOF
```

At this load of 4'200/s, MinIO is not very impressed. Remember that in the [[other article]({{< ref
2025-07-26-ctlog-1 >}})] I loadtested it to about 7'500 ops/sec, while the statistics below show
about 50 ops/sec (2'800/min). I conclude that MinIO is, in fact, bored of this whole activity:

```
pim@ctlog-test:/etc/sunlight$ mc admin trace --stats ssd
Duration: 18m58s ▱▱▱
RX Rate:↑ 115 MiB/m
TX Rate:↓ 2.4 MiB/m
RPM    : 2821.3
-------------
Call          Count           RPM     Avg Time  Min Time  Max Time  Avg TTFB  Max TTFB  Avg Size    Rate /min    Errors
s3.PutObject  37602 (70.3%)   1982.2  6.2ms     785µs     86.7ms    6.1ms     86.6ms    ↑59K ↓0B    ↑115M ↓1.4K  0
s3.GetObject  15918 (29.7%)   839.1   996µs     670µs     51.3ms    912µs     51.2ms    ↑46B ↓3.0K  ↑38K ↓2.4M   0
```

Sunlight still keeps its certificate cache on local disk. At a rate of 4'200/s, the ZFS pool has a
write rate of about 105MB/s with about 877 ZFS writes per second.

```
pim@ctlog-test:/etc/sunlight$ zpool iostat -v ssd-vol0 10
                              capacity     operations     bandwidth
pool                        alloc   free   read  write   read  write
--------------------------  -----  -----  -----  -----  -----  -----
ssd-vol0                    59.1G   685G      0  2.55K      0   312M
  mirror-0                  59.1G   685G      0  2.55K      0   312M
    wwn-0x5002538a05302930      -      -      0    877      0   104M
    wwn-0x5002538a053069f0      -      -      0    871      0   104M
    wwn-0x5002538a06313ed0      -      -      0    866      0   104M
--------------------------  -----  -----  -----  -----  -----  -----

pim@ctlog-test:/etc/sunlight$ zpool iostat -l ssd-vol0 10
              capacity     operations     bandwidth    total_wait     disk_wait    syncq_wait    asyncq_wait  scrub   trim
pool        alloc   free   read  write   read  write   read  write   read  write   read  write   read  write   wait   wait
----------  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----
ssd-vol0    59.0G   685G      0  3.19K      0   388M      -    8ms      -  628us      -  990us      -   10ms      -   88ms
ssd-vol0    59.2G   685G      0  2.49K      0   296M      -    5ms      -  557us      -  163us      -    8ms      -      -
ssd-vol0    59.6G   684G      0  2.04K      0   253M      -    2ms      -  704us      -  296us      -    4ms      -      -
ssd-vol0    58.8G   685G      0  2.72K      0   328M      -    6ms      -  783us      -  701us      -    9ms      -   68ms
```

A few interesting observations:
* Sunlight still uses a local sqlite3 database for the certificate tracking, which is more
  efficient than MariaDB/MySQL, let alone AWS RDS, so it has one less runtime dependency.
* The write rate to ZFS is significantly higher with Sunlight than with TesseraCT (about 8:1). This is
  likely explained by the sqlite3 database living on ZFS here, while TesseraCT uses MariaDB
  running on a different filesystem.
* The MinIO usage is a lot lighter. As I reduce the load to 1'000/s, as was the case in the TesseraCT
  test, I can see the ratio of Get:Put was 93:4 in TesseraCT, while it's 70:30 here. TesseraCT was
  also consuming more IOPS, running at about 10.5k requests/minute, while Sunlight is
  significantly calmer at 2.8k requests/minute (almost 4x less!)
* The burst capacity of Sunlight is a fair bit higher than TesseraCT's, likely due to its more
  efficient use of S3 backends.

***Conclusion***: Sunlight S3+MinIO can handle 1'000/s reliably, and can spike to 4'200/s with only
a few errors.

#### Sunlight: Loadtesting POSIX

When I took a closer look at TesseraCT a few weeks ago, it struck me that while a cloud-native
setup with S3 storage allows for a cool way to scale storage and add read-path redundancy, by
creating synchronously replicated buckets, it does come at significant operational overhead and
complexity. My main concern is the number of different moving parts, and Sunlight really has one
very appealing property: it can run entirely on one machine without the need for any other moving
parts - even the SQL database is linked in. That's pretty slick.

```
pim@ctlog-test:/etc/sunlight$ cat << EOF > sunlight.yaml
listen:
  - "[::]:443"
checkpoints: /ssd-vol0/sunlight-test/shared/checkpoints.db
logs:
  - shortname: sunlight-test
    inception: 2025-08-10
    submissionprefix: https://ctlog-test.lab.ipng.ch/
    monitoringprefix: https://ctlog-test.lab.ipng.ch:1443/
    secret: /etc/sunlight/sunlight-test.seed.bin
    cache: /ssd-vol0/sunlight-test/logs/sunlight-test/cache.db
    localdirectory: /ssd-vol0/sunlight-test/logs/sunlight-test/data
    roots: /etc/sunlight/roots.pem
    period: 200
    poolsize: 15000
    notafterstart: 2024-01-01T00:00:00Z
    notafterlimit: 2025-01-01T00:00:00Z
EOF
pim@ctlog-test:/etc/sunlight$ sunlight -testcert -c sunlight.yaml
pim@ctlog-test:/etc/sunlight$ skylight -testcert -c skylight.yaml
```

First I'll start a hello-world loadtest at 100/s and take a look at the number of leaves in the
checkpoint after a few minutes. I would expect three minutes worth at 100/s with a duplicate
probability of 10% (180 x 100 x 0.9) to yield about 16'200 unique certificates in total.

```
pim@ctlog-test:/etc/sunlight$ while :; do curl -ksS https://ctlog-test.lab.ipng.ch:1443/checkpoint | grep -E '^[0-9]+$'; sleep 60; done
10086
15518
20920
26339
```

And would you look at that? `(26339-10086)` is right on the dot! One thing that I find particularly
cool about Sunlight is its baked-in Prometheus metrics. This allows me some pretty solid insight into
its performance. Take a look for example at the write path latency tail (99th percentile):

```
pim@ctlog-test:/etc/sunlight$ curl -ksS https://ctlog-test.lab.ipng.ch/metrics | egrep 'seconds.*quantile=\"0.99\"'
sunlight_addchain_wait_seconds{log="sunlight-test",quantile="0.99"} 0.207285993
sunlight_cache_get_duration_seconds{log="sunlight-test",quantile="0.99"} 0.001409719
sunlight_cache_put_duration_seconds{log="sunlight-test",quantile="0.99"} 0.002227985
sunlight_fs_op_duration_seconds{log="sunlight-test",method="discard",quantile="0.99"} 0.000224969
sunlight_fs_op_duration_seconds{log="sunlight-test",method="fetch",quantile="0.99"} 8.3003e-05
sunlight_fs_op_duration_seconds{log="sunlight-test",method="upload",quantile="0.99"} 0.042118751
sunlight_http_request_duration_seconds{endpoint="add-chain",log="sunlight-test",quantile="0.99"} 0.2259605
sunlight_sequencing_duration_seconds{log="sunlight-test",quantile="0.99"} 0.108987393
sunlight_sqlite_update_duration_seconds{quantile="0.99"} 0.014922489
```

I'm seeing here that at a load of 100/s (with 90/s of unique certificates), the 99th percentile
add-chain latency is 207ms, which makes sense because the `period` configuration field is set to
200ms. The filesystem operations (discard, fetch, upload) are _de minimis_ and the sequencing
duration is at 109ms. Excellent!
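
Since Sunlight exposes these on a plain `/metrics` endpoint, wiring it into an existing Prometheus
is a one-stanza affair. A minimal sketch (the job name and scrape interval are my own choosing, and
`insecure_skip_verify` is only needed because of the `-testcert` snakeoil certificate):

```
scrape_configs:
  - job_name: 'sunlight'
    scheme: https
    scrape_interval: 15s
    tls_config:
      insecure_skip_verify: true   # only for the self-signed -testcert setup
    static_configs:
      - targets: [ 'ctlog-test.lab.ipng.ch:443' ]
```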

But can this thing go really fast? I do remember that the CT Hammer uses more CPU than TesseraCT,
and I saw above, when running my 5'000/s loadtest, that that's about all the hammer can take on
a single Dell R630. So, as I did with the TesseraCT test, I'll use the MinIO SSD and MinIO Disk
machines to generate the load.

I boot them, so that I can hammer, or shall I say jackhammer, away:

```
pim@ctlog-test:~/src/tesseract$ go run ./internal/hammer --origin=ctlog-test.lab.ipng.ch \
  --log_public_key=MFkwEwYHKoZIzj0CAQYIKoZIzj0DAQcDQgAE6Hg60YncYt/V69kLmg4LlTO9RmHRwRllfa2cjURBJIKPpCUbgiiMX/jLQqmfzYrtveUws4SG8eT7+ICoa8xdAQ== \
  --log_url=https://ctlog-test.lab.ipng.ch:1443/ --write_log_url=https://ctlog-test.lab.ipng.ch/ \
  --max_read_ops=0 --num_writers=5000 --max_write_ops=5000

pim@minio-ssd:~/src/tesseract$ go run ./internal/hammer --origin=ctlog-test.lab.ipng.ch \
  --log_public_key=MFkwEwYHKoZIzj0CAQYIKoZIzj0DAQcDQgAE6Hg60YncYt/V69kLmg4LlTO9RmHRwRllfa2cjURBJIKPpCUbgiiMX/jLQqmfzYrtveUws4SG8eT7+ICoa8xdAQ== \
  --log_url=https://ctlog-test.lab.ipng.ch:1443/ --write_log_url=https://ctlog-test.lab.ipng.ch/ \
  --max_read_ops=0 --num_writers=5000 --max_write_ops=5000 --serial_offset=1000000

pim@minio-disk:~/src/tesseract$ go run ./internal/hammer --origin=ctlog-test.lab.ipng.ch \
  --log_public_key=MFkwEwYHKoZIzj0CAQYIKoZIzj0DAQcDQgAE6Hg60YncYt/V69kLmg4LlTO9RmHRwRllfa2cjURBJIKPpCUbgiiMX/jLQqmfzYrtveUws4SG8eT7+ICoa8xdAQ== \
  --log_url=https://ctlog-test.lab.ipng.ch:1443/ --write_log_url=https://ctlog-test.lab.ipng.ch/ \
  --max_read_ops=0 --num_writers=5000 --max_write_ops=5000 --serial_offset=2000000
```

This will generate 15'000/s of load, which I note does bring Sunlight to its knees, although it does
remain stable (yaay!) with a somewhat more bursty checkpoint interval:

```
5504780 1 seconds 4039 certs
5508819 1 seconds 10000 certs
5518819 . 2 seconds 7976 certs
5526795 1 seconds 2022 certs
5528817 1 seconds 9782 certs
5538599 1 seconds 217 certs
5538816 1 seconds 3114 certs
5541930 1 seconds 6818 certs
```

So what I do instead is a somewhat simpler measurement of certificates per minute:
```
pim@ctlog-test:/etc/sunlight$ while :; do curl -ksS https://ctlog-test.lab.ipng.ch:1443/checkpoint | grep -E '^[0-9]+$'; sleep 60; done
6008831
6296255
6576712
```

This rate boils down to `(6576712-6008831)/120` or 4'700/s of written certs, which at a duplication
ratio of 10% means approximately 5'200/s of total accepted certs. At this rate, Sunlight is consuming
about 10.3 CPUs, while Skylight is at 0.1 CPUs and the CT Hammer is at 11.1 CPUs. Given the 40
threads on this machine, I am not saturating the CPU, but I'm curious, as this rate is significantly
lower than TesseraCT's. I briefly turn off the hammer on `ctlog-test` to allow Sunlight to monopolize
the entire machine. The CPU use does reduce to about 9.3 CPUs, suggesting that indeed, the bottleneck
is not strictly CPU:

{{< image width="90%" src="/assets/ctlog/btop-sunlight.png" alt="Sunlight btop" >}}

When using only two CT Hammers (on `minio-ssd.lab.ipng.ch` and `minio-disk.lab.ipng.ch`), the CPU
use on the `ctlog-test.lab.ipng.ch` machine definitely goes down (CT Hammer is kind of a CPU hog...),
but the resulting throughput doesn't change that much:

```
pim@ctlog-test:/etc/sunlight$ while :; do curl -ksS https://ctlog-test.lab.ipng.ch:1443/checkpoint | grep -E '^[0-9]+$'; sleep 60; done
7985648
8302421
8528122
8772758
```

What I find particularly interesting is that the total rate stays at approximately 4'400/s
(`(8772758-7985648)/180`), while the checkpoint latency varies considerably. Using the baked-in
Prometheus metrics I mentioned earlier, I can take a look at the latency tail while keeping it
under this load of ~10'000/sec:

```
pim@ctlog-test:/etc/sunlight$ curl -ksS https://ctlog-test.lab.ipng.ch/metrics | egrep 'seconds.*quantile=\"0.99\"'
sunlight_addchain_wait_seconds{log="sunlight-test",quantile="0.99"} 1.889983538
sunlight_cache_get_duration_seconds{log="sunlight-test",quantile="0.99"} 0.000148819
sunlight_cache_put_duration_seconds{log="sunlight-test",quantile="0.99"} 0.837981208
sunlight_fs_op_duration_seconds{log="sunlight-test",method="discard",quantile="0.99"} 0.000433179
sunlight_fs_op_duration_seconds{log="sunlight-test",method="fetch",quantile="0.99"} NaN
sunlight_fs_op_duration_seconds{log="sunlight-test",method="upload",quantile="0.99"} 0.067494558
sunlight_http_request_duration_seconds{endpoint="add-chain",log="sunlight-test",quantile="0.99"} 1.86894666
sunlight_sequencing_duration_seconds{log="sunlight-test",quantile="0.99"} 1.111400223
sunlight_sqlite_update_duration_seconds{quantile="0.99"} 0.016859223
```

Comparing the throughput at 4'400/s with that first test of 100/s, I expect and can confirm a
significant increase in all of these metrics. The 99th percentile addchain is now 1889ms (up from
207ms) and the sequencing duration is now 1111ms (up from 109ms).

#### Sunlight: Effect of period

I fiddle a little bit with Sunlight's configuration file, notably the `period` and `poolsize`.
First I set `period:2000` and `poolsize:15000`, which yields pretty much the same throughput:

```
pim@ctlog-test:/etc/sunlight$ while :; do curl -ksS https://ctlog-test.lab.ipng.ch:1443/checkpoint | grep -E '^[0-9]+$'; sleep 60; done
701850
1001424
1295508
1575789
```

With a generated load of 10'000/sec with a 10% duplication rate, I am offering roughly 9'000/sec of
unique certificates, and I'm seeing `(1575789 - 701850)/180` or about 4'855/sec come through. Just
for reference, at this rate and with `period:2000`, the latency tail looks like this:

```
pim@ctlog-test:/etc/sunlight$ curl -ksS https://ctlog-test.lab.ipng.ch/metrics | egrep 'seconds.*quantile=\"0.99\"'
sunlight_addchain_wait_seconds{log="sunlight-test",quantile="0.99"} 3.203510079
sunlight_cache_get_duration_seconds{log="sunlight-test",quantile="0.99"} 0.000108613
sunlight_cache_put_duration_seconds{log="sunlight-test",quantile="0.99"} 0.950453973
sunlight_fs_op_duration_seconds{log="sunlight-test",method="discard",quantile="0.99"} 0.00046192
sunlight_fs_op_duration_seconds{log="sunlight-test",method="fetch",quantile="0.99"} NaN
sunlight_fs_op_duration_seconds{log="sunlight-test",method="upload",quantile="0.99"} 0.049007693
sunlight_http_request_duration_seconds{endpoint="add-chain",log="sunlight-test",quantile="0.99"} 3.570709413
sunlight_sequencing_duration_seconds{log="sunlight-test",quantile="0.99"} 1.5968609040000001
sunlight_sqlite_update_duration_seconds{quantile="0.99"} 0.010847308
```

Then I also set `period:100` and `poolsize:15000`, which does improve things a bit:

```
pim@ctlog-test:/etc/sunlight$ while :; do curl -ksS https://ctlog-test.lab.ipng.ch:1443/checkpoint | grep -E '^[0-9]+$'; sleep 60; done
560654
950524
1324645
1720362
```

With the same generated load of 10'000/sec with a 10% duplication rate, I am still offering roughly
9'000/sec of unique certificates, and I'm seeing `(1720362 - 560654)/180` or about 6'440/sec come
through, which is a fair bit better, at the expense of more disk activity. At this rate and with
`period:100`, the latency tail looks like this:

```
pim@ctlog-test:/etc/sunlight$ curl -ksS https://ctlog-test.lab.ipng.ch/metrics | egrep 'seconds.*quantile=\"0.99\"'
sunlight_addchain_wait_seconds{log="sunlight-test",quantile="0.99"} 1.616046445
sunlight_cache_get_duration_seconds{log="sunlight-test",quantile="0.99"} 7.5123e-05
sunlight_cache_put_duration_seconds{log="sunlight-test",quantile="0.99"} 0.534935803
sunlight_fs_op_duration_seconds{log="sunlight-test",method="discard",quantile="0.99"} 0.000377273
sunlight_fs_op_duration_seconds{log="sunlight-test",method="fetch",quantile="0.99"} 4.8893e-05
sunlight_fs_op_duration_seconds{log="sunlight-test",method="upload",quantile="0.99"} 0.054685991
sunlight_http_request_duration_seconds{endpoint="add-chain",log="sunlight-test",quantile="0.99"} 1.946445877
sunlight_sequencing_duration_seconds{log="sunlight-test",quantile="0.99"} 0.980602185
sunlight_sqlite_update_duration_seconds{quantile="0.99"} 0.018385831
```

***Conclusion***: Sunlight on POSIX can reliably handle 4'400/s (with a duplicate rate of 10%) on
this setup.

## Wrapup - Observations

From an operator's point of view, TesseraCT and Sunlight handle quite differently. Both are easily up
to the task of serving the current write-load (which is about 250/s).

* ***S3***: When using the S3 backend, TesseraCT became quite unhappy above 800/s, while Sunlight
  went all the way up to 4'200/s and sent significantly fewer requests to MinIO (about 4x fewer),
  while showing good telemetry on the use of S3 backends. In this mode, TesseraCT uses MySQL (in
  my case, MariaDB), which was not on the ZFS pool, but on the boot-disk.

* ***POSIX***: When using a normal filesystem, Sunlight seems to peak at 4'800/s, while TesseraCT
  went all the way to 12'000/s. When doing so, disk IO was quite similar between the two
  solutions, taking into account that TesseraCT runs BadgerDB while Sunlight uses sqlite3,
  both on their respective ZFS pool.

***Notable***: Sunlight POSIX and S3 performance is roughly identical (both handle about
5'000/sec), while TesseraCT POSIX performance (12'000/s) is significantly better than its S3
performance (800/s). Some other observations:

* Sunlight has a very opinionated configuration, and can run multiple logs with one configuration
  file and one binary. Its configuration was a bit constraining though, as I could not manage to
  use `monitoringprefix` or `submissionprefix` with an `http://` prefix - a likely security
  precaution - and using ports in those prefixes (other than the standard 443) also rendered
  Sunlight and Skylight unusable for me.

* Skylight only serves from a local directory; it does not have support for S3. For operators using S3,
  an alternative could be to use NGINX in the serving path, similar to TesseraCT. Skylight does have
  a few things to teach me though, notably on proper compression, content-type and other headers.

* TesseraCT does not have a configuration file, and will run exactly one log per binary
  instance. It uses flags to construct the environment, and is much more forgiving of creative
  `origin` (log name) and submission- and monitoring-URLs. It's happy to use regular `http://`
  for both, which comes in handy in those architectures where the system is serving behind a
  reverse proxy.

* The TesseraCT Hammer tool, then again, does not like self-signed certificates, and needs
  to be told to skip certificate validation when loadtesting a Sunlight instance that is
  running with the `-testcert` commandline flag.

I consider all of these small and mostly cosmetic issues, because in production there will be proper
TLS certificates issued, and normal https:// serving ports with unique monitoring and submission
hostnames.

## What's Next

Together with Antonis Chariton and Jeroen Massar, IPng Networks will be offering both TesseraCT and
Sunlight logs on the public internet. One final step is to productionize both logs, and file the
paperwork for them in the community. Although at this point our Sunlight log is already running,
I'll wait a few weeks to gather any additional intel, before wrapping up in a final article.

content/articles/2025-08-24-ctlog-3.md (new file)
@@ -0,0 +1,515 @@
---
date: "2025-08-24T12:07:23Z"
title: 'Certificate Transparency - Part 3 - Operations'
---

{{< image width="10em" float="right" src="/assets/ctlog/ctlog-logo-ipng.png" alt="ctlog logo" >}}

# Introduction

There once was a Dutch company called [[DigiNotar](https://en.wikipedia.org/wiki/DigiNotar)]; as the
name suggests, it was a form of _digital notary_, and they were in the business of issuing security
certificates. Unfortunately, in June of 2011, their IT infrastructure was compromised, and
subsequently it issued hundreds of fraudulent SSL certificates, some of which were used for
man-in-the-middle attacks on Iranian Gmail users. Not cool.

Google launched a project called **Certificate Transparency**, because it was becoming more common
that the root of trust given to _Certification Authorities_ could no longer be unilaterally trusted.
These attacks showed that the lack of transparency in the way CAs operated was a significant risk to
the Web Public Key Infrastructure. It led to the creation of this ambitious
[[project](https://certificate.transparency.dev/)] to improve security online by bringing
accountability to the system that protects our online services with _SSL_ (Secure Socket Layer)
and _TLS_ (Transport Layer Security).

In 2013, [[RFC 6962](https://datatracker.ietf.org/doc/html/rfc6962)] was published by the IETF. It
describes an experimental protocol for publicly logging the existence of Transport Layer Security
(TLS) certificates as they are issued or observed, in a manner that allows anyone to audit
certificate authority (CA) activity and notice the issuance of suspect certificates, as well as to
audit the certificate logs themselves. The intent is that eventually clients would refuse to honor
certificates that do not appear in a log, effectively forcing CAs to add all issued certificates to
the logs.

In the first two articles of this series, I explored [[TesseraCT]({{< ref 2025-07-26-ctlog-1 >}})]
and [[Sunlight]({{< ref 2025-08-10-ctlog-2 >}})], two open source implementations of the Static CT
protocol. In this final article, I'll share the details on how I created the environment and
production instances for four logs that IPng will be providing: Rennet and Lipase are two
ingredients to make cheese, and will serve as our staging/testing logs. Gouda and Halloumi are two
delicious cheeses that pay homage to our heritage, Jeroen and I being Dutch and Antonis being
Greek.

## Hardware

At IPng Networks, all hypervisors are from the same brand: Dell's Poweredge line. In this project,
Jeroen is also contributing a server, and it so happens that he also has a Dell Poweredge. We're
both running Debian on our hypervisors, so we install a fresh VM with Debian 13.0, codenamed
_Trixie_, and give the machine 16GB of memory, 8 vCPUs and a 16GB boot disk. Boot disks are placed on
the hypervisor's ZFS pool, and a blockdevice snapshot is taken every 6hrs. This allows the boot disk
to be rolled back to a last known good point in case an upgrade goes south. If you haven't seen it
yet, take a look at [[zrepl](https://zrepl.github.io/)], a one-stop, integrated solution for ZFS
replication. This tool is incredibly powerful, and can do snapshot management and sourcing / sinking
to remote hosts, of course using incremental snapshots as they are native to ZFS.

Once the machine is up, we pass through four enterprise-class storage drives, in our case 3.84TB Kioxia
NVMe, model _KXD51RUE3T84_: PCIe 3.1 x4 lanes, NVMe 1.2.1 specification, with good
durability and reasonable (albeit not stellar) read throughput of ~2700MB/s, write throughput of
~800MB/s, with 240 kIOPS random read and 21 kIOPS random write. My attention is also drawn to a
specific specification point: these drives allow for 1.0 DWPD, which stands for _Drive Writes Per
Day_. In other words, they are not going to run themselves off a cliff after a few petabytes of
writes, and I am reminded that a CT Log wants to write to disk a lot during normal operation.

The point of these logs is to **keep them safe**, and the most important aspects of the compute
environment are the use of ECC memory to detect single bit errors, and dependable storage. Toshiba
makes a great product.

```
ctlog1:~$ sudo zpool create -f -o ashift=12 -o autotrim=on -O atime=off -O xattr=sa \
    ssd-vol0 raidz2 /dev/disk/by-id/nvme-KXD51RUE3T84_TOSHIBA_*M
ctlog1:~$ sudo zfs create -o encryption=on -o keyformat=passphrase ssd-vol0/enc
ctlog1:~$ sudo zfs create ssd-vol0/logs
ctlog1:~$ for log in lipase; do \
    for shard in 2025h2 2026h1 2026h2 2027h1 2027h2; do \
      sudo zfs create ssd-vol0/logs/${log}${shard}; \
    done; \
  done
```

The hypervisor will use PCI passthrough for the NVMe drives, and we'll handle ZFS directly on the
VM. The first command creates a ZFS raidz2 pool using 4kB blocks, turns off _atime_ (which avoids one
metadata write for each read!), and turns on SSD trimming in ZFS, a very useful feature.

Then I'll create an encrypted volume for the configuration and key material. This way, if the
machine is ever physically transported, the keys will be safe in transit. Finally, I'll create the
temporal log shards starting at 2025h2, all the way through to 2027h2, for our testing log called
_Lipase_ and our production log called _Halloumi_ on Jeroen's machine. On my own machine, it'll be
_Rennet_ for the testing log and _Gouda_ for the production log.
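
One operational consequence of the passphrase-encrypted dataset is worth noting: after a reboot, the
key is not loaded automatically, so someone has to type the passphrase before anything that depends
on `/ssd-vol0/enc` can start. A minimal sketch of that step:

```
ctlog1:~$ sudo zfs load-key ssd-vol0/enc
Enter passphrase for 'ssd-vol0/enc':
ctlog1:~$ sudo zfs mount ssd-vol0/enc
```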

## Sunlight

{{< image width="10em" float="right" src="/assets/ctlog/sunlight-logo.png" alt="Sunlight logo" >}}

I set up Sunlight first, as its authors have extensive operational notes, both in terms of the
[[config](https://config.sunlight.geomys.org/)] of Geomys' _Tuscolo_ log, as well as on the
[[Sunlight](https://sunlight.dev)] homepage. I really appreciate that Filippo added some
[[Gists](https://gist.github.com/FiloSottile/989338e6ba8e03f2c699590ce83f537b)] and
[[Doc](https://docs.google.com/document/d/1ID8dX5VuvvrgJrM0Re-jt6Wjhx1eZp-trbpSIYtOhRE/edit?tab=t.0#heading=h.y3yghdo4mdij)]
with pretty much all I need to know to run one too. Our Rennet and Gouda logs use a very similar
approach for their configuration, with one notable exception: the VMs do not have a public IP
address, and are tucked away in a private network called IPng Site Local. I'll get back to that
later.

```
ctlog@ctlog0:/ssd-vol0/enc/sunlight$ cat << EOF | tee sunlight-staging.yaml
listen:
  - "[::]:16420"
checkpoints: /ssd-vol0/shared/checkpoints.db
logs:
  - shortname: rennet2025h2
    inception: 2025-07-28
    period: 200
    poolsize: 750
    submissionprefix: https://rennet2025h2.log.ct.ipng.ch
    monitoringprefix: https://rennet2025h2.mon.ct.ipng.ch
    ccadbroots: testing
    extraroots: /ssd-vol0/enc/sunlight/extra-roots-staging.pem
    secret: /ssd-vol0/enc/sunlight/keys/rennet2025h2.seed.bin
    cache: /ssd-vol0/logs/rennet2025h2/cache.db
    localdirectory: /ssd-vol0/logs/rennet2025h2/data
    notafterstart: 2025-07-01T00:00:00Z
    notafterlimit: 2026-01-01T00:00:00Z
...
EOF
ctlog@ctlog0:/ssd-vol0/enc/sunlight$ cat << EOF | tee skylight-staging.yaml
listen:
  - "[::]:16421"
homeredirect: https://ipng.ch/s/ct/
logs:
  - shortname: rennet2025h2
    monitoringprefix: https://rennet2025h2.mon.ct.ipng.ch
    localdirectory: /ssd-vol0/logs/rennet2025h2/data
    staging: true
...
```

In the first configuration file, I'll tell _Sunlight_ (the write path component) to listen on port
`:16420`, and I'll tell _Skylight_ (the read path component) to listen on port `:16421`. I've disabled
the automatic certificate renewals, and will handle SSL upstream. A few notes on this:

1. Most importantly, I will be using a common frontend pool with a wildcard certificate for
   `*.ct.ipng.ch`. I wrote about [[DNS-01]({{< ref 2023-03-24-lego-dns01 >}})] before; it's a very
   convenient way for IPng to do certificate pool management. I will be serving all log
   types under this certificate.
1. ACME/HTTP-01 could be made to work with a bit of effort, by plumbing through the `/.well-known/`
   URIs on the frontend and pointing them to these instances. But then the cert would have to be copied
   from Sunlight back to the frontends.

I've noticed that when the log doesn't exist yet, I can start Sunlight and it'll create the bits and
pieces on the local filesystem and start writing checkpoints. But if the log already exists, I am
required to have the _monitoringprefix_ active, otherwise Sunlight won't start up. It's a small
thing, as I will have the read path operational in a few simple steps. Anyway, all five logshards
for Rennet, and a few days later for Gouda, are operational this way.

Skylight provides all the things I need to serve the data back, which is a huge help. The [[Static
Log Spec](https://github.com/C2SP/C2SP/blob/main/static-ct-api.md)] is very clear on things like
compression, content-type, cache-control and other headers. Skylight makes this a breeze, as it reads
a configuration file very similar to the Sunlight write-path one, and takes care of it all for me.
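
For the suspicious among us, a quick spot-check of what Skylight actually puts on the wire never
hurts; something like this (using one of our staging shards) shows the pertinent headers for any
given resource:

```
ctlog@ctlog0:~$ curl -sI https://rennet2025h2.mon.ct.ipng.ch/checkpoint \
    | egrep -i 'content-type|cache-control|content-encoding'
```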

## TesseraCT

{{< image width="10em" float="right" src="/assets/ctlog/tesseract-logo.png" alt="TesseraCT logo" >}}

Good news came to our community on August 14th, when Google's TrustFabric team announced their Alpha
milestone of [[TesseraCT](https://blog.transparency.dev/introducing-tesseract)]. This release
also promoted the POSIX variant out of experimental status, alongside the already further along GCP
and AWS personalities. After playing around with it with Al and the team, I think I've learned
enough to get us going on a public `tesseract-posix` instance.

One thing I liked about Sunlight is its compact YAML file that describes the pertinent bits of the
system, and that I can serve any number of logs with the same process. On the other hand, TesseraCT
can serve only one log per process. Both have pros and cons: notably, if any poisonous submission
were offered, Sunlight might take down all logs, while TesseraCT would only take down the log
receiving the offensive submission. On the other hand, maintaining separate processes is cumbersome,
and all log instances need to be meticulously configured.

### TesseraCT genconf

I decide to automate this by vibing a little tool called `tesseract-genconf`, which I've published on
[[Gitea](https://git.ipng.ch/certificate-transparency/cheese)]. What it does is take a YAML file
describing the logs, and output the bits and pieces needed to operate multiple separate processes
that together form the sharded static log. I've attempted to stay mostly compatible with the
Sunlight YAML configuration, and came up with a variant like this one:

```
ctlog@ctlog1:/ssd-vol0/enc/tesseract$ cat << EOF | tee tesseract-staging.yaml
listen:
  - "[::]:8080"
roots: /ssd-vol0/enc/tesseract/roots.pem
logs:
  - shortname: lipase2025h2
    listen: "[::]:16900"
    submissionprefix: https://lipase2025h2.log.ct.ipng.ch
    monitoringprefix: https://lipase2025h2.mon.ct.ipng.ch
    extraroots: /ssd-vol0/enc/tesseract/extra-roots-staging.pem
    secret: /ssd-vol0/enc/tesseract/keys/lipase2025h2.pem
    localdirectory: /ssd-vol0/logs/lipase2025h2/data
    notafterstart: 2025-07-01T00:00:00Z
    notafterlimit: 2026-01-01T00:00:00Z
...
EOF
```

With this snippet, I have all the information I need. Here are the steps I take to construct the log
itself:

***1. Generate keys***

The keys are `prime256v1`, and the format that TesseraCT accepts did change since I wrote up my first
[[deep dive]({{< ref 2025-07-26-ctlog-1 >}})] a few weeks ago. Now, the tool accepts a `PEM` format
private key, from which the _Log ID_ and _Public Key_ can be derived. So off I go:

```
ctlog@ctlog1:/ssd-vol0/enc/tesseract$ tesseract-genconf -c tesseract-staging.yaml gen-key
Creating /ssd-vol0/enc/tesseract/keys/lipase2025h2.pem
Creating /ssd-vol0/enc/tesseract/keys/lipase2026h1.pem
Creating /ssd-vol0/enc/tesseract/keys/lipase2026h2.pem
Creating /ssd-vol0/enc/tesseract/keys/lipase2027h1.pem
Creating /ssd-vol0/enc/tesseract/keys/lipase2027h2.pem
```
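
As an aside: both can be derived with stock `openssl` too. A sketch, assuming the key layout above
(per RFC 6962, the _Log ID_ is the base64 of the SHA-256 over the DER-encoded public key):

```
ctlog@ctlog1:/ssd-vol0/enc/tesseract$ openssl ec -in keys/lipase2025h2.pem -pubout
ctlog@ctlog1:/ssd-vol0/enc/tesseract$ openssl ec -in keys/lipase2025h2.pem -pubout -outform DER \
    | openssl dgst -sha256 -binary | base64
```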

Of course, if a file already exists at that location, it'll just print a warning like:
```
Key already exists: /ssd-vol0/enc/tesseract/keys/lipase2025h2.pem (skipped)
```

***2. Generate JSON/HTML***

I will be operating the read-path with NGINX. Log operators have started speaking about their log
metadata in terms of a small JSON file called `log.v3.json`, and Skylight does a good job of
exposing that one, alongside all the other pertinent metadata. So I'll generate these files for each
of the logs:

```
ctlog@ctlog1:/ssd-vol0/enc/tesseract$ tesseract-genconf -c tesseract-staging.yaml gen-html
Creating /ssd-vol0/logs/lipase2025h2/data/index.html
Creating /ssd-vol0/logs/lipase2025h2/data/log.v3.json
Creating /ssd-vol0/logs/lipase2026h1/data/index.html
Creating /ssd-vol0/logs/lipase2026h1/data/log.v3.json
Creating /ssd-vol0/logs/lipase2026h2/data/index.html
Creating /ssd-vol0/logs/lipase2026h2/data/log.v3.json
Creating /ssd-vol0/logs/lipase2027h1/data/index.html
Creating /ssd-vol0/logs/lipase2027h1/data/log.v3.json
Creating /ssd-vol0/logs/lipase2027h2/data/index.html
Creating /ssd-vol0/logs/lipase2027h2/data/log.v3.json
```

{{< image width="60%" src="/assets/ctlog/lipase.png" alt="TesseraCT Lipase Log" >}}

It's nice to see a familiar look-and-feel for these logs appear in those `index.html` files (which
all cross-link to each other within the logs specified in `tesseract-staging.yaml`), which is dope.

***3. Generate Roots***

Antonis had seen this before (thanks for the explanation!): TesseraCT does not natively implement
fetching of the [[CCADB](https://www.ccadb.org/)] roots. But, he points out, you can just get them
from any other running log instance, so I'll implement a `gen-roots` command:

```
ctlog@ctlog1:/ssd-vol0/enc/tesseract$ tesseract-genconf gen-roots \
    --source https://tuscolo2027h1.sunlight.geomys.org --output production-roots.pem
Fetching roots from: https://tuscolo2027h1.sunlight.geomys.org/ct/v1/get-roots
2025/08/25 08:24:58 Warning: Failed to parse certificate,carefully skipping: x509: negative serial number
Creating production-roots.pem
Successfully wrote 248 certificates to tusc.pem (out of 249 total)

ctlog@ctlog1:/ssd-vol0/enc/tesseract$ tesseract-genconf gen-roots \
    --source https://navigli2027h1.sunlight.geomys.org --output testing-roots.pem
Fetching roots from: https://navigli2027h1.sunlight.geomys.org/ct/v1/get-roots
Creating testing-roots.pem
Successfully wrote 82 certificates to tusc.pem (out of 82 total)
```

I can do this regularly, say daily, in a cronjob, and if the files change, restart the
TesseraCT processes; see the sketch below. It's not ideal (because the restart might be briefly
disruptive), but it's a reasonable option for the time being.
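
A sketch of what that daily job could look like (the paths match my staging setup above; the
compare-and-swap is my own convention, and the systemd glob assumes the templated units from the
next section):

```
#!/bin/sh
# Hypothetical daily cronjob: refresh the roots, restart shards only on change.
NEW=$(mktemp)
tesseract-genconf gen-roots --source https://navigli2027h1.sunlight.geomys.org --output $NEW
if ! cmp -s $NEW /ssd-vol0/enc/tesseract/testing-roots.pem; then
  mv $NEW /ssd-vol0/enc/tesseract/testing-roots.pem
  tesseract-genconf -c tesseract-staging.yaml gen-env   # reassemble per-instance roots.pem
  systemctl restart 'tesseract@lipase*'
else
  rm $NEW
fi
```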

***4. Generate TesseraCT cmdline***

I will be running TesseraCT as a _templated unit_ in systemd. These are system unit files that take
an argument; they have an @ in their name, like so:

```
ctlog@ctlog1:/ssd-vol0/enc/tesseract$ cat << EOF | sudo tee /lib/systemd/system/tesseract@.service
[Unit]
Description=Tesseract CT Log service for %i
ConditionFileExists=/ssd-vol0/logs/%i/data/.env
After=network.target

[Service]
# The %i here refers to the instance name, e.g., "lipase2025h2"
# This path should point to where your instance-specific .env files are located
EnvironmentFile=/ssd-vol0/logs/%i/data/.env
ExecStart=/home/ctlog/bin/tesseract-posix $TESSERACT_ARGS
User=ctlog
Group=ctlog
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
EOF
```
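
With the template in place, each shard becomes one instance of it, once the `.env` files from the
`gen-env` step below exist. Something along these lines (a sketch; the instance names follow my
shard naming above):

```
ctlog@ctlog1:~$ for shard in 2025h2 2026h1 2026h2 2027h1 2027h2; do \
    sudo systemctl enable --now tesseract@lipase${shard}; \
  done
```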

I can now implement a `gen-env` command for my tool:

```
ctlog@ctlog1:/ssd-vol0/enc/tesseract$ tesseract-genconf -c tesseract-staging.yaml gen-env
Creating /ssd-vol0/logs/lipase2025h2/data/roots.pem
Creating /ssd-vol0/logs/lipase2025h2/data/.env
Creating /ssd-vol0/logs/lipase2026h1/data/roots.pem
Creating /ssd-vol0/logs/lipase2026h1/data/.env
Creating /ssd-vol0/logs/lipase2026h2/data/roots.pem
Creating /ssd-vol0/logs/lipase2026h2/data/.env
Creating /ssd-vol0/logs/lipase2027h1/data/roots.pem
Creating /ssd-vol0/logs/lipase2027h1/data/.env
Creating /ssd-vol0/logs/lipase2027h2/data/roots.pem
Creating /ssd-vol0/logs/lipase2027h2/data/.env
```

Looking at one of those .env files, I can show the exact commandline I'll be feeding to the
`tesseract-posix` binary:

```
ctlog@ctlog1:/ssd-vol0/enc/tesseract$ cat /ssd-vol0/logs/lipase2025h2/data/.env
TESSERACT_ARGS="--private_key=/ssd-vol0/enc/tesseract/keys/lipase2025h2.pem
--origin=lipase2025h2.log.ct.ipng.ch --storage_dir=/ssd-vol0/logs/lipase2025h2/data
--roots_pem_file=/ssd-vol0/logs/lipase2025h2/data/roots.pem --http_endpoint=[::]:16900
--not_after_start=2025-07-01T00:00:00Z --not_after_limit=2026-01-01T00:00:00Z"
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
```

{{< image width="7em" float="left" src="/assets/shared/warning.png" alt="Warning" >}}
A quick operational note on OpenTelemetry (also often referred to as Otel): Al and the TrustFabric
team added OpenTelemetry to the TesseraCT personalities, as it was mostly already implemented in
the underlying Tessera library. By default, it'll try to send its telemetry to localhost using
`https`, which makes sense in those cases where the collector is on a different machine. In my case,
I'll keep `otelcol` (the collector) on the same machine. Its job is to consume the Otel telemetry
stream, and turn it back into a Prometheus `/metrics` endpoint on port `:9464`.

The `gen-env` command also assembles the per-instance `roots.pem` file. For staging logs, it'll take
the file pointed to by the `roots:` key, and append any per-log `extraroots:` files. For me, these
extraroots are empty, and the main roots file points at either the testing roots that came from
_Rennet_ (our Sunlight staging log), or the production roots that came from _Gouda_. A job well done!

***5. Generate NGINX***

When I first ran my tests, I noticed that the log check tool called `ct-fsck` threw errors on my
read path. Filippo explained that the HTTP headers matter in the Static CT specification: Tiles,
Issuers, and Checkpoint must all have specific caching and content-type headers set. This is what
makes Skylight such a gem - I get to read it (and the spec!) to see what I'm supposed to be serving.

And thus, the `gen-nginx` command is born; the generated configs listen on port `:8080` for requests:

```
ctlog@ctlog1:/ssd-vol0/enc/tesseract$ tesseract-genconf -c tesseract-staging.yaml gen-nginx
Creating nginx config: /ssd-vol0/logs/lipase2025h2/data/lipase2025h2.mon.ct.ipng.ch.conf
Creating nginx config: /ssd-vol0/logs/lipase2026h1/data/lipase2026h1.mon.ct.ipng.ch.conf
Creating nginx config: /ssd-vol0/logs/lipase2026h2/data/lipase2026h2.mon.ct.ipng.ch.conf
Creating nginx config: /ssd-vol0/logs/lipase2027h1/data/lipase2027h1.mon.ct.ipng.ch.conf
Creating nginx config: /ssd-vol0/logs/lipase2027h2/data/lipase2027h2.mon.ct.ipng.ch.conf
```

All that's left for me to do is symlink these from `/etc/nginx/sites-enabled/` (see below), and the
read-path is off to the races. With these commands in the `tesseract-genconf` tool, I am hoping that
future travelers have an easy time setting up their static log. Please let me know if you'd like to
use, or contribute to, the tool. You can find me in the Transparency Dev Slack, in #ct and also #cheese.
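
For completeness, the symlink step itself is a one-liner per shard; a sketch, assuming the paths
that `gen-nginx` printed above:

```
ctlog@ctlog1:~$ sudo ln -s /ssd-vol0/logs/lipase2025h2/data/lipase2025h2.mon.ct.ipng.ch.conf \
    /etc/nginx/sites-enabled/
ctlog@ctlog1:~$ sudo nginx -t && sudo systemctl reload nginx
```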

## IPng Frontends

{{< image width="18em" float="right" src="/assets/ctlog/MPLS Backbone - CTLog.svg" alt="ctlog at ipng" >}}

IPng Networks has a private internal network called [[IPng Site Local]({{< ref 2023-03-11-mpls-core
>}})], which is not routed on the internet. Our [[Frontends]({{< ref 2023-03-17-ipng-frontends >}})]
are the only things that have public IPv4 and IPv6 addresses. It allows for things like anycasted
webservers and loadbalancing with
[[Maglev](https://research.google/pubs/maglev-a-fast-and-reliable-software-network-load-balancer/)].

The IPng Site Local network kind of looks like the picture to the right. The hypervisors running the
Sunlight and TesseraCT logs are at NTT Zurich1 in Rümlang, Switzerland. The IPng frontends are
in green, and the sweet thing is, some of them run in IPng's own ISP network (AS8298), while others
run in partner networks (like IP-Max AS25091, and Coloclue AS8283). This means that I will benefit
from some pretty solid connectivity redundancy.

The frontends are provisioned with Ansible. There are two aspects to them - firstly, a _certbot_
instance maintains the Let's Encrypt wildcard certificates for `*.ct.ipng.ch`. There's a machine
tucked away somewhere called `lego.net.ipng.ch` -- again, not exposed on the internet -- and its job
is to renew certificates and copy them to the machines that need them. Next, a cluster of NGINX
servers uses these certificates to expose IPng and customer services to the Internet.

I can tie it all together with a snippet like so, for which I apologize in advance - it's quite a
wall of text:

```
map $http_user_agent $no_cache_ctlog_lipase {
    "~*TesseraCT fsck"  1;
    default             0;
}

server {
    listen [::]:443 ssl http2;
    listen 0.0.0.0:443 ssl http2;
    ssl_certificate /etc/certs/ct.ipng.ch/fullchain.pem;
    ssl_certificate_key /etc/certs/ct.ipng.ch/privkey.pem;
    include /etc/nginx/conf.d/options-ssl-nginx.inc;
    ssl_dhparam /etc/nginx/conf.d/ssl-dhparams.inc;

    server_name lipase2025h2.log.ct.ipng.ch;
    access_log /nginx/logs/lipase2025h2.log.ct.ipng.ch-access.log upstream buffer=512k flush=5s;
    include /etc/nginx/conf.d/ipng-headers.inc;

    location = / {
        proxy_http_version 1.1;
        proxy_set_header Host lipase2025h2.mon.ct.ipng.ch;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_pass http://ctlog1.net.ipng.ch:8080/index.html;
    }

    location = /metrics {
        proxy_http_version 1.1;
        proxy_set_header Host $host;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_pass http://ctlog1.net.ipng.ch:9464;
    }

    location / {
        proxy_http_version 1.1;
        proxy_set_header Host $host;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_pass http://ctlog1.net.ipng.ch:16900;
    }
}

server {
    listen [::]:443 ssl http2;
    listen 0.0.0.0:443 ssl http2;
    ssl_certificate /etc/certs/ct.ipng.ch/fullchain.pem;
    ssl_certificate_key /etc/certs/ct.ipng.ch/privkey.pem;
    include /etc/nginx/conf.d/options-ssl-nginx.inc;
    ssl_dhparam /etc/nginx/conf.d/ssl-dhparams.inc;

    server_name lipase2025h2.mon.ct.ipng.ch;
    access_log /nginx/logs/lipase2025h2.mon.ct.ipng.ch-access.log upstream buffer=512k flush=5s;
    include /etc/nginx/conf.d/ipng-headers.inc;

    location = /checkpoint {
        proxy_http_version 1.1;
        proxy_set_header Host $host;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        proxy_pass http://ctlog1.net.ipng.ch:8080;
    }

    location / {
        proxy_http_version 1.1;
        proxy_set_header Host $host;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        include /etc/nginx/conf.d/ipng-upstream-headers.inc;
        proxy_cache ipng_cache;
        proxy_cache_key "$scheme://$host$request_uri";
        proxy_cache_valid 200 24h;
        proxy_cache_revalidate off;
        proxy_cache_bypass $no_cache_ctlog_lipase;
        proxy_no_cache $no_cache_ctlog_lipase;

        proxy_pass http://ctlog1.net.ipng.ch:8080;
    }
}
```

Taking _Lipase_ shard 2025h2 as an example: the submission path (on `*.log.ct.ipng.ch`) will show
the same `index.html` as the monitoring path (on `*.mon.ct.ipng.ch`), to provide some consistency
with Sunlight logs. Otherwise, the `/metrics` endpoint is forwarded to the `otelcol` running on
port `:9464`, and the rest (`/ct/v1/` and so on) is sent to the first TesseraCT port, `:16900`.

Then the read-path special-cases the `/checkpoint` endpoint, which it does not cache. That request
(like all others) is forwarded to port `:8080`, which is where NGINX is running. Other requests
(notably `/tile` and `/issuer`) are cacheable, so I'll cache these on the upstream NGINX servers,
both for resilience and for performance. Having four of these NGINX upstreams will allow the
Static CT logs (regardless of being Sunlight or TesseraCT) to serve very high read-rates.
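
The snippet above references a shared cache zone called `ipng_cache` without defining it. Zones
like this are declared once in the `http` context; a minimal sketch of what such a declaration
could look like, where the path, zone size and `max_size` are assumptions rather than our
production values:

```
proxy_cache_path /nginx/cache levels=1:2 keys_zone=ipng_cache:64m
                 max_size=16g inactive=24h use_temp_path=off;
```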

## What's Next

I need to spend a little bit of time thinking about rate limits, specifically write rate limits. I
think I'll use a request limiter in upstream NGINX, allowing each IP, /24, or /48 subnet to send
only a fixed number of requests/sec. I'll probably keep that part private though, as it's a good
rule of thumb to never offer information to attackers.
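
Purely for illustration, here's a minimal sketch of such a limiter using NGINX's `limit_req`
module. The zone name, rate and burst values are made up for this example and are not the numbers
we deploy:

```
# Track write requests per client address; keying on a /24 or /48 would need a
# map that truncates $binary_remote_addr first.
limit_req_zone $binary_remote_addr zone=ctlog_write:16m rate=10r/s;

server {
  location /ct/v1/add-chain {
    limit_req zone=ctlog_write burst=20 nodelay;
    proxy_pass http://ctlog1.net.ipng.ch:16900;
  }
}
```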

Together with Antonis Chariton and Jeroen Massar, IPng Networks will be offering both TesseraCT and
Sunlight logs on the public internet. One final step is to productionize both logs, and file the
paperwork for them in the community. At this point our Sunlight log has been running for a month or
so, and we've filed the paperwork for it to be included at Apple and Google.

I'm going to have folks poke at _Lipase_ as well, after which I'll try to run a few `ct-fsck`
passes to make sure the logs are sane, before offering them into the inclusion program as well.
Wish us luck!
73 content/ctlog.md Normal file
@@ -0,0 +1,73 @@
---
title: 'Certificate Transparency'
date: 2025-07-30
url: /s/ct
---

{{< image width="10em" float="right" src="/assets/ctlog/ctlog-logo-ipng.png" alt="ctlog logo" >}}

Certificate Transparency logs are "append-only" and publicly-auditable ledgers of certificates being
created, updated, and expired. This is the homepage for IPng Networks' Certificate Transparency
project.

Certificate Transparency [[CT](https://certificate.transparency.dev)] is a system for logging and
monitoring certificate issuance. It greatly enhances everyone’s ability to monitor and study
certificate issuance, and these capabilities have led to numerous improvements to the CA ecosystem
and Web security. As a result, it is rapidly becoming critical Internet infrastructure. Originally
developed by Google, the concept is now being adopted by many _Certification Authorities_ who log
their certificates, and professional _Monitoring_ companies who observe the certificates and
report anomalies.

IPng Networks runs our logs under the domain `ct.ipng.ch`, split into `*.log.ct.ipng.ch` for the
write-path, and `*.mon.ct.ipng.ch` for the read-path.

We are submitting our log for inclusion in the approved log lists for Google Chrome and Apple
Safari. Following 90 days of successful monitoring, we anticipate our log will be added to these
trusted lists, and that change will propagate to people’s browsers with subsequent browser version
releases.

We operate two popular implementations of Static Certificate Transparency software.

## Sunlight

{{< image width="10em" float="right" src="/assets/ctlog/sunlight-logo.png" alt="sunlight logo" >}}

[[Sunlight](https://sunlight.dev)] was designed by Filippo Valsorda for the needs of the WebPKI
community, through the feedback of many of its members, and in particular of the Sigsum, Google
TrustFabric, and ISRG teams. It is partially based on the Go Checksum Database. Sunlight's
development was sponsored by Let's Encrypt.

Our Sunlight logs:
* A staging log called [[Rennet](https://rennet2025h2.log.ct.ipng.ch/)], incepted 2025-07-28,
  starting from temporal shard `rennet2025h2`.
* A production log called [[Gouda](https://gouda2025h2.log.ct.ipng.ch/)], incepted 2025-07-30,
  starting from temporal shard `gouda2025h2`.

## TesseraCT

{{< image width="10em" float="right" src="/assets/ctlog/tesseract-logo.png" alt="tesseract logo" >}}

[[TesseraCT](https://github.com/transparency-dev/tesseract)] is a Certificate Transparency (CT) log
implementation by the TrustFabric team at Google. It was built to allow log operators to run
production static-ct-api CT logs starting with temporal shards covering 2026 onwards, as the
successor to Trillian's CTFE.

Our TesseraCT logs:
* A staging log called [[Lipase](https://lipase2025h2.log.ct.ipng.ch/)], incepted 2025-08-22,
  starting from temporal shard `lipase2025h2`.
* A production log called [[Halloumi](https://halloumi2025h2.log.ct.ipng.ch/)], incepted 2025-08-24,
  starting from temporal shard `halloumi2025h2`.
* Log `halloumi2026h2` incorporated incorrect data into its Merkle Tree at entries 4357956 and
  4552365, due to a [[TesseraCT bug](https://github.com/transparency-dev/tesseract/issues/553)],
  and was retired on 2025-09-08, to be replaced by temporal shard `halloumi2026h2a`.

## Operational Details

You can read more details about our infrastructure in:
* **[[TesseraCT]({{< ref 2025-07-26-ctlog-1 >}})]** - published on 2025-07-26.
* **[[Sunlight]({{< ref 2025-08-10-ctlog-2 >}})]** - published on 2025-08-10.
* **[[Operations]({{< ref 2025-08-24-ctlog-3 >}})]** - published on 2025-08-24.

The operators of this infrastructure are **Antonis Chariton**, **Jeroen Massar** and **Pim van Pelt**. \
You can reach us via e-mail at [[<ct-ops@ipng.ch>](mailto:ct-ops@ipng.ch)].
BIN static/assets/containerlab/learn-vpp.png (Stored with Git LFS) Normal file
1270 static/assets/containerlab/vpp-containerlab.cast Normal file
File diff suppressed because it is too large
1 static/assets/ctlog/MPLS Backbone - CTLog.svg Normal file
File diff suppressed because one or more lines are too long (147 KiB)
BIN static/assets/ctlog/btop-sunlight.png (Stored with Git LFS) Normal file
BIN static/assets/ctlog/ctlog-loadtest1.png (Stored with Git LFS) Normal file
BIN static/assets/ctlog/ctlog-loadtest2.png (Stored with Git LFS) Normal file
BIN static/assets/ctlog/ctlog-loadtest3.png (Stored with Git LFS) Normal file
BIN static/assets/ctlog/ctlog-logo-ipng.png (Stored with Git LFS) Normal file
BIN static/assets/ctlog/lipase.png (Stored with Git LFS) Normal file
164 static/assets/ctlog/minio-results.txt Normal file
@@ -0,0 +1,164 @@
Wasabi benchmark program v2.0
Parameters: url=http://minio-disk:9000, bucket=wasabi-benchmark-bucket, region=us-east-1, duration=60, threads=1, loops=1, size=4M
Loop 1: PUT time 60.0 secs, objects = 813, speed = 54.2MB/sec, 13.5 operations/sec. Slowdowns = 0
Loop 1: GET time 60.0 secs, objects = 23168, speed = 1.5GB/sec, 386.1 operations/sec. Slowdowns = 0
Loop 1: DELETE time 2.2 secs, 371.2 deletes/sec. Slowdowns = 0
Wasabi benchmark program v2.0
Parameters: url=http://minio-disk:9000, bucket=wasabi-benchmark-bucket, region=us-east-1, duration=60, threads=1, loops=1, size=1M
2025/07/20 16:07:25 WARNING: createBucket wasabi-benchmark-bucket error, ignoring BucketAlreadyOwnedByYou: Your previous request to create the named bucket succeeded and you already own it.
status code: 409, request id: 1853FACEBAC4D052, host id: dd9025bab4ad464b049177c95eb6ebf374d3b3fd1af9251148b658df7ac2e3e8
Loop 1: PUT time 60.0 secs, objects = 1221, speed = 20.3MB/sec, 20.3 operations/sec. Slowdowns = 0
Loop 1: GET time 60.0 secs, objects = 31000, speed = 516.7MB/sec, 516.7 operations/sec. Slowdowns = 0
Loop 1: DELETE time 3.2 secs, 376.5 deletes/sec. Slowdowns = 0
Wasabi benchmark program v2.0
Parameters: url=http://minio-disk:9000, bucket=wasabi-benchmark-bucket, region=us-east-1, duration=60, threads=1, loops=1, size=8k
2025/07/20 16:09:29 WARNING: createBucket wasabi-benchmark-bucket error, ignoring BucketAlreadyOwnedByYou: Your previous request to create the named bucket succeeded and you already own it.
status code: 409, request id: 1853FAEB70060604, host id: dd9025bab4ad464b049177c95eb6ebf374d3b3fd1af9251148b658df7ac2e3e8
Loop 1: PUT time 60.0 secs, objects = 3353, speed = 447KB/sec, 55.9 operations/sec. Slowdowns = 0
Loop 1: GET time 60.0 secs, objects = 45913, speed = 6MB/sec, 765.2 operations/sec. Slowdowns = 0
Loop 1: DELETE time 9.3 secs, 361.6 deletes/sec. Slowdowns = 0
Wasabi benchmark program v2.0
Parameters: url=http://minio-disk:9000, bucket=wasabi-benchmark-bucket, region=us-east-1, duration=60, threads=1, loops=1, size=4k
2025/07/20 16:11:38 WARNING: createBucket wasabi-benchmark-bucket error, ignoring BucketAlreadyOwnedByYou: Your previous request to create the named bucket succeeded and you already own it.
status code: 409, request id: 1853FB098B162788, host id: dd9025bab4ad464b049177c95eb6ebf374d3b3fd1af9251148b658df7ac2e3e8
Loop 1: PUT time 60.0 secs, objects = 3404, speed = 226.9KB/sec, 56.7 operations/sec. Slowdowns = 0
Loop 1: GET time 60.0 secs, objects = 45230, speed = 2.9MB/sec, 753.8 operations/sec. Slowdowns = 0
Loop 1: DELETE time 9.4 secs, 362.6 deletes/sec. Slowdowns = 0
Wasabi benchmark program v2.0
Parameters: url=http://minio-disk:9000, bucket=wasabi-benchmark-bucket, region=us-east-1, duration=60, threads=8, loops=1, size=4M
2025/07/20 16:13:47 WARNING: createBucket wasabi-benchmark-bucket error, ignoring BucketAlreadyOwnedByYou: Your previous request to create the named bucket succeeded and you already own it.
status code: 409, request id: 1853FB27AE890E75, host id: dd9025bab4ad464b049177c95eb6ebf374d3b3fd1af9251148b658df7ac2e3e8
Loop 1: PUT time 60.1 secs, objects = 1898, speed = 126.4MB/sec, 31.6 operations/sec. Slowdowns = 0
Loop 1: GET time 60.0 secs, objects = 185034, speed = 12GB/sec, 3083.9 operations/sec. Slowdowns = 0
Loop 1: DELETE time 0.4 secs, 4267.8 deletes/sec. Slowdowns = 0
Wasabi benchmark program v2.0
Parameters: url=http://minio-disk:9000, bucket=wasabi-benchmark-bucket, region=us-east-1, duration=60, threads=8, loops=1, size=1M
2025/07/20 16:15:48 WARNING: createBucket wasabi-benchmark-bucket error, ignoring BucketAlreadyOwnedByYou: Your previous request to create the named bucket succeeded and you already own it.
status code: 409, request id: 1853FB43C0386015, host id: dd9025bab4ad464b049177c95eb6ebf374d3b3fd1af9251148b658df7ac2e3e8
Loop 1: PUT time 60.2 secs, objects = 2627, speed = 43.7MB/sec, 43.7 operations/sec. Slowdowns = 0
Loop 1: GET time 60.0 secs, objects = 327959, speed = 5.3GB/sec, 5465.9 operations/sec. Slowdowns = 0
Loop 1: DELETE time 0.6 secs, 4045.6 deletes/sec. Slowdowns = 0
Wasabi benchmark program v2.0
Parameters: url=http://minio-disk:9000, bucket=wasabi-benchmark-bucket, region=us-east-1, duration=60, threads=8, loops=1, size=8k
2025/07/20 16:17:49 WARNING: createBucket wasabi-benchmark-bucket error, ignoring BucketAlreadyOwnedByYou: Your previous request to create the named bucket succeeded and you already own it.
status code: 409, request id: 1853FB5FE2012590, host id: dd9025bab4ad464b049177c95eb6ebf374d3b3fd1af9251148b658df7ac2e3e8
Loop 1: PUT time 60.0 secs, objects = 6663, speed = 887.7KB/sec, 111.0 operations/sec. Slowdowns = 0
Loop 1: GET time 60.0 secs, objects = 459962, speed = 59.9MB/sec, 7666.0 operations/sec. Slowdowns = 0
Loop 1: DELETE time 1.7 secs, 3890.9 deletes/sec. Slowdowns = 0
Wasabi benchmark program v2.0
Parameters: url=http://minio-disk:9000, bucket=wasabi-benchmark-bucket, region=us-east-1, duration=60, threads=8, loops=1, size=4k
2025/07/20 16:19:50 WARNING: createBucket wasabi-benchmark-bucket error, ignoring BucketAlreadyOwnedByYou: Your previous request to create the named bucket succeeded and you already own it.
status code: 409, request id: 1853FB7C3CF0FFCA, host id: dd9025bab4ad464b049177c95eb6ebf374d3b3fd1af9251148b658df7ac2e3e8
Loop 1: PUT time 60.1 secs, objects = 6673, speed = 444.4KB/sec, 111.1 operations/sec. Slowdowns = 0
Loop 1: GET time 60.0 secs, objects = 444637, speed = 28.9MB/sec, 7410.5 operations/sec. Slowdowns = 0
Loop 1: DELETE time 1.5 secs, 4411.8 deletes/sec. Slowdowns = 0
Wasabi benchmark program v2.0
Parameters: url=http://minio-disk:9000, bucket=wasabi-benchmark-bucket, region=us-east-1, duration=60, threads=32, loops=1, size=4M
2025/07/20 16:21:52 WARNING: createBucket wasabi-benchmark-bucket error, ignoring BucketAlreadyOwnedByYou: Your previous request to create the named bucket succeeded and you already own it.
status code: 409, request id: 1853FB988DB60881, host id: dd9025bab4ad464b049177c95eb6ebf374d3b3fd1af9251148b658df7ac2e3e8
Loop 1: PUT time 60.2 secs, objects = 3093, speed = 205.5MB/sec, 51.4 operations/sec. Slowdowns = 0
Loop 1: GET time 60.0 secs, objects = 168750, speed = 11GB/sec, 2811.4 operations/sec. Slowdowns = 0
Loop 1: DELETE time 0.3 secs, 9112.2 deletes/sec. Slowdowns = 0
Wasabi benchmark program v2.0
Parameters: url=http://minio-disk:9000, bucket=wasabi-benchmark-bucket, region=us-east-1, duration=60, threads=32, loops=1, size=1M
2025/07/20 16:23:53 WARNING: createBucket wasabi-benchmark-bucket error, ignoring BucketAlreadyOwnedByYou: Your previous request to create the named bucket succeeded and you already own it.
status code: 409, request id: 1853FBB4A1E534DE, host id: dd9025bab4ad464b049177c95eb6ebf374d3b3fd1af9251148b658df7ac2e3e8
Loop 1: PUT time 60.2 secs, objects = 4652, speed = 77.2MB/sec, 77.2 operations/sec. Slowdowns = 0
Loop 1: GET time 60.0 secs, objects = 351187, speed = 5.7GB/sec, 5852.8 operations/sec. Slowdowns = 0
Loop 1: DELETE time 0.6 secs, 8141.6 deletes/sec. Slowdowns = 0
Wasabi benchmark program v2.0
Parameters: url=http://minio-disk:9000, bucket=wasabi-benchmark-bucket, region=us-east-1, duration=60, threads=32, loops=1, size=8k
2025/07/20 16:25:54 WARNING: createBucket wasabi-benchmark-bucket error, ignoring BucketAlreadyOwnedByYou: Your previous request to create the named bucket succeeded and you already own it.
status code: 409, request id: 1853FBD0C4764C64, host id: dd9025bab4ad464b049177c95eb6ebf374d3b3fd1af9251148b658df7ac2e3e8
Loop 1: PUT time 60.1 secs, objects = 14497, speed = 1.9MB/sec, 241.4 operations/sec. Slowdowns = 0
Loop 1: GET time 60.0 secs, objects = 457437, speed = 59.6MB/sec, 7623.7 operations/sec. Slowdowns = 0
Loop 1: DELETE time 1.7 secs, 8353.6 deletes/sec. Slowdowns = 0
Wasabi benchmark program v2.0
Parameters: url=http://minio-disk:9000, bucket=wasabi-benchmark-bucket, region=us-east-1, duration=60, threads=32, loops=1, size=4k
2025/07/20 16:27:55 WARNING: createBucket wasabi-benchmark-bucket error, ignoring BucketAlreadyOwnedByYou: Your previous request to create the named bucket succeeded and you already own it.
status code: 409, request id: 1853FBED210B0792, host id: dd9025bab4ad464b049177c95eb6ebf374d3b3fd1af9251148b658df7ac2e3e8
Loop 1: PUT time 60.1 secs, objects = 14459, speed = 962.6KB/sec, 240.7 operations/sec. Slowdowns = 0
Loop 1: GET time 60.0 secs, objects = 466680, speed = 30.4MB/sec, 7777.7 operations/sec. Slowdowns = 0
Loop 1: DELETE time 1.7 secs, 8605.3 deletes/sec. Slowdowns = 0
Wasabi benchmark program v2.0
Parameters: url=http://minio-ssd:9000, bucket=wasabi-benchmark-bucket, region=us-east-1, duration=60, threads=1, loops=1, size=4M
Loop 1: PUT time 60.0 secs, objects = 1866, speed = 124.4MB/sec, 31.1 operations/sec. Slowdowns = 0
Loop 1: GET time 60.0 secs, objects = 16400, speed = 1.1GB/sec, 273.3 operations/sec. Slowdowns = 0
Loop 1: DELETE time 5.1 secs, 369.3 deletes/sec. Slowdowns = 0
Wasabi benchmark program v2.0
Parameters: url=http://minio-ssd:9000, bucket=wasabi-benchmark-bucket, region=us-east-1, duration=60, threads=1, loops=1, size=1M
2025/07/20 16:32:02 WARNING: createBucket wasabi-benchmark-bucket error, ignoring BucketAlreadyOwnedByYou: Your previous request to create the named bucket succeeded and you already own it.
status code: 409, request id: 1853FC25AE815718, host id: dd9025bab4ad464b049177c95eb6ebf374d3b3fd1af9251148b658df7ac2e3e8
Loop 1: PUT time 60.0 secs, objects = 5459, speed = 91MB/sec, 91.0 operations/sec. Slowdowns = 0
Loop 1: GET time 60.0 secs, objects = 25090, speed = 418.2MB/sec, 418.2 operations/sec. Slowdowns = 0
Loop 1: DELETE time 14.8 secs, 369.8 deletes/sec. Slowdowns = 0
Wasabi benchmark program v2.0
Parameters: url=http://minio-ssd:9000, bucket=wasabi-benchmark-bucket, region=us-east-1, duration=60, threads=1, loops=1, size=8k
2025/07/20 16:34:17 WARNING: createBucket wasabi-benchmark-bucket error, ignoring BucketAlreadyOwnedByYou: Your previous request to create the named bucket succeeded and you already own it.
status code: 409, request id: 1853FC4514A78873, host id: dd9025bab4ad464b049177c95eb6ebf374d3b3fd1af9251148b658df7ac2e3e8
Loop 1: PUT time 60.0 secs, objects = 22278, speed = 2.9MB/sec, 371.3 operations/sec. Slowdowns = 0
Loop 1: GET time 60.0 secs, objects = 40626, speed = 5.3MB/sec, 677.1 operations/sec. Slowdowns = 0
Loop 1: DELETE time 61.6 secs, 361.8 deletes/sec. Slowdowns = 0
Wasabi benchmark program v2.0
Parameters: url=http://minio-ssd:9000, bucket=wasabi-benchmark-bucket, region=us-east-1, duration=60, threads=1, loops=1, size=4k
2025/07/20 16:37:19 WARNING: createBucket wasabi-benchmark-bucket error, ignoring BucketAlreadyOwnedByYou: Your previous request to create the named bucket succeeded and you already own it.
status code: 409, request id: 1853FC6F629ACFAC, host id: dd9025bab4ad464b049177c95eb6ebf374d3b3fd1af9251148b658df7ac2e3e8
Loop 1: PUT time 60.0 secs, objects = 23394, speed = 1.5MB/sec, 389.9 operations/sec. Slowdowns = 0
Loop 1: GET time 60.0 secs, objects = 39249, speed = 2.6MB/sec, 654.1 operations/sec. Slowdowns = 0
Loop 1: DELETE time 64.5 secs, 363.0 deletes/sec. Slowdowns = 0
Wasabi benchmark program v2.0
Parameters: url=http://minio-ssd:9000, bucket=wasabi-benchmark-bucket, region=us-east-1, duration=60, threads=8, loops=1, size=4M
2025/07/20 16:40:23 WARNING: createBucket wasabi-benchmark-bucket error, ignoring BucketAlreadyOwnedByYou: Your previous request to create the named bucket succeeded and you already own it.
status code: 409, request id: 1853FC9A5D101971, host id: dd9025bab4ad464b049177c95eb6ebf374d3b3fd1af9251148b658df7ac2e3e8
Loop 1: PUT time 60.0 secs, objects = 10564, speed = 704.1MB/sec, 176.0 operations/sec. Slowdowns = 0
Loop 1: GET time 60.0 secs, objects = 20682, speed = 1.3GB/sec, 344.6 operations/sec. Slowdowns = 0
Loop 1: DELETE time 2.5 secs, 4178.8 deletes/sec. Slowdowns = 0
Wasabi benchmark program v2.0
Parameters: url=http://minio-ssd:9000, bucket=wasabi-benchmark-bucket, region=us-east-1, duration=60, threads=8, loops=1, size=1M
2025/07/20 16:42:26 WARNING: createBucket wasabi-benchmark-bucket error, ignoring BucketAlreadyOwnedByYou: Your previous request to create the named bucket succeeded and you already own it.
status code: 409, request id: 1853FCB6EB0A45D9, host id: dd9025bab4ad464b049177c95eb6ebf374d3b3fd1af9251148b658df7ac2e3e8
Loop 1: PUT time 60.0 secs, objects = 26550, speed = 442.4MB/sec, 442.4 operations/sec. Slowdowns = 0
Loop 1: GET time 60.0 secs, objects = 124810, speed = 2GB/sec, 2080.1 operations/sec. Slowdowns = 0
Loop 1: DELETE time 6.6 secs, 4049.2 deletes/sec. Slowdowns = 0
Wasabi benchmark program v2.0
Parameters: url=http://minio-ssd:9000, bucket=wasabi-benchmark-bucket, region=us-east-1, duration=60, threads=8, loops=1, size=8k
2025/07/20 16:44:32 WARNING: createBucket wasabi-benchmark-bucket error, ignoring BucketAlreadyOwnedByYou: Your previous request to create the named bucket succeeded and you already own it.
status code: 409, request id: 1853FCD4684A110E, host id: dd9025bab4ad464b049177c95eb6ebf374d3b3fd1af9251148b658df7ac2e3e8
Loop 1: PUT time 60.0 secs, objects = 129363, speed = 16.8MB/sec, 2155.9 operations/sec. Slowdowns = 0
Loop 1: GET time 60.0 secs, objects = 423956, speed = 55.2MB/sec, 7065.8 operations/sec. Slowdowns = 0
Loop 1: DELETE time 32.4 secs, 3992.0 deletes/sec. Slowdowns = 0
Wasabi benchmark program v2.0
Parameters: url=http://minio-ssd:9000, bucket=wasabi-benchmark-bucket, region=us-east-1, duration=60, threads=8, loops=1, size=4k
2025/07/20 16:47:05 WARNING: createBucket wasabi-benchmark-bucket error, ignoring BucketAlreadyOwnedByYou: Your previous request to create the named bucket succeeded and you already own it.
status code: 409, request id: 1853FCF7EA4857CF, host id: dd9025bab4ad464b049177c95eb6ebf374d3b3fd1af9251148b658df7ac2e3e8
Loop 1: PUT time 60.0 secs, objects = 123067, speed = 8MB/sec, 2051.0 operations/sec. Slowdowns = 0
Loop 1: GET time 60.0 secs, objects = 357694, speed = 23.3MB/sec, 5961.4 operations/sec. Slowdowns = 0
Loop 1: DELETE time 30.9 secs, 3986.0 deletes/sec. Slowdowns = 0
Wasabi benchmark program v2.0
Parameters: url=http://minio-ssd:9000, bucket=wasabi-benchmark-bucket, region=us-east-1, duration=60, threads=32, loops=1, size=4M
2025/07/20 16:49:36 WARNING: createBucket wasabi-benchmark-bucket error, ignoring BucketAlreadyOwnedByYou: Your previous request to create the named bucket succeeded and you already own it.
status code: 409, request id: 1853FD1B12EFDEBC, host id: dd9025bab4ad464b049177c95eb6ebf374d3b3fd1af9251148b658df7ac2e3e8
Loop 1: PUT time 60.1 secs, objects = 13131, speed = 873.3MB/sec, 218.3 operations/sec. Slowdowns = 0
Loop 1: GET time 60.1 secs, objects = 18630, speed = 1.2GB/sec, 310.2 operations/sec. Slowdowns = 0
Loop 1: DELETE time 1.7 secs, 7787.5 deletes/sec. Slowdowns = 0
Wasabi benchmark program v2.0
Parameters: url=http://minio-ssd:9000, bucket=wasabi-benchmark-bucket, region=us-east-1, duration=60, threads=32, loops=1, size=1M
2025/07/20 16:51:38 WARNING: createBucket wasabi-benchmark-bucket error, ignoring BucketAlreadyOwnedByYou: Your previous request to create the named bucket succeeded and you already own it.
status code: 409, request id: 1853FD3779E97644, host id: dd9025bab4ad464b049177c95eb6ebf374d3b3fd1af9251148b658df7ac2e3e8
Loop 1: PUT time 60.1 secs, objects = 40226, speed = 669.8MB/sec, 669.8 operations/sec. Slowdowns = 0
Loop 1: GET time 60.0 secs, objects = 85692, speed = 1.4GB/sec, 1427.8 operations/sec. Slowdowns = 0
Loop 1: DELETE time 4.7 secs, 8610.2 deletes/sec. Slowdowns = 0
Wasabi benchmark program v2.0
Parameters: url=http://minio-ssd:9000, bucket=wasabi-benchmark-bucket, region=us-east-1, duration=60, threads=32, loops=1, size=8k
2025/07/20 16:53:42 WARNING: createBucket wasabi-benchmark-bucket error, ignoring BucketAlreadyOwnedByYou: Your previous request to create the named bucket succeeded and you already own it.
status code: 409, request id: 1853FD5489FB2F1F, host id: dd9025bab4ad464b049177c95eb6ebf374d3b3fd1af9251148b658df7ac2e3e8
Loop 1: PUT time 60.0 secs, objects = 230985, speed = 30.1MB/sec, 3849.3 operations/sec. Slowdowns = 0
Loop 1: GET time 60.0 secs, objects = 435703, speed = 56.7MB/sec, 7261.1 operations/sec. Slowdowns = 0
Loop 1: DELETE time 25.8 secs, 8945.8 deletes/sec. Slowdowns = 0
Wasabi benchmark program v2.0
Parameters: url=http://minio-ssd:9000, bucket=wasabi-benchmark-bucket, region=us-east-1, duration=60, threads=32, loops=1, size=4k
2025/07/20 16:56:08 WARNING: createBucket wasabi-benchmark-bucket error, ignoring BucketAlreadyOwnedByYou: Your previous request to create the named bucket succeeded and you already own it.
status code: 409, request id: 1853FD7683B9BB96, host id: dd9025bab4ad464b049177c95eb6ebf374d3b3fd1af9251148b658df7ac2e3e8
Loop 1: PUT time 60.0 secs, objects = 228647, speed = 14.9MB/sec, 3810.4 operations/sec. Slowdowns = 0
Loop 1: GET time 60.0 secs, objects = 452412, speed = 29.5MB/sec, 7539.9 operations/sec. Slowdowns = 0
Loop 1: DELETE time 27.2 secs, 8418.0 deletes/sec. Slowdowns = 0
BIN static/assets/ctlog/minio_8kb_performance.png (Stored with Git LFS) Normal file
BIN static/assets/ctlog/nsa_slide.jpg (Stored with Git LFS) Normal file
80 static/assets/ctlog/seaweedfs-results.txt Normal file
@@ -0,0 +1,80 @@
Wasabi benchmark program v2.0
Parameters: url=http://minio-disk:8333, bucket=wasabi-benchmark-bucket, region=us-east-1, duration=60, threads=1, loops=1, size=1M
Loop 1: PUT time 60.0 secs, objects = 1994, speed = 33.2MB/sec, 33.2 operations/sec. Slowdowns = 0
Loop 1: GET time 60.0 secs, objects = 29243, speed = 487.4MB/sec, 487.4 operations/sec. Slowdowns = 0
Loop 1: DELETE time 2.8 secs, 701.4 deletes/sec. Slowdowns = 0
Wasabi benchmark program v2.0
Parameters: url=http://minio-disk:8333, bucket=wasabi-benchmark-bucket, region=us-east-1, duration=60, threads=1, loops=1, size=8k
Loop 1: PUT time 60.0 secs, objects = 13634, speed = 1.8MB/sec, 227.2 operations/sec. Slowdowns = 0
Loop 1: GET time 60.0 secs, objects = 32284, speed = 4.2MB/sec, 538.1 operations/sec. Slowdowns = 0
Loop 1: DELETE time 18.7 secs, 727.8 deletes/sec. Slowdowns = 0
Wasabi benchmark program v2.0
Parameters: url=http://minio-disk:8333, bucket=wasabi-benchmark-bucket, region=us-east-1, duration=60, threads=8, loops=1, size=1M
Loop 1: PUT time 62.0 secs, objects = 23733, speed = 382.8MB/sec, 382.8 operations/sec. Slowdowns = 0
Loop 1: GET time 60.0 secs, objects = 132708, speed = 2.2GB/sec, 2211.7 operations/sec. Slowdowns = 0
Loop 1: DELETE time 3.7 secs, 6490.1 deletes/sec. Slowdowns = 0
Wasabi benchmark program v2.0
Parameters: url=http://minio-disk:8333, bucket=wasabi-benchmark-bucket, region=us-east-1, duration=60, threads=8, loops=1, size=8k
Loop 1: PUT time 60.0 secs, objects = 199925, speed = 26MB/sec, 3331.9 operations/sec. Slowdowns = 0
Loop 1: GET time 60.0 secs, objects = 309937, speed = 40.4MB/sec, 5165.3 operations/sec. Slowdowns = 0
Loop 1: DELETE time 31.2 secs, 6406.0 deletes/sec. Slowdowns = 0
Wasabi benchmark program v2.0
Parameters: url=http://minio-ssd:8333, bucket=wasabi-benchmark-bucket, region=us-east-1, duration=60, threads=1, loops=1, size=1M
Loop 1: PUT time 60.0 secs, objects = 1975, speed = 32.9MB/sec, 32.9 operations/sec. Slowdowns = 0
Loop 1: GET time 60.0 secs, objects = 29898, speed = 498.3MB/sec, 498.3 operations/sec. Slowdowns = 0
Loop 1: DELETE time 2.7 secs, 726.6 deletes/sec. Slowdowns = 0
Wasabi benchmark program v2.0
Parameters: url=http://minio-ssd:8333, bucket=wasabi-benchmark-bucket, region=us-east-1, duration=60, threads=1, loops=1, size=8k
Loop 1: PUT time 60.0 secs, objects = 13662, speed = 1.8MB/sec, 227.7 operations/sec. Slowdowns = 0
Loop 1: GET time 60.0 secs, objects = 31865, speed = 4.1MB/sec, 531.1 operations/sec. Slowdowns = 0
Loop 1: DELETE time 18.8 secs, 726.9 deletes/sec. Slowdowns = 0
Wasabi benchmark program v2.0
Parameters: url=http://minio-ssd:8333, bucket=wasabi-benchmark-bucket, region=us-east-1, duration=60, threads=8, loops=1, size=1M
Loop 1: PUT time 60.0 secs, objects = 26622, speed = 443.6MB/sec, 443.6 operations/sec. Slowdowns = 0
Loop 1: GET time 60.0 secs, objects = 117688, speed = 1.9GB/sec, 1961.3 operations/sec. Slowdowns = 0
Loop 1: DELETE time 4.1 secs, 6499.5 deletes/sec. Slowdowns = 0
Wasabi benchmark program v2.0
Parameters: url=http://minio-ssd:8333, bucket=wasabi-benchmark-bucket, region=us-east-1, duration=60, threads=8, loops=1, size=8k
Loop 1: PUT time 60.0 secs, objects = 198238, speed = 25.8MB/sec, 3303.9 operations/sec. Slowdowns = 0
Loop 1: GET time 60.0 secs, objects = 312868, speed = 40.7MB/sec, 5214.3 operations/sec. Slowdowns = 0
Loop 1: DELETE time 30.8 secs, 6432.7 deletes/sec. Slowdowns = 0
Wasabi benchmark program v2.0
Parameters: url=http://minio-disk:8333, bucket=wasabi-benchmark-bucket, region=us-east-1, duration=60, threads=8, loops=1, size=4M
Loop 1: PUT time 60.1 secs, objects = 6220, speed = 414.2MB/sec, 103.6 operations/sec. Slowdowns = 0
Loop 1: GET time 60.0 secs, objects = 38773, speed = 2.5GB/sec, 646.1 operations/sec. Slowdowns = 0
Loop 1: DELETE time 0.9 secs, 6693.3 deletes/sec. Slowdowns = 0
Wasabi benchmark program v2.0
Parameters: url=http://minio-disk:8333, bucket=wasabi-benchmark-bucket, region=us-east-1, duration=60, threads=8, loops=1, size=4k
Loop 1: PUT time 60.0 secs, objects = 203033, speed = 13.2MB/sec, 3383.8 operations/sec. Slowdowns = 0
Loop 1: GET time 60.0 secs, objects = 300824, speed = 19.6MB/sec, 5013.6 operations/sec. Slowdowns = 0
Loop 1: DELETE time 31.1 secs, 6528.6 deletes/sec. Slowdowns = 0
Wasabi benchmark program v2.0
Parameters: url=http://minio-disk:8333, bucket=wasabi-benchmark-bucket, region=us-east-1, duration=60, threads=32, loops=1, size=4M
Loop 1: PUT time 60.3 secs, objects = 13181, speed = 874.2MB/sec, 218.6 operations/sec. Slowdowns = 0
Loop 1: GET time 60.1 secs, objects = 18575, speed = 1.2GB/sec, 309.3 operations/sec. Slowdowns = 0
Loop 1: DELETE time 0.8 secs, 17547.2 deletes/sec. Slowdowns = 0
Wasabi benchmark program v2.0
Parameters: url=http://minio-disk:8333, bucket=wasabi-benchmark-bucket, region=us-east-1, duration=60, threads=32, loops=1, size=4k
Loop 1: PUT time 60.0 secs, objects = 495006, speed = 32.2MB/sec, 8249.5 operations/sec. Slowdowns = 0
Loop 1: GET time 60.0 secs, objects = 465947, speed = 30.3MB/sec, 7765.4 operations/sec. Slowdowns = 0
Loop 1: DELETE time 41.4 secs, 11961.3 deletes/sec. Slowdowns = 0
Wasabi benchmark program v2.0
Parameters: url=http://minio-ssd:8333, bucket=wasabi-benchmark-bucket, region=us-east-1, duration=60, threads=8, loops=1, size=4M
Loop 1: PUT time 60.1 secs, objects = 7073, speed = 471MB/sec, 117.8 operations/sec. Slowdowns = 0
Loop 1: GET time 60.0 secs, objects = 31248, speed = 2GB/sec, 520.7 operations/sec. Slowdowns = 0
Loop 1: DELETE time 1.1 secs, 6576.1 deletes/sec. Slowdowns = 0
Wasabi benchmark program v2.0
Parameters: url=http://minio-ssd:8333, bucket=wasabi-benchmark-bucket, region=us-east-1, duration=60, threads=8, loops=1, size=4k
Loop 1: PUT time 60.0 secs, objects = 214387, speed = 14MB/sec, 3573.0 operations/sec. Slowdowns = 0
Loop 1: GET time 60.0 secs, objects = 297586, speed = 19.4MB/sec, 4959.7 operations/sec. Slowdowns = 0
Loop 1: DELETE time 32.9 secs, 6519.8 deletes/sec. Slowdowns = 0
Wasabi benchmark program v2.0
Parameters: url=http://minio-ssd:8333, bucket=wasabi-benchmark-bucket, region=us-east-1, duration=60, threads=32, loops=1, size=4M
Loop 1: PUT time 60.1 secs, objects = 14365, speed = 956MB/sec, 239.0 operations/sec. Slowdowns = 0
Loop 1: GET time 60.1 secs, objects = 18113, speed = 1.2GB/sec, 301.6 operations/sec. Slowdowns = 0
Loop 1: DELETE time 0.8 secs, 18655.8 deletes/sec. Slowdowns = 0
Wasabi benchmark program v2.0
Parameters: url=http://minio-ssd:8333, bucket=wasabi-benchmark-bucket, region=us-east-1, duration=60, threads=32, loops=1, size=4k
Loop 1: PUT time 60.0 secs, objects = 489736, speed = 31.9MB/sec, 8161.8 operations/sec. Slowdowns = 0
Loop 1: GET time 60.0 secs, objects = 460296, speed = 30MB/sec, 7671.2 operations/sec. Slowdowns = 0
Loop 1: DELETE time 41.0 secs, 11957.6 deletes/sec. Slowdowns = 0
116 static/assets/ctlog/seaweedfs.docker-compose.yml Normal file
@@ -0,0 +1,116 @@
# Test Setup for SeaweedFS with 6 disks, a Filer and an S3 API
#
# Use with the following .env file
# root@minio-ssd:~# cat /opt/seaweedfs/.env
# AWS_ACCESS_KEY_ID="hottentotten"
# AWS_SECRET_ACCESS_KEY="tentententoonstelling"

services:
  # Master
  master0:
    image: chrislusf/seaweedfs
    ports:
      - 9333:9333
      - 19333:19333
    command: "-v=1 master -volumeSizeLimitMB 100 -resumeState=false -ip=master0 -ip.bind=0.0.0.0 -port=9333 -mdir=/var/lib/seaweedfs/master"
    volumes:
      - ./data/master0:/var/lib/seaweedfs/master
    restart: unless-stopped

  # Volume Server 1
  volume1:
    image: chrislusf/seaweedfs
    command: 'volume -dataCenter=dc1 -rack=r1 -mserver="master0:9333" -port=8081 -preStopSeconds=1 -dir=/var/lib/seaweedfs/volume1'
    volumes:
      - /data/disk1:/var/lib/seaweedfs/volume1
    depends_on:
      - master0
    restart: unless-stopped

  # Volume Server 2
  volume2:
    image: chrislusf/seaweedfs
    command: 'volume -dataCenter=dc1 -rack=r1 -mserver="master0:9333" -port=8082 -preStopSeconds=1 -dir=/var/lib/seaweedfs/volume2'
    volumes:
      - /data/disk2:/var/lib/seaweedfs/volume2
    depends_on:
      - master0
    restart: unless-stopped

  # Volume Server 3
  volume3:
    image: chrislusf/seaweedfs
    command: 'volume -dataCenter=dc1 -rack=r1 -mserver="master0:9333" -port=8083 -preStopSeconds=1 -dir=/var/lib/seaweedfs/volume3'
    volumes:
      - /data/disk3:/var/lib/seaweedfs/volume3
    depends_on:
      - master0
    restart: unless-stopped

  # Volume Server 4
  volume4:
    image: chrislusf/seaweedfs
    command: 'volume -dataCenter=dc1 -rack=r1 -mserver="master0:9333" -port=8084 -preStopSeconds=1 -dir=/var/lib/seaweedfs/volume4'
    volumes:
      - /data/disk4:/var/lib/seaweedfs/volume4
    depends_on:
      - master0
    restart: unless-stopped

  # Volume Server 5
  volume5:
    image: chrislusf/seaweedfs
    command: 'volume -dataCenter=dc1 -rack=r1 -mserver="master0:9333" -port=8085 -preStopSeconds=1 -dir=/var/lib/seaweedfs/volume5'
    volumes:
      - /data/disk5:/var/lib/seaweedfs/volume5
    depends_on:
      - master0
    restart: unless-stopped

  # Volume Server 6
  volume6:
    image: chrislusf/seaweedfs
    command: 'volume -dataCenter=dc1 -rack=r1 -mserver="master0:9333" -port=8086 -preStopSeconds=1 -dir=/var/lib/seaweedfs/volume6'
    volumes:
      - /data/disk6:/var/lib/seaweedfs/volume6
    depends_on:
      - master0
    restart: unless-stopped

  # Filer
  filer:
    image: chrislusf/seaweedfs
    ports:
      - 8888:8888
      - 18888:18888
    command: 'filer -defaultReplicaPlacement=002 -iam -master="master0:9333"'
    volumes:
      - ./data/filer:/data
    depends_on:
      - master0
      - volume1
      - volume2
      - volume3
      - volume4
      - volume5
      - volume6
    restart: unless-stopped

  # S3 API
  s3:
    image: chrislusf/seaweedfs
    ports:
      - 8333:8333
    command: 's3 -filer="filer:8888" -ip.bind=0.0.0.0'
    env_file:
      - .env
    depends_on:
      - master0
      - volume1
      - volume2
      - volume3
      - volume4
      - volume5
      - volume6
      - filer
    restart: unless-stopped
BIN static/assets/ctlog/size_comparison_8t.png (Stored with Git LFS) Normal file
BIN static/assets/ctlog/stop-hammer-time.jpg (Stored with Git LFS) Normal file
BIN static/assets/ctlog/sunlight-logo.png (Stored with Git LFS) Normal file
BIN static/assets/ctlog/sunlight-test-s3.png (Stored with Git LFS) Normal file
BIN static/assets/ctlog/tesseract-logo.png (Stored with Git LFS) Normal file
BIN static/assets/minio/console-1.png (Stored with Git LFS) Normal file
BIN static/assets/minio/console-2.png (Stored with Git LFS) Normal file
BIN static/assets/minio/disks.png (Stored with Git LFS) Normal file
1633 static/assets/minio/minio-ec.svg Normal file
File diff suppressed because it is too large (90 KiB)
BIN static/assets/minio/minio-logo.png (Stored with Git LFS) Normal file
BIN static/assets/minio/nagios.png (Stored with Git LFS) Normal file
BIN static/assets/minio/nginx-logo.png (Stored with Git LFS) Normal file
BIN static/assets/minio/rack-2.png (Stored with Git LFS) Normal file
BIN static/assets/minio/rack.png (Stored with Git LFS) Normal file
BIN static/assets/minio/restic-logo.png (Stored with Git LFS) Normal file
1 themes/hugo-theme-ipng/layouts/shortcodes/boldcolor.html Normal file
@@ -0,0 +1 @@
<span style="color: {{ .Get "color" }}; font-weight: bold;">{{ .Inner }}</span>