Compare commits

...

6 Commits

| SHA1 | Message | CI (drone/push) | Date |
|------|---------|-----------------|------|
| a76abc331f | A few typo fixes, h/t jeroen | passing | 2025-06-05 20:06:34 +00:00 |
| 44deb34685 | Typo fixes, h/t Jeroen | passing | 2025-06-05 20:04:11 +00:00 |
| ca46bcf6d5 | Add Minio #2 | passing | 2025-06-01 16:39:48 +02:00 |
| 5042f822ef | Minio Article #1 | passing | 2025-06-01 12:53:16 +02:00 |
| fdb77838b8 | Rewrite github.com to git.ipng.ch for popular repos | passing | 2025-05-04 21:54:16 +02:00 |
| 6d3f4ac206 | Some readability changes | | 2025-05-04 21:50:07 +02:00 |
33 changed files with 2900 additions and 52 deletions

View File

@ -89,7 +89,7 @@ lcp lcp-sync off
```
The prep work for the rest of the interface syncer starts with this
[[commit](https://github.com/pimvanpelt/lcpng/commit/2d00de080bd26d80ce69441b1043de37e0326e0a)], and
[[commit](https://git.ipng.ch/ipng/lcpng/commit/2d00de080bd26d80ce69441b1043de37e0326e0a)], and
for the rest of this blog post, the behavior will be in the 'on' position.
### Change interface: state
@ -120,7 +120,7 @@ the state it was. I did notice that you can't bring up a sub-interface if its pa
is down, which I found counterintuitive, but that's neither here nor there.
All of this is to say that we have to be careful when copying state forward, because as
this [[commit](https://github.com/pimvanpelt/lcpng/commit/7c15c84f6c4739860a85c599779c199cb9efef03)]
this [[commit](https://git.ipng.ch/ipng/lcpng/commit/7c15c84f6c4739860a85c599779c199cb9efef03)]
shows, issuing `set int state ... up` on an interface, won't touch its sub-interfaces in VPP, but
the subsequent netlink message to bring the _LIP_ for that interface up, **will** update the
children, thus desynchronising Linux and VPP: Linux will have interface **and all its
@ -128,7 +128,7 @@ sub-interfaces** up unconditionally; VPP will have the interface up and its sub-
whatever state they were before.
To address this, a second
[[commit](https://github.com/pimvanpelt/lcpng/commit/a3dc56c01461bdffcac8193ead654ae79225220f)] was
[[commit](https://git.ipng.ch/ipng/lcpng/commit/a3dc56c01461bdffcac8193ead654ae79225220f)] was
needed. I'm not too sure I want to keep this behavior, but for now, it results in an intuitive
end-state, which is that all interfaces states are exactly the same between Linux and VPP.
@ -157,7 +157,7 @@ DBGvpp# set int state TenGigabitEthernet3/0/0 up
### Change interface: MTU
Finally, a straight forward
[[commit](https://github.com/pimvanpelt/lcpng/commit/39bfa1615fd1cafe5df6d8fc9d34528e8d3906e2)], or
[[commit](https://git.ipng.ch/ipng/lcpng/commit/39bfa1615fd1cafe5df6d8fc9d34528e8d3906e2)], or
so I thought. When the MTU changes in VPP (with `set interface mtu packet N <int>`), there is
callback that can be registered which copies this into the _LIP_. I did notice a specific corner
case: In VPP, a sub-interface can have a larger MTU than its parent. In Linux, this cannot happen,
@ -179,7 +179,7 @@ higher than that, perhaps logging an error explaining why. This means two things
1. Any change in VPP of a parent MTU should ensure all children are clamped to at most that.
I addressed the issue in this
[[commit](https://github.com/pimvanpelt/lcpng/commit/79a395b3c9f0dae9a23e6fbf10c5f284b1facb85)].
[[commit](https://git.ipng.ch/ipng/lcpng/commit/79a395b3c9f0dae9a23e6fbf10c5f284b1facb85)].
### Change interface: IP Addresses
@ -199,7 +199,7 @@ VPP into the companion Linux devices:
_LIP_ with `lcp_itf_set_interface_addr()`.
This means with this
[[commit](https://github.com/pimvanpelt/lcpng/commit/f7e1bb951d648a63dfa27d04ded0b6261b9e39fe)], at
[[commit](https://git.ipng.ch/ipng/lcpng/commit/f7e1bb951d648a63dfa27d04ded0b6261b9e39fe)], at
any time a new _LIP_ is created, the IPv4 and IPv6 address on the VPP interface are fully copied
over by the third change, while at runtime, new addresses can be set/removed as well by the first
and second change.

View File

@ -100,7 +100,7 @@ linux-cp {
Based on this config, I set the startup default in `lcp_set_lcp_auto_subint()`, but I realize that
an administrator may want to turn it on/off at runtime, too, so I add a CLI getter/setter that
interacts with the flag in this [[commit](https://github.com/pimvanpelt/lcpng/commit/d23aab2d95aabcf24efb9f7aecaf15b513633ab7)]:
interacts with the flag in this [[commit](https://git.ipng.ch/ipng/lcpng/commit/d23aab2d95aabcf24efb9f7aecaf15b513633ab7)]:
```
DBGvpp# show lcp
@ -116,11 +116,11 @@ lcp lcp-sync off
```
The prep work for the rest of the interface syncer starts with this
[[commit](https://github.com/pimvanpelt/lcpng/commit/2d00de080bd26d80ce69441b1043de37e0326e0a)], and
[[commit](https://git.ipng.ch/ipng/lcpng/commit/2d00de080bd26d80ce69441b1043de37e0326e0a)], and
for the rest of this blog post, the behavior will be in the 'on' position.
The code for the configuration toggle is in this
[[commit](https://github.com/pimvanpelt/lcpng/commit/934446dcd97f51c82ddf133ad45b61b3aae14b2d)].
[[commit](https://git.ipng.ch/ipng/lcpng/commit/934446dcd97f51c82ddf133ad45b61b3aae14b2d)].
### Auto create/delete sub-interfaces
@ -145,7 +145,7 @@ I noticed that interface deletion had a bug (one that I fell victim to as well:
remove the netlink device in the correct network namespace), which I fixed.
The code for the auto create/delete and the bugfix is in this
[[commit](https://github.com/pimvanpelt/lcpng/commit/934446dcd97f51c82ddf133ad45b61b3aae14b2d)].
[[commit](https://git.ipng.ch/ipng/lcpng/commit/934446dcd97f51c82ddf133ad45b61b3aae14b2d)].
### Further Work

View File

@ -154,7 +154,7 @@ For now, `lcp_nl_dispatch()` just throws the message away after logging it with
a function that will come in very useful as I start to explore all the different Netlink message types.
The code that forms the basis of our Netlink Listener lives in [[this
commit](https://github.com/pimvanpelt/lcpng/commit/c4e3043ea143d703915239b2390c55f7b6a9b0b1)] and
commit](https://git.ipng.ch/ipng/lcpng/commit/c4e3043ea143d703915239b2390c55f7b6a9b0b1)] and
specifically, here I want to call out I was not the primary author, I worked off of Matt and Neale's
awesome work in this pending [Gerrit](https://gerrit.fd.io/r/c/vpp/+/31122).
@ -182,7 +182,7 @@ Linux interface VPP is not aware of. But, if I can find the _LIP_, I can convert
add or remove the ip4/ip6 neighbor adjacency.
The code for this first Netlink message handler lives in this
[[commit](https://github.com/pimvanpelt/lcpng/commit/30bab1d3f9ab06670fbef2c7c6a658e7b77f7738)]. An
[[commit](https://git.ipng.ch/ipng/lcpng/commit/30bab1d3f9ab06670fbef2c7c6a658e7b77f7738)]. An
ironic insight is that after writing the code, I don't think any of it will be necessary, because
the interface plugin will already copy ARP and IPv6 ND packets back and forth and itself update its
neighbor adjacency tables; but I'm leaving the code in for now.
@ -197,7 +197,7 @@ it or remove it, and if there are no link-local addresses left, disable IPv6 on
There's also a few multicast routes to add (notably 224.0.0.0/24 and ff00::/8, all-local-subnet).
The code for IP address handling is in this
[[commit]](https://github.com/pimvanpelt/lcpng/commit/87742b4f541d389e745f0297d134e34f17b5b485), but
[[commit]](https://git.ipng.ch/ipng/lcpng/commit/87742b4f541d389e745f0297d134e34f17b5b485), but
when I took it out for a spin, I noticed something curious, looking at the log lines that are
generated for the following sequence:
@ -236,7 +236,7 @@ interface and directly connected route addition/deletion is slightly different i
So, I decide to take a little shortcut -- if an addition returns "already there", or a deletion returns
"no such entry", I'll just consider it a successful addition and deletion respectively, saving my eyes
from being screamed at by this red error message. I changed that in this
[[commit](https://github.com/pimvanpelt/lcpng/commit/d63fbd8a9a612d038aa385e79a57198785d409ca)],
[[commit](https://git.ipng.ch/ipng/lcpng/commit/d63fbd8a9a612d038aa385e79a57198785d409ca)],
turning this situation in a friendly green notice instead.
### Netlink: Link (existing)
@ -267,7 +267,7 @@ To avoid this loop, I temporarily turn off `lcp-sync` just before handling a bat
turn it back to its original state when I'm done with that.
The code for all/del of existing links is in this
[[commit](https://github.com/pimvanpelt/lcpng/commit/e604dd34784e029b41a47baa3179296d15b0632e)].
[[commit](https://git.ipng.ch/ipng/lcpng/commit/e604dd34784e029b41a47baa3179296d15b0632e)].
### Netlink: Link (new)
@ -276,7 +276,7 @@ doesn't have a _LIP_ for, but specifically describes a VLAN interface? Well, th
is trying to create a new sub-interface. And supporting that operation would be super cool, so let's go!
Using the earlier placeholder hint in `lcp_nl_link_add()` (see the previous
[[commit](https://github.com/pimvanpelt/lcpng/commit/e604dd34784e029b41a47baa3179296d15b0632e)]),
[[commit](https://git.ipng.ch/ipng/lcpng/commit/e604dd34784e029b41a47baa3179296d15b0632e)]),
I know that I've gotten a NEWLINK request but the Linux ifindex doesn't have a _LIP_. This could be
because the interface is entirely foreign to VPP, for example somebody created a dummy interface or
a VLAN sub-interface on one:
@ -331,7 +331,7 @@ a boring `<phy>.<subid>` name.
Alright, without further ado, the code for the main innovation here, the implementation of
`lcp_nl_link_add_vlan()`, is in this
[[commit](https://github.com/pimvanpelt/lcpng/commit/45f408865688eb7ea0cdbf23aa6f8a973be49d1a)].
[[commit](https://git.ipng.ch/ipng/lcpng/commit/45f408865688eb7ea0cdbf23aa6f8a973be49d1a)].
## Results

View File

@ -118,7 +118,7 @@ or Virtual Routing/Forwarding domains). So first, I need to add these:
All of this code was heavily inspired by the pending [[Gerrit](https://gerrit.fd.io/r/c/vpp/+/31122)]
but a few finishing touches were added, and wrapped up in this
[[commit](https://github.com/pimvanpelt/lcpng/commit/7a76498277edc43beaa680e91e3a0c1787319106)].
[[commit](https://git.ipng.ch/ipng/lcpng/commit/7a76498277edc43beaa680e91e3a0c1787319106)].
### Deletion
@ -459,7 +459,7 @@ it as 'unreachable' rather than deleting it. These are *additions* which have a
but with an interface index of 1 (which, in Netlink, is 'lo'). This makes VPP intermittently crash, so I
currently commented this out, while I gain better understanding. Result: blackhole/unreachable/prohibit
specials can not be set using the plugin. Beware!
(disabled in this [[commit](https://github.com/pimvanpelt/lcpng/commit/7c864ed099821f62c5be8cbe9ed3f4dd34000a42)]).
(disabled in this [[commit](https://git.ipng.ch/ipng/lcpng/commit/7c864ed099821f62c5be8cbe9ed3f4dd34000a42)]).
## Credits

View File

@ -88,7 +88,7 @@ stat['/if/rx-miss'][:, 1].sum() - returns the sum of packet counters for
```
Alright, so let's grab that file and refactor it into a small library for me to use, I do
this in [[this commit](https://github.com/pimvanpelt/vpp-snmp-agent/commit/51eee915bf0f6267911da596b41a4475feaf212e)].
this in [[this commit](https://git.ipng.ch/ipng/vpp-snmp-agent/commit/51eee915bf0f6267911da596b41a4475feaf212e)].
### VPP's API
@ -159,7 +159,7 @@ idx=19 name=tap4 mac=02:fe:17:06:fc:af mtu=9000 flags=3
So I added a little abstration with some error handling and one main function
to return interfaces as a Python dictionary of those `sw_interface_details`
tuples in [[this commit](https://github.com/pimvanpelt/vpp-snmp-agent/commit/51eee915bf0f6267911da596b41a4475feaf212e)].
tuples in [[this commit](https://git.ipng.ch/ipng/vpp-snmp-agent/commit/51eee915bf0f6267911da596b41a4475feaf212e)].
### AgentX
@ -207,9 +207,9 @@ once asked with `GetPDU` or `GetNextPDU` requests, by issuing a corresponding `R
to the SNMP server -- it takes care of all the rest!
The resulting code is in [[this
commit](https://github.com/pimvanpelt/vpp-snmp-agent/commit/8c9c1e2b4aa1d40a981f17581f92bba133dd2c29)]
commit](https://git.ipng.ch/ipng/vpp-snmp-agent/commit/8c9c1e2b4aa1d40a981f17581f92bba133dd2c29)]
but you can also check out the whole thing on
[[Github](https://github.com/pimvanpelt/vpp-snmp-agent)].
[[Github](https://git.ipng.ch/ipng/vpp-snmp-agent)].
### Building

View File

@ -480,7 +480,7 @@ is to say, those packets which were destined to any IP address configured on the
plane. Any traffic going _through_ VPP will never be seen by Linux! So, I'll have to be
clever and count this traffic by polling VPP instead. This was the topic of my previous
[VPP Part 6]({{< ref "2021-09-10-vpp-6" >}}) about the SNMP Agent. All of that code
was released to [Github](https://github.com/pimvanpelt/vpp-snmp-agent), notably there's
was released to [Github](https://git.ipng.ch/ipng/vpp-snmp-agent), notably there's
a hint there for an `snmpd-dataplane.service` and a `vpp-snmp-agent.service`, including
the compiled binary that reads from VPP and feeds this to SNMP.

View File

@ -62,7 +62,7 @@ plugins:
or route, or the system receiving ARP or IPv6 neighbor request/reply from neighbors), and applying
these events to the VPP dataplane.
I've published the code on [Github](https://github.com/pimvanpelt/lcpng/) and I am targeting a release
I've published the code on [Github](https://git.ipng.ch/ipng/lcpng/) and I am targeting a release
in upstream VPP, hoping to make the upcoming 22.02 release in February 2022. I have a lot of ground to
cover, but I will note that the plugin has been running in production in [AS8298]({{< ref "2021-02-27-network" >}})
since Sep'21 and no crashes related to LinuxCP have been observed.
@ -195,7 +195,7 @@ So grab a cup of tea, while we let Rhino stretch its legs, ehh, CPUs ...
pim@rhino:~$ mkdir -p ~/src
pim@rhino:~$ cd ~/src
pim@rhino:~/src$ sudo apt install libmnl-dev
pim@rhino:~/src$ git clone https://github.com/pimvanpelt/lcpng.git
pim@rhino:~/src$ git clone https://git.ipng.ch/ipng/lcpng.git
pim@rhino:~/src$ git clone https://gerrit.fd.io/r/vpp
pim@rhino:~/src$ ln -s ~/src/lcpng ~/src/vpp/src/plugins/lcpng
pim@rhino:~/src$ cd ~/src/vpp

View File

@ -33,7 +33,7 @@ In this first post, let's take a look at tablestakes: writing a YAML specificati
configuration elements of VPP, and then ensures that the YAML file is both syntactically as well as
semantically correct.
**Note**: Code is on [my Github](https://github.com/pimvanpelt/vppcfg), but it's not quite ready for
**Note**: Code is on [my Github](https://git.ipng.ch/ipng/vppcfg), but it's not quite ready for
prime-time yet. Take a look, and engage with us on GitHub (pull requests preferred over issues themselves)
or reach out by [contacting us](/s/contact/).
@ -348,7 +348,7 @@ to mess up my (or your!) VPP router by feeding it garbage, so the lions' share o
has been to assert the YAML file is both syntactically and semantically valid.
In the mean time, you can take a look at my code on [GitHub](https://github.com/pimvanpelt/vppcfg), but to
In the mean time, you can take a look at my code on [GitHub](https://git.ipng.ch/ipng/vppcfg), but to
whet your appetite, here's a hefty configuration that demonstrates all implemented types:
```

View File

@ -32,7 +32,7 @@ the configuration to the dataplane. Welcome to `vppcfg`!
In this second post of the series, I want to talk a little bit about how planning a path from a running
configuration to a desired new configuration might look like.
**Note**: Code is on [my Github](https://github.com/pimvanpelt/vppcfg), but it's not quite ready for
**Note**: Code is on [my Github](https://git.ipng.ch/ipng/vppcfg), but it's not quite ready for
prime-time yet. Take a look, and engage with us on GitHub (pull requests preferred over issues themselves)
or reach out by [contacting us](/s/contact/).

View File

@ -171,12 +171,12 @@ GigabitEthernet1/0/0 1 up GigabitEthernet1/0/0
After this exploratory exercise, I have learned enough about the hardware to be able to take the
Fitlet2 out for a spin. To configure the VPP instance, I turn to
[[vppcfg](https://github.com/pimvanpelt/vppcfg)], which can take a YAML configuration file
[[vppcfg](https://git.ipng.ch/ipng/vppcfg)], which can take a YAML configuration file
describing the desired VPP configuration, and apply it safely to the running dataplane using the VPP
API. I've written a few more posts on how it does that, notably on its [[syntax]({{< ref "2022-03-27-vppcfg-1" >}})]
and its [[planner]({{< ref "2022-04-02-vppcfg-2" >}})]. A complete
configuration guide on vppcfg can be found
[[here](https://github.com/pimvanpelt/vppcfg/blob/main/docs/config-guide.md)].
[[here](https://git.ipng.ch/ipng/vppcfg/blob/main/docs/config-guide.md)].
```
pim@fitlet:~$ sudo dpkg -i {lib,}vpp*23.06*deb

View File

@ -185,7 +185,7 @@ forgetful chipmunk-sized brain!), so here, I'll only recap what's already writte
**1. BUILD:** For the first step, the build is straight forward, and yields a VPP instance based on
`vpp-ext-deps_23.06-1` at version `23.06-rc0~71-g182d2b466`, which contains my
[[LCPng](https://github.com/pimvanpelt/lcpng.git)] plugin. I then copy the packages to the router.
[[LCPng](https://git.ipng.ch/ipng/lcpng.git)] plugin. I then copy the packages to the router.
The router has an E-2286G CPU @ 4.00GHz with 6 cores and 6 hyperthreads. There's a really handy tool
called `likwid-topology` that can show how the L1, L2 and L3 cache lines up with respect to CPU
cores. Here I learn that CPU (0+6) and (1+7) share L1 and L2 cache -- so I can conclude that 0-5 are
@ -351,7 +351,7 @@ in `vppcfg`:
* When I create the initial `--novpp` config, there's a bug in `vppcfg` where I incorrectly
reference a dataplane object which I haven't initialized (because with `--novpp` the tool
will not contact the dataplane at all. That one was easy to fix, which I did in [[this
commit](https://github.com/pimvanpelt/vppcfg/commit/0a0413927a0be6ed3a292a8c336deab8b86f5eee)]).
commit](https://git.ipng.ch/ipng/vppcfg/commit/0a0413927a0be6ed3a292a8c336deab8b86f5eee)]).
After that small detour, I can now proceed to configure the dataplane by offering the resulting
VPP commands, like so:
@ -573,7 +573,7 @@ see is that which is destined to the controlplane (eg, to one of the IPv4 or IPv
multicast/broadcast groups that they are participating in), so things like tcpdump or SNMP won't
really work.
However, due to my [[vpp-snmp-agent](https://github.com/pimvanpelt/vpp-snmp-agent.git)], which is
However, due to my [[vpp-snmp-agent](https://git.ipng.ch/ipng/vpp-snmp-agent.git)], which is
feeding as an AgentX behind an snmpd that in turn is running in the `dataplane` namespace, SNMP scrapes
work as they did before, albeit with a few different interface names.

View File

@ -14,7 +14,7 @@ performance and versatility. For those of us who have used Cisco IOS/XR devices,
_ASR_ (aggregation service router), VPP will look and feel quite familiar as many of the approaches
are shared between the two.
I've been working on the Linux Control Plane [[ref](https://github.com/pimvanpelt/lcpng)], which you
I've been working on the Linux Control Plane [[ref](https://git.ipng.ch/ipng/lcpng)], which you
can read all about in my series on VPP back in 2021:
[![DENOG14](/assets/vpp-stats/denog14-thumbnail.png){: style="width:300px; float: right; margin-left: 1em;"}](https://video.ipng.ch/w/erc9sAofrSZ22qjPwmv6H4)
@ -70,7 +70,7 @@ answered by a Response PDU.
Using parts of a Python Agentx library written by GitHub user hosthvo
[[ref](https://github.com/hosthvo/pyagentx)], I tried my hands at writing one of these AgentX's.
The resulting source code is on [[GitHub](https://github.com/pimvanpelt/vpp-snmp-agent)]. That's the
The resulting source code is on [[GitHub](https://git.ipng.ch/ipng/vpp-snmp-agent)]. That's the
one that's running in production ever since I started running VPP routers at IPng Networks AS8298.
After the _AgentX_ exposes the dataplane interfaces and their statistics into _SNMP_, an open source
monitoring tool such as LibreNMS [[ref](https://librenms.org/)] can discover the routers and draw
@ -126,7 +126,7 @@ for any interface created in the dataplane.
I wish I were good at Go, but I never really took to the language. I'm pretty good at Python, but
sorting through the stats segment isn't super quick as I've already noticed in the Python3 based
[[VPP SNMP Agent](https://github.com/pimvanpelt/vpp-snmp-agent)]. I'm probably the world's least
[[VPP SNMP Agent](https://git.ipng.ch/ipng/vpp-snmp-agent)]. I'm probably the world's least
terrible C programmer, so maybe I can take a look at the VPP Stats Client and make sense of it. Luckily,
there's an example already in `src/vpp/app/vpp_get_stats.c` and it reveals the following pattern:

View File

@ -19,7 +19,7 @@ same time keep an IPng Site Local network with IPv4 and IPv6 that is separate fr
based on hardware/silicon based forwarding at line rate and high availability. You can read all
about my Centec MPLS shenanigans in [[this article]({{< ref "2023-03-11-mpls-core" >}})].
Ever since the release of the Linux Control Plane [[ref](https://github.com/pimvanpelt/lcpng)]
Ever since the release of the Linux Control Plane [[ref](https://git.ipng.ch/ipng/lcpng)]
plugin in VPP, folks have asked "What about MPLS?" -- I have never really felt the need to go this
rabbit hole, because I figured that in this day and age, higher level IP protocols that do tunneling
are just as performant, and a little bit less of an 'art' to get right. For example, the Centec

View File

@ -459,6 +459,6 @@ and VPP, and the overall implementation before attempting to use in production.
we got at least some of this right, but testing and runtime experience will tell.
I will be silently porting the change into my own copy of the Linux Controlplane called lcpng on
[[GitHub](https://github.com/pimvanpelt/lcpng.git)]. If you'd like to test this - reach out to the VPP
[[GitHub](https://git.ipng.ch/ipng/lcpng.git)]. If you'd like to test this - reach out to the VPP
Developer [[mailinglist](mailto:vpp-dev@lists.fd.io)] any time!

View File

@ -385,5 +385,5 @@ and VPP, and the overall implementation before attempting to use in production.
we got at least some of this right, but testing and runtime experience will tell.
I will be silently porting the change into my own copy of the Linux Controlplane called lcpng on
[[GitHub](https://github.com/pimvanpelt/lcpng.git)]. If you'd like to test this - reach out to the VPP
[[GitHub](https://git.ipng.ch/ipng/lcpng.git)]. If you'd like to test this - reach out to the VPP
Developer [[mailinglist](mailto:vpp-dev@lists.fd.io)] any time!

View File

@ -304,7 +304,7 @@ Gateway, just to show a few of the more advanced features of VPP. For me, this t
line of thinking: classifiers. This extract/match/act pattern can be used in policers, ACLs and
arbitrary traffic redirection through VPP's directed graph (eg. selecting a next node for
processing). I'm going to deep-dive into this classifier behavior in an upcoming article, and see
how I might add this to [[vppcfg](https://github.com/pimvanpelt/vppcfg.git)], because I think it
how I might add this to [[vppcfg](https://git.ipng.ch/ipng/vppcfg.git)], because I think it
would be super powerful to abstract away the rather complex underlying API into something a little
bit more ... user friendly. Stay tuned! :)

View File

@ -359,7 +359,7 @@ does not have an IPv4 address. Except -- I'm bending the rules a little bit by d
There's an internal function `ip4_sw_interface_enable_disable()` which is called to enable IPv4
processing on an interface once the first IPv4 address is added. So my first fix is to force this to
be enabled for any interface that is exposed via Linux Control Plane, notably in `lcp_itf_pair_create()`
[[here](https://github.com/pimvanpelt/lcpng/blob/main/lcpng_interface.c#L777)].
[[here](https://git.ipng.ch/ipng/lcpng/blob/main/lcpng_interface.c#L777)].
This approach is partially effective:
@ -500,7 +500,7 @@ which is unnumbered. Because I don't know for sure if everybody would find this
I make sure to guard the behavior behind a backwards compatible configuration option.
If you're curious, please take a look at the change in my [[GitHub
repo](https://github.com/pimvanpelt/lcpng/commit/a960d64a87849d312b32d9432ffb722672c14878)], in
repo](https://git.ipng.ch/ipng/lcpng/commit/a960d64a87849d312b32d9432ffb722672c14878)], in
which I:
1. add a new configuration option, `lcp-sync-unnumbered`, which defaults to `on`. That would be
what the plugin would do in the normal case: copy forward these borrowed IP addresses to Linux.

View File

@ -147,7 +147,7 @@ With all of that, I am ready to demonstrate two working solutions now. I first c
Ondrej's [[commit](https://gitlab.nic.cz/labs/bird/-/commit/280daed57d061eb1ebc89013637c683fe23465e8)].
Then, I compile VPP with my pending [[gerrit](https://gerrit.fd.io/r/c/vpp/+/40482)]. Finally,
to demonstrate how `update_loopback_addr()` might work, I compile `lcpng` with my previous
[[commit](https://github.com/pimvanpelt/lcpng/commit/a960d64a87849d312b32d9432ffb722672c14878)],
[[commit](https://git.ipng.ch/ipng/lcpng/commit/a960d64a87849d312b32d9432ffb722672c14878)],
which allows me to inhibit copying forward addresses from VPP to Linux, when using _unnumbered_
interfaces.

View File

@ -250,10 +250,10 @@ remove the IPv4 and IPv6 addresses from the <span style='color:red;font-weight:b
routers in Br&uuml;ttisellen. They are directly connected, and if anything goes wrong, I can walk
over and rescue them. Sounds like a safe way to start!
I quickly add the ability for [[vppcfg](https://github.com/pimvanpelt/vppcfg)] to configure
I quickly add the ability for [[vppcfg](https://git.ipng.ch/ipng/vppcfg)] to configure
_unnumbered_ interfaces. In VPP, these are interfaces that don't have an IPv4 or IPv6 address of
their own, but they borrow one from another interface. If you're curious, you can take a look at the
[[User Guide](https://github.com/pimvanpelt/vppcfg/blob/main/docs/config-guide.md#interfaces)] on
[[User Guide](https://git.ipng.ch/ipng/vppcfg/blob/main/docs/config-guide.md#interfaces)] on
GitHub.
Looking at their `vppcfg` files, the change is actually very easy, taking as an example the
@ -291,7 +291,7 @@ interface.
In the article, you'll see that discussed as _Solution 2_, and it includes a bit of rationale why I
find this better. I implemented it in this
[[commit](https://github.com/pimvanpelt/lcpng/commit/a960d64a87849d312b32d9432ffb722672c14878)], in
[[commit](https://git.ipng.ch/ipng/lcpng/commit/a960d64a87849d312b32d9432ffb722672c14878)], in
case you're curious, and the commandline keyword is `lcp lcp-sync-unnumbered off` (the default is
_on_).

View File

@ -230,7 +230,7 @@ does not have any form of configuration persistence and that's deliberate. VPP's
programmable dataplane, and explicitly has left the programming and configuration as an exercise for
integrators. I have written a Python project that takes a YAML file as input and uses it to
configure (and reconfigure, on the fly) the dataplane automatically, called
[[VPPcfg](https://github.com/pimvanpelt/vppcfg.git)]. Previously, I wrote some implementation thoughts
[[VPPcfg](https://git.ipng.ch/ipng/vppcfg.git)]. Previously, I wrote some implementation thoughts
on its [[datamodel]({{< ref 2022-03-27-vppcfg-1 >}})] and its [[operations]({{< ref 2022-04-02-vppcfg-2
>}})] so I won't repeat that here. Instead, I will just show the configuration:

View File

@ -19,11 +19,11 @@ performance almost the same as on bare metal. But did you know that VPP can also
The other day I joined the [[ZANOG'25](https://nog.net.za/event1/zanog25/)] in Durban, South Africa.
One of the presenters was Nardus le Roux of Nokia, and he showed off a project called
[[Containerlab](https://containerlab.dev/)], which provides a CLI for orchestrating and managing
container-based networking labs. It starts the containers, builds a virtual wiring between them to
create lab topologies of users choice and manages labs lifecycle.
container-based networking labs. It starts the containers, builds virtual wiring between them to
create lab topologies of users' choice and manages the lab lifecycle.
Quite regularly I am asked 'when will you add VPP to Containerlab?', but at ZANOG I made a promise
to actually add them. In the previous [[article]({{< ref 2025-05-03-containerlab-1.md >}})], I took
to actually add it. In my previous [[article]({{< ref 2025-05-03-containerlab-1.md >}})], I took
a good look at VPP as a dockerized container. In this article, I'll explore how to make such a
container run in Containerlab!
@ -49,7 +49,7 @@ RUN apt-get update && apt-get -y install vpp vpp-plugin-core && apt-get clean
# Build vppcfg
RUN pip install --break-system-packages build netaddr yamale argparse pyyaml ipaddress
RUN git clone https://github.com/pimvanpelt/vppcfg.git && cd vppcfg && python3 -m build && \
RUN git clone https://git.ipng.ch/ipng/vppcfg.git && cd vppcfg && python3 -m build && \
pip install --break-system-packages dist/vppcfg-*-py3-none-any.whl
# Config files

View File

@ -0,0 +1,713 @@
---
date: "2025-05-28T22:07:23Z"
title: 'Case Study: Minio S3 - Part 1'
---
{{< image float="right" src="/assets/minio/minio-logo.png" alt="MinIO Logo" width="6em" >}}
# Introduction
Amazon Simple Storage Service (Amazon S3) is an object storage service offering industry-leading
scalability, data availability, security, and performance. Millions of customers of all sizes and
industries store, manage, analyze, and protect any amount of data for virtually any use case, such
as data lakes, cloud-native applications, and mobile apps. With cost-effective storage classes and
easy-to-use management features, you can optimize costs, organize and analyze data, and configure
fine-tuned access controls to meet specific business and compliance requirements.
Amazon's S3 became the _de facto_ standard object storage system, and several fully open
source implementations of the protocol exist. One of them is MinIO: designed to allow enterprises to
consolidate all of their data in a single, private cloud namespace. Architected using the same
principles as the hyperscalers, it delivers performance at scale at a fraction of the cost
compared to the public cloud.
IPng Networks is an Internet Service Provider, but I also dabble in self-hosting things, for
example [[PeerTube](https://video.ipng.ch/)], [[Mastodon](https://ublog.tech/)],
[[Immich](https://photos.ipng.ch/)], [[Pixelfed](https://pix.ublog.tech/)] and of course
[[Hugo](https://ipng.ch/)]. These services all have one thing in common: they tend to use lots of
storage when they grow. At IPng Networks, all hypervisors ship with enterprise SAS flash drives,
mostly 1.92TB and 3.84TB. Scaling up each of these services, and backing them up safely, can be
quite the headache.
This article is for the storage buffs. I'll set up a set of distributed MinIO nodes from scratch.
## Physical
{{< image float="right" src="/assets/minio/disks.png" alt="MinIO Disks" width="16em" >}}
I'll start with the basics. I still have a few Dell R720 servers lying around; they are getting a
bit older but still have 24 cores and 64GB of memory. First I need to get myself some disks. I order
36 pcs of 16TB SATA enterprise disks, a mixture of Seagate EXOS and Toshiba MG series. I once
learned (the hard way) that buying a big stack of disks from one production run is a risk - so I'll
mix and match the drives.
Three trays of caddies and a melted credit card later, I have 576TB of SATA disks safely in hand.
Each machine will carry 192TB of raw storage. The nice thing about this chassis is that Dell can
ship them with 12x 3.5" SAS slots in the front, and 2x 2.5" SAS slots in the rear of the chassis.
So I'll install Debian Bookworm on one small 480G SSD in software RAID1.
### Cloning an install
I have three identical machines so in total I'll want six of these SSDs. I temporarily screw the
other five in 3.5" drive caddies and plug them into the first installed Dell, which I've called
`minio-proto`:
```
pim@minio-proto:~$ for i in b c d e f; do
  sudo dd if=/dev/sda of=/dev/sd${i} bs=512 count=1;
  sudo mdadm --manage /dev/md0 --add /dev/sd${i}1
done
pim@minio-proto:~$ sudo mdadm --manage /dev/md0 --grow -n 6
pim@minio-proto:~$ watch cat /proc/mdstat
pim@minio-proto:~$ for i in a b c d e f; do
  sudo grub-install /dev/sd$i
done
```
{{< image float="right" src="/assets/minio/rack.png" alt="MinIO Rack" width="16em" >}}
The first command takes my installed disk, `/dev/sda`, and copies the first sector over to the other
five. This will give them the same partition table. Next, I'll add the first partition of each disk
to the raidset. Then, I'll expand the raidset to have six members, after which the kernel starts a
recovery process that syncs the newly added partitions to `/dev/md0` (by copying from `/dev/sda` to
all other disks at once). Finally, I'll watch this exciting movie and grab a cup of tea.
Once the disks are fully copied, I'll shut down the machine and distribute the disks to their
respective Dell R720, two each. Once they boot they will all be identical. I'll need to make sure
their hostnames, and machine/host-id are unique, otherwise things like bridges will have overlapping
MAC addresses - ask me how I know:
```
pim@minio-proto:~$ sudo mdadm --manage /dev/md0 --grow -n 2
pim@minio-proto:~$ sudo rm /etc/ssh/ssh_host*
pim@minio-proto:~$ sudo hostname minio0-chbtl0
pim@minio-proto:~$ sudo dpkg-reconfigure openssh-server
pim@minio-proto:~$ sudo dd if=/dev/random of=/etc/hostid bs=4 count=1
pim@minio-proto:~$ /usr/bin/dbus-uuidgen | sudo tee /etc/machine-id
pim@minio-proto:~$ sudo reboot
```
After which I have three beautiful and unique machines:
* `minio0.chbtl0.net.ipng.ch`: which will go into my server rack at the IPng office.
* `minio0.ddln0.net.ipng.ch`: which will go to [[Daedalean]({{< ref
2022-02-24-colo >}})], doing AI since before it was all about vibe coding.
* `minio0.chrma0.net.ipng.ch`: which will go to [[IP-Max](https://ip-max.net/)], one of the best
ISPs on the planet. 🥰
## Deploying Minio
The user guide that MinIO provides
[[ref](https://min.io/docs/minio/linux/operations/installation.html)] is super good, arguably one of
the best documented open source projects I've ever seen. It shows me that I can do three types of
install: a 'Standalone' with one disk, a 'Standalone Multi-Drive', and a 'Distributed' deployment.
I decide to make three independent standalone multi-drive installs. This way, I have less shared
fate, and will be immune to network partitions (as these are going to be in three different
physical locations). I've also read about per-bucket _replication_, which will be an excellent way
to get geographical distribution and active/active instances to work together.
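As a note-to-self for later, setting up such a bucket replication rule with the `mc` client might
look roughly like this. This is just a sketch: the bucket name and credentials are illustrative,
both sides need versioning enabled first, and the `chbtl0` and `ddln0` aliases are the ones I'll
define further down.
```
pim@summer:~$ mc version enable chbtl0/some-bucket
pim@summer:~$ mc version enable ddln0/some-bucket
pim@summer:~$ mc replicate add chbtl0/some-bucket \
    --remote-bucket 'https://<access-key>:<secret-key>@s3.ddln0.ipng.ch/some-bucket' \
    --priority 1
```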
I feel good about the single-machine multi-drive decision. I follow the install guide
[[ref](https://min.io/docs/minio/linux/operations/install-deploy-manage/deploy-minio-single-node-multi-drive.html#minio-snmd)]
for this deployment type.
### IPng Frontends
At IPng I use a private IPv4/IPv6/MPLS network that is not connected to the internet. I call this
network [[IPng Site Local]({{< ref 2023-03-11-mpls-core.md >}})]. But how will users reach my Minio
install? I have four redundantly and geographically deployed frontends, two in the Netherlands and
two in Switzerland. I've described the frontend setup in a [[previous article]({{< ref
2023-03-17-ipng-frontends >}})] and the certificate management in [[this article]({{< ref
2023-03-24-lego-dns01 >}})].
I've decided to run the service on these three regionalized endpoints:
1. `s3.chbtl0.ipng.ch` which will be backed by `minio0.chbtl0.net.ipng.ch`
1. `s3.ddln0.ipng.ch` which will be backed by `minio0.ddln0.net.ipng.ch`
1. `s3.chrma0.ipng.ch` which will be backed by `minio0.chrma0.net.ipng.ch`
The first thing I take note of is that S3 buckets can be addressed _by path_, in other words
something like `s3.chbtl0.ipng.ch/my-bucket/README.md`, but they can also be addressed _by virtual
host_, like so: `my-bucket.s3.chbtl0.ipng.ch/README.md`. A subtle difference, but from the docs I
understand that Minio needs to have control of the whole space under its main domain.
There's a small implication to this requirement -- the Web Console that ships with MinIO (eh, well,
maybe that's going to change, more on that later) will want to have its own domain name, so I
choose something simple: `cons0-s3.chbtl0.ipng.ch` and so on. This way, somebody might still be able
to have a bucket name called `cons0` :)
#### Let's Encrypt Certificates
Alright, so I will be kneading nine domains into this new certificate, which I'll simply call
`s3.ipng.ch`. I configure it in Ansible:
```
certbot:
  certs:
    ...
    s3.ipng.ch:
      groups: [ 'nginx', 'minio' ]
      altnames:
        - 's3.chbtl0.ipng.ch'
        - 'cons0-s3.chbtl0.ipng.ch'
        - '*.s3.chbtl0.ipng.ch'
        - 's3.ddln0.ipng.ch'
        - 'cons0-s3.ddln0.ipng.ch'
        - '*.s3.ddln0.ipng.ch'
        - 's3.chrma0.ipng.ch'
        - 'cons0-s3.chrma0.ipng.ch'
        - '*.s3.chrma0.ipng.ch'
```
I run the `certbot` playbook and it does two things:
1. On the machines from group `nginx` and `minio`, it will ensure there exists a user `lego` with
an SSH key and write permissions to `/etc/lego/`; this is where the automation will write (and
update) the certificate keys.
1. On the `lego` machine, it'll create two files. One is the certificate requestor, and the other
is a certificate distribution script that will copy the cert to the right machine(s) when it
renews.
On the `lego` machine, I'll run the cert request for the first time:
```
lego@lego:~$ bin/certbot:s3.ipng.ch
lego@lego:~$ RENEWED_LINEAGE=/home/lego/acme-dns/live/s3.ipng.ch bin/certbot-distribute
```
The first script asks me to add the _acme-challenge DNS entries, which I'll do, for example on the
`s3.chbtl0.ipng.ch` instance (and similarly for the `ddln0` and `chrma0` ones):
```
$ORIGIN chbtl0.ipng.ch.
_acme-challenge.s3 CNAME 51f16fd0-8eb6-455c-b5cd-96fad12ef8fd.auth.ipng.ch.
_acme-challenge.cons0-s3 CNAME 450477b8-74c9-4b9e-bbeb-de49c3f95379.auth.ipng.ch.
s3 CNAME nginx0.ipng.ch.
*.s3 CNAME nginx0.ipng.ch.
cons0-s3 CNAME nginx0.ipng.ch.
```
I push and reload the `ipng.ch` zonefile with these changes after which the certificate gets
requested and a cronjob added to check for renewals. The second script will copy the newly created
cert to all three `minio` machines, and all four `nginx` machines. From now on, every 90 days, a new
cert will be automatically generated and distributed. Slick!
#### NGINX Configs
With the LE wildcard certs in hand, I can create an NGINX frontend for these minio deployments.
First, a simple redirector service that punts people on port 80 to port 443:
```
server {
  listen [::]:80;
  listen 0.0.0.0:80;
  server_name cons0-s3.chbtl0.ipng.ch s3.chbtl0.ipng.ch *.s3.chbtl0.ipng.ch;
  access_log /var/log/nginx/s3.chbtl0.ipng.ch-access.log;
  include /etc/nginx/conf.d/ipng-headers.inc;
  location / {
    return 301 https://$server_name$request_uri;
  }
}
```
Next, the Minio API service itself which runs on port 9000, with a configuration snippet inspired by
the MinIO [[docs](https://min.io/docs/minio/linux/integrations/setup-nginx-proxy-with-minio.html)]:
```
server {
  listen [::]:443 ssl http2;
  listen 0.0.0.0:443 ssl http2;
  ssl_certificate /etc/certs/s3.ipng.ch/fullchain.pem;
  ssl_certificate_key /etc/certs/s3.ipng.ch/privkey.pem;
  include /etc/nginx/conf.d/options-ssl-nginx.inc;
  ssl_dhparam /etc/nginx/conf.d/ssl-dhparams.inc;
  server_name s3.chbtl0.ipng.ch *.s3.chbtl0.ipng.ch;
  access_log /var/log/nginx/s3.chbtl0.ipng.ch-access.log upstream;
  include /etc/nginx/conf.d/ipng-headers.inc;
  add_header Strict-Transport-Security "max-age=31536000; includeSubDomains" always;
  ignore_invalid_headers off;
  client_max_body_size 0;
  # Disable buffering
  proxy_buffering off;
  proxy_request_buffering off;
  location / {
    proxy_set_header Host $http_host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;
    proxy_connect_timeout 300;
    proxy_http_version 1.1;
    proxy_set_header Connection "";
    chunked_transfer_encoding off;
    proxy_pass http://minio0.chbtl0.net.ipng.ch:9000;
  }
}
```
Finally, the Minio Console service which runs on port 9090:
```
include /etc/nginx/conf.d/geo-ipng-trusted.inc;
server {
  listen [::]:443 ssl http2;
  listen 0.0.0.0:443 ssl http2;
  ssl_certificate /etc/certs/s3.ipng.ch/fullchain.pem;
  ssl_certificate_key /etc/certs/s3.ipng.ch/privkey.pem;
  include /etc/nginx/conf.d/options-ssl-nginx.inc;
  ssl_dhparam /etc/nginx/conf.d/ssl-dhparams.inc;
  server_name cons0-s3.chbtl0.ipng.ch;
  access_log /var/log/nginx/cons0-s3.chbtl0.ipng.ch-access.log upstream;
  include /etc/nginx/conf.d/ipng-headers.inc;
  add_header Strict-Transport-Security "max-age=31536000; includeSubDomains" always;
  ignore_invalid_headers off;
  client_max_body_size 0;
  # Disable buffering
  proxy_buffering off;
  proxy_request_buffering off;
  location / {
    if ($geo_ipng_trusted = 0) { rewrite ^ https://ipng.ch/ break; }
    proxy_set_header Host $http_host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;
    proxy_set_header X-NginX-Proxy true;
    real_ip_header X-Real-IP;
    proxy_connect_timeout 300;
    chunked_transfer_encoding off;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";
    proxy_pass http://minio0.chbtl0.net.ipng.ch:9090;
  }
}
```
This last one has an NGINX trick. It will only allow users in if they are in the map called
`geo_ipng_trusted`, which contains a set of IPv4 and IPv6 prefixes. Visitors who are not in this map
will receive an HTTP redirect back to the [[IPng.ch](https://ipng.ch/)] homepage instead.
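For reference, the `geo-ipng-trusted.inc` file included at the top is a standard NGINX `geo` map. A
minimal sketch, with documentation prefixes standing in for the real IPng ranges, might look like:
```
# /etc/nginx/conf.d/geo-ipng-trusted.inc -- illustrative prefixes only
geo $geo_ipng_trusted {
  default       0;
  192.0.2.0/24  1;
  2001:db8::/32 1;
}
```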
I run the Ansible Playbook which contains the NGINX changes to all frontends, but of course nothing
runs yet, because I haven't yet started MinIO backends.
### MinIO Backends
The first thing I need to do is get those disks mounted. MinIO likes using XFS, so I'll install that
and prepare the disks as follows:
```
pim@minio0-chbtl0:~$ sudo apt install xfsprogs
pim@minio0-chbtl0:~$ sudo modprobe xfs
pim@minio0-chbtl0:~$ echo xfs | sudo tee -a /etc/modules
pim@minio0-chbtl0:~$ sudo update-initramfs -k all -u
pim@minio0-chbtl0:~$ for i in a b c d e f g h i j k l; do sudo mkfs.xfs /dev/sd$i; done
pim@minio0-chbtl0:~$ blkid | awk 'BEGIN {i=1} /TYPE="xfs"/ {
printf "%s /minio/disk%d xfs defaults 0 2\n",$2,i; i++;
}' | sudo tee -a /etc/fstab
pim@minio0-chbtl0:~$ for i in `seq 1 12`; do sudo mkdir -p /minio/disk$i; done
pim@minio0-chbtl0:~$ sudo mount -t xfs -a
pim@minio0-chbtl0:~$ sudo chown -R minio-user: /minio/
```
From the top: I'll install `xfsprogs` which contains the things I need to manipulate XFS filesystems
in Debian. Then I'll install the `xfs` kernel module, and make sure it gets inserted upon subsequent
startup by adding it to `/etc/modules` and regenerating the initrd for the installed kernels.
Next, I'll format all twelve 16TB disks (which are `/dev/sda` - `/dev/sdl` on these machines), and
add their resulting block device IDs to `/etc/fstab` so they get persistently mounted on reboot.
Finally, I'll create their mountpoints, mount all XFS filesystems, and chown them to the user that
MinIO is running as. End result:
```
pim@minio0-chbtl0:~$ df -T
Filesystem Type 1K-blocks Used Available Use% Mounted on
udev devtmpfs 32950856 0 32950856 0% /dev
tmpfs tmpfs 6595340 1508 6593832 1% /run
/dev/md0 ext4 114695308 5423976 103398948 5% /
tmpfs tmpfs 32976680 0 32976680 0% /dev/shm
tmpfs tmpfs 5120 4 5116 1% /run/lock
/dev/sda xfs 15623792640 121505936 15502286704 1% /minio/disk1
/dev/sde xfs 15623792640 121505968 15502286672 1% /minio/disk12
/dev/sdi xfs 15623792640 121505968 15502286672 1% /minio/disk11
/dev/sdl xfs 15623792640 121505904 15502286736 1% /minio/disk10
/dev/sdd xfs 15623792640 121505936 15502286704 1% /minio/disk4
/dev/sdb xfs 15623792640 121505968 15502286672 1% /minio/disk3
/dev/sdk xfs 15623792640 121505936 15502286704 1% /minio/disk5
/dev/sdc xfs 15623792640 121505936 15502286704 1% /minio/disk9
/dev/sdf xfs 15623792640 121506000 15502286640 1% /minio/disk2
/dev/sdj xfs 15623792640 121505968 15502286672 1% /minio/disk7
/dev/sdg xfs 15623792640 121506000 15502286640 1% /minio/disk8
/dev/sdh xfs 15623792640 121505968 15502286672 1% /minio/disk6
tmpfs tmpfs 6595336 0 6595336 0% /run/user/0
```
MinIO likes to be configured using environment variables - and this is likely because it's a popular
thing to run in a containerized environment like Kubernetes. The maintainers also ship it as a
Debian package, which reads its environment from `/etc/default/minio`, and I'll prepare that
file as follows:
```
pim@minio0-chbtl0:~$ cat << EOF | sudo tee /etc/default/minio
MINIO_DOMAIN="s3.chbtl0.ipng.ch,minio0.chbtl0.net.ipng.ch"
MINIO_ROOT_USER="XXX"
MINIO_ROOT_PASSWORD="YYY"
MINIO_VOLUMES="/minio/disk{1...12}"
MINIO_OPTS="--console-address :9001"
EOF
pim@minio0-chbtl0:~$ sudo systemctl enable --now minio
pim@minio0-chbtl0:~$ sudo journalctl -u minio
May 31 10:44:11 minio0-chbtl0 minio[690420]: MinIO Object Storage Server
May 31 10:44:11 minio0-chbtl0 minio[690420]: Copyright: 2015-2025 MinIO, Inc.
May 31 10:44:11 minio0-chbtl0 minio[690420]: License: GNU AGPLv3 - https://www.gnu.org/licenses/agpl-3.0.html
May 31 10:44:11 minio0-chbtl0 minio[690420]: Version: RELEASE.2025-05-24T17-08-30Z (go1.24.3 linux/amd64)
May 31 10:44:11 minio0-chbtl0 minio[690420]: API: http://198.19.4.11:9000 http://127.0.0.1:9000
May 31 10:44:11 minio0-chbtl0 minio[690420]: WebUI: https://cons0-s3.chbtl0.ipng.ch/
May 31 10:44:11 minio0-chbtl0 minio[690420]: Docs: https://docs.min.io
pim@minio0-chbtl0:~$ sudo ipmitool sensor | grep Watts
Pwr Consumption | 154.000 | Watts
```
Incidentally - I am pretty pleased with this 192TB disk tank, sporting 24 cores, 64GB memory and
2x10G network, casually hanging out at 154 Watts of power all up. Slick!
{{< image float="right" src="/assets/minio/minio-ec.svg" alt="MinIO Erasure Coding" width="22em" >}}
MinIO implements _erasure coding_ as a core component in providing availability and resiliency
during drive or node-level failure events. MinIO partitions each object into data and parity shards
and distributes those shards across a single so-called _erasure set_. Under the hood, it uses a
[[Reed-Solomon](https://en.wikipedia.org/wiki/Reed%E2%80%93Solomon_error_correction)] erasure coding
implementation to partition each object for distribution. From the MinIO website, I'll borrow the
diagram to the right to show what this looks like on a single node like mine.
Anyway, MinIO detects 12 disks and installs an erasure set with 8 data disks and 4 parity disks,
which it calls `EC:4` encoding, also known in the industry as `RS8.4`.
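A quick back-of-the-envelope check of my own: 12 drives x 16TB is 192TB raw per machine, of which
8/12 carries data, so roughly 128TB (about 116 TiB) is usable, and any 4 drives in the set can fail
without data loss. That lines up nicely with what `mc admin info` reports below.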
Just like that, the thing shoots to life. Awesome!
### MinIO Client
On Summer, I'll install the MinIO Client called `mc`. This is easy because the maintainers ship a
Linux binary which I can just download. On OpenBSD, they don't do that. Not a problem though, on
Squanchy, Pencilvester and Glootie, I will just `go install` the client. Using the `mc` commandline,
I can call any of the S3 APIs on my new MinIO instance:
```
pim@summer:~$ set +o history
pim@summer:~$ mc alias set chbtl0 https://s3.chbtl0.ipng.ch/ <rootuser> <rootpass>
pim@summer:~$ set -o history
pim@summer:~$ mc admin info chbtl0/
● s3.chbtl0.ipng.ch
Uptime: 22 hours
Version: 2025-05-24T17:08:30Z
Network: 1/1 OK
Drives: 12/12 OK
Pool: 1
┌──────┬───────────────────────┬─────────────────────┬──────────────┐
│ Pool │ Drives Usage │ Erasure stripe size │ Erasure sets │
│ 1st │ 0.8% (total: 116 TiB) │ 12 │ 1 │
└──────┴───────────────────────┴─────────────────────┴──────────────┘
95 GiB Used, 5 Buckets, 5,859 Objects, 318 Versions, 1 Delete Marker
12 drives online, 0 drives offline, EC:4
```
Cool beans. I think I should stop using this root account though. I've installed those credentials
in the `/etc/default/minio` environment file, but I don't want to wave them around in the open. So
I'll make an account for myself and attach a reasonable policy to it, called `consoleAdmin` in the
default install:
```
pim@summer:~$ set +o history
pim@summer:~$ mc admin user add chbtl0/ <someuser> <somepass>
pim@summer:~$ mc admin policy info chbtl0 consoleAdmin
pim@summer:~$ mc admin policy attach chbtl0 consoleAdmin --user=<someuser>
pim@summer:~$ mc alias set chbtl0 https://s3.chbtl0.ipng.ch/ <someuser> <somepass>
pim@summer:~$ set -o history
```
OK, I feel less gross now that I'm not operating as root on the MinIO deployment. Using my new
user-powers, let me set some metadata on my new minio server:
```
pim@summer:~$ mc admin config set chbtl0/ site name=chbtl0 region=switzerland
Successfully applied new settings.
Please restart your server 'mc admin service restart chbtl0/'.
pim@summer:~$ mc admin service restart chbtl0/
Service status: ▰▰▱ [DONE]
Summary:
┌───────────────┬─────────────────────────────┐
│ Servers: │ 1 online, 0 offline, 0 hung │
│ Restart Time: │ 61.322886ms │
└───────────────┴─────────────────────────────┘
pim@summer:~$ mc admin config get chbtl0/ site
site name=chbtl0 region=switzerland
```
By the way, what's really cool about these open standards is that the Amazon `aws` client works
with MinIO, and `mc` also works with AWS!
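To illustrate that first point, pointing the stock `aws` CLI at this deployment is just a matter of
overriding the endpoint URL; a small sketch, where the `ipng-s3` profile name is made up and holds
the access/secret keys created above:
```
pim@summer:~$ aws configure --profile ipng-s3
pim@summer:~$ aws --profile ipng-s3 --endpoint-url https://s3.chbtl0.ipng.ch/ s3 ls
```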
### MinIO Console
Although I'm pretty good with APIs and command line tools, there's some benefit also in using a
Graphical User Interface. MinIO ships with one, but there was a bit of a kerfuffle in the MinIO
community. Unfortunately, these are pretty common -- Redis (an open source key/value storage system)
changed their offering abruptly. Terraform (an open source infrastructure-as-code tool) changed
their licensing at some point. Ansible (an open source machine management tool) changed their
offering also. MinIO developers decided to strip their console of ~all features recently. The gnarly
bits are discussed on
[[reddit](https://www.reddit.com/r/selfhosted/comments/1kva3pw/avoid_minio_developers_introduce_trojan_horse/)],
but suffice to say: the same thing that happened in literally 100% of the other cases also happened
here. Somebody decided to simply fork the code from before it was changed.
Enter OpenMaxIO. A cringe-worthy name, but it gets the job done. Reading up on the
[[GitHub](https://github.com/OpenMaxIO/openmaxio-object-browser/issues/5)] issue, reviving the fully
working console is pretty straightforward -- that is, once somebody has spent a few days figuring it
out. Thank you `icesvz` for this excellent pointer. With this, I can create a systemd service for
the console and start it:
```
pim@minio0-chbtl0:~$ cat << EOF | sudo tee -a /etc/default/minio
## NOTE(pim): For openmaxio console service
CONSOLE_MINIO_SERVER="http://localhost:9000"
MINIO_BROWSER_REDIRECT_URL="https://cons0-s3.chbtl0.ipng.ch/"
EOF
pim@minio0-chbtl0:~$ cat << EOF | sudo tee /lib/systemd/system/minio-console.service
[Unit]
Description=OpenMaxIO Console Service
Wants=network-online.target
After=network-online.target
AssertFileIsExecutable=/usr/local/bin/minio-console
[Service]
Type=simple
WorkingDirectory=/usr/local
User=minio-user
Group=minio-user
ProtectProc=invisible
EnvironmentFile=-/etc/default/minio
ExecStart=/usr/local/bin/minio-console server
Restart=always
LimitNOFILE=1048576
MemoryAccounting=no
TasksMax=infinity
TimeoutSec=infinity
OOMScoreAdjust=-1000
SendSIGKILL=no
[Install]
WantedBy=multi-user.target
EOF
pim@minio0-chbtl0:~$ sudo systemctl enable --now minio-console
pim@minio0-chbtl0:~$ sudo systemctl restart minio
```
The first snippet updates the MinIO configuration in two ways: it instructs MinIO to redirect users
who are not trying to use the API to the console endpoint on `cons0-s3.chbtl0.ipng.ch`, and it tells
the console server where to find the API, which from its vantage point is running on
`localhost:9000`. Hello, beautiful fully featured console:
{{< image src="/assets/minio/console-1.png" alt="MinIO Console" >}}
### MinIO Prometheus
MinIO ships with a prometheus metrics endpoint, and I notice on its console that it has a nice
metrics tab, which is fully greyed out. This is most likely because, well, I don't have a Prometheus
install here yet. I decide to keep the storage nodes self-contained and start a Prometheus server on
the local machine. I can always plumb that to IPng's Grafana instance later.
For now, I'll install Prometheus as follows:
```
pim@minio0-chbtl0:~$ cat << EOF | sudo tee -a /etc/default/minio
## NOTE(pim): Metrics for minio-console
MINIO_PROMETHEUS_AUTH_TYPE="public"
CONSOLE_PROMETHEUS_URL="http://localhost:19090/"
CONSOLE_PROMETHEUS_JOB_ID="minio-job"
EOF
pim@minio0-chbtl0:~$ sudo apt install prometheus
pim@minio0-chbtl0:~$ cat << EOF | sudo tee /etc/default/prometheus
ARGS="--web.listen-address='[::]:19090' --storage.tsdb.retention.size=16GB"
EOF
pim@minio0-chbtl0:~$ cat << EOF | sudo tee /etc/prometheus/prometheus.yml
global:
  scrape_interval: 60s
scrape_configs:
  - job_name: minio-job
    metrics_path: /minio/v2/metrics/cluster
    static_configs:
      - targets: ['localhost:9000']
        labels:
          cluster: minio0-chbtl0
  - job_name: minio-job-node
    metrics_path: /minio/v2/metrics/node
    static_configs:
      - targets: ['localhost:9000']
        labels:
          cluster: minio0-chbtl0
  - job_name: minio-job-bucket
    metrics_path: /minio/v2/metrics/bucket
    static_configs:
      - targets: ['localhost:9000']
        labels:
          cluster: minio0-chbtl0
  - job_name: minio-job-resource
    metrics_path: /minio/v2/metrics/resource
    static_configs:
      - targets: ['localhost:9000']
        labels:
          cluster: minio0-chbtl0
  - job_name: node
    static_configs:
      - targets: ['localhost:9100']
        labels:
          cluster: minio0-chbtl0
EOF
pim@minio0-chbtl0:~$ sudo systemctl restart minio prometheus
```
In the first snippet, I'll tell MinIO where it should find its Prometheus instance. Since the MinIO
console service is running on port 9090, which is also the default port for Prometheus, I will
run Prometheus on port 19090 instead. From reading the MinIO docs, I can see that normally MinIO
wants Prometheus to authenticate to it before it'll allow the endpoints to be scraped. I'll turn
that off by making these metrics public. On the IPng frontends, I can always block access to
/minio/v2 and simply use IPng Site Local access for local Prometheus scrapers instead.
After telling Prometheus its runtime arguments (in `/etc/default/prometheus`) and its scraping
endpoints (in `/etc/prometheus/prometheus.yml`), I can restart minio and prometheus. A few minutes
later, I can see the _Metrics_ tab in the console come to life.
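A quick sanity check from the machine itself doesn't hurt either. Something along these lines (my
own habit, not from the MinIO docs) confirms that the Prometheus config parses and that the metrics
endpoint actually answers:
```
pim@minio0-chbtl0:~$ promtool check config /etc/prometheus/prometheus.yml
pim@minio0-chbtl0:~$ curl -s http://localhost:9000/minio/v2/metrics/cluster | head
```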
But now that I have this Prometheus instance running on the MinIO node, I can also add it to IPng's
Grafana configuration by adding a new data source on `minio0.chbtl0.net.ipng.ch:19090` and pointing the
default Grafana [[Dashboard](https://grafana.com/grafana/dashboards/13502-minio-dashboard/)] at it:
{{< image src="/assets/minio/console-2.png" alt="Grafana Dashboard" >}}
A two-for-one: I can see metrics directly in the console, and I can also hook these per-node
Prometheus instances into IPng's Alertmanager later on; I've read some
[[docs](https://min.io/docs/minio/linux/operations/monitoring/collect-minio-metrics-using-prometheus.html)]
on the concepts. I'm really liking the experience so far!
### MinIO Nagios
Prometheus is fancy and all, but at IPng Networks, I've been doing monitoring for a while now. As a
dinosaur, I still have an active [[Nagios](https://www.nagios.org/)] install, whose configuration
is autogenerated from my Ansible repository. So for the new Ansible group called
`minio`, I will autogenerate the following snippet:
```
define command {
  command_name ipng_check_minio
  command_line $USER1$/check_http -E -H $HOSTALIAS$ -I $ARG1$ -p $ARG2$ -u $ARG3$ -r '$ARG4$'
}
define service {
  hostgroup_name ipng:minio:ipv6
  service_description minio6:api
  check_command ipng_check_minio!$_HOSTADDRESS6$!9000!/minio/health/cluster!
  use ipng-service-fast
  notification_interval 0 ; set > 0 if you want to be renotified
}
define service {
  hostgroup_name ipng:minio:ipv6
  service_description minio6:prom
  check_command ipng_check_minio!$_HOSTADDRESS6$!19090!/classic/targets!minio-job
  use ipng-service-fast
  notification_interval 0 ; set > 0 if you want to be renotified
}
define service {
  hostgroup_name ipng:minio:ipv6
  service_description minio6:console
  check_command ipng_check_minio!$_HOSTADDRESS6$!9090!/!MinIO Console
  use ipng-service-fast
  notification_interval 0 ; set > 0 if you want to be renotified
}
```
I've shown the snippet for IPv6 but I also have three services defined for legacy IP in the
hostgroup `ipng:minio:ipv4`. The check command here uses `-I` for the IPv4 or IPv6 address to
talk to, `-p` for the port to connect to, `-u` for the URI to hit, and `-r` for a regular
expression to expect in the output. For the Nagios aficionados out there: my Ansible `groups`
correspond one to one with autogenerated Nagios `hostgroups`. This allows me to add arbitrary checks
by group-type, like above in the `ipng:minio` group for IPv4 and IPv6.
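To sanity-check a service definition before Nagios gets to it, the same plugin can be run by hand.
A hypothetical invocation of the API health check (with the address placeholder filled in from DNS)
would be:

```
pim@nagios:~$ /usr/lib/nagios/plugins/check_http -E -H minio0.chbtl0.net.ipng.ch \
    -I <ipv6-address> -p 9000 -u /minio/health/cluster
```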
In the MinIO [[docs](https://min.io/docs/minio/linux/operations/monitoring/healthcheck-probe.html)]
I read up on the Healthcheck API. I choose to monitor the _Cluster Write Quorum_ on my minio
deployments. For Prometheus, I decide to hit the `targets` endpoint and expect the `minio-job` to be
among them. Finally, for the MinIO Console, I expect to see a login screen with the words `MinIO
Console` in the returned page. I guessed right, because Nagios is all green:
{{< image src="/assets/minio/nagios.png" alt="Nagios Dashboard" >}}
## My First Bucket
The IPng website is a statically generated Hugo site, and whenever I submit a change to my Git
repo, a CI/CD runner (called [[Drone](https://www.drone.io/)]) picks up the change. It re-builds
the static website, and copies it to four redundant NGINX servers.
But IPng's website has amassed quite a few extra files (like VM images and VPP packages that I
publish), which are copied separately using a simple push script I have in my home directory. This
keeps all those big media files from cluttering the Git repository. I decide to move this stuff
into S3:
```
pim@summer:~/src/ipng-web-assets$ echo 'Gruezi World.' > ipng.ch/media/README.md
pim@summer:~/src/ipng-web-assets$ mc mb chbtl0/ipng-web-assets
pim@summer:~/src/ipng-web-assets$ mc mirror . chbtl0/ipng-web-assets/
...ch/media/README.md: 6.50 GiB / 6.50 GiB ┃▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓┃ 236.38 MiB/s 28s
pim@summer:~/src/ipng-web-assets$ mc anonymous set download chbtl0/ipng-web-assets/
```
OK, two things immediately jump out at me. The first is that this stuff is **fast**: Summer is
connected with a 2.5GbE network card, and she's running hard, copying the 6.5GB of data in these web
assets essentially at line rate. It doesn't really surprise me, because Summer is running off of Gen4 NVME,
while MinIO has 12 spinning disks which each can write about 160MB/s or so sustained
[[ref](https://www.seagate.com/www-content/datasheets/pdfs/exos-x16-DS2011-1-1904US-en_US.pdf)],
with 24 CPUs to tend to the NIC (2x10G) and disks (2x SSD, 12x LFF). Should be plenty!
The second is that MinIO allows for buckets to be publicly shared in three ways: 1) read-only by
setting `download`; 2) write-only by setting `upload`, and 3) read-write by setting `public`.
I set `download` here, which means I should be able to fetch an asset now publicly:
```
pim@summer:~$ curl https://s3.chbtl0.ipng.ch/ipng-web-assets/ipng.ch/media/README.md
Gruezi World.
pim@summer:~$ curl https://ipng-web-assets.s3.chbtl0.ipng.ch/ipng.ch/media/README.md
Gruezi World.
```
The first `curl` here shows the path-based access, while the second one shows an equivalent
virtual-host based access. Both retrieve the file I just pushed via the public Internet. Whoot!
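One server-side detail worth noting: the virtual-host style only works because MinIO is told which
base domain to expect. On my deployment that boils down to something like this in the MinIO
environment file (paraphrased, not copied verbatim):

```
pim@minio0-chbtl0:~$ grep DOMAIN /etc/default/minio
MINIO_DOMAIN=s3.chbtl0.ipng.ch
```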
# What's Next
I'm going to be moving [[Restic](https://restic.net/)] backups from IPng's ZFS storage pool to this
S3 service over the next few days. I'll also migrate PeerTube and possibly Mastodon from NVME based
storage to replicated S3 buckets as well. Finally, the IPng website media that I mentioned above,
should make for a nice followup article. Stay tuned!
View File
@ -0,0 +1,475 @@
---
date: "2025-06-01T10:07:23Z"
title: 'Case Study: Minio S3 - Part 2'
---
{{< image float="right" src="/assets/minio/minio-logo.png" alt="MinIO Logo" width="6em" >}}
# Introduction
Amazon Simple Storage Service (Amazon S3) is an object storage service offering industry-leading
scalability, data availability, security, and performance. Millions of customers of all sizes and
industries store, manage, analyze, and protect any amount of data for virtually any use case, such
as data lakes, cloud-native applications, and mobile apps. With cost-effective storage classes and
easy-to-use management features, you can optimize costs, organize and analyze data, and configure
fine-tuned access controls to meet specific business and compliance requirements.
Amazon's S3 became the _de facto_ standard object storage system, and there exist several fully open
source implementations of the protocol. One of them is MinIO: designed to allow enterprises to
consolidate all of their data on a single, private cloud namespace. Architected using the same
principles as the hyperscalers, AIStor delivers performance at scale at a fraction of the cost
compared to the public cloud.
IPng Networks is an Internet Service Provider, but I also dabble in self-hosting things, for
example [[PeerTube](https://video.ipng.ch/)], [[Mastodon](https://ublog.tech/)],
[[Immich](https://photos.ipng.ch/)], [[Pixelfed](https://pix.ublog.tech/)] and of course
[[Hugo](https://ipng.ch/)]. These services all have one thing in common: they tend to use lots of
storage when they grow. At IPng Networks, all hypervisors ship with enterprise SAS flash drives,
mostly 1.92TB and 3.84TB. Scaling up each of these services, and backing them up safely, can be
quite the headache.
In a [[previous article]({{< ref 2025-05-28-minio-1 >}})], I talked through the install of a
redundant set of three Minio machines. In this article, I'll start putting them to good use.
## Use Case: Restic
{{< image float="right" src="/assets/minio/restic-logo.png" alt="Restic Logo" width="12em" >}}
[[Restic](https://restic.org/)] is a modern backup program that can back up your files from multiple
host OS, to many different storage types, easily, effectively, securely, verifiably and freely. With
a sales pitch like that, what's not to love? Actually, I am a long-time
[[BorgBackup](https://www.borgbackup.org/)] user, and I think I'll keep that running. However, for
resilience, and because I've heard only good things about Restic, I'll make a second backup of the
routers, hypervisors, and virtual machines using Restic.
Restic can use S3 buckets out of the box (incidentally, so can BorgBackup). To configure it, I use
a mixture of environment variables and flags. But first, let me create a bucket for the backups.
```
pim@glootie:~$ mc mb chbtl0/ipng-restic
pim@glootie:~$ mc admin user add chbtl0/ <key> <secret>
pim@glootie:~$ cat << EOF | tee ipng-restic-access.json
{
  "PolicyName": "ipng-restic-access",
  "Policy": {
    "Version": "2012-10-17",
    "Statement": [
      {
        "Effect": "Allow",
        "Action": [ "s3:DeleteObject", "s3:GetObject", "s3:ListBucket", "s3:PutObject" ],
        "Resource": [ "arn:aws:s3:::ipng-restic", "arn:aws:s3:::ipng-restic/*" ]
      }
    ]
  }
}
EOF
pim@glootie:~$ mc admin policy create chbtl0/ ipng-restic-access.json
pim@glootie:~$ mc admin policy attach chbtl0/ ipng-restic-access --user <key>
```
First, I'll create a bucket called `ipng-restic`. Then, I'll create a _user_ with a given secret
_key_. To protect the innocent, and my backups, I'll not disclose them. Next, I'll create an
IAM policy that allows for Get/List/Put/Delete to be performed on the bucket and its contents, and
finally I'll attach this policy to the user I just created.
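If I ever want to double-check what ended up attached to that user, `mc` can show it (key name
elided, as above):

```
pim@glootie:~$ mc admin user info chbtl0/ <key>
```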
To run a Restic backup, I'll first have to create a so-called _repository_. The repository has a
location and a password, which Restic uses to encrypt the data. Because I'm using S3, I'll also need
to specify the key and secret:
```
root@glootie:~# RESTIC_PASSWORD="changeme"
root@glootie:~# RESTIC_REPOSITORY="s3:https://s3.chbtl0.ipng.ch/ipng-restic/$(hostname)/"
root@glootie:~# AWS_ACCESS_KEY_ID="<key>"
root@glootie:~# AWS_SECRET_ACCESS_KEY="<secret>"
root@glootie:~# export RESTIC_PASSWORD RESTIC_REPOSITORY AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY
root@glootie:~# restic init
created restic repository 807cf25e85 at s3:https://s3.chbtl0.ipng.ch/ipng-restic/glootie.ipng.ch/
```
Restic prints out the fingerprint of the repository it just created. Taking a
look at the MinIO install:
```
pim@glootie:~$ mc stat chbtl0/ipng-restic/glootie.ipng.ch/
Name : config
Date : 2025-06-01 12:01:43 UTC
Size : 155 B
ETag : 661a43f72c43080649712e45da14da3a
Type : file
Metadata :
Content-Type: application/octet-stream
Name : keys/
Date : 2025-06-01 12:03:33 UTC
Type : folder
```
Cool. Now I'm ready to make my first full backup:
```
root@glootie:~# ARGS="--exclude /proc --exclude /sys --exclude /dev --exclude /run"
root@glootie:~# ARGS="$ARGS --exclude-if-present .nobackup"
root@glootie:~# restic backup $ARGS /
...
processed 1141426 files, 131.111 GiB in 15:12
snapshot 34476c74 saved
```
Once the backup completes, the Restic authors advise me to also do a check of the repository, and to
prune it so that it keeps a finite number of daily, weekly and monthly backups. My further journey
for Restic looks a bit like this:
```
root@glootie:~# restic check
using temporary cache in /tmp/restic-check-cache-2712250731
create exclusive lock for repository
load indexes
check all packs
check snapshots, trees and blobs
[0:04] 100.00% 1 / 1 snapshots
no errors were found
root@glootie:~# restic forget --prune --keep-daily 8 --keep-weekly 5 --keep-monthly 6
repository 34476c74 opened (version 2, compression level auto)
Applying Policy: keep 8 daily, 5 weekly, 6 monthly snapshots
keep 1 snapshots:
ID Time Host Tags Reasons Paths
---------------------------------------------------------------------------------
34476c74 2025-06-01 12:18:54 glootie.ipng.ch daily snapshot /
weekly snapshot
monthly snapshot
----------------------------------------------------------------------------------
1 snapshots
```
Right on! I proceed to update the Ansible configs at IPng to roll this out against the entire fleet
of 152 hosts at IPng Networks. I do this in a little tool called `bitcron`, which I wrote for a
previous company I worked at: [[BIT](https://bit.nl)] in the Netherlands. Bitcron allows me to
create relatively elegant cronjobs that can raise warnings, errors and fatal issues. If no issues
are found, an e-mail can be sent to a bitbucket address, but if warnings or errors are found, a
different _monitored_ address will be used. Bitcron is kind of cool, and I wrote it in 2001. Maybe
I'll write about it, for old time's sake. I wonder if the folks at BIT still use it?
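Bitcron itself isn't public, but the nightly job it wraps on each host boils down to roughly this
sketch (retention matching the commands above; the secrets naturally come from Ansible, not from the
script itself):

```
#!/usr/bin/env bash
# Hypothetical nightly Restic wrapper, roughly what gets rolled out per host.
set -eu
export RESTIC_PASSWORD="<password>"
export RESTIC_REPOSITORY="s3:https://s3.chbtl0.ipng.ch/ipng-restic/$(hostname)/"
export AWS_ACCESS_KEY_ID="<key>"
export AWS_SECRET_ACCESS_KEY="<secret>"

ARGS="--exclude /proc --exclude /sys --exclude /dev --exclude /run"
ARGS="$ARGS --exclude-if-present .nobackup"

restic backup $ARGS / --quiet
restic forget --prune --keep-daily 8 --keep-weekly 5 --keep-monthly 6 --quiet
restic check --quiet
```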
## Use Case: NGINX
{{< image float="right" src="/assets/minio/nginx-logo.png" alt="NGINX Logo" width="11em" >}}
OK, with the first use case out of the way, I turn my attention to a second - in my opinion more
interesting - use case. In the [[previous article]({{< ref 2025-05-28-minio-1 >}})], I created a
public bucket called `ipng-web-assets` in which I stored 6.50GB of website data belonging to the
IPng website, and some material I posted when I was on my
[[Sabbatical](https://sabbatical.ipng.nl/)] last year.
### MinIO: Bucket Replication
First things first: redundancy. These web assets are currently pushed to all four nginx machines,
and statically served. If I were to replace them with a single S3 bucket, I would create a single
point of failure, and that's _no bueno_!
Off I go, creating a replicated bucket using two MinIO instances (`chbtl0` and `ddln0`):
```
pim@glootie:~$ mc mb ddln0/ipng-web-assets
pim@glootie:~$ mc anonymous set download ddln0/ipng-web-assets
pim@glootie:~$ mc admin user add ddln0/ <replkey> <replsecret>
pim@glootie:~$ cat << EOF | tee ipng-web-assets-access.json
{
  "PolicyName": "ipng-web-assets-access",
  "Policy": {
    "Version": "2012-10-17",
    "Statement": [
      {
        "Effect": "Allow",
        "Action": [ "s3:DeleteObject", "s3:GetObject", "s3:ListBucket", "s3:PutObject" ],
        "Resource": [ "arn:aws:s3:::ipng-web-assets", "arn:aws:s3:::ipng-web-assets/*" ]
      }
    ]
  }
}
EOF
pim@glootie:~$ mc admin policy create ddln0/ ipng-web-assets-access.json
pim@glootie:~$ mc admin policy attach ddln0/ ipng-web-assets-access --user <replkey>
pim@glootie:~$ mc replicate add chbtl0/ipng-web-assets \
    --remote-bucket https://<replkey>:<replsecret>@s3.ddln0.ipng.ch/ipng-web-assets
```
What happens next is pure magic. I've told `chbtl0` that I want it to replicate all existing and
future changes to that bucket to its neighbor `ddln0`. Only minutes later, I check the replication
status, just to see that it's _already done_:
```
pim@glootie:~$ mc replicate status chbtl0/ipng-web-assets
Replication status since 1 hour
s3.ddln0.ipng.ch
Replicated: 142 objects (6.5 GiB)
Queued: ● 0 objects, 0 B (avg: 4 objects, 915 MiB ; max: 0 objects, 0 B)
Workers: 0 (avg: 0; max: 0)
Transfer Rate: 15 kB/s (avg: 88 MB/s; max: 719 MB/s
Latency: 3ms (avg: 3ms; max: 7ms)
Link: ● online (total downtime: 0 milliseconds)
Errors: 0 in last 1 minute; 0 in last 1hr; 0 since uptime
Configured Max Bandwidth (Bps): 644 GB/s Current Bandwidth (Bps): 975 B/s
pim@summer:~/src/ipng-web-assets$ mc ls ddln0/ipng-web-assets/
[2025-06-01 12:42:22 CEST] 0B ipng.ch/
[2025-06-01 12:42:22 CEST] 0B sabbatical.ipng.nl/
```
MinIO has pumped the data from bucket `ipng-web-assets` to the other machine at an average of 88MB/s,
with a peak throughput of 719MB/s (probably for the larger VM images). And indeed, looking at the
remote machine, it is fully caught up: within only a minute or so of the push, it holds a completely
fresh copy. Nice!
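For completeness: should I forget what I configured, the rule that drives this can be listed again
at any time:

```
pim@glootie:~$ mc replicate ls chbtl0/ipng-web-assets
```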
### MinIO: Missing directory index
I take a look at what I just built, on the following URL:
* [https://ipng-web-assets.s3.ddln0.ipng.ch/sabbatical.ipng.nl/media/vdo/IMG_0406_0.mp4](https://ipng-web-assets.s3.ddln0.ipng.ch/sabbatical.ipng.nl/media/vdo/IMG_0406_0.mp4)
That checks out, and I can see the mess that was my room when I first went on sabbatical. By the
way, I totally cleaned it up, see
[[here](https://sabbatical.ipng.nl/blog/2024/08/01/thursday-basement-done/)] for proof. I can't,
however, see the directory listing:
```
pim@glootie:~$ curl https://ipng-web-assets.s3.ddln0.ipng.ch/sabbatical.ipng.nl/media/vdo/
<?xml version="1.0" encoding="UTF-8"?>
<Error>
<Code>NoSuchKey</Code>
<Message>The specified key does not exist.</Message>
<Key>sabbatical.ipng.nl/media/vdo/</Key>
<BucketName>ipng-web-assets</BucketName>
<Resource>/sabbatical.ipng.nl/media/vdo/</Resource>
<RequestId>1844EC0CFEBF3C5F</RequestId>
<HostId>dd9025bab4ad464b049177c95eb6ebf374d3b3fd1af9251148b658df7ac2e3e8</HostId>
</Error>
```
That's unfortunate, because some of the IPng articles link to a directory full of files, which I'd
like to be shown so that my readers can navigate through the directories. Surely I'm not the first
to encounter this? And sure enough, I'm not: I find this
[[ref](https://github.com/glowinthedark/index-html-generator)] by user `glowinthedark`, who wrote a
little Python script that generates `index.html` files for their Caddy file server. I'll take me
some of that Python, thank you!
With the following little script, my setup is complete:
```
pim@glootie:~/src/ipng-web-assets$ cat push.sh
#!/usr/bin/env bash
echo "Generating index.html files ..."
for D in */media; do
echo "* Directory $D"
./genindex.py -r $D
done
echo "Done (genindex)"
echo ""
echo "Mirroring directoro to S3 Bucket"
mc mirror --remove --overwrite . chbtl0/ipng-web-assets/
echo "Done (mc mirror)"
echo ""
pim@glootie:~/src/ipng-web-assets$ ./push.sh
```
Only a few seconds after I run `./push.sh`, the replication is complete and I have two identical
copies of my media:
1. [https://ipng-web-assets.s3.chbtl0.ipng.ch/ipng.ch/media/](https://ipng-web-assets.s3.chbtl0.ipng.ch/ipng.ch/media/index.html)
1. [https://ipng-web-assets.s3.ddln0.ipng.ch/ipng.ch/media/](https://ipng-web-assets.s3.ddln0.ipng.ch/ipng.ch/media/index.html)
### NGINX: Proxy to Minio
Before moving to S3 storage, my NGINX frontends all kept a copy of the IPng media on local NVME
disk. That's great for reliability, as each NGINX instance is completely hermetic and standalone.
However, it's not great for scaling: the current NGINX instances only have 16GB of local storage,
and I'd rather not have my static web asset data outgrow that filesystem. From before, I already had
an NGINX config that served the Hugo static data from `/var/www/ipng.ch/` and the `/media`
subdirectory from a different directory, `/var/www/ipng-web-assets/ipng.ch/media`.
Moving to redundant S3 storage backends is straightforward:
```
upstream minio_ipng {
least_conn;
server minio0.chbtl0.net.ipng.ch:9000;
server minio0.ddln0.net.ipng.ch:9000;
}
server {
...
location / {
root /var/www/ipng.ch/;
}
location /media {
proxy_set_header Host $http_host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
proxy_connect_timeout 300;
proxy_http_version 1.1;
proxy_set_header Connection "";
chunked_transfer_encoding off;
rewrite (.*)/$ $1/index.html;
proxy_pass http://minio_ipng/ipng-web-assets/ipng.ch/media;
}
}
```
I want to make note of a few things:
1. The `upstream` definition here uses IPng Site Local entrypoints, considering the NGINX servers
   all have direct MTU=9000 access to the MinIO instances. I'll put both in there, in a
   load-balanced configuration favoring the replica with the _least connections_.
1. Deeplinking to directory names without the trailing `/index.html` would serve a 404 from the
   backend, so I'll intercept these and rewrite directory URLs to always include `/index.html`.
1. The upstream endpoint used here is _path-based_, that is to say it has the bucket name and website
   name included. This whole location used to be simply `root /var/www/ipng-web-assets/ipng.ch/media/`,
   so the mental change is quite small.
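Before rolling this out, I can convince myself that the path-based upstream behaves as I expect by
asking one of the MinIO replicas directly over IPng Site Local for a known object, something like:

```
pim@nginx0-nlams1:~$ curl -sI http://minio0.chbtl0.net.ipng.ch:9000/ipng-web-assets/ipng.ch/media/index.html
```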
### NGINX: Caching
After deploying the S3 upstream on all IPng websites, I can delete the old
`/var/www/ipng-web-assets/` directory and reclaim about 7GB of diskspace. This gives me an idea ...
{{< image width="8em" float="left" src="/assets/shared/brain.png" alt="brain" >}}
On the one hand it's great that I will pull these assets from Minio and all, but at the same time,
it's a tad inefficient to retrieve them from, say, Zurich to Amsterdam just to serve them onto the
internet again. If at any time something on the IPng website goes viral, it'd be nice to be able to
serve them directly from the edge, right?
A webcache. What could _possibly_ go wrong :)
NGINX is really really good at caching content. It has a powerful engine to store, scan, revalidate
and match any content and upstream headers. It's also very well documented, so I take a look at the
proxy module's documentation [[here](https://nginx.org/en/docs/http/ngx_http_proxy_module.html)] and
in particular a useful [[blog](https://blog.nginx.org/blog/nginx-caching-guide)] on their website.
The first thing I need to do is create what is called a _key zone_, which is a region of memory in
which URL keys are stored with some metadata. Having a copy of the keys in memory enables NGINX to
quickly determine if a request is a HIT or a MISS without having to go to disk, greatly speeding up
the check.
In `/etc/nginx/conf.d/ipng-cache.conf` I add the following NGINX cache:
```
proxy_cache_path /var/www/nginx-cache levels=1:2 keys_zone=ipng_cache:10m max_size=8g
inactive=24h use_temp_path=off;
```
With this statement, I'll create a two-level directory hierarchy for the cache, and allocate 10MB of
space for the key zone, which should hold on the order of 100K entries. The maximum size I'll allow
the cache to grow to is 8GB, and I'll mark any object inactive if it has not been referenced for 24
hours. I learn that inactive is different from expired content: if a cache element has expired, but
NGINX can't reach the upstream for a new copy, it can be configured to serve an inactive (stale) copy
from the cache. That's dope, as it serves as an extra layer of defence in case the network or all
available S3 replicas take the day off. I'll also ask NGINX to avoid writing objects first to a tmp
directory and then moving them into the `/var/www/nginx-cache` directory. These are recommendations
I grab from the manual.
Within the `location` block I configured above, I'm now ready to enable this cache. I'll do that by
adding a few include files, which I'll reference in all sites that I want to make use of this
cache:
First, to enable the cache, I write the following snippet:
```
pim@nginx0-nlams1:~$ cat /etc/nginx/conf.d/ipng-cache.inc
proxy_cache ipng_cache;
proxy_ignore_headers Cache-Control;
proxy_cache_valid any 1h;
proxy_cache_revalidate on;
proxy_cache_use_stale error timeout updating http_500 http_502 http_503 http_504;
proxy_cache_background_update on;
```
Then, I find it useful to emit a few debugging HTTP headers, and at the same time I see that Minio
emits a bunch of HTTP headers that may not be safe for me to propagate, so I pen two more snippets:
```
pim@nginx0-nlams1:~$ cat /etc/nginx/conf.d/ipng-strip-minio-headers.inc
proxy_hide_header x-minio-deployment-id;
proxy_hide_header x-amz-request-id;
proxy_hide_header x-amz-id-2;
proxy_hide_header x-amz-replication-status;
proxy_hide_header x-amz-version-id;
pim@nginx0-nlams1:~$ cat /etc/nginx/conf.d/ipng-add-upstream-headers.inc
add_header X-IPng-Frontend $hostname always;
add_header X-IPng-Upstream $upstream_addr always;
add_header X-IPng-Upstream-Status $upstream_status always;
add_header X-IPng-Cache-Status $upstream_cache_status;
```
With that, I am ready to enable caching of the IPng `/media` location:
```
location /media {
...
include /etc/nginx/conf.d/ipng-strip-minio-headers.inc;
include /etc/nginx/conf.d/ipng-add-upstream-headers.inc;
include /etc/nginx/conf.d/ipng-cache.inc;
...
}
```
## Results
I run the Ansible playbook for the NGINX cluster and take a look at the replica at Coloclue in
Amsterdam, called `nginx0.nlams1.ipng.ch`. Notably, it'll have to retrieve the file from a MinIO
replica in Zurich (12ms away), so it's expected to take a little while.
The first attempt:
```
pim@nginx0-nlams1:~$ curl -v -o /dev/null --connect-to ipng.ch:443:localhost:443 \
https://ipng.ch/media/vpp-proto/vpp-proto-bookworm.qcow2.lrz
...
< last-modified: Sun, 01 Jun 2025 12:37:52 GMT
< x-ipng-frontend: nginx0-nlams1
< x-ipng-cache-status: MISS
< x-ipng-upstream: [2001:678:d78:503::b]:9000
< x-ipng-upstream-status: 200
100 711M 100 711M 0 0 26.2M 0 0:00:27 0:00:27 --:--:-- 26.6M
```
OK, that's respectable: I've read the file at 26MB/s. Of course, I just turned on the cache, so
NGINX fetches the file from Zurich while handing it over to my `curl` here. It notifies me by means
of an HTTP header that the cache was a `MISS`, and also which upstream server it contacted to
retrieve the object.
But look at what happens the _second_ time I run the same command:
```
pim@nginx0-nlams1:~$ curl -v -o /dev/null --connect-to ipng.ch:443:localhost:443 \
https://ipng.ch/media/vpp-proto/vpp-proto-bookworm.qcow2.lrz
< last-modified: Sun, 01 Jun 2025 12:37:52 GMT
< x-ipng-frontend: nginx0-nlams1
< x-ipng-cache-status: HIT
100 711M 100 711M 0 0 436M 0 0:00:01 0:00:01 --:--:-- 437M
```
Holy moly! First I see the object has the same _Last-Modified_ header, but I now also see that the
_Cache-Status_ was a `HIT`, and there is no mention of any upstream server. I do however see the
file come in at a whopping 437MB/s which is 16x faster than over the network!! Nice work, NGINX!
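If I'm ever curious what ended up cached on disk: each cache file starts with a small plain-text
header that contains its key, so a slightly crude peek is possible with grep:

```
pim@nginx0-nlams1:~$ sudo grep -ar 'KEY:' /var/www/nginx-cache/ | head -3
```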
{{< image float="right" src="/assets/minio/rack-2.png" alt="Rack-o-Minio" width="12em" >}}
# What's Next
I'm going to deploy the third MinIO replica in R&uuml;mlang once the disks arrive. I'll release the
~4TB of disk currently used for Restic backups of the fleet, and put that ZFS capacity to other use.
Now, creating services like PeerTube, Mastodon, Pixelfed, Loops, NextCloud and what-have-you, will
become much easier for me. And with the per-bucket replication between MinIO deployments, I also
think this is a great way to auto-backup important data. First off, it'll be RS8.4 on the MinIO node
itself, and secondly, user data will be copied automatically to a neighboring facility.
I've convinced myself that S3 storage is a great service to operate, and that MinIO is awesome.
BIN static/assets/minio/console-1.png (Stored with Git LFS, new file, binary not shown)
BIN static/assets/minio/console-2.png (Stored with Git LFS, new file, binary not shown)
BIN static/assets/minio/disks.png (Stored with Git LFS, new file, binary not shown)
(one more image: file diff suppressed because it is too large; 90 KiB)
BIN static/assets/minio/minio-logo.png (Stored with Git LFS, new file, binary not shown)
BIN static/assets/minio/nagios.png (Stored with Git LFS, new file, binary not shown)
BIN static/assets/minio/nginx-logo.png (Stored with Git LFS, new file, binary not shown)
BIN static/assets/minio/rack-2.png (Stored with Git LFS, new file, binary not shown)
BIN static/assets/minio/rack.png (Stored with Git LFS, new file, binary not shown)
BIN static/assets/minio/restic-logo.png (Stored with Git LFS, new file, binary not shown)