Compare commits: baa3e78045 ... main (6 commits)

a76abc331f · 44deb34685 · ca46bcf6d5 · 5042f822ef · fdb77838b8 · 6d3f4ac206

@@ -89,7 +89,7 @@ lcp lcp-sync off
 ```
 
 The prep work for the rest of the interface syncer starts with this
-[[commit](https://github.com/pimvanpelt/lcpng/commit/2d00de080bd26d80ce69441b1043de37e0326e0a)], and
+[[commit](https://git.ipng.ch/ipng/lcpng/commit/2d00de080bd26d80ce69441b1043de37e0326e0a)], and
 for the rest of this blog post, the behavior will be in the 'on' position.
 
 ### Change interface: state
@@ -120,7 +120,7 @@ the state it was. I did notice that you can't bring up a sub-interface if its pa
 is down, which I found counterintuitive, but that's neither here nor there.
 
 All of this is to say that we have to be careful when copying state forward, because as
-this [[commit](https://github.com/pimvanpelt/lcpng/commit/7c15c84f6c4739860a85c599779c199cb9efef03)]
+this [[commit](https://git.ipng.ch/ipng/lcpng/commit/7c15c84f6c4739860a85c599779c199cb9efef03)]
 shows, issuing `set int state ... up` on an interface won't touch its sub-interfaces in VPP, but
 the subsequent netlink message to bring the _LIP_ for that interface up **will** update the
 children, thus desynchronising Linux and VPP: Linux will have the interface **and all its
@@ -128,7 +128,7 @@ sub-interfaces** up unconditionally; VPP will have the interface up and its sub-
 whatever state they were before.
 
 To address this, a second
-[[commit](https://github.com/pimvanpelt/lcpng/commit/a3dc56c01461bdffcac8193ead654ae79225220f)] was
+[[commit](https://git.ipng.ch/ipng/lcpng/commit/a3dc56c01461bdffcac8193ead654ae79225220f)] was
 needed. I'm not too sure I want to keep this behavior, but for now, it results in an intuitive
 end-state, which is that all interface states are exactly the same between Linux and VPP.
 
@@ -157,7 +157,7 @@ DBGvpp# set int state TenGigabitEthernet3/0/0 up
 ### Change interface: MTU
 
 Finally, a straightforward
-[[commit](https://github.com/pimvanpelt/lcpng/commit/39bfa1615fd1cafe5df6d8fc9d34528e8d3906e2)], or
+[[commit](https://git.ipng.ch/ipng/lcpng/commit/39bfa1615fd1cafe5df6d8fc9d34528e8d3906e2)], or
 so I thought. When the MTU changes in VPP (with `set interface mtu packet N <int>`), there is a
 callback that can be registered which copies this into the _LIP_. I did notice a specific corner
 case: in VPP, a sub-interface can have a larger MTU than its parent. In Linux, this cannot happen,
@@ -179,7 +179,7 @@ higher than that, perhaps logging an error explaining why. This means two things
 1. Any change in VPP of a parent MTU should ensure all children are clamped to at most that.
 
 I addressed the issue in this
-[[commit](https://github.com/pimvanpelt/lcpng/commit/79a395b3c9f0dae9a23e6fbf10c5f284b1facb85)].
+[[commit](https://git.ipng.ch/ipng/lcpng/commit/79a395b3c9f0dae9a23e6fbf10c5f284b1facb85)].
 
 ### Change interface: IP Addresses
 
@@ -199,7 +199,7 @@ VPP into the companion Linux devices:
 _LIP_ with `lcp_itf_set_interface_addr()`.
 
 This means with this
-[[commit](https://github.com/pimvanpelt/lcpng/commit/f7e1bb951d648a63dfa27d04ded0b6261b9e39fe)], at
+[[commit](https://git.ipng.ch/ipng/lcpng/commit/f7e1bb951d648a63dfa27d04ded0b6261b9e39fe)], at
 any time a new _LIP_ is created, the IPv4 and IPv6 address on the VPP interface are fully copied
 over by the third change, while at runtime, new addresses can be set/removed as well by the first
 and second change.
@@ -100,7 +100,7 @@ linux-cp {
 
 Based on this config, I set the startup default in `lcp_set_lcp_auto_subint()`, but I realize that
 an administrator may want to turn it on/off at runtime, too, so I add a CLI getter/setter that
-interacts with the flag in this [[commit](https://github.com/pimvanpelt/lcpng/commit/d23aab2d95aabcf24efb9f7aecaf15b513633ab7)]:
+interacts with the flag in this [[commit](https://git.ipng.ch/ipng/lcpng/commit/d23aab2d95aabcf24efb9f7aecaf15b513633ab7)]:
 
 ```
 DBGvpp# show lcp
@@ -116,11 +116,11 @@ lcp lcp-sync off
 ```
 
 The prep work for the rest of the interface syncer starts with this
-[[commit](https://github.com/pimvanpelt/lcpng/commit/2d00de080bd26d80ce69441b1043de37e0326e0a)], and
+[[commit](https://git.ipng.ch/ipng/lcpng/commit/2d00de080bd26d80ce69441b1043de37e0326e0a)], and
 for the rest of this blog post, the behavior will be in the 'on' position.
 
 The code for the configuration toggle is in this
-[[commit](https://github.com/pimvanpelt/lcpng/commit/934446dcd97f51c82ddf133ad45b61b3aae14b2d)].
+[[commit](https://git.ipng.ch/ipng/lcpng/commit/934446dcd97f51c82ddf133ad45b61b3aae14b2d)].
 
 ### Auto create/delete sub-interfaces
 
@@ -145,7 +145,7 @@ I noticed that interface deletion had a bug (one that I fell victim to as well:
 remove the netlink device in the correct network namespace), which I fixed.
 
 The code for the auto create/delete and the bugfix is in this
-[[commit](https://github.com/pimvanpelt/lcpng/commit/934446dcd97f51c82ddf133ad45b61b3aae14b2d)].
+[[commit](https://git.ipng.ch/ipng/lcpng/commit/934446dcd97f51c82ddf133ad45b61b3aae14b2d)].
 
 ### Further Work
 
@@ -154,7 +154,7 @@ For now, `lcp_nl_dispatch()` just throws the message away after logging it with
 a function that will come in very useful as I start to explore all the different Netlink message types.
 
 The code that forms the basis of our Netlink Listener lives in [[this
-commit](https://github.com/pimvanpelt/lcpng/commit/c4e3043ea143d703915239b2390c55f7b6a9b0b1)] and
+commit](https://git.ipng.ch/ipng/lcpng/commit/c4e3043ea143d703915239b2390c55f7b6a9b0b1)] and
 specifically, here I want to call out that I was not the primary author; I worked off of Matt and Neale's
 awesome work in this pending [Gerrit](https://gerrit.fd.io/r/c/vpp/+/31122).
 
@@ -182,7 +182,7 @@ Linux interface VPP is not aware of. But, if I can find the _LIP_, I can convert
 add or remove the ip4/ip6 neighbor adjacency.
 
 The code for this first Netlink message handler lives in this
-[[commit](https://github.com/pimvanpelt/lcpng/commit/30bab1d3f9ab06670fbef2c7c6a658e7b77f7738)]. An
+[[commit](https://git.ipng.ch/ipng/lcpng/commit/30bab1d3f9ab06670fbef2c7c6a658e7b77f7738)]. An
 ironic insight is that after writing the code, I don't think any of it will be necessary, because
 the interface plugin will already copy ARP and IPv6 ND packets back and forth and itself update its
 neighbor adjacency tables; but I'm leaving the code in for now.
@@ -197,7 +197,7 @@ it or remove it, and if there are no link-local addresses left, disable IPv6 on
 There's also a few multicast routes to add (notably 224.0.0.0/24 and ff00::/8, all-local-subnet).
 
 The code for IP address handling is in this
-[[commit]](https://github.com/pimvanpelt/lcpng/commit/87742b4f541d389e745f0297d134e34f17b5b485), but
+[[commit]](https://git.ipng.ch/ipng/lcpng/commit/87742b4f541d389e745f0297d134e34f17b5b485), but
 when I took it out for a spin, I noticed something curious, looking at the log lines that are
 generated for the following sequence:
 
@@ -236,7 +236,7 @@ interface and directly connected route addition/deletion is slightly different i
 So, I decide to take a little shortcut -- if an addition returns "already there", or a deletion returns
 "no such entry", I'll just consider it a successful addition and deletion respectively, saving my eyes
 from being screamed at by this red error message. I changed that in this
-[[commit](https://github.com/pimvanpelt/lcpng/commit/d63fbd8a9a612d038aa385e79a57198785d409ca)],
+[[commit](https://git.ipng.ch/ipng/lcpng/commit/d63fbd8a9a612d038aa385e79a57198785d409ca)],
 turning this situation into a friendly green notice instead.
 
 ### Netlink: Link (existing)
@@ -267,7 +267,7 @@ To avoid this loop, I temporarily turn off `lcp-sync` just before handling a bat
 turn it back to its original state when I'm done with that.
 
 The code for add/del of existing links is in this
-[[commit](https://github.com/pimvanpelt/lcpng/commit/e604dd34784e029b41a47baa3179296d15b0632e)].
+[[commit](https://git.ipng.ch/ipng/lcpng/commit/e604dd34784e029b41a47baa3179296d15b0632e)].
 
 ### Netlink: Link (new)
 
@@ -276,7 +276,7 @@ doesn't have a _LIP_ for, but specifically describes a VLAN interface? Well, th
 is trying to create a new sub-interface. And supporting that operation would be super cool, so let's go!
 
 Using the earlier placeholder hint in `lcp_nl_link_add()` (see the previous
-[[commit](https://github.com/pimvanpelt/lcpng/commit/e604dd34784e029b41a47baa3179296d15b0632e)]),
+[[commit](https://git.ipng.ch/ipng/lcpng/commit/e604dd34784e029b41a47baa3179296d15b0632e)]),
 I know that I've gotten a NEWLINK request but the Linux ifindex doesn't have a _LIP_. This could be
 because the interface is entirely foreign to VPP, for example somebody created a dummy interface or
 a VLAN sub-interface on one:
@@ -331,7 +331,7 @@ a boring `<phy>.<subid>` name.
 
 Alright, without further ado, the code for the main innovation here, the implementation of
 `lcp_nl_link_add_vlan()`, is in this
-[[commit](https://github.com/pimvanpelt/lcpng/commit/45f408865688eb7ea0cdbf23aa6f8a973be49d1a)].
+[[commit](https://git.ipng.ch/ipng/lcpng/commit/45f408865688eb7ea0cdbf23aa6f8a973be49d1a)].
 
 ## Results
 
@@ -118,7 +118,7 @@ or Virtual Routing/Forwarding domains). So first, I need to add these:
 
 All of this code was heavily inspired by the pending [[Gerrit](https://gerrit.fd.io/r/c/vpp/+/31122)]
 but a few finishing touches were added, and wrapped up in this
-[[commit](https://github.com/pimvanpelt/lcpng/commit/7a76498277edc43beaa680e91e3a0c1787319106)].
+[[commit](https://git.ipng.ch/ipng/lcpng/commit/7a76498277edc43beaa680e91e3a0c1787319106)].
 
 ### Deletion
 
@@ -459,7 +459,7 @@ it as 'unreachable' rather than deleting it. These are *additions* which have a
 but with an interface index of 1 (which, in Netlink, is 'lo'). This makes VPP intermittently crash, so I
 currently commented this out, while I gain better understanding. Result: blackhole/unreachable/prohibit
 specials cannot be set using the plugin. Beware!
-(disabled in this [[commit](https://github.com/pimvanpelt/lcpng/commit/7c864ed099821f62c5be8cbe9ed3f4dd34000a42)]).
+(disabled in this [[commit](https://git.ipng.ch/ipng/lcpng/commit/7c864ed099821f62c5be8cbe9ed3f4dd34000a42)]).
 
 ## Credits
 
@@ -88,7 +88,7 @@ stat['/if/rx-miss'][:, 1].sum() - returns the sum of packet counters for
 ```
 
 Alright, so let's grab that file and refactor it into a small library for me to use; I do
-this in [[this commit](https://github.com/pimvanpelt/vpp-snmp-agent/commit/51eee915bf0f6267911da596b41a4475feaf212e)].
+this in [[this commit](https://git.ipng.ch/ipng/vpp-snmp-agent/commit/51eee915bf0f6267911da596b41a4475feaf212e)].
 
 ### VPP's API
 
@@ -159,7 +159,7 @@ idx=19 name=tap4 mac=02:fe:17:06:fc:af mtu=9000 flags=3
 
 So I added a little abstraction with some error handling and one main function
 to return interfaces as a Python dictionary of those `sw_interface_details`
-tuples in [[this commit](https://github.com/pimvanpelt/vpp-snmp-agent/commit/51eee915bf0f6267911da596b41a4475feaf212e)].
+tuples in [[this commit](https://git.ipng.ch/ipng/vpp-snmp-agent/commit/51eee915bf0f6267911da596b41a4475feaf212e)].
 
 ### AgentX
 
@@ -207,9 +207,9 @@ once asked with `GetPDU` or `GetNextPDU` requests, by issuing a corresponding `R
 to the SNMP server -- it takes care of all the rest!
 
 The resulting code is in [[this
-commit](https://github.com/pimvanpelt/vpp-snmp-agent/commit/8c9c1e2b4aa1d40a981f17581f92bba133dd2c29)]
+commit](https://git.ipng.ch/ipng/vpp-snmp-agent/commit/8c9c1e2b4aa1d40a981f17581f92bba133dd2c29)]
 but you can also check out the whole thing on
-[[Github](https://github.com/pimvanpelt/vpp-snmp-agent)].
+[[Github](https://git.ipng.ch/ipng/vpp-snmp-agent)].
 
 ### Building
 
@@ -480,7 +480,7 @@ is to say, those packets which were destined to any IP address configured on the
 plane. Any traffic going _through_ VPP will never be seen by Linux! So, I'll have to be
 clever and count this traffic by polling VPP instead. This was the topic of my previous
 [VPP Part 6]({{< ref "2021-09-10-vpp-6" >}}) about the SNMP Agent. All of that code
-was released to [Github](https://github.com/pimvanpelt/vpp-snmp-agent), notably there's
+was released to [Github](https://git.ipng.ch/ipng/vpp-snmp-agent), notably there's
 a hint there for an `snmpd-dataplane.service` and a `vpp-snmp-agent.service`, including
 the compiled binary that reads from VPP and feeds this to SNMP.
 
@@ -62,7 +62,7 @@ plugins:
 or route, or the system receiving ARP or IPv6 neighbor request/reply from neighbors), and applying
 these events to the VPP dataplane.
 
-I've published the code on [Github](https://github.com/pimvanpelt/lcpng/) and I am targeting a release
+I've published the code on [Github](https://git.ipng.ch/ipng/lcpng/) and I am targeting a release
 in upstream VPP, hoping to make the upcoming 22.02 release in February 2022. I have a lot of ground to
 cover, but I will note that the plugin has been running in production in [AS8298]({{< ref "2021-02-27-network" >}})
 since Sep'21 and no crashes related to LinuxCP have been observed.
@@ -195,7 +195,7 @@ So grab a cup of tea, while we let Rhino stretch its legs, ehh, CPUs ...
 pim@rhino:~$ mkdir -p ~/src
 pim@rhino:~$ cd ~/src
 pim@rhino:~/src$ sudo apt install libmnl-dev
-pim@rhino:~/src$ git clone https://github.com/pimvanpelt/lcpng.git
+pim@rhino:~/src$ git clone https://git.ipng.ch/ipng/lcpng.git
 pim@rhino:~/src$ git clone https://gerrit.fd.io/r/vpp
 pim@rhino:~/src$ ln -s ~/src/lcpng ~/src/vpp/src/plugins/lcpng
 pim@rhino:~/src$ cd ~/src/vpp
@@ -33,7 +33,7 @@ In this first post, let's take a look at tablestakes: writing a YAML specificati
 configuration elements of VPP, and then ensures that the YAML file is both syntactically as well as
 semantically correct.
 
-**Note**: Code is on [my Github](https://github.com/pimvanpelt/vppcfg), but it's not quite ready for
+**Note**: Code is on [my Github](https://git.ipng.ch/ipng/vppcfg), but it's not quite ready for
 prime-time yet. Take a look, and engage with us on GitHub (pull requests preferred over issues themselves)
 or reach out by [contacting us](/s/contact/).
 
@@ -348,7 +348,7 @@ to mess up my (or your!) VPP router by feeding it garbage, so the lions' share o
 has been to assert the YAML file is both syntactically and semantically valid.
 
 
-In the meantime, you can take a look at my code on [GitHub](https://github.com/pimvanpelt/vppcfg), but to
+In the meantime, you can take a look at my code on [GitHub](https://git.ipng.ch/ipng/vppcfg), but to
 whet your appetite, here's a hefty configuration that demonstrates all implemented types:
 
 ```
@@ -32,7 +32,7 @@ the configuration to the dataplane. Welcome to `vppcfg`!
 In this second post of the series, I want to talk a little bit about what planning a path from a running
 configuration to a desired new configuration might look like.
 
-**Note**: Code is on [my Github](https://github.com/pimvanpelt/vppcfg), but it's not quite ready for
+**Note**: Code is on [my Github](https://git.ipng.ch/ipng/vppcfg), but it's not quite ready for
 prime-time yet. Take a look, and engage with us on GitHub (pull requests preferred over issues themselves)
 or reach out by [contacting us](/s/contact/).
 
@@ -171,12 +171,12 @@ GigabitEthernet1/0/0 1 up GigabitEthernet1/0/0
 
 After this exploratory exercise, I have learned enough about the hardware to be able to take the
 Fitlet2 out for a spin. To configure the VPP instance, I turn to
-[[vppcfg](https://github.com/pimvanpelt/vppcfg)], which can take a YAML configuration file
+[[vppcfg](https://git.ipng.ch/ipng/vppcfg)], which can take a YAML configuration file
 describing the desired VPP configuration, and apply it safely to the running dataplane using the VPP
 API. I've written a few more posts on how it does that, notably on its [[syntax]({{< ref "2022-03-27-vppcfg-1" >}})]
 and its [[planner]({{< ref "2022-04-02-vppcfg-2" >}})]. A complete
 configuration guide on vppcfg can be found
-[[here](https://github.com/pimvanpelt/vppcfg/blob/main/docs/config-guide.md)].
+[[here](https://git.ipng.ch/ipng/vppcfg/blob/main/docs/config-guide.md)].
 
 ```
 pim@fitlet:~$ sudo dpkg -i {lib,}vpp*23.06*deb
@@ -185,7 +185,7 @@ forgetful chipmunk-sized brain!), so here, I'll only recap what's already writte
 
 **1. BUILD:** For the first step, the build is straightforward, and yields a VPP instance based on
 `vpp-ext-deps_23.06-1` at version `23.06-rc0~71-g182d2b466`, which contains my
-[[LCPng](https://github.com/pimvanpelt/lcpng.git)] plugin. I then copy the packages to the router.
+[[LCPng](https://git.ipng.ch/ipng/lcpng.git)] plugin. I then copy the packages to the router.
 The router has an E-2286G CPU @ 4.00GHz with 6 cores and 6 hyperthreads. There's a really handy tool
 called `likwid-topology` that can show how the L1, L2 and L3 cache lines up with respect to CPU
 cores. Here I learn that CPU (0+6) and (1+7) share L1 and L2 cache -- so I can conclude that 0-5 are
@@ -351,7 +351,7 @@ in `vppcfg`:
 * When I create the initial `--novpp` config, there's a bug in `vppcfg` where I incorrectly
   reference a dataplane object which I haven't initialized (because with `--novpp` the tool
   will not contact the dataplane at all). That one was easy to fix, which I did in [[this
-  commit](https://github.com/pimvanpelt/vppcfg/commit/0a0413927a0be6ed3a292a8c336deab8b86f5eee)]).
+  commit](https://git.ipng.ch/ipng/vppcfg/commit/0a0413927a0be6ed3a292a8c336deab8b86f5eee)]).
 
 After that small detour, I can now proceed to configure the dataplane by offering the resulting
 VPP commands, like so:
@@ -573,7 +573,7 @@ see is that which is destined to the controlplane (eg, to one of the IPv4 or IPv
 multicast/broadcast groups that they are participating in), so things like tcpdump or SNMP won't
 really work.
 
-However, due to my [[vpp-snmp-agent](https://github.com/pimvanpelt/vpp-snmp-agent.git)], which is
+However, due to my [[vpp-snmp-agent](https://git.ipng.ch/ipng/vpp-snmp-agent.git)], which is
 feeding as an AgentX behind an snmpd that in turn is running in the `dataplane` namespace, SNMP scrapes
 work as they did before, albeit with a few different interface names.
 
@@ -14,7 +14,7 @@ performance and versatility. For those of us who have used Cisco IOS/XR devices,
 _ASR_ (aggregation service router), VPP will look and feel quite familiar as many of the approaches
 are shared between the two.
 
-I've been working on the Linux Control Plane [[ref](https://github.com/pimvanpelt/lcpng)], which you
+I've been working on the Linux Control Plane [[ref](https://git.ipng.ch/ipng/lcpng)], which you
 can read all about in my series on VPP back in 2021:
 
 [{: style="width:300px; float: right; margin-left: 1em;"}](https://video.ipng.ch/w/erc9sAofrSZ22qjPwmv6H4)
@@ -70,7 +70,7 @@ answered by a Response PDU.
 
 Using parts of a Python Agentx library written by GitHub user hosthvo
 [[ref](https://github.com/hosthvo/pyagentx)], I tried my hand at writing one of these AgentX's.
-The resulting source code is on [[GitHub](https://github.com/pimvanpelt/vpp-snmp-agent)]. That's the
+The resulting source code is on [[GitHub](https://git.ipng.ch/ipng/vpp-snmp-agent)]. That's the
 one that's running in production ever since I started running VPP routers at IPng Networks AS8298.
 After the _AgentX_ exposes the dataplane interfaces and their statistics into _SNMP_, an open source
 monitoring tool such as LibreNMS [[ref](https://librenms.org/)] can discover the routers and draw
@@ -126,7 +126,7 @@ for any interface created in the dataplane.
 
 I wish I were good at Go, but I never really took to the language. I'm pretty good at Python, but
 sorting through the stats segment isn't super quick as I've already noticed in the Python3 based
-[[VPP SNMP Agent](https://github.com/pimvanpelt/vpp-snmp-agent)]. I'm probably the world's least
+[[VPP SNMP Agent](https://git.ipng.ch/ipng/vpp-snmp-agent)]. I'm probably the world's least
 terrible C programmer, so maybe I can take a look at the VPP Stats Client and make sense of it. Luckily,
 there's an example already in `src/vpp/app/vpp_get_stats.c` and it reveals the following pattern:
 
@@ -19,7 +19,7 @@ same time keep an IPng Site Local network with IPv4 and IPv6 that is separate fr
 based on hardware/silicon based forwarding at line rate and high availability. You can read all
 about my Centec MPLS shenanigans in [[this article]({{< ref "2023-03-11-mpls-core" >}})].
 
-Ever since the release of the Linux Control Plane [[ref](https://github.com/pimvanpelt/lcpng)]
+Ever since the release of the Linux Control Plane [[ref](https://git.ipng.ch/ipng/lcpng)]
 plugin in VPP, folks have asked "What about MPLS?" -- I have never really felt the need to go down this
 rabbit hole, because I figured that in this day and age, higher level IP protocols that do tunneling
 are just as performant, and a little bit less of an 'art' to get right. For example, the Centec
|
@ -459,6 +459,6 @@ and VPP, and the overall implementation before attempting to use in production.
|
||||
we got at least some of this right, but testing and runtime experience will tell.
|
||||
|
||||
I will be silently porting the change into my own copy of the Linux Controlplane called lcpng on
|
||||
[[GitHub](https://github.com/pimvanpelt/lcpng.git)]. If you'd like to test this - reach out to the VPP
|
||||
[[GitHub](https://git.ipng.ch/ipng/lcpng.git)]. If you'd like to test this - reach out to the VPP
|
||||
Developer [[mailinglist](mailto:vpp-dev@lists.fd.io)] any time!
|
||||
|
||||
|
@ -385,5 +385,5 @@ and VPP, and the overall implementation before attempting to use in production.
|
||||
we got at least some of this right, but testing and runtime experience will tell.
|
||||
|
||||
I will be silently porting the change into my own copy of the Linux Controlplane called lcpng on
|
||||
[[GitHub](https://github.com/pimvanpelt/lcpng.git)]. If you'd like to test this - reach out to the VPP
|
||||
[[GitHub](https://git.ipng.ch/ipng/lcpng.git)]. If you'd like to test this - reach out to the VPP
|
||||
Developer [[mailinglist](mailto:vpp-dev@lists.fd.io)] any time!
|
||||
|
@ -304,7 +304,7 @@ Gateway, just to show a few of the more advanced features of VPP. For me, this t
|
||||
line of thinking: classifiers. This extract/match/act pattern can be used in policers, ACLs and
|
||||
arbitrary traffic redirection through VPP's directed graph (eg. selecting a next node for
|
||||
processing). I'm going to deep-dive into this classifier behavior in an upcoming article, and see
|
||||
how I might add this to [[vppcfg](https://github.com/pimvanpelt/vppcfg.git)], because I think it
|
||||
how I might add this to [[vppcfg](https://git.ipng.ch/ipng/vppcfg.git)], because I think it
|
||||
would be super powerful to abstract away the rather complex underlying API into something a little
|
||||
bit more ... user friendly. Stay tuned! :)
|
||||
|
||||
|
@ -359,7 +359,7 @@ does not have an IPv4 address. Except -- I'm bending the rules a little bit by d
|
||||
There's an internal function `ip4_sw_interface_enable_disable()` which is called to enable IPv4
|
||||
processing on an interface once the first IPv4 address is added. So my first fix is to force this to
|
||||
be enabled for any interface that is exposed via Linux Control Plane, notably in `lcp_itf_pair_create()`
|
||||
[[here](https://github.com/pimvanpelt/lcpng/blob/main/lcpng_interface.c#L777)].
|
||||
[[here](https://git.ipng.ch/ipng/lcpng/blob/main/lcpng_interface.c#L777)].
|
||||
|
||||
This approach is partially effective:
|
||||
|
||||
@ -500,7 +500,7 @@ which is unnumbered. Because I don't know for sure if everybody would find this
|
||||
I make sure to guard the behavior behind a backwards compatible configuration option.
|
||||
|
||||
If you're curious, please take a look at the change in my [[GitHub
|
||||
repo](https://github.com/pimvanpelt/lcpng/commit/a960d64a87849d312b32d9432ffb722672c14878)], in
|
||||
repo](https://git.ipng.ch/ipng/lcpng/commit/a960d64a87849d312b32d9432ffb722672c14878)], in
|
||||
which I:
|
||||
1. add a new configuration option, `lcp-sync-unnumbered`, which defaults to `on`. That would be
|
||||
what the plugin would do in the normal case: copy forward these borrowed IP addresses to Linux.
|
||||
|
@@ -147,7 +147,7 @@ With all of that, I am ready to demonstrate two working solutions now. I first c
 Ondrej's [[commit](https://gitlab.nic.cz/labs/bird/-/commit/280daed57d061eb1ebc89013637c683fe23465e8)].
 Then, I compile VPP with my pending [[gerrit](https://gerrit.fd.io/r/c/vpp/+/40482)]. Finally,
 to demonstrate how `update_loopback_addr()` might work, I compile `lcpng` with my previous
-[[commit](https://github.com/pimvanpelt/lcpng/commit/a960d64a87849d312b32d9432ffb722672c14878)],
+[[commit](https://git.ipng.ch/ipng/lcpng/commit/a960d64a87849d312b32d9432ffb722672c14878)],
 which allows me to inhibit copying forward addresses from VPP to Linux, when using _unnumbered_
 interfaces.
 
|
@ -250,10 +250,10 @@ remove the IPv4 and IPv6 addresses from the <span style='color:red;font-weight:b
|
||||
routers in Brüttisellen. They are directly connected, and if anything goes wrong, I can walk
|
||||
over and rescue them. Sounds like a safe way to start!
|
||||
|
||||
I quickly add the ability for [[vppcfg](https://github.com/pimvanpelt/vppcfg)] to configure
|
||||
I quickly add the ability for [[vppcfg](https://git.ipng.ch/ipng/vppcfg)] to configure
|
||||
_unnumbered_ interfaces. In VPP, these are interfaces that don't have an IPv4 or IPv6 address of
|
||||
their own, but they borrow one from another interface. If you're curious, you can take a look at the
|
||||
[[User Guide](https://github.com/pimvanpelt/vppcfg/blob/main/docs/config-guide.md#interfaces)] on
|
||||
[[User Guide](https://git.ipng.ch/ipng/vppcfg/blob/main/docs/config-guide.md#interfaces)] on
|
||||
GitHub.
|
||||
|
||||
Looking at their `vppcfg` files, the change is actually very easy, taking as an example the
|
||||
@ -291,7 +291,7 @@ interface.
|
||||
|
||||
In the article, you'll see that discussed as _Solution 2_, and it includes a bit of rationale why I
|
||||
find this better. I implemented it in this
|
||||
[[commit](https://github.com/pimvanpelt/lcpng/commit/a960d64a87849d312b32d9432ffb722672c14878)], in
|
||||
[[commit](https://git.ipng.ch/ipng/lcpng/commit/a960d64a87849d312b32d9432ffb722672c14878)], in
|
||||
case you're curious, and the commandline keyword is `lcp lcp-sync-unnumbered off` (the default is
|
||||
_on_).
|
||||
|
||||
|
@@ -230,7 +230,7 @@ does not have any form of configuration persistence and that's deliberate. VPP's
 programmable dataplane, and explicitly has left the programming and configuration as an exercise for
 integrators. I have written a Python project that takes a YAML file as input and uses it to
 configure (and reconfigure, on the fly) the dataplane automatically, called
-[[VPPcfg](https://github.com/pimvanpelt/vppcfg.git)]. Previously, I wrote some implementation thoughts
+[[VPPcfg](https://git.ipng.ch/ipng/vppcfg.git)]. Previously, I wrote some implementation thoughts
 on its [[datamodel]({{< ref 2022-03-27-vppcfg-1 >}})] and its [[operations]({{< ref 2022-04-02-vppcfg-2
 >}})] so I won't repeat that here. Instead, I will just show the configuration:
 
|
@ -19,11 +19,11 @@ performance almost the same as on bare metal. But did you know that VPP can also
|
||||
The other day I joined the [[ZANOG'25](https://nog.net.za/event1/zanog25/)] in Durban, South Africa.
|
||||
One of the presenters was Nardus le Roux of Nokia, and he showed off a project called
|
||||
[[Containerlab](https://containerlab.dev/)], which provides a CLI for orchestrating and managing
|
||||
container-based networking labs. It starts the containers, builds a virtual wiring between them to
|
||||
create lab topologies of users choice and manages labs lifecycle.
|
||||
container-based networking labs. It starts the containers, builds virtual wiring between them to
|
||||
create lab topologies of users' choice and manages the lab lifecycle.
|
||||
|
||||
Quite regularly I am asked 'when will you add VPP to Containerlab?', but at ZANOG I made a promise
|
||||
to actually add them. In the previous [[article]({{< ref 2025-05-03-containerlab-1.md >}})], I took
|
||||
to actually add it. In my previous [[article]({{< ref 2025-05-03-containerlab-1.md >}})], I took
|
||||
a good look at VPP as a dockerized container. In this article, I'll explore how to make such a
|
||||
container run in Containerlab!
|
||||
|
||||
@ -49,7 +49,7 @@ RUN apt-get update && apt-get -y install vpp vpp-plugin-core && apt-get clean
|
||||
|
||||
# Build vppcfg
|
||||
RUN pip install --break-system-packages build netaddr yamale argparse pyyaml ipaddress
|
||||
RUN git clone https://github.com/pimvanpelt/vppcfg.git && cd vppcfg && python3 -m build && \
|
||||
RUN git clone https://git.ipng.ch/ipng/vppcfg.git && cd vppcfg && python3 -m build && \
|
||||
pip install --break-system-packages dist/vppcfg-*-py3-none-any.whl
|
||||
|
||||
# Config files
|
||||
|
content/articles/2025-05-28-minio-1.md (new file, 713 lines)
@@ -0,0 +1,713 @@
---
date: "2025-05-28T22:07:23Z"
title: 'Case Study: Minio S3 - Part 1'
---

{{< image float="right" src="/assets/minio/minio-logo.png" alt="MinIO Logo" width="6em" >}}

# Introduction

Amazon Simple Storage Service (Amazon S3) is an object storage service offering industry-leading
scalability, data availability, security, and performance. Millions of customers of all sizes and
industries store, manage, analyze, and protect any amount of data for virtually any use case, such
as data lakes, cloud-native applications, and mobile apps. With cost-effective storage classes and
easy-to-use management features, you can optimize costs, organize and analyze data, and configure
fine-tuned access controls to meet specific business and compliance requirements.

Amazon's S3 became the _de facto_ standard object storage system, and there exist several fully open
source implementations of the protocol. One of them is MinIO: designed to allow enterprises to
consolidate all of their data on a single, private cloud namespace. Architected using the same
principles as the hyperscalers, AIStor delivers performance at scale at a fraction of the cost
compared to the public cloud.

IPng Networks is an Internet Service Provider, but I also dabble in self-hosting things, for
example [[PeerTube](https://video.ipng.ch/)], [[Mastodon](https://ublog.tech/)],
[[Immich](https://photos.ipng.ch/)], [[Pixelfed](https://pix.ublog.tech/)] and of course
[[Hugo](https://ipng.ch/)]. These services all have one thing in common: they tend to use lots of
storage when they grow. At IPng Networks, all hypervisors ship with enterprise SAS flash drives,
mostly 1.92TB and 3.84TB. Scaling up each of these services, and backing them up safely, can be
quite the headache.

This article is for the storage buffs. I'll set up a set of distributed MinIO nodes from scratch.

## Physical

{{< image float="right" src="/assets/minio/disks.png" alt="MinIO Disks" width="16em" >}}

I'll start with the basics. I still have a few Dell R720 servers lying around; they are getting a
bit older but still have 24 cores and 64GB of memory. First I need to get me some disks. I order
36 pieces of 16TB SATA enterprise disk, a mixture of Seagate EXOS and Toshiba MG series disks. I once
learned (the hard way) that buying a big stack of disks from one production run is a risk - so I'll
mix and match the drives.

Three trays of caddies and a melted credit card later, I have 576TB of SATA disks safely in hand.
Each machine will carry 192TB of raw storage. The nice thing about this chassis is that Dell can
ship them with 12x 3.5" SAS slots in the front, and 2x 2.5" SAS slots in the rear of the chassis.

So I'll install Debian Bookworm on one small 480G SSD in software RAID1.

### Cloning an install

I have three identical machines so in total I'll want six of these SSDs. I temporarily screw the
other five in 3.5" drive caddies and plug them into the first installed Dell, which I've called
`minio-proto`:

```
pim@minio-proto:~$ for i in b c d e f; do
  sudo dd if=/dev/sda of=/dev/sd${i} bs=512 count=1;
  sudo mdadm --manage /dev/md0 --add /dev/sd${i}1
done
pim@minio-proto:~$ sudo mdadm --grow /dev/md0 -n 6
pim@minio-proto:~$ watch cat /proc/mdstat
pim@minio-proto:~$ for i in a b c d e f; do
  sudo grub-install /dev/sd$i
done
```

{{< image float="right" src="/assets/minio/rack.png" alt="MinIO Rack" width="16em" >}}

The first command takes my installed disk, `/dev/sda`, and copies the first sector over to the other
five. This will give them the same partition table. Next, I'll add the first partition of each disk
to the raidset. Then, I'll expand the raidset to have six members, after which the kernel starts a
recovery process that syncs the newly added partitions to `/dev/md0` (by copying from `/dev/sda` to
all other disks at once). Finally, I'll watch this exciting movie and grab a cup of tea.

Once the disks are fully copied, I'll shut down the machine and distribute the disks to their
respective Dell R720s, two each. Once they boot they will all be identical. I'll need to make sure
their hostnames and machine/host-ids are unique, otherwise things like bridges will have overlapping
MAC addresses - ask me how I know:

```
pim@minio-proto:~$ sudo mdadm --grow /dev/md0 -n 2
pim@minio-proto:~$ sudo rm /etc/ssh/ssh_host*
pim@minio-proto:~$ sudo hostname minio0-chbtl0
pim@minio-proto:~$ sudo dpkg-reconfigure openssh-server
pim@minio-proto:~$ sudo dd if=/dev/random of=/etc/hostid bs=4 count=1
pim@minio-proto:~$ /usr/bin/dbus-uuidgen | sudo tee /etc/machine-id
pim@minio-proto:~$ sudo reboot
```

After which I have three beautiful and unique machines:
* `minio0.chbtl0.net.ipng.ch`: which will go into my server rack at the IPng office.
* `minio0.ddln0.net.ipng.ch`: which will go to [[Daedalean]({{< ref 2022-02-24-colo >}})],
  doing AI since before it was all about vibe coding.
* `minio0.chrma0.net.ipng.ch`: which will go to [[IP-Max](https://ip-max.net/)], one of the best
  ISPs on the planet. 🥰

## Deploying Minio

The user guide that MinIO provides
[[ref](https://min.io/docs/minio/linux/operations/installation.html)] is super good, arguably one of
the best documented open source projects I've ever seen. It shows me that I can do three types of
install: a 'Standalone' with one disk, a 'Standalone Multi-Drive', and a 'Distributed' deployment.
I decide to make three independent standalone multi-drive installs. This way, I have less shared
fate, and will be immune to network partitions (as these are going to be in three different
physical locations). I've also read about per-bucket _replication_, which will be an excellent way
to get geographical distribution and active/active instances to work together.
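
To make the active/active idea concrete, here's a sketch of what per-bucket replication between two
of these deployments might look like with the `mc` client. The aliases and bucket name are
hypothetical, and the exact flags are worth double-checking against `mc replicate add --help` for
the installed version:

```
# Assumption: aliases 'chbtl0' and 'ddln0' are configured, and 'my-bucket' exists on both.
pim@summer:~$ mc version enable chbtl0/my-bucket    # replication requires versioning
pim@summer:~$ mc version enable ddln0/my-bucket
pim@summer:~$ mc replicate add chbtl0/my-bucket \
    --remote-bucket 'https://<someuser>:<somepass>@s3.ddln0.ipng.ch/my-bucket'
```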

I feel good about the single-machine multi-drive decision. I follow the install guide
[[ref](https://min.io/docs/minio/linux/operations/install-deploy-manage/deploy-minio-single-node-multi-drive.html#minio-snmd)]
for this deployment type.

### IPng Frontends

At IPng I use a private IPv4/IPv6/MPLS network that is not connected to the internet. I call this
network [[IPng Site Local]({{< ref 2023-03-11-mpls-core.md >}})]. But how will users reach my Minio
install? I have four redundantly and geographically deployed frontends, two in the Netherlands and
two in Switzerland. I've described the frontend setup in a [[previous article]({{< ref
2023-03-17-ipng-frontends >}})] and the certificate management in [[this article]({{< ref
2023-03-24-lego-dns01 >}})].

I've decided to run the service on these three regionalized endpoints:
1. `s3.chbtl0.ipng.ch` which will back into `minio0.chbtl0.net.ipng.ch`
1. `s3.ddln0.ipng.ch` which will back into `minio0.ddln0.net.ipng.ch`
1. `s3.chrma0.ipng.ch` which will back into `minio0.chrma0.net.ipng.ch`

The first thing I take note of is that S3 buckets can either be addressed _by path_, in other words
something like `s3.chbtl0.ipng.ch/my-bucket/README.md`, but they can also be addressed by virtual
host, like so: `my-bucket.s3.chbtl0.ipng.ch/README.md`. A subtle difference, but from the docs I
understand that Minio needs to have control of the whole space under its main domain.
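
Concretely, the same object can then be fetched in either style (`my-bucket` being a hypothetical
bucket name); the wildcard DNS and certificate entries further down exist precisely to make the
second form work:

```
pim@summer:~$ curl https://s3.chbtl0.ipng.ch/my-bucket/README.md     # path-style
pim@summer:~$ curl https://my-bucket.s3.chbtl0.ipng.ch/README.md    # virtual-host-style
```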

There's a small implication to this requirement -- the Web Console that ships with MinIO (eh, well,
maybe that's going to change, more on that later) will want to have its own domain-name, so I
choose something simple: `cons0-s3.chbtl0.ipng.ch` and so on. This way, somebody might still be able
to have a bucket name called `cons0` :)

#### Let's Encrypt Certificates

Alright, so I will be kneading nine domains into this new certificate, which I'll simply call
`s3.ipng.ch`. I configure it in Ansible:

```
certbot:
  certs:
    ...
    s3.ipng.ch:
      groups: [ 'nginx', 'minio' ]
      altnames:
        - 's3.chbtl0.ipng.ch'
        - 'cons0-s3.chbtl0.ipng.ch'
        - '*.s3.chbtl0.ipng.ch'
        - 's3.ddln0.ipng.ch'
        - 'cons0-s3.ddln0.ipng.ch'
        - '*.s3.ddln0.ipng.ch'
        - 's3.chrma0.ipng.ch'
        - 'cons0-s3.chrma0.ipng.ch'
        - '*.s3.chrma0.ipng.ch'
```

I run the `certbot` playbook and it does two things:
1. On the machines from group `nginx` and `minio`, it will ensure there exists a user `lego` with
   an SSH key and write permissions to `/etc/lego/`; this is where the automation will write (and
   update) the certificate keys.
1. On the `lego` machine, it'll create two files. One is the certificate requestor, and the other
   is a certificate distribution script that will copy the cert to the right machine(s) when it
   renews.

On the `lego` machine, I'll run the cert request for the first time:

```
lego@lego:~$ bin/certbot:s3.ipng.ch
lego@lego:~$ RENEWED_LINEAGE=/home/lego/acme-dns/live/s3.ipng.ch bin/certbot-distribute
```

The first script asks me to add the _acme-challenge DNS entries, which I'll do, for example on the
`s3.chbtl0.ipng.ch` instance (and similarly for the `ddln0` and `chrma0` ones):

```
$ORIGIN chbtl0.ipng.ch.
_acme-challenge.s3        CNAME 51f16fd0-8eb6-455c-b5cd-96fad12ef8fd.auth.ipng.ch.
_acme-challenge.cons0-s3  CNAME 450477b8-74c9-4b9e-bbeb-de49c3f95379.auth.ipng.ch.
s3                        CNAME nginx0.ipng.ch.
*.s3                      CNAME nginx0.ipng.ch.
cons0-s3                  CNAME nginx0.ipng.ch.
```

I push and reload the `ipng.ch` zonefile with these changes, after which the certificate gets
requested and a cronjob added to check for renewals. The second script will copy the newly created
cert to all three `minio` machines, and all four `nginx` machines. From now on, every 90 days, a new
cert will be automatically generated and distributed. Slick!
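
Once the cert is live on the frontends, a quick way to convince myself that the wildcard SANs
actually made it in is something like this (the bucket name in the SNI is, again, hypothetical):

```
pim@summer:~$ echo | openssl s_client -connect s3.chbtl0.ipng.ch:443 \
    -servername my-bucket.s3.chbtl0.ipng.ch 2>/dev/null \
  | openssl x509 -noout -ext subjectAltName
```

This should print all nine names, wildcards included.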

#### NGINX Configs

With the LE wildcard certs in hand, I can create an NGINX frontend for these minio deployments.

First, a simple redirector service that punts people on port 80 to port 443:

```
server {
    listen [::]:80;
    listen 0.0.0.0:80;

    server_name cons0-s3.chbtl0.ipng.ch s3.chbtl0.ipng.ch *.s3.chbtl0.ipng.ch;
    access_log /var/log/nginx/s3.chbtl0.ipng.ch-access.log;
    include /etc/nginx/conf.d/ipng-headers.inc;

    location / {
        return 301 https://$server_name$request_uri;
    }
}
```

Next, the Minio API service itself, which runs on port 9000, with a configuration snippet inspired by
the MinIO [[docs](https://min.io/docs/minio/linux/integrations/setup-nginx-proxy-with-minio.html)]:

```
server {
    listen [::]:443 ssl http2;
    listen 0.0.0.0:443 ssl http2;
    ssl_certificate /etc/certs/s3.ipng.ch/fullchain.pem;
    ssl_certificate_key /etc/certs/s3.ipng.ch/privkey.pem;
    include /etc/nginx/conf.d/options-ssl-nginx.inc;
    ssl_dhparam /etc/nginx/conf.d/ssl-dhparams.inc;

    server_name s3.chbtl0.ipng.ch *.s3.chbtl0.ipng.ch;
    access_log /var/log/nginx/s3.chbtl0.ipng.ch-access.log upstream;
    include /etc/nginx/conf.d/ipng-headers.inc;

    add_header Strict-Transport-Security "max-age=31536000; includeSubDomains" always;

    ignore_invalid_headers off;
    client_max_body_size 0;
    # Disable buffering
    proxy_buffering off;
    proxy_request_buffering off;

    location / {
        proxy_set_header Host $http_host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        proxy_connect_timeout 300;
        proxy_http_version 1.1;
        proxy_set_header Connection "";
        chunked_transfer_encoding off;

        proxy_pass http://minio0.chbtl0.net.ipng.ch:9000;
    }
}
```

Finally, the Minio Console service, which runs on port 9090:

```
include /etc/nginx/conf.d/geo-ipng-trusted.inc;

server {
    listen [::]:443 ssl http2;
    listen 0.0.0.0:443 ssl http2;
    ssl_certificate /etc/certs/s3.ipng.ch/fullchain.pem;
    ssl_certificate_key /etc/certs/s3.ipng.ch/privkey.pem;
    include /etc/nginx/conf.d/options-ssl-nginx.inc;
    ssl_dhparam /etc/nginx/conf.d/ssl-dhparams.inc;

    server_name cons0-s3.chbtl0.ipng.ch;
    access_log /var/log/nginx/cons0-s3.chbtl0.ipng.ch-access.log upstream;
    include /etc/nginx/conf.d/ipng-headers.inc;

    add_header Strict-Transport-Security "max-age=31536000; includeSubDomains" always;

    ignore_invalid_headers off;
    client_max_body_size 0;
    # Disable buffering
    proxy_buffering off;
    proxy_request_buffering off;

    location / {
        if ($geo_ipng_trusted = 0) { rewrite ^ https://ipng.ch/ break; }
        proxy_set_header Host $http_host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_set_header X-NginX-Proxy true;

        real_ip_header X-Real-IP;
        proxy_connect_timeout 300;
        chunked_transfer_encoding off;

        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";

        proxy_pass http://minio0.chbtl0.net.ipng.ch:9090;
    }
}
```

This last one has an NGINX trick. It will only allow users in if they are in the map called
`geo_ipng_trusted`, which contains a set of IPv4 and IPv6 prefixes. Visitors who are not in this map
will receive an HTTP redirect back to the [[IPng.ch](https://ipng.ch/)] homepage instead.

I run the Ansible playbook with the NGINX changes against all frontends, but of course nothing
runs yet, because I haven't yet started the MinIO backends.

### MinIO Backends

The first thing I need to do is get those disks mounted. MinIO likes using XFS, so I'll install that
and prepare the disks as follows:

```
pim@minio0-chbtl0:~$ sudo apt install xfsprogs
pim@minio0-chbtl0:~$ sudo modprobe xfs
pim@minio0-chbtl0:~$ echo xfs | sudo tee -a /etc/modules
pim@minio0-chbtl0:~$ sudo update-initramfs -k all -u
pim@minio0-chbtl0:~$ for i in a b c d e f g h i j k l; do sudo mkfs.xfs /dev/sd$i; done
pim@minio0-chbtl0:~$ blkid | awk 'BEGIN {i=1} /TYPE="xfs"/ {
    printf "%s /minio/disk%d xfs defaults 0 2\n",$2,i; i++;
  }' | sudo tee -a /etc/fstab
pim@minio0-chbtl0:~$ for i in `seq 1 12`; do sudo mkdir -p /minio/disk$i; done
pim@minio0-chbtl0:~$ sudo mount -t xfs -a
pim@minio0-chbtl0:~$ sudo chown -R minio-user: /minio/
```

From the top: I'll install `xfsprogs`, which contains the things I need to manipulate XFS filesystems
in Debian. Then I'll install the `xfs` kernel module, and make sure it gets inserted upon subsequent
startup by adding it to `/etc/modules` and regenerating the initrd for the installed kernels.

Next, I'll format all twelve 16TB disks (which are `/dev/sda` - `/dev/sdl` on these machines), and
add their resulting blockdevice IDs to `/etc/fstab` so they get persistently mounted on reboot.
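
Each line the awk snippet appends ends up looking something like this (the UUID here is made up):

```
UUID="2e5a3f84-90bd-4f9a-ae3e-3c1f0d9a1c11" /minio/disk1 xfs defaults 0 2
```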

Finally, I'll create their mountpoints, mount all XFS filesystems, and chown them to the user that
MinIO is running as. End result:

```
pim@minio0-chbtl0:~$ df -T
Filesystem     Type       1K-blocks      Used   Available Use% Mounted on
udev           devtmpfs    32950856         0    32950856   0% /dev
tmpfs          tmpfs        6595340      1508     6593832   1% /run
/dev/md0       ext4       114695308   5423976   103398948   5% /
tmpfs          tmpfs       32976680         0    32976680   0% /dev/shm
tmpfs          tmpfs           5120         4        5116   1% /run/lock
/dev/sda       xfs      15623792640 121505936 15502286704   1% /minio/disk1
/dev/sde       xfs      15623792640 121505968 15502286672   1% /minio/disk12
/dev/sdi       xfs      15623792640 121505968 15502286672   1% /minio/disk11
/dev/sdl       xfs      15623792640 121505904 15502286736   1% /minio/disk10
/dev/sdd       xfs      15623792640 121505936 15502286704   1% /minio/disk4
/dev/sdb       xfs      15623792640 121505968 15502286672   1% /minio/disk3
/dev/sdk       xfs      15623792640 121505936 15502286704   1% /minio/disk5
/dev/sdc       xfs      15623792640 121505936 15502286704   1% /minio/disk9
/dev/sdf       xfs      15623792640 121506000 15502286640   1% /minio/disk2
/dev/sdj       xfs      15623792640 121505968 15502286672   1% /minio/disk7
/dev/sdg       xfs      15623792640 121506000 15502286640   1% /minio/disk8
/dev/sdh       xfs      15623792640 121505968 15502286672   1% /minio/disk6
tmpfs          tmpfs        6595336         0     6595336   0% /run/user/0
```

MinIO likes to be configured using environment variables - and this is likely because it's a popular
thing to run in a containerized environment like Kubernetes. The maintainers also ship it as a
Debian package, which will read its environment from `/etc/default/minio`, and I'll prepare that
file as follows:

```
pim@minio0-chbtl0:~$ cat << EOF | sudo tee /etc/default/minio
MINIO_DOMAIN="s3.chbtl0.ipng.ch,minio0.chbtl0.net.ipng.ch"
MINIO_ROOT_USER="XXX"
MINIO_ROOT_PASSWORD="YYY"
MINIO_VOLUMES="/minio/disk{1...12}"
MINIO_OPTS="--console-address :9001"
EOF
pim@minio0-chbtl0:~$ sudo systemctl enable --now minio
pim@minio0-chbtl0:~$ sudo journalctl -u minio
May 31 10:44:11 minio0-chbtl0 minio[690420]: MinIO Object Storage Server
May 31 10:44:11 minio0-chbtl0 minio[690420]: Copyright: 2015-2025 MinIO, Inc.
May 31 10:44:11 minio0-chbtl0 minio[690420]: License: GNU AGPLv3 - https://www.gnu.org/licenses/agpl-3.0.html
May 31 10:44:11 minio0-chbtl0 minio[690420]: Version: RELEASE.2025-05-24T17-08-30Z (go1.24.3 linux/amd64)
May 31 10:44:11 minio0-chbtl0 minio[690420]: API: http://198.19.4.11:9000 http://127.0.0.1:9000
May 31 10:44:11 minio0-chbtl0 minio[690420]: WebUI: https://cons0-s3.chbtl0.ipng.ch/
May 31 10:44:11 minio0-chbtl0 minio[690420]: Docs: https://docs.min.io

pim@minio0-chbtl0:~$ sudo ipmitool sensor | grep Watts
Pwr Consumption | 154.000 | Watts
```

Incidentally, I am pretty pleased with this 192TB disk tank, sporting 24 cores, 64GB memory and
2x10G network, casually hanging out at 154 Watts of power all up. Slick!
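
With the backend up and the NGINX frontends already in place, a quick probe from anywhere on the
network confirms the whole chain works; MinIO exposes an unauthenticated health endpoint for
exactly this purpose:

```
# Expect an HTTP 200 response when the node is up and serving.
pim@summer:~$ curl -sI https://s3.chbtl0.ipng.ch/minio/health/live
```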

{{< image float="right" src="/assets/minio/minio-ec.svg" alt="MinIO Erasure Coding" width="22em" >}}

MinIO implements _erasure coding_ as a core component in providing availability and resiliency
during drive or node-level failure events. MinIO partitions each object into data and parity shards
and distributes those shards across a single so-called _erasure set_. Under the hood, it uses a
[[Reed-Solomon](https://en.wikipedia.org/wiki/Reed%E2%80%93Solomon_error_correction)] erasure coding
implementation and partitions the object for distribution. From the MinIO website, I'll borrow a
diagram to show what it looks like on a single node like mine, to the right.

Anyway, MinIO detects 12 disks and installs an erasure set with 8 data disks and 4 parity disks,
which it calls `EC:4` encoding, also known in the industry as `RS8.4`.
Just like that, the thing shoots to life. Awesome!
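
A quick back-of-the-envelope on what EC:4 means for capacity: every object is split into 8 data and
4 parity shards, so a third of the raw space goes to parity, and any 4 of the 12 drives can fail
without data loss:

```
pim@summer:~$ echo "$(( 12 * 16 * 8 / 12 )) TB usable out of $(( 12 * 16 )) TB raw"
128 TB usable out of 192 TB raw
pim@summer:~$ echo 'scale=1; 128 * 10^12 / 2^40' | bc   # convert TB to TiB
116.4
```

That 116 TiB lines up nicely with the pool total that `mc admin info` reports below.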

### MinIO Client

On Summer, I'll install the MinIO Client called `mc`. This is easy because the maintainers ship a
Linux binary which I can just download. On OpenBSD, they don't do that. Not a problem though: on
Squanchy, Pencilvester and Glootie, I will just `go install` the client.
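
For the OpenBSD machines, that amounts to something like the following, assuming a working Go
toolchain (the module path is the upstream one):

```
pim@squanchy:~$ go install github.com/minio/mc@latest
pim@squanchy:~$ export PATH=$PATH:$(go env GOPATH)/bin
```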

Using the `mc` commandline, I can call any of the S3 APIs on my new MinIO instance:

```
pim@summer:~$ set +o history
pim@summer:~$ mc alias set chbtl0 https://s3.chbtl0.ipng.ch/ <rootuser> <rootpass>
pim@summer:~$ set -o history
pim@summer:~$ mc admin info chbtl0/
●  s3.chbtl0.ipng.ch
   Uptime: 22 hours
   Version: 2025-05-24T17:08:30Z
   Network: 1/1 OK
   Drives: 12/12 OK
   Pool: 1

┌──────┬───────────────────────┬─────────────────────┬──────────────┐
│ Pool │ Drives Usage          │ Erasure stripe size │ Erasure sets │
│ 1st  │ 0.8% (total: 116 TiB) │ 12                  │ 1            │
└──────┴───────────────────────┴─────────────────────┴──────────────┘

95 GiB Used, 5 Buckets, 5,859 Objects, 318 Versions, 1 Delete Marker
12 drives online, 0 drives offline, EC:4
```

Cool beans. I think I should get rid of this root account though; I've installed those credentials
into the `/etc/default/minio` environment file, but I don't want to keep them out in the open. So
I'll make an account for myself and assign me reasonable privileges, called `consoleAdmin` in the
default install:

```
pim@summer:~$ set +o history
pim@summer:~$ mc admin user add chbtl0/ <someuser> <somepass>
pim@summer:~$ mc admin policy info chbtl0 consoleAdmin
pim@summer:~$ mc admin policy attach chbtl0 consoleAdmin --user=<someuser>
pim@summer:~$ mc alias set chbtl0 https://s3.chbtl0.ipng.ch/ <someuser> <somepass>
pim@summer:~$ set -o history
```

OK, I feel less gross now that I'm not operating as root on the MinIO deployment. Using my new
user-powers, let me set some metadata on my new minio server:

```
pim@summer:~$ mc admin config set chbtl0/ site name=chbtl0 region=switzerland
Successfully applied new settings.
Please restart your server 'mc admin service restart chbtl0/'.
pim@summer:~$ mc admin service restart chbtl0/
Service status: ▰▰▱ [DONE]
Summary:
┌───────────────┬─────────────────────────────┐
│ Servers:      │ 1 online, 0 offline, 0 hung │
│ Restart Time: │ 61.322886ms                 │
└───────────────┴─────────────────────────────┘
pim@summer:~$ mc admin config get chbtl0/ site
site name=chbtl0 region=switzerland
```

By the way, what's really cool about these open standards is that the Amazon `aws` client works
with MinIO, and `mc` also works with AWS!
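
For example, pointing the stock `aws` CLI at this deployment is just a matter of overriding the
endpoint (credentials go in via `aws configure`; the bucket name is hypothetical):

```
pim@summer:~$ aws --endpoint-url https://s3.chbtl0.ipng.ch s3 ls
pim@summer:~$ aws --endpoint-url https://s3.chbtl0.ipng.ch s3 cp README.md s3://my-bucket/
```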
|
||||
### MinIO Console
|
||||
|
||||
Although I'm pretty good with APIs and command line tools, there's some benefit also in using a
|
||||
Graphical User Interface. MinIO ships with one, but there was a bit of a kerfuffle in the MinIO
|
||||
community. Unfortunately, these are pretty common -- Redis (an open source key/value storage system)
|
||||
changed their offering abruptly. Terraform (an open source infrastructure-as-code tool) changed
|
||||
their licensing at some point. Ansible (an open source machine management tool) changed their
|
||||
offering also. MinIO developers decided to strip their console of ~all features recently. The gnarly
|
||||
bits are discussed on
|
||||
[[reddit](https://www.reddit.com/r/selfhosted/comments/1kva3pw/avoid_minio_developers_introduce_trojan_horse/)].
|
||||
but suffice to say: the same thing that happened in literally 100% of the other cases, also happened
|
||||
here. Somebody decided to simply fork the code from before it was changed.
|
||||
|
||||
Enter OpenMaxIO. A cringe worthy name, but it gets the job done. Reading up on the
|
||||
[[GitHub](https://github.com/OpenMaxIO/openmaxio-object-browser/issues/5)], reviving the fully
|
||||
working console is pretty straight forward -- that is, once somebody spent a few days figuring it
|
||||
out. Thank you `icesvz` for this excellent pointer. With this, I can create a systemd service for
|
||||
the console and start it:
|
||||
|
||||
```
pim@minio0-chbtl0:~$ cat << EOF | sudo tee -a /etc/default/minio
## NOTE(pim): For openmaxio console service
CONSOLE_MINIO_SERVER="http://localhost:9000"
MINIO_BROWSER_REDIRECT_URL="https://cons0-s3.chbtl0.ipng.ch/"
EOF
pim@minio0-chbtl0:~$ cat << EOF | sudo tee /lib/systemd/system/minio-console.service
[Unit]
Description=OpenMaxIO Console Service
Wants=network-online.target
After=network-online.target
AssertFileIsExecutable=/usr/local/bin/minio-console

[Service]
Type=simple

WorkingDirectory=/usr/local

User=minio-user
Group=minio-user
ProtectProc=invisible

EnvironmentFile=-/etc/default/minio
ExecStart=/usr/local/bin/minio-console server
Restart=always
LimitNOFILE=1048576
MemoryAccounting=no
TasksMax=infinity
TimeoutSec=infinity
OOMScoreAdjust=-1000
SendSIGKILL=no

[Install]
WantedBy=multi-user.target
EOF
pim@minio0-chbtl0:~$ sudo systemctl enable --now minio-console
pim@minio0-chbtl0:~$ sudo systemctl restart minio
```

The first snippet is an update to the MinIO configuration that instructs it to redirect users who
are not trying to use the API to the console endpoint on `cons0-s3.chbtl0.ipng.ch`. The second
gives the console server the location of the API, which from its vantage point is running on
`localhost:9000`. Hello, beautiful fully featured console:

{{< image src="/assets/minio/console-1.png" alt="MinIO Console" >}}

### MinIO Prometheus

MinIO ships with a Prometheus metrics endpoint, and I notice on its console that it has a nice
metrics tab, which is fully greyed out. This is most likely because, well, I don't have a
Prometheus install here yet. I decide to keep the storage nodes self-contained and start a
Prometheus server on the local machine. I can always plumb that into IPng's Grafana instance later.

For now, I'll install Prometheus as follows:

```
pim@minio0-chbtl0:~$ cat << EOF | sudo tee -a /etc/default/minio
## NOTE(pim): Metrics for minio-console
MINIO_PROMETHEUS_AUTH_TYPE="public"
CONSOLE_PROMETHEUS_URL="http://localhost:19090/"
CONSOLE_PROMETHEUS_JOB_ID="minio-job"
EOF

pim@minio0-chbtl0:~$ sudo apt install prometheus
pim@minio0-chbtl0:~$ cat << EOF | sudo tee /etc/default/prometheus
ARGS="--web.listen-address='[::]:19090' --storage.tsdb.retention.size=16GB"
EOF
pim@minio0-chbtl0:~$ cat << EOF | sudo tee /etc/prometheus/prometheus.yml
global:
  scrape_interval: 60s

scrape_configs:
  - job_name: minio-job
    metrics_path: /minio/v2/metrics/cluster
    static_configs:
      - targets: ['localhost:9000']
        labels:
          cluster: minio0-chbtl0

  - job_name: minio-job-node
    metrics_path: /minio/v2/metrics/node
    static_configs:
      - targets: ['localhost:9000']
        labels:
          cluster: minio0-chbtl0

  - job_name: minio-job-bucket
    metrics_path: /minio/v2/metrics/bucket
    static_configs:
      - targets: ['localhost:9000']
        labels:
          cluster: minio0-chbtl0

  - job_name: minio-job-resource
    metrics_path: /minio/v2/metrics/resource
    static_configs:
      - targets: ['localhost:9000']
        labels:
          cluster: minio0-chbtl0

  - job_name: node
    static_configs:
      - targets: ['localhost:9100']
        labels:
          cluster: minio0-chbtl0
EOF
pim@minio0-chbtl0:~$ sudo systemctl restart minio prometheus
```

In the first snippet, I'll tell MinIO where it should find its Prometheus instance. Since the MinIO
console service is running on port 9090, and this is also the default port for Prometheus, I will
run Prometheus on port 19090 instead. From reading the MinIO docs, I can see that normally MinIO
will want Prometheus to authenticate to it before it'll allow the endpoints to be scraped. I'll
turn that off by making these public. On the IPng Frontends, I can always remove access to
`/minio/v2` and simply use the IPng Site Local access for local Prometheus scrapers instead.

After telling Prometheus its runtime arguments (in `/etc/default/prometheus`) and its scraping
endpoints (in `/etc/prometheus/prometheus.yml`), I can restart minio and prometheus. A few minutes
later, I can see the _Metrics_ tab in the console come to life.

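To convince myself that the scrape endpoints actually answer, I can pull them by hand (a quick
sketch; the metrics paths are the ones configured above, and `/api/v1/targets` is the standard
Prometheus HTTP API):

```
pim@minio0-chbtl0:~$ curl -s http://localhost:9000/minio/v2/metrics/cluster | head -3
pim@minio0-chbtl0:~$ curl -s http://localhost:19090/api/v1/targets | head -3
```
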
But now that I have this Prometheus running on the MinIO node, I can also add it to IPng's Grafana
configuration, by adding a new data source on `minio0.chbtl0.net.ipng.ch:19090` and pointing the
default Grafana [[Dashboard](https://grafana.com/grafana/dashboards/13502-minio-dashboard/)] at it:

{{< image src="/assets/minio/console-2.png" alt="Grafana Dashboard" >}}

A two-for-one: I will be able to see metrics directly in the console, and I will also be able to
hook these per-node Prometheus instances into IPng's Alertmanager later; I've read some
[[docs](https://min.io/docs/minio/linux/operations/monitoring/collect-minio-metrics-using-prometheus.html)]
on the concepts. I'm really liking the experience so far!

### MinIO Nagios

Prometheus is fancy and all, but at IPng Networks, I've been doing monitoring for a while now. As a
dinosaur, I still have an active [[Nagios](https://www.nagios.org/)] install, which autogenerates
all of its configuration using the Ansible repository I have. So for the new Ansible group called
`minio`, I will autogenerate the following snippet:

```
define command {
  command_name ipng_check_minio
  command_line $USER1$/check_http -E -H $HOSTALIAS$ -I $ARG1$ -p $ARG2$ -u $ARG3$ -r '$ARG4$'
}

define service {
  hostgroup_name        ipng:minio:ipv6
  service_description   minio6:api
  check_command         ipng_check_minio!$_HOSTADDRESS6$!9000!/minio/health/cluster!
  use                   ipng-service-fast
  notification_interval 0 ; set > 0 if you want to be renotified
}

define service {
  hostgroup_name        ipng:minio:ipv6
  service_description   minio6:prom
  check_command         ipng_check_minio!$_HOSTADDRESS6$!19090!/classic/targets!minio-job
  use                   ipng-service-fast
  notification_interval 0 ; set > 0 if you want to be renotified
}

define service {
  hostgroup_name        ipng:minio:ipv6
  service_description   minio6:console
  check_command         ipng_check_minio!$_HOSTADDRESS6$!9090!/!MinIO Console
  use                   ipng-service-fast
  notification_interval 0 ; set > 0 if you want to be renotified
}
```

I've shown the snippet for IPv6, but I also have three services defined for legacy IP in the
hostgroup `ipng:minio:ipv4`. The check command here uses `-I` for the IPv4 or IPv6 address to
talk to, `-p` for the port to connect to, `-u` for the URI to hit, and an optional `-r` for a
regular expression to expect in the output. For the Nagios aficionados out there: my Ansible
`groups` correspond one to one with autogenerated Nagios `hostgroups`. This allows me to add
arbitrary checks by group-type, like above in the `ipng:minio` group for IPv4 and IPv6.

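For illustration, here's roughly what the `minio6:api` check expands to once Nagios fills in the
macros (the plugin path and the IPv6 address are made-up examples):

```
/usr/lib/nagios/plugins/check_http -E -H minio0-chbtl0 -I 2001:db8::10 -p 9000 \
    -u /minio/health/cluster -r ''
```
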
In the MinIO [[docs](https://min.io/docs/minio/linux/operations/monitoring/healthcheck-probe.html)]
I read up on the Healthcheck API. I choose to monitor the _Cluster Write Quorum_ on my MinIO
deployments. For Prometheus, I decide to hit the `targets` endpoint and expect the `minio-job` to
be among them. Finally, for the MinIO Console, I expect to see a login screen with the words
`MinIO Console` in the returned page. I guessed right, because Nagios is all green:

{{< image src="/assets/minio/nagios.png" alt="Nagios Dashboard" >}}

## My First Bucket

The IPng website is a statically generated Hugo site, and whenever I submit a change to my Git
repo, a CI/CD runner (called [[Drone](https://www.drone.io/)]) picks up the change. It re-builds
the static website, and copies it to four redundant NGINX servers.

But IPng's website has amassed quite a bit of extra files (like VM images and VPP packages that I
publish), which are copied separately using a simple push script I have in my home directory. This
keeps all those big media files from cluttering the Git repository. I decide to move this stuff
into S3:

```
pim@summer:~/src/ipng-web-assets$ echo 'Gruezi World.' > ipng.ch/media/README.md
pim@summer:~/src/ipng-web-assets$ mc mb chbtl0/ipng-web-assets
pim@summer:~/src/ipng-web-assets$ mc mirror . chbtl0/ipng-web-assets/
...ch/media/README.md: 6.50 GiB / 6.50 GiB ┃▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓┃ 236.38 MiB/s 28s
pim@summer:~/src/ipng-web-assets$ mc anonymous set download chbtl0/ipng-web-assets/
```

OK, two things immediately jump out at me. The first is that this stuff is **fast**: Summer is
connected with a 2.5GbE network card, and she's running hard, copying the 6.5GB of data in these
web assets essentially at line rate. It doesn't really surprise me, because Summer is running off
of Gen4 NVMe, while MinIO has 12 spinning disks which can each write about 160MB/s or so sustained
[[ref](https://www.seagate.com/www-content/datasheets/pdfs/exos-x16-DS2011-1-1904US-en_US.pdf)],
with 24 CPUs to tend to the NIC (2x10G) and disks (2x SSD, 12x LFF). Should be plenty!

The second is that MinIO allows buckets to be publicly shared in three ways: 1) read-only by
setting `download`; 2) write-only by setting `upload`; and 3) read-write by setting `public`.
I set `download` here, which means I should be able to fetch an asset publicly now:

```
pim@summer:~$ curl https://s3.chbtl0.ipng.ch/ipng-web-assets/ipng.ch/media/README.md
Gruezi World.
pim@summer:~$ curl https://ipng-web-assets.s3.chbtl0.ipng.ch/ipng.ch/media/README.md
Gruezi World.
```

The first `curl` here shows path-based access, while the second one shows the equivalent
virtual-host based access. Both retrieve the file I just pushed via the public Internet. Whoot!

# What's Next

I'm going to be moving [[Restic](https://restic.net/)] backups from IPng's ZFS storage pool to this
S3 service over the next few days. I'll also migrate PeerTube and possibly Mastodon from NVMe-based
storage to replicated S3 buckets as well. Finally, the IPng website media that I mentioned above
should make for a nice followup article. Stay tuned!

content/articles/2025-06-01-minio-2.md
@ -0,0 +1,475 @@

---
date: "2025-06-01T10:07:23Z"
title: 'Case Study: Minio S3 - Part 2'
---

{{< image float="right" src="/assets/minio/minio-logo.png" alt="MinIO Logo" width="6em" >}}

# Introduction

Amazon Simple Storage Service (Amazon S3) is an object storage service offering industry-leading
scalability, data availability, security, and performance. Millions of customers of all sizes and
industries store, manage, analyze, and protect any amount of data for virtually any use case, such
as data lakes, cloud-native applications, and mobile apps. With cost-effective storage classes and
easy-to-use management features, you can optimize costs, organize and analyze data, and configure
fine-tuned access controls to meet specific business and compliance requirements.

Amazon's S3 became the _de facto_ standard object storage system, and there exist several fully
open source implementations of the protocol. One of them is MinIO: designed to allow enterprises
to consolidate all of their data on a single, private cloud namespace. Architected using the same
principles as the hyperscalers, AIStor delivers performance at scale at a fraction of the cost
compared to the public cloud.

IPng Networks is an Internet Service Provider, but I also dabble in self-hosting things, for
example [[PeerTube](https://video.ipng.ch/)], [[Mastodon](https://ublog.tech/)],
[[Immich](https://photos.ipng.ch/)], [[Pixelfed](https://pix.ublog.tech/)] and of course
[[Hugo](https://ipng.ch/)]. These services all have one thing in common: they tend to use lots of
storage when they grow. At IPng Networks, all hypervisors ship with enterprise SAS flash drives,
mostly 1.92TB and 3.84TB. Scaling up each of these services, and backing them up safely, can be
quite the headache.

In a [[previous article]({{< ref 2025-05-28-minio-1 >}})], I talked through the install of a
redundant set of three MinIO machines. In this article, I'll start putting them to good use.

## Use Case: Restic

{{< image float="right" src="/assets/minio/restic-logo.png" alt="Restic Logo" width="12em" >}}

[[Restic](https://restic.org/)] is a modern backup program that can back up your files from
multiple host OSes, to many different storage types, easily, effectively, securely, verifiably and
freely. With a sales pitch like that, what's not to love? Actually, I am a long-time
[[BorgBackup](https://www.borgbackup.org/)] user, and I think I'll keep that running. However, for
resilience, and because I've heard only good things about Restic, I'll make a second backup of the
routers, hypervisors, and virtual machines using Restic.

Restic can use S3 buckets out of the box (incidentally, so can BorgBackup). To configure it, I use
a mixture of environment variables and flags. But first, let me create a bucket for the backups.

```
pim@glootie:~$ mc mb chbtl0/ipng-restic
pim@glootie:~$ mc admin user add chbtl0/ <key> <secret>
pim@glootie:~$ cat << EOF | tee ipng-restic-access.json
{
  "PolicyName": "ipng-restic-access",
  "Policy": {
    "Version": "2012-10-17",
    "Statement": [
      {
        "Effect": "Allow",
        "Action": [ "s3:DeleteObject", "s3:GetObject", "s3:ListBucket", "s3:PutObject" ],
        "Resource": [ "arn:aws:s3:::ipng-restic", "arn:aws:s3:::ipng-restic/*" ]
      }
    ]
  }
}
EOF
pim@glootie:~$ mc admin policy create chbtl0/ ipng-restic-access ipng-restic-access.json
pim@glootie:~$ mc admin policy attach chbtl0/ ipng-restic-access --user <key>
```

First, I'll create a bucket called `ipng-restic`. Then, I'll create a _user_ with a given secret
_key_. To protect the innocent, and my backups, I'll not disclose them. Next, I'll create an
IAM policy that allows for Get/List/Put/Delete to be performed on the bucket and its contents, and
finally I'll attach this policy to the user I just created.

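To double-check that the policy actually stuck, I can ask MinIO about the user (a quick sketch,
with `<key>` being the access key from above):

```
pim@glootie:~$ mc admin user info chbtl0/ <key>
```
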
To run a Restic backup, I'll first have to create a so-called _repository_. The repository has a
location and a password, which Restic uses to encrypt the data. Because I'm using S3, I'll also
need to specify the key and secret:

```
root@glootie:~# RESTIC_PASSWORD="changeme"
root@glootie:~# RESTIC_REPOSITORY="s3:https://s3.chbtl0.ipng.ch/ipng-restic/$(hostname)/"
root@glootie:~# AWS_ACCESS_KEY_ID="<key>"
root@glootie:~# AWS_SECRET_ACCESS_KEY="<secret>"
root@glootie:~# export RESTIC_PASSWORD RESTIC_REPOSITORY AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY
root@glootie:~# restic init
created restic repository 807cf25e85 at s3:https://s3.chbtl0.ipng.ch/ipng-restic/glootie.ipng.ch/
```

Restic prints out the identifier of the repository it just created. Taking a look at the MinIO
install:

```
pim@glootie:~$ mc stat chbtl0/ipng-restic/glootie.ipng.ch/
Name      : config
Date      : 2025-06-01 12:01:43 UTC
Size      : 155 B
ETag      : 661a43f72c43080649712e45da14da3a
Type      : file
Metadata  :
  Content-Type: application/octet-stream

Name      : keys/
Date      : 2025-06-01 12:03:33 UTC
Type      : folder
```

Cool. Now I'm ready to make my first full backup:

```
root@glootie:~# ARGS="--exclude /proc --exclude /sys --exclude /dev --exclude /run"
root@glootie:~# ARGS="$ARGS --exclude-if-present .nobackup"
root@glootie:~# restic backup $ARGS /
...
processed 1141426 files, 131.111 GiB in 15:12
snapshot 34476c74 saved
```

Once the backup completes, the Restic authors advise me to also run a check of the repository, and
to prune it so that it keeps a finite number of daily, weekly and monthly backups. My further
journey for Restic looks a bit like this:

```
root@glootie:~# restic check
using temporary cache in /tmp/restic-check-cache-2712250731
create exclusive lock for repository
load indexes
check all packs
check snapshots, trees and blobs
[0:04] 100.00%  1 / 1 snapshots

no errors were found

root@glootie:~# restic forget --prune --keep-daily 8 --keep-weekly 5 --keep-monthly 6
repository 34476c74 opened (version 2, compression level auto)
Applying Policy: keep 8 daily, 5 weekly, 6 monthly snapshots
keep 1 snapshots:
ID        Time                 Host             Tags        Reasons          Paths
---------------------------------------------------------------------------------
34476c74  2025-06-01 12:18:54  glootie.ipng.ch              daily snapshot   /
                                                            weekly snapshot
                                                            monthly snapshot
----------------------------------------------------------------------------------
1 snapshots
```

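Wrapped into a nightly cronjob, the whole journey condenses nicely. This is a minimal sketch (not
the actual job described below), and it assumes the four environment variables from earlier are
exported by the calling environment:

```
#!/usr/bin/env bash
# Hypothetical nightly wrapper: back up, verify, then apply retention.
restic backup --exclude /proc --exclude /sys --exclude /dev --exclude /run \
    --exclude-if-present .nobackup / \
  && restic check \
  && restic forget --prune --keep-daily 8 --keep-weekly 5 --keep-monthly 6
```
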
Right on! I proceed to update the Ansible configs at IPng to roll this out against the entire
fleet of 152 hosts at IPng Networks. I do this with a little tool called `bitcron`, which I wrote
for a previous company I worked at: [[BIT](https://bit.nl)] in the Netherlands. Bitcron allows me
to create relatively elegant cronjobs that can raise warnings, errors and fatal issues. If no
issues are found, an e-mail can be sent to a bitbucket address, but if warnings or errors are
found, a different _monitored_ address will be used. Bitcron is kind of cool, and I wrote it in
2001. Maybe I'll write about it, for old time's sake. I wonder if the folks at BIT still use it?

## Use Case: NGINX

{{< image float="right" src="/assets/minio/nginx-logo.png" alt="NGINX Logo" width="11em" >}}

OK, with the first use case out of the way, I turn my attention to a second - in my opinion more
interesting - use case. In the [[previous article]({{< ref 2025-05-28-minio-1 >}})], I created a
public bucket called `ipng-web-assets` in which I stored 6.50GB of website data belonging to the
IPng website, and some material I posted when I was on my
[[Sabbatical](https://sabbatical.ipng.nl/)] last year.

### MinIO: Bucket Replication

First things first: redundancy. These web assets are currently pushed to all four NGINX machines,
and statically served. If I were to replace them with a single S3 bucket, I would create a single
point of failure, and that's _no bueno_!

Off I go, creating a replicated bucket using two MinIO instances (`chbtl0` and `ddln0`):

```
pim@glootie:~$ mc mb ddln0/ipng-web-assets
pim@glootie:~$ mc anonymous set download ddln0/ipng-web-assets
pim@glootie:~$ mc admin user add ddln0/ <replkey> <replsecret>
pim@glootie:~$ cat << EOF | tee ipng-web-assets-access.json
{
  "PolicyName": "ipng-web-assets-access",
  "Policy": {
    "Version": "2012-10-17",
    "Statement": [
      {
        "Effect": "Allow",
        "Action": [ "s3:DeleteObject", "s3:GetObject", "s3:ListBucket", "s3:PutObject" ],
        "Resource": [ "arn:aws:s3:::ipng-web-assets", "arn:aws:s3:::ipng-web-assets/*" ]
      }
    ]
  }
}
EOF
pim@glootie:~$ mc admin policy create ddln0/ ipng-web-assets-access ipng-web-assets-access.json
pim@glootie:~$ mc admin policy attach ddln0/ ipng-web-assets-access --user <replkey>
pim@glootie:~$ mc replicate add chbtl0/ipng-web-assets \
    --remote-bucket https://<replkey>:<replsecret>@s3.ddln0.ipng.ch/ipng-web-assets
```

What happens next is pure magic. I've told `chbtl0` that I want it to replicate all existing and
future changes to that bucket to its neighbor `ddln0`. Only minutes later, I check the replication
status, just to see that it's _already done_:

```
pim@glootie:~$ mc replicate status chbtl0/ipng-web-assets
Replication status since 1 hour
s3.ddln0.ipng.ch
Replicated:    142 objects (6.5 GiB)
Queued:        ● 0 objects, 0 B (avg: 4 objects, 915 MiB; max: 0 objects, 0 B)
Workers:       0 (avg: 0; max: 0)
Transfer Rate: 15 kB/s (avg: 88 MB/s; max: 719 MB/s)
Latency:       3ms (avg: 3ms; max: 7ms)
Link:          ● online (total downtime: 0 milliseconds)
Errors:        0 in last 1 minute; 0 in last 1hr; 0 since uptime
Configured Max Bandwidth (Bps): 644 GB/s   Current Bandwidth (Bps): 975 B/s
pim@summer:~/src/ipng-web-assets$ mc ls ddln0/ipng-web-assets/
[2025-06-01 12:42:22 CEST]     0B ipng.ch/
[2025-06-01 12:42:22 CEST]     0B sabbatical.ipng.nl/
```

MinIO has pumped the data from bucket `ipng-web-assets` to the other machine at an average of
88MB/s, with a peak throughput of 719MB/s (probably for the larger VM images). And indeed, looking
at the remote machine, it is fully caught up within only a minute or so of the push, with a
completely fresh copy. Nice!

### MinIO: Missing directory index

I take a look at what I just built, at the following URL:

* [https://ipng-web-assets.s3.ddln0.ipng.ch/sabbatical.ipng.nl/media/vdo/IMG_0406_0.mp4](https://ipng-web-assets.s3.ddln0.ipng.ch/sabbatical.ipng.nl/media/vdo/IMG_0406_0.mp4)

That checks out, and I can see the mess that was my room when I first went on sabbatical. By the
way, I totally cleaned it up, see
[[here](https://sabbatical.ipng.nl/blog/2024/08/01/thursday-basement-done/)] for proof. I can't,
however, see the directory listing:

```
pim@glootie:~$ curl https://ipng-web-assets.s3.ddln0.ipng.ch/sabbatical.ipng.nl/media/vdo/
<?xml version="1.0" encoding="UTF-8"?>
<Error>
  <Code>NoSuchKey</Code>
  <Message>The specified key does not exist.</Message>
  <Key>sabbatical.ipng.nl/media/vdo/</Key>
  <BucketName>ipng-web-assets</BucketName>
  <Resource>/sabbatical.ipng.nl/media/vdo/</Resource>
  <RequestId>1844EC0CFEBF3C5F</RequestId>
  <HostId>dd9025bab4ad464b049177c95eb6ebf374d3b3fd1af9251148b658df7ac2e3e8</HostId>
</Error>
```

That's unfortunate, because some of the IPng articles link to a directory full of files, which I'd
like to be shown so that my readers can navigate through the directories. Surely I'm not the first
to encounter this? And sure enough, I'm not: I find a
[[ref](https://github.com/glowinthedark/index-html-generator)] by user `glowinthedark`, who wrote a
little Python script that generates `index.html` files for their Caddy file server. I'll take me
some of that Python, thank you!

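The idea boils down to something like this bare-bones shell sketch (hypothetical; the real
`genindex.py` from the link above handles styling, file sizes and more):

```
#!/usr/bin/env bash
# For every directory under the given root, write an index.html that
# links to each entry in that directory.
find "${1:-.}" -type d | while read -r D; do
  {
    echo "<html><body><h1>Index of ${D}</h1><ul>"
    for F in "${D}"/*; do
      [ -e "${F}" ] || continue
      B="$(basename "${F}")"
      [ "${B}" = "index.html" ] && continue
      echo "<li><a href=\"${B}\">${B}</a></li>"
    done
    echo "</ul></body></html>"
  } > "${D}/index.html"
done
```
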
With the following little script, my setup is complete:

```
pim@glootie:~/src/ipng-web-assets$ cat push.sh
#!/usr/bin/env bash

echo "Generating index.html files ..."
for D in */media; do
  echo "* Directory $D"
  ./genindex.py -r $D
done
echo "Done (genindex)"
echo ""

echo "Mirroring directory to S3 Bucket"
mc mirror --remove --overwrite . chbtl0/ipng-web-assets/
echo "Done (mc mirror)"
echo ""
pim@glootie:~/src/ipng-web-assets$ ./push.sh
```

Only a few seconds after I run `./push.sh`, the replication is complete and I have two identical
copies of my media:

1. [https://ipng-web-assets.s3.chbtl0.ipng.ch/ipng.ch/media/](https://ipng-web-assets.s3.chbtl0.ipng.ch/ipng.ch/media/index.html)
1. [https://ipng-web-assets.s3.ddln0.ipng.ch/ipng.ch/media/](https://ipng-web-assets.s3.ddln0.ipng.ch/ipng.ch/media/index.html)

### NGINX: Proxy to Minio

Before moving to S3 storage, my NGINX frontends all kept a copy of the IPng media on local NVMe
disk. That's great for reliability, as each NGINX instance is completely hermetic and standalone.
However, it's not great for scaling: the current NGINX instances only have 16GB of local storage,
and I'd rather not have my static web asset data outgrow that filesystem. From before, I already
had an NGINX config that served the Hugo static data from `/var/www/ipng.ch/` and the `/media`
subdirectory from a different directory in `/var/www/ipng-web-assets/ipng.ch/media`.

Moving to a redundant S3 storage backend is straightforward:

```
upstream minio_ipng {
  least_conn;
  server minio0.chbtl0.net.ipng.ch:9000;
  server minio0.ddln0.net.ipng.ch:9000;
}

server {
  ...
  location / {
    root /var/www/ipng.ch/;
  }

  location /media {
    proxy_set_header Host $http_host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;

    proxy_connect_timeout 300;
    proxy_http_version 1.1;
    proxy_set_header Connection "";
    chunked_transfer_encoding off;

    rewrite (.*)/$ $1/index.html;

    proxy_pass http://minio_ipng/ipng-web-assets/ipng.ch/media;
  }
}
```

I want to make note of a few things:
1. The `upstream` definition here uses IPng Site Local entrypoints, considering the NGINX servers
   all have direct MTU=9000 access to the MinIO instances. I'll put both in there, in a
   round-robin configuration favoring the replica with _least connections_.
1. Deeplinking to directory names without the trailing `/index.html` would serve a 404 from the
   backend, so I'll intercept these and rewrite directory URLs to always include `/index.html`.
1. The upstream endpoint used is _path-based_, that is to say, it has the bucket name and website
   name included. This whole location used to be simply
   `root /var/www/ipng-web-assets/ipng.ch/media/`, so the mental change is quite small; a quick
   check follows below.

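A fast way to convince myself that the path-based upstream does what I think it does is to hit it
directly from one of the NGINX machines (a sketch; the port and path follow the config above):

```
pim@nginx0-nlams1:~$ curl -s http://minio0.chbtl0.net.ipng.ch:9000/ipng-web-assets/ipng.ch/media/index.html | head -3
```
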
### NGINX: Caching

After deploying the S3 upstream on all IPng websites, I can delete the old
`/var/www/ipng-web-assets/` directory and reclaim about 7GB of diskspace. This gives me an idea ...

{{< image width="8em" float="left" src="/assets/shared/brain.png" alt="brain" >}}

On the one hand it's great that I will pull these assets from MinIO and all, but at the same time,
it's a tad inefficient to retrieve them from, say, Zurich to Amsterdam just to serve them onto the
internet again. If at any time something on the IPng website goes viral, it'd be nice to be able
to serve it directly from the edge, right?

A webcache. What could _possibly_ go wrong :)

NGINX is really, really good at caching content. It has a powerful engine to store, scan,
revalidate and match any content and upstream headers. It's also very well documented, so I take a
look at the proxy module's documentation
[[here](https://nginx.org/en/docs/http/ngx_http_proxy_module.html)] and in particular a useful
[[blog](https://blog.nginx.org/blog/nginx-caching-guide)] on their website.

The first thing I need to do is create what is called a _key zone_, which is a region of memory in
which URL keys are stored with some metadata. Having a copy of the keys in memory enables NGINX to
quickly determine if a request is a HIT or a MISS without having to go to disk, greatly speeding
up the check.

In `/etc/nginx/conf.d/ipng-cache.conf` I add the following NGINX cache:

```
proxy_cache_path /var/www/nginx-cache levels=1:2 keys_zone=ipng_cache:10m max_size=8g
                 inactive=24h use_temp_path=off;
```

With this statement, I'll create a 2-level subdirectory hierarchy, and allocate 10MB of space,
which should hold on the order of 100K entries. The maximum size I'll allow the cache to grow to
is 8GB, and I'll mark any object inactive if it's not been referenced for 24 hours. I learn that
inactive is different to expired content. If a cache element has expired, but NGINX can't reach
the upstream for a new copy, it can be configured to serve an inactive (stale) copy from the
cache. That's dope, as it serves as an extra layer of defence in case the network or all available
S3 replicas take the day off. I'll ask NGINX to avoid writing objects first to a tmp directory and
then moving them into the `/var/www/nginx-cache` directory. These are recommendations I grab from
the manual.

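After dropping this file in place, the usual NGINX workflow applies; the `du` here is just a way
to watch the cache directory fill up over time:

```
pim@nginx0-nlams1:~$ sudo nginx -t
pim@nginx0-nlams1:~$ sudo systemctl reload nginx
pim@nginx0-nlams1:~$ sudo du -sh /var/www/nginx-cache
```
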
Within the `location` block I configured above, I'm now ready to enable this cache. I'll do that
by adding a few include files, which I'll reference in all sites that I want to have make use of
this cache:

First, to enable the cache, I write the following snippet:
```
pim@nginx0-nlams1:~$ cat /etc/nginx/conf.d/ipng-cache.inc
proxy_cache ipng_cache;
proxy_ignore_headers Cache-Control;
proxy_cache_valid any 1h;
proxy_cache_revalidate on;
proxy_cache_use_stale error timeout updating http_500 http_502 http_503 http_504;
proxy_cache_background_update on;
```

Then, I find it useful to emit a few debugging HTTP headers, and at the same time I see that MinIO
emits a bunch of HTTP headers that may not be safe for me to propagate, so I pen two more snippets:

```
pim@nginx0-nlams1:~$ cat /etc/nginx/conf.d/ipng-strip-minio-headers.inc
proxy_hide_header x-minio-deployment-id;
proxy_hide_header x-amz-request-id;
proxy_hide_header x-amz-id-2;
proxy_hide_header x-amz-replication-status;
proxy_hide_header x-amz-version-id;

pim@nginx0-nlams1:~$ cat /etc/nginx/conf.d/ipng-add-upstream-headers.inc
add_header X-IPng-Frontend $hostname always;
add_header X-IPng-Upstream $upstream_addr always;
add_header X-IPng-Upstream-Status $upstream_status always;
add_header X-IPng-Cache-Status $upstream_cache_status;
```

With that, I am ready to enable caching of the IPng `/media` location:

```
location /media {
  ...
  include /etc/nginx/conf.d/ipng-strip-minio-headers.inc;
  include /etc/nginx/conf.d/ipng-add-upstream-headers.inc;
  include /etc/nginx/conf.d/ipng-cache.inc;
  ...
}
```

## Results

I run the Ansible playbook for the NGINX cluster and take a look at the replica at Coloclue in
Amsterdam, called `nginx0.nlams1.ipng.ch`. Notably, it'll have to retrieve the file from a MinIO
replica in Zurich (12ms away), so it's expected to take a little while.

The first attempt:

```
pim@nginx0-nlams1:~$ curl -v -o /dev/null --connect-to ipng.ch:443:localhost:443 \
    https://ipng.ch/media/vpp-proto/vpp-proto-bookworm.qcow2.lrz
...
< last-modified: Sun, 01 Jun 2025 12:37:52 GMT
< x-ipng-frontend: nginx0-nlams1
< x-ipng-cache-status: MISS
< x-ipng-upstream: [2001:678:d78:503::b]:9000
< x-ipng-upstream-status: 200

100  711M  100  711M    0     0  26.2M      0  0:00:27  0:00:27 --:--:-- 26.6M
```

OK, that's respectable, I've read the file at 26MB/s. Of course, I just turned on the cache, so
NGINX fetches the file from Zurich while handing it over to my `curl` here. It notifies me by
means of an HTTP header that the cache was a `MISS`, and tells me which upstream server it
contacted to retrieve the object.

But look at what happens the _second_ time I run the same command:

```
pim@nginx0-nlams1:~$ curl -v -o /dev/null --connect-to ipng.ch:443:localhost:443 \
    https://ipng.ch/media/vpp-proto/vpp-proto-bookworm.qcow2.lrz
< last-modified: Sun, 01 Jun 2025 12:37:52 GMT
< x-ipng-frontend: nginx0-nlams1
< x-ipng-cache-status: HIT

100  711M  100  711M    0     0   436M      0  0:00:01  0:00:01 --:--:--  437M
```

Holy moly! First I see the object has the same _Last-Modified_ header, but I now also see that the
_Cache-Status_ was a `HIT`, and there is no mention of any upstream server. I do, however, see the
file come in at a whopping 437MB/s, which is 16x faster than over the network!! Nice work, NGINX!

{{< image float="right" src="/assets/minio/rack-2.png" alt="Rack-o-Minio" width="12em" >}}

# What's Next

I'm going to deploy the third MinIO replica in Rümlang once the disks arrive. I'll release the
~4TB of disk currently used for Restic backups of the fleet, and put that ZFS capacity to other
use. Now, creating services like PeerTube, Mastodon, Pixelfed, Loops, NextCloud and what-have-you
will become much easier for me. And with the per-bucket replication between MinIO deployments, I
also think this is a great way to auto-backup important data. First off, it'll be RS8.4 on the
MinIO node itself, and secondly, user data will be copied automatically to a neighboring facility.

I've convinced myself that S3 storage is a great service to operate, and that MinIO is awesome.