---
date: 2025-06-01T10:07:23Z
title: "Case Study: Minio S3 - Part 2"
---

{{< image float="right" src="/assets/minio/minio-logo.png" alt="MinIO Logo" width="6em" >}}

## Introduction

Amazon Simple Storage Service (Amazon S3) is an object storage service offering industry-leading scalability, data availability, security, and performance. Millions of customers of all sizes and industries store, manage, analyze, and protect any amount of data for virtually any use case, such as data lakes, cloud-native applications, and mobile apps. With cost-effective storage classes and easy-to-use management features, you can optimize costs, organize and analyze data, and configure fine-tuned access controls to meet specific business and compliance requirements.

Amazon's S3 became the de facto standard object storage system, and there exist several fully open source implementations of the protocol. One of them is MinIO: designed to allow enterprises to consolidate all of their data on a single, private cloud namespace. Architected using the same principles as the hyperscalers, AIStor delivers performance at scale at a fraction of the cost compared to the public cloud.

IPng Networks is an Internet Service Provider, but I also dabble in self-hosting things, for example [PeerTube], [Mastodon], [Immich], [Pixelfed] and of course [Hugo]. These services all have one thing in common: they tend to use lots of storage when they grow. At IPng Networks, all hypervisors ship with enterprise SAS flash drives, mostly 1.92TB and 3.84TB. Scaling up each of these services, and backing them up safely, can be quite the headache.

In a [[previous article]({{< ref 2025-05-28-minio-1 >}})], I talked through the install of a redundant set of three Minio machines. In this article, I'll start putting them to good use.

## Use Case: Restic

{{< image float="right" src="/assets/minio/restic-logo.png" alt="Restic Logo" width="12em" >}}

[Restic] is a modern backup program that can back up your files from multiple host OS, to many different storage types, easily, effectively, securely, verifiably and freely. With a sales pitch like that, what's not to love? Actually, I am a long-time [BorgBackup] user, and I think I'll keep that running. However, for resilience, and because I've heard only good things about Restic, I'll make a second backup of the routers, hypervisors, and virtual machines using Restic.

Restic can use S3 buckets out of the box (incidentally, so can BorgBackup). To configure it, I use a mixture of environment variables and flags. But first, let me create a bucket for the backups.

pim@glootie:~$ mc mb chbtl0/ipng-restic
pim@glootie:~$ mc admin user add chbtl0/ <key> <secret>
pim@glootie:~$ cat << EOF | tee ipng-restic-access.json
{
 "PolicyName": "ipng-restic-access",
 "Policy": {
  "Version": "2012-10-17",
  "Statement": [
   {
    "Effect": "Allow",
    "Action": [ "s3:DeleteObject", "s3:GetObject", "s3:ListBucket", "s3:PutObject" ],
    "Resource": [ "arn:aws:s3:::ipng-restic", "arn:aws:s3:::ipng-restic/*" ]
   }
  ]
 }
}
EOF
pim@glootie:~$ mc admin policy create chbtl0/ ipng-restic-access ipng-restic-access.json
pim@glootie:~$ mc admin policy attach chbtl0/ ipng-restic-access --user <key>

First, I'll create a bucket called ipng-restic. Then, I'll create a user with a given secret key. To protect the innocent, and my backups, I'll not disclose them. Next, I'll create an IAM policy that allows for Get/List/Put/Delete to be performed on the bucket and its contents, and finally I'll attach this policy to the user I just created.
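Just to be sure the wiring is right, I can ask MinIO to show me the new policy and confirm that it is attached to the user. A quick sanity check, assuming a reasonably recent `mc`:

```bash
pim@glootie:~$ mc admin policy info chbtl0/ ipng-restic-access
pim@glootie:~$ mc admin user info chbtl0/ <key>
```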

To run a Restic backup, I'll first have to create a so-called repository. The repository has a location and a password, which Restic uses to encrypt the data. Because I'm using S3, I'll also need to specify the key and secret:

root@glootie:~# RESTIC_PASSWORD="changeme"
root@glootie:~# RESTIC_REPOSITORY="s3:https://s3.chbtl0.ipng.ch/ipng-restic/$(hostname)/"
root@glootie:~# AWS_ACCESS_KEY_ID="<key>"
root@glootie:~# AWS_SECRET_ACCESS_KEY="<secret>"
root@glootie:~# export RESTIC_PASSWORD RESTIC_REPOSITORY AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY
root@glootie:~# restic init
created restic repository 807cf25e85 at s3:https://s3.chbtl0.ipng.ch/ipng-restic/glootie.ipng.ch/

Restic prints out the fingerprint of the repository it just created. Taking a look at the MinIO install:

pim@glootie:~$ mc stat chbtl0/ipng-restic/glootie.ipng.ch/
Name      : config
Date      : 2025-06-01 12:01:43 UTC 
Size      : 155 B  
ETag      : 661a43f72c43080649712e45da14da3a 
Type      : file 
Metadata  :
  Content-Type: application/octet-stream 

Name      : keys/
Date      : 2025-06-01 12:03:33 UTC 
Type      : folder 

Cool. Now I'm ready to make my first full backup:

root@glootie:~# ARGS="--exclude /proc --exclude /sys --exclude /dev --exclude /run"
root@glootie:~# ARGS="$ARGS --exclude-if-present .nobackup"
root@glootie:~# restic backup $ARGS /
...
processed 1141426 files, 131.111 GiB in 15:12
snapshot 34476c74 saved

Once the backup completes, the Restic authors advise me to also do a check of the repository, and to prune it so that it keeps a finite amount of daily, weekly and monthly backups. My further journey for Restic looks a bit like this:

root@glootie:~# restic check
using temporary cache in /tmp/restic-check-cache-2712250731
create exclusive lock for repository
load indexes
check all packs
check snapshots, trees and blobs
[0:04] 100.00%  1 / 1 snapshots

no errors were found

root@glootie:~# restic forget --prune --keep-daily 8 --keep-weekly 5 --keep-monthly 6
repository 34476c74 opened (version 2, compression level auto)
Applying Policy: keep 8 daily, 5 weekly, 6 monthly snapshots
keep 1 snapshots:
ID        Time                 Host           Tags        Reasons           Paths
---------------------------------------------------------------------------------
34476c74  2025-06-01 12:18:54  glootie.ipng.ch            daily snapshot    /
                                                          weekly snapshot
                                                          monthly snapshot
----------------------------------------------------------------------------------
1 snapshots

Right on! I proceed to update the Ansible configs to roll this out against the entire fleet of 152 hosts at IPng Networks. I wrap the job in a little tool called bitcron, which I wrote for a previous employer: [BIT] in the Netherlands. Bitcron allows me to create relatively elegant cronjobs that can raise warnings, errors and fatal issues. If no issues are found, the resulting e-mail is sent to a bit-bucket (discard) address, but if warnings or errors are found, a different, monitored address is used. Bitcron is kind of cool, and I wrote it in 2001. Maybe I'll write about it, for old time's sake. I wonder if the folks at BIT still use it?
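For the curious, the nightly job that Ansible installs looks roughly like the sketch below. The bitcron wrapping is left out, and the /etc/restic/env path is an assumption for illustration, not the exact production layout:

```bash
#!/usr/bin/env bash
# Nightly Restic run -- a minimal sketch, not the exact production bitcron job.
# Assumes /etc/restic/env exports RESTIC_REPOSITORY, RESTIC_PASSWORD,
# AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY for this host.
set -euo pipefail
source /etc/restic/env

ARGS="--exclude /proc --exclude /sys --exclude /dev --exclude /run"
ARGS="$ARGS --exclude-if-present .nobackup"

restic backup $ARGS /                    # full filesystem backup
restic forget --prune --keep-daily 8 --keep-weekly 5 --keep-monthly 6
restic check                             # verify the repository afterwards
```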

## Use Case: NGINX

{{< image float="right" src="/assets/minio/nginx-logo.png" alt="NGINX Logo" width="11em" >}}

OK, with the first use case out of the way, I turn my attention to a second - in my opinion more interesting - use case. In the [[previous article]({{< ref 2025-05-28-minio-1 >}})], I created a public bucket called ipng-web-assets in which I stored 6.50GB of website data belonging to the IPng website, and some material I posted when I was on my [Sabbatical] last year.

### MinIO: Bucket Replication

First things first: redundancy. These web assets are currently pushed to all four nginx machines, and statically served. If I were to replace them with a single S3 bucket, I would create a single point of failure, and that's no bueno!

Off I go, creating a replicated bucket using two MinIO instances (chbtl0 and ddln0):

pim@glootie:~$ mc mb ddln0/ipng-web-assets
pim@glootie:~$ mc anonymous set download ddln0/ipng-web-assets
pim@glootie:~$ mc admin user add ddln0/ <replkey> <replsecret>
pim@glootie:~$ cat << EOF | tee ipng-web-assets-access.json
{
 "PolicyName": "ipng-web-assets-access",
 "Policy": {
  "Version": "2012-10-17",
  "Statement": [
   {
    "Effect": "Allow",
    "Action": [ "s3:DeleteObject", "s3:GetObject", "s3:ListBucket", "s3:PutObject" ],
    "Resource": [ "arn:aws:s3:::ipng-web-assets", "arn:aws:s3:::ipng-web-assets/*" ]
   }
  ]
  }
}
EOF
pim@glootie:~$ mc admin policy create ddln0/ ipng-web-assets-access ipng-web-assets-access.json
pim@glootie:~$ mc admin policy attach ddln0/ ipng-web-assets-access --user <replkey>
pim@glootie:~$ mc replicate add chbtl0/ipng-web-assets \
                   --remote-bucket https://<replkey>:<replsecret>@s3.ddln0.ipng.ch/ipng-web-assets

What happens next is pure magic. I've told chbtl0 that I want it to replicate all existing and future changes to that bucket to its neighbor ddln0. Only minutes later, I check the replication status, just to see that it's already done:

pim@glootie:~$ mc replicate status chbtl0/ipng-web-assets
  Replication status since 1 hour                                                                     
  s3.ddln0.ipng.ch
  Replicated:                   142 objects (6.5 GiB)                                             
  Queued:                       ● 0 objects, 0 B (avg: 4 objects, 915 MiB ; max: 0 objects, 0 B)  
  Workers:                      0 (avg: 0; max: 0)                                                
  Transfer Rate:                15 kB/s (avg: 88 MB/s; max: 719 MB/s)
  Latency:                      3ms (avg: 3ms; max: 7ms)                                          
  Link:                         ● online (total downtime: 0 milliseconds)                         
  Errors:                       0 in last 1 minute; 0 in last 1hr; 0 since uptime                 
  Configured Max Bandwidth (Bps): 644 GB/s   Current Bandwidth (Bps): 975 B/s                     
pim@summer:~/src/ipng-web-assets$ mc ls ddln0/ipng-web-assets/
[2025-06-01 12:42:22 CEST]     0B ipng.ch/
[2025-06-01 12:42:22 CEST]     0B sabbatical.ipng.nl/

MinIO has pumped the data from bucket ipng-web-assets to the other machine at an average of 88MB/s, with a peak throughput of 719MB/s (probably for the larger VM images). And indeed, looking at the `mc ls` output above, both website directories have already arrived on ddln0.

### MinIO: Missing directory index

I take a look at what I just built, by pulling up one of the sabbatical videos straight from the ddln0 bucket. That checks out, and I can see the mess that was my room when I first went on sabbatical. By the way, I totally cleaned it up, see [here] for proof. I can't, however, see the directory listing:

pim@glootie:~$ curl https://ipng-web-assets.s3.ddln0.ipng.ch/sabbatical.ipng.nl/media/vdo/
<?xml version="1.0" encoding="UTF-8"?>
<Error>
  <Code>NoSuchKey</Code>
  <Message>The specified key does not exist.</Message>
  <Key>sabbatical.ipng.nl/media/vdo/</Key>
  <BucketName>ipng-web-assets</BucketName>
  <Resource>/sabbatical.ipng.nl/media/vdo/</Resource>
  <RequestId>1844EC0CFEBF3C5F</RequestId>
  <HostId>dd9025bab4ad464b049177c95eb6ebf374d3b3fd1af9251148b658df7ac2e3e8</HostId>
</Error>

That's unfortunate, because some of the IPng articles link to a directory full of files, which I'd like to be shown so that my readers can navigate through the directories. Surely I'm not the first to encounter this? And sure enough, I'm not: I find a [ref] by user glowinthedark, who wrote a little Python script that generates index.html files for their Caddy file server. I'll take me some of that Python, thank you!

With the following little script, my setup is complete:

pim@glootie:~/src/ipng-web-assets$ cat push.sh 
#!/usr/bin/env bash

echo "Generating index.html files ..."
for D in */media; do
  echo "* Directory $D"
  ./genindex.py -r $D
done
echo "Done (genindex)"
echo ""

echo "Mirroring directoro to S3 Bucket"
mc mirror --remove --overwrite . chbtl0/ipng-web-assets/
echo "Done (mc mirror)"
echo ""
pim@glootie:~/src/ipng-web-assets$ ./push.sh 

Only a few seconds after I run ./push.sh, the replication is complete and I have two identical copies of my media:

  1. https://ipng-web-assets.s3.chbtl0.ipng.ch/ipng.ch/media/
  2. https://ipng-web-assets.s3.ddln0.ipng.ch/ipng.ch/media/
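As a quick spot check that the two copies really are identical, I can compare the object metadata both replicas return. The object name below is just an example; any file under media/ will do:

```bash
# Both replicas should return the same ETag, size and mtime for a given object.
for R in chbtl0 ddln0; do
  echo "== ${R}"
  curl -sI "https://ipng-web-assets.s3.${R}.ipng.ch/ipng.ch/media/index.html" \
    | grep -iE '^(etag|content-length|last-modified):'
done
```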

### NGINX: Proxy to Minio

Before moving to S3 storage, my NGINX frontends all kept a copy of the IPng media on local NVME disk. That's great for reliability, as each NGINX instance is completely hermetic and standalone. However, it's not great for scaling: the current NGINX instances only have 16GB of local storage, and I'd rather not have my static web asset data outgrow that filesystem. From before, I already had an NGINX config that served the Hugo static data from /var/www/ipng.ch/ and the /media subdirectory from a different directory, /var/www/ipng-web-assets/ipng.ch/media.
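That disk-based setup looked roughly like the snippet below; a from-memory sketch of the starting point rather than the exact config (the original used a root directive to the same effect):

```nginx
server {
  ...
  location / {
    root /var/www/ipng.ch/;
  }

  # Old approach: serve /media straight from a local copy on NVME.
  location /media {
    alias /var/www/ipng-web-assets/ipng.ch/media;
  }
}
```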

Moving to a redundant S3 storage backend is straightforward:

upstream minio_ipng {
  least_conn;
  server minio0.chbtl0.net.ipng.ch:9000;
  server minio0.ddln0.net.ipng.ch:9000;
}

server {
  ...
  location / {
    root /var/www/ipng.ch/;
  }

  location /media {
    proxy_set_header Host $http_host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;

    proxy_connect_timeout 300;
    proxy_http_version 1.1;
    proxy_set_header Connection "";
    chunked_transfer_encoding off;

    rewrite (.*)/$ $1/index.html;

    proxy_pass http://minio_ipng/ipng-web-assets/ipng.ch/media;
  }
}

I want to make note of a few things:

  1. The upstream definition here uses IPng Site Local entrypoints, considering the NGINX servers all have direct MTU=9000 access to the MinIO instances. I'll put both replicas in there, with least_conn steering each request to whichever replica has the fewest active connections.
  2. Deeplinking to directory names without the trailing /index.html would serve a 404 from the backend, so I'll intercept these and rewrite directory URLs to always append /index.html (a quick test of this follows below).
  3. The upstream endpoint is path-based, that is to say, it includes the bucket name and website name. This whole location used to be simply root /var/www/ipng-web-assets/ipng.ch/media/, so the mental change is quite small.
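To convince myself that the rewrite from item 2 behaves, a quick check against the frontend might look like this; the directory path is just an example of a media directory that now carries a generated index.html:

```bash
# A directory URL (trailing slash, no index.html) should now return a 200 from
# the bucket's generated index.html instead of S3's NoSuchKey error.
curl -sI https://ipng.ch/media/vpp-proto/ | grep -E '^HTTP|^x-ipng'
```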

### NGINX: Caching

After deploying the S3 upstream on all IPng websites, I can delete the old /var/www/ipng-web-assets/ directory and reclaim about 7GB of diskspace. This gives me an idea ...

{{< image width="8em" float="left" src="/assets/shared/brain.png" alt="brain" >}}

On the one hand it's great that I will pull these assets from Minio and all, but at the same time, it's a tad inefficient to retrieve them from, say, Zurich to Amsterdam just to serve them onto the internet again. If at any time something on the IPng website goes viral, it'd be nice to be able to serve them directly from the edge, right?

A webcache. What could possibly go wrong :)

NGINX is really really good at caching content. It has a powerful engine to store, scan, revalidate and match any content and upstream headers. It's also very well documented, so I take a look at the proxy module's documentation [here] and in particular a useful [blog] on their website.

The first thing I need to do is create what is called a key zone, which is a region of memory in which URL keys are stored with some metadata. Having a copy of the keys in memory enables NGINX to quickly determine if a request is a HIT or a MISS without having to go to disk, greatly speeding up the check.

In /etc/nginx/conf.d/ipng-cache.conf I add the following NGINX cache:

proxy_cache_path /var/www/nginx-cache levels=1:2 keys_zone=ipng_cache:10m max_size=8g
                 inactive=24h use_temp_path=off;

With this statement, I'll create a two-level subdirectory hierarchy, and allocate 10MB of space for the keys, which should hold on the order of 100K entries. The maximum size I'll allow the cache to grow to is 8GB, and I'll mark any object inactive if it hasn't been referenced for 24 hours. I learn that inactive is different from expired content: if a cache element has expired, but NGINX can't reach the upstream for a new copy, it can be configured to serve an inactive (stale) copy from the cache. That's dope, as it serves as an extra layer of defence in case the network or all available S3 replicas take the day off. I'll also ask NGINX to avoid writing objects to a temp directory first and then moving them into the /var/www/nginx-cache directory. These are recommendations I grabbed from the manual.

Within the location block I configured above, I'm now ready to enable this cache. I'll do that by adding a few include files, which I'll reference in all sites that I want to make use of this cache:

First, to enable the cache, I write the following snippet:

pim@nginx0-nlams1:~$ cat /etc/nginx/conf.d/ipng-cache.inc
proxy_cache ipng_cache;
proxy_ignore_headers Cache-Control;
proxy_cache_valid any 1h;
proxy_cache_revalidate on;
proxy_cache_use_stale error timeout updating http_500 http_502 http_503 http_504;
proxy_cache_background_update on;

Then, I find it useful to emit a few debugging HTTP headers, and at the same time I see that Minio emits a bunch of HTTP headers that may not be safe for me to propagate, so I pen two more snippets:

pim@nginx0-nlams1:~$ cat /etc/nginx/conf.d/ipng-strip-minio-headers.inc 
proxy_hide_header x-minio-deployment-id;
proxy_hide_header x-amz-request-id;
proxy_hide_header x-amz-id-2;
proxy_hide_header x-amz-replication-status;
proxy_hide_header x-amz-version-id;

pim@nginx0-nlams1:~$ cat /etc/nginx/conf.d/ipng-add-upstream-headers.inc 
add_header X-IPng-Frontend $hostname always;
add_header X-IPng-Upstream $upstream_addr always;
add_header X-IPng-Upstream-Status $upstream_status always;
add_header X-IPng-Cache-Status $upstream_cache_status;

With that, I am ready to enable caching of the IPng /media location:

  location /media {
    ...
    include /etc/nginx/conf.d/ipng-strip-minio-headers.inc;
    include /etc/nginx/conf.d/ipng-add-upstream-headers.inc;
    include /etc/nginx/conf.d/ipng-cache.inc;
    ...
  }

### Results

I run the Ansible playbook for the NGINX cluster and take a look at the replica at Coloclue in Amsterdam, called nginx0.nlams1.ipng.ch. Notably, it'll have to retrieve the file from a MinIO replica in Zurich (12ms away), so it's expected to take a little while.

The first attempt:

pim@nginx0-nlams1:~$ curl -v -o /dev/null --connect-to ipng.ch:443:localhost:443 \
                     https://ipng.ch/media/vpp-proto/vpp-proto-bookworm.qcow2.lrz
...
< last-modified: Sun, 01 Jun 2025 12:37:52 GMT
< x-ipng-frontend: nginx0-nlams1
< x-ipng-cache-status: MISS
< x-ipng-upstream: [2001:678:d78:503::b]:9000
< x-ipng-upstream-status: 200

100  711M  100  711M    0     0  26.2M      0  0:00:27  0:00:27 --:--:-- 26.6M

OK, that's respectable: I've read the file at 26MB/s. Of course, I've only just turned on the cache, so NGINX fetches the file from Zurich while handing it over to my curl here. It notifies me by means of an HTTP header that the cache was a MISS, and then tells me which upstream server it contacted to retrieve the object.

But look at what happens the second time I run the same command:

pim@nginx0-nlams1:~$ curl -v -o /dev/null --connect-to ipng.ch:443:localhost:443 \
                     https://ipng.ch/media/vpp-proto/vpp-proto-bookworm.qcow2.lrz
< last-modified: Sun, 01 Jun 2025 12:37:52 GMT
< x-ipng-frontend: nginx0-nlams1
< x-ipng-cache-status: HIT
 
100  711M  100  711M    0     0   436M      0  0:00:01  0:00:01 --:--:--  437M

Holy moly! First I see the object has the same Last-Modified header, but I now also see that the Cache-Status was a HIT, and there is no mention of any upstream server. I do however see the file come in at a whopping 437MB/s which is 16x faster than over the network!! Nice work, NGINX!
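Out of curiosity, I can also poke around the cache directory on the frontend itself. Each cached object is a file whose header records the cache key, so a quick (purely illustrative) look could be:

```bash
# How much of the 8GB budget is currently in use?
pim@nginx0-nlams1:~$ sudo du -sh /var/www/nginx-cache

# Each cache file carries a small header that includes its KEY; listing those
# shows which upstream URLs are cached right now.
pim@nginx0-nlams1:~$ sudo grep -ar 'KEY:' /var/www/nginx-cache | head
```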

{{< image float="right" src="/assets/minio/rack-2.png" alt="Rack-o-Minio" width="12em" >}}

## What's Next

I'm going to deploy the third MinIO replica in Rümlang once the disks arrive. I'll release the ~4TB of disk currently used for Restic backups of the fleet, and put that ZFS capacity to other use. Now, creating services like PeerTube, Mastodon, Pixelfed, Loops, NextCloud and what-have-you will become much easier for me. And with the per-bucket replication between MinIO deployments, I also think this is a great way to auto-backup important data. First off, it'll be stored with RS8.4 erasure coding on the MinIO node itself, and secondly, user data will be copied automatically to a neighboring facility.

I've convinced myself that S3 storage is a great service to operate, and that MinIO is awesome.