---
date: "2025-06-01T10:07:23Z"
title: 'Case Study: Minio S3 - Part 2'
---

{{< image float="right" src="/assets/minio/minio-logo.png" alt="MinIO Logo" width="6em" >}}

# Introduction

Amazon Simple Storage Service (Amazon S3) is an object storage service offering industry-leading
scalability, data availability, security, and performance. Millions of customers of all sizes and
industries store, manage, analyze, and protect any amount of data for virtually any use case, such
as data lakes, cloud-native applications, and mobile apps. With cost-effective storage classes and
easy-to-use management features, you can optimize costs, organize and analyze data, and configure
fine-tuned access controls to meet specific business and compliance requirements.

Amazon's S3 became the _de facto_ standard object storage system, and there exist several fully open
source implementations of the protocol. One of them is MinIO: designed to allow enterprises to
consolidate all of their data on a single, private cloud namespace. Architected using the same
principles as the hyperscalers, MinIO delivers performance at scale at a fraction of the cost
compared to the public cloud.

IPng Networks is an Internet Service Provider, but I also dabble in self-hosting things, for
example [[PeerTube](https://video.ipng.ch/)], [[Mastodon](https://ublog.tech/)],
[[Immich](https://photos.ipng.ch/)], [[Pixelfed](https://pix.ublog.tech/)] and of course
[[Hugo](https://ipng.ch/)]. These services all have one thing in common: they tend to use lots of
storage when they grow. At IPng Networks, all hypervisors ship with enterprise SAS flash drives,
mostly 1.92TB and 3.84TB. Scaling up each of these services, and backing them up safely, can be
quite the headache.

In a [[previous article]({{< ref 2025-05-28-minio-1 >}})], I talked through the install of a
redundant set of three MinIO machines. In this article, I'll start putting them to good use.

## Use Case: Restic

{{< image float="right" src="/assets/minio/restic-logo.png" alt="Restic Logo" width="12em" >}}

[[Restic](https://restic.org/)] is a modern backup program that can back up files from many host
operating systems to many different storage types, easily, effectively, securely, verifiably and
freely. With a sales pitch like that, what's not to love? Actually, I am a long-time
[[BorgBackup](https://www.borgbackup.org/)] user, and I think I'll keep that running. However, for
resilience, and because I've heard only good things about Restic, I'll make a second backup of the
routers, hypervisors, and virtual machines using Restic.

Restic can use S3 buckets out of the box (incidentally, so can BorgBackup). To configure it, I use
a mixture of environment variables and flags. But first, let me create a bucket for the backups.

```
pim@glootie:~$ mc mb chbtl0/ipng-restic
pim@glootie:~$ mc admin user add chbtl0/
pim@glootie:~$ cat << EOF | tee ipng-restic-access.json
{
  "PolicyName": "ipng-restic-access",
  "Policy": {
    "Version": "2012-10-17",
    "Statement": [
      {
        "Effect": "Allow",
        "Action": [ "s3:DeleteObject", "s3:GetObject", "s3:ListBucket", "s3:PutObject" ],
        "Resource": [ "arn:aws:s3:::ipng-restic", "arn:aws:s3:::ipng-restic/*" ]
      }
    ]
  }
}
EOF
pim@glootie:~$ mc admin policy create chbtl0/ ipng-restic-access ipng-restic-access.json
pim@glootie:~$ mc admin policy attach chbtl0/ ipng-restic-access --user
```

First, I'll create a bucket called `ipng-restic`. Then, I'll create a _user_ with an access key and
a secret key. To protect the innocent, and my backups, I'll not disclose them. Next, I'll create an
IAM policy that allows for Get/List/Put/Delete to be performed on the bucket and its contents, and
finally I'll attach this policy to the user I just created.

To run a Restic backup, I'll first have to create a so-called _repository_. The repository has a
location and a password, which Restic uses to encrypt the data. Because I'm using S3, I'll also need
to specify the key and secret:

```
root@glootie:~# RESTIC_PASSWORD="changeme"
root@glootie:~# RESTIC_REPOSITORY="s3:https://s3.chbtl0.ipng.ch/ipng-restic/$(hostname)/"
root@glootie:~# AWS_ACCESS_KEY_ID=""
root@glootie:~# AWS_SECRET_ACCESS_KEY=""
root@glootie:~# export RESTIC_PASSWORD RESTIC_REPOSITORY AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY
root@glootie:~# restic init
created restic repository 807cf25e85 at s3:https://s3.chbtl0.ipng.ch/ipng-restic/glootie.ipng.ch/
```

Restic prints the fingerprint of the repository it just created. Taking a look at the MinIO
install:

```
pim@glootie:~$ mc stat chbtl0/ipng-restic/glootie.ipng.ch/
Name      : config
Date      : 2025-06-01 12:01:43 UTC
Size      : 155 B
ETag      : 661a43f72c43080649712e45da14da3a
Type      : file
Metadata  :
  Content-Type: application/octet-stream

Name      : keys/
Date      : 2025-06-01 12:03:33 UTC
Type      : folder
```

Cool. Now I'm ready to make my first full backup:

```
root@glootie:~# ARGS="--exclude /proc --exclude /sys --exclude /dev --exclude /run"
root@glootie:~# ARGS="$ARGS --exclude-if-present .nobackup"
root@glootie:~# restic backup $ARGS /
...
processed 1141426 files, 131.111 GiB in 15:12
snapshot 34476c74 saved
```

Once the backup completes, the Restic authors advise me to also do a check of the repository, and to
prune it so that it keeps a finite number of daily, weekly and monthly backups. My further journey
for Restic looks a bit like this:

```
root@glootie:~# restic check
using temporary cache in /tmp/restic-check-cache-2712250731
create exclusive lock for repository
load indexes
check all packs
check snapshots, trees and blobs
[0:04] 100.00%  1 / 1 snapshots

no errors were found

root@glootie:~# restic forget --prune --keep-daily 8 --keep-weekly 5 --keep-monthly 6
repository 34476c74 opened (version 2, compression level auto)
Applying Policy: keep 8 daily, 5 weekly, 6 monthly snapshots
keep 1 snapshots:
ID        Time                 Host             Tags  Reasons           Paths
------------------------------------------------------------------------------
34476c74  2025-06-01 12:18:54  glootie.ipng.ch        daily snapshot    /
                                                      weekly snapshot
                                                      monthly snapshot
------------------------------------------------------------------------------
1 snapshots
```

Right on!
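
To run this on a schedule, every host needs to do the same three steps each night: back up, check,
and prune. A wrapper along these lines would do the trick. This is a sketch rather than the exact
script I deploy, and the credentials file `/etc/restic/env` is just a hypothetical spot for the four
environment variables shown above:

```
#!/usr/bin/env bash
# Nightly Restic wrapper (sketch). Assumes /etc/restic/env exports RESTIC_PASSWORD,
# RESTIC_REPOSITORY, AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY as set up above.
set -euo pipefail
source /etc/restic/env

ARGS="--exclude /proc --exclude /sys --exclude /dev --exclude /run"
ARGS="$ARGS --exclude-if-present .nobackup"

restic backup $ARGS /                                # take tonight's snapshot
restic check                                         # verify repository integrity
restic forget --prune \
  --keep-daily 8 --keep-weekly 5 --keep-monthly 6    # apply the retention policy
```

Any failing step makes the script exit non-zero, so the scheduler can flag the run.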

I proceed to update the Ansible configs to roll this out against the entire fleet of 152 hosts at
IPng Networks. I do this with a little tool called `bitcron`, which I wrote for a previous company I
worked at: [[BIT](https://bit.nl)] in the Netherlands. Bitcron allows me to create relatively
elegant cronjobs that can raise warnings, errors and fatal issues. If no issues are found, an e-mail
can be sent to a bit-bucket address, but if warnings or errors are found, a different _monitored_
address will be used. Bitcron is kind of cool, and I wrote it in 2001. Maybe I'll write about it,
for old time's sake. I wonder if the folks at BIT still use it?

## Use Case: NGINX

{{< image float="right" src="/assets/minio/nginx-logo.png" alt="NGINX Logo" width="11em" >}}

OK, with the first use case out of the way, I turn my attention to a second - in my opinion more
interesting - use case. In the [[previous article]({{< ref 2025-05-28-minio-1 >}})], I created a
public bucket called `ipng-web-assets` in which I stored 6.50GB of website data belonging to the
IPng website, and some material I posted when I was on my
[[Sabbatical](https://sabbatical.ipng.nl/)] last year.

### MinIO: Bucket Replication

First things first: redundancy. These web assets are currently pushed to all four NGINX machines,
and statically served. If I were to replace them with a single S3 bucket, I would create a single
point of failure, and that's _no bueno_!

Off I go, creating a replicated bucket using two MinIO instances (`chbtl0` and `ddln0`):

```
pim@glootie:~$ mc mb ddln0/ipng-web-assets
pim@glootie:~$ mc anonymous set download ddln0/ipng-web-assets
pim@glootie:~$ mc admin user add ddln0/
pim@glootie:~$ cat << EOF | tee ipng-web-assets-access.json
{
  "PolicyName": "ipng-web-assets-access",
  "Policy": {
    "Version": "2012-10-17",
    "Statement": [
      {
        "Effect": "Allow",
        "Action": [ "s3:DeleteObject", "s3:GetObject", "s3:ListBucket", "s3:PutObject" ],
        "Resource": [ "arn:aws:s3:::ipng-web-assets", "arn:aws:s3:::ipng-web-assets/*" ]
      }
    ]
  }
}
EOF
pim@glootie:~$ mc admin policy create ddln0/ ipng-web-assets-access ipng-web-assets-access.json
pim@glootie:~$ mc admin policy attach ddln0/ ipng-web-assets-access --user
pim@glootie:~$ mc replicate add chbtl0/ipng-web-assets \
    --remote-bucket https://:@s3.ddln0.ipng.ch/ipng-web-assets
```

What happens next is pure magic. I've told `chbtl0` that I want it to replicate all existing and
future changes to that bucket to its neighbor `ddln0`. Only minutes later, I check the replication
status, just to see that it's _already done_:

```
pim@glootie:~$ mc replicate status chbtl0/ipng-web-assets
  Replication status since 1 hour
  s3.ddln0.ipng.ch
  Replicated:    142 objects (6.5 GiB)
  Queued:        ● 0 objects, 0 B (avg: 4 objects, 915 MiB; max: 0 objects, 0 B)
  Workers:       0 (avg: 0; max: 0)
  Transfer Rate: 15 kB/s (avg: 88 MB/s; max: 719 MB/s)
  Latency:       3ms (avg: 3ms; max: 7ms)
  Link:          ● online (total downtime: 0 milliseconds)
  Errors:        0 in last 1 minute; 0 in last 1hr; 0 since uptime
  Configured Max Bandwidth (Bps): 644 GB/s   Current Bandwidth (Bps): 975 B/s
pim@summer:~/src/ipng-web-assets$ mc ls ddln0/ipng-web-assets/
[2025-06-01 12:42:22 CEST]     0B ipng.ch/
[2025-06-01 12:42:22 CEST]     0B sabbatical.ipng.nl/
```

MinIO has pumped the data from bucket `ipng-web-assets` to the other machine at an average of 88MB/s
with a peak throughput of 719MB/s (probably for the larger VM images). And indeed, looking at the
`mc ls` output above, both top-level directories are already present on `ddln0`.
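
As a belt-and-braces check, the two buckets can also be compared object by object. `mc diff` prints
one line for every object that differs in name, size or date between the two targets, so an empty
result means the replicas are in sync. This is merely an illustrative spot check on top of
`mc replicate status`:

```
pim@glootie:~$ mc diff chbtl0/ipng-web-assets ddln0/ipng-web-assets
pim@glootie:~$
```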

### MinIO: Missing directory index

I take a look at what I just built, on the following URL:
* [https://ipng-web-assets.s3.ddln0.ipng.ch/sabbatical.ipng.nl/media/vdo/IMG_0406_0.mp4](https://ipng-web-assets.s3.ddln0.ipng.ch/sabbatical.ipng.nl/media/vdo/IMG_0406_0.mp4)

That checks out, and I can see the mess that was my room when I first went on sabbatical. By the
way, I totally cleaned it up, see
[[here](https://sabbatical.ipng.nl/blog/2024/08/01/thursday-basement-done/)] for proof. I can't,
however, see the directory listing:

```
pim@glootie:~$ curl https://ipng-web-assets.s3.ddln0.ipng.ch/sabbatical.ipng.nl/media/vdo/
<?xml version="1.0" encoding="UTF-8"?>
<Error>
  <Code>NoSuchKey</Code>
  <Message>The specified key does not exist.</Message>
  <Key>sabbatical.ipng.nl/media/vdo/</Key>
  <BucketName>ipng-web-assets</BucketName>
  <Resource>/sabbatical.ipng.nl/media/vdo/</Resource>
  <RequestId>1844EC0CFEBF3C5F</RequestId>
  <HostId>dd9025bab4ad464b049177c95eb6ebf374d3b3fd1af9251148b658df7ac2e3e8</HostId>
</Error>
```

That's unfortunate, because some of the IPng articles link to a directory full of files, which I'd
like to be shown so that my readers can navigate through the directories. Surely I'm not the first
to encounter this? And sure enough, I'm not: I find a little Python script
[[ref](https://github.com/glowinthedark/index-html-generator)] by user `glowinthedark` that
generates `index.html` files for their Caddy file server. I'll take me some of that Python, thank
you!

With the following little script, my setup is complete:

```
pim@glootie:~/src/ipng-web-assets$ cat push.sh
#!/usr/bin/env bash

echo "Generating index.html files ..."
for D in */media; do
  echo "* Directory $D"
  ./genindex.py -r $D
done
echo "Done (genindex)"
echo ""

echo "Mirroring directory to S3 bucket"
mc mirror --remove --overwrite . chbtl0/ipng-web-assets/
echo "Done (mc mirror)"
echo ""
pim@glootie:~/src/ipng-web-assets$ ./push.sh
```

Only a few seconds after I run `./push.sh`, the replication is complete and I have two identical
copies of my media:

1. [https://ipng-web-assets.s3.chbtl0.ipng.ch/ipng.ch/media/](https://ipng-web-assets.s3.chbtl0.ipng.ch/ipng.ch/media/index.html)
1. [https://ipng-web-assets.s3.ddln0.ipng.ch/ipng.ch/media/](https://ipng-web-assets.s3.ddln0.ipng.ch/ipng.ch/media/index.html)


### NGINX: Proxy to MinIO

Before moving to S3 storage, my NGINX frontends all kept a copy of the IPng media on local NVMe
disk. That's great for reliability, as each NGINX instance is completely hermetic and standalone.
However, it's not great for scaling: the current NGINX instances only have 16GB of local storage,
and I'd rather not have my static web asset data outgrow that filesystem. From before, I already had
an NGINX config that served the Hugo static data from `/var/www/ipng.ch/` and the `/media`
subdirectory from a different directory in `/var/www/ipng-web-assets/ipng.ch/media`.

Moving to a redundant S3 storage backend is straightforward:

```
upstream minio_ipng {
  least_conn;
  server minio0.chbtl0.net.ipng.ch:9000;
  server minio0.ddln0.net.ipng.ch:9000;
}

server {
  ...
  location / {
    root /var/www/ipng.ch/;
  }

  location /media {
    proxy_set_header Host $http_host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;

    proxy_connect_timeout 300;
    proxy_http_version 1.1;
    proxy_set_header Connection "";
    chunked_transfer_encoding off;

    rewrite (.*)/$ $1/index.html;

    proxy_pass http://minio_ipng/ipng-web-assets/ipng.ch/media;
  }
}
```

I want to make note of a few things:
1. The `upstream` definition here uses IPng Site Local entrypoints, considering the NGINX servers
   all have direct MTU=9000 access to the MinIO instances. I'll put both in there, and the
   `least_conn` directive makes NGINX favor the replica with the fewest active connections.
1. Deep-linking to directory names without a trailing `/index.html` would serve a 404 from the
   backend, so I'll intercept these and rewrite directory URLs to always include `/index.html`.
1. The upstream endpoint is _path-based_, that is to say it includes the bucket name and website
   name. This whole location used to be simply `root /var/www/ipng-web-assets/ipng.ch/media/`, so
   the mental change is quite small.

### NGINX: Caching

After deploying the S3 upstream on all IPng websites, I can delete the old
`/var/www/ipng-web-assets/` directory and reclaim about 7GB of disk space. This gives me an idea ...

{{< image width="8em" float="left" src="/assets/shared/brain.png" alt="brain" >}}

On the one hand it's great that I will pull these assets from MinIO and all, but at the same time,
it's a tad inefficient to retrieve them from, say, Zurich to Amsterdam just to serve them onto the
internet again. If at any time something on the IPng website goes viral, it'd be nice to be able to
serve them directly from the edge, right?

A web cache. What could _possibly_ go wrong :)

NGINX is really, really good at caching content. It has a powerful engine to store, scan, revalidate
and match any content and upstream headers. It's also very well documented, so I take a look at the
proxy module's documentation [[here](https://nginx.org/en/docs/http/ngx_http_proxy_module.html)] and
in particular a useful [[blog post](https://blog.nginx.org/blog/nginx-caching-guide)] on their
website.

The first thing I need to do is create what is called a _key zone_, which is a region of memory in
which URL keys are stored with some metadata. Having a copy of the keys in memory enables NGINX to
quickly determine if a request is a HIT or a MISS without having to go to disk, greatly speeding up
the check.

In `/etc/nginx/conf.d/ipng-cache.conf` I add the following NGINX cache:

```
proxy_cache_path /var/www/nginx-cache levels=1:2 keys_zone=ipng_cache:10m max_size=8g
                 inactive=24h use_temp_path=off;
```

With this statement, I'll create a two-level directory hierarchy for the cached objects, and
allocate 10MB of memory for the key zone, which should hold on the order of 100K entries. The
maximum size I'll allow the cache to grow to is 8GB, and I'll mark any object inactive if it's not
been referenced for 24 hours. I learn that inactive is different from expired content. If a cache
element has expired, but NGINX can't reach the upstream for a new copy, it can be configured to
serve a stale copy from the cache. That's dope, as it serves as an extra layer of defence in case
the network or all available S3 replicas take the day off.
I'll also ask NGINX to avoid writing objects to a temporary directory first and then moving them
into the `/var/www/nginx-cache` directory. These are recommendations I grab from the manual.

Within the `location` block I configured above, I'm now ready to enable this cache. I'll do that by
adding a few include files, which I'll reference in all sites that I want to have make use of this
cache.

First, to enable the cache, I write the following snippet:

```
pim@nginx0-nlams1:~$ cat /etc/nginx/conf.d/ipng-cache.inc
proxy_cache ipng_cache;
proxy_ignore_headers Cache-Control;
proxy_cache_valid any 1h;
proxy_cache_revalidate on;
proxy_cache_use_stale error timeout updating http_500 http_502 http_503 http_504;
proxy_cache_background_update on;
```

Then, I find it useful to emit a few debugging HTTP headers, and at the same time I see that MinIO
emits a bunch of HTTP headers that may not be safe for me to propagate, so I pen two more snippets:

```
pim@nginx0-nlams1:~$ cat /etc/nginx/conf.d/ipng-strip-minio-headers.inc
proxy_hide_header x-minio-deployment-id;
proxy_hide_header x-amz-request-id;
proxy_hide_header x-amz-id-2;
proxy_hide_header x-amz-replication-status;
proxy_hide_header x-amz-version-id;

pim@nginx0-nlams1:~$ cat /etc/nginx/conf.d/ipng-add-upstream-headers.inc
add_header X-IPng-Frontend $hostname always;
add_header X-IPng-Upstream $upstream_addr always;
add_header X-IPng-Upstream-Status $upstream_status always;
add_header X-IPng-Cache-Status $upstream_cache_status;
```

With that, I am ready to enable caching of the IPng `/media` location:

```
  location /media {
    ...
    include /etc/nginx/conf.d/ipng-strip-minio-headers.inc;
    include /etc/nginx/conf.d/ipng-add-upstream-headers.inc;
    include /etc/nginx/conf.d/ipng-cache.inc;
    ...
  }
```

## Results

I run the Ansible playbook for the NGINX cluster and take a look at the replica at Coloclue in
Amsterdam, called `nginx0.nlams1.ipng.ch`. Notably, it'll have to retrieve the file from a MinIO
replica in Zurich (12ms away), so it's expected to take a little while.

The first attempt:

```
pim@nginx0-nlams1:~$ curl -v -o /dev/null --connect-to ipng.ch:443:localhost:443 \
    https://ipng.ch/media/vpp-proto/vpp-proto-bookworm.qcow2.lrz
...
< last-modified: Sun, 01 Jun 2025 12:37:52 GMT
< x-ipng-frontend: nginx0-nlams1
< x-ipng-cache-status: MISS
< x-ipng-upstream: [2001:678:d78:503::b]:9000
< x-ipng-upstream-status: 200

100  711M  100  711M    0     0  26.2M      0  0:00:27  0:00:27 --:--:-- 26.6M
```

OK, that's respectable: I've read the file at 26MB/s. Of course, I've only just turned on the cache,
so NGINX fetches the file from Zurich while handing it over to my `curl` here. It notifies me by
means of an HTTP header that the cache was a `MISS`, and also which upstream server it contacted to
retrieve the object.

But look at what happens the _second_ time I run the same command:

```
pim@nginx0-nlams1:~$ curl -v -o /dev/null --connect-to ipng.ch:443:localhost:443 \
    https://ipng.ch/media/vpp-proto/vpp-proto-bookworm.qcow2.lrz
< last-modified: Sun, 01 Jun 2025 12:37:52 GMT
< x-ipng-frontend: nginx0-nlams1
< x-ipng-cache-status: HIT

100  711M  100  711M    0     0   436M      0  0:00:01  0:00:01 --:--:--  437M
```

Holy moly! First I see that the object has the same _Last-Modified_ header, but I now also see that
the _Cache-Status_ was a `HIT`, and there is no mention of any upstream server. I do however see the
file come in at a whopping 437MB/s, which is 16x faster than over the network!! Nice work, NGINX!
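
Because those debugging headers are now on every response, spot checks are easy to script. Here's a
small sketch (not part of the original rollout) that fetches only the response headers on a frontend
and prints the cache verdict plus the upstream that served the object, if any:

```
#!/usr/bin/env bash
# Cache spot check (sketch): report whether a given object is served from the
# local NGINX cache (HIT) or fetched from a MinIO upstream (MISS). The header
# names match ipng-add-upstream-headers.inc above; the default URL is an example.
URL="${1:-https://ipng.ch/media/vpp-proto/vpp-proto-bookworm.qcow2.lrz}"

curl -s -o /dev/null -D - --connect-to ipng.ch:443:localhost:443 "$URL" \
  | grep -iE '^x-ipng-(frontend|cache-status|upstream)'
```

Running this against an object that isn't cached yet should show a `MISS` and an upstream address;
running it again should show a `HIT`, just like the verbose `curl` output above.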

{{< image float="right" src="/assets/minio/rack-2.png" alt="Rack-o-Minio" width="12em" >}}

# What's Next

I'm going to deploy the third MinIO replica in Rümlang once the disks arrive. I'll release the ~4TB
of disk currently used for Restic backups of the fleet, and put that ZFS capacity to other use. Now,
creating services like PeerTube, Mastodon, Pixelfed, Loops, NextCloud and what-have-you will become
much easier for me. And with the per-bucket replication between MinIO deployments, I also think this
is a great way to auto-backup important data. First off, it'll be RS8.4 on the MinIO node itself,
and secondly, user data will be copied automatically to a neighboring facility.

I've convinced myself that S3 storage is a great service to operate, and that MinIO is awesome.