---
date: "2022-11-20T22:35:14Z"
title: Mastodon - Part 1 - Installing
aliases:
- /s/articles/2022/11/20/mastodon-1.html
---

# About this series

{{< image width="200px" float="right" src="/assets/mastodon/mastodon-logo.svg" alt="Mastodon" >}}

I have seen companies achieve great successes in the consumer internet and entertainment industries. I've been feeling less
enthusiastic about the stronghold that these corporations have over my digital presence. I am the first to admit that using "free" services
is convenient, but these companies are sometimes taking away my autonomy and exerting control over society. To each their own of course, but
for me it's time to take back a little bit of responsibility for my online social presence, moving away from centrally hosted services and
toward privately operated ones.

This series details my findings from starting a microblogging website, which uses a new set of super interesting open interconnect protocols
to share media (text, pictures, videos, etc.) between producers and their followers, using an open source project called
[Mastodon](https://joinmastodon.org/).

## Introduction

Similar to how blogging is the act of publishing updates to a website, microblogging is the act of publishing small updates to a stream of
updates on your profile. You can publish text posts and optionally attach media such as pictures, audio, video, or polls. Mastodon lets you
follow friends and discover new ones. It doesn't do this in a centralized way, however.

Groups of people congregate on a given server, becoming users of it by creating an account there. They interact with one another on that
server, but they can also interact with folks on _other_ servers. Instead of following **@IPngNetworks**, they might follow a user on a
given server domain, like **@IPngNetworks@ublog.tech**. This way, all these servers can be run _independently_ but interact with each other
using a common protocol (called ActivityPub). I've heard this concept compared to choosing an e-mail provider: I might choose Google's
gmail.com, and you might use Microsoft's live.com. However, we can still send e-mails back and forth thanks to a common protocol (called SMTP).

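To make the `user@domain` addressing a bit more concrete: before a server can follow a remote account over ActivityPub, it has to find that
account on its home server, and Mastodon does this lookup with WebFinger (RFC 7033). The Python sketch below only shows how a handle maps
onto the WebFinger URL; it does not fetch that URL or parse the JSON that comes back, and the function name is my own.

```python
from urllib.parse import urlencode


def webfinger_url(handle: str) -> str:
    """Map a handle like '@user@example.org' to its WebFinger lookup URL."""
    user, sep, domain = handle.lstrip("@").partition("@")
    if not user or not sep or not domain:
        raise ValueError(f"not a user@domain handle: {handle!r}")
    # The resource is an acct: URI; urlencode percent-escapes ':' and '@'.
    query = urlencode({"resource": f"acct:{user}@{domain}"})
    return f"https://{domain}/.well-known/webfinger?{query}"


if __name__ == "__main__":
    print(webfinger_url("@IPngNetworks@ublog.tech"))
    # https://ublog.tech/.well-known/webfinger?resource=acct%3AIPngNetworks%40ublog.tech
```

The domain part alone decides which server gets asked, which is exactly what lets every instance be run independently.
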
### uBlog.tech

I thought I would give it a go, mostly out of engineering curiosity but also because I now feel more strongly that we (the users) ought to
take a bit more ownership back. I've been a regular blogging and micro-blogging user since approximately forever, and I think it may be a
good investment of my time to learn a bit more about the architecture of Mastodon. So, I've decided to build and _productionize_ a server
instance.

I registered [uBlog.tech](https://ublog.tech). Incidentally, if you're reading this and would like to participate, the server welcomes users
in the network-, systems- and software engineering disciplines. Before I can get to the fun parts, though, I have to do a bunch of work
to get this server into a shape where it can be trusted with user-generated content.

### Hardware

I'm running Debian on (a set of) Dell R720s hosted by IPng Networks in Zurich, Switzerland. These machines are all roughly the same, and
come with:

* 2x10C/20T Intel E5-2680 (so 40 CPUs)
* 256GB ECC RAM
* 2x240G SSD in mdraid to boot from
* 3x1T SSD in ZFS for fast storage
* 6x16T hard disks with 2x500G SSD for L2ARC, in ZFS for bulk storage

Data integrity and durability are important to me. It's one thing that the commercial vendors typically do really well, and my pride
prohibits me from losing data due to things like "disk failure" or "computer broken" or "datacenter on fire". So, I handle backups in two
main ways: borg(1) and zrepl(1).

* **Hypervisor hosts** make a daily copy of their entire filesystem using **borgbackup(1)** to a set of two remote fileservers. This way, the
important file metadata, configs for the virtual machines, and so on, are all safely stored remotely.
* **Virtual machines** run on ZFS blockdevices on either the SSD pool, or the disk pool, or both. Using a tool called **zrepl(1)**
(which I described a little bit in a [[previous post]({{< ref "2022-10-14-lab-1" >}})]), I create a snapshot every 12 hours on the local
blockdevice, and incrementally copy those snapshots away to the remote fileservers daily.

If I do something silly on a given virtual machine, I can roll back the machine's filesystem state to the previous checkpoint and reboot.
This has saved my butt a number of times, for example during a PHP 7 to 8 upgrade for LibreNMS, or during an OpenBSD upgrade that ran out of
disk midway through. Being able to roll back to a last known good state is awesome, and completely transparent to the virtual machine, as the
snapshotting is done on the underlying storage pool in the hypervisor. The fileservers are physically separated from the server pools, one in
Zurich and another in Geneva, so even if I were to lose the entire machine, I would still have a ~12h old backup in two locations.

### Software

I provision a VM with 8 vCPUs (dedicated on the underlying hypervisor), 16GB of memory and two virtio network cards. One NIC connects
to a backend LAN in some RFC1918 address space, and the other presents an IPv4 and IPv6 interface to the internet. I give this
machine two blockdevices: a small one of 16GB (vda), created on the hypervisor's `ssd-vol0/libvirt/ublog-disk0`, to be used only
for boot, logs and OS; and a second one of 300GB (vdb), created on `ssd-vol1/libvirt/ublog-disk1`, to be used for Mastodon and
its supporting services.

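For reference, a VM like the one described above could be created with `virt-install` roughly as follows. This is a sketch under
assumptions: the bridge names, the `debian11` OS variant, the installer URL and the `/dev/zvol/...` paths (the usual Linux location for the
zvols named above) are my guesses rather than IPng's actual invocation, and CPU pinning for the dedicated vCPUs is left out. The Python only
builds and prints the command; it does not run it.

```python
import shlex

# Hypothetical values: the post gives the VM size and the zvol names,
# but not the bridge names, OS variant or install source.
VM = {
    "name": "ublog",
    "vcpus": 8,
    "memory_mib": 16 * 1024,
    "disks": ["/dev/zvol/ssd-vol0/libvirt/ublog-disk0",
              "/dev/zvol/ssd-vol1/libvirt/ublog-disk1"],
    "bridges": ["br-backend", "br-public"],  # hypothetical bridge names
}


def virt_install_cmd(vm: dict) -> list[str]:
    """Build (but do not run) a virt-install argument list for the VM."""
    cmd = [
        "virt-install",
        "--name", vm["name"],
        "--vcpus", str(vm["vcpus"]),
        "--memory", str(vm["memory_mib"]),  # virt-install takes MiB
        "--os-variant", "debian11",
        "--location", "https://deb.debian.org/debian/dists/bullseye/main/installer-amd64/",
        "--graphics", "none",
        "--extra-args", "console=ttyS0",
    ]
    for path in vm["disks"]:  # normally enumerated as vda, then vdb
        cmd += ["--disk", f"path={path},bus=virtio"]
    for bridge in vm["bridges"]:  # one virtio NIC per bridge
        cmd += ["--network", f"bridge={bridge},model=virtio"]
    return cmd


if __name__ == "__main__":
    print(shlex.join(virt_install_cmd(VM)))
```
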
Then I simply install Debian into **vda** using `virt-install`. At IPng Networks we have some ansible-style automation that takes over the
machine, installs all sorts of Debian packages that we use (like a Prometheus node exporter, more on that later), and sets up a
firewall that allows SSH access from our trusted networks and otherwise only allows ports 80 and 443, because this is to be a webserver.

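The inbound firewall policy is simple enough to state as a predicate. Below is a toy Python model of it, not the actual ruleset (which isn't
shown here); the trusted prefixes are documentation ranges standing in for IPng's real ones.

```python
import ipaddress

# Hypothetical trusted networks (documentation prefixes), not IPng's real ones.
TRUSTED = [ipaddress.ip_network(n) for n in ("192.0.2.0/24", "2001:db8::/32")]
PUBLIC_TCP_PORTS = {80, 443}


def allowed(src: str, dport: int, proto: str = "tcp") -> bool:
    """Model of the inbound policy: SSH only from trusted networks,
    HTTP and HTTPS from anywhere, everything else dropped."""
    if proto != "tcp":
        return False
    addr = ipaddress.ip_address(src)
    if dport == 22:
        # An IPv4 address is never "in" an IPv6 network and vice versa.
        return any(addr in net for net in TRUSTED)
    return dport in PUBLIC_TCP_PORTS


if __name__ == "__main__":
    for src, port in [("192.0.2.10", 22), ("198.51.100.1", 22), ("198.51.100.1", 443)]:
        print(src, port, allowed(src, port))
```

A real nftables or iptables ruleset additionally has to accept established/related traffic and ICMPv6 (neighbor discovery), which this model
ignores.
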
After installing Debian Bullseye, I'll create the following ZFS filesystems on **vdb**:

```
pim@ublog:~$ sudo zpool create data /dev/vdb
pim@ublog:~$ sudo zfs create -o mountpoint=/home/mastodon -o quota=10G data/mastodon
pim@ublog:~$ sudo zfs create -o mountpoint=/var/lib/elasticsearch -o quota=10G data/elasticsearch
pim@ublog:~$ sudo zfs create -o mountpoint=/var/lib/postgresql -o quota=20G data/postgresql
pim@ublog:~$ sudo zfs create -o mountpoint=/var/lib/redis -o quota=2G data/redis
pim@ublog:~$ sudo zfs create -o mountpoint=/home/mastodon/live/public/system data/mastodon-system
```

As a sidenote, I realize that this ZFS pool consists only of **vdb**, but its underlying blockdevice is protected by raidz on the
hypervisor, and it is copied incrementally off-site daily by the hypervisor. I'm pretty confident about safety here, but I prefer to use ZFS
in the virtual machine guests as well, because it lets me do local snapshotting of, say, `data/mastodon-system`, more easily grow or shrink
the datasets for the supporting services, and monitor them individually for runaway growth.

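Monitoring datasets individually can start as simply as comparing each dataset's usage to its quota. A minimal Python sketch, assuming the
datasets have a `quota` property set and that the input is `zfs list -H -p -o name,used,quota` output (tab-separated, exact byte counts);
the 80% threshold and the sample numbers are made up for illustration.

```python
import subprocess


def zfs_list(pool: str = "data") -> str:
    """Return tab-separated name/used/quota lines (in bytes) for a pool's datasets."""
    return subprocess.run(
        ["zfs", "list", "-H", "-p", "-r", "-o", "name,used,quota", pool],
        capture_output=True, text=True, check=True,
    ).stdout


def over_quota(listing: str, threshold: float = 0.8):
    """Yield (dataset, fraction_used) for datasets at or above threshold of their quota."""
    for line in listing.splitlines():
        if not line.strip():
            continue
        name, used, quota = line.split("\t")
        if quota in ("0", "-", "none"):  # no quota set on this dataset
            continue
        fraction = int(used) / int(quota)
        if fraction >= threshold:
            yield name, fraction


# Made-up sample in the same format, so the sketch runs without ZFS.
SAMPLE = (
    "data/mastodon\t9000000000\t10737418240\n"
    "data/redis\t100000000\t2147483648\n"
    "data/mastodon-system\t50000000000\t0\n"
)

if __name__ == "__main__":
    for name, fraction in over_quota(SAMPLE):
        print(f"{name}: {fraction:.0%} of quota used")
```

On the VM itself you would pass `zfs_list()` instead of `SAMPLE`, for instance from a cron job.
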
#### Installing Mastodon

I then go through the public [Mastodon docs](https://docs.joinmastodon.org/admin/install/) to further install the machine. I choose not to
go the Docker route, but instead stick to systemd installs. The install itself is pretty straightforward, but I did find the nginx config
a bit rough around the edges (notably because the default files I'm asked to use have their SSL certificate stanzas commented out while
trying to listen on port 443, which makes nginx and certbot very confused). A cup of tea later, and we're all good.

I am not going to start prematurely optimizing, and after a very engaging thread on Mastodon itself
[[@davidlars@hachyderm.io](https://ublog.tech/@davidlars@hachyderm.io/109381163342345835)] with a few fellow admins, the consensus really is
to _KISS_ (keep it simple, silly!). In that thread, I made a few general observations on scaling up and out (none of which I'll be doing
initially), drawing on previous experience as a systems engineer and knowing a bit about the components used here:

* Run services on dedicated machines (i.e. separate storage, PostgreSQL, Redis, Puma and Sidekiq workers)
* Fiddle with the Puma worker pool (more workers, and/or more threads per worker)
* Fiddle with the Sidekiq worker pool and dedicated instances per queue
* Put storage on a local MinIO cluster
* Run multiple PostgreSQL databases, read-only replicas, or multimaster
* Run a cluster of multiple Redis instances instead of one
* Split off the cache Redis into a memory-only instance
* Front the service with a cluster of NGINX + object caching

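One thing worth keeping in mind when fiddling with the Puma and Sidekiq pools is that every thread can hold its own PostgreSQL connection.
The sketch below is a back-of-the-envelope Python model: Puma needs roughly `WEB_CONCURRENCY` x `MAX_THREADS` connections, and each Sidekiq
process roughly its `DB_POOL` (normally equal to its `-c` concurrency). The example numbers are assumptions, and the streaming server and
one-off `tootctl` runs, which also hold connections, are ignored.

```python
def pg_connections(web_concurrency: int, max_threads: int,
                   sidekiq_pools: list[int]) -> int:
    """Rough upper bound on PostgreSQL connections held by Puma and Sidekiq."""
    puma = web_concurrency * max_threads  # one connection per Puma thread
    sidekiq = sum(sidekiq_pools)          # DB_POOL per Sidekiq process
    return puma + sidekiq


def fits(connections: int, max_connections: int = 100, reserved: int = 3) -> bool:
    """PostgreSQL keeps `reserved` slots (superuser_reserved_connections) aside."""
    return connections <= max_connections - reserved


if __name__ == "__main__":
    # Assumed examples: a small single-box setup, and a scaled-out one.
    for label, total in [("small", pg_connections(2, 5, [25])),
                         ("scaled", pg_connections(4, 8, [25, 25, 25]))]:
        print(f"{label}: {total} connections, fits default PostgreSQL: {fits(total)}")
```

With PostgreSQL's default `max_connections` of 100 (3 of which are reserved for superusers), the scaled-out example no longer fits, which is
where a connection pooler or a larger `max_connections` comes in.
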
Some other points of interest for those of us on the adventure of running our own machines follow:

#### Logging

Mastodon is a chatty one: it logs to stdout/stderr, and most of its tasks in Sidekiq have a lot to say. On Debian, by default this
output goes from **systemd** into **journald**, which in turn copies it into **syslogd**. The result is that each logline hits the
disk three (!) times. Also by default, Debian and Ubuntu aren't too great at log hygiene. While `/var/log/` is scrubbed by logrotate(8),
the journal is only held in check by journald's own defaults, which let it use up to 10% of the filesystem (capped at 4GB). So I quickly
make the following change:

```
pim@ublog:~$ cat << EOF | sudo tee /etc/systemd/journald.conf
[Journal]
SystemMaxUse=500M
ForwardToSyslog=no
EOF
pim@ublog:~$ sudo systemctl restart systemd-journald
```

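To get a feel for why this matters: the write volume scales linearly with the number of copies, and `SystemMaxUse` decides how much history
the journal keeps. A small Python calculation with made-up numbers (50 lines/s of about 250 bytes each); it ignores journal compression and
per-entry overhead, so treat it as a ballpark. journald interprets the `M` suffix as base 1024.

```python
SECONDS_PER_DAY = 86_400


def daily_log_bytes(lines_per_second: float, bytes_per_line: int, copies: int) -> int:
    """Bytes written per day when every log line is stored `copies` times."""
    return int(lines_per_second * bytes_per_line * copies * SECONDS_PER_DAY)


def retention_hours(max_use_bytes: int, journal_bytes_per_day: int) -> float:
    """Hours of history a journal capped at max_use_bytes can hold."""
    return max_use_bytes / journal_bytes_per_day * 24


if __name__ == "__main__":
    # Hypothetical load: 50 lines/s of ~250 bytes each.
    before = daily_log_bytes(50, 250, copies=3)  # the three copies described above
    after = daily_log_bytes(50, 250, copies=1)   # journal only
    print(f"{before / 1e9:.2f} GB/day before, {after / 1e9:.2f} GB/day after")
    print(f"SystemMaxUse=500M keeps ~{retention_hours(500 * 1024**2, after):.1f} h")
```

At these hypothetical rates, three copies come to about 3.2 GB/day versus 1.1 GB/day for the journal alone, and a 500M journal holds roughly
11.7 hours of history.
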
#### Paperclip and ImageMagick

While tailing the journal with `journalctl -f`, I noticed that a lot of incoming media is first spooled to `/tmp` and then run through a
conversion step to ensure the media has the right format and aspect ratio. Mastodon uses a library called `paperclip`, which in turn uses
file(1) and identify(1) to determine the type of file and, based on the answer, runs convert(1) for images or ffmpeg(1) for video to munge
it into the shape it wants. I suspect that this will cause a fair bit of I/O in `/tmp`, so something to keep in mind is to either lazily
turn that mountpoint into a `tmpfs` (which is generally frowned upon), or to change the paperclip library to use a user-defined directory
like `~mastodon/tmp` and make _that_ a memory-backed filesystem instead. The log signature, in case you're curious:

```
Nov 20 21:02:10 ublog bundle[408189]: Command :: file -b --mime '/tmp/a22ab94adb939b0eb3c224bb9046c9cf20221123-408189-s0rsty.jpg'
Nov 20 21:02:10 ublog bundle[408189]: Command :: identify -format %m '/tmp/6205b887c6c337b1a72ae2a7ccb359c920221123-408189-e9jul1.jpg[0]'
Nov 20 21:02:10 ublog bundle[408189]: Command :: convert '/tmp/6205b887c6c337b1a72ae2a7ccb359c920221123-408189-e9jul1.jpg[0]' -auto-orient -resize "400x400>" -coalesce '/tmp/8ce2976b99d4b5e861e6c988459ee20c20221123-408189-1p5gg4'
Nov 20 21:02:10 ublog bundle[408189]: Command :: convert '/tmp/8ce2976b99d4b5e861e6c988459ee20c20221123-408189-1p5gg4' -depth 8 RGB:-
```

I will put a pin in this until it becomes a bottleneck, but admins of larger servers may have thought about this before, and if so, let me
know what you came up with!

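To notice when it does become a bottleneck, a first step could be to count how often paperclip shells out on files in `/tmp`. A minimal
Python sketch that tallies commands from log lines in the format shown above; the regex is my own and tailored to that signature, and the
sample lines are abbreviated versions of the ones above.

```python
import re
from collections import Counter

# Matches paperclip's command log lines, e.g.
#   "... bundle[408189]: Command :: convert '/tmp/...jpg[0]' -auto-orient ..."
# and captures the command name when it operates on a file in /tmp.
COMMAND_RE = re.compile(r"Command :: (\S+) .*'/tmp/")


def count_tmp_commands(lines) -> Counter:
    """Count logged paperclip command invocations that touch /tmp, per command."""
    counts = Counter()
    for line in lines:
        match = COMMAND_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


SAMPLE = [
    "Nov 20 21:02:10 ublog bundle[408189]: Command :: file -b --mime '/tmp/a22a.jpg'",
    "Nov 20 21:02:10 ublog bundle[408189]: Command :: identify -format %m '/tmp/6205.jpg[0]'",
    "Nov 20 21:02:10 ublog bundle[408189]: Command :: convert '/tmp/6205.jpg[0]' -auto-orient '/tmp/8ce2'",
    "Nov 20 21:02:10 ublog bundle[408189]: Command :: convert '/tmp/8ce2' -depth 8 RGB:-",
]

if __name__ == "__main__":
    print(count_tmp_commands(SAMPLE))
```

On the server, you could feed it the output of something like `journalctl -u mastodon-sidekiq -o cat --since today` (for example through
`sys.stdin`, and using whichever unit actually emits these lines) and compare the counts per hour.
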
#### Elasticsearch

There's a little bit of a timebomb here, unfortunately. Following the
[[Full-text search](https://docs.joinmastodon.org/admin/optional/elasticsearch/)] docs, the install and integration are super easy. But while
the current version of Elasticsearch still tolerates non-secured instances, an upcoming release is going to _force_ authentication by
default, and those instances will break. So I'm going to get ahead of that and create my instance with the minimally required security
setup in mind [[ref](https://www.elastic.co/guide/en/elasticsearch/reference/7.17/security-minimal-setup.html)]:

```
pim@ublog:~$ cat << EOF | sudo tee -a /etc/elasticsearch/elasticsearch.yml
xpack.security.enabled: true
discovery.type: single-node
EOF
pim@ublog:~$ sudo systemctl restart elasticsearch
pim@ublog:~$ PASS=$(openssl rand -base64 12)
pim@ublog:~$ sudo /usr/share/elasticsearch/bin/elasticsearch-setup-passwords interactive
(use this $PASS for the 'elastic' user)
pim@ublog:~$ cat << EOF | sudo tee -a ~mastodon/live/.env.production
ES_USER=elastic
ES_PASS=$PASS
EOF
pim@ublog:~$ sudo systemctl restart mastodon-streaming mastodon-web mastodon-sidekiq
```

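To verify that security is actually enforced, and that the credentials Mastodon now has work, you can query Elasticsearch's
`_cluster/health` API with HTTP basic auth. A minimal Python sketch, assuming Elasticsearch listens on `http://localhost:9200` and that
`.env.production` uses plain `KEY=VALUE` lines; the helper names are hypothetical.

```python
import base64
import json
import urllib.request


def read_env(path: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines from a dotenv-style file such as .env.production."""
    env = {}
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if line and not line.startswith("#") and "=" in line:
                key, _, value = line.partition("=")
                env[key.strip()] = value.strip()
    return env


def health_request(user: str, password: str,
                   base: str = "http://localhost:9200") -> urllib.request.Request:
    """Build a GET /_cluster/health request carrying HTTP basic auth."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(f"{base}/_cluster/health",
                                  headers={"Authorization": f"Basic {token}"})


def cluster_status(req: urllib.request.Request) -> str:
    """Send the request and return the cluster status ("green", "yellow" or "red")."""
    with urllib.request.urlopen(req, timeout=5) as resp:
        return json.load(resp)["status"]
```

Run as a user that can read the file, `cluster_status(health_request(env["ES_USER"], env["ES_PASS"]))` with
`env = read_env("/home/mastodon/live/.env.production")` should return `green` or `yellow`, while the same request without the
`Authorization` header should now fail with HTTP 401.
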
Elasticsearch is a memory hog, which is not that strange considering its job is to provide full-text retrieval over a large
number of documents at high performance. By default it'll grab roughly half of the machine's memory, which it
really doesn't need for now. So, I'll give it a smaller playground to expand into by limiting its heap
to 2 GB to get us started:

```
pim@ublog:~$ cat << EOF | sudo tee /etc/elasticsearch/jvm.options.d/memory.options
-Xms2048M
-Xmx2048M
EOF
pim@ublog:~$ sudo systemctl restart elasticsearch
```

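The same two lines can be generated with a guard rail. A small Python sketch following Elastic's heap sizing advice (set `-Xms` and `-Xmx`
to the same value, and keep the heap at or below half of RAM so the rest is left to the filesystem cache); the function is hypothetical and
the 16 GB figure is this VM's memory.

```python
def heap_options(heap_mb: int, ram_mb: int) -> str:
    """Render jvm.options lines that pin the Elasticsearch heap to heap_mb.

    Refuses heaps above half of RAM, leaving the remainder to the
    filesystem cache that Lucene relies on.
    """
    if heap_mb <= 0:
        raise ValueError("heap must be positive")
    if heap_mb > ram_mb // 2:
        raise ValueError(f"{heap_mb}M heap exceeds half of {ram_mb}M RAM")
    return f"-Xms{heap_mb}M\n-Xmx{heap_mb}M\n"


if __name__ == "__main__":
    # 2 GB heap on a 16 GB VM
    print(heap_options(2048, 16 * 1024), end="")
```
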
#### Mail

E-mail can be quite tricky to get right. At IPng we've been running mailservers for a while now, and we're reasonably good at delivering
mail even to the most hard-line providers (looking at you, GMX and Google). We use relays from a previous project of mine called
[[PaPHosting](https://paphosting.net)], which you can clearly see comes from the Dark Ages when the Internet was still easy. These days, our
mailservers run a combination of MTA-STS, TLS certs from Let's Encrypt, DMARC, and SPF. Our outbound mail simply uses OpenBSD's
smtpd(8), which forwards to the remote relay pool of five servers using authentication, but only after rewriting the envelope to always
come from `@ublog.tech` and match the e-mail sender (which allows for strict SPF):

```
pim@ublog:~$ cat /etc/smtpd.conf
table aliases file:/etc/aliases
table secrets file:/etc/mail/secrets

listen on localhost

action "local_mail" mbox alias <aliases>
action "outbound" relay host "smtp+tls://papmx@smtp.paphosting.net" auth <secrets> \
    mail-from "@ublog.tech"

match from local for local action "local_mail"
match from local for any action "outbound"
```

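To test the relay path independently of Mastodon, you can hand a message to the local smtpd on `localhost:25`; it matches the
`from local for any` rule and gets relayed. A minimal Python sketch using the standard library; the sender and recipient addresses are
placeholders I picked, not the instance's real configuration.

```python
import smtplib
from email.message import EmailMessage

# Placeholder sender, not the instance's real configuration.
SENDER = "notifications@ublog.tech"


def build_test_message(recipient: str) -> EmailMessage:
    """Build a small plain-text message from an @ublog.tech sender."""
    msg = EmailMessage()
    msg["From"] = SENDER
    msg["To"] = recipient
    msg["Subject"] = "uBlog relay test"
    msg.set_content("If this arrived, the local smtpd relayed it upstream.")
    return msg


def send_via_local_smtpd(msg: EmailMessage, host: str = "localhost", port: int = 25) -> None:
    """Hand the message to smtpd, which only listens on localhost.

    No authentication is needed here: the "outbound" action authenticates
    to the upstream relay itself, using the <secrets> table.
    """
    with smtplib.SMTP(host, port, timeout=10) as smtp:
        smtp.send_message(msg)
```

Calling `send_via_local_smtpd(build_test_message("you@example.org"))` on the VM and then checking the received headers for an SPF pass is a
quick end-to-end check.
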
Inbound mail to the `@ublog.tech` domain is also handled by the paphosting servers, which forward it all to our respective inboxes.

#### Server Settings

After reading a post from [[@rriemann@chaos.social](https://ublog.tech/@rriemann@chaos.social/109384055799108617)], I was quickly convinced
that having a good privacy policy is worth the time. I took their excellent advice to create a reasonable
[[Privacy Policy](https://ublog.tech/privacy-policy)]. Thanks again for that, and if you're running a server in Europe or with European
users, definitely check it out.

Rules are important. I didn't give them as much thought, but I did set some ground rules. Even though I do believe in
[[Postel's Robustness Principle](https://en.wikipedia.org/wiki/Robustness_principle)] (_Be liberal in what you accept, and conservative in
what you send._), I generally tend to believe that computers lose their temper less often than humans, so I started off with:

1. **Behavioral Tenets**: Use welcoming and inclusive language, be respectful of differing viewpoints and experiences, gracefully accept
constructive criticism, focus on what is best for the community, and show empathy towards other community members. Be kind to each other,
and yourself.
1. **Unacceptable behavior**: The use of sexualized language or imagery, unsolicited romantic attention, trolling, derogatory
comments, personal or political attacks, and doxxing are strictly prohibited, as is any other conduct considered inappropriate for a
professional setting.

{{< image width="70px" float="left" src="/assets/mastodon/msie.png" alt="Favicon" >}}
I also read an entertaining (likely insider-joke) post from [[@nova@hachyderm.io](https://ublog.tech/@nova@hachyderm.io/109389072740558566)],
in which she asked about the Internet Explorer favicon on her instance, so I couldn't resist replacing the Mastodon favicon with the
IPng Networks one. Vanity matters.

## What's next

Now that the server is up, and I have a small number of users (mostly folks I know from the tech industry), I took some time to explore
the Fediverse, reach out to friends old and new, participate in a few random discussions, and fiddle with the iOS apps (in the end, I
settled on Toot!, with Metatext as runner-up). I've generally had an *amazing* time on Mastodon these last few days.

Now, I think I'm ready to further productionize the experience. My next article will cover monitoring, a vital aspect of any serious
project. I'll go over Prometheus, Grafana, Alertmanager and how to get the most signal out of a running Mastodon instance. Stay tuned!

If you're looking for a home, feel free to sign up at [https://ublog.tech/](https://ublog.tech/), as I'm sure that having a bit more load
and traffic on this instance will allow me to learn (and in turn, to share with others)!