---
date: "2022-11-24T01:20:14Z"
title: Mastodon - Part 2 - Monitoring
aliases:
- /s/articles/2022/11/24/mastodon-2.html
---
# About this series
{{< image width="200px" float="right" src="/assets/mastodon/mastodon-logo.svg" alt="Mastodon" >}}
I have seen companies achieve great success in the consumer internet and entertainment space, but I've been feeling less
enthusiastic about the stronghold that these corporations have over my digital presence. I am the first to admit that using "free" services
is convenient, but these companies sometimes take away my autonomy and exert control over society. To each their own, of course, but
for me it's time to take back a little bit of responsibility for my online social presence, away from centrally hosted services and towards
privately operated ones.

In the [[previous post]({{< ref "2022-11-20-mastodon-1" >}})], I shared some thoughts on how the overall install of a Mastodon instance
went, making it a point to ensure my users' (and my own!) data is somehow safe, and that the machine runs on good hardware with good
connectivity. Thanks IPng, for that 10G connection! In this post, I visit an old friend,
[[Borgmon](https://sre.google/sre-book/practical-alerting/)], which has since been reincarnated as the _de facto_ open source
observability and signals ecosystem, together with its incomparably awesome friend. Hello, **Prometheus** and **Grafana**!

## Anatomy of Mastodon
Looking more closely at the architecture of Mastodon, I see that it consists of a few moving parts:
1. **Storage**: At the bottom, there's persistent storage, in my case **ZFS**, on which account information (like avatars), media
attachments, and site-specific media live. As posts stream to my instance, their media is spooled locally for performance.
1. **State**: Application state is kept in two databases:
* Firstly, a **SQL database** which is chosen to be [[PostgreSQL](https://postgresql.org)].
* Secondly, a memory based **key-value storage** system [[Redis](https://redis.io)] is used to track the vitals of home feeds,
list feeds, Sidekiq queues as well as Mastodon's streaming API.
1. **Web (transactional)**: The webserver that serves end user requests and the API is [[Puma](https://github.com/puma/puma)], a Ruby HTTP
application server. Puma tries to do its job efficiently, and doesn't allow itself to be bogged down by long lived web
sessions, such as the ones where clients get streaming updates to their timelines on the web or mobile client.
1. **Web (streaming)**: This webserver is written in [[NodeJS](https://nodejs.org/en/about/)] and excels at long lived connections
that use WebSockets, providing a Streaming API to clients.
1. **Web (frontend)**: To tie all the current and future microservices together, provide SSL (for HTTPS), and offer a local object cache for
things that don't change often, one or more [[NGINX](https://nginx.org)] servers are used (see the configuration sketch after this list).
1. **Backend (processing)**: Many interactions with the server (such as distributing posts) turn into background tasks that are enqueued
and handled asynchronously by a worker pool provided by [[Sidekiq](https://github.com/mperham/sidekiq)].
1. **Backend (search)**: Users who wish to search the local corpus of posts and media can do so via an instance of
[[Elastic](https://www.elastic.co/)], a free and open search and analytics solution.
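
To make the frontend glue a bit more concrete, here's roughly (and heavily trimmed) how the nginx configuration that ships with Mastodon
wires the two web tiers together. Ports 3000 (Puma) and 4000 (streaming) are the defaults for a non-Docker install; your own
`.env.production` and nginx config may of course differ.

```
# Trimmed sketch based on Mastodon's shipped nginx config; certificates, caching
# and the WebSocket proxy headers are omitted for brevity.
upstream backend   { server 127.0.0.1:3000 fail_timeout=0; }
upstream streaming { server 127.0.0.1:4000 fail_timeout=0; }

server {
  listen 443 ssl http2;
  server_name ublog.tech;
  root /home/mastodon/live/public;

  # Serve static assets directly, hand everything else to Puma
  location / { try_files $uri @proxy; }
  location @proxy { proxy_pass http://backend; }

  # Long lived WebSocket connections go to the NodeJS streaming server
  location /api/v1/streaming { proxy_pass http://streaming; }
}
```
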
These systems all interact in particular ways, but I immediately noticed one interesting tidbit. Pretty much every system in this list can
(or can easily be made to) emit metrics in the popular [[Prometheus](https://prometheus.io/)] format. I cannot overstate the love I have for
this project, both technically and socially, because I know how it came to be. Ben, thanks for the RC racecars (I still have them!).
Matt, I admire your Go and Java skills and your general workplace awesomeness. And Richi, sorry to have missed you last week in Hamburg at
[[DENOG14](https://www.denog.de/de/meetings/denog14/agenda.html)]!
## Prometheus
Taking stock of the architecture here, I think my best bet is to rig this stuff up with Prometheus. This works mostly by having a central
server (in my case external to [[uBlog.tech](https://ublog.tech)]) scrape a bunch of timeseries metrics periodically, after which I can create
pretty graphs of them, but also monitor if some values seem out of whack, like a Sidekiq queue delay rising, or CPU or disk I/O running a bit
hot. And the best thing yet? I will get pretty much all of this for free, because other, smarter folks have already contributed to this
ecosystem:
* **Server**: monitoring is canonically done by [[Node Exporter](https://prometheus.io/docs/guides/node-exporter/)]. It provides metrics for
all the low-level machine and kernel stats you'd ever think to want: network, disk, CPU, processes, load, and so on.
* **Redis**: is provided by [[Redis Exporter](https://github.com/oliver006/redis_exporter)] and can show all sorts of operations on data
realms served by Redis.
* **PostgreSQL**: is provided by [[Postgres Exporter](https://github.com/prometheus-community/postgres_exporter)] which is
maintained by the Prometheus Community.
* **NGINX**: is provided by [[NGINX Exporter](https://github.com/nginxinc/nginx-prometheus-exporter)] which is maintained by the company
behind NGINX. I used to have a Lua-based exporter (when I ran [[SixXS](https://sixxs.net/)]) which had lots of interesting additional
stats, but for the time being I'll just use this one.
* **Elastic**: has a converter from its own metrics system in the [[Elasticsearch
Exporter](https://github.com/prometheus-community/elasticsearch_exporter)], once again maintained by the (impressively fabulous!)
Prometheus Community.

All of these implement a common pattern: they take the (bespoke, internal) representation of statistics counters or dials/gauges and
transform them into a common format called the _metrics exposition_ format. They expose this either on an HTTP endpoint (typically
a `/metrics` URI handler on the exporter's built-in webserver), or via a push mechanism using the popular
[[Pushgateway](https://prometheus.io/docs/instrumenting/pushing/)] in case there is no server to poll, for example a batch process that did
some work and wants to report on its results.
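
To give a flavour of that format, here's a fragment of the kind of output an exporter serves on its `/metrics` endpoint. The metric names
are real Node Exporter ones, but the values and help strings below are illustrative:

```
# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 0.22
# HELP node_cpu_seconds_total Seconds the CPUs spent in each mode.
# TYPE node_cpu_seconds_total counter
node_cpu_seconds_total{cpu="0",mode="idle"} 1.952418e+06
node_cpu_seconds_total{cpu="0",mode="user"} 13451.07
```
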
Incidentally, a fair amount of popular open source infrastructure already has a Prometheus exporter -- check out [[this
list](https://prometheus.io/docs/instrumenting/exporters/)], but also the assigned [[TCP
ports](https://github.com/prometheus/prometheus/wiki/Default-port-allocations)] for popular things that you might also be using.
Maybe you'll get lucky and find out that somebody has already provided an exporter, so you don't have to!
### Configuring Exporters
Now that I have found a whole swarm of these Prometheus exporter microservices, and understand how to plumb each of them through to
whatever it is they are monitoring, I can get cracking on some observability. Let me provide some notes for posterity, both for myself
if I ever revisit the topic and have kind of forgotten what I had done so far :), but maybe also for the adventurous, who are interested in
using Prometheus on their own Mastodon instance.

First of all, it's worth mentioning that while these exporters (typically written in Go) have command line flags, they can often also take
their configuration from environment variables, mostly because they are designed to run in Docker or Kubernetes. My exporters will all run
_vanilla_ under **systemd**, but systemd units can also be configured to read environment files, which is neat!

To start, I create a few environment files, one for each **systemd** unit that contains a Prometheus exporter:
```
pim@ublog:~$ ls -la /etc/default/*exporter
-rw-r----- 1 root root 49 Nov 23 18:15 /etc/default/elasticsearch-exporter
-rw-r----- 1 root root 76 Nov 22 17:13 /etc/default/nginx-exporter
-rw-r----- 1 root root 170 Nov 22 22:41 /etc/default/postgres-exporter
-rw-r----- 1 root root 9789 May 27 2021 /etc/default/prometheus-node-exporter
-rw-r----- 1 root root 0 Nov 22 22:56 /etc/default/redis-exporter
-rw-r----- 1 root root 67 Nov 22 23:56 /etc/default/statsd-exporter
```
The contents of these files will give away passwords, like the one for ElasticSearch or Postgres, so I specifically make them readable only
by `root:root`. I won't share my passwords with you, dear reader, so you'll have to guess the contents here!
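
As a hint, though, here's the shape such a file might take. This is a sketch for the Postgres exporter, where `DATA_SOURCE_NAME` is the
connection string that postgres_exporter reads from its environment; the user, database and password here are placeholders that would
match `DB_USER`, `DB_NAME` and `DB_PASS` in `.env.production`:

```
# /etc/default/postgres-exporter -- a sketch; credentials are placeholders
DATA_SOURCE_NAME=postgresql://mastodon:SOMETHING_SECRET@localhost:5432/mastodon_production?sslmode=disable
```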

Priming the environment with these values, I will take the **systemd** unit for the ElasticSearch exporter as an example:
```
pim@ublog:~$ cat << EOF | sudo tee /lib/systemd/system/elasticsearch-exporter.service
[Unit]
Description=Elasticsearch Prometheus Exporter
After=network.target
[Service]
EnvironmentFile=/etc/default/elasticsearch-exporter
ExecStart=/usr/local/bin/elasticsearch_exporter
User=elasticsearch
Group=elasticsearch
Restart=always
[Install]
WantedBy=multi-user.target
EOF
pim@ublog:~$ cat << EOF | sudo tee /etc/default/elasticsearch-exporter
ES_USERNAME=elastic
ES_PASSWORD=$(SOMETHING_SECRET) # same as ES_PASS in .env.production
EOF
pim@ublog:~$ sudo systemctl enable elasticsearch-exporter
pim@ublog:~$ sudo systemctl start elasticsearch-exporter
```
Et voil&agrave;, just like that the service starts, connects to ElasticSearch, transforms all of its innards into beautiful Prometheus metrics, and
exposes them on its "registered" port, in this case 9114, which can be scraped by the Prometheus instance a few computers away, connected to
the uBlog VM via a backend RFC1918 LAN. I just _knew_ that second NIC would come in useful!
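
On the Prometheus side, the scrape configuration for all of this is pleasantly boring. A sketch of what it might look like -- the RFC1918
address of the uBlog VM's backend NIC is made up here, and the ports are simply each exporter's registered default:

```
global:
  scrape_interval: 10s      # the cadence I use for these exporters
scrape_configs:
  - job_name: 'node'
    static_configs: [{ targets: ['192.168.6.2:9100'] }]
  - job_name: 'nginx'
    static_configs: [{ targets: ['192.168.6.2:9113'] }]
  - job_name: 'elasticsearch'
    static_configs: [{ targets: ['192.168.6.2:9114'] }]
  - job_name: 'redis'
    static_configs: [{ targets: ['192.168.6.2:9121'] }]
  - job_name: 'postgres'
    static_configs: [{ targets: ['192.168.6.2:9187'] }]
```
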
{{< image width="400px" float="right" src="/assets/mastodon/prom-metrics.png" alt="ElasticSearch Metrics" >}}
All **five** of the exporters are configured and exposed. They are now providing a wealth of realtime information
on how the various Mastodon components are doing. And if any of them start malfunctioning, running out of steam, or simply taking the day
off, I will be able to see this either by certain metrics going out of their expected ranges, or by the exporter reporting that it cannot even
find the service at all (which we can also detect and turn into alarms, more on that later).
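
As a teaser for that: Prometheus exposes an `up` metric for every scrape target, and most of these exporters additionally publish a service
health gauge of their own (for example `redis_up` or `pg_up`). A minimal alerting rule of the kind I have in mind might look like this
sketch:

```
groups:
  - name: ublog-exporters
    rules:
      - alert: ExporterDown
        expr: up == 0
        for: 5m
        annotations:
          summary: '{{ $labels.job }} on {{ $labels.instance }} has been unreachable for 5 minutes'
```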

Pictured here (you should probably open it in full resolution, unless you have hawk eyes) is an example of those metrics. Prometheus is
happy to handle several million of them at a relatively high scrape frequency: in my case, every 10 seconds it comes around and pulls
the data from these five exporters. While these metrics are human readable, they aren't very practical...
## Grafana
... so let's visualize them with an equally awesome tool: [[Grafana](https://grafana.com/)]. This tool provides operational dashboards for any
data that is stored here, there, or anywhere :) Grafana can render stuff from a plethora of backends; one popular and established one is
Prometheus. And as it turns out, as with Prometheus, lots of work has been done already on canonical, almost out-of-the-box dashboards
contributed by folks in the field. In fact, every single one of the five exporters I installed also has an accompanying
dashboard, sometimes even multiple to choose from! Grafana allows you to [[search and download](https://grafana.com/grafana/dashboards/)]
these from a corpus they provide, referring to them by their `id`, or alternatively by importing a JSON representation of the dashboard, for
example one that comes with the exporter, or one you find on GitHub.
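
One prerequisite, of course, is that Grafana knows where to find Prometheus. That can be clicked together in the UI, or provisioned from a
file; a minimal sketch of the latter, with a placeholder URL:

```
# /etc/grafana/provisioning/datasources/prometheus.yaml -- sketch; the URL is a placeholder
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus.example.net:9090
    isDefault: true
```
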
For uBlog, I installed: [[Node Exporter](https://grafana.com/grafana/dashboards/1860-node-exporter-full/)], [[Postgres
Exporter](https://grafana.com/grafana/dashboards/9628-postgresql-database/)], [[Redis
Exporter](https://grafana.com/grafana/dashboards/11692-redis-dashboard-for-prometheus-redis-exporter-1-x/)], [[NGINX
Exporter](https://github.com/nginxinc/nginx-prometheus-exporter/blob/main/grafana/README.md)], and [[ElasticSearch
Exporter](https://grafana.com/grafana/dashboards/14191-elasticsearch-overview/)].
{{< image width="400px" float="right" src="/assets/mastodon/grafana-psql.png" alt="Grafana Postgres" >}}
To the right (top) you'll see a dashboard for PostgreSQL - it has lots of expert insights on how the databases are used, how many read and
write operations (like SELECT and UPDATE/DELETE queries) are performed, and what their respective latencies look like. What I find particularly
useful is the overview of total memory, CPU and disk activity. This allows me to see at a glance when it's time to break out
[[pgTune](https://github.com/gregs1104/pgtune)] to help tune system settings for Postgres, or even when it's time to move
the database to its own server rather than co-habiting with the other stuff running on this virtual machine. In my experience, stateful
systems are often the source of bottlenecks, so I take special care to monitor them and observe their performance over time. In particular,
Mastodon will feel slow if the database is slow (sound familiar?).
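
For those who prefer raw PromQL over eyeballing a dashboard, a couple of illustrative expressions against postgres_exporter's metrics (the
database name is a placeholder for whatever `DB_NAME` is set to):

```
# Committed transactions per second on the Mastodon database
rate(pg_stat_database_xact_commit{datname="mastodon_production"}[5m])

# Buffer cache hit ratio -- if this drops, it may be time for pgTune or more RAM
sum(rate(pg_stat_database_blks_hit[5m]))
  / (sum(rate(pg_stat_database_blks_hit[5m])) + sum(rate(pg_stat_database_blks_read[5m])))
```
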
{{< image width="400px" float="right" src="/assets/mastodon/grafana-redis.png" alt="Grafana Redis" >}}
Next, to the right (middle) you'll see a dashboard for Redis. This one shows me how full the Redis cache is (the yellow line in the first
graph is when I restarted Redis to give it a `maxmemory` setting of 1GB), but also a high resolution overview of how many
operations it's doing. I can see that the load is spiky, and upon closer inspection this is the `pfcount` command firing with a period of
exactly 300 seconds; in other words, something is spiking every five minutes. I have a feeling that this might become an issue... and when
it does, I'll get to learn all about this elusive [[pfcount](https://redis.io/commands/pfcount/)] command. But until then, I can see the
average time by command: because Redis serves from RAM and this is a pretty quick server, I see the turnaround time for most queries in the
200-500 &micro;s range, wow!
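
For the curious, the kind of queries behind these panels can be sketched with the metric names that oliver006's redis_exporter publishes:

```
# Per-command call rate -- the 300 second pfcount pattern stands out immediately
topk(5, rate(redis_commands_total[1m]))

# Average pfcount latency, in seconds
rate(redis_commands_duration_seconds_total{cmd="pfcount"}[5m])
  / rate(redis_commands_total{cmd="pfcount"}[5m])
```
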
{{< image width="400px" float="right" src="/assets/mastodon/grafana-node.png" alt="Grafana Node" >}}
But while these dashboards are awesome, what has saved me (and my ISP, IPng Networks) a metric tonne of time is the most fundamental
monitoring of all: the Node Exporter dashboard, pictured to the right (bottom). What I really love about this dashboard is that it shows at a
glance which parts of the _computer_ are going to become a problem. If RAM is full (but not because of filesystem cache), or the CPU is
running hot, or the network is flatlining at a certain throughput or packets/sec limit, these are all things that the applications running
_on_ the machine won't necessarily be able to tell me much about -- but the _Node Exporter_ comes to the rescue: it exposes so many interesting
pieces of kernel and host operating system telemetry that it is one of the most useful tools I know. Every physical host and every
virtual machine is exporting metrics into IPng Networks' Prometheus instance, and it constantly shows me what to improve. Thanks, Obama!
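
Two Node Exporter expressions that capture exactly those "is the computer itself the problem" questions, as a sketch:

```
# CPU busy percentage, per machine
100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)

# Memory genuinely in use, i.e. not counting reclaimable filesystem cache
node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes
```
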
## What's next
Careful readers will have noticed that this whole article talks about all sorts of interesting telemetry, observability metrics, and
dashboards, but they are all _common components_: none of them touch on the internals of Mastodon's processes, like _Puma_ or _Sidekiq_
or the _API services_ that Mastodon exposes. Consider this a cliffhanger (eh, mostly because I'm a bit busy at work and will need a little
more time).

In an upcoming post, I'll take a deep dive into this application-specific behavior and how to extract its telemetry (spoiler alert: it can be
done! and I will open source it!), as I've started to learn more about how Ruby gathers and exposes its own internals. Interestingly, one of
the things I'll talk about is _NSA_ -- not the American agency, but a comical wordplay from some open source minded folks who have
blazed the path in making Ruby on Rails application performance metrics available to external observers. In a round-about way, I hope to
show how to plug these into Prometheus in the same way all the other exporters already do.

By the way: If you're looking for a home, feel free to sign up at [https://ublog.tech/](https://ublog.tech/) as I'm sure that having a bit
more load / traffic on this instance will allow me to learn (and in turn, to share with others)!