---
date: "2022-11-27T00:01:14Z"
title: Mastodon - Part 3 - statsd and Prometheus
aliases:
- /s/articles/2022/11/27/mastodon-3.html
---
# About this series
{{< image width="200px" float="right" src="/assets/mastodon/mastodon-logo.svg" alt="Mastodon" >}}
I have seen companies achieve great successes in the consumer internet and entertainment space. I've been feeling less
enthusiastic about the stronghold that these corporations have over my digital presence. I am the first to admit that using "free" services
is convenient, but these companies sometimes take away my autonomy and exert control over society. To each their own of course, but
for me it's time to take back a little bit of responsibility for my online social presence, away from centrally hosted services and towards
privately operated ones.
In my [[first post]({{< ref "2022-11-20-mastodon-1" >}})], I shared some thoughts on how I installed a Mastodon instance for myself. In a
[[followup post]({{< ref "2022-11-24-mastodon-2" >}})] I talked about its overall architecture and how one might use Prometheus to monitor
vital backends like Redis, Postgres and Elastic. But Mastodon _itself_ is also an application which can provide a wealth of telemetry using
a protocol called [[StatsD](https://github.com/statsd/statsd)].
In this post, I'll show how I tie these all together in a custom **Grafana Mastodon dashboard**!
## Mastodon Statistics
I noticed in the [[Mastodon docs](https://docs.joinmastodon.org/admin/config/#statsd)] that there's a one-liner breadcrumb which might be
easy to overlook, as it doesn't give many details:
> `STATSD_ADDR`: If set, Mastodon will log some events and metrics into a StatsD instance identified by its hostname and port.
Interesting, but what is this **statsd**, precisely? It's a simple text-only protocol that allows applications to send key-value pairs in
the form of `<metricname>:<value>|<type>` strings, which carry statistics of a certain _type_ across the network, using either TCP or UDP.
Cool! To make use of these stats, I first add this `STATSD_ADDR` environment variable from the docs to my `.env.production` file, pointing
it at `localhost:9125`. This should make Mastodon apps emit some statistics of sorts.
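In practice that means adding one line to the environment file and restarting the Mastodon services so that Puma and Sidekiq pick it up. A
minimal sketch, assuming the stock `/home/mastodon/live` install location and the standard systemd unit names:
```
pim@ublog:~$ echo "STATSD_ADDR=localhost:9125" | sudo tee -a /home/mastodon/live/.env.production
pim@ublog:~$ sudo systemctl restart mastodon-web mastodon-sidekiq mastodon-streaming
```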
I decide to take a look at those packets, instructing tcpdump to show the contents of the packets (using the `-A` flag). Considering my
destination is `localhost`, I also know which interface to tcpdump on (using the `-i lo` flag). My first attempt is a little bit noisy,
because the packet dump contains the [[IPv4 header](https://en.wikipedia.org/wiki/IPv4)] (20 bytes) and [[UDP
header](https://en.wikipedia.org/wiki/User_Datagram_Protocol)] (8 bytes) as well, but sure enough, if I start reading from the 28th byte
onwards, I get human-readable data, in a bunch of strings that start with `Mastodon`:
```
pim@ublog:~$ sudo tcpdump -Ani lo port 9125 | grep 'Mastodon.production.' | sed -e 's,.*Mas,Mas,'
Mastodon.production.sidekiq.ActivityPub..ProcessingWorker.processing_time:16272|ms
Mastodon.production.sidekiq.ActivityPub..ProcessingWorker.success:1|c
Mastodon.production.sidekiq.scheduled_size:25|g
Mastodon.production.db.tables.accounts.queries.select.duration:1.8323479999999999|ms
Mastodon.production.web.ActivityPub.InboxesController.create.json.total_duration:33.856679|ms
Mastodon.production.web.ActivityPub.InboxesController.create.json.db_time:2.3943890118971467|ms
Mastodon.production.web.ActivityPub.InboxesController.create.json.view_time:1|ms
Mastodon.production.web.ActivityPub.InboxesController.create.json.status.202:1|c
...
```
**statsd** organizes its variable names in a dot-delimited tree hierarchy. I can clearly see some patterns in here, but why guess
when you're working with Open Source? Mastodon turns out to be using a popular Ruby library called the [[National Statsd
Agency](https://github.com/localshred/nsa)], a wordplay that I don't necessarily find all that funny. Naming aside though, this library
collects application level statistics in four main categories:
1. **:action_controller**: listens to the ActionController class that is extended into ApplicationControllers in Mastodon
1. **:active_record**: listens to any database (SQL) queries and emits timing information for them
1. **:active_support_cache**: records information regarding caching (Redis) queries, and emits timing information for them
1. **:sidekiq**: listens to Sidekiq middleware and emits information about queues, workers and their jobs
Using the library's [[docs](https://github.com/localshred/nsa)], I can clearly see the patterns described. For example, in the SQL
recorder, the format is `{ns}.{prefix}.tables.{table_name}.queries.{operation}.duration`, where operation means one of the
classic SQL query types: **SELECT**, **INSERT**, **UPDATE**, and **DELETE**. Similarly, in the cache recorder, the format is
`{ns}.{prefix}.{operation}.duration`, where operation denotes one of **read_hit**, **read_miss**, **generate**, **delete**, and so on.
Reading a bit more of the Mastodon and **statsd** library code, I learn that for all variables emitted, the namespace `{ns}` is always a
combination of the application name and Ruby on Rails environment, i.e. **Mastodon.production**, and the `{prefix}` is the collector name,
one of **web**, **db**, **cache** or **sidekiq**. If you're curious, the Mastodon code that initializes the **statsd** collectors lives
in `config/initializers/statsd.rb`. Alright, I conclude that this is all I need to know about the naming schema.
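For the curious, wiring up those four collectors with the **nsa** library looks roughly like the snippet below. This is a paraphrased
sketch following the library's README, not a verbatim copy of Mastodon's `config/initializers/statsd.rb`, so treat the details as
approximate:
```
# Rough sketch of an nsa-based statsd initializer (paraphrased, not verbatim).
if ENV['STATSD_ADDR'].present?
  host, port = ENV['STATSD_ADDR'].split(':')

  statsd = ::Statsd.new(host, port.to_i)
  statsd.namespace = ['Mastodon', Rails.env].join('.')   # yields {ns} = Mastodon.production

  ::NSA.inform_statsd(statsd) do |informant|
    informant.collect(:action_controller, :web)          # {prefix} = web
    informant.collect(:active_record, :db)               # {prefix} = db
    informant.collect(:active_support_cache, :cache)     # {prefix} = cache
    informant.collect(:sidekiq, :sidekiq)                # {prefix} = sidekiq
  end
end
```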
Moving along, **statsd** gives each variable name a [[metric type](https://github.com/statsd/statsd/blob/master/docs/metric_types.md)], which
can be counters **c**, timers **ms** and gauges **g**. In the packet dump above you can see examples of each of these. The counter type in
particular is a little bit different -- applications emit increments here. In the case of the ActivityPub.InboxesController above, the
application merely signaled to increment the counter by 1; it did not send the absolute value of the counter. This is actually pretty
smart, because now any number of workers/servers can all contribute to a global counter, each just sending increments which are aggregated
by the receiver.
As a small critique, I happened to notice that in the sidekiq datastream, some of what I think are _counters_ are actually modeled as
_gauges_ (notably the **processed** and **failed** jobs from the workers). I will have to remember that, but after observing for a few
minutes, I think I can see lots of nifty data in here.
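A quick way to see which metric names and types are actually on the wire is to let tcpdump run for a minute and tally what it captures --
a rough sketch, where the grep pattern is approximate and may need tweaking:
```
pim@ublog:~$ sudo timeout 60 tcpdump -lAni lo port 9125 2>/dev/null \
    | grep -oE 'Mastodon\.production\.[^:]+:[^|]+\|(c|g|ms)' \
    | awk -F'[:|]' '{ print $1, $NF }' | sort | uniq -c | sort -rn | head
```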
## Prometheus
At IPng Networks, we use Prometheus as our monitoring and observability tool. It's worth pointing out that **statsd** has a few options itself to
visualise data, but considering I already have lots of telemetry in Prometheus and Grafana (see my [[previous post]({{< ref "2022-11-24-mastodon-2" >}})]), I'm going to take a bit of a detour, and convert these metrics into the Prometheus _exposition format_, so that
they can be scraped on a `/metrics` endpoint just like the others. This way, I have all monitoring in one place and using one tool.
Monitoring is hard enough as it is, and having to learn multiple tools is _no bueno_ :)
### Statsd Exporter: overview
The community maintains a Prometheus [[Statsd Exporter](https://github.com/prometheus/statsd_exporter)] on GitHub. This tool, like many
others in the exporter family, will connect to a local source of telemetry, and convert these into the required format for consumption by
Prometheus. If left completely unconfigured, it will simply receive the **statsd** UDP packets on the Mastodon side, and export them
verbatim on the Prometheus side. This has a few downsides, notably that whenever new operations or controllers come into existence, I would
have to explicitly make Prometheus aware of them.
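Trying that out is straightforward -- the exporter just needs to listen where Mastodon is sending. A minimal sketch, assuming the
`statsd_exporter` binary from the project's GitHub releases (Debian's `prometheus-statsd-exporter` package would work just as well):
```
pim@ublog:~$ ./statsd_exporter --statsd.listen-udp="localhost:9125" \
    --web.listen-address="localhost:9102"
```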
I think we can do better: because of the patterns noted above, I can condense the many metricnames from **statsd** into a few
carefully chosen Prometheus metrics, and move their variability into _labels_ on those time series. Taking SQL queries as an example, I see
that there's a metricname for each known SQL table in Mastodon (and there are many) and then for each table, a unique metric is created for
each of the four operations:
```
Mastodon.production.db.tables.{table_name}.queries.select.duration
Mastodon.production.db.tables.{table_name}.queries.insert.duration
Mastodon.production.db.tables.{table_name}.queries.update.duration
Mastodon.production.db.tables.{table_name}.queries.delete.duration
```
What if I could rewrite these by capturing the table name into a `{table_name}` label and, observing that there are four query types
(SELECT, INSERT, UPDATE, DELETE), capturing those into an `{operation}` label, like so:
```
mastodon_db_operation_sum{operation="select",table="users"} 85.910
mastodon_db_operation_sum{operation="insert",table="accounts"} 112.70
mastodon_db_operation_sum{operation="update",table="web_push_subscriptions"} 6.55
mastodon_db_operation_sum{operation="delete",table="web_settings"} 9.668
mastodon_db_operation_count{operation="select",table="users"} 28790
mastodon_db_operation_count{operation="insert",table="accounts"} 610
mastodon_db_operation_count{operation="update",table="web_push_subscriptions"} 380
mastodon_db_operation_count{operation="delete",table="web_settings"} 4
```
This way, there are only two Prometheus metric names, **mastodon_db_operation_sum** and **mastodon_db_operation_count**. The first one
accumulates the cumulative time spent performing operations of that type on the table, and the second one counts the total number of queries
of that type on the table. If I take the **rate()** of the count variable, I will have queries-per-second, and if I divide the **rate()** of the
time spent by the **rate()** of the count, I will have a running average time spent per query over that time interval.
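In PromQL, that reasoning looks something like the following (the label values are just examples):
```
# Queries per second against the accounts table, averaged over one minute:
sum(rate(mastodon_db_operation_count{table="accounts"}[1m]))

# Running average time spent per query on that table, over the same window:
sum(rate(mastodon_db_operation_sum{table="accounts"}[1m]))
  /
sum(rate(mastodon_db_operation_count{table="accounts"}[1m]))
```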
### Statsd Exporter: configuration
The Prometheus folks also thought of this, _quelle surprise_, and the exporter provides incredibly powerful transformation functionality
between the hierarchical tree-form of **statsd** and the multi-dimensional labeling format of Prometheus. This is called the [[Mapping
Configuration](https://github.com/prometheus/statsd_exporter#metric-mapping-and-configuration)], and it allows either globbing or regular
expression matching of the input metricnames, turning them into labeled output metrics. Building further on our example for SQL queries, I
can create a mapping like so:
```
pim@ublog:~$ cat << 'EOF' | sudo tee /etc/prometheus/statsd-mapping.yaml
mappings:
  - match: Mastodon\.production\.db\.tables\.(.+)\.queries\.(.+)\.duration
    match_type: regex
    name: "mastodon_db_operation"
    labels:
      table: "$1"
      operation: "$2"
EOF
```
This snippet uses a regular expression to match input metricnames, carefully escaping the dot-delimiters. Within the input, I match
two groups: the segment following `tables.` holds the SQL table name, and the segment following `queries.` captures the SQL operation.
Once a metricname matches, the exporter names the resulting Prometheus variable simply `mastodon_db_operation` and adds two labels
with the results of the regexp capture groups.
This one mapping I showed above will take care of _all of the metrics_ from the **database** collector, but there are three other collectors in
Mastodon's Ruby world. In the interest of brevity, I'll not bore you with them in this article, as this is mostly a rinse-and-repeat jobbie.
But I have attached a copy of the complete mapping configuration at the end of this article. With all of that hard work on mapping
completed, I can now start the **statsd** exporter and see its beautifully formed and labeled timeseries show up on port 9102, the default
[[assigned port](https://github.com/prometheus/docs/blob/main/content/docs/instrumenting/exporters.md)] for this exporter type.
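Putting it together, starting the exporter with the mapping file and spot-checking its output looks roughly like this (flag names as per the
exporter's README; in production I'd wrap this in a systemd unit):
```
pim@ublog:~$ statsd_exporter --statsd.listen-udp="localhost:9125" \
    --statsd.mapping-config="/etc/prometheus/statsd-mapping.yaml" \
    --web.listen-address="localhost:9102" &
pim@ublog:~$ curl -s localhost:9102/metrics | grep mastodon_db_operation | head
```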
## Grafana
First let me start by saying I'm incredibly grateful to all the folks who have contributed to existing exporters and Grafana dashboards,
notably for [[Node Exporter](https://grafana.com/grafana/dashboards/1860-node-exporter-full/)],
[[Postgres Exporter](https://grafana.com/grafana/dashboards/9628-postgresql-database/)],
[[Redis Exporter](https://grafana.com/grafana/dashboards/11692-redis-dashboard-for-prometheus-redis-exporter-1-x/)],
[[NGINX Exporter](https://github.com/nginxinc/nginx-prometheus-exporter/blob/main/grafana/README.md)], and [[ElasticSearch
Exporter](https://grafana.com/grafana/dashboards/14191-elasticsearch-overview/)]. I'm ready to make a modest contribution back to this
wonderful community of monitoring dashboards, in the form of a Grafana dashboard for Mastodon!
Writing these is pretty rewarding. I'll take some time to explain a few Grafana concepts, although this is not meant to be a tutorial at
all and honestly, I'm not that good at this anyway. A good dashboard design starts from a 30'000ft overview of the most vital stats (not
necessarily graphs, but using visual cues like colors), and then gives more information in so-called _drill-down_ dashboards that allow a much
finer grained / higher resolution picture of a specific part of the monitored application.
The collectors emit telemetry for four main parts of the application (remember, the `{prefix}` is one of **web**, **db**, **cache**, or
**sidekiq**), so I will give the dashboard the same structure. Also, I will try my best not to invent new terminology: the application
developers have given their telemetry certain names, and I will stick to those. Building a dashboard this way, application developers as
well as application operators will be more likely to be talking about the same things.
#### Mastodon Overview
{{< image src="/assets/mastodon/mastodon-stats-overview.png" alt="Mastodon Stats Overview" >}}
In the **Mastodon Overview**, each of the four collectors gets one or two stats-chips to present its highest level vital signs on. For a
web application, this will largely be requests per second, latency and possibly errors served. For a SQL database, this is typically the
issued queries and their latency. For a cache, the types of operation and again the latency observed in those operations. For Sidekiq (the
background worker pool that performs certain tasks in a queue on behalf of the system or user), I decide to focus on units of work, latency
and queue sizes.
Setting up the Prometheus queries in Grafana that fetch the data I need for these is typically going to be one of two things:
1. **QPS**: Taking a **rate()** of the monotonically increasing *_count* over, say, one minute gives me the average queries-per-second.
   Considering the counters I created have labels that tell me what they are counting (for example in Puma, which API endpoint is being
   queried, and what format that request is using), I can now elegantly aggregate those application-wide, like so:
> sum by (mastodon)(rate(mastodon_controller_duration_count[1m]))
2. **Latency**: The metrics in Prometheus also aggregate a monotonically increasing *_sum*, which tells me the total time spent doing
   those things. It's pretty easy to calculate the running average latency over the last minute, by simply dividing the rate of time spent
   by the rate of requests served, like so:
> sum by (mastodon)(rate(mastodon_controller_duration_sum[1m])) <br /> / <br />
> sum by (mastodon)(rate(mastodon_controller_duration_count[1m]))
To avoid clutter, I will leave the detailed full resolution view (like which _controller_ exactly, and what _format_ was queried, and which
_action_ was taken in the API) to a drilldown below. These two patterns are continued throughout the overview panel. Each QPS value is
rendered in dark blue, while the latency gets a special treatment on colors: I define a threshold which I consider "unacceptable", and then
create a few thresholds in Grafana to change the color as I approach that unacceptable max limit. By means of example, the Puma Latency
element I described above, will have a maximum acceptable latency of 250ms. If the latency is above 40% of that, the color will turn yellow;
above 60% it'll turn orange and above 80% it'll turn red. This provides a visual cue that something may be wrong.
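Under the hood, this is just Grafana's threshold feature on a stat panel; in the dashboard JSON, such a latency chip looks roughly like the
fragment below (assuming the panel value is in milliseconds -- the exact colors and steps are a matter of taste):
```
"fieldConfig": {
  "defaults": {
    "unit": "ms",
    "max": 250,
    "thresholds": {
      "mode": "percentage",
      "steps": [
        { "color": "blue",   "value": null },
        { "color": "yellow", "value": 40 },
        { "color": "orange", "value": 60 },
        { "color": "red",    "value": 80 }
      ]
    }
  }
}
```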
#### Puma Controllers
The APIs that Mastodon offers are served by a component called [[Puma](https://github.com/puma/puma)], a simple, fast, multi-threaded, and
highly parallel HTTP 1.1 server for Ruby/Rack applications. The application running in Puma typically defines endpoints as so-called
ActionControllers, which Mastodon expands on in a derived concept called ApplicationControllers, each of which has a unique _controller_ name
(for example **ActivityPub.InboxesController**), an _action_ performed on them (for example **create**, **show** or **destroy**), and a
_format_ in which the data is handled (for example **html** or **json**). For each cardinal combination, a set of timeseries (counter, time
spent and latency quantiles) will exist. At the moment, there are about 53 API controllers, 8 actions, and 4 formats, which means there are
1'696 interesting metrics to inspect.
Drawing all of these in one graph quickly turns into an unmanageable mess, but there's a neat trick in Grafana: what if I could make these
variables selectable, and maybe pin them to exactly one value (for example, all information for a specific _controller_)? That would
greatly reduce the amount of data we have to show. To implement this, the dashboard can pre-populate a variable based on a Prometheus query.
By means of example, to find the possible values of _controller_, I might take a look at all Prometheus metrics with name
**mastodon_controller_duration_count** and search for labels within them with a regular expression, for example **/controller="([^"]+)"/**.
What this does is select all values in the group `"([^"]+)"`, which may seem a little bit cryptic at first. The logic behind it is: first,
create a group between parentheses `(...)`; then, within that group, match a set of characters `[...]`, where the set is all characters
except the double-quote `[^"]`, repeated one or more times with the `+` suffix. So this precisely selects the string between
the double-quotes in the label: `controller="whatever"` will return `whatever` with this expression.
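For reference, the variable definition in the dashboard boils down to a metric query plus that regex; Grafana's `label_values()` helper for
the Prometheus datasource is an equivalent, slightly more concise way to get the same list:
```
Name  : controller
Query : mastodon_controller_duration_count
Regex : /controller="([^"]+)"/

# equivalent alternative:
Query : label_values(mastodon_controller_duration_count, controller)
```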
{{< image width="400px" float="right" src="/assets/mastodon/mastodon-puma-details.png" alt="Mastodon Puma Controller Details" >}}
After creating three of these, one for _controller_, _action_ and _format_, three new dropdown selectors appear at the top of my dashboard.
I will allow any combination of selections, including "All" of them (the default). Then, if I wish to drill down, I can pin one or more of
these variables to narrow down the total amount of timeseries to draw.
Shown to the right are two examples, one with "All" timeseries in the graph, which at least shows which one(s) are the outliers. In this case,
the orange trace in the top graph, showing more operations than average, is the so-called **ActivityPub.InboxesController**.
I can find this out by hovering over the orange trace: the tooltip will show me the current name and value. Then, by selecting this in the top
navigation dropdown **Puma Controller**, Grafana narrows down the data for me to only what is relevant to this controller, which is super
cool.
#### Drilldown in Grafana
{{< image width="400px" float="right" src="/assets/mastodon/mastodon-puma-controller.png" alt="Mastodon Puma Controller Details" >}}
Where the graph down below (called _Action Format Controller Operations_) showed all 1'600 or so timeseries, selecting the one controller I'm
interested in shows me a much cleaner graph with only three timeseries -- take a look to the right. Just by playing around with this data, I'm
learning a lot about the architecture of this application!
For example, I know that the only _action_ on this particular controller seems to be **create**, and there are three available _formats_ in which
this create action can be performed: **all**, **html** and **json**. And using the graph above that got me started on this little journey, I now
know that the traffic spike was for `controller=ActivityPub.InboxesController, action=create, format=all`. Dope!
#### SQL Details
{{< image src="/assets/mastodon/mastodon-sql-details.png" alt="Mastodon SQL Details" >}}
While I already have a really great [[Postgres Dashboard](https://grafana.com/grafana/dashboards/9628-postgresql-database/)] (the one that
shows the Postgres _server_ view), it is also good to be able to see what the _client_ is experiencing. Here, we can drill down on two variables,
called `$sql_table` and `$sql_operation`. For each {table,operation}-tuple, the average, median and 90th/99th percentile latency are
available. So I end up with the following graphs and dials for tail latency: the top left graph shows me something interesting -- most
queries are SELECT, but the bottom graph shows me lots of tables (at the time of this article, Mastodon has 73 unique SQL tables). If I
wanted to answer the question "which table gets most SELECTs", I can drill down first by selecting the **SQL Operation** to be **select**,
after which I see decidedly fewer traces in the _SQL Table Operations_ graph. Further analysis shows that the two tables that are read
from the most are **statuses** and **accounts**. When I drill down using the selectors at the top of Grafana's dashboard UI, the tail
latency is automatically filtered to only that which is selected. If I were to see very slow queries at some point in the future, it'll be
very easy to narrow down exactly which table and which operation is the culprit.
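That same question can also be answered directly with a PromQL query, for example the five busiest tables for SELECTs over the last minute:
```
topk(5, sum by (table)(rate(mastodon_db_operation_count{operation="select"}[1m])))
```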
#### Cache Details
{{< image src="/assets/mastodon/mastodon-cache-details.png" alt="Mastodon Cache Details" >}}
For the cache statistics collector, I learn there are a few different operations. Similar to Postgres, I already have a really cool
[[Redis Dashboard](https://grafana.com/grafana/dashboards/11692-redis-dashboard-for-prometheus-redis-exporter-1-x/)], for which I can see
the Redis _server_ view. But in Mastodon, I can now also see the _client_ view, and see when any of these operations spike in either
queries/sec (left graph), latency (middle graph), or tail latency for common operations (the dials on the right). This is bound to come in
handy at some point -- I already saw one or two spikes in the **generate** operation (see the blue spike in the screenshot above), which is
something to keep an eye on.
#### Sidekiq Details
{{< image src="/assets/mastodon/mastodon-sidekiq-details.png" alt="Mastodon Sidekiq Details" >}}
The single most interesting thing in the Mastodon application is undoubtedly its _Sidekiq_ workers, the ones that do all sorts of system- and
user-triggered work such as distributing posts to federated servers, prefetching links and media, and calculating trending tags, posts and
links.
Sidekiq is a [[producer-consumer](https://en.wikipedia.org/wiki/Producer%E2%80%93consumer_problem)] system where new units of work (called
jobs) are written to a _queue_ in Redis by a producer (typically Mastodon's webserver Puma, or another Sidekiq task that needs something to
happen at some point in the future), and then consumed by one or more pools which execute the _worker_ jobs.
There are several queues defined in Mastodon, and each _worker_ has a name, a _failure_ and _success_ rate, and a running tally of how much
_processing_time_ it has spent executing this type of work. Sidekiq consumes jobs in
[[FIFO](https://en.wikipedia.org/wiki/FIFO_(computing_and_electronics))] order, and it has a finite number of workers (by default, a small
instance runs one worker with 25 threads). If you're interested in this type of provisioning, [[Nora
Tindall](https://nora.codes/post/scaling-mastodon-in-the-face-of-an-exodus/)] wrote a great article about it.
This drill-down dashboard shows all of the Sidekiq _worker_ types known to Prometheus, and can be selected at the top of the dashboard in
the dropdown called **Sidekiq Worker**. The total number of worker jobs/second, as well as the running average time spent performing those jobs,
is shown in the first two graphs. The three dials show the median, 90th percentile and 99th percentile latency of the work being performed.
If all threads are busy, new work is left in the queue, until a worker thread is available to execute the job. This will lead to a queue
delay on a busy server that is underprovisioned. For jobs that had to wait for an available thread to pick them up, the number of jobs per
queue, and the time in seconds that the jobs were waiting to be picked up by a worker, are shown in the two lists at bottom right.
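To give an idea of the kind of queries behind these panels -- the exact metric names depend on the mapping file linked below, so the names
here are merely illustrative:
```
# Jobs per second, per Sidekiq worker type (metric name illustrative):
sum by (worker)(rate(mastodon_sidekiq_worker_processing_time_count[1m]))

# Jobs currently waiting in each queue (metric name illustrative):
mastodon_sidekiq_queue_size
```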
And with that, as [[North of the Border](https://www.youtube.com/@NorthoftheBorder)] would say: _"We're on to the glamour shots!"_.
## What's next
I promised two references that are needed to successfully hook up Prometheus and Grafana to the `STATSD_ADDR` configuration of
Mastodon's Rails environment, and here they are:
* The Statsd Exporter mapping configuration file: [[/etc/prometheus/statsd-mapping.yaml](/assets/mastodon/statsd-mapping.yaml)]
* The Grafana Dashboard: [[grafana.com/dashboards/](https://grafana.com/grafana/dashboards/17492-mastodon-stats/)]
**As a call to action:** if you are running a larger instance and would allow me to take a look and learn from you, I'd be very grateful.
I'm going to monitor my own instance for a little while, so that I can start to get a feeling for where the edges of performance cliffs are,
in other words: How slow is _too slow_? How much load is _too much_? In an upcoming post, I will take a closer look at alerting in Prometheus,
so that I can catch these performance cliffs and make human operators aware of them by means of alerts, delivered via Telegram or Slack.
By the way: If you're looking for a home, feel free to sign up at [https://ublog.tech/](https://ublog.tech/) as I'm sure that having a bit
more load / traffic on this instance will allow me to learn (and in turn, to share with others)!