---
date: "2023-08-27T08:56:54Z"
title: 'Case Study: NGINX + Certbot with Ansible'
---
# About this series
{{< image width="200px" float="right" src="/assets/ansible/Ansible_logo.svg" alt="Ansible" >}}
In the distant past (to be precise, in November of 2009) I wrote a little piece of automation together with my buddy Paul, called
_PaPHosting_. The goal was to be able to configure common attributes like servername, config files, webserver and DNS configs in a
consistent way, tracked in Subversion. By the way, despite the project deriving its name from its first two authors, our mutual buddy Jeroen
also started using it, wrote lots of additional cool stuff in the repo, and helped move it from Subversion to Git a few
years ago.
Michael DeHaan [[ref](https://www.ansible.com/blog/author/michael-dehaan)] founded Ansible in 2012, and by then our little _PaPHosting_
project, written as a set of bash scripts, had sufficiently solved our automation needs. But, as is the case with most home-grown
systems, over time I kept seeing more and more interesting features and integrations emerge in Ansible, along with solid documentation and a
large user community. Eventually I had to reconsider our 1.5K LOC of Bash and ~16.5K files under maintenance, and in the end, I settled on Ansible.
```
commit c986260040df5a9bf24bef6bfc28e1f3fa4392ed
Author: Pim van Pelt <pim@ipng.nl>
Date: Thu Nov 26 23:13:21 2009 +0000
pim@squanchy:~/src/paphosting$ find * -type f | wc -l
16541
pim@squanchy:~/src/paphosting/scripts$ wc -l *push.sh funcs
132 apache-push.sh
148 dns-push.sh
92 files-push.sh
100 nagios-push.sh
178 nginx-push.sh
271 pkg-push.sh
100 sendmail-push.sh
76 smokeping-push.sh
371 funcs
1468 total
```
In a [[previous article]({% post_url 2023-03-17-ipng-frontends %})], I talked about having not one but a cluster of NGINX servers that would
each share a set of SSL certificates and pose as a reverse proxy for a bunch of websites. At the bottom of that article, I wrote:
> The main thing that's next is to automate a bit more of this. IPng Networks has an Ansible controller, which I'd like to add ...
> but considering Ansible is its whole own elaborate bundle of joy, I'll leave that for maybe another article.
**Tadaah.wav**, that article is here! This is by no means an introduction or howto for Ansible. For that, please take a look at the
incomparable Jeff Geerling [[ref](https://www.jeffgeerling.com/)] and his book: [[Ansible for DevOps](https://www.ansiblefordevops.com/)]. I
bought and read this book, and I highly recommend it.
## Ansible: Playbook Anatomy
The first thing I do is install four Debian Bookworm virtual machines: two in Amsterdam, one in Geneva and one in Zurich. These will be my
first group of NGINX servers, forming my geo-distributed frontend pool. I don't do any specific configuration or
installation of packages; I just leave whatever debootstrap gives me, which is a relatively lean install, on 8 vCPUs, 16GB of memory, a 20GB
boot disk and a 30GB second disk for caching and static websites.
Ansible is a simple, but powerful, server and configuration management tool (with a few other tricks up its sleeve). It consists of an
_inventory_ (the hosts I'll manage), which are put in one or more _groups_, a registry of _variables_ (telling me things about
those hosts and groups), and an elaborate system to run small bits of automation, called _tasks_, organized in things called _Playbooks_.
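To make those terms concrete, here's a minimal sketch of how the pieces fit together: a playbook that applies roles to every host in a
group. The filename and role layout are illustrative, not necessarily how the IPng repository is organized:

```yaml
# plays/nginx.yml -- hypothetical top-level playbook. It runs the 'debian'
# and 'nginx' roles (both developed in this article) on every host in the
# 'nginx' inventory group.
- name: Configure NGINX frontends
  hosts: nginx
  become: true
  roles:
    - debian
    - nginx
```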
### NGINX Cluster: Group Basics
First of all, I create an Ansible _group_ called **nginx** and I add the following four freshly installed virtual machine hosts to it:
```
pim@squanchy:~/src/ipng-ansible$ cat << EOF | tee -a inventory/nodes.yml
nginx:
  hosts:
    nginx0.chrma0.net.ipng.ch:
    nginx0.chplo0.net.ipng.ch:
    nginx0.nlams1.net.ipng.ch:
    nginx0.nlams2.net.ipng.ch:
EOF
```
I have a mixture of Debian and OpenBSD machines at IPng Networks, so I will add this group **nginx** as a child to another group called
**debian**, so that I can run "common debian tasks", such as installing Debian packages that I want all of my servers to have, adding users
and their SSH key for folks who need access, installing and configuring the firewall and things like Borgmatic backups.
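In inventory terms, that parent/child relationship is tiny; a sketch of the relevant stanza:

```yaml
debian:
  children:
    nginx:
```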
I'm not going to go into all the details here for the **debian** playbook, though. It's just there to make the base system consistent across
all servers (bare metal or virtual). The one thing I'll mention though, is that the **debian** playbook will see to it that the correct
users are created, with their SSH pubkey, and I'm going to first use this feature by creating two users:
1. `lego`: As I described in a [[post on DNS-01]({% post_url 2023-03-24-lego-dns01 %})], IPng has a certificate machine that answers Let's
   Encrypt DNS-01 challenges; its job is to regularly prove ownership of my domains, and then request a (wildcard!) certificate.
   Once a certificate renews, it gets copied to all NGINX machines. To do that copy, `lego` needs an account on these machines, and it needs
   to be able to write the certs and issue a reload to the NGINX server.
1. `drone`: Most of my websites are static, for example `ipng.ch` is generated by Jekyll. I typically write an article on my laptop, and
once I'm happy with it, I'll git commit and push it, after which a _Continuous Integration_ system called [[Drone](https://drone.io)]
gets triggered, builds the website, runs some tests, and ultimately copies it out to the NGINX machines. Similar to the first user,
this second user must have an account and the ability to write its web data to the NGINX server in the right spot.
That explains the following:
```yaml
pim@squanchy:~/src/ipng-ansible$ cat << EOF | tee group_vars/nginx.yml
---
users:
  lego:
    comment: Lets Encrypt
    password: "!"
    groups: [ lego ]
  drone:
    comment: Drone CI
    password: "!"
    groups: [ www-data ]

sshkeys:
  lego:
    - key: ecdsa-sha2-nistp256 <hidden>
      comment: lego@lego.net.ipng.ch
  drone:
    - key: ecdsa-sha2-nistp256 <hidden>
      comment: drone@git.net.ipng.ch
EOF
```
I note that the `users` and `sshkeys` used here are dictionaries, and that the `users` role defines a few default accounts, like my own
account `pim`. Writing this to the **group_vars** means that these new entries are applied to all machines that belong to the group
**nginx**, so they'll get these users created _in addition to_ the other users in the dictionary. Nifty!
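The `users` role itself is out of scope for this article, but its core is just a loop over those merged dictionaries. A minimal sketch of
what such tasks might look like (the real role does more bookkeeping):

```yaml
- name: Create user accounts
  ansible.builtin.user:
    name: "{{ item.key }}"
    comment: "{{ item.value.comment | default('') }}"
    password: "{{ item.value.password | default('!') }}"
    groups: "{{ item.value.groups | default([]) }}"
    append: true
  loop: "{{ users | dict2items }}"
  loop_control:
    label: "{{ item.key }}"

- name: Install SSH public keys
  ansible.posix.authorized_key:
    user: "{{ item.0.key }}"
    key: "{{ item.1.key }} {{ item.1.comment }}"
  loop: "{{ sshkeys | dict2items | subelements('value') }}"
  loop_control:
    label: "{{ item.0.key }}"
```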
### NGINX Cluster: Config
I wanted to be able to conserve IP addresses. Just a few months ago, I had a discussion with some folks at Coloclue where we shared the
frustration that what was hip in the 90s (go to RIPE NCC and ask for a /20, justifying that with "I run SSL websites") is somehow still
being done today, even though that's no longer required or, in fact, desirable. So I take one IPv4 and one IPv6 address and use a TLS
extension called _Server Name Indication_ or [[SNI](https://en.wikipedia.org/wiki/Server_Name_Indication)], designed in 2003 (**20 years
old today**), which you can see described in [[RFC 3546](https://datatracker.ietf.org/doc/html/rfc3546)].
Folks who try to argue they need multiple IPv4 addresses because they run multiple SSL websites are somewhat of a trigger for me, so this
article doubles as a "how to do SNI and conserve IPv4 addresses".
I will group my websites that share the same SSL certificate, and I'll call these things _clusters_. An IPng NGINX Cluster:
* is identified by a name, for example `ipng` or `frysix`
* is served by one or more NGINX servers, for example `nginx0.chplo0.ipng.ch` and `nginx0.nlams1.ipng.ch`
* serves one or more distinct websites, for example `www.ipng.ch` and `nagios.ipng.ch` and `go.ipng.ch`
* has exactly one SSL certificate, which should cover all of the website(s), preferably using wildcard certs, for example `*.ipng.ch,
ipng.ch`
And then, I define several clusters this way, in the following configuration file:
```yaml
pim@squanchy:~/src/ipng-ansible$ cat << EOF | tee vars/nginx.yml
---
nginx:
  clusters:
    ipng:
      members: [ nginx0.chrma0.net.ipng.ch, nginx0.chplo0.net.ipng.ch, nginx0.nlams1.net.ipng.ch, nginx0.nlams2.net.ipng.ch ]
      ssl_common_name: ipng.ch
      sites:
        ipng.ch:
        nagios.ipng.ch:
        go.ipng.ch:
    frysix:
      members: [ nginx0.nlams1.net.ipng.ch, nginx0.nlams2.net.ipng.ch ]
      ssl_common_name: frys-ix.net
      sites:
        frys-ix.net:
EOF
```
This way I can neatly group the websites (eg. the **ipng** websites) together, call them by name, and immediately see which servers are going to
serve them using which certificate common name. For future expansion (hint: an upcoming article on monitoring), I decide to make the
**sites** element here a _dictionary_ with only keys and no values, as opposed to a _list_, because later on I will want to add some bits and
pieces of information for each website.
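For instance, a future revision might hang per-site settings off those keys. The `monitor` key below is hypothetical, purely to illustrate
the room a dictionary leaves:

```yaml
sites:
  ipng.ch:
  nagios.ipng.ch:
    monitor: false    # hypothetical per-site knob, added later
```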
### NGINX Cluster: Sites
As is common with NGINX, I will keep a list of websites in the directory `/etc/nginx/sites-available/` and once I need a given machine to
actually serve that website, I'll symlink it from `/etc/nginx/sites-enabled/`. In addition, I decide to add a few common configuration
snippets, such as logging and SSL/TLS parameter files and options, which allow the webserver to score relatively high on SSL certificate
checker sites. It helps to keep the security buffs off my case.
So I decide on the following structure, each file to be copied to all nginx machines in `/etc/nginx/`:
```
roles/nginx/files/conf.d/http-log.conf
roles/nginx/files/conf.d/ipng-headers.inc
roles/nginx/files/conf.d/options-ssl-nginx.inc
roles/nginx/files/conf.d/ssl-dhparams.inc
roles/nginx/files/sites-available/ipng.ch.conf
roles/nginx/files/sites-available/nagios.ipng.ch.conf
roles/nginx/files/sites-available/go.ipng.ch.conf
roles/nginx/files/sites-available/go.ipng.ch.htpasswd
roles/nginx/files/sites-available/...
```
In order:
* `conf.d/http-log.conf` defines a custom log format called `upstream` that contains a few interesting additional fields which show me
  the performance of NGINX:
> log_format upstream '$remote_addr - $remote_user [$time_local] ' '"$request" $status $body_bytes_sent ' '"$http_referer" "$http_user_agent" ' 'rt=$request_time uct=$upstream_connect_time uht=$upstream_header_time urt=$upstream_response_time';
* `conf.d/ipng-headers.inc` adds a header, served to end users by this NGINX instance, that reveals which frontend served the request.
  Debugging a cluster becomes a lot easier if you know which server served what:
> add_header X-IPng-Frontend $hostname always;
* `conf.d/options-ssl-nginx.inc` and `conf.d/ssl-dhparams.inc` are files borrowed from Certbot's NGINX configuration, and ensure the best
TLS and SSL session parameters are used.
* `sites-available/*.conf` are the configuration blocks for the port-80 (HTTP) and port-443 (HTTPS) websites. In the interest of
  brevity I won't copy them all here (a stripped-down skeleton follows below this list), but if you're curious I showed a bunch of these in a
  [[previous article]({% post_url 2023-03-17-ipng-frontends %})]. These per-website config files sensibly include the SSL defaults, custom
  IPng headers and `upstream` log format.
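For orientation, here's that stripped-down skeleton of a per-site file. The `proxy_pass` backend is made up; the real configs live in the
article linked above:

```
server {
    listen [::]:443 ssl;
    listen 0.0.0.0:443 ssl;
    server_name nagios.ipng.ch;

    ssl_certificate /etc/nginx/certs/ipng.ch.crt;
    ssl_certificate_key /etc/nginx/certs/ipng.ch.key;
    include /etc/nginx/conf.d/options-ssl-nginx.inc;
    include /etc/nginx/conf.d/ssl-dhparams.inc;
    include /etc/nginx/conf.d/ipng-headers.inc;

    access_log /var/log/nginx/nagios.ipng.ch.log upstream;

    location / {
        proxy_pass http://nagios.example.net:8080;   # made-up backend
    }
}
```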
### NGINX Cluster: Let's Encrypt
I figure the single most important thing to get right is how to enable multiple groups of websites, including SSL certificates, in multiple
_Clusters_ (say `ipng` and `frysix`), to be served using different SSL certificates, but on the same IPv4 and IPv6 address, using _Server
Name Indication_ or SNI. Let's first take a look at building these two certificates, one for [[IPng Networks](https://ipng.ch)] and
one for [[FrysIX](https://frys-ix.net/)], the internet exchange with Frysian roots, which incidentally offers free 1G, 10G, 40G and 100G
ports all over the Amsterdam metro. My buddy Arend and I run that exchange, so please do join it!
I described the usual `HTTP-01` certificate challenge a while ago in [[this article]({% post_url 2023-03-17-ipng-frontends %})], but I
rarely use it because I've found that once installed, `DNS-01` is vastly superior. I wrote about the ability to request a single certificate
with multiple _wildcard_ entries in a [[DNS-01 article]({% post_url 2023-03-24-lego-dns01 %})], so I'm going to save you the repetition, and
simply use `certbot`, `acme-dns` and the `DNS-01` challenge type, to request the following _two_ certificates:
```bash
lego@lego:~$ certbot certonly --config-dir /home/lego/acme-dns --logs-dir /home/lego/logs \
--work-dir /home/lego/workdir --manual --manual-auth-hook /home/lego/acme-dns/acme-dns-auth.py \
--preferred-challenges dns --debug-challenges \
  -d ipng.ch -d *.ipng.ch -d *.net.ipng.ch \
  -d ipng.nl -d *.ipng.nl \
  -d ipng.eu -d *.ipng.eu \
  -d ipng.li -d *.ipng.li \
  -d ublog.tech -d *.ublog.tech \
  -d as8298.net -d *.as8298.net \
  -d as50869.net -d *.as50869.net
lego@lego:~$ certbot certonly --config-dir /home/lego/acme-dns --logs-dir /home/lego/logs \
--work-dir /home/lego/workdir --manual --manual-auth-hook /home/lego/acme-dns/acme-dns-auth.py \
--preferred-challenges dns --debug-challenges \
-d frys-ix.net -d *.frys-ix.net
```
First off: while I showed how to get these certificates by hand, actually generating these two commands is easily doable in Ansible (which
I'll show at the end of this article!), since I defined which cluster has which main certificate name, and which websites it's meant to serve.
Looking at `vars/nginx.yml`, it quickly becomes obvious how I can automate this. Using a relatively straightforward construct, I can let
Ansible programmatically create a list of commandline arguments for me (sketched in Python right after this list):
1. Initialize a variable `CERT_ALTNAMES` as a list of `nginx.clusters.ipng.ssl_common_name` and its wildcard, in other words `[ipng.ch,
*.ipng.ch]`.
1. As a convenience, tack onto the `CERT_ALTNAMES` list any entries in the `nginx.clusters.ipng.ssl_altname`, such as `[*.net.ipng.ch]`.
1. Then looping over each entry in the `nginx.clusters.ipng.sites` dictionary, use `fnmatch` to match it against any entries in the
`CERT_ALTNAMES` list:
* If it matches, for example with `go.ipng.ch`, skip and continue. This website is covered already by an altname.
* If it doesn't match, for example with `ublog.tech`, simply add it and its wildcard to the `CERT_ALTNAMES` list: `[ublog.tech, *.ublog.tech]`.
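Expressed in plain Python, that loop looks roughly like this; it uses the same matching logic that the lookup plugin further down
implements, with the surrounding loop living in a Jinja2 template:

```python
import fnmatch

def cert_altnames(ssl_common_name, sites, ssl_altname=()):
    """Compute the certbot -d list for one cluster: the common name plus its
    wildcard, any extra altnames, then every site not yet covered."""
    altnames = [ssl_common_name, '*.' + ssl_common_name] + list(ssl_altname)
    for site in sites:
        if any(site == a or fnmatch.fnmatch(site, a) for a in altnames):
            continue  # already covered, eg. go.ipng.ch by *.ipng.ch
        altnames += [site, '*.' + site]
    return altnames

print(cert_altnames('ipng.ch', ['ipng.ch', 'go.ipng.ch', 'ublog.tech'],
                    ['*.net.ipng.ch']))
# ['ipng.ch', '*.ipng.ch', '*.net.ipng.ch', 'ublog.tech', '*.ublog.tech']
```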
Now, the first time I run this for a new cluster (which has never had a certificate issued before), `certbot` will ask me to ensure the correct
`_acme-challenge` records are in each respective DNS zone. After doing that, it will issue two separate certificates and install a cronjob
that periodically checks their age and renews them when they are up for renewal. In a post-renewal deploy hook, a
script that I provide copies the new certificate to the NGINX cluster (using the `lego` user + SSH key that I defined above).
```bash
lego@lego:~$ find /home/lego/acme-dns/live/ -type f
/home/lego/acme-dns/live/README
/home/lego/acme-dns/live/frys-ix.net/README
/home/lego/acme-dns/live/frys-ix.net/chain.pem
/home/lego/acme-dns/live/frys-ix.net/privkey.pem
/home/lego/acme-dns/live/frys-ix.net/cert.pem
/home/lego/acme-dns/live/frys-ix.net/fullchain.pem
/home/lego/acme-dns/live/ipng.ch/README
/home/lego/acme-dns/live/ipng.ch/chain.pem
/home/lego/acme-dns/live/ipng.ch/privkey.pem
/home/lego/acme-dns/live/ipng.ch/cert.pem
/home/lego/acme-dns/live/ipng.ch/fullchain.pem
```
The crontab entry that Certbot normally installs makes some assumptions about directories and about which user runs the renewal. I am not a fan of
having the `root` user do this, so I've changed it to this:
```bash
lego@lego:~$ cat /etc/cron.d/certbot
0 */12 * * * lego perl -e 'sleep int(rand(43200))' && certbot -q renew \
--config-dir /home/lego/acme-dns --logs-dir /home/lego/logs \
--work-dir /home/lego/workdir \
--deploy-hook "/home/lego/bin/certbot-distribute"
```
And some pretty cool magic happens with this `certbot-distribute` script. When `certbot` has successfully received a new
certificate, it'll set a few environment variables and execute the deploy hook with them:
* ***RENEWED_LINEAGE***: will point to the config live subdirectory (eg. `/home/lego/acme-dns/live/ipng.ch`) containing the new
certificates and keys
* ***RENEWED_DOMAINS*** will contain a space-delimited list of renewed certificate domains (eg. `ipng.ch *.ipng.ch *.net.ipng.ch`)
Using the first of those two things, I guess it becomes straightforward to distribute the new certs:
```bash
#!/bin/sh

CERT=$(basename $RENEWED_LINEAGE)
CERTFILE=$RENEWED_LINEAGE/fullchain.pem
KEYFILE=$RENEWED_LINEAGE/privkey.pem

if [ "$CERT" = "ipng.ch" ]; then
  MACHS="nginx0.chrma0.ipng.ch nginx0.chplo0.ipng.ch nginx0.nlams1.ipng.ch nginx0.nlams2.ipng.ch"
elif [ "$CERT" = "frys-ix.net" ]; then
  MACHS="nginx0.nlams1.ipng.ch nginx0.nlams2.ipng.ch"
else
  echo "Unknown certificate $CERT, do not know which machines to copy to"
  exit 3
fi

for MACH in $MACHS; do
  fping -q $MACH 2>/dev/null || {
    echo "$MACH: Skipping (unreachable)"
    continue
  }
  echo $MACH: Copying $CERT
  scp -q $CERTFILE $MACH:/etc/nginx/certs/$CERT.crt
  scp -q $KEYFILE $MACH:/etc/nginx/certs/$CERT.key
  echo $MACH: Reloading nginx
  ssh $MACH 'sudo systemctl reload nginx'
done
```
There are a few things to note if you look at my little shell script. I already kind of know which `CERT` belongs to which `MACHS`,
because this was configured in `vars/nginx.yml`: each cluster name, say `ipng`, conveniently carries two variables, `members`, which is a
list of machines, and `ssl_common_name`, which is `ipng.ch`. I think I can find a way to let
Ansible generate this file for me as well, whoot!
### Ansible: NGINX
Tying it all together (frankly, a tiny bit surprised you're still reading this!), I can now offer an Ansible role that automates all of this.
```yaml
{%- raw %}
pim@squanchy:~/src/ipng-ansible$ cat << EOF | tee roles/nginx/tasks/main.yml
- name: Install Debian packages
  ansible.builtin.apt:
    update_cache: true
    pkg: [ nginx, ufw, net-tools, apache2-utils, mtr-tiny, rsync ]

- name: Copy config files
  ansible.builtin.copy:
    src: "{{ item }}"
    dest: "/etc/nginx/"
    owner: root
    group: root
    mode: u=rw,g=r,o=r
    directory_mode: u=rwx,g=rx,o=rx
  loop: [ conf.d, sites-available ]
  notify: Reload nginx

- name: Add cluster
  ansible.builtin.include_tasks:
    file: cluster.yml
  loop: "{{ nginx.clusters | dict2items }}"
  loop_control:
    label: "{{ item.key }}"
EOF
pim@squanchy:~/src/ipng-ansible$ cat << EOF > roles/nginx/handlers/main.yml
- name: Reload nginx
  ansible.builtin.service:
    name: nginx
    state: reloaded
EOF
{% endraw %}
```
The first task installs the Debian packages I'll want to use. The `apache2-utils` package is there to create and maintain `htpasswd` files,
among other useful things. The `rsync` package is needed to accept both website data from the `drone` continuous integration user and
certificate data from the `lego` user.
The second task copies all of the (static) configuration files onto the machine, populating `/etc/nginx/conf.d/` and
`/etc/nginx/sites-available/`. It uses a `notify` stanza to take note if any of these files (notably the ones in `conf.d/`) have changed,
and, if so, to invoke a _handler_ later on that reloads the running NGINX so it picks up those changes.
Finally, the third task branches out and executes the tasks defined in `tasks/cluster.yml`, once for each NGINX cluster (in my case, `ipng`
and then `frysix`):
```yaml
{%- raw %}
pim@squanchy:~/src/ipng-ansible$ cat << EOF | tee roles/nginx/tasks/cluster.yml
- name: "Enable sites for cluster {{ item.key }}"
ansible.builtin.file:
src: "/etc/nginx/sites-available/{{ sites_item.key }}.conf"
dest: "/etc/nginx/sites-enabled/{{ sites_item.key }}.conf"
owner: root
group: root
state: link
loop: "{{ (nginx.clusters[item.key].sites | default({}) | dict2items) }}"
when: inventory_hostname in nginx.clusters[item.key].members | default([])
loop_control:
loop_var: sites_item
label: "{{ sites_item.key }}"
notify: Reload nginx
EOF
{% endraw %}
```
This task is a bit more complicated, so let me go over it from the outside in. The thing that called it already has a loop variable
called `item`, which has a key (`ipng`) and a value (the whole cluster defined under `nginx.clusters.ipng`). Now if I take that
`item.key` variable and look at its `sites` dictionary (in other words: `nginx.clusters.ipng.sites`), I can create another loop over all the
sites belonging to that cluster. Iterating over a dictionary in Ansible is done with a filter called `dict2items`, and because technically
a cluster could have zero sites, I ensure the `sites` dictionary defaults to the empty dictionary `{}`. Phew!
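As a refresher, `dict2items` turns a mapping into a list of `{key, value}` pairs, which is the shape `loop` wants:

```yaml
# Input:
sites:
  ipng.ch:
  go.ipng.ch:
# After applying "sites | dict2items":
- key: ipng.ch
  value: null
- key: go.ipng.ch
  value: null
```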
Ansible is running this for each machine, and of course I only want to execute this block if the given machine (referenced as
`inventory_hostname`) occurs in the cluster's `members` list. If not: skip; if yes: go! That is what the `when` line does.
The loop itself then runs for each site in the `sites` dictionary; the `loop_control` gives the loop variable a unique name,
`sites_item`, and when printing information on the CLI uses the `label` set to the `sites_item.key` variable (eg. `frys-ix.net`)
rather than the whole dictionary belonging to it.
With all of that said, the inner loop is easy: create a (sym)link for each website config file from `sites-available` to `sites-enabled` and
if new links are created, invoke the _Reload nginx_ handler.
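When iterating on a role like this, Ansible's check mode is invaluable: it shows which links would be created and what would change,
without touching the machines. The playbook filename here is assumed; use whatever your repository calls it:

```bash
pim@squanchy:~/src/ipng-ansible$ ansible-playbook -i inventory/nodes.yml nginx.yml \
    --limit nginx --check --diff
```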
### Ansible: Certbot
***But what about that LEGO stuff?*** Fair question. The two scripts I described above (one to create the certbot certificate, and another
to copy it to the correct machines), both need to be generated and copied to the right places, so here I go, appending to the tasks:
```yaml
{%- raw %}
pim@squanchy:~/src/ipng-ansible$ cat << EOF | tee -a roles/nginx/tasks/main.yml
- name: Create LEGO directory
  ansible.builtin.file:
    path: "/etc/nginx/certs/"
    state: directory
    owner: lego
    group: lego
    mode: u=rwx,g=rx,o=

- name: Add sudoers.d
  ansible.builtin.copy:
    src: sudoers
    dest: "/etc/sudoers.d/lego-ipng"
    owner: root
    group: root

- name: Generate Certbot Distribute script
  delegate_to: lego.net.ipng.ch
  run_once: true
  ansible.builtin.template:
    src: certbot-distribute.j2
    dest: "/home/lego/bin/certbot-distribute"
    owner: lego
    group: lego
    mode: u=rwx,g=rx,o=

- name: Generate Certbot Cluster scripts
  delegate_to: lego.net.ipng.ch
  run_once: true
  ansible.builtin.template:
    src: certbot-cluster.j2
    dest: "/home/lego/bin/certbot-{{ item.key }}"
    owner: lego
    group: lego
    mode: u=rwx,g=rx,o=
  loop: "{{ nginx.clusters | dict2items }}"
EOF
pim@squanchy:~/src/ipng-ansible$ cat << EOF | tee roles/nginx/files/sudoers
## *** Managed by IPng Ansible ***
#
%lego ALL=(ALL) NOPASSWD: /usr/bin/systemctl reload nginx
EOF
{% endraw -%}
```
The first task creates `/etc/nginx/certs`, which will be owned by the user `lego`; that's where Certbot will rsync the certificates after
renewal. The second task then allows the `lego` user to issue a `systemctl reload nginx`, so that NGINX can pick up the certificates once
they've changed on disk.
The third task generates the `certbot-distribute` script that, depending on the common name of the certificate (for example `ipng.ch` or
`frys-ix.net`), knows which NGINX machines to copy it to. Its logic is pretty similar to the plain-old shellscript I started with, but it
does have a few variable expansions. If you recall, that script had a hard-coded way to assemble the `MACHS` variable, which can now be replaced:
```bash
{%- raw %}
# ...
{% for cluster_name, cluster in nginx.clusters.items() | default({}) %}
{% if not loop.first %}el{% endif %}if [ "$CERT" = "{{ cluster.ssl_common_name }}" ]; then
  MACHS="{{ cluster.members | join(' ') }}"
{% endfor %}
else
  echo "Unknown certificate $CERT, do not know which machines to copy to"
  exit 3
fi
{% endraw %}
```
One common Ansible trick here is to detect if a given loop has just begun (in which case `loop.first` will be true), or if this is the last
element in the loop (in which case `loop.last` will be true). I can use this to emit the `if` (first) versus `elif` (not first) statements.
Looking back at what I wrote in this _Certbot Distribute_ task, you'll see I used two additional configuration elements:
1. ***run_once***: Since there are potentially many machines in the **nginx** _Group_, by default Ansible will run this task for each machine. However, the Certbot cluster and distribute scripts really only need to be generated once per _Playbook_ execution, which is determined by this `run_once` field.
1. ***delegate_to***: This task should be executed not on an NGINX machine, rather instead on the `lego.net.ipng.ch` machine, which is specified by the `delegate_to` field.
#### Ansible: lookup example
And now for the _pièce de résistance_, the fourth and final task generates a shell script that captures for each cluster the primary name
(called `ssl_common_name`) and the list of alternate names which will turn into full commandline to request a certificate with all wildcard
domains added (eg. `ipng.ch` and `*.ipng.ch`). To do this, I decide to create an Ansible [[Lookup
Plugin](https://docs.ansible.com/ansible/latest/plugins/lookup.html)]. This lookup will simply return **true** if a given sitename is
covered by any of the existing certificace altnames, including wildcard domains, for which I can use the standard python `fnmatch`.
First, I can create the lookup plugin in a a well-known directory, so Ansible can discover it:
```
pim@squanchy:~/src/ipng-ansible$ cat << EOF | tee roles/nginx/lookup_plugins/altname_match.py
import ansible.utils as utils
import ansible.errors as errors
from ansible.plugins.lookup import LookupBase
import fnmatch


class LookupModule(LookupBase):
    def __init__(self, basedir=None, **kwargs):
        self.basedir = basedir

    def run(self, terms, variables=None, **kwargs):
        sitename = terms[0]
        cert_altnames = terms[1]

        for altname in cert_altnames:
            if sitename == altname:
                return [True]
            if fnmatch.fnmatch(sitename, altname):
                return [True]
        return [False]
EOF
```
The Python class here compares the website name in `terms[0]` with a list of altnames given in
`terms[1]`, and returns True either if a literal match occurred, or if the altname matches the sitename via `fnmatch`.
It returns False otherwise. Dope! Here's how I use it in the `certbot-cluster` script, which is
starting to get pretty fancy:
```bash
{%- raw %}
pim@squanchy:~/src/ipng-ansible$ cat << EOF | tee roles/nginx/templates/certbot-cluster.j2
#!/bin/sh
###
### {{ ansible_managed }}
###
{% set cluster_name = item.key %}
{% set cluster = item.value %}
{% set sites = nginx.clusters[cluster_name].sites | default({}) %}
#
# This script generates a certbot commandline to initialize (or re-initialize) a given certificate for an NGINX cluster.
#
### Metadata for this cluster:
#
# {{ cluster_name }}: {{ cluster }}
{% set cert_altname = [ cluster.ssl_common_name, '*.' + cluster.ssl_common_name ] %}
{% do cert_altname.extend(cluster.ssl_altname|default([])) %}
{% for sitename, site in sites.items() %}
{% set altname_matched = lookup('altname_match', sitename, cert_altname) %}
{% if not altname_matched %}
{% do cert_altname.append(sitename) %}
{% do cert_altname.append("*."+sitename) %}
{% endif %}
{% endfor %}
# CERT_ALTNAME: {{ cert_altname | join(' ') }}
#
###
certbot certonly --config-dir /home/lego/acme-dns --logs-dir /home/lego/logs --work-dir /home/lego/workdir \
--manual --manual-auth-hook /home/lego/acme-dns/acme-dns-auth.py \
--preferred-challenges dns --debug-challenges \
{% for domain in cert_altname %}
-d {{ domain }}{% if not loop.last %} \{% endif %}
{% endfor %}
EOF
{% endraw %}
```
Ansible provides a lot of templating and logic evaluation in its Jinja2 templating language, but it isn't really a programming language.
That said, from the top, here's what happens:
* I set three variables, `cluster_name`, `cluster` (the dictionary with the cluster config) and as a shorthand `sites` which is a
dictionary of sites, defaulting to `{}` if it doesn't exist.
* I'll print the cluster name and the cluster config for posterity. Who knows, eventually I'll be debugging this anyway :-)
* Then comes the main thrust, the simple loop that I described above, but in Jinja2:
* Initialize the `cert_altname` list with the `ssl_common_name` and its wildcard variant, optionally extending it with the list
of altnames in `ssl_altname`, if it's set.
* For each site in the sites dictionary, invoke the lookup and capture its (boolean) result in `altname_matched`.
  * If the match failed, we have a new domain, so I add it and its wildcard variant to the `cert_altname` list, using the `do`
    Jinja2 statement, which comes from the `jinja2.ext.do` extension.
* At the end of this, all of these website names have been reduced to their domain+wildcard variant, which I can loop over to emit
the `-d` flags to `certbot` at the bottom of the file.
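To make that concrete: rendered against the abridged `vars/nginx.yml` from earlier (no `ssl_altname` set, and all sites covered by the
common name's wildcard), the metadata comment in each generated script would come out as:

```
# CERT_ALTNAME: ipng.ch *.ipng.ch            (in /home/lego/bin/certbot-ipng)
# CERT_ALTNAME: frys-ix.net *.frys-ix.net    (in /home/lego/bin/certbot-frysix)
```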
And with that, I can generate both the certificate request command, and distribute the resulting
certificates to those NGINX servers that need them.
## Results
{{< image src="/assets/ansible/ansible-run.png" alt="Ansible Run" >}}
I'm very pleased with the results. I can clearly see that the two servers that I assigned to this
NGINX cluster (the two in Amsterdam) got their sites enabled, whereas the other two (Zurich and
Geneva) were skipped. I can also see that the new certbot request script was generated and the
existing certbot-distribute script was updated (to be aware of where to copy a renewed cert for this
cluster). And, in the end, only the two relevant NGINX servers were reloaded, reducing overall risk.
One other way to show that the very same IPv4 and IPv6 address can be used to serve multiple
distinct multi-domain/wildcard SSL certificates, using this _Server Name Indication_ (SNI, which, I
repeat, has been available **since 2003** or so), is this:
```bash
pim@squanchy:~$ HOST=nginx0.nlams1.ipng.ch
pim@squanchy:~$ PORT=443
pim@squanchy:~$ SERVERNAME=www.ipng.ch
pim@squanchy:~$ openssl s_client -connect $HOST:$PORT -servername $SERVERNAME </dev/null 2>/dev/null \
| openssl x509 -text | grep DNS: | sed -e 's,^ *,,'
DNS:*.ipng.ch, DNS:*.ipng.eu, DNS:*.ipng.li, DNS:*.ipng.nl, DNS:*.net.ipng.ch, DNS:*.ublog.tech,
DNS:as50869.net, DNS:as8298.net, DNS:ipng.ch, DNS:ipng.eu, DNS:ipng.li, DNS:ipng.nl, DNS:ublog.tech
pim@squanchy:~$ SERVERNAME=www.frys-ix.net
pim@squanchy:~$ openssl s_client -connect $HOST:$PORT -servername $SERVERNAME </dev/null 2>/dev/null \
| openssl x509 -text | grep DNS: | sed -e 's,^ *,,'
DNS:*.frys-ix.net, DNS:frys-ix.net
```
Ansible is really powerful, and now that I've gotten to know it a little bit, I'll readily admit it's way
cooler than PaPhosting ever was :)
## What's Next
If you remember, I wrote that the `nginx.clusters.*.sites` element would be not a list but rather a
dictionary, because I'd like it to be able to carry other bits of information. And if you take a close
look at my screenshot above, you'll see I revealed something about Nagios... so in an upcoming post
I'd like to share how IPng Networks arranges its Nagios environment, and I'll use the NGINX configs
here to show how I automatically monitor all servers participating in an NGINX _Cluster_, both for
pending certificate expiry (which shouldn't generally happen, precisely due to the automation here)
and for any backend server taking the day off.
Stay tuned! Oh, and if you're good at Ansible and would like to point out the silly ways in which I
approach things, please do drop me a line on Mastodon, where you can reach me at
[[@IPngNetworks@ublog.tech](https://ublog.tech/@IPngNetworks)].