Compare commits

2 Commits

Author SHA1 Message Date
26397d69c6 Readability pass, ready for publication
All checks were successful
continuous-integration/drone/push Build is passing
2024-10-21 18:58:27 +02:00
388293baef Add FreeIX #2 article 2024-10-21 18:10:02 +02:00
3 changed files with 786 additions and 5 deletions

@ -91,7 +91,7 @@ their traffic to these remote internet exchanges.
There are two types of BGP neighbor adjacency:
1. ***Members***: these are {ip-address,AS}-tuples which FreeIX has explicitly configured. Learned prefixes are added
to as-set AS50869:AS-MEMBERS. Members receive _all_ prefixes from FreeIX, each annotated with BGP **informational**
to as-set AS50869:AS-MEMBERS. Members receive _some or all_ prefixes from FreeIX, each annotated with BGP **informational**
communities, and members can drive certain behavior with BGP **action** communities.
1. ***Peers***: these are all other entities with whom FreeIX has an adjacency at public internet exchanges or private
@ -195,12 +195,12 @@ network interconnects:
* `(50869,3020,1)`: Inhibit Action (30XX), Country (3020), Switzerland (1)
* `(50869,3030,1308)`: Inhibit Action (30XX), IXP (3030), PeeringDB IXP for LS-IX (1308)
Further actions can be placed on a per-remote-neighbor basis:
Four actions can be placed on a per-remote-asn basis:
* `(50869,3040,13030)`: Inhibit Action (30XX), AS (3040), Init7 (AS13030)
* `(50869,3041,6939)`: Prepend Action (30XX), Prepend Once (3041), Hurricane Electric (AS6939)
* `(50869,3042,12859)`: Prepend Action (30XX), Prepend Twice (3042), BIT BV (AS12859)
* `(50869,3043,8283)`: Prepend Action (30XX), Prepend Three Times (3043), Coloclue (AS8283)
* `(50869,3100,6939)`: Prepend Once Action (3100), Hurricane Electric (AS6939)
* `(50869,3200,12859)`: Prepend Twice Action (3200), BIT BV (AS12859)
* `(50869,3300,8283)`: Prepend Thrice Action (3300), Coloclue (AS8283)
Peers cannot set these actions, as all action communities will be stripped on ingress. Members can set these action
communities on their sessions with FreeIX routers, however in some cases they may also be set by FreeIX operators when

@ -0,0 +1,778 @@
---
date: "2024-10-21T10:52:11Z"
title: "FreeIX - Remote, part 2"
---
{{< image width="18em" float="right" src="/assets/freeix/freeix-artist-rendering.png" alt="FreeIX, Artists Rendering" >}}
# Introduction
A few months ago, I wrote about [[an idea]({{< ref 2024-04-27-freeix-1.md >}})] to help boost the
value of small Internet Exchange Points (_IXPs_). When such an exchange doesn't have many members,
then the operational costs of connecting to it (cross connects, router ports, finding peers, etc.)
are not very favorable.
Clearly, the benefit of using an Internet Exchange is to reduce the portion of an ISP's (and CDN's)
traffic that must be delivered via their upstream transit providers, thereby reducing the average
per-bit delivery cost as well as the end-to-end latency as seen by their users or
customers. Furthermore, the increased number of paths available through the IXP improves routing
efficiency and fault-tolerance, and at the same time it avoids traffic going the scenic route to a
large hub like Frankfurt, London, Amsterdam, Paris or Rome, if it could very well remain local.
## Refresher: FreeIX Remote
{{< image width="20em" float="right" src="/assets/freeix/Free IX Remote.svg" alt="FreeIX Remote" >}}
Let's take for example the [[Free IX in Greece](https://free-ix.gr/)] that was announced at GRNOG16
in Athens on April 19th, 2024. This exchange initially targets Athens and Thessaloniki, with 2x100G
between the two cities. Members can connect to either site for the cost of only a cross connect.
The 1G/10G/25G ports will be _Gratis_, so please make sure to apply if you're in this region! I
myself have connected one very special router to Free IX Greece, which will be offering an outreach
infrastructure by connecting to _other_ Internet Exchange Points in Amsterdam, and allowing all FreeIX
Greece members to benefit from that in the following way:
1. FreeIX Remote uses AS50869 to peer with any network operator (or routeserver) available at public
Internet Exchange Points or using private interconnects. For these peers, it looks like a completely
normal service provider in this regard. It will connect to internet exchange points, and learn a bunch of
routes and announce other routes.
1. FreeIX Remote _members_ can join the program, after which they are granted certain propagation
permissions by FreeIX Remote at the point where they have a BGP session with AS50869. The prefixes
learned on these _member_ sessions are marked as such, and will be allowed to propagate. Members
will receive some or all learned prefixes from AS50869.
1. FreeIX _members_ can set fine grained BGP communities to determine which of their prefixes are
propagated to and from which locations, by router, country or Internet Exchange Point.
Members at smaller internet exchange points greatly benefit from this type of outreach, by receiving large
portions of the public internet directly at their preferred peering location. The _Free IX Remote_
routers will carry member traffic to and from these remote Internet Exchange Points. My [[previous
article]({{< ref 2024-04-27-freeix-1.md >}})] went into a good amount of detail on the principles of
operation, but back then I made a promise to come back to the actual _implementation_ of such a
complex routing topology. As a starting point, I work with the structure I shared in [[IPng's
Routing Policy]({{< ref 2021-11-14-routing-policy.md >}})]. If you haven't read that yet, I think
it may make sense to take a look as many of the structural elements and concepts will be similar.
## Implementation
The routing policy calls for three classes of (large) BGP communities: informational, permission and
inhibit. It also defines a few classic BGP communities, but I'll skip over those as they are not
very interesting. Firstly, I will use the _informational_ communities to tag which prefixes were
learned by which _router_, in which _country_ and at which internet exchange point, which I will call a
_group_.
Then, I will use the same structure to grant members _permissions_, that is to say, when AS50869
learns their prefixes, they will get tagged with specific action communities that enable propagation
to other places. I will call this 'Member-to-IXP'. Sometimes, I'd like to be able to _inhibit_
propagation of 'Member-to-IXP', so there will be a third set of communities that perform this
function. Finally, matching on the informational communities in a clever way will enable a symmetric
'IXP-to-Member' propagation.
To help structure this implementation, it helps if I think about it in
the following way:
Let's say, AS50869 is connected to IXP1, IXP2, IXP3 and IXP4. AS50869 has a _member_ called M1 at
IXP1, and that member is 'permitted' to reach IXP2 and IXP3, but it is 'inhibited' from reaching
IXP4. My _FreeIX Remote_ implementation now has to satisfy three main requirements:
1. **Ingress**: learn prefixes (from peers and members alike) at internet exchange points or
private network interconnects, and 'tag' them with the correct informational communities.
1. **Egress: Member-to-IXP**: Announce M1's prefixes to IXP2 and IXP3, but not to IXP4.
1. **Egress: IXP-to-Member**: Announce IXP2's and IXP3's prefixes to M1, but not IXP4's.
### Defining Countries and Routers
I'll start by giving each country which has at least one router a unique _country_id_ in a YAML
file, leaving the value 0 to mean 'all' countries:
```
$ cat config/common/countries.yaml
country:
  all: 0
  CH: 1
  NL: 2
  GR: 3
  IT: 4
```
Each router has its own configuration file, and at the top, I'll define some metadata which
includes things like the country in which it operates, and its own unique _router_id_, like so:
```
$ cat config/chrma0.net.free-ix.net.yaml
device:
  id: 1
  hostname: chrma0.free-ix.net
  shortname: chrma0
  country: CH
  loopbacks:
    ipv4: 194.126.235.16
    ipv6: "2a0b:dd80:3101::"
  location: "Hofwiesenstrasse, Ruemlang, Zurich, Switzerland"
...
```
### Defining communities
Next, I define the BGP communities in `class` and `subclass` types, in the following YAML structure:
```
ebgp:
  community:
    legacy:
      noannounce: 0
      blackhole: 666
      inhibit: 3000
      prepend1: 3100
      prepend2: 3200
      prepend3: 3300
    large:
      class:
        informational: 1000
        permission: 2000
        inhibit: 3000
        prepend1: 3100
        prepend2: 3200
        prepend3: 3300
      subclass:
        all: 0
        router: 10
        country: 20
        group: 30
        asn: 40
```
### Defining Members
In order to keep this system manageable, I have to rely on automation. I intend to leverage the
BGP community _subclasses_ in a simple ACL system consisting of the following YAML, taking my buddy
Antonios' network as an example:
```
$ cat config/common/members.yaml
member:
  210312:
    description: DaKnObNET
    prefix_filter: AS-SET-DNET
    permission: [ router:chrma0 ]
    inhibit: [ group:chix ]
...
```
The syntax of the `permission` and `inhibit` fields is identical. They are lists of key:value pairs
where the key must be one of the _subclasses_ (eg. 'router', 'country', 'group', 'asn'), and the
value appropriate for that type. In this example, AS50869 is being asked to grant permissions for
Antonios' prefixes to any peer connected to `router:chrma0`, but inhibit propagation to/from the
exchange point called `group:chix`. I could extend this list, for example by adding a permission to
`country:NL` or an inhibit to `router:grskg0` and so on.
I decide that sensible defaults are to give permissions to all, and keep inhibit empty. In other
words: be very liberal in propagation, to maximize the value that FreeIX Remote can provide its
members.
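To make these defaults and the key:value syntax concrete, here is a minimal sketch of how a member entry could be normalized and validated before it is used. The function name `normalize_member()` is hypothetical and not part of the real generator; it only assumes the `permission` and `inhibit` fields shown above, plus the special keyword `all`.

```python
# Hypothetical helper (not from the real generator): apply the defaults
# "permission to all, inhibit nothing" and check each key:value entry.
SUBCLASSES = {"router", "country", "group", "asn"}


def normalize_member(member):
    """Return a copy of a member entry with defaults filled in and entries checked."""
    out = dict(member)
    out.setdefault("permission", ["all"])  # be liberal in propagation by default
    out.setdefault("inhibit", [])          # and inhibit nothing by default
    for field in ("permission", "inhibit"):
        for entry in out[field]:
            if entry == "all":
                continue
            key, sep, value = entry.partition(":")
            if not sep or key not in SUBCLASSES or not value:
                raise ValueError(f"bad {field} entry: {entry!r}")
    return out


antonios = {"description": "DaKnObNET", "permission": ["router:chrma0"],
            "inhibit": ["group:chix"]}
print(normalize_member(antonios))
print(normalize_member({"description": "no ACL given"}))
```

Validating early like this would turn a typo in `members.yaml` into an error at generation time rather than a silently missing BGP community.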
### Ingress: Learning Prefixes
With what I've defined so far, I can start to set informational BGP communities:
* The prefixes learned on subclass **router** for `chrma0` will have the value of device.id=1:
`(50869,1010,1)`
* The prefixes learned on subclass **country** for `chrma0` take device.country=CH, which
`country['CH']` resolves to the value 1: `(50869,1020,1)`
* When learning prefixes from a given internet exchange, Kees already knows its PeeringDB
_ixp_id_, which is a unique value for each exchange point. Thus, subclass **group** for `chrma0` at
[[CommunityIX](https://www.peeringdb.com/ix/2013)] is ixp_id=2013: `(50869,1030,2013)`
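The arithmetic behind these three communities is simply class plus subclass for the second field, and the looked-up identifier for the third. A small sketch, assuming the YAML values shown earlier; the function name `informational_communities()` is made up for illustration:

```python
# Illustrative only: derive the informational large communities for a session
# from the class/subclass/country values defined in the YAML above.
MY_ASN = 50869
CLASS_INFORMATIONAL = 1000
SUBCLASS = {"router": 10, "country": 20, "group": 30}
COUNTRY = {"all": 0, "CH": 1, "NL": 2, "GR": 3, "IT": 4}


def informational_communities(device, ixp_id=None):
    """Return (asn, class+subclass, value) tuples for one BGP session."""
    out = [
        (MY_ASN, CLASS_INFORMATIONAL + SUBCLASS["router"], device["id"]),
        (MY_ASN, CLASS_INFORMATIONAL + SUBCLASS["country"], COUNTRY[device["country"]]),
    ]
    # Sessions without a PeeringDB ixp_id get no group community.
    if ixp_id:
        out.append((MY_ASN, CLASS_INFORMATIONAL + SUBCLASS["group"], ixp_id))
    return out


chrma0 = {"id": 1, "country": "CH"}
print(informational_communities(chrma0, ixp_id=2013))
# [(50869, 1010, 1), (50869, 1020, 1), (50869, 1030, 2013)]
```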
#### Ingress: Learning from members
I need to make sure that members send only the prefixes that I expect from them. To do this, I'll
make use of a common tool called [[bgpq4](https://github.com/bgp/bgpq4)] which cobbles together the
prefixes belonging to an AS-SET by referencing one or more IRR databases.
In Python, I'll prepare the Jinja context by generating the prefix filter lists like so:
```
if session["type"] == "member":
    session = {**session, **data["member"][asn]}

pf = ebgp_merge_value(data["ebgp"], group, session, "prefix_filter", None)
if pf:
    ctx["prefix_filter"] = {}
    pfn = pf
    pfn = pfn.replace("-", "_")
    pfn = pfn.replace(":", "_")
    for af in [4, 6]:
        filter_name = "%s_%s_IPV%d" % (groupname.upper(), pfn, af)
        filter_contents = fetch_bgpq(filter_name, pf, af, allow_morespecifics=True)
        if "[" in filter_contents:
            ctx["prefix_filter"][filter_name] = { "str": filter_contents, "af": af }
            ctx["prefix_filter_ipv%d" % af] = True
        else:
            log.warning(f"Filter {filter_name} is empty!")
            ctx["prefix_filter_ipv%d" % af] = False
```
First, if a given BGP session is of type _member_, I'll merge the `member[asn]` dictionary
into the `ebgp.group.session[asn]`. I've left out error handling for brevity, but in case the member
YAML file doesn't have an entry for the given ASN, it'll just revert back to being of type _peer_.
I'll use a helper function `ebgp_merge_value()` to walk the YAML hierarchy from the member-data
enriched _session_ to the _group_ and finally to the _ebgp_ scope, looking for the existence of a
key called _prefix_filter_ and defaulting to None in case none was found. With the value of
_prefix_filter_ in hand (in this case `AS-SET-DNET`), I shell out to `bgpq4` for IPv4 and IPv6
respectively. Sometimes, there are no IPv6 prefixes (why must you be like this?!) and sometimes
there are no IPv4 prefixes (welcome to the Internet, kid!)
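The real `ebgp_merge_value()` isn't shown in this article, so the following is only a guess at its shape: a lookup that prefers the most specific scope (session), then falls back to the group, then to the global ebgp scope. The name `merge_value_sketch()` and the sample dictionaries are assumptions made for illustration.

```python
# A stand-in for ebgp_merge_value(), whose real implementation is not shown.
# Assumption: the most specific scope that defines the key wins.
def merge_value_sketch(ebgp, group, session, key, default=None):
    """Look for key in session, then group, then ebgp; else return default."""
    for scope in (session, group, ebgp):
        if isinstance(scope, dict) and key in scope:
            return scope[key]
    return default


ebgp = {"asn": 50869}
group = {"peeringdb_ix": {"id": 2365}}
member_session = {"type": "member", "prefix_filter": "AS-SET-DNET"}
peer_session = {"type": "peer"}

print(merge_value_sketch(ebgp, group, member_session, "prefix_filter"))  # AS-SET-DNET
print(merge_value_sketch(ebgp, group, peer_session, "prefix_filter"))    # None
```

With a lookup like this, a regular peer without a `prefix_filter` anywhere in the hierarchy yields None, and no prefix lists are generated for that session.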
All of this, including the session and group information, is then fed as context to a
Jinja renderer, where I can use them in an _import_ filter like so:
```
{% for plname, pl in (prefix_filter | default({})).items() %}
{{pl.str}}
{% endfor %}
filter ebgp_{{group_name}}_{{their_asn}}_import {
{% if not prefix_filter_ipv4 | default(True) %}
# WARNING: No IPv4 prefix filter found
if (net.type = NET_IP4) then reject;
{% endif %}
{% if not prefix_filter_ipv6 | default(True) %}
# WARNING: No IPv6 prefix filter found
if (net.type = NET_IP6) then reject;
{% endif %}
{% for plname, pl in (prefix_filter | default({})).items() %}
{% if pl.af == 4 %}
if (net.type = NET_IP4 && ! (net ~ {{plname}})) then reject;
{% elif pl.af == 6 %}
if (net.type = NET_IP6 && ! (net ~ {{plname}})) then reject;
{% endif %}
{% endfor %}
{% if session_type is defined %}
if ! ebgp_import_{{session_type}}({{their_asn}}) then reject;
{% endif %}
# Add FreeIX Remote: Informational
bgp_large_community.add(({{my_asn}},{{community.large.class.informational+community.large.subclass.router}},{{device.id}})); ## informational.router = {{ device.hostname }}
bgp_large_community.add(({{my_asn}},{{community.large.class.informational+community.large.subclass.country}},{{country[device.country]}})); ## informational.country = {{ device.country }}
{% if group.peeringdb_ix.id %}
bgp_large_community.add(({{my_asn}},{{community.large.class.informational+community.large.subclass.group}},{{group.peeringdb_ix.id}})); ## informational.group = {{ group_name }}
{% endif %}
## NOTE(pim): More comes here, see Member-to-IXP below
accept;
}
```
Let me explain what's going on here, as the Jinja templating language that my generator uses is a bit
... chatty. The first block will print the dictionary of zero or more `prefix_filter` entries. If
the `prefix_filter` context variable doesn't exist, assume it's the empty dictionary and thus,
print no prefix lists.
Then, I create a Bird2 filter and these must each have a globally unique name. I satisfy this
requirement by giving it a name with the tuple of {group, their_asn}. The first thing this filter
does, is inspect `prefix_filter_ipv4` and `prefix_filter_ipv6`, and if they are explicitly set to
False (for example, if a member doesn't have any IRR prefixes associated with their AS-SET), then
I'll reject any prefixes from them. Then, I'll match the prefixes with the `prefix_filter`, if
provided, and reject any prefixes that aren't in the list I'm expecting on this session. Assuming
we're still good to go, I'll hand this prefix off to a function called `ebgp_import_peer()` for
peers and `ebgp_import_member()` for members, both of which ensure BGP communities are scrubbed.
```
function ebgp_import_peer(int remote_as) -> bool
{
# Scrub BGP Communities (RFC 7454 Section 11)
bgp_community.delete([(50869, *)]);
bgp_large_community.delete([(50869, *, *)]);
# Scrub BLACKHOLE community
bgp_community.delete((65535, 666));
return ebgp_import(remote_as);
}
function ebgp_import_member(int remote_as) -> bool
{
# We scrub only our own (informational, permissions) BGP Communities for members
bgp_large_community.delete([(50869,1000..2999,*)]);
return ebgp_import(remote_as);
}
```
After scrubbing the communities (peers are not allowed to set _any_ communities, and members are not
allowed to set their own informational or permissions communities, but they are allowed to inhibit
themselves or prepend, if they wish), one last check is performed by calling the underlying
`ebgp_import()`:
```
function ebgp_import(int remote_as) -> bool
{
if aspath_bogon() then return false;
if (net.type = NET_IP4 && ipv4_bogon()) then return false;
if (net.type = NET_IP6 && ipv6_bogon()) then return false;
if (net.type = NET_IP4 && ipv4_rpki_invalid()) then return false;
if (net.type = NET_IP6 && ipv6_rpki_invalid()) then return false;
# Graceful Shutdown (https://www.rfc-editor.org/rfc/rfc8326.html)
if (65535, 0) ~ bgp_community then bgp_local_pref = 0;
return true;
}
```
Here, belt-and-suspenders checks are performed, notably bogon AS Paths, IPv4/IPv6 prefixes and RPKI
invalids are filtered out. If the prefix has the well-known community for [[BGP Graceful
Shutdown](https://www.rfc-editor.org/rfc/rfc8326.html)], honor it and set the local preference to
zero (making sure to prefer any other available path).
OK, after all these checks are done, I am finally ready to accept the prefix from this peer or
member. It's time to add the informational communities based on the _router_id_, the router's
_country_id_ and (if this is a session at a public internet exchange point documented in PeeringDB),
the group's _ixp_id_.
#### Ingress Example: member
Here's what the rendered template looks like for Antonios' member session at CHIX:
```
# bgpq4 -Ab4 -R 32 -l 'define CHIX_AS_SET_DNET_IPV4' AS-SET-DNET
define CHIX_AS_SET_DNET_IPV4 = [
44.31.27.0/24{24,32}, 44.154.130.0/24{24,32}, 44.154.132.0/24{24,32},
147.189.216.0/21{21,32}, 193.5.16.0/22{22,32}, 212.46.55.0/24{24,32}
];
# bgpq4 -Ab6 -R 128 -l 'define CHIX_AS_SET_DNET_IPV6' AS-SET-DNET
define CHIX_AS_SET_DNET_IPV6 = [
2001:678:f5c::/48{48,128}, 2a05:dfc1:9174::/48{48,128}, 2a06:9f81:2500::/40{40,128},
2a06:9f81:2600::/40{40,128}, 2a0a:6044:7100::/40{40,128}, 2a0c:2f04:100::/40{40,128},
2a0d:3dc0::/29{29,128}, 2a12:bc0::/29{29,128}
];
filter ebgp_chix_210312_import {
if (net.type = NET_IP4 && ! (net ~ CHIX_AS_SET_DNET_IPV4)) then reject;
if (net.type = NET_IP6 && ! (net ~ CHIX_AS_SET_DNET_IPV6)) then reject;
if ! ebgp_import_member(210312) then reject;
# Add FreeIX Remote: Informational
bgp_large_community.add((50869,1010,1)); ## informational.router = chrma0.free-ix.net
bgp_large_community.add((50869,1020,1)); ## informational.country = CH
bgp_large_community.add((50869,1030,2365)); ## informational.group = chix
## NOTE(pim): More comes here, see Member-to-IXP below
accept;
}
```
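The filter names and the `bgpq4` command lines in the comments of this rendered output follow directly from the group name, the AS-SET and the address family. The sketch below rebuilds them; it is not the real `fetch_bgpq()` (which isn't shown), and the helper names `filter_name()` and `bgpq4_argv()` are made up. It only constructs the argument list and does not execute `bgpq4`.

```python
import shlex


def filter_name(groupname, as_set, af):
    """Build the Bird prefix-list name, e.g. CHIX_AS_SET_DNET_IPV4."""
    pfn = as_set.replace("-", "_").replace(":", "_")
    return "%s_%s_IPV%d" % (groupname.upper(), pfn, af)


def bgpq4_argv(name, as_set, af, allow_morespecifics=True):
    """Build the bgpq4 command line seen in the rendered comments."""
    argv = ["bgpq4", "-Ab%d" % af]  # aggregate, Bird output, IPv4 or IPv6
    if allow_morespecifics:
        # Allow more-specifics up to a host route: /32 for IPv4, /128 for IPv6
        argv += ["-R", "32" if af == 4 else "128"]
    argv += ["-l", "define %s" % name, as_set]
    return argv


name = filter_name("chix", "AS-SET-DNET", 4)
print(shlex.join(bgpq4_argv(name, "AS-SET-DNET", 4)))
# bgpq4 -Ab4 -R 32 -l 'define CHIX_AS_SET_DNET_IPV4' AS-SET-DNET
```

Passing the arguments as a list (for example to `subprocess.run()`) rather than as one shell string avoids quoting problems with AS-SET names that contain colons.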
#### Ingress Example: peer
For completeness, here's a regular peer Cloudflare at CHIX, and I hope you agree that the Jinja
template renders down to something waaaay more readable now:
```
filter ebgp_chix_13335_import {
if ! ebgp_import_peer(13335) then reject;
# Add FreeIX Remote: Informational
bgp_large_community.add((50869,1010,1)); ## informational.router = chrma0.free-ix.net
bgp_large_community.add((50869,1020,1)); ## informational.country = CH
bgp_large_community.add((50869,1030,2365)); ## informational.group = chix
accept;
}
```
Most sessions will actually look like this one: just learning prefixes, scrubbing inbound
communities that are nobody's business to be setting but mine, tossing weird prefixes like bogons
and then setting typically the three informational communities. I now know exactly which prefixes
are picked up at group CHIX, which ones in country Switzerland, and which ones on router `chrma0`.
### Egress: Propagating Prefixes
And with that, I've completed the 'learning' part. Let me move to the 'propagating' part. A design
goal of FreeIX Remote is to have _symmetric_ propagation. In my example above, member M1 should have
its prefixes announced at IXP2 and IXP3, and all prefixes learned at IXP2 and IXP3 should be
announced to member M1.
First, let me create a helper function in the generator. Its job is to take the symbolic
`member.*.permission` and `member.*.inhibit` lists and resolve them into a structure of numeric
values suitable for BGP community list adding and matching. It's a bit of a beast, but I've
simplified it a bit. Notably, I've removed all the error and exception handling for brevity:
```
def parse_member_communities(data, asn, type):
    myasn = data["ebgp"]["asn"]
    cls = data["ebgp"]["community"]["large"]["class"]
    sub = data["ebgp"]["community"]["large"]["subclass"]

    bgp_cl = []
    member = data["member"][asn]
    # type is "permission" or "inhibit"; default handling is left out here
    perms = member.get(type, [])
    for perm in perms:
        if perm == "all":
            el = { "class": int(cls[type]), "subclass": int(sub["all"]),
                   "value": 0, "description": f"{type}.all" }
            return [el]
        k, v = perm.split(":")
        if k == "country":
            country_id = data["country"][v]
            el = { "class": int(cls[type]), "subclass": int(sub["country"]),
                   "value": int(country_id), "description": f"{type}.{k} = {v}" }
            bgp_cl.append(el)
        elif k == "asn":
            el = { "class": int(cls[type]), "subclass": int(sub["asn"]),
                   "value": int(v), "description": f"{type}.{k} = {v}" }
            bgp_cl.append(el)
        elif k == "router":
            device_id = data["_devices"][v]["id"]
            el = { "class": int(cls[type]), "subclass": int(sub["router"]),
                   "value": int(device_id), "description": f"{type}.{k} = {v}" }
            bgp_cl.append(el)
        elif k == "group":
            group = data["ebgp"]["groups"][v]
            if isinstance(group["peeringdb_ix"], dict):
                ix_id = group["peeringdb_ix"]["id"]
            else:
                ix_id = group["peeringdb_ix"]
            el = { "class": int(cls[type]), "subclass": int(sub["group"]),
                   "value": int(ix_id), "description": f"{type}.{k} = {v}" }
            bgp_cl.append(el)
        else:
            log.warning(f"No implementation for {type} subclass '{k}' for member AS{asn}, skipping")
    return bgp_cl
```
The essence of this function is to take a human readable list of symbols, like 'router:chrma0' and
look up what subclass is called 'router' and what router_id is 'chrma0'. It does this for keywords
'router', 'country', 'group' and 'asn' and for a special keyword called 'all' as well.
Running this function on Antonios' member data above would reveal the following:
```
Member 210312 has permissions:
[{'class': 2000, 'subclass': 10, 'value': 1, 'description': 'permission.router = chrma0'}]
Member 210312 has inhibits:
[{'class': 3000, 'subclass': 30, 'value': 2365, 'description': 'inhibit.group = chix'}]
```
The neat thing about this is, that this data will come in handy for _both_ types of propagation, and
the `parse_member_communities()` helper function returns pretty readable data, which will help in
debugging and further understanding the ultimately generated configuration.
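As a quick illustration of how these elements map onto the wire format, each one collapses into a large community of the form (asn, class+subclass, value). The helper name `to_large_community()` is hypothetical; the input is the output shown above.

```python
# Illustrative only: turn the elements returned by parse_member_communities()
# into the large community tuples that end up in the Bird configuration.
MY_ASN = 50869


def to_large_community(el, my_asn=MY_ASN):
    """Collapse class and subclass into the second field of the community."""
    return (my_asn, el["class"] + el["subclass"], el["value"])


permissions = [{"class": 2000, "subclass": 10, "value": 1,
                "description": "permission.router = chrma0"}]
inhibits = [{"class": 3000, "subclass": 30, "value": 2365,
             "description": "inhibit.group = chix"}]

for el in permissions + inhibits:
    print(to_large_community(el), "##", el["description"])
# (50869, 2010, 1) ## permission.router = chrma0
# (50869, 3030, 2365) ## inhibit.group = chix
```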
#### Egress: Member-to-IXP
OK, when I learned Antonios' prefixes, I have instructed the system to propagate them to all
sessions on router `chrma0`, except sessions on group `chix`. This means that in the direction of
_from AS50869 to others_, I can do the following:
**1. Tag permissions and inhibits on ingress**
I add a tiny bit of logic using this data structure I just created above. In the import filter,
remember I added `NOTE(pim): More comes here`? After setting the informational communities, I also
add these:
```
{% if session_type == "member" %}
{% if permissions %}
# Add FreeIX Remote: Permission
{% for el in permissions %}
bgp_large_community.add(({{my_asn}},{{el.class+el.subclass}},{{el.value}})); ## {{ el.description }}
{% endfor %}
{% endif %}
{% if inhibits %}
# Add FreeIX Remote: Inhibit
{% for el in inhibits %}
bgp_large_community.add(({{my_asn}},{{el.class+el.subclass}},{{el.value}})); ## {{ el.description }}
{% endfor %}
{% endif %}
{% endif %}
```
Seeing as this block only gets rendered if the session type is _member_, let me show you what
Antonios' import filter looks like in its full glory:
```
filter ebgp_chix_210312_import {
if (net.type = NET_IP4 && ! (net ~ CHIX_AS_SET_DNET_IPV4)) then reject;
if (net.type = NET_IP6 && ! (net ~ CHIX_AS_SET_DNET_IPV6)) then reject;
if ! ebgp_import_member(210312) then reject;
# Add FreeIX Remote: Informational
bgp_large_community.add((50869,1010,1)); ## informational.router = chrma0.free-ix.net
bgp_large_community.add((50869,1020,1)); ## informational.country = CH
bgp_large_community.add((50869,1030,2365)); ## informational.group = chix
# Add FreeIX Remote: Permission
bgp_large_community.add((50869,2010,1)); ## permission.router = chrma0
# Add FreeIX Remote: Inhibit
bgp_large_community.add((50869,3030,2365)); ## inhibit.group = chix
accept;
}
```
Remember, the `ebgp_import_member()` helper will strip any informational (the 1000s) and permissions
(the 2000s), but it would allow Antonios to set inhibits and prepends (the 3000s) so these BGP
communities will still be allowed in. In other words, Antonios can't give himself propagation rights
(sorry, buddy!) but if he would like to make AS50869 stop sending his prefixes to, say, CommunityIX,
he could simply add the BGP community `(50869,3030,2013)` on his announcements, and that will get
honored. If he'd like AS50869 to prepend itself twice before announcing to peer AS8298, he could set
`(50869,3200,8298)` and that will also get picked up.
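To see the difference between the two scrubbing functions, here is a small Python model of what survives ingress. It mirrors the Bird `delete()` patterns in `ebgp_import_peer()` and `ebgp_import_member()` for large communities only; the function names and the third-party community `(64512,1,2)` are made up for the example.

```python
# A model of the ingress scrubbing rules, not the Bird implementation itself.
MY_ASN = 50869


def scrub_member(large_communities, my_asn=MY_ASN):
    """Members: drop only our informational (1000s) and permission (2000s) classes."""
    return {c for c in large_communities
            if not (c[0] == my_asn and 1000 <= c[1] <= 2999)}


def scrub_peer(large_communities, my_asn=MY_ASN):
    """Peers: drop every large community in our own namespace."""
    return {c for c in large_communities if c[0] != my_asn}


received = {
    (50869, 2000, 0),     # attempt to self-grant 'permission.all'
    (50869, 3030, 2013),  # inhibit towards CommunityIX
    (50869, 3200, 8298),  # prepend twice towards AS8298
    (64512, 1, 2),        # someone else's community (hypothetical), left alone
}
print(sorted(scrub_member(received)))
# [(50869, 3030, 2013), (50869, 3200, 8298), (64512, 1, 2)]
print(sorted(scrub_peer(received)))
# [(64512, 1, 2)]
```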
**2. Match permissions and inhibits on egress**
Now that all of Antonios' prefixes are tagged with permissions and inhibits, I can reveal how I
implemented the export filters for AS50869:
```
function member_prefix(int group) -> bool
{
bool permitted = false;
if (({{ebgp.asn}}, {{ebgp.community.large.class.permission+ebgp.community.large.subclass.all}}, 0) ~ bgp_large_community ||
({{ebgp.asn}}, {{ebgp.community.large.class.permission+ebgp.community.large.subclass.router}}, {{ device.id }}) ~ bgp_large_community ||
({{ebgp.asn}}, {{ebgp.community.large.class.permission+ebgp.community.large.subclass.country}}, {{ country[device.country] }}) ~ bgp_large_community ||
({{ebgp.asn}}, {{ebgp.community.large.class.permission+ebgp.community.large.subclass.group}}, group) ~ bgp_large_community) then {
permitted = true;
}
if (({{ebgp.asn}}, {{ebgp.community.large.class.inhibit+ebgp.community.large.subclass.all}}, 0) ~ bgp_large_community ||
({{ebgp.asn}}, {{ebgp.community.large.class.inhibit+ebgp.community.large.subclass.router}}, {{ device.id }}) ~ bgp_large_community ||
({{ebgp.asn}}, {{ebgp.community.large.class.inhibit+ebgp.community.large.subclass.country}}, {{ country[device.country] }}) ~ bgp_large_community ||
({{ebgp.asn}}, {{ebgp.community.large.class.inhibit+ebgp.community.large.subclass.group}}, group) ~ bgp_large_community) then {
permitted = false;
}
return (permitted);
}
function valid_prefix(int group) -> bool
{
return (source_prefix() || member_prefix(group));
}
function ebgp_export_peer(int remote_as; int group) -> bool
{
if (source != RTS_BGP && source != RTS_STATIC) then return false;
if !valid_prefix(group) then return false;
bgp_community.delete([(50869, *)]);
bgp_large_community.delete([(50869, *, *)]);
return ebgp_export(remote_as);
}
```
From the bottom, the function `ebgp_export_peer()` is invoked on each peering session, and it gets
the argument of the remote AS (for example 13335 for Cloudflare), and the group (for example 2365
for CHIX). The function ensures that it's either a _static_ route or a _BGP_ route. Then it makes
sure it's a `valid_prefix()` for the group.
The `valid_prefix()` function first checks if it's one of our own (as in: AS50869's own) prefixes,
which it does by calling `source_prefix()`, which I've omitted here as it would be a distraction.
All it does is check if the prefix is in a static prefix list generated with `bgpq4` for AS50869
itself. The more interesting observation is that to be eligible, the prefix needs to be either
`source_prefix()` **or** `member_prefix(group)`.
The propagation decision for 'Member-to-IXP' actually happens in that `member_prefix()` function. It
starts off by assuming the prefix is not permitted. Then it scans all relevant _permissions_
communities which may be present in the RIB for this prefix:
- is the `all` permissions community `(50869,2000,0)` set?
- what about the `router` permission `(50869,2010,R)` for my _router_id_?
- perhaps the `country` permission `(50869,2020,C)` for my _country_id_?
- or maybe the `group` permission `(50869,2030,G)` for the _ixp_id_ that this session lives on?
If any of these conditions are true, then this prefix _might_ be permitted, so I set the variable to
True. Next, I check and see if any of the _inhibit_ communities are set, either by me (in
`members.yaml`) or by the member on the live BGP session. If any one of them matches, then I flip
the variable to False again. Once the verdict is known, I can return True or False here, which
makes its way all the way up the call stack and ultimately announces the member prefix on the BGP
session, or not. Slick!
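The same decision can be modelled in a few lines of Python, which makes it easy to reason about a given prefix without a running router. This is a sketch of the logic in `member_prefix()`, not a replacement for it; the router and country identifiers are passed in as arguments here, whereas the Bird function has them rendered in by the template, and the second router (id 2, country 2) is an assumed example.

```python
# A Python model of the Bird function member_prefix(): permit if any permission
# community matches this router/country/group, then let any inhibit veto it.
MY_ASN = 50869
PERMISSION, INHIBIT = 2000, 3000
ALL, ROUTER, COUNTRY, GROUP = 0, 10, 20, 30


def member_prefix(communities, router_id, country_id, group):
    def matches(cls):
        candidates = {
            (MY_ASN, cls + ALL, 0),
            (MY_ASN, cls + ROUTER, router_id),
            (MY_ASN, cls + COUNTRY, country_id),
            (MY_ASN, cls + GROUP, group),
        }
        return bool(candidates & set(communities))

    permitted = matches(PERMISSION)
    if matches(INHIBIT):
        permitted = False  # an inhibit always wins over a permission
    return permitted


# Antonios' prefixes as tagged by his import filter at CHIX on chrma0
antonios = {(50869, 1010, 1), (50869, 1020, 1), (50869, 1030, 2365),
            (50869, 2010, 1), (50869, 3030, 2365)}

print(member_prefix(antonios, router_id=1, country_id=1, group=2013))  # True
print(member_prefix(antonios, router_id=1, country_id=1, group=2365))  # False
print(member_prefix(antonios, router_id=2, country_id=2, group=2013))  # False
```

The three calls cover a CommunityIX session on `chrma0` (permitted), a CHIX session on `chrma0` (inhibited), and a session on a different router for which no permission was granted.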
#### Egress: IXP-to-Member
At this point, members' prefixes get announced at the correct internet exchange points, but I need to
satisfy one more requirement: the prefixes picked up at those IXPs, should _also_ be announced to
members. For this, the helper dictionary with permissions and inhibits can be used in a clever way.
What if I held them against the informational communities? For example, I have _permitted_
Antonios to be announced at any IXP connected to router `chrma0`, then all prefixes I learned at
`chrma0` are fair game, right? But, I configured an _inhibit_ for Antonios' prefixes at CHIX. No
problem, I have an informational community for all prefixes I learned from the CHIX group!
I come to the realization that IXP-to-Member simply adds to the Member-to-IXP logic. Everything that
I would announce to a peer, I will also announce to a member. Off I go, adding one last helper
function to the BGP session Jinja template:
```
{% if session_type == "member" %}
function ebgp_export_{{group_name}}_{{their_asn}}(int remote_as; int group) -> bool
{
bool permitted = false;
if (source != RTS_BGP && source != RTS_STATIC) then return false;
if valid_prefix(group) then return ebgp_export(remote_as);
{% for el in permissions | default([]) %}
if (bgp_large_community ~ [({{ my_asn }},{{ 1000+el.subclass}},{% if el.value == 0%}*{% else %}{{el.value}}{% endif %})]) then permitted=true; ## {{el.description}}
{% endfor %}
{% for el in inhibits | default([]) %}
if (bgp_large_community ~ [({{ my_asn }},{{ 1000+el.subclass}},{% if el.value == 0%}*{% else %}{{el.value}}{% endif %})]) then permitted=false; ## {{el.description}}
{% endfor %}
if (permitted) then return ebgp_export(remote_as);
return false;
}
{% endif %}
```
Note that in essence, this new function still calls `valid_prefix()`, which in turn calls
`source_prefix()` **or** `member_prefix(group)`, so it announces the same prefixes that are also
announced to sessions of type 'peer'. But then, I'll also inspect the _informational_ communities,
where the value of `0` is replaced with a wildcard, because 'permit or inhibit all' would mean
'match any of these BGP communities'. This template renders as follows for Antonios at CHIX:
```
function ebgp_export_chix_210312(int remote_as; int group) -> bool
{
bool permitted = false;
if (source != RTS_BGP && source != RTS_STATIC) then return false;
if valid_prefix(group) then return ebgp_export(remote_as);
if (bgp_large_community ~ [(50869,1010,1)]) then permitted=true; ## permission.router = chrma0
if (bgp_large_community ~ [(50869,1030,2365)]) then permitted=false; ## inhibit.group = chix
if (permitted) then return ebgp_export(remote_as);
return false;
}
```
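A Python model of the informational-community part of this function shows the symmetry: the member's permission and inhibit elements are held against the _informational_ class (1000) of each learned prefix. This sketch deliberately leaves out the `valid_prefix()` shortcut and the final `ebgp_export()` checks; the function name `export_to_member()` is made up, and the router id 2 and country id 2 in the last call are assumed values.

```python
# A model of the IXP-to-Member decision: compare a prefix's informational
# communities against the member's permission and inhibit elements.
MY_ASN = 50869
INFORMATIONAL = 1000


def export_to_member(communities, permissions, inhibits, my_asn=MY_ASN):
    def matches(el):
        want = (my_asn, INFORMATIONAL + el["subclass"])
        # As in the template, a value of 0 acts as a wildcard on the third field.
        return any(c[:2] == want and (el["value"] == 0 or c[2] == el["value"])
                   for c in communities)

    permitted = any(matches(el) for el in permissions)
    if any(matches(el) for el in inhibits):
        permitted = False
    return permitted


permissions = [{"subclass": 10, "value": 1}]   # permission.router = chrma0
inhibits = [{"subclass": 30, "value": 2365}]   # inhibit.group = chix

learned_communityix = {(50869, 1010, 1), (50869, 1020, 1), (50869, 1030, 2013)}
learned_chix = {(50869, 1010, 1), (50869, 1020, 1), (50869, 1030, 2365)}
learned_elsewhere = {(50869, 1010, 2), (50869, 1020, 2)}

print(export_to_member(learned_communityix, permissions, inhibits))  # True
print(export_to_member(learned_chix, permissions, inhibits))         # False
print(export_to_member(learned_elsewhere, permissions, inhibits))    # False
```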
## Results
With this, the propagation logic is complete. Announcements are _symmetric_, that is to say the function
`ebgp_export_chix_210312()` sees to it that Antonios gets the prefixes learned at router `chrma0`
but not those learned at group `CHIX`. Similarly, the `ebgp_export_peer()` ensures that Antonios'
prefixes are propagated to any session at router `chrma0` except those sessions at group `CHIX`.
{{< image width="200px" float="right" src="/assets/vpp/fdio-color.svg" alt="VPP" >}}
I have installed VPP with [[OSPFv3]({{< ref 2024-06-22-vpp-ospf-2.md >}})] unnumbered interfaces,
so each router has exactly one IPv4 and IPv6 loopback address. The router in Rümlang has been
operational for a while, the ones in Amsterdam (nlams0.free-ix.net) and Thessaloniki
(grskg0.free-ix.net) have been deployed and are connecting to IXPs now, and the one in Milan
(itmil0.free-ix.net) has been installed but is pending physical deployment at Caldara.
I deployed a test setup with a few permissions and inhibits on the Rümlang router, with many thanks
to Jurrian, Sam and Antonios for allowing me to guinea-pig-ize their member sessions. With the
following test configuration:
```
member:
  35202:
    description: OnTheGo (Sam Aschwanden)
    prefix_filter: AS-OTG
    permission: [ router:chrma0 ]
    inhibit: [ group:comix ]
  210312:
    description: DaKnObNET
    prefix_filter: AS-SET-DNET
    permission: [ router:chrma0 ]
    inhibit: [ group:chix ]
  212635:
    description: Jurrian van Iersel
    prefix_filter: AS212635:AS-212635
    permission: [ router:chrma0 ]
    inhibit: [ group:chix, group:fogixp ]
```
I can see the following prefix learn/announce counts towards _members_:
```
pim@chrma0:~$ for i in $(birdc show protocol | grep member | cut -f1 -d' '); do echo -n $i\ ; birdc show protocol all $i | grep Routes; done
chix_member_35202_ipv4_1 2 imported, 0 filtered, 159984 exported, 0 preferred
chix_member_35202_ipv6_1 2 imported, 0 filtered, 61730 exported, 0 preferred
chix_member_210312_ipv4_1 3 imported, 0 filtered, 3518 exported, 3 preferred
chix_member_210312_ipv6_1 2 imported, 0 filtered, 1251 exported, 2 preferred
comix_member_35202_ipv4_1 2 imported, 0 filtered, 159981 exported, 2 preferred
comix_member_35202_ipv4_2 2 imported, 0 filtered, 159981 exported, 1 preferred
comix_member_35202_ipv6_1 2 imported, 0 filtered, 61727 exported, 2 preferred
comix_member_35202_ipv6_2 2 imported, 0 filtered, 61727 exported, 1 preferred
fogixp_member_212635_ipv4_1 1 imported, 0 filtered, 442 exported, 1 preferred
fogixp_member_212635_ipv6_1 14 imported, 0 filtered, 181 exported, 14 preferred
freeix_ch_member_210312_ipv4_1 3 imported, 0 filtered, 3521 exported, 0 preferred
freeix_ch_member_210312_ipv6_1 2 imported, 0 filtered, 1253 exported, 0 preferred
```
Let me make a few observations:
* Hurricane Electric AS6939 is present at CHIX, and they tend to announce a very large number of
prefixes. So every member who is permitted (and not inhibited) at CHIX will see all of those: Sam's
AS35202 is inhibited on CommunityIX but not on CHIX, and he's permitted on both. That explains why
he is seeing the routes on both sessions.
* I've inhibited Jurrian's AS212635 to/from both CHIX and FogIXP, which means he will be seeing
CommunityIX (~245 IPv4, 85 IPv6 prefixes), and FreeIX CH (~173 IPv4 and ~60 IPv6). We also send him
the member prefixes, which is about 35 or so additional prefixes. This explains why Jurrian is
receiving from us ~440 IPv4 and ~180 IPv6.
* Antonios' AS210312, the exemplar in this article, is receiving all-but-CHIX. FogIXP yields 3077
or so IPv4 and 1056 IPv6 prefixes, while I've already added up FreeIX, CommunityIX, and our members
(this is what we're sending Jurrian!), at ~440 and ~180 respectively, so Antonios should be getting
about 3500 IPv4 prefixes and 1250 IPv6 prefixes.
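The back-of-the-envelope sums in these observations can be re-checked with a few lines of Python. This is only a sanity check under assumptions: the per-exchange figures are the rounded estimates quoted in the text, the "observed" values are the exported counters from the `birdc` output above, and the variable names are made up for illustration.

```python
# Rounded estimates quoted in the observations (IPv4, IPv6).
communityix = (245, 85)
freeix_ch = (173, 60)
members = (35, 35)          # "about 35 or so additional prefixes"
fogixp = (3077, 1056)       # preferred counts of the FogIXP peer session

# Exported counters copied from the birdc output above.
observed_jurrian = (442, 181)    # fogixp_member_212635
observed_antonios = (3518, 1251)  # chix_member_210312


def add(*parts):
    """Sum (ipv4, ipv6) tuples element-wise."""
    return tuple(sum(values) for values in zip(*parts))


# Jurrian is inhibited at CHIX and FogIXP, so he gets the rest plus members.
estimate_jurrian = add(communityix, freeix_ch, members)

# Antonios is inhibited only at CHIX: what Jurrian gets, plus FogIXP.
estimate_antonios = add(observed_jurrian, fogixp)

for name, est, obs in (
    ("Jurrian", estimate_jurrian, observed_jurrian),
    ("Antonios", estimate_antonios, observed_antonios),
):
    print(f"{name}: estimated {est}, observed {obs}")
```

The estimates land within a few percent of the counters, which is about as close as rounded inputs and a moving routing table allow.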
In the other direction, I would expect to be announcing to _peers_ only prefixes belonging to either
AS50869 itself, or those of our members:
```
pim@chrma0:~$ for i in $(birdc show protocol | grep peer.*_1 | cut -f1 -d' '); do echo -n $i\ ; birdc
show protocol all $i | grep Routes || echo; done
chix_peer_212100_ipv4_1 57618 imported, 0 filtered, 24 exported, 778 preferred
chix_peer_212100_ipv6_1 21979 imported, 1 filtered, 37 exported, 7186 preferred
chix_peer_13335_ipv4_1 4767 imported, 9 filtered, 24 exported, 4765 preferred
chix_peer_13335_ipv6_1 371 imported, 1 filtered, 37 exported, 369 preferred
chix_peer_6939_ipv4_1 151787 imported, 27 filtered, 24 exported, 133943 preferred
chix_peer_6939_ipv6_1 61191 imported, 6 filtered, 37 exported, 16223 preferred
comix_peer_44596_ipv4_1 594 imported, 0 filtered, 25 exported, 10 preferred
comix_peer_44596_ipv6_1 1147 imported, 0 filtered, 50 exported, 0 preferred
comix_peer_8298_ipv4_1 23 imported, 0 filtered, 25 exported, 0 preferred
comix_peer_8298_ipv6_1 34 imported, 0 filtered, 50 exported, 0 preferred
fogixp_peer_47498_ipv4_1 3286 imported, 1 filtered, 27 exported, 3077 preferred
fogixp_peer_47498_ipv6_1 1838 imported, 0 filtered, 39 exported, 1056 preferred
freeix_ch_peer_51530_ipv4_1 355 imported, 0 filtered, 28 exported, 0 preferred
freeix_ch_peer_51530_ipv6_1 143 imported, 0 filtered, 53 exported, 0 preferred
```
Some observations:
* Nobody is inhibited at FreeIX Switzerland. It stands to reason therefore, that it has the most
exported prefixes: 28 for IPv4 and 53 for IPv6.
* Two members are inhibited at CHIX, which gives it the lowest number of exported prefixes:
24 for IPv4 and 37 for IPv6.
* All peers at each exchange (group) will have the same number of prefixes. I can confirm that
at CHIX, all three peers have the same number of announced prefixes. Similarly, at CommunityIX, all
peers have the same number.
* If Antonios, Sam or Jurrian were to add an additional inhibit BGP community to an announcement
towards AS50869 (e.g. `(50869,3020,1)` to inhibit country Switzerland), they could tweak these numbers.
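The "same number of prefixes per exchange" observation lends itself to an automated check. Below is a small Python sketch that parses lines in the shape shown in the output above and reports any group and address family where the exported counts differ. The function names and the regular expression are my own assumptions about the session naming scheme (`<group>_peer_<asn>_<af>_<n>`), not part of any FreeIX tooling.

```python
import re
from collections import defaultdict

# Assumed session naming: <group>_peer_<asn>_<ipv4|ipv6>_<n>, followed by counters.
LINE = re.compile(
    r"^(?P<group>\w+?)_peer_(?P<asn>\d+)_(?P<af>ipv[46])_\d+\s+.*?(?P<exported>\d+) exported"
)

# A few lines copied from the output above.
SAMPLE = """\
chix_peer_212100_ipv4_1 57618 imported, 0 filtered, 24 exported, 778 preferred
chix_peer_13335_ipv4_1 4767 imported, 9 filtered, 24 exported, 4765 preferred
chix_peer_6939_ipv4_1 151787 imported, 27 filtered, 24 exported, 133943 preferred
comix_peer_44596_ipv6_1 1147 imported, 0 filtered, 50 exported, 0 preferred
comix_peer_8298_ipv6_1 34 imported, 0 filtered, 50 exported, 0 preferred
freeix_ch_peer_51530_ipv4_1 355 imported, 0 filtered, 28 exported, 0 preferred
""".splitlines()


def exported_by_group(lines):
    """Map (group, address family) to the set of exported counts seen."""
    counts = defaultdict(set)
    for line in lines:
        match = LINE.match(line.strip())
        if match:
            counts[(match["group"], match["af"])].add(int(match["exported"]))
    return dict(counts)


def inconsistent_groups(lines):
    """Return the (group, af) pairs whose peers do not all get the same count."""
    return sorted(key for key, seen in exported_by_group(lines).items() if len(seen) > 1)


print(exported_by_group(SAMPLE))
print(inconsistent_groups(SAMPLE))
```

An empty list from `inconsistent_groups()` means every peer within a group received the same number of prefixes, which is what the export logic should produce.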
## What's next
This all adds up. I'd like to test the waters with my friendly neighborhood canaries a little bit,
to make sure that announcements are as expected, and traffic flows where appropriate. In the meantime,
I'll chase the deployment of LSIX, FrysIX, SpeedIX and possibly a few others in Amsterdam. And of
course FreeIX Greece in Thessaloniki. I'll try to get the Milano VPP router deployed (it's already
installed and configured, but currently powered off) and connected to PCIX, MIX and a few others.
## How can you help?
If you're willing to participate with a VPP router and connect it to either multiple local internet
exchanges (like I've demonstrated in Zurich), or better yet, to one or more of the other existing
routers, I would welcome your contribution. [[Contact]({{< ref contact.md >}})] me for details.
A bit further down the pike, a connection from Amsterdam to Zurich, from Zurich to Milan and from
Milan to Thessaloniki is on the horizon. If you are willing and able to donate some bandwidth (point
to point VPWS, VLL, L2VPN) and your transport network is capable of at least 2026 bytes of _inner_
payload, please also [[reach out]({{< ref contact.md >}})] as I'm sure many small network operators
would be thrilled.
