Compare commits
2 Commits
b2129702ae
...
26397d69c6
Author | SHA1 | Date | |
---|---|---|---|
26397d69c6 | |||
388293baef |
@@ -91,7 +91,7 @@ their traffic to these remote internet exchanges.
|
||||
There are two types of BGP neighbor adjacency:
|
||||
|
||||
1. ***Members***: these are {ip-address,AS}-tuples which FreeIX has explicitly configured. Learned prefixes are added
|
||||
to as-set AS50869:AS-MEMBERS. Members receive _all_ prefixes from FreeIX, each annotated with BGP **informational**
|
||||
to as-set AS50869:AS-MEMBERS. Members receive _some or all_ prefixes from FreeIX, each annotated with BGP **informational**
|
||||
communities, and members can drive certain behavior with BGP **action** communities.
|
||||
|
||||
1. ***Peers***: these are all other entities with whom FreeIX has an adjacency at public internet exchanges or private
|
||||
@@ -195,12 +195,12 @@ network interconnects:
|
||||
* `(50869,3020,1)`: Inhibit Action (30XX), Country (3020), Switzerland (1)
|
||||
* `(50869,3030,1308)`: Inhibit Action (30XX), IXP (3030), PeeringDB IXP for LS-IX (1308)
|
||||
|
||||
Further actions can be placed on a per-remote-neighbor basis:
|
||||
Four actions can be placed on a per-remote-asn basis:
|
||||
|
||||
* `(50869,3040,13030)`: Inhibit Action (30XX), AS (3040), Init7 (AS13030)
|
||||
* `(50869,3041,6939)`: Prepend Action (30XX), Prepend Once (3041), Hurricane Electric (AS6939)
|
||||
* `(50869,3042,12859)`: Prepend Action (30XX), Prepend Twice (3042), BIT BV (AS12859)
|
||||
* `(50869,3043,8283)`: Prepend Action (30XX), Prepend Three Times (3043), Coloclue (AS8283)
|
||||
* `(50869,3100,6939)`: Prepend Once Action (3100), Hurricane Electric (AS6939)
|
||||
* `(50869,3200,12859)`: Prepend Twice Action (3200), BIT BV (AS12859)
|
||||
* `(50869,3300,8283)`: Prepend Thrice Action (3300), Coloclue (AS8283)
|
||||
|
||||
Peers cannot set these actions, as all action communities will be stripped on ingress. Members can set these action
|
||||
communities on their sessions with FreeIX routers, however in some cases they may also be set by FreeIX operators when
|
||||
|
778
content/articles/2024-10-21-freeix-2.md
Normal file
@@ -0,0 +1,778 @@
|
||||
---
|
||||
date: "2024-10-21T10:52:11Z"
|
||||
title: "FreeIX - Remote, part 2"
|
||||
---
|
||||
|
||||
{{< image width="18em" float="right" src="/assets/freeix/freeix-artist-rendering.png" alt="FreeIX, Artists Rendering" >}}
|
||||
|
||||
# Introduction
|
||||
|
||||
A few months ago, I wrote about [[an idea]({{< ref 2024-04-27-freeix-1.md >}})] to help boost the
|
||||
value of small Internet Exchange Points (_IXPs_). When such an exchange doesn't have many members,
|
||||
then the operational costs of connecting to it (cross connects, router ports, finding peers, etc)
|
||||
are not very favorable.
|
||||
|
||||
Clearly, the benefit of using an Internet Exchange is to reduce the portion of an ISP’s (and CDN’s)
|
||||
traffic that must be delivered via their upstream transit providers, thereby reducing the average
|
||||
per-bit delivery cost as well as the end-to-end latency as seen by their users or
|
||||
customers. Furthermore, the increased number of paths available through the IXP improves routing
|
||||
efficiency and fault-tolerance, and at the same time it avoids traffic going the scenic route to a
|
||||
large hub like Frankfurt, London, Amsterdam, Paris or Rome, if it could very well remain local.
|
||||
|
||||
## Refresher: FreeIX Remote
|
||||
|
||||
{{< image width="20em" float="right" src="/assets/freeix/Free IX Remote.svg" alt="FreeIX Remote" >}}
|
||||
|
||||
Let's take for example the [[Free IX in Greece](https://free-ix.gr/)] that was announced at GRNOG16
|
||||
in Athens on April 19th, 2024. This exchange initially targets Athens and Thessaloniki, with 2x100G
|
||||
between the two cities. Members can connect to either site for the cost of only a cross connect.
|
||||
The 1G/10G/25G ports will be _Gratis_, so please make sure to apply if you're in this region! I
|
||||
myself have connected one very special router to Free IX Greece, which will be offering an outreach
|
||||
infrastructure by connecting to _other_ Internet Exchange Points in Amsterdam, and allowing all FreeIX
|
||||
Greece members to benefit from that in the following way:
|
||||
|
||||
1. FreeIX Remote uses AS50869 to peer with any network operator (or routeserver) available at public
|
||||
Internet Exchange Points or using private interconnects. For these peers, it looks like a completely
|
||||
normal service provider in this regard. It will connect to internet exchange points, and learn a bunch of
|
||||
routes and announce other routes.
|
||||
|
||||
1. FreeIX Remote _members_ can join the program, after which they are granted certain propagation
|
||||
permissions by FreeIX Remote at the point where they have a BGP session with AS50869. The prefixes
|
||||
learned on these _member_ sessions are marked as such, and will be allowed to propagate. Members
|
||||
will receive some or all learned prefixes from AS50869.
|
||||
|
||||
1. FreeIX _members_ can set fine grained BGP communities to determine which of their prefixes are
|
||||
propagated to and from which locations, by router, country or Internet Exchange Point.
|
||||
|
||||
Members at smaller internet exchange points greatly benefit from this type of outreach, by receiving large
|
||||
portions of the public internet directly at their preferred peering location. The _Free IX Remote_
|
||||
routers will carry member traffic to and from these remote Internet Exchange Points. My [[previous
|
||||
article]({{< ref 2024-04-27-freeix-1.md >}})] went into a good amount of detail on the principles of
|
||||
operation, but back then I made a promise to come back to the actual _implementation_ of such a
|
||||
complex routing topology. As a starting point, I work with the structure I shared in [[IPng's
|
||||
Routing Policy]({{< ref 2021-11-14-routing-policy.md >}})]. If you haven't read that yet, I think
|
||||
it may make sense to take a look as many of the structural elements and concepts will be similar.
|
||||
|
||||
## Implementation
|
||||
|
||||
The routing policy calls for three classes of (large) BGP communities: informational, permission and
|
||||
inhibit. It also defines a few classic BGP communities, but I'll skip over those as they are not
|
||||
very interesting. Firstly, I will use the _informational_ communities to tag which prefixes were
|
||||
learned by which _router_, in which _country_ and at which internet exchange point, which I will call a
|
||||
_group_.
|
||||
|
||||
Then, I will use the same structure to grant members _permissions_, that is to say, when AS50869
|
||||
learns their prefixes, they will get tagged with specific action communities that enable propagation
|
||||
to other places. I will call this 'Member-to-IXP'. Sometimes, I'd like to be able to _inhibit_
|
||||
propagation of 'Member-to-IXP', so there will be a third set of communities that perform this
|
||||
function. Finally, matching on the informational communities in a clever way will enable a symmetric
|
||||
'IXP-to-Member' propagation.
|
||||
|
||||
To help structure this implementation, it helps if I think about it in
|
||||
the following way:
|
||||
|
||||
Let's say, AS50869 is connected to IXP1, IXP2, IXP3 and IXP4. AS50869 has a _member_ called M1 at
|
||||
IXP1, and that member is 'permitted' to reach IXP2 and IXP3, but it is 'inhibited' from reaching
|
||||
IXP4. My _FreeIX Remote_ implementation now has to satisfy three main requirements:
|
||||
|
||||
1. **Ingress**: learn prefixes (from peers and members alike) at internet exchange points or
|
||||
private network interconnects, and 'tag' them with the correct informational communities.
|
||||
1. **Egress: Member-to-IXP**: Announce M1's prefixes to IXP2 and IXP3, but not to IXP4.
|
||||
1. **Egress: IXP-to-Member**: Announce IXP2's and IXP3's prefixes to M1, but not IXP4's.
|
||||
|
||||
### Defining Countries and Routers
|
||||
|
||||
I'll start by giving each country which has at least one router a unique _country_id_ in a YAML
|
||||
file, leaving the value 0 to mean 'all' countries:
|
||||
|
||||
```
$ cat config/common/countries.yaml
country:
  all: 0
  CH: 1
  NL: 2
  GR: 3
  IT: 4
```
|
||||
|
||||
Each router has its own configuration file, and at the top, I'll define some metadata which
|
||||
includes things like the country in which it operates, and its own unique _router_id_, like so:
|
||||
|
||||
```
$ cat config/chrma0.net.free-ix.net.yaml
device:
  id: 1
  hostname: chrma0.free-ix.net
  shortname: chrma0
  country: CH
  loopbacks:
    ipv4: 194.126.235.16
    ipv6: "2a0b:dd80:3101::"
  location: "Hofwiesenstrasse, Ruemlang, Zurich, Switzerland"
  ...
```
|
||||
|
||||
### Defining communities
|
||||
|
||||
Next, I define the BGP communities in `class` and `subclass` types, in the following YAML structure:
|
||||
|
||||
```
ebgp:
  community:
    legacy:
      noannounce: 0
      blackhole: 666
      inhibit: 3000
      prepend1: 3100
      prepend2: 3200
      prepend3: 3300
    large:
      class:
        informational: 1000
        permission: 2000
        inhibit: 3000
        prepend1: 3100
        prepend2: 3200
        prepend3: 3300
      subclass:
        all: 0
        router: 10
        country: 20
        group: 30
        asn: 40
```
|
||||
|
||||
### Defining Members
|
||||
|
||||
In order to keep this system manageable, I have to rely on automation. I intend to leverage the
|
||||
BGP community _subclasses_ in a simple ACL system consisting of the following YAML, taking my buddy
|
||||
Antonios' network as an example:
|
||||
|
||||
```
$ cat config/common/members.yaml
member:
  210312:
    description: DaKnObNET
    prefix_filter: AS-SET-DNET
    permission: [ router:chrma0 ]
    inhibit: [ group:chix ]
  ...
```
|
||||
|
||||
The syntax of the `permission` and `inhibit` fields is identical. They are lists of key:value pairs
|
||||
where the key must be one of the _subclasses_ (eg. 'router', 'country', 'group', 'asn'), and the
|
||||
value appropriate for that type. In this example, AS50869 is being asked to grant permissions for
|
||||
Antonios' prefixes to any peer connected to `router:chrma0`, but inhibit propagation to/from the
|
||||
exchange point called `group:chix`. I could extend this list, for example by adding a permission to
|
||||
`country:NL` or an inhibit to `router:grskg0` and so on.
|
||||
|
||||
I decide that sensible defaults are to give permissions to all, and keep inhibit empty. In other
|
||||
words: be very liberal in propagation, to maximize the value that FreeIX Remote can provide its
|
||||
members.
|
||||
|
||||
### Ingress: Learning Prefixes
|
||||
|
||||
With what I've defined so far, I can start to set informational BGP communities:
|
||||
* The prefixes learned on subclass **router** for `chrma0` will have value of device.id=1:
|
||||
`(50869,1010,1)`
|
||||
* The prefixes learned on subclass **country** for `chrma0` will learn from device.country=CH and
|
||||
be able to look up in `countries['CH']` that this means value 1: `(50869,1020,1)`
|
||||
* When learning prefixes from a given internet exchange, Kees already knows its PeeringDB
|
||||
_ixp_id_, which is a unique value for each exchange point. Thus, subclass **group** for `chrma0` at
|
||||
[[CommunityIX](https://www.peeringdb.com/ix/2013)] is ixp_id=2013: `(50869,1030,2013)`
|
||||
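As a rough sketch of the arithmetic behind these three values: the middle field of each large community is simply class plus subclass. The numbers below are copied from the YAML files above; the helper name `informational_communities` is made up for illustration and is not part of my generator.

```python
# Sketch only: values copied from the YAML above; the function name is hypothetical.
MY_ASN = 50869
CLASS = {"informational": 1000, "permission": 2000, "inhibit": 3000}
SUBCLASS = {"all": 0, "router": 10, "country": 20, "group": 30, "asn": 40}
COUNTRY = {"all": 0, "CH": 1, "NL": 2, "GR": 3, "IT": 4}


def informational_communities(device_id, country_code, ixp_id=None):
    """Return the informational large communities for one BGP session."""
    base = CLASS["informational"]
    communities = [
        (MY_ASN, base + SUBCLASS["router"], device_id),
        (MY_ASN, base + SUBCLASS["country"], COUNTRY[country_code]),
    ]
    # Sessions that are not at a PeeringDB exchange get no group community.
    if ixp_id is not None:
        communities.append((MY_ASN, base + SUBCLASS["group"], ixp_id))
    return communities


# chrma0 (device.id=1, country CH) at CommunityIX (ixp_id=2013)
print(informational_communities(1, "CH", 2013))
```

For `chrma0` at CommunityIX this yields the same three tuples as the bullets above.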
|
||||
#### Ingress: Learning from members
|
||||
|
||||
I need to make sure that members send only the prefixes that I expect from them. To do this, I'll
|
||||
make use of a common tool called [[bgpq4](https://github.com/bgp/bgpq4)] which cobbles together the
|
||||
prefixes belonging to an AS-SET by referencing one or more IRR databases.
|
||||
|
||||
In Python, I'll prepare the Jinja context by generating the prefix filter lists like so:
|
||||
|
||||
```
if session["type"] == "member":
    session = {**session, **data["member"][asn]}

pf = ebgp_merge_value(data["ebgp"], group, session, "prefix_filter", None)
if pf:
    ctx["prefix_filter"] = {}
    pfn = pf
    pfn = pfn.replace("-", "_")
    pfn = pfn.replace(":", "_")

    for af in [4, 6]:
        filter_name = "%s_%s_IPV%d" % (groupname.upper(), pfn, af)
        filter_contents = fetch_bgpq(filter_name, pf, af, allow_morespecifics=True)
        if "[" in filter_contents:
            ctx["prefix_filter"][filter_name] = { "str": filter_contents, "af": af }
            ctx["prefix_filter_ipv%d" % af] = True
        else:
            log.warning(f"Filter {filter_name} is empty!")
            ctx["prefix_filter_ipv%d" % af] = False
```
|
||||
|
||||
First, if a given BGP session is of type _member_, I'll merge the `member[asn]` dictionary
|
||||
into the `ebgp.group.session[asn]`. I've left out error handling for brevity, but in case the member
|
||||
YAML file doesn't have an entry for the given ASN, it'll just revert back to being of type _peer_.
|
||||
|
||||
I'll use a helper function `ebgp_merge_value()` to walk the YAML hierarchy from the member-data
|
||||
enriched _session_ to the _group_ and finally to the _ebgp_ scope, looking for the existence of a
|
||||
key called _prefix_filter_ and defaulting to None in case none was found. With the value of
|
||||
_prefix_filter_ in hand (in this case `AS-SET-DNET`), I shell out to `bgpq4` for IPv4 and IPv6
|
||||
respectively. Sometimes, there are no IPv6 prefixes (why must you be like this?!) and sometimes
|
||||
there are no IPv4 prefixes (welcome to the Internet, kid!)
|
||||
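The body of `ebgp_merge_value()` isn't shown here, so the following is only a guess at the lookup order described above: most specific scope first, then outward. It assumes all three scopes are plain dictionaries, and the sample data is illustrative.

```python
# Sketch only: a guess at the lookup order described above, not the real helper.
def ebgp_merge_value(ebgp, group, session, key, default=None):
    """Return the most specific value for key: session, then group, then ebgp."""
    for scope in (session, group, ebgp):
        if scope and key in scope:
            return scope[key]
    return default


# Illustrative data: a member session enriched with its members.yaml entry.
ebgp = {"asn": 50869}
group = {"peeringdb_ix": {"id": 2365}}
member_session = {"type": "member", "prefix_filter": "AS-SET-DNET"}
peer_session = {"type": "peer"}

print(ebgp_merge_value(ebgp, group, member_session, "prefix_filter", None))
print(ebgp_merge_value(ebgp, group, peer_session, "prefix_filter", None))
```

The member session resolves to its AS-SET, while a peer session without any `prefix_filter` in scope falls through to the default, which is why no prefix lists get rendered for regular peers.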
|
||||
All of this, including the session and group information, is then fed as context to a
|
||||
Jinja renderer, where I can use them in an _import_ filter like so:
|
||||
|
||||
```
|
||||
{% for plname, pl in (prefix_filter | default({})).items() %}
|
||||
{{pl.str}}
|
||||
{% endfor %}
|
||||
|
||||
filter ebgp_{{group_name}}_{{their_asn}}_import {
|
||||
{% if not prefix_filter_ipv4 | default(True) %}
|
||||
# WARNING: No IPv4 prefix filter found
|
||||
if (net.type = NET_IP4) then reject;
|
||||
{% endif %}
|
||||
{% if not prefix_filter_ipv6 | default(True) %}
|
||||
# WARNING: No IPv6 prefix filter found
|
||||
if (net.type = NET_IP6) then reject;
|
||||
{% endif %}
|
||||
{% for plname, pl in (prefix_filter | default({})).items() %}
|
||||
{% if pl.af == 4 %}
|
||||
if (net.type = NET_IP4 && ! (net ~ {{plname}})) then reject;
|
||||
{% elif pl.af == 6 %}
|
||||
if (net.type = NET_IP6 && ! (net ~ {{plname}})) then reject;
|
||||
{% endif %}
|
||||
{% endfor %}
|
||||
{% if session_type is defined %}
|
||||
if ! ebgp_import_{{session_type}}({{their_asn}}) then reject;
|
||||
{% endif %}
|
||||
|
||||
# Add FreeIX Remote: Informational
|
||||
bgp_large_community.add(({{my_asn}},{{community.large.class.informational+community.large.subclass.router}},{{device.id}})); ## informational.router = {{ device.hostname }}
|
||||
bgp_large_community.add(({{my_asn}},{{community.large.class.informational+community.large.subclass.country}},{{country[device.country]}})); ## informational.country = {{ device.country }}
|
||||
{% if group.peeringdb_ix.id %}
|
||||
bgp_large_community.add(({{my_asn}},{{community.large.class.informational+community.large.subclass.group}},{{group.peeringdb_ix.id}})); ## informational.group = {{ group_name }}
|
||||
{% endif %}
|
||||
|
||||
## NOTE(pim): More comes here, see Member-to-IXP below
|
||||
|
||||
accept;
|
||||
}
|
||||
```
|
||||
|
||||
Let me explain what's going on here, as the Jinja templating language that my generator uses is a bit
|
||||
... chatty. The first block will print the dictionary of zero or more `prefix_filter` entries. If
|
||||
the `prefix_filter` context variable doesn't exist, assume it's the empty dictionary and thus,
|
||||
print no prefix lists.
|
||||
|
||||
Then, I create a Bird2 filter and these must each have a globally unique name. I satisfy this
|
||||
requirement by giving it a name with the tuple of {group, their_asn}. The first thing this filter
|
||||
does, is inspect `prefix_filter_ipv4` and `prefix_filter_ipv6`, and if they are explicitly set to
|
||||
False (for example, if a member doesn't have any IRR prefixes associated with their AS-SET), then
|
||||
I'll reject any prefixes from them. Then, I'll match the prefixes with the `prefix_filter`, if
|
||||
provided, and reject any prefixes that aren't in the list I'm expecting on this session. Assuming
|
||||
we're still good to go, I'll hand this prefix off to a function called `ebgp_import_peer()` for
|
||||
peers and `ebgp_import_member()` for members, both of which ensure BGP communities are scrubbed.
|
||||
|
||||
```
|
||||
function ebgp_import_peer(int remote_as) -> bool
|
||||
{
|
||||
# Scrub BGP Communities (RFC 7454 Section 11)
|
||||
bgp_community.delete([(50869, *)]);
|
||||
bgp_large_community.delete([(50869, *, *)]);
|
||||
|
||||
# Scrub BLACKHOLE community
|
||||
bgp_community.delete((65535, 666));
|
||||
|
||||
return ebgp_import(remote_as);
|
||||
}
|
||||
|
||||
function ebgp_import_member(int remote_as) -> bool
|
||||
{
|
||||
# We scrub only our own (informational, permissions) BGP Communities for members
|
||||
bgp_large_community.delete([(50869,1000..2999,*)]);
|
||||
|
||||
return ebgp_import(remote_as);
|
||||
}
|
||||
```
|
||||
|
||||
After scrubbing the communities (peers are not allowed to set _any_ communities, and members are not
|
||||
allowed to set their own informational or permissions communities, but they are allowed to inhibit
|
||||
themselves or prepend, if they wish), one last check is performed by calling the underlying
|
||||
`ebgp_import()`:
|
||||
|
||||
```
|
||||
function ebgp_import(int remote_as) -> bool
|
||||
{
|
||||
if aspath_bogon() then return false;
|
||||
if (net.type = NET_IP4 && ipv4_bogon()) then return false;
|
||||
if (net.type = NET_IP6 && ipv6_bogon()) then return false;
|
||||
|
||||
if (net.type = NET_IP4 && ipv4_rpki_invalid()) then return false;
|
||||
if (net.type = NET_IP6 && ipv6_rpki_invalid()) then return false;
|
||||
|
||||
# Graceful Shutdown (https://www.rfc-editor.org/rfc/rfc8326.html)
|
||||
if (65535, 0) ~ bgp_community then bgp_local_pref = 0;
|
||||
|
||||
return true;
|
||||
}
|
||||
```
|
||||
|
||||
Here, belt-and-suspenders checks are performed, notably bogon AS Paths, IPv4/IPv6 prefixes and RPKI
|
||||
invalids are filtered out. If the prefix has the well-known community for [[BGP Graceful
|
||||
Shutdown](https://www.rfc-editor.org/rfc/rfc8326.html)], honor it and set the local preference to
|
||||
zero (making sure to prefer any other available path).
|
||||
|
||||
OK, after all these checks are done, I am finally ready to accept the prefix from this peer or
|
||||
member. It's time to add the informational communities based on the _router_id_, the router's
|
||||
_country_id_ and (if this is a session at a public internet exchange point documented in PeeringDB),
|
||||
the group's _ixp_id_.
|
||||
|
||||
#### Ingress Example: member
|
||||
|
||||
Here's what the rendered template looks like for Antonios' member session at CHIX:
|
||||
|
||||
```
|
||||
# bgpq4 -Ab4 -R 32 -l 'define CHIX_AS_SET_DNET_IPV4' AS-SET-DNET
|
||||
define CHIX_AS_SET_DNET_IPV4 = [
|
||||
44.31.27.0/24{24,32}, 44.154.130.0/24{24,32}, 44.154.132.0/24{24,32},
|
||||
147.189.216.0/21{21,32}, 193.5.16.0/22{22,32}, 212.46.55.0/24{24,32}
|
||||
];
|
||||
|
||||
# bgpq4 -Ab6 -R 128 -l 'define CHIX_AS_SET_DNET_IPV6' AS-SET-DNET
|
||||
define CHIX_AS_SET_DNET_IPV6 = [
|
||||
2001:678:f5c::/48{48,128}, 2a05:dfc1:9174::/48{48,128}, 2a06:9f81:2500::/40{40,128},
|
||||
2a06:9f81:2600::/40{40,128}, 2a0a:6044:7100::/40{40,128}, 2a0c:2f04:100::/40{40,128},
|
||||
2a0d:3dc0::/29{29,128}, 2a12:bc0::/29{29,128}
|
||||
];
|
||||
|
||||
filter ebgp_chix_210312_import {
|
||||
if (net.type = NET_IP4 && ! (net ~ CHIX_AS_SET_DNET_IPV4)) then reject;
|
||||
if (net.type = NET_IP6 && ! (net ~ CHIX_AS_SET_DNET_IPV6)) then reject;
|
||||
if ! ebgp_import_member(210312) then reject;
|
||||
|
||||
# Add FreeIX Remote: Informational
|
||||
bgp_large_community.add((50869,1010,1)); ## informational.router = chrma0.free-ix.net
|
||||
bgp_large_community.add((50869,1020,1)); ## informational.country = CH
|
||||
bgp_large_community.add((50869,1030,2365)); ## informational.group = chix
|
||||
|
||||
## NOTE(pim): More comes here, see Member-to-IXP below
|
||||
|
||||
accept;
|
||||
}
|
||||
```
|
||||
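The two `bgpq4` command lines in the comments above follow directly from the group name, the AS-SET and the address family. Here is a small sketch that rebuilds them. The function name `bgpq4_argv` is hypothetical and this is not the body of `fetch_bgpq()`; it only reconstructs the argument list and does not run anything.

```python
# Sketch only: fetch_bgpq() is not shown in the article; this merely rebuilds
# the command lines that appear as comments in the rendered output.
def bgpq4_argv(groupname, as_set, af, allow_morespecifics=True):
    """Return (filter_name, argv) for one address family (4 or 6)."""
    pfn = as_set.replace("-", "_").replace(":", "_")
    filter_name = "%s_%s_IPV%d" % (groupname.upper(), pfn, af)
    argv = ["bgpq4", "-Ab%d" % af]
    if allow_morespecifics:
        # -R: also accept more-specifics, up to the host route length
        argv += ["-R", "32" if af == 4 else "128"]
    argv += ["-l", "define " + filter_name, as_set]
    return filter_name, argv


for af in (4, 6):
    name, argv = bgpq4_argv("chix", "AS-SET-DNET", af)
    print(name, argv)
```

The resulting list could be handed to something like `subprocess.run(argv, capture_output=True, text=True)` to collect the prefix list as a string.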
|
||||
#### Ingress Example: peer
|
||||
|
||||
For completeness, here's a regular peer Cloudflare at CHIX, and I hope you agree that the Jinja
|
||||
template renders down to something waaaay more readable now:
|
||||
|
||||
```
|
||||
filter ebgp_chix_13335_import {
|
||||
if ! ebgp_import_peer(13335) then reject;
|
||||
|
||||
# Add FreeIX Remote: Informational
|
||||
bgp_large_community.add((50869,1010,1)); ## informational.router = chrma0.free-ix.net
|
||||
bgp_large_community.add((50869,1020,1)); ## informational.country = CH
|
||||
bgp_large_community.add((50869,1030,2365)); ## informational.group = chix
|
||||
|
||||
accept;
|
||||
}
|
||||
```
|
||||
|
||||
Most sessions will actually look like this one: just learning prefixes, scrubbing inbound
|
||||
communities that are nobody's business to be setting but mine, tossing weird prefixes like bogons
|
||||
and then setting typically the three informational communities. I now know exactly which prefixes
|
||||
are picked up at group CHIX, which ones in country Switzerland, and which ones on router `chrma0`.
|
||||
|
||||
### Egress: Propagating Prefixes
|
||||
|
||||
And with that, I've completed the 'learning' part. Let me move to the 'propagating' part. A design
|
||||
goal of FreeIX Remote is to have _symmetric_ propagation. In my example above, member M1 should have
|
||||
its prefixes announced at IXP2 and IXP3, and all prefixes learned at IXP2 and IXP3 should be
|
||||
announced to member M1.
|
||||
|
||||
First, let me create a helper function in the generator. Its job is to take the symbolic
|
||||
`member.*.permission` and `member.*.inhibit` lists and resolve them into a structure of numeric
|
||||
values suitable for BGP community list adding and matching. It's a bit of a beast, but I've
|
||||
simplified it a bit. Notably, I've removed all the error and exception handling for brevity:
|
||||
|
||||
```
def parse_member_communities(data, asn, type):
    myasn = data["ebgp"]["asn"]
    cls = data["ebgp"]["community"]["large"]["class"]
    sub = data["ebgp"]["community"]["large"]["subclass"]

    bgp_cl = []
    member = data["member"][asn]
    # Assumption: 'type' is "permission" or "inhibit", matching the member's YAML key
    perms = member.get(type, [])

    for perm in perms:
        if perm == "all":
            el = { "class": int(cls[type]), "subclass": int(sub["all"]),
                   "value": 0, "description": f"{type}.all" }
            return [el]
        k, v = perm.split(":")
        if k == "country":
            country_id = data["country"][v]
            el = { "class": int(cls[type]), "subclass": int(sub["country"]),
                   "value": int(country_id), "description": f"{type}.{k} = {v}" }
            bgp_cl.append(el)
        elif k == "asn":
            el = { "class": int(cls[type]), "subclass": int(sub["asn"]),
                   "value": int(v), "description": f"{type}.{k} = {v}" }
            bgp_cl.append(el)
        elif k == "router":
            device_id = data["_devices"][v]["id"]
            el = { "class": int(cls[type]), "subclass": int(sub["router"]),
                   "value": int(device_id), "description": f"{type}.{k} = {v}" }
            bgp_cl.append(el)
        elif k == "group":
            group = data["ebgp"]["groups"][v]
            # peeringdb_ix is either a dict with an 'id' or the bare ixp_id
            if isinstance(group["peeringdb_ix"], dict):
                ix_id = group["peeringdb_ix"]["id"]
            else:
                ix_id = group["peeringdb_ix"]
            el = { "class": int(cls[type]), "subclass": int(sub["group"]),
                   "value": int(ix_id), "description": f"{type}.{k} = {v}" }
            bgp_cl.append(el)
        else:
            log.warning(f"No implementation for {type} subclass '{k}' for member AS{asn}, skipping")

    return bgp_cl
```
|
||||
|
||||
The essence of this function is to take a human readable list of symbols, like 'router:chrma0' and
|
||||
look up what subclass is called 'router' and what router_id is 'chrma0'. It does this for keywords
|
||||
'router', 'country', 'group' and 'asn' and for a special keyword called 'all' as well.
|
||||
|
||||
Running this function on Antonios' member data above would reveal the following:
|
||||
```
|
||||
Member 210312 has permissions:
|
||||
[{'class': 2000, 'subclass': 10, 'value': 1, 'description': 'permission.router = chrma0'}]
|
||||
Member 210312 has inhibits:
|
||||
[{'class': 3000, 'subclass': 30, 'value': 2365, 'description': 'inhibit.group = chix'}]
|
||||
```
|
||||
|
||||
The neat thing about this is, that this data will come in handy for _both_ types of propagation, and
|
||||
the `parse_member_communities()` helper function returns pretty readable data, which will help in
|
||||
debugging and further understanding the ultimately generated configuration.
|
||||
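To make the link to the generated configuration concrete, here is a small sketch that turns such elements into Bird statements, following the same `bgp_large_community.add()` pattern as the informational communities. The function name `render_add` is hypothetical; in my generator this rendering happens in the Jinja template rather than in Python.

```python
# Sketch only: mirrors the add() pattern of the import filter; names are illustrative.
MY_ASN = 50869


def render_add(el, my_asn=MY_ASN):
    """Render one parsed element as a Bird large-community add() statement."""
    return "bgp_large_community.add((%d,%d,%d)); ## %s" % (
        my_asn, el["class"] + el["subclass"], el["value"], el["description"])


permissions = [{"class": 2000, "subclass": 10, "value": 1,
                "description": "permission.router = chrma0"}]
inhibits = [{"class": 3000, "subclass": 30, "value": 2365,
             "description": "inhibit.group = chix"}]

for el in permissions + inhibits:
    print(render_add(el))
```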
|
||||
#### Egress: Member-to-IXP
|
||||
|
||||
OK, when I learned Antonios' prefixes, I have instructed the system to propagate them to all
|
||||
sessions on router `chrma0`, except sessions on group `chix`. This means that in the direction of
|
||||
_from AS50869 to others_, I can do the following:
|
||||
|
||||
**1. Tag permissions and inhibits on ingress**
|
||||
|
||||
I add a tiny bit of logic using this data structure I just created above. In the import filter,
|
||||
remember I added `NOTE(pim): More comes here`? After setting the informational communities, I also
|
||||
add these:
|
||||
|
||||
```
|
||||
{% if session_type == "member" %}
|
||||
{% if permissions %}
|
||||
|
||||
# Add FreeIX Remote: Permission
|
||||
{% for el in permissions %}
|
||||
bgp_large_community.add(({{my_asn}},{{el.class+el.subclass}},{{el.value}})); ## {{ el.description }}
|
||||
{% endfor %}
|
||||
{% endif %}
|
||||
{% if inhibits %}
|
||||
|
||||
# Add FreeIX Remote: Inhibit
|
||||
{% for el in inhibits %}
|
||||
bgp_large_community.add(({{my_asn}},{{el.class+el.subclass}},{{el.value}})); ## {{ el.description }}
|
||||
{% endfor %}
|
||||
{% endif %}
|
||||
{% endif %}
|
||||
```
|
||||
|
||||
Seeing as this block only gets rendered if the session type is _member_, let me show you what
|
||||
Antonios' import filter looks like in its full glory:
|
||||
|
||||
```
|
||||
filter ebgp_chix_210312_import {
|
||||
if (net.type = NET_IP4 && ! (net ~ CHIX_AS_SET_DNET_IPV4)) then reject;
|
||||
if (net.type = NET_IP6 && ! (net ~ CHIX_AS_SET_DNET_IPV6)) then reject;
|
||||
if ! ebgp_import_member(210312) then reject;
|
||||
|
||||
# Add FreeIX Remote: Informational
|
||||
bgp_large_community.add((50869,1010,1)); ## informational.router = chrma0.free-ix.net
|
||||
bgp_large_community.add((50869,1020,1)); ## informational.country = CH
|
||||
bgp_large_community.add((50869,1030,2365)); ## informational.group = chix
|
||||
|
||||
# Add FreeIX Remote: Permission
|
||||
bgp_large_community.add((50869,2010,1)); ## permission.router = chrma0
|
||||
|
||||
# Add FreeIX Remote: Inhibit
|
||||
bgp_large_community.add((50869,3030,2365)); ## inhibit.group = chix
|
||||
|
||||
accept;
|
||||
}
|
||||
```
|
||||
|
||||
Remember, the `ebgp_import_member()` helper will strip any informational (the 1000s) and permission
(the 2000s) communities, but it allows Antonios to set inhibits and prepends (the 3000s), so these BGP
|
||||
communities will still be allowed in. In other words, Antonios can't give himself propagation rights
|
||||
(sorry, buddy!) but if he would like to make AS50869 stop sending his prefixes to, say, CommunityIX,
|
||||
he could simply add the BGP community `(50869,3030,2013)` on his announcements, and that will get
|
||||
honored. If he'd like AS50869 to prepend itself twice before announcing to peer AS8298, he could set
|
||||
`(50869,3200,8298)` and that will also get picked up.
|
||||
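To illustrate the difference between the two scrubbing functions, here is a toy Python model of which large communities survive ingress. It is not the Bird code itself; `scrub_large` and the sample set are made up for the example, and the range `1000..2999` is taken from `ebgp_import_member()` above.

```python
# Sketch only: a toy model of the ingress scrubbing, not the Bird implementation.
MY_ASN = 50869


def scrub_large(communities, session_type):
    """Return the large communities that survive ingress for this session type."""
    kept = set()
    for asn, func, value in communities:
        if asn != MY_ASN:
            kept.add((asn, func, value))   # not in our namespace: left alone
        elif session_type == "member" and not 1000 <= func <= 2999:
            kept.add((asn, func, value))   # outside 1000..2999, e.g. inhibit or prepend
        # anything else in our namespace is deleted
    return kept


received = {
    (50869, 2000, 0),      # attempt to self-grant permission.all
    (50869, 3030, 2013),   # inhibit towards CommunityIX
    (50869, 3200, 8298),   # prepend twice towards AS8298
    (8298, 1, 1),          # somebody else's community
}

print(sorted(scrub_large(received, "member")))
print(sorted(scrub_large(received, "peer")))
```

On a member session the self-granted permission disappears while the inhibit and prepend survive; on a peer session everything in the AS50869 namespace is removed.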
|
||||
**2. Match permissions and inhibits on egress**
|
||||
|
||||
Now that all of Antonios' prefixes are tagged with permissions and inhibits, I can reveal how I
|
||||
implemented the export filters for AS50869:
|
||||
|
||||
```
|
||||
function member_prefix(int group) -> bool
|
||||
{
|
||||
bool permitted = false;
|
||||
|
||||
if (({{ebgp.asn}}, {{ebgp.community.large.class.permission+ebgp.community.large.subclass.all}}, 0) ~ bgp_large_community ||
|
||||
({{ebgp.asn}}, {{ebgp.community.large.class.permission+ebgp.community.large.subclass.router}}, {{ device.id }}) ~ bgp_large_community ||
|
||||
({{ebgp.asn}}, {{ebgp.community.large.class.permission+ebgp.community.large.subclass.country}}, {{ country[device.country] }}) ~ bgp_large_community ||
|
||||
({{ebgp.asn}}, {{ebgp.community.large.class.permission+ebgp.community.large.subclass.group}}, group) ~ bgp_large_community) then {
|
||||
permitted = true;
|
||||
}
|
||||
if (({{ebgp.asn}}, {{ebgp.community.large.class.inhibit+ebgp.community.large.subclass.all}}, 0) ~ bgp_large_community ||
|
||||
({{ebgp.asn}}, {{ebgp.community.large.class.inhibit+ebgp.community.large.subclass.router}}, {{ device.id }}) ~ bgp_large_community ||
|
||||
({{ebgp.asn}}, {{ebgp.community.large.class.inhibit+ebgp.community.large.subclass.country}}, {{ country[device.country] }}) ~ bgp_large_community ||
|
||||
({{ebgp.asn}}, {{ebgp.community.large.class.inhibit+ebgp.community.large.subclass.group}}, group) ~ bgp_large_community) then {
|
||||
permitted = false;
|
||||
}
|
||||
return (permitted);
|
||||
}
|
||||
|
||||
function valid_prefix(int group) -> bool
|
||||
{
|
||||
return (source_prefix() || member_prefix(group));
|
||||
}
|
||||
|
||||
function ebgp_export_peer(int remote_as; int group) -> bool
|
||||
{
|
||||
if (source != RTS_BGP && source != RTS_STATIC) then return false;
|
||||
if !valid_prefix(group) then return false;
|
||||
|
||||
bgp_community.delete([(50869, *)]);
|
||||
bgp_large_community.delete([(50869, *, *)]);
|
||||
|
||||
return ebgp_export(remote_as);
|
||||
}
|
||||
```
|
||||
|
||||
From the bottom, the function `ebgp_export_peer()` is invoked on each peering session, and it gets
|
||||
the argument of the remote AS (for example 13335 for Cloudflare), and the group (for example 2365
|
||||
for CHIX). The function ensures that it's either a _static_ route or a _BGP_ route. Then it makes
|
||||
sure it's a `valid_prefix()` for the group.
|
||||
|
||||
The `valid_prefix()` function first checks if it's one of our own (as in: AS50869's own) prefixes,
|
||||
which it does by calling `source_prefix()`, which I've omitted here as it would be a distraction.
|
||||
All it does is check if the prefix is in a static prefix list generated with `bgpq4` for AS50869
|
||||
itself. The more interesting observation is that to be eligible, the prefix needs to be either
|
||||
`source_prefix()` **or** `member_prefix(group)`.
|
||||
|
||||
The propagation decision for 'Member-to-IXP' actually happens in that `member_prefix()` function. It
|
||||
starts off by assuming the prefix is not permitted. Then it scans all relevant _permissions_
|
||||
communities which may be present in the RIB for this prefix:
|
||||
- is the `all` permissions community `(50869,2000,0)` set?
|
||||
- what about the `router` permission `(50869,2010,R)` for my _router_id_?
|
||||
- perhaps the `country` permission `(50869,2020,C)` for my _country_id_?
|
||||
- or maybe the `group` permission `(50869,2030,G)` for the _ixp_id_ that this session lives on?
|
||||
|
||||
If any of these conditions are true, then this prefix _might_ be permitted, so I set the variable to
|
||||
True. Next, I check and see if any of the _inhibit_ communities are set, either by me (in
|
||||
`members.yaml`) or by the member on the live BGP session. If any one of them matches, then I flip
|
||||
the variable to False again. Once the verdict is known, I can return True or False here, which
|
||||
makes its way all the way up the call stack and ultimately announces the member prefix on the BGP
|
||||
session, or not. Slick!
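
The permit-then-inhibit verdict described above can be modelled in a few lines of Python. This is
only an illustrative sketch and not the BIRD filter itself: the function signature and the `scope`
table are hypothetical names, and the inhibit subclasses 3000 and 3010 are assumed by symmetry with
the permission subclasses 2000 and 2010.

```python
MY_ASN = 50869

def member_prefix(communities, router_id, country_id, ixp_id):
    """Return True if a member route may be announced on this session.

    `communities` is the set of large communities on the route, as
    (asn, function, value) tuples. Illustrative model only.
    """
    # Scope subclass -> the value that matches *this* session.
    scope = {0: 0, 10: router_id, 20: country_id, 30: ixp_id}

    # Start pessimistic: any matching permission (20XX) flips the verdict to True.
    permitted = any((MY_ASN, 2000 + sub, val) in communities
                    for sub, val in scope.items())

    # Any matching inhibit (30XX) flips it back to False.
    # 3000 and 3010 are assumed by symmetry with 2000 and 2010.
    inhibited = any((MY_ASN, 3000 + sub, val) in communities
                    for sub, val in scope.items())

    return permitted and not inhibited
```

In this model, a route carrying `(50869,2010,1)` and `(50869,3030,2365)` is announced on every
session of router 1 except those in group 2365, because an inhibit always wins over a permission.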
|
||||
|
||||
#### Egress: IXP-to-Member
|
||||
|
||||
At this point, members' prefixes get announced at the correct internet exchange points, but I need to
satisfy one more requirement: the prefixes picked up at those IXPs should _also_ be announced to
members. For this, the helper dictionary with permissions and inhibits can be used in a clever way.
What if I held them against the informational communities? For example, if I have _permitted_
Antonios to be announced at any IXP connected to router `chrma0`, then all prefixes I learned at
`chrma0` are fair game, right? But I also configured an _inhibit_ for Antonios' prefixes at CHIX. No
problem, I have an informational community for all prefixes I learned from the CHIX group!
|
||||
|
||||
I come to the realization that IXP-to-Member simply adds to the Member-to-IXP logic. Everything that
I would announce to a peer, I will also announce to a member. Off I go, adding one last helper
function to the BGP session Jinja template:
|
||||
|
||||
```
|
||||
{% if session_type == "member" %}
function ebgp_export_{{group_name}}_{{their_asn}}(int remote_as; int group) -> bool
{
  bool permitted = false;

  if (source != RTS_BGP && source != RTS_STATIC) then return false;
  if valid_prefix(group) then return ebgp_export(remote_as);

  {% for el in permissions | default([]) %}
  if (bgp_large_community ~ [({{ my_asn }},{{ 1000+el.subclass}},{% if el.value == 0%}*{% else %}{{el.value}}{% endif %})]) then permitted=true; ## {{el.description}}
  {% endfor %}
  {% for el in inhibits | default([]) %}
  if (bgp_large_community ~ [({{ my_asn }},{{ 1000+el.subclass}},{% if el.value == 0%}*{% else %}{{el.value}}{% endif %})]) then permitted=false; ## {{el.description}}
  {% endfor %}

  if (permitted) then return ebgp_export(remote_as);
  return false;
}
{% endif %}
|
||||
```
|
||||
|
||||
Note that in essence, this new function still calls `valid_prefix()`, which in turn calls
`source_prefix()` **or** `member_prefix(group)`, so it announces the same prefixes that are also
announced to sessions of type 'peer'. On top of that, it inspects the _informational_ communities,
where a value of `0` is replaced with a wildcard, because 'permit or inhibit all' means
'match any of these BGP communities'. This template renders as follows for Antonios at CHIX:
|
||||
|
||||
```
|
||||
function ebgp_export_chix_210312(int remote_as; int group) -> bool
{
  bool permitted = false;

  if (source != RTS_BGP && source != RTS_STATIC) then return false;
  if valid_prefix(group) then return ebgp_export(remote_as);

  if (bgp_large_community ~ [(50869,1010,1)]) then permitted=true; ## permission.router = chrma0
  if (bgp_large_community ~ [(50869,1030,2365)]) then permitted=false; ## inhibit.group = chix

  if (permitted) then return ebgp_export(remote_as);
  return false;
}
|
||||
```
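
To make the substitution in the template's two loops explicit, here is a small plain-Python sketch
that mirrors the loop body. It is not the generator used for FreeIX; `render_rule()` is a
hypothetical helper, and only the field names `subclass`, `value` and `description` are taken from
the template. The subclass values 10 and 30 are inferred from the rendered communities 1010 and
1030.

```python
MY_ASN = 50869

def render_rule(el, verdict):
    """Render one BIRD line the way the template's loop body does.

    `el` carries the `subclass`, `value` and `description` fields used by
    the template; `verdict` is the string "true" or "false".
    """
    # A value of 0 means "all", which becomes a wildcard match.
    value = "*" if el["value"] == 0 else el["value"]
    return (f"if (bgp_large_community ~ [({MY_ASN},{1000 + el['subclass']},{value})]) "
            f"then permitted={verdict}; ## {el['description']}")

# Antonios at CHIX: permitted on router chrma0 (id 1), inhibited at group chix (id 2365).
permissions = [{"subclass": 10, "value": 1, "description": "permission.router = chrma0"}]
inhibits = [{"subclass": 30, "value": 2365, "description": "inhibit.group = chix"}]

lines = [render_rule(el, "true") for el in permissions]
lines += [render_rule(el, "false") for el in inhibits]
print("\n".join(lines))
```

Because the inhibit rules are rendered after the permission rules, an inhibit that matches always
has the last word on the `permitted` variable.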
|
||||
|
||||
## Results
|
||||
|
||||
With this, the propagation logic is complete. Announcements are _symmetric_, that is to say the
function `ebgp_export_chix_210312()` sees to it that Antonios gets the prefixes learned at router
`chrma0` but not those learned at group `CHIX`. Similarly, `ebgp_export_peer()` ensures that
Antonios' prefixes are propagated to any session at router `chrma0` except those sessions at group
`CHIX`.
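
The IXP-to-Member half of that symmetry can be modelled in Python as follows. This is a sketch
under stated assumptions rather than the deployed filter: `export_to_antonios()` is a hypothetical
name, the `is_valid_prefix` flag stands in for the `valid_prefix()` call, and the group id 42 used
in the usage note is a placeholder, not a real PeeringDB id.

```python
MY_ASN = 50869

def export_to_antonios(communities, is_valid_prefix=False):
    """Model of ebgp_export_chix_210312(): decide what Antonios receives.

    `communities` is the set of informational large communities on the
    route, as (asn, function, value) tuples.
    """
    # Our own and other members' prefixes are announced just like to peers.
    if is_valid_prefix:
        return True

    permitted = False
    if (MY_ASN, 1010, 1) in communities:      # learned at router chrma0
        permitted = True
    if (MY_ASN, 1030, 2365) in communities:   # learned at group CHIX
        permitted = False
    return permitted
```

A route tagged `(50869,1010,1)` and `(50869,1030,42)` would be sent to Antonios, while one tagged
`(50869,1010,1)` and `(50869,1030,2365)` would not.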
|
||||
|
||||
{{< image width="200px" float="right" src="/assets/vpp/fdio-color.svg" alt="VPP" >}}
|
||||
|
||||
I have installed VPP with [[OSPFv3]({{< ref 2024-06-22-vpp-ospf-2.md >}})] unnumbered interfaces,
so each router has exactly one IPv4 and one IPv6 loopback address. The router in Rümlang has been
operational for a while, the ones in Amsterdam (nlams0.free-ix.net) and Thessaloniki
(grskg0.free-ix.net) have been deployed and are connecting to IXPs now, and the one in Milan
(itmil0.free-ix.net) has been installed but is pending physical deployment at Caldara.
|
||||
|
||||
I deployed a test setup with a few permissions and inhibits on the Rümlang router, with many thanks
to Jurrian, Sam and Antonios for allowing me to guinea-pig-ize their member sessions. With the
following test configuration:
|
||||
|
||||
```
|
||||
member:
  35202:
    description: OnTheGo (Sam Aschwanden)
    prefix_filter: AS-OTG
    permission: [ router:chrma0 ]
    inhibit: [ group:comix ]
  210312:
    description: DaKnObNET
    prefix_filter: AS-SET-DNET
    permission: [ router:chrma0 ]
    inhibit: [ group:chix ]
  212635:
    description: Jurrian van Iersel
    prefix_filter: AS212635:AS-212635
    permission: [ router:chrma0 ]
    inhibit: [ group:chix, group:fogixp ]
|
||||
```
|
||||
|
||||
I can see the following prefix learn/announce counts towards _members_:
|
||||
|
||||
```
|
||||
pim@chrma0:~$ for i in $(birdc show protocol | grep member | cut -f1 -d' '); do echo -n $i\ ; birdc show protocol all $i | grep Routes; done
chix_member_35202_ipv4_1 2 imported, 0 filtered, 159984 exported, 0 preferred
chix_member_35202_ipv6_1 2 imported, 0 filtered, 61730 exported, 0 preferred
chix_member_210312_ipv4_1 3 imported, 0 filtered, 3518 exported, 3 preferred
chix_member_210312_ipv6_1 2 imported, 0 filtered, 1251 exported, 2 preferred
comix_member_35202_ipv4_1 2 imported, 0 filtered, 159981 exported, 2 preferred
comix_member_35202_ipv4_2 2 imported, 0 filtered, 159981 exported, 1 preferred
comix_member_35202_ipv6_1 2 imported, 0 filtered, 61727 exported, 2 preferred
comix_member_35202_ipv6_2 2 imported, 0 filtered, 61727 exported, 1 preferred
fogixp_member_212635_ipv4_1 1 imported, 0 filtered, 442 exported, 1 preferred
fogixp_member_212635_ipv6_1 14 imported, 0 filtered, 181 exported, 14 preferred
freeix_ch_member_210312_ipv4_1 3 imported, 0 filtered, 3521 exported, 0 preferred
freeix_ch_member_210312_ipv6_1 2 imported, 0 filtered, 1253 exported, 0 preferred
|
||||
```
|
||||
|
||||
Let me make a few observations:
|
||||
* Hurricane Electric AS6939 is present at CHIX, and they tend to announce a very large number of
prefixes. So every member who is permitted (and not inhibited) at CHIX will see all of those: Sam's
AS35202 is inhibited on CommunityIX but not on CHIX, and he's permitted on both. That explains why
he is seeing the routes on both sessions.
* I've inhibited Jurrian's AS212635 to/from both CHIX and FogIXP, which means he will be seeing
CommunityIX (~245 IPv4 and ~85 IPv6 prefixes), and FreeIX CH (~173 IPv4 and ~60 IPv6). We also send
him the member prefixes, which adds another 35 or so prefixes. This explains why Jurrian is
receiving from us ~440 IPv4 and ~180 IPv6.
* Antonios' AS210312, the exemplar in this article, is receiving all-but-CHIX. FogIXP yields 3077
or so IPv4 and 1056 IPv6 prefixes. I've already added up FreeIX, CommunityIX, and our members
(this is what we're sending Jurrian!) at ~440 and ~180 respectively, so Antonios should be getting
about 3500 IPv4 prefixes and 1250 IPv6 prefixes.
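
The back-of-the-envelope sum for Antonios can be checked against the counters in the `birdc`
listings. The snippet below is a rough sanity check only, with hypothetical variable names. It
assumes that the _preferred_ count on the FogIXP peer session approximates what FogIXP contributes,
and that what is exported to Jurrian approximates everything else.

```python
# Counts taken from the birdc listings in this article.
fogixp_preferred = {"ipv4": 3077, "ipv6": 1056}   # fogixp_peer_47498, preferred
sent_to_jurrian = {"ipv4": 442, "ipv6": 181}      # fogixp_member_212635, exported
sent_to_antonios = {"ipv4": 3518, "ipv6": 1251}   # chix_member_210312, exported

# Estimate: FogIXP routes plus everything Jurrian (inhibited at CHIX and FogIXP) gets.
estimates = {af: fogixp_preferred[af] + sent_to_jurrian[af] for af in fogixp_preferred}

for af, estimate in estimates.items():
    delta = sent_to_antonios[af] - estimate
    print(f"{af}: estimated {estimate}, observed {sent_to_antonios[af]}, delta {delta:+d}")
```

The estimate and the observed counters agree to within a couple of percent, which is as close as
this kind of approximation can be expected to get.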
|
||||
|
||||
In the other direction, I would expect to be announcing to _peers_ only prefixes belonging to either
AS50869 itself or to our members:
|
||||
|
||||
```
|
||||
pim@chrma0:~$ for i in $(birdc show protocol | grep peer.*_1 | cut -f1 -d' '); do echo -n $i\ ; birdc show protocol all $i | grep Routes || echo; done
chix_peer_212100_ipv4_1 57618 imported, 0 filtered, 24 exported, 778 preferred
chix_peer_212100_ipv6_1 21979 imported, 1 filtered, 37 exported, 7186 preferred
chix_peer_13335_ipv4_1 4767 imported, 9 filtered, 24 exported, 4765 preferred
chix_peer_13335_ipv6_1 371 imported, 1 filtered, 37 exported, 369 preferred
chix_peer_6939_ipv4_1 151787 imported, 27 filtered, 24 exported, 133943 preferred
chix_peer_6939_ipv6_1 61191 imported, 6 filtered, 37 exported, 16223 preferred
comix_peer_44596_ipv4_1 594 imported, 0 filtered, 25 exported, 10 preferred
comix_peer_44596_ipv6_1 1147 imported, 0 filtered, 50 exported, 0 preferred
comix_peer_8298_ipv4_1 23 imported, 0 filtered, 25 exported, 0 preferred
comix_peer_8298_ipv6_1 34 imported, 0 filtered, 50 exported, 0 preferred
fogixp_peer_47498_ipv4_1 3286 imported, 1 filtered, 27 exported, 3077 preferred
fogixp_peer_47498_ipv6_1 1838 imported, 0 filtered, 39 exported, 1056 preferred
freeix_ch_peer_51530_ipv4_1 355 imported, 0 filtered, 28 exported, 0 preferred
freeix_ch_peer_51530_ipv6_1 143 imported, 0 filtered, 53 exported, 0 preferred
|
||||
```
|
||||
|
||||
Some observations:
|
||||
|
||||
* Nobody is inhibited at FreeIX Switzerland. It stands to reason therefore, that it has the most
exported prefixes: 28 for IPv4 and 53 for IPv6.
* Two members are inhibited at CHIX, which makes it have the lowest number of exported prefixes:
24 for IPv4 and 37 for IPv6.
* All peers at each exchange (group) will receive the same number of prefixes. I can confirm that
at CHIX, all three peers have the same number of announced prefixes. Similarly, at CommunityIX, all
peers have the same number.
* If Antonios, Sam or Jurrian were to add an inhibit BGP community to their announcements towards
AS50869 (eg `(50869,3020,1)` to inhibit country Switzerland), they could tweak these numbers.
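
The claim that all peers in a group see the same number of prefixes can also be checked
mechanically. The following Python sketch groups the exported counters per exchange and address
family. The regular expression and the function name are illustrative assumptions: they expect
lines shaped like the listing above, with session names of the form
`<group>_peer_<asn>_<af>_<index>`.

```python
import re
from collections import defaultdict

# A few lines copied from the peer listing above.
SAMPLE = """\
chix_peer_212100_ipv4_1 57618 imported, 0 filtered, 24 exported, 778 preferred
chix_peer_13335_ipv4_1 4767 imported, 9 filtered, 24 exported, 4765 preferred
chix_peer_6939_ipv4_1 151787 imported, 27 filtered, 24 exported, 133943 preferred
comix_peer_44596_ipv6_1 1147 imported, 0 filtered, 50 exported, 0 preferred
comix_peer_8298_ipv6_1 34 imported, 0 filtered, 50 exported, 0 preferred
"""

LINE = re.compile(
    r"^(?P<group>\w+?)_peer_(?P<asn>\d+)_(?P<af>ipv[46])_\d+\s.*?(?P<exported>\d+) exported"
)

def exported_per_group(text):
    """Map (group, address family) to the set of distinct exported counts."""
    counts = defaultdict(set)
    for line in text.splitlines():
        m = LINE.match(line)
        if m:
            counts[(m["group"], m["af"])].add(int(m["exported"]))
    return dict(counts)

result = exported_per_group(SAMPLE)
# Every group/address-family pair should contain exactly one distinct count.
print(all(len(values) == 1 for values in result.values()))
```

A set with more than one element for any group would point at a session whose export filter
diverges from its neighbors at the same exchange.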
|
||||
|
||||
## What's next
|
||||
|
||||
This all adds up. I'd like to test the waters with my friendly neighborhood canaries a little bit,
to make sure that announcements are as expected, and traffic flows where appropriate. In the
meantime, I'll chase the deployment of LSIX, FrysIX, SpeedIX and possibly a few others in Amsterdam.
And of course FreeIX Greece in Thessaloniki. I'll try to get the Milano VPP router deployed (it's
already installed and configured, but currently powered off) and connected to PCIX, MIX and a few
others.
|
||||
|
||||
## How can you help?
|
||||
|
||||
If you're willing to participate with a VPP router and connect it to either multiple local internet
exchanges (like I've demonstrated in Zurich), or better yet, to one or more of the other existing
routers, I would welcome your contribution. [[Contact]({{< ref contact.md >}})] me for details.
|
||||
|
||||
A bit further down the pike, a connection from Amsterdam to Zurich, from Zurich to Milan and from
Milan to Thessaloniki is on the horizon. If you are willing and able to donate some bandwidth
(point-to-point VPWS, VLL, L2VPN) and your transport network is capable of at least 2026 bytes of
_inner_ payload, please also [[reach out]({{< ref contact.md >}})] as I'm sure many small network
operators would be thrilled.
|
BIN
static/assets/freeix/freeix-artist-rendering.png
(Stored with Git LFS)
Normal file
Binary file not shown.