diff --git a/content/articles/2024-04-27-freeix-1.md b/content/articles/2024-04-27-freeix-1.md index ca0c801..c8b2096 100644 --- a/content/articles/2024-04-27-freeix-1.md +++ b/content/articles/2024-04-27-freeix-1.md @@ -91,7 +91,7 @@ their traffic to these remote internet exchanges. There are two types of BGP neighbor adjacency: 1. ***Members***: these are {ip-address,AS}-tuples which FreeIX has explicitly configured. Learned prefixes are added - to as-set AS50869:AS-MEMBERS. Members receive _all_ prefixes from FreeIX, each annotated with BGP **informational** + to as-set AS50869:AS-MEMBERS. Members receive _some or all_ prefixes from FreeIX, each annotated with BGP **informational** communities, and members can drive certain behavior with BGP **action** communities. 1. ***Peers***: these are all other entities with whom FreeIX has an adjacency at public internet exchanges or private @@ -195,12 +195,12 @@ network interconnects: * `(50869,3020,1)`: Inhibit Action (30XX), Country (3020), Switzerland (1) * `(50869,3030,1308)`: Inhibit Action (30XX), IXP (3030), PeeringDB IXP for LS-IX (1308) -Further actions can be placed on a per-remote-neighbor basis: +Four actions can be placed on a per-remote-asn basis: * `(50869,3040,13030)`: Inhibit Action (30XX), AS (3040), Init7 (AS13030) -* `(50869,3041,6939)`: Prepend Action (30XX), Prepend Once (3041), Hurricane Electric (AS6939) -* `(50869,3042,12859)`: Prepend Action (30XX), Prepend Twice (3042), BIT BV (AS12859) -* `(50869,3043,8283)`: Prepend Action (30XX), Prepend Three Times (3043), Coloclue (AS8283) +* `(50869,3100,6939)`: Prepend Once Action (3100), Hurricane Electric (AS6939) +* `(50869,3200,12859)`: Prepend Twice Action (3200), BIT BV (AS12859) +* `(50869,3300,8283)`: Prepend Thrice Action (3300), Coloclue (AS8283) Peers cannot set these actions, as all action communities will be stripped on ingress. 
Members can set these action communities on their sessions with FreeIX routers, however in some cases they may also be set by FreeIX operators when diff --git a/content/articles/2024-10-21-freeix-2.md b/content/articles/2024-10-21-freeix-2.md new file mode 100644 index 0000000..8b72768 --- /dev/null +++ b/content/articles/2024-10-21-freeix-2.md @@ -0,0 +1,764 @@ +--- +date: "2024-10-21T10:52:11Z" +title: "FreeIX - Remote, part 2" +--- + +{{< image width="18em" float="right" src="/assets/freeix/freeix-artist-rendering.png" alt="FreeIX, Artists Rendering" >}} + +# Introduction + +A few months ago, I wrote about [[an idea]({{< ref 2024-04-27-freeix-1.md >}})] to help boost the +value of small internet exchange points. When the internet exchange doesn't have many members, +then the operational costs of connecting to it (cross connects, router ports, finding peers, etc) +are not very favorable. + +Yet, the benefit of using an Internet Exchange is to reduce the portion of an ISP’s (and CDN’s) +traffic that must be delivered via their upstream transit providers, thereby reducing the average +per-bit delivery cost, as well as reducing the end to end latency as seen by their users or +customers. Furthermore, the increased number of paths available through the IXP improves routing +efficiency and fault-tolerance, and it avoids traffic going the scenic route to a large hub like +Frankfurt, London, Amsterdam, Paris or Rome, if it could very well remain local. + +## Refresher: FreeIX Remote + +{{< image width="20em" float="right" src="/assets/freeix/Free IX Remote.svg" alt="FreeIX Remote" >}} + +Let's take for example the Free IX in Greece that was announced at GRNOG16 in Athens on April 19th, +2024. This exchange initially targets Athens and Thessaloniki, with 2x100G between the two cities. +Members can connect to either site for the cost of only a cross connect. The 1G/10G/25G ports will +be _Gratis_. 
But I have connected one very special router to Free IX Greece, which will be +offering an outreach infrastructure by connecting to other internet exchange points in Amsterdam, +and allowing all FreeIX Greece members to benefit from that in the following way: + +1. FreeIX uses AS50869 to peer with any network operator (or routeserver) available at public +internet exchanges or using private interconnects. It looks like a normal service provider in this +regard. It will connect to internet exchanges, and learn a bunch of routes. + +1. FreeIX _members_ can join the program, after which they are granted certain propagation +permissions by FreeIX at the point where they have a BGP session with AS50869. The prefixes learned +on these _member_ sessions are marked as such, and will be allowed to propagate. Members will +receive some or all learned prefixes from AS50869. + +1. FreeIX _members_ can set fine grained BGP communities to determine which of their prefixes are +propagated and at which locations. + +Members at smaller internet exchanges greatly benefit from this type of outreach, by receiving large +portions of the public internet directly at their preferred peering location. Similarly, the _Free +IX Remote_ routers will carry their traffic to these remote internet exchanges. + +My [[previous article]({{< ref 2024-04-27-freeix-1.md >}})] went into a good amount of detail on the +principles of operation, but back then I made a promise to come back to the actual _implementation_ +of such a complex routing topology. As a starting point, I work with the structure I shared in +[[IPng's Routing Policy]({{< ref 2021-11-14-routing-policy.md >}})], if you haven't read that yet, I +think you should consider taking a look as many of the structural elements will be the same. + +## Implementation + +The routing policy calls for three classes of (large) BGP communities (informational, permission and +inhibit). 
It also defines a few classic BGP communities, but I'll skip over those as they are not +very interesting. Firstly, I will use the _informational_ communities to tag which prefixes were +learned by which router, in which country and at which exchange point. + +Then, I will use the same structure to grant members _permissions_, that is to say, when AS50869 +learns their prefixes, they will get tagged with specific action communities that enable propagation +to other places. I will call this 'Member-to-IXP'. Sometimes, I'd like to be able to _inhibit_ +propagation of 'Member-to-IXP', so there will be a third set of communities that perform this +function. Finally, matching on the informational communities in a clever way will enable a symmetric +'IXP-to-Member' propagation. + +To help structure this implementation, it helps if I think about it in +the following way: + +Let's say, AS50869 is connected to IXP1, IXP2, IXP3 and IXP4. AS50869 has a _member_ called M1 at +IXP1, and that member is 'allowed' to reach IXP2 and IXP3, but not IXP4. My FreeIX Remote +implementation now has to satisfy three main requirements: + +1. **Ingress**: learn prefixes (from peers and members alike) at internet exchanges and 'tag' them +with the correct informational communities. +1. **Egress: Member-to-IXP**: Announce M1's prefixes at IXP2 and IXP3, but not at IXP4. +1. **Egress: IXP-to-Member**: Announce IXP2's and IXP3's prefixes to M1, but not IXP4's. 
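The two egress requirements can be sketched as a toy model in Python. This is only an illustration of the M1 example, assuming a plain allow-list per member; the names `ALLOWED`, `member_to_ixp` and `ixp_to_member` are hypothetical and not part of the actual generator, which expresses the same idea with BGP communities.

```python
# Toy model of the M1 example. All names here are illustrative assumptions,
# not part of the FreeIX generator.
ALLOWED = {"M1": {"IXP2", "IXP3"}}  # M1 may reach IXP2 and IXP3, but not IXP4


def member_to_ixp(member: str, ixp: str) -> bool:
    """Should this member's prefixes be announced at the given IXP?"""
    return ixp in ALLOWED.get(member, set())


def ixp_to_member(ixp: str, member: str) -> bool:
    """Should prefixes learned at the given IXP be announced to this member?"""
    # Symmetric by construction: the same lookup, in the opposite direction.
    return member_to_ixp(member, ixp)


for ixp in ("IXP2", "IXP3", "IXP4"):
    print(ixp, member_to_ixp("M1", ixp), ixp_to_member(ixp, "M1"))
```

The point of the sketch is that a single source of truth per member drives both directions, which is what makes the propagation symmetric.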
+ +### Defining Countries and Routers + +I'll start by giving each country which has a router a unique _country_id_ in a YAML file: + +``` +$ cat config/common/countries.yaml +country: + all: 0 + CH: 1 + NL: 2 + GR: 3 + IT: 4 +``` + +Each router has its own configuration file, and at the top, I'll define some meta data which +includes things like the country in which it operates, and its own unique _router_id_, like so: + +``` +$ cat config/chrma0.net.free-ix.net.yaml +device: + id: 1 + hostname: chrma0.free-ix.net + shortname: chrma0 + country: CH + loopbacks: + ipv4: 194.126.235.16 + ipv6: "2a0b:dd80:3101::" + location: "Hofwiesenstrasse, Ruemlang, Zurich, Switzerland" +... +``` + +### Defining communities + +Next, I define the BGP communities in `class` and `subclass` types, in the following YAML structure: + +``` +ebgp: + community: + legacy: + noannounce: 0 + blackhole: 666 + inhibit: 3000 + prepend1: 3100 + prepend2: 3200 + prepend3: 3300 + large: + class: + informational: 1000 + permission: 2000 + inhibit: 3000 + prepend1: 3100 + prepend2: 3200 + prepend3: 3300 + subclass: + all: 0 + router: 10 + country: 20 + group: 30 + asn: 40 +``` + +### Defining Members + +In order to keep this system manageable, I have to rely on automation. I intend to leverage the +BGP community _subclasses_ in a simple ACL system consisting of the following YAML, taking my buddy +Antonios's network as an example: + +``` +$ cat config/common/members.yaml +member: + 210312: + description: DaKnObNET + prefix_filter: AS-SET-DNET + permission: [ router:chrma0, country:NL ] + inhibit: [ group:chix ] + ... +``` + +The syntax of the `permission` and `inhibit` fields is identical. They are lists of key:value pairs +where the key must be one of the _subclasses_ (eg. 'router', 'country', 'group', 'asn'), and the +value appropriate for that type. 
In this example, AS50869 is being asked to grant permissions for +Antonios' prefixes to any peer connected to `router:chrma0` or to any router in `country:NL`, but +inhibit propagation to/from the exchange point called `group:chix`. + +I decide that sensible defaults are to give permissions to all, and keep inhibit empty. In other +words: be very liberal in propagation, to maximize the value that FreeIX Remote can provide its +members. + +### Ingress: Learning Prefixes + +With what I've defined so far, I can see how to set informational communities: +* The prefixes learned on subclass **router** for `chrma0` will have value of device.id=1: +`(50869,1010,1)` +* The prefixes learned on subclass **country** for `chrma0` will learn from device.country=CH and +be able to look up in `countries['CH']` that this means value 1: `(50869,1020,1)` +* When learning prefixes from a given internet exchange, Kees already knows its PeeringDB +_ixp_id_, which is a unique value for each exchange point. Thus, subclass **group** for `chrma0` at +[[CommunityIX](https://www.peeringdb.com/ix/2013)] is ixp_id=2013: `(50869,1030,2013)` + +#### Ingress: Learning from members + +I need to make sure that members send only the prefixes that I expect from them. To do this, I'll +make use of a common tool called [[bgpq4](https://github.com/bgp/bgpq4)] which cobbles together the +prefixes belonging to an AS-SET by referencing one or more IRR databases. 
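To make the `bgpq4` step concrete, here is a small sketch that only builds the command line and does not run it. The flags (`-A`, `-b`, `-4`/`-6`, `-R`, `-l`) mirror the invocation shown as a comment in the rendered filter further down in this article; the helper name `bgpq4_argv` is my own assumption and is not the generator's actual `fetch_bgpq()`.

```python
import shlex


def bgpq4_argv(filter_name, as_set, af, allow_morespecifics=True):
    """Build a bgpq4 command line that emits a BIRD prefix list.

    Hypothetical helper for illustration only; it builds the argument
    vector and leaves running the command to the caller.
    """
    if af not in (4, 6):
        raise ValueError("af must be 4 or 6")
    # -A: aggregate prefixes, -b: BIRD output format, -4/-6: address family
    argv = ["bgpq4", "-A", "-b", f"-{af}"]
    if allow_morespecifics:
        # -R: allow more-specifics up to the maximum prefix length
        argv += ["-R", "32" if af == 4 else "128"]
    # -l: name of the generated list; prefixing 'define' yields a BIRD constant
    argv += ["-l", f"define {filter_name}", as_set]
    return argv


print(shlex.join(bgpq4_argv("CHIX_AS_SET_DNET_IPV4", "AS-SET-DNET", 4)))
```

Passing the list to `subprocess.run()` would execute it; keeping the argument vector as a list avoids shell quoting problems with AS-SET names that contain colons.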
+ +In Python, I'll prepare the Jinja context by generating the prefix filter lists like so: + +``` +if session["type"] == "member": + session = {**session, **data["member"][asn]} + +pf = ebgp_merge_value(data["ebgp"], group, session, "prefix_filter", None) +if pf: + ctx["prefix_filter"] = {} + pfn = pf + pfn = pfn.replace("-", "_") + pfn = pfn.replace(":", "_") + + for af in [4, 6]: + filter_name = "%s_%s_IPV%d" % (groupname.upper(), pfn, af) + filter_contents = fetch_bgpq(filter_name, pf, af, allow_morespecifics=True) + if "[" in filter_contents: + ctx["prefix_filter"][filter_name] = { "str": filter_contents, "af": af } + ctx["prefix_filter_ipv%d" % af] = True + else: + log.warning(f"Filter {filter_name} is empty!") + ctx["prefix_filter_ipv%d" % af] = False +``` + +First, if a given BGP session is of type _member_, I'll merge the `member[asn]` dictionary +into the `ebgp.group.session[asn]`. I've left out error handling for brevity, but in case the member +YAML file doesn't have an entry for the given ASN, it'll just revert to being of type _peer_. + +I'll use a helper function `ebgp_merge_value()` to walk the YAML hierarchy from the member-data +enriched _session_ to the _group_ and finally to the _ebgp_ scope, looking for the existence of a +key called _prefix_filter_ and defaulting to None in case none was found. With the value of +_prefix_filter_ in hand (in this case `AS-SET-DNET`), I shell out to `bgpq4` for IPv4 and IPv6 +respectively. Sometimes, there are no IPv6 prefixes (booh!) and sometimes there are no IPv4 prefixes +(welcome to the Internet, kid!) 
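The lookup order described above (session first, then group, then the global ebgp scope) could look roughly like the following. This is a guess at the behavior, not the real `ebgp_merge_value()`: the function name `merge_value_sketch` and the sample dictionaries are assumptions made for illustration.

```python
def merge_value_sketch(ebgp, group, session, key, default=None):
    """Return the most specific value for `key`.

    Scopes are checked from most to least specific: session, group, ebgp.
    Illustrative stand-in for the article's ebgp_merge_value() helper.
    """
    for scope in (session, group, ebgp):
        if isinstance(scope, dict) and key in scope:
            return scope[key]
    return default


# Sample data, loosely modelled on the YAML shown earlier (assumed shapes).
ebgp = {"asn": 50869}
group = {"peeringdb_ix": {"id": 2365}}
session = {"type": "member", "prefix_filter": "AS-SET-DNET"}

print(merge_value_sketch(ebgp, group, session, "prefix_filter"))
print(merge_value_sketch(ebgp, group, {"type": "peer"}, "prefix_filter"))
```

A peer session without a `prefix_filter` at any scope falls through to the default of `None`, which is why the `if pf:` guard in the snippet above skips prefix list generation for it.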
+ +All of this context, including the session and group information, is then fed as context to a +Jinja renderer, where I can use it in an _import_ filter like so: + +``` +{% for plname, pl in (prefix_filter | default({})).items() %} +{{pl.str}} +{% endfor %} + +filter ebgp_{{group_name}}_{{their_asn}}_import { +{% if not prefix_filter_ipv4 | default(True) %} + # WARNING: No IPv4 prefix filter found + if (net.type = NET_IP4) then reject; +{% endif %} +{% if not prefix_filter_ipv6 | default(True) %} + # WARNING: No IPv6 prefix filter found + if (net.type = NET_IP6) then reject; +{% endif %} +{% for plname, pl in (prefix_filter | default({})).items() %} +{% if pl.af == 4 %} + if (net.type = NET_IP4 && ! (net ~ {{plname}})) then reject; +{% elif pl.af == 6 %} + if (net.type = NET_IP6 && ! (net ~ {{plname}})) then reject; +{% endif %} +{% endfor %} +{% if session_type is defined %} + if ! ebgp_import_{{session_type}}({{their_asn}}) then reject; +{% endif %} + + # Add FreeIX Remote: Informational + bgp_large_community.add(({{my_asn}},{{community.large.class.informational+community.large.subclass.router}},{{device.id}})); ## informational.router = {{ device.hostname }} + bgp_large_community.add(({{my_asn}},{{community.large.class.informational+community.large.subclass.country}},{{country[device.country]}})); ## informational.country = {{ device.country }} +{% if group.peeringdb_ix.id %} + bgp_large_community.add(({{my_asn}},{{community.large.class.informational+community.large.subclass.group}},{{group.peeringdb_ix.id}})); ## informational.group = {{ group_name }} +{% endif %} + + ## NOTE(pim): More comes here, see Member-to-IXP below + + accept; +} +``` + +Let me explain what's going on here, as the Jinja templating language that my generator uses is a bit +... chatty. The first block will print the dictionary of zero or more `prefix_filter` entries. If +the `prefix_filter` context variable doesn't exist, assume it's the empty dictionary and thus, +print no prefix lists. 
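The community values in the template above are simply class plus subclass, with the third field carrying the router, country or IXP identifier. As a minimal sketch using the numbers from the YAML earlier in this article (the function names `large_community` and `informational` are assumptions for illustration):

```python
MY_ASN = 50869
CLASS = {"informational": 1000, "permission": 2000, "inhibit": 3000}
SUBCLASS = {"all": 0, "router": 10, "country": 20, "group": 30, "asn": 40}


def large_community(cls, subclass, value):
    """Compose an (asn, class+subclass, value) large community tuple."""
    return (MY_ASN, CLASS[cls] + SUBCLASS[subclass], value)


def informational(device_id, country_id, ixp_id=None):
    """Informational communities added on ingress (illustrative helper)."""
    out = [
        large_community("informational", "router", device_id),
        large_community("informational", "country", country_id),
    ]
    if ixp_id:
        # Only sessions at an exchange known to PeeringDB get a group tag.
        out.append(large_community("informational", "group", ixp_id))
    return out


# chrma0 has device id 1, CH has country id 1, CHIX has PeeringDB ixp_id 2365.
print(informational(1, 1, 2365))
```

The same composition yields the permission (2000s) and inhibit (3000s) communities used further down, only with a different class.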
+ +Then, I create a Bird2 filter that has a globally unique name. I satisfy this requirement by giving +it a name with the tuple of {group, their_asn}. The first thing this filter does, is inspect +`prefix_filter_ipv4` and `prefix_filter_ipv6`, and if they are explicitly set to False (for example, +if a member doesn't have any IRR prefixes associated with their AS-SET), then I'll reject any +prefixes from them. Then, I'll match the prefixes with the `prefix_filter`, if provided, and reject +any prefixes that aren't in the list I'm expecting on this session. Assuming we're still good to go, +I'll hand this prefix off to a function called `ebgp_import_peer()` for peers and +`ebgp_import_member()` for members, both of which ensure BGP communities are scrubbed. + +``` +function ebgp_import_peer(int remote_as) -> bool +{ + # Scrub BGP Communities (RFC 7454 Section 11) + bgp_community.delete([(50869, *)]); + bgp_large_community.delete([(50869, *, *)]); + + # Scrub BLACKHOLE community + bgp_community.delete((65535, 666)); + + return ebgp_import(remote_as); +} + +function ebgp_import_member(int remote_as) -> bool +{ + # We scrub only our own (informational, permissions) BGP Communities for members + bgp_large_community.delete([(50869,1000..2999,*)]); + + return ebgp_import(remote_as); +} +``` + +After scrubbing the communities (peers are not allowed to set _any_ communities, and members are not +allowed to set their own informational or permissions communities, but they are allowed to inhibit +themselves, if they wish), one last check is performed by calling the underlying `ebgp_import()`: + +``` +function ebgp_import(int remote_as) -> bool +{ + if aspath_bogon() then return false; + if (net.type = NET_IP4 && ipv4_bogon()) then return false; + if (net.type = NET_IP6 && ipv6_bogon()) then return false; + + if (net.type = NET_IP4 && ipv4_rpki_invalid()) then return false; + if (net.type = NET_IP6 && ipv6_rpki_invalid()) then return false; + + # Graceful Shutdown 
(https://www.rfc-editor.org/rfc/rfc8326.html) + if (65535, 0) ~ bgp_community then bgp_local_pref = 0; + + return true; +} +``` + +Here, belt-and-suspenders checks are performed, notably bogon AS Paths, IPv4/IPv6 prefixes and RPKI +invalids are filtered out. If the prefix has well-known community for [[BGP Graceful +Shutdown](https://www.rfc-editor.org/rfc/rfc8326.html)], honor it and set the local preference to +zero (making sure to prefer any other available path). + +OK, after all these checks are done, I am finally ready to accept the prefix from this peer or +member. It's time to add the informational communities based on the router, country and (if this is +a session at a public peering point documented in PeeringDB), the group's ixp_id. + +#### Ingress Example: member + +Here's what the rendered template looks like for Antonios' member session at CHIX: + +``` +# bgpq4 -Ab4 -R 32 -l 'define CHIX_AS_SET_DNET_IPV4' AS-SET-DNET +define CHIX_AS_SET_DNET_IPV4 = [ + 44.31.27.0/24{24,32}, 44.154.130.0/24{24,32}, 44.154.132.0/24{24,32}, + 147.189.216.0/21{21,32}, 193.5.16.0/22{22,32}, 212.46.55.0/24{24,32} +]; + +# bgpq4 -Ab6 -R 128 -l 'define CHIX_AS_SET_DNET_IPV6' AS-SET-DNET +define CHIX_AS_SET_DNET_IPV6 = [ + 2001:678:f5c::/48{48,128}, 2a05:dfc1:9174::/48{48,128}, 2a06:9f81:2500::/40{40,128}, + 2a06:9f81:2600::/40{40,128}, 2a0a:6044:7100::/40{40,128}, 2a0c:2f04:100::/40{40,128}, + 2a0d:3dc0::/29{29,128}, 2a12:bc0::/29{29,128} +]; + +filter ebgp_chix_210312_import { + if (net.type = NET_IP4 && ! (net ~ CHIX_AS_SET_DNET_IPV4)) then reject; + if (net.type = NET_IP6 && ! (net ~ CHIX_AS_SET_DNET_IPV6)) then reject; + if ! 
ebgp_import_member(210312) then reject; + + # Add FreeIX Remote: Informational + bgp_large_community.add((50869,1010,1)); ## informational.router = chrma0.free-ix.net + bgp_large_community.add((50869,1020,1)); ## informational.country = CH + bgp_large_community.add((50869,1030,2365)); ## informational.group = chix + + ## NOTE(pim): More comes here, see Member-to-IXP below + + accept; +} +``` + +#### Ingress Example: peer + +For completeness, here's a regular peer, Cloudflare, at CHIX, and I hope you agree that the Jinja +template renders down to something waaaay more readable now: + +``` +filter ebgp_chix_13335_import { + if ! ebgp_import_peer(13335) then reject; + + # Add FreeIX Remote: Informational + bgp_large_community.add((50869,1010,1)); ## informational.router = chrma0.free-ix.net + bgp_large_community.add((50869,1020,1)); ## informational.country = CH + bgp_large_community.add((50869,1030,2365)); ## informational.group = chix + + accept; +} +``` + +Most sessions will actually look like this one: just learning prefixes, scrubbing inbound +communities that are nobody's business to be setting but mine, tossing weird prefixes like bogons +and then setting typically the three informational communities. I now know exactly which prefixes +are picked up at CHIX, which in Switzerland, and which at router `chrma0`. + +### Egress: Propagating Prefixes + +And with that, I've completed the 'learning' part. Let me move to the 'propagating' part. A design +goal of FreeIX Remote is to have _symmetric_ propagation. In my example above, member M1 should have +its prefixes announced at IXP2 and IXP3, and all prefixes learned at IXP2 and IXP3 should be +announced to member M1. + +First, let me create a helper function in the generator. Its job is to take the symbolic +`member.*.permission` and `member.*.inhibit` lists and resolve them into a structure of numeric +values suitable for BGP community list adding and matching. It's a bit of a beast, but I'll rewrite +it a bit. 
Notably, I've removed all the error and exception handling for brevity: + +``` +def parse_member_communities(data, asn, type): + myasn = data["ebgp"]["asn"] + cls = data["ebgp"]["community"]["large"]["class"] + sub = data["ebgp"]["community"]["large"]["subclass"] + + bgp_cl = [] + perms = data["member"][asn].get(type, []) # the member's 'permission' or 'inhibit' list + + for perm in perms: + if perm == "all": + el = { "class": int(cls[type]), "subclass": int(sub["all"]), + "value": 0, "description": f"{type}.all" } + return [el] + k, v = perm.split(":") + if k == "country": + country_id = data["country"][v] + el = { "class": int(cls[type]), "subclass": int(sub["country"]), + "value": int(country_id), "description": f"{type}.{k} = {v}" } + bgp_cl.append(el) + elif k == "asn": + el = { "class": int(cls[type]), "subclass": int(sub["asn"]), + "value": int(v), "description": f"{type}.{k} = {v}" } + bgp_cl.append(el) + elif k == "router": + device_id = data["_devices"][v]["id"] + el = { "class": int(cls[type]), "subclass": int(sub["router"]), + "value": int(device_id), "description": f"{type}.{k} = {v}" } + bgp_cl.append(el) + elif k == "group": + group = data["ebgp"]["groups"][v] + if isinstance(group["peeringdb_ix"], dict): + ix_id = group["peeringdb_ix"]["id"] + else: + ix_id = group["peeringdb_ix"] + el = { "class": int(cls[type]), "subclass": int(sub["group"]), + "value": int(ix_id), "description": f"{type}.{k} = {v}" } + bgp_cl.append(el) + else: + log.warning (f"No implementation for {type} subclass '{k}' for member AS{asn}, skipping") + + return bgp_cl + +``` + +Running such a function on Antonios' member data above would reveal the following: +``` +Member 210312 has permissions: + [{'class': 2000, 'subclass': 10, 'value': 1, 'description': 'permission.router = chrma0'}] +Member 210312 has inhibits: + [{'class': 3000, 'subclass': 30, 'value': 2365, 'description': 'inhibit.group = chix'}] +``` + +The neat thing about this is, that I can use it in a clever way for both types of propagation, and +the 
`parse_member_communities()` helper function returns pretty readable data. + +#### Egress: Member-to-IXP + +OK, when I learned Antonios' prefixes, I instructed the system to propagate them to all +sessions on router `chrma0`, except sessions on group `chix`. This means that in the direction of +_from AS50869 to others_, I can do the following: + +**1. Tag permissions and inhibits on ingress** + +I can add a tiny bit of logic using this data structure I just created above, in the import filter, +remember I added "NOTE(pim): More comes here"? After setting the informational communities, I also +add these: + +``` +{% if session_type == "member" %} +{% if permissions %} + + # Add FreeIX Remote: Permission +{% for el in permissions %} + bgp_large_community.add(({{my_asn}},{{el.class+el.subclass}},{{el.value}})); ## {{ el.description +}} +{% endfor %} +{% endif %} +{% if inhibits %} + + # Add FreeIX Remote: Inhibit +{% for el in inhibits %} + bgp_large_community.add(({{my_asn}},{{el.class+el.subclass}},{{el.value}})); ## {{ el.description +}} +{% endfor %} +{% endif %} +{% endif %} +``` + +Seeing as this block only gets rendered if the session type is _member_, let me show you what +Antonios' import filter looks like in its full glory: + +``` +filter ebgp_chix_210312_import { + if (net.type = NET_IP4 && ! (net ~ CHIX_AS_SET_DNET_IPV4)) then reject; + if (net.type = NET_IP6 && ! (net ~ CHIX_AS_SET_DNET_IPV6)) then reject; + if ! 
ebgp_import_member(210312) then reject; + + # Add FreeIX Remote: Informational + bgp_large_community.add((50869,1010,1)); ## informational.router = chrma0.free-ix.net + bgp_large_community.add((50869,1020,1)); ## informational.country = CH + bgp_large_community.add((50869,1030,2365)); ## informational.group = chix + + # Add FreeIX Remote: Permission + bgp_large_community.add((50869,2010,1)); ## permission.router = chrma0 + + # Add FreeIX Remote: Inhibit + bgp_large_community.add((50869,3030,2365)); ## inhibit.group = chix + + accept; +} +``` + +Remember, the `ebgp_import_member()` helper will strip any informational (the 1000s) and permissions +(the 2000s), but it would allow Antonios to set inhibits (the 3000s) which will be allowed in. In +other words, Antonios can't give himself propagation rights (sorry, buddy!) but if he would like to +make AS50869 stop sending his prefixes to, say, CommunityIX, he could simply add the BGP community +`(50869,3030,2013)` on his announcements, and that will get honored. + +**2. 
Match permissions and inhibits on egress** + +Now that all of Antonios' prefixes are tagged with permissions and inhibits, I can share how I +implemented the export filters for AS50869: + +``` +function member_prefix(int group) -> bool +{ + bool permitted = false; + + if (({{ebgp.asn}}, {{ebgp.community.large.class.permission+ebgp.community.large.subclass.all}}, 0) ~ bgp_large_community || + ({{ebgp.asn}}, {{ebgp.community.large.class.permission+ebgp.community.large.subclass.router}}, {{ device.id }}) ~ bgp_large_community || + ({{ebgp.asn}}, {{ebgp.community.large.class.permission+ebgp.community.large.subclass.country}}, {{ country[device.country] }}) ~ bgp_large_community || + ({{ebgp.asn}}, {{ebgp.community.large.class.permission+ebgp.community.large.subclass.group}}, group) ~ bgp_large_community) then { + permitted = true; + } + if (({{ebgp.asn}}, {{ebgp.community.large.class.inhibit+ebgp.community.large.subclass.all}}, 0) ~ bgp_large_community || + ({{ebgp.asn}}, {{ebgp.community.large.class.inhibit+ebgp.community.large.subclass.router}}, {{ device.id }}) ~ bgp_large_community || + ({{ebgp.asn}}, {{ebgp.community.large.class.inhibit+ebgp.community.large.subclass.country}}, {{ country[device.country] }}) ~ bgp_large_community || + ({{ebgp.asn}}, {{ebgp.community.large.class.inhibit+ebgp.community.large.subclass.group}}, group) ~ bgp_large_community) then { + permitted = false; + } + return (permitted); +} + +function valid_prefix(int group) -> bool +{ + return (source_prefix() || member_prefix(group)); +} + +function ebgp_export_peer(int remote_as; int group) -> bool +{ + if (source != RTS_BGP && source != RTS_STATIC) then return false; + if !valid_prefix(group) then return false; + + bgp_community.delete([(50869, *)]); + bgp_large_community.delete([(50869, *, *)]); + + return ebgp_export(remote_as); +} +``` + +From the bottom, the function `ebgp_export_peer()` is invoked on each peering session, and it gets +the argument of the remote AS (for example 13335 for 
Cloudflare), and the group (for example 2365 +for CHIX). The function ensures that it's either a _static_ route or a _BGP_ route. Then it makes +sure it's a `valid_prefix()` for the group. + +The `valid_prefix()` function first checks if it's one of our own (as in: AS50869's own) prefixes, +which it does by calling `source_prefix()`, which I've omitted here as it would be a distraction. +All it does is check if the prefix is in a static prefix list generated with `bgpq4` for AS50869 +itself. The more interesting observation is that to be eligible, the prefix needs to be either +`source_prefix()` **or** `member_prefix(group)`. + +The propagation decision for 'Member-to-IXP' actually happens in that `member_prefix()` function. It +starts off by assuming the prefix is not permitted. Then it scans all relevant _permissions_ +communities which may be present in the RIB for this prefix: +- is the `all` permissions community `(50869,2000,0)` set? +- what about the `router` permission `(50869,2010,R)` for my _router_id_? +- perhaps the `country` permission `(50869,2020,C)` for my _country_id_? +- or maybe the `group` permission `(50869,2030,G)` for the _ixp_id_ that this session lives on? + +If any of these conditions is true, then this prefix _might_ be permitted, so I set the variable to +True. Next, I see if any of the _inhibit_ communities are set, in much the same way. If any one of +them matches, then I flip the variable to False again. + +Once the verdict is known, I can return True or False here, which makes its way all the way up the +call stack and ultimately announces the member prefix on the BGP session, or not. Neat! + +#### Egress: IXP-to-Member + +At this point, members' prefixes get announced at the correct internet exchanges, but I need to +satisfy one more requirement: the prefixes picked up at those IXPs, should _also_ be announced to +members. For this, the helper dictionary with permissions and inhibits can be used in a clever way. 
+What if I held them against the informational communities? For example, if I have _permitted_ +Antonios to be announced at any IXP connected to router `chrma0`, then all prefixes I learned at +`chrma0` are fair game, right? But, I configured an _inhibit_ for Antonios' prefixes at CHIX, but +again, I have an informational community for all prefixes I learned from the CHIX group! + +I come to the realization that IXP-to-Member simply adds to the Member-to-IXP logic. Everything that +I would announce to a peer, I will also announce to a member. Off I go, adding one last helper +function to the BGP session Jinja template: + +``` +{% if session_type == "member" %} +function ebgp_export_{{group_name}}_{{their_asn}}(int remote_as; int group) -> bool +{ + bool permitted = false; + + if (source != RTS_BGP && source != RTS_STATIC) then return false; + if valid_prefix(group) then return ebgp_export(remote_as); + +{% for el in permissions | default([]) %} + if (bgp_large_community ~ [({{ my_asn }},{{ 1000+el.subclass}},{% if el.value == 0%}*{% else %}{{el.value}}{% endif %})]) then permitted=true; ## {{el.description}} +{% endfor %} +{% for el in inhibits | default([]) %} + if (bgp_large_community ~ [({{ my_asn }},{{ 1000+el.subclass}},{% if el.value == 0%}*{% else %}{{el.value}}{% endif %})]) then permitted=false; ## {{el.description}} +{% endfor %} + + if (permitted) then return ebgp_export(remote_as); + return false; +} +{% endif %} +``` + +Note that in essence, this new function still calls `valid_prefix()`, which in turn calls +`source_prefix()` **or** `member_prefix(group)`, so it announces the same prefixes that are also +announced to sessions of type 'peer'. But then, I'll also inspect the _informational_ communities, +where the value of `0` is replaced with a wildcard, because 'permit or inhibit all' would mean +'match any of these BGP communities'. 
This template renders as follows for Antonios at CHIX: + +``` +function ebgp_export_chix_210312(int remote_as; int group) -> bool +{ + bool permitted = false; + + if (source != RTS_BGP && source != RTS_STATIC) then return false; + if valid_prefix(group) then return ebgp_export(remote_as); + + if (bgp_large_community ~ [(50869,1010,1)]) then permitted=true; ## permission.router = chrma0 + if (bgp_large_community ~ [(50869,1030,2365)]) then permitted=false; ## inhibit.group = chix + + if (permitted) then return ebgp_export(remote_as); + return false; +} +``` + +## Results + +With this, the logic is completed. Announcements are _symmetric_, that is to say the function +`ebgp_export_chix_210312()` sees to it that Antonios gets the prefixes learned at router `chrma0` +but not those learned at group `CHIX`. Similarly, the `ebgp_export_peer()` ensures that Antonios' +prefixes are propagated to any session at router `chrma0` except those sessions at group `CHIX`. + +{{< image width="200px" float="right" src="/assets/vpp/fdio-color.svg" alt="VPP" >}} + +I have installed VPP with [[OSPFv3]({{< ref 2024-06-22-vpp-ospf-2.md >}})] unnumbered interfaces, +so each router has exactly one IPv4 and IPv6 loopback address. The router in Zurich has been +operational for a while, the one in Amsterdam (nlams0.free-ix.net) and Thessaloniki +(grskg0.free-ix.net) have been deployed and are connecting to IXPs now, and the one in Milan +(itmil0.free-ix.net) has been installed but is pending physical deployment at Caldara. + +I deployed a test setup with a few permissions and inhibits on the Zurich router, with many thanks +to Jurrian, Sam and Antonios for allowing me to guinea-pig-ize their member sessions. 
With the +following configuration: + +``` +member: + 35202: + description: OnTheGo (Sam Aschwanden) + prefix_filter: AS-OTG + permission: [ router:chrma0 ] + inhibit: [ group:comix ] + 210312: + description: DaKnObNET + prefix_filter: AS-SET-DNET + permission: [ router:chrma0 ] + inhibit: [ group:chix ] + 212635: + description: Jurrian van Iersel + prefix_filter: AS212635:AS-212635 + permission: [ router:chrma0 ] + inhibit: [ group:chix, group:fogixp ] +``` + +I can see the following prefix learn/announce counts towards _members_: + +``` +pim@chrma0:~$ for i in $(birdc show protocol | grep member | cut -f1 -d' '); do echo -n $i\ ; birdc +show protocol all $i | grep Routes; done +chix_member_35202_ipv4_1 2 imported, 0 filtered, 159984 exported, 0 preferred +chix_member_35202_ipv6_1 2 imported, 0 filtered, 61730 exported, 0 preferred +chix_member_210312_ipv4_1 3 imported, 0 filtered, 3518 exported, 3 preferred +chix_member_210312_ipv6_1 2 imported, 0 filtered, 1251 exported, 2 preferred +comix_member_35202_ipv4_1 2 imported, 0 filtered, 159981 exported, 2 preferred +comix_member_35202_ipv4_2 2 imported, 0 filtered, 159981 exported, 1 preferred +comix_member_35202_ipv6_1 2 imported, 0 filtered, 61727 exported, 2 preferred +comix_member_35202_ipv6_2 2 imported, 0 filtered, 61727 exported, 1 preferred +fogixp_member_212635_ipv4_1 1 imported, 0 filtered, 442 exported, 1 preferred +fogixp_member_212635_ipv6_1 14 imported, 0 filtered, 181 exported, 14 preferred +freeix_ch_member_210312_ipv4_1 3 imported, 0 filtered, 3521 exported, 0 preferred +freeix_ch_member_210312_ipv6_1 2 imported, 0 filtered, 1253 exported, 0 preferred +``` + +Let me make a few observations: +* Hurricane Electric AS6939 is present at CHIX, and they tend to announce a very large number of +prefixes. So every member who is permitted (and not inhibited) at CHIX will see all of those: Sam's +AS35202 is inhibited on CommunityIX but not on CHIX, and he's permitted on both. 
That explains why +he is seeing the routes on both sessions. +* I've inhibited Jurrian's AS212635 to/from both CHIX and FogIXP, which means he will be seeing +CommunityIX (~245 IPv4, ~85 IPv6 prefixes), and FreeIX CH (~173 IPv4 and ~60 IPv6). We also send him +the member prefixes, which add about 35 or so more. This explains why Jurrian is +receiving from us ~440 IPv4 and ~180 IPv6. +* Antonios' AS210312, the exemplar in this article, is receiving all-but-CHIX. FogIXP yields 3077 +or so IPv4 and 1056 IPv6 prefixes, while I've already added up FreeIX, CommunityIX, and our members +(this is what we're sending Jurrian!), at ~440 and ~180 respectively, so Antonios should be getting about 3500 IPv4 +prefixes and 1250 IPv6 prefixes. + +In the other direction, I would expect to be announcing to _peers_ only our members: + +``` +pim@chrma0:~$ for i in $(birdc show protocol | grep peer.*_1 | cut -f1 -d' '); do echo -n $i\ ; birdc +show protocol all $i | grep Routes || echo; done +chix_peer_212100_ipv4_1 57618 imported, 0 filtered, 24 exported, 778 preferred +chix_peer_212100_ipv6_1 21979 imported, 1 filtered, 37 exported, 7186 preferred +chix_peer_13335_ipv4_1 4767 imported, 9 filtered, 24 exported, 4765 preferred +chix_peer_13335_ipv6_1 371 imported, 1 filtered, 37 exported, 369 preferred +chix_peer_6939_ipv4_1 151787 imported, 27 filtered, 24 exported, 133943 preferred +chix_peer_6939_ipv6_1 61191 imported, 6 filtered, 37 exported, 16223 preferred +comix_peer_44596_ipv4_1 594 imported, 0 filtered, 25 exported, 10 preferred +comix_peer_44596_ipv6_1 1147 imported, 0 filtered, 50 exported, 0 preferred +comix_peer_8298_ipv4_1 23 imported, 0 filtered, 25 exported, 0 preferred +comix_peer_8298_ipv6_1 34 imported, 0 filtered, 50 exported, 0 preferred +fogixp_peer_47498_ipv4_1 3286 imported, 1 filtered, 27 exported, 3077 preferred +fogixp_peer_47498_ipv6_1 1838 imported, 0 filtered, 39 exported, 1056 preferred +freeix_ch_peer_51530_ipv4_1 355 imported, 0 filtered, 28 exported, 0 
preferred +freeix_ch_peer_51530_ipv6_1 143 imported, 0 filtered, 53 exported, 0 preferred +``` + +Some observations: + +* Nobody is inhibited at FreeIX Switzerland. It stands to reason, therefore, that it has the most +exported prefixes: 28 for IPv4 and 53 for IPv6. +* Two members are inhibited at CHIX, which gives it the lowest number of exported prefixes: +24 for IPv4 and 37 for IPv6. +* All peers at each exchange (group) will receive the same number of prefixes. I can confirm that +at CHIX, all three peers have the same number of announced prefixes. Similarly, at CommunityIX, all +peers have the same number. +* If Antonios, Sam or Jurrian were to add an outgoing announcement to AS50869 with an additional inhibit +BGP community (e.g. `(50869,3020,1)` to inhibit country Switzerland), they would tweak these numbers. + +## What's next + +This all adds up. I'd like to test the waters with my friendly neighborhood canaries a little bit, +to make sure that announcements are as expected, and traffic flows where appropriate. In the meantime, +I'll chase the deployment of LSIX, FrysIX, SpeedIX and possibly a few others in Amsterdam. And of +course FreeIX Greece in Thessaloniki. I'll try to get the Milano VPP router deployed (it's already +installed and configured, but currently powered off). + +## How can you help? + +If you're willing to participate with a VPP router and connect it to either multiple local internet +exchanges (like I've demonstrated in Zurich), or to the other routers, I would welcome your +contribution. [[Contact]({{< ref contact.md >}})] me for details. + +A bit further down the pike, a connection from Amsterdam to Zurich, from Zurich to Milan and from +Milan to Thessaloniki is on the horizon. 
If you are willing and able to donate some bandwidth (point +to point VPWS, VLL, L2VPN) and your transport network is capable of carrying at least 2026 bytes of _inner_ +payload, please also [[reach out]({{< ref contact.md >}})] as I'm sure many small network operators +would be thrilled. diff --git a/static/assets/freeix/freeix-artist-rendering.png b/static/assets/freeix/freeix-artist-rendering.png new file mode 100644 index 0000000..c870e7e --- /dev/null +++ b/static/assets/freeix/freeix-artist-rendering.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:51270bff6d10ab3956652ec21ba6f7058db6c087ac7488a64172496cbe11fb54 +size 223513