diff --git a/content/articles/2024-10-21-freeix-2.md b/content/articles/2024-10-21-freeix-2.md index 8b72768..7a8abd7 100644 --- a/content/articles/2024-10-21-freeix-2.md +++ b/content/articles/2024-10-21-freeix-2.md @@ -8,56 +8,58 @@ title: "FreeIX - Remote, part 2" # Introduction A few months ago, I wrote about [[an idea]({{< ref 2024-04-27-freeix-1.md >}})] to help boost the -value of small internet exchange points.j When the internet exchange doesn't have many members, +value of small Internet Exchange Points (_IXPs_). When such an exchange doesn't have many members, then the operational costs of connecting to it (cross connects, router ports, finding peers, etc) are not very favorable. -Yet, the benefit of using an Internet Exchange is to reduce the portion of an ISP’s (and CDN’s) +Clearly, the benefit of using an Internet Exchange is to reduce the portion of an ISP’s (and CDN’s) traffic that must be delivered via their upstream transit providers, thereby reducing the average per-bit delivery cost and as well reducing the end to end latency as seen by their users or customers. Furthermore, the increased number of paths available through the IXP improves routing -efficiency and fault-tolerance, and it avoids traffic going the scenic route to a large hub like -Frankfurt, London, Amsterdam, Paris or Rome, if it could very well remain local. +efficiency and fault-tolerance, and at the same time it avoids traffic going the scenic route to a +large hub like Frankfurt, London, Amsterdam, Paris or Rome, if it could very well remain local. ## Refresher: FreeIX Remote {{< image width="20em" float="right" src="/assets/freeix/Free IX Remote.svg" alt="FreeIX Remote" >}} -Let's take for example the Free IX in Greece that was announced at GRNOG16 in Athens on April 19th, -2024. This exchange initially targets Athens and Thessaloniki, with 2x100G between the two cities. -Members can connect to either site for the cost of only a cross connect. 
The 1G/10G/25G ports will -be _Gratis_. But I have connected one very special router to Free IX Greece, which will be -offering an outreach infrastructure by connecting to other internet exchange points in Amsterdam, -and allowing all FreeIX Greece members to benefit from that in the following way: +Let's take for example the [[Free IX in Greece](https://free-ix.gr/)] that was announced at GRNOG16 +in Athens on April 19th, 2024. This exchange initially targets Athens and Thessaloniki, with 2x100G +between the two cities. Members can connect to either site for the cost of only a cross connect. +The 1G/10G/25G ports will be _Gratis_, so please make sure to apply if you're in this region! I +myself have connected one very special router to Free IX Greece, which will be offering an outreach +infrastructure by connecting to _other_ Internet Exchange Points in Amsterdam, and allowing all FreeIX +Greece members to benefit from that in the following way: -1. FreeIX uses AS50869 to peer with any network operator (or routeserver) available at public -internet exchanges or using private interconnects. It looks like a normal service provider in this -regard. It will connect to internet exchanges, and learn a bunch of routes. +1. FreeIX Remote uses AS50869 to peer with any network operator (or routeserver) available at public +Internet Exchange Points or using private interconnects. For these peers, it looks like a completely +normal service provider in this regard. It will connect to internet exchange points, and learn a bunch of +routes and announce other routes. -1. FreeIX _members_ can join the program, after which they are granted certain propagation -permissions by FreeIX at the point where they have a BGP session with AS50869. The prefixes learned -on these _member_ sessions are marked as such, and will be allowed to propagate. Members will -receive some or all learned prefixes from AS50869. +1. 
FreeIX Remote _members_ can join the program, after which they are granted certain propagation +permissions by FreeIX Remote at the point where they have a BGP session with AS50869. The prefixes +learned on these _member_ sessions are marked as such, and will be allowed to propagate. Members +will receive some or all learned prefixes from AS50869. 1. FreeIX _members_ can set fine grained BGP communities to determine which of their prefixes are -propagated and at which locations. +propagated to and from which locations, by router, country or Internet Exchange Point. -Members at smaller internet exchanges greatly benefit from this type of outreach, by receiving large -portions of the public internet directly at their preferred peering location. Similarly, the _Free -IX Remote_ routers will carry their traffic to these remote internet exchanges. - -My [[previous article]({{< ref 2024-04-27-freeix-1.md >}})] went into a good amount of detail on the -principles of operation, but back then I made a promise to come back to the actual _implementation_ -of such a complex routing topology. As a starting point, I work with the structure I shared in -[[IPng's Routing Policy]({{< ref 2021-11-14-routing-policy.md >}})], if you haven't read that yet, I -think you should consider taking a look as many of the structural elements will be the same. +Members at smaller internet exchange points greatly benefit from this type of outreach, by receiving large +portions of the public internet directly at their preferred peering location. The _Free IX Remote_ +routers will carry member traffic to and from these remote Internet Exchange Points. My [[previous +article]({{< ref 2024-04-27-freeix-1.md >}})] went into a good amount of detail on the principles of +operation, but back then I made a promise to come back to the actual _implementation_ of such a +complex routing topology. 
As a starting point, I work with the structure I shared in [[IPng's +Routing Policy]({{< ref 2021-11-14-routing-policy.md >}})]. If you haven't read that yet, I think +it may make sense to take a look as many of the structural elements and concepts will be similar. ## Implementation -The routing policy calls for three classes of (large) BGP communities (informational, permission and -inhibit). It also defines a few classic BGP communties, but I'll skip over those as they are not +The routing policy calls for three classes of (large) BGP communities: informational, permission and +inhibit. It also defines a few classic BGP communities, but I'll skip over those as they are not very interesting. Firstly, I will use the _informational_ communities to tag which prefixes were -learned by which router, in which country and at which exchange point. +learned by which _router_, in which _country_ and at which internet exchange point, which I will call a +_group_. Then, I will use the same structure to grant members _permissions_, that is to say, when AS50869 learns their prefixes, they will get tagged with specific action communities that enable propagation @@ -70,17 +72,18 @@ To help structure this implementation, it helps if I think about it in the following way: Let's say, AS50869 is connected to IXP1, IXP2, IXP3 and IXP4. AS50869 has a _member_ called M1 at -IXP1, and that member is 'allowed' to reach IXP2 and IXP3, but not IXP4. My FreeIX Remote -implementation now has to satisfy three main requirements: +IXP1, and that member is 'permitted' to reach IXP2 and IXP3, but it is 'inhibited' from reaching +IXP4. My _FreeIX Remote_ implementation now has to satisfy three main requirements: -1. **Ingress**: learn prefixes (from peers and members alike) at internet exchanges and 'tag' them -with the correct informational communities. -1. **Egress: Member-to-IXP**: Announce M1's prefixes at IXP2 and IXP3, but not at IXP4. +1. 
**Ingress**: learn prefixes (from peers and members alike) at internet exchange points or +private network interconnects, and 'tag' them with the correct informational communities. +1. **Egress: Member-to-IXP**: Announce M1's prefixes to IXP2 and IXP3, but not to IXP4. 1. **Egress: IXP-to-Member**: Announce IXP2's and IXP3's prefixes to M1, but not IXP4's. ### Defining Countries and Routers -I'll start by giving each country which has a router a unique _country_id_ in a YAML file: +I'll start by giving each country which has at least one router a unique _country_id_ in a YAML +file, leaving the value 0 to mean 'all' countries: ``` $ cat config/common/countries.yaml @@ -92,7 +95,7 @@ country: IT: 4 ``` -Each router has its own configuration file, and at the top, I'll define some meta data which +Each router has its own configuration file, and at the top, I'll define some metadata which includes things like the country in which it operates, and its own unique _router_id_, like so: ``` @@ -143,7 +146,7 @@ ebgp: In order to keep this system manageable, I have to rely on automation. I intend to leverage the BGP community _subclasses_ in a simple ACL system consisting of the following YAML, taking my buddy -Antonios's network as an example: +Antonios' network as an example: ``` $ cat config/common/members.yaml @@ -151,7 +154,7 @@ member: 210312: description: DaKnObNET prefix_filter: AS-SET-DNET - permission: [ router:chrma0, country:NL ] + permission: [ router:chrma0 ] inhibit: [ group:chix ] ... ``` @@ -159,8 +162,9 @@ member: The syntax of the `permission` and `inhibit` fields are identical. They are lists of key:value pairs where they key must be one of the _subclasses_ (eg. 'router', 'country', 'group', 'asn'), and the value appropriate for that type. 
In this example, AS50869 is being asked to grant permissions for -Antonios' prefixes to any peer connected to `router:chrma0` or to any router in `countery:NL`, but -inhibit propagation to/from the exchange point called `group:chix`. +Antonios' prefixes to any peer connected to `router:chrma0`, but inhibit propagation to/from the +exchange point called `group:chix`. I could extend this list, for example by adding a permission to +`country:NL` or an inhibit to `router:grskg0` and so on. I decide that sensible defaults are to give permissions to all, and keep inhibit empty. In other words: be very liberal in propagation, to maximize the value that FreeIX Remote can provide its @@ -168,7 +172,7 @@ members. ### Ingress: Learning Prefixes -With what I've defined so far, I can see how to set informational communtiies: +With what I've defined so far, I can start to set informational BGP communities: * The prefixes learned on subclass **router** for `chrma0` will have value of device.id=1: `(50869,1010,1)` * The prefixes learned on subclass **country** for `chrma0` will learn from device.country=CH and @@ -215,8 +219,8 @@ I'll use a helper function `ebgp_merge_value()` to walk the YAML hiearchy from t enriched _session_ to the _group_ and finally to the _ebgp_ scope, looking for the existence of a key called _prefix_filter_ and defaulting to None in case none was found. With the value of _prefix_filter_ in hand (in this case `AS-SET-DNET`), I shell out to `bgpq4` for IPv4 and IPv6 -respectively. Sometimes, there are no IPv6 prefixes (booh!) and sometimes there are no IPv4 prefixes -(welcome to the Internet, kid!) +respectively. Sometimes, there are no IPv6 prefixes (why must you be like this?!) and sometimes +there are no IPv4 prefixes (welcome to the Internet, kid!) 
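The lookup order described above (session first, then group, then the `ebgp` scope) can be sketched in a few lines of Python. This is only a guess at the shape of such a helper, not the generator's actual code: the name `merge_value_sketch`, its arguments and the sample dictionaries are all made up for illustration.

```python
def merge_value_sketch(key, session, group, ebgp, default=None):
    """Return the first value found for `key`, most specific scope first.

    Hypothetical stand-in for a helper like ebgp_merge_value(); the real
    signature is not shown in the article.
    """
    for scope in (session, group, ebgp):
        # Skip scopes that are missing or empty, stop at the first hit.
        if scope and key in scope:
            return scope[key]
    return default


# Illustrative data only: here the session carries the member's AS-SET.
ebgp = {"prefix_filter": None}
group = {"description": "example group"}
session = {"prefix_filter": "AS-SET-DNET"}

print(merge_value_sketch("prefix_filter", session, group, ebgp))
print(merge_value_sketch("prefix_filter", {}, group, {}))
```

Walking from the most specific scope outwards means a per-session setting always wins over a group-wide or global one, and a missing key falls through to the default.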
All of this context, including the session and group information, are then fed as context to a Jinja renderer, where I can use them in an _import_ filter like so: @@ -264,14 +268,14 @@ Let me explain what's going on here, as Jinja templating language that my genera the `prefix_filter` context variable doesn't exist, assume it's the empty dictionary and thus, print no prefix lists. -Then, I create a Bird2 filter that has a globally unique name. I satisfy this requirement by giving -it a name with the tuple of {group, their_asn}. The first thing this filter does, is inspect -`prefix_filter_ipv4` and `prefix_filter_ipv6`, and if they are explicitly set to False (for example, -if a member doesn't have any IRR prefixes associated with their AS-SET), then I'll reject any -prefixes from them. Then, I'll match the prefixes with the `prefix_filter`, if provided, and reject -any prefixes that aren't in the list I'm expecting on this session. Assuming we're still good to go, -I'll hand this prefix off to a function called `ebgp_import_peer()` for peers and -`ebgp_import_member()` for members, both of which ensure BGP communities are scrubbed. +Then, I create a Bird2 filter, which must have a globally unique name. I satisfy this +requirement by giving it a name with the tuple of {group, their_asn}. The first thing this filter +does, is inspect `prefix_filter_ipv4` and `prefix_filter_ipv6`, and if they are explicitly set to +False (for example, if a member doesn't have any IRR prefixes associated with their AS-SET), then +I'll reject any prefixes from them. Then, I'll match the prefixes with the `prefix_filter`, if +provided, and reject any prefixes that aren't in the list I'm expecting on this session. Assuming +we're still good to go, I'll hand this prefix off to a function called `ebgp_import_peer()` for +peers and `ebgp_import_member()` for members, both of which ensure BGP communities are scrubbed. 
``` function ebgp_import_peer(int remote_as) -> bool @@ -297,7 +301,8 @@ function ebgp_import_member(int remote_as) -> bool After scrubbing the communities (peers are not allowed to set _any_ communities, and members are not allowed to set their own informational or permissions communities, but they are allowed to inhibit -themselves, if they wish), one last check is performed by calling the underlying `ebgp_import()`: +themselves or prepend, if they wish), one last check is performed by calling the underlying +`ebgp_import()`: ``` function ebgp_import(int remote_as) -> bool @@ -322,8 +327,9 @@ Shutdown](https://www.rfc-editor.org/rfc/rfc8326.html)], honor it and set the lo zero (making sure to prefer any other available path). OK, after all these checks are done, I am finally ready to accept the prefix from this peer or -member. It's time to add the informational communities based on the router, country and (if this is -a session at a public peering point documented in PeeringDB), the group's ixp_id. +member. It's time to add the informational communities based on the _router_id_, the router's +_country_id_ and (if this is a session at a public internet exchange point documented in PeeringDB), +the group's _ixp_id_. #### Ingress Example: member @@ -380,7 +386,7 @@ filter ebgp_chix_13335_import { Most sessions will actually look like this one: just learning prefixes, scrubbing inbound communities that are nobody's business to be setting but mine, tossing weird prefixes like bogons and then setting typically the three informational communities. I now know exactly which prefixes -are picked up at CHIX, which in Switzerland, and which at router `chrma0`. +are picked up at group CHIX, which ones in country Switzerland, and which ones on router `chrma0`. ### Egress: Propagating Prefixes @@ -391,8 +397,8 @@ announced to member M1. First, let me create a helper function in the generator. 
It's job is to take the symbolic `member.*.permissions` and `member.*.inhibit` lists and resolve them into a structure of numeric -values suitable for BGP community list adding and matching. It's a bit of a beast, but I'll rewrite -it a bit. Notably, I've removed all the error and exception handling for brevity: +values suitable for BGP community list adding and matching. It's a bit of a beast, but I've +simplified it a bit. Notably, I've removed all the error and exception handling for brevity: ``` def parse_member_communities(data, asn, type): @@ -439,7 +445,11 @@ def parse_member_communities(data, asn, type): ``` -Running such a function on Antonios' member data above would reveal the following: +The essence of this function is to take a human readable list of symbols, like 'router:chrma0' and +look up what subclass is called 'router' and what router_id is 'chrma0'. It does this for keywords +'router', 'country', 'group' and 'asn' and for a special keyword called 'all' as well. + +Running this function on Antonios' member data above would reveal the following: ``` Member 210312 has permissions: [{'class': 2000, 'subclass': 10, 'value': 1, 'description': 'permission.router = chrma0'}] @@ -447,8 +457,9 @@ Member 210312 has inhibits: [{'class': 3000, 'subclass': 30, 'value': 2365, 'description': 'inhibit.group = chix'}] ``` -The neat thing about this is, that I can use it in a clever way for both types of propagation, and -the `parse_member_communities()` helper function returns pretty readable data. +The neat thing about this is, that this data will come in handy for _both_ types of propagation, and +the `parse_member_communities()` helper function returns pretty readable data, which will help in +debugging and further understanding the ultimately generated configuration. #### Egress: Member-to-IXP @@ -458,8 +469,8 @@ _from AS50869 to others_, I can do the following: **1. 
Tag permissions and inhibits on ingress** -I can add a tiny bit of logic using this data structure I just created above, in the import filter, -remember I added "NOTE(pim): More comes here"? After setting the informational communities, I also +I add a tiny bit of logic using this data structure I just created above. In the import filter, +remember I added `NOTE(pim): More comes here`? After setting the informational communities, I also add these: ``` @@ -508,14 +519,16 @@ filter ebgp_chix_210312_import { ``` Remember, the `ebgp_import_member()` helper will strip any informational (the 1000s) and permissions -(the 2000s), but it would allow Antonios to set inhibits (the 3000s) which will be allowed in. In -other words, Antonios can't give himself propagation rights (sorry, buddy!) but if he would like to -make AS50869 stop sending his prefixes to, say, CommunityIX, he could simply add the BGP community -`(50869,3030,2013)` on his announcements, and that will get honored. +(the 2000s), but it would allow Antonios to set inhibits and prepends (the 3000s) so these BGP +communities will still be allowed in. In other words, Antonios can't give himself propagation rights +(sorry, buddy!) but if he would like to make AS50869 stop sending his prefixes to, say, CommunityIX, +he could simply add the BGP community `(50869,3030,2013)` on his announcements, and that will get +honored. If he'd like AS50869 to prepend itself twice before announcing to peer AS8298, he could set +`(50869,3200,8298)` and that will also get picked up. **2. 
Match permissions and inhibits on egress** -Now that all of Antonios' prefixes are tagged with permissions and inhibits, I can share how I +Now that all of Antonios' prefixes are tagged with permissions and inhibits, I can reveal how I implemented the export filters for AS50869: ``` @@ -575,21 +588,21 @@ communities which may be present in the RIB for this prefix: - or maybe the `group` permission `(50869,2030,G)` for the _ixp_id_ that this session lives on? If any of these conditions are true, then this prefix _might_ pe permitted, so I set the variable to -True. Next, I see if any of the _inhibit_ communities are set, in much the same way. If any one of -them matches, then I flip the variable to False again. - -Once the verdict is known, I can return True or False here, which makes its way all the way up the -call stack and ultimately announces the member prefix on the BGP session, or not. Neat! +True. Next, I check and see if any of the _inhibit_ communities are set, either by me (in +`members.yaml`) or by the member on the live BGP session. If any one of them matches, then I flip +the variable to False again. Once the verdict is known, I can return True or False here, which +makes its way all the way up the call stack and ultimately announces the member prefix on the BGP +session, or not. Slick! #### Egress: IXP-to-Member -At this point, members' prefixes get announced at the correct internet exchanges, but I need to +At this point, members' prefixes get announced at the correct internet exchange points, but I need to satisfy one more requirement: the prefixes picked up at those IXPs, should _also_ be announced to members. For this, the helper dictionary with permissions and inhibits can be used in a clever way. What if I held them against the informational communities? 
For example, I have _permitted_ -Antonios to be annouce at any IXP connected to router `chrma0`, then all prefixes I learned at -`chrma0` are fair game, right? +Antonios to be announced at any IXP connected to router `chrma0`, then all prefixes I learned at +`chrma0` are fair game, right? 
But, I configured an _inhibit_ for Antonios' prefixes at CHIX, but -again, I have an informational community for all prefixes I learned from the CHIX group! +Antonios to be annouced at any IXP connected to router `chrma0`, then all prefixes I learned at +`chrma0` are fair game, right? But, I configured an _inhibit_ for Antonios' prefixes at CHIX. No +problem, I have an informational community for all prefixes I learned from the CHIX group! I come to the realization that IXP-to-Member simply adds to the Member-to-IXP logic. Everything that I would announce to a peer, I will also announce to a member. Off I go, adding one last helper @@ -621,7 +634,7 @@ Note that in essence, this new function still calls `valid_prefix()`, which in t `source_prefix()` **or** `member_prefix(group)`, so it announces the same prefixes that are also announced to sessions of type 'peer'. But then, I'll also inspect the _informational_ communities, where the value of `0` is replaced with a wildcard, because 'permit or inhibit all' would mean -'match any of these BGP communities'. This template renders as follows for Antonis at CHIX: +'match any of these BGP communities'. This template renders as follows for Antonios at CHIX: ``` function ebgp_export_chix_210312(int remote_as; int group) -> bool @@ -641,7 +654,7 @@ function ebgp_export_chix_210312(int remote_as; int group) -> bool ## Results -With this, the logic is completed. Announcements are _symmetric_, that is to say the function +With this, the propagation logic is complete. Announcements are _symmetric_, that is to say the function `ebgp_export_chix_210312()` sees to it that Antonios gets the prefixes learned at router `chrma0` but not those learned at group `CHIX`. Similarly, the `ebgp_export_peer()` ensures that Antonios' prefixes are propagated to any session at router `chrma0` except those sessions at group `CHIX`. 
@@ -649,14 +662,14 @@ prefixes are propagated to any session at router `chrma0` except those sessions {{< image width="200px" float="right" src="/assets/vpp/fdio-color.svg" alt="VPP" >}} I have installed VPP with [[OSPFv3]({{< ref 2024-06-22-vpp-ospf-2.md >}})] unnumbered interfaces, -so each router has exactly one IPv4 and IPv6 loopback address. The router in Zurich has been +so each router has exactly one IPv4 and IPv6 loopback address. The router in Rümlang has been operational for a while, the one in Amsterdam (nlams0.free-ix.net) and Thessaloniki (grskg0.free-ix.net) have been deployed and are connecting to IXPs now, and the one in Milan (itmil0.free-ix.net) has been installed but is pending physical deployment at Caldara. -I deployed a test setup with a few permissions and inhibits on the Zurich router, with many thanks +I deployed a test setup with a few permissions and inhibits on the Rümlang router, with many thanks to Jurrian, Sam and Antonios for allowing me to guinnaepig-ize their member sessions. With the -following configuration: +following test configuration: ``` member: @@ -707,10 +720,11 @@ the member prefixes, which is about 35 or so additional prefixes. This explains receiving from us ~440 IPv4 and ~180 IPv6. * Antonios' AS210312, the exemplar in this article, is receiving all-but-CHIX. FogIXP yields 3077 or so IPv4 and 1056 IPv6 prefixes, while I've already added up FreeIX, CommunityIX, and our members -(this is what we're sending Jurrian!), at 330 resp 180, so Antonis should be getting about 3500 IPv4 +(this is what we're sending Jurrian!), at 330 resp 180, so Antonios should be getting about 3500 IPv4 prefixes and 1250 IPv6 prefixes. 
-In the other direction, I would expect to be announcing to _peers_ only our members: +In the other direction, I would expect to be announcing to _peers_ only prefixes belonging to either +AS50869 itself, or those of our members: ``` pim@chrma0:~$ for i in $(birdc show protocol | grep peer.*_1 | cut -f1 -d' '); do echo -n $i\ ; birdc @@ -739,9 +753,9 @@ exported prefixes: 28 for IPv4 and 53 for IPv6. 24 for IPv4 and 27 for IPv6. * All members at each exchange (group) will have the same amount of prefixes. I can confirm that at CHIX, all thre peers have the same amount of announced prefixes. Similarly, at CommunityIX, all -pers have the same amount. +peers have the same amount. * If Antonios, Sam or Jurrian would add an outgoing announcement to AS50869 with an additional inhibit -BGP community (eg `(50869,3020,1)` to inhibit country Switzerland), they would tweak these numbers. +BGP community (eg `(50869,3020,1)` to inhibit country Switzerland), they could tweak these numbers. ## What's next @@ -749,15 +763,15 @@ This all adds up. I'd like to test the waters with my friendly neighborhood cana to make sure that announcements are expected, and traffic flows where appropriate. In the mean time, I'll chase the deployment of LSIX, FrysIX, SpeedIX and possibly a few others in Amsterdam. And of course FreeIX Greece in Thessaloniki. I'll try to get the Milano VPP router deployed (it's already -installed and configured, but currently powered off). +installed and configured, but currently powered off) and connected to PCIX, MIX and a few others. ## How can you help? If you're willing to participate with a VPP router and connect it to either multiple local internet -exchanges (like I've demonstrated in Zurich), or to the other routers, I would welcome your -contribution. [[Contact]({{< ref contact.md >}})] me for details. +exchanges (like I've demonstrated in Zurich), or better yet, to one or more of the other existing +routers, I would welcome your contribution. 
[[Contact]({{< ref contact.md >}})] me for details. -A bit further down the pike, a connection from Amsterdam to Zurich, from ZUrich to Milan and from +A bit further down the pike, a connection from Amsterdam to Zurich, from Zurich to Milan and from Milan to Thessaloniki is on the horizon. If you are willing and able to donate some bandwidth (point to point VPWS, VLL, L2VPN) and your transport network is capable of at least 2026 bytes of _inner_ payload, please also [[reach out]({{< ref contact.md >}})] as I'm sure many small network operators