Rewrite all images to Hugo format
230
content/articles/2024-04-27-freeix-1.md
Normal file
@ -0,0 +1,230 @@
---
date: "2024-04-27T10:52:11Z"
title: FreeIX - Remote
---

# Introduction

{{< image width="300px" float="right" src="/assets/freeix/openart-image_REzWzO43_1714219288118_raw.jpg" alt="OpenART" >}}

Tier1 and aspiring Tier2 providers interconnect only in large metropolitan areas, due to commercial incentives and
politics. They won't often peer with smaller providers, because why peer with a potential customer? Due to this,
it’s entirely likely that traffic between two parties in Thessaloniki is sent to Frankfurt or Milan and back.

One possible antidote to this is to connect to a local Internet Exchange point. Not all ISPs have access to large
metropolitan datacenters where larger internet exchanges have a point of presence, and it doesn't help that the
datacenter operator is happy to charge a substantial amount of money each month, just for the privilege of having
a passive fiber cross connect to the exchange. Many Internet Exchanges these days ask for per-month port costs *and*
meter the traffic with policers and rate limiters, such that the total cost of peering starts to exceed what one
might pay for transit, especially at low volumes, which further exacerbates the problem. Bah.

This is an unfortunate market effect (the race to the bottom), where transit providers are continuously lowering their
prices to compete. And while transit providers can make up for this to some extent through economies of scale, at some
point they are mostly all of equal size, and thus the only thing that can flex is quality of service.

The benefit of using an Internet Exchange is to reduce the portion of an ISP’s (and CDN’s) traffic that must be
delivered via their upstream transit providers, thereby reducing the average per-bit delivery cost and also reducing
the end-to-end latency as seen by their users or customers. Furthermore, the increased number of paths available through
the IXP improves routing efficiency and fault-tolerance, and it avoids traffic going the scenic route to a large hub
like Frankfurt, London, Amsterdam, Paris or Rome, if it could very well remain local.

IPng Networks really believes in an open and affordable Internet, and I would like to do my part in ensuring the
internet stays accessible for smaller parties.

## Smöl IXPs

One notable problem with small exchanges, like for example [[FNC-IX](https://www.fnc-ix.net/)] in the Paris metro, or
[[CHIX-CH](https://ch-ix.ch/)], [[Community IX](https://www.community-ix.ch/)] and [[Free-IX](https://free-ix.ch/)] in
the Zurich metropolitan area, is that they are, well, small. They may be cheaper to connect to, in some cases even free,
but they don't have a sizable membership which means that there is inherently less traffic flowing, which in turn makes
it less appealing for prospective members to connect to.

At IPng, I have partnered with a few super cool ISPs and carriers to offer a Free Internet Exchange platform. Just to
head the main question off at the pass: _Free_ here actually does mean "Free as in beer" or
[[Gratis](https://en.wikipedia.org/wiki/Gratis)], a gift to the community that does not cost money. It also more
philosophically wants to be "Free as in open, and transparent" or
[[Libre](https://en.wikipedia.org/wiki/Free_software)].

Two examples are:
* [[Free IX: Switzerland](https://free-ix.ch/)] with POPs at STACK GEN01 Geneva, NTT Zurich and Bancadati Lugano.
* [[Free IX: Greece](https://free-ix.gr/)] with POPs at TISparkle in Athens and Balkan Gate in Thessaloniki.

.. but there are actually quite a few out there once you start looking :)

## Growing Smöl IXPs

Some internet exchanges break through the magical 1Tbps barrier (and get a courtesy callout on Twitter from Dr. King),
but many remain smöl. Perhaps it's time to break the _chicken-and-egg_ problem. What if there was a way to interconnect
these exchanges?

Let's take for example the Free IX in Greece that was announced at GRNOG16 in Athens on April 19th. This exchange
initially targets Athens and Thessaloniki, with 2x100G between the two cities. Members can connect to either site for
the cost of only a cross connect. The 1G/10G/25G ports will be _Gratis_. But I will be connecting one very special
member to Free IX Greece, AS50869:

{{< image src="/assets/freeix/Free IX Remote.svg" alt="FreeIX Remote" >}}

## Free IX: Remote

Here's what I am going to build. The _Free IX Remote_ project offers an outreach infrastructure which connects to
internet exchange points, and allows members to benefit from that in the following way:

1. FreeIX uses AS50869 to peer with any network operator who is available at public internet exchanges or using
   private interconnects. It looks like a normal service provider in this regard. It will connect to internet
   exchanges, and learn a bunch of routes.
1. FreeIX _members_ can join the program, after which they are granted certain propagation permissions by FreeIX
   at the point where they have a BGP session with AS50869. The prefixes learned on these _member_ sessions are marked
   as such, and will be allowed to propagate. Members will receive some or all learned prefixes from AS50869.
1. FreeIX _members_ can set fine grained BGP communities to determine which of their prefixes are propagated and at
   which locations.

Members at smaller internet exchanges greatly benefit from this type of outreach, by receiving large portions of the
public internet directly at their preferred peering location. Similarly, the _Free IX Remote_ routers will carry
their traffic to these remote internet exchanges.

## Detailed Design

### Peer types

There are two types of BGP neighbor adjacency:

1. ***Members***: these are {ip-address,AS}-tuples which FreeIX has explicitly configured. Learned prefixes are added
   to as-set AS50869:AS-MEMBERS. Members receive _all_ prefixes from FreeIX, each annotated with BGP **informational**
   communities, and members can drive certain behavior with BGP **action** communities.

1. ***Peers***: these are all other entities with whom FreeIX has an adjacency at public internet exchanges or private
   network interconnects. Peers receive some (or all) _member prefixes_ from FreeIX and cannot drive any behavior
   with communities. With respect to internet exchanges and peers, AS50869 looks like a completely normal ISP,
   advertising subsets of the customer AS cone from AS50869:AS-MEMBERS at each exchange point.

BGP sessions with members use strict ingress filtering by means of `bgpq4`, and the prefixes learned on them will be
tagged with a set of informational BGP communities, such as where the prefix was learned, and which propagation
permissions it received (e.g. at which internet exchanges it will be allowed to be announced). Of course, prefixes that
are RPKI invalid will be dropped, while valid and unknown prefixes will be accepted. Members are granted _permissions_
by FreeIX, which determine where their prefixes will be announced by AS50869. Further, members can perform optional
actions by means of BGP communities at their ingress point, to inhibit announcements to a certain peer or at a given
exchange point.

Peers on the other hand are not granted any permissions and all action BGP communities will be stripped on prefixes
learned from them. Informational communities will still be tagged on learned prefixes. Two things happen here. Firstly,
members will be offered only those prefixes for which they have permission -- in other words, I will create a
configuration file that says member AS8298 may receive prefixes learned from Frys-IX. Secondly, even for those prefixes
that are advertised, the member AS8298 can use the informational communities to further filter what they accept from
Free IX Remote AS50869.

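To make the member ingress rule concrete, here is a minimal Python sketch of the accept/drop decision described above. It is an illustration only, not the Bird2 configuration: the `Rpki` enum, the function name and the exact-match comparison against the allow list are my assumptions (a real `bgpq4`-generated filter may also admit more-specifics up to some length), and the prefixes are documentation ranges.

```python
from enum import Enum
from ipaddress import ip_network


class Rpki(Enum):
    VALID = "valid"
    UNKNOWN = "unknown"
    INVALID = "invalid"


def accept_from_member(prefix, rpki, allowed):
    """Return True if a member announcement passes ingress filtering.

    `allowed` stands in for the prefix list generated from the member's
    AS-SET (hypothetical input, e.g. produced with bgpq4).
    """
    # RPKI invalid is dropped; valid and unknown continue to the filter.
    if rpki is Rpki.INVALID:
        return False
    net = ip_network(prefix)
    # Assumption: exact match only against the generated prefix list.
    return any(net == ip_network(entry) for entry in allowed)


member_filter = ["192.0.2.0/24", "2001:db8::/32"]
print(accept_from_member("192.0.2.0/24", Rpki.UNKNOWN, member_filter))
print(accept_from_member("192.0.2.0/24", Rpki.INVALID, member_filter))
```
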
# BGP Classic Communities

Members are allowed to set the following legacy action BGP communities for coarse grained distribution of their prefixes
through the FreeIX network.

* `(50869,0)` or `(50869,3000)` do not announce anywhere
* `(50869,666)` or `(65535,666)` blackhole everywhere (may be set on any more-specific prefix from the member's AS-SET)
* `(50869,3100)` prepend once everywhere
* `(50869,3200)` prepend twice everywhere
* `(50869,3300)` prepend three times everywhere

Peers, on the other hand, are not allowed to set _any_ communities, so all classic BGP communities from them are stripped
on ingress.

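The list above maps naturally onto a small lookup. The following Python sketch shows one way to interpret the classic communities on a member prefix; the function name and the returned dictionary are hypothetical, and the tie-break (the largest prepend wins when several prepend communities are present) is my assumption, since the text does not specify it.

```python
FREEIX_AS = 50869


def classic_actions(communities):
    """Interpret classic (asn, value) communities set by a member."""
    no_announce = bool(communities & {(FREEIX_AS, 0), (FREEIX_AS, 3000)})
    blackhole = bool(communities & {(FREEIX_AS, 666), (65535, 666)})
    prepend = 0
    for count, value in ((1, 3100), (2, 3200), (3, 3300)):
        if (FREEIX_AS, value) in communities:
            # Assumption: with several prepend communities, the largest wins.
            prepend = max(prepend, count)
    return {"no_announce": no_announce, "blackhole": blackhole, "prepend": prepend}


def strip_from_peer(communities):
    """Peers may not set any communities: everything is removed on ingress."""
    return set()


print(classic_actions({(50869, 3200)}))
print(classic_actions({(65535, 666)}))
```
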
# BGP Large Communities

Free IX Remote will use three types of BGP Large Communities, which each serve a distinct purpose:

1. ***Informational***: These communities are set by the FreeIX router when learning a prefix. They cannot be set by
   peers or members, and will be stripped on ingress. They will be sent to both members and peers, allowing operators to
   choose which prefixes to learn based on their origin details, like which country or internet exchange they were
   learned at.

1. ***Permission***: These communities are also set by FreeIX operators when learning a prefix (e.g. on the ingress
   router). They cannot be set by peers or members, and will be stripped on ingress. The permission communities
   determine where FreeIX will allow the prefix to propagate. They will be stripped on egress.

1. ***Action***: Based on the permissions, members can further steer announcements by sending certain action communities
   to FreeIX. These actions cannot be sent by peers, but in certain cases they can be set by FreeIX operators on ingress.
   Similarly to the _permission_ communities, all _action_ communities will be stripped on egress.

Regular peers of AS50869 at exchange points and private network interconnects will not be able to set any communities,
so all large BGP communities from them are stripped on ingress.

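The three types and their strip rules can be summarised in a few lines of Python. This is a sketch under assumptions: the 10XX/20XX/30XX function ranges are taken from the community examples in this article, the function names are made up for illustration, and keeping a member's non-FreeIX large communities untouched is my guess, as the text does not say what happens to them.

```python
FREEIX_AS = 50869


def kind(community):
    """Classify a large community (asn, function, value) by its function range."""
    asn, function, _value = community
    if asn != FREEIX_AS:
        return None  # not a FreeIX community
    return {10: "informational", 20: "permission", 30: "action"}.get(function // 100)


def ingress_from_member(communities):
    """Members may send action communities; informational and permission are stripped."""
    return {c for c in communities if kind(c) not in ("informational", "permission")}


def ingress_from_peer(communities):
    """Peers cannot set anything: all large communities are stripped."""
    return set()


def egress(communities):
    """Permission and action communities never leave AS50869; informational ones do."""
    return {c for c in communities if kind(c) not in ("permission", "action")}


sent = {(50869, 1020, 2), (50869, 2030, 0), (50869, 3040, 13030)}
print(ingress_from_member(sent))
print(egress({(50869, 1010, 1), (50869, 2010, 0), (50869, 3040, 13030)}))
```
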
### Informational Communities

When FreeIX routers learn prefixes, they will annotate them with certain communities. For example, the router at
Amsterdam NIKHEF (which is router #1, country #2), when learning a prefix at FrysIX (which is ixp #1152), will set the
following BGP large communities:

* `(50869,1010,1)`: Informational (10XX), Router (1010), vpp0.nlams0.free-ix.net (1)
* `(50869,1020,2)`: Informational (10XX), Country (1020), Netherlands (2)
* `(50869,1030,1152)`: Informational (10XX), IXP (1030), PeeringDB IXP for FrysIX (1152)

When propagating these prefixes to neighbors (both members and peers), these informational communities can be used to
determine local policy, for example by setting a different localpref or dropping prefixes from a certain location.
Informational communities can be read, but they can't be _set_ by peers or members -- they are always cleared by FreeIX
routers when learning prefixes, and as such the only routers which will set them are the FreeIX ones.

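As a small illustration of both sides of this exchange, the Python sketch below builds the informational communities a FreeIX router would attach, and shows how a member could match on them for local policy. The function names are hypothetical; the numeric values are the ones from the NIKHEF / FrysIX example above.

```python
FREEIX_AS = 50869
INFO_ROUTER, INFO_COUNTRY, INFO_IXP = 1010, 1020, 1030


def informational_tags(router_id, country_id, ixp_id):
    """Communities a FreeIX router attaches when it learns a prefix."""
    return {
        (FREEIX_AS, INFO_ROUTER, router_id),
        (FREEIX_AS, INFO_COUNTRY, country_id),
        (FREEIX_AS, INFO_IXP, ixp_id),
    }


def learned_at_ixp(communities, ixp_id):
    """Member-side helper: was this prefix learned at the given PeeringDB IXP?"""
    return (FREEIX_AS, INFO_IXP, ixp_id) in communities


# Router #1 (Amsterdam NIKHEF), country #2 (Netherlands), IXP #1152 (FrysIX).
tags = informational_tags(1, 2, 1152)
print(sorted(tags))
print(learned_at_ixp(tags, 1152))
```

A member could use a check like `learned_at_ixp` to lower the localpref of, or drop, prefixes from a location it does not want to use.
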
### Permission Communities

FreeIX maintains a list of permissions per member. When members announce their prefixes to FreeIX routers, these
permission communities are set. They determine what the member is allowed to do with FreeIX propagation - notably which
routers, countries, and internet exchanges the member will be allowed to propagate to.

Usually, member prefixes are allowed to propagate everywhere, so the following communities might be set by the FreeIX
router on ingress:

* `(50869,2010,0)`: Permission (20XX), Router (2010), everywhere (0)
* `(50869,2020,0)`: Permission (20XX), Country (2020), everywhere (0)
* `(50869,2030,0)`: Permission (20XX), IXP (2030), everywhere (0)

If the member prefixes are allowed to propagate only to certain places, the 'everywhere' communities will not be set,
and instead lists of communities with finer grained permissions can be used, for example:

* `(50869,2010,2)`: Permission (20XX), Router (2010), vpp0.grskg0.free-ix.net (2)
* `(50869,2020,3)`: Permission (20XX), Country (2020), Greece (3)
* `(50869,2030,60)`: Permission (20XX), IXP (2030), PeeringDB IXP for SwissIX (60)

Permission communities can't be set by peers, nor by members -- they are always cleared by FreeIX routers when learning
prefixes, and are configured explicitly by FreeIX operators.

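A permission lookup for a single dimension might look like the Python sketch below. It assumes that value `0` means 'everywhere' and that a specific value grants permission for just that router, country or IXP, as in the examples above. How the three dimensions combine into one egress decision is not spelled out here, so the sketch deliberately evaluates one dimension at a time; the function and constant names are hypothetical.

```python
FREEIX_AS = 50869
PERM_ROUTER, PERM_COUNTRY, PERM_IXP = 2010, 2020, 2030


def permits(communities, dimension, value):
    """True if the permission communities allow `value` in one dimension.

    `dimension` is one of PERM_ROUTER, PERM_COUNTRY or PERM_IXP.
    """
    everywhere = (FREEIX_AS, dimension, 0) in communities
    specific = (FREEIX_AS, dimension, value) in communities
    return everywhere or specific


everywhere = {(50869, 2010, 0), (50869, 2020, 0), (50869, 2030, 0)}
restricted = {(50869, 2010, 2), (50869, 2020, 3), (50869, 2030, 60)}

print(permits(everywhere, PERM_IXP, 1152))
print(permits(restricted, PERM_COUNTRY, 3))
print(permits(restricted, PERM_COUNTRY, 1))
```
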
### Action Communities

Based on the permission communities, zero or more egress routers, countries and internet exchanges are eligible to
propagate member prefixes by AS50869 to its peers. Members can define very fine grained action communities to further
tweak which prefixes propagate on which routers, in which countries and towards which internet exchanges and private
network interconnects:

* `(50869,3010,3)`: Inhibit Action (30XX), Router (3010), vpp0.gratt0.free-ix.net (3)
* `(50869,3020,1)`: Inhibit Action (30XX), Country (3020), Switzerland (1)
* `(50869,3030,1308)`: Inhibit Action (30XX), IXP (3030), PeeringDB IXP for LS-IX (1308)

Further actions can be placed on a per-remote-neighbor basis:

* `(50869,3040,13030)`: Inhibit Action (30XX), AS (3040), Init7 (AS13030)
* `(50869,3041,6939)`: Prepend Action (30XX), Prepend Once (3041), Hurricane Electric (AS6939)
* `(50869,3042,12859)`: Prepend Action (30XX), Prepend Twice (3042), BIT BV (AS12859)
* `(50869,3043,8283)`: Prepend Action (30XX), Prepend Three Times (3043), Coloclue (AS8283)

Peers cannot set these actions, as all action communities will be stripped on ingress. Members can set these action
communities on their sessions with FreeIX routers, however in some cases they may also be set by FreeIX operators when
learning prefixes.

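Put together, the action communities could be evaluated at egress roughly like the Python sketch below. It is my reading of the lists above rather than the actual implementation: any matching inhibit community suppresses the announcement, and if several prepend communities match the same neighbor, I assume the largest count wins. The function name and its return convention (`None` for 'do not announce', otherwise the number of prepends) are hypothetical.

```python
FREEIX_AS = 50869
INHIBIT_ROUTER, INHIBIT_COUNTRY, INHIBIT_IXP, INHIBIT_AS = 3010, 3020, 3030, 3040
PREPEND = {3041: 1, 3042: 2, 3043: 3}  # function value -> prepend count


def egress_action(communities, router, country, ixp, neighbor_as):
    """Return None if the announcement is inhibited, else the prepend count."""
    inhibits = {
        (FREEIX_AS, INHIBIT_ROUTER, router),
        (FREEIX_AS, INHIBIT_COUNTRY, country),
        (FREEIX_AS, INHIBIT_IXP, ixp),
        (FREEIX_AS, INHIBIT_AS, neighbor_as),
    }
    if communities & inhibits:
        return None
    count = 0
    for function, times in PREPEND.items():
        if (FREEIX_AS, function, neighbor_as) in communities:
            # Assumption: the largest matching prepend wins.
            count = max(count, times)
    return count


member = {(50869, 3020, 1), (50869, 3041, 6939)}
# Egress in Switzerland (country 1): inhibited.
print(egress_action(member, router=4, country=1, ixp=60, neighbor_as=6939))
# Egress in the Netherlands (country 2) towards AS6939: prepend once.
print(egress_action(member, router=1, country=2, ixp=1152, neighbor_as=6939))
```
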
## What's next

{{< image width="200px" float="right" src="/assets/freeix/bird-logo.svg" alt="Bird" >}}

Perhaps this interaction between _informational_, _permission_ and _action_ BGP communities gives you an idea of how
such a network may operate. It's somewhat different to a classic Transit provider, in that AS50869 will not carry a
full table. It'll _merely_ provide a form of partial transit from member A at IXP #1, to and from all peers that
can be found at IXPs #2-#N. Makes the mind boggle? Don't worry, we'll figure it out together :)

In an upcoming article I'll detail the programming work that goes into implementing this complex peering policy in Bird2
driving VPP routers (duh), with an IGP that is IPv4-less, because at this point, I [[may as well]({%post_url
2024-04-06-vpp-ospf %})] put my money where my mouth is.

If you're interested in this kind of stuff, take a look at the IPng Networks AS8298 [[Routing Policy]({% post_url
2021-11-14-routing-policy %})]. Similar to that one, this one will use a combination of functional programming, templates,
and clever expansions to make a customized per-member and per-peer configuration based on a YAML input file which
dictates which member and which prefix is allowed to go where.

{{< image width="200px" float="right" src="/assets/vpp/fdio-color.svg" alt="VPP" >}}

First, I need to get a replacement router for the Thessaloniki router, which will run VPP of course. My buddy Antonis
noticed that there are CPU and/or DDR errors on that chassis, so it may need to be RMA'd. But once it's operational, I will
start by deploying one instance in Amsterdam NIKHEF, and another in Thessaloniki Balkan Gate, with a 100G connection
between them, graciously provided by [[LANCOM](https://www.lancom.gr/en/)]. Just look at that FD.io hound runnnnn!!1