---
date: "2021-11-14T22:49:09Z"
title: Case Study - BGP Routing Policy
aliases:
- /s/articles/2021/11/14/routing-policy.html
---

# Introduction

BGP routing policy is a very interesting topic. I get asked about it formally and informally all the time. I have to admit, there are lots of ways to organize an autonomous system. Vendors have unique features and templating / procedural functions, but in the end, BGP routing policy all boils down to two+two things:

1. Not accepting the prefixes you don't want (inbound)
   * For those prefixes accepted, ensure they have correct attributes.
1. Not announcing prefixes to folks who shouldn't see them (outbound)
   * For those prefixes announced, ensure they have correct attributes.

At IPng Networks, I've cycled through a few iterations and landed on a specific setup that works well for me. It provides sufficient information to enable our downstreams (customers) to make good decisions on what they should accept from us, as well as enough expressivity for them to determine which prefixes we should propagate for them, where, and how.

This article describes one approach to a relatively feature-rich routing policy which is in use at IPng Networks (AS8298). It uses the [Bird2](https://bird.network.cz/) configuration language, although the concepts would be implementable in almost any modern routing suite (e.g. FRR, Cisco, Juniper, Arista, Extreme, et cetera).

Interested in one operator's opinion? Read on!

## 1. Concepts

There are three basic pieces of routing filtering, which I'll describe briefly.

### Prefix Lists

A prefix list (also sometimes referred to as an access-list in older software) is a list of IPv4 or IPv6 prefixes, often with a prefix length boundary, that determines if a given prefix is "in" or "out".

An example could be: `2001:db8::/32{32,48}` which describes any prefix in the supernet `2001:db8::/32` that has a prefix length of anywhere between /32 and /48, inclusive.
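
To make the notation concrete, here is a minimal sketch of how such a prefix list could be written and matched in Bird2. The names `EXAMPLE_PREFIXES` and `example_prefix_allowed()` are made up for illustration and are not part of the IPng configuration:

```
# Hypothetical prefix list: anything inside 2001:db8::/32 with a length of /32 through /48
define EXAMPLE_PREFIXES = [ 2001:db8::/32{32,48} ];

# Returns true if the route currently being filtered is "in" the list
function example_prefix_allowed() {
  return net ~ EXAMPLE_PREFIXES;
}
```

With this definition, `2001:db8:1234::/48` would match, while `2001:db8:1234:5600::/56` (too long) and `2001:db9::/32` (outside the supernet) would not.
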
### AS Paths

In BGP, each prefix learned comes with an AS path describing how to reach it. If my router learns a prefix from a peer with AS number `65520`, it'll see every prefix that peer sends as a list of AS numbers starting with 65520. With AS paths, the very first one in the list is the one the router directly learned the prefix from, and the very last one is the origin of the prefix. Often the path is written as a regular expression, starting with `^` and ending with `$`, and to help readability, spaces are often written as `_`.

Examples: `^25091_1299_3301$` and `^58299_174_1299_3301$`

### BGP Communities

When learning (or originating) a prefix in BGP, zero or more so-called `communities` can be added to it along the way. The _Routing Information Base_ or _RIB_ carries these communities and can share them between peering sessions. Communities can be added, removed and modified. Some communities have special meaning (which is agreed upon by everyone), and some have local meaning (agreed upon by only one or a small set of operators).

There are three types of communities: _normal_ communities are a pair of 16-bit integers; _extended_ communities are 8 bytes, split into one 16-bit integer and an additional 48-bit value; and finally _large_ communities consist of a triplet of 32-bit values.

Examples: `(8298, 1234)` (normal), or `(8298, 3, 212323)` (large)

# Routing Policy

Now that I've explained a little bit about the ingredients we have to work with, let me share an observation that took me a few decades to make: BGP sessions are really all the same. As such, every single one of the BGP sessions at IPng Networks is generated with one template. The difference between 'Transit', 'Customer', 'Peer' and 'Private Interconnect' really all boils down to what types of filtering are applied on in- and outbound updates.
I will demonstrate this by means of two main functions in Bird: `ebgp_import()`, discussed first in the section ***Inbound: Learning Routes***, and `ebgp_export()` in the section ***Outbound: Announcing Routes***.

## 2. Inbound: Learning Routes

Let's consider this function:

```
function ebgp_import(int remote_as) {
  if aspath_bogon() then return false;
  if (net.type = NET_IP4 && ipv4_bogon()) then return false;
  if (net.type = NET_IP6 && ipv6_bogon()) then return false;

  if (net.type = NET_IP4 && ipv4_rpki_invalid()) then return false;
  if (net.type = NET_IP6 && ipv6_rpki_invalid()) then return false;

  # Demote certain AS nexthops to lower pref
  if (bgp_path.first ~ AS_LOCALPREF50 && bgp_path.len > 1) then bgp_local_pref = 50;
  if (bgp_path.first ~ AS_LOCALPREF30 && bgp_path.len > 1) then bgp_local_pref = 30;
  if (bgp_path.first ~ AS_LOCALPREF10 && bgp_path.len > 1) then bgp_local_pref = 10;

  # Graceful Shutdown (RFC8326)
  if (65535, 0) ~ bgp_community then bgp_local_pref = 0;

  # Scrub BLACKHOLE community
  bgp_community.delete((65535, 666));

  return true;
}
```

The function works by order of elimination -- each prefix that is offered on the session will either be rejected (by means of returning `false`), or modified (by means of setting attributes like `bgp_local_pref`) and then accepted (by means of returning `true`).

***AS-Path Bogon*** filtering is a way to remove prefixes that have an invalid AS number in their path. The main examples of this are the private and reserved AS numbers (64496-131071) and their 32-bit equivalents (4200000000-4294967295).
In case you haven't come across this yet, AS number 23456 is also magic, see [RFC4893](https://datatracker.ietf.org/doc/html/rfc4893) for details:

```
function aspath_bogon() {
  return bgp_path ~ [0, 23456, 64496..131071, 4200000000..4294967295];
}
```

***Prefix Bogon*** filtering comes next, as certain prefixes are not publicly routable (you know, such as [RFC1918](https://datatracker.ietf.org/doc/html/rfc1918), but there are many others). They look different for IPv4 and IPv6:

```
function ipv4_bogon() {
  return net ~ [
    0.0.0.0/0,              # Default
    0.0.0.0/32-,            # RFC 5735 Special Use IPv4 Addresses
    0.0.0.0/0{0,7},         # RFC 1122 Requirements for Internet Hosts -- Communication Layers 3.2.1.3
    10.0.0.0/8+,            # RFC 1918 Address Allocation for Private Internets
    100.64.0.0/10+,         # RFC 6598 IANA-Reserved IPv4 Prefix for Shared Address Space
    127.0.0.0/8+,           # RFC 1122 Requirements for Internet Hosts -- Communication Layers 3.2.1.3
    169.254.0.0/16+,        # RFC 3927 Dynamic Configuration of IPv4 Link-Local Addresses
    172.16.0.0/12+,         # RFC 1918 Address Allocation for Private Internets
    192.0.0.0/24+,          # RFC 6890 Special-Purpose Address Registries
    192.0.2.0/24+,          # RFC 5737 IPv4 Address Blocks Reserved for Documentation
    192.168.0.0/16+,        # RFC 1918 Address Allocation for Private Internets
    198.18.0.0/15+,         # RFC 2544 Benchmarking Methodology for Network Interconnect Devices
    198.51.100.0/24+,       # RFC 5737 IPv4 Address Blocks Reserved for Documentation
    203.0.113.0/24+,        # RFC 5737 IPv4 Address Blocks Reserved for Documentation
    224.0.0.0/4+,           # RFC 1112 Host Extensions for IP Multicasting
    240.0.0.0/4+            # RFC 6890 Special-Purpose Address Registries
  ];
}

function ipv6_bogon() {
  return net ~ [
    ::/0,                   # Default
    ::/96,                  # IPv4-compatible IPv6 address - deprecated by RFC4291
    ::/128,                 # Unspecified address
    ::1/128,                # Local host loopback address
    ::ffff:0.0.0.0/96+,     # IPv4-mapped addresses
    ::224.0.0.0/100+,       # Compatible address (IPv4 format)
    ::127.0.0.0/104+,       # Compatible address (IPv4 format)
    ::0.0.0.0/104+,         # Compatible address (IPv4 format)
    ::255.0.0.0/104+,       # Compatible address (IPv4 format)
    0000::/8+,              # Pool used for unspecified, loopback and embedded IPv4 addresses
    0100::/8+,              # RFC 6666 - reserved for Discard-Only Address Block
    0200::/7+,              # OSI NSAP-mapped prefix set (RFC4548) - deprecated by RFC4048
    0400::/6+,              # RFC 4291 - Reserved by IETF
    0800::/5+,              # RFC 4291 - Reserved by IETF
    1000::/4+,              # RFC 4291 - Reserved by IETF
    2001:10::/28+,          # RFC 4843 - Deprecated (previously ORCHID)
    2001:20::/28+,          # RFC 7343 - ORCHIDv2
    2001:db8::/32+,         # Reserved by IANA for special purposes and documentation
    2002:e000::/20+,        # Invalid 6to4 packets (IPv4 multicast)
    2002:7f00::/24+,        # Invalid 6to4 packets (IPv4 loopback)
    2002:0000::/24+,        # Invalid 6to4 packets (IPv4 default)
    2002:ff00::/24+,        # Invalid 6to4 packets
    2002:0a00::/24+,        # Invalid 6to4 packets (IPv4 private 10.0.0.0/8 network)
    2002:ac10::/28+,        # Invalid 6to4 packets (IPv4 private 172.16.0.0/12 network)
    2002:c0a8::/32+,        # Invalid 6to4 packets (IPv4 private 192.168.0.0/16 network)
    3ffe::/16+,             # Former 6bone, now decommissioned
    4000::/3+,              # RFC 4291 - Reserved by IETF
    5f00::/8+,              # RFC 5156 - used for the 6bone but was returned
    6000::/3+,              # RFC 4291 - Reserved by IETF
    8000::/3+,              # RFC 4291 - Reserved by IETF
    a000::/3+,              # RFC 4291 - Reserved by IETF
    c000::/3+,              # RFC 4291 - Reserved by IETF
    e000::/4+,              # RFC 4291 - Reserved by IETF
    f000::/5+,              # RFC 4291 - Reserved by IETF
    f800::/6+,              # RFC 4291 - Reserved by IETF
    fc00::/7+,              # Unicast Unique Local Addresses (ULA) - RFC 4193
    fe80::/10+,             # Link-local Unicast
    fec0::/10+,             # Site-local Unicast - deprecated by RFC 3879 (replaced by ULA)
    ff00::/8+               # Multicast
  ];
}
```

That's a long list!! But operators on the _DFZ_ should really never be accepting any of these, and we should all collectively yell at those who propagate them.

***RPKI Filtering*** is a fantastic routing security feature, described in [RFC6810](https://datatracker.ietf.org/doc/html/rfc6810) and relatively straightforward to implement.
For each _originating_ AS number, we can check in a table of known prefix-to-origin-AS mappings (ROAs) whether it is the correct ISP to originate the prefix. The lookup has three possible outcomes: it can match (which makes the prefix RPKI valid), it can fail because the prefix is missing from the table (which makes the prefix RPKI unknown), or it can specifically mismatch (which makes the prefix RPKI invalid). Operators are encouraged to flag and drop _invalid_ prefixes:

```
function ipv4_rpki_invalid() {
  return roa_check(t_roa4, net, bgp_path.last) = ROA_INVALID;
}

function ipv6_rpki_invalid() {
  return roa_check(t_roa6, net, bgp_path.last) = ROA_INVALID;
}
```

***NOTE***: In NLNOG my post sparked a bit of debate on the use of `bgp_path.last_nonaggregated` versus simply `bgp_path.last`. Job Snijders did some spelunking and offered [this post](https://bird.network.cz/pipermail/bird-users/2019-September/013805.html) and a reference to [RFC6907](https://datatracker.ietf.org/doc/html/rfc6907) for details, and Tijn confirmed that Coloclue (on which many of my approaches have been modeled) indeed uses `bgp_path.last`. I've updated my configs, with many thanks for the discussion.

Alright, now that I've determined the as-path and prefix are kosher, and that the prefix is not known to be hijacked (i.e. it is either `ROA_VALID` or `ROA_UNKNOWN`), I'm ready to set a few attributes, notably:

* ***AS_LOCALPREF*** If the peer I learned this prefix from is in the given list, set the BGP local preference to either 50, 30 or 10 respectively (a lower localpref means the prefix is less likely to be selected). Some internet providers send lots of prefixes, but have poor network connectivity to the place I learned the routes from (a few examples: AS6939 is often oversubscribed in Amsterdam, AS39533 was for a while connected via a tunnel (!) to Zurich, and several hobby/amateur IXPs are on a VXLAN bridged domain rather than a physical switch).
* ***Graceful Shutdown***, described in [RFC8326](https://datatracker.ietf.org/doc/html/rfc8326), allows operators to pre-announce their downtime by setting a special BGP community that tells their peers to deselect that path by setting the local preference to the lowest possible value. The one-liner matching on `(65535,0)` implements that behavior.
* ***Blackhole Community***, described in [RFC7999](https://datatracker.ietf.org/doc/html/rfc7999), is another special BGP community, `(65535,666)`, which signals the need to stop sending traffic to the prefix at hand. I haven't yet implemented the blackhole routing (this has to do with an intricacy of the VPP Linux-CP code that I wrote), so for now I'll just remove the community.

Alright, based on this one template, I'm now ready to implement all three types of BGP session: ***Peer***, ***Upstream***, and ***Downstream***.

### Peers

```
function ebgp_import_peer(int remote_as) {
  # Scrub BGP Communities (RFC 7454 Section 11)
  bgp_community.delete([(8298, *)]);
  bgp_large_community.delete([(8298, *, *)]);

  return ebgp_import(remote_as);
}
```

It's dangerous to accept communities for my own AS8298 from peers. This is because several of them can actively change the behavior of route propagation (these types of communities are commonly called _action_ communities). So with peering relationships, I'll just toss them all.

Now, working my way up to the actual BGP peering session, taking for example a peer that I'm connecting to at LSIX (the routeserver, in fact) in Amsterdam:

```
filter ebgp_lsix_49917_import {
  if ! ebgp_import_peer(49917) then reject;

  # Add IXP Communities
  bgp_community.add((8298,1036));
  bgp_large_community.add((8298,1,1036));

  accept;
}

protocol bgp lsix_49917_ipv4_1 {
  description "LSIX IX Route Servers (LSIX)";
  local as 8298;
  source address 185.1.32.74;
  neighbor 185.1.32.254 as 49917;
  default bgp_med 0;
  default bgp_local_pref 200;
  ipv4 {
    import keep filtered;
    import filter ebgp_lsix_49917_import;
    export filter ebgp_lsix_49917_export;
    receive limit 100000 action restart;
    next hop self on;
  };
};
```

Parsing this through: the ipv4 import filter is called `ebgp_lsix_49917_import` and its job is to run the whole kit and caboodle of filtering I described above, and then if the `ebgp_import_peer()` function returns false, to simply drop the prefix. But if it is accepted, I'll tag it with a few communities. As I'll show later, any other peer will receive these communities if I decide to propagate the prefix to them. This is especially useful for downstreams (customers), who can decide to accept/deny the prefix based on a well-known set of communities we tag.

***IXP Community***: If the prefix is learned at an IXP, I'll add a large community `(8298,1,*)` and a backwards compatible normal community `(8298,10XX)`.

One last thing I'll note, and this is a matter of taste: prefixes picked up at internet exchanges (like LSIX) are typically much cheaper per megabit than transit routes, so I will set a default `bgp_local_pref` of 200 (a higher localpref is more likely to be selected as the active route).

### Upstream

An interesting observation: from Peers and from Upstreams I typically am happy to take all the prefixes I can get (but see the epilog below for an important note on this). For a Peer, this is mostly "their own prefixes" and for a Transit, this is mostly "all prefixes", but there are things in the middle, say partial transit of "all prefixes learned at IXP A B and C".
Really, all inbound sessions are very similar:

```
function ebgp_import_upstream(int remote_as) {
  # Scrub BGP Communities (RFC 7454 Section 11)
  bgp_community.delete([(8298, *)]);
  bgp_large_community.delete([(8298, *, *)]);

  return ebgp_import(remote_as);
}
```

... is in fact identical to the `ebgp_import_peer()` function above, so I'll not discuss it further. But for the sessions to upstream (==transit) providers, it can make sense to use slightly different BGP community tags and a lower localpref:

```
filter ebgp_ipmax_25091_import {
  if ! ebgp_import_upstream(25091) then reject;

  # Add BGP Large Communities
  bgp_large_community.add((8298,2,25091));

  # Add BGP Communities
  bgp_community.add((8298,2000));

  accept;
}

protocol bgp ipmax_25091_ipv4_1 {
  description "IP-Max Transit";
  local as 8298;
  source address 46.20.242.210;
  neighbor 46.20.242.209 as 25091;
  default bgp_med 0;
  default bgp_local_pref 50;
  ipv4 {
    import keep filtered;
    import filter ebgp_ipmax_25091_import;
    export filter ebgp_ipmax_25091_export;
    next hop self on;
  };
};
```

Again, a very similar pattern; the only material difference is that the inbound prefixes are tagged with an ***Upstream Community*** which is of the form `(8298,2,*)` and backwards compatible `(8298,20XX)`. Downstream customers can use this, if they wish, to select or reject routes (maybe they don't like routes coming from AS25091, although they should know better because IP-Max rocks!).

The other slight change here is the `bgp_local_pref` is set to 50, which implies that it will be used only if there are no alternatives in the _RIB_ with a higher localpref, or with a similar localpref but shorter as-path, or many other scenarios which I won't get into here, because BGP selection criteria 101 is a whole blogpost of its own.
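
As a sketch of how a downstream might act on these informational tags, assume a customer of AS8298 that also runs Bird2. The filter name `example_as8298_import` and the localpref value of 150 are hypothetical choices for illustration, not something IPng prescribes:

```
# Hypothetical import filter on a downstream's session towards AS8298
filter example_as8298_import {
  # Skip routes that AS8298 learned from its upstream AS25091
  if (8298, 2, 25091) ~ bgp_large_community then reject;

  # Prefer routes that AS8298 learned at any internet exchange
  if bgp_large_community ~ [(8298, 1, *)] then bgp_local_pref = 150;

  accept;
}
```

The same pattern works with the normal communities `(8298,10XX)` and `(8298,20XX)` for software that does not handle large communities.
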
### Downstream

That brings us to the third type of BGP sessions -- commonly referred to as customers except that not everybody pays :) so I just call them _downstreams_:

```
function ebgp_import_downstream(int remote_as) {
  # We do not scrub BGP Communities (RFC 7454 Section 11) for customers
  return ebgp_import(remote_as);
}
```

Here, I have a special relationship with the `remote_as`, and I do not scrub the communities, letting the downstream operator set whichever they like. As I'll demonstrate in the next chapter, they can use these communities to drive certain types of behavior.

Here's how I use this `ebgp_import_downstream()` function in the full filter for a downstream:

```
# bgpq4 -Ab4 -R 24 -m 24 -l 'define AS201723_IPV4' AS201723
define AS201723_IPV4 = [
    185.54.95.0/24
];

# bgpq4 -Ab6 -R 48 -m 48 -l 'define AS201723_IPV6' AS201723
define AS201723_IPV6 = [
    2001:678:3d4::/48,
    2001:67c:6bc::/48
];

filter ebgp_raymon_201723_import {
  if (net.type = NET_IP4 && ! (net ~ AS201723_IPV4)) then reject;
  if (net.type = NET_IP6 && ! (net ~ AS201723_IPV6)) then reject;
  if ! ebgp_import_downstream(201723) then reject;

  # Add BGP Large Communities
  bgp_large_community.add((8298,3,201723));

  # Add BGP Communities
  bgp_community.add((8298,3500));

  accept;
}

protocol bgp raymon_201723_ipv4_1 {
  local as 8298;
  source address 185.54.95.250;
  neighbor 185.54.95.251 as 201723;
  default bgp_med 0;
  default bgp_local_pref 400;
  ipv4 {
    import keep filtered;
    import filter ebgp_raymon_201723_import;
    export filter ebgp_raymon_201723_export;
    receive limit 94 action restart;
    next hop self on;
  };
};
```

OK, so this is a mouthful, but the one thing that I really need to do with customers is ensure that I only accept prefixes from them that they're supposed to send me. I do this with a `prefix-list` for IPv4 and IPv6, and in the importer, I simply reject any prefixes that are not in the list.
From then on, it looks very much like a peer, with identical filtering and tagging, except now I'm using yet another ***Customer Community*** which starts with `(8298,3,*)` and a vanilla `(8298,3500)` community. Anybody who wishes to can act on the presence of these communities to know that the prefix comes from a downstream of IPng Networks AS8298.

***A note on Peers and Downstreams***: Some ISPs will not peer with their customers (as in: once you become a transit customer they will terminate all BGP sessions at public internet exchanges), and I find that silly. However, for me the situation becomes a little bit more complex if I have AS201723 both as a Downstream (as shown here) and as a Peer (which in fact I do, at multiple Amsterdam based internet exchanges). Note how the `bgp_local_pref` is 400 on this session, and it will always be lower on other types of sessions. The implication is that the prefix in the _RIB_ which carries `(8298,3,201723)` will be selected; the ones I learn from LSIX will carry `(8298,1,*)` and the ones I learn from A2B (a transit provider) will carry `(8298,2,51088)`, and neither will be selected because they have a lower localpref. As I'll demonstrate below, I can make smart use of these communities when announcing prefixes to my own peers and upstreams, ... read on :)

## 3. Outbound: Announcing Routes

Alright, the _RIB_ is now filled with lots of prefixes that have the right localpref and communities, for example from having been learned at an IXP, from an Upstream, or from a Downstream.
Now let's consider the following generic exporter:

```
function ebgp_export(int remote_as) {
  # Remove private ASNs
  bgp_path.delete([64512..65535, 4200000000..4294967295]);

  # Well known BGP Large Communities
  if (8298, 0, remote_as) ~ bgp_large_community then return false;
  if (8298, 0, 0) ~ bgp_large_community then return false;

  # Well known BGP Communities
  if (0, 8298) ~ bgp_community then return false;
  if (remote_as < 65536 && (0, remote_as) ~ bgp_community) then return false;

  # AS path prepending
  if ((8298, 103, remote_as) ~ bgp_large_community ||
      (8298, 103, 0) ~ bgp_large_community) then {
    bgp_path.prepend( bgp_path.first );
    bgp_path.prepend( bgp_path.first );
    bgp_path.prepend( bgp_path.first );
  } else if ((8298, 102, remote_as) ~ bgp_large_community ||
             (8298, 102, 0) ~ bgp_large_community) then {
    bgp_path.prepend( bgp_path.first );
    bgp_path.prepend( bgp_path.first );
  } else if ((8298, 101, remote_as) ~ bgp_large_community ||
             (8298, 101, 0) ~ bgp_large_community) then {
    bgp_path.prepend( bgp_path.first );
  }

  return true;
}
```

Oh, wow! There's some really cool stuff to unpack here. As a belt-and-braces safety measure, I will remove any private AS numbers from the as-path - this keeps my own announcements from tripping any as-path bogon filtering. But then, there are a few well-known communities that help determine if the announcement is made or not, and there are three-and-a-half ways of doing this:

1. `(8298,0,remote_as)`
1. `(8298,0,0)`
1. `(0,8298)`
1. `(0,remote_as)` but only if the remote_as is 16 bits.

All four of these methods will tell the router to refuse announcing the prefix on this session. Note that downstreams are allowed to set `(8298,*,*)` and `(8298,*)` communities (and they're the only ones who are allowed to do so). So here is where some of the cool magic starts to happen.
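
From the downstream's side, using these communities could look roughly like the sketch below. It assumes the customer also runs Bird2; the filter name, the prefix list `EXAMPLE_CUSTOMER_IPV4` (a documentation prefix standing in for real address space) and the choice of AS6939 as the target are all placeholders:

```
# Hypothetical prefix list of the customer's own address space (placeholder prefix)
define EXAMPLE_CUSTOMER_IPV4 = [ 192.0.2.0/24 ];

# Hypothetical export filter on the customer's session towards AS8298
filter example_to_as8298_export {
  if ! (net ~ EXAMPLE_CUSTOMER_IPV4) then reject;

  # Ask AS8298 not to announce this prefix on its sessions with AS6939
  bgp_large_community.add((8298, 0, 6939));

  accept;
}
```

Swapping in `(8298, 0, 0)` would instead ask AS8298 not to announce the prefix on any session that runs through `ebgp_export()`.
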
Then, to drive prepending of the prefix on this session, I'll again match certain communities: `(8298, 103, *)` will prepend the customer's AS number three times, using `102` will prepend twice, and `101` will prepend once. If the third field is `0`, then any session with this filter will prepend. If the third field is an AS number, then only sessions to that AS number will be prepended. Using these types of communities allows downstreams (customers) incredibly fine-grained propagation actions, at the per-IPng-session level. Not many ISPs offer this functionality!

### Peers

Exporting to peers, I really need to make sure that I don't send too many prefixes. Most of us have at some point gone through the embarrassing motions of being told by a fellow operator "hey you're sending a full table". It is paramount to good peering hygiene that I do not leak. So I'll define a healthy set of _defense in depth_ principles here:

```
# bgpq4 -A4b -R 24 -m 24 -l 'define AS8298_IPV4' AS8298
define AS8298_IPV4 = [
    92.119.38.0/24,
    194.1.163.0/24,
    194.126.235.0/24
];

# bgpq4 -A6bR 48 -m 48 -l 'define AS8298_IPV6' AS8298
define AS8298_IPV6 = [
    2001:678:d78::/48,
    2a0b:dd80::/29{29,48}
];

# bgpq4 -A4b -R 24 -m 24 -l 'define AS_IPNG_IPV4' AS-IPNG
define AS_IPNG_IPV4 = [
    ... ## Removed for brevity
];

# bgpq4 -A6bR 48 -m 48 -l 'define AS_IPNG_IPV6' AS-IPNG
define AS_IPNG_IPV6 = [
    .. ## Removed for brevity
];

# bgpq4 -t4b -l 'define AS_IPNG' AS-IPNG
define AS_IPNG = [112, 8298, 50869, 57777, 60557, 201723, 212323, 212855];

function aspath_first_valid() {
  return (bgp_path.len = 0 || bgp_path.first ~ AS_IPNG);
}

# A list of well-known tier1 transit providers
function aspath_contains_tier1() {
  return bgp_path ~ [
    174,     # Cogent
    209,     # Qwest (HE carries this on IXPs IPv6 (Jul 12 2018))
    701,     # UUNET
    702,     # UUNET
    1239,    # Sprint
    1299,    # Telia
    2914,    # NTT Communications
    3257,    # GTT Backbone
    3320,    # Deutsche Telekom AG (DTAG)
    3356,    # Level3
    3549,    # Level3
    3561,    # Savvis / CenturyLink
    4134,    # Chinanet
    5511,    # Orange opentransit
    6453,    # Tata Communications
    6762,    # Seabone / Telecom Italia
    7018     # AT&T
  ];
}

# The list of our own uplink (transit) providers
# Note: This list is autogenerated by our automation.
function aspath_contains_upstream() {
  return bgp_path ~ [ 8283,25091,34549,51088,58299 ];
}

function ipv4_prefix_valid() {
  # Our (locally sourced) prefixes
  if (net ~ AS8298_IPV4) then return true;

  # Customer prefixes in AS-IPNG must be tagged with customer community
  if (net ~ AS_IPNG_IPV4 &&
      (bgp_large_community ~ [(8298, 3, *)] || bgp_community ~ [(8298, 3500)])
     ) then return true;

  return false;
}

function ipv6_prefix_valid() {
  # Our (locally sourced) prefixes
  if (net ~ AS8298_IPV6) then return true;

  # Customer prefixes in AS-IPNG must be tagged with customer community
  if (net ~ AS_IPNG_IPV6 &&
      (bgp_large_community ~ [(8298, 3, *)] || bgp_community ~ [(8298, 3500)])
     ) then return true;

  return false;
}

function prefix_valid() {
  # as-path based filtering
  if !aspath_first_valid() then return false;
  if aspath_contains_tier1() then return false;
  if aspath_contains_upstream() then return false;

  # prefix (and BGP community) based filtering
  if (net.type = NET_IP4 && !ipv4_prefix_valid()) then return false;
  if (net.type = NET_IP6 && !ipv6_prefix_valid()) then return false;

  return true;
}

function ebgp_export_peer(int remote_as) {
  if !prefix_valid() then return false;

  return ebgp_export(remote_as);
}
```

Wow, alrighty then!! All I'm doing here is checking if the call to `prefix_valid()` returns true. That function isn't very complex. It takes a look at three as-path based filters and then a prefix-list based filter. Let's go over them in turn:

***aspath_first_valid()*** takes a look at the first hop in the as-path. I need to make sure that I've received this prefix from an actual downstream, and those are collected in a RIPE `as-set` called `AS-IPNG`. So if the first BGP hop in the path is not one of these, I'll refuse to announce the prefix.

***aspath_contains_tier1()*** is a belt-and-braces style check. How on earth would I provide transit for any prefix for which there's already a global _Tier1_ provider in the path? I mean, in no universe would AS174 or AS1299 need me to reach any of their customers, or indeed, any place in the world. So this filter helps me never announce the prefix if it has one of these ISPs in the path.

***aspath_contains_upstream()*** works similarly: if I am receiving a full table from an upstream provider, I should not be passing these prefixes along - for similar reasons I would never be a transit provider for A2B or IP-Max or Meerfarbig. I originally had a bug in my configuration here, and my buddy Erik kindly pointed it out to me, so hat-tip to him for the intelligence.

***ipv[46]_prefix_valid()*** is the main thrust of prefix-based filtering. At this point we've already established that the as-path is clean, but it could be that the downstream is sending prefixes they should not (possibly leaking a full table), so let's take a look at a good way to avoid this.

* First, we look at locally sourced routes from `AS8298`, that is the ones that I myself originate at IPng Networks. These are always OK. The list is carefully curated.
* Alternatively, the prefix needs to be from the as-set `AS-IPNG` (which contains both my prefixes and all `route` and `route6` objects belonging to any AS number that I consider a downstream).
* Finally, if the prefix is from `AS-IPNG`, I'll still add one additional check to ensure that there is a so-called _customer community_ attached. Remember that I discussed this specifically up in the ***Inbound - Downstream*** section.

So before I announce anything on such a session, all _four_ of as-path, inbound prefix-list, outbound prefix-list and bgp-community are checked. This makes it incredibly unlikely that AS8298 ever leaks prefixes -- knock on wood!

### Upstream

Interestingly and if you think about it, unsurprisingly, an upstream configuration is identical to a peer:

```
function ebgp_export_upstream(int remote_as) {
  if !prefix_valid() then return false;

  return ebgp_export(remote_as);
}
```

Alright, nothing to see here, moving on ...

### Downstream

Now the difference between a Peer and an Upstream on the one hand, and a Downstream on the other, is that the former two will only see a very limited set of prefixes, heavily guarded by all of that filtering I described. But a downstream typically has the luxury of getting to learn every prefix I've learned:

```
function ipv4_acceptable_size() {
  if net.len < 8 then return false;
  if net.len > 24 then return false;
  return true;
}

function ipv6_acceptable_size() {
  if net.len < 12 then return false;
  if net.len > 48 then return false;
  return true;
}

function ebgp_export_downstream(int remote_as) {
  if (source != RTS_BGP && source != RTS_STATIC) then return false;
  if (net.type = NET_IP4 && ! ipv4_acceptable_size()) then return false;
  if (net.type = NET_IP6 && ! ipv6_acceptable_size()) then return false;

  return ebgp_export(remote_as);
}
```

So here I'll assert that the prefix has to be either from the `RTS_BGP` source, or from the `RTS_STATIC` source.
This latter source is what Bird uses for locally generated routes (i.e. the ones in AS8298 itself). Locally generated routes are not known from BGP, but known instead because they are blackholed / null-routed on the router itself. And from these routes, I further deselect those prefixes that are too short or too long, with limits that differ slightly by address family (IPv4 is anywhere between /8-/24 and IPv6 is anywhere between /12-/48).

Now, I will note that I've seen many operators who inject OSPF or connected or static routes into BGP, and all of those folks will have to maintain elaborate egress "bogon" route filters, for example for those IXP prefixes that they picked up due to them being directly connected. If those operators would simply not propagate directly connected routes, their life would be so much simpler .. but I digress and it's time for me to wrap up.

## Epilog

I hope this little dissertation proves useful for other Bird enthusiasts out there. I myself had to fiddle a bit over the years with the idiosyncrasies (and bugs) of Bird and Bird2. I wanted to make a few comments:

1. Thanks to the crew at [Coloclue](https://coloclue.net/) for having a really phenomenal routing setup, with a lot of thoughtful documentation, action communities, and strict ingress and egress filtering. It's also fully automated, and my own automation is derived from [Kees](https://github.com/coloclue/kees), although completely rewritten.
1. I understand that the main distinction between inbound Peer and Upstream is that for Peers many folks will want to do strict filtering. I've considered this for a long time and ultimately decided against it, because a combination of max prefix, tier1 as-path filtering and RPKI filtering would take care of the most egregious mistakes and otherwise, I'm actually happy to get more prefixes via IXPs rather than fewer.
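
For readers wondering what the tier1 as-path part of that combination could look like on inbound peer sessions, here is a minimal sketch that reuses `aspath_contains_tier1()` and `ebgp_import_peer()` from earlier in this article. The wrapper name `ebgp_import_peer_strict()` is hypothetical, and this is an illustration rather than a description of how it is wired up at IPng:

```
# Hypothetical wrapper: reject peer routes whose path traverses a tier1 transit provider
function ebgp_import_peer_strict(int remote_as) {
  # A peer should not be handing me paths that go through a tier1
  if aspath_contains_tier1() then return false;

  return ebgp_import_peer(remote_as);
}
```

Note that a wrapper like this would reject everything on a session where the peer itself is one of the listed tier1 networks, so it only makes sense on ordinary peering sessions.
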