---
date: "2024-05-25T12:23:54Z"
title: 'Case Study: NAT64'
---

# Introduction

{{< image width="400px" float="right" src="/assets/oem-switch/s5648x-front-opencase.png" alt="Front" >}}

IPng's network is built up in two main layers: (1) an MPLS transport layer, which is disconnected
from the Internet, and (2) a VPP overlay, which carries the Internet. I created a BGP Free core
transport network, which uses MPLS switches from a company called Centec. These switches offer IPv4,
IPv6, VxLAN, GENEVE and GRE all in silicon, are very cheap on power, and are relatively affordable
per port.

Centec switches allow for a modest but not huge number of routes in the hardware forwarding tables.
I loadtested them in [[a previous article]({% post_url 2022-12-05-oem-switch-1 %})] at line rate
(well, at least 8x10G at 64b packets and around 110Mpps), and they forward IPv4, IPv6 and MPLS
traffic effortlessly, at 45 watts.

I wrote more about the Centec switches in [[my review]({% post_url 2023-03-11-mpls-core %})] of
them.

### IPng Site Local

{{< image width="400px" float="right" src="/assets/nat64/MPLS IPng Site Local v2.svg" alt="IPng SL" >}}

I leverage this internal transport network for more than _just_ MPLS. The transport switches are
perfectly capable of line rate (at 100G+) IPv4 and IPv6 forwarding as well. When designing IPng Site
Local, I created a number plan that assigns ***IPv4*** from the **198.19.0.0/16** prefix, and ***IPv6***
from the **2001:678:d78:500::/56** prefix. Within these, I allocate blocks for _Loopback_ addresses,
_PointToPoint_ subnets, and hypervisor networks for VMs and internal traffic.

Take a look at the diagram to the right. Each site has one or more Centec switches (in red), and
there are three redundant gateways that connect the IPng Site Local network to the Internet (in
orange). I run lots of services in this red portion of the network: site to site backups
[[Borgbackup](https://www.borgbackup.org/)], ZFS replication [[ZRepl](https://zrepl.github.io/)], a
message bus using [[Nats](https://nats.io)], and of course monitoring with SNMP and Prometheus all
make use of this network. But it's not only internal services like management traffic: I also
actively use this private network to expose _public_ services!

For example, I operate a bunch of [[NGINX Frontends]({% post_url 2023-03-17-ipng-frontends %})] that
have a public IPv4/IPv6 address, and reverse proxy for webservices (like
[[ublog.tech](https://ublog.tech)] or [[Rallly](https://rallly.ipng.ch/)]) which run on VMs and
Docker hosts which don't have public IP addresses. Another example, which I wrote about [[last
week]({% post_url 2024-05-17-smtp %})], is a bunch of mail services that run on VMs without public
access, but are each carefully exposed via reverse proxies (like Postfix, Dovecot, or
[[Roundcube](https://webmail.ipng.ch)]). It's an incredibly versatile network design!

### Border Gateways

Seeing as IPng Site Local uses native IPv6, it's rather straightforward to give each hypervisor and
VM an IPv6 address, and configure IPv4 only on the externally facing NGINX Frontends. As a reverse
proxy, NGINX will create a new TCP session to the internal server, and that's a fine solution.
However, I also want my internal hypervisors and servers to have full Internet connectivity. For
IPv6, this feels pretty straightforward, as I can just route the **2001:678:d78:500::/56** through
a firewall that blocks incoming traffic, and call it a day. For IPv4, I can similarly use classic
NAT, just like one would in a residential network.

**But what if I wanted to go IPv6-only?** This poses a small challenge, because while IPng is fully
IPv6 capable, and has been since the early 2000s, the rest of the internet is not quite there yet.
For example, the quite popular [[GitHub](https://github.com/pimvanpelt/)] hosting site still has
only an IPv4 address. Come on, folks, what's taking you so long?! It is for this purpose that NAT64
was invented, described in [[RFC6146](https://datatracker.ietf.org/doc/html/rfc6146)]:

> Stateful NAT64 translation allows IPv6-only clients to contact IPv4 servers using unicast
> UDP, TCP, or ICMP. One or more public IPv4 addresses assigned to a NAT64 translator are shared
> among several IPv6-only clients.  When stateful NAT64 is used in conjunction with DNS64, no
> changes are usually required in the IPv6 client or the IPv4 server.

The rest of this article describes version 2 of the IPng SL border gateways, which opens the path
for IPng to go IPv6-only. I thought it would be super complicated but, in hindsight, I should have
done this years ago!

#### Gateway Design

{{< image width="400px" float="right" src="/assets/nat64/IPng NAT64.svg" alt="IPng Border Gateway" >}}

Let me take a closer look at the orange boxes that I drew in the network diagram above. I call these
machines _Border Gateways_. Their job is to sit between IPng Site Local and the Internet. They'll
each have one network interface connected to the Centec switch, and another connected to
the VPP routers at AS8298. They will provide two main functions: firewalling, so that no unwanted
traffic enters IPng Site Local, and NAT translation, so that:
1.    IPv4 users from **198.19.0.0/16** can reach external IPv4 addresses,
1.    IPv6 users from **2001:678:d78:500::/56** can reach external IPv6,
1.    _IPv6-only_ users can reach external **IPv4** addresses, a neat trick.

#### IPv4 and IPv6 NAT

Let me start off with the basic table stakes. You'll likely be familiar with _masquerading_, a
NAT technique in Linux that uses the public IPv4 address assigned by your provider, allowing
many internal clients, often using [[RFC1918](https://datatracker.ietf.org/doc/html/rfc1918)] addresses,
to access the internet via that shared IPv4 address. You may not have come across IPv6 _masquerading_,
but it's equally possible to take an internal (private, non-routable)
IPv6 network and access the internet via a shared IPv6 address.

I will assign a pool of four public IPv4 addresses and eight IPv6 addresses to each border gateway:

| **Machine** | **IPv4 pool** | **IPv6 pool** |
|-------------|---------------|---------------|
| border0.chbtl0.net.ipng.ch | <span style='color:green;'>194.126.235.0/30</span> | <span style='color:blue;'>2001:678:d78::3:0:0/125</span> |
| border0.chrma0.net.ipng.ch | <span style='color:green;'>194.126.235.4/30</span> | <span style='color:blue;'>2001:678:d78::3:1:0/125</span> |
| border0.chplo0.net.ipng.ch | <span style='color:green;'>194.126.235.8/30</span> | <span style='color:blue;'>2001:678:d78::3:2:0/125</span> |
| border0.nlams0.net.ipng.ch | <span style='color:green;'>194.126.235.12/30</span> | <span style='color:blue;'>2001:678:d78::3:3:0/125</span> |

Linux iptables _masquerading_ will only work with the IP addresses assigned to the external
interface, so I will need to use a slightly different approach to be able to use these _pools_. In
case you're wondering -- IPng's internal network has grown to the point that I cannot expose it
all behind a single IPv4 address; there would not be enough TCP/UDP ports. Luckily, NATing via a pool
is pretty easy using the _SNAT_ module:

```
pim@border0-chrma0:~$ cat << EOF | sudo tee /etc/rc.firewall.ipng-sl
# IPng Site Local: Enable stateful firewalling on IPv4/IPv6 forwarding
iptables  -P FORWARD DROP
ip6tables -P FORWARD DROP
iptables  -I FORWARD -i enp1s0f1 -m state --state NEW -s 198.19.0.0/16 -j ACCEPT
ip6tables -I FORWARD -i enp1s0f1 -m state --state NEW -s 2001:678:d78:500::/56 -j ACCEPT
iptables  -I FORWARD -m state --state RELATED,ESTABLISHED -j ACCEPT
ip6tables -I FORWARD -m state --state RELATED,ESTABLISHED -j ACCEPT

# IPng Site Local: Enable NAT on external interface using NAT pools
iptables  -t nat -I POSTROUTING -s 198.19.0.0/16 -o enp1s0f0 \
                                -j SNAT --to 194.126.235.4-194.126.235.7
ip6tables -t nat -I POSTROUTING -s 2001:678:d78:500::/56 -o enp1s0f0 \
                                -j SNAT --to 2001:678:d78::3:1:0-2001:678:d78::3:1:7
EOF
```

From the top -- I'll first make it the default for the kernel to refuse to _FORWARD_ any traffic that
is not explicitly accepted. I will allow traffic in via `enp1s0f1` (the internal
interface) only if it comes from the assigned IPv4 and IPv6 site local prefixes. On the way back,
I'll allow traffic that matches states created on the way out. This is the _firewalling_ portion of
the setup.

Then, two _POSTROUTING_ rules turn on network address translation. If the source address is any of
the site local prefixes, I'll rewrite it to come from the IPv4 or IPv6 pool addresses, respectively.
This is the _NAT44_ and _NAT66_ portion of the setup.
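
To put a number on the port-exhaustion argument above, here's a quick back-of-the-envelope sketch
using Python's `ipaddress` module, taking the chrma0 pool from the table and the implicit
1024-65535 ephemeral port range:

```python
from ipaddress import ip_network

# Assumed pool for border0.chrma0 (from the table above); iterating the /30
# yields all four addresses, matching the SNAT range .4 through .7.
pool = list(ip_network("194.126.235.4/30"))
ports_per_addr = 65535 - 1024 + 1          # 64512 usable ports per address
concurrent_flows = len(pool) * ports_per_addr

print(len(pool), ports_per_addr, concurrent_flows)
```

So a single address caps out at 64512 concurrent flows per protocol, while the four-address pool
stretches that to just over a quarter million.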

#### NAT64: Jool

{{< image width="400px" float="right" src="/assets/nat64/jool.png" alt="Jool" >}}

So far, so good. But this article is about NAT64 :-) Here's where I grossly overestimated how
difficult it might be -- and if there's one takeaway from my story here, it should be that NAT64 is
as straightforward as the others! Enter [[Jool](https://jool.mx)], an Open Source SIIT and NAT64
implementation for Linux. It's available in Debian as a DKMS kernel module and userspace tool, and
it integrates cleanly with both _iptables_ and _netfilter_.

Jool is a network address and port translating (_NAPT_) implementation, just like regular IPv4 NAT.
When internal IPv6 clients try to reach an external endpoint, Jool will make note of the internal
src6:port, then select an external IPv4 address:port, rewrite the packet, and on the way back,
correlate the src4:port with the internal src6:port, and rewrite the packet. If this sounds an awful
lot like NAT, then you're not wrong! The only difference is, Jool will also translate the *address
family*: it will rewrite the internal IPv6 addresses to external IPv4 addresses.

Installing Jool is as simple as this:

```
pim@border0-chrma0:~$ sudo apt install jool-dkms jool-tools
pim@border0-chrma0:~$ sudo mkdir /etc/jool
pim@border0-chrma0:~$ cat << EOF | sudo tee /etc/jool/jool.conf
{
  "comment": {
    "description": "Full NAT64 configuration for border0.chrma0.net.ipng.ch",
    "last update": "2024-05-21"
  },
  "instance": "default",
  "framework": "netfilter",
  "global": { "pool6": "2001:678:d78:564::/96", "lowest-ipv6-mtu": 1280, "logging-debug": false },
  "pool4": [
    { "protocol": "TCP", "prefix": "194.126.235.4/30", "port range": "1024-65535" },
    { "protocol": "UDP", "prefix": "194.126.235.4/30", "port range": "1024-65535" },
    { "protocol": "ICMP", "prefix": "194.126.235.4/30" }
  ]
}
EOF
pim@border0-chrma0:~$ sudo systemctl start jool
```

.. and that, as they say, is all there is to it! There are two things I make note of here:
1.   I have assigned **2001:678:d78:564::/96** as NAT64 `pool6`, which means that if this machine
     sees any traffic _destined_ to that prefix, it'll activate Jool, select an available IPv4
     address:port from the `pool4`, and send the packet to the IPv4 destination address, which it
     takes from the last 32 bits of the original IPv6 destination address.
1.   Cool trick: I am **reusing** the same IPv4 pool as for regular NAT. The Jool kernel module
     happily coexists with the _iptables_ implementation!
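
The address embedding in point 1 is easy to sketch: the IPv4 destination simply occupies the low
32 bits of the `pool6` prefix. A minimal illustration (the `nat64_addr` helper is my own, not part
of Jool):

```python
from ipaddress import IPv4Address, IPv6Address, IPv6Network

# Jool's pool6 from the config above.
POOL6 = IPv6Network("2001:678:d78:564::/96")

def nat64_addr(v4: str) -> IPv6Address:
    """Embed an IPv4 address into the last 32 bits of the NAT64 prefix."""
    return IPv6Address(int(POOL6.network_address) | int(IPv4Address(v4)))

github = nat64_addr("140.82.121.3")
print(github)
```

Running this for github.com's address 140.82.121.3 yields the 2001:678:d78:564::8c52:7903 address
seen throughout this article: 140.82 is `8c52` in hex, and 121.3 is `7903`.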

#### DNS64: Unbound

There's one vital piece of information missing, and it took me a little while to appreciate it. If
I take an IPv6-only host, like Summer, and try to connect to an IPv4-only host, how does that even
work?

```
pim@summer:~$ ip -br a
lo               UNKNOWN        127.0.0.1/8 ::1/128
eno1             UP             2001:678:d78:50b::f/64 fe80::7e4d:8fff:fe03:3c00/64
pim@summer:~$ ip -6 ro
2001:678:d78:50b::/64 dev eno1 proto kernel metric 256 pref medium
fe80::/64 dev eno1 proto kernel metric 256 pref medium
default via 2001:678:d78:50b::1 dev eno1 proto static metric 1024 pref medium

pim@summer:~$ host github.com
github.com has address 140.82.121.4
pim@summer:~$ ping github.com
ping: connect: Network is unreachable
```

Now comes the really clever reveal -- NAT64 works by assigning an IPv6 prefix that snugly fits the
entire IPv4 address space, typically **64:ff9b::/96**, but operators can choose any prefix they'd like.
For IPng's site local network, I decided to assign **2001:678:d78:564::/96** for this purpose
(this is the `global.pool6` attribute in Jool's config file I described above). A resolver can then
tweak DNS lookups for IPv6-only hosts to return addresses from that IPv6 range. This tweaking is
called DNS64, described in [[RFC6147](https://datatracker.ietf.org/doc/html/rfc6147)]:

>   DNS64 is a mechanism for synthesizing AAAA records from A records.  DNS64 is used with an
>   IPv6/IPv4 translator to enable client-server communication between an IPv6-only client and an
>   IPv4-only server, without requiring any changes to either the IPv6 or the IPv4 node, for the
>   class of applications that work through NATs.

I run the popular [[Unbound](https://www.nlnetlabs.nl/projects/unbound/about/)] resolver at IPng,
deployed as a set of anycasted instances across the network. With only two lines of configuration, I
can turn on this feature:

```
pim@border0-chrma0:~$ cat << EOF | sudo tee /etc/unbound/unbound.conf.d/dns64.conf
server:
  module-config: "dns64 iterator"
  dns64-prefix: 2001:678:d78:564::/96
EOF
pim@border0-chrma0:~$ sudo systemctl restart unbound
```

The behavior of the resolver now changes in a very subtle but cool way:

```
pim@summer:~$ host github.com
github.com has address 140.82.121.3
github.com has IPv6 address 2001:678:d78:564::8c52:7903
pim@summer:~$ host 2001:678:d78:564::8c52:7903
3.0.9.7.2.5.c.8.0.0.0.0.0.0.0.0.4.6.5.0.8.7.d.0.8.7.6.0.1.0.0.2.ip6.arpa
  domain name pointer lb-140-82-121-3-fra.github.com.
```

Before, [[github.com](https://github.com/pimvanpelt/)] did not return an AAAA record, so there was
no way for Summer to connect to it. But now, not only does it return an AAAA record, Unbound also
rewrites the PTR request: knowing that I'm asking for something in the DNS64 range of
**2001:678:d78:564::/96**, it will strip off the last 32 bits (`8c52:7903`, which is the
hex encoding of the original IPv4 address), and return the answer for a PTR lookup of the original
`3.121.82.140.in-addr.arpa` instead. Game changer!
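
That PTR rewriting can be sketched in reverse of the earlier embedding: strip the low 32 bits back
out of the DNS64 address and form the classic `in-addr.arpa` name. The `dns64_ptr_target` helper
below is a hypothetical illustration of the transformation, not Unbound's actual code:

```python
from ipaddress import IPv4Address, IPv6Address

def dns64_ptr_target(v6: str) -> str:
    """Recover the embedded IPv4 address and build the in-addr.arpa name."""
    v4 = IPv4Address(int(IPv6Address(v6)) & 0xFFFF_FFFF)
    return v4.reverse_pointer

name = dns64_ptr_target("2001:678:d78:564::8c52:7903")
print(name)
```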

{{< image width="400px" float="right" src="/assets/nat64/IPng NAT64.svg" alt="IPng Border Gateway" >}}

#### DNS64 + NAT64

What I learned from this is that the _combination_ of these two tools provides the magic:

1.   When an IPv6-only client asks for AAAA for an IPv4-only hostname, Unbound will synthesize an AAAA
     from the IPv4 address, casting it into the last 32 bits of its NAT64 prefix **2001:678:d78:564::/96**.
1.   When an IPv6-only client tries to send traffic to **2001:678:d78:564::/96**, Jool will do the
     address family (and address/port) translation. This is represented by the red (ipv6) flow in the
     diagram to the right turning into a green (ipv4) flow to the left.

What's left for me to do is to ensure that (a) the NAT64 prefix is routed from IPng Site Local to
the gateways and (b) the IPv4 and IPv6 NAT address pools are routed from the Internet to the
gateways.

#### Internal: OSPF

I use Bird2 for dynamic routing, and considering the Centec switch network is by design
_BGP Free_, I will use OSPF and OSPFv3 for these announcements. Using OSPF has an important
benefit: I can selectively turn the Bird announcements to the Centec IPng Site Local network
on and off. Seeing as there will be multiple redundant gateways, if one of them goes down (either due
to failure or because of maintenance), the network will quickly reconverge on another replica. Neat!

Here's how I configure the OSPF import and export filters:

```
filter ospf_import {
  if (net.type = NET_IP4 && net ~ [ 198.19.0.0/16 ]) then accept;
  if (net.type = NET_IP6 && net ~ [ 2001:678:d78:500::/56 ]) then accept;
  reject;
}

filter ospf_export {
  if (net.type=NET_IP4 && !(net~[198.19.0.255/32,0.0.0.0/0])) then reject;
  if (net.type=NET_IP6 && !(net~[2001:678:d78:564::/96,2001:678:d78:500::1:0/128,::/0])) then reject;

  ospf_metric1 = 200; unset(ospf_metric2);
  accept;
}
```

When learning prefixes _from_ the Centec switch, I will only accept precisely the IPng Site Local
IPv4 (198.19.0.0/16) and IPv6 (2001:678:d78:500::/56) supernets. When sending prefixes _to_ the Centec
switches, I will announce:
*   ***198.19.0.255/32*** and ***2001:678:d78:500::1:0/128***: These are the anycast addresses of the Unbound resolver.
*   ***0.0.0.0/0*** and ***::/0***: These are default routes for IPv4 and IPv6, respectively.
*   ***2001:678:d78:564::/96***: This is the NAT64 prefix, which will attract the IPv6-only traffic
    towards DNS64-rewritten destinations, for example 2001:678:d78:564::8c52:7903 as the DNS64
    representation of github.com, which is reachable only at legacy address 140.82.121.3.

{{< image width="100px" float="left" src="/assets/nat64/brain.png" alt="Brain" >}}

I have to be careful with the announcements into OSPF. The cost of E1 routes is the external
metric **in addition to** the internal cost within OSPF to reach that network. The cost
of E2 routes will always be the external metric alone, taking no notice of the internal
cost to reach that router. Therefore, I emit these prefixes without Bird's `ospf_metric2` set, so
that they become E1 routes and the closest border gateway is always used.
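
A tiny worked example of why E1 matters here, with hypothetical internal OSPF costs of 10 and 30
from some client to two of the gateways, and the external metric of 200 from the filter above:

```python
# Hypothetical internal OSPF costs from one client to each border gateway.
internal_cost = {"border0.chbtl0": 10, "border0.chrma0": 30}
external_metric = 200  # ospf_metric1 in the Bird filter above

# Type E1: internal cost is added, so the nearest gateway wins.
e1 = {gw: cost + external_metric for gw, cost in internal_cost.items()}
nearest = min(e1, key=e1.get)

# Type E2: the metric is the external value alone, identical for every gateway.
e2 = {gw: external_metric for gw in internal_cost}

print(e1, nearest, e2)
```

With E1, the client ends up with costs 210 vs 230 and picks the nearby gateway; with E2, every
gateway would advertise an identical 200.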

With that, I can see the following:
```
pim@summer:~$ traceroute6 github.com
traceroute to github.com (2001:678:d78:564::8c52:7903), 30 hops max, 80 byte packets
 1  msw0.chbtl0.net.ipng.ch (2001:678:d78:50b::1)  4.134 ms  4.640 ms  4.796 ms
 2  border0.chbtl0.net.ipng.ch (2001:678:d78:503::13)  0.751 ms  0.818 ms  0.688 ms
 3  * * *
 4  * * * ^C
```

I'm not quite there yet; I have one more step to go. What's happening at the Border Gateway? Let me
take a look while I ping6 github.com:

```
pim@summer:~$ ping6 github.com
PING github.com(lb-140-82-121-4-fra.github.com (2001:678:d78:564::8c52:7904)) 56 data bytes
... (nothing)

pim@border0-chbtl0:~$ sudo tcpdump -ni any src host 2001:678:d78:50b::f or dst host 140.82.121.4
11:25:19.225509 enp1s0f1 In  IP6 2001:678:d78:50b::f > 2001:678:d78:564::8c52:7904:
                             ICMP6, echo request, id 3904, seq 7, length 64
11:25:19.225603 enp1s0f0 Out IP 194.126.235.3 > 140.82.121.4:
                             ICMP echo request, id 61668, seq 7, length 64
```

Unbound and Jool are doing great work. Unbound saw my DNS request for IPv4-only github.com, and
synthesized a DNS64 response for me. Jool then saw the inbound packet on enp1s0f1, the internal
interface pointed at IPng Site Local; this is because the **2001:678:d78:564::/96** prefix is
announced in OSPFv3, so every host knows to route traffic for that prefix to this border gateway.
Then I see the NAT64 in action on the outbound interface enp1s0f0, where one of the IPv4 pool
addresses is selected as the source address. But there is no return packet, because there is no
route back from the Internet, yet.

#### External: BGP

The final step for me is to allow return traffic from the Internet to the IPv4 and IPv6 pools to
reach this Border Gateway instance. For this, I configure BGP with the following Bird2
configuration snippet:

```
filter bgp_import {
  if (net.type = NET_IP4 && !(net = 0.0.0.0/0)) then reject;
  if (net.type = NET_IP6 && !(net = ::/0)) then reject;
  accept;
}
filter bgp_export {
  if (net.type = NET_IP4 && !(net ~ [ 194.126.235.4/30 ])) then reject;
  if (net.type = NET_IP6 && !(net ~ [ 2001:678:d78::3:1:0/125 ])) then reject;

  # Add BGP well-known community no-export (FFFF:FF01)
  bgp_community.add((65535,65281));
  accept;
}
```

I then establish an eBGP session from private AS64513 to two of IPng Networks' core routers at
AS8298. I add the well-known BGP no-export community (`FFFF:FF01`) so that these prefixes are learned
in AS8298, but never propagated. It's not strictly necessary, because AS8298 won't announce more
specifics like these anyway, but it's a nice way to really assert that these are meant to stay
local. Because AS8298 is already announcing the **194.126.235.0/24** and **2001:678:d78::/48**
supernets, return traffic will already be able to reach IPng's routers upstream. With these more
specific announcements of the /30 and /125 pools, the upstream VPP routers will be able to route the
return traffic to this specific server.
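
This works because the pool prefixes are more-specifics of the supernets AS8298 already announces,
which is easy to double-check with Python's `ipaddress` module (a quick sanity check, using the
chrma0 pools as an example):

```python
from ipaddress import ip_network

# The /30 and /125 pools for border0.chrma0 vs. AS8298's announced supernets.
v4_ok = ip_network("194.126.235.4/30").subnet_of(ip_network("194.126.235.0/24"))
v6_ok = ip_network("2001:678:d78::3:1:0/125").subnet_of(ip_network("2001:678:d78::/48"))

print(v4_ok, v6_ok)
```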

And with that, the ping to Unbound's DNS64-provided IPv6 address for github.com shoots to life.

### Results

I deployed four of these Border Gateways using Ansible: one at my office in Brüttisellen, one
in Zurich, one in Geneva and one in Amsterdam. They perform all three types of NAT, and serve DNS64:

*   Announcing the IPv4 default **0.0.0.0/0** allows them to serve as NAT44 gateways for
    **198.19.0.0/16**
*   Announcing the IPv6 default **::/0** allows them to serve as NAT66 gateways for
    **2001:678:d78:500::/56**
*   Announcing the IPv6 NAT64 prefix **2001:678:d78:564::/96** allows them to serve as NAT64 gateways
*   Announcing the IPv4 and IPv6 anycast addresses for `nscache.net.ipng.ch` allows them to serve DNS64

Each individual service can be turned on or off. For example, withdrawing the IPv4 default from
the Centec network will no longer attract NAT44 traffic through that replica. Similarly, withdrawing
the NAT64 prefix will no longer attract NAT64 traffic through that replica. OSPF in the
IPng Site Local network will automatically select an alternative replica in such cases. Shutting
down Bird2 altogether immediately drains the machine of all traffic, which is rerouted via
another replica.

If you're curious, here's a few minutes of me playing with failover, while watching YouTube videos
concurrently.

{{< image src="/assets/nat64/nat64.gif" alt="Asciinema" >}}

### What's Next

I've added an Ansible module in which I can configure the individual instances' IPv4 and IPv6 NAT
pools, and turn the three NAT types on and off by steering the OSPF announcements. I can also
turn the anycast Unbound announcements on and off in much the same way.

If you're a regular reader of my stories, you may be asking: why didn't you use VPP? And that
would be an excellent question. I need to noodle a little bit more with respect to having all three
NAT types concurrently working alongside Linux CP for the Bird and Unbound stuff, but I think in the
future you might see a followup article on how to do all of this in VPP. Stay tuned!