#+TITLE: Dynamic host configuration, please
#+DATE: 2023-03-07
* Prologue
The minimal viable product for an OpenBSD laptop has the following
features:
1. It has a real time clock (RTC).
2. It runs Emacs.
3. It can suspend *and* resume.
4. It has working Wi-Fi.
With those things available we can start to improve the user
experience.

A smart phone is basically always online in urban areas and even in
rural areas[fn:: My phone automatically connected to the Wi-Fi at Elk
Lakes Cabin. Never mind that we had to drag the satellite dish over
the pass.]. Nearly seven years ago at a hackathon in Cambridge, UK, we
set out to have a similar experience for our laptops. We will look at
how OpenBSD configures Wi-Fi networks, deals with network
auto-configuration for IPv4 and IPv6, and DNS resolution. We will show
how it does this in a reasonably secure way with minimal manual
configuration.
* Join the Wi-Fi.
The reader might recognize this conversation when arriving at a new
location and taking out their phone:
#+begin_quote
Me: Hey, what's the Wi-Fi password?

Them: We are in the middle of nowhere, there is no Wi-Fi.

Me: All lower-case, one word?
#+end_quote
On the phone, we need to select the Wi-Fi and enter the password only
once.  The phone then remembers it indefinitely and auto-connects to
it whenever the Wi-Fi is in range.

On OpenBSD, network interfaces are configured by [[https://man.openbsd.org/ifconfig.8][ifconfig(8)]], or
persistently in [[https://man.openbsd.org/hostname.if.5][/etc/hostname.IF]][fn::IF denotes a specific network
interface. For example for iwm0 the file is =/etc/hostname.iwm0=],
which is read by [[https://man.openbsd.org/netstart.8][netstart(8)]] during boot. netstart(8) calls ifconfig(8)
internally to handle the network configuration.

For a long time, we could only configure one Wi-Fi network:
#+begin_src shell
  $ cat /etc/hostname.iwm0
  nwid home wpakey "trivial password"
  inet autoconf
  inet6 autoconf
  up
#+end_src

This configures a Wi-Fi network named "home" and a password "trivial
password". IPv4 and IPv6 auto-configuration are enabled. Whenever the
network is in range the kernel automatically connects to it.

That is not a good user experience (UX). We typically take our laptops
with us and connect to different Wi-Fi networks, like our phones. We
have a Wi-Fi at home, at work, there are open Wi-Fis at hotels, and so
on.

People came up with all kinds of weird shell scripts that would run in
the background or be triggered by [[https://man.openbsd.org/cron.8][cron(8)]] to notice when the laptop moved
to a different Wi-Fi. The script would then call ifconfig(8) to
reconfigure Wi-Fi from a list of networks it knew about. This was all
incredibly fragile and not the OpenBSD way.

Peter Hessler (phessler@), with the help of Stefan Sperling (stsp@),
went ahead and tackled this problem: What if we could pass multiple
=(name, password)= tuples to the kernel and the kernel would choose the
right one?

#+begin_src shell
  $ cat /etc/hostname.iwm0
  join home wpakey "trivial password"
  join work wpakey zUDciIezevfySqam
  join "Airport Wi-Fi"
  join ""
  inet autoconf
  inet6 autoconf
  up
#+end_src
=join= implements exactly this. The argument to =join= is the name of
the network and the following =wpakey= is the password for that
network. If we leave out the =wpakey=, the Wi-Fi is open and does not
require a password. Using =join= with the empty string (~join ""~)
means the kernel will try to connect to any open Wi-Fi if no Wi-Fi
from the join list is found first.

We still need to configure the name and password by editing a file
in =/etc/= and run netstart(8) when we encounter a new Wi-Fi. This is
probably not the best UI[fn::As far as I am concerned ed(1) is the
pinnacle of UI design, but YMMV.] but the UX is pretty good and on par
with a smart phone. Once the Wi-Fi is configured by adding a =join=
line, the kernel will automatically re-connect to a known Wi-Fi
whenever it comes into range.
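
As a minimal sketch of that workflow, assuming a hypothetical helper
named =add_wifi= (it is not part of OpenBSD), adding a network boils
down to appending one =join= line. The file name is a parameter so the
helper can be tried on a scratch file first. Names or passwords
containing a double quote are not handled.
#+begin_src shell
  # add_wifi FILE NAME [KEY]  (hypothetical helper, not an OpenBSD tool)
  # Append a join line for network NAME to FILE. Without KEY the
  # network is treated as open.
  add_wifi() {
    if [ "$#" -ge 3 ]; then
      printf 'join "%s" wpakey "%s"\n' "$2" "$3" >> "$1"
    else
      printf 'join "%s"\n' "$2" >> "$1"
    fi
  }

  # On a real system this would be run as root, followed by netstart(8):
  #   add_wifi /etc/hostname.iwm0 "Cafe Wi-Fi" "correct horse"
  #   sh /etc/netstart iwm0
#+end_src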
* Stop slacking.
Now that we are connected to the Wi-Fi, we need to configure IP
addresses.

We started our efforts to improve the network configuration user
experience with IPv6 for two reasons. First, even in this day and age
IPv6 is a technology for early adopters[fn::Which is quite sad.], and
they are used to pain. When we break IPv4, people tend to complain. With
IPv6 they are eager to help debug the problem.

The other reason was that OpenBSD got IPv6 support from the KAME project
in the late 1990s and early 2000s and then there was not a lot of work
done afterwards. The network configuration was handled mostly in the
kernel, so there was no isolation from malicious input. For the most
part it assumed a stationary workstation that tried to acquire an
IPv6 prefix for stateless address auto-configuration during boot by
sending three router solicitations. It then listened for router
advertisements to create auto-configuration addresses and renewed
their lifetimes when a new advertisement flew by. There was some
rudimentary code in rtsold(8) to handle movement between networks, but
nobody was using it because it was optional. rtsold(8) was used in
one-shot mode where it would send at most three router solicitations
when an interface connected to the network and then it would exit.

We started to write [[https://man.openbsd.org/slaacd.8][slaacd(8)]][fn:name_things:I should not be allowed
to name things.] and once that was working we could delete rtsold(8)
and remove a lot of code from the kernel.

slaacd(8) is a privilege separated network daemon that builds on previous
experience with privilege separation in OpenBSD. It uses three
processes, the /parent/ process to configure the system, the
/frontend/ process to talk to the outside world and the /engine/
process to handle untrusted data and run a state machine for the
stateless address auto-configuration protocol.

pledge(2) restricts what a process is allowed to do and this is
enforced by the kernel. Enforcement means that the kernel will
terminate processes that violate what they pledged they would do. The
pledges themselves are in broad strokes: we do not concern ourselves
with single system calls but with groups of system calls. For example,
the process is allowed to interact with open file descriptors
(="stdio"=), it is allowed to open connections to hosts on the
Internet (="inet"=), or it is allowed to open files for reading
(="rpath"=).

The /parent/ process pledges that it will only open new network
sockets, send those to other processes and reconfigure the routing
table (="stdio inet sendfd wroute"=). The /frontend/ process pledges
to only receive file descriptors, open unix domain sockets and check
the state of the routing table (="stdio unix recvfd route"=). Checking
the routing table includes seeing which flags are configured per
interface. The /engine/ process pledges to only read and write to
already open file-descriptors (="stdio"=). The /engine/ process is
very restricted in what it is allowed to do. This is important because it
handles untrusted data coming from the network. While the /frontend/
process talks to the network, it never looks at the data. An attacker
will not be able to confuse the /frontend/ process with data they
send. They can and did [[https://ftp.openbsd.org/pub/OpenBSD/patches/7.0/common/014_slaacd.patch.sig][confuse]] the /engine/ process.

For more details see [[file:privsep.org]["Privilege drop, privilege separation, and
restricted-service operating mode in OpenBSD"]].

slaacd(8) is enabled per default on all OpenBSD installations.

IPv6 stateless address auto-configuration is enabled on an interface
by setting the =AUTOCONF6= flag using [[https://man.openbsd.org/ifconfig.8][ifconfig(8)]]: =ifconfig iwm0 inet6
autoconf=. The kernel announces this changed interface flag to the
whole system using a broadcast route message. slaacd(8) reads those
messages using a [[https://man.openbsd.org/route.4][route(4)]] socket.

slaacd(8) handles all aspects of stateless address
auto-configuration. It sends router solicitations when needed, either
multi-cast or uni-cast, depending on which is appropriate. It waits
for router advertisements, parses them, and configures default routes,
global and temporary IPv6 addresses, and passes name server
information via a route message to the rest of the system. It takes
care of the lifetimes of addresses, default routes, and name server
information expiring and removing those from the system when no router
advertisements are received to extend the lifetime.

slaacd(8) also monitors when network interfaces regain their
connection to a network, for example because the laptop woke up from
suspend or it got moved out of range of a Wi-Fi network and moved back
into range. It then needs to find out if it connected to the same
network as before or if it is now in a new network. If it is a new
network we need to replace the old addresses, default route, and name
servers. If there is no IPv6 available it needs to remove the old
information.

The stateless address auto-configuration specification allows multiple
default routers being present on the same layer two network,
announcing the same or different network information. slaacd(8) tries
to handle this, but this has not been extensively tested in all
possible cases. There are still open questions being discussed at the
IETF on how to run networks with different network prefixes in the
same layer two network. Hic sunt dracones...

slaacd(8) does handle multiple interfaces just fine and we will show
later how we pick the right source address when multiple are available
to choose from.

* Dynamic host configuration, please.
With IPv6 address configuration mostly solved, it was time to look at
IPv4 again. We used a fork of ISC's dhclient(8). Henning Brauer
(henning@) added privilege separation to it and in recent years
Kenneth Westerback (krw@) heroically maintained it. It was showing its
age though. The privilege separation was never quite right. This
became more visible with the integration of pledge(2) and it would be
difficult to integrate some of the features we developed in slaacd(8).

It was time to write a new daemon. Otto Moerbeek (otto@) solved the
most pressing problem by suggesting a name for it: dhcpleased(8). We
try to be polite towards the computer. It is pronounced "dynamic host
configuration, please". The "d" is silent.

On a very high level IPv4 DHCP and IPv6 stateless address
auto-configuration are very similar. We request some information from
the router[fn::In IPv6 we might not need to request the information,
it might just show up unannounced.], we use it to configure the system
and we make sure that information does not expire. When we move
networks we need to probe if our information is still up to date and
if not, reconfigure the system.

The obvious solution is to copy =sbin/slaacd= to =sbin/dhcpleased= and
replace the IPv6 specific bits with IPv4 specific bits. And that is
exactly what we did.

On paper DHCP looks more complicated than IPv6 stateless address
auto-configuration because it negotiates with the server and there is
a complicated state machine to implement.

In practice it is the other way around. The "stateless" part in IPv6
does not apply to the client. The client must keep state and implement
a state machine to keep track of which routers are available and when
various information expires. In IPv4 we talk to one server and all
information expires at the same time.

We will talk about a few differences between slaacd(8) and
dhcpleased(8) in a moment, but from the user perspective both behave
the same. They make sure that the address configuration and default
gateway are always up to date and they pay attention when the machine
moves between networks, either while awake or while sleeping.

Because dhcpleased(8) has to use [[https://man.openbsd.org/bpf.4][bpf(4)]] instead of regular sockets for
some of the network packets it needs to send, the /parent/ process
cannot use pledge(2). There is nothing it could pledge that would
allow the usage of bpf(4) at the moment. To protect the system and
prevent exfiltration of sensitive data we use [[https://man.openbsd.org/unveil.2][unveil(2)]] to restrict
the /parent/ process' view of the file system. dhcpleased(8) can only
read its configuration file, read and write =/dev/bpf=, and read,
write and create files underneath =/var/db/dhcpleased/= to store
information about received leases.

While we could get away with not implementing a config file for
slaacd(8), we were not this lucky with dhcpleased(8). Some systems out
there will only give us a DHCP lease if we send the correct /client
id/, for example.

There are a lot of DHCP options specified in RFC 2132. We only
implement the bare minimum, only the options we need and can
handle. We do not need a swap server or a cookie server to get the
quote of the day.

Like slaacd(8), dhcpleased(8) is enabled on all OpenBSD
installations.
* Route priorities.
dhcpleased(8) and slaacd(8) can handle multiple interfaces at the same
time. The routing table might look like this:
#+begin_src shell
  $ netstat -nrf inet
  Routing tables

  Internet:
  Destination        Gateway            Flags   Refs      Use   Mtu  Prio Iface
  default            192.168.1.1        UGS        4      110     -     8 em0
  default            192.168.178.1      UGS        0        0     -    12 iwm0
  [...]
#+end_src
We end up with two default routes, one gateway is reachable via the
/em0/ interface with priority value 8 and the other gateway is
reachable via the /iwm0/ interface with priority value 12. A route has
higher priority when its priority value is lower. /em0/ is an Ethernet
interface and it gets higher priority over the Wi-Fi interface
/iwm0/. All things being equal, the kernel will pick the address from
/em0/ as source address when making a new connection to the internet
and route traffic over the Ethernet interface, which is presumably
faster.

If we pick up the laptop and unplug the Ethernet interface, all things
are no longer equal, the route over /em0/ is no longer usable and
existing connections using it will stall and time out. New connections
will instead use /iwm0/.
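
The selection rule described above can be sketched with awk(1) over
=netstat=-style lines. This only mimics the rule "lowest priority value
wins" on the sample routes from the table above. It is not how the
kernel implements route lookup, and =best_default= is a hypothetical
name.
#+begin_src shell
  # Default routes in the column layout shown above:
  # Destination Gateway Flags Refs Use Mtu Prio Iface
  both=$(printf '%s\n' \
    'default 192.168.1.1   UGS 4 110 - 8  em0' \
    'default 192.168.178.1 UGS 0 0   - 12 iwm0')
  # The same table after the Ethernet cable was unplugged.
  wifi_only='default 192.168.178.1 UGS 0 0 - 12 iwm0'

  # best_default: read routes on stdin and print the interface of the
  # default route with the lowest priority value (highest priority).
  best_default() {
    awk '$1 == "default" && (best == "" || $7 + 0 < best) {
      best = $7 + 0; ifname = $8
    } END { print ifname }'
  }

  printf '%s\n' "$both" | best_default       # prints em0
  printf '%s\n' "$wifi_only" | best_default  # prints iwm0
#+end_src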

If we plug /em0/ back in again, sessions might come alive again and new
connections will use /em0/. Connections that are running over /iwm0/
will continue working, because the interface is still connected to
the Wi-Fi.

Applications like web browsers, email clients or even video
conferencing systems will automatically establish a new connection
when they notice the old one is dead.

Unfortunately [[https://man.openbsd.org/ssh.1][ssh(1)]] is not one of them. If switching between wired
and wireless happens rarely, [[https://man.openbsd.org/tmux.1][tmux(1)]] on the remote system might help
with ssh(1) disconnects. Or maybe a [[https://man.openbsd.org/wg.4][wg(4)]] tunnel can be used so that
the source address does not change when switching between wired and
wireless.
* Cellular networks.
In addition to Ethernet and Wi-Fi networks, OpenBSD supports "Mobile
Broadband Interface Model" devices using the [[https://man.openbsd.org/umb.4][umb(4)]] driver. These can
be used to connect to UMTS or LTE networks. They require a SIM card
and after being configured using a PIN they will connect to cellular
networks and automatically configure an IP address and default
route. The default route has an even lower route priority than Wi-Fi
so it will only be used when Ethernet and Wi-Fi are not connected.
* It is always DNS.[fn::In my line of work that is certainly true, but that is just sample bias.]
We need to talk about DNS next. Humans are not particularly good at
remembering =2606:2800:220:1:248:1893:25c8:1946=, we are much better
with names like /example.com/. When we run ~ping6 example.com~ we
sooner or later end up in [[https://man.openbsd.org/asr_run.3][libc's stub resolver]]. It will open
=/etc/resolv.conf=, and look for /nameserver/ lines to use for DNS
resolution.

We can learn name servers from dhcpleased(8), slaacd(8), umb(4),
and [[https://man.openbsd.org/iked.8][iked(8)]]. Historically dhclient(8) owned =/etc/resolv.conf=, which
means that no other process could add name servers to it. dhclient(8)
would just overwrite whatever was in there whenever it renewed its
lease. This sometimes made it impossible to move to an IPv6-only
network: slaacd(8) could not configure name servers and the left-over
IPv4 name servers were not reachable.

We can either teach all name server sources to somehow cooperate and
to not scribble over each other and share responsibility of
=/etc/resolv.conf= or we can run an arbitrator that collects name
servers from diverse sources and handles the contents of
=/etc/resolv.conf=.

[[https://man.openbsd.org/resolvd.8][resolvd(8)]] is such an arbitrator. It is another always enabled
daemon. It collects name servers from all the mentioned sources and
adds them to =/etc/resolv.conf=.

It also monitors if =/etc/resolv.conf= gets edited in which case it
re-reads the file and makes sure that the learned name servers are at
the beginning of the file. This is useful when the administrator of
the machine decides to add options to =/etc/resolv.conf=. For example,
we can edit the file and add =family inet6 inet= to prefer IPv6 over
IPv4 and resolvd(8) will cope. There is no need for an extra
configuration file, =/etc/resolv.conf= is the configuration file.
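
The ordering rule can be sketched as a small filter. This is only an
illustration, not resolvd(8)'s code: =merge_resolv= is a hypothetical
name, the =# resolvd:= source comment is an assumption about how
learned lines are tagged, and unlike the real daemon the sketch forgets
name servers learned from other interfaces.
#+begin_src shell
  # merge_resolv IFACE SERVER  (hypothetical; old resolv.conf on stdin)
  # Print the learned name server first, tagged with its source, then
  # every old line that was not itself a learned name server.
  merge_resolv() {
    printf 'nameserver %s # resolvd: %s\n' "$2" "$1"
    grep -v '# resolvd: ' || true
  }

  # An old file with one stale learned server and one manual option.
  printf '%s\n' 'nameserver 10.0.0.1 # resolvd: em0' 'family inet6 inet' |
    merge_resolv iwm0 192.168.178.1
#+end_src
The manually added =family= line survives, and the newly learned name
server ends up at the beginning of the output.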

Name servers are announced using route messages and resolvd(8) listens
for them using a route(4) socket. They can also be observed using the
[[https://man.openbsd.org/route.8][route(8)]] tool: ~$ route monitor~.

resolvd(8) can also request that name servers are re-announced by their
sources. This is useful when resolvd(8) gets restarted.
* Let us unwind[fn:: See [fn:name_things].] a bit.
Good old plain DNS is not a secure protocol. It exchanges
un-authenticated UDP packets without any integrity protection. This
makes it easy for an attacker to spoof answer packets.

DNS answer packets are untrusted data, they come from the
network. However, the process that sends DNS queries and parses the
answer using the libc functions is almost always the single main
process of the tool. When we run ~ping example.com~, DNS packets are
parsed by a process running as our user. An attacker who can spoof a DNS answer might be
able to trigger a bug in libc and gain code execution that way.

On OpenBSD ping(8) pledges ="stdio dns"=, so the attacker will not get
very far, but there are many more programs in ports that are not
pledged that might want to resolve names.

It would be worthwhile to have some sort of proxy running on localhost
so that DNS packets from the outside need to traverse a well locked
down process running in a different address-space and as a different
user than the program that needs to resolve a name.

An early experiment was rebound(8), written by Ted Unangst (tedu@). It
was simplistic and did not understand DNS at all, it would just
forward packets, but it would sit between the Internet and the
program.

An alternative is to run a full recursive resolver like [[https://man.openbsd.org/unbound.8][unbound(8)]] on
the laptop, but this leads to problems, too. unbound(8) expects a well
working network where nobody interferes with DNS. This is true in data
centres and can be achieved in well maintained home networks, but it
is not something we find when moving laptops to arbitrary networks
like free Wi-Fi in a hotel or airport.

We can either give up and move to a different hotel[fn::Which is not
realistic.], or we need to adjust our expectations, figure out what we
have and work with that.

It turns out that often the quality of the network changes over
time. When we first connect to a hotel Wi-Fi we may find ourselves in
what is referred to as a /captive portal/. Everything is blocked, DNS
gets intercepted, and we are redirected to a web site where we need to
agree to the terms and conditions. Maybe provide our name and room
number. Once we are past that, network quality improves considerably
and we are mostly free to talk to the outside world.

This is where [[https://man.openbsd.org/unwind.8][unwind(8)]] comes in. It is another privilege separated
network daemon that provides a recursive name server for the local
machine. resolvd(8) detects when it is running and automatically
rewrites =/etc/resolv.conf= to have only =nameserver 127.0.0.1= listed
as name server.

With that we have the first problem solved, or at least improved on
the situation. Programs that need DNS resolution are insulated from
the Internet. An attacker needs to get past unwind(8) first before
they can try to attack the libc stub resolver.

unwind(8) understands and speaks DNS and it actively observes the
network quality.

We did not write our own recursive name server. That would be
difficult, it would be unlikely we would get it right on the first
try[fn:: Or second or third try for that matter.], and DNS is
constantly evolving, so it is a lot of effort to keep up. Instead we
are standing on the shoulders of giants and use libunbound, which is
part of [[https://man.openbsd.org/unbound.8][unbound(8)]]. It is developed under a BSD license by [[https://www.nlnetlabs.nl/][NLnet Labs]].


The resolver process pledges ="stdio inet dns rpath"= and
restricts access to the file system using unveil(2) to
=/etc/ssl/cert.pem=. This is the process that is exposed to the
Internet and handles untrusted data. It would be preferable to have
one process exposed to the Internet and another to parse untrusted
data but that is not possible to do with libunbound.

Since we are using a real recursive name server, that gives us a lot
of options on how we can resolve names:
+ We can do our own recursion, walk down from the root zone using
  qname minimization to improve privacy.
+ We can use the name server we learned from dhcpleased(8) and
  slaacd(8) as forwarders, so we do not need to do our own recursion,
  which might be faster.
+ We can try to opportunistically speak DNS over TLS (DoT) to the
  learned name servers to prevent eavesdroppers from listening in.
+ We can configure forwarders manually to not depend on the network
  provided name servers. Those might be more trustworthy. They can
  also be DoT forwarders to prevent eavesdropping.
+ As a last resort, unwind(8) can behave exactly like the libc stub
  resolver[fn::I call this the Dutch train problem. The free Wi-Fi on
  Dutch trains does not like DNS queries with an /EDNS0/ option: it
  intercepts them, does not understand them, and answers /NXDOMAIN/. There
  are other free Wi-Fi networks that are similarly broken.].
We call these resolving strategies and unwind(8) actively probes if
they are usable by sending test queries when it notices that the
network changed, for example because we moved to a different Wi-Fi
network or woke up from suspend. It then orders them by quality and
picks the best one.

There is an implicit skew in the strategies for finding the best one:
A manually configured DoT name server is always considered better than
a name server provided by the local network, as long as it is available
and not atrociously slow.

unwind(8) is not too concerned about preserving privacy. It is
pragmatic and tries to resolve names the best way it can. If that
means using the local name servers provided by the network because
they are the only ones available, it will use them.

Since unwind(8) uses libunbound it also supports DNSSEC. DNSSEC
provides data integrity and cryptographic authenticity; it does not
provide confidentiality.

unwind(8) is pragmatic about DNSSEC. When it tests the quality of a
resolving strategy it also tries to find out if DNSSEC is
available. There are many reasons why DNSSEC might not be available: The
network is misconfigured, DNSSEC is flat out blocked, or the laptop
does not (yet) have the correct time. If DNSSEC does not work,
unwind(8) does not insist on using it.

Of course this makes it susceptible to a downgrade attack. To mitigate
this, unwind(8) will insist on DNSSEC working after it discovered once
that DNSSEC is working in the local network. This means that an
attacker needs to be able to block DNSSEC from the moment we connect
to a network. They cannot show up later and try to downgrade
us. unwind(8) will only become lenient again when we connect to a new
network.
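
The mitigation amounts to a two-state machine, which can be sketched as
follows. The state names, event names, and function names are
hypothetical and only mirror the behaviour described above. They are
not taken from unwind(8)'s source.
#+begin_src shell
  # lenient: validation failures are tolerated (fresh network).
  # strict:  DNSSEC worked once on this network, so we insist on it.
  dnssec_state=lenient

  # dnssec_event EVENT: EVENT is new_network or validated.
  dnssec_event() {
    case $1 in
    new_network) dnssec_state=lenient ;;  # start over on a new network
    validated)   dnssec_state=strict ;;   # no downgrade from here on
    esac
  }

  # Succeeds if an answer that failed validation may still be used.
  tolerates_failure() {
    [ "$dnssec_state" = lenient ]
  }

  dnssec_event new_network
  tolerates_failure && echo "lenient: validation errors are ignored"
  dnssec_event validated
  tolerates_failure || echo "strict: validation errors are fatal"
#+end_src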

This is not a strong mitigation of course, but DNSSEC is not a silver
bullet that fixes everything at the resolver. Applications also need
to do their part and decide how much they are willing to trust
DNS. For example ssh(1)'s /VerifyHostKeyDNS/ feature will only trust
host key fingerprints it obtained from DNS if they were validated
using DNSSEC and the validator runs on the local
machine[fn::Technically not entirely true, ssh(1) trusts what libc
indicates and libc automatically trusts localhost. See /trust-ad/ in
[[https://man.openbsd.org/resolv.conf.5][resolv.conf(5)]].]. Otherwise it will ask the user what to do.

A worst case scenario when joining a somewhat broken Wi-Fi network
with captive portal and a manually configured DoT name server might
look like this:
1. We connect to the network, we cannot reach the DoT name server and
   cannot do our own recursion.
2. unwind(8) will choose the name server provided by the
   network. It also notes that we just connected to a new network so
   it is lenient with respect to DNSSEC validation. In effect it will
   ignore validation errors.
3. We try to access a web site and the captive portal
   detection in the browser triggers. We click the buttons and fill in
   the forms until we are allowed on the internet.
4. unwind(8) notices that it can do its own recursion.
5. At the same time, unwind(8) notices that the DoT name server is
   also reachable now and starts using it.

unwind(8) does not natively support DNS over HTTPS (DoH) and we
sometimes find ourselves in networks that block everything except for
TCP port 443. One way around this is to use dnscrypt-proxy from ports
which does support DoH. We can point unwind(8) at it by manually
configuring a plain DNS forwarder in addition to a DoT forwarder:
#+begin_src shell
  $ cat /etc/unwind.conf
  forwarder "9.9.9.9" port 853 authentication name "dns.quad9.net" DoT
  forwarder "2620:fe::9" port 853 authentication name "dns.quad9.net" DoT
  forwarder "127.0.0.1" port 5353 # dnscrypt-proxy for DoH
#+end_src
* Time for gelato.[fn:: Again, see [fn:name_things].]
People from the future might encounter networks without any IPv4.  If
they are not too far in the future they might still need to talk to
IPv4 hosts on the Internet.

There are various transition technologies that get us from an IPv4
only Internet to an IPv6 only Internet. We will only look at /NAT64/,
/DNS64/, and /464XLAT/.
/NAT64/ allows us to reach IPv4 hosts from an IPv6 only network by
pretending that the hosts are IPv6 enabled. IPv6 addresses are so big
that we can easily encode all of IPv4 in an IPv6 /64 prefix, which is
the usual size of an IPv6 prefix we see per layer two network. In fact
we don't need the whole /64, a /96 is enough to encode the whole IPv4
Internet.

Let us pretend we know the /96 prefix used for /NAT64/ and the IPv4
address we want to reach. Forming an IPv6 address for the host is then
simply a bitwise-or operation of the IPv4 address with the /96 prefix,
the IPv4 address fills in the lower bits of the IPv6 prefix. This is
called address synthesis.
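
Address synthesis is small enough to sketch in a few lines of shell.
The function name =synth_nat64= is hypothetical, the prefix is assumed
to be written with a trailing =::= so that the two remaining hextets
can simply be appended, and =192.0.2.33= is a documentation address
picked for illustration.
#+begin_src shell
  # synth_nat64 PREFIX IPV4  (hypothetical helper)
  # PREFIX is the /96 without the "/96" suffix, e.g. "64:ff9b::".
  # The body runs in a subshell so the IFS change stays local.
  synth_nat64() (
    prefix=$1
    IFS=.
    set -- $2   # split the dotted quad into four octets
    # The 32 bits of the IPv4 address become the last two hextets.
    printf '%s%x:%x\n' "$prefix" \
      "$(( ($1 << 8) | $2 ))" "$(( ($3 << 8) | $4 ))"
  )

  synth_nat64 64:ff9b:: 192.0.2.33   # prints 64:ff9b::c000:221
#+end_src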

We can then use this address to connect to the IPv4-only
host. Somewhere on the network path is the /NAT64/ gateway that is
dual stacked. It knows that our packets are using /NAT64/ because it
is configured with the /96 prefix. It intercepts the packets and forms
IPv4 packets and sends them on their way. The gateway needs to be
stateful to be able to /NAT/ the return traffic back to us.

To find out the IPv4 address we want to connect to we of course use
DNS. The local name servers that slaacd(8) learned about would know
about the /NAT64/ prefix used in the network and do the address
synthesis for us. This is called /DNS64/. The problem with this is that
the name servers spoof DNS answers, something that DNSSEC tries very
hard to prevent. unwind(8) will detect this and generate an error, or
unwind(8) might not even talk to the designated name servers at all.

To get around this unwind(8) can itself detect the presence of /DNS64/
on a network by asking the local name servers for the /AAAA/ record,
i.e. the IPv6 address, for something that is guaranteed to never have
one: /ipv4only.arpa/. If it gets an answer, it can reverse the address
synthesis and learn the /NAT64/ prefix. With that information it can
do /DNS64/ itself and there is no longer a problem with DNSSEC.
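
Reversing the synthesis can be sketched like this. /ipv4only.arpa/ only
has the well-known IPv4 addresses 192.0.0.170 and 192.0.0.171, which
show up as the hextets =c000:aa= and =c000:ab= in a synthesized
answer. The sketch assumes a /96 prefix and an answer printed in
compressed lower-case form with =::= directly before the embedded
address. =nat64_prefix= is a hypothetical name, the DNS lookup itself
is not performed, and this is not unwind(8)'s actual code.
#+begin_src shell
  # nat64_prefix AAAA  (hypothetical helper)
  # AAAA is one address from the AAAA answer for ipv4only.arpa.
  # Print the /96 NAT64 prefix, or fail if no well-known IPv4
  # address is embedded in the last 32 bits.
  nat64_prefix() {
    case $1 in
    *::c000:aa|*::c000:ab)
      # Drop the last two hextets (the embedded IPv4 address).
      printf '%s/96\n' "${1%c000:a?}"
      ;;
    *)
      return 1
      ;;
    esac
  }

  nat64_prefix 64:ff9b::c000:aa   # prints 64:ff9b::/96
#+end_src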

The downsides of this mechanism are that it is quite complicated, it
messes around with DNS, and it does not work with IPv4 address
literals. It also does not work with programs that are fundamentally
IPv4 only: =ping example.com= will never work in an IPv6 only network
with only /NAT64/ and /DNS64/.

Instead of pretending the IPv4 host we want to reach has IPv6, we can
pretend to have working IPv4 if a /NAT64/ gateway is present. We ask
the kernel via the [[https://man.openbsd.org/pf.4][pf(4)]] firewall to do the IPv4 to IPv6 translation
for us. The /NAT64/ gateway will then do the reverse translation and
send an IPv4 packet on its way. This is called /464XLAT/.

We first need an IPv4 address, RFC 7335 reserved =192.0.0.0/29= for
this purpose:
#+begin_src shell
  ifconfig pair1 inet 192.0.0.4/29
#+end_src
We then need a default gateway:
#+begin_src shell
  ifconfig pair2 rdomain 1
  ifconfig pair2 inet 192.0.0.1/29
#+end_src
Because pf(4) will only do address family translation on inbound rules
we need a different /rdomain/ and use [[https://man.openbsd.org/pair.4][pair(4)]] interfaces. We need to
connect them:
#+begin_src shell
  ifconfig pair1 patch pair2
#+end_src
And then we can configure our default route:
#+begin_src shell
  route add -host -inet default 192.0.0.1 -priority 48
#+end_src
We set it to a very low priority[fn:: Remember, a high priority
*value* means low priority.] so that it does not interfere with routes
dhcpleased(8) configures when we move to an IPv4 enabled network.

We then need to configure address family translation in pf(4) when we
detect /NAT64/ being present. This is where [[https://codeberg.org/fobser/gelatod][gelatod(8)]] comes in. It is
a Customer-side transLATor (/CLAT/) configuration daemon[fn::If you
squint just right, gelato kinda sounds like clat[fn::Again, I really
really should be prohibited from naming things.].]. /CLAT/ is what
/464XLAT/ calls the address translation happening on the laptop.

gelatod(8) is yet another privilege separated daemon[fn::At this point
you should believe me that that is a good thing and I will not go into
pledge details.] that checks for the presence of a /NAT64/
gateway whenever we change networks. It does so either via the
/ipv4only.arpa/ trick or explicitly via router advertisements. RFC
8781 specifies how a network can signal the presence of a /NAT64/
gateway.

gelatod(8) needs a pf(4) anchor into which it adds rules that are
similar to this example:
#+begin_src
  pass in log quick on pair2 inet af-to inet6 \
    from 2001:db8::da68:f613:4573:4ed0 to 64:ff9b::/96 \
    rtable 0
#+end_src
The rule is doing address family translation to IPv6 on incoming
packets on =pair2=. In this example it uses
=2001:db8::da68:f613:4573:4ed0= as the IPv6 source address, gelatod(8)
learned this from the system when slaacd(8) configured
it. =64:ff9b::/96= is the learned /NAT64/ prefix and we are moving
traffic back to =rtable 0=. Remember =pair2= is in rdomain 1[fn::Do
not ask me about the difference between an rdomain and an rtable, I do
not know either.].

While this is all cute and works rather well, it is also completely
horribly complicated to set up. And that is why gelatod(8) is not in
OpenBSD base but lives in ports. We believe in good defaults in
OpenBSD and try to keep the buttons a user has to push to get
something working to an absolute minimum.
* Future work.
Which brings us to future work.

We want the functionality of gelatod(8) in OpenBSD base. gelatod(8)
was mostly a proof of concept. We imagine that a new network device
like clat(4) takes over the role of client side address family
translation. It could be always present and gelatod(8) just enables
and disables it. At that point we can move the functionality into
slaacd(8) and delete gelatod(8). /CLAT/ is defined as a stateless
mechanism so it does not need the full pf(4) machinery for address
family translation.

It would be nice to have DNS over HTTPS (DoH) and DNS over QUIC (DoQ)
natively in unwind(8). We are mostly waiting on upstream to implement
support in unbound(8).

And then there is some ongoing maintenance, little things that could
be improved:
+ The captive portal detection in unwind(8) is not perfect and it will
  probably never be.
+ dhcpleased(8) and slaacd(8) should remember IP addresses from
  networks they have been connected to before to be able to quickly
  re-establish connectivity by probing if we are connecting to a
  previous network while the lifetime of our addresses did not expire
  yet. RFC 4436 "Detecting Network Attachment in IPv4 (DNAv4)" and RFC
  6059 "Simple Procedures for Detecting Network Attachment in IPv6"
  have the details.
+ It would be nice if the dhcpleased(8) parent process could be
  pledged. This is not currently possible because of bpf(4). Things to
  investigate here are changes to the network stack that would allow
  us to use raw sockets instead of bpf(4) sockets or the ability to
  [[https://man.openbsd.org/dup.2][dup(2)]] an existing bpf(4) socket and re-program the interface it is
  using.
* Epilogue
Writing all this software over the last six to seven years was a lot
of fun. Combined with all the other features OpenBSD has to offer,
like the /join/ feature, working suspend and resume, and accelerated
video on /amd/ and /intel/ graphics cards, it makes it a pleasure to use
OpenBSD on a laptop as a daily driver. Things just work. Mostly. And
if they do not, you have something to fix!