#+Title: VerifyHostKeyDNS
#+SUBTITLE: ... or how I enroll new hosts into my infrastructure.
#+DATE: 2023-01-15
* Prologue
I run my own infrastructure. I self-host my email, DNS, this website,
a [[https://git.tlakh.xyz/explore/repos][git server]], [[https://restic.net/][backups]], and probably a bunch of other stuff that I
forgot about. Ah yes, [[https://icinga.com/][monitoring]], Ubiquiti UniFi for my Wi-Fi access
points at home, and probably even more stuff.

All of it runs [[https://openbsd.org/][OpenBSD]], except for one machine running [[https://debian.org/][Debian]]. It's
all tied together with [[https://www.ansible.com/][ansible]][fn:: I started out with ansible,
switched to salt stack and moved back to ansible. Because reasons.].

So far it's eight machines. I was reinstalling and consolidating some
VMs and physical machines the other day and hooking up new machines
became annoying because of ssh host-keys.
* StrictHostKeyChecking
My ansible orchestration host needs to be able to talk to new machines
over ssh. New machines need to talk to the backup server over ssh and
submit passive check results over ssh to the monitoring server. The
monitoring server needs to talk to new hosts over ssh[fn:: I don't
trust nrpe. I have seen the code. Instead I use ~by_ssh~ to monitor
hosts. Ansible adds an ssh public-key to a monitoring user with a
force-command. The force-command is a shell-script switching over
~${SSH_ORIGINAL_COMMAND}~ to run specific check_commands. It does not
trust the remote ssh at all.].

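A minimal sketch of what such a force-command wrapper can look like.
Only the idea of switching over ~${SSH_ORIGINAL_COMMAND}~ comes from
the footnote above; the function name, check names, thresholds and the
plugin directory are made up for illustration. Anything that is not
explicitly listed is refused. An actual wrapper would be named in a
~command="..."~ option in the monitoring user's =authorized_keys= and
would end with the two commented-out lines.

#+begin_src shell
# Hypothetical plugin directory; adjust to where the check plugins live.
PLUGINS=${PLUGINS:-/usr/local/libexec/nagios}

# Map a requested check name to a fixed command line.
# Prints the command on stdout, or returns 3 (UNKNOWN) for anything
# that is not explicitly listed. The request is never executed as-is.
allowed_check() {
    case "$1" in
    check_load)
        echo "${PLUGINS}/check_load -w 4,3,2 -c 8,6,4" ;;
    check_disk_root)
        echo "${PLUGINS}/check_disk -w 20% -c 10% -p /" ;;
    *)
        echo "UNKNOWN: command not allowed" >&2
        return 3 ;;
    esac
}

# An actual wrapper would finish with:
#   cmd=$(allowed_check "${SSH_ORIGINAL_COMMAND}") || exit 3
#   exec ${cmd}
#+end_src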
So we have the issue of existing infrastructure needing to verify
host-keys of new hosts and new hosts needing to verify host-keys of
existing infrastructure. One way to deal with this is to run a [[https://www.lorier.net/docs/ssh-ca.html][CA,
sign host-keys with it and roll certificates out]].

I, on the other hand, prefer to use DNS[fn:: I have a laptop sticker and
travel mug with "We reject kings, presidents and voting. We believe in
rough consensus and running code." crossed out with "Fuck that! Just
put it in DNS." I also have a RUN DNS sticker. I am biased.]. [[https://www.rfc-editor.org/rfc/rfc4255][RFC4255]] provides
facilities to store host-key fingerprints in SSHFP resource records in
DNS and we can secure those with DNSSEC.

* VerifyHostKeyDNS
[[https://man.openbsd.org/ssh_config.5#VerifyHostKeyDNS][ssh_config(5)]] explains how [[https://man.openbsd.org/ssh.1][ssh(1)]] can use SSHFP records to verify
host-keys:

+ *VerifyHostKeyDNS* :: Specifies whether to verify the remote key using
  DNS and SSHFP resource records. If this option is set to yes, the
  client will implicitly trust keys that match a secure fingerprint
  from DNS. Insecure fingerprints will be handled as if this option
  was set to ask. If this option is set to ask, information on
  fingerprint match will be displayed, but the user will still need to
  confirm new host keys according to the StrictHostKeyChecking option.
  The default is no.

One problem with this is, if you put
#+begin_example
Host *
    VerifyHostKeyDNS yes
#+end_example
into your =.ssh/config=, it will not work on its own. The magic words
are /secure fingerprint/. What the documentation means is that a DNS
answer for SSHFP needs to have the /Authentic Data (AD)/ flag set. The
flag gets set by a validating name-server if it can DNSSEC validate
the SSHFP.

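One way to see whether a name-server sets the AD flag is to look at
the header of a [[https://man.openbsd.org/dig.1][dig(1)]] answer. The sketch below assumes dig is
available; the helper name and =host.example.com= are placeholders.
Note that this only shows what the name-server answers, not what the
stub resolver later hands to ssh.

#+begin_src shell
# Succeeds if dig(1) output read from stdin has the AD flag set
# in its ";; flags:" header line.
has_ad_flag() {
    grep '^;; flags:' | grep -Eq '[ :]ad[ ;]'
}

# Example use (placeholder host name, needs a validating name-server):
#   dig +dnssec host.example.com SSHFP | has_ad_flag && echo validated
#+end_src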
But when the libc stub resolver[fn:: The thingy[fn:: Thingy is a
technical term, don't worry about it.] that ssh uses to talk
to the validating name-server. On OpenBSD that is [[https://man.openbsd.org/man3/asr_run.3][asr]].] gets that
answer, it will strip the AD flag for security reasons. You see, it
does not know that it can trust the validating name-server. One way to
have a trustworthy validating name-server is to run one on localhost.

[[http://man.openbsd.org/resolv.conf#trust-ad][resolv.conf(5)]] explains the *trust-ad* option:

+ *trust-ad* :: A name server indicating that it performed DNSSEC
  validation by setting the Authentic Data (AD) flag in the answer can
  only be trusted if the name server itself is trusted and the network
  path is trusted. Generally this is not the case and the AD flag is
  cleared in the answer. The trust-ad option lets the system
  administrator indicate that the name server and the network path are
  trusted. This option is automatically enabled if resolv.conf only
  lists name servers on localhost.

The easiest way is to run [[https://man.openbsd.org/unwind.8][unwind(8)]]:
#+begin_src shell
doas rcctl enable unwind
doas rcctl start unwind
#+end_src

[[https://man.openbsd.org/resolvd.8][resolvd(8)]] will then add =nameserver 127.0.0.1= to
=/etc/resolv.conf= and comment out all other dynamically learned name
servers. Just make sure that you are not using any statically configured
name servers[fn:: I use ~! route nameserver $if 149.112.112.9
2620:fe::9 9.9.9.9 2620:fe::fe:9~ in my main [[http://man.openbsd.org/hostname.if.5][hostname.if(5)]] to add
some static name servers in case unwind(8) crashes[fn:: Not sure why
it would do that though. Sounds unpleasant.].] because you really want
to have only =nameserver 127.0.0.1= in there.

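Whether =/etc/resolv.conf= really lists only localhost can be checked
with a few lines of awk. This is a sketch with a made-up helper name.
It only looks at plain =nameserver 127.0.0.1= and =nameserver ::1=
lines and treats every other address form as not local, which may be
stricter than what the resolver itself accepts.

#+begin_src shell
# Succeeds only if the given resolv.conf lists at least one name
# server and every listed name server is 127.0.0.1 or ::1.
# Commented-out lines are ignored because their first field is not
# the bare word "nameserver".
only_localhost_resolvers() {
    awk '
        $1 == "nameserver" {
            total++
            if ($2 != "127.0.0.1" && $2 != "::1")
                other++
        }
        END { exit !(total > 0 && other == 0) }
    ' "${1:-/etc/resolv.conf}"
}

# Example use:
#   only_localhost_resolvers && echo "AD flag will be trusted"
#+end_src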
* Putting it all together
When I install a new host I have out of band access in one way or
another. It might be a serial console, a fake html5 console or some
KVM contraption. Heck, I even used [[https://blog.eulinux.org/2018/07/hetzner-install.html][qemu]] to get OpenBSD running on some
Hetzner physical machine.

On the installed machine I use said out of band access to run
#+begin_src shell
ssh-keygen -l -f /etc/ssh/ssh_host_ed25519_key.pub
#+end_src
This gives me one ssh host-key fingerprint to compare against on the
first connect, and I can log in over ssh.

I have to add IPv6 and legacy-IP addresses to DNS for the machine so I
also grab the SSHFP records to add them at the same time:

#+begin_src shell
ls /etc/ssh/*.pub | xargs -n1 ssh-keygen -r "$(hostname)" -f
#+end_src

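=ssh-keygen -r= typically prints more than one record per key, one per
fingerprint type (1 is SHA-1, 2 is SHA-256). If only the SHA-256
records should go into the zone, a small filter can drop the rest.
This is a sketch with a made-up helper name; which fingerprint types
to publish is a policy choice that is not covered here.

#+begin_src shell
# Keep only SSHFP records with fingerprint type 2 (SHA-256).
# Expects "name IN SSHFP <algorithm> <fptype> <hex>" lines on stdin.
sshfp_sha256_only() {
    awk '$2 == "IN" && $3 == "SSHFP" && $5 == 2'
}

# Example use:
#   ls /etc/ssh/*.pub | xargs -n1 ssh-keygen -r "$(hostname)" -f | sshfp_sha256_only
#+end_src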
While still logged in, I install python3 and add an ssh-key for
ansible. I then add the host to the ansible inventory. The ansible
orchestrator can now finish the installation of the host over ssh
while trusting the SSHFP it finds in DNS.

Ansible also hooks up the host to my monitoring system and the
monitoring system can connect to the new host over ssh, again trusting
that it talks to the correct host because of SSHFP in DNS.

The newly installed host, in turn, knows that it's talking to my backup
and monitoring servers by checking their published SSHFP records.
* Epilogue
I have some ideas how to streamline this even more, but I do not
install new machines that often. This strikes a reasonable balance
between manual work and working on automation. It's probably best to
leave it like this.