#+Title: VerifyHostKeyDNS
#+SUBTITLE: ... or how I enroll new hosts into my infrastructure.
#+DATE: 2023-01-15
* Prologue
I run my own infrastructure. I self-host my email, DNS, this website,
a [[][git server]], [[][backups]], and probably a bunch of other stuff that I
forgot about. Ah yes, [[][monitoring]], Ubiquiti UniFi for my Wi-Fi access
points at home and probably even more stuff.

All of it is running [[][OpenBSD]], except for one machine running [[][Debian]]. It's
all tied together with [[][ansible]][fn:: I started out with ansible,
switched to salt stack and moved back to ansible. Because reasons.].
So far it's eight machines. I was reinstalling and consolidating some
VMs and physical machines the other day and hooking up new machines
became annoying because of ssh host-keys.
* StrictHostKeyChecking
My ansible orchestration host needs to be able to talk to new machines
over ssh. New machines need to talk to the backup server over ssh and
submit passive check results over ssh to the monitoring server. The
monitoring server needs to talk to new hosts over ssh[fn:: I don't
trust nrpe. I have seen the code. Instead I use ~by_ssh~ to monitor
hosts. Ansible adds an ssh public-key to a monitoring user with a
force-command. The force-command is a shell-script switching over
~${SSH_ORIGINAL_COMMAND}~ to run specific check_commands. It does not
trust the remote ssh at all.].
So we have the issue of existing infrastructure needing to verify
host-keys of new hosts and new hosts needing to verify host-keys of
existing infrastructure. One way to deal with this is to run a [[][CA,
sign host-keys with it and roll certificates out]].
I on the other hand prefer to use DNS[fn:: I have a laptop sticker and
travel mug with "We reject kings, presidents and voting. We believe in
rough consensus and running code." crossed out with "Fuck that! Just
put it in DNS." I also have a RUN DNS sticker. I am biased.]. [[][RFC 4255]] provides
facilities to store host-keys in SSHFP resource records in DNS and we
can secure those with DNSSEC.
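
To make the record format concrete, here is a minimal sketch of how an SSHFP record can be derived from a line in OpenSSH public key format. The function name =sshfp_from_pubkey= and the owner name =host.example.com= are made up for illustration, and the sketch assumes =base64= and =sha256sum= from GNU coreutils are available. In practice =ssh-keygen -r= does this for you.

#+begin_src shell
# Sketch: build an SSHFP record (SHA-256 fingerprint) from a public key line.
# $1 is the owner name, stdin is one "keytype base64blob [comment]" line.
sshfp_from_pubkey() {
    owner=$1
    read -r keytype blob _comment || return 1
    # Algorithm numbers: 1 and 2 from RFC 4255, 3 from RFC 6594, 4 from RFC 7479.
    case $keytype in
        ssh-rsa)      alg=1 ;;
        ssh-dss)      alg=2 ;;
        ecdsa-sha2-*) alg=3 ;;
        ssh-ed25519)  alg=4 ;;
        *) echo "unsupported key type: $keytype" >&2; return 1 ;;
    esac
    # The fingerprint is the hash of the decoded public key blob.
    # Fingerprint type 2 means SHA-256.
    digest=$(printf '%s' "$blob" | base64 -d | sha256sum | cut -d' ' -f1)
    printf '%s IN SSHFP %s 2 %s\n' "$owner" "$alg" "$digest"
}

# Hypothetical usage:
# sshfp_from_pubkey host.example.com < /etc/ssh/ssh_host_ed25519_key.pub
#+end_src

The output is a single zone file line of the form =owner IN SSHFP algorithm type fingerprint=.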
* VerifyHostKeyDNS
[[][ssh_config(5)]] explains how [[][ssh(1)]] can use SSHFP records to verify
host-keys:
#+begin_quote
Specifies whether to verify the remote key using DNS and SSHFP
resource records. If this option is set to yes, the client will
implicitly trust keys that match a secure fingerprint from DNS.
Insecure fingerprints will be handled as if this option was set
to ask. If this option is set to ask, information on fingerprint
match will be displayed, but the user will still need to confirm
new host keys according to the StrictHostKeyChecking option. The
default is no.
#+end_quote
One problem with this is, if you put
#+begin_example
Host *
	VerifyHostKeyDNS yes
#+end_example
into your =.ssh/config= it will not work on its own. The magic words
are /secure fingerprint/. What the man page means is that a DNS answer
for SSHFP needs to have the /Authentic Data (AD)/ flag set. The flag
gets set when a validating name-server is asked for the SSHFP record,
finds it, and can validate the answer using DNSSEC.
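
One way to see whether a name-server sets the AD flag is to look at the =flags= line in the header that [[][dig(1)]] prints. The following is a small sketch. The helper name =has_ad_flag= and the host name in the usage comment are made up, and it only inspects what the name-server sent back, not what the stub resolver later hands to ssh.

#+begin_src shell
# Sketch: succeed if dig output on stdin carries the AD flag in its header.
# The header line looks like ";; flags: qr rd ra ad; QUERY: 1, ANSWER: 2, ...".
has_ad_flag() {
    grep -Eq '^;; flags:[^;]* ad[ ;]'
}

# Hypothetical usage:
# dig +dnssec SSHFP host.example.com | has_ad_flag && echo "answer was validated"
#+end_src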
But when the libc stub resolver[fn:: The thingy that ssh uses to talk
to the validating name-server. On OpenBSD that is [[][asr]].] gets that
answer, it will strip the AD flag for security reasons. You see, it
does not know that it can trust the validating name-server. One way to
have a trustworthy validating name-server is to run one on localhost.
[[][resolv.conf(5)]] explains the *trust-ad* option:
#+begin_quote
trust-ad: A name server indicating that it performed DNSSEC
validation by setting the Authentic Data (AD) flag
in the answer can only be trusted if the name
server itself is trusted and the network path is
trusted. Generally this is not the case and the
AD flag is cleared in the answer. The trust-ad
option lets the system administrator indicate that
the name server and the network path are trusted.
This option is automatically enabled if
resolv.conf only lists name servers on localhost.
#+end_quote
The easiest way is to run [[][unwind(8)]]. [[][resolvd(8)]] will then add a
=nameserver= entry for localhost to =/etc/resolv.conf= and comment out all
other dynamically learned name servers. Just make sure that you are
not using any statically configured name servers[fn:: I use ~! route
nameserver $if 2620:fe::9 2620:fe::fe:9~ in my
main [[][hostname.if(5)]] to add some static name servers in case unwind(8)
crashes[fn:: Not sure why it would do that though. Sounds
unpleasant.].] because you really want to have only the localhost
=nameserver= entry in there.
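
A quick way to check this is to look at the active lines in =resolv.conf=. The sketch below is a simplification: the function name =resolver_trusts_ad= is made up, "localhost" is assumed to mean exactly =127.0.0.1= or =::1=, and bracketed or port-qualified addresses are not handled. It only mirrors the rule quoted from the man page and does not ask the resolver itself.

#+begin_src shell
# Sketch: succeed if the stub resolver should keep the AD flag, that is if
# "options trust-ad" is set or every active nameserver line is localhost.
# $1 is the file to check and defaults to /etc/resolv.conf.
resolver_trusts_ad() {
    conf=${1:-/etc/resolv.conf}
    awk '
        # Commented-out lines do not match because $1 is then not "nameserver".
        $1 == "nameserver" {
            n++
            if ($2 != "127.0.0.1" && $2 != "::1") remote++
        }
        $1 == "options" {
            for (i = 2; i <= NF; i++) if ($i == "trust-ad") opt = 1
        }
        END { exit !(opt || (n > 0 && remote == 0)) }
    ' "$conf"
}

# Hypothetical usage:
# resolver_trusts_ad && echo "AD flag will be passed on to ssh"
#+end_src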
* Putting it all together
When I install a new host I have out-of-band access in one way or
another. It might be a serial console, a fake HTML5 console or some
KVM contraption. Heck, I even used [[][qemu]] to get OpenBSD running on some
Hetzner physical machine.

On the installed machine I use said out-of-band access to run
#+begin_src shell
ssh-keygen -l -f /etc/ssh/
#+end_src
This gives me one ssh host-key fingerprint and I can then log in over
ssh.

I then run
#+begin_src shell
ls /etc/ssh/*.pub | xargs -n1 ssh-keygen -r $(hostname) -f
#+end_src
and copy & paste the result into my DNS zone file alongside the A and
AAAA records for legacy IP and IPv6. I use [[][PowerDNS]] as a hidden DNSSEC
signer, so I paste it into the editor opened by ~pdnsutil edit-zone~.
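
Once the zone is signed and published, it is worth checking that what DNS serves matches the keys on the host. Here is a rough sketch of such a comparison. The helper name =normalize_sshfp= is made up. It lowercases the fingerprint and joins fingerprints that are split into several chunks, so that output from =ssh-keygen -r= and =dig +short= can be compared as plain strings.

#+begin_src shell
# Sketch: normalize "ALG TYPE HEX [HEX...]" lines read from stdin.
# Output is one sorted "ALG TYPE hex" line per record, in lowercase.
normalize_sshfp() {
    awk '{
        fp = ""
        for (i = 3; i <= NF; i++) fp = fp $i   # join split fingerprint chunks
        print $1, $2, tolower(fp)
    }' | sort
}

# Hypothetical usage on the new host:
# local_fp=$(ls /etc/ssh/*.pub | xargs -n1 ssh-keygen -r "$(hostname)" -f |
#     awk '{ print $4, $5, $6 }' | normalize_sshfp)
# dns_fp=$(dig +short SSHFP "$(hostname)" | normalize_sshfp)
# [ "$local_fp" = "$dns_fp" ] && echo "SSHFP records match"
#+end_src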
While still logged in I install python3 and add an ssh-key for
ansible. I then add the host to the ansible inventory. The ansible
orchestrator can now finish the installation of the host over ssh
while trusting the SSHFP it finds in DNS.
The newly installed host in turn can verify that it's talking to my
backup and monitoring servers using their published SSHFP records.