Florian Obser 024073d10e spelling; from weerd

2023-02-21 13:17:22 +01:00

36 KiB

Raw Permalink Blame History

Privilege drop, privilege separation, and restricted-service operating mode in OpenBSD

Prologue
Why
Privilege drop
Restricted-service operating mode
Privilege separation
OpenBSD network daemons
Epilogue

Prologue

My main focus in OpenBSD are privilege separated network daemons running in restricted-service operation mode. I gave talks at BSDCan and FOSDEM in the past about how I used these techniques to write slaacd(8) and unwind(8). While I do not think of myself as a one-trick pony, I have written some more: slowcgi(8), rad(8), dhcpleased(8), and gelatod(8). I also wrote the first version of what later turned into resolvd(8).

At one point I claimed that it would take me about a week to transmogrify one daemon into a new one.

Why

Privilege drop, privilege separation, and restricted-service operating mode are exploit mitigations. When¹ an attacker finds a bug we try to stop them from causing damage. The mitigations we are talking about here are aimed at attackers that achieved arbitrary code execution. Due to other mitigations that is quite difficult to pull off. These are the last line of defence. We try to remove as many resources from the attacker to play with and try to crash the program as quickly as possible if an attacker touches something they are not supposed to.

Privilege drop

Privilege drop is probably the weakest mitigation discussed in this article. It is a very old technique, but still important for set-user-ID root binaries.

Theo de Raadt refactored ping(8) over 26 years ago to open a raw socket early and then drop root privileges. This prevents a local user from elevating their privileges when finding a bug in ping(8):

@@ -191,6 +191,14 @@
 	char rspace[3 + 4 * NROUTES + 1];	/* record route space */
 #endif
 
+	if (!(proto = getprotobyname("icmp")))
+		errx(1, "unknown protocol icmp");
+	if ((s = socket(AF_INET, SOCK_RAW, proto->p_proto)) < 0)
+		err(1, "socket");
+
+	/* revoke privs */
+	setuid(getuid());
+
 	preload = 0;
 	datap = &outpack[8 + sizeof(struct timeval)];
 	while ((ch = getopt(argc, argv, "DI:LRS:c:dfh:i:l:np:qrs:T:t:vw:")) != EOF)
@@ -235,6 +243,8 @@
 			loop = 0;
 			break;
 		case 'l':
+			if (getuid() != 0)
+				errx(1, "must be root to specify preload");
 			preload = strtol(optarg, NULL, 0);
 			if (preload < 0)
 				errx(1, "bad preload value: %s", optarg);
@@ -323,12 +333,6 @@
 			*datap++ = i;
 
 	ident = getpid() & 0xFFFF;
-
-	if (!(proto = getprotobyname("icmp")))
-		errx(1, "unknown protocol icmp");
-	if ((s = socket(AF_INET, SOCK_RAW, proto->p_proto)) < 0)
-		err(1, "socket");
-	hold = 1;
 
 	if (options & F_SADDR) {
 		if (IN_MULTICAST(ntohl(to->sin_addr.s_addr)))

20 years later we realized that we would not drop privileges when ping(8) is invoked as root. We would just "drop" from root to root. We can protect ourselves from a malicious ping target by dropping to a dedicated user:

@@ -272,8 +275,12 @@
 
 	/* revoke privs */
 	uid = getuid();
-	if (setresuid(uid, uid, uid) == -1)
-		err(1, "setresuid");
+	if ((pw = getpwnam(PING_USER)) == NULL)
+		errx(1, "no %s user", PING_USER);
+	if (setgroups(1, &pw->pw_gid) ||
+	    setresgid(pw->pw_gid, pw->pw_gid, pw->pw_gid) ||
+	    setresuid(pw->pw_uid, pw->pw_uid, pw->pw_uid))
+		err(1, "unable to revoke privs");
 
 	preload = 0;
 	datap = &outpack[ECHOLEN + ECHOTMLEN];

ping(8) needs a raw socket to be able to send ICMP echo request packets. This is an operation that only root is allowed to do[fn::This prevents normal users from sending arbitrary IP packets, for example with spoofed IP addresses or from privileged (<1024) source ports.]. Once that socket is open though, ping(8) no longer needs to do any other privileged operation. It can hold on to the socket for later use and drop root privileges.

Another use for privilege drop is in daemons to restrict file-system access by chroot(2)'ing to /var/empty. The daemon needs root privileges to call chroot(2), but afterwards it can run without elevated permissions. To the process it looks like there is only the / directory where it does not have any permissions.

The standard pattern can be seen in frontend.c of rad(8):

  if ((pw = getpwnam(RAD_USER)) == NULL)
          fatal("getpwnam");

  if (chroot(pw->pw_dir) == -1)
          fatal("chroot");
  if (chdir("/") == -1)
          fatal("chdir(\"/\")");

  if (setgroups(1, &pw->pw_gid) ||
      setresgid(pw->pw_gid, pw->pw_gid, pw->pw_gid) ||
      setresuid(pw->pw_uid, pw->pw_uid, pw->pw_uid))
          fatal("can't drop privileges");

We first get a user with getpwnam(3) to drop to. The user has /var/empty configured as its home directory so we can use that in chroot(2). Next we chdir(2) to the new file-system root to have a valid current working directory. This prevents us accidentally marking a file-system as busy depending on from where the daemon was started, preventing unmounting file-systems while the daemon is running.

We then drop privileges by putting the user into a single group using setgroups(2). The calls to setresgid(2) and setresuid(2) set the real, effective and saved group and user IDs. This safely drops from root:wheel to (in this case) _rad:_rad with no way to escalate back to root.

This technique is probably used in most, if not all, OpenBSD's privilege separated daemons.

Restricted-service operating mode

With privilege drop in ping(8) we prevent a local unprivileged user to gain superuser or root privileges. If there were a bug in message parsing[fn::I do not want to heckle FreeBSD, it is just that it is a good illustration for what we are currently discussing. FreeBSD's ping(8) is using capsicum, so it is well locked away, too. And it is not like I am not making any mistakes…], a malicious ping target, or even host in the middle, could still read and exfiltrate ssh private keys. ping(8) runs as my user-id. It can read all files my user can read, it can open network connections to any host on the internet, it can execute arbitrary programs, heck it can talk to my GPU. That is a lot of power that it does not need. It only needs to write to stdout and stderr², and send and receive ICMP packets.

We could lock ping(8) away using chroot(2), that at least takes away file-system access. But what can we do about programs that need file-system access but should not execute other programs or talk to the Internet? Like file(1) for example.

Decades ago Niels Provos developed systrace(4) but it turned out that it was difficult to use. The only user in OpenBSD base was sshd(8).

In 2015 Theo de Raadt tricked Nicholas Marriott into privilege separating and sand-boxing file(1) using systrace. It lasted for half a year, it was that painful.

One problem with systrace(4) was that it worked on the level of syscalls and their arguments. This is not something user-land developers are intimately familiar with. We are interacting with libc and do not know what kind of syscalls libc does on our behalf. Another issue is that a program might need some ioctl(2)s or sysctl(2)s but it should not be able to do all of them. So we need to encode restrictions on arguments of syscalls. This gets unwieldy, fast.

There was also systrace(1) to define a policy outside of the program. It turns out that most programs need to do some sort of initialization where they need wide access to the system. This is before they touch untrusted data. Once the initialization is done we can restrict access. How much we can restrict access can depend on command line flags. systrace(1) could not help with this, the program would retain all the privileges it needed for initialization. They would be fewer than all privileges, but still way to many.

As far as I know, the experience with file(1) was the last straw. Theo set out to improve on this situation by developing tame(2), which was later renamed to pledge(2)[fn::It turned out that it was difficult to use tame(2) in a sentence when presenting the concept, hence the rename to pledge(2).].

pledge(2) was developed by studying all programs in OpenBSD base and putting their needed services into categories using broad strokes like memory management, read-write on open file descriptors, opening of files, or networking. If a program violates what it pledged to do, for example trying to open a file when it did not pledge rpath, it will be terminated with an uncatchable SIGABRT.

It is worth repeating that: If a program violates what it pledged to do it will be terminated by the kernel. An attacker does not get to play again and try something else[fn::Needless to say that I despise init systems that restart services when they crash.].

This was an iterative process with patches floating around. A few co-conspirators, including myself, joined the effort a bit later to add pledge(2) to more programs. Once we hit 50 or so pledged programs, Theo considered it mature enough for commit and work continued in tree. Soon after, the list of programs not pledged at all was shorter than the list of pledged programs. This is a huge success and speaks to the usability of pledge(2). In decades we only had one program using systrace(4) in OpenBSD base, then pledge(2) shows up and in less than a year nearly all of OpenBSD base uses it.

To add pledge(2) to a program we need to know what it does and potentially re-factor it to pull (hoist) one-time initialization up before pledge(2) is called for the first time. Some programs are sloppy in the sense that they open a certain resource the moment they need it, this means that they retain more access than they need. As we have seen with ping(8), if we pull opening of the raw socket before option parsing we can drop root privileges before touching untrusted data[fn::With a set-user-ID root program command line options are untrusted data!].

Since pledge(2) is internal to the program we can call it once we are done with option parsing and pledge different things depending on given options. For example, ping(8) retains the ability to do DNS lookups depending on the -n flag:

  if (options & F_HOSTNAME) {
          if (pledge("stdio inet dns", NULL) == -1)
                  err(1, "pledge");
  } else {
          if (pledge("stdio inet", NULL) == -1)
                  err(1, "pledge");
  }

pledge(2) is not fine-grained. It turns out that programs fall into broad categories of what they want to do after initialization. There are not hundreds of different promises for every obscure program, it is not needed. To add a new promise, a rule of thumb is: At least two programs have been identified that need a new promise. To add a syscall to an existing promise, that is, to give more power to an existing promise, needs careful evaluation of what all the other programs already using the promise, gain. It is not enough to show that it is fine for the new program, existing programs are much more important. Another question is how much additional kernel attack surface this exposes.

pledge(2) does not only protect the user of the system or systems on the Internet from harm when a bug is found, it also protects the kernel from user-land. Checking if a syscall is allowed happens early, before a lot of kernel code runs.

pledge(2) can be used to gain understanding on what a program does. We see the following pledges in file(1):

  $ cat -n file.c | fgrep 'pledge("'
   171 if (pledge("stdio rpath getpw recvfd sendfd id proc", NULL) == -1)
   210 if (pledge("stdio rpath sendfd", NULL) == -1)
   374 if (pledge("stdio getpw recvfd id", NULL) == -1)
   389 if (pledge("stdio recvfd", NULL) == -1)

The reader is encouraged to stop here and read the code around those pledge(2) calls to figure out what file does and why it pledges those things.

On line 171, file(1) pledges all the things it needs. It can read-write already open file descriptors[fn::It can only read if the file descriptor was opened for reading and only write if the file descriptor was opened for writing. Since file(1) cannot open files for writing or create new files, it cannot write to disk.], open files for reading, pass file descriptors around, figure out under which user it runs and lookup another user. It can also fork itself.

On line 210 it forked itself and we are in the parent process. It can shed all pledges that the forked child needs for its initialization as well as the ability to fork another instance. The parent process can now only open files for reading, pass those file descriptors to the child process and read-write already open file descriptors.

On line 374 we are in the child process and we shed all pledges that the parent needed. We need to be able to read-write open file descriptors and receive file descriptors from the parent process. During initialization we need to know if we are running as root and be able to look up a user. file(1) can privilege drop to a dedicated user when invoked as root! Once we are done with that, the child process, which does all the magic³, sheds all those additional privileges and on 389 it can only read-write existing file descriptors and receive new file descriptors from the parent process.

We see file(1) is privilege separated, running two processes. One process opens files but does not look at the contents. The other process, which is completely locked away, parses untrusted data and informs the parent process about the result.

Neither process can talk to the Internet or write to disk. They cannot create new files or open existing files for writing. While it can read ssh private keys, it cannot exfiltrate them.

We can find out what file(1) can and cannot do from just those four pledge lines, that is very powerful when starting a code review. They can also be used to get a quick overview how file(1) operates by reading the code adjacent to the pledge(2) calls, that is where interesting stuff happens.

Privilege separation

A single process design that pledges "stdio inet rpath" still has a lot of attack surface. This is not good if it is a network daemon running as root and enabled per default on all installations. Like dhcpleased(8) for example. For starters it can read and exfiltrate ssh private keys.

As we have seen with file(1), we can split up a program into multiple communicating processes that each pledge less operations than the sum of all pledges. We can move the risky operations of parsing untrusted data to a process that does not have access to the Internet, nor the file-system. That process will also not have any elevated privileges to change the system configuration like configuring network addresses or changing the routing table.

An attacker who finds a loophole into this least privileged process will have a hard time creating havoc. They can only talk to more privileged processes using a very narrow communication channel with easy to parse[fn::When using the imsg framework, it is not even parsing, nor data marshalling. It is raw C structs of a fixed, and known at compile time, size.] messages.

OpenBSD network daemons

We will look in detail at how dhcpleased(8) uses privilege separation and pledge(2) to implement a DHCP client in a safe way. Other network daemons follow a similar pattern and the reader should be able to study the source code of rad(8), slaacd(8), and unwind(8) to find out how those daemons work since they share a common ancestry.

Overview

dhcpleased(8) uses three communicating processes to implement privilege separation: parent, engine, and frontend. The parent process retains its root privileges to make changes to the system like configuring IP addresses. This process must be protected from the outside world and we make sure that it does not interact with untrusted data. The frontend process interacts with the outside world, for example by sending and receiving DHCP messages. But it does not parse DHCP messages it receives because that is untrusted data. It is the engine's job to parse DHCP messages and implement the state machine for the DHCP protocol. The engine process is completely locked away.

dhcpleased(8) has a configuration file, following the typical OpenBSD syntax. For example, to send a custom vendor-class option in a DHCPDISCOVER or DHCPREQUEST packet, the configuration looks like this:

interface vio0 {
    send vendor class id "foobar"
}

Most configuration files on OpenBSD use this kind of syntax, from cwmrc(5) to pf.conf(5).

It also comes with a control program, dhcpleasectl(8), to interact with the running daemon. Many OpenBSD daemons provide a similar tool to interact with them.

These are the features that come for free when using an existing OpenBSD network daemon as a template to write a new one: It will be privilege separated and using pledge(2) as a guide on how functionality should be split up between the processes. It will have a configuration file that uses a syntax that people familiar with OpenBSD will find easy to use. And there is a control process to interact with the running daemon. And finally there is a logging framework that handles logging to syslog or stderr. All this scaffolding and tooling is already there, we just need to swap out the specific code the old daemon uses and replace them with something new.

Initialization

dhcpleased(8) comes to life in void main(int argc, char *argv[]) in dhcpleased.c. After a bit of house keeping and argument parsing it ends up creating communications channels to talk to the frontend and engine processes and starts those two child processes:

	if (socketpair(AF_UNIX, SOCK_STREAM | SOCK_CLOEXEC | SOCK_NONBLOCK,
	    PF_UNSPEC, pipe_main2frontend) == -1)
		fatal("main2frontend socketpair");
	if (socketpair(AF_UNIX, SOCK_STREAM | SOCK_CLOEXEC | SOCK_NONBLOCK,
	    PF_UNSPEC, pipe_main2engine) == -1)
		fatal("main2engine socketpair");

	/* Start children. */
	engine_pid = start_child(PROC_ENGINE, saved_argv0, pipe_main2engine[1],
	    debug, verbose);
	frontend_pid = start_child(PROC_FRONTEND, saved_argv0,
	    pipe_main2frontend[1], debug, verbose);

start_child() ensures that after forking the file descriptor the child process can use to talk to the parent process has number three:

	if (fd != 3) {
		if (dup2(fd, 3) == -1)

It then sets up an argv array for execvp(3) to re-exec itself. The flags -E and -F control if the child process runs as frontend or engine process:

	argv[argc++] = argv0;
	switch (p) {
	case PROC_MAIN:
		fatalx("Can not start main process");
	case PROC_ENGINE:
		argv[argc++] = "-E";
		break;
	case PROC_FRONTEND:
		argv[argc++] = "-F";
		break;
	}
	if (debug)
		argv[argc++] = "-d";
	if (verbose)
		argv[argc++] = "-v";
	if (verbose > 1)
		argv[argc++] = "-v";
	argv[argc++] = NULL;

	execvp(argv0, argv);

We used to only fork child processes, which is good enough for privilege separation. It then occurred to us that the child process will have the same memory layout and use the same stack protector cookies. Using fork & exec ensures that the child processes get a different memory layout. If there is an information leak in one process it cannot be used by an attacker to find gadgets in a different, potentially more privileged process.

Going back to the main function, after option parsing we know if we are still in the parent process or in engine or frontend process:

	if (engine_flag)
		engine(debug, verbose);
	else if (frontend_flag)
		frontend(debug, verbose);

The engine() and frontend() functions live in engine.c and frontend.c respectively. Neither returns to the main() function.

In the initialization of the child processes we drop privileges and pledge what the process needs to run. The engine process pledges stdio recvfd and the frontend process stdio unix recvfd route.

We then set-up the communication channel to the parent (also known as the main) process:

	imsg_init(&iev_main->ibuf, 3);
	iev_main->handler = frontend_dispatch_main;

As mentioned before we force the file descriptor to number 3 in start_child() so the child process knows how it can reach the parent process. We are using libevent to call functions when an event happens on a file descriptor. Here frontend_dispatch_main is called when we receive a message from the parent process. There is also a function frontend_dispatch_engine for messages from the engine process. The naming scheme is RECEIVINGPROCESS_dispatch_SENDINGPROCESS and there are functions in engine.c and dhpleased.c to have a full mesh of communication channels between all three processes.

Now that we have started the child processes and hooked up the communication channels between parent and frontend as well as parent and engine, it is time to send our first message from the parent to both children. The child processes pledged recvfd, which allows them to receive open file descriptors over an existing open file descriptor. The parent process calls main_imsg_send_ipc_sockets() to create another socket pair and pass the end points to engine and frontend to create a full mesh.

The file descriptor is received by engine_dispatch_main() and frontend_dispatch_main() using a message type of IMSG_SOCKET_IPC.

After receiving this file descriptor, engine can drop the recvfd pledge and only pledges stdio. It no longer expects any more file descriptors.

The start-up of frontend is a bit more complicated. It needs to receive a route socket from the parent process to learn of interfaces gaining or losing the autoconf flag during runtime. Once it received the route socket from the parent process it can get a list of all interfaces that already have the autoconf flag and drop the route pledge in frontend_startup() afterwards:

frontend_startup(void)
{
	if (!event_initialized(&ev_route))
		fatalx("%s: did not receive a route socket from the main "
		    "process", __func__);

	init_ifaces();
	if (pledge("stdio unix recvfd", NULL) == -1)
		fatal("pledge");
	event_add(&ev_route, NULL);
}

It still needs to hold on to unix for communication with dhcpleasectl(8) and recvfd for receiving bpf(4) sockets when new interfaces are set to autoconf with ifconfig(8).

The astute reader will notice that we have not talked about pledging the parent process. Unfortunately that is not possible because there is no pledge that would allow opening and programming a new bpf(4) socket and we need to create a new one when an interface is set to autoconf while dhcpleased(8) is already running.

However, not all is lost. The parent process, which has to keep running as root, is not touching any untrusted data. An attacker needs to go through the frontend or engine process to gain a foothold in the parent process. And they need to do this via the main_dispatch_frontend and main_dispatch_engine functions. We will look at those in a bit to see why that is a very difficult proposition.

But not all is lost. We can restrict the amount of havoc an attacker can cause if they ever get all the way to the parent process using unveil(2):

	if (unveil(conffile, "r") == -1)
		fatal("unveil %s", conffile);
	if (unveil("/dev/bpf", "rw") == -1)
		fatal("unveil /dev/bpf");

	if (unveil(_PATH_LEASE, "rwc") == -1) {
		no_lease_files = 1;
		log_warn("disabling lease files, unveil " _PATH_LEASE);
	}

	if (unveil(NULL, NULL) == -1)
		fatal("unveil");

With unveil(2) a process can restrict its view of the file-system. The first call to unveil(2) removes access to the entire file-system from the process, except for the first argument of unveil(2). The second argument specifies the permission the program requests for the path: "r" means read, "w" write and "c" create. Subsequent calls unveil more parts of the file-system. Since we cannot use pledge(2) in the parent process, we need to somehow lock-in the list of unveiled paths so that an attacker cannot simply add more. unveil(NULL, NULL); does that, any subsequent calls to unveil(2) will fail.

It turns out that the parent process needs very little access to the file system. It needs access to the dhcpleased(8) config file, the bpf(4) device and the directory where lease files are stored. That list of files and directory does not include access to ssh private keys.

dispatching

As we said the parent process is not touching any untrusted data, that is left to the frontend and engine process. The frontend and engine processes send imsg messages to the parent process. Let's have a look at how those messages arrive and what the parent process does with them.

Messages from the frontend process arrive at main_dispatch_frontend() in dhcpleased.c:

void
main_dispatch_frontend(int fd, short event, void *bula)
{
// [...]
	uint32_t		 if_index;
	int			 verbose;
// [...]
	for (;;) {
		if ((n = imsg_get(ibuf, &imsg)) == -1)
			fatal("imsg_get");
		if (n == 0)	/* No more messages. */
			break;

		switch (imsg.hdr.type) {
		case IMSG_OPEN_BPFSOCK:
			if (IMSG_DATA_SIZE(imsg) != sizeof(if_index))
				fatalx("%s: IMSG_OPEN_BPFSOCK wrong length: "
				    "%lu", __func__, IMSG_DATA_SIZE(imsg));
			memcpy(&if_index, imsg.data, sizeof(if_index));
			open_bpfsock(if_index);
			break;
		case IMSG_CTL_RELOAD:
			if (main_reload() == -1)
				log_warnx("configuration reload failed");
			else
				log_warnx("configuration reloaded");
			break;
		case IMSG_CTL_LOG_VERBOSE:
			if (IMSG_DATA_SIZE(imsg) != sizeof(verbose))
				fatalx("%s: IMSG_CTL_LOG_VERBOSE wrong length: "
				    "%lu", __func__, IMSG_DATA_SIZE(imsg));
			memcpy(&verbose, imsg.data, sizeof(verbose));
			log_setverbose(verbose);
			break;
		case IMSG_UPDATE_IF:
			if (IMSG_DATA_SIZE(imsg) != sizeof(imsg_ifinfo))
				fatalx("%s: IMSG_UPDATE_IF wrong length: %lu",
				    __func__, IMSG_DATA_SIZE(imsg));
			memcpy(&imsg_ifinfo, imsg.data, sizeof(imsg_ifinfo));
			read_lease_file(&imsg_ifinfo);
			main_imsg_compose_engine(IMSG_UPDATE_IF, -1,
			    &imsg_ifinfo, sizeof(imsg_ifinfo));
			break;
		default:
			log_debug("%s: error handling imsg %d", __func__,
			    imsg.hdr.type);
			break;
		}
		imsg_free(&imsg);
	}
// [...]
}

We see that it handles four message types IMSG_OPEN_BPFSOCK, IMSG_CTL_RELOAD, IMSG_CTL_LOG_VERBOSE and IMSG_UPDATE_IF.

One of them, IMSG_CTL_RELOAD, does not have any payload data, the parent process just performs a predefined action. The frontend process cannot influence how the action is performed, only when it is performed.

For the other three, the frontend process sends a piece of data that influences how the action is performed. The piece of data has a fixed, known at compile time, length. For example IMSG_OPEN_BPFSOCK is sent by the frontend process when it learns of a new network interface gaining the autoconf flag. It needs a new bpf(4) socket to send and receive DHCP messages on that interface[fn::bpf(4) sockets are bound to a specific interface and then locked, meaning the frontend process will not be able to change anything about the socket. That also means that the frontend process cannot send and receive arbitrary messages but only those that match the bpf filter.]. It sends a single uint32_t that uniquely identifies the network interface in the system. If it sends something else, we assume the frontend process has been compromised and is no longer trusted. The parent process terminates itself:

	if (IMSG_DATA_SIZE(imsg) != sizeof(if_index))
		fatalx("%s: IMSG_OPEN_BPFSOCK wrong length: "
		    "%lu", __func__, IMSG_DATA_SIZE(imsg));

We are using the same pattern of checking the data size for all the other message types send from the frontend process. The same is true for how main_dispatch_engine() handles the messages sent by the engine process.

The frontend process sends the IMSG_OPEN_BPFSOCK message using frontend_imsg_compose_main() in frontend.c. The sending function names follow a pattern of SENDINGPROCESS_imsg_compose_RECEIVINGPROCESS and they are each a thin wrapper around imsg_compose_event() defined in dhcpleased.c which in turn is a thin wrapper around imsg_compose(3).

A good way to understand what dhcpleased(8) is doing is searching for where all the imsg_types are sent and received.

Configuration file, control process and logging framework.

We will not go into detail on how these features are implemented. They are not that relevant to the topic of privilege separation and the author is not an expert in LR(1) grammars or yacc(1). We will give some pointers to get interested readers started to learn on their own.

The parsed configuration file is stored and passed around in a struct dhcpleased_conf structure. The entry point into the parser is parse_config() in parse.y. parse.y contains a hand written lexer as well as the yacc(1) grammar. Changing or adding to the grammar of the configuration file is done in three places:

New tokens are added to the grammar at the beginning of the file using the %token keyword. Tokens are in all caps.
Adding rules to the grammar, using the defined tokens
Teaching the lexer about new tokens. This is done by adding to the keywords array in the lookup() function.

Passing -nv flags to dhcpleased(8) runs a configuration test and prints out the canonical form of the parsed configuration file. The code for this lives in printconf.c. Care should be taken to print a valid configuration, i.e. one that can be passed to dhcpleased(8) again.

The source code for the control process can be found in usr.sbin/dhcpleased/dhcpleasectl.c. Communication is done using UNIX-domain sockets and connections are accepted by the frontend process. Communication uses imsgs, and messages from the control process are handled by control_dispatch_imsg() in control.c. The main difference to the dispatch function between the dhcpleased(8) processes is that messages with invalid data sizes are ignored instead of exiting the daemon. This is done because everyone in the wheel group can send messages to the daemon and we do not want them to be able to make the daemon crash[fn::Or even everyone on the system in case of unwind(8).]:

		case IMSG_CTL_LOG_VERBOSE:
			if (IMSG_DATA_SIZE(imsg) != sizeof(verbose))
				break;

Here dhcpleased(8) expects single integer to set the verbosity of the daemon. If we get something else we just ignore the message using break.

Finally, the network daemons come with a logging framework that handles logging to syslog or stderr when the daemons runs in the foreground using -d. The code for this can be found in log.c.

Epilogue

Writing software in C with security in mind can be a lot of fun when standing on the shoulders of giants and having things like privilege separation and restricted-service operating mode in your toolbox.

I wrote two daemons, dhcpleased(8) and slaacd(8), that are enabled by default on every OpenBSD installation. We might eventually add a third one. Having the mitigations shown here as well as all the other mitigations constantly being added and enabled per default is what lets me sleep at night. I am cautiously optimistic that when a bug is found in dhcpleased(8) or slaacd(8) an attacker will have a hard time pivoting to arbitrary code execution as root.

not if!

Which is usually the terminal.

Pun intended.

36 KiB Raw Permalink Blame History Unescape Escape