2022-12-02 15:51:07 +01:00
#+TITLE: Fuzzing ping(8)
#+SUBTITLE: ... and finding a 24 year old bug.
#+DATE: 2022-12-01
* Prologue
[[https://freebsd.org][FreeBSD]] had a [[https://www.freebsd.org/security/advisories/FreeBSD-SA-22:15.ping.asc][security fluctuation]] in their implementation of =ping(8)=
the other day. As someone who has done a lot of work on [[https://man.openbsd.org/man/ping.8][=ping(8)=]] in
[[https://openbsd.org][OpenBSD]], this piqued my interest.
* What about OpenBSD?
=ping(8)= is ancient:
#+begin_example
 * Author -
 * Mike Muuss
 * U. S. Army Ballistic Research Laboratory
 * December, 1983
#+end_example
What we know today as =ping(8)= started to become recognizable in 1986;
see for example this [[https://github.com/csrg/csrg/commit/962056110ebf62ed8d4368964c7e82ac7434ea82][csrg commit]].
FreeBSD identified a stack overflow in the =pr_pack()= function and I
expected a lot of similarity between the BSDs. This code has not
changed much since the csrg days.
Step one: Does this affect us? Turns out, it does not. FreeBSD rewrote
=pr_pack()= in [[https://github.com/freebsd/freebsd-src/commit/d9cacf605e2ac0f704e1ce76357cbfbe6cb63d52][2019]], citing alignment problems.
Now we could join the punters on the Internet and point and laugh. But
that's just rude, uncalled for, and generally boring and
pointless. Technically I'm on vacation and I had resolved to only do
fun things this week. So let's have some fun.
Step two: Did we mess something else up? FreeBSD had a problem in
=pr_pack()= because that function handles data from the network. The
data is untrusted and needs to be validated. Now is as good a time as
any to check OpenBSD's implementation of =pr_pack()=. I had wanted to try
fuzzing something, anything, with [[https://en.wikipedia.org/wiki/American_fuzzy_lop_(fuzzer)][afl]] for a few years, but never got
around to it. I thought I might as well do it now, might be fun.
* Make sure you are not holding it wrong.
I installed =afl++= from packages and glanced at
"[[https://aflplus.plus/docs/tutorials/libxml2_tutorial/][Fuzzing libxml2 with AFL++]]". Here is what we need:
+ A program to test. Something with a known bug so that we can tell the
  fuzzing works.
+ An input file that does not trigger the bug.
+ Compile the program with =afl-clang-fast=.
+ Run =afl-fuzz=.
[[file:fuzzing-ping/test.c][=test.c=]]:
#+begin_src C
/* Written by Florian Obser, Public Domain */
#include <err.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int
main(int argc, char **argv)
{
	FILE	*f;
	size_t	 fsize;
	uint8_t	*buf, len, *dbuf;

	if (argc != 2)
		errx(1, "usage: test file");
	f = fopen(argv[1], "rb");
	if (f == NULL)
		err(1, "%s", argv[1]);
	fseek(f, 0, SEEK_END);
	fsize = ftell(f);
	rewind(f);
	buf = malloc(fsize + 1);
	if (buf == NULL)
		err(1, NULL);
	fread(buf, fsize, 1, f);
	fclose(f);
	buf[fsize] = 0;
	len = buf[0];
	dbuf = malloc(len);
	if (dbuf == NULL)
		err(1, NULL);
	/* The bug: copies fsize - 1 bytes into a buffer of only len bytes. */
	memcpy(dbuf, buf + 1, fsize - 1);
	warnx("len: %d", len);
	return 0;
}
#+end_src
This program has a trivial buffer overflow. It figures out how big a
file is on disk and stores this in =fsize=. It allocates a buffer of
this size (plus one byte) and then reads the whole file into it. It
interprets the first byte as the length of the data (=len=) and
allocates a new buffer (=dbuf=) of this size. It skips the length byte
and copies =fsize - 1= bytes into the new buffer. So it trusts that the
amount of data it read from disk is the same as indicated by the length
byte.
While this might seem silly, this is what real world buffer overflows
look like.
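For contrast, a minimal sketch of the check that is missing, assuming we want to reject inconsistent input instead of truncating it. The helper =copy_payload()= is a name made up for this illustration and is not part of =test.c=:

#+begin_src C
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: copy the payload that follows the length byte,
 * but only if the length byte agrees with the amount of data present.
 * Returns a freshly allocated buffer, or NULL on mismatch or if the
 * allocation fails.
 */
uint8_t *
copy_payload(const uint8_t *buf, size_t fsize)
{
	uint8_t	*dbuf;
	size_t	 len;

	if (fsize < 1)
		return NULL;		/* no length byte at all */
	len = buf[0];
	if (fsize - 1 != len)
		return NULL;		/* length byte and data disagree */
	dbuf = malloc(len > 0 ? len : 1);	/* avoid malloc(0) */
	if (dbuf == NULL)
		return NULL;
	memcpy(dbuf, buf + 1, len);
	return dbuf;
}
#+end_src

The only interesting line is the comparison of =fsize - 1= with =len=. Everything after it can rely on the two agreeing.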
Here is a file where the length byte and file size agree. Create
folders =in= and =out= and save the file as =in/test.txt=. Don't
forget the newline.
[[file:fuzzing-ping/test.txt][=test.txt=]]:
#+begin_example
ABBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
#+end_example
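Why the newline matters: =A= is ASCII 65, so the length byte promises 65 bytes of payload. Here is a sketch that builds such a seed in memory, assuming a layout of 64 fill bytes followed by the newline. =make_seed()= is a name invented for this example:

#+begin_src C
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: write a consistent seed into out and return its
 * total size, or 0 if len is zero or out is too small.
 * Layout: one length byte, len - 1 fill bytes, one trailing newline.
 */
size_t
make_seed(uint8_t len, uint8_t *out, size_t outsz)
{
	if (len == 0 || outsz < (size_t)len + 1)
		return 0;
	out[0] = len;			/* 'A' == 65 */
	memset(out + 1, 'B', len - 1);	/* 64 times 'B' for len == 65 */
	out[len] = '\n';		/* the byte that is easy to forget */
	return (size_t)len + 1;
}
#+end_src

Called with =len= set to ='A'= this yields 66 bytes: the length byte plus 65 bytes of payload. Leave out the newline and the seed itself is already inconsistent.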
Compile =test.c=:
#+begin_src shell
CC=/usr/local/bin/afl-clang-fast make test
#+end_src
and run =afl-fuzz=:
#+begin_src shell
afl-fuzz -i in/ -o out -- ./test @@
#+end_src
It more or less immediately finds a crash. The reproducer(s) are in
=out/default/crashes/=.
* Fuzzing =ping(8)=
At this point we are facing a few problems. What does it mean to fuzz
=ping(8)=? Where do we get the sample input from, and how do we feed
it to =ping(8)=?
From a high level point of view =ping(8)= parses arguments,
initializes a bunch of stuff and then enters an infinite loop sending
ICMP echo request packets and waiting for a reply. It parses and
prints each reply.
Parsing the reply is the interesting thing. The reply comes from the
network and is untrusted. This is where things can go wrong. The
parsing is handled by =pr_pack()=, so that's what we should fuzz.
** =in/= for =ping(8)=
We need some sample data. An ICMP packet is binary data
on the wire. Crafting it by hand is annoying. So let's just hack =ping(8)=
to dump the packet to disk.
[[file:fuzzing-ping/ping_output_hack.diff][=ping_output_hack.diff=]]:
#+begin_src diff
diff --git sbin/ping/ping.c sbin/ping/ping.c
index a3b3d650eb5..78b571b95b4 100644
--- sbin/ping/ping.c
+++ sbin/ping/ping.c
@@ -79,6 +79,7 @@
 #include <sys/types.h>
 #include <sys/socket.h>
+#include <sys/stat.h>
 #include <sys/time.h>
 #include <sys/uio.h>
@@ -95,6 +96,7 @@
 #include <ctype.h>
 #include <err.h>
 #include <errno.h>
+#include <fcntl.h>
 #include <limits.h>
 #include <math.h>
 #include <poll.h>
@@ -217,6 +219,8 @@ const char *pr_addr(struct sockaddr *, socklen_t);
 void pr_pack(u_char *, int, struct msghdr *);
 __dead void usage(void);
+void output(char *, u_char *, int);
+
 /* IPv4 specific functions */
 void pr_ipopt(int, u_char *);
 int in_cksum(u_short *, int);
@@ -255,7 +259,7 @@ main(int argc, char *argv[])
 	int df = 0, tos = 0, bufspace = IP_MAXPACKET, hoplimit = -1, mflag = 0;
 	u_char *datap, *packet;
 	u_char ttl = MAXTTL;
-	char *e, *target, hbuf[NI_MAXHOST], *source = NULL;
+	char *e, *target, hbuf[NI_MAXHOST], *source = NULL, *output_path = NULL;
 	char rspace[3 + 4 * NROUTES + 1]; /* record route space */
 	const char *errstr;
 	double fraction, integral, seconds;
@@ -264,11 +268,13 @@ main(int argc, char *argv[])
 	u_int rtableid = 0;
 	extern char *__progname;
+#if 0
 	/* Cannot pledge due to special setsockopt()s below */
 	if (unveil("/", "r") == -1)
 		err(1, "unveil /");
 	if (unveil(NULL, NULL) == -1)
 		err(1, "unveil");
+#endif
 	if (strcmp("ping6", __progname) == 0) {
 		v6flag = 1;
@@ -297,8 +303,8 @@ main(int argc, char *argv[])
 	preload = 0;
 	datap = &outpack[ECHOLEN + ECHOTMLEN];
 	while ((ch = getopt(argc, argv, v6flag ?
-	    "c:DdEefgHh:I:i:Ll:mNnp:qS:s:T:V:vw:" :
-	    "DEI:LRS:c:defgHi:l:np:qs:T:t:V:vw:")) != -1) {
+	    "c:DdEefgHh:I:i:Ll:mNno:p:qS:s:T:V:vw:" :
+	    "DEI:LRS:c:defgHi:l:no:p:qs:T:t:V:vw:")) != -1) {
 		switch(ch) {
 		case 'c':
 			npackets = strtonum(optarg, 0, INT64_MAX, &errstr);
@@ -375,6 +381,9 @@ main(int argc, char *argv[])
 		case 'n':
 			options &= ~F_HOSTNAME;
 			break;
+		case 'o':
+			output_path = optarg;
+			break;
 		case 'p': /* fill buffer with user pattern */
 			options |= F_PINGFILLED;
 			fill((char *)datap, optarg);
@@ -768,10 +777,10 @@ main(int argc, char *argv[])
 	}
 	if (options & F_HOSTNAME) {
-		if (pledge("stdio inet dns", NULL) == -1)
+		if (pledge("stdio inet dns wpath cpath", NULL) == -1)
 			err(1, "pledge");
 	} else {
-		if (pledge("stdio inet", NULL) == -1)
+		if (pledge("stdio inet wpath cpath", NULL) == -1)
 			err(1, "pledge");
 	}
@@ -960,8 +969,11 @@ main(int argc, char *argv[])
 				}
 			}
 			continue;
-		} else
+		} else {
+			if (output_path != NULL)
+				output(output_path, packet, cc);
 			pr_pack(packet, cc, &m);
+		}
 		if (npackets && nreceived >= npackets)
 			break;
@@ -2274,3 +2286,29 @@ usage(void)
 	}
 	exit(1);
 }
+
+void
+output(char *path, u_char *pack, int len)
+{
+	size_t bsz, off;
+	ssize_t nw;
+	int fd;
+	char *fname;
+
+	bsz = len;
+	if (asprintf(&fname, "%s/ping_%lld_%d.out", path, time(NULL),
+	    getpid()) == -1)
+		err(1, NULL);
+
+	fd = open(fname, O_WRONLY | O_CREAT, S_IRUSR | S_IWUSR | S_IRGRP |
+	    S_IROTH);
+	free(fname);
+
+	if (fd == -1)
+		err(1, "open");
+
+	for (off = 0; off < bsz; off += nw)
+		if ((nw = write(fd, pack + off, bsz - off)) == 0 || nw == -1)
+			err(1, "write");
+	close(fd);
+}
#+end_src
After building and installing our hacked version of =ping(8)= we can
create sample input data for afl thusly:
#+begin_src shell
while :; do
	ping -o ./in/ -w 1 -c 1 \
	    $(jot -r 0 255 | head -4 | tr '\n' '.' | sed 's/.$//')
done
#+end_src
=jot= creates a stream of random numbers between 0 and 255, we get the
first four, concatenate them with '.' and cut off the trailing
dot. Voilà, we have a bunch of random IPv4 addresses. We then send a
single ping and wait for one second. The ICMP reply is written to
=./in/=.
** Fuzzing =pr_pack()=
At this point I wrote a =main()= function that accepts a file name as
an argument and reads it into a buffer. I then ripped =pr_pack()= out of
=ping(8)= and fed it the file contents.
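The rough shape of such a harness, as a sketch and not the actual =afl_ping.c=. =parse_packet()= is a placeholder standing in for the extracted =pr_pack()=, and =fuzz_one()= is a made-up name; a real =main()= would do little more than call =fuzz_one(argv[1])=:

#+begin_src C
#include <stdio.h>
#include <stdlib.h>

/* Placeholder for the function under test (pr_pack() in the real thing). */
static int
parse_packet(const unsigned char *buf, size_t len)
{
	/* Pretend to parse: only report whether there is any data. */
	return (buf != NULL && len > 0) ? 0 : -1;
}

/*
 * Read the whole file named by path into memory and hand it to the
 * parser. afl-fuzz substitutes @@ with the path of each test case.
 * Returns the parser's result, or -1 if the file could not be read.
 */
int
fuzz_one(const char *path)
{
	FILE		*f;
	unsigned char	*buf;
	long		 fsize;
	int		 ret = -1;

	if ((f = fopen(path, "rb")) == NULL)
		return -1;
	if (fseek(f, 0, SEEK_END) == -1 || (fsize = ftell(f)) == -1) {
		fclose(f);
		return -1;
	}
	rewind(f);
	if ((buf = malloc(fsize > 0 ? (size_t)fsize : 1)) != NULL) {
		if (fsize == 0 || fread(buf, (size_t)fsize, 1, f) == 1)
			ret = parse_packet(buf, (size_t)fsize);
		free(buf);
	}
	fclose(f);
	return ret;
}
#+end_src

The harness does no validation of its own on purpose. Whatever is in the file goes straight to the parser, because the parser is what we want to break.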
Of course compiling fails quite spectacularly at this point. So I
added a bunch of missing functions, defines and global variables. It
gets pretty close now. We don't have the =msghdr= from =recvmsg(2)= so
we need to =#if 0= some code. We also need to get rid of the
validation of the data packet using =SipHash= because the whole point
is that the data does not validate and =SipHash= would short circuit.
Oh yeah, and the thing is legacy IP only at this point.
So [[file:fuzzing-ping/afl_ping.c][here (=afl_ping.c=)]] it is, and it is quite terrible. It would probably
make more sense to copy all of =ping(8)= and slap on a new =main()=
function. Maybe.
Anyway, at this point I was 30 minutes in, from reading about afl for
the first time until firing up =afl-fuzz= on my hacked
=pr_pack()=. Not too bad. It was time for dinner and I left the thing
running.
** The promised bug
I came back after dinner and afl found zero crashes. That's
disappointing. Or good. Depending on how you look at it. But it found
hangs. Running =afl_ping= on one of the reproducers, it printed
"=unknown option 20=" forever.
The problem is in this part of the code:
#+begin_src C
for (; hlen > (int)sizeof(struct ip); --hlen, ++cp) {
	/* [...] */
	switch (*cp) {
	/* [...] */
	default:
		printf("\nunknown option %x", *cp);
		hlen = hlen - (cp[IPOPT_OLEN] - 1);
		cp = cp + (cp[IPOPT_OLEN] - 1);
		break;
	}
}
#+end_src
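=cp= points to untrusted data. If =cp[IPOPT_OLEN]= is zero we
increase =hlen= by one and the for loop then subtracts one again. The
same happens to =cp= in the opposite direction: it is moved back by one
and the loop moves it forward by one. We never make any progress and
spin forever.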
The diff is fairly simple:
#+begin_src diff
diff --git ping.c ping.c
index fb31365ad31..6019c87d8db 100644
--- ping.c
+++ ping.c
@@ -1525,8 +1525,11 @@ pr_ipopt(int hlen, u_char *buf)
 			break;
 		default:
 			printf("\nunknown option %x", *cp);
-			hlen = hlen - (cp[IPOPT_OLEN] - 1);
-			cp = cp + (cp[IPOPT_OLEN] - 1);
+			if (cp[IPOPT_OLEN] > 0 && (cp[IPOPT_OLEN] - 1) <= hlen) {
+				hlen = hlen - (cp[IPOPT_OLEN] - 1);
+				cp = cp + (cp[IPOPT_OLEN] - 1);
+			} else
+				hlen = 0;
 			break;
 		}
 	}
#+end_src
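To see the stall and the effect of the guard in isolation, here is a stripped-down sketch of the same arithmetic. It is not the =ping(8)= code: =walk_options()=, the =checked= flag and the iteration cap are inventions for this example, =OLEN= stands in for =IPOPT_OLEN=, and every option is treated as unknown. It uses an index instead of a pointer so that stepping back by one is harmless.

#+begin_src C
#include <stddef.h>

#define OLEN	1	/* offset of the option length byte, like IPOPT_OLEN */

/*
 * Walk optlen bytes of options the way the default: branch does.
 * With checked == 0 the length byte is trusted, with checked != 0 it
 * is guarded like in the diff above.
 * Returns the number of loop iterations, or -1 once max_iter is
 * exceeded, which we take to mean "would spin forever".
 */
int
walk_options(const unsigned char *opts, int optlen, int checked, int max_iter)
{
	int	off = 0, remaining = optlen, iter = 0, olen;

	for (; remaining > 0; --remaining, ++off) {
		if (++iter > max_iter)
			return -1;
		if (off + OLEN >= optlen)
			break;		/* no length byte left to read */
		olen = opts[off + OLEN];
		if (!checked || (olen > 0 && olen - 1 <= remaining)) {
			remaining -= olen - 1;
			off += olen - 1;
		} else
			remaining = 0;	/* bogus length: stop parsing */
	}
	return iter;
}
#+end_src

Feeding it the bytes =0x20 0x00 0x00 0x00= with =checked= set to 0 hits the iteration cap, because each pass undoes the loop's own step. With =checked= set to 1 the walk stops after a single iteration.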
I foolishly tweaked the diff after collecting OKs and of course the
tweak was wrong. Note to self: Never do this. So it's spread out over
two commits: [[https://cvsweb.openbsd.org/src/sbin/ping/ping.c#rev1.247][ping.c, Revision 1.247]] and [[https://cvsweb.openbsd.org/src/sbin/ping/ping.c#rev1.248][ping.c, Revision 1.248]].
This bug was introduced on April 3rd, 1998 in [[https://cvsweb.openbsd.org/src/sbin/ping/ping.c#rev1.30][revision 1.30]], over 24
years ago.
* Epilogue
Afl uses files to feed data to programs to get them to crash or
otherwise misbehave. I had wondered for a few years how I could use
afl with things that talk to the network. Because that's what I mostly
work on. In hindsight it's quite obvious. You identify the main
parsing function, wrap it in a new =main()= function and Robert is
your father's nearest male relative.
The two main takeaways from this are: One, if someone messes up
somewhere, go look if you messed up in the same or similar way
somewhere else. Two, afl is pretty easy to use, even for network
programs. 30 minutes from reading about afl for the first time to
finding a bug in a real world program is pretty neat.