aquasecurity / libbpfgo

eBPF library for Go. Powered by libbpf.

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Undefined behaviour while handling variadic arguments

javierhonduco opened this issue · comments

Context

Parca Agent sometimes crashes on startup with the SIGSEGV below. We currently use v0.4.4-libbpf-1.0.1. After some investigation, it was clear that this was happening more consistently in older kernels from the 4.x series. Running one of our tests under GDB:

Thread 1 "cpu.test" received signal SIGSEGV, Segmentation fault.
0x00007ffff7e2c99b in __strstr_sse2_unaligned () from /lib64/libc.so.6
(gdb) bt
#0  0x00007ffff7e2c99b in __strstr_sse2_unaligned () from /lib64/libc.so.6
#1  0x00000000008eaa9d in libbpf_print_fn ()
#2  0x00000000008f208b in libbpf_print (level=level@entry=LIBBPF_WARN, 
    format=0xb363f9 "libbpf: BTF loading error: %d\n") at libbpf.c:234
#3  0x00000000009183c6 in btf_load_into_kernel (btf=btf@entry=0xf0dd60, 
    log_buf=0x0, log_sz=0, log_level=1) at btf.c:1212
#4  0x00000000008f574a in bpf_object__sanitize_and_load_btf (obj=0xf0daa0)
    at libbpf.c:3076
#5  bpf_object_load (obj=0xf0daa0, extra_log_level=0, target_btf_path=0x0)
    at libbpf.c:7651
#6  0x00000000008ec9b9 in _
[...]

Spent some time trying to find for any issues in the libbpf_print libbpf function and surrounding code, but could not find anything nor a clear reason why this crash would occur. All the code seemed sensible and it had checks in place.

Given that other BPF applications I work on that also use libbpf but don't use libbpfgo weren't experiencing this crash made me think that perhaps this issue is somewhere else.

Variadic arguments 💣

libbpfgo sets a logging callback in here to filter out certain logs. This code however contains a bug that leads to undefined behaviour. For example, in the va_arg calls, we are assuming that the first variadic argument is a C string, but there are no guarantees that this is the case.

char *str = va_arg(exclusivity_check, char *);
if (strstr(str, "Exclusivity flag on") != NULL) {
[...]

Indeed, after adding some debugging statements I spotted the following string (source):

libbpf_print (level=level@entry=LIBBPF_WARN, format=0xc03d49 "libbpf: BTF loading error: %d\n") at libbpf.c:228

As per the manpage:

If there is no next argument, or if type is not compatible with the type of the actual next argument (as promoted according to the default argument promotions), random errors will occur.

To summarise:

  • There's undefined behaviour in how we are handling/reading variadic arguments;
  • We don't hit this as long as we get lucky and the arguments in the formatting strings are C strings (%s) and in the order we expect (see va_arg calls);
  • This bug consistently reproduces with a crash in older kernels without BTF support due to the error message above, which formats a signed integer (%d);

Note that no variadic arguments also would cause undefined behaviour, as per the C99 spec:

If there is no actual next argument, or if type is not compatible with the type of the actual next argument (as promoted according to the default argument promotions), the behavior is undefined, except for the following cases:

Possible solutions

  1. Removing the printing callback override to simplify this code but this will be a regression in the log filtering behaviour (see aquasecurity/tracee#1676 and #214);
  2. Formatting the string using a string formatted from libc. Easier and more reliable than the solution below, but will need a buffer to write to;
  3. Ensure correct handling of the variadic arguments, which forces us to reimplement part of i.e. printf. I have a WIP patch that implements this solution, still a WIP, but happy to discuss over the next couple of days if this is something you would consider.

It looks like this issue can be fixed via #270

@mozillazg oh, I should have checked the PRs more carefully!

@javierhonduco thank you for bringing a full analysis. We hope to definitely fix this via #270.

This was solved via #273.