xdp-project / xdp-tools

Utilities and example programs for use with XDP

Potential fd leak in xdp_program__is_attached

AdrianJab opened this issue

Hello.
We have a watchdog class (using libxdp) which periodically calls xdp_program__is_attached to check whether our XDP programs are still attached on sockets. The watchdog fires every second, and each run leaves two new open files behind, which we can see with the lsof command.

Example:

data_engi 336030 root    8u  a_inode      0,14        0     12569 [eventpoll]
data_engi 336030 root    9u     sock       0,8      0t0 166896037 protocol: XDP
data_engi 336030 root   10u     sock       0,8      0t0 166896052 protocol: XDP
data_engi 336030 root   11u  a_inode      0,14        0     12569 [eventfd]
data_engi 336030 root   12r      CHR     246,3      0t0       115 /dev/ptp3
data_engi 336030 root   13u  a_inode      0,14        0     12569 [eventfd]
data_engi 336030 root   14u  a_inode      0,14        0     12569 [eventfd]
data_engi 336030 root   15r  a_inode      0,14        0     12569 btf
data_engi 336030 root   16u  a_inode      0,14        0     12569 bpf-map
data_engi 336030 root   17u  a_inode      0,14        0     12569 bpf-prog
data_engi 336030 root   18u  a_inode      0,14        0     12569 bpf-prog
data_engi 336030 root   19u  a_inode      0,14        0     12569 [eventfd]
data_engi 336030 root   20r  a_inode      0,14        0     12569 btf
data_engi 336030 root   21u  a_inode      0,14        0     12569 bpf-map
data_engi 336030 root   22u  a_inode      0,14        0     12569 bpf-prog
data_engi 336030 root   23u  a_inode      0,14        0     12569 bpf-prog
data_engi 336030 root   24u  a_inode      0,14        0     12569 [eventfd]
data_engi 336030 root   25u  a_inode      0,14        0     12569 [eventfd]
data_engi 336030 root   26u  a_inode      0,14        0     12569 bpf-prog
data_engi 336030 root   27u  a_inode      0,14        0     12569 bpf-prog
data_engi 336030 root   28u  a_inode      0,14        0     12569 bpf-prog
data_engi 336030 root   29u  a_inode      0,14        0     12569 bpf-prog
data_engi 336030 root   30u  a_inode      0,14        0     12569 bpf-prog
data_engi 336030 root   31u  a_inode      0,14        0     12569 bpf-prog
data_engi 336030 root   32u  a_inode      0,14        0     12569 bpf-prog

Every call to xdp_program__is_attached creates two new entries named bpf-prog.
The number of file descriptors grows at the same pace, by two per second (watched with ls /proc/$pid/fd/ | wc -l).
In the long run this leads to "Too many open files" errors in our system.
If I remove the watchdog loop, no new files are created. This shows that the problem is not in loading/unloading XDP programs, only in the periodic calls to xdp_program__is_attached.

After running valgrind (valgrind -q --tool=none --track-fds=yes) on a minimal program which only loads an XDP program and calls is_attached, we can see that exactly four files are left unclosed when the program finishes: two from the detach call and two from the is_attached call.

==114843== FILE DESCRIPTORS: 7 open (3 std) at exit.
==114843== Open file descriptor 12:
==114843==    at 0x4CB9A3D: syscall (syscall.S:38)
==114843==    by 0x1590E0: bpf_obj_get_opts (bpf.c:75)
==114843==    by 0x152054: xdp_program__from_pin (libxdp.c:1449)
==114843==    by 0x1543AC: xdp_multiprog__link_pinned_progs (libxdp.c:2307)
==114843==    by 0x15487E: xdp_multiprog__fill_from_fd (libxdp.c:2409)
==114843==    by 0x154A78: xdp_multiprog__from_fd (libxdp.c:2454)
==114843==    by 0x154B79: xdp_multiprog__from_id (libxdp.c:2491)
==114843==    by 0x154E6B: xdp_multiprog__get_from_ifindex (libxdp.c:2585)
==114843==    by 0x15322D: xdp_program__detach_multi (libxdp.c:1904)
==114843==    by 0x1537A8: xdp_program__detach (libxdp.c:2041)
...
==114843== 
==114843== Open file descriptor 4:
==114843==    at 0x4CB9A3D: syscall (syscall.S:38)
==114843==    by 0x159EDF: bpf_prog_get_fd_by_id_opts (bpf.c:75)
==114843==    by 0x154AD6: xdp_multiprog__from_id (libxdp.c:2474)
==114843==    by 0x154E6B: xdp_multiprog__get_from_ifindex (libxdp.c:2585)
==114843==    by 0x15322D: xdp_program__detach_multi (libxdp.c:1904)
==114843==    by 0x1537A8: xdp_program__detach (libxdp.c:2041)
...
==114843==    by 0x11C67F: main (main.cpp:60)
==114843== 
==114843== Open file descriptor 6:
==114843==    at 0x4CB9A3D: syscall (syscall.S:38)
==114843==    by 0x1590E0: bpf_obj_get_opts (bpf.c:75)
==114843==    by 0x152054: xdp_program__from_pin (libxdp.c:1449)
==114843==    by 0x1543AC: xdp_multiprog__link_pinned_progs (libxdp.c:2307)
==114843==    by 0x15487E: xdp_multiprog__fill_from_fd (libxdp.c:2409)
==114843==    by 0x154A78: xdp_multiprog__from_fd (libxdp.c:2454)
==114843==    by 0x154B79: xdp_multiprog__from_id (libxdp.c:2491)
==114843==    by 0x154E6B: xdp_multiprog__get_from_ifindex (libxdp.c:2585)
==114843==    by 0x1503C8: xdp_program__is_attached (libxdp.c:643)
...
==114843==    by 0x11C639: main (main.cpp:58)
==114843== 
==114843== Open file descriptor 3:
==114843==    at 0x4CB9A3D: syscall (syscall.S:38)
==114843==    by 0x159EDF: bpf_prog_get_fd_by_id_opts (bpf.c:75)
==114843==    by 0x154AD6: xdp_multiprog__from_id (libxdp.c:2474)
==114843==    by 0x154E6B: xdp_multiprog__get_from_ifindex (libxdp.c:2585)
==114843==    by 0x1503C8: xdp_program__is_attached (libxdp.c:643)
...
==114843==    by 0x11C639: main (main.cpp:58)
==114843== 
==114843==

Hi all,
I worked together with @AdrianJab

After some investigation, it turns out that there is a problem with how the FD is duplicated:

static int xdp_program__fill_from_fd(struct xdp_program *xdp_prog, int fd)
{
	struct bpf_prog_info info = {};
	__u32 len = sizeof(info);
	struct btf *btf = NULL;
	int err = 0, prog_fd;

	if (!xdp_prog)
		return -EINVAL;

	/* Duplicate the descriptor, as we take ownership of the fd below */
	prog_fd = fcntl(fd, F_DUPFD_CLOEXEC, MIN_FD);

https://github.com/xdp-project/xdp-tools/blob/00ff5bf76f28f518acbd40a7aed7606ff21dc364/lib/libxdp/libxdp.c#L1354C12-L1354C17

In this function we duplicate the descriptor, but the descriptor that was duplicated (the original fd) is never closed anywhere, and we lose the handle to it because we start using the duplicate. I don't know yet how to fix it, because in other cases, such as the function xdp_program__clone, that is the expected behavior.

I proposed a pull request here -> #345