Extra data copy during write

asomers opened this issue 3 years ago

During a write, fuse3 first copies data from the kernel into userland in Session::dispatch. Then it passes a slice of that buffer to handle_write, which ends up copying the data again into a new Vec. It then passes that data as a slice to Filesystem::write, where it might well be copied again. The same thing happens in setxattr.

Instead, Session::dispatch should read from /dev/fuse using readv into a header-sized buffer and a large data buffer. Then it should pass the data buffer by value to Filesystem::write using a Vec. That would eliminate one data copy, and possibly two, depending on how the file system implements write.
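
A minimal sketch of that split read, using only the standard library. It is not fuse3's code: HEADER_LEN is a hypothetical split point, read_split and write_owned are made-up names, and any std::io::Read stands in for /dev/fuse. It only shows that one vectored read can fill a small header buffer and a separate data Vec, and that the Vec can then be moved to the handler without another copy.

```rust
use std::io::{self, IoSliceMut, Read};

/// Hypothetical split point: number of fixed-size header bytes that precede
/// the write payload. The real value depends on the FUSE ABI structs in use.
const HEADER_LEN: usize = 80;

/// Reads one request with a single vectored read and returns (header, payload).
/// `max_data` is the capacity of the payload buffer (e.g. the negotiated max_write).
fn read_split<R: Read>(
    src: &mut R,
    max_data: usize,
) -> io::Result<([u8; HEADER_LEN], Vec<u8>)> {
    let mut header = [0u8; HEADER_LEN];
    let mut data = vec![0u8; max_data];
    let n = {
        // The reader fills `header` first, then spills the rest into `data`.
        let mut bufs = [IoSliceMut::new(&mut header), IoSliceMut::new(&mut data)];
        src.read_vectored(&mut bufs)?
    };
    if n < HEADER_LEN {
        return Err(io::Error::new(io::ErrorKind::UnexpectedEof, "short request"));
    }
    // Keep only the bytes that were actually read; this does not reallocate.
    data.truncate(n - HEADER_LEN);
    Ok((header, data))
}

/// Stand-in for a Filesystem::write that takes the payload by value.
/// Pushing the Vec moves ownership of its heap buffer; no bytes are copied.
fn write_owned(store: &mut Vec<Vec<u8>>, data: Vec<u8>) -> usize {
    let len = data.len();
    store.push(data);
    len
}
```

With a std::fs::File opened on a Unix device, read_vectored should map to readv, so the same function shape would apply to the real session loop.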

Sherlock Holo · Answer 1 · Tue Mar 05 2024 19:46:35 GMT+0800 (China Standard Time)

Using writev should avoid a memory copy, since we own both the header buffer and the user data (for example, Filesystem::read returns Bytes).
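
As a rough illustration of the writev idea, here is a std-only sketch. send_reply is a hypothetical helper, not a fuse3 function, and any std::io::Write stands in for /dev/fuse. The header and the payload stay in their own buffers and go out in one vectored write, so no temporary concatenated buffer is needed.

```rust
use std::io::{self, IoSlice, Write};

/// Sends a reply as header + payload with one vectored write, without first
/// copying both parts into a single contiguous buffer.
fn send_reply<W: Write>(dst: &mut W, header: &[u8], payload: &[u8]) -> io::Result<()> {
    let bufs = [IoSlice::new(header), IoSlice::new(payload)];
    let n = dst.write_vectored(&bufs)?;
    // Assumption: a reply must go out in a single write call, so a partial
    // write is reported as an error instead of being retried.
    if n != header.len() + payload.len() {
        return Err(io::Error::new(io::ErrorKind::WriteZero, "partial reply write"));
    }
    Ok(())
}
```

The payload argument is a plain slice here; a Bytes value returned by Filesystem::read could be passed as a slice in the same way.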

When reading a FUSE request, we can allocate two buffers, one for the header and the other for the FUSE data. When we receive a write opcode, consider this note on the write size limit:

The max size of write requests from the kernel. The absolute minimum is 4k, FUSE recommends at least 128k, max 16M. The FUSE default is 16M on macOS and 128k on other systems.

The data may be large or small:

Small, such as 4K of data: if we pass the data buffer to Filesystem::write, we need to allocate a new data buffer (of size 16M) to replace it.
Large, such as 15M of data: we pass the data buffer to Filesystem::write and then allocate a new data buffer, but this is no different from the status quo.
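
One possible way to handle both cases, sketched here as an assumption and not something fuse3 does: copy small payloads out and keep the big buffer, and hand the buffer off only for large payloads. HANDOFF_THRESHOLD and take_payload are hypothetical names, and the 64K cutoff is arbitrary.

```rust
/// Hypothetical cutoff: below this size, copying the payload is assumed to be
/// cheaper than giving away the large buffer and allocating a new one.
const HANDOFF_THRESHOLD: usize = 64 * 1024;

/// Takes the first `len` payload bytes out of `buf`, whose length is `max_write`.
/// Returns the payload as an owned Vec and leaves `buf` usable for the next request.
fn take_payload(buf: &mut Vec<u8>, len: usize, max_write: usize) -> Vec<u8> {
    if len < HANDOFF_THRESHOLD {
        // Small write: one small copy, and the large buffer is reused as-is.
        buf[..len].to_vec()
    } else {
        // Large write: move the buffer out without copying the payload and
        // put a freshly allocated buffer in its place.
        let mut owned = std::mem::replace(buf, vec![0u8; max_write]);
        owned.truncate(len);
        owned
    }
}
```

The small case still copies once, but it avoids a max_write-sized allocation per request; the large case matches the hand-off described above.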

Anyway, we can replace read/write with readv/writev first, then find a way to improve the write opcode.

Alan Somers · Answer 2 · Tue Mar 05 2024 22:25:19 GMT+0800 (China Standard Time)

BTW, the maximum size of a write that a file system will receive is given by the max_write field during FUSE_INIT. So it could be much less than 16M.