johannesvollmer / exrs

100% Safe Rust OpenEXR file library

Soft allocation limits increase peak memory usage, are all-around harmful

Shnatsel opened this issue

In the io::read_into_vec function it's possible to specify a "soft limit" on allocations, so that the vector is only grown in increments of that size:

exrs/src/io.rs

Lines 309 to 317 in 0fd36ce

```rust
// do not allocate more than $chunks memory at once
// (most of the time, this loop will run only once)
while data.len() < end {
    let chunk_start = data.len();
    let chunk_end = (chunk_start + soft_max).min(data_size);
    data.resize(chunk_end, Self::default());
    Self::read_slice(read, &mut data[chunk_start .. chunk_end])?; // safe because of `min(data_size)`
}
```

This does not do what it advertises.

When a Vec is reallocated, it needs to first allocate the bigger storage, then copy over the data from the old allocation to the new one, and finally drop the old allocation. So the peak memory usage is old_capacity + new_capacity.

If the soft limit kicks in and the loop executes more than once, the Vec is first reallocated with enough capacity to hold data.len() + soft_limit (copying all the existing data), and is then reallocated again to hold data.len() + soft_limit * 2, so during that second reallocation roughly (data.len() + soft_limit) + (data.len() + soft_limit * 2) bytes are live at once.

The memory usage also gets worse with every iteration of the loop, and it incurs an expensive memory copy every time.
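To make that concrete, here is a small stand-alone sketch (not exrs code; the 1 MiB soft limit and 8 MiB target are made-up values) that grows a Vec in chunks the same way and prints its length, capacity and pointer after every step. Each time the pointer changes, the old and the new allocation were briefly alive at the same time and the existing contents were copied; since Vec grows its capacity amortized, not every step reallocates, but the peak still climbs with the vector's size.

```rust
// Stand-alone demo of the chunked-resize pattern (not exrs code).
fn main() {
    const SOFT_MAX: usize = 1 << 20;   // 1 MiB per step (made-up value)
    const DATA_SIZE: usize = 8 << 20;  // 8 MiB target size (made-up value)

    let mut data: Vec<u8> = Vec::new();
    while data.len() < DATA_SIZE {
        let chunk_end = (data.len() + SOFT_MAX).min(DATA_SIZE);
        data.resize(chunk_end, 0);
        // A changed pointer means the Vec was moved to a new allocation,
        // i.e. the old buffer was copied into a bigger one and then freed.
        println!("len = {:>9}, capacity = {:>9}, ptr = {:p}",
                 data.len(), data.capacity(), data.as_ptr());
    }
}
```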

Instead of looping with a limit, the Vec should be set to the appropriate capacity and length up front, saving memory and the extra copies.
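For comparison, a minimal sketch of the up-front approach (a hypothetical free function, not the exrs API; `hard_max` stands in for whatever hard validation limit the caller wants to enforce):

```rust
use std::io::{Error, ErrorKind, Read, Result};

// Sketch of the up-front approach: validate the declared size against a hard
// limit, then allocate once and read directly into the final buffer.
fn read_bytes_up_front(read: &mut impl Read, data_size: usize, hard_max: usize) -> Result<Vec<u8>> {
    if data_size > hard_max {
        return Err(Error::new(ErrorKind::InvalidData, "declared size exceeds allocation limit"));
    }

    let mut data = vec![0u8; data_size]; // single allocation, no intermediate copies
    read.read_exact(&mut data)?;
    Ok(data)
}
```

The trade-off is that the full data_size is allocated before any bytes are read, which is exactly the concern raised below for files that declare absurd sizes.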

Okay, I see that they prevent files with very large declared block sizes from OOMing the system.

The memory wouldn't actually be provisioned until it is written to, but very large allocation requests can cause issues all on their own, even if the data is never actually written.

As long as the allocation limit is much greater than the typical block size, this is fine.

yes :) when calling the function, the soft max value should be very large, so large that it only triggers for an invalid file or for absurdly large files. The resizing algorithm can perhaps be improved nonetheless, though :)

did you find a file that triggers this loop? because in that case, we should definitely increase the soft limit in the appropriate calls

No, I have not seen such files, but then again I haven't really looked.

let's insert a panic where the soft limit would kick in, and then run the tests, just to be sure.
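A rough sketch of what that temporary check could look like, mirroring the loop quoted above (not a real patch; the assertion and its message are placeholders):

```rust
while data.len() < end {
    let chunk_start = data.len();
    let chunk_end = (chunk_start + soft_max).min(data_size);

    // temporary check: if the chunk was clamped by `soft_max`,
    // the soft limit kicked in for this file
    assert!(
        chunk_end == data_size,
        "soft allocation limit kicked in ({} of {} bytes)", chunk_end, data_size
    );

    data.resize(chunk_end, Self::default());
    Self::read_slice(read, &mut data[chunk_start .. chunk_end])?; // safe because of `min(data_size)`
}
```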

the images in the repository are small, which means it probably won't kick in. if it does kick in for those small images, it's a really badly chosen limit