MemoryBackedStore cloned on every FileLoad::map
spl opened this issue · comments
Consider this snippet of the implementation of the FileLoad
trait for MemoryBackedStore
(src/storage/memory.rs
):
pub struct MemoryBackedStore {
vec: Arc<sync::RwLock<Vec<u8>>>,
}
impl FileLoad for MemoryBackedStore {
// ...
fn map(&self) -> Box<dyn Future<Item = SharedVec, Error = std::io::Error> + Send> {
let vec = self.vec.clone();
Box::new(future::lazy(move || {
future::ok(SharedVec(Arc::new(vec.read().unwrap().clone())))
}))
}
}
In the expression vec.read().unwrap().clone()
here, if my understanding is correct, it appears that the Vec<u8>
underlying the RwLock
is being cloned.
I understand that MemoryBackedStore
may be primarily intended for testing and, therefore, for relatively small vectors. However, that may not always be the case, and I'd guess that a clone like this could result in excessive memory usage and possibly even a surprise out-of-memory error. (Then again, I could be blowing things out of proportion, pun intended! 💥)
Would it be a good idea to avoid the .clone()
here?
It is probably a good idea yes. While it was indeed primarily intended for tests, we are actually using the memory store in terminusdb right now, and I'm sure there's many use cases.
The idea behind the extra clone here is that you get a snapshot view of whatever the contents of the in-memory 'file' is, at the time of resolving this future. There's probably better ways of doing this though.
The idea behind the extra clone here is that you get a snapshot view of whatever the contents of the in-memory 'file' is, at the time of resolving this future. There's probably better ways of doing this though.
Ah, thanks, it's good to know that the clone was intended for a snapshot. Is there an important reason not to replace it with an active view of the data?
It shouldn't be important. The data should not actually change, since all files are write-once, but this is not properly enforced by the types.
For the file backend, we actually give out mmap'ed buffers which could in theory change if the underlying file changes though (although I'm seriously considering replacing that with a version that reads all data into memory instead). But for the current MemoryBackedStore implementation we can't really give out the RwLock-wrapped vec directly though, since it isn't an AsRef<[u8]>.
I don't know the order of operations. Does the MemoryBackedStore
need to always be backed by a RwLock
? Is the data being written more than once, or is it written once and then read multiple times without being written to again? If the latter is true, then can the RwLock
be consumed after writing?