terminusdb / terminusdb-store

a tokio-enabled data store for triple data

MemoryBackedStore cloned on every FileLoad::map

spl opened this issue

Consider this snippet of the implementation of the FileLoad trait for MemoryBackedStore (src/storage/memory.rs):

pub struct MemoryBackedStore {
    vec: Arc<sync::RwLock<Vec<u8>>>,
}
impl FileLoad for MemoryBackedStore {
    // ...
    fn map(&self) -> Box<dyn Future<Item = SharedVec, Error = std::io::Error> + Send> {
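        // Cheap: clones only the Arc handle so the closure can own it.
        let vec = self.vec.clone();
        Box::new(future::lazy(move || {
            // Expensive: the read guard derefs to Vec<u8>, so this clone copies
            // the entire buffer into a fresh allocation on every call to map.
            future::ok(SharedVec(Arc::new(vec.read().unwrap().clone())))
        }))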
    }
}

In the expression vec.read().unwrap().clone() here, if my understanding is correct, it appears that the Vec<u8> underlying the RwLock is being cloned.
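As a quick check of that reading, here is a minimal std-only sketch. The `snapshot` function is hypothetical and only mirrors the shape of the expression in `map`; it does not use the crate's `SharedVec` or futures. The read guard is not itself `Clone`, so `.clone()` auto-derefs to `Vec<u8>::clone` and copies the bytes.

```rust
use std::sync::{Arc, RwLock};

/// Hypothetical helper mirroring `vec.read().unwrap().clone()` from `map`.
/// `RwLockReadGuard` does not implement `Clone`, so the call resolves to
/// `Vec<u8>::clone` and allocates a full copy of the buffer.
pub fn snapshot(shared: &Arc<RwLock<Vec<u8>>>) -> Arc<Vec<u8>> {
    Arc::new(shared.read().unwrap().clone())
}
```

Two calls to `snapshot` on the same store yield two separate allocations with equal contents, and a later write to the shared vector does not show up in snapshots taken earlier.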

I understand that MemoryBackedStore may be primarily intended for testing and, therefore, for relatively small vectors. However, that may not always be the case, and I'd guess that a clone like this could result in excessive memory usage and possibly even a surprise out-of-memory error. (Then again, I could be blowing things out of proportion, pun intended! 💥)

Would it be a good idea to avoid the .clone() here?

It is probably a good idea, yes. While it was indeed primarily intended for tests, we are actually using the memory store in terminusdb right now, and I'm sure there are many other use cases.

The idea behind the extra clone here is that you get a snapshot view of whatever the contents of the in-memory 'file' are at the time this future resolves. There are probably better ways of doing this, though.

Ah, thanks, it's good to know that the clone was intended for a snapshot. Is there an important reason not to replace it with an active view of the data?

It shouldn't be important. The data should not actually change, since all files are write-once, but this is not properly enforced by the types.
For the file backend, we actually give out mmap'ed buffers, which could in theory change if the underlying file changes (although I'm seriously considering replacing that with a version that reads all data into memory instead). For the current MemoryBackedStore implementation, however, we can't give out the RwLock-wrapped vec directly, since it doesn't implement AsRef<[u8]>.

I don't know the order of operations. Does the MemoryBackedStore need to always be backed by a RwLock? Is the data being written more than once, or is it written once and then read multiple times without being written to again? If the latter is true, then can the RwLock be consumed after writing?
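One possible direction, sketched under the assumption that a file really is written once and then only read: keep the finished bytes behind an inner `Arc`, so that mapping clones a pointer instead of the buffer. All names below (`SharedBytes`, `FrozenMemoryStore`, `finish_write`) are hypothetical and not part of terminusdb-store, and the sketch leaves out the futures and the `FileLoad` trait entirely.

```rust
use std::sync::{Arc, RwLock};

/// Hypothetical stand-in for the crate's `SharedVec`: a read-only byte
/// buffer that is cheap to clone because it only holds an `Arc`.
#[derive(Clone)]
pub struct SharedBytes(Arc<Vec<u8>>);

impl AsRef<[u8]> for SharedBytes {
    fn as_ref(&self) -> &[u8] {
        &self.0
    }
}

/// Hypothetical in-memory store. The lock guards only the pointer to the
/// current buffer, never the bytes themselves.
#[derive(Clone, Default)]
pub struct FrozenMemoryStore {
    contents: Arc<RwLock<Arc<Vec<u8>>>>,
}

impl FrozenMemoryStore {
    pub fn new() -> Self {
        Self::default()
    }

    /// Publish a finished buffer by swapping the inner `Arc`.
    /// Existing maps keep pointing at the buffer they were given.
    pub fn finish_write(&self, data: Vec<u8>) {
        *self.contents.write().unwrap() = Arc::new(data);
    }

    /// Hand out a view of the current contents. Only the `Arc` is cloned
    /// (a reference-count increment); no bytes are copied.
    pub fn map(&self) -> SharedBytes {
        let guard = self.contents.read().unwrap();
        SharedBytes(Arc::clone(&*guard))
    }
}
```

Because a write replaces the inner `Arc` rather than mutating the vector in place, a `SharedBytes` obtained earlier still behaves like a snapshot, which is the property the original clone was meant to provide. The wrapper also implements `AsRef<[u8]>`, which the lock-wrapped vector could not.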