Enet4 / faiss-rs

Rust language bindings for Faiss

Memory leak detected.

jtong11 opened this issue

Hello,

I am using crate version 0.10.0 for our embedding recall and have run into two problems:
a. IndexImpl does not implement ConcurrentIndex, so the IVFScalarQuantizerIndexImpl returned by into_ivf_scalar_quantizer produces a build error;
b. search leaks memory without any upper bound, and so does read_index: the index is dropped, but its resident (RES) memory is never released.
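Point b hinges on whether dropping the Rust value actually frees the native object. As a minimal sketch (the `NativeIndex` type and the live-handle counter are hypothetical, not faiss-rs code), a binding typically ties the native free call to `Drop`, so every dropped wrapper releases its allocation exactly once:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Number of simulated native allocations that are still alive.
static LIVE: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical wrapper around a native index handle. A real binding would
/// store the pointer returned by the C API and call its free function in Drop.
struct NativeIndex {
    ptr: *mut Vec<f32>,
}

impl NativeIndex {
    fn new(dims: usize) -> Self {
        LIVE.fetch_add(1, Ordering::SeqCst);
        // Simulate a native allocation by turning a Box into a raw pointer.
        let ptr = Box::into_raw(Box::new(vec![0.0f32; dims]));
        NativeIndex { ptr }
    }
}

impl Drop for NativeIndex {
    fn drop(&mut self) {
        // Reclaim the allocation exactly once; this is where a binding
        // would call the C API's free function.
        unsafe {
            drop(Box::from_raw(self.ptr));
        }
        LIVE.fetch_sub(1, Ordering::SeqCst);
    }
}

/// How many wrappers are currently alive.
fn live_handles() -> usize {
    LIVE.load(Ordering::SeqCst)
}

fn main() {
    for _ in 0..1000 {
        // Each wrapper is dropped at the end of its loop iteration.
        let _index = NativeIndex::new(64);
    }
    println!("live handles after loop: {}", live_handles());
}
```

If a wrapper's `Drop` is sound like this but RES still grows, the leak is more likely inside the native library itself. Note also that memory allocators do not always return freed pages to the operating system, so RES alone can overstate a leak.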

What could be causing this? Has anyone else run into it?

Thanks

Hello. Could you please provide a few examples that reproduce the reported problems? That would help us better understand the underlying issues with the bindings.

a. IndexImpl does not implement ConcurrentIndex, so the IVFScalarQuantizerIndexImpl returned by into_ivf_scalar_quantizer produces a build error;

IndexImpl should definitely not implement ConcurrentIndex, because for an arbitrary underlying index it is unknown whether shared access is safe. At best, the IVF scalar quantizer index type may be missing a proper ConcurrentIndex integration.
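The reasoning can be sketched with simplified, hypothetical traits that roughly mirror the split between searching through `&mut self` and searching through `&self` (these are not the faiss-rs definitions, which have richer signatures): a concrete index whose search is known to be read-only can opt in to a shared-access trait, while a type-erased wrapper, like `IndexImpl`, whose underlying type is unknown, can only offer exclusive access.

```rust
use std::sync::Arc;
use std::thread;

// Hypothetical traits for illustration only.
trait Index {
    /// Search that may mutate internal state, so it needs exclusive access.
    fn search(&mut self, query: &[f32]) -> usize;
}

/// Only implemented by index types known to be safe to search via `&self`.
trait ConcurrentIndex: Index + Sync {
    fn search_shared(&self, query: &[f32]) -> usize;
}

/// Brute-force inner-product index over fixed vectors; search never mutates it.
struct FlatLike {
    dims: usize,
    data: Vec<f32>,
}

impl FlatLike {
    /// Position of the stored vector with the largest inner product.
    fn nearest(&self, query: &[f32]) -> usize {
        self.data
            .chunks(self.dims)
            .map(|v| v.iter().zip(query).map(|(a, b)| a * b).sum::<f32>())
            .enumerate()
            .fold((0, f32::NEG_INFINITY), |best, (i, s)| if s > best.1 { (i, s) } else { best })
            .0
    }
}

impl Index for FlatLike {
    fn search(&mut self, query: &[f32]) -> usize {
        self.nearest(query)
    }
}

impl ConcurrentIndex for FlatLike {
    fn search_shared(&self, query: &[f32]) -> usize {
        self.nearest(query)
    }
}

/// Type-erased index: the concrete type is unknown, so only `Index` is offered.
struct AnyIndex {
    inner: Box<dyn Index>,
}

impl Index for AnyIndex {
    fn search(&mut self, query: &[f32]) -> usize {
        self.inner.search(query)
    }
}

fn main() {
    // A ConcurrentIndex can be shared across threads behind an Arc.
    let index = Arc::new(FlatLike { dims: 2, data: vec![1.0, 0.0, 0.0, 1.0] });
    let handles: Vec<_> = (0..4)
        .map(|_| {
            let index = Arc::clone(&index);
            thread::spawn(move || index.search_shared(&[0.0, 2.0]))
        })
        .collect();
    for h in handles {
        println!("{}", h.join().unwrap());
    }

    // The erased wrapper only exposes `search(&mut self)`.
    let mut erased = AnyIndex { inner: Box::new(FlatLike { dims: 2, data: vec![1.0, 0.0, 0.0, 1.0] }) };
    println!("{}", erased.search(&[3.0, 1.0]));
}
```

To share such an erased index between threads anyway, wrapping it in a `Mutex` serializes access, trading throughput for safety.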

Sorry for the late reply, @Enet4.

Here is some example code. It mocks a server, and its memory usage rises rapidly from 2 MB to more than 10 GB without any upper bound.
I built it with rustc 1.58.0-nightly (b426445c6 2021-11-24), and built Faiss "c_api_v1.7.1" with g++ (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0 using "cmake -B build -DFAISS_ENABLE_C_API=ON -DBUILD_SHARED_LIBS=ON -DCMAKE_BUILD_TYPE=Release -DFAISS_ENABLE_GPU=false".

extern crate faiss;
extern crate rand; // rand = "0.3.17"

use faiss::index::ivf_flat::IVFFlatIndexImpl;
use faiss::{read_index, write_index, FlatIndex, Idx, Index};
use rand::{Rng, ThreadRng};
use std::thread::sleep;
use std::time::Duration;

/// Generate a random embedding with components drawn uniformly from [-1, 1).
fn random_embedding(rng: &mut ThreadRng, dims: usize) -> Vec<f32> {
    (0..dims).map(|_| rng.gen_range(-1.0, 1.0)).collect()
}

fn main() {
    let mut rng = rand::thread_rng();

    let dims = 64u32;
    let doc_count = 10000usize;

    // Build and train a small IVF-Flat index once, then persist it to disk.
    {
        let mut ids = Vec::with_capacity(doc_count);
        let mut embeddings = Vec::with_capacity(doc_count * dims as usize);
        for i in 0..doc_count {
            ids.push(Idx::new(i as u64));
            embeddings.extend_from_slice(&random_embedding(&mut rng, dims as usize));
        }

        // Inner-product metric with 10 inverted lists.
        let mut index =
            IVFFlatIndexImpl::new_ip(FlatIndex::new_ip(dims).unwrap(), dims, 10).unwrap();
        index.set_verbose(true);
        index.train(&embeddings).unwrap();
        index.add_with_ids(&embeddings, &ids).unwrap();

        write_index(&index, "/tmp/test.index").unwrap();
    }

    // Simulate a long-running server: reload the index and run 10,000 searches
    // per round. Each `index` is dropped at the end of the round, yet RES keeps growing.
    loop {
        let query = random_embedding(&mut rng, dims as usize);
        let mut index = read_index("/tmp/test.index").unwrap();

        let k = 100;
        for count in 1..=10_000 {
            let result = index.search(&query, k);
            if count % 1000 == 0 {
                println!("{} {:?}", count, result);
            }
        }

        sleep(Duration::from_secs(10));
    }
}
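To confirm that growth like this comes from the loop rather than from one-off allocations, the resident set size can be sampled around each round. A minimal sketch, assuming Linux (it reads the `VmRSS` line of `/proc/self/status`; the helper names are hypothetical), with a placeholder allocation standing in for the `read_index`/`search` work:

```rust
use std::fs;

/// Extract the resident set size in kB from the text of `/proc/self/status`.
/// Linux-specific: the relevant line looks like `VmRSS:     123456 kB`.
fn parse_vm_rss_kb(status: &str) -> Option<u64> {
    status
        .lines()
        .find(|line| line.starts_with("VmRSS:"))
        .and_then(|line| line.split_whitespace().nth(1))
        .and_then(|value| value.parse().ok())
}

/// Current resident memory of this process in kB, or None if unavailable.
fn current_rss_kb() -> Option<u64> {
    fs::read_to_string("/proc/self/status")
        .ok()
        .and_then(|s| parse_vm_rss_kb(&s))
}

fn main() {
    let before = current_rss_kb();
    // Placeholder workload: touch 50 MiB so the pages become resident.
    let buf: Vec<u8> = vec![1; 50 * 1024 * 1024];
    let during = current_rss_kb();
    drop(buf);
    let after = current_rss_kb();
    println!("RSS kB before={:?} during={:?} after={:?}", before, during, after);
}
```

Calling `current_rss_kb()` once per outer `loop` round in the program above, after the inner search loop, would show whether RES keeps climbing across rounds even though each index is dropped.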

This looks like the memory leak in search that was reported upstream:
facebookresearch/faiss#2054

@ava57r @Enet4, it works.
Many thanks.