The choice of Faiss index
jcyk opened this issue
Hi, thanks for open-sourcing the project. Great work!
I have a few questions about the choice of Faiss index and would really appreciate it if you could find time to clarify:
- Could you please share the detailed procedure you used to index Wikipedia?
- Is IVF1048576_HNSW32_SQ8, searched with nprobe=64, an accurate summary of your choice?
- In open/build_index.py there is a function named merge_indexes. Did you build multiple sub-indexes and then merge them, or not? I ask because I suspect this choice may affect performance.
- To make Q1 more specific: the index-building process in your code seems quite complicated. By default, it goes through
  https://github.com/uwnlp/denspi/blob/f540b6a547f012823fc6c2bb10077df6bccc13a6/open/run_index.py#L121
  https://github.com/uwnlp/denspi/blob/f540b6a547f012823fc6c2bb10077df6bccc13a6/open/run_index.py#L126-L131
  https://github.com/uwnlp/denspi/blob/f540b6a547f012823fc6c2bb10077df6bccc13a6/open/run_index.py#L134-L137
  then
  https://github.com/uwnlp/denspi/blob/f540b6a547f012823fc6c2bb10077df6bccc13a6/open/run_index.py#L148
  https://github.com/uwnlp/denspi/blob/f540b6a547f012823fc6c2bb10077df6bccc13a6/open/run_index.py#L164
Can the following code achieve the same result?
```python
import faiss
# d: vector dimension; data: float32 training vectors of shape (n, d)
index = faiss.index_factory(d, "IVF1048576_HNSW32,SQ8")
index.train(data)
```
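To make the merge question concrete, here is a toy, pure-Python model of my current understanding. It is my own simplification, not denspi's code or Faiss's implementation: a plain IVF index over squared L2 with a brute-force coarse quantizer, so it leaves out the HNSW32 coarse quantizer and the SQ8 encoding entirely. The helper names (add, merge, search) and the random "training" are hypothetical. It shows what nprobe controls (how many inverted lists are scanned per query) and that, in this toy, adding shards with the same trained centroids and then merging their inverted lists gives exactly the same results as adding everything to one index.

```python
import random

# Toy IVF (inverted file) index, squared-L2 distance.
# Hypothetical simplification: no HNSW coarse quantizer, no SQ8 encoding.

def sqdist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest_lists(centroids, v, n):
    """Ids of the n centroids closest to v (ties broken by centroid id)."""
    order = sorted(range(len(centroids)), key=lambda i: (sqdist(centroids[i], v), i))
    return order[:n]

def add(centroids, vectors, ids):
    """Assign each vector to the inverted list of its nearest centroid."""
    lists = {}
    for vid, v in zip(ids, vectors):
        lid = nearest_lists(centroids, v, 1)[0]
        lists.setdefault(lid, []).append((vid, v))
    return lists

def merge(shards):
    """Concatenate per-shard inverted lists; assumes all shards used the SAME centroids."""
    merged = {}
    for lists in shards:
        for lid, entries in lists.items():
            merged.setdefault(lid, []).extend(entries)
    return merged

def search(lists, centroids, q, nprobe, k):
    """Scan only the nprobe lists closest to q and return the top-k vector ids."""
    cands = [(sqdist(v, q), vid)
             for lid in nearest_lists(centroids, q, nprobe)
             for vid, v in lists.get(lid, [])]
    return [vid for _, vid in sorted(cands)[:k]]

random.seed(0)
data = [[random.random() for _ in range(4)] for _ in range(200)]
ids = list(range(len(data)))
# Stand-in for k-means training: sample 8 data points as centroids.
centroids = random.sample(data, 8)

single = add(centroids, data, ids)
merged = merge([add(centroids, data[:100], ids[:100]),
                add(centroids, data[100:], ids[100:])])

q = [0.5, 0.5, 0.5, 0.5]
print(search(single, centroids, q, nprobe=2, k=5) ==
      search(merged, centroids, q, nprobe=2, k=5))  # True
```

If the sub-indexes were instead each trained with their own centroids, their list ids would not refer to the same cells, so the lists could not simply be concatenated; that difference is what I am trying to pin down in the real pipeline.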
Thanks!