kagisearch / vectordb

A minimal Python package for storing and retrieving text using chunking, embeddings, and vector search.

Home Page:https://vectordb.com

ValueError: not enough values to unpack (expected 2, got 1)

trufae opened this issue

I'm using vectordb to index documentation from different sources, but sometimes I get backtraces like this one. x.shape contains only one element, (0,). The issue happens only when nothing has been saved (or when the saved data is too small).

146$ ./venv/bin/python
Python 3.11.6 (main, Oct  2 2023, 13:45:54) [Clang 15.0.0 (clang-1500.0.40.1)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import vectordb
Warning: mprt could not be imported. Install with 'pip install git+https://github.com/vioshyvo/mrpt/'. Falling back to Faiss.
>>> a=vectordb.Memory()
>>> a.search("riscv", top_n=5)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/pancake/prg/r2ai/venv/lib/python3.11/site-packages/vectordb/memory.py", line 68, in search
    indices = self.vector_search.search_vectors(query_embedding, embeddings, top_n)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/pancake/prg/r2ai/venv/lib/python3.11/site-packages/vectordb/vector_search.py", line 60, in search_vectors
    indices = call_search(query_embedding, embeddings, top_n)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/pancake/prg/r2ai/venv/lib/python3.11/site-packages/vectordb/vector_search.py", line 26, in run_faiss
    index.add(vectors)
  File "/Users/pancake/prg/r2ai/venv/lib/python3.11/site-packages/faiss/class_wrappers.py", line 226, in replacement_add
    n, d = x.shape
    ^^^^
ValueError: not enough values to unpack (expected 2, got 1)
>>>
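
The unpacking failure at the bottom of the traceback can be reproduced with NumPy alone. This is a minimal sketch, assuming the stored embeddings reach faiss as an empty 1-D array (which matches the reported x.shape of (0,)); the variable names are illustrative and not taken from vectordb.

```python
import numpy as np

# Assumption: with nothing saved, the embeddings collapse to an empty
# 1-D array of shape (0,) rather than a 2-D array of shape (0, d).
vectors = np.array([], dtype=np.float32)
print(vectors.shape)

try:
    # Same two-name unpacking as the `n, d = x.shape` line in the traceback.
    n, d = vectors.shape
except ValueError as exc:
    message = str(exc)
    print(message)
```

A one-element shape tuple cannot be unpacked into two names, so the error comes from the array's dimensionality, not from the query text.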

In your example you have not indexed any content and you try to search it.

Why should a backtrace happen if that's the case? Also, if I add only one entry of two words, I get the same crash. It seems that it only works when a large text is stored.

Can you post an example showing how to reproduce it?

Sure:

130$ ./venv/bin/python bug.py
Warning: mprt could not be imported. Install with 'pip install git+https://github.com/vioshyvo/mrpt/'. Falling back to Faiss.
Traceback (most recent call last):
  File "/Users/pancake/prg/r2ai/bug.py", line 4, in <module>
    v.search("test")
  File "/Users/pancake/prg/r2ai/venv/lib/python3.11/site-packages/vectordb/memory.py", line 68, in search
    indices = self.vector_search.search_vectors(query_embedding, embeddings, top_n)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/pancake/prg/r2ai/venv/lib/python3.11/site-packages/vectordb/vector_search.py", line 60, in search_vectors
    indices = call_search(query_embedding, embeddings, top_n)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/pancake/prg/r2ai/venv/lib/python3.11/site-packages/vectordb/vector_search.py", line 26, in run_faiss
    index.add(vectors)
  File "/Users/pancake/prg/r2ai/venv/lib/python3.11/site-packages/faiss/class_wrappers.py", line 226, in replacement_add
    n, d = x.shape
    ^^^^
ValueError: not enough values to unpack (expected 2, got 1)
1$ cat bug.py
import vectordb
v = vectordb.Memory()
v.save("",{})
v.search("test")
0$

The same crash happens if no save is done, or even if I put some short text in the save call, like this:

v.save("hello world",{"title": "domination"})

IMHO search should return nothing instead of crashing.

You are probably right, there is a bug if you save an empty string. I invite you to check the code (it is simple) and submit a MR.

I tried to fix the issue, but I don't know the internals very well, so I'm unsure where the proper fix should go, because I don't even know where the bug is. I can use try/except, but that's an ugly workaround. So I would appreciate some guidance here or a fix from your side.

I had some spare time to dig a little into the issue, and I think the cleanest way to fix this crash is to do an early check before going into the search with empty entries ^ please take a look, and feel free to merge. I would love to see a new release out with the fix, as it's a bit annoying for some use cases and I'd prefer not to wrap these calls in try/except.

Thanks!
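
For readers without access to the linked change, an early check of the kind described above might look roughly like the sketch below. It is not the actual patch: `search_or_empty` and `brute_force` are hypothetical names, and `brute_force` is only a NumPy stand-in for whatever backend function (Faiss or MRPT) the library really calls.

```python
import numpy as np


def brute_force(query, vectors, top_n):
    """Stand-in backend (hypothetical): exact nearest neighbours by L2 distance."""
    query = np.asarray(query, dtype=np.float32)
    distances = np.linalg.norm(vectors - query, axis=1)
    return np.argsort(distances)[:top_n].tolist()


def search_or_empty(query_embedding, embeddings, top_n, search_fn=brute_force):
    """Hypothetical guard: return [] instead of crashing on empty input."""
    vectors = np.asarray(embeddings, dtype=np.float32)
    # Nothing stored, or nothing that produced an embedding: skip the backend.
    if vectors.size == 0:
        return []
    # A single embedding may arrive as a 1-D array; reshape it to (1, d).
    if vectors.ndim == 1:
        vectors = vectors.reshape(1, -1)
    # Never request more neighbours than there are stored vectors.
    top_n = min(top_n, vectors.shape[0])
    return search_fn(query_embedding, vectors, top_n)


empty_result = search_or_empty([0.0, 0.0], [], top_n=5)
two_result = search_or_empty([0.0, 0.0], [[1.0, 1.0], [0.1, 0.0]], top_n=5)
single_result = search_or_empty([0.0, 0.0], [3.0, 4.0], top_n=5)
```

Doing the check before the backend call keeps the empty case out of faiss entirely, so callers get an empty result list rather than needing their own try/except.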