vector embeddings are not stored in weaviate
rigolepe opened this issue
Hi,
I tried your example in integrations/llamaindex and it seems to work, given that a response is formulated from the query. However, I have some trouble understanding what's happening under the hood, because I see some unexpected behavior:
- When I list all the documents in the Weaviate DB using http://localhost:8080/v1/objects?class=BlogPost, it lists the content, but there is no vector property containing the embeddings. However, my OpenAI API usage breakdown lists:
  - text-embedding-ada-002-v2, 4 requests, 18,266 prompt + 0 completion = 18,266 tokens
  - (I used other markdown files than the example, so the actual numbers may be different when using the blog posts)
- When I use the index as a simple retriever via index.as_retriever() and then call retriever.retrieve("<some query>"), I get results, but the listed score is None, which implies that no distance function was used. This may be consistent with there being no vectors stored in Weaviate. So under the hood, was some other approximation used where we would expect vector-based proximity?
How can I change your sample code in integrations/llamaindex to:
- actually store the OpenAI ada v2 embeddings in weaviate?
- actually use these embeddings when retrieving/querying?
Thank you,
Peter
Hi Peter,
Thanks so much for your question!
Question 1:
The output is only going to show the response, but there is a way for you to see the vectors. If you head over to WCS and connect to your external cluster (put in http://localhost:8080 as the cluster URL), you can output the vectors with this:
{
  Get {
    BlogPost {
      content
      _additional {
        vector
      }
    }
  }
}
You can run any query and include _additional with vector to output the embedding of each object.
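The same query can also be sent from a script instead of the WCS console. Below is a minimal sketch, assuming Weaviate is reachable at http://localhost:8080 and serves GraphQL at /v1/graphql. The helper names build_vector_query, extract_vectors and fetch_vectors are made up for illustration and are not part of any client library.

```python
import json
import urllib.request

WEAVIATE_URL = "http://localhost:8080"  # assumption: the local instance from the example


def build_vector_query(class_name: str, limit: int = 5) -> str:
    """Return a GraphQL Get query that also asks for each object's vector."""
    return (
        "{ Get { %s(limit: %d) { content _additional { vector } } } }"
        % (class_name, limit)
    )


def extract_vectors(response: dict, class_name: str) -> list:
    """Pull the embedding lists out of a GraphQL response body."""
    objects = response["data"]["Get"][class_name]
    return [obj["_additional"]["vector"] for obj in objects]


def fetch_vectors(class_name: str, limit: int = 5) -> list:
    """POST the query to Weaviate and return the stored vectors.

    Needs a running Weaviate instance at WEAVIATE_URL.
    """
    payload = json.dumps({"query": build_vector_query(class_name, limit)}).encode("utf-8")
    request = urllib.request.Request(
        WEAVIATE_URL + "/v1/graphql",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as reply:
        return extract_vectors(json.load(reply), class_name)
```

Calling fetch_vectors("BlogPost") should then return one list of floats per stored object. As far as I know, the REST endpoint from the question omits vectors by default and can return them too when include=vector is added to the query string.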
Question 2:
If you simply want to search based on the embeddings, then one way in Weaviate is to use nearVector
https://weaviate.io/developers/weaviate/api/graphql/vector-search-parameters#nearvector
I need to look into index.as_retriever(); I'm not familiar with it, as it's LlamaIndex functionality. I can take a look at it if you aren't happy with the above workaround!
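For reference, the nearVector workaround could look roughly like the sketch below. It only builds the GraphQL query and reads the distances back out of a response. It assumes the query text has already been embedded with the same model as the stored objects (that step is not shown), and the function names build_near_vector_query and scores_from_response are hypothetical.

```python
import json


def build_near_vector_query(class_name: str, vector: list, limit: int = 3) -> str:
    """Return a GraphQL query that ranks objects by distance to `vector`."""
    return (
        "{ Get { %s(nearVector: {vector: %s}, limit: %d) "
        "{ content _additional { distance } } } }"
        % (class_name, json.dumps(vector), limit)
    )


def scores_from_response(response: dict, class_name: str) -> list:
    """Return (content, distance) pairs from a nearVector response body."""
    return [
        (obj["content"], obj["_additional"]["distance"])
        for obj in response["data"]["Get"][class_name]
    ]
```

Asking for _additional { distance } is what makes the score visible: if Weaviate returns a distance per object, the search really was vector based.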
Hi Erika,
Thanks for your quick reply. I must say that the weaviate tooling looks really good! The WCS showed the vectors as you suggested, which reassures me the integration with the OpenAI embeddings is working as expected:
...
{
  "_additional": {
    "vector": [
      0.007973205,
      -0.019565739,
      0.003659394,
      -0.02828685,
      -0.011258647,
      ...
    ]
  }
}
...
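A quick way to make that check less manual is to verify the vector length of every returned object. This is only a sketch: it assumes the objects have the shape shown above, and it relies on text-embedding-ada-002 producing 1536-dimensional vectors. The name check_vectors is made up for illustration.

```python
EXPECTED_DIM = 1536  # text-embedding-ada-002 returns 1536-dimensional vectors


def check_vectors(objects: list, expected_dim: int = EXPECTED_DIM) -> list:
    """Return the indices of objects whose vector is missing or has the wrong length."""
    bad = []
    for i, obj in enumerate(objects):
        vector = obj.get("_additional", {}).get("vector")
        if not vector or len(vector) != expected_dim:
            bad.append(i)
    return bad
```

An empty result means every object carries an embedding of the expected size.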
I will have a deeper look into the codebase myself to understand the second question based on your feedback. I will add it to this thread later if I find a solution using llama_index.
Thank you,
Peter
No problem! 🙂