Question around `vector_query` param

lyra-white opened this issue 4 months ago

Not really a bug, but more of a question.

curl 'http://localhost:8108/multi_search' \
  -X POST \
  -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
  -d '{
        "searches": [
          {
            "collection": "docs",
            "q": "white shirt",
            "query_by": "vec, non_embedding_field",
            "vector_query": "vec:([], k: 1000)"
          }
        ]
      }'

What is the intended behavior of the above query? Does k=1000 mean I will always get 1000 results?
How does the above query work with the built-in embedding models? In the examples, embeddings are sometimes passed in the vector_query, like vec:([0.96826, 0.94, 0.39557, 0.306488]). If I use Typesense to compute the embeddings, I won't have a vector to pass here. So do I need to perform two queries, the first to get the embeddings and the second to use them? Or will Typesense figure out that this field can be filled in internally, and plug in the computed query embedding for white shirt at runtime?
Are the vector_query params documented anywhere else? I saw some threads about tuning alpha, etc. Where is the complete documentation?

Kishore Nallan · Answer 1 · Sat Apr 06 2024 20:02:42 GMT+0800 (China Standard Time)

Yes, k means you are looking for the top k nearest embeddings to the embedding of the query white shirt.
If you use built-in embedding, you don't have to pass the vector explicitly. You can pass an empty [] vector if you want to control parameters like k. By default, k is set to the value of the per_page parameter.
We don't have a single table listing the parameters. The params supported at the moment are: flat_search_cutoff, distance_threshold, alpha, k, ef. There will be sections with examples of these parameters on this page.
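
To make the empty-vector form concrete, here is a minimal sketch of how the vector_query string and request body from the question could be assembled. The helper names build_vector_query and build_search_body are hypothetical, the alpha value is only illustrative, and the sketch assumes vec is an auto-embedding field so that Typesense embeds q itself.

```shell
# Hypothetical helper: builds a vector_query string with an empty [] vector,
# followed by any "name: value" parameters (k, alpha, ef, ...).
build_vector_query() {
  field="$1"; shift
  params=""
  for p in "$@"; do
    params="${params}, ${p}"
  done
  printf '%s:([]%s)' "$field" "$params"
}

# Hypothetical helper: wraps one search in a multi-search style JSON body.
# No JSON escaping is done, so values must not contain double quotes.
build_search_body() {
  collection="$1"; q="$2"; query_by="$3"; vector_query="$4"
  printf '{"searches":[{"collection":"%s","q":"%s","query_by":"%s","vector_query":"%s"}]}' \
    "$collection" "$q" "$query_by" "$vector_query"
}

# Empty vector plus explicit parameters; the values are illustrative only.
vq="$(build_vector_query vec 'k: 1000' 'alpha: 0.8')"
body="$(build_search_body docs 'white shirt' 'vec,non_embedding_field' "$vq")"
echo "$body"
```

The resulting body can be passed to the curl command from the question with -d "$body". Leaving out the extra parameters yields a bare empty vector, in which case k falls back to per_page as described above.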