[DOC] number of results for k is unclear
tenpura-shrimp opened this issue
> `k` | Integer | Optional | The number of results returned by the k-NN search. Default is 10.
It appears that `k` is actually the number of results per shard, not the total, though I'm not completely sure. This was unclear from the description above.
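For context, `k` sits inside the neural query clause, while the number of hits returned for the whole request is controlled by the standard top-level `size` search parameter. The sketch below only builds a request body to show where each knob lives; the helper `build_neural_query`, the field name `passage_embedding`, the query text, and the model ID are hypothetical placeholders, not values from this issue.

```python
import json


def build_neural_query(field, query_text, model_id, k, size):
    """Hypothetical helper that builds a neural query request body.

    k is passed to the k-NN search inside the neural clause; size is the
    top-level search parameter that limits how many hits the request returns.
    """
    return {
        "size": size,
        "query": {
            "neural": {
                field: {
                    "query_text": query_text,
                    "model_id": model_id,
                    "k": k,
                }
            }
        },
    }


body = build_neural_query(
    field="passage_embedding",  # placeholder vector field name
    query_text="wild west",  # placeholder query text
    model_id="<your-model-id>",  # placeholder model ID
    k=10,
    size=5,
)
print(json.dumps(body, indent=2))
```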
Thanks for creating the issue, @tenpura-shrimp! We'll take a look. Can you please post the link to where you found the statement?
Oops, sorry. I got here from the following page: https://opensearch.org/docs/latest/query-dsl/specialized/neural/
In my research, I found this: "The k value refers to the total number of neighbor results returned across all shards for a given query vector, not per shard." @navneet1v - Can you please confirm?
@hdhalter @tenpura-shrimp
Please refer to this highlighted section of the k-NN docs, which provides better information: https://opensearch.org/docs/latest/search-plugins/knn/approximate-knn/#:~:text=k%20is%20the,value%20of%2010%2C000.
The neural search docs, on the other hand, say "The number of results returned by the k-NN search."
I can see where the confusion is coming from.
To clear it up: the behavior differs slightly depending on which underlying k-NN engine you choose.
To simplify:
- For native engines, `k` is the maximum number of documents returned per segment of a shard.
- For the Lucene engine, `k` is the number of documents returned per shard.

In both cases, the total number of results that each shard returns to the coordinator node is capped at `size`. The coordinator then receives up to `size * shards` results, and the final total for the OpenSearch index is capped at `size`.
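The capping described above can be sketched as a small Python model. This is a simplified illustration of the behavior as stated in this comment, not the k-NN plugin's actual implementation: it ignores approximate-search recall, ties, filtering, and pagination. The names `Hit`, `top_n`, `shard_candidates`, and `search`, as well as the toy index, are hypothetical.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    doc_id: str
    score: float


def top_n(hits, n):
    """Return the n highest-scoring hits, best first."""
    return heapq.nlargest(n, hits, key=lambda h: h.score)


def shard_candidates(segments, k, engine):
    """Candidates one shard produces before the size cap (simplified model).

    segments is a list of segments, each a list of Hit objects.
    """
    if engine == "native":
        # Native engines (e.g. nmslib, faiss): up to k neighbors per segment.
        return [h for seg in segments for h in top_n(seg, k)]
    if engine == "lucene":
        # Lucene engine: up to k neighbors for the whole shard.
        return top_n([h for seg in segments for h in seg], k)
    raise ValueError(f"unknown engine: {engine}")


def search(shards, k, size, engine):
    # Each shard sends at most `size` hits to the coordinator node.
    per_shard = [top_n(shard_candidates(s, k, engine), size) for s in shards]
    # The coordinator merges up to size * len(shards) hits and keeps the top `size`.
    merged = [h for hits in per_shard for h in hits]
    return top_n(merged, size)


# Hypothetical toy index: 2 shards, each with 2 segments of 3 documents.
shards = [
    [
        [Hit("a1", 0.90), Hit("a2", 0.80), Hit("a3", 0.70)],
        [Hit("b1", 0.95), Hit("b2", 0.60), Hit("b3", 0.50)],
    ],
    [
        [Hit("c1", 0.85), Hit("c2", 0.40), Hit("c3", 0.30)],
        [Hit("d1", 0.75), Hit("d2", 0.65), Hit("d3", 0.20)],
    ],
]

for engine in ("native", "lucene"):
    hits = search(shards, k=2, size=10, engine=engine)
    print(engine, len(hits), [h.doc_id for h in hits])
```

With `k=2` and `size=10`, this model returns 8 hits for the native engines (2 per segment, 2 segments per shard, 2 shards) and 4 hits for Lucene (2 per shard, 2 shards). `size` only becomes the binding limit once it is smaller than the number of candidates that `k` produces.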
I hope this clarifies.
@navneet1v, to make sure I understand what you're saying:
Let's assume a shard contains 4 shards each contains an index with 4 documents.
- In a native engine, `k` is measured as the maximum number of documents returned per segment, so we would set a maximum of 4 documents per segment of a shard.
- In Lucene, since `k` is the maximum number of documents per shard, a maximum of 16 would make more sense.
Am I understanding this correctly?
> Let's assume a shard contains 4 shards each contains an index with 4 documents.
I think there is a typo. @Naarcha-AWS, can you please fix it so that I understand the exact question here?