opensearch-project / documentation-website

The documentation for OpenSearch, OpenSearch Dashboards, and their associated plugins.

Home Page: https://opensearch.org/docs

[DOC] number of results for k is unclear

tenpura-shrimp opened this issue

The docs say:

k Integer Optional The number of results returned by the k-NN search. Default is 10.

It appears that k is actually the number of results per shard not total, though I'm not totally sure. This seemed unclear from earlier

Thanks for creating the issue, @tenpura-shrimp ! We'll take a look. Can you please post the link to where you found the statement?

oops, sorry, I got here from the following page: https://opensearch.org/docs/latest/query-dsl/specialized/neural/

In my research, I found this: "The k value refers to the total number of neighbor results returned across all shards for a given query vector, not per shard." @navneet1v - Can you please confirm?

@hdhalter @tenpura-shrimp
Please refer to this highlighted section of the k-NN docs, which provides better info: https://opensearch.org/docs/latest/search-plugins/knn/approximate-knn/#:~:text=k%20is%20the,value%20of%2010%2C000.

Now, the neural search docs say "The number of results returned by the k-NN search", so I can understand where the confusion is coming from.

Just to clear up the confusion: the behavior differs slightly depending on which underlying k-NN engine you choose.

To simplify: for native engines, k is the max number of documents that will be returned per segment of a shard.
For the Lucene engine, k is the number of docs returned per shard.

In both cases, the total number of results returned to the coordinator node from each shard is capped to size. Then, for the OpenSearch index as a whole, the final number of results is capped to size, out of up to size * shards results.

I hope this clarifies.
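
To make the description above concrete, here is a minimal Python sketch of the capping behavior as described in this comment. It is a toy model under stated assumptions, not OpenSearch code: the names `make_index`, `shard_results`, and `search` are hypothetical, an index is just nested lists (shards, then segments, then scored hits), and `k` and `size` are applied exactly as described above (k per segment for native engines, k per shard for Lucene, then a cap of size per shard and again at the coordinator).

```python
import heapq
from typing import List, Tuple

Hit = Tuple[float, str]  # (score, doc_id); a higher score means a closer neighbor


def top_n(hits: List[Hit], n: int) -> List[Hit]:
    """Return the n highest-scoring hits, best first."""
    return heapq.nlargest(n, hits)


def make_index(shards: int, segments_per_shard: int, docs_per_segment: int) -> List[List[List[Hit]]]:
    """Build a toy index (hypothetical helper) with distinct, decreasing scores."""
    index, doc = [], 0
    for _ in range(shards):
        shard = []
        for _ in range(segments_per_shard):
            segment = []
            for _ in range(docs_per_segment):
                segment.append((1.0 / (1 + doc), f"doc{doc}"))
                doc += 1
            shard.append(segment)
        index.append(shard)
    return index


def shard_results(segments: List[List[Hit]], k: int, size: int, engine: str) -> List[Hit]:
    """Hits one shard sends to the coordinator, per the description in this thread."""
    if engine == "native":
        # Assumption from the thread: up to k hits from EACH segment of the shard.
        candidates = [hit for seg in segments for hit in top_n(seg, k)]
    elif engine == "lucene":
        # Assumption from the thread: up to k hits for the shard as a whole.
        candidates = top_n([hit for seg in segments for hit in seg], k)
    else:
        raise ValueError(f"unknown engine: {engine}")
    # In both cases the shard's contribution is capped to size.
    return top_n(candidates, size)


def search(index: List[List[List[Hit]]], k: int, size: int, engine: str) -> List[Hit]:
    """Coordinator step: merge up to size * shards hits and cap the result to size."""
    merged = [hit for shard in index for hit in shard_results(shard, k, size, engine)]
    return top_n(merged, size)


if __name__ == "__main__":
    index = make_index(shards=2, segments_per_shard=2, docs_per_segment=4)
    print(len(search(index, k=2, size=10, engine="native")))  # 8: 2 per segment, 4 segments
    print(len(search(index, k=2, size=10, engine="lucene")))  # 4: 2 per shard, 2 shards
    print(len(search(index, k=3, size=5, engine="native")))   # 5: capped to size
```

In this model, with a large size, the number of hits depends on the engine and on how many segments and shards the index has, which is why "the number of results returned" is easy to misread. With a small size, both engines end up returning size hits.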

To make sure I'm understanding what we're saying @navneet1v:

Let's assume a shard contains 4 shards each contains an index with 4 documents.

  • In a native engine, k is measured as the max number of documents returned per segment, so we would set a max of 4 documents per segment of a shard.
  • In Lucene, since k is the max number of docs per shard, a max of 16 would make more sense.

Am I understanding this correctly?

Let's assume a shard contains 4 shards each contains an index with 4 documents.

I think there is a typo. @Naarcha-AWS, can you please fix it so that I can understand what the exact question is?