meilisearch / meilisearch-rust

Rust wrapper for the Meilisearch API.

Home Page:https://www.meilisearch.com

Sorting order is not correct

Jasperav opened this issue

When querying the database with this SDK, the sorting order differs from the one returned by the REST API:

Index

{
	"uid": "users",
	"primaryKey": "user_id"
}

Settings

{
	"displayedAttributes":["user_id"],
	"rankingRules": [
		"d_karma:desc",
		"typo",
		"proximity",
		"attribute",
		"sort",
		"exactness",
		"words"
	],
	"searchableAttributes": [
		"username",
		"bio"
	],
	"typoTolerance": {
		"minWordSizeForTypos": {
			"oneTypo": 3,
			"twoTypos": 6
		}
	},
	"pagination": {
		"maxTotalHits": 50
	}
}

Documents through REST API
Url: http://127.0.0.1:7700/indexes/users/search
POST request
Body:

{
	"q": "piet"
}

Result:

{
	"hits": [
		{
			"user_id": "a0bacccc-b2a7-4c91-83f9-8dbf2ea42e8d"
		},
		{
			"user_id": "ec5a4e56-bf1a-44e3-aeee-91b28c25eb30"
		},
		{
			"user_id": "858a3a83-af26-48f4-b735-4987f4e2bc19"
		}
	],
	"estimatedTotalHits": 3,
	"query": "piet",
	"limit": 20,
	"offset": 0,
	"processingTimeMs": 0
}

Documents through Rust SDK
Code:

Query::new(&self.index)
    .with_query("piet")
    .execute()
    .await;

Result:

[
    SearchResult {
        result: SearchResultUser {
            user_id: a0baccccb2a74c9183f98dbf2ea42e8d,
        },
        formatted_result: None,
        matches_position: None,
    },
    SearchResult {
        result: SearchResultUser {
            user_id: 858a3a83af2648f4b7354987f4e2bc19,
        },
        formatted_result: None,
        matches_position: None,
    },
    SearchResult {
        result: SearchResultUser {
            user_id: ec5a4e56bf1a44e3aeee91b28c25eb30,
        },
        formatted_result: None,
        matches_position: None,
    },
]

Problem
The last two user_ids are swapped compared to the REST API result.
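
To make the difference concrete, here is a minimal sketch (not part of the SDK) that compares the two hit lists above position by position. The helper names `normalize` and `differing_positions` are hypothetical; hyphens are stripped because the SDK output prints the UUIDs without them.

```rust
/// Strip hyphens and lowercase, so "a0bacccc-b2a7-..." and "a0baccccb2a7..." compare equal.
fn normalize(id: &str) -> String {
    id.chars()
        .filter(|c| *c != '-')
        .collect::<String>()
        .to_lowercase()
}

/// Return the zero-based positions at which the two hit lists disagree.
fn differing_positions(rest: &[&str], sdk: &[&str]) -> Vec<usize> {
    rest.iter()
        .zip(sdk.iter())
        .enumerate()
        .filter(|(_, (a, b))| normalize(a) != normalize(b))
        .map(|(i, _)| i)
        .collect()
}

fn main() {
    // Hits as returned by the REST API in this report.
    let rest = [
        "a0bacccc-b2a7-4c91-83f9-8dbf2ea42e8d",
        "ec5a4e56-bf1a-44e3-aeee-91b28c25eb30",
        "858a3a83-af26-48f4-b735-4987f4e2bc19",
    ];
    // Hits as returned by the Rust SDK in this report.
    let sdk = [
        "a0baccccb2a74c9183f98dbf2ea42e8d",
        "858a3a83af2648f4b7354987f4e2bc19",
        "ec5a4e56bf1a44e3aeee91b28c25eb30",
    ];
    println!("positions that differ: {:?}", differing_positions(&rest, &sdk));
}
```

Only positions 1 and 2 differ, which matches the observation that the last two hits are swapped.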

Raw documents

{
	"results": [
		{
			"username": "6RzDxughVD",
			"bio": "",
			"user_id": "b9fcf6fc-fee1-4e63-8461-72310e175035",
			"d_karma": 0
		},
		{
			"username": "pietjepietnm",
			"bio": "",
			"user_id": "858a3a83-af26-48f4-b735-4987f4e2bc19",
			"d_karma": 5
		},
		{
			"username": "pietpietpiet",
			"bio": "",
			"user_id": "a0bacccc-b2a7-4c91-83f9-8dbf2ea42e8d",
			"d_karma": 12
		},
		{
			"username": "pietpiethenk",
			"bio": "",
			"user_id": "ec5a4e56-bf1a-44e3-aeee-91b28c25eb30",
			"d_karma": 10
		}
	],
	"offset": 0,
	"limit": 20,
	"total": 4
}

Hello @Jasperav! Thanks for raising this issue.
The only explanation I can see for this bug is that the SDK alters the search query parameters before making the request, which should not happen!

I'll have to look into this! I'm going on holiday soon, so it might take a while. If you have time, could you print the query parameters just before the request is sent?

@bidoubiwa Thanks for the quick response. The trace logging is below. As you can see, the query parameters are sent correctly. I would also like to mention that when I have 2 users in Meilisearch (instead of 3), the users are also swapped compared to the REST API. It looks like the last 2 elements are being swapped.

Logging:

2022-09-08T12:42:31.592895Z TRACE send_async{method=POST uri=http://127.0.0.1:7700/indexes/users/search}:handler{id=0}: isahc::wire: >> POST /indexes/users/search HTTP/1.1\r\nHost: 127.0.0.1:7700\r\nAccept: */*\r\nAccept-Encoding: deflate, gzip\r\nauthorization: Bearer MASTER_KEY\r\ncontent-type: application/json\r\nuser-agent: Meilisearch Rust (v0.18.0)\r\nContent-Length: 12\r\n\r\n
2022-09-08T12:42:31.592960Z TRACE send_async{method=POST uri=http://127.0.0.1:7700/indexes/users/search}:handler{id=0}: isahc::wire: >> {\"q\":\"piet\"}
2022-09-08T12:42:31.592981Z DEBUG send_async{method=POST uri=http://127.0.0.1:7700/indexes/users/search}:handler{id=0}: isahc::handler: We are completely uploaded and fine
2022-09-08T12:42:31.593004Z TRACE agent_thread{id=0}: polling::kqueue: add: kqueue_fd=11, fd=16, ev=Event { key: 16, readable: true, writable: false }    
2022-09-08T12:42:31.593031Z TRACE agent_thread{id=0}: polling: Poller::wait(_, Some(1s))    
2022-09-08T12:42:31.593046Z TRACE agent_thread{id=0}: polling::kqueue: wait: kqueue_fd=11, timeout=Some(1s)    
2022-09-08T12:42:31.594915Z TRACE agent_thread{id=0}: polling::kqueue: new events: kqueue_fd=11, res=1    
2022-09-08T12:42:31.594949Z TRACE agent_thread{id=0}: isahc::agent: socket event socket=16 readable=true writable=false
2022-09-08T12:42:31.594988Z DEBUG send_async{method=POST uri=http://127.0.0.1:7700/indexes/users/search}:handler{id=0}: isahc::handler: Mark bundle as not supporting multiuse
2022-09-08T12:42:31.595010Z TRACE send_async{method=POST uri=http://127.0.0.1:7700/indexes/users/search}:handler{id=0}: isahc::wire: << HTTP/1.1 200 OK\r\n
2022-09-08T12:42:31.595050Z TRACE send_async{method=POST uri=http://127.0.0.1:7700/indexes/users/search}:handler{id=0}: isahc::wire: << transfer-encoding: chunked\r\n
2022-09-08T12:42:31.595095Z TRACE send_async{method=POST uri=http://127.0.0.1:7700/indexes/users/search}:handler{id=0}: isahc::wire: << content-type: application/json\r\n
2022-09-08T12:42:31.595130Z TRACE send_async{method=POST uri=http://127.0.0.1:7700/indexes/users/search}:handler{id=0}: isahc::wire: << vary: accept-encoding\r\n
2022-09-08T12:42:31.595164Z TRACE send_async{method=POST uri=http://127.0.0.1:7700/indexes/users/search}:handler{id=0}: isahc::wire: << content-encoding: deflate\r\n
2022-09-08T12:42:31.595196Z TRACE send_async{method=POST uri=http://127.0.0.1:7700/indexes/users/search}:handler{id=0}: isahc::wire: << access-control-allow-origin: *\r\n
2022-09-08T12:42:31.595229Z TRACE send_async{method=POST uri=http://127.0.0.1:7700/indexes/users/search}:handler{id=0}: isahc::wire: << date: Thu, 08 Sep 2022 12:42:31 GMT\r\n
2022-09-08T12:42:31.595269Z TRACE send_async{method=POST uri=http://127.0.0.1:7700/indexes/users/search}:handler{id=0}: isahc::wire: << \r\n
2022-09-08T12:42:31.595287Z TRACE send_async{method=POST uri=http://127.0.0.1:7700/indexes/users/search}:handler{id=0}: isahc::wire: << BC\r\nx\x01]\x8fAj\xc30\x10E\xef2k\r\xc8\xd2H\x95|\x82n\xba\xcb\xae\x942\x96F\xad \x8e\x13KY\x84\xe0\xbb\xc7d\x15\xb2\xfb|x\xf0\xde\x1d\xfeko0~\xdf\xe1\xdad\xfd\xad\x19F\x90\xe0\x86\xe8$a\xf4\xc2HI\x1c\x86\x92#\x92\r\x89%\xfbX\x12\xc3\xa6^\x11\xad\x8b\xb6\xa5\x10\x92\xd7\x13\x92\x99\"\x06\x1f\x05\x89\x92\xf5\x96>8\xdbw\x84\x83u9\x92\xc7\x89\x06\x8d$b0\x86\x811\x19\x1d\x0c\xb1\x18\x172l?\n\xa4\xf5:s\x97|X:\x1f?\x9f\xbeV\xc1\xe5*\xebm\xb7=W\xe9\xa0\xe0X\xe7\xdaa4Z\xc1RJ\xdb\xbfq\x9f\xe7uI\xd2Z=\xfd\x1d\xea,_{\xa9\xde\x1e}\xf8I\xa5\r\n0\r\n\r\n
2022-09-08T12:42:31.595334Z TRACE send_async{method=POST uri=http://127.0.0.1:7700/indexes/users/search}:handler{id=0}:write: isahc::handler: received 244 bytes of data
2022-09-08T12:42:31.595401Z DEBUG send_async{method=POST uri=http://127.0.0.1:7700/indexes/users/search}:handler{id=0}: isahc::handler: Connection #0 to host 127.0.0.1 left intact
2022-09-08T12:42:31.595430Z TRACE agent_thread{id=0}: polling::kqueue: add: kqueue_fd=11, fd=16, ev=Event { key: 0, readable: false, writable: false }    
2022-09-08T12:42:31.595554Z TRACE meilisearch_sdk::request: Request succeed    

Hey @Jasperav
Sorry, I was away on holiday! I'll have to investigate and try to reproduce this locally. I'll keep you updated here :)

Hi,

I am looking at this issue.

I don't see typoTolerance in the settings of this SDK, unlike in the Go SDK: https://github.com/meilisearch/meilisearch-go/blob/7427b4c288a162c3ccd4b2149cec131438b3ab0d/types.go#L58.

Thanks a lot @vishalsodani

This is tracked in the open issue #260. The feature should be added!

@bidoubiwa is there any timeline set for this issue?

@bidoubiwa I am sorry to ping you again, but do you have a date in mind? This is a production-blocking issue.

Hey @Jasperav, is it possible that this is linked to meilisearch/meilisearch#1495 (comment)?

@bidoubiwa yeah it could be, I am also using the searchable attributes.

Unfortunately, I can only suggest that you add a comment on the related issue. Until it is fixed in the Meilisearch engine, it will not be fixed here :(

@bidoubiwa I am sorry, I think I misread it. I don't think this is an issue in the engine itself. The REST API works correctly but this driver doesn't. This looks like a driver-related issue to me.

After discussing it with @irevoire, he brought up some interesting points. I'll let him provide it here.
Additionally, we might also want the opinion of @Kerollmops on the subject, as it may be possible that the solution should be implemented on the engine side.

Ok, so I logged the query using nc -l 7700, and here are the results.

When using cURL (7.84.0, libcurl/7.84.0)

curl -XPOST 'http://127.0.0.1:7700/indexes/what/search' \
  -H 'Content-Type: application/json' \
  --data-binary '{ "q": "piet" }'

POST /indexes/what/search HTTP/1.1
Host: 127.0.0.1:7700
User-Agent: curl/7.84.0
Accept: */*
Content-Type: application/json
Content-Length: 15

{ "q": "piet" }

When using the SDK (0.20.1, or 7d3f1a7)

use meilisearch_sdk::client::Client;
use meilisearch_sdk::indexes::Index;
use meilisearch_sdk::search::SearchQuery;

#[tokio::main(flavor = "current_thread")]
async fn main() {
    let client: Client = Client::new("http://localhost:7700", "masterKey");
    let index = Index::new("what", client);
    let lol = SearchQuery::new(&index)
        .with_query("piet")
        .execute::<()>()
        .await;
}

POST /indexes/what/search HTTP/1.1
Host: localhost:7700
Accept: */*
Accept-Encoding: deflate, gzip
authorization: Bearer masterKey
content-type: application/json
user-agent: Meilisearch Rust (v0.20.1)
Content-Length: 12

{"q":"piet"}
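
The two captured bodies differ only in whitespace, which also accounts for the Content-Length values of 15 (cURL) and 12 (SDK). A minimal std-only sketch to check this, assuming a hypothetical helper `strip_insignificant_whitespace` that drops JSON whitespace outside of string literals:

```rust
/// Remove JSON whitespace (space, tab, LF, CR) that sits outside string literals.
fn strip_insignificant_whitespace(json: &str) -> String {
    let mut out = String::with_capacity(json.len());
    let mut in_string = false;
    let mut escaped = false;
    for c in json.chars() {
        if in_string {
            // Inside a string literal every character is significant.
            out.push(c);
            if escaped {
                escaped = false;
            } else if c == '\\' {
                escaped = true;
            } else if c == '"' {
                in_string = false;
            }
        } else if c == '"' {
            in_string = true;
            out.push(c);
        } else if !matches!(c, ' ' | '\t' | '\n' | '\r') {
            out.push(c);
        }
    }
    out
}

fn main() {
    let curl_body = r#"{ "q": "piet" }"#; // body sent by cURL
    let sdk_body = r#"{"q":"piet"}"#; // body sent by the SDK
    println!("cURL: {} bytes, SDK: {} bytes", curl_body.len(), sdk_body.len());
    println!(
        "equivalent: {}",
        strip_insignificant_whitespace(curl_body) == sdk_body
    );
}
```

So, as far as the request body goes, both clients ask the engine the same question.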

I then tried to reproduce the issue with the latest release-v0.30.0 branch of meilisearch and could not. We can close this issue, as it seems to be fixed in the latest version of the engine.

When using cURL

curl -XPOST 'http://127.0.0.1:7700/indexes/what/search' \
  -H 'Content-Type: application/json' \
  --data-binary '{ "q": "piet" }'

{
    "hits": [
        {
            "user_id": "a0bacccc-b2a7-4c91-83f9-8dbf2ea42e8d"
        },
        {
            "user_id": "ec5a4e56-bf1a-44e3-aeee-91b28c25eb30"
        },
        {
            "user_id": "858a3a83-af26-48f4-b735-4987f4e2bc19"
        }
    ],
    "query": "piet",
    "processingTimeMs": 0,
    "limit": 20,
    "offset": 0,
    "estimatedTotalHits": 3
}

When using the SDK

use meilisearch_sdk::client::Client;
use meilisearch_sdk::indexes::Index;
use meilisearch_sdk::search::SearchQuery;
use serde_json::Value;

#[tokio::main(flavor = "current_thread")]
async fn main() {
    let client: Client = Client::new("http://localhost:7700", "masterKey");
    let index = Index::new("what", client);
    let lol = SearchQuery::new(&index)
        .with_query("piet")
        .execute::<Value>()
        .await;

    dbg!(lol);
}

[examples/simple.rs:16] lol = Ok(
    SearchResults {
        hits: [
            SearchResult {
                result: Object {
                    "user_id": String("a0bacccc-b2a7-4c91-83f9-8dbf2ea42e8d"),
                },
                formatted_result: None,
                matches_position: None,
            },
            SearchResult {
                result: Object {
                    "user_id": String("ec5a4e56-bf1a-44e3-aeee-91b28c25eb30"),
                },
                formatted_result: None,
                matches_position: None,
            },
            SearchResult {
                result: Object {
                    "user_id": String("858a3a83-af26-48f4-b735-4987f4e2bc19"),
                },
                formatted_result: None,
                matches_position: None,
            },
        ],
        offset: 0,
        limit: 20,
        estimated_total_hits: 3,
        facet_distribution: None,
        processing_time_ms: 0,
        query: "piet",
    },
)

Oops, sorry, I totally forgot to answer.

Personally, from what I understand, the issue comes down to the preserve_order feature of serde_json.
It ensures that your JSON Value stays ordered the way you received it over the network.

And I don't think enabling this feature is in the scope of this crate. You should check how feature unification works in Rust, but as a quick TL;DR:
We can't enable it only for meilisearch-rust. If we enable it, it will be enabled for your whole workspace and for every other crate using serde_json.
Since the feature slows down deserialization a little and is typically not needed, I don't think it's a good idea to ship it by default.

EDIT: Ooops yeah sorry I totally misunderstood the issue ignore me 🙈

@Kerollmops I can still reproduce the bug. Since it works for you, what additional information can I provide? Did you enable any features? If you could share your exact project, I can try to reproduce it on my own machine (or vice versa).

@irevoire I am not sure why the feature isn't enabled by default. The whole point of Meilisearch is showing relevant hits in order, right? If I add the feature flag to serde_json in my own project, nothing changes, probably because the feature doesn't propagate to other dependencies. I am not sure the flag has anything to do with this bug. This is the description of the feature:

Consider enabling features = ["preserve_order"] if you care about the order of map keys.

But this has nothing to do with map keys, right? The order of the whole hits array inside SearchResults is different from the one in my curl response.
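
The distinction can be sketched with the standard library alone. Without preserve_order, serde_json backs its object type with a BTreeMap, so keys come back sorted, while JSON arrays become a Vec, whose element order is never rearranged. The snippet below uses plain std collections as stand-ins rather than serde_json itself, and `keys_after_btreemap` is a hypothetical helper name:

```rust
use std::collections::BTreeMap;

/// Collect fields into a BTreeMap (stand-in for serde_json's Map without
/// preserve_order) and return the keys in iteration order.
fn keys_after_btreemap<'a>(fields: &[(&'a str, &'a str)]) -> Vec<&'a str> {
    let map: BTreeMap<&'a str, &'a str> = fields.iter().copied().collect();
    map.keys().copied().collect()
}

fn main() {
    // Field order as it appears in the raw document from this report.
    let fields = [
        ("username", "pietpiethenk"),
        ("bio", ""),
        ("user_id", "ec5a4e56-bf1a-44e3-aeee-91b28c25eb30"),
        ("d_karma", "10"),
    ];
    // Keys are re-ordered by the map: this is what preserve_order affects.
    println!("keys: {:?}", keys_after_btreemap(&fields));

    // A Vec (stand-in for the hits array) keeps the order it was built in.
    let hits = vec![
        "a0bacccc-b2a7-4c91-83f9-8dbf2ea42e8d",
        "ec5a4e56-bf1a-44e3-aeee-91b28c25eb30",
        "858a3a83-af26-48f4-b735-4987f4e2bc19",
    ];
    println!("hits: {:?}", hits);
}
```

In other words, preserve_order can change the order of fields inside a document, but it should not be able to reorder the hits themselves.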

@irevoire is talking about the order of the fields within your documents; I am not sure why that subject came up here.

However, regarding the order of your documents, I am not sure why you think there is an issue. In your example, the two documents that appear in reverse order (ec5a4e56-bf1a-44e3-aeee-91b28c25eb30 and 858a3a83-af26-48f4-b735-4987f4e2bc19) are considered equal by the engine. Both have an empty bio field, and this is the only field specified in the settings used to determine their order.

Am I right to assume that nothing in your settings is defined in sortableAttributes, and that we therefore must not take d_karma into account here?

@Kerollmops I got this in my ranking rules:

	"rankingRules": [
		"d_karma:desc",

This means it should rank the results first by d_karma and then by the other rules, right? Of course I can add it to the sortable attributes, but I don't think I will keep d_karma as the first ranking rule in production. This is just for demonstration purposes. I am still confused about why the CLI consistently works as expected while this SDK consistently gives a different result. They should be equal, right?
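
As a rough illustration of that expectation, the sketch below applies only the first ranking rule (d_karma:desc) to the raw documents from this report. It is not how the engine is implemented: the substring match is a crude stand-in for Meilisearch's query matching, the remaining ranking rules are ignored, and `User`, `sample_docs`, and `expected_order` are hypothetical names.

```rust
struct User {
    username: &'static str,
    user_id: &'static str,
    d_karma: i64,
}

/// The four raw documents listed in this report.
fn sample_docs() -> Vec<User> {
    vec![
        User { username: "6RzDxughVD", user_id: "b9fcf6fc-fee1-4e63-8461-72310e175035", d_karma: 0 },
        User { username: "pietjepietnm", user_id: "858a3a83-af26-48f4-b735-4987f4e2bc19", d_karma: 5 },
        User { username: "pietpietpiet", user_id: "a0bacccc-b2a7-4c91-83f9-8dbf2ea42e8d", d_karma: 12 },
        User { username: "pietpiethenk", user_id: "ec5a4e56-bf1a-44e3-aeee-91b28c25eb30", d_karma: 10 },
    ]
}

/// Keep documents whose username contains the query (assumption: a crude
/// stand-in for real matching), then order them by d_karma, highest first.
fn expected_order(docs: &[User], query: &str) -> Vec<&'static str> {
    let mut matched: Vec<&User> = docs
        .iter()
        .filter(|u| u.username.contains(query))
        .collect();
    // Stable sort: documents with equal d_karma keep their relative order.
    matched.sort_by(|a, b| b.d_karma.cmp(&a.d_karma));
    matched.iter().map(|u| u.user_id).collect()
}

fn main() {
    for id in expected_order(&sample_docs(), "piet") {
        println!("{id}");
    }
}
```

With d_karma values of 12, 10, and 5, the three matching documents are not tied on the first rule, and the resulting order is the one the REST API returned.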

Oh, you are absolutely right, indeed! Could you please remind me which Meilisearch version you are using? And yes, you are also right that the results should be the same; they should not depend on the client used!

@Kerollmops Sure, I am using Meilisearch v0.30.0 and SDK 0.20.1.

Have you already tried starting from a freshly created database, by deleting the data.ms and sending only those four documents in one single request? It could be related to an issue we have where duplicated documents can be visible in search requests. We hope to fix it quickly and are continuing to investigate it.

@Kerollmops I shut down Meilisearch in Docker Compose and restarted it (I am using no volumes, so it should be fresh). The problem isn't that there are duplicate documents/versions, right? I am inserting 3 unique users, with no updates and no different versions of the users. I just tried sending all the documents again in one request and I am still experiencing this bug.

Hey @Jasperav,

I want a fresh eye to look at this issue, like @ManyTheFish or @loiclec, as you worked on the sort ranking rule. For context, this user is experiencing strange behavior with the d_karma:desc ranking rule but only when using the Rust SDK, not when doing the same call using cURL. Do you have an idea? Could it be related to the ranking rule? I am not able to reproduce that on my side...

In the meantime, we are releasing v0.30.1 this week. I advise you to switch to this version when it is ready 😃

I can confirm that meilisearch/meilisearch#3165, which was fixed in v0.30.1, could cause the sort order to be incorrect. But I don't think that is what caused this particular issue. And as you said @Kerollmops, meilisearch/meilisearch#3021 (comment) could also be related, as it could corrupt the documents database (one of the possible effects is to have duplicated documents, but it is not the only one). But this wouldn't explain why @Jasperav is still able to reproduce it from a clean slate just by sending three unique documents.

I also tried to reproduce it using the latest Rust SDK and Meilisearch v0.30.1, but it returned the correct results for me. So I am out of ideas to debug this particular issue, sorry :(

So I tried debugging it one more time in a new project, and there it worked. I then cleared the cargo dir and Docker cache of the main project to force downloading the newest versions of both projects, and then it also worked there :(. It looks like I wasted a lot of people's time, including my own, sorry 😔

I am glad to hear that it appears to be resolved! And don't worry about wasting our time. It is always better to report bugs and you clearly put effort into it, which I really appreciate 🙂