meilisearch / meilisearch-rust

Rust wrapper for the Meilisearch API.

Home Page:https://www.meilisearch.com

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

NDJSON/CSV methods to add and update documents

curquiza opened this issue · comments

⚠️ This issue is generated, it means the nameing might be done differently in this package (ex: add_documents_json instead of addDocumentsJson). Keep the already existing way of naming in this package to stay idiomatic with the language and this repository.

📣 We strongly recommend doing multiple PRs to solve all the points of this issue

MeiliSearch v0.23.0 introduces two changes:

  • new valid formats to push data files, additionally to the JSON format: CSV and NDJSON formats.
  • it enforces the Content-type header for every route requiring a payload (POST and PUT routes)

Here are the expected changes to completely close the issue:

  • Currently, the SDKs always send Content-Type: application/json to every request. Only the POST and PUT requests should send the Content-Type: application/json and not the DELETE and GET ones.

  • Add the following methods and 🔥 the associated tests 🔥 to ADD the documents. Depending on the format type (csv or ndjson) the SDK should send Content-Type: application/x-dnjson or Content-Type: text/csv)

    • addDocumentsJson(string docs, string primaryKey)
    • addDocumentsCsv(string docs, string primaryKey)
    • addDocumentsCsvInBatches(string docs, int batchSize, string primaryKey)
    • addDocumentsNdjson(string docs, string primaryKey)
    • addDocumentsNdjsonInBatches(string docs, int batchSize, string primaryKey)
  • Add the following methods and 🔥 the associated tests 🔥 to UPDATE the documents. Depending on the format type (csv or ndjson) the SDK should send Content-Type: application/x-dnjson or Content-Type: text/csv)

    • updateDocumentsJson(string docs, string primaryKey)
    • updateDocumentsCsv(string docs, string primaryKey)
    • updateDocumentsCsvInBatches(string docs, int batchSize, string primaryKey)
    • updateDocumentsNdjson(string docs, string primaryKey)
    • updateDocumentsNdjsonInBatches(string docs, int batchSize, string primaryKey)

docs are the documents sent as String
primaryKey is the primary key of the index
batchSize is the size of the batch. Example: you can send 2000 documents in raw String in docs and ask for a batchSize of 1000, so your documents will be sent to MeiliSearch in two batches.

Example of PRs:


Related to: meilisearch/integration-guides#146

If this issue is partially/completely implemented, feel free to let us know.

The idea is the encapsulation of the functions for ndjson / json / csv? It seems add_or_replace_unchecked_payload permits these formats

    /// let task = movie_index.add_or_replace_unchecked_payload(
    ///     r#"{ "id": 1, "body": "doggo" }
    ///     { "id": 2, "body": "catto" }"#.as_bytes(),
    ///     "application/x-ndjson",
    ///     Some("id"),
    ///   ).await.unwrap();

Also, about the naming, shouldn't it be snakecase , not camelcase?

Hey @carlosb1
Indeed add_or_replace_unchecked_payload does the trick. The PR is kept open in case someone want to specifically implement these functions.

@curquiza issue is an issue that was created in our different SDK's, this is why the API design might not exactly follow rust conventions. We expect the contributor to adapt this design to be more in line with rust :)

Well. I did a first PR with the first functions... I think it can be a good kickoff.... Furthermore, I think there are an issue with the tests, it saw some random error.

Not all features are done so I re open

Checking the code for the implementation of the batches functions: addDocumentsNdjsonInBatches , addDocumentsCsvInBatches , etc. I am not sure it makes sense... How should it split in batches?.... the input param for each function is a string.. you can not decide how to split these strings... In the current implementation works because it uses &[T] where T is an independent object serialized... it can send as each request in parallel

Indeed, you are right. If it makes no sense for the community, let's close it. This package is for the community, so let's not add useless maintenance 😄