NDJSON/CSV methods to add and update documents

Question


curquiza opened this issue 3 years ago

⚠️ This issue is generated, so the naming might differ in this package (e.g. add_documents_json instead of addDocumentsJson). Keep the existing naming conventions of this package to stay idiomatic with the language and this repository.

📣 We strongly recommend opening multiple PRs to address all the points of this issue.

MeiliSearch v0.23.0 introduces two changes:

New valid formats for pushing data files, in addition to JSON: CSV and NDJSON.
The Content-Type header is now enforced for every route that requires a payload (POST and PUT routes).
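Because the Content-Type header is now enforced, each payload format has to be paired with the right header value. A minimal sketch of that mapping follows, assuming a hypothetical DocumentFormat enum that is not part of any SDK; the header values are the ones Meilisearch accepts for JSON, NDJSON (the same "application/x-ndjson" used with add_or_replace_unchecked_payload later in this thread) and CSV.

```rust
/// Hypothetical enum for the three payload formats accepted since v0.23.0.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DocumentFormat {
    Json,
    Ndjson,
    Csv,
}

impl DocumentFormat {
    /// Content-Type header value to send with a POST/PUT documents request.
    fn content_type(self) -> &'static str {
        match self {
            DocumentFormat::Json => "application/json",
            DocumentFormat::Ndjson => "application/x-ndjson",
            DocumentFormat::Csv => "text/csv",
        }
    }
}

fn main() {
    // Print the header each format would be sent with.
    for format in [DocumentFormat::Json, DocumentFormat::Ndjson, DocumentFormat::Csv] {
        println!("{:?} -> Content-Type: {}", format, format.content_type());
    }
}
```

Keeping the format as an enum rather than a free-form string means a caller cannot send a payload with a mismatched or misspelled header.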

Here are the expected changes to completely close the issue:

docs are the documents, sent as a String
primaryKey is the primary key of the index
batchSize is the number of documents per batch. For example, if you pass 2000 documents as a raw String in docs with a batchSize of 1000, your documents are sent to MeiliSearch in two batches.
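For NDJSON, the batchSize semantics can be illustrated by splitting the raw String on line boundaries, since NDJSON holds one document per line. This is only a sketch: split_ndjson_into_batches is a hypothetical helper, not an SDK function, and it assumes well-formed NDJSON.

```rust
/// Hypothetical helper: split an NDJSON payload into chunks of at most
/// `batch_size` documents. Assumes one JSON document per non-empty line,
/// which is what NDJSON requires.
fn split_ndjson_into_batches(docs: &str, batch_size: usize) -> Vec<String> {
    assert!(batch_size > 0, "batch_size must be greater than zero");
    // Skip blank lines so they are not counted as documents.
    let lines: Vec<&str> = docs.lines().filter(|line| !line.trim().is_empty()).collect();
    lines
        .chunks(batch_size)
        .map(|chunk| chunk.join("\n"))
        .collect()
}

fn main() {
    // 5 documents with a batch size of 2 -> 3 requests (2 + 2 + 1 documents).
    let docs = (1..=5)
        .map(|id| format!(r#"{{ "id": {}, "body": "doc {}" }}"#, id, id))
        .collect::<Vec<_>>()
        .join("\n");
    let batches = split_ndjson_into_batches(&docs, 2);
    for (i, batch) in batches.iter().enumerate() {
        println!("batch {}: {} document(s)", i + 1, batch.lines().count());
    }
}
```

Each returned String would then be sent as its own request with the "application/x-ndjson" Content-Type.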

Example PRs:

In the PHP SDK: meilisearch/meilisearch-php#235
In the Python SDK: meilisearch/meilisearch-python#329

Related to: meilisearch/integration-guides#146

If this issue is partially/completely implemented, feel free to let us know.

Carlos B · Answer 1 · Mon Jul 10 2023 01:31:45 GMT+0800 (China Standard Time)

Is the idea to wrap this in dedicated functions for NDJSON / JSON / CSV? It seems add_or_replace_unchecked_payload already supports these formats:

    // Send two NDJSON documents in one request, with "id" as the primary key.
    // `movie_index` is an existing Index; this must run in an async context.
    let task = movie_index
        .add_or_replace_unchecked_payload(
            r#"{ "id": 1, "body": "doggo" }
    { "id": 2, "body": "catto" }"#.as_bytes(),
            "application/x-ndjson",
            Some("id"),
        )
        .await
        .unwrap();

Also, regarding the naming: shouldn't it be snake_case rather than camelCase?

cvermand · Answer 2 · Tue Jul 11 2023 17:48:40 GMT+0800 (China Standard Time)

Hey @carlosb1
Indeed, add_or_replace_unchecked_payload does the trick. The issue is kept open in case someone wants to implement these functions specifically.

@curquiza's issue was created in all our SDKs, which is why the API design might not exactly follow Rust conventions. We expect contributors to adapt this design to be more in line with Rust :)

Carlos B · Answer 3 · Wed Aug 09 2023 22:50:33 GMT+0800 (China Standard Time)

Well, I opened a first PR with the first functions; I think it can be a good kickoff. Also, I think there is an issue with the tests: I saw some random errors.

Clémentine · Answer 4 · Tue Sep 12 2023 19:39:25 GMT+0800 (China Standard Time)

Not all features are done, so I'm reopening this.

Carlos B · Answer 5 · Sat Sep 23 2023 05:33:29 GMT+0800 (China Standard Time)

I've been looking at how to implement the batch functions (addDocumentsNdjsonInBatches, addDocumentsCsvInBatches, etc.), and I'm not sure they make sense. How should the input be split into batches? Each function takes a single string as input, so there is no obvious way to decide where to split it. The current batch implementation works because it takes a &[T], where each T is an independent object that is serialized on its own, so each batch can be sent as a separate request in parallel.
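To illustrate why splitting a raw String is harder than chunking a &[T], here is a naive CSV splitter. It is a hypothetical sketch, not SDK code: every batch must repeat the header line, and the line-based split assumes that no quoted field contains a newline. CSV allows such fields, so a correct version would need a real CSV parser.

```rust
/// Hypothetical helper: split a CSV payload into batches of at most
/// `batch_size` rows, repeating the header line at the top of each batch.
///
/// Naive on purpose: it assumes every record fits on a single line, i.e. no
/// quoted field contains a newline.
fn split_csv_into_batches(csv: &str, batch_size: usize) -> Vec<String> {
    assert!(batch_size > 0, "batch_size must be greater than zero");
    let mut lines = csv.lines().filter(|line| !line.trim().is_empty());
    // The first line is the header; without it there is nothing to send.
    let header = match lines.next() {
        Some(header) => header,
        None => return Vec::new(),
    };
    let rows: Vec<&str> = lines.collect();
    rows.chunks(batch_size)
        .map(|chunk| {
            // Every batch starts with the header so each request is valid CSV.
            let mut batch = String::from(header);
            for row in chunk {
                batch.push('\n');
                batch.push_str(row);
            }
            batch
        })
        .collect()
}

fn main() {
    let csv = "id,body\n1,doggo\n2,catto\n3,birdo\n";
    for (i, batch) in split_csv_into_batches(csv, 2).iter().enumerate() {
        println!("--- batch {} ---\n{}", i + 1, batch);
    }
}
```

With a &[T], by contrast, `docs.chunks(batch_size)` already yields independent documents, with no format-specific parsing needed.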

Clémentine · Answer 6 · Thu Sep 28 2023 00:13:52 GMT+0800 (China Standard Time)

Indeed, you are right. If it makes no sense for the community, let's close it. This package is for the community, so let's not add useless maintenance 😄