scrippt-tech / orca

LLM Orchestrator built in Rust

General solution to generate embeddings concurrently

santiagomed opened this issue

Concurrent processing is a key strategy for speeding up embedding generation, especially when generating embeddings for a series of texts. However, a general issue with concurrency is that it is hard to keep the output embeddings in the correct order. Because tasks complete asynchronously, the order of the output embeddings (Vec<Vec<f32>>) often doesn’t match the order of the input prompts (Vec<Box<dyn Prompt>>). This misalignment causes problems downstream: for example, texts get paired with the wrong embeddings when they are inserted into a vector database such as Qdrant.

Concurrency:

  • Rayon crate: Experiments with the Rayon crate have shown drastic improvements in processing speed (80% faster) for BERT embeddings, but they have not yet solved the ordering issue.
  • Tokio: Using tokio::spawn to start multiple asynchronous embedding tasks is another potential solution under exploration for async embedding contexts (such as using OpenAI for embedding API calls).
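One way the spawn-based approach could keep its ordering is to hold on to the task handles in input order and join them in that same order, regardless of which task finishes first. The sketch below illustrates this with `std::thread::scope` from the standard library as a stand-in for `tokio::spawn`, so it compiles without extra crates. `embed` and `embed_in_order` are hypothetical names, and `embed` is a placeholder for the real model or API call, not part of orca.

```rust
use std::thread;

/// Hypothetical stand-in for a real embedding call (e.g. a BERT forward pass
/// or an HTTP request). Returns a deterministic 2-dimensional vector.
fn embed(text: &str) -> Vec<f32> {
    let byte_sum: u32 = text.bytes().map(u32::from).sum();
    vec![text.len() as f32, byte_sum as f32]
}

/// Spawns one worker per input, then joins the handles in spawn order,
/// so `out[i]` always corresponds to `texts[i]`.
fn embed_in_order(texts: &[String]) -> Vec<Vec<f32>> {
    thread::scope(|s| {
        // Collect eagerly so every worker is started before any join.
        let handles: Vec<_> = texts
            .iter()
            .map(|t| s.spawn(move || embed(t)))
            .collect();

        // Joining in spawn order fixes the output order, whatever the
        // completion order of the workers was.
        handles
            .into_iter()
            .map(|h| h.join().expect("embedding worker panicked"))
            .collect()
    })
}
```

Calling `embed_in_order(&texts)` returns one vector per input, in input order. One thread per prompt is only reasonable for small batches; a real implementation would probably cap the number of in-flight tasks.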

The goal here is to identify a solution that not only enhances processing speed but also ensures the integrity of data by preserving the correct order of embeddings. Any insights or experiences that could contribute to resolving this challenge would be highly valuable.
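If results have to be consumed in completion order (for example, streamed back as they arrive), another option is to tag each prompt with its input index and write every result into the slot for that index. Below is a minimal sketch using only the standard library (`std::sync::mpsc` and scoped threads). `embed_stub` and `embed_tagged` are hypothetical names, and the embedding function is a placeholder rather than the real model call.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical placeholder for the real embedding call.
fn embed_stub(text: &str) -> Vec<f32> {
    let byte_sum: u32 = text.bytes().map(u32::from).sum();
    vec![text.len() as f32, byte_sum as f32]
}

/// Workers send `(index, embedding)` pairs in whatever order they finish;
/// the index is then used to restore the input order.
fn embed_tagged(texts: &[String]) -> Vec<Vec<f32>> {
    let (tx, rx) = mpsc::channel::<(usize, Vec<f32>)>();

    thread::scope(|s| {
        for (i, t) in texts.iter().enumerate() {
            let tx = tx.clone();
            s.spawn(move || {
                // A send error only happens if the receiver was dropped.
                let _ = tx.send((i, embed_stub(t)));
            });
        }
    });
    // Drop the original sender so the receive loop below terminates.
    drop(tx);

    // One slot per input; each result lands in the slot of its index.
    let mut slots: Vec<Option<Vec<f32>>> = vec![None; texts.len()];
    for (i, embedding) in rx {
        slots[i] = Some(embedding);
    }

    slots
        .into_iter()
        .map(|slot| slot.expect("missing embedding for an input"))
        .collect()
}
```

The same index-tagging idea should carry over to an async setup: as long as every result keeps the index of its prompt, the completion order no longer matters when the text-to-embedding pairs are built for the vector database.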