# open-text-embeddings
Many open source projects support the `completions` and `chat/completions` endpoints of the OpenAI API, but do not support the `embeddings` endpoint.

The goal of this project is to create an OpenAI API-compatible version of the `embeddings` endpoint, which serves open source sentence-transformers models and other models supported by LangChain's `HuggingFaceEmbeddings`, `HuggingFaceInstructEmbeddings` and `HuggingFaceBgeEmbeddings` classes.
## ℹ️ Supported Text Embeddings Models
Below is a compilation of open-source models that have been tested via the `embeddings` endpoint:
- BAAI/bge-large-en
- intfloat/e5-large-v2
- sentence-transformers/all-MiniLM-L6-v2
- sentence-transformers/all-mpnet-base-v2
- universal-sentence-encoder-large/5 (please refer to the `universal_sentence_encoder` branch for more details)
The models listed above have been tested and verified. All sentence-transformers models are expected to work seamlessly with the endpoint.
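To illustrate what an OpenAI API-compatible `embeddings` exchange looks like, here is a minimal sketch of a request body and a sample response in the OpenAI format. The exact field layout for this server is assumed to mirror the OpenAI convention, and the vector values below are placeholders, not real model output:

```python
# Request body in the OpenAI embeddings format (assumed; verify against your server).
request_body = {
    "input": ["The food was delicious.", "The service was excellent."],
    "model": "intfloat/e5-large-v2",
}

# A typical OpenAI-style response wraps one vector per input string,
# matched to the input by "index" (placeholder values shown):
sample_response = {
    "object": "list",
    "data": [
        {"object": "embedding", "index": 0, "embedding": [0.01, -0.02]},
        {"object": "embedding", "index": 1, "embedding": [0.03, 0.04]},
    ],
    "model": "intfloat/e5-large-v2",
}

# Extract one embedding per input from the response.
vectors = [item["embedding"] for item in sample_response["data"]]
print(len(vectors))  # 2
```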
## 🔍 Demo
Try out open-text-embeddings in your browser:
## 🖥️ Standalone FastAPI Server
To run the embeddings endpoint locally as a standalone FastAPI server, follow these steps:
1. Install the dependencies by executing the following command:

   ```bash
   pip install --no-cache-dir open-text-embeddings[server]
   ```
2. Run the server with the desired model using the following command, which enables normalized embeddings (omit `NORMALIZE_EMBEDDINGS` if the model doesn't support normalized embeddings):

   ```bash
   MODEL=intfloat/e5-large-v2 NORMALIZE_EMBEDDINGS=1 python -m open.text.embeddings.server
   ```
   If a GPU is detected in the runtime environment, the server will automatically run in `cuda` mode. However, you have the flexibility to set the `DEVICE` environment variable to choose between `cpu` and `cuda`. Here's an example of how to run the server with your desired configuration:

   ```bash
   MODEL=intfloat/e5-large-v2 NORMALIZE_EMBEDDINGS=1 DEVICE=cpu python -m open.text.embeddings.server
   ```

   This setup allows you to seamlessly switch between CPU and GPU modes, giving you control over the server's performance based on your specific requirements.
3. You will see the following text in your console once the server has started:

   ```
   INFO:     Started server process [19705]
   INFO:     Waiting for application startup.
   INFO:     Application startup complete.
   INFO:     Uvicorn running on http://localhost:8000 (Press CTRL+C to quit)
   ```
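Once the server is running, you can query it over HTTP from any client. The sketch below uses only the Python standard library and assumes the server exposes an OpenAI-style `/v1/embeddings` route at the default Uvicorn address shown above; the helper names `build_request` and `embed` are illustrative, not part of the package:

```python
import json
import urllib.request

# Base URL of the locally running server (assumption: default Uvicorn port 8000).
BASE_URL = "http://localhost:8000"

def build_request(texts, model="intfloat/e5-large-v2"):
    """Build an OpenAI-style embeddings request body."""
    return {"input": texts, "model": model}

def embed(texts, model="intfloat/e5-large-v2"):
    """POST texts to the embeddings endpoint and return one vector per text."""
    body = json.dumps(build_request(texts, model)).encode("utf-8")
    req = urllib.request.Request(
        f"{BASE_URL}/v1/embeddings",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        payload = json.load(resp)
    # OpenAI-format responses keep the vectors under data[i]["embedding"].
    return [item["embedding"] for item in payload["data"]]

# With the server running, this would return two embedding vectors:
# vectors = embed(["The food was delicious.", "The service was excellent."])
```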
## ☁️ AWS Lambda Function
To deploy the embeddings endpoint as an AWS Lambda Function using GitHub Actions, follow these steps:
1. Fork the repo.

2. Add your AWS credentials (`AWS_KEY` and `AWS_SECRET`) to the repository secrets. You can do this by navigating to https://github.com/username/open-text-embeddings/settings/secrets/actions.

3. Manually trigger the `Deploy Dev` or `Remove Dev` GitHub Actions to deploy or remove the AWS Lambda Function.
## 🧪 Testing the Embeddings Endpoint
To test the `embeddings` endpoint, the repository includes an `embeddings.ipynb` notebook with a LangChain-compatible `OpenAIEmbeddings` class.
To get started:
1. Install the dependencies by executing the following command:

   ```bash
   pip install --no-cache-dir open-text-embeddings openai
   ```
2. Execute the cells in the notebook to test the `embeddings` endpoint.
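After retrieving embeddings from the endpoint (via the notebook or the raw API), a quick sanity check is to compare two vectors by cosine similarity: semantically similar texts should score closer to 1.0. A minimal pure-Python sketch:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Identical vectors score 1.0; orthogonal vectors score 0.0.
print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0
```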
## ❓ Known Issues
- Gzip compression for web requests doesn't seem to work in the AWS Lambda Function.
## 🧑‍💼 Contributing
Contributions are welcome! Please check out the issues on the repository, and feel free to open a pull request. For more information, please see the contributing guidelines.
Thank you very much for the following contributions:
## 📔 License
This project is licensed under the terms of the MIT license.
## 🗒️ Citation
If you utilize this repository, please consider citing it with:
```bibtex
@misc{open-text-embeddings,
  author       = {Lim Chee Kin},
  title        = {open-text-embeddings: Open Source Text Embedding Models with OpenAI API-Compatible Endpoint},
  year         = {2023},
  publisher    = {GitHub},
  journal      = {GitHub repository},
  howpublished = {\url{https://github.com/limcheekin/open-text-embeddings}},
}
```