This Flask backend API accepts documents in multiple formats (.txt, .docx, .pptx, .jpg, .png, .eml, .html, and .pdf) and lets you run semantic search over them in the 100+ languages supported by the Cohere Multilingual API. Embeddings are stored in a Qdrant vector database.
Install the Python dependencies using pip:
pip install -r requirements.txt
Documents are read and extracted using a library named Unstructured, which requires additional system packages. On macOS, install them with Homebrew:
brew install libmagic poppler tesseract libxml2 libxslt
Please make an account on Qdrant and create a new cluster. You will then be able to get the qdrant_url and qdrant_api_key used in the section below.
Please assign environment variables as follows.
cohere_api_key="insert here"
openai_api_key="insert here"
qdrant_url="insert here"
qdrant_api_key="insert here"
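Before starting the server, it can help to confirm that all four variables are actually set. The following is a minimal sketch, not part of the app itself: the helper name `missing_env_vars` is hypothetical, and it assumes the variables use the exact lowercase names listed above.

```python
import os

# Names taken from the list above; the app is assumed to read them verbatim.
REQUIRED_VARS = ("cohere_api_key", "openai_api_key", "qdrant_url", "qdrant_api_key")


def missing_env_vars(environ=None):
    """Return the required variable names that are unset or empty.

    `environ` defaults to the real process environment; passing a plain
    dict makes the check easy to try out without touching your shell.
    """
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Calling `missing_env_vars()` with no arguments checks the current shell environment and returns an empty list when everything is in place.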
Run the app using Gunicorn:
gunicorn app:app
The app should now be running with two API routes: /embed and /retrieve.
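To give a feel for what semantic search over stored embeddings means, here is a toy, in-memory sketch. It is not the app's code: the vectors are made up, the `retrieve` function and `store` list are hypothetical, and cosine similarity is only an assumed ranking metric. In the real service, embeddings come from Cohere and are stored and searched in Qdrant.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query_vector, store, top_k=3):
    """Return the texts of the top_k stored items most similar to query_vector."""
    scored = [(cosine_similarity(query_vector, vec), text) for text, vec in store]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:top_k]]


# Made-up 3-dimensional "embeddings"; real ones have far more dimensions.
store = [
    ("invoice for March", [0.9, 0.1, 0.0]),
    ("holiday photos", [0.0, 0.2, 0.9]),
    ("payment reminder", [0.8, 0.3, 0.1]),
]

results = retrieve([1.0, 0.0, 0.0], store, top_k=2)
```

Because the query and the documents are compared as vectors rather than as strings, a query in one language can match a document in another, provided the embedding model places them close together.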
Feel free to reach out on Twitter if you have any questions.