This Flask backend API accepts documents in multiple formats (.txt, .docx, .pptx, .jpg, .png, .eml, .html, and .pdf) and lets you run semantic search over them in the 100+ languages supported by the Cohere multilingual API. Embeddings are stored in a Qdrant vector database.
The following steps show how to run the application on macOS/Linux. You will need:
- Python 3
- Git
- virtualenv
- Homebrew
- Clone the repository
git clone https://github.com/menloparklab/langchain-cohere-qdrant-doc-retrieval docQA
- Change into the directory
cd docQA
- Create and activate a virtual environment
python3 -m venv env
source env/bin/activate
- Install the required packages
pip install -r requirements.txt
- Install Homebrew
Follow the installation guide on the Homebrew website.
- Install the following brew packages
brew install libmagic poppler tesseract libxml2 libxslt
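As a quick sanity check after this step, a short Python sketch can confirm that the command-line tools are on your PATH. The executable names are assumptions about what the packages install (tesseract from the tesseract package, pdftoppm from poppler); libmagic, libxml2, and libxslt are libraries rather than commands, so this check does not cover them. The helper name find_missing is illustrative and not part of the repository.

```python
import shutil

# Executables assumed to be provided by the brew packages above.
REQUIRED_TOOLS = ["tesseract", "pdftoppm"]


def find_missing(tools, which=shutil.which):
    """Return the tools that cannot be found on the PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = find_missing(REQUIRED_TOOLS)
    if missing:
        print("Missing tools:", ", ".join(missing))
    else:
        print("All required tools found.")
```

If a tool is reported missing, re-running the brew install command and opening a new shell is usually the first thing to try.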
- Create a .env file and set the following environment variables:
cohere_api_key="insert here"
openai_api_key="insert here"
qdrant_url="insert here"
qdrant_api_key="insert here"
Replace the values with your own API keys and Qdrant URL.
Sign up for a free cloud-based Qdrant account and create a new cluster. You can then obtain the qdrant_url and qdrant_api_key values used above.
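Before starting the server, it can help to confirm that none of the four values is missing or still set to the placeholder. The sketch below is one way to do that, assuming the variables from the .env example have already been loaded into the process environment (how the application itself loads the file is not covered here). The function name missing_settings is hypothetical.

```python
import os

# Variable names taken from the .env example above.
REQUIRED_VARS = ["cohere_api_key", "openai_api_key", "qdrant_url", "qdrant_api_key"]


def missing_settings(env=None):
    """Return required variables that are unset, empty, or still placeholders."""
    env = os.environ if env is None else env
    return [
        name
        for name in REQUIRED_VARS
        if env.get(name, "").strip() in ("", "insert here")
    ]


if __name__ == "__main__":
    missing = missing_settings()
    if missing:
        print("Please set:", ", ".join(missing))
    else:
        print("All settings present.")
```

Passing a plain dictionary instead of relying on os.environ makes the check easy to try without touching your real keys.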
- Run the application using the following command:
gunicorn app:app
- Access the API endpoints
The API endpoints will be live at the following routes:
/embed
/retrieve
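gunicorn binds to 127.0.0.1:8000 unless told otherwise, so with the command above the routes should be reachable under that address. The sketch below only builds the full URLs. The HTTP methods and request payloads of /embed and /retrieve are not described here, so it deliberately sends no requests. BASE_URL and endpoint_url are illustrative names, not part of the repository.

```python
from urllib.parse import urljoin

# gunicorn's default bind address; adjust if you pass --bind.
BASE_URL = "http://127.0.0.1:8000"
ROUTES = ["/embed", "/retrieve"]


def endpoint_url(route, base=BASE_URL):
    """Join the base address and a route into a full URL."""
    return urljoin(base + "/", route.lstrip("/"))


if __name__ == "__main__":
    for route in ROUTES:
        print(endpoint_url(route))
```

Check the route handlers in the application code for the exact request format each endpoint expects.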
You have now installed and run the DocQA system on your local machine. Feel free to explore the code and adapt it to your requirements.
If you have any questions, feel free to reach out on Twitter.