dheerajbatra / langchain-cohere-qdrant-doc-retrieval

This Flask backend API takes a document in multiple formats and allows you to perform semantic search using Langchain, Cohere and Qdrant.

langchain-cohere-qdrant-doc-retrieval

This Flask backend API accepts a document in any of several formats (.txt, .docx, .pptx, .jpg, .png, .eml, .html, and .pdf) and lets you run semantic search over it in the 100+ languages supported by the Cohere multilingual API. The embeddings are stored in a Qdrant vector database.
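
To make the idea of semantic search over stored embeddings concrete, here is a minimal sketch in plain Python. It is not the project's code: the vectors are made-up toy values rather than Cohere output, the function names are hypothetical, and cosine similarity is an assumption, since the distance metric configured in Qdrant is not stated here.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_chunks(query_vector, stored):
    """Return the stored texts ordered from most to least similar.

    `stored` maps a text chunk to its embedding. A vector database plays
    this role at scale; this loop is only an illustration.
    """
    scored = [(cosine_similarity(query_vector, vec), text)
              for text, vec in stored.items()]
    return [text for _, text in sorted(scored, reverse=True)]


# Toy 3-dimensional "embeddings" (assumed values, for illustration only).
TOY_STORE = {
    "invoice payment terms": [0.9, 0.1, 0.0],
    "holiday photo caption": [0.0, 0.2, 0.9],
    "late payment penalty": [0.8, 0.3, 0.1],
}
```

Calling rank_chunks([1.0, 0.2, 0.0], TOY_STORE) places the two payment-related chunks ahead of the unrelated one, which is the behaviour a semantic search relies on.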

Setup

The following steps show how to run the application on macOS/Linux.

Prerequisites

  • Python 3
  • Git
  • virtualenv
  • Homebrew

Installation

  1. Clone the repository
git clone https://github.com/menloparklab/langchain-cohere-qdrant-doc-retrieval docQA
  2. Change into the directory
cd docQA
  3. Create and activate a virtual environment
python3 -m venv env
source env/bin/activate
  4. Install the required packages
pip install -r requirements.txt
  5. Install Homebrew

Follow the installation guide on the Homebrew website.

  6. Install the following brew packages
brew install libmagic poppler tesseract libxml2 libxslt
  7. Create a .env file and set the following environment variables:
cohere_api_key="insert here"
openai_api_key="insert here"
qdrant_url="insert here"
qdrant_api_key="insert here"

Replace the values with your own API keys and Qdrant URL.
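
Before starting the server, it can help to confirm that none of the four values is still a placeholder. The sketch below is a hypothetical helper, not part of the repository: it assumes the simple name="value" layout shown above and does not handle every .env syntax.

```python
REQUIRED_SETTINGS = (
    "cohere_api_key",
    "openai_api_key",
    "qdrant_url",
    "qdrant_api_key",
)
PLACEHOLDER = "insert here"


def parse_env_text(text):
    """Parse simple name="value" lines; skip blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        # Drop surrounding double or single quotes, if any.
        values[name.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_settings(values):
    """Return the required names that are absent, empty, or placeholders."""
    missing = []
    for name in REQUIRED_SETTINGS:
        value = values.get(name, "").strip()
        if not value or value == PLACEHOLDER:
            missing.append(name)
    return missing
```

For example, missing_settings(parse_env_text(open(".env").read())) returns an empty list once every value has been filled in.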

Qdrant URL and API key

Sign up for a free Qdrant cloud account and create a new cluster. The cluster provides the qdrant_url and qdrant_api_key values used in the .env file above.

  8. Run the application using the following command:
gunicorn app:app
  9. Access the API endpoints

The API endpoints will be live at the following routes:

  • /embed
  • /retrieve
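
As a rough sketch of how a client might address these routes, the helper below builds requests with the Python standard library. Only the two route names come from this README. The base URL assumes gunicorn's default bind address (127.0.0.1:8000) with no -b option given, and the POST method, JSON body, and "query" field are hypothetical placeholders; check app.py for the actual request format.

```python
import json
from urllib import request

# Assumption: gunicorn started without -b, so it binds to 127.0.0.1:8000.
BASE_URL = "http://127.0.0.1:8000"


def endpoint_url(route, base_url=BASE_URL):
    """Join the base URL and a route such as "/embed" or "/retrieve"."""
    return base_url.rstrip("/") + "/" + route.lstrip("/")


def build_post(route, payload, base_url=BASE_URL):
    """Build (but do not send) a JSON POST request for the given route.

    The JSON body and POST method are assumptions, not the documented API.
    """
    body = json.dumps(payload).encode("utf-8")
    return request.Request(
        endpoint_url(route, base_url),
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical payload: the real field names are not given in this README.
example = build_post("/retrieve", {"query": "payment terms"})
```

With the server running, request.urlopen(example) would send the request; whether the server accepts this payload depends on the real handler.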

Conclusion

You have successfully installed and run the DocQA system on your local machine. Feel free to explore the code and adapt it to your requirements.

If you have any questions, feel free to reach out on Twitter.

Languages

Python 100.0%