orionsolidified / docs-qa-bot

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Docs QA Bot🤖

Docs QA Bot Thumbnail A streamlit app that enables users to interact with the uploaded PDF. You can ask questions or doubts regarding the PDF and our Chatbot would answer them with a friendly response.

Tech stack

  • 🐍Python
  • 🛑🔥Streamlit
  • 🦜️🔗Langchain
  • 🔰Weaviate
  • ❇️OpenAI
  • 🆚Git & Github
  • 🤗Hugging Face (used for testing purpose)
  • 🥭MongoDB (used for testing purpose)

Demo App

Streamlit App

Working

Let's breakdown the working of the app into chunks to make it easier to understand:

  • Upload the PDF
  • Extract the text from the PDF file
  • Generate embeddings of the text
  • Store the embeddings in the vectorstore
  • Retrieve the closest match
  • Display the results in a Chatbot (Interface)

Upload the PDF

image

  • It has to be a file with .pdf extension and it must be within 15 MB for time being.
  • Then this file will be used for further processing.

Extract the text from the PDF file

image

- We need to extract the text from the PDF for which we use [PyPDF2](https://pypdf2.readthedocs.io/en/3.0.0/) library and does its part really well and quick.

Generate embeddings of the text

image

- We are then using generated text and to split the text into small chunks and create documents and are fed as input into the OpenAI Embedding library.

Store the embeddings in the vectorstore

image

- We are storing the embeddings into the Weaviate vectorstore where we have a certain schema to maintain modularity and all the embeddings are stored there.

Retrieve the closest match

image

- We then run the Weaviate hybrid search on the schema, using Langchain and OpenAI that will return the closest match

Display the results in a Chatbot (Interface)

image

- Finally we display the results as a chat like interface provided by Streamlit

About


Languages

Language:Python 100.0%