Docs QA Bot🤖

A streamlit app that enables users to interact with the uploaded PDF. You can ask questions or doubts regarding the PDF and our Chatbot would answer them with a friendly response.

Tech stack

🐍Python
🛑🔥Streamlit
🦜️🔗Langchain
🔰Weaviate
❇️OpenAI
🆚Git & Github
🤗Hugging Face (used for testing purpose)
🥭MongoDB (used for testing purpose)

Demo App

Working

Let's breakdown the working of the app into chunks to make it easier to understand:

Upload the PDF
Extract the text from the PDF file
Generate embeddings of the text
Store the embeddings in the vectorstore
Retrieve the closest match
Display the results in a Chatbot (Interface)

Upload the PDF

It has to be a file with .pdf extension and it must be within 15 MB for time being.
Then this file will be used for further processing.

Extract the text from the PDF file

- We need to extract the text from the PDF for which we use [PyPDF2](https://pypdf2.readthedocs.io/en/3.0.0/) library and does its part really well and quick.

Generate embeddings of the text

- We are then using generated text and to split the text into small chunks and create documents and are fed as input into the OpenAI Embedding library.

Store the embeddings in the vectorstore

- We are storing the embeddings into the Weaviate vectorstore where we have a certain schema to maintain modularity and all the embeddings are stored there.

Retrieve the closest match

- We then run the Weaviate hybrid search on the schema, using Langchain and OpenAI that will return the closest match

Display the results in a Chatbot (Interface)

- Finally we display the results as a chat like interface provided by Streamlit

orionsolidified / docs-qa-bot