In this project, we implement a Retrieval-Augmented Generation (RAG) chatbot that can answer questions about the Next.js documentation.
- Documentation Acquisition: Downloading HTML content from the Next.js Official Documentation.
- HTML Scraping: Extracting critical data, focusing on the `<article>` tag from each page.
- Data Processing: Tokenizing and vectorizing the collected information.
- Data Indexing: Storing the processed data in a Pinecone index for efficient retrieval.
- Chatbot Creation: Using LangChain in conjunction with OpenAI models and the Pinecone index to develop a responsive chatbot.
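To illustrate the scraping step above, here is a minimal sketch of pulling the text inside an `<article>` tag using only the standard library's `HTMLParser`. The actual project may use a different parser (e.g. BeautifulSoup); the class and sample page below are illustrative, not taken from the repository.

```python
from html.parser import HTMLParser

class ArticleExtractor(HTMLParser):
    """Collects text that appears inside an <article> tag."""

    def __init__(self):
        super().__init__()
        self.depth = 0    # nesting level of open <article> tags
        self.chunks = []  # text fragments found inside <article>

    def handle_starttag(self, tag, attrs):
        if tag == "article":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "article" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)

    def text(self):
        # Join fragments and collapse runs of whitespace.
        return " ".join(" ".join(self.chunks).split())

# Hypothetical docs page: only the <article> content should survive.
page = ("<html><body><nav>Menu</nav>"
        "<article><h1>Routing</h1><p>Pages map to routes.</p></article>"
        "</body></html>")
parser = ArticleExtractor()
parser.feed(page)
print(parser.text())  # → Routing Pages map to routes.
```

Restricting extraction to `<article>` keeps navigation menus, footers, and sidebars out of the index, which noticeably improves retrieval quality.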
- Create a Python virtual environment:

  ```shell
  python -m venv .venv
  ```

- Activate the virtual environment:

  ```shell
  source .venv/bin/activate
  ```

- Install dependencies:

  ```shell
  pip install -r requirements.txt
  ```
- Duplicate `.env.template` to create `.env`. Remember to populate it with your OpenAI API key, Pinecone API key, and Pinecone environment name.
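For reference, a populated `.env` might look like the fragment below. The variable names here are assumptions (check `.env.template` for the names the scripts actually read), and the values are placeholders:

```
OPENAI_API_KEY=your-openai-api-key
PINECONE_API_KEY=your-pinecone-api-key
PINECONE_ENVIRONMENT=your-pinecone-environment
```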
- To download the sources, run the following command in your terminal:

  ```shell
  python download_sources.py
  ```
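One detail a download script needs is a stable mapping from each docs URL to a local file. The sketch below shows one such mapping; `DOC_URLS` and `local_path` are hypothetical names, and the network call is deliberately left as a comment so the example stays offline.

```python
import pathlib
from urllib.parse import urlparse

# Hypothetical subset of pages to fetch; the real script would
# enumerate the full Next.js documentation.
DOC_URLS = [
    "https://nextjs.org/docs/app/building-your-application/routing",
    "https://nextjs.org/docs/app/api-reference/functions/fetch",
]

def local_path(url, out_dir="sources"):
    """Map a docs URL to a deterministic local HTML file path."""
    slug = urlparse(url).path.strip("/").replace("/", "_")
    return pathlib.Path(out_dir) / f"{slug}.html"

for url in DOC_URLS:
    print(local_path(url))
    # The real script would fetch the page (e.g. requests.get(url).text)
    # and write the HTML to this path.
```

A deterministic URL-to-path mapping makes re-runs idempotent: already-downloaded pages can be skipped with a simple existence check.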
- To vectorize the sources, run the following command in your terminal:

  ```shell
  python vectorize_sources.py
  ```
- To start the assistant, run the following command in your terminal:

  ```shell
  streamlit run main.py
  ```
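At query time, the assistant embeds the user's question, retrieves the nearest chunks from the Pinecone index, and feeds them to the OpenAI model as context. Pinecone performs the similarity search server-side; the toy dictionary below merely stands in for it to show the nearest-neighbour idea in plain Python.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy stand-in for the Pinecone index: chunk id -> embedding.
index = {
    "routing": [0.9, 0.1, 0.0],
    "styling": [0.1, 0.8, 0.2],
}

# Embedding of a hypothetical question about routing.
query = [0.85, 0.15, 0.05]

best = max(index, key=lambda k: cosine(index[k], query))
print(best)  # → routing
```

The retrieved chunk(s) are then inserted into the model's prompt, which is what lets the chatbot ground its answers in the Next.js documentation rather than in the model's training data alone.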