A Streamlit-based chatbot powered by Retrieval-Augmented Generation (RAG) and OpenAI. Upload your PDFs and chat with them! This app uses LangChain, FAISS, and OpenAI's GPT models to extract and query document content, and its answers cite the source file name and page number.
- Upload multiple PDFs and query across all of them
- Metadata-rich answers with filename and page references
- Uses LangChain + FAISS for semantic search
- Streamlit Chat UI for natural conversation
- OpenAI API support with streaming responses
.
├── .gitignore
├── LICENSE
├── README.md             # ← You're reading it
├── app.py                # Main Streamlit app
├── brain.py              # PDF parsing and vector index logic
├── compare medium.gif    # Optional UI illustration
├── requirements.txt      # Python dependencies
└── thumbnail.webp        # Preview image
git clone https://github.com/co-dev0909/chatbot-using-rag-and-langchain.git
cd chatbot-using-rag-and-langchain
pip install -r requirements.txt

Create a .streamlit/secrets.toml file with:

OPENAI_API_KEY = "your-openai-key"

Or export it via environment variable:

export OPENAI_API_KEY="your-openai-key"

streamlit run app.py

- Upload PDFs via the UI
- Each PDF is parsed using `PyPDF2` and chunked via LangChain's `RecursiveCharacterTextSplitter`
- Chunks are embedded using OpenAI Embeddings
- Stored in a FAISS vector store for semantic similarity search
- Queries are matched to top PDF chunks and passed to ChatGPT with context
- Answers include file name and page number metadata for citation
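
The steps above can be sketched without any external services. This is a minimal, standard-library-only illustration and not the code in `brain.py`: a word-count vector stands in for OpenAI embeddings, a brute-force cosine search stands in for FAISS, and a fixed-size character window stands in for `RecursiveCharacterTextSplitter`. The names `Chunk`, `chunk_pages`, `embed`, and `top_chunks` are hypothetical.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    filename: str
    page: int  # 1-based page number, kept so answers can cite it


def chunk_pages(filename, pages, chunk_size=200, overlap=50):
    """Split each page's text into overlapping character windows (assumed sizes)."""
    chunks = []
    step = chunk_size - overlap
    for page_number, page_text in enumerate(pages, start=1):
        for start in range(0, max(len(page_text), 1), step):
            piece = page_text[start:start + chunk_size].strip()
            if piece:
                chunks.append(Chunk(piece, filename, page_number))
    return chunks


def embed(text):
    """Toy embedding: lowercase word counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def top_chunks(query, chunks, k=2):
    """Brute-force similarity search (stand-in for a FAISS index lookup)."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c.text)), reverse=True)
    return ranked[:k]


# Two "pages" of an imaginary PDF, already extracted to plain text.
pages = [
    "Introduction. This report studies solar panel efficiency.",
    "Methods. We measured output voltage under controlled lighting.",
]
chunks = chunk_pages("example.pdf", pages)
best = top_chunks("How was voltage measured?", chunks, k=1)[0]
print(best.filename, best.page)
```

Because each chunk carries its `filename` and `page`, the metadata survives retrieval and can be attached to the final answer.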
- Streamlit: UI framework
- LangChain: PDF chunking and retrieval
- FAISS: Vector search backend
- OpenAI GPT: LLM-based answer generation
- PyPDF2: PDF parsing
"What are the main points from the introduction?"
Answer: The introduction highlights... (example.pdf, page 1)
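
One way the citation format in this example could be produced is to tag each retrieved chunk with its source before it is sent to the model. The sketch below is an assumption about the approach, not the app's actual prompt: `format_citation` and `build_prompt` are hypothetical helpers, and the dict shape for retrieved chunks is invented for illustration.

```python
def format_citation(filename, page):
    """Render a source reference in the style shown above."""
    return f"({filename}, page {page})"


def build_prompt(question, retrieved):
    """Assemble a context-grounded prompt.

    `retrieved` is a list of dicts with 'text', 'filename', and 'page'
    keys (a hypothetical shape chosen for this sketch).
    """
    context_lines = [
        f"[{i}] {c['text']} {format_citation(c['filename'], c['page'])}"
        for i, c in enumerate(retrieved, start=1)
    ]
    return (
        "Answer using only the context below and cite the file name and page.\n\n"
        "Context:\n" + "\n".join(context_lines) + f"\n\nQuestion: {question}"
    )


retrieved = [
    {"text": "The introduction highlights...", "filename": "example.pdf", "page": 1},
]
prompt = build_prompt("What are the main points from the introduction?", retrieved)
print(prompt)
```

Keeping the citation next to each context snippet gives the model what it needs to repeat the file name and page number in its answer.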
This project is licensed under the MIT License.
Made with ❤️ by co-dev0909. Contributions welcome!