An application that lets users upload PDF documents and ask questions about their content using LangChain and Ollama. The system splits each document into chunks, stores their embeddings in a vector store for retrieval, and returns concise, context-aware answers.
- PDF document upload and processing
- Text chunking and embedding generation
- Semantic search for relevant context retrieval
- Question answering using the Deepseek language model
- Streamlit-based user interface
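The semantic search step can be illustrated with a minimal sketch. It assumes each chunk and the question have already been turned into embedding vectors; the toy three-dimensional vectors and the helper names `cosine_similarity` and `top_k` are hypothetical and stand in for whatever the embedding model and vector store in main.py actually provide.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, chunk_vecs, k=2):
    """Return the indices of the k chunk vectors most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), idx) for idx, vec in enumerate(chunk_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in scored[:k]]


# Toy data: made-up embeddings, not output from a real model.
chunks = [
    "Invoices are due in 30 days.",
    "The office is closed on Sundays.",
    "Late payments incur a 2% fee.",
]
chunk_vecs = [[0.9, 0.1, 0.0], [0.0, 0.2, 0.9], [0.8, 0.3, 0.1]]
query_vec = [1.0, 0.2, 0.0]  # pretend embedding of "When do I have to pay?"

best = top_k(query_vec, chunk_vecs, k=2)
for idx in best:
    print(chunks[idx])
```

The two payment-related chunks rank highest because their vectors point in nearly the same direction as the query vector, which is the property a vector store exploits at scale.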
- Python 3.9+
- Ollama installed and running locally
- The Deepseek model downloaded in Ollama
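These prerequisites can be checked with a short script before starting the app. This is a sketch, assuming Ollama listens on its default address `http://localhost:11434` and that its `/api/tags` endpoint lists locally downloaded models; the helper names are hypothetical and not part of the project.

```python
import json
import sys
import urllib.error
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # assumed default Ollama address
MODEL_NAME = "deepseek-r1:1.5b"


def python_ok(minimum=(3, 9)):
    """Return True if the running interpreter meets the minimum version."""
    return sys.version_info >= minimum


def list_ollama_models(base_url=OLLAMA_URL, timeout=2.0):
    """Return the names of local models, or None if Ollama cannot be reached."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            payload = json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        return None
    return [model.get("name", "") for model in payload.get("models", [])]


def model_available(names, model=MODEL_NAME):
    """Return True if the model name appears in the list reported by Ollama."""
    return names is not None and model in names


if __name__ == "__main__":
    names = list_ollama_models()
    print("Python 3.9+:", python_ok())
    print("Ollama reachable:", names is not None)
    print(f"{MODEL_NAME} downloaded:", model_available(names))
```

If the last line reports `False` while Ollama is reachable, `ollama pull deepseek-r1:1.5b` downloads the model.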
- Clone the repository:
git clone https://github.com/yourusername/pdf-qa-system.git
cd pdf-qa-system
- Create and activate a virtual environment:
python -m venv .venv
source .venv/bin/activate # On Windows use: .venv\Scripts\activate
- Install the required packages:
pip install -r requirements.txt
pdf-qa-system/
├── main.py # Main application file
├── pdfs/ # Directory for uploaded PDFs
├── requirements.txt # Project dependencies
├── .gitignore # Git ignore file
└── README.md # Project documentation
streamlit
pypdf
langchain
langchain-community
langchain-core
langchain-ollama
- Start the Ollama service and ensure the Deepseek model is available:
ollama run deepseek-r1:1.5b
- Run the Streamlit application:
streamlit run main.py
- Access the application in your web browser at http://localhost:8501
- Upload a PDF document using the file uploader
- Ask questions about the document content using the chat input
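When a question is submitted, the retrieved chunks and the question are typically combined into a single prompt for the model. The sketch below shows one way this could look; the template wording, the `build_prompt` name, and the `max_sentences` parameter are assumptions for illustration and are not taken from main.py.

```python
def build_prompt(question, context_chunks, max_sentences=3):
    """Combine retrieved chunks and the user's question into one prompt string."""
    # Drop empty chunks and separate the rest with blank lines.
    context = "\n\n".join(chunk.strip() for chunk in context_chunks if chunk.strip())
    return (
        "Answer the question using only the context below. "
        f"Use at most {max_sentences} sentences. "
        "If the context does not contain the answer, say you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question.strip()}\n"
        "Answer:"
    )


prompt = build_prompt(
    "What is the late payment fee?",
    ["Invoices are due in 30 days.", "Late payments incur a 2% fee."],
)
print(prompt)
```

Limiting the answer length and restricting the model to the supplied context is a common way to keep answers concise and grounded in the uploaded document.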
Key parameters can be adjusted in the Config class within main.py:
- CHUNK_SIZE: Size of text chunks (default: 1000)
- CHUNK_OVERLAP: Overlap between chunks (default: 200)
- MODEL_NAME: Ollama model to use (default: "deepseek-r1:1.5b")
- MAX_RETRIES: Maximum retry attempts for operations (default: 3)
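To show how the chunk size and overlap interact, here is a minimal sketch. The `Config` dataclass mirrors the documented parameter names and defaults, but its exact shape in main.py is an assumption, and `chunk_text` is a hypothetical character-based splitter written for illustration, not the splitter the application uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    """Assumed shape of the settings; names and defaults follow the list above."""
    CHUNK_SIZE: int = 1000
    CHUNK_OVERLAP: int = 200
    MODEL_NAME: str = "deepseek-r1:1.5b"
    MAX_RETRIES: int = 3


def chunk_text(text, size, overlap):
    """Split text into windows of `size` characters that share `overlap` characters."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("size must be positive and overlap must satisfy 0 <= overlap < size")
    step = size - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # this window already reached the end of the text
    return chunks


config = Config()
sample = "".join(str(i % 10) for i in range(2500))  # 2500 characters of dummy text
pieces = chunk_text(sample, config.CHUNK_SIZE, config.CHUNK_OVERLAP)
print([len(piece) for piece in pieces])
```

With the defaults, each window advances by 800 characters, so consecutive chunks share 200 characters. A larger overlap reduces the chance that a sentence relevant to a question is cut in half at a chunk boundary, at the cost of storing more chunks.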
- Fork the repository
- Create a feature branch
- Commit your changes
- Push to the branch
- Create a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- LangChain for the document processing pipeline
- Ollama for the local language model hosting
- Streamlit for the web interface framework