Article_Research_Tool is a retrieval-augmented generation (RAG) application for extracting and analyzing information from news articles, built with Python, Streamlit, LangChain, FAISS, and open-source language models.
- Researchers, journalists, and everyday users often need specific information from lengthy articles or websites. Finding it is time-consuming because they must read entire articles or web pages to locate the details they need.
- For instance, researchers might need to understand a specific topic within an article, while consumers may want to find particular policy rules or product features before making a purchase decision.
- Article_Research_Tool streamlines this process by giving precise, concise answers to user queries, saving time and effort.
- A clean and intuitive Streamlit interface.
- Sidebar input for up to three news article URLs.
- Utilizes WebBaseLoader for scraping and parsing articles.
- Supports dynamic web pages with customizable class names for precise data extraction.
- Uses HuggingFaceEmbeddings with the sentence-transformers all-MiniLM-L6-v2 model to generate text embeddings.
- Configurable model parameters and encoding options for optimized performance.
- Integrates FAISS for efficient vector storage and retrieval.
- Uses the open-source llama3-8b-8192 model, served through Groq, for question answering and information retrieval, ensuring transparency and flexibility.
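
The ingestion steps listed above (URL input, loading, embedding, indexing) could be wired together roughly as follows. This is a minimal sketch, not the project's actual code: the function names `clean_urls` and `build_index`, the chunk sizes, and the device and normalization settings are illustrative assumptions, and the import paths depend on which LangChain packages and versions are installed.

```python
from typing import List, Sequence

MAX_URLS = 3  # the sidebar accepts up to three article URLs


def clean_urls(raw_urls: Sequence[str], limit: int = MAX_URLS) -> List[str]:
    """Strip whitespace, keep only http(s) URLs, drop duplicates, cap at `limit`."""
    seen, cleaned = set(), []
    for url in raw_urls:
        url = url.strip()
        if url.startswith(("http://", "https://")) and url not in seen:
            seen.add(url)
            cleaned.append(url)
    return cleaned[:limit]


def build_index(urls: Sequence[str], css_classes: Sequence[str] = ()):
    """Load the articles, split them into chunks, embed them, and index them in FAISS."""
    # Third-party imports stay local so clean_urls() works without them installed.
    import bs4
    from langchain_community.document_loaders import WebBaseLoader
    from langchain_community.vectorstores import FAISS
    from langchain_huggingface import HuggingFaceEmbeddings
    from langchain_text_splitters import RecursiveCharacterTextSplitter

    # Optionally restrict parsing to elements with the given class names.
    bs_kwargs = {}
    if css_classes:
        bs_kwargs["parse_only"] = bs4.SoupStrainer(class_=tuple(css_classes))
    docs = WebBaseLoader(web_paths=list(urls), bs_kwargs=bs_kwargs).load()

    # Chunk sizes are assumed values, not taken from the project.
    splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
    chunks = splitter.split_documents(docs)

    embeddings = HuggingFaceEmbeddings(
        model_name="sentence-transformers/all-MiniLM-L6-v2",
        model_kwargs={"device": "cpu"},                # assumed: CPU inference
        encode_kwargs={"normalize_embeddings": True},  # assumed: unit-length vectors
    )
    return FAISS.from_documents(chunks, embeddings)
```

Calling `build_index(clean_urls(user_input))` would return a FAISS vector store that can later be searched with `similarity_search`.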
git clone https://github.com/theshubh007/Article_Research_Tool_Based_on_OpenSouceLLMs.git
python -m venv .venv
.\.venv\Scripts\activate
pip install -r requirements.txt
GROQ_API_KEY=your_groq_api_key
streamlit run app.py
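
If the app fails at startup, a missing or placeholder API key is a likely cause. The helper below is a hypothetical check, not part of the project, that fails early with a readable message. It assumes the key is exposed as the `GROQ_API_KEY` environment variable; if the key lives in a `.env` file, something such as python-dotenv's `load_dotenv()` would need to run first.

```python
import os
from typing import Mapping, Optional


def get_groq_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return GROQ_API_KEY, or raise if it is missing or still the placeholder."""
    env = os.environ if env is None else env
    key = env.get("GROQ_API_KEY", "").strip()
    if not key or key == "your_groq_api_key":
        raise RuntimeError(
            "Set GROQ_API_KEY in your environment or .env file before starting the app."
        )
    return key
```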
- Streamlit: Provides the interactive web interface.
- LangChain: Handles document loading, text splitting, and prompt management.
- FAISS: Manages the vector storage and retrieval of document embeddings.
- HuggingFaceEmbeddings: Generates text embeddings using a pre-trained model.
- ChatGroq: Provides advanced language modeling for question answering.
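
At query time, the components above might cooperate as in the sketch below: FAISS returns the most similar chunks, a prompt is assembled from them, and ChatGroq generates the answer. The prompt wording, the function names `build_prompt` and `answer`, and the values `k=4` and `temperature=0` are assumptions made for illustration; the project's own prompt and chain may differ.

```python
from typing import Sequence

# Assumed prompt wording; the project's actual prompt may differ.
PROMPT_TEMPLATE = (
    "Answer the question using only the context below. "
    "If the context does not contain the answer, say so.\n\n"
    "Context:\n{context}\n\nQuestion: {question}"
)


def build_prompt(question: str, passages: Sequence[str]) -> str:
    """Join the non-empty retrieved passages and fill in the prompt template."""
    context = "\n\n".join(p.strip() for p in passages if p.strip())
    return PROMPT_TEMPLATE.format(context=context, question=question.strip())


def answer(question: str, vector_store, k: int = 4) -> str:
    """Retrieve the k most similar chunks and ask the Groq-hosted model."""
    # Local import so build_prompt() works without langchain_groq installed.
    from langchain_groq import ChatGroq  # reads GROQ_API_KEY from the environment

    docs = vector_store.similarity_search(question, k=k)
    prompt = build_prompt(question, [d.page_content for d in docs])
    llm = ChatGroq(model="llama3-8b-8192", temperature=0)
    return llm.invoke(prompt).content
```

Keeping prompt construction separate from the model call makes the prompt easy to inspect and test without network access.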
- Support for Additional Languages: Extend the tool to support news articles in multiple languages.
- Improved Scraping Capabilities: Enhance the scraping logic to handle a wider variety of news websites.
- Enhanced Visualization: Add more visualization options for the retrieved data and analysis results.
Contributions are welcome! Please submit a pull request or open an issue to discuss any changes.