damiangilgonzalez1995 / TalkDocument

This project is about how to talk with a document using LLM

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Project Name: Talk with Document using LLM's

Diagram of how the web application works πŸ“·:

Schema

Description

πŸ€– The Interactive Document Query System project offers a dynamic web application that empowers users to inquire about documents available in PDF, TXT, or URL formats. Behind this project lies a robust technological stack, including embeddings, vector storage, distance calculation algorithms like FAISS, and a large language model for facilitating user-document interactions.

Demo πŸŽ₯

Demo Video

Project Structure

The project is organized as follows:

TalkDocument
β”œβ”€ .streamlit
β”‚  └─ config.toml
β”œβ”€ data
β”œβ”€ example
β”œβ”€ README.md
β”œβ”€ requirements.txt
β”œβ”€ resources
β”œβ”€ setup.py
└─ src
   β”œβ”€ Home.py
   β”œβ”€ pages
   β”‚  β”œβ”€ 1_Step 1️⃣ Create Data Base.py
   β”‚  β”œβ”€ 2_Step 2️⃣ Ask to the document.py
   β”‚  β”œβ”€ __init__.py
   β”‚
   β”œβ”€ qa_tool.py
   β”œβ”€ style.py
   β”œβ”€ utils
   β”‚  β”œβ”€ util.py
   β”‚  └─ __init__.py
   β”œβ”€ __init__.py

  • The .streamlit directory contains the Streamlit configuration file config.toml for customizing the web application's behavior.
  • The data directory holds sample documents (test.pdf and test.txt) that will be used for creating the database and querying.
  • The docs directory is intended for documentation-related assets, such as images.
  • 'README.md' (this file) is the project's main documentation file.
  • 'requirements.txt' lists the required Python packages for setting up the project environment.
  • The src directory contains the main source code for the project.
    • Home.py likely represents the main application entry point or landing page.
    • The pages directory includes the implementation for different steps/pages of the application.
    • qa_tool.py defines the TalkDocument class responsible for creating the database and handling queries.
    • Other utility files like style.py and utils.py might provide styling and helper functions, respectively.

Requirements

To successfully utilize the Interactive Document Query System, you must satisfy the following prerequisites:

Warning A free API key from Hugging Face Hub: The system employs Hugging Face models for embedding and vector storage. Obtain your API key by registering on the Hugging Face website.

Warning Optionally, an API key from OpenAI (if using OpenAI embedding): If you choose to utilize OpenAI's embedding model, you'll need an OpenAI API key. Register on the OpenAI platform to acquire your key.

Getting Started

To launch the application, follow these steps:

  1. Clone the repository to your local machine.
  2. Install the required dependencies using pip install -r requirements.txt.
  3. Open your terminal and navigate to the project's root directory.
  4. Run the following command:
streamlit run yourpath/TalkDocument/src/home.py

Credits

This project was developed by DamiΓ‘n Gil GonzΓ‘lez.

About

This project is about how to talk with a document using LLM


Languages

Language:Python 100.0%