The Interactive Document Query System is a web application that lets users ask questions about documents supplied as PDF files, TXT files, or URLs. It is built on embeddings, a vector store, similarity search with FAISS, and a large language model that mediates the interaction between the user and the document.
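As a rough illustration of the retrieval idea described above, the sketch below indexes a few text chunks and returns the one closest to a question. Everything in it is an assumption made for illustration, not the project's implementation: `ToyVectorStore`, `tokenize`, and the sample chunks are hypothetical, a normalised bag-of-words count vector stands in for a learned embedding model, and an exhaustive L2 scan stands in for a FAISS index (a flat FAISS index performs the same exact search, only much faster).

```python
import math
import re


def tokenize(text):
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def l2_squared(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class ToyVectorStore:
    """Hypothetical stand-in for an embedding model plus a vector index."""

    def __init__(self, chunks):
        self.chunks = list(chunks)
        # The vocabulary is fixed by the indexed chunks; unknown query
        # words are simply ignored when embedding.
        vocab = sorted({tok for c in self.chunks for tok in tokenize(c)})
        self.index = {tok: i for i, tok in enumerate(vocab)}
        self.vectors = [self.embed(c) for c in self.chunks]

    def embed(self, text):
        """Toy embedding: L2-normalised word counts over the vocabulary."""
        vec = [0.0] * len(self.index)
        for tok in tokenize(text):
            i = self.index.get(tok)
            if i is not None:
                vec[i] += 1.0
        norm = math.sqrt(sum(v * v for v in vec))
        return [v / norm for v in vec] if norm else vec

    def search(self, query, k=1):
        """Return the k chunks with the smallest L2 distance to the query."""
        q = self.embed(query)
        scored = sorted(
            (l2_squared(q, v), chunk)
            for v, chunk in zip(self.vectors, self.chunks)
        )
        return [chunk for _, chunk in scored[:k]]


store = ToyVectorStore([
    "FAISS performs fast similarity search over dense vectors.",
    "Streamlit turns Python scripts into web applications.",
    "Embeddings map text to numeric vectors.",
])
context = store.search("Which library does similarity search?", k=1)
# In a real pipeline, `context` and the question would be passed to the LLM.
print(context)
```

The two-phase shape (build the store once, then query it repeatedly) mirrors the application's two pages, one for creating the database and one for asking questions.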
The project is organized as follows:
TalkDocument
├── .streamlit
│   └── config.toml
├── data
├── example
├── README.md
├── requirements.txt
├── resources
├── setup.py
└── src
    ├── Home.py
    ├── pages
    │   ├── 1_Step 1️⃣ Create Data Base.py
    │   ├── 2_Step 2️⃣ Ask to the document.py
    │   └── __init__.py
    │
    ├── qa_tool.py
    ├── style.py
    ├── utils
    │   ├── util.py
    │   └── __init__.py
    └── __init__.py
- The .streamlit directory contains the Streamlit configuration file config.toml for customizing the web application's behavior.
- The data directory holds sample documents (test.pdf and test.txt) that will be used for creating the database and querying.
- The docs directory is intended for documentation-related assets, such as images.
- README.md (this file) is the project's main documentation file.
- requirements.txt lists the Python packages required to set up the project environment.
- The src directory contains the main source code for the project.
- Home.py is the application's entry point and landing page.
- The pages directory includes the implementation for different steps/pages of the application.
- qa_tool.py defines the TalkDocument class responsible for creating the database and handling queries.
- style.py and utils/util.py provide styling and helper functions, respectively.
To use the Interactive Document Query System, you need the following:
- A free API key from the Hugging Face Hub. The system uses Hugging Face models to compute the embeddings that go into the vector store. Obtain your API key by registering on the Hugging Face website.
- Optionally, an API key from OpenAI. This is only needed if you choose OpenAI's embedding model. Register on the OpenAI platform to acquire your key.
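The README does not say how the keys are passed to the application, so the following is only a sketch under stated assumptions: it supposes the keys are supplied through environment variables and uses the names `HUGGINGFACEHUB_API_TOKEN` and `OPENAI_API_KEY`, which are the names conventionally read by LangChain's Hugging Face integration and by the OpenAI client. The helper `missing_keys` is hypothetical. A check like this can be run before launching to catch a missing key early.

```python
import os

# Assumed variable names: conventional for common client libraries,
# not confirmed by this project.
HF_VAR = "HUGGINGFACEHUB_API_TOKEN"
OPENAI_VAR = "OPENAI_API_KEY"


def missing_keys(env, use_openai=False):
    """Return the names of required API-key variables that are unset or blank."""
    required = [HF_VAR]
    if use_openai:
        # The OpenAI key is only required when OpenAI embeddings are selected.
        required.append(OPENAI_VAR)
    return [name for name in required if not env.get(name, "").strip()]


missing = missing_keys(os.environ, use_openai=False)
if missing:
    print("Missing API keys:", ", ".join(missing))
else:
    print("All required API keys are set.")
```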
To launch the application, follow these steps:
- Clone the repository to your local machine.
- Open your terminal and navigate to the project's root directory.
- Install the required dependencies:
  pip install -r requirements.txt
- Run the application:
  streamlit run yourpath/TalkDocument/src/Home.py
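For anyone who prefers to start the application from a Python script, the launch step can be sketched as below. The functions `streamlit_command` and `launch` are hypothetical helpers, not part of the project. The sketch relies on Streamlit being runnable as a module (`python -m streamlit run`) and spells the entry file `Home.py`, as the project tree lists it, which matters on case-sensitive file systems.

```python
import subprocess
import sys
from pathlib import Path


def streamlit_command(project_root, entry="src/Home.py"):
    """Build the argument list that starts the app with the current interpreter."""
    script = Path(project_root) / entry
    return [sys.executable, "-m", "streamlit", "run", str(script)]


def launch(project_root):
    """Start the app and block until the Streamlit server exits."""
    cmd = streamlit_command(project_root)
    script = Path(cmd[-1])
    if not script.is_file():
        raise FileNotFoundError(f"Entry script not found: {script}")
    subprocess.run(cmd, check=True)


# Example (not executed here): launch("yourpath/TalkDocument")
```

Using `sys.executable` keeps the launch inside the same virtual environment that the dependencies were installed into.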
This project was developed by Damián Gil González.