jellchou / IncarnaMind

Connect and chat with your multiple documents (pdf and txt) through GPT, Claude and Local Open-Source LLMs

🧠 IncarnaMind

👀 In a Nutshell

IncarnaMind enables you to chat with your personal documents 📁 (PDF, TXT) using Large Language Models (LLMs) like GPT (architecture overview). While OpenAI has recently launched a fine-tuning API for GPT models, it doesn't enable the base pretrained models to learn new data, and the responses can be prone to factual hallucinations. Our Sliding Window Chunking mechanism and Ensemble Retriever enable efficient querying of both fine-grained and coarse-grained information within your ground-truth documents to augment the LLMs.

Feel free to use it. We welcome any feedback and new feature suggestions 🙌.

✨ New Updates

Open-Source and Local LLMs Support

  • Recommended Model: We've primarily tested with the Llama2 series models and recommend using llama2-70b-chat (either full or GGUF version) for optimal performance. Feel free to experiment with other LLMs.
  • System Requirements: It requires more than 35GB of GPU RAM to run the GGUF quantized version.
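As a rough sanity check on the 35GB figure, the memory needed for the weights alone can be estimated from the parameter count and the number of bits stored per weight. The sketch below assumes roughly 4 bits per weight for the quantized model; the README does not state the quantization level, so treat that value as an assumption. The estimate ignores the KV cache, activations, and runtime overhead, which is consistent with the requirement being "more than" 35GB.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory for the model weights alone, in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


# 70 billion parameters at several precisions.
# The 4-bit case is an assumed quantization level, not a documented one.
for label, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label}: ~{weight_memory_gb(70e9, bits):.0f} GB")
```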

Alternative Open-Source LLMs Options

  • Insufficient RAM: If you're limited by GPU RAM, consider using the Together.ai API. It supports llama2-70b-chat and most other open-source LLMs. Plus, you get $25 in free usage.
  • Upcoming: Smaller, cost-effective, fine-tuned models will be released in the future.

How to use GGUF models

  • For instructions on acquiring and using a quantized GGUF LLM (a format similar to GGML), please refer to this video (from 10:45 to 12:30).

Here is a comparison table of the different models I tested, for reference only:

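| Metrics   | GPT-4  | GPT-3.5 | Claude 2.0 | Llama2-70b | Llama2-70b-gguf | Llama2-70b-api |
|-----------|--------|---------|------------|------------|-----------------|----------------|
| Reasoning | High   | Medium  | High       | Medium     | Medium          | Medium         |
| Speed     | Medium | High    | Medium     | Very Low   | Low             | Medium         |
| GPU RAM   | N/A    | N/A     | N/A        | Very High  | High            | N/A            |
| Safety    | Low    | Low     | Low        | High       | High            | Low            |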

💻 Demo

Demo.mp4

💡 Challenges Addressed

  • Fixed Chunking: Traditional RAG tools rely on fixed chunk sizes, limiting their adaptability in handling varying data complexity and context.

  • Precision vs. Semantics: Current retrieval methods usually focus either on semantic understanding or precise retrieval, but rarely both.

  • Single-Document Limitation: Many solutions can only query one document at a time, restricting multi-document information retrieval.

  • Stability: IncarnaMind is compatible with OpenAI GPT, Anthropic Claude, Llama2, and other open-source LLMs, ensuring stable parsing.
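To illustrate the "Precision vs. Semantics" point, here is a minimal sketch of how results from a keyword-style retriever and a semantic retriever can be merged into one ranking. It uses reciprocal rank fusion, a common general-purpose technique; this is not necessarily how IncarnaMind's Ensemble Retriever combines results. The function name, the chunk IDs, and the constant `k=60` are all illustrative assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of IDs into a single ranking.

    Each item earns 1 / (k + rank) from every list it appears in, so
    items ranked well by more than one retriever rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest score first; ties are broken by ID for a stable result.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from two retrievers over the same chunk store.
keyword_hits = ["chunk_3", "chunk_1", "chunk_7"]   # exact-term matching
semantic_hits = ["chunk_1", "chunk_5", "chunk_3"]  # embedding similarity

fused = reciprocal_rank_fusion([keyword_hits, semantic_hits])
print(fused)
```

Chunks that both retrievers return (here `chunk_1` and `chunk_3`) end up ahead of chunks found by only one of them.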

🎯 Key Features

  • Adaptive Chunking: Our Sliding Window Chunking technique dynamically adjusts window size and position for RAG, balancing fine-grained and coarse-grained data access based on data complexity and context.

  • Multi-Document Conversational QA: Supports simple and multi-hop queries across multiple documents simultaneously, breaking the single-document limitation.

  • File Compatibility: Supports both PDF and TXT file formats.

  • LLM Model Compatibility: Supports OpenAI GPT, Anthropic Claude, Llama2 and other open-source LLMs.

πŸ— Architecture

High Level Architecture

[Image: high-level architecture diagram]

Sliding Window Chunking

[Image: Sliding Window Chunking diagram]
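The general idea of balancing fine-grained and coarse-grained access can be sketched as follows: index small chunks for precise matching, then widen a window around a matching chunk to recover its surrounding context. This is a simplified illustration, not the project's actual algorithm. The function names are hypothetical, and the fixed `radius` parameter stands in for the window size, which IncarnaMind is described as adjusting dynamically.

```python
def split_into_chunks(words, size):
    """Split a list of words into consecutive small chunks of `size` words."""
    return [words[i:i + size] for i in range(0, len(words), size)]


def expand_window(chunks, hit_index, radius):
    """Return the text of the hit chunk plus `radius` neighbors on each side.

    The window is clipped at the document boundaries.
    """
    start = max(0, hit_index - radius)
    end = min(len(chunks), hit_index + radius + 1)
    return " ".join(word for chunk in chunks[start:end] for word in chunk)


words = "a b c d e f g h i j".split()
chunks = split_into_chunks(words, size=2)

print(expand_window(chunks, hit_index=2, radius=0))  # fine-grained: the hit only
print(expand_window(chunks, hit_index=2, radius=1))  # coarser: hit plus neighbors
```

A small radius suits narrow factual lookups, while a larger radius gives the LLM more context for questions that span several sentences.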

🚀 Getting Started

1. Installation

Installation is simple: you just need to run a few commands.

1.0. Prerequisites

1.1. Clone the repository

git clone https://github.com/junruxiong/IncarnaMind
cd IncarnaMind

1.2. Setup

Create a Conda virtual environment:

conda create -n IncarnaMind python=3.10

Activate:

conda activate IncarnaMind

Install all requirements:

pip install -r requirements.txt

Install llama-cpp separately if you want to run quantized local LLMs:

  • For NVIDIA GPU support, use cuBLAS:
CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python==0.1.83 --no-cache-dir
  • For Apple Metal (M1/M2) support, use:
CMAKE_ARGS="-DLLAMA_METAL=on" FORCE_CMAKE=1 pip install llama-cpp-python==0.1.83 --no-cache-dir

Set up one or all of your API keys in the configparser.ini file:

[tokens]
OPENAI_API_KEY = (replace_me)
ANTHROPIC_API_KEY = (replace_me)
TOGETHER_API_KEY = (replace_me)
# If you use the full Meta-Llama models, you may need a Hugging Face token to access them.
HUGGINGFACE_TOKEN = (replace_me)
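If you want to check which keys are actually filled in, the `[tokens]` section can be read with Python's standard `configparser` module. The sketch below is an assumption about how such a check might look, not code from the project. The helper name `load_tokens` is hypothetical, and it treats any value still equal to `(replace_me)` as unset.

```python
import configparser

PLACEHOLDER = "(replace_me)"


def load_tokens(path="configparser.ini"):
    """Return the keys from the [tokens] section that have been filled in.

    Values still set to the placeholder, or left empty, are skipped.
    """
    # interpolation=None keeps '%' characters in values from being
    # interpreted as configparser interpolation syntax.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(path)  # a missing file is silently ignored
    if not parser.has_section("tokens"):
        return {}
    # configparser lowercases option names by default, so restore the
    # upper-case form used in the file.
    return {
        name.upper(): value
        for name, value in parser.items("tokens")
        if value and value != PLACEHOLDER
    }


if __name__ == "__main__":
    print("Configured keys:", sorted(load_tokens()))
```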

(Optional) Set up your custom parameters in the configparser.ini file:

[parameters]
PARAMETERS 1 = (replace_me)
PARAMETERS 2 = (replace_me)
...
PARAMETERS n = (replace_me)

2. Usage

2.1. Upload and process your files

Put all your files into the /data directory (give each file an accurate, descriptive name to maximize performance), then run the following command to ingest all the data. You can delete the example files in the /data directory before running the command.

python docs2db.py
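Before ingesting, it can help to confirm which files in the data directory have a supported extension. The snippet below is a standalone pre-flight check, not part of `docs2db.py`, and it does not reproduce that script's logic. The function name and the relative `data` path are assumptions.

```python
from pathlib import Path

SUPPORTED_SUFFIXES = {".pdf", ".txt"}


def find_supported_files(data_dir="data"):
    """List the PDF and TXT files directly inside `data_dir`, sorted by path."""
    root = Path(data_dir)
    if not root.is_dir():
        return []
    return sorted(
        path
        for path in root.iterdir()
        if path.is_file() and path.suffix.lower() in SUPPORTED_SUFFIXES
    )


if __name__ == "__main__":
    for path in find_supported_files():
        print(path.name)
```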

2.2. Run

To start the conversation, run:

python main.py

2.3. Chat and ask any questions

Wait for the script to prompt you for input, as shown below.

Human:

2.4. Others

When you start a chat, the system automatically generates an IncarnaMind.log file. To change the logging behavior, edit the [logging] section of the configparser.ini file.

[logging]
enabled = True
level = INFO
filename = IncarnaMind.log
format = %(asctime)s [%(levelname)s] %(name)s: %(message)s
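One detail worth knowing if you read this section yourself: with `configparser`'s default interpolation, the `%(asctime)s` placeholders in `format` are treated as references to other options and raise an error on lookup. The sketch below disables interpolation to avoid that. It is an illustration of how these four settings could be applied to Python's `logging` module, not the project's own loader; both function names are hypothetical.

```python
import configparser
import logging

DEFAULT_FORMAT = "%(asctime)s [%(levelname)s] %(name)s: %(message)s"


def read_logging_settings(path="configparser.ini"):
    """Read the [logging] section, falling back to the values shown above."""
    # interpolation=None stops configparser from trying to expand the
    # %(asctime)s-style placeholders in the format string.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(path)
    return {
        "enabled": parser.getboolean("logging", "enabled", fallback=True),
        "level": parser.get("logging", "level", fallback="INFO"),
        "filename": parser.get("logging", "filename", fallback="IncarnaMind.log"),
        "format": parser.get("logging", "format", fallback=DEFAULT_FORMAT),
    }


def configure_logging(settings):
    """Apply the settings to the root logger, or silence logging if disabled."""
    if not settings["enabled"]:
        logging.disable(logging.CRITICAL)
        return
    logging.basicConfig(
        filename=settings["filename"],
        level=settings["level"].upper(),
        format=settings["format"],
    )
```

Calling `configure_logging(read_logging_settings())` once at startup would then route log records to the configured file.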

🚫 Limitations

  • Citation is not supported in the current version but will be released soon.
  • Limited asynchronous capabilities.

πŸ“ Upcoming Features

  • Frontend UI
  • Fine-tuned, small open-source LLMs
  • OCR support
  • Asynchronous optimization
  • Support for more document formats

🙌 Acknowledgements

Special thanks to Langchain, Chroma DB, LocalGPT, Llama-cpp for their invaluable contributions to the open-source community. Their work has been instrumental in making the IncarnaMind project a reality.

🖋 Citation

If you want to cite our work, please use the following BibTeX entry:

@misc{IncarnaMind2023,
  author = {Junru Xiong},
  title = {IncarnaMind},
  year = {2023},
  publisher = {GitHub},
  journal = {GitHub Repository},
  howpublished = {\url{https://github.com/junruxiong/IncarnaMind}}
}

📑 License

Apache 2.0 License


