HuixiangDou: Overcoming Group Chat Scenarios with LLM-based Technical Assistance

English | 简体中文

HuixiangDou is a group chat assistant based on LLM (Large Language Model).

Advantages:

  1. Designs a two-stage rejection-and-response pipeline to cope with group chat scenarios, answering user questions without flooding the group; see 2401.08772 and 2405.02817
  2. Low cost: requires only 1.5 GB of GPU memory and no training
  3. Offers a complete suite of Web, Android, and pipeline source code, which is industrial-grade and commercially viable

Check out the scenes in which HuixiangDou is running, and join the WeChat group to try the AI assistant inside.

If this helps you, please give it a star ⭐

🔆 News

The web portal is available on OpenXLab, where you can build your own knowledge assistant without any coding and use it in WeChat and Feishu groups.

Watch the web portal usage video on YouTube or BiliBili.

📖 Support

File Format
  • pdf
  • word
  • excel
  • ppt
  • html
  • markdown
  • txt

IM Application
  • WeChat
  • Lark
  • ..

📦 Hardware

The following are the hardware requirements for running (tested on Linux). It is suggested to follow this document, starting with the basic version and gradually trying the advanced features.

  • Cost-effective Edition (1.5 GB GPU memory): use the openai API (e.g., kimi and deepseek) to handle source-code-level issues; free within quota
  • Standard Edition (19 GB GPU memory): deploy a local LLM that can answer basic questions
  • Complete Edition (40 GB GPU memory): fully utilize search + long-text to answer source-code-level questions

🔥 Run

First, agree to the BCE license and log in to Hugging Face.

huggingface-cli login

Then install requirements.

# parsing `word` format requirements
apt update
apt install python-dev libxml2-dev libxslt1-dev antiword unrtf poppler-utils pstotext tesseract-ocr flac ffmpeg lame libmad0 libsox-fmt-mp3 sox libjpeg-dev swig libpulse-dev
# python requirements
pip install -r requirements.txt

Standard Edition

The standard edition runs text2vec, rerank and a 7B model locally.

STEP1. First, run the test cases without the rejection pipeline:

# Standalone mode
# main creates a subprocess to run the LLM API, then sends requests to the subprocess
python3 -m huixiangdou.main --standalone
..
..Topics unrelated to the knowledge base.."How to install mmpose?"
..Topics unrelated to the knowledge base.."How's the weather tomorrow?"

You can see that main.py handles both example questions the same way: without a knowledge base, the mmpose installation question and "How's the weather tomorrow?" are both treated as topics unrelated to the knowledge base.
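For reference, the standalone pattern described above boils down to something like the sketch below; the port, endpoint, and payload are illustrative placeholders, not HuixiangDou's documented interface.

import subprocess
import time

import requests

# Launch the hybrid LLM server as a background subprocess
# (this is the service that --standalone starts for you).
server = subprocess.Popen(["python3", "-m", "huixiangdou.service.llm_server_hybrid"])
time.sleep(60)  # give the model time to load; adjust for your hardware

# Query it over HTTP the way the pipeline would; the port and payload
# here are assumptions for illustration only.
resp = requests.post("http://127.0.0.1:8888/inference",
                     json={"prompt": "How to install mmpose?"})
print(resp.status_code, resp.text)

server.terminate()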

STEP2. Use mmpose and test documents to build a knowledge base and enable the rejection pipeline

Copy all the commands below (including the '#' symbol) and execute them.

# Download knowledge base documents
cd HuixiangDou
mkdir repodir
git clone https://github.com/open-mmlab/mmpose --depth=1 repodir/mmpose
git clone https://github.com/tpoisonooo/huixiangdou-testdata --depth=1 repodir/testdata

# Save the features of repodir to workdir
mkdir workdir
python3 -m huixiangdou.service.feature_store

Note

If restarting the local LLM is too slow, first run `python3 -m huixiangdou.service.llm_server_hybrid`, then open a new terminal and execute only `python3 -m huixiangdou.main`, so the LLM does not need to restart.

Then rerun main; HuixiangDou will now be able to answer mmpose installation questions and reject casual chats.

python3 -m huixiangdou.main --standalone
..success.. To install mmpose, you should..
..Topics unrelated to the knowledge base.."How's the weather tomorrow?"

Please adjust the repodir documents, good_questions, and bad_questions to try your own domain knowledge (medical, financial, power, etc.).
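For example, a minimal way to seed the two question files with your own domain (assuming, as in the shipped resource files, that they are plain JSON arrays of question strings):

import json

# Questions the assistant SHOULD answer in your domain.
good = ["How do I deploy the model on an embedded device?",
        "What does gateway error code 502 mean in our service?"]
# Casual chat the assistant should reject.
bad = ["How's the weather tomorrow?",
       "What should I eat for lunch?"]

with open("resource/good_questions.json", "w", encoding="utf-8") as f:
    json.dump(good, f, ensure_ascii=False, indent=2)
with open("resource/bad_questions.json", "w", encoding="utf-8") as f:
    json.dump(bad, f, ensure_ascii=False, indent=2)

Then re-run feature_store so the updated examples take effect.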

STEP3. Test sending messages to Feishu group (optional)

This step is only for testing the algorithm pipeline; STEP4 also supports IM applications.

Click Create Feishu Custom Bot to obtain the callback WEBHOOK_URL and fill it in config.ini

# config.ini
...
[frontend]
type = "lark"
webhook_url = "${YOUR-LARK-WEBHOOK-URL}"

Run the command below. When it finishes, the technical assistant's responses will be sent to the Feishu group.

python3 -m huixiangdou.main --standalone
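To sanity-check the webhook on its own, you can post a test message directly; the sketch below uses the Feishu custom-bot text payload (replace the URL with your WEBHOOK_URL):

import requests

# Same value you filled into config.ini.
WEBHOOK_URL = "${YOUR-LARK-WEBHOOK-URL}"

# Feishu/Lark custom bots accept a plain text payload of this shape.
payload = {"msg_type": "text", "content": {"text": "HuixiangDou webhook test"}}
resp = requests.post(WEBHOOK_URL, json=payload, timeout=10)
print(resp.status_code, resp.text)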

STEP4. WEB service and IM applications

We provide a complete front-end UI and backend service that supports:

  • Multi-tenant management
  • Zero-programming access to Feishu, WeChat groups

See the effect in the OpenXLab APP; for setup, please read the web deployment document.

Cost-effective Edition

If your machine has only 2 GB of GPU memory, or if you are pursuing cost-effectiveness, you only need to read this Zhihu document.

The cost-effective edition only drops the local LLM and uses a remote LLM instead; all other functions are the same as the standard edition.

Taking kimi as an example, fill the API KEY obtained from the official website into config-2G.ini:

# config-2G.ini
[llm]
enable_local = 0
enable_remote = 1
...
remote_type = "kimi"
remote_api_key = "${YOUR-API-KEY}"

Note

In the worst case, each Q&A calls the LLM 7 times and is therefore subject to the free-tier RPM limit; you can modify the rpm parameter in config.ini.
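As a rough back-of-the-envelope check (the RPM value below is a placeholder, not kimi's actual limit):

# Worst case: 7 LLM calls per question. With a hypothetical free-tier limit of
# 3 requests per minute, rate limiting alone costs over two minutes per answer.
calls_per_question = 7
rpm_limit = 3  # placeholder; check your provider's real limit and set rpm accordingly
minutes_per_answer = calls_per_question / rpm_limit
print(f"worst case: about {minutes_per_answer:.1f} minutes per answer")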

Execute the command to get the Q&A result

python3 -m huixiangdou.main --standalone --config-path config-2G.ini # Start all services at once

Complete Edition

The HuixiangDou deployed in the WeChat group is the complete version.

When 40 GB of GPU memory is available, long-text + retrieval capabilities can be used to improve accuracy.

Please read the following topics:

🛠️ FAQ

  1. What if the robot is too cold/too chatty?

    • Fill in the questions that should be answered in the real scenario into resource/good_questions.json, and fill the ones that should be rejected into resource/bad_questions.json.
    • Adjust the theme content in repodir to ensure that the markdown documents in the main library do not contain irrelevant content.

    Re-run feature_store to update thresholds and feature libraries.

    ⚠️ You can directly modify reject_throttle in config.ini. Generally speaking, 0.5 is a high value; 0.2 is too low.
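    Conceptually, the rejection stage compares a query's best similarity against the knowledge base with this threshold; the sketch below illustrates the idea only and is not the actual implementation (which uses the text2vec and rerank models).

    import numpy as np

    def should_reject(query_vec: np.ndarray, doc_vecs: np.ndarray, reject_throttle: float) -> bool:
        """Reject the query if its best cosine similarity to any knowledge-base chunk
        is below the threshold."""
        sims = doc_vecs @ query_vec / (
            np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec)
        )
        return float(sims.max()) < reject_throttle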

  2. Launch is normal, but it runs out of memory at runtime?

    Long-text inference with the native transformers backend requires more GPU memory. In that case, apply kv cache quantization to the model, as in the lmdeploy quantization description; then use docker to deploy the Hybrid LLM Service independently.

  3. How do I connect another local LLM? / The results are not ideal after connecting?

  4. What if the response is too slow or requests always fail?

    • Refer to hybrid llm service to add exponential backoff and retransmission (a minimal sketch follows below).
    • Replace local LLM with an inference framework such as lmdeploy, instead of the native huggingface/transformers.
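    A minimal sketch of exponential backoff around a remote LLM call (the request function is a placeholder, not HuixiangDou's actual API):

    import random
    import time

    def call_with_backoff(request_fn, max_retries=5, base_delay=1.0):
        """Retry a flaky remote call with exponential backoff and jitter."""
        for attempt in range(max_retries):
            try:
                return request_fn()
            except Exception:
                if attempt == max_retries - 1:
                    raise
                # Wait 1s, 2s, 4s, ... plus a little jitter before retrying.
                time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.5))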
  5. What if the GPU memory is too low?

    In this case it is impossible to run a local LLM; only a remote LLM combined with text2vec can execute the pipeline. Please make sure that config.ini uses only the remote LLM and that the local LLM is turned off.

  6. No module named 'faiss.swigfaiss_avx2'? Locate the installed faiss package:

    import faiss
    print(faiss.__file__)
    # /root/.conda/envs/InternLM2_Huixiangdou/lib/python3.10/site-packages/faiss/__init__.py

    Then add a soft link:

    # cd your_python_path/site-packages/faiss
    cd /root/.conda/envs/InternLM2_Huixiangdou/lib/python3.10/site-packages/faiss/
    ln -s swigfaiss.py swigfaiss_avx2.py

🍀 Acknowledgements

📝 Citation

@misc{kong2024huixiangdou,
      title={HuixiangDou: Overcoming Group Chat Scenarios with LLM-based Technical Assistance}, 
      author={Huanjun Kong and Songyang Zhang and Jiaying Li and Min Xiao and Jun Xu and Kai Chen},
      year={2024},
      eprint={2401.08772},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

@misc{kong2024huixiangdoucr,
      title={HuixiangDou-CR: Coreference Resolution in Group Chats}, 
      author={Huanjun Kong},
      year={2024},
      eprint={2405.02817},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}


License: BSD 3-Clause "New" or "Revised" License

