remixer-dec/botality-ii

Botality II

This project is an implementation of a modular Telegram bot based on aiogram, designed for local ML inference with remote service support. Currently integrated with:

Stable Diffusion (using stable-diffusion-webui API),
TTS text-to-speech engine (using TTS (VITS) and so-vits-SVC) as well as OS voices.
STT integrated with multiple speech recognition engines, including whisper.cpp ¹, silero, wav2vec2
LLMs such as llama, gpt-j, gpt-2 with support for assistant mode via instruct-tuned lora models and multimodality via adapter-model
TTA experimental text-to-audio support via audiocraft

Accelerated LLM inference support: llama.cpp, mlc-llm and llama-mps
Remote LLM inference support: oobabooga/text-generation-webui, LostRuins/koboldcpp and llama.cpp server
Compatibility table is available here

Evolved from predecessor Botality I
Ships with an easy-to-use webui, where you can run commands and talk with the bot directly.

Documentation

You can find it here (coming soon)

Changelog

Some versions have breaking changes. See the Changelog file for more information.

Features

[Bot]

User-based queues and delayed task processing
Multiple modes to filter access scopes (WL/BL/Both/Admin-only)
Support of accelerated inference on M1 Macs
Memory manager, keeps track of models loaded at the same time and loads/unloads them on demand.
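
The idea behind the memory manager can be illustrated with a minimal sketch. It is not the bot's actual implementation: the class name, the max_loaded limit and the least-recently-used eviction policy are all assumptions chosen for illustration.

```python
from collections import OrderedDict
from typing import Any, Callable, List


class ModelMemoryManager:
    """Hypothetical sketch: keep at most `max_loaded` models in memory."""

    def __init__(self, max_loaded: int = 2):
        self.max_loaded = max_loaded
        # Insertion order doubles as usage order: oldest entry first.
        self._loaded: "OrderedDict[str, Any]" = OrderedDict()

    def get(self, name: str, loader: Callable[[], Any]) -> Any:
        # Already loaded: mark as most recently used and return it.
        if name in self._loaded:
            self._loaded.move_to_end(name)
            return self._loaded[name]
        # Make room by dropping the least recently used model(s).
        while len(self._loaded) >= self.max_loaded:
            self._loaded.popitem(last=False)
        # Load on demand and remember the result.
        self._loaded[name] = loader()
        return self._loaded[name]

    def loaded_names(self) -> List[str]:
        return list(self._loaded)
```

With max_loaded=2, requesting a third model unloads whichever of the first two was used least recently. A real manager would also need to release GPU memory explicitly, which this sketch leaves out.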

[LLM]

Supports dialog mode, in which the bot casually plays a role described in a character file and keeps chat history either with all users in a group chat or with each user separately
Character files can be easily localized for any language for non-english models
Assistant mode via /ask command or with direct replies (configurable)
Single-reply short-term memory for assistant feedback
Supports visual question answering, when multimodal-adapter is available
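
The two ways of keeping chat history (per user or per chat) could be sketched as follows. This is only an illustration: the ChatHistory class, the max_messages limit and the choice of dictionary keys are assumptions, not the bot's real data structures.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, List, Tuple


class ChatHistory:
    """Hypothetical sketch of history grouping by user or by chat."""

    def __init__(self, grouping: str = "user", max_messages: int = 20):
        if grouping not in ("user", "chat"):
            raise ValueError("grouping must be 'user' or 'chat'")
        self.grouping = grouping
        # Each key gets a bounded queue, so old messages fall off the front.
        self._store: Dict[int, Deque[Tuple[str, str]]] = defaultdict(
            lambda: deque(maxlen=max_messages)
        )

    def _key(self, chat_id: int, user_id: int) -> int:
        # Assumption: 'user' keys by the sender, 'chat' keys by the chat.
        return user_id if self.grouping == "user" else chat_id

    def add(self, chat_id: int, user_id: int, author: str, text: str) -> None:
        self._store[self._key(chat_id, user_id)].append((author, text))

    def get(self, chat_id: int, user_id: int) -> List[Tuple[str, str]]:
        return list(self._store[self._key(chat_id, user_id)])
```

In "chat" mode two users writing in the same group share one history, while in "user" mode each of them gets a separate one.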

[SD]

CLI-like way to pass stable diffusion parameters
pre-defined prompt wrappers
lora integration with easy syntax: lora_name100 => <lora:lora_name:1.0> and custom lora activators
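
The lora shorthand can be illustrated with a small sketch. It assumes that the trailing digits are a weight in hundredths, which is inferred from the single example above; the function name expand_lora is hypothetical and the bot's real parser may behave differently.

```python
import re

# Lazy name followed by a run of digits at the very end of the token.
_LORA_SHORTHAND = re.compile(r"^(?P<name>.+?)(?P<weight>\d+)$")


def expand_lora(token: str) -> str:
    """Turn 'lora_name100' into '<lora:lora_name:1.0>' (assumed rule)."""
    match = _LORA_SHORTHAND.match(token)
    if match is None:
        # No numeric suffix: leave the token untouched.
        return token
    weight = int(match.group("weight")) / 100
    return f"<lora:{match.group('name')}:{weight}>"


print(expand_lora("lora_name100"))
```

Note that under this rule a lora whose own name ends in a digit would be ambiguous, so a real implementation would probably check the name against the list of installed loras.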

[TTS]

can be run remotely, or on the same machine
tts output is sent as voice messages
can be used on voice messages (speech and acapella songs) to dub them with a different voice

[STT]

can be activated as a speech recognition tool via /stt command replying to voice messages
if stt_autoreply_mode parameter is not none, it recognizes voice messages and replies to them with LLM and TTS modules

[TTA]

can be used with /sfx and /music commands after adding tta to active_modules

Setup:

copy .env.example file and rename the copy to .env, do NOT add the .env file to your commits!
set up your telegram bot token and other configuration options in .env file
install requirements: pip install -r requirements.txt
install the optional requirements for the modules you use: pip install -r requirements-tts.txt for tts and tts_server, pip install -r requirements-llm.txt for llm (you will probably also need a recent version of pytorch), pip install -r requirements-stt.txt for speech-to-text, and pip install -U git+https://git@github.com/facebookresearch/audiocraft#egg=audiocraft for text-to-audio
you can continue configuration in the webui, it has helpful tips about each configuration option
for stable diffusion module, make sure that you have webui installed and it is running with --api flag
for text-to-speech module download VITS models, put their names in tts_voices configuration option and path to their directory in tts_path
for llm module, see LLM Setup section below
if you want to use webui + api, run it with python dashboard.py, otherwise run the bot with python bot.py
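
Before the first run it can help to check that the options mentioned above are actually filled in. The following is a minimal sketch that assumes the .env file uses plain KEY=VALUE lines; the helper names parse_env and missing_options are hypothetical, and the bot itself may read its configuration differently.

```python
from typing import Dict, Iterable, List


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    config: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        config[key.strip()] = value.strip().strip('"').strip("'")
    return config


def missing_options(config: Dict[str, str], required: Iterable[str]) -> List[str]:
    """Return the required option names that are absent or empty."""
    return [name for name in required if not config.get(name)]


sample = "# tts settings\ntts_path=/models/vits\ntts_voices=\n"
print(missing_options(parse_env(sample), ["tts_path", "tts_voices"]))
```

For a real check, read the text from the .env file and pass the option names required by the modules you enabled, for example tts_voices and tts_path for the text-to-speech module.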

python3.10+ is recommended due to aiogram compatibility. If you are experiencing problems with whisper or logging, please update numpy.

Supported language models (tested):

Python/Pytorch backend

original llama (7b version was tested on llama-mps fork for macs), requires running the bot with python3.10 -m torch.distributed.launch --use_env bot.py
assistant mode for original llama is available with LLaMa-Adapter, to use both chat and assistant mode, some changes[1][2] are necessary for non-mac users.
hf llama (tests outdated) + alpaca-lora / ru-turbo-alpaca-lora
gpt-2 (tested on ru-gpt3), nanoGPT (tested on minChatGPT [weights])
gpt-j (tested on a custom model)

C++ / TVM backend

llama.cpp (tested on a lot of models) [models]
mlc-llm-chat (tested using prebuilt binaries on demo-vicuna-v1-7b-int3 model, M1 GPU acceleration confirmed, integrated via mlc-chatbot)

Remote api backend

oobabooga webui
kobold.cpp with the same remote_ob backend
llama.cpp server with remote_lcpp llm backend option (Obsidian model w/ multimodality tested)

LLM Setup

Make sure that you have enough RAM / vRAM to run models.
Download the weights (and the code if needed) for any large language model
in .env file, make sure that "llm" is in active_modules, then set:
llm_paths - change the path(s) of model(s) that you downloaded
llm_backend - select from pytorch, llama.cpp, mlc_pb, remote_ob, remote_lcpp
llm_python_model_type = if you set pytorch in the previous option, set the model type that you want to use, it can be gpt2, gptj, llama_orig, llama_hf or auto_hf.
llm_character = a character of your choice, from characters directory, for example characters.gptj_6B_default, character files also have prompt templates and model configuration options optimal to specific model, feel free to change the character files, edit their personality and use with other models.
llm_assistant_chronicler = an input/output formatter/parser for the assistant task, can be instruct or raw, do not change if you do not use mlc_pb.
llm_history_grouping = user to store history with each user separately or chat to store group chat history with all users in that chat
llm_assistant_use_in_chat_mode = True/False when False, use /ask command to ask the model questions without any input history, when True, all messages are treated as questions.
For llama.cpp: make sure that you have a C++ compiler, set all flags necessary to enable GPU support, install the bindings with pip install llama-cpp-python, then download model weights and change the path in llm_paths.
For mlc-llm, follow the installation instructions from the docs, then clone mlc-chatbot, and put 3 paths in llm_paths. Use with llm_assistant_use_in_chat_mode=True and with raw chronicler.
For oobabooga webui and kobold.cpp, instead of specifying llm_paths, set llm_host, set llm_active_model_type to remote_ob and set the llm_character to one that has the same prompt format / preset as your model. Run the server with --api flag.
For llama.cpp c-server, start the ./server, set its URL in llm_host and set llm_active_model_type to remote_lcpp. For multimodality, please refer to this thread.
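
The constraints listed in this section can be summarized in a small validation sketch. The allowed values are taken from the list above, but the function check_llm_config is hypothetical, and the sketch assumes that the backend is selected through llm_backend and that remote backends need llm_host while local ones need llm_paths.

```python
from typing import Dict, List

LLM_BACKENDS = {"pytorch", "llama.cpp", "mlc_pb", "remote_ob", "remote_lcpp"}
REMOTE_BACKENDS = {"remote_ob", "remote_lcpp"}
PYTHON_MODEL_TYPES = {"gpt2", "gptj", "llama_orig", "llama_hf", "auto_hf"}


def check_llm_config(cfg: Dict[str, str]) -> List[str]:
    """Return a list of human-readable problems (empty if none found)."""
    problems: List[str] = []
    backend = cfg.get("llm_backend")
    if backend not in LLM_BACKENDS:
        problems.append(f"unknown llm_backend: {backend!r}")
        return problems
    # The model type only matters for the pytorch backend.
    if backend == "pytorch" and cfg.get("llm_python_model_type") not in PYTHON_MODEL_TYPES:
        problems.append("llm_python_model_type is missing or unsupported")
    # Assumption: remote backends need a host, local ones need model paths.
    if backend in REMOTE_BACKENDS and not cfg.get("llm_host"):
        problems.append("llm_host must be set for remote backends")
    if backend not in REMOTE_BACKENDS and not cfg.get("llm_paths"):
        problems.append("llm_paths must be set for local backends")
    if cfg.get("llm_history_grouping", "user") not in ("user", "chat"):
        problems.append("llm_history_grouping must be 'user' or 'chat'")
    return problems


print(check_llm_config({"llm_backend": "remote_ob"}))
```

An empty result only means that the values are mutually consistent; it does not verify that the model files exist or that the remote server is reachable.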

Bot commands

Send a message to your bot with the command /tti -h for more info on how to use stable diffusion in the bot, and /tts -h for the tts module. For tts, the bot uses the voice names from the configuration file as commands. Try the /llm command for llm module details. LLM defaults to chat mode for models that support it; the assistant can be called with the /ask command.

License: the code of this project is currently distributed under CC BY-NC-SA 4.0 license, third party libraries might have different licenses.
