Cliobot - multimodal generative AI bot for chat platforms

Cliobot is a modular bot platform for generative AI agents. Its goal is to provide a simple, easy-to-use and extensible platform for running generative AI agents that can handle audio, video, text and images on any chat platform.

It can easily be extended to use multiple APIs and services, from Stable Diffusion to OpenAI, and you can run it on your own device or deploy it online.

It comes with Telegram support and multiuser handling out of the box, with minimal dependencies.

Important: This repo is a work in progress - I'm porting over code from a startup I was working on, so it's still a bit rough and subject to multiple rewrites.

The Basics

Cliobot has two main working modes: command mode and LLM mode.

In command mode, you interact using slash commands (messages starting with /). It comes with a default set of slash commands, and you can easily create your own.

[WIP] In LLM mode, the bot works like ChatGPT and other multimodal chatbots: it follows a configurable system prompt that defines its core behavior, and it can call functions to perform actions (including executing code or browsing the web).

Note that both modes use the same command definitions, so the only difference between them is the tradeoff between more natural language interpretation and cost (running GPT-4 and other models can get expensive quickly).

Running the bot

Install all dependencies with:

poetry install

Rename config.example.yml to config.yml and set the variables you need.
Run the bot using the following command:

poetry run python fullbot.py

The codebase includes other examples, such as a simple chat-only bot that uses OpenAI's API to respond to messages (chatbot.py). Documentation for these examples is still a work in progress.

Running tests

poetry run pytest

Built-in commands

Cliobot comes with a set of built-in commands that you can use out of the box. You can also easily add your own!

/image

Generates an image from a text prompt.

Built-in implementations: DALL-E 3, any image model hosted on Replicate.com.

/describe [WIP]

Describe an image using text.

Built-in implementations: OpenAI GPT4V, Ollama (Llava, etc.), any image-to-text model hosted on Replicate.com.

/transcribe

Transcribes an audio file into text.

Built-in implementations: OpenAI Whisper-1

/ask

Ask an LLM agent a one-off question. No conversation context is kept between questions.

Built-in implementations: GPT-4 or any model supported by Ollama running in server mode, any LLM hosted on Replicate.com.

/chat [WIP]

Chat with an LLM agent, keeping a backlog of the conversation as context.

Command syntax

Cliobot uses a simple prompt parsing system (common across apps such as Midjourney & others). It's based on the following format:

/<command> <text prompt>? [--<param_name> <value>]+

Each command handler is defined as a pydantic model, and the parameters are automatically parsed and validated.

When a command requires additional inputs, such as an image, the bot will ask for them in sequence, then run the command once you've provided all of them.
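One way to picture the sequential input flow is a small piece of state per pending command that reports which required input to ask for next. This is a minimal sketch, assuming a hypothetical `PendingCommand` class; it is not Cliobot's actual implementation.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PendingCommand:
    """Hypothetical state for a command that is still waiting on user inputs."""
    name: str
    required_inputs: list[str]
    received: dict[str, bytes] = field(default_factory=dict)

    def next_missing(self) -> Optional[str]:
        # The first required input the user hasn't sent yet, in declared order
        for key in self.required_inputs:
            if key not in self.received:
                return key
        return None

    def provide(self, key: str, data: bytes) -> None:
        # Store an uploaded input (e.g. the bytes of an image the user sent)
        self.received[key] = data

    def ready(self) -> bool:
        # The command can run once nothing is missing
        return self.next_missing() is None


cmd = PendingCommand(name="describe", required_inputs=["image"])
print(cmd.next_missing())  # image
cmd.provide("image", b"fake-image-bytes")
print(cmd.ready())  # True
```

The bot would ask for `next_missing()` after each message and only dispatch the command once `ready()` returns true.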

An example of a command using the default dalle3 image generation command would be as follows:

/image a giant hamster in space --size 1024x1024 --model dalle3
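To make the `/<command> <text prompt>? [--<param_name> <value>]+` format concrete, here is a minimal parsing sketch. It assumes parameter values run until the next ` --<name>` (so values like `cartoon, drawing` can contain spaces) and that surrounding quotes are stripped; `parse_command` and `ParsedCommand` are hypothetical names, not Cliobot's parser. In Cliobot, the resulting dictionary would then go to the command's pydantic model for validation, which this sketch doesn't cover.

```python
import re
from dataclasses import dataclass, field


@dataclass
class ParsedCommand:
    command: str
    prompt: str
    params: dict[str, str] = field(default_factory=dict)


def parse_command(text: str) -> ParsedCommand:
    """Split '/<command> <prompt>? [--<name> <value>]+' into its parts."""
    if not text.startswith("/"):
        raise ValueError("commands must start with '/'")
    # Split before every ' --' that is followed by a parameter name
    head, *param_chunks = re.split(r"\s+--(?=[A-Za-z_])", text.strip())
    command, _, prompt = head[1:].partition(" ")
    params: dict[str, str] = {}
    for chunk in param_chunks:
        name, _, value = chunk.partition(" ")
        value = value.strip()
        # Strip matching single or double quotes, e.g. 'DPM++ 2M SDE'
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        params[name] = value
    return ParsedCommand(command=command, prompt=prompt.strip(), params=params)


parsed = parse_command("/image a giant hamster in space --size 1024x1024 --model dalle3")
print(parsed.params)  # {'size': '1024x1024', 'model': 'dalle3'}
```

Splitting on ` --` rather than using shell-style tokenizing keeps apostrophes in prompts (such as "what's") from being treated as unclosed quotes.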

Installing

Running a bot locally is simple:

Clone this repo
Rename config.example.yml to config.yml and set the variables you need.
Set up a Python virtual environment and install all dependencies with:

python -m venv venv
source venv/bin/activate
pip install -r requirements.txt

Running Cliobot on Telegram

The bare minimum you'll need is an API Token for a Telegram bot. Please refer to the official documentation for how to obtain an API token. It should look like this: 4839574812:AAFD39kkdpWt3ywyRZergyOLMaJhac60qc

Once you have a token, change your config.yml to include the following section:

bot:
  platform: telegram
  token: "1234567890:ABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890"
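Before starting the bot, it can help to sanity-check the token you pasted. A minimal sketch, assuming tokens follow the `<numeric bot id>:<secret>` shape of the examples above; it only checks the overall shape and cannot tell whether Telegram will accept the token. `looks_like_telegram_token` is a hypothetical helper, not part of Cliobot.

```python
import re

# Assumed shape: a numeric bot id, a colon, then a URL-safe secret
TOKEN_PATTERN = re.compile(r"\d+:[A-Za-z0-9_-]+")


def looks_like_telegram_token(token: str) -> bool:
    """Rough shape check only; it does not contact Telegram."""
    return TOKEN_PATTERN.fullmatch(token.strip()) is not None


print(looks_like_telegram_token("4839574812:AAFD39kkdpWt3ywyRZergyOLMaJhac60qc"))  # True
```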

Then, run the bot using the following command:

source venv/bin/activate
python app.py

Running with Docker

If you don't have Python on your system or just prefer to keep things simple, you can run Cliobot using Docker too:

docker build -t cliobot .
docker run -it --rm -v $(pwd)/data:/content/data -v $(pwd)/.env:/content/.env -v $(pwd)/config.yml:/content/config.yml cliobot

Using Automatic1111 WebUI as a backend

You can plug in Automatic1111 WebUI and use it as a backend for image generation! To do so, you'll need to set the following variables on your config.yml:

webui:
    endpoint: http://localhost:7860
    auth: user:pass

Note that you'll need to start webui with the --api flag. The auth field is optional (leave it blank if you don't use API authentication). For more information on the API, please refer to the official documentation.
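If image generation fails, it can help to check the WebUI API on its own. Below is a minimal sketch of the kind of request the WebUI API accepts, assuming the standard `/sdapi/v1/txt2img` route and that the `auth` value is sent as HTTP Basic auth (which WebUI enables with its `--api-auth user:pass` flag). `build_txt2img_request` is a hypothetical helper, not Cliobot's client code, and it only builds the request without sending it.

```python
import base64
import json
import urllib.request
from typing import Optional


def build_txt2img_request(
    endpoint: str, payload: dict, auth: Optional[str] = None
) -> urllib.request.Request:
    """Build (but don't send) a POST to the WebUI txt2img API route."""
    url = endpoint.rstrip("/") + "/sdapi/v1/txt2img"
    headers = {"Content-Type": "application/json"}
    if auth:
        # 'user:pass' from config.yml, encoded as an HTTP Basic auth header
        token = base64.b64encode(auth.encode()).decode()
        headers["Authorization"] = f"Basic {token}"
    data = json.dumps(payload).encode()
    return urllib.request.Request(url, data=data, headers=headers, method="POST")


req = build_txt2img_request("http://localhost:7860", {"prompt": "a hamster in space"}, auth="user:pass")
print(req.full_url)  # http://localhost:7860/sdapi/v1/txt2img
```

Sending it with `urllib.request.urlopen(req)` against a running WebUI should return JSON whose `images` field holds base64-encoded images.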

Supported operations

You can use any Stable Diffusion model that's installed alongside webui with the /image command. The following is an example using all the supported parameters:

/image a hamster in space --negative cartoon, drawing, illustration --model sdxl1.0 --steps 20 --sampler 'DPM++ 2M SDE' --cfg 7 --seed 1234 --width 1024 --height 1024 --batchcount 1 --batchsize 4
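The /image parameters above have to be translated into the WebUI's own field names at some point. A minimal sketch of that translation, assuming the WebUI txt2img fields `negative_prompt`, `sampler_name`, `cfg_scale`, `n_iter`, `batch_size` and the `override_settings.sd_model_checkpoint` setting for model selection; the `PARAM_TO_FIELD` table and `to_txt2img_payload` are hypothetical, and Cliobot's real mapping may differ.

```python
from typing import Any

# Hypothetical mapping from /image parameter names to WebUI txt2img fields
PARAM_TO_FIELD = {
    "negative": "negative_prompt",
    "steps": "steps",
    "sampler": "sampler_name",
    "cfg": "cfg_scale",
    "seed": "seed",
    "width": "width",
    "height": "height",
    "batchcount": "n_iter",
    "batchsize": "batch_size",
}
INT_FIELDS = {"steps", "seed", "width", "height", "n_iter", "batch_size"}
FLOAT_FIELDS = {"cfg_scale"}


def to_txt2img_payload(prompt: str, params: dict[str, str]) -> dict[str, Any]:
    payload: dict[str, Any] = {"prompt": prompt}
    for name, raw in params.items():
        if name == "model":
            # The checkpoint is selected through override_settings
            payload["override_settings"] = {"sd_model_checkpoint": raw}
            continue
        target = PARAM_TO_FIELD.get(name)
        if target is None:
            raise ValueError(f"unsupported parameter: --{name}")
        # Cast the raw command-line string to the type the API expects
        if target in INT_FIELDS:
            payload[target] = int(raw)
        elif target in FLOAT_FIELDS:
            payload[target] = float(raw)
        else:
            payload[target] = raw
    return payload
```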

Configuring OpenAI

To use OpenAI models (GPT, DALL-E 3, Whisper, etc.), include the following in your config.yml:

openai:
  endpoints:
    - api_key: sk-....
      api_type: open_ai
      base_url: https://api.openai.com/v1/

    - api_key: xxx
      api_type: azure
      api_version: 2023-10-01-preview
      base_url: https://xxx.openai.azure.com
      model: gpt4
      kind: gpt-4

    - api_key: xxx
      api_type: azure
      api_version: 2
      base_url: https://xxx.openai.azure.com
      model: embeddings
      kind: embeddings

    - api_key: xxx
      api_type: azure
      api_version: 2023-12-01-preview
      base_url: https://xxx.openai.azure.com
      model: dalle3
      kind: dall-e-3

    - api_key: xxx
      api_type: azure
      api_version: 2023-12-01-preview
      base_url: https://xxx.openai.azure.com
      model: whisper1
      kind: whisper-1

Notice that for Azure deployments, you'll need to set one entry per model kind (dall-e-3, whisper-1, embeddings, gpt-4). The API key can be the same for all of them.
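With one entry per model kind, the bot has to pick the right entry for each request. A minimal sketch of one plausible selection rule, assuming an entry without a `kind` (like the plain OpenAI one) can serve any model kind and that an exact `kind` match wins; `pick_endpoint` is hypothetical and not necessarily how Cliobot resolves endpoints.

```python
from typing import Optional


def pick_endpoint(endpoints: list[dict], kind: str) -> dict:
    """Pick a config entry for a model kind (hypothetical selection logic).

    Prefers an entry whose 'kind' matches exactly (e.g. an Azure deployment),
    falling back to the first entry with no 'kind', such as the OpenAI one.
    """
    fallback: Optional[dict] = None
    for entry in endpoints:
        entry_kind = entry.get("kind")
        if entry_kind == kind:
            return entry
        if entry_kind is None and fallback is None:
            fallback = entry
    if fallback is None:
        raise LookupError(f"no endpoint configured for kind {kind!r}")
    return fallback


endpoints = [
    {"api_key": "sk-....", "api_type": "open_ai"},
    {"api_key": "xxx", "api_type": "azure", "model": "dalle3", "kind": "dall-e-3"},
]
print(pick_endpoint(endpoints, "dall-e-3")["model"])  # dalle3
```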

Configuring Ollama

In order to use any LLM via Ollama, simply include the following in your config.yml:

ollama:
  endpoint: http://localhost:11434
  models:
    - llama2

Each model in the models list will be exposed as a model on the bot. You can then select it with the --model parameter of the /ask command:

/ask what's the meaning of life? --model llama2
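If /ask can't reach a model, it can help to call Ollama directly. A minimal sketch using Ollama's `/api/generate` route with streaming disabled, so the reply comes back as a single JSON object; `build_ollama_generate_request` is a hypothetical helper, not how Cliobot itself calls Ollama, and it only builds the request.

```python
import json
import urllib.request


def build_ollama_generate_request(endpoint: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but don't send) a non-streaming call to Ollama's /api/generate."""
    body = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        endpoint.rstrip("/") + "/api/generate",
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_ollama_generate_request("http://localhost:11434", "llama2", "what's the meaning of life?")
print(req.full_url)  # http://localhost:11434/api/generate
```

With Ollama running, `json.load(urllib.request.urlopen(req))["response"]` should contain the model's answer.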

Configuring Replicate

You can use any model hosted on Replicate by mapping it in your config.yml. The mapping is more involved than for other backends, since you need to declare each of the model's input parameters. Here's a complete example using SDXL hosted on Replicate:

replicate:
  api_token: xxx
  endpoints:
    - model: 'sdxl'
      kind: 'image'
      version: 'stability-ai/sdxl:39ed52f2a78e934b3ba6e2a89f5b1c712de7dfea535525255b1aa35c5565e08b'
      params:
        prompt:
          kind: str
          required: true
        negative_prompt:
          alias: 'no'
          kind: str
        width:
          kind: int
          default: 1024
        height:
          kind: int
          default: 1024
        num_outputs:
          alias: num
          kind: int
          default: 1
        num_inference_steps:
          alias: steps
          kind: int
          default: 25
        guidance_scale:
          alias: cfg
          kind: float
          default: 7.5
        prompt_strength:
          alias: ps
          kind: float
          default: 0.8
        seed:
          kind: int
        apply_watermark:
          alias: watermark
          kind: bool
          default: true
        scheduler:
          kind: str
          default: 'KarrasDPM'
        refine:
          kind: str
          alias: refiner
          default: 'no_refiner'
          value_map:
            'no': no_refiner
            expert: expert_ensemble_refiner
            base: base_image_refiner
        refine_steps:
          kind: int
          alias: rs

With the above config, you'll be able to generate images using the following command:

/image photo of a giant hamster in space --model sdxl --no illustration, cartoon, drawing --width 1280 --num 4 --steps 50 --rs 8 --refiner expert

Note that the parameter names in your slash commands match the param names in the config, or their optional alias. Aliases let you use shorter parameter names in your commands (e.g. typing --no instead of --negative_prompt).
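The config above drives a small amount of resolution logic: look up each command parameter by name or alias, apply any `value_map`, cast it to its `kind`, then fill in defaults and check required params. A minimal sketch of that logic, assuming the YAML has already been loaded into plain dicts and that `--model` was consumed earlier to pick the endpoint; `resolve_inputs` and `SDXL_PARAMS` are hypothetical, not Cliobot's code.

```python
from typing import Any

# Subset of the 'params' mapping above, as it might look once loaded from YAML
SDXL_PARAMS: dict[str, dict[str, Any]] = {
    "prompt": {"kind": "str", "required": True},
    "negative_prompt": {"alias": "no", "kind": "str"},
    "width": {"kind": "int", "default": 1024},
    "num_inference_steps": {"alias": "steps", "kind": "int", "default": 25},
    "guidance_scale": {"alias": "cfg", "kind": "float", "default": 7.5},
    "seed": {"kind": "int"},
    "apply_watermark": {"alias": "watermark", "kind": "bool", "default": True},
    "refine": {
        "alias": "refiner",
        "kind": "str",
        "default": "no_refiner",
        "value_map": {"no": "no_refiner", "expert": "expert_ensemble_refiner"},
    },
}


def _to_bool(value: str) -> bool:
    lowered = value.lower()
    if lowered in ("true", "yes", "1"):
        return True
    if lowered in ("false", "no", "0"):
        return False
    raise ValueError(f"not a boolean: {value!r}")


CASTS = {"str": str, "int": int, "float": float, "bool": _to_bool}


def resolve_inputs(spec: dict[str, dict[str, Any]], prompt: str, params: dict[str, str]) -> dict[str, Any]:
    # Index every param by both its real name and its alias
    lookup = {}
    for name, opts in spec.items():
        lookup[name] = name
        if "alias" in opts:
            lookup[opts["alias"]] = name
    inputs: dict[str, Any] = {"prompt": prompt} if prompt else {}
    for given, raw in params.items():
        name = lookup.get(given)
        if name is None:
            raise ValueError(f"unknown parameter --{given}")
        opts = spec[name]
        # Translate friendly values (e.g. 'expert') before casting
        value = opts.get("value_map", {}).get(raw, raw)
        inputs[name] = CASTS[opts.get("kind", "str")](value)
    for name, opts in spec.items():
        if name not in inputs:
            if opts.get("required"):
                raise ValueError(f"missing required parameter: {name}")
            if "default" in opts:
                inputs[name] = opts["default"]
    return inputs
```

Params without a default (like `seed`) are simply left out when not given, so Replicate falls back to the model's own default.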

Built-in extensions

These are all deactivated by default, but easily enabled:

Sentry.io support for error reporting/tracking
Automatic message translation using Google Translate API
Utilization metrics using MixPanel
S3 for file storage

Features

OpenAI API support for DALL-E, GPT-3, GPT-4 and Whisper, including Azure support and multiple API keys
Ollama support for any LLM (including image-to-text models)
Support for any model hosted on Replicate.com
Multiuser support
File storage support (local & S3)
Automatic message translation using Google Translate API
Persistent preferences to reduce repetitive prompt parameters

Running on K8s

TODO

Writing plugins

TODO

Planned features

Discord integration
WhatsApp integration
Stable Diffusion
StableHorde processing

TODO

RAG mode
chat history
Finish the LLM mode
save generated images to storage
save uploads
i18n support
img2txt commands
llama implementation

herval / cliobot
