lpurdy01 / simpleAI

An easy way to host your own AI API and expose alternative models, while being compatible with "open" AI clients.

Home Page: https://pypi.org/project/simple-ai-server/



A self-hosted alternative to the not-so-open AI API. It focuses on replicating the main endpoints for LLMs:

  • Text completion (/completions/)
    • ✔️ Non-streaming responses
    • ✔️ Streaming responses
  • Chat (/chat/completions/) [ example ]
    • ✔️ Non-streaming responses
    • ✔️ Streaming responses
  • Edits (/edits/) [ example ]
  • Embeddings (/embeddings/) [ example ]
  • Not supported (yet): images, audio, files, fine-tunes, moderations

It allows you to experiment with competing approaches quickly and easily.


Why this project?

Well first of all it's a fun little project, and perhaps a better use of my time than watching some random dog videos on Reddit or YouTube. I also believe it can be a great way to:

  • experiment with new models and not be too dependent on a specific API provider,
  • create benchmarks to decide which approach works best for you,
  • handle some specific use cases where you cannot fully rely on an external service, without having to rewrite everything.

If you find interesting use cases, feel free to share your experience.


On a machine with Python 3.9+:

  • [Latest] From source:
pip install git+https://github.com/lhenault/simpleAI 
  • From PyPI:
pip install simple_ai_server


Start by creating a configuration file to declare your models:

simple_ai init

It should create models.toml, where you declare your different models (see how below). Then start the server with:

simple_ai serve [--host] [--port 8080]

You can then see the docs and try it there.

Integrating and declaring a model

Model integration

Models are queried through gRPC, in order to separate the API itself from the model inference, and to support several languages beyond Python through this protocol.

To expose, for instance, an embedding model in Python, you simply have to import a few things and implement the .embed() method of your EmbeddingModel class:

from simple_ai.api.grpc.embedding.server import serve, LanguageModelServicer

class EmbeddingModel:
    def embed(self, inputs: list = None) -> list:
        # TODO: implement the embed method
        return [[]]

if __name__ == '__main__':
    # Wrap the model in the gRPC servicer and serve it on port 50051
    model_servicer = LanguageModelServicer(model=EmbeddingModel())
    serve(address='[::]:50051', model_servicer=model_servicer)
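To give an idea of what an .embed() implementation could look like, here is a toy sketch that needs no model weights at all. It assumes the method receives a list of strings and returns one vector per input; the class name HashingEmbeddingModel and the dim parameter are made up for illustration and are not part of simple_ai.

```python
import hashlib
import math


class HashingEmbeddingModel:
    """Toy model: hashes each whitespace-separated token into a fixed-size vector."""

    def __init__(self, dim: int = 16):
        self.dim = dim  # hypothetical parameter: size of each embedding

    def _embed_one(self, text: str) -> list:
        vector = [0.0] * self.dim
        for token in text.lower().split():
            # Map each token to a bucket using a stable hash
            digest = hashlib.sha256(token.encode("utf-8")).digest()
            index = int.from_bytes(digest[:4], "big") % self.dim
            vector[index] += 1.0
        # L2-normalise so that non-empty inputs have unit length
        norm = math.sqrt(sum(v * v for v in vector))
        return [v / norm for v in vector] if norm else vector

    def embed(self, inputs: list = None) -> list:
        # One vector per input, in the same order
        return [self._embed_one(text) for text in (inputs or [])]
```

An instance of such a class could then be passed as model= to the servicer in place of the empty EmbeddingModel, and later swapped for a real model.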

For a completion task, follow the same logic, but import from simple_ai.api.grpc.completion.server instead, and implement a complete method.
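As a rough sketch of the completion case, here is a toy class with a complete method. The parameter names prompt and max_tokens mirror the request fields used elsewhere in this README, but the exact signature expected by the servicer is an assumption, so check the package before relying on it. EchoCompletionModel is a made-up name.

```python
class EchoCompletionModel:
    """Toy completion model: 'continues' a prompt by repeating its last word."""

    def complete(self, prompt: str = "", max_tokens: int = 16, **kwargs) -> str:
        # Extra generation settings (temperature, top_p, ...) are accepted but ignored
        words = prompt.split()
        if not words or max_tokens <= 0:
            return ""
        # Repeat the last word at most three times, and never more than max_tokens
        return " ".join([words[-1]] * min(max_tokens, 3))
```

Replacing the body of complete with a call to your actual model is then the only part that changes.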

Declaring a model

To add a model, you first need to deploy a gRPC service (using the provided .proto file and / or the tools provided in src/api/). Once your model is live, you only have to add it to the models.toml configuration file. For instance, let's say you've locally deployed a llama.cpp model available on port 50051, just add:

        owned_by    = 'Meta / ggerganov'
        permission  = []
        description = 'C++ implementation of LlaMA model, 7B parameters, 4-bit quantization'
        url = 'localhost:50051'
        type = 'gRPC'
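Before declaring a model, it can help to check that something is actually listening at the url you are about to write down. The sketch below is only a TCP-level check using the standard library (it does not speak gRPC), and the helper names split_host_port and is_listening are hypothetical.

```python
import socket


def split_host_port(url: str) -> tuple:
    """Split a 'host:port' string such as 'localhost:50051'."""
    host, _, port = url.rpartition(":")
    # Drop the brackets around IPv6 literals such as '[::1]'
    return host.strip("[]"), int(port)


def is_listening(url: str, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to 'host:port' can be opened."""
    host, port = split_host_port(url)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For the example above you would call is_listening('localhost:50051') once your model service is up.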

You can see and try the provided examples in the examples/ directory (they might require a GPU).


Thanks to the Swagger UI, you can see and try the different endpoints here:

Example query with cURL

Or you can directly use the API with the tool of your choice.

curl -X 'POST' \
  '' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "model": "alpaca-lora-7B",
  "instruction": "Make this message nicer and more formal",
  "input": "This meeting was useless and should have been a bloody email",
  "top_p": 1,
  "n": 1,
  "temperature": 1,
  "max_tokens": 256
}'

It's also compatible with the OpenAI Python client:

import openai

# Put anything you want in `API key`
openai.api_key = 'Free the models'

# Point to your own url
openai.api_base = ""

# Do your usual things, for instance a completion query:
completion = openai.Completion.create(model="llama-7B", prompt="Hello everyone this is")
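If you would rather not install any client, the edit query from the cURL example can also be built with the standard library alone. This is a minimal sketch: build_edit_request is a made-up helper, and the base URL you pass in (for instance http://localhost:8080, guessed from the --port 8080 of the serve command) is an assumption about where your server runs.

```python
import json
import urllib.request


def build_edit_request(base_url: str, model: str, instruction: str, text: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for the /edits/ endpoint."""
    payload = {
        "model": model,
        "instruction": instruction,
        "input": text,
        "top_p": 1,
        "n": 1,
        "temperature": 1,
        "max_tokens": 256,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/edits/",
        data=json.dumps(payload).encode("utf-8"),
        headers={"accept": "application/json", "Content-Type": "application/json"},
        method="POST",
    )
```

To actually send it, pass the returned object to urllib.request.urlopen and decode the JSON body of the response.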

Common issues and solutions

Adding a CORS middleware

If you run into CORS issues, don't use the simple_ai serve command; instead, write your own launch script that adds your CORS configuration through the FastAPI CORS middleware.

For instance you can create my_server.py with:

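import uvicorn
from fastapi.middleware.cors import CORSMiddleware
from simple_ai.server import app

def add_cors(app):
    # Fill in the origins allowed to call the API (the list is truncated in the source)
    origins = []
    # The permissive method / header settings below are an assumption; tighten as needed
    app.add_middleware(
        CORSMiddleware,
        allow_origins=origins,
        allow_methods=["*"],
        allow_headers=["*"],
    )
    return app

def serve_app(host="", port=8080, **kwargs):
    # Use a new name: assigning to `app` here would shadow the imported `app`
    cors_app = add_cors(app)
    uvicorn.run(app=cors_app, host=host, port=port)

if __name__ == "__main__":
    serve_app(host="", port=8080)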

And run it as python3 my_server.py instead.

Router and needing /v1 prefix in the endpoints

Some projects include the /v1 prefix in the endpoints themselves, while the OpenAI client includes it in its api_base parameter. If your project needs it as part of the endpoints, you can use FastAPI's APIRouter in a custom script:

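import uvicorn
from fastapi import APIRouter, FastAPI
from simple_ai.server import app

def add_router(app):
    # One possible wiring (an assumption; the source leaves the router unused):
    # re-register the existing routes under /v1 on a fresh FastAPI app
    router = APIRouter(prefix="/v1")
    router.include_router(app.router)
    prefixed_app = FastAPI()
    prefixed_app.include_router(router)
    return prefixed_app

def serve_app(host="", port=8080, **kwargs):
    # Use a new name: assigning to `app` here would shadow the imported `app`
    v1_app = add_router(app)
    uvicorn.run(app=v1_app, host=host, port=port)

if __name__ == "__main__":
    serve_app(host="", port=8080)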
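To make the prefix question concrete, here is a small sketch of how the URL a client ends up calling depends on where /v1 lives. Both helpers (client_api_base and request_url) are hypothetical and only do string handling; they are not part of simple_ai or of the OpenAI client.

```python
def client_api_base(server_url: str, server_has_v1_prefix: bool) -> str:
    """Return the api_base a client should use for a given server."""
    base = server_url.rstrip("/")
    # Only append /v1 when the server itself exposes its endpoints under /v1
    return base + "/v1" if server_has_v1_prefix else base


def request_url(api_base: str, endpoint: str) -> str:
    """Join an api_base and an endpoint such as '/chat/completions/'."""
    return api_base.rstrip("/") + "/" + endpoint.strip("/") + "/"
```

With the custom script above you would use client_api_base(url, True) as api_base; with the plain simple_ai serve command, the server URL itself.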


This is very much a work in progress and far from perfect, so let me know if you want to help. PRs, issues, documentation, a cool logo: all the usual candidates are welcome.

Development Environment

The following steps require make and poetry to be installed on your system.

To install the development environment run:

make install-dev 

This will install all dev dependencies as well as configure your pre-commit helpers.



License:MIT License

