[WIP] OpenAI.mini

This repo implements the OpenAI APIs on top of open source models: open source LLMs for chat, Whisper for audio, SDXL for images, and so on. You can then interact with these models through the openai libraries or the LangChain library.

Frontend

How to use

1. Install dependencies

pip3 install -r requirements.txt
pip3 install -r app/requirements.txt

If you have make installed, you can run this instead:

make install

2. Get frontend

You can either build the frontend yourself with yarn or download the prebuilt package from the release page.

Option 1: install the dependencies and build the frontend

cd app/frontend
yarn install
yarn build

Option 2: download it from the release page

Download dist.tar.gz from the release page.

Extract the dist directory and put it in the app/frontend directory.

Make sure the resulting directory layout looks like this:

$ ls app/frontend/dist
asset-manifest.json  assets  favicon.ico  index.html  manifest.json  robots.txt  static

3. Set the environment variables

cp .env.example .env
# Modify the `.env` file

4. Download the model weights manually (Optional)

If you have already downloaded the weight files, or want to keep model weights in a specific location, set MODEL_HUB_PATH in .env and put the weight files there. MODEL_HUB_PATH defaults to hub. OpenAI.mini looks for a model's weights in MODEL_HUB_PATH first; if they are not there, it downloads them automatically from Hugging Face by model name. The MODEL_HUB_PATH directory looks like this:

MODEL_HUB_PATH directory layout

$ cd hub && tree -L 2
.
├── baichuan-inc
│   ├── Baichuan-13B-Base
│   ├── Baichuan-13B-Chat
│   └── Baichuan2-13B-Chat
├── intfloat
│   ├── e5-large-v2
│   └── multilingual-e5-large
├── meta-llama
│   ├── Llama-2-13b-chat-hf
│   └── Llama-2-7b-chat-hf
├── openai
│   ├── base.pt
│   ├── large-v2.pt
│   ├── medium.en.pt
│   ├── medium.pt
│   ├── small.pt
│   └── tiny.pt
├── Qwen
│   └── Qwen-7B-Chat
├── stabilityai
│   ├── FreeWilly2
│   ├── stable-diffusion-xl-base-0.9
│   └── stable-diffusion-xl-base-1.0
├── thenlper
│   └── gte-large
└── THUDM
    ├── chatglm2-6b
    ├── chatglm3-6b
    └── codegeex2-6b

Note: models can be loaded on startup or on the fly.
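The lookup order described in this step can be sketched roughly as follows. This is only an illustration of "local hub first, Hugging Face name otherwise", not the project's actual loader: the function name `resolve_model_path` and its signature are hypothetical, and the only setting taken from this README is `MODEL_HUB_PATH` with its default of `hub`.

```python
from pathlib import Path


def resolve_model_path(model_name: str, hub_path: str = "hub") -> str:
    """Return a local weights directory if present, else the model name.

    Hypothetical helper for illustration; not part of OpenAI.mini.
    """
    local_dir = Path(hub_path) / model_name
    if local_dir.exists():
        # The weights were placed under MODEL_HUB_PATH by hand.
        return str(local_dir)
    # Not found locally: return the bare name so a loader can fetch the
    # weights from Hugging Face by that name.
    return model_name
```

For example, with the layout shown above, `resolve_model_path("Qwen/Qwen-7B-Chat")` would yield the local `hub/Qwen/Qwen-7B-Chat` directory, while a model that is not in the hub is passed through by name.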

5. Start the OpenAI.mini servers

python3 -m src.api
python3 -m app.server

6. Access the OpenAI.mini services

OpenAI.mini implements most of the OpenAI platform APIs and also provides a ChatGPT-like web frontend. You can access the services through the openai libraries or chat with the models in the web frontend.

Access as an OpenAI service: use the openai package or the LangChain library, setting openai.api_base="http://YOUR_OWN_IP:8000/api/v1" and openai.api_key="none_or_any_other_string". More detailed examples are in the Example Code section.
Access as a ChatGPT-like web app: open http://YOUR_OWN_IP:8001/index.html?model=MODEL_NAME in your web browser.
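To make the two access paths concrete, here is a minimal standard-library sketch that builds (but does not send) a chat request and a frontend URL. It assumes the server exposes the usual OpenAI-style `chat/completions` path under the `api_base` given above, and that any bearer token is accepted, as the `api_key` note suggests. The helper names `build_chat_request` and `frontend_url` are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def build_chat_request(host: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a POST request for the chat endpoint (hypothetical helper)."""
    base = f"http://{host}:8000/api/v1"  # port and prefix as stated in this README
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base}/chat/completions",  # assumed OpenAI-compatible path
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer none",  # placeholder key
        },
        method="POST",
    )


def frontend_url(host: str, model: str) -> str:
    """Build the web frontend URL for a given model (hypothetical helper)."""
    query = urllib.parse.urlencode({"model": model})
    return f"http://{host}:8001/index.html?{query}"
```

With the servers running, `urllib.request.urlopen(build_chat_request("localhost", "Baichuan2-13B-Chat", "Hi"))` would send the request; the openai library does the equivalent work for you.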

OpenAI API Status

Services	API	Status	Description
Authorization
Models	List models	✅ Done
Models	Retrieve model	✅ Done
Chat	Create chat completion	Partially done	Supports multiple LLMs
Completions	Create completion
Images	Create image	✅ Done
Images	Create image edit
Images	Create image variation
Embeddings	Create embeddings	✅ Done	Supports multiple LLMs
Audio	Create transcription	✅ Done
Audio	Create translation	✅ Done
Files	List files	✅ Done
Files	Upload file	✅ Done
Files	Delete file	✅ Done
Files	Retrieve file	✅ Done
Files	Retrieve file content	✅ Done
Fine-tuning	Create fine-tuning job
Fine-tuning	Retrieve fine-tuning job
Fine-tuning	Cancel fine-tuning
Fine-tuning	List fine-tuning events
Moderations	Create moderation
Edits	Create edit

Supported Language Models

Model	#Params	Checkpoint link
FreeWilly2	70B	stabilityai/FreeWilly2
Baichuan2-13B-Chat	13B	baichuan-inc/Baichuan2-13B-Chat
Baichuan-13B-Chat	13B	baichuan-inc/Baichuan-13B-Chat
Llama-2-13b-chat-hf	13B	meta-llama/Llama-2-13b-chat-hf
Llama-2-7b-chat-hf	7B	meta-llama/Llama-2-7b-chat-hf
Qwen-7B-Chat	7B	Qwen/Qwen-7B-Chat
internlm-chat-7b	7B	internlm/internlm-chat-7b
chatglm3-6b	6B	THUDM/chatglm3-6b
chatglm2-6b	6B	THUDM/chatglm2-6b
chatglm-6b	6B	THUDM/chatglm-6b

Supported Embedding Models

Model	Embedding Dim.	Sequence Length	Checkpoint link
bge-large-zh	1024	?	BAAI/bge-large-zh
m3e-large	1024	?	moka-ai/m3e-large
text2vec-large-chinese	1024	?	GanymedeNil/text2vec-large-chinese
gte-large	1024	512	thenlper/gte-large
e5-large-v2	1024	512	intfloat/e5-large-v2

Supported Diffusion Models

Model	Response Format	Checkpoint link
stable-diffusion-xl-base-1.0	b64_json, url	stabilityai/stable-diffusion-xl-base-1.0
stable-diffusion-xl-base-0.9	b64_json, url	stabilityai/stable-diffusion-xl-base-0.9

Supported Audio Models

Model	#Params	Checkpoint link
whisper-1	1550 M	alias for whisper-large-v2
whisper-large-v2	1550 M	large-v2
whisper-medium	769 M	medium
whisper-small	244 M	small
whisper-base	74 M	base
whisper-tiny	39 M	tiny
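The table above can be read as a two-step lookup: resolve the `whisper-1` alias, then map the model name to a checkpoint name. A minimal sketch of that reading follows; the dictionaries simply copy the table rows, while the function name `whisper_checkpoint` and the error handling are hypothetical and not taken from the project.

```python
# API model name -> checkpoint name, copied from the table above.
WHISPER_CHECKPOINTS = {
    "whisper-large-v2": "large-v2",
    "whisper-medium": "medium",
    "whisper-small": "small",
    "whisper-base": "base",
    "whisper-tiny": "tiny",
}

# whisper-1 is listed as an alias for whisper-large-v2.
WHISPER_ALIASES = {"whisper-1": "whisper-large-v2"}


def whisper_checkpoint(model_name: str) -> str:
    """Resolve an API model name to a checkpoint name (hypothetical helper)."""
    canonical = WHISPER_ALIASES.get(model_name, model_name)
    try:
        return WHISPER_CHECKPOINTS[canonical]
    except KeyError:
        raise ValueError(f"unsupported audio model: {model_name}") from None
```

Under this reading, a request for `whisper-1` and a request for `whisper-large-v2` both end up at the `large-v2` checkpoint.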

Example Code

Stream Chat

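import openai

# Point the client at the local OpenAI.mini server.
# This uses the openai<1.0 module-level interface.
openai.api_base = "http://localhost:8000/api/v1"
openai.api_key = "none"

# Print each piece of the answer as soon as it arrives.
for chunk in openai.ChatCompletion.create(
    model="Baichuan2-13B-Chat",
    messages=[{"role": "user", "content": "Which mountain is the second highest one in the world?"}],
    stream=True,
):
    # Some chunks carry no content, so check before printing.
    if hasattr(chunk.choices[0].delta, "content"):
        print(chunk.choices[0].delta.content, end="", flush=True)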

Chat

import openai

openai.api_base = "http://localhost:8000/api/v1"
openai.api_key = "none"

# Non-streaming call: the full answer comes back in one response.
resp = openai.ChatCompletion.create(
    model="Baichuan2-13B-Chat",
    messages=[{"role": "user", "content": "Which mountain is the second highest one in the world?"}],
)
print(resp.choices[0].message.content)

Create Embeddings

import openai

openai.api_base = "http://localhost:8000/api/v1"
openai.api_key = "none"

embeddings = openai.Embedding.create(
  model="gte-large",
  input="The food was delicious and the waiter..."
)

print(embeddings)
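A common next step is to compare two embeddings. The sketch below is a plain-Python cosine similarity; it assumes the response follows the OpenAI embeddings format, where each vector sits at `embeddings["data"][i]["embedding"]`, which this README does not confirm explicitly. The function name is illustrative.

```python
import math


def cosine_similarity(a, b) -> float:
    """Cosine similarity of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

For two inputs embedded in one request, this would be called as `cosine_similarity(embeddings["data"][0]["embedding"], embeddings["data"][1]["embedding"])`, under the response-shape assumption stated above.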

List LLM Models

import openai

openai.api_base = "http://localhost:8000/api/v1"
openai.api_key = "none"

# Print the models served by OpenAI.mini.
print(openai.Model.list())

Create Image

import openai
from base64 import b64decode
from IPython.display import Image

openai.api_base = "http://localhost:8000/api/v1"
openai.api_key = "none"

# Ask for one 1024x1024 image, returned inline as base64.
response = openai.Image.create(
    prompt="An astronaut riding a green horse",
    n=1,
    size="1024x1024",
    response_format="b64_json",
)

# Decode the image bytes and display them (as the last expression in a notebook cell).
b64_json = response["data"][0]["b64_json"]
image = b64decode(b64_json)
Image(image)
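Outside a notebook there is nothing to display the image, so you may prefer to write it to disk. This sketch assumes the same `response["data"][i]["b64_json"]` shape used above; the helper name `save_b64_image` and the choice of file name and extension are hypothetical, since the README does not state the image encoding.

```python
import base64
from pathlib import Path


def save_b64_image(response, out_path, index: int = 0) -> Path:
    """Decode one base64 image from the response and write it to out_path.

    Hypothetical helper; the file extension is up to the caller.
    """
    b64_json = response["data"][index]["b64_json"]
    path = Path(out_path)
    path.write_bytes(base64.b64decode(b64_json))
    return path
```

For example, `save_b64_image(response, "astronaut.png")` would store the first generated image next to your script.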

Create Transcription

# Cell 1: set openai
import openai

openai.api_base = "http://localhost:8000/api/v1"
openai.api_key = "None"

# Cell 2: create a recorder in notebook
# ===================================================
# sudo apt install ffmpeg
# pip install torchaudio ipywebrtc notebook
# jupyter nbextension enable --py widgetsnbextension

from IPython.display import Audio
from ipywebrtc import AudioRecorder, CameraStream

camera = CameraStream(constraints={'audio': True,'video':False})
recorder = AudioRecorder(stream=camera)
recorder

# Cell 3: transcribe
# Save the recording from Cell 2 to a temporary file.
temp_file = '/tmp/recording.webm'
with open(temp_file, 'wb') as f:
    f.write(recorder.audio.value)

# Send the file to the server; it is closed again when the block exits.
with open(temp_file, "rb") as audio_file:
    transcript = openai.Audio.transcribe("whisper-1", audio_file)
print(transcript.text)

Acknowledgement

This project draws on code from many excellent developers, for example @xusenlinzy's api-for-open-llm and @hiyouga's LLaMA-Efficient-Tuning. Many thanks to them.
