transformers-openai-api

transformers-openai-api is a server that exposes locally running NLP transformers models through the OpenAI Completions API. In short, you can run transformers models and serve them through an API compatible with existing OpenAI tooling, such as the OpenAI Python client itself or any package built on it (e.g. LangChain).

Quickstart

From pip

pip install transformers-openai-api
wget https://raw.githubusercontent.com/jquesnelle/transformers-openai-api/master/config.example.json
mv config.example.json config.json
transformers-openai-api

From source

git clone https://github.com/jquesnelle/transformers-openai-api
cd transformers-openai-api
cp config.example.json config.json
pip install -r requirements.txt
python transformers_openai_api/

Using with OpenAI Python Client

Simply set the environment variable OPENAI_API_BASE to http://HOST:PORT/v1 before importing the openai package. For example, to access a local instance of transformers-openai-api, set OPENAI_API_BASE to http://127.0.0.1:5000/v1. Alternatively, you can set the api_base property on the openai object:

import openai
openai.api_base = 'http://HOST:PORT/v1'
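
As a minimal sketch, here is a completion request against a local instance using the pre-1.0 openai package, assuming the server runs on the default 127.0.0.1:5000 and that a model named text-davinci-003 is enabled, as in config.example.json:

import openai

openai.api_base = 'http://127.0.0.1:5000/v1'
openai.api_key = 'sk-...'  # any token from BEARER_TOKENS; arbitrary if authorization is disabled

# 'model' must match a key in the MODELS object of config.json
response = openai.Completion.create(
    model='text-davinci-003',
    prompt='Say hello to the world.',
    max_tokens=32,
)
print(response['choices'][0]['text'])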

Configuration

All configuration is managed through config.json. By default, transformers-openai-api looks for this file in the current working directory; however, a different path can be passed as a command-line argument to the program. See config.example.json.
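
For example, to load the configuration from an explicit path (the path shown is just a placeholder):

transformers-openai-api /path/to/config.json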

Hosting

By default, the API server listens on 127.0.0.1:5000. To change this, add HOST and/or PORT entries to the configuration file. For example, to serve publicly:

{
    "HOST": "0.0.0.0",
    "PORT": 80
}

Models

The MODELS object handles mapping an OpenAI model name to a transformers model configuration. The structure of a model configuration is:

Key               Description
ENABLED           Boolean; set to false to disable the model
TYPE              Either "Seq2Seq" or "CausalLM"
MODEL_CONFIG      Parameters for model creation; passed to AutoModelForSeq2SeqLM.from_pretrained or AutoModelForCausalLM.from_pretrained, depending on TYPE
MODEL_DEVICE      Device the model is moved to; passed to .to() called on the created model (default: cuda)
TOKENIZER_CONFIG  Parameters for tokenizer creation; passed to AutoTokenizer.from_pretrained
TOKENIZER_DEVICE  Device tokenized input is moved to; passed to .to() called on the tokenized input (default: cuda)
GENERATE_CONFIG   Parameters for generation; passed to the model's generate function
DECODE_CONFIG     Parameters for decoding; passed to the tokenizer's decode function
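
For illustration, a hypothetical MODELS entry mapping the OpenAI name text-davinci-003 to a Seq2Seq checkpoint. The checkpoint name, the use of pretrained_model_name_or_path as the key, and the generation parameters are assumptions for this sketch; see config.example.json for the real defaults:

{
    "MODELS": {
        "text-davinci-003": {
            "ENABLED": true,
            "TYPE": "Seq2Seq",
            "MODEL_CONFIG": {
                "pretrained_model_name_or_path": "google/flan-t5-xl"
            },
            "MODEL_DEVICE": "cuda",
            "TOKENIZER_CONFIG": {
                "pretrained_model_name_or_path": "google/flan-t5-xl"
            },
            "TOKENIZER_DEVICE": "cuda",
            "GENERATE_CONFIG": {
                "max_new_tokens": 256
            }
        }
    }
}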

Using accelerate

To use accelerate, set device_map on the MODEL_CONFIG to auto and explicitly set MODEL_DEVICE to null. The default text-davinci-003 model in config.example.json is an example of this.
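
The relevant fragment of such a model entry looks like this (other keys omitted):

{
    "MODEL_CONFIG": {
        "device_map": "auto"
    },
    "MODEL_DEVICE": null
}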

Using CPU

To switch to CPU inference, set MODEL_DEVICE and TOKENIZER_DEVICE to cpu.
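
For example, as a fragment of a model entry:

{
    "MODEL_DEVICE": "cpu",
    "TOKENIZER_DEVICE": "cpu"
}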

Using FP16

To use a model at half precision, set torch_dtype in the MODEL_CONFIG to float16. The disabled text-curie-001 model in config.example.json is an example of this.
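
For example, as a fragment of a model entry (the dtype is given as a string, since JSON cannot express torch dtypes directly):

{
    "MODEL_CONFIG": {
        "torch_dtype": "float16"
    }
}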

Authorization

To limit access to the API (i.e., to enforce OPENAI_API_KEY checks), set BEARER_TOKENS to a list of authorized tokens (e.g. your OpenAI key). If BEARER_TOKENS is not present, no authorization is enforced.

{
    "BEARER_TOKENS": ["sk-..."]
}


License

MIT License

