superopenai: logging and caching superpowers for the openai sdk

superopenai is a minimal convenience library for logging and caching LLM requests and responses for visibility and rapid iteration during development.


Introduction

superopenai was built to solve the following problems:

Prompt and request visibility — LLM libraries like langchain, guardrails, and instructor modify your prompts or even make additional requests under the hood. Sometimes this is useful; sometimes it's counter-productive. We think it's good to adopt a "show me the prompt" attitude.

Debugging — Complex chains and agents go wrong because of cascading failures. To debug a failure you need to inspect intermediate results and identify the source of the error, then improve the prompt, try a different model, and so on. That starts with logging and eyeballing the sequence of requests.

Quality-cost-speed tradeoff — Better models, multi-step prompting, and longer prompts can improve the quality of results, but at the expense of cost and speed. Viewing cost, token usage, and latency in your logs as you develop lets you make this tradeoff ex ante rather than ex post.

Repeated identical requests — LLM development is slow because changing a single prompt or parameter often means waiting for every LLM request to re-run. You can't iterate quickly with that much waiting between experiments. superopenai caches the last N LLM requests and responses in memory (when temperature=0).

Privacy, security and speed — superopenai focuses on local development, when you're just starting a new project, iterating on prompts, or playing around. There's no need to sign up, get API keys, or learn a new interface, and your data never leaves your environment.

Installation & basic usage

Run pip install superopenai or poetry add superopenai

To initialize superopenai, run the following before initializing your openai client:

from superopenai import init_superopenai

init_superopenai()

This will monkey-patch the relevant functions in the OpenAI class. You can then use the openai library as usual, with all the superpowers of superopenai.

Basic logging example

from openai import OpenAI
from superopenai import init_logger, init_superopenai

init_superopenai()
client = OpenAI()

with init_logger() as logger:
  client.chat.completions.create(
    model="gpt-4-1106-preview",
    messages=[
      {"role": "user", "content": "What's the capital of France?"}
    ])
  for log in logger.logs:
    print(log)
Output:
+-----------+----------------------------------------------+
| Messages  | - user: What's the capital of France?        |
+-----------+----------------------------------------------+
| Arguments | - model: gpt-4-1106-preview                  |
+-----------+----------------------------------------------+
| Output    | - assistant: The capital of France is Paris. |
+-----------+----------------------------------------------+
| Metadata  | - Cost: $0.00035                             |
|           | - Prompt tokens: 14                          |
|           | - Completion tokens: 7                       |
|           | - Total tokens: 21                           |
|           | - Start time: 1709914488.7480488             |
|           | - Latency: 0.7773971557617188                |
+-----------+----------------------------------------------+
| Cached    | False                                        |
+-----------+----------------------------------------------+

You can also skip the context manager and manage starting and stopping the logger yourself.
logger = init_logger()
client.chat.completions.create(
  model="gpt-4-1106-preview",
  messages=[
  {"role": "user", "content": "What's the capital of France?"}
  ])
for log in logger.logs:
  print(log)
logger.end()
Output:
+-----------+----------------------------------------------+
| Messages  | - user: What's the capital of France?        |
+-----------+----------------------------------------------+
| Arguments | - model: gpt-4-1106-preview                  |
+-----------+----------------------------------------------+
| Output    | - assistant: The capital of France is Paris. |
+-----------+----------------------------------------------+
| Metadata  | - Cost: $0.00035                             |
|           | - Prompt tokens: 14                          |
|           | - Completion tokens: 7                       |
|           | - Total tokens: 21                           |
|           | - Start time: 1709914489.536513              |
|           | - Latency: 3.981590270996094e-05             |
+-----------+----------------------------------------------+
| Cached    | True                                         |
+-----------+----------------------------------------------+

Notice that the second request's latency is nearly zero and Cached is True.

Logging

superopenai wraps the OpenAI.chat.completions.create and AsyncOpenAI.chat.completions.create functions and stores logs in a superopenai.Logger object. The fields captured in each log are described below.

Basic logging

To start logging, call init_logger(), either as a context manager (with init_logger() as logger) or as a simple function call. If you're not using a context manager, make sure to call logger.end() when you're done.

Every openai chat completion request will then be logged, and the logs will be stored in logger.logs. Each log is a ChatCompletionLog object containing the following fields:

  • input_messages: a list of input prompts
  • input_args: an object containing request arguments (model, streaming, temperature, etc.)
  • output: a list of outputs (completion responses) produced by the LLM request
  • metadata: metadata about the request
  • cached: whether the response was returned from cache
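
As a rough sketch of how you might inspect these fields, building on the basic example above (attribute names follow the list above; the exact shape of metadata isn't spelled out here, so printing it whole is the safe option):

with init_logger() as logger:
  client.chat.completions.create(
    model="gpt-4-1106-preview",
    messages=[
      {"role": "user", "content": "What's the capital of France?"}
    ])
  for log in logger.logs:
    print(log.input_messages)  # the prompt messages that were sent
    print(log.input_args)      # model, temperature and other request arguments
    print(log.output)          # the completion(s) returned by the LLM
    print(log.metadata)        # cost, token counts, start time and latency
    print(log.cached)          # whether the response came from the in-memory cache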

By default all logs are stored in the logs folder in your project root. A new logfile is created each day, so logs from 2024-03-08, for example, would be stored in ./logs/2024-03-08.log. You can change the log directory when calling init_logger:

with init_logger("/path/to/log/dir") as logger:
  # your code

Token usage, cost and latency

Inside the metadata field of each log you will find how many prompt and completion tokens were used, what the total cost was, and the latency, i.e. the time between the request being sent and the response being received.

Cost is calculated from the prompt and completion token prices defined in estimator.py. Only OpenAI models have pre-defined prices. If you're using non-OpenAI models, you can optionally specify a price dictionary when initializing superopenai. Prices are specified per 1M tokens as a pair of prompt and completion prices, respectively.

init_superopenai(cost_dict={
  'mistralai/Mixtral-8x7B-Instruct-v0.1': [0.5, 1.0]
})
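
For example, assuming cost is computed as prompt_tokens / 1,000,000 * prompt_price + completion_tokens / 1,000,000 * completion_price, a Mixtral request that used 1,000 prompt tokens and 500 completion tokens would be estimated at 1,000/1,000,000 * $0.50 + 500/1,000,000 * $1.00 = $0.001.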

Streaming and async

Logging works in streaming mode (setting stream=True in the chat completion request) as well as when using the async chat completion api.

In streaming mode, the output is a list of streamed chunks rather than a list of completion responses. All other fields are the same. The log object is a StreamingChatCompletionLog object.
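
Below is a minimal sketch of both modes; the helper function names are just for illustration, and it assumes init_logger wraps streaming and async calls the same way as the synchronous examples above.

import asyncio

from openai import AsyncOpenAI, OpenAI
from superopenai import init_logger, init_superopenai

init_superopenai()

# Streaming: the logged output is a list of streamed chunks
def streaming_example():
  client = OpenAI()
  with init_logger() as logger:
    stream = client.chat.completions.create(
      model="gpt-4-1106-preview",
      messages=[{"role": "user", "content": "Name three French cities."}],
      stream=True)
    for _chunk in stream:
      pass  # consume the stream as you normally would
    print(logger.logs[-1])

# Async: AsyncOpenAI is wrapped too, so the same logger captures the request
async def async_example():
  client = AsyncOpenAI()
  with init_logger() as logger:
    await client.chat.completions.create(
      model="gpt-4-1106-preview",
      messages=[{"role": "user", "content": "What's the capital of Spain?"}])
    print(logger.summary_statistics())

streaming_example()
asyncio.run(async_example())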

Function Calling and Tools

superopenai works out of the box when using function calling or tools. The functions called and their arguments will be captured and printed in the output field. This works in streaming mode too.
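
For instance, here's a rough sketch using a made-up get_current_weather tool (the tool and its schema are hypothetical, only there to show where the call lands), assuming the same client setup as in the earlier examples:

tools = [{
  "type": "function",
  "function": {
    "name": "get_current_weather",
    "description": "Get the current weather for a city",
    "parameters": {
      "type": "object",
      "properties": {"city": {"type": "string"}},
      "required": ["city"]
    }
  }
}]

with init_logger() as logger:
  client.chat.completions.create(
    model="gpt-4-1106-preview",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools)
  # the tool call and its arguments show up in the log's output field
  print(logger.logs[-1].output)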

Statistics

When you run a chain or agent with multiple LLM calls, it's useful to look at summary statistics over all the calls rather than individual ones.

To look at summary statistics, call logger.summary_statistics()

with init_logger() as logger:
  client.chat.completions.create(
    model="gpt-4-1106-preview",
    messages=[
      {"role": "user", "content": "What's the capital of France?"}
    ]
  )
  print(logger.summary_statistics())
Output:
+----------------------------+----------------------------+
|      Number of Calls       |             1              |
|       Number Cached        |             1              |
+----------------------------+----------------------------+
|            Cost            |          $0.00035          |
+----------------------------+----------------------------+
|       Prompt Tokens        |             14             |
|     Completion Tokens      |             7              |
|        Total Tokens        |             21             |
+----------------------------+----------------------------+
|   Prompt Tokens by Model   | {'gpt-4-1106-preview': 14} |
| Completion Tokens by Model | {'gpt-4-1106-preview': 7}  |
|   Total Tokens by Model    | {'gpt-4-1106-preview': 21} |
+----------------------------+----------------------------+
|       Total Latency        |   3.981590270996094e-05    |
|      Average Latency       |   3.981590270996094e-05    |
|  Average Latency (Cached)  |   3.981590270996094e-05    |
| Average Latency (Uncached) |             0              |
+----------------------------+----------------------------+

Caching

superopenai caches all requests in-memory using cachetools and returns the cached response next time if all request parameters are exactly the same and the same OpenAI client is used.

Caching is automatically enabled when you call init_superopenai and applies to both regular chat.completions.create and async chat.completions.create requests. It works in both streaming and non-streaming mode.

You can disable caching or change the cache size (default 1000) when initializing superopenai:

init_superopenai(enable_caching=True, cache_size=100)
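
As a quick sketch of the effect (temperature=0 is set explicitly to match the caching note in the introduction; the comments describe the expected behavior rather than guaranteed output):

from openai import OpenAI
from superopenai import init_logger, init_superopenai

init_superopenai(enable_caching=True, cache_size=100)
client = OpenAI()

with init_logger() as logger:
  for _ in range(2):
    client.chat.completions.create(
      model="gpt-4-1106-preview",
      temperature=0,
      messages=[{"role": "user", "content": "What's the capital of France?"}])
  print(logger.logs[0].cached)  # False: the first call hits the API
  print(logger.logs[1].cached)  # True: the identical request is served from memory
  print(logger.summary_statistics())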

Using with langchain, etc.

superopenai is fully compatible with langchain, llama-index, instructor, guidance, DSPy, and most other third-party libraries.

This is particularly useful when you're doing local development with langchain and want to quickly inspect your chain runs, or understand what requests were made under the hood. For example:

from langchain.prompts import PromptTemplate
from langchain_experimental.smart_llm import SmartLLMChain
from langchain_openai import ChatOpenAI
from superopenai import init_logger, init_superopenai

init_superopenai()

hard_question = "I have a 12 liter jug and a 6 liter jug.\
I want to measure 6 liters. How do I do it?"
prompt = PromptTemplate.from_template(hard_question)
llm = ChatOpenAI(temperature=0.8, model_name="gpt-3.5-turbo")

with init_logger() as logger:
  chain = SmartLLMChain(llm=llm, prompt=prompt,
                      n_ideas=2,
                      verbose=True)
  result = chain.run({})

print(logger.summary_statistics())

Output:

+----------------------------+-------------------------+
|      Number of Calls       |            4            |
|       Number Cached        |            0            |
+----------------------------+-------------------------+
|            Cost            |        $0.001318        |
+----------------------------+-------------------------+
|       Prompt Tokens        |           1094          |
|     Completion Tokens      |           514           |
|        Total Tokens        |           1608          |
+----------------------------+-------------------------+
|   Prompt Tokens by Model   | {'gpt-3.5-turbo': 1094} |
| Completion Tokens by Model |  {'gpt-3.5-turbo': 514} |
|   Total Tokens by Model    | {'gpt-3.5-turbo': 1608} |
+----------------------------+-------------------------+
|       Total Latency        |    10.062347888946533   |
|      Average Latency       |    2.5155869722366333   |
|  Average Latency (Cached)  |            0            |
| Average Latency (Uncached) |    2.5155869722366333   |
+----------------------------+-------------------------+

Future work

  • Port to TypeScript
  • Simplifying retries
  • Tracing
  • Disk and remote caching
  • Thread-safe caching
  • Integrate with 3rd party hosted logging services

Contributing

superopenai is free, open-source, and licensed under the MIT license. We welcome contributions from the community. You can always contribute by giving us a star :)

License

superopenai is released under the MIT License. See the LICENSE file for more details.
