zhudotexe / kani-ratelimits

A simple ratelimit engine wrapper for kani. Supports RPM, TPM, and max_concurrency limits.

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

kani-ratelimits

This is a simple, small package to to enforce request-per-minute (RPM), token-per-minute (TPM), and/or max-concurrency ratelimits before making requests to an underlying engine.

Installation

pip install kani-ratelimits

Usage

from kani.ext.ratelimits import RatelimitedEngine

# limit requests to 10 req/min and 30k tokens/min
inner_engine = ...  # your engine here, e.g. `OpenAIEngine(..., model="gpt-4")`
engine = RatelimitedEngine(inner_engine, rpm_limit=10, tpm_limit=30_000)

The RatelimitedEngine takes the following parameters:

  • engine: The engine to wrap.
  • max_concurrency (int): The maximum number of concurrent requests to serve at once (default unlimited).
  • rpm_limit (float): The maximum number of requests to serve per rpm_period (default unlimited).
  • rpm_period (float): The duration, in seconds, of the time period in which to limit the rate. Note that up to rpm_limit requests are allowed within this time period in a burst (default 60s).
  • tpm_limit (float): The maximum number of tokens to send in requests per tpm_period (default unlimited).
  • tpm_period (float): The duration, in seconds, of the time period in which to limit the rate. Note that up to tpm_limit tokens are allowed within this time period in a burst (default 60s).

The ratelimiter will ensure that all conditions are met before forwarding the request to the wrapped engine.

About

A simple ratelimit engine wrapper for kani. Supports RPM, TPM, and max_concurrency limits.

License:MIT License


Languages

Language:Python 100.0%