AnderMendoza / litellm

Contribution to the Call All LLM APIs project using the OpenAI format. Use Azure, OpenAI, Cohere, Anthropic, Ollama, VLLM, Sagemaker, HuggingFace, Replicate (100+ LLM)

Home Page:https://docs.litellm.ai/docs/

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

πŸš… LiteLLM

Call all LLM APIs using the OpenAI format [Anthropic, Huggingface, Cohere, TogetherAI, Azure, OpenAI, etc.]

Bug Report Β· Feature Request

Docs 100+ Supported Models Demo Video

LiteLLM manages

  • Translating inputs to the provider's completion and embedding endpoints
  • Guarantees consistent output, text responses will always be available at ['choices'][0]['message']['content']
  • Exception mapping - common exceptions across providers are mapped to the OpenAI exception types

🚨 Seeing errors? Chat on WhatsApp Chat on Discord

05/10/2023: LiteLLM is adopting Semantic Versioning for all commits. Learn more

Usage

Open In Colab
pip install litellm
from litellm import completion
import os

## set ENV variables 
os.environ["OPENAI_API_KEY"] = "your-openai-key" 
os.environ["COHERE_API_KEY"] = "your-cohere-key" 

messages = [{ "content": "Hello, how are you?","role": "user"}]

# openai call
response = completion(model="gpt-3.5-turbo", messages=messages)

# cohere call
response = completion(model="command-nightly", messages=messages)
print(response)

Streaming (Docs)

liteLLM supports streaming the model response back, pass stream=True to get a streaming iterator in response. Streaming is supported for OpenAI, Azure, Anthropic, Huggingface models

response = completion(model="gpt-3.5-turbo", messages=messages, stream=True)
for chunk in response:
    print(chunk['choices'][0]['delta'])

# claude 2
result = completion('claude-2', messages, stream=True)
for chunk in result:
  print(chunk['choices'][0]['delta'])

Caching (Docs)

LiteLLM supports caching completion() and embedding() calls for all LLMs. Hosted Cache LiteLLM API

import litellm
from litellm.caching import Cache
import os

litellm.cache = Cache()
os.environ['OPENAI_API_KEY'] = ""
# add to cache
response1 = litellm.completion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "why is LiteLLM amazing?"}], 
    caching=True
)
# returns cached response
response2 = litellm.completion(
    model="gpt-3.5-turbo", 
    messages=[{"role": "user", "content": "why is LiteLLM amazing?"}], 
    caching=True
)

print(f"response1: {response1}")
print(f"response2: {response2}")

OpenAI Proxy Server (Docs)

Spin up a local server to translate openai api calls to any non-openai model (e.g. Huggingface, TogetherAI, Ollama, etc.)

This works for async + streaming as well.

litellm --model <model_name>

Running your model locally or on a custom endpoint ? Set the --api-base parameter see how

Supported Provider (Docs)

Provider Completion Streaming Async Completion Async Streaming
openai βœ… βœ… βœ… βœ…
cohere βœ… βœ… βœ… βœ…
anthropic βœ… βœ… βœ… βœ…
replicate βœ… βœ… βœ… βœ…
huggingface βœ… βœ… βœ… βœ…
together_ai βœ… βœ… βœ… βœ…
openrouter βœ… βœ… βœ… βœ…
vertex_ai βœ… βœ… βœ… βœ…
palm βœ… βœ… βœ… βœ…
ai21 βœ… βœ… βœ… βœ…
baseten βœ… βœ… βœ… βœ…
azure βœ… βœ… βœ… βœ…
sagemaker βœ… βœ… βœ… βœ…
bedrock βœ… βœ… βœ… βœ…
vllm βœ… βœ… βœ… βœ…
nlp_cloud βœ… βœ… βœ… βœ…
aleph alpha βœ… βœ… βœ… βœ…
petals βœ… βœ… βœ… βœ…
ollama βœ… βœ… βœ… βœ…
deepinfra βœ… βœ… βœ… βœ…

Read the Docs

Contributing

To contribute: Clone the repo locally -> Make a change -> Submit a PR with the change.

Here's how to modify the repo locally: Step 1: Clone the repo

git clone https://github.com/BerriAI/litellm.git

Step 2: Navigate into the project, and install dependencies:

cd litellm
poetry install

Step 3: Test your change:

cd litellm/tests # pwd: Documents/litellm/litellm/tests
pytest .

Step 4: Submit a PR with your changes! πŸš€

  • push your fork to your GitHub repo
  • submit a PR from there

Learn more on how to make a PR

Support / talk with founders

Why did we build this

  • Need for simplicity: Our code started to get extremely complicated managing & translating calls between Azure, OpenAI, Cohere

Contributors

About

Contribution to the Call All LLM APIs project using the OpenAI format. Use Azure, OpenAI, Cohere, Anthropic, Ollama, VLLM, Sagemaker, HuggingFace, Replicate (100+ LLM)

https://docs.litellm.ai/docs/

License:MIT License


Languages

Language:Python 100.0%