This project demonstrates load balancing with LiteLLM across different LLMs on Amazon Bedrock.
pip install -r requirements.txt
Ensure you have at least one AWS profile set up. If you are not familiar with AWS profiles, refer to the AWS documentation on named profiles.
Then set AWS_PROFILE_A and AWS_PROFILE_B accordingly, for example:
export AWS_PROFILE_A=my-profile
python main.py
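The contents of main.py are not shown here. As a rough sketch only, a script like it might turn the two environment variables into a LiteLLM model list along these lines. The pairing of profiles to models and the group name `bedrock-claude` are assumptions made for illustration; the model IDs are taken from the example output.

```python
import os


def build_model_list(env=os.environ):
    """Build a LiteLLM-style model list from AWS_PROFILE_A / AWS_PROFILE_B.

    Every deployment shares one group name so a router can balance across them.
    """
    # Assumption: which Bedrock model uses which profile is illustrative only.
    deployments = [
        ("AWS_PROFILE_A", "bedrock/anthropic.claude-3-sonnet-20240229-v1:0"),
        ("AWS_PROFILE_A", "bedrock/anthropic.claude-3-haiku-20240307-v1:0"),
        ("AWS_PROFILE_B", "bedrock/anthropic.claude-instant-v1"),
    ]
    model_list = []
    for env_var, model in deployments:
        profile = env.get(env_var)
        if not profile:
            continue  # skip deployments whose profile is not configured
        model_list.append({
            "model_name": "bedrock-claude",  # hypothetical group name
            "litellm_params": {"model": model, "aws_profile_name": profile},
        })
    return model_list


def make_router(model_list):
    """Create a LiteLLM Router over the given deployments (needs litellm installed)."""
    from litellm import Router  # imported lazily so build_model_list has no dependencies
    return Router(model_list=model_list, routing_strategy="simple-shuffle")
```

With a router built this way, each call such as `router.completion(model="bedrock-claude", messages=[...])` would be sent to one of the deployments in the group.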
Example output:
IIntialized router with Routing strategy: simple-shuffle
Routing fallbacks: None
Routing context window fallbacks: None
Router Redis Caching=None
{'id': 'chatcmpl-257ad25f-8e04-4002-a1dd-a399b2f858b5', 'choices': [{'finish_reason': 'stop', 'index': 0, 'message': {'content': "I'm doing well, thanks for asking! I'm Claude, an AI assistant created by Anthropic.", 'role': 'assistant', 'tool_calls': []}}], 'created': 1718690987, 'model': 'anthropic.claude-3-sonnet-20240229-v1:0', 'object': 'chat.completion', 'system_fingerprint': None, 'usage': {'prompt_tokens': 14, 'completion_tokens': 25, 'total_tokens': 39}}
{'id': 'chatcmpl-f54b70b6-0a3b-4808-969c-bf1fbfbb08e9', 'choices': [{'finish_reason': 'stop', 'index': 0, 'message': {'content': "I'm doing well, thanks for asking!", 'role': 'assistant', 'tool_calls': []}}], 'created': 1718690989, 'model': 'anthropic.claude-instant-v1', 'object': 'chat.completion', 'system_fingerprint': None, 'usage': {'prompt_tokens': 16, 'completion_tokens': 13, 'total_tokens': 29}}
{'id': 'chatcmpl-bdcbccc5-722a-4e72-8903-6b7c4ce7dac8', 'choices': [{'finish_reason': 'stop', 'index': 0, 'message': {'content': "I'm doing well, thanks for asking!", 'role': 'assistant', 'tool_calls': []}}], 'created': 1718690991, 'model': 'anthropic.claude-instant-v1', 'object': 'chat.completion', 'system_fingerprint': None, 'usage': {'prompt_tokens': 16, 'completion_tokens': 13, 'total_tokens': 29}}
{'id': 'chatcmpl-319cd0e0-3392-4e26-9795-683cbe6fbd3a', 'choices': [{'finish_reason': 'stop', 'index': 0, 'message': {'content': "I'm doing well, thanks for asking!", 'role': 'assistant', 'tool_calls': []}}], 'created': 1718690993, 'model': 'anthropic.claude-instant-v1', 'object': 'chat.completion', 'system_fingerprint': None, 'usage': {'prompt_tokens': 16, 'completion_tokens': 13, 'total_tokens': 29}}
{'id': 'chatcmpl-bcdf1d92-9cff-4f17-b8ae-397e927b7759', 'choices': [{'finish_reason': 'stop', 'index': 0, 'message': {'content': "I'm doing well, thanks for asking! As an AI assistant, I don't experience emotions in the same way humans do, but I'm functioning properly and ready to assist you with any questions or tasks you may have. How can I help you today?", 'role': 'assistant', 'tool_calls': []}}], 'created': 1718690995, 'model': 'anthropic.claude-3-haiku-20240307-v1:0', 'object': 'chat.completion', 'system_fingerprint': None, 'usage': {'prompt_tokens': 14, 'completion_tokens': 55, 'total_tokens': 69}}
As the output shows, the script invokes Bedrock five times. Each call is routed to one of the models in the provided model list, and the response JSON from whichever model served it is printed.
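To see how the calls were spread across models, one option is to tally the `model` field of each response. The snippet below is a small sketch that is not part of main.py; it reproduces only the `model` values from the five example responses above, and the helper name `tally_models` is made up for illustration.

```python
from collections import Counter


def tally_models(responses):
    """Count how many responses each model served."""
    return Counter(r["model"] for r in responses)


# Only the 'model' field of the five example responses is reproduced here.
responses = [
    {"model": "anthropic.claude-3-sonnet-20240229-v1:0"},
    {"model": "anthropic.claude-instant-v1"},
    {"model": "anthropic.claude-instant-v1"},
    {"model": "anthropic.claude-instant-v1"},
    {"model": "anthropic.claude-3-haiku-20240307-v1:0"},
]

# Print one line per model, most frequently used first.
for model, count in tally_models(responses).most_common():
    print(f"{model}: {count}")
```

In this run, anthropic.claude-instant-v1 served three of the five calls. With a random strategy such as simple-shuffle, an uneven split over so few calls is to be expected.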