timoklimmer / powerproxy-aoai

Monitors and processes traffic to and from Azure OpenAI endpoints.

Feature: load-balance between instances

jjczopek opened this issue · comments

I've scanned the source code and didn't find this feature, but maybe I missed it. If it's not there, please consider this a feature request.

Long story short, the model versions available for Azure OpenAI differ between Azure regions. As a result, you can end up with multiple Azure OpenAI instances whose available deployments are slightly different.

It would be nice if PowerProxy took the deployments available on each endpoint into account when load-balancing requests.

Available deployments could come from the config.

Hey @jjczopek, take a look at the example config. By specifying virtual deployments in the config, PowerProxy knows which deployments can be used at a given endpoint depending on the deployment name used in the request and load balances accordingly. If virtual deployment names are equal across multiple endpoints, it will load balance even across endpoints (if load balancing across deployments does not succeed).
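
For illustration, here is a rough sketch of how virtual deployments across two endpoints might be laid out. The key names and structure below are illustrative only, not the actual schema; the example config in the repo is the authoritative reference.

```yaml
# Illustrative sketch only - see the repo's example config for the real schema.
# Two Azure OpenAI endpoints: the "gpt-4o" virtual deployment exists on both,
# so requests for it can be balanced across endpoints, while "gpt-35-turbo"
# is only offered by the second endpoint.
aoai:
  endpoints:
    - name: aoai-sweden
      url: https://my-aoai-sweden.openai.azure.com/   # hypothetical endpoint
      virtual_deployments:
        - name: gpt-4o
          standins:
            - name: gpt-4o-2024-05-13    # actual deployment name on this endpoint
    - name: aoai-france
      url: https://my-aoai-france.openai.azure.com/   # hypothetical endpoint
      virtual_deployments:
        - name: gpt-4o
          standins:
            - name: gpt-4o-0513          # actual deployment name on this endpoint
        - name: gpt-35-turbo
          standins:
            - name: gpt-35-turbo-1106
```

With a layout like this, requests addressed to the virtual deployment `gpt-4o` could be balanced across both endpoints, while `gpt-35-turbo` requests would only be routed to the second one.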