timoklimmer / powerproxy-aoai

Monitors and processes traffic to and from Azure OpenAI endpoints.

Feature: load-balance between instances

jjczopek opened this issue · comments

I've scanned the source code and didn't find this feature, but maybe I missed it. If it's not there, please consider this a feature request.

Long story short, the model versions available for Azure OpenAI differ between Azure regions. As a result, you can end up with multiple Azure OpenAI instances whose available deployments are slightly different.

It would be nice if PowerProxy took the deployments available on each endpoint into account when load-balancing requests.

Available deployments could come from the config.

Hey @jjczopek, take a look at the example config. By specifying virtual deployments in the config, PowerProxy knows which deployments can be used at a given endpoint depending on the deployment name used in the request and load balances accordingly. If virtual deployment names are equal across multiple endpoints, it will load balance even across endpoints (if load balancing across deployments does not succeed).
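
For illustration, here is a rough sketch of how virtual deployments across two endpoints might be laid out. The key names and structure below are illustrative only, not the actual schema; the example config in the repo is the authoritative reference.

```yaml
# Illustrative sketch only - see the repo's example config for the real schema.
# Two Azure OpenAI endpoints: the "gpt-4o" virtual deployment exists on both,
# so requests for it can be balanced across endpoints, while "gpt-35-turbo"
# is only offered by the second endpoint.
aoai:
  endpoints:
    - name: aoai-sweden
      url: https://my-aoai-sweden.openai.azure.com/   # hypothetical endpoint
      virtual_deployments:
        - name: gpt-4o
          standins:
            - name: gpt-4o-2024-05-13    # actual deployment name on this endpoint
    - name: aoai-france
      url: https://my-aoai-france.openai.azure.com/   # hypothetical endpoint
      virtual_deployments:
        - name: gpt-4o
          standins:
            - name: gpt-4o-0513          # actual deployment name on this endpoint
        - name: gpt-35-turbo
          standins:
            - name: gpt-35-turbo-1106
```

With a layout like this, requests addressed to the virtual deployment `gpt-4o` could be balanced across both endpoints, while `gpt-35-turbo` requests would only be routed to the second one.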