Refactor Inference Engine extensions to Backend
louis-jan opened this issue
Description:
All current inference engines run primarily in the browser host. We now need to refactor them to run on the backend. This change lets UI components make requests with minimal model knowledge, such as the model's ID and the messages.
Without detailed knowledge of how to run the model, the client can simply send an OpenAI-compatible request with no additional parameters. Models are loaded on the server side using default settings read from model.json.
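As a rough sketch of what that client-side simplification could look like (the function and type names here are hypothetical, not part of the current codebase), the UI would only assemble the model ID and messages into an OpenAI-compatible body:

```typescript
// Hypothetical minimal client-side helper: the UI only knows the model ID
// and the conversation messages; all engine settings stay on the backend.
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

function buildChatRequest(modelId: string, messages: ChatMessage[]) {
  // OpenAI-compatible /chat/completions body with no engine parameters;
  // the backend fills in defaults from the model's model.json.
  return { model: modelId, messages };
}

// Example: this body is all the client needs to send to the backend.
const body = buildChatRequest("my-model", [
  { role: "user", content: "Hello" },
]);
console.log(JSON.stringify(body));
```

The key point is what is absent: no temperature, context length, or GPU settings appear in the request, because those are resolved server-side.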
See: #2758
This approach also helps scale the model hub: clients can easily retrieve the latest supported model list from the backend, which can be updated dynamically.
```mermaid
graph LR
UI[UI Components]-->|chat/completion| Backend
Backend -->|retrieve| Model[model.json]
Model-->|settings| Load[Model Loader]
Load-->|inference| Inference[Inference Engines]
Inference -->|Response| UI
```
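The "retrieve model.json → settings → Model Loader" step in the diagram could be sketched as follows. This is an illustrative assumption, not the actual implementation: the `ModelJson` shape and the field names (`ctx_len`, `ngl`) are placeholders for whatever the real model.json schema defines, and the fallback values are invented defaults.

```typescript
// Hypothetical server-side step: resolve engine load settings entirely
// from model.json, since the client request carries no engine parameters.
type ModelJson = {
  id: string;
  settings: { ctx_len?: number; ngl?: number };
  parameters: { temperature?: number };
};

function resolveLoadSettings(modelJson: ModelJson) {
  // All defaults come from model.json; fallbacks here are illustrative.
  return {
    modelId: modelJson.id,
    ctx_len: modelJson.settings.ctx_len ?? 2048,
    ngl: modelJson.settings.ngl ?? 100,
    temperature: modelJson.parameters.temperature ?? 0.7,
  };
}
```

The Model Loader would then pass this resolved object to the inference engine, keeping every engine-specific detail out of the UI layer.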