janhq / jan

Jan is an open source alternative to ChatGPT that runs 100% offline on your computer. Multiple engine support (llama.cpp, TensorRT-LLM)

Home page: https://jan.ai/


Refactor Inference Engine extensions to Backend

louis-jan opened this issue

Description:

All current inference engines run primarily within the browser host. They should be refactored to operate on the backend. This change will let UI components issue requests with minimal model knowledge: just the model's ID and the messages.

The client no longer needs detailed knowledge of how to run a model: it simply sends an OpenAI-compatible request without additional parameters. On the server side, models are loaded with default settings read from model.json.
See: #2758
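As a rough sketch of what the client side could look like after the refactor (names and the endpoint path here are illustrative, not Jan's actual API), the UI would only assemble a minimal OpenAI-compatible body:

```typescript
// Hypothetical minimal client-side request builder: the UI supplies only the
// model ID and the message history; all engine settings are resolved
// server-side from model.json.
type ChatMessage = { role: "user" | "assistant" | "system"; content: string };

function buildChatRequest(modelId: string, messages: ChatMessage[]) {
  // OpenAI-compatible body: no engine parameters, just model + messages.
  return {
    model: modelId,
    messages,
    stream: true,
  };
}

const body = buildChatRequest("llama3-8b-instruct", [
  { role: "user", content: "Hello!" },
]);
// The UI would then POST this, e.g.:
// fetch("/v1/chat/completions", { method: "POST", body: JSON.stringify(body) })
```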

This approach also helps scale the model hub: clients can retrieve the latest supported model list from the backend, which can be updated dynamically.
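A minimal sketch of the server-side settings resolution, assuming a registry keyed by model ID that stands in for reading model.json files from disk (the config shape below is an assumption, not Jan's exact schema):

```typescript
// Assumed model.json-like shape: load settings plus default inference
// parameters, all resolved on the backend.
type ModelConfig = {
  id: string;
  settings: { ctx_len?: number; ngl?: number };
  parameters: { temperature?: number; max_tokens?: number };
};

// Hypothetical in-memory registry standing in for model.json files on disk.
const modelHub: Record<string, ModelConfig> = {
  "llama3-8b-instruct": {
    id: "llama3-8b-instruct",
    settings: { ctx_len: 4096, ngl: 33 },
    parameters: { temperature: 0.7, max_tokens: 2048 },
  },
};

// The client never sends engine parameters; the backend fills them in here.
function resolveLoadSettings(modelId: string): ModelConfig {
  const config = modelHub[modelId];
  if (!config) throw new Error(`Unknown model: ${modelId}`);
  return config;
}
```

Because the registry lives on the backend, publishing a new model is just a data update; no client release is required.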

```mermaid
graph LR
UI[UI Components]-->|chat/completion| Backend
Backend -->|retrieve| Model[model.json]
Model-->|settings| Load[Model Loader]
Load-->|inference| Inference[Inference Engines]
Inference -->|Response| UI
```
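The flow above could take roughly this shape on the backend; every function name here is a stand-in for illustration, not Jan's actual code:

```typescript
type Msg = { role: string; content: string };

// Stand-in for reading model.json for the requested model.
async function retrieveModelConfig(id: string) {
  return { id, settings: { ctx_len: 4096 } };
}

// Stand-in for the Model Loader applying the resolved settings.
async function loadModel(_config: { id: string }): Promise<void> {}

// Stand-in for dispatching to an inference engine (llama.cpp, TensorRT-LLM).
async function runInference(id: string, messages: Msg[]) {
  return { model: id, choices: [{ message: { role: "assistant", content: "…" } }] };
}

// chat/completion handler: retrieve -> load -> inference -> response.
async function handleChatCompletion(req: { model: string; messages: Msg[] }) {
  const config = await retrieveModelConfig(req.model);
  await loadModel(config);
  return runInference(config.id, req.messages);
}
```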