kserve / modelmesh

Distributed Model Serving Framework

Is it possible to use a different container for each individual model?

smolendawid opened this issue

I'm trying to understand the project as it seems so promising. I'd like to ask for some clarification.

Let me start with the main idea:

A 'model' is rarely just the model's weights, contrary to what we see in many tutorials. In practice, the model is the whole pipeline from the API request, through data imputation and feature extraction, to the classifier/regressor prediction. In the Python world it is often inconvenient to dump such a model to joblib/ONNX formats.
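For concreteness, here is a minimal sketch (my illustration, not taken from any tutorial) of what such a "model" typically looks like: a scikit-learn Pipeline whose custom preprocessing class, with matching package versions, must be importable wherever the joblib artifact is later loaded. The class name is illustrative.

```python
# A minimal sketch of the point above: the serialized "model" is a whole
# pipeline, and loading it later requires the same custom classes and
# compatible package versions, not just the fitted weights.
import joblib
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline


class ImputeAndExtract(BaseEstimator, TransformerMixin):
    """Illustrative custom step: data imputation + feature extraction."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        # Domain-specific imputation / feature engineering would go here.
        return X


model = Pipeline([
    ("features", ImputeAndExtract()),
    ("clf", LogisticRegression()),
])

# joblib.dump(model, "model.joblib")
# Loading "model.joblib" elsewhere requires this module to be importable and a
# compatible scikit-learn version, which is exactly what makes the
# "model == weights file" assumption inconvenient.
```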

And the question is:
Is it possible to use a different container for each individual model?

The idea of the puller and the charts suggest to me that it's not possible: the only difference between the models is the downloaded artifacts (model weights by default). That would be too big a restriction, both in terms of versioning and of individual models'/clients' requirements. New models are trained with different versions of the underlying packages, which in practice requires a different container. Otherwise, each time a new model is trained on some package version and added, all the remaining models would have to be retrained on the new environment version to ensure quality.

On the other hand

modelmesh-runtime-adapter - the containers which run in each model serving pod and act as an intermediary between ModelMesh and third-party model-server containers.

suggests that the third-party model-server containers are what I am looking for, provided a different version can be used for each model.

Hi @smolendawid, if I'm interpreting your question correctly, model-mesh might not be the best fit for what you're looking for. It's geared more towards model-as-data cases than model-as-code, and towards "dense" deployment of multiple models per process. In some cases the model runtime/server may start multiple processes within the same container to aid request parallelism - applicable, for example, to Python-based servers like Seldon's MLServer. But that isn't for the purpose of sandboxing different dependencies.
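To make the "model-as-data" pattern concrete, here is a rough sketch of a single runtime serving many artifacts, assuming MLServer's custom-runtime interface (an MLModel subclass with async load/predict). The runtime class name is illustrative, and codec helper names may differ between MLServer versions.

```python
# Rough sketch of a "model-as-data" runtime: one container image, many model
# artifacts. ModelMesh (via the puller/adapter) makes each artifact available
# locally; the runtime only knows how to deserialize and run it.
import joblib
from mlserver import MLModel
from mlserver.codecs import NumpyCodec
from mlserver.types import InferenceRequest, InferenceResponse
from mlserver.utils import get_model_uri


class FraudPipelineRuntime(MLModel):
    async def load(self) -> bool:
        # The artifact location is resolved from the model settings; the same
        # image can load any number of such artifacts.
        model_uri = await get_model_uri(self._settings)
        self._pipeline = joblib.load(model_uri)
        self.ready = True
        return self.ready

    async def predict(self, payload: InferenceRequest) -> InferenceResponse:
        features = NumpyCodec.decode_input(payload.inputs[0])
        predictions = self._pipeline.predict(features)
        return InferenceResponse(
            model_name=self.name,
            outputs=[NumpyCodec.encode_output("predictions", predictions)],
        )
```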

You may be interested in looking into Ray / Ray Serve and its runtime environments feature, which is intended to address this kind of requirement.
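As a rough sketch of the runtime-environments idea: each Ray Serve deployment can declare its own pip dependencies, so models built against different package versions can coexist in one cluster. The package versions, model path, and class name below are illustrative only.

```python
# Illustrative Ray Serve deployment with its own dependency set. Another
# deployment in the same cluster could pin, say, a newer scikit-learn or
# a PyTorch-based model without affecting this one.
from ray import serve


@serve.deployment(
    ray_actor_options={"runtime_env": {"pip": ["scikit-learn==1.0.2", "joblib"]}}
)
class FraudDetectorV02:
    def __init__(self):
        import joblib  # imported inside the environment that pins these versions

        self._model = joblib.load("/models/fraud_v0.2.joblib")  # hypothetical path

    async def __call__(self, request):
        payload = await request.json()
        score = float(self._model.predict_proba([payload["features"]])[0, 1])
        return {"fraud_score": score}


# Deployed with e.g.: serve.run(FraudDetectorV02.bind(), route_prefix="/fraud-v02")
```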

I think most real-world applications use different preprocessing steps (sometimes called inference graphs, for example). Let me ask a few more questions to confirm what I understood, @njhill.

Let's imagine we want to host three different models: an image recognizer, a text classifier, and a fraud detector. Assume all of them are PyTorch NNs, but the image recognizer also needs to run OpenCV code in front of it and the text classifier uses a custom tokenizer. Is it possible to serve all of them with ModelMesh? Or is only the NN forward pass possible, so that adding preprocessing steps like an OpenCV filter makes it impossible to use ModelMesh?
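To make the question concrete, here is a rough sketch of the kind of per-model pipeline I mean for the image-recognizer case: an OpenCV preprocessing step fused with a PyTorch forward pass in a single predict call. The class name, TorchScript path, and preprocessing choices are illustrative.

```python
# Rough sketch: the "model" is OpenCV preprocessing plus a PyTorch forward
# pass, not a weights file alone.
import cv2
import numpy as np
import torch


class ImageRecognizer:
    def __init__(self, weights_path: str):
        self._net = torch.jit.load(weights_path)  # e.g. an exported TorchScript file
        self._net.eval()

    def predict(self, image_bgr: np.ndarray) -> np.ndarray:
        # Preprocessing that lives outside the weights file.
        resized = cv2.resize(image_bgr, (224, 224))
        denoised = cv2.GaussianBlur(resized, (3, 3), 0)
        tensor = (
            torch.from_numpy(denoised).permute(2, 0, 1).float().unsqueeze(0) / 255.0
        )
        with torch.no_grad():
            return self._net(tensor).numpy()
```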

Another scenario: a company has one model type, say an sklearn fraud-detection (FD) system, and supplies the technology to 500 clients. Each client can train the FD model on its own data, and the company hosts the models. Each client trains the model at a different time; some retrain often, others don't. In the meantime, the company changes the algorithm, for example from LogisticRegression to a PyTorch NN. If one client trained the model in January on version 0.2 and another trains it in June on version 0.3, both models, 0.2 and 0.3, should be running. Is this possible?