awslabs / multi-model-server

Multi Model Server is a tool for serving neural net models for inference

Preloading models on Sagemaker multi-model endpoint doesn't work

sassarini-marco opened this issue · comments

Hi,

I'm trying to load some models at SageMaker endpoint server startup so they are already available when prediction requests arrive, skipping the loading phase on the first request.

I've configured MMS with the following parameters, according to the MMS documentation:

  • model_store = '/'
  • default_workers_per_model = 1
  • preload_model = 'true'
  • load_models = .. # the local path inside the container where I store the model
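For reference, the settings above would be a `config.properties` along these lines (a sketch of my setup; the actual model path is redacted, and `/path/to/model` is a placeholder):

```properties
# MMS config.properties (sketch)
model_store=/
default_workers_per_model=1
preload_model=true
load_models=/path/to/model
```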

The model is a decompressed tar.gz archive generated by the SageMaker training process, plus a MAR-INF/MANIFEST.json file containing the model_name information.
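For context, the manifest I added looks roughly like this (a minimal sketch; field values are placeholders, and I'm assuming the usual model-archiver layout with the model name under the `model` object):

```json
{
  "runtime": "python",
  "model": {
    "modelName": "my-model",
    "handler": "service:handle"
  },
  "specificationVersion": "1.0"
}
```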

From the CloudWatch logs I can see that the model is loaded correctly on a worker thread, which then stops immediately after a scale-down call.

Some screenshots of the logs follow.
The configuration:
[screenshot: MMS configuration]

The load and scale-down:
[screenshot: model load and worker scale-down logs]

I don't see errors in the logs: what's going on? Is it a bug?

Best regards.