kamalkraj / stable-diffusion-tritonserver

Deploy the Stable Diffusion model with ONNX/TensorRT + Triton Server


failed to load 'stable_diffusion' version 1:

whatsondoc opened this issue · comments

Hello,

I'm following the instructions to deploy this project, and Triton is unable to load the stable_diffusion model.

This is seen in the Triton Server logs printed to stdout:

1028 08:21:03.012132 581 pb_stub.cc:309] Failed to initialize Python stub: AttributeError: 'LMSDiscreteScheduler' object has no attribute 'set_format'

At:
  /models/stable_diffusion/1/model.py(58): initialize

I1028 08:21:03.465850 1 onnxruntime.cc:2606] TRITONBACKEND_ModelInstanceInitialize: encoder (GPU device 1)
E1028 08:21:03.470367 1 model_lifecycle.cc:596] failed to load 'stable_diffusion' version 1: Internal: AttributeError: 'LMSDiscreteScheduler' object has no attribute 'set_format'

At:
  /models/stable_diffusion/1/model.py(58): initialize

The specific function referenced is in model.py (line 58 is marked with an arrow below):

    def initialize(self, args: Dict[str, str]) -> None:
        """
        Initialize the tokenization process
        :param args: arguments from Triton config file
        """
        current_name: str = str(Path(args["model_repository"]).parent.absolute())
        self.device = "cpu" if args["model_instance_kind"] == "CPU" else "cuda"
        self.tokenizer = CLIPTokenizer.from_pretrained(current_name + "/stable_diffusion/1/")
        self.scheduler = LMSDiscreteScheduler(beta_start=0.00085, beta_end=0.012, beta_schedule="scaled_linear")
        self.scheduler = self.scheduler.set_format("pt")   # <-- line 58
        self.height = 512
        self.width = 512
        self.num_inference_steps = 50
        self.guidance_scale = 7.5
        self.eta = 0.0

I tried commenting this line out, so self.scheduler is only defined on the previous line; Triton Server then starts, and all models (including stable_diffusion) load successfully and are reported by Triton as online and ready.

Leaving that change in place, an error is raised (somewhat expectedly) when subsequently working through the Jupyter notebook:

InferenceServerException: Failed to process the request(s) for model instance 'stable_diffusion', message: Stub process is not healthy.

So I'm forced back to the original issue: have you seen this before, or do you have any idea of a fix?

Same issue on my side. Testing on a V100 using pip install --upgrade diffusers (0.7.2)
"failed to load 'stable_diffusion' version 1: Internal: AttributeError: 'LMSDiscreteScheduler' object has no attribute 'set_format'"
Which version of Diffusers library are you using for this demo?

EDIT: with a previous version of diffusers (0.3.0), it loads the model without error.
But then there is still the other error:
E1107 15:43:31.899870 116 python_be.cc:1818] Stub process is unhealthy and it will be restarted.

What do you think is causing this?
Thanks for your help.
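For reference, the temporary workaround described above amounts to pinning the older diffusers release inside the container (version numbers as reported in this thread):

```shell
# diffusers 0.3.x still has Scheduler.set_format(); 0.7.x does not
pip install diffusers==0.3.0
```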

Unfortunately, it is still not working when launching an inference.
The server is, however, fixed with this new diffusers version.
E1108 12:53:16.125181 95 python_be.cc:1818] Stub process is unhealthy and it will be restarted.
ftfy or spacy is not installed using BERT BasicTokenizer instead of ftfy.
What hardware are you testing on?
Thanks for the support.
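The second message in that log ("ftfy or spacy is not installed") is a separate tokenizer warning, not the stub crash; installing ftfy in the container should silence it. This is a side note, not a documented requirement of this repo:

```shell
# silences the CLIP tokenizer's "using BERT BasicTokenizer" fallback warning
pip install ftfy
```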

I'm seeing the same, I'm afraid.

The Triton container build & server launch run smoothly; however, when whizzing through the Inference notebook I get to stage #7, which produces the InferenceServerException: Failed to process the request(s) for model stable_diffusion, message: stub process is not healthy (which also appears in the Triton Server log).

For reference, I'm trying this on a DGX-2 with 16 x V100-SXM3-32GB.

Hardware tested 1080Ti

@whatsondoc @lolagiscard
Could you please share screenshot/logs ?

Server part (before inference):
[screenshot]

Inference part:
[screenshot]

Server after inference:
[screenshot]

Sure, here are a few screenshots (let me know if you'd like the full logs; it would take a bit to get them out of the environment, but it's doable).

The TritonServer logs were screenshotted after the Inference call was made.

[screenshot]
[screenshot]

Try running Docker with the command below:

docker run -it --rm --gpus device=0 -p8000:8000 -p8001:8001 -p8002:8002 --shm-size 16384m   \
-v $PWD/stable-diffusion-v1-4-onnx/models:/models tritonserver \
tritonserver --model-repository /models/

I'm already testing on one GPU only, if this is the change you want us to try.
There might be something wrong in the NVIDIA Triton Docker image itself; it might not work with some GPU architectures.

Okay.
I will test it on v100 and let you know.

@lolagiscard are you running on a V100?

Great, thanks, let us know.
Yes, I'm also on a V100; sorry, I should have said that above :)

I was able to reproduce the issue with torch version 1.13

I have pinned torch 1.12.1 in docker and fixed the issue. 666e148
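For anyone building the image manually, the fix amounts to pinning torch in the image's pip install step. This is a sketch of that step only; the actual Dockerfile change is in commit 666e148:

```shell
# torch 1.13 breaks the Triton Python-backend stub; pin 1.12.1 as in the fix
pip install torch==1.12.1
```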

All good now, indeed!
Thanks a lot :)

Nice - thanks kamalkraj! Works like a charm.

One observation: I needed to reduce this to a single GPU; when using more than one (originally I tried with 4), I get the following (screenshot attached).

As mentioned though, with a single GPU it works great, appreciate the support.

[screenshot]

I will check the multi-gpu issue.

Please checkout v2

let me know of any issues

I also encountered the same issue when using multiple GPUs with the v3 branch checked out; hope this screenshot helps in any way.
[screenshot]

@whozwhat
Multi-GPU not yet fixed
Running Docker with the command below should fix the issue for now:

docker run -it --rm --gpus device=0 -p8000:8000 -p8001:8001 -p8002:8002 --shm-size 16384m   \
-v $PWD/models:/models sd_trt bash

Thanks for the reply, this command works.
It would be awesome if the multi-GPU issue could be fixed.