[Question] How can I deploy a model from AWS S3, without downloading it from Hugging Face, via the TGI image on SageMaker?
weiZhenkun opened this issue
Checklist
- I've attached the script to reproduce the bug
- I'm using an existing DLC image listed here: https://docs.aws.amazon.com/deep-learning-containers/latest/devguide/deep-learning-containers-images.html
Concise Description:
How can I deploy a model from AWS S3, without downloading it from Hugging Face, via the TGI image on SageMaker?
DLC image/dockerfile:
763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-tgi-inference:2.0.1-tgi0.9.3-gpu-py39-cu118-ubuntu20.04
Current behavior:
HF_MODEL_ID is required, and although I have set the S3 path for model_data, the container always downloads the model files from the Hugging Face Hub when I deploy the SageMaker endpoint in AWS.
import json
from sagemaker.huggingface import HuggingFaceModel

# sagemaker config
instance_type = "ml.g5.12xlarge"
number_of_gpu = 4
health_check_timeout = 300

# Define Model and Endpoint configuration parameters
config = {
    'HF_MODEL_ID': 'OpenAssistant/oasst-sft-4-pythia-12b-epoch-3.5',
    'SM_NUM_GPUS': json.dumps(number_of_gpu),  # Number of GPUs used per replica
    'MAX_INPUT_LENGTH': json.dumps(2000),
    'MAX_TOTAL_TOKENS': json.dumps(2048),
}

# create HuggingFaceModel with the image uri
# role and llm_image (the TGI DLC image URI listed above) are defined earlier in the notebook
llm_model = HuggingFaceModel(
    model_data="s3://S3_PATH/oasst-sft-4-pythia-12b-epoch-3.5.tar.gz",
    role=role,
    image_uri=llm_image,
    env=config,
)

llm = llm_model.deploy(
    endpoint_name="oasst-sft-4-pythia-12b-epoch-35-12x",
    initial_instance_count=1,
    instance_type=instance_type,
    container_startup_health_check_timeout=health_check_timeout,  # 5 minutes for the container to load the model
)
Expected behavior:
I want the endpoint to use the model files from AWS S3 without downloading them from the Hugging Face Hub.
This is likely an issue with the TGI repo itself as opposed to something within the container. I noticed you opened a similar issue: huggingface/text-generation-inference#887. Could you please follow up with that issue for this topic?
Thanks, I got the answer.
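For future readers, a commonly suggested workaround (a minimal sketch, not verified against this exact DLC version) is to keep model_data pointing at the S3 tarball and set HF_MODEL_ID to the path where SageMaker unpacks the artifacts inside the container, /opt/ml/model, so TGI loads the local weights instead of pulling them from the Hub:

import json
from sagemaker.huggingface import HuggingFaceModel

# Assumption: the S3 tarball contains the full model (config, tokenizer, and
# weight files) and SageMaker unpacks it to /opt/ml/model inside the container.
config = {
    'HF_MODEL_ID': '/opt/ml/model',    # point TGI at the local, S3-provided model instead of a Hub repo id
    'SM_NUM_GPUS': json.dumps(4),
    'MAX_INPUT_LENGTH': json.dumps(2000),
    'MAX_TOTAL_TOKENS': json.dumps(2048),
}

llm_model = HuggingFaceModel(
    model_data="s3://S3_PATH/oasst-sft-4-pythia-12b-epoch-3.5.tar.gz",  # S3 path kept from the question
    role=role,            # SageMaker execution role, defined as in the original snippet
    image_uri=llm_image,  # TGI DLC image URI, defined as in the original snippet
    env=config,
)

llm = llm_model.deploy(
    endpoint_name="oasst-sft-4-pythia-12b-epoch-35-12x",
    initial_instance_count=1,
    instance_type="ml.g5.12xlarge",
    container_startup_health_check_timeout=300,
)

With this setup the model is staged from S3 by SageMaker itself, so the endpoint does not need outbound access to the Hugging Face Hub.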