[Question] How can I deploy a model from AWS S3, without downloading it from Hugging Face, via the TGI image on SageMaker?
weiZhenkun opened this issue
Checklist
- I've attached the script to reproduce the bug
- I'm using an existing DLC image listed here: https://docs.aws.amazon.com/deep-learning-containers/latest/devguide/deep-learning-containers-images.html
Concise Description:
How can I deploy a model from AWS S3, without downloading it from Hugging Face, via the TGI image on SageMaker?
DLC image/dockerfile:
763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-tgi-inference:2.0.1-tgi0.9.3-gpu-py39-cu118-ubuntu20.04
Current behavior:
HF_MODEL_ID is required, and although I have set the S3 path for model_data, the container always downloads the model files from the Hugging Face Hub when I deploy the SageMaker endpoint in AWS.
import json
from sagemaker.huggingface import HuggingFaceModel

# sagemaker config
instance_type = "ml.g5.12xlarge"
number_of_gpu = 4
health_check_timeout = 300

# Define Model and Endpoint configuration parameters
config = {
    'HF_MODEL_ID': 'OpenAssistant/oasst-sft-4-pythia-12b-epoch-3.5',
    'SM_NUM_GPUS': json.dumps(number_of_gpu),  # Number of GPUs used per replica
    'MAX_INPUT_LENGTH': json.dumps(2000),
    'MAX_TOTAL_TOKENS': json.dumps(2048),
}

# create HuggingFaceModel with the image uri
# role and llm_image (the TGI DLC image URI listed above) are defined earlier in the notebook
llm_model = HuggingFaceModel(
    model_data="s3://S3_PATH/oasst-sft-4-pythia-12b-epoch-3.5.tar.gz",
    role=role,
    image_uri=llm_image,
    env=config,
)

llm = llm_model.deploy(
    endpoint_name="oasst-sft-4-pythia-12b-epoch-35-12x",
    initial_instance_count=1,
    instance_type=instance_type,
    container_startup_health_check_timeout=health_check_timeout,  # 5 minutes for the container to load the model
)
Expected behavior:
I want the endpoint to use the model files from AWS S3 without downloading them from the Hugging Face Hub.
This is likely an issue with the TGI repo itself as opposed to something within the container. I noticed you opened a similar issue: huggingface/text-generation-inference#887. Could you please follow up with that issue for this topic?
Thanks, I got the answer.
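For future readers, a commonly suggested workaround (a minimal sketch, not verified against this exact DLC version) is to keep model_data pointing at the S3 tarball and set HF_MODEL_ID to the path where SageMaker unpacks the artifacts inside the container, /opt/ml/model, so TGI loads the local weights instead of pulling them from the Hub:

import json
from sagemaker.huggingface import HuggingFaceModel

# Assumption: the S3 tarball contains the full model (config, tokenizer, and
# weight files) and SageMaker unpacks it to /opt/ml/model inside the container.
config = {
    'HF_MODEL_ID': '/opt/ml/model',    # point TGI at the local, S3-provided model instead of a Hub repo id
    'SM_NUM_GPUS': json.dumps(4),
    'MAX_INPUT_LENGTH': json.dumps(2000),
    'MAX_TOTAL_TOKENS': json.dumps(2048),
}

llm_model = HuggingFaceModel(
    model_data="s3://S3_PATH/oasst-sft-4-pythia-12b-epoch-3.5.tar.gz",  # S3 path kept from the question
    role=role,            # SageMaker execution role, defined as in the original snippet
    image_uri=llm_image,  # TGI DLC image URI, defined as in the original snippet
    env=config,
)

llm = llm_model.deploy(
    endpoint_name="oasst-sft-4-pythia-12b-epoch-35-12x",
    initial_instance_count=1,
    instance_type="ml.g5.12xlarge",
    container_startup_health_check_timeout=300,
)

With this setup the model is staged from S3 by SageMaker itself, so the endpoint does not need outbound access to the Hugging Face Hub.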