[bug] huggingface-pytorch-inference container cannot initialize on AWS SageMaker
garrett-mesalabs opened this issue
Checklist
- I've prepended issue tag with type of change: [bug]
- [ ] (If applicable) I've attached the script to reproduce the bug
- (If applicable) I've documented below the DLC image/dockerfile this relates to
- (If applicable) I've documented below the tests I've run on the DLC image
- I'm using an existing DLC image listed here: https://docs.aws.amazon.com/deep-learning-containers/latest/devguide/deep-learning-containers-images.html
- I've built my own container based off DLC (and I've attached the code used to build my own image)
Concise Description:
The image
763104351884.dkr.ecr.us-west-1.amazonaws.com/huggingface-pytorch-inference:1.13.1-transformers4.26.0-cpu-py39-ubuntu20.04
fails to start on AWS SageMaker. The traceback below comes from `/usr/local/bin/deep_learning_container.py`: botocore's vendored copy of requests/urllib3 runs `from collections import Mapping, MutableMapping`, aliases that were removed in Python 3.10. Note that the paths in the traceback show `/opt/conda/lib/python3.10`, so the image actually ships Python 3.10 despite its `py39` tag:
```
Traceback (most recent call last):
  File "/usr/local/bin/deep_learning_container.py", line 22, in <module>
    import botocore.session
  File "/opt/conda/lib/python3.10/site-packages/botocore/session.py", line 25, in <module>
    import botocore.configloader
  File "/opt/conda/lib/python3.10/site-packages/botocore/configloader.py", line 19, in <module>
    from botocore.compat import six
  File "/opt/conda/lib/python3.10/site-packages/botocore/compat.py", line 25, in <module>
    from botocore.exceptions import MD5UnavailableError
  File "/opt/conda/lib/python3.10/site-packages/botocore/exceptions.py", line 15, in <module>
    from botocore.vendored.requests.exceptions import ConnectionError
  File "/opt/conda/lib/python3.10/site-packages/botocore/vendored/requests/__init__.py", line 58, in <module>
    from . import utils
  File "/opt/conda/lib/python3.10/site-packages/botocore/vendored/requests/utils.py", line 26, in <module>
    from .compat import parse_http_list as _parse_list_header
  File "/opt/conda/lib/python3.10/site-packages/botocore/vendored/requests/compat.py", line 7, in <module>
    from .packages import chardet
  File "/opt/conda/lib/python3.10/site-packages/botocore/vendored/requests/packages/__init__.py", line 3, in <module>
    from . import urllib3
  File "/opt/conda/lib/python3.10/site-packages/botocore/vendored/requests/packages/urllib3/__init__.py", line 10, in <module>
    from .connectionpool import (
  File "/opt/conda/lib/python3.10/site-packages/botocore/vendored/requests/packages/urllib3/connectionpool.py", line 38, in <module>
    from .response import HTTPResponse
  File "/opt/conda/lib/python3.10/site-packages/botocore/vendored/requests/packages/urllib3/response.py", line 9, in <module>
    from ._collections import HTTPHeaderDict
  File "/opt/conda/lib/python3.10/site-packages/botocore/vendored/requests/packages/urllib3/_collections.py", line 1, in <module>
    from collections import Mapping, MutableMapping
```
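For reference, the failing import in the last frame can be reproduced in isolation. This is a minimal sketch using only the standard library; it falls back to `collections.abc`, where `Mapping` has lived since Python 3.3:

```python
# The final frame of the traceback boils down to this import. On Python >= 3.10
# it raises ImportError, because the collections.Mapping alias (deprecated
# since 3.3) was removed.
try:
    from collections import Mapping  # what botocore's vendored urllib3 does
except ImportError:
    from collections.abc import Mapping  # the portable spelling

import collections.abc

# On every Python version, the name we end up with is the real ABC:
# on < 3.10 collections.Mapping was merely an alias for collections.abc.Mapping.
assert Mapping is collections.abc.Mapping
print("Mapping resolved:", Mapping)
```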
DLC image/dockerfile:
763104351884.dkr.ecr.us-west-1.amazonaws.com/huggingface-pytorch-inference:1.13.1-transformers4.26.0-cpu-py39-ubuntu20.04
Current behavior:
The container crashes when starting up.
Expected behavior:
The container starts successfully.
Additional context:
Here is the CloudFormation resource for the hosted model:
```yaml
LlamaSageMakerModel:
  Type: AWS::SageMaker::Model
  Properties:
    PrimaryContainer:
      Image: '763104351884.dkr.ecr.us-west-1.amazonaws.com/huggingface-pytorch-inference:1.13.1-transformers4.26.0-cpu-py39-ubuntu20.04'
      Mode: SingleModel
      ModelDataUrl: !Sub s3://mybucket-ml-models/nsql-llama-2-7B.tar.gz
      Environment:
        HF_TASK: text-generation
    ExecutionRoleArn: !GetAtt LlamaExecutionRole.Arn
    ModelName: llama-7b-sql-model
```
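As a possible interim workaround (untested sketch; it assumes a newer botocore release no longer routes its vendored requests package through the urllib3 module that performs the removed import), one could derive a custom image that upgrades botocore, push it to a private ECR repository, and point `Image:` at that copy:

```dockerfile
# Untested workaround sketch: rebuild on top of the broken DLC image and
# upgrade botocore so /usr/local/bin/deep_learning_container.py can import it
# under the Python 3.10 the image actually ships.
FROM 763104351884.dkr.ecr.us-west-1.amazonaws.com/huggingface-pytorch-inference:1.13.1-transformers4.26.0-cpu-py39-ubuntu20.04

RUN pip install --no-cache-dir --upgrade botocore
```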