aws / deep-learning-containers

AWS Deep Learning Containers are pre-built Docker images that make it easier to run popular deep learning frameworks and tools on AWS.

Home Page: https://docs.aws.amazon.com/deep-learning-containers/latest/devguide/what-is-dlc.html


[bug] huggingface-pytorch-inference container cannot initialize on AWS SageMaker

Opened by garrett-mesalabs

Concise Description:

The image 763104351884.dkr.ecr.us-west-1.amazonaws.com/huggingface-pytorch-inference:1.13.1-transformers4.26.0-cpu-py39-ubuntu20.04 fails to start on AWS SageMaker with the following error:

Traceback (most recent call last):
  File "/usr/local/bin/deep_learning_container.py", line 22, in <module>
    import botocore.session
  File "/opt/conda/lib/python3.10/site-packages/botocore/session.py", line 25, in <module>
    import botocore.configloader
  File "/opt/conda/lib/python3.10/site-packages/botocore/configloader.py", line 19, in <module>
    from botocore.compat import six
  File "/opt/conda/lib/python3.10/site-packages/botocore/compat.py", line 25, in <module>
    from botocore.exceptions import MD5UnavailableError
  File "/opt/conda/lib/python3.10/site-packages/botocore/exceptions.py", line 15, in <module>
    from botocore.vendored.requests.exceptions import ConnectionError
  File "/opt/conda/lib/python3.10/site-packages/botocore/vendored/requests/__init__.py", line 58, in <module>
    from . import utils
  File "/opt/conda/lib/python3.10/site-packages/botocore/vendored/requests/utils.py", line 26, in <module>
    from .compat import parse_http_list as _parse_list_header
  File "/opt/conda/lib/python3.10/site-packages/botocore/vendored/requests/compat.py", line 7, in <module>
    from .packages import chardet
  File "/opt/conda/lib/python3.10/site-packages/botocore/vendored/requests/packages/__init__.py", line 3, in <module>
    from . import urllib3
  File "/opt/conda/lib/python3.10/site-packages/botocore/vendored/requests/packages/urllib3/__init__.py", line 10, in <module>
    from .connectionpool import (
  File "/opt/conda/lib/python3.10/site-packages/botocore/vendored/requests/packages/urllib3/connectionpool.py", line 38, in <module>
    from .response import HTTPResponse
  File "/opt/conda/lib/python3.10/site-packages/botocore/vendored/requests/packages/urllib3/response.py", line 9, in <module>
    from ._collections import HTTPHeaderDict
  File "/opt/conda/lib/python3.10/site-packages/botocore/vendored/requests/packages/urllib3/_collections.py", line 1, in <module>
    from collections import Mapping, MutableMapping
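The traceback is cut off before the final exception line, but the last frame points at the likely root cause: Python 3.10 removed the abstract base classes from `collections` (deprecated since 3.3 in favor of `collections.abc`), so the urllib3 copy vendored inside this botocore fails at import time. Note also that the image tag advertises `py39` while every path in the traceback shows `/opt/conda/lib/python3.10/`, i.e. the conda environment in the image actually ships Python 3.10. A minimal sketch of the incompatibility:

```python
# Sketch of the suspected root cause: the exact import used by botocore's
# vendored urllib3 (_collections.py line 1) raises ImportError on Python >= 3.10.
import sys

try:
    from collections import Mapping, MutableMapping  # fails on Python 3.10+
    source = "collections"
except ImportError:
    # Forward-compatible location, available since Python 3.3:
    from collections.abc import Mapping, MutableMapping
    source = "collections.abc"

print(f"Python {sys.version_info.major}.{sys.version_info.minor}: "
      f"Mapping imported from {source}")
```

On a Python 3.9 interpreter (what the `py39` tag promises) the first import still works with a DeprecationWarning; on the 3.10 interpreter actually present in the container it raises, which matches the crash above.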

DLC image/dockerfile:

763104351884.dkr.ecr.us-west-1.amazonaws.com/huggingface-pytorch-inference:1.13.1-transformers4.26.0-cpu-py39-ubuntu20.04

Current behavior:

The container crashes when starting up.

Expected behavior:

The container starts successfully.

Additional context:

Here is the CloudFormation resource for the hosted model:

  LlamaSageMakerModel:
    Type: AWS::SageMaker::Model
    Properties:
      PrimaryContainer:
        Image: '763104351884.dkr.ecr.us-west-1.amazonaws.com/huggingface-pytorch-inference:1.13.1-transformers4.26.0-cpu-py39-ubuntu20.04'
        Mode: SingleModel
        ModelDataUrl: !Sub s3://mybucket-ml-models/nsql-llama-2-7B.tar.gz
        Environment:
          HF_TASK: text-generation
      ExecutionRoleArn: !GetAtt LlamaExecutionRole.Arn
      ModelName: llama-7b-sql-model
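For anyone reproducing this outside CloudFormation, the same model definition can be expressed as a `CreateModel` request for the SageMaker API. This is only a sketch mirroring the template above; the execution-role ARN is a placeholder, since the real one comes from `LlamaExecutionRole`:

```python
# Sketch: the CloudFormation resource above expressed as a boto3 CreateModel
# request. The role ARN is a placeholder; the image URI, bucket, and
# environment mirror the template.
create_model_request = {
    "ModelName": "llama-7b-sql-model",
    "PrimaryContainer": {
        "Image": (
            "763104351884.dkr.ecr.us-west-1.amazonaws.com/"
            "huggingface-pytorch-inference:"
            "1.13.1-transformers4.26.0-cpu-py39-ubuntu20.04"
        ),
        "Mode": "SingleModel",
        "ModelDataUrl": "s3://mybucket-ml-models/nsql-llama-2-7B.tar.gz",
        "Environment": {"HF_TASK": "text-generation"},
    },
    # Placeholder account/role; substitute the real execution role ARN.
    "ExecutionRoleArn": "arn:aws:iam::123456789012:role/LlamaExecutionRole",
}

# To actually create the model (requires AWS credentials):
#   import boto3
#   boto3.client("sagemaker", region_name="us-west-1").create_model(**create_model_request)
print(create_model_request["PrimaryContainer"]["Image"])
```

Deploying this request hits the same startup crash, which rules out the CloudFormation layer as the cause.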