aws / sagemaker-mxnet-training-toolkit

Toolkit for running MXNet training scripts on SageMaker. Dockerfiles used for building SageMaker MXNet Containers are at https://github.com/aws/deep-learning-containers.

Running MXNet 1.1.0 container

mklissa opened this issue

Hi there,
When I run this command:

"docker build -t preprod-mxnet:1.1.0-cpu-py2 --build-arg py_version=2 --build-arg framework_installable=mxnet-1.1.0-py2.py3-none-manylinux1_x86_64.whl -f Dockerfile.cpu ."

I get the following error:

COPY failed: stat /var/lib/docker/tmp/docker-builder295259424/mxnet-1.1.0-py2.py3-none-manylinux1_x86_64.whl: no such file or directory

It seems that a file is missing, or that I am missing a step that puts the file in the right place?

Thanks

Yeah, the documentation is missing an explanation here. You need to download the wheel file (mxnet-1.1.0-py2.py3-none-manylinux1_x86_64.whl) into your current directory from here: https://pypi.org/project/mxnet/1.1.0/#files

It's set up that way so that the image can easily be built with different MXNet binaries. However, we should change it so that it downloads that version for you by default. I'll see if I can get to that soon.
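As a rough sketch of that step: the COPY error means the wheel was not in the directory passed to docker build. The helper below is hypothetical (the name have_build_input is not part of the toolkit) and only checks that a file is present before building. The pip command in the comment is one possible way to fetch the wheel instead of downloading it by hand, assuming a Linux x86_64 host where pip selects the manylinux1 wheel.

```shell
#!/bin/sh
# Hypothetical helper (not part of the toolkit): report whether a file that
# the Dockerfile's COPY step needs is present in the build context.
# Returns 0 if the file exists in the given directory, 1 otherwise.
have_build_input() {
    # $1 = build context directory, $2 = file name expected by COPY
    if [ -f "$1/$2" ]; then
        return 0
    fi
    echo "missing build input: $1/$2" >&2
    return 1
}

# Example usage (not run here), from the directory that holds Dockerfile.cpu:
#   have_build_input . mxnet-1.1.0-py2.py3-none-manylinux1_x86_64.whl || pip download mxnet==1.1.0 --no-deps --dest .
```

Running a check like this before docker build turns the COPY failure into an earlier and more readable message.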

Hi there, thank you for your answer. The build now gets further, but stops at this step:

Step 11/15 : COPY $framework_support_installable .
COPY failed: stat /var/lib/docker/tmp/docker-builder114960697/sagemaker_mxnet_container-1.0.0.tar.gz: no such file or directory

I am not sure what the MXNet container package is or where I should get it.
Thanks

Hi,

The sagemaker_mxnet_container-1.0.0.tar.gz file can be produced by running

python setup.py sdist

The file will then be in dist/

This is documented in the README too:
https://github.com/aws/sagemaker-mxnet-container/blob/master/README.rst
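A minimal sketch of the remaining step, under an assumption: the error above shows COPY looking for the bare file name, which suggests the tarball has to sit in the docker build context rather than in dist/. The helper name stage_sdist is hypothetical, and the README remains the authority on the exact layout.

```shell
#!/bin/sh
# Hypothetical helper (not part of the toolkit): copy a built sdist from
# dist/ into the directory used as the docker build context.
stage_sdist() {
    # $1 = directory containing dist/, $2 = tarball name, $3 = build context directory
    src="$1/dist/$2"
    if [ ! -f "$src" ]; then
        echo "not found: $src (run 'python setup.py sdist' in $1 first)" >&2
        return 1
    fi
    cp "$src" "$3/"
}

# Example usage (not run here); the build context path is a placeholder:
#   python setup.py sdist
#   stage_sdist . sagemaker_mxnet_container-1.0.0.tar.gz /path/to/build/context
```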