MNIST Pytorch tutorial docker image not available for ARM (aka M1 macs)

Question

EMCP opened this issue 7 months ago · comments

Is your feature request related to a problem? Please describe.

I'm trying to run the tutorial on my local laptop, but I get an error when referencing this image:

ghcr.io/scaleoutsystems/fedn/fedn:master-mnist-pytorch

Error response from daemon: image with reference ghcr.io/scaleoutsystems/fedn/fedn:master-mnist-pytorch was found but does not match the specified platform: wanted linux/arm64/v8, actual: linux/amd64

Describe the solution you'd like

Add an ARM64/v8 target to the CI/CD build process of the demos
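
As a rough illustration of what such a build step could look like, here is a minimal sketch using docker buildx, which can build one tag for several platforms. This is not the project's actual pipeline: the image reference below is a hypothetical placeholder, and the script only prints the command (a dry run) so it can be inspected first.

```shell
#!/bin/sh
# Hypothetical image reference; substitute the tag your pipeline publishes.
IMAGE="example.registry/fedn-demo:mnist-pytorch"
PLATFORMS="linux/amd64,linux/arm64"

# Compose the multi-platform build command as a string.
# $1 = comma-separated platform list, $2 = image tag.
buildx_command() {
  echo "docker buildx build --platform $1 -t $2 --push ."
}

# Dry run: print the command rather than executing it.
buildx_command "$PLATFORMS" "$IMAGE"
```

Running the printed command for real assumes a buildx builder that supports both platforms, for example through QEMU emulation or native ARM runners.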

How would the solution positively affect the functionality?

Users working on Macs with Apple Silicon can easily run the demo

Describe any drawbacks (if any)

Might add some complexity to the CI/CD process

Contact Details
ping for email

Fredrik Wrede · Answer 1 · Sat Mar 02 2024 00:29:48 GMT+0800 (China Standard Time)

Hi, thank you for reporting the issue!
If you add --platform linux/amd64 to the docker command, does the error persist? In other words:

docker run --platform linux/amd64 \
  -v $PWD/client.yaml:/app/client.yaml \
  -v $PWD/data/clients/1:/var/data \
  -e ENTRYPOINT_OPTS=--data_path=/var/data/mnist.pt \
  ghcr.io/scaleoutsystems/fedn/fedn:master-mnist-pytorch run client -in client.yaml --name client1
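
If the same command has to work on both x86-64 and ARM hosts, one option is to add the flag only where it is needed. The sketch below is an illustration rather than part of the tutorial: platform_flag is a hypothetical helper that maps the output of uname -m to the extra flag, and the script prints the resulting command instead of running it.

```shell
#!/bin/sh
IMAGE="ghcr.io/scaleoutsystems/fedn/fedn:master-mnist-pytorch"

# Print the extra docker flag needed on ARM hosts, nothing on other hosts.
# $1 = machine name as reported by `uname -m`.
platform_flag() {
  case "$1" in
    arm64|aarch64) echo "--platform linux/amd64" ;;
    *)             echo "" ;;
  esac
}

flag=$(platform_flag "$(uname -m)")

# Dry run: prints the command. Remove the leading `echo` to execute it.
# $flag is intentionally unquoted so it splits into two words or vanishes.
echo docker run $flag \
  -v "$PWD/client.yaml:/app/client.yaml" \
  -v "$PWD/data/clients/1:/var/data" \
  -e ENTRYPOINT_OPTS=--data_path=/var/data/mnist.pt \
  "$IMAGE" run client -in client.yaml --name client1
```

On an ARM host the amd64 image then runs under emulation, which is usually slower than a native image.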

Erik · Answer 2 · Mon Mar 04 2024 16:59:18 GMT+0800 (China Standard Time)

Ah, thank you.

I had tried that in PyCharm, but in the build options, whereas it has to be in the "run" options to get a bit further in the build.

I retried with just docker run --platform linux/amd64 to start, and it seems to get past that error in its original place, but then I get stuck on the next step:

Step 9/12 : RUN scripts/create_client.sh --experimenturl ${EXPERIMENT_URL} --networkid ${NETWORK_ID} --token ${TOKEN}
 ---> [Warning] The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested
 ---> Running in a1234
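
The warning says that no platform was requested for the build itself: a --platform flag given to docker run does not carry over to docker build. A minimal sketch of one way to request it for both is shown here, assuming the Docker CLI's DOCKER_DEFAULT_PLATFORM environment variable. The tag fedn-client is a hypothetical name, the build arguments are the ones used in Step 9/12, and this may not resolve whatever stalls the step afterwards.

```shell
#!/bin/sh
# Default platform for docker commands that accept --platform (pull, build, run).
export DOCKER_DEFAULT_PLATFORM=linux/amd64

# "fedn-client" is a hypothetical tag. The build args are read from the
# calling environment and default to empty strings if unset.
# Dry run: prints the command. Remove the leading `echo` to execute it.
echo docker build \
  --platform "$DOCKER_DEFAULT_PLATFORM" \
  --build-arg EXPERIMENT_URL="${EXPERIMENT_URL:-}" \
  --build-arg NETWORK_ID="${NETWORK_ID:-}" \
  --build-arg TOKEN="${TOKEN:-}" \
  -t fedn-client .
```

Passing --platform explicitly is redundant once the variable is exported, but it keeps the intent visible in the command.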

I am assuming there are no plans for ARM-based builds of the examples yet, as this is still a niche?

EDIT: Instead of all the runtime options you pasted, I think the supplied script sets this up for us in the Dockerfile.

Dockerfile:

FROM ghcr.io/scaleoutsystems/fedn/fedn:master-mnist-pytorch

ARG EXPERIMENT_URL
ARG NETWORK_ID
ARG TOKEN

WORKDIR /app

RUN apt-get update -y && \
    apt-get upgrade -y && \
    apt-get install -y curl sudo wget vim && \
    mkdir scripts 

COPY scripts/create_client.sh scripts/create_client.sh 

RUN chmod +x scripts/create_client.sh
RUN scripts/create_client.sh --experimenturl ${EXPERIMENT_URL} --networkid ${NETWORK_ID} --token ${TOKEN}

COPY fedn_env/mydata/data /var/data

ENV ENTRYPOINT_OPTS=--data_path=/var/data

CMD ["run", "client", "--secure=True", "--force-ssl", "-in", "client.yaml"]

I think I might switch to a Windows-based machine and retry.

Fredrik Wrede · Answer 3 · Mon Mar 04 2024 19:19:21 GMT+0800 (China Standard Time)

I do not recognize the bash script "scripts/create_client.sh" nor the Dockerfile above, so I assume this is a modified version of the example?

Erik · Answer 4 · Mon Mar 04 2024 19:39:25 GMT+0800 (China Standard Time)

> I do not recognize the bash script "scripts/create_client.sh" nor the Dockerfile above, so I assume this is a modified version of the example?

Correct. We can see a bit more when running this same modified sample on an x64 box via Azure App Services. I've reached out to the source of this customization to check whether they've gotten an experiment up and ready to initiate training, but I was thinking you must have other customers on Apple Silicon dev environments.

I admit this is my first Apple Silicon Docker issue, so if I'm just doing something stupidly wrong I will be glad to close. It seems the script supplied by this other party is not able to pivot to using the Rosetta translator on Apple Silicon, but that's not a blocker as we've got x64 boxes everywhere too.

Addi Ait-Mlouk · Answer 5 · Fri Mar 08 2024 00:22:06 GMT+0800 (China Standard Time)

Hello,
If you are using a Mac with Apple Silicon (M1/M2/Max), you could solve this error by adding this line to your docker-compose file:
platform: linux/amd64
Example:
version: '3.1'
services:
  web:
    image: image
    platform: linux/amd64 # this line is for macOS M1
    depends_on:
      - mydb
    ports:

Erik · Answer 6 · Sun Mar 24 2024 22:06:57 GMT+0800 (China Standard Time)

I'm going to close this. I am building the training env from scratch due to the sensitivity of the data/setup, and the Mac is mostly a corner case and not worth fiddling with yet.

When I tried to modify the build with the platform flag nothing changed, but we will simply proceed with a "from-scratch" Python baseline image if/when needed.