dask / dask-docker

Docker images for dask

Home Page: https://hub.docker.com/u/daskdev

Migrate away from Docker Hub (sort of)

jacobtomlinson opened this issue · comments

Over the last couple of years Docker have been slowly adding limitations to Docker Hub.

I am not concerned about automated builds as we build with GitHub Actions. The org member limit is a little frustrating as @jrbourbeau and I recently struggled with this. The pull rate limits may be a frustration to some of our users, particularly those using container clusters.

None of these are huge issues, but it shows a general trend that Docker Hub is becoming less and less friendly toward free and OSS communities.

I propose that we start the process of migrating away by pushing our images to the GitHub container registry in addition to Docker Hub and update our documentation to reference the GitHub location as the default.

Docker images follow the naming convention <registry>/<org>/<image>:<tag>, and if the registry is omitted it is assumed to be Docker Hub (docker.io).

So daskdev/dask:latest would become ghcr.io/dask/dask:latest, daskdev/dask-notebook:latest would become ghcr.io/dask/dask-notebook:latest, etc.
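To illustrate the renaming described above, here is a minimal Python sketch that maps a Docker Hub reference in the daskdev org to its proposed GHCR equivalent. The function name `to_ghcr` and the constants are hypothetical and exist only for this example; the `ghcr.io/dask` prefix comes from the proposal itself, and treating `docker.io/` and `index.docker.io/` as explicit Docker Hub prefixes is an assumption of the sketch.

```python
# Hypothetical helper: not part of dask-docker, for illustration only.
DOCKER_HUB_ORG = "daskdev"     # current Docker Hub org
GHCR_PREFIX = "ghcr.io/dask"   # proposed GHCR location from this issue


def to_ghcr(reference: str) -> str:
    """Map daskdev/<image>[:<tag>] to ghcr.io/dask/<image>[:<tag>]."""
    # Drop an explicit Docker Hub registry host if one was given.
    for host in ("docker.io/", "index.docker.io/"):
        if reference.startswith(host):
            reference = reference[len(host):]
            break

    # Split the org from the rest of the reference (image name and optional tag).
    org, sep, rest = reference.partition("/")
    if not sep or org != DOCKER_HUB_ORG:
        raise ValueError(f"not a {DOCKER_HUB_ORG} image: {reference!r}")

    return f"{GHCR_PREFIX}/{rest}"


if __name__ == "__main__":
    for ref in ("daskdev/dask:latest", "daskdev/dask-notebook:latest"):
        print(ref, "->", to_ghcr(ref))
```

Anything outside the daskdev org raises a `ValueError` rather than being rewritten, so a script using this would not silently change unrelated image names.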

We can continue pushing images to Docker Hub (maybe indefinitely) to support existing users, but this move would allow us to easily step away if the trend continues. It also moves more things inside GitHub which simplifies things for maintainers.

Thoughts?

Do you know what the storage and data transfer limits are for the GitHub container registry / the dask org?

My understanding is that GitHub Packages is free for public repos. Storage and data transfer limits only apply to private repos.

GitHub Packages usage is free for public packages. For private packages, each GitHub account receives a certain amount of free storage and data transfer, depending on the product used with the account. Any usage beyond the included amounts is controlled by spending limits.

https://docs.github.com/en/billing/managing-billing-for-github-packages/about-billing-for-github-packages

Thanks for raising this issue @jacobtomlinson. Do we know of other OSS projects which have started to migrate to GitHub Packages? If so, I'm wondering what their experience was like.

I know that Linux Server moved all of their packages over a while back. I'm not sure of any examples in the scipy/pydata communities.

FYI in the JupyterHub org, we have opted to work against quay.io recently. I don't have a clear overview of the motivations for steering towards quay.io, but I recall the idea that Red Hat is likely to remain a reliable provider of the service going forward.

Some traces of related discussions within the JupyterHub org:

I'd personally be comfortable using different or multiple container registries to suit the needs of individual repos, rather than having a single registry for the entire organization and all projects. I do think it would make sense to register an organization on quay.io, for example, before someone else does, if you haven't already done so, no matter what you end up using.

I propose that we start the process of migrating away by pushing our images to the GitHub container registry in addition to Docker Hub and update our documentation to reference the GitHub location as the default.

I think that sounds great! I think going for something will be better than investing more time in exploring this decision.

@manics do you know if there is a key reason the Dask developers should choose, for example, quay.io over ghcr.io? Was there a big reason we chose to steer towards quay.io over ghcr.io that you recall?

GHCR only became generally available in June so most likely no-one had experience with it at the time.

This conversation has been raised again in dask/dask-gateway#484 as part of the latest Dask Gateway release. I am concerned that Dask Gateway is intending to move images to another platform without a decision being made here.

Currently dask-docker pushes images to the daskdev org on DockerHub and dask-gateway pushes images to the daskgateway org, which is already unpleasant. But it is important to recognise that our users have already built on top of these and we should try and avoid breakages. I want to ensure we are working towards uniting all our images in one place instead of fracturing it further.

I want to propose the following:

  • Dask Gateway releases images to GHCR as planned in dask/dask-gateway#484 (I don't want to derail things there)
  • We migrate daskdev/dask and daskdev/dask-notebook to GHCR also.
  • We update all documentation references to mention GHCR.
  • We continue pushing images to Docker Hub for the time being but somehow include a deprecation warning.
  • At some point in the future we push images to Docker Hub that fail with an error saying images have been moved to GHCR.
  • We take ownership of the Quay dask org but park it and do not use it, to avoid further fracturing of where we publish images.

Given the lack of interest in this topic I am keen to reach a consensus between myself and @consideRatio and move forwards.

Thanks @jacobtomlinson for working on this!

We continue pushing images to Docker Hub for the time being [...]

Wherever we have credentials, yes! Creating new credentials is tricky though and I gave up on it in the JupyterHub org.

If you get credentials set up for GitHub's CI system in dask/dask-gateway, I'll make us push to Docker Hub there as well, but otherwise I suggest we drop it directly for dask/dask-gateway. Overall, dropping it there isn't problematic in my mind. Any old Helm chart will keep functioning and so will the new Helm charts; users won't notice. Our packaged Helm charts, published as .tgz files and referenced via an index.yaml file in the gh-pages branch of dask/helm-charts, include a fixed reference to an image that won't change.

[...] but somehow include a deprecation warning.

The best idea I have is to update the README of the image repo on Docker Hub. I suspect it will only be read by someone who has hit a hard error and gone investigating what is going on, but that is better than having no indication of the change.

Wherever we have credentials, yes! Creating new credentials is tricky though and I gave up on it in the JupyterHub org.

For dask-docker we use my access token in a secret to push. If I can be added to the daskgateway org on Docker Hub then we could use that same token. Alternatively I did create a daskbot user on Docker Hub in the past to use for this, but in order to add it to the daskdev org we would need to remove a bunch of folks as there is a limit of 3 members.

Overall, dropping it there isn't problematic in my mind.

Yeah I'm much less concerned about dask-gateway for the reasons you mention, and also the fact that few folks will be building images that base off the Dask Gateway ones.

The best idea I have is to update the README of the image repo on Docker Hub. I suspect it will only be read by someone who has hit a hard error and gone investigating what is going on, but that is better than having no indication of the change.

Yeah I think a README message would be useful. Perhaps we can also update the entrypoint.sh script in the Dask and Notebook images to print a warning. Then when we do fully stop updating images on Docker Hub we could push one last image which just runs a shell script that prints an error and exits.
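The two stages suggested here (warn first, fail later) could look roughly like the sketch below. It is written in Python only to show the logic; the real images use a shell entrypoint, and the function names, message wording, and `removed` flag are all hypothetical. The `ghcr.io/dask` prefix is the location proposed earlier in this thread.

```python
import sys
from typing import Tuple

GHCR_PREFIX = "ghcr.io/dask"  # assumed new location, taken from the proposal


def build_notice(image: str, removed: bool = False) -> Tuple[str, int]:
    """Return (message, exit_code) for the deprecation or removal notice."""
    new_ref = f"{GHCR_PREFIX}/{image}"
    if removed:
        # Final stage: the last image pushed to Docker Hub only reports an error.
        return (f"ERROR: this image has moved. Use {new_ref} instead.", 1)
    # Transitional stage: warn, but let the container start as usual.
    return (f"WARNING: Docker Hub images are deprecated. Please switch to {new_ref}.", 0)


def emit_notice(image: str, removed: bool = False, stream=sys.stderr) -> int:
    """Print the notice to the given stream and return the exit code to use."""
    message, exit_code = build_notice(image, removed)
    print(message, file=stream)
    return exit_code


if __name__ == "__main__":
    # During the transition the entrypoint would continue on to the user's command;
    # in the final stage it would exit with the non-zero code instead.
    code = emit_notice("dask", removed=False)
    if code != 0:
        sys.exit(code)
```

Writing the notice to stderr keeps it out of any output that users pipe from the container, and the non-zero exit code in the final stage makes the failure visible to orchestration tools.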