Support GCP auth for private PyPI repositories
tuulos opened this issue · comments
Quoting a Slack discussion:
Hello all, we are running a Docker container for our dev environment. I can open a shell in the container and use pip to install packages from our private repository, but as soon as we try to run flows that require packages from that same private repo, it fails with a 401 error. It seems that Metaflow builds the conda command, but the pip keyring configuration does not get passed through from metaflow-->conda-->pip. Any help on how to configure either Metaflow or conda to pass it through would be appreciated!
GCP has the concept of Application Default Credentials, kind of like a "personal service account key" that you can use to authenticate to the different services from your machine.
In this case, I am using it as described here, in the part that talks about the Python keyring and ADC. I am passing the GOOGLE_APPLICATION_CREDENTIALS environment variable to the container and mounting the JSON "key" file as a volume.
From there, using pip in the container "just works", but it seems that metaflow/conda/pip doesn't follow that same flow.
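As a rough way to check the setup described above from inside the container, the sketch below verifies that GOOGLE_APPLICATION_CREDENTIALS is set and points at a readable JSON file. The helper name check_adc_file is hypothetical and is not part of Metaflow or any Google library; it only inspects the local file and does not contact GCP.

```python
import json
import os


def check_adc_file(environ=None):
    """Return (ok, message) describing the GOOGLE_APPLICATION_CREDENTIALS setup.

    Hypothetical helper for illustration; it only reads the local file.
    """
    environ = os.environ if environ is None else environ
    path = environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        return False, "GOOGLE_APPLICATION_CREDENTIALS is not set"
    if not os.path.isfile(path):
        # Typical cause in a container: the volume mount is missing.
        return False, f"credentials file not found: {path}"
    try:
        with open(path, encoding="utf-8") as fh:
            data = json.load(fh)
    except (OSError, ValueError) as exc:
        return False, f"could not read credentials file: {exc}"
    if not isinstance(data, dict):
        return False, "credentials file is not a JSON object"
    return True, f"credentials file found (type: {data.get('type', 'unknown')})"


if __name__ == "__main__":
    ok, message = check_adc_file()
    print("OK" if ok else "PROBLEM", "-", message)
```

Running this in the container shell and again inside the environment that Metaflow builds may help narrow down where the credentials stop being visible.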
Hello, original requestor from the slack discussion, asking for a follow-up on this please!
The fix is probably to add the keyring and keyrings.google-artifactregistry-auth packages in the pypi_base decorator, but I will wait for @martinhausio to confirm...
Reason: pip only uses the credentials referenced by the GOOGLE_APPLICATION_CREDENTIALS environment variable if the keyrings.google-artifactregistry-auth package is installed as a keyring backend...
@savingoyal and @saikonen -- I have tried the above in a fresh new local conda / mamba environment. It turns out that indeed this is the issue. (Will still wait for @martinhausio to confirm though).
But in case this resolves it, this is not really a metaflow issue then...
If we need to automate this, we probably need to inject keyrings.google-artifactregistry-auth as a dependency that gets installed automatically somewhere.
Do you guys have any further thoughts?
Great work looking into this.
One place we could consider adding the dependency is in metaflow_config.py#get_pinned_conda_libs, but there is a consideration to be made with this:
The packages returned by get_pinned_conda_libs depend on the configured datastore. Can we reliably say that only users with a GS datastore will be requiring the keyring package? If so, then adding to the list should be straightforward.
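The trade-off can be sketched as follows. This is not the actual body of get_pinned_conda_libs: the function name pinned_libs_sketch, its single argument, the "gs" datastore key, and the unpinned version specifiers are all assumptions made for illustration. It only shows what tying the keyring packages to a GS datastore would look like.

```python
def pinned_libs_sketch(datastore_type):
    """Return a dict of extra package pins for the given datastore type.

    Hypothetical stand-in for illustration; the real function lives in
    metaflow_config.py, and its signature and pins may differ.
    """
    pins = {}
    if datastore_type == "gs":
        # Assumption: only GS-datastore users pull from Artifact Registry.
        # An empty specifier is used here to mean "any version".
        pins["keyring"] = ""
        pins["keyrings.google-artifactregistry-auth"] = ""
    return pins
```

Under this scheme, a user on a different datastore who still pulls from a private Artifact Registry index would get no keyring packages, which is exactly the consideration raised above.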
After a lot of debugging over Slack with both @madhur-ob and @romain-intel, we found the root cause of the problem.
The first issue was that the flow code itself didn't include the required keyring and keyrings.google-artifactregistry-auth packages, as pointed out by @madhur-ob.
When trying to add those packages, we found out that the builder environment also needs to contain them in order to resolve the packages required by the flow.
Since pip doesn't support index prioritization, the builder environment must be able to authenticate to the private repo; otherwise it cannot resolve any package, and therefore cannot install anything, whenever the configuration includes a private repository.
Romain is looking at a solution to make this configurable.
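Until that is configurable, one way to confirm whether a given environment (for example the builder environment) has the two packages installed is a small check like the one below. It is only a diagnostic sketch: the function name missing_distributions is hypothetical, and it uses the standard-library importlib.metadata to look up installed distributions by name rather than anything Metaflow-specific.

```python
from importlib import metadata

# Distribution names mentioned in this thread.
REQUIRED = ("keyring", "keyrings.google-artifactregistry-auth")


def missing_distributions(names=REQUIRED):
    """Return the distribution names that are not installed in this interpreter."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_distributions()
    if missing:
        print("Missing in this environment:", ", ".join(missing))
    else:
        print("keyring packages are installed in this environment")
```

It has to be run with the Python interpreter of the environment being checked, since the result reflects only that interpreter's installed packages.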