spotify / gcs-tools

GCS support for avro-tools, parquet-tools and protobuf

All latest tools fail to authenticate to GCS

shnapz opened this issue

STR:

1a. Install the latest release of all tools (v0.2.2, Aug 29)
1b. Or build the latest master into parquet-cli-1.12.3.jar, proto-tools-3.21.1.jar, avro-tools-1.11.0.jar, magnolify-tools-0.4.8.jar

2. Run each of them with a basic read command such as <TOOL> tojson <GCS_PATH>

Actual:
The tool launches a browser that shows a page (screenshot taken 2022-08-29) with the message:

The version of the app you're using doesn't include the latest security features to keep you protected. Please make sure to download from a trusted source and update to the latest, most secure version.

Expected:
The tool reads the file according to spec

The auth is failing because Google deprecated support for the OAuth out-of-band (OOB) flow:
https://developers.googleblog.com/2022/02/making-oauth-flows-safer.html#dates-oob

Although Oct 3, 2022 is the date on which the flow is deprecated for existing clients, auth is already failing for gcs-tools

Solution: move to a new OAuth flow by upgrading to "com.google.cloud.bigdataoss" % "gcs-connector" % "hadoop3-2.2.7" and changing the Hadoop connector configuration

The "User credentials" type of auth (a real user via SSO) is not supported in the 2.2.X releases yet, although it is in the master branch:
https://github.com/GoogleCloudDataproc/hadoop-connectors
configuration of master: https://github.com/GoogleCloudDataproc/hadoop-connectors/blob/master/gcs/CONFIGURATION.md
configuration of 2.2.X: https://github.com/GoogleCloudDataproc/hadoop-connectors/blob/branch-2.2.x/gcs/CONFIGURATION.md

Possible resolutions:

  • wait until they release OAuth support?
  • open a ticket asking for the fix, and get an estimate from the project maintainers as a result?

A workaround using application default credentials is https://github.com/spotify/gcs-tools/compare/main...jwiklund:gcs-tools:update?expand=1 (I don't know enough about other use cases to tell if this makes sense, but it works "on my machine").

Thanks @jwiklund
I assume you are using a service account when accessing GCS?

    <property>
      <name>fs.gs.auth.service.account.enable</name>
      <value>true</value>
      <description>Force OAuth2 flow</description>
    </property>

You set:

export GOOGLE_APPLICATION_CREDENTIALS=$HOME/.config/gcloud/application_default_credentials.json

Did you specifically modify your default credentials?

No, I've only done gcloud auth application-default login, but there is some code in the connector that doesn't check the default location unless that environment variable is present (it times out checking for the metadata service instead). If possible, it might be useful to check whether that file exists and only then enable fs.gs.auth.service.account.enable (or, I guess, implement our own provider that uses the standard Google auth lookup library).

Actually, I don't have to use that env var on my other machine; something is different here, so it picks the credentials up directly.

OK, now it happened again, so "sometimes" you have to include the GOOGLE_APPLICATION_CREDENTIALS environment variable, or else it will fall back to the metadata lookup (which fails on non-GCE/GKE machines). I'm not sure what triggers the different behavior.
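
For anyone hitting the same "sometimes" behavior, here is a minimal shell sketch of the "check if the file exists first" idea from the comments above. It is not part of gcs-tools or the connector: the function names default_adc_path and ensure_adc_env are hypothetical, and the path is simply the one used in the export line earlier in this thread.

```shell
#!/bin/sh
# Sketch only: set GOOGLE_APPLICATION_CREDENTIALS from the gcloud default
# credentials file when, and only when, that file exists.
# Function names are hypothetical and not part of gcs-tools.

# Prints the default credentials path used in this thread.
default_adc_path() {
  printf '%s\n' "$HOME/.config/gcloud/application_default_credentials.json"
}

# Returns 0 if GOOGLE_APPLICATION_CREDENTIALS is set after the call, 1 otherwise.
ensure_adc_env() {
  # Respect a value the user has already exported.
  if [ -n "${GOOGLE_APPLICATION_CREDENTIALS:-}" ]; then
    return 0
  fi
  adc_file=$(default_adc_path)
  # Only export when the file written by
  # "gcloud auth application-default login" is actually present.
  if [ -f "$adc_file" ]; then
    export GOOGLE_APPLICATION_CREDENTIALS="$adc_file"
    return 0
  fi
  return 1
}
```

One possible use is to call ensure_adc_env in a wrapper script before running <TOOL> tojson <GCS_PATH>, and to print a hint to run gcloud auth application-default login when it returns non-zero, rather than letting the connector time out on the metadata lookup.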

User credentials auth will be supported in the 3.0.0 release, in 2023 Q1
cc: @RustedBones

FYI, they just closed the ticket and committed to a fix in 3.0, but 3.0 is not out yet and we have no estimate

Fixed in 0.3.0