piskvorky / smart_open

Utils for streaming large files (S3, HDFS, gzip, bz2...)

GCS permission denied 'storage.buckets.get' when using 'open'

baptistejub opened this issue.

Hello,

After updating to 6.3.0 (from 6.2.0), we get a permission-denied error for storage.buckets.get when uploading a file to GCS.

It is raised at https://github.com/RaRe-Technologies/smart_open/blob/v6.3.0/smart_open/gcs.py#L156, where the writer checks whether the bucket exists.
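
Why this turns into a hard failure rather than a "bucket not found" result: as far as I can tell, google-cloud-storage's Bucket.exists() only maps a 404 to False and lets every other HTTP error through, so a 403 propagates out of the check. A minimal sketch of that pattern, using hypothetical stand-in exception classes and a hypothetical get_bucket_metadata callable in place of the real client:

```python
class NotFound(Exception):
    """Hypothetical stand-in for google.api_core.exceptions.NotFound (HTTP 404)."""


class Forbidden(Exception):
    """Hypothetical stand-in for google.api_core.exceptions.Forbidden (HTTP 403)."""


def bucket_exists(get_bucket_metadata, bucket_id: str) -> bool:
    """Sketch of exists()-style semantics: only a 404 means "no such bucket".

    Any other error (e.g. Forbidden) propagates to the caller unchanged.
    """
    try:
        # Hypothetical callable; the real client sends GET /storage/v1/b/<bucket_id>,
        # which needs storage.buckets.get on the bucket.
        get_bucket_metadata(bucket_id)
    except NotFound:
        return False
    return True
```

So a writer that runs this check before uploading fails for a service account that may create objects but cannot read bucket metadata.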

from google.cloud import storage
from smart_open import open

client = storage.Client()
# In 6.3.0 this open() call already fails with 403 Forbidden, before any data is written
file = open('gs://my-bucket/my_file.ext', 'wb', transport_params=dict(client=client))

File "/home/app/.local/lib/python3.10/site-packages/smart_open/smart_open_lib.py", line 224, in open
  binary = _open_binary_stream(uri, binary_mode, transport_params)
File "/home/app/.local/lib/python3.10/site-packages/smart_open/smart_open_lib.py", line 400, in _open_binary_stream
  fobj = submodule.open_uri(uri, mode, transport_params)
File "/home/app/.local/lib/python3.10/site-packages/smart_open/gcs.py", line 47, in open_uri
  return open(parsed_uri['bucket_id'], parsed_uri['blob_id'], mode, **kwargs)
File "/home/app/.local/lib/python3.10/site-packages/smart_open/gcs.py", line 101, in open
  _blob = Writer(bucket=bucket_id,
File "/home/app/.local/lib/python3.10/site-packages/smart_open/gcs.py", line 156, in Writer
  if not g_bucket.exists():
File "/home/app/.local/lib/python3.10/site-packages/google/cloud/storage/bucket.py", line 900, in exists
  client._get_resource(
File "/home/app/.local/lib/python3.10/site-packages/google/cloud/storage/client.py", line 372, in _get_resource
  return self._connection.api_request(
File "/home/app/.local/lib/python3.10/site-packages/google/cloud/storage/_http.py", line 72, in api_request
  return call()
File "/home/app/.local/lib/python3.10/site-packages/google/api_core/retry.py", line 349, in retry_wrapped_func
  return retry_target(
File "/home/app/.local/lib/python3.10/site-packages/google/api_core/retry.py", line 191, in retry_target
  return target()
File "/home/app/.local/lib/python3.10/site-packages/google/cloud/_http/__init__.py", line 494, in api_request
  raise exceptions.from_http_response(response)
google.api_core.exceptions.Forbidden: 403 GET https://storage.googleapis.com/storage/v1/b/my-bucket?fields=name&prettyPrint=false: my-sa@my-org.iam.gserviceaccount.com does not have storage.buckets.get access to the Google Cloud Storage bucket. Permission 'storage.buckets.get' denied on resource (or it may not exist).
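
Note that the 403 is for the bucket itself, not for the object being written. As a sketch of how the URI from the example maps onto that request (parse_gs_uri and bucket_metadata_url are hypothetical helpers; the dict keys mirror parsed_uri['bucket_id'] and parsed_uri['blob_id'] in the traceback, the URL format is copied from the error message, and the parsing assumes the object name contains no '?' or '#'):

```python
from urllib.parse import urlsplit


def parse_gs_uri(uri: str) -> dict:
    """Split gs://<bucket>/<blob> into its parts (sketch, not smart_open's parser)."""
    parts = urlsplit(uri)
    if parts.scheme != "gs":
        raise ValueError(f"not a gs:// URI: {uri!r}")
    # netloc is the bucket; the path (minus its leading slash) is the object name.
    return {"scheme": "gs", "bucket_id": parts.netloc, "blob_id": parts.path.lstrip("/")}


def bucket_metadata_url(bucket_id: str) -> str:
    # The bucket-level GET that failed above; it only asks for the bucket's name field.
    return f"https://storage.googleapis.com/storage/v1/b/{bucket_id}?fields=name&prettyPrint=false"
```

Only bucket_id is involved in the failing request; the object name never reaches it.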

The requirement for this permission was removed with #516 but reintroduced by #744.

We've updated our service account (SA) permissions for now, but it would be nice to avoid needing this permission, for the same reasons given in #516.

I could submit a PR to remove the check altogether, if that's fine (uploading to a nonexistent bucket should already raise a NotFound error).
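
Removing the check amounts to opening the blob for writing without any bucket-level request. A minimal sketch, assuming client behaves like google.cloud.storage.Client (Client.bucket() and Bucket.blob() build local handles without an API call, and Blob.open('wb') returns a writer); open_gcs_writer is a hypothetical name, not smart_open's actual code:

```python
def open_gcs_writer(client, bucket_id: str, blob_id: str, **blob_open_kwargs):
    """Open a binary writer for gs://<bucket_id>/<blob_id> with no existence pre-check.

    Assumes a google.cloud.storage.Client-like `client`. Nothing here calls
    Bucket.exists(), so storage.buckets.get is never needed; a missing bucket
    should instead surface as NotFound once the upload request is sent.
    """
    bucket = client.bucket(bucket_id)  # local handle only, no HTTP request
    blob = bucket.blob(blob_id)        # local handle only, no HTTP request
    return blob.open("wb", **blob_open_kwargs)
```

The trade-off is that a typo in the bucket name is reported at upload time rather than at open time.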

you're right! PR is up ^ cc @mpenkov

Argh, we need to sync up, @ddelange: I just made a new bugfix release a few minutes ago :)