piskvorky / smart_open

Utils for streaming large files (S3, HDFS, gzip, bz2...)

GCP retry mechanism

cometta opened this issue · comments

commented

I'm using version 6. Is there an example of how to enable retries on a flaky internet connection when reading a stream from a GCP bucket?

There is no existing example code I'm aware of that would help in your case.

@petedannemann may be able to help here.

smart-open's GCS reader uses google.cloud.storage.Blob under the hood. When reading, the network call is made by the Blob's download_as_bytes method, which is called in smart-open's code here. We do not specify a retry in smart-open, so the default one provided by download_as_bytes is used; you can read more about it in the first link provided. If that is not sufficient for your use case, please submit a PR to allow the user to provide a retry to the download_as_bytes calls used in smart-open.
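For illustration, here is a minimal sketch of what passing a retry to download_as_bytes looks like when calling google-cloud-storage directly, outside smart-open. The helper names `download_range` and `example`, the 120-second deadline, and the byte range are assumptions made up for this sketch; they are not part of smart-open.

```python
def download_range(blob, start, end, retry):
    """Download bytes start..end (end is inclusive) with an explicit retry policy.

    `blob` is expected to be a google.cloud.storage.Blob.
    """
    return blob.download_as_bytes(start=start, end=end, retry=retry)


def example(bucket_name, blob_name):
    # Hypothetical usage; requires google-cloud-storage and valid credentials.
    # Imports are local so download_range can be used without them.
    from google.cloud import storage
    from google.cloud.storage.retry import DEFAULT_RETRY

    # Start from the library default and extend the overall deadline (assumed value).
    retry = DEFAULT_RETRY.with_deadline(120.0)
    blob = storage.Client().bucket(bucket_name).blob(blob_name)
    return download_range(blob, 0, 1023, retry)
```

This only covers a direct Blob download; it does not change how smart-open itself reads.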

commented

Thank you for responding, @petedannemann and @mpenkov. Got it: a retry object needs to be passed to download_as_bytes to enable retries.

With #744 this is now supported by passing in a Retry object or ConditionalRetryPolicy as a blob_open_kwarg with a key of retry. You can read more about this here. I think we can close this issue now.
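A rough sketch of how that might look from user code, assuming the option is passed through smart-open's `transport_params` under a `blob_open_kwargs` key (please check the linked documentation for the exact name in your version). The helper names `build_transport_params` and `read_with_retry` and the 120-second deadline are invented for this example.

```python
def build_transport_params(retry):
    """Wrap a retry policy in the transport_params layout assumed above."""
    return {"blob_open_kwargs": {"retry": retry}}


def read_with_retry(uri, deadline_seconds=120.0):
    # Hypothetical usage; requires smart_open[gcs] and valid GCP credentials.
    # Imports are local so build_transport_params works without them.
    from google.cloud.storage.retry import DEFAULT_RETRY
    from smart_open import open as smart_open_open

    # Extend the default retry policy with a longer overall deadline (assumed value).
    retry = DEFAULT_RETRY.with_deadline(deadline_seconds)
    params = build_transport_params(retry)
    with smart_open_open(uri, "rb", transport_params=params) as fin:
        return fin.read()
```

A call such as `read_with_retry("gs://my-bucket/my-object")` would then read the whole object with the customized retry; the bucket and object names are placeholders.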