http_400 Invalid argument when gcs_list_objects() returns exactly 1000 rows

lisovyk opened this issue · comments

Today my Shiny app started returning the following error when executing gcs_list_objects('my-bucket'):

> gcs_list_objects('my-images')
ℹ 2023-07-31 17:40:24 > Request Status Code:  400
Error in `abort_http()`:
! http_400 Invalid argument.
Run `rlang::last_trace()` to see where the error occurred.
> rlang::last_trace()
Error in `abort_http()`:
! http_400 Invalid argument.
 1. └─googleCloudStorageR::gcs_list_objects("my-images")
 2.   └─googleAuthR::gar_api_page(...)
 3.     └─googleAuthR (local) f(pars_arguments = l)
 4.       └─googleAuthR:::doHttrRequest(...)
 5.         └─googleAuthR:::retryRequest(...)
 6.           └─googleAuthR:::abort_http(status_code, error)

I'm using the latest version. Please help me debug it. Is it a problem on my side, or has the Google API changed? I have not found any information about API changes.

Further debugging led me to think that the problem is with pagination: it arose when the bucket reached 1000 entries, so pagination started to matter.

The page_f parameter in gar_api_page() is set to page_f = function(x) attr(x, "nextPageToken"). Renaming nextPageToken to anything else removes the error, but then pagination does not work: it returns only 1000 entries.
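To make the boundary case concrete, here is a minimal base R sketch of token-driven paging. It is an illustration only, not the googleAuthR implementation: fake_list_page() and list_all() are hypothetical helpers, and the in-memory "bucket" is an assumption standing in for the real API. It shows that a loop which stops when nextPageToken is NULL handles a listing of exactly page_size objects in a single request.

```r
# Hypothetical stand-in for a paged listing endpoint (not a real API call).
# Returns up to `page_size` names plus a token pointing at the next page.
fake_list_page <- function(all_names, page_token = NULL, page_size = 1000L) {
  start <- if (is.null(page_token)) 1L else as.integer(page_token)
  if (start > length(all_names)) {
    return(list(items = character(0), nextPageToken = NULL))
  }
  end <- min(start + page_size - 1L, length(all_names))
  # Assumption: a token is only handed out when objects remain after this page.
  next_token <- if (end < length(all_names)) as.character(end + 1L) else NULL
  list(items = all_names[start:end], nextPageToken = next_token)
}

# Keep requesting pages until the response carries no nextPageToken.
list_all <- function(all_names, page_size = 1000L) {
  out <- character(0)
  token <- NULL
  repeat {
    page <- fake_list_page(all_names, token, page_size)
    out <- c(out, page$items)
    token <- page$nextPageToken
    if (is.null(token)) break
  }
  out
}

objs <- sprintf("img-%04d.png", seq_len(1000))
length(list_all(objs))  # 1000, fetched in a single page
```

If a real server behaved differently at this boundary (for example, by returning a token after a full final page), the follow-up request is where a failure would surface, which fits the symptom described above but is not confirmed by anything in this thread.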

I added another item to the bucket by hand (not programmatically), so it now has 1001 entries, and the problem disappeared! I guess now I'm waiting until we get 2000 items in a bucket :)

Weird that it started to go wrong; I will check whether the API response has changed.

@MarkEdmondson1234 hey, have you had the time to look into it?

I reached 1000 entries in another bucket, and here adding an entry by hand does not solve the problem.
I get the same error, but I can still get "some" results by passing a delimiter parameter:

> dim(gcs_list_objects('my-images', delimiter = ""))
ℹ 2023-08-22 07:54:16 > Request Status Code:  400
Error in `abort_http()`:
! http_400 Invalid argument.
Run `rlang::last_trace()` to see where the error occurred.
> dim(gcs_list_objects('my-images', delimiter = "a"))
[1] 830   3

Can I see your sessionInfo()?

Sorry for the late reply. I had removed the gcs_list_objects()-related functionality from the app, so I reverted the code to the commit where the issue was present, but I cannot reproduce it now: the function works as intended for me.

Here is the session info in any case. The same issue was present on the Ubuntu 18 server that runs ShinyProxy with the app.

> sessionInfo()
R version 4.2.1 (2022-06-23)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Ventura 13.5.1

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib

[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

This could have been there for a while but only shown up intermittently, if it happens exactly when the number of results equals the page size.

Will have a look through here to see if anything has changed recently

This looks different:

Returns results in a directory-like mode, with / being a common value for the delimiter.

    items[] contains object metadata for objects whose names do not contain delimiter, or whose names only have instances of delimiter in their prefix.
    prefixes[] contains truncated object names for objects whose names contain delimiter after any prefix. Object names are truncated beyond the first applicable instance of the delimiter, mimicking a directory. If multiple objects have the same truncated name, duplicates are omitted. Truncated object names in prefixes[] always end with /.

Must be set to / when used with the matchGlob parameter to filter results in a directory-like mode.
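The quoted delimiter behaviour can be emulated locally to see why passing a delimiter changes the number of rows returned. The following base R sketch is an assumption-laden illustration: split_by_delimiter() is a hypothetical helper, it ignores the prefix parameter, and it only mimics the items[]/prefixes[] split described above rather than calling the API.

```r
# Hypothetical helper mimicking the items[] / prefixes[] split for a
# non-empty delimiter and no prefix. Not part of googleCloudStorageR.
split_by_delimiter <- function(object_names, delimiter = "/") {
  # Position of the first delimiter in each name (-1 when absent).
  pos <- as.integer(regexpr(delimiter, object_names, fixed = TRUE))
  has_delim <- pos > 0
  list(
    # Names without the delimiter are returned as full objects.
    items = object_names[!has_delim],
    # Names with the delimiter are truncated after its first occurrence,
    # and duplicate truncated names are dropped.
    prefixes = unique(substr(
      object_names[has_delim], 1L,
      pos[has_delim] + nchar(delimiter) - 1L
    ))
  )
}

res <- split_by_delimiter(
  c("a.png", "photos/cat.png", "photos/dog.png", "docs/readme.txt")
)
res$items     # "a.png"
res$prefixes  # "photos/" "docs/"
```

Under this model, a call with delimiter = "a" would collapse every object whose name contains an "a" into a prefix, which may be why that call returned fewer than 1000 rows.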

For what it's worth, I just tested this for ropensci/targets#1172 using version 0.7.0, and gcs_list_objects() worked fine on my end even when there were exactly 1000 objects. Maybe somebody already solved this on the Google Cloud API end?