http_400 Invalid argument when gcs_list_objects() returns exactly 1000 rows
lisovyk opened this issue
Today my Shiny app started returning the following error when executing gcs_list_objects('my-bucket'):
> gcs_list_objects('my-images')
ℹ 2023-07-31 17:40:24 > Request Status Code: 400
Error in `abort_http()`:
! http_400 Invalid argument.
Run `rlang::last_trace()` to see where the error occurred.
> rlang::last_trace()
<error/http_400>
Error in `abort_http()`:
! http_400 Invalid argument.
---
Backtrace:
▆
1. └─googleCloudStorageR::gcs_list_objects("my-images")
2. └─googleAuthR::gar_api_page(...)
3. └─googleAuthR (local) f(pars_arguments = l)
4. └─googleAuthR:::doHttrRequest(...)
5. └─googleAuthR:::retryRequest(...)
6. └─googleAuthR:::abort_http(status_code, error)
I'm using the latest version; please help me debug it. Is the problem on my side, or has the Google API changed? I have not found any information about API changes.
Further debugging led me to think the problem is with pagination: it arose when the bucket reached 1000 entries, which is when pagination starts to matter.
The page_f parameter in gar_api_page() is set to page_f = function(x) attr(x, "nextPageToken"). Renaming nextPageToken to anything else removes the error, but then pagination no longer works: only 1000 entries are returned.
I added another item to the bucket by hand, so it now has 1001 entries, and the problem disappeared! I guess now I'm waiting until the bucket reaches 2000 items :)
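To make the page_f mechanism concrete, here is a minimal sketch in R of a token-driven pagination loop. paginate_all() and make_fake_fetch() are hypothetical helpers written for illustration; they are not googleAuthR's gar_api_page() internals and do not call the GCS API. The fake "API" assumes that a nextPageToken is handed out whenever a page is full, so a bucket holding exactly one full page triggers one extra, empty request; the loop stops when the token is missing or empty instead of sending it back.

```r
# Hypothetical token-driven pagination loop (not googleAuthR's gar_api_page).
# fetch_page(token) stands in for one objects.list request and must return
# list(items = <character vector>, nextPageToken = <string or NULL>).
paginate_all <- function(fetch_page, max_pages = 100L) {
  items <- character(0)
  token <- NULL
  for (i in seq_len(max_pages)) {
    page <- fetch_page(token)
    items <- c(items, page$items)
    token <- page$nextPageToken
    # Stop when the token is absent or empty instead of sending it back
    if (length(token) == 0 || !nzchar(token[[1]])) break
  }
  items
}

# Fake "API" over an in-memory vector of object names. Assumption: it hands
# out a nextPageToken whenever a page is full, even if nothing is left, so an
# exactly-full last page is followed by one extra (empty) request.
make_fake_fetch <- function(object_names, page_size = 1000L) {
  function(token) {
    start <- if (is.null(token)) 1L else as.integer(token)
    n <- length(object_names)
    items <- if (start <= n) object_names[start:min(start + page_size - 1L, n)] else character(0)
    next_token <- if (length(items) == page_size) as.character(start + page_size) else NULL
    list(items = items, nextPageToken = next_token)
  }
}

objs <- sprintf("img_%04d.png", 1:1000)
length(paginate_all(make_fake_fetch(objs)))  # [1] 1000 (two requests, the second empty)
```

The max_pages guard is a defensive choice so that a token that never clears cannot loop forever; whether the real API returns an empty token, no token, or an error at the exact page boundary is precisely what is unclear in this thread.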
Weird that it started to go wrong; I will check whether the API response has changed.
@MarkEdmondson1234 Hey, have you had time to look into it?
Another of my buckets has also reached 1000 entries, and there adding an entry by hand does not solve the problem. I get the same error, but I can still get "some" results by passing a delimiter parameter:
> dim(gcs_list_objects('my-images', delimiter = ""))
ℹ 2023-08-22 07:54:16 > Request Status Code: 400
Error in `abort_http()`:
! http_400 Invalid argument.
Run `rlang::last_trace()` to see where the error occurred.
> dim(gcs_list_objects('my-images', delimiter = "a"))
[1] 830 3
Can I see your sessionInfo()?
Sorry for the late reply. I reverted the code to the commit where the issue occurred (I have since removed the gcs_list_objects()-related functionality from the app), but I cannot reproduce it now; the function works as intended for me.
Here is the session info in any case. The same issue was also present on the Ubuntu 18 server that runs ShinyProxy with the app.
> sessionInfo()
R version 4.2.1 (2022-06-23)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Ventura 13.5.1
Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices datasets utils methods base
other attached packages:
[1] googleCloudStorageR_0.7.0 RColorBrewer_1.1-3 stringdist_0.9.8
[4] plotly_4.10.0 ggplot2_3.3.6 stringi_1.7.8
[7] stringr_1.4.1 lubridate_1.8.0 dotenv_1.0.3
[10] mongolite_2.6.2 rclipboard_0.1.6 shinyvalidate_0.1.2
[13] ssh_0.8.1 emojifont_0.5.5 data.table_1.14.2
[16] rhandsontable_0.3.8 shinyBS_0.61.1 shinyjs_2.1.0
[19] shinydashboardPlus_2.0.3 shinydashboard_0.7.2 shiny_1.7.4.1
[22] httr_1.4.6 DT_0.25
loaded via a namespace (and not attached):
[1] tidyr_1.2.1 viridisLite_0.4.1 jsonlite_1.8.7 showtext_0.9-5 assertthat_0.2.1
[6] askpass_1.1 showtextdb_3.0 renv_0.17.3 yaml_2.3.5 pillar_1.8.1
[11] glue_1.6.2 digest_0.6.33 promises_1.2.0.1 googleAuthR_2.0.1 colorspace_2.0-3
[16] htmltools_0.5.5 httpuv_1.6.11 pkgconfig_2.0.3 sysfonts_0.8.8 purrr_0.3.4
[21] xtable_1.8-4 scales_1.2.1 later_1.3.1 tibble_3.1.8 openssl_2.1.0
[26] generics_0.1.3 ellipsis_0.3.2 cachem_1.0.8 withr_2.5.0 lazyeval_0.2.2
[31] credentials_1.3.2 cli_3.6.1 proto_1.0.0 magrittr_2.0.3 mime_0.12
[36] memoise_2.0.1 fs_1.6.3 fansi_1.0.3 tools_4.2.1 gargle_1.5.2
[41] lifecycle_1.0.3 munsell_0.5.0 zip_2.2.1 compiler_4.2.1 rlang_1.1.1
[46] grid_4.2.1 rstudioapi_0.14 sys_3.4.2 htmlwidgets_1.5.4 gtable_0.3.1
[51] curl_5.0.1 R6_2.5.1 knitr_1.40 dplyr_1.0.10 fastmap_1.1.1
[56] utf8_1.2.2 parallel_4.2.1 Rcpp_1.0.11 vctrs_0.4.1 tidyselect_1.1.2
[61] xfun_0.33
This bug could have been there for a while but only show up intermittently, if it triggers only when the number of objects is exactly page_size.
I'll look through the API reference to see if anything has changed recently: https://cloud.google.com/storage/docs/json_api/v1/objects/list
This looks different:
Returns results in a directory-like mode, with / being a common value for the delimiter.
items[] contains object metadata for objects whose names do not contain delimiter, or whose names only have instances of delimiter in their prefix.
prefixes[] contains truncated object names for objects whose names contain delimiter after any prefix. Object names are truncated beyond the first applicable instance of the delimiter, mimicking a directory. If multiple objects have the same truncated name, duplicates are omitted. Truncated object names in prefixes[] always end with /.
Must be set to / when used with the matchGlob parameter to filter results in a directory-like mode.
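To make the quoted delimiter semantics concrete, here is a minimal local sketch in R of how items[] and prefixes[] would be split for a given delimiter and prefix. split_by_delimiter() is a hypothetical helper written for illustration; it does not call GCS and is not part of googleCloudStorageR. It assumes that names are truncated just after the first instance of the delimiter following the prefix, and that duplicate truncated names are dropped, as the documentation describes.

```r
# Hypothetical local mimic of objects.list delimiter semantics; it does not
# call GCS. Assumption: the name (after `prefix`) is truncated just after the
# first instance of `delimiter`, and duplicate truncated names are dropped.
split_by_delimiter <- function(object_names, delimiter = "/", prefix = "") {
  under <- object_names[startsWith(object_names, prefix)]
  rest <- substring(under, nchar(prefix) + 1L)
  # Position of the first delimiter in the part after the prefix (-1 if none)
  pos <- as.integer(regexpr(delimiter, rest, fixed = TRUE))
  has_delim <- pos > 0L
  truncated <- substr(rest[has_delim], 1L, pos[has_delim] + nchar(delimiter) - 1L)
  prefixes <- if (length(truncated)) unique(paste0(prefix, truncated)) else character(0)
  # items: objects with no delimiter after the prefix; prefixes: "directories"
  list(items = under[!has_delim], prefixes = prefixes)
}

split_by_delimiter(c("a/b.png", "a/c.png", "top.png"))
# items: "top.png"; prefixes: "a/"
```

Under these semantics, a non-"/" delimiter such as "a" moves every object whose name contains "a" out of items[] and into prefixes[], which may explain why gcs_list_objects('my-images', delimiter = "a") returned fewer rows than the full listing.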
For what it's worth, I just tested this for ropensci/targets#1172 using version 0.7.0, and gcs_list_objects() worked fine on my end even with exactly 1000 objects. Maybe somebody already fixed this on the Google Cloud API side?