cloudyr / googleCloudStorageR

Google Cloud Storage API to R

Home Page: https://code.markedmondson.me/googleCloudStorageR

http_400 Invalid argument when gcs_list_objects() returns exactly 1000 rows

lisovyk opened this issue

Today my Shiny app started returning the following error when executing gcs_list_objects('my-bucket'):

> gcs_list_objects('my-images')
ℹ 2023-07-31 17:40:24 > Request Status Code:  400
Error in `abort_http()`:
! http_400 Invalid argument.
Run `rlang::last_trace()` to see where the error occurred.
> rlang::last_trace()
<error/http_400>
Error in `abort_http()`:
! http_400 Invalid argument.
---
Backtrace:
    ▆
 1. └─googleCloudStorageR::gcs_list_objects("my-images")
 2.   └─googleAuthR::gar_api_page(...)
 3.     └─googleAuthR (local) f(pars_arguments = l)
 4.       └─googleAuthR:::doHttrRequest(...)
 5.         └─googleAuthR:::retryRequest(...)
 6.           └─googleAuthR:::abort_http(status_code, error)

I'm using the latest version. Please help me debug it. Is the problem on my side, or has the Google API changed? I have not found any information about API changes.

Further debugging led me to think that the problem is with pagination: it arose when the bucket reached 1000 entries, which is when pagination starts to matter.

The page_f parameter in gar_api_page() is set to page_f = function(x) attr(x, "nextPageToken"). Renaming nextPageToken to anything else removes the error, but then pagination does not work: only 1000 entries are returned.

I have added another item to the bucket non-programmatically, so it now has 1001 entries, and the problem disappeared! I guess now I'm waiting until we get 2000 items in the bucket :)
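To illustrate the edge case suspected above, here is a minimal, self-contained sketch of a token-based pager in R. It does not call the real API and is not googleAuthR's implementation: fetch_page() and list_all() are hypothetical helpers, and the mock only attaches a nextPageToken attribute when more results remain. The point is that a pager must stop on a missing or empty token instead of sending it back in the next request. Whether that is the actual cause of the 400 here is not confirmed.

```r
# Hypothetical mock of a paged list endpoint (not the real GCS API).
# Returns up to `page_size` names and sets a "nextPageToken" attribute
# only when more results remain after this page.
fetch_page <- function(all_items, page_token = NULL, page_size = 1000L) {
  if (length(all_items) == 0L) return(character(0))
  start <- if (is.null(page_token)) 1L else as.integer(page_token)
  end <- min(start + page_size - 1L, length(all_items))
  page <- all_items[seq.int(start, end)]
  if (end < length(all_items)) {
    attr(page, "nextPageToken") <- as.character(end + 1L)
  }
  page
}

# Hypothetical pager: keeps requesting pages until no usable token is left.
list_all <- function(all_items, page_size = 1000L) {
  out <- character(0)
  token <- NULL
  repeat {
    page <- fetch_page(all_items, token, page_size)
    out <- c(out, page)  # c() drops the attribute from the combined result
    token <- attr(page, "nextPageToken")
    # Stop on a missing or empty token rather than passing it along.
    if (is.null(token) || !nzchar(token)) break
  }
  out
}

exactly_one_page <- sprintf("img_%04d.png", 1:1000)
one_over <- sprintf("img_%04d.png", 1:1001)
length(list_all(exactly_one_page))
length(list_all(one_over))
```

With exactly 1000 names the mock returns a single full page with no token, so the loop ends after one request; with 1001 names it makes a second request for the last item.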

Weird that it started to go wrong. I will check whether the API response has changed.

@MarkEdmondson1234 hey, have you had the time to look into it?

I happened to reach 1000 entries in another bucket, and there adding an entry by hand does not solve the problem.
I get the same error, but I can still get "some" results by passing a delimiter parameter:

> dim(gcs_list_objects('my-images', delimiter = ""))
ℹ 2023-08-22 07:54:16 > Request Status Code:  400
Error in `abort_http()`:
! http_400 Invalid argument.
Run `rlang::last_trace()` to see where the error occurred.
> dim(gcs_list_objects('my-images', delimiter = "a"))
[1] 830   3

Can I see your sessionInfo()?

Sorry for the late reply. I had removed the gcs_list_objects()-related functionality from the app, so I reverted the code to the commit where the issue was present, but I cannot reproduce it now: the function works as intended for me.

Here is the session info in any case. The same issue was present on the Ubuntu 18 server that runs ShinyProxy with the app.

> sessionInfo()
R version 4.2.1 (2022-06-23)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Ventura 13.5.1

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base     

other attached packages:
 [1] googleCloudStorageR_0.7.0 RColorBrewer_1.1-3        stringdist_0.9.8         
 [4] plotly_4.10.0             ggplot2_3.3.6             stringi_1.7.8            
 [7] stringr_1.4.1             lubridate_1.8.0           dotenv_1.0.3             
[10] mongolite_2.6.2           rclipboard_0.1.6          shinyvalidate_0.1.2      
[13] ssh_0.8.1                 emojifont_0.5.5           data.table_1.14.2        
[16] rhandsontable_0.3.8       shinyBS_0.61.1            shinyjs_2.1.0            
[19] shinydashboardPlus_2.0.3  shinydashboard_0.7.2      shiny_1.7.4.1            
[22] httr_1.4.6                DT_0.25                  

loaded via a namespace (and not attached):
 [1] tidyr_1.2.1       viridisLite_0.4.1 jsonlite_1.8.7    showtext_0.9-5    assertthat_0.2.1 
 [6] askpass_1.1       showtextdb_3.0    renv_0.17.3       yaml_2.3.5        pillar_1.8.1     
[11] glue_1.6.2        digest_0.6.33     promises_1.2.0.1  googleAuthR_2.0.1 colorspace_2.0-3 
[16] htmltools_0.5.5   httpuv_1.6.11     pkgconfig_2.0.3   sysfonts_0.8.8    purrr_0.3.4      
[21] xtable_1.8-4      scales_1.2.1      later_1.3.1       tibble_3.1.8      openssl_2.1.0    
[26] generics_0.1.3    ellipsis_0.3.2    cachem_1.0.8      withr_2.5.0       lazyeval_0.2.2   
[31] credentials_1.3.2 cli_3.6.1         proto_1.0.0       magrittr_2.0.3    mime_0.12        
[36] memoise_2.0.1     fs_1.6.3          fansi_1.0.3       tools_4.2.1       gargle_1.5.2     
[41] lifecycle_1.0.3   munsell_0.5.0     zip_2.2.1         compiler_4.2.1    rlang_1.1.1      
[46] grid_4.2.1        rstudioapi_0.14   sys_3.4.2         htmlwidgets_1.5.4 gtable_0.3.1     
[51] curl_5.0.1        R6_2.5.1          knitr_1.40        dplyr_1.0.10      fastmap_1.1.1    
[56] utf8_1.2.2        parallel_4.2.1    Rcpp_1.0.11       vctrs_0.4.1       tidyselect_1.1.2 
[61] xfun_0.33 

This could have been there for a while but only intermittently, if it happens exactly when the number of results equals the page size.

I will have a look through here to see if anything has changed recently: https://cloud.google.com/storage/docs/json_api/v1/objects/list

This looks different:

Returns results in a directory-like mode, with / being a common value for the delimiter.

    items[] contains object metadata for objects whose names do not contain delimiter, or whose names only have instances of delimiter in their prefix.
    prefixes[] contains truncated object names for objects whose names contain delimiter after any prefix. Object names are truncated beyond the first applicable instance of the delimiter, mimicking a directory. If multiple objects have the same truncated name, duplicates are omitted. Truncated object names in prefixes[] always end with /.

Must be set to / when used with the matchGlob parameter to filter results in a directory-like mode.
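As a rough illustration of the delimiter behaviour quoted above, the following R sketch simulates the items/prefixes split locally on a vector of object names. split_listing() is a hypothetical helper written for this example, not part of googleCloudStorageR or the API, and it assumes a non-empty delimiter. It may also explain the earlier delimiter = "a" call returning fewer rows: under these rules only names that do not contain "a" would appear in items, although that reading is an inference, not something confirmed in this thread.

```r
# Hypothetical local simulation of the documented delimiter rules.
# items:    names (under `prefix`) that do not contain the delimiter
# prefixes: names truncated after the first delimiter, de-duplicated
split_listing <- function(object_names, delimiter = "/", prefix = "") {
  stopifnot(nzchar(delimiter))  # assumption: delimiter must be non-empty
  names_in <- object_names[startsWith(object_names, prefix)]
  rest <- substring(names_in, nchar(prefix) + 1L)

  # Position of the first delimiter after the prefix (-1 if absent).
  pos <- regexpr(delimiter, rest, fixed = TRUE)
  has_delim <- pos > 0

  truncated <- if (any(has_delim)) {
    paste0(prefix,
           substring(rest[has_delim], 1L,
                     pos[has_delim] + nchar(delimiter) - 1L))
  } else {
    character(0)
  }

  list(items = names_in[!has_delim], prefixes = unique(truncated))
}

objs <- c("cat.png", "dog.png", "raw/a.png", "raw/b.png", "banana.txt")

# Directory-like mode: the two objects under raw/ collapse into one prefix.
by_slash <- split_listing(objs, "/")

# An arbitrary delimiter such as "a" leaves only names without an "a".
by_a <- split_listing(objs, "a")
```

Here by_slash$items holds the three top-level names and by_slash$prefixes is "raw/", while by_a$items contains only "dog.png".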

For what it's worth, I just tested this for ropensci/targets#1172 using version 0.7.0, and gcs_list_objects() worked fine on my end even when there were exactly 1000 objects. Maybe somebody already solved this on the Google Cloud API end?