inbo / reporting-rshiny-grofwildjacht

Rshiny app for grofwildjacht

Home Page: https://grofwildjacht.inbo.be/

[BUG] error in data reading

SanderDevisscher opened this issue

Describe the bug
Something is still off. I can build the Docker image, but according to the local version of the app only data up to 31/12/2022 is available. However, when I look at the UAT on faunabeheer-dev.inbo.be I see that data up to 05/08/2023 is available. I suspect the current version reads the data from the wrong source.

To Reproduce
Steps to reproduce the behavior:

  1. Build the public Docker image locally
  2. Go to localhost
  3. Go to faunabeheer-dev.inbo.be
  4. Compare the maximum available dates in both apps and see the error

Expected behavior
Both apps read from the same data sources and therefore show the same maximum date.

Screenshots

Local version (0.3.0): [screenshot]

UAT version (0.2.9): [screenshot]

Additional context
Current UAT hash: 9c2fffa

I'm not sure where the data for faunabeheer-dev.inbo.be comes from. It should be defined in the ShinyProxy YAML, which I don't have access to; see #386 (comment).

The data in the inbo-wbe-uat-data bucket indeed has a maximum date of 31/12/2022.

library(reportingGrofwild)
setupS3()                                 # set up the S3 credentials
ecoData <- loadRawData(type = "eco")      # read the ecology data from the bucket
max(ecoData$afschot_datum, na.rm = TRUE)
#> [1] "2022-12-31"

@berthuygens @TheJenne18 can you look into the data source of faunabeheer-dev.inbo.be?

@TheJenne18 verified that the UAT app gets its data from the UAT bucket (and PRD from the PRD bucket).

But I suspect the two versions read different files from the UAT bucket. There are currently three ecology files on the bucket:

  1. "rshiny_reporting_data_ecology.csv" with max(df$afschot_datum, na.rm = TRUE) == "2023-07-19" => no longer updated in favor of the reportingGrofwild preprocessing function createRawData().
  2. "rshiny_reporting_data_ecology_processed.csv" with max(df$afschot_datum, na.rm = TRUE) == "2023-08-05"
  3. "rshiny_reporting_data_ecology_processed.RData", which I'm unable to load using aws.s3::s3load(); it always returns NULL.

I suspect the UAT reads from "rshiny_reporting_data_ecology_processed.csv" while the local Docker version reads from "rshiny_reporting_data_ecology_processed.RData".
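
For reference, the bucket contents can be listed from R to double-check which of these files each version actually picks up. A sketch using aws.s3, assuming setupS3() has already configured the credentials; the column selection is illustrative:

library(aws.s3)

# list all objects in the UAT bucket and keep the ecology files
objects <- get_bucket_df(bucket = "inbo-wbe-uat-data")
objects[grepl("ecology", objects$Key), c("Key", "LastModified", "Size")]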

Additionally, when running the code you provided (excluding setupS3()), max(ecoData$afschot_datum, na.rm = TRUE) yields "2023-08-05" with reportingGrofwild v0.2.9, but the same code yields "2022-12-31" with v0.3.0, suggesting that something substantial changed in how the data is loaded between versions.

Thanks for explaining, now I see the problem (I think).

The function createRawData() takes "rshiny_reporting_data_ecology.csv" as input file and outputs "rshiny_reporting_data_ecology_processed.RData".

So the problem should be fixed if you rename your input file to rshiny_reporting_data_ecology.csv and run the createRawData() function again.
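
Something along these lines, as a sketch only: the dataDir placeholder and the "eco" type value (mirroring the loadRawData() call above) are assumptions on my side:

# rerun the ecology preprocessing after renaming the input file;
# tempdir_eco is an illustrative placeholder for the folder holding
# rshiny_reporting_data_ecology.csv
createRawData(dataDir = tempdir_eco,
              type = "eco",
              bucket = "inbo-wbe-uat-data")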

I suspect similar changes are needed for the other raw input data files listed here.

On a side note

3. "rshiny_reporting_data_ecology_processed.RData" which I'm unable to load using aws.s3::s3load(). It allways returns NULL

Each .RData file contains an object called rawData, which can then be assigned to another variable depending on the data type. Note that s3load(), like base load(), loads the objects into the environment rather than returning them.
So this should work:

aws.s3::s3load(bucket = "inbo-wbe-uat-data", object = "rshiny_reporting_data_ecology_processed.RData")
dim(rawData)        # the loaded object is always called rawData

ecoData <- rawData  # assign it to a variable matching the data type
dim(ecoData)

Using reportingGrofwild v0.3.0 I get some errors when updating the wnm (waarnemingen) and spread data. I'll add the error messages after fieldwork.

When running createSpreadData() I get the following error right after loading "Municipalities_ModelOutput_toekomst_verspreiding_2023.shp":

Error in `[.data.frame`(x@data, i, j, ..., drop = FALSE) :
undefined columns selected

When running

createRawData(dataDir = tempdir_wnm,
                type = "waarnemingen",
                bucket = "inbo-wbe-uat-data")

I get the following error:

Error in `[.data.table`(x, i, which = TRUE) : 
  When i is a data.table (or character vector), the columns to join by must be specified using 'on=' argument (see ?data.table), by keying x (i.e. sorted, and, marked as sorted, see ?setkey), or by sharing column names between x and i (i.e., a natural join). Keyed joins might have further speed benefits on very large data due to x being sorted in RAM.
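
For context, this is generic data.table join behaviour rather than anything specific to reportingGrofwild: when i is a data.table, the join columns must be resolvable via on=, a key on x, or shared column names. A minimal illustration with made-up tables:

library(data.table)

deer <- data.table(gemeente_id = 1:3, aantal = c(10L, 4L, 7L))
keep <- data.table(id = 2L)

# deer[keep, which = TRUE]  # fails: no key, no 'on=', no shared column names
deer[keep, which = TRUE, on = c(gemeente_id = "id")]  # join deer$gemeente_id to keep$id
#> [1] 2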

Running both functions with reportingGrofwild 0.2.9 results in no errors.

Could you share the input files you use for these two? With the files I have, it works.

Sorry for the delay.

  • For waarnemingen: I had to update the code to fix the error.
  • For spread data: The files in the zip folder are named differently from before, e.g. the new "Model_output_Pixels" versus the old "Pixels_ModelOutput_toekomst_verspr". @SanderDevisscher I can update the code for this, unless you plan to keep using the old names in the future?

For both: I've updated the data in the UAT bucket with respect to the files you shared.

Sorry for the delay.

  • For waarnemingen: I had to update the code to fix the error.

I'll test the changes.

  • For spread data: The files in the zip folder are named differently from before, e.g. the new "Model_output_Pixels" versus the old "Pixels_ModelOutput_toekomst_verspr". @SanderDevisscher I can update the code for this, unless you plan to keep using the old names in the future?

No need, I'll change the filenames on our side.

For both: I've updated the data in the UAT bucket with respect to the files you shared.

Updating waarnemingen works as it should.

My sessionInfo:

R version 4.3.0 (2023-04-21 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)

Matrix products: default


locale:
[1] LC_COLLATE=Dutch_Belgium.utf8  LC_CTYPE=Dutch_Belgium.utf8    LC_MONETARY=Dutch_Belgium.utf8 LC_NUMERIC=C                   LC_TIME=Dutch_Belgium.utf8    

time zone: Europe/Brussels
tzcode source: internal

attached base packages:
[1] tools     stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] testthat_3.1.10         sp_2.0-0                reportingGrofwild_0.3.0 lubridate_1.9.2.9000    forcats_1.0.0           stringr_1.5.0          
 [7] dplyr_1.1.2             purrr_1.0.1             readr_2.1.4             tidyr_1.3.0             tibble_3.2.1            ggplot2_3.4.3          
[13] tidyverse_2.0.0         janitor_2.2.0           aws.s3_0.3.21           svDialogs_1.1.0        

loaded via a namespace (and not attached):
  [1] RColorBrewer_1.1-3    rstudioapi_0.15.0     jsonlite_1.8.7        magrittr_2.0.3        rmarkdown_2.24        fs_1.6.2              vctrs_0.6.2          
  [8] memoise_2.0.1         config_0.3.1          base64enc_0.1-3       webshot_0.5.5         htmltools_0.5.5       usethis_2.1.6         curl_5.0.2           
 [15] sass_0.4.7            KernSmooth_2.23-20    bslib_0.5.1           desc_1.4.2            htmlwidgets_1.6.2     plyr_1.8.8            plotly_4.10.2        
 [22] cachem_1.0.8          conflicted_1.2.0      mime_0.12             lifecycle_1.0.3       pkgconfig_2.0.3       Matrix_1.6-0          R6_2.5.1             
 [29] fastmap_1.1.1         shiny_1.7.5           snakecase_0.11.0      digest_0.6.31         showtext_0.9-6        colorspace_2.1-0      shinycssloaders_1.0.0
 [36] ps_1.7.5              rprojroot_2.0.3       pkgload_1.3.2.1       aws.signature_0.6.0   crosstalk_1.2.0       fansi_1.0.4           timechange_0.2.0     
 [43] httr_1.4.7            mgcv_1.8-42           compiler_4.3.0        proxy_0.4-27          remotes_2.4.2.1       bit64_4.0.5           withr_2.5.0          
 [50] DBI_1.1.3             pkgbuild_1.4.0        aws.ec2metadata_0.2.0 sessioninfo_1.2.2     leaflet_2.1.2         classInt_0.4-9        units_0.8-3          
 [57] odbc_1.3.4            httpuv_1.6.11         glue_1.6.2            callr_3.7.3           nlme_3.1-162          promises_1.2.0.1      grid_4.3.0           
 [64] sf_1.0-14             reshape2_1.4.4        generics_0.1.3        gtable_0.3.4          tzdb_0.4.0            class_7.3-21          data.table_1.14.8    
 [71] hms_1.1.3             xml2_1.3.5            utf8_1.2.3            pillar_1.9.0          vroom_1.6.3           later_1.3.1           splines_4.3.0        
 [78] lattice_0.21-8        showtextdb_3.0        bit_4.0.5             tidyselect_1.2.0      miniUI_0.1.1.1        knitr_1.43            xfun_0.40            
 [85] devtools_2.4.5        brio_1.1.3            DT_0.29               stringi_1.7.12        lazyeval_0.2.2        yaml_2.3.7            geojsonsf_2.0.3      
 [92] evaluate_0.21         cli_3.6.1             xtable_1.8-4          munsell_0.5.0         processx_3.8.1        jquerylib_0.1.4       svGUI_1.0.1          
 [99] Rcpp_1.0.10           parallel_4.3.0        ellipsis_0.3.2        assertthat_0.2.1      blob_1.2.4            prettyunits_1.1.1     profvis_0.3.8        
[106] urlchecker_1.0.1      flexdashboard_0.6.2   INBOtheme_0.5.9       viridisLite_0.4.2     scales_1.2.1          fortunes_1.5-4        sysfonts_0.8.8       
[113] e1071_1.7-13          crayon_1.5.2          rlang_1.1.1           waldo_0.5.1

@mvarewyck any idea which package is the culprit?

I'll need to investigate more. I now get another error:

Reading layer `Model_output_Municipalities_2023' from data source 
  `/home/mvarewyck/git/reporting-rshiny-grofwildjacht/data/Model_output_Municipalities_2023.shp' 
  using driver `ESRI Shapefile'
Simple feature collection with 308 features and 13 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 2.541329 ymin: 50.68749 xmax: 5.911109 ymax: 51.50511
Geodetic CRS:  GCS_WGS_84_with_axis_order_normalized_for_visualization
Error: arguments have different crs
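
The usual way to resolve this kind of mismatch is to reproject one layer onto the CRS of the other before combining them. A generic sf sketch; the object names are illustrative, not the actual package code:

library(sf)

# reproject the municipalities layer onto the CRS of the layer it is combined with
if (st_crs(municipalities) != st_crs(reference_layer)) {
  municipalities <- st_transform(municipalities, st_crs(reference_layer))
}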

A similar issue (different data availability between 0.3.0 & 0.2.9, the latter being correct and up to date, as described in #414 (comment)) occurs with the Wildschade data.

What is the correct name of the wildschade_georef file used in createRawData()? I currently provide WildSchade_georef.csv.

A similar issue (different data availability between 0.3.0 & 0.2.9, the latter being correct and up to date, as described in #414 (comment)) occurs with the Wildschade data.

What is the correct name of the wildschade_georef file used in createRawData()? I currently provide WildSchade_georef.csv.

WildSchade_georef.csv is the correct name.
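
For completeness, rerunning the wildschade preprocessing would then look roughly like the waarnemingen call earlier in this thread. A sketch only: the type value "wildschade" and the tempdir_wildschade placeholder are guesses, not verified against the package:

# assumed call pattern mirroring the waarnemingen example above;
# tempdir_wildschade stands for the folder containing WildSchade_georef.csv
createRawData(dataDir = tempdir_wildschade,
              type = "wildschade",
              bucket = "inbo-wbe-uat-data")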

I've made spatialData a compulsory function argument to preprocess the spread data. This fixes #414 (comment):

readS3(file = "spatialData_sf.RData")        # loads the spatialData object
createSpreadData(spatialData = spatialData)

@SanderDevisscher If you send me the raw data file for wildschade, I'll have a look at it.

The file size of the raw data exceeds 25 MB, so I had to WeTransfer the file. You should have received a mail.